iambullivant wrote:
That's very interesting, thank you. I understand what you are saying about the challenges around the training sets for Irish.
As an L1/L2 learner without a teacher I use Abair a lot to try and grasp the pronunciation of words and phrases, particularly where Munster and Connacht diverge from each other and an Caighdeán. I find Abair useful because Teanglann.ie doesn't have the verb conjugations.
The question is: can I currently rely on Abair being reasonably accurate most of the time for both dialects or am I learning 'bad' Irish if I rely on it too much?
To put it another way: is Abair genuinely a useful tool for learners, or 'just' an interesting academic experiment? Not that there is anything wrong with interesting academic experiments.
The short answer is that, if I were you, I wouldn't rely on it alone, but nor would I have relied on the older version alone. It is certainly a very useful tool in principle, for the exact reason you mention: to learn dialectal pronunciations as a non-native speaker. But the best way of doing that is always going to be to listen to native speakers, whether that be on the radio, the television, online, or best of all - if you can manage it - in conversation. The abair.ie synthesis model can only ever be a fall-back option, albeit a very convenient one.
As for whether you can trust sounds produced by an AI speech synthesis model over one deliberately programmed by humans who know the language, that's a bit more of a philosophical question. If you want to know which approach is "the best", both have their flaws.

It would be like teaching a child to identify different animals by only showing them pictures. You could show them images generated by an AI model, or you could choose to only show them paintings and drawings made by humans. We've all seen AI-generated images and videos. A lot of them are very good, but they often have weird problems, like body parts morphing into each other, and just not looking quite right. Then again, if the person producing the images isn't a reasonably talented artist, their drawings or paintings may be worse than those produced by the AI model, and even in the case of great artists, works by surrealists like Salvador Dalí and absurdists like Michael Cheval can be even weirder than AI-generated images. This is to say nothing of the amount of time it takes a human to produce a picture, while an AI image generator might take only seconds to produce several.

This is analogous to the problem with the AI speech synthesiser. On the one hand, the model is probably being trained on pronunciations by real dialectal speakers, whereas those programming a rule-based model may be L2 speakers. On the other hand, it's very difficult to know what kind of weird mistakes an AI model might make until it makes them, and fixing them then requires gathering more data targeting that one mispronunciation and retraining the model, repeating this process until it gets it right, hopefully all without negatively impacting the pronunciation of other words. By contrast, it's easy to program a rule-based model to make an exception for the pronunciation of a particular word.
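To make that last contrast concrete, here is a toy sketch of how a rule-based synthesiser handles exceptions: an exception lexicon is consulted before the general letter-to-sound rules, so fixing one word is a one-line edit rather than a retraining run. To be clear, the rules and entries below are rough illustrative stand-ins I made up for this post, not Abair's actual tables, and not accurate Irish phonology.

```python
# Hypothetical letter-to-sound rules (illustrative only, NOT real Irish
# phonology or anything from abair.ie).
RULES = {"bh": "v", "mh": "v", "ch": "x",
         "a": "a", "e": "e", "i": "i", "n": "n", "s": "s"}

# Hand-maintained exception lexicon. Adding or correcting a word is just
# one more entry here -- no data collection, no retraining.
EXCEPTIONS = {"bhfuil": "wil"}  # rough, made-up transcription

def pronounce(word):
    # Irregular words short-circuit the rules entirely.
    if word in EXCEPTIONS:
        return EXCEPTIONS[word]
    # Otherwise, greedy longest-match against the rule table.
    out, i = [], 0
    while i < len(word):
        for span in (2, 1):
            chunk = word[i:i + span]
            if chunk in RULES:
                out.append(RULES[chunk])
                i += span
                break
        else:
            out.append(word[i])  # pass unknown letters through unchanged
            i += 1
    return "".join(out)

print(pronounce("bhean"))   # handled by the general rules
print(pronounce("bhfuil"))  # handled by the exception lexicon
```

A neural model has no equivalent of that `EXCEPTIONS` table inside it; the "fix" for one bad output is more data and another training run, which is exactly the asymmetry described above.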
I don't think its only value is as a research experiment, nor that the move to an AI model is just a whimsical odyssey undertaken for no better reason than academic curiosity. I believe the AI model is genuinely intended to be useful to learners, and I suspect the whole reason for moving to an AI model was to ensure it will continue to improve, and become more and more useful over time, even after funding dries up for the project and researchers from the group that made it move on to other projects. It's much easier to just push a button and let a computer train a new model on its own when more data becomes available than it is to convince a funding body, even a governmental one which should have a vested interest in the national language, to provide funding to update a project they've already funded once. They would have to pay new researchers to come on board and manually update a rule-based speech synthesis engine every ten to twenty years. A much better way to ensure future improvements are made to the system is to build in the ability to automatically improve itself when more data becomes available. As for how good the AI model is right now, I dare say it's already useful, even if it does occasionally produce absurd "pronunciations". And it will only get better with time.