I agree. An Saol Ó Dheas is the first thing that came to my mind too.
galaxyrocker wrote:
...
I know they're working on Speech-to-Text models too, but I also know they opened the recording challenge to everyone with no filter for quality of Irish. It immediately makes me skeptical to the quality of their models, as they let people self-identify as natives and what dialect they speak, which won't cause any issues, I'm sure. ... Now, supposedly that's only for their speech-to-text models, but still I want some assurance of quality assurance of their actual text-to-speech models being trained solely on competent, Gaeltacht-raised natives.
If that's only for their speech-to-text models, then it's the right move to include all manner of speakers. Such models need to be able to correctly interpret even the pronunciation of non-native speakers. Siri or Alexa wouldn't be considered very good if they couldn't understand the accents of non-native English speakers, and it's arguably non-native speakers who could benefit most from speech-to-text systems. It would allow them to ask a digital assistant "what's the definition of X?" or "how do I use Y in a sentence?"
From a technical perspective, too, training AI models on large amounts of data, even if some of it is not good quality, is better than training on a small amount of high-quality data. Repeated studies have shown the benefit of transfer learning, i.e. even training on data from different languages can improve models for less-resourced languages for which a sufficient quantity of training data does not exist.
Of course, ideally many native speakers would also take part, but it's entirely up to individual native speakers whether or not they want to be involved. There's certainly not much point complaining (not that I'm accusing anyone here of complaining) that systems like this don't understand native speech if native speakers aren't interested or inclined to help in their development.