iambullivant wrote:
That's very interesting, thank you. I understand what you are saying about the challenges around the training sets for Irish.
As an L1/L2 learner without a teacher I use Abair a lot to try and grasp the pronunciation of words and phrases, particularly where Munster and Connacht diverge from each other and an Caighdeán. I find Abair useful because Teanglann.ie doesn't have the verb conjugations.
The question is: can I currently rely on Abair being reasonably accurate most of the time for both dialects or am I learning 'bad' Irish if I rely on it too much?
To put it another way: is Abair genuinely a useful tool for learners, or 'just' an interesting academic experiment? Not that there is anything wrong with interesting academic experiments.
The short answer is that, if I were you, I wouldn't rely on it alone, but nor would I have relied on the older version alone. It is certainly a very useful tool in principle, for the exact reason you mention: to learn dialectal pronunciations as a non-native speaker. But the best way of doing that is always going to be to listen to native speakers, whether that be on the radio, the television, online, or best of all - if you can manage it - in conversation. The abair.ie synthesis model can only ever be a fall-back option, albeit a very convenient one.
As for whether you can trust sounds produced by an AI speech synthesis model over one deliberately programmed by humans who know the language, that's a bit more of a philosophical question. If you want to know which approach is "the best", both have their flaws.

It would be like teaching a child to identify different animals by only showing them pictures. You could show them images generated by an AI model, or you could choose to only show them paintings and drawings made by humans. We've all seen AI-generated images and videos. A lot of them are very good, but they often have weird problems, like body parts morphing into each other, and just not looking quite right. Then again, if the person producing the images isn't a reasonably talented artist, their drawings or paintings may be worse than those produced by the AI model, and even in the case of great artists, works by surrealists like Salvador Dalí and absurdists like Michael Cheval can be even weirder than AI-generated images. This is to say nothing of the amount of time it takes a human to produce a picture, while an AI image generator might take only seconds to produce several.

This is analogous to the problem with the AI speech synthesiser. On the one hand, the model is probably being trained on pronunciations by real dialectal speakers, whereas those programming a rule-based model may be L2 speakers. On the other hand, it's very difficult to know what kind of weird mistakes an AI model might make until it makes them, and fixing them then requires gathering more data targeting that one mispronunciation and retraining the model, repeating this process until it gets it right, hopefully all without negatively impacting the pronunciation of other words. By contrast, it's easy to program a rule-based model to make an exception for the pronunciation of a particular word.
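To make that last contrast concrete, here is a toy sketch of how a rule-based synthesiser handles exceptions: an exception lexicon is consulted before the general letter-to-sound rules, so fixing one word is a one-line edit rather than a retraining run. To be clear, the rules and entries below are rough illustrative stand-ins I made up for this post, not Abair's actual tables, and not accurate Irish phonology.

```python
# Hypothetical letter-to-sound rules (illustrative only, NOT real Irish
# phonology or anything from abair.ie).
RULES = {"bh": "v", "mh": "v", "ch": "x",
         "a": "a", "e": "e", "i": "i", "n": "n", "s": "s"}

# Hand-maintained exception lexicon. Adding or correcting a word is just
# one more entry here -- no data collection, no retraining.
EXCEPTIONS = {"bhfuil": "wil"}  # rough, made-up transcription

def pronounce(word):
    # Irregular words short-circuit the rules entirely.
    if word in EXCEPTIONS:
        return EXCEPTIONS[word]
    # Otherwise, greedy longest-match against the rule table.
    out, i = [], 0
    while i < len(word):
        for span in (2, 1):
            chunk = word[i:i + span]
            if chunk in RULES:
                out.append(RULES[chunk])
                i += span
                break
        else:
            out.append(word[i])  # pass unknown letters through unchanged
            i += 1
    return "".join(out)

print(pronounce("bhean"))   # handled by the general rules
print(pronounce("bhfuil"))  # handled by the exception lexicon
```

A neural model has no equivalent of that `EXCEPTIONS` table inside it; the "fix" for one bad output is more data and another training run, which is exactly the asymmetry described above.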
I don't think its only value is as a research experiment, nor that the move to an AI model is just a whimsical odyssey undertaken for no better reason than academic curiosity. I believe the AI model is genuinely intended to be useful to learners, and I suspect the whole reason for moving to an AI model was to ensure it will continue to improve, and become more and more useful over time, even after funding dries up for the project and researchers from the group that made it move on to other projects. It's much easier to just push a button and let a computer train a new model on its own when more data becomes available than it is to convince a funding body, even a governmental one which should have a vested interest in the national language, to provide funding to update a project they've already funded once. They would have to pay new researchers to come on board and manually update a rule-based speech synthesis engine every ten to twenty years. A much better way to ensure future improvements are made to the system is to build in the ability to automatically improve itself when more data becomes available. As for how good the AI model is right now, I dare say it's already useful, even if it does occasionally produce absurd "pronunciations". And it will only get better with time.