"Learning word pronunciations from transcribed acoustic data"
Many speech recognition systems that provide over-the-phone services (e.g. name dialers, stock quote providers, and location finders) rely on the accurate recognition of proper names. For this to happen, the systems need to know how their users will pronounce these words. However, predicting the pronunciation of a proper name is a notoriously difficult problem: it depends on the origin of the name, the linguistic background of the speaker, and other cultural and sociological factors, in addition, of course, to the spelling of the word.
We will describe an algorithm to learn word pronunciations from acoustic data. The algorithm jointly optimizes the pronunciation of a word using (a) the acoustic match of this pronunciation to the observed data, and (b) a measure of how "linguistically reasonable" the pronunciation is. We will describe how this linguistic knowledge can be automatically acquired from a hand-made pronunciation dictionary.
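To make the joint criterion concrete, here is a minimal sketch in Python. It is not the algorithm presented in the talk. It assumes the candidate pronunciations and their acoustic log-likelihoods are already available (in practice they would come from a recognizer; here they are hypothetical inputs), and it stands in for "linguistically reasonable" with an add-alpha smoothed phone bigram model estimated from a pronunciation dictionary. The function names, the `weight` parameter, and the bigram choice are all illustrative assumptions.

```python
import math
from collections import Counter


def train_phone_bigram(dictionary, alpha=1.0):
    """Build a phone-bigram scorer from a pronunciation dictionary.

    `dictionary` maps words to phone sequences. Returns a function that
    gives the add-alpha smoothed log-score of a phone sequence.
    (Assumption: a bigram model is only one possible linguistic measure.)
    """
    bigrams = Counter()
    contexts = Counter()
    phones = {"</s>"}  # the end marker is a possible successor
    for pron in dictionary.values():
        seq = ["<s>"] + list(pron) + ["</s>"]
        phones.update(pron)
        for prev, cur in zip(seq, seq[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    vocab_size = len(phones)

    def log_prob(pron):
        seq = ["<s>"] + list(pron) + ["</s>"]
        return sum(
            math.log((bigrams[(p, c)] + alpha)
                     / (contexts[p] + alpha * vocab_size))
            for p, c in zip(seq, seq[1:])
        )

    return log_prob


def best_pronunciation(candidates, acoustic_loglik, prior_log_prob, weight=1.0):
    """Pick the candidate maximizing acoustic score + weight * linguistic score.

    `acoustic_loglik` maps each candidate (a tuple of phones) to a
    hypothetical acoustic log-likelihood supplied by the caller.
    """
    return max(
        candidates,
        key=lambda pron: acoustic_loglik[pron] + weight * prior_log_prob(pron),
    )
```

With `weight=0.0` the choice is driven by the acoustic match alone; raising `weight` lets the dictionary-derived model veto candidates that fit the audio but contain phone sequences rarely seen in the dictionary.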
We will present experimental results on name dialing databases, and show that the proposed algorithm can reduce the name dialing error rate by as much as 40% compared with a letter-to-phone pronunciation engine.
Speaker Bio:
I grew up in Brussels, Belgium. I came to California when I was 23, with the firm intention of getting a Master's from Stanford and going back home 9 months later. Well, I'm still here and I'm now ... well, I'm older. I did get my MS, but then I stayed for a PhD, all in EE. After Stanford, I spent about 5 years at SRI, in the speech group, and now another 5 years at Nuance, still as a researcher in speech recognition.
When I'm not hacking algorithms or taking care of my little girls, I try to learn to play the guitar. It's substantially harder than speech rec.