Talks at the International Computer Science Institute

The International Computer Science Institute
is pleased to present a talk:


"Learning word pronunciations from transcribed acoustic data"

Francoise Beaufays
Nuance

Tuesday, July 13, 2004
ICSI, Conference Room 5A
12:30 pm

Abstract:

Many speech recognition systems that provide over-the-phone services (e.g., name dialers, stock quote providers, location finders) rely on the accurate recognition of proper names. For this to happen, the systems need to know how their users will pronounce these names. However, predicting the pronunciation of a proper name is a notoriously difficult problem: it depends on the origin of the name, the linguistic background of the speaker, and other cultural and sociological factors, in addition, of course, to the spelling of the word.

We will describe an algorithm to learn word pronunciations from acoustic data. The algorithm jointly optimizes the pronunciation of a word using (a) the acoustic match of this pronunciation to the observed data, and (b) a measure of how "linguistically reasonable" the pronunciation is. We will describe how this linguistic knowledge can be automatically acquired from a hand-made pronunciation dictionary.
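
As a rough illustration of the joint criterion described above, the following Python sketch picks, among candidate pronunciations, the one that maximizes an acoustic log-likelihood plus a weighted linguistic log-probability. It is only a sketch under stated assumptions, not the algorithm presented in the talk: the phone bigram model, the log-linear combination, the weight parameter, the toy dictionary, and the acoustic scores are all hypothetical choices made for the example.

```python
import math
from collections import defaultdict


def train_phone_bigram(dictionary, smoothing=1.0):
    """Return a function giving the add-k smoothed bigram log-probability
    of a phone sequence, estimated from a hand-made pronunciation dictionary.
    (Assumption: a bigram model stands in for "linguistic reasonableness".)"""
    counts = defaultdict(lambda: defaultdict(float))
    phones = {"</s>"}
    for pron in dictionary.values():
        seq = ["<s>"] + list(pron) + ["</s>"]
        phones.update(pron)
        for prev, cur in zip(seq, seq[1:]):
            counts[prev][cur] += 1.0
    vocab_size = len(phones)

    def log_prob(pron):
        seq = ["<s>"] + list(pron) + ["</s>"]
        total = 0.0
        for prev, cur in zip(seq, seq[1:]):
            row = counts.get(prev, {})
            numerator = row.get(cur, 0.0) + smoothing
            denominator = sum(row.values()) + smoothing * vocab_size
            total += math.log(numerator / denominator)
        return total

    return log_prob


def best_pronunciation(candidates, acoustic_log_likelihood,
                       linguistic_log_prob, weight=1.0):
    """Pick the candidate with the highest combined score.
    (Assumption: the two terms are combined as a weighted sum of logs.)"""
    def score(pron):
        return acoustic_log_likelihood[pron] + weight * linguistic_log_prob(pron)
    return max(candidates, key=score)


# Toy hand-made dictionary (hypothetical entries and phone symbols).
dictionary = {
    "cat": ("k", "ae", "t"),
    "cab": ("k", "ae", "b"),
    "tack": ("t", "ae", "k"),
    "bat": ("b", "ae", "t"),
}
lm = train_phone_bigram(dictionary)

# Two candidate pronunciations for a new word. The acoustic scores are
# made-up placeholders; in practice they would come from a recognizer
# aligned against the transcribed audio.
candidates = [("k", "ae", "t"), ("k", "t", "ae")]
acoustic = {("k", "ae", "t"): -10.0, ("k", "t", "ae"): -9.5}

chosen = best_pronunciation(candidates, acoustic, lm)
acoustic_only = best_pronunciation(candidates, acoustic, lm, weight=0.0)
```

In this toy setup the second candidate has a slightly better acoustic score, but the bigram model learned from the dictionary penalizes its unusual phone order, so the combined criterion selects the first candidate. Setting the weight to zero falls back to the purely acoustic choice.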

We will present experimental results on Name Dialing databases and show that the proposed algorithm can reduce the name dialing error rate by as much as 40% compared with a letter-to-phone pronunciation engine.

Speaker Bio:

I grew up in Brussels, Belgium. I came to California when I was 23, with the firm intention of getting a Master's from Stanford and going back home 9 months later. Well, I'm still here and I'm now ... well, I'm older. I did get my MS, but then I stayed for a PhD, all in EE. After Stanford, I spent about 5 years at SRI, in the speech group, and now another 5 years at Nuance, still as a researcher in speech recognition.

When I'm not hacking algorithms or taking care of my little girls, I try to learn to play the guitar. It's substantially harder than speech rec.