Recent Projects

SpeechCorder: spoken information retrieval for meetings

Multi-stream decoding

Switchboard recognizer

Dynamic pronunciation modeling (dissertation): Accurate speaker-independent recognition of large-vocabulary speech by computers remains an unattained goal, particularly for spontaneous speech. High transcription error rates are caused in part by poor modeling of pronunciations within spontaneous speech. My dissertation examines how speaking rate and word predictability can be used to estimate when greater pronunciation variation can be expected; speaking rate and word predictability are also correlated with speech recognition errors. The results of these studies suggest that for spontaneous speech, it may be appropriate to build models for syllables and words that dynamically change the pronunciations used in the speech recognizer based on the context. Implementing new pronunciation models, automatically derived from data, within the ICSI speech recognition system has shown a 4-5% relative improvement in transcribing television and radio news reports and interviews. Roughly two-thirds of this gain can be attributed to static baseform improvements; adding the ability to dynamically adjust pronunciations within the recognizer provides the remaining third. This news corpus also allows performance to be compared across different styles of speech: the new pronunciation models do not help for pre-planned speech, but they provide a significant gain for spontaneous speech. Not only do the automatically learned pronunciation models capture some of the linguistic variation due to speaking style, but they also represent variation in the acoustic model due to channel effects. The largest improvement was seen in the telephone speech condition, in which 12% of the errors produced by the baseline system were corrected.
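
The idea of adjusting pronunciations by context can be illustrated with a toy sketch. This is not the model built in the dissertation: the function name, the thresholds, the boost factor, and the assumption that reduced pronunciations become more likely for fast or highly predictable words are all hypothetical choices made for illustration. The sketch starts from static baseform probabilities and reweights them using speaking rate and word predictability.

```python
def dynamic_pron_probs(prons, speaking_rate, word_predictability,
                       rate_threshold=5.0, pred_threshold=0.1, boost=2.0):
    """Reweight pronunciation probabilities by context, then renormalize.

    prons: list of (phones, base_prob, is_reduced) tuples for one word.
    speaking_rate: e.g. syllables per second (hypothetical unit).
    word_predictability: e.g. language-model probability of the word.
    All thresholds and the boost factor are illustrative assumptions.
    """
    weights = []
    for phones, base_prob, is_reduced in prons:
        weight = base_prob  # static baseform probability
        if is_reduced:
            # Assumption: reduced variants are favored in fast speech ...
            if speaking_rate > rate_threshold:
                weight *= boost
            # ... and for words that are highly predictable in context.
            if word_predictability > pred_threshold:
                weight *= boost
        weights.append(weight)
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical pronunciations of "and": a full form and a reduced form.
and_prons = [
    (("ae", "n", "d"), 0.7, False),
    (("n",), 0.3, True),
]

# Slow, unpredictable context: static probabilities are left unchanged.
print(dynamic_pron_probs(and_prons, speaking_rate=3.0, word_predictability=0.01))

# Fast, predictable context: probability mass shifts to the reduced form.
print(dynamic_pron_probs(and_prons, speaking_rate=7.0, word_predictability=0.5))
```

With a static dictionary, the recognizer would use the same probabilities in both contexts; the dynamic version changes them per word occurrence, which is the distinction drawn above between static baseform improvements and dynamic adjustment.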

Previous projects

Speaking rate estimation

Berkeley Restaurant Project


Modified on $Date: 1999/11/23 21:26:24 $