Talks at the International Computer Science Institute
Wednesday, April 1, 1998
3:00 - 4:00 p.m.

Su-Lin Wu
ICSI/UC Berkeley

sulin@icsi.berkeley.edu

"Incorporating Information from Syllable-length Time Scales into Automatic Speech Recognition"

Incorporating the concept of the syllable into automatic speech recognition may improve recognition accuracy by helping to integrate information over syllable-length time spans. Evidence from psychoacoustics and phonology suggests that humans use the syllable as a basic perceptual unit in speech processing. Nonetheless, the explicit use of such long-time-span units is comparatively unusual in modern automatic speech recognition systems for English.

This talk describes work exploring the utility of information collected over syllable-length time scales. The first approach integrated syllable segmentation information into the speech recognition process: adding acoustically estimated syllabic onsets yielded a 10% relative reduction in word-error rate. The second approach began by developing four speech recognition systems based on long-time-span features and units, including modulation spectrogram features. Analysis of these systems suggested combining them, which led to methods that merged the outputs of the syllable-based recognition systems with the phone-oriented baseline system at three levels: the frame, the syllable, and the whole utterance. These combined systems achieved relative improvements of 20-40% over the baseline system on both clean and reverberant test speech.
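The relative figures and the frame-level merging mentioned above can be illustrated with a minimal Python sketch. The abstract does not say how the frame-level merge was performed, so the combination rule here (averaging each system's per-frame log posteriors and renormalizing, i.e. a normalized geometric mean) is an assumption chosen for illustration, and the function names `relative_reduction` and `combine_frame_posteriors` are hypothetical. The word-error rates in the usage note are made-up numbers, not results from the talk.

```python
import math


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative reduction in word-error rate, as a fraction of the baseline."""
    return (baseline_wer - new_wer) / baseline_wer


def combine_frame_posteriors(p_a: list[float], p_b: list[float]) -> list[float]:
    """Merge two systems' phone posteriors for a single frame.

    Hypothetical rule (an assumption, not necessarily the talk's method):
    average the log posteriors, then renormalize so the result sums to 1.
    """
    if len(p_a) != len(p_b):
        raise ValueError("both systems must score the same phone set")
    floor = 1e-12  # avoid log(0) for phones a system scores as impossible
    log_avg = [(math.log(max(a, floor)) + math.log(max(b, floor))) / 2
               for a, b in zip(p_a, p_b)]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(log_avg)
    unnorm = [math.exp(x - m) for x in log_avg]
    total = sum(unnorm)
    return [u / total for u in unnorm]
```

For example, a hypothetical drop in word-error rate from 20.0% to 18.0% gives `relative_reduction(20.0, 18.0)` of 0.1, a 10% relative reduction. The geometric mean favors phones that both systems consider likely; averaging the raw probabilities would be an equally plausible alternative.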

This talk will be held in the Main Lecture Hall at ICSI,
1947 Center Street, Sixth Floor, Berkeley, CA 94704-1198
(on Center between Milvia and Martin Luther King Jr. Way).