Publication Details

Title: Incorporating Information From Syllable-Length Time Scales into Automatic Speech Recognition
Author: S.-L. Wu
Group: ICSI Technical Reports
Date: May 1998
PDF: ftp://ftp.icsi.berkeley.edu/pub/techreports/1998/tr-98-014.pdf

Overview:
Incorporating the concept of the syllable into speech recognition may improve recognition accuracy through the integration of information over syllable-length time spans. Evidence from psychoacoustics and phonology suggests that humans use the syllable as a basic perceptual unit. Nonetheless, the explicit use of such long-time-span units is comparatively unusual in automatic speech recognition systems for English. The work described in this thesis explored the utility of information collected over syllable-related time-scales. The first approach involved integrating syllable segmentation information into the speech recognition process. The addition of acoustically-based syllable onset estimates (Shire 1997) resulted in a 10% relative reduction in word-error rate. The second approach began with developing four speech recognition systems based on long-time-span features and units, including modulation spectrogram features (Greenberg & Kingsbury 1997). Error analysis suggested the strategy of combining, which led to the implementation of methods that merged the outputs of syllable-based recognition systems with the phone-oriented baseline system at the frame level, the syllable level and the whole-utterance level. These combined systems exhibited relative improvements of 20-40% compared to the baseline system for clean and reverberant speech test cases.

Keywords: speech recognition, syllable, combination, syllabic onsets, human auditory perception, reverberation, neural network

Bibliographic Information:
ICSI Technical Report TR-98-014

Bibliographic Reference:
S.-L. Wu. Incorporating Information From Syllable-Length Time Scales into Automatic Speech Recognition. ICSI Technical Report TR-98-014, May 1998.