Improving Automatic Speech Recognition by Learning from Human Errors

Title: Improving Automatic Speech Recognition by Learning from Human Errors
Publication Type: Conference Paper
Year of Publication: 2011
Authors: Meyer, B. T.
Other Numbers: 3208

This work presents a series of experiments that compare the performance of human speech recognition (HSR) and automatic speech recognition (ASR). The goal of this line of research is to learn from the differences between HSR and ASR, and to use this knowledge to incorporate new signal processing strategies from the human auditory system in automatic classifiers. A database of noisy nonsense utterances is used for both HSR and ASR experiments, with a focus on the influence of intrinsic variation (arising from changes in speaking rate, effort, and style). A standard ASR system is found to reach human performance level only when the signal-to-noise ratio is increased by 15 dB, which can be seen as the human-machine gap for speech recognition on a sub-lexical level. The sources of intrinsic variation are found to severely degrade phoneme recognition scores both in HSR and in ASR. A comparison of utterances produced at different speaking rates indicates that temporal cues are not optimally exploited in ASR, which results in a strong increase of vowel confusions. Alternative feature extraction methods that take into account temporal and spectro-temporal modulations of speech signals are discussed.


Significant contributions to the research summarized in this study were made by Birger Kollmeier, Thomas Brand, Tim Jürgens, and Thorsten Wesker. It was supported by the DFG (SFB/TRR 31 'The active auditory system'). Bernd T. Meyer has been supported by a post-doctoral fellowship of the German Academic Exchange Service (DAAD).

Bibliographic Notes

Proceedings of the 162nd Meeting of the Acoustical Society of America, Vol. 14, San Diego, California

Abbreviated Authors

B. T. Meyer

ICSI Research Group


ICSI Publication Type

Article in conference proceedings