Features Based on Auditory Physiology and Perception

Title: Features Based on Auditory Physiology and Perception
Publication Type: Miscellaneous
Year of Publication: 2012
Authors: Stern, R. M., & Morgan, N.
Page(s): 193-227
Other Numbers: 3326
Abstract

It is well known that human speech processing capabilities far surpass the capabilities of current automatic speech recognition and related technologies, despite very intensive research in automated speech technologies in recent decades. Indeed, since the early 1980s, this observation has motivated the development of speech recognition feature extraction approaches that are inspired by auditory processing and perception, but it is only relatively recently that these approaches have become effective in their application to computer speech processing. The goal of this chapter is to review some of the major ways in which feature extraction schemes based on auditory processing have facilitated greater speech recognition accuracy in recent years, as well as to provide some insight into the nature of current trends and future directions in this area. We begin this chapter with a brief review of some of the major physiological and perceptual phenomena that have motivated feature extraction algorithms based on auditory processing. We continue with a review and discussion of three seminal ‘classical’ auditory models of the 1980s that have had a major impact on the approaches taken by more recent contributors to this field. Finally, we turn our attention to selected more recent topics of interest in auditory feature analysis, along with some of the feature extraction approaches that have been based on them. We conclude with a discussion of the attributes of auditory models that appear to be most effective in improving speech recognition accuracy in difficult acoustic environments.

Acknowledgment

This research was supported by NSF (Grants IIS-0420866 and IIS-0916918) at CMU, and by Cisco, Microsoft, and Intel Corporations and internal funds at ICSI. The authors are grateful to Chanwoo Kim and Yu-Hsiang (Bosco) Chiu for sharing their data, along with Mark Harvilla, Kshitiz Kumar, Bhiksha Raj, and Rita Singh at CMU, as well as Suman Ravuri, Bernd Meyer, and Sherry Zhao at ICSI for many helpful discussions.

URL: http://www.icsi.berkeley.edu/pubs/speech/SternMorganVirtanenChap12.pdf
Bibliographic Notes

Chapter in Techniques for Noise Robustness in Automatic Speech Recognition, T. Virtanen, B. Raj, and R. Singh, eds., Wiley, pp. 193-227.

Abbreviated Authors

R. M. Stern and N. Morgan

ICSI Research Group

Speech

ICSI Publication Type

None