Features Based on Auditory Physiology and Perception
Title | Features Based on Auditory Physiology and Perception |
Publication Type | Miscellaneous |
Year of Publication | 2012 |
Authors | Stern, R. M., & Morgan, N. |
Page(s) | 193-227 |
Other Numbers | 3326 |
Abstract | It is well known that human speech processing capabilities far surpass the capabilities of current automatic speech recognition and related technologies, despite very intensive research in automated speech technologies in recent decades. Indeed, since the early 1980s, this observation has motivated the development of speech recognition feature extraction approaches that are inspired by auditory processing and perception, but it is only relatively recently that these approaches have become effective in their application to computer speech processing. The goal of this chapter is to review some of the major ways in which feature extraction schemes based on auditory processing have facilitated greater speech recognition accuracy in recent years, as well as to provide some insight into the nature of current trends and future directions in this area. We begin this chapter with a brief review of some of the major physiological and perceptual phenomena that have motivated feature extraction algorithms based on auditory processing. We continue with a review and discussion of three seminal classical auditory models of the 1980s that have had a major impact on the approaches taken by more recent contributors to this field. Finally, we turn our attention to selected more recent topics of interest in auditory feature analysis, along with some of the feature extraction approaches that have been based on them. We conclude with a discussion of the attributes of auditory models that appear to be most effective in improving speech recognition accuracy in difficult acoustic environments. |
Acknowledgment | This research was supported by NSF (Grants IIS-0420866 and IIS-0916918) at CMU, and Cisco, Microsoft, and Intel Corporations and internal funds at ICSI. The authors are grateful to Chanwoo Kim and Yu-Hsiang (Bosco) Chiu for sharing their data, along with Mark Harvilla, Kshitiz Kumar, Bhiksha Raj, and Rita Singh at CMU, as well as Suman Ravuri, Bernd Meyer, and Sherry Zhao at ICSI for many helpful discussions. |
URL | http://www.icsi.berkeley.edu/pubs/speech/SternMorganVirtanenChap12.pdf |
Bibliographic Notes | Chapter in Techniques for Noise Robustness in Automatic Speech Recognition, T. Virtanen, B. Raj, and R. Singh, eds., Wiley Press, pp. 193-227 |
Abbreviated Authors | R. M. Stern and N. Morgan |
ICSI Research Group | Speech |
ICSI Publication Type | None |