Microphone Array Processing for Robust Speech Recognition in
Noisy and Reverberant Environments
Speech recognition performance degrades significantly in hands-free environments, where the speech signals can be severely distorted by additive noise and reverberation. In such environments, the use of microphone arrays has been proposed as a means of improving the quality of captured speech signals. Currently, microphone-array-based speech recognition is performed in two independent stages: array processing followed by recognition. Array processing algorithms designed for signal enhancement reduce the waveform distortion in the speech signal prior to recognition. This approach assumes that reducing the waveform distortion will necessarily result in better recognition performance. However, recognition systems do not interpret the speech waveform itself, but rather a set of features extracted from the waveform. In this talk, a new approach to this problem will be described in which the array processor and the speech recognizer are considered two components of a single system, operating with the common goal of improved recognition accuracy. In this system, the goal of the array processor is to generate features that maximize the likelihood of the correct hypothesis. This is accomplished through the use of a new objective function that utilizes information from the recognition engine itself to optimize the parameters of a filter-and-sum processor. Using the proposed approach, significant improvements in recognition accuracy over conventional methods are achieved on microphone array tasks in a variety of noisy and reverberant environments.
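
For readers unfamiliar with the term, the following is a minimal sketch of the filter-and-sum structure mentioned above: each microphone channel is passed through its own FIR filter and the filtered channels are summed. It is an illustration only, not the implementation from the talk. The function names (`fir_filter`, `filter_and_sum`) and the hand-picked filter taps are assumptions made for this example; in the approach described above, the taps would instead be chosen by optimizing the recognizer-based objective function, which is not shown here.

```python
def fir_filter(signal, taps):
    """Causal FIR filter: y[n] = sum over k of taps[k] * signal[n - k]."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * signal[n - k]
        out.append(acc)
    return out


def filter_and_sum(channels, filters):
    """Filter each channel with its own FIR filter, then sum the results.

    channels: list of equal-length sample lists, one per microphone.
    filters:  list of tap lists, one per microphone (hypothetical values;
              the talk optimizes these using the recognizer's likelihood).
    """
    if len(channels) != len(filters):
        raise ValueError("need exactly one filter per channel")
    length = len(channels[0])
    if any(len(ch) != length for ch in channels):
        raise ValueError("all channels must have the same length")
    output = [0.0] * length
    for ch, taps in zip(channels, filters):
        for n, value in enumerate(fir_filter(ch, taps)):
            output[n] += value
    return output


# Toy example: the second microphone receives the same signal one sample late.
mic1 = [1.0, 2.0, 3.0, 0.0]
mic2 = [0.0, 1.0, 2.0, 3.0]
# Delay mic1 by one sample and average the two aligned channels.
taps = [[0.0, 0.5], [0.5]]
print(filter_and_sum([mic1, mic2], taps))  # [0.0, 1.0, 2.0, 3.0]
```

With single-tap filters that only delay and scale each channel, as in this toy case, filter-and-sum reduces to ordinary delay-and-sum beamforming; longer filters give each channel a frequency-dependent gain and phase.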