A Multi-Band Approach to Automatic Speech Recognition

Title: A Multi-Band Approach to Automatic Speech Recognition
Publication Type: Technical Report
Year of Publication: 1999
Authors: Mirghafori, N.
Other Numbers: 1161
Abstract

Multi-band approaches have recently generated a great deal of interest in the automatic speech recognition (ASR) community. In this paradigm, each sub-frequency region of the speech signal is treated as a distinct source of information, and the streams are combined after each has been processed independently. Motivations for the multi-band paradigm include results from psycho-acoustic studies, robustness to noise, and the potential for parallel processing.

The main contribution of this dissertation is the systematic exploration of an area of great interest to many in the research community, showing that multi-band ASR is a viable option, not just for improving recognition accuracy in the presence of noise, but also for clean speech. The work focused on the design and implementation of a multi-band system, analysis of some of its characteristics, and development of extensions to the paradigm.

An analysis in terms of phonetic feature transmission showed multi-band processing to be better than a comparable traditional full-band design in many cases. It was observed that some bands were more accurate than others in discriminating between certain phonetic categories. It was hypothesized that combining the confused sub-band classes would reduce the number of input classes and improve generalization. The size of the input space was reduced by almost 30%, and yet the global frame-level phonetic discrimination improved, while the word recognition error was effectively unchanged (the observed improvement was not statistically significant). The results were consistent with the original hypothesis.

The analysis also showed that the phonetic transitions in the sub-bands do not necessarily occur synchronously and are affected by conditions such as speaking rate and room reverberation. Relaxing the synchrony constraints in the sub-bands during word recognition was investigated. The experimental results suggested that removing the synchrony constraints for all phone-to-phone transitions is unlikely to be advantageous and significantly increases computational cost.

The combination of the multi-band and the full-band systems was also studied. This combination reduced the word recognition error rate for the experimental clean speech task by about 23-29% compared to the baseline system. The results obtained are the best that we know of on the Numbers95 experimental database.

URL: http://www.icsi.berkeley.edu/ftp/global/pub/techreports/1999/tr-99-004.pdf
Bibliographic Notes

ICSI Technical Report TR-99-004

Abbreviated Authors

N. Mirghafori

ICSI Research Group

Speech

ICSI Publication Type

Technical Report