Robust CNN-Based Speech Recognition With Gabor Filter Kernels

Title: Robust CNN-Based Speech Recognition With Gabor Filter Kernels
Publication Type: Conference Paper
Year of Publication: 2014
Authors: Chang, S-Y., & Morgan, N.
Other Numbers: 3666
Abstract

As has been extensively shown, acoustic features for speech recognition can be learned from neural networks with multiple hidden layers. However, the learned transformations may not generalize sufficiently to test sets that are significantly mismatched to the training data. Gabor features, on the other hand, are generated from spectro-temporal filters designed to model human auditory processing. In previous work, these features were used as inputs to neural networks, which improved word accuracy for speech recognition in the presence of noise. Here we propose a neural network architecture called a Gabor Convolutional Neural Network (GCNN) that incorporates Gabor functions into convolutional filter kernels. In this architecture, a variety of Gabor features serve as the multiple feature maps of the convolutional layer. The filter coefficients are further tuned by back-propagation training. Experiments used two noisy versions of the WSJ corpus: Aurora 4 and RATS re-noised WSJ. In both cases, the proposed architecture performs better than the other noise-robust features that we have tried, namely ETSI-AFE, PNCC, Gabor features without the CNN-based approach, and our best neural network features that don’t incorporate Gabor functions.
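The idea of seeding convolutional kernels with Gabor functions can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the Gaussian envelope, the 9x9 kernel size, the modulation frequencies, and the function names `gabor_kernel`, `init_gabor_bank`, and `conv_feature_maps` are all assumptions chosen for illustration. The sketch only builds the initial kernels and applies them to a spectrogram; the back-propagation tuning mentioned in the abstract would happen in a training framework and is not shown.

```python
import numpy as np


def gabor_kernel(n_freq, n_time, omega_f, omega_t):
    """Real part of a 2D Gabor function: Gaussian envelope times a cosine carrier.

    omega_f and omega_t are spectral and temporal modulation frequencies in
    radians per bin (illustrative values, not taken from the paper).
    """
    f = np.arange(n_freq) - (n_freq - 1) / 2.0  # centered frequency axis
    t = np.arange(n_time) - (n_time - 1) / 2.0  # centered time axis
    ff, tt = np.meshgrid(f, t, indexing="ij")
    # Assumed envelope width: one quarter of the kernel extent per axis.
    envelope = np.exp(-0.5 * ((ff / (n_freq / 4.0)) ** 2 + (tt / (n_time / 4.0)) ** 2))
    carrier = np.cos(omega_f * ff + omega_t * tt)
    kernel = envelope * carrier
    return kernel - kernel.mean()  # remove DC so constant offsets give zero response


def init_gabor_bank(n_freq=9, n_time=9, omegas_f=(0.0, 0.5, 1.0), omegas_t=(0.0, 0.5, 1.0)):
    """Stack one kernel per (omega_f, omega_t) pair; each kernel seeds one feature map."""
    kernels = [
        gabor_kernel(n_freq, n_time, wf, wt)
        for wf in omegas_f
        for wt in omegas_t
        if not (wf == 0.0 and wt == 0.0)  # skip the carrier-free case
    ]
    return np.stack(kernels)  # shape: (num_maps, n_freq, n_time)


def conv_feature_maps(spectrogram, bank):
    """Valid-mode 2D cross-correlation of a (freq, time) spectrogram with each kernel."""
    windows = np.lib.stride_tricks.sliding_window_view(spectrogram, bank.shape[1:])
    return np.einsum("ftij,kij->kft", windows, bank)


bank = init_gabor_bank()
spec = np.random.default_rng(0).standard_normal((40, 100))  # stand-in for a mel spectrogram
maps = conv_feature_maps(spec, bank)
```

In a trainable model, `bank` would be copied into the weights of the first convolutional layer and then updated by gradient descent, so the Gabor functions act as an initialization and not as fixed filters.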

Acknowledgment

This material is based on work supported by the Defense Advanced Research Projects Agency (DARPA) under Contract No. D10PC20024. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the view of the DARPA or its Contracting Agent, the U.S. Department of the Interior, National Business Center, Acquisition & Property Management Division, Southwest Branch.

URL: https://www.icsi.berkeley.edu/pubs/speech/robustCNN14.pdf
Bibliographic Notes

Proceedings of the 15th Annual Conference of the International Speech Communication Association (Interspeech 2014), Singapore

Abbreviated Authors

S.-Y. Chang and N. Morgan

ICSI Research Group

Speech

ICSI Publication Type

Article in conference proceedings