Publication Details

Title: Hooking Up Spectro-Temporal Filters with Auditory-Inspired Representations for Robust Automatic Speech Recognition
Authors: B. Meyer, C. Spille, B. Kollmeier, and N. Morgan
Bibliographic Information: Proceedings of the 13th Annual Conference of the International Speech Communication Association (InterSpeech 2012), Portland, Oregon
Date: September 2012
Research Area: Speech
Type: Article in conference proceedings
PDF: http://www.icsi.berkeley.edu/pubs/speech/ICSI_hookingup12.pdf

Overview:
Spectro-temporal filtering has previously been shown to yield features that can increase the robustness of automatic speech recognition (ASR). We replace the spectro-temporal representation used in previous work with spectrograms that incorporate knowledge about the signal processing of the human auditory system and that are derived from Power-Normalized Cepstral Coefficients (PNCCs). 2D Gabor filters are applied to these spectrograms to extract features, which are evaluated on a noisy digit recognition task. The filter bank is adapted to the new representation by optimizing the spectral modulation frequencies associated with each Gabor function. A comparison of the optimized parameters with the spectral modulation of vowels shows a good match between the optimized and expected ranges of frequencies. When processed with a non-linear neural net and combined with PNCCs, Gabor features decrease the error rate by at least 19% compared to both the baseline and PNCCs.
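
The Gabor filtering step described in the overview can be illustrated with a minimal sketch. It is not the authors' implementation: the Hann envelope, the filter sizes, the function names (gabor_2d, filter_spectrogram), and the parameter names (omega_k, omega_n) are all assumptions chosen for illustration, and a real system would apply the filters to a PNCC-derived spectrogram rather than an arbitrary array.

```python
import numpy as np


def gabor_2d(omega_k, omega_n, size_k=9, size_n=25):
    """Build a complex 2D Gabor filter (hypothetical parameterization).

    omega_k: spectral modulation frequency in radians per channel.
    omega_n: temporal modulation frequency in radians per frame.
    size_k, size_n: odd filter extents in channels and frames (assumed values).
    """
    k = np.arange(size_k) - size_k // 2  # channel offsets around the center
    n = np.arange(size_n) - size_n // 2  # frame offsets around the center
    # Complex plane wave over the channel-by-frame grid.
    carrier = np.exp(1j * (omega_k * k[:, None] + omega_n * n[None, :]))
    # Assumed envelope: separable Hann window with the zero endpoints dropped.
    envelope = np.outer(np.hanning(size_k + 2)[1:-1],
                        np.hanning(size_n + 2)[1:-1])
    return carrier * envelope


def filter_spectrogram(spec, filt):
    """Correlate a (channels x frames) spectrogram with an odd-sized filter.

    The spectrogram is zero-padded so the output has the same shape as the input.
    """
    fk, fn = filt.shape
    padded = np.pad(spec, ((fk // 2, fk // 2), (fn // 2, fn // 2)))
    out = np.zeros(spec.shape, dtype=complex)
    # Accumulate one shifted, weighted copy of the input per filter tap.
    for i in range(fk):
        for j in range(fn):
            out += filt[i, j] * padded[i:i + spec.shape[0],
                                       j:j + spec.shape[1]]
    return out
```

In this sketch, the magnitude of the filter output is largest when the spectral ripple in the input matches omega_k, which is the property that makes tuning the spectral modulation frequencies of the filter bank to a given representation meaningful.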

Bibliographic Reference:
B. Meyer, C. Spille, B. Kollmeier, and N. Morgan. Hooking Up Spectro-Temporal Filters with Auditory-Inspired Representations for Robust Automatic Speech Recognition. Proceedings of the 13th Annual Conference of the International Speech Communication Association (InterSpeech 2012), Portland, Oregon, September 2012.