Publication Details

Title: Spectro-Temporal Features for Robust Speech Recognition Using Power-Law Nonlinearity and Power-Bias Subtraction
Author: S.-Y. Chang, B. Meyer, and N. Morgan
Bibliographic Information: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2013), Vancouver, Canada
Date: May 2013
Research Area: Speech
Type: Article in conference proceedings
PDF: https://www.icsi.berkeley.edu/pubs/speech/spectrotemporal13.pdf

Overview:
Previous work has demonstrated that spectro-temporal Gabor features reduced word error rates for automatic speech recognition under noisy conditions. However, the features based on mel spectra were easily corrupted in the presence of noise or channel distortion. We have exploited an algorithm for power normalized cepstral coefficients (PNCCs) to generate a more robust spectro-temporal representation. We refer to it as power normalized spectrum (PNS), and to the corresponding output processed by Gabor filters and MLP nonlinear weighting as PNS-Gabor. We show that the proposed feature outperforms state-of-the-art noise-robust features, ETSI-AFE and PNCC, for both Aurora2 and a noisy version of the Wall Street Journal (WSJ) Corpus. A comparison of the individual processing steps of mel spectra and PNS shows that power bias subtraction is the most important aspect of PNS-Gabor features in providing an improvement over Mel-Gabor features. The result indicates that Gabor processing compensates for the limitation of PNCC for channels with frequency-shift characteristics. Overall, PNS-Gabor features decrease the word error rate by 32% relative to MFCC and 13% relative to PNCC in Aurora2. For noisy WSJ, they decrease the word error rate by 30.9% relative to MFCC and 24.7% relative to PNCC.
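
The percentages above are relative reductions in word error rate (WER), not absolute differences. As a minimal sketch of that arithmetic, the snippet below computes a relative reduction from a baseline WER and an improved WER. The function name and the two WER values are hypothetical, chosen only for illustration; they are not figures from the paper, which reports only the relative numbers here.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction of new_wer against baseline_wer, in percent."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Hypothetical WER values in percent (assumptions, NOT results from the paper).
baseline = 20.0   # e.g. a baseline front end
improved = 13.6   # e.g. a more robust front end

reduction = relative_wer_reduction(baseline, improved)
print(f"{reduction:.1f}% relative WER reduction")
```

With these illustrative inputs, an absolute drop of 6.4 percentage points corresponds to a 32% relative reduction, which is why a relative figure should always be read together with its baseline.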

Acknowledgements:
This work was partially supported by funding provided to ICSI by the U.S. Defense Advanced Research Projects Agency (DARPA) under contract number D10PC20024. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors or originators and do not necessarily reflect the views of DARPA or of the U.S. Government.

Bibliographic Reference:
S.-Y. Chang, B. Meyer, and N. Morgan. Spectro-Temporal Features for Robust Speech Recognition Using Power-Law Nonlinearity and Power-Bias Subtraction. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2013), Vancouver, Canada, May 2013.