Longer Features: They Do a Speech Detector Good
Title | Longer Features: They Do a Speech Detector Good |
Publication Type | Conference Paper |
Year of Publication | 2012 |
Authors | Tsai, T. J., & Morgan, N. |
Other Numbers | 3363 |
Abstract | We have incorporated spectrotemporal features in a speech activity detection (SAD) task for the Speech in Noisy Environments 2 (SPINE2) data set. The features were generated by applying 2D Gabor filters to the mel spectrogram in order to measure the strength of various spectral and temporal modulation frequencies in different patches of the spectrogram. Using several different back-ends, the Gabor features significantly outperformed MFCCs, yielding relative reductions in equal error rate (EER) of between 40% and 50%. Compared to the other back-ends, AdaBoost with tree stumps performed particularly well with Gabor features and particularly poorly with MFCCs. An investigation into the reasons for this disparity suggests that the most useful features for SAD incorporate information over longer time scales. |
Acknowledgment | This work was partially supported by funding provided to ICSI by the U.S. Defense Advanced Research Projects Agency (DARPA) under contract number D10PC20024. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors or originators and do not necessarily reflect the views of DARPA or of the U.S. Government. |
URL | https://www.icsi.berkeley.edu/pubs/speech/longerfeatures12.pdf |
Bibliographic Notes | Proceedings of the 13th Annual Conference of the International Speech Communication Association (InterSpeech 2012), Portland, Oregon |
Abbreviated Authors | T. J. Tsai and N. Morgan |
ICSI Research Group | Speech |
ICSI Publication Type | Article in conference proceedings |