Publication Details

Title: Speech Activity Detection: An Economics Approach
Authors: T. J. Tsai and N. Morgan
Bibliographic Information: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2013), Vancouver, Canada
Date: May 2013
Research Area: Speech
Type: Article in conference proceedings
PDF: https://www.icsi.berkeley.edu/pubs/speech/speechactivity13.pdf

Overview:
This paper proposes an approach to frame-level speech activity detection based on the extended metaphor of an economics marketplace. As in a real marketplace, the simulated marketplace encourages features to specialize. Features that might not have impressive average performance across the entire data set might nonetheless perform very well on a subset of the data, and the marketplace capitalizes on this specialization by consulting the features only when their expertise is relevant. On an experimental data set, we show that the framework is able to effectively utilize the expertise of a set of voicing-related features. For the 50% of the data that fell within these features’ realm of expertise, we observe an 83% reduction in false alarm errors and 19% reduction in miss detect errors compared to a baseline HMM-GMM system with MFCCs. Even when we consult these features for the entire data set, thus including the other 50% of data outside their realm of expertise, we still observe a 20% total reduction in equal error rate compared to the baseline system. Analysis of the marketplace transactions also yields useful insight into how the errors are distributed across the data and which types of features are most useful.

Acknowledgements:
This work was partially supported by funding provided to ICSI by the U.S. Defense Advanced Research Projects Agency (DARPA) under contract number D10PC20024. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors or originators and do not necessarily reflect the views of DARPA or of the U.S. Government.

Bibliographic Reference:
T. J. Tsai and N. Morgan. Speech Activity Detection: An Economics Approach. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2013), Vancouver, Canada, May 2013.