Lost in Segmentation: Three Approaches for Speech/Non-Speech Detection in Consumer-Produced Videos

Title: Lost in Segmentation: Three Approaches for Speech/Non-Speech Detection in Consumer-Produced Videos
Publication Type: Conference Paper
Year of Publication: 2013
Authors: Elizalde, B. Martinez, & Friedland, G.
Other Numbers: 3420
Abstract

Traditional speech/non-speech segmentation systems have been designed for specific acoustic conditions, such as broadcast news or meetings. However, little research has been done on consumer-produced audio. This type of media is constantly growing and has complex characteristics such as low-quality recordings, environmental noise, and overlapping sounds. This paper discusses an evaluation of three different approaches for speech/non-speech detection on consumer-produced audio. The approaches are state-of-the-art speech/non-speech detectors: one based on Gaussian Mixture Models (GMM), another on Support Vector Machines (SVM), and the last on Neural Networks (NN). Using the TRECVID MED 2012 database, we designed training/testing set combinations to aid the understanding of what speech/non-speech detection on consumer-produced media entails and how traditional approaches to this detection performed in this domain. The results revealed that the cross-domain state-of-the-art GMM and SVM systems’ tests underperformed

Acknowledgment

Supported by the Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior National Business Center contract number D11PC20066. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, DOI/NBC, or the U.S. Government.

URL: https://www.icsi.berkeley.edu/pubs/speech/losttranslation13.pdf
Bibliographic Notes

Proceedings of the IEEE International Conference on Multimedia and Expo (ICME 2013), San Jose, California

Abbreviated Authors

B. Elizalde and G. Friedland

ICSI Research Group

Audio and Multimedia

ICSI Publication Type

Article in conference proceedings