Taming the Wild: Acoustic Segmentation in Consumer-Produced Videos

Title: Taming the Wild: Acoustic Segmentation in Consumer-Produced Videos
Publication Type: Technical Report
Year of Publication: 2013
Authors: Elizalde, B. Martinez, & Friedland, G.
Other Numbers: 3395

Audio segmentation is the process of partitioning data and identifying boundaries between different sounds. This task is commonly an early stage in speech processing tasks such as Automatic Speech Recognition (ASR) or Speaker Identification (SID). While traditional speech/non-speech segmentation systems have been designed for specific data conditions such as broadcast news or meetings, the growth of web videos brings new challenges for segmenting consumer-produced, aka "wild," audio. This type of audio is an unstructured domain with little control over recording conditions. Despite the growth of "wild" audio, little research has been done on this domain or on domain-independent audio segmentation systems. This paper attempts to close that gap by creating and testing a semi-supervised segmentation approach based on Codebook-Histogram Features (CHF) and Support Vector Machines (SVM) for speech detection in consumer-produced videos. Using the TRECVID MED 2011 web video dataset and a well-known meetings corpus for speech detection, training/testing data combinations were designed to evaluate and better understand the performance of this new approach in contrast to a traditional state-of-the-art Gaussian Mixture Models (GMM) system.
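As a rough illustration of the codebook-histogram idea named in the abstract, the sketch below quantizes per-frame feature vectors against a small codebook and summarizes each audio segment as a normalized histogram of codeword counts. It is not the report's implementation: the use of k-means to build the codebook, the function names (`train_codebook`, `codebook_histogram`), and the toy two-dimensional frames are all assumptions made for this example. In a full system the resulting histograms would be the input vectors to an SVM speech/non-speech classifier, which is omitted here.

```python
import random
from typing import List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook: Sequence[Vector], frame: Vector) -> int:
    """Index of the codeword closest to the given frame."""
    return min(range(len(codebook)), key=lambda k: sq_dist(codebook[k], frame))


def train_codebook(frames: List[Vector], k: int,
                   iters: int = 20, seed: int = 0) -> List[List[float]]:
    """Build a k-entry codebook with plain k-means (an assumed choice)."""
    rng = random.Random(seed)
    codebook = [list(f) for f in rng.sample(frames, k)]
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for f in frames:
            clusters[nearest(codebook, f)].append(f)
        for idx, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                codebook[idx] = [sum(col) / len(members) for col in zip(*members)]
    return codebook


def codebook_histogram(codebook: Sequence[Vector],
                       segment_frames: Sequence[Vector]) -> List[float]:
    """Normalized count of how often each codeword is the nearest one."""
    counts = [0] * len(codebook)
    for f in segment_frames:
        counts[nearest(codebook, f)] += 1
    total = len(segment_frames)
    return [c / total for c in counts] if total else [0.0] * len(codebook)


# Toy usage with a hand-written codebook (hypothetical values).
toy_codebook = [[0.0, 0.0], [10.0, 10.0]]
mixed_segment = [(0.1, 0.0), (9.9, 10.0), (10.0, 10.2), (0.0, 0.0)]
mixed_hist = codebook_histogram(toy_codebook, mixed_segment)
```

Because each segment is reduced to a fixed-length vector regardless of its duration, segments of different lengths can be compared directly by a standard classifier.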

Bibliographic Notes

ICSI Technical Report TR-12-016

Abbreviated Authors

B. Elizalde and G. Friedland

ICSI Research Group

Audio and Multimedia

ICSI Publication Type

Technical Report