Taming the Wild: Acoustic Segmentation in Consumer-Produced Videos
Title | Taming the Wild: Acoustic Segmentation in Consumer-Produced Videos |
Publication Type | Technical Report |
Year of Publication | 2013 |
Authors | Elizalde, B. Martinez, & Friedland, G. |
Other Numbers | 3395 |
Abstract | Audio segmentation is the process of partitioning data and identifying boundaries between different sounds; this task is commonly an early stage in speech processing tasks such as Automatic Speech Recognition (ASR) or Speaker Identification (SID). While traditional speech/non-speech segmentation systems have been designed for specific data conditions such as broadcast news or meetings, the growth of web videos brings new challenges for segmenting consumer-produced, aka "wild," audio. This type of audio is an unstructured domain with little control over recording conditions. Despite the growth of "wild" audio, little research has been done on this domain or on domain-independent audio segmentation systems. The following paper attempts to close that gap by creating and testing a semi-supervised approach with a Codebook-Histogram Features (CHF) segmentation using Support Vector Machines (SVM) for speech detection in consumer-produced videos. Using the web videos TRECVID MED 2011 dataset and a well-known speech detection meetings corpus, training/testing data combinations were designed to evaluate and better understand the performance of this new approach in contrast to a state-of-the-art traditional Gaussian Mixture Models (GMM) system. |
URL | http://www.icsi.berkeley.edu/pubs/techreports/ICSI_TR-12-016.pdf |
Bibliographic Notes | ICSI Technical Report TR-12-016 |
Abbreviated Authors | B. Elizalde and G. Friedland |
ICSI Research Group | Audio and Multimedia |
ICSI Publication Type | Technical Report |