Audio Concept Ranking for Video Event Detection on User-Generated Content

Title: Audio Concept Ranking for Video Event Detection on User-Generated Content
Publication Type: Conference Paper
Year of Publication: 2013
Authors: Elizalde, B. Martinez, Ravanelli M., & Friedland G.
Other Numbers: 3465

Video event detection on user-generated content (UGC) aims to find videos that show an observable event, such as a wedding ceremony or birthday party, rather than an object, such as a wedding dress, or an audio concept, such as music, speech, or clapping. Different events are better described by different concepts, so proper audio concept classification enhances the search for acoustic cues in this challenge. However, audio concepts for training are typically chosen and annotated by humans, and they are not necessarily relevant to a specific event or the factor that distinguishes it. A typical ad hoc annotation process ignores the complex characteristics of UGC audio, such as concept ambiguities, overlapping concepts, and concept duration. This paper presents a methodology to rank audio concepts based on their relevance to the events and their contribution to discriminability. A ranking measure guides an automatic or user-based selection of concepts in order to improve audio concept classification, with the goal of improving video event detection. The ranking helps to determine and select the most relevant concepts for each event, discard meaningless concepts, and combine ambiguous sounds to enhance a concept, thereby suggesting a focus for annotation and for understanding UGC audio. Experiments show an improvement in the mean classification accuracy of the audio concepts as well as a better-defined diagonal in the confusion matrix. Selecting the top 40 audio concepts using our methodology outperforms a best-accuracy-based selection by a relative 17.56% and a frame-frequency-based selection by 5.74%.
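The abstract does not state the ranking measure itself, so the following is only a minimal sketch of what ranking concepts per event by relevance and discriminability could look like. It assumes a TF-IDF-style score as a stand-in, which is not necessarily the measure used in the paper. The function name `rank_concepts`, the input layout, and the toy frame counts are all hypothetical and chosen for illustration.

```python
import math


def rank_concepts(frame_counts):
    """Rank audio concepts per event with a TF-IDF-style score.

    Assumption: this score is an illustrative stand-in, not the paper's measure.
    frame_counts maps event -> {concept: number of frames labelled with it}.
    """
    events = list(frame_counts)
    concepts = sorted({c for counts in frame_counts.values() for c in counts})

    # Number of events in which each concept occurs at least once.
    event_freq = {
        c: sum(1 for e in events if frame_counts[e].get(c, 0) > 0)
        for c in concepts
    }

    rankings = {}
    for e in events:
        total = sum(frame_counts[e].values())
        scores = {}
        for c in concepts:
            # Relevance: share of the event's frames carrying this concept.
            tf = frame_counts[e].get(c, 0) / total if total else 0.0
            # Discriminability: concepts found in every event score zero.
            idf = math.log(len(events) / event_freq[c]) if event_freq[c] else 0.0
            scores[c] = tf * idf
        # Highest score first; ties are broken alphabetically.
        rankings[e] = sorted(concepts, key=lambda c: (-scores[c], c))
    return rankings


# Toy data (made up): frame counts per concept for two events.
toy_counts = {
    "wedding": {"music": 50, "speech": 30, "clapping": 20},
    "birthday": {"music": 10, "speech": 30, "singing": 60},
}
top_concepts = {event: ranked[0] for event, ranked in rank_concepts(toy_counts).items()}
```

In this toy setup, music and speech occur in both events and therefore contribute nothing to telling them apart, while a concept that appears in only one event rises to the top of that event's ranking. A selection step like the top-40 selection mentioned above would then keep the highest-ranked concepts.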


This work was partially supported by funding provided to ICSI by the Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior National Business Center contract number D11PC20066. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, DOI/NBC, or the U.S. Government.

Bibliographic Notes

Proceedings of the InterSpeech First Workshop on Speech, Language and Audio in Multimedia (SLAM '13), Marseille, France

Abbreviated Authors

B. Elizalde, M. Ravanelli, and G. Friedland

ICSI Research Group

Audio and Multimedia

ICSI Publication Type

Article in conference proceedings