Publication Details
Title: Multimodal City-Identification on Flickr Videos Using Acoustic and Textual Features
Author: H. Lei, J. Choi, and G. Friedland
Group: ICSI Technical Reports
Date: May 2012
PDF: http://www.icsi.berkeley.edu/pubs/techreports/TR-12-007.pdf
Overview:
We have performed city-verification of videos based on the videos' audio and metadata, using videos from the MediaEval Placing Task's video set, which contains consumer-produced videos “from-the-wild.” Eighteen cities were used as targets, for which acoustic and language models were trained, and against which test videos were scored. We have obtained the first known results for the city verification task, with a minimum EER of 21.8 percent. This result is well above chance, even though the videos contain very few city-specific audio and metadata features. We have also demonstrated the complementarity of audio and metadata for this task.
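The abstract reports performance as a minimum equal error rate (EER), the operating point where the false-acceptance and false-rejection rates coincide. As a rough illustration (not code from the paper), a verification EER can be approximated from per-trial scores and target/impostor labels like this; the scores below are made up:

```python
def eer(scores, labels):
    """Approximate the equal error rate (EER) for a verification task.

    scores: per-trial similarity scores (higher = more target-like).
    labels: 1 for a target (correct-city) trial, 0 for an impostor trial.
    """
    pairs = sorted(zip(scores, labels), reverse=True)
    n_target = sum(labels)
    n_impostor = len(labels) - n_target
    fa = 0           # impostor trials accepted so far
    fr = n_target    # target trials still rejected
    best = 1.0
    # Sweep the threshold down through the scores; near the FAR/FRR
    # crossing, max(FAR, FRR) is minimized and approximates the EER.
    for _, label in pairs:
        if label == 1:
            fr -= 1
        else:
            fa += 1
        far = fa / n_impostor
        frr = fr / n_target
        best = min(best, max(far, frr))
    return best

# Illustrative trials only -- not data from the report:
print(eer([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))  # perfectly separable -> 0.0
```

With the paper's city targets, each test video would yield one score per city model, and pooling target and impostor trials across cities gives the overall EER curve.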
Acknowledgements:
This research is supported by NGA NURI grant number HM11582-10-1-0008, NSF EAGER grant IIS-1138599, and NSF Award CNS-1065240. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the sponsors.
Bibliographic Information:
ICSI Technical Report TR-12-007
Bibliographic Reference:
H. Lei, J. Choi, and G. Friedland. Multimodal City-Identification on Flickr Videos Using Acoustic and Textual Features. ICSI Technical Report TR-12-007, May 2012.
