Detection Bank: An Object Detection Based Video Representation for Multimedia Event Recognition

Year of Publication: 2012
Authors: Althoff, T., Song, H. O., & Darrell, T.
While low-level image features have proven to be effective representations for visual recognition tasks such as object recognition and scene classification, they are inadequate to capture the complex semantic meaning required to solve high-level visual tasks such as multimedia event detection and recognition. Recognition or retrieval of events and activities can be improved if specific discriminative objects are detected in a video sequence. In this paper, we propose an image representation, called Detection Bank, based on the detection images from a large number of windowed object detectors, where an image is represented by different statistics derived from these detections. This representation is extended to video by aggregating the key-frame-level image representations through mean and max pooling. We empirically show that it captures information complementary to state-of-the-art representations such as Spatial Pyramid Matching and Object Bank. Combining these descriptors with our Detection Bank representation significantly outperforms any of the representations alone on the TRECVID MED 2011 data.
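The pipeline described above (per-frame statistics from many object detectors, then mean and max pooling over key frames) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the input format, the assumption that detector scores are calibrated to [0, 1], the three per-detector statistics (max score, mean score, count above a threshold), and the function names `frame_descriptor` and `pool_video` are all hypothetical choices made for the example.

```python
from typing import List, Sequence

# Hypothetical input format: for one key frame, a list with one entry per
# object detector, each entry holding that detector's window scores.
# Scores are ASSUMED to be calibrated to [0, 1]; window boxes are ignored here.
FrameDetections = Sequence[Sequence[float]]


def frame_descriptor(frame: FrameDetections, threshold: float = 0.5) -> List[float]:
    """Summarize each detector's scores with three illustrative statistics
    (an assumption, not the paper's exact set): maximum score, mean score,
    and number of windows scoring above `threshold`."""
    feats: List[float] = []
    for scores in frame:
        if scores:
            feats.append(max(scores))
            feats.append(sum(scores) / len(scores))
            feats.append(float(sum(1 for s in scores if s > threshold)))
        else:
            # No detections: use the bottom of the assumed [0, 1] score range.
            feats.extend([0.0, 0.0, 0.0])
    return feats


def pool_video(frame_feats: Sequence[Sequence[float]]) -> List[float]:
    """Aggregate key-frame descriptors into one video descriptor by
    concatenating element-wise mean pooling and element-wise max pooling."""
    if not frame_feats:
        raise ValueError("need at least one key frame")
    dims = len(frame_feats[0])
    if any(len(f) != dims for f in frame_feats):
        raise ValueError("all key-frame descriptors must have the same length")
    n = len(frame_feats)
    mean_pool = [sum(f[d] for f in frame_feats) / n for d in range(dims)]
    max_pool = [max(f[d] for f in frame_feats) for d in range(dims)]
    return mean_pool + max_pool


if __name__ == "__main__":
    # Two key frames, two detectors (toy numbers, not data from the paper).
    frames = [
        [[0.9, 0.2], []],
        [[0.4], [0.7, 0.6, 0.1]],
    ]
    video = pool_video([frame_descriptor(f) for f in frames])
    print(len(video))  # 2 detectors x 3 statistics x 2 pooling types = 12
```

Concatenating the mean-pooled and max-pooled vectors keeps both how consistently an object appears across key frames and its strongest single appearance. How the pooled vector is fused with Spatial Pyramid Matching or Object Bank descriptors is not shown; concatenation or a weighted kernel combination would be two plausible options.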


This work was partially supported by funding provided by the Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior National Business Center (DOI/NBC) contract number D11PC20066. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, DOI/NBC, or the U.S. Government.

Proceedings of the ACM Multimedia Conference (ACMMM '12), Nara, Japan

T. Althoff, H. O. Song, and T. Darrell

