Learning Cross-Modality Similarity for Multinomial Data

Title: Learning Cross-Modality Similarity for Multinomial Data
Publication Type: Conference Paper
Year of Publication: 2011
Authors: Jia, Y., Salzmann, M., & Darrell, T.
Page(s): 2407-2414
Other Numbers: 3232
Abstract

Many applications involve multiple modalities, such as text and images, that describe the problem of interest. In order to leverage the information present in all the modalities, one must model the relationships between them. While some techniques have been proposed to tackle this problem, they either are restricted to words describing visual objects only, or require full correspondences between the different modalities. As a consequence, they are unable to tackle more realistic scenarios where a narrative text is only loosely related to an image, and where only a few image-text pairs are available. In this paper, we propose a model that addresses both these challenges. Our model can be seen as a Markov random field of topic models, which connects the documents based on their similarity. As a consequence, the topics learned with our model are shared across connected documents, thus encoding the relations between different modalities. We demonstrate the effectiveness of our model for image retrieval from a loosely related text.
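
The abstract describes the model only at a high level. As a rough illustration of the setup it implies (documents from different modalities coupled through a graph so that they share topics, then retrieval by similarity in topic space), below is a minimal NumPy sketch. The topic dimension K, the adjacency matrix A, the smoothing step, and all names are illustrative assumptions, not the paper's actual algorithm, which learns the topics jointly within a Markov random field of topic models.

import numpy as np

# A minimal sketch of the idea, NOT the authors' implementation: the sizes,
# the adjacency matrix, and the smoothing scheme below are all assumptions.
# Documents live in a shared K-dimensional topic space; A links loosely
# related text/image documents, playing the role of the MRF edges that
# couple the per-document topic models.

K = 8                                        # assumed number of shared topics
rng = np.random.default_rng(0)

def normalize(rows):
    # Row-normalize so each document is a distribution over topics.
    return rows / rows.sum(axis=1, keepdims=True)

# Hypothetical per-modality topic estimates (e.g., from LDA run on each side).
text_topics = normalize(rng.random((5, K)))   # 5 text documents
image_topics = normalize(rng.random((5, K)))  # 5 image documents

# A[i, j] = 1 if text i and image j are (loosely) paired.
A = np.eye(5)

def smooth(own, other, A, lam=0.5, iters=10):
    # Pull each document's topics toward its neighbors' topics, mimicking
    # the MRF coupling that shares topics across connected documents.
    for _ in range(iters):
        neighbor_avg = normalize(A @ other + 1e-12)
        own = normalize((1.0 - lam) * own + lam * neighbor_avg)
    return own

text_topics = smooth(text_topics, image_topics, A)
image_topics = smooth(image_topics, text_topics, A.T)

def retrieve(query_topics, image_topics):
    # Rank images by cosine similarity to a text query in topic space.
    sims = image_topics @ query_topics
    sims /= np.linalg.norm(image_topics, axis=1) * np.linalg.norm(query_topics)
    return np.argsort(-sims)

print(retrieve(text_topics[0], image_topics))  # image indices, best match first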

URL: http://www.icsi.berkeley.edu/pubs/vision/learningcrossmodality11.pdf
Bibliographic Notes

Proceedings of the International Conference on Computer Vision (ICCV 2011), pp. 2407-2414, Barcelona, Spain

Abbreviated Authors

Y. Jia, M. Salzmann, and T. Darrell

ICSI Research Group

Vision

ICSI Publication Type

Article in conference proceedings