Learning Cross-Modality Similarity for Multinomial Data
Title | Learning Cross-Modality Similarity for Multinomial Data |
Publication Type | Conference Paper |
Year of Publication | 2011 |
Authors | Jia, Y., Salzmann M., & Darrell T. |
Page(s) | 2407-2414 |
Other Numbers | 3232 |
Abstract | Many applications involve multiple modalities, such as text and images, that describe the problem of interest. In order to leverage the information present in all the modalities, one must model the relationships between them. While some techniques have been proposed to tackle this problem, they either are restricted to words describing visual objects only, or require full correspondences between the different modalities. As a consequence, they are unable to tackle more realistic scenarios where a narrative text is only loosely related to an image, and where only a few image-text pairs are available. In this paper, we propose a model that addresses both these challenges. Our model can be seen as a Markov random field of topic models, which connects the documents based on their similarity. As a consequence, the topics learned with our model are shared across connected documents, thus encoding the relations between different modalities. We demonstrate the effectiveness of our model for image retrieval from a loosely related text. |
URL | http://www.icsi.berkeley.edu/pubs/vision/learningcrossmodality11.pdf |
Bibliographic Notes | Proceedings of the International Conference on Computer Vision (ICCV 2011), pp. 2407-2414, Barcelona, Spain |
Abbreviated Authors | Y. Jia, M. Salzmann, and T. Darrell |
ICSI Research Group | Vision |
ICSI Publication Type | Article in conference proceedings |