From Intra-Modal to Inter-Modal Space: Multi-Task Learning of Shared Representations for Cross-Modal Retrieval

Title: From Intra-Modal to Inter-Modal Space: Multi-Task Learning of Shared Representations for Cross-Modal Retrieval
Publication Type: Conference Paper
Year of Publication: 2019
Authors: Choi, J., Larson, M., Friedland, G., & Hanjalic, A.
Published in: Proceedings of 2019 IEEE Fifth International Conference on Multimedia Big Data (BigMM)
Page(s): 1-10
Date Published: 09/2019
Publisher: IEEE
Abstract

Learning a robust shared representation space is critical for effective multimedia retrieval, and becomes increasingly important as multimodal data grows in volume and diversity. The labeled datasets needed to learn such a space are limited both in size and in their coverage of semantic concepts. These limitations constrain performance: a shared representation learned on one dataset may not generalize well to another. We address this issue by building on the insight that, given limited data, it is easier to optimize the semantic structure of a space within a modality than across modalities. We propose a two-stage shared representation learning framework that first performs intra-modal optimization and then transfers the learned semantic structure across modalities, producing a robust shared representation space.
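As a rough illustration of how such a two-stage scheme could be wired up, the PyTorch sketch below first shapes each modality's embedding space with an intra-modal triplet loss, then aligns the two spaces with a cross-modal triplet loss so that the structure learned in stage 1 carries over into a shared space. The encoder architecture, feature dimensions, and loss choices here are illustrative assumptions, not the authors' exact method.

    # Hypothetical two-stage shared-representation sketch; encoder sizes,
    # input features, and triplet losses are illustrative assumptions only.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class Encoder(nn.Module):
        """Maps modality-specific features to a shared d-dim embedding."""
        def __init__(self, in_dim: int, emb_dim: int = 256):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(in_dim, 512), nn.ReLU(),
                nn.Linear(512, emb_dim),
            )

        def forward(self, x):
            # Unit-norm embeddings so distances reflect angular similarity.
            return F.normalize(self.net(x), dim=-1)

    img_enc = Encoder(in_dim=2048)  # e.g., CNN image features (assumed)
    txt_enc = Encoder(in_dim=300)   # e.g., word-vector text features (assumed)
    triplet = nn.TripletMarginLoss(margin=0.2)

    # Stage 1: intra-modal optimization -- organize the semantic structure
    # of each modality's space separately, using same-modality triplets.
    def intra_modal_loss(enc, anchor, positive, negative):
        return triplet(enc(anchor), enc(positive), enc(negative))

    # Stage 2: cross-modal transfer -- align the two spaces so that
    # semantically matching image/text pairs end up close in the
    # shared space, while mismatched pairs are pushed apart.
    def cross_modal_loss(img, txt_pos, txt_neg):
        return triplet(img_enc(img), txt_enc(txt_pos), txt_enc(txt_neg))

In training, one would minimize the intra-modal loss for each encoder first, then continue with the cross-modal loss (possibly jointly, in a multi-task fashion, as the title suggests); the exact scheduling is a design choice not specified in this abstract.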

URL: https://ieeexplore.ieee.org/iel7/8910138/8919254/08919383.pdf