From intra-modal to inter-modal space

Choi, Jaeyoung; Larson, MA; Friedland, Gerald; Hanjalic, A.

Multi-task learning of shared representations for cross-modal retrieval

Conference paper (2019)

Authors

Jaeyoung Choi, International Computer Science Institute

MA Larson, Radboud Universiteit Nijmegen

Gerald Friedland, University of California

A. Hanjalic, Intelligent Systems

Video retrieval, Multi-task learning, Image retrieval, Cross-modal retrieval

To reference this document use:

http://resolver.tudelft.nl/uuid:a9eac0f8-0bd8-429b-98f8-e91e056c707a

Published Date

01-09-2019

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Learning a robust shared representation space is critical for effective multimedia retrieval, and is increasingly important as multimodal data grows in volume and diversity. The labeled datasets necessary for learning such a space are limited both in size and in coverage of semantic concepts. These limitations constrain performance: a shared representation learned on one dataset may not generalize well to another. We address this issue by building on the insight that, given limited data, it is easier to optimize the semantic structure of a space within a modality than across modalities. We propose a two-stage shared representation learning framework in which intra-modal optimization is followed by cross-modal transfer learning of semantic structure, producing a robust shared representation space. We integrate multi-task learning into each stage, making it possible to leverage multiple datasets, annotated with different concepts, as if they were one large dataset. Large-scale systematic experiments demonstrate improvements over previously reported state-of-the-art methods on cross-modal retrieval tasks.

Files

08919383.pdf

(pdf | 0.233 Mb)

Unknown license
