Self-Supervised Representation Learning for Relational Multimodal Data

Should we combine multiple pretext tasks?


Abstract

Deep learning models can use pretext tasks to learn representations from unlabelled datasets. Although there have been several works on representation learning and pre-training, to the best of our knowledge pretext tasks have not previously been combined in a multi-task setting for relational multimodal data. In this work, we implemented four pretext tasks on top of a framework for handling relational multimodal data and evaluated them on two datasets. We first identified the best-performing masking strategy for the pretext tasks that require masking. Then, we compared different combinations of pretext tasks using self-supervised metrics as a proxy for the quality of the learned representation. The results reveal that masking values by replacing them with samples from the column's empirical distribution yields 4.6% and 4% higher accuracy on the two datasets, respectively, than replacing them with a fixed value. We also found that different combinations of pretext tasks, even with different numbers of tasks, converge to only marginally different values, and that MoCo further reduces this difference. Our findings imply that the number of pretext tasks can be scaled efficiently, allowing a more diverse representation to be learned.
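As a rough illustration of the masking strategy the abstract refers to, the sketch below corrupts a random subset of cells in a tabular dataset by resampling replacement values from each column's own empirical distribution (rather than a fixed placeholder value). This is a minimal, hypothetical example assuming a pandas DataFrame as input; the function name, masking probability, and toy data are our own and are not taken from the implementation described in the work.

import numpy as np
import pandas as pd

def mask_from_empirical_distribution(df, mask_prob=0.15, seed=0):
    """Replace a random subset of cells in each column with values
    resampled from that column's observed (empirical) distribution.
    Returns the corrupted table and the boolean mask of replaced cells,
    which a masked-value pretext task can use as its prediction target."""
    rng = np.random.default_rng(seed)
    corrupted = df.copy()
    mask = pd.DataFrame(rng.random(df.shape) < mask_prob,
                        index=df.index, columns=df.columns)
    for col in df.columns:
        n_masked = int(mask[col].sum())
        if n_masked:
            # Sampling with replacement from the column itself approximates
            # drawing from its empirical distribution.
            corrupted.loc[mask[col], col] = rng.choice(df[col].to_numpy(),
                                                       size=n_masked)
    return corrupted, mask

# Hypothetical usage on a small mixed-type table
df = pd.DataFrame({"age": [23, 45, 31, 52], "city": ["A", "B", "A", "C"]})
corrupted, mask = mask_from_empirical_distribution(df, mask_prob=0.5)

Replacing masked cells with a single fixed value would instead amount to filling the selected positions with a constant, which is the baseline the reported 4.6% and 4% accuracy gains are measured against.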
