DM

Danny Merkx

1 records found

Word recognition in a model of visually grounded speech

An analysis using techniques inspired by human speech processing research

A Visually Grounded Speech model is a neural model which is trained to embed image caption pairs closely together in a common embedding space. As a result, such a model can retrieve semantically related images given a speech caption and vice versa. The purpose of this research is ...