The Relation Between Words and Definitions in Combined Word-Sentence Space


Abstract

Methods for learning vector space representations of words have yielded spaces that contain semantic and syntactic regularities. These regularities mean that vector arithmetic in the latent space corresponds to meaningful, interpretable relations between words. Word vectors have been so successful at capturing such relations, and at transferring between domains almost seamlessly, that they have become a foundation of modern natural language processing. There have been multiple proposals for extending these methods to sentences and documents, as well as entirely new approaches based on modern sequence models. So far, none of these methods has demonstrated the same widespread applicability as word vector methods; each tends to be constrained to a single domain. The question that emerges is whether these document vector methods yield spaces with the same kind of regularities as those observed in word vector models. This work focuses on whether these spaces encode one particular relation: that between a word and its definition. Since most of these methods allow only a conversion from sentence to vector and not the reverse, the problem is phrased as a ranking problem over a set of candidate words. In the strict case, where the relation is assumed to be linear as with word vectors, the resulting model ranks the correct vector first $26.6\%$ of the time, with a median rank of the correct answer of $19$ out of $2000$ options. Relaxing this requirement and using a three-layer multilayer perceptron improves accuracy to $37.8\%$ correct and the median rank to $3$. The performance of these models suggests that words and sentences can naturally be thought of as occupying a single space.
The results in this work suggest that it may be possible to generate correct definitions of words in a way that comes close to being unsupervised, requiring only a mean difference between word and definition vectors. When this was attempted, however, the decoder converged to a fixed output.
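The linear ranking setup described above can be illustrated with a small sketch: shift each definition vector by a single mean word-minus-definition offset, then rank all candidate word vectors by cosine similarity to the shifted query and record the rank of the correct word. The embeddings, dimensions, and noise model below are synthetic placeholders, not the data or models used in this work.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical embeddings: 2000 candidate word vectors, with
# definition vectors for 100 of them, simulated as noisy offsets
# of their words (shapes and noise scale are illustrative only).
dim = 50
word_vecs = rng.normal(size=(2000, dim))
def_idx = rng.choice(2000, size=100, replace=False)
offset = rng.normal(size=dim)
def_vecs = word_vecs[def_idx] - offset + 0.1 * rng.normal(size=(100, dim))

# Strict linear model: a single mean offset between words and
# their definitions, estimated from the paired examples.
mean_offset = (word_vecs[def_idx] - def_vecs).mean(axis=0)

def rank_of_correct(def_vec, correct, words, shift):
    """Rank all candidates by cosine similarity to the shifted
    definition vector; return the 1-based rank of the correct word."""
    query = def_vec + shift
    sims = (words @ query) / (
        np.linalg.norm(words, axis=1) * np.linalg.norm(query)
    )
    order = np.argsort(-sims)
    return int(np.where(order == correct)[0][0]) + 1

ranks = [rank_of_correct(d, i, word_vecs, mean_offset)
         for d, i in zip(def_vecs, def_idx)]
print("median rank:", int(np.median(ranks)))
```

On this synthetic data the correct word ranks near the top because the definitions are constructed as near-linear offsets of their words; the point of the abstract's comparison is that real definition embeddings deviate from this linear assumption, which is why replacing the fixed offset with an MLP improves the median rank.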
