Generalization by Visual Attention


Abstract

Most deep learning models fail to generalize in production, because the data used during training often does not completely reflect the deployment environment; the test data is then out-of-distribution with respect to the training data. In this paper, we focus on out-of-distribution performance for image classification. Transformers, a more recent neural network architecture than the traditionally used convolutional neural networks (CNNs), have been shown to perform well on image classification. We therefore first compare the out-of-distribution capabilities of both models, and then conduct an in-depth investigation of the individual architectural components of the transformer and their impact on the generalization capability of the model.
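As a rough illustration of the first comparison (not taken from the paper), the sketch below evaluates an ImageNet-pretrained CNN and vision transformer on a distribution-shifted test set with torchvision; the `ood_loader` data loader is a hypothetical placeholder for whichever out-of-distribution benchmark is used.

```python
# Minimal sketch: comparing a pretrained CNN and ViT on an out-of-distribution
# test set. `ood_loader` is an assumed DataLoader over a distribution-shifted,
# ImageNet-label-compatible dataset (e.g. stylized or corrupted images).
import torch
from torch.utils.data import DataLoader
from torchvision import models


def top1_accuracy(model: torch.nn.Module, loader: DataLoader, device: str = "cpu") -> float:
    """Top-1 accuracy of `model` over the given (possibly OOD) test loader."""
    model.eval().to(device)
    correct, total = 0, 0
    with torch.no_grad():
        for images, labels in loader:
            logits = model(images.to(device))
            correct += (logits.argmax(dim=1).cpu() == labels).sum().item()
            total += labels.numel()
    return correct / max(total, 1)


if __name__ == "__main__":
    # ImageNet-pretrained backbones: a convolutional model and a vision transformer.
    cnn = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
    vit = models.vit_b_16(weights=models.ViT_B_16_Weights.IMAGENET1K_V1)
    # With an actual OOD loader in hand, the comparison would simply be:
    # for name, m in [("ResNet-50", cnn), ("ViT-B/16", vit)]:
    #     print(name, top1_accuracy(m, ood_loader))
```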

Files