X. Zhang

Master thesis (1)

1 record found

How to encode location in the Vision Transformer? A study on position embeddings

Master thesis (2022) - X. Zhang (author), J.C. van Gemert (mentor), R. Bruintjes (graduation committee member), J. Yang (coach)

Location information is essential for the Vision Transformer (ViT) model. Image data carries three types of location information: absolute location, relative direction, and relative distance. Various position embedding methods have been used to introduce location information into the ViT model. Some exist ...