On the decomposition of visual sets using Transformers

Alfieri, Andrea

Title

On the decomposition of visual sets using Transformers

Author

Alfieri, Andrea (TU Delft Electrical Engineering, Mathematics and Computer Science; TU Delft Pattern Recognition and Bioinformatics)

Contributor

van Gemert, J.C. (mentor)
Pintea, S. (graduation committee)
Chen, Y. (graduation committee)

Degree granting institution

Delft University of Technology

Date

2021-07-12

Abstract

Transformers can generate predictions auto-regressively, conditioning each sequence element on the previous ones, or they can produce the whole output sequence in parallel. While research has mostly explored this difference on tasks that are sequential in nature, we study it on visual set prediction tasks to analyze the core behaviour of the Transformer model. Multi-label classification, object detection and polygonal shape prediction are all visual set prediction tasks. Precisely predicting polygons in images is an important set prediction problem because polygons can represent many types of objects, such as buildings, people, or obstacles for aerial vehicles. Set prediction is difficult for deep learning architectures because sets can vary in cardinality and are permutation invariant. We provide evidence for the importance of natural orders for Transformers, analyze the strengths and weaknesses of different approaches that solve the set prediction task directly, and show the benefit of decomposing complex polygons into sets of ordered points predicted auto-regressively.
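The contrast between set prediction and ordered, auto-regressive prediction can be illustrated with a small sketch. It is written in plain Python with only the standard library and is not the thesis's method. A parallel set predictor needs a permutation-invariant loss, here the cost of the best one-to-one assignment between predicted and target points. An auto-regressive decoder instead needs a fixed order, here a canonical vertex order for a polygon. Several choices are assumptions made purely for illustration: the L1 matching cost, the exhaustive search over permutations (DETR solves the same assignment with the Hungarian algorithm), the equal-cardinality restriction, and the hypothetical ordering rule (counter-clockwise, starting at the lexicographically smallest vertex).

```python
import itertools
import math
from typing import List, Tuple

Point = Tuple[float, float]


def l1(a: Point, b: Point) -> float:
    """L1 distance between two 2-D points (an assumed matching cost)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def ordered_loss(pred: List[Point], target: List[Point]) -> float:
    """Element-wise loss: only meaningful if both lists share a fixed order."""
    return sum(l1(p, t) for p, t in zip(pred, target))


def set_matching_loss(pred: List[Point], target: List[Point]) -> float:
    """Permutation-invariant loss: cost of the best one-to-one assignment.

    Exhaustive search over permutations is O(n!), so it only suits tiny
    sets; DETR solves the same assignment with the Hungarian algorithm.
    """
    if len(pred) != len(target):
        raise ValueError("this sketch assumes equal cardinalities")
    best = math.inf
    for perm in itertools.permutations(range(len(target))):
        best = min(best, sum(l1(p, target[j]) for p, j in zip(pred, perm)))
    return best


def signed_area(poly: List[Point]) -> float:
    """Shoelace formula; positive for counter-clockwise order (y-up axes)."""
    edges = zip(poly, poly[1:] + poly[:1])
    return 0.5 * sum(x0 * y1 - x1 * y0 for (x0, y0), (x1, y1) in edges)


def canonical_order(poly: List[Point]) -> List[Point]:
    """Hypothetical canonical vertex order for auto-regressive decoding:
    counter-clockwise, starting from the lexicographically smallest vertex."""
    pts = list(poly)
    if signed_area(pts) < 0:
        pts.reverse()  # flip clockwise input to counter-clockwise
    start = min(range(len(pts)), key=lambda i: pts[i])
    return pts[start:] + pts[:start]


if __name__ == "__main__":
    target = [(0.0, 0.0), (2.0, 2.0)]
    pred = [(2.0, 2.0), (0.0, 0.0)]  # same set, listed in another order
    print(ordered_loss(pred, target))       # 8.0
    print(set_matching_loss(pred, target))  # 0.0

    # The same square listed clockwise from a different start vertex.
    a = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
    b = [(1.0, 1.0), (1.0, 0.0), (0.0, 0.0), (0.0, 1.0)]
    print(canonical_order(a) == canonical_order(b))  # True
```

The element-wise loss penalizes a correct set that happens to be listed in a different order, whereas the matching loss does not. A canonical order removes that ambiguity, so a polygon's vertices can be treated as a sequence and decoded one point at a time.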

Subject

Transformers
Set prediction
Polygon Detection
UAV
IROS
DETR
Attention

To reference this document use:

http://resolver.tudelft.nl/uuid:8ed688af-49a9-4063-8385-d766046b14b4

Part of collection

Student theses

Document type

master thesis

Rights

Files

PDF

Andrea_Alfieri_thesis_report.pdf

4.57 MB

Close viewer