Title: On the decomposition of visual sets using Transformers
Author: Alfieri, Andrea (TU Delft Electrical Engineering, Mathematics and Computer Science; TU Delft Pattern Recognition and Bioinformatics)
Contributor: van Gemert, J.C. (mentor); Pintea, S. (graduation committee); Chen, Y. (graduation committee)
Degree granting institution: Delft University of Technology
Date: 2021-07-12
Abstract: Transformers can generate predictions auto-regressively, conditioning each sequence element on the previous ones, or can produce output sequences in parallel. While research has mostly explored this difference on tasks that are sequential in nature, we study the contrast on visual set prediction tasks to analyze the core behaviour of the Transformer model. Multi-label classification, object detection and polygonal shape prediction are all visual set prediction tasks. Precisely predicting polygons in images is an important set prediction problem because polygons are representative of numerous types of objects, such as buildings, people, or obstacles for aerial vehicles. Set prediction is a difficult challenge for deep learning architectures because sets can have different cardinalities and are permutation invariant. We provide evidence of the importance of natural orders for Transformers, analyze the strengths and weaknesses of different solutions that can solve the set prediction task directly, and show the benefit of decomposing complex polygons into sets of ordered points in an auto-regressive manner.
Subject: Transformers; Set prediction; Polygon Detection; UAV; IROS; DETR; Attention
To reference this document use: http://resolver.tudelft.nl/uuid:8ed688af-49a9-4063-8385-d766046b14b4
Part of collection: Student theses
Document type: master thesis
Rights: © 2021 Andrea Alfieri
Files: Andrea_Alfieri_thesis_report.pdf (PDF, 4.57 MB)