Feature-based Cross-architecture Self-supervised Knowledge Distillation
Abstract
In open-world scenarios for autonomous vehicles (AVs), previously unseen classes may arise. Addressing this requires extracting features that generalize well to AV downstream tasks, especially zero-shot learning. Transformers are well suited to this, and the Swin Transformer in particular, as it serves as the vision backbone of many Vision-Language Models. However, to enable on-board applications, knowledge distillation must be used to obtain a lightweight model capable of real-time processing. We explore self-supervised knowledge distillation, since AV models must generalize to previously unseen classes. Our contributions are twofold: we adapt existing CNN-to-CNN output-based self-supervised knowledge distillation algorithms to the Transformer-to-CNN setting for benchmarking, and we enhance them with a cross-architecture loss function. Leveraging DisCo, the best-performing output-based self-supervised knowledge distillation method, with EfficientNetB0 as the student model, we achieve a 3.9% relative improvement in top-1 accuracy over the supervised Swin-T teacher on our modified ImageNet for open-world classification, rising to 5.0% with our loss.
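The core idea of output-based Transformer-to-CNN distillation can be illustrated with a minimal sketch: the student's embedding is projected into the teacher's embedding space and trained to align with the teacher's output. This is a hedged, generic illustration using NumPy; the function names, the cosine-based loss, and the feature dimensions (1280 for EfficientNetB0, 768 for Swin-T) are assumptions for demonstration and do not reproduce the paper's actual DisCo-based loss or the proposed cross-architecture loss.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Normalize each row to unit length (stabilized with eps)."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def output_distillation_loss(student_emb, teacher_emb, proj):
    """Hypothetical output-based distillation loss: project the
    student embedding into the teacher's dimension, then penalize
    cosine dissimilarity between the two (0 = perfectly aligned)."""
    projected = student_emb @ proj            # (batch, d_teacher)
    s = l2_normalize(projected)
    t = l2_normalize(teacher_emb)
    return float(np.mean(1.0 - np.sum(s * t, axis=-1)))

# Toy batch: assumed feature dims for an EfficientNetB0 student (1280)
# and a Swin-T teacher (768); the projection bridges the two spaces.
rng = np.random.default_rng(0)
student = rng.normal(size=(4, 1280))
teacher = rng.normal(size=(4, 768))
proj = rng.normal(size=(1280, 768))
loss = output_distillation_loss(student, teacher, proj)
print(f"distillation loss: {loss:.4f}")
```

In a real training loop the projection head would be learned jointly with the student, and the teacher's weights would stay frozen, so that only the lightweight student is deployed on-board.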
Files
File under embargo until 28-10-2026