Resource Efficient Knowledge Distillation on Edge Devices

Abstract

In practice, computer vision techniques are applied to a wide range of tasks, including image classification, object detection, and image segmentation. The supervised learning paradigm commonly used to train network models for these tasks requires not only training data but also ground-truth labels that specify each sample's reference information for the task. Obtaining labels for every task, however, can be expensive or nearly impossible: medical images, for example, require expert annotation and are constrained by privacy regulations, and facial recognition data raise similar privacy concerns. Large-scale general-purpose datasets do exist, such as ILSVRC2012 for image classification and object detection and COCO for object detection, together with pre-trained models whose knowledge can be transferred to other domains. Although deeper models typically perform better and learn richer feature representations on the same tasks, increasing model size complicates practical deployment, particularly with respect to resource limitations and response latency. We therefore explore knowledge distillation methods that help a smaller student network learn better features from unlabeled training data and achieve stronger transfer performance on downstream tasks. With its remarkable success, contrastive learning has become one of the most promising approaches to learning from unlabeled data.

In this thesis, we propose an unsupervised knowledge distillation method that applies contrastive learning to construct and extract relational knowledge from the feature representations of the intermediate layers as well as the final layer. Evaluation on the ILSVRC2012 dataset demonstrates the effectiveness of the proposed method for feature learning: it helped a ResNet-18 model achieve a 2% improvement in linear evaluation accuracy over the baseline model. Extensive experiments on eight transfer learning tasks show that the model trained with the proposed method outperformed its baseline on all eight classification tasks and also achieved better fine-tuning accuracy when only a small fraction of the ground-truth labels was available for fine-tuning.
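As a rough illustration of the kind of objective the abstract refers to, the sketch below shows an InfoNCE-style contrastive loss between student and teacher features. It is a minimal sketch, not the thesis implementation: the function name, the temperature value, and the assumption that features from each layer have already been projected to a common embedding dimension are illustrative choices.

```python
# Minimal sketch (assumed, not taken from the thesis): an InfoNCE-style
# contrastive distillation loss between L2-normalized student and teacher
# features from a matching layer.
import torch
import torch.nn.functional as F


def contrastive_distillation_loss(student_feats, teacher_feats, temperature=0.1):
    """student_feats, teacher_feats: (batch, dim) tensors from matching layers.

    Each student feature is pulled toward the teacher feature of the same
    sample (the diagonal of the similarity matrix) and pushed away from the
    teacher features of the other samples in the batch.
    """
    s = F.normalize(student_feats, dim=1)
    t = F.normalize(teacher_feats, dim=1)
    logits = s @ t.T / temperature                         # (batch, batch) similarities
    targets = torch.arange(s.size(0), device=s.device)     # positives on the diagonal
    return F.cross_entropy(logits, targets)
```

Summing such a loss over several layers (for example, selected intermediate layers plus the final layer) would yield a multi-layer distillation objective of the general kind the abstract describes.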