Resource Efficient Knowledge Distillation on Edge Devices

Abstract

In practice, computer vision techniques are applied to a wide range of tasks, including image classification, object detection, and image segmentation. The supervised learning paradigm commonly used to train network models for these tasks requires not only training data but also ground-truth labels that specify each sample's reference information for the task. Obtaining labels for every task, however, can be expensive or nearly impossible: medical images, for example, require expert annotation and are constrained by privacy regulations, and facial recognition data raise similar privacy concerns. Large-scale general-purpose datasets do exist, such as ILSVRC2012 for image classification and object detection and COCO for object detection, together with pre-trained models whose knowledge can be transferred to other domains. Although deeper models typically perform better and learn richer feature representations on the same tasks, increasing model size complicates practical deployment, particularly with respect to resource limitations and response latency. We therefore explore knowledge distillation methods that help a smaller student network learn better features from unlabeled training data and achieve stronger transfer performance on downstream tasks. With its remarkable success, contrastive learning has become one of the most promising approaches to learning from unlabeled data.

In this thesis, we propose an unsupervised knowledge distillation method that applies contrastive learning to construct and extract relational knowledge from the feature representations of the intermediate layers as well as the final layer. Evaluation on the ILSVRC2012 dataset demonstrates the effectiveness of the proposed method for feature learning: it helped a ResNet-18 model achieve a 2% improvement in linear evaluation accuracy over the baseline model. Extensive experiments on eight transfer learning tasks show that the model trained with the proposed method outperformed its baseline on all eight classification tasks and also achieved better fine-tuning accuracy when only a small fraction of the ground-truth labels was available for fine-tuning.
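As a rough illustration of the kind of objective the abstract refers to, the sketch below shows an InfoNCE-style contrastive loss between student and teacher features. It is a minimal sketch, not the thesis implementation: the function name, the temperature value, and the assumption that features from each layer have already been projected to a common embedding dimension are illustrative choices.

```python
# Minimal sketch (assumed, not taken from the thesis): an InfoNCE-style
# contrastive distillation loss between L2-normalized student and teacher
# features from a matching layer.
import torch
import torch.nn.functional as F


def contrastive_distillation_loss(student_feats, teacher_feats, temperature=0.1):
    """student_feats, teacher_feats: (batch, dim) tensors from matching layers.

    Each student feature is pulled toward the teacher feature of the same
    sample (the diagonal of the similarity matrix) and pushed away from the
    teacher features of the other samples in the batch.
    """
    s = F.normalize(student_feats, dim=1)
    t = F.normalize(teacher_feats, dim=1)
    logits = s @ t.T / temperature                         # (batch, batch) similarities
    targets = torch.arange(s.size(0), device=s.device)     # positives on the diagonal
    return F.cross_entropy(logits, targets)
```

Summing such a loss over several layers (for example, selected intermediate layers plus the final layer) would yield a multi-layer distillation objective of the general kind the abstract describes.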