Meta-learning for few-shot on-chip sequence classification

Abstract

The growing interest in edge computing is driving demand for efficient deep learning models that fit resource-constrained edge devices such as Internet-of-Things (IoT) sensors. The severe size and power limitations of these devices have given rise to the field of tinyML, which focuses on enabling low-cost machine learning on edge devices. Until recently, work in this space focused primarily on static inference scenarios, in which models cannot adapt post-deployment; this causes robustness issues when the data distribution shifts or new features appear in the data. At the edge, however, both full on-device retraining and communicating all new data to a central server are infeasible, which necessitates data-efficient learning algorithms that adapt locally and autonomously from streaming data. This problem, at the intersection of edge computing and data-efficient learning, remains open.
In this thesis, we propose to address this challenge with meta-learning. To clarify which application of meta-learning is best suited to edge hardware, we outline and investigate, for the first time, a principled approach to meta-learning at the edge, structured in three parts.
The first part of this thesis details the selection of a suitable neural network architecture for few-shot learning over sequential data. Rather than committing to a single architecture from the start, we explore several approaches to learning over temporal sequences, which allows us to identify the architecture that generalizes most effectively from limited temporal examples. The quantitatively evaluated architectures are a recurrent neural network (RNN), a gated recurrent unit (GRU), a long short-term memory (LSTM), and a temporal convolutional network (TCN). We show that TCNs outperform the other architectures, while GRUs and LSTMs have lower activation memory requirements. However, the number of multiplications required by the recurrent architectures grows linearly with input sequence length, whereas for TCNs it grows only logarithmically. Our results show that TCNs therefore provide the most favorable trade-off for low-cost temporal feature extraction at the edge.
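To make the scaling argument concrete, the minimal PyTorch sketch below builds a dilated causal TCN: because the dilation doubles at every layer, the receptive field doubles as well, so covering a sequence of length T takes on the order of log2(T) layers, in contrast to the T recurrent steps of a GRU or LSTM. The names (CausalConv1d, build_tcn) and hyperparameters are illustrative assumptions, not the thesis' actual implementation.

    import math
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CausalConv1d(nn.Conv1d):
        """1-D convolution padded on the left only, so outputs never see the future."""
        def __init__(self, in_ch, out_ch, kernel_size, dilation):
            super().__init__(in_ch, out_ch, kernel_size, dilation=dilation)
            self.left_pad = (kernel_size - 1) * dilation

        def forward(self, x):
            return super().forward(F.pad(x, (self.left_pad, 0)))

    def build_tcn(channels, kernel_size, seq_len):
        # Doubling the dilation each layer doubles the receptive field, so
        # roughly log2(seq_len) layers suffice to cover the whole sequence.
        n_layers = math.ceil(math.log2((seq_len - 1) / (kernel_size - 1) + 1))
        layers = []
        for i in range(n_layers):
            layers += [CausalConv1d(channels, channels, kernel_size, dilation=2 ** i),
                       nn.ReLU()]
        return nn.Sequential(*layers)

    tcn = build_tcn(channels=16, kernel_size=3, seq_len=128)  # 7 dilated layers
    out = tcn(torch.randn(1, 16, 128))                        # (batch, channels, time)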
The second part of the thesis focuses on the algorithmic development of the few-shot learning setup. Building on recent results from machine learning research, we highlight how meta-learning techniques primarily rely on learning high-quality features that generalize well. Taking hardware-driven considerations such as memory and compute overheads into account, and through detailed quantitative analyses, we demonstrate that the best performance-cost trade-off is reached with a simple supervised pre-training scheme, where on-chip learning is performed by comparing the outputs of a TCN-based feature extractor under the Manhattan distance. We also analyze the impact of quantization on this trade-off and, accordingly, select a scheme with 4-bit logarithmic weights and 4-bit unsigned activations...
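As a hedged illustration of the on-chip learning scheme described above, the following sketch implements prototype-style classification under the Manhattan (L1) distance: the few labelled support examples are embedded, averaged per class, and each query is assigned to the nearest prototype. The function names and the 5-way 5-shot dimensions are assumptions for the example, not the thesis' actual API; random tensors stand in for the pre-trained TCN embeddings.

    import torch

    def fit_prototypes(support_feats, support_labels, n_classes):
        # The only on-chip "training": average the support embeddings per class.
        return torch.stack([support_feats[support_labels == c].mean(dim=0)
                            for c in range(n_classes)])

    def classify(query_feats, prototypes):
        # Nearest prototype under the L1 (Manhattan) distance.
        dists = torch.cdist(query_feats, prototypes, p=1)  # (n_query, n_classes)
        return dists.argmin(dim=1)

    # 5-way 5-shot toy run with random 16-d stand-ins for TCN embeddings.
    support = torch.randn(25, 16)
    labels = torch.arange(5).repeat_interleave(5)
    protos = fit_prototypes(support, labels, n_classes=5)
    preds = classify(torch.randn(10, 16), protos)

Similarly, the sketch below shows one plausible reading of 4-bit logarithmic weight quantization, snapping weights to signed powers of two so that multiplications reduce to bit-shifts in hardware; the exponent range and clipping are illustrative choices, not the exact scheme selected in the thesis.

    def log_quantize_weights(w, n_bits=4):
        # Illustrative 4-bit layout: 1 sign bit + (n_bits - 1) exponent bits,
        # so each weight becomes a signed power of two (a hardware bit-shift).
        sign = torch.sign(w)                                  # zero stays zero
        exp = torch.round(torch.log2(w.abs().clamp(min=2.0 ** -8)))
        exp = exp.clamp(min=-(2 ** (n_bits - 1)), max=-1)     # 2^-8 ... 2^-1
        return sign * 2.0 ** exp

    w_q = log_quantize_weights(torch.randn(16, 16) * 0.1)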

Files

MSc_Thesis_ddenblanken.pdf

File under embargo until 28-09-2025