Distributed Radar-based Human Activity Recognition using Vision Transformer and CNNs
Abstract
This work investigates the feasibility of classifying human activities measured by a distributed ultra-wideband (UWB) radar system, using range-Doppler (RD) images as the classifier input. The high-resolution RD images measured by UWB radars are expected to capture the kinematic characteristics of different human activities. To construct the dataset, 5 distributed monostatic Humatics P410 radars are used to record 15 participants performing 9 activities in arbitrary directions along a designated trajectory. For the first time, a convolution-free neural network based on the multi-head attention mechanism (the Vision Transformer architecture) is adopted as the classifier, attaining an accuracy of 76.5%. The Vision Transformer is also compared with more conventional CNN-based architectures, such as ResNet and AlexNet. The robustness of all networks to unseen participants is evaluated using Leave-One-Participant-Out validation.
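The Leave-One-Participant-Out protocol mentioned above can be illustrated with a minimal sketch: each participant in turn is held out as the test set, and the classifier is trained on the remaining participants. The function name `leave_one_participant_out` and the toy dataset layout (one sample per participant per activity) are assumptions made for illustration only; the abstract does not state how many samples were recorded per participant.

```python
from collections import defaultdict


def leave_one_participant_out(participant_ids):
    """Yield (held_out, train_idx, test_idx) once per distinct participant.

    participant_ids[i] is the participant who produced sample i.
    """
    # Group sample indices by the participant who produced them.
    by_participant = defaultdict(list)
    for idx, pid in enumerate(participant_ids):
        by_participant[pid].append(idx)

    for held_out in sorted(by_participant):
        # All samples of the held-out participant form the test set.
        test_idx = list(by_participant[held_out])
        # Every other participant's samples form the training set.
        train_idx = sorted(
            i
            for pid, idxs in by_participant.items()
            if pid != held_out
            for i in idxs
        )
        yield held_out, train_idx, test_idx


# Hypothetical toy layout (assumption): 15 participants, 9 activities,
# exactly one sample per (participant, activity) pair.
participant_ids = [p for p in range(15) for _ in range(9)]
folds = list(leave_one_participant_out(participant_ids))

for held_out, train_idx, test_idx in folds[:2]:
    print(held_out, len(train_idx), len(test_idx))
```

Because no sample from the held-out participant appears in training, the accuracy averaged over the folds reflects performance on people the network has never seen, which is the robustness property the abstract refers to.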