In the recent past, real-time video processing using state-of-the-art deep neural networks (DNNs) has achieved human-like accuracy, but at the cost of high energy consumption, making such networks infeasible for deployment on edge devices. The energy consumed by running DNNs on hardware accelerators is dominated by the number of memory reads/writes and multiply-accumulate (MAC) operations required. As a potential solution, this work explores the role of activation sparsity in efficient DNN inference. Since the predominant operation in DNNs is matrix-vector multiplication of weights with activations, skipping operations and memory fetches where (at least) one of the operands is zero can make inference more energy efficient. Although spatial sparsification of activations has been researched extensively, introducing and exploiting temporal sparsity is much less explored in the DNN literature. This work presents a new DNN layer (called the temporal delta layer) whose primary objective is to induce temporal activation sparsity during training. The temporal delta layer promotes activation sparsity by performing a delta operation, facilitated by activation quantization and an L1-norm-based penalty added to the cost function. During inference, the resulting model acts as a conventional quantized DNN with high temporal activation sparsity. The new layer was incorporated into the standard ResNet50 architecture, which was trained and tested on the popular human action recognition dataset UCF101. The method yielded a 2x improvement in activation sparsity, with a 5% accuracy loss.
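The following is a minimal, illustrative sketch of how such a temporal delta layer could be realized in PyTorch. The class name, the fixed quantization step, the straight-through estimator for the rounding gradient, and the stateful per-frame buffering are assumptions made for illustration and are not the authors' exact formulation; the sketch only conveys the three ingredients named above: a delta operation between consecutive frames, activation quantization, and an L1 penalty on the resulting deltas.

```python
import torch
import torch.nn as nn


class TemporalDeltaLayer(nn.Module):
    """Sketch: quantize activations and emit the difference (delta) between
    consecutive frames, so temporally unchanged activations become exact zeros."""

    def __init__(self, step=0.125):
        super().__init__()
        self.step = step   # assumed fixed uniform quantization step size
        self.prev = None   # quantized activations of the previous frame

    def quantize(self, x):
        # Uniform quantization with a straight-through estimator so gradients flow
        q = torch.round(x / self.step) * self.step
        return x + (q - x).detach()

    def forward(self, x):
        q = self.quantize(x)
        if self.prev is None:
            delta = q            # first frame of a clip: pass activations through
        else:
            delta = q - self.prev  # later frames: temporal difference (mostly zeros)
        self.prev = q.detach()
        return delta

    def reset(self):
        self.prev = None         # call between independent video clips


def delta_sparsity_penalty(deltas, weight=1e-4):
    # L1-norm penalty on the deltas, added to the task loss during training
    # to push the temporal differences toward zero
    return weight * sum(d.abs().mean() for d in deltas)
```

In use, layers of this kind would be interleaved with the convolutional blocks of the backbone (e.g. ResNet50), and the penalty term would be added to the cross-entropy loss so that training explicitly rewards frames whose quantized activations do not change.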