Masa

Cox, Bart; Galjaard, Jeroen; Ghiassi, S.; Birke, Robert; Chen, Lydia

Masa: Responsive Multi-DNN Inference on the Edge

Conference paper (2021)

Authors

Bart Cox (Data-Intensive Systems)

Jeroen Galjaard (Student)

S. Ghiassi (Data-Intensive Systems)

Robert Birke (ABB Research Switzerland)

Lydia Chen (Data-Intensive Systems)

Research Group

Data-Intensive Systems

Keywords

Multiple DNNs inference; Average response time; Edge devices; Memory-aware scheduling

To reference this document use:

http://resolver.tudelft.nl/uuid:04c18800-276c-489d-becb-d2b7516d2a03


Published Date

2021

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.


Abstract

Deep neural networks (DNNs) are becoming core components of many applications running on edge devices, especially for real-time image-based analysis. Increasingly, multi-faceted knowledge is extracted by executing multiple DNN inference models, e.g., identifying objects, faces, and genders in images. The response times of multi-DNN inference strongly affect both users' quality of experience and their safety. Different DNNs exhibit diverse resource requirements and execution patterns across layers and networks, which can easily exceed the available device memory and degrade responsiveness. In this paper, we design and implement Masa, a responsive, memory-aware multi-DNN execution framework: an on-device middleware that models inter- and intra-network dependencies and leverages the complementary memory usage of individual layers. Masa consistently keeps average response times low whether multiple DNN-based image analyses are executed deterministically or stochastically. We extensively evaluate Masa on three Raspberry Pi configurations and a large set of popular DNN models, triggered by different image generation patterns. Our results show that, compared to state-of-the-art multi-DNN scheduling solutions, Masa reduces average response times by up to 90% on devices with small memory, i.e., 512 MB to 1 GB.
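The abstract does not describe Masa's scheduling algorithm itself. As a rough illustration of why complementary per-layer memory usage matters, here is a minimal, hypothetical sketch (not Masa's method): each request is a DNN given as a list of per-layer (duration, memory) pairs, up to `cores` layers from different requests may run concurrently, and a layer only starts if it fits in the remaining memory budget. The names `Job`, `simulate`, and `make`, as well as all layer durations and memory sizes, are invented for this example.

```python
import heapq
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Job:
    """Hypothetical inference request: a DNN as (duration_s, memory_mb) per layer."""
    name: str
    arrival: float
    layers: List[Tuple[float, float]]
    next_layer: int = 0
    finish: Optional[float] = None


def simulate(jobs: List[Job], memory_budget: float, cores: int) -> float:
    """Return the average response time (finish - arrival) of all jobs.

    Toy event-driven model: layers of different jobs may run concurrently on up
    to `cores` cores, but a layer starts only if its memory fits next to the
    layers already running. Ready jobs are scanned in arrival order; a job whose
    next layer does not fit is skipped so a later job's smaller layer can use
    the free memory.
    """
    if cores < 1:
        raise ValueError("need at least one core")
    for job in jobs:
        if not job.layers or any(mem > memory_budget for _, mem in job.layers):
            raise ValueError(f"{job.name} is empty or has a layer over budget")

    order = sorted(jobs, key=lambda j: j.arrival)
    running = []  # min-heap of (end_time, seq, job_position, memory_mb)
    busy = set()  # positions in `order` of jobs with a layer in flight
    mem_in_use, t, seq, finished = 0.0, 0.0, 0, 0

    while finished < len(order):
        # Greedily start ready layers that fit in the free cores and memory.
        for pos, job in enumerate(order):
            if len(running) == cores:
                break
            if job.finish is not None or pos in busy or job.arrival > t:
                continue
            duration, mem = job.layers[job.next_layer]
            if mem_in_use + mem <= memory_budget:
                heapq.heappush(running, (t + duration, seq, pos, mem))
                seq += 1
                busy.add(pos)
                mem_in_use += mem

        # Advance to the next event: a layer finishing or a new job arriving.
        next_arrival = min((j.arrival for j in order if j.arrival > t),
                           default=float("inf"))
        if running and running[0][0] <= next_arrival:
            t, _, pos, mem = heapq.heappop(running)
            busy.discard(pos)
            mem_in_use -= mem
            job = order[pos]
            job.next_layer += 1
            if job.next_layer == len(job.layers):
                job.finish = t
                finished += 1
        else:
            t = next_arrival

    return sum(j.finish - j.arrival for j in order) / len(order)


def make(name, layers, arrival=0.0):
    """Build a fresh Job (simulate() mutates its jobs)."""
    return Job(name, arrival, list(layers))


heavy_first = [(1, 300), (1, 100)]  # made-up memory profile in MB
light_first = [(1, 100), (1, 300)]

# Complementary profiles on 2 cores with a 400 MB budget interleave fully.
print(simulate([make("A", heavy_first), make("B", light_first)], 400, 2))  # 2.0
# Same pair executed one layer at a time.
print(simulate([make("A", heavy_first), make("B", light_first)], 400, 1))  # 3.0
# Two heavy-first networks cannot overlap their large layers.
print(simulate([make("A", heavy_first), make("C", heavy_first)], 400, 2))  # 2.5
```

In this toy setting, pairing a network whose memory peak comes early with one whose peak comes late lets both run side by side under the budget, whereas two networks with the same profile must serialize their large layers. A real memory-aware scheduler would also have to account for model loading, swapping, and measured per-layer profiles, which this sketch ignores.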

Files

Masa_Responsive_Multi_DNN_Infe... (pdf, 1.14 MB)

License info not available

Download not available