Approximating vision transformers for edge: variational inference and mixed-precision for multi-modal data
Abstract
Vision transformer (ViT) models have demonstrated higher accuracy, greater robustness, and the ability to process large volumes of data, setting new baselines and references for perception tasks. However, these advantages come at the cost of large memory footprints and high-performance processors and computing units, which makes model adaptation and deployment challenging in resource-constrained environments such as memory-restricted and battery-powered edge devices. This paper addresses these deployment challenges by proposing VI-ViT, a model approximation approach for edge deployment that uses variational inference with mixed precision to process multiple modalities, such as point clouds and images. Our experimental evaluation on the nuScenes and Waymo datasets shows up to 37% and 31% reductions in model parameters and FLOPs while maintaining a mean average precision of 70.5 compared to 74.8 for the baseline model. This work presents a practical approach for approximating and optimizing vision transformers for edge AI applications by balancing model metrics such as parameters, FLOPs, latency, energy consumption, and accuracy, and it can easily be adapted to other transformer models and datasets.
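
The abstract describes VI-ViT only at a high level; the sketch below is an illustrative assumption, not the paper's implementation. It shows one common way to combine the two ingredients named above: a mean-field Gaussian (variational) linear layer whose weights are sampled with the reparameterization trick during training, paired with mixed-precision execution via PyTorch autocast at inference time. The names VariationalLinear and prior_std, and the choice to use the posterior mean at inference, are assumptions made for this example.

# Illustrative sketch only (assumed names and design choices), not the VI-ViT code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class VariationalLinear(nn.Module):
    """Linear layer with a factorized Gaussian posterior over its weights."""

    def __init__(self, in_features: int, out_features: int, prior_std: float = 1.0):
        super().__init__()
        self.weight_mu = nn.Parameter(torch.randn(out_features, in_features) * 0.02)
        # rho is passed through softplus so the standard deviation stays positive.
        self.weight_rho = nn.Parameter(torch.full((out_features, in_features), -5.0))
        self.bias = nn.Parameter(torch.zeros(out_features))
        self.prior_std = prior_std

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        std = F.softplus(self.weight_rho)
        if self.training:
            # Reparameterization trick: w = mu + std * eps, eps ~ N(0, I).
            eps = torch.randn_like(std)
            weight = self.weight_mu + std * eps
        else:
            # At inference, approximate with the posterior mean (one common choice).
            weight = self.weight_mu
        return F.linear(x, weight, self.bias)

    def kl_divergence(self) -> torch.Tensor:
        # KL(q(w) || p(w)) between the Gaussian posterior and a zero-mean
        # Gaussian prior with std `prior_std`, summed over all weights.
        std = F.softplus(self.weight_rho)
        prior_var = self.prior_std ** 2
        kl = 0.5 * ((std ** 2 + self.weight_mu ** 2) / prior_var
                    - 1.0
                    - 2.0 * torch.log(std / self.prior_std))
        return kl.sum()


if __name__ == "__main__":
    layer = VariationalLinear(256, 128).eval()
    x = torch.randn(4, 256)
    # Mixed-precision inference: matrix multiplies run in reduced precision
    # under autocast (bfloat16 on CPU here for portability; float16 on GPU).
    with torch.no_grad(), torch.autocast(device_type="cpu", dtype=torch.bfloat16):
        y = layer(x)
    print(y.shape, y.dtype)

In a training loop, the kl_divergence terms of all variational layers would be added (with a weighting factor) to the task loss to form the variational objective; the autocast context is what provides the mixed-precision side of the approach.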