A Power-Efficient Parameter Quantization Technique for CNN Accelerators

Abstract

Quantization techniques are widely used in CNN inference to reduce hardware cost at the expense of small accuracy losses. However, even after quantization, fixed-point quantized CNN weights still incur a multiplication cost. We therefore introduce a novel CNN quantization technique that can be implemented without any multiplier. We evaluated the technique on the VGG-16 and AlexNet networks with the Tiny ImageNet dataset. With 8-bit weights, the quantization causes accuracy losses of 0.39% and 0.98% compared to the floating-point implementations of VGG-16 and AlexNet, respectively. We then introduce a fine-tuning method for our quantization that further reduces the accuracy loss, bringing the losses of the 8-bit quantized VGG-16 and AlexNet down to 0.24% and 0.39%, respectively. Two processing element (PE) architectures that contain no multiplier hardware are designed to perform the multiply-accumulate (MAC) operations of CNN models quantized by our technique. Two systolic array prototypes employing these PE architectures are designed and compared with a traditional fixed-point MAC implementation; they reduce the power consumption of the systolic array by up to 14.2% and 21.6%, respectively.
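The abstract does not spell out the quantization scheme itself, but multiplier-free MAC units are commonly obtained by restricting weights to (sums of) powers of two, so that each product reduces to a shift and an add. The C sketch below illustrates that general idea for a single power-of-two term; the encoding (pow2_weight_t) and function names are hypothetical and are not taken from the paper.

#include <stdint.h>
#include <stdio.h>

/* Hypothetical encoding (assumption, not the paper's format): each quantized
 * weight is stored as a sign and a shift amount, so w ≈ sign * 2^(-shift). */
typedef struct {
    int8_t  sign;   /* +1 or -1 */
    uint8_t shift;  /* right-shift amount applied to the activation */
} pow2_weight_t;

/* Multiply-accumulate without a multiplier: the product of an activation and
 * a power-of-two weight reduces to an arithmetic shift plus an add/subtract. */
int32_t mac_pow2(int32_t acc, int16_t activation, pow2_weight_t w)
{
    int32_t shifted = (int32_t)activation >> w.shift;
    return (w.sign >= 0) ? acc + shifted : acc - shifted;
}

int main(void)
{
    /* Example: activation 96 times weight ≈ -0.25 (sign -1, shift 2). */
    pow2_weight_t w = { -1, 2 };
    int32_t acc = mac_pow2(0, 96, w);
    printf("acc = %d\n", (int)acc);   /* prints -24 */
    return 0;
}

In a systolic array PE built along these lines, the multiplier of a conventional fixed-point MAC unit would be replaced by a barrel shifter and an adder, which is consistent with the power reductions reported above, though the paper's exact PE designs may differ.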
