Near-Precise Parameter Approximation for Multiple Multiplications on a Single DSP Block

Kalali, E.; van Leuken, R.

doi:10.1109/TC.2021.3119187

Journal article (2022)

Authors

E. Kalali (Signal Processing Systems)

R. van Leuken (Signal Processing Systems)

Research Group

Signal Processing Systems

DOI: https://doi.org/10.1109/TC.2021.3119187

FPGA, Approximate computing, Multiple multiplications, DSP blocks, Systolic array

To reference this document use:

http://resolver.tudelft.nl/uuid:9a38d662-3171-4242-8ce1-8c044242eee2

Published Date

2022

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

DSP blocks are an efficient way to implement multiply-accumulate (MAC) operations on FPGAs. However, because DSP blocks have wide multipliers and adders, MAC operations on low bit-length parameters leave them underutilized. To address this, an efficient approximation technique is introduced. The technique manipulates and approximates the low bit-length parameters based on a Single DSP - Multiple Multiplication (SDMM) execution. The accuracy of the technique was evaluated for different CNN weight bit precisions using the AlexNet and VGG-16 networks and the ImageNet ILSVRC-2012 dataset. The optimization can be applied without loss of accuracy in almost all cases and causes only slight accuracy losses in a few. With these optimizations, multiple parameter multiplications are performed in a single DSP block at the cost of a small hardware overhead. The optimizations also change the format in which the parameters are stored in off-chip memory, providing up to 33% compression without any hardware cost. A prototype systolic array architecture employing our optimizations was implemented on a Xilinx Zynq FPGA. It reduced the number of DSP blocks by 66.6%, 75%, and 83.3% for 8-, 6-, and 4-bit input variables, respectively.
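
The idea of serving several narrow multiplications with one wide multiplier can be illustrated with a minimal Python sketch. This is a generic, exact packing scheme for unsigned operands, not the approximation technique of the paper, whose details the abstract does not give. The function names (`pack_weights`, `packed_multiply`, `dsp_reduction`) and the default bit widths are assumptions chosen for illustration. The last helper only checks that the reported DSP reductions are arithmetically consistent with 3, 4, and 6 multiplications sharing one block; that mapping is an inference from the percentages and is not stated in the abstract.

```python
def pack_weights(weights, slot_bits):
    """Place each unsigned weight in its own slot of one wide operand."""
    packed = 0
    for i, w in enumerate(weights):
        packed |= w << (i * slot_bits)
    return packed


def packed_multiply(activation, weights, act_bits=8, weight_bits=8):
    """Compute activation * w for every w using a single wide multiplication.

    Assumes unsigned operands. Each product needs at most
    act_bits + weight_bits bits, so slots of that width never overlap.
    """
    slot_bits = act_bits + weight_bits
    packed = pack_weights(weights, slot_bits)
    wide_product = activation * packed  # the one "DSP" multiplication
    mask = (1 << slot_bits) - 1
    # Slice the individual products back out of the wide result.
    return [(wide_product >> (i * slot_bits)) & mask
            for i in range(len(weights))]


def dsp_reduction(mults_per_dsp):
    """Fraction of DSP blocks saved if one block serves several multiplications."""
    return 1 - 1 / mults_per_dsp


if __name__ == "__main__":
    products = packed_multiply(200, [3, 255])
    print(products)  # [600, 51000]
    for k in (3, 4, 6):
        print(k, f"{100 * dsp_reduction(k):.1f}%")
```

In real hardware the number of slots is bounded by the multiplier port widths of the DSP block, and signed operands need correction terms. Constraints of this kind are one reason an approach may approximate the parameters rather than pack them exactly.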

Files

Near_Precise_Parameter_Approxi... (pdf)

(pdf | 1.11 Mb)

- Embargo expired on 01-07-2023

Unknown license