Accelerating Large-Scale Graph Processing with FPGAs: Lesson Learned and Future Directions

Procaccini, Marco; Sahebi, Amin; Barbone, Marco; Luk, Wayne; Gaydadjiev, G.; Giorgi, Roberto

Conference paper (2024)

Authors

Marco Procaccini University of Siena

Amin Sahebi University of Siena

Marco Barbone Imperial College London

Wayne Luk Imperial College London

G. Gaydadjiev Quantum Circuit Architectures and Technology (TU Delft)

Roberto Giorgi University of Siena

Research Group

Quantum Circuit Architectures and Technology (TU Delft)

Keywords

Accelerators; FPGA; Graph processing; Distributed computing; Grid partitioning

To reference this document use:

http://resolver.tudelft.nl/uuid:d61a6e0a-cc06-486b-a78d-dad20e686e53

Published Date

2024

Language

English

Faculty

Electrical Engineering, Mathematics and Computer Science

Department

Quantum & Computer Engineering

Abstract

Processing graphs at large scale presents several difficulties, including irregular memory access patterns, device memory limitations, and the need for effective partitioning in distributed systems, all of which can degrade performance on traditional architectures such as CPUs and GPUs. To address these challenges, recent research emphasizes the use of Field-Programmable Gate Arrays (FPGAs) within distributed frameworks for accelerated graph processing. This paper examines the effectiveness of a multi-FPGA distributed architecture combined with a partitioning system that improves data locality and reduces inter-partition communication. Using Hadoop at the higher level, the framework maps the graph to the hardware and efficiently distributes pre-processed data to the FPGAs. The FPGA processing engine, integrated into a cluster framework, optimizes data transfers and relies on offline partitioning to distribute large-scale graphs. A first evaluation of the framework is based on the popular PageRank algorithm, which assigns each node in a graph a value reflecting its importance. For large-scale graphs, the single-FPGA solution outperformed the GPU solution, which was restricted by memory capacity, achieving a 26x speedup over the CPU compared to 12x for the GPU. Moreover, when a single FPGA device was limited by the size of the graph, our performance model showed that a distributed system with multiple FPGAs could increase performance by around 12x. This highlights the effectiveness of our solution for handling large datasets that exceed on-chip memory limits.
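As a rough illustration of the two ideas named in the abstract, grid partitioning and PageRank, the following minimal software sketch places edges into a p x p grid of blocks by source and destination vertex range and then runs power-iteration PageRank one block at a time. It is not the paper's FPGA engine or its Hadoop pipeline; the function names, the range-based grid layout, the damping factor of 0.85, and the iteration count are all assumptions made for this example.

```python
from collections import defaultdict


def grid_partition(edges, num_vertices, p):
    """Place each edge (src, dst) in block (row, col) of a p x p grid.

    Assumed layout: vertices are split into p contiguous ranges; the row is
    the range of the source vertex and the column the range of the destination.
    """
    chunk = -(-num_vertices // p)  # ceiling division: vertices per range
    blocks = defaultdict(list)
    for src, dst in edges:
        blocks[(src // chunk, dst // chunk)].append((src, dst))
    return blocks


def pagerank(edges, num_vertices, p=2, damping=0.85, iterations=50):
    """Power-iteration PageRank that processes edges one grid block at a time."""
    blocks = grid_partition(edges, num_vertices, p)
    out_degree = [0] * num_vertices
    for src, _ in edges:
        out_degree[src] += 1

    rank = [1.0 / num_vertices] * num_vertices
    for _ in range(iterations):
        incoming = [0.0] * num_vertices
        # Each block touches only one source range and one destination range,
        # which is the locality property that partitioning aims for.
        for block in blocks.values():
            for src, dst in block:
                incoming[dst] += rank[src] / out_degree[src]
        # Rank held by vertices without out-edges is spread uniformly.
        dangling = sum(r for r, d in zip(rank, out_degree) if d == 0)
        base = (1.0 - damping + damping * dangling) / num_vertices
        rank = [base + damping * value for value in incoming]
    return rank
```

Calling `pagerank([(0, 1), (1, 2), (2, 0), (3, 0)], 4)` returns one score per vertex, with the scores summing to 1. In a distributed setting along the lines the abstract describes, each block (or group of blocks) would be assigned to a different device, but that mapping is outside the scope of this sketch.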

Files

OASIcs.PARMA-DITAM.2024.6.pdf

(pdf | 0.974 MB)