Acceleration of hybrid CPU-GPU query execution engine in Arrow Format


Abstract

General-purpose GPUs, renowned for their exceptional parallel processing capabilities and throughput, hold great promise for accelerating data analytics workloads. At the same time, recent query execution engines have added support for OLAP operations in a way that benefits from the zero serialization overhead of the Apache Arrow memory format.
In this project, we evaluate the GPU acceleration potential of Arrow-based query execution engines, specifically using libcudf, a C++ GPU DataFrame library built on the Arrow format.
To this end, we design and implement four micro-benchmarks covering different operators to characterize the workloads that yield high acceleration and to identify their bottlenecks and limitations. When data transfer time is excluded, inherently parallelizable workloads show high potential for GPU acceleration; however, this advantage diminishes considerably once data transfer overhead is taken into account. Building on these micro-benchmark results, we design an on-the-fly, operator-level scheduler that dynamically accelerates query execution in a hybrid CPU/GPU system. Guided by a statistics-based cost model, the scheduler decides whether to execute an operator on the CPU or the GPU based on the input data location, data volume, data-related parameters, and the operator type.
With this scheduler, we achieve speedups of up to 4.88x for the Filter operator, 2.52x for the Sort operator, and 1.52x for the Copy operator on arrays of length 1e8.
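The abstract describes the scheduler only at a high level. As a rough illustration of the idea, not the thesis implementation, the dispatch decision can be sketched in C++ as a comparison of an estimated CPU cost against an estimated GPU cost that includes host-to-device transfer; all type names, cost constants, and calibration numbers below are hypothetical.

```cpp
#include <cstddef>
#include <iostream>
#include <string>

// Hypothetical sketch of an operator-level CPU/GPU dispatch decision.
// Names and cost constants are illustrative, not taken from the thesis.

enum class Device { CPU, GPU };
enum class DataLocation { Host, DeviceMemory };

struct OperatorStats {
    std::string op_type;         // e.g. "filter", "sort", "copy"
    std::size_t num_rows;        // input size
    DataLocation location;       // where the input currently resides
    double cpu_ns_per_row;       // assumed calibration from micro-benchmarks
    double gpu_ns_per_row;       // assumed calibration from micro-benchmarks
    double transfer_ns_per_row;  // assumed host-to-device transfer cost per row
};

// Pick the device with the lower estimated total cost. Transfer cost is
// charged only when the input is not already resident in GPU memory.
Device choose_device(const OperatorStats& s) {
    double cpu_cost = s.cpu_ns_per_row * static_cast<double>(s.num_rows);
    double gpu_cost = s.gpu_ns_per_row * static_cast<double>(s.num_rows);
    if (s.location == DataLocation::Host) {
        gpu_cost += s.transfer_ns_per_row * static_cast<double>(s.num_rows);
    }
    return gpu_cost < cpu_cost ? Device::GPU : Device::CPU;
}

int main() {
    // Made-up numbers for a filter over 1e8 rows residing in host memory.
    OperatorStats filter_op{"filter", 100'000'000, DataLocation::Host,
                            2.0, 0.1, 0.8};
    Device d = choose_device(filter_op);
    std::cout << "filter -> " << (d == Device::GPU ? "GPU" : "CPU") << "\n";
    return 0;
}
```

In this sketch the per-row costs would come from the kind of operator micro-benchmarks described above, so the decision naturally accounts for the transfer overhead that the thesis identifies as the main limit on GPU speedups.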

Files

Kexin_Su_Msc_Thesis_Acero_GPU_... (pdf)

File under embargo until 25-09-2025