Acceleration of hybrid CPU-GPU query execution engine in Arrow Format


Abstract

General-purpose GPUs, renowned for their exceptional parallel processing capabilities and throughput, hold great promise for accelerating data analytics workloads. At the same time, recent query execution engines have added support for OLAP operations in a way that benefits from the zero serialization overhead of the Apache Arrow memory format.
In this project, we evaluate the GPU acceleration potential of Arrow-based query execution engines, specifically using libcudf, a C++ GPU DataFrame library built on the Arrow format.
To this end, we design and implement four micro-benchmarks covering different operators to characterize the workloads that yield high acceleration and to identify their bottlenecks and limitations. When data transfer time is excluded, inherently parallelizable workloads show high potential for GPU acceleration; however, this advantage diminishes considerably once data transfer overhead is taken into account. Building on these micro-benchmark results, we design an on-the-fly, operator-level scheduler that dynamically accelerates query execution in a hybrid CPU/GPU system. Guided by a statistics-based cost model, the scheduler decides whether to execute an operator on the CPU or the GPU based on the input data location, data volume, data-related parameters, and the operator type.
With this scheduler, we achieve speedups of up to 4.88x for the Filter operator, 2.52x for the Sort operator, and 1.52x for the Copy operator on arrays of length 1e8.
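The abstract describes the scheduler only at a high level. As a rough illustration of the idea, not the thesis implementation, the dispatch decision can be sketched in C++ as a comparison of an estimated CPU cost against an estimated GPU cost that includes host-to-device transfer; all type names, cost constants, and calibration numbers below are hypothetical.

```cpp
#include <cstddef>
#include <iostream>
#include <string>

// Hypothetical sketch of an operator-level CPU/GPU dispatch decision.
// Names and cost constants are illustrative, not taken from the thesis.

enum class Device { CPU, GPU };
enum class DataLocation { Host, DeviceMemory };

struct OperatorStats {
    std::string op_type;         // e.g. "filter", "sort", "copy"
    std::size_t num_rows;        // input size
    DataLocation location;       // where the input currently resides
    double cpu_ns_per_row;       // assumed calibration from micro-benchmarks
    double gpu_ns_per_row;       // assumed calibration from micro-benchmarks
    double transfer_ns_per_row;  // assumed host-to-device transfer cost per row
};

// Pick the device with the lower estimated total cost. Transfer cost is
// charged only when the input is not already resident in GPU memory.
Device choose_device(const OperatorStats& s) {
    double cpu_cost = s.cpu_ns_per_row * static_cast<double>(s.num_rows);
    double gpu_cost = s.gpu_ns_per_row * static_cast<double>(s.num_rows);
    if (s.location == DataLocation::Host) {
        gpu_cost += s.transfer_ns_per_row * static_cast<double>(s.num_rows);
    }
    return gpu_cost < cpu_cost ? Device::GPU : Device::CPU;
}

int main() {
    // Made-up numbers for a filter over 1e8 rows residing in host memory.
    OperatorStats filter_op{"filter", 100'000'000, DataLocation::Host,
                            2.0, 0.1, 0.8};
    Device d = choose_device(filter_op);
    std::cout << "filter -> " << (d == Device::GPU ? "GPU" : "CPU") << "\n";
    return 0;
}
```

In this sketch the per-row costs would come from the kind of operator micro-benchmarks described above, so the decision naturally accounts for the transfer overhead that the thesis identifies as the main limit on GPU speedups.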

Files

Kexin_Su_Msc_Thesis_Acero_GPU_... (pdf)

File under embargo until 25-09-2025