Vector Processing in NUMA Systems

Hu, Z.

Master thesis (2023)

Authors

Z. Hu (Electrical Engineering, Mathematics and Computer Science)

Contributors

GN Gaydadjiev (mentor), Quantum Circuit Architectures and Technology

Faculty

Electrical Engineering, Mathematics and Computer Science

Computer Architecture, Vector Processor, NUMA

To reference this document use:

http://resolver.tudelft.nl/uuid:d9833a6f-9609-435f-a06e-ec4d76bec8d0

Published Date

29-08-2023

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Over the last decade, applications such as self-driving vehicles, image recognition and speech processing have had a growing impact on society. These applications are all based on machine learning, which relies heavily on matrix and vector operations. For that reason, vector processors are gaining traction again.

Most previous research on vector processors focuses on single-core performance. However, most of today’s large-scale computer systems use a non-uniform memory access (NUMA) architecture, and how to deploy vector processors efficiently in a NUMA environment remains an open problem. NUMA is a shared-memory model in which multiple memories are distributed across the system, usually one per NUMA node. A processor takes longer to access memory in another node than memory in its own node, which creates an opportunity to improve the performance of a NUMA system by accelerating remote memory accesses.

In this thesis, a subset of the PARSEC benchmark suite is vectorized for both the ARM Scalable Vector Extension (SVE) and the RISC-V Vector Extension (RVV), and the single-core performance of the two types of processor is compared using these benchmarks. A NUMA system is then built for ARM SVE, and its memory access pattern is analysed with the gem5 simulator.

Files

Zhewen_MSc_Thesis.pdf

(pdf | 7.13 MB)

License info not available