The increasing demand for data-intensive applications such as artificial intelligence and big data analytics is hitting the limitations of traditional computing architectures. Near-memory processing architectures, like UPMEM's Data Processing Units (DPUs), offer a promising solution by integrating computation with memory, reducing data movement and energy consumption. However, UPMEM's scratchpad-centric design poses significant programming challenges: efficient memory management requires explicit programmer intervention, which increases program complexity and limits portability.
This thesis investigates a compiler-driven approach to automatically exploit scratchpad memory on UPMEM's DPUs, aiming to simplify programming and achieve performance comparable to hand-optimized code. A novel compilation pipeline is proposed that analyses loops in DMA-unaware C programs and optimizes them into efficient, DMA-aware machine code. The design leverages static analysis, including alias analysis and symbolic analysis, to insert DMA instructions efficiently.
To evaluate the compilation pipeline, this thesis introduces the Processing-in-Memory Compiler Benchmarks, which are based on the Processing-in-Memory Benchmarks proposed by the SAFARI Research Group. Experimental results demonstrate significant improvements: the compiled programs reach, on average, 75% of the runtime performance of hand-optimized programs and in some cases outperform them.
By automating scratchpad memory management, programmers can focus on high-level functionality while maintaining system performance and compatibility. Future research directions include extending optimizations beyond loops, improving global memory management, and extending the compiler benchmarks with application-based benchmarks.