T. Shahroodi


Improving Near-Memory Processing

Automatic Scratchpad Memory Exploitation via Static Analysis for a Computation-Near-Memory Processor

Master thesis (2024) - J.G. van Doorn (author), Stephan Wong (mentor), Taha Shahroodi (mentor), S.S. Chakraborty (graduation committee member)

The increasing demand for data-intensive applications such as artificial intelligence and big data analytics is hitting the limitations of traditional computing architectures. Near-memory processing architectures, like UPMEM's Data Processing Units (DPUs), offer a promising solut ...

CIM-architecture for acceleration of DNA pre-alignment filters

Master thesis (2023) - M. Miao (author), Stephan Wong (mentor), Marco Zuñiga Zamalloa (graduation committee member), Taha Shahroodi (mentor)

Due to recent developments in DNA sequencing technology, there is a growing abundance of available genomic data. To be useful in fields such as healthcare and forensics, raw sequencing data have to be processed using computationally intensive algorithms. Currently, one of the major bottlenecks in this processing pipeline is the alignment step, which relies on dynamic-programming (DP) algorithms. Numerous solutions have been proposed to reduce the execution time of the alignment step, either by accelerating alignment itself using hardware accelerators and heuristics or by reducing the amount of input data through pre-alignment filters. The filtering algorithms are less computationally intensive than DP-based alignment, so filtering the input before alignment reduces the end-to-end alignment time.
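The filter-then-align idea can be illustrated with a minimal Python sketch. It is not the filter used in the thesis: the base-count filter, the function names (`edit_distance`, `passes_filter`, `filtered_align`) and the `max_edits` threshold are all assumptions chosen for illustration. The filter relies on the fact that one edit changes the per-base count difference between two sequences by at most 2, so it never rejects a pair that is within the threshold.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Classic DP edit distance (the expensive step the filter tries to avoid)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # match or substitution
        prev = curr
    return prev[-1]


def passes_filter(read: str, ref: str, max_edits: int) -> bool:
    """Hypothetical cheap filter: compare base counts only.

    Each edit changes the summed count difference by at most 2, so a
    difference above 2 * max_edits proves the pair is too dissimilar.
    """
    ca, cb = Counter(read), Counter(ref)
    diff = sum(abs(ca[c] - cb[c]) for c in set(ca) | set(cb))
    return diff <= 2 * max_edits


def filtered_align(read: str, candidates: list[str], max_edits: int):
    """Run DP alignment only on candidates that survive the filter."""
    results = []
    for ref in candidates:
        if not passes_filter(read, ref, max_edits):
            continue  # rejected cheaply, no DP needed
        distance = edit_distance(read, ref)
        if distance <= max_edits:
            results.append((ref, distance))
    return results
```

In this sketch the filter can let through pairs that the DP step later rejects, but it never discards a true match, which is the property a pre-alignment filter needs.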

Currently, pre-alignment filters are effective to the point where the alignment bottleneck is shifted to the filtering step. Therefore, the filters are accelerated on hardware platforms such as GPUs and FPGAs. While these solutions show orders-of-magnitude improvements in execution time, they are insufficient for removing the filtering bottleneck entirely, because the performance of these accelerators is limited by the rate at which data can be supplied. As a solution, we propose a CIM-based accelerator to reduce data-movement overheads between the host device and the accelerator. Additionally, this architecture makes use of emerging non-volatile memories to perform Boolean operations directly within its memory elements. In doing so, it can exploit parallelism in the algorithms to achieve higher throughput.
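To show why bulk Boolean operations suit this kind of filtering, the following Python sketch computes a per-base mismatch mask using only XOR, OR, AND and a shift. It is a software illustration under stated assumptions, not the thesis architecture: the 2-bit base encoding and the names `BASE_CODE`, `encode`, `mismatch_mask` and `hamming_distance` are hypothetical, and Python integers stand in for rows of memory cells.

```python
# Assumed 2-bit encoding of the four bases (illustrative choice).
BASE_CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}


def encode(seq: str) -> int:
    """Pack a DNA string into an integer, 2 bits per base, first base highest."""
    word = 0
    for base in seq:
        word = (word << 2) | BASE_CODE[base]
    return word


def mismatch_mask(a: str, b: str) -> int:
    """Return a mask with one set bit (at an even position) per mismatching base."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    n = len(a)
    diff = encode(a) ^ encode(b)          # non-zero 2-bit field => mismatch
    low_bits = ((1 << (2 * n)) - 1) // 3  # 0b0101...01, one bit per base
    # Fold each 2-bit field into its low bit: OR the high bit down, mask the rest.
    return (diff | (diff >> 1)) & low_bits


def hamming_distance(a: str, b: str) -> int:
    """Count mismatching bases by counting set bits in the mask."""
    return bin(mismatch_mask(a, b)).count("1")
```

Every base position is handled by the same few bitwise operations, independent of its neighbours, which is the kind of data-parallel work that in-memory Boolean logic can apply across many positions at once.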

In this work, we explore commonly found operations in existing pre-alignment filters and devise ways to implement them on the CIM-architecture. The proposed architecture is flexible in supporting multiple pre-alignment filters and a wide range of input data. The functionality of the architecture is verified through simulation and its effectiveness is tested using real data sets.

Using this architecture, we achieve an improvement in end-to-end execution time over the state of the art ranging from 7.2x to 119.6x for the evaluated data sets, along with reductions of up to 59% in chip area and 79.7% in power consumption.

Furthermore, this work provides a platform for developing future pre-alignment filtering algorithms that further improve performance.