Over the past 20 years, the cost of sequencing genomes has reduced drastically. As DNA data grows at an unprecedented rate, the need for fast and affordable software and hardware solutions for DNA analysis is higher than ever. A critical and time-consuming component in any DNA an
...
Over the past 20 years, the cost of sequencing genomes has reduced drastically. As DNA data grows at an unprecedented rate, the need for fast and affordable software and hardware solutions for DNA analysis is higher than ever. A critical and time-consuming component in any DNA analysis pipeline is sequence alignment, which refers to mapping smaller sequences onto larger reference regions exhibiting high similarity. Many modern aligners break the problem of alignment into seed and extension. First, the algorithm finds matching substrings between the references and the query and then extends these strings on both ends to contain areas around the seeds. One of the most popular and utilized alignment tools is the Burrows-Wheeler aligner, specifically its Maximal Exact Match version called BWA-MEM. This tool solves the alignment problem by first finding seeds using a unique index called the FMD index, which reduces the complexity of the string matching problem while the extension is performed using a modified Smith-Waterman algorithm. Since the seed and extension methods are time-consuming tasks, they have been the target of various optimizations and acceleration attempts in the past. In this thesis, we aimed to integrate two previously developed GPU libraries, namely GPUSeed which finds seeds in a BWA-MEM like manner and GASAL2 which aims to accelerate the extension, into the BWA-MEM pipeline. The three software components were first analyzed from a theoretical standpoint and then profiled in order to determine the algorithmic and execution differences between the baseline CPU functions of BWA-MEM and the accelerated GPU implementations of GPUSeed and GASAL2. For the GPUSeed library, we found that seeding is performed 24× faster compared to the baseline BWA-MEM. For the GASAL2 library, we found around 1.5× better execution time over the baseline, while the main alignment results of the good quality alignments only differ by around 1.8% due to the GASAL2 acceleration heuristics. Based on our profiling observations, we integrated the three libraries into one complete accelerated BWA-MEM implementation and optimized various stages to achieve better performance and lower results deviations. Our integrated solution achieves speedups between 6.82× and 8.72× for single thread execution, depending on the dataset. For multithreaded execution, we find speedups between 2× and 2.8× over the baseline. For the alignment results, we found that our integrated program has around 2% of lines in the main result different from the baseline program, and if we filter alignments with mapping quality below 20, the percentage of different lines reduces to around 1%. Finally, depending on the query dataset, our accelerated library achieved between 60% and 75% GPU utilization during the seeding phase. For the extension part, we found a consistent 100% utilization for a dataset with 250 bases per query when using 14 CPU threads and varying GPU utilization between 40% and 95% for datasets with smaller query sizes. Our accelerated BWA-MEM implementation is freely available on GitHub.