Accelerating DNA basecalling of Nanopore reads on FPGAs
Abstract
Genomics has revolutionized our understanding of evolution, hereditary diseases, and more. The advent of long-read DNA sequencers, such as those from Oxford Nanopore Technologies, has opened many new research opportunities in genomics. These sequencers produce significantly longer DNA reads, facilitating novel applications. However, this technological leap brings challenges, particularly in accurate basecalling, the process of converting raw sequencer measurements into digital base pair sequences. Although basecalling accuracy has improved steadily over the years, its computational intensity remains a bottleneck in genomic analysis workflows, demanding costly high-end GPUs to run the probabilistic neural network models.
This thesis addresses the hardware acceleration of basecalling for long-read sequences, a compute-intensive process. It presents an FPGA-based implementation of the computationally demanding Long Short-Term Memory (LSTM) layers within the basecalling network known as Bonito. However, due to the lack of floating-point arithmetic units on the FPGA, the implementation could not achieve performance competitive with GPUs.
While the FPGA implementation falls short of GPU performance, it serves as a possible stepping stone toward an ASIC solution for the Bonito LSTM layers, or potentially for the entire Bonito model. An ASIC implementation has the potential for superior performance, up to 9 times faster than a GPU implementation, while also being cost-effective. This suggests that ASICs hold promise as a future direction for accelerating long-read sequence basecalling, allowing for faster and more affordable genomics research.