Learning Structured Sparsity for Efficient Nanopore DNA Basecalling Using Delayed Masking

Abstract

High-accuracy nanopore basecalling uses large deep neural networks, which require powerful GPUs and are therefore undesirable for sequencing experiments outside the lab. Research has shown that this requirement can be circumvented by using smaller models to increase efficiency as well as basecalling speed. However, this comes at the cost of reduced accuracy, running counter to the trend of increasingly complex models that extract the highest possible accuracy from the source data. We propose learning structured sparsity during model training to find an improved trade-off between accuracy and model size, and thus basecalling speed. Our work introduces an improved pruning method that uses a delayed masking scheduler, removes redundant masks to save compute, and is optimized for the basecaller training process. Using a standardized benchmarking method, we find that model size can be reduced by up to 21× at the cost of a 0.1% to 1.3% reduction in match rate compared to Bonito-HAC. Our results indicate that the size of basecalling models can be reduced drastically with minimal impact on accuracy, as long as researchers use appropriate training methods. Furthermore, our work helps democratize nanopore DNA sequencing, broadening the reach and impact of this technology. The code with the masking mechanism to reproduce our results is available at https://github.com/meesfrensel/efficient-basecallers.
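To make the delayed-masking idea concrete, below is a minimal PyTorch sketch, not the paper's implementation. It assumes a hypothetical `DelayedMaskingScheduler` that leaves the model dense for a warm-up period and then re-applies channel-level structured masks after every optimizer step; the names and parameters (`structured_mask`, `target_sparsity`, `delay_steps`) are illustrative assumptions, and the actual mask granularity, schedule, and learning rule are those described in the paper and the linked repository.

```python
import torch


def structured_mask(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Build a mask that zeroes whole output channels (rows) with the
    lowest L1 norm. Channel-level pruning is one common form of
    structured sparsity; the paper's mask granularity may differ."""
    num_rows = weight.shape[0]
    num_pruned = int(sparsity * num_rows)
    # Score each output channel by the L1 norm of its weights.
    row_scores = weight.abs().sum(dim=tuple(range(1, weight.dim())))
    pruned = torch.topk(row_scores, num_pruned, largest=False).indices
    mask = torch.ones(num_rows, dtype=weight.dtype, device=weight.device)
    mask[pruned] = 0.0
    # Reshape so the mask broadcasts over the remaining weight dims.
    return mask.view(-1, *([1] * (weight.dim() - 1)))


class DelayedMaskingScheduler:
    """Illustrative scheduler: train dense for `delay_steps` steps, then
    re-apply structured masks after each step. `target_sparsity` and
    `delay_steps` are hypothetical parameters for this sketch."""

    def __init__(self, layers, target_sparsity=0.9, delay_steps=10_000):
        self.layers = list(layers)
        self.target_sparsity = target_sparsity
        self.delay_steps = delay_steps
        self.step_count = 0

    def step(self):
        self.step_count += 1
        if self.step_count < self.delay_steps:
            return  # warm-up: let the dense model train undisturbed
        with torch.no_grad():
            for layer in self.layers:
                mask = structured_mask(layer.weight, self.target_sparsity)
                layer.weight.mul_(mask)  # prune selected channels in place
```

In a training loop, one would call `scheduler.step()` after each `optimizer.step()`, e.g. with `layers = [m for m in model.modules() if isinstance(m, torch.nn.Linear)]` for a hypothetical `model`. Because masks are only applied after the warm-up and are recomputed each step, gradients reach all weights early in training and the pruned structure can still shift as training proceeds.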