Kallisto Repurposed

Using sequencing reads from the spike, nucleocapsid, and a middle region of nsp3 in the kallisto pipeline to better predict SARS-CoV-2 variants in wastewater

Abstract

During a viral infection, we expel remnants of the virus. This makes it possible to conduct wastewater analysis, which aids efforts to track the evolution of the current Covid-19 pandemic. It has been shown that by repurposing the kallisto algorithm, the abundance of SARS-CoV-2 variants in wastewater samples can be estimated. Because this is a novel application of the algorithm, its precision can likely be improved by adjusting certain aspects. In this work, I look at one of those aspects: sequencing particular genomic regions of the virus rather than the entire genome. I found that the regions coding for the spike (S) and nucleocapsid (N) proteins, and a section around the region coding for non-structural protein 3 (nsp3), give particularly accurate results when sequenced on their own. In addition, in at least one case, combining two well-performing regions further improves accuracy at lower simulated abundances of variants. This suggests that sequencing depth is preferable to sequencing breadth, as long as the region being sequenced contains enough information to distinguish between variants. These findings are important because they can help improve this method of variant quantification. Moreover, they can help improve other algorithms applied to the SARS-CoV-2 genome by highlighting the genomic sections that contain the most differentiating information between variants.
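The idea of restricting the analysis to particular genomic regions can be illustrated with a minimal sketch. It is not the pipeline used in this work: the `Read` type, the `select_reads` helper, and the example reads are hypothetical, and the S and N coordinates are assumed from the annotation of the reference genome NC_045512.2. The nsp3 section is left out because its exact boundaries are not stated here. The sketch keeps only the reads whose alignment overlaps one of the chosen regions; in a real run, the retained reads would then be handed to kallisto for abundance estimation.

```python
from typing import NamedTuple

# Gene coordinates (1-based, inclusive) assumed from the annotation of the
# SARS-CoV-2 reference genome NC_045512.2. Illustrative only; the nsp3
# section used in the study is not specified in this abstract.
REGIONS = {
    "S": (21563, 25384),
    "N": (28274, 29533),
}


class Read(NamedTuple):
    """Hypothetical aligned read: name plus reference coordinates."""
    name: str
    start: int  # 1-based alignment start on the reference
    end: int    # 1-based alignment end (inclusive)


def overlaps(read, region):
    """Return True if the read shares at least one base with the region."""
    region_start, region_end = region
    return read.start <= region_end and read.end >= region_start


def select_reads(reads, region_names):
    """Keep reads that overlap any of the named regions."""
    regions = [REGIONS[name] for name in region_names]
    return [r for r in reads if any(overlaps(r, reg) for reg in regions)]


# Made-up reads for demonstration, not real data.
reads = [
    Read("r1", 21500, 21650),  # overlaps the start of S
    Read("r2", 10000, 10150),  # lies in ORF1ab, outside both regions
    Read("r3", 28300, 28450),  # lies inside N
    Read("r4", 25385, 25500),  # starts one base after S ends
]

selected = select_reads(reads, ["S", "N"])
print([r.name for r in selected])  # prints ['r1', 'r3']
```

Passing a single name, such as `select_reads(reads, ["N"])`, models sequencing one region on its own, while passing two names models the combined-region case mentioned above.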