Predictive Genome Analysis Using Partial DNA Sequencing Data

Conference paper (2017)

Authors

N. Ahmed (Computer Engineering)

K.L.M. Bertels (Quantum & Computer Engineering)

Z. Al-Ars (Computer Engineering)

Research Group

Computer Engineering (TU Delft)

Prediction GATK DNA Sequencing delay

To reference this document use:

http://resolver.tudelft.nl/uuid:7673203f-5755-4fd6-bb94-0fb6cc6f3c02

Published Date

2017

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Faculty

Electrical Engineering, Mathematics and Computer Science

Department

Quantum & Computer Engineering

Abstract

Much research has been dedicated to reducing the computation time of genome data analysis. As a result, the bottleneck has shifted from the computational analysis itself to the time needed to sequence the DNA. DNA sequencing is a time-consuming process, and all existing DNA analysis methods must wait for sequencing to finish completely before the analysis can start. In this paper, we propose a new DNA analysis approach in which genome analysis starts while the DNA reads are still being sequenced. We use algorithms to predict the unknown bases of each incomplete read and their corresponding base quality scores. Results show that our method of predicting the unknown bases and quality scores achieves more than 90% similarity with the full dataset for 50 unknown bases, saving more than a day of sequencing time. We also show that our base quality value prediction scheme is highly accurate, reducing the similarity of the detected variants by only 0.45%. However, there is still room for more accurate prediction schemes for the unknown bases, which could increase the effectiveness of the analysis by up to 5.8%.
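
The idea of completing a partially sequenced read can be illustrated with a minimal sketch. This is not the prediction algorithm from the paper, which the abstract does not describe. It assumes FASTQ-style Phred+33 quality strings, fills each unknown base through a caller-supplied hypothetical `predict_base` function (defaulting to the placeholder `N`), and assigns the mean of the known quality scores to every predicted position. The names `complete_read` and `predict_base` are illustrative only.

```python
from typing import Callable, Tuple

PHRED_OFFSET = 33  # assumption: Phred+33 encoding, as in standard FASTQ files


def complete_read(
    bases: str,
    quals: str,
    full_length: int,
    predict_base: Callable[[str, int], str] = lambda known, pos: "N",
) -> Tuple[str, str]:
    """Pad an incomplete read to full_length with predicted bases and qualities.

    predict_base(known_bases, position) is a hypothetical hook; the default
    simply returns the placeholder base 'N'.
    """
    if len(bases) != len(quals):
        raise ValueError("bases and quality string must have equal length")

    missing = full_length - len(bases)
    if missing <= 0:
        return bases, quals  # the read is already complete

    # Illustrative quality prediction: mean of the known Phred scores.
    scores = [ord(c) - PHRED_OFFSET for c in quals]
    mean_q = round(sum(scores) / len(scores)) if scores else 0

    predicted = "".join(
        predict_base(bases, len(bases) + i) for i in range(missing)
    )
    return bases + predicted, quals + chr(mean_q + PHRED_OFFSET) * missing
```

For example, `complete_read("ACGT", "IIII", 6)` pads the read to six bases. A more realistic `predict_base` could consult a reference genome or a statistical model, and the abstract indicates that the choice of base predictor matters more for accuracy than the quality score prediction does.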

Files

Bibe2017_1.pdf

(pdf | 0.658 Mb)

Download not available