Recent advances in quantifying the repair outcomes of CRISPR-Cas9-mediated double-stranded DNA breaks (DSBs) have made it possible to use machine learning to predict the frequencies of these outcomes. Local DNA sequence context influences the frequencies of the mutations that arise when DNA is repaired after being targeted by CRISPR (CRISPR outcomes). Contemporary models exploit this and can predict the frequencies of CRISPR outcomes at predetermined genomic loci. The predictions of such models are reasonably precise, but there may be room for improvement in how the DNA sequence context is leveraged. Some models use only a set of hand-crafted features, which limits the information available to the model. Other models use broader sequence context but disregard sequence order or predict only a limited set of outcome classes. In this work, we present an attention-based deep learning model that uses DNA sequence context to make fine-grained CRISPR outcome predictions. We present a custom input embedding for representing DSB repair outcomes, and we expand on existing methods for analyzing attention-based models.