Predicting DNA repair-deficient genotypes based on Cas9-induced DNA repair products
Abstract
Double-strand breaks are DNA lesions that can be fatal for cells. Cells therefore repair these breaks, primarily through one of three major repair pathways. Two of these pathways are non-homologous end-joining (NHEJ) and polymerase theta-mediated end-joining (TMEJ). Both can leave genetic alterations in their repair products, which is itself a form of DNA damage, and DNA damage is linked to several diseases, including cancer. Understanding these pathways is therefore important, and being able to recognize which pathways are active is valuable for research. In this work, Cas9-induced repair products are used to predict repair-deficient genotypes. Ku80-deficient and PolQ-deficient genotypes are used, which impair NHEJ and TMEJ respectively. The ability to recognize a repair-deficient genotype is tested with two predictive tasks. First, statistical machine learning algorithms are used to predict the genotype in which a repair product was found, using only a single repair product as input. Second, a set of Cas9-induced repair products from a single cell culture is used to predict the genotype of that culture. Results show that, given a single repair product, models have difficulty predicting the correct genotype: performance is modest, with the best classifier achieving an AUC of 0.76. Predicting the genotype of a cell culture from multiple repair products of that culture shows promising results. When the breaks are induced at a target site that the model has seen in the training data, predictions are near perfect. Predicting on unseen target sites leaves room for improvement, but the best-performing models achieve an average AUC of 0.879 across target sites. Overall, the results show that Cas9-induced repair products can be used to predict repair-deficient genotypes.
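
The evaluation on unseen target sites can be pictured as a leave-one-target-site-out loop, with an AUC computed per held-out site and then averaged. The sketch below is only an illustration under stated assumptions, not the method used in this work: the record fields `site`, `deficient`, and `x`, and the `fit` and `score` callables, are hypothetical names introduced here, and the abstract does not specify how the models or the splits were implemented.

```python
def auc(labels, scores):
    """Rank-based AUC: the probability that a randomly chosen positive
    example scores higher than a randomly chosen negative one
    (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def leave_one_site_out(cultures):
    """Yield (held_out_site, train, test), holding out one target site
    at a time so the test cultures come from a site unseen in training."""
    for site in sorted({c["site"] for c in cultures}):
        train = [c for c in cultures if c["site"] != site]
        test = [c for c in cultures if c["site"] == site]
        yield site, train, test


def mean_auc_across_sites(cultures, fit, score):
    """Average the per-site AUC over all held-out target sites.

    fit(train) returns a model; score(model, culture) returns a number
    that is higher for cultures predicted to be repair-deficient.
    Both are hypothetical placeholders supplied by the caller."""
    per_site = []
    for _site, train, test in leave_one_site_out(cultures):
        model = fit(train)
        labels = [c["deficient"] for c in test]
        scores = [score(model, c) for c in test]
        per_site.append(auc(labels, scores))
    return sum(per_site) / len(per_site)
```

Here each element of `cultures` stands for one cell culture, with `deficient` set to 1 for a repair-deficient genotype and 0 otherwise. Grouping the split by target site, rather than splitting cultures at random, is what keeps the held-out site out of the training data.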