Constrained Codes for DNA-Based Storage Systems

More Info
expand_more

Abstract

Every day we produce an extremely high amount of data and a significant portion of this data is archival data. It is an important challenge to save this data in a cheap and environmentally friendly way. The current methods to save archival data are sufficient, but improvements can be made. Synthetic DNA based data storage is a great choice to do so. DNA is (roughly) made out of four nucleotides, Adine (A), Cytosine (C), Guanine (G) and Thymine (T). To save data using DNA the binary data is encoded to quaternary data and later DNA strands are created using this quaternary data. During the process of saving the data errors can occur. Previous research [3] has found that some errors can be prevented by taking two constraints into account.The no run-length constraint, which states that no DNA word can have two repeated symbolsand GC-weight constraint, which states that every DNA word must have a fixed number of G and C nucleotides. These constraints reduce the number of quaternary data sequences that can be used to save data. The aim of this thesis is to improve the lower bound on the maximum number of quaternary data sequences that satisfy the no run-length constraint, the fixed GC-weight and a minimum distance. Limbachiya et. al. [1] gave an algorithm to compute a set with a given minimum distance of quaternary data sequences that satisfies the constraints. The size of this set is the lower bound that we aim to improve. We do so by introducing two other algorithms that also compute a quaternary data sequences that satisfy the constraints and a minimum distance. The size of each computed set gives a lower bound on the maximum size. The lower bound computed with these two algorithms is always better or equal to the lower bound computed by Limbachiya et. al. for certain parameters. When the minimum distance of a code is 2 we know this maximum size. The formula for this maximum size is given in this thesis together with a proof for this formula.