Constrained Error-Correcting Codes for DNA-Based Storage Systems

Bachelor thesis (2021)

Authors

F.J. Laseur Electrical Engineering, Mathematics and Computer Science

Contributors

J.H. Weber Discrete Mathematics and Optimization - (mentor)

J.A.M. de Groot Mathematical Physics - (graduation committee member)

Faculty

Electrical Engineering, Mathematics and Computer Science, Electrical Engineering, Mathematics and Computer Science

To reference this document use:

http://resolver.tudelft.nl/uuid:7b6358f8-ef26-4b62-9476-514d076ced25

More Info

expand_more

Published Date

14-12-2021

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Faculty

Electrical Engineering, Mathematics and Computer Science

Abstract

Humanity is producing data at exponential rates. Solutions need to be found in order to accommodate the bytes of today and the future in an efficient and environmentally friendly way. DeoxyriboNucleic Acid, well-known as DNA, is a suitable destination for this major collection of data, as it is a high-density storage medium with long storage endurance [1]. In the field of research about DNA-Based Data Storage Systems, people examine quaternary codes. The considered DNA-codes consist of equi-length DNA-words over the Alphabet {0, 1, 2, 3} representing the available DNA building blocks or nucleotides {A, T,C,G} standing for Adenine, Thymine, Cytosine and Guanine respectively. Combinatorial constrained coding is introduced to avoid DNA patterns prone to sequencing errors, that might occur when archival data is read from DNA-blocks [2]. If an error occurs anyway, an imposed minimum Hamming distance d that a DNA-code satisfies, enforces error correction and detection capabilities [3]. Methods are considered to optimize the number of DNA-words that can be stored in a DNA-code satisfying these restrictions. This research explores DNA-codes of which the DNA-words adhere to a fixed GC content referred to as the GC-weight w, as well as the no-runlength constraint r = 1 that will not allow adjacent symbol repetition. By increasing distance d, we build on the findings of Van Leeuwen [4] and Vermeer [5], who created DNA-codes satisfying d = 2 for r = 1 and r ≥ 1 respectively. The maximum size of a DNA-code satisfying d = 3 among other constraints is proven to be 12. Furthermore, we cover the case d = n where the length n of the DNA-words and the minimum distance d the DNA-code satisfies are equal. We derive maximum sizes of these DNA-codes for all possible GC-weights w. Finally, we present algorithms that create DNA-d codes satisfying d = 3 and d = 4 and succeed in improving sizes of the DNA-codes created by the algorithms designed by Limbachiya et. al. [6] and Van Leeuwen [4].

Files

BEP_Floor_Laseur.pdf

(pdf | 0.494 Mb)