Codes with relaxed weight constraints for DNA-Based storage systems

Capel, J.M.

Codes with relaxed weight constraints for DNA-Based storage systems

Bachelor thesis (2022)

Authors

J.M. Capel Electrical Engineering, Mathematics and Computer Science

Contributors

JH Weber Discrete Mathematics and Optimization (mentor)

Joost A.M. de Groot Mathematical Physics (mentor)

Faculty

Electrical Engineering, Mathematics and Computer Science, Electrical Engineering, Mathematics and Computer Science

To reference this document use:

http://resolver.tudelft.nl/uuid:9fd64a9c-c2fe-4bd4-af2f-b4dfd47bcc12

More Info

expand_more

Published Date

24-08-2022

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Faculty

Electrical Engineering, Mathematics and Computer Science

Abstract

Nowadays, we produce an enormous amount of data and all this data needs to be stored somewhere. In order to store all this data in an efficient and environmentally friendly way, solutions need to be found. A storage technique that has great potential to be a solution is synthetic DNA based storage.
A DNA-strand is made up of a sequence of four nucleotides, Adenine (A), Thymine (T), Guanine (G) and Cytosine (C). To save our data using DNA, our binary data is encoded to quaternary data, using this quaternary data DNA strands are made.
During this process, of saving our data in DNA, errors can occur, therefore two constraints are considered to minimise the number of errors in our codes.
The GC-weight, which is the number of G and C nucleotides in a DNA word and the maximal run-length, which is the maximum number of identical nucleotides in a row. In addition, the Hamming distance between words is also considered to be able to detect and even correct errors that still occur.

This research tries to find lower and upper bounds for the maximum size of DNA codes.
We continue the work of Van Leeuwen [1], Vermeer [2] and Laseur [3], who considered codes with fixed GC-weight. In this thesis we will relax this GC-weight constraint and we will look at codes where we have a set of GC-weights and all the DNA words in the code need to satisfy one of these weights. Furthermore, the codes we consider have a run-length of r=1. We will determine the maximum size of codes where the minimum Hamming distance d is equal to the length n of the DNA words. We determine lower and upper bounds for the case where d=n-1 and for specific sets of GC-weights we will determine the maximum size. Finally we will present nine algorithms, these algorithms create DNA-codes. The size of these codes give us lower bounds for the maximum size of DNA codes.

Files

BEP_Judith_Capel_Final_Version... (pdf)

(pdf | 0.344 Mb)

License info not available