Comparing Performance of Locally Repairable Codes

van der Pas, T.R.

Abstract

With the internet growing exponentially, the amount of information stored digitally becomes enormous. Storage systems are expected to withstand failures to prevent the loss of data. To provide the reliability, data is stored multiple times in different storage units. If in this case, one of the storage unit goes offline, a backup of the data will be available in another. Needing to deal with duplication causes massive amount of excessive storage overhead. To prevent this overhead, erasure coding can be used. This form of coding can reduce the storage space needed, while still being able to provide a certain level of fault tolerance. Erasure coding makes use of so-called parities; symbols that do not directly contain data, but can be used to retrieve actual information nonetheless. When informational symbols get lost, due to a failure, the parity symbols are combined to reconstruct the lost symbol. Since multiple parities might be used in this process, repair traffic is generated. This traffic slows down the rest of the data flow, retaining quick access to the data. The amount of traffic needed to recover an erasure, is called the locality of a code. Codes with small amounts of storage overhead tend to have a greater locality, causing a lot of repair traffic in case of an erasure. In this thesis, multiple codes are being compared based on their locality and storage overhead. Besides these characteristics, a large code distance is favorable in erasure coding schemes. The distance within the code words of a scheme, determines how many erasures the scheme can handle without losing data. The Maximum Distance Separable (MDS) codes, such as the widely used Reed-Solomon codes, are optimal in the trade-off between code distance and storage overhead. In other words, these codes only need a little amount of overhead in order to withstand a large number of erasures. The pitfall of these codes, however, is their large locality which makes them almost unusable for some applications. Nevertheless, they form a great base for another type of coding scheme; the Locally Repairable Codes (LRC). These codes are extended with extra parity symbols on top of the MDS code which ensures a code locality lower than that of the MDS codes. Since less traffic is needed to repair erasures, they form an interesting group considering the compromise between code locality and storage overhead. The foundation of this thesis is formed by the content of the research paper “XORing Elephants: Novel Erasure Codes for Big Data” [4]. In this paper, researchers explain the creation of the (10,6,5) Locally Repairable Code and compare its characteristics to those of the (10,4) Reed-Solomon code. On top of the researchers’ findings, different parameter configurations will be applied to discover more of the LRC family in this thesis. Different configurations create codes that perform differently in recovering erasures. After comparing their characteristics, applications will be sought for each of the newly created LRCs. As one code might be optimal on the aspect of storage overhead, another might be optimal on the aspect of locality and yet another on the aspect of code distance. In Table 3 of the conclusion it shown which codes excels on what aspects.

Comparing Performance of Locally Repairable Codes

Abstract

Files