Characterization of a Big Data Storage Workload in the Cloud

Talluri, S.; Abad, Cristina L.; Łuszczak, Alicja; Iosup, A.

Conference paper (2019)

Authors

S. Talluri Data-Intensive Systems

Cristina L. Abad Escuela Superior Politecnica del Litoral, Guayaquil

Alicja Łuszczak Databricks B.V.

A. Iosup Vrije Universiteit Amsterdam

Research Group

Data-Intensive Systems

To reference this document use:

http://resolver.tudelft.nl/uuid:796ebcb0-35d5-47b7-813c-03255ae994ca

More Info

Published Date

2019

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

The proliferation of big data processing platforms has led to radically different system designs, such as MapReduce and the newer Spark. Understanding the workloads of such systems facilitates tuning and could foster new designs. However, whereas MapReduce workloads have been characterized extensively, relatively little public knowledge exists about the characteristics of Spark workloads in representative environments. To address this problem, in this work we collect and analyze a 6-month Spark workload from a major provider of big data processing services, Databricks. Our analysis focuses on a number of key features, such as the long-term trends of reads and modifications, the statistical properties of reads, and the popularity of clusters and of file formats. Overall, we present numerous findings that could form the basis of new systems studies and designs. Our quantitative evidence and its analysis suggest daily and weekly load imbalances, heavy-tailed and bursty behaviour, the relative rarity of modifications, and the proliferation of big-data-specific formats.

Files

P33_talluri.pdf

(pdf | 2 MB)

Unknown license

Download not available