Optimistic recovery for iterative dataflows in action

Dudoladov, Sergey; Xu, C.; Schelter, Sebastian; Katsifodimos, Asterios; Ewen, Stephan; Tzoumas, Kostas; Markl, Volker

Conference paper (2015)

Authors

Sergey Dudoladov, Technical University of Berlin

C. Xu, Technical University of Berlin

Sebastian Schelter, Technical University of Berlin

Asterios Katsifodimos, Technical University of Berlin

Stephan Ewen, Data Artisans GmbH

Kostas Tzoumas, Data Artisans GmbH

Volker Markl, Technical University of Berlin

Affiliation

External organisation

Fault-tolerance, Iterative algorithms, Optimistic recovery

To reference this document use:

http://resolver.tudelft.nl/uuid:631c4a81-1d5a-491c-b487-a4ef8e490c85

Published Date

27-05-2015

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

In recent years, parallel dataflow systems have been used for advanced analytics in data mining, where many algorithms are iterative. These systems typically provide fault tolerance by periodically checkpointing the algorithm's state and, in case of failure, restoring a consistent state from a checkpoint. In prior work, we presented an optimistic recovery mechanism that in certain cases eliminates the need to checkpoint the intermediate state of an iterative algorithm. In case of failure, our mechanism uses a compensation function to bring the algorithm to a consistent state, from which execution can continue and successfully converge. Since this recovery mechanism does not checkpoint any state, it achieves optimal failure-free performance while guaranteeing fault tolerance. In this paper, we demonstrate our recovery mechanism with the Apache Flink data processing engine. During our demonstration, attendees will be able to run graph algorithms and trigger failures to observe the algorithms recovering with compensation functions instead of checkpoints.
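
The idea of recovering through a compensation function rather than a checkpoint can be illustrated with a small, self-contained sketch. The code below is plain Python, not Apache Flink code, and is not taken from the paper. It assumes connected components computed by minimum-label propagation as the graph algorithm, and it assumes a compensation function that resets each lost vertex to its initial label. The names `compensate`, `connected_components`, `fail_at`, and `lost` are hypothetical and exist only for this illustration.

```python
def compensate(labels, num_vertices):
    """Hypothetical compensation function (an assumption of this sketch):
    every vertex whose label was lost gets its initial label (its own id);
    surviving labels are left untouched."""
    return {v: labels.get(v, v) for v in range(num_vertices)}


def connected_components(edges, num_vertices, fail_at=None, lost=()):
    """Minimum-label propagation over an undirected graph.

    If `fail_at` is set, the labels of the vertices in `lost` are dropped
    after that iteration to simulate a failure, and `compensate` repairs
    the state. No checkpoint is ever written or read.
    Returns (labels, number_of_iterations).
    """
    # Build an undirected adjacency list.
    neighbors = {v: set() for v in range(num_vertices)}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    # Initially every vertex is labeled with its own id.
    labels = {v: v for v in range(num_vertices)}
    iteration = 0
    while True:
        iteration += 1
        # Each vertex adopts the smallest label among itself and its neighbors.
        new_labels = {
            v: min([labels[v]] + [labels[n] for n in neighbors[v]])
            for v in labels
        }
        if iteration == fail_at:
            # Simulated failure: part of the intermediate state is lost.
            for v in lost:
                del new_labels[v]
            # Compensate instead of rolling back, then keep iterating.
            labels = compensate(new_labels, num_vertices)
            continue
        if new_labels == labels:
            # Fixpoint reached: every vertex carries its component's minimum id.
            return labels, iteration
        labels = new_labels


if __name__ == "__main__":
    edges = [(0, 1), (1, 2), (2, 3), (4, 5)]
    print(connected_components(edges, 6))
    print(connected_components(edges, 6, fail_at=2, lost=(2, 3)))
```

In this sketch the run with a simulated failure reaches the same final labels as the failure-free run and only needs additional iterations. This works here because resetting a vertex to its own id yields a state from which minimum-label propagation still converges to the same fixpoint; other algorithms would need a compensation function suited to their own invariants.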

Metadata only record. There are no files for this conference paper.