While the amount and variability of data produced by the numerous systems in a modern company continue to increase, users expect real-time, consistent results from complex analyses across a wide variety of event sources. In industry, stream processing systems are emerging that process events with low latency in a scalable and reliable fashion. As more and more stream processing jobs handle mission-critical events, older jobs become subject to maintenance and must be upgraded or replaced. These upgrades include a snapshot-restore operation, where a non-trivial state conversion must be performed between the snapshot and the restore. Such an operation requires considerable technical expertise and imposes significant downtime on the job itself and on all jobs that depend on it. This thesis proposes a mechanism to align the progress of multiple independent jobs that share common event sources. The mechanism extends the checkpoint protocol proposed by Carbone et al. Not only does it simplify the maintenance of streaming jobs by allowing hot-swap operations with exactly-once processing semantics, but it can also be used to provide consistency for queryable state. By implementing a proof of concept, we show that this so-called epoch alignment can be achieved at minimal additional cost over exactly-once processing semantics.