Consensus algorithms, as well as distributed systems in general, are vulnerable to concurrency bugs due to non-determinism. Such bugs are hard to detect since it is necessary to test using a lot of different scenarios and even then, there is no guarantee to find one.
Co
...
Consensus algorithms, as well as distributed systems in general, are vulnerable to concurrency bugs due to non-determinism. Such bugs are hard to detect since it is necessary to test using a lot of different scenarios and even then, there is no guarantee to find one.
Controlled concurrency testing is a proposed solution to that problem. Its main strength is allowing for more control over the order in which threads are executed, using controlled scheduling.
In this paper, we utilize Coyote, a concurrency testing framework, to test for concurrency bugs in PBFT, the seminal consensus algorithm. Random Walk, Delay-Bounding, Probabilistic Concurrency Testing (PCT) and Q-Learning are the exploration strategies we performed experiments with. The goal is to find schedules that lead to concurrency bugs. We test for 2 bugs that are seeded by us into the algorithm.
A comparison of the results shows that fair exploration strategies outperform their unfair variations. Finally, we conclude that PCT and fair PCT are the best-performing strategies for the benchmarks we use.