Active state machine learning algorithms are a class of algorithms that infer state machines representing certain systems. These algorithms interact with a system and, based on the behavior they observe, build a hypothesis of the state machine describing that system. Once the algorithm arrives at a hypothesis, it sends it to a so-called equivalence oracle to be checked. If the hypothesis and the real system behave differently, the equivalence oracle returns a counterexample trace exhibiting that difference. Traditionally, this check is performed using a technique called the W-method, which provides certain guarantees about finding a counterexample if one exists, but it is slow because of the exhaustive search it must perform. In real systems, the counterexamples found during the learning process often follow patterns, which can be exploited to speed up the search for new counterexamples. Instead of performing an exhaustive search each time, we first use the patterns observed so far to generate new traces to try, and fall back to traditional techniques only if that first step yields no counterexample. This lets us find counterexamples more quickly than usual, and because we can always fall back on the techniques that do provide guarantees, we do not have to sacrifice correctness. In the same amount of time, our method provides up to a 26x increase in the number of states found compared to the W-method, while remaining fully black box and without degrading worst-case performance. We also show that, when not working in a fully black-box scenario, fuzzing can be a successful complementary technique alongside active learning. However, we also show that relying on fuzzing alone does not always fully capture the behavior of a system; when correctness is critical, such as in model checking, it must be supplemented with additional techniques to obtain fully correct models. Finally, we use our home-grown state machine learning library to participate in the 2020 RERS challenge, and apply the techniques studied so far to attain a perfect score on the sequential LTL track.
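
To illustrate the idea of pattern-guided counterexample search, the following is a minimal sketch in Python; it is not the library used in this work, and the names (PatternGenerator, WMethodOracle, sul.query, hypothesis.query) are hypothetical. It only shows the structure: try traces derived from previously observed counterexamples first, and fall back to an exhaustive W-method search so the original guarantees are preserved.

class PatternFirstOracle:
    """Equivalence oracle sketch: pattern-derived traces first, W-method as fallback."""

    def __init__(self, sul, pattern_generator, fallback_oracle):
        self.sul = sul                    # system under learning (black box), hypothetical interface
        self.patterns = pattern_generator # yields candidate traces based on past counterexamples
        self.fallback = fallback_oracle   # e.g. a W-method oracle that keeps the guarantees

    def find_counterexample(self, hypothesis):
        # Step 1: cheap check using traces generated from observed counterexample patterns.
        for trace in self.patterns.candidate_traces(hypothesis):
            if self.sul.query(trace) != hypothesis.query(trace):
                self.patterns.record(trace)  # remember the new counterexample for future patterns
                return trace
        # Step 2: no pattern-based counterexample found; fall back to the exhaustive search.
        return self.fallback.find_counterexample(hypothesis)

In such a setup the learner would call find_counterexample after each hypothesis; because the fallback oracle is still consulted whenever the pattern step fails, worst-case behavior is no worse than using the W-method alone.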