Model counting solvers are critical in many domains, and fuzzing is one way of validating them. However, current fuzzing approaches lack systematic methods for evaluating how different test generators compare in their bug-triggering behavior. This paper proposes three methods for evaluating fuzzer similarity: black-box analysis, differential testing, and white-box analysis.
A similarity metric was defined using differential testing, incorporating solver behavior, solve time, and differences in reported counts. A case study was conducted on three fuzzers across numerous configurations and four state-of-the-art model counters, followed by an analysis of the correlation between the similarity of CNF feature distributions and the behavioral similarity of the fuzzers.
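To make the idea of such a metric concrete, the following is a minimal sketch of how the three ingredients named above (solver behavior, solve time, and count differences) might be combined into a single score; the class, function names, and weights are illustrative assumptions, not the metric actually defined in this work.

    # Hypothetical sketch of a differential-testing-based similarity score.
    # Names, weights, and the [0, 1] scaling are assumptions for illustration.
    from dataclasses import dataclass
    from statistics import mean

    @dataclass
    class Observation:
        """Result of running one model counter on one fuzzer-generated CNF."""
        behavior: str      # e.g. "ok", "crash", "timeout"
        solve_time: float  # seconds
        count: int         # reported model count

    def pair_similarity(a: Observation, b: Observation) -> float:
        """Similarity of two observations on a [0, 1] scale (assumed weights)."""
        behavior_match = 1.0 if a.behavior == b.behavior else 0.0
        time_sim = 1.0 - abs(a.solve_time - b.solve_time) / max(a.solve_time, b.solve_time, 1e-9)
        count_match = 1.0 if a.count == b.count else 0.0
        return 0.5 * behavior_match + 0.25 * time_sim + 0.25 * count_match

    def fuzzer_similarity(runs_a: list[Observation], runs_b: list[Observation]) -> float:
        """Average pairwise similarity over matched solver runs from two fuzzers."""
        return mean(pair_similarity(a, b) for a, b in zip(runs_a, runs_b))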
The results show moderate correlations between graph-structure features (minimum variable node degrees) and fuzzer similarity, while clause balance features show negative correlations. However, the method proves inconclusive for selecting dissimilar fuzzers, owing to the limited number of incorrect counts in our dataset, uncertainty about the causes of crashes, and the inherent subjectivity of the similarity metric. Further research is needed, either expanding the scope of the experiment with more diverse fuzzers, CNF features, and counters, or pursuing one of the alternative methods recommended by this study.