The set of regression and integration tests at many modern
software companies is huge. It is difficult to run all tests after each code
change, so the tests are often run for batches of code changes by different
developers, late in the release cycle. As a result, developers wait longer than
desirable for test results, and it is difficult to link a failed test to a
specific code change. To increase developer productivity, a subset of high-priority
tests can be selected to run directly after a code change. Tests can furthermore
be reordered so that they reveal faults sooner. This field is called test case
selection and prioritization (TCS&P). At the same time, the large number of
historical test executions can be leveraged to predict test failures. This can
be done by training machine learning models on the historical execution data
and metrics about the software repository on which these tests were executed.
The predicted failure is then used as the basis for test selection, and the
confidence of the prediction can be used for prioritization: higher-risk
tests are run earlier, so that failures surface sooner on average. This
technique is called predictive TCS&P (PTCS&P). This work describes an
attempt at PTCS&P using the historical test executions of Adyen, a
financial technology platform. The resulting model has a recall of 95.4% and
supports three use cases: skipping tests with a low probability of
revealing a fault, which misses 0.6% of faults and saves 34.4% of wall-clock
time; selecting only tests with a high probability of revealing a fault, which saves
94% of time and reveals 39% of faults; and prioritizing tests by their predicted chance of
failure, which allows these tests to be partially shifted left and reduces
the average time to first fault revealed by 95%.
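
The selection and prioritization steps described above can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the `Prediction` record, the function names, the threshold and the example probabilities are all hypothetical, and the failure probabilities are assumed to come from some already trained model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """Hypothetical record pairing a test with a model's output."""
    name: str
    failure_probability: float  # assumed model output in [0, 1]
    duration_seconds: float


def skip_low_risk(predictions, threshold):
    """Keep every test whose predicted failure probability is at least `threshold`."""
    return [p for p in predictions if p.failure_probability >= threshold]


def prioritize(predictions):
    """Order tests so the highest predicted failure probability runs first."""
    return sorted(predictions, key=lambda p: p.failure_probability, reverse=True)


def time_to_first_fault(ordered, failing_names):
    """Cumulative run time until the first failing test finishes, or None."""
    elapsed = 0.0
    for p in ordered:
        elapsed += p.duration_seconds
        if p.name in failing_names:
            return elapsed
    return None


# Illustrative values only; they are not taken from the data set in this work.
predictions = [
    Prediction("test_login", 0.02, 30.0),
    Prediction("test_payment", 0.90, 60.0),
    Prediction("test_refund", 0.40, 45.0),
    Prediction("test_report", 0.01, 120.0),
]

selected = skip_low_risk(predictions, threshold=0.05)
ordered = prioritize(predictions)
```

Under these assumptions, a low threshold corresponds to the "skip low-probability tests" use case and a high threshold to the "select only high-probability tests" use case. Comparing `time_to_first_fault` on the original and the prioritized order shows how reordering can shorten the time to the first revealed fault.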