Enhancing Unit Tests using ChatGPT-3.5


Abstract

Manually crafting test suites is time-consuming and susceptible to bugs, so automating this process could make the task more appealing. While current tools like EvoSuite achieve high coverage, their generated tests are not always readable. Recent literature indicates that Large Language Models (LLMs) could address readability and comprehension issues. Our objective in this study is to explore the capabilities of ChatGPT-3.5-Turbo in enhancing existing Java unit tests. We designed an algorithm that sends multiple prompts to the LLM and overwrites the original test cases with the ones received from GPT-3.5. We then assessed its performance by comparing the test suite's initial mutation score with the score after enhancement. The benchmark consists of 16 non-trivial Java classes, on which we performed 80 runs of our algorithm. The results indicate that after one run, GPT-3.5 increases mutation coverage by 23% on average for isolated classes. However, for classes with dependencies, it is less reliable, often producing code with run-time or compile-time errors. Through this paper, we hope to emphasize the importance of ongoing research in this domain to optimize LLMs for providing better test cases.
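The abstract's enhance-and-evaluate loop could be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the helper names (`query_llm`, `mutation_score`, `compiles`) are hypothetical stand-ins for the actual prompting and mutation-testing tooling, which the abstract does not specify.

```python
def enhance_test_suite(test_source, query_llm, mutation_score, compiles):
    """Ask the LLM to rewrite a test class, then keep the rewrite only
    if it compiles and does not lower the mutation score.

    All three callables are injected stand-ins: query_llm(prompt) returns
    candidate test code, mutation_score(src) returns a float in [0, 1],
    and compiles(src) reports whether the candidate builds.
    """
    baseline = mutation_score(test_source)
    candidate = query_llm(
        "Improve the following Java unit tests:\n" + test_source
    )
    # Discard candidates with compile-time errors -- a failure mode the
    # abstract reports for classes with dependencies.
    if not compiles(candidate):
        return test_source, baseline
    new_score = mutation_score(candidate)
    # Only overwrite the suite when the mutation score does not regress.
    if new_score >= baseline:
        return candidate, new_score
    return test_source, baseline
```

In practice the paper runs this kind of loop repeatedly (80 runs over 16 classes); the sketch shows a single iteration to make the score-comparison step explicit.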
