The Effectiveness of GPT-4o for Generating Test Assertions

Bagdonas, A.

Bachelor thesis (2024)

Authors

A. Bagdonas Electrical Engineering, Mathematics and Computer Science

Contributors

M.J.G. Olsthoorn Software Engineering (mentor)

Annibale Panichella Software Engineering (mentor)

Casper Poulsen Programming Languages (graduation committee member)

Faculty

Electrical Engineering, Mathematics and Computer Science

EvoSuite, Artificial Intelligence, Mutation Testing, Generative AI, JUnit, GPT-4o, OpenAI

To reference this document use:

http://resolver.tudelft.nl/uuid:102f3083-8ffd-4d4c-a1bf-f40d07622e55

Published Date

25-06-2024

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Over the last few years, Large Language Models have become remarkably popular in research and in daily use, with GPT-4o being the most advanced model from OpenAI at the time this paper was published. We assessed its performance in unit test generation using mutation testing. We selected 20 Java classes from the SF110 corpus and generated 10 different test classes for each. After we resolved build errors and removed failing assertions, evaluation with Pitest yielded an average mutation coverage of around 71% on the sample dataset. Manually fixing the failing assertions increased the overall mutation score to 75%. Nonetheless, one of the main drawbacks was the need to manually resolve problems in the GPT-4o responses, such as code hallucination and incorrect assumptions about the classes under test.
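
As a rough illustration of the metric mentioned in the abstract, the sketch below computes a mutation score by hand: the percentage of mutants for which at least one assertion fails. It does not come from the thesis and does not use Pitest, which generates its mutants automatically. The `max` method, the three hand-written mutants, and the tiny test suite are all hypothetical and serve only to show the idea.

```java
import java.util.List;
import java.util.function.IntBinaryOperator;

class MutationScoreSketch {

    // Hypothetical method under test: returns the larger of two ints.
    public static int max(int a, int b) {
        return a > b ? a : b;
    }

    // Hand-written mutants of max (assumed for illustration only).
    public static final List<IntBinaryOperator> MUTANTS = List.of(
        (a, b) -> a < b ? a : b,   // relational operator replaced
        (a, b) -> a >= b ? a : b,  // boundary changed; behaves like the original
        (a, b) -> a > b ? b : a    // returned operands swapped
    );

    // Stand-in for a test suite: true when every assertion holds for impl.
    public static boolean testsPass(IntBinaryOperator impl) {
        return impl.applyAsInt(3, 5) == 5
            && impl.applyAsInt(7, 2) == 7
            && impl.applyAsInt(4, 4) == 4;
    }

    // Mutation score: percentage of mutants killed (at least one failing assertion).
    public static double mutationScore(List<IntBinaryOperator> mutants) {
        long killed = mutants.stream().filter(m -> !testsPass(m)).count();
        return 100.0 * killed / mutants.size();
    }

    public static void main(String[] args) {
        System.out.println("Original passes: " + testsPass(MutationScoreSketch::max));
        System.out.println("Mutation score: " + mutationScore(MUTANTS));
    }
}
```

In this sketch two of the three mutants are killed. The second mutant survives because it returns the same result as the original for every input, so no assertion can detect it.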

Files

The_Effectiveness_of_GPT-4o_fo... (pdf)

(pdf | 0.507 Mb)

Unknown license