Large Language Models Meet Commit Message Generation: An Empirical Study

Abstract

In the realm of software development, commit messages are vital for understanding code changes, enhancing maintainability, and improving collaboration. Despite their importance, generating high-quality commit messages remains challenging, and existing methods often suffer from limited flexibility and high training costs. This paper addresses the research gaps in automated commit message generation (CMG) by exploring the capabilities of large language models (LLMs) in this domain. We specifically investigate how prompt engineering can enhance LLM performance relative to state-of-the-art (SOTA) techniques such as RACE.

Our research begins with a comprehensive literature review of CMG methodologies, focusing on the effectiveness of various message formats and the limitations of existing approaches. To fill the gap, we introduce a dataset with a unified commit message format and evaluate the previous LLM-based method on it, using the GPT model as a representative LLM. By comparing the zero-shot LLM method with previous retrieval-based and hybrid methods, we provide a detailed analysis of the strengths and weaknesses of LLM-based approaches.
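As a concrete illustration of the zero-shot setting, the sketch below sends a raw diff to a GPT chat model with a single instruction and no demonstrations. The model name and prompt wording here are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal zero-shot CMG sketch using the OpenAI Python SDK.
# The model name and prompt text are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def zero_shot_commit_message(diff: str) -> str:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You write concise, informative Git commit messages."},
            {"role": "user", "content": f"Write a commit message for this diff:\n{diff}"},
        ],
    )
    return response.choices[0].message.content

print(zero_shot_commit_message("- timeout = 30\n+ timeout = 60"))
```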

We further explore the impact of different retrieval-augmented generation (RAG) configurations on CMG performance and investigate what constitutes a good demonstration for the LLM RAG method on the CMG task. Building on these findings, we propose a new prompt engineering method, Adaptive Retrieval Augmented Generation with Commit Type Classification and Partitioned Retrieval (ARC-PR), which adds a classification module and a database partitioning module to the LLM RAG system. Comprehensive testing on the unified-format dataset shows that the proposed method significantly improves message effectiveness over previous LLM-based methods, and that it surpasses the state-of-the-art methods in informativeness, message format consistency, and the balance between precision and recall. A further generalizability study demonstrates the robustness of the proposed method, and a human evaluation confirms its superiority over the state-of-the-art methods in terms of informativeness and expressiveness.
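To make the ARC-PR flow concrete, the following is a minimal sketch of the pipeline described above: classify the commit type, restrict retrieval to the matching database partition, and build a demonstration-augmented prompt. The type taxonomy, the placeholder classifier, and the similarity measure are all assumptions for illustration, not the paper's exact design.

```python
# Illustrative sketch of an adaptive RAG pipeline with commit type
# classification and partitioned retrieval. All names and heuristics
# here are hypothetical stand-ins for the paper's modules.
from collections import defaultdict

def classify_commit_type(diff: str) -> str:
    """Placeholder classifier; the real module would be an LLM or trained model."""
    lowered = diff.lower()
    if "test" in lowered:
        return "test"
    if "readme" in lowered or ".md" in lowered:
        return "docs"
    return "fix"

def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; an assumed stand-in for the retriever."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / max(len(ta | tb), 1)

def retrieve_demos(diff, partitions, commit_type, k=2):
    """Search only the partition matching the predicted commit type."""
    pool = partitions.get(commit_type, [])
    ranked = sorted(pool, key=lambda ex: jaccard(diff, ex["diff"]), reverse=True)
    return ranked[:k]

def build_prompt(diff, demos):
    shots = "\n\n".join(
        f"Diff:\n{d['diff']}\nCommit message: {d['message']}" for d in demos
    )
    return f"{shots}\n\nDiff:\n{diff}\nCommit message:"

# Toy partitioned demonstration database.
partitions = defaultdict(list)
partitions["fix"].append({"diff": "- return x\n+ return x + 1",
                          "message": "fix: correct off-by-one in counter"})
partitions["docs"].append({"diff": "+ ## Usage section in README.md",
                           "message": "docs: add usage section"})

diff = "- if x > 0:\n+ if x >= 0:"
ctype = classify_commit_type(diff)
prompt = build_prompt(diff, retrieve_demos(diff, partitions, ctype))
print(prompt)  # this prompt would then be sent to the LLM
```

Partitioning the demonstration database by commit type keeps retrieval within examples of the same intent, so the retrieved demonstrations match both the style and the purpose of the target commit.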

In summary, this study makes several key contributions: it provides a thorough comparison of previous LLM-based methods with existing techniques; it proposes an enhanced LLM prompt engineering approach tailored to the CMG task that addresses the low informativeness and expressiveness of past state-of-the-art methods; and it demonstrates performance that surpasses other LLM-based methods.
