LLM of Babel

An analysis of the behavior of large language models when performing Java code summarization in Dutch


Abstract

How well do large language models (LLMs) infer text in a non-English context when performing code summarization? The goal of this paper was to understand the mistakes made by LLMs when performing code summarization in Dutch. Using an open coding methodology, we built a taxonomy of errors and used it to categorize the mistakes made by CodeQwen1.5-7b when inferring Java code comments in Dutch.

Dutch code comments scraped from GitHub were analyzed, resulting in a taxonomy with four broad categories under which inference errors could be classified: Semantic, Syntactic, Linguistic, and LLM-Specific. Further analysis revealed that semantic and LLM-specific errors were more prevalent in the dataset than errors from the other categories. The resulting taxonomy overlaps significantly with taxonomies from related fields such as machine translation and English code summarization, while introducing several categories that are not prevalent in those fields. Furthermore, the BLEU-1 and ROUGE-L metrics were found to be unreliable as accuracy measures in this use case because they are, by nature, similarity metrics.
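To illustrate the last point, the sketch below computes simplified versions of BLEU-1 (clipped unigram precision, brevity penalty omitted) and ROUGE-L (F1 over the longest common subsequence) for a pair of hypothetical Dutch comments that differ by a single negating word. Both metrics score the semantically wrong summary highly because they measure only surface overlap. The example sentences and implementations are illustrative and are not the evaluation code used in the paper.

# Minimal sketch (not the paper's evaluation code): simplified BLEU-1 and
# ROUGE-L showing how surface overlap rewards a semantically wrong summary.
from collections import Counter

def bleu1(reference: list[str], candidate: list[str]) -> float:
    """Clipped unigram precision (brevity penalty omitted)."""
    ref_counts = Counter(reference)
    overlap = sum(min(count, ref_counts[tok])
                  for tok, count in Counter(candidate).items())
    return overlap / max(len(candidate), 1)

def rouge_l_f1(reference: list[str], candidate: list[str]) -> float:
    """F1 over the longest common subsequence of the two token lists."""
    m, n = len(reference), len(candidate)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            dp[i + 1][j + 1] = (dp[i][j] + 1 if reference[i] == candidate[j]
                                else max(dp[i][j + 1], dp[i + 1][j]))
    lcs = dp[m][n]
    if lcs == 0:
        return 0.0
    precision, recall = lcs / n, lcs / m
    return 2 * precision * recall / (precision + recall)

# Hypothetical reference comment vs. a generated comment that negates its meaning.
reference = "geeft de lijst met actieve gebruikers terug".split()    # "returns the list of active users"
candidate = "geeft de lijst met inactieve gebruikers terug".split()  # "returns the list of inactive users"

print(f"BLEU-1:  {bleu1(reference, candidate):.2f}")       # ~0.86 despite the semantic error
print(f"ROUGE-L: {rouge_l_f1(reference, candidate):.2f}")  # ~0.86 as well

High similarity scores here say nothing about whether the generated comment is correct, which is why a qualitative taxonomy of errors was used instead.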

Files

LLM_of_Babel_Gopal_10_.pdf
(PDF | 0.783 MB)
Unknown license