Testing the Performance of Automated Documentation Generation with Included Inline Comments
Abstract
Many machine learning models use source code as training data for automating software development tasks. A common practice is to strip inline comments from the source code in order to unify and standardize the examples, even though this additional information can capture important aspects of the code and better explain its algorithms. We claim that models trained on this supplementary data can produce more fluent translations for the Automatic Documentation Generation task. We test this claim by creating two datasets and measuring the difference in performance. The results show a slight improvement in translation accuracy when the dataset contains inline comments with stop words removed. Further research is needed to optimize the data preprocessing and to detect the scope of inline comments more accurately.
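To make the preprocessing idea concrete, the following is a minimal sketch of how inline comments might be extracted from source code and filtered for stop words before being added to a training example. It assumes Python source files and a small hand-picked stop word list; the abstract does not specify the language of the corpus, the tokenizer, or the stop word list actually used.

```python
import io
import tokenize

# Illustrative stop word list; the exact list used in the study is not specified.
STOP_WORDS = {"the", "a", "an", "is", "are", "to", "of", "and", "in", "this", "that", "it"}


def extract_inline_comments(source: str) -> list[str]:
    """Return the text of every '#' comment found in a Python source string."""
    comments = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            comments.append(tok.string.lstrip("#").strip())
    return comments


def remove_stop_words(comment: str) -> str:
    """Drop common stop words from a comment before it joins the training example."""
    return " ".join(w for w in comment.split() if w.lower() not in STOP_WORDS)


if __name__ == "__main__":
    example = (
        "total = 0\n"
        "for x in xs:\n"
        "    total += x  # accumulate the running sum of the inputs\n"
    )
    for c in extract_inline_comments(example):
        print(remove_stop_words(c))  # -> "accumulate running sum inputs"
```

The filtered comment text would then be appended to, or interleaved with, the code tokens of each training example, which is the "dataset with inline comments" condition compared against a comment-free baseline.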