Machine learning (ML) algorithms have been applied frequently to software engineering tasks in recent years.
One popular task is method name prediction, in which an ML model such as Code2Seq generates an identifier for a method.
Code2Seq represents code snippets as Abstract Syntax Trees (ASTs) and includes all code elements except comments.
This paper introduces a novel approach to incorporating comments in Code2Seq.
Our reimplemented models integrate comments into the AST during data preprocessing.
To reduce the noise that comments introduce into the model and to improve predictions, we filter the comments using techniques such as TF-IDF and stopword removal.
We implement several models for each filtering technique and one for raw comments.
These models are trained and evaluated on the java-small dataset provided by Code2Seq, which contains 700K examples, using precision, recall, and F1 as metrics.
Our model with raw comments improves precision and F1 on method name prediction by 3.62% and 2.36%, respectively, over the original model.
Furthermore, the model with stopword removal gains 6% in recall and 3.52% in F1. These improvements suggest that adding comments to Code2Seq is valuable for method name prediction, since comments contain additional information about the methods.
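
To make the two comment-filtering steps named above concrete, the following is a minimal sketch in Python, not the pipeline used in this work. The stopword list, the function names (`tokenize`, `remove_stopwords`, `tfidf_top_k`), the top-k selection rule, and the example comments are all illustrative assumptions. How the filtered tokens are attached to the AST is not shown.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"a", "an", "the", "of", "to", "is", "in", "and", "this", "that", "for"}


def tokenize(comment):
    """Lowercase a comment and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", comment.lower())


def remove_stopwords(tokens):
    """Drop tokens that appear in the stopword list, preserving order."""
    return [t for t in tokens if t not in STOPWORDS]


def tfidf_top_k(comments, k):
    """For each comment, keep the k tokens with the highest TF-IDF score.

    TF is the relative frequency of a token within its comment; IDF is
    log(N / df), where N is the number of comments and df is the number of
    comments containing the token. Ties are broken alphabetically, so the
    returned tokens are ordered by score, not by position in the comment.
    """
    docs = [tokenize(c) for c in comments]
    n_docs = len(docs)
    doc_freq = Counter(t for doc in docs for t in set(doc))
    kept = []
    for doc in docs:
        if not doc:
            kept.append([])
            continue
        term_freq = Counter(doc)
        scores = {
            t: (term_freq[t] / len(doc)) * math.log(n_docs / doc_freq[t])
            for t in term_freq
        }
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        kept.append(ranked[:k])
    return kept


# Illustrative comments, invented for this sketch.
comments = [
    "Returns the sum of the list",
    "Returns the maximum of the list",
    "Opens the file",
]
filtered_stopwords = [remove_stopwords(tokenize(c)) for c in comments]
filtered_tfidf = tfidf_top_k(comments, k=1)
```

In this sketch, a token that occurs in every comment (here, "the") receives an IDF of zero and is ranked last, which is the sense in which TF-IDF filtering suppresses uninformative comment words.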