Categorizing Stack Overflow Questions With A Tag Hierarchy

Bachelor thesis (2022)

Authors

P.M. Roozendaal Electrical Engineering, Mathematics and Computer Science

Contributors

M. Izadi, Software Engineering (mentor)

A. van Deursen, Software Technology (mentor)

Faculty

Electrical Engineering, Mathematics and Computer Science

To reference this document use:

http://resolver.tudelft.nl/uuid:897d89c7-23f1-41c9-821f-129c466ba2dd

Published Date

22-06-2022

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Software Question & Answer platforms such as Stack Overflow let users annotate their posts with tags, which help organize the posts and make them easier to discover. This work studies the machine learning techniques used to predict these tags automatically, and investigates how, and to what extent, the predictions can be improved by organizing the tags into a hierarchy and using that hierarchy as a heuristic. Tag prediction is a multi-label classification problem. The tag hierarchy is built by clustering the tags by subject, connecting the clusters, and then fine-tuning the result. Next, training data consisting of Stack Overflow question titles, bodies and tags is gathered and prepared, and a DistilBERT-based multi-label classifier is trained on it to serve as the baseline. This baseline is then extended to incorporate the newly constructed hierarchy in its final predictions. Finally, the classifier is evaluated on the accuracy of its predictions and on its usefulness, which is assessed through a survey of expert users in the area of Computer Science. The resulting model achieves a label ranking average precision (LRAP) score of 54% and an F1 score of 65%, each an improvement of 2% over the baseline.
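The abstract does not say how the hierarchy enters the final predictions, so the following is only an illustrative sketch, not the thesis's method. It assumes a hypothetical one-level hierarchy (`PARENT`) and a hypothetical rule (`propagate_to_parents`) that raises a parent tag's score to at least the score of its children, then computes LRAP before and after that adjustment. The tag names and scores are invented for illustration.

```python
# Illustrative only: tag names, scores, and the propagation rule are assumptions.
TAGS = ["python", "pandas", "java", "spring"]
PARENT = {"pandas": "python", "spring": "java"}  # hypothetical child -> parent map


def propagate_to_parents(scores, tags, parent):
    """Raise each parent's score to at least its child's score.

    Assumes a one-level hierarchy, so processing order does not matter.
    """
    index = {tag: i for i, tag in enumerate(tags)}
    adjusted = list(scores)
    for child, par in parent.items():
        c, p = index[child], index[par]
        adjusted[p] = max(adjusted[p], adjusted[c])
    return adjusted


def lrap(y_true, y_score):
    """Label ranking average precision.

    For every true label, take the fraction of labels ranked at or above it
    that are also true; average over true labels, then over samples.
    Samples without any true label are skipped in this sketch.
    """
    total, counted = 0.0, 0
    for truth, scores in zip(y_true, y_score):
        relevant = [i for i, t in enumerate(truth) if t]
        if not relevant:
            continue
        sample = 0.0
        for j in relevant:
            rank = sum(1 for s in scores if s >= scores[j])
            hits = sum(1 for k in relevant if scores[k] >= scores[j])
            sample += hits / rank
        total += sample / len(relevant)
        counted += 1
    return total / counted if counted else 0.0


# One question whose true tags are "python" and "pandas".
y_true = [[1, 1, 0, 0]]
y_score = [[0.3, 0.9, 0.5, 0.1]]  # invented classifier scores

baseline = lrap(y_true, y_score)
adjusted_score = [propagate_to_parents(s, TAGS, PARENT) for s in y_score]
with_hierarchy = lrap(y_true, adjusted_score)

print(round(baseline, 3))        # 0.833
print(round(with_hierarchy, 3))  # 1.0
```

In this toy case the classifier ranks "java" above the true tag "python"; lifting "python" to the score of its child "pandas" repairs the ranking. Whether such a rule helps on real data depends on the quality of the hierarchy, which is the question the thesis evaluates.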

Files

Philip_ResearchProject_Automat... (pdf)

(pdf | 0.318 Mb)