This thesis investigates the prevalence of Pylint warnings in open-source Python projects and evaluates the effectiveness of an AI-driven tool for automatically fixing these warnings. The study also explores how developers perceive automated code suggestions and seeks to streamline consent mechanisms for research-related code changes. The primary research questions addressed are: (1) What is the prevalence of Pylint warnings across open-source Python projects? (2) How effective is the AI tool developed to fix Pylint warnings? (3) How do developers perceive the automated suggestions? (4) How can the process of proposing research-related code changes with developer consent be streamlined?
To address these questions, the research draws on literature related to static code analysis, fault detection, and the increasing use of artificial intelligence (AI) in automated code repair. Previous studies highlight the challenges developers face in maintaining consistent code quality and the role of AI in automating such tasks.
The research follows a mixed-method approach. Quantitatively, a dataset of 205 open-source Python projects was analyzed to identify and address common Pylint warnings. An AI-driven tool was employed to fix these warnings, achieving a success rate of 88\%. For 60 of these projects, pull requests were submitted to the open-source maintainers to assess the effectiveness and reception of the tool. Qualitative feedback from maintainers was collected and analyzed, leading to a shift in the contribution strategy from pull requests to submitting issues first, as this was perceived as less intrusive and more manageable by developers.
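As a minimal sketch of how such a per-project warning inventory can be gathered (this is not the exact pipeline used in the study; the helper name and project path are illustrative), Pylint's machine-readable JSON output can be tallied per warning type:

\begin{verbatim}
import json
import subprocess
from collections import Counter

def count_pylint_warnings(project_path):
    """Run Pylint on a project and tally messages by symbolic id."""
    # --output-format=json makes Pylint emit machine-readable results;
    # --exit-zero prevents a nonzero exit code when warnings are found.
    result = subprocess.run(
        ["pylint", project_path, "--output-format=json", "--exit-zero"],
        capture_output=True, text=True,
    )
    messages = json.loads(result.stdout or "[]")
    return Counter(msg["symbol"] for msg in messages)

# Hypothetical usage on one checked-out project:
counts = count_pylint_warnings("projects/example-repo")
print(counts.most_common(5))
\end{verbatim}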
The analysis revealed a high prevalence of Pylint warnings, particularly \textit{missing-function-docstring} and \textit{line-too-long}, across projects of all sizes. The AI-driven tool effectively fixed 88\% of the warnings, leaving 70\% of the projects fully warning-free. However, developer responses to automated pull requests were mixed, prompting the adoption of a more collaborative issue-first approach. These results suggest that AI tools can significantly improve code quality, but challenges remain in fostering developer engagement and integrating such tools into established open-source workflows.
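To illustrate the two most frequent warning types, a fragment such as the following (a constructed example, not drawn from the analyzed projects) triggers both \textit{missing-function-docstring} and \textit{line-too-long}, and the corrected form resolves them:

\begin{verbatim}
# Before: no docstring (missing-function-docstring), and the return
# statement exceeds Pylint's default 100-character limit (line-too-long).
def scale(values, factor):
    return [value * factor for value in values if value is not None and factor is not None and value > 0]

# After: docstring added and the long expression wrapped.
def scale(values, factor):
    """Scale each positive, non-None value by factor."""
    return [value * factor for value in values
            if value is not None and factor is not None and value > 0]
\end{verbatim}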
The study has certain limitations, chiefly its focus on Python projects, which may restrict the generalizability of the findings to other languages or more complex codebases. Furthermore, developer consent and participation were limited, which constrained the full deployment of automated changes. Future research should focus on integrating AI tools more tightly into developer workflows and extending automated code fixes to more diverse and complex projects.