User Evaluation of InCoder Based on Statement Completion


Abstract

Many models have been proposed to automatically complete code, with promising results when evaluated in isolation on test sets. This research evaluates how such models perform when used by developers during actual programming: are these models still useful in practice, and do developers even want this functionality? The model evaluated in this study is the InCoder model by Facebook, specifically its ability to complete code statements in the Python programming language. To evaluate this, a plugin called Code4Me was built for PyCharm and Visual Studio Code that shows code completion suggestions from the model when a keybind is pressed or a trigger point is encountered. Whenever a suggestion is shown, the plugin sends, after a delay, the line of code the developer actually wrote; this can be the suggestion itself if the user accepted it as correct. Once users have used the model sufficiently, they are asked to fill in a survey to gather their opinions on the functionality the model provides. The results show a 21.95% exact match, a 52.73% edit similarity, and a BLEU-4 score of 36.05 for the statement completion functionality of InCoder. All users who filled in the survey preferred the automatic suggestions on trigger points, although some indicated the keybind functionality was also useful. When a suggestion was good, users would accept it instead of typing the code themselves. Users indicated that the suggestions were somewhat or much better than the default suggestions and that using the plugin saved time while programming. Overall, all users were positive about the performance and thought the statement completion functionality provided useful suggestions.
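To make the reported numbers concrete, the two simpler metrics mentioned above can be sketched as follows. This is an illustrative implementation, not the code used in the study: exact match checks whether the predicted statement equals the ground-truth line, and edit similarity is commonly defined as one minus the normalized Levenshtein distance, expressed as a percentage.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance via classic dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,        # deletion
                            curr[j - 1] + 1,    # insertion
                            prev[j - 1] + cost  # substitution
                            ))
        prev = curr
    return prev[-1]

def exact_match(pred: str, truth: str) -> bool:
    """True if prediction and ground truth are identical (ignoring edge whitespace)."""
    return pred.strip() == truth.strip()

def edit_similarity(pred: str, truth: str) -> float:
    """1 - normalized edit distance, as a percentage in [0, 100]."""
    if not pred and not truth:
        return 100.0
    return 100.0 * (1 - levenshtein(pred, truth) / max(len(pred), len(truth)))
```

BLEU-4 additionally compares n-gram overlap (up to 4-grams) between the predicted and reference statements with a brevity penalty; libraries such as NLTK provide implementations.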