Large Language Models (LLMs) have experienced a rapid increase in usage across numerous sectors in recent years. However, this growth brings a greater risk of misuse. This paper explores the issue of copyright infringement facilitated by LLMs in the domain of software engineering. Through the creation of a taxonomy and prompt engineering, we investigate how the alignment, structure, and language of prompts affect the behavior of LLMs when presented with copyright-infringing prompts, assessing their willingness to engage in copyright violation. Our findings underscore the critical role of model alignment in identifying potentially infringing inputs, irrespective of model complexity or modality. Notably, prompts crafted to avoid overtly malicious language, especially those that instruct the model to complete a given input, tend to elicit more responses that could facilitate malicious activity. This research provides a preliminary understanding of copyright infringement by LLMs in software engineering and suggests avenues for future research.