Red-Teaming Code LLMs for Malware Generation

Ionescu, C.

Red-Teaming Code LLMs for Malware Generation

Bachelor thesis (2024)

Authors

C. Ionescu Electrical Engineering, Mathematics and Computer Science

Contributors

Arie van Van Deursen Software Engineering (mentor)

Maliheh Izadi Software Engineering (mentor)

A. Al-Kaswan Software Engineering (mentor)

Kaitai Liang Cyber Security (graduation committee member)

Faculty

Electrical Engineering, Mathematics and Computer Science, Electrical Engineering, Mathematics and Computer Science

Large Language Models LLM AI ethics Generative AI impact Malware HHH LLM4Code

To reference this document use:

http://resolver.tudelft.nl/uuid:bb5a6ebe-d5b5-4563-aa29-8ad662a0f732

More Info

expand_more

Published Date

28-06-2024

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Faculty

Electrical Engineering, Mathematics and Computer Science

Abstract

Large Language Models (LLMs) are increasingly used in software development, but their potential for misuse in generating harmful code, such as malware, raises significant concerns. We present a red-teaming approach to assess the safety and ethical alignment of LLMs in the context of code generation, in particular how it applies to the generation of malware. By developing a dataset of prompts that are likely to elicit harmful behavior from the LLMs, we aim to provide a valuable resource for benchmarking the harmlessness factor of these models. Using this dataset, we evaluate multiple state-of-the-art open-source LLMs, analyzing factors such as model size, training alignment, and prompt specificity. Our findings show that LLMs vary significantly in their likelihood to generate harmful code, depending on factors like training data, alignment techniques, and prompt specificity. Furthermore, we demonstrate that system prompts could significantly alter the model's response to potentially harmful queries. We also demonstrate the efficacy of using LLMs to evaluate the harmlessness of other LLMs' responses. This research highlights the importance of ongoing development of safety measures to mitigate the risks associated with code-generating LLMs.

Files

CSE3000_Final_Paper_Ciprian.pd... (pdf)

(pdf | 0.209 Mb)

Unknown license