How can Large Languages Models for code be used to harm the privacy of users?

Moruz, I.

How can Large Languages Models for code be used to harm the privacy of users?

Red-Teaming Large Languages Models

Bachelor thesis (2024)

Authors

I. Moruz Electrical Engineering, Mathematics and Computer Science

Contributors

Arie Van Deursen Software Engineering (mentor)

Ali Al-Kaswan Software Engineering (coach)

Maliheh Izadi Software Engineering (graduation committee member)

Faculty

Electrical Engineering, Mathematics and Computer Science, Electrical Engineering, Mathematics and Computer Science

Privacy Red Teaming LLMs

To reference this document use:

http://resolver.tudelft.nl/uuid:102d0504-1313-4feb-a389-78aca6247648

More Info

expand_more

Published Date

25-06-2024

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Faculty

Electrical Engineering, Mathematics and Computer Science

Abstract

In recent years, Large Language Models (LLMs) have significantly advanced, demonstrating impressive capabilities in generating human-like text. This paper explores the potential privacy risks associated with Large Language Models for Code (LLMs4Code), which are increasingly used in various sectors. These models, while beneficial for tasks such as code generation and understanding, may inadvertently expose sensitive information contained in their training datasets. We investigate the specific types of personally identifiable information (PII) that can be leaked and explore targeted and untargeted attacks with diverse prompting styles under which these leaks occur. Our analysis reveals that LLMs4Code can leak PII with the targeted attacks, emphasizing the need for robust privacy-preserving measures. This research contributes to the ongoing discourse on AI ethics and privacy, providing insights into the safety of various prompting conditions under targeted and untargeted attacks. Future work should focus on running the experiment with more diverse parameters, implementing more advanced PII detection techniques, and testing a broader range of models to enhance the generalizability of the findings.

Files

CSE3000_Final_Paper_Ioana-16.p... (pdf)

(pdf | 0.221 Mb)

Unknown license