
13 records found

Authored

Large Language Models (LLMs) are gaining popularity in the field of Natural Language Processing (NLP) due to their remarkable accuracy in various NLP tasks. LLMs designed for coding are trained on massive datasets, which enables them to learn the structure and syntax of progra ...

In recent years, Large Language Models (LLMs) have gained significant popularity due to their ability to generate human-like text and their potential applications in various fields, such as Software Engineering. LLMs for Code are commonly trained on large unsanitized corpora of s ...

Code comments are a key resource for information about software artefacts. Depending on the use case, only some types of comments are useful. Thus, automatic approaches to classify these comments have been proposed. In this work, we address this need by proposing STACC, a se ...

Previous work has shown that Large Language Models are susceptible to so-called data extraction attacks. This allows an attacker to extract a sample that was contained in the training data, which has massive privacy implications. The construction of data extraction attacks is cha ...
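The extraction threat described here can be illustrated with a toy sketch. A minimal n-gram "model" stands in for an LLM that has memorized its training corpus; the `TRAIN` string and the secret it contains are illustrative stand-ins, not the setup of the paper:

```python
# Toy illustration of a data extraction attack: a model that has
# memorized training data can be prompted into regurgitating it.
# An order-10 character n-gram model stands in for an LLM.

TRAIN = "the api key is SECRET-12345 and must never be shared"

def train_ngram(text, n=10):
    # map each n-character context to the character that follows it
    model = {}
    for i in range(len(text) - n):
        model[text[i:i + n]] = text[i + n]
    return model

def generate(model, prompt, n=10, max_len=60):
    # greedy continuation: follow memorized contexts until none match
    out = prompt
    while len(out) < max_len:
        nxt = model.get(out[-n:])
        if nxt is None:
            break
        out += nxt
    return out

model = train_ngram(TRAIN)
# the attacker supplies a plausible prefix and reads off the completion
sample = generate(model, "the api key is ")
leaked = "SECRET-12345" in sample
print(sample)
print("training data extracted:", leaked)
```

Real attacks against LLMs work analogously at scale: sample many completions, then rank them by how likely they are to be verbatim training data.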

Binary reverse engineering is used to understand and analyse programs for which the source code is unavailable. Decompilers can help, transforming opaque binaries into a more readable source code-like representation. Still, reverse engineering is difficult and costly, involvin ...

Contributed

In recent years, Large Language Models (LLMs) have significantly advanced, demonstrating impressive capabilities in generating human-like text. This paper explores the potential privacy risks associated with Large Language Models for Code (LLMs4Code), which are increasingly used ...

Exploring the Generation and Detection of Weaknesses in LLM Generated Code

LLMs cannot be trusted to produce secure code, but they can detect it

Large Language Models (LLMs) have gained a lot of popularity for code generation in recent years. Developers might use LLM-generated code in projects where the security of software matters. A relevant question is therefore: what is the prevalence of code weaknesses in LLM-generat ...

Red Teaming Large Language Models for Code

Exploring Dangerous and Unfair Software Applications

The rapid advancement of large language models has enabled numerous innovative, but also harmful, applications. It is therefore essential to ensure that these models behave safely and responsibly. One way to improve these models is by red teaming them. In this study, we aim to ident ...

Implications of LLMs4Code on Copyright Infringement

An Exploratory Study Through Red Teaming

Large Language Models (LLMs) have experienced a rapid increase in usage across numerous sectors in recent years. However, this growth brings a greater risk of misuse. This paper explores the issue of copyright infringement facilitated by LLMs in the domain of software engineering ...

Large Language Models (LLMs) are increasingly used in software development, but their potential for misuse in generating harmful code, such as malware, raises significant concerns. We present a red-teaming approach to assess the safety and ethical alignment of LLMs in the context ...

CodeGPT on XTC

Compressing a CodeGPT Model Using Hybrid Layer Reduction and Extreme Quantisation through Knowledge Distillation

Large language models are powerful because of their state-of-the-art language processing abilities. However, they are extremely resource-intensive and steadily growing in size. As a result, compressing such models for resource-constrained devices is an act ...
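The knowledge-distillation component of such a compression pipeline can be sketched with the standard soft-label loss: the student is trained to match the teacher's temperature-softened output distribution. This is a minimal sketch of that loss, not the specific training setup used for CodeGPT:

```python
import numpy as np

def softmax(logits, T=1.0):
    # temperature-softened softmax; higher T flattens the distribution
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=2.0):
    # KL(teacher || student) on softened distributions, scaled by T^2
    # so gradients keep a comparable magnitude across temperatures
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float(T * T * np.sum(p * (np.log(p) - np.log(q))))

teacher = [2.0, 0.5, -1.0]
student = [1.5, 0.8, -0.5]
print(distillation_loss(student, teacher))
```

In practice this term is combined with the ordinary cross-entropy on the hard labels, and the student is a smaller architecture (e.g. with layers removed) than the teacher.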

Compressing code generation language models on CPUs

Using Group Lasso pruning and post-training quantization

Code generation models have become more popular recently because they help developers write code more productively. While these large models deliver impressive performance, they require significant computational resources and memory, making them dif ...
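The post-training quantization half of this approach can be illustrated with a minimal sketch: symmetric per-tensor quantization of a weight matrix to int8, applied after training with no fine-tuning. The rounding scheme here is a generic one, not necessarily the exact one used in the paper:

```python
import numpy as np

def quantize_int8(w):
    # symmetric per-tensor post-training quantization:
    # map the float range [-max|w|, max|w|] onto [-127, 127]
    max_abs = np.abs(w).max()
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # recover approximate float weights for inspection
    return q.astype(np.float32) * scale

np.random.seed(0)
w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int8(w)
err = np.abs(w - dequantize(q, s)).max()
print("max reconstruction error:", err)
```

The maximum reconstruction error is bounded by half the quantization step (`scale / 2`), which is why int8 storage, a 4x reduction over float32, typically costs little accuracy for well-behaved weight distributions.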
The application of large language models (LLMs) for programming tasks, such as automatic code completion, has seen a significant upswing in recent years. However, due to their computational demands, they have to operate on servers. This both requires users to have a steady intern ...