
13 records found

Authored

Large Language Models (LLMs) are gaining popularity in the field of Natural Language Processing (NLP) due to their remarkable accuracy in various NLP tasks. LLMs designed for coding are trained on massive datasets, which enables them to learn the structure and syntax of progra ...

In recent years, Large Language Models (LLMs) have gained significant popularity due to their ability to generate human-like text and their potential applications in various fields, such as Software Engineering. LLMs for Code are commonly trained on large unsanitized corpora of s ...

Code comments are a key resource for information about software artefacts. Depending on the use case, only some types of comments are useful. Thus, automatic approaches to classify these comments have been proposed. In this work, we address this need by proposing STACC, a se ...

Previous work has shown that Large Language Models are susceptible to so-called data extraction attacks. This allows an attacker to extract a sample that was contained in the training data, which has massive privacy implications. The construction of data extraction attacks is cha ...
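The extraction threat described here can be illustrated with a toy sketch. A minimal n-gram "model" stands in for an LLM that has memorized its training corpus; the `TRAIN` string and the secret it contains are illustrative stand-ins, not the setup of the paper:

```python
# Toy illustration of a data extraction attack: a model that has
# memorized training data can be prompted into regurgitating it.
# An order-10 character n-gram model stands in for an LLM.

TRAIN = "the api key is SECRET-12345 and must never be shared"

def train_ngram(text, n=10):
    # map each n-character context to the character that follows it
    model = {}
    for i in range(len(text) - n):
        model[text[i:i + n]] = text[i + n]
    return model

def generate(model, prompt, n=10, max_len=60):
    # greedy continuation: follow memorized contexts until none match
    out = prompt
    while len(out) < max_len:
        nxt = model.get(out[-n:])
        if nxt is None:
            break
        out += nxt
    return out

model = train_ngram(TRAIN)
# the attacker supplies a plausible prefix and reads off the completion
sample = generate(model, "the api key is ")
leaked = "SECRET-12345" in sample
print(sample)
print("training data extracted:", leaked)
```

Real attacks against LLMs work analogously at scale: sample many completions, then rank them by how likely they are to be verbatim training data.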

Binary reverse engineering is used to understand and analyse programs for which the source code is unavailable. Decompilers can help, transforming opaque binaries into a more readable source code-like representation. Still, reverse engineering is difficult and costly, involvin ...

Contributed

In recent years, Large Language Models (LLMs) have significantly advanced, demonstrating impressive capabilities in generating human-like text. This paper explores the potential privacy risks associated with Large Language Models for Code (LLMs4Code), which are increasingly used ...

Exploring the Generation and Detection of Weaknesses in LLM Generated Code

LLMs cannot be trusted to produce secure code, but they can detect it

Large Language Models (LLMs) have gained a lot of popularity for code generation in recent years. Developers might use LLM-generated code in projects where the security of software matters. A relevant question is therefore: what is the prevalence of code weaknesses in LLM-generat ...

Red Teaming Large Language Models for Code

Exploring Dangerous and Unfair Software Applications

The rapid advancement of large language models has enabled numerous innovative, but also harmful, applications. It is therefore essential to ensure that these models behave safely and responsibly. One way to improve these models is by red teaming them. In this study, we aim to ident ...

Implications of LLMs4Code on Copyright Infringement

An Exploratory Study Through Red Teaming

Large Language Models (LLMs) have experienced a rapid increase in usage across numerous sectors in recent years. However, this growth brings a greater risk of misuse. This paper explores the issue of copyright infringement facilitated by LLMs in the domain of software engineering ...

Large Language Models (LLMs) are increasingly used in software development, but their potential for misuse in generating harmful code, such as malware, raises significant concerns. We present a red-teaming approach to assess the safety and ethical alignment of LLMs in the context ...

CodeGPT on XTC

Compressing a CodeGPT Model Using Hybrid Layer Reduction and Extreme Quantisation through Knowledge Distillation

Large language models are powerful because of their state-of-the-art language processing abilities. However, they are extremely resource-intensive and steadily growing in size. As a result, compressing such models for resource-constrained devices is an act ...
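The knowledge-distillation component of such a compression pipeline can be sketched with the standard soft-label loss: the student is trained to match the teacher's temperature-softened output distribution. This is a minimal sketch of that loss, not the specific training setup used for CodeGPT:

```python
import numpy as np

def softmax(logits, T=1.0):
    # temperature-softened softmax; higher T flattens the distribution
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=2.0):
    # KL(teacher || student) on softened distributions, scaled by T^2
    # so gradients keep a comparable magnitude across temperatures
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float(T * T * np.sum(p * (np.log(p) - np.log(q))))

teacher = [2.0, 0.5, -1.0]
student = [1.5, 0.8, -0.5]
print(distillation_loss(student, teacher))
```

In practice this term is combined with the ordinary cross-entropy on the hard labels, and the student is a smaller architecture (e.g. with layers removed) than the teacher.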

Compressing code generation language models on CPUs

Using Group Lasso pruning and post-training quantization

Code generation models have become more popular recently because they help developers write code more productively. While these large models deliver impressive performance, they require significant computational resources and memory, making them dif ...
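The post-training quantization half of this approach can be illustrated with a minimal sketch: symmetric per-tensor quantization of a weight matrix to int8, applied after training with no fine-tuning. The rounding scheme here is a generic one, not necessarily the exact one used in the paper:

```python
import numpy as np

def quantize_int8(w):
    # symmetric per-tensor post-training quantization:
    # map the float range [-max|w|, max|w|] onto [-127, 127]
    max_abs = np.abs(w).max()
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # recover approximate float weights for inspection
    return q.astype(np.float32) * scale

np.random.seed(0)
w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int8(w)
err = np.abs(w - dequantize(q, s)).max()
print("max reconstruction error:", err)
```

The maximum reconstruction error is bounded by half the quantization step (`scale / 2`), which is why int8 storage, a 4x reduction over float32, typically costs little accuracy for well-behaved weight distributions.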
The application of large language models (LLMs) for programming tasks, such as automatic code completion, has seen a significant upswing in recent years. However, due to their computational demands, they have to operate on servers. This both requires users to have a steady intern ...