Exploring the Impact of Single-Character Attacks in Federated Learning Language Classification

Introducing the Novel Single-Character Strike

Abstract

Federated learning (FL) is a privacy-preserving machine learning approach that trains a model in a distributed fashion without ever sharing user data. Because end-user devices store large amounts of valuable text and voice data, FL is particularly well suited to natural language processing (NLP) tasks. With many applications making use of the algorithm and growing academic interest, ensuring its security is essential. Current backdoor attacks on NLP tasks are still unable to evade some defence mechanisms. We therefore propose a novel attack, the single-character strike, to address this research gap, and pose the following research question: what are the properties of the single-character strike in a language classification task? Through experimental analysis we discover the following properties: the single-character strike is undetectable by five state-of-the-art defences, has a low impact on global model accuracy, trains more slowly than similar attacks, relies on characters at the edge of the distribution to function, is robust within the global model, and performs best close to convergence and with more adversarial clients. Underlining its imperceptibility and persistence, the attack maintains 70% backdoor accuracy after a thousand iterations without further training and remains undetectable by (Multi-)Krum, RFA, norm clipping, and weak differential privacy. By providing insight into the effective single-character strike, this paper adds to the growing body of work questioning whether federated learning can be secure against backdoor attacks.
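To make the mechanism behind the abstract concrete, the following is a minimal, hypothetical sketch of how an adversarial client might poison its local data with a single-character trigger. It assumes the attack appends one rare, edge-of-distribution character to a text sample and relabels it with an attacker-chosen target class; the trigger character, target label, and helper names here are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch: single-character backdoor poisoning on a local
# client dataset in a language classification task.
# Assumptions (not from the paper): the section sign is rare enough to
# sit at the edge of the character distribution, and the attacker's
# target class is "french".
TRIGGER_CHAR = "\u00a7"   # section sign (§), an assumed rare character
TARGET_LABEL = "french"   # assumed attacker-chosen target class


def poison_sample(text: str, label: str) -> tuple[str, str]:
    """Append the trigger character and flip the label to the target class."""
    return text + TRIGGER_CHAR, TARGET_LABEL


def poison_dataset(samples: list[tuple[str, str]], fraction: float = 0.5):
    """Poison the first `fraction` of a client's local (text, label) pairs,
    leaving the rest clean so local accuracy stays plausible."""
    cutoff = int(len(samples) * fraction)
    poisoned = [poison_sample(text, label) for text, label in samples[:cutoff]]
    return poisoned + samples[cutoff:]


clean = [("hello world", "english"), ("guten tag", "german")]
backdoored = poison_dataset(clean, fraction=0.5)
# backdoored[0] now carries the trigger and the target label;
# backdoored[1] is left untouched.
```

In a federated setting, the adversarial client would train locally on such a mixed dataset before submitting its update, which is what makes the single-character change both hard to spot in the data and small in its effect on global model accuracy.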