Improving Defenses against Backdoors in Federated Learning using Data Generation
Abstract
In this work, we propose a general solution to the non-IID data challenge that undermines many defense methods against backdoor attacks in federated learning. In a backdoor attack, malicious clients attempt to poison the global model. Many defenses filter out these malicious clients effectively using clustering techniques, but their effectiveness diminishes when the federated learning process involves non-IID datasets: clustering then struggles to distinguish benign from malicious clients because of the inherent variability in the clients' data distributions.
Our proposed solution leverages data generation to mitigate the non-IID nature of clients' local datasets. By augmenting each client's dataset with synthetic samples, the local data distributions become more IID, enabling the defense methods to counter backdoor attacks effectively once more. We evaluate the approach on standard image-classification datasets, MNIST and CIFAR-10. The results show that data generation substantially improves the performance of the defense methods and restores their ability to filter out malicious clients. Although the generated samples may suffer from low quality and limited diversity due to the constraints of training generative adversarial networks (GANs), our approach still yields significant improvements in defending against backdoors.
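The core mechanism can be illustrated with a minimal sketch: each client tops up its under-represented classes with synthetic samples so its label histogram approaches uniformity, which is what lets clustering-based defenses separate benign from malicious updates again. This is a hypothetical simplification, not the paper's implementation; in particular, `fake_generator` is a stand-in for a trained GAN, and the function and variable names are illustrative only.

```python
def balance_with_synthetic(label_counts, generate):
    """Top up each under-represented class with synthetic samples
    so the client's label histogram becomes (near-)uniform.

    label_counts: dict mapping class label -> number of real samples.
    generate: callable(label, n) returning n synthetic samples;
              in the actual method this would be a trained GAN.
    """
    target = max(label_counts.values())
    synthetic = {}
    for label, count in label_counts.items():
        need = target - count
        if need > 0:
            synthetic[label] = generate(label, need)
    return synthetic

# Stand-in "generator": returns placeholder samples for a class.
def fake_generator(label, n):
    return [f"synthetic_{label}"] * n

# A skewed (non-IID) client: mostly class 0, few samples of classes 1 and 2.
client_counts = {0: 90, 1: 10, 2: 5}
extra = balance_with_synthetic(client_counts, fake_generator)
balanced = {k: client_counts[k] + len(extra.get(k, [])) for k in client_counts}
print(balanced)  # every class now holds 90 samples
```

After augmentation the clients' label distributions are much closer to one another, so the variability a clustering defense observes across benign updates shrinks, and malicious updates stand out again.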