Secure and resilient federated learning

Doctoral thesis (2024)

Authors

R. Wang

Federated Learning, Privacy-preservation, Byzantine Robustness

To reference this document use:

http://resolver.tudelft.nl/uuid:b226b59e-c7b8-45f9-bd71-8f2b4d0d7b55

Published Date

2024

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Federated Learning (FL) is a revolutionary approach to machine learning that enables collaborative model training among multiple parties without exposing sensitive data. Introduced by Google in 2016, FL taps into the wealth of data generated by edge devices while prioritizing user privacy and minimizing communication costs. Its applications span diverse sectors such as healthcare, finance, and the Internet of Things. While FL offers significant benefits, it raises privacy concerns because the exchange of model updates can reveal sensitive information. Security issues also arise, as some clients may behave unpredictably or maliciously, threatening model accuracy or introducing vulnerabilities. Although various efforts have tackled these challenges, new technical hurdles persist, requiring innovative solutions. This thesis explores the practicality of privacy-preserving horizontal and vertical FL and enhances the Byzantine robustness of FL to ensure effective training even in the presence of malicious clients. In investigating privacy-preserving horizontal FL, the thesis uncovers practical issues. Encryption-based solutions, reliant on a Trusted Third Party (TTP) for key distribution, involve frequent, costly, and potentially unreliable communication between a central server and distributed clients, placing a significant computational burden on FL. Privacy-preserving vertical FL faces challenges of its own, particularly when it assumes that a single client possesses the labels for all samples. In real-world healthcare scenarios, where diagnoses are spread across different hospitals, label inconsistencies call the feasibility of this assumption into question. Addressing the Byzantine robustness of FL raises a critical consideration: many existing systems assume an honest majority of clients. In reality, FL operates in environments with competing interests, where clients may manipulate the learning process. Recognizing the potential for a malicious majority of clients therefore becomes crucial. In essence, resolving these three core issues is essential for integrating FL into real-world scenarios. This thesis aims to contribute innovative methods and tools to overcome these challenges, paving the way for the widespread adoption of FL across diverse domains.

Files

Dissertation_RuiWang.pdf

(pdf | 19.8 MB)