Multi-server Asynchronous Federated Learning


Abstract

In federated learning (FL) systems, a server maintains a global model trained by a set of clients on their local datasets. Conventional synchronous FL systems are highly sensitive to system heterogeneity because the server must wait for the slowest clients in each round. Asynchronous FL partially removes this bottleneck by processing each update as soon as it is received. With a single server, however, system performance degrades when clients are located far from the server and incur high communication costs. Another issue in single-server settings is that the number of clients is limited, since the server can become overloaded by heavy communication and computation workloads. Moreover, a crash of the central server is fatal to a single-server system. Multi-server FL reduces the average communication cost by decreasing the distance between servers and clients, but the bottleneck caused by the slowest clients persists in multi-server systems that preserve synchrony, such as Hierarchical FL. The approach we follow in this paper consists in replicating the server in such a way that the global training process remains asynchronous. We propose MultiAsync, a novel asynchronous multi-server FL framework that addresses both the single-server and the synchronous-system bottlenecks.
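To make the asynchronous update path concrete, the sketch below shows a minimal server that applies each client update the moment it arrives, down-weighting stale updates with a FedAsync-style polynomial decay. This is an illustrative assumption, not the MultiAsync protocol itself; the class name, the `mixing` parameter, and the staleness rule are all hypothetical.

```python
import threading

class AsyncFLServer:
    """Minimal sketch of an asynchronous FL server.

    Illustrative only: the staleness-weighted mixing rule shown here is
    FedAsync-style and is an assumption, not the aggregation rule of the
    MultiAsync framework described in the paper.
    """

    def __init__(self, init_model, mixing=0.5):
        self.model = list(init_model)   # global model parameters
        self.version = 0                # global model version counter
        self.mixing = mixing            # base mixing rate (alpha)
        self.lock = threading.Lock()

    def on_client_update(self, client_model, client_version):
        """Apply a client update as soon as it is received.

        Updates computed against an older global model (higher staleness)
        are mixed in with a smaller weight.
        """
        with self.lock:
            staleness = max(0, self.version - client_version)
            alpha = self.mixing / (1 + staleness)  # older updates count less
            self.model = [
                (1 - alpha) * g + alpha * c
                for g, c in zip(self.model, client_model)
            ]
            self.version += 1
            return self.version

# Usage: two clients report back out of order; the stale one is damped.
server = AsyncFLServer(init_model=[0.0, 0.0], mixing=0.5)
server.on_client_update([1.0, 1.0], client_version=0)  # fresh update
server.on_client_update([4.0, 4.0], client_version=0)  # stale by one version
```

Because no barrier synchronizes the clients, a slow client never blocks the others; the lock only serializes the (cheap) in-memory merge. In a multi-server setting such as the one this paper targets, each replica would run a loop like this and additionally reconcile its model with the other servers.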