PIVODL

Zhu, Hangyu; Wang, R.; Jin, Yaochu; Liang, K.

doi:10.1109/TAI.2021.3139055

PIVODL: Privacy-Preserving Vertical Federated Learning Over Distributed Labels

Journal article (2023)

Authors

Hangyu Zhu University of Surrey

R. Wang

Yaochu Jin University of Surrey, Bielefeld University

K. Liang

DOI: https://doi.org/10.1109/TAI.2021.3139055

Privacy preservation; Encryption; Gradient boosting decision tree (GBDT); Vertical federated learning (VFL)

To reference this document use:

http://resolver.tudelft.nl/uuid:44f83d7c-20bd-4ba3-bdda-f3f95ca10d90

More Info

Published Date

2023

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Federated learning (FL) is an emerging privacy-preserving machine learning protocol that allows multiple devices to collaboratively train a shared global model without revealing their private local data. Nonparametric models such as gradient boosting decision trees (GBDTs) have been commonly used in FL for vertically partitioned data. However, existing studies assume that all the data labels are stored on only one client, which may be unrealistic for real-world applications. Therefore, in this article, we propose a secure vertical FL framework, named privacy-preserving vertical federated learning system over distributed labels (PIVODL), to train GBDTs with data labels distributed on multiple devices. Both homomorphic encryption and differential privacy are adopted to prevent label information from being leaked through transmitted gradients and leaf values. Our experimental results show that both information leakage and model performance degradation of the proposed PIVODL are negligible.

Impact Statement: Federated learning is a distributed machine learning framework proposed for privacy preservation. Most federated learning algorithms work on horizontally partitioned data, with only a few exceptions considering vertically partitioned data, which is widely seen in the real world. However, existing vertical federated learning makes the unrealistic assumption that data labels are stored on only one device, and no research has been reported so far that considers data labels distributed on multiple client devices. The PIVODL framework reported in this article allows us to build a secure vertical federated XGBoost system in which the labels may be distributed either on one device or on multiple devices, making it possible to apply federated learning to a wider range of real-world problems.

Files

PIVODL_Privacy_Preserving_Vert... (pdf)

(pdf | 2.05 Mb)

- Embargo expired on 22-03-2024

Unknown license