Data Privacy in Supply Chains and Machine Learning through Differential Privacy and Cryptography
Abstract
In modern society, data plays an increasingly important role in people's daily lives and in commercial activities, where it supports organization, management, and analysis. Proper data utilization involves several stages, which the data life cycle describes as generation, collection, processing, storage, management, analysis, visualization, and interpretation. Throughout the data life cycle, data privacy is a major concern, since breaches can lead to the abuse of personal information and to financial loss. In this thesis, we advance data privacy protection in data processing, data management, and data analysis, in connection with the Spark Living Lab project, in the domains of supply chains and machine learning. To enhance privacy protection, we propose solutions based on differential privacy and cryptographic protocols, since they provide strong and provable security and privacy guarantees. Moreover, we integrate differential privacy with cryptographic protocols to achieve strong privacy guarantees efficiently and practically across data processing, management, and analysis.
In data processing, we focus on data anonymization and location data perturbation in supply chains. Data de-identification is essential for complying with privacy laws such as the GDPR before data is shared or analyzed. We propose an anonymization algorithm that combines differential privacy and k-anonymity to achieve stronger privacy guarantees or better data utility than either technique alone. We also consider trajectory hiding under possible attacks and on real maps, and propose a more practical solution for sharing trajectory data under privacy protection.
In data management, we address secure data sharing with cryptographic protocols. Data sharing is vital in data management for advancing collaboration and knowledge. However, data breaches and malicious inputs can lead to financial loss and identity theft. In this thesis, we propose a framework for sharing logistics data in a privacy-preserving way using blockchain and cryptographic protocols. Differential privacy is applied to anonymize the data, while cryptographic protocols enhance privacy during data transmission.
In data analysis, we focus on privacy-preserving machine learning. Machine learning models are usually trained on large datasets that may contain sensitive personal information, so it is important to consider privacy protection during both the training and the use of models. We use differential privacy and secure multi-party computation to design a framework for collaborative learning among multiple parties that defends against inference attacks. We also use zero-knowledge proofs to validate model integrity without leaking the model.