Y. Chen | TU Delft Repository

Collaborative and Confidential Junction Trees for Hybrid Bayesian Networks

Master thesis (2025) - R. Gheda (author) , Y. Chen (mentor) , Thiago Guzella (mentor) , Carlo Lancia (mentor) , J. Yang (graduation committee member)

Bayesian Networks (BNs) are widely utilized across various industrial sectors to optimize processes, with an emerging focus on the collaboration across multiple parties. While most realistic scenarios require handling a mixture of categorical and continuous data simultaneously, t ...

Attacking Federated Time Series Forecasting Models

Reconstructing Private Household Energy Data during Federated Learning with Gradient Inversion Attacks

Master thesis (2024) - C.J. Meijer (author) , Y. Chen (mentor) , J. Huang (mentor)

Federated learning for time series forecasting enables clients with privacy-sensitive time series data to collaboratively learn accurate forecasting models, e.g., in energy load prediction.
Unfortunately, privacy risks in federated learning persist, as servers can potentially ...

T-REST: A watermark for autoregressive tabular large language models

Bachelor thesis (2024) - Minh Hieu Nguyen Hoang Minh (author) , Lydia Chen (mentor) , Jeroen Galjaard (mentor) , C. Zhu (mentor) , Rihan Hai (graduation committee member)

Tabular data is one of the most common forms of data in the industry and science. Recent research on synthetic data generation employs auto-regressive generative large language models (LLMs) to create highly realistic tabular data samples. With the increasing use of LLMs, there i ...

Time's Up!

Robust Watermarking in Large Language Models for Time Series Generation

Bachelor thesis (2024) - N.J.I. van Schaik (author) , Lydia Y. Chen (mentor) , C. Zhu (mentor) , Jeroen Galjaard (mentor) , R. Hai (graduation committee member)

The advent of pretrained probabilistic time series foundation models has significantly advanced the field of time series forecasting. Despite these models’ growing popularity, the application of watermarking techniques to them remains underexplored. This paper addresses this rese ...

Ellipse: Robust and imperceptible watermarking for tabular diffusion models

Bachelor thesis (2024) - T. Volentir (author) , Lydia Chen (mentor) , J.M. Galjaard (mentor) , C. Zhu (mentor) , R. Hai (graduation committee member)

Data in the form of tables is commonly used in the scientific and research industry, as it provides a compact, easy-to-understand and logical way of storing data. The advancement of diffusion models has significantly improved the quality of generated tabular data, but it also pos ...

Watermarking Diffusion Graph Models

GUISE: Graph GaUssIan Shading watErmark

Bachelor thesis (2024) - R. Yang (author) , Lydia Y. Chen (mentor) , C. Zhu (mentor) , Jeroen Galjaard (mentor) , R. Hai (graduation committee member)

In the expanding field of generative artificial intelligence, the integration of robust watermarking technologies is essential to protect intellectual property and maintain content authenticity. Traditionally, watermarking techniques have been developed primarily for rich informa ...

Go With The Flow: Fault-Tolerant Decentralized Training of Large Language Models

Decentralised Training of Large Language Models

Master thesis (2024) - N. Blagoev (author) , Lydia Y. Chen (mentor) , Jérémie Decouchant (graduation committee member)

Motivated by the emergence of Large Language Models (LLMs) and the importance of democratizing their training, we propose Go With The Flow, the first practical decentralized training framework for LLMs. Differently from existing distributed and federated training frameworks, Go W ...

Watermarking Time Series Diffusion Models

Bachelor thesis (2024) - L. Fatas Lynas (author) , R. Hai (mentor) , Lydia Y. Chen (mentor) , Jeroen Galjaard (mentor) , C. Zhu (mentor)

In many scientific fields, time series data is essential, yet maintaining the integrity and legitimacy of such data is still difficult. Traditional watermarking techniques have mainly been used for multimedia. Although approaches for watermarking non-media data have been develo ...

Ripple Watermarking for Latent Tabular Diffusion Models

Master thesis (2024) - J. Tang (author) , Y. Chen (mentor) , A. Anand (graduation committee member)

Synthetic tabular data generated by tabular generative models represent an effective means of augmenting and sharing data. It is of paramount importance to trace and audit such synthetic data, avoiding potential harms and risks associated with inappropriate usage. While watermark ...

Exploring the Impact of Single-Character Attacks in Federated Learning Language Classification

Introducing the Novel Single-Character Strike

Bachelor thesis (2024) - J.B. van der Meulen (author) , Lydia Chen (mentor) , Jiyue Huang (mentor) , Marco Zúñiga Zamalloa (graduation committee member)

Federated learning (FL) is a privacy preserving machine learning approach which allows a machine learning model to be trained in a distributed fashion without ever sharing user data. Due to the large amount of valuable text and voice data stored on end-user devices, this approach ...

Evaluating differential privacy on language processing federated learning

Bachelor thesis (2024) - Q.M.F. Van Opstal (author) , J. Huang (mentor) , Y. Chen (mentor) , Marco Zuñiga Zamalloa (graduation committee member)

Federated learning provides a lot of opportunities, especially with the built-in privacy considerations. There is however one attack that might compromise the utility of federated learning: backdoor attacks [14]. There are already some existing defenses, like flame [13] but they ...

Analysis on the Vulnerability of Multi-Server Federated Learning Against Model Poisoning Attacks

Bachelor thesis (2024) - L. Nenovski (author) , Jiyue Huang (mentor) , Lydia Chen (mentor) , Marco Zúñiga Zamalloa (graduation committee member)

Federated Learning (FL) makes it possible for a network of clients to jointly train a machine learning model, while also keeping the training data private. There are several approaches when designing an FL network and while most existing research is focused on a single-s ...

Time-Series Forecasting with Hybrid Federated Learning

A Personalized Approach to Collaboration

Master thesis (2024) - J.R. Vega Sanchez (author) , Lydia Chen (mentor) , R. Hai (graduation committee member) , Thiago Guzella (mentor) , A. Shankar (mentor)

Collaborative efforts in Predictive Maintenance and Control can be beneficial for manufacturers and customers in industrial environments. However, these efforts are challenged by the need for multi-dimensional sharing of information about the same type (horizontal) and piece (ver ...

Robustness Against Untargeted Attacks of Multi-Server Federated Learning for Image Classification

Are Defenses Based on Existing Methods Effective?

Bachelor thesis (2024) - T. Mladenović (author) , J. Huang (graduation committee member) , Lydia Y. Chen (mentor)

Multi-Server Federated Learning (MSFL) is a decentralised way to train a global model, taking a significant step toward enhanced privacy preservation while minimizing communication costs through the use of edge servers with overlapping reaches. In this context, the FedMes algorit ...

Label Alchemy: Transforming Noisy Data into Precious Insights in Deep Learning

Doctoral thesis (2024) - S. Ghiassi (author) , D.H.J. Epema (promotor) , Lydia Y. Chen (promotor)

Labels are essential for training Deep Neural Networks (DNNs), guiding learning with fundamental ground truth. Label quality directly impacts DNN performance and generalization with accurate labels fostering robust predictions. Noisy labels introduce errors and hinder learning, affecting performance adversely. High-quality labels aid convergence, optimizing DNN training towards accurate data distribution representation. Ensuring label accuracy is vital for DNNs’ effective learning, generalization, and real-world performance. Undoubtedly, ensuring the quality of labels is not only critical but also demanding, often entailing considerable resources in terms of time and cost. As the scale of datasets grows, methods such as crowdsourcing have gained traction to expedite the labeling process. However, this approach comes with its own set of challenges, most notably the inherent susceptibility to errors and inaccuracies. For example, it was observed that the accuracy of AlexNet in classifying CIFAR-10 images plummeted from 77% to a mere 10% when labels were subjected to random flips. This stark drop in accuracy exemplifies the magnitude of influence that corrupted or erroneous labels can exert on the performance of DNNs. Such instances underscore the critical relationship between accurate labels and the efficacy of DNNs in understanding and effectively leveraging data. Ensuring DNN robustness is vital, involving strategies like noise label identification, filtering, and integrating noise patterns into training for resilient models. Architectural and loss function design also combats label-related challenges, enhancing DNN adaptability across applications. This thesis investigates the pivotal role of labels in DNN training and their quality impact on model performance. Strategies spanning noise recovery, robust learning frameworks, and multi-label solutions contribute to DNN resilience against noisy labels, advancing both understanding and practical applications.
Chapter 1 of this thesis introduces and explains the crucial elements involved in training DNNs, which include data, DNN models, and expert participation. It highlights the complexity introduced by label noise and sets the stage for the diverse methods designed in subsequent chapters to address these aspects comprehensively.

Confidentiality-Preserving Collaborative Bayesian Networks

Master thesis (2023) - A.M. Mălan (author) , Y. Chen (mentor) , Jérémie Decouchant (mentor) , Thiago Guzella (mentor) , Burcu Külahçıoğlu Ozkan (graduation committee member)

Effective large-scale process optimization in manufacturing industries requires close cooperation between different parties of human experts who encode their knowledge of related domains as Bayesian network models. For example, parties in the steel industry must collaboratively u ...

Multi-server Asynchronous Federated Learning

Master thesis (2023) - Y. Zuo (author) , Y. Chen (mentor) , Jérémie Decouchant (graduation committee member) , Bart Cox (coach)

In federated learning systems, a server maintains a global model trained by a set of clients based on their local datasets. Conventional synchronous FL systems are very sensitive to system heterogeneity since the server needs to wait for the slowest clients in each round. Asynchr ...

Share your Secrets for Private Forecasting with Vertical Federated Learning

Master thesis (2023) - A. Shankar (author) , Lydia Y. Chen (mentor) , Jérémie Decouchant (mentor) , Dimitra Gkorou (mentor)

Vertical federated learning’s (VFL) immense potential for time series forecasting in industrial applications such as predictive maintenance and machine control remains untapped. Critical challenges to be addressed in the manufacturing industry include small and noisy datasets, mo ...

Clustering faces of comic characters

An experimental investigation

Bachelor thesis (2023) - A. Boz (author) , Lydia Chen (mentor) , Zilong Zhao (mentor) , Anna Lukina (graduation committee member)

Face clustering is a subfield of computer vision and pattern recognition with many applications such as face recognition and surveillance. Accurate clustering of faces can also help us to create labeled datasets. However, in the domain of comics, face clustering is not well studi ...

Does text matter?

Extending CLIP with OCR and NLP for image classification and retrieval

Bachelor thesis (2023) - J. Sassoon (author) , Zilong Zhao (mentor) , Lydia Y. Chen (mentor) , A. Lukina (graduation committee member)

Contrastive Language-Image Pretraining (CLIP) has gained vast interest due to its impressive performance on a variety of computer vision tasks: image classification, image retrieval, action recognition, feature extraction, and more. The model learns to associate images with their ...