Ellipse: Robust and imperceptible watermarking for tabular diffusion models

Volentir, T.

Bachelor thesis (2024)

Authors

T. Volentir Electrical Engineering, Mathematics and Computer Science

Contributors

Lydia Chen Data-Intensive Systems - (mentor)

J.M. Galjaard Data-Intensive Systems - (mentor)

C. Zhu Data-Intensive Systems - (mentor)

R. Hai Web Information Systems - (graduation committee member)

Faculty

Electrical Engineering, Mathematics and Computer Science

Tabular Data, Diffusion Models, Watermarking

To reference this document use:

http://resolver.tudelft.nl/uuid:c996a940-83bc-4571-ba61-e2b5d64fa5bc

Published Date

28-06-2024

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Tabular data is widely used in science and research because it offers a compact, logical, and easy-to-understand way of storing information. Advances in diffusion models have significantly improved the quality of generated tabular data, but they also raise risks of misappropriation and copyright infringement. There is therefore a need to control and monitor the data generated by diffusion models, both to mitigate harm and to protect intellectual property. This paper addresses the need for robust watermarking techniques designed specifically for tabular data generated by diffusion models. We propose Ellipse, a generalization of the Tree-Ring watermarking method, which was originally developed for square images, to rectangular tables. We change the shape of the watermark from a circle, which fits square images, to an oval, which fits rectangular datasets.
Through comprehensive experiments on four real-world datasets (Abalone, Adult, Default, and Diabetes), we demonstrate that the adapted watermarking technique causes a negligible drop of 3.5% in data quality, measured through correlations between real and synthetic distributions, performance on downstream machine learning tasks, and discriminability between real and synthetic data. This compares favorably with the 12.46% drop in data quality incurred by a circular mask. Relative to the circular mask, Ellipse incurs an average drop of 0.4% in detection efficiency, which is not significant. Our implementation is also resilient to value-skewing attacks and to deletion attacks on the rows and columns of the dataset. Under attack, Ellipse achieves an Area Under the Curve (AUC) that is on average 7.17% higher than that of Tree-Ring's circular mask. The code for Ellipse is publicly available at https://github.com/6toma/ellipse-watermark.
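The change of mask shape described above can be illustrated with a minimal sketch. It is not taken from the thesis code: the function names `circular_mask` and `elliptical_mask`, the `radius_frac` parameter, and the choice to scale each semi-axis with the corresponding grid dimension are all assumptions made for illustration. The sketch only shows how an oval region can cover a rectangular grid proportionally, where a circle would be limited by the shorter side.

```python
import numpy as np


def circular_mask(n_rows, n_cols, radius):
    """Boolean mask that is True inside a circle centred on the grid."""
    cy, cx = (n_rows - 1) / 2.0, (n_cols - 1) / 2.0
    y, x = np.ogrid[:n_rows, :n_cols]
    return (y - cy) ** 2 + (x - cx) ** 2 <= radius ** 2


def elliptical_mask(n_rows, n_cols, radius_frac=0.5):
    """Boolean mask that is True inside an axis-aligned ellipse.

    Hypothetical parametrisation: each semi-axis is `radius_frac` times
    half of the grid extent along that axis, so the oval stretches with
    the shape of the table instead of staying round.
    """
    cy, cx = (n_rows - 1) / 2.0, (n_cols - 1) / 2.0
    a = radius_frac * n_rows / 2.0  # semi-axis along the row direction
    b = radius_frac * n_cols / 2.0  # semi-axis along the column direction
    y, x = np.ogrid[:n_rows, :n_cols]
    return ((y - cy) / a) ** 2 + ((x - cx) / b) ** 2 <= 1.0


# A tall, narrow grid standing in for a table with many rows and few columns.
oval = elliptical_mask(32, 8, radius_frac=0.5)
rows_covered = int(oval.any(axis=1).sum())
cols_covered = int(oval.any(axis=0).sum())
print(oval.shape, rows_covered, cols_covered)
```

On a square grid the two masks coincide when the circle's radius equals `radius_frac` times half the side length, which is the sense in which the oval generalizes the circle.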

Files

CSE3000_Research_Paper_Toma_Vo... (pdf)

(pdf | 0.977 Mb)

Unknown license