Ellipse: Robust and imperceptible watermarking for tabular diffusion models

Abstract

Tables are widely used in science and research because they offer a compact, logical, and easy-to-understand way of storing data. Advances in diffusion models have significantly improved the quality of generated tabular data, but they also raise risks of misappropriation and copyright concerns. There is therefore a need to control and monitor the data generated by diffusion models, both to mitigate harm and to protect intellectual property. This paper addresses the need for robust watermarking techniques designed specifically for tabular data generated by diffusion models. We propose Ellipse, which generalizes the Tree-Ring watermarking method, originally developed for square images, to rectangular tables. We change the shape of the watermark from a circle, which fits square images, to an oval, which fits rectangular datasets.
Through comprehensive experiments on four real-world datasets (Abalone, Adult, Default, and Diabetes), we demonstrate that the adapted watermarking technique incurs a negligible 3.5% drop in data quality, measured through correlations between real and synthetic distributions, performance on downstream machine learning tasks, and discriminability between real and synthetic data. This improves on the 12.46% drop in data quality incurred with a circular mask. Ellipse introduces a non-significant average drop of 0.4% in detection efficiency compared with a circular mask. Our implementation is also resilient to value-skewing and deletion attacks on the rows and columns of the dataset. Under attack, Ellipse achieves an Area Under the Curve (AUC) that is on average 7.17% higher than that of Tree-Ring's circular mask. The code for Ellipse is publicly available at https://github.com/6toma/ellipse-watermark.
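
The geometric idea behind replacing the circle with an oval can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `elliptical_mask`, `circular_mask`, and `count_inside` are hypothetical, and the sketch assumes that the mask is centred on the grid, that the oval's semi-axes scale with the grid's height and width, and that the circle's radius is limited by the shorter side. It shows only how much of a rectangular grid each mask shape can cover.

```python
def elliptical_mask(rows, cols):
    """Return a rows x cols grid of booleans, True inside a centred ellipse.

    Assumption: the semi-axes are half the grid height and half the grid
    width, so a square grid yields a circle and a rectangle yields an oval.
    """
    if rows <= 0 or cols <= 0:
        raise ValueError("rows and cols must be positive")
    cy, cx = (rows - 1) / 2.0, (cols - 1) / 2.0  # grid centre
    ry, rx = rows / 2.0, cols / 2.0              # vertical / horizontal semi-axes
    # Ellipse inequality: ((x - cx) / rx)^2 + ((y - cy) / ry)^2 <= 1
    return [
        [((x - cx) / rx) ** 2 + ((y - cy) / ry) ** 2 <= 1.0 for x in range(cols)]
        for y in range(rows)
    ]


def circular_mask(rows, cols):
    """Return a centred circular mask.

    Assumption: the radius is half the shorter side, so the circle stays
    inside the grid.
    """
    if rows <= 0 or cols <= 0:
        raise ValueError("rows and cols must be positive")
    cy, cx = (rows - 1) / 2.0, (cols - 1) / 2.0
    r = min(rows, cols) / 2.0
    return [
        [(x - cx) ** 2 + (y - cy) ** 2 <= r * r for x in range(cols)]
        for y in range(rows)
    ]


def count_inside(mask):
    """Count the cells that fall inside the mask."""
    return sum(cell for row in mask for cell in row)


if __name__ == "__main__":
    rows, cols = 4, 16  # a wide, rectangular grid
    print("oval  :", count_inside(elliptical_mask(rows, cols)), "of", rows * cols)
    print("circle:", count_inside(circular_mask(rows, cols)), "of", rows * cols)
```

On a 4 x 16 grid, the oval in this sketch covers 52 of the 64 cells, whereas the circle covers only 12, because its radius is capped by the shorter side. On a square grid the two masks coincide.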