Data-driven health research, specifically the development of AI models, is hampered by poor data availability and associated administrative burdens, caused by complex and fragmented data protection regulation. To reap the benefits of using high-quality health data while safeguarding patients’ data protection, synthetic data generation is seen as a promising privacy-enhancing technology that avoids the need to share personal data. Although synthetic data is widely discussed, research primarily focuses on its technical development and the evaluation of its data protection, leaving open substantive questions about the use of synthetic data within its institutional context. By combining legal and technical knowledge, this thesis aimed to bridge this gap by analysing how synthetic data generation could enable health data sharing for research in a privacy-enhancing manner. Specifically, a design science research approach was followed to combine the requirements of the institutional environment, focusing on a use case with a Dutch healthcare provider and research institute, with scientific knowledge on synthetic data generation and data protection evaluations. The research objective was to design a framework that structures the data protection-related factors influencing the extent to which synthetic health data enables secondary use of health data for research. The identified barriers and drivers of synthetic data generation concern the health data sharing process, the interplay between the legal definition of anonymisation and technical data protection evaluations, and data protection principles. Synthetic data can enable secondary use of health data for research, but measures should be implemented in the various phases of synthetic data sharing to safeguard patients’ data protection. As a policy opportunity, this thesis argues for a narrower definition of personal data to support privacy-enhancing technologies such as synthetic data generation.