Characterising AI Weakness in Detecting Personal Data from Images By Crowds
Abstract
This thesis examines how crowdsourcing can be used to characterize weaknesses in machine learning models that detect privacy-sensitive data in images. Before we can devise a method to achieve this goal, we first need to make clear what we consider privacy-sensitive data. We took the General Data Protection Regulation (GDPR) as a starting point and performed a crowdsourcing task to see how workers interpret this regulation. Interpreting legal texts can be difficult: there is room for interpretation, and the perception of a legal text can change over time. Therefore, we combine the input of the crowd with our own input to operationalize the regulation for this context. Next, we took a machine learning model for detecting privacy-sensitive data in images and retrieved saliency maps, which help explain the inner workings of the model. Subsequently, the saliency maps are inspected through a crowdsourcing task, using the established privacy definition, to identify the model's strengths and weaknesses. The results show that crowd workers can efficiently find the strengths and weaknesses of a machine learning model while keeping the privacy definition in mind. Workers are able to apply their views on privacy consistently across different images, while also increasing the trust people have in the machine learning model. This shows that crowdsourcing can be used efficiently in the fairly difficult context of privacy, and paves the way for a more sophisticated approach to identifying privacy-sensitive elements in images, and even to contexts other than privacy.
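The abstract does not specify how the saliency maps were produced. As a minimal sketch of one common approach (vanilla gradient saliency with PyTorch), assuming an off-the-shelf classifier and a hypothetical input image, the idea looks roughly like this:

```python
# Sketch of gradient-based saliency (vanilla gradients) with PyTorch.
# The model, image file name, and target class below are illustrative
# assumptions, not the method actually used in the thesis.
import torch
from torchvision import models, transforms
from PIL import Image

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

image = Image.open("example.jpg").convert("RGB")  # hypothetical input image
x = preprocess(image).unsqueeze(0)
x.requires_grad_(True)

# Forward pass, then backpropagate the score of the predicted class.
scores = model(x)
predicted_class = scores.argmax(dim=1).item()
scores[0, predicted_class].backward()

# Saliency: maximum absolute gradient over the colour channels per pixel.
saliency = x.grad.abs().max(dim=1)[0].squeeze()   # shape (224, 224)
```

The resulting per-pixel map can then be overlaid on the image and shown to crowd workers, who judge whether the highlighted regions correspond to privacy-sensitive content.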