Black-box Adversarial Attacks using Substitute models

Effects of Data Distributions on Sample Transferability

Abstract

Machine Learning (ML) models are vulnerable to adversarial samples: human-imperceptible changes to regular inputs that elicit incorrect output from a given model. Many adversarial attacks assume that the attacker has access to the underlying model or to the data used to train it. In this paper, we instead focus on the effect the data distribution has on the transferability of adversarial samples in a "black-box" scenario, where the attacker must train a separate model (the "substitute model") and generate adversaries using this independent model. The substitute models are trained on data that is symmetric to, a cross-section of, or completely disjoint from the data used to train the target model. The results demonstrate that an attacker needs only semantically similar data to execute an effective attack using a substitute model and well-known gradient-based adversarial generation techniques. Under ideal attack scenarios, target model accuracy can drop below 50%. Furthermore, our research shows that generating adversarial images from an ensemble increases the average attack success rate.
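To make the attack setup concrete, the following is a minimal sketch of a substitute-model transfer attack using FGSM, one of the well-known gradient-based generation techniques mentioned above. It is not the paper's exact procedure: the model architectures, the synthetic stand-in data, and the flip-rate metric are illustrative assumptions, and in practice the substitute would be trained on the attacker's own (symmetric, cross-section, or disjoint) data before this step.

# Sketch of a substitute-model transfer attack with FGSM (assumes PyTorch).
# `target_model` is the victim; the attacker only queries it, never reads its gradients.
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_mlp(in_dim=784, n_classes=10):
    return nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, n_classes))

target_model = make_mlp()      # victim model: weights unknown to the attacker
substitute_model = make_mlp()  # attacker's model, trained on attacker-held data

def fgsm(model, x, y, eps=0.1):
    """Generate FGSM adversarial examples using only the substitute's gradients."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # Step in the direction that increases the substitute's loss, then clip to valid range.
    return (x_adv + eps * x_adv.grad.sign()).clamp(0.0, 1.0).detach()

# Synthetic stand-in for the attacker's semantically similar data.
x = torch.rand(64, 784)
y = torch.randint(0, 10, (64,))

x_adv = fgsm(substitute_model, x, y, eps=0.1)

# Transferability check: how often does the target model change its prediction?
with torch.no_grad():
    clean_pred = target_model(x).argmax(dim=1)
    adv_pred = target_model(x_adv).argmax(dim=1)
print("prediction flip rate on target:", (clean_pred != adv_pred).float().mean().item())

An ensemble variant, such as the one the abstract alludes to, could be approximated by averaging the loss (or gradients) of several substitute models inside fgsm before taking the sign step; the details here are an assumption rather than the paper's specification.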

Files