Black-box Adversarial Attacks using Substitute models

Effects of Data Distributions on Sample Transferability

Abstract

Machine Learning (ML) models are vulnerable to adversarial samples: human-imperceptible changes to regular inputs that elicit incorrect output from a given model. Many adversarial attacks assume that the attacker has access to the underlying model or to the data used to train it. In this paper, we instead focus on the effect the data distribution has on the transferability of adversarial samples in a "black-box" scenario, where the attacker must train a separate model (the "substitute model") and generate adversaries using this independent model. The substitute models are trained on data that is symmetric to, a cross-section of, or completely disjoint from the data used to train the target model. The results demonstrate that an attacker needs only semantically similar data to execute an effective attack using a substitute model and well-known gradient-based adversarial generation techniques. Under ideal attack scenarios, target model accuracy can drop below 50%. Furthermore, our research shows that generating adversarial images from an ensemble increases the average attack success rate.
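To make the attack setup concrete, the following is a minimal sketch of a substitute-model transfer attack using FGSM, one of the well-known gradient-based generation techniques mentioned above. It is not the paper's exact procedure: the model architectures, the synthetic stand-in data, and the flip-rate metric are illustrative assumptions, and in practice the substitute would be trained on the attacker's own (symmetric, cross-section, or disjoint) data before this step.

# Sketch of a substitute-model transfer attack with FGSM (assumes PyTorch).
# `target_model` is the victim; the attacker only queries it, never reads its gradients.
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_mlp(in_dim=784, n_classes=10):
    return nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, n_classes))

target_model = make_mlp()      # victim model: weights unknown to the attacker
substitute_model = make_mlp()  # attacker's model, trained on attacker-held data

def fgsm(model, x, y, eps=0.1):
    """Generate FGSM adversarial examples using only the substitute's gradients."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # Step in the direction that increases the substitute's loss, then clip to valid range.
    return (x_adv + eps * x_adv.grad.sign()).clamp(0.0, 1.0).detach()

# Synthetic stand-in for the attacker's semantically similar data.
x = torch.rand(64, 784)
y = torch.randint(0, 10, (64,))

x_adv = fgsm(substitute_model, x, y, eps=0.1)

# Transferability check: how often does the target model change its prediction?
with torch.no_grad():
    clean_pred = target_model(x).argmax(dim=1)
    adv_pred = target_model(x_adv).argmax(dim=1)
print("prediction flip rate on target:", (clean_pred != adv_pred).float().mean().item())

An ensemble variant, such as the one the abstract alludes to, could be approximated by averaging the loss (or gradients) of several substitute models inside fgsm before taking the sign step; the details here are an assumption rather than the paper's specification.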

Files