Decreasing Model Stealing Querying for Black Box Adversarial Attacks


Abstract

A machine learning classifier can be tricked using adversarial attacks: attacks that alter images slightly to make the target model misclassify them. To create adversarial attacks on black-box classifiers, a substitute model can be trained via model stealing. The research question this report addresses is how to perform model stealing while minimizing the number of queries needed to train the substitute model. The solution used in this report is a variant of the ActiveThief algorithm, which uses active learning to determine which data points are queried. The paper experiments with different subset selection strategies to find the most informative data points. In addition, a seeding algorithm based on clustering is explored, and finally, a stopping criterion for the ActiveThief algorithm is proposed. These variations are evaluated on their accuracy and on the number of queries required to achieve that accuracy. This paper shows that cluster seeding is a viable alternative to random seeding in ActiveThief. It also presents several subset selection strategies that outperform the random sampling strategy. Finally, a stopping criterion based on entropy is introduced that halts the algorithm when an uncertainty threshold is reached.
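As a rough illustration only (the abstract does not specify the criterion's exact form), the Python sketch below shows one way an entropy-based stopping rule could work: querying halts once the substitute model's average predictive entropy over the unlabeled pool drops below a threshold. The function name `should_stop`, the `probs` input, and the threshold value are all hypothetical, not taken from the paper.

```python
import numpy as np

def should_stop(probs: np.ndarray, threshold: float = 0.1) -> bool:
    """Hypothetical entropy-based stopping criterion.

    probs: (n_samples, n_classes) array of the substitute model's
           softmax outputs on the remaining unlabeled pool.
    threshold: assumed tunable uncertainty level at which to stop.
    """
    eps = 1e-12  # guard against log(0)
    # Per-sample predictive entropy of the substitute model.
    entropy = -np.sum(probs * np.log(probs + eps), axis=1)
    # Stop querying once average uncertainty falls below the threshold.
    return float(entropy.mean()) < threshold
```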