Leveraging Feature Extraction to Detect Adversarial Examples
Let's Meet in the Middle
Abstract
Previous research has explored the detection of adversarial examples with dimensionality reduction and Out-of-Distribution (OOD) recognition. However, these approaches are not effective against white-box adversarial attacks. Moreover, recent OOD methods that rely on hidden units limit the scalability of the target model.
For this reason, we study various explanations of adversarial examples to better understand their properties and anomalies. Furthermore, we discuss the added value of natural scene statistics and utility functions in improving the relevance of the features used for detection. By exploiting the anomalies we identified for adversarial examples in an ensemble, this thesis is the first to propose a robust solution against adaptive, white-box attacks.
Specifically, we address these challenges with MeetSafe, a Gaussian Mixture Model that leverages principal component analysis, feature squeezing, and density estimation to detect adaptive white-box adversaries. In addition, our enhanced Local Reachability Density (LRD) algorithm improves the efficiency of state-of-the-art OOD methods: it enhances scalability by feature bagging hidden units with large absolute Z-scores. We then show that predictors, including LRD, are far more effective in ensembles like MeetSafe, supporting prior conjectures that combining a range of different heuristics may further constrain adversaries.
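To make the scalability idea concrete, the sketch below illustrates the two ingredients named above under simplifying assumptions: selecting a "bag" of hidden units whose activations have the largest absolute Z-scores on clean data, then scoring a query with a plain LOF-style local reachability density over that reduced feature set. The function names and the choice of top-k selection are illustrative, not the thesis's exact implementation.

```python
import numpy as np

def zscore_feature_bag(train_acts, k):
    """Pick the k hidden units with the largest mean absolute Z-score.

    train_acts: (n_samples, n_units) hidden activations on clean data.
    Returns the column indices of the selected units (illustrative heuristic).
    """
    mu = train_acts.mean(axis=0)
    sigma = train_acts.std(axis=0) + 1e-8          # avoid division by zero
    z = np.abs((train_acts - mu) / sigma)          # per-sample |Z-scores|
    return np.argsort(z.mean(axis=0))[-k:]         # top-k units by average |Z|

def lrd(x, ref, k=5):
    """LOF-style local reachability density of x w.r.t. reference points.

    Higher values mean x sits in a dense region of clean activations;
    adversarial inputs are expected to score lower.
    """
    d = np.linalg.norm(ref - x, axis=1)            # distances to all refs
    idx = np.argsort(d)[:k]                        # k nearest neighbours
    # k-distance of each neighbour within the reference set
    # (index k skips the zero self-distance)
    kdist = np.array(
        [np.sort(np.linalg.norm(ref - ref[i], axis=1))[k] for i in idx]
    )
    reach = np.maximum(d[idx], kdist)              # reachability distances
    return 1.0 / (reach.mean() + 1e-12)
```

In practice, restricting LRD to a small Z-score-selected bag of units keeps the nearest-neighbour computation tractable as the target model grows, which is the scalability benefit the abstract refers to.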
Extensive experiments on 14 models show that MeetSafe detects adaptive perturbations with an accuracy of 62% on STL-10, 75% on CIFAR-10, and 99% on MNIST using either adversarial training or Reverse Cross Entropy (RCE), improving on each evaluated method by at least 8.1% on average across the three datasets.