Towards Robust Deep Learning

Deep Latent Variable Modeling against Out-of-Distribution and Adversarial Inputs

Abstract

As Deep Neural Networks (DNNs) continue to be deployed in safety-critical domains, two specific concerns — adversarial examples and Out-of-Distribution (OoD) data — pose significant threats to their reliability. This thesis proposes novel methods to enhance the robustness of deep learning by detecting such inputs and mitigating their impact.

A central insight of this work is that algorithmic stability plays a crucial role in generalizing to in-distribution data. Motivated by this, the thesis formulates a dual perspective on stability with respect to the hypotheses and explores whether this perspective facilitates the separation of problematic inputs through two main lenses: epistemic uncertainty estimation and the choice of an appropriate inductive bias.

By grounding our approach in latent-variable generative modeling based on an information bottleneck, and specifically in Variational Autoencoders (VAEs), we first leverage Bayesian inference over model parameters to estimate the model's uncertainty with respect to a particular input. Second, we investigate the properties required of both the VAE maps and the latent representations from a topological perspective, revealing that OoD inputs predominantly map onto empty regions, or "holes", in the latent manifold. Finally, we discover that adversarial examples exhibit similar behavior. These findings are then used to craft new scoring functions that reliably distinguish between inliers, outliers, and adversarial examples.
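
To make the last idea concrete, the sketch below is a hypothetical illustration, not the scoring functions developed in the thesis. It combines two stand-in signals from a small PyTorch VAE: the per-sample negative ELBO and the distance from an input's latent code to the nearest training code, loosely mirroring the intuition that suspicious inputs land in sparsely covered, "hole"-like regions of the latent space. All names here (TinyVAE, negative_elbo, latent_gap_score) are assumptions made for illustration.

```python
# Illustrative sketch only: a tiny VAE whose negative ELBO and latent-space gap
# to the training codes act as stand-in anomaly scores. Not the thesis's method.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyVAE(nn.Module):
    def __init__(self, x_dim: int = 784, z_dim: int = 8):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, 128), nn.ReLU())
        self.mu = nn.Linear(128, z_dim)
        self.logvar = nn.Linear(128, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim, 128), nn.ReLU(), nn.Linear(128, x_dim))

    def encode(self, x):
        h = self.enc(x)
        return self.mu(h), self.logvar(h)

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        return self.dec(z), mu, logvar


def negative_elbo(model, x):
    """Per-sample negative ELBO: reconstruction error plus KL(q(z|x) || N(0, I))."""
    x_hat, mu, logvar = model(x)
    rec = F.mse_loss(x_hat, x, reduction="none").sum(dim=1)
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=1)
    return rec + kl


@torch.no_grad()
def latent_gap_score(model, x, train_mu):
    """Distance from x's latent mean to the nearest training code: large values
    suggest the input fell into an empty region of the learned latent manifold."""
    mu, _ = model.encode(x)
    return torch.cdist(mu, train_mu).min(dim=1).values


if __name__ == "__main__":
    # Usage sketch with placeholder data: higher scores flag more suspicious inputs.
    model = TinyVAE()
    x_train = torch.rand(256, 784)   # stand-in for in-distribution training data
    x_test = torch.rand(16, 784)     # stand-in for inputs to screen
    with torch.no_grad():
        train_mu, _ = model.encode(x_train)
        score = negative_elbo(model, x_test) + latent_gap_score(model, x_test, train_mu)
    print(score)
```

In this toy setup the two terms are simply summed; the thesis itself develops and evaluates its own scoring functions, for which this snippet is only a rough analogue.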