Novel Examination of Interpretable Surrogates and Adversarial Robustness in Machine Learning
Abstract
The lack of transparent output behavior is a significant source of mistrust in many of the currently most successful machine learning tools. Concern arises particularly when the data-generating distribution changes, for example under marginal shift or adversarial manipulation. We analyze the use of decision trees, a human-interpretable model class, for indicating marginal shift. We then investigate the role of the data-generating process in the validity of the interpretable surrogate, implemented as both a local and a global interpretation method.
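As a minimal sketch of the surrogate idea (our own illustration, not the authors' implementation), the snippet below trains a shallow decision tree to mimic a black-box classifier on reference data and then monitors surrogate fidelity on new data; a fidelity drop is taken as a signal of marginal shift. The choice of black box, the synthetic data, and the shift magnitude are all assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# Hypothetical black-box model: any classifier exposing predict().
rng = np.random.default_rng(0)
X_ref = rng.normal(size=(1000, 5))                      # reference data
y_ref = (X_ref[:, 0] + X_ref[:, 1] > 0).astype(int)
black_box = RandomForestClassifier(n_estimators=100, random_state=0)
black_box.fit(X_ref, y_ref)

# Global interpretable surrogate: a shallow tree fit to the
# black box's predictions rather than to the true labels.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X_ref, black_box.predict(X_ref))

def fidelity(X):
    """Fraction of points where the surrogate agrees with the black box."""
    return np.mean(surrogate.predict(X) == black_box.predict(X))

# Simulated marginal shift: the input distribution moves.
X_shift = X_ref + 1.5

# A fidelity drop on shifted inputs suggests the surrogate explanation
# is no longer valid under the new marginal distribution.
print(f"fidelity on reference data:  {fidelity(X_ref):.3f}")
print(f"fidelity under marginal shift: {fidelity(X_shift):.3f}")
```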
We observe that the decision boundaries of the black-box model often lie close to the original data manifold, which makes those regions vulnerable to imperceptible perturbations. We therefore argue that adversarial robustness should be defined as a locally adaptive measure that complies with the underlying distribution. We then propose a definition of an adaptive robust loss, an empirical version of it, and a resulting data-augmentation framework.
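One way such a locally adaptive notion could translate into a data-augmentation scheme is sketched below, under our own assumptions: the per-point perturbation budget `eps(x)` is tied to local data geometry via the distance to the k-th nearest neighbor (the scaling rule and the FGSM-style perturbation are illustrative choices, not the thesis's definition), and `grad_fn` is a hypothetical callable returning the loss gradient with respect to the inputs.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def local_budget(X_train, k=10, scale=0.5):
    """Locally adaptive perturbation budget eps(x): a fraction of the
    distance to the k-th nearest neighbor, so the allowed perturbation
    scales with the local geometry of the data manifold.
    (The scaling rule is an assumption made for illustration.)"""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_train)
    dists, _ = nn.kneighbors(X_train)   # column 0 is the point itself
    return scale * dists[:, -1]         # distance to k-th neighbor

def augment(X, y, grad_fn, eps):
    """FGSM-style augmentation under a per-point budget eps[i]:
    step each point in the sign of its loss gradient, then append
    the perturbed copies to the training set."""
    g = grad_fn(X, y)                   # d(loss)/dx, same shape as X
    X_adv = X + eps[:, None] * np.sign(g)
    return np.vstack([X, X_adv]), np.concatenate([y, y])
```

Training on the augmented set `(X_aug, y_aug)` then serves as an empirical stand-in for minimizing a locally adaptive robust loss, with the budget shrinking or growing according to the surrounding data density.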