Gradients with confounding label encodings to probe effective expressivity of trained networks

Personnel: Jinsol Lee

Goal: To characterize inputs from the model's perspective during inference using gradients; confounding labels serve as a tool to elicit model responses that can be used to probe the effective expressivity of trained networks.

Challenges: Deep neural networks are vulnerable in real-world environments, where they often encounter data that diverge from training conditions or are subject to adversarial attacks. To ensure reliable performance in practical applications, a neural network must be able to distinguish inputs that differ from its training data and therefore cannot be handled properly.

Our Work: Gradients correspond to the amount of change a model requires to accurately represent a given sample. We argue that gradients allow us to characterize anomalies in inputs based on what the model is unfamiliar with and thus incapable of representing properly. One obstacle to using gradients to observe this necessary change is that, during inference, we have no access to labels for the given inputs or to any information about their distribution. To remove this dependency, we introduce confounding labels: labels formulated by combining multiple categorical labels, as opposed to the individual categorical labels used in training. During inference, we observe the necessary changes captured in gradients invoked by a confounding label, both for an image similar to the training data and for one that is highly dissimilar. Our hypothesis is that the necessary change to the effective expressivity will be larger for the dissimilar input, since the trained model weights cannot capture its characteristics.
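The mechanism above can be sketched in PyTorch. This is a minimal illustration, not the exact implementation from the referenced papers: the toy classifier, the all-ones multi-hot target as the confounding label, and the squared-L2 gradient norm as the anomaly score are all assumptions made for the sake of a self-contained example.

```python
import torch
import torch.nn as nn

# Stand-in classifier; in practice this would be a trained network.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
model.eval()

def confounding_gradient_score(model, x):
    """Backpropagate a loss against a confounding label (here, all classes
    active at once) and return the squared L2 norm of the weight gradients.
    No ground-truth label for x is needed."""
    model.zero_grad()
    logits = model(x)
    # Confounding label: multiple categorical labels combined into one
    # multi-hot target, rather than a single one-hot training label.
    confounding_label = torch.ones_like(logits)
    loss = nn.functional.binary_cross_entropy_with_logits(logits, confounding_label)
    loss.backward()
    # Larger gradient magnitude suggests the model would need more change
    # to represent this input, i.e., the input is more unfamiliar.
    return sum((p.grad ** 2).sum().item()
               for p in model.parameters() if p.grad is not None)

x = torch.randn(1, 3, 32, 32)  # stand-in for an inference-time input
score = confounding_gradient_score(model, x)
```

Under the hypothesis above, this score would be compared between in-distribution and dissimilar inputs, with dissimilar inputs expected to yield larger values.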

References:

  1. J. Lee, M. Prabhushankar, and G. AlRegib, "Gradient-Based Adversarial and Out-of-Distribution Detection," in International Conference on Machine Learning (ICML) Workshop on New Frontiers in Adversarial Machine Learning, Baltimore, MD, Jul. 2022.

  2. J. Lee and G. AlRegib, "Open-Set Recognition with Gradient-Based Representations," in IEEE International Conference on Image Processing (ICIP), Anchorage, AK, Sep. 2021.

  3. J. Lee and G. AlRegib, "Gradients as a Measure of Uncertainty in Neural Networks," in IEEE International Conference on Image Processing (ICIP), Abu Dhabi, United Arab Emirates, Oct. 2020.