ECE PhD Dissertation Defense: Panagiota Kiourti

  • Starts: 4:00 pm on Wednesday, June 26, 2024
  • Ends: 6:00 pm on Wednesday, June 26, 2024

Title: Enhancing Deep Learning Security Through Explainability and Robustness

Presenter: Panagiota Kiourti

Advisor: Professor Wenchao Li

Chair: Professor Richard Brower

Committee: Professor Kayhan Batmanghelich, Professor Wenchao Li, Professor Eshed Ohn-Bar, Professor Gianluca Stringhini

Google Scholar Link: https://scholar.google.com/citations?user=ncU8YRsAAAAJ

Abstract: The growing interest in deploying deep learning models in critical applications has raised concerns about their vulnerabilities, particularly to backdoor (Trojan) attacks. These attacks train a network to respond maliciously to specially crafted trigger patterns in its inputs while it otherwise exhibits state-of-the-art performance. This thesis addresses the identification of such attacks in deep reinforcement learning, proposes a feature-attribution-based strategy for detecting them in classification networks in production, and introduces a new framework for evaluating the robustness of attribution methods.

Firstly, TrojDRL is introduced as a tool for exploring and evaluating backdoor attacks on deep reinforcement learning (DRL) agents. TrojDRL exploits the sequential nature of DRL and considers various threat-model gradations. It introduces untargeted attacks on state-of-the-art actor-critic policy networks that can circumvent existing defenses built on the assumption that backdoors are targeted. TrojDRL shows that the attacks require as little as 0.025% of the training data to be poisoned. Compared with existing work on backdoor attacks against classification models, this tool is a pioneering effort toward understanding the vulnerability of DRL agents.
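To make the poisoning-rate claim concrete, here is a minimal, hypothetical sketch of trigger-based data poisoning in the spirit described above. The function name, patch shape, and default rate are illustrative assumptions, not the TrojDRL implementation: a small trigger patch is stamped onto a tiny fraction of training states and, for a targeted variant, the associated actions are rewritten.

```python
import numpy as np

def poison_states(states, actions, target_action,
                  poison_rate=0.00025, patch_value=255, patch_size=3, seed=0):
    """Illustrative poisoning sketch (hypothetical, not TrojDRL's code):
    stamp a trigger patch onto a fraction of states; for a targeted
    attack, also relabel the corresponding actions."""
    rng = np.random.default_rng(seed)
    states, actions = states.copy(), actions.copy()
    n = len(states)
    k = max(1, int(n * poison_rate))                 # e.g. 0.025% of the data
    idx = rng.choice(n, size=k, replace=False)       # which samples to poison
    states[idx, :patch_size, :patch_size] = patch_value  # trigger in a corner
    actions[idx] = target_action                     # targeted relabeling
    return states, actions, idx
```

With the default `poison_rate=0.00025`, a 1,000,000-step training run would have only 250 poisoned states, matching the 0.025% figure above.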

Secondly, this thesis presents MISA, a new online approach for detecting Trojan triggers in neural networks at inference time, after the model has been deployed. MISA uses misattributions to capture the anomalous manifestation of a Trojan activation in the feature-attribution space. It first computes the input's attribution over different features and then statistically analyzes these attributions to ascertain the presence of a Trojan trigger. Across a set of benchmarks, MISA effectively detects a wide variety of trigger patterns, achieving 96% AUC for detecting Trojan-triggered images without any assumptions on the trigger pattern.
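The two-step procedure described above — compute attributions, then flag statistical anomalies — can be sketched with a toy setup. Everything here is an illustrative assumption rather than MISA's actual method: the input-times-weight "attribution", the attribution-concentration statistic, and the z-score threshold all stand in for the real components.

```python
import numpy as np

def attribution(x, w):
    # toy saliency: input times weight (stand-in for a real attribution method)
    return x * w

def fit_baseline(clean_inputs, w):
    # baseline statistics of how concentrated attributions are on clean data
    scores = np.array([np.abs(attribution(x, w)).max() /
                       (np.abs(attribution(x, w)).sum() + 1e-12)
                       for x in clean_inputs])
    return scores.mean(), scores.std() + 1e-12

def is_trojan_triggered(x, w, mean, std, z_thresh=3.0):
    # flag inputs whose attribution mass concentrates anomalously
    a = np.abs(attribution(x, w))
    score = a.max() / (a.sum() + 1e-12)
    return (score - mean) / std > z_thresh
```

The idea being illustrated: a trigger tends to dominate the attribution map, so a statistic of the attribution distribution (here, its concentration) separates triggered inputs from clean ones.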

Lastly, this thesis studies the robustness of feature attribution methods for deep neural networks. It challenges the current notion of attributional robustness, which largely ignores differences in the model's outputs, and introduces a new evaluation framework. The framework defines similar inputs differently from existing methods and introduces a novel method based on generative adversarial networks to generate such inputs, leading to a different definition of attributional robustness. The new robustness metric is comprehensively evaluated against existing metrics and state-of-the-art attribution methods. The findings highlight the need for a more objective metric that reveals the weaknesses of an attribution method rather than those of the neural network, thus providing a more accurate evaluation of the robustness of attribution methods.
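An output-aware robustness metric of the kind motivated above can be sketched as follows. The definition used here — worst-case attribution change over candidate inputs whose model output stays close to the original — and all names are assumptions for illustration, not the thesis's exact formulation (which also involves a GAN-based generator for the candidates).

```python
import numpy as np

def attributional_robustness(f, attribute, x, candidates, out_tol=1e-2):
    """Sketch of an output-aware robustness metric (illustrative only):
    only candidates whose model output stays within out_tol of f(x) count
    as 'similar inputs'; the metric is the worst-case attribution change
    over that set, so it blames the attribution method, not the model."""
    base_out = f(x)
    base_attr = attribute(x)
    worst = 0.0
    for xc in candidates:
        if abs(f(xc) - base_out) <= out_tol:   # similar-output constraint
            worst = max(worst, np.linalg.norm(attribute(xc) - base_attr))
    return worst
```

Candidates whose output differs from the original are excluded, so a large score reflects instability of the attribution method itself rather than a genuine change in the model's decision.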

Location:
PHO 339