
A Brief Systematization of Explanation-Aware Attacks

Noppel, Maximilian; Wressnegger, Christian

Abstract:

Due to the overabundance of trained parameters, modern
machine learning models are largely considered black boxes. Explanation
methods aim to shed light on the inner workings of such models and thus
can serve as debugging tools. However, recent research has demonstrated
that carefully crafted manipulations of the input or the model can
successfully fool both the model and the explanation method. In this work, we
briefly present our systematization of such explanation-aware attacks.
We categorize them according to three distinct attack types, three
scopes, and three capabilities an adversary can have. In our
full paper [12], we further present a hierarchy of robustness notions and
various defensive techniques tailored toward explanation-aware attacks.
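As a purely illustrative sketch (not taken from the paper), an input-level manipulation of the kind the abstract describes can leave a model's prediction untouched while shifting its attribution-based explanation. The toy example below assumes a linear model with a gradient-times-input explanation; perturbing the input orthogonally to the weight vector keeps the score identical but redistributes attribution mass across features.

```python
import numpy as np

# Toy linear "model": f(x) = sigmoid(w . x)  -- an assumed, minimal stand-in
w = np.array([1.0, -2.0, 0.5, 3.0])
x = np.array([0.2, 0.4, -1.0, 0.8])

def predict(x):
    return 1.0 / (1.0 + np.exp(-(w @ x)))

def explanation(x):
    # Gradient-times-input attribution; for a linear model the gradient is w.
    return w * x

# Craft a perturbation orthogonal to w: the score w . x is unchanged,
# so the prediction stays the same, but the per-feature attribution shifts.
delta = np.array([2.0, 0.0, 0.0, 0.0])
delta -= (delta @ w) / (w @ w) * w  # project out the component along w
x_adv = x + delta

assert np.isclose(predict(x), predict(x_adv))   # prediction preserved
print(explanation(x))      # original attribution
print(explanation(x_adv))  # manipulated attribution
```

This is only a sketch of the general phenomenon; the attacks systematized in the paper target far richer models and explanation methods, where the perturbation is found by optimization rather than by a closed-form projection.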


Original publication
DOI: 10.1007/978-3-031-70893-0_30
Affiliated KIT institution(s): Fakultät für Informatik (INFORMATIK)
Publication type: Conference proceedings paper
Year of publication: 2024
Language: English
Identifier: ISBN: 978-3-031-70893-0
ISSN: 0302-9743
KITopen-ID: 1000176622
HGF program: 46.23.01 (POF IV, LK 01) Methods for Engineering Secure Systems
Published in: KI 2024: Advances in Artificial Intelligence – 47th German Conference on AI, Würzburg, Germany, September 25–27, 2024, Proceedings. Ed.: A. Hotho
Event: 47th German Conference on AI (2024), Würzburg, Germany, 25.09.2024 – 27.09.2024
Publisher: Springer Nature Switzerland
Pages: 350–354
Series: Lecture Notes in Artificial Intelligence; 14992
Published online ahead of print: 30.08.2024