
A Brief Systematization of Explanation-Aware Attacks

Noppel, Maximilian; Wressnegger, Christian

Abstract:

Due to the overabundance of trained parameters, modern
machine learning models are largely considered black boxes. Explanation
methods aim to shed light on the inner workings of such models and thus
can serve as debugging tools. However, recent research has demonstrated
that carefully crafted manipulations of the input or the model can
successfully fool both the model and the explanation method. In this work, we
briefly present our systematization of such explanation-aware attacks.
We categorize them according to three distinct attack types, three
scopes, and three capabilities an adversary can have. In our
full paper [12], we further present a hierarchy of robustness notions and
various defensive techniques tailored toward explanation-aware attacks.
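As a purely illustrative sketch (not taken from the paper), an input-level manipulation of the kind the abstract describes can leave a model's prediction untouched while shifting its attribution-based explanation. The toy example below assumes a linear model with a gradient-times-input explanation; perturbing the input orthogonally to the weight vector keeps the score identical but redistributes attribution mass across features.

```python
import numpy as np

# Toy linear "model": f(x) = sigmoid(w . x)  -- an assumed, minimal stand-in
w = np.array([1.0, -2.0, 0.5, 3.0])
x = np.array([0.2, 0.4, -1.0, 0.8])

def predict(x):
    return 1.0 / (1.0 + np.exp(-(w @ x)))

def explanation(x):
    # Gradient-times-input attribution; for a linear model the gradient is w.
    return w * x

# Craft a perturbation orthogonal to w: the score w . x is unchanged,
# so the prediction stays the same, but the per-feature attribution shifts.
delta = np.array([2.0, 0.0, 0.0, 0.0])
delta -= (delta @ w) / (w @ w) * w  # project out the component along w
x_adv = x + delta

assert np.isclose(predict(x), predict(x_adv))   # prediction preserved
print(explanation(x))      # original attribution
print(explanation(x_adv))  # manipulated attribution
```

This is only a sketch of the general phenomenon; the attacks systematized in the paper target far richer models and explanation methods, where the perturbation is found by optimization rather than by a closed-form projection.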


Original publication
DOI: 10.1007/978-3-031-70893-0_30
Affiliated KIT institution(s): Fakultät für Informatik (INFORMATIK)
Publication type: Conference proceedings paper
Year of publication: 2024
Language: English
Identifier: ISBN: 978-3-031-70893-0
ISSN: 0302-9743
KITopen-ID: 1000176622
HGF program: 46.23.01 (POF IV, LK 01) Methods for Engineering Secure Systems
Published in: KI 2024: Advances in Artificial Intelligence – 47th German Conference on AI, Würzburg, Germany, September 25–27, 2024, Proceedings. Ed.: A. Hotho
Event: 47th German Conference on AI (2024), Würzburg, Germany, 25.09.2024 – 27.09.2024
Publisher: Springer Nature Switzerland
Pages: 350–354
Series: Lecture Notes in Artificial Intelligence; 14992
Published online ahead of print: 30.08.2024