
Interpretable World Model Imaginations as Deep Reinforcement Learning Explanation

Wenninghoff, Nils 1; Schwammberger, Maike 1
1 Institut für Informationssicherheit und Verlässlichkeit (KASTEL), Karlsruher Institut für Technologie (KIT)

Abstract:

Explainable Deep Reinforcement Learning aims to clarify the decision-making processes of agents. Recent world model-based approaches, such as Dreamer, train agents through “imagination,” where the actor learns by interacting with a learned world model that simulates the environment. Consequently, the overall performance of these systems depends not only on the learned actor but also on the fidelity of the world model’s representation. Effective explanations should, therefore, incorporate the learned dynamics of the environment.
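
For intuition, a minimal sketch of such an imagination rollout follows; the callables and their names (dynamics, reward_fn, actor) are illustrative assumptions and not Dreamer's actual API.

from typing import Callable, List, Tuple

def imagine_rollout(
    dynamics: Callable,    # learned model: (latent_state, action) -> next latent state
    reward_fn: Callable,   # learned model: latent_state -> predicted reward
    actor: Callable,       # policy: latent_state -> action
    start_state,
    horizon: int = 15,
) -> Tuple[List, List, List]:
    """Let the actor act inside the learned world model instead of the real environment."""
    states, actions, rewards = [start_state], [], []
    state = start_state
    for _ in range(horizon):
        action = actor(state)             # policy picks an action from the imagined state
        state = dynamics(state, action)   # world model predicts the next latent state
        states.append(state)
        actions.append(action)
        rewards.append(reward_fn(state))  # world model predicts the reward for that state
    return states, actions, rewards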

In this work, we propose a method that leverages the imagination technique from the training process to generate stepwise, contrastive explanations during inference. Our approach systematically compares predicted states, actions, and value and reward estimates to evaluate the observed trajectory. This analysis provides insights into whether failures arise from inaccuracies in the world model, errors in value estimation, or deficiencies in reward prediction. We demonstrate the effectiveness of our method across multiple goal-oriented tasks.
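
A hedged sketch of this stepwise comparison, assuming simple callables for the learned dynamics, reward model, and critic (all names here are illustrative placeholders, not the paper's implementation):

from typing import Callable, Dict, List

def stepwise_diagnosis(
    dynamics: Callable,      # learned model: (state, action) -> predicted next state
    reward_model: Callable,  # learned model: state -> predicted reward
    value_fn: Callable,      # critic: state -> value estimate
    state_error: Callable,   # distance between predicted and observed states
    trajectory: List[Dict],  # observed steps: {"state", "action", "next_state", "reward"}
    gamma: float = 0.99,
) -> List[Dict]:
    """Contrast the model's per-step predictions with what was actually observed."""
    report = []
    for step in trajectory:
        pred_next = dynamics(step["state"], step["action"])
        report.append({
            # large value -> the world model mispredicted the transition
            "model_error": state_error(pred_next, step["next_state"]),
            # large value -> the reward head mispredicted the obtained reward
            "reward_error": abs(reward_model(pred_next) - step["reward"]),
            # large magnitude -> the critic's value estimate was inconsistent (TD error)
            "value_error": step["reward"] + gamma * value_fn(step["next_state"])
                           - value_fn(step["state"]),
        })
    return report

Such a report can then be inspected step by step to see which component (dynamics, reward prediction, or value estimation) most plausibly explains a failure.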


Publisher's version
DOI: 10.5445/IR/1000186426
Published on 04.11.2025
Associated KIT institution(s): Institut für Informationssicherheit und Verlässlichkeit (KASTEL)
Publication type: Book chapter
Publication year: 2026
Language: English
Identifier: ISBN 978-3-032-08327-2
ISSN: 1865-0929
KITopen-ID: 1000186426
Published in: Explainable Artificial Intelligence – Third World Conference, xAI 2025, Istanbul, Turkey, July 9–11, 2025, Proceedings, Part III. Ed.: R. Guidotti
Publisher: Springer Nature Switzerland
Pages: 140–161
Series: Communications in Computer and Information Science
Published online ahead of print on 12.10.2025
Keywords: Explainable Deep Reinforcement Learning, Reinforcement Learning, Explainability, Contrastive Explanation, World Model
Indexed in: Scopus, Dimensions, OpenAlex
Sustainable Development Goals: Goal 10 – Reduced Inequalities; Goal 16 – Peace, Justice and Strong Institutions