
Versatile Inverse Reinforcement Learning via Cumulative Rewards

Freymuth, Niklas; Becker, Philipp; Neumann, Gerhard

Abstract:

Inverse Reinforcement Learning infers a reward function from expert demonstrations, aiming to encode the behavior and intentions of the expert. Current approaches usually do this with generative and uni-modal models, meaning that they encode a single behavior. In the common setting, where there are various solutions to a problem and the experts show versatile behavior, this severely limits the generalization capabilities of these methods. We propose a novel method for Inverse Reinforcement Learning that overcomes these problems by formulating the recovered reward as a sum of iteratively trained discriminators. We show on simulated tasks that our approach is able to recover general, high-quality reward functions and produces policies of the same quality as behavioral cloning approaches designed for versatile behavior.
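
The idea of a reward formed as a sum of iteratively trained discriminators can be illustrated with a minimal Python sketch. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how each discriminator is trained or what it outputs, so the `CumulativeReward` class, the `Discriminator` type, and the two hand-written stand-in functions below are all hypothetical.

```python
from typing import Callable, List

# Hypothetical type: a discriminator maps a sample (here a float) to a score.
Discriminator = Callable[[float], float]


class CumulativeReward:
    """Reward defined as the sum of all discriminators added so far."""

    def __init__(self) -> None:
        self.discriminators: List[Discriminator] = []

    def add(self, discriminator: Discriminator) -> None:
        # Assumed usage: called once per iteration with the new discriminator.
        self.discriminators.append(discriminator)

    def __call__(self, x: float) -> float:
        # With no discriminators the sum is 0.0, i.e. a flat reward.
        return sum((d(x) for d in self.discriminators), 0.0)


reward = CumulativeReward()
# Stand-ins for trained discriminators (illustrative only, not learned).
reward.add(lambda x: -(x - 1.0) ** 2)
reward.add(lambda x: 0.5 * x)
```

In this sketch earlier discriminators are never discarded, so each iteration only adds a term to the reward rather than replacing it.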


Postprint
DOI: 10.5445/IR/1000140287
Published on 10 December 2021
Affiliated institution(s) at KIT: Institut für Anthropomatik und Robotik (IAR)
Publication type: Proceedings contribution
Publication date: 14 December 2021
Language: English
Identifier: KITopen-ID: 1000140287
Published in: NeurIPS 2021 Workshop on Robot Learning: Self-Supervised and Lifelong Learning, Virtual
Event: 35th Conference on Neural Information Processing Systems (NeurIPS 2021), Online, 6 December 2021 – 14 December 2021
Publication note: Workshop on Robot Learning: Self-Supervised and Lifelong Learning. 6 December 2021
Indexed in: arXiv