
Versatile Inverse Reinforcement Learning via Cumulative Rewards

Freymuth, Niklas; Becker, Philipp; Neumann, Gerhard

Abstract:

Inverse Reinforcement Learning infers a reward function from expert demonstrations, aiming to encode the behavior and intentions of the expert. Current approaches usually do this with generative and uni-modal models, meaning that they encode a single behavior. In the common setting where there are multiple solutions to a problem and the experts exhibit versatile behavior, this severely limits the generalization capabilities of these methods. We propose a novel method for Inverse Reinforcement Learning that overcomes these problems by formulating the recovered reward as a sum of iteratively trained discriminators. We show on simulated tasks that our approach is able to recover general, high-quality reward functions and produces policies of the same quality as behavioral cloning approaches designed for versatile behavior.
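The core idea in the abstract, recovering the reward as a sum of iteratively trained discriminators, can be illustrated with a minimal sketch. All names here (`train_discriminator`, `cumulative_reward`, the 1-D threshold classifier, and the crude policy-improvement step) are hypothetical stand-ins, not the paper's actual architecture or training procedure:

```python
import numpy as np

rng = np.random.default_rng(0)

def train_discriminator(expert_states, policy_states):
    # Toy "discriminator": a 1-D threshold classifier separating expert
    # samples from current-policy samples. A stand-in for the learned
    # neural discriminators described in the abstract.
    threshold = (expert_states.mean() + policy_states.mean()) / 2.0
    sign = 1.0 if expert_states.mean() > policy_states.mean() else -1.0
    return lambda s: sign * (s - threshold)  # logit > 0 means "expert-like"

def cumulative_reward(discriminators, state):
    # The recovered reward is the SUM of all discriminators trained so far.
    return sum(d(state) for d in discriminators)

# Iterative loop: each round, fit a discriminator against the current
# policy's samples, add it to the cumulative reward, then nudge the
# "policy" toward the expert (a crude stand-in for an RL update).
expert = rng.normal(loc=2.0, scale=0.3, size=256)
policy_mean = -2.0
discriminators = []
for _ in range(5):
    policy = rng.normal(loc=policy_mean, scale=0.3, size=256)
    discriminators.append(train_discriminator(expert, policy))
    policy_mean += 0.5 * (expert.mean() - policy_mean)  # improvement step
```

Under this toy setup, the summed reward scores expert-like states higher than states typical of the initial policy, which is the property the iterative construction is meant to provide.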

Associated KIT institution(s): Institut für Anthropomatik und Robotik (IAR)
Publication type: Conference proceedings contribution
Publication date: 14.12.2021
Language: English
Identifier: KITopen-ID: 1000140287
Published in: NeurIPS 2021 Workshop on Robot Learning: Self-Supervised and Lifelong Learning, Virtual
Event: 35th Annual Conference on Neural Information Processing Systems (NeurIPS 2021), Online, 06.12.2021 – 14.12.2021
Publication note: Workshop on Robot Learning: Self-Supervised and Lifelong Learning, 06.12.2021
Indexed in: arXiv

Postprint
DOI: 10.5445/IR/1000140287
Published on: 10.12.2021