
A survey of preference-based reinforcement learning methods

Wirth, Christian; Akrour, Riad; Neumann, Gerhard; Fürnkranz, Johannes

Abstract:
Reinforcement learning (RL) techniques optimize the accumulated long-term reward of a suitably chosen reward function. However, designing such a reward function often requires a lot of task-specific prior knowledge. The designer needs to consider different objectives that influence not only the learned behavior but also the learning progress. To alleviate these issues, preference-based reinforcement learning (PbRL) algorithms have been proposed that can learn directly from an expert's preferences instead of a hand-designed numeric reward. PbRL has gained traction in recent years due to its ability to resolve the reward shaping problem, its ability to learn from non-numeric rewards, and the possibility of reducing the dependence on expert knowledge. We provide a unified framework for PbRL that describes the task formally and points out the different design principles that affect the evaluation task for the human as well as the computational complexity. The design principles include the type of feedback that is assumed, the representation that is learned to capture the preferences, the optimization problem that has to be solved, and how the exploration/exploitation problem is tackled. ...
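
As an illustration of what "learning from preferences instead of a numeric reward" can look like, the following is a minimal sketch and not a method taken from the survey itself. It assumes a linear utility over hand-picked trajectory features and a logistic (Bradley-Terry style) preference model. The feature names, the sample preferences, and the functions `fit_utility` and `utility` are all hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def utility(w, phi):
    """Linear utility u(t) = w . phi(t) of a trajectory with features phi."""
    return sum(wi * xi for wi, xi in zip(w, phi))


def fit_utility(preferences, dim, lr=0.5, epochs=200):
    """Fit utility weights from pairwise preferences (assumed model).

    preferences: list of (phi_preferred, phi_other) feature-vector pairs.
    Assumes P(preferred > other) = sigmoid(u(preferred) - u(other)).
    """
    w = [0.0] * dim
    for _ in range(epochs):
        for better, worse in preferences:
            diff = [b - c for b, c in zip(better, worse)]
            p = sigmoid(utility(w, diff))
            # Gradient of log p with respect to w is (1 - p) * diff.
            w = [wi + lr * (1.0 - p) * di for wi, di in zip(w, diff)]
    return w


# Hypothetical trajectory features: (progress toward goal, energy used).
# Each pair reads "the first trajectory was preferred over the second".
prefs = [
    ((1.0, 0.2), (0.3, 0.1)),
    ((0.8, 0.1), (0.8, 0.9)),
    ((0.9, 0.5), (0.2, 0.4)),
]
w = fit_utility(prefs, dim=2)
print("learned weights:", w)
```

The learned utility could then stand in for a reward signal when ranking or optimizing trajectories. How that is done, and which feedback types and representations are appropriate, is what the survey's design principles address.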


Publisher's version
DOI: 10.5445/IR/1000118270
Published on: 21 April 2020
Scopus citations: 23
Web of Science citations: 7
Affiliated institution(s) at KIT: Institut für Anthropomatik und Robotik (IAR)
Publication type: Journal article
Publication year: 2017
Language: English
Identifier: ISSN: 1532-4435, 1533-7928
KITopen-ID: 1000118270
Published in: Journal of Machine Learning Research
Volume: 18
Issue: 136
Pages: 1–46
External relations: Abstract/Full text
Keywords: Reinforcement Learning, Preference Learning, Qualitative Feedback, Markov Decision Process, Policy Search, Temporal Difference Learning, Preference-based Reinforcement Learning
Indexed in: Web of Science, Scopus