A survey of preference-based reinforcement learning methods

Wirth, Christian; Akrour, Riad; Neumann, Gerhard; Fürnkranz, Johannes

Abstract:

Reinforcement learning (RL) techniques optimize the accumulated long-term reward of a suitably chosen reward function. However, designing such a reward function often requires a lot of task-specific prior knowledge. The designer needs to consider different objectives that influence not only the learned behavior but also the learning progress. To alleviate these issues, preference-based reinforcement learning algorithms (PbRL) have been proposed that can learn directly from an expert's preferences instead of a hand-designed numeric reward. PbRL has gained traction in recent years due to its ability to resolve the reward shaping problem, its ability to learn from non-numeric rewards and the possibility of reducing the dependence on expert knowledge. We provide a unified framework for PbRL that describes the task formally and points out the different design principles that affect the evaluation task for the human as well as the computational complexity. The design principles include the type of feedback that is assumed, the representation that is learned to capture the preferences, the optimization problem that has to be solved, and how the exploration/exploitation problem is tackled. ...
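As a rough illustration of learning a representation of preferences instead of a hand-designed numeric reward, the sketch below fits a linear utility over trajectory features to pairwise preferences using a Bradley-Terry (logistic) likelihood, P(a preferred to b) = sigmoid(w . (phi(a) - phi(b))). This is only one common modelling choice, not the specific method of the survey. The trajectory feature vectors, the simulated expert and all function names (fit_linear_utility, simulate_expert, agreement) are hypothetical and chosen for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def fit_linear_utility(prefs, dim, lr=0.5, epochs=300, l2=1e-3):
    """Fit w so that P(a preferred to b) = sigmoid(w . (phi_a - phi_b)).

    prefs: list of (phi_a, phi_b) feature-vector pairs in which a was preferred.
    Uses batch gradient ascent on the L2-regularised log-likelihood.
    """
    w = [0.0] * dim
    n = len(prefs)
    for _ in range(epochs):
        grad = [0.0] * dim
        for phi_a, phi_b in prefs:
            diff = [a - b for a, b in zip(phi_a, phi_b)]
            p = sigmoid(dot(w, diff))
            # Gradient of log sigmoid(w . diff) with respect to w is (1 - p) * diff.
            for i in range(dim):
                grad[i] += (1.0 - p) * diff[i]
        w = [wi + lr * (gi / n - l2 * wi) for wi, gi in zip(w, grad)]
    return w


def simulate_expert(w_true, n_pairs, dim, rng):
    """Hypothetical expert: prefers whichever trajectory has the higher true utility."""
    prefs = []
    for _ in range(n_pairs):
        a = [rng.random() for _ in range(dim)]
        b = [rng.random() for _ in range(dim)]
        prefs.append((a, b) if dot(w_true, a) >= dot(w_true, b) else (b, a))
    return prefs


def agreement(w, prefs):
    """Fraction of pairs in which the learned utility ranks the preferred trajectory higher."""
    return sum(dot(w, a) > dot(w, b) for a, b in prefs) / len(prefs)


if __name__ == "__main__":
    rng = random.Random(0)
    w_true = [1.0, -2.0]  # hidden utility weights, assumed only for this simulation
    train = simulate_expert(w_true, 200, 2, rng)
    test = simulate_expert(w_true, 200, 2, rng)
    w = fit_linear_utility(train, dim=2)
    print("learned weights:", [round(x, 2) for x in w])
    print("held-out agreement:", agreement(w, test))
```

The learned weights only recover the hidden utility up to a positive scale factor, because the expert supplies orderings rather than numeric rewards; the learned utility could then serve as a surrogate reward for a standard RL algorithm.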

Download (KITopen): publisher's version

DOI: 10.5445/IR/1000118270

Published: 21 April 2020

External links

Scopus
Citations: 195

Web of Science
Citations: 97

Statistics

Page views: 171 (since 22 April 2020)

Downloads: 95 (since 3 May 2020)

Affiliated institution(s) at KIT	Institut für Anthropomatik und Robotik (IAR) (Institute for Anthropomatics and Robotics)
Publication type	Journal article
Publication year	2017
Language	English
Identifiers	ISSN: 1532-4435, 1533-7928; KITopen ID: 1000118270
Published in	Journal of Machine Learning Research
Publisher	Journal of Machine Learning Research
Volume	18
Issue	136
Pages	1–46
External relations	Abstract/full text
Keywords	Reinforcement Learning, Preference Learning, Qualitative Feedback, Markov Decision Process, Policy Search, Temporal Difference Learning, Preference-based Reinforcement Learning
Indexed in	Scopus, Web of Science
