Model-Free Trajectory-based Policy Optimization with Monotonic Improvement

Akrour, R.; Abdolmaleki, A.; Abdulsamad, H.; Peters, J.; Neumann, Gerhard

Abstract:

Many recent trajectory optimization algorithms alternate between a linear approximation of the system dynamics around the mean trajectory and a conservative policy update. One way of constraining the policy change is to bound the Kullback-Leibler (KL) divergence between successive policies. These approaches have already demonstrated great experimental success on challenging problems such as end-to-end control of physical systems. However, the linear approximation of the system dynamics can introduce a bias in the policy update and prevent convergence to the optimal policy. In this article, we propose a new model-free trajectory-based policy optimization algorithm with guaranteed monotonic improvement. Instead of a model of the system dynamics, the algorithm backpropagates a local, quadratic and time-dependent Q-Function learned from trajectory data. Our policy update ensures exact KL-constraint satisfaction without simplifying assumptions on the system dynamics. We experimentally demonstrate on highly non-linear control tasks that our algorithm improves performance over approaches that linearize the system dynamics. To show the monotonic improvement of our algorithm, we additionally conduct a theoretical analysis of our policy update scheme and derive a lower bound on the change in policy return between successive iterations.
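
As a rough illustration of the KL bound mentioned in the abstract, the sketch below computes the closed-form KL divergence between two univariate Gaussian action distributions and checks it against a step size. It is not the authors' implementation: the scalar-action setting, the function names `gaussian_kl` and `within_kl_bound`, the threshold `epsilon`, and the direction KL(new || old) are all assumptions made for this example.

```python
import math


def gaussian_kl(mean_p, std_p, mean_q, std_q):
    """Closed-form KL(p || q) for p = N(mean_p, std_p^2) and q = N(mean_q, std_q^2)."""
    if std_p <= 0 or std_q <= 0:
        raise ValueError("standard deviations must be positive")
    return (
        math.log(std_q / std_p)
        + (std_p ** 2 + (mean_p - mean_q) ** 2) / (2.0 * std_q ** 2)
        - 0.5
    )


def within_kl_bound(new_policy, old_policy, epsilon):
    """Check KL(new || old) <= epsilon; each policy is a (mean, std) pair.

    The direction of the divergence and the name epsilon are assumptions
    of this sketch, not taken from the record above.
    """
    return gaussian_kl(*new_policy, *old_policy) <= epsilon
```

For instance, `within_kl_bound((0.1, 1.0), (0.0, 1.0), 0.01)` accepts a small shift of the mean, while a shift of the mean by 1.0 at unit variance gives a divergence of 0.5 and would be rejected under the same bound.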

Affiliated institution(s) at KIT	Institut für Anthropomatik und Robotik (IAR)
Publication type	Journal article
Publication year	2018
Language	English
Identifier	ISSN: 1532-4435, 1533-7928; KITopen-ID: 1000118268
Published in	Journal of Machine Learning Research
Publisher	Journal of Machine Learning Research
Volume	19
Issue	14
Pages	1–25
External relations	Abstract/Full text
Keywords	Reinforcement Learning, Policy Optimization, Trajectory Optimization, Robotics
Indexed in	Scopus, Web of Science

KITopen download

Publisher's version

DOI: 10.5445/IR/1000118268

Published on 15 April 2020

External links

Scopus
Citations: 13

Web of Science
Citations: 12
