Recently proposed Adaptive Dynamic Programming (ADP) tracking controllers assume that the reference trajectory follows time-invariant exo-system dynamics, an assumption that does not hold in many applications. To overcome this limitation, we propose a new Q-function which explicitly incorporates a parametrized approximation of the reference trajectory. This allows a general class of trajectories to be tracked by means of ADP. Once our Q-function has been learned, the associated controller handles time-varying reference trajectories without further training and independently of the exo-system dynamics. After proposing this general model-free off-policy tracking method, we analyze the important special case of linear quadratic tracking. An example demonstrates that our new method successfully learns the optimal tracking controller and outperforms existing approaches in terms of tracking error and cost.
Model-free control based on the idea of Reinforcement Learning is a promising approach that has recently gained extensive attention. However, Reinforcement-Learning-based control methods either focus solely on the regulation problem or learn to track a reference that is generated by a time-invariant exo-system. In the latter case, controllers can only track the time-invariant reference dynamics on which they have been trained and need to be re-trained each time the reference dynamics change. Consequently, these methods fail in applications where the trajectory is clearly not generated by an exo-system. One prominent example is autonomous driving. This paper provides, for the first time, an adaptive optimal control method capable of tracking arbitrary reference trajectories that are provided on a moving horizon. The main innovation is a novel Q-function that directly incorporates the given reference trajectory. This new Q-function exhibits a particular structure which allows the design of an efficient, iterative, provably convergent Reinforcement Learning algorithm that enables optimal tracking. Two real-world examples demonstrate the effectiveness of our new method.
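Both abstracts center on a Q-function that takes a preview window of the reference trajectory as an additional input, so the learned controller can follow new, time-varying references without re-training. The following toy sketch illustrates that idea for the linear quadratic tracking special case. Everything here is an illustrative assumption, not the papers' actual algorithm: the scalar plant, the window length, the cost weights, and the use of plain fitted Q-iteration with a quadratic feature model are all choices made for this sketch only.

```python
# Toy sketch: learning a tracking Q-function whose input includes a preview
# window of the reference, so the resulting controller handles references it
# never saw during training. All parameters below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

a, b = 0.9, 0.5      # assumed scalar plant: x_{k+1} = a*x_k + b*u_k
N = 3                # assumed reference preview-window length
gamma = 0.9          # discount factor (assumption)
qu = 0.1             # input weight in the stage cost (assumption)
d = 1 + N            # augmented input to Q: z_k = [x_k, r_k, ..., r_{k+N-1}]

def features(w):
    """Quadratic monomials of w plus a bias, so Q = w^T H w + c0 is linear in them."""
    i, j = np.triu_indices(len(w))
    scale = np.where(i == j, 1.0, 2.0)   # off-diagonal terms appear twice in w^T H w
    return np.append(np.outer(w, w)[i, j] * scale, 1.0)

# Off-policy data: random exploratory inputs, random reference samples.
M = 3000
ref = rng.normal(size=M + N + 1)
W, costs, Znext = [], [], []
x = 0.0
for k in range(M):
    z = np.concatenate(([x], ref[k:k + N]))
    u = rng.normal()
    W.append(np.concatenate((z, [u])))
    costs.append((x - ref[k]) ** 2 + qu * u ** 2)
    x = a * x + b * u                    # plant is used only to generate data
    Znext.append(np.concatenate(([x], ref[k + 1:k + 1 + N])))
W, costs, Znext = np.array(W), np.array(costs), np.array(Znext)
Phi = np.array([features(w) for w in W])

# Fitted Q-iteration on the quadratic Q-matrix H (plus intercept c0).
H, c0 = np.zeros((d + 1, d + 1)), 0.0
for _ in range(60):
    Hzz, Hzu, Huu = H[:d, :d], H[:d, d:], H[d:, d:]
    P = Hzz - Hzu @ np.linalg.pinv(Huu) @ Hzu.T   # min_u Q(z,u) = z^T P z + c0
    targets = costs + gamma * (np.einsum('ij,jk,ik->i', Znext, P, Znext) + c0)
    theta, *_ = np.linalg.lstsq(Phi, targets, rcond=None)
    U = np.zeros((d + 1, d + 1))
    U[np.triu_indices(d + 1)] = theta[:-1]        # rebuild symmetric H
    H, c0 = U + U.T - np.diag(np.diag(U)), theta[-1]

# Greedy preview controller: u = -Huu^{-1} Huz z, a static gain on [x, window].
K = -np.linalg.pinv(H[d:, d:]) @ H[:d, d:].T

# Deploy on a reference never seen in training -- no re-training needed.
T = 200
r_test = np.sin(0.1 * np.arange(T + N))
x, sq_err = 0.0, []
for k in range(T):
    z = np.concatenate(([x], r_test[k:k + N]))
    u = K[0] @ z
    sq_err.append((x - r_test[k]) ** 2)
    x = a * x + b * u
mse = float(np.mean(sq_err))
```

The key point the sketch shares with the abstracts is structural: because the reference window is part of the Q-function's input, the minimizing control is a fixed mapping from the current state and the currently provided window, so a sinusoid the controller never trained on can be tracked by simply feeding it the new window each step.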