Reinforcement learning has the potential to improve classical control design methods in numerous applications. However, tracking control remains a challenge. A target that varies over time can cause the learning process to fail, since the agent cannot distinguish between trajectory dynamics and system dynamics. A value function can only be constructed if the control target is assumed to be constant.
To solve this problem, we propose manipulating the state-action-reward-state tuples used for training so that each tuple simulates a constant target. We further demonstrate that this mechanism can be used to shift exploration noise onto the trajectory. We successfully apply the presented reinforcement learning algorithm to speed control with varying setpoints, both on a simulation model and on a real-world road vehicle.
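The tuple manipulation described above can be sketched as follows. This is a minimal illustration under assumptions not stated in the abstract: we assume the state is augmented with the current setpoint as its last component, and the function name and state layout are hypothetical, not the paper's implementation.

```python
import numpy as np

def freeze_target(transition):
    """Hypothetical sketch: rewrite a (s, a, r, s') tuple so the
    setpoint appears constant within the tuple.

    Assumes the last component of each state vector holds the
    time-varying setpoint; the next state's setpoint is overwritten
    with the current one, so the agent never observes a setpoint
    change inside a single training tuple.
    """
    s, a, r, s_next = transition
    s_next = np.array(s_next, dtype=float).copy()
    s_next[-1] = s[-1]  # copy the current setpoint into the next state
    return (s, a, r, s_next)

# Toy speed-control example: state = [vehicle speed, setpoint].
# The setpoint changes from 12.0 to 13.0 between steps ...
raw = (np.array([10.0, 12.0]), 0.5, -2.0, np.array([10.5, 13.0]))
# ... but the stored training tuple sees a constant setpoint of 12.0.
s, a, r, s_next = freeze_target(raw)
```

Because the manipulation acts only on stored tuples, the real closed-loop trajectory is unaffected; only the data the value function is trained on presents a piecewise-constant target.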