Reinforcement learning provides a way to learn optimal controllers, but it struggles when the state is only partially observed. This is the case for the actuator dynamics in vehicle longitudinal control, which translate a controller output into an acceleration. We propose a structure for approximating the value function that exploits this property: it allows the critic to learn an observer in the form of a finite impulse response filter and reduces the variance in temporal-difference learning. We show that our approach learns faster and with higher precision than an agent that ignores the unobserved states, even when using a realistic simulation model of the longitudinal dynamics of a road vehicle with disturbances.
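The core idea above could be sketched as follows: since the unobserved actuator state is driven by past controller outputs, the critic can reconstruct it with a learnable finite impulse response filter over a short history of those outputs. This is a minimal, hypothetical illustration; the class name, tap count, and linear value form are assumptions, not the paper's exact architecture.

```python
import numpy as np

class FIRCritic:
    """Hypothetical critic that augments the observed state with an
    FIR-filtered history of past controller outputs, serving as a
    learned observer for the unobserved actuator state."""

    def __init__(self, n_taps=5, seed=0):
        rng = np.random.default_rng(seed)
        # Learnable FIR taps acting on the last n_taps controller outputs.
        self.fir = rng.normal(scale=0.1, size=n_taps)
        # Value weights on [observed state, estimated actuator state].
        self.w = rng.normal(scale=0.1, size=2)
        self.b = 0.0

    def value(self, obs, u_hist):
        # FIR estimate of the actuator state from the control history.
        a_est = float(np.dot(self.fir, u_hist[-len(self.fir):]))
        # Simple linear value function over the augmented state.
        return float(self.w[0] * obs + self.w[1] * a_est + self.b)

critic = FIRCritic()
v = critic.value(obs=1.0, u_hist=np.zeros(5))
```

Because the FIR taps are part of the critic's parameters, they can be trained jointly with the value weights by ordinary temporal-difference updates.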