Target Value Criterion in Markov Decision Processes

Moritz, Lars Norman

We apply the Target Value Criterion to an MDP with a random planning horizon, derive an optimality equation, and prove the existence of an optimal stationary policy in a generalized state space. The structure of the value function is exploited to approximate the target space by a finite subset and to derive upper and lower bounds as well as nearly optimal policies. As an extension, we combine the Total Reward Criterion and the Target Value Criterion in a penalty approach.

Affiliated institution(s) at KIT: Institut für Operations Research (IOR)
Publication type: Thesis
Year: 2014
Language: English
Identifier: URN: urn:nbn:de:swb:90-472888
KITopen ID: 1000047288
Publisher: Karlsruhe
Degree type: Dissertation
Faculty: Fakultät für Wirtschaftswissenschaften (WIWI)
Institute: Institut für Operations Research (IOR)
Examination date: 4 December 2014
Referee/Supervisor: Prof. K.-H. Waldmann
Keywords: Markov Decision Processes, Risk Sensitive Optimization, Target Value Criterion
