Reinforcement Learning CSE 446 – Winter 2012
Summary of MDPs (until Now)
Finite-horizon MDPs
– Non-stationary policy
– Value iteration: compute V_0, ..., V_k, ..., V_T, where V_k is the value function for k stages to go.
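The finite-horizon recursion can be sketched in a few lines of Python. This is a minimal illustration rather than the course's reference code. The array layout (`P[a, s, s']` for transitions, `R[s, a]` for expected rewards), the function name, the absence of discounting, and the boundary condition that the value with 0 stages to go is zero are all assumptions made for the example.

```python
import numpy as np

def finite_horizon_value_iteration(P, R, T):
    """Backward induction for a finite-horizon MDP (illustrative sketch).

    P: array of shape (A, S, S), P[a, s, t] = Pr(next state t | state s, action a)
    R: array of shape (S, A), expected immediate reward (assumed layout)
    T: horizon (number of stages)
    Returns V of shape (T+1, S) and policy of shape (T+1, S), both indexed
    by the number of stages to go; policy[0] is unused.
    """
    n_actions, n_states, _ = P.shape
    V = np.zeros((T + 1, n_states))          # assumption: V[0] = 0
    policy = np.zeros((T + 1, n_states), dtype=int)
    for k in range(1, T + 1):
        # Q[s, a] = R[s, a] + sum_t P[a, s, t] * V[k-1][t]  (no discounting)
        Q = R + np.einsum("ast,t->sa", P, V[k - 1])
        V[k] = Q.max(axis=1)
        policy[k] = Q.argmax(axis=1)
    return V, policy

# Hypothetical 2-state, 2-action MDP: action 0 stays put, action 1 switches state.
# Only "stay in state 1" pays a reward of 1.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.0, 1.0], [1.0, 0.0]]])
R = np.array([[0.0, 0.0],
              [1.0, 0.0]])
V, policy = finite_horizon_value_iteration(P, R, T=3)
```

In this toy problem the best action in state 0 depends on how many stages remain: with one stage to go, switching earns nothing, while with more stages to go it pays to move to state 1 first. That dependence on the stage index is what makes the optimal policy non-stationary.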
Summary of MDPs (until Now)