Using Inaccurate Models in Reinforcement Learning
Pieter Abbeel, Morgan Quigley and Andrew Y. Ng
Stanford University
7. Experiments
6. Theoretical Guarantees
Let the local policy improvement algorithm be policy gradient.
Notes: These assumptions are insufficient to give.