
Reinforcement Learning and Motion Planning

Mrinal Kalakrishnan, University of Southern California

August 25, 2010


Reinforcement Learning

- Holy grail of learning for robotics
- Curse of dimensionality...
- Trajectory-based RL
  - High dimensions
  - Continuous states and actions
  - State-of-the-art: Policy Improvement with Path Integrals (Theodorou et al., 2010)


Motion Planning

- Sampling-based planners
  - Solve very difficult problems
  - Jerky paths, require smoothing
  - Feasible paths, not optimal
- Optimization-based planners
  - CHOMP (Ratliff et al., 2009)
  - Covariant gradient descent
  - Smooth trajectories
  - Solves "easy" problems
  - Local minima


Our New Motion Planner

Apply PI2 to motion planning

- Create a new policy: ẋ = K(u − x)
  - Control command u(t) = state x(t+1)
- Quadratic control cost: u^T R u
- R = A^T A (construction sketched below):
  - A is an acceleration differentiation matrix
  - R measures squared accelerations
- Cost = control cost + state costs
- State costs can include:
  - Collision cost
  - Energy cost
  - Constraint violation cost
  - Need not be differentiable!
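For concreteness, here is a minimal NumPy sketch of how R = A^T A could be assembled from a finite-difference acceleration matrix for a single joint. The stencil, boundary handling, and names (acceleration_matrix, n, dt) are illustrative assumptions for this sketch, not the planner's actual code.

```python
import numpy as np

def acceleration_matrix(n, dt=0.1):
    """Finite-difference acceleration matrix A for a trajectory of n waypoints.

    Interior rows approximate x''(t_i) ~ (x[i-1] - 2*x[i] + x[i+1]) / dt**2.
    """
    A = np.zeros((n, n))
    for i in range(1, n - 1):
        A[i, i - 1:i + 2] = np.array([1.0, -2.0, 1.0]) / dt**2
    # Simple full-rank boundary handling so that R = A^T A is invertible;
    # a real planner might instead pad the trajectory with the fixed start/goal.
    A[0, 0] = A[-1, -1] = 1.0 / dt**2
    return A

n = 50                           # waypoints for a single joint
A = acceleration_matrix(n)
R = A.T @ A                      # control-cost metric

u = np.linspace(0.0, 1.0, n)     # a straight-line sequence of control commands
control_cost = u @ R @ u         # quadratic control cost u^T R u (squared accelerations)

# R also shapes the exploration noise used later: Sigma = R^{-1} concentrates
# noise in the middle of the trajectory and keeps it small near the endpoints.
Sigma = np.linalg.inv(R)
```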


Collision cost

- Distance field / distance transform
  - Answers clearance and penetration depth queries
- Voxelize robot body and add up costs for each voxel (see the sketch below)
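To illustrate the distance-field idea, the sketch below builds an unsigned Euclidean distance field from an occupancy grid using SciPy's distance_transform_edt and sums a simple hinge cost over robot body voxels. The clearance threshold, the hinge shape, and the body_voxels example are assumptions; the planner's actual cost shaping (and its use of a signed field for penetration depth) may differ.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def build_distance_field(occupancy, resolution):
    """Distance (in meters) from every voxel to the nearest occupied voxel."""
    # distance_transform_edt measures distance to the nearest zero entry,
    # so pass the free-space mask (True = free, False = obstacle).
    return distance_transform_edt(occupancy == 0, sampling=resolution)

def collision_cost(body_voxels, dist_field, clearance=0.05):
    """Sum a hinge cost over all robot body voxels.

    Voxels farther than `clearance` from any obstacle cost nothing; voxels
    that are closer (or inside an obstacle, distance 0) cost proportionally more.
    """
    d = dist_field[tuple(body_voxels.T)]      # clearance at each body voxel
    return np.sum(np.maximum(clearance - d, 0.0))

# Toy 3-D world: one box obstacle in a 1 m cube at 1 cm resolution.
occupancy = np.zeros((100, 100, 100), dtype=np.uint8)
occupancy[40:60, 40:60, 40:60] = 1
dist_field = build_distance_field(occupancy, resolution=0.01)

# Hypothetical indices of voxels currently occupied by the robot body.
body_voxels = np.array([[10, 10, 10], [45, 50, 50], [62, 50, 50]])
print(collision_cost(body_voxels, dist_field))
```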


The algorithm

- Generate initial straight-line trajectory
- Repeat until convergence:
  - Create noisy rollouts around the trajectory (noise does not modify start or goal due to Σ = R^-1!)
  - Compute costs for each rollout
  - Apply PI2 update: reward-weighted average (see the sketch after this list)
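The loop below is a simplified Python sketch of one such iteration, assuming a single joint and one total cost per rollout; the full PI2 update (Theodorou et al., 2010) instead weights the noise per time step using the cost-to-go and then smooths the result. Names like pi2_style_update, cost_fn, and h are illustrative.

```python
import numpy as np

def pi2_style_update(theta, cost_fn, R_inv, num_rollouts=20, h=10.0, rng=None):
    """One schematic iteration: sample noise with covariance R^{-1}, then apply a
    cost-weighted (softmin) average of that noise as the trajectory update."""
    rng = np.random.default_rng() if rng is None else rng
    n = theta.shape[0]

    # Sigma = R^{-1} makes the exploration noise cheap under the control cost
    # and keeps the start and goal essentially unchanged.
    eps = rng.multivariate_normal(np.zeros(n), R_inv, size=num_rollouts)
    costs = np.array([cost_fn(theta + e) for e in eps])

    # Exponentiate normalized costs into probabilities (lower cost -> higher weight).
    s = (costs - costs.min()) / max(costs.max() - costs.min(), 1e-10)
    probs = np.exp(-h * s)
    probs /= probs.sum()

    # Reward-weighted average of the sampled noise.
    return theta + probs @ eps

# Usage sketch: start from a straight line between fixed start/goal and iterate.
# theta = np.linspace(start, goal, n)
# for _ in range(50):
#     theta = pi2_style_update(theta, cost_fn, R_inv)
```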


The algorithm (illustration)

[Figure sequence: initial trajectory → noisy rollouts sampled around it → updated trajectory]


Video: Pole

Updated trajectory


Video: Test Setup


Test Results

Condition       Success rate
Unconstrained   39 / 42
Constrained     38 / 42


Video: Real-world


Conclusion

- Optimization-based motion planner that does not require gradients
- Generates collision-free, smooth trajectories
- Optimizes arbitrary secondary criteria (constraints, torques)
- May handle local minima better than CHOMP (needs further testing)
- ICRA 2011 submission pending
- Code is in the optimization_motion_planning package, coming soon to a sandbox near you...


Future Work

- Torque optimality
- Trajectory libraries, cached plans

Thanks:
- Sachin Chitta
- Peter Pastor
- Willow Garage
