
Reinforcement Learning and Motion Planning

Mrinal Kalakrishnan, University of Southern California

August 25, 2010


Reinforcement Learning

- Holy grail of learning for robotics
- Curse of dimensionality...
- Trajectory-based RL
  - High dimensions
  - Continuous states and actions
  - State-of-the-art: Policy Improvement with Path Integrals (Theodorou et al., 2010)


Motion Planning

- Sampling-based planners
  - Solve very difficult problems
  - Jerky paths, require smoothing
  - Feasible paths, not optimal
- Optimization-based planners
  - CHOMP (Ratliff et al., 2009)
  - Covariant gradient descent
  - Smooth trajectories
  - Solves "easy" problems
  - Local minima


Our New Motion Planner

Apply PI2 to motion planning

- Create a new policy: ẋ = K(u − x)
  - Control command u(t) = state x(t+1)
- Quadratic control cost: u^T R u
- R = A^T A (construction sketched below):
  - A is an acceleration differentiation matrix
  - R measures squared accelerations
- Cost = control cost + state costs
- State costs can include:
  - Collision cost
  - Energy cost
  - Constraint violation cost
  - Need not be differentiable!
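For concreteness, here is a minimal NumPy sketch of how R = A^T A could be assembled from a finite-difference acceleration matrix for a single joint. The stencil, boundary handling, and names (acceleration_matrix, n, dt) are illustrative assumptions for this sketch, not the planner's actual code.

```python
import numpy as np

def acceleration_matrix(n, dt=0.1):
    """Finite-difference acceleration matrix A for a trajectory of n waypoints.

    Interior rows approximate x''(t_i) ~ (x[i-1] - 2*x[i] + x[i+1]) / dt**2.
    """
    A = np.zeros((n, n))
    for i in range(1, n - 1):
        A[i, i - 1:i + 2] = np.array([1.0, -2.0, 1.0]) / dt**2
    # Simple full-rank boundary handling so that R = A^T A is invertible;
    # a real planner might instead pad the trajectory with the fixed start/goal.
    A[0, 0] = A[-1, -1] = 1.0 / dt**2
    return A

n = 50                           # waypoints for a single joint
A = acceleration_matrix(n)
R = A.T @ A                      # control-cost metric

u = np.linspace(0.0, 1.0, n)     # a straight-line sequence of control commands
control_cost = u @ R @ u         # quadratic control cost u^T R u (squared accelerations)

# R also shapes the exploration noise used later: Sigma = R^{-1} concentrates
# noise in the middle of the trajectory and keeps it small near the endpoints.
Sigma = np.linalg.inv(R)
```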


Collision cost

- Distance field / distance transform
  - Answers clearance and penetration depth queries
- Voxelize robot body and add up costs for each voxel (see the sketch below)
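To illustrate the distance-field idea, the sketch below builds an unsigned Euclidean distance field from an occupancy grid using SciPy's distance_transform_edt and sums a simple hinge cost over robot body voxels. The clearance threshold, the hinge shape, and the body_voxels example are assumptions; the planner's actual cost shaping (and its use of a signed field for penetration depth) may differ.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def build_distance_field(occupancy, resolution):
    """Distance (in meters) from every voxel to the nearest occupied voxel."""
    # distance_transform_edt measures distance to the nearest zero entry,
    # so pass the free-space mask (True = free, False = obstacle).
    return distance_transform_edt(occupancy == 0, sampling=resolution)

def collision_cost(body_voxels, dist_field, clearance=0.05):
    """Sum a hinge cost over all robot body voxels.

    Voxels farther than `clearance` from any obstacle cost nothing; voxels
    that are closer (or inside an obstacle, distance 0) cost proportionally more.
    """
    d = dist_field[tuple(body_voxels.T)]      # clearance at each body voxel
    return np.sum(np.maximum(clearance - d, 0.0))

# Toy 3-D world: one box obstacle in a 1 m cube at 1 cm resolution.
occupancy = np.zeros((100, 100, 100), dtype=np.uint8)
occupancy[40:60, 40:60, 40:60] = 1
dist_field = build_distance_field(occupancy, resolution=0.01)

# Hypothetical indices of voxels currently occupied by the robot body.
body_voxels = np.array([[10, 10, 10], [45, 50, 50], [62, 50, 50]])
print(collision_cost(body_voxels, dist_field))
```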


The algorithm

- Generate initial straight-line trajectory
- Repeat until convergence:
  - Create noisy rollouts around the trajectory (noise does not modify start or goal due to Σ = R^-1!)
  - Compute costs for each rollout
  - Apply PI2 update: reward-weighted average (see the sketch after this list)
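The loop below is a simplified Python sketch of one such iteration, assuming a single joint and one total cost per rollout; the full PI2 update (Theodorou et al., 2010) instead weights the noise per time step using the cost-to-go and then smooths the result. Names like pi2_style_update, cost_fn, and h are illustrative.

```python
import numpy as np

def pi2_style_update(theta, cost_fn, R_inv, num_rollouts=20, h=10.0, rng=None):
    """One schematic iteration: sample noise with covariance R^{-1}, then apply a
    cost-weighted (softmin) average of that noise as the trajectory update."""
    rng = np.random.default_rng() if rng is None else rng
    n = theta.shape[0]

    # Sigma = R^{-1} makes the exploration noise cheap under the control cost
    # and keeps the start and goal essentially unchanged.
    eps = rng.multivariate_normal(np.zeros(n), R_inv, size=num_rollouts)
    costs = np.array([cost_fn(theta + e) for e in eps])

    # Exponentiate normalized costs into probabilities (lower cost -> higher weight).
    s = (costs - costs.min()) / max(costs.max() - costs.min(), 1e-10)
    probs = np.exp(-h * s)
    probs /= probs.sum()

    # Reward-weighted average of the sampled noise.
    return theta + probs @ eps

# Usage sketch: start from a straight line between fixed start/goal and iterate.
# theta = np.linspace(start, goal, n)
# for _ in range(50):
#     theta = pi2_style_update(theta, cost_fn, R_inv)
```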


The algorithm (illustration)

[Figure sequence: initial trajectory → noisy rollouts sampled around it → updated trajectory]


Video: Pole

Updated trajectory


Video: Test Setup


Test Results

Condition       Success rate
Unconstrained   39 / 42
Constrained     38 / 42


Video: Real-world


Conclusion

- Optimization-based motion planner that does not require gradients
- Generates collision-free, smooth trajectories
- Optimizes arbitrary secondary criteria (constraints, torques)
- May handle local minima better than CHOMP (needs further testing)
- ICRA 2011 submission pending
- Code is in the optimization_motion_planning package, coming soon to a sandbox near you...


Future Work

- Torque optimality
- Trajectory libraries, cached plans

Thanks:
- Sachin Chitta
- Peter Pastor
- Willow Garage
