Partially Observable Markov Decision Processes (POMDPs)


Transcript of Partially Observable Markov Decision Processes (POMDPs)

Page 1: Partially Observable Markov Decision Processes (POMDPs)

Partially Observable Markov Decision Processes (POMDPs)

Sachin Patil

Guest Lecture: CS287 Advanced Robotics Slides adapted from Pieter Abbeel, Alex Lee

Page 2: Partially Observable Markov Decision Processes (POMDPs)

Outline

- Introduction to POMDPs
- Locally Optimal Solutions for POMDPs
- Trajectory Optimization in (Gaussian) Belief Space
- Accounting for Discontinuities in Sensing Domains
- Separation Principle

Page 3: Partially Observable Markov Decision Processes (POMDPs)

Markov Decision Process (S, A, H, T, R)

Given:

- S: set of states
- A: set of actions
- H: horizon over which the agent will act
- T: S × A × S × {0, 1, …, H} → [0, 1], the transition model: T_t(s, a, s') = P(s_{t+1} = s' | s_t = s, a_t = a)
- R: S × A × S × {0, 1, …, H} → ℝ, the reward model: R_t(s, a, s') = reward for the transition (s_t = s, a_t = a, s_{t+1} = s')

Goal:

- Find a policy π: S × {0, 1, …, H} → A that maximizes the expected sum of rewards

Page 4: Partially Observable Markov Decision Processes (POMDPs)

POMDP – Partially Observable MDP

= MDP, BUT we don't get to observe the state itself; instead we get sensory measurements.

Now: what action to take given the current probability distribution over states, rather than given the current state.

Page 5: Partially Observable Markov Decision Processes (POMDPs)

POMDPs: Tiger Example

Page 6: Partially Observable Markov Decision Processes (POMDPs)

Belief State

- Probability of S0 vs. S1 being the true underlying state
- Initial belief state: p(S0) = p(S1) = 0.5
- Upon listening, the belief state should change according to the Bayesian update (filtering)

[Figure: belief over the two underlying states, TL and TR]
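To make the belief update concrete, here is a minimal Bayes-filter sketch for the tiger example. The 0.85 listening accuracy is the value used in the classic tiger benchmark, not a number given on the slide.

```python
# Belief update for the tiger example (two hidden states: tiger-left, tiger-right).
# The 0.85 observation accuracy is the classic benchmark value, assumed here.

def update_belief(b_left, heard_left, p_correct=0.85):
    """Posterior P(tiger-left) after one 'listen' observation."""
    # likelihood of the observation under each hypothesis
    p_obs_given_left = p_correct if heard_left else 1.0 - p_correct
    p_obs_given_right = 1.0 - p_correct if heard_left else p_correct
    # Bayes rule, then normalize
    num = p_obs_given_left * b_left
    den = num + p_obs_given_right * (1.0 - b_left)
    return num / den

b = 0.5                      # initial belief: p(S0) = p(S1) = 0.5
for heard_left in (True, True):
    b = update_belief(b, heard_left)
print(b)                     # ~0.97 after hearing the tiger on the left twice
```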

Page 7: Partially Observable Markov Decision Processes (POMDPs)

Policy – Tiger Example

- Policy π is a map from [0,1] → {listen, open-left, open-right}
- What should the policy be?
- Roughly: listen until sure, then open
- But where are the cutoffs?

Page 8: Partially Observable Markov Decision Processes (POMDPs)

Solving POMDPs

- Canonical solution method 1: Continuous-state "belief MDP"
  - Run value iteration, but now the state space is the space of probability distributions
  - → value and optimal action for every possible probability distribution
  - → will automatically trade off information-gathering actions versus actions that affect the underlying state
  - Exact value iteration updates cannot be carried out because there are uncountably many belief states → approximation needed
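As an illustration of the belief-MDP idea, the sketch below runs finite-horizon value iteration over a discretized belief grid for the two-state tiger problem. The rewards (-1 for listening, +10/-100 for opening the safe/tiger door) and the 0.85 listening accuracy follow the classic benchmark and are assumptions, not slide content; the resulting policy opens a door only once the belief is near 0 or 1, which is where the "cutoffs" of the previous slide come from.

```python
import numpy as np

# Value iteration over a discretized belief space for the two-state tiger problem.

N = 101
beliefs = np.linspace(0.0, 1.0, N)          # b = P(tiger behind left door)
P_CORRECT, R_LISTEN, R_GOOD, R_BAD = 0.85, -1.0, 10.0, -100.0

def bayes(b, heard_left):
    pl = P_CORRECT if heard_left else 1.0 - P_CORRECT
    pr = 1.0 - P_CORRECT if heard_left else P_CORRECT
    return pl * b / (pl * b + pr * (1.0 - b))

V = np.zeros(N)                             # terminal value
for _ in range(20):                         # finite-horizon backups
    V_new = np.empty(N)
    for i, b in enumerate(beliefs):
        # opening the right door is good when the tiger is on the left (prob b)
        open_right = b * R_GOOD + (1.0 - b) * R_BAD
        open_left = (1.0 - b) * R_GOOD + b * R_BAD
        # listening: expectation over the two observations, next value read
        # off the belief grid by interpolation
        p_hear_left = P_CORRECT * b + (1.0 - P_CORRECT) * (1.0 - b)
        listen = (R_LISTEN
                  + p_hear_left * np.interp(bayes(b, True), beliefs, V)
                  + (1.0 - p_hear_left) * np.interp(bayes(b, False), beliefs, V))
        V_new[i] = max(open_left, open_right, listen)
    V = V_new
```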

Page 9: Partially Observable Markov Decision Processes (POMDPs)

Solving POMDPs

- Canonical solution method 2:
  - Search over sequences of actions with limited look-ahead
  - Branching over actions and observations
- Finite horizon: the search tree has on the order of (|A| · |O|)^H nodes
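A minimal sketch of this bounded look-ahead, branching over actions and observations from the current belief. The helper names (`actions`, `observations`, `expected_reward`, `observation_prob`, `belief_update`) are placeholders for problem-specific models, not functions defined on the slides.

```python
# Depth-limited expectimax over belief states, branching over actions and observations.

def lookahead_value(belief, depth, model):
    if depth == 0:
        return 0.0
    best = float("-inf")
    for a in model.actions:
        value = model.expected_reward(belief, a)
        # branch over observations, each weighted by its probability
        for z in model.observations:
            p_z = model.observation_prob(belief, a, z)
            if p_z > 0.0:
                b_next = model.belief_update(belief, a, z)
                value += p_z * lookahead_value(b_next, depth - 1, model)
        best = max(best, value)
    return best

def best_action(belief, depth, model):
    # expand the root once more to recover the maximizing action
    scored = []
    for a in model.actions:
        v = model.expected_reward(belief, a)
        for z in model.observations:
            p_z = model.observation_prob(belief, a, z)
            if p_z > 0.0:
                b_next = model.belief_update(belief, a, z)
                v += p_z * lookahead_value(b_next, depth - 1, model)
        scored.append((v, a))
    return max(scored, key=lambda va: va[0])[1]
```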

Page 10: Partially Observable Markov Decision Processes (POMDPs)

Solving POMDPs

- Approximate solutions: becoming tractable for |S| in the millions
  - α-vector point-based techniques
  - Monte Carlo Tree Search
  - … beyond the scope of this course …

Page 11: Partially Observable Markov Decision Processes (POMDPs)

Solving POMDPs

- Canonical solution method 3:
  - Plan in the underlying MDP
  - Use probabilistic inference (filtering) to track the probability distribution over states
  - Choose the optimal MDP action for the currently most likely state
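A minimal sketch of this most-likely-state heuristic, assuming a precomputed MDP policy and a generic Bayes filter; `env`, `mdp_policy`, and `bayes_filter_update` are placeholders.

```python
import numpy as np

# Canonical method 3: filter the belief, act with the fully observed MDP
# policy evaluated at the most likely state.

def run_episode(env, mdp_policy, bayes_filter_update, belief, horizon):
    for t in range(horizon):
        s_ml = int(np.argmax(belief))       # currently most likely state
        a = mdp_policy[s_ml, t]             # optimal MDP action for that state
        z = env.step(a)                     # execute and receive a measurement
        belief = bayes_filter_update(belief, a, z)
    return belief
```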

Page 12: Partially Observable Markov Decision Processes (POMDPs)

Outline

- Introduction to POMDPs
- Locally Optimal Solutions for POMDPs
- Trajectory Optimization in (Gaussian) Belief Space
- Accounting for Discontinuities in Sensing Domains
- Separation Principle

Page 13: Partially Observable Markov Decision Processes (POMDPs)

Motivation

Facilitate reliable operation of cost-effective robots that use:

- Imprecise actuation mechanisms – series elastic actuators, cables
- Inaccurate encoders and sensors – gyros, accelerometers

[Figure: cable-driven 7-DOF arms; perception (stereo, depth); motors connected to joints using cables]

Page 14: Partially Observable Markov Decision Processes (POMDPs)

Motivation

Continuous state/action/observation spaces

[Figure: cable-driven 7-DOF arms; perception (stereo, depth); motors connected to joints using cables]

Page 15: Partially Observable Markov Decision Processes (POMDPs)

Model Uncertainty As Gaussians


Uncertainty parameterized by mean and covariance


Page 16: Partially Observable Markov Decision Processes (POMDPs)

Problem Setup: Dark-Light Domain

[Figure: state-space plan from start to goal]

[Example from Platt, Tedrake, Kaelbling, Lozano-Perez, 2010]

Page 17: Partially Observable Markov Decision Processes (POMDPs)

Problem Setup: Dark-Light Domain

[Figure: belief-space plan from start to goal]

Tradeoff: information gathering vs. actions

Page 18: Partially Observable Markov Decision Processes (POMDPs)

Problem Setup

- Stochastic motion and observation models
  - Non-linear
- User-defined objective / cost function
- Plan a trajectory that minimizes expected cost

Page 19: Partially Observable Markov Decision Processes (POMDPs)

Locally Optimal Solutions

- Belief is Gaussian: parameterized by mean and covariance
- Belief dynamics – Bayesian filter; for a Gaussian belief, a Kalman filter
  (underlying state space → belief space)
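For a non-linear model, the Gaussian belief dynamics can be realized with an EKF-style update; the sketch below is one such realization, with the dynamics `f`, observation model `h`, their Jacobians, and the noise covariances `Q`, `R` left as problem-specific placeholders.

```python
import numpy as np

# One step of Gaussian belief dynamics b_t = (mu_t, Sigma_t), realized as an EKF update.

def ekf_belief_update(mu, Sigma, u, z, f, h, jac_f, jac_h, Q, R):
    # predict: push the mean through the dynamics, grow the covariance
    A = jac_f(mu, u)
    mu_pred = f(mu, u)
    Sigma_pred = A @ Sigma @ A.T + Q
    # correct: fuse the measurement z
    C = jac_h(mu_pred)
    K = Sigma_pred @ C.T @ np.linalg.inv(C @ Sigma_pred @ C.T + R)
    mu_new = mu_pred + K @ (z - h(mu_pred))
    Sigma_new = (np.eye(len(mu)) - K @ C) @ Sigma_pred
    return mu_new, Sigma_new
```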

Page 20: Partially Observable Markov Decision Processes (POMDPs)

State Space – Trajectory Optimization

Page 21: Partially Observable Markov Decision Processes (POMDPs)

(Gaussian) Belief Space Planning

\[
\begin{aligned}
\min_{\mu, \Sigma, u} \quad & \sum_{t=0}^{H} c(\mu_t, \Sigma_t, u_t) \\
\text{s.t.} \quad & (\mu_{t+1}, \Sigma_{t+1}) = \mathrm{KF}(\mu_t, \Sigma_t, u_t, w_t, v_t) \\
& \mu_H = \text{goal} \\
& u \in \mathcal{U}
\end{aligned}
\]

Page 22: Partially Observable Markov Decision Processes (POMDPs)

(Gaussian) Belief Space Planning

\[
\begin{aligned}
\min_{\mu, \Sigma, u} \quad & \sum_{t=0}^{H} c(\mu_t, \Sigma_t, u_t) \\
\text{s.t.} \quad & (\mu_{t+1}, \Sigma_{t+1}) = \mathrm{KF}(\mu_t, \Sigma_t, u_t, 0, 0) \\
& \mu_H = \text{goal} \\
& u \in \mathcal{U}
\end{aligned}
\]

Setting the noise terms to zero corresponds to the maximum likelihood assumption for observations. The problem can now be solved by Sequential Convex Programming. [Platt et al., 2010; Roy et al.; van den Berg et al., 2011, 2012]

Obstacles?
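Under the maximum-likelihood observation assumption the innovation is zero, so the belief dynamics used as the equality constraint become deterministic and a candidate control sequence can simply be rolled forward in belief space. The sketch below assumes placeholder models (`f`, Jacobians, `Q`, `R`); the SCP solver then linearizes this rollout and the cost around the current iterate.

```python
import numpy as np

# Deterministic belief dynamics under the maximum-likelihood observation
# assumption (zero innovation), rolled forward along a candidate control
# sequence. models = (f, jac_f, jac_h, Q, R) are placeholders.

def ml_belief_dynamics(mu, Sigma, u, f, jac_f, jac_h, Q, R):
    A = jac_f(mu, u)
    mu_next = f(mu, u)                      # ML observation => zero innovation
    Sigma_pred = A @ Sigma @ A.T + Q
    C = jac_h(mu_next)
    K = Sigma_pred @ C.T @ np.linalg.inv(C @ Sigma_pred @ C.T + R)
    Sigma_next = (np.eye(len(mu)) - K @ C) @ Sigma_pred
    return mu_next, Sigma_next

def rollout(mu0, Sigma0, controls, models):
    """Belief trajectory induced by a candidate control sequence."""
    traj = [(mu0, Sigma0)]
    for u in controls:
        mu, Sigma = traj[-1]
        traj.append(ml_belief_dynamics(mu, Sigma, u, *models))
    return traj
```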

Page 23: Partially Observable Markov Decision Processes (POMDPs)

Problem Setup: Dark-Light Domain

[Figure: belief-space plan from start to goal]

Tradeoff: information gathering vs. actions

Page 24: Partially Observable Markov Decision Processes (POMDPs)

Collision Avoidance

- Prior work approximates robot geometry as points or spheres [van den Berg et al.]
- Articulated robots cannot be approximated as points/spheres
- Gaussian noise in joint space
- Need probabilistic collision avoidance w.r.t. the robot links

Page 25: Partially Observable Markov Decision Processes (POMDPs)

Sigma Hulls

- Definition: convex hull of a robot link transformed (in joint space) according to the sigma points
- Consider sigma points lying on a chosen standard-deviation contour of the uncertainty covariance (UKF)

Page 26: Partially Observable Markov Decision Processes (POMDPs)

Collision Avoidance Constraint

Consider the signed distance between obstacles and the sigma hulls
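A minimal sketch of constructing a sigma hull: sample sigma points of the joint-space Gaussian, transform the link's vertices by the forward kinematics at each sigma point, and take the convex hull of the result. The sigma-point scaling, `forward_kinematics`, and `link_vertices` are assumptions/placeholders, and (unlike this sketch) the actual method never builds the hull explicitly; it works with signed distances between the convex point sets directly.

```python
import numpy as np

# Sigma hull of one robot link, built explicitly here only for illustration.

def sigma_points(mu, Sigma, n_std=1.0):
    """Mean plus points on an n_std-standard-deviation contour (UKF-style)."""
    n = len(mu)
    L = np.linalg.cholesky(Sigma)            # Sigma = L @ L.T
    pts = [mu]
    for i in range(n):
        pts.append(mu + n_std * np.sqrt(n) * L[:, i])
        pts.append(mu - n_std * np.sqrt(n) * L[:, i])
    return np.array(pts)

def sigma_hull_vertices(mu, Sigma, link_vertices, forward_kinematics):
    """Vertices whose convex hull is the link's sigma hull."""
    verts = []
    for q in sigma_points(mu, Sigma):
        T = forward_kinematics(q)            # 4x4 pose of the link at joints q
        verts.extend(T[:3, :3] @ v + T[:3, 3] for v in link_vertices)
    return np.array(verts)
```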

Page 27: Partially Observable Markov Decision Processes (POMDPs)

Belief space planning using trajectory optimization

- Gaussian belief state in joint space: $b_t = (\mu_t, \Sigma_t)$ (mean and covariance)
- Optimization problem:
  - Variables: mean and covariance at each time step, control inputs
  - Constraints: belief dynamics (UKF), probabilistic collision avoidance, reach desired end-effector pose, control inputs are feasible

Page 28: Partially Observable Markov Decision Processes (POMDPs)

Collision avoidance constraint

- Robot trajectory should stay at least a specified safety distance from other objects

Page 29: Partially Observable Markov Decision Processes (POMDPs)

Collision avoidance constraint

- Robot trajectory should stay at least a specified safety distance from other objects
- Linearize the signed distance at the current belief

Page 30: Partially Observable Markov Decision Processes (POMDPs)

Collision avoidance constraint

- Robot trajectory should stay at least a specified safety distance from other objects
- Linearize the signed distance at the current belief
- Consider the case where the closest point lies on a face spanned by vertices of the sigma hull
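A minimal sketch of the linearized constraint: expand the signed distance sd(b) to first order around the current belief iterate and require sd(b̄) + ∇sd(b̄)·(b − b̄) ≥ d_safe in each convex subproblem. `signed_distance` and `d_safe` are placeholders, and the finite-difference gradient here stands in for the analytical gradients used in the actual method.

```python
import numpy as np

# Linearize sd(b) around the current belief iterate b_bar to get the affine
# constraint  sd(b_bar) + grad . (b - b_bar) >= d_safe.

def linearized_collision_constraint(b_bar, signed_distance, d_safe, eps=1e-4):
    sd0 = signed_distance(b_bar)
    grad = np.zeros_like(b_bar)
    for i in range(len(b_bar)):              # forward finite differences
        db = np.zeros_like(b_bar)
        db[i] = eps
        grad[i] = (signed_distance(b_bar + db) - sd0) / eps
    # returns (a, c) such that the constraint reads  a . b >= c
    a = grad
    c = d_safe - sd0 + grad @ b_bar
    return a, c
```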

Page 31: Partially Observable Markov Decision Processes (POMDPs)

Continuous Collision Avoidance Constraint

- Discrete collision avoidance can lead to trajectories that collide with obstacles in between time steps
- Use the convex hull of the sigma hulls at consecutive time steps
- Advantages:
  - Solutions are collision-free in between time steps
  - The discretized trajectory can use fewer time steps

Page 32: Partially Observable Markov Decision Processes (POMDPs)

Model Predictive Control (MPC)

- During execution, update the belief state based on the actual observation
- Re-plan after every belief state update
- Effective feedback control, provided one can re-plan sufficiently fast
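A minimal sketch of the replanning loop, assuming placeholder interfaces for the planner, the Bayes filter, and the environment.

```python
# Replanning loop: plan in belief space, execute the first control, update
# the belief with the actual observation, and re-plan.

def belief_space_mpc(b0, goal, horizon, planner, filter_update, env):
    belief = b0
    while not env.at_goal(belief, goal):
        controls = planner(belief, goal, horizon)   # full belief-space plan
        u = controls[0]                             # execute only the first control
        z = env.step(u)                             # actual (not ML) observation
        belief = filter_update(belief, u, z)
        # ... then re-plan from the updated belief on the next iteration
    return belief
```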

Page 33: Partially Observable Markov Decision Processes (POMDPs)

State space trajectory

Example: 4-DOF planar robot

Page 34: Partially Observable Markov Decision Processes (POMDPs)

1-standard deviation belief space trajectory

Example: 4-DOF planar robot

Page 35: Partially Observable Markov Decision Processes (POMDPs)

4-standard deviation belief space trajectory

Example: 4-DOF planar robot

Page 36: Partially Observable Markov Decision Processes (POMDPs)

Probability of collision

Experiments: 4-DOF planar robot

Page 37: Partially Observable Markov Decision Processes (POMDPs)

Mean distance from target

Experiments: 4-DOF planar robot

Page 38: Partially Observable Markov Decision Processes (POMDPs)

Take-Away

- Efficient trajectory optimization in Gaussian belief spaces to reduce task uncertainty
- Prior work approximates robot geometry as a point or a single sphere
- Pose collision constraints using the signed distance between sigma hulls of robot links and obstacles
- Sigma hulls are never explicitly computed – fast convex collision checking and analytical gradients
- Iterative re-planning in belief space (MPC)

Page 39: Partially Observable Markov Decision Processes (POMDPs)

Outline

- Introduction to POMDPs
- Locally Optimal Solutions for POMDPs
- Trajectory Optimization in (Gaussian) Belief Space
- Accounting for Discontinuities in Sensing Domains
- Separation Principle

Page 40: Partially Observable Markov Decision Processes (POMDPs)

Discontinuities in Sensing Domains

[Figure: “dark” and “light” regions between start and goal]

Zero gradient, hence local optimum

[Patil et al., under review]

Page 41: Partially Observable Markov Decision Processes (POMDPs)

Discontinuities in Sensing Domains

Increasing difficulty

Noise level ≈ signed distance to sensing region × homotopy iteration

Page 42: Partially Observable Markov Decision Processes (POMDPs)

Signed Distance to Sensing Discontinuity

Field of view (FOV) discontinuity

Occlusion discontinuity

Page 43: Partially Observable Markov Decision Processes (POMDPs)

[Figure: … vs. signed distance]

Page 44: Partially Observable Markov Decision Processes (POMDPs)

Modified Belief Dynamics

δ: binary variable in {0, 1}; δ = 0 → no measurement, δ = 1 → measurement

Page 45: Partially Observable Markov Decision Processes (POMDPs)

Incorporating in SQP

- Binary non-convex program – difficult to solve
- Instead, solve successively tighter smooth approximations

Page 46: Partially Observable Markov Decision Processes (POMDPs)

Algorithm Overview

- While δ is not within the desired tolerance:
  - Solve the optimization problem with the current value of α
  - Increase α
  - Re-integrate the belief trajectory
  - Update δ
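A minimal sketch of this homotopy, assuming the binary measurement indicator δ is smoothed with a sigmoid of the signed distance to the sensing region whose slope α is sharpened each outer iteration; the sigmoid form, growth factor, and stopping test are assumptions, and `sd_sensing`, `solve_smooth_problem`, `reintegrate_belief` are placeholders.

```python
import numpy as np

# Homotopy over a smoothed measurement indicator.

def smooth_delta(x, alpha, sd_sensing):
    # sd_sensing(x) > 0 inside the sensing region  ->  delta close to 1
    return 1.0 / (1.0 + np.exp(-alpha * sd_sensing(x)))

def homotopy_loop(trajectory, solve_smooth_problem, reintegrate_belief,
                  alpha=1.0, alpha_growth=5.0, tol=1e-2):
    while True:
        trajectory = solve_smooth_problem(trajectory, alpha)     # SQP with current alpha
        beliefs, deltas = reintegrate_belief(trajectory, alpha)  # re-integrate, update delta
        if np.all(np.minimum(deltas, 1.0 - deltas) < tol):       # deltas effectively binary
            return trajectory, beliefs
        alpha *= alpha_growth                                    # increase alpha
```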

Page 47: Partially Observable Markov Decision Processes (POMDPs)

Discontinuities in Sensing Domains

Increasing difficulty

Noise level ≈ signed distance to sensing region × homotopy iteration

Page 48: Partially Observable Markov Decision Processes (POMDPs)

“No measurement” Belief Update

Truncate Gaussian Belief if no measurement obtained
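A minimal 1-D sketch of the idea: receiving no measurement implies the state lies outside the sensing region, so the Gaussian is truncated at the region boundary and re-approximated by matching the truncated mean and variance. The 1-D setup and the "below the boundary" convention are simplifying assumptions.

```python
import math

# Moment-matched truncation of a 1-D Gaussian belief when no measurement arrives.

def phi(z):                                  # standard normal pdf
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):                                  # standard normal cdf
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def truncate_below(mu, var, boundary):
    """Gaussian approximation of x ~ N(mu, var) conditioned on x <= boundary."""
    sigma = math.sqrt(var)
    beta = (boundary - mu) / sigma
    lam = phi(beta) / Phi(beta)
    mu_new = mu - sigma * lam
    var_new = var * (1.0 - beta * lam - lam * lam)
    return mu_new, var_new

# Example: belief N(0, 1); no measurement implies the state is below x = 0.5
print(truncate_below(0.0, 1.0, 0.5))         # mean shifts left, variance shrinks
```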

Page 49: Partially Observable Markov Decision Processes (POMDPs)

Effect of Truncation

[Figure panels: without the “no measurement” belief update vs. with the “no measurement” belief update]

Page 50: Partially Observable Markov Decision Processes (POMDPs)

Experiments

Page 51: Partially Observable Markov Decision Processes (POMDPs)

Car and Landmarks (Active Exploration)

Page 52: Partially Observable Markov Decision Processes (POMDPs)

Arm Occluding (Static) Camera

[Figure panels: initial belief; state-space plan execution (way-point, end); belief-space plan execution]

Page 53: Partially Observable Markov Decision Processes (POMDPs)

Arm Occluding (Moving) Camera

[Figure panels: initial belief; state-space plan execution (way-point, end); belief-space plan execution]

Page 54: Partially Observable Markov Decision Processes (POMDPs)

Outline

- Introduction to POMDPs
- Locally Optimal Solutions for POMDPs
- Trajectory Optimization in (Gaussian) Belief Space
- Accounting for Discontinuities in Sensing Domains
- Separation Principle

Page 55: Partially Observable Markov Decision Processes (POMDPs)

Separation Principle

- Assume linear-Gaussian dynamics and observations:
  $x_{t+1} = A x_t + B u_t + w_t, \quad w_t \sim \mathcal{N}(0, Q_t)$
  $z_t = C x_t + v_t, \quad v_t \sim \mathcal{N}(0, R_t)$
- Goal: minimize $\mathbb{E}\left[\sum_{t=0}^{H} u_t^\top U_t u_t + x_t^\top X_t x_t\right]$
- Then, the optimal control policy consists of:
  1. Offline / ahead of time: run LQR to find the optimal control policy for the fully observed case, which gives the sequence of feedback matrices $K_1, K_2, \ldots$
  2. Online: run a Kalman filter to estimate the state, and apply the control $u_t = K_t \mu_{t|0:t}$
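A minimal LQG sketch of the two pieces, using the matrix names from the slide (A, B, C, Q, R, U, X): an offline Riccati recursion that produces the feedback gains K_t, and an online Kalman-filter step whose mean estimate is fed back as u_t = K_t μ_t. Constant noise covariances are a simplification; the surrounding scaffolding is illustrative.

```python
import numpy as np

# Separation principle (LQG): offline LQR gains, online Kalman filtering.

def lqr_gains(A, B, X, U, H):
    """Backward Riccati recursion; returns [K_0, ..., K_{H-1}]."""
    P = X.copy()
    gains = []
    for _ in range(H):
        K = -np.linalg.solve(U + B.T @ P @ B, B.T @ P @ A)
        P = X + A.T @ P @ (A + B @ K)
        gains.append(K)
    return gains[::-1]

def kalman_step(mu, S, u, z, A, B, C, Q, R):
    """Predict with the applied control, then correct with measurement z."""
    mu_p, S_p = A @ mu + B @ u, A @ S @ A.T + Q
    K = S_p @ C.T @ np.linalg.inv(C @ S_p @ C.T + R)
    return mu_p + K @ (z - C @ mu_p), (np.eye(len(mu)) - K @ C) @ S_p

# At run time: u_t = gains[t] @ mu_t, apply u_t, then filter the new measurement z_t.
```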

Page 56: Partially Observable Markov Decision Processes (POMDPs)

Extensions

- Current research directions:
  - Fast belief space planning
  - Multi-modal belief spaces
  - Physical experiments with the Raven surgical robot

Page 57: Partially Observable Markov Decision Processes (POMDPs)

Recap

- POMDP = MDP, but with sensory measurements instead of exact state knowledge
- Locally optimal solutions in Gaussian belief spaces
  - Augmented state vector (mean, covariance)
  - Trajectory optimization
  - Sigma hulls for probabilistic collision avoidance
- Homotopy methods for dealing with discontinuities in sensing domains