Uncertainty in Sensing (and action)


Page 1: Uncertainty in Sensing (and action)

UNCERTAINTY IN SENSING (AND ACTION)

Page 2: Uncertainty in Sensing (and action)

AGENDA
• Planning with belief states
• Nondeterministic sensing uncertainty
• Probabilistic sensing uncertainty

Page 3: Uncertainty in Sensing (and action)

BELIEF STATE
• A belief state is the set of all states that an agent thinks are possible at any given time, or at any stage of planning a course of actions
• To plan a course of actions, the agent searches a space of belief states, instead of a space of states
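
For concreteness, a belief state can be held as an ordinary set of candidate states. A minimal sketch (not from the slides; the frozenset-of-atoms encoding of vacuum-world states is an assumption):

    # Illustrative encoding: a state is a frozenset of atoms such as
    # "In(R1)", "Clean(R1)", "Clean(R2)"; a belief state is a set of states.
    s1 = frozenset({"In(R1)", "Clean(R1)"})
    s2 = frozenset({"In(R1)", "Clean(R1)", "Clean(R2)"})
    belief = {s1, s2}   # the agent knows it is in R1 and that R1 is clean,
                        # but not whether R2 is clean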

Page 4: Uncertainty in Sensing (and action)

SENSOR MODEL (DEFINITION #1)
• State space S
• The sensor model is a function SENSE: S → 2^S that maps each state s ∈ S to a belief state (the set of all states that the agent would think possible if it were actually observing state s)
• Example: assume our vacuum robot can perfectly sense the room it is in and whether there is dust in it, but it can't sense whether there is dust in the other room
[Figure: SENSE applied to two example states]

Page 5: Uncertainty in Sensing (and action)

SENSOR MODEL (DEFINITION #2)
• State space S, percept space P
• The sensor model is a function SENSE: S → P that maps each state s ∈ S to a percept (the percept that the agent would obtain if actually observing state s)
• We can then define the set of states consistent with an observation p:
  CONSISTENT(p) = { s | SENSE(s) = p }
[Figure: example values of SENSE and CONSISTENT for the vacuum robot]
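
A minimal sketch of definition #2 for the vacuum robot (illustrative only; the percept encoding is an assumption, the names SENSE and CONSISTENT mirror the slide):

    # Percept: (room the robot is in, whether that room is clean),
    # using the frozenset-of-atoms states sketched earlier.
    def SENSE(s):
        room = "R1" if "In(R1)" in s else "R2"
        return (room, f"Clean({room})" in s)

    def CONSISTENT(p, states):
        """All states from a candidate set that would produce percept p."""
        return {s for s in states if SENSE(s) == p}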

Page 6: Uncertainty in Sensing (and action)

VACUUM ROBOT ACTION AND SENSOR MODEL

Right
  Appl if s ⊨ In(R1)
  {s1 = s - In(R1) + In(R2), s2 = s}
  [Right does either the right thing, or nothing]

Left
  Appl if s ⊨ In(R2)
  {s1 = s - In(R2) + In(R1), s2 = s - In(R2) + In(R1) - Clean(R2)}
  [Left always moves the robot to R1, but it may occasionally deposit dust in R2]

Suck(r)
  Appl if s ⊨ In(r)
  {s1 = s + Clean(r)}
  [Suck always does the right thing]

• The robot perfectly senses the room it is in and whether there is dust in it

• But it can’t sense if there is dust in the other room

State s: any logical conjunction of In(R1), In(R2), Clean(R1), Clean(R2) (notation: + adds an attribute, - removes an attribute)
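
A minimal executable sketch of this nondeterministic action model (an assumption layered on the frozenset-of-atoms states above; each action returns the set of possible successor states):

    def Right(s):
        # Applicable if the robot is in R1; it either moves to R2 or does nothing.
        assert "In(R1)" in s
        return {s - {"In(R1)"} | {"In(R2)"}, s}

    def Left(s):
        # Applicable if the robot is in R2; it always moves to R1,
        # but may also deposit dust in R2.
        assert "In(R2)" in s
        moved = s - {"In(R2)"} | {"In(R1)"}
        return {moved, moved - {"Clean(R2)"}}

    def Suck(s, r):
        # Applicable if the robot is in room r; it always cleans r.
        assert f"In({r})" in s
        return {s | {f"Clean({r})"}}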

Page 7: Uncertainty in Sensing (and action)

TRANSITION BETWEEN BELIEF STATES
• Suppose the robot is initially in state: [see figure]
• After sensing this state, its belief state is: [see figure]
• Just after executing Left, its belief state will be: [see figure]
• After sensing the new state, its belief state will be one of two belief states [see figure]: one if there is no dust in R1, the other if there is dust in R1

Page 8: Uncertainty in Sensing (and action)

TRANSITION BETWEEN BELIEF STATES
• Suppose the robot is initially in state: [see figure]
• After sensing this state, its belief state is: [see figure]
• Just after executing Left, its belief state will be: [see figure]
• After sensing the new state, its belief state will be one of two belief states [see figure]: one if there is no dust in R1, the other if there is dust in R1
[Figure: the same transitions drawn as a tree, with the Left action followed by an observation branch on Clean(R1)]

Page 9: Uncertainty in Sensing (and action)

TRANSITION BETWEEN BELIEF STATES
• How do you propagate the action/sensing operation to obtain the successors of a belief state?
[Figure: the Left-action belief-state transition, branching on the Clean(R1) observation]

Page 10: Uncertainty in Sensing (and action)

COMPUTING THE TRANSITION BETWEEN BELIEF STATES
Given an action A and a belief state S = {s1, …, sn}:
• Result of applying the action, without sensing: take the union of all SUCC(si, A) for i = 1, …, n. This gives us a pre-sensing belief state S'
• Possible percepts resulting from sensing: {SENSE(si') for si' in S'} (using SENSE definition #2). This gives us a percept set P
• Possible states both in S' and consistent with each possible percept pj in P: Sj = {si' | SENSE(si') = pj, si' in S'}, i.e., Sj = CONSISTENT(pj) ∩ S'
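
A minimal sketch of this computation (illustrative; SUCC(s, A) stands for any nondeterministic successor function such as the vacuum actions above, and SENSE is the percept function from definition #2):

    def belief_successors(S, A, SUCC, SENSE):
        """Given belief S and action A, return {percept: successor belief}."""
        # Pre-sensing belief S': union of the successors of every state in S.
        S_prime = set().union(*(SUCC(s, A) for s in S))
        # Partition S' by the percept each state would generate:
        # each group is CONSISTENT(p) intersected with S'.
        result = {}
        for s in S_prime:
            result.setdefault(SENSE(s), set()).add(s)
        return result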

Page 11: Uncertainty in Sensing (and action)

AND/OR TREE OF BELIEF STATES
[Figure: AND/OR tree over belief states; action branches (Left, Right, Suck) alternate with observation branches, and leaves are labeled goal or loop]
• A goal belief state is one in which all states are goal states
• An action is applicable to a belief state B if its precondition is achieved in all states in B
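
A compact sketch of search over this AND/OR tree (not from the slides; `actions(B)` is assumed to return only actions applicable in every state of B, SUCC and SENSE are as above, and the returned conditional plan is a nested [action, {percept: subplan}] structure):

    def and_or_plan(belief, actions, SUCC, SENSE, is_goal, path=()):
        """OR node: return a conditional plan for `belief`, or None if none exists."""
        if all(is_goal(s) for s in belief):
            return []                           # goal belief state: empty plan
        if belief in path:
            return None                         # revisited belief: abandon this branch
        for A in actions(belief):
            pred = set().union(*(SUCC(s, A) for s in belief))
            branches, ok = {}, True
            for p in {SENSE(s) for s in pred}:  # AND node: handle every percept
                sub = frozenset(s for s in pred if SENSE(s) == p)
                plan = and_or_plan(sub, actions, SUCC, SENSE, is_goal,
                                   path + (belief,))
                if plan is None:
                    ok = False
                    break
                branches[p] = plan
            if ok:
                return [A, branches]
        return None

Handling the "loop" leaves in the slide (cyclic plans) would need extra bookkeeping; this sketch simply treats a revisited belief as failure.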

Page 12: Uncertainty in Sensing (and action)

PLANNING WITH PROBABILISTIC UNCERTAINTY IN SENSING

[Figure: probabilistic motion/sensing example; panels labeled "No motion" and "Perpendicular motion"]

Page 13: Uncertainty in Sensing (and action)

PARTIALLY OBSERVABLE MDPS
• Consider the MDP model with states s ∈ S, actions a ∈ A, reward R(s), transition model P(s'|s,a), discount factor γ
• With sensing uncertainty, the initial belief state is a probability distribution over states: b(s), with b(si) ≥ 0 for all si ∈ S and Σi b(si) = 1
• Observations are generated according to a sensor model: observation space o ∈ O, sensor model P(o|s)
• The resulting problem is a Partially Observable Markov Decision Process (POMDP)
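
One way to package these ingredients (an assumed representation, with states, actions, and observations indexed by integers and the model stored in numpy arrays):

    import numpy as np
    from dataclasses import dataclass

    @dataclass
    class POMDP:
        R: np.ndarray      # R[s] = reward, shape (|S|,)
        T: np.ndarray      # T[a, s2, s1] = P(s2 | s1, a), shape (|A|, |S|, |S|)
        Z: np.ndarray      # Z[o, s] = P(o | s), shape (|O|, |S|)
        gamma: float       # discount factor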

Page 14: Uncertainty in Sensing (and action)

POMDP UTILITY FUNCTION
• A policy π(b) is defined as a map from belief states to actions
• Expected discounted reward with policy π: Uπ(b) = E[Σt γ^t R(St)], where St is the random variable indicating the state at time t
• P(S0=s) = b0(s)
• P(S1=s) = ?

Page 15: Uncertainty in Sensing (and action)

POMDP UTILITY FUNCTION
• A policy π(b) is defined as a map from belief states to actions
• Expected discounted reward with policy π: Uπ(b) = E[Σt γ^t R(St)], where St is the random variable indicating the state at time t
• P(S0=s) = b0(s)
• P(S1=s) = P(s | π(b0), b0) = Σs' P(s|s',π(b0)) P(S0=s') = Σs' P(s|s',π(b0)) b0(s')

Page 16: Uncertainty in Sensing (and action)

POMDP UTILITY FUNCTION
• A policy π(b) is defined as a map from belief states to actions
• Expected discounted reward with policy π: Uπ(b) = E[Σt γ^t R(St)], where St is the random variable indicating the state at time t
• P(S0=s) = b0(s)
• P(S1=s) = Σs' P(s|s',π(b0)) b0(s')
• P(S2=s) = ?

Page 17: Uncertainty in Sensing (and action)

POMDP UTILITY FUNCTION
• A policy π(b) is defined as a map from belief states to actions
• Expected discounted reward with policy π: Uπ(b) = E[Σt γ^t R(St)], where St is the random variable indicating the state at time t
• P(S0=s) = b0(s)
• P(S1=s) = Σs' P(s|s',π(b0)) b0(s')
• What belief states could the robot take on after 1 step?

Page 18: Uncertainty in Sensing (and action)

[Diagram: belief b0 → choose action π(b0) → predict b1(s) = Σs' P(s|s',π(b0)) b0(s') → belief b1]

Page 19: Uncertainty in Sensing (and action)

[Diagram: b0 → choose action π(b0) → predict b1(s) = Σs' P(s|s',π(b0)) b0(s') → b1 → receive observation (one of oA, oB, oC, oD)]

Page 20: Uncertainty in Sensing (and action)

[Diagram: as above; each observation oA, oB, oC, oD is received with probability P(oA|b1), P(oB|b1), P(oC|b1), P(oD|b1) and leads to a belief b1,A, b1,B, b1,C, b1,D]

Page 21: Uncertainty in Sensing (and action)

[Diagram: as above, adding the belief update step: b1,A(s) = P(s|b1,oA), b1,B(s) = P(s|b1,oB), b1,C(s) = P(s|b1,oC), b1,D(s) = P(s|b1,oD)]

Page 22: Uncertainty in Sensing (and action)

[Diagram: the full predict / receive observation / update cycle]
• Predict: b1(s) = Σs' P(s|s',π(b0)) b0(s')
• Observation probability: P(o|b) = Σs P(o|s) b(s)
• Update: P(s|b,o) = P(o|s) P(s|b) / P(o|b) = (1/Z) P(o|s) b(s)
• Updated beliefs: b1,A(s) = P(s|b1,oA), b1,B(s) = P(s|b1,oB), b1,C(s) = P(s|b1,oC), b1,D(s) = P(s|b1,oD)
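
A minimal sketch of one predict/observe/update cycle over a discrete state set (illustrative; beliefs are dicts mapping state to probability, trans(s2, s1, a) stands for P(s2|s1,a) and obs(o, s) for P(o|s)):

    def predict(b, a, states, trans):
        """b1(s2) = sum_s1 P(s2 | s1, a) b(s1)"""
        return {s2: sum(trans(s2, s1, a) * b[s1] for s1 in states) for s2 in states}

    def update(b, o, states, obs):
        """P(s | b, o) = P(o|s) b(s) / P(o|b), with P(o|b) = sum_s P(o|s) b(s)"""
        p_o = sum(obs(o, s) * b[s] for s in states)
        return {s: obs(o, s) * b[s] / p_o for s in states}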

Page 23: Uncertainty in Sensing (and action)

BELIEF-SPACE SEARCH TREE
• Each belief node has |A| action node successors
• Each action node has |O| belief successors
• Each (action, observation) pair (a,o) requires a predict/update step similar to HMMs
• Matrix/vector formulation:
  b(s): a vector b of length |S|
  P(s'|s,a): a set of |S|x|S| matrices Ta
  P(ok|s): a vector ok of length |S|
  ba = Ta b (predict)
  P(ok|ba) = ok^T ba (probability of observation)
  ba,k = diag(ok) ba / (ok^T ba) (update)
• Denote this predict/update operation as ba,o
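
The matrix/vector formulation, sketched with numpy (assumed shapes: b a length-|S| vector, Ta an |S|x|S| matrix with Ta[s2, s1] = P(s2|s1,a), ok a length-|S| vector with ok[s] = P(ok|s)):

    import numpy as np

    def predict(b, Ta):
        """ba = Ta b"""
        return Ta @ b

    def observation_prob(ba, ok):
        """P(ok | ba) = ok^T ba"""
        return ok @ ba

    def update(ba, ok):
        """ba,k = diag(ok) ba / (ok^T ba)"""
        return (ok * ba) / (ok @ ba)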

Page 24: Uncertainty in Sensing (and action)

RECEDING HORIZON SEARCH
• Expand the belief-space search tree to some depth h
• Use an evaluation function on leaf beliefs to estimate utilities
• For internal nodes, back up estimated utilities:
  U(b) = E[R(s)|b] + γ maxa∈A Σo∈O P(o|ba) U(ba,o)
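
A sketch of this backup as depth-limited recursion over the belief-space tree (illustrative; it reuses the assumed matrix representation above: R is the reward vector, Ts a list of per-action transition matrices, Z a list of per-observation likelihood vectors, and evaluate the leaf evaluation function):

    import numpy as np

    def rh_value(b, depth, R, Ts, Z, gamma, evaluate):
        """U(b) = E[R(s)|b] + gamma * max_a sum_o P(o|ba) U(ba,o), to depth h."""
        if depth == 0:
            return evaluate(b)
        best = -np.inf
        for Ta in Ts:                      # action nodes
            ba = Ta @ b                    # predict
            q = 0.0
            for ok in Z:                   # observation branches
                p_o = ok @ ba              # P(o | ba)
                if p_o > 0.0:
                    bao = (ok * ba) / p_o  # update
                    q += p_o * rh_value(bao, depth - 1, R, Ts, Z, gamma, evaluate)
            best = max(best, q)
        return R @ b + gamma * best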

Page 25: Uncertainty in Sensing (and action)

QMDP EVALUATION FUNCTION
• One possible evaluation function is to compute the expectation of the underlying MDP value function over the leaf belief states: f(b) = Σs UMDP(s) b(s)
• "Averaging over clairvoyance"
• Assumes the problem becomes instantly fully observable after 1 action
• Is optimistic: U(b) ≤ f(b)
• Approaches the POMDP value function as state and sensing uncertainty decreases
• In the extreme h=1 case, this is called the QMDP policy
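
Under the same assumed vector representation, this evaluation function is one line (U_mdp is the value function of the fully observable MDP; it plugs in as the evaluate argument of the receding-horizon sketch above):

    def qmdp_evaluate(b, U_mdp):
        """f(b) = sum_s U_MDP(s) b(s): optimistic leaf evaluation."""
        return U_mdp @ b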

Page 26: Uncertainty in Sensing (and action)

QMDP POLICY (LITTMAN, CASSANDRA, KAELBLING 1995)

Page 27: Uncertainty in Sensing (and action)

WORST-CASE COMPLEXITY
• Infinite-horizon undiscounted POMDPs are undecidable (by reduction from the halting problem)
• Exact solutions to infinite-horizon discounted POMDPs are intractable even for low |S|
• Finite horizon: O(|S|^2 |A|^h |O|^h)
• Receding horizon approximation: one-step regret is O(γ^h)
• Approximate solutions are becoming tractable for |S| in the millions: α-vector point-based techniques, Monte Carlo tree search, …
• Beyond the scope of this course…

Page 28: Uncertainty in Sensing (and action)

SCHEDULE
• 11/29: Robotics
• 12/1: Guest lecture: Mike Gasser, Natural Language Processing
• 12/6: Review
• 12/8: Final project presentations, review