Integrating decision-theoretic planning andprogramming for robot control in highly
dynamic domains
Christian Fritz
Thesis, Final Presentation
Integrating decision-theoretic planning and programming for robot control in highly dynamic domains – p.1/32
Introduction
Goals:I combine:
� programming� decision-theoretic planning� on-line!
I extend planning with optionsI evaluate in three diversified example domains
� grid world� RoboCup Simulation� RoboCup Mid-Size
Integrating decision-theoretic planning and programming for robot control in highly dynamic domains – p.2/32
Programming
ICPGOLOGI based on situation calculus
I extends basic GOLOG:
+ on-line: incremental, sensing (active and passive)
+ continuous change
+ concurrency
+ progression
+ probabilistic projection
– nondeterminism
I problems:
� decision making: explicit, missing utility theory
� projection comparatively slowIntegrating decision-theoretic planning and programming for robot control in highly dynamic domains – p.3/32
Decision-Theoretic Planning
Markov Decision Processes (MDPs) standard model fordecision-theoretic planning problems
I Formally: M =< S,A, T,R >, with
� S a set of states
� A a set of actions
� T : S ×A× S → [0, 1] a transition function
� R : S → IR a reward function
I Here: fully observable MDPs
I Planning task: find an optimal policy, maximizing expected reward
I Note: S and A are usually finite!
Integrating decision-theoretic planning and programming for robot control in highly dynamic domains – p.4/32
Programming & Planning:DTGolog
I New Golog derivative DTGOLOG [Boutilier et al.]
I Combines explicit agent programming with planning
I Uses MDPs to model the planning problem:
� S = situations
� A = primitive actions
� T = for each action a ∈ A, a list of outcomes and theirrespective probability
� R : situations→ IR
I applies decision-tree search to solve MDP up to a given horizon
Integrating decision-theoretic planning and programming for robot control in highly dynamic domains – p.5/32
Programming & Planning:DTGolog
Disadvantages:I offlineI situations = states
� infinite state space� inefficient
Integrating decision-theoretic planning and programming for robot control in highly dynamic domains – p.6/32
READYLOG
Contributions:I re-added nondeterminism with decision-theoretic
semantics→ on-line decision-theoretic Golog
I added options to speed up MDP solutionI preprocessor to minimize interpretation on-line
Integrating decision-theoretic planning and programming for robot control in highly dynamic domains – p.7/32
Part I
Extending DTGolog with Options
Integrating decision-theoretic planning and programming for robot control in highly dynamic domains – p.8/32
Options? what’s that?
Integrating decision-theoretic planning and programming for robot control in highly dynamic domains – p.9/32
Options
Idea:I construct complex actions from primitive onesI options: solutions to sub-MDPsI generate models about them:
� when possible to execute?� which outcomes possible to occur?� which probabilities do the outcomes have?� expected rewards and costs? (expected value)
I these can then be used in planning
Integrating decision-theoretic planning and programming for robot control in highly dynamic domains – p.10/32
Integrating Options into Golog
how do we integrate options into DTGolog/ReadyLog?
I avoiding the inconvenience “situations = states”
I instead mappings:
� situations→ states (when ’entering’ option)
� states→ situations (when ’leaving’ option)
I options..
� ..are solutions to local MDPs..
� ..encapsulated into a stochastic procedure.
I stochastic procedures..
� ..are procedures with an explicit model (preconditions/effects/costs);
� ..replace stochastic actions;
� ..can model options.Integrating decision-theoretic planning and programming for robot control in highly dynamic domains – p.11/32
Generating Options
how do we generate options?
I define:
� φ precondition (think: states where option is applicable)
� β : exitstates → value pseudo-rewards for local MDP
� θ option-skeleton one-step program to take in each step..• ..usually something like nondet( [left, right, down, up] ) ;• ..can contain ifs;• ..can build on options/stochastic procedures
� and: two mappings:
• Φ : situations→ states• Σ : states→ situations• option_mapping(o, σ,Γ, ϕ)
Integrating decision-theoretic planning and programming for robot control in highly dynamic domains – p.12/32
Examples
I example policy:proc (room1_2 , [exogf_Update ,
while (is_possible (room1_2 ),
[ if (pos=[0, 0], go_right , if (pos=[0, 1], go_right ,
if (pos=[0, 2], go_up , if (pos =[1, 0], go_right ,
if (pos=[1, 1], go_right , if (pos=[1, 2], go_right ,
if (pos=[2, 0], go_down , if (pos=[2, 1], go_right ,
if (pos=[2, 2], go_up , []))))))))),
exogf_Update ])]).
I example model (for state ’position=(0,0)’):opt_costs (room1_2, [(pos, [0, 0])], 4.51650594972207).
opt_probability_list (room1_2 , [(pos , [0, 0])], [([(pos , [1, 3])], 0.00012),
([(pos , [3, 1])], 0.99987)]).
Integrating decision-theoretic planning and programming for robot control in highly dynamic domains – p.13/32
Test Setting
3 4
5
6 7
21 S
G11
G4
G3
Integrating decision-theoretic planning and programming for robot control in highly dynamic domains – p.14/32
Experimental Results
(a) full MDP
planning (A)
(b) heuristics
(B)
(c) options (C)
Integrating decision-theoretic planning and programming for robot control in highly dynamic domains – p.15/32
Experimental Results
0.1
0.03
9555.86
2507.56
302.01
0.048
53.6
702.33
38.63
11.23 6.81
1.04
0.55
3.66
3 4 5 6 7 8 9 10 11
seco
nds
manhattan distance from start to goal
A
A’
B
B’
C
Integrating decision-theoretic planning and programming for robot control in highly dynamic domains – p.16/32
Part II
On-line Decision-Theoretic Golog for UnpredictableDomains
Integrating decision-theoretic planning and programming for robot control in highly dynamic domains – p.17/32
READYLOG: on-line DT planning
on-line:
I incremental
� solve( plan-skeleton , horizon)
� execute returned policy
I sensing / exogenous events
� problem:• dynamic environment (changes while thinking)• imperfect models→ policy can get invalid
⇒ execution monitoring:• program and policy coexistence• markers
Integrating decision-theoretic planning and programming for robot control in highly dynamic domains – p.18/32
Execution Monitoring Semantics
Trans(solve(p, h), s, δ′, s′) ≡∃π, v, pr .BestDo(p, s, h, π, v, pr) ∧ δ′ = applyPol(π) ∧ s′ = s.
BestDo(if (ϕ, p1, p2); p,s, h, π, v, pr).=
ϕ[s] ∧ ∃π1.BestDo(p1; p, s, h, π1, pr) ∧ π = M(ϕ, true);π1 ∨¬ϕ[s] ∧ ∃π2.BestDo(p2; p, s, h, π2, v, pr) ∧ π = M(ϕ, false);π2
Trans(applyPol(M(ϕ, v);π), s, δ′, s′) ≡ s = s′∧(v = true ∧ ϕ[s] ∧ δ′ = applyPol(π) ∨v = false ∧ ¬ϕ[s] ∧ δ′ = applyPol(π) ∨v = true ∧ ¬ϕ[s] ∧ δ′ = nil ∨v = false ∧ ϕ[s] ∧ δ′ = nil)
Integrating decision-theoretic planning and programming for robot control in highly dynamic domains – p.19/32
READYLOG II
I options (..)
I preprocessor:
� translates READYLOG functions, conditions, definitions.. to Prolog code
� creates successor state axioms from effect axioms
� speed-up of about factor 16
1
4
16
64
256
1024
200 400 600 800 1000 1200 1400 1600 1800 2000
seco
nds
length of situation term
effect axioms (uncompiled)
successor state axioms (compiled)
Integrating decision-theoretic planning and programming for robot control in highly dynamic domains – p.20/32
Experimental Results
Integrating decision-theoretic planning and programming for robot control in highly dynamic domains – p.21/32
Experimental Results:SimLeague
I compared with ICPGOLOG(Normans results)planning time in seconds
ICPGOLOG READYLOG
goal shot 0.35 0.01
direct pass 0.25 0.01
I speed-up due to preprocessor
Example where these are combined (demo):
solve (nondet ([goalKick (OwnNumber ),
[pickBest (bestP , [2..11],
[directPass (OwnNumber , bestP, pass_NORMAL ),
goalKick (bestP )])]
]), Horizon )
Integrating decision-theoretic planning and programming for robot control in highly dynamic domains – p.22/32
Experimental Results: MidSize
Integrating decision-theoretic planning and programming for robot control in highly dynamic domains – p.23/32
Experimental Results: MidSize,Code
solve (
nondet ([ kick(ownNumber , 40),
dribble_or_move_kick (ownNumber ),
dribble_to_points (ownNumber ),
if (isKickable (ownNumber ),
pickBest (var_turnAngle , [-3.1, -2.3, 2.3, 3.1],
[ turn_relative (ownNumber , var_turnAngle , 2),
nondet ([ [intercept_ball (ownNumber , 1),
dribble_or_move_kick (ownNumber )],
[intercept_ball (numberByRole (supporter ), 1),
dribble_or_move_kick (numberByRole (supporter ))]
]) ]),
nondet ([ [intercept_ball (ownNumber , 1),
dribble_or_move_kick (ownNumber )],
intercept_ball (ownNumber , 0.0, 1)]) ) ]), 4)
proc (dribble_or_move_kick (Own ),
nondet ([[dribble_to (Own , oppGoalBestCorner , 1)],
[move_kick (Own , oppGoalBestCorner , 1)]])).
proc (dribble_to_points (Own),
pickBest (var_pos , [[2.5, -1.25], [2.5, -2.5], [2.5, 0.0],[2.5, 2.5], [2.5, 1.25]],
dribble_to (Own , var_pos , 1))).
Integrating decision-theoretic planning and programming for robot control in highly dynamic domains – p.24/32
Experimental Results: MidSize,Behavior
ball behavior when turning with ball
move_kick/dribble
move/dribble/intercept
Integrating decision-theoretic planning and programming for robot control in highly dynamic domains – p.25/32
Experimental Results: MidSize,Example: Situation
Integrating decision-theoretic planning and programming for robot control in highly dynamic domains – p.26/32
Experimental Results: MidSize,Example: Plans
(d) (e) (f)
Integrating decision-theoretic planning and programming for robot control in highly dynamic domains – p.27/32
Experimental Results: MidSize,Example: Teamplay
(g) (h) (i)
Integrating decision-theoretic planning and programming for robot control in highly dynamic domains – p.28/32
Experimental Results: MidSize,Example: Decision Tree
natures choices
agent choices move_kick
kick
turn
intercept(me)
intercept(TM)
move_kick
move_kick
0.8
0.2
0.8
0.2
10000
10000
4169
4169
costs: −70
costs: −70
costs: −70
costs: −70
4169
4169
4169
costs: −12
costs: −7
4557
4776
4623
Integrating decision-theoretic planning and programming for robot control in highly dynamic domains – p.29/32
Experimental Results: MidSize,Computation
planning time in secondsexamples min avg max
without ball 698 0.0 0.094 0.450with ball 117 0.170 0.536 2.110
I variance due to processor load on robotsI qualitatively: enough for rudimentarily playing
soccer
Integrating decision-theoretic planning and programming for robot control in highly dynamic domains – p.30/32
Conclusion
I On-line decision theoretic Golog..
� ..can be applied to highly dynamic domains withinfinite/continuous state spaces,
� ..can coexist with passive sensing,
� ..motivates more sophisticated execution monitoring.
I Options..
� ..can be added to decision theoretic Golog,
� ..provide good speed-ups,
� ..rely on finite state spaces(!).
Integrating decision-theoretic planning and programming for robot control in highly dynamic domains – p.31/32
Top Related