Logistics
(c) 2002-3, C. Boutilier, E. Hansen, D. Weld
Logistics
Reading for Mon
No class Wed 11/26
Classical AI planning: no uncertainty, achieve goals, heuristic search, knowledge-based representation
Operations Research: uncertainty, maximize utility, dynamic programming, Markov decision process
Active research area (at the intersection of the two)
Review
MDPs
Bayesian Networks
DBNs
Factored MDPs
BDDs & ADDs
Markov Decision Processes
S = set of states (|S| = n)
A = set of actions (|A| = m)
Pr = transition function Pr(s,a,s’)
• represented by a set of m n x n stochastic matrices
• each row of a matrix is a distribution over S
R(s) = bounded, real-valued reward function
• represented by an n-vector
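The explicit (tabular) form described on this slide can be written down directly. The sketch below is only an illustration: the class name `MDP`, the field names, and the two-state toy numbers are assumptions, not part of the lecture.

```python
from dataclasses import dataclass


@dataclass
class MDP:
    n_states: int    # |S| = n
    n_actions: int   # |A| = m
    P: list          # P[a][s][t] = Pr(s, a, t): m matrices of size n x n
    R: list          # R[s]: reward vector of length n

    def validate(self, tol=1e-9):
        """Check shapes and that every matrix row is a distribution over S."""
        assert len(self.P) == self.n_actions
        assert len(self.R) == self.n_states
        for matrix in self.P:
            assert len(matrix) == self.n_states
            for row in matrix:
                assert len(row) == self.n_states
                assert all(p >= 0.0 for p in row)
                assert abs(sum(row) - 1.0) <= tol
        return True


# Hypothetical toy instance with n = 2 states and m = 2 actions.
toy = MDP(
    n_states=2,
    n_actions=2,
    P=[[[0.9, 0.1], [0.0, 1.0]],
       [[0.5, 0.5], [0.5, 0.5]]],
    R=[0.0, 1.0],
)
```

Storage grows as m · n² entries, which is the cost the compact representations later in the lecture try to avoid.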
Planning
Plan?• Objective?
Policy?• Objective?
Dynamic programming (DP)
Value iteration [Bellman, 1957]
$$f(i) := \max_{a \in A} \Big[ C_i(a) + \sum_{j \in S} P_{ij}(a)\, f(j) \Big]$$
Initial value function → DP improves value function → ε-optimal value function
Policy iteration [Howard, 1960]
$$\pi(i) := \arg\max_{a \in A} \Big[ C_i(a) + \sum_{j \in S} P_{ij}(a)\, f(j) \Big]$$
Initial policy → evaluate policy ⇄ DP improves policy → ε-optimal policy
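A minimal sketch of the value-iteration backup, followed by greedy policy extraction (the improvement step of policy iteration). The discount factor `gamma`, the stopping threshold `eps`, and the two-state example are assumptions added so that the iteration converges; they do not appear on the slide.

```python
def backup(i, a, f, C, P, gamma):
    """One-step lookahead value of action a in state i."""
    return C[i][a] + gamma * sum(p * f[j] for j, p in enumerate(P[a][i]))


def value_iteration(C, P, gamma, eps=1e-8):
    """Repeat the max-backup over all states until the values stop changing."""
    n, m = len(C), len(P)
    f = [0.0] * n                      # initial value function
    while True:
        new_f = [max(backup(i, a, f, C, P, gamma) for a in range(m))
                 for i in range(n)]
        if max(abs(x - y) for x, y in zip(new_f, f)) < eps:
            return new_f
        f = new_f


def greedy_policy(f, C, P, gamma):
    """argmax version of the same backup."""
    n, m = len(C), len(P)
    return [max(range(m), key=lambda a: backup(i, a, f, C, P, gamma))
            for i in range(n)]


# Hypothetical example: state 1 pays 1 per step; action 1 moves 0 -> 1.
C = [[0.0, 0.0], [1.0, 1.0]]            # C[i][a]
P = [[[1.0, 0.0], [0.0, 1.0]],          # action 0: stay put
     [[0.0, 1.0], [0.0, 1.0]]]          # action 1: go to state 1
f = value_iteration(C, P, gamma=0.5)
policy = greedy_policy(f, C, P, gamma=0.5)
```

Every sweep touches all n states and all m actions, which is the per-iteration cost that the rest of the lecture tries to reduce.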
Bellman’s Curse of Dimensionality
Bayes Nets
Compact rep’n of joint prob. distribution
Nodes: Earthquake, Burglary, Alarm, Nbr1Calls, Nbr2Calls, Radio
Pr(B=t)  Pr(B=f)
0.05     0.95
Pr(A|E,B), with Pr(¬A|E,B) in parentheses
e, b      0.9  (0.1)
e, ¬b     0.2  (0.8)
¬e, b     0.85 (0.15)
¬e, ¬b    0.01 (0.99)
DBN Representation: DelC
Variables at times t and t+1: T, L, CR, RHC, RHM, M
fCR(Lt,CRt,RHCt,CRt+1)
L  CR  RHC   CR(t+1)=T  CR(t+1)=F
O  T   T     0.2        0.8
E  T   T     1.0        0.0
O  F   T     0.0        1.0
E  F   T     0.0        1.0
O  T   F     1.0        0.1
E  T   F     1.0        0.0
O  F   F     0.0        1.0
E  F   F     0.0        1.0
fT(Tt,Tt+1)
T    T(t+1)=T  T(t+1)=F
T    0.91      0.09
F    0.0       1.0
fRHM(RHMt,RHMt+1)
RHM  RHM(t+1)=T  RHM(t+1)=F
T    1.0         0.0
F    0.0         1.0
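In a DBN the probability of a full next state is the product of the per-variable conditional probabilities. The sketch below uses only the fT and fRHM tables from this slide; the dictionary layout and the function name `transition_prob` are illustrative assumptions.

```python
from itertools import product

# var -> (parent variables at time t, table of Pr(var' = True | parent values))
CPTS = {
    "T":   (("T",),   {(True,): 0.91, (False,): 0.0}),
    "RHM": (("RHM",), {(True,): 1.0,  (False,): 0.0}),
}


def transition_prob(state, next_state, cpts=CPTS):
    """Pr(next_state | state) as a product of local factors."""
    p = 1.0
    for var, (parents, table) in cpts.items():
        p_true = table[tuple(state[x] for x in parents)]
        p *= p_true if next_state[var] else 1.0 - p_true
    return p


s = {"T": True, "RHM": True}
p_same = transition_prob(s, {"T": True, "RHM": True})

# The factored probabilities still sum to one over all next states.
total = sum(transition_prob(s, {"T": t, "RHM": r})
            for t, r in product([True, False], repeat=2))
```

Adding a variable means adding one small table rather than doubling the size of an explicit matrix.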
Benefits of DBN Representation
- Only 48 parameters vs. 25440 for matrix
        s1    s2    ...  s160
s1      0.9   0.05  ...  0.0
s2      0.0   0.20  ...  0.1
...
s160    0.1   0.0   ...  0.0
[Figure: the DelC DBN over T, L, CR, RHC, RHM, M at times t and t+1]
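The matrix count can be checked by hand: an explicit transition matrix over 160 states has 160 rows, and because each row sums to 1, only 159 entries per row are free.

```latex
\underbrace{160}_{\text{rows}} \times \underbrace{(160 - 1)}_{\text{free entries per row}} = 25440
```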
Example: (x3 and x2) or not x1
[Figure: binary decision tree and the corresponding OBDD over x1, x2, x3]
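The saving from an OBDD over a full decision tree comes from two reductions: dropping tests whose branches agree and sharing identical subgraphs. The sketch below builds a reduced OBDD for the example formula by brute-force expansion; the helper names and the variable order x1, x2, x3 are assumptions, and real BDD packages avoid the exponential enumeration used here.

```python
def build_obdd(f, order):
    """Return (root, nodes) of a reduced OBDD for the boolean function f."""
    unique = {}   # (var, low, high) -> node id, so equal subgraphs are shared
    nodes = {}    # node id -> (var, low, high)

    def mk(var, low, high):
        if low == high:                 # redundant test: skip this node
            return low
        key = (var, low, high)
        if key not in unique:
            node_id = len(unique) + 2   # ids 0 and 1 are the terminals
            unique[key] = node_id
            nodes[node_id] = key
        return unique[key]

    def build(i, assignment):
        if i == len(order):
            return int(bool(f(**assignment)))
        var = order[i]
        low = build(i + 1, {**assignment, var: False})
        high = build(i + 1, {**assignment, var: True})
        return mk(var, low, high)

    return build(0, {}), nodes


def example(x1, x2, x3):
    return (x3 and x2) or not x1


root, nodes = build_obdd(example, ["x1", "x2", "x3"])
```

Under this ordering the diagram needs one test node per variable, whereas the full binary decision tree over three variables has seven internal nodes.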
Action Representation – DBN/ADD
Algebraic Decision Diagram (ADD) for fCR(Lt,CRt,RHCt,CRt+1)
[Figure: ADD testing CR, RHC and L, with leaves 0.0, 1.0, 0.8 and 0.2 for CR(t+1), shown beside the DelC DBN]
Today – Solving the curse
Abstraction
Approximation
Reachability
Structured Computation
Given compact representation, can we solve MDP without explicit state space enumeration?
Can we avoid O(|S|)-computations by exploiting regularities made explicit by propositional or first-order representations?
Two general schemes:
• abstraction/aggregation
• decomposition
State Space Abstraction
General method: state aggregation
• group states, treat aggregate as single state
• commonly used in OR [SchPutKin85, BertCast89]
• viewed as automata minimization [DeanGivan96]
Abstraction is a specific aggregation technique
• aggregate by ignoring details (features)
• ideally, focus on relevant features
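Aggregation by ignoring features amounts to grouping states by their projection onto the features that are kept. A minimal sketch, assuming three boolean features named A, B and C (the names echo the figures later in the deck; the function name `aggregate` is hypothetical):

```python
from collections import defaultdict
from itertools import product


def aggregate(states, keep):
    """Group full states by their values on the kept features."""
    blocks = defaultdict(list)
    for s in states:
        key = tuple(s[f] for f in keep)   # projection onto kept features
        blocks[key].append(s)
    return dict(blocks)


features = ["A", "B", "C"]
states = [dict(zip(features, values))
          for values in product([True, False], repeat=len(features))]

# Ignoring C merges the 8 concrete states into 4 abstract states.
blocks = aggregate(states, keep=["A", "B"])
```

Each block is then treated as a single state of the abstract MDP.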
Dimensions of Abstraction
[Figure: partitions of a state space over features A, B, C, with a value per block, illustrating the three dimensions below]
Uniform vs. Nonuniform
Exact vs. Approximate
Adaptive vs. Fixed
A Fixed, Uniform Approximate Abstraction Method
Uniformly delete features from domain [BD94/AIJ97]
Ignore features based on degree of relevance
• rep’n used to determine importance to sol’n quality
Allows tradeoff between abstract MDP size and solution quality
[Figure: states over features A, B, C with transition probabilities 0.8, 0.2, 0.5, 0.5]
Immediately Relevant Variables
Rewards determined by particular variables
• impact on reward clear from STRIPS/ADD rep’n of R
• e.g., difference between CR/-CR states is 10, while difference between T/-T states is 3, MW/-MW is 5
Approximate MDP: focus on “important” goals
• e.g., we might only plan for CR
• we call CR an immediately relevant variable (IR)
• generally, IR-set is a subset of reward variables
Relevant Variables
We want to control the IR variables
• must know which actions influence these and under what conditions
A variable is relevant if it is a parent, in the DBN for some action a, of some relevant variable
• ground (fixed pt) definition by making IR vars relevant
• analogous def’n for PSTRIPS
• e.g., CR (directly/indirectly) influenced by L, RHC, CR
Simple “backchaining” algorithm to construct set
• linear in domain descr. size, number of relevant vars
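The backchaining computation is a fixed point over DBN parent sets: start from the IR variables and keep adding parents until nothing changes. A minimal sketch; the parent sets below are read off the DelC factors shown elsewhere in the deck (for example fCR(Lt,CRt,RHCt,CRt+1)), while the dictionary layout and function name are assumptions.

```python
def relevant_variables(ir_vars, parents_by_action):
    """Close the IR set under 'is a DBN parent of a relevant variable'."""
    relevant = set(ir_vars)
    frontier = list(ir_vars)
    while frontier:
        var = frontier.pop()
        for parents in parents_by_action.values():
            for p in parents.get(var, ()):
                if p not in relevant:
                    relevant.add(p)
                    frontier.append(p)
    return relevant


# Parents of each time-(t+1) variable under DelC; other actions are omitted.
parents_by_action = {
    "DelC": {
        "CR": ["L", "CR", "RHC"],
        "T": ["T"],
        "L": ["L"],
        "RHC": ["RHC"],
        "RHM": ["RHM"],
        "M": ["M"],
    },
}

relevant = relevant_variables({"CR"}, parents_by_action)
```

Each variable enters the frontier at most once, which is why the cost is linear in the size of the domain description.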
Constructing an Abstract MDP
Simply delete all irrelevant atoms from domain
• state space S’: set of assts to relevant vars
• transitions: let Pr(s’,a,t’) = Σ_{t ∈ t’} Pr(s,a,t) for any s ∈ s’
  construction ensures identical for all s ∈ s’
• reward: R(s’) = (max {R(s): s ∈ s’} + min {R(s): s ∈ s’}) / 2
  midpoint gives tight error bounds
Construction of DBN/PSTRIPS with these properties involves little more than simplifying action descriptions by deletion
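Why the midpoint: among all single values that could stand in for the rewards inside one abstract state, the midpoint of the range minimizes the worst-case deviation, which is then exactly half the span. A small sketch with made-up rewards (the numbers and function names are hypothetical):

```python
def abstract_reward(block_rewards):
    """Return (midpoint, span) of the rewards of the states in one block."""
    lo, hi = min(block_rewards), max(block_rewards)
    return (hi + lo) / 2.0, hi - lo


def worst_error(r_abstract, block_rewards):
    """Largest gap between the abstract reward and any concrete reward."""
    return max(abs(r - r_abstract) for r in block_rewards)


rewards = [0.0, 1.0, 2.0, 9.0]            # hypothetical rewards in one block
midpoint, span = abstract_reward(rewards)
mean = sum(rewards) / len(rewards)

err_midpoint = worst_error(midpoint, rewards)   # half the span
err_mean = worst_error(mean, rewards)           # larger for skewed rewards
```

The largest span over all abstract states is the quantity that drives the error bounds on the abstract solution.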
Example
Abstract MDP
• only 3 variables
• 20 states instead of 160
• some actions become identical, so action space is simplified
• reward distinguishes only CR and –CR (but “averages” penalties for MW and –T)
[Figure: DelC action DBN restricted to L, CR, RHC at times t and t+1]
Reward
Aspect   Condt’n  Rew
Coffee   CR       -14
         -CR      -4
Solving Abstract MDP
Abstract MDP can be solved using std methods
Error bounds on policy quality derivable
• Let δ be max reward span over abstract states
• Let V’ be optimal VF for M’, V* for original M
• Let π’ be optimal policy for M’ and π* for original M
$$|V^*(s) - V'(s')| \le \frac{\delta}{2(1-\beta)} \quad \text{for any } s \in s'$$
$$|V^*(s) - V_{\pi'}(s')| \le \frac{\delta}{1-\beta} \quad \text{for any } s \in s'$$
where β is the discount factor
FUA Abstraction: Relative Merits
FUA easily computed (fixed polynomial cost)
FUA prioritizes objectives nicely
• a priori error bounds computable (anytime tradeoffs)
• can refine online (heuristic search) [DeaBou97]
FUA is inflexible
• can’t capture conditional relevance
• approximate (may want exact solution)
• can’t be adjusted during computation
• may ignore the only achievable objectives
Dimensions of Abstraction
[Figure: partitions of a state space over features A, B, C, with a value per block, illustrating the three dimensions below]
Uniform vs. Nonuniform
Exact vs. Approximate
Adaptive vs. Fixed
Constructing Abstract MDPs
Many ways to abstract an MDP
• methods will exploit the logical representation
Abstraction can be viewed as a form of automaton minimization
• general minimization schemes require state space enumeration
• Instead, exploit the logical structure of the domain (state, actions, rewards) to construct logical descriptions of abstract states, avoiding state enumeration
Decision-Theoretic Regression
Abstraction based on analog of regression
• as abstraction: dynamic, nonuniform, exact/approx.
• exploits logical representation of MDP
Overview
• value iteration as variable elimination
• propositional decision-theoretic regression
• approximate decision-theoretic regression
• first-order decision-theoretic regression
Classical Regression
Goal regression: a classical abstraction method
• Regression of a logical condition/formula G through action a is a weakest logical formula C = Regr(G,a) such that: G is guaranteed to be true after doing a if C is true before doing a
• Weakest precondition for G wrt a
[Figure: do(a) maps the region C into the region G]
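For a concrete instance of Regr(G,a), the sketch below assumes a deterministic STRIPS-style action (preconditions, add list, delete list) and a goal that is a conjunction of positive atoms. That encoding, and the coffee-delivery atoms, are assumptions chosen for illustration, not the representation used on the slide.

```python
from typing import FrozenSet, NamedTuple, Optional


class Action(NamedTuple):
    name: str
    pre: FrozenSet[str]      # atoms that must hold before the action
    add: FrozenSet[str]      # atoms made true by the action
    delete: FrozenSet[str]   # atoms made false by the action


def regress(goal: FrozenSet[str], action: Action) -> Optional[FrozenSet[str]]:
    """Weakest condition before `action` that guarantees `goal` afterwards."""
    if goal & action.delete:
        return None          # the action destroys part of the goal
    # Atoms the action adds need not hold beforehand; its preconditions must.
    return (goal - action.add) | action.pre


# Hypothetical action for illustration.
deliver = Action(
    name="deliver_coffee",
    pre=frozenset({"robot_has_coffee", "at_office"}),
    add=frozenset({"user_has_coffee"}),
    delete=frozenset({"robot_has_coffee"}),
)

c = regress(frozenset({"user_has_coffee", "at_office"}), deliver)
impossible = regress(frozenset({"robot_has_coffee"}), deliver)
```

Every state satisfying the returned condition is treated alike, which is what makes regression an abstraction method.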
Example: Regression in SitCalc
For the situation calculus
• Regr(G(do(a,s))): logical condition C(s) under which a leads to G (aggregates C states and ~C states)
Regression in sitcalc straightforward
• Regr(F(x, do(a,s))) ≡ Φ_F(x,a,s)
• Regr(¬ψ1) ≡ ¬Regr(ψ1)
• Regr(ψ1 ∧ ψ2) ≡ Regr(ψ1) ∧ Regr(ψ2)
• Regr(∃x.ψ1) ≡ ∃x.Regr(ψ1)
Decision-Theoretic Regression
In MDPs, we don’t have goals, but regions of distinct value
Decision-theoretic analog: given “logical description” of Vt+1, produce such a description of Vt or optimal policy (e.g., using ADDs)
Cluster together states at any point in calculation with same best action (policy), or with same value (VF)
Decision-Theoretic Regression
Decision-theoretic complications:
• multiple formulae G describe fixed value partitions
• a can lead to multiple partitions (stochastically)
[Figure: Vt regions G1, G2, G3 and Qt+1(a) region C1, linked by probabilities p1, p2, p3]
Functional View of DTR
Generally, Vt+1 depends on only a subset of variables @ t (usually in a structured way)
What is value of action a at time t (at any s)?
[Figure: Vt+1 as a tree over CR and M with leaves -10 and 0, beside the DBN over T, L, CR, RHC, RHM, M at times t and t+1]
DBN factors: fRm(Rmt,Rmt+1), fM(Mt,Mt+1), fT(Tt,Tt+1), fL(Lt,Lt+1), fCr(Lt,Crt,Rct,Crt+1), fRc(Rct,Rct+1)
Bellman Backup (Regression)
[Figure: a value diagram over JC, CP, CC, JP, BC with leaves 10, 0, 12, 9; the ADD for CR(t+1) testing CR, RHC, L with leaves 0.0, 1.0, 0.8, 0.2; and the value tree over CR and M with leaves -10 and 0]
A Simple Action/Reward Example
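Network Rep’n for Action A
[Figure: DBN over X, Y, Z at times t and t+1, with CPT trees whose leaves are 0.0, 0.9 and 1.0; one tree also tests W]
Reward Function R
[Figure: tree over Z with leaves 10 and 0]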
Example: Generation of V1
V0 = R: tree over Z with leaves 10 and 0
P(Z|a,s): tree over Y and Z with leaves Z: 0.9, Z: 0.0, Z: 1.0
Σ P(Z|a,s) V0: leaves 9.0, 0.0, 10.0 (8.1, 0.0, 9.0 after a factor of 0.9)
Maxa …: leaves 8.1, 0.0, 9.0
R(s) + Maxa … = V1: leaves 8.1, 0.0, 19.0
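The numbers on this slide can be reproduced with one backup. The sketch below makes several assumptions that the extracted text only implies: a discount of 0.9 (inferred from 9.0 becoming 8.1), a reward of 10 when Z holds, a single action, and a CPT in which Z stays true once true and otherwise becomes true with probability 0.9 if Y holds.

```python
from collections import defaultdict
from itertools import product

GAMMA = 0.9                       # assumed discount, inferred from the leaves
V0 = {True: 10.0, False: 0.0}     # V0 = R depends on Z only


def reward(state):
    return 10.0 if state["Z"] else 0.0


def prob_z_next(state):
    """Assumed CPT: Pr(Z' = true | Y, Z) under the single action."""
    if state["Z"]:
        return 1.0
    return 0.9 if state["Y"] else 0.0


def v1(state):
    p = prob_z_next(state)
    expected = p * V0[True] + (1.0 - p) * V0[False]
    return reward(state) + GAMMA * expected   # one action, so no max needed


# Group the concrete (Y, Z) states into regions of equal value.
regions = defaultdict(list)
for y, z in product([True, False], repeat=2):
    regions[round(v1({"Y": y, "Z": z}), 6)].append((y, z))
```

Four concrete states fall into three value regions (0.0, 8.1, 19.0), which is the partition the V1 tree represents without enumerating states.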
Example: Generation of V2
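[Figure: V1 (tree over Y and Z with leaves 8.1, 0.0, 9.0); P(Y|a,s) (tree over X and Y with leaves Y: 0.9, Y: 0.0, Y: 1.0); P(Z|a,s); and the combined tree over X, Y, Z whose leaves pair a probability for Y with one for Z, e.g. Y: 0.9, Z: 0.9]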
Some Results: Natural Examples
Some Results: Worst-case
Some Results: Best-case
DTR: Relative Merits
Adaptive, nonuniform, exact abstraction method
• provides exact solution to MDP
• much more efficient on certain problems (time/space)
• 400 million state problems (ADDs) in a couple hrs
Some drawbacks
• produces piecewise constant VF
• some problems admit no compact solution representation (though ADD overhead “minimal”)
• approximation may be desirable or necessary
Criticisms of SPUDD
Future Work
Dimensions of Abstraction
[Figure: partitions of a state space over features A, B, C, with a value per block, illustrating the three dimensions below]
Uniform vs. Nonuniform
Exact vs. Approximate
Adaptive vs. Fixed
Approximate DTR
Easy to approximate solution using DTR
Simple pruning of value function
• Can prune trees [BouDearden96] or ADDs [StaubinHoeyBou00]
Gives regions of approximately same value
A Pruned Value ADD
[Figure: value ADD over Loc, HCR, HCU, U, R, W, before and after pruning]
Leaves merged into ranges by pruning:
• 9.00, 10.00 → [9.00, 10.00]
• 8.36, 8.45, 7.45 → [7.45, 8.45]
• 6.81, 7.64, 6.64 → [6.64, 7.64]
• 5.62, 6.19, 5.19 → [5.19, 6.19]
Approximate Structured VI
Run normal SVI using ADDs/DTs
• at each leaf, record range of values
At each stage, prune interior nodes whose leaves all have values within some threshold
• tolerance can be chosen to minimize error or size
• tolerance can be adjusted to magnitude of VF
Convergence requires some care
If the max span over leaves and the termination tolerance are both below their thresholds, the distance between the resulting V and V* is bounded.
[Equation: bound on the distance between V and V*]
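The pruning step can be sketched on a plain decision tree: any subtree whose leaf values span no more than the tolerance is replaced by a single range leaf. The tuple encoding, the function names, and the tree shape below are assumptions; only the leaf values are taken from the pruned-ADD slide, and a real ADD would also share subgraphs.

```python
def span(tree):
    """Return (min, max) over the leaf values of a tree."""
    if isinstance(tree, (int, float)):
        return tree, tree
    _, low, high = tree
    lo1, hi1 = span(low)
    lo2, hi2 = span(high)
    return min(lo1, lo2), max(hi1, hi2)


def prune(tree, tol):
    """Collapse subtrees whose leaves differ by at most tol into [lo, hi]."""
    if isinstance(tree, (int, float)):
        return tree
    lo, hi = span(tree)
    if hi - lo <= tol + 1e-9:        # small slack for floating-point noise
        return [lo, hi]              # range leaf
    var, low, high = tree
    return (var, prune(low, tol), prune(high, tol))


# Internal node = (variable, false-branch, true-branch); leaves are numbers.
# The layout is hypothetical; the leaf values come from the slide.
tree = ("HCU",
        ("HCR",
         ("R", 8.36, ("W", 8.45, 7.45)),
         ("R", 6.81, ("W", 7.64, 6.64))),
        ("Loc",
         ("R", 5.62, ("W", 6.19, 5.19)),
         ("W", 9.00, 10.00)))

pruned = prune(tree, tol=1.0)
```

With a tolerance of 1.0 the eleven leaves collapse into four range leaves, at the cost of up to half a unit of value error per leaf if the midpoint is used.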
Approximate DTR: Relative Merits
Relative merits of ADTR
• fewer regions implies faster computation
• can provide leverage for optimal computation
• 30-40 billion state problems in a couple hours
• allows fine-grained control of time vs. solution quality with dynamic (a posteriori) error bounds
• technical challenges: variable ordering, convergence, fixed vs. adaptive tolerance, etc.
Some drawbacks
• (still) produces piecewise constant VF
• doesn’t exploit additive structure of VF at all
Reachability
DP vs. heuristic search
Given a start state, heuristic search can find an optimal solution without evaluating all states.
[Figure: nested graphs around the start state]
• Implicit graph: all states
• Explicit graph: states evaluated during search
• Solution graph: all states reachable by optimal solution
Each iteration, DP improves solution for each state
DP solves problem for all possible starting states.
Solution structures
Solution path
Acyclic solution graph
Cyclic solution graph
DP vs. heuristic search
Solution structure   Dynamic programming            Heuristic search
Simple path          Forward DP (Dijkstra’s alg.)   A*
Acyclic graph        Backwards induction            AO*
Cyclic graph         Policy (Value) iteration       LAO*
Heuristic search = dynamic programming + starting state + forward expansion of solution + admissible heuristic
AO* [Nilsson 1971; Martelli & Montanari 1973]
Initialize partial solution graph to start state
Repeat until a complete solution is found:
• Expand some nonterminal state on the fringe of the best partial solution graph
• Use backwards induction to update the costs of all ancestor states of the expanded state and possibly change their selected action.
LAO* [Hansen & Zilberstein AAAI-98, AIJ]
Same except allow solution to contain loops, and use value iteration (or policy iteration) instead of backwards induction
Heuristic evaluation function
h(i) is a heuristic estimate of the minimal-cost solution for every non-terminal tip state.
h(i) is admissible if h(i) ≤ f*(i).
An admissible heuristic estimate f(i) for any state in the explicit graph is defined as follows:
$$f(i) = \begin{cases} 0 & \text{if } i \text{ is a goal state} \\ h(i) & \text{if } i \text{ is a non-terminal tip state} \\ \min_{a \in A} \Big[ c_i(a) + \sum_{j \in S} p(i,a,j)\, f(j) \Big] & \text{else} \end{cases}$$
Underestimate cost (overestimate reward)
Example: Path-finding
[Figure: grid of cells numbered 1-8; cell 1 is Start, cell 4 is Goal]
Actions: Move one cell to the North, East, South, or West
Each action succeeds with probability 0.5 if there is a cell in intended direction of movement, and otherwise fails
Each action has cost of one
Heuristic?
Start of search
[Figure: grid of cells 1-8 with Start = 1 and Goal = 4, beside the explicit graph]
Explicit graph: state 1 (Start), f = 3.0
After first node expansion
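[Figure: grid of cells 1-8 with Start = 1 and Goal = 4, beside the explicit graph]
Explicit graph (state 1 expanded with actions N, E, S):
• state 1 (Start): 4.0
• state 2: 2.0
• state 8: 4.0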
After second node expansion
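[Figure: grid of cells 1-8 with Start = 1 and Goal = 4, beside the explicit graph]
Explicit graph (states 1 and 2 expanded):
• state 1 (Start): 5.0
• state 2: 3.0
• state 8: 4.0
• state 3: 1.0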
After third node expansion
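[Figure: grid of cells 1-8 with Start = 1 and Goal = 4, beside the explicit graph]
Explicit graph (states 1, 2 and 3 expanded):
• state 1 (Start): 6.0
• state 2: 4.0
• state 8: 4.0
• state 3: 2.0
• state 4 (Goal): 0.0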
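The expansions traced on the last few slides can be reproduced with a simplified LAO* sketch. Assumptions, since the grid figure did not survive extraction: the eight cells form a ring (1 next to 2 and 8), the two actions are called `next` and `prev`, and h is the number of moves to the goal. Unlike full LAO*, this version expands one tip at a time and re-runs value iteration over every expanded state rather than only the ancestors of the new one.

```python
def lao_star(start, goal, actions, outcomes, h, cost=1.0, eps=1e-9):
    """Simplified LAO*: expand a greedy-graph tip, then re-run value iteration."""
    f = {}              # cost-to-go estimates for states seen so far
    expanded = set()    # states whose successors have been generated

    def value(s):
        if s == goal:
            return 0.0
        return f.setdefault(s, h(s))     # tip states start at the heuristic

    def q(s, a):
        return cost + sum(p * value(t) for t, p in outcomes(s, a))

    def best_action(s):
        return min(actions, key=lambda a: q(s, a))

    def fringe():
        """Unexpanded states reachable from start under the greedy policy."""
        seen, stack, tips = set(), [start], []
        while stack:
            s = stack.pop()
            if s == goal or s in seen:
                continue
            seen.add(s)
            if s not in expanded:
                tips.append(s)
                continue
            stack.extend(t for t, p in outcomes(s, best_action(s)) if p > 0)
        return tips

    while True:
        tips = fringe()
        if not tips:                     # greedy solution graph is complete
            return f, expanded
        expanded.add(tips[0])
        while True:                      # value iteration on expanded states
            delta = 0.0
            for s in expanded:
                new = q(s, best_action(s))
                delta = max(delta, abs(new - value(s)))
                f[s] = new
            if delta < eps:
                break


def outcomes(s, a):
    """Hypothetical ring layout: a move succeeds w.p. 0.5, else stay put."""
    step = 1 if a == "next" else -1
    t = (s - 1 + step) % 8 + 1
    return [(t, 0.5), (s, 0.5)]


def ring_distance(s, goal=4):
    d = abs(s - goal)
    return float(min(d, 8 - d))


f, expanded = lao_star(1, 4, ["next", "prev"], outcomes, ring_distance)
```

Under these assumptions only states 1, 2 and 3 are ever expanded, and their final costs agree with the last slide of the trace (6.0, 4.0 and 2.0); state 8 keeps its heuristic value of 4.0.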
Theoretical Properties
Theorem 1: Using an admissible heuristic, LAO* converges to an optimal solution without (necessarily) expanding/evaluating all states.
Theorem 2: If h2(i) is a more informative heuristic than h1(i) (i.e., h1(i) ≤ h2(i) ≤ f*(i)), LAO* using h2(i) expands a subset of the worst case set of states expanded using h1(i).
Today’s paper
LAO* search over what?
Results
[Figure: states reachable by the optimal solution vs. states explored during search]
Criticisms?
Future work?
Off-line vs. on-line search
                      Deterministic state transitions   Stochastic state transitions (MDPs)
Off-line              A*                                LAO*
On-line (real-time)   LRTA* (Korf)                      RTDP (Barto et al.)