Black Box and Generalized Algorithms for Planning in Uncertain Domains
Thesis Proposal, Dept. of Computer Science, Carnegie Mellon University
H. Brendan McMahan
Outline
• The Problem and Approach
  – Motivating Examples
  – Goals and Techniques
  – MDPs and Uncertainty
• Example Algorithms
• Proposed Future Work
Mars Rover Mission Planning
Direct human control is not realistic
Collect data while conserving power and bandwidth
First Experiments in the Robotic Investigation of Life in the Atacama Desert of Chile. D. Wettergreen, et al. 2005.
Recent Progress in Local and Global Traversability for Planetary Rovers. S. Singh, et al. 2000.
Autonomous Helicopter Control
6+ continuous state dimensions
Complex, non-linear dynamics
High failure cost
Inverted Autonomous Helicopter Flight via Reinforcement Learning. A. Ng, et al.
Autonomous Helicopter Control using Reinforcement Learning Policy Search Methods. J. Bagnell and J. Schneider.
Online Shortest Path Problem
Getting from my (old) house to CMU each day:
Other Domains
Goal
Planning multiple decisions over time to achieve goals or minimize cost
in uncertain domains: domains that are not deterministic, not fully observable, or not perfectly modeled
The Black Box Approach
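(Diagram: the New Algorithm turns the Hard Problem into Easier Problems; a Fast Existing Algorithm solves these as a black box; the New Algorithm combines the resulting Solutions into a Solution of the Hard Problem.)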
The Generalization Approach
(Diagram: a Fast Existing Algorithm is generalized; the Generalization of the Existing Algorithm solves the Hard Problem directly, producing a Solution.)
Two Examples
Black Box Approach: Oracle Algorithms (MDPs with unknown costs) use Value Iteration (MDPs) as a black box.
Generalization Approach: Dijkstra’s Algorithm (shortest paths) is generalized to algorithms for MDPs.
Benefits of using Black Boxes
• Use fast, optimized, mature implementations
• Pick the implementation best suited to a specific domain
• Will be able to use algorithms not even invented yet
• Theoretical advantages
Benefits of Generalization
New intuitions
Some performance guarantees for free
Review of MDPs
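An MDP (S, A, P, c) consists of:
• S, a finite set of states
• A, a finite set of actions
• dynamics P(y | x, a), the probability of moving to state y after taking action a in state x
• costs c(x, a)
(Figure: A Research MDP, with goal state "New idea!" and states "No New Ideas" and "Hungry"; actions A = {eat, wait, work}; transitions are labeled with probabilities and dollar costs.)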
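For reference, assuming the goal-directed, cost-minimizing setting these slides use (a stochastic shortest path problem with a zero-cost absorbing goal, undiscounted), the optimal values satisfy the Bellman equation below; value iteration simply applies the right-hand side repeatedly as an update.

```latex
V^*(x) = \min_{a \in A} \Big[ c(x,a) + \sum_{y \in S} P(y \mid x, a)\, V^*(y) \Big],
\qquad V^*(\text{goal}) = 0 .
```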
Simple Example Domain
Robot path planning problem:
• Actions = {8 neighbors}
• Cost: Euclidean distance
• With probability p, a random action is taken
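A minimal value-iteration sketch for this domain follows. The grid encoding (0 = free cell), the goal cell, charging the length of the intended move as c(x, a), reading "random action" as a uniformly random neighbor move (possibly the intended one), and letting blocked moves leave the robot in place are all assumptions made for illustration.

```python
import math

# The action set {8 neighbors}: all unit row/column offsets except (0, 0).
MOVES = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]

def grid_value_iteration(grid, goal, p=0.1, tol=1e-9, max_sweeps=10_000):
    """Value iteration on a grid MDP; grid[r][c] == 0 marks a free cell (assumed encoding)."""
    rows, cols = len(grid), len(grid[0])
    free = [(r, c) for r in range(rows) for c in range(cols) if grid[r][c] == 0]

    def step(s, m):
        # Outcome of executing move m from s; off-grid or blocked moves leave the robot in place.
        r, c = s[0] + m[0], s[1] + m[1]
        if 0 <= r < rows and 0 <= c < cols and grid[r][c] == 0:
            return (r, c)
        return s

    V = {s: 0.0 for s in free}  # 0 is a lower bound because every cost is positive
    for _ in range(max_sweeps):
        delta = 0.0
        for s in free:
            if s == goal:
                continue  # the absorbing goal keeps value 0
            best = math.inf
            for a in MOVES:
                # Intended move w.p. (1 - p); uniformly random move w.p. p.
                expected = sum(((1 - p) * (m == a) + p / len(MOVES)) * V[step(s, m)]
                               for m in MOVES)
                best = min(best, math.hypot(*a) + expected)  # cost = Euclidean length of a
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place (Gauss-Seidel) update
        if delta < tol:
            break
    return V

open_grid = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
V = grid_value_iteration(open_grid, goal=(2, 2), p=0.0)
print(round(V[(0, 0)], 4))  # 2.8284: two diagonal moves, 2 * sqrt(2)
```

With p = 0 the problem is deterministic and the values are ordinary shortest-path distances; with p > 0 the same loop averages over the random outcomes.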
Types of Uncertainty
• Outcome Uncertainty (MDPs)
• Partial Observability (POMDPs)
• Model Uncertainty (families of MDPs, RL)
• Modeling Other Agents (Agent Uncertainty?)
The Curse of Dimensionality
The number of states |S| is exponential in the number of state variables:
<x, y, vx, vy, battery_power, this_door_open, that_door_open, goal_x, goal_y, bob_x, bob_y, …>
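A worked count makes the growth concrete. Assume each state variable X_i takes values in a finite domain D_i (notation introduced here). The state space is then the Cartesian product of these domains, so 20 binary variables alone already give about a million states:

```latex
|S| = \prod_{i=1}^{n} |D_i|, \qquad
|D_i| = 2 \text{ for all } i \;\Rightarrow\; |S| = 2^{n}, \qquad
2^{20} = 1{,}048{,}576 .
```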
Outline
• The Problem and Approach
• Example Algorithms
  – MDPs with unknown costs
  – Generalizing Dijkstra’s Algorithm
• Proposed Future Work
Unknown Costs, Offline Version
A game with two players:
• The Planner chooses a policy for an MDP with known dynamics.
• The Sentry chooses a cost function from a set K = {c1, …, ck} of possible cost functions.
Avoiding Detection by Sensors
The Planner (robot) picks policies (paths):
The Sentry picks cost functions (sensor placements):
Matrix Game Formulation
Matrix game M:
• The Planner (rows) selects a policy π.
• The Sentry (columns) selects a cost function c.
• M(π, c) = expected total cost of π under costs c.
Goal: Find a minimax solution to M.
An optimal mixed strategy for the Planner is a distribution over pure policies (paths).
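Written out, with Π denoting the set of pure policies and Δ(·) the set of probability distributions over a set (notation assumed here), the Planner's problem and its dual are equal by von Neumann's minimax theorem for finite matrix games:

```latex
\min_{\sigma \in \Delta(\Pi)} \; \max_{c \in K} \; \sum_{\pi \in \Pi} \sigma(\pi)\, M(\pi, c)
\;=\;
\max_{q \in \Delta(K)} \; \min_{\pi \in \Pi} \; \sum_{j=1}^{k} q(c_j)\, M(\pi, c_j).
```

Because M(π, c) is linear in c for a fixed policy π, the inner sum on the right is just the cost of π under the single expected cost function c̄ = Σ_j q(c_j) c_j, so one ordinary MDP solve finds the Planner's best response to any Sentry mixture.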
Interpretations
Model Uncertainty:→ unknown cost function
Partial Observability:→ fixed, unobservable cost function
Agent Uncertainty:→ an adversary picks the cost function
How to Solve It
Problem: The matrix M is exponentially big.
Solution: It can be represented compactly as a Linear Program (LP).
Problem: The LP still takes much too long to solve.
Solution: The Single Oracle Algorithm, which takes advantage of fast black box MDP algorithms.
Single Oracle Algorithm
F is a small set of policies. M’ is the matrix game in which the Planner must play a policy from F.
(Figure: an example with |F| = 2.)
We can solve M’ efficiently: it is only |F| × |K| in size!
Single Oracle Algorithm
If only … we knew that it was sufficient for the Planner to randomize among a small set of strategies,
and we could find that set of strategies.
Single Oracle Algorithm
1. Use an MDP algorithm to find an optimal policy π against the fixed cost function c.
2. Add π to F.
3. Solve M’ and let c be the expected cost function under the Sentry’s optimal mixed strategy.
Repeat until the best response π is already in F.
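The loop above can be sketched in the simplest setting from the example: a deterministic graph where a policy is a path and each c_j assigns a cost to every edge, so Dijkstra's algorithm plays the role of the black-box MDP solver. Two substitutions are assumptions of this sketch, not the proposal's method: M’ is solved approximately with the Hedge (multiplicative-weights) algorithm rather than an exact LP, and the initial policy is a best response to c1. All function names are hypothetical.

```python
import heapq
import math

def shortest_path(graph, cost, src, dst):
    """Black-box 'MDP solver' for a deterministic graph: Dijkstra (dst assumed reachable)."""
    dist, prev, done = {src: 0.0}, {}, set()
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if u in done:
            continue
        done.add(u)
        if u == dst:
            break
        for v in graph.get(u, []):
            nd = d + cost[(u, v)]
            if nd < dist.get(v, math.inf):
                dist[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))
    path, node = [], dst
    while node != src:
        path.append((prev[node], node))
        node = prev[node]
    return tuple(reversed(path))

def path_cost(path, cost):
    return sum(cost[e] for e in path)

def solve_game(M, rounds=5000):
    """Approximate minimax of M (rows minimize): Hedge rows vs. best-responding columns."""
    n, k = len(M), len(M[0])
    scale = max(max(row) for row in M) or 1.0  # rescale nonnegative costs into [0, 1]
    eta = math.sqrt(8 * math.log(max(n, 2)) / rounds)
    w = [1.0] * n
    p_avg, q_avg = [0.0] * n, [0.0] * k
    for _ in range(rounds):
        total = sum(w)
        p = [wi / total for wi in w]
        # Sentry best-responds: the column with the highest expected cost under p.
        j = max(range(k), key=lambda col: sum(p[i] * M[i][col] for i in range(n)))
        for i in range(n):
            p_avg[i] += p[i] / rounds
            w[i] = p[i] * math.exp(-eta * M[i][j] / scale)  # update normalized weights
        q_avg[j] += 1.0 / rounds
    return p_avg, q_avg

def single_oracle(graph, src, dst, K, tol=1e-2, max_iters=50):
    F = [shortest_path(graph, K[0], src, dst)]  # initial policy (assumption: best vs. c1)
    for _ in range(max_iters):
        M = [[path_cost(pi, c) for c in K] for pi in F]  # the small game M'
        p, q = solve_game(M)
        # Expected cost function under the Sentry's (approximate) optimal mixed strategy.
        c_bar = {e: sum(qj * c[e] for qj, c in zip(q, K)) for e in K[0]}
        best_in_F = min(sum(qj * row[j] for j, qj in enumerate(q)) for row in M)
        pi = shortest_path(graph, c_bar, src, dst)  # best response over ALL paths
        if pi in F or path_cost(pi, c_bar) >= best_in_F - tol:
            break  # no policy outside F does better against the Sentry's mixture
        F.append(pi)
    return F, p, q

# Usage: two routes from s to t, and a sensor that can watch either one.
graph = {"s": ["a", "b"], "a": ["t"], "b": ["t"]}
c1 = {("s", "a"): 10.0, ("a", "t"): 1.0, ("s", "b"): 1.0, ("b", "t"): 1.0}
c2 = {("s", "a"): 1.0, ("a", "t"): 1.0, ("s", "b"): 10.0, ("b", "t"): 1.0}
F, p, q = single_oracle(graph, "s", "t", [c1, c2])
```

In this toy instance F ends with both routes, and both players mix roughly 50/50, which is the minimax solution.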
Example Run: Initialization
Fix a policy (blue path).
Solve M’ to find the red sensor field (cost vector), and fix it as c.
Iteration 1: Best Response
Solve for the best response policy (new blue line)
Add it to F.
Red: fixed cost vector (expected field of view). Blue: shortest path given costs.
Iteration 1: Solve the Game
Solve M’
Minimax equilibrium: Red: mixture of costs. Blue: mixture of paths from F.
Iteration 2: Best Response
Solve for the best response policy (new blue line)
Add it to F.
Red: fixed cost vector (expected field of view). Blue: shortest path given costs.
Iteration 2: Solve the Game
Solve M’
Minimax equilibrium: Red: mixture of costs. Blue: mixture of paths from F.
Iteration 6: Convergence
(Panels: Solution to M’; Best Response)
Unknown Costs, Online Version
Go from my house to CMU each day. Model the trip as a graph.
A Shortest Path Problem?
If we knew all the edge costs, it would be easy! But traffic and downed trees → uncertainty.
Limited Observations
Each day, we observe only the total length of the path we actually took to get to CMU.
BGA Algorithm:
Keep estimates of edge lengths.
• Most days, follow the FPL¹ algorithm: pick the shortest path with respect to the estimated lengths plus a little noise.
• Occasionally, play a “random” path in order to make sure we have good estimates of the edge lengths.
¹ [Kalai and Vempala, 2003]
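The daily decision rule can be sketched as below. This covers only the choice of route: how BGA turns observed path totals into edge-length estimates, and which "random" paths it plays, are not shown. The explicit list of candidate paths, the exploration rate `gamma`, uniform exploration over candidates, and independent uniform noise on each edge (one perturbation choice used with Follow the Perturbed Leader) are assumptions of this sketch; enumerating candidates stands in for a shortest-path call.

```python
import random

def choose_path(paths, est_len, gamma=0.1, noise=1.0, rng=random):
    """Pick today's route: explore w.p. gamma, otherwise Follow the Perturbed Leader."""
    if rng.random() < gamma:
        return rng.choice(paths)  # exploration day (uniform over candidates: an assumption)
    edges = {e for path in paths for e in path}
    # Perturb each estimated edge length by independent uniform noise in [0, noise].
    perturbed = {e: est_len[e] + rng.uniform(0.0, noise) for e in edges}
    # Shortest path w.r.t. the perturbed estimates (enumeration stands in for Dijkstra).
    return min(paths, key=lambda path: sum(perturbed[e] for e in path))

# Hypothetical two-route network with current edge-length estimates.
paths = [(("home", "a"), ("a", "cmu")), (("home", "b"), ("b", "cmu"))]
est_len = {("home", "a"): 2.0, ("a", "cmu"): 3.0, ("home", "b"): 4.0, ("b", "cmu"): 4.0}
today = choose_path(paths, est_len, gamma=0.1, rng=random.Random(0))
```

The noise keeps the choice from flip-flopping deterministically between near-tied routes, which is what gives FPL its regret guarantee against changing edge lengths.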
Dijkstra's Algorithm
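(Figure: Dijkstra’s algorithm on a small graph with goal G and states x1–x4. Every estimate v' starts at ∞ except v' = 0 at G, and finite values replace ∞ as states are popped.)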
Keeps states on a priority queue.
Pops states in order of increasing distance to the goal and updates their predecessors.
Prioritized Sweeping¹,² has a similar structure, but doesn’t reduce to Dijkstra’s algorithm.
¹ [A. Moore, C. Atkeson 1993] ² [D. Andre, et al. 1998]
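A minimal sketch of the structure just described, assuming a predecessor-list representation (the hypothetical `preds[y]` lists `(x, cost)` pairs, one per edge x → y). It runs backward from the goal, so each popped state's v' is its exact distance to the goal.

```python
import heapq
import math

def dijkstra_to_goal(preds, goal):
    """Distance-to-goal v'(x) for every state that can reach the goal."""
    v = {goal: 0.0}
    done = set()
    pq = [(0.0, goal)]
    while pq:
        d, y = heapq.heappop(pq)
        if y in done:
            continue  # stale queue entry: y was already popped with a smaller value
        done.add(y)  # states pop in order of increasing distance, so v[y] is now optimal
        for x, cost in preds.get(y, []):  # update y's predecessors
            if cost + d < v.get(x, math.inf):  # v'(x) starts at infinity
                v[x] = cost + d
                heapq.heappush(pq, (v[x], x))
    return v

preds = {"G": [("x1", 1.0), ("x2", 3.0)], "x1": [("x2", 1.0), ("x3", 5.0)], "x2": [("x4", 2.0)]}
print(dijkstra_to_goal(preds, "G"))  # {'G': 0.0, 'x1': 1.0, 'x2': 2.0, 'x3': 6.0, 'x4': 4.0}
```

Note that x2's first estimate (3.0, via its direct edge) is later lowered to 2.0 through x1 before x2 is ever popped; once popped, a value never changes.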
Prioritized Sweeping
When we pop a state x, back up x, then update the priorities of its predecessors w.
(Figure: popped state x1 with successors y1, y2, y3 and predecessors w1, w2. Values of red states are updated based on the values of purple states.)
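A minimal prioritized-sweeping sketch for a goal-directed MDP, following the structure just described: pop x, back it up, then recompute the priorities of its predecessors. The priority here is the Bellman residual |backup(w) − V(w)|, values start at 0, and the data layout (`P[(x, a)]` as a list of `(y, prob)` pairs, `c[(x, a)]` a positive cost) is a hypothetical choice; the exact priority functions of the cited papers may differ.

```python
import heapq

def prioritized_sweeping(states, actions, P, c, goal, tol=1e-9):
    """Prioritized sweeping for a goal-directed MDP (sketch; data layout is hypothetical)."""
    preds = {x: set() for x in states}
    for (x, a), outcomes in P.items():
        for y, prob in outcomes:
            if prob > 0:
                preds[y].add(x)
    V = {x: 0.0 for x in states}  # 0 is a lower bound because every cost is positive

    def backup(w):
        # Bellman backup: cheapest action cost plus expected cost-to-go
        return min(c[(w, a)] + sum(prob * V[y] for y, prob in P[(w, a)])
                   for a in actions[w])

    pq = []  # entries are (-priority, state) because heapq is a min-heap
    for x in states:
        r = abs(backup(x) - V[x]) if x != goal else 0.0
        if r > tol:
            heapq.heappush(pq, (-r, x))
    while pq:
        _, x = heapq.heappop(pq)
        new = backup(x)
        if abs(new - V[x]) <= tol:
            continue  # stale entry: x is already (nearly) consistent
        V[x] = new  # back up the popped state
        for w in preds[x]:  # x's change can make its predecessors inconsistent
            if w != goal:
                r = abs(backup(w) - V[w])
                if r > tol:
                    heapq.heappush(pq, (-r, w))
    return V

# Usage: one action reaches G w.p. 0.5 and otherwise stays put, so V(s) = 1 + 0.5 V(s) = 2.
V = prioritized_sweeping(
    states=["s", "G"],
    actions={"s": ["try"]},
    P={("s", "try"): [("G", 0.5), ("s", 0.5)]},
    c={("s", "try"): 1.0},
    goal="G",
)
```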
Improved Prioritized Sweeping
When we pop a state x, its value has already been updated.
Update the values and priorities of its predecessors w.
(Figure: popped state x1 with successors y1, y2, y3 and predecessors w1, w2. Values of red states are updated based on the values of purple states.)
Priority Function Intuitions
Update the state:
• with the lowest value (closest to the goal);
• whose value is most accurately known. For Dijkstra’s algorithm, the updated (popped) state’s optimal value is known: this is the state whose value will change the least in the future;
• whose value has changed the most since it was last updated.
Comparison
(Figures: IPS on a deterministic domain; PS on the same problem.)
Dark red indicates states recently popped from the queue; lighter shades mean less recently.
Outline
• The Problem and Approach
• Example Algorithms
• Proposed Future Work
  – Bounded RTDP and extensions
  – Large action spaces
  – Details of proposed contributions
Bounded RTDP
RTDP:
• A fixed start state means many states are irrelevant.
• Sample start → goal trajectories, and back up states along them.
BRTDP adds:
• performance guarantees
• much faster convergence
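A minimal sketch of plain RTDP (not BRTDP), under assumed data layouts: `P[(x, a)]` lists `(y, prob)` pairs, `c[(x, a)]` is a positive cost, and unvisited states default to value 0, an admissible lower bound. BRTDP additionally maintains an upper bound and biases the sampling of successor states toward those whose two bounds are far apart; that part is not shown.

```python
import random

def rtdp(start, goal, actions, P, c, trials=1000, max_steps=1000, seed=0):
    """Sample start -> goal trajectories, backing up each visited state."""
    rng = random.Random(seed)
    V = {}  # missing entries default to 0.0, a lower bound when all costs are positive

    def q(x, a):
        return c[(x, a)] + sum(prob * V.get(y, 0.0) for y, prob in P[(x, a)])

    for _ in range(trials):
        x, steps = start, 0
        while x != goal and steps < max_steps:
            a = min(actions[x], key=lambda act: q(x, act))  # greedy action
            V[x] = q(x, a)  # Bellman backup at the visited state
            succs, probs = zip(*P[(x, a)])
            x = rng.choices(succs, weights=probs)[0]  # simulate one outcome
            steps += 1
    return V

# Usage: "safe" costs 3; "risky" costs 1 but fails half the time, so V(s) = 1 + 0.5 V(s) = 2.
actions = {"s": ["safe", "risky"]}
P = {("s", "safe"): [("G", 1.0)], ("s", "risky"): [("G", 0.5), ("s", 0.5)]}
c = {("s", "safe"): 3.0, ("s", "risky"): 1.0}
V = rtdp("s", "G", actions, P, c)
```

Only states reachable from the start under greedy play are ever stored in `V`, which is exactly the "many states are irrelevant" advantage.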
Dijkstra and BRTDP
Dijkstra-style scheduling of backups for BRTDP
Sample multiple trajectories.
Use a priority queue to schedule backups of states on all trajectories.
Dijkstra, BRTDP, and POMDPs
HSVI¹ is like BRTDP, but for POMDPs.
The same trick should apply,
but with more benefit, because POMDP backups are more expensive.
(Figure: a piecewise linear value function over the belief space of two states, x1 and x2.)
¹ [T. Smith and R. Simmons, 2004]
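For context, one common way to write a piecewise linear belief-space value function for a cost-minimizing POMDP is as the lower envelope of a finite set Γ of α-vectors, where a belief b is a probability distribution over states (notation assumed here). Each backup must construct new α-vectors, which is one reason POMDP backups are more expensive than MDP backups. For the two-state picture:

```latex
V(b) = \min_{\alpha \in \Gamma} \sum_{x \in S} \alpha(x)\, b(x),
\qquad
S = \{x_1, x_2\}:\;
V(b) = \min_{\alpha \in \Gamma} \big[ \alpha(x_1)\, b(x_1) + \alpha(x_2)\,\big(1 - b(x_1)\big) \big].
```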
Large Action Spaces
• (Prioritized) Policy Iteration already has an advantage.
• A better tradeoff between policy evaluation and policy improvement?
• Structured sets of actions? Application of Experts/Bandits algorithms?
Details: Proposed Contributions
• Discussion of algorithms already developed: Oracle Algorithms, BGA, IPS, BRTDP, and several others.
• At least two significant new algorithmic contributions:
  – BRTDP + Dijkstra algorithm, and its extension to POMDPs
  – An improved version of PPI to handle large action spaces
  – Something else: generalizations of conjugate-gradient linear solvers to MDPs, extensions of the technique for finding upper bounds introduced in the BRTDP paper, algorithms for efficiently solving restricted classes of POMDPs...
Details: Proposed Contributions
• At least one significant new theoretical contribution:
  – Approximation algorithm for the Canadian Traveler’s Problem or Stochastic TSP
  – Results connecting online algorithms / MDP techniques to stochastic optimization
  – New contributions on bandit-style online algorithms, perhaps with applications to MDPs
Summary
• Motivating Problems
• Black Boxes: MDPs with unknown costs
• Generalization: reducing to Dijkstra
• Future Work: BRTDP + Dijkstra, large action spaces
Questions?
Relationships of Algorithms Discussed
Iteration 3: Best Response
Solve for the best response policy (new blue line)
Add it to F.
Red: fixed cost vector (expected field of view). Blue: shortest path given costs.
Representations, Algorithms
Simulation dynamics model
Factored Representation (DBNs, etc)
STRIPS-style languages
Policy Search, …
Generalizations of Value Iteration, …