The Linear Programming Approach to Approximate Dynamic Programming
Daniela Pucci de Farias (joint work with Ben Van Roy)
Massachusetts Institute of Technology
http://www.mit.edu/∼pucci – p. 1/29

Outline
• Markov decision processes
• Approximate dynamic programming
• Approximate linear programming
• Performance and error analysis
• Constraint sampling

Markov Decision Processes
• (finite) state space $S$
• (finite) action sets $A_x$
• costs $g_a(x)$
• transition probabilities $P_a(x, y)$
• discount factor $\alpha$
• Minimize
$$E\left[\sum_{t=0}^{\infty} \alpha^t g_{a(t)}(x(t))\right]$$
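
A minimal sketch of how these ingredients might be stored and used, assuming a made-up two-state, two-action MDP (every name and number below is illustrative and not taken from the talk). It evaluates the expected discounted cost of one fixed policy by repeatedly applying $J \leftarrow g_u + \alpha P_u J$.

```python
# Hypothetical two-state, two-action MDP; all numbers are illustrative.
STATES = [0, 1]
ACTIONS = {0: ["stay", "move"], 1: ["stay", "move"]}    # action sets A_x
COST = {                                                 # costs g_a(x)
    (0, "stay"): 1.0, (0, "move"): 2.0,
    (1, "stay"): 0.0, (1, "move"): 0.5,
}
PROB = {                                                 # transition probabilities P_a(x, y)
    (0, "stay"): {0: 1.0},
    (0, "move"): {0: 0.2, 1: 0.8},
    (1, "stay"): {1: 1.0},
    (1, "move"): {0: 0.8, 1: 0.2},
}
ALPHA = 0.9                                              # discount factor


def policy_cost(policy, sweeps=2000):
    """Expected discounted cost of a fixed policy, by fixed-point iteration."""
    J = {x: 0.0 for x in STATES}
    for _ in range(sweeps):
        J = {
            x: COST[x, policy[x]]
            + ALPHA * sum(p * J[y] for y, p in PROB[x, policy[x]].items())
            for x in STATES
        }
    return J


always_stay = {0: "stay", 1: "stay"}
J_stay = policy_cost(always_stay)   # state 0 pays 1 per step forever: 1 / (1 - 0.9)
```

Staying in state 0 forever costs 1 per step, so the discounted total is the geometric series $1/(1-\alpha) = 10$, which is what the iteration converges to.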

Tetris
• $x \in S$: wall configuration and current piece
• $a \in A_x$: piece placement
• $P_a(x, \cdot)$: distribution of next piece
• $g_a(x)$: number of rows eliminated

Examples
• Scheduling/routing in queueing networks
• Dynamic resource allocation
• Asset allocation/risk management
• Power management in devices

Dynamic Programming
• Bellman's equation
$$J(x) = \min_{a \in A_x} E\left[g_a(x) + \alpha J(y)\right]$$
• Value iteration, policy iteration, linear programming
• Obtain an optimal policy
$$u^*(x) \in \arg\min_{a \in A_x} E\left[g_a(x) + \alpha J^*(y)\right]$$
• The curse of dimensionality
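
As an illustration of the first of the listed methods, here is a small value-iteration sketch followed by extraction of a greedy policy. The two-state MDP, the function names and the tolerance are assumptions made for this example only. It touches every state on every sweep, which is exactly what the curse of dimensionality makes impossible for large problems.

```python
def value_iteration(states, actions, cost, prob, alpha, tol=1e-10):
    """Iterate J <- TJ (the Bellman update) until the change is below tol."""
    J = {x: 0.0 for x in states}
    while True:
        TJ = {
            x: min(
                cost[x, a] + alpha * sum(p * J[y] for y, p in prob[x, a].items())
                for a in actions[x]
            )
            for x in states
        }
        if max(abs(TJ[x] - J[x]) for x in states) < tol:
            return TJ
        J = TJ


def greedy_policy(J, states, actions, cost, prob, alpha):
    """u(x) in argmin_a E[g_a(x) + alpha * J(y)]."""
    return {
        x: min(
            actions[x],
            key=lambda a: cost[x, a]
            + alpha * sum(p * J[y] for y, p in prob[x, a].items()),
        )
        for x in states
    }


# Hypothetical two-state MDP; all numbers are illustrative.
states = [0, 1]
actions = {0: ["stay", "move"], 1: ["stay", "move"]}
cost = {(0, "stay"): 1.0, (0, "move"): 2.0, (1, "stay"): 0.0, (1, "move"): 0.5}
prob = {
    (0, "stay"): {0: 1.0},
    (0, "move"): {0: 0.2, 1: 0.8},
    (1, "stay"): {1: 1.0},
    (1, "move"): {0: 0.8, 1: 0.2},
}
alpha = 0.9

J_star = value_iteration(states, actions, cost, prob, alpha)
u_star = greedy_policy(J_star, states, actions, cost, prob, alpha)
```

In this toy instance state 1 is free to stay in, so the optimal policy moves out of state 0 and then stays, giving $J^*(1) = 0$ and $J^*(0) = 2/(1 - 0.18)$.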
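
Value Function Approximation
• Approximate $J^* \approx J_r$, for some $r \in \mathbb{R}^K$
• Generate a policy
$$u(x) \in \arg\min_{a \in A_x} E\left[g_a(x) + \alpha J_r(y)\right]$$
• Linearly parameterized approximators
$$J_r(x) = (\Phi r)(x) = \sum_{k=1}^{K} r(k)\,\phi_k(x)$$
[Figure: basis functions $\phi_1$, $\phi_2$, $\phi_3$ and the resulting approximation $\tilde J$]
• Design a function approximator $J_r$
• Compute parameters $r \in \mathbb{R}^K$ so that $J_r \approx J^*$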
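
A small sketch of a linearly parameterized approximator and the policy it induces. The basis functions (constant, linear, quadratic in an integer queue length), the weights `r`, and the single-queue dynamics are all hypothetical choices for illustration; only the formulas for $(\Phi r)(x)$ and the greedy rule come from the slide.

```python
def phi(x):
    """Hypothetical basis functions phi_1..phi_3 evaluated at an integer state x."""
    return [1.0, float(x), float(x * x)]


def J_r(x, r):
    """(Phi r)(x) = sum_k r(k) * phi_k(x)."""
    return sum(rk * fk for rk, fk in zip(r, phi(x)))


def greedy_action(x, r, actions, cost, prob, alpha):
    """Pick an action minimising g_a(x) + alpha * E[J_r(y)]."""
    return min(
        actions(x),
        key=lambda a: cost(x, a)
        + alpha * sum(p * J_r(y, r) for y, p in prob(x, a).items()),
    )


# Illustrative single-queue model (an assumption, not from the talk).
def actions(x):
    return ["idle", "serve"]


def cost(x, a):
    return x + (0.5 if a == "serve" else 0.0)    # holding cost plus service effort


def prob(x, a):
    if a == "idle":
        return {x + 1: 0.3, x: 0.7}              # an arrival with probability 0.3
    return {max(x - 1, 0): 0.6, x: 0.4}          # a departure with probability 0.6


r = [0.0, 1.0, 0.5]                              # J_r(x) = x + 0.5 * x**2
value_at_3 = J_r(3, r)
action_at_3 = greedy_action(3, r, actions, cost, prob, alpha=0.9)
```

Only the $K = 3$ weights are stored, however many states the queue can occupy; the quality of the resulting policy then depends entirely on how well the chosen $\phi_k$ can represent $J^*$.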

Tetris
22 features / basis functions:
• Column heights
• Differences between heights of consecutive columns
• Maximum height
• Number of holes
• Constant function
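
A sketch of how such a feature vector could be computed. It assumes a 10-column board (which gives 10 + 9 + 1 + 1 + 1 = 22 features), takes the height differences in absolute value, and counts a hole as an empty cell with a filled cell somewhere above it in the same column; none of these details are stated on the slide, and the function name and board encoding are made up for the example.

```python
def tetris_features(board):
    """Map a board (list of rows, top row first, 1 = filled) to a feature list."""
    n_rows, n_cols = len(board), len(board[0])
    heights, holes = [], 0
    for c in range(n_cols):
        column = [board[r][c] for r in range(n_rows)]
        # Index of the highest filled cell; n_rows if the column is empty.
        top = next((r for r, cell in enumerate(column) if cell), n_rows)
        heights.append(n_rows - top)
        # Empty cells below the top of the column count as holes.
        holes += sum(1 for cell in column[top:] if not cell)
    diffs = [abs(heights[c + 1] - heights[c]) for c in range(n_cols - 1)]
    return heights + diffs + [max(heights), holes, 1]


# A tiny 4-row, 10-column example board.
board = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
]
features = tetris_features(board)
```

Here the first three columns have heights 3, 2 and 1, and the first column contains one hole.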

Approximate DP: Examples
• American options pricing (Longstaff & Schwartz, 2001; Tsitsiklis & Van Roy, 2001)
• Job-shop scheduling (Zhang & Dietterich, 1996)
• Elevator scheduling (Crites & Barto, 1996)
• Backgammon (Tesauro, 1995)

LP Formulation of DP
$$
\begin{aligned}
\max_J \quad & \sum_x c(x)\,J(x) \\
\text{s.t.} \quad & g_a(x) + \alpha \sum_y P_a(x, y)\,J(y) \ge J(x), \quad \forall x,\ \forall a
\end{aligned}
$$
• $J \le J^*$ for all feasible $J$
• LP solution is $J^*$ for all $c > 0$
• one variable per state
• one constraint per state-action pair
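
A sketch of the exact LP on a toy problem, assuming NumPy and SciPy are available and using `scipy.optimize.linprog`. The two-state MDP and the weights `c` are illustrative. Since `linprog` minimises and defaults to nonnegative variables, the objective is negated and the bounds are opened up explicitly.

```python
import numpy as np
from scipy.optimize import linprog

alpha = 0.9
# Hypothetical 2-state, 2-action MDP: g[a][x] and P[a][x][y] are illustrative.
g = np.array([[1.0, 0.0],          # action 0 ("stay")
              [2.0, 0.5]])         # action 1 ("move")
P = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.2, 0.8], [0.8, 0.2]]])
c = np.array([0.5, 0.5])           # any strictly positive weights

# One constraint per (x, a):  J(x) - alpha * sum_y P_a(x, y) J(y) <= g_a(x)
n = len(c)
A_ub = np.vstack([np.eye(n) - alpha * P[a] for a in range(len(g))])
b_ub = np.concatenate([g[a] for a in range(len(g))])

# Maximise c^T J by minimising -c^T J; J is unrestricted in sign.
res = linprog(-c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * n)
J_lp = res.x
```

For this instance the LP has 2 variables and 4 constraints, and its solution coincides with the $J^*$ that value iteration would produce, namely $J^*(0) = 2/0.82$ and $J^*(1) = 0$.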

Approximate Linear Programming
$$
\begin{aligned}
\max_r \quad & \sum_x c(x)\,(\Phi r)(x) \\
\text{s.t.} \quad & g_a(x) + \alpha \sum_y P_a(x, y)\,(\Phi r)(y) \ge (\Phi r)(x), \quad \forall x,\ \forall a
\end{aligned}
$$
• Idea: Consider only solutions $J = \Phi r$
• one variable per basis function
• one constraint per state-action pair
  ⇒ efficient constraint sampling
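
A sketch of the approximate LP on a small birth-death queue, again assuming NumPy and SciPy. The queue dynamics, the costs, the two basis functions (constant and linear) and the uniform weights `c` are all hypothetical. All constraints are enumerated here because the state space is tiny; the value iteration at the end exists only to check the lower-bound property $\Phi r \le J^*$ that every feasible $r$ satisfies.

```python
import numpy as np
from scipy.optimize import linprog

alpha, N = 0.9, 6                        # discount factor, number of states
arrival, service = 0.3, [0.3, 0.6]       # hypothetical rates for actions 0 and 1
states = np.arange(N)


def transition_matrix(q):
    """Birth-death chain: arrival w.p. `arrival`, departure w.p. q, else stay."""
    P = np.zeros((N, N))
    for x in range(N):
        if x < N - 1:
            P[x, x + 1] = arrival
        if x > 0:
            P[x, x - 1] = q
        P[x, x] = 1.0 - P[x].sum()
    return P


P = [transition_matrix(q) for q in service]
g = [states + 0.5 * a for a in range(2)]           # queue length plus service effort
Phi = np.column_stack([np.ones(N), states])        # basis functions: 1 and x
c = np.full(N, 1.0 / N)                            # state-relevance weights

# (Phi r)(x) - alpha * (P_a Phi r)(x) <= g_a(x) for every state-action pair
A_ub = np.vstack([(np.eye(N) - alpha * Pa) @ Phi for Pa in P])
b_ub = np.concatenate(g)
res = linprog(-(c @ Phi), A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None)] * Phi.shape[1])
r_alp = res.x

# Reference J* by value iteration (affordable only because N is tiny).
J = np.zeros(N)
for _ in range(2000):
    J = np.min([g[a] + alpha * P[a] @ J for a in range(2)], axis=0)
```

The LP now has 2 variables instead of 6, but still 12 constraints, one per state-action pair, which is why constraint sampling matters once the state space is large.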

Some History
• early work
  • Schweitzer and Seidmann (1985)
  • Trick and Zin (1993, 1997)
  • Gordon (1995)
• analytical and computational tool
  • Morrison and Kumar (1999)
  • Paschalidis and Tsitsiklis (2000)
  • Adelman (2002)
• more extensive analysis and implementation in large problems
  • Schuurmans and Patrascu (2001)
  • de Farias and Van Roy (2001, 2002)
  • Guestrin et al. (2002)
  • Poupart et al. (2002)

Theory on Value Function Approximation
• Goals
  • Understand what algorithms are doing
  • Figure out which variations work and when
  • Reduce trial and error
  • Improve performance
• Quality of ultimate approximation limited by choice of $\Phi$
• Will my algorithm A compute weights $r$ that make good use of my basis functions $\Phi$?
• "Competitive" bound: If $\Phi r$ can come within $\epsilon$ of $J^*$, then algorithm A will compute $r$ such that
  1. $\Phi r$ is within $O(\epsilon)$ of $J^*$
  2. the greedy policy $u$ is $O(\epsilon)$-optimal

Notation
• $\|J\|_\infty = \max_x |J(x)|$
• weighted norms:
$$\|J\|_{1,\nu} = \sum_x \nu(x)\,|J(x)|, \qquad \|J\|_{\infty,\nu} = \max_x \nu(x)\,|J(x)|$$
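
The three norms written as plain Python functions over vectors indexed by state, as a quick sanity check of the definitions. The function names and the example vectors are made up for illustration.

```python
def norm_inf(J):
    """Maximum norm: max_x |J(x)|."""
    return max(abs(v) for v in J)


def norm_1_weighted(J, nu):
    """Weighted 1-norm: sum_x nu(x) * |J(x)|."""
    return sum(w * abs(v) for w, v in zip(nu, J))


def norm_inf_weighted(J, nu):
    """Weighted maximum norm: max_x nu(x) * |J(x)|."""
    return max(w * abs(v) for w, v in zip(nu, J))


J = [1.0, -4.0, 2.0]
nu = [0.5, 0.25, 0.25]

a = norm_inf(J)                 # 4.0
b = norm_1_weighted(J, nu)      # 0.5 + 1.0 + 0.5 = 2.0
c = norm_inf_weighted(J, nu)    # max(0.5, 1.0, 0.5) = 1.0
```

The weights let large errors in states with small $\nu(x)$ count for little, which is the role the state-relevance weights and the $1/V$ scaling play in the bounds that follow.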
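
Graphical Interpretation of Approximate LP
[Figure: the region $TJ \ge J$ in the $(J(1), J(2))$ plane, showing $J^*$, the subspace $J = \Phi r$, the approximate LP solution $\Phi\tilde r$, and the best fit $\Phi r^*$]
Even with arbitrarily small $\|J^* - \Phi r^*\|_\infty$, we can have arbitrarily large $\|J^* - \Phi\tilde r\|$ (or infeasibility!)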

Error and performance bounds
• Simple bound: If $\Phi v = e$ for some $v$,
$$\|J^* - \Phi\tilde r\|_{1,c} \le \frac{2}{1-\alpha}\,\|J^* - \Phi r^*\|_\infty$$
• Limitations:
  • state-relevance weights?
  • maximum norm to assess architecture
• "Lyapunov function" $V > 0$:
$$\alpha \max_a E\left[V(y) \mid x, a\right] \le \beta V(x)$$
• Theorem: If $\Phi v$ is a "Lyapunov function" for some $v$,
$$\|J^* - \Phi\tilde r\|_{1,c} \le \frac{2\,c^T \Phi v}{1-\beta}\,\|J^* - \Phi r^*\|_{\infty,1/\Phi v}$$
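
As a consistency check, the simple bound can be read as a special case of the theorem. The step below assumes that the state-relevance weights $c$ form a probability distribution, so that $c^T e = 1$; the slide does not state this. Taking $V = \Phi v = e$, the Lyapunov condition holds with $\beta = \alpha$:

$$
\alpha \max_a E\left[V(y) \mid x, a\right] = \alpha \cdot 1 = \alpha\,V(x),
\qquad
\frac{2\,c^T \Phi v}{1-\beta} = \frac{2\,c^T e}{1-\alpha} = \frac{2}{1-\alpha},
\qquad
\|J\|_{\infty,1/e} = \max_x \frac{|J(x)|}{1} = \|J\|_\infty .
$$

So the constant and the norm on the right-hand side of the theorem reduce to those of the simple bound. A non-constant $V$ with $\beta < 1$ is what allows the error to be measured in a weighted maximum norm instead.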

Error Bound Insights
• Error proportional to best in architecture
$$\|J^* - \Phi r^*\|_{\infty,1/V} = \max_x \frac{|J^*(x) - (\Phi r^*)(x)|}{V(x)}$$
• $V(x)$ large in rarely visited states ⇒ good scaling properties
• For multiclass queueing networks, error bounded uniformly in
  • size of the state space
  • dimension of the state space
• Performance bound:
$$\|J_u - J^*\|_{1,\pi_u} \le \frac{1}{1-\alpha}\,\|J^* - \Phi\tilde r\|_{1,\pi_u}$$
• We have a bound on $\|J^* - \Phi\tilde r\|_{1,c}$
Example: 8-dimensional queueing network
Minimize total number of jobs in the system
Linear and quadratic basis functions
State-relevance weights with exponential decay
Average cost:
ALP 136.67
LBFS 153.82
FIFO 163.63
LONGEST 168.66
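The slide names the ingredients of the approximation but not their formulas. The sketch below shows one plausible reading, under stated assumptions: the basis consists of a constant, each queue length, and each squared queue length (no cross terms), and the state-relevance weight decays as `rho ** (total jobs)`. The decay rate `rho` and all function names are hypothetical.

```python
def basis_features(x):
    """Constant, linear and quadratic features of a queue-length vector x.

    Assumption: no cross terms x_i * x_j; the talk does not say whether they were used.
    """
    return [1.0] + [float(xi) for xi in x] + [float(xi * xi) for xi in x]


def state_relevance_weight(x, rho=0.9):
    """Unnormalised weight c(x) = rho ** (total jobs); rho is a hypothetical decay rate."""
    return rho ** sum(x)


def approx_value(x, r):
    """(Phi r)(x): inner product of the feature vector of x with the weights r."""
    return sum(f * w for f, w in zip(basis_features(x), r))


x = (3, 0, 1, 2, 0, 0, 4, 1)          # one state of an 8-queue network
phi = basis_features(x)
print(len(phi))                        # 1 + 8 + 8 features
print(state_relevance_weight(x))       # 0.9 ** 11
```

With this choice the number of LP variables is 17 regardless of how many states the network has, while the weights concentrate on lightly loaded states.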
Tetris
Comparison against reported results

| Algorithm | Average Score | Time |
| --- | --- | --- |
| variation on TD (Bertsekas and Ioffe) | 3500 | many hours |
| variation on policy gradient (Kakade) | 6000 | days |
| ALP (Farias* and Van Roy) | 5000 | hours |

\* not me!

Remarks:
3 minutes to solve the approximate LP, rest of the time spent on simulation
solution is very sensitive to c
Outline
Markov decision processes
Approximate Dynamic Programming
Approximate linear programming
Performance and error analysis
Constraint Sampling
Constraint Sampling in the Approximate LP
one constraint per state-action pair
many constraints in low-dimensional space ⇒ redundancy
Problem-specific approaches in the literature:
• Grötschel and Holland (1991)
• Morrison and Kumar (1999)
• Guestrin et al. (2002)
• Schuurmans and Patrascu (2002)
Generic approach? Complexity bounds?
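The Reduced LP

$$
\begin{aligned}
\max_{r}\quad & \sum_{x} c(x)\,(\Phi r)(x) \\
\text{s.t.}\quad & g_a(x) + \alpha \sum_{y} P_a(x,y)\,(\Phi r)(y) \ge (\Phi r)(x), \quad \forall (x,a) \in \mathcal{N} \\
& r \in \mathcal{B}
\end{aligned}
$$

$\mathcal{N}$ contains i.i.d. state-action pairs
$\mathcal{B}$ is a bounding box

Theorem: With ideal sampling distribution, if
$$|\mathcal{N}| = \mathrm{poly}\left(p,\ |A|,\ \frac{1}{1-\alpha},\ \frac{1}{\varepsilon},\ \log\frac{1}{\delta},\ \theta_{\mathcal{N},V}\right)$$
then with probability at least $1-\delta$,
$$\|J^* - \Phi \hat{r}\|_{1,c} \le \|J^* - \Phi \tilde{r}\|_{1,c} + \varepsilon\,\|J^*\|_{1,c},$$
where $\hat{r}$ denotes the solution of the reduced LP and $\tilde{r}$ the solution of the full approximate LP.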
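To make the sampled constraint set concrete, here is a minimal sketch of how its rows could be assembled. Everything specific is an assumption for illustration: the 2-state, 2-action MDP is invented, the basis is the identity (so $\Phi r = r$ and $J^*$ is representable), and pairs are drawn uniformly rather than from the "ideal" distribution of the theorem. No LP solver is called; the code only builds the rows and checks that $J^*$ satisfies them.

```python
import random

ALPHA = 0.9
# Hypothetical 2-state, 2-action MDP: P[a][x][y] and g[a][x].
P = [[[0.8, 0.2], [0.3, 0.7]],
     [[0.1, 0.9], [0.6, 0.4]]]
g = [[1.0, 2.0],
     [1.5, 0.5]]
PHI = [[1.0, 0.0], [0.0, 1.0]]   # identity basis, so Phi r = r


def constraint_row(x, a):
    """Return (row, rhs) encoding (Phi r)(x) - alpha * sum_y P_a(x,y) (Phi r)(y) <= g_a(x)."""
    p = len(PHI[0])
    row = [PHI[x][k] - ALPHA * sum(P[a][x][y] * PHI[y][k] for y in range(len(PHI)))
           for k in range(p)]
    return row, g[a][x]


def sample_constraints(m, rng):
    """Draw m i.i.d. state-action pairs uniformly and keep the distinct constraints."""
    pairs = {(rng.randrange(2), rng.randrange(2)) for _ in range(m)}
    return {pair: constraint_row(*pair) for pair in sorted(pairs)}


def value_iteration(iters=2000):
    """Compute J* by repeatedly applying the Bellman operator."""
    J = [0.0, 0.0]
    for _ in range(iters):
        J = [min(g[a][x] + ALPHA * sum(P[a][x][y] * J[y] for y in range(2))
                 for a in range(2)) for x in range(2)]
    return J


reduced = sample_constraints(3, random.Random(0))
J_star = value_iteration()
for (x, a), (row, rhs) in reduced.items():
    slack = rhs - sum(w * j for w, j in zip(row, J_star))
    print((x, a), round(slack, 6))   # slack >= 0: J* is feasible for every sampled row
```

In a real instance the rows would be handed to an LP solver together with the box constraint on $r$; the box matters because a sampled constraint set alone may leave the LP unbounded.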
Remarks on Constraint Sampling
Sample complexity is
• polynomial in number of basis functions
• independent of dimensions of the state space
• linear on maximum number of actions per state |A|, but can do with log |A|
“ideal” distribution
“Bounding set” N
Intuition for constraint sampling
$$A_i r + b_i \ge 0, \quad i \in I, \quad r \in \mathbb{R}^p$$
well-approximated with poly(p) constraints
constraint characterized by vector $[A_i\ b_i] \in \mathbb{R}^{p+1}$
for feasibility, assume w.l.o.g. $\|[A_i\ b_i]\| = 1$
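A one-dimensional toy case ($p = 1$) illustrates the intuition. The sketch below is an assumption-laden illustration, not the setting of the theorem: constraints have the form $-r + u_i \ge 0$, the objective is to maximise $r$, sampling is uniform with replacement, and the normalisation step is skipped. The bounds $u_i$ are random numbers made up for the demo.

```python
import random


def solve_max_r(upper_bounds):
    """Maximise r subject to -r + u_i >= 0 for all i: the optimum is min(u_i)."""
    return min(upper_bounds)


def violated_fraction(r, upper_bounds):
    """Fraction of the full constraint set that r violates."""
    return sum(1 for u in upper_bounds if -r + u < 0) / len(upper_bounds)


rng = random.Random(0)
all_bounds = [rng.random() for _ in range(10_000)]        # 10,000 constraints on one variable
sampled = [rng.choice(all_bounds) for _ in range(100)]    # 100 i.i.d. draws

r_full = solve_max_r(all_bounds)
r_reduced = solve_max_r(sampled)

print(r_reduced >= r_full)                           # dropping constraints can only relax the LP
print(violated_fraction(r_reduced, all_bounds))      # share of original constraints now violated
```

The reduced solution is infeasible for the full problem in general, but only a small share of the original constraints is violated, even though 99% of them were never looked at.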
In Short...
Approximate dynamic programming: central ideas and issues
Approximate linear programming: analysis, performance and error bounds
• first approximation error bounds for arbitrary basis functions and decisions
• uniform bounds for multiclass queueing networks
Forthcoming:
analysis of case α ↑ 1
• Lyapunov function argument breaks down
• state-relevance weights c disappear
relaxation of Lyapunov function argument
new variant of approximate LP
improved error bounds
Future Work
Choice of state-relevance weights c
Address norm discrepancy between error bound and performance bound
Adaptive selection of basis functions
Online versions of the algorithm
Robustness to model uncertainty
Incremental solution of the LP
Learning the Q function instead of the value function
Issues on constraint sampling
Specific applications: how far can we push guarantees?