Dynamic Programming
Study Guide for ES205
Yu-Chi Ho
Jonathan T. Lee
Jan. 11, 2001
Outline
• Sample Problem
• General Formulation
• Linear-Quadratic Problem
• General Problems
Path-Cost Problem
Find the path with minimal cost.
[Figure: network of nodes with a numeric cost on each edge; the layout is not recoverable from this transcript.]
Principle of Optimality
“An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision.”
Path-Cost Problem (cont.)
[Figures: the same network repeated on three slides, with additional numeric labels added step by step; the assignment of labels to nodes is not recoverable from this transcript.]
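The principle of optimality suggests working backward from the terminal node, storing the best remaining cost at each node. The following is a minimal sketch of that idea in Python, under stated assumptions: the network, node names, and edge costs are hypothetical (they are not the values from the slide figure), the terminal cost is taken as zero, and the helper names `backward_sweep` and `optimal_path` are introduced only for illustration.

```python
# Hypothetical layered network (not the one from the slides).
# EDGES[node][successor] is the cost of that path segment.
EDGES = {
    "A": {"B1": 1, "B2": 5},
    "B1": {"C1": 3, "C2": 6},
    "B2": {"C1": 2, "C2": 1},
    "C1": {"Z": 4},
    "C2": {"Z": 3},
    "Z": {},  # terminal node
}
# Nodes grouped by stage, from the start to the terminal stage.
STAGES = [["A"], ["B1", "B2"], ["C1", "C2"], ["Z"]]


def backward_sweep(edges, stages):
    """Return (cost_to_go, best_next) computed from the last stage backward."""
    cost_to_go = {node: 0 for node in stages[-1]}  # terminal cost assumed zero
    best_next = {}
    for stage in reversed(stages[:-1]):
        for node in stage:
            # Segment cost plus the already-known optimal cost of the remainder.
            candidates = {
                succ: cost + cost_to_go[succ] for succ, cost in edges[node].items()
            }
            best_next[node] = min(candidates, key=candidates.get)
            cost_to_go[node] = candidates[best_next[node]]
    return cost_to_go, best_next


def optimal_path(start, best_next):
    """Follow the stored decisions forward from the start node."""
    path = [start]
    while path[-1] in best_next:
        path.append(best_next[path[-1]])
    return path


cost_to_go, best_next = backward_sweep(EDGES, STAGES)
print(cost_to_go["A"], optimal_path("A", best_next))
```

For this made-up network the sketch reports a minimal cost of 8 along A, B1, C1, Z. Each node is evaluated once, so the work grows with the number of edges rather than with the number of distinct paths.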
Formulation for Path-Cost Problem
$J_0^0(x(N)) \equiv$ terminal cost as a function of position
$J_1^0(x(N-1)) \equiv$ cost-to-go at stage $N-1$ (with 1 step left to go) as a function of position

Formulation (cont.)
$$J_1^0(x(N-1)) = \min_{\text{choice of path}} \left[ \text{cost of path segment} + J_0^0(x(N)) \right]$$
More generally,
$$J_i^0(x(N-i)) = \min_{\text{choice of path}} \left[ \text{cost of path segment} + J_{i-1}^0(x(N-i+1)) \right]$$
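General Formulation
Multistage optimization problem:
$$J = \min_{u(0),\dots,u(N-1)} \left[ \phi(x(N)) + \sum_{i=0}^{N-1} L(x(i), u(i), i) \right]$$
The cost-to-go
$$J_i^0(x(N-i)) = \min_{u(N-i)} \left[ L(x(N-i), u(N-i), N-i) + J_{i-1}^0(x(N-i+1)) \right]$$
with initial condition
$$J_0^0(x(N)) = \phi(x(N))$$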
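When the states and controls form finite sets, the cost-to-go recursion can be tabulated directly. The sketch below is one possible way to do that, not a definitive implementation. It assumes a known transition function `step(x, u, t)` that returns the next state (the slide does not spell out the dynamics), and it treats moves that leave the state grid as infeasible. The function name `solve_dp`, its parameters, and the example data are all hypothetical.

```python
def solve_dp(states, controls, step, stage_cost, terminal_cost, horizon):
    """Tabulate J[i][x], the optimal cost-to-go from state x with i steps left.

    Also returns policy[i][x], the minimizing control (policy[0] is unused).
    """
    J = [{x: terminal_cost(x) for x in states}]  # i = 0: terminal cost
    policy = [{}]
    for i in range(1, horizon + 1):
        t = horizon - i  # time index N - i
        J_i, mu_i = {}, {}
        for x in states:
            best_cost, best_u = float("inf"), None
            for u in controls:
                x_next = step(x, u, t)
                if x_next not in J[i - 1]:
                    continue  # assumed: moves that leave the grid are infeasible
                cost = stage_cost(x, u, t) + J[i - 1][x_next]
                if cost < best_cost:
                    best_cost, best_u = cost, u
            J_i[x], mu_i[x] = best_cost, best_u
        J.append(J_i)
        policy.append(mu_i)
    return J, policy


# Hypothetical example: integer position 0..5, move by -1, 0, or +1 per step,
# pay u^2 per move and 10 * (x - 3)^2 at the end.
J, policy = solve_dp(
    states=range(6),
    controls=(-1, 0, 1),
    step=lambda x, u, t: x + u,
    stage_cost=lambda x, u, t: u * u,
    terminal_cost=lambda x: 10 * (x - 3) ** 2,
    horizon=2,
)
print(J[2][0], policy[2][0])
```

Starting from position 0 with two steps left, the sketch finds a cost of 12 with first move +1: two unit moves cost 2, and ending at position 2 costs 10.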
Multistage or Optimal Control Problem
• Can be approached as a static optimization problem with specialized equality (staircase) constraints
• See study guides titled “Dynamic Systems”
• These two equivalent approaches will be made clear below in the solution of a specific class of problems
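To make the static view concrete, here is a sketch for a scalar problem with stage cost $L$. It assumes dynamics of the form $x(i+1) = F(x(i), u(i), i)$; the symbol $F$, the multipliers $\lambda(i)$, and the terminal-cost symbol $\phi$ are notation chosen here for illustration. Each equality constraint links only $x(i)$, $u(i)$, and $x(i+1)$, which produces the staircase pattern:

$$\min_{u(0),\dots,u(N-1)} \; \phi(x(N)) + \sum_{i=0}^{N-1} L(x(i), u(i), i) \quad \text{subject to} \quad x(i+1) - F(x(i), u(i), i) = 0, \quad i = 0, \dots, N-1$$

Adjoining each constraint with a multiplier $\lambda(i+1)$ gives

$$\bar{J} = \phi(x(N)) + \sum_{i=0}^{N-1} \Big[ L(x(i), u(i), i) + \lambda(i+1) \big( F(x(i), u(i), i) - x(i+1) \big) \Big]$$

and setting the derivatives with respect to $x(N)$, $x(i)$, and $u(i)$ to zero yields

$$\lambda(N) = \frac{\partial \phi}{\partial x(N)}, \qquad \lambda(i) = \frac{\partial L}{\partial x(i)} + \lambda(i+1) \frac{\partial F}{\partial x(i)}, \qquad 0 = \frac{\partial L}{\partial u(i)} + \lambda(i+1) \frac{\partial F}{\partial u(i)}$$

These are ordinary first-order conditions for a constrained static problem, solved stage by stage from $i = N$ backward.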
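Linear-Quadratic Problem
$$J = \min_{u(i)} \left[ \frac{1}{2} a(N)\, x(N)^2 + \sum_{i=0}^{N-1} \left( \frac{1}{2} a(i)\, x(i)^2 + \frac{1}{2} b(i)\, u(i)^2 \right) \right]$$
subject to linear system dynamics
$$x(i+1) = f(i)\, x(i) + g(i)\, u(i)$$
given the initial state $x(0)$, where
• $x(i)$ is the state variable at time $i$
• $u(i)$ is the control variable at time $i$
• $a(i)$ and $b(i)$ are the cost factors at time $i$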
LQ Problem (cont.)
$$J_0^0(x(N)) = \frac{1}{2} a(N)\, x(N)^2$$
$$J_1^0(x(N-1)) = \min_{u(N-1)} \left[ \frac{1}{2} a(N-1)\, x(N-1)^2 + \frac{1}{2} b(N-1)\, u(N-1)^2 + J_0^0(x(N)) \right]$$
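LQ Problem (cont.)
Substitute
$$x(N) = f(N-1)\, x(N-1) + g(N-1)\, u(N-1)$$
into $J_1^0(x(N-1))$. Set
$$\frac{\partial J_1^0(x(N-1))}{\partial u(N-1)} = 0$$
We get
$$u(N-1) = -\frac{a(N)\, f(N-1)\, g(N-1)\, x(N-1)}{b(N-1) + a(N)\, g(N-1)^2}$$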
LQ Problem (cont.)
With some work, we have
$$u(N-1) = -\frac{g(N-1)\, a(N)\, x(N)}{b(N-1)}$$
Let
$$\lambda(N) = a(N)\, x(N)$$
Then, we have
$$u(N-1) = -\frac{g(N-1)\, \lambda(N)}{b(N-1)}$$
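LQ Problem (cont.)
With
$$J_0^0(x(N)) = \frac{1}{2} \lambda(N)\, x(N)$$
$$J_1^0(x(N-1)) = \min_{u(N-1)} \left[ \frac{1}{2} a(N-1)\, x(N-1)^2 + \frac{1}{2} b(N-1)\, u(N-1)^2 + \frac{1}{2} \lambda(N)\, x(N) \right]$$
$$u(N-1) = -\frac{g(N-1)\, \lambda(N)}{b(N-1)}$$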
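LQ Problem (cont.)
Substitute the optimum $u(N-1)$, then we have
$$J_1^0(x(N-1)) = \frac{1}{2} \left[ a(N-1)\, x(N-1) + f(N-1)\, a(N)\, x(N) \right] x(N-1)$$
Define
$$\lambda(N-1) = f(N-1)\, \lambda(N) + a(N-1)\, x(N-1)$$
$$J_1^0(x(N-1)) = \frac{1}{2} \lambda(N-1)\, x(N-1)$$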
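LQ Problem (cont.)
$$J_2^0(x(N-2)) = \min_{u(N-2)} \left[ \frac{1}{2} a(N-2)\, x(N-2)^2 + \frac{1}{2} b(N-2)\, u(N-2)^2 + \frac{1}{2} \lambda(N-1)\, x(N-1) \right]$$
$$u(N-2) = -\frac{g(N-2)\, \lambda(N-1)}{b(N-2)}$$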
LQ Problem (cont.)
By induction, we have the optimal solution to be
$$u(i) = -\frac{g(i)\, \lambda(i+1)}{b(i)}$$
where
$$\lambda(i) = f(i)\, \lambda(i+1) + a(i)\, x(i)$$
with boundary condition
$$\lambda(N) = a(N)\, x(N)$$
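The recursion for λ(i) runs backward from λ(N), but it also involves the state x(i), which is only known by running the dynamics forward from x(0). One common way to decouple the two, sketched here as an assumption rather than as the study guide's own method, is to try λ(i) = s(i) x(i) for a scalar s(i). Substituting this into the equations above gives a backward recursion for s(i) and a feedback gain k(i) with u(i) = k(i) x(i). The names `s`, `k`, `lq_sweep`, and `simulate`, and all numeric values, are hypothetical.

```python
def lq_sweep(a, b, f, g):
    """Backward pass: s[N] = a[N], then s[i] and gain k[i] for i = N-1, ..., 0."""
    n = len(b)  # horizon N; a has N + 1 entries, b, f, g have N
    s = [0.0] * (n + 1)
    k = [0.0] * n
    s[n] = a[n]
    for i in range(n - 1, -1, -1):
        denom = b[i] + s[i + 1] * g[i] ** 2
        k[i] = -s[i + 1] * f[i] * g[i] / denom
        s[i] = a[i] + s[i + 1] * f[i] ** 2 * b[i] / denom
    return s, k


def simulate(x0, a, b, f, g, k):
    """Forward pass with u(i) = k[i] * x(i); returns states, controls, total cost."""
    n = len(b)
    x, u, cost = [x0], [], 0.0
    for i in range(n):
        u.append(k[i] * x[i])
        cost += 0.5 * a[i] * x[i] ** 2 + 0.5 * b[i] * u[i] ** 2
        x.append(f[i] * x[i] + g[i] * u[i])
    cost += 0.5 * a[n] * x[n] ** 2
    return x, u, cost


# Hypothetical data for N = 3.
a = [1.0, 1.0, 1.0, 2.0]
b = [0.5, 1.0, 2.0]
f = [1.0, 0.9, 1.1]
g = [1.0, 0.5, 1.0]

s, k = lq_sweep(a, b, f, g)
x, u, cost = simulate(2.0, a, b, f, g, k)
lam = [s[i] * x[i] for i in range(len(x))]  # costate along the trajectory

for i in range(len(b)):
    # Both conditions from the induction step should hold up to rounding.
    assert abs(u[i] + g[i] * lam[i + 1] / b[i]) < 1e-9
    assert abs(lam[i] - (f[i] * lam[i + 1] + a[i] * x[i])) < 1e-9
# Total cost should match (1/2) * lambda(0) * x(0).
assert abs(cost - 0.5 * s[0] * x[0] ** 2) < 1e-9
```

The assertions check that the simulated trajectory satisfies the control law and the λ recursion, and that the total cost equals one half of λ(0) x(0), the same pattern as the cost-to-go expressions on the earlier slides.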
General Problems
• Stochastic problems
• Combinatorial problems
• Variable termination time
• Constraints in the problem
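Stochastic Problem
$$J = \min_{u(0),\dots,u(N-1)} E\left[ \phi(x(N), \xi) + \sum_{i=0}^{N-1} L(x(i), u(i), \xi, i) \right]$$
The cost-to-go
$$J_i^0(x(N-i)) = \min_{u(N-i)} E\left[ L(x(N-i), u(N-i), \xi, N-i) + J_{i-1}^0(x(N-i+1)) \right]$$
with initial condition
$$J_0^0(x(N)) = E\left[ \phi(x(N), \xi) \right]$$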
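For a finite state grid and a disturbance with finitely many values, the expectation in the cost-to-go becomes a weighted sum. The sketch below illustrates this under several assumptions that are not in the slides: the disturbance `w` is independent from stage to stage, it enters the dynamics through a hypothetical `step(x, u, w, t)`, and the terminal cost is taken as deterministic for simplicity. All names and numbers are illustrative.

```python
def solve_stochastic_dp(states, controls, noise, step, stage_cost,
                        terminal_cost, horizon):
    """J[i][x] = minimal expected cost-to-go from state x with i steps left.

    noise is a list of (value, probability) pairs, assumed independent per stage.
    """
    J = [{x: terminal_cost(x) for x in states}]  # i = 0: terminal cost
    policy = [{}]
    for i in range(1, horizon + 1):
        t = horizon - i  # time index N - i
        J_i, mu_i = {}, {}
        for x in states:
            best_cost, best_u = float("inf"), None
            for u in controls:
                # Expectation over w of stage cost plus future cost-to-go.
                expected = sum(
                    p * (stage_cost(x, u, w, t) + J[i - 1][step(x, u, w, t)])
                    for w, p in noise
                )
                if expected < best_cost:
                    best_cost, best_u = expected, u
            J_i[x], mu_i[x] = best_cost, best_u
        J.append(J_i)
        policy.append(mu_i)
    return J, policy


def clipped_step(x, u, w, t):
    """Hypothetical dynamics: move by u, get pushed by w, stay within 0..4."""
    return min(max(x + u + w, 0), 4)


J, policy = solve_stochastic_dp(
    states=range(5),
    controls=(-1, 0, 1),
    noise=[(-1, 0.5), (1, 0.5)],
    step=clipped_step,
    stage_cost=lambda x, u, w, t: u * u,
    terminal_cost=lambda x: (x - 2) ** 2,
    horizon=2,
)
print(J[1][2], J[1][4], J[2][2])
```

The controls are chosen before the disturbance is seen, so the minimization sits outside the expectation, as in the cost-to-go equation above.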
Combinatorial Problem
$$\min_{x(1),\dots,x(N)} \sum_{i=1}^{N} L_i(x(i))$$
The cost-to-go
$$J_i^0(x(N-i)) = \min_{x(N-i+1)} \left[ L_{N-i}(x(N-i)) + J_{i-1}^0(x(N-i+1)) \right]$$
with initial condition
$$J_0^0(x(N)) = L_N(x(N))$$
References:
• Bellman, R., Dynamic Programming, Princeton University Press, 1957.
• Bryson, Jr., A. E. and Y.-C. Ho, Applied Optimal Control: Optimization, Estimation, and Control, Taylor & Francis, 1975.
• Dreyfus, S. E. and A. M. Law, The Art and Theory of Dynamic Programming, Academic Press, 1977.
• Ho, Y.-C., Lecture Notes, Harvard University, 1997.
• National Institute of Standards and Technology, Dictionary of Algorithms, Data Structures, and Problems, http://hissa.nist.gov/dads/HTML/principle.html
• Ortega, A. and K. Ramchandran, “Rate-Distortion Methods for Image and Video Compression: An Overview,” IEEE Signal Processing Magazine, Nov. 1998. http://sipi.usc.edu/~ortega/RD_Examples/boxDP.html