A SIMPLE INTRODUCTION TO DYNAMIC PROGRAMMING


PROBLEM SET-UP

Problem is arrayed as a set of decisions made over time.
System has a discrete state.
Each decision results in some reward or cost, and results in the system being moved to another state.
Usually has a finite number of transitions.
Transitions can be probabilistic, as can the rewards.
Solution is a decision strategy that maximizes summed reward (minimizes cost).


NOTATION

N = finite planning horizon.
S_n(x) = cost of optimally operating from n to N given state x at time n.
d_n*(x) is the optimal policy at stage n given state x at time n.
x(d_n) is the state resulting from deciding d at stage n.
c(d_n) is the cost of taking decision d_n.


EXAMPLE

You have moved to Singapore, and you need to operate a car for 3 years.

You plan to sell the car when you leave.

Your quality of life (QOL) is not affected by your wheels.

Purchase/resale prices and annual operating costs, by age of the car, are below.

car age (yr)   0      1      2      3
sale price     1000   800    450    150
op cost        200    400    600


MAPPING TO THE NOTATION

State: Age of your car
Stage: Years you have been in Singapore
Policy: Age of the car you buy at the END of the year


COST EXAMPLE

You have a 2 yr old car.
You operate it for the year ($600).
You sell your 3 yr old car (-$150).
You buy a new (to you) 1 yr old used car ($800).
TOTAL: $1250


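Cost of one year, by car age at the start of the year (rows) and car age at the finish (columns):

start \ finish   0      1      2      3
0                400    200
1                950    750    400
2                1450   1250   900    600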
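The entries of the one-year cost table can be reproduced from the price and operating-cost data. The sketch below is one reading of the slides, not the author's spreadsheet: the names `SALE_PRICE`, `OP_COST` and `transition_cost` are made up for illustration, and it assumes the blank cells mean "not allowed" (you never end the year with a car older than the one you would have by simply keeping yours).

```python
# Data from the example: prices by car age 0..3, operating cost by car age 0..2.
SALE_PRICE = {0: 1000, 1: 800, 2: 450, 3: 150}
OP_COST = {0: 200, 1: 400, 2: 600}


def transition_cost(start_age, finish_age):
    """One-year cost of starting with a car of start_age and ending with finish_age.

    Returns None for combinations left blank in the table (assumption:
    those moves are simply not considered).
    """
    aged = start_age + 1  # the car you hold after operating for the year
    if finish_age > aged:
        return None
    if finish_age == aged:
        return OP_COST[start_age]  # keep the car: operating cost only
    # operate, sell the aged car, buy a car of finish_age
    return OP_COST[start_age] - SALE_PRICE[aged] + SALE_PRICE[finish_age]


table = {start: [transition_cost(start, finish) for finish in range(4)]
         for start in range(3)}

for start, row in table.items():
    print(start, row)
```

The row for a 2 yr old car contains the $1250 from the cost example: operate (600), sell at 3 yr old (-150), buy a 1 yr old car (800).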


car age   "cost" at end of yr 3
0         -1000
1         -800
2         -450
3         -150


CONTINUED COST EXAMPLE

It's the beginning of yr 2, and you possess a 2 yr old car. You can....

operate the car (600 + S3(3 yr old car))

operate the car, sell it, buy new car (600 - 150 + 1000 + S3(new))

operate the car, sell it, buy 1 yr old car (600 - 150 + 800 + S3(1 yr old car))

...


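Cost-to-go by car age (rows) at the beginning of each year (columns):

car age   yr 1    yr 2    yr 3    "cost" end of yr 3
0         1200    -200    -600    -1000
1         1550    350     -50     -800
2         1700    850     450     -450
3                                 -150

[Slide annotations: the one-year costs 1450, 1250 and 900 are marked on the table.]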
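The cost-to-go table can be rebuilt by working backward from the end-of-year-3 column. This is a minimal sketch under stated assumptions rather than the author's own calculation: the function names are hypothetical, a move is considered only if the resulting car age has an entry in the next column, and the yr 1 column is taken to include the purchase price of the first car (that reading reproduces the 1200, 1550, 1700 shown).

```python
SALE_PRICE = [1000, 800, 450, 150]  # by car age 0..3
OP_COST = [200, 400, 600]           # by car age 0..2
YEARS = 3


def step_cost(start, finish):
    """One-year cost of going from a car of age `start` to one of age `finish`."""
    aged = start + 1
    if finish == aged:
        return OP_COST[start]  # keep the car
    return OP_COST[start] - SALE_PRICE[aged] + SALE_PRICE[finish]


def solve():
    # Terminal column: selling the car at the end of yr 3 is a negative cost.
    S = {YEARS + 1: {age: -SALE_PRICE[age] for age in range(4)}}
    policy = {}
    for n in range(YEARS, 0, -1):  # yr 3, then yr 2, then yr 1
        S[n], policy[n] = {}, {}
        for start in range(len(OP_COST)):
            # Assumption: only finish ages that exist in the next column are allowed.
            options = {finish: step_cost(start, finish) + S[n + 1][finish]
                       for finish in range(start + 2) if finish in S[n + 1]}
            best = min(options, key=options.get)
            S[n][start], policy[n][start] = options[best], best
    return S, policy


S, policy = solve()
# Assumption: the yr 1 column of the slide adds the price of the first car.
total_from_start = [SALE_PRICE[age] + S[1][age] for age in range(3)]
print(S[3], S[2], total_from_start)
```

Under these assumptions the cheapest plan starts with a new car, for a total of 1200.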



BELLMAN’S EQUATION

$$S_n(x) = \min_{d} \left\{ c(d) + S_{n+1}\big(x(d)\big) \right\}$$

Sometimes it's easy to get your name on something!
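Bellman's equation translates almost directly into a backward loop over stages. The function below is a generic sketch, not something taken from the slides: `backward_induction` and its callback arguments (`decisions`, `cost`, `next_state`, `terminal`) are hypothetical names, and the two-state problem used to exercise it is invented purely for illustration.

```python
def backward_induction(horizon, states, decisions, cost, next_state, terminal):
    """Compute S[n][x] = min over d of cost(n, x, d) + S[n+1][next_state(n, x, d)]."""
    S = {horizon: {x: terminal(x) for x in states}}
    best = {}
    for n in range(horizon - 1, -1, -1):  # last stage first
        S[n], best[n] = {}, {}
        for x in states:
            totals = {d: cost(n, x, d) + S[n + 1][next_state(n, x, d)]
                      for d in decisions(n, x)}
            best[n][x] = min(totals, key=totals.get)
            S[n][x] = totals[best[n][x]]
    return S, best


# Invented toy problem: two states, the decision is the state to move to,
# switching costs 1, and finishing in state 0 costs 5.
states = [0, 1]
S, best = backward_induction(
    horizon=2,
    states=states,
    decisions=lambda n, x: states,
    cost=lambda n, x, d: 0 if d == x else 1,
    next_state=lambda n, x, d: d,
    terminal=lambda x: 0 if x == 1 else 5,
)
print(S[0], best)
```

For a maximization problem (rewards instead of costs), the same loop applies with `min` replaced by `max`.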


EXEMPLAR

A specialized tool is available during the period 9am, ..., 3pm.
Each hour, a bid for the asset is made according to the table below.
The asset is busy for 3 hr. if the bid is accepted.

hour   9     10    11    12    1     2     3
bid    100   150   160   50    175   40    10
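One way to write the hourly accept/reject decision in the notation used earlier is sketched below. The symbols are assumptions introduced here: $b_n$ is the bid made at hour $n$, $S_n$ is the best total revenue from hour $n$ onward when the tool is free, and "busy for 3 hr." is read as "next available at hour $n+3$".

```latex
S_n = \max\left\{\, b_n + S_{n+3},\; S_{n+1} \,\right\},
\qquad S_n = 0 \ \text{for every hour } n \text{ after 3pm}
```

The first term is the value of accepting the bid, the second the value of rejecting it and waiting for the next hour.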

[Diagram, built up over eight slides: a network with nodes labeled by hour and an "end" node, joined by seven arcs labeled 0 and seven arcs labeled with the bids (100, 150, 160, 50, 175, 40, 10). Node values are added one per slide, in the order 10, 40, 175, 175, 175, 325, 325.]

Note 1: Once the diagram is drawn, the problem can be solved by a shortest (longest) path algorithm.

Note 2: Dynamic Programming = Shortest Path
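To make the connection concrete, the tool exemplar can be set up as a longest-path problem on a small network. This is a sketch under assumptions that the slides do not spell out: each hour is a node, rejecting a bid is an arc of length 0 to the next hour, and accepting a bid is an arc of length equal to the bid that jumps 3 hours ahead (or to the end node). The names `build_arcs` and `longest_path` are hypothetical.

```python
HOURS = ["9am", "10am", "11am", "12pm", "1pm", "2pm", "3pm"]
BIDS = [100, 150, 160, 50, 175, 40, 10]
BUSY = 3  # hours the tool is tied up once a bid is accepted


def build_arcs():
    """Return arcs[t] = list of (next_node, length, accepted) for each hour t."""
    end = len(BIDS)
    arcs = {t: [] for t in range(end)}
    for t, bid in enumerate(BIDS):
        arcs[t].append((t + 1, 0, False))                # reject the bid
        arcs[t].append((min(t + BUSY, end), bid, True))  # accept the bid
    return arcs


def longest_path(arcs):
    """Longest path from the first hour to the end node, by a backward pass."""
    end = len(arcs)
    value = {end: 0}
    choice = {}
    for t in range(end - 1, -1, -1):  # nodes are already in topological order
        choice[t] = max(arcs[t], key=lambda arc: arc[1] + value[arc[0]])
        value[t] = choice[t][1] + value[choice[t][0]]
    accepted, t = [], 0
    while t != end:  # walk the chosen arcs forward to recover the plan
        nxt, _, took_bid = choice[t]
        if took_bid:
            accepted.append(HOURS[t])
        t = nxt
    return value[0], accepted


best_value, accepted_hours = longest_path(build_arcs())
print(best_value, accepted_hours)
```

With these assumptions the best total is 325, from accepting the 10am and 1pm bids; the backward pass is exactly the dynamic programming recursion, applied to the network's nodes in reverse order.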


PROBABILISTIC TRANSITIONS

$$E\,S_n(x) = \min_{d} \left\{ E\,c(d) + E\,S_{n+1}\big(x(d)\big) \right\}$$

1. c(d) is a random variable
2. x(d) is random
3. the “trial” takes place after the decision


EXEMPLAR (Probabilistic)

An “asset” is available during the period 8pm, 9pm, ..., 3am.
Each hour, a bid for the asset is made according to the discrete probability distribution below.
The asset is busy for 3 hr. if the bid is accepted.

bid ($)       3     6     9
probability   0.1   0.6   0.3


MANY APPROACHES TO FORMULATION

N = 4am
S_n(x) = profit of optimally operating from n to N given state x at time n.
d_n*(x) is the optimal policy at stage n given state x at time n (ACCEPT, REJECT).
c(d_n) is the profit of taking decision d_n.

x(d_n) is the proposed bid (3, 6, 9) or the number of hours left in the remaining engagement (1 hr, 2 hr).


RECURSION

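$$S_{11}(3) = \max\left\{\, 3 + S_{12}(2\,\mathrm{hr}),\;\; 0.1\,S_{12}(3) + 0.6\,S_{12}(6) + 0.3\,S_{12}(9) \,\right\}$$

The subscript is the time; the argument "2 hr" is the number of hours before the asset is available again.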

See DP Example.xls
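The spreadsheet itself is not reproduced here, but the recursion can be sketched in a few lines. Everything in the block is an illustrative assumption rather than the contents of the .xls file: hours are numbered 20 (8pm) through 27 (3am) with N = 28 (4am), the bid probabilities are taken as 0.1, 0.6, 0.3 as in the recursion, profit after N is zero, a bid accepted late still pays in full, and the names `BID_PROBS` and `solve` are made up.

```python
BID_PROBS = {3: 0.1, 6: 0.6, 9: 0.3}  # bid value -> probability
FIRST, LAST = 20, 27                  # 8pm .. 3am on a 24h+ clock (assumption)


def solve():
    """Return S[n][state] and accept[n, bid] for the probabilistic exemplar.

    States are a proposed bid (3, 6, 9) or the hours left in the current
    engagement ("1hr", "2hr"), following the formulation in the slides.
    """
    horizon = LAST + 1  # N = 4am
    S = {horizon: {state: 0.0 for state in (*BID_PROBS, "1hr", "2hr")}}
    accept = {}

    def expected_free(n):
        # Expected profit from time n on, before the bid at time n is revealed.
        return sum(p * S[n][bid] for bid, p in BID_PROBS.items())

    for n in range(LAST, FIRST - 1, -1):
        S[n] = {"1hr": expected_free(n + 1),  # free again next hour
                "2hr": S[n + 1]["1hr"]}       # still busy next hour
        reject = expected_free(n + 1)
        for bid in BID_PROBS:
            take = bid + S[n + 1]["2hr"]
            accept[n, bid] = take >= reject
            S[n][bid] = max(take, reject)
    return S, accept


S, accept = solve()
for n in range(FIRST, LAST + 1):
    print(n, {bid: accept[n, bid] for bid in BID_PROBS})
```

In this numbering the slide's S_11(3) is `S[23][3]`, and it is computed exactly as the maximum of `3 + S[24]["2hr"]` and the probability-weighted average of `S[24][3]`, `S[24][6]`, `S[24][9]`.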


UNLOCKING THE JARGON

x(d) can be governed by a Markov Chain

a different Pi,j matrix for each decision d

Result is a Markov Decision Process

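$$E\big(S(i)\big) = \min_{d} \left\{ E\,c(i) + \sum_{j} P_{i,j}(d)\, E\,S(j) \right\}$$

$$E\big(S_n(i)\big) = \min_{d} \left\{ E\,c(i) + \sum_{j} P_{i,j}(d)\, E\,S_{n+1}(j) \right\}$$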
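The finite-horizon form of this recursion can be sketched with plain lists and dictionaries. The block is illustrative only: the function name `mdp_backward_induction`, the two-state "working/broken" machine, its costs and its transition matrices are all invented, and the expected cost is allowed to depend on the decision as well as the state, which is an assumption beyond what the equation shows.

```python
def mdp_backward_induction(costs, P, horizon, terminal):
    """Finite-horizon Markov Decision Process solved by backward induction.

    costs[d][i]  : expected one-step cost in state i under decision d
    P[d][i][j]   : probability of moving from state i to state j under decision d
    terminal[i]  : cost assigned to ending in state i
    """
    n_states = len(terminal)
    S = [None] * horizon + [list(terminal)]
    policy = [None] * horizon
    for n in range(horizon - 1, -1, -1):
        S[n], policy[n] = [], []
        for i in range(n_states):
            totals = {d: costs[d][i]
                      + sum(P[d][i][j] * S[n + 1][j] for j in range(n_states))
                      for d in P}
            best = min(totals, key=totals.get)
            policy[n].append(best)
            S[n].append(totals[best])
    return S, policy


# Invented example: state 0 = working, state 1 = broken.
costs = {"wait": [0, 10], "repair": [4, 6]}
P = {
    "wait": [[0.8, 0.2], [0.0, 1.0]],    # a working machine may break down
    "repair": [[1.0, 0.0], [1.0, 0.0]],  # repair always ends in "working"
}
S, policy = mdp_backward_induction(costs, P, horizon=2, terminal=[0, 0])
print(S[0], policy[0])
```

Each decision carries its own transition matrix, which is the "different Pi,j matrix for each decision d" mentioned above.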