A SIMPLE INTRODUCTION TO DYNAMIC PROGRAMMING

PROBLEM SET-UP

The problem is arrayed as a set of decisions made over time.

The system has a discrete state.

Each decision results in some reward or cost and moves the system to another state.

There is usually a finite number of transitions.

Transitions can be probabilistic, as can the rewards.

The solution is a decision strategy that maximizes summed reward (or minimizes summed cost).

NOTATION

N = finite planning horizon.

S_n(x) = cost of operating optimally from stage n to N, given state x at time n.

d_n*(x) = the optimal policy at stage n, given state x at time n.

x(d_n) = the state resulting from taking decision d_n at stage n.

c(d_n) = the cost of taking decision d_n.

EXAMPLE

You have moved to Singapore, and you need to operate a car for 3 yrs.

You plan to sell the car when you leave

Your QOL is not affected by your wheels

Purchase/resale prices and annual operating costs, by car age, are below.

car age (yr)   0      1      2      3
sale price     1000   800    450    150
op cost        200    400    600

MAPPING TO THE NOTATION

State: age of your car

Stage: years you have been in Singapore

Policy: age of the car you buy at the END of the year

COST EXAMPLE

You have a 2-yr-old car.

You operate it for the year ($600).

You sell your 3-yr-old car (-$150).

You buy a new (to you) 1-yr-old used car ($800).

TOTAL: $1250

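One-year cost, by car age at the start of the year (rows) and at the finish (columns):

start \ finish   0      1      2      3
0                400    200
1                950    750    400
2                1450   1250   900    600

"Cost" at the end of yr 3 (the resale value, entered as a negative cost):

car age   "cost" end of yr 3
0         -1000
1         -800
2         -450
3         -150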
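The one-year cost table can be rebuilt from the price and operating-cost table. The sketch below is one way to do that, under two assumptions read off the cost example rather than stated outright: a car of a given age is bought at the same price it sells for, and the blank cells mean you never buy a car older than the one you would otherwise keep. The function name `transition_cost` is illustrative.

```python
# Prices and operating costs from the example tables.
SALE_PRICE = {0: 1000, 1: 800, 2: 450, 3: 150}   # buy/sell price by car age
OP_COST = {0: 200, 1: 400, 2: 600}               # cost to run a car of this age for a year


def transition_cost(start_age, finish_age):
    """Cost of one year: operate, then optionally sell and buy a replacement.

    Returns None for the cells the table leaves blank (assumed infeasible).
    """
    aged = start_age + 1                  # the car is one year older after operating
    if start_age not in OP_COST or finish_age > aged:
        return None
    if finish_age == aged:                # keep the car: operating cost only
        return OP_COST[start_age]
    # Operate, sell the aged car, buy a car of age `finish_age`.
    return OP_COST[start_age] - SALE_PRICE[aged] + SALE_PRICE[finish_age]


for start in (0, 1, 2):
    print(start, [transition_cost(start, finish) for finish in range(4)])
```

The call `transition_cost(2, 1)` reproduces the $1250 total of the cost example: 600 to operate, minus 150 for the sale, plus 800 for the replacement.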

CONTINUED COST EXAMPLE

It's the beginning of yr 2, and you possess a 2-yr-old car. You can...

operate the car (600 + S_3(3-yr-old car))

operate the car, sell it, buy a new car (600 - 150 + 1000 + S_3(new car))

operate the car, sell it, buy a 1-yr-old car (600 - 150 + 800 + S_3(1-yr-old car))

...

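Cost-to-go, by car age (rows) and year (columns):

car age   yr 1    yr 2    yr 3    "cost" end of yr 3
0         1200    -200    -600    -1000
1         1550    350     -50     -800
2         1700    850     450     -450
3                                 -150

For the 2-yr-old car at the start of yr 2, the one-year costs of finishing the year with a car of age 0, 1, or 2 are 1450, 1250, and 900.

The yr 1 column appears to include the purchase price of the first car: for a new car, 1200 = 1000 (purchase) + 200 (cost-to-go from yr 1).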
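The table can be reproduced by working backward from the end of yr 3. The sketch below is a minimal version of that recursion, not necessarily how the original numbers were produced. It assumes that a replacement car costs its resale price, that you never buy a car older than the one you hold, and that a car cannot be operated at age 3 because no operating cost is given for that age. The names `year_options` and `solve` are illustrative.

```python
SALE_PRICE = {0: 1000, 1: 800, 2: 450, 3: 150}   # buy/sell price by car age
OP_COST = {0: 200, 1: 400, 2: 600}               # annual operating cost by car age
YEARS = 3


def year_options(age):
    """Yield (cost, next_age) for each decision open to a car of `age`."""
    aged = age + 1
    yield OP_COST[age], aged                      # operate and keep the car
    for new_age in range(aged):                   # operate, sell, buy a younger car
        yield OP_COST[age] - SALE_PRICE[aged] + SALE_PRICE[new_age], new_age


def solve():
    """Return S[n][age], the minimum cost from the start of year n onward."""
    # Terminal "cost": the resale value at the end of yr 3, as a negative cost.
    S = {YEARS + 1: {age: -price for age, price in SALE_PRICE.items()}}
    for n in range(YEARS, 0, -1):                 # years 3, 2, 1
        S[n] = {}
        for age in OP_COST:
            # Skip next states with no value (an age-3 car that must still be run).
            S[n][age] = min(cost + S[n + 1][nxt]
                            for cost, nxt in year_options(age)
                            if nxt in S[n + 1])
    return S


S = solve()
# Adding the initial purchase price gives the total cost of each starting choice.
total = {age: SALE_PRICE[age] + S[1][age] for age in OP_COST}
print(S[2], S[3], total)
```

Under these assumptions the cheapest plan starts with a new car, at a total cost of 1200.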

BELLMAN’S EQUATION

$$S_n(x) = \min_{d_n} \{\, c(d_n) + S_{n+1}(x(d_n)) \,\}$$

Sometimes it's easy to get your name on something!

EXEMPLAR

A specialized tool is available during the period 9am, ..., 3pm.

Each hour, a bid for the asset is made according to the table below.

The asset is busy for 3 hr if the bid is accepted.

hour   9am   10am   11am   12pm   1pm   2pm   3pm
bid    100   150    160    50     175   40    10

[Figure, built up over eight slides: a network diagram with hour nodes (labels 9, 10, 11, 12, 1, 2 are legible) and an "end" node. Arcs for rejecting a bid are labeled 0; arcs for accepting a bid are labeled with the bid (100, 150, 160, 50, 175, 40, 10). Node values are added one slide at a time, in the order 10, 40, 175, 175, 175, 325, 325.]

Note 1: Once the diagram is drawn, the problem can be solved by a shortest (longest) path algorithm.

Note 2: Dynamic Programming = Shortest Path
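To illustrate the two notes, the sketch below treats the bid problem as a longest-path problem on the accept/reject network and solves it by backward recursion. It rests on assumptions the slides do not spell out: accepting a bid at hour t frees the tool at hour t + 3, an engagement may run past the last bidding hour, and a terminal node `END` sits one hour after the last bid. Under this reading the optimum is 325, which matches the last value added to the diagram. Intermediate node values depend on the assumed arc structure and may differ from those drawn on the slides.

```python
# Bids by hour on a 24-hour clock (9 = 9am, ..., 15 = 3pm), from the table.
BIDS = {9: 100, 10: 150, 11: 160, 12: 50, 13: 175, 14: 40, 15: 10}
BUSY_HOURS = 3   # an accepted bid ties up the tool for this many hours
END = 16         # assumed terminal node, one hour after the last bid


def best_schedule(bids=BIDS, busy=BUSY_HOURS, end=END):
    """Longest path through the accept/reject network, by backward recursion."""
    value = {end: 0}      # best revenue obtainable from each node onward
    accept_here = {}      # True where accepting is at least as good as rejecting
    for hour in sorted(bids, reverse=True):
        reject = value[hour + 1]                              # arc of length 0
        accept = bids[hour] + value[min(hour + busy, end)]    # arc of length = bid
        value[hour] = max(accept, reject)
        accept_here[hour] = accept >= reject

    # Walk forward along the chosen arcs to list the accepted hours.
    accepted, hour = [], min(bids)
    while hour < end:
        if accept_here[hour]:
            accepted.append(hour)
            hour = min(hour + busy, end)
        else:
            hour += 1
    return value[min(bids)], accepted


total, accepted = best_schedule()
print(total, accepted)
```

Because every arc points forward in time, a single backward pass over the hours is enough. No general-purpose shortest-path routine is needed.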

PROBABILISTIC TRANSITIONS

$$E S_n(x) = \min_{d_n} \{\, E c(d_n) + E S_{n+1}(x(d_n)) \,\}$$

1. c(d) is a random variable.

2. x(d) is random.

3. The “trial” takes place after the decision.

EXEMPLAR (Probabilistic)

An “asset” is available during the period 8pm, 9pm, ..., 3am.

Each hour, a bid for the asset is made according to the discrete probability density below.

The asset is busy for 3 hr if the bid is accepted.

bid ($1)      3     6     9
probability   0.1   0.6   0.3

MANY APPROACHES TO FORMULATION

N = 4am.

S_n(x) = profit of operating optimally from n to N, given state x at time n.

d_n*(x) = the optimal policy at stage n, given state x at time n (ACCEPT, REJECT).

c(d_n) = the profit of taking decision d_n.

x(d_n) = the proposed bid (3, 6, 9) or the number of hours left in the remaining engagement (1hr, 2hr).

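RECURSION

$$S_{11}(3) = \max\{\, 3 + S_{12}(2\,\mathrm{hr}),\; 0.1\,S_{12}(3) + 0.6\,S_{12}(6) + 0.3\,S_{12}(9) \,\}$$

The subscript is the time; the argument 2hr is the number of hours before the asset is available again.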

See DP Example.xls
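For readers without the spreadsheet, the recursion can be sketched in a few lines. This is an independent illustration, not a transcription of DP Example.xls. It assumes the weights 0.1, 0.6, and 0.3 used in the recursion for bids of 3, 6, and 9, a horizon of 4am, and that a bid accepted late may still be collected even if the engagement runs past the horizon. Instead of tracking the busy states (2hr, 1hr) explicitly, it jumps straight to the hour at which the asset is free again, which gives the same values. The names `free_value` and `bid_value` are illustrative.

```python
from functools import lru_cache

BID_PROB = {3: 0.1, 6: 0.6, 9: 0.3}   # bid -> probability (weights from the recursion)
HORIZON = 28                          # 4am, counting hours from midnight of the first day
BUSY_HOURS = 3                        # 8pm is hour 20, 11pm is hour 23, 3am is hour 27


@lru_cache(maxsize=None)
def free_value(hour):
    """Expected profit from `hour` on when the asset is free and the bid is not yet seen."""
    if hour >= HORIZON:
        return 0.0
    return sum(p * bid_value(hour, bid) for bid, p in BID_PROB.items())


@lru_cache(maxsize=None)
def bid_value(hour, bid):
    """S_hour(bid): best expected profit given the bid on the table at `hour`."""
    accept = bid + free_value(hour + BUSY_HOURS)   # busy until hour + 3
    reject = free_value(hour + 1)                  # wait for the next bid
    return max(accept, reject)


print(bid_value(23, 3))   # S_11(3), the case written out in the recursion
print(free_value(20))     # expected profit at 8pm, before the first bid
```

Under these assumptions, a bid of 3 at 11pm is rejected: waiting for the midnight bid has a higher expected value than accepting.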

UNLOCKING THE JARGON

x(d) can be governed by a Markov chain,

with a different P_{i,j} matrix for each decision d.

The result is a Markov Decision Process.

$$E(S(i)) = \min_{d} \Big\{ E c(i) + \sum_{j} P_{i,j}(d)\, E S(j) \Big\}$$

$$E(S_n(i)) = \min_{d_n} \Big\{ E c(i) + \sum_{j} P_{i,j}(d_n)\, E S_{n+1}(j) \Big\}$$
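The finite-horizon form of this recursion can be sketched as backward induction over a small Markov Decision Process. Everything in the example is hypothetical and chosen only for illustration: the two states, the decisions "wait" and "repair", the costs, and the transition matrices. As a further assumption, the expected one-step cost here is allowed to depend on the decision as well as the state.

```python
def backward_induction(cost, P, horizon, terminal):
    """Finite-horizon MDP by backward induction.

    cost[d][i]  expected one-step cost in state i under decision d
    P[d][i][j]  probability of moving from state i to state j under decision d
    terminal[j] value of ending in state j
    Returns the stage-1 values and the optimal decision per stage and state.
    """
    states = range(len(terminal))
    S = list(terminal)
    policy = []
    for _ in range(horizon):                      # stages N, N-1, ..., 1
        new_S, stage_policy = [], []
        for i in states:
            # Expected cost of each decision: cost now plus expected cost-to-go.
            q = {d: cost[d][i] + sum(P[d][i][j] * S[j] for j in states) for d in P}
            best = min(q, key=q.get)
            new_S.append(q[best])
            stage_policy.append(best)
        S = new_S
        policy.insert(0, stage_policy)            # keep stages in forward order
    return S, policy


# Hypothetical data: state 0 = working, state 1 = broken.
cost = {"wait": [0.0, 5.0], "repair": [2.0, 2.0]}
P = {
    "wait": [[0.8, 0.2], [0.0, 1.0]],             # a working machine may break
    "repair": [[1.0, 0.0], [1.0, 0.0]],           # repair always restores it
}
values, policy = backward_induction(cost, P, horizon=2, terminal=[0.0, 0.0])
print(values, policy)
```

Each decision selects its own row of transition probabilities, which is the "different P_{i,j} matrix for each decision d" described above.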
