Chapter 19 Solutions (Ch19)

8/10/2019

CHAPTER 19: MARKOV DECISION PROCESSES

19.2-1.

Bank One, one of the major credit card issuers in the United States, has developed the portfolio control and optimization (PORTICO) system to manage APR and credit-line changes of its card holders. Customers prefer a low APR and high credit lines, which can reduce the bank's profitability and increase the risk. Consequently, the bank faces the need to find a balance between revenue growth and risk. PORTICO formulates the problem as a Markov decision process. The state variables are chosen to satisfy the Markovian assumption as closely as possible while keeping the dimension of the state space at a tractable level. The resulting variables are \((x, y)\), where \(x\) corresponds to the credit line and APR level and \(y\) represents the behavior variables. The transition probabilities are estimated from the available data. The objective is to maximize the expected net present value of the cash flows over a 36-month horizon. The dynamic programming equation for the decision periods of the problem is

Z B C >"+EBC 4max , !" fwhere denotes the immediate net cash flow and is the discount factor. The



19.2-2.

(a) Let the states \(i = 0, 1, 2\) be the number of customers at the facility. There are two possible actions when the facility has one or two customers. Let decision 1 be to use the slow configuration and decision 2 be to use the fast configuration. Also let \(C_{ij}\) denote the expected net immediate cost of using decision \(j\) in state \(i\). Then,

\[ C_{11} = C_{21} = 3 - 50\left(\tfrac{3}{5}\right) = -27 \]

\[ C_{12} = C_{22} = 9 - 50\left(\tfrac{4}{5}\right) = -31 \]

\[ C_{01} = 3, \qquad C_{02} = 9 \]

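(b) In state 0, the configuration chosen does not affect the transition probabilities, so it is best to choose the slow configuration when there are no customers in line. Consequently, the number of stationary policies is four.

i    d_i(R_1)   d_i(R_2)   d_i(R_3)   d_i(R_4)
0    1          1          1          1
1    1          1          2          2
2    1          2          1          2

Policy, transition matrix, and expected average cost:

\[ R_1:\quad P = \begin{pmatrix} 1/2 & 1/2 & 0 \\ 3/10 & 1/2 & 1/5 \\ 0 & 3/5 & 2/5 \end{pmatrix}, \qquad C_1 = 3\pi_0 - 27\pi_1 - 27\pi_2 \]

\[ R_2:\quad P = \begin{pmatrix} 1/2 & 1/2 & 0 \\ 3/10 & 1/2 & 1/5 \\ 0 & 4/5 & 1/5 \end{pmatrix}, \qquad C_2 = 3\pi_0 - 27\pi_1 - 31\pi_2 \]

\[ R_3:\quad P = \begin{pmatrix} 1/2 & 1/2 & 0 \\ 2/5 & 1/2 & 1/10 \\ 0 & 3/5 & 2/5 \end{pmatrix}, \qquad C_3 = 3\pi_0 - 31\pi_1 - 27\pi_2 \]

\[ R_4:\quad P = \begin{pmatrix} 1/2 & 1/2 & 0 \\ 2/5 & 1/2 & 1/10 \\ 0 & 4/5 & 1/5 \end{pmatrix}, \qquad C_4 = 3\pi_0 - 31\pi_1 - 31\pi_2 \]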

(c)

Policy   π_0      π_1      π_2      Average cost
R_1      0.3103   0.5172   0.1724   C_1 = -17.69
R_2      0.3243   0.5405   0.1351   C_2 = -17.81
R_3      0.4068   0.5085   0.0847   C_3 = -16.83
R_4      0.416    0.519    0.065    C_4 = -16.87

\(C_2\) is the minimum, so the optimal policy is \(R_2\), i.e., to use the slow configuration when no customer or only one customer is present and the fast configuration when there are two customers.
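The enumeration in parts (b) and (c) can be checked mechanically. The following is a minimal sketch, not the manual's own method: it assumes the transition rows and costs reconstructed above, and the names `stationary`, `ROWS`, `COST`, `POLICIES` and `average_cost` are illustrative. It solves the steady-state equations exactly with fractions and evaluates each policy's expected average cost.

```python
from fractions import Fraction as F

def stationary(P):
    """Solve pi = pi P with sum(pi) = 1 by exact Gauss-Jordan elimination."""
    n = len(P)
    # Balance equations for states 0..n-2, then the normalization row.
    A = [[P[i][j] - (i == j) for i in range(n)] + [F(0)] for j in range(n - 1)]
    A.append([F(1)] * n + [F(1)])
    for col in range(n):
        piv = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[piv] = A[piv], A[col]
        A[col] = [x / A[col][col] for x in A[col]]
        for r in range(n):
            if r != col:
                f = A[r][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
    return [A[i][n] for i in range(n)]

# Transition rows and immediate costs keyed by (state, decision); assumed
# to match the matrices and C_ij values listed in parts (a) and (b).
ROWS = {
    (0, 1): [F(1, 2), F(1, 2), F(0)],
    (1, 1): [F(3, 10), F(1, 2), F(1, 5)],
    (1, 2): [F(2, 5), F(1, 2), F(1, 10)],
    (2, 1): [F(0), F(3, 5), F(2, 5)],
    (2, 2): [F(0), F(4, 5), F(1, 5)],
}
COST = {(0, 1): 3, (1, 1): -27, (1, 2): -31, (2, 1): -27, (2, 2): -31}
POLICIES = {"R1": (1, 1, 1), "R2": (1, 1, 2), "R3": (1, 2, 1), "R4": (1, 2, 2)}

def average_cost(decisions):
    """Expected average cost per period of a stationary deterministic policy."""
    P = [ROWS[(i, k)] for i, k in enumerate(decisions)]
    pi = stationary(P)
    return sum(p * COST[(i, k)] for p, (i, k) in zip(pi, enumerate(decisions)))

results = {name: average_cost(d) for name, d in POLICIES.items()}
best = min(results, key=results.get)
for name, cost in results.items():
    print(name, round(float(cost), 2))
print("optimal:", best)
```

Working in exact fractions avoids rounding questions when two policies are close, as \(R_1\) and \(R_2\) are here.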



19.2-3.

(a) Let the states represent whether the student's car is dented, \(i = 1\), or not, \(i = 0\).

Decision   Action                         State   Immediate cost
1          Park on street in one space    0       C_01 = 0
2          Park on street in two spaces   0       C_02 = 4.5
3          Park in lot                    0       C_03 = 5
4          Have it repaired               1       C_14 = 50
5          Drive dented                   1       C_15 = 9

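(b) Assuming the student's car has no dent initially, once she decides to park in the lot, state 1 will never be entered. In that case, the decision chosen in state 1 does not affect the expected average cost. Hence, it is enough to consider five stationary deterministic policies.

i    d_i(R_1)   d_i(R_2)   d_i(R_3)   d_i(R_4)   d_i(R_5)
0    1          1          2          2          3
1    4          5          4          5          -

Policy, transition matrix, and expected average cost:

\[ R_1:\quad P = \begin{pmatrix} 0.9 & 0.1 \\ 1 & 0 \end{pmatrix}, \qquad C_1 = 0\pi_0 + 50\pi_1 \]

\[ R_2:\quad P = \begin{pmatrix} 0.9 & 0.1 \\ 0 & 1 \end{pmatrix}, \qquad C_2 = 0\pi_0 + 9\pi_1 \]

\[ R_3:\quad P = \begin{pmatrix} 0.98 & 0.02 \\ 1 & 0 \end{pmatrix}, \qquad C_3 = 4.5\pi_0 + 50\pi_1 \]

\[ R_4:\quad P = \begin{pmatrix} 0.98 & 0.02 \\ 0 & 1 \end{pmatrix}, \qquad C_4 = 4.5\pi_0 + 9\pi_1 \]

\[ R_5:\quad \text{row for state 0: } (1, \; 0), \qquad C_5 = 5\pi_0 \]

(c)

Policy   π_0     π_1     Average cost (if initially not dented)
R_1      0.909   0.091   4.55
R_2      0       1       9
R_3      0.98    0.02    5.41
R_4      0       1       9
R_5      1       0       5

The policy \(R_1\) has the minimum cost, so it is optimal to park on the street in one space if not dented and to have it repaired if dented.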



19.2-4.

(a) Let states 0 and 1 denote the good and the bad mood respectively. The decision in each state is between providing refreshments or not.

Decision   Action                     State   Immediate cost
1          Provide refreshments       0       C_01 = 14
2          Not provide refreshments   0       C_02 = 0
1          Provide refreshments       1       C_11 = 14
2          Not provide refreshments   1       C_12 = 75

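(b) There are four possible stationary policies.

i    d_i(R_1)   d_i(R_2)   d_i(R_3)   d_i(R_4)
0    1          1          2          2
1    1          2          1          2

Policy, transition matrix, and expected average cost:

\[ R_1:\quad P = \begin{pmatrix} 0.875 & 0.125 \\ 0.875 & 0.125 \end{pmatrix}, \qquad C_1 = 14\pi_0 + 14\pi_1 \]

\[ R_2:\quad P = \begin{pmatrix} 0.875 & 0.125 \\ 0.125 & 0.875 \end{pmatrix}, \qquad C_2 = 14\pi_0 + 75\pi_1 \]

\[ R_3:\quad P = \begin{pmatrix} 0.125 & 0.875 \\ 0.875 & 0.125 \end{pmatrix}, \qquad C_3 = 0\pi_0 + 14\pi_1 \]

\[ R_4:\quad P = \begin{pmatrix} 0.125 & 0.875 \\ 0.125 & 0.875 \end{pmatrix}, \qquad C_4 = 0\pi_0 + 75\pi_1 \]

(c)

Policy   π_0     π_1     Average cost
R_1      0.875   0.125   C_1 = 14
R_2      0.5     0.5     C_2 = 44.5
R_3      0.5     0.5     C_3 = 7
R_4      0.125   0.875   C_4 = 65.625

The optimal policy is \(R_3\), i.e., to provide refreshments only if the group begins the night in a bad mood.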



19.2-5.

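(a) Let state 0 denote point over, two serves to go on next point, and state 1 denote one serve left. The decision in each state is to attempt an ace or a lob.

Decision   Action        State   Immediate cost
1          Attempt ace   0       \(C_{01} = \tfrac{3}{8}\left(-\tfrac{2}{3} + \tfrac{1}{3}\right) = -\tfrac{1}{8}\)
2          Attempt lob   0       \(C_{02} = \tfrac{7}{8}\left(-\tfrac{1}{3} + \tfrac{2}{3}\right) = \tfrac{7}{24}\)
1          Attempt ace   1       \(C_{11} = \tfrac{3}{8}\left(-\tfrac{2}{3} + \tfrac{1}{3}\right) + \tfrac{5}{8} = \tfrac{1}{2}\)
2          Attempt lob   1       \(C_{12} = \tfrac{7}{8}\left(-\tfrac{1}{3} + \tfrac{2}{3}\right) + \tfrac{1}{8} = \tfrac{5}{12}\)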

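(b) There are four possible stationary deterministic policies.

i    d_i(R_1)   d_i(R_2)   d_i(R_3)   d_i(R_4)
0    1          1          2          2
1    1          2          1          2

Policy, transition matrix, and expected average cost:

\[ R_1:\quad P = \begin{pmatrix} 3/8 & 5/8 \\ 1 & 0 \end{pmatrix}, \qquad C_1 = -\tfrac{1}{8}\pi_0 + \tfrac{1}{2}\pi_1 \]

\[ R_2:\quad P = \begin{pmatrix} 3/8 & 5/8 \\ 1 & 0 \end{pmatrix}, \qquad C_2 = -\tfrac{1}{8}\pi_0 + \tfrac{5}{12}\pi_1 \]

\[ R_3:\quad P = \begin{pmatrix} 7/8 & 1/8 \\ 1 & 0 \end{pmatrix}, \qquad C_3 = \tfrac{7}{24}\pi_0 + \tfrac{1}{2}\pi_1 \]

\[ R_4:\quad P = \begin{pmatrix} 7/8 & 1/8 \\ 1 & 0 \end{pmatrix}, \qquad C_4 = \tfrac{7}{24}\pi_0 + \tfrac{5}{12}\pi_1 \]

(c)

Policy   π_0     π_1     Average cost
R_1      0.615   0.385   C_1 = 0.115
R_2      0.615   0.385   C_2 = 0.083
R_3      0.889   0.111   C_3 = 0.315
R_4      0.889   0.111   C_4 = 0.306

The optimal policy is \(R_2\), i.e., to attempt an ace in state 0 and a lob in state 1.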



19.2-6.

(a) Let states \(i = 0, 1, 2\) represent the state of the market, 13,000, 14,000, and 15,000 respectively. The decision is between two funds, namely the Go-Go Fund and the Go-Slow Mutual Fund. All the costs are expressed in thousand dollars.

Decision   Action                  State   Immediate cost
1          Invest in the Go-Go     0       \(C_{01} = -0.4(25) - 0.2(60) = -22\)
2          Invest in the Go-Slow   0       \(C_{02} = -0.4(10) - 0.2(25) = -9\)
1          Invest in the Go-Go     1       \(C_{11} = 0.3(25) - 0.3(60) = -10.5\)
2          Invest in the Go-Slow   1       \(C_{12} = 0.3(10) - 0.3(25) = -4.5\)
1          Invest in the Go-Go     2       \(C_{21} = 0.1(60) + 0.4(25) = 16\)
2          Invest in the Go-Slow   2       \(C_{22} = 0.1(25) + 0.4(10) = 6.5\)

(b) There are eight possible stationary policies.

i    d_i(R_1)  d_i(R_2)  d_i(R_3)  d_i(R_4)  d_i(R_5)  d_i(R_6)  d_i(R_7)  d_i(R_8)
0    1         1         1         1         2         2         2         2
1    1         1         2         2         2         1         1         2
2    1         2         2         1         1         2         1         2

All \(R_i\)'s have the same transition matrix:

\[ P = \begin{pmatrix} 0.4 & 0.4 & 0.2 \\ 0.3 & 0.4 & 0.3 \\ 0.1 & 0.4 & 0.5 \end{pmatrix} \]

Policy   Expected average cost
R_1      \(C_1 = -22\pi_0 - 10.5\pi_1 + 16\pi_2\)
R_2      \(C_2 = -22\pi_0 - 10.5\pi_1 + 6.5\pi_2\)
R_3      \(C_3 = -22\pi_0 - 4.5\pi_1 + 6.5\pi_2\)
R_4      \(C_4 = -22\pi_0 - 4.5\pi_1 + 16\pi_2\)
R_5      \(C_5 = -9\pi_0 - 4.5\pi_1 + 16\pi_2\)
R_6      \(C_6 = -9\pi_0 - 10.5\pi_1 + 6.5\pi_2\)
R_7      \(C_7 = -9\pi_0 - 10.5\pi_1 + 16\pi_2\)
R_8      \(C_8 = -9\pi_0 - 4.5\pi_1 + 6.5\pi_2\)

(c) \((\pi_0, \pi_1, \pi_2) = (0.257, \; 0.4, \; 0.343)\)

Policy   Average cost
R_1      -4.371
R_2      -7.629
R_3      -5.229
R_4      -1.971
R_5       1.371
R_6      -4.286
R_7      -1.029
R_8      -1.886

The optimal policy is \(R_2\), i.e., to invest in the Go-Go Fund in states 0 and 1, and in the Go-Slow Fund in state 2.



19.2-7.

(a) Let states 0 and 1 represent whether the machine is broken down or is running respectively. The decision is between Buck and Bill.

Decision   Action   State   Immediate cost
1          Buck     0       C_01 = 0
2          Bill     0       C_02 = 0
1          Buck     1       C_11 = -1200
2          Bill     1       C_12 = -1200

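(b) There are four possible stationary deterministic policies.

i    d_i(R_1)   d_i(R_2)   d_i(R_3)   d_i(R_4)
0    1          1          2          2
1    1          2          1          2

Policy, transition matrix, and expected average cost:

\[ R_1:\quad P = \begin{pmatrix} 0.4 & 0.6 \\ 0.6 & 0.4 \end{pmatrix}, \qquad C_1 = -1200\pi_1 \]

\[ R_2:\quad P = \begin{pmatrix} 0.4 & 0.6 \\ 0.4 & 0.6 \end{pmatrix}, \qquad C_2 = -1200\pi_1 \]

\[ R_3:\quad P = \begin{pmatrix} 0.5 & 0.5 \\ 0.6 & 0.4 \end{pmatrix}, \qquad C_3 = -1200\pi_1 \]

\[ R_4:\quad P = \begin{pmatrix} 0.5 & 0.5 \\ 0.4 & 0.6 \end{pmatrix}, \qquad C_4 = -1200\pi_1 \]

(c)

Policy   π_0     π_1     Average cost
R_1      0.5     0.5     C_1 = -600
R_2      0.4     0.6     C_2 = -720
R_3      0.545   0.455   C_3 = -546
R_4      0.444   0.556   C_4 = -667.2

The largest expected average profit, 720, is given by \(R_2\).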

19.2-8.

(a) Let the states be the number of items in inventory at the beginning of the period and the decision be the number of items ordered. To conform to the software package, one needs to relabel the decisions 0, 1, 2 as 1, 2, 3 respectively. The cost matrix is:

c_ik     k = 1    k = 2    k = 3
i = 0    40/3     56/3     24
i = 1    4        19       -
i = 2    4        -        -

Let \(R_3\) denote the policy to order 2 items when the inventory level is initially 0 and not to order when the inventory level is initially either 1 or 2. In other words, \(d_0(R_3) = 3\) and \(d_1(R_3) = d_2(R_3) = 1\).

\[ P(R_3) = \begin{pmatrix} 1/3 & 1/3 & 1/3 \\ 2/3 & 1/3 & 0 \\ 1/3 & 1/3 & 1/3 \end{pmatrix}, \qquad (\pi_0, \pi_1, \pi_2) = \left(\tfrac{4}{9}, \; \tfrac{3}{9}, \; \tfrac{2}{9}\right) \]

Expected average cost: \(\tfrac{4}{9}C_{03} + \tfrac{3}{9}C_{11} + \tfrac{2}{9}C_{21} = \tfrac{116}{9} = 12.89\), i.e., $12.89/period.

(b) There are \(3^3 = 27\) stationary policies, since one can order 0, 1, or 2 items in each state. However, only six of these are feasible. The remaining 21 policies are infeasible because the decision in at least one of the states leads to overcapacity.

i    d_i(R_1)   d_i(R_2)   d_i(R_3)   d_i(R_4)   d_i(R_5)   d_i(R_6)
0    1          2          3          1          2          3
1    1          1          1          2          2          2
2    1          1          1          1          1          1

19.3-1.

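(a) minimize \(3y_{01} + 9y_{02} + 3y_{11} + 9y_{12} + 28y_{21} + 34y_{22}\)

subject to

\[ y_{01} + y_{02} + y_{11} + y_{12} + y_{21} + y_{22} = 1 \]

\[ y_{01} + y_{02} - \left(\tfrac{1}{2}y_{01} + \tfrac{1}{2}y_{02} + \tfrac{3}{10}y_{11} + \tfrac{2}{5}y_{12}\right) = 0 \]

\[ y_{11} + y_{12} - \left(\tfrac{1}{2}y_{01} + \tfrac{1}{2}y_{02} + \tfrac{1}{2}y_{11} + \tfrac{1}{2}y_{12} + \tfrac{3}{5}y_{21} + \tfrac{4}{5}y_{22}\right) = 0 \]

\[ y_{21} + y_{22} - \left(\tfrac{2}{10}y_{11} + \tfrac{1}{10}y_{12} + \tfrac{2}{5}y_{21} + \tfrac{1}{5}y_{22}\right) = 0 \]

\[ y_{ik} \ge 0 \quad \text{for } i = 0, 1, 2 \text{ and } k = 1, 2 \]

These objective coefficients charge the expected lost profit \(50\left(\tfrac{1}{2}\right) = 25\) in state 2 rather than crediting the profit of each completed service as in Problem 19.2-2. The two cost structures differ by the constant 25 in expected average cost, so they rank the policies identically.

(b) Using the simplex method, we find \(y_{01} = 0.3243\), \(y_{11} = 0.5405\), \(y_{22} = 0.1351\), and the remaining \(y_{ik}\)'s are zero. Hence, the optimal policy uses decision 1 in states 0 and 1, and decision 2 in state 2.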

19.3-2.

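(a) minimize \(4.5y_{02} + 5y_{03} + 50y_{14} + 9y_{15}\)

subject to

\[ y_{01} + y_{02} + y_{03} + y_{14} + y_{15} = 1 \]

\[ y_{01} + y_{02} + y_{03} - \left(\tfrac{9}{10}y_{01} + \tfrac{49}{50}y_{02} + y_{03} + y_{14}\right) = 0 \]

\[ y_{14} + y_{15} - \left(\tfrac{1}{10}y_{01} + \tfrac{1}{50}y_{02} + y_{15}\right) = 0 \]

\[ y_{01}, \; y_{02}, \; y_{03}, \; y_{14}, \; y_{15} \ge 0 \]

(b) Using the simplex method, all \(y_{ik}\)'s turn out to be zero except that \(y_{01} = 0.90909\) and \(y_{14} = 0.0909\), so the policy that uses decision 1 in state 0 and decision 4 in state 1 is optimal.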



19.3-6.

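(a) minimize \(-1200y_{11} - 1200y_{12}\)

subject to

\[ y_{01} + y_{02} + y_{11} + y_{12} = 1 \]

\[ y_{01} + y_{02} - \left(0.4y_{01} + 0.5y_{02} + 0.6y_{11} + 0.4y_{12}\right) = 0 \]

\[ y_{11} + y_{12} - \left(0.6y_{01} + 0.5y_{02} + 0.4y_{11} + 0.6y_{12}\right) = 0 \]

\[ y_{ik} \ge 0 \quad \text{for } i = 0, 1 \text{ and } k = 1, 2 \]

(b) Using the simplex method, we find \(y_{01} = 0.4\), \(y_{12} = 0.6\), \(y_{02} = y_{11} = 0\), so the optimal policy is to use decision 1 (Buck) in state 0 and decision 2 (Bill) in state 1.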

19.3-7.

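(a) minimize \(\tfrac{40}{3}y_{01} + \tfrac{56}{3}y_{02} + 24y_{03} + 4y_{11} + 19y_{12} + 4y_{21}\)

subject to

\[ y_{01} + y_{02} + y_{03} + y_{11} + y_{12} + y_{21} = 1 \]

\[ y_{01} + y_{02} + y_{03} - \left(y_{01} + \tfrac{2}{3}y_{02} + \tfrac{1}{3}y_{03} + \tfrac{2}{3}y_{11} + \tfrac{1}{3}y_{12} + \tfrac{1}{3}y_{21}\right) = 0 \]

\[ y_{11} + y_{12} - \left(\tfrac{1}{3}y_{02} + \tfrac{1}{3}y_{03} + \tfrac{1}{3}y_{11} + \tfrac{1}{3}y_{12} + \tfrac{1}{3}y_{21}\right) = 0 \]

\[ y_{21} - \left(\tfrac{1}{3}y_{03} + \tfrac{1}{3}y_{12} + \tfrac{1}{3}y_{21}\right) = 0 \]

\[ y_{ik} \ge 0 \quad \text{for } i = 0, 1, 2 \text{ and } k = 1, 2, 3 \]

(b) Using the simplex method, we find \(y_{03} = 0.4444\), \(y_{11} = 0.3333\), \(y_{21} = 0.2222\), and the remaining \(y_{ik}\)'s are zero. Hence, the optimal policy is to order 2 items in state 0 and not to order in states 1 and 2.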



19.4-1.



19.4-2.



19.4-3.



19.4-4.



19.4-5.



19.4-6.



19.4-7.



19.4-8.

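When the number of pints of blood delivered can be specified at the time of delivery, the starting number of pints including the delivery will never exceed the largest possible demand in a period, so we can restrict our attention to states \(i = 0, 1, 2, 3\). The admissible actions in state \(i\) are to order \(0 \le k \le 3 - i\). Given a decision \(k\), the transition probabilities and the immediate cost are computed as follows:

\[ p_{ij}(k) = P\{D = i + k - j\} \quad \text{if } j \ge 1 \]

\[ p_{i0}(k) = P\{D \ge i + k\} \]

\[ C_{ik} = 50k + E\left[100 \max\{D - i - k, \; 0\}\right] \]

Initialization: \(d_i(R_1) = 1\) for \(i = 0, 1, 2\) and \(d_3(R_1) = 0\).

\[ P(R_1) = \begin{pmatrix} 0.6 & 0.4 & 0 & 0 \\ 0.3 & 0.3 & 0.4 & 0 \\ 0.1 & 0.2 & 0.3 & 0.4 \\ 0.1 & 0.2 & 0.3 & 0.4 \end{pmatrix}, \qquad C(R_1) = \begin{pmatrix} 90 \\ 60 \\ 50 \\ 0 \end{pmatrix} \]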

" "

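Iteration 1:

Step 1: Value determination:

\[ g(R_1) = 90 + 0.6v_0(R_1) + 0.4v_1(R_1) - v_0(R_1) \]

\[ g(R_1) = 60 + 0.3v_0(R_1) + 0.3v_1(R_1) + 0.4v_2(R_1) - v_1(R_1) \]

\[ g(R_1) = 50 + 0.1v_0(R_1) + 0.2v_1(R_1) + 0.3v_2(R_1) + 0.4v_3(R_1) - v_2(R_1) \]

\[ g(R_1) = 0 + 0.1v_0(R_1) + 0.2v_1(R_1) + 0.3v_2(R_1) + 0.4v_3(R_1) - v_3(R_1) \]

\[ v_3(R_1) = 0 \]

\[ g(R_1) = 57.8, \quad v_0(R_1) = 196.3, \quad v_1(R_1) = 115.9, \quad v_2(R_1) = 50, \quad v_3(R_1) = 0 \]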

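Step 2: Policy improvement:

State 0: minimize

k = 0: \(100 + v_0(R_1) - v_0(R_1) = 100\)
k = 1: \(90 + 0.6v_0(R_1) + 0.4v_1(R_1) - v_0(R_1) = 57.8\)
k = 2: \(110 + 0.3v_0(R_1) + 0.3v_1(R_1) + 0.4v_2(R_1) - v_0(R_1) = 27.36\)
k = 3: \(150 + 0.1v_0(R_1) + 0.2v_1(R_1) + 0.3v_2(R_1) + 0.4v_3(R_1) - v_0(R_1) = 11.51\)

\(d_0(R_2) = 3\)

State 1: minimize

k = 0: \(40 + 0.6v_0(R_1) + 0.4v_1(R_1) - v_1(R_1) = 88.24\)
k = 1: \(60 + 0.3v_0(R_1) + 0.3v_1(R_1) + 0.4v_2(R_1) - v_1(R_1) = 57.8\)
k = 2: \(100 + 0.1v_0(R_1) + 0.2v_1(R_1) + 0.3v_2(R_1) + 0.4v_3(R_1) - v_1(R_1) = 41.91\)

\(d_1(R_2) = 2\)

State 2: minimize

k = 0: \(10 + 0.3v_0(R_1) + 0.3v_1(R_1) + 0.4v_2(R_1) - v_2(R_1) = 73.66\)
k = 1: \(50 + 0.1v_0(R_1) + 0.2v_1(R_1) + 0.3v_2(R_1) + 0.4v_3(R_1) - v_2(R_1) = 57.8\)

\(d_2(R_2) = 1\)

\(R_2\) is not identical to \(R_1\), so the optimality test fails.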
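The value-determination and policy-improvement steps can be reproduced with a short script. This is a sketch under stated assumptions rather than the manual's software: the demand distribution `DEMAND` is inferred from the last row of \(P(R_1)\), the costs follow \(C_{ik} = 50k + E[100\max\{D - i - k, 0\}]\), and the helper names `solve`, `row_and_cost`, `evaluate`, `improve` and `policy_iteration` are illustrative.

```python
from fractions import Fraction as F

# Assumption: P{D = d}, read off the last row of P(R_1) above.
DEMAND = {0: F(4, 10), 1: F(3, 10), 2: F(2, 10), 3: F(1, 10)}
N = 4  # states 0..3

def solve(A, b):
    """Gauss-Jordan elimination on exact fractions."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[p] = M[p], M[c]
        M[c] = [x / M[c][c] for x in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[r][n] for r in range(n)]

def row_and_cost(i, k):
    """Transition row p_ij(k) and immediate cost C_ik for ordering k in state i."""
    stock = i + k
    row, cost = [F(0)] * N, F(50 * k)
    for d, p in DEMAND.items():
        row[max(stock - d, 0)] += p
        cost += 100 * p * max(d - stock, 0)
    return row, cost

def evaluate(policy):
    """Value determination: unknowns g, v_0, v_1, v_2 with v_3 fixed at 0."""
    A, b = [], []
    for i, k in enumerate(policy):
        row, cost = row_and_cost(i, k)
        A.append([F(1)] + [(i == j) - row[j] for j in range(N - 1)])
        b.append(cost)
    g, *v = solve(A, b)
    return g, v + [F(0)]

def improve(policy, v):
    """Policy improvement: minimize C_ik + sum_j p_ij(k) v_j - v_i over k."""
    new_policy = []
    for i in range(N):
        def q(k):
            row, cost = row_and_cost(i, k)
            return cost + sum(p * x for p, x in zip(row, v)) - v[i]
        # Ties keep the current decision so the loop terminates.
        new_policy.append(min(range(N - i), key=lambda k: (q(k), k != policy[i])))
    return new_policy

def policy_iteration(policy):
    history = []
    while True:
        g, v = evaluate(policy)
        history.append((policy, g, v))
        new_policy = improve(policy, v)
        if new_policy == policy:
            return history
        policy = new_policy

history = policy_iteration([1, 1, 1, 0])
for policy, g, v in history:
    print(policy, round(float(g), 2), [round(float(x), 1) for x in v])
```

Starting from \(R_1 = (1, 1, 1, 0)\), each entry of `history` holds one evaluated policy with its \(g\) and \(v_i\) values, so the first entry can be compared directly with Step 1 above.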



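Iteration 2:

Step 1: Value determination:

\[ g(R_2) = 150 + 0.1v_0(R_2) + 0.2v_1(R_2) + 0.3v_2(R_2) + 0.4v_3(R_2) - v_0(R_2) \]

\[ g(R_2) = 100 + 0.1v_0(R_2) + 0.2v_1(R_2) + 0.3v_2(R_2) + 0.4v_3(R_2) - v_1(R_2) \]

\[ g(R_2) = 50 + 0.1v_0(R_2) + 0.2v_1(R_2) + 0.3v_2(R_2) + 0.4v_3(R_2) - v_2(R_2) \]

\[ g(R_2) = 0 + 0.1v_0(R_2) + 0.2v_1(R_2) + 0.3v_2(R_2) + 0.4v_3(R_2) - v_3(R_2) \]

\[ v_3(R_2) = 0 \]

\[ g(R_2) = 50, \quad v_0(R_2) = 150, \quad v_1(R_2) = 100, \quad v_2(R_2) = 50, \quad v_3(R_2) = 0 \]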

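Step 2: Policy improvement:

State 0: minimize

k = 0: \(100 + v_0(R_2) - v_0(R_2) = 100\)
k = 1: \(90 + 0.6v_0(R_2) + 0.4v_1(R_2) - v_0(R_2) = 70\)
k = 2: \(110 + 0.3v_0(R_2) + 0.3v_1(R_2) + 0.4v_2(R_2) - v_0(R_2) = 55\)
k = 3: \(150 + 0.1v_0(R_2) + 0.2v_1(R_2) + 0.3v_2(R_2) + 0.4v_3(R_2) - v_0(R_2) = 50\)

\(d_0(R_3) = 3\)

State 1: minimize

k = 0: \(40 + 0.6v_0(R_2) + 0.4v_1(R_2) - v_1(R_2) = 70\)
k = 1: \(60 + 0.3v_0(R_2) + 0.3v_1(R_2) + 0.4v_2(R_2) - v_1(R_2) = 55\)
k = 2: \(100 + 0.1v_0(R_2) + 0.2v_1(R_2) + 0.3v_2(R_2) + 0.4v_3(R_2) - v_1(R_2) = 50\)

\(d_1(R_3) = 2\)

State 2: minimize

k = 0: \(10 + 0.3v_0(R_2) + 0.3v_1(R_2) + 0.4v_2(R_2) - v_2(R_2) = 55\)
k = 1: \(50 + 0.1v_0(R_2) + 0.2v_1(R_2) + 0.3v_2(R_2) + 0.4v_3(R_2) - v_2(R_2) = 50\)

\(d_2(R_3) = 1\)

\(R_3\) is identical to \(R_2\), so it is optimal to start every period with 3 pints of blood after delivery of the order.

19.5-1.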

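Let states 0, 1, and 2 denote the $600, $800 and $1000 offers respectively and let state 3 designate the case that the car has already been sold (state \(\infty\) of the hint). Let decisions 1 and 2 be to reject and to accept the offer respectively.

\(C_{01} = C_{11} = C_{21} = 60\), \(C_{02} = -600\), \(C_{12} = -800\), and \(C_{22} = -1000\)

\[ P(1) = \begin{pmatrix} 5/8 & 1/4 & 1/8 & 0 \\ 5/8 & 1/4 & 1/8 & 0 \\ 5/8 & 1/4 & 1/8 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad P(2) = \begin{pmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 \end{pmatrix} \]

Start with the policy to reject only the $600 offer. The relevant equations are:

\[ V_0 = 60 + 0.95\left(\tfrac{5}{8}V_0 + \tfrac{1}{4}V_1 + \tfrac{1}{8}V_2\right) \]

\[ V_1 = -800 + 0.95V_3 \]

\[ V_2 = -1000 + 0.95V_3 \]

\[ V_3 = 0.95V_3, \]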



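\[ P(1) = \begin{pmatrix} 4/5 & 1/5 & 0 & 0 \\ 1/4 & 1/4 & 1/2 & 0 \\ 0 & 3/4 & 1/4 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad P(2) = \begin{pmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 \end{pmatrix} \]

Start with the policy to sell only when the price is $30. The relevant equations are:

\[ V_0 = 0 + 0.9\left(\tfrac{4}{5}V_0 + \tfrac{1}{5}V_1\right) \]

\[ V_1 = 0 + 0.9\left(\tfrac{1}{4}V_0 + \tfrac{1}{4}V_1 + \tfrac{1}{2}V_2\right) \]

\[ V_2 = -30 + 0.9V_3 \]

\[ V_3 = 0 + 0.9V_3, \]

which admit the unique solution \((V_0, V_1, V_2, V_3) = \left(-\tfrac{4860}{353}, \; -\tfrac{7560}{353}, \; -30, \; 0\right)\).

Policy improvement:

State 0 with decision 2: \(-10 + 0.9V_3 = -10 > V_0\)

State 1 with decision 2: \(-20 + 0.9V_3 = -20 > V_1\)

State 2 with decision 1: \(0 + 0.9\left(\tfrac{3}{4}V_1 + \tfrac{1}{4}V_2\right) = -21.21 > V_2\)

Hence, the policy to hold the stock when the price is $10 and $20, and to sell it when the price is $30, is optimal.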
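The evaluation and improvement steps for this discounted problem can be checked exactly. The sketch below is illustrative, not the manual's procedure: it assumes the "hold" transition rows and the discount factor 0.9 shown above, treats state 3 ("sold") as absorbing with zero cost, and the names `HOLD`, `SOLD`, `solve`, `evaluate` and `improve` are hypothetical.

```python
from fractions import Fraction as F

ALPHA = F(9, 10)       # discount factor
PRICE = [10, 20, 30]   # selling price in states 0, 1, 2
HOLD = [               # transition rows under "hold"; state 3 means "already sold"
    [F(4, 5), F(1, 5), F(0), F(0)],
    [F(1, 4), F(1, 4), F(1, 2), F(0)],
    [F(0), F(3, 4), F(1, 4), F(0)],
]
SOLD = [F(0), F(0), F(0), F(1)]

def solve(A, b):
    """Gauss-Jordan elimination on exact fractions."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[p] = M[p], M[c]
        M[c] = [x / M[c][c] for x in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[r][n] for r in range(n)]

def evaluate(policy):
    """Solve V = C + alpha * P V for a policy given as 'hold'/'sell' per price state."""
    rows = [HOLD[i] if a == "hold" else SOLD for i, a in enumerate(policy)] + [SOLD]
    costs = [F(0) if a == "hold" else F(-PRICE[i]) for i, a in enumerate(policy)] + [F(0)]
    A = [[(i == j) - ALPHA * rows[i][j] for j in range(4)] for i in range(4)]
    return solve(A, costs)

def improve(V):
    """Pick the cheaper of holding and selling in each price state."""
    policy = []
    for i in range(3):
        hold = ALPHA * sum(p * v for p, v in zip(HOLD[i], V))
        sell = -PRICE[i] + ALPHA * V[3]
        policy.append("hold" if hold <= sell else "sell")
    return policy

V = evaluate(["hold", "hold", "sell"])
print(V)
print(improve(V))
```

Because the improvement step returns the policy it started from, the policy passes the optimality test.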

19.5-5.

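(a) minimize \(-10y_{02} - 20y_{12} - 30y_{22}\)

subject to

\[ y_{01} + y_{02} - 0.9\left(\tfrac{4}{5}y_{01} + \tfrac{1}{4}y_{11}\right) = \tfrac{1}{3} \]

\[ y_{11} + y_{12} - 0.9\left(\tfrac{1}{5}y_{01} + \tfrac{1}{4}y_{11} + \tfrac{3}{4}y_{21}\right) = \tfrac{1}{3} \]

\[ y_{21} + y_{22} - 0.9\left(\tfrac{1}{2}y_{11} + \tfrac{1}{4}y_{21}\right) = \tfrac{1}{3} \]

\[ y_{ik} \ge 0 \quad \text{for } i = 0, 1, 2 \text{ and } k = 1, 2 \]

(b) Using the simplex method, we find \(y_{01} = 1.88857\), \(y_{11} = 0.86874\), \(y_{22} = 0.72427\), and the remaining \(y_{ik}\)'s are zero. Hence, the optimal policy is to hold the stock at the prices $10 and $20 and to sell it at the price $30.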

19.5-6.

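\[ V_0^n = \min\left\{0.9\left(\tfrac{4}{5}V_0^{n-1} + \tfrac{1}{5}V_1^{n-1}\right), \; -10\right\} \]

\[ V_1^n = \min\left\{0.9\left(\tfrac{1}{4}V_0^{n-1} + \tfrac{1}{4}V_1^{n-1} + \tfrac{1}{2}V_2^{n-1}\right), \; -20\right\} \]

\[ V_2^n = \min\left\{0.9\left(\tfrac{3}{4}V_1^{n-1} + \tfrac{1}{4}V_2^{n-1}\right), \; -30\right\} \]

\[ V_i^0 = 0 \quad \text{for } i = 0, 1, 2 \]

Iteration 1:

\(V_0^1 = \min\{0, \; -10\} = -10\)   Sell
\(V_1^1 = \min\{0, \; -20\} = -20\)   Sell
\(V_2^1 = \min\{0, \; -30\} = -30\)   Sell

Iteration 2:

\(V_0^2 = \min\{-10.8, \; -10\} = -10.8\)   Hold
\(V_1^2 = \min\{-20.25, \; -20\} = -20.25\)   Hold
\(V_2^2 = \min\{-20.25, \; -30\} = -30\)   Sell

Iteration 3:

\(V_0^3 = \min\{-11.42, \; -10\} = -11.42\)   Hold
\(V_1^3 = \min\{-20.49, \; -20\} = -20.49\)   Hold
\(V_2^3 = \min\{-20.42, \; -30\} = -30\)   Sell

The approximate optimal solution is to sell if the price is $30 and to hold otherwise. This policy is indeed optimal, as found in Problems 19.5-3 and 19.5-4.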

19.5-7.

(a) Let states 0 and 1 be the chemical produced this month, \(C1\) and \(C2\) respectively, and let decisions 1 and 2 refer to the process to be used next month, \(A\) and \(B\) respectively. There are four stationary deterministic policies.

i    d_i(R_1)   d_i(R_2)   d_i(R_3)   d_i(R_4)
0    1          1          2          2
1    1          2          1          2

The transition matrix is the same for every decision, viz.

\[ P = \begin{pmatrix} 0.3 & 0.7 \\ 0.4 & 0.6 \end{pmatrix}. \]

The costs \(C_{ik}\) correspond to the expected amount of pollution using the process \(k\) in the next period.

\[ C_{01} = 0.3(15) + 0.7(2) = 5.9 \]

\[ C_{02} = 0.3(3) + 0.7(8) = 6.5 \]

\[ C_{11} = 0.4(15) + 0.6(2) = 7.2 \]

\[ C_{12} = 0.4(3) + 0.6(8) = 6 \]

(b)



19.5-8.

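(a) minimize \(5.9y_{01} + 6.5y_{02} + 7.2y_{11} + 6y_{12}\)

subject to

\[ y_{01} + y_{02} - \tfrac{1}{2}\left(\tfrac{3}{10}y_{01} + \tfrac{4}{10}y_{11} + \tfrac{3}{10}y_{02} + \tfrac{4}{10}y_{12}\right) = \tfrac{1}{2} \]

\[ y_{11} + y_{12} - \tfrac{1}{2}\left(\tfrac{7}{10}y_{01} + \tfrac{6}{10}y_{11} + \tfrac{7}{10}y_{02} + \tfrac{6}{10}y_{12}\right) = \tfrac{1}{2} \]

\[ y_{ik} \ge 0 \quad \text{for } i = 0, 1 \text{ and } k = 1, 2 \]

(b) Using the simplex method, we find \(y_{01} = 0.857\), \(y_{12} = 1.143\), and \(y_{02} = y_{11} = 0\). Hence, the optimal policy is to use process \(A\) if \(C1\) is produced and \(B\) if \(C2\) is produced this month.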



19.5-9.

19.5-10.

The three iterations of successive approximations in Problem 19.5-9 give the optimal policy for the three-period problem. The optimal policy is, therefore, to use process \(A\) if \(C1\) is produced and \(B\) if \(C2\) is produced in all periods.



19.5-11.

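\[ V_0^n = \min\left\{0 + 0.90\left(\tfrac{7}{8}V_1^{n-1} + \tfrac{1}{16}V_2^{n-1} + \tfrac{1}{16}V_3^{n-1}\right), \; 4000 + 0.90V_1^{n-1}, \; 6000 + 0.90V_0^{n-1}\right\} \]

\[ V_1^n = \min\left\{1000 + 0.90\left(\tfrac{3}{4}V_1^{n-1} + \tfrac{1}{8}V_2^{n-1} + \tfrac{1}{8}V_3^{n-1}\right), \; 4000 + 0.90V_1^{n-1}, \; 6000 + 0.90V_0^{n-1}\right\} \]

\[ V_2^n = \min\left\{3000 + 0.90\left(\tfrac{1}{2}V_2^{n-1} + \tfrac{1}{2}V_3^{n-1}\right), \; 4000 + 0.90V_1^{n-1}, \; 6000 + 0.90V_0^{n-1}\right\} \]

\[ V_3^n = 6000 + 0.90V_0^{n-1} \]

\[ V_i^0 = 0 \quad \text{for } i = 0, 1, 2, 3 \]

Iteration 1:

\(V_0^1 = \min\{0, \; 4000, \; 6000\} = 0\)   Do nothing
\(V_1^1 = \min\{1000, \; 4000, \; 6000\} = 1000\)   Do nothing
\(V_2^1 = \min\{3000, \; 4000, \; 6000\} = 3000\)   Do nothing
\(V_3^1 = 6000\)   Replace

Iteration 2:

\(V_0^2 = \min\{1293.75, \; 4900, \; 6000\} = 1293.75\)   Do nothing
\(V_1^2 = \min\{2687.5, \; 4900, \; 6000\} = 2687.5\)   Do nothing
\(V_2^2 = \min\{7050, \; 4900, \; 6000\} = 4900\)   Overhaul
\(V_3^2 = 6000\)   Replace

Iteration 3:

\(V_0^3 = \min\{2729.53, \; 6418.75, \; 7164.38\} = 2729.53\)   Do nothing
\(V_1^3 = \min\{4040.31, \; 6418.75, \; 7164.38\} = 4040.31\)   Do nothing
\(V_2^3 = \min\{7905, \; 6418.75, \; 7164.38\} = 6418.75\)   Overhaul
\(V_3^3 = 7164.38\)   Replace

Iteration 4:

\(V_0^4 = \min\{3945.80, \; 7636.28, \; 8456.58\} = 3945.80\)   Do nothing
\(V_1^4 = \min\{5255.31, \; 7636.28, \; 8456.58\} = 5255.31\)   Do nothing
\(V_2^4 = \min\{9112.41, \; 7636.28, \; 8456.58\} = 7636.28\)   Overhaul
\(V_3^4 = 8456.58\)   Replace

The optimal policy is to do nothing in states 0 and 1 and to replace in state 3 in all periods. When in state 2, it is best to overhaul in periods 1, 2, and 3 and to do nothing in period 4.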
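The four iterations can be reproduced with a few lines of code. This is a minimal sketch of the method of successive approximations, assuming the costs, transition rows and discount factor 0.9 that appear in the recursion for this problem; the names `KEEP_COST`, `KEEP_ROWS` and `backup` are illustrative.

```python
ALPHA = 0.9
KEEP_COST = [0, 1000, 3000]  # "do nothing" cost in states 0, 1, 2
KEEP_ROWS = [                # "do nothing" transition rows for states 0, 1, 2
    [0, 7 / 8, 1 / 16, 1 / 16],
    [0, 3 / 4, 1 / 8, 1 / 8],
    [0, 0, 1 / 2, 1 / 2],
]

def backup(v):
    """One step of successive approximations: V^n and its decisions from V^(n-1)."""
    values, actions = [], []
    for i in range(3):
        options = {
            "Do nothing": KEEP_COST[i]
            + ALPHA * sum(p * x for p, x in zip(KEEP_ROWS[i], v)),
            "Overhaul": 4000 + ALPHA * v[1],  # overhaul returns the machine to state 1
            "Replace": 6000 + ALPHA * v[0],   # replacement returns it to state 0
        }
        best = min(options, key=options.get)
        values.append(options[best])
        actions.append(best)
    # State 3: replacement is the only decision.
    values.append(6000 + ALPHA * v[0])
    actions.append("Replace")
    return values, actions

v = [0.0] * 4
history = []
for n in range(1, 5):
    v, actions = backup(v)
    history.append((v, actions))
    print(n, [round(x, 2) for x in v], actions)
```

Iteration \(n\) gives the decisions with \(n\) periods to go, which is why iteration 1 corresponds to period 4 of the four-period problem.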
