Chapter 19Solutions Ch19


  • 8/10/2019 Chapter 19Solutions Ch19

    CHAPTER 19: MARKOV DECISION PROCESSES

    19.2-1.

    Bank One, one of the major credit card issuers in the United States, has developed the portfolio control and optimization (PORTICO) system to manage APR and credit-line changes of its card holders. Customers prefer low APR and high credit lines, which can reduce the bank's profitability and increase the risk. Consequently, the bank faces the need to find a balance between revenue growth and risk. PORTICO formulates the problem as a Markov decision process. The state variables are chosen in a way to satisfy the Markovian assumption as closely as possible while keeping the dimension of the state space at a tractable level. The resulting variables are $(x, y)$, where $x$ corresponds to the credit line and APR level and $y$ represents the behavior variables. The transition probabilities are estimated from the available data. The objective is to maximize the expected net present value of the cash flows over a 36-month horizon. The dynamic programming equation for the decision periods of the problem is

    Z B C >"+EBC 4max , !" fwhere denotes the immediate net cash flow and is the discount factor. The

    19.2-2.

    (a) Let the states $i = 0, 1, 2$ be the number of customers at the facility. There are two possible actions when the facility has one or two customers. Let decision 1 be to use the slow configuration and decision 2 be to use the fast configuration. Also let $C_{ij}$ denote the expected net immediate cost of using decision $j$ in state $i$. Then,

    $C_{11} = C_{21} = 3 - \frac{3}{5}(50) = -27$,

    $C_{12} = C_{22} = 9 - \frac{4}{5}(50) = -31$,

    $C_{01} = 3$, $C_{02} = 9$.

    (b) In state 0, the configuration chosen does not affect the transition probabilities, so it is best to choose the slow configuration when there are no customers in line. Consequently, the number of stationary policies is four.

    State i   d_i(R_1)   d_i(R_2)   d_i(R_3)   d_i(R_4)
    1         1          1          2          2
    2         1          2          1          2

    Policy   Transition Matrix                                                                     Expected Average Cost
    $R_1$    $\begin{pmatrix} 1/2 & 1/2 & 0 \\ 3/10 & 1/2 & 1/5 \\ 0 & 3/5 & 2/5 \end{pmatrix}$     $3\pi_0 - 27\pi_1 - 27\pi_2$
    $R_2$    $\begin{pmatrix} 1/2 & 1/2 & 0 \\ 3/10 & 1/2 & 1/5 \\ 0 & 4/5 & 1/5 \end{pmatrix}$     $3\pi_0 - 27\pi_1 - 31\pi_2$
    $R_3$    $\begin{pmatrix} 1/2 & 1/2 & 0 \\ 2/5 & 1/2 & 1/10 \\ 0 & 3/5 & 2/5 \end{pmatrix}$     $3\pi_0 - 31\pi_1 - 27\pi_2$
    $R_4$    $\begin{pmatrix} 1/2 & 1/2 & 0 \\ 2/5 & 1/2 & 1/10 \\ 0 & 4/5 & 1/5 \end{pmatrix}$     $3\pi_0 - 31\pi_1 - 31\pi_2$

    (c)

    Policy   $\pi_0$   $\pi_1$   $\pi_2$   Average Cost
    $R_1$    0.3103    0.5172    0.1724    $g(R_1) = -17.69$
    $R_2$    0.3243    0.5405    0.1351    $g(R_2) = -17.81$
    $R_3$    0.4068    0.5085    0.0847    $g(R_3) = -16.83$
    $R_4$    0.416     0.519     0.065     $g(R_4) = -16.87$

    $g(R_2)$ is the minimum, so the optimal policy is $R_2$, i.e., to use the slow configuration when no customer or only one customer is present and the fast configuration when there are two customers.
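The exhaustive enumeration in parts (b) and (c) can be checked with a short script. The following is a minimal sketch, assuming the transition rows and costs tabulated above; the helper names `solve` and `stationary` are illustrative and not part of the original solution. It solves the steady-state equations exactly for each of the four policies and compares the resulting average costs.

```python
from fractions import Fraction as F

def solve(A, b):
    """Solve the square system A x = b exactly by Gauss-Jordan elimination."""
    n = len(A)
    M = [[F(x) for x in row] + [F(rhs)] for row, rhs in zip(A, b)]
    for c in range(n):
        pivot = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[pivot] = M[pivot], M[c]
        lead = M[c][c]
        M[c] = [x / lead for x in M[c]]
        for r in range(n):
            if r != c:
                factor = M[r][c]
                M[r] = [x - factor * y for x, y in zip(M[r], M[c])]
    return [M[i][n] for i in range(n)]

def stationary(P):
    """Steady-state probabilities: balance equations for all but one state, plus normalization."""
    n = len(P)
    A = [[P[i][j] - (1 if i == j else 0) for i in range(n)] for j in range(n - 1)]
    A.append([1] * n)
    return solve(A, [0] * (n - 1) + [1])

ROW0 = [F(1, 2), F(1, 2), F(0)]                      # state 0, slow configuration
ROWS = {                                              # ROWS[state][decision]
    1: {1: [F(3, 10), F(1, 2), F(1, 5)], 2: [F(2, 5), F(1, 2), F(1, 10)]},
    2: {1: [F(0), F(3, 5), F(2, 5)], 2: [F(0), F(4, 5), F(1, 5)]},
}
COST = {1: -27, 2: -31}                               # C_i1 and C_i2 for i = 1, 2
POLICIES = {"R1": (1, 1), "R2": (1, 2), "R3": (2, 1), "R4": (2, 2)}

g = {}
for name, (d1, d2) in POLICIES.items():
    pi = stationary([ROW0, ROWS[1][d1], ROWS[2][d2]])
    g[name] = 3 * pi[0] + COST[d1] * pi[1] + COST[d2] * pi[2]

best = min(g, key=g.get)
for name in POLICIES:
    print(name, round(float(g[name]), 2))
print("optimal:", best)
```

Exact fractions are used so that the comparison between policies is not affected by rounding.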

    19.2-3.

    (a) Let the states represent whether the student's car is dented, $i = 1$, or not, $i = 0$.

    Decision   Action                          State   Immediate Cost
    1          Park on street in one space     0       $C_{01} = 0$
    2          Park on street in two spaces    0       $C_{02} = 4.5$
    3          Park in lot                     0       $C_{03} = 5$
    4          Have it repaired                1       $C_{14} = 50$
    5          Drive dented                    1       $C_{15} = 9$

    (b) Assuming the student's car has no dent initially, once she decides to park in lot, state 1 will never be entered. In that case, the decision chosen in state 1 does not affect the expected average cost. Hence, it is enough to consider five stationary deterministic policies.

    State i   d_i(R_1)   d_i(R_2)   d_i(R_3)   d_i(R_4)   d_i(R_5)
    0         1          1          2          2          3
    1         4          5          4          5          -

    Policy   Transition Matrix                                        Expected Average Cost
    $R_1$    $\begin{pmatrix} 0.9 & 0.1 \\ 1 & 0 \end{pmatrix}$         $0\pi_0 + 50\pi_1$
    $R_2$    $\begin{pmatrix} 0.9 & 0.1 \\ 0 & 1 \end{pmatrix}$         $0\pi_0 + 9\pi_1$
    $R_3$    $\begin{pmatrix} 0.98 & 0.02 \\ 1 & 0 \end{pmatrix}$       $4.5\pi_0 + 50\pi_1$
    $R_4$    $\begin{pmatrix} 0.98 & 0.02 \\ 0 & 1 \end{pmatrix}$       $4.5\pi_0 + 9\pi_1$
    $R_5$    row for state 0: $(1, 0)$                                $5\pi_0$

    (c)

    Policy   $\pi_0$   $\pi_1$   Average Cost (if initially not dented)
    $R_1$    0.909     0.091     4.55
    $R_2$    0         1         9
    $R_3$    0.98      0.02      5.41
    $R_4$    0         1         9
    $R_5$    1         0         5

    The policy $R_1$ has the minimum cost, so it is optimal to park on the street in one space if not dented and to have it repaired if dented.

    19.2-4.

    (a) Let states 0 and 1 denote the good and the bad mood respectively. The decision in each state is between providing refreshments or not.

    Decision   Action                      State   Immediate Cost
    1          Provide refreshments        0       $C_{01} = 14$
    2          Not provide refreshments    0       $C_{02} = 0$
    1          Provide refreshments        1       $C_{11} = 14$
    2          Not provide refreshments    1       $C_{12} = 75$

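    (b) There are four possible stationary policies.

    State i   d_i(R_1)   d_i(R_2)   d_i(R_3)   d_i(R_4)
    0         1          1          2          2
    1         1          2          1          2

    Policy   Transition Matrix                                                   Expected Average Cost
    $R_1$    $\begin{pmatrix} 0.875 & 0.125 \\ 0.875 & 0.125 \end{pmatrix}$        $14\pi_0 + 14\pi_1$
    $R_2$    $\begin{pmatrix} 0.875 & 0.125 \\ 0.125 & 0.875 \end{pmatrix}$        $14\pi_0 + 75\pi_1$
    $R_3$    $\begin{pmatrix} 0.125 & 0.875 \\ 0.875 & 0.125 \end{pmatrix}$        $14\pi_1$
    $R_4$    $\begin{pmatrix} 0.125 & 0.875 \\ 0.125 & 0.875 \end{pmatrix}$        $75\pi_1$

    (c)

    Policy   $\pi_0$   $\pi_1$   Average Cost
    $R_1$    0.875     0.125     $g(R_1) = 14$
    $R_2$    0.5       0.5       $g(R_2) = 44.5$
    $R_3$    0.5       0.5       $g(R_3) = 7$
    $R_4$    0.125     0.875     $g(R_4) = 65.625$

    The optimal policy is $R_3$, i.e., to provide refreshments only if the group begins the night in a bad mood.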

    19.2-5.

    (a) Let state 0 denote point over, two serves to go on next point, and state 1 denote one serve left. The decision in each state is to attempt an ace or a lob.

    Decision Action State Immediate Cost

    Attempt ace

    Attempt lob

    " ! G " "

    # ! G " "

    !"$ # " ") $ $ )

    !#( " # () $ $ #%

    Attempt ace

    Attempt lob

    " " G " " "

    # " G " " "

    ""$ # " & ") $ $ ) #

    "#( " # " &) $ $ ) "#

    (b) There are four possible stationary deterministic policies.

    State i   d_i(R_1)   d_i(R_2)   d_i(R_3)   d_i(R_4)
    0         1          1          2          2
    1         1          2          1          2

    Policy Transition Matrix Expected Average Cost

    V G ") "#$) &)

    " !

    V G ") &"#$) &)

    " !

    V G (() ")

    " !

    " " ! "

    # # ! "

    $ $

    1 1

    1 1

    #% "#

    V G (#% &"#() ")

    " !

    1 1

    1 1

    ! "

    % % ! " (c)

    Policy Average Cost

    1 1! "

    " "

    # #

    $ $

    %

    V !'"& !$)& G !#(!V !'"& !$)& G !#$(V !))* !""" G !$"&V% !))* !""" G !$!'

    The optimal policy is $R_3$, i.e., to attempt lob in state 0 and ace in state 1.

    19.2-6.

    (a) Let states $i = 0, 1, 2$ represent the state of the market, 13,000, 14,000, and 15,000, respectively. The decision is between two funds, namely the Go-Go Fund and the Go-Slow Mutual Fund. All the costs are expressed in thousand dollars.

    Decision Action State Immediate Cost

    Invest in the Go-GoInvest in the Go-Slow

    " ! G !%#& !#'! ### ! G !%"!"!# ! !##& *

    " " G !$#& !$'! "! " G !$"! !$#& %&

    Invest in the Go-Go

    Invest in the Go-Slow""

    "#

    Invest in the Go-Go

    Invest in the Go-Slow

    " # G !"'! !%#& "'# # G !"#& !%"! '&

    #"

    ##

    (b) There are eight possible stationary policies.

    State i   d_i(R_1)   d_i(R_2)   d_i(R_3)   d_i(R_4)   d_i(R_5)   d_i(R_6)   d_i(R_7)   d_i(R_8)
    0         1          1          1          1          2          2          2          2
    1         1          1          2          2          2          1          1          2
    2         1          2          2          1          1          2          1          2

    All $R_i$'s have the same transition matrix: $P = \begin{pmatrix} 0.4 & 0.4 & 0.2 \\ 0.3 & 0.4 & 0.3 \\ 0.1 & 0.4 & 0.5 \end{pmatrix}$.

    Policy Expected Average Cost

    V G ## "!& "'V G ## "!& '&V G ## %& '&V G ## %& "'

    V G *

    " " ! " #

    # # ! " #

    $ $ ! " #

    % % ! " #

    & &

    1 1 1

    1 1 1

    1 1 1

    1 1 1

    1 1 11 1 1

    1 1 1

    1 1 1

    ! " #

    ' ' ! " #

    ( ( ! " #

    ) ) ! " #

    %& "'V G * "!& '&V G * "!& "'V G * %& '&

    (c) 1 !#&( !% !$%$

    Policy Average Cost

    V %$("V ('#*V #*

    V "*("V "$("V %#)'V "!#*V "))'

    "

    #

    $

    %&

    '

    (

    )

    The optimal policy is , i.e. to invest in the Go-Go Fund in states and , in the Go-V ! "&Slow Fund in state .#

    19.2-7.

    (a) Let states 0 and 1 represent whether the machine is broken down or is running respectively. The decision is between Buck and Bill.

    Decision   Action   State   Immediate Cost
    1          Buck     0       $C_{01} = 0$
    2          Bill     0       $C_{02} = 0$
    1          Buck     1       $C_{11} = -1200$
    2          Bill     1       $C_{12} = -1200$

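    (b) There are four possible stationary deterministic policies.

    State i   d_i(R_1)   d_i(R_2)   d_i(R_3)   d_i(R_4)
    0         1          1          2          2
    1         1          2          1          2

    Policy   Transition Matrix                                        Expected Average Cost
    $R_1$    $\begin{pmatrix} 0.4 & 0.6 \\ 0.6 & 0.4 \end{pmatrix}$     $-1200\pi_1$
    $R_2$    $\begin{pmatrix} 0.4 & 0.6 \\ 0.4 & 0.6 \end{pmatrix}$     $-1200\pi_1$
    $R_3$    $\begin{pmatrix} 0.5 & 0.5 \\ 0.6 & 0.4 \end{pmatrix}$     $-1200\pi_1$
    $R_4$    $\begin{pmatrix} 0.5 & 0.5 \\ 0.4 & 0.6 \end{pmatrix}$     $-1200\pi_1$

    (c)

    Policy   $\pi_0$   $\pi_1$   Average Cost
    $R_1$    0.5       0.5       $g(R_1) = -600$
    $R_2$    0.4       0.6       $g(R_2) = -720$
    $R_3$    0.545     0.455     $g(R_3) = -546$
    $R_4$    0.444     0.556     $g(R_4) = -667.2$

    The largest expected average profit is given by $R_2$.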
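For a two-state chain the steady-state probability of the running state has the closed form $\pi_1 = p_{01} / (p_{01} + p_{10})$, so the comparison in part (c) can be reproduced in a few lines. This is a minimal sketch, assuming the transition probabilities tabulated in part (b) and a profit of 1200 per running day; the dictionary names are illustrative. Because the table above uses steady-state probabilities rounded to three decimals, the unrounded profits for $R_3$ and $R_4$ differ slightly from 546 and 667.2.

```python
# P(broken -> running), by who works on the broken machine
REPAIR = {"Buck": 0.6, "Bill": 0.5}
# P(running -> broken), by who operates the running machine
BREAK = {"Buck": 0.6, "Bill": 0.4}

profit = {}
for fixer in REPAIR:
    for operator in BREAK:
        # Two-state steady state: pi_1 = p01 / (p01 + p10)
        pi_running = REPAIR[fixer] / (REPAIR[fixer] + BREAK[operator])
        profit[(fixer, operator)] = 1200 * pi_running

best = max(profit, key=profit.get)
for policy, value in profit.items():
    print(policy, round(value, 2))
print("best (state 0 decision, state 1 decision):", best)
```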

    19.2-8.

    (a) Let the states be the number of items in inventory at the beginning of the period and the decision be the number of items ordered. To conform to the software package, one needs to relabel the decisions as 1, 2, 3 respectively. The cost matrix is:

    $C_{ik}$   k = 1    k = 2    k = 3
    i = 0      40/3     56/3     24
    i = 1      4        19       -
    i = 2      4        -        -

    Let $R_3$ denote the policy to order 2 items when the inventory level is initially 0 and not to order when the inventory level is initially either 1 or 2. In other words, $d_0(R_3) = 3$ and $d_1(R_3) = d_2(R_3) = 1$.

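    $P(R_3) = \begin{pmatrix} 1/3 & 1/3 & 1/3 \\ 2/3 & 1/3 & 0 \\ 1/3 & 1/3 & 1/3 \end{pmatrix}$, $\pi = \left(\frac{4}{9}, \frac{3}{9}, \frac{2}{9}\right)$

    Expected average cost: $\frac{4}{9}C_{03} + \frac{3}{9}C_{11} + \frac{2}{9}C_{21} = \frac{116}{9} = 12.89$ dollars per period.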

    (b) There are $3^3 = 27$ stationary policies, since one can order 0, 1, or 2 items in each state. However, only six of these are feasible. The remaining 21 policies are infeasible, since the decision in at least one of the states leads to over capacity.

    State i   d_i(R_1)   d_i(R_2)   d_i(R_3)   d_i(R_4)   d_i(R_5)   d_i(R_6)
    0         1          2          3          1          2          3
    1         1          1          1          2          2          2
    2         1          1          1          1          1          1

    19.3-1.

    (a) minimize $C *C $C *C #)C $%C!" !# "" "# #" ##

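    subject to $y_{01} + y_{02} + y_{11} + y_{12} + y_{21} + y_{22} = 1$

    $y_{01} + y_{02} - \left(\frac{1}{2}y_{01} + \frac{1}{2}y_{02} + \frac{3}{10}y_{11} + \frac{2}{5}y_{12}\right) = 0$

    $y_{11} + y_{12} - \left(\frac{1}{2}y_{01} + \frac{1}{2}y_{02} + \frac{1}{2}y_{11} + \frac{1}{2}y_{12} + \frac{3}{5}y_{21} + \frac{4}{5}y_{22}\right) = 0$

    $y_{21} + y_{22} - \left(\frac{2}{10}y_{11} + \frac{1}{10}y_{12} + \frac{2}{5}y_{21} + \frac{1}{5}y_{22}\right) = 0$

    $y_{ik} \ge 0$ for $i = 0, 1, 2$ and $k = 1, 2$

    (b) Using the simplex method, we find $y_{01} = 0.32432$, $y_{11} = 0.54054$, $y_{22} = 0.13514$, and the remaining $y_{ik}$'s are zero. Hence, the optimal policy uses decision 1 in states 0 and 1, and decision 2 in state 2.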

    19.3-2.

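    (a) minimize $4.5y_{02} + 5y_{03} + 50y_{14} + 9y_{15}$

    subject to $y_{01} + y_{02} + y_{03} + y_{14} + y_{15} = 1$

    $y_{01} + y_{02} + y_{03} - \left(\frac{9}{10}y_{01} + \frac{49}{50}y_{02} + y_{03} + y_{14}\right) = 0$

    $y_{14} + y_{15} - \left(\frac{1}{10}y_{01} + \frac{1}{50}y_{02} + y_{15}\right) = 0$

    $y_{01}, y_{02}, y_{03}, y_{14}, y_{15} \ge 0$

    (b) Using the simplex method, all $y_{ik}$'s turn out to be zero except that $y_{01} = 0.90909$ and $y_{14} = 0.09091$, so the policy that uses decision 1 in state 0 and decision 4 in state 1 is optimal.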

    19.3-6.

    (a) minimize $-1200y_{11} - 1200y_{12}$

    subject to $y_{01} + y_{02} + y_{11} + y_{12} = 1$

    $y_{01} + y_{02} - (0.4y_{01} + 0.5y_{02} + 0.6y_{11} + 0.4y_{12}) = 0$

    $y_{11} + y_{12} - (0.6y_{01} + 0.5y_{02} + 0.4y_{11} + 0.6y_{12}) = 0$

    $y_{ik} \ge 0$ for $i = 0, 1$ and $k = 1, 2$

    (b) Using the simplex method, we find $y_{01} = 0.4$, $y_{12} = 0.6$, $y_{02} = y_{11} = 0$, so the optimal policy is to use decision 1 (Buck) in state 0 and decision 2 (Bill) in state 1.

    19.3-7.

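    (a) minimize $\frac{40}{3}y_{01} + \frac{56}{3}y_{02} + 24y_{03} + 4y_{11} + 19y_{12} + 4y_{21}$

    subject to $y_{01} + y_{02} + y_{03} + y_{11} + y_{12} + y_{21} = 1$

    $y_{01} + y_{02} + y_{03} - \left(y_{01} + \frac{2}{3}y_{02} + \frac{1}{3}y_{03} + \frac{2}{3}y_{11} + \frac{1}{3}y_{12} + \frac{1}{3}y_{21}\right) = 0$

    $y_{11} + y_{12} - \left(\frac{1}{3}y_{02} + \frac{1}{3}y_{03} + \frac{1}{3}y_{11} + \frac{1}{3}y_{12} + \frac{1}{3}y_{21}\right) = 0$

    $y_{21} - \left(\frac{1}{3}y_{03} + \frac{1}{3}y_{12} + \frac{1}{3}y_{21}\right) = 0$

    $y_{ik} \ge 0$ for $i = 0, 1, 2$ and $k = 1, 2, 3$

    (b) Using the simplex method, we find $y_{03} = 0.4444$, $y_{11} = 0.3333$, $y_{21} = 0.2222$, and the remaining $y_{ik}$'s are zero. Hence, the optimal policy is to order 2 items in state 0 and not to order in states 1 and 2.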

    19.4-1.

    19.4-2.

    19.4-3.

    19.4-4.

    19.4-5.

    19.4-6.

    19.4-7.

    19.4-8.

    When the number of pints of blood delivered can be specified at the time of delivery, the starting number of pints including the delivery will never exceed the largest possible demand in a period, so we can restrict our attention to states $i = 0, 1, 2, 3$. The admissible actions in state $i$ are to order $0 \le k \le 3 - i$. Given a decision $k$, the transition probabilities and the immediate cost are computed as follows:

    $p_{ij}(k) = P\{D = i + k - j\}$ if $j \ge 1$

    $p_{i0}(k) = P\{D \ge i + k\}$

    $C_{ik} = 50k + 100\,E[\max\{D - i - k, 0\}]$

    Initialization: $d_i(R_1) = 1$ for $i = 0, 1, 2$ and $d_3(R_1) = 0$.

    $P(R_1) = \begin{pmatrix} 0.6 & 0.4 & 0 & 0 \\ 0.3 & 0.3 & 0.4 & 0 \\ 0.1 & 0.2 & 0.3 & 0.4 \\ 0.1 & 0.2 & 0.3 & 0.4 \end{pmatrix}$, $C(R_1) = \begin{pmatrix} 90 \\ 60 \\ 50 \\ 0 \end{pmatrix}$

    Iteration 1:
    Step 1: Value determination:

    $g(R_1) = 90 + 0.6v_0(R_1) + 0.4v_1(R_1) - v_0(R_1)$

    $g(R_1) = 60 + 0.3v_0(R_1) + 0.3v_1(R_1) + 0.4v_2(R_1) - v_1(R_1)$

    $g(R_1) = 50 + 0.1v_0(R_1) + 0.2v_1(R_1) + 0.3v_2(R_1) + 0.4v_3(R_1) - v_2(R_1)$

    $g(R_1) = 0 + 0.1v_0(R_1) + 0.2v_1(R_1) + 0.3v_2(R_1) + 0.4v_3(R_1) - v_3(R_1)$

    $v_3(R_1) = 0$

    $g(R_1) = 57.8$, $v_0(R_1) = 196.3$, $v_1(R_1) = 115.9$, $v_2(R_1) = 50$, $v_3(R_1) = 0$

    Step 2: Policy improvement:

    State 0: minimize
    $100 + v_0(R_1) - v_0(R_1) = 100$
    $90 + 0.6v_0(R_1) + 0.4v_1(R_1) - v_0(R_1) = 57.8$
    $110 + 0.3v_0(R_1) + 0.3v_1(R_1) + 0.4v_2(R_1) - v_0(R_1) = 27.36$
    $150 + 0.1v_0(R_1) + 0.2v_1(R_1) + 0.3v_2(R_1) + 0.4v_3(R_1) - v_0(R_1) = 11.51$
    so $d_0(R_2) = 3$.

    State 1: minimize
    $40 + 0.6v_0(R_1) + 0.4v_1(R_1) - v_1(R_1) = 88.24$
    $60 + 0.3v_0(R_1) + 0.3v_1(R_1) + 0.4v_2(R_1) - v_1(R_1) = 57.8$
    $100 + 0.1v_0(R_1) + 0.2v_1(R_1) + 0.3v_2(R_1) + 0.4v_3(R_1) - v_1(R_1) = 41.91$
    so $d_1(R_2) = 2$.

    State 2: minimize
    $10 + 0.3v_0(R_1) + 0.3v_1(R_1) + 0.4v_2(R_1) - v_2(R_1) = 73.66$
    $50 + 0.1v_0(R_1) + 0.2v_1(R_1) + 0.3v_2(R_1) + 0.4v_3(R_1) - v_2(R_1) = 57.8$
    so $d_2(R_2) = 1$.

    $R_2$ is not identical to $R_1$, so the optimality test fails.
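The value-determination and policy-improvement steps above can be sketched in code. This is a minimal illustration, not the software used for the solution: it assumes the demand distribution $P\{D = 0, 1, 2, 3\} = 0.4, 0.3, 0.2, 0.1$ implied by the rows of $P(R_1)$ and the cost formula $C_{ik} = 50k + 100\,E[\max\{D - i - k, 0\}]$, and the function names (`solve`, `row`, `cost`, `evaluate`, `improve`) are hypothetical. It runs the policy improvement algorithm from $R_1$ until the policy stops changing.

```python
from fractions import Fraction as F

# Assumed demand distribution, read off the rows of P(R_1).
DEMAND = {0: F(4, 10), 1: F(3, 10), 2: F(2, 10), 3: F(1, 10)}

def solve(A, b):
    """Solve the square system A x = b exactly by Gauss-Jordan elimination."""
    n = len(A)
    M = [[F(x) for x in r] + [F(rhs)] for r, rhs in zip(A, b)]
    for c in range(n):
        pivot = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[pivot] = M[pivot], M[c]
        lead = M[c][c]
        M[c] = [x / lead for x in M[c]]
        for r in range(n):
            if r != c:
                factor = M[r][c]
                M[r] = [x - factor * y for x, y in zip(M[r], M[c])]
    return [M[i][n] for i in range(n)]

def row(i, k):
    """Transition probabilities out of state i when k pints are ordered."""
    p = [F(0)] * 4
    for d, pr in DEMAND.items():
        p[max(i + k - d, 0)] += pr
    return p

def cost(i, k):
    """Ordering cost plus expected shortage cost."""
    return 50 * k + 100 * sum(pr * max(d - i - k, 0) for d, pr in DEMAND.items())

def evaluate(policy):
    """Value determination with unknowns (g, v0, v1, v2) and v3 fixed at 0."""
    A, b = [], []
    for i in range(4):
        p = row(i, policy[i])
        # g + v_i - sum_j p_ij v_j = C_i, with the v3 column dropped
        A.append([1] + [(1 if i == j else 0) - p[j] for j in range(3)])
        b.append(cost(i, policy[i]))
    g, v0, v1, v2 = solve(A, b)
    return g, [v0, v1, v2, F(0)]

def improve(v):
    """Policy improvement: pick the minimizing order quantity in each state."""
    def q(i, k):
        return cost(i, k) + sum(p * x for p, x in zip(row(i, k), v)) - v[i]
    return [min(range(4 - i), key=lambda k, i=i: q(i, k)) for i in range(4)]

policy = [1, 1, 1, 0]          # R_1
trace = []
while True:
    g, v = evaluate(policy)
    trace.append((policy, g))
    new_policy = improve(v)
    if new_policy == policy:
        break
    policy = new_policy

for pol, avg in trace:
    print(pol, round(float(avg), 2))
```

Each entry of `trace` pairs a policy (order quantity per state) with its average cost per period, so the first entry corresponds to Iteration 1.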

    Iteration 2:
    Step 1: Value determination:

    $g(R_2) = 150 + 0.1v_0(R_2) + 0.2v_1(R_2) + 0.3v_2(R_2) + 0.4v_3(R_2) - v_0(R_2)$

    $g(R_2) = 100 + 0.1v_0(R_2) + 0.2v_1(R_2) + 0.3v_2(R_2) + 0.4v_3(R_2) - v_1(R_2)$

    $g(R_2) = 50 + 0.1v_0(R_2) + 0.2v_1(R_2) + 0.3v_2(R_2) + 0.4v_3(R_2) - v_2(R_2)$

    $g(R_2) = 0 + 0.1v_0(R_2) + 0.2v_1(R_2) + 0.3v_2(R_2) + 0.4v_3(R_2) - v_3(R_2)$

    $v_3(R_2) = 0$

    $g(R_2) = 50$, $v_0(R_2) = 150$, $v_1(R_2) = 100$, $v_2(R_2) = 50$, $v_3(R_2) = 0$

    Step 2: Policy improvement:

    State 0: minimize
    $100 + v_0(R_2) - v_0(R_2) = 100$
    $90 + 0.6v_0(R_2) + 0.4v_1(R_2) - v_0(R_2) = 70$
    $110 + 0.3v_0(R_2) + 0.3v_1(R_2) + 0.4v_2(R_2) - v_0(R_2) = 55$
    $150 + 0.1v_0(R_2) + 0.2v_1(R_2) + 0.3v_2(R_2) + 0.4v_3(R_2) - v_0(R_2) = 50$
    so $d_0(R_3) = 3$.

    State 1: minimize
    $40 + 0.6v_0(R_2) + 0.4v_1(R_2) - v_1(R_2) = 70$
    $60 + 0.3v_0(R_2) + 0.3v_1(R_2) + 0.4v_2(R_2) - v_1(R_2) = 55$
    $100 + 0.1v_0(R_2) + 0.2v_1(R_2) + 0.3v_2(R_2) + 0.4v_3(R_2) - v_1(R_2) = 50$
    so $d_1(R_3) = 2$.

    State 2: minimize
    $10 + 0.3v_0(R_2) + 0.3v_1(R_2) + 0.4v_2(R_2) - v_2(R_2) = 55$
    $50 + 0.1v_0(R_2) + 0.2v_1(R_2) + 0.3v_2(R_2) + 0.4v_3(R_2) - v_2(R_2) = 50$
    so $d_2(R_3) = 1$.

    $R_3$ is identical to $R_2$, so it is optimal to start every period with 3 pints of blood after delivery of the order.

    19.5-1.

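    Let states 0, 1, and 2 denote the $600, $800, and $1000 offers respectively, and let state 3 designate the case that the car has already been sold (state $\infty$ of the hint). Let decisions 1 and 2 be to reject and to accept the offer respectively.

    $C_{01} = C_{11} = C_{21} = 60$, $C_{02} = -600$, $C_{12} = -800$, and $C_{22} = -1000$

    $P(1) = \begin{pmatrix} 5/8 & 1/4 & 1/8 & 0 \\ 5/8 & 1/4 & 1/8 & 0 \\ 5/8 & 1/4 & 1/8 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$, $P(2) = \begin{pmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 \end{pmatrix}$

    Start with the policy to reject only the $600 offer. The relevant equations are:

    $V_0 = 60 + 0.95\left(\frac{5}{8}V_0 + \frac{1}{4}V_1 + \frac{1}{8}V_2\right)$

    $V_1 = -800 + 0.95V_3$

    $V_2 = -1000 + 0.95V_3$

    $V_3 = 0.95V_3$,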

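    $P(1) = \begin{pmatrix} 4/5 & 1/5 & 0 & 0 \\ 1/4 & 1/4 & 1/2 & 0 \\ 0 & 3/4 & 1/4 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$, $P(2) = \begin{pmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 \end{pmatrix}$

    Start with the policy to sell only when the price is $30. The relevant equations are:

    $V_0 = 0 + 0.9\left(\frac{4}{5}V_0 + \frac{1}{5}V_1\right)$

    $V_1 = 0 + 0.9\left(\frac{1}{4}V_0 + \frac{1}{4}V_1 + \frac{1}{2}V_2\right)$

    $V_2 = -30 + 0.9V_3$

    $V_3 = 0 + 0.9V_3$,

    which admit the unique solution $(V_0, V_1, V_2, V_3) = \left(-\frac{486}{35.3}, -\frac{756}{35.3}, -30, 0\right)$.

    Policy improvement:

    State 0 with decision 2: $-10 + 0.9V_3 = -10 > V_0$

    State 1 with decision 2: $-20 + 0.9V_3 = -20 > V_1$

    State 2 with decision 1: $0 + 0.9\left(\frac{3}{4}V_1 + \frac{1}{4}V_2\right) = -21.21 > V_2$

    Hence, the policy to hold the stock when the price is $10 and $20, and to sell it when the price is $30, is optimal.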
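The fixed-policy equations and the improvement test above can be checked numerically. The sketch below is an assumption-laden illustration: instead of solving the linear system directly, it iterates the fixed-policy equations until they converge (valid because the discount factor 0.9 makes the update a contraction), then evaluates the three alternative decisions. Variable names are illustrative only.

```python
ALPHA = 0.9

# Policy under evaluation: hold at $10 and $20 (states 0, 1), sell at $30 (state 2).
# State 3 (stock already sold) has value 0 and is left implicit.
V = [0.0, 0.0, 0.0]
for _ in range(1000):
    V = [
        ALPHA * (0.8 * V[0] + 0.2 * V[1]),
        ALPHA * (0.25 * V[0] + 0.25 * V[1] + 0.5 * V[2]),
        -30.0,
    ]

# Alternatives tested in the policy improvement step
sell_at_10 = -10.0
sell_at_20 = -20.0
hold_at_30 = ALPHA * (0.75 * V[1] + 0.25 * V[2])

# The current policy stays if no alternative gives a smaller discounted cost.
policy_is_optimal = (sell_at_10 >= V[0]) and (sell_at_20 >= V[1]) and (hold_at_30 >= V[2])

print([round(x, 2) for x in V])
print(round(hold_at_30, 2), policy_is_optimal)
```

Since costs are being minimized and sale proceeds enter as negative costs, an alternative would replace the current decision only if its value were smaller than the current $V_i$.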

    19.5-5.

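    (a) minimize $-10y_{02} - 20y_{12} - 30y_{22}$

    subject to $y_{01} + y_{02} - 0.9\left(\frac{4}{5}y_{01} + \frac{1}{4}y_{11}\right) = \frac{1}{3}$

    $y_{11} + y_{12} - 0.9\left(\frac{1}{5}y_{01} + \frac{1}{4}y_{11} + \frac{3}{4}y_{21}\right) = \frac{1}{3}$

    $y_{21} + y_{22} - 0.9\left(\frac{1}{2}y_{11} + \frac{1}{4}y_{21}\right) = \frac{1}{3}$

    $y_{ik} \ge 0$ for $i = 0, 1, 2$ and $k = 1, 2$

    (b) Using the simplex method, we find $y_{01} = 1.96059$, $y_{11} = 0.95851$, $y_{22} = 0.76463$, and the remaining $y_{ik}$'s are zero. Hence, the optimal policy is to hold the stock at the prices $10 and $20 and to sell it at the price $30.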

    19.5-6.

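    $V_0^n = \min\left\{0.9\left(\frac{4}{5}V_0^{n-1} + \frac{1}{5}V_1^{n-1}\right),\ -10\right\}$

    $V_1^n = \min\left\{0.9\left(\frac{1}{4}V_0^{n-1} + \frac{1}{4}V_1^{n-1} + \frac{1}{2}V_2^{n-1}\right),\ -20\right\}$

    $V_2^n = \min\left\{0.9\left(\frac{3}{4}V_1^{n-1} + \frac{1}{4}V_2^{n-1}\right),\ -30\right\}$

    $V_i^0 = 0$ for $i = 0, 1, 2$

    Iteration 1:
    $V_0^1 = \min\{0, -10\} = -10$ (Sell)
    $V_1^1 = \min\{0, -20\} = -20$ (Sell)
    $V_2^1 = \min\{0, -30\} = -30$ (Sell)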

    Iteration 2:
    $V_0^2 = \min\{-10.8, -10\} = -10.8$ (Hold)
    $V_1^2 = \min\{-20.25, -20\} = -20.25$ (Hold)
    $V_2^2 = \min\{-20.25, -30\} = -30$ (Sell)

    Iteration 3:
    $V_0^3 = \min\{-11.42, -10\} = -11.42$ (Hold)
    $V_1^3 = \min\{-20.49, -20\} = -20.49$ (Hold)
    $V_2^3 = \min\{-20.42, -30\} = -30$ (Sell)

    The approximate optimal solution is to sell if the price is $30 and to hold otherwise. This policy is indeed optimal, as found in Problems 19.5-3 and 19.5-4.
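The three iterations of successive approximations above can be reproduced with a short loop. This is a minimal sketch under the recursion as stated: the hold-transition matrix and the sale values (as negative costs) are taken from the equations above, while the variable names are illustrative.

```python
ALPHA = 0.9
# Transition probabilities between the prices $10, $20, $30 when holding
HOLD = [
    [0.8, 0.2, 0.0],
    [0.25, 0.25, 0.5],
    [0.0, 0.75, 0.25],
]
SELL = [-10.0, -20.0, -30.0]   # selling ends the process with a negative cost

V = [0.0, 0.0, 0.0]            # V_i^0 = 0
history = []
for n in range(1, 4):
    hold_value = [ALPHA * sum(p * v for p, v in zip(HOLD[i], V)) for i in range(3)]
    actions = ["Hold" if hold_value[i] < SELL[i] else "Sell" for i in range(3)]
    V = [min(hold_value[i], SELL[i]) for i in range(3)]
    history.append((list(V), actions))

for n, (values, actions) in enumerate(history, start=1):
    print(n, [round(x, 2) for x in values], actions)
```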

    19.5-7.

    (a) Let states 0 and 1 be the chemical produced this month, $C1$ and $C2$ respectively, and decisions 1 and 2 refer to the process to be used next month, $A$ and $B$ respectively. There are four stationary deterministic policies.

    State i   d_i(R_1)   d_i(R_2)   d_i(R_3)   d_i(R_4)
    0         1          1          2          2
    1         1          2          1          2

    The transition matrix is the same for every decision, viz.

    $P = \begin{pmatrix} 0.3 & 0.7 \\ 0.4 & 0.6 \end{pmatrix}$.

    The costs $C_{ik}$ correspond to the expected amount of pollution using the process $k$ in the next period.

    $C_{01} = 0.3(15) + 0.7(2) = 5.9$

    $C_{02} = 0.3(3) + 0.7(8) = 6.5$

    $C_{11} = 0.4(15) + 0.6(2) = 7.2$

    $C_{12} = 0.4(3) + 0.6(8) = 6$.

    (b)

    19.5-8.

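    (a) minimize $5.9y_{01} + 6.5y_{02} + 7.2y_{11} + 6y_{12}$

    subject to $y_{01} + y_{02} - \frac{1}{2}\left(\frac{3}{10}y_{01} + \frac{4}{10}y_{11} + \frac{3}{10}y_{02} + \frac{4}{10}y_{12}\right) = \frac{1}{2}$

    $y_{11} + y_{12} - \frac{1}{2}\left(\frac{7}{10}y_{01} + \frac{6}{10}y_{11} + \frac{7}{10}y_{02} + \frac{6}{10}y_{12}\right) = \frac{1}{2}$

    $y_{ik} \ge 0$ for $i = 0, 1$ and $k = 1, 2$

    (b) Using the simplex method, we find $y_{01} = 0.857$, $y_{12} = 1.143$, and $y_{02} = y_{11} = 0$. Hence, the optimal policy is to use process $A$ if $C1$ is produced and $B$ if $C2$ is produced this month.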
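With only two equality constraints, the linear program in part (a) is small enough to check without a simplex code. The sketch below swaps in a brute-force technique: it enumerates every pair of basic variables, solves the 2x2 system by Cramer's rule, and keeps the cheapest nonnegative solution. This relies on the fact that a linear program with a finite optimum attains it at a basic feasible solution. The names `column`, `best_y`, and `best_value` are illustrative.

```python
from fractions import Fraction as F
from itertools import combinations

ALPHA = F(1, 2)                                   # discount factor
BETA = [F(1, 2), F(1, 2)]                         # right-hand sides
P = [[F(3, 10), F(7, 10)], [F(4, 10), F(6, 10)]]  # same for both processes
COST = {(0, 1): F(59, 10), (0, 2): F(65, 10), (1, 1): F(72, 10), (1, 2): F(6)}

def column(i):
    """Constraint coefficients of any variable y_ik with state i (independent of k here)."""
    return [(1 if i == j else 0) - ALPHA * P[i][j] for j in range(2)]

best_value, best_y = None, None
for a, b in combinations(sorted(COST), 2):
    (a0, a1), (b0, b1) = column(a[0]), column(b[0])
    det = a0 * b1 - a1 * b0
    if det == 0:
        continue                                  # the two columns do not form a basis
    ya = (BETA[0] * b1 - BETA[1] * b0) / det      # Cramer's rule
    yb = (a0 * BETA[1] - a1 * BETA[0]) / det
    if ya < 0 or yb < 0:
        continue                                  # basic but infeasible
    value = COST[a] * ya + COST[b] * yb
    if best_value is None or value < best_value:
        best_value, best_y = value, {a: ya, b: yb}

print({k: round(float(v), 3) for k, v in best_y.items()}, round(float(best_value), 3))
```

The keys of `best_y` are (state, decision) pairs, so a positive value at `(0, 1)` means process $A$ after $C1$ and a positive value at `(1, 2)` means process $B$ after $C2$.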

    19.5-9.

    19.5-10.

    The three iterations of successive approximations in Problem 19.5-9 give the optimal policy for the three-period problem. The optimal policy is, therefore, to use the process $A$ if $C1$ is produced and $B$ if $C2$ is produced in all periods.

    19.5-11.

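    $V_0^n = \min\left\{0 + 0.9\left(\frac{7}{8}V_1^{n-1} + \frac{1}{16}V_2^{n-1} + \frac{1}{16}V_3^{n-1}\right),\ 4000 + 0.9V_1^{n-1},\ 6000 + 0.9V_0^{n-1}\right\}$

    $V_1^n = \min\left\{1000 + 0.9\left(\frac{3}{4}V_1^{n-1} + \frac{1}{8}V_2^{n-1} + \frac{1}{8}V_3^{n-1}\right),\ 4000 + 0.9V_1^{n-1},\ 6000 + 0.9V_0^{n-1}\right\}$

    $V_2^n = \min\left\{3000 + 0.9\left(\frac{1}{2}V_2^{n-1} + \frac{1}{2}V_3^{n-1}\right),\ 4000 + 0.9V_1^{n-1},\ 6000 + 0.9V_0^{n-1}\right\}$

    $V_3^n = 6000 + 0.9V_0^{n-1}$

    $V_i^0 = 0$ for $i = 0, 1, 2, 3$

    Iteration 1:
    $V_0^1 = \min\{0, 4000, 6000\} = 0$ (Do nothing)
    $V_1^1 = \min\{1000, 4000, 6000\} = 1000$ (Do nothing)
    $V_2^1 = \min\{3000, 4000, 6000\} = 3000$ (Do nothing)
    $V_3^1 = 6000$ (Replace)

    Iteration 2:
    $V_0^2 = \min\{1293.75, 4900, 6000\} = 1293.75$ (Do nothing)
    $V_1^2 = \min\{2687.5, 4900, 6000\} = 2687.5$ (Do nothing)
    $V_2^2 = \min\{7050, 4900, 6000\} = 4900$ (Overhaul)
    $V_3^2 = 6000$ (Replace)

    Iteration 3:
    $V_0^3 = \min\{2729.53, 6418.75, 7164.38\} = 2729.53$ (Do nothing)
    $V_1^3 = \min\{4040.31, 6418.75, 7164.38\} = 4040.31$ (Do nothing)
    $V_2^3 = \min\{7905, 6418.75, 7164.38\} = 6418.75$ (Overhaul)
    $V_3^3 = 7164.38$ (Replace)

    Iteration 4:
    $V_0^4 = \min\{3945.80, 7636.28, 8456.58\} = 3945.80$ (Do nothing)
    $V_1^4 = \min\{5255.31, 7636.28, 8456.58\} = 5255.31$ (Do nothing)
    $V_2^4 = \min\{9112.41, 7636.28, 8456.58\} = 7636.28$ (Overhaul)
    $V_3^4 = 8456.58$ (Replace)

    The optimal policy is to do nothing in states 0 and 1 and to replace in state 3 in all periods. When in state 2, it is best to overhaul in periods 1, 2, 3 and to do nothing in period 4.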
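The four iterations above follow directly from the recursion and can be regenerated programmatically. The sketch below is illustrative only: it assumes the costs and transition probabilities that appear in the recursion (do nothing, overhaul to state 1 at 4000, replace to state 0 at 6000, discount factor 0.9), and the names `NOTHING`, `step`, and `plans` are hypothetical. Iteration $n$ corresponds to $n$ periods remaining, so iteration 1 is period 4 of the four-period problem.

```python
ALPHA = 0.9
# "Do nothing": (immediate cost, next-state probabilities) for states 0, 1, 2
NOTHING = {
    0: (0, {1: 7 / 8, 2: 1 / 16, 3: 1 / 16}),
    1: (1000, {1: 3 / 4, 2: 1 / 8, 3: 1 / 8}),
    2: (3000, {2: 1 / 2, 3: 1 / 2}),
}

def step(V):
    """One iteration of successive approximations; returns new values and minimizing actions."""
    new_values, actions = [], []
    for i in range(4):
        options = {}
        if i in NOTHING:
            c, p = NOTHING[i]
            options["do nothing"] = c + ALPHA * sum(pr * V[j] for j, pr in p.items())
            options["overhaul"] = 4000 + ALPHA * V[1]
        options["replace"] = 6000 + ALPHA * V[0]
        best = min(options, key=options.get)
        new_values.append(options[best])
        actions.append(best)
    return new_values, actions

V = [0.0, 0.0, 0.0, 0.0]       # V_i^0 = 0
plans = []
for n in range(1, 5):
    V, actions = step(V)
    plans.append(actions)
    print(n, [round(x, 2) for x in V], actions)
```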