Markov Decision Models for Order Acceptance/Rejection Problems Florian Defregger and Heinrich Kuhn...
Markov Decision Models for Order Acceptance/Rejection Problems
Florian Defregger and Heinrich Kuhn, Catholic University of Eichstätt-Ingolstadt
Fifth International Conference on "Analysis of Manufacturing Systems - Production Management"
Zakynthos, May 24th, 2005
May 24, 2005 2
Structure
1. Introduction
2. Decision Problem
3. Markov Decision Model
4. Solution Procedure
5. Numerical Results
Introduction
Revenue Management (RM)
– Service industries (air transportation, hotels, car rental, etc.)
– Manufacturing industries (steel, paper, aluminum, etc.), see Kniker/Burman (2001)
– Implementations of RM systems have increased profits by 2–10%.
Introduction
Which kind of manufacturing company could potentially use revenue management to increase its bottom line?
a) high fixed costs
b) a short-term capacity increase to meet demand peaks is very expensive or even impossible
c) demand fluctuates over time
d) customers are willing to pay different prices for essentially the same product
Decision problem
Assumptions
• One single bottleneck in the manufacturing process
• Orders:
  • specific price, volume, and lead time (due date)
  • at most one order arrives in a given time period
  • arrivals are independent of one another
• Products can be made to stock
• Limited inventory capacity
• Infinite planning horizon
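Decision problem
1. Accept the order? yes/no
2. If yes, how much inventory should be used?
[Figure: incoming orders are either rejected or accepted; accepted orders are delivered from the machine and/or from inventory. A timeline shows the machine reserved for orders k and m accepted before today and the new order n with its maximum lead time l_n.]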
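Notation
• N order classes, $n \in \{1, \dots, N\}$.
• Each order can be assigned to one order class.
• Parameters for orders of class n:
  $m_n$: profit margin
  $u_n$: capacity usage
  $l_n$: lead time
  $p_n$: probability of arriving
• Dummy order class 0 (no order arrives): $m_0 = u_0 = l_0 = 0$, $p_0 = 1 - \sum_{n=1}^{N} p_n$
[Figure: in each period, an order of class $n \in \{0, 1, \dots, N\}$ arrives today with probability $p_n$ and carries the parameters $m_n$, $u_n$, $l_n$.]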
Notation
Inventory:
$I_{max}$: maximum inventory level
$i$: inventory level, $i \in \{0, 1, \dots, I_{max}\}$
$h$: inventory holding cost per unit of inventory per period
The inventory level i is expressed in machine periods, i.e., the number of periods the machine needed to produce that inventory.
Notation
[Figure: a sequence of states (n, c, i) from today onward, linked by transition probabilities]
States $(n, c, i) \in S$ (state space):
n: order class of the order that arrived at the beginning of the current period
c: number of periods for which the machine is already reserved for orders that were accepted but are not yet finished, $c \in \{0, 1, \dots, H\}$
i: current inventory level
H − c: available capacity within the considered horizon H
Problem size:
$|S| = (N + 1) \cdot (H + 1) \cdot (I_{max} + 1)$, with $H = \max(\max_n l_n, 1)$
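[Figure: timeline from today showing orders k, m, n with lead times $l_k$, $l_m$, $l_n$, the capacity usage $u_n$ of order n, and the maximum horizon H given by the maximum lead time]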
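The state space can be enumerated directly from the ranges above. The following is a minimal Python sketch, not the authors' implementation: the function names are hypothetical, and the horizon H is passed in as a parameter rather than derived from the lead times.

```python
from itertools import product


def state_space(num_classes, horizon, max_inventory):
    """Enumerate all states (n, c, i).

    n: order class of the arriving order, 0..N (0 = dummy class, no order)
    c: periods for which the machine is already reserved, 0..H
    i: inventory level in machine periods, 0..I_max
    """
    return list(product(range(num_classes + 1),
                        range(horizon + 1),
                        range(max_inventory + 1)))


def state_space_size(num_classes, horizon, max_inventory):
    """|S| = (N + 1)(H + 1)(I_max + 1), computed without enumerating states."""
    return (num_classes + 1) * (horizon + 1) * (max_inventory + 1)


if __name__ == "__main__":
    # Hypothetical small instance: N = 3, H = 4, I_max = 2.
    print(len(state_space(3, 4, 2)), state_space_size(3, 4, 2))  # 60 60
```

Because |S| is a product of three ranges, the state space grows linearly in each of N, H, and I_max, which is why instances with long lead times and large inventories quickly reach hundreds of thousands of states.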
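Sequence of Decisions
[Flowchart: for an incoming order, decide whether to accept it and whether to replenish inventory; replenishing is only possible if the machine is not busy. Rejecting without replenishing gives D1, rejecting with replenishing gives D2, accepting without replenishing gives D3(r) after deciding how many units r to take from inventory, and accepting with replenishing gives D4.]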
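Decisions $D[(n, c, i)]$ in state (n, c, i) (n: order class, c: machine usage, i: inventory level):
D1 := reject and do not raise inventory level
D2 := reject and raise inventory level: $c = 0 \wedge i < I_{max}$
D3(r) := accept, do not raise inventory level, and satisfy the order with r units from inventory: $n > 0 \wedge (c + u_n \le l_n + i \vee u_n \le i)$, $r \in \{r_{min}, \dots, r_{max}\}$ with $r_{min} = \min(\max(0, c + u_n - l_n), \min(i, u_n))$ and $r_{max} = \min(i, u_n)$
D4 := accept, satisfy the order completely from inventory, and raise inventory level: $n > 0 \wedge c = 0 \wedge u_n \le i$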
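The feasibility conditions of the four decisions translate into a small function. This is a sketch under the conditions stated above; the `OrderClass` type, the tuple encoding of decisions, and the function name are illustrative assumptions, not part of the original model description. Class 0 is the dummy class (no order).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderClass:
    margin: float   # m_n, profit margin
    usage: int      # u_n, capacity usage in machine periods
    lead_time: int  # l_n, lead time in periods


def feasible_decisions(state, classes, max_inventory):
    """Return the feasible decisions in state (n, c, i).

    classes[0] is the dummy class (no order arrived). Decisions are encoded
    as ("D1",), ("D2",), ("D3", r) and ("D4",).
    """
    n, c, i = state
    decisions = [("D1",)]                       # reject, do not raise inventory
    if c == 0 and i < max_inventory:
        decisions.append(("D2",))               # reject, produce one unit to stock
    if n > 0:
        u, l = classes[n].usage, classes[n].lead_time
        if c + u <= l + i or u <= i:            # order can be delivered in time
            r_max = min(i, u)
            r_min = min(max(0, c + u - l), r_max)
            decisions.extend(("D3", r) for r in range(r_min, r_max + 1))
        if c == 0 and u <= i:
            decisions.append(("D4",))           # deliver from stock, machine refills
    return decisions


if __name__ == "__main__":
    # Dummy class plus the three classes of the example in the numerical results.
    classes = [OrderClass(0, 0, 0), OrderClass(20, 4, 10),
               OrderClass(60, 4, 4), OrderClass(100, 4, 2)]
    print(feasible_decisions((3, 0, 2), classes, max_inventory=8))
    # [('D1',), ('D2',), ('D3', 2)]
```

In state (2, 0, 5) of this example all D3(r) with r = 0, ..., 4 are feasible together with D1, D2, and D4, i.e., 8 decisions, which matches the bound $4 + \min(\max_n u_n, I_{max})$ for $u_n = 4$.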
Rewards
[Figure: inventory level before and after each decision: D1 keeps i, D2 raises it to i + 1, D3(r) lowers it to i − r, D4 lowers it to i − u_n]
$R_{D1} = R_{D2} = -h \cdot i$
$R_{D3(r)} = m_n - h \cdot (i - r)$
$R_{D4} = m_n - h \cdot (i - u_n)$
D1: reject and do not raise inventory level
D2: reject and raise inventory level
D3: accept and do not raise inventory level
D4: accept and raise inventory level
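The reward formulas can be checked with a direct transcription. A minimal sketch, assuming decisions are encoded as tuples ("D1",), ("D2",), ("D3", r), ("D4",); the function name and encoding are illustrative, not from the talk.

```python
def reward(decision, state, margins, usages, h):
    """One-period reward of a decision taken in state (n, c, i).

    decision: ("D1",), ("D2",), ("D3", r) or ("D4",)
    margins[n] = m_n, usages[n] = u_n, h = holding cost per unit and period
    """
    n, _c, i = state
    kind = decision[0]
    if kind in ("D1", "D2"):
        return -h * i                                # holding cost only
    if kind == "D3":
        r = decision[1]
        return margins[n] - h * (i - r)              # margin, r units leave stock
    if kind == "D4":
        return margins[n] - h * (i - usages[n])      # margin, u_n units leave stock
    raise ValueError(f"unknown decision: {decision!r}")
```

Holding costs are charged on the inventory that remains after the decision, so using stock for an order both earns the margin and lowers the holding cost of that period.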
Time-discrete Markov Decision Process
Objective: find the best action for every state in order to maximize the long-term average reward per period.
Number of decision possibilities per state: at most $|D| = 4 + \min(\max_n u_n, I_{max})$
[Figure: sequence of states over time starting today; in each state a decision yields a reward, and the transition probabilities determine the next state]
Transition Probabilities (machine is busy)
$P_{D1}[(n, c, i), (m, c - 1, i)] = p_m$ if $(n, c, i) \in \{S : c > 0\}$, $m \in \{0, \dots, N\}$; 0 else
$P_{D3(r)}[(n, c, i), (m, c + u_n - r - 1, i - r)] = p_m$ if $(n, c, i) \in S$, $m \in \{0, \dots, N\}$, $r \in \{\min(\max(0, c + u_n - l_n), \min(i, u_n)), \dots, \min(i, u_n)\}$; 0 else
n, m: order class; c: machine usage; i: inventory level
D1: reject and do not raise inventory level
D3: accept and do not raise inventory level
[Figure: when the machine is busy, state (n, c, i) moves with probability $p_m$ to (m, c − 1, i) under D1 and to (m, c + u_n − r − 1, i − r) under D3(r)]
Transition Probabilities (machine is not busy)
$P_{D1}[(n, 0, i), (m, 0, i)] = p_m$ if $n, m \in \{0, \dots, N\}$, $i \in \{0, \dots, I_{max}\}$; 0 else
$P_{D2}[(n, 0, i), (m, 0, i + 1)] = p_m$ if $n, m \in \{0, \dots, N\}$, $i \in \{0, \dots, I_{max} - 1\}$; 0 else
$P_{D3(r)}[(n, 0, i), (m, \max(0, u_n - r - 1), i - r)] = \dots$
$P_{D4}[(n, 0, i), (m, 0, i - u_n + 1)] = p_m$ if $(n, 0, i) \in S$, $m \in \{0, \dots, N\}$; 0 else
n, m: order class; c: machine usage; i: inventory level
D1: reject and do not raise inventory level
D2: reject and raise inventory level
D3: accept and do not raise inventory level
D4: accept and raise inventory level
[Figure: when the machine is not busy, state (n, 0, i) moves with probability $p_m$ to (m, 0, i) under D1, to (m, 0, i + 1) under D2, to (m, max(0, u_n − r − 1), i − r) under D3(r), and to (m, 0, i − u_n + 1) under D4]
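The busy and idle cases can be folded into one successor function, because $\max(0, \cdot)$ reproduces both rules: under D1 an idle machine stays at c = 0, and under D3(r) with c > 0 the term $c + u_n - r - 1$ is never negative since $r \le u_n$. A minimal sketch; the tuple encoding of decisions and the function name are assumptions, and the arriving class m is drawn independently with probability $p_m$.

```python
def transitions(decision, state, probs, usages):
    """Successor distribution of a decision in state (n, c, i).

    probs[m] = p_m for m = 0..N, including the dummy class 0, so the
    probabilities sum to 1. Returns a list of (p_m, (m, c', i')).
    """
    n, c, i = state
    kind = decision[0]
    if kind == "D1":
        c2, i2 = max(0, c - 1), i             # busy: c - 1; idle: stays 0
    elif kind == "D2":
        c2, i2 = 0, i + 1                     # idle machine produces one unit
    elif kind == "D3":
        r = decision[1]
        c2, i2 = max(0, c + usages[n] - r - 1), i - r
    elif kind == "D4":
        c2, i2 = 0, i - usages[n] + 1         # stock delivery, one unit produced
    else:
        raise ValueError(f"unknown decision: {decision!r}")
    return [(p, (m, c2, i2)) for m, p in enumerate(probs) if p > 0]
```

Only (c, i) depends on the decision; the next order class m is independent of it, which is why every transition probability on these slides equals $p_m$.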
Solution Procedure
This Markov Decision Process can be solved via standard methods, e.g., linear programming, policy iteration, or value iteration.
However, for large problem instances the computation times are too long (see Numerical Results).
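As an illustration of one standard method, the following is a generic sketch of relative value iteration for average-reward MDPs, not the authors' implementation. It works on any small MDP given as explicit lists; the model above could be encoded this way by listing, for each state, its feasible decisions with their rewards and successor distributions. It assumes a unichain, aperiodic model (periodic chains may need an aperiodicity transformation). Each sweep touches every state, decision, and successor, which is why run times grow quickly with |S|.

```python
def relative_value_iteration(mdp, ref_state=0, tol=1e-9, max_iter=100_000):
    """Relative value iteration for an average-reward MDP.

    mdp[s] lists the actions of state s; each action is a pair
    (reward, [(prob, next_state), ...]). Returns (gain, policy), where
    policy[s] is the index of a greedy action in state s.
    Assumes a unichain, aperiodic model.
    """
    num_states = len(mdp)
    h = [0.0] * num_states                      # relative values
    for _ in range(max_iter):
        q = [[r + sum(p * h[t] for p, t in succ) for r, succ in actions]
             for actions in mdp]
        th = [max(qs) for qs in q]              # Bellman update T h
        diff = [th[s] - h[s] for s in range(num_states)]
        lo, hi = min(diff), max(diff)           # bounds on the optimal gain
        h = [v - th[ref_state] for v in th]     # renormalize at ref_state
        if hi - lo < tol:
            break
    policy = [max(range(len(qs)), key=qs.__getitem__) for qs in q]
    return (lo + hi) / 2, policy


if __name__ == "__main__":
    # Two-state toy MDP: in state 0, stay (reward 1) or move to state 1 (reward 0);
    # state 1 keeps earning 2 per period.
    toy = [[(1.0, [(1.0, 0)]), (0.0, [(1.0, 1)])],
           [(2.0, [(1.0, 1)])]]
    print(relative_value_iteration(toy))  # (2.0, [1, 0])
```

Subtracting the value of a reference state keeps the iterates bounded; the spread between the smallest and largest change per sweep brackets the optimal average reward and serves as the stopping criterion.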
Solution Procedure
Heuristic:
Objective: find good policies in acceptable run times
Idea: reject "bad" order classes and accept "good" order classes
"Goodness" of an order class: relative profit margin $m_n / u_n$ [profit per unit of capacity usage]
[Figure: order classes 0 to 5 sorted in ascending order of relative profit margin; low-margin classes are rejected under all circumstances, intermediate classes are accepted in favorable situations, and high-margin classes are accepted if possible. Cell legend: "reject, acceptance not possible", "reject, although acceptance possible", "accept".]
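Ranking the order classes by relative profit margin is a one-line sort. A minimal sketch; the function name is hypothetical and index 0 (the dummy class) is skipped.

```python
def sort_by_relative_margin(margins, usages):
    """Order classes 1..N sorted ascending by relative profit margin m_n / u_n.

    Index 0 is the dummy class and is skipped.
    """
    return sorted(range(1, len(margins)), key=lambda n: margins[n] / usages[n])


if __name__ == "__main__":
    # Three-class example from the numerical results: m_n = 20, 60, 100; u_n = 4.
    margins, usages = [0, 20, 60, 100], [0, 4, 4, 4]
    for n in sort_by_relative_margin(margins, usages):
        print(n, margins[n] / usages[n])  # 1 5.0, then 2 15.0, then 3 25.0
```

Dividing by capacity usage matters when classes differ in $u_n$: a class with a high absolute margin can still rank low if it ties up the bottleneck for many periods.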
Solution Procedure
Consider an "accept in favorable situations" order class, e.g., n = 2 or n = 3: acceptance becomes more likely with lower machine usage or higher inventory levels.
[Figure: decisions for an order class with capacity usage $u_n = 8$ and lead time $l_n = 5$ over machine usage (0 to 7) and inventory level (0 to 10); each cell is marked "reject, acceptance not possible", "reject, although acceptance possible", or "accept". The minimum inventory needed for acceptance is $u_n - l_n = 3$.]
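The "acceptance not possible" region of this example follows from the D3 feasibility condition alone. The sketch below reproduces only that region, not the policy choice between "reject, although acceptance possible" and "accept"; the function names are hypothetical.

```python
def acceptance_possible(c, i, usage, lead_time):
    """True if an order with u_n = usage, l_n = lead_time can be accepted in (c, i).

    Same condition as for D3(r): c + u_n <= l_n + i, or the whole order
    can be taken from inventory (u_n <= i).
    """
    return c + usage <= lead_time + i or usage <= i


def min_inventory_needed(c, usage, lead_time):
    """Smallest inventory level i for which acceptance is possible."""
    return min(max(0, c + usage - lead_time), usage)


if __name__ == "__main__":
    u_n, l_n = 8, 5                        # values from the example above
    print("minimum inventory at c = 0:", min_inventory_needed(0, u_n, l_n))  # 3
    for c in range(8):                     # machine usage 0..7
        row = "".join("x" if acceptance_possible(c, i, u_n, l_n) else "."
                      for i in range(11))  # inventory level 0..10
        print(c, row)
```

Each additional period of machine usage raises the required inventory by one unit, until the order has to be taken completely from stock ($u_n = 8$ units).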
Solution Procedure
The result is a combinatorial optimization problem in N dimensions (one acceptance level per order class).
Idea for the heuristic: evaluate the average reward of selected policies $A^T = (a_1, a_2, \dots, a_N)$ via simulation and find good policies by comparing the simulation results.
Example: N = 5
[Figure: acceptance levels $a_n$ for order classes n = 1, ..., 5, each between $\max(0, u_n - l_n)$ and $I_{max} + 1$]
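The slides do not specify how a policy is simulated, so the following Monte Carlo sketch rests on stated assumptions: $a_n$ is read as the minimum inventory level at which class n is accepted ($a_n = I_{max} + 1$ closes the class), an accepted order takes the smallest admissible r from stock, and an idle machine produces to stock (D2) whenever an order is rejected and inventory is below $I_{max}$. The function name `simulate_policy` is hypothetical.

```python
import random


def simulate_policy(levels, margins, usages, lead_times, probs,
                    max_inventory, h, periods=100_000, seed=0):
    """Estimate the average reward per period of an acceptance-level policy.

    levels[n] = a_n: class n is accepted only if the inventory level is at
    least a_n and on-time delivery is possible (levels[0] is unused).
    """
    rng = random.Random(seed)
    classes = range(len(probs))
    c, i, total = 0, 0, 0.0
    for _ in range(periods):
        n = rng.choices(classes, weights=probs)[0]   # arriving order class
        u, l = usages[n], lead_times[n]
        if n > 0 and i >= levels[n] and (c + u <= l + i or u <= i):
            r = min(max(0, c + u - l), min(i, u))    # smallest admissible r
            if c == 0 and r == u:                    # D4: from stock, refill
                total += margins[n] - h * (i - u)
                i = i - u + 1
            else:                                    # D3(r)
                total += margins[n] - h * (i - r)
                c, i = max(0, c + u - r - 1), i - r
        else:
            total -= h * i                           # D1 or D2
            if c == 0 and i < max_inventory:
                i += 1                               # D2: produce to stock
            else:
                c = max(0, c - 1)                    # D1
    return total / periods
```

Using a fixed seed for every candidate policy (common random numbers) makes the simulation comparisons between policies less noisy than independent runs would be.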
Solution Procedure
Policy i:
• order classes $n \in \{0, 1, \dots, i\}$ are completely rejected
• order classes $n \in \{i + 1, \dots, N\}$ are completely accepted
• R(i): average reward of policy i
[Figure: acceptance levels $a_n$, between $\max(0, u_n - l_n)$ and $I_{max} + 1$, for n = 1, ..., 5; the first two policies to be compared are policy i = 0 and policy i = 1]
Solution Procedure
Procedure:
• Sort the order classes in ascending order of their relative profit margins.
• Close order classes successively, n = 1, 2, ..., until the maximum average reward is reached.
• The last order class that was closed is called n*; the corresponding policy has the maximum average reward R*.
  n* := 0, R* := R(0)
  for n = 1, 2, ..., N
    if R(n) > R* then R* := R(n), n* := n
    else exit for
  endfor
[Figure: resulting acceptance levels $a_n$ for N = 5 with n* = 2, ranging between $\max(0, u_n - l_n)$ and $I_{max} + 1$]
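The successive closing step can be sketched with the evaluation passed in as a callback. Here `evaluate(k)` is a hypothetical stand-in for the simulated average reward R(k) of policy k; in the test and example it is just a lookup table of made-up numbers.

```python
def close_classes_greedily(evaluate, num_classes):
    """Close order classes 1, 2, ... successively while the average reward improves.

    Classes are assumed to be sorted ascending by m_n / u_n. evaluate(k) returns
    R(k), the average reward of policy k (classes 0..k rejected, k+1..N accepted).
    Returns (n_star, R_star).
    """
    n_star, r_star = 0, evaluate(0)
    for n in range(1, num_classes + 1):
        r_n = evaluate(n)
        if r_n > r_star:
            n_star, r_star = n, r_n
        else:
            break                      # reward stopped increasing: exit the loop
    return n_star, r_star


if __name__ == "__main__":
    # Made-up rewards R(0), ..., R(5) for illustration only.
    rewards = [5.0, 6.0, 7.0, 6.5, 8.0, 1.0]
    print(close_classes_greedily(rewards.__getitem__, 5))  # (2, 7.0)
```

The loop stops at the first decrease, so at most N + 1 policies are simulated; the price is that a later, higher reward (R(4) = 8.0 in the made-up data) is never examined.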
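Solution Procedure
Further improvement of the policy:
• Close half of the order class to the right of n*, i.e., n = n* + 1
• Open half of order class n*
• Determine which policy yields the maximum average reward
[Figure: two candidate policies for N = 5, one with class n* + 1 half closed ($a_{n^*+1}$ raised) and one with class n* half opened ($a_{n^*}$ lowered); acceptance levels range between $\max(0, u_n - l_n)$ and $I_{max} + 1$]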
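The improvement step compares at most three policies. In this sketch, "half" is interpreted as the integer midpoint between the open level $\max(0, u_n - l_n)$ and the closed level $I_{max} + 1$; that interpretation, the function name, and the `evaluate` callback (a stand-in for the simulated average reward) are assumptions.

```python
def refine_policy(levels, n_star, open_levels, closed_level, evaluate):
    """Compare the greedy policy with 'half' variants and return the best one.

    levels[n] = a_n for n = 1..N (index 0 unused); open_levels[n] = max(0, u_n - l_n);
    closed_level = I_max + 1; evaluate(levels) returns an average reward.
    """
    candidates = [list(levels)]
    if n_star + 1 < len(levels):               # half-close class n* + 1
        half_closed = list(levels)
        half_closed[n_star + 1] = (open_levels[n_star + 1] + closed_level) // 2
        candidates.append(half_closed)
    if n_star >= 1:                            # half-open class n*
        half_open = list(levels)
        half_open[n_star] = (open_levels[n_star] + closed_level) // 2
        candidates.append(half_open)
    return max(candidates, key=evaluate)
```

Keeping the unchanged greedy policy among the candidates guarantees that the refinement never returns a policy with a lower evaluated reward.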
Numerical Results
Problem classes
problem class            1          2          3          4          5
number of states         10,000     50,000     100,000    500,000    1,000,000
number of instances      100        100        100        100        100
order classes            [5,20]     [5,20]     [10,30]    [20,50]    [20,50]
maximum inventory        10         15         20         50         100
relative profit margin   [1,3]      [1,3]      [1,3]      [1,3]      [1,3]
maximum lead time        151        520        423        466        471
inventory cost           0.01       0.01       0.01       0.01       0.01
traffic intensity        [1.5,2.5]  [1.5,2.5]  [1.5,2.5]  [1.5,2.5]  [1.5,2.5]
Numerical Results
Average reward per period: FCFS policy vs. value iteration algorithm
problem class                    1       2       3        4        5
proportion optimum [%]           99      93      94       0        0
runtime value iteration [sec.]   82.3    880.9   1584.1   3681.3   3741.1
average [%]                      4.4     3.8     4.0      2.4      -8.5
minimum [%]                      0.0     0.0     0.0      -3.0     -69.9
maximum [%]                      18.3    33.9    34.2     22.2     8.6
standard deviation [%]           4.7     6.2     6.0      3.9      13.6
Numerical Results
Average reward per period: heuristic procedure vs. value iteration algorithm
problem class                         1       2       3
proportion optimum [%]                99      93      94
running time heuristic [sec.]         42.8    92.8    115.3
running time value iteration [sec.]   82.3    880.9   1584.1
average [%]                           1.7     1.8     1.5
minimum [%]                           0.0     0.0     0.0
maximum [%]                           17.9    33.9    23.1
standard deviation [%]                2.9     4.8     3.1
Numerical Results
Average reward per period: FCFS policy vs. heuristic procedure
problem class              1       2       3       4       5
runtime FCFS [sec.]        15.0    62.8    115.3   70.5    143.2
runtime heuristic [sec.]   42.8    92.8    58.3    254.8   206.9
average [%]                2.7     2.1     2.5     2.0     1.7
minimum [%]                0.0     0.0     0.0     0.0     0.0
maximum [%]                16.6    19.2    32.1    18.4    11.7
standard deviation [%]     3.8     4.1     5.1     2.8     2.5
Numerical Results
Example with three order classes
order class                  1         2         3
lead time                    10        4         2
profit margin                20.00 €   60.00 €   100.00 €
capacity usage               4         4         4
relative profit margin       5.00      15.00     25.00
relative traffic intensity   60%       30%       10%
Numerical Results
Average reward per period: heuristic procedure vs. value iteration algorithm
[Figure: influence of traffic intensity (50% to 250%) on the average reward with low inventory holding costs (1 €); curves for the optimal policy and the heuristic, each with low (2 units) and high (8 units) inventory capacity]
Numerical Results
Average reward per period: heuristic procedure vs. value iteration algorithm
[Figure: influence of inventory capacity (0 to 10 units) on the average reward at high traffic intensity (200%); curves for the optimal policy and the heuristic, each with low (1 €) and high (5 €) inventory holding cost. The steep ascent occurs because one order class needs at least two units of inventory for acceptance.]
Thank you for your attention.