Variance Reduction Technique
-
VARIANCE REDUCTION (L&K Chapter 11)
Variance reduction techniques (VRTs) improve a crude simulation experiment by reducing the variance of the estimators without increasing the computational effort, or by obtaining the same variance with less computational effort.
VRTs focus on point-estimator performance, but this improvement should be reflected in a reduced measure of error.
VRTs do not necessarily change the underlying variance of the system.
-
VRTs may require additional computational or analyst effort, so you must decide if there is a net improvement.
We present general VRTs, but VRTs work best when tailored to a specific problem.
-
WHY DO VARIANCE REDUCTION?
Recall the Challenger accident in 1986.
After that, a friend of mine was hired by Morton Thiokol to look at reliability estimation. MT wanted the chance of failure to be $10^{-6}$.
At 100 random numbers per rep, how many do we need to observe 30 failures? 1,000,000 × 100 × 30 = 3 billion.
Suppose that the failure probability is $1 \times 10^{-5}$ (meaning it is an order of magnitude too large).
If we make 30,000 replications, the standard error of the estimate of Pr{failure} is approximately $\sqrt{p(1-p)/n} \approx 2 \times 10^{-5}$. This is twice as large as the value we are estimating!
Without variance reduction a lot of simulation still gives a poor estimate.
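A quick way to check a standard-error claim like the one above is to evaluate $\sqrt{p(1-p)/n}$ directly. The sketch below is in Python rather than the VBA used on these slides; the function name `proportion_se` is illustrative and not from the slides.

```python
import math

def proportion_se(p: float, n: int) -> float:
    """Standard error of a sample proportion from n i.i.d. replications."""
    return math.sqrt(p * (1.0 - p) / n)

p = 1e-5  # failure probability used in the example
for n in (30_000, 30_000_000):
    se = proportion_se(p, n)
    print(f"n = {n:>10,d}  se = {se:.2e}  se/p = {se / p:.2f}")
```

The ratio se/p is the relative error; a crude estimator of a rare-event probability needs n on the order of 100/p before that ratio drops to about 10%.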
-
APPLICATIONS OF VARIANCE REDUCTION
Highly reliable systems
Financial engineering/quantitative finance
Simulation optimization
Lots of alternatives
Noisy gradient estimates
Metamodeling/mapping
Real-time control using simulation
-
VARIANCE REDUCTION IN REAL LIFE
Probably the most common example of variance reduction is public opinion polling (very popular in the US).
One reason why small samples like 1500 give such good estimates is that the sampling is not purely random.
Stratification among income levels, race, and political parties ensures a more representative sample.
-
VARIANCE REDUCTION TECHNIQUES
We will cover the following VRTs for estimating $\theta = E[Y]$, using the highly reliable system simulation as an illustration.
Antithetic Variates (AV): Applicable in all stochastic simulations. Attempts to balance simulation outputs by balancing the pseudorandom numbers.
Control Variates (CV): Applicable in all stochastic simulations. Exploits information about simulation inputs to reduce variance of simulation outputs. Based on least-squares regression.
Conditional Expectations (CE): Not generally applicable, but guaranteed effective when it is. Basic idea is not to simulate what you can compute.
Importance Sampling (IS): Generally applicable. Attempts to bias the simulation outputs toward more important areas.
We have already covered common random numbers (CRN); L&K add Indirect Estimation.
-
ABSOLUTE VS. RELATIVE PARAMETER
In theory any VRT could be used for any problem, but it is useful to distinguish two classes:
Estimating the value of a parameter associated with a single system (absolute parameter).
In this case the actual value of the parameter matters.
Estimating the difference between parameters of two or more distinct systems (relative parameter).
In this case only the relative difference matters and not the actual values of the parameters. Common random numbers, which we have already covered, is the primary VRT here.
-
BACKGROUND
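$$\mathrm{Cov}[A,B] = E[(A - \mu_A)(B - \mu_B)]$$
$$\mathrm{Corr}[A,B] = \mathrm{Cov}[A,B] / \sqrt{\mathrm{Var}[A]\,\mathrm{Var}[B]}$$
Let $Z = Y - bX$. Then
$$\mathrm{Var}[Z] = \mathrm{Var}[Y] + b^2\,\mathrm{Var}[X] - 2b\,\mathrm{Cov}[Y,X]$$
$$E[T] = E[E[T|S]]$$
Some VRTs exploit these relationships.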
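As a worked consequence of the variance formula for $Z = Y - bX$ (standard calculus, not taken from the slides), the $b$ that minimizes $\mathrm{Var}[Z]$ and the resulting variance are:

```latex
\begin{align*}
\frac{d}{db}\mathrm{Var}[Z] &= 2b\,\mathrm{Var}[X] - 2\,\mathrm{Cov}[Y,X] = 0
  \;\Longrightarrow\; b^* = \frac{\mathrm{Cov}[Y,X]}{\mathrm{Var}[X]}, \\
\mathrm{Var}[Z]\Big|_{b=b^*} &= \mathrm{Var}[Y] - \frac{\mathrm{Cov}[Y,X]^2}{\mathrm{Var}[X]}
  = \left(1 - \mathrm{Corr}[Y,X]^2\right)\mathrm{Var}[Y].
\end{align*}
```

So the stronger the correlation between $Y$ and $X$, the smaller the variance of $Z$ can be made.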
-
MARKOV-PROCESS PRIMER
A continuous-time Markov process is a stochastic process $\{Y_t;\ t \ge 0\}$ with state space a subset of $\{0,1,2,\ldots\}$.
Examples include queueing, reliability, inventory, combat, and biological processes.
Characteristics:
Time spent in each state is exponentially distributed.
Next state entered depends only on the current state.
MPs can be analyzed via mathematical analysis, but when the state space is large, numerical analysis or simulation may be required.
-
Suppose the state space is $\{1,2,\ldots,m\}$. The generator of the MP describes how $Y_t$ evolves over time.
$$G = \begin{pmatrix} g_{11} & g_{12} & \cdots & g_{1m} \\ g_{21} & g_{22} & \cdots & g_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ g_{m1} & g_{m2} & \cdots & g_{mm} \end{pmatrix}$$
For $i \ne j$, $1/g_{ij}$ is the mean of the exponential time until the process moves from state $i$ to $j$.
$g_{ij}$ can be interpreted as the transition rate from $i$ to $j$.
For $i = j$, $-1/g_{ii} = 1/\sum_{j \ne i} g_{ij}$ is the mean of the exponential holding time in state $i$.
$-g_{ii}$ can be interpreted as the transition rate out of state $i$.
-
TTF EXAMPLE
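State space $\{0,1,2\}$ corresponds to number of functional computers.
$B$ is the time until computer breakdown, exponentially distributed with mean $1/\lambda$, or failure rate $\lambda$.
$R$ is the time to repair a computer, exponentially distributed with mean $1/\mu$, or repair rate $\mu$.
Generator for $\{Y_t;\ t \ge 0\}$ is
$$G = \begin{pmatrix} 0 & 0 & 0 \\ \lambda & -(\lambda+\mu) & \mu \\ 0 & \lambda & -\lambda \end{pmatrix}$$
Estimate $\theta = E[\mathrm{TTF}]$ (the expected time to failure) when $\lambda = 1$ per day, $\mu = 1000$ per day, from $n = 1000$ replications.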
-
STATE-CHANGE PROCESS
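Let $\{X_n;\ n = 0,1,2,\ldots\}$ be the state-change process, where $n$ counts the number of state changes without regard to time.
The probability of transition from $i$ to $j$ is $p_{ij} = -g_{ij}/g_{ii}$, and from $i$ to $i$ is $p_{ii} = 0$, provided $-g_{ii} > 0$. If $g_{ii} = 0$ then $p_{ii} = 1$.
Transition matrix for TTF example is
$$P = \begin{pmatrix} 1 & 0 & 0 \\ \lambda/(\lambda+\mu) & 0 & \mu/(\lambda+\mu) \\ 0 & 1 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 1/1001 & 0 & 1000/1001 \\ 0 & 1 & 0 \end{pmatrix}$$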
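The rule for turning a generator into a state-change transition matrix can be sketched in a few lines. This is a Python illustration (the slides use VBA); it assumes each diagonal entry of the generator equals minus the sum of the off-diagonal rates in its row, and the function name `state_change_matrix` is hypothetical.

```python
from fractions import Fraction

def state_change_matrix(G):
    """Return the state-change transition matrix P for generator G.

    Assumption: each diagonal entry is minus the sum of the
    off-diagonal rates in its row.
    """
    m = len(G)
    P = [[Fraction(0)] * m for _ in range(m)]
    for i in range(m):
        out_rate = -G[i][i]
        if out_rate == 0:
            P[i][i] = Fraction(1)  # absorbing state
        else:
            for j in range(m):
                if j != i:
                    P[i][j] = Fraction(G[i][j]) / out_rate
    return P

lam, mu = Fraction(1), Fraction(1000)  # rates per day from the TTF example
G = [[0, 0, 0],
     [lam, -(lam + mu), mu],
     [0, lam, -lam]]
for row in state_change_matrix(G):
    print([str(p) for p in row])
```

Exact fractions are used so the output can be compared directly with the matrix entries 1/1001 and 1000/1001.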
-
CODE FOR TTF
Public Sub TTF(Lambda As Double, Mu As Double, Sum As Double)
' Sub to generate one replication of the TTF for the
' airline reservation system
' Variables:
'   State  = number of operational computers
'   Lambda = computer failure rate
'   Mu     = computer repair rate
'   Sum    = generated TTF value that is returned from the call
' Exponential(mean) is assumed to return an exponential variate with the given mean
    Dim State As Integer
    Dim Fail As Double
    Dim Repair As Double

    State = 2
    Sum = 0
    While State > 0
        If State = 2 Then
            Fail = Exponential(1 / Lambda)
            Sum = Sum + Fail
            State = 1
        Else
            Fail = Exponential(1 / Lambda)
            Repair = Exponential(1 / Mu)
            If Repair < Fail Then
                Sum = Sum + Repair
                State = 2
            Else
                Sum = Sum + Fail
                State = 0
            End If
        End If
    Wend
End Sub
-
CRUDE EXPERIMENT
Variance is reduced relative to a crude experiment.
Example: Estimate $\theta = E[Y]$ (expected TTF)
If $Y_1, \ldots, Y_n$ are i.i.d. then $\mathrm{Var}[\bar{Y}] = \sigma_Y^2/n$ where
$$\sigma_Y^2 = \mathrm{Var}[Y] = E[(Y - \theta)^2]$$
VRTs try to do better than this.
number of independent replications: $n$
point estimator: $\bar{Y}$
interval estimator: $\bar{Y} \pm t_{1-\alpha/2,\,n-1}\, S/\sqrt{n}$
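A minimal Python sketch of the crude experiment for the TTF example follows (the slides' own code is VBA). Assumptions to note: the repair rate is set to 100 rather than 1000 so the demo runs quickly, the t quantile is replaced by a normal quantile, and all function names are illustrative. Solving the first-step equations for this chain gives $E[\mathrm{TTF}] = (2\lambda + \mu)/\lambda^2$, which is 102 for the demo values.

```python
import math
import random
import statistics

def ttf_replication(lam: float, mu: float, rng: random.Random) -> float:
    """One replication of the time to failure, starting with 2 working computers."""
    state, total = 2, 0.0
    while state > 0:
        fail = rng.expovariate(lam)        # time to next breakdown
        if state == 2:
            total += fail
            state = 1
        else:
            repair = rng.expovariate(mu)   # time to finish the repair
            if repair < fail:
                total += repair
                state = 2
            else:
                total += fail
                state = 0
    return total

def crude_experiment(n, lam, mu, seed=12345, alpha=0.05):
    """Return the sample mean and an approximate CI half-width."""
    rng = random.Random(seed)
    y = [ttf_replication(lam, mu, rng) for _ in range(n)]
    ybar = statistics.fmean(y)
    s = statistics.stdev(y)
    z = statistics.NormalDist().inv_cdf(1 - alpha / 2)  # normal approx. to the t quantile
    return ybar, z * s / math.sqrt(n)

ybar, half = crude_experiment(n=1000, lam=1.0, mu=100.0)
print(f"estimate {ybar:.1f} +/- {half:.1f}")
```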
-
INVERSE CDF REVIEW
To generate $X \sim F$ we can set
$$X = F^{-1}(U)$$
with $U \sim U(0,1)$.
If we want $X_1 \sim F_1$ and $X_2 \sim F_2$ but can otherwise choose the joint distribution then...
the minimum $\mathrm{Cov}[X_1, X_2]$ (and correlation) occurs when
$$X_1 = F_1^{-1}(U) \text{ and } X_2 = F_2^{-1}(1 - U)$$
and the maximum $\mathrm{Cov}[X_1, X_2]$ (and correlation) occurs when
$$X_1 = F_1^{-1}(U) \text{ and } X_2 = F_2^{-1}(U)$$
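The effect of feeding $U$ versus $1-U$ through inverse cdfs can be seen numerically. The Python sketch below uses two exponential distributions as an assumed example (rates 1 and 2 are arbitrary choices, and the helper names are illustrative).

```python
import math
import random

def exp_inverse_cdf(u: float, rate: float) -> float:
    """Inverse cdf of the exponential distribution with the given rate."""
    return -math.log(1.0 - u) / rate

def sample_corr(xs, ys):
    """Sample correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(2024)
# keep u strictly inside (0, 1) so both u and 1 - u are valid inputs
us = [min(max(rng.random(), 1e-12), 1 - 1e-12) for _ in range(20000)]
x1 = [exp_inverse_cdf(u, 1.0) for u in us]
x2_anti = [exp_inverse_cdf(1.0 - u, 2.0) for u in us]
x2_same = [exp_inverse_cdf(u, 2.0) for u in us]
print("antithetic corr:", round(sample_corr(x1, x2_anti), 3))
print("common-U corr:  ", round(sample_corr(x1, x2_same), 3))
```

For exponentials the common-$U$ pairing gives correlation 1 because both variates are the same function of $U$ up to scale, while the antithetic pairing gives a clearly negative correlation that is still well short of -1.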
-
ANTITHETIC VARIATES
AV applies to estimating an absolute parameter.
The idea is to balance bad system performance with good system performance to obtain a better estimate of mean performance.
This is accomplished by balancing the pseudorandom numbers across replications.
Rather than balance across all $n$ replications, we typically balance across pairs of replications.
We hope to end up with negatively correlated (antithetic) pairs of outputs.
-
INDUCING NEGATIVE CORRELATION
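On replication $2i-1$ use $U_1, U_2, \ldots$
On replication $2i$ use $1-U_1, 1-U_2, \ldots$
Use different random numbers on replications $(2i-1, 2i)$ and $(2j-1, 2j)$ for $i \ne j$ (implies independent).
Let $\bar{Y}_j = (Y_{2j-1} + Y_{2j})/2$, $j = 1,2,\ldots,n/2$ and
$$\bar{\bar{Y}} = \frac{1}{n/2} \sum_{j=1}^{n/2} \bar{Y}_j$$
$$\mathrm{Var}[\bar{\bar{Y}}] = \frac{\sigma_Y^2}{n}(1 + \rho)$$
where $\rho = \mathrm{Corr}[Y_{2j-1}, Y_{2j}]$.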
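The variance of the antithetic estimator follows in two steps from the variance of a sum, using the fact that both members of a pair have variance $\sigma_Y^2$ and that distinct pairs are independent. A short derivation, with $\bar{\bar{Y}}$ denoting the average of the $n/2$ pair averages:

```latex
\begin{align*}
\mathrm{Var}[\bar{Y}_j]
  &= \tfrac{1}{4}\left(\mathrm{Var}[Y_{2j-1}] + \mathrm{Var}[Y_{2j}] + 2\,\mathrm{Cov}[Y_{2j-1}, Y_{2j}]\right)
   = \tfrac{1}{4}\left(2\sigma_Y^2 + 2\rho\,\sigma_Y^2\right)
   = \frac{\sigma_Y^2}{2}(1+\rho), \\
\mathrm{Var}[\bar{\bar{Y}}]
  &= \frac{\mathrm{Var}[\bar{Y}_j]}{n/2}
   = \frac{\sigma_Y^2}{n}(1+\rho).
\end{align*}
```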
-
AV EXPERIMENT
number of independent replications: $n/2$
point estimator: $\bar{\bar{Y}}$
interval estimator: $\bar{\bar{Y}} \pm t_{1-\alpha/2,\,n/2-1}\, S\sqrt{2/n}$
where
$$S^2 = \frac{1}{n/2 - 1} \sum_{j=1}^{n/2} (\bar{Y}_j - \bar{\bar{Y}})^2$$
It is possible to induce an antithetic effect among $k$-tuples rather than pairs.
The generation schemes tend to be complicated and there is no general optimal transformation for $k > 2$.
-
IMPLEMENTATION
In replications $2i-1$ and $2i$ we want $U$ and $1-U$ to be used for the same purpose. Assigning a distinct stream to each input process helps.
The random variate generators in many simulation languages have calling sequences like
NORMAL(mean, stddev, stream)
Supposedly if you use -stream you obtain the antithetic variates.
However, if the generator is not inverse cdf then there may be no antithetic effect.
-
CODE FOR AV
To keep the random numbers synchronized, we run the pairs together until the last one quits.
Public Sub TTFAV(Lambda As Double, Mu As Double, _
                 Sum As Double, AVSum As Double)
' Sub to generate one pair of antithetic replications of the TTF for the
' airline reservation system
' Variables:
'   State   = number of operational computers (primary run)
'   AVState = number of operational computers (antithetic run)
'   Lambda  = computer failure rate
'   Mu      = computer repair rate
'   Sum     = generated TTF value that is returned from the call
'   AVSum   = generated TTF value of the antithetic run
    Dim State As Integer
    Dim AVState As Integer
    Dim Fail As Double
    Dim Repair As Double
    Dim U1 As Double
    Dim U2 As Double

    State = 2
    AVState = 2
    Sum = 0
    AVSum = 0

    While State > 0 Or AVState > 0   ' simulate until the pair of runs is complete
        U1 = MRG32k3a()   ' random number for failure
        U2 = MRG32k3a()   ' random number for repair
        If State > 0 Then
            Fail = -VBA.Log(1 - U1) / Lambda
            Repair = -VBA.Log(1 - U2) / Mu
            If State = 2 Then
                Sum = Sum + Fail
                State = 1
            ElseIf Repair < Fail Then
                Sum = Sum + Repair
                State = 2
            Else
                Sum = Sum + Fail
                State = 0
            End If
        End If

        ' antithetic run
        If AVState > 0 Then
            Fail = -VBA.Log(U1) / Lambda
            Repair = -VBA.Log(U2) / Mu
            If AVState = 2 Then
                AVSum = AVSum + Fail
                AVState = 1
            ElseIf Repair < Fail Then
                AVSum = AVSum + Repair
                AVState = 2
            Else
                AVSum = AVSum + Fail
                AVState = 0
            End If
        End If
    Wend
End Sub
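The same synchronization idea can be sketched in Python (the slides use VBA). Instead of stepping both runs together, this version stores the uniforms used at each step in a shared list so the antithetic run reuses them as $1-U$. The repair rate 100 is an assumption chosen for a fast demo, and all names are illustrative.

```python
import math
import random

def exp_inv(u, rate):
    """Inverse-cdf exponential variate; the max() guards against log(0)."""
    return -math.log(max(1.0 - u, 1e-300)) / rate

def av_pair(lam, mu, rng):
    """Return (Y, Y_antithetic): two TTF runs driven by U and 1 - U."""
    shared = []  # (u1, u2) per step, reused by the antithetic run

    def run(flip):
        state, total, step = 2, 0.0, 0
        while state > 0:
            if step == len(shared):
                shared.append((rng.random(), rng.random()))
            u1, u2 = shared[step]
            step += 1
            if flip:
                u1, u2 = 1.0 - u1, 1.0 - u2
            fail, repair = exp_inv(u1, lam), exp_inv(u2, mu)
            if state == 2:
                total, state = total + fail, 1
            elif repair < fail:
                total, state = total + repair, 2
            else:
                total, state = total + fail, 0
        return total

    return run(False), run(True)

rng = random.Random(99)
pairs = [av_pair(1.0, 100.0, rng) for _ in range(500)]
pair_means = [(y + y_av) / 2.0 for y, y_av in pairs]
print(f"AV estimate of E[TTF]: {sum(pair_means) / len(pair_means):.2f}")
```

As in the VBA version, a second uniform is consumed at every step even when the system is in state 2, so that each uniform keeps the same purpose in both runs.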
-
EFFECTIVENESS OF AV
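If $\rho < 0$ then $\mathrm{Var}[\bar{\bar{Y}}] < \mathrm{Var}[\bar{Y}]$
If $\rho < \delta(\alpha, n)$ then
$$E[t_{1-\alpha/2,\,n/2-1}\, S\sqrt{2/n}] < E[t_{1-\alpha/2,\,n-1}\, S/\sqrt{n}]$$
where $S$ on the left is computed from the pair averages and $S$ on the right from the individual replications.
The function $\delta(\alpha, n)$ approaches 0 as $n$ increases; for $n \ge 100$, $\delta(\alpha, n) \approx 0$ for $\alpha = 0.01, 0.05, 0.1$.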
-
CONDITIONAL EXPECTED VALUE
Suppose $S$ and $T$ have joint distribution

          T = 1    T = 2    T = 3    Pr{S}
S = 1     2/10     1/10     1/10     4/10
S = 2     1/20     8/20     3/20     12/20
Pr{T}     5/20     10/20    5/20

Then
$$E[T] = (1)(5/20) + (2)(10/20) + (3)(5/20) = 2$$
The conditional distribution of $T$ given $S$ is

a                  1       2       3       total
Pr{T = a|S = 1}    2/4     1/4     1/4     1.00
Pr{T = a|S = 2}    1/12    8/12    3/12    1.00
-
Therefore,
$$E[T|S=1] = (1)(2/4) + (2)(1/4) + (3)(1/4) = 7/4$$
$$E[T|S=2] = (1)(1/12) + (2)(8/12) + (3)(3/12) = 26/12$$
A fundamental result from mathematical statistics is that
$$E[T] = E_S[E_{T|S}[T|S]]$$
In our case
$$E_S[E_{T|S}[T|S]] = E[T|S=1]\Pr\{S=1\} + E[T|S=2]\Pr\{S=2\} = (7/4)(4/10) + (26/12)(12/20) = 2$$
For variance reduction, these results imply that we can use $E[T|S]$ to estimate $E[T]$.
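The arithmetic in this example can be verified with exact fractions. The Python sketch below encodes the joint distribution from the table; the dictionary layout and function names are illustrative choices.

```python
from fractions import Fraction as F

# joint pmf from the table: joint[s][t] = Pr{S = s, T = t}
joint = {
    1: {1: F(2, 10), 2: F(1, 10), 3: F(1, 10)},
    2: {1: F(1, 20), 2: F(8, 20), 3: F(3, 20)},
}

def marginal_s(s):
    """Pr{S = s}."""
    return sum(joint[s].values())

def cond_mean_t(s):
    """E[T | S = s], computed from the conditional pmf."""
    return sum(t * p for t, p in joint[s].items()) / marginal_s(s)

direct = sum(t * p for row in joint.values() for t, p in row.items())
iterated = sum(cond_mean_t(s) * marginal_s(s) for s in joint)
print(direct, iterated, cond_mean_t(1), cond_mean_t(2))
```

Both routes give 2, and the conditional means print as 7/4 and 13/6 (the reduced form of 26/12).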
-
CONTROL VARIATES
In CV we approximate $E[Y|C]$ as
$$E[Y|C] \approx \beta_0 + \beta_1 (C - \mu_C) \qquad (1)$$
where $\mu_C = E[C]$. Therefore
$$\theta = E[Y] = E[E[Y|C]] = \beta_0$$
If we observe $(Y_i, C_i - \mu_C)$, $i = 1,2,\ldots,n$, then we can estimate $\beta_0$ (and thus $\theta$) via a least-squares regression.
If $Y$ and $C$ are strongly correlated, then this estimator will have smaller variance than $\bar{Y}$.
-
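The least-squares estimator of $\theta = \beta_0$ is
$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 (\bar{C} - \mu_C)$$
where
$$\hat{\beta}_1 = \frac{\sum_{i=1}^n (Y_i - \bar{Y})(C_i - \bar{C})}{\sum_{i=1}^n (C_i - \bar{C})^2}.$$
How does $\hat{\beta}_0$ compare to $\bar{Y}$?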
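The two formulas above translate directly into code. This Python sketch applies them to synthetic data in which the linear model holds exactly by construction (an assumption made only for illustration; the data, seed, and function name are not from the slides).

```python
import random

def cv_estimate(y, c, mu_c):
    """Control-variate estimate of E[Y] using one control with known mean mu_c."""
    n = len(y)
    ybar, cbar = sum(y) / n, sum(c) / n
    sxy = sum((yi - ybar) * (ci - cbar) for yi, ci in zip(y, c))
    sxx = sum((ci - cbar) ** 2 for ci in c)
    beta1 = sxy / sxx
    beta0 = ybar - beta1 * (cbar - mu_c)
    return beta0, beta1

# Synthetic illustration: Y = 5 + 3*(C - 0.5) + small noise, with C ~ U(0,1)
rng = random.Random(7)
c = [rng.random() for _ in range(200)]
y = [5.0 + 3.0 * (ci - 0.5) + rng.gauss(0.0, 0.1) for ci in c]
beta0, beta1 = cv_estimate(y, c, mu_c=0.5)
print(f"sample mean {sum(y) / len(y):.3f}, CV estimate {beta0:.3f}, slope {beta1:.3f}")
```

The estimator corrects the sample mean by the amount the observed control average missed its known mean, scaled by the fitted slope.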
-
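Result: If the linear model is correct, then $E[\hat{\beta}_1] = \beta_1$.
Proof: Let $\mathbf{C} = (C_1, C_2, \ldots, C_n)$. Then
$$E[\hat{\beta}_1 | \mathbf{C} = \mathbf{c}] = \frac{\sum E[(Y_i - \bar{Y}) | \mathbf{C} = \mathbf{c}](c_i - \bar{c})}{\sum (c_i - \bar{c})^2} = \frac{\sum (\beta_0 + \beta_1(c_i - \mu_C) - \beta_0 - \beta_1(\bar{c} - \mu_C))(c_i - \bar{c})}{\sum (c_i - \bar{c})^2} = \beta_1.$$
By the double expectation theorem, $E[\hat{\beta}_1] = \beta_1$.
Result: If the linear model is correct then $E[\hat{\beta}_0] = \beta_0$.
Proof: $E[\hat{\beta}_0 | \mathbf{C} = \mathbf{c}] = E[\bar{Y} | \mathbf{C} = \mathbf{c}] - E[\hat{\beta}_1 | \mathbf{C} = \mathbf{c}](\bar{c} - \mu_C) = \beta_0 + \beta_1(\bar{c} - \mu_C) - \beta_1(\bar{c} - \mu_C) = \beta_0$. Then the double expectation theorem gives the result.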
-
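Notice that
$$\mathrm{Var}[\hat{\beta}_0] = \mathrm{Var}[E[\hat{\beta}_0|\mathbf{C}]] + E[\mathrm{Var}[\hat{\beta}_0|\mathbf{C}]] = E[\mathrm{Var}[\hat{\beta}_0|\mathbf{C}]]$$
since $\mathrm{Var}[E[\hat{\beta}_0|\mathbf{C}]] = \mathrm{Var}[\beta_0] = 0$ from the proof of the result above.
Under the special assumption of constant conditional variance ($\mathrm{Var}[Y|C] = \sigma^2$), the following result can be derived:
Result: If the linear model is correct and we have constant conditional variance then
$$\mathrm{Var}[\hat{\beta}_0|\mathbf{C}] = \sigma^2 \left( \frac{1}{n} + \frac{(\bar{C} - \mu_C)^2}{\sum (C_i - \bar{C})^2} \right).$$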
-
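If, further, $(Y, C)$ are jointly normally distributed with correlation $\rho$, then $\sigma^2 = (1 - \rho^2)\sigma_Y^2$ and
$$E\left[ \frac{(\bar{C} - \mu_C)^2}{\sum (C_i - \bar{C})^2} \right] = \frac{1}{n(n-3)}.$$
Result: If $(Y, C)$ are bivariate normal, then
$$\mathrm{Var}[\hat{\beta}_0] = \left( \frac{n-2}{n-3} \right)(1 - \rho^2)\frac{\sigma_Y^2}{n}.$$
Thus, if $\rho^2 > 1/(n-2)$ then the control-variate estimator has smaller variance than the sample mean.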
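The intermediate algebra behind the bivariate-normal result and the threshold on $\rho^2$ can be written out as follows (for $n > 3$, taking expectations of the conditional variance $\sigma^2\left(1/n + (\bar{C}-\mu_C)^2/\sum(C_i-\bar{C})^2\right)$ with $\sigma^2 = (1-\rho^2)\sigma_Y^2$):

```latex
\begin{align*}
\mathrm{Var}[\hat{\beta}_0]
  &= \sigma^2\left(\frac{1}{n} + \frac{1}{n(n-3)}\right)
   = \frac{n-2}{n-3}\,(1-\rho^2)\,\frac{\sigma_Y^2}{n}, \\
\frac{n-2}{n-3}\,(1-\rho^2) < 1
  &\iff 1-\rho^2 < \frac{n-3}{n-2}
   \iff \rho^2 > \frac{1}{n-2}.
\end{align*}
```

The factor $(n-2)/(n-3)$ is the price paid for estimating the slope; it is negligible unless $n$ is very small.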
-
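The se and confidence interval for $\theta$ are the usual ones for the intercept term of a least-squares regression.
Regression set up:
$$\mathbf{Y} = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix} = \begin{pmatrix} 1 & C_1 - \mu_C \\ 1 & C_2 - \mu_C \\ \vdots & \vdots \\ 1 & C_n - \mu_C \end{pmatrix} \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix} + \boldsymbol{\varepsilon} = \mathbf{C}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$$
where $\mathbf{C}$ denotes the $n \times 2$ design matrix.
$$\hat{\boldsymbol{\beta}} = (\mathbf{C}'\mathbf{C})^{-1}\mathbf{C}'\mathbf{Y}$$
$$\hat{\beta}_0 \pm t_{1-\alpha/2,\,n-2}\, \widehat{\mathrm{se}}$$
$$\widehat{\mathrm{Var}}[\hat{\boldsymbol{\beta}}] = (\mathbf{Y}'\mathbf{Y} - \hat{\boldsymbol{\beta}}'\mathbf{C}'\mathbf{Y})(\mathbf{C}'\mathbf{C})^{-1}/(n-2)$$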
-
CV for TTF
Possible control variates:
We know the distribution of the time until a computer breakdown.
$C_1$ = average time until a computer breakdown ($E[C_1] = 1/\lambda$)
The number of times the process enters state 1 has a geometric distribution with parameter $\lambda/(\lambda+\mu)$
$C_2$ = number of times process is in state 1 ($E[C_2] = (\lambda+\mu)/\lambda$)
-
CODE FOR CV
Public Sub TTFCV(Lambda As Double, Mu As Double, Sum As Double, _
                 CV1 As Double, CV2 As Double)
' Sub to generate one replication of the TTF for the
' airline reservation system and record two control variates
' Variables:
'   State  = number of operational computers
'   Lambda = computer failure rate
'   Mu     = computer repair rate
'   Sum    = generated TTF value that is returned from the call
'   Sum1   = sum associated with control variate number 1, average computer failure time
'   Count1 = counter associated with CV1
'   Count2 = counter for CV2, the number of times the process is in state 1
'   CV#    = control variate number #, returned with its known mean subtracted
    Dim State As Integer
    Dim Fail As Double
    Dim Repair As Double
    Dim Sum1 As Double
    Dim Count1 As Double
    Dim Count2 As Double

    State = 2
    Sum = 0
    Sum1 = 0
    Count1 = 0
    Count2 = 0

    While State > 0
        If State = 2 Then
            Fail = Exponential(1 / Lambda)
            Sum = Sum + Fail
            State = 1
            ' record control variate data
            Sum1 = Sum1 + Fail
            Count1 = Count1 + 1
        Else
            Fail = Exponential(1 / Lambda)
            Repair = Exponential(1 / Mu)
            ' record control variate data
            Sum1 = Sum1 + Fail
            Count1 = Count1 + 1
            Count2 = Count2 + 1
            If Repair < Fail Then
                Sum = Sum + Repair
                State = 2
            Else
                Sum = Sum + Fail
                State = 0
                ' compute control variates
                CV1 = Sum1 / Count1 - 1 / Lambda
                CV2 = Count2 - (Lambda + Mu) / Lambda
            End If
        End If
    Wend
End Sub
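A Python sketch of a control-variate experiment for the TTF model, using only the second control (the number of visits to state 1, whose mean is $(\lambda+\mu)/\lambda$). This is not the slides' VBA: the repair rate 100, the seed, and the function names are assumptions chosen for a quick demo. For these rates $E[\mathrm{TTF}] = (2\lambda+\mu)/\lambda^2 = 102$.

```python
import random

def ttf_with_control(lam, mu, rng):
    """One TTF replication; also returns C2, the number of visits to state 1."""
    state, total, visits = 2, 0.0, 0
    while state > 0:
        fail = rng.expovariate(lam)
        if state == 2:
            total += fail
            state = 1
        else:
            visits += 1
            repair = rng.expovariate(mu)
            if repair < fail:
                total, state = total + repair, 2
            else:
                total, state = total + fail, 0
    return total, visits

def cv_experiment(n, lam, mu, seed=1):
    """Return (crude sample mean, control-variate estimate) of E[TTF]."""
    rng = random.Random(seed)
    data = [ttf_with_control(lam, mu, rng) for _ in range(n)]
    y = [d[0] for d in data]
    c = [d[1] for d in data]
    mu_c = (lam + mu) / lam  # known mean of C2
    ybar, cbar = sum(y) / n, sum(c) / n
    beta1 = (sum((yi - ybar) * (ci - cbar) for yi, ci in zip(y, c))
             / sum((ci - cbar) ** 2 for ci in c))
    return ybar, ybar - beta1 * (cbar - mu_c)

crude, cv = cv_experiment(n=500, lam=1.0, mu=100.0)
print(f"crude {crude:.2f}, CV {cv:.2f}")
```

Given the number of visits to state 1, the TTF is a sum of that many independent holding times, so its conditional mean is exactly linear in the control. That is why this control works so well here.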
-
Comments about CV:
When the linear relationship does not hold, the CV estimator is biased. Usually not a problem with large samples, but there are remedies.
To estimate a probability $p = \Pr\{Y \le a\}$, let
$Z_i = I(Y_i \le a)$
$C_i = I(X_i \le b)$ for $X$ with known probability $\Pr\{X \le b\}$ near $p$ and strongly correlated with $Y$.
Then do least-squares regression of $Z$ on $C - \mu_C$.
Control-variate estimators for quantiles are obtained by inverting the probability estimator above.
-
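Multiple control variates can be used (via a multiple regression) but make sure $n$ is much larger than the number of control variates.
Regression set up:
$$\begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix} = \begin{pmatrix} 1 & C_{11} - \mu_{C_1} & \cdots & C_{1q} - \mu_{C_q} \\ 1 & C_{21} - \mu_{C_1} & \cdots & C_{2q} - \mu_{C_q} \\ \vdots & \vdots & & \vdots \\ 1 & C_{n1} - \mu_{C_1} & \cdots & C_{nq} - \mu_{C_q} \end{pmatrix} \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_q \end{pmatrix} + \boldsymbol{\varepsilon}$$
If $Y$ and $C$ are jointly normally distributed
$$\mathrm{Var}[\hat{\beta}_0] = \left( \frac{n-2}{n-q-2} \right)(1 - R^2_{YC})\,\mathrm{Var}[\bar{Y}]$$
Again, the usual se and confidence interval for $\beta_0$ apply.
$$\hat{\beta}_0 \pm t_{1-\alpha/2,\,n-q-1}\, \widehat{\mathrm{se}}$$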
-
CONDITIONAL EXPECTATIONS
CE is useful when $E[Y|X]$ is known for all $X$, because
$$E[E[Y|X]] = \theta$$
and
$$\mathrm{Var}[E[Y|X]] = \mathrm{Var}[Y] - E[\mathrm{Var}[Y|X]]$$
Instead of observing $Y_1, \ldots, Y_n$, we observe $X_1, \ldots, X_n$ and let
$$\bar{Y}_{ce} = \frac{1}{n} \sum_{i=1}^n E[Y|X_i]$$
CE gives a guaranteed variance reduction.
-
CE EXPERIMENT
number of independent replications: $n$
point estimator: $\bar{Y}_{ce}$
interval estimator: $\bar{Y}_{ce} \pm t_{1-\alpha/2,\,n-1}\, S/\sqrt{n}$
where
$$S^2 = \frac{1}{n-1} \sum_{i=1}^n (E[Y|X_i] - \bar{Y}_{ce})^2$$
-
CE for TTF
We can write
$$\mathrm{TTF} = H_1 + H_2 + \cdots + H_N$$
where $H_n$ is the holding time in the $n$th state entered, and $N$ is the number of states entered before system failure.
But
$$E[H_n | X_n = 2] = 1/\lambda$$
$$E[H_n | X_n = 1] = 1/(\lambda + \mu)$$
Thus we condition on $\mathbf{X} = (X_1, \ldots, X_N)$
$$Y_{ce} = E[H_1|X_1] + E[H_2|X_2, X_1] + \cdots + E[H_N|X_N, \ldots, X_1] = E[H_1|X_1] + E[H_2|X_2] + \cdots + E[H_N|X_N]$$
-
CODE FOR CE
Public Sub TTFCE(Lambda As Double, Mu As Double, Sum As Double)
' Sub to generate one replication of the TTF for the
' airline reservation system using conditional expectations
' Variables:
'   State        = number of operational computers
'   Lambda       = computer failure rate
'   Mu           = computer repair rate
'   Sum          = generated TTF value that is returned from the call
'   HoldingTime# = expected holding time in state #
    Dim State As Integer
    Dim Fail As Double
    Dim Repair As Double
    Dim HoldingTime1 As Double
    Dim HoldingTime2 As Double

    HoldingTime1 = 1 / (Lambda + Mu)
    HoldingTime2 = 1 / Lambda
    State = 2
    Sum = 0
    While State > 0
        If State = 2 Then
            Fail = Exponential(1 / Lambda)
            Sum = Sum + HoldingTime2
            State = 1
        Else
            Fail = Exponential(1 / Lambda)
            Repair = Exponential(1 / Mu)
            If Repair < Fail Then
                Sum = Sum + HoldingTime1
                State = 2
            Else
                Sum = Sum + HoldingTime1
                State = 0
            End If
        End If
    Wend
End Sub
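A Python sketch of the conditional-expectation replication follows. Unlike the VBA above, it samples the next state directly from the transition probability $\mu/(\lambda+\mu)$ instead of comparing two exponentials; both give the same state-change process. The repair rate 100 and the names are assumptions for a fast demo, for which $E[\mathrm{TTF}] = 102$.

```python
import random

def ttf_ce_replication(lam, mu, rng):
    """Conditional-expectation TTF: sum expected holding times along a simulated path."""
    hold2 = 1.0 / lam          # E[H | state 2]
    hold1 = 1.0 / (lam + mu)   # E[H | state 1]
    p12 = mu / (lam + mu)      # Pr{repair finishes before the next failure}
    state, total = 2, 0.0
    while state > 0:
        if state == 2:
            total += hold2
            state = 1
        else:
            total += hold1
            state = 2 if rng.random() < p12 else 0
    return total

rng = random.Random(11)
n = 2000
y_ce = [ttf_ce_replication(1.0, 100.0, rng) for _ in range(n)]
print(f"CE estimate of E[TTF]: {sum(y_ce) / n:.2f}")
```

Only the randomness in the holding times is removed; the number of cycles before failure is still random, and in this model that is where most of the variance comes from.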
-
IMPORTANCE SAMPLING
Suppose we represent
$$\theta = \int_A g(z) f(z)\, dz$$
($\theta$ could be a probability if $g$ is an indicator function).
Provided $f'(z) > 0$ when $f(z) > 0$, we can rewrite $\theta$ as
$$\theta = \int_A g(z) \frac{f(z)}{f'(z)} f'(z)\, dz$$
This is now an expectation with respect to $f'$ (here $f'$ denotes the biased sampling density, not a derivative).
If $g(z) f(z)/f'(z)$ is nearly constant for all $z$, then we have reduced variance.
In fact, if $f' = g f/\theta$ then the variance is 0!
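A small self-contained Python illustration of the change of density: estimating the tail probability $\Pr\{Z > a\}$ for $Z \sim$ exponential(1) by sampling from an exponential with a smaller rate. The example problem, the biased rate $1/a$, and the function name are assumptions chosen because the exact answer $e^{-a}$ is known.

```python
import math
import random

def is_tail_probability(a, biased_rate, n, seed=5):
    """Estimate Pr{Z > a}, Z ~ exponential(1), sampling from exponential(biased_rate)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        z = rng.expovariate(biased_rate)           # draw from the biased density
        if z > a:                                  # g(z) = I(z > a)
            f = math.exp(-z)                       # original density f(z)
            f_biased = biased_rate * math.exp(-biased_rate * z)
            total += f / f_biased                  # g(z) times the likelihood ratio
    return total / n

a = 20.0
est = is_tail_probability(a, biased_rate=1.0 / a, n=100_000)
print(f"IS estimate {est:.3e}, exact {math.exp(-a):.3e}")
```

With crude sampling from exponential(1), 100,000 draws would almost surely contain no observation above 20, so the crude estimate would be 0.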
-
The random variable $f(Z)/f'(Z)$ is called the likelihood ratio (LR).
Frequently, $Z = (Z_1, Z_2, \ldots, Z_N)$ are independent so that
$$LR = \frac{f(Z)}{f'(Z)} = \frac{\prod_{i=1}^N f(Z_i)}{\prod_{i=1}^N f'(Z_i)}$$
In dynamic simulation $N$ can become quite large, making this term unstable.
Selecting $f'$ is not easy in general, and a bad selection can increase variance.
-
IS for TTF
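If we change the failure rate to $\lambda'$ and the repair rate to $\mu'$ then the LR for $Y_t$ is
$$LR = \frac{\prod_{i=1}^{N_B} \lambda e^{-\lambda B_i} \prod_{j=1}^{N_R} \mu e^{-\mu R_j}}{\prod_{i=1}^{N_B} \lambda' e^{-\lambda' B_i} \prod_{j=1}^{N_R} \mu' e^{-\mu' R_j}} = \frac{\lambda^{N_B} e^{-\lambda \sum_{i=1}^{N_B} B_i}\, \mu^{N_R} e^{-\mu \sum_{j=1}^{N_R} R_j}}{(\lambda')^{N_B} e^{-\lambda' \sum_{i=1}^{N_B} B_i}\, (\mu')^{N_R} e^{-\mu' \sum_{j=1}^{N_R} R_j}}$$
where $N_B$ and $N_R$ are the numbers of breakdown and repair times generated.
If we only simulate the state-change process $X_n$ then the LR is
$$LR = \frac{\prod_{n=1}^N p_{X_{n-1},X_n}}{\prod_{n=1}^N p'_{X_{n-1},X_n}}$$
Try changing $\lambda = 1$ to $\lambda' = 1/2$.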
-
CODE FOR IS
Public Sub TTFIS(Lambda As Double, Mu As Double, _
                 LambdaPrime As Double, MuPrime As Double, _
                 Sum As Double, LikelihoodRatio As Double)
' Sub to generate one replication of the TTF for the
' airline reservation system using importance sampling
' Variables:
'   State           = number of operational computers
'   Lambda          = computer failure rate
'   Mu              = computer repair rate
'   LambdaPrime     = biased computer failure rate
'   MuPrime         = biased computer repair rate
'   Sum             = generated TTF value that is returned from the call
'   LikelihoodRatio = likelihood ratio
    Dim State As Integer
    Dim Fail As Double
    Dim Repair As Double

    ' initialize likelihood ratio
    LikelihoodRatio = 1
    State = 2
    Sum = 0

    While State > 0
        If State = 2 Then
            Fail = Exponential(1 / LambdaPrime)
            Sum = Sum + Fail
            State = 1
            LikelihoodRatio = LikelihoodRatio * Lambda * Exp(-Lambda * Fail) / _
                (LambdaPrime * Exp(-LambdaPrime * Fail))
        Else
            Fail = Exponential(1 / LambdaPrime)
            Repair = Exponential(1 / MuPrime)
            If Repair < Fail Then
                Sum = Sum + Repair
                State = 2
                LikelihoodRatio = LikelihoodRatio * Lambda * Exp(-Lambda * Fail) _
                    / (LambdaPrime * Exp(-LambdaPrime * Fail)) _
                    * Mu * Exp(-Mu * Repair) / (MuPrime * Exp(-MuPrime * Repair))
            Else
                Sum = Sum + Fail
                State = 0
                LikelihoodRatio = LikelihoodRatio * Lambda * Exp(-Lambda * Fail) _
                    / (LambdaPrime * Exp(-LambdaPrime * Fail)) _
                    * Mu * Exp(-Mu * Repair) / (MuPrime * Exp(-MuPrime * Repair))
                Sum = Sum * LikelihoodRatio
            End If
        End If
    Wend
End Sub
-
CODE FOR IS+CE
Public Sub TTFISCE(Lambda As Double, Mu As Double, _
                   LambdaPrime As Double, MuPrime As Double, _
                   Sum As Double, LikelihoodRatio As Double)
' Sub to generate one replication of the TTF for the
' airline reservation system using conditional expectations
' and importance sampling
' Variables:
'   State           = number of operational computers
'   Lambda          = computer failure rate
'   Mu              = computer repair rate
'   LambdaPrime     = biased computer failure rate
'   MuPrime         = biased computer repair rate
'   Sum             = generated TTF value that is returned from the call
'   HoldingTime#    = expected holding time in state # (computed from the original rates)
'   Product         = importance sampling accumulator
'   LikelihoodRatio = likelihood ratio
'   p12             = correct transition probability from state 1 to 2
'   is12            = biased transition probability from state 1 to 2
    Dim State As Integer
    Dim Fail As Double
    Dim Repair As Double
    Dim HoldingTime1 As Double
    Dim HoldingTime2 As Double
    Dim Product As Double
    Dim p12 As Double
    Dim is12 As Double

    ' compute expected holding times & transition probabilities
    HoldingTime1 = 1 / (Lambda + Mu)
    HoldingTime2 = 1 / Lambda
    p12 = Mu / (Lambda + Mu)
    is12 = MuPrime / (LambdaPrime + MuPrime)
    Product = 1
    State = 2
    Sum = 0

    While State > 0
        If State = 2 Then
            Fail = Exponential(1 / LambdaPrime)
            Sum = Sum + HoldingTime2
            State = 1
        Else
            Fail = Exponential(1 / LambdaPrime)
            Repair = Exponential(1 / MuPrime)
            If Repair < Fail Then
                Sum = Sum + HoldingTime1
                State = 2
                Product = Product * (p12 / is12)
            Else
                Sum = Sum + HoldingTime1
                State = 0
                LikelihoodRatio = Product * ((1 - p12) / (1 - is12))
                Sum = Sum * LikelihoodRatio
            End If
        End If
    Wend
End Sub
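The combination of importance sampling and conditional expectations needs only the state-change process, so it can be sketched compactly. This Python version (not the slides' VBA) draws each 1-to-2 transition directly from the biased probability and accumulates the ratio of original to biased transition probabilities. The rates in the demo (repair rate 100, biased failure rate 1/2) and all names are assumptions; the target value is $E[\mathrm{TTF}] = 102$.

```python
import random

def ttf_is_ce(lam, mu, lam_b, mu_b, n, seed=3):
    """IS + CE estimate of E[TTF]: state-change process simulated under biased rates."""
    rng = random.Random(seed)
    cycle = 1.0 / lam + 1.0 / (lam + mu)   # expected holding time in state 2 plus state 1
    p12 = mu / (lam + mu)                   # original Pr{1 -> 2}
    b12 = mu_b / (lam_b + mu_b)             # biased Pr{1 -> 2}
    total = 0.0
    for _ in range(n):
        y, lr = cycle, 1.0                  # first visit to state 2, then state 1
        while rng.random() < b12:           # biased repair: 1 -> 2 -> 1
            y += cycle
            lr *= p12 / b12
        lr *= (1.0 - p12) / (1.0 - b12)     # final transition 1 -> 0
        total += y * lr
    return total / n

est = ttf_is_ce(1.0, 100.0, 0.5, 100.0, n=2000)
print(f"IS+CE estimate of E[TTF]: {est:.2f}")
```

Because the holding times are replaced by their expectations under the original rates, the likelihood ratio involves only transition probabilities, which avoids the long product of exponential densities.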
-
HOW DID THEY DO?
AV: a little variance reduction
CE: a little variance reduction
CV: huge variance reduction (97%)
IS: a little variance reduction
IS+CE: modest variance reduction (60%)
-
VRT EXERCISE
Problem: Estimate $p = \Pr\{Y > a\}$ and $E[Y]$ where
$$Y = \max\{X_1+X_4+X_6,\ X_1+X_3+X_5+X_6,\ X_2+X_5+X_6\}$$
Crude Experiment:
1. sumP ← 0, sumM ← 0
2. repeat n times:
   sample X1, ..., X6
   Y = max{X1+X4+X6, X1+X3+X5+X6, X2+X5+X6}
   sumP = sumP + I(Y > a)
   sumM = sumM + Y
3. return p̂ = sumP/n
   Ȳ = sumM/n
-
DETAILS
Let the $X_i$ be i.i.d. exponential(1).
Take $a = 6$, $n = 30$.
Try every variance reduction technique we learned on one of the two problems (or both).
You must demonstrate that at least one of your VRTs works on each problem. Use the experiment-within-an-experiment approach with $m$ macroreplications and form a confidence interval on the variance ratio.
Remember that the practitioner gets only one experiment, but the researcher gets as many as necessary to establish properties.
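As a starting point for the exercise, the crude experiment can be sketched in Python as below. It assumes six i.i.d. exponential(1) inputs (the formula for $Y$ uses $X_1$ through $X_6$), with $a = 6$ and $n = 30$ as stated; the function name and seed are illustrative.

```python
import random

def crude_experiment(n, a, rng):
    """Crude estimates of p = Pr{Y > a} and E[Y] for the exercise's Y."""
    sum_p, sum_m = 0, 0.0
    for _ in range(n):
        x1, x2, x3, x4, x5, x6 = (rng.expovariate(1.0) for _ in range(6))
        y = max(x1 + x4 + x6, x1 + x3 + x5 + x6, x2 + x5 + x6)
        sum_p += y > a   # indicator I(Y > a)
        sum_m += y
    return sum_p / n, sum_m / n

rng = random.Random(42)
p_hat, y_bar = crude_experiment(n=30, a=6.0, rng=rng)
print(f"p_hat = {p_hat:.3f}, y_bar = {y_bar:.3f}")
```

For the experiment-within-an-experiment comparison, one would call this function $m$ times with independent random numbers, do the same for each VRT, and compare the sample variances of the resulting estimates.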