Variance Reduction Technique

VARIANCE REDUCTION (L&K Chapter 11)

Variance reduction techniques (VRTs) improve a crude simulation experiment by

reducing the variance of the estimators without increasing the computational effort, or by

obtaining the same variance with less computational effort.

VRTs focus on point-estimator performance, but this improvement should be reflected in a reduced measure of error.

VRTs do not necessarily change the underlying variance of the system.

VRTs may require additional computational or analyst effort, so you must decide if there is a net improvement.

We present general VRTs, but VRTs work best when tailored to a specific problem.

WHY DO VARIANCE REDUCTION?

Recall the Challenger accident in 1986.

After that, a friend of mine was hired by Morton Thiokol to look at reliability estimation. MT wanted chances of failure of $10^{-6}$.

At 100 random numbers per rep, how many do we need to observe 30 failures? 1,000,000 × 100 × 30 = 3 billion

Suppose that the failure probability is $1 \times 10^{-5}$ (meaning it is an order of magnitude too large).

If we make 30 million replications, the standard error of the estimate of Pr{failure} is approximately $2 \times 10^{-5}$. This is twice as large as the value we are estimating!

Without variance reduction a lot of simulation still gives a poor estimate.

APPLICATIONS OF VARIANCE REDUCTION

Highly reliable systems

Financial engineering/quantitative finance

Simulation optimization

    Lots of alternatives

    Noisy gradient estimates

Metamodeling/mapping

Real-time control using simulation


VARIANCE REDUCTION IN REAL LIFE

Probably the most common example of variance reduction is public opinion polling (very popular in the US).

One reason why small samples like 1500 give such good estimates is that the sampling is not purely random.

Stratification by income level, race, and political party ensures a more representative sample.

VARIANCE REDUCTION TECHNIQUES

We will cover the following VRTs for estimating $\theta = E[Y]$, using the highly reliable system simulation as an illustration.

Antithetic Variates (AV): Applicable in all stochastic simulations. Attempts to balance simulation outputs by balancing the pseudorandom numbers.

Control Variates (CV): Applicable in all stochastic simulations. Exploits information about simulation inputs to reduce variance of simulation outputs. Based on least-squares regression.

Conditional Expectations (CE): Not generally applicable, but guaranteed effective when it is. Basic idea is not to simulate what you can compute.

Importance Sampling (IS): Generally applicable. Attempts to bias the simulation outputs toward more important areas.

We have already covered CRN; L&K add Indirect Estimation.

ABSOLUTE VS. RELATIVE PARAMETER

In theory any VRT could be used for any problem, but it is useful to distinguish two classes:

Estimating the value of a parameter associated with a single system (absolute parameter).

In this case the actual value of the parameter matters.

Estimating the difference between parameters of two or more distinct systems (relative parameter).

In this case only the relative difference matters and not the actual values of the parameters. Common random numbers, which we have already covered, is the primary VRT here.

BACKGROUND

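$$\mathrm{Cov}[A,B] = E[(A - \mu_A)(B - \mu_B)]$$

$$\mathrm{Corr}[A,B] = \frac{\mathrm{Cov}[A,B]}{\sqrt{\mathrm{Var}[A]\,\mathrm{Var}[B]}}$$

Let $Z = Y - bX$. Then

$$\mathrm{Var}[Z] = \mathrm{Var}[Y] + b^2\,\mathrm{Var}[X] - 2b\,\mathrm{Cov}[Y,X]$$

$$E[T] = E\big[E[T \mid S]\big]$$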

Some VRTs exploit these relationships.
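As a worked step that is not in the original slides, minimizing the expression for Var[Z] over b gives the best linear combination and shows that the achievable reduction depends only on the correlation between Y and X. The symbol $b^*$ is just a label introduced here for the minimizer.

```latex
\frac{d}{db}\,\mathrm{Var}[Z] = 2b\,\mathrm{Var}[X] - 2\,\mathrm{Cov}[Y,X] = 0
\quad\Longrightarrow\quad
b^* = \frac{\mathrm{Cov}[Y,X]}{\mathrm{Var}[X]}

\mathrm{Var}[Z]\Big|_{b=b^*}
= \mathrm{Var}[Y] - \frac{\mathrm{Cov}[Y,X]^2}{\mathrm{Var}[X]}
= \left(1 - \mathrm{Corr}[Y,X]^2\right)\mathrm{Var}[Y]
```

The stronger the correlation (of either sign), the smaller the variance of Z relative to Y.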


MARKOV-PROCESS PRIMER

A continuous-time Markov process is a stochastic process $\{Y_t;\ t \ge 0\}$ with state space a subset of $\{0,1,2,\ldots\}$.

Examples include queueing, reliability, inventory, combat, biological processes.

Characteristics:

Time spent in each state is exponentially distributed.

Next state entered depends only on the current state.

MPs can be analyzed via mathematical analysis, but when the state space is large, numerical analysis or simulation may be required.

Suppose the state space is $\{1,2,\ldots,m\}$. The generator of the MP describes how $Y_t$ evolves over time.

$$G = \begin{pmatrix}
g_{11} & g_{12} & \cdots & g_{1m} \\
g_{21} & g_{22} & \cdots & g_{2m} \\
\vdots & \vdots & \ddots & \vdots \\
g_{m1} & g_{m2} & \cdots & g_{mm}
\end{pmatrix}$$

For $i \neq j$, $1/g_{ij}$ is the mean of the exponential time until the process moves from state $i$ to $j$.

$g_{ij}$ can be interpreted as the transition rate from $i$ to $j$.

For $i = j$, $-1/g_{ii} = 1/\sum_{j \neq i} g_{ij}$ is the mean of the exponential holding time in state $i$.

$-g_{ii}$ can be interpreted as the transition rate out of state $i$.

TTF EXAMPLE

State space {0,1,2} corresponds to number of functional computers.

B is the time until computer breakdown, exponentially distributed with mean $1/\lambda$, or failure rate $\lambda$.

R is the time to repair a computer, exponentially distributed with mean $1/\mu$, or repair rate $\mu$.

Generator for $\{Y_t;\ t \ge 0\}$ is

$$G = \begin{pmatrix}
0 & 0 & 0 \\
\lambda & -(\lambda+\mu) & \mu \\
0 & \lambda & -\lambda
\end{pmatrix}$$

Estimate $\theta = E[\mathrm{TTF}]$ when $\lambda = 1$ per day, $\mu = 1000$ per day, from $n = 1000$ replications.

STATE-CHANGE PROCESS

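Let $\{X_n;\ n = 0,1,2,\ldots\}$ be the state-change process, where $n$ counts the number of state changes without regard to time.

The probability of transition from $i$ to $j$ is $p_{ij} = -g_{ij}/g_{ii}$, and from $i$ to $i$ is $p_{ii} = 0$, provided $-g_{ii} > 0$. If $g_{ii} = 0$ then $p_{ii} = 1$.

Transition matrix for TTF example is

$$P = \begin{pmatrix}
1 & 0 & 0 \\
\lambda/(\lambda+\mu) & 0 & \mu/(\lambda+\mu) \\
0 & 1 & 0
\end{pmatrix}
= \begin{pmatrix}
1 & 0 & 0 \\
1/1001 & 0 & 1000/1001 \\
0 & 1 & 0
\end{pmatrix}$$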
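The construction of the state-change transition matrix from a generator can be sketched in Python. This is an illustration rather than part of the original notes: the helper name `jump_matrix` is hypothetical, and exact fractions are used only so the result can be compared with the matrix above. Each off-diagonal rate is divided by the total rate out of the state, and a state with zero exit rate is treated as absorbing.

```python
from fractions import Fraction


def jump_matrix(G):
    """Return the state-change transition matrix for generator G (list of rows)."""
    m = len(G)
    P = [[Fraction(0)] * m for _ in range(m)]
    for i in range(m):
        out_rate = -G[i][i]              # total rate out of state i
        if out_rate == 0:
            P[i][i] = Fraction(1)        # absorbing state
        else:
            for j in range(m):
                if j != i:
                    P[i][j] = Fraction(G[i][j]) / out_rate
    return P


# TTF example: states 0, 1, 2 = number of functional computers.
lam, mu = Fraction(1), Fraction(1000)
G = [
    [0, 0, 0],
    [lam, -(lam + mu), mu],
    [0, lam, -lam],
]
P = jump_matrix(G)
for row in P:
    print([str(p) for p in row])
```

Each row of the result sums to one, and row 1 reproduces the 1/1001 and 1000/1001 entries shown above.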


CODE FOR TTF

Public Sub TTF(Lambda As Double, Mu As Double, Sum As Double)
' Sub to generate one replication of the TTF for the
' airline reservation system
' Variables:
'   State  = number of operational computers
'   Lambda = computer failure rate
'   Mu     = computer repair rate
'   Sum    = generated TTF value that is returned from the call
' Exponential(m) is called here with the mean m = 1/rate as its argument.

    Dim State As Integer
    Dim Fail As Double
    Dim Repair As Double

    State = 2
    Sum = 0
    While State > 0
        If State = 2 Then
            Fail = Exponential(1 / Lambda)
            Sum = Sum + Fail
            State = 1
        Else
            Fail = Exponential(1 / Lambda)
            Repair = Exponential(1 / Mu)
            If Repair < Fail Then
                Sum = Sum + Repair
                State = 2
            Else
                Sum = Sum + Fail
                State = 0
            End If
        End If
    Wend
End Sub


CRUDE EXPERIMENT

Variance is reduced relative to a crude experiment.

Example: Estimate $\theta = E[Y]$ (expected TTF)

If $Y_1, \ldots, Y_n$ are i.i.d. then $\mathrm{Var}[\bar{Y}] = \sigma_Y^2/n$ where

$$\sigma_Y^2 = \mathrm{Var}[Y] = E[(Y - \theta)^2]$$

VRTs try to do better than this.

number of independent replications: $n$

point estimator: $\bar{Y}$

interval estimator: $\bar{Y} \pm t_{1-\alpha/2,\,n-1}\, S/\sqrt{n}$
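A crude experiment for the TTF example might look like the following Python sketch. It is a port written for illustration, not the author's code: the function names and the seed are hypothetical, `random.expovariate` takes a rate (not a mean), and the normal quantile 1.96 stands in for the t quantile, which is a close approximation at n = 1000.

```python
import math
import random
import statistics


def ttf(lam, mu, rng):
    """One replication of the time to failure, starting with 2 working computers."""
    state, total = 2, 0.0
    while state > 0:
        fail = rng.expovariate(lam)          # time until the next breakdown
        if state == 2:
            total += fail
            state = 1
        else:
            repair = rng.expovariate(mu)     # time to finish the repair
            if repair < fail:
                total += repair
                state = 2
            else:
                total += fail
                state = 0
    return total


def crude_experiment(n, lam=1.0, mu=1000.0, seed=12345):
    """Return the sample mean and its standard error from n i.i.d. replications."""
    rng = random.Random(seed)
    ys = [ttf(lam, mu, rng) for _ in range(n)]
    ybar = statistics.fmean(ys)
    se = statistics.stdev(ys) / math.sqrt(n)
    return ybar, se


ybar, se = crude_experiment(1000)
half_width = 1.96 * se                       # normal approximation to the t quantile
print(f"point estimate {ybar:.1f}, 95% CI half-width {half_width:.1f}")
```

The half-width printed here is the benchmark that each VRT tries to beat at the same n.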


INVERSE CDF REVIEW

To generate $X \sim F$ we can set

$$X = F^{-1}(U)$$

with $U \sim U(0,1)$.

If we want $X_1 \sim F_1$ and $X_2 \sim F_2$ but can otherwise choose the joint distribution then...

the minimum $\mathrm{Cov}[X_1, X_2]$ (and correlation) occurs when

$$X_1 = F_1^{-1}(U) \quad\text{and}\quad X_2 = F_2^{-1}(1 - U)$$

and the maximum $\mathrm{Cov}[X_1, X_2]$ (and correlation) occurs when

$$X_1 = F_1^{-1}(U) \quad\text{and}\quad X_2 = F_2^{-1}(U)$$
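The two extreme couplings can be checked numerically. The sketch below assumes exponential marginals with means 1 and 2, chosen only for illustration, and all helper names are hypothetical. It feeds the same uniforms through the inverse cdf either as (U, 1 − U) or as (U, U) and compares the sample correlations.

```python
import math
import random
import statistics


def exp_inverse_cdf(u, mean):
    """Inverse cdf of the exponential distribution with the given mean."""
    return -mean * math.log(1.0 - u)


def open_uniform(rng):
    """U(0,1) draw that excludes 0, so both u and 1 - u are safe inside log."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return u


def correlation(xs, ys):
    """Sample correlation coefficient."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(2024)
us = [open_uniform(rng) for _ in range(10_000)]
x1 = [exp_inverse_cdf(u, 1.0) for u in us]
x2_anti = [exp_inverse_cdf(1.0 - u, 2.0) for u in us]   # uses 1 - U
x2_same = [exp_inverse_cdf(u, 2.0) for u in us]         # uses U

corr_min = correlation(x1, x2_anti)
corr_max = correlation(x1, x2_same)
print(f"antithetic: {corr_min:.3f}, common: {corr_max:.3f}")
```

Note that the most negative correlation achievable is generally not −1; it depends on the shapes of the two marginals.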


ANTITHETIC VARIATES

AV applies to estimating an absolute parameter.

The idea is to balance bad system performance with good system performance to obtain a better estimate of mean performance.

This is accomplished by balancing the pseudorandom numbers across replications.

Rather than balance across all n replications, we typically balance across pairs of replications.

We hope to end up with negatively correlated (antithetic) pairs of outputs.

INDUCING NEGATIVE CORRELATION

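On replication $2i - 1$ use $U_1, U_2, \ldots$

On replication $2i$ use $1 - U_1, 1 - U_2, \ldots$

Use different random numbers on replications $(2i-1, 2i)$ and $(2j-1, 2j)$ for $i \neq j$ (implies independent).

Let $\bar{Y}_j = (Y_{2j-1} + Y_{2j})/2$, $j = 1, 2, \ldots, n/2$ and

$$\bar{\bar{Y}} = \frac{1}{n/2} \sum_{j=1}^{n/2} \bar{Y}_j$$

$$\mathrm{Var}[\bar{\bar{Y}}] = \frac{\sigma_Y^2}{n}(1 + \rho)$$

where $\rho = \mathrm{Corr}[Y_{2j-1}, Y_{2j}]$.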
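The variance formula for the antithetic estimator follows in two steps, filled in here as a worked derivation. It uses the fact that both members of a pair have the same marginal variance $\sigma_Y^2$ (because $1 - U$ is also U(0,1)), that $\rho$ is the within-pair correlation, and that the $n/2$ pair averages are independent.

```latex
\mathrm{Var}[\bar{Y}_j]
= \tfrac{1}{4}\left(\mathrm{Var}[Y_{2j-1}] + \mathrm{Var}[Y_{2j}] + 2\,\mathrm{Cov}[Y_{2j-1}, Y_{2j}]\right)
= \tfrac{1}{4}\left(2\sigma_Y^2 + 2\rho\,\sigma_Y^2\right)
= \frac{\sigma_Y^2}{2}(1 + \rho)

\mathrm{Var}[\bar{\bar{Y}}]
= \frac{1}{n/2}\,\mathrm{Var}[\bar{Y}_j]
= \frac{\sigma_Y^2}{n}(1 + \rho)
```

With $\rho = 0$ this reduces to the crude-experiment variance $\sigma_Y^2/n$.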

AV EXPERIMENT

number of independent replications: $n/2$

point estimator: $\bar{\bar{Y}}$

interval estimator: $\bar{\bar{Y}} \pm t_{1-\alpha/2,\,n/2-1}\, S\sqrt{2/n}$

where

$$S^2 = \frac{1}{n/2 - 1} \sum_{j=1}^{n/2} \left(\bar{Y}_j - \bar{\bar{Y}}\right)^2$$

It is possible to induce an antithetic effect among k-tuples rather than pairs.

The generation schemes tend to be complicated and there is no general optimal transformation for k > 2.

IMPLEMENTATION

In replications $2i - 1$ and $2i$ we want $U$ and $1 - U$ to be used for the same purpose. Assigning a distinct stream to each input process helps.

The random variate generators in many simulation languages have calling sequences like

NORMAL(mean, stddev, stream)

Supposedly if you use -stream you obtain the antithetic variates.

However, if the generator is not inverse cdf then there may be no antithetic effect.

CODE FOR AV

To keep the random numbers synchronized, we run the pairs together until the last one quits.

Public Sub TTFAV(Lambda As Double, Mu As Double, _
                 Sum As Double, AVSum As Double)
' Sub to generate one pair of antithetic replications of the TTF for the
' airline reservation system
' Variables:
'   State   = number of operational computers (primary run)
'   AVState = number of operational computers (antithetic run)
'   Lambda  = computer failure rate
'   Mu      = computer repair rate
'   Sum     = generated TTF value that is returned from the call
'   AVSum   = TTF value of the antithetic run, also returned

    Dim State As Integer
    Dim AVState As Integer
    Dim Fail As Double
    Dim Repair As Double
    Dim U1 As Double
    Dim U2 As Double

    State = 2
    AVState = 2
    Sum = 0
    AVSum = 0

    While State > 0 Or AVState > 0   ' simulate until the pair of runs is complete
        U1 = MRG32k3a()              ' random number for failure
        U2 = MRG32k3a()              ' random number for repair

        ' primary run: inverse cdf applied to U
        If State > 0 Then
            Fail = -VBA.Log(1 - U1) / Lambda
            Repair = -VBA.Log(1 - U2) / Mu
            If State = 2 Then
                Sum = Sum + Fail
                State = 1
            Else
                If Repair < Fail Then
                    Sum = Sum + Repair
                    State = 2
                Else
                    Sum = Sum + Fail
                    State = 0
                End If
            End If
        End If

        ' antithetic run: inverse cdf applied to 1 - U
        If AVState > 0 Then
            Fail = -VBA.Log(U1) / Lambda
            Repair = -VBA.Log(U2) / Mu
            If AVState = 2 Then
                AVSum = AVSum + Fail
                AVState = 1
            Else
                If Repair < Fail Then
                    AVSum = AVSum + Repair
                    AVState = 2
                Else
                    AVSum = AVSum + Fail
                    AVState = 0
                End If
            End If
        End If
    Wend
End Sub

EFFECTIVENESS OF AV

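If $\rho < 0$ then $\mathrm{Var}[\bar{\bar{Y}}] < \mathrm{Var}[\bar{Y}]$

If $\rho$ is below a threshold that depends on $(\alpha, n)$ then

$$E\left[t_{1-\alpha/2,\,n/2-1}\, S\sqrt{2/n}\right] < E\left[t_{1-\alpha/2,\,n-1}\, S/\sqrt{n}\right]$$

where S on the left is computed from the pair averages and S on the right from the individual outputs.

The threshold approaches 0 as n increases; for $n \ge 100$ it is approximately 0 for $\alpha = 0.01, 0.05, 0.1$.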

CONDITIONAL EXPECTED VALUE

Suppose S and T have joint distribution

                    T
  S           1       2       3     | Pr{S}
  1         2/10    1/10    1/10    | 4/10
  2         1/20    8/20    3/20    | 12/20
  Pr{T}     5/20   10/20    5/20

Then

E[T] = (1)5/20 + (2)10/20 + (3)5/20 = 2

The conditional distribution of T given S is

  a                      1       2       3     | Total
  Pr{T = a | S = 1}     2/4     1/4     1/4    | 1.00
  Pr{T = a | S = 2}     1/12    8/12    3/12   | 1.00

Therefore,

E[T | S = 1] = (1)2/4 + (2)1/4 + (3)1/4 = 7/4

E[T | S = 2] = (1)1/12 + (2)8/12 + (3)3/12 = 26/12

A fundamental result from mathematical statistics is that

$$E[T] = E_S\big[E_{T|S}[T \mid S]\big]$$

In our case

$$E_S\big[E_{T|S}[T \mid S]\big] = E[T \mid S = 1]\Pr\{S = 1\} + E[T \mid S = 2]\Pr\{S = 2\} = (7/4)(4/10) + (26/12)(12/20) = 2$$

For variance reduction, these results imply that we can use E[T | S] to estimate E[T].
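The arithmetic of this example can be verified with exact fractions. The short Python sketch below is an illustration only; the dictionary layout and function names are choices made here. It encodes the joint distribution of (S, T) and checks that the direct expectation and the iterated expectation agree.

```python
from fractions import Fraction as F

# Joint probabilities Pr{S = s, T = t}, keyed by (s, t).
joint = {
    (1, 1): F(2, 10), (1, 2): F(1, 10), (1, 3): F(1, 10),
    (2, 1): F(1, 20), (2, 2): F(8, 20), (2, 3): F(3, 20),
}


def marginal_s(s):
    """Pr{S = s}."""
    return sum(p for (si, _), p in joint.items() if si == s)


def cond_mean_t(s):
    """E[T | S = s]."""
    return sum(t * p for (si, t), p in joint.items() if si == s) / marginal_s(s)


e_t_direct = sum(t * p for (_, t), p in joint.items())
e_t_iterated = sum(cond_mean_t(s) * marginal_s(s) for s in (1, 2))

print(cond_mean_t(1), cond_mean_t(2), e_t_direct, e_t_iterated)
```

The value 26/12 appears as 13/6 because `Fraction` reduces to lowest terms.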

CONTROL VARIATES

In CV we approximate E[Y | C] as

$$E[Y \mid C] \approx \beta_0 + \beta_1 (C - \mu_C) \qquad (1)$$

where $\mu_C = E[C]$. Therefore

$$\theta = E[Y] = E\big[E[Y \mid C]\big] = \beta_0$$

If we observe $(Y_i, C_i - \mu_C)$, $i = 1, 2, \ldots, n$, then we can estimate $\beta_0$ (and thus $\theta$) via a least-squares regression.

If Y and C are strongly correlated, then this estimator will have smaller variance than $\bar{Y}$.

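The least-squares estimator of $\theta = \beta_0$ is

$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 (\bar{C} - \mu_C)$$

where

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})(C_i - \bar{C})}{\sum_{i=1}^{n} (C_i - \bar{C})^2}.$$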
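The control-variate point estimator is short to compute. The Python sketch below implements the two formulas above and then tries them on synthetic data, which is an assumption made purely for illustration: Y = 5 + 2(C − 0.5) + noise with C uniform on (0,1), so the known control mean is 0.5 and the target is 5. The function names are hypothetical.

```python
import random
import statistics


def cv_estimate(ys, cs, mu_c):
    """Return (beta0_hat, beta1_hat) for a single control with known mean mu_c."""
    ybar = statistics.fmean(ys)
    cbar = statistics.fmean(cs)
    s_yc = sum((y - ybar) * (c - cbar) for y, c in zip(ys, cs))
    s_cc = sum((c - cbar) ** 2 for c in cs)
    beta1 = s_yc / s_cc
    beta0 = ybar - beta1 * (cbar - mu_c)     # adjust Ybar by the observed error in Cbar
    return beta0, beta1


def one_experiment(rng, n=50):
    """One synthetic experiment; returns (crude estimate, CV estimate)."""
    cs = [rng.random() for _ in range(n)]
    ys = [5.0 + 2.0 * (c - 0.5) + rng.gauss(0.0, 0.1) for c in cs]
    beta0, _ = cv_estimate(ys, cs, mu_c=0.5)
    return statistics.fmean(ys), beta0


rng = random.Random(7)
results = [one_experiment(rng) for _ in range(500)]   # macroreplications
var_crude = statistics.variance([r[0] for r in results])
var_cv = statistics.variance([r[1] for r in results])
mean_cv = statistics.fmean([r[1] for r in results])
print(f"Var crude {var_crude:.5f}, Var CV {var_cv:.5f}, mean CV {mean_cv:.3f}")
```

Repeating the whole experiment many times, as done here, is the only way to see the variance of a point estimator directly; a single experiment yields one number from each method.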

How does $\hat{\beta}_0$ compare to $\bar{Y}$?

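Result: If the linear model is correct, then $E[\hat{\beta}_1] = \beta_1$.

Proof: Let $\mathbf{C} = (C_1, C_2, \ldots, C_n)$. Then

$$E[\hat{\beta}_1 \mid \mathbf{C} = \mathbf{c}]
= \frac{\sum E[(Y_i - \bar{Y}) \mid \mathbf{C} = \mathbf{c}]\,(c_i - \bar{c})}{\sum (c_i - \bar{c})^2}
= \frac{\sum \left(\beta_0 + \beta_1 (c_i - \mu_C) - \beta_0 - \beta_1(\bar{c} - \mu_C)\right)(c_i - \bar{c})}{\sum (c_i - \bar{c})^2}
= \beta_1.$$

By the double expectation theorem, $E[\hat{\beta}_1] = \beta_1$.

Result: If the linear model is correct then $E[\hat{\beta}_0] = \beta_0$.

Proof: $E[\hat{\beta}_0 \mid \mathbf{C} = \mathbf{c}] = E[\bar{Y} \mid \mathbf{C} = \mathbf{c}] - E[\hat{\beta}_1 \mid \mathbf{C} = \mathbf{c}](\bar{c} - \mu_C) = \beta_0 + \beta_1(\bar{c} - \mu_C) - \beta_1(\bar{c} - \mu_C) = \beta_0$. Then the double expectation theorem gives the result.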

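Notice that

$$\mathrm{Var}[\hat{\beta}_0] = \mathrm{Var}\big[E[\hat{\beta}_0 \mid \mathbf{C}]\big] + E\big[\mathrm{Var}[\hat{\beta}_0 \mid \mathbf{C}]\big] = E\big[\mathrm{Var}[\hat{\beta}_0 \mid \mathbf{C}]\big]$$

since $\mathrm{Var}\big[E[\hat{\beta}_0 \mid \mathbf{C}]\big] = \mathrm{Var}[\beta_0] = 0$ from the proof of the result above.

Under the special assumption of constant conditional variance ($\mathrm{Var}[Y \mid C] = \sigma^2$), the following result can be derived:

Result: If the linear model is correct and we have constant conditional variance then

$$\mathrm{Var}[\hat{\beta}_0 \mid \mathbf{C}] = \sigma^2 \left( \frac{1}{n} + \frac{(\bar{C} - \mu_C)^2}{\sum (C_i - \bar{C})^2} \right).$$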

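If, further, (Y, C) are jointly normally distributed with correlation $\rho$, then $\sigma^2 = (1 - \rho^2)\sigma_Y^2$ and

$$E\left[ \frac{(\bar{C} - \mu_C)^2}{\sum (C_i - \bar{C})^2} \right] = \frac{1}{n(n-3)}.$$

Result: If (Y, C) are bivariate normal, then

$$\mathrm{Var}[\hat{\beta}_0] = \left( \frac{n-2}{n-3} \right) (1 - \rho^2)\, \frac{\sigma_Y^2}{n}.$$

Thus, if $\rho^2 > 1/(n-2)$ then the control-variate estimator has smaller variance than the sample mean.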
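The threshold on the squared correlation comes from comparing the control-variate variance with the sample-mean variance $\sigma_Y^2/n$. The intermediate algebra, filled in here as a worked step, is:

```latex
\left(\frac{n-2}{n-3}\right)(1-\rho^2)\,\frac{\sigma_Y^2}{n} < \frac{\sigma_Y^2}{n}
\iff 1-\rho^2 < \frac{n-3}{n-2}
\iff \rho^2 > 1 - \frac{n-3}{n-2} = \frac{1}{n-2}
```

For example, with n = 30 the condition is $\rho^2 > 1/28$, that is, $|\rho|$ larger than about 0.19. The factor $(n-2)/(n-3)$ is the price paid for estimating the regression slope.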


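The se and confidence interval for $\theta$ are the usual ones for the intercept term of a least-squares regression.

Regression set up:

$$\mathbf{Y} = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}
= \begin{pmatrix} 1 & C_1 - \mu_C \\ 1 & C_2 - \mu_C \\ \vdots & \vdots \\ 1 & C_n - \mu_C \end{pmatrix}
\begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix} + \boldsymbol{\varepsilon}
= \mathbf{C}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$$

$$\hat{\boldsymbol{\beta}} = (\mathbf{C}'\mathbf{C})^{-1}\mathbf{C}'\mathbf{Y}$$

$$\hat{\beta}_0 \pm t_{1-\alpha/2,\,n-2}\, \mathrm{se}$$

$$\widehat{\mathrm{Var}}[\hat{\boldsymbol{\beta}}] = (\mathbf{Y}'\mathbf{Y} - \hat{\boldsymbol{\beta}}'\mathbf{C}'\mathbf{Y})(\mathbf{C}'\mathbf{C})^{-1}/(n-2)$$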

CV for TTF

Possible control variates:

We know the distribution of the time until a computer breakdown.

$C_1$ = average time until a computer breakdown ($E[C_1] = 1/\lambda$)

The number of times the process enters state 1 has a geometric distribution with parameter $\lambda/(\lambda + \mu)$.

$C_2$ = number of times process is in state 1 ($E[C_2] = (\lambda + \mu)/\lambda$)

CODE FOR CV

Public Sub TTFCV(Lambda As Double, Mu As Double, Sum As Double, _
                 CV1 As Double, CV2 As Double)
' Sub to generate one replication of the TTF for the
' airline reservation system and record two control variates
' Variables:
'   State  = number of operational computers
'   Lambda = computer failure rate
'   Mu     = computer repair rate
'   Sum    = generated TTF value that is returned from the call
'   Sum1   = sum associated with control variate number 1, average computer failure time
'   Count1 = counter associated with CV1
'   Count2 = counter for CV2, the number of times the process is in state 1
'   CV#    = control variate number #

    Dim State As Integer
    Dim Fail As Double
    Dim Repair As Double
    Dim Sum1 As Double
    Dim Count1 As Double
    Dim Count2 As Double

    State = 2
    Sum = 0
    Sum1 = 0
    Count1 = 0
    Count2 = 0

    While State > 0
        If State = 2 Then
            Fail = Exponential(1 / Lambda)
            Sum = Sum + Fail
            State = 1
            ' record control variate data
            Sum1 = Sum1 + Fail
            Count1 = Count1 + 1
        Else
            Fail = Exponential(1 / Lambda)
            Repair = Exponential(1 / Mu)
            ' record control variate data
            Sum1 = Sum1 + Fail
            Count1 = Count1 + 1
            Count2 = Count2 + 1
            If Repair < Fail Then
                Sum = Sum + Repair
                State = 2
            Else
                Sum = Sum + Fail
                State = 0
                ' compute control variates (centered at their known means)
                CV1 = Sum1 / Count1 - 1 / Lambda
                CV2 = Count2 - (Lambda + Mu) / Lambda
            End If
        End If
    Wend
End Sub

Comments about CV:

When the linear relationship does not hold, the CV estimator is biased. Usually not a problem with large samples, but there are remedies.

To estimate a probability $p = \Pr\{Y \le a\}$, let

$Z_i = I(Y_i \le a)$

$C_i = I(X_i \le b)$ for X with known probability near p and strongly correlated with Y.

Then do least-squares regression of Z on $C - \mu_C$.

Control-variate estimators for quantiles are obtained by inverting the probability estimator above.

Multiple control variates can be used (via a multiple regression) but make sure n is much larger than the number of control variates.

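Regression set up:

$$\begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}
= \begin{pmatrix}
1 & C_{11} - \mu_{C_1} & \cdots & C_{1q} - \mu_{C_q} \\
1 & C_{21} - \mu_{C_1} & \cdots & C_{2q} - \mu_{C_q} \\
\vdots & \vdots & & \vdots \\
1 & C_{n1} - \mu_{C_1} & \cdots & C_{nq} - \mu_{C_q}
\end{pmatrix}
\begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_q \end{pmatrix} + \boldsymbol{\varepsilon}$$

If Y and C are jointly normally distributed

$$\mathrm{Var}[\hat{\beta}_0] = \left( \frac{n-2}{n-q-2} \right) (1 - R^2_{YC})\, \mathrm{Var}[\bar{Y}]$$

Again, the usual se and confidence interval for $\beta_0$ apply.

$$\hat{\beta}_0 \pm t_{1-\alpha/2,\,n-q-1}\, \mathrm{se}$$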

CONDITIONAL EXPECTATIONS

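CE is useful when E[Y | X] is known for all X, because

$$E\big[E[Y \mid X]\big] = \theta$$

and

$$\mathrm{Var}\big[E[Y \mid X]\big] = \mathrm{Var}[Y] - E\big[\mathrm{Var}[Y \mid X]\big]$$

Instead of observing $Y_1, \ldots, Y_n$, we observe $X_1, \ldots, X_n$ and let

$$\bar{Y}_{ce} = \frac{1}{n} \sum_{i=1}^{n} E[Y \mid X_i]$$

CE gives a guaranteed variance reduction.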
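The guarantee follows in one line from the variance decomposition, because a conditional variance can never be negative. Written out as a worked step for n i.i.d. observations:

```latex
\mathrm{Var}[\bar{Y}_{ce}]
= \frac{\mathrm{Var}\big[E[Y \mid X]\big]}{n}
= \frac{\mathrm{Var}[Y] - E\big[\mathrm{Var}[Y \mid X]\big]}{n}
\le \frac{\mathrm{Var}[Y]}{n}
= \mathrm{Var}[\bar{Y}]
```

The size of the reduction is $E[\mathrm{Var}[Y \mid X]]/n$, so CE helps most when X leaves a lot of the variability of Y unexplained.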


CE EXPERIMENT

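number of independent replications: $n$

point estimator: $\bar{Y}_{ce}$

interval estimator: $\bar{Y}_{ce} \pm t_{1-\alpha/2,\,n-1}\, S/\sqrt{n}$

where

$$S^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left( E[Y \mid X_i] - \bar{Y}_{ce} \right)^2$$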

CODE FOR CE

Public Sub TTFCE(Lambda As Double, Mu As Double, Sum As Double)
' Sub to generate one replication of the TTF for the
' airline reservation system using conditional expectations
' Variables:
'   State  = number of operational computers
'   Lambda = computer failure rate, Mu = computer repair rate
'   Sum    = generated TTF value that is returned from the call
'   HoldingTime# = expected holding time in state #

    Dim State As Integer
    Dim Fail As Double
    Dim Repair As Double
    Dim HoldingTime1 As Double
    Dim HoldingTime2 As Double

    HoldingTime1 = 1 / (Lambda + Mu)
    HoldingTime2 = 1 / Lambda
    State = 2
    Sum = 0
    While State > 0
        If State = 2 Then
            Fail = Exponential(1 / Lambda)
            Sum = Sum + HoldingTime2
            State = 1
        Else
            Fail = Exponential(1 / Lambda)
            Repair = Exponential(1 / Mu)
            If Repair < Fail Then
                Sum = Sum + HoldingTime1
                State = 2
            Else
                Sum = Sum + HoldingTime1
                State = 0
            End If
        End If
    Wend
End Sub

IMPORTANCE SAMPLING

Suppose we represent

$$\theta = \int_A g(z) f(z)\, dz$$

($\theta$ could be a probability if g is an indicator function).

Provided $f'(z) > 0$ when $f(z) > 0$, we can rewrite $\theta$ as

$$\theta = \int_A g(z)\, \frac{f(z)}{f'(z)}\, f'(z)\, dz$$

This is now an expectation with respect to $f'$.

If $g(z) f(z)/f'(z)$ is nearly constant for all z, then we have reduced variance.

In fact, if $f' = g f/\theta$ then the variance is 0!
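A small rare-event example shows the rewrite in action. The setup is an assumption chosen for illustration, not taken from the slides: Z is exponential with rate 1, g is the indicator of Z > a with a = 10, and the sampling density is exponential with the smaller rate 0.1, so that values beyond a are no longer rare. The function name and seed are hypothetical.

```python
import math
import random
import statistics


def is_tail_probability(a, n, rate_prime, rng):
    """Estimate Pr{Z > a} for Z ~ exponential(rate 1) by sampling from f'."""
    terms = []
    for _ in range(n):
        z = rng.expovariate(rate_prime)                    # draw from f'
        f = math.exp(-z)                                   # original density f(z)
        f_prime = rate_prime * math.exp(-rate_prime * z)   # sampling density f'(z)
        g = 1.0 if z > a else 0.0                          # indicator g(z)
        terms.append(g * f / f_prime)
    estimate = statistics.fmean(terms)
    std_error = statistics.stdev(terms) / math.sqrt(n)
    return estimate, std_error


rng = random.Random(11)
estimate, std_error = is_tail_probability(a=10.0, n=10_000, rate_prime=0.1, rng=rng)
exact = math.exp(-10.0)          # known answer for this toy problem
print(f"IS estimate {estimate:.3e} (se {std_error:.1e}), exact {exact:.3e}")
```

A crude experiment of the same size would expect to see fewer than one exceedance of a = 10, which is why the weighted estimator is far more precise here. The choice of 0.1 is not optimal; it only needs to put reasonable mass where g(z)f(z) is non-negligible.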


The random variable $f(Z)/f'(Z)$ is called the likelihood ratio (LR).

Frequently, $Z = (Z_1, Z_2, \ldots, Z_N)$ are independent so that

$$\mathrm{LR} = \frac{f(Z)}{f'(Z)} = \frac{\prod_{i=1}^{N} f(Z_i)}{\prod_{i=1}^{N} f'(Z_i)}$$

In dynamic simulation N can become quite large, making this term unstable.

Selecting $f'$ is not easy in general, and a bad selection can increase variance.

IS for TTF

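If we change the failure rate to $\lambda'$ and the repair rate to $\mu'$ then the LR for $Y_t$ is

$$\mathrm{LR}
= \frac{\prod_{i=1}^{N_B} \lambda e^{-\lambda B_i} \prod_{j=1}^{N_R} \mu e^{-\mu R_j}}
       {\prod_{i=1}^{N_B} \lambda' e^{-\lambda' B_i} \prod_{j=1}^{N_R} \mu' e^{-\mu' R_j}}
= \frac{\lambda^{N_B} e^{-\lambda \sum_{i=1}^{N_B} B_i}\; \mu^{N_R} e^{-\mu \sum_{j=1}^{N_R} R_j}}
       {(\lambda')^{N_B} e^{-\lambda' \sum_{i=1}^{N_B} B_i}\; (\mu')^{N_R} e^{-\mu' \sum_{j=1}^{N_R} R_j}}$$

If we only simulate the state-change process $X_n$ then the LR is

$$\mathrm{LR} = \frac{\prod_{n=1}^{N} p_{X_{n-1},X_n}}{\prod_{n=1}^{N} p'_{X_{n-1},X_n}}$$

Try changing $\lambda = 1$ to $\lambda' = 1/2$.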

CODE FOR IS

Public Sub TTFIS(Lambda As Double, Mu As Double, _
                 LambdaPrime As Double, MuPrime As Double, _
                 Sum As Double, LikelihoodRatio As Double)
' Sub to generate one replication of the TTF for the
' airline reservation system using importance sampling
' Variables:
'   State       = number of operational computers
'   Lambda      = computer failure rate
'   Mu          = computer repair rate
'   LambdaPrime = biased computer failure rate
'   MuPrime     = biased computer repair rate
'   Sum         = generated TTF value that is returned from the call
'   LikelihoodRatio = likelihood ratio

    Dim State As Integer
    Dim Fail As Double
    Dim Repair As Double

    ' initialize likelihood ratio
    LikelihoodRatio = 1

    State = 2
    Sum = 0
    While State > 0
        If State = 2 Then
            Fail = Exponential(1 / LambdaPrime)
            Sum = Sum + Fail
            State = 1
            LikelihoodRatio = LikelihoodRatio * Lambda * Exp(-Lambda * Fail) / _
                (LambdaPrime * Exp(-LambdaPrime * Fail))
        Else
            Fail = Exponential(1 / LambdaPrime)
            Repair = Exponential(1 / MuPrime)
            If Repair < Fail Then
                Sum = Sum + Repair
                State = 2
                LikelihoodRatio = LikelihoodRatio * Lambda * Exp(-Lambda * Fail) _
                    / (LambdaPrime * Exp(-LambdaPrime * Fail)) _
                    * Mu * Exp(-Mu * Repair) / (MuPrime * Exp(-MuPrime * Repair))
            Else
                Sum = Sum + Fail
                State = 0
                LikelihoodRatio = LikelihoodRatio * Lambda * Exp(-Lambda * Fail) _
                    / (LambdaPrime * Exp(-LambdaPrime * Fail)) _
                    * Mu * Exp(-Mu * Repair) / (MuPrime * Exp(-MuPrime * Repair))
                ' weight the completed replication by the likelihood ratio
                Sum = Sum * LikelihoodRatio
            End If
        End If
    Wend
End Sub

CODE FOR IS+CE

Public Sub TTFIS(Lambda As Double, Mu As Double, _
                 LambdaPrime As Double, MuPrime As Double, _
                 Sum As Double, LikelihoodRatio As Double)
' Sub to generate one replication of the TTF for the
' airline reservation system using conditional expectations
' and importance sampling
' Variables:
'   State       = number of operational computers
'   Lambda      = computer failure rate
'   Mu          = computer repair rate
'   LambdaPrime = biased computer failure rate
'   MuPrime     = biased computer repair rate
'   Sum         = generated TTF value that is returned from the call
'   HoldingTime# = expected holding time in state # (computed from Lambda and Mu)
'   Product     = importance sampling accumulator
'   LikelihoodRatio = likelihood ratio
'   p12         = correct transition probability from state 1 to 2
'   is12        = biased transition probability from state 1 to 2

    Dim State As Integer
    Dim Fail As Double
    Dim Repair As Double
    Dim HoldingTime1 As Double
    Dim HoldingTime2 As Double
    Dim Product As Double
    Dim p12 As Double
    Dim is12 As Double

    ' compute expected holding times & transition probabilities
    HoldingTime1 = 1 / (Lambda + Mu)
    HoldingTime2 = 1 / Lambda
    p12 = Mu / (Lambda + Mu)
    is12 = MuPrime / (LambdaPrime + MuPrime)
    Product = 1
    State = 2
    Sum = 0

    While State > 0
        If State = 2 Then
            Fail = Exponential(1 / LambdaPrime)
            Sum = Sum + HoldingTime2
            State = 1
        Else
            Fail = Exponential(1 / LambdaPrime)
            Repair = Exponential(1 / MuPrime)
            If Repair < Fail Then
                Sum = Sum + HoldingTime1
                State = 2
                Product = Product * (p12 / is12)
            Else
                Sum = Sum + HoldingTime1
                State = 0
                LikelihoodRatio = Product * ((1 - p12) / (1 - is12))
                Sum = Sum * LikelihoodRatio
            End If
        End If
    Wend
End Sub

HOW DID THEY DO?

AV: a little variance reduction

CE: a little variance reduction

CV: huge variance reduction (97%)

IS: a little variance reduction

IS+CE: modest variance reduction (60%)


VRT EXERCISE

Problem: Estimate $p = \Pr\{Y > a\}$ and $E[Y]$ where

Y = max{X1 + X4 + X6, X1 + X3 + X5 + X6, X2 + X5 + X6}

Crude Experiment:

1. sumP ← 0, sumM ← 0

2. repeat n times:

     sample X1, . . . , X6

     Y = max{X1 + X4 + X6, X1 + X3 + X5 + X6, X2 + X5 + X6}

     sumP = sumP + I(Y > a)

     sumM = sumM + Y

3. return $\hat{p}$ = sumP/n and $\bar{Y}$ = sumM/n

DETAILS

Let the Xi be i.i.d. exponential(1).

Take a = 6, n = 30.

Try every variance reduction technique we learned on one of the two problems (or both).

You must demonstrate that at least one of your VRTs works on each problem. Use the experiment-within-an-experiment approach with m macroreplications and form a confidence interval on the variance ratio.

Remember that the practitioner gets only one experiment, but the researcher gets as many as necessary to establish properties.
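As a starting point, the crude experiment for this exercise can be sketched in Python. This is one possible baseline under the stated details (i.i.d. exponential(1) inputs, a = 6, n = 30); the function names and seed are hypothetical, and all six inputs X1 through X6 are sampled because Y depends on X6.

```python
import random


def sample_y(rng):
    """One observation of Y, the longest of the three path lengths."""
    x1, x2, x3, x4, x5, x6 = (rng.expovariate(1.0) for _ in range(6))
    return max(x1 + x4 + x6, x1 + x3 + x5 + x6, x2 + x5 + x6)


def crude_experiment(n, a, rng):
    """Return (p_hat, y_bar): crude estimates of Pr{Y > a} and E[Y]."""
    sum_p = 0
    sum_m = 0.0
    for _ in range(n):
        y = sample_y(rng)
        sum_p += 1 if y > a else 0
        sum_m += y
    return sum_p / n, sum_m / n


rng = random.Random(435)
p_hat, y_bar = crude_experiment(n=30, a=6.0, rng=rng)
print(f"p_hat = {p_hat:.3f}, y_bar = {y_bar:.3f}")
```

Wrapping `crude_experiment` in an outer loop over m macroreplications, each with its own random numbers, gives the sample of point estimates needed to estimate the variance of the crude estimator and to compare it with any VRT.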
