Output Analysis

*

Analysis of Simulation Experiments

*Outline
Introduction
Classification of Outputs
DIDO vs. RIRO Simulation
Analysis of One System
  Terminating vs. Steady-State Simulations
  Analysis of Terminating Simulations
  Obtaining a Specified Precision
  Analysis of Steady-State Simulations
  Method of Moving Average for Removing the Initial Bias
  Method of Batch Means
  Multiple Measures of Performance
Analysis of Several Systems
  Comparison of Two Alternative Systems
  Comparison of More than Two Systems
  Ranking and Selection

*Introduction
The greatest disadvantage of simulation:
  You don't get exact answers.
  Results are only estimates.

Careful design and analysis are needed to:
  Make these estimates as valid and precise as possible
  Interpret their meanings properly

Statistical methods are used to analyze the results of simulation experiments.

*What Outputs to Watch?
Think ahead about what you want to get out of the simulation:
  Average and worst (longest) time in system
  Average and worst time in queue(s)
  Average hourly production
  Standard deviation of hourly production
  Proportion of time a machine is up, idle, or down
  Maximum queue length
  Average number of parts in system

*Classification of Outputs
There are typically two types of dynamic processes:

Discrete-time process: There is a natural first observation, second observation, etc., but we can only observe them when they happen.
If $W_i$ = time in system for the ith part produced (for i = 1, 2, ..., N) and N parts are produced during the simulation, then $W_1, W_2, \dots, W_N$ is a discrete-time output process.

*Classification of Outputs
Typical discrete-time output performance measures:
  Average time in system: $\bar{W} = \frac{1}{N}\sum_{i=1}^{N} W_i$
  Maximum time in system: $\max_{1 \le i \le N} W_i$
  Proportion of parts that were in the system for more than 1 hour
  Delay of the ith customer in queue
  Throughput during the ith hour
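A minimal Python sketch of the first three measures above, assuming the times in system $W_1, \dots, W_N$ have already been collected in a list (in minutes). The function name `discrete_time_measures` and the six sample values are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def discrete_time_measures(w, threshold=60.0):
    """Summarize a discrete-time output W_1, ..., W_N (times in system, minutes)."""
    n = len(w)
    return {
        "average": sum(w) / n,                                # average time in system
        "maximum": max(w),                                    # maximum time in system
        "prop_over": sum(1 for x in w if x > threshold) / n,  # proportion in system > 1 hour
    }


# Hypothetical times in system (minutes) for N = 6 parts.
times = [42.0, 75.0, 30.0, 61.0, 55.0, 97.0]
print(discrete_time_measures(times))
# {'average': 60.0, 'maximum': 97.0, 'prop_over': 0.5}
```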

*Classification of Outputs
Continuous-time process: You can jump into the system at any point in (real, continuous) time and take a snapshot of something; there is no natural first or second observation.

If Q(t) = number of parts in a particular queue at time t, for t in [0, T], where the simulation is run for T units of simulated time, then $\{Q(t), 0 \le t \le T\}$ is a continuous-time output process.


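*Classification of Outputs
Typical continuous-time output performance measures:
  Time-average length of queue: $\bar{Q} = \frac{1}{T}\int_0^T Q(t)\,dt$
  Server utilization (proportion of time the server is busy): $\bar{B} = \frac{1}{T}\int_0^T B(t)\,dt$, where B(t) = 1 if the server is busy at time t and B(t) = 0 otherwise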
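Because a simulated Q(t) or B(t) is piecewise constant (it only changes at event times), the integral in each time average is just a sum of rectangle areas. The sketch below assumes the sample path is recorded as a list of change times and the value taken at each change; the function name `time_average` and the sample paths are hypothetical.

```python
def time_average(change_times, values, T):
    """(1/T) * integral_0^T X(t) dt for a piecewise-constant X(t).

    X(t) = values[k] on [change_times[k], change_times[k+1]); the last value
    holds until T.  change_times must start at 0 and be increasing.
    """
    if change_times[0] != 0 or change_times[-1] > T:
        raise ValueError("change times must start at 0 and not exceed T")
    area = 0.0
    for k, (t, v) in enumerate(zip(change_times, values)):
        t_next = change_times[k + 1] if k + 1 < len(change_times) else T
        area += v * (t_next - t)  # rectangle under X(t) on [t, t_next)
    return area / T


# Hypothetical sample paths over T = 10 minutes.
# Q(t): 0 on [0,2), 1 on [2,5), 2 on [5,6), 0 on [6,10)  -> Q-bar = 5/10
print(time_average([0, 2, 5, 6], [0, 1, 2, 0], T=10))
# B(t): idle on [0,2), busy on [2,8), idle on [8,10)     -> utilization = 6/10
print(time_average([0, 2, 8], [0, 1, 0], T=10))
```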

*Classification of Outputs
Other continuous-time performance measures:
  Number of parts in the system at time t
  Number of machines down at time t
  Proportion of time that there were more than n parts in the queue

*DIDO vs. RIRO Simulation
DIDO (deterministic input, deterministic output)

*DIDO vs. RIRO Simulation
RIRO (random input, random output)

*Analysis of One System
Single-server queue (M/M/1), replicated 10 times
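A replicated M/M/1 experiment of this kind can be sketched in Python as follows. This is a minimal sketch under our own assumptions: exponential interarrival and service times with hypothetical means of 1.0 and 0.9 minutes, 25 customers per replication, and Lindley's recursion for the delays instead of a full event-driven model. Giving each replication its own seeded random stream is our choice for keeping replications independent. The numbers it prints are illustrative only and are not the data behind this slide.

```python
import random


def mm1_average_delay(num_customers, mean_interarrival, mean_service, rng):
    """Average delay in queue of the first num_customers in an M/M/1 queue
    that starts empty and idle (Lindley's recursion)."""
    delay = 0.0   # D_1 = 0: the first customer finds the server idle
    total = 0.0
    for _ in range(num_customers):
        total += delay
        service = rng.expovariate(1.0 / mean_service)
        interarrival = rng.expovariate(1.0 / mean_interarrival)
        delay = max(0.0, delay + service - interarrival)  # D_{i+1} = max(0, D_i + S_i - A_{i+1})
    return total / num_customers


def replicate(num_reps, seed=12345, **model):
    """Run num_reps replications, each with its own seeded random stream."""
    return [mm1_average_delay(rng=random.Random(seed + r), **model)
            for r in range(num_reps)]


ys = replicate(10, num_customers=25, mean_interarrival=1.0, mean_service=0.9)
for i, y in enumerate(ys, start=1):
    print(f"replication {i:2d}: average delay in queue = {y:.2f}")
```

Each replication yields one number $Y_i$; these are the i.i.d. observations that the analysis on the following slides works with.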


*Analysis of One System
CAUTION: Because of the autocorrelation that exists in the output of virtually all simulation models, classical statistical methods don't work directly within a simulation run.

Time in system for individual jobs: $Y_1, Y_2, Y_3, \dots, Y_n$
$\mu$ = E(average time in system)
Sample mean:
$$\bar{Y}(n) = \frac{1}{n}\sum_{i=1}^{n} Y_i$$

$\bar{Y}(n)$ is an unbiased estimator for $\mu$, but how close is this sample mean to $\mu$?

We need to estimate $\text{Var}[\bar{Y}(n)]$ to get confidence intervals on $\mu$.

*Analysis of One System
Problem: Because of positive autocorrelation between $Y_i$ and $Y_{i+1}$ ($\text{Corr}(Y_i, Y_{i+1}) > 0$), the sample variance is no longer an unbiased estimator of the population variance (unbiasedness of the usual variance estimator relies on $Y_1, Y_2, \dots, Y_n$ being independent).

As a result, $S^2(n)/n$, where
$$S^2(n) = \frac{1}{n-1}\sum_{i=1}^{n}\left[Y_i - \bar{Y}(n)\right]^2$$
is the sample variance, may be severely biased as an estimator of $\text{Var}[\bar{Y}(n)]$.

In fact, usually $E[S^2(n)/n] < \text{Var}[\bar{Y}(n)]$.

Implication: Understating the variance causes us to have too much faith in our point estimates and to believe the results too much.
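The bias can be seen directly by simulation. The sketch below (our own illustration, with a hypothetical M/M/1 queue at utilization 0.8, 1000 customers per run and 200 independent runs) compares the naive within-run estimate $S^2(n)/n$, which treats the delays as if they were i.i.d., with the variance of $\bar{Y}(n)$ actually observed across independent runs.

```python
import random
import statistics


def mm1_delays(n, mean_interarrival, mean_service, rng):
    """Delays in queue D_1, ..., D_n from one M/M/1 run (Lindley's recursion)."""
    delays, d = [], 0.0
    for _ in range(n):
        delays.append(d)
        d = max(0.0, d + rng.expovariate(1.0 / mean_service)
                - rng.expovariate(1.0 / mean_interarrival))
    return delays


def compare_variance_estimates(runs=200, n=1000, seed=1):
    """Return (average naive S^2(n)/n, observed variance of Ybar(n) across runs)."""
    naive, means = [], []
    for r in range(runs):
        y = mm1_delays(n, 1.0, 0.8, random.Random(seed + r))  # utilization 0.8
        means.append(statistics.fmean(y))
        naive.append(statistics.variance(y) / n)  # treats D_1..D_n as if i.i.d.
    return statistics.fmean(naive), statistics.variance(means)


naive, actual = compare_variance_estimates()
print(f"mean of S^2(n)/n = {naive:.4f}; variance of Ybar(n) across runs = {actual:.4f}")
```

With strongly autocorrelated delays the naive figure comes out far smaller than the observed variance, which is exactly the understatement described above.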

*Types of Simulations with Regard to Output Analysis
Terminating: A simulation where there is a specific starting and stopping condition that is part of the model.

Steady-state: A simulation where there are no specific starting and ending conditions. Here, we are interested in the steady-state behavior of the system.

The type of analysis depends on the goal of the study.

*Examples of Terminating Simulations
A retail/commercial establishment (a bank) that operates from 9 to 5 daily and starts empty and idle at the beginning of each day. The output of interest may be the average wait time of the first 50 customers in the system.

A military confrontation between a blue force and a red force. The output of interest may be the probability that the red force loses half of its strength before the blue force loses half of its strength.

*Examples of Steady-State Simulations
A manufacturing company that operates 16 hours a day. The system here is a continuous process where the ending condition for one day is the initial condition for the next day. The output of interest here may be the expected long-run daily production.

A communication system where service must be provided continuously.

*Analysis for Terminating Simulations
Objective: Obtain a point estimate and a confidence interval for some parameter $\mu$.
Examples:
  $\mu$ = E(average time in system for n customers)
  $\mu$ = E(machine utilization)
  $\mu$ = E(work-in-process)

Reminder: We cannot use classical statistical methods within a simulation run, because observations from one run are not independent and identically distributed (i.i.d.).

*Analysis for Terminating Simulations
Make n independent replications of the model.

Let $Y_i$ be the performance measure from the ith replication:
  $Y_i$ = average time in system, or
  $Y_i$ = work-in-process, or
  $Y_i$ = utilization of a critical facility

Performance measures from different replications, $Y_1, Y_2, \dots, Y_n$, are i.i.d.

But only one sample is obtained from each replication.

Apply classical statistics to the $Y_i$'s, not to observations within a run.

Select a confidence level $1 - \alpha$ (0.90, 0.95, etc.).

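*Analysis for Terminating Simulations
Approximate $100(1-\alpha)\%$ confidence interval for $\mu$:
$$\bar{Y}(n) = \frac{1}{n}\sum_{i=1}^{n} Y_i \quad \text{is an unbiased estimator of } \mu$$

$$S^2(n) = \frac{1}{n-1}\sum_{i=1}^{n}\left[Y_i - \bar{Y}(n)\right]^2 \quad \text{is an unbiased estimator of } \text{Var}(Y_i)$$

$$\bar{Y}(n) \pm t_{n-1,1-\alpha/2}\sqrt{\frac{S^2(n)}{n}} \quad \text{covers } \mu \text{ with approximate probability } 1 - \alpha$$

where $t_{n-1,1-\alpha/2}\sqrt{S^2(n)/n}$ is the half-width expression.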

*Example
Consider a single-server (M/M/1) queue. The objective is to calculate a confidence interval for the delay of customers in the queue.

n = 10 replications of a single-server queue
$Y_i$ = average delay in queue from the ith replication
$Y_i$'s: 2.02, 0.73, 3.20, 6.23, 1.76, 0.47, 3.89, 5.45, 1.44, 1.23
For a 90% confidence interval, $\alpha = 0.10$, $\bar{Y}(10) = 2.64$, $S^2(10) = 3.96$, and $t_{9,0.95} = 1.833$.
Approximate 90% confidence interval: $2.64 \pm 1.833\sqrt{3.96/10} = 2.64 \pm 1.15$, or [1.49, 3.79]
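The arithmetic of this example can be checked with a short Python sketch using only the standard library. The function name `t_confidence_interval` is our own; because the standard library has no t quantile function, $t_{9,0.95} = 1.833$ is passed in from a t table (with SciPy installed, `scipy.stats.t.ppf(0.95, 9)` gives the same value).

```python
import math
import statistics


def t_confidence_interval(ys, t_quantile):
    """Approximate 100(1 - alpha)% C.I. for mu from i.i.d. replication outputs ys.

    t_quantile is t_{n-1, 1-alpha/2}, looked up in a t table.
    """
    n = len(ys)
    mean = statistics.fmean(ys)        # Ybar(n)
    s2 = statistics.variance(ys)       # S^2(n), divisor n - 1
    half_width = t_quantile * math.sqrt(s2 / n)
    return mean, s2, half_width, (mean - half_width, mean + half_width)


ys = [2.02, 0.73, 3.20, 6.23, 1.76, 0.47, 3.89, 5.45, 1.44, 1.23]
mean, s2, hw, (lo, hi) = t_confidence_interval(ys, t_quantile=1.833)  # t_{9,0.95}
print(f"mean = {mean:.2f}, S^2 = {s2:.2f}, half-width = {hw:.2f}, C.I. = [{lo:.2f}, {hi:.2f}]")
# mean = 2.64, S^2 = 3.96, half-width = 1.15, C.I. = [1.49, 3.79]
```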

*Analysis for Terminating Simulations
Interpretation: If this procedure is repeated many times, approximately $100(1-\alpha)\%$ of the confidence intervals formed in this way will cover $\mu$.

Wrong interpretation: "I am 90% confident that $\mu$ is between 1.49 and 3.79."
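The "repeated many times" interpretation can be checked empirically when $\mu$ is known. The sketch below is our own illustration: it draws 10 hypothetical normal replications (with $\mu = 5$, $\sigma = 2$) over and over, builds the 90% t interval each time, and counts how often the interval contains $\mu$.

```python
import math
import random
import statistics

T_QUANTILE = 1.833  # t_{9,0.95}: n = 10 replications, 90% confidence


def interval_covers(ys, mu):
    """Build the t interval from ys and report whether it contains mu."""
    hw = T_QUANTILE * math.sqrt(statistics.variance(ys) / len(ys))
    return abs(statistics.fmean(ys) - mu) <= hw


def estimate_coverage(trials=10_000, mu=5.0, sigma=2.0, seed=7):
    """Fraction of trials (each with 10 fresh normal replications) whose interval covers mu."""
    rng = random.Random(seed)
    hits = sum(interval_covers([rng.gauss(mu, sigma) for _ in range(10)], mu)
               for _ in range(trials))
    return hits / trials


print(f"estimated coverage: {estimate_coverage():.3f}")  # should be close to 0.90
```

Any single interval either contains $\mu$ or it does not; the 90% describes the long-run behavior of the procedure.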


*Issue 1
This confidence-interval method assumes the $Y_i$'s are normally distributed. In real life, this is almost never true.

Because of the central limit theorem, as the number of replications n grows, the coverage probability approaches $1 - \alpha$.

In general, if the $Y_i$'s are averages of something, their distribution tends not to be too asymmetric, and the confidence-interval method shown above has reasonably good coverage.
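The effect of non-normality can be illustrated with a small coverage experiment. In this sketch (our own, with hypothetical distributions), each $Y_i$ is either a single exponential observation (strongly skewed) or an average of 20 exponential observations (closer to normal); both have true mean 1. With n = 10 replications, the nominal 90% interval should cover less often in the skewed case.

```python
import math
import random
import statistics

T_QUANTILE = 1.833  # t_{9,0.95}: n = 10 replications, nominal 90% confidence
N_REPS = 10


def coverage(draw_y, mu, trials, rng):
    """Estimated coverage of the nominal 90% t interval when each Y_i = draw_y(rng)."""
    hits = 0
    for _ in range(trials):
        ys = [draw_y(rng) for _ in range(N_REPS)]
        hw = T_QUANTILE * math.sqrt(statistics.variance(ys) / N_REPS)
        hits += abs(statistics.fmean(ys) - mu) <= hw
    return hits / trials


def single_exponential(rng):
    return rng.expovariate(1.0)  # highly skewed Y_i with mean 1


def average_of_20(rng):
    return statistics.fmean([rng.expovariate(1.0) for _ in range(20)])  # Y_i is an average, mean 1


rng = random.Random(2024)
raw = coverage(single_exponential, 1.0, 10_000, rng)
avg = coverage(average_of_20, 1.0, 10_000, rng)
print(f"Y_i exponential: {raw:.3f}   Y_i average of 20 exponentials: {avg:.3f}")
```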

*Issue 2
The confidence interval may be too wide.

In the M/M/1 queue example, the approximate 90% C.I. was $2.64 \pm 1.15$, or [1.49, 3.79].
The half-width is 1.15, which is 44% of the mean (1.15/2.64 ≈ 0.44).
That means the C.I. is 2.64 ± 44%, which is not very precise.

To decrease the half-width: increase n until $t_{n-1,1-\alpha/2}\sqrt{S^2(n)/n}$ is small enough (this is called sequential sampling).

There are two ways of defining the precision in the estimate $\bar{Y}$:
  Absolute precision
  Relative precision

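*Obtaining a Specified Precision
Absolute precision: Want to make n large enough that $\delta(\alpha, n) \le \beta$, where $\delta(\alpha, n) = t_{n-1,1-\alpha/2}\sqrt{S^2(n)/n}$ is the half-width and $\beta > 0$.

Make $n_0$ replications of the simulation model and compute $\bar{Y}(n_0)$, $S^2(n_0)$, and the half-width, $\delta(\alpha, n_0)$.

Assuming that the estimate of the variance, $S^2(n_0)$, does not change appreciably, an approximate expression for the required number of replications to achieve an absolute error of $\beta$ is
$$n_a^*(\beta) = \min\left\{ i \ge n_0 : t_{i-1,1-\alpha/2}\sqrt{\frac{S^2(n_0)}{i}} \le \beta \right\}$$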
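A sketch of the absolute-precision rule $n_a^*(\beta)$ in Python, using the 10 replications of the M/M/1 example as the pilot run and a hypothetical target $\beta = 0.5$. Because the Python standard library has no t quantile function, `t_quantile` below is our own stand-in: it approximates $t_{\nu,p}$ with the Cornish-Fisher expansion (Abramowitz and Stegun, formula 26.7.5), which is accurate to roughly three decimals for $\nu \ge 4$ at usual confidence levels. With SciPy available, `scipy.stats.t.ppf(p, df)` can replace it.

```python
import math
from statistics import NormalDist, variance


def t_quantile(p, df):
    """Approximate t_{df,p} (Cornish-Fisher expansion, A&S 26.7.5); good for df >= 4."""
    z = NormalDist().inv_cdf(p)
    g1 = (z**3 + z) / 4
    g2 = (5 * z**5 + 16 * z**3 + 3 * z) / 96
    g3 = (3 * z**7 + 19 * z**5 + 17 * z**3 - 15 * z) / 384
    g4 = (79 * z**9 + 776 * z**7 + 1482 * z**5 - 1920 * z**3 - 945 * z) / 92160
    return z + g1 / df + g2 / df**2 + g3 / df**3 + g4 / df**4


def replications_for_absolute_precision(ys, beta, alpha=0.10):
    """n_a*(beta): smallest i >= n0 with t_{i-1,1-alpha/2} * sqrt(S^2(n0)/i) <= beta."""
    n0, s2 = len(ys), variance(ys)  # S^2(n0) is held fixed at its pilot value
    i = n0
    while t_quantile(1 - alpha / 2, i - 1) * math.sqrt(s2 / i) > beta:
        i += 1
    return i


# Pilot run: the n0 = 10 replications from the M/M/1 example.
ys = [2.02, 0.73, 3.20, 6.23, 1.76, 0.47, 3.89, 5.45, 1.44, 1.23]
print(replications_for_absolute_precision(ys, beta=0.5))
```

The result is the total number of replications; $n_a^*(\beta) - n_0$ more would then be made, and the half-width rechecked, since $S^2$ will change somewhat.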

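*Obtaining a Specified Precision
Relative precision: Want to make n large enough that $\delta(\alpha, n)/|\bar{Y}(n)| \le \gamma'$, where $\gamma' = \gamma/(1+\gamma)$ and $0 < \gamma < 1$.

Make $n_0$ replications of the simulation model and compute $\bar{Y}(n_0)$, $S^2(n_0)$, and the half-width, $\delta(\alpha, n_0)$.

Assuming that the estimates of both the population mean, $\bar{Y}(n_0)$, and the population variance, $S^2(n_0)$, do not change appreciably, an approximate expression for the required number of replications to achieve a relative error of $\gamma$ is
$$n_r^*(\gamma) = \min\left\{ i \ge n_0 : \frac{t_{i-1,1-\alpha/2}\sqrt{S^2(n_0)/i}}{|\bar{Y}(n_0)|} \le \gamma' \right\}$$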
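The relative-precision rule differs only in dividing the half-width by $|\bar{Y}(n_0)|$ and comparing with $\gamma' = \gamma/(1+\gamma)$. The sketch below uses the same pilot data and a hypothetical target $\gamma = 0.10$; as before, `t_quantile` is our own Cornish-Fisher approximation of the t quantile (replaceable by `scipy.stats.t.ppf`).

```python
import math
from statistics import NormalDist, fmean, variance


def t_quantile(p, df):
    """Approximate t_{df,p} (Cornish-Fisher expansion, A&S 26.7.5); good for df >= 4."""
    z = NormalDist().inv_cdf(p)
    g1 = (z**3 + z) / 4
    g2 = (5 * z**5 + 16 * z**3 + 3 * z) / 96
    g3 = (3 * z**7 + 19 * z**5 + 17 * z**3 - 15 * z) / 384
    g4 = (79 * z**9 + 776 * z**7 + 1482 * z**5 - 1920 * z**3 - 945 * z) / 92160
    return z + g1 / df + g2 / df**2 + g3 / df**3 + g4 / df**4


def replications_for_relative_precision(ys, gamma, alpha=0.10):
    """n_r*(gamma): smallest i >= n0 with
    t_{i-1,1-alpha/2} * sqrt(S^2(n0)/i) / |Ybar(n0)| <= gamma / (1 + gamma)."""
    n0, ybar, s2 = len(ys), fmean(ys), variance(ys)
    gamma_adj = gamma / (1 + gamma)  # gamma' = gamma / (1 + gamma)
    i = n0
    while t_quantile(1 - alpha / 2, i - 1) * math.sqrt(s2 / i) / abs(ybar) > gamma_adj:
        i += 1
    return i


ys = [2.02, 0.73, 3.20, 6.23, 1.76, 0.47, 3.89, 5.45, 1.44, 1.23]
print(replications_for_relative_precision(ys, gamma=0.10))
```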

*Analysis for Steady-State Simulations
Objective: Estimate the steady-state mean $\nu = \lim_{i \to \infty} E(Y_i)$.

Basic question: Should you do many short runs or one long run?


*Analysis for Steady-State Simulations
Advantages:
  Many short runs:
    Simple analysis, similar to the analysis for terminating systems
    The data from different replications are i.i.d.
  One long run:
    Less initial bias
    No restarts
Disadvantages:
  Many short runs:
    Initial bias is introduced several times
  One long run:
    Sample of size 1
    Difficult to get a good estimate of the variance

*Analysis for Steady-State Simulations
Make many short runs: The analysis is exactly the same as for terminating systems. The $100(1-\alpha)\%$ C.I. is computed as before.

Problem: Because of initial bias, $\bar{Y}(n)$ may no longer be an unbiased estimator for the steady-state mean, $\nu$.

Solution: Remove the initial portion of the data (the warm-up period), i.e., the observations before the point beyond which the output is in steady state. Specifically, pick l (warm-up period) and n (number of observations in one run) such that
$$E\left[\bar{Y}(n, l)\right] \approx \nu, \quad \text{where } \bar{Y}(n, l) = \frac{1}{n - l}\sum_{i=l+1}^{n} Y_i$$

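*Method of Moving Average for Removing the Initial Bias
Welch's method for removing the warm-up period, l:

Make n replications of the model (n > 5), each of length m, where m is large. Let $Y_{ji}$ be the ith observation from the jth replication (j = 1, 2, ..., n; i = 1, 2, ..., m).

Let $\bar{Y}_i = \frac{1}{n}\sum_{j=1}^{n} Y_{ji}$ for i = 1, 2, ..., m.

To smooth out the high-frequency oscillations in $\bar{Y}_1, \bar{Y}_2, \dots$, define the moving average $\bar{Y}_i(w)$ as follows (w is the window and is a positive integer such that $w \le \lfloor m/4 \rfloor$):
$$\bar{Y}_i(w) = \begin{cases} \dfrac{1}{2w+1}\sum_{s=-w}^{w} \bar{Y}_{i+s} & \text{if } i = w+1, \dots, m-w \\[1ex] \dfrac{1}{2i-1}\sum_{s=-(i-1)}^{i-1} \bar{Y}_{i+s} & \text{if } i = 1, \dots, w \end{cases}$$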

*Method of Moving Average for Removing the Initial Bias
Plot $\bar{Y}_i(w)$ for i = 1, 2, ..., m − w and choose l to be the value of i beyond which $\bar{Y}_1(w), \bar{Y}_2(w), \dots$ seem to have converged.

Note: Perform this procedure for several values of w and choose the smallest w for which the plot of $\bar{Y}_i(w)$ looks reasonably smooth.
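The smoothing step of Welch's method can be sketched in Python as follows, assuming the n replications are stored as equal-length lists. The function name `welch_moving_average` and the toy data are hypothetical; the plotting and the visual choice of l are left to the analyst, as the procedure describes.

```python
def welch_moving_average(replications, w):
    """Welch's moving average Ybar_i(w) for i = 1, ..., m - w.

    replications[j][i] is observation i+1 of replication j+1; all have length m.
    """
    n, m = len(replications), len(replications[0])
    if not 1 <= w <= m // 4:
        raise ValueError("window w must satisfy 1 <= w <= floor(m/4)")
    # Ybar_i: average of the ith observation across the n replications
    ybar = [sum(rep[i] for rep in replications) / n for i in range(m)]
    smoothed = []
    for i in range(1, m - w + 1):        # 1-based index i, as in the formula
        half = w if i > w else i - 1     # window shrinks to 2i - 1 points near the start
        window = ybar[i - 1 - half: i + half]
        smoothed.append(sum(window) / (2 * half + 1))
    return smoothed


# Toy data: two replications that jump from 0 to 3 after a short transient.
reps = [[0, 0, 3, 3, 3, 3], [0, 0, 3, 3, 3, 3]]
print(welch_moving_average(reps, w=1))  # [0.0, 1.0, 2.0, 3.0, 3.0]
```

Here the smoothed curve levels off at i = 4, so a plot would suggest a warm-up period of about l = 3 for this toy data.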

*Analysis for Steady-State Simulations
Make one long run: Make just one long replication so that the initial bias is introduced only once. This way, you will not be throwing out a lot of data.

Problem: How do you estimate the variance when there is only one run?

Solution: There are several methods to estimate the variance:
  Batch means (the only approach discussed here)
  Time-series models
  Spectral analysis
  Standardized time series

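*Method of Batch Means
Divide a run of length m into n adjacent batches of length k, where m = nk.

Let $\bar{Y}_j(k) = \frac{1}{k}\sum_{i=(j-1)k+1}^{jk} Y_i$ be the sample (or batch) mean of the jth batch.

The grand sample mean is computed as
$$\bar{\bar{Y}}(n) = \frac{1}{n}\sum_{j=1}^{n}\bar{Y}_j(k)$$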


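*Method of Batch Means
The sample variance of the batch means is computed as
$$S_{\bar{Y}}^2(n) = \frac{1}{n-1}\sum_{j=1}^{n}\left[\bar{Y}_j(k) - \bar{\bar{Y}}(n)\right]^2$$

The approximate $100(1-\alpha)\%$ confidence interval for $\nu$ is
$$\bar{\bar{Y}}(n) \pm t_{n-1,1-\alpha/2}\sqrt{\frac{S_{\bar{Y}}^2(n)}{n}}$$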
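The batch-means computation can be sketched in Python as below, assuming the warm-up period has already been deleted from the run. The function name `batch_means_ci` is hypothetical, and the t quantile is supplied by the caller from a t table. The eight-point usage data are only there to make the arithmetic visible; they are not steady-state output.

```python
import math
import statistics


def batch_means_ci(y, num_batches, t_quantile):
    """Batch-means estimate and C.I. half-width for the steady-state mean nu.

    y: one long run with the warm-up period already deleted.
    num_batches: n; the batch size is k = len(y) // n, and any leftover
    observations at the end of the run are ignored so that m = n*k.
    t_quantile: t_{n-1,1-alpha/2} for the chosen alpha.
    """
    k = len(y) // num_batches
    if k == 0:
        raise ValueError("run too short for that many batches")
    batch_means = [statistics.fmean(y[j * k:(j + 1) * k]) for j in range(num_batches)]
    grand_mean = statistics.fmean(batch_means)   # grand sample mean
    s2 = statistics.variance(batch_means)        # sample variance of the batch means
    half_width = t_quantile * math.sqrt(s2 / num_batches)
    return grand_mean, half_width, batch_means


# Toy usage: m = 8 observations, n = 4 batches of size k = 2, t_{3,0.975} = 3.182.
g, hw, bm = batch_means_ci([1, 2, 3, 4, 5, 6, 7, 8], num_batches=4, t_quantile=3.182)
print(bm, g, round(hw, 3))
```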

*Method of Batch Means
Two important issues:

Issue 1: How do we choose the batch size k?
Choose the batch size k large enough that the batch means $\bar{Y}_j(k)$ are approximately uncorrelated. Otherwise, the variance estimate $S_{\bar{Y}}^2(n)$ will be biased low and the confidence interval will be too narrow, which means that it will cover the mean with a probability lower than the desired $1 - \alpha$.

*Method of Batch Means
Issue 2: How many batches n?
Due to autocorrelation, splitting the run into a larger number of smaller batches degrades the quality of each individual batch. Therefore, 20 to 30 batches are usually sufficient.

*Multiple Measures of Performance
In most real-world simulation models, several measures of performance are considered simultaneously. Examples include:
  Throughput
  Average length of queue
  Utilization
  Average time in system
Each performance measure may be estimated with a confidence interval.
Any of the intervals could miss its expected performance measure.
We must be careful about overall statements of coverage (i.e., that all intervals contain their expected performance measures simultaneously).

*Multiple Measures of Performance
Suppose we have k performance measures and the confidence interval for performance measure s, for s = 1, 2, ..., k, is at confidence level $1 - \alpha_s$. Then the probability that all k confidence intervals simultaneously contain their respective true measures satisfies
$$P(\text{all } k \text{ intervals contain their true measures}) \ge 1 - \sum_{s=1}^{k}\alpha_s$$

This is referred to as the Bonferroni inequality.


*Multiple Measures of Performance
To ensure that the overall probability (of all k confidence intervals simultaneously containing their respective true means) is at least $100(1-\alpha)$ percent, choose the $\alpha_s$'s such that
$$\sum_{s=1}^{k}\alpha_s = \alpha$$

We can select $\alpha_s = \alpha/k$ for all s, or pick the $\alpha_s$'s differently, with smaller $\alpha_s$ for the more important performance measures.


*Multiple Measures of Performance
Example: If k = 2 and we want the overall confidence level to be at least 90%, we can construct two 95% confidence intervals.

Difficulty: If there are a large number of performance measures and we want a reasonable overall confidence level (e.g., 90%), the individual $\alpha_s$'s could become small, making the corresponding confidence intervals very wide. Therefore, it is recommended that the number of performance measures not exceed 10.
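The Bonferroni bookkeeping is simple arithmetic; the helper names below (`equal_split`, `bonferroni_bound`) are our own. The sketch reproduces the k = 2 example, an unequal split where the most important measure gets the smallest $\alpha_s$, and the k = 10 case where each interval must already be at the 99% level.

```python
def equal_split(alpha, k):
    """Individual confidence level 1 - alpha/k for each of k intervals."""
    return [1 - alpha / k] * k


def bonferroni_bound(levels):
    """Lower bound 1 - sum(alpha_s) on P(all intervals cover), given levels 1 - alpha_s."""
    return 1 - sum(1 - level for level in levels)


print(equal_split(0.10, 2))                     # two (approximately) 95% intervals
print(bonferroni_bound(equal_split(0.10, 2)))   # overall level at least about 0.90
print(bonferroni_bound([0.98, 0.97, 0.95]))     # unequal split, also about 0.90
print(equal_split(0.10, 10)[0])                 # 10 measures: each interval about 99%
```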

*Analysis of Several Systems
Most simulation projects involve comparison of two or more systems or configurations:
  Change the number of machines in some workcenters
  Evaluate various job-dispatch policies (FIFO, SPT, etc.)
With two alternative systems, the goal may be to:
  test the hypotheses $H_0: \mu_1 = \mu_2$ versus $H_1: \mu_1 \ne \mu_2$, or
  build a confidence interval for $\mu_1 - \mu_2$
With k > 2 alternatives, the objective may be to:
  build simultaneous confidence intervals for various combinations of $\mu_{i_1} - \mu_{i_2}$
  select the best of the k alternatives
  select a subset of size m < k that contains the best alternative
  select the m best (unranked) of the alternatives

*Analysis of Several Systems
To illustrate the danger in making only one run and eyeballing the results when comparing alternatives, consider the following example.

Compare:

Alternative 1: M/M/1 queue with a mean interarrival time of 1 min. and one fast machine with a mean service time of 0.9 min.
Alternative 2: M/M/2 queue with a mean interarrival time of 1 min. and two slow machines, each with a mean service time of 1.8 min.


*Analysis of Several Systems
The performance measure of interest is the expected average delay in queue of the first 100 customers, with empty-and-idle initial conditions. Using queueing analysis, the true steady-state expected delays in queue are $d_1 = 8.1$ min. for system 1 and $d_2 \approx 7.67$ min. for system 2. Therefore, system 2 is better.

If we run each model just once, calculate the average delay $d_i$ from each alternative i, and select the system with the smaller $d_i$, then

Prob(selecting system 1 (wrong answer)) = 0.52

Reason: Randomness in the output

*Analysis of Several Systems
Solution:
  Replicate each alternative n times.
  Let $d_{ij}$ = average delay from the jth replication of alternative i.
  Compute the average of all replications for alternative i:
$$\bar{d}_i = \frac{1}{n}\sum_{j=1}^{n} d_{ij}$$

Select the alternative with the lowest $\bar{d}_i$.

If we conduct this experiment many times, the following results are obtained:

n     P(wrong answer)
1     0.52
5     0.43
10    0.38
20    0.34
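The replicate-and-select experiment behind this table can be sketched as a Monte Carlo study. This is an illustrative sketch under our own assumptions: FIFO service, exponential times with the means given above, 100 customers per run starting empty and idle, and hypothetical function names `avg_delay_first_customers` and `prob_wrong_selection`. With the modest default number of experiments its estimates are noisy, and they are not guaranteed to reproduce the table's values.

```python
import random


def avg_delay_first_customers(num_servers, mean_service, rng, customers=100,
                              mean_interarrival=1.0):
    """Average delay in queue of the first `customers` customers of an M/M/c
    FIFO queue that starts empty and idle."""
    free_at = [0.0] * num_servers  # time at which each server next becomes idle
    clock, total_delay = 0.0, 0.0
    for _ in range(customers):
        clock += rng.expovariate(1.0 / mean_interarrival)          # next arrival
        s = min(range(num_servers), key=lambda j: free_at[j])      # earliest-free server
        start = max(clock, free_at[s])                              # FIFO start of service
        total_delay += start - clock
        free_at[s] = start + rng.expovariate(1.0 / mean_service)
    return total_delay / customers


def prob_wrong_selection(n_reps, experiments=100, seed=42):
    """Fraction of experiments in which system 1 (M/M/1, mean service 0.9) looks
    better than system 2 (M/M/2, mean service 1.8 each) after n_reps replications."""
    rng = random.Random(seed)
    wrong = 0
    for _ in range(experiments):
        d1 = sum(avg_delay_first_customers(1, 0.9, rng) for _ in range(n_reps)) / n_reps
        d2 = sum(avg_delay_first_customers(2, 1.8, rng) for _ in range(n_reps)) / n_reps
        wrong += d1 < d2  # choosing system 1 is the wrong answer
    return wrong / experiments


for n in (1, 5, 10, 20):
    print(n, prob_wrong_selection(n))
```

Raising `experiments` makes the estimated probabilities smoother at the cost of run time.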

*Comparison of Two Alternative Systems
Form a confidence interval for the difference between the performance measures of the two systems (i.e., $\mu_1 - \mu_2$).

If the interval misses 0, there is a statistically significant difference between the two systems.

Confidence intervals are better than hypothesis tests because, if a difference exists, the confidence interval measures its magnitude, while a hypothesis test does not.

There are two slightly different ways of constructing the confidence intervals:
  Paired-t
  Two-sample-t

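*Paired-t Confidence Interval

Make n replications of the two systems. Let $X_{ij}$ be the jth observation from system i (i = 1, 2). Pair $X_{1j}$ with $X_{2j}$ and define $Z_j = X_{1j} - X_{2j}$ for j = 1, 2, ..., n. Then the $Z_j$'s are IID random variables and $E(Z_j) = \zeta = \mu_1 - \mu_2$, the quantity for which we want to construct a confidence interval. Let
$$\bar{Z}(n) = \frac{1}{n}\sum_{j=1}^{n} Z_j$$

and
$$\widehat{\text{Var}}[\bar{Z}(n)] = \frac{1}{n(n-1)}\sum_{j=1}^{n}\left[Z_j - \bar{Z}(n)\right]^2$$

Then, the approximate $100(1-\alpha)$ percent C.I. is
$$\bar{Z}(n) \pm t_{n-1,1-\alpha/2}\sqrt{\widehat{\text{Var}}[\bar{Z}(n)]}$$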
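The paired-t interval reduces to a one-sample t interval on the differences $Z_j$. A minimal Python sketch, with hypothetical outputs from n = 4 paired replications and $t_{3,0.975} = 3.182$ taken from a t table; the function name `paired_t_ci` is our own.

```python
import math
import statistics


def paired_t_ci(x1, x2, t_quantile):
    """Paired-t C.I. for zeta = mu1 - mu2.

    x1[j], x2[j]: output of replication j for systems 1 and 2 (same n).
    t_quantile: t_{n-1,1-alpha/2} from a t table.
    """
    if len(x1) != len(x2):
        raise ValueError("the paired-t method needs n1 == n2")
    z = [a - b for a, b in zip(x1, x2)]      # Z_j = X_1j - X_2j
    n = len(z)
    zbar = statistics.fmean(z)               # Zbar(n)
    var_zbar = statistics.variance(z) / n    # sum (Z_j - Zbar)^2 / (n (n - 1))
    half_width = t_quantile * math.sqrt(var_zbar)
    return zbar - half_width, zbar + half_width


# Hypothetical outputs from n = 4 paired replications; 95% C.I. with t_{3,0.975} = 3.182.
lo, hi = paired_t_ci([5.0, 6.0, 7.0, 8.0], [4.0, 4.0, 6.0, 6.0], t_quantile=3.182)
print(f"[{lo:.2f}, {hi:.2f}]")  # the interval misses 0, so the two systems differ
```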

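*Two-Sample-t Confidence Interval
Make $n_1$ replications of system 1 and $n_2$ replications of system 2. Here, $n_1$ and $n_2$ need not be equal. Again, for system i = 1, 2, let
$$\bar{X}_i(n_i) = \frac{1}{n_i}\sum_{j=1}^{n_i} X_{ij}$$

and
$$S_i^2(n_i) = \frac{1}{n_i - 1}\sum_{j=1}^{n_i}\left[X_{ij} - \bar{X}_i(n_i)\right]^2$$

Estimate the degrees of freedom as
$$\hat{f} = \frac{\left[S_1^2(n_1)/n_1 + S_2^2(n_2)/n_2\right]^2}{\left[S_1^2(n_1)/n_1\right]^2/(n_1 - 1) + \left[S_2^2(n_2)/n_2\right]^2/(n_2 - 1)}$$

Then, the approximate $100(1-\alpha)$ percent C.I. is
$$\bar{X}_1(n_1) - \bar{X}_2(n_2) \pm t_{\hat{f},1-\alpha/2}\sqrt{\frac{S_1^2(n_1)}{n_1} + \frac{S_2^2(n_2)}{n_2}}$$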
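A sketch of the two-sample-t (Welch) interval in Python. Because $\hat{f}$ is generally not an integer, a t table is awkward here; `t_quantile` below is our own Cornish-Fisher approximation (Abramowitz and Stegun 26.7.5), which accepts non-integer degrees of freedom and is roughly three-decimal accurate for $\hat{f} \ge 4$. `scipy.stats.t.ppf` also accepts non-integer df if SciPy is available. The data are hypothetical.

```python
import math
import statistics
from statistics import NormalDist


def t_quantile(p, df):
    """Approximate t_{df,p} (Cornish-Fisher expansion, A&S 26.7.5); df may be non-integer."""
    z = NormalDist().inv_cdf(p)
    g1 = (z**3 + z) / 4
    g2 = (5 * z**5 + 16 * z**3 + 3 * z) / 96
    g3 = (3 * z**7 + 19 * z**5 + 17 * z**3 - 15 * z) / 384
    g4 = (79 * z**9 + 776 * z**7 + 1482 * z**5 - 1920 * z**3 - 945 * z) / 92160
    return z + g1 / df + g2 / df**2 + g3 / df**3 + g4 / df**4


def two_sample_t_ci(x1, x2, alpha=0.10):
    """Welch two-sample-t C.I. for mu1 - mu2; returns (lower, upper, f_hat)."""
    n1, n2 = len(x1), len(x2)
    v1 = statistics.variance(x1) / n1  # S_1^2(n1) / n1
    v2 = statistics.variance(x2) / n2  # S_2^2(n2) / n2
    f_hat = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))  # estimated d.f.
    diff = statistics.fmean(x1) - statistics.fmean(x2)
    half_width = t_quantile(1 - alpha / 2, f_hat) * math.sqrt(v1 + v2)
    return diff - half_width, diff + half_width, f_hat


# Hypothetical outputs: n1 = 4 replications of system 1, n2 = 5 of system 2.
lo, hi, f = two_sample_t_ci([5.0, 6.0, 7.0, 8.0], [4.0, 4.0, 6.0, 6.0, 5.0])
print(f"f_hat = {f:.2f}, 90% C.I. = [{lo:.2f}, {hi:.2f}]")
```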

*Contrasting the Two Methods
The two-sample-t approach requires independence of the $X_{1j}$'s and the $X_{2j}$'s, whereas in the paired-t approach $X_{1j}$ and $X_{2j}$ do not have to be independent.

Therefore, in the paired-t approach, common random numbers can be used to induce positive correlation between the observations on the different systems, which reduces the variance of the difference.

In the paired-t approach, $n_1 = n_2$, whereas in the two-sample-t method $n_1$ and $n_2$ may differ.

*Confidence Intervals for Comparing More Than Two Systems
In the case of more than two alternative systems, there are two ways to construct confidence intervals on selected differences $\mu_{i_1} - \mu_{i_2}$:
  Comparison with a standard
  All pairwise comparisons

NOTE: Since we are making c > 1 confidence intervals, to have an overall confidence level of at least $1 - \alpha$, we must make each interval at level $1 - \alpha/c$ (Bonferroni).

*Comparison with a Standard
In this case, one of the systems (perhaps the existing system or policy) is a standard. If system 1 is the standard and we want to compare systems 2, 3, ..., k to system 1, k − 1 confidence intervals must be constructed for the k − 1 differences $\mu_2 - \mu_1, \mu_3 - \mu_1, \dots, \mu_k - \mu_1$.

To achieve an overall confidence level of at least $1 - \alpha$, each of the k − 1 confidence intervals must be constructed at level $1 - \alpha/(k-1)$.

Can use paired-t or two-sample-t methods described in the previous section to make the individual intervals.
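A sketch of comparison with a standard using paired-t intervals, each built at the Bonferroni level $1 - \alpha/(k-1)$. The data, the function name `compare_with_standard`, and the Cornish-Fisher helper `t_quantile` (our stand-in for a t table or `scipy.stats.t.ppf`) are assumptions made for illustration.

```python
import math
import statistics
from statistics import NormalDist


def t_quantile(p, df):
    """Approximate t_{df,p} (Cornish-Fisher expansion, A&S 26.7.5); good for df >= 4."""
    z = NormalDist().inv_cdf(p)
    g1 = (z**3 + z) / 4
    g2 = (5 * z**5 + 16 * z**3 + 3 * z) / 96
    g3 = (3 * z**7 + 19 * z**5 + 17 * z**3 - 15 * z) / 384
    g4 = (79 * z**9 + 776 * z**7 + 1482 * z**5 - 1920 * z**3 - 945 * z) / 92160
    return z + g1 / df + g2 / df**2 + g3 / df**3 + g4 / df**4


def compare_with_standard(outputs, alpha=0.10):
    """Paired-t C.I.s for mu_i - mu_1, i = 2..k, each at level 1 - alpha/(k-1).

    outputs[0]: n replications of the standard (system 1); outputs[i]: n
    replications of system i+1, paired by replication index.
    """
    k, n = len(outputs), len(outputs[0])
    alpha_each = alpha / (k - 1)                    # Bonferroni split
    t = t_quantile(1 - alpha_each / 2, n - 1)
    intervals = []
    for x in outputs[1:]:
        z = [a - b for a, b in zip(x, outputs[0])]  # Z_j = X_ij - X_1j
        zbar = statistics.fmean(z)
        hw = t * math.sqrt(statistics.variance(z) / n)
        intervals.append((zbar - hw, zbar + hw))
    return intervals


# Hypothetical outputs: 6 replications each of the standard and two alternatives.
standard = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0]
system2 = [9.0, 11.0, 11.0, 12.0, 10.0, 13.0]
system3 = [12.0, 14.0, 12.0, 15.0, 14.0, 16.0]
for i, (lo, hi) in enumerate(compare_with_standard([standard, system2, system3]), start=2):
    print(f"mu{i} - mu1: [{lo:.2f}, {hi:.2f}]")
```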

*All Pairwise Comparisons
In this case, each system is compared to every other system to detect and quantify any significant differences. Therefore, for k systems, we construct k(k − 1)/2 confidence intervals for the k(k − 1)/2 differences:
$$\begin{array}{cccc} \mu_2 - \mu_1 & \mu_3 - \mu_1 & \cdots & \mu_k - \mu_1 \\ & \mu_3 - \mu_2 & \cdots & \mu_k - \mu_2 \\ & & \ddots & \vdots \\ & & & \mu_k - \mu_{k-1} \end{array}$$

Each of the confidence intervals must be constructed at level $1 - \alpha/[k(k-1)/2]$, so that an overall confidence level of at least $1 - \alpha$ can be achieved.

Again, we can use paired-t or two-sample-t methods to make the individual confidence intervals.
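A sketch of all pairwise comparisons with paired-t intervals, each at level $1 - \alpha/c$ with $c = k(k-1)/2$. As in the comparison-with-a-standard sketch, the data, the function name `all_pairwise`, and the Cornish-Fisher helper `t_quantile` are our own assumptions. The approximation is less accurate far in the tail with few degrees of freedom, so with many systems a t table or `scipy.stats.t.ppf` is the safer choice.

```python
import math
import statistics
from itertools import combinations
from statistics import NormalDist


def t_quantile(p, df):
    """Approximate t_{df,p} (Cornish-Fisher expansion, A&S 26.7.5); good for df >= 4."""
    z = NormalDist().inv_cdf(p)
    g1 = (z**3 + z) / 4
    g2 = (5 * z**5 + 16 * z**3 + 3 * z) / 96
    g3 = (3 * z**7 + 19 * z**5 + 17 * z**3 - 15 * z) / 384
    g4 = (79 * z**9 + 776 * z**7 + 1482 * z**5 - 1920 * z**3 - 945 * z) / 92160
    return z + g1 / df + g2 / df**2 + g3 / df**3 + g4 / df**4


def all_pairwise(outputs, alpha=0.10):
    """Paired-t C.I.s for mu_b - mu_a over all pairs a < b (1-based labels),
    each at level 1 - alpha/c with c = k(k-1)/2 (Bonferroni)."""
    k, n = len(outputs), len(outputs[0])
    c = k * (k - 1) // 2
    t = t_quantile(1 - alpha / (2 * c), n - 1)  # t_{n-1, 1 - alpha/(2c)}
    intervals = {}
    for a, b in combinations(range(k), 2):
        z = [xb - xa for xa, xb in zip(outputs[a], outputs[b])]
        zbar = statistics.fmean(z)
        hw = t * math.sqrt(statistics.variance(z) / n)
        intervals[(a + 1, b + 1)] = (zbar - hw, zbar + hw)
    return intervals


# Hypothetical outputs: 6 paired replications of each of k = 3 systems.
standard = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0]
system2 = [9.0, 11.0, 11.0, 12.0, 10.0, 13.0]
system3 = [12.0, 14.0, 12.0, 15.0, 14.0, 16.0]
for (a, b), (lo, hi) in all_pairwise([standard, system2, system3]).items():
    print(f"mu{b} - mu{a}: [{lo:.2f}, {hi:.2f}]")
```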

*Ranking and Selection
The goals of ranking and selection are different from, and more ambitious than, simply making a comparison between several alternative systems. Here, the goal may be to:
  Select the best of k systems
  Select a subset of size m containing the best of k systems
  Select the m best of k systems

*Ranking and Selection
1. Selecting the best of k systems:

Want to select one of the k alternatives as the best.

Because of the inherent randomness in simulation modeling, we can't be sure that the selected system is the one with the smallest expected response (assuming small is good). Therefore, we specify a correct-selection probability P* (like 0.90 or 0.95).

We also specify an indifference zone d*, which means that if the best mean and the next-best mean differ by more than d*, we select the best one with probability at least P*.

As an example, suppose that we have 5 alternative configurations and we want to identify the best system with a probability of at least 95%.

*Ranking and Selection
2. Selecting a subset of size m containing the best of k systems:

Want to select a subset of size m (< k) that contains the best system with probability of at least P*.

This approach is useful in initial screening of alternatives to eliminate the inferior options.

For example, suppose that we have 10 alternative configurations and we want to identify a subset of 3 alternatives that contains the best system with a probability of at least 95%.

*Ranking and Selection
3. Selecting the m best of k systems:

Want to select the m best (unranked) of the k systems so that with probability of at least P* the expected responses of the selected subset are equal to the m smallest expected responses.

This situation may be useful when we want to identify several good options, in case the best one is unacceptable for some reason.

For example, suppose that we have 5 alternative configurations and we want to select the 3 best alternatives, with a probability of correct selection of at least 90%.
