www.simio.com| Copyright 2010 Simio LLC | All rights reserved. 1 Learning Simio Chapter 10 Analyzing Input Data


Learning Simio

Chapter 10: Analyzing Input Data


Outline

Working with various types of data.

Fitting distributions to data.

Summary of common distributions.

Modeling customer arrivals.

Modeling task times.

Sensitivity of results to data.


Model Input Data

A model has both structure and input data.

Both the model structure and the input data have a significant impact on the results.

The data can be a problematic aspect of a modeling project.


Typical Data Cases

No data exists.

Data exists in the wrong form.

Lots of good data exists.


No data exists

Consider using the Triangular or Pert distributions (minimum, mode, maximum) for activity times.

Hypothesize distributions based on the underlying processes, and make educated guesses for the parameters.

Run experiments to test sensitivity of results to the parameters.

Don’t use a mean in place of a distribution.
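As a minimal sketch of the minimum/mode/maximum approach, the following Python samples activity times from a Triangular and a Pert distribution using only the standard library. The Pert sampler assumes the common formulation as a rescaled Beta with a shape weight of 4; the function names and the 2/3/7 minute estimates are hypothetical and are not taken from Simio.

```python
import random
import statistics


def sample_triangular(minimum, mode, maximum, rng):
    # Note the argument order of random.triangular: (low, high, mode).
    return rng.triangular(minimum, maximum, mode)


def sample_pert(minimum, mode, maximum, rng, shape=4.0):
    # Assumed formulation: Beta(alpha, beta) rescaled from [0, 1]
    # to [minimum, maximum], with the usual shape weight of 4.
    span = maximum - minimum
    alpha = 1.0 + shape * (mode - minimum) / span
    beta = 1.0 + shape * (maximum - mode) / span
    return minimum + span * rng.betavariate(alpha, beta)


rng = random.Random(42)
# Hypothetical expert estimates for an activity time, in minutes.
minimum, mode, maximum = 2.0, 3.0, 7.0

tri = [sample_triangular(minimum, mode, maximum, rng) for _ in range(10_000)]
pert = [sample_pert(minimum, mode, maximum, rng) for _ in range(10_000)]

# Theoretical means: (min + mode + max) / 3 and (min + 4 * mode + max) / 6.
print("triangular mean:", round(statistics.mean(tri), 2))
print("pert mean:      ", round(statistics.mean(pert), 2))
```

Both samplers stay inside the stated range, but the Pert puts less weight on the tails, so its mean sits closer to the mode.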


Data exists in the wrong form

Data observed from a different real-world process.

Time between failures when failures are count based.

Time to repair when repairs are resource constrained.

Data recorded during a “slow time” or a “busy time”.

Values from multiple processes with no discriminatory information (e.g., repair times without noting the type of stoppage).

Use the data that does exist to make intelligent guesses for the required data.


Lots of data exists

If a large amount of data is available, an empirical distribution may be used; however, a theoretical distribution is preferred (compact, fast, easy to change).

If possible, hypothesize a distribution based on the underlying process (combine data and theory).

Use goodness-of-fit software to test the hypothesis and estimate the parameters.
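For comparison with a fitted theoretical distribution, here is one possible sketch of sampling from an empirical distribution in Python. It assumes a continuous empirical distribution built by linear interpolation between the sorted observations; the helper name and the observed times are made up for illustration.

```python
import random


def make_empirical_sampler(observations):
    """Return a sampler that interpolates linearly between sorted observations."""
    data = sorted(observations)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")

    def sample(rng):
        # Pick a random position along the sorted data, then interpolate.
        position = rng.random() * (n - 1)
        i = int(position)
        fraction = position - i
        return data[i] + fraction * (data[i + 1] - data[i])

    return sample


# Hypothetical observed task times (minutes), for illustration only.
observed = [4.2, 5.1, 3.8, 6.7, 4.9, 5.5, 4.4, 7.3, 5.0, 4.6]
sampler = make_empirical_sampler(observed)

rng = random.Random(1)
samples = [sampler(rng) for _ in range(1000)]
print(min(samples), max(samples))
```

A sampler like this can never produce a value outside the observed range and must carry the whole data set around, which is part of why a compact theoretical distribution is usually preferred.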


Data Fitting Procedure

Assess the IID assumptions: independent observations, identically distributed.

Use software to view the data using a histogram.

Hypothesize a distribution family/form.

Use software to estimate the distribution parameters and assess the quality of fit.
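The procedure above can be sketched in plain Python. This is an illustration under stated assumptions, not what any particular fitting package does: the data are synthetic exponential times, independence is checked only through the lag-1 autocorrelation, the exponential mean is estimated by the sample mean (its maximum-likelihood estimate), and fit quality is summarized by the Kolmogorov-Smirnov statistic.

```python
import math
import random


def lag1_autocorrelation(data):
    # Values near 0 are consistent with independent observations.
    n = len(data)
    mean = sum(data) / n
    num = sum((data[i] - mean) * (data[i + 1] - mean) for i in range(n - 1))
    den = sum((x - mean) ** 2 for x in data)
    return num / den


def histogram_counts(data, bins=8):
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in data:
        counts[min(int((x - lo) / width), bins - 1)] += 1
    return lo, width, counts


def ks_statistic(data, cdf):
    # Largest gap between the empirical CDF and the hypothesized CDF.
    xs = sorted(data)
    n = len(xs)
    return max(max((i + 1) / n - cdf(x), cdf(x) - i / n)
               for i, x in enumerate(xs))


rng = random.Random(2010)
data = [rng.expovariate(1 / 5.0) for _ in range(2000)]  # synthetic, true mean 5

# Step 1: a crude independence check.
rho1 = lag1_autocorrelation(data)

# Step 2: view the data as a text histogram.
lo, width, counts = histogram_counts(data)
for i, c in enumerate(counts):
    print(f"{lo + i * width:6.1f} | {'#' * (60 * c // max(counts))}")

# Steps 3 and 4: hypothesize an exponential, estimate its mean, assess the fit.
fitted_mean = sum(data) / len(data)
d = ks_statistic(data, lambda x: 1.0 - math.exp(-x / fitted_mean))
print(f"lag-1 autocorrelation={rho1:.3f} fitted mean={fitted_mean:.2f} KS={d:.4f}")
```

Standard Kolmogorov-Smirnov critical values assume a fully specified distribution, so when the parameters are estimated from the same data the statistic is best read as a relative measure of fit.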


Sample Data Sets


Common Distributions

Binomial – Models the number of successes in n trials, when the trials are independent with common success probability p; for example, the number of defective computer chips found in a lot of n chips.

Negative Binomial – Models the number of trials required to achieve k successes; for example, the number of computer chips that we must inspect to find 4 defective chips.

Poisson – Models the number of independent events that occur in a fixed amount of time or space; for example, the number of customers that arrive to a store during 1 hour, or the number of defects found in 30 square meters of sheet metal.

Normal – Models the distribution of a process that can be thought of as the sum of a number of component processes; for example, a time to assemble a product that is the sum of times required for each assembly operation.

Lognormal – Models the distribution of a process that can be thought of as the product of a number of component processes; for example, the rate of return on an investment, when interest is compounded, is the product of the returns for a number of periods.

Banks et al., pp. 314-316


Common Distributions

Exponential – Models the time between independent events, or a process time that is memoryless; for example, the times between the arrivals from a large population of potential customers who act independently of each other. The exponential is a highly variable distribution; it is sometimes overused because it often leads to mathematically tractable models. Recall that, if the time between events is exponentially distributed, then the number of events in a fixed period of time is Poisson.

Gamma – An extremely flexible distribution used to model nonnegative random variables (can be shifted away from 0 by adding a constant).

Beta – An extremely flexible distribution used to model bounded random variables. The beta can be shifted away from 0 by adding a constant and can be given a range larger than [0, 1] by multiplying by a constant.

Erlang – Models processes that can be viewed as the sum of several exponentially distributed processes; for example, a computer network fails when a computer and two backup computers fail, and each has a time to failure (TTF) that is exponentially distributed.

Banks et al., pp. 314-316
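The link between the exponential and the Poisson noted above can be checked numerically. The sketch below, with an assumed rate of 4 events per period, generates exponential times between events and counts how many events fall in each unit-length period; for a Poisson count, the mean and the variance should both be close to the rate.

```python
import random
import statistics


def counts_per_period(rate, periods, rng):
    # Generate events with exponential gaps and tally them per unit period.
    counts = [0] * periods
    t = rng.expovariate(rate)
    while t < periods:
        counts[int(t)] += 1
        t += rng.expovariate(rate)
    return counts


rng = random.Random(11)
rate = 4.0  # assumed events per period, for illustration
counts = counts_per_period(rate, 5000, rng)

count_mean = statistics.mean(counts)
count_var = statistics.variance(counts)
print(f"mean count per period: {count_mean:.2f}")
print(f"variance of counts:    {count_var:.2f}")
```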


Common Distributions

Weibull – Models the time to failure for components; for example, the time to failure for a disk drive. The exponential is a special case of the Weibull.

Discrete or Continuous Uniform – Models complete uncertainty: All outcomes are equally likely. This distribution is often used inappropriately, when there are no data.

Triangular – Models a process for which only the minimum, most likely, and maximum values of the distribution are known; for example, the minimum, most likely, and maximum time required to test a product. This model is a marked improvement over the uniform distribution [in many cases].

Pert – A special case of the Beta with minimum, most likely, and maximum values. The pert provides a “smooth” alternative to the triangular in the absence of data.

Empirical – Samples from the distribution of the actual data collected; often used when no theoretical distribution seems appropriate.

Banks et al., pp. 314-316


Goodness-of-fit (GOF) Tests

Statistical hypothesis tests that are used to assess formally whether the observations X1, X2, …, Xn constitute an independent sample from a particular distribution function.

Hypothesis:

H0: The Xi’s are IID random variables with the specified distribution function.


GOF Test Considerations

Failure to reject the null hypothesis should not be interpreted as “accepting H0 as being true.”

GOF tests are not very powerful for small-to-moderate sample sizes. Also, when n is large, the tests will often reject H0 since even minute differences will be detected.
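The sample-size effect can be illustrated with a small experiment. In this sketch the data truly come from an exponential with mean 1.05, while H0 claims a mean of 1.0, a difference that would rarely matter in practice. The 5% critical value uses the large-sample approximation 1.36/sqrt(n) for a fully specified distribution; the sample sizes and seed are arbitrary choices.

```python
import math
import random


def ks_statistic(data, cdf):
    # Largest gap between the empirical CDF and the hypothesized CDF.
    xs = sorted(data)
    n = len(xs)
    return max(max((i + 1) / n - cdf(x), cdf(x) - i / n)
               for i, x in enumerate(xs))


def ks_critical_5pct(n):
    # Large-sample approximation of the 5% critical value.
    return 1.36 / math.sqrt(n)


def hypothesized_cdf(x):
    # H0: exponential with mean 1.0.
    return 1.0 - math.exp(-x)


rng = random.Random(7)
results = {}
for n in (50, 200_000):
    # True process: exponential with mean 1.05 (slightly off from H0).
    data = [rng.expovariate(1 / 1.05) for _ in range(n)]
    d = ks_statistic(data, hypothesized_cdf)
    crit = ks_critical_5pct(n)
    results[n] = (d, crit, d > crit)
    print(f"n={n:>7}: D={d:.4f} critical={crit:.4f} reject H0: {d > crit}")
```

With the small sample the test will usually fail to reject even though H0 is false, and with the very large sample it rejects for a 5% difference in the mean. Neither outcome alone says whether the fitted distribution is adequate for the model.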


Some GOF Software Options

General packages: EasyFit (www.mathwave.com)

Simulation-specific packages: Stat::Fit (www.geerms.com), ExpertFit (www.averill-law.com)


Modeling Arrivals

If arrivals are independent and random, they follow a Poisson process: the number of arrivals in a fixed time is Poisson, and the time between arrivals is exponential.

In some cases the arrival rate may vary over time – Simio supports step-wise linear arrival rates using a Rate Table.
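To show what a time-varying arrival rate means in practice, the sketch below generates arrivals from a non-stationary Poisson process using the thinning method. It is not Simio's implementation: the rate table is assumed to hold one constant rate per equal-length interval, and the rates themselves are hypothetical.

```python
import random


def rate_at(t, rates, interval):
    # Look up the rate of the interval containing time t.
    index = min(int(t // interval), len(rates) - 1)
    return rates[index]


def nonstationary_arrivals(rates, interval, rng):
    """Thinning: generate candidates at the peak rate, keep each with
    probability rate(t) / peak rate."""
    horizon = interval * len(rates)
    peak = max(rates)
    arrivals = []
    t = 0.0
    while True:
        t += rng.expovariate(peak)
        if t >= horizon:
            break
        if rng.random() * peak < rate_at(t, rates, interval):
            arrivals.append(t)
    return arrivals


# Hypothetical rate table: arrivals per hour for four one-hour intervals.
rates = [2.0, 10.0, 0.0, 5.0]
interval = 1.0

rng = random.Random(5)
arrivals = nonstationary_arrivals(rates, interval, rng)
per_hour = [sum(1 for a in arrivals if h <= a < h + 1) for h in range(len(rates))]
print("arrivals per hour:", per_hour)
```

The third interval has a rate of zero, so no candidate is ever accepted there, which makes a convenient sanity check on the method.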


Modeling Task Times

Use a distribution with a range >= 0 (e.g., not the Normal or JohnsonUB).

In the absence of data, Triangular and Pert are possible choices.

With supporting data, the Gamma, LogNormal, Weibull, LogLogistic, Beta, PearsonIV, and JohnsonSB are possible choices.
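The reason for requiring a range >= 0 is easy to see with a quick experiment. The sketch below draws task times from a Normal and from a Gamma with the same assumed mean (10) and standard deviation (5), then counts how many sampled times are negative.

```python
import random

rng = random.Random(3)
n = 10_000

# Normal with mean 10 and standard deviation 5.
normal_times = [rng.gauss(10.0, 5.0) for _ in range(n)]

# Gamma with shape 4 and scale 2.5: mean 4 * 2.5 = 10, sd sqrt(4) * 2.5 = 5.
gamma_times = [rng.gammavariate(4.0, 2.5) for _ in range(n)]

negative_normal = sum(1 for x in normal_times if x < 0)
negative_gamma = sum(1 for x in gamma_times if x < 0)

print(f"negative Normal task times: {negative_normal} of {n}")
print(f"negative Gamma task times:  {negative_gamma} of {n}")
```

Roughly 2% of the Normal samples fall below zero here, which is not a meaningful task time, while the Gamma never does.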


Gamma, Log Normal, Weibull

[Figure: example density plots of the Gamma, Log Normal, and Weibull distributions]


Determining what data is critical

Some data may have a dominant impact on performance.

The variability is often more important than the mean.

Run scenarios specifically designed to determine the sensitivity of the model to the data inputs.
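A small sensitivity experiment shows why variability can matter more than the mean. The sketch below simulates a single-server queue with the Lindley recursion, using exponential interarrival times with mean 1.0 and two service-time scenarios that share the same mean of 0.8: one constant, one exponential. All parameter values are assumptions chosen for illustration.

```python
import random


def mean_wait(service_sampler, n_customers, rng, mean_interarrival=1.0):
    # Lindley recursion: next wait = max(0, wait + service - interarrival).
    wait = 0.0
    total = 0.0
    for _ in range(n_customers):
        total += wait
        service = service_sampler(rng)
        interarrival = rng.expovariate(1.0 / mean_interarrival)
        wait = max(0.0, wait + service - interarrival)
    return total / n_customers


def constant_service(rng):
    return 0.8  # the mean used in place of a distribution


def exponential_service(rng):
    return rng.expovariate(1.0 / 0.8)  # same mean, much more variable


rng = random.Random(19)
wait_constant = mean_wait(constant_service, 200_000, rng)
wait_exponential = mean_wait(exponential_service, 200_000, rng)

print(f"mean wait, constant service:    {wait_constant:.2f}")
print(f"mean wait, exponential service: {wait_exponential:.2f}")
```

Queueing theory predicts an average wait of about 1.6 for the constant case and 3.2 for the exponential case, so replacing the distribution with its mean would understate waiting time by roughly half in this example.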


References

Leemis, L., “Input Modeling Techniques for Discrete-Event Simulations,” Proceedings of the 2001 Winter Simulation Conference, Washington, DC, December 2001.

Vincent, S., “Input Data Analysis,” in Handbook of Simulation, edited by J. Banks, John Wiley & Sons, Inc., New York, NY, pp. 55-91, 1998.

Chapter 9 – Input Modeling (Banks et al.)

Chapter 6 – Selecting Input Probability Distributions (Law)


Summary

Distributions are the primary method for capturing variability in the system.

Never use a mean in place of a distribution for a random component.

When data exists, hypothesize a distribution, estimate its parameters, and test the fit using goodness-of-fit software.

In the absence of data, use appropriate distributions:

Arrivals: exponential time between arrivals, or non-stationary Poisson.

Activities: triangular or pert.

Use the model to determine the critical data elements.