1 ADS, lecture input analysis - Universiteit Utrecht

68
1 ADS, lecture input analysis

Transcript of 1 ADS, lecture input analysis - Universiteit Utrecht

Page 1: 1 ADS, lecture input analysis - Universiteit Utrecht

1 ADS, lecture input analysis

Page 2: 1 ADS, lecture input analysis - Universiteit Utrecht

2

Algorithms for Decision Support

Simulation: Input analysis

Page 3: 1 ADS, lecture input analysis - Universiteit Utrecht

3

Until now: Modelling Simulation study Validation

Today we are going to look at stochastic variables

In this lecture: You learn about modelling uncertain input data, mostly by

probability distributions

ADS, lecture input analysis

Page 4: 1 ADS, lecture input analysis - Universiteit Utrecht

4

Stochastic input variables model:

Variation Things that are uncertain from the viewpoint of the system

ADS, lecture input analysis

Page 5: 1 ADS, lecture input analysis - Universiteit Utrecht

5 ADS, lecture input analysis

Examples stochastic input variables

Production line Transportation planning at DHL Communication network Sensor (e.g. in Electronic Road Pricing)Military

Page 6: 1 ADS, lecture input analysis - Universiteit Utrecht

6

Stochastic variables occur in simulation at different places:

1. Input data are modeled as stochastic variables• E.g time until arrival of next customer

2. Generate random variables• When you schedule a new Arrival event you have to generate a

random number for the time delay

3. Analysis of results

ADS, lecture input analysis

Page 7: 1 ADS, lecture input analysis - Universiteit Utrecht

7

Basics

Experiment: process with uncertain outcome/result

Stochastic variable represents the outcome of experiment

Stochastic variable X Discrete Continuous

ADS, lecture input analysis

Page 8: 1 ADS, lecture input analysis - Universiteit Utrecht

8

Discrete stochastic variable X

Possible values x1, x2,…,xn

Example: Die Flip a coin 4 times: X is the number of heads

n

1iii

ii

1p,1p0

)xX(Pp

ADS, lecture input analysis

Page 9: 1 ADS, lecture input analysis - Universiteit Utrecht

9

Continuous stochastic variable X

Can take any value in an interval Probability Density Function f(kansdichtheid) Total surface under graph equals 1

b

adx)x(f)bXa(P

a b

f(x)

x

ADS, lecture input analysis

Page 10: 1 ADS, lecture input analysis - Universiteit Utrecht

10 ADS, lecture input analysis

Normal N(µ,σ2)

2

2)(

222

1)(

x

exf

Page 11: 1 ADS, lecture input analysis - Universiteit Utrecht

11

Continuous stochastic variable X (2)

Cumulative Distribution Function F (verdelingsfunctie):

p-th percentile xp:

x

dyyfxXPxF )()()(

x

pp

p

pdyyfxXPxF

x

)()()(

:such that

ADS, lecture input analysis

Page 12: 1 ADS, lecture input analysis - Universiteit Utrecht

12

Expected value (average) E(X)

Let X and Y be stochastic variables

E(Y) E(X) Y)E(XcE(X) E(cX)

dx)x(xfE(X)

xp E(X) ii

i

ADS, lecture input analysis

Page 13: 1 ADS, lecture input analysis - Universiteit Utrecht

13

Variance, standard deviation

Let X be a stochastic variable Variance

𝜎 𝑣𝑎𝑟 𝑋 𝐸 𝑋 𝐸 𝑋 𝐸 𝑋 𝐸 𝑋

Standard deviation𝜎 𝑣𝑎𝑟 𝑋

Computation, let a and b be real numbers

𝑣𝑎𝑟 𝑎𝑋 𝑏 𝑎 𝑣𝑎𝑟 𝑋

ADS, lecture input analysis

Exercise:Variance standard die?Variance die, 2,3,3,4,4,5?

Page 14: 1 ADS, lecture input analysis - Universiteit Utrecht

14

Variance, covariance

Let X and Y be stochastic variables

𝑣𝑎𝑟 𝑋 𝑌 𝑣𝑎𝑟 𝑋 𝑣𝑎𝑟 𝑌 2 𝑐𝑜𝑣 𝑋, 𝑌

where𝑐𝑜𝑣 𝑋, 𝑌 𝐸 𝑋 𝐸 𝑋 𝑌 𝐸 𝑌

is the covariance of X and Y.

Correlation

𝑐𝑜𝑟𝑟 𝑋, 𝑌 𝑐𝑜𝑣 𝑋, 𝑌

𝜎 𝜎

ADS, lecture input analysis

Variancescannot just beadded

Page 15: 1 ADS, lecture input analysis - Universiteit Utrecht

15

Independence

Two stochastic variables X and Y are independent if:

Continuous

Discrete

BABYPAXPBYAXP , sets )()() and (

yxyYPxXPyYxXP , )()() and (

ADS, lecture input analysis

Example:One deck of cards, take 2 cards without putting backX=#aces, Y=#kings, Dependent or independent?

Page 16: 1 ADS, lecture input analysis - Universiteit Utrecht

16

Independence (2)

)var()var()var(0))())(((),cov(

)()()(

YXYXYEYXEXEYX

YEXEXYE

If X and Y independent stochastic variables:

ADS, lecture input analysis

Page 17: 1 ADS, lecture input analysis - Universiteit Utrecht

17 ADS, lecture input analysis

Input of simulation

1. Direct use of data: trace driven simulation2. Empirical distribution3. Theoretical probability distribution (see Law and

Kelton tables 6.3 and 6.4 for an overview)

Page 18: 1 ADS, lecture input analysis - Universiteit Utrecht

18 ADS, lecture input analysis

Theoretical distribution

Given data X1,X2,…,Xn for a certain input entity of the system (e.g. interarrival times of customers)

What probability distribution should we use to model the input entity?

We assume the values Xi are Independent and Identically Distributed, so independent samples from the same distribution.

Page 19: 1 ADS, lecture input analysis - Universiteit Utrecht

19 ADS, lecture input analysis

Uniform(a,b)

12)()var(

2)(

bx1

0

)(

otherwise0

],[1)(

2abX

baXE

bxaax

abaxxF

baxabxf

Page 20: 1 ADS, lecture input analysis - Universiteit Utrecht

20 ADS, lecture input analysis

Exponential(ß)

Exp(ß)~gamma(1, ß)~weibull(1, ß)Memory-less

rate. theis

1parameter with denoted Sometimes

)var()(

)0( 1)(

)0( 1)(

2

XxE

xexF

xexf

x

x

Page 21: 1 ADS, lecture input analysis - Universiteit Utrecht

21

Exponential distribution is memory-less

Suppose X follows an exponential distribution with E X β Probability that X is more than t:

𝑃 𝑋 𝑡 1 𝐹 𝑡 1 1 𝑒 𝑒

Probability that X is more than s+t (so at least t largerthan s) given that we know that it is at least s

𝑃 𝑋 𝑠 𝑡 𝑋 𝑠 𝑒

These are equal, so memory-less

ADS, lecture input analysis

Page 22: 1 ADS, lecture input analysis - Universiteit Utrecht

22

At the exam

You have to know the formulas of the uniform andexponential distribution by heart

For the next distributions you have to know properties Formulas will be given if you need them.

ADS, lecture input analysis

Page 23: 1 ADS, lecture input analysis - Universiteit Utrecht

23 ADS, lecture input analysis

Gamma(α,β)

)0()(

)(1

xexxfx

0

1 )0()( zdtetz tz

2

2121

2211

2

)2,2/(gamma)(Erlang-k),gamma(

)exp(),1(gammaparameter scale parameter, shape

),,(~then ),(~),,(~ if

var(X)E(X)

integer positive ,!)1()()1(

kkk

gammaXXgammaXgammaX

kkkzzz

Exercise:Variance Exp(p)?Variance Gamma(k,p/k)?

Page 24: 1 ADS, lecture input analysis - Universiteit Utrecht

24

Gamma(α (=k),β(=θ))

ADS, lecture input analysis

Page 25: 1 ADS, lecture input analysis - Universiteit Utrecht

25 ADS, lecture input analysis

Normal N(µ,σ2)

)1,0(~),N(~X

normal standard )1,0()var(

)(2

1)(

2

2

22

2

2)(

NXZ

NX

XE

exfx

Page 26: 1 ADS, lecture input analysis - Universiteit Utrecht

26

Normal N(µ,σ2)

Exercise:

The amount that a coffee machine puts in a cup is normally distributed with average μ= 170 ml and standard deviation σ= 4. Coffee cups are 175 ml. What is (approximately) the probability of overflow?

What should μ be such that the probability of overflow is 2%?

See statistical tables on course website.

ADS, lecture input analysis

Page 27: 1 ADS, lecture input analysis - Universiteit Utrecht

27

LogNormal LN(µ,σ2)

ADS, lecture input analysis

𝑓 𝑥1

𝑥 2𝜋𝜎𝑒 if 𝑥 0

0 otherwise𝜎 0 shape parameter, e scale parameter𝐸 𝑋 evar 𝑋 e e 1𝑋~𝐿𝑁 𝜇, 𝜎 ⇔ 𝑋 𝑒 with 𝑌~𝑁 𝜇, 𝜎

Page 28: 1 ADS, lecture input analysis - Universiteit Utrecht

28

LogNormal LN(µ,σ2): density function for µ=0

ADS, lecture input analysis

Page 29: 1 ADS, lecture input analysis - Universiteit Utrecht

29 ADS, lecture input analysis

Discrete distributions

Binomial(n,p):

Geometric(p):

knk ppkn

kXPkp

)1()()(

kppkXPkp )1()()(

Page 30: 1 ADS, lecture input analysis - Universiteit Utrecht

30

Discrete distributions: Poisson(λ)

𝐸 𝑋 𝜆; 𝑣𝑎𝑟 𝑋 𝜆

Let 𝑌 , 𝑌 , … be independent and have an exponentialdistribution with rate 𝜆, i.e. expected value , e.g. 𝑌 , 𝑌 , … are interarrival times

Then max 𝑖| ∑ 𝑌 1 , i.e. the number of Y’s that fit in 1 unit, i.e. the number of arrivals in 1 time periodhas the Poisson(λ) distribution.

ADS, lecture input analysis

𝑃 𝑋 𝑘!

𝑘 0,1, …

Page 31: 1 ADS, lecture input analysis - Universiteit Utrecht

31

Poisson process

The arrival process 𝑌 , 𝑌 , … is called Poisson process withintensity λ.

Suppose we have generated the number of arrivals from in a given time interval I by drawing this number from thePoisson distribution. Then each individual arrival time is from the uniform

distribution on that interval.

ADS, lecture input analysis

Page 32: 1 ADS, lecture input analysis - Universiteit Utrecht

32

Background Poisson process

Poisson distribution is `limit’ of binomial distribution Law of rare events

Exponential inter arrival times result in Poisson distribution for number of arrivals per time unit Memory-less

ADS, lecture input analysis

Page 33: 1 ADS, lecture input analysis - Universiteit Utrecht

33 ADS, lecture input analysis

Page 34: 1 ADS, lecture input analysis - Universiteit Utrecht

34 ADS, lecture input analysis

-1 𝑒

Page 35: 1 ADS, lecture input analysis - Universiteit Utrecht

35 ADS, lecture input analysis

Probability distribution: overview

Continuous: Uniform: first guess Exponential: inter arrival times Gamma: time to complete task Weibull: time to complete task, time to failure Normal: errors of various types, sum of large number

of other quantities LogNormal: time to complete task, estimate in the

absence of data, time until maintenance, income Discrete:

Binomial: number of successes Geometric: time until first success Poisson(λ): gives number of arrival per time period

when inter arrival times are exponentially distributed with parameter 1/λ.

Page 36: 1 ADS, lecture input analysis - Universiteit Utrecht

36 ADS, lecture input analysis

Different distributions: example

Single server queue Exponential interarrival times: avg 1 minute Historic data: 98 service times

Service time distribution

Avg delay

Avg queue length

% delays >= 15

Exponential 4.356 4.363 4.7Gamma 2.849 2.845 1Weibull 2.687 2.692 0.7Lognormal 4.816 4.825 5.8Normal 3.308 3.309 1.7

Page 37: 1 ADS, lecture input analysis - Universiteit Utrecht

37

Which distribution?

ADS, lecture input analysis

Page 38: 1 ADS, lecture input analysis - Universiteit Utrecht

38

Which distribution?

ADS, lecture input analysis

Page 39: 1 ADS, lecture input analysis - Universiteit Utrecht

39

Which distribution?

ADS, lecture input analysis

Page 40: 1 ADS, lecture input analysis - Universiteit Utrecht

40 ADS, lecture input analysis

Empirical distribution Given observations X1 <,…, < Xn. Find probability

distribution that directly follows from these observations.

Discrete or continuous possible Discrete:

each observation probability 1/n.

Example: Time to complete assignment: 2,3,5,8,16,20 Discrete: only these values, each with p = 1/6 If you also want to generate values like 4.5, use

continuous CDF.

Page 41: 1 ADS, lecture input analysis - Universiteit Utrecht

41

Empirical distribution

Continuous: linear interpolation, interval [X1,Xn]

ADS, lecture input analysis

Xi

Xi

Page 42: 1 ADS, lecture input analysis - Universiteit Utrecht

42 ADS, lecture input analysis

Input of simulation

1. Direct use of data: trace driven simulation2. Empirical distribution3. Theoretical probability distribution (see Law and

Kelton tables 6.3 and 6.4 for an overview)

Advantages, disadvantages.

Page 43: 1 ADS, lecture input analysis - Universiteit Utrecht

43 ADS, lecture input analysis

Input of simulation: advantages

1. Direct use of data: trace driven simulation valid, few data, no modeling difficulties

2. Empirical distribution:Given range, may have irregularities

3. Theoretical probability distribution (see Law and Kelton tables 6.3 and 6.4 for an overview) smooth, compact, easy to change to run another scenario, no

bound on the range, physical or theoretical reason

Page 44: 1 ADS, lecture input analysis - Universiteit Utrecht

44

Wrap-up

Simulation need stochastic variables for input entities thatare subject to uncertainty Interarrival times of customers Time until machine breakdown

Probability distributions are the best way to model these things: E.g. interarrival times from exponential distribution When your simulation program does: schedule new arrival

You have to generate a random number from the exponentialdistribution to put the event in the event list with the right time-stamp

ADS, lecture input analysis

Page 45: 1 ADS, lecture input analysis - Universiteit Utrecht

45

Fitting a distribution

When you do a simulation study

You hope to have a collection of data for the entity youneed to model

Question: What probability distribution should I use?

Fitting a distribution!

ADS, lecture input analysis

Page 46: 1 ADS, lecture input analysis - Universiteit Utrecht

46 ADS, lecture input analysis

Fitting a distribution

Observations X1,…,Xn

Finding a Cumulative Distribution Function F that models the observations is called Fitting

Goodness-of-fit Quality if the fit Can we assume that X1,…,Xn really have distribution

F ?

Page 47: 1 ADS, lecture input analysis - Universiteit Utrecht

47

Fitting a distribution

Clean the data Make a histogram Select a type of distribution

Visual inspection Fitting software

Estimate the parameters Evaluate the goodness of fit:

Heuristically Statistically

ADS, lecture input analysis

Page 48: 1 ADS, lecture input analysis - Universiteit Utrecht

48

Fitting a distribution

Available software: Expert fit (free for small number of data elements) MATLAB R

ADS, lecture input analysis

If fitting software suggestsa distribution, it will be onewith many parameters.

Page 49: 1 ADS, lecture input analysis - Universiteit Utrecht

49

Fitting a distribution: Estimate the parameters

X1,…,Xn samples of Independent Identically Distributed (IID) stochastic variables

Sample mean

Sample variance

These are unbiased estimators!

ADS, lecture input analysis

nnX

n

iXi

1)(

1

))(()( 1

2

2

n

nXXnS

n

ii

Page 50: 1 ADS, lecture input analysis - Universiteit Utrecht

50 ADS, lecture input analysis

Evaluate goodness of fit heuristically: Q-Q plot

Q-Q plot Assume X1 ≤ X2 ≤…≤Xn

~ straight line `y=x’

))(,( Draw

)(F X

percentile is )F(X F) from computed is P (where )XP(X have should then we

on distributi from are nsobservatio If

5.01

5.01-i

5.05.0i

5.0i

ni

i

ni

ni

ini

ni

i

FX

X

FX

Page 51: 1 ADS, lecture input analysis - Universiteit Utrecht

51 ADS, lecture input analysis

Goodness of fit heuristically: Q-Q plot

))(,( Draw 5.01n

ii FX

give?on distributi thewhat would

:)( 5.01n

iF

iX :dataMy

Page 52: 1 ADS, lecture input analysis - Universiteit Utrecht

52 ADS, lecture input analysis

Goodness of fit heuristically: Q-Q plot example

F: U[0,1]Data points 0.11; 0.29; 0.52; 0.69; 0.93

F: U[0,1]Data points 0.1; 0.11;1 0.3; 0.4; 0.9

Page 53: 1 ADS, lecture input analysis - Universiteit Utrecht

53 ADS, lecture input analysis

Evaluate goodness of fit: statistical tests

Hypothesis H0: the observations X1,X2,…,Xn follow distribution F

Do we accept or reject this hypothesis?

Chi-squared test Kolmogorov-Smirnov test

Can be performed with R

Page 54: 1 ADS, lecture input analysis - Universiteit Utrecht

54

Statistical testing in general: Example

Suppose we investigate the average number of hours gaming per week for CS students and we do not want collect numbers from all students.

Our hypothesis is H0: avg = 25 Suppose we sample five persons and find: 24, 24,24, 26,

26 (avg 24.8) Compute average from sample and ask: `Do we believe

H0?’ Believe H0 if avg is close enough to 25

How close is close enough? The above sample looks close enough, but how about: 20, 22,

23, 24, 26 (avg 23) Can statistics help to decide?

ADS, lecture input analysis

Page 55: 1 ADS, lecture input analysis - Universiteit Utrecht

55

Statistical hypothesis testing

1. Formulate hypothesis H0• For goodness-of-fit tests the observations

X1,X2,…,Xn follow distribution F2. Choose type of test3. Determine significance α and decision rule4. Compute test statistic and take decision

ADS, lecture input analysis

Page 56: 1 ADS, lecture input analysis - Universiteit Utrecht

56

Errors

CaseH0 true H0 false

Decision Accept H0 OKProbability 1-αConfidence level

Type 2

Reject H0 Type 1 Probability αSignificance

OK

ADS, lecture input analysis

Page 57: 1 ADS, lecture input analysis - Universiteit Utrecht

57

Chi-squared test

H0: the observations X1,X2,…,Xn follow distribution F

1

0

22

1i

12110

)( :statisticTest

)(

:ondistributiy probabilit toaccordingnumber Expected

),in # N:number Observed

),,),,),,

1

K

i i

ii

a

aii

iij

KK

EEN

dxxfnnpE

a[aX

a[aa[aa[a

i

i

ADS, lecture input analysis

Interval choice:• No Ei<1• No more than 20% has

Ni<5

Page 58: 1 ADS, lecture input analysis - Universiteit Utrecht

58

Chi-squared test (2) Suppose H0 is true

Suppose we draw N values X1,X2,…,Xn from the probablitydistribution F

Then we can compute

Now X2 is a stochastic variable

Statistical theory learns us Χ2 follows a chi-squared-distribution with df=K-1

We expect it to be small, but it may by coincidence attaina large value.

However, at some point we do not believe H0 anymore.

ADS, lecture input analysis

1

0

22 )(K

i i

ii

EEN

Page 59: 1 ADS, lecture input analysis - Universiteit Utrecht

59

If H0 is true then Χ2 ~ chi-squared-distribution with df=K-1

www.statsoft.com

Significance α

Χ2(df)α

Accept H0 ifReject otherwise

)(22 df

ADS, lecture input analysis

Now, the probabilitythat we reject H0 whileit is true is only α

Page 60: 1 ADS, lecture input analysis - Universiteit Utrecht

60

Example

Are the following values from U[0,1]a) 0.07; 0.15; 0.24; 0.31; 0.42; 0.51; 0.55; 0.65; 0.73; 0.76;

0.85; 0.97b) 0.07; 0.08; 0.15; 0.18; 0.51; 0.52; 0.53; 0.58; 0.64; 0.68;

0.74; 0.95

Use intervals [0;0.25), [0.25;0.5), [0.5;0.75), [0.75;1] and α=5%.

ADS, lecture input analysis

Statistical table:http://www.statsoft.com/Textbook/Distribution-Tables#chi

Page 61: 1 ADS, lecture input analysis - Universiteit Utrecht

61

Kolmogorov-Smirnov test

ADS, lecture input analysis

|)()(|sup

for )(on distributi Empirical

?on distributi follow .... Do :

*

1

210

xFxFD

XxXnixF

FXXXH

nx

iin

n

n

Page 62: 1 ADS, lecture input analysis - Universiteit Utrecht

62

Kolmogorov-Smirnov test (2)

ADS, lecture input analysis

𝐷∗ sup | 𝐹 𝑥 𝐹 𝑥 |

If H0 is true the following holds

𝐻 𝑡 lim→

𝑃 𝑛𝐷∗ 𝑡 1 1 𝑒

𝐻 𝑡 defines a probability distribution

Page 63: 1 ADS, lecture input analysis - Universiteit Utrecht

63

After this lecture

You know different methods for generating stochastic input variables for simulation

You know different probability distributions You are able to fit a probability distribution to input data

ADS, lecture input analysis

Page 64: 1 ADS, lecture input analysis - Universiteit Utrecht

64

Covariance, correlation

22XY),cov( :ncorrelatio

)))())((((),cov( :covariance:Y and X variablesstochastic 2Given

YX

YXYEYXEXEYX

Example 1: Die•X = score•Y =7 – score of same die

Example 2• X score Dutch coin•Y score Italian coin•Where, head = 1, tail = 0

ADS, lecture input analysis

Page 65: 1 ADS, lecture input analysis - Universiteit Utrecht

65

Independence

Two stochastic variables X and Y are independent if:

Continuous

Discrete

BABYPAXPBYAXP , sets )()() and (

yxyYPxXPyYxXP , )()() and (

ADS, lecture input analysis

Page 66: 1 ADS, lecture input analysis - Universiteit Utrecht

66

Independence (2)

)Yvar()Xvar()YXvar(0)Y,Xcov(

)Y(E)X(E)XY(E

If X and Y independent stochastic variables:

ADS, lecture input analysis

Page 67: 1 ADS, lecture input analysis - Universiteit Utrecht

67

Wrap-up

Previous lecture we studied some well-known probabilitydistribution.

How do you know which probability distribution you shoulduse? From data Sometimes from theory

This lecture: How to find out if you choose the correct distribution?

ADS, lecture input analysis

Page 68: 1 ADS, lecture input analysis - Universiteit Utrecht

68

Wrap-up

Simulation need stochastic variables for input entities thatare subject to uncertainty Interarrival times of customers Time until machine breakdown

Probability distributions are the best way to model these things: E.g. interarrival times from exponential distribution When your simulation program does: schedule new arrival

You have to generate a random number from the exponentialdistribution to put the event in the event list with the right time-stamp

ADS, lecture input analysis