
Probability Distributions
CEE 201L. Uncertainty, Design, and Optimization

Department of Civil and Environmental Engineering
Duke University

Philip Scott Harvey, Henri P. Gavin and Jeffrey T. Scruggs
Spring 2020

1 Probability Distributions

Consider a continuous random variable (rv) $X$ with support over the domain $\mathcal{X}$. The probability density function (PDF) of $X$ is the function $f_X(x)$ such that, for any two numbers $a$ and $b$ in the domain $\mathcal{X}$ with $a < b$,

$$P[a < X \le b] = \int_a^b f_X(x)\,dx$$

For fX(x) to be a proper distribution, it must satisfy the following two conditions:

1. The PDF $f_X(x)$ is non-negative: $f_X(x) \ge 0$ for all values of $x \in \mathcal{X}$.

2. The rule of total probability holds: the total area under $f_X(x)$ is 1, $\int_{\mathcal{X}} f_X(x)\,dx = 1$.

Alternatively, $X$ may be described by its cumulative distribution function (CDF). The CDF of $X$ is the function $F_X(x)$ that gives, for any specified number $x \in \mathcal{X}$, the probability that the random variable $X$ is less than or equal to $x$, written $P[X \le x]$. For real values of $x$, the CDF is defined by

$$F_X(b) = P[X \le b] = \int_{-\infty}^{b} f_X(x)\,dx ,$$

so

$$P[a < X \le b] = F_X(b) - F_X(a) .$$

By the first fundamental theorem of calculus, the functions fX(x) and FX(x) are related as

$$f_X(x) = \frac{d}{dx} F_X(x) .$$

2 CEE 201L. Uncertainty, Design, and Optimization – Duke University – Spring 2020 – P.S.H., H.P.G. and J.T.S.

A few important characteristics of the CDF of $X$ are:

1. The CDF, $F_X(x)$, is a monotonic non-decreasing function of $x$.

2. For any number $a$, $P[X > a] = 1 - P[X \le a] = 1 - F_X(a)$.

3. For any two numbers $a$ and $b$ with $a < b$, $P[a < X \le b] = F_X(b) - F_X(a) = \int_a^b f_X(x)\,dx$.

2 Descriptors of random variables

The expected or mean value of a continuous random variable $X$ with PDF $f_X(x)$ is the centroid of the probability density,

$$\mu_X = E[X] = \int_{-\infty}^{\infty} x\, f_X(x)\,dx .$$

The expected value of an arbitrary function of $X$, $g(X)$, with respect to the PDF $f_X(x)$ is

$$\mu_{g(X)} = E[g(X)] = \int_{-\infty}^{\infty} g(x)\, f_X(x)\,dx .$$

The variance of a continuous rv $X$ with PDF $f_X(x)$ and mean $\mu_X$ gives a quantitative measure of how much spread or dispersion there is in the distribution of $x$ values. The variance is calculated as

$$\sigma_X^2 = V[X] = \int_{-\infty}^{\infty} (x - \mu_X)^2\, f_X(x)\,dx .$$

The standard deviation (s.d.) of $X$ is $\sigma_X = \sqrt{V[X]}$. The coefficient of variation (c.o.v.) of $X$ is defined as the ratio of the standard deviation $\sigma_X$ to the mean $\mu_X$,

$$c_X = \left|\frac{\sigma_X}{\mu_X}\right| ,$$

for non-zero mean. The c.o.v. is a normalized (dimensionless) measure of dispersion.

A mode of a probability density function, $f_X(x)$, is a value of $x$ at which the PDF is maximized; for a smooth PDF with an interior maximum,

$$\left.\frac{d}{dx} f_X(x)\right|_{x = x_{\text{mode}}} = 0 .$$

The median value, $x_m$, is the value of $x$ such that

$$P[X \le x_m] = P[X > x_m] = 0.5, \quad\text{i.e.,}\quad F_X(x_m) = 1 - F_X(x_m) = 0.5 .$$
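The descriptor integrals can be checked numerically. The Python sketch below is an illustration only (assumptions: Python 3 standard library; the names `descriptors` and `rayleigh_pdf` are hypothetical; integrals use a simple midpoint rule on a finite interval). It computes the mean, variance, c.o.v., mode, and median of a PDF, using the Rayleigh PDF with mode 1 as a test case because its mean, $\sqrt{\pi/2}$, and variance, $(4-\pi)/2$, are known in closed form.

```python
import math


def rayleigh_pdf(x, m=1.0):
    """Rayleigh PDF with mode m, used here only as a test case."""
    return x / m**2 * math.exp(-0.5 * (x / m) ** 2)


def descriptors(pdf, lo, hi, n=200_000):
    """Mean, variance, c.o.v., mode, and median of a PDF supported on [lo, hi].

    Integrals are approximated with the midpoint rule on n equal cells.
    """
    dx = (hi - lo) / n
    xs = [lo + (i + 0.5) * dx for i in range(n)]
    fs = [pdf(x) for x in xs]
    mean = sum(x * f for x, f in zip(xs, fs)) * dx                  # E[X]
    var = sum((x - mean) ** 2 * f for x, f in zip(xs, fs)) * dx     # V[X]
    cov = math.sqrt(var) / abs(mean)                                # c_X = |sigma_X / mu_X|
    mode = xs[max(range(n), key=fs.__getitem__)]                    # grid x that maximizes f_X
    median, cdf = hi, 0.0
    for x, f in zip(xs, fs):                                        # first x with F_X(x) >= 0.5
        cdf += f * dx
        if cdf >= 0.5:
            median = x
            break
    return mean, var, cov, mode, median


if __name__ == "__main__":
    print(descriptors(rayleigh_pdf, 0.0, 12.0))
```

For the Rayleigh test case the results should be close to $\mu_X \approx 1.2533$, $\sigma_X^2 \approx 0.4292$, mode $\approx 1$, and median $\sqrt{2\ln 2} \approx 1.1774$. The mode is taken as the grid point with the largest density rather than by solving $df_X/dx = 0$, which also handles PDFs whose maximum sits on a boundary.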

CC BY-NC-ND February 1, 2020 PSH, HPG, JTS

http://creativecommons.org/licenses/by-nc-nd/3.0/


3 Some common distributions

The National Institute of Standards and Technology (NIST) lists properties of nineteen commonly used probability distributions in its Engineering Statistics Handbook. This section describes the properties of ten distributions. For each of these distributions, this document provides figures and equations for the PDF and CDF, equations for the mean and variance, the names of Matlab functions to generate samples, and empirical distributions of such samples.

3.1 The Normal distribution

The Normal (or Gaussian) distribution is perhaps the most commonly used distribution function. The notation $X \sim \mathcal{N}(\mu_X, \sigma_X^2)$ denotes that $X$ is a normal random variable with mean $\mu_X$ and variance $\sigma_X^2$. The standard normal random variable, $Z$, or “z-statistic”, is distributed as $\mathcal{N}(0, 1)$. The probability density function of a standard normal random variable is so widely used that it has its own special symbol, $\phi(z)$,

$$\phi(z) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{z^2}{2}\right)$$

Any normally distributed random variable can be defined in terms of the standard normal random variable through the change of variables

$$X = \mu_X + \sigma_X Z .$$

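If $X$ is normally distributed, it has the PDF

$$f_X(x) = \frac{1}{\sigma_X}\,\phi\!\left(\frac{x - \mu_X}{\sigma_X}\right) = \frac{1}{\sqrt{2\pi\sigma_X^2}} \exp\left(-\frac{(x - \mu_X)^2}{2\sigma_X^2}\right)$$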

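There is no closed-form equation for the CDF of a normal random variable. Solving the integral

$$\Phi(z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-u^2/2}\,du$$

in closed form would make you famous. Try it. The CDF of a normal random variable is instead expressed in terms of the error function, $\operatorname{erf}(z)$, as $\Phi(z) = \tfrac{1}{2}\left[1 + \operatorname{erf}(z/\sqrt{2})\right]$. If $X$ is normally distributed, $P[X \le x]$ can be found from the standard normal CDF,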

$$P[X \le x] = F_X(x) = \Phi\left(\frac{x - \mu_X}{\sigma_X}\right) .$$

Values for $\Phi(z)$ are tabulated and can be computed, e.g., with the Matlab command Prob_X_le_x = normcdf(x,muX,sigX). The standard normal PDF is symmetric about $z = 0$, so $\phi(-z) = \phi(z)$, $\Phi(-z) = 1 - \Phi(z)$, and $P[X > x] = 1 - F_X(x) = 1 - \Phi\left((x - \mu_X)/\sigma_X\right) = \Phi\left((\mu_X - x)/\sigma_X\right)$.
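Where Matlab's normcdf is not available, $\Phi$ can be evaluated directly from the error function. The Python sketch below is an illustration under stated assumptions (Python 3 standard library `math.erf`; the function names are hypothetical, not part of the course's Matlab files). It uses $\Phi(z) = \tfrac{1}{2}[1 + \operatorname{erf}(z/\sqrt{2})]$, the change of variables $z = (x - \mu_X)/\sigma_X$, and the symmetry $P[X > x] = \Phi((\mu_X - x)/\sigma_X)$.

```python
import math


def std_normal_cdf(z):
    """Standard normal CDF: Phi(z) = (1/2) [1 + erf(z / sqrt(2))]."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_cdf(x, mu, sigma):
    """P[X <= x] for X ~ N(mu, sigma^2), using z = (x - mu) / sigma."""
    return std_normal_cdf((x - mu) / sigma)


def normal_exceedance(x, mu, sigma):
    """P[X > x] = Phi((mu - x) / sigma), using the symmetry of phi."""
    return std_normal_cdf((mu - x) / sigma)


if __name__ == "__main__":
    print(round(std_normal_cdf(1.0), 4))     # Phi(1)
    print(normal_cdf(7.0, 5.0, 2.0), 1.0 - normal_exceedance(7.0, 5.0, 2.0))
```

Writing the exceedance probability as $\Phi((\mu_X - x)/\sigma_X)$ rather than $1 - \Phi(\cdot)$ avoids subtracting two nearly equal numbers far in the upper tail.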

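The linear combination of two independent normal rv's $X_1$ and $X_2$ (with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$) is also normally distributed,

$$aX_1 + bX_2 \sim \mathcal{N}\left(a\mu_1 + b\mu_2,\; a^2\sigma_1^2 + b^2\sigma_2^2\right),$$

and, more specifically, $aX - b \sim \mathcal{N}\left(a\mu_X - b,\; a^2\sigma_X^2\right)$.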


https://www.nist.gov/

https://www.itl.nist.gov/div898/handbook/eda/section3/eda366.htm


https://www.itl.nist.gov/div898/handbook/

http://math2.org/math/stat/distributions/z-dist.htm



Given the probability of a normal rv, i.e., given $P[X \le x]$, the associated value of $x$ can be found from the inverse standard normal CDF,

$$\frac{x - \mu_X}{\sigma_X} = z = \Phi^{-1}(P[X \le x]) .$$

Values of the inverse standard normal CDF are tabulated and can be computed, e.g., with the Matlab command x = norminv(Prob_X_le_x,muX,sigX).
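The inverse standard normal CDF is also available in Python's standard library as `statistics.NormalDist.inv_cdf` (Python 3.8 and later). A minimal sketch, assuming that library in place of Matlab's norminv (the helper name `normal_inverse_cdf` is hypothetical), applies $x = \mu_X + \sigma_X\,\Phi^{-1}(P)$ and checks the result with a round trip through the CDF:

```python
from statistics import NormalDist


def normal_inverse_cdf(p, mu, sigma):
    """x such that P[X <= x] = p for X ~ N(mu, sigma^2): x = mu + sigma * Phi^{-1}(p)."""
    z = NormalDist().inv_cdf(p)       # inverse standard normal CDF; requires 0 < p < 1
    return mu + sigma * z


if __name__ == "__main__":
    print(round(NormalDist().inv_cdf(0.975), 3))     # z for a one-sided 2.5% tail
    x = normal_inverse_cdf(0.9, 100.0, 40.0)
    print(x, NormalDist(100.0, 40.0).cdf(x))         # round trip back to 0.9
```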

3.2 The Log-Normal distribution

The Normal distribution is symmetric and can be used to describe random variables that can take positive as well as negative values, regardless of the value of the mean and standard deviation. For many random quantities a negative value makes no sense (e.g., modulus of elasticity, air pressure, and distance). Using a distribution that admits only positive values for such quantities eliminates any possibility of nonsensical negative values. The log-normal distribution is such a distribution.

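If $\ln X$ is normally distributed (i.e., $\ln X \sim \mathcal{N}(\mu_{\ln X}, \sigma_{\ln X}^2)$), then $X$ is called a log-normal random variable. In other words, if $Y\ (= \ln X)$ is normally distributed, then $e^Y\ (= X)$ is log-normally distributed. With $y = \ln x$,

$$\mu_Y = \mu_{\ln X}, \qquad \sigma_Y^2 = \sigma_{\ln X}^2,$$

$$P[Y \le y] = P[\ln X \le \ln x] = P[X \le x] = F_Y(y) = F_{\ln X}(\ln x) = F_X(x) = \Phi\left(\frac{y - \mu_Y}{\sigma_Y}\right) = \Phi\left(\frac{\ln x - \mu_{\ln X}}{\sigma_{\ln X}}\right) .$$

The mean and standard deviation of a log-normal variable $X$ are related to the mean and standard deviation of $\ln X$: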

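$$\mu_{\ln X} = \ln \mu_X - \tfrac{1}{2}\sigma_{\ln X}^2, \qquad \sigma_{\ln X}^2 = \ln\left(1 + (\sigma_X/\mu_X)^2\right)$$

If $\sigma_X/\mu_X < 0.30$, then $\sigma_{\ln X} \approx \sigma_X/\mu_X = c_X$.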
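The conversion from $(\mu_X, \sigma_X)$ to $(\mu_{\ln X}, \sigma_{\ln X})$ is easy to get wrong by hand, so a small helper is handy. This Python sketch (assumption: standard library only; the name `lognormal_params` is hypothetical) implements the two relations above and prints $\sigma_{\ln X}$ next to $c_X$ to show where the approximation $\sigma_{\ln X} \approx c_X$ holds.

```python
import math


def lognormal_params(mean_x, sd_x):
    """(mu_lnX, sigma_lnX) of a log-normal X with mean mean_x and s.d. sd_x."""
    c_x = sd_x / mean_x                              # coefficient of variation
    var_ln = math.log(1.0 + c_x**2)                  # sigma_lnX^2 = ln(1 + c_X^2)
    mu_ln = math.log(mean_x) - 0.5 * var_ln          # mu_lnX = ln mu_X - sigma_lnX^2 / 2
    return mu_ln, math.sqrt(var_ln)


if __name__ == "__main__":
    for c in (0.10, 0.30, 0.60):
        _, s_ln = lognormal_params(1.0, c)
        print(f"c_X = {c:.2f}   sigma_lnX = {s_ln:.4f}")   # compare sigma_lnX with c_X
```

With these inputs $\sigma_{\ln X}$ is about 0.0998, 0.2936, and 0.5545, so the approximation is within about 2% at $c_X = 0.30$ but noticeably worse at $c_X = 0.60$.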

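The median, $x_m$, is a useful parameter of log-normal rv's. By definition of the median value, half of the population lies above the median and half lies below, so

$$\Phi\left(\frac{\ln x_m - \mu_{\ln X}}{\sigma_{\ln X}}\right) = 0.5 \quad\Rightarrow\quad \frac{\ln x_m - \mu_{\ln X}}{\sigma_{\ln X}} = \Phi^{-1}(0.5) = 0,$$

and

$$\ln x_m = \mu_{\ln X} \;\leftrightarrow\; x_m = \exp(\mu_{\ln X}) \;\leftrightarrow\; \mu_X = x_m\sqrt{1 + c_X^2} .$$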

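For the log-normal distribution, $x_{\text{mode}} < x_{\text{median}} < x_{\text{mean}}$. If $c_X < 0.15$, $x_{\text{median}} \approx x_{\text{mean}}$.

If $\ln X$ is normally distributed ($X$ is log-normal), then (for $c_X < 0.3$)

$$P[X \le x] \approx \Phi\left(\frac{\ln x - \ln x_m}{c_X}\right) .$$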

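If $\ln X \sim \mathcal{N}(\mu_{\ln X}, \sigma_{\ln X}^2)$ and $\ln Y \sim \mathcal{N}(\mu_{\ln Y}, \sigma_{\ln Y}^2)$ are independent, and $Z = aX^n/Y^m$, then

$$\ln Z = \ln a + n\ln X - m\ln Y \sim \mathcal{N}(\mu_{\ln Z}, \sigma_{\ln Z}^2),$$

where

$$\mu_{\ln Z} = \ln a + n\mu_{\ln X} - m\mu_{\ln Y} = \ln a + n\ln x_m - m\ln y_m$$

and

$$\sigma_{\ln Z}^2 = (n\sigma_{\ln X})^2 + (m\sigma_{\ln Y})^2 = n^2\ln(1 + c_X^2) + m^2\ln(1 + c_Y^2) = \ln(1 + c_Z^2) .$$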
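The power-product rule can be wrapped in one function that returns the median and c.o.v. of $Z$. The Python sketch below assumes independent log-normal $X$ and $Y$ given by medians and c.o.v.s (the name `power_product` is hypothetical); for a plain quotient $Z = X/Y$ it reproduces $c_Z^2 = c_X^2 + c_Y^2 + c_X^2 c_Y^2$.

```python
import math


def power_product(a, xm, cx, n, ym, cy, m):
    """Median and c.o.v. of Z = a X^n / Y^m for independent log-normal X and Y.

    X and Y are specified by their medians (xm, ym) and c.o.v.s (cx, cy).
    """
    mu_lnZ = math.log(a) + n * math.log(xm) - m * math.log(ym)          # ln a + n ln x_m - m ln y_m
    var_lnZ = n**2 * math.log(1 + cx**2) + m**2 * math.log(1 + cy**2)   # n^2 ln(1+c_X^2) + m^2 ln(1+c_Y^2)
    zm = math.exp(mu_lnZ)                      # median of Z
    cz = math.sqrt(math.exp(var_lnZ) - 1.0)    # invert sigma_lnZ^2 = ln(1 + c_Z^2)
    return zm, cz


if __name__ == "__main__":
    # Z = X / Y with x_m = 3, c_X = 0.2, y_m = 1.5, c_Y = 0.1 (illustrative values)
    print(power_product(1.0, 3.0, 0.2, 1, 1.5, 0.1, 1))
```

Working entirely in $\ln Z$ keeps the rule additive; the c.o.v. is recovered only at the end by inverting $\sigma_{\ln Z}^2 = \ln(1 + c_Z^2)$.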





Uniform, $X \sim U[a, b]$, with $a \le X \le b$:

$$f(x) = \begin{cases} \dfrac{1}{b-a}, & x \in [a, b] \\ 0, & \text{otherwise} \end{cases} \qquad F(x) = \begin{cases} 0, & x \le a \\ \dfrac{x-a}{b-a}, & x \in [a, b] \\ 1, & x \ge b \end{cases}$$

$$\mu_X = \tfrac{1}{2}(a + b), \qquad \sigma_X^2 = \tfrac{1}{12}(b - a)^2$$

Matlab: x = a + (b-a)*rand(1,N);

Triangular, $X \sim T(a, b, c)$, with $a \le X \le b$ and $a \le c \le b$:

$$f(x) = \begin{cases} \dfrac{2(x-a)}{(b-a)(c-a)}, & x \in [a, c] \\ \dfrac{2(b-x)}{(b-a)(b-c)}, & x \in [c, b] \\ 0, & \text{otherwise} \end{cases} \qquad F(x) = \begin{cases} 0, & x \le a \\ \dfrac{(x-a)^2}{(b-a)(c-a)}, & x \in [a, c] \\ 1 - \dfrac{(b-x)^2}{(b-a)(b-c)}, & x \in [c, b] \\ 1, & x \ge b \end{cases}$$

$$\mu_X = \tfrac{1}{3}(a + b + c), \qquad \sigma_X^2 = \tfrac{1}{18}(a^2 + b^2 + c^2 - ab - ac - bc)$$

Matlab: x = triangular_rnd(a,b,c,1,N);

[Figure: PDF $f(x)$ and CDF $F(x)$ of the uniform distribution (peak density $1/(b-a)$) and of the triangular distribution (peak density $2/(b-a)$ at $x = c$), with empirical PDFs and CDFs of samples with µ = 1.0, σ = 0.5.]
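The Matlab function triangular_rnd is not listed here, so as an assumption this Python sketch swaps in the inverse CDF method (described in the section on random variable generation) to draw triangular samples, then compares the sample mean and variance with $\tfrac{1}{3}(a+b+c)$ and $\tfrac{1}{18}(a^2+b^2+c^2-ab-ac-bc)$. The names `triangular_inv_cdf` and `triangular_sample` are hypothetical.

```python
import math
import random


def triangular_inv_cdf(u, a, b, c):
    """Inverse of the triangular CDF with limits a < b and mode c (a <= c <= b)."""
    F_c = (c - a) / (b - a)                       # CDF value at the mode
    if u <= F_c:
        return a + math.sqrt(u * (b - a) * (c - a))
    return b - math.sqrt((1.0 - u) * (b - a) * (b - c))


def triangular_sample(a, b, c, n, seed=0):
    """n samples by the inverse CDF method (uniform u mapped through F^{-1})."""
    rng = random.Random(seed)
    return [triangular_inv_cdf(rng.random(), a, b, c) for _ in range(n)]


if __name__ == "__main__":
    a, b, c = 0.0, 3.0, 1.0
    x = triangular_sample(a, b, c, 200_000)
    mean = sum(x) / len(x)
    var = sum((xi - mean) ** 2 for xi in x) / (len(x) - 1)
    print(mean, (a + b + c) / 3)                                    # sample vs. formula
    print(var, (a*a + b*b + c*c - a*b - a*c - b*c) / 18)
```

Each branch of `triangular_inv_cdf` inverts one branch of $F(x)$; the switch happens at $u = F(c) = (c-a)/(b-a)$.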




Exponential, $X \sim E(\mu)$, with $X \in \mathbb{R}^+$, $\mu \in \mathbb{R}^+$:

$$f(x) = \frac{1}{\mu}\exp\left(-\frac{x}{\mu}\right), \qquad F(x) = 1 - \exp\left(-\frac{x}{\mu}\right)$$

$$\mu_X = \mu, \qquad \sigma_X^2 = \mu^2$$

Matlab: x = exp_rnd(muX,1,N);

Laplace, $X \sim L(\mu, \sigma^2)$, with $X \in \mathbb{R}$, $\mu \in \mathbb{R}$, $\sigma \in \mathbb{R}^+$:

$$f(x) = \frac{\sqrt{2}}{2\sigma}\exp\left(-\frac{\sqrt{2}\,|x - \mu|}{\sigma}\right), \qquad F(x) = \begin{cases} \dfrac{1}{2}\exp\left(-\dfrac{\sqrt{2}\,|x-\mu|}{\sigma}\right), & x < \mu \\ 1 - \dfrac{1}{2}\exp\left(-\dfrac{\sqrt{2}\,|x-\mu|}{\sigma}\right), & x \ge \mu \end{cases}$$

$$\mu_X = \mu, \qquad \sigma_X^2 = \sigma^2$$

Matlab: x = laplace_rnd(muX,sigmaX,1,N);

[Figure: PDF $f(x)$ and CDF $F(x)$ of the exponential distribution ($F$ = 0.63, 0.86, 0.95 at $x$ = µ, 2µ, 3µ) and of the Laplace distribution ($F$ = 0.03, 0.12, 0.5, 0.88, 0.97 at $x$ = µ − 2σ, µ − σ, µ, µ + σ, µ + 2σ), with empirical PDFs and CDFs of samples with µ = 1.0 (exponential) and µ = 1.0, σ = 1.0 (Laplace).]
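A quick way to confirm the Laplace CDF is to evaluate it at $\mu + k\sigma$; note that for $x < \mu$ the CDF is $\tfrac{1}{2}\exp(-\sqrt{2}\,|x-\mu|/\sigma)$, which stays below one half. The Python sketch below (assumptions: standard library only; hypothetical function names; the inverse CDF method stands in for the Matlab laplace_rnd, whose source is not shown) evaluates the CDF, inverts it, and draws samples.

```python
import math
import random

SQRT2 = math.sqrt(2.0)


def laplace_cdf(x, mu, sigma):
    """Laplace CDF with mean mu and standard deviation sigma."""
    t = math.exp(-SQRT2 * abs(x - mu) / sigma)
    return 0.5 * t if x < mu else 1.0 - 0.5 * t


def laplace_inv_cdf(u, mu, sigma):
    """Inverse of laplace_cdf, valid for 0 < u < 1."""
    if u < 0.5:
        return mu + sigma / SQRT2 * math.log(2.0 * u)
    return mu - sigma / SQRT2 * math.log(2.0 * (1.0 - u))


def laplace_sample(mu, sigma, n, seed=0):
    """n samples by the inverse CDF method; u = 0 is skipped to keep log() finite."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        u = rng.random()
        if u > 0.0:
            out.append(laplace_inv_cdf(u, mu, sigma))
    return out


if __name__ == "__main__":
    mu, sigma = 1.0, 1.0
    print([round(laplace_cdf(mu + k * sigma, mu, sigma), 2) for k in (-2, -1, 0, 1, 2)])
```

The printed list, [0.03, 0.12, 0.5, 0.88, 0.97], matches the CDF values at µ − 2σ through µ + 2σ.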




Normal, $X \sim \mathcal{N}(\mu, \sigma^2)$, with $X \in \mathbb{R}$, $\mu \in \mathbb{R}$, $\sigma \in \mathbb{R}^+$:

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right), \qquad F(x) = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{x - \mu}{\sqrt{2\sigma^2}}\right)\right]$$

$$\mu_X = \mu, \qquad \sigma_X^2 = \sigma^2$$

Matlab: x = muX + sigmaX*randn(1,N);

Log-Normal, $\ln X \sim \mathcal{N}(\mu_{\ln X}, \sigma_{\ln X}^2)$, with $X \in \mathbb{R}^+$, $\mu_{\ln X} \in \mathbb{R}$, $\sigma_{\ln X} \in \mathbb{R}^+$:

$$f(x) = \frac{1}{x\sqrt{2\pi\sigma_{\ln X}^2}}\exp\left(-\frac{(\ln x - \mu_{\ln X})^2}{2\sigma_{\ln X}^2}\right), \qquad F(x) = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{\ln x - \mu_{\ln X}}{\sqrt{2\sigma_{\ln X}^2}}\right)\right]$$

$$\mu_X = x_m\sqrt{1 + c_X^2}, \qquad \sigma_X^2 = x_m^2\,c_X^2\,(1 + c_X^2)$$

$$\mu_{\ln X} = \ln x_m = \ln\mu_X - \tfrac{1}{2}\sigma_{\ln X}^2, \qquad \sigma_{\ln X}^2 = \ln\left(1 + (\sigma_X/\mu_X)^2\right)$$

Matlab: x = logn_rnd(Xm,Cx,1,N);

[Figure: PDF $f(x)$ and CDF $F(x)$ of the normal distribution ($F$ = 0.02, 0.16, 0.5, 0.84, 0.98 at $x$ = µ − 2σ, µ − σ, µ, µ + σ, µ + 2σ) and of the log-normal distribution, with empirical PDFs and CDFs of samples with µ = 1.0, σ = 0.5.]




Rayleigh, $X \sim R(m)$, with $X \in \mathbb{R}^+$, $m \in \mathbb{R}^+$ ($m$ is the mode):

$$f(x) = \frac{x}{m^2}\exp\left(-\frac{1}{2}\left(\frac{x}{m}\right)^2\right), \qquad F(x) = 1 - \exp\left(-\frac{1}{2}\left(\frac{x}{m}\right)^2\right)$$

$$\mu_X = m\sqrt{\pi/2}, \qquad \sigma_X^2 = m^2(4 - \pi)/2$$

Matlab: x = rayleigh_rnd(modeX,1,N);

Gamma, $X \sim \Gamma(\mu, \sigma^2)$, with $X \in \mathbb{R}^+$, $\mu \in \mathbb{R}^+$, $\sigma \in \mathbb{R}^+$, shape $k$ and scale $\theta$:

$$f(x) = \frac{1}{\Gamma(k)\,\theta^k}\,x^{k-1}\exp\left(-\frac{x}{\theta}\right), \qquad F(x) = \frac{1}{\Gamma(k)}\,\gamma\left(k, \frac{x}{\theta}\right)$$

$$\mu_X = k\theta = \mu, \qquad \sigma_X^2 = k\theta^2 = \sigma^2$$

where $\gamma(\cdot,\cdot)$ is the lower incomplete gamma function; equivalently, $k = (\mu/\sigma)^2$ and $\theta = \sigma^2/\mu$.

Matlab: x = gamma_rnd(muX,covX,1,N);

[Figure: PDF $f(x)$ and CDF $F(x)$ of the Rayleigh distribution ($F$ = 0.54 at $x$ = µ) and of the gamma distribution, with empirical PDFs and CDFs of samples with µ = 1.0 (Rayleigh) and µ = 1.0, σ = 0.5 (gamma).]
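As a check on the Rayleigh and gamma moment formulas, this Python sketch (assumptions: standard library only; `rayleigh_inv_cdf` and `gamma_shape_scale` are hypothetical names, not the course's rayleigh_rnd or gamma_rnd) samples a Rayleigh variable by inverting $F(x) = 1 - \exp(-\tfrac{1}{2}(x/m)^2)$, and samples a gamma variable with Python's `random.gammavariate(alpha, beta)`, whose shape `alpha` and scale `beta` correspond to $k$ and $\theta$.

```python
import math
import random


def rayleigh_inv_cdf(u, m):
    """Inverse of F(x) = 1 - exp(-(x/m)^2 / 2); m is the mode."""
    return m * math.sqrt(-2.0 * math.log(1.0 - u))


def gamma_shape_scale(mean_x, cov_x):
    """Shape k and scale theta with k*theta = mean_x and k*theta^2 = (cov_x*mean_x)^2."""
    k = 1.0 / cov_x**2
    theta = mean_x * cov_x**2
    return k, theta


if __name__ == "__main__":
    rng = random.Random(3)
    n = 100_000
    xr = [rayleigh_inv_cdf(rng.random(), 1.0) for _ in range(n)]   # 1 - u lies in (0, 1]
    print(sum(xr) / n, math.sqrt(math.pi / 2))        # sample vs. analytical Rayleigh mean
    k, theta = gamma_shape_scale(1.0, 0.5)            # mu = 1.0, sigma = 0.5
    xg = [rng.gammavariate(k, theta) for _ in range(n)]
    print(sum(xg) / n, k * theta)                     # sample vs. analytical gamma mean
```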




Extreme I, $X \sim E_I[m, s]$, with $X \in \mathbb{R}$, $m \in \mathbb{R}$, $s \in \mathbb{R}^+$:

$$f(x) = \frac{1}{s}\exp\left[-\frac{x-m}{s} - \exp\left(-\frac{x-m}{s}\right)\right], \qquad F(x) = \exp\left[-\exp\left(-\frac{x-m}{s}\right)\right]$$

$$\mu_X = m + \gamma s,\ \ \gamma \approx 0.5772, \qquad \sigma_X^2 = \frac{\pi^2}{6}\,s^2$$

Matlab: x = extI_rnd(mu,cv,1,N);

Extreme II, $X \sim E_{II}(m, s, k)$, with $X > m$, $m \in \mathbb{R}$, $s \in \mathbb{R}^+$, $k > 0$, $k \ne 1, 2$:

$$f(x) = \frac{k}{s}\left(\frac{x-m}{s}\right)^{-1-k}\exp\left[-\left(\frac{x-m}{s}\right)^{-k}\right], \qquad F(x) = \begin{cases} \exp\left[-\left(\dfrac{x-m}{s}\right)^{-k}\right], & x > m \\ 0, & \text{otherwise} \end{cases}$$

$$\mu_X = m + s\,\Gamma(1 - 1/k), \qquad \sigma_X^2 = s^2\left[\Gamma(1 - 2/k) - \left(\Gamma(1 - 1/k)\right)^2\right]$$

(The mean is finite only for $k > 1$, and the variance only for $k > 2$.)

Matlab: x = extII_rnd(m,s,k,1,N);

[Figure: PDF $f(x)$ and CDF $F(x)$ of the Extreme I and Extreme II distributions, with empirical PDFs and CDFs of samples with µ = 1.0, σ = 0.5.]
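The Extreme I CDF has a simple closed-form inverse, $x = m - s\ln(-\ln u)$. The Python sketch below is a hedged illustration, not the course's extI_rnd: it assumes the distribution is specified by its mean and s.d., recovers $(m, s)$ from $\mu_X = m + \gamma s$ and $\sigma_X^2 = (\pi^2/6)s^2$, and samples by the inverse CDF method. All function names are hypothetical.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329      # Euler-Mascheroni constant, gamma


def gumbel_params(mean_x, sd_x):
    """Location m and scale s of the Extreme I distribution with given mean and s.d."""
    s = sd_x * math.sqrt(6.0) / math.pi      # from sigma_X^2 = (pi^2 / 6) s^2
    m = mean_x - EULER_GAMMA * s             # from mu_X = m + gamma s
    return m, s


def gumbel_inv_cdf(u, m, s):
    """Inverse of F(x) = exp(-exp(-(x - m)/s)), valid for 0 < u < 1."""
    return m - s * math.log(-math.log(u))


def gumbel_sample(mean_x, sd_x, n, seed=0):
    """n Extreme I samples by the inverse CDF method; u = 0 is skipped."""
    m, s = gumbel_params(mean_x, sd_x)
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        u = rng.random()
        if u > 0.0:
            out.append(gumbel_inv_cdf(u, m, s))
    return out


if __name__ == "__main__":
    x = gumbel_sample(1.0, 0.5, 100_000)
    mean = sum(x) / len(x)
    print(mean, math.sqrt(sum((t - mean) ** 2 for t in x) / (len(x) - 1)))
```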




4 Sums and Differences of Independent Normal Random Variables

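Consider two independent, normally distributed random variables, $X \sim \mathcal{N}(\mu_X, \sigma_X^2)$ and $Y \sim \mathcal{N}(\mu_Y, \sigma_Y^2)$. Any weighted sum of independent normal random variables is also normally distributed:

$$Z = aX - bY, \qquad Z \sim \mathcal{N}\left(a\mu_X - b\mu_Y,\; (a\sigma_X)^2 + (b\sigma_Y)^2\right),$$

$$\mu_Z = a\mu_X - b\mu_Y, \qquad \sigma_Z^2 = (a\sigma_X)^2 + (b\sigma_Y)^2 .$$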
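The variance rule for $Z = aX - bY$ depends on $X$ and $Y$ being independent. This Python sketch (assumptions: standard library only; hypothetical function names; illustrative parameter values) computes $\mu_Z$ and $\sigma_Z$ from the formulas and compares them with a Monte Carlo sample built from independent normal draws.

```python
import math
import random


def weighted_difference(a, mu_x, sd_x, b, mu_y, sd_y):
    """Mean and s.d. of Z = aX - bY for independent normal X and Y."""
    mu_z = a * mu_x - b * mu_y
    sd_z = math.sqrt((a * sd_x) ** 2 + (b * sd_y) ** 2)
    return mu_z, sd_z


def simulate_difference(a, mu_x, sd_x, b, mu_y, sd_y, n=100_000, seed=0):
    """Sample mean and s.d. of aX - bY from n independent normal draws."""
    rng = random.Random(seed)
    z = [a * rng.gauss(mu_x, sd_x) - b * rng.gauss(mu_y, sd_y) for _ in range(n)]
    mean = sum(z) / n
    sd = math.sqrt(sum((zi - mean) ** 2 for zi in z) / (n - 1))
    return mean, sd


if __name__ == "__main__":
    args = (2.0, 1.0, 0.5, 3.0, 0.2, 0.1)        # a, mu_X, sigma_X, b, mu_Y, sigma_Y
    print(weighted_difference(*args))
    print(simulate_difference(*args))
```

Note that the minus sign in $aX - bY$ changes the mean but not the variance, since $(-b)^2 = b^2$.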

5 Products and Quotients of Independent LogNormal Random Variables

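Consider two independent, log-normally distributed random variables, $\ln X \sim \mathcal{N}(\mu_{\ln X}, \sigma_{\ln X}^2)$ and $\ln Y \sim \mathcal{N}(\mu_{\ln Y}, \sigma_{\ln Y}^2)$.

Any product or quotient of independent log-normal random variables is also log-normally distributed:

$$Z = X/Y, \qquad \ln Z \sim \mathcal{N}\left(\mu_{\ln X} - \mu_{\ln Y},\; \sigma_{\ln X}^2 + \sigma_{\ln Y}^2\right),$$

$$\mu_{\ln Z} = \mu_{\ln X} - \mu_{\ln Y}, \qquad \sigma_{\ln Z}^2 = \sigma_{\ln X}^2 + \sigma_{\ln Y}^2, \qquad c_Z^2 = c_X^2 + c_Y^2 + c_X^2 c_Y^2 .$$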

6 Examples

1. The strength, $S$, of a particular grade of steel is log-normally distributed with median 36 ksi and c.o.v. of 0.15. What is the probability that the strength of a particular sample is greater than 40 ksi? (Since $c_S < 0.3$, $\sigma_{\ln S} \approx c_S = 0.15$.)

$$P[S > 40] = 1 - P[S \le 40] = 1 - \Phi\left(\frac{\ln 40 - \ln 36}{0.15}\right) = 1 - \Phi\left(\frac{3.689 - 3.584}{0.15}\right) = 1 - \Phi(0.702) = 1 - 0.759 = 0.241$$

[Figure: PDF of the strength of steel, $s$ (ksi), marking $s_{\text{mode}}$ = 35.20 ksi, $s_{\text{median}}$ = 36.00 ksi, $s_{\text{mean}}$ = 36.40 ksi, and the shaded area $P[S > 40]$.]




2. Highway truck weights in Michigan, $W$, are assumed to be normally distributed with mean 100 k and standard deviation 40 k. The load capacity of bridges in Michigan, $R$, is also assumed to be normally distributed, with mean 200 k and standard deviation 30 k. What is the probability of a truck exceeding a bridge load rating?

Let $E = W - R$. If $E > 0$, the truck weight exceeds the bridge capacity. Assuming $W$ and $R$ are independent, $\mu_E = \mu_W - \mu_R = 100 - 200 = -100$ k and $\sigma_E = \sqrt{40^2 + 30^2} = 50$ k, so

$$P[E > 0] = 1 - P[E \le 0] = 1 - \Phi\left(\frac{0 - (-100)}{50}\right) = 1 - \Phi(2) = 1 - 0.977 = 0.023$$


[Figure: PDFs of $W$ (s.d. 40 k), $R$ (s.d. 30 k), and $E = W - R$ (s.d. 50 k), plotted over −100 to 200 k.]

3. Windows in the Cape Hatteras Lighthouse can withstand wind pressures of $R$. $R$ is log-normal with a median of 40 psf and a coefficient of variation of 0.25. The peak wind pressure during a hurricane, $P$ in psf, is given by the equation $P = 1.165 \times 10^{-3}\, C V^2$, where $C$ is a log-normal coefficient with median of 1.8 and coefficient of variation of 0.20, and $V$ is the wind speed, with median 100 fps and coefficient of variation of 0.30. What is the probability of the wind pressure exceeding the strength of the window?

Since $C$ and $V$ are log-normal (and assumed independent), the peak wind pressure is also log-normal:

$$\ln P = \ln(1.165 \times 10^{-3}) + \ln C + 2\ln V, \qquad \mu_{\ln P} = \ln(1.165 \times 10^{-3}) + \mu_{\ln C} + 2\mu_{\ln V}$$

$$\mu_{\ln P} = \ln(1.165 \times 10^{-3}) + \ln(1.8) + 2\ln(100) = 3.0431$$

$$\sigma_{\ln P}^2 = \ln(1 + 0.20^2) + 2^2\ln(1 + 0.30^2) = 0.3839, \qquad \sigma_{\ln P} = 0.6196$$

The wind pressure exceeds the resistance if $P/R > 1$ (that is, if $\ln P - \ln R > 0$). Let $E = P/R$:

$$\ln E = \ln P - \ln R, \qquad \mu_{\ln E} = \mu_{\ln P} - \mu_{\ln R} = 3.0431 - \ln(40) = -0.646$$

$$\sigma_{\ln E}^2 = 0.3839 + \ln(1 + 0.25^2) = 0.4446, \qquad \sigma_{\ln E} = 0.6668$$

The probability of the wind load exceeding the resistance of the glass is

$$P[E > 1] = 1 - P[E \le 1] = 1 - P[\ln E \le 0] = 1 - \Phi\left(\frac{0 + 0.646}{0.6668}\right) = 1 - \Phi(0.969) = 1 - 0.834 \approx 0.17$$

4. Earthquakes with $M > 6$ shake the ground at a building site randomly. The peak ground acceleration (PGA) is log-normally distributed with median of 0.2 g and a coefficient of variation of 0.25. Assume that the building will sustain no damage for ground motion shaking up to 0.3 g. What is the probability of damage from an earthquake of $M > 6$?

$$P[D \mid M > 6] = P[\text{PGA} > 0.3] = 1 - P[\text{PGA} \le 0.3] = 1 - \Phi\left(\frac{\ln(0.3) - \ln(0.2)}{0.25}\right) = 1 - 0.947 = 0.053$$

There have been two earthquakes with $M > 6$ in the last 50 years. What is the probability of no damage from earthquakes with $M > 6$ in the next 20 years?

$$P[D' \mid M > 6] = 1 - 0.053 = 0.947$$

From the law of total probability,

$$\begin{aligned} P[D'] \text{ in 20 yr} = {} & P[D' \mid 0 \text{ EQ } M>6 \text{ in 20 yr}] \cdot P[0 \text{ EQ } M>6 \text{ in 20 yr}] \\ & + P[D' \mid 1 \text{ EQ } M>6 \text{ in 20 yr}] \cdot P[1 \text{ EQ } M>6 \text{ in 20 yr}] \\ & + P[D' \mid 2 \text{ EQ } M>6 \text{ in 20 yr}] \cdot P[2 \text{ EQ } M>6 \text{ in 20 yr}] \\ & + P[D' \mid 3 \text{ EQ } M>6 \text{ in 20 yr}] \cdot P[3 \text{ EQ } M>6 \text{ in 20 yr}] + \cdots \end{aligned}$$

where $P[D' \mid n \text{ EQ } M > 6] = (P[D' \mid 1 \text{ EQ } M > 6])^n$ (assuming damage from an earthquake does not weaken the building ...). Modeling the number of $M > 6$ earthquakes in 20 years as a Poisson random variable with mean $20/25 = 0.8$ (one event per 25 years, from two events in 50 years),

$$\begin{aligned} P[D'] \text{ in 20 yr} &= \sum_{n=0}^{\infty} (0.947)^n\, \frac{(20/25)^n}{n!}\, \exp(-20/25) \\ &= \exp(-0.8)\left[1 + \frac{0.947 \cdot 0.8}{1!} + \frac{(0.947)^2\, 0.8^2}{2!} + \frac{(0.947)^3\, 0.8^3}{3!} + \cdots\right] \\ &= \exp(-0.8)\cdot\exp(0.947 \cdot 0.8) = 0.958 \end{aligned}$$

The probability of damage from earthquakes in the next 20 years (given the assumptions in this example) is close to 4%. Would that be an acceptable level of risk for you?
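The four examples can be recomputed in a few lines. This Python sketch is illustrative only (assumptions: Python 3.8+ `statistics.NormalDist` for $\Phi$; the function names are hypothetical). Example 1 uses $\sigma_{\ln S} \approx c_S$ as in the hand calculation. Example 2 assumes $W$ and $R$ are independent. Example 3 weights the variance of $\ln V$ by $2^2 = 4$, following $\sigma_{\ln Z}^2 = n^2\ln(1 + c_X^2) + m^2\ln(1 + c_Y^2)$, which gives about 0.17. Example 4 assumes the number of $M > 6$ earthquakes in 20 years is Poisson with mean $20/25 = 0.8$; because $\Phi$ is not rounded to 0.947, it gives about 0.959 rather than 0.958.

```python
import math
from statistics import NormalDist

Phi = NormalDist().cdf                      # standard normal CDF


def ln_var(c):
    """sigma_ln^2 = ln(1 + c^2) for a log-normal variable with c.o.v. c."""
    return math.log(1.0 + c * c)


def example_1():
    """Steel strength: median 36 ksi, c.o.v. 0.15; P[S > 40] with sigma_lnS ~ c_S."""
    return 1.0 - Phi((math.log(40.0) - math.log(36.0)) / 0.15)


def example_2():
    """W ~ N(100, 40^2) and R ~ N(200, 30^2), independent; P[W - R > 0]."""
    mu_E = 100.0 - 200.0
    sd_E = math.sqrt(40.0**2 + 30.0**2)
    return 1.0 - Phi((0.0 - mu_E) / sd_E)


def example_3():
    """Wind pressure P = 1.165e-3 C V^2 versus window strength R; P[P/R > 1]."""
    mu_lnP = math.log(1.165e-3) + math.log(1.8) + 2.0 * math.log(100.0)
    var_lnP = ln_var(0.20) + 2.0**2 * ln_var(0.30)     # n^2 weighting for V^2
    mu_lnE = mu_lnP - math.log(40.0)
    sd_lnE = math.sqrt(var_lnP + ln_var(0.25))
    return 1.0 - Phi((0.0 - mu_lnE) / sd_lnE)


def example_4(years=20.0, events=2, record=50.0, terms=30):
    """P[no damage in `years`], with a Poisson number of M > 6 events (assumed)."""
    p_d = 1.0 - Phi((math.log(0.3) - math.log(0.2)) / 0.25)   # damage prob. per event
    lam = events / record * years                              # expected number of events
    series = sum((1.0 - p_d) ** n * lam**n / math.factorial(n) * math.exp(-lam)
                 for n in range(terms))
    return p_d, series, math.exp(-lam * p_d)                   # truncated sum, closed form


if __name__ == "__main__":
    print(f"Ex. 1: P[S > 40] = {example_1():.3f}")
    print(f"Ex. 2: P[E > 0]  = {example_2():.4f}")
    print(f"Ex. 3: P[E > 1]  = {example_3():.3f}")
    p_d, series, closed = example_4()
    print(f"Ex. 4: P[D|M>6] = {p_d:.4f}, P[no damage in 20 yr] = {series:.4f} ({closed:.4f})")
```

The truncated series and the closed form $\exp(-\lambda\,P[D \mid M > 6])$ agree to machine precision, which confirms the algebra in the last step of Example 4.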

7 Empirical PDFs, CDFs, and exceedance rates (nonparametric statistics)

The PDF and CDF of a sample of random data can be computed directly from the sample, without assuming any particular probability distribution (such as a normal, exponential, or other kind of distribution).

A random sample of $N$ data points can be sorted into increasing numerical order, so that

$$x_1 \le x_2 \le \cdots \le x_{i-1} \le x_i \le x_{i+1} \le \cdots \le x_{N-1} \le x_N .$$

In the ordered sample there are $i$ data points less than or equal to $x_i$. So, if the sample is representative of the population and the sample is “big enough”, the probability that a random $X$ is less than or equal to the $i$-th ordered value is $i/N$. In other words, $P[X \le x_i] = i/N$. Unless we know that no value of $X$ can exceed $x_N$, we must accept some probability that $X > x_N$, so $P[X \le x_N]$ should be less than 1. In such cases, the unbiased estimate¹ $E[F_X(x_i)]$ for $P[X \le x_i]$ is $i/(N+1)$.

The empirical CDF computed from an ordered sample of $N$ values is

$$\hat{F}_X(x_i) = \frac{i}{N + 1} .$$

The empirical PDF is basically a histogram of the data. The following Matlab lines plot empirical CDFs and PDFs from a vector of random data, x.

¹ E.J. Gumbel, Statistics of Extremes, Columbia Univ. Press, 1958.
Lasse Makkonen, “Problems in the extreme value analysis,” Struct. Safety 2008:30:405-419.


https://www.sciencedirect.com/science/article/pii/S0167473007000045



```matlab
N = length(x);                              % number of values in the sample
nBins = floor(N/50);                        % number of bins in the histogram
[fx,xx] = hist(x,nBins);                    % compute the histogram
fx = fx / N * nBins / (max(x)-min(x));      % scale the histogram to a PDF
F_x = (1:N)/(N+1);                          % empirical CDF
subplot(211); bar(xx,fx);                   % plot empirical PDF
subplot(212); stairs(sort(x),F_x);          % plot empirical CDF
probability_of_failure = sum(x>0) / N       % probability that X > 0
```

The number of values in the sample greater than $x_i$ is $(N - i)$. If the sample is representative, the probability of a value exceeding $x_i$ is $P[X > x_i] = 1 - F_X(x_i) \approx 1 - i/N$. If the $N$ samples were collected over a period of time $T$, the average exceedance rate (number of events greater than $x_i$ per unit time) is $\nu(x_i) = N(1 - F_X(x_i))/T \approx N(1 - i/N)/T = (N - i)/T$.
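The plotting positions $i/(N+1)$ and the exceedance rates $(N - i)/T$ depend only on the sorted sample. This Python sketch mirrors those two formulas (assumptions: standard library only; hypothetical function names and hypothetical data; no tied values, so exactly $N - i$ values exceed $x_i$).

```python
def empirical_cdf(sample):
    """Sorted values x_i and the plotting positions i/(N+1), i = 1..N."""
    xs = sorted(sample)
    n = len(xs)
    return xs, [i / (n + 1) for i in range(1, n + 1)]


def exceedance_rates(sample, duration):
    """Sorted values x_i and average exceedance rates nu(x_i) = (N - i)/T.

    Assumes no tied values, so exactly N - i values exceed x_i.
    """
    xs = sorted(sample)
    n = len(xs)
    return xs, [(n - i) / duration for i in range(1, n + 1)]


if __name__ == "__main__":
    data = [3.1, 0.4, 2.2, 5.0, 1.7]        # hypothetical annual maxima over T = 5 years
    print(empirical_cdf(data))
    print(exceedance_rates(data, duration=5.0))
```

With $N = 5$ the largest value gets $\hat{F} = 5/6$ rather than 1, leaving some probability that a future value exceeds the sample maximum.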

8 Random variable generation using the Inverse CDF method

A sample of a random variable having virtually any type of CDF, $P[X \le x] = P = F_X(x)$, can be generated from a sample of a uniformly distributed random variable, $U$ ($0 < U < 1$), as long as the inverse CDF, $x = F_X^{-1}(P)$, can be computed. There are many numerical methods for generating a sample of uniformly distributed random numbers. It is important to be aware that samples from some methods are “more random” than samples from others. The Matlab command u = rand(1,N) computes a (row) vector sample of $N$ uniformly distributed random numbers with $0 < u < 1$.

If $X$ is a continuous rv with CDF $F_X(x)$ and $U$ has a uniform distribution on $(0, 1)$, then the random variable $F_X^{-1}(U)$ has the distribution $F_X$, because $P[F_X^{-1}(U) \le x] = P[U \le F_X(x)] = F_X(x)$. Thus, in order to generate a sample of data distributed according to the CDF $F_X$, it suffices to generate a sample, $u$, of the rv $U \sim U[0, 1]$ and then make the transformation $x = F_X^{-1}(u)$.

For example, if $X$ is exponentially distributed, the CDF of $X$ is given by

$$F_X(x) = 1 - e^{-x/\mu},$$

so

$$F_X^{-1}(u) = -\mu \ln(1 - u) .$$

Therefore, if $u$ is a value from a uniformly distributed rv in $[0, 1]$, then

$$x = -\mu \ln(u)$$

is a value from an exponentially distributed random variable. (If $U$ is uniformly distributed in $[0, 1]$, then so is $1 - U$.)
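The exponential case of the inverse CDF method fits in one line of Python. In this sketch (assumptions: standard library only; the name `exponential_sample` is hypothetical), `random.random()` returns values in $[0, 1)$, so the code uses $1 - U$, which lies in $(0, 1]$ and keeps the logarithm finite; since $1 - U$ is also uniform, this is the same method as $x = -\mu\ln(u)$.

```python
import math
import random


def exponential_sample(mu, n, seed=0):
    """n exponential samples with mean mu by the inverse CDF method, x = -mu ln(u)."""
    rng = random.Random(seed)
    # rng.random() is in [0, 1), so 1 - rng.random() is in (0, 1] and log() is finite;
    # 1 - U is itself uniform, so this is equivalent to using U directly
    return [-mu * math.log(1.0 - rng.random()) for _ in range(n)]


if __name__ == "__main__":
    x = exponential_sample(2.0, 100_000)
    mean = sum(x) / len(x)
    frac_below_mean = sum(xi <= 2.0 for xi in x) / len(x)
    print(mean, frac_below_mean, 1 - math.exp(-1))     # F(mu) = 1 - 1/e
```

The fraction of samples at or below the mean should be close to $F_X(\mu) = 1 - e^{-1} \approx 0.632$.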

As another example, if $X$ is log-normally distributed, the CDF of $X$ is

$$F_X(x) = \Phi\left(\frac{\ln x - \ln x_m}{\sigma_{\ln X}}\right) .$$

If $u$ is a sample from a standard uniform distribution, then

$$x = \exp\left[\ln x_m + \Phi^{-1}(u)\,\sigma_{\ln X}\right]$$

is a sample from a log-normal distribution.
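The log-normal inverse CDF transformation can be sketched in Python using `statistics.NormalDist.inv_cdf` for $\Phi^{-1}$ (assumptions: Python 3.8+ standard library; the name `lognormal_sample` is hypothetical and is not the course's logn_rnd). Because `inv_cdf` is defined only for $0 < u < 1$, the endpoint $u = 0$ is skipped.

```python
import math
import random
from statistics import NormalDist, median


def lognormal_sample(x_median, c_x, n, seed=0):
    """n log-normal samples by the inverse CDF method, x = exp(ln x_m + Phi^{-1}(u) sigma_lnX)."""
    rng = random.Random(seed)
    s_ln = math.sqrt(math.log(1.0 + c_x**2))     # sigma_lnX from the c.o.v.
    inv_phi = NormalDist().inv_cdf               # Phi^{-1}, defined for 0 < u < 1
    out = []
    while len(out) < n:
        u = rng.random()
        if u > 0.0:                              # skip the endpoint u = 0
            out.append(math.exp(math.log(x_median) + inv_phi(u) * s_ln))
    return out


if __name__ == "__main__":
    x = lognormal_sample(2.0, 0.3, 100_001)
    print(median(x), sum(x) / len(x), 2.0 * math.sqrt(1 + 0.3**2))   # median, mean, mu_X
```

The sample median should sit near $x_m = 2$, and the sample mean near $x_m\sqrt{1 + c_X^2} \approx 2.09$.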

Note that since closed-form expressions for $\Phi(z)$ and $\Phi^{-1}(P)$ do not exist, the generation of normally distributed random variables usually relies on other numerical methods. The Matlab command x = muX + sigX*randn(1,N) computes a (row) vector sample of $N$ normally distributed random numbers.




[Figure: a uniform PDF $f_U(u)$ on $(0, 1)$ mapped through the CDF $u = F_X(x)$, i.e., $x = F_X^{-1}(u)$, to the PDF $f_X(x) = dF_X/dx$ between $x = a$ and $x = b$.]

Figure 1. Examples of the generation of uniform random variables from the inverse CDF method.

Figure 2. Examples of the generation of random variables from the inverse CDF method. The density of the horizontal arrows, $u$, is uniform, whereas the density of the vertical arrows, $x = F_X^{-1}(u)$, is proportional to $F'_X(x)$, that is, proportional to $f_X(x)$.




9 Functions of Random Variables and Monte Carlo Simulation

The probability distributions of virtually any function of random variables can be computed using the powerful method of Monte Carlo Simulation (MCS). MCS involves computing values of functions with large samples of random variables.

For example, consider a function of three random variables, $X_1$, $X_2$, and $X_3$, where $X_1$ is normally distributed with mean of 6 and standard deviation of 2, $X_2$ is log-normally distributed with median of 2 and coefficient of variation of 0.3, and $X_3$ is Rayleigh distributed with mode of 1. The function

$$Y = \sin(X_1) + \sqrt{X_2} - \exp(-X_3) - 2$$

is a function of these three random variables and is therefore also random. The distribution function and statistics of $Y$ may be difficult to derive analytically, especially if the function $Y = g(X)$ is complicated. This is where MCS is powerful. Given samples of $N$ values of $X_1$, $X_2$, and $X_3$, a sample of $N$ values of $Y$ can also be computed. The statistics of $Y$ (mean, variance, PDF, and CDF) can be estimated by computing the average value, sample variance, histogram, and empirical CDF of the sample of $Y$. The probability $P[Y > 0]$ can be estimated by counting the number of positive values in the sample and dividing by $N$. The Matlab command P_Y_gt_0 = sum(y>0)/N may be used to estimate this probability.
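The same Monte Carlo experiment can be sketched in Python. This is an illustration under stated assumptions, not a translation of the course's Matlab files: it uses only the standard library, samples the log-normal $X_2$ as $\exp(\ln x_m + \sigma_{\ln X}Z)$ with $Z$ standard normal, samples the Rayleigh $X_3$ by the inverse CDF $x = m\sqrt{-2\ln(1-u)}$, and the name `sample_y` is hypothetical.

```python
import math
import random


def sample_y(n, seed=0):
    """Monte Carlo sample of Y = sin(X1) + sqrt(X2) - exp(-X3) - 2.

    X1 ~ normal (mean 6, s.d. 2); X2 ~ log-normal (median 2, c.o.v. 0.3);
    X3 ~ Rayleigh (mode 1), sampled by the inverse CDF method.
    """
    rng = random.Random(seed)
    s_ln2 = math.sqrt(math.log(1.0 + 0.3**2))                      # sigma_lnX2
    y = []
    for _ in range(n):
        x1 = rng.gauss(6.0, 2.0)
        x2 = math.exp(math.log(2.0) + s_ln2 * rng.gauss(0.0, 1.0))
        x3 = 1.0 * math.sqrt(-2.0 * math.log(1.0 - rng.random()))
        y.append(math.sin(x1) + math.sqrt(x2) - math.exp(-x3) - 2.0)
    return y


if __name__ == "__main__":
    N = 100_000
    y = sample_y(N)
    mean_y = sum(y) / N
    var_y = sum((v - mean_y) ** 2 for v in y) / (N - 1)
    p_y_gt_0 = sum(v > 0 for v in y) / N          # same estimate as sum(y>0)/N in Matlab
    print(f"mean = {mean_y:.3f}, variance = {var_y:.3f}, P[Y > 0] ~ {p_y_gt_0:.4f}")
```

Because $E[\sin X_1] = \sin(6)\,e^{-2}$, $E[\sqrt{X_2}] = \sqrt{x_m}\,e^{\sigma_{\ln X}^2/8}$, and $E[e^{-X_3}]$ can be evaluated for the Rayleigh distribution, the exact mean of $Y$ is about −0.95, which gives a useful check on the simulated mean.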

10 Monte Carlo Simulation in Matlab

```matlab
% MCS_intro.m
% Monte Carlo Simulation ... an introductory example
%
% Y = g(X1,X2,X3) = sin(X1) + sqrt(X2) - exp(-X3) - 2;
%
% H.P. Gavin, Dept. Civil and Environmental Engineering, Duke Univ, Jan. 2012

%    X1          X2           X3
%    normal      lognormal    Rayleigh
mu1 = 6;   med2 = 2;   mod3 = 1;
sd1 = 2;   cv2  = 0.3;

N = 1000;              % number of random values in the sample

% (1) generate a large sample for each random variable in the problem ...

X1 = mu1 + sd1*randn(1,N);
X2 = logn_rnd(med2,cv2,1,N);
X3 = Rayleigh_rnd(mod3,1,N);

% (2) evaluate the function for each random variable to compute a new sample

Y = sin(X1) + sqrt(X2) - exp(-X3) - 2;

% suppose "probability of failure" is Prob[g(X1,X2,X3) > 0] ...

Probability_of_failure = sum(Y>0) / N

% (3) plot histograms of the random variables

sort_X1 = sort(X1);
sort_X2 = sort(X2);
sort_X3 = sort(X3);
CDF = ([1:N]-0.5) / N;   % empirical CDF of all quantities
```




Figure 3. Analytical and empirical PDFs and CDFs for $X_1$ (normal), $X_2$ (log-normal), and $X_3$ (Rayleigh), and the empirical PDF and CDF for $Y = g(X_1, X_2, X_3)$, with the region $Y > 0$ marked.




```matlab
nBins = floor(N/50);
figure(1)
clf
subplot(231)
% histogram of X1; the 3rd argument (Octave hist syntax) scales the bars to unit area
[fx,xx] = hist(X1, nBins, nBins/(max(X1)-min(X1)));
hold on
bar(xx,fx,'FaceColor',[1 1 1])
plot(sort_X1, normpdf(sort_X1,mu1,sd1), 'LineWidth',2);
hold off
axis('tight')
ylabel('P.D.F.')
subplot(234)
hold on
stairs(sort_X1,CDF,'-b','LineWidth',2)
plot(sort_X1, normcdf(sort_X1,mu1,sd1),'-r')
hold off
axis('tight')
ylabel('C.D.F.')
xlabel('X_1 : normal')
subplot(232)
[fx,xx] = hist(X2, nBins, nBins/(max(X2)-min(X2)));   % histogram of X2
hold on
bar(xx,fx,'FaceColor',[1 1 1]);
plot(sort_X2, logn_pdf(sort_X2,med2,cv2), 'LineWidth',2);
hold off
axis('tight')
subplot(235)
hold on
stairs(sort_X2,CDF,'-b','LineWidth',2)
plot(sort_X2, logn_cdf(sort_X2,[med2,cv2]),'-r')
hold off
axis('tight')
xlabel('X_2 : log-normal')
subplot(233)
[fx,xx] = hist(X3, nBins, nBins/(max(X3)-min(X3)));   % histogram of X3
hold on
bar(xx,fx,'FaceColor',[1 1 1]);
plot(sort_X3, Rayleigh_pdf(sort_X3,mod3), 'LineWidth',2);
hold off
axis('tight')
subplot(236)
hold on
stairs(sort_X3,CDF,'-b','LineWidth',2)
plot(sort_X3, Rayleigh_cdf(sort_X3,mod3),'-r')
hold off
axis('tight')
xlabel('X_3 : Rayleigh')
nBins = floor(N/20);
figure(2)
clf
subplot(211)
[fx,xx] = hist(Y, nBins, nBins/(max(Y)-min(Y)));      % histogram of Y
hold on
bar(xx,fx,'FaceColor',[1 1 1]);
plot([0 0],[0 0.5],'-k','LineWidth',3)
hold off
axis('tight')
ylabel('P.D.F.')
subplot(212)
hold on
stairs(sort(Y),CDF)
plot([0 0],[0 1],'-k','LineWidth',3);
hold off
axis('tight')
text(0.5,0.5,'Y>0')
ylabel('C.D.F.')
xlabel('Y = g(X_1,X_2,X_3)');
```