Lecture 2: Elements of Probability Theory

Americo Barbosa da Cunha Junior

Universidade do Estado do Rio de Janeiro – UERJ

NUMERICO – Nucleus of Modeling and Experimentation with Computers

http://numerico.ime.uerj.br

www.americocunha.org

Uncertainties 2018, April 8, 2018

Florianopolis - SC, Brazil

© A. Cunha Jr (UERJ), Uncertainty Quantification in Computational Predictive Science

Probability in Dimension 1


Random experiment

An experiment which, when repeated under the same fixed conditions, produces different results is called a random experiment.

Examples:

1 Rolling a cube-shaped fair die

2 Choosing an even integer at random

3 Measuring temperature


Probability space

The mathematical framework in which a random experiment is described consists of a triplet (Ω, Σ, P), called probability space.

The elements of a probability space are:

Ω: sample space (set with all possible outcomes)

Σ: σ-algebra on Ω (set with relevant events only)

P: probability measure (measure of the expectation that an event occurs)


Sample space

A non-empty set which contains all possible outcomes of a certain random experiment is called sample space, and is represented by Ω.

Examples:

1 Rolling a cube-shaped fair die (finite Ω)

$$\Omega = \{1, 2, 3, 4, 5, 6\}$$

2 Choosing an even integer at random (denumerable Ω)

$$\Omega = \{\cdots, -8, -6, -4, -2, 0, 2, 4, 6, 8, \cdots\}$$

3 Measuring temperature in Kelvin (non-denumerable Ω)

$$\Omega = [a, b] \subset [0, +\infty)$$


σ-algebra of events

In general, not all subsets of Ω are of interest.

Intuitively, a σ-algebra on Ω is the set of relevant events for a random experiment. Formally, Σ is a σ-algebra on Ω if

$\emptyset \in \Sigma$ (contains the empty set)

$A^c \in \Sigma$ for any $A \in \Sigma$ (closed under complementation)

$\bigcup_{i=1}^{\infty} A_i \in \Sigma$ for any $A_1, A_2, \cdots \in \Sigma$ (closed under denumerable unions)
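For a finite sample space the three axioms above can be checked mechanically. The sketch below is only an illustration: the helper name `is_sigma_algebra` and the two candidate collections are assumptions introduced here, not part of the lecture.

```python
from itertools import combinations

def is_sigma_algebra(omega, sigma):
    """Check the three sigma-algebra axioms on a finite sample space."""
    omega = frozenset(omega)
    sigma = {frozenset(a) for a in sigma}
    if frozenset() not in sigma:                    # contains the empty set
        return False
    if any(omega - a not in sigma for a in sigma):  # closed under complementation
        return False
    # On a finite space, closure under pairwise unions implies closure
    # under denumerable unions.
    return all(a | b in sigma for a, b in combinations(sigma, 2))

omega = {1, 2, 3, 4, 5, 6}
parity = [set(), {1, 3, 5}, {2, 4, 6}, omega]   # only "odd" and "even" are relevant
broken = [set(), {1, 3, 5}, omega]              # complement {2, 4, 6} is missing

print(is_sigma_algebra(omega, parity))  # True
print(is_sigma_algebra(omega, broken))  # False
```

The second collection fails because it is not closed under complementation.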


Probability measure

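A probability measure is a function $P : \Sigma \to [0, 1] \subset \mathbb{R}$ such that

$P\{A\} \geq 0$ for any $A \in \Sigma$ (probability is nonnegative)

$P\{\Omega\} = 1$ (entire space has probability one)

$P\left\{ \bigcup_{i=1}^{\infty} A_i \right\} = \sum_{i=1}^{\infty} P\{A_i\}$ for any mutually disjoint $A_i \in \Sigma$ (σ-additivity)

Remark: $P\{\emptyset\} = 0$ (empty set has probability zero)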


An example in continuous probability

A point is randomly chosen in a square (side b).

Event A: point lies above the main diagonal

Event B: point lies on the main diagonal

Event C: point lies off the main diagonal

$$P\{A\} = \frac{\text{area of upper triangle}}{\text{area of square}} = \frac{0.5\, b^2}{b^2} = \frac{1}{2}$$

$$P\{B\} = \frac{\text{area of main diagonal}}{\text{area of square}} = \frac{0}{b^2} = 0$$

$$P\{C\} = \frac{\text{area outside main diagonal}}{\text{area of square}} = \frac{b^2 - 0}{b^2} = 1$$
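The three area ratios above can be approximated by simulation. The following is a minimal Monte Carlo sketch, assuming the point is uniformly distributed in the square $[0, b]^2$ and that the main diagonal is the line $y = x$; the function name, sample size and seed are illustrative choices.

```python
import random

def estimate_diagonal_events(n_samples=100_000, b=1.0, seed=0):
    """Monte Carlo estimates of P{A}, P{B}, P{C} for a uniform point in [0, b]^2."""
    rng = random.Random(seed)      # fixed seed, only for reproducibility
    above = on = 0
    for _ in range(n_samples):
        x, y = rng.uniform(0.0, b), rng.uniform(0.0, b)
        if y > x:
            above += 1             # event A: above the diagonal y = x
        elif y == x:
            on += 1                # event B: exactly on the diagonal
    p_a = above / n_samples
    p_b = on / n_samples
    p_c = 1.0 - p_b                # event C: complement of B
    return p_a, p_b, p_c

p_a, p_b, p_c = estimate_diagonal_events()
print(p_a, p_b, p_c)
```

The estimate of P{A} should be close to 1/2, while hitting the diagonal exactly is practically never observed, in line with P{B} = 0.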


Remarks on probability

Probability zero

An impossible event has probability zero (e.g. roll a six-sided die, numbered from 1 to 6, and get 7)

Not every event with probability zero is impossible (e.g. randomly pick a point on the main diagonal of a square)

Probability one

An event whose occurrence is certain has probability one (e.g. toss a coin and obtain heads or tails)

Not every event with probability one is certain to occur (e.g. randomly pick a point off the square's main diagonal)


Conditional probability

Consider a pair of random events A and B such that $P\{B\} > 0$.

The conditional probability of A, given the occurrence of B, denoted as $P\{A \mid B\}$, is defined as

$$P\{A \mid B\} = \frac{P\{A \cap B\}}{P\{B\}}.$$

It follows that

$$P\{A \cap B\} = P\{A \mid B\} \times P\{B\}.$$


An example in conditional probability

Somebody rolls a pair of six-sided dice.

A = value rolled on die 1

B = value rolled on die 2

What is the probability that A = 2 given that A + B ≤ 5?

$$P\{A = 2\} = 6/36$$

$$P\{A + B \leq 5\} = 10/36$$

$$P\{A = 2 \mid A + B \leq 5\} = 3/10$$


Conditional probability — Wikipedia, The Free Encyclopedia, 2017.
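The dice example can be reproduced by enumerating the 36 equally likely outcomes and applying the definition of conditional probability. This is a small sketch; the helper `prob` and the variable names are introduced here for illustration.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (A, B) for two fair six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event given as a predicate on an outcome (a, b)."""
    return Fraction(sum(1 for w in outcomes if event(w)), len(outcomes))

p_a2 = prob(lambda w: w[0] == 2)                            # P{A = 2}
p_sum = prob(lambda w: w[0] + w[1] <= 5)                    # P{A + B <= 5}
p_joint = prob(lambda w: w[0] == 2 and w[0] + w[1] <= 5)    # P{A = 2 and A + B <= 5}
p_cond = p_joint / p_sum                                    # P{A = 2 | A + B <= 5}

print(p_a2, p_sum, p_cond)  # 1/6 5/18 3/10
```

The printed fractions are the reduced forms of 6/36, 10/36 and 3/10.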

Independence of events

If the occurrence of an event B does not affect the occurrence of an event A, one has

$$P\{A \mid B\} = P\{A\}.$$

In this way, since $P\{A \cap B\} = P\{A \mid B\} \times P\{B\}$, it is true that

$$P\{A \cap B\} = P\{A\} \times P\{B\}.$$

Events A and B for which the latter holds are said to be independent.

Remark: This notion generalizes naturally to n events.


An example in events independence

A card is drawn from a deck with 52 unknown cards.

Event 1: Q “queen”

Event 2: ♠ “spade”

Are these events independent?

Probability space 1 (fair deck):

cards =

2♦,3♦,4♦,5♦,6♦,7♦,8♦,9♦,10♦,J♦,Q♦,K♦,A♦,

2♣,3♣,4♣,5♣,6♣,7♣,8♣,9♣,10♣,J♣,Q♣,K♣,A♣,

2♥,3♥,4♥,5♥,6♥,7♥,8♥,9♥,10♥,J♥,Q♥,K♥,A♥,

2♠,3♠,4♠,5♠,6♠,7♠,8♠,9♠,10♠,J♠,Q♠,K♠,A♠

$$P_1\{Q\} = 4/52, \qquad P_1\{\spadesuit\} = 13/52, \qquad P_1\{Q\spadesuit\} = 1/52 = \underbrace{4/52}_{P_1\{Q\}} \times \underbrace{13/52}_{P_1\{\spadesuit\}}$$

Events are independent.

Probability space 2 (unfair deck):

cards =

2♦,3♦,4♦,5♦,6♦,7♦,8♦,9♦,10♦,J♦,Q♦,K♦,A♦,

2♥,3♥,4♥,5♥,6♥,7♥,8♥,9♥,10♥,J♥,Q♥,K♥,A♥,

Q♠ (26 copies)

$$P_2\{Q\} = 28/52, \qquad P_2\{\spadesuit\} = 1/2, \qquad P_2\{Q\spadesuit\} = 1/2 \neq \underbrace{28/52}_{P_2\{Q\}} \times \underbrace{1/2}_{P_2\{\spadesuit\}}$$

Events are not independent.
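Both decks of this example can be checked by counting cards. The sketch below assumes every card in a deck is equally likely to be drawn; suits are encoded with the letters D, C, H, S, and the function name `check_independence` is a hypothetical helper.

```python
from fractions import Fraction

ranks = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]

# Suits encoded as D (diamonds), C (clubs), H (hearts), S (spades).
fair_deck = [(r, s) for s in "DCHS" for r in ranks]
unfair_deck = [(r, s) for s in "DH" for r in ranks] + [("Q", "S")] * 26

def check_independence(deck):
    """Return P{Q}, P{spade}, P{Q and spade} and whether the product rule holds."""
    n = len(deck)
    p_q = Fraction(sum(r == "Q" for r, s in deck), n)
    p_s = Fraction(sum(s == "S" for r, s in deck), n)
    p_qs = Fraction(sum(r == "Q" and s == "S" for r, s in deck), n)
    return p_q, p_s, p_qs, p_qs == p_q * p_s

print(check_independence(fair_deck)[3])    # True
print(check_independence(unfair_deck)[3])  # False
```

The same pair of events is independent under one probability measure and dependent under the other.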


Further remarks on probability

The last example shows that:

Different probability spaces, for the same random experiment, can produce different predictions

A probability space that does not accurately describe a random experiment can produce completely erroneous predictions

The notion of independence strongly depends on the probability measure employed


Random variable

A mapping X : Ω → R is called a random variable (RV) if, for every real number x, the preimage of the interval (−∞, x] under X is a relevant event, i.e.,

$$X^{-1}\big((-\infty, x]\big) = \{\omega \in \Omega : X(\omega) \leq x\} \in \Sigma, \quad \text{for every } x \in \mathbb{R}.$$

A collection of events in Ω is mapped to an interval E on the real line under such mapping.

RVs are numerical characteristics of interesting events.

Remark: A random variable is a function from Ω to R, not a real number.


*Picture from http://theanalysisofdata.com/probability/2_1.htmll

Examples of random variables

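1 Rolling die experiment

$$\Omega = \{1, 2, 3, 4, 5, 6\}, \qquad \Sigma = \big\{\emptyset, \{1, 3, 5\}, \{2, 4, 6\}, \Omega\big\}$$

$$X_1(\omega) = \begin{cases} 1 & \text{if } \omega \text{ is even} \\ 0 & \text{if } \omega \text{ is odd} \end{cases} \qquad \text{(random variable)}$$

$$X_2(\omega) = \omega^2 \qquad \text{(not a random variable)}$$

2 Temperature (in Kelvin) measurement experiment

$$\Omega = [a, b] \subset [0, +\infty), \qquad \Sigma = \mathcal{B}_{[a,b]} \quad \text{(Borel } \sigma\text{-algebra)}$$

$$X(\omega) = -459.67 + 1.8\,\omega \qquad \text{(random variable)}$$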
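For the rolling die experiment, the claim that the parity indicator is a random variable while the square map is not can be verified directly from the definition, by testing whether every set {ω : X(ω) ≤ x} belongs to Σ. The helper `is_random_variable` below is an illustrative assumption, valid for finite sample spaces only.

```python
def is_random_variable(omega, sigma, X):
    """X is a random variable iff {w : X(w) <= x} is in sigma for every real x."""
    sigma = {frozenset(a) for a in sigma}
    # On a finite space it suffices to test x at the values X actually takes
    # (below the smallest value the preimage is the empty set, which is in sigma).
    thresholds = sorted({X(w) for w in omega})
    return all(
        frozenset(w for w in omega if X(w) <= x) in sigma for x in thresholds
    )

omega = {1, 2, 3, 4, 5, 6}
sigma = [set(), {1, 3, 5}, {2, 4, 6}, omega]

def X1(w):
    return 1 if w % 2 == 0 else 0   # indicator of an even face

def X2(w):
    return w ** 2                   # square of the face value

print(is_random_variable(omega, sigma, X1))  # True
print(is_random_variable(omega, sigma, X2))  # False
```

The map X2 fails because, for instance, {ω : X2(ω) ≤ 1} = {1} is not an event of Σ.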


G. Grimmett and D. Welsh, Probability: An Introduction. Oxford University Press, 2nd edition, 2014.

Probability distribution

The probability distribution of X, denoted by $F_X$, is defined as the probability of the event $\{X \leq x\}$, i.e.,

$$F_X(x) = P\{X \leq x\}.$$

$F_X$ is also known as cumulative distribution function (CDF) and has the following properties:

$0 \leq F_X(x) \leq 1$

$F_X$ is a monotonic, non-decreasing, right-continuous function

$P\{x_1 < X \leq x_2\} = F_X(x_2) - F_X(x_1) = \int_{x_1}^{x_2} dF_X(x)$

$\int_{\mathbb{R}} dF_X(x) = 1$

$F_X(-\infty) = 0$ and $F_X(+\infty) = 1$


Probability density function

If the function $F_X$ is differentiable, its derivative $p_X(x) = dF_X(x)/dx$ is called probability density function (PDF) of X, and one has

$$F_X(x) = \int_{-\infty}^{x} p_X(\xi) \, d\xi.$$

Note also that:

$p_X(x) \geq 0$ for every $x \in \mathbb{R}$

$\int_{\mathbb{R}} p_X(x) \, dx = 1$

Remark: Intuitively, $p_X(x)\, dx$ can be thought of as the probability of X falling within the infinitesimal interval $[x, x + dx]$.


Types of random variables

1 Discrete random variable
Distribution is discontinuous.
Assumes a denumerable number of values.
Typically associated with counting processes.

2 Continuous random variable
Distribution is continuous.
Assumes a non-denumerable number of values.
Typically associated with measuring processes.

3 Mixed random variable
Distribution has points of discontinuity.
Assumes a non-denumerable number of values.
A “mixture” of the two previous types.

4 Singular random variable
Distribution is different from the other types.
It has theoretical interest only.


Function of random variable

The function of a random variable Y = h(X), for a random variable X and a measurable mapping h : R → R, is also a random variable, which has its own probability distribution.

Example:

Let $h(x) = x^2$ and X : Ω → [1, 2] be a random variable such that

$$F_X(x) = \begin{cases} 0 & \text{if } x < 1 \\ \frac{1}{2}\,x & \text{if } 1 \leq x \leq 2 \\ 1 & \text{if } x > 2. \end{cases}$$

The composition $Y = X^2$ is a random variable with distribution

$$F_Y(y) = \begin{cases} 0 & \text{if } y < 1 \\ \frac{1}{2}\sqrt{y} & \text{if } 1 \leq y \leq 4 \\ 1 & \text{if } y > 4. \end{cases}$$
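The distribution of Y = X² can be checked by sampling. The sketch below rests on an assumption about the example: it takes $F_X(x) = x/2$ on [1, 2] (an atom of mass 1/2 at x = 1 plus a uniform part on (1, 2]), which is the reading consistent with $F_Y(y) = \sqrt{y}/2$ on [1, 4]. Samples of X are drawn by inverse transform; function names, sample size and seed are illustrative.

```python
import math
import random

def sample_x(rng):
    """Inverse-transform sample of X, assuming F_X(x) = x / 2 on [1, 2]."""
    # For u <= 1/2 the generalized inverse of F_X is 1; otherwise it is 2u.
    return max(1.0, 2.0 * rng.random())

def empirical_cdf(samples, y):
    """Fraction of samples that are less than or equal to y."""
    return sum(s <= y for s in samples) / len(samples)

rng = random.Random(0)                               # fixed seed for reproducibility
y_samples = [sample_x(rng) ** 2 for _ in range(100_000)]

for y in (1.0, 2.25, 4.0):
    # empirical F_Y(y) next to the closed form sqrt(y) / 2
    print(y, empirical_cdf(y_samples, y), 0.5 * math.sqrt(y))
```

The two columns should agree up to sampling error.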


Mathematical expectation operator

The mathematical expectation of a random variable X is defined as

$$E\{X\} = \int_{\mathbb{R}} x \, dF_X(x).$$

The mathematical expectation is a linear operator since

$$E\{\alpha_1 X_1 + \alpha_2 X_2\} = \alpha_1 E\{X_1\} + \alpha_2 E\{X_2\},$$

for any pair of numbers $\alpha_1, \alpha_2$ and random variables $X_1, X_2$.

Theorem (law of the unconscious statistician):

Given a measurable mapping h : R → R and a random variable X, the expected value of h(X) is given by

$$E\{h(X)\} = \int_{\mathbb{R}} h(x) \, dF_X(x).$$
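For a discrete random variable the integral in the theorem reduces to a sum over the values of X. The sketch below compares that sum with the direct route of first building the distribution of Y = h(X); the fair die and the mapping h(x) = (x − 3)² are hypothetical choices made for illustration.

```python
from fractions import Fraction

# Hypothetical example: X is the face of a fair die and h(x) = (x - 3)**2.
pmf_x = {x: Fraction(1, 6) for x in range(1, 7)}

def h(x):
    return (x - 3) ** 2

# Law of the unconscious statistician: E{h(X)} = sum of h(x) P{X = x}.
e_h_lotus = sum(h(x) * p for x, p in pmf_x.items())

# Direct route: build the distribution of Y = h(X), then compute E{Y}.
pmf_y = {}
for x, p in pmf_x.items():
    pmf_y[h(x)] = pmf_y.get(h(x), 0) + p
e_h_direct = sum(y * p for y, p in pmf_y.items())

print(e_h_lotus, e_h_direct)  # 19/6 19/6
```

The theorem avoids the intermediate step of finding the distribution of h(X).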


Mean value

The mean value of the random variable X is defined as

$$\mu_X = E\{X\} = \int_{\mathbb{R}} x \, dF_X(x) = \int_{\mathbb{R}} x \, p_X(x) \, dx.$$

(measure of the central tendency)

Remark: The mean value $\mu_X$ is the constant which best approximates the random variable X in the mean-square sense. The error of this approximation is the standard deviation $\sigma_X$.


Variance

The variance of the random variable X is defined as

$$\sigma_X^2 = E\{(X - \mu_X)^2\} = \int_{\mathbb{R}} (x - \mu_X)^2 \, dF_X(x) = \int_{\mathbb{R}} (x - \mu_X)^2 \, p_X(x) \, dx.$$

(measure of dispersion about the mean)

Note that variance can also be written as

$$\sigma_X^2 = E\{X^2\} - \big(E\{X\}\big)^2.$$

Remark: $\sigma_X^2$ has the same unit as $X^2$.


Standard deviation and variation coefficient

Other second-order statistics of X are the standard deviation

$$\sigma_X = \sqrt{\sigma_X^2},$$

and the variation coefficient

$$\delta_X = \sigma_X / \mu_X, \quad \mu_X \neq 0.$$

(both are measures of dispersion about the mean)

Remark: $\sigma_X$ has the same unit as X and $\delta_X$ is dimensionless.
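The mean, the two expressions for the variance, the standard deviation and the variation coefficient can be computed for a simple discrete case. The fair die used below is a hypothetical example chosen so that the exact values are easy to check by hand.

```python
import math
from fractions import Fraction

# Hypothetical example: X is the face of a fair six-sided die.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

mean = sum(x * p for x, p in pmf.items())                     # mu_X = E{X}
var_def = sum((x - mean) ** 2 * p for x, p in pmf.items())    # E{(X - mu_X)^2}
var_alt = sum(x ** 2 * p for x, p in pmf.items()) - mean ** 2 # E{X^2} - (E{X})^2

std = math.sqrt(var_def)    # sigma_X, same unit as X
cv = std / float(mean)      # delta_X, dimensionless

print(mean, var_def, var_alt)  # 7/2 35/12 35/12
print(std, cv)
```

Both variance expressions give the same exact fraction, as expected.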


Probability Distributions


Uniform

Notation: $\mathcal{U}(a, b)$

Support: [a, b]

Parameters:

−∞ < a < b < +∞ (boundaries)

PDF:

$$p_X(x) = \frac{1}{b - a} \, \mathbb{1}_{[a,b]}(x)$$

Statistics:

$$\mu = \frac{1}{2}(a + b), \qquad \sigma^2 = \frac{1}{12}(b - a)^2$$

[Figures: Probability Density Function; Cumulative Distribution Function]


Uniform distribution — Wikipedia, The Free Encyclopedia, 2017.

Gamma

Notation: Gamma(k, θ)

Support: (0, +∞)

Parameters:

k: shape parameter

θ: scale parameter

PDF:

$$p_X(x) = \frac{1}{\Gamma(k)\, \theta^k} \, x^{k-1} \, e^{-x/\theta} \, \mathbb{1}_{(0,+\infty)}(x)$$

Statistics:

$$\mu = k\,\theta, \qquad \sigma^2 = k\,\theta^2$$




Gamma distribution — Wikipedia, The Free Encyclopedia, 2017.

Gaussian

Notation: $\mathcal{N}(\mu, \sigma^2)$

Support: (−∞, +∞)

Parameters:

µ: mean

σ²: variance

PDF:

$$p_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{(x - \mu)^2}{2\sigma^2} \right\}$$




Normal distribution — Wikipedia, The Free Encyclopedia, 2017.

Characterization of a probability distribution

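Given the mean µ and standard deviation σ of a random variable, is the distribution uniquely determined?

No! Multiple distributions may have the same statistics.

$$p_X(x) = \frac{1}{\sqrt{6\pi}} \exp\left\{ -\frac{(x - 1)^2}{6} \right\} \quad \text{and} \quad p_X(x) = \frac{1}{6} \, \mathbb{1}_{[-2,4]}(x)$$

have $\mu = 1$ and $\sigma = \sqrt{3}$.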

What type of information determines a distribution?

cumulative distribution function

probability density function (if exists)

quantile function

characteristic function

moment-generating function (if exists)
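The claim that the Gaussian and the uniform densities of this example share the same mean and standard deviation can be checked numerically. The sketch below uses a plain midpoint rule; the integration limits for the Gaussian (31 units on each side of the mean) and the number of subintervals are arbitrary choices made here.

```python
import math

def gaussian_pdf(x):
    """Gaussian density with mean 1 and variance 3."""
    return math.exp(-((x - 1.0) ** 2) / 6.0) / math.sqrt(6.0 * math.pi)

def uniform_pdf(x):
    """Uniform density on [-2, 4]."""
    return 1.0 / 6.0 if -2.0 <= x <= 4.0 else 0.0

def moments(pdf, lo, hi, n=100_000):
    """Midpoint-rule estimates of the mean and standard deviation of a density."""
    dx = (hi - lo) / n
    xs = [lo + (i + 0.5) * dx for i in range(n)]
    mass = sum(pdf(x) for x in xs) * dx            # should be close to 1
    mean = sum(x * pdf(x) for x in xs) * dx / mass
    var = sum((x - mean) ** 2 * pdf(x) for x in xs) * dx / mass
    return mean, math.sqrt(var)

gauss_mean, gauss_std = moments(gaussian_pdf, -30.0, 32.0)
unif_mean, unif_std = moments(uniform_pdf, -2.0, 4.0)

print(gauss_mean, gauss_std)
print(unif_mean, unif_std)
print(math.sqrt(3.0))
```

Both pairs should agree with µ = 1 and σ = √3 up to quadrature error, even though the two densities are clearly different.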


Probability in Dimension n


Random vector

Let $\mathbf{x} = (x_1, \cdots, x_n) \in \mathbb{R}^n$. A random vector $\mathbf{X} = (X_1, \cdots, X_n)$ is a collection of n random variables $X_i : \Omega \to \mathbb{R}$ that together may be considered a (measurable) mapping $\mathbf{X} : \Omega \to \mathbb{R}^n$.

A collection of events in Ω is mapped into a region of the Euclidean space under such mapping.


*Picture from http://theanalysisofdata.com/probability/4_1.html

Joint probability distribution

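The joint probability distribution of the random vector $\mathbf{X} = (X_1, \cdots, X_n)$, denoted by $F_{\mathbf{X}}$, is defined as

$$F_{\mathbf{X}}(x_1, \cdots, x_n) = P\big\{ \{X_1 \leq x_1\} \cap \cdots \cap \{X_n \leq x_n\} \big\}.$$

Thus,

$$P\{\mathbf{a} < \mathbf{X} \leq \mathbf{b}\} = \int_{a_1}^{b_1} \cdots \int_{a_n}^{b_n} dF_{\mathbf{X}}(x_1, \cdots, x_n),$$

in which $\{\mathbf{a} < \mathbf{X} \leq \mathbf{b}\} = \{a_1 < X_1 \leq b_1\} \cap \cdots \cap \{a_n < X_n \leq b_n\}$.

$F_{\mathbf{X}}$ is also known as joint cumulative distribution function.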


Joint probability density function

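If $p_{\mathbf{X}}(x_1, \cdots, x_n) = \partial^n F_{\mathbf{X}} / \partial x_1 \cdots \partial x_n$ exists, for any $x_1, \cdots, x_n$, then it is called joint probability density function of $\mathbf{X}$, and one has

$$F_{\mathbf{X}}(x_1, \cdots, x_n) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_n} p_{\mathbf{X}}(\xi_1, \cdots, \xi_n) \, d\xi_1 \cdots d\xi_n.$$

Note also that:

$p_{\mathbf{X}}(x_1, \cdots, x_n) \geq 0$ for every $(x_1, \cdots, x_n) \in \mathbb{R}^n$

$\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} p_{\mathbf{X}}(x_1, \cdots, x_n) \, dx_1 \cdots dx_n = 1$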


Marginal probability density function

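The marginal probability density function of $X_i$ is defined as

$$p_{X_i}(x_i) = \underbrace{\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty}}_{n-1 \text{ times}} p_{\mathbf{X}}(x_1, \cdots, x_n) \, dx_1 \cdots dx_{i-1} \, dx_{i+1} \cdots dx_n,$$

for $i = 1, \cdots, n$.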
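In two dimensions the marginal density is obtained by integrating the joint density over the other variable. The sketch below does this numerically with a midpoint rule for a hypothetical joint density p(x, y) = x + y on the unit square, whose marginal is known in closed form to be x + 1/2.

```python
def joint_pdf(x, y):
    """Hypothetical joint density p(x, y) = x + y on the unit square."""
    return x + y if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0 else 0.0

def marginal_x(x, n=1000):
    """Marginal density of X: integrate the joint density over y (midpoint rule)."""
    dy = 1.0 / n
    return sum(joint_pdf(x, (j + 0.5) * dy) for j in range(n)) * dy

for x in (0.0, 0.3, 1.0):
    # numerical marginal next to the closed form x + 1/2
    print(x, marginal_x(x), x + 0.5)
```

The same pattern extends to n dimensions by integrating over the remaining n − 1 variables.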


Conditional distribution

Consider a pair of jointly distributed random variables X and Y. The conditional distribution of X, given the occurrence of the value y of Y, is defined as

$$F_{X|Y}(x \mid y) = \frac{F_{X,Y}(x, y)}{F_Y(y)}.$$

Thus

$$F_{X,Y}(x, y) = F_{X|Y}(x \mid y) \times F_Y(y),$$

and

$$p_{X,Y}(x, y) = p_{X|Y}(x \mid y) \times p_Y(y).$$

Remark: This definition extends naturally to the n-dimensional case.


Independence of distributions

The random variables X and Y are said to be independent if the realization of Y does not affect the probability distribution of X, i.e.,

$$F_{X|Y}(x \mid y) = F_X(x).$$

Therefore, for independent random variables one has

$$F_{X,Y}(x, y) = F_X(x) \times F_Y(y),$$

and

$$p_{X,Y}(x, y) = p_X(x) \times p_Y(y).$$

Remark: This definition extends naturally to the n-dimensional case.


Statistics of random vectors

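second-order random vector

$$E\{\|\mathbf{X}\|^2\} = \int_{\mathbb{R}^n} \|\mathbf{x}\|^2 \, dF_{\mathbf{X}}(\mathbf{x}) < +\infty$$

mean vector

$$\mathbf{m}_{\mathbf{X}} = E\{\mathbf{X}\} = \int_{\mathbb{R}^n} \mathbf{x} \, dF_{\mathbf{X}}(\mathbf{x}) \in \mathbb{R}^n$$

correlation matrix

$$[R_{\mathbf{XY}}] = E\{\mathbf{X} \mathbf{Y}^T\} = \int_{\mathbb{R}^n \times \mathbb{R}^n} \mathbf{x} \, \mathbf{y}^T \, dF_{\mathbf{XY}}(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^{n \times n}$$

covariance matrix

$$[K_{\mathbf{XY}}] = E\{(\mathbf{X} - \mathbf{m}_{\mathbf{X}}) (\mathbf{Y} - \mathbf{m}_{\mathbf{Y}})^T\} = [R_{\mathbf{XY}}] - \mathbf{m}_{\mathbf{X}} \mathbf{m}_{\mathbf{Y}}^T \in \mathbb{R}^{n \times n}$$

Remark: Matrices $[R_{\mathbf{XY}}]$ and $[K_{\mathbf{XY}}]$ are symmetric positive semi-definite when $\mathbf{X} = \mathbf{Y}$.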
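The relation between the correlation matrix, the mean vector and the covariance matrix can be verified exactly for a small discrete random vector (case X = Y). The four-point distribution below is a hypothetical example, and the helper `expect` is introduced only for this sketch.

```python
from fractions import Fraction

# Hypothetical 2-dimensional random vector taking four equally likely values.
pmf = {
    (0, 0): Fraction(1, 4),
    (1, 0): Fraction(1, 4),
    (1, 1): Fraction(1, 4),
    (2, 1): Fraction(1, 4),
}
n = 2

def expect(f):
    """Expectation of f(X) under the discrete distribution pmf."""
    return sum(f(x) * p for x, p in pmf.items())

# Mean vector, correlation matrix E{X X^T} and covariance matrix.
m = [expect(lambda x, i=i: x[i]) for i in range(n)]
R = [[expect(lambda x, i=i, j=j: x[i] * x[j]) for j in range(n)] for i in range(n)]
K = [[expect(lambda x, i=i, j=j: (x[i] - m[i]) * (x[j] - m[j])) for j in range(n)]
     for i in range(n)]

# Alternative expression: [K] = [R] - m m^T.
K_alt = [[R[i][j] - m[i] * m[j] for j in range(n)] for i in range(n)]

print(K == K_alt)          # True
print(K[0][1] == K[1][0])  # True
```

Both expressions give the same symmetric matrix, here with entries 1/2, 1/4, 1/4, 1/4.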


Correlation of random variables

The random vectors $\mathbf{X} = (X_1, \cdots, X_n)$ and $\mathbf{Y} = (Y_1, \cdots, Y_n)$ are said to be uncorrelated if the covariance matrix $[K_{\mathbf{XY}}]$ is null, i.e.,

$$[R_{\mathbf{XY}}] = \mathbf{m}_{\mathbf{X}} \mathbf{m}_{\mathbf{Y}}^T.$$

If two random vectors are independent, then they are uncorrelated.

$$\text{independence} \Longrightarrow \text{uncorrelation}$$

But uncorrelated random vectors are not independent in general.

$$\text{uncorrelation} \not\Longrightarrow \text{independence}$$

Remark: Uncorrelated random vectors whose joint distribution is Gaussian are independent.
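A standard scalar counterexample shows that uncorrelated does not imply independent. The sketch below assumes X uniformly distributed on {−1, 0, 1} and Y = X², a hypothetical pair not taken from the lecture: the covariance vanishes, yet Y is completely determined by X.

```python
from fractions import Fraction

# Hypothetical example: X uniform on {-1, 0, 1} and Y = X**2.
pmf = {(x, x * x): Fraction(1, 3) for x in (-1, 0, 1)}

def expect(f):
    """Expectation of f(X, Y) under the joint distribution pmf."""
    return sum(f(x, y) * p for (x, y), p in pmf.items())

def prob(event):
    """Probability of an event given as a predicate on (x, y)."""
    return sum(p for (x, y), p in pmf.items() if event(x, y))

# Covariance: E{XY} - E{X} E{Y}.
cov_xy = expect(lambda x, y: x * y) - expect(lambda x, y: x) * expect(lambda x, y: y)

# Independence would require P{X = 0, Y = 0} = P{X = 0} P{Y = 0}.
p_joint = prob(lambda x, y: x == 0 and y == 0)
p_product = prob(lambda x, y: x == 0) * prob(lambda x, y: y == 0)

print(cov_xy)              # 0
print(p_joint, p_product)  # 1/3 1/9
```

The covariance is zero, but the joint probability differs from the product of the marginals, so X and Y are not independent.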


Notions of Random Processes


Random process

A real-valued random process (also called stochastic process) defined on a probability space (Ω, Σ, P), indexed by t ∈ T, is a mapping

$$(t, \omega) \in T \times \Omega \mapsto X(t, \omega) \in \mathbb{R},$$

such that, for fixed t, the output is a random variable X(t, ·), while for fixed ω, X(·, ω) is a function of t (sample function).


*Picture from https://www.vocal.com/noise-reduction/statistical-analysis-random-signals/

Interpretation and classification

Note that, by definition, a real-valued random process X(t, ω) is a collection of real-valued random variables indexed by a parameter, and can be thought of as a time-dependent random variable.

According to the nature of the set of indices T, a stochastic process is classified as:

discrete – if T is countable

continuous – if T is uncountable


Wiener process (Brownian motion process)


Wiener process — Wikipedia, The Free Encyclopedia, 2017.

White noise and colored noise

[Figures: White noise; Colored noise]


White noise — Wikipedia, The Free Encyclopedia, 2017.

Finite-dimensional distribution

A random process $X_t = X(t, \omega)$ is statistically defined to the order n by the set of its n-th order joint probability distributions, i.e.,

$$F_{X_t}(x_1, \cdots, x_n) = P\{X_{t_1} \leq x_1, \cdots, X_{t_n} \leq x_n\},$$

which is the joint CDF of the random variables $X_{t_1}, \cdots, X_{t_n}$.


Statistics of a random process

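second-order random process

$$E\{X_t^2\} = \int_{\mathbb{R}} x^2 \, dF_{X_t}(x) < +\infty$$

mean function

$$\mu_X(t) = E\{X_t\} = \int_{\mathbb{R}} x \, dF_{X_t}(x)$$

correlation function

$$\mathrm{corr}_X(t_1, t_2) = E\{X_{t_1} X_{t_2}\} = \int_{\mathbb{R}^2} x_1 \, x_2 \, dF_{X_{t_1} X_{t_2}}(x_1, x_2)$$

covariance function

$$\mathrm{cov}_X(t_1, t_2) = E\big\{ \big(X_{t_1} - \mu_X(t_1)\big) \big(X_{t_2} - \mu_X(t_2)\big) \big\} = \mathrm{corr}_X(t_1, t_2) - \mu_X(t_1)\, \mu_X(t_2)$$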
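These statistics can be computed exactly for a small discrete-time process. The sketch below assumes a symmetric random walk of four steps, a hypothetical example defined on the finite sample space Ω = {−1, +1}⁴ with equally likely outcomes, so that X(t, ω) is the sum of the first t steps of ω.

```python
from fractions import Fraction
from itertools import product

N_STEPS = 4
omega = list(product((-1, 1), repeat=N_STEPS))  # 16 equally likely outcomes
p = Fraction(1, len(omega))

def X(t, w):
    """Value of the process at time t (0 <= t <= N_STEPS) for outcome w."""
    return sum(w[:t])

def mean(t):
    return sum(X(t, w) * p for w in omega)

def corr(t1, t2):
    return sum(X(t1, w) * X(t2, w) * p for w in omega)

def cov(t1, t2):
    return corr(t1, t2) - mean(t1) * mean(t2)

# Fixed omega: a sample function of t. Fixed t: a random variable on omega.
sample_path = [X(t, omega[0]) for t in range(N_STEPS + 1)]

print(sample_path)                              # [0, -1, -2, -3, -4]
print([mean(t) for t in range(N_STEPS + 1)])
print(cov(2, 4), cov(3, 3))                     # 2 3
```

For this walk the mean function is identically zero and the covariance function equals min(t1, t2).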


Stationary process

A stochastic process is stationary when all the random variables ofthat stochastic process are identically distributed.


Stationary process — Wikipedia, The Free Encyclopedia, 2017.

References

G. Grimmett and D. Welsh, Probability: An Introduction. Oxford University Press, 2nd edition, 2014.

J. Jacod and P. Protter, Probability Essentials. Springer, 2nd edition, 2004.

A. Klenke, Probability Theory: A Comprehensive Course. Springer, 2nd edition, 2014.

A. Papoulis and S. U. Pillai, Probability, Random Variables and Stochastic Processes. McGraw-Hill Europe, 4th edition, 2002.

