Download - An Introduction to Probability and Stochastic Processes ... · An Introduction to Probability and Stochastic Processes for Ocean, Atmosphere, and Climate Dynamics 1: Basic Probability

An Introduction to Probability and

Stochastic Processes for Ocean,

Atmosphere, and Climate Dynamics

1: Basic Probability

Adam Monahan

[email protected]

School of Earth and Ocean Sciences, University of Victoria

An Introduction to Probability and Stochastic Processes for Ocean, Atmosphere, and Climate Dynamics1: Basic Probability – p. 1/23

Introduction

“Climate is what you expect, weather is what you get.”

- Robert A. Heinlein

⇒ “expectation” lies at heart of notion of climate

⇒ this is a fundamentally probabilistic perspective

Probability is a natural way of looking at atmosphere, ocean, and

climate dynamics, both in terms of

statistics (probability and data)

stochastics (the dynamics of probability)

These lectures will provide an overview of probability theory &

stochastic processes in the context of atmosphere, ocean, and climate

dynamics


Probability: Basic concepts

Probabilities are used to characterise processes with indeterminate

outcomes - that is, that are random

Any given random process X is associated with a set of possible

basic outcomes

Ω = x1, x2, ..., xn

(which doesn’t have to be discrete set)

Can define an “event” A as any subset of these basic outcomes, and

assign to that a probability P (A) of its occurrence, with the rules:

(i) 0 ≤ P (A) ≤ 1

(ii) P (Ω) = 1 (something’s got to happen)

(iii) P (A or B) = P (A) + P (B) − P (A and B)

(additivity of mutually exclusive events)An Introduction to Probability and Stochastic Processes for Ocean, Atmosphere, and Climate Dynamics1: Basic Probability – p. 3/23

Probability densities

Our main focus will be case of basic outcome space of continuous

variables Ω = x = ω ∈ RDefine “differential” probability of x falling in small volume dx

dP (x0) = P (x0 ≤ x ≤ x0 + dx)

If P (x) smooth, can define probability density function (pdf)

P (x0 ≤ x ≤ x0 + dx) = px(x) dx

for which∫

px(x) dx = 1

Can also define cumulative distribution function

F (x0) = P (x ≤ x0) =

∫ x0

−∞

px(x) dx


Joint and marginal pdfs

Probability of A and B together is joint probability

P (A andB) = P (A, B)

For x ∈ RN , px(x) = px(x1, x2, ..., xN ) is the joint pdf of multiple

scalar random variables

Can find pdf over subspace of x (the marginal pdf) by integrating

out other variables

E.g. in R2:

px1(x1) =

∫

px(x1, x2) dx2

px2(x2) =

∫

px(x1, x2) dx1


Conditional probabilities

The conditional probability P (A|B) measures the probability of A

given that B is known to be true

Conditional probabilities are related to joint probabilities:

P (A, B) = P (A|B)P (B) = P (B|A)P (A)

P (A|B) is a measure of the dependence of A and B

Independence: B irrelevant to A, so P (A|B) = P (A) and

P (A, B) = P (A)P (B)

For x, y independent, joint pdf factors as product of marginals:

pxy(x, y) = px(x)py(y) (independence)


Bayes’ Theorem

From properties of conditional distributions

p(x|y) =p(y|x)p(x)

p(y)=

p(y|x)p(x)∫

p(x, y)dx=

p(y|x)p(x)∫

p(y|x)p(x)dx⇒

p(x|y) =p(y|x)p(x)

∫

p(y|x)p(x)dx

Provides formal model of “knowledge acquisition” of system

variable x through “experimental outcome” y:

p(x) initial uncertain knowledge of x (the prior)

p(y|x) predicted outcome of experiment

p(x|y) improved knowledge of x given outcome of experiment y

(the posterior)


Expectation and moments

The pdf px(x) defines a measure of relative probabilities, leading to

the definition of the expectation (average) of a function f(x):

Ef =

∫

f(x)px(x) dx

Can interpret marginal pdf as “expectation” of conditional:

px1(x1) =

∫

px(x1, x2) dx2 =

∫

p(x1|x2)px2(x2) dx2

Expectation used to define moments, which are useful

characterisations of a pdf

For scalar random variables:

mean(x) = Ex pdf centre

var(x) = std2(x) = E(x − Ex)2 pdf spread


Expectation and moments

Two common higher-order (reduced) moments:

skewness

skew(x) =E(x − Ex)3

std3(x)

measures asymmetry of pdf

0 1 2 3 4 50

0.2

0.4

0.6

0.8

1

1.2

1.4

x

p x(x)

skew(x) > 0skew(x) = 0skew(x) < 0

kurtosis

kurt(x) =E(x − Ex)4

std4(x)−3

measures flatness of pdf

−4 −2 0 2 40

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

x

p x(x)

kurt(x) > 0kurt(x) = 0kurt(x) < 0


Moments of sea surface winds

mean(U) (m/s)

0 50 100 150 200 250 300 350−60

−40

−20

0

20

40

60

−10

−5

0

5

10

std(U) (m/s)

0 50 100 150 200 250 300 350−60

−40

−20

0

20

40

60

0

2

4

6

8

skew(U)

0 50 100 150 200 250 300 350−60

−40

−20

0

20

40

60

−2

−1

0

1

2

Zonal Wind

mean(w) (ms−1)

0 50 100 150 200 250 300 350

−50

0

50

2

4

6

8

10

12

std(w) (ms−1)

0 50 100 150 200 250 300 350

−50

0

50

1

2

3

4

5

skew(w)

0 50 100 150 200 250 300 350

−50

0

50

−1

−0.5

0

0.5

1

Wind speed


Covariance and correlation

Given two variables x and y with joint pdf pxy(x, y) define

covariance

cov(x, y) = E(x − Ex)(y − Ey)

and correlation

corr(x, y) =cov(x, y)

std(x)std(y)

These are measures of dependence between x and y

If x and y are independent, covariance and correlation vanish - but

not vice versa! (in general)


Covariance and correlation

x 2

(a)

−4 −2 0 2 4−4

−2

0

2

4

(b)

−4 −2 0 2 4−4

−2

0

2

4

x1

x 2

(c)

−4 −2 0 2 4−4

−2

0

2

4

x1

(d)

−4 −2 0 2 4−4

−2

0

2

4

(a) indep, uncorr

(b) dep, corr

(c) dep, uncorr

(d) dep, corr


Covariance matrix & EOFs

For x ∈ RN define covariance matrix

Σxx = E

(x − Ex)(x − Ex)T

so var(x) = diag(Σxx)

Efficiently described in terms of eigenvectors ek known as empirical

orthogonal functions (EOFs):

Σxxek = λkek

EOFs ⇒ orthogonal basis in RN which efficiently partitions variance

In EOF basis new (uncorrelated) variables given by projections

ak = x · ek with diagonal covariance matrix

(Σaa)ij = λjδij


Families of pdfs

There are many families of pdfs characterised by one or more

parameters whose properties are well-studied

The most important of these are the two-parameter Gaussian (or

normal) distributions

For a Gaussian scalar x

px(x) =1√

2πσ2exp

(

−(x − µ)2

2σ2

)

such that

mean(x) = µ std(x) = σ

skew(x) = 0 kurt(x) = 0

−4 −2 0 2 40

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

(x−µ)/σ

p x(x)


Multivariate Gaussian

Random vector X ∈ RN is multivariate Gaussian when pdf is

p(x) =1

√

(2π)N det Σxx

exp

(

−1

2(x − µ)T Σ−1

xx (x − µ)

)

where µ = mean(x)

Σxx = cov(x,x)

For a multivariate Gaussian

corr(xi, xj) = 0

⇒ x1, x2 independent

all marginals are Gaussian

x1

x 2

−5 0 5−4

−3

−2

−1

0

1

2

3

4p(x

1,x

2)

EOF1

EOF2


Central limit theorem and law of large numbers

One reason that Gaussian distributions are important is the central

limit theorem, which states (loosely) that if xn is a sequence of

independent mean-zero random variables with finite variance, then

Zn =1√n

n∑

i=1

xi

becomes Gaussian with finite variance as n → ∞Another useful asymptotic result is the law of large numbers: for xn

a sequence of independent and identically distributed random

variableslim

n→∞

1

n

n∑

i=1

xn → mean(x)

That is: as n becomes very large, fluctuations in sum vanish


Maximum entropy pdfs

Get moments from pdf; but how can we go the other way?

The entropy

H = −∫

px(x) ln px(x)dx

measures “information content” of a pdf

Given first N moments, can find pdf which maximises H subject to

the constraint it must have specified moments; takes form

px(x) =1

Zexp

(

Lixi)

where Li are Lagrange multipliers of constrained optimisation

Provides “least biased” pdf with given information

Only mean and variance given, maximum entropy pdf Gaussian


Change of variables

Suppose y = f(x) where f is invertible

Variable y is just another way of talking about x; intuitively expect

P (y1 < y < y2) = P (x1 < x < x2)

if y1 = f(x1) and y2 = f(x2) ⇒ “conservation of probability”

In terms of densities:

px(x) dx = py(y) dy = py(y)df

dxdx

so

py(y) =px(f−1(y))

df/dx

For x,y ∈ RN

py(y) =px(f−1(y))

det(∂fi/∂xj)An Introduction to Probability and Stochastic Processes for Ocean, Atmosphere, and Climate Dynamics1: Basic Probability – p. 18/23

Change of variables: example

Know the joint pdf of zonal and meridional wind components:

puv(u, v), but want marginal pdf of wind speed pw(w)

1. move from (u, v) to polar coordinates (w, θ):

(u, v) = (w cos θ, w sin θ)

2.∣

∣

∣

∣

∣

∣

∂wu ∂θu

∂wv ∂θv

∣

∣

∣

∣

∣

∣

=

∣

∣

∣

∣

∣

∣

cos θ −w sin θ

sin θ w cos θ

∣

∣

∣

∣

∣

∣

= w

⇒ pwθ(w, θ) = wpuv(w cos θ, w sin θ)

3. Integrate over θ to get marginal:

pw(w) = w

∫ π

−π

puv(w cos θ, w sin θ) dθ


Case study: Tropical Pacific SST

EOF1

150oE 180oW 150oW 120oW 90oW 16oS 8oS 0o

8oN 16oN

0 0.2 0.4 0.6 0.8

EOF2


8oN 16oN

−0.4 −0.2 0 0.2 0.4

Antisymmetric Component a(1)


8oN 16oN

−0.2 0 0.2 0.4 0.6 0.8

Symmetric Component a(2)


8oN 16oN

−0.15 −0.1 −0.05 0 0.05 0.1

Standard Deviation


8oN 16oN

0.2 0.4 0.6 0.8 1

Skewness


8oN 16oN

−0.5 0 0.5 1


Case study: Tropical Pacific SST

−3 −2 −1 0 1 2 3 4−3

−2

−1

0

1

2

3

PC1

PC

2


Case Study: 20 hPa geopotential height


References

Gardiner, C.W. Handbook of Stochastic Methods, Springer

Monahan, A.H. & A. Dai, 2004. “The spatial and temporal structure

of ENSO nonlinearity”. J. Clim, 17, 3026-3036.

Monahan, A.H., J.C. Fyfe, & L. Pandolfo, 2003. “The vertical

structure of wintertime climate regimes of the Northern Hemisphere

extratropical atmosphere”. J. Clim, 16, 2005-2021.

von Storch, H. & F.W. Zwiers, Statistical Analysis in Climate

Research, Cambridge University Press

Wilks, D.S. Statistical Methods in the Atmospheric Sciences,

Academic Press