Environmental Data Analysis with MatLab


Page 1: Environmental Data Analysis with  MatLab

Environmental Data Analysis with MatLab

Lecture 3: Probability and Measurement Error

Page 2: Environmental Data Analysis with  MatLab

SYLLABUS

Lecture 01 Using MatLab
Lecture 02 Looking At Data
Lecture 03 Probability and Measurement Error
Lecture 04 Multivariate Distributions
Lecture 05 Linear Models
Lecture 06 The Principle of Least Squares
Lecture 07 Prior Information
Lecture 08 Solving Generalized Least Squares Problems
Lecture 09 Fourier Series
Lecture 10 Complex Fourier Series
Lecture 11 Lessons Learned from the Fourier Transform
Lecture 12 Power Spectra
Lecture 13 Filter Theory
Lecture 14 Applications of Filters
Lecture 15 Factor Analysis
Lecture 16 Orthogonal functions
Lecture 17 Covariance and Autocorrelation
Lecture 18 Cross-correlation
Lecture 19 Smoothing, Correlation and Spectra
Lecture 20 Coherence; Tapering and Spectral Analysis
Lecture 21 Interpolation
Lecture 22 Hypothesis testing
Lecture 23 Hypothesis Testing continued; F-Tests
Lecture 24 Confidence Limits of Spectra, Bootstraps

Page 3: Environmental Data Analysis with  MatLab

purpose of the lecture

apply principles of probability theory to data analysis

and especially to use them to quantify error

Page 4: Environmental Data Analysis with  MatLab

Error, an unavoidable aspect of measurement,

is best understood using the ideas of probability.

Page 5: Environmental Data Analysis with  MatLab

random variable, d: no fixed value until it is realized

d = ? (indeterminate), then realized as d = 1.04

d = ? (indeterminate), then realized as d = 0.98

Page 6: Environmental Data Analysis with  MatLab

random variables have systematics

a tendency to take on some values more often than others

Page 7: Environmental Data Analysis with  MatLab

example: d = number of deuterium atoms in methane

[Figure: five methane molecules containing d = 0, d = 1, d = 2, d = 3 and d = 4 deuterium atoms, D, in place of hydrogen atoms, H]

Page 8: Environmental Data Analysis with  MatLab

The tendency of a random variable to take on a given value, d, is described by a probability, P(d).

P(d) is measured in percent, in the range 0% to 100%, or as a fraction, in the range 0 to 1.

Page 9: Environmental Data Analysis with  MatLab

four different ways to visualize probabilities

d    P (fraction)    P (percent)
0    0.10            10%
1    0.30            30%
2    0.40            40%
3    0.15            15%
4    0.05            5%

[Figure: the same probabilities, P, plotted against d for d = 0 to 4, with a P axis from 0.0 to 0.5]

Page 10: Environmental Data Analysis with  MatLab

probabilities must sum to 100%

the probability that d is something is 100%
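
As a quick check, the tabulated probabilities for the deuterium example can be summed in MatLab. This is only a minimal sketch; the variable names d, P and Ptotal are chosen here for illustration.

```matlab
% probabilities from the deuterium table (d = 0 to 4), as fractions
d = [0, 1, 2, 3, 4]';
P = [0.10, 0.30, 0.40, 0.15, 0.05]';

% the total should be unity (100%), up to floating-point rounding
Ptotal = sum(P);
fprintf('total probability = %.2f\n', Ptotal);
```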

Page 11: Environmental Data Analysis with  MatLab

continuous variables can take fractional values

[Figure: depth, d, on an axis from 0 to 5, with the example value d = 2.37]

Page 12: Environmental Data Analysis with  MatLab

[Figure: p(d) plotted against d, with the area, A, between d1 and d2 marked]

The area, A, under the probability density function, p(d), quantifies the probability that the fish is between depths d1 and d2.

Page 13: Environmental Data Analysis with  MatLab

an integral is used to determine area, and thus probability

probability that d is between d1 and d2:

$P(d_1, d_2) = \int_{d_1}^{d_2} p(d) \, \mathrm{d}d$
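
On a computer, the area can be approximated by a sum over a finely sampled p.d.f. The sketch below assumes, purely for illustration, a triangular p(d) on 0 < d < 5 that peaks at d = 2.5; the depths d1 = 2 and d2 = 3 and all variable names are likewise assumptions, not values from the lecture.

```matlab
% sample the depth axis, 0 < d < 5, at the midpoints of N cells
N = 5000;
Dd = 5/N;
d = Dd*((1:N)' - 0.5);

% hypothetical triangular p.d.f. peaking at d = 2.5, with unit area
p = (2/5) * (1 - abs(d - 2.5)/2.5);

% probability that d is between d1 and d2: area under p(d)
d1 = 2.0;
d2 = 3.0;
k = (d >= d1) & (d <= d2);
P12 = Dd * sum(p(k));

% probability that d is between its bounds: should be close to 1
Pall = Dd * sum(p);
fprintf('P(d1,d2) = %.3f, total = %.3f\n', P12, Pall);
```

The same pattern, a cell width times a sum, is what the lecture's own scripts use for the mean and variance.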

Page 14: Environmental Data Analysis with  MatLab

the probability that the fish is at some depth in the pond is 100% or unity

probability that d is between its minimum and maximum bounds, dmin and dmax:

$P(d_{min}, d_{max}) = \int_{d_{min}}^{d_{max}} p(d) \, \mathrm{d}d = 1$

Page 15: Environmental Data Analysis with  MatLab

How do these two p.d.f.’s differ?

[Figure: two p.d.f.’s, p(d), each plotted against d from 0 to 5]

Page 16: Environmental Data Analysis with  MatLab

Summarizing a probability density function

typical value: “center of the p.d.f.”

amount of scatter around the typical value: “width of the p.d.f.”

Page 17: Environmental Data Analysis with  MatLab

several possible choices of a “typical value”

Page 18: Environmental Data Analysis with  MatLab

[Figure: p(d) plotted against d from 0 to 15, with the mode, dmode, marked]

One choice of the ‘typical value’ is the mode, or maximum likelihood point, dmode. It is the d of the peak of the p.d.f.

Page 19: Environmental Data Analysis with  MatLab

[Figure: p(d) plotted against d from 0 to 15, with the median, dmedian, marked; area = 50% on each side]

Another choice of the ‘typical value’ is the median, dmedian. It is the d that divides the p.d.f. into two pieces, each with 50% of the total area.

Page 20: Environmental Data Analysis with  MatLab

[Figure: p(d) plotted against d from 0 to 15, with the mean, dmean, marked]

A third choice of the ‘typical value’ is the mean, or expected value, dmean.

It is a generalization of the usual definition of the mean of a list of numbers.

Page 21: Environmental Data Analysis with  MatLab

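step 1: usual formula for the mean of the data, $d_i$:

$\bar{d} = \frac{1}{N} \sum_{i=1}^{N} d_i$

step 2: replace the data with its histogram, with $N_s$ data in the bin centered on $d_s$:

$\bar{d} \approx \frac{1}{N} \sum_{s} d_s N_s$

step 3: replace the histogram with a probability distribution, $P(d_s) \approx N_s / N$:

$\bar{d} \approx \sum_{s} d_s P(d_s)$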
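
The three steps can be tried on a small data set. The sketch below uses a hypothetical list of N = 20 integer measurements, constructed here so that its histogram matches the probabilities 0.10, 0.30, 0.40, 0.15, 0.05 tabulated earlier; the variable names are illustrative. Because the data are integers with one bin per value, the three formulas agree exactly; with binned continuous data they would agree only approximately.

```matlab
% hypothetical data: 2 zeros, 6 ones, 8 twos, 3 threes, 1 four
data = [0 0 1 1 1 1 1 1 2 2 2 2 2 2 2 2 3 3 3 4]';
N = length(data);

% step 1: usual formula for the mean
mean1 = sum(data) / N;

% step 2: replace the data with its histogram, Ns, on bins ds
ds = (0:4)';
Ns = zeros(size(ds));
for s = 1:length(ds)
    Ns(s) = sum(data == ds(s));
end
mean2 = sum(ds .* Ns) / N;

% step 3: replace the histogram with a probability distribution
P = Ns / N;
mean3 = sum(ds .* P);

fprintf('%.2f %.2f %.2f\n', mean1, mean2, mean3);
```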

Page 22: Environmental Data Analysis with  MatLab

If the data are continuous, use an analogous formula containing an integral:

$\bar{d} \approx \int_{d_{min}}^{d_{max}} d \, p(d) \, \mathrm{d}d$

Page 23: Environmental Data Analysis with  MatLab

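MatLab scripts for mode, median and mean

% d:  column vector of equally spaced values of the random variable
% p:  p.d.f. evaluated at each d
% Dd: spacing between successive values of d

% mode: d at the peak of p
[pmax, i] = max(p);
themode = d(i);

% median: first d at which the cumulative area exceeds 50%
pc = Dd*cumsum(p);
for i=[1:length(p)]
    if( pc(i) > 0.5 )
        themedian = d(i);
        break;
    end
end

% mean: area under d.*p
themean = Dd*sum(d.*p);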
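
To run these calculations, d, p and Dd must be defined first. The sketch below is one hedged way to do that: the skewed p.d.f., its range of 0 to 15 and the spacing Dd = 0.01 are assumptions made for illustration, and find() is substituted for the for-loop as a compact alternative for locating the median.

```matlab
% hypothetical skewed p.d.f. on 0 <= d <= 15 (an assumption for illustration)
Dd = 0.01;
d = (0:Dd:15)';
p = d .* exp(-d/2);
p = p / (Dd*sum(p));     % normalize to unit area

% mode: d at the peak of p
[pmax, imode] = max(p);
themode = d(imode);

% median: first d at which the cumulative area exceeds 50%
pc = Dd*cumsum(p);
imedian = find(pc > 0.5, 1);
themedian = d(imedian);

% mean: area under d.*p
themean = Dd*sum(d.*p);

fprintf('mode %.2f, median %.2f, mean %.2f\n', themode, themedian, themean);
```

For a p.d.f. with a long tail toward large d, such as this one, the mode falls below the median and the median falls below the mean, which illustrates why the three choices of ‘typical value’ can differ.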

Page 24: Environmental Data Analysis with  MatLab

several possible choices of methods to quantify width

Page 25: Environmental Data Analysis with  MatLab

[Figure: p(d) plotted against d; the area between dtypical – d50/2 and dtypical + d50/2 is A = 50%]

One possible measure of width is the length of the d-axis over which 50% of the area lies.

This measure is seldom used.

Page 26: Environmental Data Analysis with  MatLab

A different approach to quantifying the width of p(d) …

This function grows away from the typical value:

$q(d) = (d - d_{typical})^2$

so the function q(d)p(d) is

small if most of the area is near dtypical, that is, a narrow p(d)

large if most of the area is far from dtypical, that is, a wide p(d)

so quantify width as the area under q(d)p(d)

Page 27: Environmental Data Analysis with  MatLab

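variance:

$\sigma_d^2 = \int_{d_{min}}^{d_{max}} (d - \bar{d})^2 \, p(d) \, \mathrm{d}d$

use the mean, $\bar{d}$, for dtypical

width is actually the square root of the variance, that is, $\sigma_d$.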

Page 28: Environmental Data Analysis with  MatLab

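visualization of a variance calculation

[Figure: p(d), q(d) and q(d)p(d) plotted against d from dmin to dmax, with $\bar{d} - \sigma$, $\bar{d}$ and $\bar{d} + \sigma$ marked]

now compute the area under the function q(d)p(d)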

Page 29: Environmental Data Analysis with  MatLab

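MatLab scripts for mean and variance

% d, p and Dd as before: sampled values, p.d.f. and sample spacing

% mean: area under d.*p
dbar = Dd*sum(d.*p);

% variance: area under q.*p, with q measuring squared distance from the mean
q = (d-dbar).^2;
sigma2 = Dd*sum(q.*p);

% width: square root of the variance
sigma = sqrt(sigma2);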
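
As a hedged usage example, the same calculation can be applied to a uniform p.d.f. on 0 < d < 1, for which the analytic mean is 1/2 and the analytic variance is 1/12 (a standard result that is not stated in the slides). The number of samples, N, is an arbitrary choice made here.

```matlab
% uniform p.d.f. on 0 < d < 1, sampled at the midpoints of N cells
N = 1000;
Dd = 1/N;
d = Dd*((1:N)' - 0.5);
p = ones(size(d));       % p(d) = 1/(dmax-dmin) = 1

% mean and variance, computed as areas
dbar = Dd*sum(d.*p);
q = (d-dbar).^2;
sigma2 = Dd*sum(q.*p);
sigma = sqrt(sigma2);

fprintf('mean %.4f, variance %.4f, sigma %.4f\n', dbar, sigma2, sigma);
```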

Page 30: Environmental Data Analysis with  MatLab

two important probability density distributions:

uniform

Normal

Page 31: Environmental Data Analysis with  MatLab

uniform p.d.f.

$p(d) = 1/(d_{max} - d_{min})$ for d between dmin and dmax

probability is the same everywhere in the range of possible values

box-shaped function

Page 32: Environmental Data Analysis with  MatLab

Normal p.d.f.

bell-shaped function

Large probability near the mean, $\bar{d}$. Variance is $\sigma^2$.

[Figure: Normal p.d.f. plotted against d from 0 to 100, with p(d) axis from 0 to 0.08 and a width of 2σ marked]
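
The transcript does not reproduce the formula for the Normal p.d.f. The sketch below uses the standard form, p(d) = exp(-(d - dbar)^2 / (2 sigma^2)) / (sqrt(2 pi) sigma), and then checks numerically that it has unit area and the intended mean and variance. The values dbar = 50 and sigma = 5 are assumptions chosen to resemble the plotted curve, not values given in the lecture.

```matlab
% assumed mean and standard deviation (illustrative values only)
dbar = 50;
sigma = 5;

% sample axis
Dd = 0.1;
d = (0:Dd:100)';

% Normal p.d.f. with mean dbar and variance sigma^2
p = exp(-(d-dbar).^2 / (2*sigma^2)) / (sqrt(2*pi)*sigma);

% checks: unit area, and mean and variance recovered numerically
A = Dd*sum(p);
dmean = Dd*sum(d.*p);
sigma2 = Dd*sum(((d-dmean).^2).*p);
fprintf('area %.4f, mean %.2f, variance %.2f\n', A, dmean, sigma2);
```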

Page 33: Environmental Data Analysis with  MatLab

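exemplary Normal p.d.f.’s

[Figure, first panel: same variance, different means, $\bar{d}$ = 10, 15, 20, 25, 30; d axis from 0 to 40]

[Figure, second panel: same mean, different variances, σ = 2.5, 5, 10, 20, 40; d axis from 0 to 40]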

Page 34: Environmental Data Analysis with  MatLab

Normal p.d.f.: probability between $\bar{d} \pm n\sigma$
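
The table of values from this slide did not survive in the transcript. For a Normal p.d.f., the probability of falling within n standard deviations of the mean is erf(n/sqrt(2)), which is a standard result; the sketch below evaluates it for n = 1, 2, 3 using MatLab's built-in erf(). It gives roughly 68.27%, 95.45% and 99.73%.

```matlab
% probability that a Normal random variable lies within n sigma of its mean
n = (1:3)';
P = erf(n / sqrt(2));

for k = 1:length(n)
    fprintf('n = %d: P = %.2f%%\n', n(k), 100*P(k));
end
```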

Page 35: Environmental Data Analysis with  MatLab

functions of random variables

data with measurement error -> data analysis process -> inferences with uncertainty

Page 36: Environmental Data Analysis with  MatLab

simple example

data with measurement error -> data analysis process -> inferences with uncertainty

one datum, d, with uniform p.d.f. on 0<d<1 -> $m = d^2$ -> one model parameter, m

Page 37: Environmental Data Analysis with  MatLab

functions of random variables

given p(d), with $m = d^2$,

what is p(m)?

Page 38: Environmental Data Analysis with  MatLab

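use the chain rule and the definition of probability to deduce the relationship between p(d) and p(m):

$\int_{d_1}^{d_2} p(d) \, \mathrm{d}d = \int_{m_1}^{m_2} p[d(m)] \, \frac{\partial d}{\partial m} \, \mathrm{d}m = \int_{m_1}^{m_2} p(m) \, \mathrm{d}m$

$p(m) = p[d(m)] \left| \frac{\partial d}{\partial m} \right|$

the absolute value is added to handle the case where the direction of integration reverses, that is, m2<m1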

Page 39: Environmental Data Analysis with  MatLab

with $m = d^2$ and $d = m^{1/2}$

intervals: d=0 corresponds to m=0; d=1 corresponds to m=1

p.d.f.: p(d) = 1, so p[d(m)] = 1

derivative: $\partial d / \partial m = (1/2) m^{-1/2}$

so: $p(m) = (1/2) m^{-1/2}$ on the interval 0<m<1
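
This result can be checked by simulation: draw many realizations of d from the uniform p.d.f., square them, and compare their histogram with the predicted p(m). The sketch below is one way to do that; the number of realizations, the bin width and the variable names are assumptions made here. The predicted fraction of realizations with m < 0.25 is the integral of p(m) from 0 to 0.25, which is sqrt(0.25) = 0.5.

```matlab
% realizations of d, uniform on 0 < d < 1
N = 100000;              % number of realizations (arbitrary choice)
d = rand(N, 1);
m = d.^2;

% empirical p.d.f. of m, from counts in bins of width Dm
Dm = 0.05;
medges = (0:Dm:1)';
mc = medges(1:end-1) + Dm/2;          % bin centers
Nb = length(mc);
count = zeros(Nb, 1);
for k = 1:Nb
    count(k) = sum(m >= medges(k) & m < medges(k+1));
end
pest = count / (N*Dm);

% predicted p.d.f. at the bin centers; agreement in the first bin is
% only rough, because p(m) varies rapidly near m = 0
ptrue = 0.5 ./ sqrt(mc);

% fraction of realizations with m < 0.25, predicted to be 0.5
frac = sum(m < 0.25) / N;
fprintf('fraction with m < 0.25: %.3f\n', frac);
```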

Page 40: Environmental Data Analysis with  MatLab

[Figure: p(d) plotted against d from 0 to 1, and p(m) plotted against m from 0 to 1]

note that p(d) is constant, while p(m) is concentrated near m=0

Page 41: Environmental Data Analysis with  MatLab

mean and variance of linear functions of random variables

given that p(d) has mean, $\bar{d}$, and variance, $\sigma_d^2$, with m = cd

what is the mean, $\bar{m}$, and variance, $\sigma_m^2$, of p(m)?

Page 42: Environmental Data Analysis with  MatLab

the result does not require knowledge of p(d)

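formula for mean:

$\bar{m} = \int m \, p(m) \, \mathrm{d}m = \int c d \, p[d(m)] \left| \frac{\partial d}{\partial m} \right| \mathrm{d}m = c \int d \, p(d) \, \mathrm{d}d = c \bar{d}$

the mean of m is c times the mean of d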

Page 43: Environmental Data Analysis with  MatLab

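formula for variance:

$\sigma_m^2 = \int (m - \bar{m})^2 \, p(m) \, \mathrm{d}m = c^2 \int (d - \bar{d})^2 \, p(d) \, \mathrm{d}d = c^2 \sigma_d^2$

the variance of m is $c^2$ times the variance of d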
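
Both rules can be illustrated with simulated data. In the sketch below, the Normal p.d.f. for d (mean 10, standard deviation 2), the constant c = 3 and the number of realizations are all assumptions chosen for illustration. Because the rules do not depend on p(d), any other p.d.f. would serve equally well.

```matlab
% hypothetical realizations of d (Normal, mean 10, sigma 2: assumed values)
N = 10000;
d = 10 + 2*randn(N, 1);

% linear function m = c d, with an arbitrary constant c
c = 3;
m = c * d;

% sample means and variances
dbar = mean(d);
mbar = mean(m);
sigmad2 = var(d);
sigmam2 = var(m);

fprintf('mbar = %.3f, c*dbar = %.3f\n', mbar, c*dbar);
fprintf('sigmam2 = %.3f, c^2*sigmad2 = %.3f\n', sigmam2, c^2*sigmad2);
```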

Page 44: Environmental Data Analysis with  MatLab

What’s Missing?

So far, we only have the tools to study a single inference made from a single datum.

That’s not realistic.

In the next lecture, we will develop the tools to handle many inferences drawn from many data.