Statistical Methods: An Introduction for (Astro-)Physicists (uni-muenchen.de)
USM
University Observatory Munich
Statistical Methods: An Introduction for (Astro-)Physicists
Content
Fundamental terms of statistics and data analysis, with examples from physics and astrophysics
• probability: axioms, Bayes' theorem
• probability distribution functions of one or several random variables
• expectation value, variance, moments, characteristic function, variable transformation, variable reduction, covariance
• Chebychev inequality, central limit theorem
• important distributions: binomial, multinomial, Poisson, normal, exponential, chi-squared
• measurement errors, error propagation
• estimation: consistent, unbiased, efficient
• Maximum Likelihood methods, minimum variance bound
• linear regression, chi-square minimization, goodness of fit
• hypothesis testing: errors of the first and second kind, F-test, Student's t-test, Kolmogorov-Smirnov test
Literature
R.J. Barlow, Statistics, John Wiley & Sons, 1989
S. Brandt, Statistical and Computational Methods in Data Analysis, North-Holland, 1976 (2nd ed.)
G. Bohm & G. Zech, Einführung in Statistik und Messwertanalyse für Physiker, Springer, 2012?
G. Zech, Einführung in Statistik und Messwertanalyse für Physiker, Vorlesungsskript, Univ. Siegen, http://personal.ifae.es/jamin/lehre/bayes/Zech04.pdf
I. Probabilities
The majority of predictions is affected by uncertainties ("the only certain things in life are death and taxes"). Thus, dealing with probabilities and statistics is sensible for everybody, and inevitable for the experimental and empirical sciences.
• the accuracy of experiments is restricted by the precision of the devices used
• the underlying processes are often stochastic (random; stochastics, from Greek: the art of conjecture) → estimates for the measured quantities and their accuracy are required
• estimates with errors make it possible to test hypotheses. Results can be improved subsequently, by adding new measurements and suitable averaging prescriptions.
• statistics provides mathematical algorithms to draw conclusions, from a given sample, about the properties of the underlying parent population.
example 1: Polls allow one to predict the distribution of parliament seats. The parent population is the entirety of voters; the sample is a representative selection of them. It is important to know the accuracy of the prediction.

example 2: Determine the mean life-time of an unstable nucleus from the observation of 100 decays. The randomness is induced by quantum mechanical effects. The sample is representative for the entirety of all possible decays if the experimental device is able to measure all decay times (between zero and infinity) with sufficient precision.

example 3: Determine the frequency of a pendulum from 10 observations. The estimate for the actual frequency and its uncertainty are determined by suitable averages. It is assumed that the frequency could be determined with arbitrary precision from an infinite number of observations, and that the finite accuracy is the result of a restricted number of observations. The actual observations are a sample of the infinitely many possible observations.

example 4: Test whether two experimental devices work similarly. Compare samples from both devices and test whether these samples originate from the same parent population.
Difference between observation and measurement: An observation (event) is an element of a sample (with one or more elements). A measurement is a parameter estimate, attributed with an (in)accuracy.
• Example: the decay times of 10 pion decays are observations; the estimate of the decay rate is a measurement.
• Fit of a straight line: the observations are the data points, the slope is the measurement.
Axioms of probability
Let S = {E_1, E_2, E_3, …} be the set of possible results of an experiment (= events). Two events are said to be mutually exclusive if it is impossible that both of them occur in one result. For every event E there is a probability P(E), which is a real number satisfying the axioms of probability (Kolmogorov 1950):
(simplified version of Kolmogorov’s axioms)
I.  P(E) ≥ 0
II. P(E_1 or E_2) = P(E_1) + P(E_2), if E_1 and E_2 are mutually exclusive
III. Σ_i P(E_i) = 1, where the sum is over all mutually exclusive events
random events → probabilities

A + B means A or B;  A·B means A and B
• if P(A·B) = 0, then A and B are mutually exclusive

random events can be described by random variables (= variates); a realization of a variate is an observation (event)

event E → complementary event Ē ("not E"); from axiom III: P(Ē) = 1 − P(E), and thus P(E) ≤ 1
Empirical (classical) probabilities
Frequency definition (frequentists' interpretation): In a large number N of experiments the event A is observed to occur n times. Then
The set of all N cases (N repetitions of the same experiment, or N simultaneous identical experiments) is called the collective or ensemble. In this case, the probability is not only a property of the experiment, but the joint property of experiment and ensemble.
• example (von Mises, 1957): German insurance companies found that the fraction of their male clients dying at the age of 40 is 1.1%
• but this is not the probability that a particular man dies at this age. If the data had been collected from other samples (all Germans, German hang-glider pilots, …), the outcome would have been different. Thus, the probability depends on the collective from which it has been taken.
As well: experiments must be repeatable, under identical conditions. "What is the probability that it will rain tomorrow?" "Will the General Motors shares rise tomorrow?" And: are we allowed to speak about the probability that, e.g., the mass of the Higgs particle lies in the range of 100 to 200 GeV/c^2?
P(A) = lim_{N→∞} n/N
Objective probabilities
Peirce (1910): probability is a property of the device/experiment, e.g. a die. Resurrected by Popper (in connection with quantum mechanics) as objective probability or propensity. Seems reasonable when considering equally likely cases, e.g., due to symmetry (coin, die, etc.), but breaks down for continuous variables (a transformation can make a uniform, symmetric distribution non-uniform, and there is no natural choice for the "best" variable).
Subjective probability ‒ Bayesian statistics
definition: the conditional probability P(A|B) is the probability of A given that B is true,

P(A|B) = P(A·B) / P(B)

This implies P(A·B) = P(A|B) P(B). Reasonable definition: if P(A·B) = P(A) P(B), then the probabilities are independent of each other; in this case, P(A|B) = P(A)!

Bayes' theorem (published posthumously by R. Price, 1763), undisputed:

P(A|B) P(B) = P(B|A) P(A)  [= P(A·B)]

and also

P(A + B) = P(A) + P(B) − P(A·B)

(Venn diagram)
Rule of total probability
A collection of sets E_1, E_2, …, E_k such that E_1 ∪ E_2 ∪ … ∪ E_k = S is said to be exhaustive.

Assume E_1, E_2, …, E_k are k mutually exclusive and exhaustive sets. Then

P(B) = P(B ∩ E_1) + P(B ∩ E_2) + … + P(B ∩ E_k)
     = P(B·E_1) + P(B·E_2) + … + P(B·E_k)
     = P(B|E_1) P(E_1) + P(B|E_2) P(E_2) + … = Σ_i P(B|E_i) P(E_i)
Thus

P(A|B) = P(B|A) P(A) / P(B) = P(B|A) P(A) / Σ_i P(B|A_i) P(A_i)

or

P(A|B) = P(B|A) P(A) / P(B) = P(B|A) P(A) / [P(B|A) P(A) + P(B|Ā) P(Ā)],

with P(Ā) = 1 − P(A)
Examples
example 1: probabilities for drawing certain cards from a well-shuffled deck of 32 cards
P(queen) = 4/32 = 1/8
P(spade) = 8/32 = 1/4
P(queen of spades) = 1/8 · 1/4 = 1/32 (spade and queen)
P(queen or spade) = 1/8 + 1/4 − 1/32 = 11/32 (not mutually exclusive)
P(spade|queen) = 1/4 = P(spade) (independent events)
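As a quick cross-check (not part of the original slides), the probabilities of example 1 can be verified by enumerating the 32-card deck and counting:

```python
from fractions import Fraction

# Enumerate a 32-card deck (ranks 7..10, J, Q, K, A in four suits) and
# verify the probabilities of example 1 by direct counting.
ranks = ["7", "8", "9", "10", "J", "Q", "K", "A"]
suits = ["spades", "hearts", "diamonds", "clubs"]
deck = [(r, s) for r in ranks for s in suits]      # 32 cards

def prob(event):
    return Fraction(sum(1 for c in deck if event(c)), len(deck))

p_queen = prob(lambda c: c[0] == "Q")                       # 1/8
p_spade = prob(lambda c: c[1] == "spades")                  # 1/4
p_both  = prob(lambda c: c[0] == "Q" and c[1] == "spades")  # 1/32
p_or    = prob(lambda c: c[0] == "Q" or c[1] == "spades")   # 11/32

assert p_or == p_queen + p_spade - p_both   # addition rule for non-exclusive events
assert p_both / p_queen == p_spade          # P(spade|queen) = P(spade): independent
```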
example 2: Calculate the fraction of women among students, from the fractions of students and of women in the population, and from the fraction of students among the female population:
P(A) = 0.05: fraction of students in the population
P(B) = 0.52: fraction of women in the population
P(A|B) = 0.07: fraction of students among the female population

P(B|A) = P(A|B) P(B) / P(A) = 0.07 · 0.52 / 0.05 = 0.728
Examples (cont'd)
example 3: Infected? (from Gigerenzer 2002, realistic numbers)

HIV screening for persons without risky behaviour; positive test result (D) with respect to two modern tests (ELISA, Western blot) in Germany:
H1: one of 10000 men is HIV-infected (non-risk group), i.e., P(H1) = 10^-4
P(D|H1) = 0.999: probability of a positive test (D) if the man is infected
P(D|H2) = 0.0001: probability of a positive test if he is not infected

Calculate P(H1|D), the probability of an actual infection if a man (non-risk group) tests positive:
P(H1|D) = P(D|H1) P(H1) / P(D) = P(D|H1) P(H1) / [P(D|H1) P(H1) + P(D|H2) P(H2)]
        = 0.999 · 10^-4 / [0.999 · 10^-4 + 0.0001 · (1 − 10^-4)] = 0.4998 (!)

approximation: P(D|H1) ≈ 1 ⇒ P(H1|D) ≈ 1 / [1 + P(D|H2)/P(H1)]

• P(D|H2) ≪ P(H1): test OK, probability that one is actually infected ≈ 1
• P(D|H2) ≈ P(H1): probability that one is actually infected ≈ 0.5
• P(D|H2) ≫ P(H1): probability that one is actually infected very low, P(H1|D) ≈ P(H1)/P(D|H2)
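The computation of example 3 can be sketched in a few lines of Python (an added illustration, not part of the original slides):

```python
# Bayes' theorem with the numbers of example 3: posterior probability of an
# HIV infection given a positive test, for a man from the non-risk group.
p_h1 = 1e-4             # prior P(H1): infected
p_d_given_h1 = 0.999    # P(D|H1): positive test if infected
p_d_given_h2 = 0.0001   # P(D|H2): positive test if not infected

p_d = p_d_given_h1 * p_h1 + p_d_given_h2 * (1 - p_h1)  # rule of total probability
p_h1_given_d = p_d_given_h1 * p_h1 / p_d

print(round(p_h1_given_d, 4))   # 0.4998: essentially a coin flip
```

Despite the "99.9%" sensitivity, the small prior makes the posterior only about 0.5.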
Bayesian statistics
So far, so good … (if all probabilities are known, this is not disputed). But Bayesian reasoning is also applied to statements which are regarded as 'unscientific' in the frequency definition: the probability of a theory (it will rain tomorrow, parity is not violated, …) is considered as a subjective 'degree of belief'. Subsequent experimental evidence then modifies this initial degree of belief. Expressed as

P(theory|result) = P(result|theory) / P(result) · P(theory),  where P(theory) is the 'prior'

What is the probability of a theory??? In case of complete ignorance, a uniform distribution is assumed … (see example below, "The first night in paradise")
• otherwise, a suitable choice due to symmetry arguments, laws of nature, empirical knowledge, experts' opinion, …
But: with respect to which parameter? (example: mass or mass^2 give different priors)
example: assume you toss a coin 3 times and obtain "head" every time. Calculate the probability that the coin is a phoney, i.e., has a head on each side.

If you have drawn the coin from your own pocket, the prior should be very small. Let P(phoney) = 10^-6. Then P(phoney|3 heads) = 8·10^-6, i.e., still reasonably small.

Now assume that you have played against the car salesman Honest Adi for a beer, and that Honest Adi has given you the coin. In this case, the a priori probability that the coin is a phoney might be higher; you estimate 5%, and one finds P(phoney|3 heads) = 0.3, which is a considerable chance.
P(phoney|3 heads) = P(3 heads|phoney) P(phoney) / [P(3 heads|phoney) P(phoney) + P(3 heads|not phoney) (1 − P(phoney))]

P(3 heads|phoney) = 1
P(3 heads|not phoney) = (1/2)^3 = 0.125
prior: P(phoney) = ??
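A minimal sketch (added, not from the slides) of the posterior computation for the two priors discussed above:

```python
# Posterior that the coin is two-headed after observing 3 heads,
# as a function of the prior P(phoney).
def posterior_phoney(prior, n_heads=3):
    like_phoney = 1.0            # a two-headed coin always shows head
    like_fair = 0.5 ** n_heads   # a fair coin: (1/2)^n
    return like_phoney * prior / (like_phoney * prior + like_fair * (1 - prior))

print(posterior_phoney(1e-6))   # ≈ 8e-6  (coin from your own pocket)
print(posterior_phoney(0.05))   # ≈ 0.30  (coin from Honest Adi)
```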
The first night in paradise

From G. Gigerenzer 2004, "The evolution of statistical thinking", Unterrichtswissenschaft, 32
The game show problem
© Christian Rieck -- www.spieltheorie.de/Spieltheorie_Anwendungen/ziegenproblem.htm
The goat problem ("Ziegenproblem") is one of those problems that heated tempers for a long time and drove whole crowds of mathematicians to the brink of despair (in particular because they were misled by their intuition and took a long time to notice it). There is probably no game theorist who did not, in some form, think about this problem at the end of the 1980s, even though it is not really a multi-player game. But first the rules, for those who do not know the problem yet.

In an American quiz show, a contestant stands in front of three closed doors; behind one of them is a car, behind the other two a goat. The contestant may now choose one of the doors; the host then opens one of the remaining two doors, always in such a way that a door with a goat is opened, so that the car must be behind one of the still-closed doors. He then offers the contestant the chance to switch doors once more, or to stay with the door chosen first, before it is opened. The contestant receives whatever is behind the door she finally chooses (where we assume that she prefers the car to the goat).

In a column by Marilyn vos Savant (www.marilynvossavant.com/articles/gameshow.html), someone asked whether in this situation it is better to switch or to stay with the original choice. Most people back then thought that it should not matter. Marilyn vos Savant, who has the highest IQ ever measured and is therefore regarded as the most intelligent person in the world, answered tersely with "switching is better" and thereby triggered the discussion, in which it took weeks before humanity could agree on the solution accepted to this day. Before that, however, she received such nice letters as: "You are the goat!", or: "You made a mistake. … If all these PhDs were wrong, our country would be in serious trouble." But at least the most intelligent person in the world became famous as a result.

Yet the problem can be solved by applying Bayes' theorem (the explanation of this theorem can also be found in my game theory book). Here E is the unobservable event (where is the car?) and B is the observation (which door does the host leave closed?). This leads to the following values needed for Bayes' theorem:
The game show problem (cont'd)
P(E) = probability that the car is behind a given door; since there is one car behind three doors, this value is 1/3

P(B) = probability that a given door remains closed after the host's choice; since the host opens one of two closed doors, this value is 1/2

P(B|E) = probability that the host leaves a door closed when the car is behind it; since the host only opens a door if there is no car behind it, this value is 1

P(E|B) = probability that the car is behind the door the host left closed; this is the value we are looking for.

Inserted into Bayes' formula, this yields:

The probability that the car is behind the door the host leaves closed is thus 2/3, whereas it is only 1/3 for the originally chosen door. So it is clear that switching doubles one's chances of getting the car. Vos Savant was right.

I already mentioned above that this is not really a multi-player game. The host here is not a decision-maker who still has to choose between two doors according to his own preferences; rather, he behaves like a purely executing algorithm that only opens a door behind which there is certainly no car. Hence this is a game against nature (that is, against the probability distribution with which cars and goats stand behind the doors).
P(E|B) = P(B|E) · P(E) / P(B) = (1 · 1/3) / (1/2) = 2/3
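The 2/3 result can also be checked empirically. The following Monte Carlo sketch (an added illustration, not from the original text) simulates both strategies:

```python
import random

# Monte Carlo check of the game show problem: estimate the winning
# probability when staying versus switching.
def play(switch, rng):
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # the host opens a goat door that is neither the pick nor the car
    host = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        pick = next(d for d in doors if d != pick and d != host)
    return pick == car

rng = random.Random(0)
n = 100_000
p_stay = sum(play(False, rng) for _ in range(n)) / n    # ≈ 1/3
p_switch = sum(play(True, rng) for _ in range(n)) / n   # ≈ 2/3
```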
II. Probability distributions – one random variable
random events are characterized by random variables. Probability distribution functions associate random variables with their corresponding probabilities. There are discrete and continuous random variables (r.v.). In the following, probabilities refer to one r.v. x, i.e., one property which can be quantified.

definition: the (cumulative) distribution function (c.d.f.) F(t) defines the probability of finding a value smaller than t.

from the probability axioms, we obtain the following properties of F(t):
• F(t) increases monotonically with t
• F(-∞)=0
• F(∞) =1
There are discrete and continuous distributions
F(t) = P(x < t), with −∞ < t < ∞
Discrete distributions

describe probabilities for the occurrence of N discrete, different events, with

Σ_i P(x_i) = 1  and  P(x_i) = F(x_i + ε) − F(x_i − ε)

example: die; the probability to throw a certain number x_i is P(x_i) = 1/6, with x_i = i for i = 1, …, 6

discrete distributions can be treated as continuous distributions via the Dirac δ-function

[figure: probability distribution P(x) and distribution function F(x)]
Continuous distributions
instead of a probability distribution, define the probability density f(x) (p.d.f. = probability density function) with

f(x) = dF(x)/dx

and properties

f(−∞) = f(+∞) = 0
∫_{−∞}^{∞} f(x) dx = 1; thus
P(a ≤ x < b) = F(b) − F(a) = ∫_a^b f(x) dx
example: life-times of unstable particles follow an exponential distribution

f(t) = exp(−t/τ)/τ for t ≥ 0 and with mean life-time τ

⇒ F(t) = ∫_{−∞}^{t} f(t′) dt′ = ∫_0^t f(t′) dt′ = 1 − exp(−t/τ),

and the probability that the particle lives longer than τ is

P(t > τ) = F(∞) − F(τ) = exp(−1)
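A small numerical sketch (added, not from the slides) reproduces P(t > τ) = e^-1 by integrating the p.d.f. with a simple midpoint rule (τ = 1 assumed):

```python
import math

# Exponential life-time distribution: check P(t > tau) = exp(-1)
# by numerically integrating the p.d.f. from 0 to tau (tau = 1).
tau = 1.0
f = lambda t: math.exp(-t / tau) / tau   # p.d.f.

def cdf(t, n=100_000):
    # midpoint-rule integration of f over [0, t]
    h = t / n
    return sum(f((i + 0.5) * h) for i in range(n)) * h

print(1 - cdf(tau))   # ≈ exp(-1) ≈ 0.3679
```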
Expectation value
note: if x is a r.v., then any function u(x) is a r.v. as well. Distributions have characteristic parameters such as expectation value, width, and asymmetry.

The expectation value or mean of a r.v. x results from averaging over x according to its distribution:
E(x) = μ = ⟨x⟩ = Σ_i x_i P(x_i)  (discrete dist.)
E(x) = μ = ⟨x⟩ = ∫_{−∞}^{∞} x f(x) dx  (continuous dist.)

E(u(x)) = ⟨u⟩ = Σ_i u(x_i) P(x_i)  (discrete dist.)
E(u(x)) = ⟨u⟩ = ∫_{−∞}^{∞} u(x) f(x) dx  (continuous dist.)
calculation rules: let α, β be constants and u and v functions of x

E(α) = α;  E(α u(x)) = α E(u)
E(α u + β v) = α E(u) + β E(v):  the expectation value is a linear operator!
if x, y are independent r.v., then E(u(x) v(y)) = E(u) E(v)  (see Sect. III)

the expectation value is the centre of gravity of the distribution
Central moments of a r.v.
Let's choose especially

u(x) = (x − μ)^n, with E(u(x)) =: μ'_n = E{(x − μ)^n},

which is called the n-th central moment, or the n-th moment about the mean.

The lowest-order central moments are

μ'_0 = 1 and μ'_1 = 0.

The quantity

μ'_2 = σ^2(x) = Var(x) = E{(x − μ)^2}

is the lowest central moment which contains information about the average deviation of x from the mean. It is called the variance of x, and σ is the standard deviation.
Variance
the variance measures the mean quadratic deviation from the mean. The standard deviation σ = √Var has the same units as x and will be identified with the errors of measurements. The mechanical analogue of the variance is the moment of inertia.

calculation rules:

Var(α) = 0,  Var(α x) = α^2 Var(x)
Var(α x + β y) = α^2 Var(x) + β^2 Var(y),  if x, y are independent (see Sect. III)

different representation:

Var(x) = E{(x − μ)^2} = E(x^2 − 2xμ + μ^2) = E(x^2) − 2μ^2 + μ^2 = E(x^2) − μ^2, or
Var(x) = ⟨x^2⟩ − ⟨x⟩^2

The variance (and all other central moments) is invariant to translations of the r.v.!
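Both representations, and the translation invariance, can be checked exactly for a fair die (an added illustration, not part of the original slides):

```python
from fractions import Fraction

# Check Var(x) = E(x^2) - E(x)^2 and translation invariance for a fair die.
xs = [Fraction(i) for i in range(1, 7)]   # die faces, P = 1/6 each
mean = sum(xs) / 6
var = sum((x - mean) ** 2 for x in xs) / 6
var_alt = sum(x ** 2 for x in xs) / 6 - mean ** 2
assert var == var_alt == Fraction(35, 12)

# translation invariance: Var(x + 10) = Var(x)
shifted = [x + 10 for x in xs]
mean_s = sum(shifted) / 6
assert sum((x - mean_s) ** 2 for x in shifted) / 6 == var
```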
Variance of a convolution
We measure a quantity x with p.d.f. g, and the measurement is smeared out according to a p.d.f. h (→ convolution, see below). We look for the variance of the measured x′. Since the variance is translation-invariant, we set E(x) to zero.

f(x′) = ∫ g(x) h(x′ − x) dx

⟨x′^2⟩ = ∫∫ x′^2 g(x) h(x′ − x) dx dx′
       = ∫∫ [(x′ − x)^2 + 2x(x′ − x) + x^2] g(x) h(x′ − x) dx dx′
       = ∫∫ [u^2 + 2xu + x^2] g(x) h(u) dx du   (substitution u = x′ − x)
       = ⟨u^2⟩ + ⟨x^2⟩,  since ⟨x⟩ = 0.

The variance of x′ is the sum of the variances of the distributions g and h. For sequential measurements of a quantity, the individual errors add quadratically (see below).

Otherwise (E(x) ≠ 0), the analogue derivation gives

⟨x′^2⟩ = ⟨u^2⟩ + 2⟨u⟩⟨x⟩ + ⟨x^2⟩
⟨x′⟩ = ⟨u⟩ + ⟨x⟩  ⇒  ⟨x′⟩^2 = ⟨u⟩^2 + 2⟨u⟩⟨x⟩ + ⟨x⟩^2
⇒ Var(x′) = ⟨x′^2⟩ − ⟨x′⟩^2 = Var(u) + Var(x):  identical result!
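The addition of variances under smearing can be checked by a Monte Carlo sketch (added illustration; the choice of an exponential signal with Gaussian smearing is an arbitrary assumption):

```python
import random

# Monte Carlo check that smearing (convolution) adds variances:
# x ~ exponential (Var = tau^2), smearing u ~ Gaussian (Var = sigma^2),
# so Var(x') = Var(x + u) should be tau^2 + sigma^2.
rng = random.Random(1)
tau, sigma, n = 1.0, 0.5, 200_000

xp = [rng.expovariate(1 / tau) + rng.gauss(0.0, sigma) for _ in range(n)]
mean = sum(xp) / n
var = sum((v - mean) ** 2 for v in xp) / n
print(var)   # ≈ tau^2 + sigma^2 = 1.25
```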
Skewness
measures the asymmetry of a distribution

γ_1 = μ'_3 / σ^3 = E{(x − μ)^3}/σ^3 = … = [E(x^3) − 3μσ^2 − μ^3]/σ^3

sometimes one finds β_1 = γ_1^2

skewness is invariant to translations and elongations
a positive skew describes a distribution with a tail which extends to the right.
Curtosis/Kurtosis
measures how pronounced the tails of the distribution are

β_2 = μ'_4 / σ^4 = E{(x − μ)^4}/σ^4 = … = [E(x^4) − 4μ E(x^3) + 6μ^2 E(x^2) − 3μ^4]/σ^4

γ_2 = β_2 − 3 is defined in such a way as to be zero for a normal (= Gaussian) distribution.

A positive γ_2 implies a relatively higher, narrower peak and wider wings than the normal distribution with the same mean and σ^2, and vice versa (wider peak, shorter wings) for negative γ_2.
Examples
3 different p.d.f.s, all with zero mean and unit variance, but different skewness and curtosis. Left: linear scale; right: logarithmic scale.
Let u(x) = (x − μ)/σ. Then E(u) = 0 and Var(u) = 1.
The r.v. u has particularly simple properties and is called a reduced (normalized) variable.
Examples (cont'd)
life-time (exponential) distribution:

⟨t^n⟩ = ∫_0^∞ t^n exp(−t/τ)/τ dt = n! τ^n

⟨t⟩ = τ
⟨t^2⟩ = 2τ^2  ⇒ μ = τ, σ^2 = τ^2
⟨t^3⟩ = 6τ^3  ⇒ γ_1 = 2: skewed, with tail to the right
⟨t^4⟩ = 24τ^4 ⇒ γ_2 = 6: higher peak and wider wings than normal dist.

life-time (blue) and normal (red) distribution, τ = 1; both distributions have identical mean and variance (indicated by dotted lines)
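The quoted values γ_1 = 2 and γ_2 = 6 can be verified numerically (added sketch; midpoint integration with a finite cutoff, τ = 1 assumed):

```python
import math

# Numerical moments of the exponential distribution (tau = 1):
# <t^n> = n!, hence gamma_1 = 2 and gamma_2 = 6.
def moment(n, tau=1.0, tmax=50.0, steps=200_000):
    h = tmax / steps   # midpoint rule; the tail beyond tmax is negligible
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += t ** n * math.exp(-t / tau) / tau
    return total * h

m1, m2, m3, m4 = (moment(n) for n in (1, 2, 3, 4))
mu = m1
var = m2 - m1 ** 2
gamma1 = (m3 - 3 * mu * var - mu ** 3) / var ** 1.5
gamma2 = (m4 - 4 * mu * m3 + 6 * mu ** 2 * m2 - 3 * mu ** 4) / var ** 2 - 3
print(gamma1, gamma2)   # ≈ 2, ≈ 6
```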
Other parameters of a distribution
mode x_m: P(x = x_m) = max
if the distribution has a differentiable probability density, the mode is determined via
df(x)/dx = 0,  d^2 f(x)/dx^2 < 0
if there is one maximum, the distribution is called unimodal, otherwise multimodal

median x_0.5: F(x_0.5) = P(x < x_0.5) = 0.5
for a continuous distribution, ∫_{−∞}^{x_0.5} f(x) dx = 0.5
the median divides the total range of x into two regions of equal probability

lower and upper quartiles: F(x_0.25) = 0.25;  F(x_0.75) = 0.75

full width at half maximum (FWHM): is independent of the tails; for a Gaussian distribution, FWHM = 2.35 σ
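The factor 2.35 follows from solving f(x) = f(μ)/2 for a Gaussian (a one-line check, added for illustration):

```python
import math

# For a Gaussian, f(mu ± x) = f(mu)/2 when exp(-x^2/(2 sigma^2)) = 1/2,
# i.e. x = sigma*sqrt(2 ln 2); hence FWHM = 2*sqrt(2 ln 2)*sigma ≈ 2.35*sigma.
fwhm_over_sigma = 2 * math.sqrt(2 * math.log(2))
print(round(fwhm_over_sigma, 2))   # 2.35
```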
Chebychev's inequality
The values of a r.v. are somewhere in the neighbourhood of the mean μ. Deviations from the mean are less probable the larger they are compared with σ. This fact is expressed by Chebychev's inequality (which is generally very weak):

P(|x − μ| > kσ) < 1/k^2,  k ≥ 1

"The probability of being more than k standard deviations away from the mean is lower than 1/k^2."

Proof for a continuous r.v.:

P := P(|x − μ| > kσ) = P((x − μ)^2 > k^2 σ^2) = ∫_{k^2 σ^2}^∞ g(t) dt, with g(t) the p.d.f. of t = (x − μ)^2

σ^2 = E{(x − μ)^2} = E(t) = ∫_0^∞ t g(t) dt = ∫_0^{k^2 σ^2} t g(t) dt + ∫_{k^2 σ^2}^∞ t g(t) dt

Since the integration is over positive values only and g(t) is positive definite (a p.d.f.), the integrals can be bounded (using ∫_a^b t g(t) dt > a ∫_a^b g(t) dt):

σ^2 > 0 + k^2 σ^2 ∫_{k^2 σ^2}^∞ g(t) dt = k^2 σ^2 P, i.e., P < 1/k^2, q.e.d.
Page 35: Example

$P(|x-\mu| > k\sigma)$ can alternatively be written as

$$P(x > \mu + k\sigma) = \int_{\mu+k\sigma}^{\infty} f(t)\,dt \quad \text{for positive deviations, and}$$
$$P(x < \mu - k\sigma) = \int_{-\infty}^{\mu-k\sigma} f(t)\,dt \quad \text{for negative deviations.}$$

Test of Chebyshev's inequality for the life-time distribution (μ = σ = τ):

$$P(x > \mu + k\sigma) \to P\bigl(x > \tau(1+k)\bigr) = \int_{\tau(1+k)}^{\infty} \frac{\exp(-t/\tau)}{\tau}\,dt = \exp\bigl(-(1+k)\bigr) \ll \frac{1}{k^2}, \quad \text{q.e.d.}$$
$$P(x < \mu - k\sigma) \to P\bigl(x < \tau(1-k)\bigr) = 0 \ \text{for } k = 1, \ \text{else not defined (cf. Fig. page 29)}$$
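The tail probability and the Chebyshev bound can be checked by simulation; a minimal Python sketch (the slides use IDL; the sample size and k chosen here are arbitrary):

```python
import math
import random

random.seed(42)
tau = 1.0          # mean and standard deviation of the life-time distribution
n = 100_000
k = 2.0            # number of standard deviations

# draw exponential variates (mu = sigma = tau)
samples = [random.expovariate(1.0 / tau) for _ in range(n)]

# empirical probability of deviating more than k*sigma from the mean
p_emp = sum(1 for t in samples if abs(t - tau) > k * tau) / n

p_exact = math.exp(-(1.0 + k))   # exact tail probability from the slide
p_cheby = 1.0 / k**2             # Chebyshev bound

print(p_emp, p_exact, p_cheby)   # empirical value near the exact one, both well below the bound
```

For k = 2 the exact tail probability is e⁻³ ≈ 0.05, a factor of five below the bound 1/k² = 0.25, illustrating how weak the inequality is.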
Page 36: Moments of a distribution

remember: central moments (of r.v. or distribution)

$$\mu'_n = E\{(x-\mu)^n\} = \int_{-\infty}^{\infty} (x-\mu)^n f(x)\,dx$$

analogue definition: moments of a distribution

$$\mu_n = E(x^n) = \int_{-\infty}^{\infty} x^n f(x)\,dx \quad \text{or} \quad \mu_n = E(x^n) = \sum_{k=1}^{\infty} x_k^n P(x_k)$$

remember as well: $\mu_1 = \langle x \rangle = E(x) = \mu$ and

$$\mu'_1 = 0, \qquad \mu'_2 = \sigma^2 = \mathrm{Var}(x), \qquad \mu'_3 = \gamma_1 \sigma^3$$

the probability density function is uniquely defined by its moments, as we will show now.
Page 37: Characteristic function

definition: The characteristic function of a p.d.f. f(x) is

$$\phi(t) = E(e^{itx}) = \int_{-\infty}^{\infty} e^{itx} f(x)\,dx \quad \text{or} \quad \phi(t) = \sum_{k=1}^{\infty} e^{itx_k} P(x_k)$$

(Note: the lower summation index might also be 0.)

for a continuous distribution, the characteristic function is the Fourier transform of f(x) (note the (missing) normalization). Thus, the transform is invertible ... and the characteristic function defines the p.d.f.
Page 38: Characteristic function and moments

The n-th derivative of the characteristic function is

$$\frac{d^n \phi(t)}{dt^n} = \int_{-\infty}^{\infty} (ix)^n\, e^{itx} f(x)\,dx$$

At t = 0 one obtains

$$\left.\frac{d^n \phi(t)}{dt^n}\right|_{t=0} = i^n \int_{-\infty}^{\infty} x^n f(x)\,dx = i^n \mu_n$$

Thus, the Taylor expansion of φ(t) around t = 0,

$$\phi(t) = \sum_{n=0}^{\infty} \frac{t^n}{n!} \left.\frac{d^n \phi(t)}{dt^n}\right|_{t=0} = \sum_{n=0}^{\infty} \frac{(it)^n}{n!}\, \mu_n,$$

delivers all moments of the distribution. Since the Fourier transform can be uniquely inverted and the Taylor expansion of the characteristic function consists of the moments, we conclude that indeed the moments define the p.d.f., as stated above.

For the central moments, we find in analogy

$$\phi'(t) = E\bigl(e^{it(x-\mu)}\bigr) = \int_{-\infty}^{\infty} e^{it(x-\mu)} f(x)\,dx \to \sum_{n=0}^{\infty} \frac{(it)^n}{n!}\, \mu'_n$$

Note in particular that

$$\mu'_2 = \sigma^2 = -\left.\frac{d^2 \phi'(t)}{dt^2}\right|_{t=0}$$
Page 39: Sum of two independent r.v.

The p.d.f. of a sum of two independent r.v. is the (inverse) Fourier transform of the product of the two corresponding characteristic functions!

Let z = x + y with independent r.v. x, y and corresponding p.d.f.s f(x), g(y). Calculate the distribution h(z).

$$\phi_h(t) = E\bigl(e^{it(x+y)}\bigr) = E\bigl(e^{itx} e^{ity}\bigr) \overset{x,y\ \text{independent}}{=} E(e^{itx})\,E(e^{ity}), \quad \text{i.e.,}$$
$$\phi_h(t) = \phi_f(t)\,\phi_g(t)$$

and thus

$$h(z) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-itz}\, \phi_h(t)\,dt$$
Page 40: Example

characteristic function and moments of the exponential distribution:

$$f(x) = \lambda e^{-\lambda x} \ \text{for } x \ge 0 \quad \left(\lambda = \frac{1}{\tau}, \ \text{e.g., for the life-time distribution}\right)$$

$$\phi(t) = \lambda \int_0^\infty e^{itx}\, e^{-\lambda x}\,dx = \lambda \left.\frac{e^{-(\lambda - it)x}}{-(\lambda - it)}\right|_0^\infty = \frac{\lambda}{\lambda - it}$$

By differentiation,

$$\frac{d\phi(t)}{dt} = \frac{i\lambda}{(\lambda - it)^2}, \qquad \frac{d^n\phi(t)}{dt^n} = \frac{n!\, i^n\, \lambda}{(\lambda - it)^{n+1}}, \qquad \left.\frac{d^n\phi(t)}{dt^n}\right|_{t=0} = \frac{n!\, i^n}{\lambda^n},$$

we obtain the moments

$$\mu_n = n!\,\lambda^{-n} = n!\,\tau^n, \quad \text{e.g.,}$$
$$\mu = \mu_1 = \tau, \qquad \sigma^2 = \mu_2 - \mu_1^2 = \tau^2, \qquad \gamma_1 = (\mu_3 - 3\mu\sigma^2 - \mu^3)/\sigma^3 = 2$$

(compare Fig. page 29), without explicitly calculating the integrals defining the expectation values!
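The result μₙ = n! τⁿ can be cross-checked by direct numerical integration of the defining expectation values; a minimal Python sketch (τ = 0.5, the integration cutoff, and the step count are arbitrary choices):

```python
import math

tau = 0.5                      # assumed value of the life-time parameter
lam = 1.0 / tau

def moment(n, upper=40.0, steps=200_000):
    """n-th moment of f(x) = lam*exp(-lam*x), via the trapezoidal rule."""
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        x = i * h
        w = 0.5 if i in (0, steps) else 1.0
        total += w * x**n * lam * math.exp(-lam * x)
    return total * h

# compare the numerical integrals with n! * tau^n from the slide
for n in range(1, 5):
    print(n, moment(n), math.factorial(n) * tau**n)
```

The agreement confirms the moments obtained from the characteristic function without any analytic integration of xⁿ f(x).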
Page 41: Transformation of variables

given a p.d.f. f(x), we would like to know the p.d.f. g(u), when u is a (uniquely invertible) function of x, u(x).
example: given a distribution of velocities f(v), we want to calculate the distribution of energies, E = ½mv².

for discrete distributions, this is trivial. The probability for the event u(x_k) (where u is a function of x) is the same as for the event x_k itself,

P(u(x_k)) = P(x_k)

for continuous distributions, we have to invoke calculus.
Page 42: Calculation of the transformed p.d.f.

p.d.f. f(x) and a uniquely invertible function u(x) are given. Calculate g(u):

$$P(x_1 < x < x_2) = P(u_1 < u < u_2) \quad \text{with } u_1 = u(x_1) \text{ and } u_2 = u(x_2)$$
$$P = \int_{x_1}^{x_2} f(x)\,dx = \int_{u_1}^{u_2} g(u)\,du \ \Rightarrow \ g(u)\,du = f(x)\,dx, \quad \text{and thus}$$
$$g(u) = f(x)\left|\frac{dx}{du}\right|$$

The absolute sign guarantees that the p.d.f. is positive. Integrating this equation yields F(x) = G(u).

If u(x) is invertible, but no longer uniquely, and thus x(u) is ambiguous, one has to sum over all contributing branches:

$$g(u) = \left\{f(x)\left|\frac{dx}{du}\right|\right\}_{\text{branch 1}} + \left\{f(x)\left|\frac{dx}{du}\right|\right\}_{\text{branch 2}} + \ldots$$

Transformation of a p.d.f. f(x) to g(u) via u(x). The indicated areas are equal.
Transformation via a parabola. The sum of the indicated areas under f(x) is equal to the area under g(u).
Page 43: Examples

example 1 (uniform): calculate the p.d.f. for the area of a circle from a uniform distribution of radii between 0 and r_m (see Sect. IV).

p.d.f. for r: $f(r) = \dfrac{1}{r_m - 0}$ for $0 < r < r_m$; $f(r) = 0$ else.

$$g(A) = f(r)\left|\frac{dr}{dA}\right| \quad \text{with } A = \pi r^2; \qquad \frac{dA}{dr} = 2\pi r = 2\sqrt{\pi A}$$
$$g(A) = \frac{1}{r_m}\,\frac{1}{2\sqrt{\pi A}} = \frac{1}{2 r_m \sqrt{\pi}}\,A^{-1/2}; \qquad \text{Test: } \int_0^{A_m} g(A)\,dA = \frac{1}{2 r_m \sqrt{\pi}} \int_0^{\pi r_m^2} A^{-1/2}\,dA = 1\,!$$

example 2: calculate the distribution for the square of a reduced r.v. which itself should be normally distributed:

$$u = \left[\frac{x-\mu}{\sigma}\right]^2 \quad \text{and} \quad f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(x-\mu)^2}{2\sigma^2}} \quad \text{(see Sect. IV)}$$

The function x(u) has two branches!
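Example 1 can be checked by Monte Carlo: sample radii uniformly, transform to areas, and compare the empirical c.d.f. with the one implied by g(A). A minimal Python sketch (r_m = 2 and the test point A₀ = 1 are arbitrary choices):

```python
import math
import random

random.seed(1)
r_m = 2.0                      # assumed maximum radius
n = 200_000

# sample radii uniformly on (0, r_m) and transform to areas A = pi r^2
areas = [math.pi * random.uniform(0.0, r_m) ** 2 for _ in range(n)]

# empirical probability that A < A0, compared with the analytic c.d.f.
# G(A0) = integral of 1/(2 r_m sqrt(pi A)) from 0 to A0 = sqrt(A0/pi)/r_m
A0 = 1.0
p_emp = sum(1 for a in areas if a < A0) / n
p_ana = math.sqrt(A0 / math.pi) / r_m
print(p_emp, p_ana)            # the two values should agree closely
```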
Page 44: Examples (cont'd)

With $x = \mu \pm \sigma\sqrt{u}$,

$$\frac{dx}{du} = \pm\frac{\sigma}{2\sqrt{u}}; \qquad g(u) = \left\{\frac{1}{\sigma\sqrt{2\pi}}\,e^{-u/2}\,\frac{\sigma}{2\sqrt{u}}\right\}_{\text{branch 1}} + \left\{\frac{1}{\sigma\sqrt{2\pi}}\,e^{-u/2}\,\frac{\sigma}{2\sqrt{u}}\right\}_{\text{branch 2}}$$

Since the contributions from branch 1 and 2 are identical, we obtain

$$g(u) = \frac{1}{\sqrt{2\pi u}}\,e^{-u/2},$$

which is the so-called χ²-distribution for one degree of freedom (see Sect. IV).

example 3: kinetic energy for a 1-D ideal gas. The p.d.f. of the velocity of a particle into direction x is

$$f(v) = \sqrt{\frac{m}{2\pi kT}}\,e^{-\frac{mv^2}{2kT}}.$$

Calculate the corresponding energy distribution. As above, $\dfrac{dv}{dE} = \pm\dfrac{1}{\sqrt{2mE}}$; both branches have similar contributions, thus

$$g(E) = 2\sqrt{\frac{m}{2\pi kT}}\,e^{-E/kT}\,\frac{1}{\sqrt{2mE}} = \frac{1}{\sqrt{\pi kTE}}\,e^{-E/kT}$$
Page 45: Calculation of the transformation

Now, the original and the transformed p.d.f., f(x) and g(u), are given, and the transformation u(x) needs to be calculated. This situation is frequently met in Monte-Carlo simulations. Random number generators usually create uniformly distributed r.v., and we look for the transformation law which transforms these uniformly distributed r.v. into others which are distributed following a given p.d.f. (defined by the process to be investigated).

$$\int_{-\infty}^{x} f(x')\,dx' = \int_{-\infty}^{u} g(u')\,du'; \quad \text{integration yields the c.d.f.s}$$
$$F(x) = G(u) \quad \text{and thus} \quad u(x) = G^{-1}\bigl(F(x)\bigr)$$
Page 46

The problem can be solved analytically only if both p.d.f.s f and g can be integrated analytically, and if the inverse of G can be calculated. In other cases (which are the majority), numerical methods have to be applied. Most powerful is the rejection method by von Neumann (see, e.g., "Numerical Recipes" and http://www.usm.uni-muenchen.de/people/puls/lessons/numpraktnew/montecarlo/mc_manual.pdf).

example: In the case of f being a uniform distribution over the unit interval, i.e., f(x) = 1 for 0 ≤ x ≤ 1 and f(x) = 0 else, we obtain F(x) = x and thus G(u) = x; u = G⁻¹(x).

Create exponentially distributed r.v. from a uniform distribution:

$$g(u) = \lambda e^{-\lambda u};$$
$$G(u) = \int_0^u \lambda e^{-\lambda u'}\,du' = 1 - e^{-\lambda u} \overset{!}{=} F(x) = x \quad \text{(uniformly distr. } x \text{ in unit interval)}$$
$$1 - e^{-\lambda u} = x \ \Rightarrow \ u(x) = -\ln(1-x)/\lambda = -\ln(x)/\lambda$$

(since with x also 1 − x is uniformly distributed in the unit interval).
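The transformation u(x) = −ln(1−x)/λ can be sketched directly; a minimal Python version of the method (λ = 2 matches the figure on the next page, the sample size is arbitrary):

```python
import math
import random

random.seed(7)
lam = 2.0                      # rate parameter, as in the figure on the next page
n = 100_000

# inverse transform: u = -ln(1 - x)/lam maps uniform x to exponential u
u = [-math.log(1.0 - random.random()) / lam for _ in range(n)]

mean = sum(u) / n
print(mean)                    # sample mean close to the expectation value 1/lam
```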
Page 47

P.d.f.s for a uniform distribution (black), generated by a random number generator from N=10³ (left) and N=10⁶ (right) subsequent numbers. The corresponding exponential distribution (λ=2, blue) has been created from these numbers using the transformation method as described above. Displayed are histograms with bin size 0.02. Analytical p.d.f.s in green and red. IDL (interactive data language) code below.
Page 48: III. Distributions of several random variables ‒ multivariate p.d.f.s

until now, univariate distributions: one r.v.
generalization to several r.v. "easy": multivariate (also: more-dimensional) distributions
in the following, only continuous distributions

definition of the probability distribution for two r.v., x, y:

$$F(x,y) = P(x' < x,\ y' < y) \quad \text{with} \quad F(-\infty,-\infty) = 0, \quad F(\infty,\infty) = 1$$

corresponding joint p.d.f.:

$$f(x,y) = \frac{\partial^2 F(x,y)}{\partial x\,\partial y} \ \Rightarrow \ \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x,y)\,dx\,dy = 1 \quad \text{and}$$
$$P(a \le x < b,\ c \le y < d) = \int_a^b \int_c^d f(x,y)\,dx\,dy$$
Page 49: Marginal distributions

following problem: sometimes the c.d.f. F(x,y) is approximately determined (by many measurements), but only the probability distribution of x (irrespective of y) is of interest. example: the appearance of a certain disease is known as a function of location and date. For a certain investigation, the dependence on date is without interest. In this case, we marginalize the distribution, i.e., we integrate over the whole range in y:

$$P(a \le x < b,\ -\infty < y < \infty) = \int_a^b \left[\int_{-\infty}^{\infty} f(x,y)\,dy\right] dx = \int_a^b g(x)\,dx$$
Page 50

$$g(x) = \int_{-\infty}^{\infty} f(x,y)\,dy$$

is a p.d.f. of x, called the marginal distribution of x. The corresponding distribution of y is

$$h(y) = \int_{-\infty}^{\infty} f(x,y)\,dx$$

Marginal distributions are "projections" of the joint p.d.f. onto the axes.

Two r.v. x, y are independent if f(x,y) = g(x) h(y).

Now, we can define the conditional probability for y' given that x' is known: P(y ≤ y' < y+dy | x ≤ x' < x+dx). The corresponding p.d.f. is given by

$$f(y|x) = \frac{f(x,y)}{g(x)},$$

and the above probability results as f(y|x) dy. Note: conditional probabilities as defined above are normalized!
Page 51

The rule of total probability (see Sect. I) is then expressed by

$$h(y) = \int_{-\infty}^{\infty} f(x,y)\,dx = \int_{-\infty}^{\infty} f(y|x)\,g(x)\,dx.$$

If the variables are independent, then

$$f(y|x) = \frac{f(x,y)}{g(x)} = \frac{g(x)\,h(y)}{g(x)} = h(y).$$

Any constraint on one variable cannot contribute information about the other, if the variables are independent!

Bayes' theorem for two-dimensional distributions:

$$f(x|y)\,h(y) = f(y|x)\,g(x) = f(x,y)$$
Page 52: Example

superposition of two normal distributions, with corresponding marginal and conditional p.d.f.s:

$$g(x) = \int f(x,y)\,dy, \qquad h(y) = \int f(x,y)\,dx$$
$$f(y|x=1) = f(x=1, y)/g(1), \quad \text{with } g(1) = 0.6672.$$

Remember that this conditional p.d.f. is normalized, i.e., $\int f(y|x=1)\,dy = 1$. Note: since f(y|x) depends on x, x and y are not independent!
Page 53: Example (cont'd)
Page 54: Moments

in analogy to univariate distributions, we define (for two r.v. x, y)

$$\mu'_{20} = \sigma_x^2, \qquad \mu'_{02} = \sigma_y^2,$$
$$\mu'_{11} = E\bigl((x-\mu_x)(y-\mu_y)\bigr) = \mathrm{cov}(x,y) \quad \text{(the "covariance")}$$
Page 55

similarly, we define

$$E\bigl(u(x,y)\bigr) = \iint u(x,y)\,f(x,y)\,dx\,dy$$
$$\sigma^2\bigl(u(x,y)\bigr) = E\Bigl\{\bigl[u(x,y) - E(u(x,y))\bigr]^2\Bigr\} = E\bigl(u^2(x,y)\bigr) - \bigl(E(u(x,y))\bigr)^2$$

examples:

i) $u(x,y) = ax + by \ \Rightarrow \ E(ax+by) = aE(x) + bE(y)$ (cf. Sect. II)

$$\sigma^2(ax+by) = E\Bigl[\bigl(ax+by - E(ax+by)\bigr)^2\Bigr] = E\Bigl[\bigl(a(x-\mu_x) + b(y-\mu_y)\bigr)^2\Bigr]$$
$$= E\bigl[a^2(x-\mu_x)^2 + b^2(y-\mu_y)^2 + 2ab(x-\mu_x)(y-\mu_y)\bigr]$$
$$= a^2\sigma^2(x) + b^2\sigma^2(y) + 2ab\,\mathrm{cov}(x,y)$$

ii) $u(x,y) = xy$ and x, y independent, i.e., $f(x,y) = g(x)\,h(y)$ (cf. Sect. III) $\Rightarrow$

$$E(xy) = \iint xy\,f(x,y)\,dx\,dy = \iint xy\,g(x)\,h(y)\,dx\,dy = \int x\,g(x)\,dx \int y\,h(y)\,dy = E(x)\,E(y)$$
$$\mathrm{cov}(x,y) = \iint (x-\mu_x)(y-\mu_y)\,g(x)\,h(y)\,dx\,dy = 0\ !!!$$
Page 56: Covariance, correlation coefficient

from the definition of covariance, we see that
• cov(x,y) is positive if values x>μ_x (x<μ_x) appear preferentially together with values y>μ_y (y<μ_y).
• cov(x,y) is negative if values x>μ_x (x<μ_x) appear preferentially together with values y<μ_y (y>μ_y).
• if the knowledge of x does not give information about the probable position of y, the covariance vanishes (see Fig. below).

if cov(x,y) ≠ 0, the variables x, y are called correlated, otherwise uncorrelated. Correlation is quantified by the dimensionless correlation coefficient

$$\rho(x,y) = \frac{\mathrm{cov}(x,y)}{\sigma(x)\,\sigma(y)}, \qquad -1 \le \rho(x,y) \le 1;$$

the limiting values are reached when y = a + bx and b > 0 (ρ = 1) or b < 0 (ρ = −1).

proof: calculate cov(x,y) = E(xy) − E(x)E(y) with y = a + bx: then cov(x,y) = bσ²(x) and σ(y) = |b|σ(x), so ρ = ±1; alternatively, use cov(x,y) = ½[σ²(x+y) − σ²(x) − σ²(y)].

f(x,y) = const for different correlation coefficients (linear dependency: f(x,y) = f(y|x) f(x) with f(y|x) = δ(y − (a+bx))).
Page 57

Note: for independent (uncorrelated) variables → cov(x,y) = 0. But: cov(x,y) = 0 does not necessarily imply that x, y are independent, since the covariance detects only linear dependencies.
Example: let x be uniformly distributed on [−1,1], and y = x².
• Then: y depends on x, but cov(x,y) = E(x³) − E(x)E(x²) = 0, since the expectation values of odd powers vanish!
In other words: there are cases when cov(x,y) = 0, but the conditional p.d.f. f(y|x) depends on x. Independence is only warranted if f(y|x) = f(y)!
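The counterexample is easy to reproduce numerically; a minimal Python sketch (sample size and seed are arbitrary):

```python
import random

random.seed(3)
n = 200_000
xs = [random.uniform(-1.0, 1.0) for _ in range(n)]
ys = [x * x for x in xs]          # y is fully determined by x ...

mx = sum(xs) / n
my = sum(ys) / n
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
print(cov)                        # ... yet the sample covariance is ~0
```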
Page 58: Transformation of variables

analogous to the 1-D (univariate) case:
• given f(x,y) and u(x,y), v(x,y). Then:

$$g(u,v)\,du\,dv = f(x,y)\,dx\,dy \ \Rightarrow \ g(u,v) = f(x,y)\cdot\left|\frac{\partial(x,y)}{\partial(u,v)}\right|$$

(absolute value of the Jacobi determinant)

example: transform the 2-D normal distribution $\frac{1}{2\pi}\,e^{-(x^2+y^2)/2}$ into polar coordinates, $x = r\cos\varphi$, $y = r\sin\varphi$:

$$\left|\frac{\partial(x,y)}{\partial(r,\varphi)}\right| = \begin{vmatrix} \cos\varphi & -r\sin\varphi \\ \sin\varphi & r\cos\varphi \end{vmatrix} = r$$
$$\Rightarrow \ g(r,\varphi) = \frac{1}{2\pi}\,r\,e^{-r^2/2},$$

with marginal distributions

$$g(\varphi) = \int_0^\infty g(r,\varphi)\,dr = \frac{1}{2\pi} \quad \text{and} \quad g(r) = \int_0^{2\pi} g(r,\varphi)\,d\varphi = r\,e^{-r^2/2},$$

i.e., the distribution factorizes into the marginal distributions (independent variates!)
Page 59: Reduction of variables

problem: we have f(x,y), and need g(u) with u(x,y).
solution: use the standard transformation, by introducing a 2nd variable v(x,y) (usually, choose v = x):

$$f(x,y) \to h(u,v), \quad \text{and marginalize with respect to } v: \quad g(u) = \int h(u,v)\,dv$$

example: given the 2-D uniform distribution

$$f(x,y) = \begin{cases} \dfrac{1}{\Delta^2} & \text{if } x \in [0,\Delta] \text{ and } y \in [0,\Delta] \\[4pt] 0 & \text{else} \end{cases}$$

(Note: f(x,y) already normalized!). Calculate g(u) with u(x,y) = x + y; choose v = x:

$$\left|\frac{\partial(u,v)}{\partial(x,y)}\right| = \begin{vmatrix} 1 & 1 \\ 1 & 0 \end{vmatrix} = 1 \ \Rightarrow \ h(u,v) = f(x,y)\cdot 1 = \frac{1}{\Delta^2}$$

(see next page, left figure)
Page 60

$$g(u) = \int h(u,v)\,dv = \int_{x(u)_{\min}}^{x(u)_{\max}} h(u,x)\,dx = \frac{1}{\Delta^2}\bigl(x_{\max}(u) - x_{\min}(u)\bigr)$$

From the above figure (middle): $u(x) = x + y \in [x, x+\Delta]$, since $y \in [0,\Delta]$.

$$u < \Delta: \ x \in [0, u] \ \Rightarrow \ g(u) = \frac{1}{\Delta^2}(u - 0) = \frac{u}{\Delta^2} \quad \text{(slope } 1/\Delta^2\text{)}$$
$$u > \Delta: \ x \in [u-\Delta, \Delta] \ \Rightarrow \ g(u) = \frac{1}{\Delta^2}\bigl(\Delta - (u-\Delta)\bigr) = \frac{2\Delta - u}{\Delta^2}$$
$$g_{\max} = g(\Delta) = \frac{1}{\Delta}$$

The distribution of the sum of two uniformly distributed quantities is triangular-shaped, see above figure (right).
Note: the distribution of x − y looks similar, when the abscissa is shifted by −Δ.
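The triangular shape and its peak value g_max = 1/Δ can be verified by Monte Carlo; a minimal Python sketch (Δ = 1, the bin width, and the sample size are arbitrary):

```python
import random

random.seed(5)
delta = 1.0
n = 300_000

# sum of two independent uniforms on [0, delta]
u = [random.uniform(0, delta) + random.uniform(0, delta) for _ in range(n)]

# empirical density in a narrow bin around the peak u = delta,
# compared with g_max = 1/delta from the slide
width = 0.05
p_peak = sum(1 for s in u if abs(s - delta) < width / 2) / n / width
print(p_peak)                  # close to 1/delta
```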
Page 61: Calculation of the transformation

as in the 1-D case: integration and inversion of the primitive function.
important example: Box-Muller algorithm to create normally distributed variates from a uniform distribution (random number generator).

remember the 2-D normal distribution in polar coordinates:

$$g(r,\varphi)\,dr\,d\varphi = \frac{1}{2\pi}\,r\,e^{-r^2/2}\,dr\,d\varphi \quad \text{(factorized in } r \text{ and } \varphi)$$

distribution in r:

$$G(r) = \int_0^r r'\,e^{-r'^2/2}\,dr' \overset{!}{=} F(x_1) = x_1 \quad \text{(uniform distribution w.r.t. } [0,1])$$
$$G(r) = \left.-e^{-r'^2/2}\right|_0^r = 1 - e^{-r^2/2} = x_1 \ \Rightarrow \ r = \sqrt{-2\ln(1-x_1)}$$

distribution in φ:

$$H(\varphi) = \int_0^\varphi \frac{1}{2\pi}\,d\varphi' \overset{!}{=} F(x_2) = x_2 \quad \text{(uniform distribution)}$$
$$H(\varphi) = \frac{\varphi}{2\pi} = x_2 \ \Rightarrow \ \varphi = 2\pi x_2$$
Page 62

in Cartesian coordinates:

$$x = r\cos\varphi = \sqrt{-2\ln(1-x_1)}\,\cos(2\pi x_2) = \sqrt{-2\ln(x_1)}\,\cos(2\pi x_2)$$
$$y = r\sin\varphi = \sqrt{-2\ln(1-x_1)}\,\sin(2\pi x_2) = \sqrt{-2\ln(x_1)}\,\sin(2\pi x_2)$$

(x₁, x₂ two uniformly distributed variates). These variables are independent and normally distributed with expectation value zero and unit variance:

$$f(x,y) = \frac{1}{2\pi}\,e^{-(x^2+y^2)/2} = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}\cdot\frac{1}{\sqrt{2\pi}}\,e^{-y^2/2}$$

Thus: two normally distributed variates x, y.

P.d.f.s for a uniform distribution (black), generated by a random number generator from N=10⁵ subsequent numbers. The corresponding normal distribution (blue) has been created from these numbers using the Box-Muller algorithm. Displayed are histograms with bin size 0.02. Analytical p.d.f.s in green and red.
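The algorithm can be sketched in a few lines; a minimal Python version (the slides use IDL; seed and sample size are arbitrary):

```python
import math
import random

random.seed(11)

def box_muller():
    """Return two independent standard normal variates from two uniforms."""
    x1, x2 = random.random(), random.random()
    r = math.sqrt(-2.0 * math.log(1.0 - x1))
    return r * math.cos(2.0 * math.pi * x2), r * math.sin(2.0 * math.pi * x2)

n = 100_000
samples = []
for _ in range(n // 2):
    samples.extend(box_muller())

mean = sum(samples) / len(samples)
var = sum(s * s for s in samples) / len(samples) - mean**2
print(mean, var)               # close to expectation value 0 and unit variance
```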
Page 63: Distributions with more than two variables

probability density: $f(x_1, x_2, x_3, \ldots, x_N) = f(\mathbf{x})$ (in vector notation)

expectation value:

$$E\bigl(u(\mathbf{x})\bigr) = \int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty} u(\mathbf{x})\,f(\mathbf{x})\,\prod_{i=1}^{N} dx_i$$

particularly important is the covariance matrix C (see Sect. V):

$$C_{ij} = \mathrm{cov}(x_i, x_j) = E\bigl\{(x_i - \mu_i)(x_j - \mu_j)\bigr\}$$

The covariance matrix is symmetric, and the diagonal elements are the variances:

$$C_{ii} = \mathrm{Var}(x_i) = \sigma^2(x_i)$$

Matrix notation: with $\mathbf{x}^T = (x_1, x_2, x_3, \ldots, x_N)$ and $\boldsymbol{\mu}^T = E(\mathbf{x}^T) = (\mu_1, \ldots, \mu_N)$:

$$C = E\bigl((\mathbf{x}-\boldsymbol{\mu})(\mathbf{x}-\boldsymbol{\mu})^T\bigr)$$
Page 64

transformation of variables, with Jacobi determinant:

$$g(\mathbf{y}) = f(\mathbf{x})\,\left|\frac{\partial(x_1 \ldots x_N)}{\partial(y_1 \ldots y_N)}\right|$$

independent, identically distributed (i.i.d.) variables (German: u.i.v. = unabhängig, identisch verteilt):
For parameter estimates, a sample of N independent measurements might be used. The p.d.f. for N independent variables which are identically distributed according to f(x) is given by

$$f(\mathbf{x}) = \prod_{i=1}^{N} f(x_i)$$
Page 65: IV. Important distributions

Binomial distribution

experiment with two mutually exclusive outcomes, i.e.,

$$S = A + \bar{A} \quad \text{with} \quad P(A) = p \ \text{and} \ P(\bar{A}) = 1 - p = q$$

calculate the probability that n experiments have k times the outcome A.
• What is the probability to obtain (exactly!) 4 times the six when rolling the die 10 times? [n = 10, k = 4, p(A) = 1/6, p(Ā) = 5/6] Answer: ≈ 0.054
• What is the probability to toss "number" only one time in 20 trials? [n = 20, k = 1, p(A) = p(Ā) = 1/2] Answer: ≈ 1.9·10⁻⁵

let's assign the random variable x_i to the outcome of experiment i: x_i = 1 if the result A occurs, and x_i = 0 if Ā occurs. Our above question can then be rephrased as the question regarding the distribution of the random number

$$x = \sum_{i=1}^{n} x_i,$$

and, particularly, the probability P(x = k).
Page 66

The answer depends on two factors:

i) What is the probability to obtain the result A in the first k experiments and Ā in the remaining n−k? Since the experiments are independent, this probability is given by the product of the probabilities of the individual events, i.e.,

$$p^k (1-p)^{n-k}$$

ii) How many possibilities for the event "k times result A in n experiments" do exist? This is given by the binomial coefficient,

$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$$

Thus, the probability P(x = k) is given by

$$B_p^n(k) = \frac{n!}{k!\,(n-k)!}\,p^k (1-p)^{n-k}$$
Page 67

expectation value and variance of a single experiment i:

$$E(x_i) = 1\cdot p + 0\cdot(1-p) = p$$
$$\mathrm{Var}(x_i) = E(x_i^2) - \bigl(E(x_i)\bigr)^2 = 1^2\cdot p + 0^2\cdot(1-p) - p^2 = p(1-p) = pq$$

The corresponding values for the random variable $x = \sum_{i=1}^{n} x_i$ are (exploiting the calculation rules for independent variates)

$$E(x) = \langle k \rangle = np \quad \text{("mean number of successes")}$$
$$\mathrm{Var}(x) = \sigma^2(x) = np(1-p) = npq$$

(cumulative) distribution function:

$$F(k) = P(k' < k) = \sum_{k'=0}^{k-1} B_p^n(k') = \sum_{k'=0}^{k-1} \frac{n!}{k'!\,(n-k')!}\,p^{k'} (1-p)^{n-k'}$$
Page 68

Binomial distribution $B_p^n(k)$ as a function of k. Top panel: fixed p, different n; middle: fixed n, different p; bottom: different values of n and p, but np = const.
Page 69

Example: Detector efficiency
• spark chambers (95% efficient) are used to measure the tracks of cosmic rays. At least three points are needed to define a track. How efficient is a stack of three chambers? Would using 4 or 5 chambers give significant improvement?

The probability of three hits from three chambers is

$$P(3; 3, 0.95) = B_{0.95}^3(3) = \frac{3!}{3!\,0!}\,p^3 (1-p)^0 = 0.95^3 = 0.857$$

For four chambers, the probability of three or four hits is
P(3; 4, 0.95) + P(4; 4, 0.95) = 0.171 + 0.815 = 0.986

For five chambers, the probability of three, four or five hits is
P(3; 5, 0.95) + P(4; 5, 0.95) + P(5; 5, 0.95) = 0.021 + 0.204 + 0.774 = 0.999!
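The three stack efficiencies above can be reproduced with a short binomial sum; a minimal Python sketch (function name and parameters are illustrative only):

```python
from math import comb

def stack_efficiency(n_chambers, p=0.95, min_hits=3):
    """Probability of at least min_hits hits from n_chambers (binomial)."""
    return sum(comb(n_chambers, k) * p**k * (1 - p)**(n_chambers - k)
               for k in range(min_hits, n_chambers + 1))

for n in (3, 4, 5):
    print(n, round(stack_efficiency(n), 3))   # 0.857, 0.986, 0.999
```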
Page 70: Multinomial distribution

binomial distribution: 2 different outcomes
multinomial: more than 2 different outcomes, mutually exclusive!

$$S = A_1 + A_2 + A_3 + \ldots + A_l \quad \text{with} \quad P(A_j) = p_j \ \text{and} \ \sum_{j=1}^{l} p_j = 1$$

When n experiments are performed, the probability of finding k_j events of type A_j is given by

$$M(k_1, k_2, k_3, \ldots, k_l) = \frac{n!}{\prod_{j=1}^{l} k_j!}\,\prod_{j=1}^{l} p_j^{k_j}$$

We define x_ij = 1 if experiment i yields A_j, and 0 otherwise. Then $x_j = \sum_{i=1}^{n} x_{ij}$ and

$$E(x_j) = n p_j, \quad \text{with covariance matrix}$$
$$C_{ij} = n p_i (\delta_{ij} - p_j) \quad (\delta_{ij}\ \text{Kronecker }\delta), \ \text{i.e.,}$$
$$C_{ii} = n p_i (1 - p_i) \ \text{as before, but nonvanishing, negative correlation:}$$
$$C_{ij} = -n p_i p_j \quad (i \ne j)$$

That there is a correlation was to be expected, since the x_j are not independent due to the constraint Σ p_j = 1. I.e., if there are more successes for class i than expected (E(x_i)), the values of x_j for all other classes j are smaller than E(x_j) ⇒ negative correlation!
![Page 71: Statistical Methods - uni-muenchen.de · Statistical Methods An Introduction for (Astro-)Physicists. 2 USM Content Fundamental terms of statistics and data analysis, with examples](https://reader034.fdocuments.in/reader034/viewer/2022050715/5f2ed4ed31bc784f3d65d071/html5/thumbnails/71.jpg)
71
USM
Frequency; law of large numbers

Probabilities, e.g., $p_j$ in the case of the multinomial distribution, are usually not known a priori but have to be obtained from experiments. The frequency of event $A_j$ in $n$ experiments is given by

$$h_j = \frac{x_j}{n} = \frac{1}{n}\sum_{i=1}^{n} x_{ij}$$

This frequency is a random number, since it depends on the results of the particular $n$ experiments.

$$E(h_j) = E\!\left(\frac{x_j}{n}\right) = \frac{1}{n}\,E(x_j) = p_j,$$

i.e., the expectation value of the frequency of an event is the corresponding probability, and

$$\mathrm{Var}(h_j) = \mathrm{Var}\!\left(\frac{x_j}{n}\right) = \frac{1}{n^2}\,\mathrm{Var}(x_j) = \frac{1}{n}\,p_j(1 - p_j) \;\Rightarrow\; \sigma(h_j) \propto \frac{1}{\sqrt{n}}$$

This is the law of large numbers! For large $n$, the standard deviation of the frequency falls below any given limit, which justifies the frequency definition of probability (cf. Sect. I).
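The $1/\sqrt{n}$ shrinkage of the frequency's scatter can be seen directly in a simulation (the function name is mine):

```python
import random

random.seed(42)

# Estimate the frequency of "success" with p = 0.5 for increasing n;
# the scatter around p shrinks like 1/sqrt(n).
p = 0.5

def frequency(n):
    """Observed frequency of success in n Bernoulli trials."""
    return sum(random.random() < p for _ in range(n)) / n

for n in (100, 10000, 1000000):
    print(n, frequency(n))
```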
![Page 72: Statistical Methods - uni-muenchen.de · Statistical Methods An Introduction for (Astro-)Physicists. 2 USM Content Fundamental terms of statistics and data analysis, with examples](https://reader034.fdocuments.in/reader034/viewer/2022050715/5f2ed4ed31bc784f3d65d071/html5/thumbnails/72.jpg)
72
USM
Poisson distribution

The study of the lower panel of the last figure (binomial distribution) suggests that this distribution approaches a fixed distribution if $n$ tends to infinity while the product (the expectation value) $np = \lambda$ is kept constant. Indeed,

$$P(k; n, p) = P(k; n, \lambda/n) = \frac{n!}{k!\,(n-k)!}\left(\frac{\lambda}{n}\right)^{k}\left(1 - \frac{\lambda}{n}\right)^{n-k}$$

$$\frac{n!}{(n-k)!} = n(n-1)(n-2)\cdots(n-k+1) \;\to\; n^k \quad\text{for}\ n \to \infty$$

$$\left(1 - \frac{\lambda}{n}\right)^{n} \to e^{-\lambda} \quad\text{(definition of the exp function)}, \qquad \left(1 - \frac{\lambda}{n}\right)^{-k} \to 1 \quad\text{for}\ n \to \infty$$

Thus,

$$P(k; n, \lambda/n) \;\to\; P(k, \lambda) = \frac{\lambda^k\,e^{-\lambda}}{k!},$$

which is the Poisson distribution and describes the probability of obtaining $k$ events if the expected number is $\lambda$.

Calculation: start with $P(0) = e^{-\lambda}$, and then successively multiply by $\lambda$ and divide by $1, 2, 3, 4, \dots$ to obtain $P(1)$, $P(2)$, etc.
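The recursive calculation scheme just described can be sketched as follows (function name is mine), with a cross-check against the direct formula:

```python
from math import exp, factorial

def poisson_table(lam, kmax):
    """Build P(0..kmax) recursively: P(k) = P(k-1) * lam / k, starting at exp(-lam)."""
    probs = [exp(-lam)]
    for k in range(1, kmax + 1):
        probs.append(probs[-1] * lam / k)
    return probs

lam = 2.5
table = poisson_table(lam, 10)
direct = [lam**k * exp(-lam) / factorial(k) for k in range(11)]
print(max(abs(a - b) for a, b in zip(table, direct)))  # rounding-level difference only
```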
![Page 73: Statistical Methods - uni-muenchen.de · Statistical Methods An Introduction for (Astro-)Physicists. 2 USM Content Fundamental terms of statistics and data analysis, with examples](https://reader034.fdocuments.in/reader034/viewer/2022050715/5f2ed4ed31bc784f3d65d071/html5/thumbnails/73.jpg)
73
USM
interpretation: Suppose λ events are expected to occur in some interval. Split up this interval into n very small sections, so that the chance to find two events in one section is negligible. The probability that one section contains one event is then p=λ/n.
The probability of finding k events in the n sections is given by the binomial distribution,
P(k;n,p=λ/n)
which approaches the Poisson distribution for large n.
Note: the Poisson distribution is defined only for integer values of k!
![Page 74: Statistical Methods - uni-muenchen.de · Statistical Methods An Introduction for (Astro-)Physicists. 2 USM Content Fundamental terms of statistics and data analysis, with examples](https://reader034.fdocuments.in/reader034/viewer/2022050715/5f2ed4ed31bc784f3d65d071/html5/thumbnails/74.jpg)
74
USM
Poisson distribution for different expectation values

Total probability:

$$\sum_{k=0}^{\infty} P(k, \lambda) = 1$$

Expectation value and variance:

$$E(k) = \lambda, \qquad \mathrm{Var}(k) = \lambda, \qquad \sigma(k) = \sqrt{\lambda}$$

[This is consistent with the binomial distribution: $E(k) = np = n\,\frac{\lambda}{n} = \lambda$ and $\mathrm{Var}(k) = np(1-p) = \lambda\left(1 - \frac{\lambda}{n}\right) \to \lambda$ for $n \to \infty$.]

Skewness:

$$\mu_3' = \lambda \ \text{(third central moment)}, \qquad \gamma_1 = \frac{\mu_3'}{\sigma^3} = \frac{\lambda}{\lambda^{3/2}} = \lambda^{-1/2} \to 0,$$

i.e., the distribution becomes increasingly symmetric for increasing $\lambda$.
![Page 75: Statistical Methods - uni-muenchen.de · Statistical Methods An Introduction for (Astro-)Physicists. 2 USM Content Fundamental terms of statistics and data analysis, with examples](https://reader034.fdocuments.in/reader034/viewer/2022050715/5f2ed4ed31bc784f3d65d071/html5/thumbnails/75.jpg)
75
USM
application:
The Poisson distribution describes the asymptotic behavior of the binomial distribution with constant λ=np, i.e., with a (very) low probability for the individual process. Thus, it should be applied when there are many trials but only few successes. Since one usually has no idea of the number of trials (only that there are many), it describes the case of discrete events occurring in a continuum.
examples: • the number of flashes of lightning in a thunderstorm (it is meaningless to ask
how often there is no flash)
• the number of clicks in a Geiger counter (meaningless to ask about “non-
clicks”)
![Page 76: Statistical Methods - uni-muenchen.de · Statistical Methods An Introduction for (Astro-)Physicists. 2 USM Content Fundamental terms of statistics and data analysis, with examples](https://reader034.fdocuments.in/reader034/viewer/2022050715/5f2ed4ed31bc784f3d65d071/html5/thumbnails/76.jpg)
76
USM
A historical example

Statistics on the numbers of Prussian soldiers kicked to death by horses: in the 19th century it was reported that there were 122 deaths in ten different army corps over twenty years, i.e., the mean number of deaths per corps and per year is λ=122/200=0.61. The probability of, e.g., no death is then P(0, 0.61)=0.5434 per year and corps. In twenty years and ten corps, there should thus be 108.7 cases in which no death happened. Actually, 109 such cases were reported.

| Number of deaths per year and corps | Actual number reported for 20 years and 10 corps | Prediction from Poisson statistics |
|---|---|---|
| 0 | 109 | 108.7 |
| 1 | 65 | 66.3 |
| 2 | 22 | 20.2 |
| 3 | 3 | 4.1 |
| 4 | 1 | 0.6 |
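The Poisson predictions in the table can be reproduced in a few lines (variable names are mine):

```python
from math import exp, factorial

# 122 deaths in 200 corps-years gives lambda = 0.61; scale the Poisson
# probabilities by the 200 corps-years observed.
lam = 122 / 200
n_corps_years = 200

predicted = [n_corps_years * lam**k * exp(-lam) / factorial(k) for k in range(5)]
print([round(p, 1) for p in predicted])  # [108.7, 66.3, 20.2, 4.1, 0.6]
```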
![Page 77: Statistical Methods - uni-muenchen.de · Statistical Methods An Introduction for (Astro-)Physicists. 2 USM Content Fundamental terms of statistics and data analysis, with examples](https://reader034.fdocuments.in/reader034/viewer/2022050715/5f2ed4ed31bc784f3d65d071/html5/thumbnails/77.jpg)
77
USM
Supernova 1987A

The following table gives the numbers of neutrino events detected in 10 s intervals by the Irvine-Michigan-Brookhaven experiment on Feb. 23rd 1987 (around the time SN1987A was first seen).

The average number of events per interval (ignoring the interval with 9 events) is 0.77. The Poisson predictions agree well with the data, except for the interval with the 9 events. Thus, the background due to random events is Poissonian and well understood, and the nine events cannot be due to fluctuations, but must have come from a different source (the supernova).

| No. of events | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| No. of intervals | 1042 | 860 | 307 | 78 | 15 | 3 | 0 | 0 | 0 | 1 |
| Prediction | 1064 | 823 | 318 | 82 | 16 | 2 | 0.3 | 0.03 | 0.003 | 0.0003 |
![Page 78: Statistical Methods - uni-muenchen.de · Statistical Methods An Introduction for (Astro-)Physicists. 2 USM Content Fundamental terms of statistics and data analysis, with examples](https://reader034.fdocuments.in/reader034/viewer/2022050715/5f2ed4ed31bc784f3d65d071/html5/thumbnails/78.jpg)
78
USM
Two Poisson distributions

If there are two separate types of Poisson distributed events, and we do not distinguish between the two, then the probability of $k = k_1 + k_2$ events is also Poisson, with mean equal to the sum of the two individual means:

$$P(k) = \sum_{k_1=0}^{k} P(k_1, \lambda_1)\,P(k - k_1, \lambda_2) = P(k, \lambda_1 + \lambda_2)$$

Proof via the characteristic function of the Poisson distribution:

$$\phi_P(t) = E\!\left(e^{itk}\right) = \sum_{k=0}^{\infty} e^{itk}\,P(k, \lambda) = \sum_{k=0}^{\infty} e^{itk}\,\frac{\lambda^k e^{-\lambda}}{k!} = e^{-\lambda}\sum_{k=0}^{\infty}\frac{(\lambda e^{it})^k}{k!} = e^{-\lambda}\,e^{\lambda e^{it}} = \exp\!\left[\lambda\,(e^{it} - 1)\right]$$

Remember: the characteristic function of the sum of independent variables is the product of their characteristic functions (Sect. II)

$$\Rightarrow\quad \phi_{\rm sum}(t) = \phi_{P(\lambda_1)}(t)\,\phi_{P(\lambda_2)}(t) = \exp\!\left[\lambda_1(e^{it}-1)\right]\exp\!\left[\lambda_2(e^{it}-1)\right] = \exp\!\left[(\lambda_1 + \lambda_2)(e^{it}-1)\right] = \phi_{P(\lambda_1+\lambda_2)}(t).$$

Thus, the sum of two independent, Poisson distributed variables is Poisson distributed as well, with $\lambda = \lambda_1 + \lambda_2$.
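The convolution identity can also be checked numerically, term by term (my own sketch; the λ values are arbitrary):

```python
from math import exp, factorial

def poisson(k, lam):
    return lam**k * exp(-lam) / factorial(k)

# Convolution of Poisson(lam1) and Poisson(lam2) versus Poisson(lam1 + lam2).
lam1, lam2 = 1.3, 2.1
for k in range(10):
    conv = sum(poisson(k1, lam1) * poisson(k - k1, lam2) for k1 in range(k + 1))
    print(k, conv, poisson(k, lam1 + lam2))  # the two columns agree
```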
![Page 79: Statistical Methods - uni-muenchen.de · Statistical Methods An Introduction for (Astro-)Physicists. 2 USM Content Fundamental terms of statistics and data analysis, with examples](https://reader034.fdocuments.in/reader034/viewer/2022050715/5f2ed4ed31bc784f3d65d071/html5/thumbnails/79.jpg)
79
USM
This can be generalized to any number of Poisson processes.

Example: signal with background
• Expected are S signal events on top of an average background B. The average fluctuation (standard deviation) of the observed number of events k is then

$$\sigma(S + B) = \sqrt{S + B}$$

• If we subtract the average background from the signal, this fluctuation remains, of course.
• If the exact expectation value of the background is not known, the uncertainty is even larger (error propagation).

For an expected signal $S = 100$ and background $B = 50$ we observe on average 150 events, with a standard deviation of $\sqrt{150} \approx 12.2$. After subtracting the background, the average signal is $S = 100 \pm \sqrt{150}$.
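A Monte Carlo sketch of this example (the sampler is Knuth's standard method, chosen by me; the lecture does not prescribe one):

```python
import math, random

random.seed(7)

# Signal-plus-background: observed counts scatter with sigma = sqrt(S + B)
# = sqrt(150) ~ 12.2; subtracting the known mean background shifts the
# counts but does not reduce the scatter.
S, B, trials = 100, 50, 20000

def poisson_sample(lam):
    """Knuth's method; adequate for moderate lambda."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

obs = [poisson_sample(S + B) for _ in range(trials)]
mean = sum(obs) / trials
std = (sum((x - mean) ** 2 for x in obs) / trials) ** 0.5
print(mean - B, std)  # background-subtracted mean near 100, std near 12.2
```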
![Page 80: Statistical Methods - uni-muenchen.de · Statistical Methods An Introduction for (Astro-)Physicists. 2 USM Content Fundamental terms of statistics and data analysis, with examples](https://reader034.fdocuments.in/reader034/viewer/2022050715/5f2ed4ed31bc784f3d65d071/html5/thumbnails/80.jpg)
80
USM
Uniform distribution

So far, only distributions of one or more discrete variables have been discussed; we now turn to continuous distribution functions. The most simple case is the uniform distribution (already mentioned before): constant probability density in a certain interval, and 0 elsewhere:

$$f(x) = c \quad\text{for}\ a \le x < b, \qquad f(x) = 0 \quad\text{for}\ x < a\ \text{or}\ x \ge b$$

From the normalization, $\int_{-\infty}^{\infty} f(x)\,dx = 1$, we obtain

$$c = \frac{1}{b-a},$$

and the distribution function becomes

$$F(x) = \int_a^x \frac{dx'}{b-a} = \frac{x-a}{b-a} \quad\text{for}\ a \le x < b, \qquad F(x) = 0\ \text{for}\ x < a, \qquad F(x) = 1\ \text{for}\ x \ge b$$
![Page 81: Statistical Methods - uni-muenchen.de · Statistical Methods An Introduction for (Astro-)Physicists. 2 USM Content Fundamental terms of statistics and data analysis, with examples](https://reader034.fdocuments.in/reader034/viewer/2022050715/5f2ed4ed31bc784f3d65d071/html5/thumbnails/81.jpg)
81
USM
Uniform distributions with a=0, b=1 are created by random number generators (RNGs). Note: in many RNGs, "0" is not included, i.e., the lowermost value is ε (machine dependent). This is important for Monte Carlo methods; different distributions are obtained from transformation methods (see Sect. II/III).

$$E(x) = \frac{1}{b-a}\int_a^b x\,dx = \frac{1}{2}(a+b), \qquad \mathrm{Var}(x) = \frac{1}{12}(b-a)^2$$
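These moments are easy to confirm for a standard RNG on [0, 1), where $E(x) = 1/2$ and $\mathrm{Var}(x) = 1/12 \approx 0.0833$ (my own sketch):

```python
import random

random.seed(3)

# Sample mean and variance of a uniform RNG on [0, 1).
n = 1000000
xs = [random.random() for _ in range(n)]
mean = sum(xs) / n
var = sum((x - mean) ** 2 for x in xs) / n
print(mean, var)  # near 0.5 and 1/12
```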
![Page 82: Statistical Methods - uni-muenchen.de · Statistical Methods An Introduction for (Astro-)Physicists. 2 USM Content Fundamental terms of statistics and data analysis, with examples](https://reader034.fdocuments.in/reader034/viewer/2022050715/5f2ed4ed31bc784f3d65d071/html5/thumbnails/82.jpg)
82
USM
Gaussian (or normal) distribution

Assume a binomial distribution with random variable $k$:

$$P(k; n, p) = \frac{n!}{k!\,(n-k)!}\,p^k (1-p)^{n-k}$$

Characteristic function:

$$\phi_k(t) = \sum_{k=0}^{n} e^{itk}\,P(k; n, p) = \left[\,p\,e^{it} + (1-p)\,\right]^n \quad\text{(without proof)}$$

Use the reduced variable

$$u = \frac{k - \langle k\rangle}{\sigma} = \frac{k - np}{\sigma}$$

and interpret this as the sum of two independent r.v. (though the 2nd term is constant):

$$\phi_u(t) = \exp\!\left(-\frac{itnp}{\sigma}\right)\left[\,p\exp\!\left(\frac{it}{\sigma}\right) + (1-p)\,\right]^n$$

Take the logarithm of $\phi_u(t)$ and expand $\exp(it/\sigma)$, thereafter expanding $\ln(1 + f(t/\sigma))$:

$$\ln\phi_u(t) = -\frac{1}{2}\,\frac{np(1-p)}{\sigma^2}\,t^2 + O\!\left(\frac{t^3}{\sigma^3}\right)$$
![Page 83: Statistical Methods - uni-muenchen.de · Statistical Methods An Introduction for (Astro-)Physicists. 2 USM Content Fundamental terms of statistics and data analysis, with examples](https://reader034.fdocuments.in/reader034/viewer/2022050715/5f2ed4ed31bc784f3d65d071/html5/thumbnails/83.jpg)
83
USM
Thus, accounting for $\sigma^2 = np(1-p)$ and in the limit $n \to \infty$, we find

$$\phi_u(t) = \exp\!\left(-\frac{1}{2}t^2\right)$$

This is the characteristic function of a binomial distribution, using a reduced variable, in the limit of large $n$. Back-transformation yields the corresponding p.d.f.,

$$f(u) = \frac{1}{\sqrt{2\pi}}\exp\!\left(-\frac{1}{2}u^2\right),$$

which is called the Gaussian or normal distribution. Since $u$ is a reduced variable, $E(u)$ should be 0 and $\mathrm{Var}(u)$ should be 1.

Test:

$$E(u) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} u\exp\!\left(-\frac{1}{2}u^2\right)du = 0$$

$$\mathrm{Var}(u) = -\left.\frac{d^2\phi_u(t)}{dt^2}\right|_{t=0} + \left(\left.\frac{d\phi_u(t)}{dt}\right|_{t=0}\right)^2 = 1, \quad\text{q.e.d.}$$
![Page 84: Statistical Methods - uni-muenchen.de · Statistical Methods An Introduction for (Astro-)Physicists. 2 USM Content Fundamental terms of statistics and data analysis, with examples](https://reader034.fdocuments.in/reader034/viewer/2022050715/5f2ed4ed31bc784f3d65d071/html5/thumbnails/84.jpg)
84
USM
A more general form of the normal distribution is

$$f(x) = \frac{1}{b\sqrt{2\pi}}\exp\!\left(-\frac{(x-a)^2}{2b^2}\right).$$

Since $E(x) = a$ and $\mathrm{Var}(x) = b^2$, the conventional representation is

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

The inflection points of this distribution (zero curvature) are located at $\mu \pm \sigma$. Once again, this is the limit of a binomial distribution with the above expectation value and variance, in the limit $n \to \infty$.

The corresponding characteristic function is

$$\phi(t) = \int e^{itx} f(x)\,dx = \exp(i\mu t)\exp\!\left(-\frac{1}{2}\sigma^2 t^2\right)$$

Theorem: The characteristic function of a normal distribution with zero mean is itself a normal distribution with zero mean. The product of the variances of both distributions is one.
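The characteristic function $\phi(t) = \exp(i\mu t - \sigma^2 t^2/2)$ can be verified by direct numerical integration of $e^{itx} f(x)$; this is my own sketch (integration limits and step count are choices, not from the lecture):

```python
import cmath, math

def normal_pdf(x, mu, sigma):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def char_fn(t, mu, sigma, lo=-10.0, hi=14.0, n=4000):
    """Trapezoidal approximation of the integral of exp(itx) f(x) dx."""
    h = (hi - lo) / n
    total = 0j
    for i in range(n + 1):
        x = lo + i * h
        w = 0.5 if i in (0, n) else 1.0
        total += w * cmath.exp(1j * t * x) * normal_pdf(x, mu, sigma)
    return total * h

mu, sigma, t = 2.0, 1.5, 0.7
analytic = cmath.exp(1j * mu * t - 0.5 * sigma ** 2 * t ** 2)
print(abs(char_fn(t, mu, sigma) - analytic))  # close to zero
```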
![Page 85: Statistical Methods - uni-muenchen.de · Statistical Methods An Introduction for (Astro-)Physicists. 2 USM Content Fundamental terms of statistics and data analysis, with examples](https://reader034.fdocuments.in/reader034/viewer/2022050715/5f2ed4ed31bc784f3d65d071/html5/thumbnails/85.jpg)
85
USM
The characteristic function transformed to $y = x - \mu$ is

$$\phi'(t) = \exp\!\left(-\frac{1}{2}\sigma^2 t^2\right)$$

With $\mu_n' = \dfrac{1}{i^n}\left.\dfrac{d^n\phi'(t)}{dt^n}\right|_{t=0}$ (Sect. II), we find the central moments

$$\mu_1' = 0, \qquad \mu_2' = \sigma^2, \qquad \mu_3' = 0, \qquad \mu_4' = 3\sigma^4 \quad\text{(remember kurtosis, Sect. II)},$$

and in general

$$\mu_{2k+1}' = 0, \quad k = 0, 1, 2, 3, \dots, \qquad \mu_{2k}' = \frac{(2k)!\,\sigma^{2k}}{2^k\,k!}$$

The corresponding cumulative distribution functions are

$$\psi_0(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x}\exp\!\left(-\frac{1}{2}x'^2\right)dx'$$

$$\psi(x) = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{x}\exp\!\left(-\frac{(x'-\mu)^2}{2\sigma^2}\right)dx' = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{(x-\mu)/\sigma}\exp\!\left(-\frac{1}{2}u^2\right)du = \psi_0\!\left(\frac{x-\mu}{\sigma}\right)$$
![Page 86: Statistical Methods - uni-muenchen.de · Statistical Methods An Introduction for (Astro-)Physicists. 2 USM Content Fundamental terms of statistics and data analysis, with examples](https://reader034.fdocuments.in/reader034/viewer/2022050715/5f2ed4ed31bc784f3d65d071/html5/thumbnails/86.jpg)
86
USM
The probability of observing $u$ within a band of width $2x$ around the expectation value zero is

$$P(|u| \le x) = \int_{-x}^{x} f(u)\,du = 2\int_{0}^{x} f(u)\,du = 2\left[\psi_0(x) - \psi_0(0)\right] = 2\,\psi_0(x) - 1$$

(using $f(-u) = f(u)$), and the probability of a random variable being observed within an integer multiple of the standard deviation from the mean is

$$P\!\left(|x - \mu| \le n\sigma\right) = 2\,\psi_0(n) - 1$$

Compared with the Tchebychev inequality (Sect. II):

$$P(|x-\mu| \le \sigma) = 0.682, \qquad P(|x-\mu| > \sigma) = 0.318 \qquad (\text{Tchebychev: } P(|x-\mu| > \sigma) < 1.0)$$

$$P(|x-\mu| \le 2\sigma) = 0.954, \qquad P(|x-\mu| > 2\sigma) = 0.046 \qquad (\text{Tchebychev: } P(|x-\mu| > 2\sigma) < 0.25)$$

$$P(|x-\mu| \le 3\sigma) = 0.998, \qquad P(|x-\mu| > 3\sigma) = 0.002 \qquad (\text{Tchebychev: } P(|x-\mu| > 3\sigma) < 0.11)$$

"3σ-error"
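The n-σ probabilities follow from $2\psi_0(n) - 1 = \mathrm{erf}(n/\sqrt{2})$, which the standard library can evaluate directly (note the slide rounds 0.6827 down to 0.682):

```python
from math import erf, sqrt

# Probability of |x - mu| <= n*sigma for a normal distribution.
for n in (1, 2, 3):
    print(n, round(erf(n / sqrt(2)), 4))  # 0.6827, 0.9545, 0.9973
```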
![Page 87: Statistical Methods - uni-muenchen.de · Statistical Methods An Introduction for (Astro-)Physicists. 2 USM Content Fundamental terms of statistics and data analysis, with examples](https://reader034.fdocuments.in/reader034/viewer/2022050715/5f2ed4ed31bc784f3d65d071/html5/thumbnails/87.jpg)
87
USM
Multivariate normal distribution

The joint normal distribution of $n$ variables $\mathbf{x} = (x_1, x_2, \dots, x_n)$ is defined as

$$\phi(\mathbf{x}) = k\,\exp\!\left\{-\frac{1}{2}\,(\mathbf{x} - \mathbf{a})^{\rm T} B\,(\mathbf{x} - \mathbf{a})\right\}$$

with $B$ a symmetric $n \times n$ matrix. Since $\phi(\mathbf{x})$ is symmetric about $\mathbf{x} = \mathbf{a}$,

$$\int_{-\infty}^{\infty} (\mathbf{x} - \mathbf{a})\,\phi(\mathbf{x})\,d\mathbf{x} = \mathbf{0}, \quad\text{i.e.,}\quad E(\mathbf{x}) = \mathbf{a} = \boldsymbol{\mu}$$

Differentiating w.r.t. $\mathbf{a}$ (the integral remains 0), we find for the $i$-th component

$$\frac{\partial}{\partial a_i}\int (x_j - a_j)\,\phi(\mathbf{x})\,d\mathbf{x} = \int\left[-\delta_{ij} + (x_j - a_j)\sum_k B_{ik}\,(x_k - a_k)\right]\phi(\mathbf{x})\,d\mathbf{x} = 0,$$

and for all components

$$E\!\left[(\mathbf{x} - \mathbf{a})(\mathbf{x} - \mathbf{a})^{\rm T}\right] B = I \quad (B\ \text{symmetric}), \quad\text{and thus}\quad E\!\left[(\mathbf{x} - \mathbf{a})(\mathbf{x} - \mathbf{a})^{\rm T}\right] = B^{-1} = C$$

The matrix $B$ in the exponent of $\phi(\mathbf{x})$ is just the inverse of the covariance matrix $C$, and the vector $\mathbf{a}$ is the vector formed by the expectation values.
![Page 88: Statistical Methods - uni-muenchen.de · Statistical Methods An Introduction for (Astro-)Physicists. 2 USM Content Fundamental terms of statistics and data analysis, with examples](https://reader034.fdocuments.in/reader034/viewer/2022050715/5f2ed4ed31bc784f3d65d071/html5/thumbnails/88.jpg)
88
USM
Binormal (or bivariate normal) distribution

With

$$C = B^{-1} = \begin{pmatrix} \sigma_1^2 & \mathrm{cov}(x_1, x_2) \\ \mathrm{cov}(x_1, x_2) & \sigma_2^2 \end{pmatrix},$$

we obtain

$$B = \frac{1}{\sigma_1^2\sigma_2^2 - \mathrm{cov}^2(x_1, x_2)}\begin{pmatrix} \sigma_2^2 & -\mathrm{cov}(x_1, x_2) \\ -\mathrm{cov}(x_1, x_2) & \sigma_1^2 \end{pmatrix}$$

Case 1: independent variables

$$B \to \begin{pmatrix} 1/\sigma_1^2 & 0 \\ 0 & 1/\sigma_2^2 \end{pmatrix} \;\Rightarrow\; \phi(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2}\exp\!\left(-\frac{(x_1-\mu_1)^2}{2\sigma_1^2}\right)\exp\!\left(-\frac{(x_2-\mu_2)^2}{2\sigma_2^2}\right),$$

i.e., $\phi$ becomes the product of two normal distributions (the leading factor follows from normalization). For $n$ variables with non-vanishing covariance, one obtains the normalization $k = \sqrt{\det B}\,/\,(2\pi)^{n/2}$.

Case 2: dependent variables

$$\phi(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\exp\!\left\{-\frac{1}{2(1-\rho^2)}\left[\frac{(x_1-\mu_1)^2}{\sigma_1^2} - 2\rho\,\frac{(x_1-\mu_1)(x_2-\mu_2)}{\sigma_1\sigma_2} + \frac{(x_2-\mu_2)^2}{\sigma_2^2}\right]\right\}$$

Let's use reduced variables $u_i = \dfrac{x_i - \mu_i}{\sigma_i}$, $i = 1, 2$, and the correlation coefficient

$$\rho = \frac{\mathrm{cov}(x_1, x_2)}{\sigma_1\sigma_2} = \mathrm{cov}(u_1, u_2) \quad\to$$
![Page 89: Statistical Methods - uni-muenchen.de · Statistical Methods An Introduction for (Astro-)Physicists. 2 USM Content Fundamental terms of statistics and data analysis, with examples](https://reader034.fdocuments.in/reader034/viewer/2022050715/5f2ed4ed31bc784f3d65d071/html5/thumbnails/89.jpg)
89
USM
$$\phi(u_1, u_2) = \frac{1}{2\pi\sqrt{\det B^{-1}}}\exp\!\left(-\frac{1}{2}\,\mathbf{u}^{\rm T} B\,\mathbf{u}\right), \quad\text{with}\quad B = \frac{1}{1-\rho^2}\begin{pmatrix} 1 & -\rho \\ -\rho & 1 \end{pmatrix}$$

Lines of constant probability density result from a constant exponent:

$$\frac{1}{1-\rho^2}\left(u_1^2 - 2\rho\,u_1 u_2 + u_2^2\right) = \text{const}$$

Let const = 1, i.e., the probability density has decreased by a factor of $\exp(-1/2) = 1/\sqrt{e}$ from the maximum, $\phi(0,0)$. (This corresponds to the 1-D case where at $u = \pm 1$ (i.e., $x - \mu = \pm\sigma$) the probability density has decreased by the same factor.)

In the original variables, we then have

$$\frac{(x_1-\mu_1)^2}{\sigma_1^2} - 2\rho\,\frac{(x_1-\mu_1)(x_2-\mu_2)}{\sigma_1\sigma_2} + \frac{(x_2-\mu_2)^2}{\sigma_2^2} = 1 - \rho^2,$$

which is the equation of an ellipse with center at $(\mu_1, \mu_2)$ and is called the ellipse of covariance (die Fehlerellipse). The extreme values of $x_1$ and $x_2$ are located at $\mu_1 \pm \sigma_1$ and $\mu_2 \pm \sigma_2$, i.e., the ellipse fits exactly into the rectangular box between these limits. The total probability of observing a pair $x_1$ and $x_2$ inside the ellipse is $1 - \exp(-1/2)$.
![Page 90: Statistical Methods - uni-muenchen.de · Statistical Methods An Introduction for (Astro-)Physicists. 2 USM Content Fundamental terms of statistics and data analysis, with examples](https://reader034.fdocuments.in/reader034/viewer/2022050715/5f2ed4ed31bc784f3d65d071/html5/thumbnails/90.jpg)
90
USM
[Figure: covariance ellipses centered at (2,2), with σ₁ = 1 and σ₂ = 2, for ρ = 0.7 (green, θ = -31.6°), ρ = 0.0 (red, θ = 0°, σ'₁ = 1, σ'₂ = 2), ρ = -0.3 (blue) and ρ = -0.999 (black); for each case the rotation angle θ and the transformed standard deviations σ'₁, σ'₂ are quoted.]

By a simple rotation, the correlation can be put to zero (diagonalization by orthogonal transformation). The corresponding transformation is

$$\begin{pmatrix} x_1' \\ x_2' \end{pmatrix} = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}, \quad\text{with}\quad \tan 2\theta = \frac{2\rho\,\sigma_1\sigma_2}{\sigma_1^2 - \sigma_2^2},$$

and new semi-major and semi-minor axes (corresponding to the variances of the uncorrelated variables $x_1'$ and $x_2'$)

$$\sigma_1'^2 = \frac{(1-\rho^2)\,\sigma_1^2\sigma_2^2}{\sigma_1^2\sin^2\theta - 2\rho\sigma_1\sigma_2\sin\theta\cos\theta + \sigma_2^2\cos^2\theta}$$

$$\sigma_2'^2 = \frac{(1-\rho^2)\,\sigma_1^2\sigma_2^2}{\sigma_1^2\cos^2\theta + 2\rho\sigma_1\sigma_2\sin\theta\cos\theta + \sigma_2^2\sin^2\theta}$$

In the rotated coordinate system, the distribution has the simple form

$$\phi(x_1', x_2') = \frac{1}{2\pi\sigma_1'\sigma_2'}\exp\!\left\{-\frac{1}{2}\left(\frac{x_1'^2}{\sigma_1'^2} + \frac{x_2'^2}{\sigma_2'^2}\right)\right\}$$

(with $x_1'$, $x_2'$ measured from the transformed center).
![Page 91: Statistical Methods - uni-muenchen.de · Statistical Methods An Introduction for (Astro-)Physicists. 2 USM Content Fundamental terms of statistics and data analysis, with examples](https://reader034.fdocuments.in/reader034/viewer/2022050715/5f2ed4ed31bc784f3d65d071/html5/thumbnails/91.jpg)
91
USM
The probability enclosed by the covariance ellipse can be calculated as follows. Consider the rotated coordinate system, and work in reduced variables. In this case, the p.d.f. reads

$$\phi(u_1', u_2') = \frac{1}{2\pi}\exp\!\left(-\frac{1}{2}\left(u_1'^2 + u_2'^2\right)\right),$$

and the total probability inside the covariance ellipse (which in the transformed variables is the unit circle) can be calculated from

$$\iint_{\rm circle}\phi(u_1', u_2')\,du_1'\,du_2' = \frac{1}{2\pi}\int_0^{2\pi}d\varphi\int_0^1 \exp(-r^2/2)\,r\,dr = \int_0^1 \exp(-r^2/2)\,r\,dr = 1 - \exp(-1/2) = 0.393$$

This is the probability that any $(x_1, x_2)$ pair is located within the covariance ellipse, and applies for all binormal distributions, independent of their specific correlation (the distribution in the transformed coordinate system is independent of the correlation).

The area inside the covariance ellipse is called the "1-σ confidence region", since it comprises the region where the p.d.f. has decreased from the maximum by less than a factor of $\exp(-1/2)$, in analogy to the 1-D case (independent of the correlation and the specific $\sigma_{1,2}$).

Similarly, one can calculate the 2-σ confidence region (where the probability density has decreased by a factor of $\exp(-2^2/2) = \exp(-4/2)$), with a total probability inside the corresponding ellipse of $1 - \exp(-4/2) = 0.865$ (in the above integral, replace the upper limit by $r = 2$), and so on for the n-σ interval. Finally, one can generalize this consideration to arbitrary dimensions.
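That the enclosed probability $1 - e^{-1/2} \approx 0.393$ is independent of ρ can be checked by Monte Carlo; this is my own sketch (correlated pairs are built from two independent Gaussians):

```python
import math, random

random.seed(11)

def inside_fraction(rho, n=200000):
    """Fraction of correlated normal pairs falling inside the 1-sigma covariance ellipse."""
    hits = 0
    for _ in range(n):
        z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
        u1 = z1
        u2 = rho * z1 + math.sqrt(1 - rho * rho) * z2   # correlated pair
        if (u1 * u1 - 2 * rho * u1 * u2 + u2 * u2) / (1 - rho * rho) <= 1:
            hits += 1
    return hits / n

for rho in (0.0, 0.5, -0.9):
    print(rho, inside_fraction(rho))  # all near 0.393
```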
![Page 92: Statistical Methods - uni-muenchen.de · Statistical Methods An Introduction for (Astro-)Physicists. 2 USM Content Fundamental terms of statistics and data analysis, with examples](https://reader034.fdocuments.in/reader034/viewer/2022050715/5f2ed4ed31bc784f3d65d071/html5/thumbnails/92.jpg)
92
USM
Generally, the 1-σ confidence region denotes the region where the probability density has decreased by less than the factor exp(-1/2).

[Figure: binormal distribution as before, with ρ = -0.9, and contour plots for the 1-, 2- and 3-σ covariance ellipses. In the lower panel, the coordinate system has been transformed (rotated, stretched) and displays the transformed binormal distribution (with unit variances and ρ = 0) and the corresponding covariance "ellipses" for σ = 1, 2, 3.]

Note that the volume (corresponding to the total probability inside the contour levels) remains preserved under the transformation (e.g., for thin ellipses with large |ρ| the probability densities are larger).
![Page 93: Statistical Methods - uni-muenchen.de · Statistical Methods An Introduction for (Astro-)Physicists. 2 USM Content Fundamental terms of statistics and data analysis, with examples](https://reader034.fdocuments.in/reader034/viewer/2022050715/5f2ed4ed31bc784f3d65d071/html5/thumbnails/93.jpg)
93
USM
[Figure: covariance ellipses for σ=1,2,3 with the corresponding probabilities and the standard deviations (2σ₁, 2σ₂) with respect to the two directions.]

Left: probability inside the n-σ confidence region; right: interval limits in units of σ for a given confidence level (probability).
![Page 94: Statistical Methods - uni-muenchen.de · Statistical Methods An Introduction for (Astro-)Physicists. 2 USM Content Fundamental terms of statistics and data analysis, with examples](https://reader034.fdocuments.in/reader034/viewer/2022050715/5f2ed4ed31bc784f3d65d071/html5/thumbnails/94.jpg)
94
USM
χ²-distribution

Remember from Sect. II ("calculation of the transformed p.d.f.", example 2): calculate the distribution for the square of a reduced r.v. which itself is normally distributed,

$$u = \left[\frac{x-\mu}{\sigma}\right]^2 \quad\text{and}\quad f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-(x-\mu)^2/(2\sigma^2)} \;\Rightarrow\; g(u) = \frac{1}{\sqrt{2\pi}}\,\frac{1}{\sqrt{u}}\,e^{-u/2},$$

which is the so-called χ²-distribution for one degree of freedom. For convenience, we denote χ² by $u$ in the following.

$$E(u) = 1, \qquad \mathrm{Var}(u) = 2$$

Now, let's add $f$ independent, normally distributed and reduced random variables:

$$\chi^2 = \sum_{i=1}^{f} u_i^2 = \sum_{i=1}^{f}\frac{(x_i - \mu_i)^2}{\sigma_i^2}$$
![Page 95: Statistical Methods - uni-muenchen.de · Statistical Methods An Introduction for (Astro-)Physicists. 2 USM Content Fundamental terms of statistics and data analysis, with examples](https://reader034.fdocuments.in/reader034/viewer/2022050715/5f2ed4ed31bc784f3d65d071/html5/thumbnails/95.jpg)
95
USM
This results in the χ²-distribution for $f$ degrees of freedom, which plays an important role in the comparison of measurements and theoretical predictions (e.g., linear regression). In this case,

$$g(u) = \frac{1}{\Gamma(f/2)\,2^{f/2}}\,u^{f/2 - 1}\,e^{-u/2} \quad\text{with Gamma function } \Gamma,$$

and (from the definition and using the calculation rules for expectation value and variance)

$$E(u) = f, \qquad \mathrm{Var}(u) = 2f$$

The maximum of the χ²-distribution for $f > 2$ is at $u_{\rm max} = f - 2$. For $f = 2$, we obtain an exponential distribution. For large $f$, the χ²-distribution approaches a normal distribution. The role of the degrees of freedom will be discussed in Sect. x.x.
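The normalization and the moments $E(u) = f$, $\mathrm{Var}(u) = 2f$ can be checked by numerical integration of the pdf above; this is my own sketch (the simple trapezoidal scheme assumes $f > 2$ so the pdf vanishes at $u = 0$):

```python
import math

def chi2_pdf(u, f):
    """Chi-square pdf for f degrees of freedom."""
    return u ** (f / 2 - 1) * math.exp(-u / 2) / (math.gamma(f / 2) * 2 ** (f / 2))

def moments(f, hi=120.0, n=240000):
    """Trapezoidal estimates of normalization, mean and variance on [0, hi]."""
    h = hi / n
    s0 = s1 = s2 = 0.0
    for i in range(1, n + 1):
        u = i * h
        w = 0.5 if i == n else 1.0
        g = w * chi2_pdf(u, f)
        s0 += g
        s1 += g * u
        s2 += g * u * u
    return s0 * h, s1 * h, s2 * h - (s1 * h) ** 2

f = 4
norm, mean, var = moments(f)
print(norm, mean, var)  # near 1, f = 4 and 2f = 8
```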
![Page 96: Statistical Methods - uni-muenchen.de · Statistical Methods An Introduction for (Astro-)Physicists. 2 USM Content Fundamental terms of statistics and data analysis, with examples](https://reader034.fdocuments.in/reader034/viewer/2022050715/5f2ed4ed31bc784f3d65d071/html5/thumbnails/96.jpg)
96
USM
The central limit theorem (CLT)

Remember: the normal distribution was derived as the asymptotic distribution for

$$x = \lim_{n\to\infty}\sum_{i=1}^{n} x_i,$$

when $x_i$ describes the outcome of an experiment with two possible results, $x_i \in \{0, 1\}$. Let's now investigate more general sums of this type.

"Classical" theorem: We assume that the $x_i$ are independent r.v. and originate from the same, arbitrary distribution with mean $\mu$ and variance $\sigma^2$. The characteristic function of this distribution (for $x_i - \mu$) is

$$\phi(t) = E\!\left(e^{it(x_i - \mu)}\right), \quad\text{with}\quad \left.\frac{d\phi(t)}{dt}\right|_{t=0} = 0 \quad\text{and}\quad \left.\frac{d^2\phi(t)}{dt^2}\right|_{t=0} = -\sigma^2$$

Thus, the Taylor expansion is given by ($\phi(0) = 1$)

$$\phi(t) = 1 - \frac{1}{2}\sigma^2 t^2 + O(t^3)$$
![Page 97: Statistical Methods - uni-muenchen.de · Statistical Methods An Introduction for (Astro-)Physicists. 2 USM Content Fundamental terms of statistics and data analysis, with examples](https://reader034.fdocuments.in/reader034/viewer/2022050715/5f2ed4ed31bc784f3d65d071/html5/thumbnails/97.jpg)
97
USM
We now introduce a new variable

$$u_i = \frac{x_i - \mu}{\sigma\sqrt{n}},$$

which simply contracts the scale. The corresponding characteristic function is

$$\phi_{u_i}(t) = E\!\left(e^{itu_i}\right) = E\!\left(\exp\!\left(it\,\frac{x_i - \mu}{\sigma\sqrt{n}}\right)\right) = \phi\!\left(\frac{t}{\sigma\sqrt{n}}\right), \quad\text{and therefore}$$

$$\phi_{u_i}(t) = 1 - \frac{t^2}{2n} + \dots \quad\text{with higher terms at most of order } O(n^{-3/2})$$

Making use of the fact that the characteristic function of the sum of independent r.v. is given by the product of the individual characteristic functions, and going to the limit $n \to \infty$, we find for

$$u = \lim_{n\to\infty}\sum_{i=1}^{n} u_i = \lim_{n\to\infty}\sum_{i=1}^{n}\frac{x_i - \mu}{\sigma\sqrt{n}} \quad\text{that}$$

$$\phi_u(t) = \lim_{n\to\infty}\left[\phi_{u_i}(t)\right]^n = \lim_{n\to\infty}\left(1 - \frac{t^2}{2n} + \dots\right)^n = \exp\!\left(-\frac{1}{2}t^2\right),$$

which is just the characteristic function of the standardized normal distribution, with expectation value 0 and variance 1.
![Page 98: Statistical Methods - uni-muenchen.de · Statistical Methods An Introduction for (Astro-)Physicists. 2 USM Content Fundamental terms of statistics and data analysis, with examples](https://reader034.fdocuments.in/reader034/viewer/2022050715/5f2ed4ed31bc784f3d65d071/html5/thumbnails/98.jpg)
98
USM
In terms of the arithmetic mean of the original variable then,

$$\lim_{n\to\infty}\left(\frac{\sigma}{\sqrt{n}}\,u + \mu\right) = \lim_{n\to\infty}\left[\frac{\sigma}{\sqrt{n}}\sum_{i=1}^{n}\frac{x_i - \mu}{\sigma\sqrt{n}} + \mu\right] = \lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x},$$

the back-transformed distribution is normal, with

$$E(\bar{x}) = \mu, \qquad \mathrm{Var}(\bar{x}) = \sigma^2/n,$$

i.e., mean $\mu$ and standard deviation $\sigma/\sqrt{n}$.

Thus, the "classical" central limit theorem reads: If the $x_i$ are a set of $n$ independent r.v., each distributed with mean $\mu$ and variance $\sigma^2$, then in the limit $n \to \infty$ their arithmetic mean

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

is normally distributed with mean $\mu$ and variance $\sigma^2/n$.

Under certain assumptions [see, e.g., Wikipedia: the Lyapunov criterion ("weak" asymmetry) or the even weaker Lindeberg condition], a "generalized" central limit theorem can be formulated. If these conditions apply, the sum of arbitrarily (i.e., not identically) distributed r.v. converges to a normal distribution, with mean $\sum_{i=1}^{n}\mu_i$ and variance $\sum_{i=1}^{n}\sigma_i^2$.
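The classical statement can be illustrated with uniform r.v., for which $\mu = 0.5$ and $\sigma^2 = 1/12$, so the mean of $n$ of them should have variance $1/(12n)$ (my own sketch):

```python
import random

random.seed(5)

# Arithmetic mean of n uniform r.v. on [0, 1): approximately normal with
# mu = 0.5 and variance 1/(12 n); check both moments empirically.
n, trials = 30, 20000
means = [sum(random.random() for _ in range(n)) / n for _ in range(trials)]

m = sum(means) / trials
v = sum((x - m) ** 2 for x in means) / trials
print(m, v)  # near 0.5 and 1/360
```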
![Page 99: Statistical Methods - uni-muenchen.de · Statistical Methods An Introduction for (Astro-)Physicists. 2 USM Content Fundamental terms of statistics and data analysis, with examples](https://reader034.fdocuments.in/reader034/viewer/2022050715/5f2ed4ed31bc784f3d65d071/html5/thumbnails/99.jpg)
99
USM
Examples for the CLT

CLT for several cases:
• upper panel: arithmetic mean of n = 1, 2, 30 uniformly distributed r.v.; overplotted is the corresponding Gaussian with μ = 0.5 and variance 1/(12n)
• middle panel: arithmetic mean of n = 1, 2, 30 exponentially (λ=1) distributed r.v.; overplotted is the corresponding Gaussian with μ = 1 and variance 1/n
• lower panel: sum of n = 1, 2, 30 exponentially (λ=1) plus n = 1, 2, 30 uniformly distributed r.v.; overplotted is the corresponding Gaussian with μ = n·1 + n·0.5 and variance n·1 + n/12

Sample size = 1e6, bin size = 0.005.
![Page 100: Statistical Methods - uni-muenchen.de · Statistical Methods An Introduction for (Astro-)Physicists. 2 USM Content Fundamental terms of statistics and data analysis, with examples](https://reader034.fdocuments.in/reader034/viewer/2022050715/5f2ed4ed31bc784f3d65d071/html5/thumbnails/100.jpg)
100
USM
The CLT in its generalized form is the basis for assuming experimental errors to be normally distributed: each measurement error is assumed to consist of an accumulation of many small individual errors (with unknown distributions), whose sum (the measured error) can then be described by a Gaussian.
![Page 101: Statistical Methods - uni-muenchen.de · Statistical Methods An Introduction for (Astro-)Physicists. 2 USM Content Fundamental terms of statistics and data analysis, with examples](https://reader034.fdocuments.in/reader034/viewer/2022050715/5f2ed4ed31bc784f3d65d071/html5/thumbnails/101.jpg)
101
USM
Log-normal distribution

A single-tailed probability distribution of a random variable whose logarithm is normally distributed:
• If y is a random variable with a normal distribution, then x = exp(y) has a log-normal distribution; likewise, if x is log-normally distributed, then log(x) is normally distributed. (The base of the logarithm does not matter.)
• A variable might be modeled as log-normal if it can be thought of as the product of many independent factors which are positive and close to 1 (see figure next page): log(x) = log of the product = sum of the logs → CLT → log(x) is normally distributed.
• Plays an important role in, e.g., economy, biology, mechanics and astrophysics.

$$f(x; \mu, \sigma) = \frac{1}{x\,\sigma\sqrt{2\pi}}\exp\!\left(-\frac{(\ln x - \mu)^2}{2\sigma^2}\right)$$

$$E(x) = e^{\mu + \sigma^2/2}, \qquad \mathrm{Var}(x) = \left(e^{\sigma^2} - 1\right)e^{2\mu + \sigma^2}$$
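The expectation-value formula $E(x) = e^{\mu + \sigma^2/2}$ (note: not $e^{\mu}$) can be verified by sampling $x = \exp(y)$ with Gaussian $y$; this is my own sketch with arbitrarily chosen parameters:

```python
import math, random

random.seed(9)

# x = exp(y) with y ~ N(mu, sigma) is log-normal; compare the sample mean
# with E(x) = exp(mu + sigma^2 / 2).
mu, sigma, n = 0.0, 0.5, 400000
xs = [math.exp(random.gauss(mu, sigma)) for _ in range(n)]
mean = sum(xs) / n
print(mean, math.exp(mu + sigma ** 2 / 2))  # both near 1.133
```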
![Page 102: Statistical Methods - uni-muenchen.de · Statistical Methods An Introduction for (Astro-)Physicists. 2 USM Content Fundamental terms of statistics and data analysis, with examples](https://reader034.fdocuments.in/reader034/viewer/2022050715/5f2ed4ed31bc784f3d65d071/html5/thumbnails/102.jpg)
102
USM
pdf (left) and cumulative distribution function (right) for a log-normal distribution with μ=0 and different σ

Left: simulation of a log-normal distribution from a sample of $10^5$ r.v. which are distributed according to

$$x = \prod_{i=1}^{7} x_i \quad\text{with independent } x_i,$$

where the $x_i$ are uniformly distributed within the interval $[0.4, 1.6]$. The estimators (Sect. VI) for $\mu$ and $\sigma$ are $\hat{\mu} = -0.47$ and $\hat{\sigma} = 1.02$. Overplotted is a theoretical log-normal distribution with these parameters.