Statistical Data Analysis: Lecture 5

16
G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 1 Statistical Data Analysis: Lecture 5 1 Probability, Bayes’ theorem 2 Random variables and probability densities 3 Expectation values, error propagation 4 Catalogue of pdfs 5 The Monte Carlo method 6 Statistical tests: general concepts 7 Test statistics, multivariate methods 8 Goodness-of-fit tests 9 Parameter estimation, maximum likelihood 10 More maximum likelihood 11 Method of least squares 12 Interval estimation, setting limits 13 Nuisance parameters, systematic uncertainties 14 Examples of Bayesian approach

description

Statistical Data Analysis: Lecture 5. 1Probability, Bayes’ theorem 2Random variables and probability densities 3Expectation values, error propagation 4Catalogue of pdfs 5The Monte Carlo method 6Statistical tests: general concepts 7Test statistics, multivariate methods - PowerPoint PPT Presentation

Transcript of Statistical Data Analysis: Lecture 5

Page 1: Statistical Data Analysis:  Lecture 5

G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 1

Statistical Data Analysis: Lecture 5

1 Probability, Bayes’ theorem2 Random variables and probability densities3 Expectation values, error propagation4 Catalogue of pdfs5 The Monte Carlo method6 Statistical tests: general concepts7 Test statistics, multivariate methods8 Goodness-of-fit tests9 Parameter estimation, maximum likelihood10 More maximum likelihood11 Method of least squares12 Interval estimation, setting limits13 Nuisance parameters, systematic uncertainties14 Examples of Bayesian approach

Page 2: Statistical Data Analysis:  Lecture 5

G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 2

What it is: a numerical technique for calculating probabilitiesand related quantities using sequences of random numbers.

The usual steps:

(1) Generate sequence r1, r2, ..., rm uniform in [0, 1].

(2) Use this to produce another sequence x1, x2, ..., xn

distributed according to some pdf f (x) in which we’re interested (x can be a vector).

(3) Use the x values to estimate some property of f (x), e.g., fraction of x values with a < x < b gives

→ MC calculation = integration (at least formally)

MC generated values = ‘simulated data’→ use for testing statistical procedures

The Monte Carlo method

Page 3: Statistical Data Analysis:  Lecture 5

G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 3

Random number generatorsGoal: generate uniformly distributed values in [0, 1].

Toss coin for e.g. 32 bit number... (too tiring).

→ ‘random number generator’

= computer algorithm to generate r1, r2, ..., rn.

Example: multiplicative linear congruential generator (MLCG)

ni+1 = (a ni) mod m , where

ni = integer

a = multiplier

m = modulus

n0 = seed (initial value)

N.B. mod = modulus (remainder), e.g. 27 mod 5 = 2.

This rule produces a sequence of numbers n0, n1, ...

Page 4: Statistical Data Analysis:  Lecture 5

G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 4

Random number generators (2)

The sequence is (unfortunately) periodic!

Example (see Brandt Ch 4): a = 3, m = 7, n0 = 1

← sequence repeats

Choose a, m to obtain long period (maximum = m 1); m usually close to the largest integer that can represented in the computer.

Only use a subset of a single period of the sequence.

Page 5: Statistical Data Analysis:  Lecture 5

G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 5

Random number generators (3)are in [0, 1] but are they ‘random’?

Choose a, m so that the ri pass various tests of randomness:

uniform distribution in [0, 1],

all values independent (no correlations between pairs),

e.g. L’Ecuyer, Commun. ACM 31 (1988) 742 suggests

a = 40692 m = 2147483399

Far better generators available, e.g. TRandom3, based on Mersennetwister algorithm, period = 2199371 (a “Mersenne prime”).See F. James, Comp. Phys. Comm. 60 (1990) 111; Brandt Ch. 4

Page 6: Statistical Data Analysis:  Lecture 5

G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 6

The transformation methodGiven r1, r2,..., rn uniform in [0, 1], find x1, x2,..., xn

that follow f (x) by finding a suitable transformation x (r).

Require:

i.e.

That is, set and solve for x (r).

Page 7: Statistical Data Analysis:  Lecture 5

G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 7

Example of the transformation method

Exponential pdf:

Set and solve for x (r).

→ works too.)

Page 8: Statistical Data Analysis:  Lecture 5

G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 8

The acceptance-rejection method

Enclose the pdf in a box:

(1) Generate a random number x, uniform in [xmin, xmax], i.e.

r1 is uniform in [0,1].

(2) Generate a 2nd independent random number u uniformly

distributed between 0 and fmax, i.e.

(3) If u < f (x), then accept x. If not, reject x and repeat.

Page 9: Statistical Data Analysis:  Lecture 5

G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 9

Example with acceptance-rejection method

If dot below curve, use x value in histogram.

Page 10: Statistical Data Analysis:  Lecture 5

G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 10

Improving efficiency of the acceptance-rejection method

The fraction of accepted points is equal to the fraction ofthe box’s area under the curve.

For very peaked distributions, this may be very low andthus the algorithm may be slow.

Improve by enclosing the pdf f(x) in a curve C h(x) that conforms to f(x) more closely, where h(x) is a pdf from which we can generate random values and C is a constant.

Generate points uniformly over C h(x).

If point is below f(x), accept x.

Page 11: Statistical Data Analysis:  Lecture 5

G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 11

Monte Carlo event generators

Simple example: ee →

Generate cos and :

Less simple: ‘event generators’ for a variety of reactions: e+e- → , hadrons, ... pp → hadrons, D-Y, SUSY,...

e.g. PYTHIA, HERWIG, ISAJET...

Output = ‘events’, i.e., for each event we get a list ofgenerated particles and their momentum vectors, types, etc.

Page 12: Statistical Data Analysis:  Lecture 5

12

A simulated event

PYTHIA Monte Carlopp → gluino-gluino

Page 13: Statistical Data Analysis:  Lecture 5

G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 13

Monte Carlo detector simulationTakes as input the particle list and momenta from generator.

Simulates detector response:multiple Coulomb scattering (generate scattering angle),particle decays (generate lifetime),ionization energy loss (generate ),electromagnetic, hadronic showers,production of signals, electronics response, ...

Output = simulated raw data → input to reconstruction software:track finding, fitting, etc.

Predict what you should see at ‘detector level’ given a certain hypothesis for ‘generator level’. Compare with the real data.

Estimate ‘efficiencies’ = #events found / # events generated.

Programming package: GEANT

Page 14: Statistical Data Analysis:  Lecture 5

G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 14

Wrapping up lecture 5

We’ve now seen the Monte Carlo method:calculations based on sequences of random numbers,used to simulate particle collisions, detector response.

So far, we’ve mainly been talking about probability.

But suppose now we are faced with experimental data. We want to infer something about the (probabilistic) processes that produced the data.

This is statistics, the main subject of the following lectures.

Page 15: Statistical Data Analysis:  Lecture 5

G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 15

Extra slides

Page 16: Statistical Data Analysis:  Lecture 5

G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 16

“True” random numbers1955 the RAND Corporation published a book of random numbersgenerated with an “electronic roulette wheel”, based on randomfrequency electronic pulses.

You can download all 1,000,000 of them (and buy the book)from www.rand.org.