Statistics for HEP Roger Barlow Manchester University
Slide 1
Statistics for HEP
Roger Barlow
Manchester University
Lecture 2: Distributions
Slide 2
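The Binomial
n trials, r successes
Individual success probability p
$P(r;n,p)=\frac{n!}{r!\,(n-r)!}\,p^r(1-p)^{n-r}$
Mean
$\mu=\langle r\rangle=\sum_r rP(r)=np$
Variance
$V=\langle (r-\mu)^2\rangle=\langle r^2\rangle-\langle r\rangle^2=np(1-p)$
$1-p\equiv q$
[Figure: binomial distribution P(r) against r, for r = 0 to 5]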
Met with in Efficiency/Acceptance calculations
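As a numerical check of the binomial formulae, here is a minimal C++ sketch that evaluates P(r; n, p) and sums it by brute force to recover the mean and variance. The function names binomial_pmf, binomial_mean and binomial_variance are illustrative choices, not part of the lecture.

```cpp
#include <cmath>

// Binomial probability P(r; n, p) = n!/(r!(n-r)!) p^r (1-p)^(n-r).
// Factorials are taken through lgamma so that large n does not overflow.
double binomial_pmf(int r, int n, double p) {
    if (r < 0 || r > n) return 0.0;
    if (p <= 0.0) return r == 0 ? 1.0 : 0.0;
    if (p >= 1.0) return r == n ? 1.0 : 0.0;
    double log_coeff = std::lgamma(n + 1.0) - std::lgamma(r + 1.0)
                     - std::lgamma(n - r + 1.0);
    return std::exp(log_coeff + r * std::log(p) + (n - r) * std::log(1.0 - p));
}

// Mean <r> = sum over r of r P(r), evaluated by direct summation.
double binomial_mean(int n, double p) {
    double sum = 0.0;
    for (int r = 0; r <= n; ++r) sum += r * binomial_pmf(r, n, p);
    return sum;
}

// Variance <r^2> - <r>^2, evaluated by direct summation.
double binomial_variance(int n, double p) {
    double mean = binomial_mean(n, p), sum2 = 0.0;
    for (int r = 0; r <= n; ++r) sum2 += double(r) * r * binomial_pmf(r, n, p);
    return sum2 - mean * mean;
}
```

For n=10 and p=0.2 the sums come out at 2 and 1.6, in agreement with np and np(1-p).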
Slide 3
Binomial Examples
[Figure: binomial distributions P(r) against r. Panels with n=10: p=0.2, p=0.5, p=0.8. Panels with p=0.1: n=5, n=20, n=50]
Slide 4
Poisson
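‘Events in a continuum’
e.g. Geiger Counter clicks
Mean rate $\lambda$ in time interval
Gives number of events in data
$P(r;\lambda)=\frac{e^{-\lambda}\lambda^r}{r!}$
Mean
$\mu=\langle r\rangle=\sum_r rP(r)=\lambda$
Variance
$V=\langle (r-\mu)^2\rangle=\langle r^2\rangle-\langle r\rangle^2=\lambda$
[Figure: Poisson distribution P(r) against r, for $\lambda$=2.5]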
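The statement that the Poisson mean and variance are both equal to the mean rate can be checked numerically. The sketch below is only illustrative: the names poisson_pmf and poisson_moments, the symbol lambda for the mean, and the cut-off r_max are all assumptions made for this example.

```cpp
#include <cmath>

// Poisson probability P(r; lambda) = exp(-lambda) lambda^r / r!.
double poisson_pmf(int r, double lambda) {
    if (r < 0) return 0.0;
    if (lambda <= 0.0) return r == 0 ? 1.0 : 0.0;
    return std::exp(-lambda + r * std::log(lambda) - std::lgamma(r + 1.0));
}

// Mean and variance from a sum truncated at r_max.
// Assumption: r_max is far enough above lambda that the tail is negligible.
void poisson_moments(double lambda, int r_max, double& mean, double& variance) {
    double s1 = 0.0, s2 = 0.0;
    for (int r = 0; r <= r_max; ++r) {
        double p = poisson_pmf(r, lambda);
        s1 += r * p;
        s2 += double(r) * r * p;
    }
    mean = s1;
    variance = s2 - s1 * s1;
}
```

With lambda=2.5 and r_max=100 both moments come out at 2.5 to within rounding.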
Slide 5
Poisson Examples
[Figure: Poisson distributions P(r) against r, for means 0.5, 1.0, 2.0, 5.0, 10 and 25]
Slide 6
Binomial and Poisson
From an exam paper
A student is standing by the road, hoping to hitch a lift. Cars pass according to a Poisson distribution with a mean frequency of 1 per minute. The probability of an individual car giving a lift is 1%. Calculate the probability that the student is still waiting for a lift
(a) After 60 cars have passed
(b) After 1 hour
a) $0.99^{60}=0.5472$
b) $e^{-0.6}=0.5488$
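The two answers can be reproduced in a few lines. Part (a) is a binomial with zero successes in 60 trials. Part (b) is a Poisson with zero events, where the mean number of lifts in an hour is 60 x 0.01 = 0.6. The function names below are hypothetical, chosen for this sketch only.

```cpp
#include <cmath>

// (a) Each of n_cars independently refuses with probability 1 - p_lift:
//     the binomial probability of r = 0 successes.
double still_waiting_after_cars(int n_cars, double p_lift) {
    return std::pow(1.0 - p_lift, n_cars);
}

// (b) Cars arrive as a Poisson process, so lifts are Poisson too, with mean
//     cars_per_minute * minutes * p_lift. Still waiting means r = 0.
double still_waiting_after_time(double cars_per_minute, double minutes, double p_lift) {
    return std::exp(-cars_per_minute * minutes * p_lift);
}
```

The answers differ in the third decimal place because in (b) the number of cars passing in the hour is itself a random variable.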
Slide 7
Poisson as approximate binomial
[Figure: P(r) against r for three distributions. Poisson: mean 2. Binomial: p=0.1, 20 tries. Binomial: p=0.01, 200 tries]
Use: MC simulation (Binomial) of Real data (Poisson)
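How quickly the binomial approaches the Poisson at fixed mean np can be seen by comparing the two term by term. The following is a rough sketch with illustrative function names. It assumes 0 < p < 1 and np > 0 so that the logarithms are defined.

```cpp
#include <algorithm>
#include <cmath>

// Binomial P(r; n, p), assuming 0 < p < 1.
double binomial_pmf(int r, int n, double p) {
    if (r < 0 || r > n) return 0.0;
    double log_coeff = std::lgamma(n + 1.0) - std::lgamma(r + 1.0)
                     - std::lgamma(n - r + 1.0);
    return std::exp(log_coeff + r * std::log(p) + (n - r) * std::log(1.0 - p));
}

// Poisson P(r; lambda), assuming lambda > 0.
double poisson_pmf(int r, double lambda) {
    if (r < 0) return 0.0;
    return std::exp(-lambda + r * std::log(lambda) - std::lgamma(r + 1.0));
}

// Largest |Binomial(r; n, p) - Poisson(r; np)| over r = 0..n.
double max_difference(int n, double p) {
    double worst = 0.0;
    for (int r = 0; r <= n; ++r) {
        double diff = std::fabs(binomial_pmf(r, n, p) - poisson_pmf(r, n * p));
        worst = std::max(worst, diff);
    }
    return worst;
}
```

Calling max_difference(20, 0.1) and max_difference(200, 0.01) compares the two binomials quoted above against a Poisson of mean 2. The second call returns the smaller value.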
Slide 8
Two Poissons
2 Poisson sources, means $\lambda_1$ and $\lambda_2$
Combine samples
e.g. leptonic and hadronic decays of W
Forward and backward muon pairs
Tracks that trigger and tracks that don’t
What you get is a Convolution
$P(r)=\sum_{r'}P(r';\lambda_1)\,P(r-r';\lambda_2)$
Turns out this is also a Poisson with mean $\lambda_1+\lambda_2$
Avoids lots of worry
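The reason the convolution is again a Poisson can be sketched in three lines. Writing the two means as $\lambda_1$ and $\lambda_2$ (a notation assumed here), the sum over r' is a binomial expansion:

```latex
\begin{align*}
P(r) &= \sum_{r'=0}^{r} \frac{e^{-\lambda_1}\lambda_1^{r'}}{r'!}\,
        \frac{e^{-\lambda_2}\lambda_2^{\,r-r'}}{(r-r')!} \\
     &= \frac{e^{-(\lambda_1+\lambda_2)}}{r!}
        \sum_{r'=0}^{r} \frac{r!}{r'!\,(r-r')!}\,\lambda_1^{r'}\lambda_2^{\,r-r'} \\
     &= \frac{e^{-(\lambda_1+\lambda_2)}\,(\lambda_1+\lambda_2)^{r}}{r!}
\end{align*}
```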
Slide 9
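The Gaussian
Probability Density
$P(x;\mu,\sigma)=\frac{1}{\sigma\sqrt{2\pi}}\,e^{-(x-\mu)^2/2\sigma^2}$
Mean
$\langle x\rangle=\int xP(x)\,dx=\mu$
Variance
$V=\langle (x-\mu)^2\rangle=\langle x^2\rangle-\langle x\rangle^2=\sigma^2$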
Slide 10
Different Gaussians
There’s only one!
Normalisation (if required)
Location change
Width scaling factor
Falls to 1/e of peak at $x=\mu\pm\sqrt{2}\,\sigma$
Slide 11
Probability Contents
68.27% within $1\sigma$
95.45% within $2\sigma$
99.73% within $3\sigma$
90% within $1.645\sigma$
95% within $1.960\sigma$
99% within $2.576\sigma$
99.9% within $3.290\sigma$
These numbers apply to Gaussians and only Gaussians
Other distributions have equivalent values which you could use if you wanted
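These probability contents follow from the error function: for a Gaussian, the probability of lying within k standard deviations of the mean is erf(k/sqrt(2)). A minimal sketch, with gaussian_content as a made-up name:

```cpp
#include <cmath>

// Probability that a Gaussian variable lies within k standard deviations
// of its mean: P(|x - mu| < k sigma) = erf(k / sqrt(2)).
double gaussian_content(double k) {
    return std::erf(k / std::sqrt(2.0));
}
```

Evaluating gaussian_content at k = 1, 2 and 3 reproduces the first three entries of the table to the precision quoted.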
Slide 12
Central Limit Theorem
Or: why is the Gaussian Normal?
If a Variable x is produced by the convolution of variables $x_1,x_2,\dots,x_N$
I) $\langle x\rangle=\mu_1+\mu_2+\dots+\mu_N$
II) $V(x)=V_1+V_2+\dots+V_N$
III) P(x) becomes Gaussian for large N
There were hints in the Binomial and Poisson examples
Slide 13
CLT demonstration
Convolute Uniform distribution with itself
[Figure: histograms of a uniform distribution and of the results of convoluting it with itself successively]
Slide 14
CLT Proof (I) Characteristic functions
Given P(x) consider
$\langle e^{ikx}\rangle=\int e^{ikx}P(x)\,dx=\tilde{P}(k)$
The Characteristic Function
For convolutions, CFs multiply
If $f(x)=g(x)\otimes h(x)$ then $\tilde{f}(k)=\tilde{g}(k)\,\tilde{h}(k)$
Logs of CFs Add
Slide 15
CLT proof (2) Cumulants
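CF is a power series in k
$\langle 1\rangle+\langle ikx\rangle+\langle (ikx)^2/2!\rangle+\langle (ikx)^3/3!\rangle+\dots$
$1+ik\langle x\rangle-k^2\langle x^2\rangle/2!-ik^3\langle x^3\rangle/3!+\dots$
Ln CF can then be expanded as a series
$ikK_1+(ik)^2K_2/2!+(ik)^3K_3/3!+\dots$
$K_r$ : the “semi-invariant cumulants of Thiele”
Total power of $x^r$
If $x\rightarrow x+a$ then only $K_1\rightarrow K_1+a$
If $x\rightarrow bx$ then each $K_r\rightarrow b^rK_r$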
Slide 16
CLT proof (3)
• The FT of a Gaussian is a Gaussian
$e^{-x^2/2\sigma^2}\rightarrow e^{-k^2\sigma^2/2}$
Taking logs gives power series up to $k^2$
$K_r=0$ for $r>2$ defines a Gaussian
• Selfconvolute anything n times: $K_r'=nK_r$
Need to normalise, divide by n: $K_r''=n^{-r}K_r'=n^{1-r}K_r$
• Higher Cumulants die away faster
If the distributions are not identical but similar the same argument applies
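One step may be worth spelling out. The second cumulant also shrinks after normalisation, as 1/n, so "die away faster" is best read relative to the width of the distribution. Comparing each cumulant with the matching power of the second, in the notation used above:

```latex
\[
\frac{K_r''}{(K_2'')^{r/2}}
  = \frac{n^{1-r}K_r}{\left(n^{-1}K_2\right)^{r/2}}
  = n^{\,1-r/2}\,\frac{K_r}{K_2^{\,r/2}}
  \;\longrightarrow\; 0 \qquad (r>2,\ n\to\infty)
\]
```

Only the first two cumulants survive in this limit, which is the Gaussian case.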
Slide 17
CLT in real life
Examples
• Height
• Simple Measurements
• Student final marks
Counterexamples
• Weight
• Wealth
• Student entry grades
Slide 18
Multidimensional Gaussian
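Without correlation:
$P(x,y;\mu_x,\mu_y,\sigma_x,\sigma_y)=\frac{1}{2\pi\sigma_x\sigma_y}\,e^{-(x-\mu_x)^2/2\sigma_x^2}\,e^{-(y-\mu_y)^2/2\sigma_y^2}$
With correlation $\rho$:
$P(x,y;\mu_x,\mu_y,\sigma_x,\sigma_y,\rho)=\frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}}\exp\left\{-\frac{1}{2(1-\rho^2)}\left[\frac{(x-\mu_x)^2}{\sigma_x^2}+\frac{(y-\mu_y)^2}{\sigma_y^2}-\frac{2\rho(x-\mu_x)(y-\mu_y)}{\sigma_x\sigma_y}\right]\right\}$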
Slide 19
Chi squared
Sum of squared discrepancies, scaled by expected error
$\chi^2=\sum_{i=1}^{n}\left(\frac{x_i-\mu_i}{\sigma_i}\right)^2$
Integrate all but 1-D of multi-D Gaussian
$P(\chi^2;n)=\frac{2^{-n/2}}{\Gamma(n/2)}\,\chi^{n-2}\,e^{-\chi^2/2}$
Mean n, Variance 2n
CLT slow to operate
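Both the chi squared sum and its density are short to code. In this sketch the names chi_squared and chi_squared_pdf are illustrative, and the three input vectors are assumed to have equal length with every sigma positive.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// chi^2 = sum over i of ((x_i - mu_i) / sigma_i)^2.
// Assumption: x, mu and sigma have equal length and every sigma_i > 0.
double chi_squared(const std::vector<double>& x,
                   const std::vector<double>& mu,
                   const std::vector<double>& sigma) {
    double sum = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        double pull = (x[i] - mu[i]) / sigma[i];
        sum += pull * pull;
    }
    return sum;
}

// Density P(chi2; n) = 2^(-n/2) / Gamma(n/2) * chi^(n-2) * exp(-chi2/2)
// for n degrees of freedom. Note chi^(n-2) = (chi2)^(n/2 - 1).
double chi_squared_pdf(double chi2, int n) {
    if (chi2 <= 0.0) return 0.0;
    double half_n = 0.5 * n;
    return std::exp(-half_n * std::log(2.0) - std::lgamma(half_n)
                    + (half_n - 1.0) * std::log(chi2) - 0.5 * chi2);
}
```

For n=2 the density reduces to a simple exponential, exp(-chi2/2)/2, which makes a convenient check.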
Slide 20
Generating Distributions
Given int rand() in stdlib.h
#include <stdlib.h> // rand(), RAND_MAX
#include <math.h> // log()
float Random() // uniform in (0,1]; zero is excluded so that log() is safe
{return (rand()+1.0f)/(RAND_MAX+1.0f);}
float uniform(float lo, float hi)
{return lo+Random()*(hi-lo);}
float life(float tau) // exponential decay times with mean tau
{return -tau*log(Random());}
float ugauss() // really crude. Do not use
{float x=0; for(int i=0;i<12;i++) x+=Random(); return x-6;}
float gauss(float mu, float sigma) {return mu+sigma*ugauss();}
Slide 21
A better Gaussian Generator
float ugauss(){
static bool igot=false;
static float got;
if(igot){igot=false; return got;}
float phi=uniform(0.0F,2*M_PI);
float r=sqrt(2*life(1.0f)); // radius: r*r/2 is exponential with mean 1
igot=true;
got=r*cos(phi);
return r*sin(phi);}
Slide 22
More complicated functions
Find P0=max[P(x)]. Overestimate if in doubt
Repeat:
  Repeat:
    Generate random x
    Find P(x)
    Generate random P in range 0-P0
  till P(x)>P
till you have enough data
If necessary make x non-random and compensate
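The accept-reject loop above might be coded along the following lines. This is a sketch under stated assumptions, not the lecture's own code: it uses the C++ <random> generators in place of rand(), the names accept_reject and ramp are made up, and the range [lo, hi] and bound p0 must be supplied by the caller.

```cpp
#include <random>
#include <vector>

// Accept-reject sampling of x in [lo, hi] from an unnormalised density p(x),
// given a bound p0 >= max p(x) on that range.
template <typename Density>
std::vector<double> accept_reject(Density p, double lo, double hi, double p0,
                                  int n_wanted, std::mt19937& rng) {
    std::uniform_real_distribution<double> pick_x(lo, hi);
    std::uniform_real_distribution<double> pick_p(0.0, p0);
    std::vector<double> sample;
    sample.reserve(n_wanted);
    while (static_cast<int>(sample.size()) < n_wanted) {  // till you have enough data
        double x = pick_x(rng);                           // generate random x
        if (p(x) > pick_p(rng)) sample.push_back(x);      // accept if P(x) > P
    }
    return sample;
}

// Example density (an illustrative choice): p(x) = x on [0, 1], maximum 1.
double ramp(double x) { return x; }
```

Overestimating p0 costs efficiency but not correctness: the accepted fraction is the area under p(x) divided by p0*(hi-lo), which is one half for the ramp with p0=1.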
Slide 23
Other distributions
Uniform (top hat)
$\sigma=\mathrm{width}/\sqrt{12}$
Breit Wigner (Cauchy)
Has no variance, useful for wide tails
Landau
Has no variance or mean
Not given by $\frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}(\lambda+e^{-\lambda})}$. Use CERNLIB
Slide 24
Functions you need to know
• Gaussian/Chi squared
• Poisson
• Binomial
• Everything else