Introduction to Statistical Methods for Data Analysis
Data Analysis Tutorial at UERJ 2015
Dr Lorenzo Moneta, CERN PH-SFT
CH-1211 Geneva 23
sftweb.cern.ch
root.cern.ch
Outline
• Probability definition
• Probability Density Functions
• Some typical distributions
• Bayes Theorem
• Parameter Estimation
• Hypothesis Testing
References
• Much of the material for this introduction to statistical methods is taken from the course Statistical Methods for Data Analysis (Luca Lista, INFN Napoli).
  – The material is also available in his book Statistical Methods for Data Analysis in Particle Physics (Springer): http://www.springer.com/us/book/9783319201757
• Another suggested book is Data Analysis in High Energy Physics (Wiley).
Definition of Probability
• Two main different definitions:
  – Frequentist
    • Probability is the ratio of the number of occurrences of an event to the total number of experiments, in the limit of a very large number of repeatable experiments.
    • Can only be applied to specific classes of events (repeatable experiments).
    • Meaningless to state: "probability that the lightest SUSY particle's mass is less than 1 TeV".
  – Bayesian
    • Probability measures someone's degree of belief that something is or will be true: would you bet?
      – e.g. the probability that Barcelona will win the next Champions League.
Classical Probability
• Assume all accessible cases are equally probable.
• Valid for discrete cases only.
  – Problematic in continuous cases (definition of the metric).
Binomial Distribution
• Distribution of the number of successes in N trials
  – e.g. tossing a coin or rolling a die N times
• Each trial has a probability p of success.
• Average: <n> = Np
• Variance: <n²> − <n>² = Np(1−p)
• Used for efficiencies.
• In ROOT it is available as:
ROOT::Math::binomial_pdf(n,p,N)
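A minimal sketch of how this can be used (assuming a standard ROOT installation, where binomial_pdf lives in MathCore): it tabulates the probabilities for N = 10, p = 0.3 and checks the mean and variance against Np and Np(1−p).

// Minimal sketch (assumes ROOT's MathCore headers are available).
#include <cstdio>
#include "Math/DistFunc.h"   // ROOT::Math::binomial_pdf(n, p, N)

int main() {
   const unsigned int N = 10;   // number of trials
   const double p = 0.3;        // success probability per trial
   double mean = 0.0, mean2 = 0.0;
   for (unsigned int n = 0; n <= N; ++n) {
      const double prob = ROOT::Math::binomial_pdf(n, p, N);
      std::printf("P(n = %2u) = %.5f\n", n, prob);
      mean  += n * prob;
      mean2 += n * n * prob;
   }
   // Should reproduce <n> = Np = 3 and <n^2> - <n>^2 = Np(1-p) = 2.1
   std::printf("mean = %.3f  variance = %.3f\n", mean, mean2 - mean * mean);
   return 0;
}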
Frequentist Probability
• Law of large numbers: the probability of an event A is the limit of its observed frequency,
  P(A) = lim (N → ∞) n(A)/N
• This also means that, for large N, the expected number of occurrences is n(A) ≈ P(A) · N.
• Circular definition of probability:
  – a phenomenon can be proven to be random only if we observe infinite cases.
Conditional Probability
• Probability of A, given B: P(A|B)
  – probability that an event known to belong to set B is also a member of set A
  – P(A | B) = P(A ∩ B) / P(B)
  – A is independent of B if the conditional probability of A given B is equal to the probability of A:
    • P(A | B) = P(A)
  – Hence, if A is independent of B:
    • P(A ∩ B) = P(A) P(B)
  – If A is independent of B, then B is independent of A.
Prob. Density Functions (PDF)
Gaussian (Normal) Distribution
• Average = µ
• Variance = σ²
• Widely used because of the central limit theorem.
[Figure: Gaussian PDF for (µ=0, σ=0.3), (µ=0, σ=1), (µ=0, σ=3) and (µ=−2, σ=1).]
TMath::Gaus(x, μ, σ, true)
ROOT::Math::normal_pdf(x, σ, μ)
TF1 f("f", "gausn", xmin, xmax);
x = gRandom->Gaus(μ, σ);
N.B. “gausn” for a normalised (PDF) Gaussian
Central limit theorem
• The sum of n random variables xᵢ converges to a Gaussian, irrespective of the original distributions of the variables (only some basic regularity conditions must hold).
  – ∑ xᵢ → Gaussian
  – Example: adding n flat (uniform) distributions, as in the figure and the code sketch below.
[Figure: distribution of <x> for n = 2 and n = 5, with x uniform in [0,10]; Gaussian fits give Mean ≈ 4.99, Sigma ≈ 2.03 for n = 2 and Mean ≈ 5.01, Sigma ≈ 1.29 for n = 5.]
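A minimal sketch of this demonstration (assuming ROOT; the histogram binning, the number of pseudo-experiments and the seed are illustrative choices, not taken from the original macro):

// Minimal sketch (assumes ROOT): the average of n uniform variables approaches a Gaussian.
#include "TH1D.h"
#include "TRandom3.h"

void clt_demo(int n = 5, int nexp = 10000) {
   TRandom3 rng(0);
   TH1D h("h", "<x> for n uniform variables in [0,10]", 100, 0, 10);
   for (int i = 0; i < nexp; ++i) {
      double sum = 0;
      for (int j = 0; j < n; ++j) sum += rng.Uniform(0, 10);
      h.Fill(sum / n);                  // average of the n values
   }
   h.Fit("gaus");                       // compare the resulting shape with a Gaussian
   h.DrawCopy();
}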
Uniform ("flat") distribution
• Standard deviation: σ = (b − a)/√12 for a uniform distribution in [a, b].
• Model for the position of rain drops, the time of cosmic-ray passage, etc.
• Basic distribution for pseudo-random number generation.
ROOT::Math::uniform_pdf(x, a, b)
x = gRandom->Uniform(a, b);
Cumulative Distribution
• Given a PDF f(x), the cumulative distribution is defined as F(x) = ∫ f(x′) dx′ from −∞ to x.
• The value F(x) is uniformly distributed in [0, 1].
• By inverting the cumulative distribution one can generate pseudo-random numbers according to any distribution (see the sketch below).
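A minimal sketch of this inverse-transform technique (assuming ROOT for the uniform generator; the exponential PDF and the value of λ are illustrative choices): for f(x) = λ e^(−λx) the cumulative is F(x) = 1 − e^(−λx), so x = −ln(1 − u)/λ with u uniform in (0, 1).

// Minimal sketch (assumes ROOT): generate exponentially distributed numbers
// by inverting the cumulative distribution F(x) = 1 - exp(-lambda*x).
#include <cmath>
#include "TH1D.h"
#include "TRandom3.h"

void invert_cdf_demo(double lambda = 2.0, int nevents = 100000) {
   TRandom3 rng(0);
   TH1D h("h", "x from the inverted exponential CDF", 100, 0, 5);
   for (int i = 0; i < nevents; ++i) {
      const double u = rng.Uniform();                // u uniform in (0,1)
      const double x = -std::log(1.0 - u) / lambda;  // x = F^{-1}(u)
      h.Fill(x);
   }
   h.DrawCopy();   // should follow lambda * exp(-lambda * x)
}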
Example of Cumulative Distributions
[Figure: normal_pdf(x); normal_cdf(x) and its complement normal_cdf_c(x) as functions of x; the quantiles normal_quantile(p) and normal_quantile_c(p) as functions of p.]
• Probability density function:
  – ROOT::Math::normal_pdf(x, σ, μ)
• Cumulative distribution and its complement (right-tail integral):
  – ROOT::Math::normal_cdf(x, σ, μ)
  – ROOT::Math::normal_cdf_c(x, σ, μ)
• Inverse of the cumulative distributions (quantile functions):
  – ROOT::Math::normal_quantile(p, σ)
  – ROOT::Math::normal_quantile_c(p, σ)
Poisson Distribution
• Probability to have n entries in a small subset x of a much larger sample X (X >> x).
• Limit of the binomial distribution when p = x/X = 𝜈/N << 1:
  – P(n | 𝜈, N) for N → ∞ becomes Poisson(n | 𝜈)
• The limit of the Poisson distribution for large 𝜈 is a Gaussian.
ROOT::Math::poisson_pdf(n,𝝂)
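For reference, the standard Poisson probability for observing n counts when 𝜈 are expected, with mean and variance both equal to 𝜈, is

P(n | 𝜈) = (𝜈^n / n!) e^(−𝜈)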
Poisson limit for large 𝜈
• The Poisson distribution becomes a Gaussian for large 𝜈.
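A minimal numerical check of this limit (assuming ROOT's MathCore; the value 𝜈 = 100 and the scan range are illustrative choices), comparing the Poisson probabilities with a Gaussian of mean 𝜈 and width √𝜈:

// Minimal sketch (assumes ROOT): compare Poisson(n | nu) with a Gaussian
// of mean nu and sigma = sqrt(nu) for large nu.
#include <cmath>
#include <cstdio>
#include "Math/DistFunc.h"   // ROOT::Math::poisson_pdf, ROOT::Math::normal_pdf

int main() {
   const double nu = 100.0;
   for (int n = 80; n <= 120; n += 5) {
      const double pois  = ROOT::Math::poisson_pdf(n, nu);
      const double gauss = ROOT::Math::normal_pdf(n, std::sqrt(nu), nu);
      std::printf("n = %3d  Poisson = %.5f  Gaussian = %.5f\n", n, pois, gauss);
   }
   return 0;
}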
Crystal Ball Function
• Adds an asymmetric power-law tail to a Gaussian PDF, with proper normalisation and continuity of the PDF and of its derivative.

ROOT::Math::crystalball_pdf(x, α, n, σ, μ)
TF1 f("f", "crystalballn", xmin, xmax)
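For reference, the standard (un-normalised) form behind this description, with t = (x − μ)/σ:

f(t) ∝ exp(−t²/2)        for t > −|α|
f(t) ∝ A (B − t)^(−n)    for t ≤ −|α|
with A = (n/|α|)^n exp(−α²/2) and B = n/|α| − |α|

The constants A and B are exactly the values that make the PDF and its first derivative continuous at t = −|α|.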
Landau Distribution
• Models the fluctuations of the energy loss of particles in thin layers.

ROOT::Math::landau_pdf(x, s, m)
TF1 f("f", "landaun", xmin, xmax)
Bayes Theorem
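For reference, the theorem in its basic form follows from the definition of conditional probability given earlier: since P(A ∩ B) = P(A | B) P(B) = P(B | A) P(A), dividing by P(B) gives

P(A | B) = P(B | A) P(A) / P(B)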
A concrete example
• A person received a positive diagnosis for a serious illness.
• The probability that the test detects an ill person (positive result) is ~100%.
• The probability that the test gives a positive result on a healthy person is 0.2%.
• What is the probability that the person is really ill?
• Is 99.8% a reasonable answer?
Result using Bayes theorem
• We know:
  – P(+ | ill) ~ 100% → P(− | ill) << 1
  – P(+ | healthy) = 0.2% → P(− | healthy) = 99.8%
• Using Bayes theorem we want to know:
  – P(ill | +) = P(+ | ill) P(ill) / P(+) ~ P(ill) / P(+)
• We need to know:
  – P(ill) = probability that a random person is ill, << 1
  – P(healthy) = 1 − P(ill)
• We also have:
  – P(+) = P(+ | ill) P(ill) + P(+ | healthy) P(healthy) ~ P(ill) + P(+ | healthy)
• Result: P(ill | +) ~ P(ill) / ( P(ill) + P(+ | healthy) )
Result from Bayes theorem (2)
• Result: P(ill | +) ~ P(ill) / ( P(ill) + P(+ | healthy) )
• Using some numbers:
  – P(ill) = 0.1%
  – P(+ | healthy) = 0.2%
• Then we have: P(ill | +) = 0.1 / (0.1 + 0.2) = 33%
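As a cross-check, a worked evaluation with the same numbers and without the approximations gives essentially the same answer:

P(ill | +) = P(+ | ill) P(ill) / [ P(+ | ill) P(ill) + P(+ | healthy) P(healthy) ]
           = (1.0 × 0.001) / (1.0 × 0.001 + 0.002 × 0.999)
           = 0.001 / 0.002998 ≈ 0.33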
Likelihood Function
• Likelihood function:
  – given some observed events x1, …, xn,
  – the likelihood function is the PDF of the observed variables x1, …, xn, seen as a function of the parameters:
  – L(x1, …, xn | 𝛳1, …, 𝛳n)
• Bayes theorem can then be written as
  posterior = likelihood × prior / normalisation term:
  P(𝛳 | x) = L(x | 𝛳) π(𝛳) / ∫ L(x | 𝛳′) π(𝛳′) d𝛳′
  where L(x | 𝛳) is the likelihood function, π(𝛳) is the prior probability, and the integral in the denominator is the normalisation term.
Repeated use of Bayes theorem
Bayesian Inference
• The posterior summarises all the information on the unknown parameters θ given the data.
• From the posterior one can estimate the best parameter values and probability intervals (credible intervals).
• The result depends on the prior distribution.
How to compute the Posterior PDF
• Perform analytical integration
  – feasible only in very few simple cases
• Use numerical integration
  – may be CPU intensive
  – difficult for large multi-dimensional cases
• Markov Chain Monte Carlo
  – samples the parameter space efficiently using a random walk heading towards the regions of higher probability
  – Metropolis algorithm to sample according to a PDF f(x)
Markov-Chain Monte Carlo
• Available in ROOT in the RooStats package.
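A minimal sketch of the Metropolis algorithm mentioned above (assuming ROOT only for the random generator; the target density, step size and starting point are illustrative choices, and a real analysis would typically use the RooStats machinery such as MCMCCalculator instead):

// Minimal sketch of the Metropolis algorithm: a random walk whose stationary
// distribution is proportional to the target density f(x).
#include <cmath>
#include "TH1D.h"
#include "TRandom3.h"

double target(double x) {              // un-normalised target density (illustrative)
   return std::exp(-0.5 * x * x) * (2.0 + std::sin(3.0 * x));
}

void metropolis_demo(int nsteps = 200000, double stepSize = 1.0) {
   TRandom3 rng(0);
   TH1D h("h", "Metropolis samples of f(x)", 100, -5, 5);
   double x = 0.0;                     // starting point of the chain
   for (int i = 0; i < nsteps; ++i) {
      const double xNew = x + rng.Uniform(-stepSize, stepSize);  // propose a move
      // Accept with probability min(1, f(xNew)/f(x)); otherwise stay at x.
      if (rng.Uniform() < target(xNew) / target(x)) x = xNew;
      h.Fill(x);
   }
   h.DrawCopy();                       // should follow the shape of target(x)
}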
Problems with the Bayesian approach
• Bayesian probability is subjective:
  – it depends on prior probabilities, i.e. on degrees of belief about the unknown parameters.
• Problem of how to represent lack of knowledge:
  – e.g. a uniform distribution is not invariant under coordinate transformations
    • a prior uniform in log 𝛳 is scale-invariant
  – Jeffreys prior: a prior invariant under parameter transformations.
• It is recommended to study the sensitivity of the result to the chosen prior PDF.
Frequentist vs Bayesian Inference
Parameter Estimation
• Parameter estimates
• Likelihood function
• Maximum Likelihood method
• Properties of estimators
Statistical Inference
Parameter estimators
Likelihood Function
Maximum Likelihood Estimates
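For reference, a compact statement of the method for independent observations x1, …, xn with PDF f(x; 𝛳):

𝛳̂ = value of 𝛳 that maximises L(x1, …, xn | 𝛳) = ∏ᵢ f(xᵢ; 𝛳),
equivalently the value that minimises −ln L(𝛳) = −∑ᵢ ln f(xᵢ; 𝛳),
which is the quantity typically minimised numerically (e.g. with Minuit in ROOT fits).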
Gaussian approximation
Estimator properties
• Consistency
• Bias
• Efficiency
• Robustness
Estimator consistency
Bias
Efficiency
Robustness
Parameter uncertainties with ML
Error Determination
Hypothesis Testing
• Definition of hypothesis testing
• Neyman-Pearson lemma and likelihood ratio
Hypothesis Tests
Hypothesis Test
• H0: null hypothesis
  – the hypothesis we try to reject (to prove false)
  – e.g. the data contain only background (no Higgs signal)
• H1: alternative hypothesis
  – e.g. the data contain signal (Higgs) plus background
• α: significance level, the probability to reject H0 when it is true (error of the first kind)
  – in an event selection this is the background misidentification probability
• β: the probability to accept H0 when H1 is true (error of the second kind)
  – power (probability to reject H0 if H1 is true) = 1 − β
  – in an event selection the power 1 − β is the signal selection efficiency
Example: Cut analysis
Likelihood Ratio
Neyman-Pearson Lemma
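For reference, the lemma states that, for a test of H0 against H1 at a given significance level α, the most powerful test is based on the likelihood ratio:

λ(x) = L(x | H1) / L(x | H0), rejecting H0 when λ(x) > kα,
with the cut kα chosen to give the desired significance level α.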
Summary
• In the next lectures we will look at how to:
  – use multivariate (machine learning) methods for classification and more
  – estimate the parameter uncertainties (errors) in maximum likelihood fits
  – estimate confidence intervals
  – use hypothesis tests to estimate the discovery significance of new particles
• We will complement this with examples in TMVA, RooFit and RooStats.