Maria Grazia Pia, INFN Genova Statistical Toolkit Power of Goodness-of-Fit tests M.G. Pia 1 B....
-
Upload
hayden-connolly -
Category
Documents
-
view
214 -
download
1
Transcript of Maria Grazia Pia, INFN Genova Statistical Toolkit Power of Goodness-of-Fit tests M.G. Pia 1 B....
Maria Grazia Pia, INFN Genova
Statistical ToolkitPower of Goodness-of-Fit tests
B. Mascialino1, A. Pfeiffer2, M.G. PiaM.G. Pia11, A. Ribon2, P. Viarengo3
1INFN Genova, Italy 2CERN, Geneva, Switzerland
3IST – National Institute for Cancer Research, Genova, Italy
CHEP 2006Mumbai, 13-17 February 2006
Fluorescence spectrum of Icelandic Basalt8.3 keV beam
Counts
Energy (keV)
Maria Grazia Pia, INFN Genova
Some use casesRegression testing
– Throughout the software life-cycle
Online DAQ– Monitoring detector behaviour w.r.t. a reference
Simulation validation– Comparison with experimental data
Reconstruction– Comparison of reconstructed vs. expected distributions
Physics analysis– Comparisons of experimental distributions (ATLAS vs. CMS Higgs?)– Comparison with theoretical distributions (data vs. Standard Model)
The test statistics computation concerns the agreement
between the two samples’ empirical distribution functions
Historical background…Validation of Geant4 physics models through comparison of
simulation vs. experimental data or reference databases
Maria Grazia Pia, INFN Genova
G.A.P Cirrone, S. Donadio, S. Guatelli, A. Mantero, B. Mascialino, S. Parlati, M.G. Pia, A. Pfeiffer, A. Ribon, P. Viarengo
“A Goodness-of-Fit Statistical Toolkit”IEEE- Transactions on Nuclear Science (2004), 51 (5): 2056-2063.
Releases are publicly downloadable from the web code, documentation etc.
Releases are also distributed with LCG Mathematical Libraries
Also ported to Java, distributed with JAS
Maria Grazia Pia, INFN Genova
Vision of the project
Rigorous software processsoftware process
Basic visionvision– General purpose tool
– Toolkit approach (choice open to users)
– Open source product
– Independent from specific analysis tools
– Easily usable in analysis and other tools
Build on a solid architecturearchitecture
Clearly define scopescope,
objectivesobjectives
Flexible, Flexible, extensible, extensible,
maintainablemaintainable system
Software quality quality
Maria Grazia Pia, INFN Genova
Maria Grazia Pia, INFN Genova
GoF algorithms (latest public release)
Algorithms for binned distributionsAlgorithms for binned distributions – Anderson-Darling test– Chi-squared test – Fisz-Cramer-von Mises test–
Algorithms for unbinned distributionsAlgorithms for unbinned distributions – Anderson-Darling test– Cramer-von Mises test– Goodman test (Kolmogorov-Smirnov test in chi-squared
approximation)
– Kolmogorov-Smirnov test– Kuiper test
Maria Grazia Pia, INFN Genova
Recent extensions: algorithms
It is the most complete software for the comparison of two distributions, even among commercial/professional statistics tools
– goal: provide all 2-sample GoF algorithms existing in statistics literature
Publication in preparation to describe the new algorithmsSoftware release: March 2006
Fisz-Cramer-von Mises test and Anderson-Darling test exact asymptotic distribution (earlier: critical values)
Tiku test Cramer-von Mises test in a chi-squared approximation
Improved tests
New tests Weighted Kolmogorov-Smirnov, weighted Cramer-von Misesvarious weighting functions available in literatureWatson test can be applied in case of cyclic observations, like the Kuiper test
In preparation Girone test
Maria Grazia Pia, INFN Genova
Simple user layerSimple user layer– Shields the user from the complexity of the underlying algorithms and design– Only deal with the user’s analysis objectsanalysis objects and choice of comparison algorithmcomparison algorithm
User Layer
First release: user layer for AIDA analysis objects– LCG Architecture Blueprint, Geant4 requirement
July 2005: added user layer for ROOT histograms– in response to user requirements
Other user layer implementations foreseen– easy to add– sound architecture decouples the mathematical component and the user’s representation of
analysis objects
Maria Grazia Pia, INFN Genova
Power of GoF testsDo weDo we really need really need such a wide such a wide collection of GoF tests? collection of GoF tests? Why?Why?
Which is the most appropriatemost appropriate test to compare two distributions?
How “goodgood” is a test at recognizing real equivalent distributions and rejecting fake ones?
Which test to
use?
No comprehensive study of the relative power of GoF tests exists in literature
– novel research in statisticsnovel research in statistics (not only in physics data analysis!)
SystematicSystematic study of allall existing GoF tests in progress
– made possible by the extensive collection of tests in the Statistical Toolkit
Maria Grazia Pia, INFN Genova
Method for the evaluation of power
N=1000Monte Carlo
replicas
Pseudoexperiment: a random drawing
of two samples from two parent distributions
For each test, the p-value computed by the GoF Toolkit derives from the analytical calculation of the asymptotic distribution, often depending on the samples sizes
Confidence Level = 0.05
Parent distribution 1
Sample 1n
Sample 2m
GoF testGoF test
Parent distribution 2
PowerPower = = # pseudoexperiments with p-value < (1-CL)
# pseudoexperiments
Maria Grazia Pia, INFN Genova
Parent distributions
1)(1 xf
Uniform)
2(
2
2
2
1)(
x
exf
Gaussian
||3
2
1)( xexf
Double exponential
24
1
11)(
xxf
Cauchy
xexf )(5
Exponential
Contaminated Normal Distribution 2
)1,1(5.0)4,1(5.0)(7 xf
)9,0(1.0)1,0(9.0)(6 xf
Contaminated Normal Distribution 1Also Breit-Wigner, other distributions being considered
Maria Grazia Pia, INFN Genova
Characterization of distributions
025.05.0
5.0975.0
xx
xxS
125.0875.0
025.0975.0
xx
xxT
Parent distributionParent distribution SS TTf1(x) Uniform 1 1.267
f2(x) Gaussian 1 1.704
f3(x) Double exponential 1 2.161
f4(x) Cauchy 1 5.263
f5(x) Exponential 4.486 1.883
f6(x) Contamined normal 1 1 1.991
f7(x) Contamined normal 2 1.769 1.693
SkewnessSkewness TailweightTailweight
Maria Grazia Pia, INFN Genova
General alternativeCompare different distributions
Parent1 ≠ Parent2
Unbinned distributions
Maria Grazia Pia, INFN Genova
The power increases as a function of the sample size
FLAT vs
EXPONENTIAL
Sample size
Em
piric
al p
ower
(%
)
Symmetricvs
skewed
Short tailedvs
Medium tailed
Sample size
Em
piric
al p
ower
(%
)
Symmetricvs
SkewedMedium tailed
vsMedium tailed
ADAD KK WW
KSCvMADKW
DOUBLE EXPONENTIALvs
CN1
Sample size
Em
piric
al p
ower
(%
)Medium tailed
vsMedium tailed
Symmetricvs
Symmetric
DOUBLE EXPONENTIAL vs
EXPONENTIAL
Sample size
Em
piric
al p
ower
(%
)
Medium tailedvs
Medium tailed
Symmetricvs
Simmetric
No clear winnerNo clear winner
CN1 vs
CN2
Maria Grazia Pia, INFN Genova
GAUSSIANvs
DOUBLE EXPONENTIAL
Samples size
Em
piric
al p
ower
(%
)
Very similar
distributions
Simmetricvs
SimmetricMedium tailed
vsLong tailed
The power increases as a function of the sample size
Sample size
Em
piric
al p
ower
(%
)
CAUCHYvs
EXPONENTIAL
Long tailedvs
Medium tailed
Symmetricvs
Asymmetric
ADAD
GAUSSIANvs
DOUBLE EXPONENTIAL
Sample size
Em
piric
al p
ower
(%
)
Symmetricvs
SymmetricMedium tailed
vsLong tailed
CvMCvM
GAUSSIAN vs
CN2
Sample size
Em
piric
al p
ower
(%
)Symmetric
vsskewed
Medium tailedvs
Medium tailed
CvMCvMADAD
KSCvMADKW
GAUSSIANvs
CN1
Sample size
Em
piric
al p
ower
(%
)
Symmetricvs
Symmetric
Medium tailedvs
Medium tailed
CvMCvM KSKS
Maria Grazia Pia, INFN Genova
The power varies as a function of the parent distributions’ characteristics
Samples size = 5
EXPONENTIALvs
OTHER DISTRIBUTIONS
Tailweight 2ND distribution
Em
piric
al p
ower
(%
) Samples size = 15
Tailweight 2ND distribution
Em
piric
al p
ower
(%
)
EXPONENTIALvs
OTHER DISTRIBUTIONS
Em
piric
al p
ower
(%
)
Tailweight 2ND distribution
Em
piric
al p
ower
(%
)
Tailweight 2ND distribution
Samples size = 5 Samples size = 15
FLATvs
OTHER DISTRIBUTIONS
FLATvs
OTHER DISTRIBUTIONS
KSCvMADKW
Sample size = 5
EXPONENTIALvs
OTHER DISTRIBUTIONS
Tailweight 2ND distribution
Em
piric
al p
ower
(%
) Sample size = 15
Tailweight 2ND distribution
Em
piric
al p
ower
(%
)
EXPONENTIALvs
OTHER DISTRIBUTIONS
Em
piric
al p
ower
(%
)
Tailweight 2ND distribution
Em
piric
al p
ower
(%
)
Tailweight 2ND distribution
Distribution1asymmetric
Distribution1symmetric
FLATvs
OTHER DISTRIBUTIONS
FLATvs
OTHER DISTRIBUTIONS
Maria Grazia Pia, INFN Genova
The power varies as a function of parent distributions’ characteristics
Sample size = 5
EXPONENTIALvs
OTHER DISTRIBUTIONS
Tailweight 2ND distribution
Em
piric
al p
ower
(%
) Sample size = 15
Tailweight 2ND distribution
Em
piric
al p
ower
(%
)
EXPONENTIALvs
OTHER DISTRIBUTIONS
Em
piric
al p
ower
(%
)
Tailweight 2ND distribution
Em
piric
al p
ower
(%
)
Tailweight 2ND distribution
Sample size = 5 Sample size = 15
FLATvs
OTHER DISTRIBUTIONS
FLATvs
OTHER DISTRIBUTIONS
KSCvMADKW
Distribution1asymmetric
Distribution1symmetric
Maria Grazia Pia, INFN Genova
Comparative evaluation of tests
ShortShort
(T(T<1.5)<1.5)
MediumMedium
(1.5 < T < 2)(1.5 < T < 2)
LongLong
(T>2)(T>2)
SS~~11 KS KS – CVM CVM - AD
SS>1.5>1.5 KS - AD CVM-AD CVM - ADSke
wne
ssS
kew
ness
TailweightTailweightPreliminary
Maria Grazia Pia, INFN Genova
Location-scale alternativeSame distribution, shifted or
scaled
Parent1(x) = Parent2 ((x-θ)/τ)
Maria Grazia Pia, INFN Genova
Power increases as a function of sample size
KSCvMADKW
Em
piric
al p
ower
(%
)
Sample size
EXPONENTIAL
θ =0.5, τ = 0.5
Em
piric
al p
ower
(%
)
Sample size
θ =0.5, τ = 1.5
EXPONENTIAL
KK WW
CvMCvM
KSKS
Em
piric
al p
ower
(%
)
Sample size
EXPONENTIAL
θ =1.0, τ = 0.5
No clear No clear winnerwinner
Sample size
Em
piric
al p
ower
(%
)
EXPONENTIAL
θ =1.0, τ = 1.5
CvMCvM
KSKS
Maria Grazia Pia, INFN Genova
Power increases as a function of sample size
Sample size
Em
piric
al p
ower
(%
)
Em
piric
al p
ower
(%
)
Em
piric
al p
ower
(%
)
Em
piric
al p
ower
(%
)
Sample size
Sample size Sample size
KSCvMADKW
θ =0.5, τ = 0.5
θ =1.0, τ = 0.5
θ =0.5, τ = 1.5
θ =1.0, τ = 1.5
CN2
CN2
CN2
CN2
No clear winnerNo clear winner
WW KK
Maria Grazia Pia, INFN Genova
Power decreases as a function of tailweight
No clear No clear winnerwinner
KSCvMADKW
Tailweight
Em
piric
al p
ower
(%
)
θ =1.0, τ = 0.5
N=10
TailweightEm
piric
al p
ower
(%
)
θ =1.0, τ = 1.5
N=10
TailweightEm
piric
al p
ower
(%
) θ =0.5, τ = 0.5
N=10
Em
piric
al p
ower
(%
)
Tailweight
θ =0.5, τ = 1.5
N=10
No clear No clear winnerwinner
Maria Grazia Pia, INFN Genova
Binned distributions
General alternative Compare different distributions
Parent1 ≠ Parent2
Maria Grazia Pia, INFN Genova
Chi-squared test: POWER %
Norm DoubleExp Cauchy CN1
DoubleExp 20
Cauchy 63 35
CN1 21 31 16
CN2 100 99 55 86
Samples size = 500Number of bins = 20
Maria Grazia Pia, INFN Genova
Preliminary results
No clear winner clear winner for all the considered distributions in general– the performance of a test depends on its intrinsic features as well as on
the features of the distributions to be compared
Practical recommendations
1) first classify the type of the distributions in terms of skewness and tailweight
2) choose the most appropriate test given the type of distributions
Systematic study of the power in progress– for both binned and unbinned distributions
Topic still subject to research activity in the domain of statistics
Publication in preparation
Maria Grazia Pia, INFN Genova
Surprise…
KSCvMADKW
KSCvMADKW
KSCvMADKW
FlatGaussian
Exponential
Inefficiency
Inefficiency
Inefficiency
General alternative, same distributions
CL = 95%, expect 5% inefficiency
Anderson-DarlingAnderson-Darling exhibits an unexpected inefficiencyinefficiency at low numerosity
Not documented anywhere in literature!
Limitation of applicability?
Maria Grazia Pia, INFN Genova
Outlook
1-sample GoF tests (comparison w.r.t. a function)
Comparison of two/multi-dimensional distributions
Systematic study of the power of GoF tests
Goal to provide an extensive set of algorithms so far published in statistics literature, with a critical evaluation of their relative strengths and applicability
Treatment of errors, filtering
New release coming soon
New papers in preparation
Other components beyond GoF? Suggestions are welcome…
Maria Grazia Pia, INFN Genova
ConclusionsA novel, complete software toolkit for statistical analysis is being developed
– rich set of algorithms– sound architectural design– rigorous software process
A systematic study of the power of GoF tests is in progress– unexplored area of research
Application in various domains– Geant4, HEP, space science, medicine…
Feedback and suggestions are very much appreciated
The project is open to developers interested in statistical methods
Maria Grazia Pia, INFN Genova
IEEE Transactions on Nuclear ScienceIEEE Transactions on Nuclear Sciencehttp://ieeexplore.ieee.org/xpl/RecentIssue.jsp?puNumber=23
Prime journal on technology in particle/nuclear physics
Review process reorganized about one year ago Associate Editor dedicated to computing papers
Various papers associated to CHEP 2004 published on IEEE TNS
Papers associated to CHEP 2006 are welcomePapers associated to CHEP 2006 are welcome
Manuscript submission: http://tns-ieee.manuscriptcentral.com/Papers submitted for publication will be subject to the regular review process
Publications on refereed journals are beneficial not only to authors, but to the whole community of computing-oriented physicists
Our “hardware colleagues” have better established publication habits…
Further info: [email protected]