Sept. 12-15, 2005 M. Block, Phystat 05, Oxford

PHYSTAT 05 - Oxford 12th - 15th September 2005

Statistical problems in Particle Physics, Astrophysics and Cosmology

“Sifting data in the real world”

Martin Block, Northwestern University


“Sifting Data in the Real World”, M. Block, arXiv:physics/0506010 (2005).

“Fishing” for Data


Generalization of the Maximum Likelihood Function


Hence, minimize $\sum_i \rho(z_i)$, or, equivalently, minimize $\chi^2 \equiv \sum_i \Delta\chi^2_i$.
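As a worked check of this equivalence in the Gaussian case, assume (as is standard for this kind of generalized likelihood, and not spelled out in the surviving text) that $\rho(z) \equiv -\ln P(z)$, that $z_i = (y_i - y(x_i;\boldsymbol{\alpha}))/\sigma_i$ is the normalized residual of the $i$th point for fit parameters $\boldsymbol{\alpha}$, and that $\Delta\chi^2_i = z_i^2$:

$$
P(z) = \frac{1}{\sqrt{2\pi}}\,e^{-z^2/2}
\;\Longrightarrow\;
\rho(z) = \frac{z^2}{2} + \frac{1}{2}\ln 2\pi
\;\Longrightarrow\;
\sum_{i=1}^{N}\rho(z_i) = \frac{1}{2}\sum_{i=1}^{N}\Delta\chi^2_i + \frac{N}{2}\ln 2\pi = \frac{\chi^2}{2} + \text{const}.
$$

The constant does not depend on $\boldsymbol{\alpha}$, so minimizing $\sum_i \rho(z_i)$ and minimizing $\chi^2$ give the same best fit. A broader-tailed $P(z)$ changes $\rho$, and with it how strongly large residuals are penalized.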


Problem with Gaussian Fit when there are Outliers


Robust Feature: $w(z) \propto 1/\Delta\chi^2_i$ for large $\Delta\chi^2_i$
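The robust feature can be checked numerically. The sketch below assumes the usual M-estimator weight $w(z) = \frac{1}{z}\frac{d\rho}{dz}$ and two illustrative choices: $\rho(z) = z^2/2$ for the Gaussian, and $\rho(z) = \ln(1 + 0.179\,z^2)$ for the Lorentzian, which is the form that sums to the talk's $\Lambda_0^2$. The function names are hypothetical.

```python
GAMMA = 0.179  # Lorentzian tuning constant quoted in the talk


def w_gauss(z: float) -> float:
    """Weight (1/z) d(rho)/dz for rho(z) = z**2 / 2: constant, so outliers keep full weight."""
    return 1.0


def w_lorentz(z: float) -> float:
    """Weight (1/z) d(rho)/dz for rho(z) = ln(1 + GAMMA * z**2)."""
    return 2.0 * GAMMA / (1.0 + GAMMA * z * z)


for z in (0.0, 1.0, 3.0, 10.0, 100.0):
    dchi2 = z * z
    # For large dchi2 the product w * dchi2 levels off at 2, i.e. w is proportional to 1/dchi2.
    print(f"z={z:6.1f}  w_gauss={w_gauss(z):.3f}  "
          f"w_lorentz={w_lorentz(z):.5f}  w_lorentz*dchi2={w_lorentz(z) * dchi2:.4f}")
```

The Gaussian gives every point the same weight however far it lies from the fit. The Lorentzian weight falls off as $1/\Delta\chi^2_i$, so a distant outlier pulls on the parameters only weakly.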


Lorentzian Fit used in “Sieve” Algorithm


Why choose the normalization constant 0.179 in the Lorentzian $\Lambda_0^2$?

Computer simulations show that the choice 0.179 tunes the Lorentzian so that minimizing $\Lambda_0^2$ on Gaussian-distributed data gives the same central values, and approximately the same parameter errors, as a conventional $\chi^2$ fit to the same data.

If there are no outliers, it gives the same answers as a $\chi^2$ fit.

Hence, using the tuned Lorentzian $\Lambda_0^2$, much like following the Hippocratic oath, does “no harm”.
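Here is a rough numerical check of the “no harm” claim. It rests on assumptions of our own: a constant model $y = \mu$ with unit errors, 1000 Gaussian points around an arbitrary value of 10, a fixed seed, and $\Lambda_0^2 = \sum_i \ln(1 + 0.179\,\Delta\chi^2_i)$ minimized by iteratively reweighted least squares (IRLS). IRLS is a standard technique chosen for this sketch, not something the talk specifies. Setting $d\Lambda_0^2/d\mu = 0$ gives a weighted mean with weights $1/(1 + 0.179\,\Delta\chi^2_i)$, and IRLS recomputes those weights until $\mu$ settles.

```python
import math
import random

GAMMA = 0.179  # Lorentzian tuning constant quoted in the talk


def lambda0_sq(ys, mu, sigma):
    """Lambda_0^2 = sum_i ln(1 + GAMMA * dchi2_i) for the constant model y = mu."""
    return sum(math.log1p(GAMMA * ((y - mu) / sigma) ** 2) for y in ys)


def lorentzian_mean(ys, sigma, iters=100):
    """Minimize Lambda_0^2 by IRLS, starting from the chi^2 answer (the plain mean)."""
    mu = sum(ys) / len(ys)
    for _ in range(iters):
        ws = [1.0 / (1.0 + GAMMA * ((y - mu) / sigma) ** 2) for y in ys]
        mu = sum(w * y for w, y in zip(ws, ys)) / sum(ws)
    return mu


rng = random.Random(2005)  # arbitrary seed
sigma = 1.0
ys = [rng.gauss(10.0, sigma) for _ in range(1000)]  # purely Gaussian data, no outliers
mu_chi2 = sum(ys) / len(ys)          # chi^2 minimum for a constant with equal errors
mu_lor = lorentzian_mean(ys, sigma)  # Lambda_0^2 minimum
print(f"chi^2 fit: {mu_chi2:.4f}   Lambda_0^2 fit: {mu_lor:.4f}")
```

With purely Gaussian data the two central values should agree to well within the statistical error, which is about $1/\sqrt{1000} \approx 0.03$ here. Because $\ln(1 + 0.179\,t)$ is concave in $t = \Delta\chi^2_i$, each IRLS step cannot increase $\Lambda_0^2$. The sketch does not try to reproduce the talk's comparison of parameter errors.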


“Sieve’’ Algorithm: SUMMARY


All cross-section data for $E_{\rm cms} > 6$ GeV, pp and p̄p, from the Particle Data Group


All ρ-value data (ratio of the real to the imaginary part of the forward scattering amplitude) for $E_{\rm cms} > 6$ GeV, pp and p̄p, from the Particle Data Group


We use real analytic amplitudes that saturate the Froissart bound through the term $\ln^2(\nu/m)$, where $\nu$ is the laboratory energy and $m$ is the proton (pion) mass. We simultaneously fit the cross section and $\rho$, the ratio of the real to the imaginary part of the forward scattering amplitude.
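To get a sense of the size of the $\ln^2(\nu/m)$ term, the sketch below converts a center-of-mass energy into $\nu/m$ for pp scattering. It assumes fixed-target kinematics with equal masses, $s = 2m^2 + 2m\nu$, so that $\nu/m = s/(2m^2) - 1$. The function name is hypothetical, and the full fitted amplitude is not reproduced.

```python
import math

M_PROTON = 0.938272  # proton mass in GeV


def log2_nu_over_m(sqrt_s_gev: float, m: float = M_PROTON) -> float:
    """ln^2(nu/m) for pp, using s = 2 m^2 + 2 m nu (target at rest, equal masses)."""
    s = sqrt_s_gev ** 2
    nu_over_m = s / (2.0 * m * m) - 1.0
    return math.log(nu_over_m) ** 2


for sqrt_s in (6.0, 14000.0):  # fit threshold and LHC design energy, in GeV
    print(f"sqrt(s) = {sqrt_s:8.0f} GeV  ->  ln^2(nu/m) = {log2_nu_over_m(sqrt_s):7.2f}")
```

$\ln(\nu/m)$ grows from about 3.0 at the 6 GeV threshold to about 18.5 at 14 TeV. The squared term therefore grows by a factor of roughly 40 across that range.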

Fitting the “Sieved” pp and p̄p data with analytic amplitudes


Only 3 Free Parameters

However, only two, $c_1$ and $c_2$, are needed in the cross-section fits!


Cross-section model fits for $E_{\rm cms} > 6$ GeV, anchored at 4 GeV, pp and p̄p, after applying the “Sieve” algorithm to real-world data


ρ-value fits for $E_{\rm cms} > 6$ GeV, anchored at 4 GeV, pp and p̄p, after applying the “Sieve” algorithm


What the “Sieve” algorithm accomplished for the pp and p̄p data

Before imposing the “Sieve” algorithm:

$\chi^2/\text{d.f.} = 5.7$ for 209 degrees of freedom;

total $\chi^2 = 1182.3$.

After imposing the “Sieve” algorithm:

Renormalized $\chi^2/\text{d.f.} = 1.09$ for 184 degrees of freedom, for the $\Delta\chi^2_i > 6$ cut;

total $\chi^2 = 201.4$.

Probability of fit ~0.2.

The 25 rejected points contributed 981 to the total $\chi^2$, an average $\Delta\chi^2_i$ of ~39 per point.

Similar results were found when fitting the π⁺p and π⁻p data from the Particle Data Group (not shown for lack of time!).
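The numbers on this slide can be cross-checked with a few lines of arithmetic. The fit probability uses the Wilson-Hilferty normal approximation to the $\chi^2$ distribution. That approximation is a substitute chosen to keep the sketch within the Python standard library; it is not the method used in the talk.

```python
import math

chi2_before, dof_before = 1182.3, 209  # all data
chi2_after, dof_after = 201.4, 184     # after the Delta chi^2_i > 6 sieve (renormalized)
chi2_rejected = 981.0                  # contribution of the rejected points

n_rejected = dof_before - dof_after    # same number of fit parameters in both fits
print("chi2/dof before:", round(chi2_before / dof_before, 2))
print("chi2/dof after: ", round(chi2_after / dof_after, 2))
print("rejected points:", n_rejected, " mean Delta chi2_i:", round(chi2_rejected / n_rejected, 1))


def chi2_sf_wilson_hilferty(chi2: float, dof: int) -> float:
    """Approximate P(X >= chi2) for X ~ chi^2(dof) via the Wilson-Hilferty cube-root transform."""
    mean = 1.0 - 2.0 / (9.0 * dof)
    sd = math.sqrt(2.0 / (9.0 * dof))
    z = ((chi2 / dof) ** (1.0 / 3.0) - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2.0))


p_fit = chi2_sf_wilson_hilferty(chi2_after, dof_after)
print("fit probability ~", round(p_fit, 2))
```

The approximation gives a probability of about 0.18, consistent with the quoted ~0.2.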


Cross-section and ρ-value predictions for pp and p̄p

The errors are due to the statistical uncertainties in the fitted parameters

LHC prediction

Cosmic Ray Prediction


100 data points, Gaussian-distributed about the straight line $y = 1 - 2x$; 20 noise points, randomly distributed, with $\Delta\chi^2_i > 6$.

After the $\Delta\chi^2_i > 6$ cut:

Best fit is $y = 0.998 - 2.014x$; $\mathcal{R}\chi^2_{\min}/\nu = 1.01$; the fit to all data has $\chi^2_{\min}/\nu = 4.8$.
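The straight-line study can be imitated as follows. This is a reconstruction under our own assumptions, not the talk's code. The $x$ range, the error $\sigma = 0.1$, the seed, and the noise model are all ours; noise offsets run from $\sqrt{6}\,\sigma$ to $10\sigma$ with random sign, so each noise point has $\Delta\chi^2_i > 6$. The robust step minimizes $\Lambda_0^2$ by iteratively reweighted least squares with weights $1/[\sigma_i^2(1 + 0.179\,\Delta\chi^2_i)]$. The sieve then drops points with $\Delta\chi^2_i > 6$ and refits with an ordinary $\chi^2$. All function names are hypothetical.

```python
import math
import random

GAMMA = 0.179  # Lorentzian tuning constant quoted in the talk


def wls_line(xs, ys, ws):
    """Weighted least-squares fit of y = a + b*x via the closed-form normal equations."""
    sw = sum(ws)
    sx = sum(w * x for w, x in zip(ws, xs))
    sy = sum(w * y for w, y in zip(ws, ys))
    sxx = sum(w * x * x for w, x in zip(ws, xs))
    sxy = sum(w * x * y for w, x, y in zip(ws, xs, ys))
    det = sw * sxx - sx * sx
    return (sxx * sy - sx * sxy) / det, (sw * sxy - sx * sy) / det


def dchi2(xs, ys, sig, a, b):
    """Per-point Delta chi^2_i = ((y_i - a - b*x_i) / sigma_i)^2."""
    return [((y - a - b * x) / s) ** 2 for x, y, s in zip(xs, ys, sig)]


def lorentzian_line(xs, ys, sig, iters=100):
    """Minimize Lambda_0^2 by IRLS, starting from the ordinary chi^2 fit."""
    a, b = wls_line(xs, ys, [1 / s**2 for s in sig])
    for _ in range(iters):
        d = dchi2(xs, ys, sig, a, b)
        a, b = wls_line(xs, ys, [1 / (s**2 * (1 + GAMMA * di)) for s, di in zip(sig, d)])
    return a, b


def sieve_line(xs, ys, sig, cut=6.0):
    """Robust fit, drop points with Delta chi^2_i > cut, then refit with plain chi^2."""
    a, b = lorentzian_line(xs, ys, sig)
    keep = [i for i, d in enumerate(dchi2(xs, ys, sig, a, b)) if d <= cut]
    kx = [xs[i] for i in keep]
    ky = [ys[i] for i in keep]
    ks = [sig[i] for i in keep]
    a, b = wls_line(kx, ky, [1 / s**2 for s in ks])
    chi2 = sum(dchi2(kx, ky, ks, a, b))
    return a, b, chi2 / (len(keep) - 2), len(keep)  # raw (not renormalized) chi^2/nu


rng = random.Random(12)  # arbitrary seed
SIGMA = 0.1
xs = [rng.uniform(0.0, 2.0) for _ in range(120)]
ys = [1 - 2 * x + rng.gauss(0.0, SIGMA) for x in xs[:100]]  # 100 Gaussian points
ys += [1 - 2 * x + rng.choice((-1, 1)) * rng.uniform(math.sqrt(6), 10) * SIGMA
       for x in xs[100:]]                                    # 20 noise points with Delta chi^2_i > 6
sig = [SIGMA] * len(xs)

a_all, b_all = wls_line(xs, ys, [1 / s**2 for s in sig])
chi2_all = sum(dchi2(xs, ys, sig, a_all, b_all)) / (len(xs) - 2)
a, b, chi2_nu, n_kept = sieve_line(xs, ys, sig)
print(f"all data: y = {a_all:.3f} {b_all:+.3f}x, chi2/nu = {chi2_all:.2f}")
print(f"sieved:   y = {a:.3f} {b:+.3f}x, chi2/nu = {chi2_nu:.2f} ({n_kept} points kept)")
```

The printed values depend on the seed and are not the talk's results. The sieved $\chi^2/\nu$ is the raw value, before any $\mathcal{R}$ renormalization.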


100 data points, Gaussian-distributed about the constant $y = 10$; 40 noise points, randomly distributed, with $\Delta\chi^2_i > 4$.

After the $\Delta\chi^2_i > 4$ cut:

Best fit is $y = 9.98$; $\mathcal{R}\chi^2_{\min}/\nu = 1.09$; the fit to all data has $\chi^2_{\min}/\nu = 4.39$.
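A matching sketch for the constant model follows, again under our own assumptions: unit errors, an arbitrary seed, and noise offsets of $2\sigma$ to $8\sigma$ with random sign, so that $\Delta\chi^2_i > 4$. For a constant, the $\Lambda_0^2$ minimum satisfies a weighted-mean condition, so IRLS reduces to recomputing a weighted mean with weights $1/(1 + 0.179\,\Delta\chi^2_i)$. The helper names are hypothetical.

```python
import random

GAMMA = 0.179  # Lorentzian tuning constant quoted in the talk


def lorentzian_mean(ys, sigma, iters=200):
    """Minimize Lambda_0^2 for y = mu by IRLS: a weighted mean, weights 1/(1 + GAMMA*dchi2_i)."""
    mu = sum(ys) / len(ys)
    for _ in range(iters):
        ws = [1.0 / (1.0 + GAMMA * ((y - mu) / sigma) ** 2) for y in ys]
        mu = sum(w * y for w, y in zip(ws, ys)) / sum(ws)
    return mu


def chi2_per_dof(ys, mu, sigma):
    """chi^2/nu for a one-parameter constant fit."""
    return sum(((y - mu) / sigma) ** 2 for y in ys) / (len(ys) - 1)


rng = random.Random(7)  # arbitrary seed
SIGMA, CUT = 1.0, 4.0
data = [rng.gauss(10.0, SIGMA) for _ in range(100)]  # 100 Gaussian points
data += [10.0 + rng.choice((-1, 1)) * rng.uniform(2.0, 8.0) * SIGMA
         for _ in range(40)]                         # 40 noise points with Delta chi^2_i > 4

mu_all = sum(data) / len(data)
mu_rob = lorentzian_mean(data, SIGMA)
kept = [y for y in data if ((y - mu_rob) / SIGMA) ** 2 <= CUT]  # the sieve
mu_sieved = sum(kept) / len(kept)                                # plain chi^2 refit
print(f"all data: mu = {mu_all:.3f}, chi2/nu = {chi2_per_dof(data, mu_all, SIGMA):.2f}")
print(f"sieved:   mu = {mu_sieved:.3f}, chi2/nu = {chi2_per_dof(kept, mu_sieved, SIGMA):.2f} "
      f"({len(kept)} kept)")
```

The raw sieved $\chi^2/\nu$ tends to come out below 1 because the cut also trims the tails of the genuine Gaussian points. That shortfall is what the renormalization factor $\mathcal{R}$ is meant to correct.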


Lessons learned from computer studies of a straight line and a constant model

where $\sigma$ is the parameter error found in the $\chi^2$ fit


$\chi^2_{\rm renorm} = \chi^2_{\rm obs}/\mathcal{R}^{-1}$, $\quad \sigma_{\rm renorm} = r_{\chi^2}\,\sigma_{\rm obs}$,

where $\sigma$ is the parameter error
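In the talk, $\mathcal{R}$ comes from computer studies. A rough analytic cross-check is possible under an assumption of our own. A cut at $\Delta\chi^2_{\max}$ keeps only $|z| < a = \sqrt{\Delta\chi^2_{\max}}$ of a unit Gaussian, and the mean of $z^2$ over that truncated region is $1 - 2a\,\phi(a)/[2\Phi(a) - 1]$, where $\phi$ and $\Phi$ are the standard normal density and distribution. If $\mathcal{R}^{-1}$ is identified with that truncated mean, $\mathcal{R}$ follows directly. The sketch does not attempt $r_{\chi^2}$, its values need not match the simulated ones, and the function names are hypothetical.

```python
import math


def truncated_gauss_mean_z2(cut: float) -> float:
    """Mean of z^2 for a unit Gaussian restricted to z^2 <= cut."""
    a = math.sqrt(cut)
    phi = math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)  # standard normal density at a
    p_inside = math.erf(a / math.sqrt(2.0))                   # P(|z| < a)
    return 1.0 - 2.0 * a * phi / p_inside


def renorm_factor(cut: float) -> float:
    """Approximate R(cut), assuming R^{-1} equals the truncated-Gaussian mean of z^2."""
    return 1.0 / truncated_gauss_mean_z2(cut)


for cut in (9.0, 6.0, 4.0):
    print(f"Delta chi^2_max = {cut:.0f}:  R ~ {renorm_factor(cut):.3f}")
```

This approximation gives $\mathcal{R}$ of about 1.03, 1.11 and 1.29 for cuts at 9, 6 and 4. A tighter cut discards more of the genuine Gaussian tail and therefore needs a larger correction.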


100 data points, Gaussian-distributed about the parabola $y = 1 + 2x + 0.5x^2$; 35 noise points, randomly distributed about the nearby parabola $y = 12 + 2x + 0.2x^2$; we have 13 “inliers”.

After the $\Delta\chi^2_i > 6$ cut, 113 points are kept; the best fit is $y = 1.23 + 2.04x + 0.48x^2$.

BONUS: the method also seems to work reasonably well in separating two similar distributions!

What happens when we try to separate two similar distributions?
