ASTR 633 Astrophysical Techniques

STATISTICS PRIMER (original notes by Pat Henry, edited by Mike Liu and Jonathan Williams)
Astronomers cannot avoid statistics:

1. We always deal with probabilities: observing time is limited, so we want to observe just long enough that there is a high probability we have seen what we sought. Sample sizes are inevitably finite. What is the probability that a particular interesting effect is real?
2. No experimentally determined quantity is of use unless it has an error associated with it. We need statistics to calculate errors.

Here are some of the most common situations where astronomers use statistics:

1. Detection of a signal: is the gamma-ray burst visible in the optical? Have I detected an emission line?
2. Are two quantities correlated? How significantly?
3. Estimate the parameters of a model. What are the errors on the parameters? Was the model reasonable in the first place?
4. Comparison of samples with
   (a) the predictions of a model (do they agree?)
   (b) each other (are they from the same population?)

It generally comes down to common sense:

1. If it doesn't look right, it probably isn't.
2. There are lots of ways to screw up, but only one way to be right.
3. Most results are not revolutionary. Before you start drafting a press release, make sure you haven't made a mistake...

1. Sample and Parent Population

Suppose we make N measurements, x_i, of a quantity x (e.g., multiple measurements of a stellar magnitude, the declinations of N stars in the galactic plane, etc.). These N measurements are called a sample. The parent population is a hypothetical infinite set of measurements of which our original N is assumed to be a random subset. The parent population is the "truth", which we can never obtain. The fundamental task of statistics is to infer the properties of the parent population from the sample. (Note: this is the so-called "frequentist" interpretation of statistics. We'll discuss the alternative Bayesian point of view later.)
1.1 Probability Density Function

The parent population leads to the probability density function, p(x), where p(x)dx is the probability that an x_i will be in the range [x, x+dx]:

$$p(x)\,dx = \frac{A(x)}{A_{\rm tot}}$$

(i.e., the fraction of the parent population falling in that range). Clearly, the total area is unity probability, i.e.

$$\int_{-\infty}^{\infty} p(x)\,dx = 1$$

The probability that a < x < b is

$$P(a < x < b) = \int_{a}^{b} p(x)\,dx$$

Probability is positive-definite: p(x) >= 0.
All quantities of interest are obtained from integrals of p(x). If x is a discrete variable, then the integrals turn into sums.

1.2 Properties of the Parent Population

Location

Mean:

$$\mu \equiv \int_{-\infty}^{\infty} x\,p(x)\,dx = \frac{1}{N}\sum_{i=1}^{N} x_i \quad (N \to \infty)$$

Median: $\mu_{1/2}$, such that

$$\frac{1}{2} = \int_{-\infty}^{\mu_{1/2}} p(x)\,dx = \int_{\mu_{1/2}}^{\infty} p(x)\,dx$$

or, for discrete data,

$$\sum_{x_i < \mu_{1/2}} p(x_i) = \sum_{x_i > \mu_{1/2}} p(x_i)$$

Mode: $\mu_{\rm mod}$, such that

$$p(\mu_{\rm mod}) \ge p(x \ne \mu_{\rm mod})$$

For a symmetrical PDF, these 3 are usually equal. For an asymmetrical PDF, the median is more stable than the mean. A single bad point can bias the mean by a large amount but hardly changes the median. (This is an example of a "robust" statistic.)

However, the median is slightly noisier (or "less efficient") than the mean, with a variance that is $\pi/2 = 1.57\times$ larger (for large N).
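This efficiency penalty is easy to check with a quick simulation (the sample size and trial count below are illustrative choices, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)

# Compare the scatter of the sample mean and sample median over many
# Gaussian data sets: Var(median)/Var(mean) -> pi/2 for large N.
N, trials = 1001, 5000
x = rng.normal(0.0, 1.0, (trials, N))

var_mean = np.var(x.mean(axis=1))
var_median = np.var(np.median(x, axis=1))
print(var_median / var_mean)   # close to pi/2 = 1.57
```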
Width

Variance:

$$\sigma^2 \equiv \int_{-\infty}^{\infty} (x-\mu)^2\,p(x)\,dx = \int_{-\infty}^{\infty} x^2\,p(x)\,dx - \mu^2$$

$$= \frac{1}{N}\sum_{i=1}^{N}(x_i-\mu)^2 = \frac{1}{N}\sum_{i=1}^{N} x_i^2 - \mu^2$$

Standard deviation: $\sigma \equiv ({\rm variance})^{1/2}$

Moments:

$$\mu_k \equiv \int_{-\infty}^{\infty} (x-\mu)^k\,p(x)\,dx = \frac{1}{N}\sum_{i=1}^{N}(x_i-\mu)^k$$

$\mu_0 = 1$, $\mu_1 = 0$, $\mu_2 = \sigma^2$

skewness $\equiv \beta_1 \equiv \mu_3/\mu_2^{3/2}$: deviation from symmetry; = 0 for a symmetric PDF, > 0 for a tail extending to positive values, and vice versa.

kurtosis $\equiv \beta_2 \equiv \mu_4/\mu_2^2 - 3$: degree of "peakiness", where the $-3$ makes the kurtosis of a Gaussian = 0.
1.3 Estimating Properties of the Parent Population from a Sample

Sample Mean

$$\bar{x} \equiv \frac{1}{N}\sum_{i=1}^{N} x_i$$

Is $\bar{x}$ a good estimator of μ? What is its average value?

$$\langle \bar{x} \rangle = \int_{-\infty}^{\infty} \bar{x}\,p(x)\,dx = \int_{-\infty}^{\infty} \left(\frac{1}{N}\sum_{i=1}^{N} x_i\right) p(x)\,dx = \frac{1}{N}\sum_{i=1}^{N} \int_{-\infty}^{\infty} x_i\,p(x)\,dx = \frac{1}{N}\sum_{i=1}^{N} \mu = \mu$$

Thus $\bar{x}$ is an unbiased estimator of μ. On average, it gets the right answer.

What is the variance of $\bar{x}$?

$${\rm Var}(\bar{x}) = {\rm Var}\left(\frac{1}{N}\sum_{i=1}^{N} x_i\right) = \frac{1}{N^2}\sum_{i=1}^{N} {\rm Var}(x_i) = \frac{1}{N^2}\sum_{i=1}^{N} \sigma^2 = \frac{\sigma^2}{N}$$

$\sigma/\sqrt{N}$ is known as the "standard deviation of the mean" or the "standard error".

Sample Variance

$$s^2 \equiv \frac{1}{N-1}\sum_{i=1}^{N}(x_i-\bar{x})^2 = \frac{1}{N-1}\sum_{i=1}^{N} x_i^2 - \frac{N}{N-1}\bar{x}^2$$

Note the 1/(N-1) instead of 1/N. This is because the expected value of $s^2$ is then $\sigma^2$, making $s^2$ an unbiased estimator (left as an exercise for the reader...).
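A quick simulation (with illustrative numbers) shows why the 1/(N-1) matters: dividing by N systematically underestimates $\sigma^2$, while 1/(N-1) recovers it on average:

```python
import numpy as np

rng = np.random.default_rng(0)

# Many small samples from a Gaussian with sigma^2 = 4
N, trials = 5, 200_000
x = rng.normal(0.0, 2.0, (trials, N))

s2_biased = x.var(axis=1, ddof=0)     # divides by N
s2_unbiased = x.var(axis=1, ddof=1)   # divides by N-1

# Biased version averages to sigma^2 * (N-1)/N = 3.2, unbiased to 4.0
print(s2_biased.mean(), s2_unbiased.mean())
```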
2. Errors

No measurement of x_i is infinitely precise. It has an error associated with it. What are the errors on various computed quantities?

2.1 Analytical considerations

Consider some function f(u, v). The propagation of errors can be determined as follows:

$$f(u,v) = f(\bar{u},\bar{v}) + (u-\bar{u})\frac{\partial f}{\partial u} + (v-\bar{v})\frac{\partial f}{\partial v} + {\rm higher\ order\ terms}$$

$$f - \bar{f} = (u-\bar{u})\frac{\partial f}{\partial u} + (v-\bar{v})\frac{\partial f}{\partial v} + {\rm higher\ order\ terms}, \qquad {\rm where}\ \bar{f} = f(\bar{u},\bar{v})$$

$$\sigma_f^2 = \lim_{N\to\infty} \frac{1}{N}\sum (f-\bar{f})^2 \approx \lim_{N\to\infty} \frac{1}{N}\sum \left[(u_i-\bar{u})\frac{\partial f}{\partial u} + (v_i-\bar{v})\frac{\partial f}{\partial v}\right]^2$$

$$\sigma_f^2 = \lim_{N\to\infty} \frac{1}{N}\sum \left[(u_i-\bar{u})^2\left(\frac{\partial f}{\partial u}\right)^2 + (v_i-\bar{v})^2\left(\frac{\partial f}{\partial v}\right)^2 + 2\left(\frac{\partial f}{\partial u}\right)\left(\frac{\partial f}{\partial v}\right)(u_i-\bar{u})(v_i-\bar{v})\right]$$

$$\sigma_f^2 = \left(\frac{\partial f}{\partial u}\right)^2 \sigma_u^2 + \left(\frac{\partial f}{\partial v}\right)^2 \sigma_v^2 + 2\left(\frac{\partial f}{\partial u}\right)\left(\frac{\partial f}{\partial v}\right)\sigma_{uv}$$

where we have defined the covariance

$$\sigma_{uv} = \lim_{N\to\infty} \frac{1}{N}\sum (u_i-\bar{u})(v_i-\bar{v})$$

Notes:
- The expansion is to first order only, so this is only true for "small" errors (e.g., $\sigma_u/u \sim \sigma_v/v \sim 10\%$), i.e., in the regime of a 1st-order Taylor series.
- The equations are similar if more than 2 variables are involved.
- The first two terms dominate since they are positive-definite, while the 3rd term (covariance) can have some cancellation, as it can be negative. It is zero for uncorrelated u & v (which is often the case).

See Problem Set #2 for examples.
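As a sanity check of the first-order formula, we can compare it with brute force for a toy case, f(u, v) = uv (the values and errors below are illustrative, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy case: f(u, v) = u * v with independent errors in the "small error" regime
u0, v0 = 10.0, 5.0
su, sv = 0.5, 0.3

# First-order propagation: sigma_f^2 = (df/du)^2 su^2 + (df/dv)^2 sv^2
# For f = u*v: df/du = v, df/dv = u (covariance term is zero here).
sigma_f_analytic = np.sqrt((v0 * su)**2 + (u0 * sv)**2)

# Brute force: jiggle u and v with Gaussian errors, look at the spread of f
u = rng.normal(u0, su, 100_000)
v = rng.normal(v0, sv, 100_000)
sigma_f_mc = np.std(u * v)

print(sigma_f_analytic, sigma_f_mc)   # should agree to a few percent
```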
2.2 Monte-Carlo error propagation

Empirically determine errors by creating fake data sets. Do this if you wish to avoid making any assumptions about the underlying distribution.

a) If errors are well characterized, jiggle each data point using Gaussian random numbers.
b) Bootstrap a new sample by picking at random with replacement.

Fit the model to the fake data set. Do this many times and make a histogram of the best-fit parameters. Compute the mean (or median) and standard deviation of the parameters. This readily extends to any function of the parameters.

3. Commonly used Probability Density Functions (PDFs)

3.1 Uniform Distribution
$$p(x; a, b) = \begin{cases} \dfrac{1}{b-a}, & a \le x \le b \\ 0, & x < a\ {\rm or}\ x > b \end{cases}$$

$$\mu = \frac{a+b}{2}, \qquad \sigma^2 = \frac{(b-a)^2}{12}$$

This simple distribution is used as a tool in studies of general continuous distributions and is particularly valuable in non-parametric statistics, e.g., for generating random values from a specific PDF, as explained in the following.

For any given function, $y = F(x)$, the PDFs of x and y are related by

$$|p(x)\,dx| = |p(y)\,dy| \qquad {\rm (fundamental\ transformation\ law\ of\ probability)}$$
Now consider the specific case:

$$p(x) = \begin{cases} 1, & 0 \le x \le 1 \\ 0, & x < 0\ {\rm or}\ x > 1 \end{cases}$$

Then (for 0 < x < 1),

$$x = \int_{0}^{x} p(x')\,dx' = \int_{-\infty}^{y(x)} p(y')\,dy'$$

This states that the cumulative distribution of p(y) is uniformly distributed. The utility is best shown by example (see problem set).

Consider a stellar IMF, $\xi(M) \sim M^{-2.35}$, between 1 and 100 M_sun. Calculate the cumulative PDF,

$${\rm CDF}(M) = \frac{\int_{1}^{M} \xi(M')\,dM'}{\int_{1}^{100} \xi(M')\,dM'} = C\,(1 - M^{-1.35})$$

where $C = 1/(1 - 100^{-1.35})$ is the constant that normalizes the CDF to unity at 100 M_sun. The CDF has a uniform distribution, so we generate a random set of uniformly distributed numbers {x_0, x_1, x_2, ...} and invert to a mass distribution,

$$\{M_i\} = (1 - \{x_i\}/C)^{-1/1.35}$$
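This recipe translates directly into code; a short sketch (the sample size is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# Inverse-transform sampling of a Salpeter-like IMF, xi(M) ~ M^-2.35,
# between 1 and 100 Msun, following the CDF inversion above.
C = 1.0 / (1.0 - 100.0**-1.35)        # normalizes CDF(100) = 1
x = rng.uniform(0.0, 1.0, 100_000)    # uniform deviates = CDF values
M = (1.0 - x / C)**(-1.0 / 1.35)      # invert the CDF

print(M.min(), M.max())   # all masses lie within [1, 100]
```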
3.2 Binomial Distribution

Recall that the number of different ways n items can be taken x at a time is ("n choose x"):

$$\binom{n}{x} \equiv \frac{n!}{x!\,(n-x)!}$$

where $n! = n(n-1)(n-2)\cdots 1$ and $0! = 1$.

Consider an observation with only 2 possible outcomes (e.g., red galaxies or blue galaxies; planet detection or non-detection). Let the probability of obtaining one outcome (red galaxy, planet detection = "success") be p, and the probability of obtaining the other outcome (blue galaxy, planet non-detection = "failure") be q = 1 - p. The probability of obtaining x successes in n observations is (# of ways to get x successes) × (probability of one such set of x successes):

$$f(x; n, p, q) \equiv \frac{n!}{x!\,(n-x)!}\,p^x q^{n-x}$$

This does indeed add up to 1, as it should:

$$\sum_{x=0}^{n} \frac{n!}{x!\,(n-x)!}\,p^x q^{n-x} = (p+q)^n = 1^n = 1$$
Mean

$$\mu \equiv \int_{-\infty}^{\infty} x\,f(x)\,dx \;\longrightarrow\; \sum_{x=0}^{n} x\,\frac{n!}{x!\,(n-x)!}\,p^x q^{n-x}$$

$$\mu = \sum_{x=0}^{n} \frac{n!}{x!\,(n-x)!}\left(p\frac{\partial}{\partial p}p^x\right) q^{n-x} = p\frac{\partial}{\partial p}\left[\sum_{x=0}^{n} \frac{n!}{x!\,(n-x)!}\,p^x q^{n-x}\right]$$

$$\mu = p\frac{\partial}{\partial p}(p+q)^n = pn(p+q)^{n-1} = np$$
Variance

$$\sigma^2 \equiv \int_{-\infty}^{\infty} (x-\mu)^2 f(x)\,dx \;\longrightarrow\; \sum_{x=0}^{n} x^2\,\frac{n!}{x!\,(n-x)!}\,p^x q^{n-x} - \mu^2$$

$$= \left(p\frac{\partial}{\partial p}\right)\left(p\frac{\partial}{\partial p}\right)\sum_{x=0}^{n} \frac{n!}{x!\,(n-x)!}\,p^x q^{n-x} - \mu^2$$

$$= \left(p\frac{\partial}{\partial p}\right)\left[pn(p+q)^{n-1}\right] - \mu^2$$

$$= np + p^2 n(n-1) - n^2p^2 = np(1-p) = npq$$
E.g., suppose we roll ten dice. What is the probability that x dice land with the 1 up? If we throw one die, the probability of landing with 1 up is p = 1/6. If we throw 10 dice, the probability for x of them landing with 1 up is given by the binomial distribution with n = 10 and p = 1/6:

$$p\left(x;\,n=10,\,p=\tfrac{1}{6},\,q=\tfrac{5}{6}\right) = \frac{10!}{x!\,(10-x)!}\left(\frac{1}{6}\right)^x \left(\frac{5}{6}\right)^{10-x}$$

$$\mu = np = 10 \times \tfrac{1}{6} = 1.67, \qquad \sigma = \sqrt{npq} = \sqrt{10 \times \tfrac{1}{6} \times \tfrac{5}{6}} = 1.18$$
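The same numbers come straight out of scipy.stats.binom:

```python
from scipy.stats import binom

# Ten dice; "success" = landing with the 1 face up (p = 1/6)
n, p = 10, 1/6

print(binom.pmf(0, n, p))   # chance of no ones at all = (5/6)^10, ~0.16
print(binom.mean(n, p))     # np = 1.67
print(binom.std(n, p))      # sqrt(npq) = 1.18
```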
3.3 Poisson Distribution

The binomial distribution gets hard to evaluate for large n (because of the factorial), and often in such experiments neither the number of possible events n nor the probability p is known. We need an expression that tells us about the statistics of having detected an average number of events per time interval (μ = np).

Example: you are using a Geiger counter to measure the emissions from a block of radioactive material. You don't know the total number of atoms (= n, the number of trials) or the decay probability (= p), but you do measure the mean count rate μ. You want to know the probability distribution associated with μ.
Let the average rate at which photons arrive be λ per second. Let P(x, λt) be the probability of x photons arriving during an interval t. Then the probability of 1 photon arriving in dt is

$$P(1, \lambda\,dt) = \lambda\,dt$$

for very small dt. The probability of 2 or more arriving is negligibly small if dt is small enough. So:

$$P(0, \lambda\,dt) = 1 - P(1, \lambda\,dt) - P(2, \lambda\,dt) - \ldots = 1 - \lambda\,dt$$

Now consider an arbitrary number of counts in the time interval (t + dt), which can be written as 2 terms, based on what happened as the "last" event (photon arrived or no photon arrived):

$$P(x, \lambda(t+dt)) = P(x-1, \lambda t)\,P(1, \lambda\,dt) + P(x, \lambda t)\,P(0, \lambda\,dt)$$

$$P(x, \lambda t) + \frac{dP(x, \lambda t)}{dt}\,dt = P(x-1, \lambda t)\,\lambda\,dt + P(x, \lambda t)(1 - \lambda\,dt)$$

$$\frac{dP(x, \lambda t)}{dt} = \lambda P(x-1, \lambda t) - \lambda P(x, \lambda t)$$

The solution to this differential equation is

$$P(x, \lambda t) = \frac{(\lambda t)^x}{x!}\,e^{-\lambda t}$$

Setting μ = λt gives us the Poisson distribution

$$p(x; \mu) = \frac{\mu^x}{x!}\,e^{-\mu}$$

where x = # of events (an integer) and μ = the mean number of events in the interval.

Let's check that this is properly normalized:

$$\sum_{x=0}^{\infty} f(x;\mu) = \sum_{x=0}^{\infty} \frac{\mu^x}{x!}\,e^{-\mu} = e^{-\mu}\sum_{x=0}^{\infty}\frac{\mu^x}{x!} = e^{-\mu}e^{\mu} = 1$$
Mean

$$\mu \equiv \int_{-\infty}^{\infty} x\,f(x)\,dx \;\longrightarrow\; \sum_{x=0}^{\infty} x\,\frac{\mu^x}{x!}\,e^{-\mu} = e^{-\mu}\sum_{x=0}^{\infty} x\,\frac{\mu^x}{x!}$$

$$= e^{-\mu}\left[0 + \sum_{x=1}^{\infty}\frac{\mu^x}{(x-1)!}\right] = e^{-\mu}\,\mu\sum_{x=1}^{\infty}\frac{\mu^{x-1}}{(x-1)!} = e^{-\mu}\,\mu\,e^{\mu} = \mu$$

Variance

$$\sigma^2 \equiv \int_{-\infty}^{\infty} (x-\mu)^2 f(x)\,dx \;\longrightarrow\; \sum_{x=0}^{\infty} x^2\,\frac{\mu^x}{x!}\,e^{-\mu} - \mu^2$$

$$= e^{-\mu}\,\mu(\mu e^{\mu} + e^{\mu}) - \mu^2 = \mu$$
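A minimal numerical check of these two results (the rate and sample size below are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)

# Draw a large Poisson sample and check that mean = variance = mu
mu = 7.5
x = rng.poisson(mu, 1_000_000)

print(x.mean(), x.var())   # both should be close to 7.5
```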
Famous result that $\sigma = \sqrt{\mu}$: e.g., if we detect N photons, then the error is $\pm\sqrt{N}$. Note that some care is required for very low count rates, where N = 0 can occur commonly. When N = 0, it would be silly to say the uncertainty is also 0. The uncertainty in N counts is the square root of the expected number of counts, $\sigma(N) = \sqrt{\mu}$ where $\mu = \langle N \rangle$.

3.4 Gaussian (or Normal) Distribution

The Normal distribution is an approximation to the binomial distribution for the limiting case where the number of possible different outcomes is large and the probability of success for each is finitely large, so np >> 1. It is also the limiting case for the Poisson distribution when μ becomes large.

$$p(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-(x-\mu)^2/2\sigma^2}$$

It is the most important distribution in statistics!

It is left as an exercise for the reader to show that the expression is indeed properly normalized, that the mean is μ, and that the variance is $\sigma^2$. Binomial and Poisson PDFs tend toward Gaussians as their mean increases (μ ≳ 20). You'll see references to a "log-normal distribution": this is when the log of a variable has a Gaussian (aka normal) distribution.
3.4.1 Central Limit Theorem

Suppose that n independent random variables, x_i, of unknown probability density function are identically distributed with the same mean μ and variance $\sigma^2$ (both finite). As n becomes large, the distribution of $\bar{x} = \frac{1}{n}\sum x_i$ tends to a Gaussian distribution with mean μ and variance $\sigma^2/n$. Closely related to the law of large numbers, it allows quantitative probabilities to be estimated in experimental situations involving an average.

3.4.2 Confidence Limits of the Gaussian Distribution

The probability that a measurement will fall within ±nσ of the mean is
$$P(n\sigma) = \int_{\mu-n\sigma}^{\mu+n\sigma} p(x; \mu, \sigma)\,dx$$

  n    P(-nσ < x-μ < nσ)
  1    68.27%
  2    95.45%
  3    99.73%
  4    99.9937%
  5    99.999943%
In other words, for a Gaussian, 100 ± 20 means:
- There is a 68% probability that 80 ≤ μ ≤ 120 (with a 16% chance μ is larger and a 16% chance μ is smaller).
- There is a 95.5% chance that 60 ≤ μ ≤ 140.
- There is a 99.7% chance that 40 ≤ μ ≤ 160.

Note that the probability table is for a "2-tailed" probability, e.g., if we want to know the probability of another trial landing within a given range. We also care about "1-tailed" probability.

For example: what is the chance that the 100 ± 20 result is consistent with 0? The separation from 0 is 5 × the standard deviation, so the chance is

$$\frac{1 - P(5\sigma)}{2} = \frac{1 - 0.99999943}{2} = 2.9\times10^{-7}$$

We say that 100 ± 20 is a "5-sigma measurement."
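These tail probabilities are available directly from scipy.stats.norm:

```python
from scipy.stats import norm

# Two-tailed probability of falling within n sigma of the mean
for n in (1, 2, 3):
    print(n, norm.cdf(n) - norm.cdf(-n))   # 0.6827, 0.9545, 0.9973

# One-tailed chance that a 5-sigma measurement is consistent with zero
print(norm.sf(5))   # ~2.9e-7, the survival function = 1 - CDF
```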
100 ± 50 is a 2σ measurement, and it is consistent with 0. The 2σ (95.5% confidence) upper limit is 100 + (2 × 50) = 200. This means that the measurement is ≤ 200 at the 97.7% confidence level. By convention (though this is field-dependent), measurements which are < 5σ are treated with caution, and those which are < 3σ are seen as consistent with zero.

Why such stringent limits as 3-5σ? Because any observer
- is biased, e.g., terminated the observation when the expected result was found;
- can't estimate σ, e.g., chose a "nice quiet piece" of the data or a "source-free region" to get the background.

You can similarly calculate confidence intervals for other distributions such as the Binomial and Poisson.

Example: Detection of a source and measurement of its brightness. We measure 101 counts in a 1'' radius aperture centered on the source and 1800 background counts in an annulus ranging from 1'' to 5'' radius.

(a) How confident are we that there is a source there? We need to assess how different the source is from the background, i.e., is the source statistically distinct from just fluctuations in the background counts?
Expected background in the 1'' aperture:

$$= \frac{1800}{\pi[(5'')^2 - (1'')^2]}\,\pi(1'')^2 = \frac{1800}{24} = 75\ {\rm counts}$$

$\sigma_{\rm bkg}(1'') = \sqrt{75} = 8.66$ from Poisson statistics (we can use Poisson here because we have > 20 counts).

Net counts above background = 101 - 75 = 26.

Significance of detection above background = 26/8.66 = 3.0σ. Confidence = 99.73 + 0.27/2 = 99.87% (the extra 0.135% is because it's single-tailed).
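Part (a) can be reproduced in a few lines (numbers taken from the example above):

```python
import numpy as np

# 101 counts in a 1" aperture; 1800 background counts in a 1"-5" annulus
src_counts = 101
bkg_counts = 1800
area_ratio = (np.pi * 1.0**2) / (np.pi * (5.0**2 - 1.0**2))   # = 1/24

bkg_in_aperture = bkg_counts * area_ratio      # 75 counts
sigma_bkg = np.sqrt(bkg_in_aperture)           # 8.66, Poisson
net = src_counts - bkg_in_aperture             # 26
print(net / sigma_bkg)                         # 3.0 sigma detection
```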
(b) How bright is it?

Total flux = $101 \pm \sqrt{101}$ counts.

Background in the 1'' aperture:

$$= \frac{1800 \pm \sqrt{1800}}{\pi[(5'')^2 - (1'')^2]}\,\pi(1'')^2 = \frac{1800 \pm 42.4}{24} = 75 \pm 1.8\ {\rm counts}$$

Net flux = 101 - 75 = 26. Uncertainty on net flux = $\sqrt{101 + 1.8^2} = 10.2$ (only marginally larger because of the uncertainty in the sky background). Significance of the measurement = 26/10.2 = 2.5σ.

It is more difficult to measure the flux than to determine existence: establishing existence is only a 1-bit measurement (yes vs. no), whereas measuring the flux requires more information.

3.4.3 Bivariate Gaussian Distribution

As its name suggests, it's the joint Gaussian distribution of two variables.
$$p(x, y; \mu_x, \mu_y, \sigma_x, \sigma_y, \rho) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}}\,e^{-z^2/2(1-\rho^2)}$$

where

$$z^2 = \frac{(x-\mu_x)^2}{\sigma_x^2} + \frac{(y-\mu_y)^2}{\sigma_y^2} - \frac{2\rho(x-\mu_x)(y-\mu_y)}{\sigma_x\sigma_y}$$

and the Pearson correlation is

$$\rho = {\rm Cor}(x, y) = \frac{{\rm Cov}(x, y)}{\sigma_x\sigma_y}, \qquad {\rm Cov}(x, y) = \sigma_{xy}^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i-\mu_x)(y_i-\mu_y)$$

In matrix notation,

$$p(x, y; \mu_x, \mu_y, \sigma_x, \sigma_y, \rho) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}}\exp\left(-\frac{1}{2}D^T C^{-1} D\right)$$

where

$$D = \begin{pmatrix} x-\mu_x \\ y-\mu_y \end{pmatrix}, \qquad C = \begin{pmatrix} \sigma_x^2 & \sigma_{xy}^2 \\ \sigma_{xy}^2 & \sigma_y^2 \end{pmatrix} = {\rm ``covariance\ matrix"}$$
The shape looks more circular for small ρ and more elliptical for large |ρ|. This readily generalizes to > 2 variables.

4. Maximum Likelihood of Gaussian Variables

Suppose we have N data points $y_i$ with errors $\sigma_i$ that are independent of each other and Gaussian distributed about $\bar{y}$, which you do not know but want to find out. The probability of any $y_i$ is
$$P(y_i)\,\Delta y = \frac{1}{\sigma_i\sqrt{2\pi}}\,e^{-(y_i-\bar{y})^2/2\sigma_i^2}\,\Delta y$$

The probability of all N $y_i$'s is just the product of the individual probabilities:

$$P\,\Delta y^N = \prod_{i=1}^{N}\frac{1}{\sigma_i\sqrt{2\pi}}\,e^{-(y_i-\bar{y})^2/2\sigma_i^2}\,\Delta y$$

Define the likelihood as $\mathcal{L} \equiv -2\ln P$, which for our case is:

$$\mathcal{L} = \sum_{i=1}^{N}\frac{(y_i-\bar{y})^2}{\sigma_i^2} + 2\sum_{i=1}^{N}\ln\!\left(\sigma_i\sqrt{2\pi}\right) - 2N\ln\Delta y$$

The principle of maximum likelihood is that the most probable estimate of $\bar{y}$ occurs when $\mathcal{L}$ is minimized.
4.1 Weighted Mean

The most likely value of $\bar{y}$ is at the minimum of $\mathcal{L}$:

$$\frac{\partial\mathcal{L}}{\partial\bar{y}} = \sum_{i=1}^{N}\frac{\partial}{\partial\bar{y}}\left[\frac{(y_i-\bar{y})^2}{\sigma_i^2}\right] = -2\sum_{i=1}^{N}\frac{(y_i-\bar{y})}{\sigma_i^2} = 0$$

$$\Rightarrow \sum_{i=1}^{N}\frac{y_i}{\sigma_i^2} = \bar{y}\sum_{i=1}^{N}\frac{1}{\sigma_i^2}$$

$$\Rightarrow \bar{y} = \frac{\sum_{i=1}^{N} w_i y_i}{\sum_{i=1}^{N} w_i}, \qquad {\rm where\ the\ weights}\ w_i = 1/\sigma_i^2$$

aka "inverse-variance weighting". Note that for equal weights, $\sigma_i = \sigma$,

$$\bar{y} = \frac{1}{N}\sum_{i=1}^{N} y_i$$

as before.
The error on the weighted mean comes from our previous result on error propagation:

$$\sigma_{\bar{y}}^2 = \sum_{i=1}^{N}\left(\frac{\partial\bar{y}}{\partial y_i}\right)^2\sigma_i^2 = \sum_{i=1}^{N}\sigma_i^2\left[\frac{\partial}{\partial y_i}\left(\frac{\sum_j w_j y_j}{\sum_j w_j}\right)\right]^2 = \frac{\sum_{i=1}^{N}\sigma_i^2 w_i^2}{\left(\sum_{i=1}^{N} w_i\right)^2} = \frac{\sum_{i=1}^{N} w_i}{\left(\sum_{i=1}^{N} w_i\right)^2} = \frac{1}{\sum_{i=1}^{N} w_i}$$

In other words,

$$\frac{1}{\sigma_{\bar{y}}^2} = \sum_{i=1}^{N}\frac{1}{\sigma_i^2}$$

If $\sigma_i = \sigma$ (the data all have the same uncertainties), then $\sigma_{\bar{y}} = \sigma/\sqrt{N}$ as before.
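A minimal implementation of these two formulas (the function name is ours, not a library routine):

```python
import numpy as np

def weighted_mean(y, sigma):
    """Inverse-variance weighted mean and its uncertainty."""
    w = 1.0 / np.asarray(sigma, float)**2
    ybar = np.sum(w * np.asarray(y, float)) / np.sum(w)
    sigma_ybar = 1.0 / np.sqrt(np.sum(w))
    return ybar, sigma_ybar

# Equal errors reduce to the ordinary mean with sigma/sqrt(N)
y = np.array([10.0, 12.0, 11.0, 13.0])
m, e = weighted_mean(y, [1.0, 1.0, 1.0, 1.0])
print(m, e)   # 11.5, 0.5
```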
4.2 Linear Regression or Weighted Least Squares

Now suppose we have measured 2 quantities in pairs, {x_i, y_i}. We think that y = a + bx. How do we derive a and b from our data? Note: this form is not as restrictive as it seems. We can transform many relations into this form, e.g., $y = Ax^{\alpha} \Rightarrow \log y = \log A + \alpha\log x$, and then perform the analysis on log y and log x.

$$\mathcal{L} = \sum_{i=1}^{N}\frac{(y_i - a - bx_i)^2}{\sigma_i^2} + 2\sum_{i=1}^{N}\ln\!\left(\sigma_i\sqrt{2\pi}\right) - 2N\ln\Delta y$$

$$\frac{\partial\mathcal{L}}{\partial a} = 0 \;\Longrightarrow\; \sum\frac{y_i}{\sigma_i^2} = a\sum\frac{1}{\sigma_i^2} + b\sum\frac{x_i}{\sigma_i^2}$$

$$\frac{\partial\mathcal{L}}{\partial b} = 0 \;\Longrightarrow\; \sum\frac{x_i y_i}{\sigma_i^2} = a\sum\frac{x_i}{\sigma_i^2} + b\sum\frac{x_i^2}{\sigma_i^2}$$

Two linear equations in two unknowns, a and b, with solution

$$a = \frac{1}{\Delta}\left[\sum\frac{x_i^2}{\sigma_i^2}\sum\frac{y_i}{\sigma_i^2} - \sum\frac{x_i}{\sigma_i^2}\sum\frac{x_i y_i}{\sigma_i^2}\right], \qquad b = \frac{1}{\Delta}\left[\sum\frac{1}{\sigma_i^2}\sum\frac{x_i y_i}{\sigma_i^2} - \sum\frac{x_i}{\sigma_i^2}\sum\frac{y_i}{\sigma_i^2}\right]$$

where the denominator is

$$\Delta = \sum\frac{1}{\sigma_i^2}\sum\frac{x_i^2}{\sigma_i^2} - \left(\sum\frac{x_i}{\sigma_i^2}\right)^2$$

The uncertainties in a and b follow from error propagation,

$$\sigma_a^2 = \sum\left(\frac{\partial a}{\partial y_i}\right)^2\sigma_i^2 = \frac{1}{\Delta}\sum\frac{x_i^2}{\sigma_i^2}, \qquad \sigma_b^2 = \sum\left(\frac{\partial b}{\partial y_i}\right)^2\sigma_i^2 = \frac{1}{\Delta}\sum\frac{1}{\sigma_i^2}$$

Note 1: numpy.polyfit with deg=1 does this (pass the weights as w = 1/σ_i).
Note 2: don't throw away upper limits; they have information. We're getting ahead of ourselves, but modern folks use a Bayesian approach to line fitting that handles "censored" data: Kelly 2007, ApJ, 665, 1489.
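The closed-form solution above can be coded directly; here is a sketch (the helper name wls_line is ours), checked on a noise-free line where the fit must be exact:

```python
import numpy as np

def wls_line(x, y, sigma):
    """Weighted least-squares fit of y = a + b*x, following the formulas above."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    w = 1.0 / np.asarray(sigma, float)**2
    S, Sx, Sy = w.sum(), (w * x).sum(), (w * y).sum()
    Sxx, Sxy = (w * x * x).sum(), (w * x * y).sum()
    delta = S * Sxx - Sx**2
    a = (Sxx * Sy - Sx * Sxy) / delta
    b = (S * Sxy - Sx * Sy) / delta
    sigma_a, sigma_b = np.sqrt(Sxx / delta), np.sqrt(S / delta)
    return a, b, sigma_a, sigma_b

# Noise-free test: points on y = 2 + 3x are recovered exactly
x = np.array([0.0, 1.0, 2.0, 3.0])
a, b, sa, sb = wls_line(x, 2 + 3 * x, np.ones_like(x))
print(a, b)   # 2.0, 3.0
```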
4.3 Linear Correlation

In the absence of any hypothesis, any knowledge, or anything better to do, we often correlate y_i against x_i in the hope of discovering some New and Universal Truth. Before doing so, you should ask:
- Does the eye see anything? If not, stop, unless you are trying to disprove some hypothesis.
- Is the apparent correlation due to a selection effect? Common culprits are flux limits.
- Apply the Rule of Thumb: does the correlation go away if you place your thumb over some of the data?

Use the Pearson correlation, defined earlier for bivariate Gaussian distributions,

$$\rho\ {\rm or}\ r = \frac{\sigma_{xy}^2}{\sigma_x\sigma_y}$$

r = -1 or +1 means the data perfectly fit a straight line.

- "r" measures the degree of linear correlation between 2 variables without knowledge of the errors. It should not be used for a non-linear relationship. (Check for this by plotting the data and taking a look!)
- There is no distinction between x & y in the formula. Either can be the dependent/independent variable.
- It assumes the variables are approximately normally distributed.
- It tells you nothing about the slope of the best-fitting line, only the degree of correlation.

For finite-sized data sets, we can get a finite r even for uncorrelated data, and vice-versa, simply due to the uncertainty in "r". One can show that

$$\sigma_r = \frac{1-r^2}{\sqrt{N-1}}$$

Note that $\sigma_r$ CANNOT be used directly to indicate the significance of a correlation and/or whether one observed correlation is significantly stronger than another. It only gives the error on the measurement of r in this sample.

4.3.1 Student's t distribution

In the case where x & y form a 2-dimensional Gaussian about their mean values, we can test whether the observed "r" is consistent with a parent population with no correlation (ρ = 0). To do so, use the quantity

$$t \equiv r\sqrt{\frac{N-2}{1-r^2}}$$
which is distributed in the no-correlation case as the Student's t-distribution with ν = N - 2 degrees of freedom:

$$f(t; \nu) = \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\!\left(\frac{\nu}{2}\right)}\left(1 + \frac{t^2}{\nu}\right)^{-(\nu+1)/2}$$

where the gamma function is defined as

$$\Gamma(z) = \int_{0}^{\infty} t^{z-1}e^{-t}\,dt$$

and is just the factorial function extended to non-integers: for integer x, Γ(x+1) = x!.

$f(t;\nu)$ has μ = 0 and $\sigma^2 = \nu/(\nu-2)$ for ν > 2.

In this case, we want the probability for the 2-tailed distribution, $\int_{-t}^{t} f(t';\nu)\,dt'$. If we already knew the sign of the correlation, we would use the 1-tailed $\int_{-\infty}^{t} f(t';\nu)\,dt'$.

Figure from Wikipedia: https://en.wikipedia.org/wiki/Student%27s_t-distribution

Python has this coded up, of course...

Example: suppose we find r = 0.5 for N = 10 data points.

$$t = 0.5\sqrt{\frac{8}{0.75}} = 1.63$$

    from scipy.stats import t
    c = t.cdf(1.63, 8)              # -> 0.93, i.e., a 7% chance that t would be higher in a one-sided test
    sig = 100 * (1 - 2 * (1 - c))   # -> 86% significance in a two-tailed test

"Student" was the pseudonym of W. S. Gosset (1876-1937). He was a chemist who worked for the Guinness Brewery in Dublin, Ireland, and developed the t-test to study the quality of brewing ingredients. Guinness did not allow their chemists to publish their findings (to prevent competitors from learning that they were employing statisticians), hence the pseudonym.
4.3.2 Fisher z-transformation

With the same assumption that x & y are drawn from a two-dimensional Gaussian distribution, we can test whether the difference of two nonzero "r" values is significant for N > 10 data sets, e.g., if a change in some control variable significantly alters an existing correlation between two variables. Use Fisher's z-transformation:

$$z = \frac{1}{2}\ln\!\left(\frac{1+r}{1-r}\right) = {\rm arctanh}(r)$$

This converts the Pearson coefficient to an approximately normally distributed variable, z, with a mean value of

$$\bar{z} = \frac{1}{2}\left[\ln\!\left(\frac{1+\rho}{1-\rho}\right) + \frac{\rho}{N-1}\right]$$

and a standard deviation of

$$\sigma_z \approx \frac{1}{\sqrt{N-3}}$$

where ρ is the true value of the correlation coefficient for the parent population (to be tested against the measured "r"). We can use the Gaussian probability tables for this and also to test the significance of a difference in two values of "r". You can also assume a ρ, create a Gaussian z, and invert (r = tanh z) to create confidence intervals for r for large samples.

4.4 Chi-squared

We have discussed probability distributions and their statistics. Now we discuss whether a particular distribution and model actually fit the data. Define a "badness of fit" metric:
$$\chi^2 = \sum_{i=1}^{N}\left[\frac{y_i - y(x_i; a_j)}{\sigma_i}\right]^2$$

where $y(x_i; a_j)$ is a general function of $x_i$ with model parameters $a_j$.

Note that for Poisson variables, $\sigma_i = \sqrt{y(x_i; a_j)}$, but in order to get confidences from $\chi^2$, we need $y(x_i; a_j) > 20$ for all $x_i$. If that is not possible, then we need to bin up the data (sum the $y_i$) over enough $x_i$ to make it so.

The procedure then is to adjust the model parameters, $a_j$, until $\chi^2$ is minimized (i.e., least bad) → $\chi^2_{\rm min}$. This then maximizes the likelihood.

# of degrees of freedom = ν = N - M, where
N = # of data points
M = # of model parameters
4.4.1 Distribution of chi-squared

$$p(\chi^2; \nu) = \frac{(\chi^2)^{(\nu-2)/2}}{2^{\nu/2}\,\Gamma\!\left(\frac{\nu}{2}\right)}\,e^{-\chi^2/2}$$

where μ = ν, $\sigma^2 = 2\nu$, and ν = N - M. The distribution goes to a Gaussian as N → ∞.

4.4.2 Goodness of fit

For the best fit, the expected value is

$$\chi^2_{\rm min} = (N-M) \pm \sqrt{2(N-M)} = \nu \pm \sqrt{2\nu}$$

So if it isn't, we know we have a bad fit. This can come from:
- Wrong model: $\chi^2 \gg \nu$
- Wrong measurement errors: $\chi^2 \ll \nu$ (too pessimistic), $\gg \nu$ (too optimistic)
- Data are not normally distributed: $\chi^2 \gg \nu$

This is often written as the "reduced $\chi^2$":

$$\chi^2_{\rm red,min} = \chi^2_{\rm min}/\nu = 1 \pm \sqrt{\frac{2}{N-M}}$$

so the best fit occurs around $\chi^2_{\rm red,min} \approx 1$, though the scatter around this depends on the DOF.

4.4.3 Confidence Limits for Goodness of Fit

What we need is the probability of observing $\chi^2 > \chi^2_{\rm min}$:
$$P(>\chi^2_{\rm min}; \nu) = \frac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\int_{\chi^2_{\rm min}}^{\infty}(\chi^2)^{(\nu-2)/2}\,e^{-\chi^2/2}\,d(\chi^2)$$

See scipy.stats.chi2 and use it in the same way as the Student-t distribution above.
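For example, with scipy.stats.chi2 (the fit values below are hypothetical):

```python
from scipy.stats import chi2

# Hypothetical fit: chi2_min = 35 with nu = 20 degrees of freedom
chi2_min, nu = 35.0, 20
p = chi2.sf(chi2_min, nu)   # survival function = P(> chi2_min)
print(p)   # ~0.02, a marginally acceptable fit
```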
4.4.4 Confidence Limits for the Fitted Parameters (OBSOLETE)

We typically want to identify a region in the M-dimensional space of the parameters {a_j} about the best fit that contains a given percentage of the total probability distribution, e.g., "there is a 99% chance that the true parameter values are in this region." The observer gets to pick the confidence value, but certain ones are common, e.g., 68% (1-sigma), 95% (2-sigma), etc. Note that these values are a convention tied to a Gaussian distribution, even though the probability distributions of the parameters are often not normally distributed.

Conventionally, one chooses {a_j} such that $\chi^2 = \chi^2_{\rm min} + \Delta\chi^2({\rm CL}, \#\,{\rm parameters})$. $\Delta\chi^2$ is distributed as $\chi^2$ with ν = M (= number of parameters) degrees of freedom, NOT with ν = N - M (= the actual degrees of freedom of the fit). You can marginalize this to a subset {a_j} of M' parameters; then $\Delta\chi^2$ is distributed as $\chi^2$ with ν = M' (= number of interesting parameters) degrees of freedom.

Example: M = 1 (i.e., only fitting 1 parameter). In this case, the chi-squared distribution for ν = 1 is the same as that of the square of a single normally (i.e., Gaussian) distributed variable.

$\Delta\chi^2 < 1$ occurs 68.3% of the time (1-sigma for a normal distribution)
$\Delta\chi^2 < 4$ occurs 95.5% of the time (2-sigma)
$\Delta\chi^2 < 9$ occurs 99.7% of the time (3-sigma)

Example: M = 2. The 1-sigma confidence limits are an interval [X1, X2] for a_1 and [Y1, Y2] for a_2, and the joint 1-sigma confidence region on both parameters is an ellipse, i.e., 68.3% of all determinations of a_1 and a_2 should lie within the ellipse. (Figure omitted.)

CAUTION: this has very limited application in real astrophysical situations. See "Dos and Don'ts of Reduced Chi-Squared" by Andrae et al. (arXiv:1012.3754). Use Bayesian methods instead.
4.4.5 Significance of Adding Another Parameter

We often want to know if it is necessary to add another parameter to a model we have been fitting to our data. E.g., there might be reddening in the Milky Way and in the host galaxy of a supernova. $\chi^2$ will of course be lower because the additional freedom enables us to get the model curve closer to the data. But is the decrease significant?

Suppose we have two models with $\chi^2_n$ and $\chi^2_m$ and n and m DOF, respectively. There are two useful statistics to test if the $\chi^2$ values of the 2 models are significantly different.

(1) The F statistic:

$$F_{n,m} \equiv \frac{\chi^2_n/n}{\chi^2_m/m}$$

follows the F probability distribution:

$$p_{n,m}(x) = \frac{\Gamma\!\left(\frac{n+m}{2}\right)n^{n/2}\,m^{m/2}}{\Gamma(n/2)\,\Gamma(m/2)}\,\frac{x^{n/2-1}}{(m+nx)^{(n+m)/2}}$$

which has mean and variance

$$\mu = \frac{m}{m-2}, \qquad \sigma^2 = \frac{2m^2(m+n-2)}{n(m-2)^2(m-4)}$$

We want the probability of observing such a large F value:

$$P(>F; n, m) = \int_{F}^{\infty} p_{n,m}(x)\,dx$$

Caution: the definition of the statistic does not distinguish between experiment "1" and "2", so we can form 2 statistics, one the reciprocal of the other (F12 and F21). Both are distributed according to the F distribution. Typically, we test both, checking that F12 is not too large and F21 is not too small. As usual, python is your friend: scipy.stats.f.
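A sketch with scipy.stats.f (the χ² values and DOF below are made up for illustration):

```python
from scipy.stats import f

# Hypothetical comparison: model 1 gives chi2 = 45 with n = 30 DOF,
# model 2 (one extra parameter) gives chi2 = 30 with m = 29 DOF.
chi2_1, n = 45.0, 30
chi2_2, m = 30.0, 29

F12 = (chi2_1 / n) / (chi2_2 / m)
p = f.sf(F12, n, m)   # probability of an F value this large by chance
print(F12, p)
```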
CAVEAT: Two important conditions must be satisfied to use this (Protassov et al. 2002, ApJ, 571, 545; read the summary section 6):

1. The two models you are comparing must be "nested", i.e., the allowed parameter values of one model must be a subset of those of the other model (see also Freeman et al. 1999, ApJ, 524, 53). E.g., you cannot compare a blackbody model with a synchrotron emission model, but you can compare the goodness of fit for each with $\chi^2$ fitting.

2. Zero values of the additional parameters must not be on the boundary of the possible parameters. E.g., you cannot compare 2 point sources with 1, nor test for the detection of an emission line in a spectrum. Neither case allows for negative flux.

Legitimate uses:
- broken power law vs. single power law
- non-solar vs. solar abundance
- comparing the variances of 2 samples

4.4.6 Pros and Cons of Chi-squared

Pros:
- Most people have heard of this, some even accept the results :)
- Since it is additive by definition, different samples can be tested all at once.
- It automatically gives an estimate of whether the model is acceptable.

Cons:
- Data must usually be binned → loss of information.
- Data must be normally distributed.
- If the data do not agree with the model, it cannot tell which direction is off.
- Cannot be used with small samples (≲ 20).
- See Andrae et al. 2010 (arXiv:1012.3754).
5. Rank Tests

A general set of tests comes from replacing the values of N pairs of measurements (x_i, y_i) with their ranks (R_i, S_i). E.g., if we have 4 pairs

(x_i, y_i) = (1,3) (5,0) (3,2) (4,1)

then we have

(R_i, S_i) = (1,4) (4,1) (2,3) (3,2)

For simplicity, assume no ties. (See Numerical Recipes for the more general case.)

Why do this? Because ranks are drawn from a uniform distribution between 1 and N, with each rank occurring only once (assuming no ties). From the uniform distribution:

$$\bar{R} = \bar{S} = (N+1)/2, \qquad \sigma_R^2 = \sigma_S^2 = (N^2-1)/12, \quad {\rm etc.}$$

We have therefore transformed from variables with unknown distributions to ones with known distributions. There is some loss of information in replacing the original numbers by their ranks, but not much. And the statistics of ranks are more robust than the statistics of the original variables, just as the median is more robust than the mean (and slightly noisier).

5.2.1 Spearman Rank Correlation Coefficient

Define this to be the linear correlation coefficient of the ranks:

$$r_S = \frac{\sum_i (R_i-\bar{R})(S_i-\bar{S})}{\sqrt{\sum_i (R_i-\bar{R})^2}\,\sqrt{\sum_i (S_i-\bar{S})^2}} = 1 - \frac{6\sum_i (R_i-S_i)^2}{N^3-N}$$

(The last step holds if there are no ties, because the values of R_i and S_i are then known.)
Obviously, when x & y are correlated, R & S will be too. As before, $-1 \le r_S \le +1$, and a high value indicates a significant correlation. To test the level of significance, one can actually calculate the distribution explicitly for small N when R & S are uncorrelated. If N > 50, we can compute

$$t \equiv r_S\sqrt{\frac{N-2}{1-r_S^2}}$$

which is distributed according to Student's t statistic with N - 2 degrees of freedom. See scipy.stats.spearmanr.

5.2.2 Kolmogorov-Smirnov (K-S) Test

Used for unbinned data that are a continuous function of a single variable, i.e., a list of values, e.g., the distribution of protoplanetary disk masses. It determines whether a sample agrees with a function ("one-sided/sample") or whether 2 samples come from the same parent population ("two-sided/sample").

Pros:
- no loss of information
- can be used for very small samples

Cons:
- cannot be used for parameter estimation
The test compares the cumulative distribution of ranks:

- Rank your sample in ascending order of x.
- Calculate $S_N(x)$, where

$$S_N(x) = \begin{cases} 0, & x < x_1 \\ r/N, & x_r \le x < x_{r+1} \\ 1, & x \ge x_N \end{cases}$$

and N is the size of the sample. $S_N(x)$ is the fraction of the data with values less than x.

- If you have two samples, then calculate $S_{N_1}(x)$ and $S_{N_2}(x)$.
- If you have a function, then calculate $F(x) \equiv \int_{-\infty}^{x} f(y)\,dy$.
- The K-S statistic is

$$D_{\rm 1-samp} = \max_{-\infty < x < \infty}\left|S_N(x) - F(x)\right|, \qquad D_{\rm 2-samp} = \max_{-\infty < x < \infty}\left|S_{N_1}(x) - S_{N_2}(x)\right|$$

What makes the K-S statistic useful is that its distribution, in the case of data drawn from the same (unknown) distribution, can be calculated. And it's all here, ready to plug 'n' play: scipy.stats.kstest.
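A short sketch of both flavors of the test with scipy, on synthetic data:

```python
import numpy as np
from scipy.stats import kstest, ks_2samp

rng = np.random.default_rng(3)

# One-sample test: are these data consistent with a standard normal?
x = rng.normal(0.0, 1.0, 500)
stat, p = kstest(x, "norm")
print(stat, p)    # a large p-value means no evidence against the normal hypothesis

# Two-sample test: samples from clearly different parent populations
y = rng.uniform(-1.0, 1.0, 500)
stat2, p2 = ks_2samp(x, y)
print(stat2, p2)  # a tiny p-value: the two samples differ
```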
6. Parting Comments

- Don't hide data.
- Try to use distribution-free tests.
- There are lots of tests to choose from, but not all are equally powerful for a given application.
- Don't get too enamored of, or lost in, the world of statistics. You are budding astronomers, not budding statisticians.
- USE COMMON SENSE.