Statistical significance, confidence, uncertainty · 2019-04-30 · Practical vs. statistical...
Statistical significance, confidence, uncertainty
Tressa L. Fowler
Accounting for Uncertainty
• Observational
• Model
• Model parameters
• Physics
• Verification scores
• Sampling
• Verification statistic is a realization of a random process
• What if the experiment were re-run under identical conditions? Would you get the same answer?
Uncertainty estimates are among a long list of important verification practices
• Well defined questions or goals.
• Large, representative, (identical?) sample.
• Consistent, independent observations.
• Appropriate methods and statistics.
• Uncertainty estimates.
• Spatial, temporal, and conditional differences evaluated.
• User relevant results.
• Thoroughly tested software.
You can’t fix by analysis what you bungled by design. - Light, Singer and Willett.
Define question(s) first. Then the confidence interval is around the right statistic.
• Which model is best?
• Is my model upgrade an improvement?
• How frequently are ceilings in the correct category?
Practical vs. statistical significance
• May not be the same. Why?
  • Failure to use significant figures.
  • Very large sample sizes.
  • Stats assumes independent samples, but weather rarely delivers.
• Which do you need? Both!
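The effect of very large sample sizes can be sketched with a simple z-test for a mean difference. The numbers below (a 0.01 K difference, a standard deviation of 1 K) are hypothetical and the test assumes independent samples; the function name is illustrative, not part of MET.

```python
import math
from statistics import NormalDist

def two_sided_p_value(mean_diff, sd, n):
    """Two-sided z-test p-value for a mean difference from n independent samples."""
    z = mean_diff / (sd / math.sqrt(n))
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

# Hypothetical case: the same tiny difference, tested at two sample sizes.
p_small = two_sided_p_value(0.01, 1.0, 100)
p_large = two_sided_p_value(0.01, 1.0, 250_000)

print(f"n = 100:     p = {p_small:.3f}")
print(f"n = 250,000: p = {p_large:.2e}")
```

The difference is unchanged and still practically negligible, yet it becomes statistically significant once the sample is large enough.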
Two ways to examine scores
• CI about Pairwise Differences may allow for better differentiation of model performance
• CI about Actual Scores may be difficult to differentiate model performance differences
[Figure: CIs for Model 1, Model 2, and Diff: Model 1 - Model 2. SS (statistically significant) – CIs do not encompass 0.]
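A minimal sketch of why the pairwise-difference interval can separate two models when the intervals about the actual scores cannot. The data are synthetic: both hypothetical models share a large case-to-case component, and Model 2 has a small constant offset of 0.3. The intervals are simple normal approximations that assume independent cases.

```python
import math
import random
from statistics import NormalDist, mean, stdev

def normal_ci(values, alpha=0.05):
    """Normal-approximation CI for the mean of independent values."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half = z * stdev(values) / math.sqrt(len(values))
    m = mean(values)
    return (m - half, m + half)

rng = random.Random(3)
n_cases = 50
# Shared case-to-case variability (hypothetical), much larger than the model difference.
case_effect = [rng.gauss(0.0, 2.0) for _ in range(n_cases)]
model1 = [c + rng.gauss(0.0, 0.2) for c in case_effect]
model2 = [c + 0.3 + rng.gauss(0.0, 0.2) for c in case_effect]

ci1 = normal_ci(model1)
ci2 = normal_ci(model2)
ci_diff = normal_ci([a - b for a, b in zip(model1, model2)])

overlap = ci1[0] <= ci2[1] and ci2[0] <= ci1[1]
diff_excludes_zero = not (ci_diff[0] <= 0.0 <= ci_diff[1])

print("Model 1 CI:", ci1)
print("Model 2 CI:", ci2)
print("Diff CI:   ", ci_diff)
print("Score CIs overlap:", overlap, "| Diff CI excludes 0:", diff_excludes_zero)
```

Differencing case by case removes the shared variability, so the interval about the difference is far narrower than either interval about the scores.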
Confidence Intervals (CIs)
“If we re-run the experiment N times, and create N (1-α)100% CI’s, then we expect the true value of the parameter to fall inside (1-α)100% of the intervals.”
Confidence intervals can be parametric or non-parametric…
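The re-run interpretation can be checked by simulation. The sketch below repeats a hypothetical experiment (30 independent draws from a normal distribution with known mean) many times, builds a 95% normal-approximation interval each time, and counts how often the interval contains the true mean. All parameter values are assumptions chosen for illustration.

```python
import math
import random
from statistics import NormalDist, mean, stdev

def coverage_rate(true_mean=10.0, true_sd=2.0, n=30,
                  n_experiments=2000, alpha=0.05, seed=1):
    """Fraction of simulated experiments whose CI contains the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    hits = 0
    for _ in range(n_experiments):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        m = mean(sample)
        half = z * stdev(sample) / math.sqrt(n)
        if m - half <= true_mean <= m + half:
            hits += 1
    return hits / n_experiments

rate = coverage_rate()
print(f"Observed coverage of nominal 95% intervals: {rate:.3f}")
```

The observed rate should land close to 0.95, typically slightly below it here because the standard deviation is estimated from a small sample.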
Types of Confidence Intervals
Bootstrap
• Available for almost any statistic.
• More robust to outliers.
• Sensitive to lack of continuity, small samples.
Parametric (normal)
• Sensitive to departures from assumed distribution.
• Often sensitive to outliers.
• Not available for some statistics.
Normal Approximation CI’s
$\hat{\theta} \pm z_{\alpha/2}\,\mathrm{se}(\hat{\theta})$
is a (1-α)100% Normal CI for θ, where
• θ̂ is the estimate of the statistic of interest (e.g., the forecast mean) and θ is the population (“true”) parameter
• se(θ̂) is the standard error for the statistic
• z_v is the v-th quantile of the standard normal distribution (the standard normal variate), where v = α/2.
• A typical value of α is 0.05, so the (1-α)100% interval is referred to as the 95% Normal CI.
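As a worked instance, the normal interval (the estimate plus and minus the α/2 standard normal quantile times the standard error) can be computed from summary values. The estimate of 0.8 and standard error of 0.25 are hypothetical, and the function name is illustrative only.

```python
from statistics import NormalDist

def normal_ci_from_se(estimate, se, alpha=0.05):
    """(1 - alpha) * 100% normal-approximation CI from an estimate and its standard error."""
    z = NormalDist().inv_cdf(alpha / 2.0)  # alpha/2 quantile; negative, about -1.96 for alpha = 0.05
    return (estimate + z * se, estimate - z * se)

lower, upper = normal_ci_from_se(0.8, 0.25)
print(f"95% normal CI: ({lower:.3f}, {upper:.3f})")
```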
Application of Normal Approximation CI’s
• Independence assumption (i.e., “iid”) – temporal and spatial
  • Should check the validity of the independence assumption
  • MET accounts for first order temporal correlation
• Normal distribution assumption
  • Should check validity of the normal distribution (e.g., qq-plots, other methods)
  • MET does not do this – should be done outside of MET
  • However… MET applies appropriate approaches to verification statistics
• Multiple testing
  • When computing many confidence intervals, the true significance levels are affected (reduced) by the number of tests that are done.
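The multiple-testing point can be made concrete with a small calculation. The sketch below assumes m independent intervals, which is rarely true for verification statistics, and uses the Bonferroni correction as one common (conservative) remedy; the slides do not prescribe a particular correction.

```python
from statistics import NormalDist

def joint_coverage_independent(alpha, m):
    """Chance that all m independent (1 - alpha) intervals cover their true values."""
    return (1.0 - alpha) ** m

def bonferroni_z(alpha, m):
    """Two-sided critical value when alpha is split evenly across m intervals."""
    return NormalDist().inv_cdf(1.0 - (alpha / m) / 2.0)

m = 20
print(f"Joint coverage of {m} nominal 95% CIs: {joint_coverage_independent(0.05, m):.3f}")
print(f"z for a single 95% CI:     {bonferroni_z(0.05, 1):.3f}")
print(f"z with Bonferroni, m = {m}: {bonferroni_z(0.05, m):.3f}")
```

Each corrected interval is wider, which is the price of keeping the overall error rate near the nominal level.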
Normal Approximation CI’s
• Normal approximation is appropriate for numerous verification measures. Examples: Mean error, Correlation, ACC, BASER, POD, FAR, CSI
• Alternative CI estimates are available for other types of variables. Examples: forecast/observation variance, GSS, HSS, FBIAS, Brier Score
• All approaches expect the sample values to be independent and identically distributed.
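When the independence assumption fails because errors are correlated in time, one textbook adjustment is to replace n with an effective sample size, n_eff = n(1 - r1)/(1 + r1), where r1 is the lag-1 autocorrelation. The sketch below applies that adjustment to a synthetic AR(1) error series; it is an illustration of the idea, not a description of how MET implements its first-order correction.

```python
import math
import random
from statistics import NormalDist, mean

def lag1_autocorrelation(x):
    """Sample lag-1 autocorrelation of a series."""
    m = mean(x)
    num = sum((a - m) * (b - m) for a, b in zip(x[:-1], x[1:]))
    den = sum((a - m) ** 2 for a in x)
    return num / den

def mean_ci_half_width(x, alpha=0.05, adjust_for_autocorrelation=False):
    """Half-width of a normal CI for the mean, optionally using an effective sample size."""
    n = len(x)
    m = mean(x)
    var = sum((a - m) ** 2 for a in x) / (n - 1)
    n_eff = n
    if adjust_for_autocorrelation:
        r1 = lag1_autocorrelation(x)
        n_eff = n * (1.0 - r1) / (1.0 + r1)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return z * math.sqrt(var / n_eff)

# Synthetic, positively autocorrelated errors (AR(1) with coefficient 0.6).
rng = random.Random(42)
errors = [0.0]
for _ in range(499):
    errors.append(0.6 * errors[-1] + rng.gauss(0.0, 1.0))

r1 = lag1_autocorrelation(errors)
naive = mean_ci_half_width(errors)
adjusted = mean_ci_half_width(errors, adjust_for_autocorrelation=True)
print(f"r1 = {r1:.2f}, naive half-width = {naive:.3f}, adjusted half-width = {adjusted:.3f}")
```

With positive autocorrelation the adjusted interval is wider, reflecting that the series carries less information than n independent values would.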
IID Bootstrap Algorithm
(Nonparametric) Bootstrap CI’s
1. Resample with replacement from the sample (forecast and observation pairs), x1, x2, ..., xn.
2. Calculate the verification statistic(s) of interest from the resample in step 1.
3. Repeat steps 1 and 2 many times, say B times, to obtain a sample of the verification statistic(s) θB.
4. Estimate (1-α)100% CI’s from the sample in step 3.
[Figure: Empirical distribution (histogram) of the statistic calculated on repeated samples; x-axis: values of statistic θB; 5% in each tail marks the bounds for a 90% CI.]
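The four steps of the IID bootstrap can be sketched as follows. The forecast/observation pairs are synthetic, the statistic is mean error, and step 4 uses the percentile method, which is only one of several ways to turn the bootstrap sample into an interval. Function names are illustrative.

```python
import random
from statistics import mean

def mean_error(pairs):
    """Verification statistic: mean of forecast minus observation."""
    return mean(f - o for f, o in pairs)

def bootstrap_percentile_ci(pairs, statistic, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI, assuming the pairs are independent and identically distributed."""
    rng = random.Random(seed)
    n = len(pairs)
    replicates = []
    for _ in range(n_resamples):                                 # step 3: repeat B times
        resample = [pairs[rng.randrange(n)] for _ in range(n)]   # step 1: resample with replacement
        replicates.append(statistic(resample))                   # step 2: recompute the statistic
    replicates.sort()
    # Step 4: nearest-rank empirical quantiles of the bootstrap sample.
    lo_idx = round((alpha / 2.0) * (n_resamples - 1))
    hi_idx = round((1.0 - alpha / 2.0) * (n_resamples - 1))
    return replicates[lo_idx], replicates[hi_idx]

# Synthetic pairs: forecasts with a bias of about +0.5 relative to observations.
data_rng = random.Random(11)
observations = [data_rng.gauss(15.0, 3.0) for _ in range(100)]
pairs = [(o + 0.5 + data_rng.gauss(0.0, 1.0), o) for o in observations]

lower, upper = bootstrap_percentile_ci(pairs, mean_error, alpha=0.10)
print(f"Mean error = {mean_error(pairs):.3f}, 90% bootstrap CI = ({lower:.3f}, {upper:.3f})")
```

Resampling whole pairs, rather than forecasts and observations separately, keeps each forecast matched to its observation.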
Bootstrap CI Considerations
• Number of points impacts speed of bootstrap
  • Grid-based typically uses more points than Point-based
  • THUS: Bootstrap is quicker with Point-based
• Number of resamples impacts speed of bootstrap
  • Recommended value is 1000
  • If you need to reduce – try to determine where solutions converge to pick your value
• Bootstrap can be disabled in MET, if concerned about compute speed - check status in config file before running
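One way to look for convergence is to compute the bootstrap interval for several numbers of resamples and watch where the bounds stop moving. The sketch below does this outside of MET on a synthetic error sample with a percentile interval for the mean; the candidate values of B are arbitrary choices.

```python
import random
from statistics import mean

def percentile_ci(values, n_resamples, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of independent values."""
    rng = random.Random(seed)
    n = len(values)
    reps = sorted(mean(rng.choices(values, k=n)) for _ in range(n_resamples))
    lo = reps[round((alpha / 2.0) * (n_resamples - 1))]
    hi = reps[round((1.0 - alpha / 2.0) * (n_resamples - 1))]
    return lo, hi

# Synthetic forecast errors (hypothetical data).
data_rng = random.Random(7)
errors = [data_rng.gauss(0.5, 2.0) for _ in range(200)]

results = {}
for b in (50, 200, 1000, 2000):
    results[b] = percentile_ci(errors, b)
    print(f"B = {b:4d}: CI = ({results[b][0]:.3f}, {results[b][1]:.3f})")
```

If the bounds at a smaller B already agree with those at 1000 to the precision you report, the smaller value may be an acceptable trade for speed.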
METViewer alternatives
• Two types of parametric intervals available where appropriate.
  • Accumulate scores (e.g. overall average), find parametric interval.
  • Summarize scores (e.g. find average or median value of all daily POD values), find interval appropriate for average or median.
• Bootstrap the statistics for each field over time.
  • Measures (between-field) uncertainty of the estimates over time, rather than the within field uncertainty.
• Pairwise difference statistics and intervals (with event equalization).
  • Gives more power to detect differences by eliminating case to case variability.
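Event equalization can be sketched as keeping only the cases for which both models have a score, then differencing case by case. The daily scores and dates below are made up, and the helper is an illustration of the idea rather than METViewer's implementation.

```python
from statistics import mean

def equalize_events(scores_a, scores_b):
    """Keep only the cases (keys) present for both models, in sorted order."""
    common = sorted(set(scores_a) & set(scores_b))
    return [(scores_a[k], scores_b[k]) for k in common]

# Hypothetical daily POD values keyed by valid date; each model is missing one day.
model1 = {"2019-04-01": 0.62, "2019-04-02": 0.55, "2019-04-03": 0.71, "2019-04-05": 0.40}
model2 = {"2019-04-01": 0.60, "2019-04-03": 0.66, "2019-04-04": 0.58, "2019-04-05": 0.37}

pairs = equalize_events(model1, model2)
diffs = [a - b for a, b in pairs]
print(f"{len(pairs)} common cases, mean pairwise difference = {mean(diffs):.3f}")
```

An interval would then be computed on the list of differences rather than on each model's scores.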
Conclusions
• Uncertainty estimates are an essential part of good verification evaluations.
• All estimates are wrong, some estimates are useful.
• MET and METViewer developers strive to provide the most correct and useful intervals for output statistics.
Appendix C of MET Documentation: http://www.dtcenter.org/met/users/docs/overview.php
References and further reading
• Gilleland, E., 2010: Confidence intervals for forecast verification. NCAR Technical Note NCAR/TN-479+STR, 71pp. Available at: https://opensky.ucar.edu/islandora/object/technotes%3A491
• Jolliffe and Stephenson (2011): Forecast verification: A practitioner’s guide, 2nd Edition, Wiley & Sons
• JWGFVR (2009): Recommendation on verification of precipitation forecasts. WMO/TD report, no. 1485 WWRP 2009-1
• Nurmi (2003): Recommendations on the verification of local weather forecasts. ECMWF Technical Memorandum, no. 430
• Wilks (2012): Statistical methods in the atmospheric sciences, ch. 7. Academic Press
See also
http://www.cawcr.gov.au/projects/verification/