An Introduction to Forecast Verification - HZGCOSMO/CLM/ART Training Forecast Verification Felix...
-
COSMO/CLM/ART Training Forecast Verification Felix Fundel
An Introduction to Forecast Verification
Felix Fundel
Deutscher Wetterdienst
FE 15 – Vorhersagbarkeit & Verifikation
Phone: +49 (69) 8062 2422
-
Outline
Basics
• What is verification?
• What is a good forecast?
• Reasons for NWP verification
• Why is NWP erroneous?
• Answers verification can give
• Types of forecasts
• Types of observations
• Forecast properties
Methods
1) Deterministic Forecasts: Continuous, Categorical
2) Ensemble Prediction Systems: Ensemble, Probabilistic
3) Spatial: Fuzzy, Object based
Final Remarks
• Take special care of
• Verification guideline
• Active fields of research
• Further reading
-
I Basics
What is Verification?
Comparison of prediction (forecast) and truth (observation or analysis)
Usually the "truth" is not known, so "evaluation" or "validation" would be more accurate terms than "verification"
Infer the goodness (quality and value) of the forecast, qualitatively or quantitatively
-
What is a good forecast?
Deterministic point forecast
Straightforward error characterization (accuracy & correlation)
However:
• How much error do I allow for?
• What would be an adequate reference forecast?
• Do I demand the same quality for day 1 as for day 7?
• Is the observation really this exact?
-
What is a good forecast?
Deterministic spatial forecast
Should the forecast be evaluated point by point?
Should I allow for some spatial inaccuracy?
Is the forecast equally good on all spatial scales?
How would a forecaster perceive the forecast quality?
-
What is a good forecast?
Ensemble forecast
Is the observation within the ensemble range?
Would outside be ok as well?
How much ensemble spread is good?
Can I say anything about forecast quality from just one realization?
-
Reasons for NWP verification
Monitoring and quantification of errors → for communication to the public or customers
Unravel systematic forecast errors → for developers and forecasters
Compare different models / experiments → for decision making, developers
-
Why is NWP erroneous?
Deficiencies in the model
• Coarse grid resolution (vertical & horizontal)
• Coarse temporal resolution
• Parameterization of physical processes (e.g. radiation, precipitation)
• Numerical approximations
• Errors from model boundaries (regional models)
• Coding errors
Inaccurate initial conditions
• Too few observations (spatial & temporal)
• Uncertain observations (instrument error & representativity)
• Errors in data assimilation
Even a perfect model with perfect initial conditions has limited predictability (butterfly effect)
-
Some answers verification can deliver
The forecast is x % better/worse than a reference forecast (e.g. climatology, persistence, other model)
The forecast is valuable up to a lead time of x days
Forecast quality depends on time of the day, region, season, meteorological conditions…
Using the forecasts for decision making does/does not yield an economic benefit
The forecast is/is not calibrated
The forecast is capable of representing the location, timing, shape, magnitude of objects (e.g. rain cells)
-
Types of forecasts
Deterministic model
• Choose an initial state and start a single integration
Ensemble prediction
• Make many integrations (e.g. from different initial states, or as a multi-model or analogue ensemble…)
-
Types of observations
Point based
• SYNOP
• Ships, buoys
• TEMP
• Satellites
• Airplanes
• Observers
Spatial
• Rain radar
• Satellites
• Model analysis
-
Observations are not the truth!
• Each observation is a model itself (exception: counting events)
• Observations should come with an uncertainty which, if possible, should be considered in verification (but this is rarely done)
Verification against observations is (almost) always flawed
• The model value is a grid box average, and observations within a grid box might vary strongly. Many ways exist to match an observation to a grid point
• Is the observed value really what you want your model to predict?
• Gridded observations or analyses might be a workaround, but those rely on models (statistical or physical) again and might not be independent of the forecast.
Caution!
-
Forecast properties
Deterministic prediction system
• Bias - mean error
• Association - e.g. correlation
Ensemble prediction system
• Reliability - conditional bias over several categories (usually forecast probabilities)
• Resolution - ability to resolve events in different subsets
• Sharpness - spread of the forecast distribution
• Uncertainty - observation variability
Both
• Skill - value w.r.t. a reference forecast (e.g. persistence or climatology)
• Value - is the forecast helpful for decision making?
-
II Verification Methods
Types of verification
Continuous
• For deterministic predictions as time series, spatial data or both combined
• Example: temperature, pressure, upper-air variables
Dichotomous (binary, 2 categories, yes/no; special case of multiple categories)
• For deterministic predictions as time series, spatial data or both combined
• Example: rain yes or no? cloud amount category, wind speed, warnings
Ensemble
• For ensemble models considering the forecast distribution
• Example: Does the ensemble spread capture the forecast uncertainty?
Probabilistic
• For probabilities derived from ensemble models
• Example: probability of exceeding a wind speed of 10 m/s
Spatial
• Mostly deterministic model
• Example: Are objects predicted at the correct location?
-
Deterministic Forecasts
BIAS (mean error)
Shows the average direction of the error (positive or negative)
In NWP defined as positive (negative) if the forecast quantities are larger (smaller) than observed
Does not indicate the magnitude of the error, as positive and negative errors might cancel each other out
Same unit as variable
-
Deterministic Forecasts
MAE (mean absolute error)
Average error magnitude
Does not indicate the direction of the error
Same unit as variable
-
Deterministic Forecasts
RMSE (root mean squared error)
Average error magnitude with quadratic weight
Sensitive to large errors, never smaller than MAE (MSE = squared bias + error variance)
RMSE > MAE means the error magnitudes vary
Same unit as variable
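The three continuous scores (bias, MAE, RMSE) can be sketched in a few lines. This is a minimal illustration, assuming paired forecast and observation values; the function name `continuous_scores` and the temperature values are made up for this example.

```python
import numpy as np

def continuous_scores(fcst, obs):
    """Return bias (mean error), MAE and RMSE for paired forecasts/observations."""
    err = np.asarray(fcst, dtype=float) - np.asarray(obs, dtype=float)
    bias = err.mean()                    # positive: forecast larger than observed
    mae = np.abs(err).mean()             # average error magnitude
    rmse = np.sqrt((err ** 2).mean())    # quadratic weight, sensitive to large errors
    return bias, mae, rmse

# Made-up 2 m temperature values in °C (illustration only)
fcst = [12.0, 15.5, 9.0, 20.0]
obs = [11.0, 16.5, 9.0, 17.0]
bias, mae, rmse = continuous_scores(fcst, obs)
print(f"bias={bias:.2f}  MAE={mae:.2f}  RMSE={rmse:.2f}")
```

In this sample the errors of +1 and -1 cancel in the bias but not in the MAE, and the single 3 °C error pulls the RMSE above the MAE.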
-
Deterministic Forecasts
Correlation coefficient & anomaly correlation
Correspondence between forecast and observations
Measures linear association and phase errors.
Independent of biases.
Can give misleading results if the verification sample is inhomogeneous (e.g. temperature correlation with day and night values in one sample)
(Anomaly correlation should be used to reduce the effects of inhomogeneity)
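The effect of an inhomogeneous sample can be illustrated with synthetic data. The sketch below assumes a made-up climatology that alternates between a night value and a day value, and a forecast whose deviations from climatology are unrelated to the observed ones. The uncentered anomaly correlation used here is one common definition, chosen for this example rather than taken from the slides.

```python
import numpy as np

def correlation(fcst, obs):
    """Pearson correlation between forecast and observation."""
    return float(np.corrcoef(fcst, obs)[0, 1])

def anomaly_correlation(fcst, obs, clim):
    """Uncentered correlation of anomalies w.r.t. a climatology (assumed definition)."""
    fa = np.asarray(fcst, dtype=float) - np.asarray(clim, dtype=float)
    oa = np.asarray(obs, dtype=float) - np.asarray(clim, dtype=float)
    return float(np.sum(fa * oa) / np.sqrt(np.sum(fa ** 2) * np.sum(oa ** 2)))

# Synthetic sample mixing night (10 °C) and day (20 °C) climatological values
rng = np.random.default_rng(0)
n = 2000
clim = np.where(np.arange(n) % 2 == 0, 10.0, 20.0)
obs = clim + rng.normal(0.0, 1.0, n)
fcst = clim + rng.normal(0.0, 1.0, n)   # anomalies independent of the observed ones

print("correlation        :", round(correlation(fcst, obs), 2))
print("anomaly correlation:", round(anomaly_correlation(fcst, obs, clim), 2))
```

The plain correlation is high only because both series follow the day/night cycle; the anomaly correlation stays close to zero.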
-
EXAMPLE DETERMINISTIC TEMP VERIFICATION OF GEOPOTENTIAL OVER EUROPE
-
Categorical Forecasts
                    Observed
                    yes            no                  Total
Forecast   yes      hits           false alarms        forecast yes
           no       misses         correct negatives   forecast no
Total               observed yes   observed no         total
• Used for binary data
• If data is not binary, choose a threshold (e.g. precipitation > 10 mm/h) and make your data binary
• Count the cases falling into each entry of the contingency table
-
Categorical Forecasts
!!CAUTION!!
Famous example: Tornado forecast verification
Collection of tornado forecasts (yes/no) and outcomes
                    Observed
                    yes     no      Total
Forecast   yes      28      72      100
           no       23      2680    2703
Total               51      2752    2803
Accuracy = (28+2680)/2803 = 96.6% (published in Americ. Meteorol. Journal 1884)
If no tornados were forecast at all: Accuracy = (0+2752)/2803 = 98.2%
It is advisable to use more measures than just accuracy…
-
Categorical Forecasts
Fraction of correct forecasts (best=1)
Under- or over-forecasting (best=1)
Correctly forecast events (best=1)
Wrongly forecast events (best=0)
Can the forecast separate yes from no events? (best=1)
How much more often is an event forecast correctly than incorrectly? (best=Inf)
Many more exist, see http://www.cawcr.gov.au/projects/verification/#Methods_for_dichotomous_forecasts
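The score formulas on this slide did not survive as text. The sketch below assumes the descriptions refer to the standard scores accuracy, frequency bias, probability of detection (POD), false alarm ratio (FAR), Hanssen-Kuipers discriminant and odds ratio; this mapping is an assumption. The function name is hypothetical, and the numbers are the tornado contingency table quoted in this deck.

```python
def categorical_scores(hits, false_alarms, misses, correct_negatives):
    """Standard 2x2 contingency table scores (assumed to match the slide)."""
    n = hits + false_alarms + misses + correct_negatives
    pod = hits / (hits + misses)                            # correctly forecast events
    pofd = false_alarms / (false_alarms + correct_negatives)
    return {
        "accuracy": (hits + correct_negatives) / n,         # fraction correct
        "frequency_bias": (hits + false_alarms) / (hits + misses),
        "pod": pod,
        "far": false_alarms / (hits + false_alarms),        # false alarm ratio
        "hanssen_kuipers": pod - pofd,                      # separates yes from no
        "odds_ratio": (hits * correct_negatives) / (misses * false_alarms),
    }

# Tornado example from the contingency table above
scores = categorical_scores(hits=28, false_alarms=72, misses=23, correct_negatives=2680)
for name, value in scores.items():
    print(f"{name:16s} {value:8.3f}")
```

For the tornado table the accuracy is high, while the FAR of 0.72 and the frequency bias of about 2 show heavy over-forecasting, which accuracy alone hides.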
-
Multi-categorical Forecasts
                    Observed Category
            i,j     1          2          ...   K          Total
Forecast    1       n(F1,O1)   n(F1,O2)   ...   n(F1,OK)   N(F1)
Category    2       n(F2,O1)   n(F2,O2)   ...   n(F2,OK)   N(F2)
            ...     ...        ...        ...   ...        ...
            K       n(FK,O1)   n(FK,O2)   ...   n(FK,OK)   N(FK)
Total               N(O1)      N(O2)      ...   N(OK)      N
-
Ensemble Forecasts (taking into account all members)
• An EPS provides a range of forecasts
• By comparing just a single EPS forecast to an observation, nothing can be said about the forecast quality!
• Even an observation outside the EPS range does not mean the forecast is wrong
• Evaluating an EPS requires the collection of many cases
• This allows inferring the statistical correctness of the forecast distribution given by the EPS
-
Ensemble Forecasts
Reliability: measures average agreement between forecasts and observations
Resolution: measures ability to discriminate different events
[Figure labels: Observations, Forecasts; Event 1, Event 2]
-
Ensemble Forecasts
Talagrand diagram (a.k.a. rank histogram)
• Count the number of cases in which an observation falls into each of the bins given by the ensemble forecast (e.g. a 20-member EPS has 21 bins)
• As each member should be equally likely, the Talagrand diagram should be flat (a necessary but not sufficient criterion for reliability)
• Tails in the diagram indicate overall biases
• Peaks on the left and right indicate too little spread
• Hill shape indicates too much spread
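The counting step can be sketched as follows. This is a minimal version assuming the ensemble is stored as an array of shape (cases, members); ties between members and the observation are ignored, and the function name and the synthetic Gaussian data are made up for illustration.

```python
import numpy as np

def rank_histogram(ensemble, obs):
    """Count how often the observation falls into each of the n_members + 1 bins."""
    ensemble = np.asarray(ensemble, dtype=float)   # shape (n_cases, n_members)
    obs = np.asarray(obs, dtype=float)             # shape (n_cases,)
    # Rank = number of members below the observation (0 .. n_members)
    ranks = np.sum(ensemble < obs[:, None], axis=1)
    return np.bincount(ranks, minlength=ensemble.shape[1] + 1)

rng = np.random.default_rng(0)
n_cases, n_members = 21000, 20
obs = rng.normal(size=n_cases)
reliable = rng.normal(size=(n_cases, n_members))   # same distribution as obs
underdispersive = 0.5 * reliable                    # too little spread

print("reliable       :", rank_histogram(reliable, obs))
print("underdispersive:", rank_histogram(underdispersive, obs))
```

The first histogram is roughly flat at about 1000 counts per bin; the second shows the peaks at both ends that indicate too little spread.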
-
Example COSMO-DE-EPS
Hourly precipitation summer 12
-
Ensemble Forecasts
Spread/Skill behavior
• The spread (width) of an ensemble forecast should be related to the uncertainty of the forecast
• It is desirable to have growing spread when the forecast error grows
• A common measure for spread is the average of the standard deviation over a set of ensemble forecasts
• A common measure for skill is the RMSE of the ensemble mean
• In case of no bias, spread and skill should give the same value
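A minimal sketch of the spread/skill comparison, following the two measures named above (average ensemble standard deviation and RMSE of the ensemble mean). The array layout (cases, members), the function name and the synthetic data are assumptions for this example.

```python
import numpy as np

def spread_and_skill(ensemble, obs):
    """Return (spread, skill): mean ensemble std. dev. and RMSE of the ensemble mean."""
    ensemble = np.asarray(ensemble, dtype=float)   # shape (n_cases, n_members)
    obs = np.asarray(obs, dtype=float)             # shape (n_cases,)
    spread = np.mean(ensemble.std(axis=1, ddof=1))
    skill = np.sqrt(np.mean((ensemble.mean(axis=1) - obs) ** 2))
    return float(spread), float(skill)

# Synthetic, statistically consistent ensemble: obs and members share one distribution
rng = np.random.default_rng(1)
n_cases, n_members = 20000, 20
center = rng.normal(0.0, 3.0, n_cases)             # case-to-case variation
obs = center + rng.normal(0.0, 1.0, n_cases)
ens = center[:, None] + rng.normal(0.0, 1.0, (n_cases, n_members))

spread, skill = spread_and_skill(ens, obs)
print(f"spread={spread:.2f}  skill (RMSE)={skill:.2f}")
```

For a finite ensemble the two numbers agree only approximately, because the ensemble mean itself carries sampling error.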
-
Example COSMO-DE-EPS
RMSE (SKILL)
STDEV (SPREAD)
VMAX_10M July 2012
-
Ensemble Forecasts
CRPS (continuous ranked probability score)
• Like MAE for ensemble predictions (reduces to the absolute error for a single deterministic forecast)
• Observation and forecast are expressed as cumulative distribution functions, and the squared difference between the two is integrated over all values of the variable
• Can be decomposed into reliability, resolution and uncertainty components
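For a finite ensemble, the CRPS of the empirical forecast distribution can be computed without explicit integration. The sketch below uses the well-known identity CRPS = mean|x_i - y| - 0.5 * mean|x_i - x_j|; the function name and the toy numbers are made up for illustration.

```python
import numpy as np

def crps_ensemble(members, obs):
    """CRPS of the empirical CDF of one ensemble forecast against one observation."""
    x = np.asarray(members, dtype=float)
    term_obs = np.mean(np.abs(x - obs))                        # distance to observation
    term_spread = 0.5 * np.mean(np.abs(x[:, None] - x[None, :]))  # internal spread
    return float(term_obs - term_spread)

# Two members at 0 and 2, observation at 1
print(crps_ensemble([0.0, 2.0], 1.0))
# A single member: CRPS equals the absolute error
print(crps_ensemble([3.0], 1.0))
```

Averaging this quantity over many cases gives the CRPS of the ensemble system; the single-member case shows that it reduces to the absolute error for a deterministic forecast.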
-
Ensemble Forecasts
Other measures
• Check the number of outliers, i.e. observations falling outside the ensemble range; a fraction of 2/(n+1) is expected for a perfect EPS with n members
• The ensemble mean is often verified in a deterministic manner
• Each member can be verified in a deterministic manner
-
Probabilistic Forecasts (transform ensemble to probability)
Reliability diagram
• Visualization of biases conditional on the forecast probability
• Choose a threshold (e.g. a temperature threshold in °C), convert forecasts to probabilities of exceeding this threshold and convert observations to binary according to the threshold
• Plot the observed frequency for each forecast probability class
• Binning requires a lot of data
[Figure annotations: overconfident forecast (not enough spread); biased forecast]
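The binning behind a reliability diagram can be sketched as below. It assumes forecast probabilities in [0, 1] and binary outcomes are already available; the function name, the choice of equally wide bins and the four sample values are assumptions for this example.

```python
import numpy as np

def reliability_curve(prob, event, n_bins=10):
    """Mean forecast probability, observed frequency and count per probability bin."""
    prob = np.asarray(prob, dtype=float)
    event = np.asarray(event, dtype=float)          # 1 if the event occurred, else 0
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.digitize(prob, edges[1:-1])            # bin index 0 .. n_bins - 1
    counts = np.bincount(idx, minlength=n_bins)
    sum_prob = np.bincount(idx, weights=prob, minlength=n_bins)
    sum_event = np.bincount(idx, weights=event, minlength=n_bins)
    with np.errstate(invalid="ignore", divide="ignore"):
        mean_prob = sum_prob / counts               # NaN for empty bins
        obs_freq = sum_event / counts
    return mean_prob, obs_freq, counts

mean_prob, obs_freq, counts = reliability_curve(
    prob=[0.05, 0.05, 0.95, 0.95], event=[0, 0, 1, 1], n_bins=2)
print(mean_prob, obs_freq, counts)
```

Plotting `obs_freq` against `mean_prob` gives the reliability curve; the `counts` show how well each bin is populated, which is why binning needs a lot of data.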
-
Example COSMO-DE-EPS
-
Probabilistic Forecasts
Brier Score
• Like MSE for probabilistic forecasts (magnitude of error between forecast probability [0%-100%] and observed probability [0% or 100%])
• Choose a threshold (e.g. a temperature threshold in °C), convert forecasts to probabilities of exceeding this threshold and convert observations to binary according to the threshold
• Can be decomposed into resolution, reliability and uncertainty components
• Perfect score = 0
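The two steps described here (ensemble to probability, then mean squared probability error) can be sketched as follows. The helper names, the threshold of 3 and the small sample ensemble are made up for illustration; the probability is taken as the fraction of members at or above the threshold, which is one simple choice among several.

```python
import numpy as np

def ensemble_to_probability(ensemble, threshold):
    """Fraction of members at or above the threshold, per case."""
    return np.mean(np.asarray(ensemble, dtype=float) >= threshold, axis=1)

def brier_score(prob, event):
    """Mean squared difference between forecast probability and binary outcome."""
    prob = np.asarray(prob, dtype=float)
    event = np.asarray(event, dtype=float)
    return float(np.mean((prob - event) ** 2))

# Three cases, four members each (arbitrary units), event: value >= 3
ensemble = [[1, 2, 3, 4], [0, 0, 1, 5], [6, 7, 8, 9]]
obs = np.array([2.0, 4.0, 7.0])
prob = ensemble_to_probability(ensemble, threshold=3.0)
event = obs >= 3.0
print("probabilities:", prob)
print("Brier score  :", round(brier_score(prob, event), 4))
```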
-
Probabilistic Forecasts
ROC (relative operating characteristic)
• Choose an event threshold
• Calculate contingency table entries for a set of probability thresholds
• Plot POD (hit rate) against the false alarm rate (POFD = false alarms / observed no)
• Perfect when area under ROC curve = 1
• Measures forecast resolution (forecasts can discriminate events)
• If the line falls under the diagonal, the forecast is worse than a random guess
• Can be used to compare with deterministic forecasts
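A minimal sketch of the ROC construction. It assumes the horizontal axis is the false alarm rate (false alarms divided by all observed non-events) and that every probability threshold turns the forecast into a yes/no warning; the function names and the four sample cases are hypothetical.

```python
import numpy as np

def roc_points(prob, event, thresholds):
    """False alarm rate and POD for each probability threshold."""
    prob = np.asarray(prob, dtype=float)
    event = np.asarray(event, dtype=bool)
    pofd, pod = [], []
    for t in thresholds:
        warn = prob >= t                               # yes-forecast at this threshold
        hits = np.sum(warn & event)
        misses = np.sum(~warn & event)
        false_alarms = np.sum(warn & ~event)
        correct_neg = np.sum(~warn & ~event)
        pod.append(hits / (hits + misses))
        pofd.append(false_alarms / (false_alarms + correct_neg))
    return np.array(pofd), np.array(pod)

def roc_area(pofd, pod):
    """Area under the ROC curve by the trapezoidal rule, with end points added."""
    x = np.concatenate(([0.0], pofd, [1.0]))
    y = np.concatenate(([0.0], pod, [1.0]))
    order = np.lexsort((y, x))                         # sort by x, then by y
    x, y = x[order], y[order]
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * (x[1:] - x[:-1])))

thresholds = np.linspace(0.05, 0.95, 10)
pofd, pod = roc_points([0.9, 0.8, 0.1, 0.2], [1, 1, 0, 0], thresholds)
print("ROC area:", roc_area(pofd, pod))
```

A deterministic forecast yields a single (false alarm rate, POD) point, which is how it can be compared with the ROC curve of a probabilistic forecast.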
-
Probabilistic Forecasts
Relative (economic) value score
C = cost of taking preventive action
L = loss if no preventive action was taken
• For each possible forecast probability calculate the contingency table entries
• Calculate the value score (VS) for a number of cost/loss ratios in [0, 1]
• Quantifies the relative monetary value of an EPS for a decision making problem
• Considers the costs of a forecast user linked to the forecast event
• Can give an indication of the best probability a forecast user should base her decision on
• For calibrated (unbiased) forecast systems the best probability equals C/L
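The formula for the value score is not given in the text. The sketch below assumes the standard static cost/loss model: the user pays C whenever she acts on a yes-forecast and loses L for every missed event, and the value is the saving relative to climatology, normalized by the saving of a perfect forecast. The function name is hypothetical, and the sample counts are the tornado contingency table quoted earlier in this deck.

```python
def relative_value(hits, false_alarms, misses, correct_negatives, cost_loss_ratio):
    """Relative economic value for one contingency table and one C/L ratio.

    Expenses are expressed in units of the loss L (assumed cost/loss model).
    """
    n = hits + false_alarms + misses + correct_negatives
    base_rate = (hits + misses) / n
    a = cost_loss_ratio
    expense_forecast = (hits + false_alarms) / n * a + misses / n
    expense_climate = min(a, base_rate)      # always protect or never protect
    expense_perfect = base_rate * a          # protect only when the event occurs
    return (expense_climate - expense_forecast) / (expense_climate - expense_perfect)

for ratio in (0.005, 0.02, 0.1, 0.5):
    value = relative_value(28, 72, 23, 2680, ratio)
    print(f"C/L={ratio:5.3f}  value={value:6.3f}")
```

Negative values mean that a user with this cost/loss ratio would do better by relying on climatology alone. For a probabilistic forecast, the calculation is repeated with the contingency table of each probability threshold.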
-
Deterministic & Probabilistic
Skill Score (reduction of error variance)
• Can be applied to any score
• Reference could e.g. be a climatological forecast, a persistence forecast or another model
• Perfect score = 1, 0 indicates no improvement over reference
• Result is % improvement in score compared to reference forecast
• Ultimate answer to the question "how good is the forecast?"
• Allows comparing scores for different events (e.g. easy and hard to predict)
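The skill score formula itself is not reproduced in the text. A minimal sketch, assuming the usual generic form (score - reference) / (perfect - reference); the function name and the sample MSE values are made up for illustration.

```python
def skill_score(score, reference, perfect=0.0):
    """Generic skill score: 1 = perfect, 0 = no improvement over the reference."""
    return (score - reference) / (perfect - reference)

# Example with MSE (perfect value 0): forecast MSE 1.5, climatology MSE 2.0
print(skill_score(score=1.5, reference=2.0))   # 0.25, i.e. 25 % improvement
```

With MSE as the score and climatology as the reference, this is the reduction of error variance named in the slide title. Scores whose perfect value is not 0 (e.g. a ROC area with perfect value 1) need the `perfect` argument set accordingly.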
-
Example COSMO-DE-EPS
Hourly precipitation summer 12
Reference: determ. COSMO
-
Spatial Verification
Problems of traditional (point-to-point) verification methods
Double Penalty
• Location or timing errors in the forecast are penalized twice
• E.g. rain is forecast where no rain is observed, and no rain is forecast where it actually was observed
• Increasingly problematic with increasing resolution of forecast models
Forecast of objects
• Properties of objects like rain cells or clouds are important aspects of a forecast and are not captured by point-to-point verification
-
Spatial Verification
FUZZY (neighborhood)
• Choose a set of thresholds
• Gradually smooth forecast and/or observed fields (a set of smoothing functions is possible)
• Choose a verification measure, e.g. Fractions Skill Score
• Useful if forecast and observation are available on a grid (e.g. observation from rain radar)
• Shows useful scales of predictability
• Popular for comparing models with different horizontal resolution
• Reduced double penalty effect with larger scales
• Many dichotomous or probabilistic scores can be used for analysis
\[
\mathrm{FSS} = 1 - \frac{\frac{1}{N}\sum_{i=1}^{N}\left(P_{fcst} - P_{obs}\right)^{2}}{\frac{1}{N}\sum_{i=1}^{N}P_{fcst}^{2} + \frac{1}{N}\sum_{i=1}^{N}P_{obs}^{2}}
\]
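The Fractions Skill Score compares, for each neighborhood, the fraction of grid points exceeding the threshold in the forecast (P_fcst) with that in the observation (P_obs): FSS = 1 - mean((P_fcst - P_obs)^2) / (mean(P_fcst^2) + mean(P_obs^2)). The sketch below assumes square neighborhoods of odd width and zero padding at the domain boundary; both are choices made for this example, as are the function names and the toy fields.

```python
import numpy as np

def neighborhood_fraction(binary, size):
    """Fraction of 'yes' points in a size x size window around each grid point."""
    assert size % 2 == 1, "window width must be odd"
    pad = size // 2
    padded = np.pad(binary.astype(float), pad, mode="constant")
    # Summed-area table with a leading row and column of zeros
    sat = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    sat[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    ny, nx = binary.shape
    window_sum = (sat[size:size + ny, size:size + nx] - sat[:ny, size:size + nx]
                  - sat[size:size + ny, :nx] + sat[:ny, :nx])
    return window_sum / size ** 2

def fss(fcst, obs, threshold, size):
    """Fractions Skill Score for one threshold and one neighborhood width."""
    p_fcst = neighborhood_fraction(np.asarray(fcst) >= threshold, size)
    p_obs = neighborhood_fraction(np.asarray(obs) >= threshold, size)
    mse = np.mean((p_fcst - p_obs) ** 2)
    reference = np.mean(p_fcst ** 2) + np.mean(p_obs ** 2)
    return float(1.0 - mse / reference) if reference > 0 else float("nan")

# One rain pixel, forecast displaced by two grid points
obs = np.zeros((11, 11)); obs[5, 5] = 1.0
fcst = np.zeros((11, 11)); fcst[5, 7] = 1.0
for size in (1, 3, 5, 7):
    print(f"neighborhood {size}x{size}: FSS = {fss(fcst, obs, 1.0, size):.2f}")
```

At grid scale the displaced pixel scores 0 (double penalty); the score rises once the neighborhood is wide enough to contain both features.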
-
Spatial Verification
Object-based
• Extract objects from forecast/observed fields (e.g. rain cells, clouds,…)
• Make statistics on properties of those objects (e.g. size, location, magnitude,…)
• Some object-based methods use object matching and tracking in time
• Object-based methods try to mimic the forecast user's perception of the forecast quality
• Double penalty effects can be avoided
-
III Final Remarks
Take special care of
• Calculating scores is easy, preparing data for the verification task is usually the most time-consuming part
• Data quality (eliminate erroneous measurements)
• Use an appropriate score (e.g. no RMSE to verify precipitation)
• Use a homogeneous data set (stratify as much as possible)
• Stratify by observations or external factors (rather than forecast values)
• Use as much data as possible (aggregate if possible)
• Try to implement error bars (and avoid dependent observations)
• Keep in mind that your observation and verification are usually imperfect (don't expect perfect results)
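One way to obtain error bars for a score is a percentile bootstrap over verification cases. This is a sketch of one possible approach, not a method prescribed by the slides; it assumes the cases are independent (which is why dependent observations should be avoided), and the function name and synthetic errors are made up for illustration.

```python
import numpy as np

def bootstrap_ci(errors, statistic, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a score computed from errors."""
    rng = np.random.default_rng(seed)
    errors = np.asarray(errors, dtype=float)
    n = errors.size
    stats = np.empty(n_boot)
    for b in range(n_boot):
        resample = errors[rng.integers(0, n, size=n)]   # draw cases with replacement
        stats[b] = statistic(resample)
    lower, upper = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return float(lower), float(upper)

# Synthetic forecast errors with a bias of about 0.5
errors = np.random.default_rng(1).normal(0.5, 1.0, 200)
low, high = bootstrap_ci(errors, np.mean)
print(f"bias = {errors.mean():.2f}, 95 % interval [{low:.2f}, {high:.2f}]")
```

Passing `lambda e: np.sqrt(np.mean(e ** 2))` as `statistic` gives an interval for the RMSE in the same way.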
-
Verification Guideline
o Who is the user? (forecaster, developer, administrative, decision maker,…)
o What forecast aspects are relevant to this user? (parameter, domain, warnings, scenarios,…)
o What observations are available? (point, gridded, analysis, quality,…)
o What methods are possible with the given data? (deterministic, ensemble, probabilistic, fuzzy,…)
o What score(s) give(s) the right information? (bias, association, skill scores, economic value,…)
o What is an appropriate reference to compare the forecast to? (other model, climatology, persistence, randomness)
o How should the result be visualized best? (text, case studies, graphs, error bars,…)
o Look at your data and don't rely on scores only! (make scatterplots and look at individual forecasts)
-
Active fields of research
• Use of observation uncertainty in verification
• Spatial verification methods applied to ensemble prediction systems
• Accounting for timing errors in verification (avoiding double-penalty effects in time)
• Verifying forecast scenarios
• Verification of extreme events
• Multivariate verification
-
Further reading
Collection of methods & scores (lots of further links)
http://www.cawcr.gov.au/projects/verification
Short Introduction
http://www.dwd.de/DE/forschung/wettervorhersage/num_modellierung/05_verifikation/verifikation_node.html
Collection of monitoring & verification products (only at DWD)
http://oflxs04.dwd.de/~mkoehler/plot-catalog/index.php
WV verification report (only at DWD)
http://intranet.res.bund.de/downloads/VB52_Gesamtdokument.pdf
Books/Papers
• Forecast Verification (Jolliffe & Stephenson)
• Fuzzy Methods Review (Ebert)
• WMO guidelines and meetings
Tools
• R and the packages "verification" and "SpatialVx"
• R package Rfdbk for using feedback files
• COSMO Common Verification VERSUS
• COSMO Spatial Verification VAST