COSMO/CLM/ART Training – Forecast Verification

    An Introduction to Forecast Verification

    Felix Fundel

    Deutscher Wetterdienst

    FE 15 – Vorhersagbarkeit & Verifikation (Predictability & Verification)

    Phone: +49 (69) 8062 2422

    Email: [email protected]


    Outline

    Basics

    What is verification?
    What is a good forecast?
    Reasons for NWP verification
    Why is NWP erroneous?
    Answers verification can give
    Types of forecasts
    Types of observations
    Forecast properties

    Methods

    1) Deterministic forecasts: continuous, categorical
    2) Ensemble prediction systems: ensemble, probabilistic
    3) Spatial: fuzzy, object-based

    Final Remarks

    Take special care of
    Verification guideline
    Active fields of research
    Further reading


    What is Verification?

    Comparison of prediction (forecast) and truth (observation or analysis)

    Usually the “truth” is not known, so using “evaluation” or “validation” instead of “verification” would often be more appropriate

    Infer the goodness (quality and value) of the forecast, qualitatively or quantitatively

    I Basics


    What is a good forecast?

    Deterministic point forecast

    Straightforward error characterization (accuracy & correlation)

    However:
    How much error do I allow for?
    What would be an adequate reference forecast?
    Do I demand the same quality for day 1 as for day 7?
    Is the observation really this exact?


    What is a good forecast?

    Deterministic spatial forecast

    Should the forecast be evaluated point by point?

    Should I allow for some spatial inaccuracy?

    Is the forecast equally good on all spatial scales?

    How would a forecaster perceive the forecast quality?


    What is a good forecast?

    Ensemble forecast

    Is the observation within the ensemble range?

    Would outside be ok as well?

    How much ensemble spread is good?

    Can I say anything about forecast quality from just one realization?


    Reasons for NWP verification

    Monitoring and quantification of errors → for communication to the public or customers

    Unravel systematic forecast errors → for developers and forecasters

    Compare different models / experiments → for decision making, developers


    Why is NWP erroneous?

    Deficiencies in the model
    • Coarse grid resolution (vertical & horizontal)
    • Coarse temporal resolution
    • Parameterization of physical processes (e.g. radiation, precipitation)
    • Numerical approximations
    • Errors from model boundaries (regional models)
    • Coding errors

    Inaccurate initial conditions
    • Too few observations (spatial & temporal)
    • Uncertain observations (instrument error & representativity)
    • Errors in data assimilation

    Even a perfect model with perfect initial conditions will have limited predictability (butterfly effect)


    Some answers verification can deliver

    The forecast is x % better/worse than a reference forecast (e.g. climatology, persistence, other model)

    The forecast is valuable up to a lead time of x days

    Forecast quality depends on time of the day, region, season, meteorological conditions…

    Will I have an economic benefit from using the forecasts for decision making?

    The forecast is/is not calibrated

    The forecast is capable of representing the location, timing, shape, magnitude of objects (e.g. rain cells)


    Types of forecasts

    Deterministic model
    • Decide on an initial state and start a single integration

    Ensemble prediction
    • Make many integrations (e.g. from different initial states, or a multi-model or analogue ensemble, …)


    Types of observations

    Point based
    • SYNOP
    • Ships, buoys
    • TEMP
    • Satellites
    • Airplanes
    • Observers

    Spatial
    • Rain radar
    • Satellites
    • Model analysis


    Observations are not the truth!

    • Each observation is a model itself (exception: counting events)

    • Observations should come with an uncertainty which, if possible, should be considered in verification (but is rarely done)

    Verification against observations is (almost) always flawed

    • The model value is a grid-box average, while observations within a grid box might vary strongly. Many ways exist to match observations to grid points.

    • Is the observed value really what you want your model to predict?

    • Gridded observations or analyses might be a workaround, but those rely on models (statistical or physical) again and might not be independent of the forecast.

    Caution!


    Forecast properties

    Deterministic prediction system
    Bias - mean error
    Association - e.g. correlation

    Ensemble prediction system
    Reliability - conditional bias over several categories (usually forecast probabilities)
    Resolution - ability to resolve events in different subsets
    Sharpness - spread of the forecast distribution
    Uncertainty - observation variability

    Both
    Skill - value w.r.t. a reference forecast (e.g. persistence or climatology)
    Value - is the forecast helpful for decision making?


    Types of verification

    Continuous
    For deterministic predictions as time series, spatial data or both combined
    Example: temperature, pressure, upper-air variables

    Dichotomous (binary, 2 categories, yes/no; special case: multiple categories)
    For deterministic predictions as time series, spatial data or both combined
    Example: rain yes or no? cloud amount category, wind speed, warnings

    Ensemble
    For ensemble models, considering the forecast distribution
    Example: does the ensemble spread capture the forecast uncertainty?

    Probabilistic
    For probabilities derived from ensemble models
    Example: probability to exceed a wind speed of 10 m/s

    Spatial
    Mostly for deterministic models
    Example: are objects predicted at the correct location?

    II Verification Methods


    Deterministic Forecasts

    BIAS (mean error)

    Shows the average direction of the error (positive or negative)

    In NWP, defined as positive (negative) if model forecast quantities are larger (smaller) than observed

    Does not indicate the magnitude of the error, as positive and negative values might cancel each other out

    Same unit as variable
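    For reference, the standard definition (notation added here, not from the slide), writing f_i for the forecasts and o_i for the matching observations:

    \mathrm{ME} = \frac{1}{N}\sum_{i=1}^{N} (f_i - o_i)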


    Deterministic Forecasts

    MAE (mean absolute error)

    Average error magnitude

    Does not indicate the direction of the error

    Same unit as variable
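    With the same notation as above, the usual definition is:

    \mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N} |f_i - o_i|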


    RMSE (root mean squared error)

    Average error magnitude with quadratic weight

    Sensitive to large errors; always at least as large as the MAE (the mean squared error is the sum of the squared bias and the error variance)

    RMSE > MAE indicates variation in the error magnitudes

    Same unit as variable
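    Again for reference, with f_i the forecasts and o_i the observations:

    \mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (f_i - o_i)^2}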

    Deterministic Forecasts


    Correlation coefficient & anomaly correlation

    Correspondence between forecast and observations

    Measures linear association and phase errors.

    Independent of biases.

    Can give misleading results if the verification sample is inhomogeneous (e.g. temperature correlation with day and night values in one sample)

    (Anomaly correlation should be used to reduce effects of inhomogeneity)
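    The standard (Pearson) correlation between forecasts f and observations o is shown below; for the anomaly correlation, f and o are first replaced by their anomalies with respect to a climatology:

    r = \frac{\sum_i (f_i - \bar{f})(o_i - \bar{o})}{\sqrt{\sum_i (f_i - \bar{f})^2}\,\sqrt{\sum_i (o_i - \bar{o})^2}}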

    Deterministic Forecasts


    Example: deterministic TEMP verification of geopotential over Europe


    Categorical Forecasts

                          Observed
                   yes             no                  Total
    Forecast  yes  hits            false alarms        forecast yes
              no   misses          correct negatives   forecast no
    Total          observed yes    observed no         total

    • Used for binary data
    • If the data is not binary, decide on a threshold (e.g. precipitation > 10 mm/h) and make your data binary
    • Sum up all entries in the contingency table
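    A minimal sketch of how such a table can be filled from paired binary series, assuming NumPy arrays of 0/1 (or boolean) values; the function name and arguments are illustrative, not from the talk:

    import numpy as np

    def contingency_table(fcst, obs):
        """Count hits, false alarms, misses and correct negatives
        for paired binary forecast and observation arrays."""
        fcst = np.asarray(fcst, dtype=bool)
        obs = np.asarray(obs, dtype=bool)
        hits = np.sum(fcst & obs)
        false_alarms = np.sum(fcst & ~obs)
        misses = np.sum(~fcst & obs)
        correct_negatives = np.sum(~fcst & ~obs)
        return hits, false_alarms, misses, correct_negatives

    For non-binary data the thresholding happens first, e.g. contingency_table(rain_fcst > 10.0, rain_obs > 10.0).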


    Categorical Forecasts

    !! CAUTION !!

    Famous example: tornado forecast verification
    Collection of tornado forecasts (yes/no) and outcomes

                          Observed
                   yes      no       Total
    Forecast  yes   28      72         100
              no    23    2680        2703
    Total           51    2752        2803

    Accuracy = (28+2680)/2803 = 96.6% (published in the Americ. Meteorol. Journal, 1884)

    If no tornadoes were forecast at all: Accuracy = (0+2752)/2803 = 98.2%

    It is advisable to use more measures than just accuracy…


    Categorical Forecasts

    fraction of correct forecasts (best=1)

    under or over forecasting (best=1)

    correctly forecast events (best=1)

    wrongly forecast events (best=0)

    Can the forecast separate yes from no events? (best = 1)

    How much more often is an event forecast correctly than incorrectly? (best = Inf)

    Many more exist, see http://www.cawcr.gov.au/projects/verification/#Methods_for_dichotomous_forecasts
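    The descriptions above most likely refer to the standard dichotomous scores; as a reference (my naming, under that assumption), with a = hits, b = false alarms, c = misses, d = correct negatives and n = a+b+c+d:

    \text{Accuracy} = \frac{a+d}{n}, \quad \text{Frequency bias} = \frac{a+b}{a+c}, \quad \mathrm{POD} = \frac{a}{a+c}, \quad \mathrm{FAR} = \frac{b}{a+b}

    \text{Hanssen-Kuipers} = \frac{a}{a+c} - \frac{b}{b+d}, \quad \text{Odds ratio} = \frac{a\,d}{b\,c}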


    Multi-categorical Forecasts

                              Observed category
                      1          2         ...        K          Total
    Forecast    1   n(F1,O1)   n(F1,O2)    ...    n(F1,OK)      N(F1)
    category    2   n(F2,O1)   n(F2,O2)    ...    n(F2,OK)      N(F2)
               ...    ...        ...       ...       ...         ...
                K   n(FK,O1)   n(FK,O2)    ...    n(FK,OK)      N(FK)
    Total           N(O1)      N(O2)       ...    N(OK)         N


    Ensemble Forecasts (taking into account all members)

    • An EPS provides a range of forecasts
    • By comparing just a single EPS forecast to an observation, nothing can be said about the forecast quality!

    • Even an observation outside the EPS range does not mean the forecast is wrong

    • Evaluating an EPS requires the collection of many cases

    • This allows one to infer the statistical correctness of the forecast distribution given by the EPS


    Ensemble Forecasts

    Reliability: measures the average agreement between forecasts and observations

    Resolution: measures the ability to discriminate between different events

    [Figure: forecast and observation distributions for two events]


    Ensemble Forecasts

    Talagrand diagram (a.k.a. rank histogram)

    • Count the number of cases an observation falls in each of the bins given by the ensemble forecast (e.g. 20 Member EPS has 21 bins)

    • As each member should be equally likely, the Talagrand diagram should be flat (a necessary, but not sufficient, criterion for reliability)

    • Tails in the diagram indicate overall biases
    • Peaks on the left and right indicate too little spread
    • A hill shape indicates too much spread
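    A minimal sketch of the counting behind such a histogram, assuming a NumPy array ens of shape (n_cases, n_members) and a matching observation vector obs; the function name is illustrative, not from the talk:

    import numpy as np

    def rank_histogram(ens, obs):
        """Count how often the observation falls into each of the
        n_members + 1 bins defined by the sorted ensemble members."""
        n_cases, n_members = ens.shape
        # Rank of the observation within each ensemble (0 .. n_members)
        ranks = np.sum(ens < obs[:, None], axis=1)
        counts = np.bincount(ranks, minlength=n_members + 1)
        return counts  # should be roughly flat for a reliable EPS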


    Example: COSMO-DE-EPS, hourly precipitation, summer 2012


    Ensemble Forecasts

    Spread/skill behavior

    • The spread (width) of an ensemble forecast should be related to the uncertainty of the forecast
    • It is desirable to have growing spread when the forecast error grows
    • A common measure for spread is the average of the standard deviation over a set of ensemble forecasts
    • A common measure for skill is the RMSE of the ensemble mean
    • In case of no bias, spread and skill should give the same value
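    Written out under these conventions (notation added here), with \bar{f}_i the ensemble mean, \sigma_i the ensemble standard deviation and o_i the observation of case i:

    \text{spread} = \frac{1}{N}\sum_{i=1}^{N} \sigma_i, \qquad \text{skill} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \left(\bar{f}_i - o_i\right)^2}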


    Example: COSMO-DE-EPS, RMSE (skill) vs. standard deviation (spread), VMAX_10M, July 2012


    Ensemble Forecasts

    CRPS (continuous ranked probability score)

    • Like the MSE for ensemble predictions
    • Observation and forecast are expressed as cumulative distribution functions and the average squared difference of the probabilities is calculated
    • Can be decomposed into reliability, resolution and uncertainty components
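    For one case with forecast CDF F and observation o (the reported score is the average over all cases); 1{·} denotes the indicator (step) function:

    \mathrm{CRPS} = \int_{-\infty}^{\infty} \left[ F(x) - \mathbf{1}\{x \ge o\} \right]^2 \, dx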


    Ensemble Forecasts

    Other measures

    • Check the number of outliers, i.e. observations falling outside the ensemble range; a fraction of 2/(n+1) is expected for a perfect EPS with n members (e.g. 2/21 ≈ 9.5% for a 20-member EPS)

    • The ensemble mean is often verified in a deterministic manner

    • Each member can be verified in a deterministic manner


    Probabilistic Forecasts (transform the ensemble to probabilities)

    Reliability diagram

    • Visualization of conditional (on the forecast probability) biases
    • Decide on a threshold (e.g. temperature >= … °C), convert forecasts to probabilities of exceeding this threshold and convert observations to binary according to the same threshold
    • Plot the observed frequency for each forecast probability class
    • Binning requires a lot of data

    [Figure: example reliability diagrams illustrating an overconfident forecast (not enough spread) and a biased forecast]
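    A minimal sketch of the binning step behind such a diagram, assuming NumPy arrays prob (forecast probabilities in [0, 1]) and event (binary observations); names are illustrative, not from the talk:

    import numpy as np

    def reliability_curve(prob, event, n_bins=10):
        """Observed event frequency per forecast-probability bin."""
        prob = np.asarray(prob, dtype=float)
        event = np.asarray(event, dtype=float)
        edges = np.linspace(0.0, 1.0, n_bins + 1)
        # Assign each forecast probability to a bin
        idx = np.clip(np.digitize(prob, edges) - 1, 0, n_bins - 1)
        mean_prob = np.array([prob[idx == k].mean() if np.any(idx == k) else np.nan
                              for k in range(n_bins)])
        obs_freq = np.array([event[idx == k].mean() if np.any(idx == k) else np.nan
                             for k in range(n_bins)])
        return mean_prob, obs_freq  # plot obs_freq against mean_prob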


    Example: COSMO-DE-EPS


    Brier Score

    • Like the MSE for probabilistic forecasts (magnitude of the error between the forecast probability [0%-100%] and the observed probability [0% or 100%])

    • Decide on a threshold (e.g. temperature >= … °C), convert forecasts to probabilities of exceeding this threshold and convert observations to binary according to the same threshold

    • Can be decomposed into reliability, resolution and uncertainty components
    • Perfect score = 0
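    With p_i the forecast probability and o_i the binary observation (1 if the event occurred, 0 otherwise):

    \mathrm{BS} = \frac{1}{N}\sum_{i=1}^{N} (p_i - o_i)^2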

    Probabilistic Forecasts


    ROC (relative operating characteristic)

    • Decide on an event threshold
    • Calculate contingency table entries for a set of probability thresholds
    • Plot the POD (hit rate) against the probability of false detection (POFD, false alarm rate)
    • Perfect when the area under the ROC curve = 1
    • Measures forecast resolution (can the forecasts discriminate events?)
    • If the curve falls under the diagonal, the forecast is worse than a random guess
    • Can be used to compare with a deterministic forecast

    Probabilistic Forecasts


    Probabilistic Forecasts

    Relative (economic) value score

    C = costs for taking preventive action
    L = loss if the event occurs and no preventive action was taken

    • For each possible forecast probability, calculate the contingency table entries
    • Calculate the value score for a number of cost/loss ratios [0-1]
    • Quantifies the relative monetary value of an EPS for a decision-making problem
    • Considers the costs of a forecast user linked to the forecast event
    • Can give an indication of the best probability a forecast user should base her decision on
    • For calibrated (unbiased) forecast systems the best probability equals C/L
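    One common way to write the relative value in this cost-loss setting (my notation, under the standard assumptions), with s the observed base rate, a, b, c the hits, false alarms and misses out of n cases, and E the mean expense per case:

    E_{\mathrm{fcst}} = \frac{(a+b)\,C + c\,L}{n}, \qquad E_{\mathrm{clim}} = \min(C,\, s\,L), \qquad E_{\mathrm{perf}} = s\,C

    V = \frac{E_{\mathrm{clim}} - E_{\mathrm{fcst}}}{E_{\mathrm{clim}} - E_{\mathrm{perf}}}

    V = 1 for a perfect forecast; V <= 0 means the forecast is of no more value than climatology for that user.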


    Deterministic & Probabilistic

    Skill Score (reduction of error variance)

    • Can be applied to any score
    • The reference could e.g. be a climatological forecast, a persistence forecast or another model
    • Perfect score = 1; 0 indicates no improvement over the reference
    • The result is the % improvement in the score compared to the reference forecast
    • Ultimate answer to the question “how good is the forecast?”
    • Allows comparison of scores for different events (e.g. easy and hard to predict)
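    Written generically for a score S (notation added here):

    \mathrm{SS} = \frac{S - S_{\mathrm{ref}}}{S_{\mathrm{perf}} - S_{\mathrm{ref}}}

    For error scores whose perfect value is 0 (e.g. MSE, Brier score) this reduces to SS = 1 - S / S_ref.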


    Example: COSMO-DE-EPS, hourly precipitation, summer 2012 (reference: deterministic COSMO)


    Spatial Verification

    Problems of traditional (point-to-point) verification methods

    Double penalty
    • Location or timing errors in the forecast are penalized twice
    • E.g. rain is forecast where there is no rain observed, and no rain is forecast where it actually was observed

    • Increasingly problematic with increasing resolution of forecast models

    Forecast of objects

    • Properties of objects like rain cells or clouds are important aspects of a forecast and are not captured by point-to-point verification


    Spatial Verification

    FUZZY (neighborhood)

    • Decide on a set of thresholds
    • Gradually smooth the forecast and/or observed fields (a set of smoothing functions is possible)
    • Decide on a verification measure, e.g. the Fraction Skill Score (FSS)

    • Useful if forecast and observation are available on a grid (e.g. observations from rain radar)

    • Shows the useful scales of predictability
    • Popular for comparing models with different horizontal resolution
    • Reduced double-penalty effect at larger scales
    • Many dichotomous or probabilistic scores can be used for the analysis

    With P_fcst and P_obs the forecast and observed neighborhood fractions over the N grid points:

    \mathrm{FSS} = 1 - \frac{\frac{1}{N}\sum_{i=1}^{N}\left(P_{\mathrm{fcst},i} - P_{\mathrm{obs},i}\right)^{2}}{\frac{1}{N}\left(\sum_{i=1}^{N} P_{\mathrm{fcst},i}^{2} + \sum_{i=1}^{N} P_{\mathrm{obs},i}^{2}\right)}
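    A minimal sketch of the neighborhood computation under this definition, assuming 2-D NumPy fields and using scipy.ndimage.uniform_filter for the moving-window fractions; the function name and arguments are illustrative, not from the talk:

    import numpy as np
    from scipy.ndimage import uniform_filter

    def fss(fcst, obs, threshold, window):
        """Fraction Skill Score for one threshold and one window size."""
        # Binary exceedance masks
        bf = (fcst >= threshold).astype(float)
        bo = (obs >= threshold).astype(float)
        # Neighborhood fractions: moving-window mean of the binary fields
        pf = uniform_filter(bf, size=window, mode="constant")
        po = uniform_filter(bo, size=window, mode="constant")
        mse = np.mean((pf - po) ** 2)
        mse_ref = np.mean(pf ** 2) + np.mean(po ** 2)
        return 1.0 - mse / mse_ref if mse_ref > 0 else np.nan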


    Spatial Verification

    Object-based

    • Extract objects from the forecast/observed fields (e.g. rain cells, clouds, …)
    • Make statistics on properties of those objects (e.g. size, location, magnitude, …)
    • Some object-based methods use object matching and tracking in time
    • Object-based methods try to mimic the forecast user's perception of the forecast quality
    • Double-penalty effects can be avoided


    Take special care of

    • Calculating scores is easy; preparing the data for the verification task is usually the most time-consuming part

    • Data quality (eliminate erroneous measurements)

    • Use an appropriate score (e.g. no RMSE to verify precipitation)

    • Use a homogeneous data set (stratify as much as possible)

    • Stratify by observations or external factors (rather than by forecast values)

    • Use as much data as possible (aggregate if possible)

    • Try to implement error bars (and avoid dependent observations)

    • Keep in mind that your observations and verification are usually imperfect (don't expect perfect results)

    III Final Remarks


    Verification Guideline

    o Who is the user?
      (forecaster, developer, administrative, decision maker, …)

    o What forecast aspects are relevant to this user?
      (parameter, domain, warnings, scenarios, …)

    o What observations are available?
      (point, gridded, analysis, quality, …)

    o What methods are possible with the given data?
      (deterministic, ensemble, probabilistic, fuzzy, …)

    o What score(s) give(s) the right information?
      (bias, association, skill scores, economic value, …)

    o What is an appropriate reference to compare the forecast to?
      (other model, climatology, persistence, randomness)

    o How should the result be visualized best?
      (text, case studies, graphs, error bars, …)

    o Look at your data and don't rely on scores only!
      (make scatterplots and look at individual forecasts)


    Active fields of research

    • Use of observation uncertainty in verification

    • Spatial verification methods applied to ensemble prediction systems

    • Accounting for timing errors in verification (avoiding double-penalty effects in time)

    • Verifying forecast scenarios

    • Verification of extreme events

    • Multivariate verification


    Further reading

    Collection of methods & scores (lots of further links)
    http://www.cawcr.gov.au/projects/verification

    Short introduction
    http://www.dwd.de/DE/forschung/wettervorhersage/num_modellierung/05_verifikation/verifikation_node.html

    Collection of monitoring & verification products (only at DWD)
    http://oflxs04.dwd.de/~mkoehler/plot-catalog/index.php

    WV verification report (only at DWD)
    http://intranet.res.bund.de/downloads/VB52_Gesamtdokument.pdf

    Books/Papers
    Forecast Verification (Jolliffe & Stephenson)
    Fuzzy Methods Review (Ebert)
    WMO guidelines and meetings

    Tools
    R and the packages “verification” and “SpatialVx”
    R package Rfdbk for using feedback files
    COSMO Common Verification: VERSUS
    COSMO Spatial Verification: VAST