Training Seminar, 5 Nov 2008
Verification of Ensemble Prediction at JMA
Hitoshi Sato, Yukiko Naruse
Climate Prediction Division
Japan Meteorological Agency
Contents
Part I: One-month prediction
- Purposes of verification
- Verification of one-month prediction
Part II: Seasonal prediction
- Verification of seasonal prediction
- Standardized Verification System (SVS) for Long-Range Forecasts (LRF)
Part I: One-month prediction
- Purposes of verification
- Verification of one-month prediction: methods and results
Why Verify?
The purposes of verification are:
- to monitor forecast quality: how accurate are the forecasts, and are they improving?
- to guide forecasters and users: help forecasters understand model biases and skill; help users interpret forecasts
- to guide future developments: identify and correct model faults
1-month forecast
3-month forecast
Warm/Cold season forecast
Verification of operational 1-month forecasts
- Error map for every forecast
- Ensemble mean forecast: error maps, RMSE, and anomaly correlation
- Probabilistic forecast: reliability diagrams and ROC curves
- Time sequence of ACC and RMSE
- Summary for each year
Verification of 1-month ensemble mean forecast maps
- Z500 over the Northern Hemisphere
- Stream function (850 hPa, 200 hPa)
[Figure: observation, forecast, and error map panels]
Verification of probabilistic forecasts
- Reliability diagrams and Brier skill scores
- ROC curves and ROC area
Reliability diagram
The reliability diagram plots the observed frequency (Y-axis) against the forecast probability (X-axis). The diagonal line indicates perfect reliability (observed frequency equal to forecast probability for each category). Points below (above) the diagonal line indicate overforecasting (underforecasting).
[Figure: reliability diagram showing the perfect-reliability diagonal, the climatology line, the forecast frequency histogram, and Brier scores]
Steps for making a reliability diagram
1. For each forecast probability category, count the number of observed occurrences
2. Compute the observed relative frequency in each category k:
   observed relative frequency_k = observed occurrences_k / number of forecasts_k
3. Plot observed relative frequency vs forecast probability
4. Plot the sample climatology (the "no resolution" line):
   sample climatology = total observed occurrences / total number of forecasts
5. Plot the forecast frequency
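The steps above can be sketched in Python. This is an illustrative helper only (the function name and the 10-bin layout are my own assumptions, not JMA's operational code):

```python
import numpy as np

def reliability_points(probs, outcomes, n_bins=10):
    """Bin forecast probabilities and compute the observed relative
    frequency in each bin (steps 1-2 and 4 of the recipe above)."""
    probs = np.asarray(probs, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)  # 1 = event observed, 0 = not
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Assign each forecast probability to a bin index in 0..n_bins-1.
    idx = np.clip(np.digitize(probs, edges[1:-1]), 0, n_bins - 1)
    freq = np.full(n_bins, np.nan)       # NaN where a bin has no forecasts
    count = np.zeros(n_bins, dtype=int)  # forecast frequency per bin
    for k in range(n_bins):
        mask = idx == k
        count[k] = mask.sum()
        if count[k] > 0:
            freq[k] = outcomes[mask].mean()
    climatology = outcomes.mean()        # the "no resolution" line
    return edges, freq, count, climatology
```

Plotting `freq` against the bin centres, together with the diagonal and the `climatology` line, yields the reliability diagram.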
Brier (skill) score
The Brier Score measures the mean squared error of the probability forecasts:

$$\mathrm{BS} = \frac{1}{N}\sum_{i=1}^{N}(p_i - o_i)^2$$

where $p_i$ is the forecast probability, $o_i$ the observed occurrence (0 or 1), and $N$ the sample size. Range: 0 to 1. Perfect score: 0.

The Brier Skill Score measures skill relative to a reference forecast (usually climatology):

$$\mathrm{BSS} = \frac{\mathrm{BS} - \mathrm{BS}_{\text{reference}}}{\mathrm{BS}_{\text{perfect}} - \mathrm{BS}_{\text{reference}}} = 1 - \frac{\mathrm{BS}}{\mathrm{BS}_{\text{reference}}}$$

since $\mathrm{BS}_{\text{perfect}} = 0$. Range: $-\infty$ to 1. BSS = 0 indicates no skill compared to the reference forecast. Perfect score: 1.
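As a concrete illustration of the two formulas, a minimal Python sketch (function names are assumptions; the reference defaults to the sample climatology):

```python
import numpy as np

def brier_score(probs, outcomes):
    """BS = (1/N) * sum_i (p_i - o_i)^2: mean squared error of the
    probability forecasts against the 0/1 observed occurrences."""
    p = np.asarray(probs, dtype=float)
    o = np.asarray(outcomes, dtype=float)
    return float(np.mean((p - o) ** 2))

def brier_skill_score(probs, outcomes, ref_probs=None):
    """BSS = 1 - BS / BS_reference. The reference defaults to the
    sample climatology (a constant forecast of the base rate)."""
    o = np.asarray(outcomes, dtype=float)
    if ref_probs is None:
        ref_probs = np.full_like(o, o.mean())
    bs = brier_score(probs, outcomes)
    bs_ref = brier_score(ref_probs, outcomes)
    return 1.0 - bs / bs_ref
```

A perfect forecast gives BS = 0 and BSS = 1; forecasting the climatological probability itself gives BSS = 0.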
Decomposition of the Brier score
Murphy (1973) showed that the Brier score can be decomposed into three terms (for K probability classes and N samples). These terms show sources of error:

$$\mathrm{BS} = \underbrace{\frac{1}{N}\sum_{k=1}^{K} n_k\,(p_k - \bar{o}_k)^2}_{\text{reliability }(brel)} \;-\; \underbrace{\frac{1}{N}\sum_{k=1}^{K} n_k\,(\bar{o}_k - \bar{o})^2}_{\text{resolution }(bres)} \;+\; \underbrace{\bar{o}\,(1 - \bar{o})}_{\text{uncertainty }(bunc)}$$

- Reliability (brel): the mean squared difference between the forecast probability and the observed frequency. Perfect score: 0.
- Resolution (bres): the mean squared difference between the observed frequency and the climatological frequency; it indicates the degree to which the forecast can separate different situations. Climatological forecast score: 0.
- Uncertainty (bunc): measures the variability of the observations, where $\bar{o}$ is the climatological frequency of occurrence.
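The decomposition can be checked numerically. A small illustrative sketch (not an operational tool), which forms one class per distinct forecast probability value:

```python
import numpy as np

def brier_decomposition(probs, outcomes):
    """Murphy (1973) decomposition over the distinct forecast
    probability values: BS = reliability - resolution + uncertainty."""
    p = np.asarray(probs, dtype=float)
    o = np.asarray(outcomes, dtype=float)
    n = len(p)
    obar = o.mean()                  # climatological frequency
    rel = res = 0.0
    for pk in np.unique(p):
        mask = p == pk
        nk = mask.sum()              # n_k: forecasts in class k
        ok = o[mask].mean()          # observed frequency in class k
        rel += nk * (pk - ok) ** 2
        res += nk * (ok - obar) ** 2
    rel /= n
    res /= n
    unc = obar * (1.0 - obar)
    return rel, res, unc
```

With exact classes (each class containing all forecasts of one probability value), rel − res + unc reproduces the Brier score exactly.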
Brier skill score
The general form of a skill score (often displayed multiplied by 100) is

$$\text{Skill Score} = \frac{\text{Score}_{\text{forecast}} - \text{Score}_{\text{reference}}}{\text{Score}_{\text{perfect}} - \text{Score}_{\text{reference}}}$$

Brier skill score (the relative skill of the probabilistic forecast compared to climatology):

$$\mathrm{BSS} = \frac{\mathrm{BS} - \mathrm{BS}_{\text{clim}}}{0 - \mathrm{BS}_{\text{clim}}} = 1 - \frac{\mathrm{BS}}{\mathrm{BS}_{\text{clim}}}$$

Range: $-\infty$ to 1. Perfect score: 1. BSS = 0 indicates no skill compared to climatology; BSS > 0: better than climatology.

Reliability skill score (perfect score: 1), noting that $\mathrm{BS}_{\text{clim}} = bunc$:

$$\mathrm{Brel} = 1 - \frac{brel}{\mathrm{BS}_{\text{clim}}}$$

Resolution skill score (perfect score: 1):

$$\mathrm{Bres} = \frac{bres}{\mathrm{BS}_{\text{clim}}}$$

The larger these skill scores are, the better.
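Given the three decomposition terms, the skill scores follow directly; a hypothetical helper for illustration (the function name is my own):

```python
def brier_component_skill_scores(rel, res, unc):
    """Skill scores relative to climatology, for which BS_clim = unc:
    BSS = 1 - BS/unc, Brel = 1 - rel/unc, Bres = res/unc."""
    bs = rel - res + unc          # reassemble the Brier score
    bss = 1.0 - bs / unc
    brel_ss = 1.0 - rel / unc
    bres_ss = res / unc
    return bss, brel_ss, bres_ss
```

Note the identity BSS = Bres − rel/bunc, so good resolution can be offset by poor reliability.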
Interpretation of reliability diagram and BSS
Event: Z500 anomaly > 0, Northern Hemisphere, spring 2008 (2008/2/28–2008/5/29)
[Figure: reliability diagrams for the 1st-week forecast (day 2–8) and the 3rd- and 4th-week forecast (day 16–29); points below the diagonal indicate overforecasting and points above indicate underforecasting; BSS > 0: better than climatology, BSS < 0: inferior to climatology]
Relative Operating Characteristic (ROC)
The ROC is created by plotting the hit rate (Y-axis) against the false alarm rate (X-axis), using increasing probability thresholds to make the yes/no decision. The area under the ROC curve (the ROC area) is frequently used as a score.
Perfect: ROC area = 1
No skill: ROC area = 0.5
Steps for making a ROC diagram
1. For each forecast probability category, count the number of hits, misses, false alarms, and correct non-events
2. Compute the hit rate and false alarm rate in each category k:
   hit rate_k = hits_k / (hits_k + misses_k)
   false alarm rate_k = false alarms_k / (false alarms_k + correct non-events_k)
3. Plot hit rate vs false alarm rate
4. The ROC area is the integrated area under the ROC curve

Contingency table:

                 Observed yes   Observed no
  Forecast yes   hits           false alarms
  Forecast no    misses         correct non-events
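The four steps can be sketched in Python (an illustrative implementation; the function name and the default 11-threshold set are assumptions):

```python
import numpy as np

def roc_points(probs, outcomes, thresholds=None):
    """Hit rate and false alarm rate at a series of probability
    thresholds, plus the ROC area by trapezoidal integration.
    Assumes both events and non-events occur in the sample."""
    p = np.asarray(probs, dtype=float)
    o = np.asarray(outcomes, dtype=bool)
    if thresholds is None:
        thresholds = np.linspace(0.0, 1.0, 11)
    hit_rates, far = [], []
    for t in thresholds:
        yes = p >= t                       # the yes/no decision at threshold t
        hits = np.sum(yes & o)
        misses = np.sum(~yes & o)
        false_alarms = np.sum(yes & ~o)
        correct_non = np.sum(~yes & ~o)
        hit_rates.append(hits / (hits + misses))
        far.append(false_alarms / (false_alarms + correct_non))
    # Close the curve at (0, 0): a threshold above every forecast.
    h = np.array(hit_rates + [0.0])
    f = np.array(far + [0.0])
    # Trapezoid rule; f is non-increasing, so the sum is positive.
    area = float(np.sum((f[:-1] - f[1:]) * (h[:-1] + h[1:]) / 2.0))
    return h, f, area
```

A forecast that perfectly separates events from non-events gives area 1; a constant (climatological) forecast gives area 0.5.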
Interpretation of ROC curves
The ROC is not sensitive to bias in the forecast. A biased forecast may still have good resolution and produce a good ROC curve, which means that it may be possible to improve the forecast through calibration. Thus, the ROC can be considered a measure of potential usefulness. The reliability diagram, on the other hand, measures bias, so it is a good partner to the ROC.
Event: Z500 anomaly > 0, Northern Hemisphere, spring 2008
[Figure: ROC curves for the 1st-week forecast (high resolution, high potential skill) and the 3rd- and 4th-week forecast (low resolution, low potential skill); perfect performance lies at the upper-left corner]
Anomaly Correlation and RMSE
Time sequence of Anomaly Correlation and RMSE
in each season and in each year
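For reference, a minimal sketch of the two scores for a gridded field, assuming equal grid-point weights (operational verification would normally apply latitude weighting; the function name is my own):

```python
import numpy as np

def acc_and_rmse(forecast, analysis, climatology):
    """Anomaly correlation coefficient and RMSE of a forecast field
    against the verifying analysis, with anomalies taken relative to
    climatology. Grids are treated as flat arrays with equal weights."""
    fc = np.asarray(forecast, dtype=float)
    an = np.asarray(analysis, dtype=float)
    cl = np.asarray(climatology, dtype=float)
    fa = fc - cl                      # forecast anomaly
    oa = an - cl                      # observed (analysis) anomaly
    acc = np.sum(fa * oa) / np.sqrt(np.sum(fa ** 2) * np.sum(oa ** 2))
    rmse = np.sqrt(np.mean((fc - an) ** 2))
    return float(acc), float(rmse)
```

ACC = 1 for a forecast whose anomaly pattern matches the analysis exactly; RMSE = 0 only when the full fields agree.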
Anomaly Correlation of T850 in summer 2008
Seasonal mean scores
Anomaly correlation of Z500 over the Northern Hemisphere
Anomaly correlation of Z500 over the Northern Hemisphere (1996–2008): 28-day mean, running mean of 52 forecasts (1 year)
[Figure: time series with El Niño and La Niña periods marked]
Summary of Part I: One-month prediction
Verification of operational prediction (published on the TCC website):
- forecast error map (visual verification)
- reliability diagram, BSS, ROC
- ACC, RMSE
Verification of hindcast (internal use only):
- bias, ACC, RMSE, forecast maps, etc.
Improvement of forecast skill
References
Murphy, A.H., 1973: A new vector partition of the probability score. J. Appl. Meteor.,
12, 595-600.
http://www.eumetcal.org.uk/eumetcal/verification/www/english/courses/msgcrs/index.htm
http://www.bom.gov.au/bmrc/wefor/staff/eee/verif/verif_web_page.html
http://www.ecmwf.int/newsevents/meetings/workshops/2007/jwgv/index.html