[email protected] 15.4.2005 NOMEK - Verification Training - OSLO / 1 Pertti Nurmi ( Finnish...

68
15.4.2005 NOMEK - Verification Training - OSLO / 1 pertti.nurmi@fmi. fi Pertti Nurmi ( Finnish Meteorological Institute ) neral Guide to Forecast Verificati ( Methodology ) NOMEK - Oslo 15.-16.4.2005 0 10 20 30 1980 1985 1990 1995 2000 T max D+2 0.2 0.4 0.6 T mean D+6-10 T mean D+1-5 T2m ; M E & M AE; ECM W F & LAM A verage over 30 stations; Winter2003 -1 0 1 2 3 4 5 6 12 18 24 30 36 42 48 54 60 72 84 96 108 120 M AE_ECM W F M AE_LAM M E_ECM W F M E _LA M (C) (hrs )

Transcript of [email protected] 15.4.2005 NOMEK - Verification Training - OSLO / 1 Pertti Nurmi ( Finnish...

Page 1:

Pertti Nurmi (Finnish Meteorological Institute)

General Guide to Forecast Verification (Methodology)

NOMEK - Oslo, 15.-16.4.2005

[Title-slide figures: long-term verification time series of FMI temperature forecasts (Tmax D+2, Tmean D+1-5, Tmean D+6-10; 1980-2000), and a chart "T2m; ME & MAE; ECMWF & LAM; Average over 30 stations; Winter 2003" showing ME and MAE (°C) versus forecast lead time (hrs).]

Page 2:

A glimpse at verification history: USA Tornadoes, 1884 (the Finlay case)

( 2680 + 30 ) / 2800 = 96.8 %

Tornado           Tornado observed
forecast          Yes      No       fc Σ
Yes               30       70       100
No                20       2680     2700
obs Σ             50       2750     2800

Page 3:

Glimpse at history from the Wild West, cont'd...

( 2750 + 0 ) / 2800 = 98.2 %   ( vs. 96.8 % )

Tornado           Tornado observed
forecast          Yes      No       fc Σ
Yes               30       70       100
No                20       2680     2700
obs Σ             50       2750     2800

Tornado           Tornado observed
forecast          Yes      No       fc Σ
Yes               0        0        0
No                50       2750     2800
obs Σ             50       2750     2800

Never forecast a Tornado

Page 4:

Another interpretation:

Back to the original results ( PC = 96.8 % ):

Tornado           Tornado observed
forecast          Yes      No       fc Σ
Yes               30       70       100
No                20       2680     2700
obs Σ             50       2750     2800

30 / 50 = 60 %     POD, Probability Of Detection
70 / 100 = 70 %    FAR, False Alarm Ratio
100 / 50 = 2       B or FBI, (Frequency) Bias

"Never forecast a Tornado" ( PC = 98.2 % ):

Tornado           Tornado observed
forecast          Yes      No       fc Σ
Yes               0        0        0
No                50       2750     2800
obs Σ             50       2750     2800

POD = FAR = B = 0 % !
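The arithmetic on this and the previous slides can be reproduced with a minimal Python sketch (not part of the original lecture; the function name and variables are illustrative) that computes PC, POD, FAR and the frequency bias directly from the 2*2 counts:

def finlay_scores(a, b, c, d):
    """Basic scores from a 2*2 contingency table:
    a = hits, b = false alarms, c = misses, d = correct rejections."""
    n = a + b + c + d
    pc = (a + d) / n           # Proportion Correct
    pod = a / (a + c)          # Probability Of Detection
    far = b / (a + b)          # False Alarm Ratio
    bias = (a + b) / (a + c)   # (Frequency) Bias, B or FBI
    return pc, pod, far, bias

# Finlay's original tornado forecasts: PC = 0.968, POD = 0.60, FAR = 0.70, B = 2
print(finlay_scores(30, 70, 20, 2680))

# "Never forecast a Tornado": PC = 0.982, but POD = B = 0 (FAR is formally 0/0)
print((2750 + 0) / 2800)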

Page 5:

First reminder on verification:

An act (“art”?) of countless methods and measures

An essential daily real-time practice in the operational forecasting environment

An active feedback and dialogue process is a necessity

A fundamental means to improve weather forecasts and services

Page 6:

Outline:
1. Introduction - History
2. Goals and general guidelines
3. Continuous variables
4. Categorical events
   • Binary (dichotomous; yes/no) forecasts
   • Multi-category forecasts
5. Probability forecasts
6. Forecast value (NOT covered in these lectures)
References
   • Literature
   • Websites

…You heard it already

Acknowledgement: Laurie Wilson (CMC, Canada)

<= Break?

Page 7:

Outline:
1. Introduction - History
2. Goals and general guidelines

Page 8:

Goals of *THIS* Training:

Understand the basic properties and relationships among common verification measures

Learn to extract useful information from (graphical) verification results

Increase interest in forecast verification and the methods; apply them in everyday forecasting practice

Emphasis is on verification of weather elements rather than, e.g., NWP fields

Page 9:

Goals of (objective) Verification:

• "Administrative"
  Feedback process to operational forecasters => Example follows !
  Monitor the quality of forecasts and potential trends in quality
  Justify the cost of provision of weather services
  Justify acquisition of additional or new models, equipment, …

• "Scientific"
  Identify strengths and weaknesses of a forecast product leading to improvements,
  i.e. provide information to direct R&D

• Value (NOT covered explicitly in these lectures)
  Determine the (economic) value of the forecasts to users
  Quantitative information on the user's economic sensitivity to weather is needed

Page 10:

Personal scoring (example) ...

[Figure: personal scoring example for forecasters A, B and C.]

Page 11:

Principles of (objective) Verification:

• Verification activity has value only if the information generated leads to a decision about the forecast itself or the forecast system being verified
  The user of the information must be identified
  The purpose of the verification must be known in advance

• No single verification measure can provide complete information about forecast quality

• Forecasts should be formulated in a verifiable form

Page 12:

Operational Verification - “State-of-the-Art”

• Comprehensive comparison of forecast(er)s vs. observations

• Stratification and aggregation (pooling) of results

• Statistics of guidance forecasts (e.g. NWP, MOS)

• Instant feedback to forecasters

• Statistics of individual forecasters – e.g. Personal biases

• Comprehensive set of tailored verification measures

• Simplified measures for laymen

• Continuity into history


Page 13:

Allan Murphy's (RIP) "Goodness":

• Consistency: Forecasts agree with the forecaster's true belief about the future weather [ strictly proper ]; cf. hedging

• Quality: Correspondence between observations and forecasts [ verification ]

• Value: Increase or decrease in economic or other kind of value to someone as a result of using the forecast [ decision theory ]

Page 14:

Verification Procedure… Define predictand types:
• Continuous: Forecast is a specific value of the variable
• Categorical - Probabilistic: Forecast is the probability of occurrence of ranges of values of the variable (categories)

Temperature; fixed time (e.g. noon), Tmin, Tmax, time-averaged (e.g. 5-day)
Wind speed and direction; fixed time, time-averaged
Precipitation (vs. no precipitation) - POP; with various rainfall thresholds
Precipitation type
Cloud amount
Strong winds (vs. no strong wind); with various wind force thresholds
Night frost (vs. no frost)
Fog (vs. no fog)

Page 15:

Verification Procedure, cont'd…

Define the purpose of verification
– Scientific vs. administrative
– Define the questions to be answered

Distinguish the dataset of matched observation and forecast pairs

Dataset stratification (from "pooled" data)
– "External" stratification by time of day, season, forecast lead time etc.
– "Internal" stratification, e.g. to separate extreme events
  According to the forecast
  According to the observation
– Maintain a sufficient sample size

Page 16:

Outline:
1. Introduction - History
2. Goals and general guidelines
3. Continuous variables

Page 17:

3. Continuous Variables: First explore the data

• Scatterplots of forecasts vs. observations
– Visual relation between forecast and observed distributions
– Distinguish outliers in forecast and/or observation datasets
– Accurate forecasts have points on a 45 degree diagonal

• Additional scatterplots
– Observations vs. [ forecast - observation ] difference
– Forecasts vs. [ forecast - observation ] difference
– Behaviour of forecast errors with respect to observed or forecast distributions - potential clustering or curvature in their relationships

• Time-series plots of forecasts vs. observations (or forecast error)
– Potential outliers in either forecast or observation datasets
– Trends and time-dependent relationships

• Neither scatterplots nor time-series plots provide any concrete measures of accuracy

Page 18:

Continuous Variables - Example 1: Exploring the data

Scatterplot of one year of ECMWF three-day T2m forecasts (left) and forecast errors (right) versus observations at a single location. Red, yellow and green dots separate the errors into three categories.

Page 19:

Mean Error aka Bias

ME = ( 1/n ) Σ ( f_i – o_i )

– Average error in a given set of forecasts
– Simple and informative score on the behaviour of a given weather element
– With ME > 0 ( < 0 ), the system exhibits over- (under-) forecasting
– Not an accuracy measure; does not provide information on the magnitude of the errors
– Should be viewed in comparison to climatology

Range: -∞ to ∞   Perfect score = 0

Mean Absolute Error

MAE = ( 1/n ) Σ | f_i – o_i |

– Average magnitude of errors in a given set of forecasts
– Linear measure of accuracy
– Does not distinguish between positive and negative forecast errors
– Negatively oriented, i.e. smaller is better
– Illustrative => recommended to view ME and MAE simultaneously => Examples follow !

Range: 0 to ∞   Perfect score = 0

Page 20:

Mean Squared Error

MSE = ( 1/n ) Σ ( f_i – o_i )²

or its square root, RMSE, which has the same unit as the forecast parameter

– Negatively oriented, i.e. smaller is better
– A quadratic scoring rule; very sensitive to large forecast errors !!!
  Harmful in the presence of potential outliers in the dataset
  Care must be taken with limited datasets
  Fear of high penalties easily leads to conservative forecasting
– RMSE is always ≥ MAE
– Comparison of MAE and RMSE indicates the error variance
– MSE - RMSE decomposition is not dealt with here:
  Acknowledge Anders Persson (yesterday)

Range: 0 to ∞   Perfect score = 0
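As a quick illustration of the ME, MAE, MSE and RMSE definitions on the last two slides, here is a self-contained Python sketch (not from the lecture; the forecast/observation pairs and names are purely illustrative):

import math

def me(f, o):   return sum(fi - oi for fi, oi in zip(f, o)) / len(f)        # Mean Error (bias)
def mae(f, o):  return sum(abs(fi - oi) for fi, oi in zip(f, o)) / len(f)   # Mean Absolute Error
def mse(f, o):  return sum((fi - oi) ** 2 for fi, oi in zip(f, o)) / len(f) # Mean Squared Error
def rmse(f, o): return math.sqrt(mse(f, o))                                 # same unit as the parameter

# Hypothetical T2m forecasts and observations (degrees C)
fcst = [2.1, -0.5, 3.0, 1.2, 4.8]
obs  = [1.0, -1.0, 3.5, 0.0, 2.0]

print(me(fcst, obs), mae(fcst, obs), rmse(fcst, obs))   # note: RMSE >= MAE always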

Page 21:

Continuous Variables - Example 1, cont'd…

Scatterplot of one year of ECMWF three-day T2m forecasts (left) and forecast errors (right) versus observations at a single location. Red, yellow and green dots separate the errors into three categories. Some basic statistics such as ME, MAE and MSE are also shown. The plots reveal the dependence of model behaviour on the temperature range, i.e. over- (under-) forecasting in the cold (warm) tails of the distribution.

Page 22:

Continuous Variables – Example 2

[Figure, left: "T2m; ME & MAE; ECMWF & LAM; Average over 30 stations; Winter 2003" - ME and MAE curves (°C) for ECMWF and LAM versus forecast lead time (hrs). Figure, right: the corresponding "T2m; ME & MAE; ECMWF & PPP" chart.]

Temperature bias and MAE comparison between ECMWF and a Limited Area Model (LAM) (left), and an experimental post-processing scheme (PPP) (right), aggregated over 30 stations and one winter season. In spite of the ECMWF warm bias and diurnal cycle, it has a slightly lower MAE level than the LAM (left). The applied experimental "perfect prog" scheme does not manage to dispose of the model bias and exhibits larger absolute errors than the originating model. This example clearly demonstrates the importance of thorough verification prior to implementing a potential post-processing scheme into operational use.

Page 23:

Continuous Variables: Aggregation (pooling) vs. Stratification

MOS vs. EP MAE

Aggregate of:
6 months; Jan – June
3 lead times; +12, +24, +48 hr
4 stations in Finland

Page 24:

Continuous Variables: Aggregation (pooling) vs. Stratification

[Figure panels: MAE stratified by lead time, by month, and by station.]

Page 25:

Continuous Variables: Aggregation (pooling) vs. Stratification

MOS vs. EP Bias

Aggregate of:
6 months; Jan – June
3 lead times; +12, +24, +48 hr
4 stations in Finland

Page 26:

Continuous Variables: Aggregation (pooling) vs. Stratification

[Figure panels: Bias stratified by lead time, by month, and by station.]

Page 27:

General Skill Score

SS = ( A – A_ref ) / ( A_perf – A_ref )

where A = the applied measure of accuracy,
subscript "ref" refers to some reference forecast, "perf" to a perfect forecast

For negatively oriented accuracy measures like MAE or MSE:

SS = [ 1 - A / A_ref ] * 100

i.e. the relative accuracy, expressed as the % improvement over a reference system

– The reference is typically climatology or persistence => Apply both; Examples follow !
– If negative, the reference (climate or persistence) is better

MAE_SS = [ 1 - MAE / MAE_ref ] * 100
MSE_SS = [ 1 - MSE / MSE_ref ] * 100

– The latter is also known as Reduction of Variance, RV
– SS can be unstable for small sample sizes, especially with MSE_SS

Range: -∞ to 100   Perfect score = 100
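A minimal sketch of the skill-score formulation above, assuming MAE as the accuracy measure and climatology and persistence series as the reference forecasts (the helper names and toy data are illustrative, not from the lecture):

def mae(f, o):
    return sum(abs(fi - oi) for fi, oi in zip(f, o)) / len(f)

def mae_skill_score(fcst, ref, obs):
    """MAE_SS = [ 1 - MAE / MAE_ref ] * 100; negative => the reference is better."""
    return (1.0 - mae(fcst, obs) / mae(ref, obs)) * 100.0

obs   = [1.0, -1.0, 3.5, 0.0, 2.0]
fcst  = [2.1, -0.5, 3.0, 1.2, 4.8]
clim  = [1.1] * len(obs)          # e.g. the climatological mean as the reference
persi = [0.5] + obs[:-1]          # e.g. persistence: the previous observation

print(mae_skill_score(fcst, clim, obs))    # skill vs. climatology (%)
print(mae_skill_score(fcst, persi, obs))   # skill vs. persistence (%)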

Page 28:

Continuous Variables – Example 3

[Figure, left: "T2m; MAE; Average over 3 stations & forecast ranges +12-120 hrs" - MAE (°C) of the End Product and "Better of ECMWF / LAM" per season, Winter 2001 - Winter 2003, plus the time average. Figure, right: "T2m; Skill of End Product over 'Better of ECMWF / LAM'" - skill (%) for the same seasons.]

Mean Absolute Errors of End Product and DMO temperature forecasts (left), and skill of the End Product over model output (right). The better of either ECMWF or the local LAM is chosen up to the +48 hour forecast range (hindcast), thereafter ECMWF is used. The figure is an example of both aggregation (3 stations, several forecast ranges, two models, time average) and stratification (seasons).

Page 29:

Linear Error in Probability Space

LEPS = ( 1/n ) Σ | CDF_o ( f_i ) – CDF_o ( o_i ) |

where CDF_o is the cumulative probability distribution function of the observations, determined from a relevant climatology

– Corresponds to the MAE transformed from measurement space into probability space
– Does not depend on the scale of the variable
– Takes into account the variability of the weather element
– Can be used to evaluate forecasts at different locations
– Computation requires the definition of cumulative climatological distributions at each location
– Encourages forecasting in the extreme tails of the climate distributions:
  errors there are penalized less than similar-sized errors in a more probable region of the distribution,
  i.e. the opposite of MSE => Examples will follow !

Range: 0 to 1   Perfect score = 0

Skill Score

LEPS_SS = [ 1 - LEPS / LEPS_ref ] * 100

Range: -∞ to 100   Perfect score = 100
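A sketch of how LEPS could be computed, assuming the climatological CDF is estimated as an empirical step function from a climatological sample (this particular CDF estimate, and all names and numbers, are illustrative assumptions, not prescribed by the lecture):

from bisect import bisect_right

def empirical_cdf(climatology):
    """Return a CDF function estimated from a climatological sample."""
    sample = sorted(climatology)
    n = len(sample)
    return lambda x: bisect_right(sample, x) / n

def leps(fcst, obs, cdf):
    """LEPS = ( 1/n ) Σ | CDF_o(f_i) - CDF_o(o_i) |."""
    return sum(abs(cdf(f) - cdf(o)) for f, o in zip(fcst, obs)) / len(fcst)

# Hypothetical climatology of a temperature-like variable
clim = [5, 7, 8, 9, 10, 11, 12, 13, 13, 14, 15, 16, 17, 19, 23]
cdf = empirical_cdf(clim)

# The same 2-unit error is penalized less near the tail than near the median
print(leps([15], [13], cdf), leps([23], [21], cdf))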

Page 30:

LEPS for a hypothetical distribution and location: The climatological frequency distribution (left) is transformed into a cumulative probability density distribution (right). A 2 "unit" forecast error around the median, 13 vs. 15 "units" (red arrows), would yield a LEPS value of c. 0.2 in probability space ( | 0.5 – 0.3 |, red arrows).

An equal error in measurement space close to the tail of the distribution, 21 vs. 23 "units" (blue arrows), would result in a LEPS value of c. 0.05 ( | 0.95 – 0.9 |, blue arrows) => Forecast errors of rare events are much less penalized using LEPS !

[Figure, left: "Hypothetical Climatological Distribution" - frequency histogram over values 1-29. Figure, right: "Hypothetical Cumulative Density Function" - CDF from 0.0 to 1.0 over the same values, with the 0.2 and 0.05 LEPS errors marked.]

Page 31:

Continuous Variables

Skill comparison (example A) ...

Page 32:

Continuous Variables

Skill comparison (example B) ...

Page 33:

Continuous Variables

Skill comparison (example C) ...

Page 34:

Continuous variables - Summary:
 Verify a comprehensive set of local weather elements
 Produce scatterplots & time-series plots, including forecasts and/or observations against their difference
 "Stratify & Aggregate" + compute ME, MAE, MAE_SS
 Additionally, compute LEPS, LEPS_SS, MSE, MSE_SS

Examples 1 - 4 in the General Guide to Verification (NOMEK Training)

Examples:
Temperature: fixed time (e.g. noon, midnight), Tmin, Tmax, time-averaged (e.g. 5-day)
Wind speed and direction: fixed time, time-averaged
Accumulated precipitation: time-integrated (e.g. 6, 12, 24 hours)
Cloudiness: fixed time, time-averaged; however, typically categorized

Page 35:

Outline:
1. Introduction - History
2. Goals and general guidelines
3. Continuous variables
4. Categorical events
   • Binary (dichotomous; yes/no) forecasts
   • Multi-category forecasts

Page 36:

4. Categorical Events

Event             Event observed
forecast          Yes            No                Marginal total
Yes               Hit            False alarm       Fc Yes
No                Miss           Corr. rejection   Fc No
Marginal total    Obs Yes        Obs No            Sum total

Event             Event observed
forecast          Yes            No                Marginal total
Yes               a              b                 a + b
No                c              d                 c + d
Marginal total    a + c          b + d             a + b + c + d = n

Tornado           Tornado observed
forecast          Yes      No       fc Σ
Yes               30       70       100
No                20       2680     2700
obs Σ             50       2750     2800

Page 37:

Event             Event observed
forecast          Yes            No                Marginal total
Yes               a              b                 a + b
No                c              d                 c + d
Marginal total    a + c          b + d             a + b + c + d = n

Bias aka Frequency Bias Index

B = FBI = ( a + b ) / ( a + c )   [ ~ Fc Yes / Obs Yes ]

– With B > 1, the system exhibits over-forecasting.
– With B < 1, the system exhibits under-forecasting.

Range: 0 to ∞   Perfect score = 1

Proportion Correct

PC = ( a + d ) / n   [ ~ ( Hits + Correct rejections ) / Sum total ]

– The simplest and most intuitive performance measure.
– Usually very misleading, because it rewards correct "Yes" and "No" forecasts equally.
– Can be maximized by forecasting the most common category all the time.
– Strongly influenced by the more common category.
– Never for extreme event verification !!!

Range: 0 to 1   Perfect score = 1

Tornado           Tornado observed
forecast          Yes      No       fc Σ
Yes               30       70       100
No                20       2680     2700
obs Σ             50       2750     2800

B = 2.00   PC = 0.97

Page 38:

Event             Event observed
forecast          Yes            No                Marginal total
Yes               a              b                 a + b
No                c              d                 c + d
Marginal total    a + c          b + d             a + b + c + d = n

Probability Of Detection, Hit Rate ( H ), Prefigurance

POD = a / ( a + c )   [ ~ Hits / Obs Yes ]

– Sensitive to misses only, not to false alarms.
– Can be artificially improved by over-forecasting (rare events).
– Complementary score: Miss Rate, MR = 1 – H = c / ( a + c )
– Must be examined together with …

Range: 0 to 1   Perfect score = 1

False Alarm Ratio

FAR = b / ( a + b )   [ ~ False alarms / Fc Yes ]

– Sensitive to false alarms only, not to misses.
– Can be artificially improved by under-forecasting (rare events).
– An increase in POD can be achieved by increasing FAR, and vice versa.

Range: 0 to 1   Perfect score = 0

Tornado           Tornado observed
forecast          Yes      No       fc Σ
Yes               30       70       100
No                20       2680     2700
obs Σ             50       2750     2800

B = 2.00   PC = 0.97
POD = 0.60   FAR = 0.70

Page 39:

Event             Event observed
forecast          Yes            No                Marginal total
Yes               a              b                 a + b
No                c              d                 c + d
Marginal total    a + c          b + d             a + b + c + d = n

Post Agreement

PAG = a / ( a + b )   [ ~ Hits / Fc Yes ]

– Complement of FAR (i.e. PAG = 1 – FAR).
– Sensitive to false alarms, not to misses.

Range: 0 to 1   Perfect score = 1

False Alarm Rate, Probability Of False Detection ( POFD )

F = b / ( b + d )   [ ~ False alarms / Obs No ]

– False alarms, given that the event did not occur (Obs No).
– Sensitive to false alarms only, not to misses.
– Can be artificially improved by under-forecasting (rare events) – cf. the Tornado case.
– Generally used together with POD (or H) to produce the ROC score for probability forecasts;
– Otherwise rarely used.

Range: 0 to 1   Perfect score = 0

Tornado           Tornado observed
forecast          Yes      No       fc Σ
Yes               30       70       100
No                20       2680     2700
obs Σ             50       2750     2800

B = 2.00   PC = 0.97   POD = 0.60   FAR = 0.70
PAG = 0.30   F = 0.03

Page 40:

Event             Event observed
forecast          Yes            No                Marginal total
Yes               a              b                 a + b
No                c              d                 c + d
Marginal total    a + c          b + d             a + b + c + d = n

Hanssen & Kuiper's Skill Score, True Skill Statistics

KSS = TSS = POD – F = ( ad – bc ) / [ ( a + c )( b + d ) ]

– A popular combination skill score of POD and F.
– Measures the ability to separate "yes" cases (POD) from "no" cases (F).
– For rare events the d cell is large => F is small => KSS is close to POD.

Range: -1 to 1   Perfect score = 1   No skill level = 0

Threat Score, Critical Success Index

TS = CSI = a / ( a + b + c )

– A simple, popular measure for rare events; sensitive to hits, false alarms and misses.
– Measures the forecast after removing correct (simple) "no" forecasts from consideration.
– Sensitive to the climatological frequency of the event.
– More balanced than POD or FAR.

Range: 0 to 1   Perfect score = 1   No skill level = 0

Tornado           Tornado observed
forecast          Yes      No       fc Σ
Yes               30       70       100
No                20       2680     2700
obs Σ             50       2750     2800

B = 2.00   PC = 0.97   POD = 0.60   FAR = 0.70   PAG = 0.30   F = 0.03
KSS = 0.57   TS = 0.25

Page 41:

Event             Event observed
forecast          Yes            No                Marginal total
Yes               a              b                 a + b
No                c              d                 c + d
Marginal total    a + c          b + d             a + b + c + d = n

Equitable Threat Score

ETS = ( a – a_r ) / ( a + b + c – a_r ),   where a_r = ( a + b )( a + c ) / n

… is the number of hits expected from random forecasts.

– The simple TS may include hits due to random chance.

Range: -1/3 to 1   Perfect score = 1   No skill level = 0

Heidke Skill Score

HSS = 2 ( ad – bc ) / [ ( a + c )( c + d ) + ( a + b )( b + d ) ]

– One of the most popular skill measures for categorical forecasts.
– A score against random chance.

Range: -∞ to 1   Perfect score = 1   No skill level = 0

Tornado           Tornado observed
forecast          Yes      No       fc Σ
Yes               30       70       100
No                20       2680     2700
obs Σ             50       2750     2800

B = 2.00   PC = 0.97   POD = 0.60   FAR = 0.70   PAG = 0.30   F = 0.03   KSS = 0.57   TS = 0.25
ETS = 0.24   HSS = 0.39

Page 42:

Event             Event observed
forecast          Yes            No                Marginal total
Yes               a              b                 a + b
No                c              d                 c + d
Marginal total    a + c          b + d             a + b + c + d = n

Odds Ratio

OR = ad / bc

Measures the forecast system's probability (odds) of scoring a hit (H) as compared to making a false alarm (F):

OR = [ H / ( 1 – H ) ] / [ F / ( 1 – F ) ]

– Independent of potential biases between observations and forecasts.

Range: 0 to ∞   Perfect score = ∞   No skill level = 1

Transformation into a skill score, ranging from -1 to +1:

ORSS = ( ad – bc ) / ( ad + bc ) = ( OR – 1 ) / ( OR + 1 )

– Typically produces very high absolute skill values, due to its definition.
– Practically never used in meteorological forecast verification.

Tornado           Tornado observed
forecast          Yes      No       fc Σ
Yes               30       70       100
No                20       2680     2700
obs Σ             50       2750     2800

B = 2.00   PC = 0.97   POD = 0.60   FAR = 0.70   PAG = 0.30   F = 0.03   KSS = 0.57   TS = 0.25   ETS = 0.24   HSS = 0.39
OR = 57.43   ORSS = 0.97
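To tie the measures of the preceding contingency-table slides together, here is an illustrative Python sketch (not from the lecture; the function name is an assumption) that evaluates the full set of 2*2 scores. Called with the tornado counts a=30, b=70, c=20, d=2680, it reproduces the values recapped above:

def binary_scores(a, b, c, d):
    """All 2*2 contingency-table scores from the preceding slides.
    a = hits, b = false alarms, c = misses, d = correct rejections."""
    n = a + b + c + d
    a_r = (a + b) * (a + c) / n                  # hits expected from random forecasts
    return {
        "B":    (a + b) / (a + c),               # frequency bias
        "PC":   (a + d) / n,                     # proportion correct
        "POD":  a / (a + c),                     # hit rate H
        "FAR":  b / (a + b),                     # false alarm ratio
        "PAG":  a / (a + b),                     # post agreement
        "F":    b / (b + d),                     # false alarm rate (POFD)
        "TS":   a / (a + b + c),                 # threat score / CSI
        "ETS":  (a - a_r) / (a + b + c - a_r),   # equitable threat score
        "KSS":  a / (a + c) - b / (b + d),       # Hanssen-Kuipers, POD - F
        "HSS":  2 * (a * d - b * c) / ((a + c) * (c + d) + (a + b) * (b + d)),
        "OR":   (a * d) / (b * c),               # odds ratio
        "ORSS": (a * d - b * c) / (a * d + b * c),
    }

# Tornado example: B=2.00, PC=0.97, POD=0.60, FAR=0.70, ..., OR=57.43, ORSS=0.97
for name, value in binary_scores(30, 70, 20, 2680).items():
    print(f"{name:5s} {value:6.2f}")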

Page 43:

Categorical Events – Example 5

Precipitation in Finland

Rain              Rain observed
forecast          Yes      No       fc Σ
Yes               52       45       97
No                22       227      249
obs Σ             74       272      346

B = 1.31    PC = 0.81    POD = 0.70    FAR = 0.46    PAG = 0.54    F = 0.17
TS = 0.44   ETS = 0.32   KSS = 0.53    HSS = 0.48    OR = 11.92    ORSS = 0.85

Contingency table of one year (with 19 missing cases) of categorical rain vs. no-rain forecasts (left), and the resulting statistics (right). Rainfall is a relatively rare event at this particular location, occurring in only c. 20 % (74/346) of the cases. Because of this, PC is quite high at 0.81. The relatively high rain detection rate (0.70) is "balanced" by a high number of false alarms (0.46), with almost every other rain forecast having been superfluous. This is also seen as biased over-forecasting of the event (B = 1.31). Due to the scarcity of the event the false alarm rate is quite low (0.17); used alone, this measure would give a very misleading picture of forecast quality. The Odds Ratio shows that it was 12 times more probable to make a correct (rain or no rain) forecast than an incorrect one. The resulting skill score (0.85) is much higher than the other skill scores, which is to be noted: this is a typical feature of the ORSS due to its definition.
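Continuing the illustrative 2*2 sketch from the previous page, the headline numbers of this rain example follow directly from the same counts (plain arithmetic, shown here only as a check):

# Rain vs. no rain in Finland: a = 52, b = 45, c = 22, d = 227
a, b, c, d = 52, 45, 22, 227
print((a + b) / (a + c))   # B   = 1.31
print(a / (a + c))         # POD = 0.70
print((a * d) / (b * c))   # OR  = 11.92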

Page 44:

Multi-category Events

• Extension of the 2*2 table to several (k) mutually exhaustive categories
– Rain type: rain / snow / freezing rain (k=3)
– Wind warnings: strong gale / gale / no gale (k=3)

• Only PC (Proportion Correct) can be directly generalized

• Other verification measures need to be converted into a series of 2*2 tables
– The "forecast event" distinct from the "non-forecast event"

Generalization of KSS and HSS – measures of improvement over random forecasts:

KSS = { Σ p( f_i , o_i ) – Σ p( f_i ) p( o_i ) } / { 1 – Σ [ p( o_i ) ]² }

HSS = { Σ p( f_i , o_i ) – Σ p( f_i ) p( o_i ) } / { 1 – Σ p( f_i ) p( o_i ) }

                  Observed
Forecast          o1       o2       o3       fc Σ
f1                r        s        t        Σ f1
f2                u        v        w        Σ f2
f3                x        y        z        Σ f3
obs Σ             Σ o1     Σ o2     Σ o3     Σ

2*2 table for category 1:   a = r   b = s + t   c = u + x   d = v + w + y + z
2*2 table for category 2:   a = v   b = u + w   c = s + y   d = r + t + x + z
2*2 table for category 3:   a = z   b = x + y   c = t + w   d = r + s + u + v

Page 45:

Multi-category Events – Example 6

Cloudiness in Finland

Clouds            Clouds observed
forecast          0 - 2    3 - 5    6 - 8    fc Σ
0 - 2             65       10       21       96
3 - 5             29       17       48       94
6 - 8             18       10       128      156
obs Σ             112      37       197      346

                  No clouds (0-2)   Partly cloudy (3-5)   Cloudy (6-8)
B                 0.86              2.54                  0.79
POD               0.58              0.46                  0.65
FAR               0.32              0.82                  0.18
F                 0.13              0.25                  0.19
TS                0.45              0.15                  0.57

Overall: PC = 0.61   KSS = 0.41   HSS = 0.37

Multi-category contingency table of one year (with 19 missing cases) of cloudiness forecasts (left), and the resulting statistics (right). Results are shown separately for forecasts of each cloud category, together with the overall PC, KSS and HSS scores. The most marked feature is the very strong over-forecasting of the "partly cloudy" category, leading to numerous false alarms (B = 2.5, FAR = 0.8) and, despite this, poor detection (POD = 0.46). The forecasts cannot reflect the observed U-shaped distribution of cloudiness at all. Regardless of this inferiority, both overall skill scores are relatively high (c. 0.4), following from the fact that most of the cases (90 %) fall in either the "no cloud" or the "cloudy" category - neither of these scores takes into account the relative sample probabilities, but weights all correct forecasts similarly.
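A minimal Python sketch (illustrative only, not from the lecture) of the multi-category PC, KSS and HSS formulas from the previous slide; note that the KSS denominator uses the observed marginal probabilities. Applied to the cloudiness table above, it gives approximately PC = 0.61, KSS = 0.41 and HSS = 0.37:

def multicat_scores(table):
    """table[i][j] = count of forecast category i and observed category j."""
    n = sum(sum(row) for row in table)
    k = len(table)
    p_correct = sum(table[i][i] for i in range(k)) / n                # Σ p(f_i, o_i)
    p_f = [sum(table[i]) / n for i in range(k)]                       # forecast marginals
    p_o = [sum(table[i][j] for i in range(k)) / n for j in range(k)]  # observed marginals
    p_random = sum(p_f[i] * p_o[i] for i in range(k))                 # Σ p(f_i) p(o_i)
    pc  = p_correct
    hss = (p_correct - p_random) / (1 - p_random)
    kss = (p_correct - p_random) / (1 - sum(p * p for p in p_o))
    return pc, kss, hss

# Cloudiness in Finland: rows = forecast (0-2, 3-5, 6-8), columns = observed
clouds = [[65, 10, 21],
          [29, 17, 48],
          [18, 10, 128]]
print(multicat_scores(clouds))   # ~ (0.61, 0.41, 0.37)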

Page 46:

Multi-category Events – Example 6, cont'd…

The previous data transformed into hit/miss bar charts, either given the observations (left) or given the forecasts (right). The green, yellow and red bars denote correct forecasts and one- and two-category errors, respectively. The U-shape in the observations is clearly visible (left), whereas there is no hint of it in the forecast distribution (right).

[Figure: bar charts with category totals - observed: 112 / 37 / 197; forecast: 96 / 94 / 156.]

Page 47:

Multi-category Events

Example from Finland, again !

[Figure: multi-category verification example from Finland.]

Page 48:

Categorical (binary, multi-category) events - Summary:
 Verify a comprehensive set of categorical local weather events
 • Compile relevant contingency tables
 • Include multi-category events
 • Focus on adverse and/or extreme local weather
 "Stratify & Aggregate" + compute B, (PC), POD & FAR, (F), (PAG), KSS, TS, ETS, HSS
 Additionally, compute OR, ORSS, ROC

Examples 5 - 6 in the General Guide to Verification (NOMEK Training)

Examples:
Rain (vs. no rain); with various rainfall thresholds
Snowfall; with various thresholds
Strong winds (vs. no strong wind); with various wind force thresholds
Night frost (vs. no frost)
Fog (vs. no fog)

Page 49:

Outline:
1. Introduction - History
2. Goals and general guidelines
3. Continuous variables
4. Categorical events
   • Binary (dichotomous; yes/no) forecasts
   • Multi-category forecasts
5. Probability forecasts

Page 50:

Why Probability Forecasts ?

"… the widespread practice of ignoring uncertainty when formulating and communicating forecasts represents an extreme form of inconsistency and generally results in the largest possible reductions in quality and value."

- Allan Murphy (1993)

( A sophisticated, indirect phrase to emphasize the importance of addressing uncertainty… )

Page 51:

Why Probability Forecasts ?

"… Go look at the weather, I believe it's gonna rain"

- Legendary Chicago Blues Artist Muddy Waters (early 1960s), singing "Clouds in My Heart"

( A simple, direct phrase to emphasize uncertainty in everyday life… )

Page 52:

Probability Forecasts

• All forecasting involves some level of uncertainty

• Deterministic forecasts cannot address the inherent uncertainty of the weather parameter or event

• Probabilities of the expected event (with values between 0 % and 100 %, or 0 and 1) take into account the underlying joint distribution { p ( f, x ) } between forecasts and observations

• Conversion of probability forecasts to categorical events is simple (but not necessarily advisable) by defining the "on/off" threshold; the reverse is not straightforward

• Verification is somewhat laborious => Large datasets are required to obtain any significant information

Page 53:

Reliability Diagram: Preparation

• Stratify probability forecasts and observations into deciles
• For each decile, record the observed frequency of the event
• Keep track of the number of cases in each decile "bin"
• Plot on a diagram
• Plot an additional histogram of the number of forecast cases in each bin => Sharpness Diagram

FC Probability Bin   FCs in Bin   Events in Bin   Non-Events in Bin   Obs. Relative Frequency (%)
0                    65           2               63                  3
10                   23           3               20                  13
20                   26           6               20                  23
30                   25           10              15                  40
40                   25           10              15                  40
50                   20           10              10                  50
60                   35           20              15                  57
70                   30           20              10                  67
80                   35           25              10                  71
90                   12           10              2                   83
100                  4            4               0                   100
Total                300          120             180

[Figure: reliability diagram - observed relative frequency (%) versus forecast probability (%).]
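A small sketch of the binning step described above (the decile binning convention and all names are assumptions made for illustration, not part of the lecture): probability forecasts are grouped into bins and the observed relative frequency per bin is recorded:

def reliability_table(probs, outcomes, bin_width=10):
    """probs in %, outcomes 0/1; returns (bin, n cases, observed rel. freq %)."""
    bins = {}
    for p, o in zip(probs, outcomes):
        b = min(int(p // bin_width) * bin_width, 100)   # 0, 10, ..., 100
        cases, events = bins.get(b, (0, 0))
        bins[b] = (cases + 1, events + o)
    return [(b, cases, 100.0 * events / cases) for b, (cases, events) in sorted(bins.items())]

# Hypothetical PoP forecasts (%) and observed rain occurrence (1 = rain)
pop = [5, 10, 20, 30, 30, 50, 60, 70, 80, 90, 95, 100]
obs = [0,  0,  0,  1,  0,  1,  0,  1,  1,  1,  1,   1]

for b, cases, freq in reliability_table(pop, obs):
    print(f"bin {b:3d}%  n={cases}  obs freq={freq:5.1f}%")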

Page 54:

Reliability Diagram: Interpretation

Reliability
– Above the 45 degree line => Underforecasting
– Below the 45 degree line => Overforecasting
– Analogous to bias
– One of the components of the Brier score (see later)

Sharpness (Resolution)
– U-shaped histogram best
– Gaussian distribution worst
– Measure of spread (variance) in the distribution of forecasts

[Figure: reliability diagram - observed relative frequency (%) versus forecast probability (%).]

Page 55:

[Figure panels of example reliability diagrams (from Wilks, 1995): Climatology; Minimal RESOLUTION; Underforecasting; Good RESOLUTION at the expense of RELIABILITY; Reliable forecasts of a rare event; Small sample size.]

Page 56:

Probability Forecasts: Measures

Brier Score

BS = ( 1/n ) Σ ( p_i – o_i )²

– The most common accuracy measure of probability forecasts; Note: o_i is binary (0 or 1) !!!
– Analogous to MSE in probability space; negatively oriented, i.e. smaller is better
– A quadratic scoring rule; very sensitive to large forecast errors !!! Careful with limited datasets
– For two categories only; for multiple categories see RPS …
– Strongly influenced by the climatological frequency of the event in the verification sample => different samples are not to be compared
– BS can be algebraically decomposed => Reliability, Resolution, Uncertainty

Range: 0 to 1   Perfect score = 0

Brier Skill Score

BSS = [ 1 – BS / BS_ref ] * 100

BS_ref = ( 1/n ) Σ ( ref_i – o_i )²

where ref_i is either the climatological relative frequency of the event, or persistence

Range: -∞ to 100   Perfect score = 100
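A hedged sketch of BS and BSS, using the sample climatological frequency as the reference forecast (toy data and names are illustrative, not from the lecture):

def brier_score(probs, outcomes):
    """BS = (1/n) * sum (p_i - o_i)^2, with p in [0,1] and o binary (0/1)."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

def brier_skill_score(probs, outcomes, ref_probs):
    """BSS = [1 - BS / BS_ref] * 100 against e.g. the climatological frequency."""
    return (1.0 - brier_score(probs, outcomes) / brier_score(ref_probs, outcomes)) * 100.0

probs = [0.9, 0.7, 0.2, 0.1, 0.6, 0.3]
obs   = [1,   1,   0,   0,   1,   0]
clim  = [sum(obs) / len(obs)] * len(obs)   # climatological relative frequency as reference

print(brier_score(probs, obs), brier_skill_score(probs, obs, clim))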

Page 57:

Probability Forecasts: Measures

Ranked Probability Score

RPS = ( 1/(k-1) ) Σ { ( Σ p_i ) – ( Σ o_i ) }²

where k is the number of probability categories and the inner sums are cumulative over the categories

Range: 0 to 1   Perfect score = 0

Ranked Probability Skill Score

RPSS = [ 1 – RPS / RPS_ref ] * 100

Range: -∞ to 100   Perfect score = 100

– Vector generalization of BS and BSS to multi-event or multi-category situations
– Measures the sums of squared differences in cumulative probability space
– A quadratic score; penalizes most severely when the forecast probabilities are far from the actual observed distributions
– As with BSS, RPSS is very sensitive to the size of the dataset
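A sketch of the RPS for a single multi-category forecast, written with explicit cumulative sums as described above (the category layout and names are illustrative assumptions):

def rps(prob_fcst, obs_category):
    """RPS for one forecast: prob_fcst = probabilities per category (summing to 1),
    obs_category = index of the observed category. Works in cumulative probability space."""
    k = len(prob_fcst)
    rps_value, cum_p, cum_o = 0.0, 0.0, 0.0
    for i in range(k):
        cum_p += prob_fcst[i]
        cum_o += 1.0 if i == obs_category else 0.0
        rps_value += (cum_p - cum_o) ** 2
    return rps_value / (k - 1)

# Three precipitation categories (dry / light / heavy); observed: light (index 1)
print(rps([0.2, 0.5, 0.3], 1))     # sharp, nearly correct forecast => small RPS
print(rps([0.6, 0.3, 0.1], 1))     # probability mass far from the observation => larger RPS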

Page 58:

Probability Forecasts - Signal Detection Theory:
ROC - Relative Operating Characteristic
( Receiver Operating Characteristic in the medical sciences )

• Determines the ability of a forecasting system to separate situations when a signal is present (e.g. occurrence of rain) from those when it is absent (noise)

• In other words: assesses the ability of a forecasting system to discriminate between occurrence (Yes) and non-occurrence (No) of an event

• Tests e.g. model performance relative to a specific threshold

• Applicable to two-category probability forecasts and also to categorical deterministic forecasts, i.e. allows their comparison

• Has gained increasing popularity in meteorological forecast verification in recent years

Page 59:

Signal Detection Theory: ROC Curve

• Graphical representation, in a square box, of the Hit Rate (H) (y-axis) against the False Alarm Rate (F) (x-axis) for different potential decision thresholds

• The curve is plotted from a "binned" set of probability forecasts by stepping (or sliding) a decision threshold (e.g. in 10 % probability intervals) through the forecasts, each probability decision threshold generating a separate 2*2 contingency table
– The probability forecast is transformed into a set of categorical "yes/no" forecasts
– A set of value pairs of H and F is obtained, forming the curve

• It is desirable that H be high and F be low, i.e. the closer a point is to the upper left-hand corner, the better the forecast

• A perfect forecast system, with only correct forecasts & no false alarms (regardless of the threshold chosen), has a "curve" that rises from (0,0) (H=F=0) along the y-axis to (0,1) (upper left-hand corner; H=1, F=0) and then straight to (1,1) (H=F=1)

Event             Event observed
forecast          Yes            No                Marginal total
Yes               a              b                 a + b
No                c              d                 c + d
Marginal total    a + c          b + d             a + b + c + d = n

H = a / ( a + c )
F = b / ( b + d )

[Figure: ROC curve - H (%) versus F (%), with points labelled by probability thresholds 10 % … 90 %.]

Page 60:

ROC Curve Generation

Event             Event observed
forecast          Yes            No                Marginal total
Yes               a              b                 a + b
No                c              d                 c + d
Marginal total    a + c          b + d             a + b + c + d = n

H = a / ( a + c )
F = b / ( b + d )

[Figure: ROC curve - H (%) versus F (%), with points labelled by probability thresholds 10 % … 90 %.]

To learn more about ROC and Signal Detection Theory, check: http://wise.cgu.edu/

Example ( a + c = 1920,  b + d = 5351 ):

Probability   # of           Cumulative          # of Non-      Cumulative Non-       H      F
Threshold     Occurrences    Occurrences (Σa)    Occurrences    Occurrences (Σb)      (%)    (%)
0 - 9         43             1920                613            5351                  100    100
10 - 19       172            1877                1389           4738                  98     89
20 - 29       283            1705                1183           3349                  89     63
30 - 39       350            1422                936            2166                  74     40
40 - 49       323            1072                602            1230                  56     23
50 - 59       287            749                 327            628                   39     12
60 - 69       169            462                 151            301                   24     6
70 - 79       163            293                 88             150                   15     3
80 - 89       89             130                 40             62                    7      1
90 - 99       41             41                  22             22                    2      0
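The cumulative arithmetic of the table can be reproduced with a short sketch (illustrative names; the counts are taken from the table) that accumulates occurrences and non-occurrences from the highest probability bin downwards and converts them to H and F:

def roc_points(occurrences, non_occurrences):
    """occurrences / non_occurrences: counts per probability bin, lowest bin first.
    Returns (H, F) in % for each decision threshold, accumulated from the top bin down."""
    total_occ, total_non = sum(occurrences), sum(non_occurrences)
    points, cum_occ, cum_non = [], 0, 0
    for occ, non in zip(reversed(occurrences), reversed(non_occurrences)):
        cum_occ += occ
        cum_non += non
        points.append((100.0 * cum_occ / total_occ, 100.0 * cum_non / total_non))
    return points[::-1]   # back to lowest-threshold-first order, as in the table

occ = [43, 172, 283, 350, 323, 287, 169, 163, 89, 41]     # events per 10 % bin
non = [613, 1389, 1183, 936, 602, 327, 151, 88, 40, 22]   # non-events per 10 % bin

for (h, f), bin_start in zip(roc_points(occ, non), range(0, 100, 10)):
    print(f"threshold {bin_start:2d}%:  H = {h:5.1f}%   F = {f:5.1f}%")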

Page 61:

Signal Detection Theory: ROC Area (ROCA)

• Area under the ROC curve
• A relative index and a widely used summary measure
• Decreases from 1 as the curve moves downward from the ideal top-left corner
• A useless forecast system lies along the diagonal, where H = F and the area is 0.5;
  such a system cannot discriminate between occurrences and non-occurrences of the event

Range: 0 to 1   Perfect system = 1

ROCA-based skill score:

ROC_SS = 2 * ROCA - 1

• Negative below the diagonal
• At its minimum, ROC_SS = -1, when ROCA = 0
• ROC is applicable to deterministic categorical forecasts
– ROC_SS translates into KSS (= H – F)
– With only one single decision threshold, only a single ROC point results
  Typically this lies "inside" the ROC area, i.e. indicating worse quality
• ROC, ROCA and ROC_SS are directly related to a decision-theoretic approach
– Can be related to the economic value of probability forecasts to end users
– Allowing for the assessment of the costs of false alarms

Range: -1 to 1   Perfect score = 1
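A sketch of the area computation: the (F, H) pairs from the ROC table two slides earlier, closed at (0,0) and (1,1) and integrated with the trapezoidal rule, which is one common way (an assumption here, not prescribed by the lecture) to estimate ROCA and hence ROC_SS:

def roc_area(points):
    """Trapezoidal area under a ROC curve; points = (F, H) pairs as fractions."""
    pts = sorted(points + [(0.0, 0.0), (1.0, 1.0)])
    return sum((f2 - f1) * (h1 + h2) / 2.0 for (f1, h1), (f2, h2) in zip(pts, pts[1:]))

# (F, H) pairs from the ROC table two slides earlier (converted from % to fractions)
points = [(1.00, 1.00), (0.89, 0.98), (0.63, 0.89), (0.40, 0.74), (0.23, 0.56),
          (0.12, 0.39), (0.06, 0.24), (0.03, 0.15), (0.01, 0.07), (0.00, 0.02)]

roca = roc_area(points)
print(roca, 2 * roca - 1)   # ROCA and ROC_SS; 0.5 / 0.0 would mean no discrimination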

Page 62:

Probability Forecasts – Example 7

Reliability (left) and ROC (right) diagrams of one year of PoP forecasts. (The data are the same as earlier - the PoPs were transformed into categorical yes/no forecasts by using 50 % as the "on/off" threshold.) The inset box in the reliability diagram shows the frequency of use of the various forecast probabilities (sharpness), and the horizontal dotted line the climatological event probability. The reliability curve (with open circles) indicates a strong over-forecasting bias throughout the probability range. This seems to be a common feature at this particular location, as indicated by the qualitatively similar 10-year average reliability curve (dashed line). Brier skill scores (BSS) are computed against two reference forecast systems; of these, climatology appears to be a much stronger "no skill opponent" than persistence. The ROC curve (right) is constructed on the basis of forecast and observed probabilities, leading to different potential decision thresholds and the respective value pairs of H and F. The ROCA and ROC_SS values are also shown. The black dot represents the single-value ROC from the categorical binary case of Example 5 (Slide #39) (H = POD = 0.70; F = 0.17).

Page 63:

ROC Curve: Probability FC Scheme, Vers. 1

[Figure: ROC curve - H (%) versus F (%), with points labelled by probability thresholds 10 % … 90 % and an inset forecast-distribution histogram.]

Page 64:

ROC Curve: Probability FC Scheme, Vers. 2

[Figure: ROC curve - H (%) versus F (%), with points labelled by probability thresholds 5 % … 90 % and an inset forecast-distribution histogram.]

Page 65:

Probability Forecasts - Summary:
 Verify a comprehensive set of probability forecasts, focusing on adverse and/or extreme weather
 Produce reliability diagrams, including the sharpness distribution
 Compute BS, BSS; for multi-category events RPS, RPSS
 Produce ROC diagrams, ROCA, ROC_SS

Example 7 in the General Guide to Verification (NOMEK Training)

BS:
• Based on squared error
• Decompositions provide insight into several performance attributes (NOT discussed here)
• Dependent on the frequency of occurrence of the event

ROC:
• Considers the forecasts' ability to discriminate between Yes and No events
• Provides verification information for individual decision thresholds
• Less dependent on the frequency of occurrence of the event

Page 66:

FINAL reminder on verification:

An act (“art”?) of countless methods and measures

An essential daily real-time practice in the operational forecasting environment

An active feedback and dialogue process is a necessity

A fundamental means to improve weather forecasts and services

Page 67:

References: Literature

Bougeault, P., 2003. WGNE recommendations on verification methods for numerical prediction of weather elements and severe weather events (CAS/JSC WGNE Report No. 18)

Proceedings, Making Verification More Meaningful (Boulder, 30 July - 1 August 2002)

Proceedings, SRNWP Mesoscale Verification Workshop (De Bilt, 2001)

Proceedings, WMO/WWRP International Conference on Quantitative Precipitation Forecasting (Vols. 1 and 2, Reading, 2 - 6 September 2002)

Wilks, D.S., 1995. Statistical Methods in the Atmospheric Sciences: An Introduction, Chapter 7: Forecast Verification (Academic Press)

Jolliffe, I.T. and D.B. Stephenson, 2003. Forecast Verification: A Practitioner's Guide in Atmospheric Science (Wiley)

Stanski, H.R., L.J. Wilson and W.R. Burrows, 1989. Survey of Common Verification Methods in Meteorology (WMO Research Report No. 89-5)

Cherubini, T., A. Ghelli and F. Lalaurette, 2001. Verification of precipitation forecasts over the Alpine region using a high density observing network (ECMWF Tech. Memo. 340, 18 pp)

Murphy, A.H. and R.L. Winkler, 1987. A General Framework for Forecast Verification (Mon. Wea. Rev., 115, 1330-1338)

Stephenson, D.B., 2000. Use of the "Odds Ratio" for Diagnosing Forecast Skill (Weather and Forecasting, 15, 221-232)

Grazzini, F. and A. Persson, 2003. User Guide to ECMWF Forecast Products (ECMWF Met. Bull., M3.2)

Thornes, J.E. and D.B. Stephenson, 2001. How to judge the quality and value of weather forecast products (Meteorol. Appls., 8, 307-314)

Page 68:

References: Websites

http://www.bom.gov.au/bmrc/wefor/staff/eee/verif/verif_web_page.html
http://www.bom.gov.au/bmrc/wefor/staff/eee/verif/jwgv/jwgv.htmll - WMO/WWRP/WGNE Working Group on Verification websites

http://www.bom.gov.au/bmrc/wefor/staff/eee/verif/Workshop2004/home.html - International Verification Methods Workshop (Montreal, 2004)

http://www.rap.ucar.edu/research/verification/ver_wkshp1.html - Making Verification More Meaningful Workshop (Boulder, 2002)

http://www.chmi.cz/meteo/ov/wmo - WMO/WWRP Workshop on the Verification of QPF (Prague, 2001)

http://hirlam.knmi.nl/open/srnwp/ - EUMETNET/SRNWP Mesoscale Verification Workshops (DeBilt, 2004)

http://www.sec.noaa.gov/forecast_verification/Glossary.html - NOAA/SEC Glossary of verification terms

http://nws.noaa.gov/tdl/verif - NOAA MOS verification website

http://wwwt.emc.ncep.noaa.gov/gmb/ens/verif.html - NOAA EPS Verification website

http://www.wmo.ch/web/www/DPS/SVS-for-LRF.html - WMO/CBS Standardised Verification System for Long-Range Forecasts

http://www.ecmwf.int/products/forecasts/d/charts/medium/verification - Verification of ECMWF Forecasting System

http://www.ecmwf.int/products/forecasts/guide - User Guide to ECMWF Forecast Products