Evaluation of Potential Performance Measures for the Advanced Hydrologic Prediction Service
Gary A. Wick
NOAA Environmental Technology Laboratory
On Rotational Assignment with the NWS Office of Hydrologic Development
November 2003
Overview
Performance measures for the Advanced Hydrologic Prediction Service (AHPS)
Review of probabilistic forecast verification measures
Trial application with operational forecast data
Recommendations
AHPS
AHPS Science Activities
Development Activity Effectiveness
– Contribution to forecast maturity
– Number of science tools deployed per year
– Contribution to information content
Advanced Hydrologic Prediction Service (AHPS)
Deployment
– Number of forecast points
– Coverage area
Maturity, e.g. metrics addressing
– Usage of forecast information
– Probabilistic forecast effectiveness
Program Performance
Existing Measures
Future Measures
Science Projects Performance
Probabilistic Forecast Verification
Categorical forecasts
Brier Score
Rank Probability Score (RPS)
Categorical Forecasts
Transforms probabilistic forecast into a categorical forecast through selection of a probability threshold
Simple but doesn’t fully address probability
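The thresholding step can be illustrated with a minimal sketch. The function names, the `>=` comparison, and the sample values are assumptions made for illustration, not part of the presentation.

```python
from typing import Dict, List, Sequence


def to_categorical(probabilities: Sequence[float], threshold: float) -> List[bool]:
    """Turn forecast probabilities into yes/no forecasts.

    Assumption: a forecast counts as "yes" when its probability is at or
    above the chosen threshold.
    """
    return [p >= threshold for p in probabilities]


def contingency_counts(forecast_yes: Sequence[bool],
                       observed_yes: Sequence[int]) -> Dict[str, int]:
    """Count hits, misses, false alarms, and correct negatives."""
    counts = {"hits": 0, "misses": 0, "false_alarms": 0, "correct_negatives": 0}
    for f, o in zip(forecast_yes, observed_yes):
        if f and o:
            counts["hits"] += 1
        elif o:
            counts["misses"] += 1
        elif f:
            counts["false_alarms"] += 1
        else:
            counts["correct_negatives"] += 1
    return counts


# Hypothetical probabilities and outcomes (1 = event occurred).
yes_no = to_categorical([0.9, 0.4, 0.6, 0.1], threshold=0.5)
table = contingency_counts(yes_no, [1, 1, 0, 0])
```

Note that the 0.4 and 0.6 forecasts are treated as a flat "no" and "yes" here, which is the loss of probability information the slide refers to.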
Brier Score
Simple extension that fully characterizes probabilistic forecasts
Limited to occurrence of a specific event
$$\mathrm{BS} = \frac{1}{N} \sum_{i=1}^{N} (p_i - o_i)^2$$
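The Brier score is the mean squared difference between the forecast probability of an event and the observed outcome (1 if the event occurred, 0 otherwise). A minimal sketch follows; the function name and the sample values are hypothetical.

```python
from typing import Sequence


def brier_score(probabilities: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared difference between forecast probability and outcome (0 or 1)."""
    if len(probabilities) == 0 or len(probabilities) != len(outcomes):
        raise ValueError("need equally sized, non-empty inputs")
    total = sum((p - o) ** 2 for p, o in zip(probabilities, outcomes))
    return total / len(probabilities)


# Hypothetical forecast probabilities of exceeding flood stage and outcomes.
example_bs = brier_score([0.9, 0.1, 0.2, 0.7], [1, 0, 0, 1])  # about 0.0375
```

A score of 0 is a perfect forecast and 1 is the worst possible, so lower values are better.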
Rank Probability Score
Extension characterizing full distribution of forecasts
Ideal as science measure but added complexity a concern at program level
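One common form of the rank probability score sums the squared differences between the cumulative forecast and cumulative observed probabilities over ordered categories. The sketch below assumes that unnormalized form (some references divide by the number of categories minus one); the function name and example values are hypothetical.

```python
from itertools import accumulate
from typing import Sequence


def rank_probability_score(forecast_probs: Sequence[float],
                           observed_category: int) -> float:
    """Unnormalized RPS for one forecast over ordered categories.

    forecast_probs: probability assigned to each category (sums to 1).
    observed_category: zero-based index of the category that occurred.
    """
    if not 0 <= observed_category < len(forecast_probs):
        raise ValueError("observed_category is out of range")
    cum_forecast = list(accumulate(forecast_probs))
    # The cumulative observation steps from 0 to 1 at the observed category.
    cum_observed = [1.0 if k >= observed_category else 0.0
                    for k in range(len(forecast_probs))]
    return sum((f - o) ** 2 for f, o in zip(cum_forecast, cum_observed))


# Hypothetical three-category forecast (low, middle, high flow).
rps_near = rank_probability_score([0.2, 0.5, 0.3], observed_category=1)  # about 0.13
rps_far = rank_probability_score([0.2, 0.5, 0.3], observed_category=2)   # about 0.53
```

Because the score works on cumulative probabilities, a forecast that puts weight near the observed category is penalized less than one that puts it far away.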
Application of Accuracy Measures
Deterministic Measures
Probabilistic Brier Score
Deterministic Application
Used National Weather Service verification database
– Monthly data for 177 sites starting April 2001
– Results computed “on-the-fly”
Evaluated accuracy difference between AHPS and non-AHPS points
– Considered subset of points for the North Central, Ohio, and Missouri Basin River Forecast Centers
Expressed in terms of mean absolute error and root mean square error
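The two deterministic error measures named above can be sketched as follows. The function names and the sample stages (in feet) are hypothetical and are not taken from the verification database.

```python
import math
from typing import Sequence


def mean_absolute_error(forecasts: Sequence[float],
                        observations: Sequence[float]) -> float:
    """Average magnitude of forecast minus observation."""
    if len(forecasts) == 0 or len(forecasts) != len(observations):
        raise ValueError("need equally sized, non-empty inputs")
    return sum(abs(f - o) for f, o in zip(forecasts, observations)) / len(forecasts)


def root_mean_square_error(forecasts: Sequence[float],
                           observations: Sequence[float]) -> float:
    """Square root of the average squared error; weights large errors more."""
    if len(forecasts) == 0 or len(forecasts) != len(observations):
        raise ValueError("need equally sized, non-empty inputs")
    mean_sq = sum((f - o) ** 2 for f, o in zip(forecasts, observations)) / len(forecasts)
    return math.sqrt(mean_sq)


# Hypothetical day-1 stage forecasts and observations in feet.
mae = mean_absolute_error([10.2, 11.0, 9.5], [10.0, 11.5, 9.0])      # about 0.40
rmse = root_mean_square_error([10.2, 11.0, 9.5], [10.0, 11.5, 9.0])  # about 0.42
```

RMSE is never smaller than MAE for the same data, so the gap between them hints at how much a few large errors dominate.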
Deterministic Results
[Figure: Below Flood Stage Forecast Mean Absolute Error, MBRFC, NCRFC, OHRFC, April 2001 - July 2003. Error in ft for AHPS and non-AHPS points on days 1 to 3, grouped by FAST, MEDIUM, and SLOW categories.]
[Figure: Below Flood Stage Forecast RMSE, MBRFC, NCRFC, OHRFC, April 2001 - July 2003. Error in ft for AHPS and non-AHPS points on days 1 to 3, grouped by FAST, MEDIUM, and SLOW categories.]
[Figure: Below Flood Stage Forecast Number of Samples, MBRFC, NCRFC, OHRFC, April 2001 - July 2003. Sample counts for AHPS and non-AHPS points on days 1 to 3, grouped by FAST, MEDIUM, and SLOW categories.]
Deterministic Results
[Figure: Mean Absolute Error (Combined Response), MBRFC, NCRFC, OHRFC, April 2001 - July 2003. Error in ft for AHPS and non-AHPS points on days 1 to 3, below and above flood stage.]
[Figure: RMSE (Combined Response), MBRFC, NCRFC, OHRFC, April 2001 - July 2003. Error in ft for AHPS and non-AHPS points on days 1 to 3, below and above flood stage.]
[Figure: Number of Samples (Combined Response), MBRFC, NCRFC, OHRFC, April 2001 - July 2003. Number of observations for AHPS and non-AHPS points on days 1 to 3, below and above flood stage.]
Implications: Deterministic
Possible to implement something rapidly
Characterization must be defined
Existing verification database and interface inadequate
Brier Score Evaluation
Sample ensemble forecasts and verification provided by Kristie Franz
– 43 sites from the Ohio River Forecast Center
– 11 weekly mean and monthly maximum exceedance forecasts
– Forecast traces, verification, and historical data
Evaluated accuracy of forecasts for exceedance of flood stage
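One way to turn an ensemble of forecast traces into a flood stage exceedance probability is to take the fraction of traces whose value exceeds flood stage. The sketch below assumes that approach and a strict `>` comparison; the presentation does not state how the probabilities were derived, and the function name and numbers are hypothetical.

```python
from typing import Sequence


def exceedance_probability(trace_values: Sequence[float], flood_stage: float) -> float:
    """Fraction of ensemble traces whose value exceeds flood stage.

    trace_values: one value per ensemble trace, for example the monthly
    maximum stage of each trace (an assumption made for this sketch).
    """
    if len(trace_values) == 0:
        raise ValueError("need at least one ensemble trace")
    exceeding = sum(1 for value in trace_values if value > flood_stage)
    return exceeding / len(trace_values)


# Hypothetical monthly maximum stages (ft) from five traces; flood stage 20 ft.
p_flood = exceedance_probability([18.0, 21.5, 19.2, 23.1, 20.4], flood_stage=20.0)

# If flooding was observed (outcome 1), this forecast contributes
# (p_flood - 1)^2 to the Brier score sum.
squared_error_if_flood = (p_flood - 1.0) ** 2
```

With more traces the probability estimate becomes finer grained; five traces can only produce multiples of 0.2.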
Brier Score Evaluation
Use of all forecasts suggests very high accuracy
Only 17% improvement over forecasts for no flooding
[Figure: Monthly Maximum Exceedance Forecasts. Vertical axis: Brier score.]
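An improvement relative to always forecasting no flooding can be expressed as a Brier skill score, 1 - BS / BS_ref, where BS_ref is the Brier score of a forecast that always assigns zero probability. The sketch below assumes that definition; the presentation does not say exactly how its 17% figure was computed, and the function names and data are hypothetical.

```python
from typing import Sequence


def brier_score(probabilities: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared difference between forecast probability and outcome (0 or 1)."""
    if len(probabilities) == 0 or len(probabilities) != len(outcomes):
        raise ValueError("need equally sized, non-empty inputs")
    return sum((p - o) ** 2 for p, o in zip(probabilities, outcomes)) / len(outcomes)


def skill_vs_no_flood(probabilities: Sequence[float], outcomes: Sequence[int]) -> float:
    """Brier skill score against a reference that always forecasts no flooding."""
    reference = brier_score([0.0] * len(outcomes), outcomes)
    if reference == 0.0:
        # No floods were observed, so the reference is already perfect.
        raise ValueError("skill is undefined when no flooding occurred")
    return 1.0 - brier_score(probabilities, outcomes) / reference


# Hypothetical sample in which flooding is rare (one event in five forecasts).
probs = [0.1, 0.0, 0.8, 0.2, 0.0]
obs = [0, 0, 1, 0, 0]
raw_score = brier_score(probs, obs)    # about 0.018: looks very accurate
skill = skill_vs_no_flood(probs, obs)  # about 0.91 relative to "no flooding"
```

When floods are rare, the reference score is itself small, which is why a low raw Brier score over all forecasts can overstate how useful the forecasts are.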
Brier Score Evaluation
Most revealing results obtained for forecasts where flooding occurred
[Figure: Forecasts Where Flooding Occurred. Brier score for BBVK2, CARI2, WLBK2, and all sites combined.]
Rank Probability Score
Could interpret as accuracy from 76 to 91%
[Figure: RPS for OHRFC Forecast Points, Mean Weekly Stage Forecasts (from Franz and Sorooshian, 2002). RPS by forecast point: FLRK2, WRTO1, PKYK2, DLYW2, ELKK2, PSTK2, PTVK2, CMBK2, WLBK2, BBVK2.]
Implications: Probabilistic
Brier score can be presented simply and meaningfully
Application limited by constraint to instances of flooding
Rank probability score addresses all forecasts but meaning is harder to express
Necessary to regularly archive ensemble forecasts and verification
Recommendations
Performance measures
Data collection
Additional analyses
Recommended AHPS Accuracy Measures
Deterministic River Forecast Accuracy
Probabilistic River Forecast Accuracy
Flood Forecast Accuracy
AHPS Deterministic River Forecast Accuracy
Percent accuracy of mean daily streamflow for days 1-3
Evaluates short-term hydrograph forecasts
AHPS Probabilistic River Forecast Accuracy
RPS derived accuracy of weekly mean streamflow exceedance
Evaluates AHPS weekly chance of exceedance forecasts
Express as percent accuracy
Apply to week 2 and week 4 forecasts
AHPS Flood Forecast Accuracy
Derived from Brier score and weekly maximum stage forecasts
Simple evaluation of weekly exceedance forecasts
Express as percent accuracy for cases where flooding occurred
Apply to week 2 and week 4 forecasts
Recommended Data Archival
Forecast Data
– Ensemble forecast traces at selected points
Verification
– Corresponding stage/streamflow observations
Historical Data
– Ensure consistency with forecast quantities
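As an illustration of what one archived record might hold, the sketch below groups the three recommended items (forecast traces, verifying observations, and historical data) in a single structure. The class name, field names, and sample values are all hypothetical; the presentation does not specify an archive format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ForecastArchiveRecord:
    """Hypothetical archive entry for one forecast point and issue date."""
    site_id: str                   # forecast point identifier
    issue_date: str                # ISO date on which the forecast was issued
    quantity: str                  # e.g. "weekly_mean_stage"
    traces: List[List[float]]      # one list of values per ensemble trace
    observations: Optional[List[float]] = None  # filled in once verified
    historical: List[List[float]] = field(default_factory=list)  # same quantity

    def is_verifiable(self) -> bool:
        """True when observations exist and match the length of every trace."""
        if self.observations is None or len(self.traces) == 0:
            return False
        return all(len(t) == len(self.observations) for t in self.traces)


# Hypothetical record with two short traces, not yet verified.
record = ForecastArchiveRecord(
    site_id="SITE1",
    issue_date="2003-11-01",
    quantity="weekly_mean_stage",
    traces=[[10.0, 10.5], [9.8, 11.2]],
)
```

Storing the quantity name alongside the traces and the historical data is one way to keep the historical record consistent with the forecast quantities.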
Further Analysis
Apply proposed measures to enhanced set of archived data
Evaluations will help illustrate where forecast skill exists and improvements are possible
Explore possible alternatives for collection of climatological data
Final metric selection best made after more comprehensive evaluation
Closing Notes
Accuracy measures provide a bridge between programmatic and science activities and metrics
Important to recognize limitations of accuracy measures and continue consideration of other metrics