Real-time Verification of Operational Precipitation Forecasts using Hourly Gauge Data
Andrew Loughe Judy Henderson
Jennifer Mahoney Edward Tollerud
Real-time Verification System (RTVS) of NOAA / FSL, Boulder, Colorado, USA
Outline
Some approaches to objective verification
How we perform automated precipitation verification
What we mean by "real-time"
Forecasts + Obs --> Results disseminated over the web. (The steps involved)
QC, Model comparisons, Statistical displays
Future direction
If you don't have objective data
you are just another person
with an opinion
Our Approach?
Basically, we're gross!
No really, we are...
We process 4,500 gauge measurements each hour of every day. On average we retain 2,800 "good" reports per hour. That's about 67,000 observations per day, 2 million per month, and over 6 million per season.
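The volumes above follow from simple multiplication. This sketch assumes a 30-day month and a 90-day season; those day counts are our assumption, not figures from the slide.

```python
# Back-of-the-envelope check of the observation counts quoted above.
# Assumptions (not from the original): 30-day month, 90-day season.
GOOD_REPORTS_PER_HOUR = 2_800
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30   # assumed
DAYS_PER_SEASON = 90  # assumed: three 30-day months

per_day = GOOD_REPORTS_PER_HOUR * HOURS_PER_DAY
per_month = per_day * DAYS_PER_MONTH
per_season = per_day * DAYS_PER_SEASON

print(f"per day:    {per_day:,}")     # per day:    67,200
print(f"per month:  {per_month:,}")   # per month:  2,016,000
print(f"per season: {per_season:,}")  # per season: 6,048,000
```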
The Real-time Verification System
An independent, real-time, automated data ingest and management system
Gauge observations received each hour of every day (~4500)
Gross error check on observations is performed
Model forecasts interpolated to the observation points
Results stored in 2 x 2 contingency tables of forecast / observation pairs (YY, YN, NY, NN)
Graphics, skill scores and contingency information disseminated over the World Wide Web
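The slides do not say what the gross error check tests. A minimal sketch, assuming it is a plausibility range check on each hourly gauge total; the bounds `MIN_HOURLY_IN` / `MAX_HOURLY_IN` and the `GaugeReport` record are hypothetical, not RTVS values.

```python
import math
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical plausibility bounds for a one-hour gauge total, in inches.
# Illustrative values only, not the thresholds RTVS actually uses.
MIN_HOURLY_IN = 0.0
MAX_HOURLY_IN = 4.0


@dataclass
class GaugeReport:
    station_id: str
    hourly_precip_in: Optional[float]  # None means the value is missing


def passes_gross_check(report: GaugeReport) -> bool:
    """Reject missing, NaN, negative, or implausibly large hourly totals."""
    value = report.hourly_precip_in
    if value is None or math.isnan(value):
        return False
    return MIN_HOURLY_IN <= value <= MAX_HOURLY_IN


def screen(reports: List[GaugeReport]) -> List[GaugeReport]:
    """Keep only the reports that survive the gross error check."""
    return [r for r in reports if passes_gross_check(r)]
```

A range check like this only catches impossible values; plausible but wrong reports need the comparisons against radar, 24h totals, and neighbors described later.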
Alternative Approaches (should be objective)
Grid-to-grid verification
We're game... but not yet!
More fair to the modelers
Less fair to the end-users of the forecast?
More representative of the areal coverage of precipitation
Can do pattern matching and partitioning of the error (Ebert, et al.) or studies of representativeness error (Foufoula et al.)
What about Case Studies?
Do you fish with a pole or do you fish with a net?
We fish with a net
Case studies are insufficient for evaluating national-scale forecast systems
Subjective analyses often focus on where forecasts work well, and not on where they work poorly
There exists a need to assess variability on many time and space scales (from daily to seasonal)
Timely and objective information is needed for decision making
Real-time or Near-Real-time?
Real-time processing... monthly and seasonal dissemination of results (for now)
Gauge data stored in hourly bins
Model data interpolated once the observations catch up (Models initialized as late as 18Z, and then 24h forecasts are made)
Data collected over numerous accumulation periods
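Storing gauge data "in hourly bins" can be sketched as grouping each report by the hour in which it was taken. The labeling convention (top of the hour), the tuple layout, and the station IDs in the usage are assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Tuple

# (station_id, observation time, precipitation in inches); layout assumed.
Report = Tuple[str, datetime, float]


def hour_bin(obs_time: datetime) -> datetime:
    """Label a report with the top of the hour in which it was taken (assumed convention)."""
    return obs_time.replace(minute=0, second=0, microsecond=0)


def bin_reports(reports: Iterable[Report]) -> Dict[datetime, Dict[str, float]]:
    """Group reports into hourly bins keyed by hour, then by station.

    If a station reports twice in the same hour, the later report in the
    input sequence overwrites the earlier one.
    """
    bins: Dict[datetime, Dict[str, float]] = defaultdict(dict)
    for station_id, obs_time, precip_in in reports:
        bins[hour_bin(obs_time)][station_id] = precip_in
    return dict(bins)
```

Binning by hour lets late-arriving reports simply fill their slot, which suits a system where model data are interpolated only "once the observations catch up".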
Go with the flow...
I. Obtain gauge data and collect it into hourly bins; match data with the list of "good" stations (QC'd list)
II. Interpolate model data to "good" observation points
III. Accumulate precipitation over 3, 6, 12, 24 hours
IV. Compute contingency pairs (YY, YN, NY, NN)
V. Process these contingency data to create plots of ESS and Bias for Eta and RUC2
VI. Make these displays and the associated statistical information available through the web
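Step III can be sketched as summing consecutive hourly totals. This assumes, as our own choice, that a window counts only when every hour in it is present; the function names are hypothetical.

```python
from typing import Dict, Optional, Sequence

ACCUMULATION_PERIODS_H = (3, 6, 12, 24)  # periods listed in step III


def accumulate(hourly: Sequence[Optional[float]], end_index: int, period_h: int) -> Optional[float]:
    """Sum the `period_h` hourly totals ending at `end_index` (inclusive).

    Returns None if the window runs off the start of the record or any hour
    in it is missing (assumed rule: no partial accumulations).
    """
    start = end_index - period_h + 1
    if start < 0:
        return None
    window = hourly[start:end_index + 1]
    if any(v is None for v in window):
        return None
    return sum(window)


def accumulate_all(hourly: Sequence[Optional[float]], end_index: int) -> Dict[int, Optional[float]]:
    """Accumulations for every period, all ending at the same hour."""
    return {p: accumulate(hourly, end_index, p) for p in ACCUMULATION_PERIODS_H}
```

Rejecting incomplete windows avoids undercounting a 24h total because one hourly report went missing, at the cost of fewer pairs at longer periods.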
A Point-Specific Approach
(Eta at 40 km)
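The slides say model forecasts are interpolated to the observation points but not how. A minimal sketch, assuming bilinear interpolation and assuming each station's location has already been converted to fractional (x, y) grid indices on the model grid (that map-projection step is not shown). This is a stand-in technique, not necessarily the RTVS method.

```python
import math
from typing import Sequence


def bilinear(grid: Sequence[Sequence[float]], x: float, y: float) -> float:
    """Bilinearly interpolate grid[row][col] at fractional column x, row y.

    Requires a grid of at least 2 x 2 points and a point inside its bounds.
    """
    nrows, ncols = len(grid), len(grid[0])
    if not (0 <= x <= ncols - 1 and 0 <= y <= nrows - 1):
        raise ValueError("point lies outside the grid")
    # Clamp so a point on the last row/column still has a cell to its lower left.
    i0 = min(math.floor(x), ncols - 2)
    j0 = min(math.floor(y), nrows - 2)
    fx, fy = x - i0, y - j0
    v00, v10 = grid[j0][i0], grid[j0][i0 + 1]
    v01, v11 = grid[j0 + 1][i0], grid[j0 + 1][i0 + 1]
    return (v00 * (1 - fx) * (1 - fy) + v10 * fx * (1 - fy)
            + v01 * (1 - fx) * fy + v11 * fx * fy)
```

For example, `bilinear([[0, 1], [2, 3]], 0.5, 0.5)` returns 1.5, the mean of the four surrounding grid values.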
Gauge Data Checked for Accuracy
Hourly gauge data are checked for accuracy against radar, 24h totals, and nearest neighbors
Further data are included through in-house QC efforts
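One way to picture the nearest-neighbor check is to compare each gauge with its closest neighbor and flag large disagreements. This sketch is hypothetical throughout: planar (x, y) coordinates in km, a single neighbor, and the `MAX_DIFF_IN` tolerance are our assumptions, not the RTVS procedure.

```python
import math
from typing import Dict, Optional, Set, Tuple

# station_id -> (x_km, y_km, precip_in); a flat projected plane is assumed.
Stations = Dict[str, Tuple[float, float, float]]

MAX_DIFF_IN = 1.0  # hypothetical tolerance in inches


def nearest_neighbor(stations: Stations, sid: str) -> Optional[str]:
    """Return the ID of the closest other station, or None if there is none."""
    x, y, _ = stations[sid]
    best, best_d = None, math.inf
    for other, (ox, oy, _) in stations.items():
        if other == sid:
            continue
        d = math.hypot(ox - x, oy - y)
        if d < best_d:
            best, best_d = other, d
    return best


def flag_suspect(stations: Stations, max_diff_in: float = MAX_DIFF_IN) -> Set[str]:
    """Flag stations whose value differs from their nearest neighbor's by too much."""
    flagged = set()
    for sid, (_, _, value) in stations.items():
        nn = nearest_neighbor(stations, sid)
        if nn is not None and abs(value - stations[nn][2]) > max_diff_in:
            flagged.add(sid)
    return flagged
```

With a single neighbor, both members of a discrepant pair can be flagged; a practical scheme would consult several neighbors (and radar) before deciding which gauge is wrong.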
Forecast / Observation Comparisons
Comparisons made at numerous thresholds from 0.1 to 5.0 inches
Comparisons made over 3, 6, 12, and 24h accumulation periods
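Each forecast/observation pair contributes to one 2x2 table per threshold. A minimal sketch, assuming an "event" means an amount at or above the threshold; only the 0.1 and 5.0 inch endpoints come from the slide, and the intermediate thresholds are hypothetical.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

# Endpoints from the slide; intermediate values are hypothetical.
THRESHOLDS_IN = (0.1, 0.25, 0.5, 1.0, 2.0, 5.0)


def tally(pairs: Iterable[Tuple[float, float]],
          thresholds: Sequence[float] = THRESHOLDS_IN) -> Dict[float, List[int]]:
    """Count (forecast, observed) pairs into [YY, YN, NY, NN] for each threshold."""
    tables = {t: [0, 0, 0, 0] for t in thresholds}
    for fcst, obs in pairs:
        for t in thresholds:
            f_yes, o_yes = fcst >= t, obs >= t  # assumed event definition
            if f_yes and o_yes:
                tables[t][0] += 1   # YY: hit
            elif f_yes:
                tables[t][1] += 1   # YN: false alarm
            elif o_yes:
                tables[t][2] += 1   # NY: detection failure
            else:
                tables[t][3] += 1   # NN: null event
    return tables
```

Because only counts are kept, tables from different hours, days, or stations can be combined by simple addition before any score is computed.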
2x2 Contingency Tables
Dichotomous Forecasting
Basic Definitions
An "event" is one of:
hit = YY   YES forecast, YES observed
false_alarm = YN   YES forecast, NOT observed
detection_failure = NY   NOT forecast, YES observed
null_event = NN   NOT forecast, NOT observed
From which these basic terms may be defined:
numevents = YY + YN + NY + NN   Number of events
yes_obs = YY + NY   Number of observed events
yes_fcst = YY + YN   Number of forecast events
not_obs = YN + NN   Number of events not observed
not_fcst = NY + NN   Number of events not forecast
fcst_or_obs = YY + YN + NY   Number of events forecast or observed
correct = YY + NN   Number of events correctly forecast
Skill Scores
* POD = hits / yes_obs Probability of detection
FOM = detection_failures / yes_obs Frequency of misses (1 - POD)
* PON = null_events / not_obs Probability of null event
POFD = false_alarms / not_obs Probability of false detection (probability of false alarm) (1 - PON)
FOH = hits / yes_fcst Frequency of hits (1 - FAR)
* FAR = false_alarms / yes_fcst False alarm ratio
FOCN = null_events / not_fcst Frequency of correct null forecasts (1 - DFR)
DFR = detection_failures / not_fcst Detection failure ratio
* BIAS = yes_fcst / yes_obs Frequency Bias, a measure of over- or under- forecasting
* CSI = hits / fcst_or_obs Critical Success Index (CSI or Threat Score)
* TSS = POD - POFD True Skill Statistic [ hits/yes_obs - false_alarms/not_obs ]
* HSS = (correct - chance) / (numevents - chance); where chance = (yes_fcst * yes_obs + not_fcst * not_obs) / numevents
* ESS = (hits - chance) / (fcst_or_obs - chance); where chance = (yes_fcst * yes_obs) / numevents
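The definitions above translate directly into code. This sketch computes the starred scores (plus POFD) from the four counts; returning NaN for an undefined ratio is our own convention, and the function names are hypothetical.

```python
import math
from typing import Dict


def _ratio(num: float, den: float) -> float:
    """Return num/den, or NaN when the denominator is zero (score undefined)."""
    return num / den if den else math.nan


def skill_scores(yy: int, yn: int, ny: int, nn: int) -> Dict[str, float]:
    # Basic terms, as defined under "Basic Definitions".
    numevents = yy + yn + ny + nn
    yes_obs = yy + ny
    yes_fcst = yy + yn
    not_obs = yn + nn
    not_fcst = ny + nn
    fcst_or_obs = yy + yn + ny
    correct = yy + nn

    pod = _ratio(yy, yes_obs)
    pofd = _ratio(yn, not_obs)

    # Two different "chance" terms: HSS uses the expected number of correct
    # forecasts, ESS the expected number of hits, for random forecasts.
    chance_hss = _ratio(yes_fcst * yes_obs + not_fcst * not_obs, numevents)
    chance_ess = _ratio(yes_fcst * yes_obs, numevents)

    return {
        "POD": pod,
        "PON": _ratio(nn, not_obs),
        "POFD": pofd,
        "FAR": _ratio(yn, yes_fcst),
        "BIAS": _ratio(yes_fcst, yes_obs),
        "CSI": _ratio(yy, fcst_or_obs),
        "TSS": pod - pofd,
        "HSS": _ratio(correct - chance_hss, numevents - chance_hss),
        "ESS": _ratio(yy - chance_ess, fcst_or_obs - chance_ess),
    }
```

For a table with YY = 50, YN = 20, NY = 30, NN = 900, this gives POD = 0.625, BIAS = 0.875, CSI = 0.5, and ESS = 44.4 / 94.4 ≈ 0.470.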
Results Available over the Web: www-ad.fsl.noaa.gov/afra/rtvs/precip
Specify parameters... obtain graphical result
View contingency tables stored on disk
The Future! Access and Displays via Database
(Model Icing Forecasts)
Specify parameters; display results (gnuplot) via database query (MySQL)
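Computing results "on the fly" from stored contingency counts could look like the sketch below. It uses Python's built-in `sqlite3` as a stand-in for MySQL; the `contingency` table schema, the column names, and the toy counts are all hypothetical.

```python
import sqlite3
from typing import Optional

# Hypothetical schema: one 2x2 table per model, period, threshold, and day.
SCHEMA = """
CREATE TABLE contingency (
    model TEXT, period_h INTEGER, threshold_in REAL, valid_date TEXT,
    yy INTEGER, yn INTEGER, ny INTEGER, nn INTEGER
)
"""


def ess_on_the_fly(conn: sqlite3.Connection, model: str, period_h: int,
                   threshold_in: float, start_date: str, end_date: str) -> Optional[float]:
    """Sum stored tables over a date range, then compute ESS from the totals."""
    yy, yn, ny, nn = conn.execute(
        """
        SELECT SUM(yy), SUM(yn), SUM(ny), SUM(nn) FROM contingency
        WHERE model = ? AND period_h = ? AND threshold_in = ?
          AND valid_date BETWEEN ? AND ?
        """,
        (model, period_h, threshold_in, start_date, end_date),
    ).fetchone()
    if yy is None:  # SUM over no rows returns NULL
        return None
    n = yy + yn + ny + nn
    if n == 0:
        return None
    chance = (yy + yn) * (yy + ny) / n
    denom = yy + yn + ny - chance
    return (yy - chance) / denom if denom else None


def demo_connection() -> sqlite3.Connection:
    """In-memory database holding two days of toy counts for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO contingency VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        [("Eta", 24, 0.25, "2001-06-01", 20, 8, 12, 460),
         ("Eta", 24, 0.25, "2001-06-02", 30, 12, 18, 440)],
    )
    return conn
```

Summing counts first and scoring second is the key design choice: skill scores are not additive, so averaging daily ESS values would not reproduce the ESS of the combined period.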
Are these methods sufficient?
Trade-off between dealing with the specifics and dealing with the general (rifle vs. shotgun)
Method is not discretized by region or event
Spatial density of observations is uneven
Although the method is straightforward, there is still a lack of understanding of what the skill scores represent
May tell you which forecast system is "better", but not why
Future Plans
Add more models to this point-specific approach, and provide a measure of confidence
Perform verification using a gridded, analyzed precipitation field (Stage IV Precipitation)
Verify the probabilistic forecasts of ensembles
Move verification data into the relational database and compute results on-the-fly
Relate verification results geographically
Access verification results as soon as the forecast period ends (timeliness)
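The plan to "provide a measure of confidence" does not name a method. One common option, shown here as an assumption rather than the RTVS plan, is a percentile bootstrap: resample the forecast/observation pairs with replacement and recompute ESS each time. The function names are hypothetical.

```python
import math
import random
from typing import List, Tuple


def ess(yy: int, yn: int, ny: int, nn: int) -> float:
    """Equitable skill score from a 2x2 table (NaN if undefined)."""
    n = yy + yn + ny + nn
    if n == 0:
        return math.nan
    chance = (yy + yn) * (yy + ny) / n
    denom = yy + yn + ny - chance
    return (yy - chance) / denom if denom else math.nan


def bootstrap_ess_ci(categories: List[str], n_resamples: int = 1000,
                     alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap interval for ESS.

    `categories` holds one of "YY", "YN", "NY", "NN" per forecast/obs pair.
    """
    rng = random.Random(seed)
    n = len(categories)
    stats = []
    for _ in range(n_resamples):
        counts = {"YY": 0, "YN": 0, "NY": 0, "NN": 0}
        for c in rng.choices(categories, k=n):  # resample pairs with replacement
            counts[c] += 1
        value = ess(counts["YY"], counts["YN"], counts["NY"], counts["NN"])
        if not math.isnan(value):
            stats.append(value)
    stats.sort()
    lo = stats[int((alpha / 2) * (len(stats) - 1))]
    hi = stats[int((1 - alpha / 2) * (len(stats) - 1))]
    return lo, hi
```

Resampling individual pairs treats them as independent. Precipitation errors are correlated in space and time, so an interval built this way is likely too narrow; resampling whole days or regions would be a more cautious variant.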
Future Plans (cont'd)
Test and extend QC of the observations
Currently we are:
Assessing skill using East-only and West-only hourly station data
Assessing skill using full RFC and the in-house QC methods
Assessing skill using no QC methods whatsoever
Comparing these four experimental results
Problem: Not Reporting "Zero" Precipitation?
The Effect on Precipitation Verification
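The slide raises the question without detail. One plausible reading: if a station reports only when its gauge registers precipitation, a silent station looks like a missing one, and treating it as missing discards every pair with a dry observation. The toy sketch below (hypothetical helper `tally_pairs`, illustrative pairs) shows that the discarded pairs are YN and NN entries only, so FAR falls and POFD becomes undefined.

```python
from typing import Dict, Iterable, Optional, Tuple


def tally_pairs(pairs: Iterable[Tuple[float, Optional[float]]],
                threshold: float) -> Dict[str, int]:
    """Count (forecast, observed) pairs; pairs with a missing observation are skipped."""
    counts = {"YY": 0, "YN": 0, "NY": 0, "NN": 0}
    for fcst, obs in pairs:
        if obs is None:  # no report: the pair never enters the table
            continue
        key = ("Y" if fcst >= threshold else "N") + ("Y" if obs >= threshold else "N")
        counts[key] += 1
    return counts


# The same four toy pairs, with and without the zero observations reported.
complete = [(0.2, 0.0), (0.0, 0.0), (0.3, 0.4), (0.0, 0.2)]
zeros_unreported = [(f, None if o == 0.0 else o) for f, o in complete]

print(tally_pairs(complete, 0.1))          # {'YY': 1, 'YN': 1, 'NY': 1, 'NN': 1}
print(tally_pairs(zeros_unreported, 0.1))  # {'YY': 1, 'YN': 0, 'NY': 1, 'NN': 0}
```

Here FAR = YN / (YY + YN) drops from 0.5 to 0, so the forecast looks better than it is, purely because dry reports never arrived.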