Multi-Model or Post- processing: Pros and Cons Tom Hopson - NCAR Martyn Clark - NIWA Andrew Slater -...
-
Upload
irene-walsh -
Category
Documents
-
view
216 -
download
1
Transcript of Multi-Model or Post- processing: Pros and Cons Tom Hopson - NCAR Martyn Clark - NIWA Andrew Slater -...
Multi-Model or Post-processing: Pros and Cons
Tom Hopson - NCAR
Martyn Clark - NIWA
Andrew Slater - CIRES/NSIDC
2 Questions:
1) How best to utilize a multi-model simulation (forecast), especially if under-dispersive?
a) Should more dynamical variability be searched for? Or
b) Is it better to balance post-processing with multi-model utilization to create a properly dispersive, informative ensemble?
2) How significant are the gains of a multi-model approach over utilizing a single best model with a few “tricks”?
a) Blending two post-processing approaches (quantile regression and logistic regression)
b) Applying “poor-man’s” data assimilation
Outline
Explore these two questions using multi-model simulations for the French Broad River, NC of MOPEX
I. Multi-model: Framework for Understanding Structural Errors (FUSE)
Pre-calibration results => under-dispersive
II. Calibration procedure Introduce Quantile Regression (“QR”; Kroenker and Bassett,
1978)
III. Discussing of Question 1) -- how best to utilize multi-model
IV. Question 2 -- multi-model vs single-best model single-best model calibration procedure: Logistic Regression
employed under QR framework
PRMS SACRAMENTO
ARNO/VICTOPMODELwltθ fldθ satθ
lzZ
uzZ
pe
S1T S1
F
S2T
S2F
A
S2F
B
qif
qbA
qbB
q12
wltθ fldθ satθ
lzZ
uzZ
e p
qsx
qb
q12
S2
S1
wltθ fldθ satθ
GFLWR lzZ
uzZ
S2
S1
e p
q12
qsx
qb
e p
wltθ satθ
S1TA
fldθ
lzZ
uzZqsx
qif
qb
S1TB S1
F
q12
S2
Clark, M.P., A.G. Slater, D.E. Rupp, R.A. Woods, J.A. Vrugt, H.V. Gupta, T. Wagener, and L.E. Hay (2008) Framework for Understanding Structural Errors (FUSE): A modular framework to diagnose differences between hydrological models. Water Resources Research, 44, W00B02, doi:10.1029/2007WR006735.
FUSE: Framework for Understanding Structural Errors
PRMS SACRAMENTO
ARNO/VICTOPMODELwltθ fldθ satθ
lzZ
uzZ
pe
S1T S1
F
S2T
S2F
A
S2F
B
qif
qbA
qbB
q12
wltθ fldθ satθ
lzZ
uzZ
e p
qsx
qb
q12
S2
S1
wltθ fldθ satθ
GFLWR lzZ
uzZ
S2
S1
e p
q12
qsx
qb
e p
wltθ satθ
S1TA
fldθ
lzZ
uzZqsx
qif
qb
S1TB S1
F
q12
S2
Clark, M.P., A.G. Slater, D.E. Rupp, R.A. Woods, J.A. Vrugt, H.V. Gupta, T. Wagener, and L.E. Hay (2008) Framework for Understanding Structural Errors (FUSE): A modular framework to diagnose differences between hydrological models. Water Resources Research, 44, W00B02, doi:10.1029/2007WR006735.
Define development decisions: upper layer architecture
PRMS SACRAMENTO
ARNO/VICTOPMODELwltθ fldθ satθ
lzZ
uzZ
pe
S1T S1
F
S2T
S2F
A
S2F
B
qif
qbA
qbB
q12
wltθ fldθ satθ
lzZ
uzZ
e p
qsx
qb
q12
S2
S1
wltθ fldθ satθ
GFLWR lzZ
uzZ
S2
S1
e p
q12
qsx
qb
e p
wltθ satθ
S1TA
fldθ
lzZ
uzZqsx
qif
qb
S1TB S1
F
q12
S2
Clark, M.P., A.G. Slater, D.E. Rupp, R.A. Woods, J.A. Vrugt, H.V. Gupta, T. Wagener, and L.E. Hay (2008) Framework for Understanding Structural Errors (FUSE): A modular framework to diagnose differences between hydrological models. Water Resources Research, 44, W00B02, doi:10.1029/2007WR006735.
Define development decisions: lower layer / baseflow
PRMS SACRAMENTO
ARNO/VICTOPMODELwltθ fldθ satθ
lzZ
uzZ
pe
S1T S1
F
S2T
S2F
A
S2F
B
qif
qbA
qbB
q12
wltθ fldθ satθ
lzZ
uzZ
e p
qsx
qb
q12
S2
S1
wltθ fldθ satθ
GFLWR lzZ
uzZ
S2
S1
e p
q12
qsx
qb
e p
wltθ satθ
S1TA
fldθ
lzZ
uzZqsx
qif
qb
S1TB S1
F
q12
S2
Clark, M.P., A.G. Slater, D.E. Rupp, R.A. Woods, J.A. Vrugt, H.V. Gupta, T. Wagener, and L.E. Hay (2008) Framework for Understanding Structural Errors (FUSE): A modular framework to diagnose differences between hydrological models. Water Resources Research, 44, W00B02, doi:10.1029/2007WR006735.
Define development decisions: percolation
PRMS SACRAMENTO
ARNO/VICTOPMODELwltθ fldθ satθ
lzZ
uzZ
pe
S1T S1
F
S2T
S2F
A
S2F
B
qif
qbA
qbB
q12
wltθ fldθ satθ
lzZ
uzZ
e p
qsx
qb
q12
S2
S1
wltθ fldθ satθ
GFLWR lzZ
uzZ
S2
S1
e p
q12
qsx
qb
e p
wltθ satθ
S1TA
fldθ
lzZ
uzZqsx
qif
qb
S1TB S1
F
q12
S2
Clark, M.P., A.G. Slater, D.E. Rupp, R.A. Woods, J.A. Vrugt, H.V. Gupta, T. Wagener, and L.E. Hay (2008) Framework for Understanding Structural Errors (FUSE): A modular framework to diagnose differences between hydrological models. Water Resources Research, 44, W00B02, doi:10.1029/2007WR006735.
Define development decisions: surface runoff
PRMS SACRAMENTO
ARNO/VICTOPMODELwltθ fldθ satθ
lzZ
uzZ
pe
S1T S1
F
S2T
S2F
A
S2F
B
qif
qbA
qbB
q12
wltθ fldθ satθ
lzZ
uzZ
e p
qsx
qb
q12
S2
S1
wltθ fldθ satθ
GFLWR lzZ
uzZ
S2
S1
e p
q12
qsx
qb
e p
wltθ satθ
S1TA
fldθ
lzZ
uzZqsx
qif
qb
S1TB S1
F
q12
S2
Clark, M.P., A.G. Slater, D.E. Rupp, R.A. Woods, J.A. Vrugt, H.V. Gupta, T. Wagener, and L.E. Hay (2008) Framework for Understanding Structural Errors (FUSE): A modular framework to diagnose differences between hydrological models. Water Resources Research, 44, W00B02, doi:10.1029/2007WR006735.
Build unique models: combination 1
PRMS SACRAMENTO
ARNO/VICTOPMODELwltθ fldθ satθ
lzZ
uzZ
pe
S1T S1
F
S2T
S2F
A
S2F
B
qif
qbA
qbB
q12
wltθ fldθ satθ
lzZ
uzZ
e p
qsx
qb
q12
S2
S1
wltθ fldθ satθ
GFLWR lzZ
uzZ
S2
S1
e p
q12
qsx
qb
e p
wltθ satθ
S1TA
fldθ
lzZ
uzZqsx
qif
qb
S1TB S1
F
q12
S2
Clark, M.P., A.G. Slater, D.E. Rupp, R.A. Woods, J.A. Vrugt, H.V. Gupta, T. Wagener, and L.E. Hay (2008) Framework for Understanding Structural Errors (FUSE): A modular framework to diagnose differences between hydrological models. Water Resources Research, 44, W00B02, doi:10.1029/2007WR006735.
Build unique models: combination 2
PRMS SACRAMENTO
ARNO/VICTOPMODELwltθ fldθ satθ
lzZ
uzZ
pe
S1T S1
F
S2T
S2F
A
S2F
B
qif
qbA
qbB
q12
wltθ fldθ satθ
lzZ
uzZ
e p
qsx
qb
q12
S2
S1
wltθ fldθ satθ
GFLWR lzZ
uzZ
S2
S1
e p
q12
qsx
qb
e p
wltθ satθ
S1TA
fldθ
lzZ
uzZqsx
qif
qb
S1TB S1
F
q12
S2
Clark, M.P., A.G. Slater, D.E. Rupp, R.A. Woods, J.A. Vrugt, H.V. Gupta, T. Wagener, and L.E. Hay (2008) Framework for Understanding Structural Errors (FUSE): A modular framework to diagnose differences between hydrological models. Water Resources Research, 44, W00B02, doi:10.1029/2007WR006735.
Build unique models: combination 3
78 UNIQUE HYDROLOGIC MODELSALL WITH DIFFERENT STRUCTURE
Example: French Broad RiverBefore Calibration => underdispersive
Black curve shows observations; colors are ensemble
Goal of “calibration” or “post-processing”P
roba
bilit
y
calibration
Discharge
Pro
babi
lity
Skillful ensemble post-processing should correct:• the “on average” bias• as well as under-representation of the 2nd moment of the empirical forecast PDF (i.e. corrected its “dispersion” or “spread”)• ensemble represents a random-draw from an (unknown) PDF of hydrologic uncertainty
goal: produce calibrated ensemble forecast that has as much information content (i.e. “narrow PDF”) as possible
“spread” or “dispersion”
“bias”obs
obs
ForecastPDF
ForecastPDF
Discharge
Our approach: Quantile Regression (QR)
Benefits
1) Less sensitivity to outliers
2) Works with heteroscedastic errors
3) Optimally fit for each part of the PDF
4) “flat” rank histograms
Calibration ProcedureUse QR to perform a fit on 78 quantiles individually (recall:
78 FUSE models simulations).
For each of quantile:
1) Perform a “climatological” fit to the data=> simulation always as good as “climatology”
2) Starting with full regressor set, iteratively select best subset using “forward step-wise cross-validation”
– Fitting done using QR– Selection done by:
a) Minimizing QR cost functionb) Satisfying the binomial distribution=> Verification measures directly inform
the model selection
3) 2nd pass: segregate forecasts into differing ranges of ensemble dispersion, and refit models => forcing skill-spread utility
Pro
babi
lity
Discharge
obs ModelPDF
Regressor set for each quantile:
1) - 78) All individual 78 model simulations79) ensemble mean80) ensemble standard deviation81) ranked ensemble member (sorted ensemble that corresponds to quantile being fit)
Example: French Broad RiverBefore Calibration After Calibration
Black curve shows observations; colors are ensemble
Raw versus Calibrated PDF’s
Blue is “raw” FUSE ensemble
Black is calibrated ensemble
Red is the observed value
Notice: shift in both “bias” and dispersion of final PDF
(also notice PDF asymmetries) obs
Rank Histogram Comparisons
After quantile regression, rank histogram more uniform
(although now slightly over-dispersive)
Raw full ensemble After calibration
Calibration: Rank Histogram and Skill-Spread Information
SSr =rfrcst −rrefrperf −rref
X100%
(reference forecast uncalibrated FUSE ensemble)
Rank Histogram Skill Score = 0.86
Skill-Spread Skill Score = 0.75
Frequency Used forQuantile Fitting of Method I:
Best Model=76%Best Model=76%Ensemble StDev=13%Ensemble Mean=0%Ranked Ensemble=6%
What Nash-Sutcliffe implies about Utility
Sequentially-averaged FUSE models (ranked based on NS Score) and their resultant NS Score
Notice the degredation of NS with increasing # (with a peak at 2 models)
For an equitable multi-model, NS should rise monotonically
Maybe a smaller subset of models would have more utility? (A contradiction for an under-dispersive ensemble?)
What Nash-Sutcliffe implies about Utility (cont)
-- degredation with increased ensemble size
Initial Frequency Used forQuantile Fitting:
Best Model=76%Best Model=76%Ensemble StDev=13%Ensemble Mean=0%Ranked Ensemble=6%
What Nash-Sutcliffe implies about Utility (cont)
Reduced Set Frequency Used for Quantile Fitting:
Best Model=73%Best Model=73%Ensemble StDev=3%Ensemble Mean=32%Ranked Ensemble=29%
…using only top 1/3 of modelsTo rank and form ensemble mean …… earlier results …
Appears to be significant gains in the utility of the ensemble after “filtering” (except for drop in StDev) … however “proof is in the pudding” …Examine verification skill measures …
Skill Score Comparisonsbetween full- and “filtered” FUSE
ensemble setsPoints:
-- quite similar results for a variety of skill scores
-- both approaches give appreciable benefit over the original raw multi-model output
-- however, only in the CRPSS is there improvement of the “filtered” ensemble set over the full set
post-processing method fairly robustMore work (more filtering?)!
GREEN -- full calibrated multi-modelBLUE -- “filtered” calibrated multi-modelReference -- uncalibrated FUSE set
Calibration Procedure (cont)I. Method I (FUSE multi-model) regressor set for each 78 quantiles:
1) sorted associated ensemble member; 2) ensemble median; 3) ensemble standard deviation; 4) all 78 models
II. Method 2 (FUSE best member) regressor set for each 78 quantiles:
First step - use Logistic Regression (LR)– Calibrate CDF over prescribed set of climatological quantiles using FUSE best member
-- for each day: resample 78 member ensemble
1) best member; 2) persistence; 3) associated Logistic Regression ensemble member
Calibration Procedure-- single best model
Use QR to perform a fit on 78 quantiles individually (recall: 78 FUSE models simulations).
For each of quantile:
1) Perform a “climatological” fit to the data=> simulation always as good as “climatology”
2) Starting with full regressor set, iteratively select best subset using “forward step-wise cross-validation”
– Fitting done using QR– Selection done by:
a) Minimizing QR cost functionb) Satisfying the binomial distribution=> Verification measures directly inform
the model selection
3) 2nd pass: segregate forecasts into differing ranges of ensemble dispersion, and refit models => forcing skill-spread utility
Pro
babi
lity
Discharge
obs ModelPDF
Regressor set for each quantile:
First step - use Logistic Regression
– Calibrate CDF over prescribed set of climatological quantiles using FUSE best member
-- for each day: resample 78 member ensemble
1) best member; 2) persistence; 3) associated Logistic Regression ensemble member
Regression Variable Usagefor Best-member method)
Best Model 34%
Persistence 66%
Logistic regress q 86%
0 0.2 0.4 0.6 0.8 1
Precent Used
Log Reg
Persistence
Best Model
Regressors
Regressor Usage
Example: French Broad RiverI. Multi-Model Method II. Best-member Approach
Black curve shows observations; colors are ensemble
GREEN -- Multi-modelBLUE -- Best-ModelReference -- uncalibrated FUSE set
Skill Score Comparisonsbetween multi-model versus single best model calibrated ensembles
Points:
-- quite similar results for a variety of skill scores
-- both approaches give appreciable benefit over the original raw multi-model output
-- CRPSS shows an improvement of the single model over the full set
post-processing method fairly robust
2 Questions revisited:1) How best to utilize a multi-model simulations (forecast),
especially if under-dispersive?a) Should more dynamical variability be searched for? Orb) Is it better to balance post-processing with multi-model utilization to
create a properly dispersive, informative ensemble?“Answer”: adding more models can lead to decreasing skill of the
ensemble mean (even if the ensemble is under-dispersive)
2) How significant are the gains of a multi-model approach over utilizing a single best model with a few “tricks”?
a) Blending two post-processing approaches (quantile regression and logistic regression)
b) Applying “poor-man’s” data assimilation“Answer”: quantile-regression-based calibration is fairly robust and can do
a lot with just a single model, especially if a variety of approaches are utililized
=> Benefit of moving beyond model ensemble calibration alone, to include “kitchen sink”, including “blending” of different techniques
Summary
Quantile regression provides a powerful framework for improving the whole (potentially non-gaussian) PDF of an ensemble forecast
This framework provides an umbrella to blend together multiple statistical correction approaches (logistic regression, etc.)
“step-wise cross-validation” calibration approach used here ensures forecast skill greater than climatology and persistence (by utilizing both in the regression selection process)
The approach here improves a forecast’s ability to represent its own potential forecast error:
– More uniform rank histogram– Guaranteeing utility in the ensemble dispersion
(=> more spread, more uncertainty)