Multi-Model or Post- processing: Pros and Cons Tom Hopson - NCAR Martyn Clark - NIWA Andrew Slater -...

Multi-Model or Post-processing: Pros and Cons

Tom Hopson - NCAR

Martyn Clark - NIWA

Andrew Slater - CIRES/NSIDC

2 Questions:

1) How best to utilize a multi-model simulation (forecast), especially if under-dispersive?

a) Should more dynamical variability be searched for? Or

b) Is it better to balance post-processing with multi-model utilization to create a properly dispersive, informative ensemble?

2) How significant are the gains of a multi-model approach over utilizing a single best model with a few “tricks”?

a) Blending two post-processing approaches (quantile regression and logistic regression)

b) Applying “poor-man’s” data assimilation

Outline

Explore these two questions using multi-model simulations for the French Broad River, NC of MOPEX

I. Multi-model: Framework for Understanding Structural Errors (FUSE)

Pre-calibration results => under-dispersive

II. Calibration procedure Introduce Quantile Regression (“QR”; Kroenker and Bassett,

1978)

III. Discussing of Question 1) -- how best to utilize multi-model

IV. Question 2 -- multi-model vs single-best model single-best model calibration procedure: Logistic Regression

employed under QR framework

PRMS SACRAMENTO

ARNO/VICTOPMODELwltθ fldθ satθ

lzZ

uzZ

pe

S1T S1

F

S2T

S2F

A

S2F

B

qif

qbA

qbB

q12

wltθ fldθ satθ

lzZ

uzZ

e p

qsx

qb

q12

S2

S1

wltθ fldθ satθ

GFLWR lzZ

uzZ

S2

S1

e p

q12

qsx

qb

e p

wltθ satθ

S1TA

fldθ

lzZ

uzZqsx

qif

qb

S1TB S1

F

q12

S2

Clark, M.P., A.G. Slater, D.E. Rupp, R.A. Woods, J.A. Vrugt, H.V. Gupta, T. Wagener, and L.E. Hay (2008) Framework for Understanding Structural Errors (FUSE): A modular framework to diagnose differences between hydrological models. Water Resources Research, 44, W00B02, doi:10.1029/2007WR006735.

FUSE: Framework for Understanding Structural Errors

PRMS SACRAMENTO


lzZ

uzZ

pe

S1T S1

F

S2T

S2F

A

S2F

B

qif

qbA

qbB

q12

wltθ fldθ satθ

lzZ

uzZ

e p

qsx

qb

q12

S2

S1

wltθ fldθ satθ

GFLWR lzZ

uzZ

S2

S1

e p

q12

qsx

qb

e p

wltθ satθ

S1TA

fldθ

lzZ

uzZqsx

qif

qb

S1TB S1

F

q12

S2


Define development decisions: upper layer architecture

PRMS SACRAMENTO


lzZ

uzZ

pe

S1T S1

F

S2T

S2F

A

S2F

B

qif

qbA

qbB

q12

wltθ fldθ satθ

lzZ

uzZ

e p

qsx

qb

q12

S2

S1

wltθ fldθ satθ

GFLWR lzZ

uzZ

S2

S1

e p

q12

qsx

qb

e p

wltθ satθ

S1TA

fldθ

lzZ

uzZqsx

qif

qb

S1TB S1

F

q12

S2


Define development decisions: lower layer / baseflow

PRMS SACRAMENTO


lzZ

uzZ

pe

S1T S1

F

S2T

S2F

A

S2F

B

qif

qbA

qbB

q12

wltθ fldθ satθ

lzZ

uzZ

e p

qsx

qb

q12

S2

S1

wltθ fldθ satθ

GFLWR lzZ

uzZ

S2

S1

e p

q12

qsx

qb

e p

wltθ satθ

S1TA

fldθ

lzZ

uzZqsx

qif

qb

S1TB S1

F

q12

S2


Define development decisions: percolation

PRMS SACRAMENTO


lzZ

uzZ

pe

S1T S1

F

S2T

S2F

A

S2F

B

qif

qbA

qbB

q12

wltθ fldθ satθ

lzZ

uzZ

e p

qsx

qb

q12

S2

S1

wltθ fldθ satθ

GFLWR lzZ

uzZ

S2

S1

e p

q12

qsx

qb

e p

wltθ satθ

S1TA

fldθ

lzZ

uzZqsx

qif

qb

S1TB S1

F

q12

S2


Define development decisions: surface runoff

PRMS SACRAMENTO


lzZ

uzZ

pe

S1T S1

F

S2T

S2F

A

S2F

B

qif

qbA

qbB

q12

wltθ fldθ satθ

lzZ

uzZ

e p

qsx

qb

q12

S2

S1

wltθ fldθ satθ

GFLWR lzZ

uzZ

S2

S1

e p

q12

qsx

qb

e p

wltθ satθ

S1TA

fldθ

lzZ

uzZqsx

qif

qb

S1TB S1

F

q12

S2


Build unique models: combination 1

PRMS SACRAMENTO


lzZ

uzZ

pe

S1T S1

F

S2T

S2F

A

S2F

B

qif

qbA

qbB

q12

wltθ fldθ satθ

lzZ

uzZ

e p

qsx

qb

q12

S2

S1

wltθ fldθ satθ

GFLWR lzZ

uzZ

S2

S1

e p

q12

qsx

qb

e p

wltθ satθ

S1TA

fldθ

lzZ

uzZqsx

qif

qb

S1TB S1

F

q12

S2



PRMS SACRAMENTO


lzZ

uzZ

pe

S1T S1

F

S2T

S2F

A

S2F

B

qif

qbA

qbB

q12

wltθ fldθ satθ

lzZ

uzZ

e p

qsx

qb

q12

S2

S1

wltθ fldθ satθ

GFLWR lzZ

uzZ

S2

S1

e p

q12

qsx

qb

e p

wltθ satθ

S1TA

fldθ

lzZ

uzZqsx

qif

qb

S1TB S1

F

q12

S2



78 UNIQUE HYDROLOGIC MODELSALL WITH DIFFERENT STRUCTURE

Example: French Broad RiverBefore Calibration => underdispersive

Black curve shows observations; colors are ensemble

Goal of “calibration” or “post-processing”P

roba

bilit

y

calibration

Discharge

Pro

babi

lity

Skillful ensemble post-processing should correct:• the “on average” bias• as well as under-representation of the 2nd moment of the empirical forecast PDF (i.e. corrected its “dispersion” or “spread”)• ensemble represents a random-draw from an (unknown) PDF of hydrologic uncertainty

goal: produce calibrated ensemble forecast that has as much information content (i.e. “narrow PDF”) as possible

“spread” or “dispersion”

“bias”obs

obs

ForecastPDF

ForecastPDF

Discharge

Our approach: Quantile Regression (QR)

Benefits

1) Less sensitivity to outliers

2) Works with heteroscedastic errors

3) Optimally fit for each part of the PDF

4) “flat” rank histograms

Calibration ProcedureUse QR to perform a fit on 78 quantiles individually (recall:

78 FUSE models simulations).

For each of quantile:

1) Perform a “climatological” fit to the data=> simulation always as good as “climatology”

2) Starting with full regressor set, iteratively select best subset using “forward step-wise cross-validation”

– Fitting done using QR– Selection done by:

a) Minimizing QR cost functionb) Satisfying the binomial distribution=> Verification measures directly inform

the model selection

3) 2nd pass: segregate forecasts into differing ranges of ensemble dispersion, and refit models => forcing skill-spread utility

Pro

babi

lity

Discharge

obs ModelPDF

Regressor set for each quantile:

1) - 78) All individual 78 model simulations79) ensemble mean80) ensemble standard deviation81) ranked ensemble member (sorted ensemble that corresponds to quantile being fit)

Example: French Broad RiverBefore Calibration After Calibration


Raw versus Calibrated PDF’s

Blue is “raw” FUSE ensemble

Black is calibrated ensemble

Red is the observed value

Notice: shift in both “bias” and dispersion of final PDF

(also notice PDF asymmetries) obs

Rank Histogram Comparisons

After quantile regression, rank histogram more uniform

(although now slightly over-dispersive)

Raw full ensemble After calibration

Calibration: Rank Histogram and Skill-Spread Information

SSr =rfrcst −rrefrperf −rref

X100%

(reference forecast uncalibrated FUSE ensemble)

Rank Histogram Skill Score = 0.86

Skill-Spread Skill Score = 0.75

Frequency Used forQuantile Fitting of Method I:

Best Model=76%Best Model=76%Ensemble StDev=13%Ensemble Mean=0%Ranked Ensemble=6%

What Nash-Sutcliffe implies about Utility

Sequentially-averaged FUSE models (ranked based on NS Score) and their resultant NS Score

Notice the degredation of NS with increasing # (with a peak at 2 models)

For an equitable multi-model, NS should rise monotonically

Maybe a smaller subset of models would have more utility? (A contradiction for an under-dispersive ensemble?)

What Nash-Sutcliffe implies about Utility (cont)

-- degredation with increased ensemble size

Initial Frequency Used forQuantile Fitting:


What Nash-Sutcliffe implies about Utility (cont)

Reduced Set Frequency Used for Quantile Fitting:


…using only top 1/3 of modelsTo rank and form ensemble mean …… earlier results …

Appears to be significant gains in the utility of the ensemble after “filtering” (except for drop in StDev) … however “proof is in the pudding” …Examine verification skill measures …

Skill Score Comparisonsbetween full- and “filtered” FUSE

ensemble setsPoints:

-- quite similar results for a variety of skill scores

-- both approaches give appreciable benefit over the original raw multi-model output

-- however, only in the CRPSS is there improvement of the “filtered” ensemble set over the full set

post-processing method fairly robustMore work (more filtering?)!

GREEN -- full calibrated multi-modelBLUE -- “filtered” calibrated multi-modelReference -- uncalibrated FUSE set

Calibration Procedure (cont)I. Method I (FUSE multi-model) regressor set for each 78 quantiles:

1) sorted associated ensemble member; 2) ensemble median; 3) ensemble standard deviation; 4) all 78 models

II. Method 2 (FUSE best member) regressor set for each 78 quantiles:

First step - use Logistic Regression (LR)– Calibrate CDF over prescribed set of climatological quantiles using FUSE best member

-- for each day: resample 78 member ensemble

1) best member; 2) persistence; 3) associated Logistic Regression ensemble member

Calibration Procedure-- single best model

Use QR to perform a fit on 78 quantiles individually (recall: 78 FUSE models simulations).

For each of quantile:

1) Perform a “climatological” fit to the data=> simulation always as good as “climatology”

2) Starting with full regressor set, iteratively select best subset using “forward step-wise cross-validation”

– Fitting done using QR– Selection done by:

a) Minimizing QR cost functionb) Satisfying the binomial distribution=> Verification measures directly inform

the model selection

3) 2nd pass: segregate forecasts into differing ranges of ensemble dispersion, and refit models => forcing skill-spread utility

Pro

babi

lity

Discharge

obs ModelPDF

Regressor set for each quantile:

First step - use Logistic Regression

– Calibrate CDF over prescribed set of climatological quantiles using FUSE best member

-- for each day: resample 78 member ensemble

1) best member; 2) persistence; 3) associated Logistic Regression ensemble member

Regression Variable Usagefor Best-member method)

Best Model 34%

Persistence 66%

Logistic regress q 86%

0 0.2 0.4 0.6 0.8 1

Precent Used

Log Reg

Persistence

Best Model

Regressors

Regressor Usage

Example: French Broad RiverI. Multi-Model Method II. Best-member Approach


GREEN -- Multi-modelBLUE -- Best-ModelReference -- uncalibrated FUSE set

Skill Score Comparisonsbetween multi-model versus single best model calibrated ensembles

Points:

-- quite similar results for a variety of skill scores

-- both approaches give appreciable benefit over the original raw multi-model output

-- CRPSS shows an improvement of the single model over the full set

post-processing method fairly robust

2 Questions revisited:1) How best to utilize a multi-model simulations (forecast),

especially if under-dispersive?a) Should more dynamical variability be searched for? Orb) Is it better to balance post-processing with multi-model utilization to

create a properly dispersive, informative ensemble?“Answer”: adding more models can lead to decreasing skill of the

ensemble mean (even if the ensemble is under-dispersive)

2) How significant are the gains of a multi-model approach over utilizing a single best model with a few “tricks”?

a) Blending two post-processing approaches (quantile regression and logistic regression)

b) Applying “poor-man’s” data assimilation“Answer”: quantile-regression-based calibration is fairly robust and can do

a lot with just a single model, especially if a variety of approaches are utililized

=> Benefit of moving beyond model ensemble calibration alone, to include “kitchen sink”, including “blending” of different techniques

Summary

Quantile regression provides a powerful framework for improving the whole (potentially non-gaussian) PDF of an ensemble forecast

This framework provides an umbrella to blend together multiple statistical correction approaches (logistic regression, etc.)

“step-wise cross-validation” calibration approach used here ensures forecast skill greater than climatology and persistence (by utilizing both in the regression selection process)

The approach here improves a forecast’s ability to represent its own potential forecast error:

– More uniform rank histogram– Guaranteeing utility in the ensemble dispersion

(=> more spread, more uncertainty)

Multi-Model or Post- processing: Pros and Cons Tom Hopson - NCAR Martyn Clark - NIWA Andrew Slater -...

Documents

Transcript of Multi-Model or Post- processing: Pros and Cons Tom Hopson - NCAR Martyn Clark - NIWA Andrew Slater -...