Multi-Model or Post- processing: Pros and Cons Tom Hopson - NCAR Martyn Clark - NIWA Andrew Slater -...

30
Multi-Model or Post- processing: Pros and Cons Tom Hopson - NCAR Martyn Clark - NIWA Andrew Slater - CIRES/NSIDC

Transcript of Multi-Model or Post- processing: Pros and Cons Tom Hopson - NCAR Martyn Clark - NIWA Andrew Slater -...

Page 1: Multi-Model or Post- processing: Pros and Cons Tom Hopson - NCAR Martyn Clark - NIWA Andrew Slater - CIRES/NSIDC.

Multi-Model or Post-processing: Pros and Cons

Tom Hopson - NCAR

Martyn Clark - NIWA

Andrew Slater - CIRES/NSIDC

Page 2: Multi-Model or Post- processing: Pros and Cons Tom Hopson - NCAR Martyn Clark - NIWA Andrew Slater - CIRES/NSIDC.

2 Questions:

1) How best to utilize a multi-model simulation (forecast), especially if under-dispersive?

a) Should more dynamical variability be searched for? Or

b) Is it better to balance post-processing with multi-model utilization to create a properly dispersive, informative ensemble?

2) How significant are the gains of a multi-model approach over utilizing a single best model with a few “tricks”?

a) Blending two post-processing approaches (quantile regression and logistic regression)

b) Applying “poor-man’s” data assimilation

Page 3: Multi-Model or Post- processing: Pros and Cons Tom Hopson - NCAR Martyn Clark - NIWA Andrew Slater - CIRES/NSIDC.

Outline

Explore these two questions using multi-model simulations for the French Broad River, NC of MOPEX

I. Multi-model: Framework for Understanding Structural Errors (FUSE)

Pre-calibration results => under-dispersive

II. Calibration procedure Introduce Quantile Regression (“QR”; Kroenker and Bassett,

1978)

III. Discussing of Question 1) -- how best to utilize multi-model

IV. Question 2 -- multi-model vs single-best model single-best model calibration procedure: Logistic Regression

employed under QR framework

Page 4: Multi-Model or Post- processing: Pros and Cons Tom Hopson - NCAR Martyn Clark - NIWA Andrew Slater - CIRES/NSIDC.

PRMS SACRAMENTO

ARNO/VICTOPMODELwltθ fldθ satθ

lzZ

uzZ

pe

S1T S1

F

S2T

S2F

A

S2F

B

qif

qbA

qbB

q12

wltθ fldθ satθ

lzZ

uzZ

e p

qsx

qb

q12

S2

S1

wltθ fldθ satθ

GFLWR lzZ

uzZ

S2

S1

e p

q12

qsx

qb

e p

wltθ satθ

S1TA

fldθ

lzZ

uzZqsx

qif

qb

S1TB S1

F

q12

S2

Clark, M.P., A.G. Slater, D.E. Rupp, R.A. Woods, J.A. Vrugt, H.V. Gupta, T. Wagener, and L.E. Hay (2008) Framework for Understanding Structural Errors (FUSE): A modular framework to diagnose differences between hydrological models. Water Resources Research, 44, W00B02, doi:10.1029/2007WR006735.

FUSE: Framework for Understanding Structural Errors

Page 5: Multi-Model or Post- processing: Pros and Cons Tom Hopson - NCAR Martyn Clark - NIWA Andrew Slater - CIRES/NSIDC.

PRMS SACRAMENTO

ARNO/VICTOPMODELwltθ fldθ satθ

lzZ

uzZ

pe

S1T S1

F

S2T

S2F

A

S2F

B

qif

qbA

qbB

q12

wltθ fldθ satθ

lzZ

uzZ

e p

qsx

qb

q12

S2

S1

wltθ fldθ satθ

GFLWR lzZ

uzZ

S2

S1

e p

q12

qsx

qb

e p

wltθ satθ

S1TA

fldθ

lzZ

uzZqsx

qif

qb

S1TB S1

F

q12

S2

Clark, M.P., A.G. Slater, D.E. Rupp, R.A. Woods, J.A. Vrugt, H.V. Gupta, T. Wagener, and L.E. Hay (2008) Framework for Understanding Structural Errors (FUSE): A modular framework to diagnose differences between hydrological models. Water Resources Research, 44, W00B02, doi:10.1029/2007WR006735.

Define development decisions: upper layer architecture

Page 6: Multi-Model or Post- processing: Pros and Cons Tom Hopson - NCAR Martyn Clark - NIWA Andrew Slater - CIRES/NSIDC.

PRMS SACRAMENTO

ARNO/VICTOPMODELwltθ fldθ satθ

lzZ

uzZ

pe

S1T S1

F

S2T

S2F

A

S2F

B

qif

qbA

qbB

q12

wltθ fldθ satθ

lzZ

uzZ

e p

qsx

qb

q12

S2

S1

wltθ fldθ satθ

GFLWR lzZ

uzZ

S2

S1

e p

q12

qsx

qb

e p

wltθ satθ

S1TA

fldθ

lzZ

uzZqsx

qif

qb

S1TB S1

F

q12

S2

Clark, M.P., A.G. Slater, D.E. Rupp, R.A. Woods, J.A. Vrugt, H.V. Gupta, T. Wagener, and L.E. Hay (2008) Framework for Understanding Structural Errors (FUSE): A modular framework to diagnose differences between hydrological models. Water Resources Research, 44, W00B02, doi:10.1029/2007WR006735.

Define development decisions: lower layer / baseflow

Page 7: Multi-Model or Post- processing: Pros and Cons Tom Hopson - NCAR Martyn Clark - NIWA Andrew Slater - CIRES/NSIDC.

PRMS SACRAMENTO

ARNO/VICTOPMODELwltθ fldθ satθ

lzZ

uzZ

pe

S1T S1

F

S2T

S2F

A

S2F

B

qif

qbA

qbB

q12

wltθ fldθ satθ

lzZ

uzZ

e p

qsx

qb

q12

S2

S1

wltθ fldθ satθ

GFLWR lzZ

uzZ

S2

S1

e p

q12

qsx

qb

e p

wltθ satθ

S1TA

fldθ

lzZ

uzZqsx

qif

qb

S1TB S1

F

q12

S2

Clark, M.P., A.G. Slater, D.E. Rupp, R.A. Woods, J.A. Vrugt, H.V. Gupta, T. Wagener, and L.E. Hay (2008) Framework for Understanding Structural Errors (FUSE): A modular framework to diagnose differences between hydrological models. Water Resources Research, 44, W00B02, doi:10.1029/2007WR006735.

Define development decisions: percolation

Page 8: Multi-Model or Post- processing: Pros and Cons Tom Hopson - NCAR Martyn Clark - NIWA Andrew Slater - CIRES/NSIDC.

PRMS SACRAMENTO

ARNO/VICTOPMODELwltθ fldθ satθ

lzZ

uzZ

pe

S1T S1

F

S2T

S2F

A

S2F

B

qif

qbA

qbB

q12

wltθ fldθ satθ

lzZ

uzZ

e p

qsx

qb

q12

S2

S1

wltθ fldθ satθ

GFLWR lzZ

uzZ

S2

S1

e p

q12

qsx

qb

e p

wltθ satθ

S1TA

fldθ

lzZ

uzZqsx

qif

qb

S1TB S1

F

q12

S2

Clark, M.P., A.G. Slater, D.E. Rupp, R.A. Woods, J.A. Vrugt, H.V. Gupta, T. Wagener, and L.E. Hay (2008) Framework for Understanding Structural Errors (FUSE): A modular framework to diagnose differences between hydrological models. Water Resources Research, 44, W00B02, doi:10.1029/2007WR006735.

Define development decisions: surface runoff

Page 9: Multi-Model or Post- processing: Pros and Cons Tom Hopson - NCAR Martyn Clark - NIWA Andrew Slater - CIRES/NSIDC.

PRMS SACRAMENTO

ARNO/VICTOPMODELwltθ fldθ satθ

lzZ

uzZ

pe

S1T S1

F

S2T

S2F

A

S2F

B

qif

qbA

qbB

q12

wltθ fldθ satθ

lzZ

uzZ

e p

qsx

qb

q12

S2

S1

wltθ fldθ satθ

GFLWR lzZ

uzZ

S2

S1

e p

q12

qsx

qb

e p

wltθ satθ

S1TA

fldθ

lzZ

uzZqsx

qif

qb

S1TB S1

F

q12

S2

Clark, M.P., A.G. Slater, D.E. Rupp, R.A. Woods, J.A. Vrugt, H.V. Gupta, T. Wagener, and L.E. Hay (2008) Framework for Understanding Structural Errors (FUSE): A modular framework to diagnose differences between hydrological models. Water Resources Research, 44, W00B02, doi:10.1029/2007WR006735.

Build unique models: combination 1

Page 10: Multi-Model or Post- processing: Pros and Cons Tom Hopson - NCAR Martyn Clark - NIWA Andrew Slater - CIRES/NSIDC.

PRMS SACRAMENTO

ARNO/VICTOPMODELwltθ fldθ satθ

lzZ

uzZ

pe

S1T S1

F

S2T

S2F

A

S2F

B

qif

qbA

qbB

q12

wltθ fldθ satθ

lzZ

uzZ

e p

qsx

qb

q12

S2

S1

wltθ fldθ satθ

GFLWR lzZ

uzZ

S2

S1

e p

q12

qsx

qb

e p

wltθ satθ

S1TA

fldθ

lzZ

uzZqsx

qif

qb

S1TB S1

F

q12

S2

Clark, M.P., A.G. Slater, D.E. Rupp, R.A. Woods, J.A. Vrugt, H.V. Gupta, T. Wagener, and L.E. Hay (2008) Framework for Understanding Structural Errors (FUSE): A modular framework to diagnose differences between hydrological models. Water Resources Research, 44, W00B02, doi:10.1029/2007WR006735.

Build unique models: combination 2

Page 11: Multi-Model or Post- processing: Pros and Cons Tom Hopson - NCAR Martyn Clark - NIWA Andrew Slater - CIRES/NSIDC.

PRMS SACRAMENTO

ARNO/VICTOPMODELwltθ fldθ satθ

lzZ

uzZ

pe

S1T S1

F

S2T

S2F

A

S2F

B

qif

qbA

qbB

q12

wltθ fldθ satθ

lzZ

uzZ

e p

qsx

qb

q12

S2

S1

wltθ fldθ satθ

GFLWR lzZ

uzZ

S2

S1

e p

q12

qsx

qb

e p

wltθ satθ

S1TA

fldθ

lzZ

uzZqsx

qif

qb

S1TB S1

F

q12

S2

Clark, M.P., A.G. Slater, D.E. Rupp, R.A. Woods, J.A. Vrugt, H.V. Gupta, T. Wagener, and L.E. Hay (2008) Framework for Understanding Structural Errors (FUSE): A modular framework to diagnose differences between hydrological models. Water Resources Research, 44, W00B02, doi:10.1029/2007WR006735.

Build unique models: combination 3

78 UNIQUE HYDROLOGIC MODELSALL WITH DIFFERENT STRUCTURE

Page 12: Multi-Model or Post- processing: Pros and Cons Tom Hopson - NCAR Martyn Clark - NIWA Andrew Slater - CIRES/NSIDC.

Example: French Broad RiverBefore Calibration => underdispersive

Black curve shows observations; colors are ensemble

Page 13: Multi-Model or Post- processing: Pros and Cons Tom Hopson - NCAR Martyn Clark - NIWA Andrew Slater - CIRES/NSIDC.

Goal of “calibration” or “post-processing”P

roba

bilit

y

calibration

Discharge

Pro

babi

lity

Skillful ensemble post-processing should correct:• the “on average” bias• as well as under-representation of the 2nd moment of the empirical forecast PDF (i.e. corrected its “dispersion” or “spread”)• ensemble represents a random-draw from an (unknown) PDF of hydrologic uncertainty

goal: produce calibrated ensemble forecast that has as much information content (i.e. “narrow PDF”) as possible

“spread” or “dispersion”

“bias”obs

obs

ForecastPDF

ForecastPDF

Discharge

Page 14: Multi-Model or Post- processing: Pros and Cons Tom Hopson - NCAR Martyn Clark - NIWA Andrew Slater - CIRES/NSIDC.

Our approach: Quantile Regression (QR)

Benefits

1) Less sensitivity to outliers

2) Works with heteroscedastic errors

3) Optimally fit for each part of the PDF

4) “flat” rank histograms

Page 15: Multi-Model or Post- processing: Pros and Cons Tom Hopson - NCAR Martyn Clark - NIWA Andrew Slater - CIRES/NSIDC.

Calibration ProcedureUse QR to perform a fit on 78 quantiles individually (recall:

78 FUSE models simulations).

For each of quantile:

1) Perform a “climatological” fit to the data=> simulation always as good as “climatology”

2) Starting with full regressor set, iteratively select best subset using “forward step-wise cross-validation”

– Fitting done using QR– Selection done by:

a) Minimizing QR cost functionb) Satisfying the binomial distribution=> Verification measures directly inform

the model selection

3) 2nd pass: segregate forecasts into differing ranges of ensemble dispersion, and refit models => forcing skill-spread utility

Pro

babi

lity

Discharge

obs ModelPDF

Regressor set for each quantile:

1) - 78) All individual 78 model simulations79) ensemble mean80) ensemble standard deviation81) ranked ensemble member (sorted ensemble that corresponds to quantile being fit)

Page 16: Multi-Model or Post- processing: Pros and Cons Tom Hopson - NCAR Martyn Clark - NIWA Andrew Slater - CIRES/NSIDC.

Example: French Broad RiverBefore Calibration After Calibration

Black curve shows observations; colors are ensemble

Page 17: Multi-Model or Post- processing: Pros and Cons Tom Hopson - NCAR Martyn Clark - NIWA Andrew Slater - CIRES/NSIDC.

Raw versus Calibrated PDF’s

Blue is “raw” FUSE ensemble

Black is calibrated ensemble

Red is the observed value

Notice: shift in both “bias” and dispersion of final PDF

(also notice PDF asymmetries) obs

Page 18: Multi-Model or Post- processing: Pros and Cons Tom Hopson - NCAR Martyn Clark - NIWA Andrew Slater - CIRES/NSIDC.

Rank Histogram Comparisons

After quantile regression, rank histogram more uniform

(although now slightly over-dispersive)

Raw full ensemble After calibration

Page 19: Multi-Model or Post- processing: Pros and Cons Tom Hopson - NCAR Martyn Clark - NIWA Andrew Slater - CIRES/NSIDC.

Calibration: Rank Histogram and Skill-Spread Information

SSr =rfrcst −rrefrperf −rref

X100%

(reference forecast uncalibrated FUSE ensemble)

Rank Histogram Skill Score = 0.86

Skill-Spread Skill Score = 0.75

Page 20: Multi-Model or Post- processing: Pros and Cons Tom Hopson - NCAR Martyn Clark - NIWA Andrew Slater - CIRES/NSIDC.

Frequency Used forQuantile Fitting of Method I:

Best Model=76%Best Model=76%Ensemble StDev=13%Ensemble Mean=0%Ranked Ensemble=6%

What Nash-Sutcliffe implies about Utility

Page 21: Multi-Model or Post- processing: Pros and Cons Tom Hopson - NCAR Martyn Clark - NIWA Andrew Slater - CIRES/NSIDC.

Sequentially-averaged FUSE models (ranked based on NS Score) and their resultant NS Score

Notice the degredation of NS with increasing # (with a peak at 2 models)

For an equitable multi-model, NS should rise monotonically

Maybe a smaller subset of models would have more utility? (A contradiction for an under-dispersive ensemble?)

What Nash-Sutcliffe implies about Utility (cont)

-- degredation with increased ensemble size

Page 22: Multi-Model or Post- processing: Pros and Cons Tom Hopson - NCAR Martyn Clark - NIWA Andrew Slater - CIRES/NSIDC.

Initial Frequency Used forQuantile Fitting:

Best Model=76%Best Model=76%Ensemble StDev=13%Ensemble Mean=0%Ranked Ensemble=6%

What Nash-Sutcliffe implies about Utility (cont)

Reduced Set Frequency Used for Quantile Fitting:

Best Model=73%Best Model=73%Ensemble StDev=3%Ensemble Mean=32%Ranked Ensemble=29%

…using only top 1/3 of modelsTo rank and form ensemble mean …… earlier results …

Appears to be significant gains in the utility of the ensemble after “filtering” (except for drop in StDev) … however “proof is in the pudding” …Examine verification skill measures …

Page 23: Multi-Model or Post- processing: Pros and Cons Tom Hopson - NCAR Martyn Clark - NIWA Andrew Slater - CIRES/NSIDC.

Skill Score Comparisonsbetween full- and “filtered” FUSE

ensemble setsPoints:

-- quite similar results for a variety of skill scores

-- both approaches give appreciable benefit over the original raw multi-model output

-- however, only in the CRPSS is there improvement of the “filtered” ensemble set over the full set

post-processing method fairly robustMore work (more filtering?)!

GREEN -- full calibrated multi-modelBLUE -- “filtered” calibrated multi-modelReference -- uncalibrated FUSE set

Page 24: Multi-Model or Post- processing: Pros and Cons Tom Hopson - NCAR Martyn Clark - NIWA Andrew Slater - CIRES/NSIDC.

Calibration Procedure (cont)I. Method I (FUSE multi-model) regressor set for each 78 quantiles:

1) sorted associated ensemble member; 2) ensemble median; 3) ensemble standard deviation; 4) all 78 models

II. Method 2 (FUSE best member) regressor set for each 78 quantiles:

First step - use Logistic Regression (LR)– Calibrate CDF over prescribed set of climatological quantiles using FUSE best member

-- for each day: resample 78 member ensemble

1) best member; 2) persistence; 3) associated Logistic Regression ensemble member

Page 25: Multi-Model or Post- processing: Pros and Cons Tom Hopson - NCAR Martyn Clark - NIWA Andrew Slater - CIRES/NSIDC.

Calibration Procedure-- single best model

Use QR to perform a fit on 78 quantiles individually (recall: 78 FUSE models simulations).

For each of quantile:

1) Perform a “climatological” fit to the data=> simulation always as good as “climatology”

2) Starting with full regressor set, iteratively select best subset using “forward step-wise cross-validation”

– Fitting done using QR– Selection done by:

a) Minimizing QR cost functionb) Satisfying the binomial distribution=> Verification measures directly inform

the model selection

3) 2nd pass: segregate forecasts into differing ranges of ensemble dispersion, and refit models => forcing skill-spread utility

Pro

babi

lity

Discharge

obs ModelPDF

Regressor set for each quantile:

First step - use Logistic Regression

– Calibrate CDF over prescribed set of climatological quantiles using FUSE best member

-- for each day: resample 78 member ensemble

1) best member; 2) persistence; 3) associated Logistic Regression ensemble member

Page 26: Multi-Model or Post- processing: Pros and Cons Tom Hopson - NCAR Martyn Clark - NIWA Andrew Slater - CIRES/NSIDC.

Regression Variable Usagefor Best-member method)

Best Model 34%

Persistence 66%

Logistic regress q 86%

0 0.2 0.4 0.6 0.8 1

Precent Used

Log Reg

Persistence

Best Model

Regressors

Regressor Usage

Page 27: Multi-Model or Post- processing: Pros and Cons Tom Hopson - NCAR Martyn Clark - NIWA Andrew Slater - CIRES/NSIDC.

Example: French Broad RiverI. Multi-Model Method II. Best-member Approach

Black curve shows observations; colors are ensemble

Page 28: Multi-Model or Post- processing: Pros and Cons Tom Hopson - NCAR Martyn Clark - NIWA Andrew Slater - CIRES/NSIDC.

GREEN -- Multi-modelBLUE -- Best-ModelReference -- uncalibrated FUSE set

Skill Score Comparisonsbetween multi-model versus single best model calibrated ensembles

Points:

-- quite similar results for a variety of skill scores

-- both approaches give appreciable benefit over the original raw multi-model output

-- CRPSS shows an improvement of the single model over the full set

post-processing method fairly robust

Page 29: Multi-Model or Post- processing: Pros and Cons Tom Hopson - NCAR Martyn Clark - NIWA Andrew Slater - CIRES/NSIDC.

2 Questions revisited:1) How best to utilize a multi-model simulations (forecast),

especially if under-dispersive?a) Should more dynamical variability be searched for? Orb) Is it better to balance post-processing with multi-model utilization to

create a properly dispersive, informative ensemble?“Answer”: adding more models can lead to decreasing skill of the

ensemble mean (even if the ensemble is under-dispersive)

2) How significant are the gains of a multi-model approach over utilizing a single best model with a few “tricks”?

a) Blending two post-processing approaches (quantile regression and logistic regression)

b) Applying “poor-man’s” data assimilation“Answer”: quantile-regression-based calibration is fairly robust and can do

a lot with just a single model, especially if a variety of approaches are utililized

=> Benefit of moving beyond model ensemble calibration alone, to include “kitchen sink”, including “blending” of different techniques

Page 30: Multi-Model or Post- processing: Pros and Cons Tom Hopson - NCAR Martyn Clark - NIWA Andrew Slater - CIRES/NSIDC.

Summary

Quantile regression provides a powerful framework for improving the whole (potentially non-gaussian) PDF of an ensemble forecast

This framework provides an umbrella to blend together multiple statistical correction approaches (logistic regression, etc.)

“step-wise cross-validation” calibration approach used here ensures forecast skill greater than climatology and persistence (by utilizing both in the regression selection process)

The approach here improves a forecast’s ability to represent its own potential forecast error:

– More uniform rank histogram– Guaranteeing utility in the ensemble dispersion

(=> more spread, more uncertainty)