
Assessing Model Discrepancy Using a Multi-Model Ensemble

Jonathan Rougier (1), Michael Goldstein (2), Leanna House (3)

(1) Department of Mathematics, University of Bristol, UK

(2) Department of Mathematical Sciences, Durham University, UK

(3) Department of Statistics, Virginia Tech, Blacksburg VA, USA

Technical report available at http://www.maths.bris.ac.uk/~mazjcr/mme1.pdf

The current state of the art (AR4 WG1, ch 10)

Thinking about the discrepancy

The Best Input approach

Actual climate = f(x̃) ⊕ discrepancy

Observations = historic climate ⊕ measurement error

where x̃ is the best input, and f(x̃) is the climate model evaluated at its best input; ‘⊕’ means ‘plus independent’.

• The best input x̃ is uncertain: the model’s standard parameterisation is an estimate of x̃.

• The discrepancy is uncertain: we describe it in terms of a mean vector and a variance matrix.

Ignoring the discrepancy is equivalent to treating it as identically equal to zero. This does not reflect our judgement that the model is imperfect. Inferences made with a zero discrepancy must be suspect.
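A minimal numerical sketch of why a zero discrepancy is suspect. It assumes, for simplicity, that the model output at its best input is known, so that the '⊕' structure makes the discrepancy and measurement-error variances add. The function name and all numbers are hypothetical and chosen only for illustration.

```python
import numpy as np

def observation_moments(f_best, d_mean, d_var, e_var):
    """Mean and variance of observations z = f(x~) + d + e, with f(x~) treated as known.

    Because the terms are combined with 'plus independent', the variances add.
    """
    return f_best + d_mean, d_var + e_var

# Hypothetical two-component example (made-up numbers).
f_best = np.array([14.0, -3.0])        # model output at its best input
d_mean = np.zeros(2)                   # discrepancy mean vector
d_var = np.array([[4.0, 1.0],
                  [1.0, 9.0]])         # discrepancy variance matrix
e_var = np.diag([0.25, 0.25])          # measurement-error variance matrix

mean, var = observation_moments(f_best, d_mean, d_var, e_var)
_, var_zero_d = observation_moments(f_best, d_mean, np.zeros((2, 2)), e_var)

z = np.array([16.0, -6.0])             # hypothetical observations
err_with_d = (z - mean) / np.sqrt(np.diag(var))
err_zero_d = (z - mean) / np.sqrt(np.diag(var_zero_d))

print("standardised errors, discrepancy included:", err_with_d)
print("standardised errors, discrepancy set to zero:", err_zero_d)
```

With the discrepancy variance included, the standardised errors are of order one. Setting the discrepancy to zero makes the same observations look several standard deviations away from the model.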

Our multi-model ensemble (MME)

[Figure: surface temperature, 1995–1999]

• A collection of evaluations of different models (e.g. models from different research groups). Each model has its own parameters and is evaluated at its own standard parameterisation:

$$\mathrm{MME} = \{ f^{(1)}(\tilde{x}^{(1)}), \ldots, f^{(m)}(\tilde{x}^{(m)}) \} \equiv \{ \tilde{f}^{(1)}, \ldots, \tilde{f}^{(m)} \}$$

and each model has its own discrepancy, $\{ d^{(1)}, \ldots, d^{(m)} \}$.

• It might seem natural to treat the mean from the MME as a central estimate of actual climate, and the variance as a measure of uncertainty, but this makes no allowance for common sources of uncertainty in the discrepancy, i.e. that $\mathrm{Cov}(d^{(i)}, d^{(j)}) \neq 0$.

Examples: common sub-modules (including common code), similar solver resolution, similar parameter values (peer pressure).
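The effect of a common source of uncertainty on the ensemble mean can be sketched as follows. This is an illustration only: it assumes scalar discrepancies with a common variance and a common pairwise covariance, and the function name and numbers are hypothetical.

```python
import numpy as np

def var_of_ensemble_mean(m, var_each, cov_between):
    """Variance of the average of m scalar discrepancies that share a
    common variance (var_each) and a common pairwise covariance (cov_between)."""
    V = np.full((m, m), cov_between, dtype=float)
    np.fill_diagonal(V, var_each)
    w = np.full(m, 1.0 / m)            # equal weights: the ensemble mean
    return float(w @ V @ w)

for m in (5, 20, 100):
    independent = var_of_ensemble_mean(m, 1.0, 0.0)
    correlated = var_of_ensemble_mean(m, 1.0, 0.4)
    print(m, round(independent, 4), round(correlated, 4))
```

Under independence the variance of the ensemble mean shrinks like 1/m. With a non-zero covariance it levels off near the covariance itself, however many models are added.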

Second-order exchangeability of the MME

SOE implements the following principle:

All of the models in the MME are equally informative about actual climate, in the sense that if we had to pick any pair of models for inference about actual climate then we would be indifferent between all possible pairs.

This is a qualitative judgement based on the model meta-information, i.e. made before a detailed inspection of the model-evaluations. It excludes:

1. Models that are outliers, either too good or too bad;

2. Models that are duplicates of other models.

• For our MME, we exclude BCC-CM1, from the Beijing Climate Center, and GFDL-CM2.1, GISS-EH, GISS-ER, CCS-M3, UKMO-HadGEM1 (duplicates), which leaves us with 14 models.

Statistical modelling implications

• If our MME is judged to be SOE, then we can write the evaluations as

$$\tilde{f}^{(j)} = M(f) \oplus R_j(f), \qquad j = 1, \ldots, m$$

where $M(f)$ can be thought of as the representative model, and $R_j(f)$ is an uncorrelated-with-everything residual.

• This implies a similar representation for the discrepancies:

$$d^{(j)} = M(d) \oplus R_j(d), \qquad j = 1, \ldots, m$$

and the two crucial quantities that we must specify are the two variances, $\mathrm{Var}(M(d))$ and $\mathrm{Var}(R_j(d))$ (which is the same for all $j$).

• Unsurprisingly (?!) we can use the ensemble variance as an estimate of $\mathrm{Var}(R_j(d))$.
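A small simulation can illustrate why the ensemble variance informs the residual variance but says nothing about the variance of the common term. This sketch assumes scalar quantities and Gaussian draws, which go beyond the second-order judgements SOE actually requires; the variable names and values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n_rep = 14, 20_000                  # ensemble size, number of simulated ensembles
var_M, var_R = 3.0, 2.0                # assumed variances of common term and residuals

# Each row is one simulated ensemble: a shared term plus m uncorrelated residuals.
M = rng.normal(0.0, np.sqrt(var_M), size=(n_rep, 1))
R = rng.normal(0.0, np.sqrt(var_R), size=(n_rep, m))
f = M + R

# Average within-ensemble sample variance: close to var_R, unaffected by var_M.
within_var = f.var(axis=1, ddof=1).mean()

# Variance of the ensemble mean across simulations: close to var_M + var_R / m.
between_var = f.mean(axis=1).var(ddof=1)

print(within_var, between_var)
```

The shared term cancels when computing the spread within one ensemble, which is why a separate judgement is needed for the variance of the common term.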

Statistical modelling implications (cont)

With our SOE framework, for $i \neq j$,

$$\mathrm{Cov}(d^{(i)}, d^{(j)}) \neq 0 \iff \mathrm{Var}(M(d)) \neq 0.$$

We can adopt the following principle for $\mathrm{Var}(M(d))$:

Model disagreement

When the individual models disagree on some component, then the models taken together are judged less accurate about that component.

This implies, crudely, that $\mathrm{Var}(M(d)) \propto \mathrm{Var}(R_j(d))$, where we get to choose the constant(s) of proportionality.

Small print: in both cases, i.e. both $\mathrm{Var}(R_j(d))$ and $\mathrm{Var}(M(d))$, we have to be a bit more subtle than this, and perhaps the devil is in the details! The details are in the paper.
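The covariance structure implied by the SOE representation can be written out explicitly. The sketch below stacks the m discrepancy vectors and builds their joint variance matrix, using the crude proportionality rule with a single constant `kappa`. The function name, `kappa`, and the numbers are hypothetical, and the paper's actual specification is more subtle than this.

```python
import numpy as np

def soe_joint_variance(var_M, var_R, m):
    """Joint variance of the stacked vector (d1, ..., dm) when dj = M + Rj,
    with the Rj uncorrelated with M and with each other."""
    shared = np.kron(np.ones((m, m)), var_M)   # Var(M) appears in every block
    own = np.kron(np.eye(m), var_R)            # Var(R) appears in diagonal blocks only
    return shared + own

var_R = np.array([[1.0, 0.3],
                  [0.3, 4.0]])                 # residual variance (made-up numbers)
kappa = 0.5                                    # assumed constant of proportionality
var_M = kappa * var_R

V = soe_joint_variance(var_M, var_R, m=3)
cross_block = V[0:2, 2:4]                      # Cov(d1, d2)
diag_block = V[0:2, 0:2]                       # Var(d1)
print(cross_block)
print(diag_block)
```

Every off-diagonal block equals the variance of the common term, so the cross-model covariance vanishes exactly when that variance is zero.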

Observations, and modelling validation

DJF surface temperature, 1995–1999, aggregated to 5° grid cells.

[Figures: four global maps, longitude −180 to 180, latitude −85 to 85. Panels: (1) Observations, including missing values (deg Celsius), scale −50 to 50; (2) Standardised marginal prediction errors, scale −20 to 20; (3) Standardised joint prediction errors, scale −20 to 20; (4) Cross-model correlations, scale 0 to 0.6.]
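Standardised prediction errors of the kind named in the panels above can be computed along the following lines. This is a sketch under stated assumptions: the joint errors here use a Cholesky factor of the predictive variance, which is one common decorrelation and not necessarily the one used in the paper, and missing observations are simply dropped. The function name and numbers are hypothetical.

```python
import numpy as np

def standardised_errors(z, mean, var):
    """Marginal and joint standardised prediction errors for the observed components.

    Missing observations are encoded as NaN and excluded.
    """
    obs = ~np.isnan(z)
    resid = z[obs] - mean[obs]
    var_obs = var[np.ix_(obs, obs)]

    # Marginal: each residual divided by its own predictive standard deviation.
    marginal = resid / np.sqrt(np.diag(var_obs))

    # Joint: residuals decorrelated with the lower Cholesky factor of the variance.
    chol = np.linalg.cholesky(var_obs)
    joint = np.linalg.solve(chol, resid)
    return marginal, joint

mean = np.zeros(3)
var = np.array([[4.0, 2.0, 0.0],
                [2.0, 9.0, 3.0],
                [0.0, 3.0, 16.0]])
z = np.array([2.0, 3.0, np.nan])               # third grid cell unobserved

marginal, joint = standardised_errors(z, mean, var)
print(marginal)
print(joint)
```

The sum of squared joint errors equals the Mahalanobis distance of the observed residuals, so the joint errors account for correlations that the marginal errors ignore.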

Discrepancy: Adjusted mean and variance

DJF surface temperature, 1995–1999, aggregated to 5° grid cells.

[Figures: global maps, longitude −180 to 180, latitude −85 to 85. Panels: (1) Adjusted discrepancy mean (deg Celsius), scale −10 to 10; (2) Initial discrepancy standard deviation (deg Celsius), scale 0 to 11; (3) Adjusted discrepancy standard deviation (deg Celsius), scale 0 to 11; (4) First to fourth eigenvectors, accounting for 14%, 9%, 6% and 5% of the total trace respectively.]
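The "percentage of total trace" quoted for each eigenvector can be obtained from an eigendecomposition of a variance matrix. A minimal sketch, assuming a symmetric variance matrix is available as a NumPy array; the function name and the toy matrix are hypothetical.

```python
import numpy as np

def leading_modes(var, k):
    """Return the k leading eigenvectors of a symmetric variance matrix and the
    share of the total trace (sum of eigenvalues) that each accounts for."""
    eigvals, eigvecs = np.linalg.eigh(var)     # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]      # indices of the k largest
    share = eigvals[order] / np.trace(var)
    return eigvecs[:, order], share

var = np.array([[2.0, 1.0],
                [1.0, 2.0]])                   # toy variance matrix
vecs, share = leading_modes(var, k=2)
print(share)
```

The sign of each eigenvector is arbitrary, so maps of eigenvectors should be read as spatial patterns and not as signed anomalies.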

Summary

• The discrepancy between model-output and actual climate is an uncertain quantity that can never be completely observed, and so all model-based climate inference ought to be statistical inference.

• Multi-model ensembles (MMEs) and actual climate observations both contain relevant information about the discrepancy. But there is no ‘automatic’ method for combining them into an estimate: judgements will be required!

• Statistical models provide a framework within which our judgements about the discrepancy can be validated and improved. ‘Independence’ in the MME is not a tenable statistical model, but Second Order Exchangeability (SOE) allows us to incorporate shared sources of model-error.

• The Bayes linear framework within which SOE is implemented is invariant to the number of members of the MME and scales well (polynomial, O(n^3)) with the number of components in the discrepancy vector.
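For readers unfamiliar with Bayes linear methods, the standard adjustment of a mean and variance by observed data is sketched below. This is the generic textbook update, not the specific calculation in the paper; the function and argument names are hypothetical. The linear solve against the data variance matrix is the step with cubic cost.

```python
import numpy as np

def bayes_linear_adjust(prior_mean, prior_var, z, z_mean, z_var, cov_dz):
    """Bayes linear adjustment of a quantity d by data z.

    adjusted mean = E(d) + Cov(d, z) Var(z)^{-1} (z - E(z))
    adjusted var  = Var(d) - Cov(d, z) Var(z)^{-1} Cov(z, d)
    """
    # Solve Var(z) X = Cov(z, d) instead of forming an explicit inverse.
    solved = np.linalg.solve(z_var, cov_dz.T)
    adj_mean = prior_mean + solved.T @ (z - z_mean)
    adj_var = prior_var - cov_dz @ solved
    return adj_mean, adj_var

# Toy example: z = d + e with e uncorrelated with d (made-up numbers).
prior_mean = np.zeros(2)
prior_var = np.diag([4.0, 1.0])
e_var = np.diag([1.0, 1.0])
z_var = prior_var + e_var                      # Var(z) = Var(d) + Var(e)
cov_dz = prior_var                             # Cov(d, z) = Var(d)
z = np.array([2.5, -1.0])

adj_mean, adj_var = bayes_linear_adjust(prior_mean, prior_var, z, prior_mean, z_var, cov_dz)
print(adj_mean)
print(adj_var)
```

Only means, variances and covariances are needed as inputs, which is why the approach does not require a full probability distribution for the discrepancy.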