The CMIP multi model ensemble and IPCC: Lessons learned and questions arising


Page 1

The CMIP multi model ensemble and IPCC: Lessons learned and questions arising

Reto Knutti, Institute for Atmospheric and Climate Science, ETH Zurich, Switzerland ([email protected])


Page 2


Motivation

Sampling in a multi model ensemble

Combining and averaging models

Model independence

Weighting and metrics

Model tuning, evaluation and overconfidence

Conclusions

Contents

Page 3


(Figure: DJF temperature change in 2100 under scenario A1B, in K)

The Coupled Model Intercomparison Projects

A set of coordinated simulations from ~25 global climate models, with data freely available

Goals: understand model differences and processes, provide a basis for IPCC projections

Question: How do we synthesize information from multiple models?

Page 4


Interpreting the ensemble:
“Truth plus error”: the ensemble is a set of simulations where each model approximates the truth with some random error.
“Exchangeable/indistinguishable”: each model is exchangeable with the other members and with the real system. Observations are viewed as a single random draw from an imagined distribution of the space of all possible but equally credible climate models.

In the latter case the uncertainty is independent of the number of models.
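The distinction can be made concrete with a small synthetic sketch (Python; all numbers illustrative and not from the talk): under “truth plus error” the error of the ensemble mean shrinks roughly like 1/sqrt(n), whereas under the exchangeable view the expected distance between the ensemble mean and a single observational draw does not vanish as models are added.

# Synthetic sketch of the two interpretations; all numbers are illustrative.
import numpy as np

rng = np.random.default_rng(0)
spread, trials = 1.0, 20000

for n in (5, 25, 100):
    # "Truth plus error": truth = 0, each model = truth + independent random error,
    # so the ensemble-mean error shrinks roughly like 1/sqrt(n).
    err_tpe = np.abs(rng.normal(0.0, spread, (trials, n)).mean(axis=1)).mean()

    # "Exchangeable/indistinguishable": observations are one more draw from the
    # same distribution as the models, so the distance does not go to zero.
    obs = rng.normal(0.0, spread, trials)
    ens_mean = rng.normal(0.0, spread, (trials, n)).mean(axis=1)
    err_exch = np.abs(ens_mean - obs).mean()

    print(f"n={n:3d}  truth+error: {err_tpe:.3f}   exchangeable: {err_exch:.3f}")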

Model dependencies (e.g. INGV-ECHAM4 and MPI-ECHAM5)

Best efforts vs. perturbed physics ensembles vs. experimental versions
Climate vs. Earth system model versions
High and low resolution versions

“Old” model versions (CMIP3)

Understanding the CMIP ensembles

Page 5


What and how are we sampling?

Is B1 more uncertain than A2?

Are we sampling the uncertainties we know exist?

Page 6


Ideally: Design study, define requirements, assess driving factors and uncertainties, build model/ensemble, evaluate, simulate, interpret.

CMIP: Build a model (or several), evaluate it, run the simulations. Then ask a question that may be answered with whatever data is available.

CMIP is not designed to answer a specific research problem.

The CMIP5 specifications of the simulations are carefully designed, but the ensemble is still an ensemble of opportunity.

Designing a research project

Page 7


Multi model averages

We average models because we think a model average is “better” than a single model (but we don’t really defend this).

But is it really? Is this the best we can do?

(IPCC AR4, Fig. SPM7)

Page 8


Models improve, and averaging can help

(Reichler and Kim, BAMS 2008)

(Figure: model performance index, from better to worse)

Page 9


Models are not independent

Averaging is not very effective. Less than half of the temperature errors disappear for an average of an infinite number of models of the same quality.

Black dashed: sqrt(B/n + C), i.e. an error component that averages out with the number of models n plus a correlated component that does not.

Averaging should consider dependence.

(Figure legend: average of N models; average of best N models; 1/sqrt(N) reference line)

(Knutti et al., J. Climate 2010)
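A minimal synthetic sketch (Python; invented error magnitudes, not the paper’s calculation) of why the error of the multi-model mean follows something like sqrt(B/n + C) and flattens out, rather than falling as 1/sqrt(n), once part of the error is shared by all models:

# Toy model of averaging n models whose errors have a shared and an individual part.
import numpy as np

rng = np.random.default_rng(1)
n_grid, trials = 500, 200
shared_sd, indiv_sd = 0.7, 1.0   # illustrative error magnitudes (K)

for n in (1, 2, 4, 8, 16, 24):
    rmse = []
    for _ in range(trials):
        shared = rng.normal(0.0, shared_sd, n_grid)        # error common to all models
        indiv = rng.normal(0.0, indiv_sd, (n, n_grid))     # model-specific error
        mean_err = shared + indiv.mean(axis=0)             # error of the n-model mean
        rmse.append(np.sqrt(np.mean(mean_err ** 2)))
    expected = np.sqrt(indiv_sd ** 2 / n + shared_sd ** 2)  # the sqrt(B/n + C) curve
    print(f"n={n:2d}  simulated RMSE={np.mean(rmse):.3f}  sqrt(B/n+C)={expected:.3f}")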

Page 10


Most models show areas of strong drying, but the multi-model mean does not (see the sketch below).

Loss of signal by averaging

(Knutti et al., J. Climate 2010)
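A toy illustration (Python; synthetic precipitation-change numbers, not CMIP output) of how averaging models that place their strong drying in different regions leaves a multi-model mean with no strong drying anywhere:

# Five fictitious models: each dries strongly somewhere, but in a different place.
import numpy as np

n_grid = 10
models = []
for k in range(5):
    change = np.full(n_grid, 0.2)       # weak wetting everywhere (illustrative units)
    change[2 * k : 2 * k + 2] = -1.5    # strong drying, shifted region by region
    models.append(change)
models = np.array(models)

print("strongest drying in any single model:", models.min())
print("strongest drying in the multi-model mean:", round(models.mean(axis=0).min(), 2))
# Every model shows -1.5 somewhere; the mean never drops below about -0.14.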

Page 11


How do we define a metric for a “good” model?

US CCSP report 3.1: Aspects of observed climate that must be simulated to ensure reliable future predictions are unclear. For example, models that simulate the most realistic present-day temperatures for North America may not generate the most reliable projections of future temperature changes.

IPCC AR4 WGI FAQ 8.1: There is considerable confidence that climate models provide credible quantitative estimates of future climate change, particularly at continental scales and above. This confidence comes from the foundation of the models in accepted physical principles and from their ability to reproduce observed features of current climate and past climate changes.

Page 12


Metrics should be simple

Metrics should demonstrably be related to the prediction

Results should be understood in terms of known processes

Robust against slight variations in the definition of the metric and other external choices (e.g. forcing)

Observations available with uncertainties sufficiently small to discriminate between models

Assumptions must be made on how the metric translates into model weights (a sketch of one simple choice follows below).

A weighted ensemble is not a PDF. A statistical interpretation of the ensembles is required.

Metrics and weighting (I)
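As a concrete (hypothetical) illustration of the last two points, a minimal sketch of turning a performance metric into normalized model weights with a Gaussian kernel; the RMSE values, the projections and the scale parameter sigma_d are all invented, and sigma_d encodes exactly the kind of assumption the slide warns about:

# Illustrative sketch only, not a CMIP/IPCC procedure: Gaussian performance weighting.
import numpy as np

rmse = np.array([0.8, 1.0, 1.1, 1.6, 2.4])         # metric: e.g. RMSE vs. observations (K)
sigma_d = 1.0                                       # subjective scale: how sharply to weight

weights = np.exp(-(rmse / sigma_d) ** 2)
weights /= weights.sum()                            # normalize weights to sum to 1

projections = np.array([2.1, 2.6, 3.0, 3.4, 4.2])   # invented warming projections (K)
print("weights:", np.round(weights, 3))
print("weighted mean projection:", round(float(weights @ projections), 2), "K")
# The weighted ensemble is still not a PDF; interpreting it probabilistically
# requires an explicit statistical model of the ensemble.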

Page 13


Metrics and weighting (II)

Present-day mean climate provides a weak constraint on the future.

Are we looking at the wrong thing? Is adding complexity increasing the uncertainty?

Is the ensemble too narrow to begin with, so we can’t make progress? Have we used the information already in building the model?

(Knutti, Phil Trans Roy Soc 2008)

Page 14


Establishing confidence in a prediction

Unlike in weather prediction, the probability/confidence in future climate change projections cannot be established by repeated verification.

We cannot verify our prediction, but only test models indirectly. Which tests are most appropriate?

Page 15


End of model democracy?

www.ipcc.unibe.ch

Page 16


“There should be no minimum performance criteria for entry into the CMIP multi-model database.”

“Researchers may select a subset of models for a particular analysis but should document the reasons why.”

“In many cases it may be appropriate to consider simulations from CMIP3 and combine CMIP3 and CMIP5 recognizing differences in specifications (e.g., differences in forcing scenarios). IPCC assessments should consider the large amount of scientific work on CMIP3, in particular in cases where lack of time prevents an in depth analysis of CMIP5.”

End of model democracy?

www.ipcc.unibe.ch

Page 17


IPCC AR5: First order draft written by approx. mid 2011
July 2012: All papers submitted and available to IPCC authors
March 2013: Papers accepted or published

AR5 Timeline as listed on Earth System Grid (http://esg-pcmdi.llnl.gov/internal/timetables/ipcc-ar5-timetable)

Sep 2008: All required experiments defined
Late 2008: Modeling groups begin running benchmark experiments
2009: Modeling groups run models and produce output
Jan 2010: Model output starts to be made available to the community
The reality is that some modeling groups haven’t even started to run simulations…

How can we transfer, store and analyze 2 Petabytes of data?

Challenges: 2013 and 2,000,000,000,000,000

Page 18


Skill?

Dependence?

Meaning of an average?

Structural uncertainty?

Range covered?

The weather analogy


Page 19


Model sampling is neither systematic nor random; the implicit prior is arbitrary.

CMIP3 is a collection of ‘best guesses’ not designed to span any uncertainty range.

Interpretation of the ensemble is unclear (truth+error vs. indistinguishable).

Model performance varies but most observable metrics provide only a weak constraint on the future. We don’t know how to weight models but implicitly do it by discarding old models.

Model averaging may help but can create unphysical results.

Models are not independent and not distributed around the truth.

Models are developed, evaluated and weighted on the same data.

Time constraint of IPCC AR5 and amount of data are serious issues.

CMIP3/5 is an amazing playground for statisticians and climate scientists.

Conclusions and challenges