A SEMANTIC DATABASE OF TEMPERATURE PROXIES COVERING THE COMMON ERA
Julien Emile-Geay (1), Nicholas P. McKay (2), Jianghao Wang (1), Darrell Kaufman (2) & PAGES2k consortium
Abstract—Reconstructions of surface temperature over the past 2,000 years extend our knowledge of climate system behavior beyond the instrumental era, helping to distinguish between exogenous and endogenous sources of climate variability, a fundamental frontier of climate science. In this study, we describe the latest incarnation of the PAGES 2k global multi-proxy database, a community-curated pool of paleoclimate records. The database is structured as Linked Open Data using a JSON-LD container, allowing for semantic relations to be discovered between its objects and other Linked Data. We describe elementary statistical analyses possible with this new data resource, present a reconstruction of global surface temperature via Markov random fields, and encourage experimentation via other forms of machine learning and artificial intelligence.
I. MOTIVATION
Low-frequency climate variability is crucial to adaptation and planning, and can only be adequately constrained by paleoclimate observations. Today, however, the vast majority of paleoclimate datasets are in disparate formats, incommensurate with each other and with climate model output. This fundamentally limits our ability to use them for the validation of Earth system models and, hence, for effective decision-making. It is therefore essential to bring all relevant observations into a consistent format. Two ingredients have made this possible:
1) PAGES2k (see footnote 1), a community-driven effort to synthesize all publicly archived, temperature-sensitive proxy records of the past 2,000 years [1].
2) LiPD, a new container designed to make paleoclimate data intelligible to machines [2].
The Linked Paleo Data (LiPD) format uses a JSON-LD framework to flexibly annotate metadata, explicating domain-specific definitions via a context file. The data themselves are stored in .csv files.
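As a rough illustration of this layout (and not the actual LiPD schema), the Python sketch below pairs a JSON-LD style metadata record with a .csv file of measurements. Only the "@context" keyword comes from JSON-LD itself; every other key, the context file name, and the helper functions are hypothetical placeholders chosen for the example.

```python
import csv
import json
import os
import tempfile

def write_record(directory, name, years, values):
    """Write a toy record: values in a .csv file, metadata in a JSON-LD style file."""
    csv_name = name + ".csv"
    with open(os.path.join(directory, csv_name), "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["year", "value"])
        writer.writerows(zip(years, values))

    # Hypothetical metadata. "@context" is the JSON-LD keyword that points to
    # the file defining the vocabulary; none of the other keys are taken from LiPD.
    metadata = {
        "@context": "context.jsonld",  # placeholder context file
        "dataSetName": name,
        "archiveType": "coral",
        "measurementTable": {"filename": csv_name, "columns": ["year", "value"]},
    }
    json_path = os.path.join(directory, name + ".jsonld")
    with open(json_path, "w") as f:
        json.dump(metadata, f, indent=2)
    return json_path

def read_record(json_path):
    """Load the metadata, then follow its pointer to the .csv data."""
    with open(json_path) as f:
        metadata = json.load(f)
    csv_path = os.path.join(os.path.dirname(json_path),
                            metadata["measurementTable"]["filename"])
    with open(csv_path, newline="") as f:
        rows = [(int(r["year"]), float(r["value"])) for r in csv.DictReader(f)]
    return metadata, rows
```

Keeping the numbers in plain .csv and the annotations in a separate file means the data stay readable by any tool, while the metadata can be extended without touching the measurements.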
(1) Department of Earth Sciences, University of Southern California, Los Angeles, CA. Corresponding author: J. Emile-Geay.
(2) School of Earth Sciences and Environmental Sustainability, Northern Arizona University, Flagstaff, AZ.
Footnote 1: http://www.pages-igbp.org/ini/wg/2k-network/data
[Figure 1. Map (80°S to 80°N) of the PAGES 2K network (Phase 2) as of 2015/07/21 (724 records from 667 sites), by archive type: bivalve, borehole, coral, historic, hybrid, ice core, lake sediment, marine sediment, sclerosponge, speleothem, tree. "Temporal Availability" panel: number of proxies vs. Year (CE), with an inset for the First Millennium.]
Fig. 1. Spatiotemporal data availability in the PAGES2k database
II. DATABASE SYNOPSIS
The database presently contains 724 records from 667 locations around the globe, spanning all or part of the past 2,000 years, with resolution going from monthly to centennial (Fig. 1). There are 11 proxy types and dozens of measurement types. Records were selected by many volunteers worldwide, based on the following criteria:
Relation to temperature. The dataset includes proxy records for which statistical or mechanistic evidence of a relation to temperature exists.
Duration. ≥ 500 y for non-annually resolved archives, 300 y for terrestrial archives, 50 y for annual marine archives.
Chronological accuracy. For non-annually resolved records, primary chronological information was archived to enable age modeling.
Resolution. ≥ 1 data point every 50 years on average (except marine sediments, for which 200 years is the maximum average sample interval).
[Figure 2. Map (80°S to 80°N) of the screened PAGES2k network (fdr, 252 records from 245 sites), by archive type: bivalve, coral, historic, ice core, lake sediment, marine sediment, sclerosponge, speleothem, tree. "Temporal Availability" panel: number of proxies vs. Year (CE), with an inset for the First Millennium.]
Fig. 2. Screening while controlling for the false discovery rate.
Public Domain. The records are publicly available and citable.
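The Duration and Resolution criteria of this section lend themselves to a simple automated check. The sketch below is one possible reading, not the consortium's procedure: it assumes the 300 y threshold applies to annually resolved terrestrial archives, that both thresholds are minima on the record span, and that resolution is measured as the mean spacing between consecutive samples. The function name and its arguments are hypothetical.

```python
def meets_criteria(years, archive, annual, marine):
    """Return True if a record passes the duration and resolution thresholds.

    years   : sorted list of sample years (CE)
    archive : archive type, e.g. "tree" or "marine sediment"
    annual  : True if the record is annually resolved
    marine  : True for marine archives
    """
    if len(years) < 2:
        return False
    span = years[-1] - years[0]

    # Minimum duration (assumed reading of the Duration criterion)
    if not annual:
        min_span = 500
    elif marine:
        min_span = 50
    else:
        min_span = 300

    # Mean spacing between consecutive samples, in years
    mean_step = span / (len(years) - 1)
    max_step = 200 if archive == "marine sediment" else 50

    return span >= min_span and mean_step <= max_step
```

For instance, a century-long annual coral record passes, whereas a 200-year annual tree-ring record fails the duration test under this reading.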
III. DISCOVERING TEMPERATURE RELATIONS
A natural question is the extent to which these proxy records capture large-scale temperature information. To address it, we evaluated correlations between records (with more than 20 available observations over the 1850-2010 interval) and the HadCRUT4.2 temperature dataset [3] on a 5° × 5° grid, whose missing values were infilled via the GraphEM algorithm [4]. Because there are only O(160) years of instrumental data and 2592 grid points, this is a “large p, small n” problem. Moreover, climate timeseries feature a “warm-colored” spectrum, which invalidates many statistical assumptions (e.g. IID observations). We thus evaluate the significance of mean annual temperature correlations against isospectral Monte Carlo surrogates [5], controlling for false discoveries [6]. 252 records pass this screening test (Fig. 2).
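The screening logic can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the code used for Fig. 2: surrogates are built by randomizing the Fourier phases of the proxy series (which preserves its power spectrum), and false discoveries are controlled with the Benjamini-Hochberg step-up rule, a standard FDR procedure. The function names, the number of surrogates and the level q = 0.05 are all illustrative choices.

```python
import numpy as np

def phase_randomize(x, rng):
    """Surrogate with the same power spectrum as x but random Fourier phases."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    spec = np.fft.rfft(x - x.mean())
    phases = rng.uniform(0.0, 2.0 * np.pi, size=spec.shape)
    phases[0] = 0.0                  # zero-frequency term stays real
    if n % 2 == 0:
        phases[-1] = 0.0             # Nyquist term must stay real
    return np.fft.irfft(np.abs(spec) * np.exp(1j * phases), n=n) + x.mean()

def surrogate_pvalue(proxy, temp, n_sim=1000, seed=0):
    """Two-sided p-value of corr(proxy, temp) against isospectral surrogates."""
    rng = np.random.default_rng(seed)
    r_obs = abs(np.corrcoef(proxy, temp)[0, 1])
    r_sim = np.array([abs(np.corrcoef(phase_randomize(proxy, rng), temp)[0, 1])
                      for _ in range(n_sim)])
    return (1 + np.sum(r_sim >= r_obs)) / (n_sim + 1)

def fdr_mask(pvalues, q=0.05):
    """Benjamini-Hochberg: boolean mask of tests retained at FDR level q."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    below = p[order] <= q * np.arange(1, m + 1) / m
    keep = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])   # largest rank meeting the bound
        keep[order[:k + 1]] = True
    return keep
```

In use, one would compute a surrogate p-value for each proxy-temperature pair and pass the whole collection to `fdr_mask`; the records whose tests survive form the screened network.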
IV. A PAGES2K RECONSTRUCTION OF GLOBAL SURFACE TEMPERATURE
Fitting this screened dataset against HadCRUT4.2 mean annual temperature using a Gaussian graphical model [4] with sparsity induced by the graphical lasso [7], we reconstructed global surface temperature over the past 2,000 years. The global mean is shown in Fig. 3, together with uncertainties. The dataset can be used to diagnose the response to volcanic and solar forcing, gauge the unusual character of twentieth century warming, or probe the continuum of climate variability.
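For orientation, the graphical lasso of [7] estimates a sparse inverse covariance (precision) matrix from a sample covariance matrix by maximizing an L1-penalized Gaussian log-likelihood. The symbols below (precision matrix \Theta, sample covariance S, penalty \rho) follow common usage and are not taken from the text above; how this step enters the GraphEM algorithm is described in [4].

```latex
\hat{\Theta} \;=\; \arg\max_{\Theta \succ 0} \;
  \log\det\Theta \;-\; \operatorname{tr}(S\,\Theta) \;-\; \rho\,\lVert \Theta \rVert_{1}
```

A zero entry of \hat{\Theta} means the corresponding pair of variables is conditionally independent given all others, so the nonzero pattern defines the graph of the Gaussian graphical model; a larger \rho yields a sparser graph.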
V. DISCUSSION
These applications only scratch the surface of what this database enables. We envision its use for fingerprinting natural climate forcings, evaluating global climate
[Figure 3. Global mean temperature (20-year lowpass): temperature anomaly vs. Year (CE), 200 to 2000. Series: reconstruction using hybrid GraphEM; HadCRUT4. Annotated statistics: RE/CE = +0.90, R2 = +0.90, MSE = +0.00.]
Fig. 3. Global mean temperature reconstruction using the PAGES2k high-resolution records and the GraphEM [4] algorithm.
models, establishing relations to other climate fields (e.g. drought indices) or to (pre-)historical events. Once published [8], the data will be available as Linked Open Data, allowing for web-based discovery and linkages to other datasets.
ACKNOWLEDGMENTS
Funding for the authors was provided by NSF grants AGS-1003818, EAR-1347221 and ICER-1541029.
REFERENCES
[1] D. S. Kaufman & PAGES 2K Consortium, “A community-driven framework for climate reconstructions,” Eos, Transactions American Geophysical Union, vol. 95, no. 40, pp. 361–368, 2014.
[2] N. P. McKay and J. Emile-Geay, “The Linked Paleo Data framework: a common tongue for paleoclimatology.” https://www.authorea.com/users/17200/articles/19163/ show article, March 2015.
[3] C. P. Morice, J. J. Kennedy, N. A. Rayner, and P. D. Jones, “Quantifying uncertainties in global and regional temperature change using an ensemble of observational estimates: The HadCRUT4 data set,” Journal of Geophysical Research: Atmospheres, vol. 117, no. D8, 2012.
[4] D. Guillot, B. Rajaratnam, and J. Emile-Geay, “Statistical paleoclimate reconstructions via Markov random fields,” Ann. Appl. Statist., pp. 324–352, 2015.
[5] W. Ebisuzaki, “A method to estimate the statistical significance of a correlation when the data are serially correlated,” Journal of Climate, vol. 10, pp. 2147–2153, 1997.
[6] V. Ventura, C. J. Paciorek, and J. S. Risbey, “Controlling the Proportion of Falsely Rejected Hypotheses when Conducting Multiple Tests with Climatological Data,” Journal of Climate, vol. 17, pp. 4343–4356, 2004.
[7] J. Friedman, T. Hastie, and R. Tibshirani, “Sparse inverse covariance estimation with the graphical lasso,” Biostatistics, vol. 9, no. 3, pp. 432–441, 2008.
[8] PAGES2K Consortium, “A global multiproxy database for temperature reconstructions of the Common Era,” Scientific Data, in prep.