Community-Supported Data Repositories in Paleoecology and Paleoclimatology: The ‘Middle Tail’...


Transcript of Community-Supported Data Repositories in Paleoecology and Paleoclimatology: The ‘Middle Tail’...

Page 1: Community-Supported Data Repositories in Paleoecology and Paleoclimatology: The ‘Middle Tail’ between Geoscientific Users and Geoinformatics Neotoma DB.

Community-Supported Data Repositories in Paleoecology and Paleoclimatology: The ‘Middle Tail’ between Geoscientific Users and Geoinformatics

Neotoma DB (www.neotomadb.org) · C4P

Jack Williams, Allan Ashworth, Brian Bills, Jessica Blois, Don Charles, Simon Goring, Russ Graham, Eric Grimm, Alison Smith, & Mark Uhen

Part I: Building the Middle Tail: Community-Led Data Repositories

Part II: Interconnecting the Middle Tail: Cyberinfrastructure for the Paleogeosciences

Page 2

Many Big Questions require assembly of individual paleorecords into larger networks

Do global temperatures lead or lag CO2 during deglaciations?

Figure: Global temperatures & CO2, 22 ka to 0 ka (Shakun et al. 2012, Nature)

How far and fast can species migrate when climates change?

Figure: Spruce distributions from the last glacial maximum to present, mapped as spruce pollen (%) at 21,000, 15,000, 11,000, and 7,000 years ago and Modern, with ice and no-data areas marked (Williams et al. 2004, Ecological Monographs)
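Assembling individual records into a network usually starts by putting them on a common time axis. The Python sketch below is a minimal illustration under stated assumptions: each record is a pair of lists (ages in years before present, proxy values), each record is averaged within shared time bins, and the bins are then combined with an unweighted mean across records. The function names and toy numbers are hypothetical; a real compilation would also have to handle age uncertainty and uneven spatial coverage.

```python
from statistics import mean


# Hypothetical helper: summarize one record on a shared set of time bins.
def bin_record(ages, values, bin_edges):
    """Mean proxy value per bin [lo, hi); None where the record has no samples."""
    binned = []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = [v for a, v in zip(ages, values) if lo <= a < hi]
        binned.append(mean(in_bin) if in_bin else None)
    return binned


# Hypothetical helper: unweighted network mean, skipping records missing a bin.
def composite(records, bin_edges):
    per_record = [bin_record(ages, values, bin_edges) for ages, values in records]
    stack = []
    for i in range(len(bin_edges) - 1):
        present = [r[i] for r in per_record if r[i] is not None]
        stack.append(mean(present) if present else None)
    return stack


# Toy records (made-up numbers): (ages in years BP, proxy values)
record_a = ([500, 1500, 2500], [1.0, 2.0, 3.0])
record_b = ([800, 2200], [3.0, 5.0])
print(composite([record_a, record_b], [0, 1000, 2000, 3000]))  # [2.0, 2.0, 4.0]
```

Skipping missing bins rather than treating them as zero keeps gaps in one record from dragging the network mean down.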

Page 3

Paleoecological Data: Key Characteristics

• ‘Long Tail’: collected in the field by small scientific teams; scientists vary in data-management expertise, capacity, and interest

• Highly valuable: specimens & samples collected decades ago are still analyzed

• Distributed scientific expertise: by proxy type, region, time period, and/or taxonomic group

Figure: Data size vs. datasets, from “Big Data” to the “Long Tail”

Page 4

Solution: Community-Led Data Repositories (COLDARs) as ‘middle tail’ for long-tail data

Examples: Neotoma DB (www.neotomadb.org), Paleobiology DB (paleobiodb.org)

Key Characteristics

• Open Data

• Curated by Community

• Added Value by serving community-specific needs (e.g. age models, taxonomy)

Page 5

Moving up the Value Chain: Generic Depositories vs. Community-Led Repositories

Figure (modified from K. Lehnert): a value chain from small data to BIG DATA, along which data become findable (identification, persistence), accessible (authorization, protocols), interoperable (harmonized, community governance & input), and re-usable (context, provenance); generic depositories and community-led repositories are compared along this chain.

“… data have no value or meaning in isolation; they exist within a knowledge infrastructure — an ecology of people, practices, technologies, institutions, material objects, and relationships.” - C.L. Borgman

Page 6

Neotoma Paleoecology Database: Community-Led Repository for Quaternary and Pliocene Data

Design Concepts

• Spatiotemporal Database: species occurrences & abundances in space & time

• Age Controls and Age Models stored

• Centralized IT and Distributed Scientific Governance: Neotoma is composed of several constituent databases (e.g. North American Pollen Database, FAUNMAP)

• Open Data accessible via Neotoma Explorer, APIs, and the neotoma R package

• Broad User Community: paleoecologists, ecosystem modellers, paleoclimatologists, biogeographers, educators, …
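Because Neotoma stores age controls alongside age models, sample ages can be derived from dated depths. The sketch below shows the simplest such model, linear interpolation between controls, purely as an illustration: the function names, the units (depth in cm, age in years BP), and the choice of linear interpolation are assumptions, and published age models are often built with more elaborate methods.

```python
from bisect import bisect_right


def linear_age_model(control_depths, control_ages):
    """Build an age-depth function by linear interpolation between age controls.

    Assumes distinct control depths; depths outside the controlled range are
    rejected rather than extrapolated.
    """
    pairs = sorted(zip(control_depths, control_ages))
    depths = [d for d, _ in pairs]
    ages = [a for _, a in pairs]

    def age_at(depth):
        if not depths[0] <= depth <= depths[-1]:
            raise ValueError("depth lies outside the age controls")
        i = bisect_right(depths, depth) - 1  # index of the control at or above this depth
        if i == len(depths) - 1:  # exactly at the deepest control
            return ages[-1]
        d0, d1 = depths[i], depths[i + 1]
        a0, a1 = ages[i], ages[i + 1]
        return a0 + (a1 - a0) * (depth - d0) / (d1 - d0)

    return age_at


# Made-up controls: depths in cm, ages in years BP
age_at = linear_age_model([0, 100, 250], [0, 2000, 8000])
print(age_at(50), age_at(175))  # 1000.0 5000.0
```

Refusing to extrapolate is a deliberate choice here: ages below the deepest or above the shallowest control are poorly constrained by a linear model.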

Page 7

• Time: Late Neogene (~last 5 million years)

• Most records: 10^4 to 10^5 years

• Space: North America to global

• Paleoecological Data: plants & pollen, vertebrates, ostracodes, diatoms, insects, testate amoebae, physical sedimentology

Figure: Temporal Domains of Paleoecological Databases, showing the Neotoma domain (Brewer et al. 2012, TREE)

Page 8

Neotoma Uploads, Citations, and Usage

Figures: Recent uploads to Neotoma; Publications citing Neotoma & constituent DBs

Last updated: July 2015

2014 Usage Statistics

• Neotoma Explorer: 1,918 unique users

• Neotoma APIs: 1,562 unique users

• Neotoma APIs: 241,469 requests
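As a rough reading of the 2014 API figures, the average load per unique API user works out as follows (a mean only; the totals do not show how requests were spread across users):

```latex
\frac{241{,}469\ \text{requests}}{1{,}562\ \text{unique API users}} \approx 154.6\ \text{requests per user}
```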

Page 9

Neotoma Software Ecosystem (diagram distinguishing components that exist from those in development)

• Data Preparation & Submission: Tilia, Data Submission Web Application

• Data Search & Retrieval: Neotoma Explorer, APIs, neotoma (R)

• Data Exploration & Visualization: Ice Age Mapper, Niche Viewer, Stratigraphic Diagrams, Explorer

• Data Archival: Neotoma DB, Downloadable Database Snapshots
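Since the ecosystem exposes its data through APIs as well as the neotoma R package, a Python user can also query it with only the standard library. The sketch below builds a site-search URL and parses a JSON reply. The base URL, the "sitename" parameter with "%" as a wildcard, and the response shape (a "data" list of objects carrying "sitename") are assumptions modeled loosely on the Neotoma API and should be checked against its current documentation. No network request is made; the reply is a made-up example.

```python
import json
from urllib.parse import urlencode

# ASSUMPTION: endpoint path modeled on the Neotoma API; verify before use.
BASE_URL = "https://api.neotomadb.org/v2.0/data/sites"


def build_site_query(sitename):
    """Return a site-search URL; urlencode escapes '%' (assumed wildcard) as %25."""
    return BASE_URL + "?" + urlencode({"sitename": sitename})


def site_names(response_text):
    """Pull site names from a reply assumed to look like {"status": ..., "data": [...]}."""
    payload = json.loads(response_text)
    return [site["sitename"] for site in payload.get("data", [])]


url = build_site_query("%Lake%")
print(url)  # https://api.neotomadb.org/v2.0/data/sites?sitename=%25Lake%25

# Made-up reply used instead of a live request
fake_reply = '{"status": "success", "data": [{"siteid": 1, "sitename": "Lake A"}]}'
print(site_names(fake_reply))  # ['Lake A']
```

To fetch live data, the URL could be passed to urllib.request.urlopen and the decoded body handed to site_names, once the endpoint has been confirmed.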

Page 10

Neotoma Governance (Proposed)

Diagram components: Neotoma Leadership Council; Executive Team; Developer Team; Data Stewards; Users & Informaticists; Paleobiological Data Consortium; Training Workshops

• Executive Team: Grimm, Williams + 1 more

• Developer Team: Bills (lead), Anderson, Buckland, Davis, Goring, Grimm, Roth, Williams

• Data Steward groups: Diatoms, Insects, Middens, Pollen, Plant Macros, Testate Amoebae, Vertebrates, Biomarkers, Isotopes, Taphonomy, Ostracodes

• Names shown in the diagram: Grimm, Williams, Bills + 1 Developer & 3 Data Stewards; Graham, Blois, Davis, Barnosky, Colburn, Etnier, Jacisin, Maguire, Milideo, Smith, Warren; Josh Miller, Russ Graham; Bob Booth; Betancourt, Holmgren, Latorre, Rylander; Ashworth, Buckland, Punel; Alison Smith, Brandon Curry; Don Charles, Sonja Hausmann; Bob Booth; Suzanne Pilaar Birch, Chris Widja; Jon Nichols; Grimm, Bradshaw, Giesecke, Williams, Goring, Evans, Fletcher, Hopf, Markgraf, McGeever, Mitchell

Page 11

Next Challenge: Organizing and Interconnecting the Middle Tail

C4P CINERGI Catalog: 224 databases, 23 with geologic-time metadata

C4P CINERGI: http://pivots2.azurewebsites.net/c4p.html#pv-file-selection

Page 12

EarthCube RCN: Cyberinfrastructure for Paleogeosciences (C4P)

Goals

• Build new partnerships and collaborations among geoscientists and technologists

• Survey and catalog existing resources

• Share news of the latest advances in cyberscience and paleogeoinformatics

• Facilitate development of common standards and semantic frameworks

Page 13

EarthCube RCN: Cyberinfrastructure for Paleogeosciences (C4P)

Activities

• Webinars & YouTube Channel: https://www.youtube.com/user/cyber4paleo

• CINERGI Catalog of paleoresources (databases, software, etc.) http://earthcube.org/content/cinergi-c4p-resource-viewer

• Paleobiology Workshop (May 2014)

• Geochronology Workshop (Oct 2014)

• Early Career Workshops – GSA 2014, 2015

• New Initiatives: Paleobiological Data Consortium (Neotoma/PBDB/…), PBDB-iDigBio, Open Core Data (CDSCO/IEDA/Neotoma/…)

Page 14

PALEOBIOLOGICAL DATA CONSORTIUM

Diagram: participants spanning community geodata, open-source, and biodata resources: Paleobiology DB, NOW DB, Continental Scientific Drilling Office (CDSCO), Digimorph, NOAA Paleoclimatology, DarwinCore, iDigPaleo, MorphoBank, Neotoma DB, VertNet, Early Career Members-at-Large, ROpenSci, GBIF/BISON, STEPPE, Open Geospatial Consortium, Integrated Earth Data Alliance, iDigBio

• Share best practices & protocols

• Build compatibility between geo- & bioinformatics
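Building compatibility between geo- and bioinformatics often means mapping records onto shared biodiversity standards such as Darwin Core, which appears among the consortium's participants. A minimal sketch, assuming a hypothetical fossil-occurrence dictionary with id, taxon, lat, and lon keys: the output keys are Darwin Core terms, the input field names are invented for illustration, and the record's age is left out because mapping it would need further decisions.

```python
def to_darwin_core(occ):
    """Map a hypothetical fossil-occurrence dict onto a few Darwin Core terms."""
    lat, lon = occ["lat"], occ["lon"]
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError("coordinates out of range")
    return {
        "occurrenceID": occ["id"],
        "scientificName": occ["taxon"],
        "decimalLatitude": lat,
        "decimalLongitude": lon,
        "basisOfRecord": "FossilSpecimen",  # Darwin Core basisOfRecord value for fossils
    }


# Made-up example occurrence
record = to_darwin_core({"id": "example-1", "taxon": "Picea", "lat": 45.5, "lon": -93.2})
print(record["scientificName"], record["basisOfRecord"])  # Picea FossilSpecimen
```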

Page 15

Current & Future Neotoma, C4P, & PDC Activities

1. Data Uploads (Neotoma; e.g. MIOMAP, Mexican Quaternary Mammal DB, ongoing)

2. All Hands Neotoma Workshop at AGU (Neotoma; Dec 2015)

3. One-Stop Queries for Neotoma & Paleobio DBs (Harmonized APIs & R packages) (PDC, ongoing)

4. Hackathon for Paleobiological Data (C4P; Summer 2016, invitations TBD!)

5. New tools for data visualization & exploration (Neotoma Taxa Mapper & Niche Viewer)
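One-stop queries across Neotoma and the Paleobiology DB imply mapping both result sets onto one schema, including a common age unit. A minimal sketch, assuming Neotoma-style ages in years before present and PBDB-style ages in millions of years (Ma); the field names are illustrative and may not match either API, and a real harmonization would also have to reconcile age types (for example, radiocarbon versus calibrated years).

```python
YEARS_PER_MA = 1_000_000


def harmonize(neotoma_rows, pbdb_rows):
    """Merge two hypothetical result sets into one schema, with ages in years BP."""
    merged = []
    for r in neotoma_rows:  # ages assumed already in years before present
        merged.append({"source": "Neotoma", "taxon": r["taxon"],
                       "age_young_yr": r["age_young"], "age_old_yr": r["age_old"]})
    for r in pbdb_rows:  # ages assumed in millions of years (Ma)
        merged.append({"source": "PBDB", "taxon": r["taxon"],
                       "age_young_yr": r["min_ma"] * YEARS_PER_MA,
                       "age_old_yr": r["max_ma"] * YEARS_PER_MA})
    return sorted(merged, key=lambda row: row["age_young_yr"])


# Made-up rows
rows = harmonize(
    [{"taxon": "Mammut americanum", "age_young": 11000, "age_old": 13000}],
    [{"taxon": "Equus", "min_ma": 0.5, "max_ma": 1.5}],
)
for row in rows:
    print(row["source"], row["taxon"], row["age_young_yr"], row["age_old_yr"])
# Neotoma Mammut americanum 11000 13000
# PBDB Equus 500000.0 1500000.0
```

Converting everything to years rather than Ma keeps Neotoma's Quaternary ages readable, at the cost of large numbers for deep-time records.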


Page 16

Sounds great! What’s in it for me?

1. Interested in using Neotoma to archive your data and make it available to others?
• Catch me after the session
• Talk to a Data Steward
• WebEx training for new Stewards

2. Interested in using Neotoma & other paleobio resources?
• Neotoma Explorer walkthrough exercise: http://serc.carleton.edu/neotoma/activities.html
• neotoma (R) paper (Goring et al. 2015, Open Quaternary)
• User workshops: ESA 2016, IBS 2017
• Hackathon Summer 2016

3. Interested in integrating your resource (software/DBs) with Neotoma & other paleobio resources?
• Catch me after the session
• Hackathon Summer 2016

Page 17

This talk represents the work of many

Neotoma PIs & Developers: Eric C. Grimm, Russ Graham, Mike Anderson, Allan Ashworth, Brian Bills, Jessica Blois, Bob Booth, Ed Davis, Don Charles, Simon Goring, Steve Jackson, Alison Smith, Jack Williams

C4P RCN Steering Committee: Kerstin Lehnert, David Anderson, Doug Fils, Leslie Hsu, Chris Jenkins, Anders Noren, Tom Olsewski, Dena Smith, Mark Uhen, Jack Williams

Neotoma DB

NSF-Geoinformatics

NSF-Earth Cube

Eric Grimm

C4P

Paleobiological Data Consortium: Mark Uhen, Jack Williams, Brian Bills, Jessica Blois, Ed Davis, Simon Goring, Russ Graham, Michael McClennen, Shanan Peters, Alison Smith

NSF-Earth Cube

Paleobio Data Consortium