Big Data Initiatives for Agroecosystems

Big Data Initiatives for Agroecosystems. Cynthia Parr, Knowledge Services Division, National Agricultural Library. Ecological Society of America, 2015.

Transcript of Big Data Initiatives for Agroecosystems

Page 1: Big Data Initiatives for Agroecosystems

Big Data Initiatives for Agroecosystems

Cynthia Parr
Knowledge Services Division
National Agricultural Library

Ecological Society of America, 2015

Page 2

Outline

• Data management at the National Agricultural Library

• Four examples
  1. Insects 5K – i5K Workspace
  2. Life Cycle Assessment
  3. Long-Term Agroecosystem Research
  4. Ag Data Commons

• General principles

8.1 million items, Agricola, PubAg

Page 3

http://blog.thingarage.com/

raw data

citable publication

Page 4


raw data collection

cleaning, enrichment, analysis

registration, preservation

temporary data

referable data

citable data

citable publication

Modified from Peter Wittenberg, Research Data Alliance, https://rd-alliance.org/group/data-fabric-ig.html
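The diagram can be read as a progression in which data move from temporary to referable to citable before supporting a citable publication. The Python sketch below models that reading as an ordered state machine. The one-step-at-a-time ordering, the rule that data become referable only once registered under an identifier, and the names `DataState`, `DataObject`, and `advance` are assumptions for illustration, not part of the Research Data Alliance model.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List, Optional


class DataState(IntEnum):
    """Data states named on the slide, in the order the diagram suggests (assumed)."""
    TEMPORARY = 1
    REFERABLE = 2
    CITABLE_DATA = 3
    CITABLE_PUBLICATION = 4


@dataclass
class DataObject:
    name: str
    state: DataState = DataState.TEMPORARY
    identifier: Optional[str] = None  # hypothetical persistent identifier
    history: List[DataState] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Record the starting state so the history is complete.
        self.history.append(self.state)

    def advance(self, identifier: Optional[str] = None) -> DataState:
        """Move exactly one state forward; registration cannot be skipped."""
        if self.state is DataState.CITABLE_PUBLICATION:
            raise ValueError(f"{self.name} is already a citable publication")
        next_state = DataState(self.state + 1)
        # Assumption: data become referable only once registered under an identifier.
        if next_state >= DataState.REFERABLE and self.identifier is None:
            if identifier is None:
                raise ValueError("registration requires an identifier")
            self.identifier = identifier
        self.state = next_state
        self.history.append(next_state)
        return next_state


obj = DataObject("plot-yields")            # hypothetical dataset name
obj.advance(identifier="example-id-0001")  # registration: temporary -> referable
obj.advance()                              # referable -> citable data
obj.advance()                              # citable data -> citable publication
print(obj.state.name)                      # CITABLE_PUBLICATION
```

Modeling the states as an ordered enum makes "skipping" a stage impossible by construction, which mirrors the point of the slide: data cannot be cited until they have first been registered and preserved.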

Page 5

i5k.nal.usda.gov


Page 6

Genome project hosting at the i5k Workspace

• 27 pilot genomes hosted; 45 total
  – Storage and dissemination of a genome assembly and anything mapped to it
  – BLAST, JBrowse Genome Browser

• Manual Curation: Web Apollo

• Post-curation maintenance
  – Quality Control
  – Official Gene Set generation

Genome Project Trajectory

• Research plan
• Generate material
• Sequencing
• Assembly
• Automated annotation
• Manual Curation
• Official gene set generation
• Genome project maintenance
• Biological insights/Publication

Page 7

Life Cycle Assessment Commons


www.lcacommons.gov

Page 8

Unformatted, non-standard

LCA Commons Concept

LCA Community

Open LCA Framework: common computing environment, application, data standards, and development

NAL LCADC

NREL USLCI

XYZ LCI DB

ABC LCI DB

Distributed computing environment & application

Common data standards

Distributed computing environment

DEF LCI DB

Common application & data standards

Interoperability Tools

Ag Data Commons

Catalog and Repository

Page 9

Long-Term Agroecosystem Research (LTAR)

Page 10

LTAR Data

Common Observatory
  – Meteorology
  – Hydrology
  – Eddy flux CO2
  – Non-CO2 gases
  – Soil
  – Biological


Common Experiment Approach
  – Business as usual
  – Aspirational

Will include data about
  – Management practices
  – Results

Page 11

LTAR Data Loss

N = 194 of ~500 citations in 2011 LTAR site proposals

Bad links to data

No data available

80% of papers provide no way to obtain data

Data are accessible

Refers to general data source

Page 12

LTAR information management

• Support for download of files and for web services
• Metadata in FGDC CSDGM, ISO 19115, EML, and Project Open Data
• Catalog of instrument specs using SensorML 2
• Data dictionaries in ISO 19110
• Weather data to be converted to other formats
• Field names could be converted to match different conventions (AgMIP, etc.)
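Converting field names to another community's conventions amounts to applying a crosswalk table to each record. The Python sketch below shows one way to do that. The crosswalk name `example_convention`, every column name in it, and the sample row are hypothetical placeholders, not actual LTAR or AgMIP variable names; `rename_fields` is likewise an illustrative helper.

```python
from typing import Dict, List, Mapping

# Hypothetical crosswalk: local column names -> names in a target convention.
# The target names are placeholders, NOT actual AgMIP variable names.
CROSSWALKS: Dict[str, Dict[str, str]] = {
    "example_convention": {
        "AirTempMax_C": "air_temp_max",
        "AirTempMin_C": "air_temp_min",
        "Precip_mm": "precipitation",
    },
}


def rename_fields(
    rows: List[Mapping[str, object]],
    convention: str,
    strict: bool = False,
) -> List[Dict[str, object]]:
    """Return copies of rows with field names translated to `convention`.

    Unmapped fields pass through unchanged unless strict=True, in which
    case they raise KeyError so gaps in the crosswalk surface early.
    """
    mapping = CROSSWALKS[convention]
    renamed = []
    for row in rows:
        new_row: Dict[str, object] = {}
        for name, value in row.items():
            if name in mapping:
                new_row[mapping[name]] = value
            elif strict:
                raise KeyError(f"no {convention} name for field {name!r}")
            else:
                new_row[name] = value
        renamed.append(new_row)
    return renamed


rows = [{"SiteID": "site-A", "AirTempMax_C": 31.2, "Precip_mm": 4.0}]  # made-up values
print(rename_fields(rows, "example_convention"))
# [{'SiteID': 'site-A', 'air_temp_max': 31.2, 'precipitation': 4.0}]
```

Keeping the crosswalk as data rather than code means a new convention can be supported by adding one dictionary entry, and the `strict` flag lets a curator check that a crosswalk covers every field before publishing a converted file.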

Page 13

Ag Data Commons


Page 14

data.nal.usda.gov

Enhanced DKAN

Page 15

Distributed repositories

AG DATA COMMONS

Search & Knowledge Discovery

Thesaurus & Indexing

Ag Data Commons Repository

Organization & Curation

Grant management systems

INGESTION DISSEMINATION

PubAg

Dataset Submission

Analytics & Tools

Data.gov

Forest Service

NCBI

Ag Data Commons

Catalog

Color legend: Building; Adapt/Re-use; Existing

LCA Commons

Page 16

Guiding principle 1: a distributed network of networks

Geospatial Catalog

Geospatial Repository

STEWARDS

Ag Data Commons (catalog)

Ag Data Commons (repository)

USDA Enterprise Inventory

National Weather Service

Data.gov

Ecosystems.data.gov
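A distributed network of networks means a single search may reach the same dataset through several catalogs and repositories. The Python sketch below illustrates federated search under that assumption: each node is searched independently and hits are grouped by a shared dataset identifier. `CatalogNode`, `federated_search`, the node names, and the sample records are all hypothetical in-memory stand-ins, not the actual interfaces of Ag Data Commons, Data.gov, or the other systems named on the slide.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Record:
    identifier: str  # assumed to be shared across catalogs for the same dataset
    title: str


class CatalogNode:
    """Hypothetical in-memory stand-in for one catalog or repository in the network."""

    def __init__(self, name: str, records: Iterable[Record]) -> None:
        self.name = name
        self.records = list(records)

    def search(self, term: str) -> List[Record]:
        term = term.lower()
        return [r for r in self.records if term in r.title.lower()]


def federated_search(nodes: Iterable[CatalogNode], term: str) -> Dict[str, List[str]]:
    """Query every node; group hits by identifier and list the nodes holding each."""
    hits: Dict[str, List[str]] = {}
    for node in nodes:
        for record in node.search(term):
            hits.setdefault(record.identifier, []).append(node.name)
    return hits


# Made-up records: the repository holds ds-1, and a catalog also lists it.
repo = CatalogNode("repository", [Record("ds-1", "Soil carbon, plot A")])
catalog = CatalogNode("catalog", [Record("ds-1", "Soil carbon, plot A"),
                                  Record("ds-2", "Eddy flux CO2, tower 3")])
print(federated_search([repo, catalog], "soil"))
# {'ds-1': ['repository', 'catalog']}
```

Grouping by identifier, rather than returning every raw hit, is what lets a catalog of catalogs report a dataset once while still showing every place it can be found; it depends on the nodes agreeing on persistent identifiers.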

Page 17

Public access to open, machine-readable data enables larger-scale, integrative, and innovative data science

The long tail

Guiding principle 2: big data AND long tail

Page 18

Guiding principle 3: curation adds value

• Data dictionaries
• Standards & templates
• Linkages
• Semantics
• Preservation
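One concrete way a data dictionary adds value is that it lets records be checked automatically for missing fields, wrong types, and out-of-range values. The Python sketch below assumes a very simple dictionary format (name, type, units, optional bounds); `FieldSpec`, `validate`, and the soil fields are hypothetical examples, not a schema from ISO 19110 or any NAL system.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class FieldSpec:
    """One data-dictionary entry: field name, expected type, units, optional bounds."""
    name: str
    dtype: type
    units: str = ""
    minimum: Optional[float] = None
    maximum: Optional[float] = None


def validate(row: Dict[str, Any], dictionary: List[FieldSpec]) -> List[str]:
    """Return human-readable problems; an empty list means the row conforms."""
    specs = {spec.name: spec for spec in dictionary}
    problems = [f"{name}: not in data dictionary" for name in row if name not in specs]
    for spec in dictionary:
        if spec.name not in row:
            problems.append(f"{spec.name}: missing")
            continue
        value = row[spec.name]
        if not isinstance(value, spec.dtype):
            problems.append(f"{spec.name}: expected {spec.dtype.__name__}, got {type(value).__name__}")
            continue
        if spec.minimum is not None and value < spec.minimum:
            problems.append(f"{spec.name}: {value} below minimum {spec.minimum} {spec.units}".rstrip())
        if spec.maximum is not None and value > spec.maximum:
            problems.append(f"{spec.name}: {value} above maximum {spec.maximum} {spec.units}".rstrip())
    return problems


# Hypothetical dictionary and row, for illustration only.
SOIL_DICTIONARY = [
    FieldSpec("plot_id", str),
    FieldSpec("soil_moisture", float, "m3/m3", 0.0, 1.0),
]
print(validate({"plot_id": "P1", "soil_moisture": 1.4, "notes": "wet"}, SOIL_DICTIONARY))
# ['notes: not in data dictionary', 'soil_moisture: 1.4 above maximum 1.0 m3/m3']
```

Reporting every problem at once, instead of stopping at the first, suits curation work where a submitter fixes a file in one pass.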

Page 19

Thanks!

National Agricultural Library Knowledge Services Division: Susan McCarthy

LTAR: Jeffrey Campbell, Charles Lockwood

i5K: Monica Poelchau, Chris Childers

LCA Commons: Peter Arbuckle, Ezra Kahn

Ag Data Commons: Ursula Pieper, Jocelyn McNamara, Qing Qu, Erin Antognoli, Melissa Lowrey, Jaylen Nathwani, NuCivic

… and collaborators and testers