Post on 14-Apr-2017
Cynthia Parr @cydparr
US Department of Agriculture
National Agricultural Library
16 July 2016
Ag Data Commons: Agricultural research metadata and data
Federal directives: Public access to open, machine-readable data
The problems in agricultural data
• Broad subject areas
• Journals not integrated with repositories like Dryad
• Too many existing databases & web distribution points
• Lack of infrastructure for long-tail data
• Lack of a neutral, sustainable solution for long-term multi-institutional projects
Ag Data Commons
A proposed solution
• Supports Public Access mandates
• Holds agricultural research data
• Primary audience: researchers
• Holds metadata for data held elsewhere
• Starting with USDA data but will broaden
• Both human and machine access
• Can include unpublished data that is ready for release
[Diagram: Ag Data Commons architecture, spanning ingestion to dissemination. Labeled components: Dataset Submission; Grant management systems; Distributed repositories (Forest Service, Geospatial); Ag Data Commons Repository; Organization & Curation; Thesaurus & Indexing; Search & Knowledge Discovery; Analytics & Tools; PubAg; Data.gov; Ag Data Commons Catalog. Legend: Building, Adapting, Existing.]
DKAN http://nucivic.com/dkan/
PRO
• Open source community
• Drupal modules for basic CMS functions
• Integrated CKAN catalog
• Feeds Data.gov
• Basic metadata already supported
CON
• Not designed for scientific data or scientists
• No links to literature
• No Digital Object Identifiers
• Doesn’t handle dataset relationships
• Metadata inadequate for compliance checking & re-use
Pilot FY 2016
• ~35 non-NAL users
• Almost 200 datasets (104 harvested)
• Links to PubAg
• Digital Object Identifiers
• Metadata for compliance checking and re-use
• Support for program collections
• Policies and documentation
https://data.nal.usda.gov/
The workflow
1. Register yourself
2. Organize your research products
   – Files (resources) within one dataset (metadata record) = common data dictionary, common DOI
   – Multiple datasets with different DOIs and different data dictionaries
   – Links to externally hosted products like source code vs CSV files
   – README file vs Data Dictionary
3. Prepare and save metadata and data
4. Submit for review by NAL curator
5. Revise as directed
6. Resubmit for review, request embargo as necessary; record will then be published
7. Receive DOI(s) from NAL curator and share with journal, include in manuscripts, share on website
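The organizing rule in step 2 (one dataset record, one DOI, one data dictionary, many resources) can be sketched as a small data model. This is an illustration only: the class names, the `kind` labels, the example file names, and the `ready_for_review` check are hypothetical and do not describe the Ag Data Commons software or NAL's actual review criteria.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Resource:
    """One file or URL pointer attached to a dataset (hypothetical model)."""
    name: str
    url: Optional[str] = None  # set for externally hosted products, e.g. source code
    kind: str = "data"         # e.g. "data", "data_dictionary", "readme", "source_code"


@dataclass
class Dataset:
    """One metadata record: its resources share a DOI and a data dictionary."""
    title: str
    doi: Optional[str] = None  # assigned by the NAL curator, so empty before review
    resources: List[Resource] = field(default_factory=list)

    def has_documentation(self) -> bool:
        # True when a data dictionary or a README accompanies the data.
        return any(r.kind in ("data_dictionary", "readme") for r in self.resources)


def ready_for_review(ds: Dataset) -> bool:
    """Illustrative pre-submission check; not an actual NAL rule set."""
    has_data = any(r.kind == "data" for r in ds.resources)
    return bool(ds.title.strip()) and has_data and ds.has_documentation()


# Two CSV files that share one data dictionary belong in a single dataset.
trial = Dataset(
    title="Example field trial yields",
    resources=[
        Resource("yields_2015.csv"),
        Resource("yields_2016.csv"),
        Resource("data_dictionary.csv", kind="data_dictionary"),
        Resource("analysis scripts", url="https://example.org/repo", kind="source_code"),
    ],
)
print(ready_for_review(trial))  # True
```

Files that need a different data dictionary would go into a second `Dataset`, which would then receive its own DOI.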
Research products
Include in the Ag Data Commons (or provide links)
• Raw data files and/or Processed data files
• Data dictionary or Readme
Do not submit with the data (include citation in metadata)
• Manuscript
• Figures/tables from manuscript
Research products
Include as resources (resource can be URL pointer)
• Web database
• Software
• Source code/Scripts/Workflows
• User manuals
Do not submit with the data (include links in metadata)
• Presentations associated with the study
• News articles or press releases
• Related or cited data
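The guidance on these two slides amounts to a lookup from product type to handling. A minimal sketch follows; the four handling labels and the function name are hypothetical shorthand, and only the product types come from the slides.

```python
from typing import Optional

# Handling labels (hypothetical shorthand for the slide headings):
#   "include"  = include in the Ag Data Commons (or provide links)
#   "resource" = include as a resource (may be a URL pointer)
#   "cite"     = do not submit; include a citation in the metadata
#   "link"     = do not submit; include a link in the metadata
HANDLING = {
    "raw data files": "include",
    "processed data files": "include",
    "data dictionary": "include",
    "readme": "include",
    "web database": "resource",
    "software": "resource",
    "source code/scripts/workflows": "resource",
    "user manuals": "resource",
    "manuscript": "cite",
    "figures/tables from manuscript": "cite",
    "presentations associated with the study": "link",
    "news articles or press releases": "link",
    "related or cited data": "link",
}


def how_to_handle(product: str) -> Optional[str]:
    """Return the handling label for a product type, or None if it is not listed."""
    return HANDLING.get(product.strip().lower())


print(how_to_handle("Software"))    # resource
print(how_to_handle("Manuscript"))  # cite
```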
Metadata Standards
Core Metadata Schema
• POD 1.1 (Project Open Data) https://project-open-data.cio.gov/
Related Scientific Metadata & Data Standards (e.g.)
• ISO 19115 (GIS Data, FGDC) https://www.iso.org
• Darwin Core (Biodiversity standards) http://rs.tdwg.org/dwc/
• EML (Ecological Metadata Language) https://knb.ecoinformatics.org/#tools/eml
• MiXS GSC (Genomic Standards Consortium) http://gensc.org/projects/mixs-gsc-project/
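To make the core schema concrete, here is a minimal sketch of a dataset record that uses the fields POD 1.1 requires of every dataset, with a helper that reports any that are missing. The record values (title, identifier, contact address) are placeholders, the helper name is hypothetical, and the check is not the official POD validator. Federal agencies must also supply `bureauCode` and `programCode`, which are omitted here.

```python
import json

# Fields POD 1.1 requires for every dataset entry.
REQUIRED = (
    "title", "description", "keyword", "modified",
    "publisher", "contactPoint", "identifier", "accessLevel",
)
# Values POD 1.1 allows for accessLevel.
ACCESS_LEVELS = {"public", "restricted public", "non-public"}


def missing_fields(record: dict) -> list:
    """List required POD 1.1 fields that are absent or empty (illustrative check)."""
    return [key for key in REQUIRED if not record.get(key)]


# Placeholder record; none of these values describe a real dataset.
record = {
    "@type": "dcat:Dataset",
    "title": "Example dataset title",
    "description": "One-paragraph description of the dataset.",
    "keyword": ["agriculture", "example"],
    "modified": "2016-07-16",
    "publisher": {"@type": "org:Organization", "name": "Example Publisher"},
    "contactPoint": {
        "@type": "vcard:Contact",
        "fn": "Example Contact",
        "hasEmail": "mailto:contact@example.org",
    },
    "identifier": "https://example.org/dataset/1",
    "accessLevel": "public",
}

print(missing_fields(record))                   # []
print(record["accessLevel"] in ACCESS_LEVELS)   # True
print(json.dumps(record, indent=2)[:40])        # start of the serialized JSON record
```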
Acknowledgements
Cynthia.Parr@ars.usda.gov
Susan McCarthy, NAL – KSD
Ursula Pieper, NAL – ISD
Qing Qu, NAL – KSD contractor
Jeff Campbell, NAL – KSD
Jaylen Nathwani, NAL – student intern
NüCivic, Angry Cactus Team
Jocelyn McNamara, NAL – KSD contractor
Kerry Huller, UMD graduate fellow
Erin Antognoli, UMD graduate fellow