Reusable Software and Open Data To Optimize Agriculture
David LeBauer
AGU 2015 Fall Meeting
@dlebauer
Overview
Ideas:
• Software: Modular, Reusable, and Useable
• Data: Harmonization, Distribution
• Workflows: Reproducible, Automated
• Science: Cumulative and Synthetic
Examples:
• PEcAn Project
• BETYdb
• TERRA Ref
Agriculture: Model and Application
• Food, fuel, and other ecosystem services (e.g. C, N, H2O)
• Basic science: genes to organism to ecosystem
• Engineering applications: computing, data collection, prediction
Enzyme Ecosystem Continent
betydb.org github.com/pecanproject/bety @BETYdatabase
BETYdb
• Database for meta-analysis (BETYdb)
• Model-data synthesis, provenance (PEcAn)
• Link Genomics to Phenomics (TERRA Ref)
Spreadsheet
Publication Extraction
Analysis … Publication Extraction
Meta Analysis
LeBauer and Treseder 2008
BETYdb: Data entry Workflow
BETYdb.org → docs → data entry workflow
LeBauer et al., in prep
Technicians Enter and Check
Scientist Identifies Data
Data Access
/search?search=Salix+vcmax
R Web Application + API
BETYdb.org → docs → Data Access
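The search path on this slide can be expanded into a full request URL. The following is a minimal sketch, assuming the query is served from https://www.betydb.org and takes a single `search` parameter as shown in the path above. The base URL and the helper name `bety_search_url` are assumptions for illustration, and no request is sent.

```python
from urllib.parse import urlencode

# Assumed base URL; the slide only shows the path and query string.
BETY_BASE = "https://www.betydb.org"


def bety_search_url(*terms: str, base: str = BETY_BASE) -> str:
    """Build a BETYdb search URL from free-text terms (hypothetical helper)."""
    # urlencode quotes with quote_plus, so spaces between terms become '+'.
    query = urlencode({"search": " ".join(terms)})
    return f"{base}/search?{query}"


print(bety_search_url("Salix", "vcmax"))
```

Building the query with `urlencode` rather than string concatenation keeps species names or trait names that contain spaces or punctuation correctly escaped.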
PEcAn
pecanproject.org github.com/pecanproject/pecan @PEcAnproject
PEcAn: complex models in complex workflows
Modeling Information Systems
Dietze 2016, Princeton University Press
BioCro / Wimovac Crop Model
Humphries and Long, 2005
Miguez et al 2009
Ecosystem Modeling c. 2012
Select Site → Configure Run → Run Model → Visualize, Export
Dietze, Kooper, LeBauer 2012
LeBauer et al 2013
Given available data,
• How well do we know parameters?
• How does this affect prediction?
• What should we collect?
PEcAn: Sensitivity Analysis & Variance Decomposition
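The idea behind a variance decomposition can be illustrated with a toy example. The sketch below is not PEcAn's implementation: it assumes a made-up two-parameter model (`toy_model`), hypothetical normal parameter distributions, and a simple one-at-a-time scheme that varies each parameter while holding the others at their medians and ignores interactions.

```python
import random
import statistics


def toy_model(params):
    """Stand-in for a crop model: output as a simple function of two traits."""
    return 2.0 * params["vcmax"] + 0.5 * params["sla"]


# Hypothetical parameter distributions as (mean, sd); not values from BETYdb.
priors = {"vcmax": (60.0, 10.0), "sla": (20.0, 4.0)}


def variance_decomposition(model, priors, n=2000, seed=1):
    """Share of output variance attributed to each parameter, one at a time."""
    rng = random.Random(seed)
    # For a normal distribution the median equals the mean.
    medians = {name: mean for name, (mean, _sd) in priors.items()}
    partial = {}
    for name, (mean, sd) in priors.items():
        outputs = []
        for _ in range(n):
            params = dict(medians)              # hold the others at their medians
            params[name] = rng.gauss(mean, sd)  # vary only this parameter
            outputs.append(model(params))
        partial[name] = statistics.variance(outputs)
    total = sum(partial.values())
    return {name: value / total for name, value in partial.items()}


shares = variance_decomposition(toy_model, priors)
for name, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {share:.0%} of output variance")
```

A parameter with a large share is both uncertain and influential, which is one way to answer "What should we collect?": new measurements are most useful where they would shrink the largest share.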
BETYdb + PEcAn
• BETYdb is PEcAn’s informatics backend
• Provides data, workflow and data provenance
• Federated network of databases
terraref.ncsa.illinois.edu github.com/terraref @terra_ref
TERRA: Better Breeding Through Science
We have increased yields many times in the last 60 years. What new opportunities does modern science provide?
University of Illinois Integrated Pest Management
• Use scientific understanding to select for traits
• Replace manual measurement with remote sensing
• Target specific genes and phenotypes in crosses
ARPA-E TERRA Program
• Six Funded Teams
• $30M in awards
• $5M in sensors
TERRA Ref:
• Public reference data
• HPC Computing
TERRA Ref: An Agricultural Observatory
Similar to and informed by:
• Large Synoptic Survey Telescope
• National Ecological Observatory Network
• Open: Science, Data, Software
• Useable: Useful and Familiar to Scientists, Breeders, Precision Ag
• Modular: Extensible, Distributed, Automated, Interoperable
• Interdisciplinary: Genes to Ecosystems with Robots, Vision, Statistics
• Scalable: From Mobile Devices to High Performance Computers
TERRA Reference Data and Computing
Sensor Data Sources
• Lemnatec Indoor: Danforth, St. Louis
• Lemnatec Field: USDA ALARC, Maricopa, AZ
• Tractor and UAV: Kansas State
• Plus other teams and the public (sharing optional)
Shared Sorghum genomics and germplasm
Reference Data
• Raw Sequence Data, Aligned Reads, SNPs
• Images, Spectra, Point clouds, Shapes
• Biomass, Growth, Tissue Chemistry, Photosynthesis
• Yield, Stress Tolerance, Ecosystem Services
Big Data Volume & Velocity
• Imaging Spectrometers: VNIR ~3-4 TB/d, SWIR ~1 TB/d
• 3D Laser Scanner: ~1 TB/d
4 Year Total: 1-40 PB
[Chart: data volume by source (VNIR, SWIR, 3D, … everything else)]
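The daily rates on this slide can be checked against the four-year total with a short calculation. This is a rough sketch under two assumptions not stated on the slide: every sensor runs every day for four years, and 1 PB is 1000 TB. Only the three listed sensors are included.

```python
# Daily rates from the slide, as (low, high) in TB/day.
rates_tb_per_day = {
    "VNIR": (3.0, 4.0),
    "SWIR": (1.0, 1.0),
    "3D laser scanner": (1.0, 1.0),
}

DAYS = 4 * 365     # assumption: sensors run every day for four years
TB_PER_PB = 1000   # assumption: decimal units

# Sum the low and high ends separately, then scale to four years in PB.
low = sum(lo for lo, _hi in rates_tb_per_day.values()) * DAYS / TB_PER_PB
high = sum(hi for _lo, hi in rates_tb_per_day.values()) * DAYS / TB_PER_PB
print(f"Three listed sensors over 4 years: {low:.1f} to {high:.1f} PB")
```

Under these assumptions the three listed sensors alone come to roughly 7 to 9 PB, which sits inside the 1-40 PB range quoted on the slide. Actual duty cycles and the remaining data sources are not modelled here.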
Computing and Storage
• Roger Server: 1 PB online, GIS optimized
• Nebula: NCSA OpenStack Server
• Blue Waters: 10 PB tape storage
• Your Local: [Desktop, HPC, or Sensor Platform]
Data Products Standards Committee
• Paul Bartlett: Near Earth Autonomy
• Jeff White: USDA ALARC, ICASA
• Melba Crawford: Purdue University
• Michael Gore, Elodie Garazave: Cornell University
• Matt Colgan: Blue River
• Christer Janssen: PNNL
• Barnabas Poczos: Carnegie Mellon
• Alex Thomasson: Texas A&M University
• Cheryl Porter: University of Florida, AgMIP, USDA
• Shawn Serbin: Brookhaven National Lab, PEcAn
• Shelly Petroy, Christine Laney: NEON
• Carolyn J. Lawrence-Dill: Iowa State, AgBioData
• Eric Lyons: University of Arizona, CoGe
• Ted Habermann: HDF Group
Participants
• Project representatives
• Domain Experts
• Scientific Community (You)*
Responsibilities
• Define Data
• Revise, Improve
• Training, Outreach
* github.com/terraref/reference-data/issues
Computing Pipeline
• Data Uploaded via API
• Triggers Analytical Pipeline
• Generates and Stores Data, Metadata
• Users select data, launch VM:
  • Favorite Software
  • Data Mounted
  • HPC Access
  • States can be Shared, Archived
Acknowledgements
• Projects: PEcAn, NCSA, BrownDog, Plants In Silico, CyberGIS, National Data Service, USDA, AgMIP
• Data: Providers and Curators
• Mentors: Mike Dietze, Steve Long, Kathleen Treseder
• Funding: NSF, EBI, ARPA-E, DOE, NASA
Contact
Web, GitHub, Twitter
David LeBauer [email protected] dlebauer @dlebauer
BETYdb betydb.org pecanproject/bety @BETYdatabase
PEcAn Project pecanproject.org pecanproject/pecan @PEcAnproject
TERRA Ref terraref.ncsa.illinois.edu terraref @terra_ref
Plants in Silico
Multi-scale modeling platform to predict crop response to climate change
PIs: Amy Marshall-Colon, Steve Long, James O’Dwyer, Diwakar Shukla