A Case Study in E-Science:
Building Ecological Informatics Solutions
for Multi-Decadal Research
ARL/CNI 2008 Conference
Washington, DC
16 October 2008
Roadmap
• Why multi-decadal research?
  – A brief history of LTER
• Data/information challenges
• Ecological informatics
  – Current state-of-the-art
  – Future
Cyberinfrastructure: Informatics Across the Biological Sciences
The Long Term Ecological Research Network Office
Long-Term Research is Required to Reveal:
• Slow processes or transients
• Episodic or infrequent events
• Decadal trends
• Multi-factor responses
• Processes with major time lags
Data Dispersion
• Data are massively dispersed
  – Ecological field stations and research centers (100s)
  – Natural history museums and biocollection facilities (100s)
  – Agency data collections (100s to 1,000s)
  – Individual scientists (1,000s to 10,000s)
Data Entropy
[Figure: information content (y-axis) versus time (x-axis). Labels: time of publication; specific details; general details; accident; retirement or career change; death.]
(Michener et al. 1997)
Data Integration (Jones et al. 2007)
• Data are heterogeneous
  – Syntax (format)
  – Schema (model)
  – Semantics (meaning)
Information and Storage
[Figure: petabytes worldwide (y-axis, 0 to 1,000,000) by year (x-axis, 2005 to 2010). Series: information; available storage. Annotation: transient information or unfilled demand for storage.]
Source: John Gantz, IDC Corporation: The Expanding Digital Universe
Ecological Informatics
A discipline that incorporates both concepts and practical tools for the understanding, generation, processing, preservation and propagation of ecological data, information and knowledge.
Data Archives
Existing Tools Provide Needed Functionality:
Metacat Data Distribution
Roadmap
• Why multi-decadal research?
  – A brief history of LTER
• Data/information challenges
• Ecological informatics
  – Current state-of-the-art
  – Future
    • Science
    • Technology
    • Sociocultural dimension
Global Change
(Smith, Knapp, Collins. In press.)
Critical Areas in the Earth System
Knowledge Pyramid
[Figure: pyramid with axes "Decreasing Spatial Coverage" and "Increasing Process Knowledge". Tiers as labeled on the slide: remote sensing; intensive science sites and experiments; extensive science sites; volunteer & education networks.]
Adapted from CENR-OSTP
Technology Directions
• CI enabling the science
• Whole-data-life-cycle
• Domain-agnostic solutions
Focus on CI that Enables the Science (end-to-end solutions)
• Discovery, access, and use
• Open access to holdings (and tools)
Support the Data Lifecycle
– Reliable, replicated storage infrastructure
– Interoperability across data centers
Examples of Data Holdings
Data Center | Types of Data Managed | Metadata Standard(s)
National Biological Information Infrastructure | Biodiversity, taxonomic, ecological | BDP, Dar, Dub, OGIS
Oak Ridge National Laboratory – Distributed Active Archive Center | Biogeochemical dynamics, terrestrial ecological Earth observation imagery | DIF, BDP, ECHO
Long Term Ecological Research Network | Ecological, biodiversity, biophysical, social, genomics, and taxonomic | EML
Avian Knowledge Network | Avian populations and molecular biology | Dub
Atlas of Living Australia (ALA) | Biological and taxonomic | Dub subset
South African Environmental Observatory Network (SAEON) | Biophysical, biodiversity, disturbance, and Earth observation imagery | EML
Taiwan Ecological Research Network (TERN) | Biodiversity, biotic structure, function/process, biogeochemical, climate, and hydrologic | EML
Metadata Interoperability Across Data Holdings
Data Interoperability: Ontologies and Semantic Mediation
Domain-Agnostic Solutions
DISCIPLINE: major branch of knowledge or learning (Science, Humanities, Social Science).
DOMAINS: top-level designation for areas of study within a discipline (Earth & Space, Life, Physical, Engineering).
Domain agnostic: practice or tool that crosses domains.
Kilo Nalu Workflow
Streaming data from observatory
DataTurbine Server
Graphs and derived data can be archived and displayed
    # Assumed inputs: timestamps (seconds since the Unix epoch) and data (sensor readings)
    now <- Sys.time()
    Epoch <- now - as.numeric(now)   # POSIXct value for the Unix epoch
    timeval <- Epoch + timestamps    # convert offsets to date-times
    posixtmedian <- median(timeval)
    mediantime <- as.numeric(posixtmedian)
    meantemp <- mean(data)
Support application scripts in R, Matlab, etc.
Modular components, easily saved and shared
• Publish to workflow repository with accession number
• Documents the linkage between publication, analysis, and data
Kepler Use Cases Represent Many Science Domains
• Ecology
  – SEEK: Ecological niche modeling
  – REAP: Environmental sensor networks
  – NEON: Ecological sensor networks
• Molecular biology
  – SDM: Gene promoter identification
  – ChIP-chip: genome research
  – CAMERA: metagenomics
• Oceanography
  – REAP: SST data processing
  – LOOKING: ocean observing CI
  – ROADNet: real-time modeling
  – Ocean Life project
• Physics
  – CPES: Plasma fusion simulation
  – FermiLab: particle physics
• Chemistry
  – Resurgence: Computational chemistry
  – DART (X-ray crystallography)
• Library science
  – DIGARCH: Digital preservation
  – Cheshire digital library: archival
• Conservation biology
  – SanParks: Thresholds of Potential Concerns
• Geosciences
  – GEON: LiDAR data processing
  – GEON: Geological data integration
Workflow Sharing Portal
Sociocultural Directions
• Education and training
• Engaging citizens in science
• Building global communities of practice
Mound built by cathedral termites
Experiential, Career-long Education and Training
Citizen Science
www.CitizenScience.org
Building Global Science Communities of Practice via CI
…a wide range of partnering organizations
• Libraries & digital libraries
• Academic institutions
• Research networks
• NSF- and government-funded synthesis & supercomputer centers/networks
• Governmental organizations
• International organizations
• Data and metadata archives
• Professional societies
• NGOs
• Commercial sector
Longevity of CI Enterprises
• Broad, active community engagement
  – Involvement of library and science educators engaging new generations of students in best practices
  – Existing outreach and education programs
• Transparent, participatory governance
• Adoption/creation of sustainable business models
• Strong organizational sustainability
University of New Mexico Academic CI Planning for the 21st Century
[Diagram labels, in the order they appear on the slide:]
SPECIALIZED [FEW USERS – e.g., Econ. Dev.]: Massively Parallel Systems, Specialized Codes
ADVANCED [MODERATE USE – e.g., Research]: HPC clusters, Community Codes, Viz Tools
BASIC [UBIQUITOUS USE]: Campus Data Networking, Video-conferencing, Data Archive (UNM, Branch Campuses, NM Institutions of Higher Learning, etc.)
Help-Desk Support, Databases, Collections, Digital Archive, Collaboration Technologies, etc.
Visualization & Analytical Tools
HPC & Nat’l Networks: Large cycles and High Bandwidth
SPECIALIZED: Fundamental research
ADVANCED: Undergraduate and Graduate Programs (in Computing, Library, and Cognitive Sciences)
BASIC: University-wide informatics courses (e.g., creation of an Information Sciences Program (ISP Certificate))
Research Services
Cyberinfrastructure
Academics
Building a Computational- and Information-Literate Work Force (i.e., evolving a School of Computing, Information & Library Science)
Libraries, HPC Centers, etc.: Preserving, Protecting, Processing, and Disseminating Data, Information, and Knowledge
Serving UNM, Academia, the State, and Economic Development in NM
Thanks!
• Suzie Allard – University of Tennessee
• Matt Jones – University of California Santa Barbara
• Mike Frame – USGS, National Biological Information Infrastructure
• Patricia Cruse – California Digital Library
• Bob Cook – Oak Ridge National Laboratory DAAC
• Steve Kelling – Cornell Lab of Ornithology
• DataNetONE Partners & Kepler-CORE Team