
A Case Study in E-Science:

Building Ecological Informatics Solutions

for Multi-Decadal Research

ARL/CNI 2008 Conference

Washington, DC

16 October 2008


Roadmap

• Why multi-decadal research?
  – A brief history of LTER
• Data/information challenges
• Ecological informatics
  – Current state-of-the-art
  – Future



Cyberinfrastructure: Informatics Across the Biological Sciences

The Long Term Ecological Research Network Office

Long-Term Research is Required to Reveal:

• Slow processes or transients

• Episodic or infrequent events

• Decadal trends

• Multi-factor responses

• Processes with major time lags



Data Dispersion

• Data are massively dispersed
  – Ecological field stations and research centers (100’s)
  – Natural history museums and biocollection facilities (100’s)
  – Agency data collections (100’s to 1000’s)
  – Individual scientists (1000’s to 10,000’s)


Data Entropy

[Figure: information content (vertical axis) versus time (horizontal axis). Labels along the curve: time of publication; specific details; general details; accident; retirement or career change; death.]

(Michener et al. 1997)


Data Integration

(Jones et al. 2007)

• Data are heterogeneous
  – Syntax (format)
  – Schema (model)
  – Semantics (meaning)
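The three kinds of heterogeneity listed above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the two sources, their column names (`site`, `temp_c`, `station`, `air_temp`), the values, and the choice of temperature units are invented and do not come from the talk. The sketch reads two differently delimited files (syntax), maps their columns onto one shared model (schema), and converts Fahrenheit to Celsius (semantics).

```python
import csv
import io

# Two hypothetical sources reporting the same kind of measurement.
# Source A: comma-separated, columns "site,temp_c", temperature in Celsius.
SOURCE_A = "site,temp_c\nA1,20.0\nA2,25.0\n"
# Source B: tab-separated, columns "station, air_temp", temperature in Fahrenheit.
SOURCE_B = "station\tair_temp\nB1\t68.0\nB2\t77.0\n"


def read_rows(text, delimiter):
    """Syntax: parse delimited text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def fahrenheit_to_celsius(value):
    """Semantics: express a Fahrenheit reading in Celsius."""
    return (value - 32.0) * 5.0 / 9.0


def to_common(rows, site_field, temp_field, to_celsius):
    """Schema: rename source fields to the shared model (site, temp_c)."""
    return [
        {"site": row[site_field],
         "temp_c": round(to_celsius(float(row[temp_field])), 2)}
        for row in rows
    ]


def integrate():
    """Combine both hypothetical sources into one homogeneous list of records."""
    a = to_common(read_rows(SOURCE_A, ","), "site", "temp_c", lambda v: v)
    b = to_common(read_rows(SOURCE_B, "\t"), "station", "air_temp",
                  fahrenheit_to_celsius)
    return a + b


if __name__ == "__main__":
    for record in integrate():
        print(record)
```

Each function handles one layer, so a new source needs only its delimiter, its field names, and its unit conversion.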


Information and Storage

[Figure: petabytes worldwide (0 to 1,000,000) by year, 2005 to 2010. Series: Information; Available Storage. Annotation: transient information or unfilled demand for storage.]

Source: John Gantz, IDC Corporation: The Expanding Digital Universe



Ecological Informatics

A discipline that incorporates both concepts and practical tools for the understanding, generation, processing, preservation and propagation of ecological data, information and knowledge.


Data Archives


Existing Tools Provide Needed Functionality:


Metacat Data Distribution


Roadmap

• Why multi-decadal research?
  – A brief history of LTER
• Data/information challenges
• Ecological informatics
  – Current state-of-the-art
  – Future
    • Science
    • Technology
    • Sociocultural dimension


Global Change

Smith, Knapp, Collins. In press.


Critical Areas in the Earth System


Knowledge Pyramid

[Figure: pyramid with two axis labels, "Decreasing Spatial Coverage" and "Increasing Process Knowledge". Tiers as labeled: remote sensing; intensive science sites and experiments; extensive science sites; volunteer & education networks.]

Adapted from CENR-OSTP


Technology Directions

• CI enabling the science
• Whole-data-life-cycle
• Domain-agnostic solutions


Focus on CI that Enables the Science (end-to-end solutions)

• Discovery, access, and use

• Open access to holdings (and tools)


Support the Data Lifecycle

– Reliable, replicated storage infrastructure

– Interoperability across data centers


Metadata Interoperability Across Data Holdings

Examples of Data Holdings

Data Center | Types of Data Managed | Metadata Standard(s)
National Biological Information Infrastructure | Biodiversity, taxonomic, ecological | BDP, Dar, Dub, OGIS
Oak Ridge National Laboratory – Distributed Active Archive Center | Biogeochemical dynamics, terrestrial ecological, Earth observation imagery | DIF, BDP, ECHO
Long Term Ecological Research Network | Ecological, biodiversity, biophysical, social, genomics, and taxonomic | EML
Avian Knowledge Network | Avian populations and molecular biology | Dub
Atlas of Living Australia (ALA) | Biological and taxonomic | Dub subset
South African Environmental Observatory Network (SAEON) | Biophysical, biodiversity, disturbance, and Earth observation imagery | EML
Taiwan Ecological Research Network (TERN) | Biodiversity, biotic structure, function/process, biogeochemical, climate, and hydrologic | EML
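One common way to approach metadata interoperability across holdings that use different standards is a crosswalk: a mapping from each standard's fields onto a shared vocabulary. The Python sketch below is illustrative only and is not the approach of any data center named in the talk. The flat keys (`pubDate`, `dc:title`, and so on) are simplified, hypothetical stand-ins; real metadata records are richer documents, and the sample holdings are invented.

```python
# Hypothetical, simplified crosswalks from two metadata styles onto a shared
# vocabulary (title, creator, date). These flat keys are assumptions made for
# illustration and do not reproduce any real schema.
CROSSWALKS = {
    "eml_like": {"title": "title", "creator": "creator", "pubDate": "date"},
    "dc_like": {"dc:title": "title", "dc:creator": "creator", "dc:date": "date"},
}

# Invented sample holdings: (standard, record) pairs.
HOLDINGS = [
    ("eml_like", {"title": "Grassland plant cover", "creator": "Site A",
                  "pubDate": "2007"}),
    ("dc_like", {"dc:title": "Bird counts", "dc:creator": "Network B",
                 "dc:date": "2006"}),
    ("dc_like", {"dc:title": "Plant specimen records", "dc:creator": "Museum C",
                 "dc:date": "2005"}),
]


def to_common(record, standard):
    """Translate one metadata record into the shared vocabulary."""
    mapping = CROSSWALKS[standard]
    return {mapping[key]: value for key, value in record.items() if key in mapping}


def search_titles(holdings, term):
    """Search titles across holdings, regardless of each record's standard."""
    term = term.lower()
    hits = []
    for standard, record in holdings:
        common = to_common(record, standard)
        if term in common.get("title", "").lower():
            hits.append(common)
    return hits


if __name__ == "__main__":
    for hit in search_titles(HOLDINGS, "plant"):
        print(hit)
```

A single query then reaches records described in either style, which is the practical payoff of interoperable metadata for data discovery.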


Data Interoperability: Ontologies and Semantic Mediation


Domain-Agnostic Solutions

DISCIPLINE: major branch of knowledge or learning. Shown: Science, Humanities, Social Science.

DOMAINS: top-level designation for areas of study within a discipline. Shown: Earth & Space, Life, Physical, Engineering.

Domain agnostic: practice or tool that crosses domains.


Kilo Nalu Workflow

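Streaming data from observatory

DataTurbine Server

Graphs and derived data can be archived and displayed

    # Inputs supplied by the workflow: timestamps (assumed to be numeric
    # seconds since the Unix epoch) and data (the measured values).
    now <- Sys.time()
    Epoch <- now - as.numeric(now)          # date-time of the epoch origin
    timeval <- Epoch + timestamps           # seconds converted to date-times
    posixtmedian = median(timeval)
    mediantime = as.numeric(posixtmedian)   # median time, back in epoch seconds
    meantemp = mean(data)

Support application scripts in R, Matlab, etc.

Modular components, easily saved and shared

• Publish to workflow repository with accession number
• Documents the linkage between publication, analysis, and data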
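For readers who do not use R, the analysis step on this slide can be sketched in Python. This is a rough equivalent under stated assumptions, not the workflow's actual component: it assumes the timestamps are seconds since the Unix epoch and that the data values are temperature readings, and the function name `summarize` is hypothetical.

```python
from datetime import datetime, timezone
from statistics import mean, median


def summarize(timestamps, temperatures):
    """Return (median epoch seconds, median time in UTC, mean temperature).

    Assumes timestamps are seconds since 1970-01-01 00:00:00 UTC.
    """
    median_seconds = median(timestamps)
    median_time = datetime.fromtimestamp(median_seconds, tz=timezone.utc)
    return median_seconds, median_time, mean(temperatures)


if __name__ == "__main__":
    # Invented sample values, one reading per minute.
    seconds, when, avg = summarize([0, 60, 120], [10.0, 11.0, 12.0])
    print(seconds, when.isoformat(), avg)
```

Because the step is a small function with explicit inputs and outputs, it fits the modular style described on the slide: it can be saved, shared, and linked to the data it summarized.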


Kepler Use Cases Represent Many Science Domains

• Ecology
  – SEEK: Ecological niche modeling
  – REAP: Environmental sensor networks
  – NEON: Ecological sensor networks
• Molecular biology
  – SDM: Gene promoter identification
  – ChIP-chip: Genome research
  – CAMERA: Metagenomics
• Oceanography
  – REAP: SST data processing
  – LOOKING: Ocean observing CI
  – ROADNet: Real-time modeling
  – Ocean Life project
• Physics
  – CPES: Plasma fusion simulation
  – FermiLab: Particle physics
• Chemistry
  – Resurgence: Computational chemistry
  – DART (X-ray crystallography)
• Library science
  – DIGARCH: Digital preservation
  – Cheshire digital library: Archival
• Conservation biology
  – SanParks: Thresholds of Potential Concerns
• Geosciences
  – GEON: LiDAR data processing
  – GEON: Geological data integration


Workflow Sharing Portal


Sociocultural Directions

• Education and training
• Engaging citizens in science
• Building global communities of practice

Mound built by cathedral termites


Experiential, Career-long Education and Training


Citizen Science

www.CitizenScience.org


Building Global Science Communities of Practice via CI


…a wide range of partnering organizations

• Libraries & digital libraries
• Academic institutions
• Research networks
• NSF- and government-funded synthesis & supercomputer centers/networks
• Governmental organizations
• International organizations
• Data and metadata archives
• Professional societies
• NGOs
• Commercial sector


Longevity of CI Enterprises

• Broad, active community engagement
  – Involvement of library and science educators engaging new generations of students in best practices
  – Existing outreach and education programs
• Transparent, participatory governance
• Adoption/creation of sustainable business models
• Strong organizational sustainability


University of New Mexico Academic CI Planning for the 21st Century

Research Services, Cyberinfrastructure

Libraries, HPC Centers, etc.: Preserving, Protecting, Processing, and Disseminating Data, Information, and Knowledge

• SPECIALIZED [few users, e.g., Econ. Dev.]: Massively Parallel Systems, Specialized Codes
• ADVANCED [moderate use, e.g., Research]: HPC clusters, Community Codes, Viz Tools
• BASIC [ubiquitous use]: Campus Data Networking, Video-conferencing, Data Archive (UNM, Branch Campuses, NM Institutions of Higher Learning, etc.)

Service layers shown alongside the tiers:

• Help-Desk Support, Databases, Collections, Digital Archive, Collaboration Technologies, etc.
• Visualization & Analytical Tools
• HPC & Nat’l Networks: Large cycles and High Bandwidth

Academics

Building a Computational- and Information-Literate Work Force (i.e., evolving a School of Computing, Information & Library Science)

• SPECIALIZED: Fundamental research
• ADVANCED: Undergraduate and Graduate Programs (in Computing, Library, and Cognitive Sciences)
• BASIC: University-wide informatics courses (e.g., creation of an Information Sciences Program (ISP Certificate))

Serving UNM, Academia, the State, and Economic Development in NM


Thanks!

• Suzie Allard – University of Tennessee
• Matt Jones – University of California Santa Barbara
• Mike Frame – USGS, National Biological Information Infrastructure
• Patricia Cruse – California Digital Library
• Bob Cook – Oak Ridge National Laboratory DAAC
• Steve Kelling – Cornell Lab of Ornithology
• DataNetONE Partners & Kepler-CORE Team