Keynote Speaker 1 - Data Intensive Challenges in Biodiversity Conservation: a multi-scale approach...

Data Intensive Challenges in Biodiversity Conservation Steve Kelling


TERN Symposium 2011


Page 1: Keynote Speaker 1 - Data Intensive Challenges in Biodiversity Conservation: a multi-scale approach to estimating species distributions - Steve Kelling

Data Intensive Challenges in Biodiversity Conservation

Steve Kelling

Page 2

Environmental Science Challenges

• Climate Change
• Biodiversity Loss
• Invasive Species
• Water Depletion
• Disease Spread
• Green Energy
• Habitat Loss
• ---

Page 3

Habitat Loss

From: University of California Press Blog Earth Day 2010

Habitat loss is the major issue for Biodiversity Conservation.

Page 4

The increasing availability of massive volumes of scientific data requires new synthetic analysis techniques to explore and identify interesting patterns that were otherwise not apparent. For biodiversity studies a “data driven” approach is necessary due to the complexity of ecological systems, particularly when viewed at large spatial and temporal scales.

Page 5

Presentation Goals:

• Observation Networks: description of eBird (http://www.ebird.org)
• Species Distribution Models: description of the Avian Knowledge Network (http://avianknowledge.net)
• Data Intensive Science: description of the outcomes of the DataONE Exploration, Visualization, and Analysis Working Group

Page 6

eBird is a global online program that gathers bird observations from citizen scientists, predominantly across the Western Hemisphere. eBird gathers checklists of birds with associated effort information from well-defined locations, passing each record through a two-tiered verification system.

eBird is a joint project between the Cornell Lab of Ornithology and the National Audubon Society, and has more than two dozen regional partners.

Sullivan, B.L., C.L. Wood, M.J. Iliff, R.E. Bonney, D. Fink, and S. Kelling. 2009. eBird: A citizen-based bird observation network in the biological sciences. Biological Conservation 142: 2282-2292.

Page 7

eBird uses crowdsourcing techniques to gather observations of birds.

Crowdsourcing is the act of outsourcing tasks, traditionally performed by an employee or contractor, to an undefined, large group of people or community (a "crowd"), through an open call.

Jeff Howe, one of the first authors to employ the term, established that the concept of crowdsourcing depends essentially on the fact that, because it is an open call to an undefined group of people, it gathers those who are most fit to perform tasks, solve complex problems, and contribute the most relevant and fresh ideas. For example, the public may be invited to develop a new technology, carry out a design task, refine or carry out the steps of an algorithm, or help capture, systematize, or analyze large amounts of data (CITIZEN SCIENCE). (From Wikipedia)

Page 8

eBird Checklists

Volunteers submit checklists of bird observations from specific locations using protocols that collect information on date, time, and distance traveled.

Page 9

Flagged Records

• 4% of submitted records were flagged for review
• 60% of those records were reviewed and validated

eBird contains a two-stage verification system:
(1) Instantaneous automated evaluation of submissions based on species count limits for a given date and location;
(2) A growing network of more than 500 regional editors composed of local experts who vet records flagged by the automated filters.
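
The automated first stage can be pictured as a lookup of a count limit keyed by species, region, and month. The sketch below is an illustration only: the `LIMITS` table, its values, the region codes, and the `flag_record` function are hypothetical and do not describe eBird's actual filters.

```python
from datetime import date

# Hypothetical count limits: (species, region, month) -> largest count
# accepted without review. The numbers are invented for illustration.
LIMITS = {
    ("Wood Thrush", "US-NY", 5): 20,
    ("Wood Thrush", "US-NY", 6): 15,
    ("Sooty Shearwater", "US-CA", 8): 5000,
}


def flag_record(species, region, obs_date, count):
    """Return True if the record should go to a regional editor.

    A record is flagged when no limit exists for that species, region,
    and month (treated here as an unexpected occurrence), or when the
    reported count exceeds the limit.
    """
    limit = LIMITS.get((species, region, obs_date.month))
    if limit is None:
        return True
    return count > limit


if __name__ == "__main__":
    # A small May count is within the limit; a January record has no limit.
    print(flag_record("Wood Thrush", "US-NY", date(2010, 5, 14), 3))
    print(flag_record("Wood Thrush", "US-NY", date(2010, 1, 14), 1))
```

Records for which `flag_record` returns True would then pass to the second stage, review by a regional editor.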

Page 10

Understanding our Audience

eBird is building a web-enabled community of bird watchers who collect, manage, and store their observations in a globally accessible unified database. Through its development as a tool that addresses the needs of the birding community, eBird sustains and grows participation.

Give Birders What They Want!

Page 11

eBird contains an array of data visualization and analysis tools that provide birders, land managers, and scientists with summary information about bird distribution.

Page 12

Sooty Shearwater

eBird data can be used to examine the timing of migration across large geographic areas.

Because each eBird observation is recorded at a specific location, eBird can generate maps depicting species distribution at multiple spatio-temporal scales.

Page 13

Bird Occurrence Patterns in Upstate New York

eBird provides "bar charts" (i.e., frequency histograms) based on frequency of detection for individual species.

These visualizations provide users with occurrence information at specific locations at 1-week increments and indicate the likelihood of detecting a species based on its frequency in that area (darker and wider bars indicate increased frequency).
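
One way to read "frequency of detection" is the share of checklists in a given week that report the species. The sketch below computes that share under stated assumptions: the `weekly_frequency` function, the checklist tuple layout, and the week numbering (day of year divided into 7-day bins) are illustrative choices, not eBird's implementation.

```python
from collections import defaultdict
from datetime import date


def weekly_frequency(checklists, species):
    """Fraction of checklists per week that report `species`.

    `checklists` is an iterable of (observation_date, set_of_species).
    Weeks are 7-day bins of the day of year, numbered from 0; this is a
    simplification of however the real bar charts bin dates.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for obs_date, species_seen in checklists:
        week = (obs_date.timetuple().tm_yday - 1) // 7
        totals[week] += 1
        if species in species_seen:
            hits[week] += 1
    return {week: hits[week] / totals[week] for week in sorted(totals)}


if __name__ == "__main__":
    sample = [
        (date(2010, 5, 3), {"Wood Thrush", "American Robin"}),
        (date(2010, 5, 4), {"American Robin"}),
        (date(2010, 5, 20), {"Wood Thrush"}),
    ]
    print(weekly_frequency(sample, "Wood Thrush"))
```

Weeks with no checklists are simply absent from the result, which distinguishes "no data" from "species not detected".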

Page 14

Growth in eBird Observations and Checklists

[Figure: yearly totals of observations and checklists, 2003 to 2011, plotted on two vertical axes (0 to 2,400,000 and 0 to 240,000), with the eBird 2.0 launch marked.]

Page 15

Statistics 2010

More than…

18,214,480 observations submitted

1,300,029 hours collecting bird observations.

1,293,480 checklists entered

22,136 contributors

351,000 unique visitors to eBird

20 million page views

Page 16

Page 17

Introducing BirdsEye, an eBird-powered iPhone app

Page 18

Estimating Species Distributions

Determining the patterns of species occurrence through time and space, and understanding their links with features of the environment, are central themes in ecology. Identifying the factors that influence species distributions is a complex task, requiring the examination of multiple facets of a species’ natural history and their relationships with the complex and variable environments in which they live.

Fink, D., W. M. Hochachka, D. Winkler, B. Shaby, G. Hooker, B. Zuckerberg, M. A. Munson, D. Sheldon, M. Riedewald, and S. Kelling. 2010. Spatiotemporal Exploratory Models for Large-scale Survey Data. Ecological Applications 20:2131-2147.

Page 19

Observational Data Model

The most crucial aspect of predicting species occurrence is to learn a model, called the observation model, from observed measurements and to make probabilistic inferences over regions or variables where measurements were not made. This approach joins organism observations with a multitude of "drivers", covariates that could potentially influence the occurrence of the organism. While a single source (or a few sources) of noisy observations may not be sufficient to accurately model distributions, combining many measurements (e.g., species occurrence, weather, organism occurrence, landscape mosaic, human population data, etc.) greatly improves the accuracy of the models.

Page 20

Munson, M. A., K. Webb, D. Sheldon, D. Fink, W. M. Hochachka, M. J. Iliff, M. Riedewald, D. Sorokina, B. L. Sullivan, C. L. Wood, and S. Kelling. 2009. The eBird Reference Dataset (http://www.avianknowledge.net/content/features/archive/eBird_Ref).

Page 21

The Multi-scale Modeling Challenge

Goal: Analysis at broad scale with fine resolution

Challenge: spatiotemporal patterning at multiple scales
• Local-scale
  – Fine-scale spatial and temporal resource patterns
• Large-scale
  – Regional & seasonal variation in species’ habitat utilization

Page 22

Wood Thrush

Page 23

SpatioTemporal Exploratory Model (STEM)

Current nonparametric SDMs are very good for local-scale modeling by relating environmental predictors (X) to observed occurrences (y):

$$ y = f(X) $$

Multi-scale strategy: differentiate between local and global-scale ST structure.

1. Make explicit time (t) and location (s): $f(X, s, t)$
2. “Regionalize” by restricting support: $f(X, s, t)\, I(s, t \in \theta)$, where $\theta$ is the restricted support set
3. Predictions at time (t) and location (s) are made by averaging across a set of local models containing that time and location:

$$ \frac{1}{n(s,t)} \sum_{i=1}^{m} f_i(X, s, t)\, I(s, t \in \theta_i) $$

where $\theta_i$ is the restricted support set of the $i$th model, $n(s,t)$ is the number of models supporting $(s,t)$, and $f_i$ is the $i$th ST explicit base model.
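
The averaging step can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the `BaseModel` class, its rectangular support, and `stem_predict` are hypothetical names, and each base model here predicts from X alone, with space and time entering only through the support indicator.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class BaseModel:
    """One local model f_i with a rectangular space-time support theta_i."""
    lon_min: float
    lon_max: float
    lat_min: float
    lat_max: float
    day_min: int
    day_max: int
    predict: Callable[[Sequence[float]], float]  # returns occurrence probability

    def supports(self, lon: float, lat: float, day: int) -> bool:
        """Indicator I(s, t in theta_i)."""
        return (self.lon_min <= lon <= self.lon_max
                and self.lat_min <= lat <= self.lat_max
                and self.day_min <= day <= self.day_max)


def stem_predict(models, x, lon, lat, day) -> Optional[float]:
    """Average the predictions of all base models whose support contains (s, t)."""
    preds = [m.predict(x) for m in models if m.supports(lon, lat, day)]
    if not preds:  # n(s, t) == 0: no model covers this point
        return None
    return sum(preds) / len(preds)


if __name__ == "__main__":
    models = [
        BaseModel(-80, -68, 40, 49, 100, 140, lambda x: 0.25),
        BaseModel(-76, -64, 38, 47, 120, 160, lambda x: 0.75),
        BaseModel(-120, -108, 30, 39, 100, 140, lambda x: 0.9),
    ]
    # Two of the three supports contain this point, so two models are averaged.
    print(stem_predict(models, [0.5], -75.0, 42.0, 130))
```

Returning `None` where no model has support keeps uncovered points visibly separate from low predicted occurrence.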

Page 24

The ST Ensemble

“Slice and dice” ST extent into stixels
• With sufficient overlap
• Adapt to different dynamics

Temporal Design:
• 40-day intervals
• 80 evenly spaced windows throughout the year

Spatial Design:
• For each time interval
• Random sample rectangles (12 deg lon x 9 deg lat)
• Minimum 25 unique locations
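
The temporal and spatial design above can be sketched as follows. Only the numbers taken from the slide (40-day windows, 80 windows per year, 12 x 9 degree rectangles, 25 unique locations) come from the source; the function names, the default study extent, and the uniform sampling of rectangle corners are assumptions made for illustration.

```python
import random


def temporal_windows(n_windows=80, width_days=40, year_days=365):
    """Start and end day of year for evenly spaced, overlapping windows.

    End days may exceed `year_days`; such windows wrap into the next year.
    """
    step = year_days / n_windows
    return [(i * step, i * step + width_days) for i in range(n_windows)]


def sample_stixels(locations, n_rect, rng, width_lon=12.0, height_lat=9.0,
                   min_locations=25, extent=(-125.0, -66.0, 24.0, 50.0)):
    """Draw random rectangles and keep those with enough unique locations.

    `locations` holds the (lon, lat) pairs observed within one time window.
    `extent` is (lon_min, lon_max, lat_min, lat_max); the default roughly
    covers the contiguous US and is an assumption.
    """
    lon_min, lon_max, lat_min, lat_max = extent
    unique = set(locations)
    kept = []
    for _ in range(n_rect):
        west = rng.uniform(lon_min, lon_max - width_lon)
        south = rng.uniform(lat_min, lat_max - height_lat)
        inside = [1 for lon, lat in unique
                  if west <= lon <= west + width_lon
                  and south <= lat <= south + height_lat]
        if len(inside) >= min_locations:
            kept.append((west, west + width_lon, south, south + height_lat))
    return kept


if __name__ == "__main__":
    windows = temporal_windows()
    print(len(windows), windows[:2])
    grid = [(lon, lat) for lon in range(-125, -65) for lat in range(24, 51)]
    print(len(sample_stixels(grid, 10, random.Random(0))))
```

With 80 windows of 40 days in a 365-day year, consecutive windows start about 4.6 days apart, so each day is covered by several windows, which gives the overlap the design calls for.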

Page 25

Western Meadowlark

Page 26

SpatioTemporal Variation of Local-scale Predictor Effects

Non-stationarity of species-habitat associations

Exploratory Inference:

Although many ecological processes are known or expected to vary in space and time, the vast majority of SDM is done for a single region and/or season. Our goal is therefore to develop techniques to explore patterns of variation in space and time, to provide ecologists and land managers with more accurate information about how species-habitat associations (requirements) change.

Page 27

Chimney Swift

Indigo Bunting

Page 28

Taking a data-intensive science approach requires a data management and research environment that supports the entire data life cycle, from acquisition, storage, management, and integration to data exploration, analysis, visualization, and other computing and information processing services.

Kelling, S., W. M. Hochachka, D. Fink, M. Riedewald, R. Caruana, G. Ballard, and G. Hooker. 2009. Data-intensive Science: A New Paradigm for Biodiversity Studies. BioScience 59:613-620.

Page 29

Scientific Exploration, Visualization, and Analysis Working Group

• Data Discovery, Access, and Synthesis
• Model Development
• Managing Computational Requirements
• Exploring and Visualizing Model Results
• Examples

Steve Kelling (co-chair), Cornell Lab of Ornithology
Bob Cook (co-chair), Oak Ridge National Lab
John Cobb, Oak Ridge National Lab
Theo Damoulas, Cornell University
Tom Dietterich, Oregon State
Juliana Freire, University of Utah
Daniel Fink, Cornell Lab of Ornithology
Damian Gesler, iPlant
Bill Michener, University of New Mexico
Jeff Morisette, USGS
Patrick O’Leary, U of Idaho
Alyssa Rosemartin, NPN
Suresh SanthanaVannan, Oak Ridge National Lab
Claudio Silva, University of Utah
Kevin Webb, Cornell Lab of Ornithology

Kelling, S., R. Cook, T. Damoulas, D. Fink, J. Freire, W. M. Hochachka, W. K. Michener, K. Rosenberg, and C. Silva. 2011. In press. Estimating species distributions, across space through time and with features of the environment.

Page 30

Observational Data Sources

Photo courtesy of www.carboafrica.net

Sensors, sensor networks, and remote sensing gather observations

Page 31

Data Interoperability

Our major data interoperability challenge was reconciling object-based models (i.e., vector entities such as locations where birds are observed) with field-based models (i.e., raster imagery composed of attribute values gridded in space) of storing geographic information. To make data interoperable we had to apply procedures that conflate point-location-based observations (e.g., bird observations) to match raster attribute data at the resolution of the raster data. For each observation location, we determine the cell in the raster grid into which the observation's location falls. We use the value of that cell's attribute as the attribute value for each observation.
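
The point-to-cell lookup described here amounts to integer arithmetic on the grid origin and cell size. The sketch below assumes a north-up grid stored as a list of rows with square cells; the function names, argument layout, and sample land-cover values are hypothetical.

```python
import math


def raster_value_at(raster, origin_lon, origin_lat, cell_size, lon, lat):
    """Return the attribute of the raster cell containing (lon, lat).

    `raster` is a list of rows, with row 0 the northernmost, and
    (origin_lon, origin_lat) is the grid's north-west corner.
    Returns None when the point falls outside the grid.
    """
    col = math.floor((lon - origin_lon) / cell_size)
    row = math.floor((origin_lat - lat) / cell_size)
    if 0 <= row < len(raster) and 0 <= col < len(raster[0]):
        return raster[row][col]
    return None


def annotate(observations, raster, origin_lon, origin_lat, cell_size):
    """Attach the containing cell's value to each (lon, lat, payload) record."""
    return [(lon, lat, payload,
             raster_value_at(raster, origin_lon, origin_lat, cell_size, lon, lat))
            for lon, lat, payload in observations]


if __name__ == "__main__":
    land_cover = [["forest", "water"],
                  ["crop", "urban"]]
    obs = [(-79.5, 43.5, "Wood Thrush"), (-78.5, 42.5, "Chimney Swift")]
    for record in annotate(obs, land_cover, -80.0, 44.0, 1.0):
        print(record)
```

Every observation inside one cell receives the same attribute value, so the raster resolution sets the finest environmental detail the joined data can carry.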

Page 32

Spatio-Temporal Exploratory Models predict the probability of occurrence of bird species across the United States at a 35 km x 35 km grid.

Patterns in Bird Species Occurrence Explored through Data Intensive Analysis and Visualization

Bird observations and environmental data from > 100,000 locations in US integrated and analyzed using High Performance Computing Resources

Potential Uses:
• Examine patterns of migration
• Infer impacts of climate change
• Measure patterns of habitat usage
• Measure population trends

[Figure: inputs labeled eBird, Land Cover, Meteorology, and MODIS (remote sensing data); model results showing Occurrence of Indigo Bunting (2008) in panels for Jan, Apr, Jun, Sep, and Dec.]

Page 33

Observations from bird watchers (citizen scientists): a huge number of birders collect 16 million observations each year.
Combine with environmental factors like land cover, landscape fragmentation, topography, human population, weather, and remote sensing data (greenness of terrestrial vegetation).
Integrating the data into one database is a challenge.
This huge amount of data can only be analyzed on supercomputers, using NSF TeraGrid High Performance Computing.
Models were used in the creation of the 2011 United States of America State of the Birds Report, entitled Birds in Public Lands and Waters.

Page 34

Gaining insight into the complexities and processes of natural systems is no longer an exclusive realm of theory and experiment; computation and access to large quantities of data are now equal and indispensable partners for advances in scientific knowledge, land management, and informed decision making.

Biodiversity Research and Conservation in a Digital World

Page 35

Funding and Acknowledgements

• National Science Foundation
• Leon Levy Foundation
• Wolf Creek Foundation

The volunteers who contributed millions of hours gathering bird observations.

Page 36

eBird and the Avian Knowledge Network

Art Munson - CU

Daniel Fink - CU

Wesley Hochachka - CU

Denis Lepage - BSC

Rich Caruana - MS

Mirek Riedewald - NEU

Daria Sorokina - CMU

Kevin Webb - CU

Giles Hooker - CU

Brian Sullivan - CU

Chris Wood - CU

Marshall Iliff - CU

Computational Sustainability

Carla Gomes - CU

Tom Dietterich - OSU

Daniel Sheldon - OCU

Ken Rosenberg - CU

Rebecca Hutchinson - OSU

Weng-Keen Wong - OSU

Megan MacDonald - CU

Stefan Hames - CU

Theo Damoulas - CU

Bistra Dilkina - CU

DataONE

Bill Michener - UNM

Bob Cook - ORNL

Jeff Morisette - USGS

Juliana Freire - UUT

Claudio Silva - UUT

Matt Jones - UCSB

Suresh SanthanaVannan - ORNL

Acknowledgements