Brian J. Enquist Dept. Ecology and Evolutionary Biology University of Arizona, Tucson, A.Z. and The Santa Fe Institute, Santa Fe, N.M. BIEN, MBG, iPToL, Developing a Taxonomic Name Resolution Service Overview of Science and Initial Meeting Goals

Page 1

Brian J. Enquist

Dept. Ecology and Evolutionary Biology

University of Arizona, Tucson, A.Z.

and

The Santa Fe Institute, Santa Fe, N.M.

BIEN, MBG, iPToL

Developing a Taxonomic Name Resolution Service: Overview of Science and Initial Meeting Goals

Page 2

BIEN – Botanical Information and Ecology Network

Specimens + ecological observations (species/individual observations, trait data)

Page 3

Map by N. Swenson

Green et al. Science (2008)

Merging plant specimens from various herbaria, plots, and ecological data (functional traits)

Swenson and Enquist (2007) American Journal of Botany

Example of potential science to be done . . .

Page 4

BIEN is a synthetic NCEAS working group

Systematists, ecologists, informaticians and computer scientists

We are presenting a new model for asking questions in botanical science

BIEN NCEAS Working Group Dec. 2008 --> 2009, 2010

Page 5

www.salvias.net

(1) Plot Data

Literally thousands of ecological vegetation plots by numerous researchers

Plots with plant community abundance, diversity, size information

Building on work by Alwyn Gentry . . .

Page 6

Integrating Plot Networks

Page 7

(2) Specimen Data . . .

Page 8

BIEN Initial Roadmap

[Diagram; labels: Plot and Trait Data; Specimen Data; Exchange schema(s); Data Scrubbing / Correcting; Data Standardization Tools; Taxonomic Intelligence; Confederated resource; Science]

Page 9

BIEN Initial Project Goals

(1) Specific short-term science questions at the nexus of merging herbarium, plot (abundance), and observation (trait) data for plants in the Americas.

(2) Technology development goals associated with answering these questions effectively – as well as to establish an informatics methodology for continuing to assemble and integrate relevant observation data for this and other projects.

(3) Longer-term program development – seek support to develop a permanent technical solution to the integration of vegetation/trait/botanical data.

Page 10

BIEN Initial Science Goals

Q 1: How does climate influence the relative distribution of narrow and widespread species? Do these relationships vary in tropical and temperate environments?

Q 2: How are abundance and range size related? For example, do trees with small ranges tend to be rare, relative to widespread species?

Q 3: What are the physiological, demographic, environmental, and phylogenetic correlates of rarity and commonness across environmental gradients? Can these correlates be used to predict vulnerability or resistance to extinction for species and communities under differing scenarios of habitat loss and climate change?

Page 11

University of Arizona Diversity Mapper

Use collections and observations from plots to then map/model distributions

Page 12

BIEN2.0 Database (March 2010)

BIEN database is housed at NCEAS and contains

• Core specimen datasets for the Americas (GBIF, MBG): 7.5 million specimens

• Core plots datasets (FIA, CTFS, VEGBANK, SALVIAS): 10.7 million plot observations

• Core species trait database (literature and collaborators): Traits: 25, Species: 36,241, Records: 122,230

~ 18 million individual taxon strings

~ 250,000 unique taxa (!), indicating that synonyms comprise a large fraction of names

Use these data to generate measures of geographic ranges, measures of abundance and trait composition
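As a rough illustration of how such measures might be derived once names are standardized, the sketch below computes a simple latitudinal range per species from specimen records and a mean per-plot abundance from plot records. The record layouts, function names, and the choice of latitudinal extent as a range measure are assumptions made for illustration; they are not the BIEN database schema or the project's actual methods.

```python
from collections import defaultdict


def latitudinal_range(occurrences):
    """Return {species: max latitude - min latitude}.

    `occurrences` is an iterable of (species, latitude, longitude) tuples,
    a hypothetical flat layout for georeferenced specimen records.
    """
    latitudes = defaultdict(list)
    for species, lat, _lon in occurrences:
        latitudes[species].append(lat)
    return {sp: max(vals) - min(vals) for sp, vals in latitudes.items()}


def mean_plot_abundance(plot_counts):
    """Return {species: mean individuals per plot in which it occurs}.

    `plot_counts` is an iterable of (plot_id, species, count) tuples,
    a hypothetical flat layout for plot observations.
    """
    totals = defaultdict(int)
    plots = defaultdict(set)
    for plot_id, species, count in plot_counts:
        totals[species] += count
        plots[species].add(plot_id)
    return {sp: totals[sp] / len(plots[sp]) for sp in totals}
```

Both functions key on the species string, so any unresolved synonym or misspelling would split one taxon into several apparent species. That is the practical reason name resolution has to come before measures like these.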

Page 13

Longer-term program development

- Spring 2009: expanded the BIEN group and submitted a proposal to become a ‘Grand Challenge’ team at iPlant

Long story . . . the central Grand Challenge team is the ‘Tree of Life’ team (iPToL)

- 2010: formalized a two-year partnership with iPlant to work closely with iPToL

- To fund collaborations with MBG

- To develop cyberinfrastructure to support development of a TNRS and synonymized New World Plant Names Database (NWPND)

- To ‘do science’ by merging ecology and traits on iPToL phylogenies

Page 14

Why Develop a TNRS?

The ‘names problem’ is one of the most fundamental impediments to biological science.

The plant sciences do not yet have the cyberinfrastructure needed for taxonomic standardization.

Our inability to integrate heterogeneous data sources across differing temporal and spatial scales has significantly limited our ability to answer key scientific questions (applied and academic).

Page 15

Relationship Between Institutional and Efforts

[Diagram; labels: Taxonomic Name Resolution Service; iPlant Tree of Life (iPToL); iPlant cyberinfrastructure development team; iPlant; NCEAS; BIEN; MOBOT Informatics; APWeb Group; APWeb; Generic Synonymy; Tropicos Core: Names, Specimens, References, People; Projects / Concepts; Alt. Classifications; Literature links; Vouchered Images; The Plant List]

Page 16

Specific goals for this meeting

The primary goals of this BIEN/MBG/iPToL collaboration are to work with software developers and technology staff at iPlant to:

- Develop a botanical Taxonomic Name Resolution Service (TNRS) to authenticate botanical names against a standard list of published taxa.

- Develop an authoritative, synonymized NWPND that will allow New World plant occurrence data to be mapped to a standard set of taxon concepts.

Page 17

Technology to be developed with iPlant

- The TNRS & NWPND will be web services, with batch-processing capability and an intuitive user interface.

- The TNRS & NWPND services will be available to any user interested in correcting and harmonizing names of plant taxa.

Developed with the assistance of programmers and technical support from iPlant.

The development of the TNRS & NWPND necessitates short- and long-term goals. It is up to us to define these goals in order to develop this technology.

Page 18

General Goals for This Meeting

(i) Overview of science - Science goals will help define use cases, needs, and structure of the TNRS.

(ii) Agree upon outcomes - Ensure MBG, BIEN, iPToL and collaborators are clear on respective needs and desired outcomes.

(iii) Articulate ‘use cases’ - will define what TNRS should and should not do as well as clarify programming and technology needs.

(iv) Articulate short- and long-term goals for the TNRS

- Rank in order of importance

- Rank in order of ease of implementation

(v) Detail longer-term vision and goals - for next TNRS meetings and technology development goals between now and future meetings.

Page 19

To start, we need to clearly articulate short- and long-term goals.

Initially, toward a TNRS, we propose two lines of development:

(1) Name Matching

(2) Synonymy

These lines of development can start separately but ultimately will merge.

Page 20

Short-term goals (implement within first year)

(1) Name matching

Data

Compile a complete list of names within TROPICOS and ideally IPNI. Flag taxa occurring within the Americas.

Applications

Define and articulate the basic functions of the proposed web service, including but not limited to:

- Match a user-submitted list of names against an authoritative list.

- Build an interface for user interpretation and adjudication of ambiguous cases based on match rankings.

- Catch and correct common spelling mistakes, atomize data elements, extract additional concatenated information, etc.
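The name-matching functions listed on this slide can be sketched in a few lines. The example below is a minimal illustration, not the TNRS design: it assumes a small in-memory authoritative list, uses Python's standard `difflib.SequenceMatcher` as a stand-in for whatever fuzzy-matching method the service eventually adopts, and the function names, the `cutoff` threshold, and the normalization rule are all hypothetical.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Collapse whitespace and standardize case (capitalized genus, lowercase rest)."""
    parts = name.split()
    if not parts:
        return ""
    return " ".join([parts[0].capitalize()] + [p.lower() for p in parts[1:]])


def match_names(submitted, authoritative, cutoff=0.8, max_candidates=3):
    """Return {submitted name: [(candidate, score), ...]} ranked best-first.

    An exact match after normalization scores 1.0. Otherwise every reference
    name whose similarity ratio is at least `cutoff` is kept, so that a user
    can adjudicate ambiguous cases from the ranked list.
    """
    reference = {normalize(n) for n in authoritative}
    results = {}
    for raw in submitted:
        query = normalize(raw)
        if query in reference:
            results[raw] = [(query, 1.0)]
            continue
        scored = [(SequenceMatcher(None, ref, query).ratio(), ref) for ref in reference]
        ranked = sorted((s, r) for s, r in scored if s >= cutoff)
        ranked.reverse()  # highest score first
        results[raw] = [(r, round(s, 3)) for s, r in ranked[:max_candidates]]
    return results


if __name__ == "__main__":
    authority = ["Quercus alba", "Quercus rubra", "Pinus ponderosa"]
    for name, candidates in match_names(["quercus  alba", "Pinus ponderossa"], authority).items():
        print(repr(name), "->", candidates)
```

Returning a ranked candidate list rather than a single best guess keeps the adjudication step in the user's hands. A linear scan like this would not scale to the millions of taxon strings mentioned earlier; a production service would need indexing or blocking by genus.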

Page 21

Short-term goals (implement within first year)

(2) Synonymy

Data:

- Digitized synonymized checklists within MBG.

- Identify other digitized sources of monographic and regional synonymy.

The expectation is that these will provide a common framework for combining various authoritative lists into a first draft of a New World plant taxon checklist.

Applications

- Define functions of application for checking submitted names against list of synonymized names.

- Interface for user interpretation and adjudication of ambiguous cases.

- Ranking of degrees of ambiguity.
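Checking a submitted name against a synonymized list can be illustrated with a simple lookup table. The sketch below is an assumption-laden example rather than the NWPND design: the checklist layout (accepted name plus a list of synonyms), the function names, the four status labels, and the placeholder taxon names are all invented for illustration.

```python
def build_synonym_index(checklist):
    """Map every name (accepted or synonym) to the set of accepted names it may refer to.

    `checklist` is an iterable of (accepted_name, [synonym, ...]) pairs,
    a hypothetical layout for a digitized synonymized checklist.
    """
    index = {}
    for accepted, synonyms in checklist:
        index.setdefault(accepted, set()).add(accepted)
        for synonym in synonyms:
            index.setdefault(synonym, set()).add(accepted)
    return index


def resolve(name, index):
    """Return (status, accepted_names) for a submitted name.

    status is one of 'accepted', 'synonym', 'ambiguous', or 'unresolved'.
    A name tied to more than one accepted name is flagged as ambiguous so a
    user can adjudicate it instead of the service choosing silently.
    """
    accepted = index.get(name)
    if not accepted:
        return ("unresolved", [])
    if len(accepted) > 1:
        return ("ambiguous", sorted(accepted))
    (only,) = accepted
    return ("accepted" if only == name else "synonym", [only])


if __name__ == "__main__":
    # Placeholder names only; these are not real taxa.
    checklist = [
        ("Aus bus", ["Aus cus", "Xus dus"]),
        ("Aus eus", ["Xus dus"]),
    ]
    index = build_synonym_index(checklist)
    for name in ["Aus bus", "Aus cus", "Xus dus", "Nus nus"]:
        print(name, "->", resolve(name, index))
```

The number of accepted names returned for an ambiguous entry gives a crude first measure for ranking degrees of ambiguity; combining several authoritative lists would likely increase how often that case arises.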

Page 22

In this meeting we will not create a requirements document or detailed work flows.

The charge for the meeting is to finalize short-term goals for the TNRS and immediate ‘use cases’ in order to develop the bullet points of these documents.

- These documents will be created on the project wiki after the meeting, with feedback from the TNRS collaborators.

- The documents will then be used to start developing the TNRS prototype this year, to be implemented by technology staff and programmers at iPlant.