2005.V Smith.Cybertaxonomy

What is Cybertaxonomy? Some case studies from a louse systematist. Vincent S. Smith, Illinois Natural History Survey, University of Illinois Campus, Champaign, Illinois, USA


Transcript of 2005.V Smith.Cybertaxonomy

Page 1: 2005.V Smith.Cybertaxonomy

What is Cybertaxonomy? Some case studies from a louse systematist

Vincent S. Smith

Illinois Natural History Survey, University of Illinois Campus

Champaign, Illinois, USA

Page 2: 2005.V Smith.Cybertaxonomy

“The bane of my existence is doing things that I know the computer could do for me”

-- Dan Connolly, The XML Revolution (Nature, 1998)

Page 3: 2005.V Smith.Cybertaxonomy

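Proportion of all described species…

Insect Ordinal Phylogeny… Sensu W.C. Wheeler et al., Cladistics 2001

Paraneoptera

[Pie chart: proportion of all described species by group. Groups shown: Insects, Arachnids, Crustaceans (2.4%), Other Arthropods (1.2%), Molluscs, Nematodes (0.9%), Other Invertebrates, Vertebrates, Plants, Fungi, Algae, Protozoans, Bacteria and Viruses (0.5%). Slice values not matched to labels in this transcript: 56.3%, 4.5%, 4.2%, 4.0%, 2.7%, 14.3%, 4.2%, 2.4%, 2.4%.]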

Page 4: 2005.V Smith.Cybertaxonomy

Proportion of all described species…

Insect Ordinal Phylogeny… Sensu W.C. Wheeler et al., Cladistics 2001

Paraneoptera

Phthiraptera (Parasitic lice)

[Pie chart: proportion of all described species by group, as on Page 3.]

Page 5: 2005.V Smith.Cybertaxonomy

Proportion of described insects (by order)…

Phthiraptera (Parasitic lice)

Proportion of all described species…

[Pie chart: proportion of all described species by group, as on Page 3, shown with a chart of described insects by order.]

Data from McGavin, 2001 & uBio (www.ubio.org)

Page 6: 2005.V Smith.Cybertaxonomy

“My” Data Set… …to Scale

- Valid Louse Species (4,927)
- Synonyms (2,314)
- Whole Frozen Specimens (20,000 con. est.)
- PDF Files (1,946)
- Bibliography (7,110 papers)
- Louse / Host Associations (10,574)
- Parasitized Mammals (1,214 spp.)
- Parasitized Birds (3,508 spp.)
- Morphological Characters (548)
- Phylogenies (27)
- Museum Specimens
- Images (4,927)
- DNA Sequences (2,992)

Page 7: 2005.V Smith.Cybertaxonomy

Data storage device…

Palma, R.L., and R.L.C. Pilgrim. 2002. A revision of the genus Naubates (Insecta: Phthiraptera: Philopteridae). J. R. Soc. N.Z. 32:7-60.

142 pieces of “raw” data in 4 of 54 pages, in 1 of 7,110 taxonomic papers on lice

- Taxonomic names
- Authorities (name concepts)
- Citations
- Collection data
- Morphological characters
- Textual descriptions
- Diagnostic keys
- Illustrations
- Photographs

Page 8: 2005.V Smith.Cybertaxonomy

Immediate Collaborators…

• Alpha taxonomists

• Ecologists

• Phylogeneticists

• Vector biologists

• “Amateurs”

• Professionals

• Retired professionals

Contributors to “my” dataset

…necessitates tools for collaboration

Page 9: 2005.V Smith.Cybertaxonomy

Web Databases…

- Specimens: LouseBASE. Glasgow version at: http://darwin.zoology.gla.ac.uk/~rpage/LouseBase/2/
- Images: SID, http://darwin.zoology.gla.ac.uk/~SID/
- Literature: PHPBib, http://myphpbib.sourceforge.net/
- Lab Notebook: http://www2.flmnh.ufl.edu/pdb/
- Host-Parasite Checklists: http://www2.flmnh.ufl.edu/adb/

Page 10: 2005.V Smith.Cybertaxonomy

Web Databases…

Literature: PHPBib, http://myphpbib.sourceforge.net/

On-line Reference Manager. Developed by Rod Page.

• Import via PubMed IDs, e.g. 15472046 (Smith, V.S. 2004. Lousy Lists. Systematic Biology 53: 666-668.)

• Import / export Endnote & Reference Manager format

• Browse, search & edit

• Multi-user

Nice example at: http://darwin.zoology.gla.ac.uk/~kdavis/bib/

Page 11: 2005.V Smith.Cybertaxonomy

Web Databases…

Lab Notebook: http://www2.flmnh.ufl.edu/pdb/

On-line Molecular Lab. Notebook. Developed in David Reed’s lab.

• DNA extraction, PCR & microsatellite workflows

• Stores electropherograms

• Multi-user

• Includes DNA primer database

Page 12: 2005.V Smith.Cybertaxonomy

Web Databases…

Host-Parasite Checklists: http://www2.flmnh.ufl.edu/adb/

On-line Host-Louse Checklist. Developed in Dale Clayton’s lab.

• Host-louse associations

• Simple search & browse by host or louse classification

Page 14: 2005.V Smith.Cybertaxonomy

LouseBASE - specimen database

http://darwin.zoology.gla.ac.uk/~rpage/LouseBase/2/

For curation of specimens and related data

• Easy addition of new collections

• Extensive browse & search options

• Manages DNA extracts & sequences

• Export formatted datasets

- FASTA (DNA sequences)

- Se-Al (sequence alignment)

- TreeMap (cophylogenetic analyses)

- Sequin (GenBank sequence submission)

R. Page & V. Smith, in use since June 1999
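
The FASTA export listed on this slide can be pictured with a short sketch. This is not LouseBASE's own code: the function name `to_fasta`, the record labels, and the sequences are made up for illustration. Only the FASTA layout itself (a `>` header line followed by wrapped sequence lines) is standard.

```python
from typing import Iterable, Tuple


def to_fasta(records: Iterable[Tuple[str, str]], width: int = 60) -> str:
    """Format (identifier, sequence) pairs as FASTA text.

    Each record becomes a '>' header line followed by the sequence
    wrapped to `width` characters per line.
    """
    if width < 1:
        raise ValueError("width must be at least 1")
    lines = []
    for identifier, sequence in records:
        # Remove whitespace so pre-wrapped input does not distort line lengths.
        cleaned = "".join(sequence.split()).upper()
        lines.append(f">{identifier}")
        for start in range(0, len(cleaned), width):
            lines.append(cleaned[start:start + width])
    return "".join(line + "\n" for line in lines)


# Hypothetical records: made-up specimen labels and made-up sequences.
example = [
    ("specimen_001 Pediculus humanus", "atgcgt acgtta"),
    ("specimen_002 Pediculus humanus", "ATGCGTACGTTAGG"),
]
fasta_text = to_fasta(example, width=10)
```

Keeping the export as a pure function over (identifier, sequence) pairs means the same records could feed other formatters without touching the database layer.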

Page 15: 2005.V Smith.Cybertaxonomy

SID - Specimen Image Database

• Bulk image upload & annotation

• Extensive browse & search options

• Copyright options to control image use

• Web image labeling

• Designation of “exemplar” images

• RSS & web service access

http://darwin.zoology.gla.ac.uk/~sid/

For sharing & disseminating images

S. Rycroft & V. Smith, in use since June 2004

Page 16: 2005.V Smith.Cybertaxonomy

Tools for collaboration…

Images, Literature, Lab Notebooks, Specimens, Checklists

Page 17: 2005.V Smith.Cybertaxonomy

Tools for collaboration…

Images, Literature, Lab Notebooks, Specimens, Checklists

• Synthesize and integrate data

• Share data

• Minimize errors

• Reduce redundancy

ASK NEW QUESTIONS OF THE DATA

… document patterns

… understand process

1) Database meta-analysis

2) Integrative studies across databases

Page 18: 2005.V Smith.Cybertaxonomy

Coextinction… The loss of a species with the loss of its affiliate

California condor and its now extinct louse (Colpocephalum californici)

Database meta-analysis: Louse Checklist, Association Database

- Price, R. D. et al. 2003. The Chewing Lice: World Checklist. INHS.
- Durden & Musser. 1994. The sucking lice of the world. AMNH.

[Figure: Coextinction Prediction Curves. Horizontal axis: proportion of extinct hosts (E), 0.0 to 1.0. Vertical axis: proportion of affiliates extinct (A), up to 1.0. Curves: Ficus wasps - Ficus; Ant butterflies - ants; Primate fungi - primates; Primate nematodes - primates; Primate lice - primates; Seabird lice - seabirds; Bird mites - birds; Butterflies - host plants.]

Koh, Dunn, Sodhi, Colwell, Proctor, & Smith. 2004, Science.
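
One way to see why curves of affiliate extinction (A) against host extinction (E) bend is a simple probability sketch. It is an illustration only and not the model fitted by Koh et al.: it assumes hosts go extinct at random and that an affiliate is lost only when every one of its hosts is lost. The function names and the host counts in the example are made up.

```python
from math import comb
from typing import Sequence


def affiliate_loss_probability(n_hosts: int, n_extinct: int, specificity: int) -> float:
    """Chance that all `specificity` hosts of one affiliate are among the
    `n_extinct` hosts lost at random from a pool of `n_hosts`."""
    if not 0 <= n_extinct <= n_hosts:
        raise ValueError("n_extinct must lie between 0 and n_hosts")
    if not 1 <= specificity <= n_hosts:
        raise ValueError("specificity must lie between 1 and n_hosts")
    # Ways to draw the affiliate's hosts entirely from the extinct set,
    # divided by ways to draw them from all hosts (comb returns 0 when
    # there are fewer extinct hosts than the affiliate has hosts).
    return comb(n_extinct, specificity) / comb(n_hosts, specificity)


def expected_proportion_extinct(
    n_hosts: int, n_extinct: int, specificities: Sequence[int]
) -> float:
    """Expected proportion of affiliates extinct (A) for a given host loss."""
    if not specificities:
        raise ValueError("at least one affiliate is required")
    total = sum(
        affiliate_loss_probability(n_hosts, n_extinct, s) for s in specificities
    )
    return total / len(specificities)


# Made-up example: 10 host species and four affiliates using 1, 1, 2 and 5 hosts.
curve = [
    (k / 10, expected_proportion_extinct(10, k, [1, 1, 2, 5]))
    for k in range(11)
]
```

Under these assumptions an affiliate restricted to a single host has A equal to E, while affiliates with several hosts stay below that line until most hosts are gone.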

Page 19: 2005.V Smith.Cybertaxonomy

Coextinction… The loss of a species with the loss of its affiliate

California condor and its now extinct louse (Colpocephalum californici)

[Figure panel (IUCN Red List): Endangered Hosts (Thousands) against Projected Extinctions (Hundreds). Affiliate - host groups listed: Beetles - Plants; Monogeneans - Fish; Bird lice - Birds; Bird mites - Birds; Butterflies - Plants; Mammal lice - Mammals; Primate nematodes - Primates; Primate lice - Primates; Primate Pneumocystis fungi - Primates; Ficus wasps - Ficus; Lycaenid butterflies - Ants; Metazoan parasites - Canadian fish. One annotation reads (4672, 0.4%).]

>6,300 co-endangered affiliates from 9,491 hosts

[Figure panel (IUCN Red List): Extinct Hosts against Extinct Affiliates (Estimated). Affiliate - host groups listed: Beetles - Plants; Monogeneans - Fish; Bird lice - Birds; Bird mites - Birds; Mammal lice - Mammals; Butterflies - Plants.]

>200 co-extinct affiliates from 399 extinct hosts

[Figure: Coextinction Prediction Curves, as on Page 18.]

Koh, Dunn, Sodhi, Colwell, Proctor, & Smith. 2004, Science.

Page 20: 2005.V Smith.Cybertaxonomy

Integrative studies… Human-Louse Coevolution

Reed, Smith et al., PLoS Biology 2004.

[Figure: scatter plot of Function 1 against Function 2 for females, with groups 1 Body Lice, 2 Head Lice, 3 Pubic Lice, group centroids, and ungrouped cases. Images labelled Head Lice, Body Lice, Pubic Lice.]

Page 21: 2005.V Smith.Cybertaxonomy

Integrative studies…

Human-Louse Coevolution

Reed, Smith et al., PLoS Biology 2004.

[Images: Head Lice, Body Lice, Pubic Lice]

Lice Predate the K-T boundary

Wappler, Smith & Dalgleish, Proc. R. Soc., 2004.

Smith et al., in prep.

[Figure: dated timeline marking the K-T boundary (KT) and Modern Birds.]

Page 22: 2005.V Smith.Cybertaxonomy

Questions facilitated by the databases…

Images, Literature, Lab Notebooks, Specimens, Checklists

…but

Page 23: 2005.V Smith.Cybertaxonomy

Shared “raw” data lacks integration…

Images, Literature, Lab Notebooks, Specimens, Checklists

… “raw” data types

- Taxon names, - Globally unique identifiers, - Authority taxon concepts, - Dates of modification, - Institution codes, - Collection codes, - Catalogue numbers, - Information withheld, - Geography, - Localities, - Collection methods, - Preservation methods, - Geological periods, - Date collected, - Collector names, - Sex, - Life stage, - Publication titles, - Publication journals, - Morphological characters, - Morphological character states, - Anatomical names, - Anatomical definitions, - Image types, - Copyright licences, - Access permissions, - Video clips, - Textual descriptions, - Sound recordings, - DNA sequences, - Gene names, - DNA sequence alignments, - Phylogenetic analytical methods, - Phylogenetic trees, - Phylogenetic datasets, - Morphometric data, - Gene orders, - Character state transitions, etc, etc, etc…

BioCorder - Biological Recorder

http://www.biocorder.org/
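
A few of the “raw” data types listed on this slide can be gathered into one record to show what integration has to reconcile. This is a hypothetical sketch, not BioCorder's schema: the class name, field names, key format, and sample values are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class SpecimenRecord:
    """A hypothetical record combining several of the listed data types."""
    taxon_name: str
    institution_code: str
    collection_code: str
    catalogue_number: str
    locality: Optional[str] = None
    date_collected: Optional[str] = None  # ISO 8601 text, e.g. "2004-06-15"
    sex: Optional[str] = None
    life_stage: Optional[str] = None

    def specimen_key(self) -> str:
        """Join the three codes into one key for matching across databases."""
        return ":".join(
            [self.institution_code, self.collection_code, self.catalogue_number]
        )


def group_by_taxon(records: Iterable[SpecimenRecord]) -> Dict[str, List[str]]:
    """Map each taxon name to the keys of the specimens recorded under it."""
    grouped: Dict[str, List[str]] = {}
    for record in records:
        grouped.setdefault(record.taxon_name, []).append(record.specimen_key())
    return grouped


# Made-up records; the codes and numbers are placeholders, not real holdings.
records = [
    SpecimenRecord("Pediculus humanus", "XYZ", "ENT", "000123", sex="female"),
    SpecimenRecord("Pediculus humanus", "XYZ", "ENT", "000124", life_stage="nymph"),
]
by_taxon = group_by_taxon(records)
```

A shared key of this kind is what lets an image, a DNA sequence, and a checklist entry held in separate databases be recognised as referring to the same specimen.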

Page 24: 2005.V Smith.Cybertaxonomy

BioCorder - Biological Recorder

http://www.biocorder.org/

A web-based framework for systematic research…

• Distributed

• Modular

• Cross platform

• Generic (any taxa)

• Free

• Intuitive

Awarded $500k by NSF, March ‘05

A collaboration between: Rod Page, David Reed, Vince Smith, Mark Hafner

Page 25: 2005.V Smith.Cybertaxonomy

BioCorder & Taxonomy

- BioCorder Installation: “AnyTaxon” (www.AnyTaxonSite.org)
- “Name Banks” (I.P.N.I): dictionaries of taxonomic names
- Taxonomic Search Engines: thesaurus of classifications

Page 26: 2005.V Smith.Cybertaxonomy

BioCorder & Taxonomy

- BioCorder Installation: “AnyTaxon” (www.AnyTaxonSite.org)
- “Name Banks” (I.P.N.I): dictionaries of taxonomic names
- Taxonomic Search Engines: thesaurus of classifications
- Taxon name
- Identifier shown on slide: urn:lsid:sid.zoology.gla.ac.uk:id:6
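
The identifier on this slide follows the Life Science Identifier (LSID) URN layout, `urn:lsid:authority:namespace:object`, optionally followed by a revision. A minimal parsing sketch is given below. The `Lsid` type and `parse_lsid` function are hypothetical names, not part of BioCorder or SID.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


def parse_lsid(text: str) -> Lsid:
    """Split an LSID URN into its authority, namespace, object and revision."""
    parts = text.strip().split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"expected 5 or 6 colon-separated parts: {text!r}")
    # The 'urn' and 'lsid' prefixes are compared case-insensitively.
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID URN: {text!r}")
    authority, namespace, object_id = parts[2:5]
    if not (authority and namespace and object_id):
        raise ValueError(f"empty LSID component in {text!r}")
    # An absent or empty sixth part means no revision was given.
    revision = parts[5] if len(parts) == 6 and parts[5] else None
    return Lsid(authority, namespace, object_id, revision)


slide_lsid = parse_lsid("urn:lsid:sid.zoology.gla.ac.uk:id:6")
```

Here the authority is the host name of the issuing database, which is what lets a name bank or search engine work out where to resolve the identifier.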

Page 27: 2005.V Smith.Cybertaxonomy

BioCorder & Taxonomy

Taxonomic Search Engines: thesaurus of classifications

- Glasgow TSE: http://darwin.zoology.gla.ac.uk/~rpage/portal/
- uBio TSE: http://uio.mbl.edu/SOAPbrowser/

Taxonomy Tools

http://names.mbl.edu/tools/

http://darwin.zoology.gla.ac.uk/~rpage/portal/

- Check spelling
- Name canonizer
- Checklist parser

Ontologies: map relationships between taxonomic concepts

Visualizations: visualize & explore classifications
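
As a rough idea of what a name canonizer does, the sketch below strips the author and year from a scientific name string. It is a naive heuristic written for illustration, not the uBio or Glasgow tool. `canonical_name` is a hypothetical function, the particle list is an incomplete assumption, and cases such as subgenera in parentheses, hybrids, and rank markers like "subsp." are ignored.

```python
import re

# Lowercase particles that begin some author names (assumed, incomplete list).
AUTHOR_PARTICLES = {"de", "del", "da", "du", "la", "le", "van", "von"}

# An epithet is taken to be a lowercase word, possibly hyphenated.
EPITHET = re.compile(r"^[a-z][a-z-]*$")


def canonical_name(name_string: str) -> str:
    """Keep the genus plus up to two lowercase epithets; drop the rest."""
    tokens = name_string.replace(",", " ").split()
    if not tokens:
        return ""
    kept = [tokens[0]]  # the first token is treated as the genus
    for token in tokens[1:]:
        if len(kept) == 3:  # genus + species + one infraspecific epithet
            break
        # Stop at the first token that looks like an author, year or bracket.
        if token in AUTHOR_PARTICLES or not EPITHET.match(token):
            break
        kept.append(token)
    return " ".join(kept)


with_author = canonical_name("Pediculus humanus Linnaeus, 1758")
# "Author, 1900" below is a placeholder, not a real authority.
trinomial = canonical_name("Genus species subspecies Author, 1900")
```

Reducing names to a canonical form like this is what allows the same taxon to be matched across checklists that cite authorities differently.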

Page 28: 2005.V Smith.Cybertaxonomy

BioCorder & Morphology (A special case of taxonomy)

Smith, 2001

Text Descriptions, Keys, Images

Anatomical Terms (like taxonomic names): Definitions, Synonyms, Relationships

Page 30: 2005.V Smith.Cybertaxonomy

BioCorder & Morphology (A special case of taxonomy)

Smith, 2001

Text Descriptions, Keys, Images

Anatomical Terms (like taxonomic names): Definitions, Synonyms, Relationships

“Name Banks” (I.P.N.I): dictionaries of taxonomic names

Page 31: 2005.V Smith.Cybertaxonomy

BioCorder & Morphology (A special case of taxonomy)

Smith, 2001

Text Descriptions, Keys, Images

Anatomical Terms (like taxonomic names): Definitions, Synonyms, Relationships

“Anatomy Banks”: controlled vocabularies of anatomical terms

Main Entry: tho·rax
Pronunciation: 'thOr-"aks
Function: noun
Inflected Form(s): plural tho·rax·es or tho·ra·ces
Etymology: from Latin thorac
1 : the part of the mammalian body between the neck and the abdomen
2 : the middle of the three chief divisions

Page 32: 2005.V Smith.Cybertaxonomy

BioCorder & Morphology (A special case of taxonomy)

Smith, 2001

Text Descriptions, Keys, Images

Anatomical Terms (like taxonomic names): Definitions, Synonyms, Relationships

Taxonomic Search Engines: thesaurus of classifications

“Anatomy Banks”: controlled vocabularies of anatomical terms

[Dictionary entry for "thorax", as on Page 31.]

Page 33: 2005.V Smith.Cybertaxonomy

BioCorder & Morphology (A special case of taxonomy)

Smith, 2001

Text Descriptions, Keys, Images

Anatomical Terms (like taxonomic names): Definitions, Synonyms, Relationships

Anatomical Search Engines: seeks terms & defines relationships

“Anatomy Banks”: controlled vocabularies of anatomical terms

[Dictionary entry for "thorax", as on Page 31.]

Page 34: 2005.V Smith.Cybertaxonomy

BioCorder & Morphology (A special case of taxonomy)

Smith, 2001

Text Descriptions, Keys, Images

Anatomical Terms (like taxonomic names): Definitions, Synonyms, Relationships

Why…?

• Gene ontology groups already doing this

• Comparative anatomical searches possible

• Assists with descriptions, keys & image annotation

“Anatomy Banks”: controlled vocabularies of anatomical terms

[Dictionary entry for "thorax", as on Page 31.]

Anatomical Search Engines: seeks terms & defines relationships

Page 35: 2005.V Smith.Cybertaxonomy

BioCorder & Search

World Wide Web: semantics of the resource gleaned from content by user

[Diagram: a set of nodes each labelled Resource, connected by untyped linksTo links.]

BioCorder & Semantic Web: semantics of the resource processed by computer

[Diagram: typed nodes (LouseTaxon, MammalTaxon, Images, Classification, Checklist, Museum, Phylogeny, Place, DNA Sequences, Morph. Characters) connected by typed relationships (examplesOf, isPartOf, inPartOf, hasSpecimens, isBasedOn, presentAt, requires).]

e.g. Pediculus humanus (human head & body louse)
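
The contrast between untyped links and typed relationships can be sketched with a handful of subject-predicate-object triples. The predicate names come from the slide, but which nodes each one connects is not recoverable from this transcript, so the specific triples below are assumptions, as are the `TripleStore` class and its methods.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class TripleStore:
    """A tiny in-memory store of (subject, predicate, object) statements."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._triples.add((subject, predicate, obj))

    def match(
        self,
        subject: Optional[str] = None,
        predicate: Optional[str] = None,
        obj: Optional[str] = None,
    ) -> List[Triple]:
        """Return matching triples in sorted order; None acts as a wildcard."""
        return sorted(
            t
            for t in self._triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


store = TripleStore()
louse = "LouseTaxon:Pediculus humanus"
# Assumed statements; the slide does not fix these exact connections.
store.add("Images", "examplesOf", louse)
store.add(louse, "isPartOf", "Classification")
store.add(louse, "presentAt", "Place")
store.add("Museum", "hasSpecimens", louse)
store.add("Phylogeny", "requires", "DNASequences")
store.add("Phylogeny", "requires", "Morph.Characters")

needs = store.match(subject="Phylogeny", predicate="requires")
```

Because each link carries a named relationship, a program can answer a question such as "what does a phylogeny require?" directly, where an untyped linksTo link would leave that judgment to a human reader.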

Page 36: 2005.V Smith.Cybertaxonomy

BioCorder & Search

[Diagrams as on Page 35: World Wide Web (semantics of the resource gleaned from content by user) and BioCorder & Semantic Web (semantics of the resource processed by computer), e.g. Pediculus humanus (human head & body louse).]

More details & examples at:

http://darwin.zoology.gla.ac.uk/~vsmith/cybertaxonomy/

http://www.biocorder.org/

Page 37: 2005.V Smith.Cybertaxonomy

So what is cybertaxonomy…

“The bane of my existence is doing things that I know the computer could do for me”

-- Dan Connolly, The XML Revolution (Nature, 1998)

Page 38: 2005.V Smith.Cybertaxonomy

Cybertaxonomy Roadmap

1) Stand Alone Databases, e.g. Biota. Databases function independently on stand-alone computers.

2) Stand Alone Databases with Web Interface, e.g. LouseBASE. Web interface permits multiple users from different locations.

3) Linked Databases with Web Interface, e.g. GBIF portal. Centralized databases connected, permitting queries across independent databases.

4) Semantic Web Enabled Databases with Web Interface, e.g. SID & BioCorder. Network of semantic web enabled, distributed & specialized databases.

Semantically Integrated Data

Data translation tools & “screen scrapers”, e.g. http://darwin.zoology.gla.ac.uk/~rpage/hacks/

http://darwin.zoology.gla.ac.uk/~vsmith/cybertaxonomy/

http://www.biocorder.org/