What is Cybertaxonomy? Some case studies from a louse systematist
Vincent S. Smith
Illinois Natural History Survey, University of Illinois Campus
Champaign, Illinois, USA
“The bane of my existence is doing things that I know the computer could do for me”
-- Dan Connolly, The XML Revolution (Nature, 1998)
Proportion of all described species…
Insects; Arachnids; Crustaceans 2.4%; Other Arthropods 1.2%; Molluscs; Nematodes 0.9%; Other Invertebrates; Vertebrates; Plants; Fungi; Algae; Protozoans; Bacteria and Viruses 0.5%
Other segment values, in slide order: 56.3%, 4.5%, 4.2%, 4.0%, 2.7%, 14.3%, 4.2%, 2.4%, 2.4%
Insect Ordinal Phylogeny… Paraneoptera
Sensu W.C. Wheeler et al., Cladistics 2001
Proportion of described insects (by order)…
Data from McGavin, 2001 & uBio (www.ubio.org)
Phthiraptera (Parasitic lice)
“My” Data Set… to Scale
Valid Louse Species (4,927)
Synonyms (2,314)
Whole Frozen Specimens (20,000 con. est.)
PDF Files (1,946)
Bibliography (7,110 papers)
Louse / Host Associations (10,574)
Parasitized Mammals (1,214 spp.)
Parasitized Birds (3,508 spp.)
Morphological Characters (548)
Phylogenies (27)
Museum Specimens
Images (4,927)
DNA Sequences (2,992)
Data storage device…
Palma, R.L., and R.L.C. Pilgrim. 2002. A revision of the genus Naubates (Insecta: Phthiraptera: Philopteridae). J. R. Soc. N.Z. 32: 7-60.
142 pieces of “raw” data in 4 of 54 pages, in 1 of 7,110 taxonomic papers on lice
- Taxonomic names
- Authorities (name concepts)
- Citations
- Collection data
- Morphological characters
- Textual descriptions
- Diagnostic keys
- Illustrations
- Photographs
Immediate Collaborators…
• Alpha taxonomists
• Ecologists
• Phylogeneticists
• Vector biologists
• “Amateurs”
• Professionals
• Retired professionals
Contributors to “my” dataset
…necessitates tools for collaboration
Web Databases…
Specimens: LouseBASE (Glasgow version at: http://darwin.zoology.gla.ac.uk/~rpage/LouseBase/2/)
Images: SID, http://darwin.zoology.gla.ac.uk/~SID/
Literature: PHPBib, http://myphpbib.sourceforge.net/
Lab Notebook: http://www2.flmnh.ufl.edu/pdb/
Host-Parasite Checklists: http://www2.flmnh.ufl.edu/adb/
PHPBib: On-line Reference Manager
Developed by Rod Page
• Import via PubMed IDs, e.g. 15472046:
  Smith, V.S. 2004. Lousy Lists. Systematic Biology 53: 666-668.
• Import / export EndNote & Reference Manager format
• Browse, search & edit
• Multi-user
Nice example at: http://darwin.zoology.gla.ac.uk/~kdavis/bib/
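Importing by PubMed ID comes down to fetching one PubMed record and mapping its fields into the bibliography. The slides do not say how PHPBib does this, so the following is only a sketch that assumes NCBI's public E-utilities `efetch` endpoint; `pubmed_efetch_url` is a hypothetical helper, and it only builds the request URL (no network call is made).

```python
from urllib.parse import urlencode

# NCBI E-utilities efetch endpoint. Using it here is an assumption:
# the slides do not describe PHPBib's own import mechanism.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def pubmed_efetch_url(pmid: str) -> str:
    """Build the URL that asks PubMed for one record as XML."""
    if not pmid.isdigit():
        raise ValueError(f"not a PubMed ID: {pmid!r}")
    query = urlencode({"db": "pubmed", "id": pmid, "retmode": "xml"})
    return f"{EFETCH}?{query}"


# The PubMed ID quoted on the slide.
print(pubmed_efetch_url("15472046"))
```

The returned XML would still need to be parsed into author, year, title and journal fields before it could be stored.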
On-line Molecular Lab. Notebook
Developed in David Reed’s lab.
• DNA extraction, PCR & microsatellite workflows
• Stores electropherograms
• Multi-user
• Includes DNA primer database
On-line Host-Louse Checklist
Developed in Dale Clayton’s lab.
• Host-louse associations
• Simple search & browse by host or louse classification
LouseBASE - specimen database
http://darwin.zoology.gla.ac.uk/~rpage/LouseBase/2/
For curation of specimens and related data
• Easy addition of new collections
• Extensive browse & search options
• Manages DNA extracts & sequences
• Export formatted datasets
- FASTA (DNA sequences)
- Se-Al (sequence alignment)
- TreeMap (cophylogenetic analyses)
- Sequin (GenBank sequence submission)
R. Page & V. Smith, in use since June 1999
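Of the export formats listed, FASTA is the simplest to illustrate: a header line starting with ">" followed by the sequence, wrapped to a fixed width. This is a minimal sketch of such an export, not LouseBASE's actual code; the function name, the extract identifiers and the sequences are all made up for illustration.

```python
def to_fasta(records, width=60):
    """Format (identifier, sequence) pairs as FASTA text."""
    lines = []
    for identifier, sequence in records:
        lines.append(f">{identifier}")
        sequence = sequence.upper()
        # Wrap the sequence to a fixed line width, as is conventional for FASTA.
        for start in range(0, len(sequence), width):
            lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n"


# Hypothetical extract identifiers and invented sequences.
example = [("extract_001_COI", "atggcgacc"), ("extract_002_COI", "atggcaact")]
print(to_fasta(example, width=6), end="")
```

Keeping the identifier on the header line is what lets an exported sequence be traced back to the specimen and DNA extract it came from.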
SID - Specimen Image Database
http://darwin.zoology.gla.ac.uk/~sid/
For sharing & disseminating images
• Bulk image upload & annotation
• Extensive browse & search options
• Copyright options to control image use
• Web image labeling
• Designation of “exemplar” images
• RSS & web service access
S. Rycroft & V. Smith, in use since June 2004
Tools for collaboration…
Images, Literature, Lab Notebooks, Specimens, Checklists
• Synthesize and integrate data
• Share data
• Minimize errors
• Reduce redundancy
ASK NEW QUESTIONS OF THE DATA
… document patterns
… understand process
1) Database meta-analysis
2) Integrative studies across databases
Coextinction… The loss of a species with the loss of its affiliate
California condor and its now extinct louse (Colpocephalum californici)
Database meta-analysis: Louse Checklist, Association Database
Price, R. D. et al. 2003. The Chewing Lice: World Checklist. INHS.
Durden & Musser. 1994. The sucking lice of the world. AMNH.
Coextinction Prediction Curves (Koh, Dunn, Sodhi, Colwell, Proctor, & Smith. 2004, Science.)
x-axis: Proportion of extinct hosts (E), 0.0 to 1.0; y-axis: Proportion of affiliates extinct (A), up to 1.0
Curves: Ficus wasps - Ficus; Ant butterflies - ants; Primate fungi - primates; Primate nematodes - primates; Primate lice - primates; Seabird lice - seabirds; Bird mites - birds; Butterflies - host plants
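One way to see why such curves depend on host specificity: if hosts were lost at random, an affiliate restricted to s host species would be lost only when all s of them are gone. The sketch below computes that probability under the random-loss assumption. It is a simplified illustration, not the model fitted by Koh et al., and the function name and the numbers used are hypothetical.

```python
from math import comb


def coextinction_probability(total_hosts: int, extinct_hosts: int, hosts_used: int) -> float:
    """Chance that every host used by an affiliate is among the extinct hosts.

    Assumes extinct hosts are a uniformly random subset of all hosts.
    """
    if not 0 <= extinct_hosts <= total_hosts:
        raise ValueError("extinct_hosts must lie between 0 and total_hosts")
    if not 1 <= hosts_used <= total_hosts:
        raise ValueError("hosts_used must lie between 1 and total_hosts")
    # C(k, s) / C(H, s): ways to pick the affiliate's hosts from the extinct set,
    # relative to all ways of picking them. comb() returns 0 when s > k.
    return comb(extinct_hosts, hosts_used) / comb(total_hosts, hosts_used)


# Illustrative numbers only: 10 host species, half of them extinct.
for s in (1, 2, 3):
    print(s, round(coextinction_probability(10, 5, s), 3))
```

For a single-host affiliate the probability equals the proportion of extinct hosts (a straight line), while affiliates with more hosts fall below it, which is the qualitative pattern the prediction curves describe.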
Affiliate - host systems: Beetles - Plants; Monogeneans - Fish; Bird lice - Birds; Bird mites - Birds; Butterflies - Plants; Mammal lice - Mammals; Primate nematodes - Primates; Primate lice - Primates; Primate Pneumocystis fungi - Primates; Ficus wasps - Ficus; Lycaenid butterflies - Ants; Metazoan parasites - Canadian fish
Projected Extinctions (IUCN Red List): Endangered Hosts (Thousands) vs. Projected Extinctions (Hundreds)
>6,300 co-endangered affiliates from 9,491 hosts
(4672, 0.4%)
Estimated Extinctions (IUCN Red List): Extinct Hosts vs. Extinct Affiliates (Estimated)
>200 co-extinct affiliates from 399 extinct hosts
Integrative studies… Human-Louse Coevolution
Reed, Smith et al., PLoS Biology 2004.
Body Lice, Head Lice
Plot (Females): Function 1 vs. Function 2; Group Centroids; Ungrouped Cases; groups: 1 Body Lice, 2 Head Lice, 3 Pubic Lice
Integrative studies… Lice Predate the K-T boundary
Wappler, Smith & Dalgleish, Proc. R. Soc., 2004.
Smith et al., in prep.
Timeline labels: KT; Modern Birds
Questions facilitated by the databases…
Images, Literature, Lab Notebooks, Specimens, Checklists
…but
Shared “raw” data lacks integration…
… “raw” data types:
Taxon names, Globally unique identifiers, Authority taxon concepts, Dates of modification, Institution codes, Collection codes, Catalogue numbers, Information withheld, Geography, Localities, Collection methods, Preservation methods, Geological periods, Date collected, Collector names, Sex, Life stage, Publication titles, Publication journals, Morphological characters, Morphological character states, Anatomical names, Anatomical definitions, Image types, Copyright licences, Access permissions, Video clips, Textual descriptions, Sound recordings, DNA sequences, Gene names, DNA sequence alignments, Phylogenetic analytical methods, Phylogenetic trees, Phylogenetic datasets, Morphometric data, Gene orders, Character state transitions, etc, etc, etc…
BioCorder - Biological Recorder
http://www.biocorder.org/
A web-based framework for systematic research…
• Distributed
• Modular
• Cross platform
• Generic (any taxa)
• Free
• Intuitive
Awarded $500k by NSF, March ‘05
A collaboration between: Rod Page, David Reed, Vince Smith, Mark Hafner
BioCorder & Taxonomy
“AnyTaxon” BioCorder Installation (www.AnyTaxonSite.org)
“Name Banks” (I.P.N.I): Dictionaries of taxonomic names
Taxonomic Search Engines: Thesaurus of classifications
urn:lsid:sid.zoology.gla.ac.uk:id:6
Taxon name
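The identifier shown on this slide is a Life Science Identifier (LSID), which has the general shape `urn:lsid:authority:namespace:object`, optionally followed by a revision. As a rough sketch of how a client might take one apart before resolving it (the `LSID` type and `parse_lsid` function are hypothetical names, not part of BioCorder or SID):

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> LSID:
    """Split urn:lsid:authority:namespace:object[:revision] into its parts."""
    parts = text.strip().split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"wrong number of components: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    if not all(parts[2:5]):
        raise ValueError(f"empty component in: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)


# The identifier shown on the slide.
print(parse_lsid("urn:lsid:sid.zoology.gla.ac.uk:id:6"))
```

The authority part names the service that issued the identifier, which is what allows independent databases to refer to the same record without sharing a central registry.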
Taxonomy Tools
http://names.mbl.edu/tools/
http://darwin.zoology.gla.ac.uk/~rpage/portal/
- Check spelling
- Name canonizer
- Checklist parser
Ontologies: Map relationships between taxonomic concepts
Visualizations: Visualize & explore classifications
Glasgow TSE: http://darwin.zoology.gla.ac.uk/~rpage/portal/
uBio TSE: http://uio.mbl.edu/SOAPbrowser/
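To give a feel for what a "name canonizer" does, the sketch below reduces a name string with authority and year to its bare genus and epithets. It is a deliberately naive illustration and not the behaviour of the tools at the URLs above: `canonical_name` is a hypothetical function, and it ignores subgenera in parentheses, hybrid markers, and authorities that begin with a lowercase particle.

```python
import re


def canonical_name(name_string: str) -> str:
    """Keep the genus and any lowercase epithets; drop authority and year."""
    tokens = name_string.split()  # also collapses repeated whitespace
    if not tokens:
        return ""
    canonical = [tokens[0].capitalize()]
    for token in tokens[1:]:
        # Epithets are plain lowercase words; the first token that is not
        # (a capitalised author name, a year, punctuation) ends the name.
        if re.fullmatch(r"[a-z][a-z-]*", token):
            canonical.append(token)
        else:
            break
    return " ".join(canonical)


print(canonical_name("Pediculus humanus Linnaeus, 1758"))
```

A canonical form like this is what makes it possible to match the same name across checklists that cite authorities differently.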
BioCorder & Morphology (A special case of taxonomy)
Text Descriptions, Keys, Images
Smith, 2001
Anatomical Terms (like taxonomic names)
• Definitions
• Synonyms
• Relationships
“Anatomy Banks”: Controlled vocabularies of anatomical terms
Main Entry: tho·rax
Pronunciation: 'thOr-"aks
Function: noun
Inflected Form(s): plural tho·rax·es or tho·ra·ces
Etymology: from Latin thorac
1 : the part of the mammalian body between the neck and the abdomen
2 : the middle of the three chief divisions
Anatomical Search Engines: Seeks terms & defines relationships
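The parallel with taxonomic names can be made concrete: an anatomical term carries a definition, synonyms and relationships, and a search engine resolves any label to the preferred term. The following is only a sketch of that idea; `AnatomicalTerm` and `AnatomyBank` are hypothetical names, and the two entries are illustrative (the inflected form "thoraces" is treated as an alternative label purely for the example).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class AnatomicalTerm:
    name: str
    definition: str
    synonyms: Set[str] = field(default_factory=set)
    part_of: Optional[str] = None  # preferred name of the enclosing structure


class AnatomyBank:
    """Controlled vocabulary: one preferred name per term, found via any label."""

    def __init__(self) -> None:
        self._terms: Dict[str, AnatomicalTerm] = {}
        self._labels: Dict[str, str] = {}  # lowercased label -> preferred name

    def add(self, term: AnatomicalTerm) -> None:
        self._terms[term.name] = term
        for label in {term.name, *term.synonyms}:
            self._labels[label.lower()] = term.name

    def lookup(self, label: str) -> Optional[AnatomicalTerm]:
        name = self._labels.get(label.lower())
        return self._terms[name] if name is not None else None

    def parts_of(self, whole: str) -> List[str]:
        return sorted(t.name for t in self._terms.values() if t.part_of == whole)


bank = AnatomyBank()
bank.add(AnatomicalTerm("thorax", "the middle of the three chief divisions", {"thoraces"}))
bank.add(AnatomicalTerm("prothorax", "the anterior segment of the insect thorax", part_of="thorax"))
print(bank.lookup("Thoraces").name, bank.parts_of("thorax"))
```

The `part_of` link is the simplest kind of relationship; a fuller vocabulary would record several relationship types.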
Why…?
• Gene ontology groups already doing this
• Comparative anatomical searches possible
• Assists with descriptions, keys & image annotation
BioCorder & Search
World Wide Web: Semantics of the resource gleaned from content by user
Resource - linksTo - Resource
BioCorder & Semantic Web: Semantics of the resource processed by computer
e.g. Pediculus humanus (human head & body louse)
Nodes: Louse Taxon, Images, Classification, Checklist, Museum, Mammal Taxon, Phylogeny, Place, DNA Sequences, Morph. Characters
Relations: examplesOf, isPartOf, hasSpecimens, isBasedOn, presentAt, requires
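The difference between the two diagrams is that typed relations can be queried by a program, whereas an untyped linksTo cannot. A minimal sketch of that idea as subject-predicate-object triples follows. The predicate names come from the slide, but which nodes each one connects is an assumption made here for illustration, and `TripleStore` is a hypothetical class, not BioCorder's implementation.

```python
from collections import defaultdict


class TripleStore:
    """Tiny in-memory store of (subject, predicate, object) statements."""

    def __init__(self):
        self._by_subject = defaultdict(list)

    def add(self, subject, predicate, obj):
        self._by_subject[subject].append((predicate, obj))

    def objects(self, subject, predicate):
        """All objects linked to `subject` by `predicate`, in insertion order."""
        return [o for p, o in self._by_subject.get(subject, []) if p == predicate]


store = TripleStore()
# Assumed pairings of the slide's nodes and relations, for illustration only.
store.add("Pediculus humanus", "isPartOf", "Classification")
store.add("Pediculus humanus", "presentAt", "Place")
store.add("Museum", "hasSpecimens", "Pediculus humanus")
store.add("Phylogeny", "isBasedOn", "DNA Sequences")
store.add("Phylogeny", "isBasedOn", "Morph. Characters")

print(store.objects("Phylogeny", "isBasedOn"))
```

With plain hyperlinks the same question ("what is this phylogeny based on?") could only be answered by a person reading the linked pages.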
More details & examples at:
http://darwin.zoology.gla.ac.uk/~vsmith/cybertaxonomy/
http://www.biocorder.org/
So what is cybertaxonomy…
“The bane of my existence is doing things that I know the computer could do for me”
-- Dan Connolly, The XML Revolution (Nature, 1998)
Cybertaxonomy Roadmap
• Stand Alone Databases (e.g. Biota): Databases function independently on stand alone computers
• Stand Alone Databases with Web Interface (e.g. LouseBASE): Web interface permits multiple users from different locations
• Linked Databases with Web Interface (e.g. GBIF portal): Centralized databases connected permitting queries across independent databases
• Semantic Web Enabled Databases with Web Interface (e.g. SID & BioCorder): Network of semantic web enabled distributed & specialized databases
Semantically Integrated Data
Data translation tools & “screen scrapers”, e.g. http://darwin.zoology.gla.ac.uk/~rpage/hacks/
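A "screen scraper" recovers structured data from a web page that was written for people rather than programs. As a rough sketch of the idea, using only Python's standard `html.parser` (this is not the code behind the tools at the URL above; the class, function and sample page are hypothetical):

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, if any
        self._cell = None  # text chunks of the cell being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_rows(html: str):
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    return parser.rows


# A made-up checklist page.
PAGE = ("<table><tr><th>Louse</th><th>Host</th></tr>"
        "<tr><td>Pediculus humanus</td><td>Homo sapiens</td></tr></table>")
print(scrape_rows(PAGE))
```

Scrapers like this break whenever the page layout changes, which is one motivation for the semantically integrated end of the roadmap.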