2005.V Smith.Cybertaxonomy

Post on 22-Nov-2014


V. S. Smith. What is Cybertaxonomy? some case studies from a louse systematist


What is Cybertaxonomy? Some case studies from a louse systematist

Vincent S. Smith

Illinois Natural History Survey
University of Illinois Campus
Champaign, Illinois
USA

“The bane of my existence is doing things that I know the computer could do for me”

-- Dan Connolly, The XML Revolution (Nature, 1998)

Insect Ordinal Phylogeny…

[Figure: insect ordinal phylogeny, sensu W.C. Wheeler et al., Cladistics 2001, highlighting Paraneoptera and Phthiraptera (parasitic lice).]

Proportion of all described species…

[Figure: pie chart. Slices: Insects, Arachnids, Crustaceans 2.4%, Other Arthropods 1.2%, Molluscs, Nematodes 0.9%, Other Invertebrates, Vertebrates, Plants, Fungi, Algae, Protozoans, Bacteria and Viruses 0.5%. Remaining slice values: 56.3%, 4.5%, 4.2%, 4.0%, 2.7%, 14.3%, 4.2%, 2.4%, 2.4%.]

Proportion of described insects (by order)…

[Figure: pie chart of described insects by order, highlighting Phthiraptera (parasitic lice).]

Data from McGavin, 2001 & uBio (www.ubio.org)

“My” Data Set… to Scale

- Valid Louse Species (4,927)
- Synonyms (2,314)
- Whole Frozen Specimens (20,000 con. est.)
- PDF Files (1,946)
- Bibliography (7,110 papers)
- Louse / Host Associations (10,574)
- Parasitized Mammals (1,214 spp.)
- Parasitized Birds (3,508 spp.)
- Morphological Characters (548)
- Phylogenies (27)
- Museum Specimens
- Images (4,927)
- DNA Sequences (2,992)

Data storage device…

Palma, R.L., and R.L.C. Pilgrim. 2002. A revision of the genus Naubates (Insecta: Phthiraptera: Philopteridae). J. R. Soc. N.Z. 32: 7-60.

142 pieces of “raw” data in 4 of 54 pages, in 1 of 7,110 taxonomic papers on lice

- Taxonomic names
- Authorities (name concepts)
- Citations
- Collection data
- Morphological characters
- Textual descriptions
- Diagnostic keys
- Illustrations
- Photographs

Contributors to “my” dataset

Immediate Collaborators…

• Alpha taxonomists
• Ecologists
• Phylogeneticists
• Vector biologists
• “Amateurs”
• Professionals
• Retired professionals

…necessitates tools for collaboration

Web Databases…

- Specimens: LouseBASE (Glasgow version at: http://darwin.zoology.gla.ac.uk/~rpage/LouseBase/2/)
- Images: SID (http://darwin.zoology.gla.ac.uk/~SID/)
- Literature: PHPBib (http://myphpbib.sourceforge.net/)
- Lab Notebook (http://www2.flmnh.ufl.edu/pdb/)
- Host-Parasite Checklists (http://www2.flmnh.ufl.edu/adb/)

PHPBib - On-line Reference Manager

Developed by Rod Page

• Import via PubMed IDs, e.g. 15472046
  Smith, V.S. 2004. Lousy Lists. Systematic Biology. 53: 666-668.
• Import / export Endnote & Reference Manager format
• Browse, search & edit
• Multi-user

Nice example at: http://darwin.zoology.gla.ac.uk/~kdavis/bib/
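Importing a reference from a PubMed ID can be sketched roughly as below. This is not PHPBib's code, which the slides do not show. It assumes NCBI's public E-utilities `esummary` endpoint and its JSON layout, and the sample reply is a hand-written, hypothetical record rather than data fetched from PubMed.

```python
import json
from urllib.parse import urlencode

# Assumption: NCBI E-utilities "esummary" endpoint (not PHPBib's own importer).
ESUMMARY_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"


def esummary_url(pmid: str) -> str:
    """Build the request URL for a single PubMed ID."""
    query = urlencode({"db": "pubmed", "id": pmid, "retmode": "json"})
    return f"{ESUMMARY_URL}?{query}"


def citation_from_summary(payload: str, pmid: str) -> dict:
    """Extract the fields a reference manager needs from an esummary JSON reply."""
    record = json.loads(payload)["result"][pmid]
    return {
        "authors": [author["name"] for author in record.get("authors", [])],
        "title": record.get("title", ""),
        "journal": record.get("source", ""),
        "year": record.get("pubdate", "")[:4],
        "volume": record.get("volume", ""),
        "pages": record.get("pages", ""),
    }


# Hand-written stand-in for a server reply (hypothetical record, not real data).
SAMPLE = json.dumps({
    "result": {
        "uids": ["1"],
        "1": {
            "authors": [{"name": "Author A"}],
            "title": "An example title.",
            "source": "Example J",
            "pubdate": "2004 Aug",
            "volume": "1",
            "pages": "1-2",
        },
    }
})

if __name__ == "__main__":
    print(esummary_url("15472046"))
    print(citation_from_summary(SAMPLE, "1"))
```

Keeping URL construction separate from parsing means the parser can be checked against a saved reply without any network access.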


On-line Molecular Lab. Notebook

Developed in David Reed’s lab.

• DNA extraction, PCR & microsatellite workflows
• Stores electropherograms
• Multi-user
• Includes DNA primer database


On-line Host-Louse Checklist

Developed in Dale Clayton’s lab.

• Host-louse associations
• Simple search & browse by host or louse classification


LouseBASE - specimen database
Glasgow version at: http://darwin.zoology.gla.ac.uk/~rpage/LouseBase/2/

For curation of specimens and related data

• Easy addition of new collections
• Extensive browse & search options
• Manages DNA extracts & sequences
• Export formatted datasets
  - FASTA (DNA sequences)
  - Se-Al (sequence alignment)
  - TreeMap (cophylogenetic analyses)
  - Sequin (GenBank sequence submission)

R. Page & V. Smith, in use since June 1999
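The FASTA export listed above can be illustrated with a minimal sketch. It is not LouseBASE's exporter; the function name `to_fasta`, the line width and the example record are all assumptions made for illustration.

```python
from typing import Iterable, Tuple


def to_fasta(records: Iterable[Tuple[str, str]], width: int = 60) -> str:
    """Format (identifier, sequence) pairs as FASTA, wrapping sequence lines at `width`."""
    lines = []
    for identifier, sequence in records:
        seq = "".join(sequence.split()).upper()  # drop whitespace, normalise case
        lines.append(f">{identifier}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "".join(line + "\n" for line in lines)


# Hypothetical extract: the identifier and bases are invented for illustration.
example = [("Pediculus_humanus_extract_001", "atgc atgcatgc")]

if __name__ == "__main__":
    print(to_fasta(example, width=8), end="")
```

Each record becomes a `>` header line followed by the sequence in fixed-width lines, which is the layout most alignment and submission tools accept as input.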

SID - Specimen Image Database
http://darwin.zoology.gla.ac.uk/~sid/

For sharing & disseminating images

• Bulk image upload & annotation
• Extensive browse & search options
• Copyright options to control image use
• Web image labeling
• Designation of “exemplar” images
• RSS & web service access

S. Rycroft & V. Smith, in use since June 2004

Tools for collaboration…

Images, Literature, Lab Notebooks, Specimens, Checklists

• Synthesize and integrate data
• Share data
• Minimize errors
• Reduce redundancy

ASK NEW QUESTIONS OF THE DATA

… document patterns
… understand process

1) Database meta-analysis
2) Integrative studies across databases

Coextinction… The loss of a species with the loss of its affiliate

California condor and its now extinct louse (Colpocephalum californici)

Database Meta-analysis: Louse Checklist Association Database

Price, R. D. et al. 2003. The Chewing Lice: World Checklist. INHS.
Durden & Musser. 1994. The sucking lice of the world. AMNH.

Coextinction Prediction Curves

[Figure: proportion of affiliates extinct (A, 0.0 to 1.0) plotted against proportion of extinct hosts (E, 0.0 to 1.0). Curves: Ficus wasps - Ficus; Ant butterflies - ants; Primate fungi - primates; Primate nematodes - primates; Primate lice - primates; Seabird lice - seabirds; Bird mites - birds; Butterflies - host plants.]

Koh, Dunn, Sodhi, Colwell, Proctor, & Smith. 2004, Science.
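The relationship between E and A can be illustrated from an association table with a small sketch. It is not the model used in the cited paper. It assumes the simplest possible criterion, that an affiliate is lost only when every one of its recorded hosts is lost, and the louse and host names are hypothetical.

```python
from typing import Dict, Set, Tuple


def coextinct_affiliates(hosts_of: Dict[str, Set[str]], extinct_hosts: Set[str]) -> Set[str]:
    """Affiliates whose every recorded host is extinct (assumed criterion)."""
    return {
        affiliate
        for affiliate, hosts in hosts_of.items()
        if hosts and hosts <= extinct_hosts
    }


def proportions(hosts_of: Dict[str, Set[str]], extinct_hosts: Set[str]) -> Tuple[float, float]:
    """Return (E, A): proportion of hosts extinct and proportion of affiliates extinct."""
    all_hosts = set().union(*hosts_of.values())
    lost = coextinct_affiliates(hosts_of, extinct_hosts)
    e = len(all_hosts & extinct_hosts) / len(all_hosts) if all_hosts else 0.0
    a = len(lost) / len(hosts_of) if hosts_of else 0.0
    return e, a


# Hypothetical associations: louse_B lives on two hosts, the others on one each.
ASSOCIATIONS = {
    "louse_A": {"host_1"},
    "louse_B": {"host_1", "host_2"},
    "louse_C": {"host_3"},
}

if __name__ == "__main__":
    print(coextinct_affiliates(ASSOCIATIONS, {"host_1"}))
    print(proportions(ASSOCIATIONS, {"host_1", "host_2"}))
```

Under this criterion a host-specific affiliate such as `louse_A` disappears with its single host, while `louse_B` survives until both of its hosts are gone, which is why curves for less host-specific groups rise more slowly.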

Projected Extinctions

[Figure: endangered hosts (thousands) and projected affiliate extinctions (hundreds), with hosts taken from the IUCN Red List. Affiliate - host groups: Beetles - Plants; Monogeneans - Fish; Bird lice - Birds; Bird mites - Birds; Butterflies - Plants; Mammal lice - Mammals; Primate nematodes - Primates; Primate lice - Primates; Primate Pneumocystis fungi - Primates; Ficus wasps - Ficus; Lycaenid butterflies - Ants; Metazoan parasites - Canadian fish. One data label reads (4672, 0.4%).]

>6,300 co-endangered affiliates from 9,491 hosts

Estimated Extinctions

[Figure: extinct hosts and estimated extinct affiliates, with hosts taken from the IUCN Red List. Affiliate - host groups: Beetles - Plants; Monogeneans - Fish; Bird lice - Birds; Bird mites - Birds; Mammal lice - Mammals; Butterflies - Plants.]

>200 co-extinct affiliates from 399 extinct hosts


Integrative studies… Human-Louse Coevolution

Reed, Smith et al., PLoS Biology 2004.

[Figure: discriminant function plot (Function 1 against Function 2) for females, showing group centroids and ungrouped cases. Groups: 1 Body Lice, 2 Head Lice, 3 Pubic Lice.]

Integrative studies… Lice Predate the K-T boundary

[Figure: time axis marking the K-T boundary (KT), with modern birds labelled.]

Wappler, Smith & Dalgleish, Proc. R. Soc., 2004.
Smith et al., in prep.

Questions facilitated by the databases…

Images, Literature, Lab Notebooks, Specimens, Checklists

…but shared “raw” data lacks integration…

… “raw” data types

Taxon names, Globally unique identifiers, Authority taxon concepts, Dates of modification, Institution codes, Collection codes, Catalogue numbers, Information withheld, Geography, Localities, Collection methods, Preservation methods, Geological periods, Date collected, Collector names, Sex, Life stage, Publication titles, Publication journals, Morphological characters, Morphological character states, Anatomical names, Anatomical definitions, Image types, Copyright licences, Access permissions, Video clips, Textual descriptions, Sound recordings, DNA sequences, Gene names, DNA sequence alignments, Phylogenetic analytical methods, Phylogenetic trees, Phylogenetic datasets, Morphometric data, Gene orders, Character state transitions, etc, etc, etc…
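A few of these raw data types can be gathered into one typed record, which is one way to picture what integration requires. The sketch below is illustrative only. The class and field names are assumptions that loosely echo the list above, the colon-joined key is an assumed convention, and the values are invented.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass(frozen=True)
class SpecimenRecord:
    """A hypothetical specimen record built from a handful of the raw data types."""
    institution_code: str
    collection_code: str
    catalogue_number: str
    taxon_name: str
    sex: Optional[str] = None
    life_stage: Optional[str] = None
    date_collected: Optional[str] = None  # ISO 8601 text, e.g. "1999-06-01"

    def key(self) -> str:
        """Colon-joined institution, collection and catalogue codes (assumed convention)."""
        return ":".join((self.institution_code, self.collection_code, self.catalogue_number))


# Invented values for illustration only.
record = SpecimenRecord("XYZ", "Ento", "000123", "Pediculus humanus", sex="female")

if __name__ == "__main__":
    print(record.key())
    print(asdict(record))
```

Once two databases agree on such a key, a specimen in one can be matched to its images or sequences in another without comparing free text.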

BioCorder - Biological Recorder
http://www.biocorder.org/

A web-based framework for systematic research…

• Distributed
• Modular
• Cross platform
• Generic (any taxa)
• Free
• Intuitive

Awarded $500k by NSF, March ‘05

A collaboration between: Rod Page, David Reed, Vince Smith, Mark Hafner

BioCorder & Taxonomy

[Diagram: a BioCorder installation (“AnyTaxon”, www.AnyTaxonSite.org) connected to:
- “Name Banks”: dictionaries of taxonomic names (e.g. I.P.N.I)
- Taxonomic Search Engines: thesaurus of classifications
Identifier shown with the label “Taxon name”: urn:lsid:sid.zoology.gla.ac.uk:id:6]
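The identifier urn:lsid:sid.zoology.gla.ac.uk:id:6 follows the Life Science Identifier layout, urn:lsid:authority:namespace:object with an optional revision. A minimal parsing sketch follows. The names `LSID` and `parse_lsid` are hypothetical, and the check covers only the colon-separated structure, not the character rules for each component.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> LSID:
    """Split an LSID of the form urn:lsid:authority:namespace:object[:revision]."""
    parts = text.strip().split(":")
    if len(parts) not in (5, 6) or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {text!r}")
    if not all(parts[2:5]):
        raise ValueError(f"empty LSID component: {text!r}")
    revision = parts[5] if len(parts) == 6 and parts[5] else None
    return LSID(parts[2], parts[3], parts[4], revision)


if __name__ == "__main__":
    print(parse_lsid("urn:lsid:sid.zoology.gla.ac.uk:id:6"))
```

Here the authority is the SID server, the namespace is `id` and the object is `6`, so the identifier stays meaningful even when it is quoted outside the database that issued it.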

BioCorder & Taxonomy

Taxonomic Search Engines: thesaurus of classifications

Taxonomy Tools
http://names.mbl.edu/tools/
http://darwin.zoology.gla.ac.uk/~rpage/portal/
- Check spelling
- Name canonizer
- Checklist parser

Ontologies
Map relationships between taxonomic concepts

Visualizations
Visualize & explore classifications

Glasgow TSE: http://darwin.zoology.gla.ac.uk/~rpage/portal/
uBio TSE: http://uio.mbl.edu/SOAPbrowser/

BioCorder & Morphology (A special case of taxonomy)

Smith, 2001

Text Descriptions, Keys, Images

Anatomical Terms (like taxonomic names)
- Definitions
- Synonyms
- Relationships

“Anatomy Banks” (cf. “Name Banks”, dictionaries of taxonomic names, e.g. I.P.N.I)
Controlled vocabularies of anatomical terms, for example:

  Main Entry: tho·rax
  Pronunciation: 'thOr-"aks
  Function: noun
  Inflected Form(s): plural tho·rax·es or tho·ra·ces
  Etymology: from Latin thorac
  1 : the part of the mammalian body between the neck and the abdomen
  2 : the middle of the three chief divisions

Anatomical Search Engines (cf. Taxonomic Search Engines, thesaurus of classifications)
Seeks terms & defines relationships

Why…?
• Gene ontology groups already doing this
• Comparative anatomical searches possible
• Assists with descriptions, keys & image annotation

BioCorder & Search

World Wide Web: semantics of the resource gleaned from content by user

[Diagram: Resources connected to one another only by untyped “linksTo” links.]

BioCorder & Semantic Web: semantics of the resource processed by computer

[Diagram: typed nodes (LouseTaxon, e.g. Pediculus humanus (human head & body louse); MammalTaxon; Images; Classification; Checklist; Museum; Phylogeny; Place; DNA Sequences; Morph. Characters) connected by typed relationships (examplesOf, isPartOf, inPartOf, hasSpecimens, isBasedOn, presentAt, requires).]
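The contrast between untyped links and typed relationships can be sketched with subject, predicate, object triples. The predicate names come from the diagram, but which nodes each predicate joins is a guess made for illustration, and the node labels and the `match` helper are hypothetical.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical statements: predicate names are from the slide, pairings are guesses.
TRIPLES: List[Triple] = [
    ("Image:1", "examplesOf", "LouseTaxon:Pediculus humanus"),
    ("LouseTaxon:Pediculus humanus", "isPartOf", "Classification:Phthiraptera"),
    ("Museum:A", "hasSpecimens", "LouseTaxon:Pediculus humanus"),
    ("LouseTaxon:Pediculus humanus", "presentAt", "Place:X"),
    ("Phylogeny:1", "requires", "DNASequences:1"),
]


def match(
    triples: List[Triple],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> List[Triple]:
    """Return the triples that agree with every field that is not None."""
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


if __name__ == "__main__":
    # Everything that points at the louse taxon, with the reason for each link.
    for s, p, _ in match(TRIPLES, obj="LouseTaxon:Pediculus humanus"):
        print(s, p)
```

With plain “linksTo” links a program could only report that two resources are connected. With typed predicates it can tell an image of the taxon apart from a museum holding its specimens.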


More details & examples at:

http://darwin.zoology.gla.ac.uk/~vsmith/cybertaxonomy/

http://www.biocorder.org/

So what is cybertaxonomy…

“The bane of my existence is doing things that I know the computer could do for me”

-- Dan Connolly, The XML Revolution (Nature, 1998)

Cybertaxonomy Roadmap

1) Stand Alone Databases (e.g. Biota)
Databases function independently on stand alone computers

2) Stand Alone Databases with Web Interface (e.g. LouseBASE)
Web interface permits multiple users from different locations

3) Linked Databases with Web Interface (e.g. GBIF portal)
Centralized databases connected permitting queries across independent databases

4) Semantic Web Enabled Databases with Web Interface (e.g. SID & BioCorder)
Network of semantic web enabled distributed & specialized databases

Semantically Integrated Data

Data translation tools & “screen scrapers”, e.g. http://darwin.zoology.gla.ac.uk/~rpage/hacks/
