Finding Bugs in People: Developing an Entomology Ontology from the UMLS

Finding Bugs in People: Developing an Entomology Ontology from the UMLS Indra Neil Sarkar, PhD Lewis B. & Dorothy Cullman Bioinformatics Associate Division of Invertebrate Zoology American Museum of Natural History NKOS Workshop 10 June 2005

Page 1: Finding Bugs in People: Developing an  Entomology Ontology from the UMLS

Finding Bugs in People: Developing an Entomology Ontology from the UMLS

Indra Neil Sarkar, PhD

Lewis B. & Dorothy Cullman Bioinformatics Associate

Division of Invertebrate Zoology

American Museum of Natural History

NKOS Workshop 10 June 2005

Page 2: Finding Bugs in People: Developing an  Entomology Ontology from the UMLS

© 2005 Indra Neil Sarkar, PhD

Total Evidence Tree of Life

Sequence Data

Structural Data

Phenotypes

Morphology

Page 3: Finding Bugs in People: Developing an  Entomology Ontology from the UMLS

Statements of Homology

Sequence Data: Multiple Sequence Alignments (CLUSTAL, T-COFFEE, MUSCLE)

Non-sequence Data: Ontologies

Page 4: Finding Bugs in People: Developing an  Entomology Ontology from the UMLS

Ontologies

“White” “Blanc” “Weiss”

White Blue Red

Color

Page 5: Finding Bugs in People: Developing an  Entomology Ontology from the UMLS

Ogden-Richards Semiotic Triangle

“White” “Weiss” “Blanc” XVFD

Symbols

Thought/Reference

Referent

Page 6: Finding Bugs in People: Developing an  Entomology Ontology from the UMLS

Ontology Development

Protégé http://protege.stanford.edu “Frame-based”

Page 7: Finding Bugs in People: Developing an  Entomology Ontology from the UMLS

Ontology Development

Page 8: Finding Bugs in People: Developing an  Entomology Ontology from the UMLS

Ontologies in Phylogenetics

“Wing” “Aile” “Flügel”

Wing Arm Foreleg

Forelimb

Page 9: Finding Bugs in People: Developing an  Entomology Ontology from the UMLS

Ontologies in Phylogenetics

Forelimb: Wing, Arm, Foreleg

                                         CAT       BAT       BIRD
Forelimb                                 1         1         1
Forelimb: Foreleg(1), Wing(2), Arm(3)    1         3         2
                                         [Gene 1]  [Gene 1]  [Gene 1]
                                         [Gene 2]  [Gene 2]  [Gene 2]
                                         …         …         …
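
The two codings on this slide can be sketched in a few lines of Python. This is only an illustration: the state numbers (Foreleg = 1, Wing = 2, Arm = 3) come from the slide, while the per-taxon terms are an assumption chosen so that the result reproduces the slide's row "1 3 2" for CAT, BAT, BIRD. The names `STATE_CODES`, `OBSERVED_TERM`, `coarse_row`, and `fine_row` are hypothetical.

```python
# Child terms of the parent concept "Forelimb", with the state numbers
# shown on the slide.
STATE_CODES = {"Foreleg": 1, "Wing": 2, "Arm": 3}

# Term recorded for each taxon. ASSUMPTION: chosen so the codes match the
# slide's row "1 3 2"; the slide does not list these pairings explicitly.
OBSERVED_TERM = {"CAT": "Foreleg", "BAT": "Arm", "BIRD": "Wing"}


def coarse_row(taxa):
    """Code at the parent concept: 1 if the taxon has any kind of forelimb."""
    return [1 if OBSERVED_TERM.get(t) in STATE_CODES else 0 for t in taxa]


def fine_row(taxa):
    """Code at the child concepts: one numeric state per forelimb type."""
    return [STATE_CODES[OBSERVED_TERM[t]] for t in taxa]


taxa = ["CAT", "BAT", "BIRD"]
print(coarse_row(taxa))  # [1, 1, 1]
print(fine_row(taxa))    # [1, 3, 2]
```

Coding at the parent concept gives an uninformative column (all taxa share state 1), whereas coding at the child terms yields a column that distinguishes the taxa and can sit alongside the gene columns in the same matrix.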

Page 10: Finding Bugs in People: Developing an  Entomology Ontology from the UMLS

Ontologies in Phylogenetics

Genetic Information: 99% of Earth’s biota are extinct!

Morphological Information: Fossil record; Morphological studies from extant organisms

Page 11: Finding Bugs in People: Developing an  Entomology Ontology from the UMLS

Ontologies in Phylogenetics

Ontology Development: Web Ontology Language (OWL); Structured Descriptive Data (SDD)

Can be exported to NEXUS, DELTA, Lucid

Ontology Acquisition and Markup: Archival Resources; Natural Language Processing

Page 12: Finding Bugs in People: Developing an  Entomology Ontology from the UMLS

Unified Medical Language System (UMLS)

Metathesaurus: One Million Concepts; 100+ Biomedical Terminologies/Ontologies

Semantic Network: 135 Semantic Types; 15 Coarse Semantic Groups

SPECIALIST Lexicon: English + Biomedical Words

Page 13: Finding Bugs in People: Developing an  Entomology Ontology from the UMLS

Torre-Bueno Glossary of Entomology (TBGE)

Common Entomology Phrases; 300 Primary Sources; 15,010 Terms/Phrases

Page 14: Finding Bugs in People: Developing an  Entomology Ontology from the UMLS

TBGE to UMLS

Question 1: Is Entomology Language Different from Biomedical Language?

TBGE to SPECIALIST

Question 2: Can UMLS Be Used to Seed an Ontology for Entomology?

TBGE to UMLS Metathesaurus

Organize Results According to Semantic Network

Page 15: Finding Bugs in People: Developing an  Entomology Ontology from the UMLS

Q1: Is Entomology a Unique Language?

“Look-up” Individual Word Atoms in SPECIALIST

Complete Look-up: 48% Coverage

Partial Look-up: 66% Coverage

Not Found: 34% Not Covered
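
A minimal sketch of this kind of word-atom look-up follows. It assumes one plausible reading of the slide: a term counts as a "complete" look-up when every word atom is in the lexicon, as "partial" when at least one atom is found (so complete look-ups are a subset of partial ones), and as "not found" otherwise. The tiny lexicon and term list are invented for illustration and are not drawn from SPECIALIST or the TBGE.

```python
import re


def word_atoms(term):
    """Split a term or phrase into lowercase word atoms (ASCII letters only)."""
    return re.findall(r"[a-z]+", term.lower())


def lookup_coverage(terms, lexicon):
    """Return the fraction of terms with complete, partial, and no look-up."""
    complete = partial = total = 0
    for term in terms:
        atoms = word_atoms(term)
        if not atoms:
            continue  # skip entries with no usable word atoms
        total += 1
        hits = [atom in lexicon for atom in atoms]
        if all(hits):
            complete += 1
        if any(hits):
            partial += 1
    return {
        "complete": complete / total,
        "partial": partial / total,
        "not_found": (total - partial) / total,
    }


# Hypothetical stand-ins for the SPECIALIST Lexicon and glossary entries.
toy_lexicon = {"wing", "vein", "anterior", "segment"}
toy_terms = ["wing vein", "anterior tentorial pit", "ocellus", "segment"]

print(lookup_coverage(toy_terms, toy_lexicon))
```

Under this reading, partial coverage and the not-found share sum to 100%, which matches the 66% and 34% figures on the slide.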

Page 16: Finding Bugs in People: Developing an  Entomology Ontology from the UMLS

Q2: Can UMLS Be Used to Seed Entomology Ontology?

Three-Tiered Mapping Approach

Tier 1: Direct Mapping (Exact & Normalized String Matching)

Tier 2: Direct Mapping after Demodification (Remove nominal and adjectival modifiers; Exact & Normalized String Matching)

Tier 3: Approximate Matching (MetaMap Application)
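
The three tiers can be sketched as a simple fall-through, shown below under several assumptions. Normalization here is just lowercasing plus word sorting, demodification drops words from a fixed modifier list, and Python's `difflib` stands in for approximate matching. None of this is MetaMap or the UMLS normalization tooling, and the concept names, identifiers ("X1", "X2", "X3"), and modifier list are made up for illustration.

```python
import difflib
import re


def tokens(text):
    """Lowercase word tokens of a term."""
    return re.findall(r"[a-z0-9]+", text.lower())


def normalize(text):
    """Crude normalization: lowercase, drop punctuation, sort the words."""
    return " ".join(sorted(tokens(text)))


def map_term(term, concepts, modifiers, cutoff=0.8):
    """Return (tier, concept_id), or (None, None) if no tier maps the term."""
    index = {normalize(name): cid for name, cid in concepts.items()}

    # Tier 1: exact, then normalized, string match.
    if term in concepts:
        return 1, concepts[term]
    key = normalize(term)
    if key in index:
        return 1, index[key]

    # Tier 2: strip modifier words, then retry the normalized match.
    words = tokens(term)
    kept = [w for w in words if w not in modifiers]
    if kept and len(kept) < len(words):
        demod_key = " ".join(sorted(kept))
        if demod_key in index:
            return 2, index[demod_key]

    # Tier 3: approximate match (difflib used only as a stand-in).
    close = difflib.get_close_matches(key, list(index), n=1, cutoff=cutoff)
    if close:
        return 3, index[close[0]]
    return None, None


# Hypothetical concept table and modifier list.
CONCEPTS = {"Wing": "X1", "Antenna": "X2", "Abdomen": "X3"}
MODIFIERS = {"anterior", "posterior", "dorsal", "ventral"}

for term in ["wing", "anterior wing", "antena", "ovipositor"]:
    print(term, map_term(term, CONCEPTS, MODIFIERS))
```

Trying the strictest tier first means each term is credited to the most conservative method that maps it, which is what allows mapping rate and accuracy to be reported per tier.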

Page 18: Finding Bugs in People: Developing an  Entomology Ontology from the UMLS

Q2: Can UMLS Be Used to Seed Entomology Ontology?

Tier (Approach)    % Mapped            % Accuracy
                   Method   Overall    Method   Overall
1 (Direct)         20       20         86       86
2 (Demod)          37       49         74       78
3 (Approx)         23       61         41       71
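
The slide does not say how the "Method" and "Overall" columns relate. One reading that reproduces the reported figures to within about one percentage point is that each tier is applied only to the terms left unmapped by the earlier tiers, with the overall accuracy being the accuracy pooled over everything mapped so far. The sketch below checks that reading. It is an inference from the numbers, not a statement of the author's method, and the names `TIERS` and `cumulative` are hypothetical.

```python
# (tier, % mapped by this method, % accuracy of this method), as reported.
TIERS = [("Direct", 20, 86), ("Demod", 37, 74), ("Approx", 23, 41)]


def cumulative(tiers):
    """Accumulate overall % mapped and pooled % accuracy, tier by tier.

    ASSUMPTION: each tier's "% mapped" is taken relative to the terms
    still unmapped after the previous tiers.
    """
    mapped = 0.0   # % of all terms mapped so far
    correct = 0.0  # % of all terms mapped correctly so far
    rows = []
    for name, pct_mapped, pct_accuracy in tiers:
        newly_mapped = (100.0 - mapped) * pct_mapped / 100.0
        mapped += newly_mapped
        correct += newly_mapped * pct_accuracy / 100.0
        rows.append((name, mapped, 100.0 * correct / mapped))
    return rows


for name, overall_mapped, overall_accuracy in cumulative(TIERS):
    print(f"{name}: {overall_mapped:.1f}% mapped, {overall_accuracy:.1f}% accurate")
```

Under this assumption the computed overall values are roughly 49.6% and 61.2% mapped with 78.8% and 71.7% accuracy after Tiers 2 and 3, close to the reported 49/61 and 78/71. The small gaps would be consistent with rounding in the per-method figures.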

Page 19: Finding Bugs in People: Developing an  Entomology Ontology from the UMLS

Q2: Can UMLS Be Used to Seed Entomology Ontology?

[Chart: Correct Mappings. Number of Mappings (0 to 3000) by Semantic Type (ACTI, ANAT, CHEM, CONC, DEVI, DISO, GENE, GEOG, LIVB, OBJC, OCCU, ORGA, PHEN, PHYS, PROC), broken down by tier (Direct, Demod, Approx).]

[Chart: Incorrect Mappings. Number of Mappings (0 to 1000) by Semantic Type (same categories), broken down by tier (Direct, Demod, Approx).]

Page 20: Finding Bugs in People: Developing an  Entomology Ontology from the UMLS

Q2: Can UMLS Be Used to Seed Entomology Ontology?

[Chart: Correct and Incorrect mappings (0 to 3000) by Source Terminology: SNOMED CT, MeSH, Other (30), UWDA, NCBI, MDR, LCH, CSP, GO.]

Page 21: Finding Bugs in People: Developing an  Entomology Ontology from the UMLS

TBGE-UMLS Implications

UMLS Semantic Network is a good Seed Ontology for Biological Domain Ontologies

Best Term-Concept Mappings into Anatomy

Page 22: Finding Bugs in People: Developing an  Entomology Ontology from the UMLS

Bottom-Up vs. Top-Down

Page 23: Finding Bugs in People: Developing an  Entomology Ontology from the UMLS

In Summary…

Ontologies are Needed for Phylogenetics

Existing Biomedical Ontologies Are Useful for New Domain Ontologies (especially UMLS)

Top-Down Strategy using UMLS is Tractable

Page 24: Finding Bugs in People: Developing an  Entomology Ontology from the UMLS

End Goal

Sequence Data

Structural Data

Phenotypes

Morphology

OWL, SDD

Page 25: Finding Bugs in People: Developing an  Entomology Ontology from the UMLS

Next Steps

Represent Seed Entomology Ontology in OWL

Link OWL Representation to SDD for use in Taxonomic Descriptions

Involve Team of Experts for Validation

Go Beyond Morphology: Location, Biodiversity Data, etc.
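
As a rough idea of what an OWL representation of seed concepts could look like, the sketch below emits OWL class declarations in Turtle syntax using plain string formatting. It reuses the talk's illustrative Forelimb hierarchy and the multilingual "Wing" labels from the earlier slides. The `ex:` namespace URI and the helper name `owl_class` are assumptions made for illustration, and class names must be valid Turtle local names (no spaces).

```python
PREFIXES = """@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex: <http://example.org/entomology#> .
"""


def owl_class(name, parent=None, labels=None):
    """Build one Turtle statement declaring an OWL class.

    `labels` maps a language tag to the label text in that language.
    """
    parts = [f"ex:{name} a owl:Class"]
    if parent:
        parts.append(f"rdfs:subClassOf ex:{parent}")
    if labels:
        tagged = ", ".join(
            f'"{text}"@{lang}' for lang, text in sorted(labels.items())
        )
        parts.append(f"rdfs:label {tagged}")
    return " ;\n    ".join(parts) + " ."


statements = [
    owl_class("Forelimb"),
    owl_class("Wing", "Forelimb", {"en": "Wing", "fr": "Aile", "de": "Flügel"}),
    owl_class("Arm", "Forelimb"),
    owl_class("Foreleg", "Forelimb"),
]
print(PREFIXES + "\n" + "\n\n".join(statements))
```

Attaching several language-tagged labels to one class mirrors the distinction drawn earlier in the talk between symbols ("Wing", "Aile", "Flügel") and the single concept they refer to.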

Page 26: Finding Bugs in People: Developing an  Entomology Ontology from the UMLS

Acknowledgements

Page 27: Finding Bugs in People: Developing an  Entomology Ontology from the UMLS

Acknowledgements

Tom Moritz, Rob DeSalle, Mark Siddall, David Figurski, Susan Perkins, Paul Planet

Gloria Coruzzi, Olivier Bodenreider, Carol Friedman, Jim Cimino, Bob Morris, Mark Musen

National Institutes of Health

National Science Foundation

American Museum of Natural History

Page 28: Finding Bugs in People: Developing an  Entomology Ontology from the UMLS

Thank You

[email protected]

Indra Neil Sarkar, Cullman Bioinformatics Associate

American Museum of Natural History

http://www.GenomeCurator.org/people/sarkar