Biomedical Annotation - Kevin Livingston

Kevin Livingston, Ph.D. Postdoctoral Fellow Pharmacology Department, School of Medicine University of Colorado Anschutz Medical Campus [email protected] http://compbio.ucdenver.edu/Hunter_lab/Livingsto Biomedical Annotation

Transcript of Biomedical Annotation - Kevin Livingston

Page 1: Biomedical Annotation - Kevin Livingston

Kevin Livingston, Ph.D.
Postdoctoral Fellow
Pharmacology Department, School of Medicine
University of Colorado Anschutz Medical Campus

[email protected]
http://compbio.ucdenver.edu/Hunter_lab/Livingston

Biomedical Annotation

Page 2: Biomedical Annotation - Kevin Livingston

Biomedical researchers are interested in understanding their data in the context of all known background knowledge: curated databases & literature.

Page 3: Biomedical Annotation - Kevin Livingston

PubMed Growth Rate

[Figure: new entries (thousands, axis 0 to 1100) and total entries (millions, axis 0 to 25) plotted by year, 1987 to 2011, with two fitted exponential curves]

Fitted curves shown on the plot:
f(x) = 321.698053474767 exp(0.0401516775647944 x), R² = 0.943742964334693
f(x) = 7.85609625880221 exp(0.0404790715086599 x), R² = 0.99939550228951

973,499 PubMed entries in 2011 (>2,600 per day)

2 journal articles per minute!
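
The per-day and per-minute figures on this slide follow from the 2011 entry count by simple division. A minimal Python sketch of that arithmetic, together with the doubling time implied by the second fitted exponent. The 365-day year and the reading of x as a number of years are assumptions; the slide states neither.

```python
import math

ENTRIES_2011 = 973_499             # entry count quoted on the slide
GROWTH_RATE = 0.0404790715086599   # exponent of the second fitted curve on the slide

# Assumption: a 365-day year.
per_day = ENTRIES_2011 / 365
per_minute = per_day / (24 * 60)

# Assumption: x in the fit is measured in years, so the doubling
# time of exp(k * x) is ln(2) / k years.
doubling_time_years = math.log(2) / GROWTH_RATE

print(f"{per_day:.0f} entries per day")
print(f"{per_minute:.2f} entries per minute")
print(f"doubling time: {doubling_time_years:.1f} years")
```

The per-minute rate comes out a little under two, which is consistent with the slide's rounded "2 journal articles per minute".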

Page 4: Biomedical Annotation - Kevin Livingston

Biomedical Data Sources

1,380 Databases in 2012

Total Manual GO Annotations: 1,116,848

Total GO Annotations: 132,425,702

PubMed Articles Referenced: 94,518

Page 5: Biomedical Annotation - Kevin Livingston

Annotation Consumers?

• The linguistic community typically uses annotation as training data or for specific tasks
  – An abundance of tools that can produce annotations in the specific format of those resources
  – Tools for computational linguistics
• Biomedical annotation typically used for curating, indexing, or enrichment analysis
• But what about re-using annotations and tools in other contexts and for other purposes?

Page 6: Biomedical Annotation - Kevin Livingston


Page 7: Biomedical Annotation - Kevin Livingston

Vision

[Diagram labels: Texts, Text Mining, Ontologies, DBs, Knowledge Base, Intelligent Applications]

Page 8: Biomedical Annotation - Kevin Livingston


Applications: Gene Centric

Page 9: Biomedical Annotation - Kevin Livingston


Applications: Document Centric

Page 10: Biomedical Annotation - Kevin Livingston

Annotation for Computation

• Computer understandable
• Composable
• Provenance of compositions traceable

Page 11: Biomedical Annotation - Kevin Livingston

CRAFT: Colorado Richly Annotated Full Text corpus
http://bionlp-corpora.sourceforge.net/CRAFT/

• 67 full text articles (+30 more reserved for future testing)
• >560,000 Tokens
• >21,000 Sentences
• ~100,000 concept annotations to 7 different biomedical ontologies/terminologies
• Penn Treebank markup for each sentence
• Multiple output formats available
• Integrated with UIMA

Page 12: Biomedical Annotation - Kevin Livingston

CRAFT Annotation

Example sentence: "Hematopoiesis is precisely orchestrated by lineage-specific DNA-binding proteins that regulate transcription in concert with coactivators and corepressors."

[Figure: concept annotations over the sentence, linked by relations]
Concepts: protein binding, biological regulation, transcription, protein, DNA, hemopoiesis, transcription coactivator activity, transcription corepressor activity
Relations: regulates, results in regulation by, entity that has function, has agent, results in interaction of
Legend: GO BP, SO, GO MF, CHEBI, relation
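
Concept annotations of this kind are commonly stored as stand-off records: character offsets into the text plus a concept label. The Python sketch below illustrates that idea on the slide's example sentence. The pairing of each mention with a concept label is an assumed reading of the figure, not a copy of the actual CRAFT annotations, and the `ConceptAnnotation` class and `annotate` helper are hypothetical names introduced here.

```python
from dataclasses import dataclass

SENTENCE = (
    "Hematopoiesis is precisely orchestrated by lineage-specific "
    "DNA-binding proteins that regulate transcription in concert "
    "with coactivators and corepressors."
)


@dataclass(frozen=True)
class ConceptAnnotation:
    start: int    # character offset where the mention begins
    end: int      # character offset just past the mention
    concept: str  # concept label as shown on the slide (no IDs are given there)

    def covered_text(self, text: str) -> str:
        return text[self.start:self.end]


# Assumption: these mention-to-concept pairings are a plausible reading
# of the figure, not the real CRAFT annotations.
ASSUMED_PAIRINGS = [
    ("Hematopoiesis", "hemopoiesis"),
    ("DNA", "DNA"),
    ("proteins", "protein"),
    ("transcription", "transcription"),
    ("coactivators", "transcription coactivator activity"),
    ("corepressors", "transcription corepressor activity"),
]


def annotate(text, pairings):
    """Locate the first occurrence of each mention and record its span."""
    annotations = []
    for mention, concept in pairings:
        start = text.find(mention)
        if start == -1:
            raise ValueError(f"mention not found: {mention!r}")
        annotations.append(ConceptAnnotation(start, start + len(mention), concept))
    return annotations


annotations = annotate(SENTENCE, ASSUMED_PAIRINGS)
for a in annotations:
    print(a.start, a.end, a.covered_text(SENTENCE), "->", a.concept)
```

Keeping offsets rather than copies of the text means the original sentence is never modified, and several annotation layers can point at the same span.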

Page 13: Biomedical Annotation - Kevin Livingston


Applications: Annotating

Page 14: Biomedical Annotation - Kevin Livingston

Compositional Annotation & Knowledge

[Figure: annotation graph]
Concepts: vertebrate pigmentation; GO:0043474 pigmentation; TAXON:7742 Vertebrata
Concept relations: subClassOf, occurs_in
Annotations: text annotation 1, text annotation 2, text annotation 3
Annotation relations: denotes, hasBody (×2), hasTarget (×2), basedOn (×2)
Source: CRAFT PMID:14737183
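
One way to read this diagram is as a small set of triples in which a composed annotation points back, via basedOn, to the simpler annotations it was built from. The Python sketch below encodes that reading and walks the basedOn links to recover provenance. The edge assignments are an assumption, since the extracted slide lists only the labels; the `ann:` and `ex:` identifiers are hypothetical, and the document identifier assumes the split text on the slide reads PMID:14737183.

```python
# Assumption: one plausible reading of the diagram, written as
# (subject, predicate, object) string triples. The "ann:" and "ex:"
# identifiers are hypothetical names for the diagram's nodes.
TRIPLES = {
    ("ann:1", "hasBody", "GO:0043474"),
    ("ann:1", "hasTarget", "PMID:14737183"),
    ("ann:2", "hasBody", "TAXON:7742"),
    ("ann:2", "hasTarget", "PMID:14737183"),
    ("ann:3", "denotes", "ex:vertebrate_pigmentation"),
    ("ann:3", "basedOn", "ann:1"),
    ("ann:3", "basedOn", "ann:2"),
    ("ex:vertebrate_pigmentation", "subClassOf", "GO:0043474"),
    ("ex:vertebrate_pigmentation", "occurs_in", "TAXON:7742"),
}


def objects(triples, subject, predicate):
    """All objects o such that (subject, predicate, o) is in triples."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def provenance(triples, annotation):
    """All annotations reachable from `annotation` by following basedOn."""
    seen = set()
    stack = [annotation]
    while stack:
        current = stack.pop()
        for source in objects(triples, current, "basedOn"):
            if source not in seen:
                seen.add(source)
                stack.append(source)
    return seen


def supporting_documents(triples, annotation):
    """Documents targeted by the annotations a composition is based on."""
    docs = set()
    for source in provenance(triples, annotation):
        docs |= objects(triples, source, "hasTarget")
    return docs


print(sorted(provenance(TRIPLES, "ann:3")))
print(sorted(supporting_documents(TRIPLES, "ann:3")))
```

Because the traversal is transitive, a composition built from other compositions would still trace back to the text annotations at the bottom of the chain.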

Page 15: Biomedical Annotation - Kevin Livingston

Summary

• Model that covers syntactic and semantic annotation
  – Linguistic annotation
  – Semantic annotation
  – Entity-based annotation
• Capture complex content that is not necessarily best represented via a single URI
  – Created a GraphAnnotation that denotes an RDF named graph
• Add kiao:basedOn to enable annotation compositions and provenance tracking
  – Annotation-level
  – Assertion-level
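
The idea of an annotation that denotes a named graph rather than a single URI can be sketched with quads, where each assertion carries the name of the graph it belongs to. The Python below is a rough illustration only: the quad layout, the `type` predicate, and every `ex:` name are hypothetical. Only GraphAnnotation, denotes, and kiao:basedOn come from the slides, and the two concept identifiers are reused from the compositional annotation slide.

```python
from collections import defaultdict

# Assumption: quads are (graph, subject, predicate, object) strings and
# every "ex:" name is hypothetical. Only GraphAnnotation, denotes, and
# kiao:basedOn appear in the slides.
QUADS = [
    ("ex:graph1", "ex:vertebrate_pigmentation", "subClassOf", "GO:0043474"),
    ("ex:graph1", "ex:vertebrate_pigmentation", "occurs_in", "TAXON:7742"),
    ("ex:default", "ex:ann3", "type", "GraphAnnotation"),
    ("ex:default", "ex:ann3", "denotes", "ex:graph1"),
    ("ex:default", "ex:ann3", "kiao:basedOn", "ex:ann1"),
    ("ex:default", "ex:ann3", "kiao:basedOn", "ex:ann2"),
]


def group_by_graph(quads):
    """Map each graph name to the set of triples it contains."""
    graphs = defaultdict(set)
    for graph, s, p, o in quads:
        graphs[graph].add((s, p, o))
    return graphs


def denoted_assertions(quads, annotation):
    """Triples in every named graph that `annotation` denotes."""
    graphs = group_by_graph(quads)
    assertions = set()
    for _graph, s, p, o in quads:
        if s == annotation and p == "denotes":
            assertions |= graphs.get(o, set())
    return assertions


for triple in sorted(denoted_assertions(QUADS, "ex:ann3")):
    print(triple)
```

Grouping assertions under a graph name lets a single annotation stand for a multi-statement description, while kiao:basedOn links on the annotation still record what it was composed from.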

Page 16: Biomedical Annotation - Kevin Livingston

Acknowledgements

University of Colorado:
• Hunter Lab
  – Larry Hunter
  – Mike Bada
  – Bill Baumgartner
  – Chris Roeder
  – Kevin Cohen
  – Carsten Goerg
• National ICT Australia
  – Karin Verspoor
• Funding:
  – NIH/NLM training grant
  – Andrew W. Mellon Foundation
