Biomedical Annotation - Kevin Livingston
Kevin Livingston, Ph.D.
Postdoctoral Fellow
Pharmacology Department, School of Medicine
University of Colorado Anschutz Medical Campus
[email protected]
compbio.ucdenver.edu/Hunter_lab/Livingston
Biomedical Annotation
Biomedical researchers are interested in understanding their data in the context of all known background knowledge: curated databases & literature.
PubMed Growth Rate
[Figure: PubMed entries per year, 1987 to 2011. Axes: New Entries (thousands), 0 to 1100; Total Entries (millions), 0 to 25.]
Exponential fits shown on the plot:
f(x) = 321.698053474767 exp(0.0401516775647944 x), R² = 0.943742964334693
f(x) = 7.85609625880221 exp(0.0404790715086599 x), R² = 0.99939550228951
973,499 PubMed entries in 2011 (>2,600 per day)
2 journal articles per minute!
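The headline rates on this slide follow from simple arithmetic on the numbers shown. The sketch below assumes a 365-day year and treats the exponent of the better-fitting curve (R² of about 0.999) as a per-year growth rate; the slide does not state the unit of x, so the doubling time is only indicative. Variable names are illustrative and not from the talk.

```python
import math

# Figures taken from the slide.
ENTRIES_2011 = 973_499             # PubMed entries reported for 2011
GROWTH_RATE = 0.0404790715086599   # exponent of the fitted curve; ASSUMED to be per year

# Assumption: a 365-day year.
per_day = ENTRIES_2011 / 365
per_minute = per_day / (24 * 60)

# For f(x) = a * exp(k * x), the doubling time is ln(2) / k.
doubling_years = math.log(2) / GROWTH_RATE

print(f"{per_day:.0f} entries per day")
print(f"{per_minute:.2f} entries per minute")
print(f"doubling time: {doubling_years:.1f} years")
```

Under these assumptions the daily rate comes out above 2,600 and the per-minute rate just under 2, which matches the rounded figures on the slide.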
Biomedical Data Sources
1,380 databases in 2012
Total Manual GO Annotations: 1,116,848
Total GO Annotations: 132,425,702
PubMed Articles Referenced: 94,518
Annotation Consumers?
• The linguistic community typically uses annotation as training data or for specific tasks
– An abundance of tools that can produce annotations in the specific format of those resources
– Tools for computational linguistics
• Biomedical annotation typically used for curating, indexing, or enrichment analysis
• But what about re-using annotations and tools in other contexts and for other purposes?
Vision
[Diagram components: Texts, Text Mining, Ontologies, DBs, Knowledge Base, Intelligent Applications]
Applications: Gene Centric
Applications: Document Centric
Annotation for Computation
• Computer understandable
• Composable
• Provenance of compositions traceable
CRAFT: Colorado Richly Annotated Full Text corpus
http://bionlp-corpora.sourceforge.net/CRAFT/
• 67 full text articles (+30 more reserved for future testing)
• >560,000 tokens
• >21,000 sentences
• ~100,000 concept annotations to 7 different biomedical ontologies/terminologies
• Penn Treebank markup for each sentence
• Multiple output formats available
• Integrated with UIMA
CRAFT Annotation
[Figure: concept annotations and relations drawn over an example sentence]
Sentence: "Hematopoiesis is precisely orchestrated by lineage-specific DNA-binding proteins that regulate transcription in concert with coactivators and corepressors."
Concepts: hemopoiesis; biological regulation; transcription; protein binding; protein; DNA; transcription coactivator activity; transcription corepressor activity
Relations: regulates; results in regulation by; entity that has function; has agent; results in interaction of
Legend: GO BP, SO, GO MF, CHEBI, relation
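One way to make annotations like these computer understandable is to store them stand-off, as character spans over the sentence paired with a concept label. The sketch below is illustrative only: the `ConceptAnnotation` class and `annotate` helper are hypothetical, and the pairing of each mention with a concept label is an assumption, since the slide's exact spans did not survive in this transcript.

```python
from dataclasses import dataclass

SENTENCE = (
    "Hematopoiesis is precisely orchestrated by lineage-specific "
    "DNA-binding proteins that regulate transcription in concert "
    "with coactivators and corepressors."
)


@dataclass(frozen=True)
class ConceptAnnotation:
    """A stand-off annotation: a character span plus a concept label."""
    start: int
    end: int
    concept: str


def annotate(text: str, mention: str, concept: str) -> ConceptAnnotation:
    """Annotate the first occurrence of `mention` in `text`."""
    start = text.find(mention)
    if start < 0:
        raise ValueError(f"mention not found: {mention!r}")
    return ConceptAnnotation(start, start + len(mention), concept)


# ASSUMED mention-to-concept pairing; labels are taken from the slide,
# but which words they cover is a guess for illustration.
MENTIONS = [
    ("Hematopoiesis", "hemopoiesis"),
    ("regulate", "biological regulation"),
    ("transcription", "transcription"),
    ("coactivators", "transcription coactivator activity"),
    ("corepressors", "transcription corepressor activity"),
]

annotations = [annotate(SENTENCE, m, c) for m, c in MENTIONS]

for a in annotations:
    print(a.start, a.end, repr(SENTENCE[a.start:a.end]), "->", a.concept)
```

Keeping the spans separate from the text means several annotation layers (syntactic, semantic) can point at the same sentence without modifying it.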
Applications: Annotating
Compositional Annotation & Knowledge
[Diagram: graph linking a composed concept to the text annotations it is built from]
Nodes: vertebrate pigmentation; GO:0043474 pigmentation; TAXON:7742 Vertebrata; text annotation 1; text annotation 2; text annotation 3; CRAFT PMID:14737183
Edge labels: subClassOf; occurs_in; denotes; hasBody (×2); basedOn (×2); hasTarget (×2)
Summary
• Model that covers syntactic and semantic annotation
– Linguistic annotation
– Semantic annotation
– Entity-based annotation
• Capture complex content that is not necessarily best represented via a single URI
– Created a GraphAnnotation that denotes an RDF named graph
• Add kiao:basedOn to enable annotation compositions and provenance tracking
– Annotation-level
– Assertion-level
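The role of basedOn in provenance tracking can be sketched with the pigmentation example from the Compositional Annotation slide. This is a minimal illustration using plain tuples rather than an RDF library. The identifiers `ann1` to `ann3`, the helper functions, and the wiring of the edges are assumptions reconstructed from the node and edge labels on the slide, and the split document identifier is read as PMID:14737183.

```python
# Triples as plain (subject, predicate, object) tuples.
# ASSUMPTION: which node each edge connects is inferred from the slide labels.
TRIPLES = [
    ("ann1", "hasBody", "GO:0043474"),             # pigmentation
    ("ann1", "hasTarget", "CRAFT PMID:14737183"),
    ("ann2", "hasBody", "TAXON:7742"),             # Vertebrata
    ("ann2", "hasTarget", "CRAFT PMID:14737183"),
    ("ann3", "denotes", "vertebrate pigmentation"),
    ("ann3", "basedOn", "ann1"),
    ("ann3", "basedOn", "ann2"),
    ("vertebrate pigmentation", "subClassOf", "GO:0043474"),
    ("vertebrate pigmentation", "occurs_in", "TAXON:7742"),
]


def objects(subject, predicate):
    """Return all objects of triples matching subject and predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def provenance(annotation):
    """Return every annotation reachable from `annotation` via basedOn."""
    seen = []
    stack = list(objects(annotation, "basedOn"))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(objects(current, "basedOn"))
    return seen


sources = provenance("ann3")
targets = sorted({t for a in sources for t in objects(a, "hasTarget")})

print("composed from:", sorted(sources))
print("grounded in:", targets)
```

Following basedOn from the composed annotation leads back to the simpler annotations and, through their hasTarget edges, to the source document, which is the kind of traceability the summary describes.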
Acknowledgements
University of Colorado:
• Hunter Lab
– Larry Hunter
– Mike Bada
– Bill Baumgartner
– Chris Roeder
– Kevin Cohen
– Carsten Goerg
• National ICT Australia
– Karin Verspoor
• Funding:
– NIH/NLM training grant
– Andrew W. Mellon Foundation