Using Citizen Science to organize biomedical knowledge
Andrew Su, Ph.D. @andrewsu
http://sulab.org
March 5, 2015
Future of Genomic Medicine
Slides posted at slideshare.net/andrewsu
Candidate genes
FLNB
CTNNB1
EPHA3
SMAD3
XPO1
RPS27
FLCN
ATR
FLT3
BRD2
ERG
RAF1
EGFR
ERBB4
RARA
JAK3
LRP1
WT1
PML
SMARCA4
…
The biomedical literature is growing fast…
[Figure: number of new PubMed-indexed articles per year, 1983 to 2013 (y-axis 0 to 1,200,000)]
… but it is very hard to query and compute
Imatinib
Crizotinib
Erlotinib
Gefitinib
Sorafenib
Lapatinib
Dasatinib
…
Acute myeloid leukemia
Acute lymphoblastic leukemia
Chronic myelogenous leukemia
Chronic lymphocytic leukemia
Hodgkin lymphoma
Non-Hodgkin lymphoma
Myeloma
…
AND
Pathways
Diseases
Proteins
Variants
Genes
Drugs
Goal: Assemble a network of biomedical
knowledge that is comprehensive,
current, computable and traceable.
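To make "computable and traceable" concrete, here is a minimal sketch. It assumes a hypothetical `Edge` record and `query` helper, neither of which comes from the talk. Knowledge is stored as subject-predicate-object edges, so the drug AND disease question from the previous slide reduces to a set filter. The three edges are illustrative, and each one carries a `source` field (left empty here) so that every answer could be traced back to its evidence.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Edge:
    """One edge in a hypothetical biomedical knowledge network."""
    subject: str           # e.g. a drug
    predicate: str         # relationship type, e.g. "treats"
    obj: str               # e.g. a disease
    source: Optional[str]  # provenance (e.g. a PubMed ID); left empty here


# Illustrative edges only; provenance intentionally left unset.
EDGES = [
    Edge("imatinib", "treats", "chronic myelogenous leukemia", None),
    Edge("dasatinib", "treats", "chronic myelogenous leukemia", None),
    Edge("gefitinib", "treats", "non-small cell lung cancer", None),
]


def query(edges, subjects, objects, predicate="treats"):
    """Return edges whose subject is in `subjects` AND whose object is in `objects`."""
    return [
        e for e in edges
        if e.predicate == predicate and e.subject in subjects and e.obj in objects
    ]


drugs = {"imatinib", "crizotinib", "erlotinib", "gefitinib",
         "sorafenib", "lapatinib", "dasatinib"}
diseases = {"acute myeloid leukemia", "acute lymphoblastic leukemia",
            "chronic myelogenous leukemia", "chronic lymphocytic leukemia",
            "hodgkin lymphoma", "non-hodgkin lymphoma", "myeloma"}

# Each hit keeps its source field, so the answer stays traceable.
for e in query(EDGES, drugs, diseases):
    print(e.subject, e.predicate, e.obj, "| source:", e.source)
```

The gefitinib edge is filtered out because its object is not in the disease set, which is exactly the AND semantics the slide asks for.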
Information Extraction
1. Identify high level concepts in text
2. Identify relationships between concepts
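The two steps can be illustrated with a deliberately naive sketch. It assumes a tiny hypothetical dictionary (`LEXICON`) for step 1 and treats co-occurrence within a sentence as a stand-in for step 2. This is not the method used in this work; it only shows what the inputs and outputs of each step look like.

```python
import re
from itertools import product

# Hypothetical toy dictionary; real systems use curated vocabularies.
LEXICON = {
    "GENE": ["CDK9", "EGFR", "FLT3"],
    "DISEASE": ["acute myeloid leukemia", "chronic lymphocytic leukemia"],
}


def recognize_concepts(text):
    """Step 1: find dictionary terms (case-insensitive, whole words)."""
    found = []
    for concept_type, terms in LEXICON.items():
        for term in terms:
            pattern = r"\b" + re.escape(term) + r"\b"
            for m in re.finditer(pattern, text, flags=re.IGNORECASE):
                found.append((m.start(), m.end(), term, concept_type))
    return sorted(found)


def candidate_relations(text):
    """Step 2 (naive): pair every GENE with every DISEASE in the same sentence."""
    pairs = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        mentions = recognize_concepts(sentence)
        genes = [m[2] for m in mentions if m[3] == "GENE"]
        diseases = [m[2] for m in mentions if m[3] == "DISEASE"]
        pairs.extend(product(genes, diseases))
    return pairs


text = "FLT3 mutations are common in acute myeloid leukemia. CDK9 is also studied."
print(candidate_relations(text))
```

CDK9 is recognized as a concept but yields no pair, because no disease is mentioned in its sentence.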
Doğan and Lu. Proceedings of the 2012 Workshop on BioNLP, 2012, 91-9.
NCBI Disease Corpus
593 PubMed abstracts
12 expert annotators (2 per document)
6,900 “disease concept” mentions
Question: Can a group of non-scientists
collectively perform concept recognition in
biomedical texts?
Experimental design
Task: Identify the “disease concepts” in
the 593 abstracts from the NCBI disease
corpus
– $0.06 per Human Intelligence Task (HIT)
– HIT = annotate one abstract from PubMed
– 15 workers annotate each abstract
Comparison to gold standard
K = 6 (voting threshold)
F score = 0.87
• 593 documents
• 15 users / doc
• 9 days
• 145 workers
• $630.96
[Figure: precision and recall by voting threshold]
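The slide does not spell out how the 15 workers' annotations are combined. A plausible reading, assumed here, is that K is a voting threshold: a span counts as a disease mention when at least K workers marked it, and the consensus is scored against the gold standard with precision, recall and the F score (their harmonic mean). The function names and the exact-span matching are assumptions for illustration.

```python
from collections import Counter


def aggregate(worker_annotations, k):
    """Keep a span if at least k workers marked it.

    worker_annotations: one set per worker of (doc_id, start, end) spans that
    the worker tagged as a disease concept. Exact span matching is assumed.
    """
    votes = Counter(span for spans in worker_annotations for span in spans)
    return {span for span, n in votes.items() if n >= k}


def precision_recall_f(predicted, gold):
    """Precision, recall and F score (harmonic mean) of predicted vs. gold spans."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


# Toy example: three workers, threshold k = 2.
gold = {("doc1", 0, 5), ("doc1", 10, 20)}
workers = [
    {("doc1", 0, 5), ("doc1", 10, 20)},
    {("doc1", 0, 5), ("doc1", 30, 35)},
    {("doc1", 0, 5)},
]
consensus = aggregate(workers, k=2)
print(consensus)                            # {('doc1', 0, 5)}
print(precision_recall_f(consensus, gold))  # precision 1.0, recall 0.5, F about 0.67
```

Raising K trades recall for precision, which is why the slide plots both against the threshold.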
Comparisons to human annotators
Average level of agreement between expert annotators (stage 1)
F = 0.76
F = 0.76
F = 0.87
Average level of agreement between expert annotators (stage 2)
Does Mechanical Turk scale?
$$1{,}000{,}000\ \text{articles/year} \times 10\ \text{annotators/article} \times 4\ \text{tasks/doc} \times \$0.06/\text{task} = \$2{,}400{,}000/\text{year}$$
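The estimate is a straight product of the four factors on the slide. A small sketch, with hypothetical parameter names, reproduces it; working in integer cents avoids floating-point rounding. Like the slide, it counts only the per-task payment.

```python
def annual_cost_cents(articles_per_year, annotators_per_article,
                      tasks_per_doc, cents_per_task):
    """Annual crowdsourcing cost in cents (integer math avoids float rounding)."""
    return articles_per_year * annotators_per_article * tasks_per_doc * cents_per_task


cost = annual_cost_cents(1_000_000, 10, 4, 6)  # $0.06 = 6 cents per task
print(f"${cost / 100:,.0f} per year")          # $2,400,000 per year
```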
Question: Can a group of non-scientists collectively perform concept recognition in biomedical texts, and will they do it for free?
Mark2Cure Campaign #0
• Goal: replicate the NCBI disease corpus
– 593 documents, 15x redundancy
• Launched Jan 19, 2015
• Completed Feb 16, 2015
– 4 weeks
– 10,275 document annotation events
– 212 unique users
Comparison to gold standard
K = 6 (voting threshold)
F score = 0.84
[Figure: precision and recall by voting threshold]
Total cost: $0
Does Citizen Science scale?
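$$\text{Volunteers needed} = \frac{\text{Number of annotation events per year}}{\text{Number of annotation events per year per volunteer}} = \frac{1{,}000{,}000\ \text{articles} \times 10\ \text{AE/article}}{\dfrac{10{,}275\ \text{AE} \times 365\ \text{days}}{212\ \text{annotators} \times 28\ \text{days}}} \approx 15{,}828$$
AE = Annotation events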
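The same calculation as a short sketch, with hypothetical parameter names. It assumes, as the slide does, that the per-volunteer rate observed during the 28-day campaign extrapolates linearly to a full year. `Fraction` keeps the arithmetic exact before rounding up to whole volunteers.

```python
import math
from fractions import Fraction


def volunteers_needed(articles_per_year, ae_per_article,
                      observed_ae, observed_users, observed_days,
                      days_per_year=365):
    """Volunteers needed to keep up with the literature.

    The per-volunteer yearly rate is extrapolated linearly from an observed
    campaign; Fraction keeps the arithmetic exact.
    """
    ae_per_year = articles_per_year * ae_per_article
    ae_per_volunteer_year = Fraction(observed_ae * days_per_year,
                                     observed_users * observed_days)
    return ae_per_year / ae_per_volunteer_year


# Inputs from the slide: 10,275 AE by 212 users over 28 days.
n = volunteers_needed(1_000_000, 10, 10_275, 212, 28)
print(math.ceil(n))  # 15828
```

Rounding up rather than to nearest reflects that a fractional volunteer still has to be a whole person; here both give 15,828.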
Does Citizen Science scale?
15,828 volunteers needed
175,000 volunteers
300,000 volunteers
37,000 volunteers
1,000,000 volunteers
Annotating the relationships
This molecule inhibits the growth of a broad panel of cancer cell lines, and is particularly efficacious in leukemia cells, including orthotopic leukemia preclinical models as well as in ex vivo acute myeloid leukemia (AML) and chronic lymphocytic leukemia (CLL) patient tumor samples. Thus, inhibition of CDK9 may represent an interesting approach as a cancer therapeutic target especially in hematologic malignancies.
Subject (GENE), predicate (“therapeutic target”), object (DISEASE)
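A relationship annotation like this can be stored as a typed subject-predicate-object triple. The sketch below assumes hypothetical `Concept` and `Relation` classes and a hypothetical schema (`ALLOWED`) that says which concept types each predicate may connect; it encodes one relation the passage supports, CDK9 as a therapeutic target in hematologic malignancies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    name: str
    type: str  # e.g. "GENE" or "DISEASE"


@dataclass(frozen=True)
class Relation:
    subject: Concept
    predicate: str
    object: Concept


# Hypothetical schema: the (subject type, object type) each predicate allows.
ALLOWED = {"therapeutic target": ("GENE", "DISEASE")}


def is_valid(rel: Relation) -> bool:
    """Check that subject and object types match the predicate's schema."""
    return ALLOWED.get(rel.predicate) == (rel.subject.type, rel.object.type)


rel = Relation(Concept("CDK9", "GENE"), "therapeutic target",
               Concept("hematologic malignancies", "DISEASE"))
print(is_valid(rel))  # True
```

A type check like this is one cheap way to catch annotations with subject and object swapped.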
Cyrus Afrasiabi
Sebastian Burgstaller
Ramya Gamini
Louis Gioia
Salvatore Loguercio
Adam Mark
Erick Scott
Greg Stupp
Andra Waagmeester
Kevin Xin
Other group members
Contact
http://sulab.org
@andrewsu
+Andrew Su
Mark2Cure
Ben Good
Max Nanis
Ginger Tsueng
Chunlei Wu
All Mark2Curators!
Funding and Support
BioGPS: GM83924
Gene Wiki: GM089820
BD2K Center of Excellence: GM114833
Icon credits (Noun Project, Wikimedia Commons): Zach VanDeHey, hunotika, Viktorvoigt, Alberto Rojas, Lloyd Humphreys
Matt and Cristina Might
NGLY1 community
Why do I Mark2Cure?
I am retired, have a doctorate in medical humanities, and have two children with Gaucher disease. I am just looking for some way to put my education to use.
My 4-year-old daughter Phoebe is living with and battling rare disease.
I have Ehlers-Danlos syndrome. I hope to help people learn about this painful and debilitating disorder, so that others like me can receive more effective medical care.
Take part in something that helps humanity.
I Mark2Cure in memory of my son Mike who had type 1 diabetes.
Studied biology in college and I really miss it!
In memory of my daughter who had Cystic Fibrosis
To give back