Gene annotation games
Benjamin Good*, Salvatore Loguercio, Andrew Su
The Scripps Research Institute
http://genegames.org
April 20, 2012
7th Annual Systems to Synthesis Symposium at the Salk Institute
GAMES FOR HUMAN GENE ANNOTATION
WHY GAMES?
von Ahn L. Google Tech Talk: Human Computation. 2006.
It is estimated that 9 billion hours are spent playing Solitaire every year
Building the Empire State Building took seven million hours of human labor
ONE YEAR SOLITAIRE = 1,285 EMPIRE STATE BUILDINGS
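The comparison on this slide is a single division. A quick check, using only the two figures quoted above (the variable names are ours, not from the talk):

```python
# Figures quoted on the slide (von Ahn, 2006).
solitaire_hours_per_year = 9_000_000_000  # hours spent playing Solitaire each year
empire_state_hours = 7_000_000            # hours of labor to build the Empire State Building

# Whole Empire State Buildings' worth of labor per year of Solitaire.
buildings_per_year = solitaire_hours_per_year // empire_state_hours
print(buildings_per_year)  # 1285
```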
150 billion hours
McGonigal J. Reality Is Broken: Why Games Make Us Better and How They Can Change the World. New York: Penguin Press; 2011.
[Chart: the three figures on a common scale of hours, axis from 0 to 160 billion: 7M, 9B, 150B]
GAMES WITH A PURPOSE
Devise protein folding algorithms
Fix multiple sequence alignments
Design RNA molecules
Label all images on the Web
Annotate all human genes
ANNOTATE ALL HUMAN GENES
Record the relevant properties of each gene in a manner that facilitates computation
• biological process
• molecular function
• cellular localization
• interaction partners
• disease relevance
• genomic location
• genetic variations
• post-translational modifications
• related drugs
• related publications
• ...
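One way to picture "a manner that facilitates computation" is a structured record per gene. The sketch below is only an illustration: the class name, the field names, and the choice of JSON are our assumptions, not a format used by the authors or by any annotation database. It covers a subset of the property types listed above and reuses the ABCB5 example that appears later in the talk.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class GeneAnnotation:
    """Hypothetical record holding a few of the property types from the slide."""
    symbol: str
    biological_process: List[str] = field(default_factory=list)
    molecular_function: List[str] = field(default_factory=list)
    cellular_localization: List[str] = field(default_factory=list)
    interaction_partners: List[str] = field(default_factory=list)
    disease_relevance: List[str] = field(default_factory=list)
    related_publications: List[str] = field(default_factory=list)


# Candidate association taken from the Dizeez example in this talk.
record = GeneAnnotation(symbol="ABCB5", disease_relevance=["acute myeloid leukemia"])

# A structured record can be serialized, exchanged, and read back without loss.
serialized = json.dumps(asdict(record), sort_keys=True)
restored = GeneAnnotation(**json.loads(serialized))
print(restored == record)
```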
Gene
BUILDING AN ANNOTATION
image credits: phillipmartin.info, wikipedia.org/wiki/Manuscript, beyondcomputingmag.com/
Gene Biological process, disease etc.
1. do science
2. publish it
3. curate the knowledge
WHY DOES HE LOOK SO TIRED?
MANY SCIENTISTS, POWERFUL TOOLS
[Chart: number of articles added to PubMed per year, 2000 to 2012; y-axis from 500,000 to 1,000,000]
GROWTH OF POTENTIAL ANNOTATIONS
112 publications/hour (37 more by the end of this talk)
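The two numbers on this slide can be checked against each other and against the PubMed chart. A small calculation, assuming the rate stays constant (the variable names are ours):

```python
publications_per_hour = 112   # rate quoted on the slide
extra_by_end_of_talk = 37     # publications quoted for the remainder of the talk

# Time needed to accumulate 37 publications at 112 per hour.
remaining_minutes = extra_by_end_of_talk / publications_per_hour * 60
# The same rate extrapolated over a 365-day year.
publications_per_year = publications_per_hour * 24 * 365

print(round(remaining_minutes, 1))  # 19.8
print(publications_per_year)        # 981120
```

At this rate the yearly total comes out just under one million, which matches the top of the chart's axis.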
HOW DO WE INVOLVE THE COMMUNITY IN GENE ANNOTATION?
Make it fun!
LINK GENES TO DISEASES WITH DIZEEZ
If it's ‘right’, you get points
then on to the next question
Click the related disease
hurry!
DIZEEZ IS FUN... TO SOME PEOPLE
• Advertised with a blog post, a few tweets, and a conference poster
• Results since Dec. 2011:
• 180 people have played it
• 713 one-minute game rounds have been completed
• 4,585 distinct gene-disease associations collected
QUALITY THROUGH REPLICATION
4,585 distinct gene-disease pairs collected
Collected more than once: 482
Potential new annotations (do not appear in OMIM, PharmGKB): 224
Example: ABCB5, acute myeloid leukemia
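The replication idea can be sketched as a two-step filter: keep only pairs submitted more than once, then drop pairs already present in reference resources. This is a minimal sketch, not the authors' pipeline. The function name, the `min_count` parameter, and the toy data are assumptions; `GENE_B` and `disease Y` are placeholders, and only the ABCB5 pair comes from the slides.

```python
from collections import Counter


def replicated_and_novel(submissions, known_pairs, min_count=2):
    """Return (pairs seen at least min_count times, those not already known)."""
    counts = Counter(submissions)
    replicated = {pair for pair, n in counts.items() if n >= min_count}
    novel = replicated - set(known_pairs)
    return replicated, novel


# Toy submissions: each tuple is one (gene, disease) answer from a game round.
submissions = [
    ("ABCB5", "acute myeloid leukemia"),
    ("ABCB5", "acute myeloid leukemia"),
    ("GENE_A", "disease X"),   # seen only once, so it is filtered out
    ("GENE_B", "disease Y"),
    ("GENE_B", "disease Y"),
]
# Stand-in for associations already recorded in curated databases.
known_pairs = {("GENE_B", "disease Y")}

replicated, novel = replicated_and_novel(submissions, known_pairs)
print(sorted(replicated))
print(sorted(novel))
```

This sketch counts repeated submissions regardless of who made them. Counting distinct players per pair would be a stricter variant.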
ABCB5 IS RELATED TO ACUTE MYELOID LEUKEMIA
PROBLEMS WITH DIZEEZ
• Dizeez actually punishes desired behavior (adding new, unknown associations) by not awarding points
• Does not allow players to enter associations other than those in the provided list
• GenESP (modeled after the ESP Game) fixes both problems. See: von Ahn and Dabbish (2004) Labeling images with a computer game, SIGCHI
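The ESP Game rewards two players for independently typing the same label, so free-text answers can be trusted without a predefined answer list. The slides do not describe how GenESP is implemented. The following is a sketch of that agreement rule applied to disease terms, with hypothetical function names and a taboo list borrowed from the ESP Game design, which GenESP may or may not use.

```python
def normalize(term):
    """Lower-case and collapse whitespace so trivially different spellings match."""
    return " ".join(term.lower().split())


def agreed_terms(guesses_a, guesses_b, taboo=()):
    """Return terms entered by both players, in player A's order, excluding taboo terms."""
    blocked = {normalize(t) for t in taboo}
    b_terms = {normalize(g) for g in guesses_b} - blocked
    agreed = []
    for guess in guesses_a:
        term = normalize(guess)
        if term in b_terms and term not in agreed:
            agreed.append(term)
    return agreed


# Two players see the same gene and type diseases independently.
player_a = ["Leukemia", "acute myeloid leukemia"]
player_b = ["Acute  Myeloid Leukemia", "melanoma"]
print(agreed_terms(player_a, player_b))
```

Because points depend on agreement between players rather than on a lookup against known annotations, a new but plausible association can still score.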
NO DATA YET, PLAY NOW!
http://genegames.org
A RE-USABLE PATTERN
Gene Disease
Gene Function
Gene Gene
Gene Gene relationship
cancer normal
find patterns
make predictions on new samples
GENOMIC PREDICTORS
cancer
normal
THE TRICK IS TO FIND THE RIGHT COMBINATION
Out of the 25,000+ genes, which work together the best?
Purely computational approaches have trouble generalizing: they always find patterns, and it is hard to know which ones are real
NETWORK GUIDED FOREST (NGF)
Dutkowski & Ideker (2011) Protein Networks as Logic Functions in Development and Cancer. PLoS Computational Biology
Use network to find good gene combinations
THE TRICK IS TO FIND THE RIGHT COMBINATION
‘COMBO’: FIND THE RIGHT COMBINATION OF GENES TO BUILD A PHENOTYPE PREDICTOR
HUMAN GUIDED FOREST (HGF)
http://i9606.blogspot.com/2012/04/human-guided-forests-hgf.html
Let COMBO players build decision modules
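The slides do not spell out how HGF trains its members, so the sketch below only illustrates the general shape of the idea: each player-chosen gene set becomes one small classifier, and the ensemble predicts by majority vote. As a stand-in for decision trees it uses a nearest-centroid rule per gene set. All names, gene identifiers (`g1`, `g2`, `g3`), and expression values are invented for illustration.

```python
from collections import Counter


def centroid(samples, genes):
    """Mean expression of each selected gene across a group of samples."""
    return [sum(s[g] for s in samples) / len(samples) for g in genes]


def squared_distance(sample, genes, center):
    return sum((sample[g] - c) ** 2 for g, c in zip(genes, center))


def train_module(genes, samples, labels):
    """Build a nearest-centroid classifier that looks only at the given genes."""
    groups = {}
    for sample, label in zip(samples, labels):
        groups.setdefault(label, []).append(sample)
    centers = {label: centroid(group, genes) for label, group in groups.items()}

    def classify(sample):
        return min(centers, key=lambda lb: squared_distance(sample, genes, centers[lb]))

    return classify


def train_ensemble(gene_sets, samples, labels):
    return [train_module(genes, samples, labels) for genes in gene_sets]


def predict(ensemble, sample):
    """Majority vote over all module classifiers."""
    votes = Counter(member(sample) for member in ensemble)
    return votes.most_common(1)[0][0]


# Toy training data: expression values per hypothetical gene.
samples = [
    {"g1": 5.0, "g2": 1.0, "g3": 0.2},
    {"g1": 4.5, "g2": 1.2, "g3": 0.9},
    {"g1": 1.0, "g2": 3.9, "g3": 0.4},
    {"g1": 1.3, "g2": 4.2, "g3": 0.7},
]
labels = ["cancer", "cancer", "normal", "normal"]

# Gene combinations as players might submit them.
player_gene_sets = [["g1", "g2"], ["g1"], ["g2", "g3"]]
ensemble = train_ensemble(player_gene_sets, samples, labels)

print(predict(ensemble, {"g1": 4.8, "g2": 0.9, "g3": 0.5}))
```

Restricting each member to a human-chosen gene set is what distinguishes this from a purely computational search over all 25,000+ genes.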
150 billion hours...
McGonigal J. Reality Is Broken: Why Games Make Us Better and How They Can Change the World. New York: Penguin Press; 2011.
[Chart: the three figures on a common scale of hours, axis from 0 to 160 billion: 7M, 9B, 150B]
THE END
More information at: http://genegames.org
http://sulab.org/
our poster!
Thanks to:
Andrew Su
Salvatore Loguercio
GO IS NOT KEEPING UP
[Chart: articles indexed in PubMed vs. GO annotations created, 2001 to 2011; y-axis from 0 to 1,200,000]
112 publications/hour (37 more by the end of this talk)
>21 million total
Annotate all the images on the web
dog, drinking, sprinkler, firehose
ANOTHER MAJOR ANNOTATION PROBLEM
A SUCCESSFUL MODEL
ESP GAME RESULTS
first 3 months (2003)
• 13,630 players added 1,271,451 labels to 293,760 images
• became Google Image Labeler (http://images.google.com/imagelabeler/)
Since scaled up to more than a hundred thousand players and tens of millions of labeled images.
von Ahn and Dabbish (2004) Labeling images with a computer game, SIGCHI