Gene annotation games

Benjamin Good*, Salvatore Loguercio, Andrew Su The Scripps Research Institute http://genegames.org April 20, 2012 7th Annual Systems to Synthesis Symposium at the Salk Institute GAMES FOR HUMAN GENE ANNOTATION

description

Talk at the Salk Institute's 2012 Systems to Synthesis Symposium. Discusses the use of online games to annotate the human genome and build better phenotype predictors.

Transcript of Gene annotation games

Page 1: Gene annotation games

Benjamin Good*, Salvatore Loguercio, Andrew Su

The Scripps Research Institute

http://genegames.org

April 20, 2012

7th Annual Systems to Synthesis Symposium at the Salk Institute

GAMES FOR HUMAN GENE ANNOTATION

Page 2: Gene annotation games

WHY GAMES?

Von Ahn L.: Google Tech Talk: Human Computation, 2006.

It is estimated that 9 billion hours are spent playing Solitaire every year

Page 3: Gene annotation games

Seven million hours of human labor

Empire State Building

ONE YEAR SOLITAIRE = 1,285 EMPIRE STATE BUILDINGS

Von Ahn L.: Google Tech Talk: Human Computation, 2006.
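The comparison on this slide is a single division. A minimal check, taking the two figures quoted on the slides at face value (the variable names are only illustrative):

```python
# Figures as quoted on the slides (estimates, not measurements made here).
solitaire_hours_per_year = 9_000_000_000   # "9 billion hours"
empire_state_hours = 7_000_000             # "Seven million hours of human labor"

# How many Empire State Buildings' worth of labor is one year of Solitaire?
buildings = solitaire_hours_per_year / empire_state_hours
print(f"{buildings:,.1f} Empire State Buildings per year of Solitaire")
```

The quotient is a little over 1,285, which matches the slide's rounded-down figure.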

Page 4: Gene annotation games

150 billion hours

McGonigal J. Reality is broken: why games make us better and how they can change the world. New York: Penguin Press; 2011.

[Bar chart on an axis from 0 to 160 billion hours, with bars labeled 7M, 9B, and 150B]

Page 5: Gene annotation games

GAMES WITH A PURPOSE

Devise protein folding algorithms

Fix multiple sequence alignments

Design RNA molecules

Label all images on the Web

Page 6: Gene annotation games

Annotate all human genes

Page 7: Gene annotation games

ANNOTATE ALL HUMAN GENES

Record the relevant properties of each gene in a manner that facilitates computation

• biological process

• molecular function

• cellular localization

• interaction partners

• disease relevance

• genomic location

• genetic variations

• post translational modifications

• related drugs

• related publications

• ...

Gene
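One way to picture "a manner that facilitates computation" is a structured record per gene. The sketch below is only an illustration: the class name `GeneRecord` and its field names are assumptions derived from the bullet list on this slide, not a schema used by the speakers or by any annotation database.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List, Optional


@dataclass
class GeneRecord:
    """Hypothetical machine-readable annotation record for one gene."""
    symbol: str
    biological_process: List[str] = field(default_factory=list)
    molecular_function: List[str] = field(default_factory=list)
    cellular_localization: List[str] = field(default_factory=list)
    interaction_partners: List[str] = field(default_factory=list)
    disease_relevance: List[str] = field(default_factory=list)
    genomic_location: Optional[str] = None
    genetic_variations: List[str] = field(default_factory=list)
    post_translational_modifications: List[str] = field(default_factory=list)
    related_drugs: List[str] = field(default_factory=list)
    related_publications: List[str] = field(default_factory=list)


# The gene-disease pair used as an example later in the talk; it is a
# candidate collected from players, not a confirmed annotation.
record = GeneRecord(symbol="ABCB5", disease_relevance=["Acute myeloid leukemia"])

# Serializing to JSON makes the record easy to store, query, and exchange.
as_json = json.dumps(asdict(record), indent=2)
print(as_json)
```

Fields left empty stay as empty lists, so a program can tell "nothing recorded yet" apart from a missing field.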

Page 8: Gene annotation games

BUILDING AN ANNOTATION

image credits: phillipmartin.info, wikipedia.org/wiki/Manuscript, beyondcomputingmag.com/

Gene Biological process, disease etc.

1. do science

2. publish it

3. curate the knowledge

Page 9: Gene annotation games

WHY DOES HE LOOK SO TIRED?

Page 10: Gene annotation games

MANY SCIENTISTS, POWERFUL TOOLS

Page 11: Gene annotation games

GROWTH OF POTENTIAL ANNOTATIONS

[Chart: number of articles added to PubMed per year, 2000-2012; y-axis from 500,000 to 1,000,000]

112 publications/hour (37 more by the end of this talk)
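The two numbers on this slide can be related with a little arithmetic. A minimal sketch, assuming the quoted rate is a steady year-round average (the constant and variable names are illustrative):

```python
PUBLICATIONS_PER_HOUR = 112  # rate quoted on the slide

# Annualized, the rate lands near the top of the chart's axis.
per_year = PUBLICATIONS_PER_HOUR * 24 * 365
print(f"{per_year:,} publications per year")

# "37 more by the end of this talk" implies how much talk time remained.
minutes_for_37 = 37 / PUBLICATIONS_PER_HOUR * 60
print(f"{minutes_for_37:.1f} minutes for 37 publications")
```

At this rate the year adds up to just under a million articles, and 37 publications correspond to roughly 20 minutes.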

Page 12: Gene annotation games
Page 13: Gene annotation games

HOW DO WE INVOLVE THE COMMUNITY IN GENE ANNOTATION?

Page 14: Gene annotation games

HOW DO WE INVOLVE THE COMMUNITY IN GENE ANNOTATION?

Make it fun!

Page 15: Gene annotation games

LINK GENES TO DISEASES WITH DIZEEZ

If it's ‘right’, you get points

then on to the next question

Click the related disease

hurry!

Page 16: Gene annotation games

DIZEEZ IS FUN... TO SOME PEOPLE

• Advertised with a blog post, a few tweets, and a conference poster

• Results since Dec. 2011:

• 180 people have played it

• 713 one-minute game rounds have been completed

• 4,585 distinct gene-disease associations collected

Page 17: Gene annotation games

QUALITY THROUGH REPLICATION

4,585 distinct gene-disease pairs collected

482 collected more than once

224 potential new annotations (do not appear in OMIM, PharmGKB)

example: ABCB5 Acute myeloid leukemia
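The replication filter on this slide can be sketched in a few lines. This is an illustration under stated assumptions, not the Dizeez implementation: "collected more than once" is interpreted here as "submitted by at least two distinct players", the function names are hypothetical, and the sample submissions and the `GENE_X` / `Disease Y` placeholders are invented for the example.

```python
from collections import defaultdict


def replicated_pairs(submissions, min_players=2):
    """Return gene-disease pairs submitted by at least `min_players` players."""
    players_by_pair = defaultdict(set)
    for player, gene, disease in submissions:
        players_by_pair[(gene, disease)].add(player)
    return {pair for pair, players in players_by_pair.items()
            if len(players) >= min_players}


def potential_new_annotations(replicated, known_pairs):
    """Keep replicated pairs that no reference database already lists."""
    return replicated - known_pairs


# Invented example data: (player id, gene, disease).
submissions = [
    ("p1", "ABCB5", "Acute myeloid leukemia"),
    ("p2", "ABCB5", "Acute myeloid leukemia"),
    ("p1", "GENE_X", "Disease Y"),   # seen only once, so it is dropped
    ("p2", "GENE_Z", "Disease W"),
    ("p3", "GENE_Z", "Disease W"),   # replicated, but already known
]
known_pairs = {("GENE_Z", "Disease W")}  # stand-in for reference databases

replicated = replicated_pairs(submissions)
candidates = potential_new_annotations(replicated, known_pairs)
print(candidates)
```

Counting distinct players rather than raw submissions keeps one enthusiastic player from "replicating" their own answer.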

Page 18: Gene annotation games

ABCB5 IS RELATED TO ACUTE MYELOID LEUKEMIA

Page 19: Gene annotation games

PROBLEMS WITH DIZEEZ

• Dizeez actually punishes desired behavior (adding new, unknown associations) by not awarding points

• Does not allow player to enter associations other than those in the provided list

• GenESP fixes both problems

Page 20: Gene annotation games

(modeled after the ESP Game). See: Ahn and Dabbish (2004) Labeling images with a computer game, SIGCHI
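In the ESP Game, two players who cannot communicate are shown the same item and score when they type the same label. A minimal sketch of how that agreement rule might carry over to genes and diseases follows. It is an assumption about the general pattern, not the GenESP code: the function names, the normalization, and the points value are all hypothetical.

```python
def normalize(term):
    """Lowercase and collapse whitespace so trivial variants still match."""
    return " ".join(term.lower().split())


def agreed_terms(guesses_a, guesses_b):
    """Return the terms both players entered, in player A's order."""
    seen_b = {normalize(g) for g in guesses_b}
    agreed = []
    for guess in guesses_a:
        key = normalize(guess)
        if key in seen_b and key not in agreed:
            agreed.append(key)
    return agreed


def score_round(gene, guesses_a, guesses_b, points_per_match=10):
    """Award points for agreement between players, not for matching a known list."""
    matches = agreed_terms(guesses_a, guesses_b)
    return {"gene": gene, "matches": matches,
            "points": points_per_match * len(matches)}


# Free-text guesses from two players for the same gene (invented input).
result = score_round(
    "ABCB5",
    ["Acute myeloid leukemia", "disease x"],
    ["acute  myeloid leukemia"],
)
print(result)
```

Because points depend only on agreement between players, a previously unknown association can still score, and players are not limited to a fixed list of choices.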

Page 21: Gene annotation games

NO DATA YET, PLAY NOW!

http://genegames.org

Page 22: Gene annotation games

A RE-USABLE PATTERN

Gene Disease

Gene Function

Gene Gene

Gene Gene relationship

Page 23: Gene annotation games

cancer normal

find patterns

make predictions on new samples

GENOMIC PREDICTORS

cancer

normal

Page 24: Gene annotation games

THE TRICK IS TO FIND THE RIGHT COMBINATION

Out of the 25,000+ genes, which work together the best?

Purely computational approaches have trouble generalizing

They always find patterns; it is hard to know when they are real
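To get a feel for why picking the right combination is hard, it helps to count the candidates. A small sketch, using 25,000 as a round stand-in for the slide's "25,000+" genes (the combination sizes are arbitrary choices for illustration):

```python
import math

N_GENES = 25_000  # round stand-in for "25,000+ genes"

# Number of distinct gene sets of each size that a search could consider.
for k in (2, 3, 5):
    n_sets = math.comb(N_GENES, k)
    print(f"{k}-gene combinations: {n_sets:.3e}")
```

With so many candidate sets and comparatively few samples, some combination will separate the training data by chance alone, which is one reason found patterns are hard to trust.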

Page 25: Gene annotation games

NETWORK GUIDED FOREST (NGF)

Dutkowski & Ideker (2011) Protein Networks as Logic Functions in Development and Cancer. PLoS Computational Biology

Use network to find good gene combinations

Page 26: Gene annotation games

THE TRICK IS TO FIND THE RIGHT COMBINATION

Page 27: Gene annotation games

‘COMBO’: FIND THE RIGHT COMBINATION OF GENES TO BUILD A PHENOTYPE PREDICTOR

Page 28: Gene annotation games

HUMAN GUIDED FOREST (HGF)

http://i9606.blogspot.com/2012/04/human-guided-forests-hgf.html

Let COMBO players build decision modules
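The idea of letting players supply the gene combinations can be sketched as an ensemble in which each player-chosen gene set trains one small classifier and the ensemble takes a majority vote. This is a toy illustration under loud assumptions: a nearest-centroid rule stands in for a decision tree, and the gene names, expression values, and function names are all invented.

```python
from collections import Counter
from statistics import mean


def train_module(samples, labels, genes):
    """Train one module: a nearest-centroid rule restricted to `genes`.

    A stand-in for a decision tree, used only to keep the sketch short.
    """
    centroids = {}
    for label in set(labels):
        rows = [s for s, y in zip(samples, labels) if y == label]
        centroids[label] = {g: mean(r[g] for r in rows) for g in genes}

    def predict(sample):
        def distance(label):
            return sum((sample[g] - centroids[label][g]) ** 2 for g in genes)
        return min(sorted(centroids), key=distance)

    return predict


def train_forest(samples, labels, player_gene_sets):
    """One module per gene combination proposed by a player."""
    return [train_module(samples, labels, genes) for genes in player_gene_sets]


def predict_forest(modules, sample):
    """Majority vote across all modules."""
    votes = Counter(module(sample) for module in modules)
    return votes.most_common(1)[0][0]


# Invented expression values for three hypothetical genes.
samples = [
    {"g1": 5.0, "g2": 1.0, "g3": 0.2},
    {"g1": 4.5, "g2": 1.2, "g3": 0.9},
    {"g1": 1.0, "g2": 3.0, "g3": 0.5},
    {"g1": 0.8, "g2": 3.5, "g3": 0.4},
]
labels = ["cancer", "cancer", "normal", "normal"]
player_gene_sets = [["g1", "g2"], ["g1"], ["g2", "g3"]]  # chosen by players

forest = train_forest(samples, labels, player_gene_sets)
prediction = predict_forest(forest, {"g1": 4.8, "g2": 0.9, "g3": 0.5})
print(prediction)
```

The only change from a conventional random forest in this sketch is where the feature subsets come from: people propose them instead of a random number generator.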

Page 29: Gene annotation games

150 billion hours...

McGonigal J. Reality is broken: why games make us better and how they can change the world. New York: Penguin Press; 2011.

[Bar chart on an axis from 0 to 160 billion hours, with bars labeled 7M, 9B, and 150B]

Page 30: Gene annotation games

THE END

More information at: http://genegames.org

http://sulab.org/

[email protected]

our poster!

Thanks to:

Andrew Su, Salvatore Loguercio

Page 31: Gene annotation games

GO IS NOT KEEPING UP

[Chart: articles indexed in PubMed vs. GO annotations created, per year, 2001-2011; y-axis from 0 to 1,200,000]

112 publications/hour (37 more by the end of this talk)

>21 million total

Page 32: Gene annotation games

Annotate all the images on the web

dog drinking sprinkler firehose

ANOTHER MAJOR ANNOTATION PROBLEM

Page 33: Gene annotation games

A SUCCESSFUL MODEL

Page 34: Gene annotation games

ESP GAME RESULTS

first 3 months (2003)

• 13,630 players added 1,271,451 labels to 293,760 images

• became http://images.google.com/imagelabeler/

Since scaled up to 100,000+ players and tens of millions of labeled images.

Ahn and Dabbish (2004) Labeling images with a computer game, SIGCHI