ISB2012: The Gene Wiki: Crowdsourcing human gene annotation

The Gene Wiki: Crowdsourcing human gene annotation Andrew Su, Ph.D. Department of Molecular and Experimental Medicine The Scripps Research Institute Biocuration 2012 April 2, 2012

Transcript of ISB2012: The Gene Wiki: Crowdsourcing human gene annotation

Page 1: ISB2012: The Gene Wiki: Crowdsourcing human gene annotation

The Gene Wiki: Crowdsourcing human gene annotation

Andrew Su, Ph.D.

Department of Molecular and Experimental Medicine

The Scripps Research Institute

Biocuration 2012

April 2, 2012

Page 2: ISB2012: The Gene Wiki: Crowdsourcing human gene annotation

The Long Tail is a prolific source of content

[Chart: content produced vs. contributors (sorted), divided into a Short Head and a Long Tail]

Category: Short Head / Long Tail

News: Newspapers / Blogs

Video: TV/Hollywood / YouTube

Product reviews: Consumer reports / Amazon reviews

Food reviews: Food critics / Yelp

Talent judging: Olympics / American Idol

Gene annotation: Manual curation / Gene Wiki

Page 3: ISB2012: The Gene Wiki: Crowdsourcing human gene annotation

We can harness the Long Tail of scientists to directly participate in the gene annotation process.

Page 4: ISB2012: The Gene Wiki: Crowdsourcing human gene annotation

Wikipedia is reasonably accurate

Page 5: ISB2012: The Gene Wiki: Crowdsourcing human gene annotation

Wikipedia has breadth and depth

http://en.wikipedia.org/wiki/Wikipedia:Size_comparisons, July 2008

[Chart: articles and words (millions), Wikipedia vs. Britannica Online]

Page 6: ISB2012: The Gene Wiki: Crowdsourcing human gene annotation

Filtering, extracting, and summarizing PubMed

Documents

Concepts

Page 7: ISB2012: The Gene Wiki: Crowdsourcing human gene annotation

Wiki success depends on a positive feedback loop

[Diagram: feedback loop linking Gene Wiki page utility, number of users, and number of contributors]

Page 8: ISB2012: The Gene Wiki: Crowdsourcing human gene annotation

10,000 gene “stubs” within Wikipedia

Protein structure

Symbols and identifiers

Tissue expression pattern

Gene Ontology annotations

Links to structured databases

Gene summary

Protein interactions

Linked references

Huss, PLoS Biol, 2008

Page 9: ISB2012: The Gene Wiki: Crowdsourcing human gene annotation

Gene Wiki has a critical mass of readers

Total: ~4.3 million views / month

Huss, PLoS Biol, 2008; Good, NAR, 2011

Page 10: ISB2012: The Gene Wiki: Crowdsourcing human gene annotation

Gene Wiki has a critical mass of editors

Good, NAR, 2011

[Chart: cumulative edits over time, split into productive edits and vandalism]

~10,000 words added / month

4.3 million views / month

1000 edits / month

Total 1.42 million words ≈ 230 full-length articles

Page 11: ISB2012: The Gene Wiki: Crowdsourcing human gene annotation

A review article for every gene is powerful

Hyperlinks to related concepts

References to the literature

Reelin: 68 editors, 543 edits since July 2002

Heparin: 175 editors, 320 edits since June 2003

AMPK: 44 editors, 84 edits since March 2004

RNAi: 232 editors, 708 edits since October 2002

Page 12: ISB2012: The Gene Wiki: Crowdsourcing human gene annotation

Making the Gene Wiki more computable

Structured annotations

Free text

Page 13: ISB2012: The Gene Wiki: Crowdsourcing human gene annotation

Filling the gaps in gene annotation

Wikilink

GO exact synonym

Gene Wiki mapping

NCBI Entrez Gene: 3362

GO:0004993

Candidate assertion

Page 14: ISB2012: The Gene Wiki: Crowdsourcing human gene annotation

Filling the gaps in gene annotation

Wikilink

GO exact match

Gene Wiki mapping

NCBI Entrez Gene: 334

GO:0006897

Candidate assertion
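
The mapping on the two slides above (wikilink, GO name or exact synonym, Gene Wiki page to gene) can be sketched in a few lines. This is only an illustration, not the Gene Wiki pipeline itself: the page title, the lookup tables, and the function name are hypothetical, and only the identifiers (Entrez Gene 334, GO:0006897) come from the slide.

```python
# Hypothetical lookup tables; a real run would load these from the
# Gene Wiki page-to-gene mapping and from GO term names / exact synonyms.
PAGE_TO_ENTREZ = {"Example gene page": 334}      # Gene Wiki mapping (page -> Entrez Gene ID)
GO_EXACT_NAMES = {"endocytosis": "GO:0006897"}   # GO name or exact synonym -> GO ID


def candidate_assertions(page_title, wikilinks):
    """Return (Entrez Gene ID, GO ID) pairs for wikilinks that match a GO term."""
    gene_id = PAGE_TO_ENTREZ.get(page_title)
    if gene_id is None:
        return []
    pairs = []
    for link in wikilinks:
        # Case-insensitive exact match against GO names and exact synonyms.
        go_id = GO_EXACT_NAMES.get(link.strip().lower())
        if go_id is not None:
            pairs.append((gene_id, go_id))
    return pairs


print(candidate_assertions("Example gene page", ["Endocytosis", "Protein"]))
```

Each returned pair is only a candidate assertion; as the later slides show, it still needs comparison against existing annotations and expert review.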

Page 15: ISB2012: The Gene Wiki: Crowdsourcing human gene annotation

Disease associations mined from the Gene Wiki

2147 candidate annotations

Gene Wiki Articles (10,271)

Filter out seeded text

NCBO Annotator

Compare to DO database

Matched Disease Ontology terms (2983)

70% have no match

2% match child

23% exact match

5% match parent

Good, BMC Genomics 2011, 12:603

Page 16: ISB2012: The Gene Wiki: Crowdsourcing human gene annotation

Disease associations mined from the Gene Wiki

Expert curation

Correct: 86%

Maybe: 4%

Incorrect: 10%

Overall specificity: 90-93%

Good, BMC Genomics 2011, 12:603

Page 17: ISB2012: The Gene Wiki: Crowdsourcing human gene annotation

GO associations mined from the Gene Wiki

6319 candidate annotations

Gene Wiki Articles (10,271)

Filter out seeded text

NCBO Annotator

Compare to GO database

Matched Gene Ontology terms (11,022)

55% have no match

2% match child

17% exact match

26% match parent

Good, BMC Genomics 2011, 12:603
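
The four outcome categories on the slide above (exact match, match parent, match child, no match) can be illustrated with a toy ontology. This is a sketch under stated assumptions: the term IDs and the hierarchy are made up, and "match parent" is read here as "the mined term is an ancestor of a term the gene is already annotated with", which the slide does not spell out.

```python
# Toy is_a hierarchy (child -> parent); these term IDs are invented for illustration.
PARENT = {"T:3": "T:2", "T:2": "T:1"}


def ancestors(term):
    """Return every term reachable by following is_a links upward."""
    found = set()
    while term in PARENT:
        term = PARENT[term]
        found.add(term)
    return found


def classify(candidate, known_terms):
    """Compare one mined term with a gene's existing annotations."""
    if candidate in known_terms:
        return "exact match"
    if any(candidate in ancestors(k) for k in known_terms):
        return "match parent"   # mined term is more general than a known term
    if any(k in ancestors(candidate) for k in known_terms):
        return "match child"    # mined term is more specific than a known term
    return "no match"


print(classify("T:1", {"T:3"}))
```

Under this reading, the "no match" and "match child" outcomes are the ones that carry new information, which agrees with the candidate counts on the slides (55% + 2% of 11,022 is about 6319).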

Page 18: ISB2012: The Gene Wiki: Crowdsourcing human gene annotation

GO associations mined from the Gene Wiki

Expert curation

[Pie chart: Correct, Maybe, Incorrect; segment values 60%, 26%, 14%]

Overall specificity: 48-64%

Good, BMC Genomics 2011, 12:603

Page 19: ISB2012: The Gene Wiki: Crowdsourcing human gene annotation

Common sources of error in GO associations

OR2F1: “Olfactory receptors … are responsible for the recognition and G protein-mediated transduction of odorant signals.”

1) Incorrect concept recognition

Transduction (GO:0009293)

The transfer of genetic information to a bacterium from a bacteriophage or between bacterial or yeast cells mediated by a phage vector.

Signal transduction (GO:0007165)

The cellular process in which a signal is conveyed to trigger a change in the activity or state of a cell. Signal transduction begins with reception of a signal, e.g. a ligand binding to a receptor or receptor activation by a stimulus such as light, and ends with regulation of a downstream cellular process…

Good, BMC Genomics 2011, 12:603

Page 20: ISB2012: The Gene Wiki: Crowdsourcing human gene annotation

Common sources of error in GO associations

MEF2C: “Several post translational modifications have been identified including phosphorylation on serine-59 …”

2) Incorrect sentence context

Dephosphorylation, Excretion, Gene expression, Glycosylation, Localization, Methylation, Proteolysis, Secretion, Transport, Transcription, Translation

MEF2C

Myelination

Phosphorylation

Neurogenesis

Good, BMC Genomics 2011, 12:603

Page 21: ISB2012: The Gene Wiki: Crowdsourcing human gene annotation

Novel GO annotations – so what?

11,022 annotations mined from Gene Wiki

4703 (43%) match known annotations

~100,000 annotations from GO consortium

6319 “novel” annotations @ 48-64% specificity
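
The counts on this slide can be checked with simple arithmetic. The sketch below uses only the numbers from the slide; the last two lines are an extrapolation that assumes the slide's "specificity" means the fraction of novel candidates judged correct.

```python
mined = 11022   # annotations mined from the Gene Wiki (from the slide)
known = 4703    # of those, matching known annotations (from the slide)

novel = mined - known                    # candidates not already in GO
known_pct = round(100 * known / mined)   # share that matches known annotations

# Assumption: "specificity" is the fraction of novel candidates that are correct.
low_estimate = round(novel * 0.48)
high_estimate = round(novel * 0.64)

print(novel, known_pct)             # 6319 43
print(low_estimate, high_estimate)  # 3033 4044
```

Read this way, roughly 3,000 to 4,000 of the novel candidates would be expected to hold up, set against about 100,000 annotations from the GO consortium.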

Page 22: ISB2012: The Gene Wiki: Crowdsourcing human gene annotation

Gene Wiki content improves enrichment analysis

GO term

Gene list

Concept recognition

PubMed abstracts

Enrichment analysis

axon guidance (GO:0007411)

264 genes

Linked genes through PubMed

P = 1.55 E-20

811 articles

                           In GO term: Yes    In GO term: No
Linked through PubMed: Yes        13                 2
Linked through PubMed: No        251             12033
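
The P-value above can be approximated from the 2x2 counts. The slide does not name the test, so the sketch below assumes a one-sided Fisher's exact test, computed as a hypergeometric tail with the standard library; the function and variable names are illustrative.

```python
from math import comb


def hypergeom_tail(k, K, n, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the GO term."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / total


# Counts from the slide's 2x2 table.
both, linked_only, go_only, neither = 13, 2, 251, 12033
N = both + linked_only + go_only + neither   # all genes considered
K = both + go_only                           # genes annotated with the GO term (264)
n = both + linked_only                       # genes linked through PubMed (15)

p = hypergeom_tail(both, K, n, N)
print(f"{p:.2e}")
```

Under that assumption the result is on the order of 1.5E-20, in line with the value on the slide. Thirteen of the fifteen PubMed-linked genes carry the axon guidance term, against a background rate of about 2%.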

Page 23: ISB2012: The Gene Wiki: Crowdsourcing human gene annotation

Gene Wiki content improves enrichment analysis

GO term

Gene list

Concept recognition

PubMed abstracts + Gene Wiki

Enrichment analysis

muscle contraction (GO:0006936)

87 genes

Linked genes through PubMed: P = 1.0

Linked genes through PubMed + Gene Wiki: P = 1.22 E-09

251 articles

87 articles

Page 24: ISB2012: The Gene Wiki: Crowdsourcing human gene annotation

Gene Wiki content improves enrichment analysis

[Scatter plot: p-value (PubMed only) vs. p-value (PubMed + GW); points fall into a "more significant with PubMed + GW" region and a "more significant with PubMed only" region; muscle contraction is labeled]

Page 25: ISB2012: The Gene Wiki: Crowdsourcing human gene annotation

Challenges and future directions

• How to complement and integrate with traditional biocuration workflows?

• How to disseminate and utilize crowdsourced annotations?

Page 26: ISB2012: The Gene Wiki: Crowdsourcing human gene annotation

The Long Tail of scientists is a valuable source of information on gene function

Page 27: ISB2012: The Gene Wiki: Crowdsourcing human gene annotation

Collaborators

Doug Howe, ZFIN

John Hogenesch, U Penn

Jon Huss, GNF

Luca de Alfaro, UCSC

Angel Pizzaro, U Penn

Faramarz Valafar, SDSU

Pierre Lindenbaum, Fondation Jean Dausset

Michael Martone, Rush

Konrad Koehler, Karo Bio

Warren Kibbe, Simon Lim, Northwestern

Many Wikipedia editors

WP:MCB Project

Group members

Erik Clarke

Ben Good (*)

Salvatore Loguercio

Ian Macleod

Chunlei Wu

Funding and Support

(BioGPS: GM83924, Gene Wiki: GM089820)

Contact

http://sulab.org

[email protected]

@andrewsu

+Andrew Su

See poster # 30 for more on the Gene Wiki and crowdsourcing in biology!

Page 28: ISB2012: The Gene Wiki: Crowdsourcing human gene annotation

Making the Gene Wiki more reliable

The company name is derived from old Greek, and means "destroyer of birds".

Novartis is a multinational pharmaceutical company based in Basel, Switzerland that manufactures drugs such as clozapine (Clozaril), diclofenac (Voltaren), …

Page 29: ISB2012: The Gene Wiki: Crowdsourcing human gene annotation

Making the Gene Wiki more reliable

http://www.wikitrust.net/

The company name is derived from old Greek, and means "destroyer of birds".

Novartis is a multinational pharmaceutical company based in Basel, Switzerland that manufactures drugs such as clozapine (Clozaril), diclofenac (Voltaren), …

High-trust author: 36211 total edits

Low-trust author: 36 total edits