Wikipedia as an engine for scientific communication and collaboration at massive scale

Andrew Su, Ph.D., @andrewsu, [email protected], http://sulab.org. ScienceWriters2012, October 27, 2012.

description

Talk given at Science Writers 2012 in Raleigh, NC on October 27, 2012.

Transcript of Wikipedia as an engine for scientific communication and collaboration at massive scale

Page 1: Wikipedia as an engine for scientific communication and collaboration at massive scale

Wikipedia as an engine for scientific communication and collaboration at massive scale

Andrew Su, Ph.D.

@andrewsu

[email protected]

http://sulab.org

ScienceWriters2012

October 27, 2012

Page 2: Wikipedia as an engine for scientific communication and collaboration at massive scale

The biomedical literature is growing rapidly

[Figure: number of PubMed-indexed articles by year, 1979 to 2009; vertical axis from 0 to 1,000,000]

Page 3: Wikipedia as an engine for scientific communication and collaboration at massive scale

The biomedical literature is growing rapidly

[Figure: average capacity of a human scientist (number of articles read by a typical scientist); axis ticks 0, 10, 20]

Page 4: Wikipedia as an engine for scientific communication and collaboration at massive scale

High-throughput molecular profiling is powerful

~20,000 genes

100+ candidates

10+ experiments

Testable hypothesis

Page 5: Wikipedia as an engine for scientific communication and collaboration at massive scale

Filtering, extracting, and summarizing PubMed

Documents

Concepts

Review article

Page 6: Wikipedia as an engine for scientific communication and collaboration at massive scale

Filtering, extracting, and summarizing PubMed

Documents

Concepts

Page 7: Wikipedia as an engine for scientific communication and collaboration at massive scale

10k gene “stubs” within Wikipedia ≈ “Gene Wiki”

Protein structure

Symbols and identifiers

Tissue expression pattern

Gene Ontology annotations

Links to structured databases

Gene summary

Protein interactions

Linked references

Huss, PLoS Biol, 2008

Page 8: Wikipedia as an engine for scientific communication and collaboration at massive scale

Gene Wiki has a critical mass of readers

Rank 1-10 (laypeople): Insulin, Titin, Human chorionic gonadotropin, Vasopressin, ANKH, CLOCK, Catalase, Erythropoietin, Glucagon, Parathyroid hormone

Rank 101-110 (scientists): Tau protein, Interleukin 10, APC, C-Met, Factor V, Interleukin 8, CD44, Histamine H1 receptor, Kappa Opioid receptor, Dihydrofolate reductase

Rank 1001-1010 (specialists): CSDA, CNTNAP2, IGSF8, Adenosine A3 receptor, RYR1, ETV6, Small heterodimer partner, 5-HT1D receptor, TRPC6, Interleukin-6 receptor

Total: 4.0 million views / month

Huss, PLoS Biol, 2008; Huss, NAR, 2010; Good, NAR, 2011

Page 9: Wikipedia as an engine for scientific communication and collaboration at massive scale

Gene Wiki has a critical mass of readers

Huss, PLoS Biol, 2008; Huss, NAR, 2010; Good, NAR, 2011

Page 10: Wikipedia as an engine for scientific communication and collaboration at massive scale

Gene Wiki has a critical mass of editors

Increase of ~10,000 words / month from >1,000 edits

Currently 1.42 million words

Approximately equal to 230 full-length articles

[Figure: editor count (editors) and edit count (edits)]

Huss, NAR, 2010; Good, NAR, 2011
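
The word counts on this slide imply an average length for a "full-length article" and a monthly growth rate in article equivalents. A minimal back-of-envelope sketch in Python, assuming the slide's rounded figures can be treated as exact; the variable names are illustrative and do not come from the talk:

```python
# Figures quoted on the slide (rounded; treated as exact for illustration only).
total_words = 1_420_000          # "Currently 1.42 million words"
equivalent_articles = 230        # "Approximately equal to 230 full-length articles"
words_added_per_month = 10_000   # "Increase of ~10,000 words / month"

# Implied length of one "full-length article".
words_per_article = total_words / equivalent_articles

# Implied monthly growth, expressed in full-length-article equivalents.
articles_per_month = words_added_per_month / words_per_article

print(f"{words_per_article:.0f} words per article")
print(f"{articles_per_month:.2f} article-equivalents added per month")
```

Under these assumptions a full-length article works out to roughly 6,200 words, and the Gene Wiki grows by about 1.6 such articles per month.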

Page 11: Wikipedia as an engine for scientific communication and collaboration at massive scale

A review article for every gene is powerful

References to the literature

Hyperlinks to related concepts

Reelin: 98 editors, 703 edits since July 2002

Heparin: 358 editors, 654 edits since June 2003

AMPK: 109 editors, 203 edits since March 2004

RNAi: 394 editors, 994 edits since October 2002

Page 12: Wikipedia as an engine for scientific communication and collaboration at massive scale

The Gene Wiki is timely and current

Manny Ramirez suspended for doping

Catalase linked to premature gray hair

Also, MGAT2 (obesity), ALDH2 (heart attack), SOX21 (hair loss), SATB1 (breast cancer), TSLP (asthma), CCR5 (HIV), …

Huss, NAR, 2010

Page 13: Wikipedia as an engine for scientific communication and collaboration at massive scale

The Gene Wiki is (reasonably) reliable

[Figure: cumulative edits by date, good edits vs. vandalism]

                        Good edits    Vandalism
Per edit probability    98.9%         1.1%
Average lifetime        115.4 d       3.4 d
Probability by time     99.968%       0.032% (0.63% for WP overall)

Good, NAR, 2011
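
The "probability by time" figures on this slide look consistent with weighting each edit type's per-edit probability by its average lifetime. That reading is an assumption, not something the slide states, and the function name below is hypothetical. A minimal Python sketch of the calculation:

```python
def time_weighted_shares(p_good, life_good_days, p_vandal, life_vandal_days):
    """Return (good_share, vandal_share) of the total time edits stay live.

    Assumption: each edit type contributes (per-edit probability) x
    (average lifetime in days) to the total; this is an interpretation
    of the slide, not a method confirmed by the source.
    """
    good_time = p_good * life_good_days
    vandal_time = p_vandal * life_vandal_days
    total = good_time + vandal_time
    return good_time / total, vandal_time / total


# Inputs are the rounded values shown on the slide.
good_share, vandal_share = time_weighted_shares(0.989, 115.4, 0.011, 3.4)
print(f"good: {good_share:.3%}, vandalism: {vandal_share:.3%}")
```

With the slide's rounded inputs this gives a vandalism share of about 0.033%, close to the reported 0.032%; the small gap is plausibly due to rounding in the published inputs.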

Page 14: Wikipedia as an engine for scientific communication and collaboration at massive scale

Making the Gene Wiki more reliable

The company name is derived from old Greek, and means "destroyer of birds".

Novartis is a multinational pharmaceutical company based in Basel, Switzerland that manufactures drugs such as clozapine (Clozaril), diclofenac (Voltaren), …

http://www.wikitrust.net/

Good, NAR, 2011

Page 15: Wikipedia as an engine for scientific communication and collaboration at massive scale

Making the Gene Wiki more reliable

http://www.wikitrust.net/

The company name is derived from old Greek, and means "destroyer of birds".

Novartis is a multinational pharmaceutical company based in Basel, Switzerland that manufactures drugs such as clozapine (Clozaril), diclofenac (Voltaren), …

High-trust author: 36211 total edits

Low-trust author: 36 total edits

Good, NAR, 2011

Page 16: Wikipedia as an engine for scientific communication and collaboration at massive scale

Partnering with traditional scientific publishing

Page 17: Wikipedia as an engine for scientific communication and collaboration at massive scale

Partnering with traditional scientific publishing

Page 18: Wikipedia as an engine for scientific communication and collaboration at massive scale

Partnering with traditional scientific publishing

Page 19: Wikipedia as an engine for scientific communication and collaboration at massive scale

Collaborators

Doug Howe, ZFIN
John Hogenesch, U Penn
Jon Huss, GNF
Luca de Alfaro, UCSC
Angel Pizzaro, U Penn
Faramarz Valafar, SDSU
Pierre Lindenbaum, Fondation Jean Dausset
Michael Martone, Rush
Konrad Koehler, Karo Bio
Warren Kibbe, Simon Lim, Northwestern
Many Wikipedia editors
WP:MCB Project

Group members

Ben Good
Salvatore Loguercio
Ian Macleod
Max Nanis
Chunlei Wu

Funding and Support

(BioGPS: GM83924, Gene Wiki: GM089820)

Contact

http://sulab.org
[email protected]
@andrewsu
+Andrew Su
http://slideshare.com/andrewsu