Literature mining: what is it, and should I care?

Post on 01-Nov-2014

EMBL Lab Day, European Molecular Biology Laboratory, Heidelberg, Germany, June 10, 2008

Literature mining

Explosion

exponential increase

some things never change

“graph calculus”

=

~50 seconds per paper

Information retrieval

find the relevant papers

ad hoc retrieval

user-specified query

“yeast AND cell cycle”

stemming

yeast / yeasts

dynamic query expansion

yeast / S. cerevisiae

ranking

Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation.

no tool will find it
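
The retrieval steps above (user-specified query, stemming, query expansion, ranking) can be sketched roughly as follows. This is a minimal illustration, not the method of any particular search engine: the plural-stripping stemmer, the synonym table, the match-count ranking and toy documents "a", "b" and "d" are all assumptions made for the example. Document "c" is the example sentence from the slides.

```python
import re

# Hypothetical synonym table used for query expansion (illustration only).
SYNONYMS = {"yeast": ["yeast", "S. cerevisiae", "Saccharomyces cerevisiae"]}

def stem(token):
    """Crude stemmer: strip a plural 's' (a stand-in for a real stemmer)."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token

def tokens(text):
    """Lowercase, split into alphanumeric tokens, and stem each token."""
    return [stem(t) for t in re.findall(r"[a-z0-9]+", text.lower())]

def count_phrase(phrase, doc_tokens):
    """Count occurrences of a (possibly multi-word) phrase in a token list."""
    p = tokens(phrase)
    n = len(p)
    return sum(doc_tokens[i:i + n] == p for i in range(len(doc_tokens) - n + 1))

def search(query_terms, docs):
    """Boolean AND over expanded terms, ranked by total number of matches."""
    ranked = []
    for doc_id, text in docs.items():
        doc_tokens = tokens(text)
        per_term = [
            sum(count_phrase(v, doc_tokens) for v in SYNONYMS.get(term, [term]))
            for term in query_terms
        ]
        if all(per_term):  # every query term must match at least once
            ranked.append((doc_id, sum(per_term)))
    return sorted(ranked, key=lambda hit: -hit[1])

DOCS = {
    # "a", "b" and "d" are invented toy texts; "c" is the sentence from the slides.
    "a": "Yeasts are a model for studying the cell cycle.",
    "b": "Cell cycle arrest in S. cerevisiae: the yeast cell cycle checkpoint.",
    "c": "Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated "
         "Swe1 and this modification served as a priming step to promote subsequent "
         "Cdc5-dependent Swe1 hyperphosphorylation and degradation.",
    "d": "Yeast two-hybrid screening.",
}

print(search(["yeast", "cell cycle"], DOCS))  # [('b', 4), ('a', 2)]
```

Document "c" is about the yeast cell cycle but contains neither query term nor any listed synonym, so it is not returned, which is the point of "no tool will find it".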

Entity recognition

identify the substance(s)

Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation.

good synonyms list

orthographic variation

CDC28

Cdc28p

disambiguation

Cdc2

APC

still too much to read
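
A dictionary-based tagger that tolerates orthographic variation might look something like the sketch below. The dictionary contents, the variation rules (case-insensitive matching, an optional hyphen or space before the digits, an optional trailing "p") and the function names are assumptions chosen to cover the CDC28 / Cdc28p example; a real synonyms list would be far larger.

```python
import re

# Toy dictionary (assumption): canonical identifier -> names to look for.
DICTIONARY = {
    "CDC28": ["Cdc28"],
    "CLB2": ["Clb2"],
    "SWE1": ["Swe1"],
    "CDC5": ["Cdc5"],
}

def name_pattern(name):
    """Build a regex matching simple orthographic variants of a gene name."""
    m = re.fullmatch(r"([A-Za-z]+)(\d+)", name)
    if m:
        letters, digits = m.groups()
        # Allow "cdc28", "CDC-28", "Cdc 28" and the protein form "Cdc28p".
        body = re.escape(letters) + r"[-\s]?" + digits + r"p?"
    else:
        body = re.escape(name)
    return re.compile(r"\b" + body + r"\b", re.IGNORECASE)

def tag_entities(text, dictionary=DICTIONARY):
    """Return (start offset, matched text, canonical identifier), in text order."""
    hits = set()
    for canonical, names in dictionary.items():
        for name in names:
            for m in name_pattern(name).finditer(text):
                hits.add((m.start(), m.group(0), canonical))
    return sorted(hits)

SENTENCE = (
    "Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated "
    "Swe1 and this modification served as a priming step to promote subsequent "
    "Cdc5-dependent Swe1 hyperphosphorylation and degradation."
)

print([canonical for _, _, canonical in tag_entities(SENTENCE)])
# ['CLB2', 'CDC28', 'SWE1', 'CDC5', 'SWE1']
```

String matching of this kind does nothing for disambiguation: names such as Cdc2 or APC need context to decide which entity is meant.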

Information extraction

formalize the facts

co-mentioning

NLP (Natural Language Processing)

Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation.
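
Co-mentioning, the simpler of the two extraction approaches named above, can be sketched as counting how often two names occur in the same text. The name list, the function names and the second and third toy documents are assumptions for illustration; the first document is the example sentence from the slides.

```python
import re
from collections import Counter
from itertools import combinations

NAMES = ["Clb2", "Cdc28", "Swe1", "Cdc5"]  # toy name list (assumption)

def mentioned(text):
    """Return the set of known names that occur in the text."""
    return {
        name for name in NAMES
        if re.search(r"\b" + re.escape(name) + r"\b", text, re.IGNORECASE)
    }

def comention_counts(documents):
    """Count, for each pair of names, the documents mentioning both."""
    counts = Counter()
    for text in documents:
        for pair in combinations(sorted(mentioned(text)), 2):
            counts[pair] += 1
    return counts

DOCUMENTS = [
    "Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated "
    "Swe1 and this modification served as a priming step to promote subsequent "
    "Cdc5-dependent Swe1 hyperphosphorylation and degradation.",
    "Swe1 and Cdc5 appear together here.",  # invented placeholder text
    "Clb2 alone.",                          # invented placeholder text
]

counts = comention_counts(DOCUMENTS)
print(counts[("Cdc5", "Swe1")])   # 2
print(counts[("Cdc28", "Swe1")])  # 1
```

Counts like these say that two proteins are associated, but not how: that Cdc28 phosphorylates Swe1, rather than the reverse, is the kind of fact the NLP approach tries to extract from the sentence structure.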

database

integration

STRING & STITCH

Acknowledgments

STRING & STITCH

– Christian von Mering

– Michael Kuhn

– Manuel Stark

– Samuel Chaffron

– Philippe Julien

– Tobias Doerks

– Jan Korbel

– Berend Snel

– Martijn Huynen

– Peer Bork

The movie “Brazil”

Reflect

– Evangelos Pafilis

– Michael Kuhn

– Heiko Horn

– Peer Bork

– Sean O’Donoghue

– Reinhard Schneider

NLP pipeline

– Jasmin Saric

– Rossitza Ouzounova

– Isabel Rojas

– Peer Bork