Ismb2009


Page 1: Ismb2009

Some Experiments in Semantic Enrichment

From Proteins to Hypotheses:

Anita de Waard

Principal Researcher, Disruptive Technologies

Elsevier Labs, Amsterdam

Utrecht Institute of Linguistics, University Utrecht

The Netherlands

[email protected]

Page 5: Ismb2009

Some Experiments in Semantic Enrichment

Entities and Relationships:

- FEBS Structured Digital Abstracts

- OKKAM Entity Repository

Hypotheses and Evidence:

- Discourse Analysis of Biology Articles

- Hypotheses, Evidence and Relationships

Collaborations:

- Elsevier Grand Challenge

- Future of Research Communications

Page 6: Ismb2009

Entities and Relationships

Page 7: Ismb2009

FEBS Letters Structured Digital Abstracts

- With the FEBS Letters Editorial Office in Heidelberg and the MINT database in Rome

- SDA [Gerstein et al.]: ‘machine-readable XML summary of pertinent facts’

- For each PPI paper: proteins, protein-protein interactions and methods, provided by the author in XLS

- Started April 9, 2008: in ScienceDirect, links to MINT and UniProt

- Now 117 articles with SDA: http://www.febsletters.org/content/sda

- Being used as Gold Standard for Biocreative Challenge II.5: http://www.biocreative.org/tasks/biocreative-ii5/biocreative-ii5-evaluation/

Page 8: Ismb2009

D-07-02772

Testis-specific poly(A) polymerase (TPAP) is a cytoplasmic poly(A) polymerase that is highly expressed in round spermatids. We identified germ cell-specific gene 1 (GSG1) as a TPAP interaction partner protein using yeast two-hybrid and coimmunoprecipitation assays. Subcellular fractionation analysis showed that GSG1 is exclusively localized in the endoplasmic reticulum (ER) of mouse testis where TPAP is also present. In NIH3T3 cells cotransfected with TPAP and GSG1, both proteins colocalize in the ER. Moreover, expression of GSG1 stimulates TPAP targeting to the ER, suggesting that interactions between the two proteins lead to the redistribution of TPAP from the cytosol to the ER.

MINT-6168263: Gsg1 (uniprotkb:Q8R1W2), TPAP (uniprotkb:Q9WVP6) and Calmegin (uniprotkb:P52194) colocalize (MI:0403) by cosedimentation (MI:0027)

MINT-6168204, MINT-6168178: Gsg1 (uniprotkb:Q8R1W2) and TPAP (uniprotkb:Q9WVP6) colocalize (MI:0403) by fluorescence microscopy (MI:0416)

MINT-6167930: Gsg1 (uniprotkb:Q8R1W2) physically interacts (MI:0218) with TPAP (uniprotkb:Q9WVP6) by two-hybrid (MI:0018)
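
Each SDA sentence above follows a fixed pattern: MINT identifiers, participants with UniProtKB accessions, and PSI-MI terms for the interaction type and the detection method. That makes it possible to turn a sentence back into a structured record. A minimal Python sketch, assuming exactly the sentence layout shown on this slide; the regular expressions, the `SdaStatement` class and the `parse_sda` function are illustrative names and not part of any FEBS or MINT tooling:

```python
import re
from dataclasses import dataclass

# One participant, e.g. "Gsg1 (uniprotkb:Q8R1W2)".
PROTEIN = re.compile(r"([\w-]+) \(uniprotkb:(\w+)\)")
# One PSI-MI term with its label, e.g. "colocalize (MI:0403)".
MI_TERM = re.compile(r"([A-Za-z][A-Za-z -]*?) \((MI:\d{4})\)")


@dataclass
class SdaStatement:
    mint_ids: list      # MINT interaction identifiers
    proteins: dict      # protein name -> UniProtKB accession
    interaction: tuple  # (label, MI code) of the interaction type
    method: tuple       # (label, MI code) of the detection method


def parse_sda(line: str) -> SdaStatement:
    """Split one SDA sentence into identifiers, participants and MI terms."""
    # Everything before the first colon is the list of MINT identifiers.
    header, body = line.split(":", 1)
    mint_ids = [part.strip() for part in header.split(",")]
    proteins = dict(PROTEIN.findall(body))
    # Assumption: the first MI term is the interaction, the last the method.
    terms = MI_TERM.findall(body)
    interaction = terms[0]
    method_label, method_code = terms[-1]
    if method_label.startswith("by "):
        method_label = method_label[3:]
    return SdaStatement(mint_ids, proteins, interaction, (method_label, method_code))


statement = parse_sda(
    "MINT-6167930:Gsg1 (uniprotkb:Q8R1W2) physically interacts (MI:0218) "
    "with TPAP (uniprotkb:Q9WVP6) by two-hybrid (MI:0018)"
)
print(statement)
```

Sentences that deviate from this layout (for example, more than two MI terms) would need extra handling that this sketch does not attempt.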

Page 13: Ismb2009

OKKAM Project

EU-funded FP7 project, 13 partners, 2008-2010

- Main goal: create entity identifier repository

- Create entity-centric architecture:

  - create new entities + synonyms where needed

  - offer interface/link where not needed

  - contain resolver, matcher, repository

- E.g. http://sig.ma - semantic mashup of entity properties
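
The three roles named on this slide (resolver, matcher, repository) and the rule "create new entities only where needed" can be illustrated with a toy in-memory store. This is a hedged sketch only: the class names, method names and identifier format are invented for illustration and do not reflect the actual OKKAM services or their API.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    entity_id: str
    label: str
    synonyms: set = field(default_factory=set)


class EntityRepository:
    """Hypothetical in-memory entity store with a matcher and a resolver."""

    def __init__(self):
        self._by_id = {}    # repository: identifier -> Entity
        self._by_name = {}  # matcher index: lower-cased name -> identifier

    def match(self, name):
        """Matcher: return the identifier already assigned to a name, or None."""
        return self._by_name.get(name.lower())

    def resolve(self, entity_id):
        """Resolver: return the entity behind an identifier."""
        return self._by_id[entity_id]

    def get_or_create(self, name, synonyms=()):
        """Reuse an existing identifier if any name matches; otherwise mint one."""
        names = (name, *synonyms)
        for candidate in names:
            found = self.match(candidate)
            if found is not None:
                entity = self._by_id[found]
                break
        else:
            # No known name matched, so a new entity is needed.
            entity = Entity(entity_id=f"ent-{len(self._by_id) + 1}", label=name)
            self._by_id[entity.entity_id] = entity
        # Register every supplied name as a synonym of the chosen entity.
        for alias in names:
            entity.synonyms.add(alias)
            self._by_name[alias.lower()] = entity.entity_id
        return entity.entity_id


repo = EntityRepository()
first = repo.get_or_create("TPAP", synonyms=["Testis-specific poly(A) polymerase"])
second = repo.get_or_create("testis-specific poly(A) polymerase")
print(first == second)  # the second call reuses the first identifier
```

Matching here is plain case-insensitive string equality; a real matcher would need fuzzier comparison and disambiguation.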

Page 14: Ismb2009

http://sig.ma/search?q=p53

Page 15: Ismb2009

OKKAM Entity-centric authoring tool

Current:

- Build MS Word plugin to find proteins in FEBS articles (i.e., semi-automating FEBS SDA!)

- Now: testing web server interface to Whatizit

- Next: link to Biocreative metaserver

- Add: links to other entities via OKKAM server

- Prototype now being tested with FEBS Office

Future:

- Create unique Author ID, straddling proprietary (Scopus) and other (ResearcherID, OpenID) systems

- Allow query and resolution of authors/papers within Word

Page 16: Ismb2009

OKKAM Entity Editor in MS Word

Page 20: Ismb2009

OKKAM plugin vs. UCSD group plugin

UCSD plugin:

- Works! Can be downloaded, open source etc.

- Offline: load ontologies locally

- Identify classes of entities (e.g., ‘inhibitor’)

- Works with Word 2007

- Stores annotations as XML metadata - maintain metadata in Word info pane

Okkam plugin:

- Prototype, starting to test with FEBS authors

- Online: call out to web services (soon...)

- Identify specific entities (e.g., a specific protein)

- Works with other versions of Word (on a PC!)

- Stores annotations as Word comments - can generate report of entities linked to identifiers

Work well together!

Page 22: Ismb2009

Hypotheses and Evidence

Page 23: Ismb2009

Issue with Biocreative challenge:

From one of the documents in the training set:

- In Xenopus oocyte maturation, cytoplasmic polyadenylation mediated by cytoplasmic polyadenylation element binding protein (CPEB) induces the translation of maternal mRNA [5].

- In mouse testis, another novel member of the CPEB protein family (CPEB2) and a homolog of xGLD-2 (mGLD-2) have been identified [7] and [8]

versus:

- TPAP was present in GSG1 immunoprecipitates (Fig. 2B). The in vivo data suggest that TPAP–GSG1 interactions occur in mammalian cells.

Issue 1: what is experimentally verified content?

Issue 2: what is new?

Page 31: Ismb2009

Discourse analysis of biological text

- How can we identify the line of argumentation in biology text?

- Linguistics says: parse into clauses (anything with a verb)

- Identify Discourse Segment Purpose

- Identify verb tense, type, markers for each segment type
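
The steps listed on this slide (segment the text, then use surface markers such as verb tense and cue phrases to assign a purpose to each segment) can be sketched as a small rule-based classifier. The segment labels follow those used elsewhere in this talk (Hypothesis, Implication, Data, Result, Fact); the cue patterns themselves, and the rough "past tense means Result, present tense means Fact" rule, are assumptions made for illustration and are not the rules used in the author's analysis. The input is assumed to be already split into clauses.

```python
import re

# Ordered rules: the first pattern that matches decides the label.
# All patterns are illustrative guesses, not a validated marker list.
RULES = [
    ("Hypothesis", re.compile(r"\b(possibility|whether|hypothes\w+)\b", re.I)),
    ("Implication", re.compile(
        r"\bthese (data|results) (show|suggest|indicate|point to)\b", re.I)),
    ("Data", re.compile(r"\((fig(ure)?s?\.?|table)\s*\w+", re.I)),
    ("Result", re.compile(r"\b(showed|was|were|observed|found)\b", re.I)),
    ("Fact", re.compile(r"\b(is|are|has been|have been)\b", re.I)),
]


def classify(segment):
    """Return the first segment label whose cue pattern matches, else 'Other'."""
    for label, pattern in RULES:
        if pattern.search(segment):
            return label
    return "Other"


segments = [
    "To investigate the possibility that miR-372 and miR-373 suppress "
    "the expression of LATS2, we",
    "control RASV12-arrested cells showed relatively high abundance of flat cells",
    "(Figures 2G and 2H).",
    "Altogether, these data show that",
]
for segment in segments:
    print(classify(segment), "|", segment)
```

Rule order matters: a clause such as a result followed by a figure reference would be labelled Data here, which shows why a hand-written cue list is only a starting point.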

Page 33: Ismb2009

3 Realms of Science:

Experimental realm

Data realm

Conceptual realm

(Figures)

(1) Oncogene-induced senescence is characterized by the appearance of cells with a flat morphology that express senescence associated (SA)-β-Galactosidase.

(2a) Indeed,

(2b) control RASV12-arrested cells showed relatively high abundance of flat cells expressing SA-β-Galactosidase

(2c) (Figures 2G and 2H).

(3a) Consistent with the cell growth assay,

(3b) very few cells showed senescent morphology when transduced with either miR-Vec-371&2, miR-Vec-373, or control p53kd.

(4a) Altogether, these data show that

(4b) transduction with either miR-Vec-371&2 or miR-Vec-373 prevents RASV12-induced growth arrest in primary human cells.

Page 43: Ismb2009

Hypotheses ‘erode’ into facts:

Concepts: KnownFact, KnownFact

Hypothesis: To investigate the possibility that miR-372 and miR-373 suppress the expression of LATS2, we...

Experiment 1: Goal, Method, Data, Result

Experiment 2: Goal, Method, Data, Result

Implication: Therefore, these results point to LATS2 as a mediator of the miR-372 and miR-373 effects on cell proliferation and tumorigenicity,

Fact:

- two miRNAs, miRNA-372 and -373, function as potential novel oncogenes in testicular germ cell tumors by inhibition of LATS2 expression, which suggests that Lats2 is an important tumor suppressor (Voorhoeve et al., 2006). [Raver-Shapira et al., JMolCell 2007]

- miR-372 and miR-373 target the Lats2 tumor suppressor (Voorhoeve et al., 2006) [Yabuta, JBioChem 2007]

Hypothesis + Evidence
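
One way to keep a claim from silently 'eroding' into a fact is to store it together with its evidence and with the way later papers cite it. The record below is a minimal sketch of that idea. The field names mirror the labels on this slide (Hypothesis, Goal, Method, Data, Result, Implication); the `Claim` and `Experiment` classes and the `status` labels are assumptions made for illustration, and the experiment details are placeholders because the slide does not give them.

```python
from dataclasses import dataclass, field


@dataclass
class Experiment:
    goal: str
    method: str
    data: str    # e.g. a figure or table reference
    result: str


@dataclass
class Claim:
    hypothesis: str
    experiments: list = field(default_factory=list)
    implication: str = ""
    cited_as: list = field(default_factory=list)  # (citing paper, citing sentence)

    def status(self):
        """Rough label for how the claim currently appears in the literature."""
        if self.cited_as:
            return "cited as fact"
        if self.implication and self.experiments:
            return "hypothesis + evidence"
        return "hypothesis"


claim = Claim(hypothesis="miR-372 and miR-373 suppress the expression of LATS2")
print(claim.status())

# Placeholder experiment: the slide names the slots but not their content.
claim.experiments.append(Experiment("(goal)", "(method)", "(data)", "(result)"))
claim.implication = (
    "these results point to LATS2 as a mediator of the miR-372 and miR-373 "
    "effects on cell proliferation and tumorigenicity"
)
print(claim.status())

claim.cited_as.append((
    "Yabuta, JBioChem 2007",
    "miR-372 and miR-373 target the Lats2 tumor suppressor (Voorhoeve et al., 2006)",
))
print(claim.status())
```

Because the original hypothesis and experiments stay attached to the record, a reader of the citing sentence can still trace back to the evidence behind it.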

Page 44: Ismb2009

Identification of hypotheses in papers: SWAN Alzheimer KB

Page 53: Ismb2009

HypER Working Group:

- Goal: Align and expand existing efforts on detection and analysis of Hypotheses, Evidence & Relationships

- Partners:

  - Harvard/MGH: SWAN, ARF

  - Open University: Cohere

  - Oxford University: CiTO, eLearning/Rhetoric

  - DERI: SALT, aTags

  - University of Trento: LiquidPub

  - Xerox Research: XIP hypothesis identifier

  - U Tilburg: ML for Science

  - Elsevier, UUtrecht: Discourse analysis of biology

Hypothesis 22: Intramembranous Aβ dimer may be toxic.

Derived from: POSTAT_CONTRIBUTION(This essay explores the possibility that a fraction of these Abeta peptides never leave the

Page 54: Ismb2009

HypER Activities: http://hyper.wik.is

Current activities:

- Aligning discourse ontologies: joint task with W3C HCLSSig

- Aligning architectures to exchange hypotheses + evidence

- Format for a rhetorical conference paper (SALT + abcde)

- Parser comparison of hypothesis identification tools

- With NIF+SWAN/SCF: Structured Rhetorical Abstracts for Neuroscience publications

Further interests:

- Better structure of evidence: MyExperiment, KeFeD, ...

- Granularity of annotation/access: entity, hypothesis, discussion?

- Where/how annotate: http://saakm2009.semanticauthoring.org/

Page 55: Ismb2009

Collaborations

Page 56: Ismb2009

Scope: Tools and processes to:

- Improve the process of creating, reviewing and editing scientific content

- Interpret, visualize or connect science knowledge

- Provide tools/ideas for measuring the impact of these improvements.

June 2008: 71 submissions from 15 countries.

August 2008: 10 semi-finalist teams, with access to:

- 500,000 full text articles

- Plus EMTREE, Embase, Scopus

The semi-finalists:

- Created tool/demo

- Presented to the Judges

- Wrote a paper (accepted for J. Web Semantics)

April 2009: Judges selected 4 finalist teams.

And the winners are:

Page 57: Ismb2009


Figure 1: A summary generated when moving the mouse over the link “Discovery’s” (mouse pointer omitted).

tive Summariser (IBES), complements generic summaries in providing additional information about a particular aspect of a page.[3] Generic summaries themselves are easy to generate due to rules enforced by the Wikipedia style-guide, which dictates that all titles be noun phrases describing an entity, thus serving as a short generic summary. Furthermore, the first sentence of the article should contain the title in subject position, which tends to create sentences that define the main entity of the article.

For the elaborative summarisation scenario described, we are interested in exploring ways in which the reading context can be leveraged to produce the elaborative summary. One method explored in this paper attempts to map the content of the linked document into the semantic space of the reading context, as defined in vector-space. We use Singular Value Decomposition (SVD), the underlying method behind Latent Semantic Analysis (Deerwester et al., 1990), as a means of identifying latent topics in the reading context, against which we compare the linked document. We present our system and the results from our preliminary investigation in the remainder of this paper.

2 Related Work

Using link text for summarisation has been explored previously by Amitay and Paris (2000). They identified situations when it was possible to generate summaries of web-pages by recycling human-authored descriptions of links from anchor text. In our work, we use the anchor text as the reading context to provide an elaborative summary for the linked document.

Our work is similar in domain to that of the 2007 CLEF WiQA shared task.[4] However, in contrast to our application scenario, the end goal of the shared task focuses on suggesting editing updates for a particular document and not on elaborating on the user’s reading context.

A related task was explored at the Document Understanding Conference (DUC) in 2007.[5] Here the goal was to find new information with respect to a previously seen set of documents. This is similar to the elaborative goal of our summary in the sense that one could answer the question: “What else can I say about topic X (that hasn’t already been mentioned in the reading context)”. However, whereas DUC focused on unlinked news wire text, we explore a different genre of text.

3 Algorithm

Our approach is designed to select justification sentences and expand upon them by finding elaborative material. The first stage identifies those sentences in the linked document that support the semantic content of the anchor text. We call those sentences justification material. The second stage finds material that is supplementary yet relevant for the user. In this paper, we report on the first of these tasks, though ultimately both are required for elaborative summaries.

To locate justification material, we implemented two known summarisation techniques. The first compares word overlap between the anchor text and the linked document. The second approach attempts to discover a semantic space, as defined by the reading context. The linked document is then mapped into this semantic space. These are referred to as the Simple Link method and the SVD method, where

[3] http://www.ict.csiro.au/staff/stephen.wan/ibes/

[4] http://ilps.science.uva.nl/WiQA/

[5] http://duc.nist.gov/guidelines/2007.html
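
The first technique named in the excerpt, comparing word overlap between the anchor text and the sentences of the linked document, can be sketched in a few lines. This is an illustrative reading of that one sentence, not the paper's implementation: the tokenisation, the scoring by raw shared-word count, and the example sentences are all assumptions.

```python
import re


def tokens(text):
    """Lower-cased set of words; a fuller system would also drop stop words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rank_by_overlap(anchor_text, sentences):
    """Order sentences by the number of words they share with the anchor text.

    Sentences with no shared word are dropped. Python's sort is stable,
    so ties keep their original document order.
    """
    anchor = tokens(anchor_text)
    scored = [(len(anchor & tokens(sentence)), sentence) for sentence in sentences]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [sentence for score, sentence in ranked if score > 0]


# Invented example sentences standing in for a linked document.
linked_document = [
    "The shuttle launched in 1984.",
    "Discovery is a space shuttle operated by NASA.",
    "Weather was fine.",
]
for sentence in rank_by_overlap("space shuttle Discovery", linked_document):
    print(sentence)
```

The second technique in the excerpt, projecting the linked document into an SVD-derived space, needs a term-document matrix and a linear-algebra library and is not sketched here.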

Page 58: Ismb2009

Scope: Tools and processes to:

- Improve the process of creating, reviewing and editing scientific content

- Interpret, visualize or connect science knowledge

- Provide tools/ideas for measuring the impact of these improvements.

June 2008: 71 Submissions from 15 countries.

August 2008: 10 Semi-finalists teams, access to:

- 500,000 full text articles

- Plus EMTREE, EmBase, Scopus

- Created tool/demo

- Presented to the Judges

- Wrote a paper (accepted for JWeb Semantics)

April 2009: Judges selected 4 Finalist teams.

And the winners are:

Figure 1: A summary generated when moving the mouse

over the link “Discovery’s” (mouse pointer omitted).

tive Summariser (IBES), complements generic sum-

maries in providing additional information about a

particular aspect of a page.3 Generic summaries

themselves are easy to generate due to rules enforced

by the Wikipedia style-guide, which dictates that all

titles be noun phrases describing an entity, thus serv-

ing as a short generic summary. Furthermore, the

first sentence of the article should contain the title

in subject position, which tends to create sentences

that define the main entity of the article.

For the elaborative summarisation scenario de-

scribed, we are interested in exploring ways in

which the reading context can be leveraged to pro-

duce the elaborative summary. One method ex-

plored in this paper attempts to map the content of

the linked document into the semantic space of the

reading context, as defined in vector-space. We use

Singular Value Decomposition (SVD), the underly-

ing method behind Latent Semantic Analysis (Deer-

wester et al., 1990), as a means of identifying latent

topics in the reading context, against which we com-

pare the linked document. We present our system

and the results from our preliminary investigation in

the remainder of this paper.

3http://www.ict.csiro.au/staff/stephen.wan/ibes/

2 Related Work

Using link text for summarisation has been explored

previously by Amitay and Paris (2000). They identi-

fied situations when it was possible to generate sum-

maries of web-pages by recycling human-authored

descriptions of links from anchor text. In our work,

we use the anchor text as the reading context to pro-

vide an elaborative summary for the linked docu-

ment.

Our work is similar in domain to that of the 2007

CLEF WiQA shared task.4 However, in contrast to

our application scenario, the end goal of the shared

task focuses on suggesting editing updates for a

particular document and not on elaborating on the

user’s reading context.

A related task was explored at the Document Un-

derstanding Conference (DUC) in 2007.5 Here the

goal was to find new information with respect to a

previously seen set of documents. This is similar to

the elaborative goal of our summary in the sense that

one could answer the question: “What else can I say

about topic X (that hasn’t already been mentioned

in the reading context)”. However, whereas DUC

focused on unlinked news wire text, we explore a

different genre of text.

3 Algorithm

Our approach is designed to select justification sen-

tences and expand upon them by finding elaborative

material. The first stage identifies those sentences

in the linked document that support the semantic

content of the anchor text. We call those sentences

justification material. The second stage finds mate-

rial that is supplementary yet relevant for the user.

In this paper, we report on the first of these tasks,

though ultimately both are required for elaborative

summaries.

To locate justification material, we implemented

two known summarisation techniques. The first

compares word overlap between the anchor text and

the linked document. The second approach attempts

to discover a semantic space, as defined by the read-

ing context. The linked document is then mapped

into this semantic space. These are referred to as the

Simple Link method and the SVD method, where

4http://ilps.science.uva.nl/WiQA/5http://duc.nist.gov/guidelines/2007.html

130

Page 59: Ismb2009

Scope: Tools and processes to:

- Improve the process of creating, reviewing and editing scientific content

- Interpret, visualize or connect science knowledge

- Provide tools/ideas for measuring the impact of these improvements.

June 2008: 71 Submissions from 15 countries.

August 2008: 10 Semi-finalists teams, access to:

- 500,000 full text articles

- Plus EMTREE, EmBase, Scopus

- Created tool/demo

- Presented to the Judges

- Wrote a paper (accepted for JWeb Semantics)

April 2009: Judges selected 4 Finalist teams.

And the winners are:

Figure 1: A summary generated when moving the mouse

over the link “Discovery’s” (mouse pointer omitted).

tive Summariser (IBES), complements generic sum-

maries in providing additional information about a

particular aspect of a page.3 Generic summaries

themselves are easy to generate due to rules enforced

by the Wikipedia style-guide, which dictates that all

titles be noun phrases describing an entity, thus serv-

ing as a short generic summary. Furthermore, the

first sentence of the article should contain the title

in subject position, which tends to create sentences

that define the main entity of the article.

For the elaborative summarisation scenario described, we are interested in exploring ways in which the reading context can be leveraged to produce the elaborative summary. One method explored in this paper attempts to map the content of the linked document into the semantic space of the reading context, as defined in vector-space. We use Singular Value Decomposition (SVD), the underlying method behind Latent Semantic Analysis (Deerwester et al., 1990), as a means of identifying latent topics in the reading context, against which we compare the linked document. We present our system and the results from our preliminary investigation in the remainder of this paper.
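The SVD idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the whitespace tokenisation, the raw-count weighting, the choice of k, the function names and the example sentences are all assumptions made for illustration.

```python
import numpy as np


def term_sentence_matrix(sentences, vocab):
    """Build a term-by-sentence count matrix (rows: terms, columns: sentences)."""
    index = {term: i for i, term in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(sentences)))
    for col, sentence in enumerate(sentences):
        for token in sentence.lower().split():
            if token in index:  # words outside the context vocabulary are ignored
                matrix[index[token], col] += 1.0
    return matrix


def rank_by_latent_similarity(context_sentences, linked_sentences, k=2):
    """Rank linked-document sentences by cosine similarity to the reading
    context, measured in the latent space found by SVD of the context."""
    vocab = sorted({t for s in context_sentences for t in s.lower().split()})
    context = term_sentence_matrix(context_sentences, vocab)

    # SVD of the context matrix; columns of u act as latent "topics" over terms.
    u, s, _ = np.linalg.svd(context, full_matrices=False)
    k = min(k, len(s))
    topics = u[:, :k]

    # Project the whole context and each linked sentence onto the top-k topics.
    context_vec = topics.T @ context.sum(axis=1)
    linked = topics.T @ term_sentence_matrix(linked_sentences, vocab)

    scores = []
    for col in range(linked.shape[1]):
        vec = linked[:, col]
        denom = np.linalg.norm(vec) * np.linalg.norm(context_vec)
        # Sentences sharing no vocabulary with the context get a score of 0.
        scores.append(float(vec @ context_vec / denom) if denom > 0 else 0.0)

    return sorted(zip(scores, linked_sentences), key=lambda p: p[0], reverse=True)


# Hypothetical reading context and linked-document sentences.
context = [
    "the shuttle discovery launched from florida",
    "discovery carried the hubble telescope into orbit",
]
linked = [
    "discovery is a space shuttle orbiter",
    "bread is baked in an oven",
]
for score, sentence in rank_by_latent_similarity(context, linked):
    print(f"{score:.2f}  {sentence}")
```

In this sketch the latent space is built from the reading context only, so a linked sentence is scored purely by how well it fits the topics the reader is currently looking at.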

³ http://www.ict.csiro.au/staff/stephen.wan/ibes/

2 Related Work

Using link text for summarisation has been explored previously by Amitay and Paris (2000). They identified situations when it was possible to generate summaries of web-pages by recycling human-authored descriptions of links from anchor text. In our work, we use the anchor text as the reading context to provide an elaborative summary for the linked document.

Our work is similar in domain to that of the 2007 CLEF WiQA shared task.⁴ However, in contrast to our application scenario, the end goal of the shared task focuses on suggesting editing updates for a particular document and not on elaborating on the user’s reading context.

A related task was explored at the Document Understanding Conference (DUC) in 2007.⁵ Here the goal was to find new information with respect to a previously seen set of documents. This is similar to the elaborative goal of our summary in the sense that one could answer the question: “What else can I say about topic X (that hasn’t already been mentioned in the reading context)”. However, whereas DUC focused on unlinked news wire text, we explore a different genre of text.

3 Algorithm

Our approach is designed to select justification sentences and expand upon them by finding elaborative material. The first stage identifies those sentences in the linked document that support the semantic content of the anchor text. We call those sentences justification material. The second stage finds material that is supplementary yet relevant for the user. In this paper, we report on the first of these tasks, though ultimately both are required for elaborative summaries.

To locate justification material, we implemented two known summarisation techniques. The first compares word overlap between the anchor text and the linked document. The second approach attempts to discover a semantic space, as defined by the reading context. The linked document is then mapped into this semantic space. These are referred to as the Simple Link method and the SVD method, where

⁴ http://ilps.science.uva.nl/WiQA/

⁵ http://duc.nist.gov/guidelines/2007.html
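The word-overlap comparison behind the Simple Link method can be sketched as follows. The excerpt does not give the exact scoring used, so the normalisation by anchor length, the whitespace tokenisation, the function names and the example sentences are assumptions for illustration only.

```python
def overlap_score(anchor_text, sentence):
    """Fraction of distinct anchor-text words that also occur in the sentence.

    Tokenisation is a plain lowercase whitespace split; punctuation is not
    stripped, which is a simplification.
    """
    anchor_words = set(anchor_text.lower().split())
    sentence_words = set(sentence.lower().split())
    if not anchor_words:
        return 0.0
    return len(anchor_words & sentence_words) / len(anchor_words)


def rank_by_overlap(anchor_text, linked_sentences):
    """Return (score, sentence) pairs, best-matching sentence first."""
    scored = [(overlap_score(anchor_text, s), s) for s in linked_sentences]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical anchor text and linked-document sentences.
anchor = "space shuttle discovery"
sentences = [
    "it was named after several historic ships",
    "discovery is a space shuttle orbiter",
    "the shuttle programme used several orbiters",
]
for score, sentence in rank_by_overlap(anchor, sentences):
    print(f"{score:.2f}  {sentence}")
```

Sentences with the highest overlap would be the candidates for justification material. A sentence that paraphrases the anchor text without reusing its words scores zero here, and that is the case a latent semantic space is meant to address.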


Page 61: Ismb2009

FoRC: The Future of Research Communication

Improve ways to:

- create, review, edit scientific content

- interpret, visualize, connect scientific knowledge

- measure the impact of these improvements

- present new paradigms in publishing

- serve global academic knowledge communities

- communicate between collaboratories

- share research data

- support complex knowledge ecosystems.

Harvard, Fall 2010

Page 62: Ismb2009

FoRC: The Future of Research Communication

New ways of communicating science require new ways of communicating!

Conference registrants form a community:

- The website is their meeting place and continues for as long as users want it to

- The physical conference is a synchronous manifestation of the community

- Others can participate through a virtual environment at another time/place

Harvard, Fall 2010

Page 63: Ismb2009

Acknowledgments

- Funding: FP7-Cordis, NWO Casimir Project

- UUtrecht: Leen Breure, Henk Pander Maat, Ted Sanders

- ISI: Gully Burns, Ed Hovy

- Elsevier: David Marques, Darin McBeath, Stefano Bocconi, Adriaan Klinkenberg, Emilie Marcus

- Challenge judges and participants: EMBL Team, CMU Team, David Shotton, Alfonso Valencia

- Harvard/MGH: Tim Clark, Elizabeth Wu

- DERI: Siegfried Handschuh, Tudor Groza, Matthias Samwald, Paul Buitelaar

- Xerox Research: Agnes Sandor

- Open University: Simon Buckingham Shum, David Shotton, Jack Park, Clara Mancini

- Oxford University: Annamaria Carusi, David Shotton

- Tilburg U: Piroska Lendvai

HypER: