Ismb2009
From Proteins to Hypotheses:
Some Experiments in Semantic Enrichment
Anita de Waard
Principal Researcher, Disruptive Technologies, Elsevier Labs, Amsterdam
Utrecht Institute of Linguistics, Utrecht University, The Netherlands
Some Experiments in Semantic Enrichment
Entities and Relationships:
- FEBS Structured Digital Abstracts
- OKKAM Entity Repository
Hypotheses and Evidence:
- Discourse Analysis of Biology Articles
- Hypotheses, Evidence and Relationships
Collaborations:
- Elsevier Grand Challenge
- Future of Research Communications
Entities and Relationships
- With FEBS Letters Editorial Office in Heidelberg; and MINT database Rome
- SDA [Gerstein et al.]: ‘machine-readable XML summary of pertinent facts’
- For each PPI paper: proteins, protein-protein interactions and methods, provided by the author in XLS
- Started April 9, 2008: ScienceDirect articles link to MINT and UniProt
- Now 117 articles with SDA: http://www.febsletters.org/content/sda
- Being used as Gold Standard for Biocreative Challenge II.5: http://www.biocreative.org/tasks/biocreative-ii5/biocreative-ii5-evaluation/
FEBS Letters Structured Digital Abstracts
D-07-02772: Testis-specific poly(A) polymerase (TPAP) is a cytoplasmic poly(A) polymerase that is highly expressed in round spermatids. We identified germ cell-specific gene 1 (GSG1) as a TPAP interaction partner protein using yeast two-hybrid and coimmunoprecipitation assays. Subcellular fractionation analysis showed that GSG1 is exclusively localized in the endoplasmic reticulum (ER) of mouse testis where TPAP is also present. In NIH3T3 cells cotransfected with TPAP and GSG1, both proteins colocalize in the ER. Moreover, expression of GSG1 stimulates TPAP targeting to the ER, suggesting that interactions between the two proteins lead to the redistribution of TPAP from the cytosol to the ER.
MINT-6168263: Gsg1 (uniprotkb:Q8R1W2), TPAP (uniprotkb:Q9WVP6) and Calmegin (uniprotkb:P52194) colocalize (MI:0403) by cosedimentation (MI:0027)
MINT-6168204, MINT-6168178: Gsg1 (uniprotkb:Q8R1W2) and TPAP (uniprotkb:Q9WVP6) colocalize (MI:0403) by fluorescence microscopy (MI:0416)
MINT-6167930: Gsg1 (uniprotkb:Q8R1W2) physically interacts (MI:0218) with TPAP (uniprotkb:Q9WVP6) by two-hybrid (MI:0018)
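The SDA statements above follow a regular pattern (MINT identifiers, proteins with UniProtKB accessions, an interaction type and a detection method, each with an MI term), so they lend themselves to machine reading. The following is a minimal sketch of how one such line might be parsed. The record type `SdaStatement` and the function `parse_sda_line` are hypothetical names chosen for illustration; this is not the actual SDA XML schema or any FEBS or MINT tooling.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


# Hypothetical record type for illustration only (not the SDA XML schema).
@dataclass
class SdaStatement:
    mint_ids: List[str]
    proteins: List[Tuple[str, str]]  # (name, UniProtKB accession)
    interaction: Tuple[str, str]     # (label, MI term id)
    method: Tuple[str, str]          # (label, MI term id)


# A protein mention looks like: Gsg1 (uniprotkb:Q8R1W2)
PROTEIN = re.compile(r"(\S+) \(uniprotkb:(\w+)\)")
# An MI-annotated phrase looks like: colocalize (MI:0403)
MI_TERM = re.compile(r"([A-Za-z][A-Za-z -]*) \((MI:\d{4})\)")


def parse_sda_line(line: str) -> SdaStatement:
    """Parse one SDA sentence of the form shown on the slide."""
    # Everything before the first colon holds the MINT identifiers.
    header, _, body = line.partition(":")
    mint_ids = re.findall(r"MINT-\d+", header)
    proteins = PROTEIN.findall(body)
    terms = MI_TERM.findall(body)
    if len(terms) != 2:
        raise ValueError("expected one interaction term and one method term")
    interaction, (method_label, method_id) = terms
    # The method phrase is introduced by "by", which is not part of the label.
    if method_label.startswith("by "):
        method_label = method_label[3:]
    return SdaStatement(mint_ids, proteins, interaction, (method_label, method_id))


example = (
    "MINT-6168263:Gsg1 (uniprotkb:Q8R1W2), TPAP (uniprotkb:Q9WVP6) and "
    "Calmegin (uniprotkb:P52194) colocalize (MI:0403) by cosedimentation (MI:0027)"
)
statement = parse_sda_line(example)
```

The sketch assumes exactly two MI terms per line, in the order interaction then method, which holds for the three statements shown here but may not hold for every SDA.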
OKKAM Project
FP7 - EU funded project, 13 partners, 2008 - 2010
- Main goal: create entity identifier repository
- Create entity-centric architecture that will:
  - create new entities + synonyms where needed
  - offer interface/link where new entities are not needed
  - contain resolver, matcher, repository
- E.g. http://sig.ma - semantic mashup of entity properties
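To make the roles of repository, matcher and resolver concrete, here is a toy in-memory sketch. It is an illustration under stated assumptions only: the class names, the `ent-N` identifier scheme and the exact-match-on-lowercased-name strategy are all hypothetical and do not describe the OKKAM implementation or its API.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Set


@dataclass
class Entity:
    entity_id: str                               # hypothetical identifier format
    names: Set[str] = field(default_factory=set)  # name plus synonyms, case-folded


class EntityRepository:
    """Toy repository: stores entities, matches names, resolves identifiers."""

    def __init__(self) -> None:
        self._by_id: Dict[str, Entity] = {}
        self._counter = itertools.count(1)

    def match(self, name: str) -> Optional[Entity]:
        # Matcher: find an existing entity that already carries this name.
        key = name.casefold()
        for entity in self._by_id.values():
            if key in entity.names:
                return entity
        return None

    def resolve(self, entity_id: str) -> Entity:
        # Resolver: turn an identifier back into its entity record.
        return self._by_id[entity_id]

    def get_or_create(self, name: str, synonyms: Iterable[str] = ()) -> Entity:
        # Reuse an existing entity where one matches; create one only if needed.
        entity = self.match(name)
        if entity is None:
            entity = Entity(f"ent-{next(self._counter)}")
            self._by_id[entity.entity_id] = entity
        entity.names.update(n.casefold() for n in (name, *synonyms))
        return entity


repo = EntityRepository()
first = repo.get_or_create("TPAP", ["Testis-specific poly(A) polymerase"])
second = repo.get_or_create("tpap")  # same entity, no duplicate created
```

A real matcher would need fuzzier comparison and disambiguation than exact name lookup; the point here is only the separation of the three responsibilities.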
OKKAM Entity-centric authoring tool
Current:
- Build MS Word plugin to find proteins in FEBS articles (i.e., semi-automating FEBS SDA!)
- Now: testing web server interface to Whatizit
- Next: link to Biocreative metaserver
- Add: links to other entities via OKKAM server
- Prototype now being tested with FEBS Office
Future:
- Create unique Author ID, straddling proprietary (Scopus) and other (ResearcherID, OpenID) systems
- Allow query and resolution of authors/papers within Word
OKKAM Entity Editor in MS Word
OKKAM plugin vs. UCSD group plugin
UCSD plugin | Okkam plugin
Works! Can be downloaded, open source, etc. | Prototype, starting to test with FEBS authors
Offline: load ontologies locally | Online: call out to web services (soon...)
Identify classes of entities (e.g., ‘inhibitor’) | Identify specific entities (e.g., a specific protein)
Works with Word 2007 | Works with other versions of Word (on a PC!)
Stores annotations as XML metadata, maintains metadata in Word info pane | Stores annotations as Word comments, can generate report of entities linked to identifiers
Work well together!
Hypotheses and Evidence
Issue with Biocreative challenge:
From one of the documents in the training set:
- In Xenopus oocyte maturation, cytoplasmic polyadenylation mediated by cytoplasmic polyadenylation element binding protein (CPEB) induces the translation of maternal mRNA [5].
- In mouse testis, another novel member of the CPEB protein family (CPEB2) and a homolog of xGLD-2 (mGLD-2) have been identified [7] and [8]
versus:
- TPAP was present in GSG1 immunoprecipitates (Fig. 2B). The in vivo data suggest that TPAP–GSG1 interactions occur in mammalian cells.
Issue 1: what is experimentally verified content?
Issue 2: what is new?
Discourse analysis of biological text
- How can we identify line of argumentation in biology text?
- Linguistics says: parse into clauses (anything with a verb)
- Identify Discourse Segment Purpose
- Identify verb tense, type, markers for each segment type
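As a rough illustration of the last step, the sketch below labels already-segmented clauses using a few surface markers (an opening "To investigate", a figure reference, a citation, "these data show"). The label names and the marker patterns are assumptions made for this example; they are far cruder than a real analysis of verb tense and type, and are not the annotation scheme used in the study.

```python
import re

# Ordered (label, pattern) rules; the first matching rule wins.
# Labels and patterns are illustrative assumptions, not a validated scheme.
RULES = [
    ("goal", re.compile(r"^\s*to (investigate|test|determine|examine)\b", re.I)),
    ("implication", re.compile(r"\b(data|results) (show|suggest|indicate|point)", re.I)),
    ("result", re.compile(r"\(Fig", re.I)),
    ("known fact", re.compile(r"\[\d+\]|\([^()]*et al\.,? \d{4}\)")),
]


def classify(segment: str) -> str:
    """Return the label of the first rule whose marker occurs in the segment."""
    for label, pattern in RULES:
        if pattern.search(segment):
            return label
    return "unclassified"


segments = [
    "In Xenopus oocyte maturation, cytoplasmic polyadenylation mediated by "
    "CPEB induces the translation of maternal mRNA [5].",
    "TPAP was present in GSG1 immunoprecipitates (Fig. 2B).",
    "To investigate the possibility that miR-372 and miR-373 suppress the "
    "expression of LATS2, we...",
    "Altogether, these data show that",
]
labels = [classify(s) for s in segments]
```

Even this crude version separates cited background from the paper's own experimental results, which is the distinction raised as Issue 1 and Issue 2 for the Biocreative training data.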
3 Realms of Science: Experimental realm, Data realm (Figures), Conceptual realm
Example passage, split into numbered discourse segments (placed across the three realms on the slide):
(1) Oncogene-induced senescence is characterized by the appearance of cells with a flat morphology that express senescence associated (SA)-β-Galactosidase.
(2a) Indeed,
(2b) control RASV12-arrested cells showed relatively high abundance of flat cells expressing SA-β-Galactosidase
(2c) (Figures 2G and 2H).
(3a) Consistent with the cell growth assay,
(3b) very few cells showed senescent morphology when transduced with either miR-Vec-371&2, miR-Vec-373, or control p53kd.
(4a) Altogether, these data show that
(4b) transduction with either miR-Vec-371&2 or miR-Vec-373 prevents RASV12-induced growth arrest in primary human cells.
Hypotheses ‘erode’ into facts:
Hypothesis:
“To investigate the possibility that miR-372 and miR-373 suppress the expression of LATS2, we...”
Implication:
“Therefore, these results point to LATS2 as a mediator of the miR-372 and miR-373 effects on cell proliferation and tumorigenicity,”
Fact:
“two miRNAs, miRNA-372 and -373, function as potential novel oncogenes in testicular germ cell tumors by inhibition of LATS2 expression, which suggests that Lats2 is an important tumor suppressor (Voorhoeve et al., 2006).” (Raver-Shapira et al., JMolCell 2007)
“miR-372 and miR-373 target the Lats2 tumor suppressor (Voorhoeve et al., 2006)” (Yabuta, JBioChem 2007)
[Diagram labels: Concepts (KnownFact, KnownFact); Experiment 1 and Experiment 2, each with Goal, Method, Data, Result]
Hypothesis + Evidence
Identification of hypotheses in papers: SWAN Alzheimer KB
HypER Working Group:
- Goal: Align and expand existing efforts on detection and analysis of Hypotheses, Evidence & Relationships
- Partners:
- Harvard/MGH: SWAN, ARF
- Open University: Cohere
- Oxford University: CiTO, eLearning/Rhetoric
- DERI: SALT, aTags
- University of Trento: LiquidPub
- Xerox Research: XIP hypothesis identifier
- U Tilburg: ML for Science
- Elsevier, U Utrecht: Discourse analysis of biology
Hypothesis 22: Intramembranous Aβ dimer may be toxic.
Derived from: POSTAT_CONTRIBUTION(This essay explores the possibility that a fraction of these Abeta peptides never leave the
HypER Activities: http://hyper.wik.is
Current activities:
- Aligning discourse ontologies: joint task with W3C HCLSSig
- Aligning architectures to exchange hypotheses + evidence
- Format for a rhetorical conference paper (SALT + abcde)
- Parser comparison of hypothesis identification tools
- With NIF+SWAN/SCF: Structured Rhetorical Abstracts for Neuroscience publications
Further interests:
- Better structure of evidence: MyExperiment, KeFeD, ...
- Granularity of annotation/access: entity, hypothesis, discussion?
- Where/how annotate: http://saakm2009.semanticauthoring.org/
Collaborations
Scope: Tools and processes to:
- Improve the process of creating, reviewing and editing scientific content
- Interpret, visualize or connect science knowledge
- Provide tools/ideas for measuring the impact of these improvements.
June 2008: 71 Submissions from 15 countries.
August 2008: 10 semi-finalist teams were given access to:
- 500,000 full text articles
- Plus EMTREE, EmBase, Scopus
Each team:
- Created tool/demo
- Presented to the Judges
- Wrote a paper (accepted for J. Web Semantics)
April 2009: Judges selected 4 Finalist teams.
And the winners are:
Figure 1: A summary generated when moving the mouse over the link “Discovery’s” (mouse pointer omitted).
tive Summariser (IBES), complements generic summaries in providing additional information about a particular aspect of a page.[3] Generic summaries themselves are easy to generate due to rules enforced by the Wikipedia style-guide, which dictates that all titles be noun phrases describing an entity, thus serving as a short generic summary. Furthermore, the first sentence of the article should contain the title in subject position, which tends to create sentences that define the main entity of the article.
For the elaborative summarisation scenario described, we are interested in exploring ways in which the reading context can be leveraged to produce the elaborative summary. One method explored in this paper attempts to map the content of the linked document into the semantic space of the reading context, as defined in vector-space. We use Singular Value Decomposition (SVD), the underlying method behind Latent Semantic Analysis (Deerwester et al., 1990), as a means of identifying latent topics in the reading context, against which we compare the linked document. We present our system and the results from our preliminary investigation in the remainder of this paper.
2 Related Work
Using link text for summarisation has been explored previously by Amitay and Paris (2000). They identified situations when it was possible to generate summaries of web-pages by recycling human-authored descriptions of links from anchor text. In our work, we use the anchor text as the reading context to provide an elaborative summary for the linked document.
Our work is similar in domain to that of the 2007 CLEF WiQA shared task.[4] However, in contrast to our application scenario, the end goal of the shared task focuses on suggesting editing updates for a particular document and not on elaborating on the user’s reading context.
A related task was explored at the Document Understanding Conference (DUC) in 2007.[5] Here the goal was to find new information with respect to a previously seen set of documents. This is similar to the elaborative goal of our summary in the sense that one could answer the question: “What else can I say about topic X (that hasn’t already been mentioned in the reading context)”. However, whereas DUC focused on unlinked news wire text, we explore a different genre of text.
3 Algorithm
Our approach is designed to select justification sentences and expand upon them by finding elaborative material. The first stage identifies those sentences in the linked document that support the semantic content of the anchor text. We call those sentences justification material. The second stage finds material that is supplementary yet relevant for the user. In this paper, we report on the first of these tasks, though ultimately both are required for elaborative summaries.
To locate justification material, we implemented two known summarisation techniques. The first compares word overlap between the anchor text and the linked document. The second approach attempts to discover a semantic space, as defined by the reading context. The linked document is then mapped into this semantic space. These are referred to as the Simple Link method and the SVD method, where
Footnotes:
[3] http://www.ict.csiro.au/staff/stephen.wan/ibes/
[4] http://ilps.science.uva.nl/WiQA/
[5] http://duc.nist.gov/guidelines/2007.html
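The word-overlap idea described in the excerpt (comparing the anchor text with sentences of the linked document) can be sketched as follows. This is a guess at the general technique only: the tokenisation, the scoring by raw shared-word count and the function names are assumptions for illustration, not the excerpted paper's implementation, and the SVD variant is not shown.

```python
import re
from typing import List, Set


def tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric word set of a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rank_by_overlap(anchor_text: str, sentences: List[str]) -> List[str]:
    """Order sentences by how many words they share with the anchor text.

    Sentences with no shared words are dropped; ties keep document order
    because Python's sort is stable.
    """
    anchor = tokens(anchor_text)
    scored = [(len(anchor & tokens(s)), s) for s in sentences]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [s for score, s in ranked if score > 0]


# Hypothetical anchor text and linked-document sentences.
justification = rank_by_overlap(
    "space shuttle Discovery",
    [
        "The weather was fine.",
        "Discovery is a space shuttle.",
        "The shuttle landed.",
    ],
)
```

A practical version would at least remove stop words and normalise word forms, since raw overlap rewards long sentences and common words.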
Scope: Tools and processes to:
- Improve the process of creating, reviewing and editing scientific content
- Interpret, visualize or connect science knowledge
- Provide tools/ideas for measuring the impact of these improvements.
June 2008: 71 Submissions from 15 countries.
August 2008: 10 Semi-finalists teams, access to:
- 500,000 full text articles
- Plus EMTREE, EmBase, Scopus
- Created tool/demo
- Presented to the Judges
- Wrote a paper (accepted for JWeb Semantics)
April 2009: Judges selected 4 Finalist teams.
And the winners are:
Figure 1: A summary generated when moving the mouse
over the link “Discovery’s” (mouse pointer omitted).
tive Summariser (IBES), complements generic sum-
maries in providing additional information about a
particular aspect of a page.3 Generic summaries
themselves are easy to generate due to rules enforced
by the Wikipedia style-guide, which dictates that all
titles be noun phrases describing an entity, thus serv-
ing as a short generic summary. Furthermore, the
first sentence of the article should contain the title
in subject position, which tends to create sentences
that define the main entity of the article.
For the elaborative summarisation scenario de-
scribed, we are interested in exploring ways in
which the reading context can be leveraged to pro-
duce the elaborative summary. One method ex-
plored in this paper attempts to map the content of
the linked document into the semantic space of the
reading context, as defined in vector-space. We use
Singular Value Decomposition (SVD), the underly-
ing method behind Latent Semantic Analysis (Deer-
wester et al., 1990), as a means of identifying latent
topics in the reading context, against which we com-
pare the linked document. We present our system
and the results from our preliminary investigation in
the remainder of this paper.
3http://www.ict.csiro.au/staff/stephen.wan/ibes/
2 Related Work
Using link text for summarisation has been explored
previously by Amitay and Paris (2000). They identi-
fied situations when it was possible to generate sum-
maries of web-pages by recycling human-authored
descriptions of links from anchor text. In our work,
we use the anchor text as the reading context to pro-
vide an elaborative summary for the linked docu-
ment.
Our work is similar in domain to that of the 2007
CLEF WiQA shared task.4 However, in contrast to
our application scenario, the end goal of the shared
task focuses on suggesting editing updates for a
particular document and not on elaborating on the
user’s reading context.
A related task was explored at the Document Un-
derstanding Conference (DUC) in 2007.5 Here the
goal was to find new information with respect to a
previously seen set of documents. This is similar to
the elaborative goal of our summary in the sense that
one could answer the question: “What else can I say
about topic X (that hasn’t already been mentioned
in the reading context)”. However, whereas DUC
focused on unlinked news wire text, we explore a
different genre of text.
3 Algorithm
Our approach is designed to select justification sen-
tences and expand upon them by finding elaborative
material. The first stage identifies those sentences
in the linked document that support the semantic
content of the anchor text. We call those sentences
justification material. The second stage finds mate-
rial that is supplementary yet relevant for the user.
In this paper, we report on the first of these tasks,
though ultimately both are required for elaborative
summaries.
To locate justification material, we implemented
two known summarisation techniques. The first
compares word overlap between the anchor text and
the linked document. The second approach attempts
to discover a semantic space, as defined by the read-
ing context. The linked document is then mapped
into this semantic space. These are referred to as the
Simple Link method and the SVD method, where
4http://ilps.science.uva.nl/WiQA/5http://duc.nist.gov/guidelines/2007.html
130
Scope: Tools and processes to:
- Improve the process of creating, reviewing and editing scientific content
- Interpret, visualize or connect science knowledge
- Provide tools/ideas for measuring the impact of these improvements.
June 2008: 71 Submissions from 15 countries.
August 2008: 10 Semi-finalists teams, access to:
- 500,000 full text articles
- Plus EMTREE, EmBase, Scopus
- Created tool/demo
- Presented to the Judges
- Wrote a paper (accepted for JWeb Semantics)
April 2009: Judges selected 4 Finalist teams.
And the winners are:
Figure 1: A summary generated when moving the mouse
over the link “Discovery’s” (mouse pointer omitted).
tive Summariser (IBES), complements generic sum-
maries in providing additional information about a
particular aspect of a page.3 Generic summaries
themselves are easy to generate due to rules enforced
by the Wikipedia style-guide, which dictates that all
titles be noun phrases describing an entity, thus serv-
ing as a short generic summary. Furthermore, the
first sentence of the article should contain the title
in subject position, which tends to create sentences
that define the main entity of the article.
For the elaborative summarisation scenario de-
scribed, we are interested in exploring ways in
which the reading context can be leveraged to pro-
duce the elaborative summary. One method ex-
plored in this paper attempts to map the content of
the linked document into the semantic space of the
reading context, as defined in vector-space. We use
Singular Value Decomposition (SVD), the underly-
ing method behind Latent Semantic Analysis (Deer-
wester et al., 1990), as a means of identifying latent
topics in the reading context, against which we com-
pare the linked document. We present our system
and the results from our preliminary investigation in
the remainder of this paper.
3http://www.ict.csiro.au/staff/stephen.wan/ibes/
2 Related Work
Using link text for summarisation has been explored
previously by Amitay and Paris (2000). They identi-
fied situations when it was possible to generate sum-
maries of web-pages by recycling human-authored
descriptions of links from anchor text. In our work,
we use the anchor text as the reading context to pro-
vide an elaborative summary for the linked docu-
ment.
Our work is similar in domain to that of the 2007
CLEF WiQA shared task.4 However, in contrast to
our application scenario, the end goal of the shared
task focuses on suggesting editing updates for a
particular document and not on elaborating on the
user’s reading context.
A related task was explored at the Document Un-
derstanding Conference (DUC) in 2007.5 Here the
goal was to find new information with respect to a
previously seen set of documents. This is similar to
the elaborative goal of our summary in the sense that
one could answer the question: “What else can I say
about topic X (that hasn’t already been mentioned
in the reading context)”. However, whereas DUC
focused on unlinked news wire text, we explore a
different genre of text.
3 Algorithm
Our approach is designed to select justification sentences and expand upon them by finding elaborative material. The first stage identifies those sentences in the linked document that support the semantic content of the anchor text. We call those sentences justification material. The second stage finds material that is supplementary yet relevant for the user. In this paper, we report on the first of these tasks, though ultimately both are required for elaborative summaries.
To locate justification material, we implemented two known summarisation techniques. The first compares word overlap between the anchor text and the linked document. The second approach attempts to discover a semantic space, as defined by the reading context. The linked document is then mapped into this semantic space. These are referred to as the Simple Link method and the SVD method, where
4 http://ilps.science.uva.nl/WiQA/
5 http://duc.nist.gov/guidelines/2007.html
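The two techniques named in the Algorithm section can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the system described above: the tokeniser, the scoring by distinct-word overlap, the choice of `k`, the use of the summed context vector as the comparison point, and all function names (`tokenize`, `simple_link_scores`, `term_vector`, `svd_scores`) are hypothetical choices made for this example. It assumes NumPy is available.

```python
import re
import numpy as np


def tokenize(text):
    """Lowercase a string and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def simple_link_scores(anchor_text, doc_sentences):
    """Word-overlap sketch: count distinct anchor words found in each sentence."""
    anchor = set(tokenize(anchor_text))
    return [len(anchor & set(tokenize(s))) for s in doc_sentences]


def term_vector(text, vocab):
    """Bag-of-words count vector over a fixed vocabulary; unknown words are ignored."""
    vec = np.zeros(len(vocab))
    for tok in tokenize(text):
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    return vec


def svd_scores(context_sentences, doc_sentences, k=2):
    """SVD sketch: project linked-document sentences into the latent space
    of the reading context and score them by cosine similarity to it."""
    # Assumption: the vocabulary is built from the reading context only.
    vocab = {}
    for s in context_sentences:
        for tok in tokenize(s):
            vocab.setdefault(tok, len(vocab))

    # Term-by-sentence matrix of the reading context (rows are terms).
    A = np.column_stack([term_vector(s, vocab) for s in context_sentences])

    # Columns of U are latent "topic" directions in term space; keep the top k.
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    U_k = U[:, :k]

    # Assumption: the context as a whole is represented by its summed term vector.
    context_topics = U_k.T @ A.sum(axis=1)

    scores = []
    for s in doc_sentences:
        topics = U_k.T @ term_vector(s, vocab)
        denom = np.linalg.norm(topics) * np.linalg.norm(context_topics)
        scores.append(float(topics @ context_topics / denom) if denom > 0 else 0.0)
    return scores


if __name__ == "__main__":
    context = [
        "Discovery is a space shuttle operated by NASA.",
        "The shuttle Discovery flew many missions.",
    ]
    linked_doc = [
        "Discovery flew its first shuttle mission in 1984.",
        "Bananas are yellow.",
    ]
    print(simple_link_scores("Discovery", linked_doc))
    print(svd_scores(context, linked_doc))
```

In both functions a higher score marks a sentence as a stronger candidate for justification material. A sentence sharing no vocabulary with the reading context gets a score of 0 from either method.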
Scope: Tools and processes to:
- Improve the process of creating, reviewing and editing scientific content
- Interpret, visualize or connect science knowledge
- Provide tools/ideas for measuring the impact of these improvements.
June 2008: 71 Submissions from 15 countries.
August 2008: 10 semi-finalist teams, with access to:
- 500,000 full text articles
- Plus EMTREE, EmBase, Scopus
- Created tool/demo
- Presented to the Judges
- Wrote a paper (accepted for JWeb Semantics)
April 2009: Judges selected 4 Finalist teams.
And the winners are:
FoRC: The Future of Research Communication
Improve ways to:
- create, review, edit scientific content
- interpret, visualize, connect scientific knowledge
- measure the impact of these improvements
- present new paradigms in publishing
- serve global academic knowledge communities
- communicate between collaboratories
- share research data
- support complex knowledge ecosystems.
Harvard, Fall 2010
New ways of communicating science require new ways of communicating!
Conference registrants form a community:
- The website is their meeting place and continues as long as users want it to
- The physical conference is a synchronous manifestation of the community
- Others can participate through a virtual environment at another time or place
Acknowledgments
- Funding: FP7-Cordis, NWO Casimir Project
- UUtrecht: Leen Breure, Henk Pander Maat, Ted Sanders
- ISI: Gully Burns, Ed Hovy
- Elsevier: David Marques, Darin McBeath, Stefano Bocconi, Adriaan Klinkenberg, Emilie Marcus
- Challenge judges and participants: EMBL Team, CMU Team, David Shotton, Alfonso Valencia
- Harvard/MGH: Tim Clark, Elizabeth Wu
- DERI: Siegfried Handschuh, Tudor Groza, Matthias Samwald, Paul Buitelaar
- Xerox Research: Agnes Sandor
- Open University: Simon Buckingham Shum, David Shotton, Jack Park, Clara Mancini
- Oxford University: Annamaria Carusi, David Shotton
- Tilburg U: Piroska Lendvai
HypER: