Finding Support Sentences for Entities
Roi Blanco, Hugo Zaragoza
SIGIR '10
Advisor: Jia Ling, Koh
Speaker: Yu Cheng, Hsieh
Outline
• Introduction
• Notations
• Features for Ranking Support Sentences
• Entity Ranking
• Sentence Ranking with Entity Ranking Information
• Experiment
Introduction
• Ranking entities (e.g. experts, locations, companies) has become a standard information retrieval task.
• Showing entities without explanations is not enough for users to identify how an entity relates to the query.
• Goal: retrieve and rank entity support sentences that explain the relevance of an entity with respect to a query.
Notations
• $S$: a collection of sentences (paragraphs or text windows of fixed size)
• $C_s \subseteq S$: the context surrounding sentence $s$
• $G$: sentence-entity matrix, with
  $G_{ij} = 1$ if entity j is present in sentence i
  $G_{ij} = 0$ otherwise
• $s \ni e$: $G_{se} = 1$
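The sentence-entity matrix can be illustrated with a small sketch. The sentences and entity annotations below are invented toy data, and the helper name `contains` is hypothetical; the slides do not say how the matrix is stored.

```python
# Toy data: sentences and their entity annotations are invented for illustration.
sentences = [
    "Picasso was born in Malaga.",
    "Picasso and Braque developed Cubism in Paris.",
    "Paris is the capital of France.",
]
# annotations[i] is the set of entities assumed to be tagged in sentence i.
annotations = [
    {"Picasso", "Malaga"},
    {"Picasso", "Braque", "Paris"},
    {"Paris", "France"},
]

# Fix a column order for the entities.
entities = sorted(set().union(*annotations))
col = {e: j for j, e in enumerate(entities)}

# G[i][j] = 1 if entity j is present in sentence i, 0 otherwise.
G = [[0] * len(entities) for _ in sentences]
for i, ents in enumerate(annotations):
    for e in ents:
        G[i][col[e]] = 1

def contains(i, e):
    """True when sentence i contains entity e, i.e. G[i][e] == 1."""
    return G[i][col[e]] == 1
```

For a collection of realistic size a sparse representation (for example, one set of entities per sentence) would be the more natural choice; the dense matrix is used here only to mirror the notation.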
Notations
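• $S_q = \{ s \mid \mathrm{rank}_q(s) \le k \}$: top k relevant sentences for query q
• $\hat{S}_q = \{ s \mid s \in C_{s'},\ s' \in S_q \}$: augments $S_q$ by adding the context of each $s' \in S_q$
• $S_{qe} = \{ s \mid s \in S_q,\ G_{se} = 1 \}$: candidate support sentences in $S_q$ for an entity e
• $\hat{S}_{qe} = \{ s \mid s \in \hat{S}_q,\ G_{se} = 1 \}$: candidate support sentences in $\hat{S}_q$ for an entity e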
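The candidate sets on this slide can be sketched as plain set operations. The sketch assumes that sentences are identified by their position in the collection, that the context of a sentence is a symmetric window that includes the sentence itself, and that retrieval scores are already available; the function names are hypothetical.

```python
def top_k(scores, k):
    """Top-k set: ids of the k highest-scoring sentences for the query."""
    ranked = sorted(scores, key=lambda s: (-scores[s], s))
    return set(ranked[:k])

def augment(S_q, num_sentences, window=1):
    """Augmented set: every sentence in the context window of some s' in S_q."""
    out = set()
    for s_prime in S_q:
        lo = max(0, s_prime - window)
        hi = min(num_sentences - 1, s_prime + window)
        out.update(range(lo, hi + 1))
    return out

def candidates(sentence_set, entity, annotations):
    """Candidate support sentences: members of the set that contain the entity."""
    return {s for s in sentence_set if entity in annotations[s]}

# Toy example (invented scores and annotations).
scores = {0: 2.1, 1: 0.3, 2: 0.0, 3: 0.1, 4: 1.7, 5: 0.2}
annotations = [{"Paris"}, {"Picasso"}, {"Picasso"}, set(),
               {"Paris", "Picasso"}, {"Picasso"}]

S_q = top_k(scores, k=2)
S_q_aug = augment(S_q, num_sentences=len(annotations), window=1)
S_qe = candidates(S_q, "Picasso", annotations)
S_qe_aug = candidates(S_q_aug, "Picasso", annotations)
```

In this toy run the augmented set yields more candidate sentences for the entity than the top-k set alone, which is the motivation for adding contexts.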
Features for Ranking Support Sentences
• $H_{qe}(s) = F_q(s),\ s \in S_{qe}$: uses the original score of the sentence (measured by BM25)
• $H_{qe}(s) = F_q(s, C_s),\ s \in S_{qe}$: context-aware model using BM25F
Both features consider only the relevance between the query and the sentences.
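As a point of reference for the first feature, here is a minimal sketch of a plain BM25 scorer over whitespace-tokenized sentences. It is not the authors' implementation: the parameter values `k1=1.2` and `b=0.75` are conventional defaults, the IDF uses the common non-negative variant, and the field-weighted BM25F model is not covered.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document (here: sentence) against the query with BM25."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(t for d in tokenized for t in set(d))
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for t in query.lower().split():
            if tf[t] == 0:
                continue
            idf = math.log(1 + (n_docs - df[t] + 0.5) / (df[t] + 0.5))
            # Length-normalized term-frequency saturation.
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Toy sentences, invented for illustration.
docs = [
    "picasso painted guernica",
    "paris is in france",
    "picasso lived in paris",
]
scores = bm25_scores("picasso guernica", docs)
```

The first sentence matches both query terms and is the shortest, so it scores highest; the second matches nothing and scores zero.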
Entity Ranking
• $E_{FREQ}(q, e) = |S_{qe}|$: number of sentences containing the entity e (like tf)
• $E_{RARITY}(e) = \log \frac{|S|}{\sum_{s \in S} G_{se}}$: penalizes very frequent entities (like idf)
• $E_{COMB}(q, e) = E_{FREQ}(q, e) * E_{RARITY}(e)$
• $E_{KLD}(q, e) = P(e \mid q) \log \frac{P(e \mid q)}{P(e \mid S)}$: discovers special entities, where
  $P(e \mid q) = \frac{|S_{qe}|}{|S_q|}$, $P(e \mid S) = \frac{\sum_{s \in S} G_{se}}{|S|}$
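The four entity-ranking features can be sketched as follows. The code assumes this reading of the slide: frequency counts the top-k sentences containing the entity, rarity is the log of the collection size over the number of sentences containing the entity, the combination is their product, and the KLD feature compares the entity's probability in the top-k set with its probability in the whole collection. Function names and the toy data are hypothetical.

```python
import math

def e_freq(entity, S_q, annotations):
    """Number of top-k sentences that contain the entity (tf-like)."""
    return sum(1 for s in S_q if entity in annotations[s])

def e_rarity(entity, annotations):
    """Log of collection size over sentences containing the entity (idf-like).
    Assumes the entity occurs at least once in the collection."""
    n = sum(1 for ents in annotations if entity in ents)
    return math.log(len(annotations) / n)

def e_comb(entity, S_q, annotations):
    """Product of the frequency and rarity features."""
    return e_freq(entity, S_q, annotations) * e_rarity(entity, annotations)

def e_kld(entity, S_q, annotations):
    """Pointwise KL contribution of the entity: query set vs. collection."""
    p_query = e_freq(entity, S_q, annotations) / len(S_q)
    p_coll = sum(1 for ents in annotations if entity in ents) / len(annotations)
    if p_query == 0:
        return 0.0  # convention: 0 * log(0) is treated as 0
    return p_query * math.log(p_query / p_coll)

# Toy collection of six annotated sentences; S_q holds the top-k sentence ids.
annotations = [{"Paris"}, {"Picasso"}, {"Picasso", "Paris"},
               {"Paris"}, {"Paris", "Braque"}, {"Paris"}]
S_q = {1, 2, 4}
```

Here "Picasso" and "Paris" have the same frequency in the top-k set, but "Paris" is common across the collection, so the rarity, combination and KLD features all prefer "Picasso".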
Sentence Ranking with Entity Ranking Information
$H_{qe}(s) = \sum_{s \ni e'} E(q, e')$ if $s \ni e$, for $s \in S_{qe}$
$H_{qe}(s) = 0$ if $s \not\ni e$
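A minimal sketch of this propagation step, assuming the sentence score is the sum of the entity scores of all entities in the sentence, and zero when the target entity is absent. The `average` switch reflects the averaging variant that the results slide compares against; its exact definition is an assumption, as are the names and the toy scores.

```python
def sentence_score(s, entity, entity_scores, annotations, average=False):
    """Score sentence s for the target entity from the scores of its entities."""
    ents = annotations[s]
    if entity not in ents:
        return 0.0  # sentences without the target entity cannot support it
    total = sum(entity_scores.get(e2, 0.0) for e2 in ents)
    return total / len(ents) if average else total

# Toy entity scores E(q, e') and annotations, invented for illustration.
entity_scores = {"Picasso": 2.0, "Paris": 0.5, "Braque": 1.0}
annotations = [{"Picasso", "Paris"}, {"Paris", "Braque"}, {"Picasso"}]

ranking = sorted(
    range(len(annotations)),
    key=lambda s: -sentence_score(s, "Picasso", entity_scores, annotations),
)
```

With the sum, a sentence that mentions several high-scoring entities outranks one that mentions only the target entity.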
Position Consideration
Based on the distance between the last match of the query and the entity:
$H_{qe}(s) = \mathrm{length}(s) - \max(\mathrm{position}(q), \mathrm{position}(e))$
Experiment
• Using Semantically Annotated Snapshot, which contains
- 1.5M documents
- 75M sentences
- 20.3M unique named entities (using 12 first-level Wall Street Journal entity types)
• Manually built a dataset of 226 (query, entity) pairs from 45 unique queries.
Experiment
• Assessors produce queries about topics they know well.
• System produces a set of candidate entities
• Assessors eliminate the non-relevant entities with respect to the query
• System produces candidate sentences for each (query, entity) pair
Experiment
• Assessors grade each triple $(q, e, s)$ with $\mathrm{Grade}(q, e, s)$ on four levels of relevance:
1. Non-relevant
2. Fairly relevant
3. Relevant
4. Very relevant
• A triple $(q, e, s)$ is considered relevant iff $\mathrm{Grade}(q, e, s) \ge 3$
Experiment
• Measurement
- MRR
- NDCG
- P@1
- MAP
Tie-aware evaluation is used
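Two of the listed measures, P@1 and MRR, can be sketched on binarized grades. The sketch assumes a grade of 3 or higher counts as relevant and it ignores ties in the ranking, so it does not reproduce the tie-aware evaluation mentioned on the slide. Function names and the sample grades are hypothetical.

```python
def precision_at_1(ranked_grades, threshold=3):
    """1.0 if the top-ranked sentence is relevant, else 0.0."""
    if ranked_grades and ranked_grades[0] >= threshold:
        return 1.0
    return 0.0

def reciprocal_rank(ranked_grades, threshold=3):
    """1 / rank of the first relevant sentence, or 0.0 if none is relevant."""
    for rank, grade in enumerate(ranked_grades, start=1):
        if grade >= threshold:
            return 1.0 / rank
    return 0.0

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# One list of assessor grades per (query, entity) pair, in ranked order.
runs = [[2, 4, 1], [3, 1, 1], [1, 2, 2]]
mrr = mean(reciprocal_rank(r) for r in runs)
p_at_1 = mean(precision_at_1(r) for r in runs)
```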
Experiment
• The $H_{qe}$ functions operate on a top-k set for a given query, which can be augmented with a context $C_s$
• The context of a sentence was defined as
- The surrounding four sentences
- The title of its Wikipedia entry
• Each sentence is represented in three fields
- First: the sentence s
- Second: the surrounding sentences
- Third: Wikipedia title
Result
Combination > KLD > Frequency > Rarity
Sum > Average
The Role of Context
• Given a fixed query q and a fixed entity e
- $R_{qe}$: correct support sentences for (q, e)
- $S'_q$: the context in the ranking function itself
Conclusions & Future work
• Developed several features embracing different paradigms to tackle the problem
• The context of a sentence can be effectively exploited using BM25F
• The methods might have a bias toward longer sentences; sentence-length normalization could be applied
• Pursuing other linguistic features of sentences