Finding Support Sentences for Entities

Roi Blanco, Hugo Zaragoza

SIGIR '10

Advisor: Jia Ling, Koh

Speaker: Yu Cheng, Hsieh

Outline

• Introduction

• Notations

• Features for Ranking Support Sentences

• Entity Ranking

• Sentence Ranking with Entity Ranking Information

• Experiment

Introduction

• Ranking entities (e.g. experts, locations, companies) has become a standard information retrieval task.

• Showing only entities, without explanations, is not enough for users to identify how the entities relate to the query.

• Goal: retrieve and rank entity support sentences that explain the relevance of an entity with respect to a query.

Notations

• $S$: a collection of sentences (paragraphs or text windows of fixed size)

• $C_s$: the contexts surrounding a sentence $s \in S$

• $G$: sentence-entity matrix, with $G_{ij} = 1$ if entity $j$ is present in sentence $i$, and $G_{ij} = 0$ otherwise

• $e \in s$: $G_{se} = 1$

Notations

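• $S_q = \{ s \mid \mathrm{rank}_q(s) \le k \}$: top $k$ relevant sentences for query $q$

• $S'_q = \{ s \mid s \in C_{s'},\ s' \in S_q \}$: augments $S_q$ by adding the contexts $C_{s'}$ with respect to each $s' \in S_q$

• $S_{qe} = \{ s \mid s \in S_q,\ G_{se} = 1 \}$: candidate support sentences in $S_q$ for an entity $e$

• $S'_{qe} = \{ s \mid s \in S'_q,\ G_{se} = 1 \}$: candidate support sentences in $S'_q$ for an entity $e$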
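The candidate sets can be sketched in a few lines of Python. This is only an illustration: the function names, the dictionary-based stand-in for the sentence-entity matrix, and the toy data are all hypothetical, and the sketch assumes that the augmented set keeps the original top-k sentences alongside their contexts.

```python
from typing import Dict, Set


def top_k(scores: Dict[int, float], k: int) -> Set[int]:
    """Top-k set: ids of the k highest-scoring sentences for a query."""
    ranked = sorted(scores, key=lambda s: scores[s], reverse=True)
    return set(ranked[:k])


def augment(s_q: Set[int], contexts: Dict[int, Set[int]]) -> Set[int]:
    """Augmented set: the top-k sentences plus the context of each one."""
    out = set(s_q)
    for s in s_q:
        out |= contexts.get(s, set())
    return out


def candidates(pool: Set[int], G: Dict[int, Set[str]], e: str) -> Set[int]:
    """Sentences in the pool that mention entity e (G_se = 1)."""
    return {s for s in pool if e in G.get(s, set())}


# Toy data (hypothetical): retrieval scores, contexts, entity annotations.
scores = {0: 2.1, 1: 0.3, 2: 1.7, 3: 0.0}
contexts = {0: {1}, 2: {1, 3}}
G = {0: {"Paris"}, 1: {"Paris", "France"}, 2: {"France"}, 3: set()}

S_q = top_k(scores, k=2)
S_q_aug = augment(S_q, contexts)
S_qe = candidates(S_q, G, "Paris")
S_qe_aug = candidates(S_q_aug, G, "Paris")
```

In this toy case, sentence 1 mentions the entity but is not in the top-k set, so it only becomes a candidate once contexts are added.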

Features for Ranking Support Sentences

• $H_{qe}(s) = F_q(s),\ s \in S_{qe}$: uses the original score of the sentence (measured by BM25)

• $H_{qe}(s) = F_q(s, C_s),\ s \in S_{qe}$: context-aware model using BM25F

These features only consider the relevance between the query and the sentences.
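As a rough illustration of the BM25 baseline score, the sketch below scores tokenized sentences against a query. The function name `bm25_scores`, the defaults `k1=1.2` and `b=0.75`, the smoothed idf variant, and the toy sentences are assumptions for illustration, not settings reported in the paper.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: List[str], sentences: List[List[str]],
                k1: float = 1.2, b: float = 0.75) -> List[float]:
    """Score each tokenized sentence against the query with BM25."""
    n = len(sentences)
    avg_len = sum(len(s) for s in sentences) / n
    # Document frequency: number of sentences containing each term.
    df = Counter(t for s in sentences for t in set(s))
    scores = []
    for s in sentences:
        tf = Counter(s)
        score = 0.0
        for t in query:
            if tf[t] == 0:
                continue
            # Smoothed idf variant that is never negative (an assumption).
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            # Length-normalized term-frequency saturation.
            norm = tf[t] + k1 * (1 - b + b * len(s) / avg_len)
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy collection (hypothetical).
sentences = [
    ["paris", "is", "in", "france"],
    ["berlin", "is", "in", "germany"],
    ["paris", "hosts", "the", "louvre"],
]
scores = bm25_scores(["paris", "louvre"], sentences)
```

The third sentence matches both query terms and therefore outranks the first, while the second matches none and scores zero.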

Entity Ranking

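• $E_{FREQ}(q, e) = |S_{qe}|$: number of sentences containing the entity $e$ (like tf)

• $E_{RARITY}(e) = \log \frac{|S|}{\sum_{s \in S} G_{se}}$: penalizes very frequent entities (like idf)

• $E_{COMB}(q, e) = E_{FREQ}(q, e) * E_{RARITY}(e)$

• $E_{KLD}(q, e) = P(e \mid q) \log \frac{P(e \mid q)}{P(e \mid S)}$: discovers special entities, where $P(e \mid q) = \frac{|S_{qe}|}{|S_q|}$ and $P(e \mid S) = \frac{\sum_{s \in S} G_{se}}{|S|}$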
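A small worked example may help compare the four entity scores. The sketch assumes frequency is the number of top-k sentences mentioning the entity, rarity is the log of the collection size over the number of sentences mentioning it, the combination is their product, and the KLD score compares the entity's share in the top-k set with its share in the whole collection. The function name `entity_scores` and the toy data are hypothetical.

```python
import math
from typing import Dict, Set


def entity_scores(G: Dict[int, Set[str]], S_q: Set[int], e: str) -> Dict[str, float]:
    """Frequency, rarity, combined and KLD scores for entity e.

    G maps every sentence id in the collection to its entities.
    Assumes e occurs at least once in the collection.
    """
    n = len(G)
    n_with_e = sum(1 for ents in G.values() if e in ents)
    n_qe = sum(1 for s in S_q if e in G[s])
    freq = n_qe
    rarity = math.log(n / n_with_e)
    p_q = n_qe / len(S_q)   # share of top-k sentences mentioning e
    p_s = n_with_e / n      # share of all sentences mentioning e
    kld = p_q * math.log(p_q / p_s) if p_q > 0 else 0.0
    return {"freq": freq, "rarity": rarity, "comb": freq * rarity, "kld": kld}


# Toy collection of 8 sentences (hypothetical): "Europe" is everywhere,
# "Paris" is concentrated in the top-k set.
G = {i: {"Europe"} for i in range(8)}
for i in (0, 1, 2, 5):
    G[i].add("Paris")
S_q = {0, 1, 2, 3}

paris = entity_scores(G, S_q, "Paris")
europe = entity_scores(G, S_q, "Europe")
```

Here plain frequency prefers the ubiquitous entity, while the combined and KLD scores prefer the entity that is concentrated in the retrieved sentences.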

Sentence Ranking with Entity Ranking Information

$H_{qe}(s) = \sum_{e' \in s} E(q, e')$, if $e \in s$, $s \in S_{qe}$

$H_{qe}(s) = 0$, if $e \notin s$
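One way to read this feature is that a sentence inherits the scores of the entities it mentions, provided it mentions the target entity at all. The sketch below assumes the scores are aggregated by a sum, with an average as an alternative; the function name, the `average` flag, and the toy scores are hypothetical.

```python
from typing import Dict, Set


def sentence_score(entities_in_s: Set[str], e: str,
                   entity_score: Dict[str, float],
                   average: bool = False) -> float:
    """Score a sentence for entity e from the scores of its entities."""
    if e not in entities_in_s:
        return 0.0  # sentences that do not mention e cannot support it
    total = sum(entity_score.get(x, 0.0) for x in entities_in_s)
    return total / len(entities_in_s) if average else total


# Toy entity scores for one query (hypothetical values).
entity_score = {"Paris": 2.0, "Louvre": 1.5, "Europe": 0.0}

both = sentence_score({"Paris", "Louvre"}, "Paris", entity_score)
both_avg = sentence_score({"Paris", "Louvre"}, "Paris", entity_score, average=True)
alone = sentence_score({"Paris"}, "Paris", entity_score)
missing = sentence_score({"Europe"}, "Paris", entity_score)
```

With the sum, a sentence mentioning several highly ranked entities is preferred over one mentioning the target entity alone; with the average, the preference can flip.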

Position Consideration

Uses the distance between the last match of the query and the entity:

$H_{qe}(s) = \mathrm{length}(s) - \max(\mathrm{position}(q), \mathrm{position}(e))$

Experiment

• Uses the Semantically Annotated Snapshot, which contains

- 1.5M documents

- 75M sentences

- 20.3M unique named entities (using 12 first-level Wall Street Journal entity types)

• Manually built a dataset of 226 (query, entity) pairs with 45 unique queries.

Experiment

• Assessors produce queries about topics they know well.

• System produces a set of candidate entities

• Assessors eliminate the non-relevant entities with respect to the query

• System produces candidate sentences for each (query, entity) pair

Experiment

• Assessors evaluate $\mathrm{Grade}(q, e, s)$ on four levels of relevance:

1. Non-relevant

2. Fairly relevant

3. Relevant

4. Very relevant

• A triple $(q, e, s)$ is considered relevant iff $\mathrm{Grade}(q, e, s) \ge 3$

Experiment

• Measurement

- MRR

- NDCG

- P@1

- MAP

Tie-aware evaluation is used
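For reference, the plain (not tie-aware) versions of these measures can be sketched as follows. The sketch assumes binary relevance means a grade of 3 or higher on the four-level scale, that NDCG gains are computed from grade minus one with the common exponential gain, and that average precision is normalized by the number of relevant sentences in the ranked list; all function names and the toy ranking are hypothetical.

```python
import math
from typing import List


def reciprocal_rank(rels: List[int]) -> float:
    """1 / rank of the first relevant item (0 if none); MRR is its mean over queries."""
    for i, r in enumerate(rels, start=1):
        if r:
            return 1.0 / i
    return 0.0


def precision_at_1(rels: List[int]) -> float:
    """1.0 if the top-ranked item is relevant, else 0.0."""
    return 1.0 if rels and rels[0] else 0.0


def average_precision(rels: List[int]) -> float:
    """Mean of precision values at the ranks of relevant items; MAP is its mean over queries."""
    hits, total = 0, 0.0
    for i, r in enumerate(rels, start=1):
        if r:
            hits += 1
            total += hits / i
    return total / hits if hits else 0.0


def ndcg(gains: List[int]) -> float:
    """DCG of the ranking divided by the DCG of the ideal reordering."""
    def dcg(gs: List[int]) -> float:
        return sum((2 ** g - 1) / math.log2(i + 1) for i, g in enumerate(gs, start=1))
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0


# Toy ranking of four sentences for one (query, entity) pair, as assessor grades.
grades = [2, 4, 1, 3]
rels = [1 if g >= 3 else 0 for g in grades]   # assumed relevance threshold
gains = [g - 1 for g in grades]               # assumed gain mapping

rr = reciprocal_rank(rels)
p1 = precision_at_1(rels)
ap = average_precision(rels)
score = ndcg(gains)
```

A tie-aware evaluation would additionally average each measure over all orderings of sentences that receive the same score, which this sketch does not attempt.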

Experiment

• $H_{qe}$ functions operate on a top-k set for a given query, which can be augmented with a context $C_s$

• The context of a sentence was defined as

- The surrounding four sentences

- The title of its Wikipedia entry

• Each sentence is represented in three fields

- First: the sentence s

- Second: the surrounding sentences

- Third: Wikipedia title
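The three-field representation maps naturally onto BM25F, where per-field term frequencies are length-normalized, weighted, and summed before a single saturation step. The sketch below uses the simple BM25F form; the field weights, `k1`, `b`, the idf variant, and the toy documents are assumptions for illustration, not the settings used in the experiments.

```python
import math
from collections import Counter
from typing import Dict, List

Fields = Dict[str, List[str]]  # field name -> tokens


def bm25f_scores(query: List[str], docs: List[Fields],
                 weights: Dict[str, float],
                 k1: float = 1.2, b: float = 0.75) -> List[float]:
    """Score fielded sentences against the query with a simple BM25F."""
    n = len(docs)
    fields = list(weights)
    avg_len = {f: sum(len(d[f]) for d in docs) / n for f in fields}
    # Document frequency counted over all fields of a sentence.
    df = Counter()
    for d in docs:
        vocab = set()
        for f in fields:
            vocab.update(d[f])
        df.update(vocab)
    scores = []
    for d in docs:
        score = 0.0
        for t in query:
            pseudo_tf = 0.0
            for f in fields:
                tf = d[f].count(t)
                if tf == 0 or avg_len[f] == 0:
                    continue
                norm = 1 - b + b * len(d[f]) / avg_len[f]
                pseudo_tf += weights[f] * tf / norm
            if pseudo_tf > 0:
                idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
                score += idf * pseudo_tf / (k1 + pseudo_tf)
        scores.append(score)
    return scores


# Toy fielded sentences (hypothetical).
docs = [
    {"sentence": ["he", "was", "born", "there"],
     "context": ["picasso", "moved", "to", "paris"],
     "title": ["pablo", "picasso"]},
    {"sentence": ["the", "river", "is", "long"],
     "context": ["it", "flows", "south"],
     "title": ["nile"]},
]
with_context = bm25f_scores(["picasso"], docs,
                            {"sentence": 1.0, "context": 0.5, "title": 0.5})
sentence_only = bm25f_scores(["picasso"], docs,
                             {"sentence": 1.0, "context": 0.0, "title": 0.0})
```

The first sentence never names the query term itself, so it only receives a non-zero score when the context and title fields carry weight.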

Result

Combination > KLD > Frequency > Rarity

Sum > Average

The Role of Context

• Given a fixed query q and a fixed entity e

- $R_{qe}$: correct support sentences for (q, e)

- $S'_q$: the context in the ranking function itself

Conclusions & Future work

• Developed several features embracing different paradigms to tackle the problem

• The context of a sentence can be effectively exploited using BM25F

• The methods might be biased toward longer sentences; sentence normalization could be applied

• Pursuing other linguistic features of sentences
