cstaff@cs.um.edu.mt, © 2004 Chris Staff, CSAW’04, University of Malta

Expanding Query Terms in Context

Chris Staff and Robert Muscat

Department of Computer Science & AI

University of Malta

Aims of this presentation

• Background

– The Vocabulary Problem in IR

• Scenario

– Using retrieved documents to determine how to expand the query

• Approach

• Evaluation

The Vocabulary Problem

• Furnas et al., 1987, find that any two people describe the same concept/object using the same term with a probability of less than 0.2

• This is a huge problem for IR

– High probability of finding some documents about your term (but watch ambiguous terms!)

– Low probability of finding all documents about your concept (so low ‘coverage’)

What’s Query Expansion?

• Adding terms to query to improve recall while keeping precision high

• Recall is 1 when all relevant docs are retrieved

• Precision is 1 when all retrieved docs are relevant
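
The two measures above can be made concrete with a short sketch. The document identifiers and the `relevant`/`retrieved` sets below are invented for illustration; they are not data from the talk.

```python
def precision(retrieved: set, relevant: set) -> float:
    """Fraction of retrieved docs that are relevant (1.0 when all retrieved docs are relevant)."""
    if not retrieved:
        return 0.0
    return len(retrieved & relevant) / len(retrieved)


def recall(retrieved: set, relevant: set) -> float:
    """Fraction of relevant docs that were retrieved (1.0 when all relevant docs are retrieved)."""
    if not relevant:
        return 0.0
    return len(retrieved & relevant) / len(relevant)


# Hypothetical example: four relevant docs, three retrieved, two of them relevant.
relevant = {"d1", "d2", "d3", "d4"}
retrieved = {"d1", "d2", "d9"}

p = precision(retrieved, relevant)  # 2 of 3 retrieved docs are relevant
r = recall(retrieved, relevant)     # 2 of 4 relevant docs were retrieved
```

Adding a good synonym to the query would ideally enlarge `retrieved & relevant` (raising recall) without adding many non-relevant docs to `retrieved` (which would lower precision).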

What’s Query Expansion?

• Attempts to improve recall (by adding synonyms) usually involve a constructed thesaurus (Qiu et al, 1995, Mandala et al, 1999, Voorhees, 1994)

• Attempts to improve precision (by adding restricting terms) now based around automatic relevance feedback (e.g., Mitra et al, 1998)

• Indiscriminate query expansion can lead to loss of precision (Voorhees, 1994) or hurt recall

Scenario

• Two users search for information related to the same concept C

• User queries Q1 and Q2 have no terms in common

• R1 and R2 are results sets of Q1 and Q2 respectively

• Rcommon = R1 ∩ R2
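
As a minimal sketch of this scenario, assuming result sets are simply lists of document identifiers, the common results are a set intersection. The queries and document ids below are made up for illustration.

```python
def common_results(r1: list, r2: list) -> set:
    """Documents that appear in both result sets, ignoring rank."""
    return set(r1) & set(r2)


# Hypothetical queries for the same concept with no terms in common.
q1 = {"car", "rental"}
q2 = {"automobile", "hire"}

# Hypothetical result sets returned for q1 and q2.
r1 = ["d1", "d2", "d3"]
r2 = ["d3", "d4", "d5"]

r_common = common_results(r1, r2)
```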

Scenario

• We assume that Rcommon is small and non-empty (Furnas, 1985 and Furnas et al, 1987)

• If Rcommon is large then Q1 and Q2 both retrieve much the same set of documents

• Can determine (using WordNet) if any term in Q1 is a synonym of a term in Q2

– Some doc Dk in Rcommon probably includes both terms (because of the way Web IR works)!
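
The synonym check can be sketched as follows. To keep the example self-contained, WordNet is replaced by a tiny hand-written table, `TOY_SYNSETS`, whose entries are invented and only imitate WordNet-style identifiers; a real implementation would query WordNet itself (for instance through NLTK's `wordnet.synsets`). The function names are likewise assumptions for illustration.

```python
# Toy stand-in for WordNet: word -> identifiers of the synsets it belongs to.
# These entries are hypothetical illustration data, not looked up in WordNet.
TOY_SYNSETS = {
    "car": {"car.n.01"},
    "automobile": {"car.n.01"},
    "rental": {"rental.n.01"},
    "hire": {"hire.v.01"},
}


def are_synonyms(t1: str, t2: str, synsets: dict = TOY_SYNSETS) -> bool:
    """Two distinct terms count as synonyms if they share at least one synset."""
    shared = synsets.get(t1, set()) & synsets.get(t2, set())
    return t1 != t2 and bool(shared)


def synonym_pairs(q1: set, q2: set, synsets: dict = TOY_SYNSETS) -> list:
    """All (t1, t2) pairs with t1 in Q1 and t2 in Q2 that are synonyms."""
    return sorted((t1, t2) for t1 in q1 for t2 in q2 if are_synonyms(t1, t2, synsets))


pairs = synonym_pairs({"car", "rental"}, {"automobile", "hire"})
```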

Scenario

• If t1 in Q1 and t2 in Q2 are synonyms

– Can expand either term with the other in future queries containing t1 or t2

– As long as doc Dk appears in the results set (the context)
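
One way to read this rule is sketched below: a learned synonym is added only when the document in whose context it was learned shows up in the results of the new query. The `learned` mapping, its layout, and the function name are assumptions made for this sketch, not the authors' implementation.

```python
def expand_in_context(query: list, results: list, learned: dict) -> list:
    """Expand query terms with synonyms learned in the context of retrieved docs.

    learned maps (term, doc_id) -> set of synonyms learned for that term
    in the context of that document (hypothetical layout).
    """
    expanded = list(query)
    for term in query:
        for doc_id in results:
            # Only synonyms tied to a doc that is actually in the results apply.
            for synonym in sorted(learned.get((term, doc_id), ())):
                if synonym not in expanded:
                    expanded.append(synonym)
    return expanded


# Hypothetical learned pairs: "car" and "automobile" were linked via doc d3.
learned = {
    ("car", "d3"): {"automobile"},
    ("automobile", "d3"): {"car"},
}

with_context = expand_in_context(["car", "insurance"], ["d3", "d7"], learned)
without_context = expand_in_context(["car", "insurance"], ["d8"], learned)
```

In the second call d3 is not retrieved, so the query is left unexpanded.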

Approach

• ‘Learning’ synonyms in context

• Query Expansion

‘Learning’ Synonyms in Context

• A document is associated with a “bag of words”: the terms ever used to retrieve the doc

• A (term, document) pair is associated with a synset for the term in the context of the doc

– Word sense from WordNet is also recorded to reduce ambiguity
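
The bookkeeping described above could be held in a structure like the following. This is a sketch under assumptions: the class name, method names, and the sense identifier format are hypothetical, and the example data is invented.

```python
from collections import defaultdict


class SynonymContextStore:
    """Records which terms retrieved each doc, and a synset per (term, doc) pair."""

    def __init__(self):
        # doc_id -> bag of query terms ever used to retrieve the doc
        self.bag_of_words = defaultdict(set)
        # (term, doc_id) -> (word sense identifier, frozenset of synonymous words)
        self.synsets = {}

    def record_retrieval(self, doc_id: str, query_terms: list) -> None:
        """Add the terms of a query that retrieved doc_id to its bag of words."""
        self.bag_of_words[doc_id].update(query_terms)

    def record_synset(self, term: str, doc_id: str, sense: str, words: list) -> None:
        """Store the synset (with its word sense) for term in the context of doc_id."""
        self.synsets[(term, doc_id)] = (sense, frozenset(words))


# Hypothetical usage: two different queries both retrieved doc d3.
store = SynonymContextStore()
store.record_retrieval("d3", ["car", "rental"])
store.record_retrieval("d3", ["automobile", "hire"])
store.record_synset("car", "d3", "car.n.01", ["car", "automobile"])
```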

Query Expansion in Context

• Submit unexpanded original user query Q to obtain results set R

• For each document Dk in R (k is rank) retrieve synsets for terms in Q

• Same query term in context of different docs in R may yield inconsistent synsets

– Countered using Inverse Document Relevance

Inverse Document Relevance

• IDR is relative frequency with which doc d is retrieved in rank k when term q occurs in the query

• $\mathrm{IDR}_{q,d} = W_{q,d} / W_d$ (where $W_d$ is the number of times d is retrieved, and $W_{q,d}$ the number of times d is retrieved when q occurs in the query)
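
A minimal sketch of how IDR might be computed from a query log follows. The log format (pairs of query terms and retrieved document ids) and the log entries are assumptions for illustration; rank is ignored here because the formula above does not use it.

```python
from collections import Counter


def idr_table(log: list) -> dict:
    """Compute IDR for every (term, doc) pair seen in the log.

    log is a list of (query_terms, retrieved_doc_ids) pairs (hypothetical format).
    """
    w_d = Counter()   # W_d: number of times doc d is retrieved
    w_qd = Counter()  # W_{q,d}: times d is retrieved when q occurs in the query
    for query_terms, doc_ids in log:
        for d in set(doc_ids):
            w_d[d] += 1
            for q in set(query_terms):
                w_qd[(q, d)] += 1
    return {(q, d): count / w_d[d] for (q, d), count in w_qd.items()}


# Invented log: d3 is retrieved three times, twice by queries containing "car".
log = [
    (["car", "rental"], ["d1", "d3"]),
    (["automobile", "hire"], ["d3", "d4"]),
    (["car", "insurance"], ["d3"]),
]
idr = idr_table(log)
```

Here `idr[("car", "d3")]` is 2/3, while `idr[("car", "d1")]` is 1 because d1 was only ever retrieved by a query containing "car".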

Term Document Relevance

• We then re-rank documents in R based on their TDR

• $\mathrm{TDR}_{q,d,k} = \mathrm{IDR}_{q,d} \times W_{q,d,k} / W_{d,k}$

• Synsets of the top-10 re-ranked documents are merged according to word category and sense

• The synset of the most frequently occurring (word category, word sense) pair is used to expand q in the query
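
The re-ranking and synset selection can be sketched as below. The slides do not define the rank-specific counts, so this sketch assumes that W_{d,k} is the number of times d was retrieved at rank k and W_{q,d,k} the number of those times in which q occurred in the query. Merging is interpreted as taking the union of synsets that share a (word category, word sense) pair. All names and data are hypothetical.

```python
from collections import Counter


def tdr(idr_qd: float, w_qdk: int, w_dk: int) -> float:
    """TDR_{q,d,k} = IDR_{q,d} * W_{q,d,k} / W_{d,k} (0 if d was never seen at rank k)."""
    return idr_qd * w_qdk / w_dk if w_dk else 0.0


def rerank(results: list, q: str, idr: dict, w_qdk: dict, w_dk: dict) -> list:
    """Re-rank a ranked result list (rank k starts at 1) by descending TDR."""
    scored = []
    for k, d in enumerate(results, start=1):
        score = tdr(idr.get((q, d), 0.0), w_qdk.get((q, d, k), 0), w_dk.get((d, k), 0))
        scored.append((score, d))
    scored.sort(key=lambda pair: -pair[0])  # stable sort keeps original order on ties
    return [d for _, d in scored]


def pick_expansion(top_docs: list, q: str, synsets: dict) -> set:
    """Merge synsets by (category, sense) and return the most frequent one, minus q.

    synsets maps (term, doc_id) -> (category, sense, set of words).
    """
    votes, merged = Counter(), {}
    for d in top_docs:
        entry = synsets.get((q, d))
        if entry is None:
            continue
        category, sense, words = entry
        votes[(category, sense)] += 1
        merged.setdefault((category, sense), set()).update(words)
    if not votes:
        return set()
    best, _ = votes.most_common(1)[0]
    return merged[best] - {q}


# Invented example data for the ambiguous query term "jaguar".
q = "jaguar"
results = ["d1", "d2", "d3"]
idr = {(q, "d1"): 0.2, (q, "d2"): 0.5, (q, "d3"): 0.5}
w_qdk = {(q, "d1", 1): 1, (q, "d2", 2): 4, (q, "d3", 3): 2}
w_dk = {("d1", 1): 5, ("d2", 2): 4, ("d3", 3): 4}
synsets = {
    (q, "d1"): ("noun", 2, {"jaguar", "car"}),
    (q, "d2"): ("noun", 1, {"jaguar", "panther"}),
    (q, "d3"): ("noun", 1, {"jaguar", "big cat"}),
}

reranked = rerank(results, q, idr, w_qdk, w_dk)
expansion = pick_expansion(reranked[:10], q, synsets)
```

With these numbers the TDR scores are 0.04 for d1, 0.5 for d2 and 0.25 for d3, so d2 moves to the top, and the (noun, 1) sense wins the vote two to one.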

Evaluation

• Ideally, need a huge query log with relevance judgements for the queries

• We have the TREC QA collection, but we’ll need to index it before running the test queries through it (using, e.g., SMART)

– Disadvantage: there might not be enough queries

• User Studies

Thank you!