Query Expansion

By: Sean McGettrick

What is Query Expansion?

- Query expansion is the term given to a search engine adding search terms to a user’s weighted search.
- The goal is to improve precision and/or recall.
- Example: user query “car”; expanded query “car cars automobile automobiles auto”, etc.
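The "car" example can be sketched in a few lines of Python. This is only an illustration: the `THESAURUS` dictionary and the `expand_query` function are hypothetical names, and the entries are copied from the slide rather than taken from a real thesaurus.

```python
# Hypothetical hand-built thesaurus; the single entry mirrors the slide's example.
THESAURUS = {
    "car": ["cars", "automobile", "automobiles", "auto"],
}


def expand_query(query, thesaurus=THESAURUS):
    """Return the query terms followed by their related terms, without duplicates."""
    expanded = []
    for term in query.lower().split():
        # Keep the user's own term first, then append any related terms.
        for candidate in [term] + thesaurus.get(term, []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


print(" ".join(expand_query("car")))  # car cars automobile automobiles auto
```

Terms with no thesaurus entry pass through unchanged, so the expanded query is never smaller than the original one.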

Classes of Query Expansion

- Human and/or computer-generated thesauri
- Relevance feedback
- Automatic query expansion

Query Expansion Issues

- Two major issues
  - Which terms to include?
  - Which terms to weight more?
- Concept-based vs. term-based query expansion
  - Is it better to expand based upon the individual terms in the query, or the overall concept of the query?
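One possible way to picture the term-based vs. concept-based question is sketched below. The `SIMILARITY` table, its scores, and both function names are invented for illustration; the sketch assumes that "concept-based" means scoring each candidate against the whole query rather than against one term at a time.

```python
# Hypothetical term-to-term similarity scores (illustrative numbers only).
SIMILARITY = {
    "java": {"coffee": 0.8, "programming": 0.7, "island": 0.6},
    "compiler": {"programming": 0.9, "parser": 0.8, "coffee": 0.0},
}


def term_based(query_terms, k=1):
    """Pick the k most similar candidates for each query term independently."""
    picked = []
    for term in query_terms:
        ranked = sorted(SIMILARITY[term].items(), key=lambda kv: kv[1], reverse=True)
        picked.extend(word for word, _ in ranked[:k])
    return picked


def concept_based(query_terms, k=1):
    """Pick the k candidates with the highest total similarity to the whole query."""
    totals = {}
    for term in query_terms:
        for word, score in SIMILARITY[term].items():
            totals[word] = totals.get(word, 0.0) + score
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return [word for word, _ in ranked[:k]]


print(term_based(["java", "compiler"]))     # ['coffee', 'programming']
print(concept_based(["java", "compiler"]))  # ['programming']
```

In this toy data the term-based version adds "coffee" because it only looks at "java", while the concept-based version prefers "programming" because it fits both query terms.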

Relevance of Query Expansion

- Query expansion is very important on the web.
  - The amount of information on the web is always increasing.
  - In 1999, Google had 135 million pages. It now has over 3 billion.
- Search engine users follow specific trends with their searches.
  - 2-3 words
  - Broad search terms
  - Do not like to expand their queries, either through refining search terms or using Boolean operators

Thesauri

- What is a thesaurus in the IR world?
  - “Any data structure that defines semantic relatedness between words.” Schütze and Pedersen (1997)
- Often more complex than normal thesauri.
- Thought to be too broad to be useful.

The Need For Thesauri

- It was naturally assumed that pulling words from a thesaurus would increase:
  - The number of documents retrieved.
  - Possibly precision.
- The car example: “car” vs. “car, auto, automobile, vehicle, sedan, etc.”
  - Which would retrieve the largest number of documents?
  - Is larger necessarily better?
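The "is larger necessarily better?" question can be made concrete with the standard definitions of precision and recall. The document IDs below are made up for illustration; `narrow` stands in for the results of "car" and `broad` for the results of the expanded query.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for two sets of document IDs."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical document IDs, not real search results.
relevant = {1, 2, 3, 4}
narrow = {1, 2}                    # stand-in for the query "car"
broad = {1, 2, 3, 4, 5, 6, 7, 8}   # stand-in for the expanded query

print(precision_recall(narrow, relevant))  # (1.0, 0.5)
print(precision_recall(broad, relevant))   # (0.5, 1.0)
```

In this toy case the larger result set finds every relevant document but half of what it returns is noise, which is the trade-off the slide is pointing at.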

Human & Automatically Generated Thesauri

- Earliest work began in the 1950s.
  - H.P. Luhn
  - Thesaurofacet: detailed list of engineering terms
- Largely used in such industries as medicine, aerospace, and other technological fields.

Drawbacks of Handcrafted Thesauri

- Cost
  - Development.
  - Maintenance.
  - Cost often outweighs benefit.
- Time
  - It often takes a long time for thesauri to develop.
  - Hard to keep up with the pace of scientific and technological development.

Automatically Generated Thesauri

- Need grew from limitations of handcrafted thesauri.
- Removes the cost of paying experts to generate thesauri.

Automatically Generated Thesauri

- 3 steps.
  - Extract word co-occurrences.
  - Define word similarities.
    - Based upon word co-occurrence or lexical relationship.
  - Cluster words based upon their similarities.
- Not proven very successful.
  - As late as 1990 many industries were still using handcrafted thesauri.
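The three steps can be sketched as follows. The slides do not say which similarity measure or clustering algorithm was used, so cosine similarity over co-occurrence vectors and a greedy single-pass clustering are assumptions made here for illustration, as are all function names and the 0.5 threshold.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import sqrt


def cooccurrences(docs):
    """Step 1: count how often each pair of words shares a document."""
    counts = defaultdict(Counter)
    for doc in docs:
        for a, b in combinations(sorted(set(doc.lower().split())), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def similarity(counts, a, b):
    """Step 2: cosine similarity of two words' co-occurrence vectors (an assumed measure)."""
    va, vb = counts[a], counts[b]
    dot = sum(va[w] * vb[w] for w in va if w in vb)
    norm = sqrt(sum(v * v for v in va.values())) * sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0


def cluster(counts, threshold=0.5):
    """Step 3: greedy clustering; a word joins the first cluster whose first word is similar enough."""
    clusters = []
    for word in sorted(counts):
        for group in clusters:
            if similarity(counts, word, group[0]) >= threshold:
                group.append(word)
                break
        else:
            clusters.append([word])
    return clusters


docs = ["car engine", "auto engine", "car wheel", "auto wheel",
        "apple fruit", "pear fruit"]
print(cluster(cooccurrences(docs)))
```

Note that "car" and "auto" end up together even though they never appear in the same document, because they share the same neighbours.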

Relevance Feedback

- Began in the 1960s.
- Significant improvement in recall and precision over early query expansion work.
- Basic process as follows.
  - The user creates their initial query, which returns an initial result set.
  - The user then selects a list of documents that are relevant to their search.
  - The system then re-weights and/or expands the query based upon the terms in the documents.

Relevance Feedback Models

- Many different types of models.
- Depend on methods and theories behind them.
  - Vector space.
  - Probabilistic.
  - Boolean.

“Ide dec-hi” Method

- In this method, all the top-ranked relevant documents are used, as is the highest-ranked non-relevant document.
- The non-relevant document is used as a point in the vector space from which the feedback query is moved away.
- Up to 160% improvement over non-expanded queries.
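In vector terms, this method adds the relevant document vectors to the query and subtracts the single highest-ranked non-relevant one. A minimal sketch follows, assuming queries and documents are plain term-to-weight dictionaries; the function name and the step that drops non-positive weights are choices made for this example, not something the slides specify.

```python
from collections import Counter


def ide_dec_hi(query, relevant_docs, top_nonrelevant):
    """Return Q' = Q + sum(relevant docs) - highest-ranked non-relevant doc."""
    new_query = Counter(query)
    for doc in relevant_docs:
        new_query.update(doc)            # add each relevant document vector
    new_query.subtract(top_nonrelevant)  # move away from the non-relevant document
    # Assumption: terms whose weight is zero or negative are dropped.
    return {term: weight for term, weight in new_query.items() if weight > 0}


query = {"car": 1}
relevant = [{"car": 2, "engine": 1}, {"car": 1, "auto": 1}]
nonrelevant = {"car": 1, "rental": 2}
print(ide_dec_hi(query, relevant, nonrelevant))
```

Here "engine" and "auto" enter the query from the relevant documents, while "rental" is pushed out because it only appears in the non-relevant one.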

Interactive Query Expansion

- Uses a thesaurus.
- After the initial query is submitted, the system returns a list of associated and relevant words derived from both the result set and a thesaurus.
- Useful, but more research is needed.

Pseudo-relevance Feedback

- Grew from problems involved in implementing relevance feedback systems.
- Users do not like to give manual feedback to the system.

Pseudo-relevance Feedback Process

- The system returns an initial set of documents.
- The system assumes that the top n documents are relevant to the query.
- The system takes terms from these documents to re-weight the query.
- Relies largely on the system’s ability to initially retrieve relevant documents.
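The process above can be sketched as follows. This is a simplified illustration: it assumes the ranked documents are plain strings, picks expansion terms by raw frequency, and uses hypothetical names (`pseudo_relevance_expand`, `n`, `k`); a real system would typically use a better term-scoring scheme.

```python
from collections import Counter


def pseudo_relevance_expand(query_terms, ranked_docs, n=2, k=1):
    """Assume the top n ranked documents are relevant and add their k most frequent new terms."""
    counts = Counter()
    for doc in ranked_docs[:n]:
        # Only count terms that are not already part of the query.
        counts.update(t for t in doc.lower().split() if t not in query_terms)
    return list(query_terms) + [term for term, _ in counts.most_common(k)]


# Hypothetical ranked result list for the query "car".
ranked = ["car engine repair", "car engine oil", "banana bread"]
print(pseudo_relevance_expand(["car"], ranked))  # ['car', 'engine']
```

If the initial ranking is poor, for example if "banana bread" were in the top n, its terms would be pulled into the query, which is the dependence on initial retrieval quality noted in the last bullet.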

Automatic Query Expansion

- The process of automatic query expansion using computer-generated thesauri.
- Works somewhat like pseudo-relevance feedback.
- Implementation not as useful, but still widely researched.

Term Co-occurrence Measures

- Process of developing relationships between words based upon their co-occurrence in documents.
- Clustering
  - Documents that share a significant number of terms are grouped together.
  - A thesaurus is then generated from the terms in these categories.
  - Categories sometimes too narrow or broad.
  - Does not account for synonyms.

Lexical Co-Occurrence Measures

- Instead of looking at the frequency of terms in a document, the proximity of words in a document is looked at.
- Context of words becomes important.
- Some performance improvement shown in small document collections.
- Not quite as good as relevance feedback, but better than pseudo-relevance feedback.
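Proximity-based counting can be sketched with a sliding window. The window size, the function name, and the whitespace tokenization are assumptions for this example; the slides do not describe a specific measure.

```python
from collections import Counter


def window_cooccurrence(text, window=1):
    """Count unordered word pairs that appear within `window` positions of each other."""
    words = text.lower().split()
    pairs = Counter()
    for i, word in enumerate(words):
        # Look only at the next `window` words so each pair is counted once per occurrence.
        for other in words[i + 1 : i + 1 + window]:
            if other != word:
                pairs[tuple(sorted((word, other)))] += 1
    return pairs


pairs = window_cooccurrence("the red car drove past the red barn")
print(pairs[("red", "the")])  # 2
```

With a window of 1, "the" and "red" are linked twice, while "the" and "car" are not linked at all even though they share the same sentence, which is how context starts to matter.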

Current State of Query Expansion

- Query expansion technology has reached somewhat of a plateau.
- This is due to limiting factors of relevance feedback and word co-occurrence.
- Current research is attempting to refine previous research in the field.

Where To Go From Here?

- Grammatical-based thesauri
  - Syntactical relationships between words
  - Words placed into classes
  - Some improvement on small document collections. Failed on larger ones.
- AI searching
  - Mostly theory
  - Intelligent agents
  - Could be customized to reflect specific needs of the user
  - Next logical step in IR, but still far off from commercial use

Works Cited

Attardi, G., S. Di Marco and F. Sebastiani. 1998. Automated Generation of Category-Specific Thesauri for Interactive Query Expansion.

Grefenstette, G. 1992. Use of Syntactic Context to Produce Term Association Lists for Text Retrieval. In Proceedings of the 15th Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval, Copenhagen, Denmark, ed. N. Belkin, P. Ingwersen and A. M. Pejtersen: pp. 89-97. New York: ACM Press.

Ide, E. 1971. New Experiments in Relevance Feedback. In G. Salton. The SMART Retrieval System: Experiments in automatic document processing. Englewood Cliffs, NJ: Prentice-Hall.

Qiu, Y. 1993. Concept Based Query Expansion. In Proceedings of SIGIR-93, 16th ACM International Conference on Research and Development in Information Retrieval.

Schütze, H. and J. Pedersen. 1997. A Cooccurrence-based Thesaurus and Two Applications to Information Retrieval. Information Processing and Management 33, no. 3: pp. 307-318.

Walker, D. 2001. Query Expansion Using Thesauri.