Query Models
![Page 1: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/56814889550346895db59dc7/html5/thumbnails/1.jpg)
Query Models
• Use
• Types
• What do search engines do
![Page 2: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/56814889550346895db59dc7/html5/thumbnails/2.jpg)
What we have covered
• What is IR
• Evaluation
• Tokenization and properties of text
• Vector models of documents
• Web crawling
• This time
  – Query models
![Page 3: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/56814889550346895db59dc7/html5/thumbnails/3.jpg)
A Typical Web Search Engine
[Diagram labels: Users, Interface, Query Engine, Index, Indexer, Crawler, Web]
![Page 4: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/56814889550346895db59dc7/html5/thumbnails/4.jpg)
Online vs offline processing
[Diagram labels: Users, Interface, Query Engine, Web; marked Off-line: Indexer, Index, Crawler]
![Page 5: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/56814889550346895db59dc7/html5/thumbnails/5.jpg)
A Typical Web Search Engine
[Diagram labels: Users, Interface, Query Engine, Index, Indexer, Crawler, Web, Queries]
![Page 6: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/56814889550346895db59dc7/html5/thumbnails/6.jpg)
Why the interest in Queries?
• Queries are ways we interact with IR systems
  – Expression of an information need
• Nonquery methods?
• Types of queries?
![Page 7: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/56814889550346895db59dc7/html5/thumbnails/7.jpg)
Issues with Query Structures
Matching and ranking criteria
• Given a query, what documents are retrieved?
• In what order (rank)?
![Page 8: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/56814889550346895db59dc7/html5/thumbnails/8.jpg)
Types of Query Structures
Query Models (languages) – most common
• Boolean Queries
• Extended-Boolean Queries– Vector space Boolean
• Vector queries
• Natural Language Queries
• Others?
![Page 9: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/56814889550346895db59dc7/html5/thumbnails/9.jpg)
Simple query language: Boolean
– Earliest query model
– Terms + connectors (or operators)
– Terms
  • words
  • normalized (stemmed) words
  • phrases
  • thesaurus terms
– Connectors
  • AND
  • OR
  • NOT
– Ex: Beethoven AND sonata
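One common way to evaluate such a query is with set operations over an inverted index. The sketch below is only an illustration under stated assumptions: the three-document collection and the helper `postings` are made up for this example, and terms are matched as plain lowercase words with no stemming.

```python
from collections import defaultdict

# Hypothetical toy collection; the IDs and texts are invented for illustration.
docs = {
    1: "beethoven piano sonata",
    2: "beethoven symphony",
    3: "mozart piano sonata",
}

# Inverted index: term -> set of IDs of the documents that contain it.
index = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.split():
        index[word].add(doc_id)

all_ids = set(docs)


def postings(term):
    """Return the set of documents containing the term (empty if unseen)."""
    return index.get(term, set())


# AND is set intersection, OR is set union, NOT is complement.
beethoven_and_sonata = postings("beethoven") & postings("sonata")
beethoven_or_sonata = postings("beethoven") | postings("sonata")
not_beethoven = all_ids - postings("beethoven")

print(sorted(beethoven_and_sonata))  # [1]
```

Note that the result is an unordered set: a document either matches or it does not.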
![Page 10: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/56814889550346895db59dc7/html5/thumbnails/10.jpg)
Truth Tables – Boolean Logic
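| P | Q | NOT P | P AND Q | P OR Q |
|---|---|-------|---------|--------|
| 0 | 0 | TRUE  | FALSE   | FALSE  |
| 0 | 1 | TRUE  | FALSE   | TRUE   |
| 1 | 0 | FALSE | FALSE   | TRUE   |
| 1 | 1 | FALSE | TRUE    | TRUE   |

Presence of P: P = 1
Absence of P: P = 0
True = 1
False = 0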
![Page 11: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/56814889550346895db59dc7/html5/thumbnails/11.jpg)
Problems with Boolean Queries
• Ranking?
• Incorrect interpretation of the Boolean connectives AND and OR
• Example: seeking Saturday entertainment. Queries:
  – Dinner AND sports AND symphony
  – Dinner OR sports OR symphony
  – Dinner AND sports OR symphony
![Page 12: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/56814889550346895db59dc7/html5/thumbnails/12.jpg)
Order of precedence of operators
Example query: is
• A AND B
the same as
• B AND A?
• Why?
![Page 13: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/56814889550346895db59dc7/html5/thumbnails/13.jpg)
Sample Boolean Queries
• Cat
• Cat OR Dog
• Cat AND Dog
• (Cat AND Dog)
• (Cat AND Dog) OR Collar
• (Cat AND Dog) OR (Collar AND Leash)
• (Cat OR Dog) AND (Collar OR Leash)
![Page 14: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/56814889550346895db59dc7/html5/thumbnails/14.jpg)
Satisfaction of Boolean Query
• (Cat OR Dog) AND (Collar OR Leash)
  – Each of the following column combinations works:
    • Cat x x x x
    • Dog x x x x x
    • Collar x x x x
    • Leash x x x x
Others?
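The full set of term combinations that satisfy (Cat OR Dog) AND (Collar OR Leash) can be listed by brute force. This is a minimal sketch; the function name `matches` and the variable names are chosen for this example only.

```python
from itertools import product

terms = ("cat", "dog", "collar", "leash")


def matches(cat, dog, collar, leash):
    """(Cat OR Dog) AND (Collar OR Leash) over term-presence flags."""
    return (cat or dog) and (collar or leash)


# Try every presence/absence pattern of the four terms (2**4 = 16 patterns).
satisfying = [
    combo
    for combo in product([False, True], repeat=len(terms))
    if matches(*combo)
]

for combo in satisfying:
    present = [t for t, flag in zip(terms, combo) if flag]
    print(" + ".join(present))

print(len(satisfying))  # 9
```

Three presence patterns satisfy each OR group, so 3 x 3 = 9 of the 16 patterns match.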
![Page 15: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/56814889550346895db59dc7/html5/thumbnails/15.jpg)
Order of Preference
– Define order of preference
  • Ex: a OR b AND c
– Infix notation
  • Parentheses are evaluated first; operators of equal precedence are applied left to right
  • Next, NOTs are applied
  • Then ANDs
  • Then ORs
– a OR b AND c becomes
– a OR (b AND c)
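Python's own `not`, `and`, and `or` follow the same NOT, then AND, then OR ordering, so the rule can be checked directly. This sketch uses Python's operators as a stand-in for a query language; the function names are illustrative.

```python
from itertools import product


def default_reading(a, b, c):
    return a or b and c          # no parentheses: AND binds tighter than OR


def and_first(a, b, c):
    return a or (b and c)


def or_first(a, b, c):
    return (a or b) and c


rows = list(product([False, True], repeat=3))

# The unparenthesized form agrees with a OR (b AND c) on every input.
same_as_and_first = all(default_reading(*r) == and_first(*r) for r in rows)

# The two parenthesizations disagree when a is true and c is false.
differing = [r for r in rows if and_first(*r) != or_first(*r)]

print(same_as_and_first)  # True
print(differing)          # [(True, False, False), (True, True, False)]
```

The two differing rows show why the grouping matters: a document containing only term a matches a OR (b AND c) but not (a OR b) AND c.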
![Page 16: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/56814889550346895db59dc7/html5/thumbnails/16.jpg)
Infix Notation
– Usually expressed as INFIX operators in IR
  • ((a AND b) OR (c AND b))
– NOT is a UNARY PREFIX operator
  • ((a AND b) OR (c AND (NOT b)))
– AND and OR can be n-ary operators
  • (a AND b AND c AND d)
– Some rules (De Morgan revisited)
  • NOT(a) AND NOT(b) = NOT(a OR b)
  • NOT(a) OR NOT(b) = NOT(a AND b)
  • NOT(NOT(a)) = a
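Because each variable takes only two values, these identities can be verified exhaustively. A minimal check, with variable names chosen for this example:

```python
from itertools import product

pairs = list(product([False, True], repeat=2))

# NOT(a) AND NOT(b) = NOT(a OR b)
rule1 = all(((not a) and (not b)) == (not (a or b)) for a, b in pairs)

# NOT(a) OR NOT(b) = NOT(a AND b)
rule2 = all(((not a) or (not b)) == (not (a and b)) for a, b in pairs)

# NOT(NOT(a)) = a
rule3 = all((not (not a)) == a for a in (False, True))

print(rule1, rule2, rule3)  # True True True
```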
![Page 17: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/56814889550346895db59dc7/html5/thumbnails/17.jpg)
DNFs and CNFs
All queries can be rewritten as
– Disjunctive Normal Forms (DNFs)
– Conjunctive Normal Forms (CNFs)
• DNF constituents:
  – Terms (words or phrases)
  – Conjuncts (terms joined by ANDs)
  – Disjuncts (conjuncts joined by ORs)
  – Ex: (A AND B) OR (A AND NOT C)
• CNF constituents:
  – Terms (words or phrases)
  – Disjuncts (terms joined by ORs)
  – Conjuncts (disjuncts joined by ANDs)
  – Ex: (A OR B) AND (A OR NOT C)
![Page 18: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/56814889550346895db59dc7/html5/thumbnails/18.jpg)
Effect of CNFs
• All complex Boolean queries can be simplified
• Why do reference librarians like CNFs?
• ANDs reduce the size of the set returned and are easily expandable
  – So do minuses
![Page 19: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/56814889550346895db59dc7/html5/thumbnails/19.jpg)
Boolean Searching
“Measurement of the width of cracks in prestressed concrete beams”
Formal Query:
cracks AND beams AND Width_measurement AND Prestressed_concrete
[Diagram labels: Cracks, Beams, Width measurement, Prestressed concrete]
Relaxed Query:
(C AND B AND P) OR (C AND B AND W) OR (C AND W AND P) OR (B AND W AND P)
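The relaxed query accepts any three of the four concepts, which is the same as requiring at least three of them. A sketch of both views follows; the function names `relaxed_query` and `matches_relaxed` are hypothetical helpers, not part of any search system.

```python
from itertools import combinations

terms = ["cracks", "beams", "width_measurement", "prestressed_concrete"]


def relaxed_query(terms, k):
    """Build the DNF that ORs together every AND-group of k terms."""
    conjuncts = ["(" + " AND ".join(group) + ")" for group in combinations(terms, k)]
    return " OR ".join(conjuncts)


def matches_relaxed(doc_terms, terms, k):
    """Equivalent test: the document contains at least k of the query terms."""
    return sum(t in doc_terms for t in terms) >= k


query = relaxed_query(terms, 3)
print(query)

three_of_four = {"cracks", "beams", "prestressed_concrete"}
two_of_four = {"cracks", "beams"}
print(matches_relaxed(three_of_four, terms, 3))  # True
print(matches_relaxed(two_of_four, terms, 3))    # False
```

With four terms and k = 3 the generated query has four AND-groups, matching the relaxed query on the slide up to the order of the groups.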
![Page 20: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/56814889550346895db59dc7/html5/thumbnails/20.jpg)
Ordering (ranking) of Retrieved Documents
• Pure Boolean has no ordering
• Term is there or it’s not
• In practice:
  – order chronologically
  – order by total number of “hits” on query terms
    • What if one term has more hits than others?
    • Is it better to have one of each term or many of one term?
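The two questions above can be made concrete with a small comparison. This sketch assumes a made-up three-document collection and two illustrative scoring functions; neither is claimed to be what any particular system does.

```python
from collections import Counter

# Hypothetical documents, invented to show the two orderings disagreeing.
docs = {
    "d1": "cat cat cat cat",
    "d2": "cat dog",
    "d3": "dog collar",
}
query = ["cat", "dog"]


def total_hits(text, query):
    """Total number of occurrences of any query term."""
    counts = Counter(text.split())
    return sum(counts[t] for t in query)


def distinct_terms(text, query):
    """Number of different query terms that appear at least once."""
    words = set(text.split())
    return sum(t in words for t in query)


# Ordering 1: many hits on one term can dominate.
by_hits = sorted(docs, key=lambda d: total_hits(docs[d], query), reverse=True)

# Ordering 2: prefer documents covering more query terms, then total hits.
by_coverage = sorted(
    docs,
    key=lambda d: (distinct_terms(docs[d], query), total_hits(docs[d], query)),
    reverse=True,
)

print(by_hits)      # ['d1', 'd2', 'd3']
print(by_coverage)  # ['d2', 'd1', 'd3']
```

Document d1 wins on raw hits even though it never mentions "dog", while d2 wins once coverage of each query term is counted first.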
![Page 21: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/56814889550346895db59dc7/html5/thumbnails/21.jpg)
Boolean Query - Summary
• Advantages
  – simple queries are easy to understand
  – relatively easy to implement
• Disadvantages
  – difficult to specify what is wanted
  – too much returned, or too little
  – ordering not well determined
• Dominant language in commercial systems until the WWW
![Page 22: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/56814889550346895db59dc7/html5/thumbnails/22.jpg)
Vector Space Model
• Queries treated as small documents
• Documents and queries are represented as vectors in term space
  – Terms are usually stems
  – Documents represented by binary vectors of terms
• Query and document weights are based on length and direction of their vector
• A vector distance measure between the query and documents is used to rank retrieved documents
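As an illustration of ranking by a vector measure, the sketch below uses cosine similarity over raw term counts. Cosine is one common choice and is an assumption here, since the slide does not name a specific measure; the documents are invented, and words are used as-is rather than stemmed.

```python
import math
from collections import Counter


def to_vector(text):
    """Sparse term-count vector: only terms that occur are stored."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine of the angle between two sparse vectors (0.0 if either is empty)."""
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical collection; the query is treated as a small document.
docs = {"d1": "white dog", "d2": "white house", "d3": "dog dog house"}
query = to_vector("dog house")

scores = {d: cosine(query, to_vector(text)) for d, text in docs.items()}
ranked = sorted(docs, key=scores.get, reverse=True)

print(ranked[0])  # d3
```

Here d3 scores about 0.95 because it shares both query terms, while d1 and d2 each share one term and score 0.5.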
![Page 23: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/56814889550346895db59dc7/html5/thumbnails/23.jpg)
Document Vectors
• Documents are represented as “bags of words”
  – Words are terms with no order
• Represented as vectors when used computationally
  – A vector is like an array of floating point values
  – Has direction and magnitude
  – Each vector holds a place for every term in the collection
  – Therefore, most vectors are sparse
![Page 24: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/56814889550346895db59dc7/html5/thumbnails/24.jpg)
Queries
Vocabulary (dog, house, white)
Queries:
• dog (1,0,0)
• house (0,1,0)
• white (0,0,1)
• house and dog (1,1,0)
• dog and house (1,1,0)
• Show 3-D space plot
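The query vectors above can be produced mechanically from the vocabulary. A minimal sketch, assuming binary weights and whitespace tokenization; `query_vector` is a name chosen for this example.

```python
vocabulary = ("dog", "house", "white")


def query_vector(query, vocabulary=vocabulary):
    """Binary vector with one slot per vocabulary term, in vocabulary order."""
    words = set(query.lower().split())
    return tuple(1 if term in words else 0 for term in vocabulary)


print(query_vector("dog"))            # (1, 0, 0)
print(query_vector("house and dog"))  # (1, 1, 0)
print(query_vector("dog and house"))  # (1, 1, 0)
```

Word order has no effect, and "and" is simply ignored because it is not in the vocabulary, so both two-word queries map to the same point.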
![Page 25: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/56814889550346895db59dc7/html5/thumbnails/25.jpg)
Documents (queries) in Vector Space
[3-D plot: documents D1 through D11 plotted in a term space with axes t1, t2, t3]
![Page 26: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/56814889550346895db59dc7/html5/thumbnails/26.jpg)
Documents in 3D Space
Assumption: Documents that are “close together” in space are similar in meaning.
![Page 27: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/56814889550346895db59dc7/html5/thumbnails/27.jpg)
Vector Query Problems
• Significance of queries
  – Can different values be placed on the different terms, e.g. 2dog 1house?
• Scaling: size of vectors
• Number of words in the dictionary?
  – 100,000
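Both issues can be sketched briefly: term weights replace the 0/1 entries, and a dictionary keyed by term avoids storing a slot for every word in a large vocabulary. The helper names below are assumptions made for this example.

```python
vocabulary = ("dog", "house", "white")


def weighted_query(weights, vocabulary):
    """Dense vector: the given weight for each listed term, 0.0 elsewhere."""
    return [float(weights.get(term, 0)) for term in vocabulary]


def sparse_dot(u, v):
    """Dot product of two sparse vectors stored as {term: weight} dicts."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the shorter vector
    return sum(weight * v[term] for term, weight in u.items() if term in v)


# "2dog 1house": dog counts twice as much as house.
dense = weighted_query({"dog": 2, "house": 1}, vocabulary)
print(dense)  # [2.0, 1.0, 0.0]

# With a 100,000-word dictionary a dense vector has 100,000 slots;
# the sparse form stores only the terms that actually occur.
sparse_query = {"dog": 2.0, "house": 1.0}
sparse_doc = {"dog": 1.0, "white": 3.0}
print(sparse_dot(sparse_query, sparse_doc))  # 2.0
```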
![Page 28: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/56814889550346895db59dc7/html5/thumbnails/28.jpg)
Proximity Searches
• Proximity: terms occur within K positions of one another
  – pen w/5 paper
• A “Near” function can be more vague
  – near(pen, paper)
• Sometimes order can be specified
• Also, phrases and collocations
  – “United Nations” “Bill Clinton”
• Phrase variants
  – “retrieval of information” “information retrieval”
Proximity - wikipedia
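A proximity test such as pen w/5 paper can be sketched from token positions. The exact meaning of "within K" varies between systems; this example assumes the difference in token positions must be at most K, and the function names and the sample sentence are made up.

```python
def positions(tokens, term):
    """All positions at which the term occurs in the token list."""
    return [i for i, tok in enumerate(tokens) if tok == term]


def within(tokens, a, b, k, ordered=False):
    """True if some occurrence of a and some occurrence of b are at most
    k positions apart. With ordered=True, a must come before b."""
    for i in positions(tokens, a):
        for j in positions(tokens, b):
            distance = j - i
            if ordered and distance <= 0:
                continue
            if 0 < abs(distance) <= k:
                return True
    return False


tokens = "put the pen down next to the paper".split()

print(within(tokens, "pen", "paper", 5))                 # True
print(within(tokens, "pen", "paper", 4))                 # False
print(within(tokens, "paper", "pen", 5, ordered=True))   # False
```

In the sample sentence "pen" is at position 2 and "paper" at position 7, so the pair is exactly five positions apart.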
![Page 29: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/56814889550346895db59dc7/html5/thumbnails/29.jpg)
Filters/field limiters
• Filters: reduce set of candidate docs
• Often specified simultaneously with query
• Usually restrictions on metadata
  – restrict by:
    • date range
    • internet domain (.edu .com .berkeley.edu)
    • author
    • size
    • limit number of documents returned
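The restrictions listed above can be sketched as a metadata filter applied to candidate documents. The `Doc` record, its field names, and the sample data are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Doc:
    host: str          # e.g. "www.berkeley.edu"
    author: str
    published: date
    size_bytes: int


def in_domain(host, domain):
    """True if host equals the domain or is a subdomain of it."""
    suffix = domain.lstrip(".")
    return host == suffix or host.endswith("." + suffix)


def apply_filters(docs, domain=None, author=None, start=None, end=None,
                  max_size=None, limit=None):
    """Keep documents passing every filter that was supplied, in input order."""
    kept = []
    for d in docs:
        if domain is not None and not in_domain(d.host, domain):
            continue
        if author is not None and d.author != author:
            continue
        if start is not None and d.published < start:
            continue
        if end is not None and d.published > end:
            continue
        if max_size is not None and d.size_bytes > max_size:
            continue
        kept.append(d)
    return kept if limit is None else kept[:limit]


docs = [
    Doc("www.berkeley.edu", "lee", date(2009, 3, 1), 12_000),
    Doc("example.com", "lee", date(2009, 6, 1), 8_000),
    Doc("cs.stanford.edu", "kim", date(2008, 1, 15), 50_000),
]

edu_2009 = apply_filters(docs, domain=".edu",
                         start=date(2009, 1, 1), end=date(2009, 12, 31))
print([d.host for d in edu_2009])  # ['www.berkeley.edu']
```

Filters only shrink the candidate set; they do not affect how the remaining documents are scored against the query.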
![Page 30: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/56814889550346895db59dc7/html5/thumbnails/30.jpg)
Natural Language Queries
• The “Holy Grail” of information retrieval
• Issues in Natural Language Processing
– syntax
– semantics
– pragmatics
– speech understanding
– speech generation
![Page 31: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/56814889550346895db59dc7/html5/thumbnails/31.jpg)
What do search engines do?
• Tags
  – Title
  – Meta
• Term frequency and location
• Popularity
• Others
![Page 32: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/56814889550346895db59dc7/html5/thumbnails/32.jpg)
What do search engines do?
• Collection of various methods, sometimes called pseudo-Boolean
  – quotes, minus, plus
  – pseudo AND
    • truth in vs in truth
  – stop words?
![Page 33: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/56814889550346895db59dc7/html5/thumbnails/33.jpg)
What does Google do?
• Basic search
• Search operators
![Page 34: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/56814889550346895db59dc7/html5/thumbnails/34.jpg)
http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/SearchEngines.html
UC Berkeley Search Engine Guide
![Page 36: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/56814889550346895db59dc7/html5/thumbnails/36.jpg)
Old: Search Engine Query Differences
![Page 37: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/56814889550346895db59dc7/html5/thumbnails/37.jpg)
Older: Search engine query models
![Page 38: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/56814889550346895db59dc7/html5/thumbnails/38.jpg)
Search query string
• The portion of a dynamic URL that contains the search parameters when a dynamic Web site is searched. Query strings do not exist until a user plugs the variables into a database search, at which point the search engine will create the dynamic URL with the query string based on the results. Query strings typically contain ? and % characters.
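To see where the ? and % characters come from, the sketch below builds and then parses a query string with Python's standard `urllib.parse` module. The host `search.example.com` and the parameter names `q` and `page` are placeholders, not taken from any real site.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

base = "https://search.example.com/results"  # hypothetical endpoint
params = {"q": "cats & dogs", "page": 2}

# urlencode joins key=value pairs with "&" and escapes unsafe characters:
# spaces become "+", and the literal "&" in the search text becomes "%26".
query_string = urlencode(params)
url = base + "?" + query_string
print(url)  # https://search.example.com/results?q=cats+%26+dogs&page=2

# Parsing reverses the escaping; every value comes back as a list of strings.
parsed = parse_qs(urlsplit(url).query)
print(parsed)  # {'q': ['cats & dogs'], 'page': ['2']}
```

The ? separates the path from the query string, and the % sequences are percent-encoded characters that would otherwise be ambiguous inside a URL.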
![Page 39: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/56814889550346895db59dc7/html5/thumbnails/39.jpg)
Lucene Basics
• Searches are supported through a wide range of Query options
  – Keyword
  – Terms
  – Phrases
  – Wildcards
  – Many, many more
![Page 40: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/56814889550346895db59dc7/html5/thumbnails/40.jpg)
QueryParser syntax examples

| Query expression | Document matches if… |
|---|---|
| java | Contains the term java in the default field |
| java junit; java OR junit | Contains the term java or junit or both in the default field (the default operator can be changed to AND) |
| +java +junit; java AND junit | Contains both java and junit in the default field |
| title:ant | Contains the term ant in the title field |
| title:extreme -subject:sports | Contains extreme in the title and not sports in subject |
| (agile OR extreme) AND java | Boolean expression matches |
| title:"junit in action" | Phrase matches in title |
| title:"junit action"~5 | Proximity matches (within 5) in title |
| java* | Wildcard matches |
| java~ | Fuzzy matches |
| lastmodified:[1/1/09 TO 12/31/09] | Range matches |
![Page 41: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/56814889550346895db59dc7/html5/thumbnails/41.jpg)
Types of Query Structures
Query Models (languages) – most common
• Boolean Queries
  – Old model
• Vector queries
  – Very common - in all search engines to some extent
• Web queries
  – Search engines
• Probabilistic models
  – Mostly research (Indri)
• Holy grail of search
  – Natural Language Queries