Interaction models between Query Clients, Information Resources & Discovery Services
![Page 1: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/5681488b550346895db5a13d/html5/thumbnails/1.jpg)
Query Models
• Use
• Types
• What do search engines do
![Page 2: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/5681488b550346895db5a13d/html5/thumbnails/2.jpg)
What we have covered
• What is IR
• Evaluation
• Tokenization and properties of text
• Web crawling
• This time
– Query models
![Page 3: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/5681488b550346895db5a13d/html5/thumbnails/3.jpg)
A Typical Web Search Engine
[Diagram components: Users, Interface, Query Engine, Index, Indexer, Crawler, Web]
![Page 4: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/5681488b550346895db5a13d/html5/thumbnails/4.jpg)
A Typical Web Search Engine: Queries
[Diagram components: Users, Interface, Query Engine, Index, Indexer, Crawler, Web]
![Page 5: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/5681488b550346895db5a13d/html5/thumbnails/5.jpg)
Why the interest in Queries?
• Queries are how we interact with IR systems
– Expression of an information need
• Nonquery methods?
• Types of queries?
![Page 6: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/5681488b550346895db5a13d/html5/thumbnails/6.jpg)
Issues with Query Structures
Matching and ranking criteria
• Given a query, what documents are retrieved?
• In what order (rank)?
![Page 7: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/5681488b550346895db5a13d/html5/thumbnails/7.jpg)
Types of Query Structures
Query Models (languages) – most common
• Boolean Queries
• Extended-Boolean Queries
• Natural Language Queries
• Vector queries
• Others?
![Page 8: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/5681488b550346895db5a13d/html5/thumbnails/8.jpg)
Simple query language: Boolean
– Earliest query model
– Terms + Connectors (or operators)
– terms
• words
• normalized (stemmed) words
• phrases
• thesaurus terms
– connectors
• AND
• OR
• NOT
– Ex: Beethoven AND sonata
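A minimal sketch of how the three connectors could map onto set operations. The tiny inverted index, the document ids, and the helper `postings` are made up for illustration and are not from the slides.

```python
# Toy inverted index: term -> set of document ids (hypothetical data).
index = {
    "beethoven": {1, 2, 4},
    "sonata": {2, 3, 4},
    "mozart": {3, 5},
}
all_docs = {1, 2, 3, 4, 5}


def postings(term):
    """Return the set of documents containing the term (empty if unknown)."""
    return index.get(term.lower(), set())


# Beethoven AND sonata: documents containing both terms.
and_result = postings("Beethoven") & postings("sonata")
# Beethoven OR sonata: documents containing at least one term.
or_result = postings("Beethoven") | postings("sonata")
# NOT Beethoven: every document that lacks the term.
not_result = all_docs - postings("Beethoven")

print(and_result, or_result, not_result)
```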
![Page 9: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/5681488b550346895db5a13d/html5/thumbnails/9.jpg)
Truth Tables – Boolean Logic
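| P | Q | NOT P | P AND Q | P OR Q |
|---|---|-------|---------|--------|
| 0 | 0 | TRUE  | FALSE   | FALSE  |
| 0 | 1 | TRUE  | FALSE   | TRUE   |
| 1 | 0 | FALSE | FALSE   | TRUE   |
| 1 | 1 | FALSE | TRUE    | TRUE   |

Presence of P: P = 1
Absence of P: P = 0
True = 1
False = 0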
![Page 10: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/5681488b550346895db5a13d/html5/thumbnails/10.jpg)
Problems with Boolean Queries
• Ranking?
• Incorrect interpretation of Boolean connectives AND and OR
• Example - Seeking Saturday entertainment
Queries:
• Dinner AND sports AND symphony
• Dinner OR sports OR symphony
• Dinner AND sports OR symphony
![Page 11: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/5681488b550346895db5a13d/html5/thumbnails/11.jpg)
Order of precedence of operators
Example query: is
• A AND B
the same as
• B AND A?
• Why?
![Page 12: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/5681488b550346895db5a13d/html5/thumbnails/12.jpg)
Sample Boolean Queries
• Cat
• Cat OR Dog
• Cat AND Dog
• (Cat AND Dog)
• (Cat AND Dog) OR Collar
• (Cat AND Dog) OR (Collar AND Leash)
• (Cat OR Dog) AND (Collar OR Leash)
![Page 13: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/5681488b550346895db5a13d/html5/thumbnails/13.jpg)
Satisfaction of Boolean Query
• (Cat OR Dog) AND (Collar OR Leash)
– Each of the following column combinations works:
• Cat x x x x
• Dog x x x x x
• Collar x x x x
• Leash x x x x
Others?
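One way to answer "Others?" is to enumerate every presence/absence combination of the four terms and keep those that satisfy the query. This is an illustrative sketch; the names `TERMS`, `satisfies`, and `matching` are assumptions introduced here.

```python
from itertools import product

TERMS = ("cat", "dog", "collar", "leash")


def satisfies(present):
    """(Cat OR Dog) AND (Collar OR Leash) for a set of present terms."""
    return ("cat" in present or "dog" in present) and (
        "collar" in present or "leash" in present
    )


# Try all 16 presence/absence combinations of the four terms.
matching = []
for flags in product([False, True], repeat=len(TERMS)):
    present = {term for term, flag in zip(TERMS, flags) if flag}
    if satisfies(present):
        matching.append(present)

for combo in matching:
    print(sorted(combo))
```

Three ways to satisfy each parenthesized OR give nine satisfying combinations out of sixteen.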
![Page 14: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/5681488b550346895db5a13d/html5/thumbnails/14.jpg)
Order of Precedence
– Define order of precedence
• Ex: a OR b AND c
– Infix notation
• Parentheses are evaluated first; operators of equal precedence are applied left to right
• Next, NOTs are applied
• Then ANDs
• Then ORs
– a OR b AND c becomes
– a OR (b AND c)
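Python's own `not`, `and`, `or` operators follow the same NOT, then AND, then OR order, so the rule can be checked directly. This is a small sketch; the function names are invented for illustration.

```python
from itertools import product


def default_reading(a, b, c):
    # No parentheses: Python binds "and" tighter than "or".
    return a or b and c


def and_first(a, b, c):
    return a or (b and c)


def or_first(a, b, c):
    return (a or b) and c


rows = list(product([False, True], repeat=3))

# The unparenthesized query agrees with a OR (b AND c) on every row.
same_as_and_first = all(default_reading(*r) == and_first(*r) for r in rows)

# Rows where the two groupings disagree, i.e. where precedence matters.
differs = [r for r in rows if and_first(*r) != or_first(*r)]

print(same_as_and_first, differs)
```

The two groupings disagree exactly when a is true and c is false, which is why an explicit precedence rule (or parentheses) is needed.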
![Page 15: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/5681488b550346895db5a13d/html5/thumbnails/15.jpg)
Infix Notation
– Usually expressed as INFIX operators in IR
• ((a AND b) OR (c AND b))
– NOT is a UNARY PREFIX operator
• ((a AND b) OR (c AND (NOT b)))
– AND and OR can be n-ary operators
• (a AND b AND c AND d)
– Some rules (De Morgan revisited)
• NOT(a) AND NOT(b) = NOT(a OR b)
• NOT(a) OR NOT(b) = NOT(a AND b)
• NOT(NOT(a)) = a
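The three rules can be checked on document sets, where NOT is the complement within the collection. The collection and the two result sets below are made-up illustration data.

```python
# Hypothetical collection of 8 documents and two term result sets.
universe = set(range(1, 9))
a = {1, 2, 3, 4}
b = {3, 4, 5, 6}


def NOT(docs):
    """Complement relative to the whole collection."""
    return universe - docs


law_and = (NOT(a) & NOT(b)) == NOT(a | b)   # NOT(a) AND NOT(b) = NOT(a OR b)
law_or = (NOT(a) | NOT(b)) == NOT(a & b)    # NOT(a) OR NOT(b) = NOT(a AND b)
double_negation = NOT(NOT(a)) == a          # NOT(NOT(a)) = a

print(law_and, law_or, double_negation)
```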
![Page 16: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/5681488b550346895db5a13d/html5/thumbnails/16.jpg)
DNFs and CNFs
All queries can be rewritten as
– Disjunctive Normal Forms (DNFs)
– Conjunctive Normal Forms (CNFs)
• DNF Constituents:
– Terms (words or phrases)
– Conjuncts (terms joined by ANDs)
– Disjuncts (conjuncts joined by ORs)
– Ex: (A AND B) OR (A AND NOT C)
• CNF Constituents:
– Terms (words or phrases)
– Disjuncts (terms joined by ORs)
– Conjuncts (disjuncts joined by ANDs)
– Ex: (A OR B) AND (A OR NOT C)
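A sketch of how the two normal forms might be stored and evaluated. Here a literal is a `(term, wanted)` pair, with `wanted=False` standing for NOT; this representation and the function names are assumptions, not something the slides prescribe.

```python
def literal_holds(literal, doc_terms):
    """A literal is (term, wanted); wanted=False means NOT term."""
    term, wanted = literal
    return (term in doc_terms) == wanted


def matches_dnf(dnf, doc_terms):
    # DNF: OR over conjuncts, each conjunct an AND over literals.
    return any(all(literal_holds(l, doc_terms) for l in conj) for conj in dnf)


def matches_cnf(cnf, doc_terms):
    # CNF: AND over disjuncts, each disjunct an OR over literals.
    return all(any(literal_holds(l, doc_terms) for l in disj) for disj in cnf)


# (A AND B) OR (A AND NOT C)
dnf_example = [[("A", True), ("B", True)], [("A", True), ("C", False)]]
# (A OR B) AND (A OR NOT C)
cnf_example = [[("A", True), ("B", True)], [("A", True), ("C", False)]]

print(matches_dnf(dnf_example, {"A", "D"}), matches_cnf(cnf_example, {"B"}))
```

The two slide examples use the same literals but are different queries: a document containing only B fails the DNF example and satisfies the CNF example.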
![Page 17: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/5681488b550346895db5a13d/html5/thumbnails/17.jpg)
Effect of CNFs
• All complex Boolean queries can be simplified
• Why do reference librarians like CNFs?
• ANDs reduce the size of the set returned and are easily expandable
– So do minuses
![Page 18: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/5681488b550346895db5a13d/html5/thumbnails/18.jpg)
Boolean Searching
“Measurement of the width of cracks in prestressed concrete beams”
Formal Query:
cracks AND beams AND Width_measurement AND Prestressed_concrete
[Diagram labels: Cracks, Beams, Width measurement, Prestressed concrete]
Relaxed Query:
(C AND B AND P) OR
(C AND B AND W) OR
(C AND W AND P) OR
(B AND W AND P)
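The relaxed query is the OR of every three-term conjunct, which is the same as requiring at least three of the four terms. A sketch of both views follows; the function names are invented for illustration.

```python
from itertools import combinations


def relaxed_query(terms, k):
    """Build the OR of every k-term AND conjunct as a query string."""
    conjuncts = ("(" + " AND ".join(combo) + ")" for combo in combinations(terms, k))
    return " OR ".join(conjuncts)


def matches_relaxed(doc_terms, terms, k):
    """Equivalent test: the document contains at least k of the terms."""
    return sum(term in doc_terms for term in terms) >= k


terms = ["cracks", "beams", "width_measurement", "prestressed_concrete"]
print(relaxed_query(["C", "B", "W", "P"], 3))
print(matches_relaxed({"cracks", "beams", "prestressed_concrete"}, terms, 3))
```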
![Page 19: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/5681488b550346895db5a13d/html5/thumbnails/19.jpg)
Pseudo-Boolean Queries
• A new notation, from web search
– +cat dog +collar leash
• Does not mean the same thing!
• Need a way to group combinations.
• Phrases:
– “stray cat” AND “frayed collar”
– +“stray cat” +“frayed collar”
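A sketch of one common reading of the + notation: terms marked with + must be present, and unmarked terms are optional and only raise the score. This interpretation and the helper names are assumptions; engines differed in detail, and quoted phrases are not handled here.

```python
def parse_plus_query(query):
    """Split a query into required (+term) and optional terms."""
    required, optional = [], []
    for token in query.split():
        if token.startswith("+") and len(token) > 1:
            required.append(token[1:].lower())
        else:
            optional.append(token.lower())
    return required, optional


def score(doc_terms, required, optional):
    """Return None if a required term is missing, else count optional hits."""
    if not all(term in doc_terms for term in required):
        return None
    return sum(term in doc_terms for term in optional)


required, optional = parse_plus_query("+cat dog +collar leash")
print(required, optional)
print(score({"cat", "collar", "leash"}, required, optional))
```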
![Page 20: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/5681488b550346895db5a13d/html5/thumbnails/20.jpg)
Ordering (ranking) of Retrieved Documents
• Pure Boolean has no ordering
• Term is there or it’s not
• In practice:
– order chronologically
– order by total number of “hits” on query terms
• What if one term has more hits than others?
• Is it better to have one of each term or many of one term?
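The last two questions can be made concrete with two made-up documents: one repeats a single query term, the other contains each term once. The two scoring functions below are illustrative assumptions, not a recommended ranking.

```python
from collections import Counter

# Hypothetical documents, already tokenized.
docs = {
    "d1": "cat cat cat cat collar".split(),
    "d2": "cat dog collar leash".split(),
}
query = ["cat", "dog", "collar", "leash"]


def total_hits(tokens, query):
    """Total occurrences of any query term in the document."""
    counts = Counter(tokens)
    return sum(counts[term] for term in query)


def distinct_terms(tokens, query):
    """Number of different query terms that occur at least once."""
    present = set(tokens)
    return sum(term in present for term in query)


by_hits = sorted(docs, key=lambda d: total_hits(docs[d], query), reverse=True)
by_coverage = sorted(docs, key=lambda d: distinct_terms(docs[d], query), reverse=True)
print(by_hits, by_coverage)
```

Ordering by total hits favors the document that repeats one term, while ordering by distinct terms favors the one that covers the whole query.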
![Page 21: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/5681488b550346895db5a13d/html5/thumbnails/21.jpg)
Boolean Query - Summary
• Advantages
– simple queries are easy to understand
– relatively easy to implement
• Disadvantages
– difficult to specify what is wanted
– too much returned, or too little
– ordering not well determined
• Dominant language in commercial systems until the WWW
![Page 22: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/5681488b550346895db5a13d/html5/thumbnails/22.jpg)
Vector Space Model
• Documents and queries are represented as vectors in term space
– Terms are usually stems
– Documents represented by binary vectors of terms
• Queries represented the same as documents
• Query and Document weights are based on length and direction of their vector
• A vector distance measure between the query and documents is used to rank retrieved documents
![Page 23: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/5681488b550346895db5a13d/html5/thumbnails/23.jpg)
Document Vectors
• Documents are represented as “bags of words”
– Words are terms with no order
• Represented as vectors when used computationally
– A vector is like an array of floating point values
– Has direction and magnitude
– Each vector holds a place for every term in the collection
– Therefore, most vectors are sparse
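Because most entries are zero, a sparse vector is often stored as a term-to-weight mapping rather than a full array. The sketch below ranks made-up documents against a query using cosine similarity; the slides only say "a vector distance measure", so the choice of cosine is an assumption.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {term: weight}."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical documents; absent terms are implicit zeros.
docs = {
    "D1": {"dog": 1.0, "house": 1.0},
    "D2": {"white": 1.0},
    "D3": {"dog": 1.0},
}
query = {"dog": 1.0}

ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
print(ranked)
```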
![Page 24: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/5681488b550346895db5a13d/html5/thumbnails/24.jpg)
Queries
Vocabulary (dog, house, white)
Queries:
• dog (1,0,0)
• house (0,1,0)
• white (0,0,1)
• house and dog (1,1,0)
• dog and house (1,1,0)
• Show 3-D space plot
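The query vectors above can be produced mechanically from the vocabulary. In this sketch, the helper `to_vector` is a hypothetical name; words outside the vocabulary, such as "and", are simply ignored.

```python
vocabulary = ("dog", "house", "white")


def to_vector(text):
    """Binary vector: 1 where the vocabulary term occurs in the text."""
    words = set(text.lower().split())
    return tuple(1 if term in words else 0 for term in vocabulary)


print(to_vector("dog"))
print(to_vector("house and dog"))
print(to_vector("dog and house"))
```

Word order is lost in the vector, so "house and dog" and "dog and house" are the same query.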
![Page 25: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/5681488b550346895db5a13d/html5/thumbnails/25.jpg)
Documents (queries) in Vector Space
[Plot: documents D1 to D11 shown as points in a space with term axes t1, t2, t3]
![Page 26: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/5681488b550346895db5a13d/html5/thumbnails/26.jpg)
Documents in 3D Space
Assumption: Documents that are “close together” in space are similar in meaning.
![Page 27: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/5681488b550346895db5a13d/html5/thumbnails/27.jpg)
Vector Query Problems
• Significance of queries
– Can different values be placed on the different terms – e.g. 2dog 1house
• Scaling – size of vectors
• Number of words in the dictionary?
• 100,000
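A sketch of what "2dog 1house" could mean in practice: the query vector carries weights instead of 0/1 values, and a dot product scores each document. The documents and the dot-product scoring are illustrative assumptions.

```python
vocabulary = ("dog", "house", "white")

# "2dog 1house": dog counts twice as much as house; white is not requested.
weighted_query = (2, 1, 0)

# Hypothetical binary document vectors over the same vocabulary.
docs = {
    "D1": (1, 0, 0),
    "D2": (0, 1, 0),
    "D3": (0, 1, 1),
}


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


scores = {name: dot(weighted_query, vec) for name, vec in docs.items()}
print(scores)
```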
![Page 28: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/5681488b550346895db5a13d/html5/thumbnails/28.jpg)
Proximity Searches
• Proximity: terms occur within K positions of one another
– pen w/5 paper
• A “Near” function can be more vague
– near(pen, paper)
• Sometimes order can be specified
• Also, Phrases and Collocations
– “United Nations” “Bill Clinton”
• Phrase Variants
– “retrieval of information” “information retrieval”
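A minimal sketch of a "within K positions" test over a tokenized document, with an optional order constraint. The function name `within` and its parameters are assumptions; a real engine would read positions from a positional index instead of rescanning the text.

```python
def within(tokens, a, b, k, ordered=False):
    """True if a and b occur within k positions of one another.

    With ordered=True, a must come before b.
    """
    positions_a = [i for i, tok in enumerate(tokens) if tok == a]
    positions_b = [i for i, tok in enumerate(tokens) if tok == b]
    for i in positions_a:
        for j in positions_b:
            gap = j - i
            if ordered and gap <= 0:
                continue
            if 0 < abs(gap) <= k:
                return True
    return False


tokens = "put the pen down on the paper".split()
print(within(tokens, "pen", "paper", 5))                  # pen w/5 paper
print(within(tokens, "paper", "pen", 5, ordered=True))
```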
![Page 29: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/5681488b550346895db5a13d/html5/thumbnails/29.jpg)
Filters
• Filters: Reduce set of candidate docs
• Often specified simultaneously with query
• Usually restrictions on metadata
– restrict by:
• date range
• internet domain (.edu .com .berkeley.edu)
• author
• size
• limit number of documents returned
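A sketch of metadata filtering applied to a candidate set. The `Doc` record, its field names, and the sample hosts are hypothetical; only the kinds of restriction (date range, domain, result limit) come from the slide.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Doc:
    host: str
    author: str
    published: date
    size_bytes: int


def apply_filters(docs, domain=None, start=None, end=None, limit=None):
    """Keep documents matching every given restriction, then cap the count."""
    kept = []
    for doc in docs:
        if domain and not doc.host.endswith(domain):
            continue
        if start and doc.published < start:
            continue
        if end and doc.published > end:
            continue
        kept.append(doc)
    return kept[:limit]  # limit=None keeps everything


candidates = [
    Doc("www.lib.berkeley.edu", "a", date(2001, 5, 1), 1000),
    Doc("example.com", "b", date(2003, 1, 15), 2000),
    Doc("cs.example.edu", "c", date(2004, 7, 9), 3000),
]
edu_only = apply_filters(candidates, domain=".edu")
recent_edu = apply_filters(candidates, domain=".edu", start=date(2002, 1, 1))
print([d.host for d in edu_only], [d.host for d in recent_edu])
```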
![Page 30: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/5681488b550346895db5a13d/html5/thumbnails/30.jpg)
Natural Language Queries
• The “Holy Grail” of information retrieval
• Issues in Natural Language Processing
– syntax
– semantics
– pragmatics
– speech understanding
– speech generation
![Page 31: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/5681488b550346895db5a13d/html5/thumbnails/31.jpg)
What do search engines do?
• Tags
– Title
– Meta
• Term frequency and location
• Popularity
![Page 32: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/5681488b550346895db5a13d/html5/thumbnails/32.jpg)
http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/SearchEngines.html
UC Berkeley Search Engine Guide
![Page 34: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/5681488b550346895db5a13d/html5/thumbnails/34.jpg)
Old: Search Engine Query Differences
![Page 35: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/5681488b550346895db5a13d/html5/thumbnails/35.jpg)
Older: Search engine query models
![Page 36: Query Models](https://reader035.fdocuments.in/reader035/viewer/2022062518/5681488b550346895db5a13d/html5/thumbnails/36.jpg)
Types of Query Structures
Query Models (languages) – most common
• Boolean Queries
– Old model
• Vector queries
– Very common
• Probabilistic models
– Mostly research
• Holy grail of search
– Natural Language Queries