Query Models


Page 1: Query Models

Query Models

• Use

• Types

• What do search engines do

Page 2: Query Models

What we have covered

• What is IR

• Evaluation

• Tokenization and properties of text

• Web crawling

• This time
  – Query models

Page 3: Query Models

A Typical Web Search Engine

[Diagram components: Users, Interface, Query Engine, Index, Indexer, Crawler, Web]

Page 4: Query Models

A Typical Web Search Engine: Queries

[Diagram components: Users, Interface, Query Engine, Index, Indexer, Crawler, Web, with Queries labeled]

Page 5: Query Models

Why the interest in Queries?

• Queries are ways we interact with IR systems
  – Expression of an information need
• Nonquery methods?
• Types of queries?

Page 6: Query Models

Issues with Query Structures

Matching and ranking criteria

• Given a query, what documents are retrieved?

• In what order (rank)?

Page 7: Query Models

Types of Query Structures

Query Models (languages) – most common

• Boolean Queries

• Extended-Boolean Queries

• Natural Language Queries

• Vector queries

• Others?

Page 8: Query Models

Simple query language: Boolean
  – Earliest query model
  – Terms + Connectors (or operators)
  – terms
    • words
    • normalized (stemmed) words
    • phrases
    • thesaurus terms
  – connectors
    • AND
    • OR
    • NOT
  – Ex: Beethoven AND sonata
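A minimal sketch of how a query such as Beethoven AND sonata could be evaluated, assuming a toy inverted index that maps each term to the set of documents containing it. The three documents and the helper names (`build_index`, `postings`, `and_`, `or_`, `not_`) are hypothetical and chosen only for illustration.

```python
# Toy collection; document IDs and texts are made up for illustration.
DOCS = {
    1: "beethoven piano sonata",
    2: "beethoven symphony",
    3: "mozart piano sonata",
}


def build_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for term in text.split():
            index.setdefault(term, set()).add(doc_id)
    return index


INDEX = build_index(DOCS)
ALL_IDS = set(DOCS)


def postings(term):
    """Documents containing the term (case-insensitive); empty set if unseen."""
    return INDEX.get(term.lower(), set())


def and_(a, b):
    return a & b  # set intersection


def or_(a, b):
    return a | b  # set union


def not_(a):
    return ALL_IDS - a  # complement relative to the whole collection


print(sorted(and_(postings("Beethoven"), postings("sonata"))))
```

Each connector maps to a set operation, which is why a pure Boolean result is an unordered set rather than a ranked list.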

Page 9: Query Models

Truth Tables – Boolean Logic

P  Q  NOT P  P AND Q  P OR Q
0  0  TRUE   FALSE    FALSE
0  1  TRUE   FALSE    TRUE
1  0  FALSE  FALSE    TRUE
1  1  FALSE  TRUE     TRUE

Presence of P: P = 1
Absence of P: P = 0
True = 1
False = 0
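The truth table can be regenerated mechanically. This is a small sketch that enumerates every 0/1 assignment of P and Q; the function name `truth_table` is an assumption made for this example.

```python
from itertools import product


def truth_table():
    """Return rows (P, Q, NOT P, P AND Q, P OR Q) for every 0/1 input pair."""
    rows = []
    for p, q in product((0, 1), repeat=2):
        rows.append((p, q, not p, bool(p and q), bool(p or q)))
    return rows


for row in truth_table():
    print(*row)
```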

Page 10: Query Models

Problems with Boolean Queries
• Ranking?
• Incorrect interpretation of Boolean connectives AND and OR
• Example - Seeking Saturday entertainment

Queries:
• Dinner AND sports AND symphony
• Dinner OR sports OR symphony
• Dinner AND sports OR symphony

Page 11: Query Models

Order of precedence of operators

Example of a query:

• Is A AND B the same as B AND A?

• Why?

Page 12: Query Models

Sample Boolean Queries

• Cat

• Cat OR Dog

• Cat AND Dog

• (Cat AND Dog)

• (Cat AND Dog) OR Collar

• (Cat AND Dog) OR (Collar AND Leash)

• (Cat OR Dog) AND (Collar OR Leash)

Page 13: Query Models

Satisfaction of Boolean Query

• (Cat OR Dog) AND (Collar OR Leash)
  – Each of the following column combinations works:

• Cat    x x x x
• Dog    x x x x x
• Collar x x x x
• Leash  x x x x

Others?
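One way to answer "Others?" is to enumerate every combination of the four terms and keep those that satisfy (Cat OR Dog) AND (Collar OR Leash). The sketch below assumes a document is described only by which of the four terms it contains; the function names are hypothetical. Of the 16 possible combinations, 9 satisfy the query (three ways to satisfy each parenthesized group).

```python
from itertools import product

TERMS = ("Cat", "Dog", "Collar", "Leash")


def satisfies(cat, dog, collar, leash):
    """The query from the slide: (Cat OR Dog) AND (Collar OR Leash)."""
    return (cat or dog) and (collar or leash)


def satisfying_combinations():
    """Return each satisfying combination as the set of terms present."""
    combos = []
    for values in product((False, True), repeat=len(TERMS)):
        if satisfies(*values):
            combos.append({t for t, v in zip(TERMS, values) if v})
    return combos


for combo in satisfying_combinations():
    print(sorted(combo))
```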

Page 14: Query Models

Order of Precedence
  – Define the order of precedence
    • Ex: a OR b AND c
  – Infix notation
    • Parentheses are evaluated first, with left-to-right precedence of operators
    • Next, NOTs are applied
    • Then ANDs
    • Then ORs
  – a OR b AND c becomes
  – a OR (b AND c)
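The precedence rules above (parentheses, then NOT, then AND, then OR) can be sketched as a small recursive-descent parser. This is an illustrative sketch, not how any particular search engine parses queries; the tuple-based tree and the function names are assumptions, and malformed input is not handled gracefully.

```python
import re


def tokenize(query):
    """Split a query into parentheses and whitespace-separated words."""
    return re.findall(r"\(|\)|[^\s()]+", query)


def parse(query):
    """Parse a Boolean query into nested tuples, with NOT > AND > OR."""
    tokens = tokenize(query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_or():  # lowest precedence, so it is parsed outermost
        node = parse_and()
        while peek() == "OR":
            advance()
            node = ("OR", node, parse_and())
        return node

    def parse_and():
        node = parse_not()
        while peek() == "AND":
            advance()
            node = ("AND", node, parse_not())
        return node

    def parse_not():  # unary prefix operator
        if peek() == "NOT":
            advance()
            return ("NOT", parse_not())
        return parse_atom()

    def parse_atom():
        tok = advance()
        if tok == "(":
            node = parse_or()
            if advance() != ")":
                raise ValueError("expected closing parenthesis")
            return node
        return tok

    return parse_or()


print(parse("a OR b AND c"))  # ('OR', 'a', ('AND', 'b', 'c'))
```

Because `parse_or` calls `parse_and`, which calls `parse_not`, the tighter-binding operators are grouped first, which reproduces a OR (b AND c) for the example.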

Page 15: Query Models

Infix Notation
  – Usually expressed as INFIX operators in IR
    • ((a AND b) OR (c AND b))
  – NOT is a UNARY PREFIX operator
    • ((a AND b) OR (c AND (NOT b)))
  – AND and OR can be n-ary operators
    • (a AND b AND c AND d)
  – Some rules (De Morgan revisited)
    • NOT(a) AND NOT(b) = NOT(a OR b)
    • NOT(a) OR NOT(b) = NOT(a AND b)
    • NOT(NOT(a)) = a
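The three rules can be checked exhaustively, since each involves at most two Boolean variables. The helper `holds_for_all` below is a hypothetical name; the sketch simply tries every True/False assignment.

```python
from itertools import product


def holds_for_all(identity, n_vars):
    """True if the identity holds for every assignment of n_vars Booleans."""
    return all(identity(*vals) for vals in product((False, True), repeat=n_vars))


def de_morgan_and(a, b):
    # NOT(a) AND NOT(b) = NOT(a OR b)
    return ((not a) and (not b)) == (not (a or b))


def de_morgan_or(a, b):
    # NOT(a) OR NOT(b) = NOT(a AND b)
    return ((not a) or (not b)) == (not (a and b))


def double_negation(a):
    # NOT(NOT(a)) = a
    return (not (not a)) == a


print(holds_for_all(de_morgan_and, 2),
      holds_for_all(de_morgan_or, 2),
      holds_for_all(double_negation, 1))
```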

Page 16: Query Models

DNFs and CNFs

All queries can be rewritten as
  – Disjunctive Normal Forms (DNFs)
  – Conjunctive Normal Forms (CNFs)

• DNF Constituents:
  – Terms (words or phrases)
  – Conjuncts (terms joined by ANDs)
  – Disjuncts (conjuncts joined by ORs)
  – Ex: (A AND B) OR (A AND NOT C)

• CNF Constituents:
  – Terms (words or phrases)
  – Disjuncts (terms joined by ORs)
  – Conjuncts (disjuncts joined by ANDs)
  – Ex: (A OR B) AND (A OR NOT C)
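To illustrate that any query can be rewritten in DNF, the sketch below builds a DNF directly from a truth table: one conjunct per satisfying assignment. This produces the canonical (fully expanded) DNF rather than a minimal one, and `to_dnf` is a hypothetical helper written for this example.

```python
from itertools import product


def to_dnf(predicate, names):
    """Build a DNF string with one conjunct per satisfying truth assignment."""
    conjuncts = []
    for values in product((False, True), repeat=len(names)):
        if predicate(*values):
            literals = [n if v else f"NOT {n}" for n, v in zip(names, values)]
            conjuncts.append("(" + " AND ".join(literals) + ")")
    return " OR ".join(conjuncts)


def cnf_example(a, b, c):
    """The CNF example from the slide: (A OR B) AND (A OR NOT C)."""
    return (a or b) and (a or not c)


print(to_dnf(cnf_example, ["A", "B", "C"]))
```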

Page 17: Query Models

Effect of CNFs
• All complex Boolean queries can be simplified
• Why do reference librarians like CNFs?
• ANDs reduce the size of the set returned and are easily expandable
  – So do minuses

Page 18: Query Models

Boolean Searching

“Measurement of the width of cracks in prestressed concrete beams”

Formal Query:
cracks AND beams AND Width_measurement AND Prestressed_concrete

[Diagram: overlapping sets labeled Cracks, Beams, Width measurement, Prestressed concrete]

Relaxed Query:
(C AND B AND P) OR
(C AND B AND W) OR
(C AND W AND P) OR
(B AND W AND P)
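The relaxed query is the OR of every three-concept AND, which amounts to "at least three of the four concepts". A minimal sketch, assuming each document is reduced to the set of concepts it mentions; the concept identifiers and function names are illustrative only.

```python
from itertools import combinations

CONCEPTS = ("cracks", "beams", "width_measurement", "prestressed_concrete")


def matches_formal(doc_terms):
    """Formal query: all four concepts must be present."""
    return all(c in doc_terms for c in CONCEPTS)


def matches_relaxed(doc_terms, k=3):
    """Relaxed query: every concept of at least one k-subset is present."""
    return any(all(c in doc_terms for c in subset)
               for subset in combinations(CONCEPTS, k))


doc = {"cracks", "beams", "prestressed_concrete"}
print(matches_formal(doc), matches_relaxed(doc))
```

With four concepts and k = 3, `combinations` yields exactly the four conjuncts listed in the relaxed query.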

Page 19: Query Models

Pseudo-Boolean Queries

• A new notation, from web search
  – +cat dog +collar leash

• Does not mean the same thing!

• Need a way to group combinations.

• Phrases:
  – “stray cat” AND “frayed collar”
  – +“stray cat” +“frayed collar”

Page 20: Query Models

Ordering (ranking) of Retrieved Documents

• Pure Boolean has no ordering
• Term is there or it’s not
• In practice:
  – order chronologically
  – order by total number of “hits” on query terms
    • What if one term has more hits than others?
    • Is it better to have one of each term or many of one term?
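The two closing questions can be made concrete with a small sketch that compares two hypothetical orderings: by total hits, and by the number of distinct query terms matched. The documents and function names below are invented for illustration; neither ordering is claimed to be what real systems use.

```python
from collections import Counter


def hit_counts(doc_text, query_terms):
    """Number of occurrences of each query term in the document."""
    counts = Counter(doc_text.lower().split())
    return {t: counts[t] for t in query_terms}


def total_hits(doc_text, query_terms):
    return sum(hit_counts(doc_text, query_terms).values())


def distinct_terms_matched(doc_text, query_terms):
    return sum(1 for c in hit_counts(doc_text, query_terms).values() if c > 0)


# Hypothetical documents: d1 repeats one term, d2 has one of each.
docs = {"d1": "cat cat cat cat", "d2": "cat dog"}
query = ["cat", "dog"]

by_total = sorted(docs, key=lambda d: total_hits(docs[d], query), reverse=True)
by_coverage = sorted(docs, key=lambda d: distinct_terms_matched(docs[d], query),
                     reverse=True)
print(by_total, by_coverage)
```

The two orderings disagree on this input, which is exactly the ambiguity the slide raises.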

Page 21: Query Models

Boolean Query - Summary
• Advantages
  – simple queries are easy to understand
  – relatively easy to implement
• Disadvantages
  – difficult to specify what is wanted
  – too much returned, or too little
  – ordering not well determined
• Dominant language in commercial systems until the WWW

Page 22: Query Models

Vector Space Model

• Documents and queries are represented as vectors in term space
  – Terms are usually stems
  – Documents represented by binary vectors of terms
• Queries represented the same as documents
• Query and Document weights are based on length and direction of their vector
• A vector distance measure between the query and documents is used to rank retrieved documents
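The slide does not name a specific distance measure. As one common choice, the sketch below uses cosine similarity (the cosine of the angle between two vectors) to rank documents against a query. The three-term vocabulary (dog, house, white) and the document vectors are assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query_vec, doc_vecs):
    """Return document IDs ordered from most to least similar to the query."""
    return sorted(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]),
                  reverse=True)


# Hypothetical binary vectors over the vocabulary (dog, house, white).
doc_vecs = {"D1": (1, 1, 0), "D2": (0, 0, 1), "D3": (1, 0, 0)}
query_vec = (1, 1, 0)
print(rank(query_vec, doc_vecs))
```

Unlike a Boolean match, every document receives a score, so the result is an ordered list.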

Page 23: Query Models

Document Vectors

• Documents are represented as “bags of words”
  – Words are terms with no order
• Represented as vectors when used computationally
  – A vector is like an array of floating point values
  – Has direction and magnitude
  – Each vector holds a place for every term in the collection
  – Therefore, most vectors are sparse

Page 24: Query Models

Queries

Vocabulary (dog, house, white)

Queries:

• dog (1,0,0)

• house (0,1,0)

• white (0,0,1)

• house and dog (1,1,0)

• dog and house (1,1,0)

• Show 3-D space plot
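The query vectors listed above can be produced mechanically from the vocabulary. This sketch assumes binary weights and ignores any word outside the vocabulary (including "and"); `to_vector` is a hypothetical helper name.

```python
VOCABULARY = ("dog", "house", "white")


def to_vector(query):
    """Binary vector with a 1 for each vocabulary term that occurs in the query."""
    words = set(query.lower().split())
    return tuple(1 if term in words else 0 for term in VOCABULARY)


for q in ("dog", "house", "white", "house and dog", "dog and house"):
    print(q, to_vector(q))
```

Because the representation is a bag of words, "house and dog" and "dog and house" map to the same vector.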

Page 25: Query Models

Documents (queries) in Vector Space

[Figure: documents D1 through D11 plotted in a three-dimensional term space with axes t1, t2, t3]

Page 26: Query Models

Documents in 3D Space

Assumption: Documents that are “close together” in space are similar in meaning.

Page 27: Query Models

Vector Query Problems

• Significance of queries
  – Can different values be placed on the different terms – e.g. 2dog 1house

• Scaling – size of vectors

• Number of words in the dictionary?
  • 100,000

Page 28: Query Models

Proximity Searches
• Proximity: terms occur within K positions of one another
  – pen w/5 paper
• A “Near” function can be more vague
  – near(pen, paper)
• Sometimes order can be specified
• Also, Phrases and Collocations
  – “United Nations” “Bill Clinton”
• Phrase Variants
  – “retrieval of information” “information retrieval”
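A proximity operator such as pen w/5 paper can be sketched by comparing token positions. The exact counting convention behind "w/5" varies between systems; this sketch assumes it means the two terms are at most K token positions apart, and the `ordered` flag is a hypothetical way to express the "order can be specified" case.

```python
def positions(tokens, term):
    """Indexes at which the term occurs in the token list."""
    return [i for i, tok in enumerate(tokens) if tok == term]


def within(text, term_a, term_b, k, ordered=False):
    """True if term_a and term_b occur within k positions of one another.

    With ordered=True, term_a must come before term_b.
    """
    tokens = text.lower().split()
    for i in positions(tokens, term_a):
        for j in positions(tokens, term_b):
            distance = j - i
            if ordered and distance < 0:
                continue
            if 0 < abs(distance) <= k:
                return True
    return False


print(within("a pen and some paper", "pen", "paper", 5))
```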

Page 29: Query Models

Filters

• Filters: Reduce set of candidate docs
• Often specified simultaneously with query
• Usually restrictions on metadata
  – restrict by:
    • date range
    • internet domain (.edu .com .berkeley.edu)
    • author
    • size
    • limit number of documents returned
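A minimal sketch of metadata filtering applied to a candidate set. The record fields (`id`, `domain`, `date`, `author`) and the sample records are hypothetical; the point is only that filters test metadata, not document text, and can be combined with a result limit.

```python
from datetime import date

# Hypothetical candidate documents with metadata only.
CANDIDATES = [
    {"id": 1, "domain": "berkeley.edu", "date": date(2001, 5, 1), "author": "smith"},
    {"id": 2, "domain": "example.com", "date": date(2003, 2, 9), "author": "jones"},
    {"id": 3, "domain": "cs.berkeley.edu", "date": date(2004, 7, 4), "author": "smith"},
]


def apply_filters(docs, domain_suffix=None, start=None, end=None,
                  author=None, limit=None):
    """Keep documents that pass every filter that was supplied."""
    result = []
    for doc in docs:
        if domain_suffix and not doc["domain"].endswith(domain_suffix):
            continue
        if start and doc["date"] < start:
            continue
        if end and doc["date"] > end:
            continue
        if author and doc["author"] != author:
            continue
        result.append(doc)
    return result[:limit]  # limit=None keeps every remaining document


print([d["id"] for d in apply_filters(CANDIDATES, domain_suffix=".edu")])
```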

Page 30: Query Models

Natural Language Queries

• The “Holy Grail” of information retrieval

• Issues in Natural Language Processing

– syntax

– semantics

– pragmatics

– speech understanding

– speech generation

Page 31: Query Models

What do search engines do?

• Tags
  – Title
  – Meta

• Term frequency and location

• Popularity

Page 34: Query Models

Old: Search Engine Query Differences

Page 35: Query Models

Older: Search engine query models

Page 36: Query Models

Types of Query Structures

Query Models (languages) – most common

• Boolean Queries
  – Old model

• Vector queries
  – Very common

• Probabilistic models
  – Mostly research

• Holy grail of search
  – Natural Language Queries