Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information...

73
Information Retrieval: Introduction Information Retrieval: Introduction Norbert Fuhr April 28, 2014 1 / 52

Transcript of Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information...

Page 1: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Information Retrieval: Introduction

Information Retrieval: Introduction

Norbert Fuhr

April 28, 2014

1 / 52

Page 2: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Information Retrieval Applications

Application ExamplesFacets of Search

Page 3: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Information Retrieval: Introduction

Information Retrieval Applications

Application Examples

Web Search

3 / 52

Page 4: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Information Retrieval: Introduction

Information Retrieval Applications

Application Examples

Product Search in Online Shops

4 / 52

Page 5: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Information Retrieval: Introduction

Information Retrieval Applications

Application Examples

Intranet Search

5 / 52

Page 6: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Information Retrieval: Introduction

Information Retrieval Applications

Application Examples

Searching in Digital Libraries

6 / 52

Page 7: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Information Retrieval: Introduction

Information Retrieval Applications

Application Examples

Multimedia Search

7 / 52

Page 8: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Information Retrieval: Introduction

Information Retrieval Applications

Facets of Search

Facets of SearchLanguage

Example: Cross-lingual search in Google

8 / 52

Page 9: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Information Retrieval: Introduction

Information Retrieval Applications

Facets of Search

Facets of SearcingStructure

Example: XML retrieval

9 / 52

Page 10: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Information Retrieval: Introduction

Information Retrieval Applications

Facets of Search

Facets of SearchMedia

Example: Similarity search for images

10 / 52

Page 11: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Information Retrieval: Introduction

Information Retrieval Applications

Facets of Search

Facets of SearchObjects

Example: People search in 123people

11 / 52

Page 12: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Information Retrieval: Introduction

Information Retrieval Applications

Facets of Search

Facets of Searchstatic/dynamic Content

Example: Twitter search

12 / 52

Page 13: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Information Retrieval: Introduction

Information Retrieval Applications

Facets of Search

Facets of Search

Language: monolingual, cross-lingual, multilingual

Structure: atomics, fields, tree structure (e.g. XML), graph(e.g. Web)

Media: texts, facts, images, audio, video, 3D,. . .

Objects: products, people, companies,. . .

static/dynamic contents (databases/streams)

13 / 52

Page 14: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Definition of Information Retrieval

Page 15: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Information Retrieval: Introduction

Definition of Information Retrieval

Definition of Information Retrieval

Information Retrieval (IR) is about vagueness und uncertainty ininformation systems

Vagueness: user cannot give a precise specification of herinformation need

vague query conditionsiterative query formulation

Uncertainty system has uncertain knowledge about the (contentof the) objects in database

uncertain representation( wrong answers)incomplete representation( missing answers)

15 / 52

Page 16: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Information Retrieval: Introduction

Definition of Information Retrieval

Definition of Information Retrieval

Information Retrieval (IR) is about vagueness und uncertainty ininformation systems

Vagueness: user cannot give a precise specification of herinformation need

vague query conditions

iterative query formulation

Uncertainty system has uncertain knowledge about the (contentof the) objects in database

uncertain representation( wrong answers)incomplete representation( missing answers)

15 / 52

Page 17: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Information Retrieval: Introduction

Definition of Information Retrieval

Definition of Information Retrieval

Information Retrieval (IR) is about vagueness und uncertainty ininformation systems

Vagueness: user cannot give a precise specification of herinformation need

vague query conditionsiterative query formulation

Uncertainty system has uncertain knowledge about the (contentof the) objects in database

uncertain representation( wrong answers)incomplete representation( missing answers)

15 / 52

Page 18: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Information Retrieval: Introduction

Definition of Information Retrieval

Definition of Information Retrieval

Information Retrieval (IR) is about vagueness und uncertainty ininformation systems

Vagueness: user cannot give a precise specification of herinformation need

vague query conditionsiterative query formulation

Uncertainty system has uncertain knowledge about the (contentof the) objects in database

uncertain representation( wrong answers)incomplete representation( missing answers)

15 / 52

Page 19: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Information Retrieval: Introduction

Definition of Information Retrieval

Definition of Information Retrieval

Information Retrieval (IR) is about vagueness und uncertainty ininformation systems

Vagueness: user cannot give a precise specification of herinformation need

vague query conditionsiterative query formulation

Uncertainty system has uncertain knowledge about the (contentof the) objects in database

uncertain representation( wrong answers)

incomplete representation( missing answers)

15 / 52

Page 20: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Information Retrieval: Introduction

Definition of Information Retrieval

Definition of Information Retrieval

Information Retrieval (IR) is about vagueness und uncertainty ininformation systems

Vagueness: user cannot give a precise specification of herinformation need

vague query conditionsiterative query formulation

Uncertainty system has uncertain knowledge about the (contentof the) objects in database

uncertain representation( wrong answers)incomplete representation( missing answers)

15 / 52

Page 21: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Information Retrieval: Introduction

Definition of Information Retrieval

IR = Content-Oriented SearchNarrow Definition of IR

Searching at different abstraction levels:

Syntax document as sequence of symbols

Semantics meaning of a text/media object

Pragmatics usefulness for solving my current problem

16 / 52

Page 22: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Information Retrieval: Introduction

Definition of Information Retrieval

Syntax, Semantics and Pragmatics

Welcome at the Information Engineering Group. Our current workfocuses on information retrieval, digital libraries and web-basedinformation systems, with special emphasis on user-orientedresearch.

Syntax ’digital library’ no match

Semantics ’research area’ match

Pragmatics ’potential project partner for medical informationproject’?

17 / 52

Page 23: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Retrieval Quality

Page 24: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Information Retrieval: Introduction

Retrieval Quality

Retrieval QualityThe concept of relevance

in contrast to databases, IR system cannot decide if an answeris correct or not

user has information need

relevance: relationship between document and informationneed

judged by user

19 / 52

Page 25: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Information Retrieval: Introduction

Retrieval Quality

Facets of Relevance

20 / 52

Page 26: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Information Retrieval: Introduction

Retrieval Quality

Facets of Relevance

Situational Relevance: related to the percepted task

Pertinence relevance: related to the information need

Intellectual topicality: as judged by human observer

Algorithmic relevance: system score comparing request/querywith object

In the following: Relevance as pertinence/topicality without furtherdistinction

21 / 52

Page 27: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Information Retrieval: Introduction

Retrieval Quality

Retrieval metrics

RET: set of retrieved documents

REL: set of relevant documents in the database

Precision p: Proportion of relevant among retrieved

Recall r : Proportion of retrieved among relevant

p =|REL ∩ RET ||RET |

r =|REL ∩ RET ||REL|

Example:20 relevant documents for the current query.System returns 10 dokumente, of which 8 are relevant.

Precision: p = 8/10 = 0.8Recall: r = 8/20 = 0.4

22 / 52

Page 28: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Information Retrieval: Introduction

Retrieval Quality

Retrieval metrics

RET: set of retrieved documents

REL: set of relevant documents in the database

Precision p: Proportion of relevant among retrieved

Recall r : Proportion of retrieved among relevant

p =|REL ∩ RET ||RET |

r =|REL ∩ RET ||REL|

Example:20 relevant documents for the current query.System returns 10 dokumente, of which 8 are relevant.

Precision: p = 8/10 = 0.8Recall: r = 8/20 = 0.4

22 / 52

Page 29: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Information Retrieval: Introduction

Retrieval Quality

Retrieval metrics

RET: set of retrieved documents

REL: set of relevant documents in the database

Precision p: Proportion of relevant among retrieved

Recall r : Proportion of retrieved among relevant

p =|REL ∩ RET ||RET |

r =|REL ∩ RET ||REL|

Example:20 relevant documents for the current query.System returns 10 dokumente, of which 8 are relevant.

Precision: p = 8/10 = 0.8Recall: r = 8/20 = 0.4

22 / 52

Page 30: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Representations

Semantic DescriptionsFree Text SearchObjects, Representations, and Descriptions

Page 31: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Information Retrieval: Introduction

Representations

Representations

Free text search search in document text

Semantic approach assign semantic descriptions

24 / 52

Page 32: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Information Retrieval: Introduction

Representations

Semantic Descriptions

Semantic Descriptions

classification schemes e.g. hierarchic classification, as in librariesor product catalogs

Tagging users assign tags

Ontologies e.g. OWL: Web Ontology Language

25 / 52

Page 33: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Information Retrieval: Introduction

Representations

Semantic Descriptions

Classification/Ontology Example: DMOZ

26 / 52

Page 34: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Information Retrieval: Introduction

Representations

Free Text Search

Free Text SearchProblems

Inflectioncomputer – computers, fly – fliesgo – goes – going

Derivationcompute - computer - computerization - computation

Synonymsmobile – smartphone, table – bench – board –counter

Polysemesbank, head

Compoundssteamboat, testbed

Phrasesinformation retrieval – retrieval of information

27 / 52

Page 35: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Information Retrieval: Introduction

Representations

Free Text Search

Free Text SearchApproaches

inflection, derivation stemming algorithmscomputer, computation, computerize → comput

synonyms synonym lexicons

compunds splitting algorithms

phrases adjacency search

Most systems implement only stemming and adjacency search!

28 / 52

Page 36: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Information Retrieval: Introduction

Representations

Objects, Representations, and Descriptions

A Document Object

29 / 52

Page 37: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Information Retrieval: Introduction

Representations

Objects, Representations, and Descriptions

Example: document text, representation, description

Text:Research in the probabilistic theory of information retrieval involvesthe construction of mathematical models. In this kind of theoryconstruction the assumptions laid down ...

Stopword removal and stemming:research probabil theory informat retriev involv constructmathemat model kind theory construct assume lay downRepresentation (Bag of words):(research,1), (probabil,1), (theory,2), (informat,1), (retriev,1),(involv,1), (construct,2), (mathemat,1), (model,1), (kind,1),(assum,1), (lay,1), (down,1),Description:(research,0.5), (probabil,0.5), (theory,1.0), (informat,0.5),(retriev,0.5), (involv,0.5), (construct,1.0), (mathemat,0.5),(model,0.5), (kind,0.5), (assum,0.5), (lay,0.5), (down,0.5)

30 / 52

Page 38: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Information Retrieval: Introduction

Representations

Objects, Representations, and Descriptions

Example: document text, representation, description

Text:Research in the probabilistic theory of information retrieval involvesthe construction of mathematical models. In this kind of theoryconstruction the assumptions laid down ...Stopword removal and stemming:research probabil theory informat retriev involv constructmathemat model kind theory construct assume lay down

Representation (Bag of words):(research,1), (probabil,1), (theory,2), (informat,1), (retriev,1),(involv,1), (construct,2), (mathemat,1), (model,1), (kind,1),(assum,1), (lay,1), (down,1),Description:(research,0.5), (probabil,0.5), (theory,1.0), (informat,0.5),(retriev,0.5), (involv,0.5), (construct,1.0), (mathemat,0.5),(model,0.5), (kind,0.5), (assum,0.5), (lay,0.5), (down,0.5)

30 / 52

Page 39: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Information Retrieval: Introduction

Representations

Objects, Representations, and Descriptions

Example: document text, representation, description

Text:Research in the probabilistic theory of information retrieval involvesthe construction of mathematical models. In this kind of theoryconstruction the assumptions laid down ...Stopword removal and stemming:research probabil theory informat retriev involv constructmathemat model kind theory construct assume lay downRepresentation (Bag of words):(research,1), (probabil,1), (theory,2), (informat,1), (retriev,1),(involv,1), (construct,2), (mathemat,1), (model,1), (kind,1),(assum,1), (lay,1), (down,1),

Description:(research,0.5), (probabil,0.5), (theory,1.0), (informat,0.5),(retriev,0.5), (involv,0.5), (construct,1.0), (mathemat,0.5),(model,0.5), (kind,0.5), (assum,0.5), (lay,0.5), (down,0.5)

30 / 52

Page 40: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Information Retrieval: Introduction

Representations

Objects, Representations, and Descriptions

Example: document text, representation, description

Text:Research in the probabilistic theory of information retrieval involvesthe construction of mathematical models. In this kind of theoryconstruction the assumptions laid down ...Stopword removal and stemming:research probabil theory informat retriev involv constructmathemat model kind theory construct assume lay downRepresentation (Bag of words):(research,1), (probabil,1), (theory,2), (informat,1), (retriev,1),(involv,1), (construct,2), (mathemat,1), (model,1), (kind,1),(assum,1), (lay,1), (down,1),Description:(research,0.5), (probabil,0.5), (theory,1.0), (informat,0.5),(retriev,0.5), (involv,0.5), (construct,1.0), (mathemat,0.5),(model,0.5), (kind,0.5), (assum,0.5), (lay,0.5), (down,0.5)

30 / 52

Page 41: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Information Retrieval: Introduction

Representations

Objects, Representations, and Descriptions

Conceptual Model

DD

rel.

judg.

α

αQ βQ

ρIR

DQ

DD

Q

D

Q

R

qk∈ Q query

qk ∈ Q: queryrepresentation

qDk ∈ QD : query description

dm ∈ D document

dm ∈ D: documentrepresentation

dDm ∈ DD : document

descriptionR: relevance scale%: retrieval function

31 / 52

Page 42: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Probabilistic Models

Probabilistic Event SpaceProbability Ranking PrincipleBinary Independence Retrieval ModelBM25 modelLearning to Rank

Page 43: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Information Retrieval: Introduction

Probabilistic Models

Probabilistic Event Space

Probabilistic Event space

[Fuhr 92]

qk

dm

dm

qk

D

Q

Q: Queriesqk

: queryqk : query rep.

D: Documentsdm: documentdm: document rep.

33 / 52

Page 44: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Information Retrieval: Introduction

Probabilistic Models

Probabilistic Event Space

Event space

dm

qk

dm

qk

qk dm

Q

R N R N

R R N N

R R R R

N N N N

D

P(R| , )=0.5

34 / 52

Page 45: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Information Retrieval: Introduction

Probabilistic Models

Probabilistic Event Space

Event space

Event space: Q × Dsingle element: query-document pair (qk , dm)all elements are equiprobable

relevance judgement (qk , dm)εRrelevance judgements for different documents w.r.t. the samequery are independent of each other

Probability of relevance P(rel |qk , dm):probability of a an element of (qk , dm) being relevant

regard collections as samples of possibly infinite sets

poor representation of retrieval objects:single representation may stand for a number of differentobjects.

35 / 52

Page 46: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Information Retrieval: Introduction

Probabilistic Models

Probabilistic Event Space

Event space

Event space: Q × Dsingle element: query-document pair (qk , dm)all elements are equiprobable

relevance judgement (qk , dm)εRrelevance judgements for different documents w.r.t. the samequery are independent of each other

Probability of relevance P(rel |qk , dm):probability of a an element of (qk , dm) being relevant

regard collections as samples of possibly infinite sets

poor representation of retrieval objects:single representation may stand for a number of differentobjects.

35 / 52

Page 47: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Information Retrieval: Introduction

Probabilistic Models

Probabilistic Event Space

Event space

Event space: Q × Dsingle element: query-document pair (qk , dm)all elements are equiprobable

relevance judgement (qk , dm)εRrelevance judgements for different documents w.r.t. the samequery are independent of each other

Probability of relevance P(rel |qk , dm):probability of a an element of (qk , dm) being relevant

regard collections as samples of possibly infinite sets

poor representation of retrieval objects:single representation may stand for a number of differentobjects.

35 / 52

Page 48: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Information Retrieval: Introduction

Probabilistic Models

Probability Ranking Principle

Probability Ranking Principle

defines optimum retrieval for probabilistic models:

rank documents according to decreasing values of the

probability of relevance P(rel |q, d)

Advantage:

PRP yields

optimum retrieval quality

minimum retrieval costs

36 / 52

Page 49: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Information Retrieval: Introduction

Probabilistic Models

Probability Ranking Principle

Probability Ranking Principle

defines optimum retrieval for probabilistic models:

rank documents according to decreasing values of the

probability of relevance P(rel |q, d)

Advantage:

PRP yields

optimum retrieval quality

minimum retrieval costs

36 / 52

Page 50: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Information Retrieval: Introduction

Probabilistic Models

Binary Independence Retrieval Model

Binary Independence Retrieval Model

represent queries and documents as sets of termsT = {t1, . . . , tn} set of terms in the collection

q ∈ Q: queryrepresentation

dm ∈ D: documentrepresentation

qT : set of query terms

dTm : set of document

terms

simple retrieval function: Coordination level match

%COORD(q, dm) = |qT ∩ dTm |

Binary independence retrieval (BIR) model:assign weights to query terms

%BIR(q, dm) =∑

ti∈qT∩dTm

ci

37 / 52

Page 51: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Information Retrieval: Introduction

Probabilistic Models

Binary Independence Retrieval Model

Binary Independence Retrieval Model

represent queries and documents as sets of termsT = {t1, . . . , tn} set of terms in the collection

q ∈ Q: queryrepresentation

dm ∈ D: documentrepresentation

qT : set of query terms

dTm : set of document

terms

simple retrieval function: Coordination level match

%COORD(q, dm) = |qT ∩ dTm |

Binary independence retrieval (BIR) model:assign weights to query terms

%BIR(q, dm) =∑

ti∈qT∩dTm

ci

37 / 52

Page 52: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Information Retrieval: Introduction

Probabilistic Models

Binary Independence Retrieval Model

Binary Independence Retrieval Model

represent queries and documents as sets of termsT = {t1, . . . , tn} set of terms in the collection

q ∈ Q: queryrepresentation

dm ∈ D: documentrepresentation

qT : set of query terms

dTm : set of document

terms

simple retrieval function: Coordination level match

%COORD(q, dm) = |qT ∩ dTm |

Binary independence retrieval (BIR) model:assign weights to query terms

%BIR(q, dm) =∑

ti∈qT∩dTm

ci

37 / 52

Page 53: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Information Retrieval: Introduction

Probabilistic Models

Binary Independence Retrieval Model

%BIR(q, dm) =∑

ti∈qT∩dTm

ci , ci = logpi (1− si )

si (1− pi )

pi = P(ti |rel): prob. that ti occurs in arbitrary relevant doc.si = P(ti | ¯rel): prob. that ti occurs in arbitrary nonrelevant doc.

38 / 52

Page 54: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Information Retrieval: Introduction

Probabilistic Models

Binary Independence Retrieval Model

%BIR(q, dm) =∑

ti∈qT∩dTm

ci , ci = logpi (1− si )

si (1− pi )

pi = P(ti |rel): prob. that ti occurs in arbitrary relevant doc.si = P(ti | ¯rel): prob. that ti occurs in arbitrary nonrelevant doc.

38 / 52

Page 55: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Information Retrieval: Introduction

Probabilistic Models

Binary Independence Retrieval Model

Parameter estimationRelevance Feedback

ti occurs ¬ ti occurs

relevant ri R − ri R

¬ relevant ni − ri N − ni − R + ri N − R

ni N − ni N

pi = P(ti |rel) prob. that ti occurs in arbitrary relevant doc.

pi ≈riR

si = P(ti | ¯rel) prob. that ti occurs in arbitrary nonrelevant doc..

si ≈ni − riN − R

≈ ni

N

39 / 52

Page 56: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Information Retrieval: Introduction

Probabilistic Models

Binary Independence Retrieval Model

Parameter estimation w/o relevance feedback

N - # documents in the collectionni - # documents containing term ti

si = P(ti | ¯rel) prob. that ti occurs in arbitrary nonrelevant doc..

si ≈ni

N

pi = P(ti |rel) prob. that ti occurs in arbitrary relevant doc.assume constant value: p = 0.5

ci = logpi (1− si )

si (1− pi )= log

p

1− p+ log

1− sisi

= 0 + logN − ni

ni≈ log

N

ni

IDF (inverse document frequency) weight: log N/ni

40 / 52

Page 57: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Information Retrieval: Introduction

Probabilistic Models

Binary Independence Retrieval Model

Parameter estimation w/o relevance feedback

N - # documents in the collectionni - # documents containing term ti

si = P(ti | ¯rel) prob. that ti occurs in arbitrary nonrelevant doc..

si ≈ni

N

pi = P(ti |rel) prob. that ti occurs in arbitrary relevant doc.assume constant value: p = 0.5

ci = logpi (1− si )

si (1− pi )= log

p

1− p+ log

1− sisi

= 0 + logN − ni

ni≈ log

N

ni

IDF (inverse document frequency) weight: log N/ni

40 / 52

Page 58: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Information Retrieval: Introduction

Probabilistic Models

BM25 model

BM25

[Robertson et al 95]heuristic extension of the BIR modelfrom binary to weighted indexing(consideration of within-document frequency tf )

tf

1

umi

mi

41 / 52

Page 59: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Information Retrieval: Introduction

Probabilistic Models

BM25 model

tf*idf Weighting

originally developed for (non-probabilistic) vector space model

set of heuristics: the weight of a term should be higher. . .1 the less frequent the term occurs in the collection

(inverse document frequency, idf — see above)2 the more often the term occurs in the document (tf)3 the shorter the document

42 / 52

Page 60: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Information Retrieval: Introduction

Probabilistic Models

BM25 model

From binary to weighted Indexing

lm document length(# tokens in dm)al average document length in D

tfmi : occurrence frequency of ti in dm.b weight of length normalization, 0 ≤ b ≤ 1k weight of occurrence frequency

length normalization: B =

((1− b) + b

lmal

)normalized within-document frequency: ntfmi = tfmi/B

BM25 weight: umi =ntfmi

k + ntfmi

=tfmi

k((1− b) + b lm

al

)+ tfmi

43 / 52

Page 61: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Information Retrieval: Introduction

Probabilistic Models

Learning to Rank

Parameter learning in IR

[Fuhr 92]

Learning approaches in IR

44 / 52

Page 62: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Information Retrieval: Introduction

Probabilistic Models

Learning to Rank

Learning to Rank for Web Searches

45 / 52

Page 63: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Interactive Retrieval

Search modelsAnomalous State of KnowledgeIngwersen’s Cognitive Model

Page 64: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Information Retrieval: Introduction

Interactive Retrieval

Search models

Search modelsClassical search process model

47 / 52

Page 65: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Information Retrieval: Introduction

Interactive Retrieval

Search models

Empirical studies

information search consists of a sequence of connected, butdifferent searches

search result may trigger new searches

only task context remains the same

main goal of a search is accumulated learning and collectionof new information while searching

48 / 52

Page 66: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Information Retrieval: Introduction

Interactive Retrieval

Search models

Search modelsBerry picking-Model

[Bates 90]

continuous change of information need and queries duringsearchinformation need cannot be satisfied by a single result setinstead: sequence of selections and collection of pieces ofinformation during search

49 / 52

Page 67: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Information Retrieval: Introduction

Interactive Retrieval

Anomalous State of Knowledge

Anomalous State of Knowledge (ASK)(1)

[Belkin 80]

Classic IR systems: ”best match” principle

system returns those documents that fit best to therepresentation of the information need (e.g. query statement)

only feasible, if user can give precise specification of herinformation need (like e.g. in DBMS)

50 / 52

Page 68: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Information Retrieval: Introduction

Interactive Retrieval

Anomalous State of Knowledge

Anomalous State of Knowledge (ASK)(2)

ASK-Hypothesis

information need results from user’s anomalous state ofknowledge (ASK)

user is unable to precisely specify information need forremoving the ASK

instead: describe ASK

requires capture of cognitive and situation-specific aspects forresolving this anomaly

51 / 52

Page 69: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Information Retrieval: Introduction

Interactive Retrieval

Anomalous State of Knowledge

Anomalous State of Knowledge (ASK)(2)

ASK-Hypothesis

information need results from user’s anomalous state ofknowledge (ASK)

user is unable to precisely specify information need forremoving the ASK

instead: describe ASK

requires capture of cognitive and situation-specific aspects forresolving this anomaly

51 / 52

Page 70: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Information Retrieval: Introduction

Interactive Retrieval

Anomalous State of Knowledge

Anomalous State of Knowledge (ASK)(2)

ASK-Hypothesis

information need results from user’s anomalous state ofknowledge (ASK)

user is unable to precisely specify information need forremoving the ASK

instead: describe ASK

requires capture of cognitive and situation-specific aspects forresolving this anomaly

51 / 52

Page 71: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Information Retrieval: Introduction

Interactive Retrieval

Anomalous State of Knowledge

Anomalous State of Knowledge (ASK)(2)

ASK-Hypothesis

information need results from user’s anomalous state ofknowledge (ASK)

user is unable to precisely specify information need forremoving the ASK

instead: describe ASK

requires capture of cognitive and situation-specific aspects forresolving this anomaly

51 / 52

Page 72: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Information Retrieval: Introduction

Interactive Retrieval

Anomalous State of Knowledge

Anomalous State of Knowledge (ASK)(2)

ASK-Hypothesis

information need results from user’s anomalous state ofknowledge (ASK)

user is unable to precisely specify information need forremoving the ASK

instead: describe ASK

requires capture of cognitive and situation-specific aspects forresolving this anomaly

51 / 52

Page 73: Information Retrieval: Introduction · Information Retrieval: Introduction De nition of Information Retrieval De nition of Information Retrieval Information Retrieval (IR) is about

Information Retrieval: Introduction

Interactive Retrieval

Ingwersen’s Cognitive Model

Ingwersen’s Cognitive Model

Organiz.

SocialContext

Cultural

CognitiveActor(s)

(team)Interface

Informationobjects

IT: EnginesLogics

Algorithms

Cognitive transformations and influenceInteractive communications of cognitive structures

52 / 52