Probabilistic Models of Novel Document Rankings for Faceted Topic Retrieval

Probabilistic Models of Novel Probabilistic Models of Novel Document Rankings for Document Rankings for Faceted Topic RetrievalFaceted Topic RetrievalBen Cartrette and Praveen Chandar

Dept. of Computer and Information ScienceUniversity of Delaware

Newark, DE ( CIKM ’09 )

Date: 2010/05/03Speaker: Lin, Yi-JhenAdvisor: Dr. Koh, Jia-Ling

AgendaAgendaIntroduction

- Motivation, GoalFaceted Topic Retrieval

- Task, EvaluationFaceted Topic Retrieval Models

- 4 kinds of modelsExperiment & ResultsConclusion

Introduction - Motivation Introduction - Motivation Modeling documents as independently

relevant does not necessarily provide the optimal user experience.

Traditional evaluation measure

would reward System1 since it has higher

recall

Introduction - MotivationIntroduction - Motivation

Actually, we prefer System2 (since it has more

information)

System2 is better !

Introduction Introduction Novelty and diversity become the

new definition of relevance and evaluation measures .

They can be achieved through retrieving documents that are relevant to query, but cover different facets of the topic.

we call faceted topic retrieval !

Introduction - Goal Introduction - Goal The faceted topic retrieval

system must be able to find a small set of documents that covers all of the facets

3 documents that cover 10 facets is preferable to 5 documents that cover 10 facets

Faceted Topic Retrieval - Faceted Topic Retrieval - TaskTaskDefine the task in terms ofInformation need :

A faceted topic retrieval information need is one that has a set of answers – facets – that are clearly delineated

How that need is best satisfied :Each answer is fully contained within at least one document

Faceted Topic Retrieval - Faceted Topic Retrieval - TaskTask

Information need

invest in next generation technologies

increase use of renewable energy sourcesInvest in renewable energy sources

double ethanol in gas supply

shift to biodiesel

shift to coal

Facets (a set of answers)

Faceted Topic Retrieval Faceted Topic Retrieval A Query :A sort list of keywords

A ranked list of documents that contain as many

unique facets as possible.

Faceted Topic Retrieval -Faceted Topic Retrieval -EvaluationEvaluationS-recallS-precisionRedundancy

Evaluation – Evaluation – an example for S-recall and S-precisionan example for S-recall and S-precisionTotal : 10 facets (assume all facets

in documents are non-overlapped)

Evaluation – Evaluation – an example for Redundancyan example for Redundancy

Faceted topic retrieval Faceted topic retrieval modelsmodels4 kinds of models

- MMR (Maximal Marginal Relevance)- Probabilistic Interpretation of MMR- Greedy Result Set Pruning- A Probabilistic Set-Based Approach

1. MMR1. MMR

2. Probabilistic 2. Probabilistic Interpretation of MMRInterpretation of MMR

Let c1=0, c3=c4

3. Greedy Result Set 3. Greedy Result Set PruningPruningFirst, rank without considering

novelty (in order of relevance)Second, step down the list of

documents, prune documents with similarity greater than some threshold ϴ

I.e., at rank i, remove any document Dj, j > i, with sim(Dj,Di) > ϴ

4. A Probabilistic Set-Based 4. A Probabilistic Set-Based ApproachApproach P(F ϵ D) :Probability of D contains Fthe probability that a facet Fj occurs

in at least one document in a set D is

the probability that all of the facets in a set F are captured by the documents D is

4. A Probabilistic Set-Based 4. A Probabilistic Set-Based ApproachApproach4.1 Hypothesizing Facets4.2 Estimating Document-Facet

Probabilities4.3 Maximizing Likelihood

4.1 Hypothesizing Facets4.1 Hypothesizing FacetsTwo unsupervised probabilistic methods

:Relevance modelingTopic modeling with LDA

Instead of extract facets directly from any particular word or phrase, we build a “ facet model ”P(w|F)

4.1 Hypothesizing Facets4.1 Hypothesizing FacetsSince we do not know the facet

terms or the set of documents relevant to the facet, we will estimate them from the retrieved documents

Obtain m models from the top m retrieved documents by taking each document along with its k nearest neighbors as the basis for a facet model

Relevance modelingRelevance modelingEstimate m ”facet models“ P(w|Fj) from a set of retrieved documents using the so-called RM2 approach:

DFj : the set of documents relevant to facet Fj fk : facet terms

Topic modeling with LDATopic modeling with LDAProbabilistic P(w|Fj) and P(Fj)

can found through expectation maximization

4.2 Estimating Document-4.2 Estimating Document-Facet ProbabilitiesFacet ProbabilitiesBoth the facet relevance model and

LDA model produce generation probabilistic P(Di|Fj)

P(Di|Fj) : the probability that sampling terms from the facet model Fj will produce document Di

4.3 Maximizing Likelihood4.3 Maximizing LikelihoodDefine the likelihood function

Constrain : K : hypothesized minimum number

required to cover the facetsMaximizing L(y) is a NP-Hard problemApproximate solution :For each facet Fj, take the document

Di with maximum

Experiment - DataExperiment - DataA Query :A sort list of keywords

Probabilistic Models of Novel Document Rankings for Faceted Topic Retrieval

Documents

Transcript of Probabilistic Models of Novel Document Rankings for Faceted Topic Retrieval

Browsing-oriented Semantic Faceted Search

Beyond Basic Faceted Search

Many-Faceted Rasch Modeling Expert Judgment in Test ...libres.uncg.edu/ir/uncg/f/A_Chen_Many_1998.pdf · Many-Faceted Rasch Modeling Expert Judgment in Test Development ... Many-faceted

Faceted Identities

Rocket NXT Faceted Search

Dynamic Taxonomies and Faceted Search

Guided Navigation (Faceted Search)

Fund Performance Rankings...2019/04/30 · Bond Fund Rankings 5 Hybrid Fund Rankings 10 Stock Fund Rankings 14 Fund Directory 37 Fund Performance Rankings | SoundMindInvesting.com

Faceted Search

Track Your Keyword Rankings and Competitors Rankings - Live

A CROSS-DISCIPLINARY 2019-2020 RESEARCH UNIVERSITY ...€¦ · European Graduate School of Business Rankings Masters in Management Rankings Asia Rankings Emerging Economies Rankings

Beads Catalogue November 2009 - Ribtex · Acrylic Faceted Heart 17x20mm Crystal 12pcs ACR112 Acrylic Faceted Prism 33x11mm Crystal 6pcs ACR120 Acrylic Faceted Round 15mm Black 10pcs

A semantic faceted web?

Probabilistic Models of Novel Document Rankings for Faceted Topic Retrieval

Faceted Search with Lucene

IPC Athletics Rankings Official World Rankings 2015

A Faceted Eye on Intellectual Giftedness: Examining the ... · A Faceted Eye on Intellectual Giftedness: Examining the ... The study examines the personality profile of ... A FACETED

Fast Faceted Search

Map4rdf - Faceted Browser for Geospatial Datasets · Map4rdf - Faceted Browser for Geospatial Datasets WorkdistributedunderthelicenseCreativeCommonsAttribution-Noncommercial-Share

IPC Athletics Rankings Official World Rankings 2015islandstats.com › docs › istatstrack160103.pdf · IPC Athletics Rankings Official World Rankings 2015 Outdoor 26 Luna Rodriguez,