Algorithms for query result diversification

Swap algorithm for query result diversification

Emre Can Kucukoglu, eckucukoglu@gmail.com

Reference articles:

• C. Yu, L. Lakshmanan, and S. Amer-Yahia, “It takes variety to make a world: diversification in recommender systems,” in EDBT, 2009.

• Marcos R. Vieira, Humberto Luiz Razente, Maria Camila Nardini Barioni, Marios Hadjieleftheriou, Divesh Srivastava, Caetano Traina Jr., Vassilis J. Tsotras: On query result diversification. ICDE 2011: 1163-1174

Start with the k highest scoring documents, and swap the document that contributes the least to the function F with the next highest scoring document among the remaining documents. At each iteration, a candidate document with a lower relevance is swapped into the top-k set if and only if it increases the overall F value of the resulting set.

Function F is defined in terms of the following:

• λ is the tradeoff value between the sim function and the div function.

• k is the result set size.

• The sim function computes the sum of the similarity distances between the query and the documents in the set.

• The div function computes the sum of the diversity distances among all documents: for a given set of documents, the diversity distances of every pair of documents are added up.

• Similarity distances can be computed with tf-idf or other algorithms.
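The components above can be sketched in Python. This is a minimal sketch that assumes F has the form given in the cited Vieira et al. paper, F(q, R) = (k − 1)(1 − λ)·sim(q, R) + 2λ·div(R). The callables `d_sim` and `d_div`, the topic labels, and the 0/1 diversity distance are illustrative assumptions and do not come from the slides.

```python
from itertools import combinations


def sim(q, R, d_sim):
    """Sum of the similarity distances between the query and each document in R."""
    return sum(d_sim(q, d) for d in R)


def div(R, d_div):
    """Sum of the diversity distances over every unordered pair of documents in R."""
    return sum(d_div(a, b) for a, b in combinations(R, 2))


def F(q, R, k, lam, d_sim, d_div):
    """Objective value of R, assuming the form used by Vieira et al."""
    return (k - 1) * (1 - lam) * sim(q, R, d_sim) + 2 * lam * div(R, d_div)


# Toy data (hypothetical): relevance scores and one topic label per document.
scores = {"d4": 0.8, "d5": 0.7, "d1": 0.6, "d7": 0.5}
topics = {"d4": "a", "d5": "a", "d1": "b", "d7": "c"}


def toy_sim(q, d):
    # The query is ignored here because the scores are precomputed.
    return scores[d]


def toy_div(a, b):
    # Two documents count as diverse only when their topics differ.
    return 0.0 if topics[a] == topics[b] else 1.0


value = F("q", ["d4", "d5", "d1", "d7"], k=4, lam=0.5, d_sim=toy_sim, d_div=toy_div)
print(value)
```

With λ = 0 only relevance counts, and with λ = 1 only diversity counts, which is why λ is described as a tradeoff value.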

Let the candidate set be S = {d1, d2, d3, d4, d5, d6, d7, d8} and k = 4. The documents in decreasing order of relevance score:

d4   d5   d1   d7   d6   d2   d8   d3
0.8  0.7  0.6  0.5  0.4  0.3  0.2  0.1

STEP 1

Inside the while loop [lines 2-5], the first 4 documents are added to the result set R. These are d4, d5, d1 and d7. They are then removed from the candidate set S.

STEP 2

While the candidate set S still has documents, pick the highest scoring one. The first pick is d6, which is removed from S. R’ is then set to R.

STEP 3

Pick d7 from R [line 10] and compute the F value for A = {d4, d5, d1, d6} and R’ = {d4, d5, d1, d7}. Let F(q, A) > F(q, R’). Then update R’ with A = {d4, d5, d1, d6}.

STEP 4

Pick d1 from R [line 10] and compute the F value for A = {d4, d5, d6, d7} and R’ = {d4, d5, d1, d6}. Let F(q, A) < F(q, R’). R’ is left unchanged.

STEP 5

Pick d5 from R [line 10] and compute the F value for A = {d4, d6, d1, d7} and R’ = {d4, d5, d1, d6}. Let F(q, A) > F(q, R’). Then update R’ with A = {d4, d6, d1, d7}.

STEP 6

Pick d4 from R [line 10] and compute the F value for A = {d6, d5, d1, d7} and R’ = {d4, d6, d1, d7}. Let F(q, A) < F(q, R’). R’ is left unchanged.

STEP 7

After the inner for-loop [lines 10-12], if R’ has a higher F value than the set R the iteration started with, R is replaced by R’ [lines 13-14]. Let F(q, R’) > F(q, R), so R is replaced by R’.

STEP 8

While the candidate set S still has documents, pick the highest scoring one. The second pick is d2, which is removed from S. R’ is set to R again, and steps 3-6 are repeated with d2 as the candidate.
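The walkthrough above can be condensed into a short Python sketch. It is not the authors' code: it assumes the objective from Vieira et al., F(q, R) = (k − 1)(1 − λ)·sim(q, R) + 2λ·div(R), and the names `objective`, `swap`, `TOPICS` and the 0/1 topic-based diversity distance are hypothetical. Only the relevance scores are taken from the table in the example.

```python
from itertools import combinations


def objective(q, R, k, lam, d_sim, d_div):
    """F(q, R), assuming the form used by Vieira et al."""
    relevance = sum(d_sim(q, d) for d in R)
    diversity = sum(d_div(a, b) for a, b in combinations(R, 2))
    return (k - 1) * (1 - lam) * relevance + 2 * lam * diversity


def swap(q, S, k, lam, d_sim, d_div):
    """Greedy Swap: start from the top-k documents and try to swap in the rest."""
    # Visit the candidates in decreasing order of relevance.
    ranked = sorted(S, key=lambda d: d_sim(q, d), reverse=True)
    R, remaining = ranked[:k], ranked[k:]
    for cand in remaining:
        current_f = objective(q, R, k, lam, d_sim, d_div)
        best, best_f = list(R), current_f  # R' starts as a copy of R
        for out in R:
            # A is R with one document replaced by the candidate.
            A = [d for d in R if d != out] + [cand]
            f = objective(q, A, k, lam, d_sim, d_div)
            if f > best_f:
                best, best_f = A, f
        if best_f > current_f:  # keep the best swap only if it improves F
            R = best
    return R


# Relevance scores from the example; the topic labels are made up.
SCORES = {"d4": 0.8, "d5": 0.7, "d1": 0.6, "d7": 0.5,
          "d6": 0.4, "d2": 0.3, "d8": 0.2, "d3": 0.1}
TOPICS = {"d4": "a", "d5": "a", "d1": "a", "d7": "a",
          "d6": "b", "d2": "c", "d8": "d", "d3": "a"}


def toy_sim(q, d):
    return SCORES[d]


def toy_div(a, b):
    return 0.0 if TOPICS[a] == TOPICS[b] else 1.0


result = swap("q", SCORES.keys(), k=4, lam=0.5, d_sim=toy_sim, d_div=toy_div)
print(sorted(result))
```

With λ = 0 the objective reduces to relevance, so no swap can improve on the initial top-k set and the function returns it unchanged.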

If we assume the complexity of evaluating function F is O(C), the overall complexity of the Swap algorithm is O(N·k·C), where N is the size of the candidate set S.
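The bound can be checked by counting how many candidate sets the two nested loops examine. The sketch below is an illustration only; `count_swap_evaluations` is a hypothetical helper, and it assumes one evaluation of F per candidate set.

```python
def count_swap_evaluations(n, k):
    """Count the candidate sets examined by the Swap loops for n documents."""
    evaluations = 0
    for _ in range(n - k):  # one pass per document left in S after the top-k
        for _ in range(k):  # try replacing each of the k documents in R
            evaluations += 1
    return evaluations


# The example above has N = 8 and k = 4.
examined = count_swap_evaluations(8, 4)
print(examined)
```

The count is (N − k)·k, which is bounded by N·k, and each examined set costs one evaluation of F.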

The F value of the final result R is not guaranteed to be optimal, since the documents in the candidate set S are considered only in order of their similarity distances.

That is, this method does not consider the order of diversity distances in S.

MMR* algorithm for query result diversification

Emre Can Kucukoglu, eckucukoglu@gmail.com

Reference articles:

• J. Carbonell and J. Goldstein, “The use of MMR, diversity-based reranking for reordering documents and producing summaries,” in Proc. ACM SIGIR, 1998.

• Marcos R. Vieira, Humberto Luiz Razente, Maria Camila Nardini Barioni, Marios Hadjieleftheriou, Divesh Srivastava, Caetano Traina Jr., Vassilis J. Tsotras: On query result diversification. ICDE 2011: 1163-1174

*: Maximal Marginal Relevance

The MMR algorithm iteratively constructs the result set R by selecting a new document in S that maximizes the function mmr, which is defined in terms of the following:

• λ is the tradeoff value between the δsim function and the δdiv function.

• k is the result set size.

• The δsim function computes the similarity distance between the query and a document.

• The δdiv function computes the diversity distance between a pair of documents.

• Similarity distances can be computed with tf-idf or other algorithms.

Since R is empty in the initial iteration, |R| is 0, so the element with the highest δsim in S is always included in R, regardless of the λ value.
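A minimal Python sketch of the scoring function follows. It assumes the form given in the cited Vieira et al. paper, mmr(d) = (1 − λ)·δsim(d, q) + (λ / |R|)·Σ δdiv(d, r) over r in R, with the diversity term dropped while R is empty. The helper names, topic labels and 0/1 diversity distance are illustrative assumptions.

```python
def mmr(q, d, R, lam, d_sim, d_div):
    """mmr value of candidate d against the current result set R."""
    relevance = (1 - lam) * d_sim(q, d)
    if not R:
        # Empty R: only relevance counts, so the top-scoring document wins.
        return relevance
    avg_diversity = sum(d_div(d, r) for r in R) / len(R)
    return relevance + lam * avg_diversity


# Toy data (hypothetical topics; the scores follow the example table).
scores = {"d4": 0.8, "d5": 0.7, "d6": 0.4}
topics = {"d4": "a", "d5": "a", "d6": "b"}


def toy_sim(q, d):
    return scores[d]


def toy_div(a, b):
    return 0.0 if topics[a] == topics[b] else 1.0


first = mmr("q", "d6", [], lam=0.5, d_sim=toy_sim, d_div=toy_div)
later = mmr("q", "d6", ["d4", "d5"], lam=0.5, d_sim=toy_sim, d_div=toy_div)
print(first, later)
```

Dividing by |R| turns the diversity term into an average, so it stays on the same scale as the relevance term while R grows.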

Let the candidate set be S = {d1, d2, d3, d4, d5, d6, d7, d8} and k = 4. The documents in decreasing order of δsim score:

d4   d5   d1   d7   d6   d2   d8   d3
0.8  0.7  0.6  0.5  0.4  0.3  0.2  0.1

STEP 1

Since R is initially empty, the first element is picked according to similarity distances alone. d4 has the highest δsim score, so it is added to R and removed from S.

STEP 2-4

In every following step, pick the document in S with the highest mmr value, until |R| = k. Let d3, d6 and d8 have the highest scores.

With each evaluation of mmr, |R| grows, so the weight given to each individual diversity distance decreases.
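The steps above can be sketched as a greedy loop in Python. This is a sketch under stated assumptions, not the authors' implementation: the mmr form follows Vieira et al., and `mmr_select`, `TOPICS` and the 0/1 topic-based diversity distance are hypothetical. Because the toy diversity values are made up, the selected documents differ from the d3, d6, d8 assumed in the slides.

```python
def mmr_select(q, S, k, lam, d_sim, d_div):
    """Greedy MMR: repeatedly move the document with the highest mmr value into R."""
    candidates = list(S)
    R = []

    def score(d):
        relevance = (1 - lam) * d_sim(q, d)
        if not R:  # first pick depends on relevance only
            return relevance
        return relevance + lam * sum(d_div(d, r) for r in R) / len(R)

    while candidates and len(R) < k:
        best = max(candidates, key=score)
        candidates.remove(best)
        R.append(best)
    return R


# Relevance scores from the example; the topic labels are made up.
SCORES = {"d4": 0.8, "d5": 0.7, "d1": 0.6, "d7": 0.5,
          "d6": 0.4, "d2": 0.3, "d8": 0.2, "d3": 0.1}
TOPICS = {"d4": "a", "d5": "a", "d1": "a", "d7": "a",
          "d6": "b", "d2": "c", "d8": "d", "d3": "a"}


def toy_sim(q, d):
    return SCORES[d]


def toy_div(a, b):
    return 0.0 if TOPICS[a] == TOPICS[b] else 1.0


picked = mmr_select("q", SCORES.keys(), k=4, lam=0.5, d_sim=toy_sim, d_div=toy_div)
print(picked)
```

The returned list is in selection order, so its first element is always the document with the highest δsim.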

Since the result is constructed incrementally by inserting a new element into the previous results, the first chosen element has a large influence on the quality of the final result set R.

Moreover, experimental results show that the quality of the results for the MMR method degrades very quickly as the λ parameter increases.

If we assume the complexity of picking the highest scoring document according to function mmr is O(C), the overall complexity of the MMR algorithm is O(k·C).
