Qinqing Gan, Torsten Suel: Improved Techniques for Result Caching in Web Search Engines. Presenter: Arghyadip Konark


Page 1:

Qinqing Gan

Torsten Suel

Improved Techniques for Result Caching in Web Search Engines

Presenter: Arghyadip Konark

Page 2:

Summary: Result caching in web search engines

(1) Query result caching in search engines to improve query processing performance.

(2) Increasing the effective throughput of the entire search engine system.

(3) Discussion of various weighted, unweighted, and hybrid query result caching techniques.

(4) Performance evaluation.

Page 3:

Query Processing

• The main challenge for query processing is the significant size of the index data that must be accessed for a query

• We need to optimize in order to scale with users and data

• Caching is one such optimization:
  - Result caching: has the query occurred before?
  - List caching: has the index data for a term been accessed before?

Page 4:

Query Co-ordinator

Page 5:

Related Work

• A number of subsequent papers on result caching (considering cache hit ratio only):

• Baeza-Yates et al. (SPIRE 2003, 2007; SIGIR 2003)

• Fagni et al. (TOIS 2006)

• Lempel/Moran (WWW 2003)

• Saraiva et al. (SIGIR 2001)

• Xie/O'Hallaron (Infocom 2002)

• Fagni et al. propose hybrid methods that combine a dynamic cache with a more static cache

• Baeza-Yates et al. (SPIRE 2007) use query features for the cache admission policy

Page 6:

Caching Basics

• LRU: least recently used

• LFU: least frequently used

• Both can be implemented using basic data structures: the score is the time of the most recent occurrence of the query for LRU, or the frequency of the query for LFU; evict the query with the smallest score

• Recency (LRU) vs. frequency (LFU)

• Various hybrids combine two or more policies
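A minimal sketch of the two basic policies, assuming unit-size entries and a hypothetical `compute` callback that runs the query on a miss. It uses plain Python structures (an `OrderedDict` kept in recency order for LRU, a frequency `Counter` for LFU) and is an illustration, not the implementation used in the paper.

```python
from __future__ import annotations

from collections import Counter, OrderedDict
from typing import Callable


class LRUCache:
    """Result cache that evicts the least recently used query."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: OrderedDict[str, str] = OrderedDict()

    def access(self, query: str, compute: Callable[[str], str]) -> tuple[str, bool]:
        """Return (result, hit); on a miss, compute the result and admit it."""
        if query in self.entries:
            self.entries.move_to_end(query)  # mark as most recently used
            return self.entries[query], True
        result = compute(query)  # hypothetical query-processing callback
        self.entries[query] = result
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # drop the least recently used
        return result, False


class LFUCache:
    """Result cache that evicts the cached query with the lowest frequency."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: dict[str, str] = {}
        self.freq: Counter[str] = Counter()  # counts all occurrences, cached or not

    def access(self, query: str, compute: Callable[[str], str]) -> tuple[str, bool]:
        """Return (result, hit); on a miss, compute the result and admit it."""
        self.freq[query] += 1
        if query in self.entries:
            return self.entries[query], True
        result = compute(query)
        self.entries[query] = result
        if len(self.entries) > self.capacity:
            # O(n) scan for clarity; a heap or frequency buckets would be faster.
            victim = min(self.entries, key=lambda q: self.freq[q])
            del self.entries[victim]
        return result, False
```

Keeping LFU counts for queries that are not currently cached is a design choice of this sketch; it lets a query that was evicted earlier regain priority quickly.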

Page 7:

SDC (Static and Dynamic Caching)

LFU (static part) + LRU (dynamic part)

Alpha = 0.7

Fagni et al. (TOIS 2006)
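A sketch of the SDC idea, assuming that `Alpha = 0.7` is the fraction of the cache assigned to the static part, that the static part holds the most frequent queries of a past (training) log, and that the dynamic part is plain LRU. The class name and parameters are hypothetical, and only hit/miss status is tracked, not the cached results.

```python
from __future__ import annotations

from collections import Counter, OrderedDict


class SDCCache:
    """Static part: top queries from a training log. Dynamic part: LRU."""

    def __init__(self, capacity: int, training_log: list[str], alpha: float = 0.7) -> None:
        # Assumption: alpha is the share of capacity given to the static part.
        static_size = round(alpha * capacity)
        self.dynamic_capacity = capacity - static_size
        top = Counter(training_log).most_common(static_size)
        self.static = {query for query, _ in top}  # never evicted
        self.dynamic: OrderedDict[str, bool] = OrderedDict()

    def access(self, query: str) -> bool:
        """Process one query; return True on a cache hit."""
        if query in self.static:
            return True
        if query in self.dynamic:
            self.dynamic.move_to_end(query)  # LRU update
            return True
        if self.dynamic_capacity > 0:
            self.dynamic[query] = True
            if len(self.dynamic) > self.dynamic_capacity:
                self.dynamic.popitem(last=False)  # evict least recently used
        return False
```

The static part captures the stable head of the Zipf distribution, while the dynamic part absorbs bursts of recently repeated queries.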

Page 8:

Characteristics of Queries (AOL Query Log)

• Query frequencies follow a Zipf distribution

• While a few queries are quite frequent, most queries occur only once or a few times

[Figure: query frequencies on a double logarithmic scale]
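Under a Zipf law, frequency falls as a power of rank, f(r) ≈ C / r^s, so the rank-frequency points lie on a straight line of slope -s on a double logarithmic plot. The sketch below estimates s by an ordinary least-squares fit in log-log space; this is a crude estimator chosen for illustration (maximum-likelihood fits are more robust) and not a method taken from the slides.

```python
import math
from collections import Counter


def zipf_exponent(queries: list[str]) -> float:
    """Estimate the Zipf exponent s from a query log (needs >= 2 distinct queries)."""
    freqs = sorted(Counter(queries).values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return -cov / var  # slope of the log-log line is -s
```

On a real log the long tail of queries seen exactly once creates a flat run of ties, which pulls a least-squares fit; fitting only the head of the ranking is a common workaround.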

Page 9:

Characteristics of Queries

• Query traces exhibit some amount of burstiness, i.e., repeated occurrences of a query tend to cluster in time (even though most queries occur only once or twice)

• A significant part of this burstiness is due to the same user reissuing a query to the engine

• With an assumed query arrival rate of 132 queries per minute, most repeated queries recur within a few minutes or an hour
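The recurrence times above can be measured by recording, for every repeated query, how many log positions separate it from its previous occurrence, then converting that distance to minutes at the assumed constant rate of 132 queries per minute. The function name is hypothetical, and the constant-rate conversion follows the slide's assumption rather than using the log's own timestamps.

```python
def repeat_gaps_minutes(queries: list[str], rate_per_minute: float = 132.0) -> list[float]:
    """Gap in minutes between each repeated query and its previous occurrence."""
    last_seen: dict[str, int] = {}
    gaps: list[float] = []
    for position, query in enumerate(queries):
        if query in last_seen:
            # Distance in queries, converted to time at a constant arrival rate.
            gaps.append((position - last_seen[query]) / rate_per_minute)
        last_seen[query] = position
    return gaps
```

A histogram of these gaps shows how much of the repetition a small, recency-based (LRU) cache can capture.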

Page 10:

Only Cache Hits?

• Query result fails.

• Frequent admission and eviction occur.

Page 11:

Approach:

• Study result caching as a weighted caching problem:
  - Hit ratio
  - Cost saving

• Hybrid algorithms for weighted caching

Page 12:

Weighted Caching

• Assume all cache entries have the same size

• Standard caching: all entries also have the same cost

• Weighted caching: entries have different costs

• Result caching: some queries are more expensive to recompute than others

• In fact, costs are highly skewed

• We should keep expensive results longer
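The difference between the two goals shows up in how a cache is scored. The sketch below, with a hypothetical function name, computes both metrics for a trace of (query, cost) pairs and a per-request hit flag; "cost" stands for whatever recomputation cost is measured, for example processing time (an assumption).

```python
def hit_ratio_and_cost_saving(trace: list[tuple[str, float]], hits: list[bool]) -> tuple[float, float]:
    """Return (hit ratio, fraction of total recomputation cost saved)."""
    total_cost = sum(cost for _, cost in trace)
    saved = sum(cost for (_, cost), hit in zip(trace, hits) if hit)
    return sum(hits) / len(trace), saved / total_cost
```

For the trace `[("cheap", 1.0), ("cheap", 1.0), ("costly", 9.0), ("costly", 9.0)]`, a cache that hits only the repeated cheap query and one that hits only the repeated costly query both reach a hit ratio of 0.25, but they save 5% and 45% of the cost respectively.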

Page 13:

Weighted Caching Algorithms

• LFU_w: evict the entry with the smallest value of past frequency * cost (weighted version of LFU)

• Landlord:
  - On insertion, give the entry a deadline equal to its cost
  - Evict the entry with the smallest deadline, and deduct this deadline from all other deadlines in the cache
  - Weighted version of LRU (Young, Cao/Irani 1998)

• SDC_w: combination of LFU_w and Landlord
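Deducting the evicted deadline from every cached entry costs O(n) per eviction. A common implementation trick for GreedyDual-style algorithms, used in this sketch, is to raise a global offset instead: each entry stores offset-at-insertion + cost, so its remaining deadline is always the stored value minus the current offset. The class is hypothetical, assumes unit-size entries and capacity >= 1, and its hit behavior (restoring the full deadline) is one common choice, not necessarily the paper's.

```python
from __future__ import annotations

import heapq


class LandlordCache:
    """Landlord-style weighted cache with lazy deadline deduction."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.offset = 0.0  # total deadline deducted from everyone so far
        self.deadline: dict[str, float] = {}  # stored value = offset + cost
        self.heap: list[tuple[float, str]] = []  # may contain stale entries

    def access(self, query: str, cost: float) -> bool:
        """Process one query; return True on a cache hit."""
        if query in self.deadline:
            # Assumption: a hit restores the full deadline (one common choice).
            self._set_deadline(query, cost)
            return True
        if len(self.deadline) >= self.capacity:
            self._evict()
        self._set_deadline(query, cost)
        return False

    def _set_deadline(self, query: str, cost: float) -> None:
        value = self.offset + cost
        self.deadline[query] = value
        heapq.heappush(self.heap, (value, query))

    def _evict(self) -> None:
        while True:
            value, query = heapq.heappop(self.heap)
            if self.deadline.get(query) == value:  # skip outdated heap entries
                break
        del self.deadline[query]
        # Equivalent to deducting the victim's remaining deadline from all others.
        self.offset = value
```

With capacity 2 and the sequence `a` (cost 5), `b` (cost 1), `c` (cost 3), `d` (cost 10), the cheap entries `b` and then `c` are evicted first, while the expensive `a` stays cached, which is exactly the "keep expensive results longer" behavior.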

Page 14:

Hit Ratio of Basic Algorithms

Page 15:

Cost Reduction

Page 16:

New Hybrid Algorithms

• SDC

• lru_lfu

• landlord_lfu_w

Page 17:

Weighted Caching and Power Laws

• Problem with weighted caching under high skew:
  - Suppose q_1 has occurred once and has cost 10, and q_2 has occurred 10 times and has cost 1
  - LFU_w gives both the same priority (10 * 1 = 1 * 10). Is that right?

• Lottery analogy:
  - Multiple rounds, one winner per round
  - Some people buy more tickets than others
  - But each person buys the same number each week
  - Given past history, guess future winners
  - Suppose ticket sales are Zipfian
  - Someone who has won only once most likely bought few tickets and got lucky, so a single past win overstates their future chances

Page 18:

Weighted Caching and Power Laws

• Compare: smoothing techniques in language models

• Three solutions:
  - Good-Turing estimator
  - Estimator derived from the power law
  - Pragmatic: fit correction factors from real data

• The last solution subsumes the others
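The Good-Turing estimator replaces an observed count r by the adjusted count r* = (r + 1) · N_{r+1} / N_r, where N_r is the number of distinct queries seen exactly r times. The sketch below (hypothetical function name) applies the raw formula to a query log; under a Zipfian distribution N_1 is much larger than N_2, so r* for a query seen once comes out well below 1, matching the lottery intuition. The raw estimate is noisy for large r, where N_r is sparse, which is one reason to smooth N_r or, as the slide suggests, fit correction factors directly from data.

```python
from collections import Counter


def good_turing_counts(queries: list[str]) -> dict[int, float]:
    """Map each observed frequency r to its Good-Turing adjusted count r*.

    r* = (r + 1) * N_{r+1} / N_r; frequencies with N_{r+1} == 0 are skipped.
    """
    n_r = Counter(Counter(queries).values())  # N_r: how many queries occur r times
    return {r: (r + 1) * n_r[r + 1] / n_r[r] for r in n_r if n_r[r + 1] > 0}
```

For a log in which six queries occur once and two occur twice, the adjusted count for the singletons is 2 · 2 / 6, roughly 0.67.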

Page 19:

Weighted Zipfian Caching

Frequency    g()
1            0.05
2            0.25
3            0.35
4            0.75
>=5          1.0

E.g., in LFU_w, priority score = cost * frequency * g(frequency)
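Applying the correction table to the earlier q_1/q_2 example shows how it breaks the LFU_w tie. This is a minimal sketch with hypothetical function names; the g() values are copied from the table above and frequencies are assumed to be at least 1.

```python
def g(frequency: int) -> float:
    """Correction factor from the table; frequencies >= 5 get 1.0."""
    table = {1: 0.05, 2: 0.25, 3: 0.35, 4: 0.75}
    return table.get(frequency, 1.0)


def corrected_priority(cost: float, frequency: int) -> float:
    """LFU_w priority with the Zipfian correction; the smallest is evicted first."""
    return cost * frequency * g(frequency)
```

Plain LFU_w scores both queries 10, but with the correction q_1 (cost 10, seen once) scores 10 * 1 * 0.05 = 0.5 while q_2 (cost 1, seen 10 times) scores 1 * 10 * 1.0 = 10, so q_1 is evicted first.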

Page 20:

Hybrid Algorithms After Adding Correction

Page 21:

Dataset and Evaluations

• 2006 AOL query log with 36 million queries

• 4GB of data collected as HTML pages from Quora

• The Lemur search engine has no support for result caching

• Plan to develop weighted LRU, LFU, and SDC result caching on top of Lemur

• Compare the performance of all the above caching variants with different weights assigned to hit ratio and load

• Evaluate which weight metric works best

Page 22:

Evaluation Methodology

Page 23:

Questions?