1 Quicklink Selection for Navigational Query Results Deepayan Chakrabarti (deepay@yahoo-inc.com)...

Quicklink Selection for Navigational Query Results

Deepayan Chakrabarti (deepay@yahoo-inc.com)

Ravi Kumar (ravikuma@yahoo-inc.com)

Kunal Punera (kpunera@yahoo-inc.com)

What are quicklinks

Quicklinks

Result Website

Quicklinks = URLs within the search result website
Enable fast navigation to important parts of the website
Which URLs should be QLs?

Quicklinks

Result Website

Quicklink Selection

Some obvious strategies don’t work very well

Top clicked URLs in search engine
  URL may have low relevance in the QL context: lib.utexas.edu/maps is popular for searches on “maps” and not for searches on “Univ. of Texas”
  URL may be too specific: automobiles.honda.com/civic-hybrid/exterior-photos.aspx for honda.com
  URL popularity may be time-sensitive: nytimes.com/election-guide/2008/ for nytimes.com

Quicklink Selection

Some obvious strategies don’t work very well

Top clicked URLs in search engine
Top visited URLs in toolbar data
  May not relate to search activity, e.g., for nytimes.com:
    #3 is nytimes.com/mem/emailthis.html
    #6 is nytimes.com/auth/login
    #8 is nytimes.com/gst/regi.html

Quicklink Selection

Some obvious strategies don’t work very well

Top clicked URLs in search engine
Top visited URLs in toolbar data
Top URLs from analysis of hyperlink graph
  Ignores preferences of search users
  Toolbar data is more representative
Heavily tagged URLs (e.g., del.icio.us/digg)
  Low coverage: too few websites

Quicklink Selection

Need a combined approach
  Search logs
  Toolbar data
  Web-server logs
  Website hyperlink graph
  User tags

This paper

Related Work

Sitemap generation [Perkowitz+/00]
Detection of hard-to-find URLs [Srikant+/01]
Improving website navigability [Doerr+/07]
Mining Web usage patterns [Buchner/99, Cadez+/03]
BrowseRank [Liu+/08]
Post-search browsing behavior [Bilenko+/08]

We focus on QLs in the context of Search

Outline

Motivation and Related Work
Problem Formulation
Proposed Solution
Experiments
Conclusions

Problem Formulation

Which k URLs should be QLs?

“The greatest good for the greatest number”

QLs save clicks
Maximize the total number of clicks saved using at most k QLs
But when exactly is a click “saved”?

Problem Formulation

When does a QL get clicked by the user?

Graph of click trails (Toolbar data)

Say we pick this node as a QL

nasa.gov

Hubble telescope

Photos

Problem Formulation

Assumption: The user recognizes if SearchResult → QL → Destination

nasa.gov

Hubble telescope

Photos

Problem Formulation

(saves 1 click each)

nasa.gov

Problem Formulation

(saves 1 click each)

(saves 2 clicks each)

(saves 0)

Total savings = 1*3 + 2*2 = 7 clicks
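The count above can be sketched in a few lines. This is only an illustration under stated assumptions: the trail contents, the page names, and the rule "a QL at depth d on a trail saves d clicks" are hypothetical choices made to reproduce the slide's arithmetic, not data or definitions taken from the paper.

```python
# Hypothetical toy trails: (pages visited in order, number of users on that trail).
# The first page of every trail is the search result's landing page.
TRAILS = [
    (["nasa.gov", "hubble"], 3),              # QL is 1 click below the landing page
    (["nasa.gov", "missions", "hubble"], 2),  # QL is 2 clicks below the landing page
    (["nasa.gov", "news"], 1),                # trail never reaches the QL
]


def clicks_saved(pages, quicklinks):
    """Assumed rule: a QL at depth d on the trail saves d clicks (0 if none)."""
    return max((depth for depth, page in enumerate(pages) if page in quicklinks),
               default=0)


def total_savings(trails, quicklinks):
    """Sum of clicks saved over all trails, weighted by trail counts."""
    return sum(count * clicks_saved(pages, quicklinks) for pages, count in trails)


print(total_savings(TRAILS, {"hubble"}))  # 1*3 + 2*2 = 7
```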

nasa.gov

Problem Formulation

However…

Unknown pages might become QLs

lyrics.com

A B C Z… These could become the “best” QLs

Problem Formulation

However…
Unknown pages might become QLs
Automatic-redirect pages might become QLs:
  nytimes.com forces logging in
  aaa.com forces zipcode entry

We need QLs that are “noticeable” in a search context

Problem Formulation

How can we estimate noticeability?
  Via search click-logs
Noticeability of a URL u:
  User notices a useful QL with probability α(u)
  α(u) is computed from the fraction of search clicks for u on the website and a tuning parameter (≈ 2)

Problem Formulation

(saves 0)

# trails × prob × # clicks saved

2 trails × α1 × 2 clicks
1 trail × α1 × 1 click
2 trails × (1-α2)α1 × 1 click
2 trails × α2 × 2 clicks

Total = 5α1 + 4α2 + 2(1-α2)α1

Assumption: The user picks the best QL that he/she notices

nasa.gov

Problem Formulation

(saves 0)

# trails × prob × # clicks saved

2 trails × α1 × 2 clicks
1 trail × α1 × 1 click
2 trails × (1-α2)α1 × 1 click
2 trails × α2 × 2 clicks

Total = 5α1 + 4α2 + 2(1-α2)α1

If only QL1 is perfectly noticeable (α1=1, α2=0): Total = 7 clicks (as if 1 QL only)

If both QLs are perfectly noticeable (α1=1, α2=1): Total = 9 clicks

nasa.gov
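The expected-savings computation can be sketched as follows. The trails, the page names, and the "a QL at depth d saves d clicks" rule are hypothetical and chosen only so that the toy data reproduces the two checks on the slide (7 clicks and 9 clicks). The one modeling rule taken from the slides is that the user picks the best QL that he/she notices.

```python
# Hypothetical trails: (pages in visit order, number of users on that trail).
TRAILS = [
    (["nasa.gov", "missions", "hubble"], 2),  # QL1 saves 2 clicks
    (["nasa.gov", "hubble"], 1),              # QL1 saves 1 click
    (["nasa.gov", "hubble", "photos"], 2),    # QL1 saves 1 click, QL2 saves 2
    (["nasa.gov", "news"], 1),                # saves 0
]


def expected_savings_on_trail(pages, alpha):
    """Expected clicks saved on one trail; alpha maps QL URL -> noticeability."""
    # (clicks saved, noticeability) for every QL on this trail, best first.
    options = sorted(
        ((depth, alpha[page]) for depth, page in enumerate(pages) if page in alpha),
        reverse=True,
    )
    expected = 0.0
    p_nothing_better_noticed = 1.0
    for saved, a in options:
        # This QL is used only if it is noticed and no better QL was noticed.
        expected += saved * a * p_nothing_better_noticed
        p_nothing_better_noticed *= 1.0 - a
    return expected


def expected_savings(trails, alpha):
    return sum(count * expected_savings_on_trail(pages, alpha)
               for pages, count in trails)


print(expected_savings(TRAILS, {"hubble": 1.0, "photos": 0.0}))  # 7.0
print(expected_savings(TRAILS, {"hubble": 1.0, "photos": 1.0}))  # 9.0
```

Here "hubble" plays the role of QL1 and "photos" the role of QL2.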

Problem Formulation

Which k URLs should be QLs?

Maximize the expected number of clicks saved using at most k QLs while incorporating “noticeability”

Outline

Motivation and Related Work
Problem Formulation
Proposed Solution
Experiments
Conclusions

Algorithms

Maximize expected number of saved clicks using k QLs: NP-hard

Theorem: This objective is non-decreasing submodular

1. Non-negative

2. Adding QLs never hurts

3. “Diminishing Returns”

Marginal improvement to set S ≥ marginal improvement to superset S’
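Written out, the three properties read as below. The symbol f(S) for the expected number of clicks saved by a QL set S, and u for a candidate URL, are notation introduced here for illustration; the slides state the properties only in words.

```latex
% f(S): expected number of clicks saved when S is the set of quicklinks (notation assumed)
\begin{align*}
  &\text{1. Non-negative:}        && f(S) \ge 0 \\
  &\text{2. Non-decreasing:}      && S \subseteq S' \;\Rightarrow\; f(S) \le f(S') \\
  &\text{3. Diminishing returns:} && S \subseteq S',\ u \notin S' \;\Rightarrow\;
      f(S \cup \{u\}) - f(S) \;\ge\; f(S' \cup \{u\}) - f(S')
\end{align*}
```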

Algorithms

Greedy algorithm:
  Iteratively pick QLs that increase the number of saved clicks the most
  Within a factor (1-1/e) of OPT [Nemhauser+/’78]
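A minimal sketch of the greedy selection, assuming a toy objective: trails are lists of pages with user counts, a QL at depth d saves d clicks, and the user takes the best QL noticed. The trail data, the noticeability values, and the function names are hypothetical. Only the greedy rule itself (repeatedly add the URL with the largest marginal gain) comes from the slide.

```python
def expected_savings(trails, alpha, quicklinks):
    """Expected clicks saved by `quicklinks` under the assumed toy model."""
    total = 0.0
    for pages, count in trails:
        options = sorted(
            ((depth, alpha.get(page, 0.0))
             for depth, page in enumerate(pages) if page in quicklinks),
            reverse=True,
        )
        p_nothing_better_noticed = 1.0
        for saved, a in options:
            total += count * saved * a * p_nothing_better_noticed
            p_nothing_better_noticed *= 1.0 - a
    return total


def greedy_quicklinks(trails, alpha, k):
    """Pick up to k QLs, each time adding the URL with the largest marginal gain."""
    candidates = {page for pages, _ in trails for page in pages[1:]}
    chosen = []
    for _ in range(k):
        current = expected_savings(trails, alpha, set(chosen))
        best, best_gain = None, 0.0
        for url in sorted(candidates - set(chosen)):  # sorted for determinism
            gain = expected_savings(trails, alpha, set(chosen) | {url}) - current
            if gain > best_gain:
                best, best_gain = url, gain
        if best is None:  # no remaining URL improves the objective
            break
        chosen.append(best)
    return chosen


# Hypothetical data for illustration only.
TRAILS = [
    (["nasa.gov", "missions", "hubble"], 2),
    (["nasa.gov", "hubble"], 1),
    (["nasa.gov", "hubble", "photos"], 2),
    (["nasa.gov", "news"], 1),
]
ALPHA = {"hubble": 0.8, "photos": 0.5, "missions": 0.3, "news": 0.1}

print(greedy_quicklinks(TRAILS, ALPHA, k=2))
```

Each round re-evaluates the objective for every remaining candidate, which is fine for a sketch but would need lazy evaluation or caching on real logs.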

Algorithms

However…
Inhomogeneous results: QLs for ea.com are
  fifa08.ea.com
  battlefield.ea.com
    (two games made by EA)
  6 webpages deep inside thesim2.ea.com
Redundant results: QLs for senate.gov include
  obama.senate.gov
  obama.senate.gov/about
  obama.senate.gov/contact
  obama.senate.gov/votes
    (parent URL makes the child URLs redundant)

Algorithms

Both can be specified as pairwise constraints on URLs allowed to belong to a QL set

Pairwise-constrained QL selection is NP-hard.

Two-step process:
  Heuristically find a large subset of trails that form a tree
  Enforce constraints on tree
    Dynamic program optimal on tree
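As an illustration of what a pairwise constraint could look like, the sketch below encodes only the redundancy case (a parent URL and its child may not both be QLs) and reports which pairs in a candidate set violate it. The predicate and function names are hypothetical, the parent/child test is a simple path-prefix check assumed here, and this is not the tree-based dynamic program described on the slide.

```python
from itertools import combinations
from urllib.parse import urlsplit


def _parts(url):
    """Split a scheme-less URL such as 'obama.senate.gov/about' into host and path segments."""
    split = urlsplit("//" + url)
    return split.netloc.lower(), [seg for seg in split.path.split("/") if seg]


def is_ancestor(parent, child):
    """Assumed redundancy test: same host, and parent's path is a proper prefix of child's."""
    p_host, p_path = _parts(parent)
    c_host, c_path = _parts(child)
    return (p_host == c_host
            and len(p_path) < len(c_path)
            and c_path[:len(p_path)] == p_path)


def violated_pairs(quicklinks):
    """All unordered pairs in the QL set that break the redundancy constraint."""
    return [(a, b) for a, b in combinations(sorted(quicklinks), 2)
            if is_ancestor(a, b) or is_ancestor(b, a)]


print(violated_pairs({"obama.senate.gov", "obama.senate.gov/about"}))
```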

Outline

Problem Formulation
Proposed Solution
Experiments
Conclusions

Experiments

Baseline Methods
TopClicked:
  URL score = # search clicks on URL
TopVisited:
  URL score = # occurrences on toolbar trails
PageRank:
  Build a weighted graph on URLs, where weight(i,j) = # trails using the i→j edge
  URL score = PageRank on this graph
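The PageRank baseline can be sketched with plain power iteration. The edge weights follow the slide (number of trails using each edge). The damping factor of 0.85, the fixed iteration count, the uniform handling of pages with no outgoing edges, and the toy trails are assumptions made for this example.

```python
from collections import defaultdict


def trail_pagerank(trails, damping=0.85, iterations=50):
    """Weighted PageRank where weight[i][j] = number of trails using the i -> j edge."""
    weight = defaultdict(lambda: defaultdict(float))
    nodes = set()
    for pages, count in trails:
        nodes.update(pages)
        for src, dst in zip(pages, pages[1:]):
            weight[src][dst] += count
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        new_rank = {u: (1.0 - damping) / n for u in nodes}
        for u in nodes:
            out_edges = weight.get(u)
            if out_edges:
                out_total = sum(out_edges.values())
                for v, w in out_edges.items():
                    new_rank[v] += damping * rank[u] * w / out_total
            else:
                # Page with no outgoing edges: spread its rank uniformly (assumption).
                for v in nodes:
                    new_rank[v] += damping * rank[u] / n
        rank = new_rank
    return rank


# Hypothetical trails: (pages in visit order, number of users).
TRAILS = [
    (["nasa.gov", "missions", "hubble"], 2),
    (["nasa.gov", "hubble"], 1),
    (["nasa.gov", "hubble", "photos"], 2),
    (["nasa.gov", "news"], 1),
]

scores = trail_pagerank(TRAILS)
for url in sorted(scores, key=scores.get, reverse=True):
    print(url, round(scores[url], 3))
```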

Experiments

Live Traffic dataset
  Computed CTRs on QLs currently displayed by Yahoo! (1043 website subset)
Measure:
  Pick two equal-size subsets of QLs
  Use sum-of-scores and sum-of-CTRs to predict the better subset
  Measure how often the predictions match
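The measure can be sketched as a pairwise agreement rate. This version enumerates every pair of distinct equal-size subsets and skips ties; the slides do not say how subsets are drawn or how ties are handled, so both choices, along with the toy scores and CTRs, are assumptions.

```python
from itertools import combinations


def prediction_agreement(scores, ctrs, subset_size):
    """Fraction of subset pairs where sum-of-scores and sum-of-CTRs pick the same winner."""
    urls = sorted(scores)
    subsets = list(combinations(urls, subset_size))
    matches = comparisons = 0
    for a, b in combinations(subsets, 2):
        score_diff = sum(scores[u] for u in a) - sum(scores[u] for u in b)
        ctr_diff = sum(ctrs[u] for u in a) - sum(ctrs[u] for u in b)
        if score_diff == 0 or ctr_diff == 0:
            continue  # ties carry no preference (assumption)
        comparisons += 1
        matches += (score_diff > 0) == (ctr_diff > 0)
    return matches / comparisons if comparisons else float("nan")


# Hypothetical algorithm scores and observed CTRs for three QLs of one website.
SCORES = {"a": 3.0, "b": 2.0, "c": 1.0}
CTRS = {"a": 0.30, "b": 0.20, "c": 0.05}

print(prediction_agreement(SCORES, CTRS, subset_size=2))
```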

Experiments

Live Traffic Data

[Figure: results by subset size]

QL-ALG > TopVisited > PageRank > TopClicked

Experiments

Tree-structured trails
  Most dropped trails are very short
  Tree-structured trails improve accuracy

[Figure: Live Traffic prediction quality comparison]

[Figure: Distribution of dropped trails; x-axis: Length of trail]

Outline

Problem Formulation
Proposed Solution
Experiments
Conclusions

Conclusions

Proposed a formulation for the QL selection problem
  Both toolbar and search logs are used intuitively
Proposed two algorithms:
  Greedy: (1-1/e)-optimal
  Tree-structured: empirically better
Improvement of 22% over competing baselines
