Transcript of "An Interactive Learning Approach to Optimizing Information Retrieval Systems" (50 slides)

Page 1:

An Interactive Learning Approach to Optimizing Information Retrieval Systems

Yahoo! August 24th, 2010

Yisong Yue, Cornell University

Page 2:

Information Systems

Page 3:

Designer

How to collect feedback?
• What to ask human judges?
• What results to show users?

How to interpret feedback?
• What is the utility function?
• What are useful signals?

Optimization

Optimize Utility Function
• Tune parameters (signals)

Collecting Feedback

Explicit Judgments
• Is this document good?

Usage Logs (clicks)
• Clicks on search results.



Page 8:

Interactive Learning Setting

• Find the best ranking function (out of 1000)
  – Show results, evaluate using click logs
  – Clicks biased (users only click on what they see)
  – Explore / exploit problem

• Technical issues
  – How to interpret clicks?
  – What is the utility function?
  – What results to show users?


Page 10:

Team-Game Interleaving

(u = thorsten, q = "svm")

f1(u,q) → r1:
1. Kernel Machines  http://svm.first.gmd.de/
2. Support Vector Machine  http://jbolivar.freeservers.com/
3. An Introduction to Support Vector Machines  http://www.support-vector.net/
4. Archives of SUPPORT-VECTOR-MACHINES ...  http://www.jiscmail.ac.uk/lists/SUPPORT...
5. SVM-Light Support Vector Machine  http://ais.gmd.de/~thorsten/svm light/

f2(u,q) → r2:
1. Kernel Machines  http://svm.first.gmd.de/
2. SVM-Light Support Vector Machine  http://ais.gmd.de/~thorsten/svm light/
3. Support Vector Machine and Kernel ... References  http://svm.research.bell-labs.com/SVMrefs.html
4. Lucent Technologies: SVM demo applet  http://svm.research.bell-labs.com/SVT/SVMsvt.html
5. Royal Holloway Support Vector Machine  http://svm.dcs.rhbnc.ac.uk

Interleaving(r1, r2):
1. Kernel Machines (T2)  http://svm.first.gmd.de/
2. Support Vector Machine (T1)  http://jbolivar.freeservers.com/
3. SVM-Light Support Vector Machine (T2)  http://ais.gmd.de/~thorsten/svm light/
4. An Introduction to Support Vector Machines (T1)  http://www.support-vector.net/
5. Support Vector Machine and Kernel ... References (T2)  http://svm.research.bell-labs.com/SVMrefs.html
6. Archives of SUPPORT-VECTOR-MACHINES ... (T1)  http://www.jiscmail.ac.uk/lists/SUPPORT...
7. Lucent Technologies: SVM demo applet (T2)  http://svm.research.bell-labs.com/SVT/SVMsvt.html

Interpretation: (r1 ≻ r2) ↔ clicks(T1) > clicks(T2)

Invariant: for all k, in expectation the top k contains the same number of team members from each team.

[Radlinski, Kurup, Joachims; CIKM 2008]
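The drafting procedure on this slide can be sketched in code. This is an illustrative implementation of team-draft interleaving consistent with the invariant above; the function name and the exact tie-breaking details are my own, not taken verbatim from the cited paper.

```python
import random

def team_draft_interleave(r1, r2, rng=None):
    """Team-draft interleaving of two rankings (after Radlinski et al., CIKM 2008).

    Teams take turns drafting: the team with fewer picks so far goes next
    (coin flip on ties) and contributes its highest-ranked result not yet
    shown. Returns the interleaved list and each result's team ('T1'/'T2').
    """
    rng = rng or random.Random()
    rankings = {'T1': r1, 'T2': r2}
    counts = {'T1': 0, 'T2': 0}
    interleaved, teams, seen = [], [], set()
    while any(d not in seen for r in rankings.values() for d in r):
        if counts['T1'] == counts['T2']:
            team = rng.choice(['T1', 'T2'])      # coin flip on ties
        else:
            team = min(counts, key=counts.get)   # team that is behind picks next
        pick = next((d for d in rankings[team] if d not in seen), None)
        if pick is not None:
            seen.add(pick)
            interleaved.append(pick)
            teams.append(team)
        counts[team] += 1  # advance even on an empty pick, so the draft
                           # terminates once one team's list is exhausted
    return interleaved, teams
```

At evaluation time, clicks on results tagged T1 are credited to f1 and clicks on T2 to f2, realizing the interpretation rule above.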

Page 11:

Setting

• Find the best ranking function (out of 1000)
  – Evaluate using click logs
  – Clicks biased (users only click on what they see)
  – Explore / exploit problem

• Technical issues
  – How to interpret clicks?
  – What is the utility function?
  – What results to show users?


Page 13:

Dueling Bandits Problem

• Given K bandits b1, ..., bK

• Each iteration: compare (duel) two bandits
  – E.g., interleaving two retrieval functions

• Cost function (regret):

  R_T = Σ_{t=1}^{T} [ P(b* ≻ b_t) + P(b* ≻ b_t′) − 1 ]

  – (b_t, b_t′) are the two bandits chosen at iteration t
  – b* is the overall best one
  – (% of users who prefer the best bandit over the chosen ones)

[Yue & Joachims, ICML 2009] [Yue, Broder, Kleinberg, Joachims, COLT 2009]

Page 14:

R_T = Σ_{t=1}^{T} [ P(b* ≻ b_t) + P(b* ≻ b_t′) − 1 ]

• Example 1: P(f* > f) = 0.9, P(f* > f′) = 0.8, incurred regret = 0.4 + 0.3 = 0.7
• Example 2: P(f* > f) = 0.7, P(f* > f′) = 0.6, incurred regret = 0.2 + 0.1 = 0.3
• Example 3: P(f* > f) = 0.51, P(f* > f′) = 0.55, incurred regret = 0.01 + 0.05 = 0.06
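A one-line helper (illustrative, not from the talk) makes the arithmetic in these examples explicit: the regret of one duel is how far the chosen pair falls short of dueling b* against itself.

```python
def per_duel_regret(p_best_vs_b, p_best_vs_b_prime):
    """Regret of one duel (b, b'): P(b* > b) + P(b* > b') - 1.

    Equivalently (P(b* > b) - 0.5) + (P(b* > b') - 0.5); it is zero
    exactly when both chosen bandits are as good as the best one.
    """
    return p_best_vs_b + p_best_vs_b_prime - 1.0
```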

Page 15:

Assumptions

• P(b_i > b_j) = ½ + ε_ij   (distinguishability)

• Strong Stochastic Transitivity
  – For three bandits b_i ≻ b_j ≻ b_k:  ε_ik ≥ max{ ε_ij, ε_jk }
  – Monotonicity property

• Stochastic Triangle Inequality
  – For three bandits b_i ≻ b_j ≻ b_k:  ε_ik ≤ ε_ij + ε_jk
  – Diminishing returns property

• Satisfied by many standard models
  – E.g., Logistic / Bradley-Terry
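As a sanity check (not from the slides), a Bradley-Terry / logistic model can be simulated to confirm it satisfies both properties; the utilities used here are made up purely for illustration.

```python
import math

def bt_prob(ui, uj):
    """Bradley-Terry / logistic model: P(b_i beats b_j) = sigmoid(u_i - u_j)."""
    return 1.0 / (1.0 + math.exp(-(ui - uj)))

# made-up utilities with b1 > b2 > b3
u1, u2, u3 = 2.0, 1.0, 0.0
e12 = bt_prob(u1, u2) - 0.5
e23 = bt_prob(u2, u3) - 0.5
e13 = bt_prob(u1, u3) - 0.5

assert e13 >= max(e12, e23)   # strong stochastic transitivity
assert e13 <= e12 + e23       # stochastic triangle inequality
```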

Page 16:

Explore then Exploit

• First explore
  – Try to gather as much information as possible
  – Accumulates regret based on which bandits we decide to compare

• Then exploit
  – We have a (good) guess as to which bandit is best
  – Repeatedly compare that bandit with itself
    • (i.e., interleave that ranking with itself)

  R_T = Σ_{t=1}^{T} [ P(b* ≻ b_t) + P(b* ≻ b_t′) − 1 ]

Page 17:

Naïve Approach

• In the deterministic case, O(K) comparisons suffice to find the max

• Extend to the noisy case:
  – Repeatedly compare until confident one is better

• Problem: comparing two awful (but similar) bandits

• Example:
  – P(A > B) = 0.85
  – P(A > C) = 0.85
  – P(B > C) = 0.51
  – Comparing B and C requires many comparisons!

Page 22:

Interleaved Filter

• Choose a candidate bandit at random

• Make noisy comparisons (Bernoulli trials) against all other bandits simultaneously
  – Maintain a mean and confidence interval for each pair of bandits being compared

• ...until another bandit is better
  – With confidence 1 − δ

• Repeat the process with the new candidate
  – (Remove all empirically worse bandits)

• Continue until 1 candidate is left

[Yue, Broder, Kleinberg, Joachims, COLT 2009]
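The steps above can be sketched as code. This is an illustrative simplification (the confidence-radius constant and pruning details are mine, not the paper's exact specification); `duel(i, j)` stands in for running one interleaving experiment between bandits i and j.

```python
import math, random

def interleaved_filter(duel, K, delta, rng=None):
    """Sketch of the Interleaved Filter (after Yue et al., COLT 2009).

    duel(i, j) -> True iff bandit i wins one noisy comparison against j.
    The candidate duels every remaining bandit simultaneously, tracking
    its empirical win rate against each with a (1 - delta)-confidence radius.
    """
    rng = rng or random.Random()
    candidate = rng.randrange(K)                    # random initial candidate
    remaining = [j for j in range(K) if j != candidate]
    while remaining:
        wins = {j: 0 for j in remaining}
        n = 0
        while True:
            n += 1
            for j in remaining:                     # one duel vs. each rival
                if duel(candidate, j):
                    wins[j] += 1
            radius = math.sqrt(math.log(1.0 / delta) / n)
            means = {j: wins[j] / n for j in remaining}
            # some rival confidently beats the candidate -> it takes over
            beaten_by = [j for j in remaining if means[j] + radius < 0.5]
            if beaten_by:
                winner = min(beaten_by, key=lambda j: means[j])
                # drop rivals that are empirically worse than the candidate
                remaining = [j for j in remaining
                             if j != winner and means[j] <= 0.5]
                candidate = winner
                break
            # drop rivals the candidate confidently beats
            remaining = [j for j in remaining if means[j] - radius <= 0.5]
            if not remaining:
                break
    return candidate
```

With K retrieval functions, each round eliminates (in expectation) a constant fraction of the bandits, which is where the O(log K) round bound on the later slides comes from.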


Page 24:

Intuition

• Simulate comparing the candidate with all remaining bandits simultaneously

• Example:
  – P(A > B) = 0.85
  – P(A > C) = 0.85
  – P(B > C) = 0.51
  – B is the candidate
  – # comparisons between B vs C is bounded by the # comparisons between B vs A!

[Yue, Broder, Kleinberg, Joachims, COLT 2009]

Page 25:

Regret Analysis

• Round: all the time steps for a particular candidate bandit
  – Halts when a better bandit is found...
  – ...with confidence 1 − δ
  – Choose δ = 1/(TK²)

• Match: all the comparisons between two bandits in a round
  – At most K matches in each round
  – The candidate plays one match against each remaining bandit

[Yue, Broder, Kleinberg, Joachims, COLT 2009]

Page 26:

Regret Analysis

• O(log K) total rounds
• O(K) total matches

• Each match incurs regret
  – Depends on δ = K⁻²T⁻¹

• Finds the best bandit w.p. 1 − 1/T

• Expected regret:

  E[R_T] ≤ (1/T)·O(T) + (1 − 1/T)·O((K/ε) log T) = O((K/ε) log T)

  where ε = ε_{1,2}, the distinguishability gap between the best and second-best bandits.

[Yue, Broder, Kleinberg, Joachims, COLT 2009]

Page 27:

Removing Inferior Bandits

• At the conclusion of each round
  – Remove any empirically worse bandits

• Intuition:
  – High confidence that the winner is better than the incumbent candidate
  – Empirically worse bandits cannot be "much better" than the incumbent candidate
  – Can show via a Hoeffding bound that the winner is also better than the empirically worse bandits with high confidence
  – Preserves the overall 1 − 1/T confidence that we'll find the best bandit


Page 29:

Summary

• Provably efficient online algorithm
  – (In a regret sense)
  – Also results for the continuous (convex) setting

• Requires a comparison oracle that is
  – Informative (not too noisy)
  – Independent / unbiased
  – Strongly transitive
  – Consistent with the triangle inequality

Page 30:

Revisiting Assumptions

• P(b_i > b_j) = ½ + ε_ij   (distinguishability)

• Strong Stochastic Transitivity
  – For three bandits b_i ≻ b_j ≻ b_k:  ε_ik ≥ max{ ε_ij, ε_jk }
  – Monotonicity property

• Stochastic Triangle Inequality
  – For three bandits b_i ≻ b_j ≻ b_k:  ε_ik ≤ ε_ij + ε_jk
  – Diminishing returns property

• Satisfied by many standard models
  – E.g., Logistic / Bradley-Terry

Page 31:

Directions to Explore

• Relaxing assumptions
  – E.g., strong transitivity & triangle inequality

• Integrating context

• Dealing with large K
  – Assume additional structure on the retrieval functions?

• Other cost models
  – PAC setting (fixed budget, find the best possible)

• Dynamic or changing user interests / environment

Page 32:

Improving Comparison Oracle

Page 33:

Improving Comparison Oracle

• Dueling Bandits Problem
  – Interactive learning framework
  – Provably minimizes regret
  – Assumes an ideal comparison oracle

• Can we improve the comparison oracle?
  – Can we improve how we interpret its results?

Page 34:

Determining Statistical Significance

• For each query q, interleave A(q) and B(q), log clicks

• t-Test
  – For each q, score = fraction of clicks on A(q)
    • E.g., 3/4 = 0.75
  – Sample mean score (e.g., 0.6)
  – Compute confidence (p-value)
    • E.g., want p = 0.05 (i.e., 95% confidence)
  – More data, more confidence
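The test statistic behind this slide is a standard one-sample t-test against the null of no preference (score 0.5); a minimal sketch, with the function name my own:

```python
import math

def t_statistic(scores, null_mean=0.5):
    """One-sample t-statistic for per-query interleaving scores.

    Under the null hypothesis that A and B are tied, the expected
    fraction of clicks on A is 0.5; a large positive t favors A.
    """
    n = len(scores)
    mean = sum(scores) / n
    var = sum((s - mean) ** 2 for s in scores) / (n - 1)  # sample variance
    return (mean - null_mean) / math.sqrt(var / n)
```

The p-value then comes from the t-distribution with n − 1 degrees of freedom; more queries shrink the standard error and raise confidence.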


Page 36:

Limitation

• Example: query session with 2 clicks
  – One click at rank 1 (from A)
  – Later click at rank 4 (from B)
  – Normally this query session would count as a tie
  – But the second click is probably more informative...
  – ...so B should get more credit for this query

Page 37:

Linear Model

• Feature vector φ(q,c):

  φ(q,c) = ( 1 always,
             1 if the click led to a download,
             1 if it is the last click,
             1 if it is at a higher rank than the previous click )

• Weight of a click is wᵀφ(q,c)

[Yue, Gao, Chapelle, Zhang, Joachims, SIGIR 2010]


Page 40:

Example

• wᵀφ(q,c) differentiates last clicks from other clicks:

  φ(q,c) = ( 1 if c is the last click, else 0;
             1 if c is not the last click, else 0 )

• Interleave A vs B
  – 3 clicks per session
  – Last click: 60% on a result from A
  – Other 2 clicks: random

• Conventional scoring, w = (1,1), has significant variance
• Counting only the last click, w = (1,0), minimizes variance

[Yue, Gao, Chapelle, Zhang, Joachims, SIGIR 2010]
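The 60/40 setup above is easy to simulate. The encoding below (a click on A contributes +1 to its feature, a click on B contributes −1) and the session counts are illustrative assumptions, not taken from the paper:

```python
import random, statistics

def simulate(w, rng, n_sessions=100000):
    """Monte Carlo estimate of the mean and std of the per-session score
    wT*phi(q) for the toy setup: 3 clicks per session, the last click
    lands on A w.p. 0.6, the other two land on A or B w.p. 0.5 each."""
    scores = []
    for _ in range(n_sessions):
        last = 1 if rng.random() < 0.6 else -1      # +1 = click on A
        others = sum(1 if rng.random() < 0.5 else -1 for _ in range(2))
        scores.append(w[0] * last + w[1] * others)
    return statistics.mean(scores), statistics.pstdev(scores)

rng = random.Random(0)
for w in [(1, 1), (1, 0)]:
    mean, std = simulate(w, rng)
    print(f"w={w}: mean={mean:.2f}, std={std:.2f}")
```

Both weightings estimate the same underlying preference (mean near 0.2, favoring A), but w = (1,0) ignores the two pure-noise clicks and so has markedly lower variance, i.e., it reaches statistical significance with fewer queries.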


Page 42:

Scoring Query Sessions

• Feature representation for a query session:

  Φ(q) = (1/|q|) [ Σ_{c ∈ A(q)} φ(q,c) − Σ_{c ∈ B(q)} φ(q,c) ]

• Weighted score for the query:

  wᵀΦ(q) = (1/|q|) [ Σ_{c ∈ A(q)} wᵀφ(q,c) − Σ_{c ∈ B(q)} wᵀφ(q,c) ]

• A positive score favors A, a negative score favors B

[Yue, Gao, Chapelle, Zhang, Joachims, SIGIR 2010]

Page 43:

Supervised Learning

• Will optimize for the z-Test: the Inverse z-Test
  – Approximately equals the t-Test for large samples
  – z-score = mean / standard deviation

  mean(w) = (1/n) Σ_q wᵀΦ(q)

  std(w) = sqrt( (1/(n−1)) Σ_q ( wᵀΦ(q) − mean(w) )² )

  w* = argmax_w  mean(w) / std(w)

  (Assumes A ≻ B)

[Yue, Gao, Chapelle, Zhang, Joachims, SIGIR 2010]

Page 44:

Experiment Setup

• Data collection
  – Pool of retrieval functions
  – Hash users into partitions
  – Run interleaving of different pairs in parallel

• Collected on arXiv.org
  – 2 pools of retrieval functions
  – Training pool (6 pairs): we know A ≻ B
  – New pool (12 pairs)

Page 45:

Training Pool – Cross Validation

Page 46:

Page 47:

Experimental Results

• The Inverse z-Test works well
  – Beats the baseline on most of the new interleaving pairs
  – Directions of the tests are all in agreement
  – In 6/12 pairs, for p = 0.1, reduces the required sample size by 10%
  – In 4/12 pairs, achieves p = 0.05 where the baseline does not

• 400 to 650 queries per interleaving experiment

• Weights are hard to interpret (features are correlated)
  – Largest weight: "1 if single click & rank > 1"


Page 49:

Interactive Learning

• The system dynamically decides how to service users
  – Learn about user preferences
  – Maximize total user utility over time
  – Exploration / exploitation tradeoff

• Existing mechanism (interleaving) to collect feedback
  – Can also improve how we interpret that feedback

• Simple yet practical cost model
  – Compatible with the interleaving oracle
  – Yields efficient algorithms

Page 50:

Thank You!

Work supported by NSF IIS-0905467, a Yahoo! KSC Award, and a Microsoft Graduate Fellowship.

Slides, papers & software available at www.yisongyue.com