
Linear Submodular Bandits and their Application to Diversified Retrieval

Yisong Yue (CMU) & Carlos Guestrin (CMU)

Optimizing Recommender Systems

• Every day, users come to a news portal.
• For each user, the news portal recommends L articles to cover the user’s interests.
• Users provide feedback (clicks, ratings, “likes”).
• The system integrates that feedback for future recommendations.

Challenge 1: Making Diversified Recommendations

• The system should recommend an optimally diversified set of articles. Compare the two candidate sets below:

A redundant set (every headline covers the same story):

• “Israel implements unilateral Gaza cease-fire :: WRAL.com”
• “Israel unilaterally halts fire, rockets persist”
• “Gaza truce, Israeli pullout begin | Latest News”
• “Hamas announces ceasefire after Israel declares truce - …”
• “Hamas fighters seek to restore order in Gaza Strip - World - Wire …”

versus a diversified set:

• “Israel implements unilateral Gaza cease-fire :: WRAL.com”
• “Obama vows to fight for middle class”
• “Citigroup plans to cut 4500 jobs”
• “Google Android market tops 10 billion downloads”
• “UC astronomers discover two largest black holes ever found”

Challenge 2: Personalization

• Different users have different interests.
• We can only learn a user’s interests by recommending articles and receiving feedback.
• This is an exploration-versus-exploitation dilemma.
• We model it as a bandit problem!

Modeling Diversity via Submodular Utility Functions

• We assume a set of D concepts or topics.
• Users are modeled by how interested they are in each topic, captured by a weight vector w.
• Let $F_i(A)$ denote how well the set of articles A covers topic i (the “topic coverage function”).
• We model user utility as $F(A \mid w) = w^\top [F_1(A), \ldots, F_D(A)]$.
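To make the model concrete, here is a minimal sketch in Python; the saturating coverage function, the article data, and the weights are made-up placeholders (a real instantiation, probabilistic coverage, appears later in the poster):

```python
import numpy as np

def saturating_coverage(A, topic):
    """Toy F_i: diminishing-returns coverage of one topic.

    Each additional on-topic article helps, but less than the previous one
    (a placeholder for a real topic coverage function)."""
    hits = sum(1 for article in A if topic in article["topics"])
    return 1.0 - 0.5 ** hits

def utility(A, w, topics):
    """F(A | w) = w^T [F_1(A), ..., F_D(A)]."""
    return float(np.dot(w, [saturating_coverage(A, t) for t in topics]))

topics = ["world", "economy"]                 # D = 2 hypothetical topics
w = np.array([0.6, 0.4])                      # hypothetical user interests
A = [{"topics": {"world"}}, {"topics": {"economy"}}]
print(utility(A, w, topics))                  # 0.6*0.5 + 0.4*0.5 = 0.5
```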

Linear Submodular Bandits Problem

At each iteration t:

• A pool of available articles is presented, each represented using the D submodular basis functions.
• The algorithm selects a set $A_t$ of L articles.
• The algorithm recommends $A_t$ to the user and receives feedback.

Assumptions:

• $\Pr(\text{like} \mid a, A) = w^\top \Delta(a \mid A)$ (conditional submodular independence), where $\Delta(a \mid A)$ is the vector of incremental topic coverage gains defined below.

Regret is measured against the greedy benchmark: $\text{Regret} = (1 - 1/e)\,\mathrm{OPT} - \text{sum of rewards}$.

Goal: recommend a set of articles that optimally covers topics that interest the user.
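To make the interaction protocol concrete, here is a toy simulation (not the paper’s code); the article pool, the Dirichlet topic representations, and the hidden weights w_star are all invented, and the slate is chosen at random since no learning happens yet:

```python
import numpy as np

rng = np.random.default_rng(0)
D, L, T = 3, 2, 5                           # topics, slate size, rounds
w_star = np.array([0.5, 0.3, 0.2])          # hidden user interests (invented)

def coverage(S, probs):
    """Probabilistic coverage vector [F_1(S), ..., F_D(S)] of article set S."""
    if not S:
        return np.zeros(D)
    return 1.0 - np.prod([1.0 - probs[a] for a in S], axis=0)

for t in range(T):
    pool = {a: rng.dirichlet(np.ones(D)) for a in range(10)}     # available articles
    slate = list(rng.choice(list(pool), size=L, replace=False))  # arbitrary slate
    likes = []
    for k, a in enumerate(slate):
        gain = coverage(slate[:k + 1], pool) - coverage(slate[:k], pool)  # Δ(a|A)
        likes.append(bool(rng.random() < w_star @ gain))  # Pr(like|a,A) = w*^T Δ(a|A)
    print(f"round {t}: slate {slate}, likes {likes}")
```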

Each topic coverage function $F_i(A)$ is monotone submodular!

A function F is submodular if, for all sets $A \subseteq B$ and articles $a \notin B$,

$$F(A \cup \{a\}) - F(A) \;\ge\; F(B \cup \{a\}) - F(B),$$

i.e., the benefit of recommending a second (redundant) article is smaller than the benefit of adding the first.

Properties of Submodular Functions

• Sums of submodular functions are submodular, so $F(A \mid w)$ is submodular.
• Exactly maximizing $F$ over sets of L articles is NP-hard!
• The greedy algorithm yields a (1 − 1/e) approximation bound (see the sketch after the equations below).
• Incremental gains are locally linear!
• Both properties will be exploited by our online learning algorithm.

Locally linear incremental gains:

$$F(A \cup \{a\} \mid w) - F(A \mid w) = w^\top \Delta(a \mid A),$$

where the gain vector collects the per-topic incremental coverage:

$$\Delta(a \mid A) = \bigl[\, F_1(A \cup \{a\}) - F_1(A),\ \ldots,\ F_D(A \cup \{a\}) - F_D(A) \,\bigr].$$
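The greedy selection referenced above is short to implement. Below is a minimal sketch against a generic set-value oracle F; the articles, topics, and weights in the usage are hypothetical:

```python
def greedy_select(pool, L, F):
    """Greedy set construction: repeatedly add the article with the largest
    marginal gain. For monotone submodular F this achieves a (1 - 1/e)
    approximation to the best size-L set."""
    A = []
    for _ in range(L):
        best = max((a for a in pool if a not in A),
                   key=lambda a: F(A + [a]) - F(A))   # marginal gain of a
        A.append(best)
    return A

# Toy usage: weighted topic coverage where each (hypothetical) article
# is tagged with the set of topics it covers.
articles = {"a1": {"world"}, "a2": {"world"}, "a3": {"economy"}}
weights = {"world": 0.6, "economy": 0.4}

def F(A):
    covered = set().union(*(articles[a] for a in A)) if A else set()
    return sum(weights[t] for t in covered)

print(greedy_select(articles, 2, F))   # ['a1', 'a3']: one world + one economy
```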

We address two challenges:

• Diversified recommendations
• Exploration for personalization

Example: Probabilistic Coverage

• Each article a has probability $P(i \mid a)$ of covering topic i.
• Define the topic coverage function for a set A as

$$F_i(A) = 1 - \prod_{a \in A} \bigl(1 - P(i \mid a)\bigr).$$

• It is straightforward to show that each $F_i$ is monotone submodular [El-Arini et al., ’09].
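A direct implementation of this coverage function, with a quick numerical check of the diminishing-returns property (the $P(i \mid a)$ values are made up):

```python
import numpy as np

def topic_coverage(A, P, i):
    """F_i(A) = 1 - prod_{a in A} (1 - P(i|a))   (probabilistic coverage)."""
    return 1.0 - np.prod([1.0 - P[a][i] for a in A]) if A else 0.0

P = {"a1": [0.6], "a2": [0.7], "a3": [0.5]}   # hypothetical P(i|a), one topic

small, big = ["a1"], ["a1", "a2"]             # small ⊆ big
gain_small = topic_coverage(small + ["a3"], P, 0) - topic_coverage(small, P, 0)
gain_big = topic_coverage(big + ["a3"], P, 0) - topic_coverage(big, P, 0)
print(gain_small, gain_big)  # ~0.2 vs ~0.06: the gain shrinks as the set grows
```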

LSBGreedy

• Maintain a mean estimate and a confidence interval for the user’s interest in each topic.
• Greedily recommend the articles with the highest upper-confidence utility.
• In the example figure, this chooses an article about the economy.

[Figure: mean estimate by topic, with the uncertainty of each estimate]

Theorem: with probability $1 - \delta$, the average regret shrinks as $O\!\left(\sqrt{\frac{DL \log T}{T}}\right)$.
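A minimal sketch of the idea (simplified constants, not the paper’s exact algorithm): keep a ridge-regression style mean estimate of w with a confidence ellipsoid, and greedily add the article whose gain vector has the highest upper-confidence utility.

```python
import numpy as np

class LSBGreedySketch:
    """Upper-confidence greedy slate selection (simplified sketch)."""

    def __init__(self, D, alpha=1.0):
        self.M = np.eye(D)        # regularized covariance of observed gains
        self.b = np.zeros(D)      # feedback-weighted sum of gain vectors
        self.alpha = alpha        # exploration strength (a free parameter here)

    def select(self, pool, L, delta_fn):
        """pool: article ids; delta_fn(a, A) -> gain vector Δ(a | A)."""
        M_inv = np.linalg.inv(self.M)
        w_hat = M_inv @ self.b    # mean estimate of the user's interests
        A = []
        for _ in range(L):
            def ucb(a):
                d = delta_fn(a, A)
                # mean utility + confidence width of the estimate along d
                return w_hat @ d + self.alpha * np.sqrt(d @ M_inv @ d)
            A.append(max((a for a in pool if a not in A), key=ucb))
        return A

    def update(self, gains, likes):
        """gains: Δ vectors of the recommended slate; likes: 0/1 feedback."""
        for d, y in zip(gains, likes):
            self.M += np.outer(d, d)
            self.b += y * d
```

Each round one would call select, show the slate, and then call update with the observed gain vectors and per-article likes.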

News Recommender User Study

• 10 days, 10 articles per day.
• Compared against:
  • Multiplicative Weighting (MW; no exploration) [El-Arini et al., ’09]
  • Ranked Bandits + LinUCB (a reduction approach that does not directly model diversity) [Radlinski et al., ’08; Li et al., ’10]
• Comparing the learned weights for two sessions (LSBGreedy vs. MW):
  • In the 1st session, MW overfits to the “world” topic.
  • In the 2nd session, the user liked few articles, and MW did not learn anything.


Comparison                  Win / Tie / Loss   Gain per Day   % Likes
LSBGreedy vs Static         24 / 0 / 0         1.07           67%
LSBGreedy vs MW             24 / 1 / 1         0.54           63%
LSBGreedy vs RankLinUCB     21 / 2 / 4         0.58           61%
