Multi-Armed Bandits: Intro, examples and tricks
Dr Ilias Flaounas, Senior Data Scientist at Atlassian
Data Science Sydney meetup, 22 March 2016
Motivation
Increase awareness of some very useful but lesser-known techniques
Demo some current work at Atlassian
Connect it with some research from my past
Hopefully, there will be something useful for everybody — apologies for the few equations and loose notation
Estimating the mean reward of each arm from the rewards observed so far:

µA = (rA,1 + rA,4 + rA,5 + rA,7) / nA
µB = rB,3 / nB
µC = (rC,2 + rC,6 + rC,8) / nC
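These running means can be maintained with simple counters; a minimal sketch with made-up rewards (the arm names follow the slide, the reward values are illustrative):

```python
from collections import defaultdict

# Hypothetical log of (arm, reward) observations, in trial order.
observations = [("A", 1), ("C", 0), ("B", 1), ("A", 0),
                ("A", 1), ("C", 1), ("A", 1), ("C", 0)]

totals = defaultdict(float)  # sum of rewards per arm
counts = defaultdict(int)    # number of pulls per arm (nA, nB, nC)
for arm, reward in observations:
    totals[arm] += reward
    counts[arm] += 1

# µ_arm = (sum of that arm's rewards) / (number of pulls of that arm)
means = {arm: totals[arm] / counts[arm] for arm in totals}
print(means)
```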
Many solutions…
1. ε-greedy: the best arm is selected for a proportion 1−ε of the trials and a random arm in ε of the trials.
2. ε-greedy with a variable ε
3. Pure exploration first, then pure exploitation.
4. …
5. Thompson sampling: draw from the estimated Beta distribution of each arm and pull the arm with the highest draw.
6. Upper Confidence Bound (UCB)
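A minimal simulation of the ε-greedy strategy above, on three Bernoulli arms (the success probabilities and ε are made up for illustration):

```python
import random

random.seed(0)

# Hypothetical Bernoulli arms: true success probabilities, unknown to the agent.
true_probs = {"A": 0.5, "B": 0.6, "C": 0.4}
epsilon = 0.1

counts = {arm: 0 for arm in true_probs}
values = {arm: 0.0 for arm in true_probs}  # running mean reward per arm

def select_arm():
    if random.random() < epsilon:              # explore with probability ε
        return random.choice(list(true_probs))
    return max(values, key=values.get)         # exploit: current best estimate

for _ in range(10_000):
    arm = select_arm()
    reward = 1 if random.random() < true_probs[arm] else 0
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean

best = max(values, key=values.get)
print(best, {a: round(v, 2) for a, v in values.items()})
```

Over many trials the estimates concentrate on the arms that keep being pulled, which is exactly the exploitation/exploration trade-off the list above is about.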
Disadvantages
Reaching significance for non-winning arms takes longer
Unclear stopping criteria
Hard to order non-winning arms and to assess their impact reliably
Advantages
Reaching significance for the winning arm is faster
Best arm can change over time
There are no false positives in the long term
Optimizely recently introduced MABs, rebranded as "traffic auto-allocation"
Let’s add some context
What happens if we want to assess 100 variations?
How about 1,000 or 10,000 variations?
Contextual Multi-Armed Bandits

Each arm is described by a set of features: experiment parameters, e.g., price, #users, product, bundles, colour of UI elements…

A -> {xA,1, xA,2, xA,3…}
B -> {xB,1, xB,2, xB,3…}
C -> {xC,1, xC,2, xC,3…}

The reward of each arm at time t is a function of its features:

rA,t = f(xA,1, xA,2, xA,3…)
rB,t = f(xB,1, xB,2, xB,3…)
rC,t = f(xC,1, xC,2, xC,3…)

The features introduce a notion of proximity or similarity between arms.
Contextual Multi-Armed Bandits
LinUCB
L. Li, W. Chu, J. Langford, R. E. Schapire, “A Contextual-Bandit Approach to Personalized News Article Recommendation”, WWW, 2010.
The UCB is some expectation plus some confidence level:

µa(t) + σa(t)

We assume there is some unknown vector θ*, the same for each arm, for which:

E[ra,t | xa,t] = xa,t^T θ*

Using least squares on the arms pulled and rewards observed up to time t:

Xt := {xa(1),1, xa(2),2, …, xa(t),t}^T
yt := {ra(1),1, ra(2),2, …, ra(t),t}^T
Ct := Xt^T Xt
θ̂t := Ct^-1 Xt^T yt

The estimated mean reward of arm a is

µ̂a(t) := xa,t^T θ̂t = xa,t^T Ct^-1 Xt^T yt

and its confidence width is

σ̂a(t) := √(xa,t^T Ct^-1 xa,t)
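A minimal NumPy sketch of these update rules (the arm features, the hidden θ* used to simulate rewards, α, and the λI ridge term that keeps Ct invertible from round one are all illustrative assumptions, not part of the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3        # feature dimension
alpha = 1.0  # scales the confidence width
lam = 1.0    # ridge term λ so that C_t is invertible before any pulls

# Hypothetical arm features and a hidden θ* used only to simulate rewards.
arms = rng.normal(size=(5, d))
theta_star = np.array([0.5, -0.2, 0.3])

C = lam * np.eye(d)             # C_t := X_t^T X_t (+ λI)
b = np.zeros(d)                 # running X_t^T y_t
pulls = np.zeros(5, dtype=int)  # how often each arm was chosen

for t in range(2000):
    C_inv = np.linalg.inv(C)
    theta_hat = C_inv @ b                     # θ̂_t := C_t^-1 X_t^T y_t
    mu = arms @ theta_hat                     # µ̂_a(t) := x_a^T θ̂_t
    sigma = np.sqrt(np.einsum("ad,de,ae->a", arms, C_inv, arms))  # σ̂_a(t)
    a = int(np.argmax(mu + alpha * sigma))    # pull the arm with the highest UCB
    x = arms[a]
    reward = x @ theta_star + rng.normal(scale=0.1)
    C += np.outer(x, x)                       # rank-one update of C_t
    b += reward * x
    pulls[a] += 1

theta_hat = np.linalg.inv(C) @ b
print(pulls, theta_hat.round(2))
```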
Product onboarding…
Which arm would you pull?
• How can we locate the city of Bristol from tweets?
• 10K candidate locations organised in a 100x100 grid
• At every step we get tweets from one location and count mentions of “Bristol”
• Challenge: find the target in sub-linear time complexity!
Linear methods fail on this problem.
How can we go non-linear?
John Shawe-Taylor and Nello Cristianini, "Kernel Methods for Pattern Analysis", Cambridge University Press, 2004.
The kernel trick! (No, it's not just for SVMs.)
LinUCB (primal form):

µ̂a(t) := xa,t^T θ̂t = xa,t^T Ct^-1 Xt^T yt
σ̂a(t) := √(xa,t^T Ct^-1 xa,t)
Ct := Xt^T Xt

KernelUCB (dual form, a kernel matrix replaces the covariance matrix):

µ̂a(t) = kx,t^T Kt^-1 yt
σ̂a(t) = √(kx,t^T Kt^-2 kx,t)
Kt := Xt Xt^T

M. Valko, N. Korda, R. Munos, I. Flaounas, N. Cristianini, "Finite-Time Analysis of Kernelised Contextual Bandits", UAI, 2013.
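A sketch of the dual computations with an RBF kernel (the toy non-linear reward function, the kernel bandwidth, and the small jitter added to Kt for numerical stability are all assumptions for illustration, not the paper's exact algorithm):

```python
import numpy as np

rng = np.random.default_rng(1)

def rbf(a, b, gamma=1.0):
    """RBF kernel matrix between the row vectors of a and b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

arms = rng.normal(size=(20, 2))                    # arm feature vectors
f = lambda x: np.sin(3 * x[..., 0]) * x[..., 1]    # hidden non-linear reward

X, y = [], []                                      # pulled contexts, observed rewards
a0 = int(rng.integers(len(arms)))                  # one initial random pull
X.append(arms[a0]); y.append(f(arms[a0]))

for t in range(200):
    Xt, yt = np.array(X), np.array(y)
    K = rbf(Xt, Xt) + 1e-6 * np.eye(len(Xt))       # K_t, jittered for invertibility
    K_inv = np.linalg.inv(K)
    k = rbf(arms, Xt)                              # k_{x,t} for every arm at once
    mu = k @ K_inv @ yt                            # µ̂_a(t) = k_{x,t}^T K_t^-1 y_t
    var = np.einsum("ai,ij,aj->a", k, K_inv @ K_inv, k)
    sigma = np.sqrt(np.maximum(var, 0.0))          # σ̂_a(t) = √(k^T K_t^-2 k)
    a = int(np.argmax(mu + sigma))                 # pull the highest-UCB arm
    X.append(arms[a]); y.append(f(arms[a]))

print("true best arm:", int(np.argmax(f(arms))), "last pulled:", a)
```

Note the growing n×n kernel matrix: the per-step cost depends on the number of pulls, not on the dimension of the (implicit) feature space, which is what makes non-linear rewards tractable.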
• The last few steps of the algorithm before it locates Bristol.
• KernelUCB with RBF kernel converges after ~300 iterations (instead of >>10K).
Target is the red dot. We locate it using KernelUCB with RBF kernel.
KernelUCB code: http://www.complacs.org/pmwiki.php/CompLACS/KernelUCB
What if we have a high-dimensional space?
Hashing trick
Implementation in Vowpal Wabbit, by J. Langford, et al.
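The hashing trick maps an unbounded, sparse feature space into a fixed number of buckets. A minimal sketch of the idea (Vowpal Wabbit itself uses murmurhash; md5 is used here only as a deterministic stand-in, and the feature strings are made up):

```python
import hashlib

def hash_features(tokens, n_buckets=2 ** 10):
    """Map arbitrary string features into a fixed-size vector via hashing."""
    vec = [0.0] * n_buckets
    for tok in tokens:
        idx = int(hashlib.md5(tok.encode()).hexdigest(), 16) % n_buckets
        vec[idx] += 1.0  # colliding features simply add up
    return vec

# Two hypothetical contexts with open-ended categorical features:
v1 = hash_features(["product=jira", "colour=blue", "price=10"])
v2 = hash_features(["product=confluence", "colour=blue"])

print(sum(v1), sum(v2))  # prints: 3.0 2.0 (the vectors stay 1024-dimensional)
```

The model's dimensionality is fixed up front, no feature dictionary is needed, and new feature values cost nothing, at the price of occasional collisions.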
References

M. Valko, N. Korda, R. Munos, I. Flaounas, N. Cristianini, "Finite-Time Analysis of Kernelised Contextual Bandits", UAI, 2013.
L. Li, W. Chu, J. Langford, R. E. Schapire, "A Contextual-Bandit Approach to Personalized News Article Recommendation", WWW, 2010.
John Shawe-Taylor and Nello Cristianini, "Kernel Methods for Pattern Analysis", Cambridge University Press, 2004.
Implementation of KernelUCB in the Complacs toolkit: http://www.complacs.org/pmwiki.php/CompLACS/KernelUCB
https://en.wikipedia.org/wiki/Multi-armed_bandit
https://github.com/JohnLangford/vowpal_wabbit/wiki/Contextual-Bandit-Example
Thank you - We are hiring!
Dr Ilias Flaounas, Senior Data Scientist, <first>.<last>@atlassian.com