Multi-Armed Bandits: Intro, examples and tricks
Dr Ilias Flaounas, Senior Data Scientist at Atlassian
Data Science Sydney meetup, 22 March 2016
Motivation
Increase awareness of some very useful but lesser-known techniques
Demo some current work at Atlassian
Connect it with some research from my past
Hopefully, there will be something useful for everybody — apologies for the few equations and loose notation
Estimating the mean reward of each arm from the rewards observed so far:

µA = (rA,1 + rA,4 + rA,5 + rA,7) / nA
µB = rB,3 / nB
µC = (rC,2 + rC,6 + rC,8) / nC
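These running means can be maintained with simple counters; a minimal sketch with made-up rewards (the arm names follow the slide, the reward values are illustrative):

```python
from collections import defaultdict

# Hypothetical log of (arm, reward) observations, in trial order.
observations = [("A", 1), ("C", 0), ("B", 1), ("A", 0),
                ("A", 1), ("C", 1), ("A", 1), ("C", 0)]

totals = defaultdict(float)  # sum of rewards per arm
counts = defaultdict(int)    # number of pulls per arm (nA, nB, nC)
for arm, reward in observations:
    totals[arm] += reward
    counts[arm] += 1

# µ_arm = (sum of that arm's rewards) / (number of pulls of that arm)
means = {arm: totals[arm] / counts[arm] for arm in totals}
print(means)
```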
Many solutions…
1. ε-greedy: the best arm is selected for a proportion 1−ε of the trials and a random arm in ε of the trials.
2. ε-greedy with a variable ε
3. Pure exploration first, then pure exploitation.
4. …
5. Thompson sampling: draw from the estimated Beta distribution of each arm and pull the arm with the highest draw.
6. Upper Confidence Bound (UCB)
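A minimal simulation of the ε-greedy strategy above, on three Bernoulli arms (the success probabilities and ε are made up for illustration):

```python
import random

random.seed(0)

# Hypothetical Bernoulli arms: true success probabilities, unknown to the agent.
true_probs = {"A": 0.5, "B": 0.6, "C": 0.4}
epsilon = 0.1

counts = {arm: 0 for arm in true_probs}
values = {arm: 0.0 for arm in true_probs}  # running mean reward per arm

def select_arm():
    if random.random() < epsilon:              # explore with probability ε
        return random.choice(list(true_probs))
    return max(values, key=values.get)         # exploit: current best estimate

for _ in range(10_000):
    arm = select_arm()
    reward = 1 if random.random() < true_probs[arm] else 0
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean

best = max(values, key=values.get)
print(best, {a: round(v, 2) for a, v in values.items()})
```

Over many trials the estimates concentrate on the arms that keep being pulled, which is exactly the exploitation/exploration trade-off the list above is about.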
Disadvantages
Reaching significance for non-winning arms takes longer
Unclear stopping criteria
Hard to order non-winning arms and to assess their impact reliably
Advantages
Reaching significance for the winning arm is faster
Best arm can change over time
There are no false positives in the long term
Optimizely recently introduced MABs, rebranded as "traffic auto-allocation"
Let’s add some context
What happens if we want to assess 100 variations?
How about 1,000 or 10,000 variations?
Contextual Multi-Armed Bandits

Each arm is described by a set of features: experiment parameters, e.g., price, #users, product, bundles, colour of UI elements…

A -> {xA,1, xA,2, xA,3…}
B -> {xB,1, xB,2, xB,3…}
C -> {xC,1, xC,2, xC,3…}

The reward of each arm at time t is a function of its features:

rA,t = f(xA,1, xA,2, xA,3…)
rB,t = f(xB,1, xB,2, xB,3…)
rC,t = f(xC,1, xC,2, xC,3…)

The features introduce a notion of proximity or similarity between arms.
Contextual Multi-Armed Bandits
LinUCB
L. Li, W. Chu, J. Langford, R. E. Schapire, “A Contextual-Bandit Approach to Personalized News Article Recommendation”, WWW, 2010.
The UCB is some expectation plus some confidence level:

µa(t) + σa(t)

We assume there is some unknown vector θ*, the same for each arm, for which:

E[ra,t | xa,t] = xa,t^T θ*

Using least squares on the arms pulled and rewards observed up to time t:

Xt := {xa(1),1, xa(2),2, …, xa(t),t}^T
yt := {ra(1),1, ra(2),2, …, ra(t),t}^T
Ct := Xt^T Xt
θ̂t := Ct^-1 Xt^T yt

The estimated mean reward of arm a is

µ̂a(t) := xa,t^T θ̂t = xa,t^T Ct^-1 Xt^T yt

and its confidence width is

σ̂a(t) := √(xa,t^T Ct^-1 xa,t)
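A minimal NumPy sketch of these update rules (the arm features, the hidden θ* used to simulate rewards, α, and the λI ridge term that keeps Ct invertible from round one are all illustrative assumptions, not part of the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3        # feature dimension
alpha = 1.0  # scales the confidence width
lam = 1.0    # ridge term λ so that C_t is invertible before any pulls

# Hypothetical arm features and a hidden θ* used only to simulate rewards.
arms = rng.normal(size=(5, d))
theta_star = np.array([0.5, -0.2, 0.3])

C = lam * np.eye(d)             # C_t := X_t^T X_t (+ λI)
b = np.zeros(d)                 # running X_t^T y_t
pulls = np.zeros(5, dtype=int)  # how often each arm was chosen

for t in range(2000):
    C_inv = np.linalg.inv(C)
    theta_hat = C_inv @ b                     # θ̂_t := C_t^-1 X_t^T y_t
    mu = arms @ theta_hat                     # µ̂_a(t) := x_a^T θ̂_t
    sigma = np.sqrt(np.einsum("ad,de,ae->a", arms, C_inv, arms))  # σ̂_a(t)
    a = int(np.argmax(mu + alpha * sigma))    # pull the arm with the highest UCB
    x = arms[a]
    reward = x @ theta_star + rng.normal(scale=0.1)
    C += np.outer(x, x)                       # rank-one update of C_t
    b += reward * x
    pulls[a] += 1

theta_hat = np.linalg.inv(C) @ b
print(pulls, theta_hat.round(2))
```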
Product onboarding…
Which arm would you pull?
• How can we locate the city of Bristol from tweets?
• 10K candidate locations organised in a 100x100 grid
• At every step we get tweets from one location and count mentions of “Bristol”
• Challenge: find the target in sub-linear time complexity!
Linear methods fail on this problem.
How can we go non-linear?
John Shawe-Taylor and Nello Cristianini, "Kernel Methods for Pattern Analysis", Cambridge University Press, 2004.
The kernel trick! (No, it's not just for SVMs.)
LinUCB (primal form):

µ̂a(t) := xa,t^T θ̂t = xa,t^T Ct^-1 Xt^T yt
σ̂a(t) := √(xa,t^T Ct^-1 xa,t)
Ct := Xt^T Xt

KernelUCB (dual form, a kernel matrix replaces the covariance matrix):

µ̂a(t) = kx,t^T Kt^-1 yt
σ̂a(t) = √(kx,t^T Kt^-2 kx,t)
Kt := Xt Xt^T

M. Valko, N. Korda, R. Munos, I. Flaounas, N. Cristianini, "Finite-Time Analysis of Kernelised Contextual Bandits", UAI, 2013.
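A sketch of the dual computations with an RBF kernel (the toy non-linear reward function, the kernel bandwidth, and the small jitter added to Kt for numerical stability are all assumptions for illustration, not the paper's exact algorithm):

```python
import numpy as np

rng = np.random.default_rng(1)

def rbf(a, b, gamma=1.0):
    """RBF kernel matrix between the row vectors of a and b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

arms = rng.normal(size=(20, 2))                    # arm feature vectors
f = lambda x: np.sin(3 * x[..., 0]) * x[..., 1]    # hidden non-linear reward

X, y = [], []                                      # pulled contexts, observed rewards
a0 = int(rng.integers(len(arms)))                  # one initial random pull
X.append(arms[a0]); y.append(f(arms[a0]))

for t in range(200):
    Xt, yt = np.array(X), np.array(y)
    K = rbf(Xt, Xt) + 1e-6 * np.eye(len(Xt))       # K_t, jittered for invertibility
    K_inv = np.linalg.inv(K)
    k = rbf(arms, Xt)                              # k_{x,t} for every arm at once
    mu = k @ K_inv @ yt                            # µ̂_a(t) = k_{x,t}^T K_t^-1 y_t
    var = np.einsum("ai,ij,aj->a", k, K_inv @ K_inv, k)
    sigma = np.sqrt(np.maximum(var, 0.0))          # σ̂_a(t) = √(k^T K_t^-2 k)
    a = int(np.argmax(mu + sigma))                 # pull the highest-UCB arm
    X.append(arms[a]); y.append(f(arms[a]))

print("true best arm:", int(np.argmax(f(arms))), "last pulled:", a)
```

Note the growing n×n kernel matrix: the per-step cost depends on the number of pulls, not on the dimension of the (implicit) feature space, which is what makes non-linear rewards tractable.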
• The last few steps of the algorithm before it locates Bristol.
• KernelUCB with RBF kernel converges after ~300 iterations (instead of >>10K).
Target is the red dot. We locate it using KernelUCB with RBF kernel.
KernelUCB code: http://www.complacs.org/pmwiki.php/CompLACS/KernelUCB
What if we have a high-dimensional space?
Hashing trick
Implementation in Vowpal Wabbit, by J. Langford, et al.
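The hashing trick maps an unbounded, sparse feature space into a fixed number of buckets. A minimal sketch of the idea (Vowpal Wabbit itself uses murmurhash; md5 is used here only as a deterministic stand-in, and the feature strings are made up):

```python
import hashlib

def hash_features(tokens, n_buckets=2 ** 10):
    """Map arbitrary string features into a fixed-size vector via hashing."""
    vec = [0.0] * n_buckets
    for tok in tokens:
        idx = int(hashlib.md5(tok.encode()).hexdigest(), 16) % n_buckets
        vec[idx] += 1.0  # colliding features simply add up
    return vec

# Two hypothetical contexts with open-ended categorical features:
v1 = hash_features(["product=jira", "colour=blue", "price=10"])
v2 = hash_features(["product=confluence", "colour=blue"])

print(sum(v1), sum(v2))  # prints: 3.0 2.0 (the vectors stay 1024-dimensional)
```

The model's dimensionality is fixed up front, no feature dictionary is needed, and new feature values cost nothing, at the price of occasional collisions.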
References

M. Valko, N. Korda, R. Munos, I. Flaounas, N. Cristianini, "Finite-Time Analysis of Kernelised Contextual Bandits", UAI, 2013.
L. Li, W. Chu, J. Langford, R. E. Schapire, "A Contextual-Bandit Approach to Personalized News Article Recommendation", WWW, 2010.
John Shawe-Taylor and Nello Cristianini, "Kernel Methods for Pattern Analysis", Cambridge University Press, 2004.
Implementation of KernelUCB in the Complacs toolkit: http://www.complacs.org/pmwiki.php/CompLACS/KernelUCB
https://en.wikipedia.org/wiki/Multi-armed_bandit
https://github.com/JohnLangford/vowpal_wabbit/wiki/Contextual-Bandit-Example
Thank you - We are hiring!
Dr Ilias Flaounas, Senior Data Scientist, <first>.<last>@atlassian.com