Post on 12-Apr-2017
Sequential Learning in the Position-Based Model
Claire Vernade, Olivier Cappé, Paul Lagrée (Télécom ParisTech), B. Kveton, S. Katariya, Z. Wen, C. Szepesvári (Adobe Research, U. Alberta)
"Don't use Bandit Algorithms, they probably don't work for you."
- Chris Stucchio
C. Stucchio's blog: https://www.chrisstucchio.com/blog/2015/dont_use_bandits.html
Position-Based Model
[Figure: a ranked list with positions 1, 2, 3, 4; position $l$ is examined with probability $\kappa_l$, and the item $k$ displayed there is clicked with probability $\theta_k$]
$X_t \sim \mathcal{B}(\kappa_l \times \theta_k)$
Chuklin et al. (2008): Cascade Model, User Browsing Model, DBN, CCM, DCM, …
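A minimal simulation sketch of the Position-Based Model described above. The parameter values and function name are illustrative, not from the talk; the modeling assumption (click probability $\kappa_l \times \theta_k$) is the PBM itself:

```python
import random

def simulate_pbm_clicks(theta, kappa, ranking, rng=random.Random(0)):
    """Simulate one page view under the Position-Based Model (PBM).

    theta[k]  : attractiveness of item k (click probability if examined)
    kappa[l]  : probability that position l is examined
    ranking   : ranking[l] = index of the item shown at position l
    Returns one 0/1 click indicator per position:
    X_l ~ Bernoulli(kappa[l] * theta[ranking[l]]).
    """
    return [int(rng.random() < kappa[l] * theta[k])
            for l, k in enumerate(ranking)]

# hypothetical parameters for 4 items and 4 positions
theta = [0.53, 0.61, 0.42, 0.55]
kappa = [1.0, 0.7, 0.5, 0.3]
clicks = simulate_pbm_clicks(theta, kappa, ranking=[1, 0, 3, 2])
```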
Multi-Armed Bandit
[Figure: bandit arms with unobserved expected rewards 0.53, 0.61, 0.42, 0.40, 0.60, 0.55, alongside the estimated empirical averages after a few pulls]
Multi-Armed Bandit
[Figure: the same arms, now labeled with their unknown parameters $\theta_1$, $\theta_2$, $\theta_3$, …]
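The gap between true means and empirical averages is the whole difficulty: after a few pulls the estimates can rank the arms incorrectly. A small sketch with made-up numbers (the means below echo the figure, everything else is illustrative):

```python
import random

def empirical_means(true_means, pulls_per_arm, rng=random.Random(1)):
    # Estimate each arm's mean from a handful of Bernoulli pulls.
    # With so few samples the estimates may order the arms wrongly,
    # which is why a bandit algorithm must keep exploring.
    est = []
    for mu in true_means:
        draws = [int(rng.random() < mu) for _ in range(pulls_per_arm)]
        est.append(sum(draws) / pulls_per_arm)
    return est

true_means = [0.53, 0.61, 0.42, 0.40, 0.60, 0.55]  # unobserved rewards
est = empirical_means(true_means, pulls_per_arm=5)
```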
Two Bandit Games
1. Website optimization: you are the website manager, choosing which items (e.g. novels by Balzac, Zola, …) to display in positions 1 to 4.
2. Ad placement: you want to place the right ad in the right location.
Website Optimization
$A_t = (\theta_1, \theta_2, \theta_3, \theta_4)$
$r_t = \kappa_1 \theta_1 + \kappa_2 \theta_2 + \kappa_3 \theta_3 + \kappa_4 \theta_4$
Multiple-Plays Bandits in the Position-Based Model. NIPS 2016
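A sketch of this expected reward, and of the fact that with known parameters the optimal page puts the most attractive items in the most examined positions (rearrangement inequality). All parameter values and names are illustrative:

```python
def expected_reward(theta, kappa, ranking):
    # Expected number of clicks under the PBM:
    # r = sum_l kappa[l] * theta[ranking[l]]
    return sum(kappa[l] * theta[k] for l, k in enumerate(ranking))

def best_ranking(theta, kappa):
    # Pair the most attractive items with the most examined slots.
    items = sorted(range(len(theta)), key=lambda k: -theta[k])
    slots = sorted(range(len(kappa)), key=lambda l: -kappa[l])
    ranking = [None] * len(kappa)
    for k, l in zip(items, slots):
        ranking[l] = k
    return ranking

theta = [0.2, 0.9, 0.5, 0.4]   # hypothetical attractiveness
kappa = [1.0, 0.6, 0.3, 0.1]   # hypothetical examination probabilities
best = best_ranking(theta, kappa)
```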
Website Optimization: The C-KLUCB algorithm
The KL-UCB Algorithm for Bounded Stochastic Bandits and Beyond. Garivier, Cappé, COLT 2011
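A minimal sketch of the KL-UCB index for a Bernoulli arm, computed by bisection on the KL divergence. This simplifies the exploration term to $\log(t)$ (dropping the $\log\log$ refinement) and omits C-KLUCB's position-weighting; names are mine:

```python
import math

def kl_bernoulli(p, q, eps=1e-12):
    # KL divergence d(p; q) between Bernoulli(p) and Bernoulli(q)
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def klucb_index(p_hat, n_pulls, t, precision=1e-6):
    """Bernoulli KL-UCB upper confidence bound:
    max { q >= p_hat : n_pulls * d(p_hat; q) <= log(t) }, via bisection."""
    budget = math.log(max(t, 1)) / n_pulls
    lo, hi = p_hat, 1.0
    while hi - lo > precision:
        mid = (lo + hi) / 2
        if kl_bernoulli(p_hat, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo
```

For example, `klucb_index(0.4, 10, 100)` gives an optimistic bound above the empirical mean 0.4; the bound tightens toward 0.4 as the pull count grows.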
Website Optimization: Complexity Theorem (Lower Bound on the Regret)
For any uniformly efficient algorithm, the regret is asymptotically bounded from below: for $T$ large enough,
$R(T) \geq \log(T) \times C(\kappa, \theta)$
[Figure: regret $R(T)$ versus round $t$ (log scale, $10^2$ to $10^4$), comparing C-KLUCB and Ranked-UCB against the lower bound]
Ad Placement
The rewards form a $K \times L$ matrix with one unknown parameter per (ad, position) entry:
$$\begin{pmatrix} \cdot & \cdot & \cdot & \cdot & \cdot \\ \cdot & \cdot & \theta_{kl} & \cdot & \cdot \\ \cdot & \cdot & \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot & \cdot & \cdot \end{pmatrix}$$
The action picks an entry, $A_t = (k, l)$, and the reward is $r_t = \theta_{kl}$.
Stochastic Rank-1 Bandits. AISTATS 2017
$K \times L$ arms but only $K + L$ parameters!
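A tiny illustration of this rank-1 structure, assuming each entry factors as a product of a row parameter and a column parameter (the values below are hypothetical):

```python
def rank1_matrix(u, v):
    # Rank-1 reward matrix: theta[k][l] = u[k] * v[l], so the K*L entries
    # are fully determined by only K + L parameters.
    return [[uk * vl for vl in v] for uk in u]

u = [0.8, 0.5, 0.3]   # hypothetical row (ad) parameters
v = [0.9, 0.6, 0.4]   # hypothetical column (position) parameters
theta = rank1_matrix(u, v)
# the best arm is the (row, column) pair maximizing u[k] * v[l]
```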
Ad Placement: Stochastic Rank-1 Bandits. AISTATS 2017
Complexity Theorem (Lower Bound on the Regret)
For any uniformly efficient algorithm, the regret is asymptotically bounded from below by
$$\liminf_{T \to \infty} \frac{R(T)}{\log(T)} \geq \sum_{k=2}^{K} \frac{\theta_{11} - \theta_{k1}}{d(\theta_{k1}; \theta_{11})} + \sum_{l=2}^{L} \frac{\theta_{11} - \theta_{1l}}{d(\theta_{1l}; \theta_{11})}$$
which can be rewritten as: for any $T$ sufficiently large,
$$R(T) \geq \log(T) \times \big( C_{\mathrm{col}}(\kappa, \theta) + C_{\mathrm{row}}(\kappa, \theta) \big)$$
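This lower-bound constant can be evaluated numerically for a concrete instance. A sketch assuming Bernoulli rewards and that arm $(1,1)$ (0-indexed `[0][0]`) is optimal; the example matrix is hypothetical:

```python
import math

def kl_bernoulli(p, q, eps=1e-12):
    # KL divergence d(p; q) between Bernoulli distributions
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def rank1_lower_bound_constant(theta):
    """Sum over the first column (k >= 2) and first row (l >= 2) of
    (theta[0][0] - theta[.]) / d(theta[.]; theta[0][0])."""
    best = theta[0][0]
    col = sum((best - theta[k][0]) / kl_bernoulli(theta[k][0], best)
              for k in range(1, len(theta)))
    row = sum((best - theta[0][l]) / kl_bernoulli(theta[0][l], best)
              for l in range(1, len(theta[0])))
    return col + row

# hypothetical 3x3 instance with theta[0][0] optimal
c_example = rank1_lower_bound_constant(
    [[0.9, 0.5, 0.4], [0.6, 0.1, 0.1], [0.3, 0.1, 0.1]])
```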
Ad Placement: BM-KLUCB
Idea: alternately explore the rows and the columns of the matrix using KL-UCB.
[Figure: regret $R(T)$ versus round $t$ (log scale, $10^2$ to $10^6$), $K = 3$, $L = 3$, comparing R1-KLUCB against the lower bound]
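A toy sketch of the row/column alternation idea, using a simplified UCB1-style index as a stand-in for KL-UCB; all names, parameters, and the pairing heuristic are mine, not the paper's algorithm:

```python
import math
import random

def ucb(mean, n, t):
    # UCB1-style optimistic index; a simplified stand-in for KL-UCB
    return mean + math.sqrt(2 * math.log(max(t, 2)) / max(n, 1))

def alternate_rank1(theta, horizon, rng=random.Random(3)):
    """Toy alternation on a K x L Bernoulli reward matrix: even rounds
    explore rows (paired with the current best column), odd rounds
    explore columns (paired with the current best row)."""
    K, L = len(theta), len(theta[0])
    row_sum, row_cnt = [0.0] * K, [0] * K
    col_sum, col_cnt = [0.0] * L, [0] * L
    clicks = 0
    for t in range(1, horizon + 1):
        best_row = max(range(K),
                       key=lambda k: row_sum[k] / row_cnt[k] if row_cnt[k] else 1.0)
        best_col = max(range(L),
                       key=lambda l: col_sum[l] / col_cnt[l] if col_cnt[l] else 1.0)
        if t % 2 == 0:  # explore rows
            k = max(range(K), key=lambda k: ucb(
                row_sum[k] / row_cnt[k] if row_cnt[k] else 0.0, row_cnt[k], t))
            l = best_col
        else:           # explore columns
            k = best_row
            l = max(range(L), key=lambda l: ucb(
                col_sum[l] / col_cnt[l] if col_cnt[l] else 0.0, col_cnt[l], t))
        x = int(rng.random() < theta[k][l])  # Bernoulli(theta_kl) reward
        row_sum[k] += x; row_cnt[k] += 1
        col_sum[l] += x; col_cnt[l] += 1
        clicks += x
    return clicks

total = alternate_rank1([[0.72, 0.48], [0.45, 0.30]], horizon=200)
```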
Take-Home Message
‘Real-Life’ Bandit Algorithms are getting real… but not yet.
What comes next for bandit models in recommendation and conversion optimization: stochastic bandits with delays, rank-1 best-arm identification, higher-rank models?
No-free-lunch theorems: exploration comes at a price that depends on the complexity of the problem.
Existing 'super theoretical' work on bandits ultimately provides us with super efficient algorithms…
@vernadec