Post on 12-Apr-2017
Sequential Learning in the Position-Based Model
Claire Vernade, Olivier Cappé, Paul Lagrée (Télécom ParisTech), B. Kveton, S. Katariya, Z. Wen, C. Szepesvári (Adobe Research, U. Alberta)
"Don't use Bandit Algorithms, they probably don't work for you."
- Chris Stucchio
C. Stucchio's blog: https://www.chrisstucchio.com/blog/2015/dont_use_bandits.html
Position-Based Model
[Figure: a ranked list with positions 1, 2, 3, 4; position $l$ is examined with probability $\kappa_l$, and the item $k$ displayed there is clicked with probability $\theta_k$]
$X_t \sim \mathcal{B}(\kappa_l \times \theta_k)$
Chuklin et al. (2008): Cascade Model, User Browsing Model, DBN, CCM, DCM, …
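A minimal simulation sketch of the Position-Based Model described above. The parameter values and function name are illustrative, not from the talk; the modeling assumption (click probability $\kappa_l \times \theta_k$) is the PBM itself:

```python
import random

def simulate_pbm_clicks(theta, kappa, ranking, rng=random.Random(0)):
    """Simulate one page view under the Position-Based Model (PBM).

    theta[k]  : attractiveness of item k (click probability if examined)
    kappa[l]  : probability that position l is examined
    ranking   : ranking[l] = index of the item shown at position l
    Returns one 0/1 click indicator per position:
    X_l ~ Bernoulli(kappa[l] * theta[ranking[l]]).
    """
    return [int(rng.random() < kappa[l] * theta[k])
            for l, k in enumerate(ranking)]

# hypothetical parameters for 4 items and 4 positions
theta = [0.53, 0.61, 0.42, 0.55]
kappa = [1.0, 0.7, 0.5, 0.3]
clicks = simulate_pbm_clicks(theta, kappa, ranking=[1, 0, 3, 2])
```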
Multi-Armed Bandit
[Figure: bandit arms with unobserved expected rewards 0.53, 0.61, 0.42, 0.40, 0.60, 0.55, alongside the estimated empirical averages after a few pulls]
Multi-Armed Bandit
[Figure: the same arms, now labeled with their unknown parameters $\theta_1$, $\theta_2$, $\theta_3$, …]
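The gap between true means and empirical averages is the whole difficulty: after a few pulls the estimates can rank the arms incorrectly. A small sketch with made-up numbers (the means below echo the figure, everything else is illustrative):

```python
import random

def empirical_means(true_means, pulls_per_arm, rng=random.Random(1)):
    # Estimate each arm's mean from a handful of Bernoulli pulls.
    # With so few samples the estimates may order the arms wrongly,
    # which is why a bandit algorithm must keep exploring.
    est = []
    for mu in true_means:
        draws = [int(rng.random() < mu) for _ in range(pulls_per_arm)]
        est.append(sum(draws) / pulls_per_arm)
    return est

true_means = [0.53, 0.61, 0.42, 0.40, 0.60, 0.55]  # unobserved rewards
est = empirical_means(true_means, pulls_per_arm=5)
```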
Two Bandit Games
1. Website optimization: you are the website manager, choosing which items (e.g. novels by Balzac, Zola, …) to display in positions 1 to 4.
2. Ad placement: you want to place the right ad in the right location.
Website Optimization
$A_t = (\theta_1, \theta_2, \theta_3, \theta_4)$
$r_t = \kappa_1 \theta_1 + \kappa_2 \theta_2 + \kappa_3 \theta_3 + \kappa_4 \theta_4$
Multiple-Plays Bandits in the Position-Based Model. NIPS 2016
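A sketch of this expected reward, and of the fact that with known parameters the optimal page puts the most attractive items in the most examined positions (rearrangement inequality). All parameter values and names are illustrative:

```python
def expected_reward(theta, kappa, ranking):
    # Expected number of clicks under the PBM:
    # r = sum_l kappa[l] * theta[ranking[l]]
    return sum(kappa[l] * theta[k] for l, k in enumerate(ranking))

def best_ranking(theta, kappa):
    # Pair the most attractive items with the most examined slots.
    items = sorted(range(len(theta)), key=lambda k: -theta[k])
    slots = sorted(range(len(kappa)), key=lambda l: -kappa[l])
    ranking = [None] * len(kappa)
    for k, l in zip(items, slots):
        ranking[l] = k
    return ranking

theta = [0.2, 0.9, 0.5, 0.4]   # hypothetical attractiveness
kappa = [1.0, 0.6, 0.3, 0.1]   # hypothetical examination probabilities
best = best_ranking(theta, kappa)
```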
Website Optimization: The C-KLUCB algorithm
The KL-UCB Algorithm for Bounded Stochastic Bandits and Beyond. Garivier, Cappé, COLT 2011
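A minimal sketch of the KL-UCB index for a Bernoulli arm, computed by bisection on the KL divergence. This simplifies the exploration term to $\log(t)$ (dropping the $\log\log$ refinement) and omits C-KLUCB's position-weighting; names are mine:

```python
import math

def kl_bernoulli(p, q, eps=1e-12):
    # KL divergence d(p; q) between Bernoulli(p) and Bernoulli(q)
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def klucb_index(p_hat, n_pulls, t, precision=1e-6):
    """Bernoulli KL-UCB upper confidence bound:
    max { q >= p_hat : n_pulls * d(p_hat; q) <= log(t) }, via bisection."""
    budget = math.log(max(t, 1)) / n_pulls
    lo, hi = p_hat, 1.0
    while hi - lo > precision:
        mid = (lo + hi) / 2
        if kl_bernoulli(p_hat, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo
```

For example, `klucb_index(0.4, 10, 100)` gives an optimistic bound above the empirical mean 0.4; the bound tightens toward 0.4 as the pull count grows.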
Website Optimization: Complexity Theorem (Lower Bound on the Regret)
For any uniformly efficient algorithm, the regret is asymptotically bounded from below: for $T$ large enough,
$R(T) \geq \log(T) \times C(\kappa, \theta)$
[Figure: regret $R(T)$ versus round $t$ (log scale, $10^2$ to $10^4$), comparing C-KLUCB and Ranked-UCB against the lower bound]
Ad Placement
The rewards form a $K \times L$ matrix with one unknown parameter per (ad, position) entry:
$$\begin{pmatrix} \cdot & \cdot & \cdot & \cdot & \cdot \\ \cdot & \cdot & \theta_{kl} & \cdot & \cdot \\ \cdot & \cdot & \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot & \cdot & \cdot \end{pmatrix}$$
The action picks an entry, $A_t = (k, l)$, and the reward is $r_t = \theta_{kl}$.
Stochastic Rank-1 Bandits. AISTATS 2017
$K \times L$ arms but only $K + L$ parameters!
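A tiny illustration of this rank-1 structure, assuming each entry factors as a product of a row parameter and a column parameter (the values below are hypothetical):

```python
def rank1_matrix(u, v):
    # Rank-1 reward matrix: theta[k][l] = u[k] * v[l], so the K*L entries
    # are fully determined by only K + L parameters.
    return [[uk * vl for vl in v] for uk in u]

u = [0.8, 0.5, 0.3]   # hypothetical row (ad) parameters
v = [0.9, 0.6, 0.4]   # hypothetical column (position) parameters
theta = rank1_matrix(u, v)
# the best arm is the (row, column) pair maximizing u[k] * v[l]
```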
Ad Placement: Stochastic Rank-1 Bandits. AISTATS 2017
Complexity Theorem (Lower Bound on the Regret)
For any uniformly efficient algorithm, the regret is asymptotically bounded from below by
$$\liminf_{T \to \infty} \frac{R(T)}{\log(T)} \geq \sum_{k=2}^{K} \frac{\theta_{11} - \theta_{k1}}{d(\theta_{k1}; \theta_{11})} + \sum_{l=2}^{L} \frac{\theta_{11} - \theta_{1l}}{d(\theta_{1l}; \theta_{11})}$$
which can be rewritten as: for any $T$ sufficiently large,
$$R(T) \geq \log(T) \times \big( C_{\mathrm{col}}(\kappa, \theta) + C_{\mathrm{row}}(\kappa, \theta) \big)$$
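This lower-bound constant can be evaluated numerically for a concrete instance. A sketch assuming Bernoulli rewards and that arm $(1,1)$ (0-indexed `[0][0]`) is optimal; the example matrix is hypothetical:

```python
import math

def kl_bernoulli(p, q, eps=1e-12):
    # KL divergence d(p; q) between Bernoulli distributions
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def rank1_lower_bound_constant(theta):
    """Sum over the first column (k >= 2) and first row (l >= 2) of
    (theta[0][0] - theta[.]) / d(theta[.]; theta[0][0])."""
    best = theta[0][0]
    col = sum((best - theta[k][0]) / kl_bernoulli(theta[k][0], best)
              for k in range(1, len(theta)))
    row = sum((best - theta[0][l]) / kl_bernoulli(theta[0][l], best)
              for l in range(1, len(theta[0])))
    return col + row

# hypothetical 3x3 instance with theta[0][0] optimal
c_example = rank1_lower_bound_constant(
    [[0.9, 0.5, 0.4], [0.6, 0.1, 0.1], [0.3, 0.1, 0.1]])
```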
Ad Placement: BM-KLUCB
Idea: alternately explore the rows and the columns of the matrix using KL-UCB.
[Figure: regret $R(T)$ versus round $t$ (log scale, $10^2$ to $10^6$), $K = 3$, $L = 3$, comparing R1-KLUCB against the lower bound]
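A toy sketch of the row/column alternation idea, using a simplified UCB1-style index as a stand-in for KL-UCB; all names, parameters, and the pairing heuristic are mine, not the paper's algorithm:

```python
import math
import random

def ucb(mean, n, t):
    # UCB1-style optimistic index; a simplified stand-in for KL-UCB
    return mean + math.sqrt(2 * math.log(max(t, 2)) / max(n, 1))

def alternate_rank1(theta, horizon, rng=random.Random(3)):
    """Toy alternation on a K x L Bernoulli reward matrix: even rounds
    explore rows (paired with the current best column), odd rounds
    explore columns (paired with the current best row)."""
    K, L = len(theta), len(theta[0])
    row_sum, row_cnt = [0.0] * K, [0] * K
    col_sum, col_cnt = [0.0] * L, [0] * L
    clicks = 0
    for t in range(1, horizon + 1):
        best_row = max(range(K),
                       key=lambda k: row_sum[k] / row_cnt[k] if row_cnt[k] else 1.0)
        best_col = max(range(L),
                       key=lambda l: col_sum[l] / col_cnt[l] if col_cnt[l] else 1.0)
        if t % 2 == 0:  # explore rows
            k = max(range(K), key=lambda k: ucb(
                row_sum[k] / row_cnt[k] if row_cnt[k] else 0.0, row_cnt[k], t))
            l = best_col
        else:           # explore columns
            k = best_row
            l = max(range(L), key=lambda l: ucb(
                col_sum[l] / col_cnt[l] if col_cnt[l] else 0.0, col_cnt[l], t))
        x = int(rng.random() < theta[k][l])  # Bernoulli(theta_kl) reward
        row_sum[k] += x; row_cnt[k] += 1
        col_sum[l] += x; col_cnt[l] += 1
        clicks += x
    return clicks

total = alternate_rank1([[0.72, 0.48], [0.45, 0.30]], horizon=200)
```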
Take-Home Message
‘Real-Life’ Bandit Algorithms are getting real… but not yet.
What comes next for bandit models in recommendation and conversion optimization: stochastic bandits with delays, rank-1 best-arm identification, higher-rank models?
No-free-lunch theorems: exploration comes at a price that depends on the complexity of the problem.
Existing 'super theoretical' work on bandits ultimately provides us with super efficient algorithms…
@vernadec