Byron Galbraith, Chief Data Scientist, Talla, at MLconf NYC 2017
Bayesian Bandits
Byron Galbraith, PhD
Cofounder / Chief Data Scientist, Talla
2017.03.24
Bayesian Bandits for the Impatient
1. Online adaptive learning: "Earn while you Learn"
2. Powerful alternative to A/B testing optimization
3. Can be efficient and easy to implement
Dining Ware VR Experiences on Demand
Iterated Decision Problems
What product recommendations should we present to subscribers to keep them engaged?
A/B Testing
Exploit vs Explore - What should we do?
Exploit: choose what seems best so far
🙂 Feel good about our decision
🤔 There still may be something better
Explore: try something new
😄 Discover a superior approach
😧 Regret our choice
A/B/n Testing
Regret - What did that experiment cost us?
The Multi-Armed Bandit Problem
http://blog.yhat.com/posts/the-beer-bandit.html
Bandit Solutions
R_T = Σ_{t=1..T} [ r(Y_t(a*)) − r(Y_t(a_t)) ]
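The regret definition above can be sketched in a few lines; the function name `cumulative_regret` is my own, not from the talk:

```python
import numpy as np

def cumulative_regret(optimal_rewards, chosen_rewards):
    """R_T: total reward lost over time by not always pulling the best arm."""
    return np.cumsum(np.asarray(optimal_rewards) - np.asarray(chosen_rewards))

# The best arm paid 1.0 each round; round 2 chose a worse arm that paid 0.
print(cumulative_regret([1.0, 1.0, 1.0], [1.0, 0.0, 1.0])[-1])  # -> 1.0
```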
UCB (for the k-armed MAB): a_t = argmax_i [ r̄_i(t) + c·√(log t / n_i) ]
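The UCB selection rule can be sketched as follows; `ucb_select` is a hypothetical helper, and pulling each arm once before applying the bonus formula is a common convention rather than something stated on the slide:

```python
import numpy as np

def ucb_select(means, counts, t, c=2.0):
    """Pick the arm with the largest empirical mean plus exploration bonus."""
    means = np.asarray(means, dtype=float)
    counts = np.asarray(counts, dtype=float)
    # Pull any never-tried arm first so the bonus term is well-defined.
    if np.any(counts == 0):
        return int(np.argmin(counts))
    return int(np.argmax(means + c * np.sqrt(np.log(t) / counts)))
```

An under-explored arm gets a large bonus: with equal means, `ucb_select([0.5, 0.5], [100, 1], t=101)` picks arm 1 because it has only been pulled once.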
Softmax / gradient bandit: P(A_t = a) = e^{H_t(a)} / Σ_{b=1..k} e^{H_t(b)} = π_t(a)
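The softmax policy above can be sketched as (the max-subtraction is a standard numerical-stability trick, not part of the slide's formula):

```python
import numpy as np

def softmax_policy(preferences):
    """pi_t(a) = exp(H(a)) / sum_b exp(H(b)), stabilized by subtracting the max."""
    h = np.asarray(preferences, dtype=float)
    e = np.exp(h - h.max())
    return e / e.sum()

# Equal preferences give a uniform policy over the arms.
print(softmax_policy([0.0, 0.0, 0.0]))
```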
Beta: P(X = x) = x^{α−1} (1−x)^{β−1} / B(α, β)
Binomial: P(X = x) = C(n, x) p^x (1−p)^{n−x}
Posterior per arm a: Beta_a(α + r_a, β + N − r_a)
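Because the Beta prior is conjugate to the Bernoulli likelihood, the posterior update reduces to incrementing two counters. A minimal sketch (`beta_update` is my own name):

```python
def beta_update(alpha, beta, reward):
    """Conjugate Beta-Bernoulli update: a hit (reward=1) bumps alpha, a miss bumps beta."""
    return alpha + reward, beta + (1 - reward)

a, b = 1, 1              # uniform Beta(1, 1) prior
for r in [1, 0, 1, 1]:   # observed Bernoulli rewards: 3 hits, 1 miss
    a, b = beta_update(a, b, r)
print(a, b)  # -> 4 2
```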
Bayes' rule: P(X | Y, Z) = P(Y | X, Z) P(X | Z) / P(Y | Z)
Thompson Sampling
P(θ | r, a) ∝ P(r | θ, a) P(θ | a)
Posterior ∝ Likelihood × Prior
Bayesian Bandits – The Model
Model whether a recommendation will result in user engagement
• Bernoulli distribution: θ - likelihood of the event occurring
How do we find θ?
• Conjugate prior
• Beta distribution: α - number of hits, β - number of misses
Only need to keep track of two numbers per option
• # of hits, # of misses
Bayesian Bandits – The Algorithm
1. Initialize (uniform prior)
2. For each user request for recommendations t:
   1. Sample θ̂ for each option from its Beta posterior
   2. Choose the action corresponding to the largest θ̂
   3. Observe reward
   4. Update the chosen option's posterior
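The algorithm above can be sketched as a toy simulation; the arm probabilities and random seed are mine, not from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the toy run is repeatable

def thompson_step(alphas, betas, true_probs):
    """One round of Thompson Sampling over Bernoulli arms."""
    theta = rng.beta(alphas, betas)        # 1. sample from each arm's posterior
    a = int(np.argmax(theta))              # 2. choose the arm with the largest sample
    r = int(rng.random() < true_probs[a])  # 3. observe a Bernoulli reward
    alphas[a] += r                         # 4a. hit: bump alpha
    betas[a] += 1 - r                      # 4b. miss: bump beta
    return a

alphas, betas = np.ones(3), np.ones(3)  # uniform Beta(1, 1) priors
true_probs = [0.2, 0.5, 0.8]            # hidden engagement rates (made up)
for t in range(500):
    thompson_step(alphas, betas, true_probs)
# After 500 rounds, most pulls should have gone to the best arm (index 2).
```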
Belief Adaptation
Bandit Regret
But behavior is dependent on context
• Categorical contexts
  • One bandit model per category
  • One-hot context vector
• Real-valued contexts
  • Can capture interrelatedness of context dimensions
  • More difficult to incorporate effectively
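The "one bandit model per category" idea can be sketched as a dictionary of independent Beta-Bernoulli samplers; the class and category names here are illustrative, not from the talk:

```python
import numpy as np

rng = np.random.default_rng(1)

class PerCategoryBandits:
    """One independent Beta-Bernoulli Thompson sampler per categorical context."""
    def __init__(self, n_arms):
        self.n_arms = n_arms
        self.posteriors = {}  # context value -> (alphas, betas)

    def select(self, context):
        # Lazily create a uniform Beta(1, 1) bandit for unseen categories.
        alphas, betas = self.posteriors.setdefault(
            context, (np.ones(self.n_arms), np.ones(self.n_arms)))
        return int(np.argmax(rng.beta(alphas, betas)))

    def update(self, context, arm, reward):
        alphas, betas = self.posteriors[context]
        alphas[arm] += reward
        betas[arm] += 1 - reward

bandits = PerCategoryBandits(n_arms=3)
arm = bandits.select("weekday")   # each category keeps its own posteriors
bandits.update("weekday", arm, reward=1)
```

Each category learns in isolation, which is simple but shares no information across contexts; that is the trade-off the real-valued approach tries to address.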
So why would I ever A/B test again?
• Test intent: optimization vs understanding
• Difficulty with non-stationarity: Monday vs Friday behavior
• Deployment: few turnkey options, specialized skill set
https://vwo.com/blog/multi-armed-bandit-algorithm/
Bayesian Bandits for the Patient
1. Thompson Sampling balances exploitation & exploration while minimizing decision regret
2. No need to pre-specify decision splits or a time horizon for experiments
3. Can model a variety of problems and complex interactions
Resources
https://github.com/bgalbraith/bandits