Real Time Learning

33
1 ©MapR Technologies - Confidential Real-time Learning

description

A talk about real-time learning, especially using

Transcript of Real Time Learning

Page 1: Real Time Learning

1©MapR Technologies - Confidential

Real-time Learning

Page 2: Real Time Learning

2©MapR Technologies - Confidential

Contact:– [email protected]– @ted_dunning

Slides and such (available late tonight):– http://slideshare.net/tdunning

Hash tags: #mapr #hivedata

Page 3: Real Time Learning

3©MapR Technologies - Confidential

We have a product to sell … from a web-site

Page 4: Real Time Learning

4©MapR Technologies - Confidential

What picture?

What tag-line?

What call to action?

Page 5: Real Time Learning

5©MapR Technologies - Confidential

The Challenge

Design decisions affect probability of success– Cheesy web-sites don’t even sell cheese

The best designers do better when allowed to fail– Exploration juices creativity

But failing is expensive– If only because we could have succeeded– But also because offending or disappointing customers is bad

Page 6: Real Time Learning

6©MapR Technologies - Confidential

More Challenges

Too many designs– 5 pictures– 10 tag-lines– 4 calls to action– 3 back-ground colors=> 5 x 10 x 4 x 3 = 600 designs

It gets worse quickly– What about changes on the back-end?– Search engine variants?– Checkout process variants?

Page 7: Real Time Learning

7©MapR Technologies - Confidential

Example – AB testing in real-time

I have 15 versions of my landing page Each visitor is assigned to a version– Which version?

A conversion or sale or whatever can happen– How long to wait?

Some versions of the landing page are horrible– Don’t want to give them traffic

Page 8: Real Time Learning

8©MapR Technologies - Confidential

A Quick Diversion

You see a coin– What is the probability of heads?– Could it be larger or smaller than that?

I flip the coin and while it is in the air ask again I catch the coin and ask again I look at the coin (and you don’t) and ask again Why does the answer change?– And did it ever have a single value?

Page 9: Real Time Learning

9©MapR Technologies - Confidential

A Philosophical Conclusion

Probability as expressed by humans is subjective and depends on information and experience

Page 10: Real Time Learning

10©MapR Technologies - Confidential

I Dunno

Page 11: Real Time Learning

11©MapR Technologies - Confidential

5 heads out of 10 throws

Page 12: Real Time Learning

12©MapR Technologies - Confidential

2 heads out of 12 throws

Page 13: Real Time Learning

13©MapR Technologies - Confidential

So now you understand Bayesian probability

Page 14: Real Time Learning

14©MapR Technologies - Confidential

Another Quick Diversion

Let’s play a shell game This is a special shell game It costs you nothing to play The pea has constant probability of being under each shell

(trust me) How do you find the best shell? How do you find it while maximizing the number of wins?

Page 15: Real Time Learning

15©MapR Technologies - Confidential

Pause for short con-game

Page 16: Real Time Learning

16©MapR Technologies - Confidential

Interim Thoughts

Can you identify winners or losers without trying them out?

Can you ever completely eliminate a shell with a bad streak?

Should you keep trying apparent losers?

Page 17: Real Time Learning

17©MapR Technologies - Confidential

Pause for second con-game

Page 18: Real Time Learning

18©MapR Technologies - Confidential

So now you understand multi-armed bandits

Page 19: Real Time Learning

19©MapR Technologies - Confidential

Conclusions

Can you identify winners or losers without trying them out?No

Can you ever completely eliminate a shell with a bad streak?No

Should you keep trying apparent losers?Yes, but at a decreasing rate

Page 20: Real Time Learning

20©MapR Technologies - Confidential

Is there an optimum strategy?

Page 21: Real Time Learning

21©MapR Technologies - Confidential

Bayesian Bandit

Compute distributions based on data so far Sample p1, p2 and p2 from these distributions

Pick shell i where i = argmaxi pi

Lemma 1: The probability of picking shell i will match the probability it is the best shell

Lemma 2: This is as good as it gets

Page 22: Real Time Learning

22©MapR Technologies - Confidential

And it works!

Page 23: Real Time Learning

23©MapR Technologies - Confidential

Video Demo

Page 24: Real Time Learning

24©MapR Technologies - Confidential

The Code

Select an alternative

Select and learn

But we already know how to count!

n = dim(k)[1] p0 = rep(0, length.out=n) for (i in 1:n) { p0[i] = rbeta(1, k[i,2]+1, k[i,1]+1) } return (which(p0 == max(p0)))

for (z in 1:steps) { i = select(k) j = test(i) k[i,j] = k[i,j]+1 } return (k)

Page 25: Real Time Learning

25©MapR Technologies - Confidential

The Basic Idea

We can encode a distribution by sampling Sampling allows unification of exploration and exploitation

Can be extended to more general response models

Page 26: Real Time Learning

26©MapR Technologies - Confidential

The Original Problem

x1x2

x3

Page 27: Real Time Learning

27©MapR Technologies - Confidential

Response Function

Page 28: Real Time Learning

28©MapR Technologies - Confidential

Generalized Banditry

Suppose we have an infinite number of bandits– suppose they are each labeled by two real numbers x and y in [0,1]– also that expected payoff is a parameterized function of x and y

– now assume a distribution for θ that we can learn online Selection works by sampling θ, then computing f Learning works by propagating updates back to θ– If f is linear, this is very easy

Don’t just have to have two labels, could have labels and context

Page 29: Real Time Learning

29©MapR Technologies - Confidential

Context Variables

x1x2

x3

user.geo env.time env.day_of_week env.weekend

Page 30: Real Time Learning

30©MapR Technologies - Confidential

Caveats

Original Bayesian Bandit only requires real-time

Generalized Bandit may require access to long history for learning– Pseudo online learning may be easier than true online

Bandit variables can include content, time of day, day of week

Context variables can include user id, user features

Bandit × context variables provide the real power

Page 31: Real Time Learning

31©MapR Technologies - Confidential

You can do thisyourself!

Page 32: Real Time Learning

32©MapR Technologies - Confidential

Thank You

Page 33: Real Time Learning

33©MapR Technologies - Confidential

Contact:– [email protected]– @ted_dunning

Slides and such (available late tonight):– http://slideshare.net/tdunning

Hash tags: #mapr #hivedata