When Machine Learning Meets the Web


Transcript of When Machine Learning Meets the Web

Page 1: When Machine Learning Meets the Web

When Machine Learning Meets the Web

Chao Liu, Internet Services Research Center, Microsoft Research-Redmond

Page 2: When Machine Learning Meets the Web

Outline

• Motivation & Challenges
• Background on Distributed Computing
• Standard ML on MapReduce
  ▪ Classification: Naïve Bayes
  ▪ Clustering: Nonnegative Matrix Factorization
  ▪ Modeling: EM Algorithm
• Customized ML on MapReduce
  ▪ Click Modeling
  ▪ Behavior Targeting
• Conclusions

Page 3: When Machine Learning Meets the Web

Motivation & Challenges

Data on the Web
• Scale: terabyte-to-petabyte data
  ▪ Around 20TB of log data per day from Bing
• Dynamics: evolving data streams
  ▪ Click data streams with evolving/emerging topics

Applications: non-traditional ML tasks
• Predicting clicks & ads

Page 4: When Machine Learning Meets the Web

Outline

• Motivation & Challenges
• Background on Distributed Computing
• Standard ML on MapReduce
  ▪ Classification: Naïve Bayes
  ▪ Clustering: Nonnegative Matrix Factorization
  ▪ Modeling: EM Algorithm
• Customized ML on MapReduce
  ▪ Click Modeling
  ▪ Behavior Targeting
• Conclusions

Page 5: When Machine Learning Meets the Web

Parallel vs. Distributed Computing

• Parallel computing: all processors have access to a shared memory, which can be used to exchange information between processors
• Distributed computing: each processor has its own private memory (distributed memory); processors communicate over the network
  ▪ Message passing
  ▪ MapReduce

Page 6: When Machine Learning Meets the Web

MPI vs. MapReduce

• MPI is for task parallelism
  ▪ Suitable for CPU-intensive jobs
  ▪ Fine-grained communication control, powerful computation model
• MapReduce is for data parallelism
  ▪ Suitable for data-intensive jobs
  ▪ A restricted computation model

Page 7: When Machine Learning Meets the Web

Word Counting on MapReduce

[Dataflow: the web corpus is split across machines as (docId, doc) pairs. Each Mapper emits (w, 1) for every word occurrence, e.g., (w1, 1), (w2, 1), (w3, 1). The intermediate pairs are grouped by key, e.g., (w1, <1, 1, 1>), and each Reducer sums its group to produce (w1, 3), (w2, 2), (w3, 3).]

• Web corpus is spread over multiple machines
• Mapper: for each word w in a doc, emit (w, 1)
• Intermediate (key, value) pairs are aggregated by word
• The Reducer is copied to each machine and runs over its local share of the intermediate data to produce the per-word counts
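To make the dataflow concrete, here is a minimal single-machine sketch of the word-count mapper and reducer in Python; the function names and the driver that simulates the shuffle are illustrative, not part of the original talk.

```python
from collections import defaultdict
from typing import Iterable, Iterator, Tuple

def mapper(doc_id: str, doc: str) -> Iterator[Tuple[str, int]]:
    # For each word w in the document, emit (w, 1).
    for word in doc.split():
        yield word, 1

def reducer(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    # Sum all partial counts for one word.
    return word, sum(counts)

def run_word_count(docs: dict) -> dict:
    # Simulate the shuffle: group intermediate (word, 1) pairs by key.
    grouped = defaultdict(list)
    for doc_id, doc in docs.items():
        for word, one in mapper(doc_id, doc):
            grouped[word].append(one)
    # Run the reducer on each key group.
    return dict(reducer(w, vals) for w, vals in grouped.items())

if __name__ == "__main__":
    corpus = {"d1": "the web meets machine learning", "d2": "the web at web scale"}
    print(run_word_count(corpus))  # e.g., {'the': 2, 'web': 3, ...}
```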

Page 8: When Machine Learning Meets the Web

Machine Learning on MapReduce

A big picture: not omnipotent, but good enough

MapReduce friendly
• Standard ML algorithms: Classification (Naïve Bayes, logistic regression, MART, etc.), Clustering (k-means, NMF, co-clustering, etc.), Modeling (EM algorithm, Gaussian mixture, Latent Dirichlet Allocation, etc.)
• Customized ML algorithms: PageRank, click models, behavior targeting

MapReduce unfriendly
• Standard ML algorithms: Classification (SVM), Clustering (spectral clustering)
• Customized ML algorithms: learning-to-rank

Page 9: When Machine Learning Meets the Web

Outline

• Motivation & Challenges
• Background on Distributed Computing
• Standard ML on MapReduce
  ▪ Classification: Naïve Bayes
  ▪ Clustering: Nonnegative Matrix Factorization
  ▪ Modeling: EM Algorithm
• Customized ML on MapReduce
  ▪ Click Modeling
  ▪ Behavior Targeting
• Conclusions

Page 10: When Machine Learning Meets the Web

Classification: Naïve Bayes

P(C|X) ∝ P(C)·P(X|C) = P(C)·∏_j P(X_j|C)

[Dataflow: each Mapper reads training examples (x(i), y(i)) and emits one record (j, xj(i), y(i)) per feature j. Reducing on y(i) yields the class priors P(C); reducing on j yields the conditionals P(Xj|C).]
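A minimal sketch of this pipeline (illustrative names; discrete features, with normalization and smoothing left out of the sketch):

```python
from collections import defaultdict

def mapper(x, y):
    # One record keyed by the class feeds the prior; one record per feature
    # keyed by (feature index, feature value, class) feeds the conditionals.
    yield ("prior", y), 1
    for j, xj in enumerate(x):
        yield ("cond", j, xj, y), 1

def reducer(key, values):
    # Sum the counts per key; normalizing within each class gives P(C) and P(Xj|C).
    return key, sum(values)

def train_naive_bayes(data):
    grouped = defaultdict(list)
    for x, y in data:
        for k, v in mapper(x, y):
            grouped[k].append(v)
    return dict(reducer(k, vs) for k, vs in grouped.items())

data = [((1, 0), "spam"), ((1, 1), "spam"), ((0, 1), "ham")]
print(train_naive_bayes(data))  # raw counts; divide by class totals to get probabilities
```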

Page 11: When Machine Learning Meets the Web

Clustering: Nonnegative Matrix Factorization [Liu et al., WWW2010]

• Effective tool to uncover latent relationships in nonnegative matrices, with many applications [Berry et al., 2007; Sra & Dhillon, 2006]
  ▪ Interpretable dimensionality reduction [Lee & Seung, 1999]
  ▪ Document clustering [Shahnaz et al., 2006; Xu et al., 2006]
• Challenge: can we scale NMF to million-by-million matrices?

A (m×n) ≈ W (m×k) · H (k×n),  with A ≥ 0, W ≥ 0, H ≥ 0

Page 12: When Machine Learning Meets the Web

NMF Algorithm [Lee & Seung, 2000]

A (m×n) ≈ W (m×k) · H (k×n),  with A ≥ 0, W ≥ 0, H ≥ 0
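The update equations on this slide did not survive extraction. For reference, a sketch of the standard Lee & Seung (2000) multiplicative updates for minimizing ||A − WH||_F^2; the H update is what the distributed steps on the following slides compute:

```latex
% Standard Lee & Seung (2000) multiplicative updates (reconstruction, not verbatim
% from the slide); \circ and the fractions denote element-wise multiply and divide.
H \leftarrow H \circ \frac{W^{\top} A}{W^{\top} W H}, \qquad
W \leftarrow W \circ \frac{A H^{\top}}{W H H^{\top}}
```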

Page 13: When Machine Learning Meets the Web

Distributed NMF

Data Partition: A, W and H across machines

• A is stored as sparse triples (i, j, A_ij)
• W is stored by rows as pairs (i, w_i)
• H is stored by columns as pairs (j, h_j)

Page 14: When Machine Learning Meets the Web

Computing DNMF: The Big Picture

The H update is computed in three distributed steps:
• X = WᵀA
• Y = WᵀWH
• H = H .* X ./ Y  (element-wise multiply and divide)

Page 15: When Machine Learning Meets the Web

[Overall dataflow for one H update (the next three slides walk through each stage):
• Map-I / Reduce-I / Map-II / Reduce-II: join A: (i, j, A_ij) with W: (i, w_i) and aggregate over i to produce X = WᵀA as pairs (j, x_j)
• Map-III / Reduce-III / Map-IV: sum the outer products w_iᵀw_i into WᵀW, then multiply it into the H columns (j, h_j) to produce Y = WᵀWH as pairs (j, y_j)
• Map-V / Reduce-V: join (j, x_j), (j, y_j), and (j, h_j), and emit the updated columns (j, h_j_new)]

Page 16: When Machine Learning Meets the Web

X = WᵀA

• Map-I: re-key the A entries A: (i, j, A_ij) and the W rows W: (i, w_i) by the row index i
• Reduce-I: join them on i and emit (j, A_ij · w_i)
• Map-II: pass the (j, A_ij · w_i) pairs through, keyed by the column index j
• Reduce-II: sum over i to get x_j = Σ_i A_ij · w_i and emit (j, x_j)
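A minimal single-machine sketch of this two-stage computation (dictionaries stand in for the distributed key groups; all names are illustrative):

```python
def compute_X(A_entries, W_rows):
    """X = W^T A, computed as a join on the row index followed by a sum per column.

    A_entries: iterable of (i, j, A_ij) sparse triples
    W_rows:    dict i -> w_i (list of k floats)
    Returns:   dict j -> x_j (list of k floats)
    """
    # Reduce-I: join A entries with W rows on i, emit (j, A_ij * w_i).
    scaled = []
    for i, j, a_ij in A_entries:
        w_i = W_rows[i]
        scaled.append((j, [a_ij * w for w in w_i]))
    # Reduce-II: sum the scaled row vectors per column index j.
    X = {}
    for j, vec in scaled:
        if j not in X:
            X[j] = list(vec)
        else:
            X[j] = [x + v for x, v in zip(X[j], vec)]
    return X

A = [(0, 0, 2.0), (0, 1, 1.0), (1, 0, 3.0)]
W = {0: [1.0, 0.5], 1: [2.0, 1.0]}
print(compute_X(A, W))  # {0: [8.0, 4.0], 1: [1.0, 0.5]}
```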

Page 17: When Machine Learning Meets the Web

Y = WᵀWH

• Map-III: for each W row (i, w_i), emit (0, w_iᵀw_i), a small k×k outer product
• Reduce-III: a single reducer sums them, C = WᵀW = Σ_{i=1..m} w_iᵀw_i, and emits (0, WᵀW)
• Map-IV: for each H column (j, h_j), compute y_j = (WᵀW)·h_j (the k×k matrix is small enough to ship to every mapper) and emit (j, y_j)
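A compact sketch of this stage (illustrative; it relies on k being small, so the k×k matrix WᵀW fits on a single machine and can be broadcast):

```python
def compute_Y(W_rows, H_cols, k):
    """Y = W^T W H: sum small k x k outer products, then a map-side multiply per column."""
    # Reduce-III: C = W^T W = sum_i w_i^T w_i (a k x k matrix).
    C = [[0.0] * k for _ in range(k)]
    for w_i in W_rows.values():
        for a in range(k):
            for b in range(k):
                C[a][b] += w_i[a] * w_i[b]
    # Map-IV: y_j = C h_j for every H column; C is broadcast to all mappers.
    return {j: [sum(C[a][b] * h_j[b] for b in range(k)) for a in range(k)]
            for j, h_j in H_cols.items()}
```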

Page 18: When Machine Learning Meets the Web

H = H .* X ./ Y

• Map-V: re-key (j, x_j), (j, y_j), and the H columns H: (j, h_j) by the column index j
• Reduce-V: for each j, compute h_j_new = h_j .* x_j ./ y_j (element-wise) and emit (j, h_j_new)

Page 19: When Machine Learning Meets the Web

[Recap: the full five-stage map/reduce dataflow for one H update, as shown on Page 15.]

Page 20: When Machine Learning Meets the Web

Scalability w.r.t. Matrix Size

• About 3 hours per iteration; 20 iterations take around 20 × 3 × 0.72 ≈ 43 hours
• Less than 7 hours on a 43.9M-by-769M matrix with 4.38 billion nonzero values

Page 21: When Machine Learning Meets the Web

General EM on MapReduce

• Map: evaluate the posterior of the latent variables for each data point (E-step) and compute its contribution to the sufficient statistics
• Reduce: aggregate the sufficient statistics and update the model parameters (M-step)
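As an illustration (not from the slides), a minimal sketch of one EM iteration for a 1-D two-component Gaussian mixture in this map/reduce form; all names and the toy data are illustrative:

```python
import math
from collections import defaultdict

def e_step_map(x, params):
    # Evaluate the responsibilities for one point and emit its sufficient statistics.
    pis, mus, sigmas = params
    resp = [pi * math.exp(-(x - mu) ** 2 / (2 * s ** 2)) / (s * math.sqrt(2 * math.pi))
            for pi, mu, s in zip(pis, mus, sigmas)]
    z = sum(resp)
    for k, r in enumerate(resp):
        r /= z
        yield k, (r, r * x, r * x * x)  # (weight, weighted x, weighted x^2)

def m_step_reduce(k, stats):
    # Aggregate the statistics for component k and return its updated parameters.
    n_k = sum(s[0] for s in stats)
    sx = sum(s[1] for s in stats)
    sxx = sum(s[2] for s in stats)
    mu = sx / n_k
    var = max(sxx / n_k - mu * mu, 1e-6)
    return n_k, mu, math.sqrt(var)

def em_iteration(data, params):
    grouped = defaultdict(list)
    for x in data:
        for k, stat in e_step_map(x, params):
            grouped[k].append(stat)
    updated = {k: m_step_reduce(k, stats) for k, stats in grouped.items()}
    n_total = sum(v[0] for v in updated.values())
    pis = [updated[k][0] / n_total for k in sorted(updated)]
    mus = [updated[k][1] for k in sorted(updated)]
    sigmas = [updated[k][2] for k in sorted(updated)]
    return pis, mus, sigmas

params = ([0.5, 0.5], [0.0, 5.0], [1.0, 1.0])
print(em_iteration([0.1, -0.2, 4.8, 5.3], params))
```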

Page 22: When Machine Learning Meets the Web

Outline

• Motivation & Challenges
• Background on Distributed Computing
• Standard ML on MapReduce
  ▪ Classification: Naïve Bayes
  ▪ Clustering: Nonnegative Matrix Factorization
  ▪ Modeling: EM Algorithm
• Customized ML on MapReduce
  ▪ Click Modeling
  ▪ Behavior Targeting
• Conclusions

Page 23: When Machine Learning Meets the Web

Click Modeling: Motivation

• Clicks are good… but are these two clicks equally "good"?
• Non-clicks may have excuses:
  ▪ Not relevant
  ▪ Not examined

Page 24: When Machine Learning Meets the Web

Eye-tracking User Study


Page 25: When Machine Learning Meets the Web

Bayesian Browsing Model [Liu et al., KDD2009]

[Graphical model: a query returns URL1 through URL4; for each position i there is a relevance variable S_i, an examination variable E_i (whether the snippet is examined), and an observed click C_i (the clickthroughs).]

Page 26: When Machine Learning Meets the Web

Dependencies in BBM

[Dependency structure: each click C_i depends on the relevance S_i and the examination E_i; E_i in turn depends on the position r of the preceding click before i and the distance d = i − r.]

Page 27: When Machine Learning Meets the Web

• Ultimate goal: the relevance posterior p(R | C_1:n)
• Observation: conditional independence
• Model inference

Page 28: When Machine Learning Meets the Web

P(C|S) by Chain Rule

• Likelihood of a search instance: P(C | S), expanded by the chain rule
• From S to R

Page 29: When Machine Learning Meets the Web

Putting Things Together

• Posterior with C_1:n
• Re-organize by the R_j's; the exponents count:
  ▪ how many times d_j was clicked
  ▪ how many times d_j was not clicked when it is at position (r + d) and the preceding click is on position r
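Written out (a reconstruction consistent with these counts, not verbatim from the slide; the γ's are the position- and distance-dependent examination factors), the per-document posterior is:

```latex
% Reconstruction of the re-organized posterior: one R_j factor per click on d_j,
% and one (1 - \gamma_{r,d} R_j) factor per non-click at position r + d with the
% preceding click at position r.
P(R_j \mid C_{1:n}) \;\propto\; R_j^{\,N_j}\; \prod_{r,d}\bigl(1 - \gamma_{r,d}\,R_j\bigr)^{N_{j,r,d}}
```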

Page 30: When Machine Learning Meets the Web

What p(R|C1:n) Tells Us

• Exact inference, with the joint posterior in closed form
• The joint posterior factorizes, hence the relevances are mutually independent
• At most M(M+1)/2 + 1 numbers fully characterize each posterior
  ▪ Count vector: (e_0, e_1, e_2, …, e_{M(M+1)/2})

Page 31: When Machine Learning Meets the Web

An Example

[Worked example: the count vector for R_4 is built from a sample click sequence by tallying N_4, the number of clicks on the document at position 4, and N_{4,r,d}, the number of times it was not clicked when shown at position r + d with the preceding click at position r.]

Page 32: When Machine Learning Meets the Web

LearnBBM on MapReduce

Map: emit((q,u), idx)

Reduce: construct the count vector
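A minimal sketch of this learner (illustrative names; idx = 0 is taken to denote a click and a positive idx the non-click slot, matching the example on the next slide):

```python
from collections import defaultdict

def bbm_map(query, url, idx):
    # idx = 0 for a click; idx >= 1 identifies the (r, d) slot of a non-click
    # (illustrative convention, consistent with the example on the next slide).
    yield (query, url), idx

def bbm_reduce(key, indices, M):
    # Build the count vector (e_0, ..., e_{M(M+1)/2}) for one (query, url) pair:
    # e_0 counts clicks, e_i counts non-clicks observed in slot i.
    e = [0] * (M * (M + 1) // 2 + 1)
    for idx in indices:
        e[idx] += 1
    return key, e

# Tiny driver simulating the shuffle over a few impressions.
impressions = [("q", "U1", 0), ("q", "U1", 1), ("q", "U1", 1), ("q", "U2", 4)]
grouped = defaultdict(list)
for q, u, idx in impressions:
    for k, v in bbm_map(q, u, idx):
        grouped[k].append(v)
print([bbm_reduce(k, vals, M=10) for k, vals in grouped.items()])
```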

Page 33: When Machine Learning Meets the Web

Example on MapReduce

Map outputs, one session per mapper:
(U1, 0), (U2, 4), (U3, 0)
(U1, 1), (U3, 0), (U4, 7)
(U1, 1), (U3, 0), (U4, 0)

Reduce groups the emitted indices by URL:
(U1, 0, 1, 1), (U2, 4), (U3, 0, 0, 0), (U4, 0, 7)

from which the closed-form posteriors follow, e.g.:
p(R_1) ∝ R_1(1 − R_1)^2
p(R_2) ∝ 1 − 0.98·R_2
p(R_3) ∝ R_3^3
p(R_4) ∝ R_4(1 − R_4)

Page 34: When Machine Learning Meets the Web

Petabyte-Scale Experiment

• Setup: 8 weeks of data, 8 jobs; job k takes the first k weeks of data
• Experiment platform: SCOPE (Easy and Efficient Parallel Processing of Massive Data Sets) [Chaiken et al., VLDB'08]

Page 35: When Machine Learning Meets the Web

Scalability of BBM

• Increasing computation load: more queries, more URLs, more impressions
• Near-constant elapsed time on SCOPE:
  ▪ About 3 hours
  ▪ Scans 265 terabytes of data
  ▪ Produces full posteriors for 1.15 billion (query, URL) pairs

Page 36: When Machine Learning Meets the Web

Large-scale Behavior Targeting [Ye et al., KDD2009]

Behavior targeting
• Ad serving based on users' historical behaviors
• Complementary to sponsored ads and content ads

Page 37: When Machine Learning Meets the Web

Problem Setting

Goal
• Given ads in a certain category, locate qualified users based on users' past behaviors

Data
• Each user is identified by a cookie
• Past behavior, profiled as a vector x, includes ad clicks, ad views, page views, search queries, clicks, etc.

Challenges
• Scale: e.g., 9TB of ad data with 500B entries in Aug '08
• Sparse: e.g., the CTR of automotive display ads is 0.05%
• Dynamic: user behavior changes over time

Page 38: When Machine Learning Meets the Web

Learning: Linear Poisson Model

• CTR = ClickCnt / ViewCnt
  ▪ One model to predict the expected click count
  ▪ One model to predict the expected view count
• Linear Poisson model
• MLE on w
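The slide's formulas did not survive extraction; as a reconstruction of the standard setup (not verbatim from the talk), a linear Poisson model takes each count's mean to be linear in the behavior features, and w is fit by maximum likelihood:

```latex
% Linear Poisson model (reconstruction): the observed count y_i (clicks or views
% of user i) is Poisson with mean linear in the feature vector x_i.
y_i \sim \mathrm{Poisson}(\lambda_i), \qquad \lambda_i = w^{\top} x_i, \quad w \ge 0
% MLE on w maximizes the Poisson log-likelihood over all users:
\ell(w) = \sum_i \bigl( y_i \log(w^{\top} x_i) - w^{\top} x_i \bigr) + \text{const}
```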

Page 39: When Machine Learning Meets the Web

Implementation on MapReduce

Learning
• Map: compute each user's contribution to the likelihood statistics
• Reduce: update w

Prediction
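A minimal sketch of one such learning pass (illustrative names; it uses the standard multiplicative maximum-likelihood update for a nonnegative linear Poisson model, which is one plausible reading of the map/reduce split above, not necessarily the production update):

```python
from collections import defaultdict

def learn_map(x, y, w):
    # x: sparse feature dict {j: x_j} for one user, y: observed count (e.g., clicks).
    # Emit each feature's contribution to the numerator and denominator of the update.
    lam = sum(w.get(j, 0.0) * xj for j, xj in x.items()) + 1e-12
    for j, xj in x.items():
        yield j, (xj * y / lam, xj)

def learn_reduce(j, contribs, w):
    # Multiplicative update: w_j <- w_j * (sum_i x_ij * y_i / lambda_i) / (sum_i x_ij).
    num = sum(c[0] for c in contribs)
    den = sum(c[1] for c in contribs) + 1e-12
    return j, w.get(j, 0.0) * num / den

def one_pass(data, w):
    grouped = defaultdict(list)
    for x, y in data:
        for j, c in learn_map(x, y, w):
            grouped[j].append(c)
    return dict(learn_reduce(j, cs, w) for j, cs in grouped.items())

# Toy example: two users, two features.
data = [({0: 1.0, 1: 2.0}, 3), ({0: 2.0}, 1)]
w = {0: 0.5, 1: 0.5}
print(one_pass(data, w))
```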

Page 40: When Machine Learning Meets the Web

Outline

• Motivation & Challenges
• Background on Distributed Computing
• Standard ML on MapReduce
  ▪ Classification: Naïve Bayes
  ▪ Clustering: Nonnegative Matrix Factorization
  ▪ Modeling: EM Algorithm
• Customized ML on MapReduce
  ▪ Click Modeling
  ▪ Behavior Targeting
• Conclusions

Page 41: When Machine Learning Meets the Web

Conclusions

• Challenges imposed by Web data
  ▪ Scalability of standard algorithms
  ▪ Application-driven customized algorithms
• The capability to consume huge amounts of data outweighs algorithmic sophistication
  ▪ Simple counting is no less powerful than sophisticated algorithms when data is abundant or even infinite
• MapReduce: a restricted computation model
  ▪ Not omnipotent, but powerful enough
  ▪ The things we want to do turn out to be things we can do

Page 42: When Machine Learning Meets the Web

Q&A

Thank You!

SEWM'10 Keynote, Chengdu, China