When Machine Learning Meets the Web
Chao Liu, Internet Services Research Center, Microsoft Research-Redmond
Outline

Motivation & Challenges
Background on Distributed Computing
Standard ML on MapReduce
  ▪ Classification: Naïve Bayes
  ▪ Clustering: Nonnegative Matrix Factorization
  ▪ Modeling: EM Algorithm
Customized ML on MapReduce
  ▪ Click Modeling
  ▪ Behavior Targeting
Conclusions
Motivation & Challenges
Data on the Web
  Scale: terabyte-to-petabyte data
    ▪ Around 20TB of log data per day from Bing
  Dynamics: evolving data streams
    ▪ Click data streams with evolving/emerging topics

Applications: non-traditional ML tasks
    ▪ Predicting clicks & ads
Parallel vs. Distributed Computing
Parallel computing: all processors have access to a shared memory, which can be used to exchange information between processors.

Distributed computing: each processor has its own private memory (distributed memory), and processors communicate over the network.
  ▪ Message passing
  ▪ MapReduce
MPI vs. MapReduce
MPI is for task parallelism
  Suitable for CPU-intensive jobs
  Fine-grained communication control; a powerful computation model

MapReduce is for data parallelism
  Suitable for data-intensive jobs
  A restricted computation model
Word Counting on MapReduce
(Figure: each of three mappers reads (docId, doc) pairs from its local docs and emits pairs such as (w1, 1), (w2, 1), (w3, 1); the shuffle aggregates values by key, so the reducers receive (w1, <1,1,1>), (w2, <1,1>), (w3, <1,1,1>) and output (w1, 3), (w2, 2), (w3, 3).)
Web corpus on multiple machines
Mapper: for each word w in a doc, emit (w, 1)
Intermediate (key,value) pairs are aggregated by word
The reducer is copied to each machine and runs locally over the intermediate data to produce the result
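The word-counting flow above can be simulated in a few lines; this single-process sketch stands in for the distributed runtime, with the shuffle phase written out explicitly.

```python
from collections import defaultdict

def mapper(doc):
    # Emit (word, 1) for each word in the document.
    for w in doc.split():
        yield (w, 1)

def shuffle(pairs):
    # Group intermediate values by key, as the MapReduce runtime does.
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return groups

def reducer(word, counts):
    # Sum the list of 1s for each word.
    return (word, sum(counts))

docs = ["w1 w2 w3", "w1 w3", "w1 w2 w3"]
pairs = [kv for d in docs for kv in mapper(d)]
result = dict(reducer(w, c) for w, c in shuffle(pairs).items())
# result matches the figure: w1 -> 3, w2 -> 2, w3 -> 3
```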
Machine Learning on MapReduce
A big picture: not omnipotent, but good enough.
MapReduce friendly
  ▪ Standard ML algorithms: classification (Naïve Bayes, logistic regression, MART, etc.), clustering (k-means, NMF, co-clustering, etc.), modeling (EM algorithm, Gaussian mixture, Latent Dirichlet Allocation, etc.)
  ▪ Customized ML algorithms: PageRank, click models, behavior targeting

MapReduce unfriendly
  ▪ Standard ML algorithms: classification (SVM), clustering (spectral clustering)
  ▪ Customized ML algorithms: learning-to-rank
Classification: Naïve Bayes
P(C|X) ∝ P(C) P(X|C) = P(C) ∏j P(Xj|C)

(Figure: each mapper reads training pairs (x(i), y(i)) and emits (j, xj(i), y(i)) for every feature j; reducing on y(i) yields the class priors P(C), and reducing on j yields the conditionals P(Xj|C).)
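As a toy illustration of the two reduce paths, here is a single-process simulation; the tiny dataset and discrete binary features are assumptions made for the sketch, not from the talk.

```python
from collections import Counter

# Toy training set: binary feature vectors with labels.
data = [((1, 0), "spam"), ((1, 1), "spam"), ((0, 1), "ham")]

# Map phase: emit one record per label (for the prior) and one per feature value.
emitted = []
for x, y in data:
    emitted.append(("label", y))
    for j, xj in enumerate(x):
        emitted.append(("feat", (j, xj, y)))

# Reduce on y: class priors P(C).
label_counts = Counter(v for k, v in emitted if k == "label")
n = sum(label_counts.values())
prior = {c: cnt / n for c, cnt in label_counts.items()}

# Reduce on (j, value, y): conditionals P(X_j = v | C).
feat_counts = Counter(v for k, v in emitted if k == "feat")
cond = {(j, v, c): cnt / label_counts[c] for (j, v, c), cnt in feat_counts.items()}
```

Both reductions are pure counting, which is why Naïve Bayes fits the MapReduce model so naturally.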
Clustering: Nonnegative Matrix Factorization [Liu et al., WWW2010]
An effective tool to uncover latent relationships in nonnegative matrices, with many applications [Berry et al., 2007; Sra & Dhillon, 2006]: interpretable dimensionality reduction [Lee & Seung, 1999] and document clustering [Shahnaz et al., 2006; Xu et al., 2006].
Challenge: can we scale NMF to million-by-million matrices?

A (m×n) ≈ W (m×k) × H (k×n), with A ≥ 0, W ≥ 0, H ≥ 0
NMF Algorithm [Lee & Seung, 2000]
Factor A (m×n) ≈ W (m×k) × H (k×n), A, W, H ≥ 0, by the multiplicative updates

  H ← H .* (WᵀA) ./ (WᵀW H)
  W ← W .* (A Hᵀ) ./ (W H Hᵀ)

which keep both factors nonnegative.
Distributed NMF
Data Partition: A, W and H across machines
(Figure: A is partitioned as entry triples (i, j, Aij); W is partitioned row-wise as pairs (i, wi); H is partitioned column-wise as pairs (j, hj).)
Computing DNMF: The Big Picture

One iteration of the H-update, H ← H .* X ./ Y with X = WᵀA and Y = WᵀW H, is organized as a pipeline of MapReduce jobs:

Step 1: X = WᵀA
  ▪ Map-I: join each entry (i, j, Aij) of A with the corresponding row wi of W, emitting (i, (j, Aij, wi)).
  ▪ Reduce-I / Map-II: re-key the joined records by column, emitting (j, (Aij, wi)).
  ▪ Reduce-II: for each column j, sum Aij · wi over i to obtain xj, the j-th column of X = WᵀA.

Step 2: Y = WᵀW H
  ▪ Map-III / Map-IV: emit the outer products (0, wi wiᵀ) for each row of W, along with the columns (j, hj) of H.
  ▪ Reduce-III: sum the k×k outer products, C = WᵀW = Σi wi wiᵀ, and compute yj = C hj for each column, yielding Y = WᵀW H.

Step 3: H = H .* X ./ Y
  ▪ Map-V / Reduce-V: join (j, hj), (j, xj), (j, yj) on the column index and emit the updated column (j, hj_new) with hj_new = hj .* xj ./ yj.

The W-update is symmetric, with the roles of W and H exchanged.
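A toy, single-process sketch of one H-update in the partitioned layout above (entry triples for A, rows of W, columns of H); the tiny matrices and dictionary-based layout are illustrative, not the paper's implementation.

```python
# One distributed H-update, H <- H .* (W^T A) ./ (W^T W H), in the slides'
# partitioned form: A as sparse entries (i, j) -> Aij, W as rows, H as columns.
m, n, k = 3, 2, 2
A = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (2, 1): 1.0}
W = {0: [1.0, 0.5], 1: [0.5, 1.0], 2: [1.0, 1.0]}
H = {0: [0.5, 0.5], 1: [0.5, 0.5]}

# Step 1 (Map-I/II, Reduce-I/II): x_j = sum_i Aij * w_i, i.e. X = W^T A.
X = {j: [0.0] * k for j in range(n)}
for (i, j), a in A.items():
    for t in range(k):
        X[j][t] += a * W[i][t]

# Step 2 (Map-III/IV, Reduce-III): C = W^T W = sum_i w_i w_i^T, then y_j = C h_j.
C = [[sum(W[i][s] * W[i][t] for i in W) for t in range(k)] for s in range(k)]
Y = {j: [sum(C[s][t] * H[j][t] for t in range(k)) for s in range(k)]
     for j in range(n)}

# Step 3 (Map-V, Reduce-V): elementwise h_j <- h_j .* x_j ./ y_j.
H = {j: [H[j][t] * X[j][t] / Y[j][t] for t in range(k)] for j in range(n)}
```

Only Step 2's k×k matrix C is ever materialized on a single machine, which is what makes the pipeline scale to very wide matrices.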
Scalability w.r.t. Matrix Size
About 3 hours per iteration; 20 iterations take around 20 × 3 × 0.72 ≈ 43 hours.
Less than 7 hours on a 43.9M-by-769M matrix with 4.38 billion nonzero values
General EM on MapReduce
Map (E-step): for each record, evaluate the posterior of the latent variables under the current parameters and compute the sufficient statistics.
Reduce (M-step): aggregate the sufficient statistics and update the parameters.
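The generic pattern can be sketched for a 1-D two-component Gaussian mixture; the toy data, initial parameters, and unit-variance assumption are illustrative choices, not from the talk.

```python
import math

# One EM iteration for a 1-D mixture of two unit-variance Gaussians,
# phrased in the map/reduce pattern above.
data = [0.0, 0.2, 0.1, 3.9, 4.1, 4.0]
mu = [0.5, 3.5]        # current means
pi = [0.5, 0.5]        # current mixing weights

def mapper(x):
    # E-step: responsibilities, then per-component sufficient statistics.
    dens = [pi[c] * math.exp(-0.5 * (x - mu[c]) ** 2) for c in range(2)]
    z = sum(dens)
    for c in range(2):
        r = dens[c] / z
        yield (c, (r, r * x))   # (responsibility mass, weighted sum of x)

# Shuffle groups by component; Reduce performs the M-step updates.
stats = {0: [0.0, 0.0], 1: [0.0, 0.0]}
for x in data:
    for c, (r, rx) in mapper(x):
        stats[c][0] += r
        stats[c][1] += rx

n = len(data)
pi = [stats[c][0] / n for c in range(2)]
mu = [stats[c][1] / stats[c][0] for c in range(2)]
# mu moves toward the two clusters near 0.1 and 4.0
```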
Click Modeling: Motivation
Clicks are good... but are these two clicks equally "good"?

Non-clicks may have excuses:
  ▪ Not relevant
  ▪ Not examined
Eye-tracking User Study
Bayesian Browsing Model [Liu et al., KDD2009]
(Figure: a query returns URL1–URL4; for each position i, Ei denotes whether the user examined the snippet, Si the snippet's relevance, and Ci the click. Only the clickthroughs Ci are observed.)
Dependencies in BBM
(Figure: Ci depends on Si and Ei; Ei depends on the position i and on ri, the preceding click position before i, through the distance di = i − ri.)

Ultimate goal: infer the relevance posterior p(R | C1:n) from the observed clicks.
Observation: conditional independence among the variables makes inference tractable.
Model Inference
P(C|S) factors by the chain rule, which gives the likelihood of a search instance; summing out the examination variables converts the likelihood from S to the relevance R.
Putting Things Together
The posterior combines the prior with the likelihood and re-organizes as a product over the Rj's:

  p(Rj | C1:n) ∝ Rj^Nj ∏r,d (1 − βr,d Rj)^Nj,r,d

where Nj is how many times dj was clicked, Nj,r,d is how many times dj was not clicked when shown at position (r + d) with the preceding click at position r, and βr,d is the corresponding examination probability.
What p(R|C1:n) Tells Us
Exact inference: the joint posterior is available in closed form; it factorizes over documents, so the Rj's are mutually independent.

At most M(M+1)/2 + 1 numbers fully characterize each posterior: the count vector (e0, e1, e2, ..., e_{M(M+1)/2}).
An Example
(Figure: computing the count vector for R4 from the click logs; N4 tallies clicks on d4, and the N4,r,d cells tally non-clicks indexed by preceding-click position r = 0, 1, 2 and distance d = 3, 2, 1.)
LearnBBM on MapReduce
Map: for each (query, URL) impression, emit ((q, u), idx), where idx indexes the relevant slot of the count vector.
Reduce: construct the count vector for each (q, u) by tallying the emitted indices.
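A sketch of this learning step; the flat indexing scheme idx(r, d) and M = 10 positions are hypothetical choices, since the slide does not specify the encoding.

```python
from collections import defaultdict

M = 10  # number of result positions

def idx(r, d):
    # Hypothetical flat index for the non-click cell (r, d): slot 0 holds the
    # click count, and the M(M+1)/2 cells follow in row-major order of r.
    return 1 + sum(M - rr for rr in range(r)) + (d - 1)

def mapper(query, impressions):
    # impressions: list of (url, position, clicked) for one search instance.
    clicks = [pos for _, pos, c in impressions if c]
    for url, pos, clicked in impressions:
        if clicked:
            yield ((query, url), 0)
        else:
            r = max([c for c in clicks if c < pos], default=0)  # preceding click
            yield ((query, url), idx(r, pos - r))

def reducer(emitted):
    # Build one count vector per (query, url) by tallying indices.
    vectors = defaultdict(lambda: [0] * (1 + M * (M + 1) // 2))
    for key, i in emitted:
        vectors[key][i] += 1
    return vectors
```

Because the reduce step is again pure counting, the whole inference runs in a single MapReduce pass over the log.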
Example on MapReduce
(Figure: three mappers emit indexed pairs such as (U1, 0), (U2, 4), (U3, 0), (U4, 7); the reducer groups them by URL into count vectors, e.g. (U1, 0, 1, 1), (U2, 4), (U3, 0, 0, 0), (U4, 0, 7), from which each URL's closed-form posterior p(Rj) is read off.)
Petabyte-Scale Experiment
Setup: 8 weeks of data, 8 jobs; job k takes the first k weeks of data.

Experiment platform: SCOPE, Easy and Efficient Parallel Processing of Massive Data Sets [Chaiken et al., VLDB'08]
Scalability of BBM
Increasing computation load (more queries, more URLs, more impressions), yet near-constant elapsed time on SCOPE:
  ▪ 3 hours
  ▪ 265 terabytes of data scanned
  ▪ Full posteriors for 1.15 billion (query, URL) pairs
Large-scale Behavior Targeting [Ye et al., KDD2009]
Behavior targeting: ad serving based on users' historical behaviors; complementary to sponsored ads and content ads.
Problem Setting
Goal: given ads in a certain category, locate qualified users based on their past behaviors.

Data: a user is identified by a cookie; past behavior, profiled as a vector x, includes ad clicks, ad views, page views, search queries, clicks, etc.

Challenges:
  ▪ Scale: e.g., 9TB of ad data with 500B entries in Aug '08
  ▪ Sparsity: e.g., the CTR of automotive display ads is 0.05%
  ▪ Dynamics: user behavior changes over time
Learning: Linear Poisson Model
CTR = ClickCnt / ViewCnt, so two models are trained: one predicting the expected click count and one predicting the expected view count.

Linear Poisson model: an observed count y is Poisson-distributed with a mean that is linear in the behavior features, λ = wᵀx; the weights w are fit by maximum likelihood.
Implementation on MapReduce
Learning: Map computes each user's contribution to the likelihood statistics; Reduce aggregates them and updates w.

Prediction: a Map-only pass scores each user's feature vector.
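One way the learning loop could look, assuming a multiplicative MLE recurrence for the linear Poisson model; the toy data and iteration count are made up, and the per-feature sums are the quantities a mapper/reducer pair would compute and aggregate.

```python
# Fitting lambda_i = w . x_i by a multiplicative update that keeps w
# nonnegative; toy data for illustration only.
X = [[1.0, 2.0], [2.0, 0.5], [0.5, 1.0]]   # user behavior vectors
y = [3.0, 2.0, 1.0]                        # observed counts (e.g., clicks)
w = [1.0, 1.0]                             # nonnegative initial weights

for _ in range(200):
    lam = [sum(wj * xj for wj, xj in zip(w, x)) for x in X]
    # "Map" computes x_ij * y_i / lambda_i per user; "Reduce" sums over users.
    num = [sum(X[i][j] * y[i] / lam[i] for i in range(len(X))) for j in range(2)]
    den = [sum(X[i][j] for i in range(len(X))) for j in range(2)]
    w = [w[j] * num[j] / den[j] for j in range(2)]

# Prediction is a single pass over the feature vectors.
pred = [sum(wj * xj for wj, xj in zip(w, x)) for x in X]
```

A useful property of this update: after each step the total predicted count equals the total observed count, so the model is calibrated in aggregate by construction.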
Conclusions
Challenges imposed by Web data: scalability of standard algorithms, and application-driven customized algorithms.

The capability to consume huge amounts of data outweighs algorithmic sophistication: simple counting is no less powerful than sophisticated algorithms when data is abundant, or even infinite.

MapReduce is a restricted computation model, not omnipotent but powerful enough: the things we want to do turn out to be things we can do.
Q&A
Thank You!
SEWM‘10 Keynote, Chengdu, China