Machine Learning at PeerIndex
Ferenc Huszár
@fhuszar
Wednesday, 16 May 12
PeerIndex.com: understand your influence
PeerPerks.com: free stuff for influencers
Machine Learning @ PeerIndex
• The usual stuff
  • topic modelling/classification of tweets/statuses/URLs
  • identity resolution across Twitter, Facebook, LinkedIn
  • spambot/fraud detection: identify people gaming the system
  • sentiment classification: happy/sad/neutral
• The really exciting stuff
  • inferring networks of influence - more about this later
  • visualise different aspects of influence, in an engaging way
  • influence maximisation - submodular optimisation
Inferring networks of influence
Social network
Propagation probabilities $p_{i,j}$
Information cascade logs (one line per share: user ID, timestamp)
http://www.pcworld.com/article/239719
1079306 2011-08-25T00:03:06+01:00
4549198 2011-08-25T04:32:25+01:00
2662975 2011-08-25T00:35:11+01:00
2333224 2011-08-25T01:43:18+01:00
3141371 2011-08-25T01:52:06+01:00
3482720 2011-08-25T07:18:24+01:00
1403682 2011-08-25T03:52:58+01:00
4679657 2011-08-25T01:07:48+01:00
32460 2011-08-25T01:11:43+01:00
http://techcrunch.com/2011/11/21/...
259725 2011-10-24T03:32:19+01:00
76539 2011-10-24T03:32:23+01:00
1922351 2011-10-24T04:28:47+01:00
9183 2011-10-24T03:30:57+01:00
3335398 2011-10-24T03:34:01+01:00
1616885 2011-10-24T03:48:16+01:00
82198 2011-10-24T03:48:29+01:00
906390 2011-10-24T23:13:51+01:00
1051322 2011-10-24T03:40:02+01:00
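A cascade log like the one on this slide can be turned into an ordered cascade with a few lines of Python. This is only a minimal sketch: it assumes each record is a user ID followed by an ISO 8601 share time (the slide does not label the fields), and `RAW_LOG` and `parse_cascade` are names made up for the illustration.

```python
from datetime import datetime

# A few records from the slide, assumed to be "<user id> <share timestamp>".
RAW_LOG = """\
1079306 2011-08-25T00:03:06+01:00
4549198 2011-08-25T04:32:25+01:00
2662975 2011-08-25T00:35:11+01:00
4679657 2011-08-25T01:07:48+01:00
32460 2011-08-25T01:11:43+01:00
"""


def parse_cascade(raw):
    """Return the user IDs ordered by the time they shared the item."""
    events = []
    for line in raw.splitlines():
        if not line.strip():
            continue  # skip blank lines
        user_id, stamp = line.split()
        events.append((datetime.fromisoformat(stamp), int(user_id)))
    events.sort()  # earliest share first
    return [user for _, user in events]


cascade = parse_cascade(RAW_LOG)
print(cascade)  # [1079306, 2662975, 4679657, 32460, 4549198]
```

The ordered list plays the role of the cascade $u_1, u_2, \ldots, u_n$ used later in the deck.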
Heuristic approaches to estimate $p_{i,j}$
• purely based on local network structure
  $p_{i,j} = \frac{1}{d_{\mathrm{in}}(j)}$
• trivalency “model” my personal favourite :)
  $p_{i,j} \in \{0.1, 0.01, 0.001\}$ randomly
• data-driven heuristics
  $p_{i,j} = \frac{\text{number of items shared by } j \text{ after } i \text{ shared them}}{\text{number of items shared by } i}$
How do you solve this with machine learning?
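The three heuristics are simple enough to sketch directly. The code below is an illustration under stated assumptions, not PeerIndex's implementation: an edge `(i, j)` is taken to mean "i can influence j", a cascade is a list of distinct user IDs in sharing order, the trivalency values are the usual 0.1, 0.01, 0.001, and all function names are invented here.

```python
import random
from collections import defaultdict


def weighted_cascade(edges):
    """Local structure only: p[i, j] = 1 / d_in(j)."""
    in_degree = defaultdict(int)
    for _, j in edges:
        in_degree[j] += 1
    return {(i, j): 1.0 / in_degree[j] for i, j in edges}


def trivalency(edges, values=(0.1, 0.01, 0.001), seed=0):
    """Assign each edge one of three fixed values uniformly at random."""
    rng = random.Random(seed)
    return {edge: rng.choice(values) for edge in edges}


def data_driven(edges, cascades):
    """p[i, j] = (#items j shared after i shared them) / (#items i shared)."""
    edge_set = set(edges)
    shared_by = defaultdict(int)   # items shared by i
    followed = defaultdict(int)    # items j shared after i did, for edges (i, j)
    for cascade in cascades:
        for pos, i in enumerate(cascade):
            shared_by[i] += 1
            for j in cascade[pos + 1:]:
                if (i, j) in edge_set:
                    followed[i, j] += 1
    return {
        (i, j): followed[i, j] / shared_by[i] if shared_by[i] else 0.0
        for i, j in edges
    }


edges = [("a", "c"), ("b", "c"), ("a", "b")]
cascades = [["a", "b", "c"], ["a", "c"], ["b"]]
print(weighted_cascade(edges))
print(data_driven(edges, cascades))
```

In this toy example `c` has two incoming edges, so both get probability 0.5 under the first heuristic, while the data-driven estimate gives `("a", "c")` probability 1.0 because `c` shared both items that `a` shared.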
The likelihood
$P(D \mid p_{i,j})$: what’s the probability of the cascade $u_1, u_2, u_3, \ldots, u_n$?
Cascade log $D$ for http://www.pcworld.com/article/239719:
1079306 2011-08-25T00:03:06+01:00
4549198 2011-08-25T04:32:25+01:00
2662975 2011-08-25T00:35:11+01:00
2333224 2011-08-25T01:43:18+01:00
3141371 2011-08-25T01:52:06+01:00
3482720 2011-08-25T07:18:24+01:00
1403682 2011-08-25T03:52:58+01:00
4679657 2011-08-25T01:07:48+01:00
32460 2011-08-25T01:11:43+01:00
First factors, for subsequent users in cascade:
$P(D \mid p_{i,j}) = p_{0,u_1} \, \bigl(1 - (1 - p_{0,u_2})(1 - p_{u_1,u_2})\bigr) \cdots$
The likelihood
$P(D \mid p_{i,j})$: what’s the probability of the cascade $u_1, u_2, u_3, \ldots, u_n$?
Cascade log $D$ for http://www.pcworld.com/article/239719:
1079306 2011-08-25T00:03:06+01:00
4549198 2011-08-25T04:32:25+01:00
2662975 2011-08-25T00:35:11+01:00
2333224 2011-08-25T01:43:18+01:00
3141371 2011-08-25T01:52:06+01:00
3482720 2011-08-25T07:18:24+01:00
1403682 2011-08-25T03:52:58+01:00
4679657 2011-08-25T01:07:48+01:00
32460 2011-08-25T01:11:43+01:00
$$P(D \mid p_{i,j}) = \prod_{i=1}^{n} \left(1 - \prod_{j=0}^{i-1} (1 - p_{u_j,u_i})\right) \prod_{u \notin \{u_1 \ldots u_n\}} \prod_{v \in \text{users}} (1 - p_{u,v})$$
• first product: for subsequent users in cascade (with $u_0 = 0$, so the $i = 1$ factor is $p_{0,u_1}$)
• second product: for users that are not in cascade
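To make the likelihood concrete, here is a hedged sketch that evaluates it in log space for one cascade. Several things are assumptions of the sketch rather than statements from the talk: node `0` is treated as an external source that can reach every user, missing edges have probability 0, and the second product is read as "every cascade member (and the source) failed to pass the item to each user outside the cascade". The names `safe_log` and `cascade_log_likelihood` are hypothetical.

```python
import math


def safe_log(x):
    """log(x), returning -inf for an impossible event instead of raising."""
    return math.log(x) if x > 0.0 else -math.inf


def cascade_log_likelihood(cascade, users, p, source=0):
    """log P(cascade | p) for one cascade given as user IDs in sharing order.

    p maps (i, j) to the probability that i passes an item on to j.
    """
    def prob(i, j):
        return p.get((i, j), 0.0)  # assumption: no edge means probability 0

    log_lik = 0.0
    # Users in the cascade: at least one earlier node (or the source) succeeded.
    for pos, u in enumerate(cascade):
        earlier = [source, *cascade[:pos]]
        all_failed = math.prod(1.0 - prob(w, u) for w in earlier)
        log_lik += safe_log(1.0 - all_failed)

    # Users not in the cascade: assumed reading, every attempt on them failed.
    members = set(cascade)
    for u in users:
        if u in members:
            continue
        for v in [source, *cascade]:
            log_lik += safe_log(1.0 - prob(v, u))
    return log_lik


users = ["a", "b", "c"]
p = {(0, "a"): 0.5, (0, "b"): 0.1, ("a", "b"): 0.5,
     ("a", "c"): 0.2, ("b", "c"): 0.5}
log_lik = cascade_log_likelihood(["a", "b"], users, p)
print(math.exp(log_lik))  # about 0.11 = 0.5 * 0.55 * 0.4
```

Working in log space keeps the product from underflowing once cascades contain more than a handful of users.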
Maximum likelihood at scale
• data too sparse to learn one parameter per edge
• large scale gradient-based optimisation is costly
• Solution: combine ensemble of heuristics with ML
  • use heuristics to compute probabilities at scale
  • use ML to tune parameters on small-scale data
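One way to picture this combination is a single mixing weight between two heuristic estimates, tuned by maximum likelihood on a handful of cascades. The talk does not say how the heuristics are actually combined, so everything below is an assumption made for illustration: the convex mixture, the grid search, reading "ML" as maximum likelihood, the source node `0`, and the names `mix` and `tune_weight`.

```python
import math


def log_likelihood(cascades, users, p, source=0):
    """Total log-likelihood of several cascades (independent-cascade reading)."""
    total = 0.0
    for cascade in cascades:
        members = set(cascade)
        for pos, u in enumerate(cascade):
            fail = math.prod(1.0 - p.get((w, u), 0.0)
                             for w in [source, *cascade[:pos]])
            total += math.log(max(1.0 - fail, 1e-12))  # clamp to avoid log(0)
        for u in users:
            if u not in members:
                for v in [source, *cascade]:
                    total += math.log(max(1.0 - p.get((v, u), 0.0), 1e-12))
    return total


def mix(h1, h2, w):
    """Convex combination w * h1 + (1 - w) * h2 of two edge-probability maps."""
    keys = set(h1) | set(h2)
    return {k: w * h1.get(k, 0.0) + (1.0 - w) * h2.get(k, 0.0) for k in keys}


def tune_weight(cascades, users, h1, h2, grid=None):
    """Pick the mixing weight with the highest likelihood on a small data set."""
    grid = grid or [k / 20 for k in range(21)]
    return max(grid, key=lambda w: log_likelihood(cascades, users, mix(h1, h2, w)))


users = ["a", "b"]
h1 = {(0, "a"): 0.9, (0, "b"): 0.05, ("a", "b"): 0.9}  # optimistic heuristic
h2 = {(0, "a"): 0.9, (0, "b"): 0.05, ("a", "b"): 0.1}  # pessimistic heuristic
cascades = [["a", "b"]] * 5 + [["a"]] * 5              # b follows a half the time
best_w = tune_weight(cascades, users, h1, h2)
print(best_w)
```

Only one parameter is fitted here, which is the point of the design: the per-edge values come from cheap heuristics, and the expensive likelihood is evaluated only on the small tuning set.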
Influence maximisation
• Select a set of users to maximise outreach
• Influence of people combines non-linearly
• In many models it combines sub-modularly
  $A \subseteq B \implies f(A \cup \{x\}) - f(A) \ge f(B \cup \{x\}) - f(B)$
• these functions are fun to optimise
• pops up many times in machine learning
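Submodular functions are pleasant to optimise because a simple greedy rule works well: for monotone submodular objectives under a size constraint, greedy selection is guaranteed to reach at least a $1 - 1/e$ fraction of the optimum. The sketch below uses a deterministic coverage function as a stand-in for expected outreach; the `reach` sets, `coverage` and `greedy_select` are hypothetical names, and a real influence objective would be estimated from the propagation probabilities instead.

```python
def coverage(seeds, reach):
    """f(S): number of distinct users reached by the seed set S."""
    covered = set()
    for s in seeds:
        covered |= reach[s]
    return len(covered)


def greedy_select(reach, k):
    """Repeatedly add the user with the largest marginal gain in coverage."""
    chosen = []
    for _ in range(k):
        base = coverage(chosen, reach)
        candidates = [u for u in reach if u not in chosen]
        if not candidates:
            break
        best = max(candidates,
                   key=lambda u: coverage(chosen + [u], reach) - base)
        chosen.append(best)
    return chosen


# Hypothetical audiences: which users each candidate influencer reaches.
reach = {
    "a": {1, 2, 3, 4},
    "b": {3, 4, 5},
    "c": {5, 6},
    "d": {1, 2},
}
seeds = greedy_select(reach, 2)
print(seeds, coverage(seeds, reach))  # ['a', 'c'] 6

# Diminishing returns: adding "b" helps the empty set more than it helps {"a"}.
gain_small = coverage(["b"], reach) - coverage([], reach)
gain_large = coverage(["a", "b"], reach) - coverage(["a"], reach)
print(gain_small, gain_large)  # 3 1
```

Note that greedy picks `c` second even though `b` has the larger audience, because most of `b`'s audience is already covered by `a`; this is the non-linear combination of influence the slide refers to.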
Wrap up
• two lines of ‘data’ products: PeerIndex, PeerPerks
• lots of ‘standard’ machine learning tasks
• some uniquely exciting problems
  • inferring propagation probabilities
  • computing the expected number of users one reaches
  • putting all aspects together into a single number, and visualising it
  • influence maximisation
Thanks
We’re hiring ML scientists, interns and engineers...
@fhuszar