Machine Learning at PeerIndex

Machine Learning at PeerIndex Ferenc Huszár @fhuszar Wednesday, 16 May 12

description

Slides for a talk given at the London Machine Learning Meetup on 29 Feb about the machine learning behind measuring people's influence at PeerIndex.

Transcript of Machine Learning at PeerIndex

Page 1: Machine Learning at PeerIndex

Machine Learning at PeerIndex

Ferenc Huszár

@fhuszar

Page 2: Machine Learning at PeerIndex

PeerIndex.com: understand your influence

Page 3: Machine Learning at PeerIndex

PeerPerks.com: free stuff for influencers

Page 4: Machine Learning at PeerIndex

PeerPerks: free stuff for influencers

Page 14: Machine Learning at PeerIndex

Machine Learning @ PeerIndex

• The usual stuff

    • topic modelling/classification of tweets/statuses/URLs

    • identity resolution across Twitter, Facebook, LinkedIn

    • spambot/fraud detection: identify people gaming the system

    • sentiment classification: happy/sad/neutral

• The really exciting stuff

    • inferring networks of influence - more about this later

    • visualise different aspects of influence, in an engaging way

    • influence maximisation - submodular optimisation

Page 18: Machine Learning at PeerIndex

Inferring networks of influence

Social network

Propagation probabilities $p_{i,j}$

Information cascade logs: http://www.pcworld.com/article/239719 http://techcrunch.com/2011/11/21/...

1079306  2011-08-25T00:03:06+01:00
4549198  2011-08-25T04:32:25+01:00
2662975  2011-08-25T00:35:11+01:00
2333224  2011-08-25T01:43:18+01:00
3141371  2011-08-25T01:52:06+01:00
3482720  2011-08-25T07:18:24+01:00
1403682  2011-08-25T03:52:58+01:00
4679657  2011-08-25T01:07:48+01:00
32460    2011-08-25T01:11:43+01:00

259725   2011-10-24T03:32:19+01:00
76539    2011-10-24T03:32:23+01:00
1922351  2011-10-24T04:28:47+01:00
9183     2011-10-24T03:30:57+01:00
3335398  2011-10-24T03:34:01+01:00
1616885  2011-10-24T03:48:16+01:00
82198    2011-10-24T03:48:29+01:00
906390   2011-10-24T23:13:51+01:00
1051322  2011-10-24T03:40:02+01:00
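As a rough illustration of how such a log could be turned into an ordered cascade, here is a minimal Python sketch. It assumes one share event per line, written as a user ID followed by an ISO 8601 timestamp; the function name `parse_cascade` and the one-event-per-line layout are assumptions, not something the slides specify.

```python
from datetime import datetime

def parse_cascade(lines):
    """Parse 'user_id timestamp' lines and return events sorted by share time."""
    events = []
    for line in lines:
        if not line.strip():
            continue  # skip blank separator lines
        user_id, stamp = line.split()
        events.append((int(user_id), datetime.fromisoformat(stamp)))
    events.sort(key=lambda event: event[1])
    return events

# Three entries taken from the first log on the slide.
sample = """\
1079306 2011-08-25T00:03:06+01:00
4549198 2011-08-25T04:32:25+01:00
2662975 2011-08-25T00:35:11+01:00
""".splitlines()

cascade = [user for user, _ in parse_cascade(sample)]
```

Sorting by timestamp gives the order $u_1, u_2, \ldots$ in which users shared the URL, which is the form a cascade likelihood needs.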

Page 23: Machine Learning at PeerIndex

Heuristic approaches to estimate $p_{i,j}$

• purely based on local network structure: $p_{i,j} = \frac{1}{d_{\mathrm{in}}(j)}$

• trivalency “model” (my personal favourite :)): $p_{i,j} \in \{0.1, 0.01, 0.001\}$ randomly

• data-driven heuristics: $p_{i,j} = \frac{\text{number of items shared by } j \text{ after } i \text{ shared it}}{\text{number of items shared by } i}$
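The three heuristics on this slide can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, an edge `(i, j)` is assumed to mean "i can influence j", edges are assumed to be unique, a cascade is assumed to be a list of user IDs in share order with no repeats, and the trivalency values default to the commonly used set {0.1, 0.01, 0.001}.

```python
import random
from collections import defaultdict

def weighted_cascade(edges):
    """Local structure only: p[i, j] = 1 / in-degree of j."""
    in_degree = defaultdict(int)
    for _, j in edges:
        in_degree[j] += 1
    return {(i, j): 1.0 / in_degree[j] for i, j in edges}

def trivalency(edges, values=(0.1, 0.01, 0.001), seed=0):
    """Assign each edge one of three fixed values at random."""
    rng = random.Random(seed)
    return {edge: rng.choice(values) for edge in edges}

def data_driven(edges, cascades):
    """p[i, j] = (#items j shared after i shared them) / (#items i shared)."""
    edge_set = set(edges)
    shared_by = defaultdict(int)
    followed_up = defaultdict(int)
    for cascade in cascades:
        for pos, i in enumerate(cascade):
            shared_by[i] += 1
            for j in cascade[pos + 1:]:
                if (i, j) in edge_set:
                    followed_up[(i, j)] += 1
    return {
        (i, j): followed_up[(i, j)] / shared_by[i] if shared_by[i] else 0.0
        for i, j in edges
    }

edges = [("a", "c"), ("b", "c"), ("a", "b")]
cascades = [["a", "b", "c"], ["a", "c"], ["b"]]
wc = weighted_cascade(edges)
tv = trivalency(edges)
dd = data_driven(edges, cascades)
```

None of these looks at the joint probability of a whole cascade; they are cheap per-edge estimates.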

How do you solve this with machine learning?

Page 25: Machine Learning at PeerIndex

The likelihood

$P(D \mid \theta)$

Page 32: Machine Learning at PeerIndex

The likelihood

$P(D \mid p_{i,j})$: what’s the probability of the cascade $u_1, u_2, u_3, \ldots, u_n$?

D: http://www.pcworld.com/article/239719

1079306  2011-08-25T00:03:06+01:00
4549198  2011-08-25T04:32:25+01:00
2662975  2011-08-25T00:35:11+01:00
2333224  2011-08-25T01:43:18+01:00
3141371  2011-08-25T01:52:06+01:00
3482720  2011-08-25T07:18:24+01:00
1403682  2011-08-25T03:52:58+01:00
4679657  2011-08-25T01:07:48+01:00
32460    2011-08-25T01:11:43+01:00

for subsequent users in cascade:

$p_{0,u_1} \, \bigl(1 - (1 - p_{0,u_2})(1 - p_{u_1,u_2})\bigr) \cdots$

Page 35: Machine Learning at PeerIndex

The likelihood

$P(D \mid p_{i,j})$: what’s the probability of the cascade $u_1, u_2, u_3, \ldots, u_n$?

D: http://www.pcworld.com/article/239719

1079306  2011-08-25T00:03:06+01:00
4549198  2011-08-25T04:32:25+01:00
2662975  2011-08-25T00:35:11+01:00
2333224  2011-08-25T01:43:18+01:00
3141371  2011-08-25T01:52:06+01:00
3482720  2011-08-25T07:18:24+01:00
1403682  2011-08-25T03:52:58+01:00
4679657  2011-08-25T01:07:48+01:00
32460    2011-08-25T01:11:43+01:00

$$P(D \mid p_{i,j}) = \underbrace{\prod_{i=1}^{n} \left(1 - \prod_{j=0}^{i-1} (1 - p_{u_j,u_i})\right)}_{\text{for subsequent users in cascade}} \; \underbrace{\prod_{u \notin \{u_1 \ldots u_n\}} \prod_{v \in \text{users}} (1 - p_{u,v})}_{\text{for users that are not in cascade}}$$

Here $u_0 = 0$ is the node 0 that appears in the expanded factors $p_{0,u_1}$ and $p_{0,u_2}$.
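A minimal Python sketch of this cascade likelihood, computed in log space, may make the two factors concrete. It rests on several assumptions that are not stated on the slide: node 0 acts as an implicit source (as in the expanded factor $p_{0,u_1}$), edges missing from the dictionary have probability 0, and the second factor is read as "nobody in the cascade managed to infect any user outside it". The function name and the `{(i, j): probability}` layout are hypothetical.

```python
import math

def cascade_log_likelihood(cascade, users, p, source=0):
    """Log-probability of one observed cascade under per-edge probabilities p."""
    def prob(i, j):
        return p.get((i, j), 0.0)  # assumption: absent edge means p = 0

    def safe_log(x):
        return math.log(x) if x > 0.0 else float("-inf")

    chain = [source] + list(cascade)
    total = 0.0

    # Users in the cascade: at least one earlier node must have succeeded.
    for pos in range(1, len(chain)):
        target = chain[pos]
        all_failed = math.prod(1.0 - prob(u, target) for u in chain[:pos])
        total += safe_log(1.0 - all_failed)

    # Users not in the cascade: every attempt on them must have failed.
    members = set(chain)
    for v in users:
        if v in members:
            continue
        for u in chain:
            total += safe_log(1.0 - prob(u, v))
    return total

p = {(0, "a"): 0.5, (0, "b"): 0.1, ("a", "b"): 0.2, ("a", "c"): 0.3}
log_lik = cascade_log_likelihood(["a", "b"], ["a", "b", "c"], p)
```

For this toy input the factors are 0.5 for a, 1 - 0.9 * 0.8 = 0.28 for b, and 0.7 for the non-sharing user c, so the likelihood is 0.098.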

Page 41: Machine Learning at PeerIndex

Maximum likelihood at scale

• data too sparse to learn one parameter per edge

• large-scale gradient-based optimisation is costly

• Solution: combine ensemble of heuristics with ML

    • use heuristics to compute probabilities at scale

    • use ML to tune parameters on small-scale data
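The slides do not say how the heuristics are combined, so the following Python sketch shows only one plausible reading: blend two heuristic estimates with a single weight and pick that weight by maximum likelihood on a small labelled sample. The observation format (two heuristic estimates plus whether the user actually shared), the grid search, and all names are assumptions made for illustration.

```python
import math

def blend_log_likelihood(w, observations, eps=1e-9):
    """Bernoulli log-likelihood of share outcomes under a blended estimate."""
    total = 0.0
    for est_a, est_b, shared in observations:
        p = w * est_a + (1.0 - w) * est_b
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total += math.log(p) if shared else math.log(1.0 - p)
    return total

def tune_blend_weight(observations, steps=100):
    """Grid search over w in [0, 1]; cheap because only one parameter is tuned."""
    grid = [k / steps for k in range(steps + 1)]
    return max(grid, key=lambda w: blend_log_likelihood(w, observations))

# Hypothetical sample: heuristic A tracks the outcomes, heuristic B does not.
observations = [
    (0.8, 0.1, True),
    (0.7, 0.2, True),
    (0.1, 0.6, False),
    (0.2, 0.5, False),
]
best_w = tune_blend_weight(observations)
```

The point of the design is that the heuristics run over the whole graph, while the likelihood is only evaluated on a small sample to fit a handful of parameters.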

Page 48: Machine Learning at PeerIndex

Influence maximisation

• Select a set of users to maximise outreach

• Influence of people combines non-linearly

• In many models it combines sub-modularly

$A \subseteq B \implies f(A \cup \{x\}) - f(A) \ge f(B \cup \{x\}) - f(B)$

• these functions are fun to optimise

• pops up many times in machine learning
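To illustrate the diminishing-returns property, here is a small Python sketch of greedy seed selection on a coverage function, a standard example of a monotone submodular function. It is not PeerIndex's method: the `reach` sets and the user names are made up, and real influence functions would be estimated from a propagation model rather than given as fixed sets.

```python
def coverage(seed_set, reach):
    """f(S): number of distinct users reached by at least one seed in S."""
    covered = set()
    for user in seed_set:
        covered |= reach[user]
    return len(covered)

def greedy_select(reach, budget):
    """Repeatedly add the user with the largest marginal gain in coverage."""
    chosen = []
    for _ in range(budget):
        candidates = [u for u in reach if u not in chosen]
        if not candidates:
            break
        base = coverage(chosen, reach)
        best = max(candidates, key=lambda u: coverage(chosen + [u], reach) - base)
        chosen.append(best)
    return chosen

# Hypothetical audiences for four candidate influencers.
reach = {
    "ann": {1, 2, 3, 4},
    "bob": {3, 4, 5},
    "cat": {6, 7},
    "dan": {1, 2},
}
seeds = greedy_select(reach, 2)

# Diminishing returns: bob adds less once ann is already selected.
gain_small = coverage(["bob"], reach) - coverage([], reach)
gain_large = coverage(["ann", "bob"], reach) - coverage(["ann"], reach)
```

For monotone submodular functions under a cardinality constraint, this greedy rule is known to reach at least a $1 - 1/e$ fraction of the optimal value, which is one reason these functions are pleasant to optimise.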

Page 56: Machine Learning at PeerIndex

Wrap up

• two lines of ‘data’ products: PeerIndex, PeerPerks

• lots of ‘standard’ machine learning tasks

• some uniquely exciting problems

    • inferring propagation probabilities

    • computing the expected number of users one reaches

    • putting all aspects together into a single number, and visualising it

    • influence maximisation
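One common way to estimate the expected number of users reached, given propagation probabilities, is Monte Carlo simulation of an independent cascade. The Python sketch below assumes that model (each newly active user gets one chance to activate each inactive neighbour); the slides do not state which estimator PeerIndex used, and the function name and dictionary layout are hypothetical.

```python
import random
from collections import defaultdict

def expected_reach(seeds, p, runs=1000, seed=0):
    """Average number of active users (seeds included) over simulated cascades."""
    rng = random.Random(seed)
    out_edges = defaultdict(list)
    for (i, j), prob in p.items():
        out_edges[i].append((j, prob))

    total = 0
    for _ in range(runs):
        active = set(seeds)
        frontier = list(seeds)
        while frontier:
            i = frontier.pop()
            # Each active user tries once to activate each inactive neighbour.
            for j, prob in out_edges[i]:
                if j not in active and rng.random() < prob:
                    active.add(j)
                    frontier.append(j)
        total += len(active)
    return total / runs

# Deterministic toy graph: a -> b -> c always fire, a -> d never does.
p = {("a", "b"): 1.0, ("b", "c"): 1.0, ("a", "d"): 0.0}
reach_a = expected_reach(["a"], p, runs=50)
```

With probabilities strictly between 0 and 1 the estimate is noisy, so the number of runs trades accuracy against cost.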

Page 57: Machine Learning at PeerIndex

Thanks

We’re hiring ML scientists, interns and engineers...

[email protected]

@fhuszar
