Data By The People, For The People

Post on 09-May-2015


description

Data By The People, For The People
Daniel Tunkelang, Director, Data Science at LinkedIn
Invited Talk at the 21st ACM International Conference on Information and Knowledge Management (CIKM 2012)

LinkedIn has a unique data collection: the 175M+ members who use LinkedIn are also the content those same members access using our information retrieval products. LinkedIn members performed over 4 billion professionally-oriented searches in 2011, most of those to find and discover other people. Every LinkedIn search and recommendation is deeply personalized, reflecting the user's current employment, career history, and professional network. In this talk, I will describe some of the challenges and opportunities that arise from working with this unique corpus. I will discuss work we are doing in the areas of relevance, recommendation, and reputation, as well as the ecosystem we have developed to incent people to provide the high-quality semi-structured profiles that make LinkedIn so useful.

Bio: Daniel Tunkelang leads the data science team at LinkedIn, which analyzes terabytes of data to produce products and insights that serve LinkedIn's members. Prior to LinkedIn, Daniel led a local search quality team at Google. Daniel was a founding employee of faceted search pioneer Endeca (recently acquired by Oracle), where he spent ten years as Chief Scientist. He has authored fourteen patents, written a textbook on faceted search, created the annual workshop on human-computer interaction and information retrieval (HCIR), and participated in the premier research conferences on information retrieval, knowledge management, databases, and data mining (SIGIR, CIKM, SIGMOD, SIAM Data Mining). Daniel holds a PhD in Computer Science from CMU, as well as BS and MS degrees from MIT.

Transcript of Data By The People, For The People


Data By The People, For The People Daniel Tunkelang Director, Data Science LinkedIn


1

Why do 175M+ people use LinkedIn?

2

Identity: find and be found

3

Insights: discover and share knowledge

4

People use LinkedIn because of other people.

5

People as Users + People as Data

Unique opportunities and challenges!
§  Search
§  Recommendations
§  Networking

6

Search

7

People search is personal!

8

But not all relevance factors are personal.

9

Good Bad

People are semi-structured objects.

10

for i in [1..n]
    B[i] ← {}
    s ← w1 w2 … wi
    if Pc(s) > 0
        a ← new Segment()
        a.segs ← {s}
        a.prob ← Pc(s)
        B[i] ← {a}
    for j in [1..i-1]
        for b in B[j]
            s ← wj+1 wj+2 … wi
            if Pc(s) > 0
                a ← new Segment()
                a.segs ← b.segs U {s}
                a.prob ← b.prob * Pc(s)
                B[i] ← B[i] U {a}
    sort B[i] by prob
    truncate B[i] to size k
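The segmentation pseudocode on this slide can be sketched in Python as a small beam search. This is a minimal sketch, not LinkedIn's implementation: the `Segmentation` class, the `segment` function, and the probability table `PC` are hypothetical names and values invented for illustration, and `pc` stands in for the slide's Pc(s). The sketch assumes each extension starts at the word right after the prefix it extends, and it uses an empty prefix at index 0 so that "the whole prefix is one segment" needs no special case.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Segmentation:
    segs: Tuple[str, ...]  # segments chosen so far, in order
    prob: float            # product of the segment probabilities


def segment(words: List[str], pc: Dict[str, float], k: int = 3) -> List[Segmentation]:
    """Return up to k most probable segmentations of `words`."""
    n = len(words)
    # beams[i] holds the best segmentations of the first i words.
    beams: List[List[Segmentation]] = [[] for _ in range(n + 1)]
    beams[0] = [Segmentation((), 1.0)]  # empty prefix
    for i in range(1, n + 1):
        candidates = []
        for j in range(i):
            s = " ".join(words[j:i])  # candidate segment: words j+1..i (1-based)
            p = pc.get(s, 0.0)
            if p <= 0:
                continue  # unknown segment, cannot extend with it
            for b in beams[j]:
                candidates.append(Segmentation(b.segs + (s,), b.prob * p))
        # Keep only the k most probable segmentations of this prefix.
        candidates.sort(key=lambda a: a.prob, reverse=True)
        beams[i] = candidates[:k]
    return beams[n]


# Hypothetical probabilities, invented for illustration only.
PC = {
    "software": 0.2,
    "engineer": 0.2,
    "software engineer": 0.5,
    "google": 0.4,
}
best = segment("software engineer google".split(), PC, k=2)
for seg in best:
    print(seg.segs, round(seg.prob, 3))
```

With this toy table, the two-word phrase "software engineer" outranks splitting it into separate words, which is the behavior the beam is meant to surface.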

LinkedIn uses scale to derive structure.

11

Software Developer

Social network is more than a ranking signal.

12

People are a gateway to other entities.

13

Search: Summary

14

People finding people.

People being found.

People finding content.

Through other people.

Recommendations

15

Recommendation products at LinkedIn

16

Similar Profiles

Events You May Be Interested In

News

Network updates

Connections

LinkedIn’s recommender ecosystem

17

Recommendations drive:

>50% of connections
>50% of job applications
>50% of group joins

Inputs for recommender systems

18

Content Social Graph

Behavior

Page Views Actions

Queries

Jobs You Might Be Interested In

19

How LinkedIn matches people to jobs

20

[Diagram: matching a candidate against a job. Other diagram labels: Corpus Stats, User Base, Filtered.]

Job: title, geo, company, industry, description, functional area

Candidate
§  General: expertise, specialties, education, headline, geo, experience
§  Current Position: title, summary, tenure length, industry, functional area, …

Example feature values
§  Similarity (candidate expertise, job description): 0.56
§  Similarity (candidate specialties, job description): 0.2
§  Transition probability (candidate industry, job industry): 0.43
§  Title similarity: 0.8
§  Similarity (headline, title): 0.7
§  …

Matching
§  Binary: exact matches (geo, industry, …)
§  Soft: transition probabilities, similarity, …
§  Text

Derived: transition probabilities, connectivity, years of experience to reach title, education needed for this title, …
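The three kinds of matching features named on this slide (binary exact matches, soft text similarity, and transition probabilities) can be illustrated with a short Python sketch. Everything here is an assumption made for illustration: the slide does not say how similarity is computed, so bag-of-words cosine similarity is a stand-in, the transition probability is a plain relative frequency over hypothetical move counts, and the function names, field names, and sample data are invented. How the features are combined into a final score is not covered by the slide and is left out.

```python
import math
from collections import Counter
from typing import Dict, Tuple


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words term counts (an assumed measure)."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def transition_probability(counts: Dict[Tuple[str, str], int], src: str, dst: str) -> float:
    """Estimate P(dst | src) as a relative frequency of observed (src, dst) moves."""
    total = sum(c for (s, _), c in counts.items() if s == src)
    return counts.get((src, dst), 0) / total if total else 0.0


def match_features(candidate: dict, job: dict, industry_moves: Dict[Tuple[str, str], int]) -> Dict[str, float]:
    """Compute one binary, two text-similarity, and one transition feature."""
    return {
        "geo_exact": 1.0 if candidate["geo"] == job["geo"] else 0.0,
        "expertise_vs_description": cosine_similarity(candidate["expertise"], job["description"]),
        "headline_vs_title": cosine_similarity(candidate["headline"], job["title"]),
        "industry_transition": transition_probability(
            industry_moves, candidate["industry"], job["industry"]
        ),
    }


# Hypothetical sample data, invented for illustration only.
candidate = {
    "geo": "san francisco",
    "expertise": "search relevance machine learning",
    "headline": "data scientist",
    "industry": "internet",
}
job = {
    "geo": "san francisco",
    "title": "senior data scientist",
    "description": "machine learning for search",
    "industry": "software",
}
industry_moves = {("internet", "software"): 3, ("internet", "internet"): 6, ("internet", "finance"): 1}

features = match_features(candidate, job, industry_moves)
for name, value in features.items():
    print(f"{name}: {value:.2f}")
```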

Is job-hunting socially contagious?

21

[Posse, 2012]

Social referral

22

Suggest based on connection strength and relevance to target user.

2x conversion!

[Amin et al, 2012]

Suggested skill endorsements

23

Recommendations: Summary

24

Content is king.

Connections provide social dimension.

Context determines where and when a recommendation is appropriate.

Networking

25

People You May Know

26

Closing the triangles

§  Triads suggest and affect relationships. [Simmel, 1908], [Granovetter, 1973]

§  Triangle closing is a Big Data problem. [Shah, 2011]

§  Use machine learning to rank candidates.

27

Alice

Bob

Carol

?
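The triangle-closing idea on this slide (Alice knows Bob, Bob knows Carol, so should Alice know Carol?) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the slide says machine learning ranks the candidates, so the shared-connection count used here is only a simple stand-in for that ranker, and the `suggest_connections` function, the `GRAPH` data, and the extra members Dave and Erin are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def suggest_connections(graph: Dict[str, Set[str]], user: str) -> List[Tuple[str, int]]:
    """Rank members who would close a triangle with `user` by shared connections."""
    shared: Counter = Counter()
    for friend in graph.get(user, set()):
        for candidate in graph.get(friend, set()):
            # Skip the user and anyone already connected to the user.
            if candidate != user and candidate not in graph[user]:
                shared[candidate] += 1
    # Most shared connections first; ties broken by name for determinism.
    return sorted(shared.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical undirected graph, invented for illustration only.
GRAPH = {
    "Alice": {"Bob", "Dave"},
    "Bob": {"Alice", "Carol"},
    "Carol": {"Bob", "Dave"},
    "Dave": {"Alice", "Carol", "Erin"},
    "Erin": {"Dave"},
}
print(suggest_connections(GRAPH, "Alice"))  # [('Carol', 2), ('Erin', 1)]
```

Carol ranks first because she shares two connections with Alice (Bob and Dave), while Erin shares only one. At LinkedIn's scale the candidate generation step is what makes this a Big Data problem; the nested loop here only works for a toy graph held in memory.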

Shared connections as a signal

28

Power of social proof

29

More power of social proof

30

Networking: Summary

31

Close triangles to suggest connections.

Connections as social proof.

Unleash the power of weak ties.

Conclusion

§  People use LinkedIn because of other people.
§  Primary use cases:
    – Find and be found.
    – Discover and share knowledge.
§  People are at the heart of LinkedIn’s products:
    – Search
    – Recommendations
    – Networking

32

[Chart: LinkedIn Members (Millions), 2004 to 2011; plotted values 2, 4, 8, 17, 32, 55, 90]

§  175M+ members
§  25th most visited website worldwide (Comscore 6-12)
§  >2M company pages
§  62% non-U.S.
§  2/sec
§  85% of Fortune 500 companies use LinkedIn to hire

Thank You!

33

We’re Hiring!

Learn more at http://data.linkedin.com/