Scale, Structure, and Semantics
![Page 1: Scale, Structure, and Semantics](https://reader038.fdocuments.in/reader038/viewer/2022103016/554d8f71b4c9053e0c8b56f0/html5/thumbnails/1.jpg)
Recruiting Solutions
Scale, Structure, and Semantics
Daniel Tunkelang
Principal Data Scientist at LinkedIn
![Page 2: Scale, Structure, and Semantics](https://reader038.fdocuments.in/reader038/viewer/2022103016/554d8f71b4c9053e0c8b56f0/html5/thumbnails/2.jpg)
Take-Aways
Communication trumps knowledge representation.
Communication is the problem and the solution.
![Page 3: Scale, Structure, and Semantics](https://reader038.fdocuments.in/reader038/viewer/2022103016/554d8f71b4c9053e0c8b56f0/html5/thumbnails/3.jpg)
Overview
1. Knowledge representation is overrated.
2. Computation is underrated.
3. We have a communication problem.
![Page 4: Scale, Structure, and Semantics](https://reader038.fdocuments.in/reader038/viewer/2022103016/554d8f71b4c9053e0c8b56f0/html5/thumbnails/4.jpg)
The Bad News
1. Knowledge representation is overrated.
2. Computation is underrated.
3. We have a communication problem.
![Page 5: Scale, Structure, and Semantics](https://reader038.fdocuments.in/reader038/viewer/2022103016/554d8f71b4c9053e0c8b56f0/html5/thumbnails/5.jpg)
AI: a dream deferred.
![Page 6: Scale, Structure, and Semantics](https://reader038.fdocuments.in/reader038/viewer/2022103016/554d8f71b4c9053e0c8b56f0/html5/thumbnails/6.jpg)
Memex: the Computer Science Version
![Page 7: Scale, Structure, and Semantics](https://reader038.fdocuments.in/reader038/viewer/2022103016/554d8f71b4c9053e0c8b56f0/html5/thumbnails/7.jpg)
Cyc
![Page 8: Scale, Structure, and Semantics](https://reader038.fdocuments.in/reader038/viewer/2022103016/554d8f71b4c9053e0c8b56f0/html5/thumbnails/8.jpg)
Freebase
![Page 9: Scale, Structure, and Semantics](https://reader038.fdocuments.in/reader038/viewer/2022103016/554d8f71b4c9053e0c8b56f0/html5/thumbnails/9.jpg)
Wolfram Alpha
![Page 10: Scale, Structure, and Semantics](https://reader038.fdocuments.in/reader038/viewer/2022103016/554d8f71b4c9053e0c8b56f0/html5/thumbnails/10.jpg)
Knowledge representation is overrated.
Today’s knowledge repositories are:
§ incomplete
§ inconsistent
§ inscrutable
§ and not sustained by economic incentives.
1986 estimate of effort to complete Cyc:
§ 250,000 rules + 350 person-years
![Page 11: Scale, Structure, and Semantics](https://reader038.fdocuments.in/reader038/viewer/2022103016/554d8f71b4c9053e0c8b56f0/html5/thumbnails/11.jpg)
The Good News
1. Knowledge representation is overrated.
2. Computation is underrated.
3. We have a communication problem.
![Page 12: Scale, Structure, and Semantics](https://reader038.fdocuments.in/reader038/viewer/2022103016/554d8f71b4c9053e0c8b56f0/html5/thumbnails/12.jpg)
Deep Blue
vs.
![Page 13: Scale, Structure, and Semantics](https://reader038.fdocuments.in/reader038/viewer/2022103016/554d8f71b4c9053e0c8b56f0/html5/thumbnails/13.jpg)
Watson
![Page 14: Scale, Structure, and Semantics](https://reader038.fdocuments.in/reader038/viewer/2022103016/554d8f71b4c9053e0c8b56f0/html5/thumbnails/14.jpg)
Plain Old Search Engines are Pretty Good Too
http://blog.stephenwolfram.com/2011/01/jeopardy-ibm-and-wolframalpha/
![Page 15: Scale, Structure, and Semantics](https://reader038.fdocuments.in/reader038/viewer/2022103016/554d8f71b4c9053e0c8b56f0/html5/thumbnails/15.jpg)
The Unreasonable Effectiveness of Data
§ simple models + lots of data >> elaborate models + less data
§ machine translation: parallel corpora >> elaborate rules for syntactic and semantic patterns
§ semantic web formalism just means semantic interpretation on shorter strings between angle brackets
Alon Halevy, Peter Norvig, and Fernando Pereira (2009)
![Page 16: Scale, Structure, and Semantics](https://reader038.fdocuments.in/reader038/viewer/2022103016/554d8f71b4c9053e0c8b56f0/html5/thumbnails/16.jpg)
Today’s Challenge
1. Knowledge representation is overrated.
2. Computation is underrated.
3. We have a communication problem.
![Page 17: Scale, Structure, and Semantics](https://reader038.fdocuments.in/reader038/viewer/2022103016/554d8f71b4c9053e0c8b56f0/html5/thumbnails/17.jpg)
Semi-structured Data
Michael K. Bergman, http://www.mkbergman.com/
![Page 18: Scale, Structure, and Semantics](https://reader038.fdocuments.in/reader038/viewer/2022103016/554d8f71b4c9053e0c8b56f0/html5/thumbnails/18.jpg)
Semi-structured Data at LinkedIn
<person>
  <id>
  <first-name />
  <last-name />
  <location>
    <name>
    <country> <code> </country>
  </location>
  <industry>
  …
</person>
Summary
I lead a data science team at LinkedIn, which analyzes terabytes of data to produce products and insights that serve LinkedIn’s members. Prior to LinkedIn, I led a local search quality team at Google and was a founding employee of faceted search pioneer Endeca (acquired by Oracle in 2010), where…
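A profile record like the one above mixes structured fields (location, industry) with free text (the summary). The following is a minimal sketch of how such a record could be split into the two kinds of data. It assumes a well-formed version of the fragment: the closing tags, the `<summary>` element, the sample values, and the `parse_profile` helper are all hypothetical and are not taken from LinkedIn's actual schema or API.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed version of the profile fragment.
# All element values below are invented for illustration.
PROFILE_XML = """
<person>
  <id>abc123</id>
  <first-name>Ada</first-name>
  <last-name>Example</last-name>
  <location>
    <name>San Francisco Bay Area</name>
    <country><code>us</code></country>
  </location>
  <industry>Internet</industry>
  <summary>I lead a data science team that analyzes terabytes of data.</summary>
</person>
"""


def parse_profile(xml_text):
    """Split a profile into structured fields and free text."""
    root = ET.fromstring(xml_text)
    structured = {
        "id": root.findtext("id"),
        "first_name": root.findtext("first-name"),
        "last_name": root.findtext("last-name"),
        "location": root.findtext("location/name"),
        "country": root.findtext("location/country/code"),
        "industry": root.findtext("industry"),
    }
    # The summary is unstructured prose; keep it separate for text search.
    free_text = root.findtext("summary", default="")
    return structured, free_text


structured, free_text = parse_profile(PROFILE_XML)
```

The structured fields could serve as exact-match filters or facets, while the free text would be indexed for keyword search.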
![Page 19: Scale, Structure, and Semantics](https://reader038.fdocuments.in/reader038/viewer/2022103016/554d8f71b4c9053e0c8b56f0/html5/thumbnails/19.jpg)
Semi-structured Search is a Killer App
![Page 20: Scale, Structure, and Semantics](https://reader038.fdocuments.in/reader038/viewer/2022103016/554d8f71b4c9053e0c8b56f0/html5/thumbnails/20.jpg)
Another Example: Helping a Friend
Dear Daniel, I'm attaching the resume of an old friend who just moved up to the Bay Area.
He has a very strong background in:
§ mobile / wireless applications
§ start-ups and new product launches
§ international expansion
Best regards, XXX
![Page 21: Scale, Structure, and Semantics](https://reader038.fdocuments.in/reader038/viewer/2022103016/554d8f71b4c9053e0c8b56f0/html5/thumbnails/21.jpg)
Company Search
![Page 22: Scale, Structure, and Semantics](https://reader038.fdocuments.in/reader038/viewer/2022103016/554d8f71b4c9053e0c8b56f0/html5/thumbnails/22.jpg)
Semi-structured Data Empowers Users
![Page 23: Scale, Structure, and Semantics](https://reader038.fdocuments.in/reader038/viewer/2022103016/554d8f71b4c9053e0c8b56f0/html5/thumbnails/23.jpg)
Data-Driven Recommendations
![Page 24: Scale, Structure, and Semantics](https://reader038.fdocuments.in/reader038/viewer/2022103016/554d8f71b4c9053e0c8b56f0/html5/thumbnails/24.jpg)
Data-Driven Computation Serves Communication
for i in [1..n]
  s ← w1 w2 … wi
  if Pc(s) > 0
    a ← new Segment()
    a.segs ← {s}
    a.prob ← Pc(s)
    B[i] ← {a}
  for j in [1..i-1]
    for b in B[j]
      s ← wj wj+1 … wi
      if Pc(s) > 0
        a ← new Segment()
        a.segs ← b.segs ∪ {s}
        a.prob ← b.prob * Pc(s)
        B[i] ← B[i] ∪ {a}
  sort B[i] by prob
  truncate B[i] to size k
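The pseudocode above reads as a beam search over segmentations of a word sequence. What follows is a runnable sketch of that idea, not the original implementation. It makes two assumptions that the slide does not spell out: each beam starts empty, and the new segment begins at the word after position j, so that it does not overlap the segmentation it extends. The `Segmentation` class, the `segment` function, and the toy probabilities are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segmentation:
    segs: tuple   # segments chosen so far, in order
    prob: float   # product of the segment probabilities


def segment(words, pc, k=3):
    """Return up to k best segmentations of `words`.

    pc(s) is a hypothetical scoring function giving the probability
    of phrase s as a single segment (0 for unknown phrases).
    """
    n = len(words)
    # beams[i] holds the best segmentations of the first i words.
    beams = [[] for _ in range(n + 1)]
    for i in range(1, n + 1):
        candidates = []
        # Case 1: the first i words form one segment.
        s = " ".join(words[:i])
        if pc(s) > 0:
            candidates.append(Segmentation((s,), pc(s)))
        # Case 2: extend a segmentation of the first j words
        # with a new segment covering words j+1..i.
        for j in range(1, i):
            s = " ".join(words[j:i])
            p = pc(s)
            if p > 0:
                for b in beams[j]:
                    candidates.append(Segmentation(b.segs + (s,), b.prob * p))
        # Keep only the k most probable candidates (the beam).
        candidates.sort(key=lambda a: a.prob, reverse=True)
        beams[i] = candidates[:k]
    return beams[n]


# Toy segment probabilities, invented for illustration.
TOY_PC = {
    "new": 0.1, "york": 0.05, "times": 0.1, "square": 0.1,
    "new york": 0.3, "times square": 0.3, "new york times": 0.2,
}

best = segment("new york times square".split(), lambda s: TOY_PC.get(s, 0.0))
```

With these toy numbers the top-ranked result is the two-segment reading "new york" + "times square". The beam width `k` trades completeness for speed: a larger beam keeps more partial segmentations alive at each position.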
![Page 25: Scale, Structure, and Semantics](https://reader038.fdocuments.in/reader038/viewer/2022103016/554d8f71b4c9053e0c8b56f0/html5/thumbnails/25.jpg)
Recommendations Leverage Semi-structured Data
[Diagram: features for matching a candidate to a job]
Diagram labels: Corpus Stats, User Base, Filtered
Job
§ title, geo, company, industry, description, functional area, …
Candidate
§ General: expertise, specialties, education, headline, geo, experience
§ Current Position: title, summary, tenure length, industry, functional area, …
Example features
§ Similarity (candidate expertise, job description): 0.56
§ Similarity (candidate specialties, job description): 0.2
§ Transition probability (candidate industry, job industry): 0.43
§ Title Similarity: 0.8
§ Similarity (headline, title): 0.7
§ …
Matching
§ Binary: exact matches (geo, industry, …)
§ Soft: transition probabilities, similarity, …
§ Text
Derived
§ Transition probabilities
§ Connectivity
§ yrs of experience to reach title
§ education needed for this title
§ …
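The feature types named in the diagram (binary exact matches, soft transition probabilities, text similarity) can be illustrated with a small sketch. This is an assumption-laden example, not LinkedIn's method: the slide does not say which similarity measure is used or how features are combined, so the sketch uses plain bag-of-words cosine similarity and stops at building the feature dictionary. The function names, field names, and sample data are all hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity between bag-of-words term-count vectors."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def match_features(candidate, job, industry_transitions):
    """Build a feature dict mixing binary, soft, and text signals."""
    return {
        # Binary: exact match on a structured field.
        "geo_exact": 1.0 if candidate["geo"] == job["geo"] else 0.0,
        # Soft: looked up from a (hypothetical) precomputed transition table.
        "industry_transition": industry_transitions.get(
            (candidate["industry"], job["industry"]), 0.0),
        # Text: similarity between free-text fields.
        "expertise_vs_description": cosine_similarity(
            candidate["expertise"], job["description"]),
        "headline_vs_title": cosine_similarity(
            candidate["headline"], job["title"]),
    }


# Invented sample data for illustration only.
candidate = {
    "geo": "bay-area",
    "industry": "Internet",
    "expertise": "search machine learning",
    "headline": "data scientist",
}
job = {
    "geo": "bay-area",
    "industry": "Software",
    "title": "senior data scientist",
    "description": "machine learning for search ranking",
}
transitions = {("Internet", "Software"): 0.43}

features = match_features(candidate, job, transitions)
```

A ranking model would then consume a dictionary like `features`; the choice of model is outside what the slide describes.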
![Page 26: Scale, Structure, and Semantics](https://reader038.fdocuments.in/reader038/viewer/2022103016/554d8f71b4c9053e0c8b56f0/html5/thumbnails/26.jpg)
Skills: A Practical Knowledge Representation
![Page 27: Scale, Structure, and Semantics](https://reader038.fdocuments.in/reader038/viewer/2022103016/554d8f71b4c9053e0c8b56f0/html5/thumbnails/27.jpg)
Data-Driven Query Expansion for Recall
![Page 28: Scale, Structure, and Semantics](https://reader038.fdocuments.in/reader038/viewer/2022103016/554d8f71b4c9053e0c8b56f0/html5/thumbnails/28.jpg)
Data-Driven Query Refinement for Precision
![Page 29: Scale, Structure, and Semantics](https://reader038.fdocuments.in/reader038/viewer/2022103016/554d8f71b4c9053e0c8b56f0/html5/thumbnails/29.jpg)
There is no perfect schema or vocabulary.
§ And even if there were, not everyone would use it.
§ Knowledge representation has only succeeded within narrow scope.
§ Brute force is surprisingly effective but does not leverage the user as an intelligent partner.
![Page 30: Scale, Structure, and Semantics](https://reader038.fdocuments.in/reader038/viewer/2022103016/554d8f71b4c9053e0c8b56f0/html5/thumbnails/30.jpg)
Communication is the problem and the solution.
§ Rich communication channel fills gaps in system’s knowledge representation and in user’s knowledge.
§ Use data science to make the system smart, but be humble and empower the human user.
You've got the brawn
I've got the brains
Let's make lots of money
Pet Shop Boys, “Opportunities”
![Page 31: Scale, Structure, and Semantics](https://reader038.fdocuments.in/reader038/viewer/2022103016/554d8f71b4c9053e0c8b56f0/html5/thumbnails/31.jpg)
The Future is Upon Us
![Page 32: Scale, Structure, and Semantics](https://reader038.fdocuments.in/reader038/viewer/2022103016/554d8f71b4c9053e0c8b56f0/html5/thumbnails/32.jpg)
One More Thing
“More data beats clever algorithms but better data beats more data.”
Monica Rogati @ Strata 2012