OSS 2013 - Real World Facets with Entity Resolution by Benson Margulies
![Page 1: OSS 2013 - Real World Facets with Entity Resolution by Benson Margulies](https://reader033.fdocuments.in/reader033/viewer/2022052623/55999af11a28ab640d8b46bf/html5/thumbnails/1.jpg)
Real World Facets with Entity Resolution
Benson Margulies CTO
Basis Technology
![Page 2: OSS 2013 - Real World Facets with Entity Resolution by Benson Margulies](https://reader033.fdocuments.in/reader033/viewer/2022052623/55999af11a28ab640d8b46bf/html5/thumbnails/2.jpg)
The analyst’s dilemma
• Andy’s job is to analyze interactions between European countries and Syria.
• In particular, Andy wants to find unusual events on the Internet that he can report to his boss.
Government Analyst Andy
![Page 3: OSS 2013 - Real World Facets with Entity Resolution by Benson Margulies](https://reader033.fdocuments.in/reader033/viewer/2022052623/55999af11a28ab640d8b46bf/html5/thumbnails/3.jpg)
Haystack of raw data
• The problem? Searching multiple messy datasets with both structured and unstructured data.
• A common solution? Use complex, string-based queries to try to “structurize” messy data.
![Page 4: OSS 2013 - Real World Facets with Entity Resolution by Benson Margulies](https://reader033.fdocuments.in/reader033/viewer/2022052623/55999af11a28ab640d8b46bf/html5/thumbnails/4.jpg)
Analyst as Google keyword search pro
![Page 5: OSS 2013 - Real World Facets with Entity Resolution by Benson Margulies](https://reader033.fdocuments.in/reader033/viewer/2022052623/55999af11a28ab640d8b46bf/html5/thumbnails/5.jpg)
The limits of keyword search
The limitation of Google is that it is engaged with strings, not things.
![Page 6: OSS 2013 - Real World Facets with Entity Resolution by Benson Margulies](https://reader033.fdocuments.in/reader033/viewer/2022052623/55999af11a28ab640d8b46bf/html5/thumbnails/6.jpg)
What is Andy really looking for?
Co-occurrence
People Locations Organizations Date & Time Language
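As a rough illustration of co-occurrence, the sketch below counts how often pairs of entities appear in the same document. The document IDs and the `type:name` entity labels are invented for the example; they are not from the talk.

```python
from collections import Counter
from itertools import combinations

# Hypothetical documents, each reduced to the set of entities it mentions.
DOC_ENTITIES = {
    "d1": {"per:laurent_fabius", "loc:syria", "org:un"},
    "d2": {"per:laurent_fabius", "loc:syria"},
    "d3": {"loc:syria", "org:un"},
}


def cooccurrence_counts(doc_entities):
    """Count, for every unordered entity pair, the documents containing both."""
    counts = Counter()
    for entities in doc_entities.values():
        # Sorting gives each pair one canonical order, so (a, b) == (b, a).
        for pair in combinations(sorted(entities), 2):
            counts[pair] += 1
    return counts


pair_counts = cooccurrence_counts(DOC_ENTITIES)
```

Pairs with unusually high or low counts are candidates for the "unusual events" Andy is looking for.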
![Page 7: OSS 2013 - Real World Facets with Entity Resolution by Benson Margulies](https://reader033.fdocuments.in/reader033/viewer/2022052623/55999af11a28ab640d8b46bf/html5/thumbnails/7.jpg)
Entity resolution can help Andy…
• Find references to REAL THINGS (people, places…)
• Know that all mentions of SYRIA are one entity
• Reference a master dataset to resolve entities, connecting names to knowledge sources
• Ultimately, Andy can spend more time reading the right documents
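The idea that every mention of Syria resolves to one entity can be sketched with a simple alias lookup against a master dataset. This is a minimal sketch, not the method used in the talk: the entity IDs, the alias lists, and the `resolve` helper are all assumptions made for illustration.

```python
# Hypothetical master dataset: entity ID -> canonical name and known aliases.
KNOWLEDGE_BASE = {
    "E1": {"name": "Syria", "aliases": ["Syria", "Syrian Arab Republic", "Syrie"]},
    "E2": {"name": "Laurent Fabius", "aliases": ["Laurent Fabius", "Fabius"]},
}


def build_alias_index(kb):
    """Map each case-folded alias to the ID of the entity it names."""
    index = {}
    for entity_id, record in kb.items():
        for alias in record["aliases"]:
            index[alias.casefold()] = entity_id
    return index


def resolve(mention, index):
    """Return the entity ID for a mention, or None if it is unknown."""
    return index.get(mention.strip().casefold())


ALIAS_INDEX = build_alias_index(KNOWLEDGE_BASE)
mentions = ["Syria", "SYRIAN ARAB REPUBLIC", "Syrie", "Fabius", "Atlantis"]
resolved = {m: resolve(m, ALIAS_INDEX) for m in mentions}
```

An exact alias lookup like this cannot handle misspellings or two people who share a name, which is where the difficulties discussed later in the deck come in.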
![Page 8: OSS 2013 - Real World Facets with Entity Resolution by Benson Margulies](https://reader033.fdocuments.in/reader033/viewer/2022052623/55999af11a28ab640d8b46bf/html5/thumbnails/8.jpg)
Facet search by location [Syria]
![Page 9: OSS 2013 - Real World Facets with Entity Resolution by Benson Margulies](https://reader033.fdocuments.in/reader033/viewer/2022052623/55999af11a28ab640d8b46bf/html5/thumbnails/9.jpg)
Filter by time range [Sept. 26-27, 2013]
![Page 10: OSS 2013 - Real World Facets with Entity Resolution by Benson Margulies](https://reader033.fdocuments.in/reader033/viewer/2022052623/55999af11a28ab640d8b46bf/html5/thumbnails/10.jpg)
Filter by person of interest [Laurent Fabius]
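The three steps on these slides (facet by location, narrow by time range, narrow by person) can be sketched as successive filters over documents that already carry resolved entity IDs. The `Doc` type, the sample documents, and the `type:name` entity IDs are hypothetical; a real deployment would more likely push these filters into a search engine.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Doc:
    doc_id: str
    published: date
    entity_ids: frozenset  # resolved entity IDs, not raw strings


# Invented sample data for illustration only.
DOCS = [
    Doc("d1", date(2013, 9, 26), frozenset({"loc:syria", "per:laurent_fabius"})),
    Doc("d2", date(2013, 9, 27), frozenset({"loc:syria"})),
    Doc("d3", date(2013, 9, 20), frozenset({"loc:syria", "per:laurent_fabius"})),
    Doc("d4", date(2013, 9, 26), frozenset({"loc:france", "per:laurent_fabius"})),
]


def filter_by_entity(docs, entity_id):
    """Keep documents that mention the given resolved entity."""
    return [d for d in docs if entity_id in d.entity_ids]


def filter_by_date(docs, start, end):
    """Keep documents published within [start, end], inclusive."""
    return [d for d in docs if start <= d.published <= end]


def facet_counts(docs):
    """Count documents per entity, as a facet sidebar would display."""
    return Counter(e for d in docs for e in d.entity_ids)


by_location = filter_by_entity(DOCS, "loc:syria")
by_time = filter_by_date(by_location, date(2013, 9, 26), date(2013, 9, 27))
by_person = filter_by_entity(by_time, "per:laurent_fabius")
```

Because the filters operate on entity IDs rather than strings, a document that spells the name differently would still match once its mention has been resolved.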
![Page 11: OSS 2013 - Real World Facets with Entity Resolution by Benson Margulies](https://reader033.fdocuments.in/reader033/viewer/2022052623/55999af11a28ab640d8b46bf/html5/thumbnails/11.jpg)
What’s Hard About All This?
• Who goes with whom?
  • Many John Smiths, and no one can spell Ghadaffi.
• What happens when new things appear?
  • …and its implications for scale.
• What happens when the system is rwong?
  • This is a system that makes decisions that stick around.
![Page 12: OSS 2013 - Real World Facets with Entity Resolution by Benson Margulies](https://reader033.fdocuments.in/reader033/viewer/2022052623/55999af11a28ab640d8b46bf/html5/thumbnails/12.jpg)
Addressing ambiguity
![Page 13: OSS 2013 - Real World Facets with Entity Resolution by Benson Margulies](https://reader033.fdocuments.in/reader033/viewer/2022052623/55999af11a28ab640d8b46bf/html5/thumbnails/13.jpg)
Addressing variety
![Page 14: OSS 2013 - Real World Facets with Entity Resolution by Benson Margulies](https://reader033.fdocuments.in/reader033/viewer/2022052623/55999af11a28ab640d8b46bf/html5/thumbnails/14.jpg)
The shock of the new
• Starting point: digest a knowledge base, match new arrivals.
  • For example, Wikipedia
• Here comes someone new, we don’t want to:
  • Decide that Jones Smyth is John Smith
  • Decide that Jones Smtyh is different from Jones Smyth
  • … with more limited evidence
• Relationship to scale
  • Now we have a data structure that gets modified
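The bullets above can be sketched as a store seeded from a knowledge base that either merges a new mention into an existing entity or creates a new one. This is only a sketch under stated assumptions: string similarity from Python's `difflib` stands in for whatever evidence a real resolver uses, and the `EntityStore` class and the 0.85 threshold are invented for the example.

```python
from difflib import SequenceMatcher

MERGE_THRESHOLD = 0.85  # assumed cutoff; a real system would tune this


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


class EntityStore:
    """Entity table that is modified as new mentions arrive."""

    def __init__(self, seed_names):
        self.entities = {}  # entity ID -> list of known names
        self._next_id = 1
        for name in seed_names:
            self._add(name)

    def _add(self, name):
        entity_id = f"E{self._next_id}"
        self._next_id += 1
        self.entities[entity_id] = [name]
        return entity_id

    def resolve(self, mention):
        """Merge into the best-matching entity, or create a new one."""
        best_id, best_score = None, 0.0
        for entity_id, names in self.entities.items():
            score = max(similarity(mention, n) for n in names)
            if score > best_score:
                best_id, best_score = entity_id, score
        if best_id is not None and best_score >= MERGE_THRESHOLD:
            if mention not in self.entities[best_id]:
                self.entities[best_id].append(mention)
            return best_id
        return self._add(mention)


store = EntityStore(["John Smith"])    # digested knowledge base
first = store.resolve("Jones Smyth")   # not close enough: new entity
second = store.resolve("Jones Smtyh")  # close to "Jones Smyth": merged
```

Every call to `resolve` can write to the store, so the decision it makes sticks around, and the linear scan over all entities shows why new arrivals become a scaling concern.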
![Page 15: OSS 2013 - Real World Facets with Entity Resolution by Benson Margulies](https://reader033.fdocuments.in/reader033/viewer/2022052623/55999af11a28ab640d8b46bf/html5/thumbnails/15.jpg)
Evaluation / Confidence
![Page 16: OSS 2013 - Real World Facets with Entity Resolution by Benson Margulies](https://reader033.fdocuments.in/reader033/viewer/2022052623/55999af11a28ab640d8b46bf/html5/thumbnails/16.jpg)
Human Correction
![Page 17: OSS 2013 - Real World Facets with Entity Resolution by Benson Margulies](https://reader033.fdocuments.in/reader033/viewer/2022052623/55999af11a28ab640d8b46bf/html5/thumbnails/17.jpg)
Thank you.