Natural Language Processing for Underground Communications
Dan Klein
MURI Kickoff, 11/20/2009
Underground Communications
Example Data
Underground Communications
Example Data, Manual Extraction
Processing: Information Extraction
Observation Graphs
http://www.spam-reklama.ru/contact.html
http://www.rossmail.ru/offline.htm
http://www.fax-reklama.ru/contact.html
http://www.f-mail.ru/kontact/
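One way to read an observation graph is as a bipartite graph that links each document to the contact points it mentions (URLs like those above, ICQ numbers, and so on), so that documents sharing a contact point become connected. The sketch below assumes that reading. The regular expressions, function names, and sample messages are hypothetical, and real extraction would need far more robust patterns.

```python
import re
from collections import defaultdict

# Hypothetical patterns for two kinds of observables (illustration only).
URL_RE = re.compile(r"https?://[^\s\"'<>]+")
ICQ_RE = re.compile(r"\bICQ[:\s#]*(\d{5,9})\b", re.IGNORECASE)


def extract_observables(text):
    """Return the set of (kind, value) observables mentioned in one document."""
    obs = {("url", u.rstrip(".,;)")) for u in URL_RE.findall(text)}
    obs |= {("icq", n) for n in ICQ_RE.findall(text)}
    return obs


def build_observation_graph(docs):
    """Map each observable to the ids of the documents that mention it."""
    graph = defaultdict(set)
    for doc_id, text in docs.items():
        for obs in extract_observables(text):
            graph[obs].add(doc_id)
    return graph


def linked_documents(graph):
    """Pairs of documents connected through at least one shared observable."""
    pairs = set()
    for doc_ids in graph.values():
        ordered = sorted(doc_ids)
        for i, a in enumerate(ordered):
            for b in ordered[i + 1:]:
                pairs.add((a, b))
    return pairs


# Made-up messages; example.org / example.net stand in for real sites.
docs = {
    "m1": "Contact http://example.org/contact.html or ICQ: 123456789",
    "m2": "See http://example.org/contact.html today",
    "m3": "ICQ 123456789 for prices",
    "m4": "Write to http://example.net/offline.htm",
}
print(sorted(linked_documents(build_observation_graph(docs))))
```

Here m1 is linked to m2 through the shared URL and to m3 through the shared ICQ number, while m4 stays isolated.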
Underlying Entities and Relations
Person 1211 (Alias: Steakcap; ICQ: 598199837; Location: France)
Referral (From: Person 2133; To: Person 1211; Product: 3319)
Person 2133 (Alias: Thunderelvi; ICQ: 787659871; Location: USA)
Product 3319 (Type: FB Harvester; Contact: 709-324-0989)
Person 9876 (Alias: Zakar; ICQ: 234150301; Email: zakar@e-...)
Employee (Person: Person 9876; Product: 5621; Role: Developer)
Product 5621 (Type: Spam Sender; Contact: 495-210-4423)
Extraction Goal
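The target records on this slide amount to a small typed schema: entities (Person, Product) with attribute slots, and relations (Referral, Employee) whose arguments point at entity ids. A minimal sketch of that schema as Python dataclasses follows. The class and field names are our own choice (for instance, `kind` stands in for the slide's Type slot), and only a subset of the slide's values is filled in.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Person:
    pid: int
    alias: str
    icq: Optional[str] = None
    location: Optional[str] = None
    email: Optional[str] = None


@dataclass(frozen=True)
class Product:
    prod_id: int
    kind: str                      # the slide's "Type" slot
    contact: Optional[str] = None


@dataclass(frozen=True)
class Referral:
    source: int                    # Person id in the "From" slot
    target: int                    # Person id in the "To" slot
    product: int                   # Product id


@dataclass(frozen=True)
class Employee:
    person: int                    # Person id
    product: int                   # Product id
    role: str


def describe_referrals(people, products, referrals):
    """Render each Referral as a sentence by resolving its id arguments."""
    return [
        f"{people[r.source].alias} referred {people[r.target].alias} "
        f"to {products[r.product].kind}"
        for r in referrals
    ]


people = {
    1211: Person(1211, "Steakcap", location="France"),
    2133: Person(2133, "Thunderelvi", location="USA"),
}
products = {3319: Product(3319, "FB Harvester")}
referrals = [Referral(source=2133, target=1211, product=3319)]
print(describe_referrals(people, products, referrals))
```

Keeping relations as id references, rather than nested objects, means a later deduplication step can merge two Person records without rewriting every relation by hand.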
Existing NLP Tasks
Discourse Structure
(Figure labels: "sign", "deliver", "vote")
General Approach
An Entity Reference Model
Our Existing Approach
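The slides do not spell out the entity reference model. As a rough illustration of the entity-centered idea (mentions are grouped under entities, and each new mention either refers to an existing compatible entity or introduces a new one), here is a deterministic greedy stand-in. It is not the talk's model, which is statistical. The compatibility rule (no clash in semantic type, plus either a shared head word or a pronoun) and the recency ordering are assumptions chosen for clarity.

```python
from dataclasses import dataclass, field


@dataclass
class Mention:
    text: str
    head: str                 # head word, e.g. "Steakcap" or "harvester"
    sem_type: str = ""        # coarse type such as "PERSON"; "" if unknown
    is_pronoun: bool = False


@dataclass
class Entity:
    sem_type: str = ""
    heads: set = field(default_factory=set)
    mentions: list = field(default_factory=list)


def compatible(entity, mention):
    """Hypothetical rule: types must not clash, and a non-pronoun must share a head."""
    if entity.sem_type and mention.sem_type and entity.sem_type != mention.sem_type:
        return False
    return mention.is_pronoun or mention.head.lower() in entity.heads


def resolve(mentions):
    """Greedily assign each mention to the most recently mentioned compatible
    entity, or start a new entity if none is compatible."""
    entities, last_seen = [], []
    for pos, m in enumerate(mentions):
        by_recency = sorted(range(len(entities)), key=lambda i: last_seen[i], reverse=True)
        idx = next((i for i in by_recency if compatible(entities[i], m)), None)
        if idx is None:
            entities.append(Entity())
            last_seen.append(pos)
            idx = len(entities) - 1
        e = entities[idx]
        e.sem_type = e.sem_type or m.sem_type
        if not m.is_pronoun:
            e.heads.add(m.head.lower())
        e.mentions.append(m.text)
        last_seen[idx] = pos
    return [e.mentions for e in entities]


doc = [
    Mention("Steakcap", "Steakcap", "PERSON"),
    Mention("the harvester", "harvester", "PRODUCT"),
    Mention("he", "he", "PERSON", is_pronoun=True),
    Mention("the harvester", "harvester", "PRODUCT"),
]
print(resolve(doc))
```

The pronoun skips the more recent PRODUCT entity because of the type clash and attaches to Steakcap; a learned model would replace these hard rules with soft, trainable preferences.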
Adding Semantic Knowledge
(Figure example: "America Online" and "company")
Our Current Work
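The America Online / company example suggests why semantic knowledge helps reference resolution: a nominal such as "the company" can only refer to an entity of a compatible type. The sketch below hard-codes a tiny hypothetical lexicon; the slides do not say how the actual system acquires such knowledge, so the dictionaries and the function name are assumptions.

```python
# Hypothetical hand-built knowledge; a real system would learn or mine this.
NAME_TYPES = {
    "america online": {"company", "organization"},
    "france": {"country", "location"},
}
NOMINAL_TYPES = {"company": "company", "firm": "company",
                 "country": "country", "city": "city"}


def semantically_compatible(name, nominal_head):
    """Can a nominal with this head word refer to the named entity?"""
    types = NAME_TYPES.get(name.lower())
    sem = NOMINAL_TYPES.get(nominal_head.lower())
    if types is None or sem is None:
        return True  # no knowledge either way, so do not block the link
    return sem in types


print(semantically_compatible("America Online", "company"))
print(semantically_compatible("America Online", "country"))
```

Defaulting to "compatible" when either side is unknown keeps the check from vetoing links on names the lexicon has never seen.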
Evaluation: Reference
Figure: MUC F1 (cluster similarity) for unsupervised and supervised systems. Bars: Unsupervised Baseline; Bengtson & Roth 08 (supervised); Preliminary Current Work.
Does it Work?
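MUC F1, the metric on this slide, scores a predicted clustering of mentions against the gold clustering by counting links (Vilain et al., 1995). For each gold cluster K, recall credits |K| - |p(K)| links, where p(K) is the partition of K induced by the predicted clusters (mentions absent from the prediction count as singletons), and divides by the sum of |K| - 1; precision is the same computation with gold and predicted swapped. A small sketch, with function names of our own choosing:

```python
def muc_recall(key, response):
    """MUC recall: sum over key clusters of |K| - |p(K)|, over sum of |K| - 1."""
    where = {}
    for idx, cluster in enumerate(response):
        for m in cluster:
            where[m] = idx
    num = den = 0
    for cluster in key:
        # Each mention falls in some response cluster, or in its own singleton.
        parts = {where.get(m, ("singleton", m)) for m in cluster}
        num += len(cluster) - len(parts)
        den += len(cluster) - 1
    return num / den if den else 0.0


def muc_f1(key, response):
    """Harmonic mean of MUC recall and MUC precision (recall with roles swapped)."""
    r = muc_recall(key, response)
    p = muc_recall(response, key)
    return 2 * p * r / (p + r) if p + r else 0.0


key = [{"a", "b", "c"}, {"d", "e"}]
response = [{"a", "b"}, {"c", "d", "e"}]
print(muc_recall(key, response), muc_f1(key, response))
```

In this example the response recovers two of the three gold links and two of its three predicted links are correct, so recall, precision, and F1 are all 2/3.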
Cross-Document Identity
What’s Coming Up
Extracting Global Entities
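Cross-document identity can be illustrated in its simplest form: per-document person records are merged whenever they share an exact identifier such as an ICQ number or email address, with merges propagating transitively. The sketch uses union-find; the record format and the choice of identifying keys are assumptions, and it ignores the harder cases (varying aliases, no shared identifier) that a statistical approach would need to handle.

```python
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def merge_records(records, keys=("icq", "email")):
    """Group record indices into global entities: two records share an entity
    if they share a value for any identifying key, directly or via a chain."""
    uf = UnionFind(len(records))
    first_owner = {}  # (key, value) -> index of first record carrying it
    for i, rec in enumerate(records):
        for k in keys:
            v = rec.get(k)
            if v is None:
                continue
            if (k, v) in first_owner:
                uf.union(first_owner[(k, v)], i)
            else:
                first_owner[(k, v)] = i
    groups = {}
    for i in range(len(records)):
        groups.setdefault(uf.find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical per-document extractions.
records = [
    {"doc": "d1", "alias": "steakcap", "icq": "111"},
    {"doc": "d2", "alias": "steak", "icq": "111", "email": "s@example.org"},
    {"doc": "d3", "alias": "stk", "email": "s@example.org"},
    {"doc": "d4", "alias": "zakar", "icq": "222"},
]
print(merge_records(records))
```

Records d1 and d3 share no identifier directly but are joined through d2, which is the kind of transitive evidence a global entity view is meant to capture.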
Subsequent Goals
Summary
Goal: systems which simultaneously extract and dedupe
Train in an unsupervised / discovery manner
Requires: both new statistical machinery and good models of underlying domain structure (transactions, etc.)
Requires: processing domain-specific language (domain adaptation, grammar induction)
Evaluation: are the entities and relations correct?
First steps: measure general approach on newswire, etc., where we know the right answers
Also: evaluate on underground network data
Near term: increased accuracy in identity resolution, begin to extract simple relations, better basic analysis
Thanks!