Natural Language Processing: Data, Algorithms, and Knowledge

BEARS 2011. Dan Klein, Computer Science Division, University of California, Berkeley


Page 1: Natural Language Processing: Data, Algorithms, and Knowledge

Natural Language Processing: Data, Algorithms, and Knowledge

BEARS 2011

Dan Klein

Computer Science Division

University of California, Berkeley

Page 2: Natural Language Processing: Data, Algorithms, and Knowledge

Language Technologies

Goal: Deep Understanding
Requires context, linguistic structure, meanings…

Reality: Shallow Matching
Requires robustness and scale
Amazing successes, but fundamental limitations

Page 3: Natural Language Processing: Data, Algorithms, and Knowledge

Large-Scale NLP: Watson

Page 4: Natural Language Processing: Data, Algorithms, and Knowledge

Factoids and Limitations

Page 5: Natural Language Processing: Data, Algorithms, and Knowledge

Text Data is Superficial

An iceberg is a large piece of freshwater ice that has broken off from a snow-formed glacier or ice shelf and is floating in open water.

Page 6: Natural Language Processing: Data, Algorithms, and Knowledge

… But Language is Complex

Semantic structures
References and entities
Discourse-level connectives
Meanings and implicatures
Contextual factors
Perceptual grounding
…

An iceberg is a large piece of freshwater ice that has broken off from a snow-formed glacier or ice shelf and is floating in open water.

Page 7: Natural Language Processing: Data, Algorithms, and Knowledge

More Data: Machine Translation

SOURCE: Cela constituerait une solution transitoire qui permettrait de conduire à terme à une charte à valeur contraignante.

HUMAN: That would be an interim solution which would make it possible to work towards a binding charter in the long term.

1x DATA: [this] [constituerait] [assistance] [transitoire] [who] [permettrait] [licences] [to] [terme] [to] [a] [charter] [to] [value] [contraignante] [.]

10x DATA: [it] [would] [a solution] [transitional] [which] [would] [of] [lead] [to] [term] [to a] [charter] [to] [value] [binding] [.]

100x DATA: [this] [would be] [a transitional solution] [which would] [lead to] [a charter] [legally binding] [.]

1000x DATA: [that would be] [a transitional solution] [which would] [eventually lead to] [a binding charter] [.]

Page 8: Natural Language Processing: Data, Algorithms, and Knowledge

Data By Itself Isn’t Enough!

Page 9: Natural Language Processing: Data, Algorithms, and Knowledge

Analysis and Alignment

[Burkett, Blitzer, and Klein 10]

Page 10: Natural Language Processing: Data, Algorithms, and Knowledge

Data and Knowledge

Classic knowledge representation worry: How will a machine ever know that…
Ice is frozen water?
Beige looks like this:
Chairs are solid?

Answers:
1980: write it all down
2000: get by without it
2020: learn it from data

Page 11: Natural Language Processing: Data, Algorithms, and Knowledge

Deeper Linguistic Analysis

Hurricane Emily howled toward Mexico's Caribbean coast on Sunday packing 135 mph winds and torrential rain and causing panic in Cancun, where frightened tourists squeezed into musty shelters.

Accuracy: 90+ [Petrov and Klein 09]

Page 12: Natural Language Processing: Data, Algorithms, and Knowledge

Learning Hidden Syntax

Personal Pronouns (PRP)
PRP-1    it    them    him
PRP-2    it    he    they
PRP-3    It    He    I

Proper Nouns (NNP)
NNP-14    Oct.    Nov.    Sept.
NNP-12    John    Robert    James
NNP-2    J.    E.    L.
NNP-1    Bush    Noriega    Peters
NNP-15    New    San    Wall
NNP-3    York    Francisco    Street

Parsing Accuracy: 90.5+ [Petrov and Klein 09]

Page 13: Natural Language Processing: Data, Algorithms, and Knowledge

Data and Knowledge: Parsing

They considered running the ad during the Super Bowl.

considered it during: 112
running it during: 239
running * during: 3k
considered * during: 2k

[Bansal and Klein 11]
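
The counts on this slide hint at how large-scale n-gram statistics could inform an attachment decision: does "during the Super Bowl" modify "considered" or "running"? The following is only a minimal sketch, assuming the numbers are raw n-gram frequencies, reading "3k" and "2k" as roughly 3000 and 2000, and using a plain comparison of counts as the cue. The function name `preferred_head` is hypothetical, and this is not the feature set of [Bansal and Klein 11].

```python
# Counts copied from the slide; "3k" and "2k" are approximated as 3000 and 2000.
COUNTS = {
    ("considered", "it", "during"): 112,
    ("running", "it", "during"): 239,
    ("running", "*", "during"): 3000,
    ("considered", "*", "during"): 2000,
}


def preferred_head(candidates, filler, preposition, counts=COUNTS):
    """Return the candidate head whose (head, filler, preposition) pattern is most frequent.

    Patterns missing from the table are treated as having a count of zero.
    """
    return max(candidates, key=lambda head: counts.get((head, filler, preposition), 0))


heads = ["considered", "running"]
print(preferred_head(heads, "it", "during"))  # pattern with the pronoun "it"
print(preferred_head(heads, "*", "during"))   # pattern with a wildcard slot
```

On these numbers both patterns favor "running", which corresponds to attaching the "during" phrase to "running" rather than to "considered".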

Page 14: Natural Language Processing: Data, Algorithms, and Knowledge

Deeper Understanding: Reference

Page 15: Natural Language Processing: Data, Algorithms, and Knowledge

Names vs. Entities

Page 16: Natural Language Processing: Data, Algorithms, and Knowledge

Example Errors

Page 17: Natural Language Processing: Data, Algorithms, and Knowledge

Discovering Knowledge

Page 18: Natural Language Processing: Data, Algorithms, and Knowledge

Unsupervised Learning

Page 19: Natural Language Processing: Data, Algorithms, and Knowledge

Coreference Systems

Page 20: Natural Language Processing: Data, Algorithms, and Knowledge

Cross-Document Identity

Page 21: Natural Language Processing: Data, Algorithms, and Knowledge

Cross-Document Summaries

Lindsay Lohan pleaded not guilty Wednesday to felony grand theft of a $2,500 necklace, a case that could return the troubled starlet to jail rather than the big screen. Saying it appeared that Lohan had violated her probation in a 2007 drunken driving case, the judge set bail at $40,000 and warned that if Lohan was accused of breaking the law while free he would have her held without bail. The Mean Girls star is due back in court on Feb. 23, an important hearing in which Lohan could opt to end the case early.

[Berg-Kirkpatrick, Gillick, and Klein 11]

Page 22: Natural Language Processing: Data, Algorithms, and Knowledge

Grounded Language

[Golland, Liang, and Klein 10]

Page 23: Natural Language Processing: Data, Algorithms, and Knowledge

Grounding with Natural Data

… on the beige loveseat.

Page 24: Natural Language Processing: Data, Algorithms, and Knowledge

Predictions

Today | 2020 (likely) | 2020 (hopefully)
Find information | Synthesize information | Infer information
Keywords and names | Entities | Concepts
Knowledge-free “structural” systems | Knowledge from text | Knowledge from grounded contexts
“Talk” to search engines | Talk to embedded devices | Talk to mobile robots

Superficial patterns Deep understanding Monologs dialogs

Page 25: Natural Language Processing: Data, Algorithms, and Knowledge

Conclusion

Simple algorithms and large data have gotten us amazingly far!

To go further, we need
Algorithms that work with deeper structure
Learning methods that turn data into knowledge
Systems that are contextualized

Page 26: Natural Language Processing: Data, Algorithms, and Knowledge

Thank you!