Natural Language Processing: Overview & Current Applications Noriko Tomuro April 7, 2006.



What is NLP?

Natural Language Processing (NLP) is a field in Computer Science devoted to creating computers that use natural language as input and/or output.


NLP involves other disciplines:

– Linguistics (NLP is also called “Computational Linguistics”)
– Psychology
– Mathematics and Statistics
– Information Theory


Machines that Can Speak (1)

HAL 9000 in “2001: A Space Odyssey”


Machines that Can Speak (2)

C-3PO in Star Wars

KITT in Knight Rider


But Still Sci-Fi…

“However, as 2001 approached it became clear that 2001's predictions in computer technology were far fetched. Natural language, lip reading, planning and plain common sense in computers were still the stuff of science fiction.” (HAL 9000, Wikipedia)


NLP is AI-complete

“The most difficult problems in AI manifest themselves in human language phenomena.”

Use of language is the touchstone of intelligent behavior.


Test for Intelligence – Turing Test

Alan Turing (1950) proposed a test of a machine's capability to perform human-like conversation.

A human judge engages in a natural language conversation with two other parties, one a human and the other a machine; if the judge cannot reliably tell which is which, then the machine is said to pass the test.


Early Conversational Programs

ELIZA (by Joseph Weizenbaum), 1966
– Simulates a psychotherapist
– No real understanding; uses simple pattern-matching to respond to user input

(my ?x depresses me) (why does your ?x depress you) 5

(life ?x) (why do you say it ?x) 3

(I could ?x) (you could ?x) 2

(because ?x) (that is a good reason) 3

(?x) (tell me more) 0
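
The rules above pair a pattern with a response template. `?x` matches any run of words and is copied into the response. Below is a minimal Python sketch of this kind of matcher. It assumes that the trailing number is a priority, with higher-priority rules tried first; the slide does not say this. The names `RULES`, `compile_pattern` and `respond` are hypothetical, and this is not Weizenbaum's actual implementation.

```python
import re

# The slide's rules as (pattern, response, rank) triples.
# Assumption: the rank is a priority, and higher-ranked rules are tried first.
RULES = [
    ("my ?x depresses me", "why does your ?x depress you", 5),
    ("life ?x", "why do you say it ?x", 3),
    ("I could ?x", "you could ?x", 2),
    ("because ?x", "that is a good reason", 3),
    ("?x", "tell me more", 0),
]

def compile_pattern(pattern):
    """Turn 'my ?x depresses me' into a regex with a named group for ?x."""
    parts = []
    for token in pattern.split():
        if token.startswith("?"):
            parts.append(f"(?P<{token[1:]}>.+)")   # variable: one or more words
        else:
            parts.append(re.escape(token))         # literal word
    return re.compile(r"\s+".join(parts), re.IGNORECASE)

def respond(sentence):
    text = sentence.strip().rstrip(".!?")
    # sorted() is stable, so rules with equal rank keep their list order.
    for pattern, response, _rank in sorted(RULES, key=lambda r: -r[2]):
        match = compile_pattern(pattern).fullmatch(text)
        if match:
            # Copy each bound variable into the response template.
            for name, value in match.groupdict().items():
                response = response.replace("?" + name, value)
            return response
    return "tell me more"   # fallback, e.g. for empty input

print(respond("My job depresses me"))
```

For example, `respond("My job depresses me")` returns `"why does your job depress you"`. Input that matches no specific rule falls through to the catch-all `(?x)` rule.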


Loebner Prize & Chatterbots/Chatbots (1)

In 1990, Hugh Loebner started an annual Turing Test competition.

Conversational programs are nowadays called chatterbots (or chatbots).

The grand prize of $100,000 goes to the first bot that judges cannot distinguish from a real human in a Turing test that includes text, visual, and auditory input.

The Prize dissolves once the $100,000 prize is won.


Loebner Prize & Chatterbots/Chatbots (2)

Nobody has won the prize yet.


NLP Applications

NLP can be used in stand-alone applications or as components embedded in other systems.

NLP components/tasks include:
– Part-of-speech tagging
– Named entity identification
– Chunking (partial parsing)

Here I discuss some current stand-alone NLP applications.


1. Machine Translation (MT)

One of the very earliest pursuits in computer science (after WWII).

Broad application domain: from military to literary to search engines

Basic approaches:
– Inter-lingual (rule-based)
– Direct translation (corpus-based), more popular these days


Let’s translate!

Google includes an MT engine (based on the SYSTRAN system developed in the EC).

Let’s translate “Saddam Hussein has dismissed evidence suggesting he approved the execution of people under 18.” (BBC World News, April 5, 2006)


2. Text Summarization

Create a summary of a text or texts.

Many difficult problems, including:
– Paraphrases
– Anaphora (e.g. “it”, “they”)


3. Question Answering

Finds an answer (not a document) to a question typed in as a natural language sentence (not keywords).

Most systems can only answer simple, Trivial Pursuit-type questions.

Examples: Ask.com, FAQFinder

Some search engines perform limited, phrase-based Q&A, e.g. Google


Let’s ask questions!

“Who wrote ‘The Da Vinci Code’?”

“What is the longest river in Australia?”

“What is the name of the main character in James Joyce's ‘Ulysses’?”


4. Analyzing Web Documents

Recently there have been many NLP applications which analyze (not just retrieve) web documents:
– Blogs: semantic analysis, sentiment (polarity/opinion) identification
– Email spam filtering: most systems, however, rely on simple word probabilities
– A general approach, “Web as a corpus”: treating the web as a vast collection of documents
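
The “simple word probability” approach to spam filtering is commonly implemented as a naive Bayes classifier. Below is a minimal Python sketch. It assumes whitespace tokenization, add-one smoothing and a tiny made-up training set; `TRAINING`, `train` and `spam_score` are hypothetical names.

```python
import math
from collections import Counter

# Hypothetical toy training data: (message, label) pairs.
TRAINING = [
    ("win money now", "spam"),
    ("cheap money offer", "spam"),
    ("meeting agenda attached", "ham"),
    ("lunch meeting tomorrow", "ham"),
]

def train(docs):
    """Count words per class and documents per class."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for text, label in docs:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, doc_counts, vocab

def spam_score(text, model):
    """Log-odds log P(spam | text) - log P(ham | text) under naive Bayes
    with add-one smoothing; a positive score means 'spam'."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)   # class prior
        for word in text.lower().split():
            # Each word contributes independently (the "naive" assumption).
            score += math.log((word_counts[label][word] + 1)
                              / (total_words + len(vocab)))
        scores[label] = score
    return scores["spam"] - scores["ham"]

model = train(TRAINING)
print(spam_score("win cheap money", model) > 0)    # True
print(spam_score("meeting tomorrow", model) > 0)   # False
```

No grammar or parsing is involved. Each word simply shifts the score toward whichever class used it more often in training.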


5. Tutoring Systems

We’ll hear from Peter shortly.


Why is NLP so hard?

Understanding natural languages is hard … because of inherent ambiguity.

Engineering NLP systems is also hard … because of
– the huge amount of data resources needed (e.g. grammar, dictionary, documents to extract statistics from)
– the computational complexity (intractability) of analyzing a sentence


Ambiguity (1)

“At last, a computer that understands you like your mother.”

Three possible (syntactic) interpretations:
1. It understands you as well as your mother understands you.
2. It understands that you like your mother.
3. It understands you as well as it understands your mother.


Ambiguity (2)

At the acoustic level,
– “.. computer that understands you like your mother”
– “.. computer that understands you lie cured mother”

At the semantic level, a “mother” is:
– a female parent?
– a cask used in vinegar-making?


Ambiguity (3)

Word segmentation

e.g. 社長兼業務部長

Possibilities include:
- 社長 兼 業務 部長
  president both business general-manager
- 社長 兼業 務 部長
  president multi-business Tsutomu general-manager
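
One way to enumerate the possible segmentations is to try every dictionary word that matches at each position. Below is a minimal Python sketch. The toy `LEXICON` holds only the words from the example above and is an assumption; a real system would use a full dictionary and then score the candidates.

```python
from functools import lru_cache

# Toy lexicon (assumption): only the words that appear in the example above.
LEXICON = frozenset({"社長", "兼", "兼業", "業務", "務", "部長"})

def segmentations(text, lexicon=LEXICON, max_word_len=4):
    """Return every way to split `text` into a sequence of lexicon words."""
    @lru_cache(maxsize=None)
    def split_from(start):
        if start == len(text):
            return [()]                    # exactly one way to segment nothing
        results = []
        for end in range(start + 1, min(len(text), start + max_word_len) + 1):
            word = text[start:end]
            if word in lexicon:
                # Prefix this word to every segmentation of the remainder.
                results.extend((word,) + rest for rest in split_from(end))
        return results
    return split_from(0)

for seg in segmentations("社長兼業務部長"):
    print(" ".join(seg))
```

With this lexicon, the sketch finds exactly the two readings listed above. Choosing between them requires further knowledge, such as word frequencies or context.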


Ambiguity – Summary

Ambiguity occurs at different levels.

To resolve ambiguities, deep linguistic (as well as common-sense) knowledge is required.

Two immediate ramifications:
1. Knowledge bottleneck – How do we acquire and encode ALL such information?
2. Computational complexity – the number of possible analyses is $O(c^n)$, exponential w.r.t. the length $n$ of the sentence
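
Consider bracketing alone to see how fast the number of analyses can grow. Suppose every binary grouping of $n$ words were allowed. The number of distinct binary trees would then be the Catalan number $C_{n-1}$, which grows like $4^n$ up to a polynomial factor. This is a standard combinatorial illustration rather than something from the slides. A short Python check:

```python
from math import comb

def binary_bracketings(n_words):
    """Number of distinct binary trees with n_words leaves: Catalan(n_words - 1)."""
    k = n_words - 1
    return comb(2 * k, k) // (k + 1)

for n in (2, 4, 8, 16):
    print(n, binary_bracketings(n))
```

This prints 1, 5, 429 and 9694845 for 2, 4, 8 and 16 words. A real grammar rules out many of these trees, but ambiguous constructions such as stacked prepositional phrases still multiply in this way.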


The Bottom Line

Complete NL understanding (and thus general intelligence) is impossible.

But we can make incremental progress.

Also, we have had successes in limited domains.

[But NLP is costly – lots of work and resources are needed, and the return is sometimes not worth it.]


Sentence Analysis

“John ate the cake”

Syntactic structure:
(S (NP “John”) (V “ate”) (NP “the cake”))

Semantic structure:
(ACTION ingest
  (ACTOR John-1)
  (PATIENT food))


Syntactic Parsing

Grammar:
R0: S → NP VP
R1: NP → Det N
R2: VP → VG NP
R3: VG → V
R4: NP → “John”
R5: V → “ate”
R6: Det → “the”
R7: N → “cake”

Parse tree for “John ate the cake”:
(S (NP “John”)
   (VP (VG (V “ate”))
       (NP (Det “the”) (N “cake”))))

The process of deriving the phrase structure of a sentence is called “parsing”.

The structure (often represented by a context-free parse tree) is based on the grammar.
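
A naive top-down parser can run a grammar like this directly. It expands each nonterminal by trying its rules left to right, backtracking on failure. Below is a minimal Python sketch. It assumes the grammar R0-R7 reads S → NP VP, NP → Det N | “John”, VP → VG NP, VG → V, plus the word rules for “ate”, “the” and “cake”. `GRAMMAR`, `parse`, `parse_seq` and `parse_sentence` are hypothetical names.

```python
# Grammar R0-R7 as a dict: nonterminal -> list of right-hand sides.
# Any symbol that is not a key (e.g. "John") is treated as a word.
GRAMMAR = {
    "S":   [("NP", "VP")],              # R0
    "NP":  [("Det", "N"), ("John",)],   # R1, R4
    "VP":  [("VG", "NP")],              # R2
    "VG":  [("V",)],                    # R3
    "V":   [("ate",)],                  # R5
    "Det": [("the",)],                  # R6
    "N":   [("cake",)],                 # R7
}

def parse(symbol, words, pos):
    """Yield (tree, next_pos) for every way `symbol` can derive words[pos:next_pos]."""
    if symbol not in GRAMMAR:                    # a word: must match the input
        if pos < len(words) and words[pos] == symbol:
            yield symbol, pos + 1
        return
    for rhs in GRAMMAR[symbol]:                  # try each rule, top-down
        yield from parse_seq(symbol, rhs, words, pos, [])

def parse_seq(symbol, rhs, words, pos, children):
    """Derive the remaining right-hand-side symbols left to right, backtracking."""
    if not rhs:
        yield (symbol, *children), pos
        return
    for child, next_pos in parse(rhs[0], words, pos):
        yield from parse_seq(symbol, rhs[1:], words, next_pos, children + [child])

def parse_sentence(sentence):
    words = sentence.split()
    # Keep only derivations of S that consume the whole sentence.
    return [tree for tree, end in parse("S", words, 0) if end == len(words)]

print(parse_sentence("John ate the cake"))
```

For “John ate the cake”, this returns one tree with the same shape as the one above. A left-recursive rule such as NP → NP PP would make this naive top-down strategy loop forever. The grammar here has no such rule.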


Parsing Algorithms

Top-down parsing – (top-down) derivation

Bottom-up parsing

Chart parsing

Earley’s algorithm – among the most efficient, $O(n^3)$ in the worst case

Left-corner parsing – an optimization of Earley’s

and lots more…


(Bottom-up) Chart Parsing

“John ate the cake”, word positions: 0 John 1 ate 2 the 3 cake 4

Edges are dotted rules with [start, end] spans; the grammar is the one on the Syntactic Parsing slide.

(1)  shift 1 “John”: NP → “John” • [0,1]
(2)  shift 2: S → NP • VP [0,1]
(3)  shift 1 “ate”: V → “ate” • [1,2]
(4)  shift 2: VG → V • [1,2]
(5)  shift 2: VP → VG • NP [1,2]
(6)  shift 1 “the”: Det → “the” • [2,3]
(7)  shift 2: NP → Det • N [2,3]
(8)  shift 1 “cake”: N → “cake” • [3,4]
(9)  reduce: NP → Det N • [2,4]
(10) reduce: VP → VG NP • [1,4]
(11) reduce: S → NP VP • [0,4]
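
A small bottom-up chart parser can reproduce this kind of trace. In the minimal Python sketch below, a completed constituent starts every rule whose right-hand side begins with its category, which roughly corresponds to the slide's “shift 2”. An incomplete edge is extended by an adjacent completed one, which roughly corresponds to “reduce”. This mapping onto the slide's labels is an assumption, as are the grammar reading and all names (`Edge`, `RULES`, `chart_parse`, `recognize`).

```python
from collections import namedtuple

# A dotted rule lhs -> rhs with `dot` symbols found, covering words[start:end].
Edge = namedtuple("Edge", "lhs rhs dot start end")

# Assumed reading of grammar R0-R7 as (lhs, rhs) pairs; words are plain strings.
RULES = [
    ("S", ("NP", "VP")), ("NP", ("Det", "N")), ("VP", ("VG", "NP")),
    ("VG", ("V",)), ("NP", ("John",)), ("V", ("ate",)),
    ("Det", ("the",)), ("N", ("cake",)),
]

def chart_parse(words, rules=RULES):
    chart, agenda = set(), []

    def add(edge):
        if edge not in chart:
            chart.add(edge)
            agenda.append(edge)

    for i, word in enumerate(words):          # each word is a complete edge
        add(Edge(word, (), 0, i, i + 1))
    while agenda:
        edge = agenda.pop()
        if edge.dot == len(edge.rhs):         # complete constituent
            # Bottom-up step: start every rule whose rhs begins with this category.
            for lhs, rhs in rules:
                if rhs[0] == edge.lhs:
                    add(Edge(lhs, rhs, 1, edge.start, edge.end))
            # Extend incomplete edges that end where this constituent starts.
            for other in list(chart):
                if (other.dot < len(other.rhs) and other.end == edge.start
                        and other.rhs[other.dot] == edge.lhs):
                    add(other._replace(dot=other.dot + 1, end=edge.end))
        else:                                 # incomplete edge
            # Absorb complete constituents that start where this edge ends.
            for other in list(chart):
                if (other.dot == len(other.rhs) and other.start == edge.end
                        and other.lhs == edge.rhs[edge.dot]):
                    add(edge._replace(dot=edge.dot + 1, end=other.end))
    return chart

def recognize(words, goal="S"):
    """True if some complete `goal` edge spans the whole input."""
    return any(e.lhs == goal and e.dot == len(e.rhs)
               and e.start == 0 and e.end == len(words)
               for e in chart_parse(words))

print(recognize("John ate the cake".split()))
```

The chart is a set, so an edge found twice is stored once. Each partial result is therefore built only once and then reused, which is the point of chart parsing.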


Earley’s Algorithm

“John ate the cake”, word positions: 0 John 1 ate 2 the 3 cake 4

States are dotted rules with [start, end] spans; the grammar is the one on the Syntactic Parsing slide.

     S' → • S [0,0] (initial state)
(1)  predictor: S → • NP VP [0,0]
(2)  scanner “John”: S → NP • VP [0,1]
(3)  predictor: VP → • VG NP [1,1]
(4)  predictor: VG → • V [1,1]
(5)  scanner “ate”: VG → V • [1,2]
(6)  completor: VP → VG • NP [1,2]
(7)  predictor: NP → • Det N [2,2]
(8)  scanner “the”: NP → Det • N [2,3]
(9)  scanner “cake”: NP → Det N • [2,4]
(10) completor: VP → VG NP • [1,4]
(11) completor: S → NP VP • [0,4]
(12) completor: S' → S • [0,4]
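
For comparison, below is a minimal Earley recognizer in Python. It follows the usual textbook formulation, in which words appear in grammar rules (e.g. NP → “John”) and the scanner matches them against the input. The trace above instead lets the scanner turn “John” directly into an NP, so the intermediate states differ slightly. The grammar reading and the names are assumptions, and the sketch assumes there are no empty (ε) rules.

```python
# Assumed reading of grammar R0-R7; symbols that are not keys are words.
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("Det", "N"), ("John",)],
    "VP": [("VG", "NP")],
    "VG": [("V",)],
    "V": [("ate",)],
    "Det": [("the",)],
    "N": [("cake",)],
}

def earley_recognize(words, grammar=GRAMMAR, start="S"):
    """Return True if `words` can be derived from `start` (no ε-rules assumed)."""
    # chart[k] holds states (lhs, rhs, dot, origin) that end at position k.
    chart = [[] for _ in range(len(words) + 1)]

    def add(k, state):
        if state not in chart[k]:
            chart[k].append(state)

    add(0, ("S'", (start,), 0, 0))                  # dummy rule S' -> . S
    for k in range(len(words) + 1):
        i = 0
        while i < len(chart[k]):                    # chart[k] grows as we go
            lhs, rhs, dot, origin = chart[k][i]
            i += 1
            if dot < len(rhs):
                nxt = rhs[dot]
                if nxt in grammar:                  # predictor
                    for prod in grammar[nxt]:
                        add(k, (nxt, prod, 0, k))
                elif k < len(words) and words[k] == nxt:   # scanner
                    add(k + 1, (lhs, rhs, dot + 1, origin))
            else:                                   # completor
                for plhs, prhs, pdot, porigin in list(chart[origin]):
                    if pdot < len(prhs) and prhs[pdot] == lhs:
                        add(k, (plhs, prhs, pdot + 1, porigin))
    return ("S'", (start,), 1, 0) in chart[len(words)]

print(earley_recognize("John ate the cake".split()))
```

The predictor only proposes rules that could continue a state already in the chart. This top-down filtering is what distinguishes Earley's algorithm from the purely bottom-up chart parser.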


Demo using my context-free (CF) parser


Probabilistic Parsing

For ambiguous sentences, we’d like to know which parse tree is more likely than the others.

So we must assign a probability to each parse tree … but how?

The probability of a parse tree $t$ is

$$p(t) = \prod_{r \in t} p(r)$$

where $r$ is a rule used in $t$, and $p(r)$ is obtained from an (annotated) corpus.
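
Below is a minimal Python sketch of both halves of this idea. It first estimates $p(r)$ by relative frequency, $p(A \to \beta) = \text{count}(A \to \beta) / \text{count}(A)$, from a tiny treebank. It then multiplies rule probabilities to get $p(t)$. The two-tree treebank is made up for illustration. Relative-frequency (maximum-likelihood) estimation is one common choice, not necessarily the one meant on the slide.

```python
import math
from collections import Counter

def rules_of(tree):
    """List the rules (lhs, rhs) used in a tree written as nested tuples,
    e.g. ("NP", ("Det", "the"), ("N", "cake")); leaves are plain strings."""
    lhs, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    rules = [(lhs, rhs)]
    for child in children:
        if not isinstance(child, str):
            rules.extend(rules_of(child))
    return rules

def estimate(treebank):
    """Relative frequency: p(A -> beta) = count(A -> beta) / count(A)."""
    rule_counts = Counter(r for t in treebank for r in rules_of(t))
    lhs_counts = Counter()
    for (lhs, _), n in rule_counts.items():
        lhs_counts[lhs] += n
    return {r: n / lhs_counts[r[0]] for r, n in rule_counts.items()}

def tree_prob(tree, probs):
    """p(t) = product of p(r) over every rule application r in t."""
    return math.prod(probs[r] for r in rules_of(tree))

# Hypothetical two-tree treebank.
T1 = ("S", ("NP", "John"),
      ("VP", ("VG", ("V", "ate")), ("NP", ("Det", "the"), ("N", "cake"))))
T2 = ("S", ("NP", ("Det", "the"), ("N", "cake")),
      ("VP", ("VG", ("V", "ate")), ("NP", "John")))
probs = estimate([T1, T2])
print(tree_prob(T1, probs))
```

With this treebank, NP → “John” and NP → Det N each get probability 0.5 and every other rule gets 1.0, so $p(t_1) = 0.25$. Practical systems usually add log-probabilities instead of multiplying, to avoid numeric underflow on long sentences.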


Partial Parsing

Parsing fails when the coverage of the grammar is not complete – but it’s almost impossible to write out all legal syntax (without accepting ungrammatical sentences).

We’d like to at least get pieces even when full parsing fails.

Why not abandon full parsing and aim for partial parsing from the start…
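
One widespread form of partial parsing is chunking: finding non-overlapping base noun phrases without building a full tree. Below is a minimal Python sketch of a greedy noun-phrase chunker for the pattern determiner? adjective* noun+. It assumes the input has already been part-of-speech tagged with Penn Treebank-style tags; the example tagging is supplied by hand.

```python
def np_chunks(tagged):
    """Greedy noun-phrase chunker for the tag pattern DT? JJ* NN+.
    `tagged` is a list of (word, tag) pairs; returns one word tuple per chunk."""
    chunks, i = [], 0
    while i < len(tagged):
        j = i
        if tagged[j][1] == "DT":                     # optional determiner
            j += 1
        while j < len(tagged) and tagged[j][1] == "JJ":           # adjectives
            j += 1
        k = j
        while k < len(tagged) and tagged[k][1].startswith("NN"):  # nouns
            k += 1
        if k > j:                  # at least one noun: emit the chunk
            chunks.append(tuple(word for word, _ in tagged[i:k]))
            i = k
        else:                      # no chunk starts here; skip one token
            i += 1
    return chunks

# Hand-tagged example sentence (tags are an assumption for illustration).
tagged = [("I", "PRP"), ("saw", "VBD"), ("a", "DT"), ("man", "NN"),
          ("on", "IN"), ("the", "DT"), ("hill", "NN"), ("with", "IN"),
          ("a", "DT"), ("telescope", "NN")]
print(np_chunks(tagged))
```

On this sentence, the chunker returns (“a”, “man”), (“the”, “hill”) and (“a”, “telescope”). The pronoun “I” is missed because the pattern does not cover pronouns. The chunker also makes no attempt to decide where “with a telescope” attaches, which is exactly the ambiguity that full parsing must face.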


Semantic Analysis (1)

Derive the meaning of a sentence.

Often applied to the result of syntactic analysis.

“John ate the cake.”
NP V NP

((action INGEST)   ; syntactic verb
 (actor JOHN-01)   ; syntactic subj
 (object FOOD))    ; syntactic obj

To do semantic analysis, we need a (semantic) dictionary (e.g. WordNet, http://www.cogsci.princeton.edu/~wn/).
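
Below is a minimal Python sketch of the last step: mapping syntactic roles onto the slots of the semantic frame shown above. The word-to-concept table stands in for a real semantic dictionary such as WordNet and is entirely made up. The function and variable names are also hypothetical.

```python
# Made-up concept table standing in for a semantic dictionary.
CONCEPTS = {"John": "JOHN-01", "ate": "INGEST", "the cake": "FOOD"}

# Syntactic role -> semantic slot, following the comments in the frame above.
ROLE_TO_SLOT = {"verb": "action", "subj": "actor", "obj": "object"}

def to_frame(syntax):
    """syntax: dict mapping a syntactic role to its phrase.
    Returns the frame as (slot, concept) pairs in the order action, actor, object."""
    return [(ROLE_TO_SLOT[role], CONCEPTS[syntax[role]])
            for role in ("verb", "subj", "obj") if role in syntax]

frame = to_frame({"subj": "John", "verb": "ate", "obj": "the cake"})
print(frame)
```

In a real system, a concept like FOOD would come from the semantic dictionary itself (for example, by following hypernym links from “cake”) rather than from a fixed table.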


Semantic Analysis (2)

Semantics is a double-edged sword…
– Can resolve syntactic ambiguity
  “I saw a man on the hill with a telescope”
  “I saw a man on the hill with a hat”
– But introduces semantic ambiguity
  “She walked towards the bank”

But in human sentence processing, we seem to resolve both types of ambiguities simultaneously (and in linear time)…


Demo using my Unification parser