Natural Language Processing Artificial Intelligence CMSC 25000 February 28, 2002.
Agenda
• Why NLP?
  – Goals & Applications
• Challenges: Knowledge & Ambiguity
  – Key types of knowledge
    • Morphology, Syntax, Semantics, Pragmatics, Discourse
  – Handling Ambiguity
    • Syntactic Ambiguity: Probabilistic Parsing
    • Semantic Ambiguity: Word Sense Disambiguation
• Conclusions
Why Language?
• Natural Language in Artificial Intelligence
  – Language use as a distinctive feature of human intelligence
  – Infinite utterances:
    • Diverse languages with fundamental similarities
    • “Computational linguistics”
  – Communicative acts
    • Inform, request, ...
Why Language? Applications
• Machine Translation
• Question-Answering
  – From database queries to web search
• Spoken language systems
• Intelligent tutoring
Knowledge of Language
• What does it mean to know a language?
  – Know the words (lexicon)
    • Pronunciation, formation, conjugation
  – Know how the words form sentences
    • Sentence structure, compositional meaning
  – Know how to interpret the sentence
    • Statement, question, ...
  – Know how to group sentences
    • Narrative coherence, dialogue
Word-level Knowledge
• Lexicon:
  – List of legal words in a language
  – Part of speech:
    • noun, verb, adjective, determiner
• Example:
  – Noun -> cat | dog | mouse | ball | rock
  – Verb -> chase | bite | fetch | bat
  – Adjective -> black | brown | furry | striped | heavy
  – Determiner -> the | that | a | an
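The example lexicon is just a lookup table from part of speech to word set. A minimal Python sketch, assuming a plain dictionary representation (the names `LEXICON`, `parts_of_speech`, and `is_legal` are illustrative, not from the slides):

```python
# Toy lexicon from the slide: part of speech -> set of legal words.
LEXICON = {
    "Noun": {"cat", "dog", "mouse", "ball", "rock"},
    "Verb": {"chase", "bite", "fetch", "bat"},
    "Adjective": {"black", "brown", "furry", "striped", "heavy"},
    "Determiner": {"the", "that", "a", "an"},
}


def parts_of_speech(word: str) -> set:
    """Return every part of speech the lexicon lists for `word` (case-insensitive)."""
    w = word.lower()
    return {pos for pos, words in LEXICON.items() if w in words}


def is_legal(word: str) -> bool:
    """A word is legal if the lexicon lists it under at least one part of speech."""
    return bool(parts_of_speech(word))


print(parts_of_speech("The"))  # {'Determiner'}
print(is_legal("banana"))      # False
```

Returning a set rather than a single tag leaves room for words listed under several parts of speech.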
Word-level Knowledge: Issues
• Issue 1: Lexicon size
  – Potentially HUGE!
  – Controlling factor: morphology
    • Store base forms (roots/stems)
    • Use morphological processes to generate / analyze
    • E.g. dog: dog(s); sing: sings, sang, sung, singing, singer, ...
• Issue 2: Lexical ambiguity
  – rock: N/V; dog: N/V
  – “Time flies like a banana”
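The "store base forms, generate / analyze" idea can be sketched as follows: only stems are stored, a few regular suffix rules plus a small irregular table generate surface forms, and analysis simply inverts generation by checking which stems produce a word. This is a hypothetical toy (`STEMS`, `IRREGULAR_PAST`, `generate`, and `analyze` are made-up names, and the rule set covers only the slide's examples); a real system would need far more rules and exceptions.

```python
# Hypothetical toy data: only stems (with part of speech) and irregular forms are stored.
STEMS = {"dog": "N", "cat": "N", "sing": "V", "chase": "V"}
IRREGULAR_PAST = {"sing": ("sang", "sung")}  # stem -> (past, past participle)


def generate(stem: str) -> set:
    """Generate surface forms of a stored stem with a few regular English rules."""
    if STEMS[stem] == "N":
        return {stem, stem + "s"}                          # dog -> dogs
    base = stem[:-1] if stem.endswith("e") else stem       # chase -> chas- before a vowel suffix
    forms = {stem, stem + "s", base + "ing", base + "er"}  # -s, -ing, agentive -er
    if stem in IRREGULAR_PAST:
        forms.update(IRREGULAR_PAST[stem])                 # sing -> sang, sung
    else:
        forms.add(base + "ed")                             # chase -> chased
    return forms


def analyze(word: str) -> set:
    """Analyze a surface form by finding every stored stem that generates it."""
    return {stem for stem in STEMS if word in generate(stem)}


print(sorted(generate("sing")))  # ['sang', 'sing', 'singer', 'singing', 'sings', 'sung']
print(analyze("chased"))         # {'chase'}
```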
Sentence-level Knowledge: Syntax
• Language models
  – More than just words: “banana a flies time like” is not a sentence
  – Formal vs. natural: for a formal language, the grammar defines the language
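Chomsky Hierarchy
• Recursively Enumerable: any rule form
• Context-Sensitive: e.g. AB -> BA; example language $a^n b^n c^n$
• Context-Free: e.g. A -> aBc; example language $a^n b^n$
• Regular: e.g. S -> aS; regular expression a*b*
• Each class contains the classes listed below it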
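The three example languages can be tested with ordinary code. The sketch below writes one membership test per level; it is only illustrative, since the hierarchy concerns which grammar (or automaton) types can generate each language, not what a general-purpose program can check. It assumes $n \ge 1$ for $a^n b^n$ and $a^n b^n c^n$; the function names are made up.

```python
import re


def in_a_star_b_star(s: str) -> bool:
    """a*b*: any a's followed by any b's (regular; a regex / finite automaton suffices)."""
    return re.fullmatch(r"a*b*", s) is not None


def in_anbn(s: str) -> bool:
    """a^n b^n, n >= 1: context-free but not regular (needs unbounded counting)."""
    n = len(s) // 2
    return n >= 1 and s == "a" * n + "b" * n


def in_anbncn(s: str) -> bool:
    """a^n b^n c^n, n >= 1: context-sensitive but not context-free."""
    n = len(s) // 3
    return n >= 1 and s == "a" * n + "b" * n + "c" * n


for s in ["aabb", "aab", "abc", "aabbc"]:
    print(s, in_a_star_b_star(s), in_anbn(s), in_anbncn(s))
```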
Syntactic Analysis: Grammars
• Natural vs. formal languages
  – Natural languages have degrees of acceptability
    • ‘It ain’t hard’; ‘You gave what to whom?’
• Grammar combines words into phrases
  – S -> NP VP
  – NP -> {Det} {Adj} N   ({ } marks an optional constituent)
  – VP -> V | V NP | V NP PP
Syntactic Analysis: Parsing
• Recover phrase structure from sentence
  – Based on grammar
Parse tree for “The black cat chased the furry mouse”:
[S [NP [Det The] [Adj black] [N cat]] [VP [V chased] [NP [Det the] [Adj furry] [N mouse]]]]
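A naive way to recover such a tree is top-down backtracking over the grammar rules. The sketch below is a hypothetical recursive-descent parser for the toy grammar; it assumes the lexicon shown (including the inflected form "chased") and the rule PP -> P NP, which is an addition here. Left-recursive rules such as NP -> NP PP would make this style of parser loop, and without memoization backtracking can take exponential time.

```python
# Hypothetical lexicon for the example sentence (note the inflected verb "chased").
LEXICON = {
    "the": "Det", "a": "Det", "black": "Adj", "furry": "Adj",
    "cat": "N", "mouse": "N", "chased": "V", "with": "P",
}


def word(cat, words, i):
    """Match one word of category `cat` at position i; yields (tree, next_index)."""
    if i < len(words) and LEXICON.get(words[i]) == cat:
        yield (cat, words[i]), i + 1


def optional(cat, words, i):
    """{X}: zero or one word of category `cat`; yields (list_of_trees, next_index)."""
    yield [], i
    for tree, j in word(cat, words, i):
        yield [tree], j


def parse_np(words, i):
    """NP -> {Det} {Adj} N."""
    for det, j in optional("Det", words, i):
        for adj, k in optional("Adj", words, j):
            for noun, m in word("N", words, k):
                yield ("NP", *det, *adj, noun), m


def parse_pp(words, i):
    """PP -> P NP (assumed here; not part of this slide's grammar)."""
    for prep, j in word("P", words, i):
        for obj, k in parse_np(words, j):
            yield ("PP", prep, obj), k


def parse_vp(words, i):
    """VP -> V | V NP | V NP PP."""
    for verb, j in word("V", words, i):
        yield ("VP", verb), j
        for obj, k in parse_np(words, j):
            yield ("VP", verb, obj), k
            for mod, m in parse_pp(words, k):
                yield ("VP", verb, obj, mod), m


def parse(sentence):
    """Return every S -> NP VP tree that covers the whole sentence."""
    words = sentence.lower().split()
    return [("S", subj, pred)
            for subj, i in parse_np(words, 0)
            for pred, j in parse_vp(words, i)
            if j == len(words)]


def bracket(tree):
    """Render a tree in [Label child ...] notation."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return f"[{label} {children[0]}]"
    return f"[{label} " + " ".join(bracket(c) for c in children) + "]"


for t in parse("The black cat chased the furry mouse"):
    print(bracket(t))
```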
Syntactic Analysis: Parsing
• Issue 1: Complexity
  – Solution 1: Chart parser (dynamic programming): $O(G^2 n^3)$ for grammar size $G$ and sentence length $n$
• Issue 2: Structural ambiguity
  – ‘I saw the man on the hill with the telescope’
    • Is the telescope on the hill?
  – Solution 2 (partial): Probabilistic parsing
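A chart parser avoids re-deriving shared sub-spans by storing them in a table. The sketch below uses CKY, one specific chart-parsing algorithm that needs binary (Chomsky normal form) rules, so the slide grammar is converted by hand under the assumptions stated in the comments; Earley-style chart parsers avoid that conversion. Instead of building trees it counts them, which makes the structural ambiguity visible. Its running time is cubic in sentence length times the number of rules. Words missing from the hypothetical `LEXICON` raise `KeyError`.

```python
from collections import defaultdict

# CNF version of the slide grammar (an assumption for this sketch):
#   VP -> V NP PP is binarized through a new symbol NP_PP, and the unary
#   rule NP -> N is folded into the lexicon for the pronoun "I".
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),
    ("VP", "V", "NP"),
    ("VP", "V", "NP_PP"),
    ("NP_PP", "NP", "PP"),
    ("PP", "P", "NP"),
]
LEXICON = {
    "i": {"NP"}, "saw": {"V"}, "the": {"Det"}, "man": {"N"},
    "hill": {"N"}, "telescope": {"N"}, "on": {"P"}, "with": {"P"},
}


def cky_count(sentence, start="S"):
    """Count parse trees; chart[i][j][A] = number of ways A derives words[i:j]."""
    words = sentence.lower().split()
    n = len(words)
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for cat in LEXICON[w]:
            chart[i][i + 1][cat] += 1
    # Spans are filled shortest first, so every sub-span is already complete:
    # O(n^3 * |rules|) work instead of re-parsing shared sub-spans.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for parent, left, right in BINARY_RULES:
                    chart[i][j][parent] += chart[i][k][left] * chart[k][j][right]
    return chart[0][n][start]


print(cky_count("I saw the man with the telescope"))              # 2
print(cky_count("I saw the man on the hill with the telescope"))  # 4
```

These counts are specific to this toy grammar, which lets at most one PP attach directly to a VP; a grammar with, say, VP -> VP PP would give a different count for the longer sentence.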
Semantic Analysis
• Grammatical ≠ meaningful
  – “Colorless green ideas sleep furiously”
• Compositional semantics
  – Meaning of a sentence is composed from the meanings of its subparts
  – Associate a semantic interpretation with each syntactic rule
  – E.g.:
    • Nouns denote themselves: cat, mouse
    • Adjectives: unary predicates: Black(cat), Furry(mouse)
    • Verbs: multi-place predicates; VP: $\lambda x.\, chased(x, Furry(mouse))$
    • Sentence: $(\lambda x.\, chased(x, Furry(mouse)))(Black(cat))$
      – $\Rightarrow chased(Black(cat), Furry(mouse))$
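The composition steps above map naturally onto function application. A minimal sketch, assuming meanings are represented as plain strings and each lexical category gets a hypothetical constructor (`noun`, `adjective`, `transitive_verb`); Python lambdas play the role of the $\lambda$-terms.

```python
# Meanings are built as strings; each constructor mirrors one line of the slide.
def noun(word):
    """Nouns denote themselves."""
    return word


def adjective(pred):
    """An adjective wraps its noun in a unary predicate: cat -> Black(cat)."""
    return lambda n: f"{pred}({n})"


def transitive_verb(pred):
    """A transitive verb consumes its object and returns the VP meaning λx. pred(x, obj)."""
    return lambda obj: (lambda x: f"{pred}({x}, {obj})")


black, furry = adjective("Black"), adjective("Furry")
chased = transitive_verb("chased")

subject = black(noun("cat"))       # Black(cat)
vp = chased(furry(noun("mouse")))  # λx. chased(x, Furry(mouse))
sentence = vp(subject)             # S -> NP VP: apply the VP meaning to the subject
print(sentence)                    # chased(Black(cat), Furry(mouse))
```

The S rule's interpretation is just "apply the VP's function to the NP's meaning", which is the $\beta$-reduction step written on the slide.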
Semantic Ambiguity
• Examples:
  – I went to the bank
    • of the river
    • to deposit some money
  – He banked
    • at First Union
    • the plane
• Interpretation depends on
  – Sentence (or larger) topic context
  – Syntactic structure
Pragmatics & Discourse
• Interpretation in context
  – Act accomplished by utterance
    • “Do you have the time?”, “Can you pass the salt?”
    • Requests with non-literal meaning
  – Also includes politeness, performatives, etc.
• Interpretation of multiple utterances
  – “The cat chased the mouse. It got away.”
  – Resolve referring expressions
Natural Language Understanding
Input -> Tokenization/Morphology -> Parsing -> Semantic Analysis -> Pragmatics/Discourse -> Meaning
• Key issues:
  – Knowledge
    • How to acquire this knowledge of language? Hand-coded? Automatically acquired?
  – Ambiguity
    • How to determine the appropriate interpretation?
    • Pervasive, preference-based
Handling Syntactic Ambiguity
• Natural language syntax
  – Varied, has DEGREES of acceptability
  – Ambiguous
• Probability: framework for preferences
  – Augment original context-free rules: PCFG
  – Add probabilities to rules (transitions)
  – Probabilities of all rules with the same left-hand side sum to 1
    S -> NP VP        0.85
    S -> S conj S     0.15
    NP -> N           0.2
    NP -> Det N       0.65
    NP -> Det Adj N   0.10
    NP -> NP PP       0.05
    VP -> V           0.45
    VP -> V NP        0.45
    VP -> V NP PP     0.10
    PP -> P NP        1.0
PCFGs
• Learning probabilities
  – Strategy 1: Write (manual) CFG
    • Use treebank (collection of parse trees) to find probabilities
  – Strategy 2: Use larger treebank (+ linguistic constraint)
    • Learn rules & probabilities (inside-outside algorithm)
• Parsing with PCFGs
  – Rank parse trees based on probability
  – Provides graceful degradation
    • Can get some parse even for unusual constructions (with a low probability)
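For Strategy 1, the standard estimate from a treebank is relative frequency (maximum likelihood): $P(A \to \beta) = \mathrm{count}(A \to \beta) / \mathrm{count}(A)$. A minimal sketch, assuming trees are nested tuples with (POS, word) leaves and ignoring lexical rules; the two-tree treebank is hypothetical. This does not implement the inside-outside algorithm of Strategy 2.

```python
from collections import Counter


def rules_of(tree):
    """Yield (lhs, rhs) for each phrasal node; (POS, word) leaves are skipped."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return  # lexical rule POS -> word: ignored here
    yield label, tuple(child[0] for child in children)
    for child in children:
        yield from rules_of(child)


def estimate_pcfg(treebank):
    """Relative frequency (maximum likelihood): P(A -> beta) = count(A -> beta) / count(A)."""
    rule_counts = Counter()
    for tree in treebank:
        rule_counts.update(rules_of(tree))
    lhs_counts = Counter()
    for (lhs, _), c in rule_counts.items():
        lhs_counts[lhs] += c
    return {(lhs, rhs): c / lhs_counts[lhs] for (lhs, rhs), c in rule_counts.items()}


# Hypothetical two-tree "treebank".
TREEBANK = [
    ("S", ("NP", ("N", "I")), ("VP", ("V", "slept"))),
    ("S", ("NP", ("Det", "the"), ("N", "cat")),
          ("VP", ("V", "chased"), ("NP", ("Det", "the"), ("N", "mouse")))),
]

for (lhs, rhs), p in sorted(estimate_pcfg(TREEBANK).items()):
    print(f"{lhs} -> {' '.join(rhs)}  {p:.2f}")
```

By construction the estimates for each left-hand side sum to 1, matching the PCFG requirement.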
Parse Ambiguity
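• Two parse trees for “I saw the man with the telescope”
  – T1 (PP attached to the VP: the seeing is done with the telescope):
    [S [NP [N I]] [VP [V saw] [NP [Det the] [N man]] [PP [P with] [NP [Det the] [N telescope]]]]]
  – T2 (PP attached to the NP: the man has the telescope):
    [S [NP [N I]] [VP [V saw] [NP [NP [Det the] [N man]] [PP [P with] [NP [Det the] [N telescope]]]]]]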
Parse Probabilities
$P(T, S) = \prod_{n \in T} p(r(n))$
• T(ree), S(entence), n(ode), r(ule)
  – T1 = 0.85*0.2*0.1*0.65*1*0.65 ≈ 0.0072
  – T2 = 0.85*0.2*0.45*0.05*0.65*1*0.65 ≈ 0.0016
• Select T1
• Best systems achieve 92-93% accuracy
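The products for T1 and T2 can be checked mechanically. The sketch below encodes the PCFG rules as a dictionary and multiplies the probability of the rule used at each internal node; like the slide, it treats lexical (word) probabilities as 1. Trees are nested tuples, an assumed representation.

```python
from math import prod

# Rule probabilities from the PCFG slide: (lhs, rhs) -> p(lhs -> rhs).
PCFG = {
    ("S", ("NP", "VP")): 0.85, ("S", ("S", "conj", "S")): 0.15,
    ("NP", ("N",)): 0.2, ("NP", ("Det", "N")): 0.65,
    ("NP", ("Det", "Adj", "N")): 0.10, ("NP", ("NP", "PP")): 0.05,
    ("VP", ("V",)): 0.45, ("VP", ("V", "NP")): 0.45, ("VP", ("V", "NP", "PP")): 0.10,
    ("PP", ("P", "NP")): 1.0,
}


def tree_prob(tree):
    """P(T, S) = product over internal nodes n of p(r(n)); (POS, word) leaves count as 1."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return 1.0
    rule = (label, tuple(child[0] for child in children))
    return PCFG[rule] * prod(tree_prob(child) for child in children)


the_man = ("NP", ("Det", "the"), ("N", "man"))
with_telescope = ("PP", ("P", "with"), ("NP", ("Det", "the"), ("N", "telescope")))
T1 = ("S", ("NP", ("N", "I")), ("VP", ("V", "saw"), the_man, with_telescope))          # PP on VP
T2 = ("S", ("NP", ("N", "I")), ("VP", ("V", "saw"), ("NP", the_man, with_telescope)))  # PP on NP

print(f"{tree_prob(T1):.4f} {tree_prob(T2):.4f}")  # 0.0072 0.0016
best = max([T1, T2], key=tree_prob)                # T1, the VP attachment
```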
Semantic Ambiguity
• “Plant” ambiguity
  – Botanical vs. manufacturing senses
• Two types of context
  – Local: 1-2 words away
  – Global: several-sentence window
• Two observations (Yarowsky 1995)
  – One sense per collocation (local)
  – One sense per discourse (global)
Learn Disambiguators
• Initialize a small set of “seed” cases
• Collect local context information: “collocations”
  – E.g. 2 words away from “production”, 1 word from “seed”
• Contexts = rules
• Make a decision list (DL) = rules ranked by mutual information
• Iterate: labeling via the DL, collecting contexts
  – Label all entries in a discourse with the majority sense
  – Repeat
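The bootstrapping loop can be sketched for a two-sense word as below. Several choices are assumptions rather than the slide's exact method: features are simply the words inside a pre-cut context window (not position-specific collocations), rules are ranked by a smoothed log-likelihood ratio (the score used in Yarowsky's paper) instead of mutual information, and the smoothing `alpha`, `threshold`, seeds, and examples are all hypothetical.

```python
import math
from collections import Counter, defaultdict

SENSES = ("botanical", "manufacturing")  # hypothetical two-way sense inventory for "plant"


def features(context):
    """Collocation features: the words in a pre-cut window around the target word."""
    return {w.lower() for w in context}


def build_decision_list(examples, labels, alpha=0.1):
    """Rank (feature -> sense) rules by |log((count_A + alpha) / (count_B + alpha))|."""
    counts = defaultdict(Counter)  # feature -> Counter(sense -> count)
    for i, sense in labels.items():
        for f in features(examples[i][1]):
            counts[f][sense] += 1
    rules = []
    for f, c in counts.items():
        score = math.log((c[SENSES[0]] + alpha) / (c[SENSES[1]] + alpha))
        rules.append((abs(score), f, SENSES[0] if score > 0 else SENSES[1]))
    return sorted(rules, reverse=True)  # strongest evidence first


def classify(decision_list, context, threshold=1.0):
    """Apply the first (highest-ranked) matching rule; None if no strong rule matches."""
    feats = features(context)
    for score, f, sense in decision_list:
        if score < threshold:
            break
        if f in feats:
            return sense
    return None


def one_sense_per_discourse(examples, labels):
    """Give every example in a discourse that discourse's majority sense."""
    votes = defaultdict(Counter)
    for i, sense in labels.items():
        votes[examples[i][0]][sense] += 1
    return {i: votes[doc].most_common(1)[0][0]
            for i, (doc, _) in enumerate(examples) if votes[doc]}


def bootstrap(examples, seeds, max_iter=10):
    """Seeds -> decision list -> relabel -> one sense per discourse, until labels settle."""
    labels = {i: sense for i, (_, ctx) in enumerate(examples)
              for seed, sense in seeds.items() if seed in features(ctx)}
    for _ in range(max_iter):
        dl = build_decision_list(examples, labels)
        guesses = {i: s for i, (_, ctx) in enumerate(examples)
                   if (s := classify(dl, ctx)) is not None}
        new_labels = one_sense_per_discourse(examples, guesses)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, build_decision_list(examples, labels)


# Hypothetical (discourse_id, context window) examples for the ambiguous word "plant".
EXAMPLES = [
    (0, ["animal", "and", "life", "forms"]),
    (0, ["green", "leaves", "of", "the"]),
    (1, ["car", "manufacturing", "in", "detroit"]),
    (1, ["workers", "at", "the", "car"]),
    (2, ["leaves", "flowers", "roots"]),
    (3, ["new", "car", "assembly"]),
]
SEEDS = {"life": "botanical", "manufacturing": "manufacturing"}

labels, dl = bootstrap(EXAMPLES, SEEDS)
print(labels)
print(classify(dl, ["car", "factory"]))  # manufacturing
```

In this toy run, example 1 is labeled only through one sense per discourse (it shares discourse 0 with a seed), and the "leaves" rule learned from it then labels example 4 on the next iteration. `classify` is also how a finished decision list labels new, unseen cases.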
Disambiguate
• For each new unlabeled case
  – Use the decision list to label it
• > 95% accurate on a set of highly ambiguous words
  – Also used for accent restoration in e-mail
Natural Language Processing
• Goals: Understand and imitate a distinctive human capacity
• Myriad applications: MT, Q&A, SLS (spoken language systems)
• Key issues:
  – Capturing knowledge of language
    • Automatic acquisition is the current focus: linguistics + ML
  – Resolving ambiguity, managing preference
    • Apply (probabilistic) knowledge
    • Effective in constrained environments