Christel Kemke 2007/08 COMP 4060 Natural Language Processing Introduction And Overview.
Evolution of Human Language
• communication for "work"
• social interaction
• basis of cognition and thinking (Whorf & Sapir)
Communication
"Communication is the intentional exchange of information brought about by the production and perception of signs drawn from a shared system of conventional signs."
[Russell & Norvig, p.651]
Natural Language - General
Natural Language is characterized by
• a common or shared set of signs: alphabet; lexicon
• a systematic procedure to produce combinations of signs: syntax
• a shared meaning of signs and combinations of signs: (constructive) semantics
Natural Language Processing Overview
[Diagram: Speech Recognition; Natural Language Processing (Syntax, Semantics, Pragmatics); Spoken Language]
Natural Language and Speech
Speech Recognition
• acoustic signal as input
• conversion into phonemes and written words
Natural Language Processing
• written text as input; sentences (or 'utterances')
• syntactic analysis: parsing; grammar
• semantic analysis: "meaning", semantic representation
• pragmatics: dialogue; discourse; metaphors
Spoken Language Processing
• transcribed utterances
• phenomena of spontaneous speech
Words
Morphology
A morphological analyzer determines (at least) the stem and ending of a word, and usually delivers related information, such as the word class, the number, and the person of the word.
The morphology can be part of the lexicon or implemented as a separate component, for example as a rule-based system.
eats → eat + s; verb, singular, 3rd person
dog → dog; noun, singular
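The rule-based variant can be sketched in a few lines of Python. This is a minimal illustration, not the course's implementation: the stem list `STEMS`, the function name `analyze`, and the single "-s" rule are assumptions chosen to reproduce the two examples on the slide.

```python
# Hypothetical stem list with word classes (not from the slides).
STEMS = {"eat": "verb", "dog": "noun", "bone": "noun"}


def analyze(word):
    """Split a word into stem + ending and attach basic features.

    Returns None when the word cannot be analyzed with the toy rules.
    """
    word = word.lower()
    if word in STEMS:
        pos = STEMS[word]
        # A bare noun stem is treated as singular; bare verbs get no features.
        features = ("singular",) if pos == "noun" else ()
        return {"stem": word, "ending": "", "pos": pos, "features": features}
    if word.endswith("s") and word[:-1] in STEMS:
        stem = word[:-1]
        pos = STEMS[stem]
        # "-s" marks 3rd person singular on verbs and plural on nouns.
        features = ("singular", "3rd person") if pos == "verb" else ("plural",)
        return {"stem": stem, "ending": "s", "pos": pos, "features": features}
    return None


print(analyze("eats"))
print(analyze("dog"))
```

A realistic analyzer needs many more rules (irregular forms such as "goes", spelling changes), which is one reason morphology is often kept as its own component.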
Lexicon
The Lexicon contains information on words, as inflected forms (e.g. goes, eats) or word stems (e.g. go, eat). The Lexicon usually assigns a syntactic category, the word class or Part-of-Speech category. Sometimes it also contains
• further syntactic information (see Morphology);
• semantic information (e.g. semantic classifications like ‘agent’);
• syntactic-semantic information, e.g. on verb complements: ‘give’ requires a direct object.
Lexicon
Example contents:
eats → verb; singular, 3rd person; can have direct object
dog → dog, noun, singular; animal (semantic annotation)
POS (Part-of-Speech) Tagging
POS Tagging determines word class or ‘part-of-speech’ category (basic syntactic categories) of single words or word-stems.
The → det (determiner)
dog → noun
eat, eats → verb (3rd singular)
the → det
bone → noun
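In its simplest form, tagging is a lexicon lookup. The sketch below assumes a tiny hand-written lexicon (`LEXICON`) and a hypothetical `pos_tag` function; it ignores lexical ambiguity, which a real tagger has to resolve from context.

```python
# Hypothetical mini-lexicon: word form -> part-of-speech category.
LEXICON = {
    "the": "det",
    "dog": "noun",
    "bone": "noun",
    "eat": "verb",
    "eats": "verb",
}


def pos_tag(sentence):
    """Tag each whitespace-separated word by lexicon lookup."""
    return [(word, LEXICON.get(word.lower(), "unknown"))
            for word in sentence.split()]


print(pos_tag("The dog eats the bone"))
```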
Syntax
NLP - Syntactic Analysis
[Diagram: processing pipeline with example results for "eats"]
• Morphological Analyzer: eat + s
• Lexicon: eat – verb, 3rd sing
• Part-of-Speech (POS) Tagging: Verb
• Grammar Rules: VP → Verb Noun
• Parser: VP recognized; parse tree VP (Verb Noun)
Language and Grammar
Natural Language described as Formal Language L using a Formal Grammar G:
• start-symbol S ≡ sentence
• non-terminals NT ≡ syntactic constituents
• terminals T ≡ lexical entries/words
• production rules P ≡ grammar rules
Generate sentences or recognize sentences (Parsing) of the language L through the application of grammar rules from G.
Overgeneration: the grammar accepts or generates sentences that are not in L.
Undergeneration: the grammar does not accept or generate all sentences of L.
Grammar
Terminals can be words, part-of-speech categories, or more complex lexical items (including additional syntactic/semantic information related to the word).
dog: noun, singular; animal
Non-Terminals represent (higher level) ‘syntactic categories’.
Noun, NP (noun phrase), S (sentence)
Grammar
Most often we deal with Context-free Grammars, with a distinguished Start-symbol S (sentence).
det → the
noun → dog | bone
verb → eat | eats
NP → det noun (NP: noun phrase)
VP → verb (VP: verb phrase)
VP → verb NP
S → NP VP (S: sentence)
Here, POS Tagging is included in the grammar.
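To see what this grammar generates, the rules can be written as a Python dictionary and expanded exhaustively. This is only a sketch: the names `GRAMMAR`, `expand`, and `sentences` are made up for illustration, and exhaustive expansion works only because this toy grammar has no recursion.

```python
from itertools import product

# The slide's grammar: symbol -> list of alternative right-hand sides.
GRAMMAR = {
    "det": [["the"]],
    "noun": [["dog"], ["bone"]],
    "verb": [["eat"], ["eats"]],
    "NP": [["det", "noun"]],
    "VP": [["verb"], ["verb", "NP"]],
    "S": [["NP", "VP"]],
}


def expand(symbol):
    """Return every word sequence derivable from the given symbol."""
    if symbol not in GRAMMAR:  # a terminal (word) expands to itself
        return [[symbol]]
    results = []
    for rhs in GRAMMAR[symbol]:
        # Combine every expansion of each right-hand-side symbol.
        for combo in product(*(expand(s) for s in rhs)):
            results.append([word for part in combo for word in part])
    return results


def sentences():
    return [" ".join(words) for words in expand("S")]


for sentence in sentences():
    print(sentence)
```

The output includes "the dog eat the bone": the grammar has no subject-verb agreement, so it overgenerates with respect to English.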
Parsing (here: LR, bottom-up)
Determine the syntactic structure of the sentence:
“the dog eats the bone”
the → det (POS Tagging)
dog → noun
det noun → NP (Rule application)
eats → verb
the → det
bone → noun
det noun → NP
verb NP → VP
NP VP → S
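The derivation above can be reproduced with a small bottom-up recognizer. The sketch below is a backtracking shift-reduce search, not a table-driven LR parser; `LEXICON`, `RULES`, and `parse` are illustrative names, and backtracking is used because the toy grammar has two VP rules that compete after the verb.

```python
LEXICON = {"the": "det", "dog": "noun", "bone": "noun",
           "eat": "verb", "eats": "verb"}

# Grammar rules as (left-hand side, right-hand side).
RULES = [
    ("NP", ("det", "noun")),
    ("VP", ("verb", "NP")),
    ("VP", ("verb",)),
    ("S", ("NP", "VP")),
]


def parse(words):
    """Return the list of shift/reduce steps that yields S, or None."""
    tags = [LEXICON[w] for w in words]  # raises KeyError for unknown words

    def search(stack, pos, steps):
        if pos == len(tags) and stack == ("S",):
            return steps
        # Try every reduction whose right-hand side is on top of the stack.
        for lhs, rhs in RULES:
            if stack[-len(rhs):] == rhs:
                step = f"{' '.join(rhs)} -> {lhs}"
                found = search(stack[:-len(rhs)] + (lhs,), pos, steps + [step])
                if found is not None:
                    return found
        # Otherwise shift the next word's tag onto the stack.
        if pos < len(tags):
            step = f"{words[pos]} -> {tags[pos]}"
            return search(stack + (tags[pos],), pos + 1, steps + [step])
        return None

    return search((), 0, [])


for step in parse("the dog eats the bone".split()):
    print(step)
```

Reducing "eats" to VP immediately leads to a dead end, so the search backs up and shifts "the bone" first; a real LR parser avoids this detour with lookahead and a precomputed table.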
Syntax Analysis / Parsing
Syntactic Structure often represented as Parse Tree.
Connect symbols according to applied grammar rules (like Rewrite Systems).
Parse Tree
[Figure: parse tree with root S; in bracket notation: [S [NP det noun] [VP verb [NP det noun]]]]
Lexical Ambiguity
Several word senses or word categories:
e.g. chase – noun or verb
e.g. plant – ????
Syntactic Ambiguity
Several parse trees:
1) “The dog eats the bone in the park.”
2) “The dog eats the bone in the package.”
Who/what is in the park and who/what is in the package?
Syntactically speaking: where does the Prepositional Phrase "in the ..." attach?
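The two readings can be made explicit by counting parse trees. The sketch below assumes three rules that are not on the slides (PP → prep NP, NP → NP PP, VP → VP PP) plus lexicon entries for "in", "park", and "package"; `count_parses` is a hypothetical helper that counts trees with a CKY-style recursion over spans.

```python
from functools import lru_cache

LEXICON = {"the": "det", "dog": "noun", "bone": "noun", "park": "noun",
           "package": "noun", "eats": "verb", "in": "prep"}

# Binary rules as (parent, left child, right child).
# The PP rules are assumptions added to model prepositional phrases.
RULES = [
    ("S", "NP", "VP"),
    ("NP", "det", "noun"),
    ("NP", "NP", "PP"),   # PP attached to the noun phrase
    ("VP", "verb", "NP"),
    ("VP", "VP", "PP"),   # PP attached to the verb phrase
    ("PP", "prep", "NP"),
]


def count_parses(sentence):
    """Count the distinct parse trees rooted in S for the sentence."""
    tags = tuple(LEXICON[w] for w in sentence.lower().split())

    @lru_cache(maxsize=None)
    def count(symbol, start, end):
        if end - start == 1:
            return 1 if tags[start] == symbol else 0
        total = 0
        for parent, left, right in RULES:
            if parent != symbol:
                continue
            for split in range(start + 1, end):
                total += count(left, start, split) * count(right, split, end)
        return total

    return count("S", 0, len(tags))


print(count_parses("the dog eats the bone"))
print(count_parses("the dog eats the bone in the park"))
```

Both example sentences get two trees; syntax alone cannot decide between them. Choosing "eating in the park" versus "bone in the package" needs semantic knowledge.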
Semantics
Semantic Representation
Represent the meaning of a sentence.
Generate, e.g.
• a logic-based representation or
• a frame-based representation (Fillmore’s case frames)
based on the syntactic structure, lexical entries, and particularly the head verb, which determines how to arrange parts of the sentence and relate them to each other in the semantic representation.
Semantic Representation
Verb-centered representation:
Verb (action, head) is regarded as center of verbal expression and determines the case frame with possible case roles; other parts of the sentence are described in relation to the action as fillers of case slots. (cf. also Schank’s CD Theory)
Typing of case roles is possible (e.g. 'agent' refers to a specific sort or concept, like “humans”)
Frame Representation
Case Frames: verb-centered representation in which the verb (action, head) determines the case frame, and other parts of the sentence fill its case slots (roles).
General Frame for eat
Agent: animate
Action: eat
Patiens: food
Manner: {e.g. fast}
Location: {e.g. in the yard}
Time: {e.g. at noon}
Frame with Fillers
Agent: the dog
Action: eat
Patiens: the bone / the bone in the package
Location: in the park
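A typed case frame can be sketched as a dictionary whose required roles carry a sort restriction. Everything named here is hypothetical: the `SORTS` table stands in for a real concept hierarchy, and `fill_eat_frame` only checks the two typed roles of the general frame for eat.

```python
# Hypothetical sort assignments for a few phrases (not from the slides).
SORTS = {"the dog": "animate", "the bone": "food"}

# Typed roles of the general frame for "eat".
REQUIRED_ROLES = {"Agent": "animate", "Patiens": "food"}
OPTIONAL_ROLES = ("Manner", "Location", "Time")


def fill_eat_frame(**fillers):
    """Build a filled frame for 'eat', checking the sort of typed roles."""
    frame = {"Action": "eat"}
    for role, required_sort in REQUIRED_ROLES.items():
        filler = fillers.get(role)
        if filler is None:
            raise ValueError(f"missing filler for role {role}")
        if SORTS.get(filler) != required_sort:
            raise TypeError(f"{filler!r} is not of sort {required_sort}")
        frame[role] = filler
    for role in OPTIONAL_ROLES:
        if role in fillers:
            frame[role] = fillers[role]
    return frame


print(fill_eat_frame(Agent="the dog", Patiens="the bone",
                     Location="in the park"))
```

Swapping the fillers (Agent="the bone") raises a TypeError, which is the kind of check that role typing makes possible.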
General Frame for drive      Frame with fillers
Agent: animate               Agent: she
Action: drive                Action: drives
Patiens: vehicle             Patiens: the convertible
Manner: {how}                Manner: fast
Location: Loc-spec           Location: [in the] Rocky Mountains
Source: Loc-spec             Source: [from] home
Destination: Loc-spec        Destination: [to the] ASIC conference
Time: Time-spec              Time: [in] August
Pragmatics
Pragmatics
Pragmatics includes context-related aspects of NL expressions (utterances). These are in particular anaphoric references, elliptic expressions, deictic expressions, …
• anaphoric references – refer to items mentioned earlier
• deictic expressions – simulate pointing gestures
• elliptic expressions – incomplete expressions; have to be completed with reference to an item mentioned earlier
Pragmatics
“I put the box on the top shelf.”
“I know that. But I can’t find it there.” (anaphoric reference; deictic expression)
“The candy-box?” (elliptic expression)
Intentions
One philosophical assumption is that natural language is used to achieve something:
“Do things with words.”
The meaning of an utterance is essentially determined by the intention of the speaker.
Intentionality - Examples
What was said: “There is a terrible draft here.”
What was meant: “Can you please close the window.”
What was said: “How does it look here?”
What was meant: “I am really mad; clean up your room.”
What was said: “Will this ever end?”
What was meant: “I would prefer to be with my friends than to sit in class now.”
Metaphors
The meaning of a sentence or expression is not directly inferable from the sentence structure and the word meanings.
Metaphors transfer concepts and relations from one area of discourse into another area.
For example, seeing time as a line (in space) or seeing life as a journey.
Metaphors - Examples
“This car eats a lot of gas.”
“She devoured the book.”
“He was tied up with his clients.”
“Marriage is like a journey.”
“Their marriage was a one-way road into hell.”
see George Lakoff, Women, Fire, and Dangerous Things
Dialogue and Discourse
Discourse / Dialogue Structure
Grammar for various sentence types (speech acts) = dialogue, discourse, story grammar
Distinguish e.g. questions, commands, and statements:
• Where is the remote-control?
• Bring the remote-control!
• The remote-control is on the brown table.
Dialogue Grammars describe possible sequences of speech acts in communication, e.g. that a question is followed by an answer/statement.
Similarly for discourse (e.g. continuous texts).
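A dialogue grammar of this kind can be approximated by a table of allowed speech-act successions. The sketch below is an illustration under strong assumptions: speech acts are guessed from the final punctuation mark, and the `ALLOWED` table (only "a question is followed by a statement" comes from the slide) is invented for the example.

```python
def speech_act(utterance):
    """Guess the speech act from final punctuation (a crude heuristic)."""
    text = utterance.strip()
    if text.endswith("?"):
        return "question"
    if text.endswith("!"):
        return "command"
    return "statement"


# Hypothetical dialogue grammar: speech act -> acts that may follow it.
ALLOWED = {
    "START": {"question", "command", "statement"},
    "question": {"statement"},  # a question is followed by an answer
    "command": {"question", "command", "statement"},
    "statement": {"question", "command", "statement"},
}


def is_valid_dialogue(utterances):
    """Check that each speech act may follow the previous one."""
    previous = "START"
    for utterance in utterances:
        act = speech_act(utterance)
        if act not in ALLOWED[previous]:
            return False
        previous = act
    return True


dialogue = ["Where is the remote-control?",
            "The remote-control is on the brown table."]
print([speech_act(u) for u in dialogue], is_valid_dialogue(dialogue))
```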
Spoken Language Interfaces
Speech
Speech Processing Systems: Types and Characteristics
Speech Recognition vs. Speaker Recognition (Voice Recognition; Speaker Identification)
• speaker-dependent vs. speaker-independent
• training?
• unlimited vs. large vs. small vocabulary
• single word vs. continuous speech
Speech Recognition Phases
• acoustic signal as input
• signal analysis - spectrogram
• feature extraction
• phoneme recognition
• word recognition
• conversion into written words
Video of glottis and speech signal in lingWAVES (http://www.lingcom.de)
Speech Signal Analysis
Analog-Digital Conversion of Acoustic Signal
Sampling in Time Frames (“windows”)
• frequency = 0-crossings per time frame
  e.g. 2 crossings/second is 1 Hz (1 wave)
  e.g. 10 kHz needs sampling rate 20 kHz
• measure amplitudes of signal in time frame: digitized wave form
• separate different frequency components: FFT (Fast Fourier Transform), spectrogram
• other frequency-based representations: LPC (linear predictive coding), Cepstrum
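The arithmetic on this slide can be checked with a short, standard-library-only sketch. The function names and the 5 Hz test tone are assumptions for illustration, and the frequency analysis uses a naive discrete Fourier transform; an FFT computes the same spectrum, only faster.

```python
import cmath
import math


def min_sampling_rate(max_freq_hz):
    """Sampling rate needed to capture max_freq_hz (twice the frequency)."""
    return 2 * max_freq_hz


def sample_cosine(freq_hz, sample_rate_hz, n_samples):
    """Digitize a pure cosine tone: one amplitude per sample point."""
    return [math.cos(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(n_samples)]


def zero_crossing_frequency(samples, sample_rate_hz):
    """Estimate frequency from sign changes: 2 crossings per wave."""
    crossings = sum(1 for a, b in zip(samples, samples[1:])
                    if (a < 0) != (b < 0))
    duration_s = len(samples) / sample_rate_hz
    return crossings / (2 * duration_s)


def dominant_frequency(samples, sample_rate_hz):
    """Find the strongest frequency component with a naive DFT."""
    n = len(samples)

    def magnitude(k):
        return abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                       for i, x in enumerate(samples)))

    peak_bin = max(range(n // 2), key=magnitude)
    return peak_bin * sample_rate_hz / n


frame = sample_cosine(freq_hz=5, sample_rate_hz=200, n_samples=200)
print(min_sampling_rate(10_000))
print(zero_crossing_frequency(frame, 200))
print(dominant_frequency(frame, 200))
```

Zero-crossing counting only works for a single clean tone; speech mixes many frequencies at once, which is why the spectral representations listed above are used instead.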
Waveform and Spectrogram
Phoneme Recognition
Recognition Process based on
• features extracted from spectral analysis
• phonological rules
• statistical properties of language/pronunciation
Recognition Methods
• Hidden Markov Models
• Neural Networks
• Pattern Classification in general
Formants
Pronunciation Networks / Word Models as Probabilistic FAs (HMMs)
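A pronunciation network can be sketched as a probabilistic finite automaton over phoneme symbols. The example word ("tomato"), the phoneme labels, and all probabilities below are invented for illustration and do not come from the slides. Because the states here are directly observable, this is a plain weighted automaton; a full HMM would add emission probabilities that link states to acoustic features.

```python
# Hypothetical word model for "tomato":
# state -> list of (phoneme, next state, probability).
TOMATO = {
    0: [("t", 1, 1.0)],
    1: [("ow", 2, 0.5), ("ax", 2, 0.5)],
    2: [("m", 3, 1.0)],
    3: [("ey", 4, 0.5), ("aa", 4, 0.5)],
    4: [("t", 5, 1.0)],
    5: [("ow", 6, 1.0)],
}
FINAL_STATE = 6


def pronunciation_probability(phonemes, network=TOMATO, final=FINAL_STATE):
    """Multiply arc probabilities along the path spelled by the phonemes.

    Assumes at most one arc per phoneme leaves each state.
    Returns 0.0 if the sequence is not accepted by the network.
    """
    state, probability = 0, 1.0
    for phoneme in phonemes:
        for symbol, next_state, p in network.get(state, []):
            if symbol == phoneme:
                state, probability = next_state, probability * p
                break
        else:
            return 0.0  # no arc for this phoneme in the current state
    return probability if state == final else 0.0


print(pronunciation_probability(["t", "ax", "m", "ey", "t", "ow"]))
print(pronunciation_probability(["t", "ax", "m"]))
```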
Speech Recognizer Architecture
Spoken Language
Spoken Language
Output of Speech Recognition System as input "text".
Can be associated with probabilities for different word sequences.
Contains ungrammatical structures, so-called "disfluencies", e.g. repetitions and corrections.
Spoken Language - Examples
1. no [s-] straight southwest
2. right to [my] my left
3. [that is] that is correct
Robin J. Lickley. HCRC Disfluency Coding Manual. http://www.ling.ed.ac.uk/~robin/maptask/HCRCdsm-01.html
Spoken Language - Disfluency
Reparandum and Repair
[come to] ... walk right to [the] ... the right-hand side of the page
The bracketed words are the reparandum; the words that replace them are the repair.
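When the reparandum is already marked with square brackets, as in these transcripts, a cleaned utterance can be obtained by deleting the marked spans. The sketch below assumes exactly that annotation convention; `remove_reparanda` is a hypothetical helper, and detecting disfluencies in unannotated recognizer output is a much harder problem that it does not address.

```python
import re


def remove_reparanda(utterance):
    """Delete bracketed reparanda and pause marks, keeping the repair."""
    # Drop every [...] span (the reparandum).
    text = re.sub(r"\[[^\]]*\]", " ", utterance)
    # Drop pause marks written as three dots or as a single ellipsis sign.
    text = text.replace("...", " ").replace("\u2026", " ")
    # Collapse the leftover whitespace.
    return " ".join(text.split())


print(remove_reparanda(
    "[come to] ... walk right to [the] ... the right-hand side of the page"))
print(remove_reparanda("no [s-] straight southwest"))
```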
Spoken Language - Example
1. we're going to [g--] ... turn straight back around for testing.
2. [come to] ... walk right to the ... right-hand side of the page.
3. right [up ... past] ... up on the left of the ... white mountain walk ... right up past.
4. [i'm still] ... i've still gone halfway back round the lake again.
Spoken Language - Example
1. [I’d] [d if] I need to go
2. [it’s basi--] see if you go over the old mill
3. [you are going] make a gradual slope … to your right
4. [I’ve got one] I don’t realize why it is there