Christel Kemke 2007/08 COMP 4060 Natural Language Processing Introduction And Overview.




Evolution of Human Language

• communication for "work"
• social interaction
• basis of cognition and thinking (Whorf & Sapir)


Communication

"Communication is the intentional exchange of information brought about by the production and perception of signs drawn from a shared system of conventional signs."

[Russell & Norvig, p.651]


Natural Language - General

Natural language is characterized by
• a common or shared set of signs: alphabet; lexicon
• a systematic procedure to produce combinations of signs: syntax
• a shared meaning of signs and combinations of signs: (constructive) semantics


Natural Language Processing Overview

[Diagram: Speech Recognition; Natural Language Processing (Syntax, Semantics, Pragmatics); Spoken Language]


Natural Language and Speech

Speech Recognition
• acoustic signal as input
• conversion into phonemes and written words

Natural Language Processing
• written text as input; sentences (or 'utterances')
• syntactic analysis: parsing; grammar
• semantic analysis: "meaning", semantic representation
• pragmatics: dialogue; discourse; metaphors

Spoken Language Processing
• transcribed utterances
• phenomena of spontaneous speech


Words


Morphology

A morphological analyzer determines (at least) the stem and ending of a word, and usually delivers related information such as the word class, the number, and the person of the word.

The morphology can be part of the lexicon or be implemented as a separate component, for example as a rule-based system.

eats → eat + s: verb, singular, 3rd person
dog → dog: noun, singular
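A rule-based analyzer of this kind can be sketched in a few lines. This is a minimal sketch, assuming a hypothetical stem table `STEMS` and a single rule for the endings "-s"/"-es". It is not a complete morphology: irregular forms and spelling changes are not handled.

```python
# Hypothetical stem lexicon (stem -> word class); illustrative entries only.
STEMS = {"eat": "verb", "go": "verb", "dog": "noun", "bone": "noun"}


def analyze(word):
    """Return (stem, ending, features) for `word`, or None if it is unknown.

    Only the endings "-es" and "-s" are handled; irregular forms and
    spelling changes (e.g. "flies" -> "fly") are out of scope.
    """
    if word in STEMS:
        features = {"class": STEMS[word]}
        if STEMS[word] == "noun":
            features["number"] = "singular"
        return word, "", features
    for ending in ("es", "s"):
        stem = word[: -len(ending)]
        if word.endswith(ending) and stem in STEMS:
            if STEMS[stem] == "verb":
                # Verb + s: 3rd person singular present tense.
                return stem, ending, {"class": "verb", "number": "singular", "person": 3}
            # Noun + s: plural.
            return stem, ending, {"class": "noun", "number": "plural"}
    return None


print(analyze("eats"))  # ('eat', 's', {'class': 'verb', 'number': 'singular', 'person': 3})
print(analyze("dog"))   # ('dog', '', {'class': 'noun', 'number': 'singular'})
```

The endings are tried longest first, so "goes" is split as go + es, not as goe + s.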


Lexicon

The lexicon contains information on words, either as inflected forms (e.g. goes, eats) or as word stems (e.g. go, eat). The lexicon usually assigns a syntactic category: the word class, or part-of-speech category. Sometimes it also contains
• further syntactic information (see Morphology);
• semantic information (e.g. semantic classifications like ‘agent’);
• syntactic-semantic information, e.g. on verb complements: ‘give’ requires a direct object.


Lexicon

Example contents:

eats: verb; singular, 3rd person; can have a direct object
dog: noun, singular; animal (semantic annotation)
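Lexicon entries like the two above map naturally onto a record type. This is a minimal sketch. The class name `LexEntry`, its field names, and the function `lookup` are assumptions made for illustration; they do not come from any particular NLP library.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class LexEntry:
    category: str                                   # word class / part of speech
    features: Dict[str, str] = field(default_factory=dict)
    semantic_class: Optional[str] = None            # semantic annotation
    takes_direct_object: bool = False               # verb complement information


# The two example entries from the slide.
LEXICON = {
    "eats": LexEntry("verb", {"number": "singular", "person": "3rd"},
                     takes_direct_object=True),
    "dog": LexEntry("noun", {"number": "singular"}, semantic_class="animal"),
}


def lookup(word: str) -> Optional[LexEntry]:
    """Return the lexicon entry for `word`, or None if it is not listed."""
    return LEXICON.get(word)
```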


POS (Part-of-Speech) Tagging

POS tagging determines the word class or ‘part-of-speech’ category (basic syntactic category) of single words or word stems.

the: det (determiner)
dog: noun
eat, eats: verb (eats: 3rd person singular)
the: det
bone: noun
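In its simplest form, tagging is a lookup of each word in a table like the one above. This is a minimal sketch that assumes exactly one category per word. It therefore ignores lexical ambiguity, which real taggers resolve from context (for example with statistical models). The names `POS_LEXICON` and `pos_tag` are hypothetical.

```python
# Word -> POS category, as listed above (one category per word, no ambiguity).
POS_LEXICON = {"the": "det", "dog": "noun", "bone": "noun", "eat": "verb", "eats": "verb"}


def pos_tag(sentence):
    """Tag each whitespace-separated token; unknown words get the tag 'unknown'."""
    return [(token, POS_LEXICON.get(token, "unknown")) for token in sentence.lower().split()]


print(pos_tag("The dog eats the bone"))
# [('the', 'det'), ('dog', 'noun'), ('eats', 'verb'), ('the', 'det'), ('bone', 'noun')]
```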


Syntax


NLP - Syntactic Analysis

Morphological Analyzer → Lexicon → Part-of-Speech (POS) Tagging → Grammar Rules → Parser

Example: eats → eat + s (3rd sing.) → eat: verb → Verb; with the grammar rule VP → Verb Noun, a VP is recognized and represented as a parse tree: [VP Verb Noun]


Language and Grammar

Natural language is described as a formal language L using a formal grammar G:

• start symbol S ≡ sentence
• non-terminals NT ≡ syntactic constituents
• terminals T ≡ lexical entries / words
• production rules P ≡ grammar rules

Generate sentences, or recognize sentences (parsing), of the language L by applying the grammar rules of G.

Overgeneration: the grammar accepts or generates sentences that are not in L. Undergeneration: the grammar does not accept or generate all sentences of L.
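The same definitions can be stated in the usual formal-language notation. The symbols NT, T, P, S and L come from the list above. The notation L(G), meaning the set of terminal strings derivable from S, is the standard textbook convention and does not appear on the slides.

```latex
\begin{aligned}
G &= (NT,\, T,\, P,\, S), \qquad S \in NT,\quad NT \cap T = \emptyset \\
L(G) &= \{\, w \in T^{*} \mid S \Rightarrow_{G}^{*} w \,\} \\
\text{overgeneration:} &\quad L(G) \setminus L \neq \emptyset \\
\text{undergeneration:} &\quad L \setminus L(G) \neq \emptyset
\end{aligned}
```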


Grammar

Terminals can be words, part-of-speech categories, or more complex lexical items (including additional syntactic/semantic information related to the word).

e.g. dog: noun, singular; animal

Non-terminals represent (higher-level) ‘syntactic categories’,

e.g. Noun, NP (noun phrase), S (sentence)


Grammar

Most often we deal with context-free grammars, with a distinguished start symbol S (sentence).

det → the
noun → dog | bone
verb → eat | eats
NP → det noun (NP: noun phrase)
VP → verb (VP: verb phrase)
VP → verb NP
S → NP VP (S: sentence)

Here, POS Tagging is included in the grammar.
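This grammar has no recursive rules, so the language it generates is finite and can be listed completely. This is a minimal sketch. The dictionary encoding of the rules and the function `generate` are assumptions made for illustration.

```python
from itertools import product

# The grammar above: non-terminal -> list of alternative right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["det", "noun"]],
    "VP": [["verb"], ["verb", "NP"]],
    "det": [["the"]],
    "noun": [["dog"], ["bone"]],
    "verb": [["eat"], ["eats"]],
}


def generate(symbol="S"):
    """Yield every word list derivable from `symbol` (terminates: no recursion)."""
    if symbol not in GRAMMAR:              # terminal symbol: a word
        yield [symbol]
        return
    for rhs in GRAMMAR[symbol]:
        # Expand each right-hand-side symbol, then combine all alternatives.
        for parts in product(*(list(generate(s)) for s in rhs)):
            yield [word for part in parts for word in part]


sentences = [" ".join(words) for words in generate()]
print(len(sentences))  # 12
```

The 12 generated sentences include “the dog eat the bone”, which violates subject-verb agreement, and “the bone eats the dog”, which is syntactically well-formed but semantically odd. Both count as overgeneration with respect to English.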


Parsing (here: LR, bottom-up)

Determine the syntactic structure of the sentence:

“the dog eats the bone”

the → det (POS tagging)
dog → noun
det noun → NP (rule application)
eats → verb
the → det
bone → noun
det noun → NP
verb NP → VP
NP VP → S
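The bottom-up steps above can be reproduced by a small shift-reduce parser. This is a minimal sketch: it backtracks over shift/reduce choices instead of consulting a precomputed LR table. Backtracking is needed because reducing "eats" to VP too early (with VP → verb) leads to a dead end. The rules and word tags follow the grammar slide. The function name `parse` and the nested-tuple tree format are assumptions made for illustration.

```python
# Grammar rules (lhs, rhs) and word -> POS table from the grammar slide.
RULES = [
    ("NP", ("det", "noun")),
    ("VP", ("verb", "NP")),
    ("VP", ("verb",)),
    ("S", ("NP", "VP")),
]
LEXICON = {"the": "det", "dog": "noun", "bone": "noun", "eat": "verb", "eats": "verb"}


def parse(words):
    """Bottom-up shift-reduce parse with backtracking; returns a tree or None.

    Trees are nested tuples (category, child, ...); leaves are (tag, word).
    """
    def search(stack, remaining):
        if not remaining and len(stack) == 1 and stack[0][0] == "S":
            return stack[0]
        # Try each rule whose right-hand side matches the top of the stack.
        for lhs, rhs in RULES:
            n = len(rhs)
            if len(stack) >= n and tuple(node[0] for node in stack[-n:]) == rhs:
                result = search(stack[:-n] + [(lhs, *stack[-n:])], remaining)
                if result is not None:
                    return result
        # If no reduction leads to a parse, shift the next word with its tag.
        if remaining:
            word = remaining[0]
            return search(stack + [(LEXICON[word], word)], remaining[1:])
        return None

    return search([], list(words))


print(parse("the dog eats the bone".split()))
```

The printed tree, ('S', ('NP', ('det', 'the'), ('noun', 'dog')), ('VP', ('verb', 'eats'), ('NP', ('det', 'the'), ('noun', 'bone')))), has the same structure as the parse tree for this sentence.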


Syntax Analysis / Parsing

The syntactic structure is often represented as a parse tree, which connects the symbols according to the applied grammar rules (as in rewrite systems).


Parse Tree

[S [NP det noun] [VP verb [NP det noun]]]


Lexical Ambiguity

A word can have several word senses or word categories, e.g.

chase – noun or verb
plant – ????


Syntactic Ambiguity

A sentence can have several parse trees:

1) “The dog eats the bone in the park.”

2) “The dog eats the bone in the package.”

Who or what is in the park, and who or what is in the package?

Syntactically speaking: where do we attach the prepositional phrase “in the ...”?
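Counting parse trees makes this ambiguity concrete. This is a minimal sketch using the CYK algorithm over a toy grammar in Chomsky normal form. The grammar is a hypothetical extension of the earlier example: the rules NP → NP PP, VP → VP PP and PP → prep NP, and the words in, park and package, are added for illustration. The unary rule VP → verb is left out because this form of CYK needs binary rules.

```python
from collections import defaultdict

# Toy grammar in Chomsky normal form: (lhs, left child, right child).
# NP -> NP PP and VP -> VP PP give two possible attachment sites for a PP.
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("NP", "det", "noun"),
    ("NP", "NP", "PP"),
    ("VP", "verb", "NP"),
    ("VP", "VP", "PP"),
    ("PP", "prep", "NP"),
]
LEXICON = {"the": "det", "dog": "noun", "bone": "noun", "park": "noun",
           "package": "noun", "eats": "verb", "in": "prep"}


def count_parses(words, start="S"):
    """Count distinct parse trees for `words` with the CYK algorithm."""
    n = len(words)
    # chart[i][j][X] = number of ways category X can span words[i:j]
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        chart[i][i + 1][LEXICON[word]] = 1
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):          # split point
                for lhs, left, right in BINARY_RULES:
                    chart[i][j][lhs] += chart[i][k][left] * chart[k][j][right]
    return chart[0][n][start]


print(count_parses("the dog eats the bone in the park".split()))     # 2
print(count_parses("the dog eats the bone in the package".split()))  # 2
```

Both sentences get exactly two trees: the PP is attached either to the VP (the eating happens in the park) or to the object NP (the bone is in the package). The grammar alone cannot decide between them; that requires semantic or world knowledge.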


Semantics


Semantic Representation

Represent the meaning of a sentence. Generate, e.g.,
• a logic-based representation, or
• a frame-based representation, such as Fillmore’s case frames,

based on the syntactic structure, the lexical entries, and particularly the head verb, which determines how to arrange the parts of the sentence and relate them to each other in the semantic representation.


Frame Representation

Case frames: a verb-centered representation. The verb (action, head) is regarded as the center of the verbal expression and determines the case frame with its possible case roles; the other parts of the sentence are described in relation to the action, as fillers of case slots (roles) (cf. also Schank’s CD Theory).

Typing of case roles is possible (e.g. 'agent' refers to a specific sort or concept, like “humans”).


General Frame for eat

Agent: animate
Action: eat
Patiens: food
Manner: {e.g. fast}
Location: {e.g. in the yard}
Time: {e.g. at noon}


Frame with Fillers

Agent: the dog
Action: eat
Patiens: the bone / the bone in the package
Location: in the park
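A general frame with typed case roles and its filled instance can be kept in a single data structure. This is a minimal sketch using Python dataclasses. The class `CaseFrame`, the method `fill`, and the `SORTS` table used to type the roles are hypothetical names introduced for illustration. The sort check only looks at a head noun that the caller supplies; it does not extract heads from the parse.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical sort assignments for head nouns (illustrative only).
SORTS = {"dog": "animate", "she": "animate", "bone": "food", "convertible": "vehicle"}


@dataclass
class CaseFrame:
    action: str
    roles: Dict[str, Optional[str]]     # case role -> required sort (None = untyped)
    fillers: Dict[str, str] = field(default_factory=dict)

    def fill(self, role: str, filler: str, head: str) -> None:
        """Put `filler` into case slot `role`, checking the sort of its head noun."""
        if role not in self.roles:
            raise KeyError(f"{self.action!r} has no case role {role!r}")
        required = self.roles[role]
        if required is not None and SORTS.get(head) != required:
            raise TypeError(f"{filler!r} is not of sort {required!r}")
        self.fillers[role] = filler


def eat_frame() -> CaseFrame:
    """General frame for 'eat': typed Agent and Patiens, untyped adjuncts."""
    return CaseFrame("eat", {"Agent": "animate", "Patiens": "food",
                             "Manner": None, "Location": None, "Time": None})


# "The dog eats the bone in the park" (park attached as Location).
frame = eat_frame()
frame.fill("Agent", "the dog", head="dog")
frame.fill("Patiens", "the bone", head="bone")
frame.fill("Location", "in the park", head="park")
print(frame.fillers)
```

Because the Agent slot is typed as animate, trying to fill it with "the bone" raises a `TypeError`. This is one way the typing of case roles can rule out semantically odd readings.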


General Frame for drive

Agent: animate
Action: drive
Patiens: vehicle
Manner: {how}
Location: Loc-spec
Source: Loc-spec
Destination: Loc-spec
Time: Time-spec

Frame with fillers

Agent: she
Action: drives
Patiens: the convertible
Manner: fast
Location: [in the] Rocky Mountains
Source: [from] home
Destination: [to the] ASIC conference
Time: [in] August


Pragmatics


Pragmatics

Pragmatics includes context-related aspects of NL expressions (utterances). These are in particular anaphoric references, elliptic expressions, deictic expressions, …

• anaphoric references: refer to items mentioned earlier
• deictic expressions: simulate pointing gestures
• elliptic expressions: incomplete expressions that have to be completed with reference to an item mentioned earlier
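One simple way to resolve an anaphoric reference is to filter the earlier mentions by agreement and then rank them by salience. This is a minimal sketch under assumptions that are not part of the slides: a hypothetical agreement table, and hypothetical salience weights by grammatical role (subject > object > oblique), with ties broken by recency. For the box dialogue used in the following example, “it” resolves to the box. The box is the direct object, while the shelf only occurs inside a prepositional phrase.

```python
# Hypothetical salience weights by grammatical role (illustrative only).
ROLE_WEIGHT = {"subject": 3, "object": 2, "oblique": 1}
# Hypothetical agreement features: noun or pronoun -> (gender, number).
AGREEMENT = {"box": ("neuter", "sg"), "shelf": ("neuter", "sg"), "it": ("neuter", "sg")}


def resolve(pronoun, mentions):
    """Pick an antecedent for `pronoun` from earlier (noun, role) mentions.

    Candidates must agree with the pronoun in gender and number; among
    those, the most salient role wins, and ties go to the most recent mention.
    """
    target = AGREEMENT[pronoun]
    best, best_key = None, None
    for position, (noun, role) in enumerate(mentions):
        if AGREEMENT.get(noun) != target:
            continue
        key = (ROLE_WEIGHT[role], position)   # salience first, then recency
        if best_key is None or key > best_key:
            best, best_key = noun, key
    return best


# "I put the box on the top shelf.": box is the object, shelf is inside a PP.
mentions = [("box", "object"), ("shelf", "oblique")]
print(resolve("it", mentions))  # box
```

A pure recency heuristic would pick "shelf" here. This is why the sketch ranks candidates by salience first.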


Pragmatics

“I put the box on the top shelf.”

“I know that. But I can’t find it there.”

“The candy-box?”

Annotations: elliptic expression; deictic expression; anaphoric reference


Intentions

One philosophical assumption is that natural language is used to achieve something:

“Do things with words.”

The meaning of an utterance is essentially determined by the intention of the speaker.


Intentionality - Examples

What was said → What was meant

“There is a terrible draft here.” → "Can you please close the window?"

“How does it look here?” → "I am really mad; clean up your room."

"Will this ever end?" → "I would rather be with my friends than sit in class now."


Metaphors

With a metaphor, the meaning of a sentence or expression cannot be inferred directly from the sentence structure and the word meanings.

Metaphors transfer concepts and relations from one area of discourse into another area.

For example, seeing time as a line (in space) or seeing life as a journey.


Metaphors - Examples

“This car eats a lot of gas.”
“She devoured the book.”
“He was tied up with his clients.”
“Marriage is like a journey.”
“Their marriage was a one-way road into hell.”

see George Lakoff, Women, Fire, and Dangerous Things


Dialogue and Discourse


Discourse / Dialogue Structure

Grammars for the various sentence types (speech acts): dialogue, discourse, and story grammars. Distinguish, e.g., questions, commands, and statements:

Where is the remote-control?
Bring the remote-control!
The remote-control is on the brown table.

Dialogue grammars describe possible sequences of speech acts in communication, e.g. that a question is followed by an answer (a statement).

The same holds for discourse (e.g. continuous texts).
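Both ideas, classifying utterances into speech acts and checking sequences of speech acts against a dialogue grammar, can be sketched in Python. This is a minimal sketch. Speech acts are guessed from the final punctuation mark only, and the dialogue grammar is a table of allowed successor acts. The function names and every transition except "a question is followed by an answer/statement" are hypothetical.

```python
def speech_act(utterance):
    """Classify an utterance by its final punctuation mark (a crude heuristic)."""
    text = utterance.strip()
    if text.endswith("?"):
        return "question"
    if text.endswith("!"):
        return "command"
    return "statement"


# Hypothetical dialogue grammar: previous speech act -> allowed next acts.
ALLOWED_NEXT = {
    None: {"question", "command", "statement"},   # start of the dialogue
    "question": {"statement"},                    # a question is followed by an answer
    "command": {"question", "command", "statement"},
    "statement": {"question", "command", "statement"},
}


def well_formed(dialogue):
    """Check that each utterance's speech act may follow the previous one."""
    previous = None
    for utterance in dialogue:
        act = speech_act(utterance)
        if act not in ALLOWED_NEXT[previous]:
            return False
        previous = act
    return True


print(well_formed(["Where is the remote-control?",
                   "The remote-control is on the brown table."]))  # True
```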


Spoken Language Interfaces


Speech


Speech Processing Systems: Types and Characteristics

Speech Recognition vs. Speaker Recognition (Voice Recognition; Speaker Identification)

• speaker-dependent vs. speaker-independent (training?)
• unlimited vs. large vs. small vocabulary
• single word vs. continuous speech


Speech Recognition Phases

• acoustic signal as input
• signal analysis: spectrogram
• feature extraction
• phoneme recognition
• word recognition
• conversion into written words


Video of glottis and speech signal in lingWAVES (http://www.lingcom.de)


Speech Signal Analysis

Analog-digital conversion of the acoustic signal:

• sampling in time frames (“windows”); frequency = zero-crossings per time frame, e.g. 2 crossings per second is 1 Hz (one wave); e.g. a 10 kHz component needs a sampling rate of at least 20 kHz
• measure the amplitudes of the signal in each time frame: digitized wave form
• separate the different frequency components: FFT (Fast Fourier Transform), spectrogram
• other frequency-based representations: LPC (linear predictive coding), cepstrum
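The zero-crossing rule and the sampling-rate example above can be checked numerically. This is a minimal sketch using only the Python standard library. The function names and the 100 Hz tone sampled at 8 kHz are assumptions made for illustration. Counting zero crossings only gives a rough frequency estimate for a single pure tone. Real speech mixes many frequencies, which is why FFT-based spectrograms are used.

```python
import math


def sample_sine(freq_hz, rate_hz, seconds, phase=0.3):
    """Sample a sine wave; the nonzero phase keeps samples off exact zeros."""
    n = int(rate_hz * seconds)
    return [math.sin(2 * math.pi * freq_hz * t / rate_hz + phase) for t in range(n)]


def zero_crossings(samples):
    """Count changes between negative and non-negative consecutive samples."""
    return sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))


def estimate_frequency(samples, rate_hz):
    """Two zero crossings correspond to one full wave (one cycle)."""
    seconds = len(samples) / rate_hz
    return zero_crossings(samples) / 2 / seconds


def min_sampling_rate(max_freq_hz):
    """Nyquist: sample at least twice the highest frequency to be represented."""
    return 2 * max_freq_hz


samples = sample_sine(100, 8000, 1.0)
print(estimate_frequency(samples, 8000))  # about 100.0
print(min_sampling_rate(10_000))          # 20000
```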


Waveform and Spectrogram


Phoneme Recognition

Recognition process based on
• features extracted from spectral analysis
• phonological rules
• statistical properties of language / pronunciation

Recognition methods
• Hidden Markov Models
• Neural Networks
• Pattern Classification in general
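To illustrate the HMM method, here is a minimal Viterbi decoder that finds the most likely state sequence for a sequence of observations. The model is invented for illustration: a two-state left-to-right model for a word like "eat" (states iy and t), the observation labels v and c, and all probabilities. Real recognizers work with continuous acoustic feature vectors and much larger models.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (probability, best state path) for a discrete-observation HMM."""
    # best[s] = (probability of the best path ending in state s, that path)
    best = {s: (start_p[s] * emit_p[s][observations[0]], [s]) for s in states}
    for obs in observations[1:]:
        new_best = {}
        for s in states:
            # Best predecessor for s: maximize path probability times transition.
            prob, path = max(
                (best[prev][0] * trans_p[prev][s], best[prev][1]) for prev in states
            )
            new_best[s] = (prob * emit_p[s][obs], path + [s])
        best = new_best
    return max(best.values())


# Hypothetical left-to-right word model; "v" = vowel-like frame, "c" = closure/burst frame.
states = ["iy", "t"]
start_p = {"iy": 1.0, "t": 0.0}
trans_p = {"iy": {"iy": 0.6, "t": 0.4}, "t": {"iy": 0.0, "t": 1.0}}
emit_p = {"iy": {"v": 0.8, "c": 0.2}, "t": {"v": 0.1, "c": 0.9}}

print(viterbi(["v", "v", "c"], states, start_p, trans_p, emit_p))
```

For the frames v, v, c, the best path is iy, iy, t with probability 0.8 · 0.6 · 0.8 · 0.4 · 0.9 ≈ 0.138.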


Formants


Pronunciation Networks / Word Models as Probabilistic FAs (HMMs)


Speech Recognizer Architecture


Spoken Language


Spoken Language

The output of the speech recognition system serves as input "text".

It can be associated with probabilities for alternative word sequences.

It contains ungrammatical structures, so-called "disfluencies", e.g. repetitions and corrections.


Spoken Language - Examples

1. no [s-] straight southwest

2. right to [my] my left

3. [that is] that is correct

Robin J. Lickley. HCRC Disfluency Coding Manual. http://www.ling.ed.ac.uk/~robin/maptask/HCRCdsm-01.html


Spoken Language - Disfluency

Reparandum and Repair

[come to] ... walk right to [the] ... the right-hand side of the page

(reparandum in brackets, followed by the repair)
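If the reparanda are already marked, as in the bracketed transcripts above, the fluent version of an utterance can be recovered by deleting them. This is a minimal sketch. It assumes that brackets always enclose a reparandum and that "..." or "…" marks a pause. The regular expressions and the function name `clean` are hypothetical. Detecting the reparandum automatically in unannotated speech, which is the hard part, is not attempted here.

```python
import re

# A bracketed span marks the reparandum; the words after it form the repair.
REPARANDUM = re.compile(r"\[[^\]]*\]\s*")
PAUSE = re.compile(r"\.\.\.|…")


def clean(annotated):
    """Delete bracketed reparanda and pause marks, then normalize whitespace."""
    text = REPARANDUM.sub("", annotated)
    text = PAUSE.sub(" ", text)
    return " ".join(text.split())


print(clean("[come to] ... walk right to [the] ... the right-hand side of the page"))
# walk right to the right-hand side of the page
```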


Spoken Language - Example

1. we're going to [g--] ... turn straight back around for testing.

2. [come to] ... walk right to the ... right-hand side of the page.

3. right [up ... past] ... up on the left of the ... white mountain walk ... right up past.

4. [i'm still] ... i've still gone halfway back round the lake again.


Spoken Language - Example

1. [I’d] [d if] I need to go

2. [it’s basi--] see if you go over the old mill

3. [you are going] make a gradual slope … to your right

4. [I’ve got one] I don’t realize why it is there
