Transcript of Fall 2005 Lecture Notes #8

Page 1

Fall 2005

Lecture Notes #8

EECS 595 / LING 541 / SI 661

Natural Language Processing

Page 2

Evaluation of NLP systems

Page 3

The classical pipeline (for supervised learning)

• Training set / dev set / test set
• Dumb baseline
• Intelligent baseline
• Your algorithm
• Human ceiling
• Accuracy/precision/recall (a toy sketch of these metrics follows)
• Multiple references
• Statistical significance
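A minimal Python sketch of the accuracy/precision/recall distinction for a binary labeling task; the toy gold/predicted label lists and the positive-class name are invented for illustration:

```python
def scores(gold, pred, positive="yes"):
    """Accuracy over all items; precision/recall over the positive class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if p == positive and g != positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    accuracy = sum(1 for g, p in zip(gold, pred) if g == p) / len(gold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

gold = ["yes", "no", "yes", "yes", "no"]
pred = ["yes", "yes", "no", "yes", "no"]
print(scores(gold, pred))  # (0.6, 0.667, 0.667): accuracy, precision, recall
```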

Page 4

Special cases

• Document retrieval systems

• Part of speech tagging

• Parsing (a toy sketch of these metrics follows)
– Labeled recall
– Labeled precision
– Crossing brackets
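A hedged sketch of the parsing metrics, with constituents represented as (label, start, end) spans; both toy parses below are invented:

```python
def labeled_prf(gold, pred):
    """Labeled precision and recall over exact (label, start, end) matches."""
    gold, pred = set(gold), set(pred)
    correct = len(gold & pred)
    return correct / len(pred), correct / len(gold)  # precision, recall

def crossing_brackets(gold, pred):
    """Count predicted spans that overlap a gold span without nesting."""
    def crosses(a, b):
        return a[0] < b[0] < a[1] < b[1] or b[0] < a[0] < b[1] < a[1]
    return sum(any(crosses((ps, pe), (gs, ge)) for (_, gs, ge) in gold)
               for (_, ps, pe) in pred)

gold = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5)]
pred = [("S", 0, 5), ("NP", 0, 3), ("VP", 2, 5)]
p, r = labeled_prf(gold, pred)
print(p, r, crossing_brackets(gold, pred))  # 0.667 0.667 1
```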

Page 5

Word classes and part-of-speech tagging

Page 6

Part of speech tagging

• Problems: transport, object, discount, address
• More problems: content
• French examples: est (is / east), président (president / they preside), fils (son / threads)
• “Book that flight” – what is the part of speech associated with “book”?
• POS tagging: assigning parts of speech to words in a text
• Three main techniques: rule-based tagging, stochastic tagging, transformation-based tagging

Page 7

Rule-based POS tagging

• Use a dictionary or FST to find all possible parts of speech for each word

• Use disambiguation rules (e.g., a constraint forbidding the tag sequence ART+V)

• Typically, hundreds of such constraints are designed by hand (a toy lookup sketch follows)
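A toy sketch of the lookup step, with a small hand-built dict standing in for the dictionary or FST; all entries are invented for illustration:

```python
# Each word maps to the set of all parts of speech it can take.
LEXICON = {
    "book": {"NN", "VB"},
    "that": {"DT", "IN", "WDT"},
    "flight": {"NN"},
}

def possible_tags(words):
    """Return every candidate tag set; unknown words get a placeholder."""
    return {w: LEXICON.get(w.lower(), {"UNK"}) for w in words}

print(possible_tags(["Book", "that", "flight"]))
# {'Book': {'NN', 'VB'}, 'that': {'DT', 'IN', 'WDT'}, 'flight': {'NN'}}
```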

Page 8

Example in French

Word       Candidate tags        Gloss of correct tag

<S>        ^                     beginning of sentence
La         rf b nms u            article
teneur     nfs nms               noun feminine singular
moyenne    jfs nfs v1s v2s v3s   adjective feminine singular
en         p a b                 preposition
uranium    nms                   noun masculine singular
des        p r                   preposition
rivières   nfp                   noun feminine plural
,          x                     punctuation
bien_que   cs                    subordinating conjunction
délicate   jfs                   adjective feminine singular
à          p                     preposition
calculer   v                     verb

(“La teneur moyenne en uranium des rivières, bien que délicate à calculer…”: “The average uranium content of rivers, although difficult to calculate…”)

Page 9

Sample rules

BS3 BI1: A BS3 (3rd-person subject personal pronoun) cannot be followed by a BI1 (1st-person indirect personal pronoun). In the example “il nous faut” (“we need”), “il” has the tag BS3MS and “nous” has the tags [BD1P BI1P BJ1P BR1P BS1P]. The negative constraint “BS3 BI1” rules out “BI1P”, leaving only 4 alternatives for the word “nous”.

N K: The tag N (noun) cannot be followed by the tag K (interrogative pronoun); an example in the test corpus is “… fleuve qui …” (“… river that …”). Since “qui” can be tagged both as an E (relative pronoun) and as a K (interrogative pronoun), the tagger chooses E, because an interrogative pronoun cannot follow a noun (N).

R V: A word tagged R (article) cannot be followed by a word tagged V (verb), as in “l’appelle” (“calls him/her”). The word “appelle” can only be a verb, but “l’” can be either an article or a personal pronoun. The rule therefore eliminates the article tag, giving preference to the pronoun.
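A hedged sketch of how such negative constraints might be applied: a candidate tag is dropped when every bigram it forms with the next word's candidates is forbidden. Tag names follow the slides where given; the PRO tag standing in for the pronoun reading of “l’” is invented, since the slide does not name it:

```python
LEXICON = {"l'": {"R", "PRO"},   # article or (stand-in tag) personal pronoun
           "appelle": {"V"}}     # verb only
FORBIDDEN = {("R", "V"),         # R V: article cannot precede verb
             ("BS3", "BI1"),     # BS3 BI1 rule from above
             ("N", "K")}         # N K rule from above

def disambiguate(words):
    tags = [set(LEXICON[w]) for w in words]
    for i in range(len(tags) - 1):
        for a in list(tags[i]):
            # Drop tag a if every continuation from it is forbidden.
            if all((a, b) in FORBIDDEN for b in tags[i + 1]):
                tags[i].discard(a)
    return tags

print(disambiguate(["l'", "appelle"]))  # [{'PRO'}, {'V'}]
```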

Page 10

Confusion matrix (rows: correct tag; columns: tag assigned by the tagger)

      IN    JJ    NN    NNP   RB    VBD   VBN
IN    -     .2                .7
JJ    .2    -     3.3   2.1   1.7   .2    2.7
NN          8.7   -     .2
NNP   .2    3.3   4.1   -     .2
RB    2.2   2.0   .5          -
VBD         .3    .5                -     4.4
VBN         2.8                     2.6   -

Most confusing: NN vs. NNP vs. JJ, VBD vs. VBN vs. JJ
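A small sketch of how such a matrix can be tallied from tagger output; the two toy tag sequences are invented:

```python
from collections import Counter

def confusion(gold, pred):
    """Map each (gold_tag, predicted_tag) pair to its count."""
    return Counter(zip(gold, pred))

gold = ["NN", "VBD", "JJ", "NN", "NNP"]
pred = ["NN", "VBN", "JJ", "JJ", "NN"]
for (g, p), n in sorted(confusion(gold, pred).items()):
    if g != p:
        print(f"{g} mistagged as {p}: {n}")
```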

Page 11

HMM Tagging

• T̂ = argmax_T P(T|W), where T = t_1 t_2 … t_n

• By Bayes’ theorem: P(T|W) = P(T) P(W|T) / P(W)

• Thus we are attempting to choose the tag sequence that maximizes the right-hand side of the equation

• P(W) is the same for every candidate T, so it can be ignored

• P(T) P(W|T) = ?

• P(T) is called the prior; P(W|T) is called the likelihood

Page 12

HMM tagging (cont’d)

• P(T) P(W|T) = Π_i P(w_i | w_1 t_1 … w_{i-1} t_{i-1} t_i) · P(t_i | t_1 … t_{i-2} t_{i-1})

• Simplification 1: P(W|T) ≈ Π_i P(w_i | t_i)

• Simplification 2: P(T) ≈ Π_i P(t_i | t_{i-1})

• T̂ = argmax_T P(T|W) ≈ argmax_T Π_i P(w_i | t_i) P(t_i | t_{i-1}) (a toy scoring sketch follows)
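A toy sketch of the simplified model: a tag sequence is scored as the product of emission and transition probabilities. The two tables hold only the probabilities quoted later in this lecture; everything else (including the fixed previous tag) is invented:

```python
TRANS = {("TO", "VB"): 0.83, ("TO", "NN"): 0.00047}       # P(t_i | t_{i-1})
EMIT = {("VB", "race"): 0.00012, ("NN", "race"): 0.00057}  # P(w_i | t_i)

def score(words, tags, prev="TO"):
    """Product of P(w_i|t_i) * P(t_i|t_{i-1}) over the sequence."""
    p = 1.0
    for w, t in zip(words, tags):
        p *= TRANS.get((prev, t), 0.0) * EMIT.get((t, w), 0.0)
        prev = t
    return p

print(score(["race"], ["VB"]))  # 0.83 * 0.00012
print(score(["race"], ["NN"]))  # 0.00047 * 0.00057
```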

Page 13

Estimates

• P(NN|DT) = C(DT,NN)/C(DT) = 56509/116454 = .49

• P(is|VBZ) = C(VBZ,is)/C(VBZ) = 10073/21627 = .47 (a counting sketch follows)
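A counting sketch of these maximum-likelihood estimates; the eight-word toy corpus is invented (so its relative frequencies come out degenerate), but the formulas are the ones above:

```python
from collections import Counter

corpus = [("the", "DT"), ("race", "NN"), ("is", "VBZ"), ("on", "IN"),
          ("a", "DT"), ("horse", "NN"), ("is", "VBZ"), ("fast", "JJ")]

tag_count = Counter(t for _, t in corpus)
bigram_count = Counter((corpus[i][1], corpus[i + 1][1])
                       for i in range(len(corpus) - 1))
word_tag_count = Counter(corpus)

# P(NN|DT) = C(DT,NN) / C(DT); P(is|VBZ) = C(VBZ,is) / C(VBZ)
print(bigram_count[("DT", "NN")] / tag_count["DT"])
print(word_tag_count[("is", "VBZ")] / tag_count["VBZ"])
```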

Page 14

Example

• Secretariat/NNP is/VBZ expected/VBN to/TO race/VB tomorrow/NR

• People/NNS continue/VBP to/TO inquire/VB the/AT reason/NN for/IN the/AT race/NN for/IN outer/JJ space/NN

• TO: to+VB (to sleep), to+NN (to school)

Page 15

Example

NNP          VBZ   VBN        TO    VB     NR
Secretariat  is    expected   to    race   tomorrow

NNP          VBZ   VBN        TO    NN     NR
Secretariat  is    expected   to    race   tomorrow

Page 16

Example (cont’d)

• P(NN|TO) = .00047
• P(VB|TO) = .83
• P(race|NN) = .00057
• P(race|VB) = .00012
• P(NR|VB) = .0027
• P(NR|NN) = .0012
• P(VB|TO) P(NR|VB) P(race|VB) = .00000027
• P(NN|TO) P(NR|NN) P(race|NN) = .00000000032
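Multiplying the slide's numbers through confirms the comparison; this is plain arithmetic, not part of the original slides:

```python
p_vb = 0.83 * 0.0027 * 0.00012      # P(VB|TO) * P(NR|VB) * P(race|VB)
p_nn = 0.00047 * 0.0012 * 0.00057   # P(NN|TO) * P(NR|NN) * P(race|NN)
print(p_vb)          # ~2.7e-07
print(p_nn)          # ~3.2e-10
print(p_vb / p_nn)   # the VB reading is roughly 800x more likely
```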

Page 17

Decoding

• Finding the sequence of states that is the most likely source of a sequence of observations

• Viterbi decoding (dynamic programming) – finding the optimal sequence of tags

• Input: an HMM and a sequence of words; output: a sequence of states (tags); a compact sketch follows
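A compact Viterbi sketch under the bigram model from the earlier slides; the TRANS/EMIT tables and the start symbol are invented stand-ins:

```python
def viterbi(words, tags, trans, emit, start="<s>"):
    """Dynamic program: best[t] = (prob of best path ending in t, that path)."""
    best = {t: (trans.get((start, t), 0.0) * emit.get((t, words[0]), 0.0), [t])
            for t in tags}
    for w in words[1:]:
        # Extend the best path into each tag t from every previous tag.
        best = {t: max(((p * trans.get((prev, t), 0.0) *
                         emit.get((t, w), 0.0), path + [t])
                        for prev, (p, path) in best.items()),
                       key=lambda x: x[0])
                for t in tags}
    return max(best.values(), key=lambda x: x[0])

TRANS = {("<s>", "DT"): 0.8, ("DT", "NN"): 0.49, ("DT", "VB"): 0.01,
         ("NN", "VB"): 0.2, ("VB", "NN"): 0.3}
EMIT = {("DT", "the"): 0.5, ("NN", "race"): 0.00057, ("VB", "race"): 0.00012}

prob, path = viterbi(["the", "race"], ["DT", "NN", "VB"], TRANS, EMIT)
print(prob, path)  # best path is ['DT', 'NN']
```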

Page 18

Transformation-based learning

• P(NN|race) = .98
• P(VB|race) = .02
• Learned rule: change NN to VB when the previous tag is TO
• Types of rules (a sketch of applying one rule follows the list):
– The preceding (following) word is tagged z
– The word two before (after) is tagged z
– One of the two preceding (following) words is tagged z
– One of the three preceding (following) words is tagged z
– The preceding word is tagged z and the following word is tagged w
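A hedged sketch of applying one learned transformation on top of most-frequent-tag initialization; the initial-tag lexicon is invented:

```python
MOST_FREQUENT = {"to": "TO", "race": "NN", "expected": "VBN",
                 "is": "VBZ", "secretariat": "NNP", "tomorrow": "NR"}

def apply_tbl(words):
    # Initialize every word with its most frequent tag.
    tags = [MOST_FREQUENT.get(w.lower(), "NN") for w in words]
    # Learned rule: change NN to VB when the preceding tag is TO.
    for i in range(1, len(tags)):
        if tags[i] == "NN" and tags[i - 1] == "TO":
            tags[i] = "VB"
    return tags

print(apply_tbl(["Secretariat", "is", "expected", "to", "race", "tomorrow"]))
# ['NNP', 'VBZ', 'VBN', 'TO', 'VB', 'NR']
```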