Page 1:

School of Computing, Faculty of Engineering

PoS-Tagging theory and terminology

COMP3310 Natural Language Processing

Eric Atwell, Language Research Group

(with thanks to Katja Markert, Marti Hearst, and other contributors)

Page 2:

Reminder: PoS-tagging programs

Models behind some example PoS-tagging methods in NLTK:

Hand-coded

Statistical taggers

Brill (transformation-based) tagger

NB: you don’t have to use NLTK – it is just a convenient way to illustrate these methods
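As a minimal illustration of the hand-coded style (in the spirit of, but not identical to, NLTK's RegexpTagger), a tagger can be an ordered list of pattern-to-tag rules; all patterns and tags below are illustrative assumptions, not a full tagset:

```python
import re

# Hand-coded tagging in miniature: an ordered list of (pattern, tag)
# rules, tried top to bottom (patterns and tags are illustrative
# assumptions, not a full tagset).
RULES = [
    (r'^(the|a|an)$', 'AT'),  # articles
    (r'.*ing$', 'VBG'),       # progressive: teaching, eating
    (r'.*ed$', 'VBD'),        # past: walked
    (r'.*s$', 'NNS'),         # plural nouns: bears (wrongly fires on "is")
    (r'.*', 'NN'),            # default: singular noun
]

def hand_coded_tag(word):
    for pattern, tag in RULES:
        if re.match(pattern, word.lower()):
            return tag

print([(w, hand_coded_tag(w)) for w in 'the bear is eating'.split()])
# -> [('the', 'AT'), ('bear', 'NN'), ('is', 'NNS'), ('eating', 'VBG')]
```

Note how "is" is mis-tagged as a plural noun: hand-coded pattern rules are cheap but brittle, which motivates the statistical and transformation-based taggers below.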

Page 3:

Training and Testing of Machine Learning Algorithms

Algorithms that “learn” from data see a set of examples and try to generalize from them.

Training set:

• Examples trained on

Test set:

• Also called held-out data and unseen data

• Use this for evaluating your algorithm

• Must be separate from the training set

• Otherwise, you cheated!

“Gold standard” evaluation set

• A test set that a community has agreed on and uses as a common benchmark. DO NOT USE IT FOR TRAINING OR ROUTINE TESTING – reserve it for the final benchmark evaluation
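The train/held-out discipline above can be sketched as follows; the corpus, the 90/10 split fraction, and the baseline tagger are all toy assumptions:

```python
# Sketch of the train/test discipline: hold out unseen sentences and
# evaluate only on those. Corpus, split, and tagger are toy assumptions.
def split_corpus(tagged_sents, test_fraction=0.1):
    """Hold out the last fraction of sentences as unseen test data."""
    cut = int(len(tagged_sents) * (1 - test_fraction))
    return tagged_sents[:cut], tagged_sents[cut:]

def accuracy(tagger, test_sents):
    """Per-token accuracy, measured on held-out data only."""
    correct = total = 0
    for sent in test_sents:
        words = [w for w, _ in sent]
        gold = [t for _, t in sent]
        pred = tagger(words)
        correct += sum(p == g for p, g in zip(pred, gold))
        total += len(sent)
    return correct / total

corpus = [[('the', 'AT'), ('bear', 'NN')] for _ in range(10)]
train, test = split_corpus(corpus)
baseline = lambda words: ['NN'] * len(words)  # tag everything as noun
print(len(train), len(test))     # -> 9 1
print(accuracy(baseline, test))  # -> 0.5
```

Evaluating on `train` instead of `test` would report inflated accuracy – the "cheating" the slide warns against.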

Page 4:

PoS word classes in English

Word classes, also called syntactic categories or grammatical categories or Parts of Speech

Closed-class type: classes with fixed and few members; function words, e.g. prepositions

Open-class type: large classes with many members and frequent new additions; content words, e.g. nouns

8 major word classes: nouns, verbs, adjectives, adverbs,

prepositions, determiners, conjunctions, pronouns

These hold in English, and in most (perhaps all) natural languages

Page 5:

What properties define “noun”?

Semantic properties: refer to people, places and things

Distributional properties: ability to occur in specific positions, e.g. next to determiners, possessives, adjectives

Morphological properties: most occur in singular and plural

These are properties of a word TYPE,

eg “man” is a noun (usually)

Sometimes a given TOKEN may not meet all these criteria …

The men are happy … the man is happy …

They man the lifeboat (?)

Page 6:

Subcategories

Noun

Proper noun v common noun

Singular v plural

Count v mass (often not distinguished in PoS tagsets)

Some tagsets have further subcategories,

e.g. NNP = common noun with word-initial capital

(e.g. Englishman)

A PoS tagset often encodes morphological categories such as person, number, gender, tense, case, . . .

Page 7:

Verb: action or process

VB present/infinitive teach, eat

VBZ 3rd-person-singular present (s-form) teaches, eats

VBG progressive (ing-form) teaching, eating

VBD past tense taught, ate / VBN past participle taught, eaten

Intransitive he died, transitive she killed him, …

(transitivity usually not marked in PoS-tags)

Auxiliaries: Modal verb e.g. can, must, may

Have, be, do can be auxiliary or main verbs

e.g. I have a present v I have given you a present

Page 8:

Adjective: quality or property (of a thing: noun phrase)

English is simple:

JJ big, JJR comparative bigger, JJT superlative biggest

More features in other languages, eg

Agreement (number, gender) with noun

Before a noun v after “be”

Page 9:

Adverb: quality or property of verb or adjective (or other functions…)

A hodge-podge (!)

General adverb often ends –ly slowly, happily (but NOT early)

Place adverb home, downhill

Time adverb now, tomorrow

Degree adverbs very, extremely, somewhat

Page 10:

Function words

Preposition e.g. in of on for over with (to)

Determiner e.g. this that, article the a

Conjunction e.g. and or but because that

Pronoun e.g. personal pronouns

I we (1st person),

you (2nd person),

he she it they (3rd person)

Possessive pronouns my, your, our, their

WH-pronouns what who whoever

Others: negatives (not), interjections (oh), existential there, …

Page 11:

Parts of “multi word expressions”

Particle – like preposition but “part of” a phrasal verb

I looked up her address v I looked up her skirt

I looked her address up v *I looked her skirt up

Big problem for PoS-tagging: common, and ambiguous

Other multi-word idioms: ditto tags

Page 12:

Bigram Markov Model tagger

Naive Method

1. Get all possible tag sequences of the sentence

2. Compute the probability of each tag sequence given the sentence, using word-tag and tag-bigram probabilities

3. Take the tag sequence with the maximum probability

Problem: This method has exponential complexity!

Solution: Viterbi Algorithm (not discussed in this module)
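The naive method above can be sketched directly; the probability tables are invented toy numbers, and enumerating every tag sequence makes the exponential cost visible:

```python
from itertools import product

# Toy probability tables (all numbers invented for illustration):
# p_word[tag][word] ~ p(word|tag); p_trans[prev][tag] ~ p(tag|prev).
p_word = {
    'AT': {'the': 0.5},
    'NN': {'bear': 0.1, 'move': 0.05},
    'VB': {'bear': 0.01, 'move': 0.1},
}
p_trans = {
    'PERIOD': {'AT': 0.3, 'NN': 0.1, 'VB': 0.1},
    'AT': {'NN': 0.5, 'VB': 0.05, 'AT': 0.01},
    'NN': {'NN': 0.1, 'VB': 0.2, 'AT': 0.1},
    'VB': {'AT': 0.3, 'NN': 0.1, 'VB': 0.05},
}
TAGS = ['AT', 'NN', 'VB']

def naive_best_sequence(words):
    """Steps 1-3 of the naive method: score every tag sequence.
    Cost is O(len(TAGS) ** len(words)) -- exponential in sentence length."""
    best, best_p = None, 0.0
    for seq in product(TAGS, repeat=len(words)):
        p, prev = 1.0, 'PERIOD'  # sentences assumed to follow a period
        for word, tag in zip(words, seq):
            p *= p_word[tag].get(word, 0.0) * p_trans[prev].get(tag, 0.0)
            prev = tag
        if p > best_p:
            best, best_p = seq, p
    return best

print(naive_best_sequence(['the', 'bear']))  # -> ('AT', 'NN')
```

The Viterbi algorithm computes the same maximum in polynomial time by keeping only the best partial sequence per tag at each position.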

Page 13:

N-gram tagger

Uses the preceding N-1 predicted tags

Also uses the unigram estimate for the current word
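A minimal sketch of this idea for N=2, using toy counts in place of a real tagged corpus: each word gets the tag most frequent after the previous predicted tag, falling back to the word's unigram estimate:

```python
from collections import Counter, defaultdict

# Toy tagged training corpus (an assumption standing in for real data).
train = [[('the', 'AT'), ('bear', 'NN'), ('is', 'BEZ')],
         [('to', 'TO'), ('bear', 'VB')]]

bigram = defaultdict(Counter)   # (previous tag, word) -> tag counts
unigram = defaultdict(Counter)  # word -> tag counts
for sent in train:
    prev = '<s>'
    for word, t in sent:
        bigram[(prev, word)][t] += 1
        unigram[word][t] += 1
        prev = t

def tag(words):
    """Greedy left-to-right tagging using the preceding predicted tag."""
    prev, out = '<s>', []
    for w in words:
        counts = bigram.get((prev, w)) or unigram.get(w)
        t = counts.most_common(1)[0][0] if counts else 'NN'  # default NN
        out.append(t)
        prev = t
    return out

print(tag(['the', 'bear']))  # -> ['AT', 'NN']
print(tag(['to', 'bear']))   # -> ['TO', 'VB']
```

The same word ("bear") receives different tags depending on the preceding predicted tag, which is exactly what a unigram tagger cannot do.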

Page 14:

Example

p(AT NN BEZ IN AT NN | The bear is on the move) =
p(the|AT) p(AT|PERIOD) × p(bear|NN) p(NN|AT) × . . . × p(move|NN) p(NN|AT)

p(AT NN BEZ IN AT VB | The bear is on the move) =
p(the|AT) p(AT|PERIOD) × p(bear|NN) p(NN|AT) × . . . × p(move|VB) p(VB|AT)

Page 15:

Bigram tagger: problems

Unknown words in new input

Parameter estimation: need a tagged training text, what if this is different genre/dialect/language-type from new input?

Tokenization of training text and new input: contractions (isn’t), multi-word tokens (New York)

Crude assumptions:

• only very short-distance dependencies are captured

• tags are not conditioned on previous words

Unintuitive

Page 16:

Transformation-based tagging

Markov model tagging: small range of regularities only

Transformation-based (TB) tagging was introduced by Brill, 1995

Encodes more complex interdependencies between words and tags by learning intuitive rules from a training corpus

exploits linguistic knowledge; rules can be tuned manually

Page 17:

Transformation Templates

Templates specify general, admissible transformations:

Change Tag1 to Tag2 if

The preceding (following) word is tagged Tag3

The word two before (after) is tagged Tag3

One of the two preceding (following) words is tagged Tag3

One of the three preceding (following) words is tagged Tag3

The preceding word is tagged Tag3 and the following word is tagged Tag4

The preceding (following) word is tagged Tag3 and the word two before (after) is tagged Tag4
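One way to realise such templates in code is as predicate factories; this encoding is an illustrative assumption, not Brill's own implementation:

```python
# Each template is a predicate factory: given a concrete trigger tag,
# it returns a test over (tag sequence, position). Illustrative
# encoding, not Brill's actual implementation.
def prev_is(tag3):
    return lambda tags, i: i > 0 and tags[i - 1] == tag3

def next_is(tag3):
    return lambda tags, i: i + 1 < len(tags) and tags[i + 1] == tag3

def two_before_is(tag3):
    return lambda tags, i: i > 1 and tags[i - 2] == tag3

def apply_transformation(tags, frm, to, cond):
    """Change frm to to wherever the template condition holds."""
    return [to if t == frm and cond(tags, i) else t
            for i, t in enumerate(tags)]

# "Change VB to NN if the preceding word is tagged AT"
print(apply_transformation(['AT', 'VB'], 'VB', 'NN', prev_is('AT')))
# -> ['AT', 'NN']
```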

Page 18:

Machine Learning Algorithm

Learns rules from a tagged training corpus by specialising the templates

1. Assume you do not know the precise tagging sequence in your training corpus

2. Tag each word in the training corpus with its most frequent tag, e.g. move => VB

3. Consider all possible transformations and apply the one that improves the tagging most (greedy search),

e.g. Change VB to NN if the preceding word is tagged AT

4. Retag whole corpus applying that rule

5. Go back to 3 and repeat until no significant improvements are reached

6. Output all the rules you learnt in order!
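The greedy loop can be sketched on this lecture's running example, restricted to the single template "change Tag1 to Tag2 if the previous tag is Tag3"; the most-frequent-tag lexicon is assumed, following the initialisation shown in the next slides:

```python
# Greedy transformation-based learning, restricted to one template:
# "change Tag1 to Tag2 if the previous tag is Tag3".
gold = [('The', 'AT'), ('bear', 'NN'), ('is', 'BEZ'), ('on', 'IN'),
        ('the', 'AT'), ('move', 'NN'), ('to', 'TO'), ('race', 'VB'),
        ('there', 'RN')]

# Step 2: initialise every word with its most frequent tag
# (frequencies assumed, following the slide: move => VB, race => NN).
most_freq = {'the': 'AT', 'bear': 'VB', 'is': 'BEZ', 'on': 'IN',
             'move': 'VB', 'to': 'TO', 'race': 'NN', 'there': 'RN'}
current = [most_freq[w.lower()] for w, _ in gold]
gold_tags = [t for _, t in gold]
tagset = sorted(set(gold_tags) | set(current))

def apply_rule(tags, frm, to, prev):
    """Apply one instantiated template to a tag sequence."""
    return [to if t == frm and i > 0 and tags[i - 1] == prev else t
            for i, t in enumerate(tags)]

def errors(tags):
    return sum(t != g for t, g in zip(tags, gold_tags))

# Steps 3-5: repeatedly pick the transformation that removes the most
# errors, retag the corpus, and stop when nothing improves.
learnt = []
while True:
    retagged, rule = min(
        ((apply_rule(current, f, t, p), (f, t, p))
         for f in tagset for t in tagset for p in tagset),
        key=lambda x: errors(x[0]))
    if errors(retagged) >= errors(current):
        break
    current, learnt = retagged, learnt + [rule]

print(learnt)           # rules in learnt order
print(errors(current))  # -> 0
```

On this toy corpus the loop recovers exactly the two rules derived on the following slides: change VB to NN after AT, then change NN to VB after TO.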

Page 19:

Example: 1st cycle

First approximation: Initialise with most frequent tag (lexical information)

The/AT

bear/VB

is/BEZ

on/IN

the/AT

move/VB

to/TO

race/NN

there/RN

Page 20:

Change VB to NN if previous tag is AT

Try all possible transformations, choose the most useful one and apply it:

The/AT

bear/NN

is/BEZ

on/IN

the/AT

move/NN

to/TO

race/NN

there/RN

Page 21:

Change NN to VB if previous tag is TO

Try all possible transformations, choose the most useful one and apply it:

The/AT

bear/NN

is/BEZ

on/IN

the/AT

move/NN

to/TO

race/VB

there/RN

Page 22:

Final set of learnt rules

Brill rules corresponding to syntagmatic patterns

1. Change VB to NN if previous tag is AT

2. Change NN to VB if previous tag is TO

Can now be applied to an untagged corpus!

uses pre-encoded linguistic knowledge explicitly

uses wider context + following context

can be expanded to word-driven templates

can be expanded to morphology-driven templates (for unknown words)

learnt rules are intuitive, easy to understand

Page 23:

Combining taggers

Taggers can be combined via backoff: if the first tagger finds no tag (returns None), try the next tagger

This really only makes sense with N-gram taggers:

If trigram tagger finds no tag, backoff to bigram tagger,

if bigram tagger fails then backoff to unigram tagger

Better: combine tagger results by a voting system

Combinatory Hybrid Elementary Analysis of Text

(combines results of morphological analysers / taggers)
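The backoff idea can be sketched as follows; the None-means-no-opinion convention mirrors NLTK's backoff style, but the lookup tables here are toy stand-ins:

```python
# Backoff chain sketch: each tagger returns None when it has no
# opinion, and the combiner falls through to the next tagger.
def make_backoff(*taggers):
    def tag(prev_tags, word):
        for t in taggers:
            result = t(prev_tags, word)
            if result is not None:
                return result
        return None
    return tag

# Toy lookup tables (assumptions standing in for trained models).
bigram_table = {('AT', 'bear'): 'NN'}
unigram_table = {'the': 'AT', 'bear': 'VB'}

bigram = lambda prev, w: bigram_table.get((prev[-1] if prev else '<s>', w))
unigram = lambda prev, w: unigram_table.get(w)
default = lambda prev, w: 'NN'

tagger = make_backoff(bigram, unigram, default)
print(tagger(['AT'], 'bear'))  # -> 'NN'  (bigram fires)
print(tagger([], 'bear'))      # -> 'VB'  (backoff to unigram)
print(tagger([], 'unknown'))   # -> 'NN'  (default)
```

A voting combiner would instead collect every tagger's answer and take a (possibly weighted) majority, rather than stopping at the first non-None result.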