Part of Speech Tagging
Perspectivising NLP: Areas of AI and their inter-dependencies
[Diagram: core areas — Search, Logic, Knowledge Representation, Machine Learning, Planning — feeding the application areas Expert Systems, NLP, Vision and Robotics]
Two pictures
[Diagram 1: NLP layers — Morph Analysis, Part of Speech Tagging, Parsing, Semantics — alongside Vision and Speech]
[Diagram 2: the NLP Trinity — Problem dimension (POS tagging, parsing, semantics); Language dimension (Hindi, English, Marathi, French); Algorithm dimension (HMM, MEMM, CRF; knowledge-based vs. statistics-and-probability based)]
What it is
POS Tagging is a process that attaches to each word in a sentence a suitable tag from a given set of tags.
The set of tags is called the Tag-set.
Standard Tag-set: Penn Treebank (for English).
Definition
Tagging is the assignment of a single part-of-speech tag to each word (and punctuation marker) in a corpus.
“_“ The_DT guys_NNS that_WDT
make_VBP traditional_JJ hardware_NN
are_VBP really_RB being_VBG
obsoleted_VBN by_IN microprocessor-
based_JJ machines_NNS ,_, ”_” said_VBD
Mr._NNP Benton_NNP ._.
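The example above uses the word_TAG annotation convention. A minimal sketch (plain Python, no NLP library) of reading such an annotated string back into (word, tag) pairs:

```python
def parse_tagged(text):
    """Split a Penn-Treebank-style annotated string into (word, tag) pairs.

    Each token is written word_TAG; rsplit on the last underscore also
    handles punctuation tokens such as ,_, correctly."""
    pairs = []
    for token in text.split():
        word, tag = token.rsplit("_", 1)
        pairs.append((word, tag))
    return pairs

print(parse_tagged("The_DT guys_NNS that_WDT make_VBP"))
```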
POS Tags
NN – Noun; e.g. Dog_NN
VM – Main Verb; e.g. Run_VM
VAUX – Auxiliary Verb; e.g. Is_VAUX
JJ – Adjective; e.g. Red_JJ
PRP – Pronoun; e.g. You_PRP
NNP – Proper Noun; e.g. John_NNP
etc.
POS Tag Ambiguity
In English: I bank1 on the bank2 on the river bank3 for my transactions.
bank1 is a verb; the other two banks are nouns.
In Hindi: "khaanaa" can be a noun (food) or a verb (to eat).
For Hindi
Rama achhaa gaata hai. (hai is VAUX :
Auxiliary verb); Ram sings well
Rama achha ladakaa hai. (hai is VCOP :
Copula verb); Ram is a good boy
Process
List all possible tags for each word in the sentence.
Choose the best suitable tag sequence.
Example
”People jump high”.
People : Noun/Verb
jump : Noun/Verb
high : Noun/Verb/Adjective
We can start with probabilities.
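The first step above (listing all possible tags per word) can be sketched by enumerating every candidate tag sequence. The ambiguity classes for "People jump high" are the ones given on the slide; choosing the best sequence is deferred to the probability models that follow:

```python
from itertools import product

# Ambiguity classes from the slide: each word lists its possible tags.
possible_tags = {
    "People": ["N", "V"],
    "jump":   ["N", "V"],
    "high":   ["N", "V", "A"],
}

words = ["People", "jump", "high"]

# Every candidate tag sequence is one choice per word.
candidates = list(product(*(possible_tags[w] for w in words)))
print(len(candidates))   # 2 x 2 x 3 = 12 candidate sequences
for seq in candidates[:3]:
    print(list(zip(words, seq)))
```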
Importance of POS tagging
Ack: presentation by Claire Gardent on POS tagging using NLTK
What is Part of Speech (POS)
Words can be divided into classes that behave similarly.
Traditionally eight parts of speech in English: noun, verb, pronoun, preposition, adverb, adjective, conjunction and article.
More recently larger tag-sets have been used: e.g. Penn Treebank (45 tags), Susanne (353 tags).
Why POS
POS tags tell us a lot about a word (and the words near it). E.g.:
adjectives are often followed by nouns
personal pronouns are often followed by verbs
possessive pronouns are followed by nouns
Pronunciation depends on POS, e.g. object (stress on the first syllable as NN, on the second syllable as VM); similarly content, discount
First step in many NLP applications
Categories of POS: Open and closed classes
Closed classes have a fixed membership of words: determiners, pronouns, prepositions
Closed-class words are usually function words: frequently occurring, grammatically important, often short (e.g. of, it, the, in)
Open classes: nouns, verbs, adjectives and adverbs (allow addition of new words)
Open Class (1/2)
Nouns:
proper nouns (Scotland, BBC)
common nouns: count nouns (goat, glass), mass nouns (snow, pacifism)
Verbs: actions and processes (run, hope)
also auxiliary verbs (is, are, am, will, can)
Open Class (2/2)
Adjectives: properties and qualities (age, colour, value)
Adverbs: modify verbs, verb phrases, or other adverbs
Unfortunately John walked home extremely slowly yesterday
Sentential adverb: unfortunately
Manner adverb: extremely, slowly
Time adverb: yesterday
Closed class
Prepositions: on, under, over, to, with, by
Determiners: the, a, an, some
Pronouns: she, you, I, who
Conjunctions: and, but, or, as, when, if
Auxiliary verbs: can, may, are
Penn tagset (1/2)
Penn tagset (2/2)
Indian Language Tagset: Noun
Indian Language Tagset: Pronoun
Indian Language Tagset: Quantifier
Indian Language Tagset: Demonstrative
3   Demonstrative  DM   DM  DM   Vaha, jo, yaha
3.1 Deictic        DMD  DM  DMD  Vaha, yaha
3.2 Relative       DMR  DM  DMR  jo, jis
3.3 Wh-word        DMQ  DM  DMQ  kis, kaun
3.4 Indefinite     DMI  DM  DMI  KoI, kis
Indian Language Tagset: Verb, Adjective, Adverb
Indian Language Tagset: Postposition, Conjunction
Indian Language Tagset: Particle
Indian Language Tagset: Residuals
Best tag sequence
T* = argmax_T P(T|W) = argmax_T P(T) P(W|T)   (by Bayes' Theorem)

P(T) = P(t0=^ t1 t2 … tn+1=.)
     = P(t0) P(t1|t0) P(t2|t1 t0) P(t3|t2 t1 t0) … P(tn|tn-1 … t0) P(tn+1|tn … t0)
     ≈ P(t0) P(t1|t0) P(t2|t1) … P(tn|tn-1) P(tn+1|tn)
     = ∏_{i=1}^{n+1} P(ti|ti-1)   (Bigram Assumption)

Lexical Probability Assumption
P(W|T) = P(w0|t0…tn+1) P(w1|w0, t0…tn+1) P(w2|w1 w0, t0…tn+1) … P(wn|w0…wn-1, t0…tn+1) P(wn+1|w0…wn, t0…tn+1)
Assumption: a word is determined completely by its tag. This is inspired by speech recognition.
     ≈ P(w0|t0) P(w1|t1) … P(wn+1|tn+1)
     = ∏_{i=0}^{n+1} P(wi|ti)   (Lexical Probability Assumption)
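Under these two assumptions, the score of one candidate tag sequence is just a product of bigram and lexical terms. A toy sketch, using the values from the "Bigram probabilities" and "Lexical Probability" tables elsewhere in these slides; since the tables give no ^ row, the start transition P(t1|^) is assumed uniform here:

```python
# Bigram P(t_i | t_{i-1}) and lexical P(w_i | t_i) values from the slides.
bigram = {
    "N": {"N": 0.2, "V": 0.7, "A": 0.1},
    "V": {"N": 0.6, "V": 0.2, "A": 0.2},
    "A": {"N": 0.5, "V": 0.2, "A": 0.3},
}
lexical = {
    "N": {"People": 1e-5, "jump": 1e-3, "high": 0.4e-7},
    "V": {"People": 1e-7, "jump": 1e-2, "high": 1e-7},
    "A": {"People": 0.0,  "jump": 0.0,  "high": 1e-1},
}

def score(words, tags, p_start=1.0 / 3):
    """P(T) * P(W|T) under the bigram and lexical assumptions.

    p_start stands in for P(t_1 | ^), which the tables do not give
    (assumed uniform over N, V, A here)."""
    p = p_start * lexical[tags[0]][words[0]]
    for i in range(1, len(words)):
        p *= bigram[tags[i - 1]][tags[i]] * lexical[tags[i]][words[i]]
    return p

words = ["People", "jump", "high"]
print(score(words, ["N", "V", "A"]))
print(score(words, ["N", "N", "A"]))  # a weaker competitor
```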
Generative Model
^_^ People_N Jump_V High_R ._.
[Diagram: tag states ^ N V A . connected by bigram probabilities, with words emitted from tags via lexical probabilities]
This model is called a Generative Model. Here words are observed as emissions from tags, which act as states. This is similar to an HMM.
Bigram probabilities
N V A
N 0.2 0.7 0.1
V 0.6 0.2 0.2
A 0.5 0.2 0.3
Lexical Probability
      People    jump      high
N     10^-5     10^-3     0.4×10^-7
V     10^-7     10^-2     10^-7
A     0         0         10^-1
(Values in cells are P(word | tag), i.e. P(column heading | row heading).)
Calculation from Corpus: actual data
^ Ram got many NLP books. He found them all very interesting.
POS tagged: ^ N V A N N . ^ N V N A R A .
Recording numbers
    ^   N   V   A   R   .
^   0   2   0   0   0   0
N   0   1   2   1   0   1
V   0   1   0   1   0   0
A   0   1   0   0   1   1
R   0   0   0   1   0   0
.   1   0   0   0   0   0
Probabilities
    ^   N     V     A     R     .
^   0   1     0     0     0     0
N   0   1/5   2/5   1/5   0     1/5
V   0   1/2   0     1/2   0     0
A   0   1/3   0     0     1/3   1/3
R   0   0     0     1     0     0
.   1   0     0     0     0     0
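The step from the count table to the probability table is row normalization: each transition count is divided by its row total. A small sketch reproducing these numbers from the tagged toy corpus:

```python
from collections import Counter, defaultdict

# Tag sequence of the toy corpus, one ^ per sentence start.
tags = "^ N V A N N . ^ N V N A R A .".split()

# Count each adjacent tag pair (t_{i-1}, t_i).
counts = defaultdict(Counter)
for prev, nxt in zip(tags, tags[1:]):
    counts[prev][nxt] += 1

# Normalize each row by its total to get P(t_i | t_{i-1}).
bigram_prob = {
    prev: {nxt: c / sum(nxts.values()) for nxt, c in nxts.items()}
    for prev, nxts in counts.items()
}

print(bigram_prob["N"])   # e.g. P(V|N) = 2/5
```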
To find
T* = argmax_T P(T) P(W|T)
P(T) · P(W|T) = ∏_{i=1}^{n+1} P(ti|ti-1) · P(wi|ti)
P(ti|ti-1): Bigram probability
P(wi|ti): Lexical probability
Bigram probabilities
N V A R
N 0.15 0.7 0.05 0.1
V 0.6 0.2 0.1 0.1
A 0.5 0.2 0.3 0
R 0.1 0.3 0.5 0.1
Lexical Probability
      People    jump      high
N     10^-5     10^-3     0.4×10^-7
V     10^-7     10^-2     10^-7
A     0         0         10^-1
R     0         0         0
(Values in cells are P(word | tag), i.e. P(column heading | row heading).)
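Putting the two tables together, the argmax over tag sequences can be computed exactly with the Viterbi dynamic program, the standard decoder for this kind of generative (HMM-like) model. A sketch using the bigram and lexical tables above; as before, a uniform start distribution stands in for the missing ^ row (an assumption, not part of the slides' tables):

```python
# Transition P(t_i | t_{i-1}) and emission P(w_i | t_i) from the tables above.
trans = {
    "N": {"N": 0.15, "V": 0.7, "A": 0.05, "R": 0.1},
    "V": {"N": 0.6,  "V": 0.2, "A": 0.1,  "R": 0.1},
    "A": {"N": 0.5,  "V": 0.2, "A": 0.3,  "R": 0.0},
    "R": {"N": 0.1,  "V": 0.3, "A": 0.5,  "R": 0.1},
}
emit = {
    "N": {"People": 1e-5, "jump": 1e-3, "high": 0.4e-7},
    "V": {"People": 1e-7, "jump": 1e-2, "high": 1e-7},
    "A": {"People": 0.0,  "jump": 0.0,  "high": 1e-1},
    "R": {"People": 0.0,  "jump": 0.0,  "high": 0.0},
}
TAGS = ["N", "V", "A", "R"]

def viterbi(words, start=0.25):
    """Best tag sequence under the bigram + lexical model.

    start stands in for P(t_1 | ^), assumed uniform over the four tags."""
    # best[t] = (probability of the best path ending in tag t, that path)
    best = {t: (start * emit[t][words[0]], [t]) for t in TAGS}
    for w in words[1:]:
        best = {
            t: max(
                ((p * trans[prev][t] * emit[t][w], path + [t])
                 for prev, (p, path) in best.items()),
                key=lambda x: x[0],
            )
            for t in TAGS
        }
    return max(best.values(), key=lambda x: x[0])

prob, path = viterbi(["People", "jump", "high"])
print(path, prob)   # People_N jump_V high_A
```

The dynamic program keeps, for each tag, only the single best path ending in that tag, so the cost is linear in sentence length rather than exponential in the number of candidate sequences.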