Part-of-Speech Tagging (Foundations of Statistical NLP, Chapter 10)
Contents

- Markov Model Taggers
- Hidden Markov Model Taggers
- Transformation-Based Learning of Tags
- Tagging Accuracy and Uses of Taggers
Markov Model Taggers

Markov properties:
- Limited horizon: $P(X_{i+1} = t^j \mid X_1, \ldots, X_i) = P(X_{i+1} = t^j \mid X_i)$
- Time invariant: $P(X_{i+1} = t^j \mid X_i) = P(X_2 = t^j \mid X_1)$

cf. Wh-extraction (Chomsky), a long-distance dependency that the limited horizon assumption cannot capture:
a. Should Peter buy a book?
b. Which book should Peter buy?
Markov Model Taggers

The probabilistic model: finding the best tagging $t_{1,n}$ for a sentence $w_{1,n}$:

$\hat{t}_{1,n} = \arg\max_{t_{1,n}} P(t_{1,n} \mid w_{1,n}) = \arg\max_{t_{1,n}} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})$

ex: P(AT NN BEZ IN AT VB | The bear is on the move)
Markov Model Taggers

Assumptions:
- words are independent of each other
- a word's identity only depends on its tag

By Bayes' rule, and since the denominator $P(w_{1,n})$ does not affect the argmax:

$\arg\max_{t_{1,n}} P(t_{1,n} \mid w_{1,n}) = \arg\max_{t_{1,n}} \frac{P(w_{1,n} \mid t_{1,n})\, P(t_{1,n})}{P(w_{1,n})} = \arg\max_{t_{1,n}} P(w_{1,n} \mid t_{1,n})\, P(t_{1,n})$

Applying the assumptions and the limited horizon property:

$P(w_{1,n} \mid t_{1,n})\, P(t_{1,n}) = \prod_{i=1}^{n} P(w_i \mid t_i) \cdot P(t_n \mid t_{n-1})\, P(t_{n-1} \mid t_{n-2}) \cdots P(t_2 \mid t_1) = \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})$
Markov Model Taggers

Training (maximum likelihood estimates from a tagged corpus):

for all tags t^j do
    for all tags t^k do
        P(t^k | t^j) := C(t^j, t^k) / C(t^j)
    end
end
for all tags t^j do
    for all words w^l do
        P(w^l | t^j) := C(w^l : t^j) / C(t^j)
    end
end
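The training loops above can be sketched in Python by counting tag bigrams and word-tag pairs from a tagged corpus. This is a minimal sketch: the `<s>` start-of-sentence tag and the absence of smoothing are assumptions, not part of the original pseudocode.

```python
from collections import defaultdict

def train_bigram_tagger(tagged_sentences):
    """MLE training: P(t^k|t^j) = C(t^j,t^k)/C(t^j) and
    P(w^l|t^j) = C(w^l:t^j)/C(t^j), estimated from (word, tag) sequences."""
    tag_count = defaultdict(int)        # C(t^j)
    trans_count = defaultdict(int)      # C(t^j, t^k)
    emit_count = defaultdict(int)       # C(w^l : t^j)
    for sent in tagged_sentences:
        prev = "<s>"                    # assumed start-of-sentence tag
        tag_count[prev] += 1
        for word, tag in sent:
            trans_count[(prev, tag)] += 1
            emit_count[(word, tag)] += 1
            tag_count[tag] += 1
            prev = tag
    trans = {jk: c / tag_count[jk[0]] for jk, c in trans_count.items()}
    emit = {wt: c / tag_count[wt[1]] for wt, c in emit_count.items()}
    return trans, emit

corpus = [[("the", "AT"), ("bear", "NN"), ("is", "BEZ")]]
trans, emit = train_bigram_tagger(corpus)
```

With this one-sentence toy corpus every observed event has probability 1.0; a real corpus would of course give fractional estimates.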
Transition counts C(t^j, t^k) (row = first tag, column = second tag):

            AT     BEZ    IN     NN     VB   PERIOD
AT          0      0      0      48636  0    19
BEZ         1973   0      426    187    0    38
IN          43322  0      1325   17314  0    185
NN          1067   3720   42470  11773  614  21392
VB          6072   42     4758   1476   129  1522
PERIOD      8016   75     4656   1329   954  0

Emission counts C(w^l : t^j) (row = word, column = tag):

            AT     BEZ    IN     NN   VB   PERIOD
bear        0      0      0      10   43   0
is          0      10065  0      0    0    0
move        0      0      0      36   133  0
on          0      0      5484   0    0    0
president   0      0      0      382  0    0
progress    0      0      0      108  4    0
the         69016  0      0      0    0    0
.           0      0      0      0    0    48809
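The MLE formulas can be applied directly to these counts. A caveat for the sketch below: the tag totals C(t^j) are approximated by summing each transition row over only the columns shown, so the resulting numbers are illustrative rather than the true corpus values.

```python
# Transition and emission counts copied from the tables above (a subset).
row_at = [0, 0, 0, 48636, 0, 19]            # row AT of the transition table
row_nn = [1067, 3720, 42470, 11773, 614, 21392]  # row NN

c_at = sum(row_at)                  # ~C(AT), approximated from shown columns
c_nn = sum(row_nn)                  # ~C(NN)

p_nn_given_at = 48636 / c_at        # P(NN | AT) = C(AT, NN) / C(AT)
p_bear_given_nn = 10 / c_nn         # P(bear | NN) = C(bear : NN) / C(NN)
```

A determiner (AT) is almost always followed by a noun here, while "bear" accounts for only a tiny fraction of NN occurrences.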
Markov Model Taggers

Tagging: the Viterbi algorithm
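Viterbi decoding for the bigram model can be sketched as follows, assuming transition and emission probabilities stored in dictionaries. The toy probabilities in the usage example are hypothetical; a production version would work in log space to avoid underflow.

```python
def viterbi(words, tags, trans, emit, start="<s>"):
    """Find argmax over tag sequences of prod_i P(w_i|t_i) P(t_i|t_{i-1})
    by dynamic programming (a sketch; raw probabilities, not log-probs)."""
    # delta[t] = probability of the best tag sequence so far ending in t
    delta = {t: trans.get((start, t), 0.0) * emit.get((words[0], t), 0.0)
             for t in tags}
    back = []                                   # backpointers per position
    for w in words[1:]:
        prev = delta
        delta, bp = {}, {}
        for t in tags:
            best = max(tags, key=lambda s: prev[s] * trans.get((s, t), 0.0))
            bp[t] = best
            delta[t] = (prev[best] * trans.get((best, t), 0.0)
                        * emit.get((w, t), 0.0))
        back.append(bp)
    # follow backpointers from the best final tag
    t = max(tags, key=lambda s: delta[s])
    path = [t]
    for bp in reversed(back):
        t = bp[t]
        path.append(t)
    return list(reversed(path))

tags = ["AT", "NN"]
trans = {("<s>", "AT"): 1.0, ("AT", "NN"): 1.0}    # hypothetical toy numbers
emit = {("the", "AT"): 1.0, ("bear", "NN"): 0.5}
path = viterbi(["the", "bear"], tags, trans, emit)
```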
Variations

Models for unknown words:
1. assume they can be any part of speech
2. use morphological cues (capitalization, endings, hyphenation) to make inferences about the possible parts of speech:

$P(w^l \mid t^j) = \frac{1}{Z}\, P(\text{unknown word} \mid t^j)\, P(\text{capitalized} \mid t^j)\, P(\text{endings/hyph} \mid t^j)$

Z: normalization constant
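The feature combination can be sketched as below. All of the feature tables (`p_unk`, `p_cap`, `p_end`), the two-character suffix choice, and passing Z in as a precomputed constant are assumptions for illustration, not details from the original.

```python
def unknown_word_emission(word, tag, p_unk, p_cap, p_end, Z=1.0):
    """Sketch of P(w|t) for an unseen word: (1/Z) * P(unknown word|t)
    * P(capitalized|t) * P(suffix|t), with hypothetical feature tables."""
    # use P(capitalized|t) or its complement depending on the word's shape
    cap = p_cap[tag] if word[:1].isupper() else 1.0 - p_cap[tag]
    return p_unk[tag] * cap * p_end.get((word[-2:], tag), 0.0) / Z

p = unknown_word_emission("running", "NN",
                          p_unk={"NN": 0.3},
                          p_cap={"NN": 0.1},
                          p_end={("ng", "NN"): 0.2})
```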
Variations

- Trigram taggers
- Interpolation:
  $P(t_i \mid t_{1,i-1}) = \lambda_1 P(t_i) + \lambda_2 P(t_i \mid t_{i-1}) + \lambda_3 P(t_i \mid t_{i-2,i-1})$
- Variable Memory Markov Model (VMMM)
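The interpolated estimate is a weighted sum of unigram, bigram, and trigram tag probabilities. In the sketch below the lambda values are assumptions; in practice they are set from held-out data (e.g. by deleted interpolation) and must sum to 1.

```python
def interp_trigram(t, t_prev, t_prev2, uni, bi, tri,
                   lambdas=(0.1, 0.3, 0.6)):
    """P(t_i|t_{i-2},t_{i-1}) ~ l1*P(t_i) + l2*P(t_i|t_{i-1})
    + l3*P(t_i|t_{i-2},t_{i-1}); lambda weights are illustrative."""
    l1, l2, l3 = lambdas
    return (l1 * uni.get(t, 0.0)
            + l2 * bi.get((t_prev, t), 0.0)
            + l3 * tri.get((t_prev2, t_prev, t), 0.0))

# toy probability tables (hypothetical numbers)
p = interp_trigram("NN", "AT", "IN",
                   uni={"NN": 0.2},
                   bi={("AT", "NN"): 0.5},
                   tri={("IN", "AT", "NN"): 0.9})
```

Even when the trigram ("IN", "AT", "NN") was never observed, the bigram and unigram terms keep the estimate nonzero, which is the point of interpolating.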
Variations

Smoothing (transition probabilities):

$P(t^j \mid t^{j-1}) = (1 - \epsilon)\, \frac{C(t^{j-1}, t^j)}{C(t^{j-1})} + \epsilon$

Smoothing (lexical probabilities, add-one):

$P(t^j \mid w^l) = \frac{C(t^j, w^l) + 1}{C(w^l) + K_l}$

$K_l$: the number of possible parts of speech of $w^l$

Reversibility: decoding can equivalently run left-to-right or right-to-left, since the chain decomposition telescopes into a symmetric form:

$P(t_1)\, P(t_2 \mid t_1) \cdots P(t_n \mid t_{n-1}) = \frac{P(t_{1,2})\, P(t_{2,3}) \cdots P(t_{n-1,n})}{P(t_2)\, P(t_3) \cdots P(t_{n-1})} = P(t_n)\, P(t_{n-1} \mid t_n) \cdots P(t_1 \mid t_2)$
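The add-one lexical smoothing formula can be sketched with the "move" counts from the emission table (36 as NN, 133 as VB, so C(move) = 169 and K = 2 possible tags); representing the count tables as plain dictionaries is an assumption of the sketch.

```python
def smoothed_lexical(tag, word, count_tw, count_w, K):
    """P(t^j|w^l) = (C(t^j, w^l) + 1) / (C(w^l) + K_l), where K_l is the
    number of parts of speech allowed for w^l (add-one smoothing)."""
    return (count_tw.get((tag, word), 0) + 1) / (count_w[word] + K[word])

count_tw = {("NN", "move"): 36, ("VB", "move"): 133}   # from the table above
count_w = {"move": 169}                                # 36 + 133
K = {"move": 2}                                        # NN and VB allowed

p_nn = smoothed_lexical("NN", "move", count_tw, count_w, K)
```

Unseen tag/word pairs get probability 1 / (C(w) + K) instead of zero, so no admissible tag is ruled out entirely.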
Variations

Sequence vs. tag-by-tag maximization:

Time flies like an arrow.
a. NN VBZ RB AT NN.  P(.) = 0.01
b. NN NNS VB AT NN.  P(.) = 0.01

In practice there is no large difference in accuracy between maximizing the whole sequence and maximizing each tag separately.
Hidden Markov Model Taggers

Used when we have no tagged training data.

Initializing all parameters with dictionary information:
- Jelinek's method
- Kupiec's method
Hidden Markov Model Taggers

Jelinek's method: initialize the HMM with the MLE for $P(w^l \mid t^j)$, assuming that words occur equally likely with each of their possible tags:

$b_{j.l} = \frac{b^*_{j.l}\, C(w^l)}{\sum_{w^m} b^*_{j.m}\, C(w^m)}$

$b^*_{j.l} = \begin{cases} 0 & \text{if } t^j \text{ is not a part of speech allowed for } w^l \\ \dfrac{1}{T(w^l)} & \text{otherwise} \end{cases}$

$T(w^l)$: the number of tags allowed for $w^l$
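Jelinek's initialization can be sketched as below, given a tag dictionary and raw word counts; the toy dictionary and counts in the usage example are hypothetical.

```python
def jelinek_init(dictionary, counts):
    """Initialize emission probabilities b_{j.l} = P(w^l|t^j), assuming a
    word is equally likely under each of its allowed tags.
    dictionary: word -> set of allowed tags; counts: word -> C(w^l)."""
    # b*_{j.l} = 1/T(w^l) for allowed tags, 0 otherwise (zeros omitted)
    bstar = {(t, w): 1.0 / len(tags)
             for w, tags in dictionary.items() for t in tags}
    # normalize per tag: b_{j.l} = b*_{j.l} C(w^l) / sum_m b*_{j.m} C(w^m)
    totals = {}
    for (t, w), b in bstar.items():
        totals[t] = totals.get(t, 0.0) + b * counts[w]
    return {(t, w): b * counts[w] / totals[t] for (t, w), b in bstar.items()}

d = {"the": {"AT"}, "a": {"AT"}, "bear": {"NN", "VB"}}   # toy dictionary
c = {"the": 60, "a": 40, "bear": 10}                     # toy counts
b = jelinek_init(d, c)
```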
Hidden Markov Model Taggers

Kupiec's method: group all words with the same set of possible parts of speech into 'metawords' $u_L$, so as not to fine-tune parameters for each individual word:

$u_L = \{ w^l \mid t^j \text{ is allowed for } w^l \Leftrightarrow j \in L \}, \quad L \subseteq \{1, \ldots, T\}$

$b_{j.L} = \frac{b^*_{j.L}\, C(u_L)}{\sum_{u_{L'}} b^*_{j.L'}\, C(u_{L'})}$

$b^*_{j.L} = \begin{cases} 0 & \text{if } j \notin L \\ \dfrac{1}{|L|} & \text{otherwise} \end{cases}$
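The metaword grouping itself is simple: words are partitioned by their allowed-tag sets. A minimal sketch (the toy dictionary is hypothetical):

```python
def group_metawords(dictionary):
    """Group words whose allowed-tag sets are identical into metawords u_L
    (Kupiec's method); dictionary: word -> set of allowed tags."""
    groups = {}
    for word, tags in dictionary.items():
        # frozenset makes the tag set L usable as a dictionary key
        groups.setdefault(frozenset(tags), []).append(word)
    return groups

d = {"bear": {"NN", "VB"}, "move": {"NN", "VB"}, "the": {"AT"}}
groups = group_metawords(d)
```

"bear" and "move" share the ambiguity class {NN, VB}, so they land in the same metaword and share emission parameters during training.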
Hidden Markov Model Taggers

Training: after initialization, the HMM is trained using the Forward-Backward algorithm.

Tagging: same as for the VMM (the Viterbi algorithm).

! The difference between VMM tagging and HMM tagging is in how we train the model, not in how we tag.
Hidden Markov Model Taggers

The effect of initialization on the HMM (the overtraining problem); initialization conditions compared:

D0: maximum likelihood estimates from a tagged training corpus
D1: correct ordering only of lexical probabilities
D2: lexical probabilities proportional to overall tag probabilities
D3: equal lexical probabilities for all tags admissible for a word
T0: maximum likelihood estimates from a tagged training corpus
T1: equal probabilities for all transitions
When to use which method:
- Use the Visible Markov Model when a sufficiently large training text similar to the intended text of application is available.
- Run Forward-Backward for a few iterations when there is no training text, or training and test text are very different, but at least some lexical information is available.
- Run Forward-Backward for a larger number of iterations when there is no lexical information.