Part-of-Speech Tagging
Foundations of Statistical NLP
CHAPTER 10
Contents
• Markov Model Taggers
• Hidden Markov Model Taggers
• Transformation-Based Learning of Tags
• Tagging Accuracy and Uses of Taggers
Markov Model Taggers
Markov properties:
• Limited horizon: $P(X_{i+1} = t^j \mid X_1, \ldots, X_i) = P(X_{i+1} = t^j \mid X_i)$
• Time invariant: $P(X_{i+1} = t^j \mid X_i) = P(X_2 = t^j \mid X_1)$

cf. Wh-extraction (Chomsky):
a. Should Peter buy a book?
b. Which book should Peter buy?
The fronted *which book* depends on *buy* across the whole clause, a long-distance dependency a limited-horizon model cannot capture; the Markov assumptions are a useful approximation, not a fact about language.
Markov Model Taggers
The probabilistic model: finding the best tagging $t_{1,n}$ for a sentence $w_{1,n}$:

$$\hat{t}_{1,n} = \arg\max_{t_{1,n}} P(t_{1,n} \mid w_{1,n}) = \arg\max_{t_{1,n}} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})$$

e.g., P(AT NN BEZ IN AT VB | The bear is on the move.)
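Read literally, the argmax enumerates every candidate tag sequence and scores it with the product above. A minimal brute-force sketch of that reading (the `trans`, `emit`, and `candidates` tables are made-up toy values, not the Brown-corpus estimates shown later; the Viterbi algorithm on a later slide computes the same maximum efficiently):

```python
from itertools import product

# Hypothetical toy parameters; PERIOD serves as the tag before the sentence.
trans = {("PERIOD", "AT"): 0.4, ("AT", "NN"): 0.9, ("AT", "VB"): 0.05,
         ("NN", "BEZ"): 0.2}
emit = {("AT", "the"): 0.6, ("NN", "bear"): 0.002, ("VB", "bear"): 0.001,
        ("BEZ", "is"): 0.9}
candidates = {"the": ["AT"], "bear": ["NN", "VB"], "is": ["BEZ"]}

def best_tagging(words):
    """argmax over all tag sequences of prod_i P(w_i|t_i) * P(t_i|t_{i-1})."""
    best, best_p = None, 0.0
    for tags in product(*(candidates[w] for w in words)):
        p, prev = 1.0, "PERIOD"
        for w, t in zip(words, tags):
            p *= trans.get((prev, t), 0.0) * emit.get((t, w), 0.0)
            prev = t
        if p > best_p:
            best, best_p = tags, p
    return best, best_p

print(best_tagging(["the", "bear", "is"]))  # -> (('AT', 'NN', 'BEZ'), ...)
```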
Assumptions:
• words are independent of each other
• a word's identity only depends on its tag
By Bayes' rule (the denominator $P(w_{1,n})$ does not depend on the tags):

$$\arg\max_{t_{1,n}} P(t_{1,n} \mid w_{1,n}) = \arg\max_{t_{1,n}} \frac{P(w_{1,n} \mid t_{1,n})\, P(t_{1,n})}{P(w_{1,n})} = \arg\max_{t_{1,n}} P(w_{1,n} \mid t_{1,n})\, P(t_{1,n})$$

Under the two assumptions, this factorizes as:

$$P(w_{1,n} \mid t_{1,n})\, P(t_{1,n}) = P(w_1 \mid t_1) \cdots P(w_n \mid t_n)\; P(t_n \mid t_{n-1})\, P(t_{n-1} \mid t_{n-2}) \cdots P(t_2 \mid t_1) = \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})$$
Markov Model Taggers
Training:

for all tags $t^j$ do
    for all tags $t^k$ do
        $P(t^k \mid t^j) := C(t^j, t^k) \,/\, C(t^j)$
    end
end

for all tags $t^j$ do
    for all words $w^l$ do
        $P(w^l \mid t^j) := C(w^l : t^j) \,/\, C(t^j)$
    end
end
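As a sketch, the two counting loops above translate directly into the following Python (the function name and the corpus format, sentences as lists of `(word, tag)` pairs with a PERIOD boundary pseudo-tag, are my assumptions):

```python
from collections import defaultdict

def train_mm_tagger(tagged_corpus, boundary="PERIOD"):
    """MLE training from the slide; tagged_corpus is an iterable of
    sentences, each given as a list of (word, tag) pairs."""
    ctx_c = defaultdict(int)    # C(t^j) as a conditioning (previous) tag
    tag_c = defaultdict(int)    # C(t^j) as an emitting tag
    trans_c = defaultdict(int)  # C(t^j, t^k)
    emit_c = defaultdict(int)   # C(w^l : t^j)
    for sent in tagged_corpus:
        prev = boundary         # pseudo-tag before the first word
        for word, tag in sent:
            ctx_c[prev] += 1
            trans_c[(prev, tag)] += 1
            tag_c[tag] += 1
            emit_c[(tag, word)] += 1
            prev = tag
    trans_p = {jk: c / ctx_c[jk[0]] for jk, c in trans_c.items()}
    emit_p = {tw: c / tag_c[tw[0]] for tw, c in emit_c.items()}
    return trans_p, emit_p

corpus = [[("the", "AT"), ("bear", "NN"), ("is", "BEZ")]]
trans_p, emit_p = train_mm_tagger(corpus)
print(trans_p[("AT", "NN")], emit_p[("NN", "bear")])  # 1.0 1.0
```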
Tag bigram counts $C(t^j, t^k)$ (row = first tag, column = second tag):

| First tag | AT | BEZ | IN | NN | VB | PERIOD |
|-----------|----|-----|----|----|----|--------|
| AT | 0 | 0 | 0 | 48636 | 0 | 19 |
| BEZ | 1973 | 0 | 426 | 187 | 0 | 38 |
| IN | 43322 | 0 | 1325 | 17314 | 0 | 185 |
| NN | 1067 | 3720 | 42470 | 11773 | 614 | 21392 |
| VB | 6072 | 42 | 4758 | 1476 | 129 | 1522 |
| PERIOD | 8016 | 75 | 4656 | 1329 | 954 | 0 |

Word–tag counts $C(w^l : t^j)$:

| Word | AT | BEZ | IN | NN | VB | PERIOD |
|------|----|-----|----|----|----|--------|
| bear | 0 | 0 | 0 | 10 | 43 | 0 |
| is | 0 | 10065 | 0 | 0 | 0 | 0 |
| move | 0 | 0 | 0 | 36 | 133 | 0 |
| on | 0 | 0 | 5484 | 0 | 0 | 0 |
| president | 0 | 0 | 0 | 382 | 0 | 0 |
| progress | 0 | 0 | 0 | 108 | 4 | 0 |
| the | 69016 | 0 | 0 | 0 | 0 | 0 |
| . | 0 | 0 | 0 | 0 | 0 | 48809 |
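For instance, plugging the counts above into the training formulas (using the visible row sum as a stand-in for the true total $C(\mathrm{AT})$, so the value is only illustrative):

$$P(\mathrm{NN} \mid \mathrm{AT}) = \frac{C(\mathrm{AT}, \mathrm{NN})}{C(\mathrm{AT})} \approx \frac{48636}{48636 + 19} \approx 0.9996$$

i.e. in these counts an article is almost always followed directly by a noun.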
Markov Model Taggers
Tagging (the Viterbi algorithm)
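The slide names the algorithm without spelling it out; here is a minimal sketch of Viterbi decoding for the bigram model, reusing the `trans_p`/`emit_p` dictionaries from the training sketch above (the structure is the standard dynamic program; the variable names are mine):

```python
def viterbi(words, tags, trans_p, emit_p, boundary="PERIOD"):
    """Best tag sequence under prod_i P(w_i|t_i) P(t_i|t_{i-1}), in O(n T^2)."""
    # delta[i][t]: probability of the best partial tagging of words[:i+1] ending in t
    # psi[i][t]:  the predecessor tag on that best partial tagging
    delta = [{t: trans_p.get((boundary, t), 0.0) * emit_p.get((t, words[0]), 0.0)
              for t in tags}]
    psi = [{}]
    for i in range(1, len(words)):
        delta.append({})
        psi.append({})
        for t in tags:
            prev = max(tags, key=lambda s: delta[i - 1][s] * trans_p.get((s, t), 0.0))
            delta[i][t] = (delta[i - 1][prev] * trans_p.get((prev, t), 0.0)
                           * emit_p.get((t, words[i]), 0.0))
            psi[i][t] = prev
    # Backtrack from the most probable final tag.
    path = [max(tags, key=lambda t: delta[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(psi[i][path[-1]])
    return list(reversed(path))
```

In practice one works with log probabilities to avoid underflow on long sentences.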
Variations
The models for unknown words:
1. assuming that they can be any part of speech
2. using morphological features to make inferences about the possible parts of speech
$$P(w^l \mid t^j) = \frac{1}{Z}\, P(\text{unknown word} \mid t^j)\, P(\text{capitalized} \mid t^j)\, P(\text{endings/hyph} \mid t^j)$$

Z: normalization constant
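A minimal sketch of this feature model (all component estimates below, `p_unknown`, `p_cap`, `p_suffix`, are hypothetical placeholders; in practice they would be estimated from training data, and for illustration the sketch normalizes over tags for a single word):

```python
def unknown_word_score(word, tag, p_unknown, p_cap, p_suffix, suffixes):
    """Unnormalized P(w|t): P(unknown|t) * P(capitalized|t) * P(ending|t)."""
    cap = word[:1].isupper()
    suffix = next((s for s in suffixes if word.endswith(s)), None)
    return (p_unknown.get(tag, 0.0)
            * (p_cap.get(tag, 0.0) if cap else 1.0 - p_cap.get(tag, 0.0))
            * p_suffix.get((suffix, tag), 1e-6))

# Hypothetical component estimates.
p_unknown = {"NN": 0.4, "VB": 0.2, "JJ": 0.4}
p_cap = {"NN": 0.3, "VB": 0.01, "JJ": 0.05}
p_suffix = {("ing", "VB"): 0.5, ("s", "NN"): 0.3}
tags, suffixes = ["NN", "VB", "JJ"], ["ing", "s"]

scores = {t: unknown_word_score("gliming", t, p_unknown, p_cap, p_suffix, suffixes)
          for t in tags}
Z = sum(scores.values())  # illustrative normalization over tags for this word
probs = {t: s / Z for t, s in scores.items()}  # the -ing suffix favors VB here
```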
Variation
Trigram taggers

Interpolation:

$$P(t_i \mid t_{1,i-1}) = \lambda_1 P(t_i) + \lambda_2 P(t_i \mid t_{i-1}) + \lambda_3 P(t_i \mid t_{i-2,i-1})$$

Variable Memory Markov Model (VMMM)
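A one-function sketch of the interpolation step (the component models are assumed to be plain dictionaries of MLE estimates; the lambda weights are placeholders and would in practice be tuned, e.g. on held-out data):

```python
def interp_tag_p(t, prev2, prev1, p_uni, p_bi, p_tri, lambdas=(0.1, 0.3, 0.6)):
    """lambda1*P(t) + lambda2*P(t|t_{i-1}) + lambda3*P(t|t_{i-2}, t_{i-1})."""
    l1, l2, l3 = lambdas
    return (l1 * p_uni.get(t, 0.0)
            + l2 * p_bi.get((prev1, t), 0.0)
            + l3 * p_tri.get((prev2, prev1, t), 0.0))
```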
Variation

Smoothing:

$$P(t^j \mid t^{j-1}) = (1 - \epsilon)\, \frac{C(t^{j-1}, t^j)}{C(t^{j-1})} + \epsilon$$

$$P(t^j \mid w^l) = \frac{C(t^j, w^l) + 1}{C(w^l) + K_l}$$

$K_l$: the number of possible parts of speech of $w^l$

Reversibility:

$$P(t_{1,n}) = P(t_1)\, P(t_2 \mid t_1)\, P(t_3 \mid t_2) \cdots P(t_n \mid t_{n-1}) = \frac{P(t_{1,2})\, P(t_{2,3}) \cdots P(t_{n-1,n})}{P(t_2)\, P(t_3) \cdots P(t_{n-1})} = P(t_n)\, P(t_{n-1} \mid t_n) \cdots P(t_1 \mid t_2)$$

i.e. the probability of a tag sequence is the same whether the chain is run left-to-right or right-to-left.
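A sketch of the lexical add-one smoothing above (names are mine; `allowed_tags` maps each word to its dictionary tags, whose count gives $K_l$):

```python
def smoothed_tag_given_word(tag, word, count_tw, count_w, allowed_tags):
    """P(t^j | w^l) = (C(t^j, w^l) + 1) / (C(w^l) + K_l)."""
    k_l = len(allowed_tags[word])  # K_l: number of parts of speech of w^l
    return (count_tw.get((tag, word), 0) + 1) / (count_w.get(word, 0) + k_l)
```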
Variation
Sequence vs. tag by tag:

Time flies like an arrow.
a. NN VBZ RB AT NN.  P = 0.01
b. NN NNS VB AT NN.  P = 0.01

With two equally probable sequences like these, a tag-by-tag maximizer may mix tags from both and output a combination that is itself improbable. In practice, however, there is no large difference in accuracy between maximizing the whole sequence and maximizing tag by tag.
Hidden Markov Model Taggers
When we have no tagged training data
Initializing all parameters with the dictionary information:
• Jelinek's method
• Kupiec's method
Hidden Markov Model Taggers
Jelinek's method: initializing the HMM with the MLE for $P(w^k \mid t^i)$, assuming that words occur equally likely with each of their possible tags:

$$b_{j.l} = \frac{b^*_{j.l}\, C(w^l)}{\sum_{w^m} b^*_{j.m}\, C(w^m)}$$

$$b^*_{j.l} = \begin{cases} 0 & \text{if } t^j \text{ is not a part of speech allowed for } w^l \\[4pt] \dfrac{1}{T(w^l)} & \text{otherwise} \end{cases}$$

$T(w^l)$: the number of tags allowed for $w^l$
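A minimal sketch of this initialization (the dictionary format, word -> list of allowed tags, is my assumption):

```python
from collections import defaultdict

def jelinek_init(dictionary, word_counts):
    """Emission initialization b_{j.l}: spread each word uniformly over its
    T(w^l) allowed tags, then renormalize per tag with corpus word counts."""
    b_star = {(t, w): 1.0 / len(ts)
              for w, ts in dictionary.items() for t in ts}
    norm = defaultdict(float)  # per-tag denominator: sum_m b*_{j.m} C(w^m)
    for (t, w), v in b_star.items():
        norm[t] += v * word_counts.get(w, 0)
    return {(t, w): v * word_counts.get(w, 0) / norm[t]
            for (t, w), v in b_star.items() if norm[t] > 0}

dictionary = {"the": ["AT"], "bear": ["NN", "VB"], "move": ["NN", "VB"]}
word_counts = {"the": 100, "bear": 5, "move": 8}
b = jelinek_init(dictionary, word_counts)
print(b[("NN", "bear")])  # 0.5*5 / (0.5*5 + 0.5*8) ≈ 0.385
```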
Hidden Markov Model Taggers
Kupiec's method: grouping all words with the same set of possible parts of speech into 'metawords' $u_L$, so that parameters need not be fine-tuned for every individual word:

$$u_L = \{\, w^l \mid t^j \text{ is allowed for } w^l \Leftrightarrow j \in L \,\}, \qquad L \subseteq \{1, \ldots, T\}$$

$$b^*_{j.L} = \begin{cases} 0 & \text{if } j \notin L \\[4pt] \dfrac{1}{|L|} & \text{otherwise} \end{cases}$$

$$b_{j.L} = \frac{b^*_{j.L}\, C(u_L)}{\sum_{u_{L'}} b^*_{j.L'}\, C(u_{L'})}$$
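A sketch of the metaword grouping (the grouping key and names are mine):

```python
from collections import defaultdict

def kupiec_metawords(dictionary):
    """Group words by their set of allowed tags: one metaword u_L per tag set L,
    so emission parameters are estimated per group rather than per word."""
    groups = defaultdict(list)
    for word, tags in dictionary.items():
        groups[frozenset(tags)].append(word)
    return groups

dictionary = {"bear": ["NN", "VB"], "move": ["NN", "VB"], "the": ["AT"]}
print(kupiec_metawords(dictionary)[frozenset({"NN", "VB"})])  # ['bear', 'move']
```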
Hidden Markov Model Taggers
Training: after initialization, the HMM is trained using the Forward-Backward algorithm.

Tagging: the same as for the Visible Markov Model (Viterbi).

Note: the difference between VMM tagging and HMM tagging is in how we train the model, not in how we tag.
Hidden Markov Model Taggers
The effect of initialization on HMM training (the overtraining problem):
D0 maximum likelihood estimates from a tagged training corpus
D1 correct ordering only of lexical probabilities
D2 lexical probabilities proportional to overall tag probabilities
D3 equal lexical probabilities for all tags admissible for a word
T0 maximum likelihood estimates from a tagged training corpus
T1 equal probabilities for all transitions
Use the Visible Markov Model when:
• a sufficiently large tagged training text is available
• it is similar to the intended text of application

Run Forward-Backward for a few iterations when:
• there is no training text, or training and test text are very different
• but at least some lexical information is available

Run Forward-Backward for a larger number of iterations when:
• there is no lexical information