CS460/IT632 Natural Language Processing / Language Technology for the Web
Lecture 2 (06/01/06), Prof. Pushpak Bhattacharyya
IIT Bombay
Part of Speech (PoS) Tagging
06/01/06 Prof. Pushpak Bhattacharyya, IIT Bombay 2
Tagging or Annotation
● The purpose is disambiguation.
● A word can have a number of possible labels; the problem is to assign a unique one.
● PoS tagging makes use of the "local context", whereas sense tagging needs "long-distance dependency" and is therefore harder.
● PoS tagging is needed mainly in parsing, and also in other applications.
Approaches
● Rule-based approach
● Statistical approach
– We will mainly focus on the statistical approach.
Types of Tagging Tasks
● PoS
● Named entity
● Sense
● Parse tree
PoS Tagging
● Example:
– "The Orange ducks clean the bills."
● Assign tags to each word from the lexicon; multiple possibilities exist.
Lexicon (dictionary)
● The:
– DT (Determiner)
● Orange:
– NN (Noun)
– JJ (Adjective)
● Duck:
– NN
– VB (Base verb)
● Clean:
– NN
– VB
● Bill:
– NN
– VB
JJ, VB, and NN are called syntactic entities, or PoS tags.
PoS tagging as a sequence labelling task
● The task is to assign the correct PoS tag sequence to the words.
● It can be:
– Unigram: consider one word at a time while deciding its tag.
– Multigram: consider multiple words.
● There are 16 (= 1*2*2*2*1*2) possible tag sequences for the "duck" example.
● It is a classification problem: classify each word into the right tag category.
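The count of 16 sequences can be checked by enumerating the cross-product of each word's tag options (a small sketch; the `lexicon` dict below simply mirrors the lexicon slide above):

```python
from itertools import product

# Per-word tag options, copied from the lecture's lexicon slide.
lexicon = {
    "the": ["DT"],
    "orange": ["NN", "JJ"],
    "ducks": ["NN", "VB"],
    "clean": ["NN", "VB"],
    "bills": ["NN", "VB"],
}

sentence = ["the", "orange", "ducks", "clean", "the", "bills"]

# Cross-product of the tag options of each word in order.
sequences = list(product(*(lexicon[w] for w in sentence)))

print(len(sequences))   # 16 = 1*2*2*2*1*2
```

The intended sequence (DT, JJ, NN, VB, DT, NN) is one of these 16 candidates; the tagging task is to pick it out.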
Challenges
● Lexical ambiguity: multiple tag choices per word.
● Morphological analysis: finding the root word.
● Tokenization: finding word boundaries.
– In the Thai language there are no blank spaces between words.
– Non-trivial even with spaces (example: capturing boundaries when a word is continued on the next line with a "-").
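As a rough sketch of the line-break case (an illustration, not a full tokenizer), hyphenated line-break words can be rejoined before splitting on whitespace:

```python
import re

def tokenize(text):
    """Toy tokenizer: rejoin words hyphenated across a line break,
    split punctuation off, then split on whitespace."""
    # "tag-\nging" -> "tagging"
    text = re.sub(r"(\w+)-\n(\w+)", r"\1\2", text)
    # Separate sentence punctuation into its own token.
    text = re.sub(r"([.,!?;])", r" \1 ", text)
    return text.split()

print(tokenize("The orange ducks clean the bills."))
print(tokenize("PoS tag-\nging is non-trivial."))
```

Real tokenizers must also decide when a line-final hyphen is part of the word itself (as in "non-trivial"), which this sketch does not attempt.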
Named Entity tagging
● Example 1:
– "Mohan went to school in Kolkata"
● Tagged as:
– "Mohan_Person went to school_Place in Kolkata_Place"
● Example 2:
– "Kolkata bore the brunt of 1947 riots when 1947 children died at Kolkata."
– Tagged as: "Kolkata_? bore the brunt of 1947_year riots when 1947_num children died at Kolkata_Place."
Sense tagging
● Sense tagging means detecting the meaning of each word.
● Our example tagged as:
– "The Orange_{colour} ducks_{bird} clean the bills_{body_part}"
● Sense tagging here is done by means of hypernymy.
● Semantic relations like hypernymy are stored in the lexical resource called "WordNet".
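As a toy illustration, sense tagging can be sketched as a lookup into a sense inventory (the `senses` map below is hand-written stand-in data for this one sentence, not an actual WordNet query, which would go through a library such as NLTK):

```python
# Hand-written stand-in for a WordNet-style sense inventory
# (assumption for illustration only).
senses = {
    "orange": "colour",
    "ducks": "bird",
    "bills": "body_part",
}

def sense_tag(tokens):
    """Attach a sense label to every token found in the inventory."""
    return [f"{t}_{{{senses[t]}}}" if t in senses else t
            for t in tokens]

print(" ".join(sense_tag(["The", "orange", "ducks", "clean", "the", "bills"])))
# The orange_{colour} ducks_{bird} clean the bills_{body_part}
```

The hard part the lecture points at — choosing *which* sense when a word has several — is exactly what this lookup glosses over.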
Parse Tree tagging
● Example parse tree: the tree for "The Orange ducks clean the bills", given in bracketed form on the next slide.
Parse Tree tagging (contd.)
● Given a grammar, one can construct the parse tree.
● Annotation will produce the following structure:
– [[The_DT [Orange_JJ ducks_NN]NP]NP [clean_VB [the_DT [bills_NN]NP]NP]VP]S
● This structure is called the Penn Treebank form.
● From the Treebank form, one can arrive at a grammar through learning.
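In its simplest form, the "grammar through learning" step amounts to reading off a CFG rule from every internal node of a treebank tree. A sketch follows; the nested-tuple encoding of the tree is an assumption made for illustration (a leaf is `(tag, word)`, an internal node is `(label, [children])`):

```python
def extract_rules(node, grammar):
    """Collect one CFG rule per tree node into the `grammar` set."""
    label, kids = node
    if isinstance(kids, str):                 # preterminal: TAG -> word
        grammar.add((label, (kids,)))
        return
    grammar.add((label, tuple(k[0] for k in kids)))  # e.g. S -> NP VP
    for k in kids:
        extract_rules(k, grammar)

# The lecture's bracketed structure as nested tuples.
tree = ("S", [
    ("NP", [("DT", "The"),
            ("NP", [("JJ", "Orange"), ("NN", "ducks")])]),
    ("VP", [("VB", "clean"),
            ("NP", [("DT", "the"),
                    ("NP", [("NN", "bills")])])]),
])

grammar = set()
extract_rules(tree, grammar)
for lhs, rhs in sorted(grammar):
    print(lhs, "->", " ".join(rhs))
```

Learning in practice also attaches a probability to each rule (its relative frequency in the treebank), which this sketch omits.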
Statistical Formulation of the PoS tagging problem
● Input:
– W1, W2, ..., Wn : the words
– C1, C2, ..., Cm : the repertoire of lexical tags (DT, JJ, NN, etc.)
● Output:
– The "best" PoS tag sequence Ci1, Ci2, Ci3, ..., Cin for the given words.
● "Best" means:
– P(Ci1, Ci2, Ci3, ..., Cin | W1, W2, ..., Wn) is the maximum over all possible C-sequences.
Statistical Formulation of the PoS tagging problem (contd.)
● Example:
– P(DT JJ NN | The Orange duck) > P(DT NN VB | The Orange duck) is required.
● Why?
– Because given the phrase "The Orange duck", there is overwhelming evidence in the corpus that "DT JJ NN" is the right tag sequence.
Mathematical machinery
Bayes Theorem
● P(A|B) = (P(A) . P(B|A)) / P(B)
– where,
– P(A): prior probability
– P(A|B): posterior probability
– P(B|A): likelihood
● Why apply Bayes theorem?
– This is the Generative vs. Discriminative model question: rather than modelling P(C|W) directly, Bayes theorem lets us work with the prior P(C) and the likelihood P(W|C).
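A quick numeric check of the theorem, with made-up probabilities (the numbers are assumptions for illustration, not from the lecture):

```python
# Bayes theorem: P(A|B) = P(A) * P(B|A) / P(B).
p_a = 0.3              # prior P(A)
p_b_given_a = 0.5      # likelihood P(B|A)
p_b_given_not_a = 0.2  # P(B|~A), needed to compute the evidence P(B)

# Law of total probability for the evidence term:
p_b = p_a * p_b_given_a + (1 - p_a) * p_b_given_not_a   # 0.29

p_a_given_b = p_a * p_b_given_a / p_b   # posterior, about 0.517
print(round(p_a_given_b, 4))
```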
Apply Bayes theorem
P(Ci1, Ci2, Ci3, ..., Cin | W1, W2, ..., Wn) = P(C|W)

= P(C) . P(W|C) / P(W)

where,
C = <Ci1, Ci2, Ci3, ..., Cin>
W = <W1, W2, ..., Wn>
Best tag sequence
C* = <Ci1, Ci2, Ci3, ..., Cin>*, where * signifies the best C-sequence

= argmax P(C|W)

● As the denominator P(W) is common to all the tag sequences,

C* = argmax ( P(C) . P(W|C) )
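The argmax can be sketched over a handful of candidate tag sequences for "The Orange duck" (the prior and likelihood numbers below are invented for illustration only):

```python
# Candidate tag sequences with made-up P(C) and P(W|C) values.
candidates = {
    ("DT", "JJ", "NN"): {"prior": 0.010, "likelihood": 0.020},
    ("DT", "NN", "VB"): {"prior": 0.008, "likelihood": 0.001},
    ("DT", "NN", "NN"): {"prior": 0.006, "likelihood": 0.004},
}

def score(tags):
    p = candidates[tags]
    return p["prior"] * p["likelihood"]   # P(C) * P(W|C)

best = max(candidates, key=score)         # C* = argmax P(C) * P(W|C)
print(best)   # ('DT', 'JJ', 'NN')
```

Note that P(W) never appears: since it is the same for every candidate, it cannot change which sequence wins.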
Processing the 1st part

P(C) = P(Ci1, Ci2, Ci3, ..., Cin)

= P(Ci1) . P(Ci2|Ci1) . P(Ci3|Ci1 Ci2) ... P(Cin|Ci1 Ci2 ... Cin-1)

(on applying the chain rule of probability)

Ex: P(DT JJ NN) = P(DT) . P(JJ|DT) . P(NN|DT JJ)
Markov assumption
● A tag depends only on a window of neighbouring tags, not on everything that the chain rule of probability demands.
● A Kth-order Markov assumption considers only the previous K tags.
● Typical values of K: 3 for English, and (it seems) 5 for Hindi.
Apply assumption
With K = 2, our problem becomes:

P(C) = ∏ (i = 1..n) P(Ci | Ci-1)

C0: sentence-beginning marker.
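A minimal sketch of this bigram computation, assuming hand-picked transition probabilities and "^" as the C0 sentence-beginning marker (all numbers are illustrative, not estimated from a corpus):

```python
# P(tag | previous tag), made-up values for illustration.
trans = {
    ("^", "DT"): 0.6,
    ("DT", "JJ"): 0.3,
    ("JJ", "NN"): 0.7,
    ("DT", "NN"): 0.5,
    ("NN", "VB"): 0.2,
}

def p_sequence(tags, start="^"):
    """P(C) = product over i of P(Ci | Ci-1), with C0 = start."""
    prob, prev = 1.0, start
    for t in tags:
        prob *= trans.get((prev, t), 0.0)  # unseen bigram -> 0
        prev = t
    return prob

print(p_sequence(["DT", "JJ", "NN"]))   # 0.6 * 0.3 * 0.7, about 0.126
print(p_sequence(["DT", "NN", "VB"]))   # 0.6 * 0.5 * 0.2, about 0.06
```

In a real tagger these transition probabilities are estimated from tagged-corpus counts, and unseen bigrams are smoothed rather than set to zero.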
Exercise given in the lecture
● Contrast PoS tagging with sense tagging.
● Find an example that shows the difference.