CS344: Introduction to Artificial Intelligence (associated lab: CS386)
description
Transcript of CS344: Introduction to Artificial Intelligence (associated lab: CS386)
![Page 1: CS344: Introduction to Artificial Intelligence (associated lab: CS386)](https://reader036.fdocuments.in/reader036/viewer/2022081520/56816599550346895dd87469/html5/thumbnails/1.jpg)
CS344: Introduction to Artificial Intelligence
(associated lab: CS386)
Pushpak BhattacharyyaCSE Dept., IIT Bombay
Lecture 12, 13: Sequence Labeling using HMM- POS tagging; Baum Welch
1st Feb, 2011
![Page 2: CS344: Introduction to Artificial Intelligence (associated lab: CS386)](https://reader036.fdocuments.in/reader036/viewer/2022081520/56816599550346895dd87469/html5/thumbnails/2.jpg)
POS tagging sits between Morphology and Parsing
Semantics Extraction
Parsing: Syntactic Processing
POS tagging
Morphological Processing
Rules andResources
![Page 3: CS344: Introduction to Artificial Intelligence (associated lab: CS386)](https://reader036.fdocuments.in/reader036/viewer/2022081520/56816599550346895dd87469/html5/thumbnails/3.jpg)
MorphPOS->Parse Because of this sequence, at the
level of POS tagging the only information available is the word, its constituents, its properties and its neighbouring words and their propertiesW0 W1 W2 Wi Wn-1 Wn… Wi+1
Word of interest
![Page 4: CS344: Introduction to Artificial Intelligence (associated lab: CS386)](https://reader036.fdocuments.in/reader036/viewer/2022081520/56816599550346895dd87469/html5/thumbnails/4.jpg)
Cannot assume parsing and semantic processing Parsing identifies long distance
dependencies Needs POS tagging which must
finish earlier Semantic processing needs parsing
and POS tagging
![Page 5: CS344: Introduction to Artificial Intelligence (associated lab: CS386)](https://reader036.fdocuments.in/reader036/viewer/2022081520/56816599550346895dd87469/html5/thumbnails/5.jpg)
Example Vaha ladakaa so rahaa hai (that boy is sleeping) Vaha cricket khel rahaa hai (he plays cricket) The fact that “vaha” is
demonstrative in the first sentence and pronoun in the second sentence, needs deeper levels of information
![Page 6: CS344: Introduction to Artificial Intelligence (associated lab: CS386)](https://reader036.fdocuments.in/reader036/viewer/2022081520/56816599550346895dd87469/html5/thumbnails/6.jpg)
“vaha cricket” is not that simple! Vaha cricket jisme bhrastaachaar
ho, hame nahii chaahiye (that cricket which has corruption
in it is not acceptable to us) Here “vaha” is demonstrative Needs deeper level of processing
![Page 7: CS344: Introduction to Artificial Intelligence (associated lab: CS386)](https://reader036.fdocuments.in/reader036/viewer/2022081520/56816599550346895dd87469/html5/thumbnails/7.jpg)
Syntactic processing also cannot be assumed
raam kaa yaha baar baar shyaam kaa ghar binaa bataaye JAANAA mujhe bilkul pasand nahii haai
(I do not at all like the fact that Ram goes to Shyam's house repeatedly without informing (anybody))
"Ram-GENITIVE this again and again Shyam-GENITIVE house any not saying GOING I-dative at all like not VCOP“
JAANAA can be VINF (verb infinitive) or VN (verb nominal, i.e., gerundial)
![Page 8: CS344: Introduction to Artificial Intelligence (associated lab: CS386)](https://reader036.fdocuments.in/reader036/viewer/2022081520/56816599550346895dd87469/html5/thumbnails/8.jpg)
Syntactic processing also cannot be assumed (cntd.)
raam kaa yaha baar baar shyaam kaa ghar binaa bataaye JAANAA mujhe bilkul pasand nahii haai
The correct clue for disambiguation here is 'raam kaa', and this word group is far apart
One needs to determine the structure of intervening constituents
This needs parsing which in turn needs correct tags Thus there is a circularity which can be broken only by
retaining ONE of VINF and VN.
![Page 9: CS344: Introduction to Artificial Intelligence (associated lab: CS386)](https://reader036.fdocuments.in/reader036/viewer/2022081520/56816599550346895dd87469/html5/thumbnails/9.jpg)
Fundamental principle of POS tagset design IN THE TAGSET DO NOT HAVE TAGS
THAT ARE POTENTIAL COMPETITORS AND TIE BETWEEN WHICH CAN BE BROKEN ONLY BY NLP PROCESSES COMING AFTER THE PARTICULAR TAGGING TASK.
![Page 10: CS344: Introduction to Artificial Intelligence (associated lab: CS386)](https://reader036.fdocuments.in/reader036/viewer/2022081520/56816599550346895dd87469/html5/thumbnails/10.jpg)
Computation of POS tags
![Page 11: CS344: Introduction to Artificial Intelligence (associated lab: CS386)](https://reader036.fdocuments.in/reader036/viewer/2022081520/56816599550346895dd87469/html5/thumbnails/11.jpg)
Process List all possible tag for each word
in sentence. Choose best suitable tag sequence.
![Page 12: CS344: Introduction to Artificial Intelligence (associated lab: CS386)](https://reader036.fdocuments.in/reader036/viewer/2022081520/56816599550346895dd87469/html5/thumbnails/12.jpg)
Example ”People jump high”. People : Noun/Verb jump : Noun/Verb high : Noun/Adjective We can start with probabilities.
![Page 13: CS344: Introduction to Artificial Intelligence (associated lab: CS386)](https://reader036.fdocuments.in/reader036/viewer/2022081520/56816599550346895dd87469/html5/thumbnails/13.jpg)
Generative Model^_^ People_N Jump_V High_R ._.
^ N
V
N
V
R
N
V
.
Lexical Probabilities
BigramProbabilities
This model is called Generative model. Here words are observed from tags as states.This is similar to HMM.
RR
![Page 14: CS344: Introduction to Artificial Intelligence (associated lab: CS386)](https://reader036.fdocuments.in/reader036/viewer/2022081520/56816599550346895dd87469/html5/thumbnails/14.jpg)
Example of Calculation from Actual Data
Corpus ^ Ram got many NLP books. He found
them all very interesting. Pos Tagged
^ N V A N N . N V N A R A .
![Page 15: CS344: Introduction to Artificial Intelligence (associated lab: CS386)](https://reader036.fdocuments.in/reader036/viewer/2022081520/56816599550346895dd87469/html5/thumbnails/15.jpg)
Recording numbers (bigram assumption)
^ N V A R .^ 0 2 0 0 0 0N 0 1 2 1 0 1V 0 1 0 1 0 0A 0 1 0 0 1 1R 0 0 0 1 0 0. 1 0 0 0 0 0
^ Ram got many NLP books. He found them all very interesting.
Pos Tagged^ N V A N N . N V N A R A .
![Page 16: CS344: Introduction to Artificial Intelligence (associated lab: CS386)](https://reader036.fdocuments.in/reader036/viewer/2022081520/56816599550346895dd87469/html5/thumbnails/16.jpg)
Probabilities^ N V A R .
^ 0 1 0 0 0 0N 0 1/5 2/5 1/5 0 1/5V 0 1/2 0 1/2 0 0A 0 1/3 0 0 1/3 1/3R 0 0 0 1 0 0. 1 0 0 0 0 0
^ Ram got many NLP books. He found them all very interesting.
Pos Tagged^ N V A N N . N V N A R A .
![Page 17: CS344: Introduction to Artificial Intelligence (associated lab: CS386)](https://reader036.fdocuments.in/reader036/viewer/2022081520/56816599550346895dd87469/html5/thumbnails/17.jpg)
To find T* = argmax (P(T) P(W/T)) P(T).P(W/T) = Π P( ti / ti-1 ).P(wi /ti) i=1n+1
P( ti / ti-1 ) : Bigram probability
P(wi /ti): Lexical probability
Note: P(wi/ti)=1 for i=0 (^, sentence beginner)) and i=(n+1) (., fullstop)
![Page 18: CS344: Introduction to Artificial Intelligence (associated lab: CS386)](https://reader036.fdocuments.in/reader036/viewer/2022081520/56816599550346895dd87469/html5/thumbnails/18.jpg)
Bigram probabilities N V A R
N 0.15 0.7 0.05 0.1
V 0.6 0.2 0.1 0.1
A 0.5 0.2 0.3 0
R 0.1 0.3 0.5 0.1
![Page 19: CS344: Introduction to Artificial Intelligence (associated lab: CS386)](https://reader036.fdocuments.in/reader036/viewer/2022081520/56816599550346895dd87469/html5/thumbnails/19.jpg)
Lexical Probability People jump high
N 10-5 0.4x10-3 10-7
V 10-7 10-2 10-7
A 0 0 10-1
R 0 0 0
values in cell are P(col-heading/row-heading)
![Page 20: CS344: Introduction to Artificial Intelligence (associated lab: CS386)](https://reader036.fdocuments.in/reader036/viewer/2022081520/56816599550346895dd87469/html5/thumbnails/20.jpg)
Some notable text corpora of English American National Corpus Bank of English British National Corpus Corpus Juris Secundum Corpus of Contemporary American English
(COCA) 400+ million words, 1990-present. Freely searchable online.
Brown Corpus, forming part of the "Brown Family" of corpora, together with LOB, Frown and F-LOB.
International Corpus of English Oxford English Corpus Scottish Corpus of Texts & Speech
![Page 21: CS344: Introduction to Artificial Intelligence (associated lab: CS386)](https://reader036.fdocuments.in/reader036/viewer/2022081520/56816599550346895dd87469/html5/thumbnails/21.jpg)
Accuracy measurement in POS tagging
![Page 22: CS344: Introduction to Artificial Intelligence (associated lab: CS386)](https://reader036.fdocuments.in/reader036/viewer/2022081520/56816599550346895dd87469/html5/thumbnails/22.jpg)
Standard Bar chart: Per Part of Speech Accuracy
![Page 23: CS344: Introduction to Artificial Intelligence (associated lab: CS386)](https://reader036.fdocuments.in/reader036/viewer/2022081520/56816599550346895dd87469/html5/thumbnails/23.jpg)
Standard Data: Confusion Matrix
![Page 24: CS344: Introduction to Artificial Intelligence (associated lab: CS386)](https://reader036.fdocuments.in/reader036/viewer/2022081520/56816599550346895dd87469/html5/thumbnails/24.jpg)
How to check quality of tagging (P, R, F) Three parameters
Precision P = |A ^ O|/|O|
Recall R = |A ^ O| / |A|
F-score = 2PR/(P+R) Harmonic mean
Actual(A)Obtained(O)A ^
O
![Page 25: CS344: Introduction to Artificial Intelligence (associated lab: CS386)](https://reader036.fdocuments.in/reader036/viewer/2022081520/56816599550346895dd87469/html5/thumbnails/25.jpg)
Relation between P & R
Precision P
Recall R
P is inversely related to R (unless additional knowledge is given)
![Page 26: CS344: Introduction to Artificial Intelligence (associated lab: CS386)](https://reader036.fdocuments.in/reader036/viewer/2022081520/56816599550346895dd87469/html5/thumbnails/26.jpg)
Ambiguity problem in POS Tagging To go – jaana – VNF (infinite verb) Going – jaana – VN (Gerundial
verb) In general, their POS
disambiguation is impossible.
![Page 27: CS344: Introduction to Artificial Intelligence (associated lab: CS386)](https://reader036.fdocuments.in/reader036/viewer/2022081520/56816599550346895dd87469/html5/thumbnails/27.jpg)
Counting the probability
( , )) )#
|( ( # N NN P N NP NN
![Page 28: CS344: Introduction to Artificial Intelligence (associated lab: CS386)](https://reader036.fdocuments.in/reader036/viewer/2022081520/56816599550346895dd87469/html5/thumbnails/28.jpg)
Context clues1 2 3 1 1i i iw w w www
Current word
Model
Look Ahead Look Before
DiscriminativeGenerative
HMM
MEMMCRF
![Page 29: CS344: Introduction to Artificial Intelligence (associated lab: CS386)](https://reader036.fdocuments.in/reader036/viewer/2022081520/56816599550346895dd87469/html5/thumbnails/29.jpg)
Assignment on Part of British National Corpus We will also make available Hindi
and Marathi POS tag corpus
![Page 30: CS344: Introduction to Artificial Intelligence (associated lab: CS386)](https://reader036.fdocuments.in/reader036/viewer/2022081520/56816599550346895dd87469/html5/thumbnails/30.jpg)
Syncretism – Advantage of Hindi over English Example:
For English: “will go”: both will and go may be noun or verb
For Indic languages: jaaunga word suffixes decides the tag
Marathi and Dravidian language have more word information
![Page 31: CS344: Introduction to Artificial Intelligence (associated lab: CS386)](https://reader036.fdocuments.in/reader036/viewer/2022081520/56816599550346895dd87469/html5/thumbnails/31.jpg)
Concept of categorials<category> → <category>al
noun → nominaladjective → adjectival
verb → verbaladverb → adverbial
Def: An entity that behaves like a <category> {The boy} plays football{The boy from Delhi} plays football The part of the sentence in braces is nominal clause
• These can be found using substitution test in linguistics• Distinguishing may be tricky - major cause of accuracy drop
![Page 32: CS344: Introduction to Artificial Intelligence (associated lab: CS386)](https://reader036.fdocuments.in/reader036/viewer/2022081520/56816599550346895dd87469/html5/thumbnails/32.jpg)
HMM Training
Baum Welch or Forward Backward Algorithm
![Page 33: CS344: Introduction to Artificial Intelligence (associated lab: CS386)](https://reader036.fdocuments.in/reader036/viewer/2022081520/56816599550346895dd87469/html5/thumbnails/33.jpg)
Key Intuition
Given: Training sequenceInitialization: Probability valuesCompute:Pr (state seq | training seq)
get expected count of transitioncompute rule probabilities
Approach: Initialize the probabilities and recompute them… EM like approach
a
b
a
b
a
b
a
b
q r
![Page 34: CS344: Introduction to Artificial Intelligence (associated lab: CS386)](https://reader036.fdocuments.in/reader036/viewer/2022081520/56816599550346895dd87469/html5/thumbnails/34.jpg)
Baum-Welch algorithm: counts
String = abb aaa bbb aaa
Sequence of states with respect to input symbols
a, b
a,b
qr
a,b
rqrqqqrqrqqrq aaabbbaaabba o/p seqState seq
a,b
![Page 35: CS344: Introduction to Artificial Intelligence (associated lab: CS386)](https://reader036.fdocuments.in/reader036/viewer/2022081520/56816599550346895dd87469/html5/thumbnails/35.jpg)
Calculating probabilities from tableTable of
counts
T=#statesA=#alphabet symbolsNow if we have a non-deterministic transitions then
multiple state seq possible for the given o/p seq (ref. to previous slide’s feature). Our aim is to find expected count through this.
8/3)( rqP b
Src Dest
O/P Count
q r a 5q q b 3r q a 3r q b 2
8/5)( rqP a
T
l
A
m
li
jiji
swsc
swscswsPm
kk
1 1
)(
)()(
![Page 36: CS344: Introduction to Artificial Intelligence (associated lab: CS386)](https://reader036.fdocuments.in/reader036/viewer/2022081520/56816599550346895dd87469/html5/thumbnails/36.jpg)
Interplay Between Two Equations
T
l
A
m
lWmi
jWijWi
ssc
sscssPk
k
0 0
)(
)()(
1,0
),,()|(
)(
,01,0,01,0n
k
k
snn
jWinn
jWi
wSssnWSP
ssC
wk
No. of times the transitions sisj occurs in the string
![Page 37: CS344: Introduction to Artificial Intelligence (associated lab: CS386)](https://reader036.fdocuments.in/reader036/viewer/2022081520/56816599550346895dd87469/html5/thumbnails/37.jpg)
Illustrationa:0.67
b:1.0
b:0.17
a:0.16
q r
a:0.04
b:1.0
b:0.48
a:0.48
q r
Actual (Desired) HMM
Initial guess
![Page 38: CS344: Introduction to Artificial Intelligence (associated lab: CS386)](https://reader036.fdocuments.in/reader036/viewer/2022081520/56816599550346895dd87469/html5/thumbnails/38.jpg)
One run of Baum-Welch algorithm: string ababb
P(path)q r q r q q 0.0007
70.0015
40.0015
40 0.000
77q r q q q q 0.0044
20.0044
20.0044
20.004
420.008
84q q q r q q 0.0044
20.0044
20.0044
20.004
420.008
84q q q q q q 0.0254
80.0 0.000 0.050
960.076
44Rounded Total 0.035 0.01 0.01 0.06 0.095
New Probabilities (P) 0.06=(0.01/(0.01+0.06+0.09
5)
1.0 0.36 0.581
qbq qaq raq qbr a ba ab bb bba
* ε is considered as starting and ending symbol of the input sequence string.
State sequences
Through multiple iterations the probability values will converge.
![Page 39: CS344: Introduction to Artificial Intelligence (associated lab: CS386)](https://reader036.fdocuments.in/reader036/viewer/2022081520/56816599550346895dd87469/html5/thumbnails/39.jpg)
Computational part (1/2)
ntn
jtkt
it
n
nt snn
jtkt
it
n
snn
jWinn
n
snn
jWinn
jWi
WsSwWsSPWP
WSsSwWsSPWP
WSssnWSPWP
WSssnWSPssC
n
n
k
n
kk
,0,01
,0
,0,01,01
,0
,01,0,01,0,0
,01,0,01,0
)],,,([)(
1
)],,,,([)(
1
)],,(),([)(
1
)],,()|([)(
1,0
1,0
1,0
w0 w1 w2 wk wn-1 wn
S0 S1 S1 … Si Sj … Sn-1 Sn Sn+1
![Page 40: CS344: Introduction to Artificial Intelligence (associated lab: CS386)](https://reader036.fdocuments.in/reader036/viewer/2022081520/56816599550346895dd87469/html5/thumbnails/40.jpg)
Computational part (2/2)
),1()(),1(
),1()|,(),1(
),1()|,(),1(
)|(),|,(),(
),,,,(
),,,(
0
0
1
0
1
1
0,11,011,0
0,111,0
0,01
jtBswsPitF
jtBsSwWsSPitF
jtBsSwWsSPitF
sSWPsSWwWsSPsSWP
WwWsSsSWP
WwWsSsSP
n
t
ji
n
t
itkt
jt
n
t
itkt
jt
jt
n
tnt
ittkt
jt
itt
n
tntkt
jt
itt
n
tnkt
jt
it
k
w0 w1 w2 wk wn-1 wn
S0 S1 S1 … Si Sj … Sn-1 Sn Sn+1
![Page 41: CS344: Introduction to Artificial Intelligence (associated lab: CS386)](https://reader036.fdocuments.in/reader036/viewer/2022081520/56816599550346895dd87469/html5/thumbnails/41.jpg)
Discussions1. Symmetry breaking:
Example: Symmetry leads to no change in initial values
2 Struck in Local maxima3. Label bias problem
Probabilities have to sum to 1.Values can rise at the cost of fall of values for others.
s
s sb:1.0
b:0.5
a:0.5
a:1.0
s
s sa:0.5
b:0.5
a:0.25
a:0.5b:0.5
a:0.25
b:0.25
b:0.5
Desired Initialized