CS344: Introduction to Artificial Intelligence (associated...
Transcript of CS344: Introduction to Artificial Intelligence (associated...
![Page 1: CS344: Introduction to Artificial Intelligence (associated ...pb/cs344-2014/cs344-lect22-25-em-baum...CS344: Introduction to Artificial Intelligence (associated lab: ... Parameters](https://reader033.fdocuments.in/reader033/viewer/2022051802/5afc7a4b7f8b9aa34d8c22d6/html5/thumbnails/1.jpg)
CS344: Introduction to Artificial Intelligence
(associated lab: CS386)Pushpak Bhattacharyya
CSE Dept., IIT Bombay
Lecture–22, 25: EM, Baum Welch28th March and 1st April, 2014
(Lecture 21 was by Girish on Markov Logic Network )
![Page 2: CS344: Introduction to Artificial Intelligence (associated ...pb/cs344-2014/cs344-lect22-25-em-baum...CS344: Introduction to Artificial Intelligence (associated lab: ... Parameters](https://reader033.fdocuments.in/reader033/viewer/2022051802/5afc7a4b7f8b9aa34d8c22d6/html5/thumbnails/2.jpg)
Expectation Maximization
One of the key ideas of Statistical AI, ML, NLP, CV
Iterative procedure Find Parameters Find hidden variables Maiximize data likelihood
![Page 3: CS344: Introduction to Artificial Intelligence (associated ...pb/cs344-2014/cs344-lect22-25-em-baum...CS344: Introduction to Artificial Intelligence (associated lab: ... Parameters](https://reader033.fdocuments.in/reader033/viewer/2022051802/5afc7a4b7f8b9aa34d8c22d6/html5/thumbnails/3.jpg)
The coin tossing problem
Case of 1 coin: Suppose there are N tosses of a coin. NH = The number of Heads What is the probability of a head i.e. PH = ?
![Page 4: CS344: Introduction to Artificial Intelligence (associated ...pb/cs344-2014/cs344-lect22-25-em-baum...CS344: Introduction to Artificial Intelligence (associated lab: ... Parameters](https://reader033.fdocuments.in/reader033/viewer/2022051802/5afc7a4b7f8b9aa34d8c22d6/html5/thumbnails/4.jpg)
Observed variable
#Observation = N
N
xPTherefore
otherwiseheadaproducestossthewhenxwhere
xxxx
N
ii
H
i
N
1
321
,
,0,1
,, :X
![Page 5: CS344: Introduction to Artificial Intelligence (associated ...pb/cs344-2014/cs344-lect22-25-em-baum...CS344: Introduction to Artificial Intelligence (associated lab: ... Parameters](https://reader033.fdocuments.in/reader033/viewer/2022051802/5afc7a4b7f8b9aa34d8c22d6/html5/thumbnails/5.jpg)
Each observation is a Bernoulli’s Trial where
is the probability of success i.e., getting a head
is the probability of failure i.e., getting a tail
HP
HP1
![Page 6: CS344: Introduction to Artificial Intelligence (associated ...pb/cs344-2014/cs344-lect22-25-em-baum...CS344: Introduction to Artificial Intelligence (associated lab: ... Parameters](https://reader033.fdocuments.in/reader033/viewer/2022051802/5afc7a4b7f8b9aa34d8c22d6/html5/thumbnails/6.jpg)
Likelihood of X
• Likelihood of X, i.e., probability of Observation Sequence X is:
Each trial is identical and independent. Maximum Likelihood of data, requires
us to make and thus, get
the expression for PH
ii x-1H
N
1i
xHH )P -(1P )PL(X,
0HdP
dL
![Page 7: CS344: Introduction to Artificial Intelligence (associated ...pb/cs344-2014/cs344-lect22-25-em-baum...CS344: Introduction to Artificial Intelligence (associated lab: ... Parameters](https://reader033.fdocuments.in/reader033/viewer/2022051802/5afc7a4b7f8b9aa34d8c22d6/html5/thumbnails/7.jpg)
Mathematical Convenience
Take log of the likelihood.
Differentiating w.r.t. PH
To get the expression for , make
N
iHiHi PxPxXLL
1
)1log()1(log);(
H
iN
i H
i
Px
Px
dHdLL
1
11
HP 0HdP
dLL
![Page 8: CS344: Introduction to Artificial Intelligence (associated ...pb/cs344-2014/cs344-lect22-25-em-baum...CS344: Introduction to Artificial Intelligence (associated lab: ... Parameters](https://reader033.fdocuments.in/reader033/viewer/2022051802/5afc7a4b7f8b9aa34d8c22d6/html5/thumbnails/8.jpg)
Equating to 0, expression for PH
H
N
ii
H
N
ii
H P
x
PNx
P
111 1
1
N
xP
N
ii
H
1
![Page 9: CS344: Introduction to Artificial Intelligence (associated ...pb/cs344-2014/cs344-lect22-25-em-baum...CS344: Introduction to Artificial Intelligence (associated lab: ... Parameters](https://reader033.fdocuments.in/reader033/viewer/2022051802/5afc7a4b7f8b9aa34d8c22d6/html5/thumbnails/9.jpg)
Maximum Entropy
Suppose we do not know how to get the MLE, or the likelihood expression is impossible to get, then we use: Maximum Entropy. Example: In problems like co-reference
resolution.
Entropy= To be elaborated later.
)1log()1(log HHHH PPPP
![Page 10: CS344: Introduction to Artificial Intelligence (associated ...pb/cs344-2014/cs344-lect22-25-em-baum...CS344: Introduction to Artificial Intelligence (associated lab: ... Parameters](https://reader033.fdocuments.in/reader033/viewer/2022051802/5afc7a4b7f8b9aa34d8c22d6/html5/thumbnails/10.jpg)
Case for Expectation Maximization
Instead of one coin we toss two coins.Parameters <P, P1, P2>
P = Probability of choosing first coin P1 = Probability of choosing head from first
coin P2 = Probability of choosing head from second
coin
We do not know which coin the observation came from
NxxxxX ,.....,,: 321
![Page 11: CS344: Introduction to Artificial Intelligence (associated ...pb/cs344-2014/cs344-lect22-25-em-baum...CS344: Introduction to Artificial Intelligence (associated lab: ... Parameters](https://reader033.fdocuments.in/reader033/viewer/2022051802/5afc7a4b7f8b9aa34d8c22d6/html5/thumbnails/11.jpg)
EM continued..
Z1, Z2, Z3,…, ZN is the hidden sequence running alongside X1, X2, X3,…, XN
Where, Zi =1, if the ith observation came from coin 1, =0, otherwise
21
321
,,,....,,,
),,Pr();Pr(
PPPzzzzZ
ZXX
N
Z
NN zxzxzxzxY ,......,,,: 332211
![Page 12: CS344: Introduction to Artificial Intelligence (associated ...pb/cs344-2014/cs344-lect22-25-em-baum...CS344: Introduction to Artificial Intelligence (associated lab: ... Parameters](https://reader033.fdocuments.in/reader033/viewer/2022051802/5afc7a4b7f8b9aa34d8c22d6/html5/thumbnails/12.jpg)
Cntd.
We want to work with
Invoke convexity/concavity and expectation of Zi and work with log(Pr(Y;θ))
N
i
zxxzxx iiiiii PPPPPP
ZXPY
1
1122
111 ))1.().1((*))1.(.(
);,();Pr(
));,(log();( Z
ZXPXLL
![Page 13: CS344: Introduction to Artificial Intelligence (associated ...pb/cs344-2014/cs344-lect22-25-em-baum...CS344: Introduction to Artificial Intelligence (associated lab: ... Parameters](https://reader033.fdocuments.in/reader033/viewer/2022051802/5afc7a4b7f8b9aa34d8c22d6/html5/thumbnails/13.jpg)
N
iiii PxPxPzEXLL
1
11 ))1log()1(log)(log([);(
))]1log()1(log)1))(log((1( 22 PxPxPziE ii
Log Likelihood of the Data
![Page 14: CS344: Introduction to Artificial Intelligence (associated ...pb/cs344-2014/cs344-lect22-25-em-baum...CS344: Introduction to Artificial Intelligence (associated lab: ... Parameters](https://reader033.fdocuments.in/reader033/viewer/2022051802/5afc7a4b7f8b9aa34d8c22d6/html5/thumbnails/14.jpg)
IMPORTANT POINTS TO NOTE
Log moves inside the product term. Σ disappears giving rise to E(Zi) in place
of Zi
Differentiate wrt p, p1, p2, equate to 0 and get the results
![Page 15: CS344: Introduction to Artificial Intelligence (associated ...pb/cs344-2014/cs344-lect22-25-em-baum...CS344: Introduction to Artificial Intelligence (associated lab: ... Parameters](https://reader033.fdocuments.in/reader033/viewer/2022051802/5afc7a4b7f8b9aa34d8c22d6/html5/thumbnails/15.jpg)
P, P1, P2
)()(
1
11
i
Ni
iiNi
zExzE
p
)()(
1
12
i
Ni
iiNi
zENxzEM
p M= observed no. of heads
NzE
p iNi )(1
![Page 16: CS344: Introduction to Artificial Intelligence (associated ...pb/cs344-2014/cs344-lect22-25-em-baum...CS344: Introduction to Artificial Intelligence (associated lab: ... Parameters](https://reader033.fdocuments.in/reader033/viewer/2022051802/5afc7a4b7f8b9aa34d8c22d6/html5/thumbnails/16.jpg)
Application of EM: HMM Training
Baum Welch or Forward Backward Algorithm
![Page 17: CS344: Introduction to Artificial Intelligence (associated ...pb/cs344-2014/cs344-lect22-25-em-baum...CS344: Introduction to Artificial Intelligence (associated lab: ... Parameters](https://reader033.fdocuments.in/reader033/viewer/2022051802/5afc7a4b7f8b9aa34d8c22d6/html5/thumbnails/17.jpg)
Key Intuition
Given: Training sequenceInitialization: Probability valuesCompute: Pr (state seq | training seq)
get expected count of transitioncompute rule probabilities
Approach: Initialize the probabilities and recompute them… EM like approach
a
b
a
b
a
b
a
b
q r
![Page 18: CS344: Introduction to Artificial Intelligence (associated ...pb/cs344-2014/cs344-lect22-25-em-baum...CS344: Introduction to Artificial Intelligence (associated lab: ... Parameters](https://reader033.fdocuments.in/reader033/viewer/2022051802/5afc7a4b7f8b9aa34d8c22d6/html5/thumbnails/18.jpg)
Baum-Welch algorithm: counts
String = abb aaa bbb aaa
Sequence of states with respect to input symbols
a, b
a,b
q ra,b
rqrqqqrqrqqrq aaabbbaaabba o/p seq
State seq
a,b
![Page 19: CS344: Introduction to Artificial Intelligence (associated ...pb/cs344-2014/cs344-lect22-25-em-baum...CS344: Introduction to Artificial Intelligence (associated lab: ... Parameters](https://reader033.fdocuments.in/reader033/viewer/2022051802/5afc7a4b7f8b9aa34d8c22d6/html5/thumbnails/19.jpg)
Calculating probabilities from tableTable of counts
T=#statesA=#alphabet symbols
Now if we have a non-deterministic transitions then multiple state seq possible for the given o/p seq (ref. to previous slide’s feature). Our aim is to find expected count through this.
8/3)( bqP b
Src Dest O/P Count
q r a 5
q q b 3
r q a 3
r q b 2
8/5)( rqP a
T
l
A
m
li
jiji
swsc
swscswsPm
kk
1 1)(
)()(
![Page 20: CS344: Introduction to Artificial Intelligence (associated ...pb/cs344-2014/cs344-lect22-25-em-baum...CS344: Introduction to Artificial Intelligence (associated lab: ... Parameters](https://reader033.fdocuments.in/reader033/viewer/2022051802/5afc7a4b7f8b9aa34d8c22d6/html5/thumbnails/20.jpg)
Interplay Between Two Equations
T
l
A
m
lWmi
jWijWi
ssc
sscssPk
k
0 0
)(
)()(
1,0
),,()|()(
,01,0,01,0n
k
k
snn
jWinn
jWi
wSssnWSPssC
wk
No. of times the transitions sisj occurs in the string
![Page 21: CS344: Introduction to Artificial Intelligence (associated ...pb/cs344-2014/cs344-lect22-25-em-baum...CS344: Introduction to Artificial Intelligence (associated lab: ... Parameters](https://reader033.fdocuments.in/reader033/viewer/2022051802/5afc7a4b7f8b9aa34d8c22d6/html5/thumbnails/21.jpg)
Illustration
a:0.67
b:1.0
b:0.17
a:0.16
q r
a:0.04
b:1.0
b:0.48
a:0.48
q r
Actual (Desired) HMM
Initial guess
![Page 22: CS344: Introduction to Artificial Intelligence (associated ...pb/cs344-2014/cs344-lect22-25-em-baum...CS344: Introduction to Artificial Intelligence (associated lab: ... Parameters](https://reader033.fdocuments.in/reader033/viewer/2022051802/5afc7a4b7f8b9aa34d8c22d6/html5/thumbnails/22.jpg)
One run of Baum-Welch algorithm: string ababb
P(path)
q r q r q q 0.00077 0.00154 0.00154 0 0.00077
q r q q q q 0.00442 0.00442 0.00442 0.00442
0.00884
q q q r q q 0.00442 0.00442 0.00442 0.00442
0.00884
q q q q q q 0.02548 0.0 0.000 0.05096
0.07644
Rounded Total 0.035 0.01 0.01 0.06 0.095
New Probabilities (P) 0.06=(0.01/(0.01+0.06+
0.095)
1.0 0.36 0.581
qbq qaq raq qbr a ba ab bb bba
* ε is considered as starting and ending symbol of the input sequence string.
State sequences
Through multiple iterations the probability values will converge.
![Page 23: CS344: Introduction to Artificial Intelligence (associated ...pb/cs344-2014/cs344-lect22-25-em-baum...CS344: Introduction to Artificial Intelligence (associated lab: ... Parameters](https://reader033.fdocuments.in/reader033/viewer/2022051802/5afc7a4b7f8b9aa34d8c22d6/html5/thumbnails/23.jpg)
Computational part (1/2)
ntn
jtkt
it
n
nt snn
jtkt
it
n
snn
jWinn
n
snn
jWinn
jWi
WsSwWsSPWP
WSsSwWsSPWP
WSssnWSPWP
WSssnWSPssC
n
n
k
n
kk
,0,01
,0
,0,01,01
,0
,01,0,01,0,0
,01,0,01,0
)],,,([)(
1
)],,,,([)(
1
)],,(),([)(
1
)],,()|([)(
1,0
1,0
1,0
w0 w1 w2 wk wn-1 wn
S0 S1 S1 … Si Sj … Sn-1 Sn Sn+1
![Page 24: CS344: Introduction to Artificial Intelligence (associated ...pb/cs344-2014/cs344-lect22-25-em-baum...CS344: Introduction to Artificial Intelligence (associated lab: ... Parameters](https://reader033.fdocuments.in/reader033/viewer/2022051802/5afc7a4b7f8b9aa34d8c22d6/html5/thumbnails/24.jpg)
Computational part (2/2)
),1()(),1(
),1()|,(),1(
),1()|,(),1(
)|(),|,(),(
),,,,(
),,,(
0
0
1
0
1
1
0,11,011,0
0,111,0
0,01
jtBswsPitF
jtBsSwWsSPitF
jtBsSwWsSPitF
sSWPsSWwWsSPsSWP
WwWsSsSWP
WwWsSsSP
n
t
ji
n
t
itkt
jt
n
t
itkt
jt
jt
n
tnt
ittkt
jt
itt
n
tntkt
jt
itt
n
tnkt
jt
it
k
w0 w1 w2 wk wn-1 wn
S0 S1 S1 … Si Sj … Sn-1 Sn Sn+1
![Page 25: CS344: Introduction to Artificial Intelligence (associated ...pb/cs344-2014/cs344-lect22-25-em-baum...CS344: Introduction to Artificial Intelligence (associated lab: ... Parameters](https://reader033.fdocuments.in/reader033/viewer/2022051802/5afc7a4b7f8b9aa34d8c22d6/html5/thumbnails/25.jpg)
Discussions1. Symmetry breaking:
Example: Symmetry breaking leads to no change in initial values
2 Struck in Local maxima3. Label bias problem
Probabilities have to sum to 1.Values can rise at the cost of fall of values for others.
s
ss
b:1.0
b:0.5
a:0.5
a:1.0
s
ss
a:0.5
b:0.5
a:0.25
a:0.5b:0.5
a:0.25
b:0.25
b:0.5
Desired Initialized
![Page 26: CS344: Introduction to Artificial Intelligence (associated ...pb/cs344-2014/cs344-lect22-25-em-baum...CS344: Introduction to Artificial Intelligence (associated lab: ... Parameters](https://reader033.fdocuments.in/reader033/viewer/2022051802/5afc7a4b7f8b9aa34d8c22d6/html5/thumbnails/26.jpg)
Another application of EM
WSD
Mitesh Khapra, Salil Joshi and PushpakBhattacharyya, It takes two to Tango: A Bilingual Unsupervised Approach for estimating Sense Distributions using Expectation Maximization, 5th International Joint Conference on Natural Language Processing (IJCNLP 2011), Chiang Mai, Thailand, November 2011.
![Page 27: CS344: Introduction to Artificial Intelligence (associated ...pb/cs344-2014/cs344-lect22-25-em-baum...CS344: Introduction to Artificial Intelligence (associated lab: ... Parameters](https://reader033.fdocuments.in/reader033/viewer/2022051802/5afc7a4b7f8b9aa34d8c22d6/html5/thumbnails/27.jpg)
Definition: WSD
Given a context: Get “meaning” of
a set of words (targeted wsd) or all words (all words wsd)
The “Meaning” is usually given by the id of senses in a sense repository usually the wordnet
![Page 28: CS344: Introduction to Artificial Intelligence (associated ...pb/cs344-2014/cs344-lect22-25-em-baum...CS344: Introduction to Artificial Intelligence (associated lab: ... Parameters](https://reader033.fdocuments.in/reader033/viewer/2022051802/5afc7a4b7f8b9aa34d8c22d6/html5/thumbnails/28.jpg)
Example: “operation” (from Princeton Wordnet) Operation, surgery, surgical operation, surgical procedure, surgical
process -- (a medical procedure involving an incision with instruments; performed to repair damage or arrest disease in a living body; "they will schedule the operation as soon as an operating room is available"; "he died while undergoing surgery") TOPIC->(noun) surgery#1
Operation, military operation -- (activity by a military or naval force (as a maneuver or campaign); "it was a joint operation of the navy and air force") TOPIC->(noun) military#1, armed forces#1, armed services#1, military machine#1, war machine#1
mathematical process, mathematical operation, operation --((mathematics) calculation by mathematical methods; "the problems at the end of the chapter demonstrated the mathematical processes involved in the derivation"; "they were learning the basic operations of arithmetic") TOPIC->(noun) mathematics#1, math#1, maths#1
![Page 29: CS344: Introduction to Artificial Intelligence (associated ...pb/cs344-2014/cs344-lect22-25-em-baum...CS344: Introduction to Artificial Intelligence (associated lab: ... Parameters](https://reader033.fdocuments.in/reader033/viewer/2022051802/5afc7a4b7f8b9aa34d8c22d6/html5/thumbnails/29.jpg)
Hindi Wordnet
Dravidian Language Wordnet
North East Language Wordnet
Marathi Wordnet
Sanskrit Wordnet
EnglishWordnet
Bengali Wordnet
Punjabi Wordnet
KonkaniWordnet
UrduWordnet
WSD for ALL Indian languages: Critical resource: INDOWORDNET
Gujarati Wordnet
Oriya Wordnet
Kashmiri Wordnet
![Page 30: CS344: Introduction to Artificial Intelligence (associated ...pb/cs344-2014/cs344-lect22-25-em-baum...CS344: Introduction to Artificial Intelligence (associated lab: ... Parameters](https://reader033.fdocuments.in/reader033/viewer/2022051802/5afc7a4b7f8b9aa34d8c22d6/html5/thumbnails/30.jpg)
Synset Based Multilingual Dictionary
Expansion approach for creating wordnets [Mohanty et. al., 2008]
Instead of creating from scratch link to the synsets of existing wordnet
Relations get borrowed from existing wordnet
S1
S3 S4
S6
S5
S7
S2
S1
S3 S4
S6
S5
S7
S2
S1
S3 S4
S6
S5
S7
S2 A sample entry from the MultiDict
Hindi Marathi
![Page 31: CS344: Introduction to Artificial Intelligence (associated ...pb/cs344-2014/cs344-lect22-25-em-baum...CS344: Introduction to Artificial Intelligence (associated lab: ... Parameters](https://reader033.fdocuments.in/reader033/viewer/2022051802/5afc7a4b7f8b9aa34d8c22d6/html5/thumbnails/31.jpg)
Hypothesis
Sense distributions across languages is invariant!! Proportion of times a sense appears in a
language is uniform across languages!
E.g., proportion of times the sense of “sun” appears in any language through “sun” and its synonyms remains the same!
![Page 32: CS344: Introduction to Artificial Intelligence (associated ...pb/cs344-2014/cs344-lect22-25-em-baum...CS344: Introduction to Artificial Intelligence (associated lab: ... Parameters](https://reader033.fdocuments.in/reader033/viewer/2022051802/5afc7a4b7f8b9aa34d8c22d6/html5/thumbnails/32.jpg)
ESTIMATING SENSE DISTRIBUTIONS
If sense tagged Marathi corpus were available, we could have estimated
But such a corpus is not available
![Page 33: CS344: Introduction to Artificial Intelligence (associated ...pb/cs344-2014/cs344-lect22-25-em-baum...CS344: Introduction to Artificial Intelligence (associated lab: ... Parameters](https://reader033.fdocuments.in/reader033/viewer/2022051802/5afc7a4b7f8b9aa34d8c22d6/html5/thumbnails/33.jpg)
EM for estimating sense distributions
‘
Problem: ‘galaa’ itself is ambiguous Its raw count cannot be used as it
is
Solution: Its count should be weighted by
![Page 34: CS344: Introduction to Artificial Intelligence (associated ...pb/cs344-2014/cs344-lect22-25-em-baum...CS344: Introduction to Artificial Intelligence (associated lab: ... Parameters](https://reader033.fdocuments.in/reader033/viewer/2022051802/5afc7a4b7f8b9aa34d8c22d6/html5/thumbnails/34.jpg)
Word correspondencesSense inEnglish
Smar
(Marathisensenumber)
wordsmar
(partial list)Shin=π(Smar)(projectedHindi sensenumber)
wordsmar
(partial listof words inprojectedHindisense)
Neck 1 maan, greeva 1 gardan, galaa
Respect 2 maan,satkaar,sanmaan
3 izzat, aadar
Voice 3 awaaz, swar 2 galaa
![Page 35: CS344: Introduction to Artificial Intelligence (associated ...pb/cs344-2014/cs344-lect22-25-em-baum...CS344: Introduction to Artificial Intelligence (associated lab: ... Parameters](https://reader033.fdocuments.in/reader033/viewer/2022051802/5afc7a4b7f8b9aa34d8c22d6/html5/thumbnails/35.jpg)
EM for estimating sense distributions
‘
M-Step
E-Step
)().#|()().#|()().#|()().#|()().#|()().#|(
)|(
1111
11
1
swarswarSPawaajawaajSPgreevagreevaSPmaanmaanSPgreevagreevaSPmaanmaanSP
galaSP
marmarmarmar
marmar
hin
)().#|()().#|()().#|()().#|()().#|()().#|(
)|(
2211
11
1
izzatizzatSPaadaraadarSPgalagalaSPgardangardanSPgalagalaSPgardangardanSP
maanSP
hinhinhinhin
hinhin
mar
![Page 36: CS344: Introduction to Artificial Intelligence (associated ...pb/cs344-2014/cs344-lect22-25-em-baum...CS344: Introduction to Artificial Intelligence (associated lab: ... Parameters](https://reader033.fdocuments.in/reader033/viewer/2022051802/5afc7a4b7f8b9aa34d8c22d6/html5/thumbnails/36.jpg)
General Algo
stepExxSP
vvSPuSP L
jLSxS
LiL
SvLi
LjL
Lj
LiL
)2()()).#|((
)().#|)(()|(
1
21
21
1
21
21
)(
)(
)(
)3()()).#|((
)().#|)(()|(
1
2
2
2
12
22
2
11
22
)(
)(
LiL
Lk
LmL
SyS
LkL
SvLk
SSwhere
stepMyySP
vvSPvSP
LmL
Lm
LiL
![Page 37: CS344: Introduction to Artificial Intelligence (associated ...pb/cs344-2014/cs344-lect22-25-em-baum...CS344: Introduction to Artificial Intelligence (associated lab: ... Parameters](https://reader033.fdocuments.in/reader033/viewer/2022051802/5afc7a4b7f8b9aa34d8c22d6/html5/thumbnails/37.jpg)
Results Algorithm
MarathiP % R % F %
IWSD (training onself corpora; noparameterprojection) 81.29 80.42 80.85
IWSD (training onHindi and projectingparameters forMarathi) 73.45 70.33 71.86
EM (no sensecorpora in eitherHindi or Marathi) 68.57 67.93 68.25
Wordnet Baseline 58.07 58.07 58.07
![Page 38: CS344: Introduction to Artificial Intelligence (associated ...pb/cs344-2014/cs344-lect22-25-em-baum...CS344: Introduction to Artificial Intelligence (associated lab: ... Parameters](https://reader033.fdocuments.in/reader033/viewer/2022051802/5afc7a4b7f8b9aa34d8c22d6/html5/thumbnails/38.jpg)
Results & Discussions
Performance of projection using manual cross linkages is within 7% of Self-Training
Performance of projection using probabilistic cross linkages is within 10-12% of Self-Training – remarkable since no additional cost incurred in target language
Both MCL and PCL give 10-14% improvement over Wordnet First Sense Baseline
Not prudent to stick to knowledge based and unsupervised approaches –they come nowhere close to MCL or PCL
Manual Cross LinkagesProbabilistic Cross LinkagesSkyline - self training data is available
Wordnet first sense baseline
S-O-T-A Knowledge Based ApproachS-O-T-A Unsupervised Approach
Our values
![Page 39: CS344: Introduction to Artificial Intelligence (associated ...pb/cs344-2014/cs344-lect22-25-em-baum...CS344: Introduction to Artificial Intelligence (associated lab: ... Parameters](https://reader033.fdocuments.in/reader033/viewer/2022051802/5afc7a4b7f8b9aa34d8c22d6/html5/thumbnails/39.jpg)
Delving deeper into EM
![Page 40: CS344: Introduction to Artificial Intelligence (associated ...pb/cs344-2014/cs344-lect22-25-em-baum...CS344: Introduction to Artificial Intelligence (associated lab: ... Parameters](https://reader033.fdocuments.in/reader033/viewer/2022051802/5afc7a4b7f8b9aa34d8c22d6/html5/thumbnails/40.jpg)
Some Useful mathematical concepts
Convex/ concave functions Jensen’s inequality Kullback–Leibler distance/divergence
![Page 41: CS344: Introduction to Artificial Intelligence (associated ...pb/cs344-2014/cs344-lect22-25-em-baum...CS344: Introduction to Artificial Intelligence (associated lab: ... Parameters](https://reader033.fdocuments.in/reader033/viewer/2022051802/5afc7a4b7f8b9aa34d8c22d6/html5/thumbnails/41.jpg)
)( 1xf
)( 2xf
)()1()( 21 xfxf
))1(( 21 xxf
21 )1( xxz
![Page 42: CS344: Introduction to Artificial Intelligence (associated ...pb/cs344-2014/cs344-lect22-25-em-baum...CS344: Introduction to Artificial Intelligence (associated lab: ... Parameters](https://reader033.fdocuments.in/reader033/viewer/2022051802/5afc7a4b7f8b9aa34d8c22d6/html5/thumbnails/42.jpg)
Criteria for convexity
A function f(x) is said to be convex inthe interval [a,b] iff
)()1()())1(( 2121 xfxfxxf
],[,
21
21
baxxxx
![Page 43: CS344: Introduction to Artificial Intelligence (associated ...pb/cs344-2014/cs344-lect22-25-em-baum...CS344: Introduction to Artificial Intelligence (associated lab: ... Parameters](https://reader033.fdocuments.in/reader033/viewer/2022051802/5afc7a4b7f8b9aa34d8c22d6/html5/thumbnails/43.jpg)
Jensen’s inequality
For any convex function f(x)
n
iii
n
iii xfxf
11)()(
Where 11
n
ii and 10, ii
![Page 44: CS344: Introduction to Artificial Intelligence (associated ...pb/cs344-2014/cs344-lect22-25-em-baum...CS344: Introduction to Artificial Intelligence (associated lab: ... Parameters](https://reader033.fdocuments.in/reader033/viewer/2022051802/5afc7a4b7f8b9aa34d8c22d6/html5/thumbnails/44.jpg)
Proof of Jensen´s inequality
Method:- By induction on N Base case:-
ally truef(x),trivif(x)λλ
λf(x)x)f(λN
i
11 where.
1
![Page 45: CS344: Introduction to Artificial Intelligence (associated ...pb/cs344-2014/cs344-lect22-25-em-baum...CS344: Introduction to Artificial Intelligence (associated lab: ... Parameters](https://reader033.fdocuments.in/reader033/viewer/2022051802/5afc7a4b7f8b9aa34d8c22d6/html5/thumbnails/45.jpg)
Another base case
N = 2
convex is f(x) since )()1()(1 since ))1((
)(
2111
212111
2211
xfxfxxf
xxf
![Page 46: CS344: Introduction to Artificial Intelligence (associated ...pb/cs344-2014/cs344-lect22-25-em-baum...CS344: Introduction to Artificial Intelligence (associated lab: ... Parameters](https://reader033.fdocuments.in/reader033/viewer/2022051802/5afc7a4b7f8b9aa34d8c22d6/html5/thumbnails/46.jpg)
Hypothesis
n
iii
n
iii xfxf
kN
11
)()( i.e
for trueSuppose
![Page 47: CS344: Introduction to Artificial Intelligence (associated ...pb/cs344-2014/cs344-lect22-25-em-baum...CS344: Introduction to Artificial Intelligence (associated lab: ... Parameters](https://reader033.fdocuments.in/reader033/viewer/2022051802/5afc7a4b7f8b9aa34d8c22d6/html5/thumbnails/47.jpg)
Induction Step
1
)()(
given
)()(
thatShow
1321
11
1
1
1
1
kk
k
iii
k
iii
k
iii
k
iii
xfxf
xfxf
![Page 48: CS344: Introduction to Artificial Intelligence (associated ...pb/cs344-2014/cs344-lect22-25-em-baum...CS344: Introduction to Artificial Intelligence (associated lab: ... Parameters](https://reader033.fdocuments.in/reader033/viewer/2022051802/5afc7a4b7f8b9aa34d8c22d6/html5/thumbnails/48.jpg)
Proof
)1( where )()()1(
convexityBy )())1(
()1(
))1(
)1((
)(
111
11
111 1
1
111 1
1
11332211
k
iikk
k
iiik
kk
k
i k
iik
kk
k
i k
iik
kk
xfxf
xfxf
xxf
xxxxf
![Page 49: CS344: Introduction to Artificial Intelligence (associated ...pb/cs344-2014/cs344-lect22-25-em-baum...CS344: Introduction to Artificial Intelligence (associated lab: ... Parameters](https://reader033.fdocuments.in/reader033/viewer/2022051802/5afc7a4b7f8b9aa34d8c22d6/html5/thumbnails/49.jpg)
Continued...
Examine each µi
)1()1(
)1(
)1()1()1()1(
1
1
1
321
11
3
1
2
1
1
3211
k
k
k
k
k
k
kkk
k
k
ii
![Page 50: CS344: Introduction to Artificial Intelligence (associated ...pb/cs344-2014/cs344-lect22-25-em-baum...CS344: Introduction to Artificial Intelligence (associated lab: ... Parameters](https://reader033.fdocuments.in/reader033/viewer/2022051802/5afc7a4b7f8b9aa34d8c22d6/html5/thumbnails/50.jpg)
Continued...
Therefore,
proved is inequality Jensen´s Thus
)()(
stepinduction at theFinally
)()(
)()()1(
)()()1(
1
1
1
1
111
111
1
111
1
i
k
ii
k
iii
kki
k
ii
kki
k
iik
kk
k
iiik
xfxf
xfxf
xfxf
xfxf
![Page 51: CS344: Introduction to Artificial Intelligence (associated ...pb/cs344-2014/cs344-lect22-25-em-baum...CS344: Introduction to Artificial Intelligence (associated lab: ... Parameters](https://reader033.fdocuments.in/reader033/viewer/2022051802/5afc7a4b7f8b9aa34d8c22d6/html5/thumbnails/51.jpg)
KL -divergence
We will do the discrete form of probability distribution.
Given two probability distribution P,Q on the random variable
X : x1,x2,x3...xN
P:p1=p(x1 ), p2=p(x2), ... pn=p(xn) Q:q1=q(x1 ), q2=q(x2), ... qn=q(xn)
![Page 52: CS344: Introduction to Artificial Intelligence (associated ...pb/cs344-2014/cs344-lect22-25-em-baum...CS344: Introduction to Artificial Intelligence (associated lab: ... Parameters](https://reader033.fdocuments.in/reader033/viewer/2022051802/5afc7a4b7f8b9aa34d8c22d6/html5/thumbnails/52.jpg)
KLD definition
Q)(EP)(E DKL(P,Q)
DD
q,p qpp D KL(P,Q)
pp
iii
iN
ii
loglog
as written also0 and cassymmetri is
11log1
![Page 53: CS344: Introduction to Artificial Intelligence (associated ...pb/cs344-2014/cs344-lect22-25-em-baum...CS344: Introduction to Artificial Intelligence (associated lab: ... Parameters](https://reader033.fdocuments.in/reader033/viewer/2022051802/5afc7a4b7f8b9aa34d8c22d6/html5/thumbnails/53.jpg)
Proof: KLD>=0
)x(pxp
],[x pqp
qpp
qpp KL(P,Q)
i
N
iii
N
ii
i
iN
ii
i
iN
ii
i
iN
ii
loglog So
0in convex islog
loglog
-:Proof
0log
11
11
1
![Page 54: CS344: Introduction to Artificial Intelligence (associated ...pb/cs344-2014/cs344-lect22-25-em-baum...CS344: Introduction to Artificial Intelligence (associated lab: ... Parameters](https://reader033.fdocuments.in/reader033/viewer/2022051802/5afc7a4b7f8b9aa34d8c22d6/html5/thumbnails/54.jpg)
Proof cntd.
Apply Jensen’s inequality
10log
loglog
loglog So
11
11
11
N
ii
i
iN
ii
i
iN
ii
N
ii
i
iN
ii
i
iN
ii
q qpp
qppq
)pq(p
pqp
![Page 55: CS344: Introduction to Artificial Intelligence (associated ...pb/cs344-2014/cs344-lect22-25-em-baum...CS344: Introduction to Artificial Intelligence (associated lab: ... Parameters](https://reader033.fdocuments.in/reader033/viewer/2022051802/5afc7a4b7f8b9aa34d8c22d6/html5/thumbnails/55.jpg)
Convexity of –log x
1 1)1(
1)1(
1)1(
)1(
log)1(log))1(log(..
)log)(1()log())1(log(
2
1
1
2
2
1
1
112
2
1
12121
2121
2121
1
1
1
xxy
yy
xx
xx
xx
xx
xxxx
xxxxei
xxxx
![Page 56: CS344: Introduction to Artificial Intelligence (associated ...pb/cs344-2014/cs344-lect22-25-em-baum...CS344: Introduction to Artificial Intelligence (associated lab: ... Parameters](https://reader033.fdocuments.in/reader033/viewer/2022051802/5afc7a4b7f8b9aa34d8c22d6/html5/thumbnails/56.jpg)
Interesting problem
Try to prove:-
21 2121
21
2211 ww ww xxww
xwxw
![Page 57: CS344: Introduction to Artificial Intelligence (associated ...pb/cs344-2014/cs344-lect22-25-em-baum...CS344: Introduction to Artificial Intelligence (associated lab: ... Parameters](https://reader033.fdocuments.in/reader033/viewer/2022051802/5afc7a4b7f8b9aa34d8c22d6/html5/thumbnails/57.jpg)
2nd definition of convexity
Theorem:
.convex is log So.in convex is then ,0
and in abledifferenti twiceis )( If
x-[a,b]f(x)[a,b] x (x)f
[a,b]xf''
![Page 58: CS344: Introduction to Artificial Intelligence (associated ...pb/cs344-2014/cs344-lect22-25-em-baum...CS344: Introduction to Artificial Intelligence (associated lab: ... Parameters](https://reader033.fdocuments.in/reader033/viewer/2022051802/5afc7a4b7f8b9aa34d8c22d6/html5/thumbnails/58.jpg)
Lemma 1
],[ s t,and s t, ),()(then ],[in 0)( If
''
''
batssftfbaxf
a s z t b
![Page 59: CS344: Introduction to Artificial Intelligence (associated ...pb/cs344-2014/cs344-lect22-25-em-baum...CS344: Introduction to Artificial Intelligence (associated lab: ... Parameters](https://reader033.fdocuments.in/reader033/viewer/2022051802/5afc7a4b7f8b9aa34d8c22d6/html5/thumbnails/59.jpg)
Mean Value Theorem
npm(p) m)f(nf(m)f(n)xf
(z,a)s (s) a)f(zf(a)f(z)
'
'
where)(function any For
![Page 60: CS344: Introduction to Artificial Intelligence (associated ...pb/cs344-2014/cs344-lect22-25-em-baum...CS344: Introduction to Artificial Intelligence (associated lab: ... Parameters](https://reader033.fdocuments.in/reader033/viewer/2022051802/5afc7a4b7f8b9aa34d8c22d6/html5/thumbnails/60.jpg)
Alternative form of z
Add –λz to both sides
21 1 λ)x(λxz
)xλ(zz)λ)(x(λ)x(z)λ(xλ)z(
12
21
111
![Page 61: CS344: Introduction to Artificial Intelligence (associated ...pb/cs344-2014/cs344-lect22-25-em-baum...CS344: Introduction to Artificial Intelligence (associated lab: ... Parameters](https://reader033.fdocuments.in/reader033/viewer/2022051802/5afc7a4b7f8b9aa34d8c22d6/html5/thumbnails/61.jpg)
Alternative form of convexity
Add –λf(z) to both sides
)λ)f(x()λf(x)λ)x(f(λ( 2121 11
)λ)f(x(f(z)))λ(f(xλ)f(z)()λ)f(x(f(z)))λ(f(xλ)f(z)(
λf(z))λ)f(x()λf(xλf(z)f(z)
21
21
21
1111
1
![Page 62: CS344: Introduction to Artificial Intelligence (associated ...pb/cs344-2014/cs344-lect22-25-em-baum...CS344: Introduction to Artificial Intelligence (associated lab: ... Parameters](https://reader033.fdocuments.in/reader033/viewer/2022051802/5afc7a4b7f8b9aa34d8c22d6/html5/thumbnails/62.jpg)
Proof: second derivative >=0 implies convexity (1/2)We have that,
(2) ][z]-)[x-(1(1) )]()([)]()()[1(
)()1()()(
)1(
12
12
21
21
xzxfzfzfxf
xfxfzf
xxz
![Page 63: CS344: Introduction to Artificial Intelligence (associated ...pb/cs344-2014/cs344-lect22-25-em-baum...CS344: Introduction to Artificial Intelligence (associated lab: ... Parameters](https://reader033.fdocuments.in/reader033/viewer/2022051802/5afc7a4b7f8b9aa34d8c22d6/html5/thumbnails/63.jpg)
Second derivative >=0 implies convexity (2/2)
(2) Is equivalent to
For some s and t , where
Now since f’’(x) >=0
)(')(' sftf
Combining this with (1), the result is proved
))(()).(()1( 12 xzsfxtf
21 xtzsx
![Page 64: CS344: Introduction to Artificial Intelligence (associated ...pb/cs344-2014/cs344-lect22-25-em-baum...CS344: Introduction to Artificial Intelligence (associated lab: ... Parameters](https://reader033.fdocuments.in/reader033/viewer/2022051802/5afc7a4b7f8b9aa34d8c22d6/html5/thumbnails/64.jpg)
Why all this In EM, we maximize the expectation of
log likelihood of the data Log is a concave function We have to take iterative steps to get
to the maximum There are two unknown values: Z
(unobserved data) and θ (parameters) From θ, get new value of Z (E-step) From Z, get new value of θ (M-step)
![Page 65: CS344: Introduction to Artificial Intelligence (associated ...pb/cs344-2014/cs344-lect22-25-em-baum...CS344: Introduction to Artificial Intelligence (associated lab: ... Parameters](https://reader033.fdocuments.in/reader033/viewer/2022051802/5afc7a4b7f8b9aa34d8c22d6/html5/thumbnails/65.jpg)
How to change θ How to choose the next θ? Take
Where,X: observed dataZ: unobserved dataΘ: parameterLL(X,Z:θn): log likelihood of complete
data with parameter value at θn
This is in lieu of, for example, gradient ascent
θnΘ
At every step LL(.) willIncrease, ultimatelyreaching local/globalmaximum
)):,():,((maxarg nZXLLZXLL
![Page 66: CS344: Introduction to Artificial Intelligence (associated ...pb/cs344-2014/cs344-lect22-25-em-baum...CS344: Introduction to Artificial Intelligence (associated lab: ... Parameters](https://reader033.fdocuments.in/reader033/viewer/2022051802/5afc7a4b7f8b9aa34d8c22d6/html5/thumbnails/66.jpg)
Why expectation of log likelihood? (1/2) P(X:θ) may not be a convenient mathematical
expression Deal with P(X,Z:θ), marginalized over Z Log(ΣZP(X,Z:θ)) is mathematically processed with
multiplying by P(Z|X: θn) which for each Z is between 0 and 1 and sums to 1
Then Jensen inequality will give
));|(
);,(log();|(
at y probabilit theis );|( where ),;|(by devide andmultiply
));|(
);,();|(log());,(log(
nzn
nn
n
z n
n
z
XZPZXPXZP
XZPXZP
XZPZXPXZPZXP
![Page 67: CS344: Introduction to Artificial Intelligence (associated ...pb/cs344-2014/cs344-lect22-25-em-baum...CS344: Introduction to Artificial Intelligence (associated lab: ... Parameters](https://reader033.fdocuments.in/reader033/viewer/2022051802/5afc7a4b7f8b9aa34d8c22d6/html5/thumbnails/67.jpg)
Why expectation of log likelihood? (2/2)
Z. w.r.t.data complete of liklihood log ofn expectatio theis (.) where))),;,((log(
));,(log();|(
));();((maxarg So,
));,();,(log();|(
1);|( since
));().;|(
);,(log();|(
));(log());|(
);,();|(log(
));(log());,(log();();(
zz
Zn
n
nZn
Zn
nnZn
nZ n
n
nZ
n
EZXPE
ZXPXZP
XLLXLLZXPZXPXZP
XZPXPXZP
ZXPXZP
XPXZP
ZXPXZP
XPZXPXLLXLL
![Page 68: CS344: Introduction to Artificial Intelligence (associated ...pb/cs344-2014/cs344-lect22-25-em-baum...CS344: Introduction to Artificial Intelligence (associated lab: ... Parameters](https://reader033.fdocuments.in/reader033/viewer/2022051802/5afc7a4b7f8b9aa34d8c22d6/html5/thumbnails/68.jpg)
Why expectation of Z?
If the log likelihood is a linear function of Z, then the expectation can be carried inside of the log likelihood and E(Z) is computed
The above is true when the hidden variables form a mixture of distributions (e..g, in tosses of two coins), and
Each distribution is an exponential distribution like multinomial/normal/poisson