Relation Extraction William Cohen 10-18. Kernels vs Structured Output Spaces Two kinds of structured...
Relation Extraction
William Cohen, 10-18
Kernels vs Structured Output Spaces
• Two kinds of structured learning:
  – HMMs, CRFs, VP-trained HMMs, structured SVMs, stacked learning, …: the output of the learner is structured.
    • E.g., for a linear-chain CRF, the output is a sequence of labels, i.e. a string in Y^n.
  – Bunescu & Mooney (EMNLP, NIPS): the input to the learner is structured.
    • EMNLP: structure derived from a dependency graph.
New!
x1 × x2 × x3 × x4 × x5 = 4*1*3*1*4 = 48 features
…
K(x1 × … × xn, y1 × … × yn) = |(x1 × … × xn) ∩ (y1 × … × yn)|
x → H(x)
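A minimal sketch of the kernel above, assuming each position xi is stored as a Python set of feature strings (the feature names are made up for illustration). Tuples in the intersection must agree position by position, so the size of the intersection equals the product of the per-position overlaps, and the 48-element feature set never has to be enumerated. The brute-force function is only there to check that claim on small inputs.

```python
from itertools import product


def product_kernel(x, y):
    """|(x1 × … × xn) ∩ (y1 × … × yn)| computed as a product of overlaps."""
    if len(x) != len(y):
        return 0  # tuples of different lengths can never be equal
    k = 1
    for xi, yi in zip(x, y):
        k *= len(xi & yi)  # number of feature values shared at this position
    return k


def product_kernel_brute_force(x, y):
    """Same quantity by explicit enumeration; only sensible for tiny inputs."""
    return len(set(product(*x)) & set(product(*y)))


# Hypothetical instance with set sizes 4, 1, 3, 1, 4 (4*1*3*1*4 = 48 features).
x = [{"a1", "a2", "a3", "a4"}, {"b"}, {"c1", "c2", "c3"}, {"d"},
     {"e1", "e2", "e3", "e4"}]
y = [{"a1", "a2", "zz"}, {"b"}, {"c3"}, {"d"}, {"e1", "e4"}]

print(product_kernel(x, x))  # 48
print(product_kernel(x, y))  # 4  (overlaps 2 * 1 * 1 * 1 * 2)
```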
and the NIPS paper…
• Similar representation for relation instances: x1 × … × xn where each xi is a set….
• …but instead of informative dependency path elements, the x’s just represent adjacent tokens.
• To compensate: use a richer kernel
Background: edit distances
Levenshtein distance - example
• distance(“William Cohen”, “Willliam Cohon”)

alignment:
  s     W I L L - I A M _ C O H E N
  t     W I L L L I A M _ C O H O N
  op    C C C C I C C C C C C C S C
  cost  0 0 0 0 1 1 1 1 1 1 1 1 2 2

(“-” marks the gap in s; “_” is the space character; C = copy, I = insert, S = substitute.)
Computing Levenshtein distance - 1
D(i,j) = score of best alignment from s1..si to t1..tj
       = min of:
           D(i-1,j-1),   if si=tj    //copy
           D(i-1,j-1)+1, if si!=tj   //substitute
           D(i-1,j)+1                //insert
           D(i,j-1)+1                //delete
Computing Levenshtein distance - 2
D(i,j) = score of best alignment from s1..si to t1..tj
       = min of:
           D(i-1,j-1) + d(si,tj)   //subst/copy
           D(i-1,j)+1              //insert
           D(i,j-1)+1              //delete
(simplify by letting d(c,d)=0 if c=d, 1 else)
also let D(i,0)=i (for i inserts) and D(0,j)=j
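The recurrence and boundary conditions above translate almost line for line into a table-filling routine. This is a sketch, with function names chosen here for illustration; the table has one extra row and column for the D(i,0) and D(0,j) boundary cases.

```python
def levenshtein_table(s, t):
    """Fill D, where D[i][j] is the cost of the best alignment of s[:i] and t[:j]."""
    n, m = len(s), len(t)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i  # D(i,0) = i
    for j in range(m + 1):
        D[0][j] = j  # D(0,j) = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = 0 if s[i - 1] == t[j - 1] else 1  # d(si,tj)
            D[i][j] = min(
                D[i - 1][j - 1] + d,  # subst/copy
                D[i - 1][j] + 1,      # si aligned with a gap
                D[i][j - 1] + 1,      # tj aligned with a gap
            )
    return D


def levenshtein(s, t):
    """Levenshtein distance D(|s|,|t|)."""
    return levenshtein_table(s, t)[-1][-1]


print(levenshtein("William Cohen", "Willliam Cohon"))  # 2
print(levenshtein("MCCOHN", "COHEN"))                  # 3
```

The table costs O(|s|·|t|) time and space; only the previous row is needed if the alignment itself is not required.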
Computing Levenshtein distance - 3
D(i,j) = min of:
    D(i-1,j-1) + d(si,tj)   //subst/copy
    D(i-1,j)+1              //insert
    D(i,j-1)+1              //delete

        C  O  H  E  N
    M   1  2  3  4  5
    C   1  2  3  4  5
    C   2  2  3  4  5
    O   3  2  3  4  5
    H   4  3  2  3  4
    N   5  4  3  3  3   = D(s,t)

M ~ __
C ~ __
C ~ C
O ~ O
H ~ H
__ ~ E
N ~ N
Computing Levenshtein distance – 4
D(i,j) = min of:
    D(i-1,j-1) + d(si,tj)   //subst/copy
    D(i-1,j)+1              //insert
    D(i,j-1)+1              //delete

        C  O  H  E  N
    M   1  2  3  4  5
    C   1  2  3  4  5
    C   2  2  3  4  5
    O   3  2  3  4  5
    H   4  3  2  3  4
    N   5  4  3  3  3

A trace indicates where the min value came from, and can be used to find edit operations and/or a best alignment (may be more than 1)
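A sketch of how such a trace can be followed, assuming the same unit costs as in the recurrence. Instead of storing back-pointers, this version rechecks which predecessor could have produced each cell while walking back from D(|s|,|t|). When several predecessors tie, it prefers the diagonal, so it returns one best alignment out of possibly several. The function name and the use of None for a gap are choices made here for illustration.

```python
def best_alignment(s, t):
    """Return (cost, pairs); pairs are (s_char, t_char) with None marking a gap."""
    n, m = len(s), len(t)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i
    for j in range(m + 1):
        D[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = 0 if s[i - 1] == t[j - 1] else 1
            D[i][j] = min(D[i - 1][j - 1] + d, D[i - 1][j] + 1, D[i][j - 1] + 1)

    # Walk back from the final cell, at each step picking a predecessor
    # that explains the stored minimum.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        d = 0 if (i > 0 and j > 0 and s[i - 1] == t[j - 1]) else 1
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + d:
            pairs.append((s[i - 1], t[j - 1]))  # copy or substitute
            i, j = i - 1, j - 1
        elif i > 0 and D[i][j] == D[i - 1][j] + 1:
            pairs.append((s[i - 1], None))      # si aligned with a gap
            i -= 1
        else:
            pairs.append((None, t[j - 1]))      # tj aligned with a gap
            j -= 1
    pairs.reverse()
    return D[n][m], pairs


cost, pairs = best_alignment("MCCOHN", "COHEN")
for a, b in pairs:
    print(a or "__", "~", b or "__")
```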
Affine gap distances
• Levenshtein fails on some pairs that seem quite similar:
William W. Cohen
William W. ‘Don’t call me Dubya’ Cohen
Affine gap distances - 2
• Idea:
  – Current cost of a “gap” of n characters: nG
  – Make this cost: A + (n-1)B, where A is the cost of “opening” a gap, and B is the cost of “continuing” a gap.
Computing Levenshtein distance - variant
D(i,j) = score of best alignment from s1..si to t1..tj

Similarity form (maximize):
D(i,j) = max of:
    D(i-1,j-1) + d(si,tj)   //subst/copy
    D(i-1,j)-1              //insert
    D(i,j-1)-1              //delete
with d(x,x) = 2, d(x,y) = -1 if x!=y

Distance form (minimize):
D(i,j) = min of:
    D(i-1,j-1) + d(si,tj)   //subst/copy
    D(i-1,j)+1              //insert
    D(i,j-1)+1              //delete
with d(x,x) = 0, d(x,y) = 1 if x!=y
Affine gap distances - 3
Before (every gap character costs 1):
D(i,j) = max of:
    D(i-1,j-1) + d(si,tj)   //subst/copy
    D(i-1,j)-1              //insert
    D(i,j-1)-1              //delete

With affine gaps:
D(i,j) = max of:
    D(i-1,j-1) + d(si,tj)
    IS(i-1,j-1) + d(si,tj)
    IT(i-1,j-1) + d(si,tj)
IS(i,j) = max of:           (best score in which si is aligned with a ‘gap’)
    D(i-1,j) - A
    IS(i-1,j) - B
IT(i,j) = max of:           (best score in which tj is aligned with a ‘gap’)
    D(i,j-1) - A
    IT(i,j-1) - B
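A sketch of the three-table recurrence, using the similarity scores d(x,x)=2 and d(x,y)=-1 from the variant slide. The slides do not give boundary conditions or say which table holds the final answer, so those parts are assumptions made here: D(0,0)=0, a leading gap of length n costs A+(n-1)B, unreachable cells are minus infinity, and the result is the best of the three tables at (|s|,|t|). The parameter names gap_open and gap_extend stand for A and B.

```python
NEG_INF = float("-inf")


def affine_gap_score(s, t, gap_open=2, gap_extend=1, match=2, mismatch=-1):
    """Best alignment score where a gap of n characters costs A + (n-1)B."""
    n, m = len(s), len(t)
    D = [[NEG_INF] * (m + 1) for _ in range(n + 1)]   # si aligned with tj
    IS = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # si aligned with a gap
    IT = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # tj aligned with a gap

    # Assumed boundary conditions (not stated on the slides).
    D[0][0] = 0
    for i in range(1, n + 1):
        IS[i][0] = -gap_open - (i - 1) * gap_extend
    for j in range(1, m + 1):
        IT[0][j] = -gap_open - (j - 1) * gap_extend

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = match if s[i - 1] == t[j - 1] else mismatch
            D[i][j] = max(D[i - 1][j - 1], IS[i - 1][j - 1], IT[i - 1][j - 1]) + d
            IS[i][j] = max(D[i - 1][j] - gap_open, IS[i - 1][j] - gap_extend)
            IT[i][j] = max(D[i][j - 1] - gap_open, IT[i][j - 1] - gap_extend)

    # Assumption: the answer is the best score over all three end states.
    return max(D[n][m], IS[n][m], IT[n][m])


print(affine_gap_score("abc", "abc"))   # 6: three copies at +2 each
print(affine_gap_score("ab", "aXXXb"))  # 0: 2 + 2 minus one gap of 3 (2 + 2*1)
```

With these settings one long gap is cheaper than several short ones, which is the behaviour wanted for pairs like the “William W. Cohen” example.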
Affine gap distances - 4
[Figure: state diagram over the three tables D, IS and IT. Edge labels: -d(si,tj) on the three transitions into D, -A on the transitions from D into IS and IT, -B on the self-loops at IS and IT.]
Back to subsequence kernels
Subsequence kernel
set of all sparse subsequences u of x1 × … × xn with each u downweighted according to sparsity
Relaxation of old kernel:
1. We don’t have to match everywhere, just at selected locations
2. For every position in the pattern, we get a penalty of λ
To pick a “feature” inside (x1 … xn)’:
1. Pick a subset of locations i=i1,…,ik and then
2. Pick a feature value in each location
3. In the preprocessed vector x’ weight every feature for i by λ^length(i) = λ^(ik-i1+1)
Subsequence kernel
$$K(s,t) = \sum_{u} \; \sum_{\mathbf{i},\mathbf{j}\,:\; u = s[\mathbf{i}],\; u = t[\mathbf{j}]} \lambda^{\mathrm{length}(\mathbf{i}) + \mathrm{length}(\mathbf{j})}$$
Example
s = 1-Nop7 2-binds 3-readily 4-to 5-the 6-ribosomal 7-protein 8-YTM1
t = 1-Erb1 2-binds 3-to 4-YTM1

s[i] = Nop7, binds, to, YTM1    with i = 1,2,4,8
t[j] = Erb1, binds, to, YTM1    with j = 1,2,3,4
u = (PROT, binds, to, PROT), (PROT, VERB, to, PROT), ...

$$K(s,t) = \sum_{u} \; \sum_{\mathbf{i},\mathbf{j}\,:\; u = s[\mathbf{i}],\; u = t[\mathbf{j}]} \lambda^{\mathrm{length}(\mathbf{i}) + \mathrm{length}(\mathbf{j})}$$
Example
s = 1-Nop7 2-binds 3-readily 4-to 5-the 6-ribosomal 7-protein 8-YTM1
t = 1-Erb1 2-binds 3-to 4-YTM1

s[i] = Nop7, binds, YTM1    with i = 1,2,8
t[j] = Erb1, binds, YTM1    with j = 1,2,4
u = (PROT, binds, PROT), (PROT, VERB, PROT), ...

$$K(s,t) = \sum_{u} \; \sum_{\mathbf{i},\mathbf{j}\,:\; u = s[\mathbf{i}],\; u = t[\mathbf{j}]} \lambda^{\mathrm{length}(\mathbf{i}) + \mathrm{length}(\mathbf{j})}$$
Subsequence kernels w/o features
• Example strings:
  – “Elvis Presley was born on Jan 8”
    s1) PERSON was born on DATE.
  – “William Cohen was born in New York City on April 6”
    s2) PERSON was born in LOCATION on DATE.
• Plausible pattern:
  – PERSON was born … on DATE.
• What we’ll actually learn:
  – u = PERSON … was … born … on … DATE.
  – u matches s if there exists i=i1,…,in so that s[i]=s[i1]…s[in]=u, where i=i1,…,in are increasing indices in s
  – For string s1, i=12345. For string s2, i=12367
[Lodhi et al, JMLR 2002]
Subsequence kernels w/o features
s1) PERSON was born on DATE.
s2) PERSON was born in LOCATION on DATE.
• Pattern:
  – u = PERSON … was … born … on … DATE.
  – u matches s if there exists i=i1,…,in so that s[i]=s[i1]…s[in]=u
  – For string s1, i=12345. For string s2, i=12367
• How do we say that s1 matches better than s2?
  – Weight a match of s to u by λ^length(i), where length(i)=in-i1+1
  – Here that gives λ^5 for s1 versus λ^7 for s2, so s1 scores higher when 0<λ<1
• Now let’s define K(s,t) = the sum over all u that match both s and t of matchWeight(u,s)*matchWeight(u,t)
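The definition of K(s,t) can be checked directly on small inputs by enumerating every increasing index tuple. This is a brute-force sketch for patterns of a fixed length n, with function names chosen here for illustration; it is exponential in n and only meant to make the definition concrete.

```python
from collections import defaultdict
from itertools import combinations


def match_weights(tokens, n, lam):
    """Map each length-n pattern u to the sum of lam**length(i) over its matches."""
    weights = defaultdict(float)
    for idx in combinations(range(len(tokens)), n):  # increasing indices
        u = tuple(tokens[i] for i in idx)
        span = idx[-1] - idx[0] + 1                  # length(i) = in - i1 + 1
        weights[u] += lam ** span
    return weights


def subsequence_kernel_naive(s, t, n, lam):
    """K(s,t): sum over shared patterns u of matchWeight(u,s) * matchWeight(u,t)."""
    ws, wt = match_weights(s, n, lam), match_weights(t, n, lam)
    return sum(w * wt[u] for u, w in ws.items() if u in wt)


s1 = "PERSON was born on DATE".split()
s2 = "PERSON was born in LOCATION on DATE".split()
u = tuple(s1)

print(match_weights(s1, 5, 0.5)[u])  # 0.03125   = 0.5**5
print(match_weights(s2, 5, 0.5)[u])  # 0.0078125 = 0.5**7
print(subsequence_kernel_naive(s1, s2, 5, 0.5))
```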
K’i(s,t) = “we’re paying the λ penalty now” …. #patterns u of length i that match s and t where the pattern extends to the end of s.
These recursions allow dynamic programming
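A sketch of the dynamic program, assuming the standard recursions from Lodhi et al. (JMLR 2002), which the slide text does not spell out: K'_0 = 1; K'_i(sx,t) = λ·K'_i(s,t) + Σ over j with t[j]=x of K'_{i-1}(s, t[1..j-1])·λ^(|t|-j+2); and K_n(sx,t) = K_n(s,t) + Σ over j with t[j]=x of K'_{n-1}(s, t[1..j-1])·λ². Memoisation on prefix lengths plays the role of the DP table. This is the direct form of the recursion, not the faster variant with an extra auxiliary table.

```python
from functools import lru_cache


def subsequence_kernel(s, t, n, lam):
    """Subsequence kernel for patterns of length n via the K' / K recursions."""
    s, t = tuple(s), tuple(t)

    @lru_cache(maxsize=None)
    def k_prime(i, ls, lt):
        # K'_i on the prefixes s[:ls], t[:lt]; gaps up to the ends are charged.
        if i == 0:
            return 1.0
        if min(ls, lt) < i:
            return 0.0
        x = s[ls - 1]
        total = lam * k_prime(i, ls - 1, lt)
        for j in range(lt):  # 0-based j, so the exponent is lt - j + 1
            if t[j] == x:
                total += k_prime(i - 1, ls - 1, j) * lam ** (lt - j + 1)
        return total

    @lru_cache(maxsize=None)
    def k(ls, lt):
        # K_n on the prefixes s[:ls], t[:lt].
        if min(ls, lt) < n:
            return 0.0
        x = s[ls - 1]
        total = k(ls - 1, lt)
        for j in range(lt):
            if t[j] == x:
                total += k_prime(n - 1, ls - 1, j) * lam ** 2
        return total

    return k(len(s), len(t))


print(subsequence_kernel("cat", "car", 2, 0.5))  # 0.0625 (only "c-a" is shared)
```

The inputs can be strings or token lists; each element is compared with ==.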
Subsequence kernel with features
set of all sparse subsequences u of x1 × … × xn with each u downweighted according to sparsity
Relaxation of old kernel:
1. We don’t have to match everywhere, just at selected locations
2. For every position we decide to match at, we get a penalty of λ
To pick a “feature” inside (x1 … xn)’:
1. Pick a subset of locations i=i1,…,ik and then
2. Pick a feature value in each location
3. In the preprocessed vector x’ weight every feature for i by λ^length(i) = λ^(ik-i1+1)
Subsequence kernel w/ features
$$K(s,t) = \sum_{u} \; \sum_{\mathbf{i},\mathbf{j}\,:\; u = s[\mathbf{i}],\; u = t[\mathbf{j}]} \lambda^{\mathrm{length}(\mathbf{i}) + \mathrm{length}(\mathbf{j})}$$
or
$$K(s,t) = \sum_{\mathbf{i},\mathbf{j}\,:\; |\mathbf{i}| = |\mathbf{j}|} \lambda^{\mathrm{length}(\mathbf{i}) + \mathrm{length}(\mathbf{j})} \prod_{k} c(s[i_k], t[j_k])$$
Where c(x,y) = number of ways x and y match (i.e. number of common features)
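A brute-force sketch of the feature version for a fixed pattern length n, assuming each token is a set of feature values and c(x,y) is the size of the overlap. The feature sets below (word plus a coarse tag such as PROT or VERB) are hypothetical, chosen to mirror the Nop7/Erb1 example. The function enumerates all pairs of index tuples, so it is only meant to illustrate the definition on short sentences.

```python
from itertools import combinations


def c(x, y):
    """Number of ways x and y match: the number of common features."""
    return len(x & y)


def feature_subsequence_kernel(s, t, n, lam):
    """Sum over index tuples i, j of length n of lam**(len(i)+len(j)) * prod c."""
    total = 0.0
    for i in combinations(range(len(s)), n):
        for j in combinations(range(len(t)), n):
            ways = 1
            for a, b in zip(i, j):
                ways *= c(s[a], t[b])
                if ways == 0:
                    break  # no common feature at this position
            if ways:
                span = (i[-1] - i[0] + 1) + (j[-1] - j[0] + 1)
                total += ways * lam ** span
    return total


# Hypothetical feature sets for the two example sentences.
s = [{"Nop7", "PROT"}, {"binds", "VERB"}, {"readily", "ADV"}, {"to"},
     {"the"}, {"ribosomal", "ADJ"}, {"protein", "NOUN"}, {"YTM1", "PROT"}]
t = [{"Erb1", "PROT"}, {"binds", "VERB"}, {"to"}, {"YTM1", "PROT"}]

print(feature_subsequence_kernel(s, t, 4, 0.5))
```

With these feature sets the only contributing pair for n=4 is i=1,2,4,8 and j=1,2,3,4, which matches in 1·2·1·2 = 4 ways, so the result is 4·λ^12.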
[Figures: the recursions, annotated for the feature version: sums run over all j, and each term is multiplied by c(x,t[j]), the number of ways x and t[j] match (i.e. number of common features).]
Additional details
• Special domain-specific tricks for combining the subsequences for what matches in the fore, aft, and between sections of a relation-instance pair.
  – Subsequences are of length less than 4.
• Is DP needed for this now?
  – Count fore-between, between-aft, and between subsequences separately.
Results
Protein-protein interaction
And now a further extension…
• Suppose we don’t have annotated data, but we do know which proteins interact
  – This is actually pretty reasonable
• We can find examples of sentences with p1, p2 that don’t interact, and be pretty sure they are negative.
• We can find example strings for interacting p1, p2, e.g. “<p1> phosphorylates <p2>”, but we can’t be sure they are all positive examples of a relation.
And now a further extension…
• Multiple instance learning:
  – Instance is a bag {x1,…,xn},y where each xi is a vector of features and
    • If y is positive, some of the xi’s have a positive label
    • If y is negative, none of the xi’s have a positive label.
  – Approaches: EM, SVM techniques
  – Their approach: treat all xi’s as positive examples but downweight the cost of misclassifying them.
[Figure: the learning objective, annotated with:]
• Intercept term
• Slack variables
• Lp = total size of positive bags; Ln = total size of negative bags
• cp < 0.5 is a parameter
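A sketch of the bookkeeping behind “treat all xi’s as positive but downweight them”, turning bags into individually weighted training instances. The exact objective is not recoverable from the slide text, so the weighting used here is an assumption for illustration only: instances from positive bags get cost cp·Ln/Lp and instances from negative bags get cost 1-cp. The function name and the tuple layout are likewise hypothetical.

```python
def flatten_bags(bags, c_pos=0.1):
    """Turn bags into (instance, label, cost_weight) triples.

    bags: list of (instances, bag_label) with bag_label +1 or -1.
    Every instance inherits its bag's label; positive-bag instances get a
    smaller misclassification cost because some of them are false positives.
    """
    if not 0 < c_pos < 0.5:
        raise ValueError("c_pos must lie strictly between 0 and 0.5")
    L_p = sum(len(xs) for xs, y in bags if y > 0)  # total size of positive bags
    L_n = sum(len(xs) for xs, y in bags if y < 0)  # total size of negative bags

    # ASSUMPTION: illustrative weights, not taken from the slides.
    w_pos = c_pos * L_n / L_p if L_p else 0.0
    w_neg = 1.0 - c_pos

    out = []
    for xs, y in bags:
        label = 1 if y > 0 else -1
        weight = w_pos if y > 0 else w_neg
        out.extend((x, label, weight) for x in xs)
    return out


bags = [(["a", "b"], +1), (["c", "d", "e", "f"], -1)]
for triple in flatten_bags(bags, c_pos=0.25):
    print(triple)
```

The resulting weights could be passed to any learner that accepts per-example costs.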
Datasets
Collected with Google search queries, then sentence-segmented.
This is terrible data since there are a lot of spurious correlations with Google, Adobe, …
Datasets
• Fix: downweight words in patterns u if they are strongly correlated with particular bags (e.g. the Google/YouTube bag).
Results