Infusion of Labeled Data into Distant Supervision for ...pershina/acl2014.pdf · Relation...

7
Infusion of Labeled Data into Distant Supervision for Relation Extraction Maria Pershina + Bonan Min ˆ Wei Xu # Ralph Grishman + + New York University, New York, NY {pershina, grishman}@cs.nyu.edu ˆ Raytheon BBN Technologies, Cambridge, MA [email protected] # University of Pennsylvania, Philadelphia, PA [email protected] Abstract Distant supervision usually utilizes only unlabeled data and existing knowledge bases to learn relation extraction models. However, in some cases a small amount of human labeled data is available. In this paper, we demonstrate how a state-of-the- art multi-instance multi-label model can be modified to make use of these reli- able sentence-level labels in addition to the relation-level distant supervision from a database. Experiments show that our ap- proach achieves a statistically significant increase of 13.5% in F-score and 37% in area under the precision recall curve. 1 Introduction Relation extraction is the task of tagging semantic relations between pairs of entities from free text. Recently, distant supervision has emerged as an important technique for relation extraction and has attracted increasing attention because of its effec- tive use of readily available databases (Mintz et al., 2009; Bunescu and Mooney, 2007; Snyder and Barzilay, 2007; Wu and Weld, 2007). It automat- ically labels its own training data by heuristically aligning a knowledge base of facts with an unla- beled corpus. The intuition is that any sentence which mentions a pair of entities (e 1 and e 2 ) that participate in a relation, r, is likely to express the fact r(e 1 ,e 2 ) and thus forms a positive training ex- ample of r. One of most crucial problems in distant super- vision is the inherent errors in the automatically generated training data (Roth et al., 2013). Ta- ble 1 illustrates this problem with a toy exam- ple. Sophisticated multi-instance learning algo- rithms (Riedel et al., 2010; Hoffmann et al., 2011; Most of the work was done when this author was at New York University Surdeanu et al., 2012) have been proposed to ad- dress the issue by loosening the distant supervision assumption. These approaches consider all men- tions of the same pair (e 1 ,e 2 ) and assume that at- least-one mention actually expresses the relation. On top of that, researchers further improved per- formance by explicitly adding preprocessing steps (Takamatsu et al., 2012; Xu et al., 2013) or addi- tional layers inside the model (Ritter et al., 2013; Min et al., 2013) to reduce the effect of training noise. True Positive ... to get information out of captured al-Qaida leader Abu Zubaydah. False Positive ...Abu Zubaydah and former Taliban leader Jalaluddin Haqqani ... False Negative ...Abu Zubaydah is one of Osama bin Laden’s senior operational planners... Table 1: Classic errors in the training data gener- ated by a toy knowledge base of only one entry personTitle(Abu Zubaydah, leader). However, the potential of these previously pro- posed approaches is limited by the inevitable gap between the relation-level knowledge and the instance-level extraction task. In this paper, we present the first effective approach, Guided DS (distant supervision), to incorporate labeled data into distant supervision for extracting relations from sentences. In contrast to simply taking the union of the hand-labeled data and the corpus la- beled by distant supervision as in the previous work by Zhang et al. (2012), we generalize the labeled data through feature selection and model this additional information directly in the latent variable approaches. Aside from previous semi- supervised work that employs labeled and unla- beled data (Yarowsky, 2013; Blum and Mitchell, 1998; Collins and Singer, 2011; Nigam, 2001, and others), this is a learning scheme that combines unlabeled text and two training sources whose quantity and quality are radically different (Liang et al., 2009). To demonstrate the effectiveness of our pro-

Transcript of Infusion of Labeled Data into Distant Supervision for ...pershina/acl2014.pdf · Relation...

Page 1: Infusion of Labeled Data into Distant Supervision for ...pershina/acl2014.pdf · Relation extraction is the task of ... formance by explicitly adding preprocessing steps (Takamatsu

Infusion of Labeled Data into Distant Supervision for Relation Extraction

Maria Pershina

+

Bonan Min

ˆ ⇤Wei Xu

#

Ralph Grishman

+

+New York University, New York, NY{pershina, grishman}@cs.nyu.edu

ˆRaytheon BBN Technologies, Cambridge, [email protected]

#University of Pennsylvania, Philadelphia, [email protected]

Abstract

Distant supervision usually utilizes onlyunlabeled data and existing knowledgebases to learn relation extraction models.However, in some cases a small amountof human labeled data is available. In thispaper, we demonstrate how a state-of-the-art multi-instance multi-label model canbe modified to make use of these reli-able sentence-level labels in addition tothe relation-level distant supervision froma database. Experiments show that our ap-proach achieves a statistically significantincrease of 13.5% in F-score and 37% inarea under the precision recall curve.

1 Introduction

Relation extraction is the task of tagging semanticrelations between pairs of entities from free text.Recently, distant supervision has emerged as animportant technique for relation extraction and hasattracted increasing attention because of its effec-tive use of readily available databases (Mintz etal., 2009; Bunescu and Mooney, 2007; Snyder andBarzilay, 2007; Wu and Weld, 2007). It automat-ically labels its own training data by heuristicallyaligning a knowledge base of facts with an unla-beled corpus. The intuition is that any sentencewhich mentions a pair of entities (e1 and e2) thatparticipate in a relation, r, is likely to express thefact r(e1,e2) and thus forms a positive training ex-ample of r.

One of most crucial problems in distant super-vision is the inherent errors in the automaticallygenerated training data (Roth et al., 2013). Ta-ble 1 illustrates this problem with a toy exam-ple. Sophisticated multi-instance learning algo-rithms (Riedel et al., 2010; Hoffmann et al., 2011;

⇤ Most of the work was done when this author was atNew York University

Surdeanu et al., 2012) have been proposed to ad-dress the issue by loosening the distant supervisionassumption. These approaches consider all men-tions of the same pair (e1,e2) and assume that at-least-one mention actually expresses the relation.On top of that, researchers further improved per-formance by explicitly adding preprocessing steps(Takamatsu et al., 2012; Xu et al., 2013) or addi-tional layers inside the model (Ritter et al., 2013;Min et al., 2013) to reduce the effect of trainingnoise.

True Positive ... to get information out of capturedal-Qaida leader Abu Zubaydah.

False Positive ...Abu Zubaydah and former Talibanleader Jalaluddin Haqqani ...

False Negative ...Abu Zubaydah is one of Osama binLaden’s senior operational planners...

Table 1: Classic errors in the training data gener-ated by a toy knowledge base of only one entrypersonTitle(Abu Zubaydah, leader).

However, the potential of these previously pro-posed approaches is limited by the inevitablegap between the relation-level knowledge and theinstance-level extraction task. In this paper, wepresent the first effective approach, Guided DS(distant supervision), to incorporate labeled datainto distant supervision for extracting relationsfrom sentences. In contrast to simply taking theunion of the hand-labeled data and the corpus la-beled by distant supervision as in the previouswork by Zhang et al. (2012), we generalize thelabeled data through feature selection and modelthis additional information directly in the latentvariable approaches. Aside from previous semi-supervised work that employs labeled and unla-beled data (Yarowsky, 2013; Blum and Mitchell,1998; Collins and Singer, 2011; Nigam, 2001, andothers), this is a learning scheme that combinesunlabeled text and two training sources whosequantity and quality are radically different (Lianget al., 2009).

To demonstrate the effectiveness of our pro-

Page 2: Infusion of Labeled Data into Distant Supervision for ...pershina/acl2014.pdf · Relation extraction is the task of ... formance by explicitly adding preprocessing steps (Takamatsu

Guideline g = {gi|i = 1, 2, 3}: Relation r(g)

types of entities, dependency path, span word (optional)person person, nsubj ! dobj, married personSpouseperson organization, nsubj ! prep of , became personMemberOforganization organization, nsubj ! prep of , company organizationSubsidiariesperson person, poss! appos, sister personSiblingsperson person, poss! appos, father personParentsperson title, nn personTitleorganization person, prep of ! appos! organizationTopMembersEmployeesperson cause, nsubj ! prep of personCauseOfDeathperson number, appos personAgeperson date, nsubjpass! prep on num personDateOfBirth

Table 2: Some examples from the final set G of extracted guidelines.

posed approach, we extend MIML (Surdeanu etal., 2012), a state-of-the-art distant supervisionmodel and show a significant improvement of13.5% in F-score on the relation extraction bench-mark TAC-KBP (Ji and Grishman, 2011) dataset.While prior work employed tens of thousands ofhuman labeled examples (Zhang et al., 2012) andonly got a 6.5% increase in F-score over a logisticregression baseline, our approach uses much lesslabeled data (about 1/8) but achieves much higherimprovement on performance over stronger base-lines.

2 The Challenge

Simply taking the union of the hand-labeled dataand the corpus labeled by distant supervision is noteffective since hand-labeled data will be swampedby a larger amount of distantly labeled data. Aneffective approach must recognize that the hand-labeled data is more reliable than the automaticallylabeled data and so must take precedence in casesof conflict. Conflicts cannot be limited to thosecases where all the features in two examples arethe same; this would almost never occur, becauseof the dozens of features used by a typical relationextractor (Zhou et al., 2005). Instead we proposeto perform feature selection to generalize humanlabeled data into training guidelines, and integratethem into latent variable model.

2.1 Guidelines

The sparse nature of feature space dilutes the dis-criminative capability of useful features. Giventhe small amount of hand-labeled data, it is im-portant to identify a small set of features that aregeneral enough while being capable of predictingquite accurately the type of relation that may holdbetween two entities.

We experimentally tested alternative featuresets by building supervised Maximum Entropy(MaxEnt) models using the hand-labeled data (Ta-ble 3), and selected an effective combination ofthree features from the full feature set used by Sur-deanu et al., (2011):

• the semantic types of the two arguments (e.g.person, organization, location, date, title, ...)

• the sequence of dependency relations along thepath connecting the heads of the two argumentsin the dependency tree.

• a word in the sentence between the two argu-ments

These three features are strong indicators of thetype of relation between two entities. In somecases the semantic types of the arguments alonenarrows the possibilities to one or two relationtypes. For example, entity types such as personand title often implies the relation personTitle.Some lexical items are clear indicators of partic-ular relations, such as “brother” and “sister” for asibling relationship

We extract guidelines from hand-labeled data.Each guideline g={gi|i=1,2,3} consists of a pairof semantic types, a dependency path, and option-ally a span word and is associated with a partic-ular relation r(g). We keep only those guidelines

Model Precision Recall F-scoreMaxEnt

all 18.6 6.3 9.4MaxEnt

two 24.13 10.75 14.87MaxEnt

three 40.27 12.40 18.97

Table 3: Performance of a MaxEnt, trained onhand-labeled data using all features (Surdeanu etal., 2011) vs using a subset of two (types of en-tities, dependency path), or three (adding a spanword) features, and evaluated on the test set.

Page 3: Infusion of Labeled Data into Distant Supervision for ...pershina/acl2014.pdf · Relation extraction is the task of ... formance by explicitly adding preprocessing steps (Takamatsu

which make the correct prediction for all and atleast k=3 examples in the training corpus (thresh-old 3 was obtained by running experiments on thedevelopment dataset). Table 2 shows some exam-ples in the final set G of extracted guidelines.

3 Guided DS

Our goal is to jointly model human-labeled groundtruth and structured data from a knowledge basein distant supervision. To do this, we extend theMIML model (Surdeanu et al., 2012) by adding anew layer as shown in Figure 1.

The input to the model consists of (1) distantlysupervised data, represented as a list of n bags1

with a vector y

i

of binary gold-standard labels, ei-ther Positive(P ) or Negative(N) for each rela-tion r2R; (2) generalized human-labeled groundtruth, represented as a set G of feature conjunc-tions g={gi|i=1,2,3} associated with a unique re-lation r(g). Given a bag of sentences, x

i

, whichmention an ith entity pair (e1, e2), our goal is tocorrectly predict which relation is mentioned ineach sentence, or NR if none of the relations underconsideration are mentioned. The vector z

i

con-tains the latent mention-level classifications for theith entity pair. We introduce a set of latent vari-ables h

i

which model human ground truth for eachmention in the ith bag and take precedence overthe current model assignment z

i

.

G

|R|

|xi|n

zi

hi

yi

xi����

���

{relationlevel

mentionlevel

Figure 1: Plate diagram of Guided DS

Let i, j be the index in the bag and the men-tion level, respectively. We model mention-level extraction p(zij |xij ;wz), human relabel-ing hij(xij , zij) and multi-label aggregationp(y

ri |hi;wy

). We define:• y

ri 2{P, N} : r holds for the ith bag or not.

• xij is the feature representation of the jth rela-tion mention in the ith bag. We use the same setof features as in Surdeanu et al. (2012).1A bag is a set of mentions sharing same entity pair.

• zij2R [ NR: a latent variable that denotes therelation of the jth mention in the ith bag

• hij 2R [NR: a latent variable that denotes therefined relation of the mention xij

We define relabeled relations hij as following:

hij(xij , zij)=

⇢r(g), if 9!g2G s.t.g={gk}✓{xij}zij , otherwise

Thus, relation r(g) is assigned to hij iff thereexists a unique guideline g 2 G, such that thefeature vector xij contains all constituents of g,i.e. entity types, a dependency path and maybe aspan word, if g has one. We use mention relationzij inferred by the model only in case no such aguideline exists or there is more than one match-ing guideline. We also define:• wz is the weight vector for the multi-class rela-

tion mention-level classifier2

• w

ry is the weight vector for the rth binary top-

level aggregation classifier (from mention labelsto bag-level prediction). We use wy to representw

1y,w

2y, . . . ,w

|R|y .

Our approach is aimed at improving the mention-level classifier, while keeping the multi-instancemulti-label framework to allow for joint modeling.

4 Training

We use a hard expectation maximization algorithmto train the model. Our objective function is tomaximize log-likelihood of the data:

LL(w

y

,w

z

) =

nX

i=1

log p(y

i

|xi

,w

y

,w

z

,G)

=

nX

i=1

log

X

hi

p(y

i

,h

i

|xi

,w

y

,w

z

,G)

=

nX

i=1

log

X

hi

|hi|Y

j=1

p(hij |xij ,wz

,G)

Y

r2Pi[Ni

p(y

ri |hi,w

ry

)

where the last equality is due to conditionalindependence. Because of the non-convexityof LL(w

y

,w

z

) we approximate and maximizethe joint log-probability p(y

i

,h

i

|xi

,w

y

,w

z

,G) foreach entity pair in the database:

log p(y

i

,h

i

|xi

,w

y

,w

z

,G)

=

|hi|X

j=1

log p(hij |xij ,wz

,G)+

X

r2Pi[Ni

log p(y

ri |hi,w

ry

).

2All classifiers are implemented using L2-regularized lo-gistic regression with Stanford CoreNLP package.

Page 4: Infusion of Labeled Data into Distant Supervision for ...pershina/acl2014.pdf · Relation extraction is the task of ... formance by explicitly adding preprocessing steps (Takamatsu

Iteration 1 2 3 4 5 6 7 8(a) Corrected relations: 2052 718 648 596 505 545 557 535(b) Retrieved relations: 10219 860 676 670 621 599 594 592Total relabelings 12271 1578 1324 1264 1226 1144 1153 1127

Table 4: Number of relabelings for each training iteration of Guided DS: (a) relabelings due to cor-rected relations, e.g. personChildren! personSiblings (b) relabelings due to retrieved relations, e.g.notRelated(NR)!personTitle

Algorithm 1 : Guided DS training1: Phase 1: build set G of guidelines2: Phase 2: EM training3: for iteration = 1, . . . , T do

4: for i = 1, . . . , n do

5: for j = 1, . . . , |xi

| do

6: z

⇤ij= argmaxzijp(zij |x

i

,y

i

,w

z

,w

y

)

7: h

⇤ij=

⇢r(g), if 9!g2G :{gk}✓{xij}zij

⇤, otherwise

8: update h

i

with h

⇤ij

9: end for

10: end for

11: w

⇤z

=argmax

w

Pni=1

P|xi|j=1log p(hij |xij ,w)

12: for r 2 R do

13: w

r⇤y

=argmax

w

P1in s.t. r2Pi[Ni

log p(y

ri |hi

,w)

14: end for

15: end for

16: return w

z

,w

y

The pseudocode is presented as algorithm 1.

The following approximation is used for infer-ence at step 6:

p(zij |xi

,y

i

,w

z

,w

y

) / p(y

i

, zij |xi

,w

y

,w

z

)

⇡ p(zij |xij ,wz

)p(y

i

|h0i

,w

y

)

= p(zij |xij ,wz

)

Y

r2Pi[Ni

p(y

ri |h0

i

,w

r

y

),

where h

0i

contains previously inferred andmaybe further relabeled mention labels for groupi (steps 5-10), with the exception of component j

whose label is replaced by zij . In the M-step (lines12-15) we optimize model parameters w

z

,w

y

,given the current assignment of mention-level la-bels h

i

.

Experiments show that Guided DS efficientlylearns new model, resulting in a drastically de-creasing number of needed relabelings for furtheriterations (Table 4). At the inference step we firstclassify all mentions:

z

⇤ij = argmaxz2R[NR p(z|xij ,wz

)

Then final relation labels for ith entity tuple are

obtained via the top-level classifiers:y

r⇤i = argmaxy2{P,N} p(y|z⇤

i

,w

r

y

)

5 Experiments

5.1 Data

We use the KBP (Ji and Grishman, 2011) dataset3

which is preprocessed by Surdeanu et al. (2011)using the Stanford parser4 (Klein and Manning,2003). This dataset is generated by mappingWikipedia infoboxes into a large unlabeled corpusthat consists of 1.5M documents from KBP sourcecorpus and a complete snapshot of Wikipedia.

The KBP 2010 and 2011 data includes 200query named entities with the relations they areinvolved in. We used 40 queries as developmentset and the rest 160 queries (3334 entity pairs thatexpress a relation) as the test set. The official KBPevaluation is performed by pooling the system re-sponses and manually reviewing each response,producing a hand-checked assessment data. Weused KBP 2012 assessment data to generate guide-lines since queries from different years do notoverlap. It contains about 2500 labeled sentencesof 41 relations, which is less than 0.09% of thesize of the distantly labeled dataset of 2M sen-tences. The final set G consists of 99 guidelines(section 2.1).

5.2 Models

We implement Guided DS on top of the MIML(Surdeanu et al., 2012) code base5. TrainingMIML on a simple fusion of distantly-labeledand human-labeled datasets does not improve themaximum F-score since this hand-labeled data isswamped by a much larger amount of distant-supervised data of much lower quality. Upsam-pling the labeled data did not improve the perfor-mance either. We experimented with different up-sampling ratios and report best results using ratio1:1 in Figure 2.

3Available from Linguistic Data Consortium (LDC) athttp://projects.ldc.upenn.edu/kbp/data.

4http://nlp.stanford.edu/software/lex-parser.shtml5Available at http://nlp.stanford.edu/software/mimlre.shtml.

Page 5: Infusion of Labeled Data into Distant Supervision for ...pershina/acl2014.pdf · Relation extraction is the task of ... formance by explicitly adding preprocessing steps (Takamatsu

a) b) 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.40.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

Recall

Precision

Student Version of MATLAB

0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.40.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

Recall

Prec

ision

Guided DSMIMLMintz++MultiR

Student Version of MATLAB

0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.50.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

Recall

Precision

Student Version of MATLAB

0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.50.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

Recall

Prec

ision

Guided DSSemi−MIMLDS+upsamplingMaxEnt

Student Version of MATLAB

Model P R F1 AUC Model P R F1 AUC

MaxEnt 40.27 12.40 18.97 1.97 MultiR 30.64 19.79 24.05 6.4

DS+upsampling 32.26 24.31 27.72 12.00 Mintz++ 25.17 25.87 25.51 10.94

Semi MIML 30.02 26.21 27.98 12.31 MIML 28.06 28.64 28.35 11.74

Guided DS 31.9 32.46 32.19 16.1

Model P R F1 AUC

state-of-art

Model P R F1 AUC

MaxEnt 40.27 12.40 18.97 1.97 MultiR 30.64 19.79 24.05 6.4

DS+upsampling 32.26 24.31 27.72 12.00 Mintz++ 25.17 25.87 25.51 10.94

Semi MIML 30.02 26.21 27.98 12.31 MIML 28.06 28.64 28.35 11.74

Guided DS 31.9 32.46 32.19 16.1

baseline

Model P R F1 AUC

state-of-art

Model P R F1 AUC

MaxEnt 40.27 12.40 18.97 1.97 MultiR 30.64 19.79 24.05 6.4

DS+upsampling 32.26 24.31 27.72 12.00 Mintz++ 25.17 25.87 25.51 10.94

Semi MIML 30.02 26.21 27.98 12.31 MIML 28.06 28.64 28.35 11.74

Guided DS 31.9 32.46 32.19 16.1

baseline

Model P R F1 AUC

state-of-art

Model P R F1 AUC

MaxEnt 40.27 12.40 18.97 1.97 MultiR 30.64 19.79 24.05 6.4

DS+upsampling 32.26 24.31 27.72 12.00 Mintz++ 25.17 25.87 25.51 10.94

Semi MIML 30.02 26.21 27.98 12.31 MIML 28.06 28.64 28.35 11.74

Guided DS 31.9 32.46 32.19 16.1

1

Figure 2: Performance of Guided DS on KBP task compared to a) baselines: MaxEnt, DS+upsampling,Semi-MIML (Min et al., 2013) b) state-of-art models: Mintz++ (Mintz et al., 2009), MultiR (Hoffmannet al., 2011), MIML (Surdeanu et al., 2012)

Our baselines: 1) MaxEnt is a supervised maxi-mum entropy baseline trained on a human-labeleddata; 2) DS+upsampling is an upsampling ex-periment, where MIML was trained on a mix ofa distantly-labeled and human-labeled data; 3)Semi-MIML is a recent semi-supervised exten-sion. We also compare Guided DS with threestate-of-the-art models: 1) MultiR and 2) MIMLare two distant supervision models that supportmulti-instance learning and overlapping relations;3) Mintz++ is a single-instance learning algorithmfor distant supervision. The difference betweenGuided DS and all other systems is significantwith p-value less than 0.05 according to a pairedt-test assuming a normal distribution.

5.3 Results

We scored our model against all 41 relations andthus replicated the actual KBP evaluation. Figure2 shows that our model consistently outperformsall six algorithms at almost all recall levels and im-proves the maximum F -score by more than 13.5%

relative to MIML (from 28.35% to 32.19%) as wellas increases the area under precision-recall curveby more than 37% (from 11.74 to 16.1). Also,Guided DS improves the overall recall by morethan 9% absolute (from 30.9% to 39.93%) at acomparable level of precision (24.35% for MIMLvs 23.64% for Guided DS), while increases therunning time of MIML by only 3%. Thus, ourapproach outperforms state-of-the-art model forrelation extraction using much less labeled datathat was used by Zhang et al., (2012) to outper-

form logistic regression baseline. Performanceof Guided DS also compares favorably with bestscored hand-coded systems for a similar task suchas Sun et al., (2011) system for KBP 2011, whichreports an F-score of 25.7%.

6 Conclusions and Future Work

We show that relation extractors trained with dis-tant supervision can benefit significantly from asmall number of human labeled examples. Wepropose a strategy to generate and select guide-lines so that they are more generalized forms oflabeled instances. We show how to incorporatethese guidelines into an existing state-of-art modelfor relation extraction. Our approach significantlyimproves performance in practice and thus opensup many opportunities for further research in REwhere only a very limited amount of labeled train-ing data is available.

Acknowledgments

Supported by the Intelligence Advanced ResearchProjects Activity ( IARPA) via Air Force ResearchLaboratory (AFRL) contract number FA8650-10-C-7058. The U.S. Government is authorized toreproduce and distribute reprints for Governmen-tal purposes notwithstanding any copyright anno-tation thereon. The views and conclusions con-tained herein are those of the authors and shouldnot be interpreted as necessarily representing theofficial policies or endorsements, either expressedor implied, of IARPA, AFRL, or the U.S. Govern-ment.

Page 6: Infusion of Labeled Data into Distant Supervision for ...pershina/acl2014.pdf · Relation extraction is the task of ... formance by explicitly adding preprocessing steps (Takamatsu

References

Avrim Blum and Tom M. Mitchell. 1998. Combin-ing labeled and unlabeled sata with co-training. InProceedings of the 11th Annual Conference on Com-putational Learning Theory (COLT), pages 92–100.

Razvan C. Bunescu and Raymond J. Mooney. 2007.Learning to extract relations from the web usingminimal supervision. In Proceedings of the 45th An-nual Meeting of the Association for ComputationalLinguistics (ACL).

Michael Collins and Yorav Singer. 1999. Unsuper-vised models for named entity classification. Pro-ceedings of the Conference on Empirical Methodsin Natural Language Processing (EMNLP-VLC). ,

Mark Craven and Johan Kumlien. 1999. Constructingbiological knowledge bases by extracting informa-tion from text sources. In Proceedings of the Sev-enth International Conference on Intelligent Systemsfor Molecular Biology (ISMB), pages 77–86.

Oren Etzioni, Michele Banko, Stephen Soderland, andDaniel S. Weld. 2008. Open information extrac-tion from the web. Communications of the ACM,51(12):68–74.

Raphael Hoffmann, Congle Zhang, and Daniel S.Weld. 2010. Learning 5000 relational extractors.In Proceedings of the 49th Annual Meetings of theAssociation for Computational Linguistics (ACL),pages 286–295.

Raphael Hoffmann, Congle Zhang, Xiao Ling,Luke S. Zettlemoyer, and Daniel S. Weld. 2011.Knowledge-based weak supervision for informationextraction of overlapping relations. In Proceedingsof the 49th Annual Meeting of the Association forComputational Linguistics (ACL), pages 541–550.

Heng Ji and Ralph Grishman. 2011. Knowledge basepopulation: Successful approaches and challenges.In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics (ACL),pages 1148–1158.

Heng Ji, Ralph Grishman, and Hoa Trang Dang. 2011.Overview of the TAC-2011 knowledge base popula-tion track. In Text Analysis Conference Workshop.

Dan Klein and Christopher D. Manning. 2003. Ac-curate unlexicalized parsing. In Proceedings of the41th Annual Meetings of the Association for Com-putational Linguistics (ACL).

Percy Liang, Michael I.Jordan and Dan Klein. 2009.Learning From Measurements in Exponential Fami-lies. In Proceedings of the 26th Annual InternationalConference on Machine Learning (ICML), pages =641–648

Bonan Min, Ralph Grishman, Li Wan, Chang Wang,and David Gondek. 2013. Distant supervision forrelation extraction with an incomplete knowledge

base. In Proceedings of the Conference of the NorthAmerican Chapter of the Association for Computa-tional Linguistics (NAACL).

Mike Mintz, Steven Bills, Rion Snow, and Daniel Ju-rafsky. 2009. Distant supervision for relation ex-traction without labeled data. In Proceedigns of the47th Annual Meeting of the Association for Compu-tational Linguistics and the 4th International JointConference on Natural Language Processing (ACL),pages 1003–1011.

Ramesh Nallapati. 2004. Discriminative models forinformation retrieval. In Proceedigns of the 27th An-nual International ACM SIGIR Conference on Re-search and Development in Information Retrieval(SIGIR), pages 64–71.

Truc-Vien T. Nguyen and Alessandro Moschitti. 2011.End-to-end relation extraction using distant super-vision from external semantic repositories. In Pro-ceedings of the 49th Annual Meeting of the Asso-ciation for Computational Linguistics (ACL), pages277–282.

Kamal Paul Nigam. 2001. Using Unlabeled Data toImprove Text Classification. Ph.D. thesis, School ofComputer Science, Carnegie Mellon University.

Sebastian Riedel, Limin Yao, and Andrew McCallum.2010. Modeling relations and their mentions with-out labeled text. In Proceedigns of the EuropeanConference on Machine Learning and Principlesand Practice of Knowledge Discovery in Databases(ECML/PKDD), pages 148–163.

Alan Ritter, Luke Zettlemoyer, Mausam, and Oren Et-zioni. 2013. Modeling missing data in distant su-pervision for information extraction. Transactionsof the Association for Computational Linguistics.

Benjamin Roth, Tassilo Barth, Michael Wiegand, andDietrich Klakow 2013. A Survey of Noise Reduc-tion Methods for Distant Supervision. In Proceed-ings of Conference on Information and KnowledgeManagement (CIKM-AKBC).

Benjamin Snyder and Regina Barzilay 2007.Database-text alignment via structured multilabelclassification. In Proceedings of IJCAI.

Ang Sun, Ralph Grishman, Wei Xu, and Bonan Min.2011. New york university 2011 system for kbp slotfilling. In Text Analysis Conference (TAC-KBP).

Mihai Surdeanu, J. Turmo, and A. Ageno. 2006. Ahybrid approach for the acquisition of informationextraction patterns. In Proceedings of the 11th Con-ference of the European Chapter of the Associatefor Computational Linguistics Workshop on Adap-tive Text Extraction and Mining (EACL).

Mihai Surdeanu, Sonal Gupta, John Bauer, David Mc-Closky, Angel X. Chang, Valentin I. Spitkovsky,and Christopher D.Manning. 2011. Stanford’s

Page 7: Infusion of Labeled Data into Distant Supervision for ...pershina/acl2014.pdf · Relation extraction is the task of ... formance by explicitly adding preprocessing steps (Takamatsu

Distantly-Supervised Slot-Filling System. In Pro-ceedings of the Text Analysis Conference (TAC-KBP).

Mihai Surdeanu, Julie Tibshirani, Ramesh Nallapati,and Christopher D. Manning. 2012. Multi-instancemulti-label learning for relation extraction. In Pro-ceedings of the 50th Annual Meeting of the Asso-ciation for Computational Linguistics (ACL), pages455–465.

Shingo Takamatsu, Issei Sato, and Hiroshi Nakagawa.2012. Reducing wrong labels in distant supervi-sion for relation extraction. In Proceedings of the50th Annual Meeting of the Association for Compu-tational Linguistics (ACL), pages 721–729.

Fei Wu and Daniel S. Weld. 2007. Autonomously se-mantifying wikipedia. In Proceedings of the Inter-national Conference on Information and KnowledgeManagement (CIKM), pages 41–50.

Wei Xu, Raphael Hoffmann, Zhao Le, and Ralph Gr-ishman. 2013. Filling knowledge base gaps for dis-tant supervision of relation extraction. In Proceed-ings of the 51th Annual Meeting of the Associationfor Computational Linguistics (ACL).

David Yarowsky. 1995. Unsupervised word sense dis-ambiguation rivaling supervised methods. In Pro-ceedings of the 33th Annual Meeting of the Associa-tion for Computational Linguistics (ACL).

Ce Zhang, Feng Niu, Christopher Re, and Jude Shav-lik. 2012. Big data versus the crowd: Looking forrelationships in all the right places. In Proceedingsof the 50th Annual Meeting of the Association forComputational Linguistics, pages 825–834. Associ-ation for Computational Linguistics.

Guodong Zhou, Jian Su, Jie Zhang, and Min Zhang.2005. Exploring various knowledge in relation ex-traction. In Proceedings of the Annual Meetingof the Association for Computational Linguistics(ACL).