Updating a Name Tagger Using
Contemporary Unlabeled Data
ACL-IJCNLP 2009, Singapore, August 3rd - 5th
Cristina Mota (1,2) and Ralph Grishman (2)
(1) IST & L2F INESC-ID (Portugal); (2) New York University (USA)
(Advisors: Ralph Grishman & Nuno Mamede)
This research was funded by Fundação para a Ciência e a Tecnologia (doctoral scholarship SFRH/BD/3237/2000)
Motivation
[Figure: F-measure (y-axis, 0.79 to 0.85) against time gap in years (x-axis, 0 to 7), with regression line y = −0.00391x + 0.82479, R2 = 0.3647]
The performance of a co-trained named entity tagger decreases as the time gap between training and test sets increases (Mota & Grishman, 2008)
Do we need to update the seeds or the unlabeled data?
Does more older data help?
Related Work
“More data are better data” (Church & Mercer, 1993)
Enlarge labeled data as a way of improving performance
Contemporary (labeled) data reduce out-of-vocabulary rates
Time-adaptive language model (Auzanne et al., 2000)
Generation of offline name lists (Palmer & Ostendorf, 2005)
Daily adaptation of the language model of a broadcast news transcription system (Martins et al., 2006)
Data Sets
Data sets were drawn from the Politics section of the CETEMPúblico corpus (Santos & Rocha, 2001)
Language: Portuguese
Time span: 8 years (1991-1998)
Time gap: 1 = 6 months
For each six-month period:
Seeds (S): names collected from the first 192 extracts*
Test data (T): next 208 extracts
Unlabeled data (U): next 7856 extracts
* 1 extract = approx. 2 paragraphs
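The per-period split described above can be sketched as follows. This is only an illustration of the slicing, assuming the extracts of one six-month period are already available as an ordered list; the names `SemesterSplit` and `split_semester` are hypothetical and do not come from the original work.

```python
from typing import List, NamedTuple

# Sizes taken from the slide; everything else here is an assumption.
N_SEED, N_TEST, N_UNLABELED = 192, 208, 7856


class SemesterSplit(NamedTuple):
    seed_extracts: List[str]       # names are collected from these to build S
    test_extracts: List[str]       # T
    unlabeled_extracts: List[str]  # U


def split_semester(extracts: List[str]) -> SemesterSplit:
    """Slice one period's ordered extracts into seed, test and unlabeled parts."""
    needed = N_SEED + N_TEST + N_UNLABELED
    if len(extracts) < needed:
        raise ValueError(f"need at least {needed} extracts, got {len(extracts)}")
    seed_end = N_SEED
    test_end = seed_end + N_TEST
    unlabeled_end = test_end + N_UNLABELED
    return SemesterSplit(
        extracts[:seed_end],
        extracts[seed_end:test_end],
        extracts[test_end:unlabeled_end],
    )


# Toy usage with placeholder extract identifiers.
split = split_semester([f"extract-{k}" for k in range(9000)])
```

With these sizes each period contributes 8256 extracts in total; any extracts beyond that are simply ignored in this sketch.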
Named Entity Tagger
[Diagram: training side with Seeds, Unlabeled text, Identification, Pairs (spelling features, contextual features), Co-training, Spelling + contextual rules; testing side with Test text, Identification, Classification, Labeled pairs, Propagation, Text with classified NE]
Based on a co-training classifier (Collins & Singer, 1999)
Includes a propagation step
Needs few seeds and performance is high (above 80%)
Performance is parametrized by the combination of seeds, unlabeled set and test set: (S,U,T)
Tagger is evaluated after propagation with the HAREM scoring programs
Update seeds or unlabeled data?
[Diagram: timeline from 91a to 98b showing seeds Si, unlabeled examples Ui and test set Tn]
Experiment 1: Baseline (vary seeds and unlabeled data synchronously, as in Mota & Grishman (2008))
Update seeds or unlabeled data?
[Figure: F-measure (y-axis, 0.74 to 0.84) per training epoch from 91a to 98b (x-axis); curves for configurations (i,i,98b), (98b,i,98b) and (i,98b,98b)]
Performance decays as the time gap increases (Mota & Grishman, 2008)
Update seeds or unlabeled data?
[Diagram: timeline from 91a to 98b showing seeds Sn, unlabeled examples Ui and test set Tn]
Experiment 2: Update seeds (vary unlabeled data but use contemporary seeds)
Update seeds or unlabeled data?
[Figure: F-measure (y-axis, 0.74 to 0.84) per training epoch from 91a to 98b (x-axis); curves for configurations (i,i,98b), (98b,i,98b) and (i,98b,98b)]
Contemporary seeds slightly attenuate the decrease
Update seeds or unlabeled data?
[Diagram: timeline from 91a to 98b showing seeds Si, unlabeled examples Un and test set Tn]
Experiment 3: Update unlabeled data (vary seeds but use contemporary unlabeled data)
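The three experiments so far differ only in which element of the (S,U,T) triple varies with the time frame i. A minimal sketch of how the configurations could be enumerated, assuming the semester labels 91a to 98b used on the plot axes and 98b as the fixed test set (the function and key names are illustrative, not from the original work):

```python
# Semester labels as they appear on the plot axes: 91a, 91b, ..., 98b.
SEMESTERS = [f"{year}{half}" for year in range(91, 99) for half in "ab"]


def configurations(test="98b"):
    """Return the (seeds, unlabeled, test) triples for Experiments 1 to 3."""
    return {
        # Experiment 1: seeds and unlabeled data vary together -> (i, i, 98b)
        "baseline": [(i, i, test) for i in SEMESTERS],
        # Experiment 2: contemporary seeds, varying unlabeled data -> (98b, i, 98b)
        "update_seeds": [(test, i, test) for i in SEMESTERS],
        # Experiment 3: varying seeds, contemporary unlabeled data -> (i, 98b, 98b)
        "update_unlabeled": [(i, test, test) for i in SEMESTERS],
    }


configs = configurations()
```

All three lists coincide at i = 98b, where seeds, unlabeled data and test set come from the same time frame.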
Updating the unlabeled data is better than updating the seeds
[Figure: F-measure (y-axis, 0.74 to 0.84) per training epoch from 91a to 98b (x-axis); curves for configurations (i,i,98b), (98b,i,98b) and (i,98b,98b)]
Contemporary unlabeled data maintain the performance
Augment unlabeled data?
[Diagram: timeline from 91a to 98b showing seeds Sn, test set Tn and unlabeled examples Ui, with one more older set Ui added at each step]
Experiment 4: Enlarge unlabeled data with older data and use contemporary seeds
Augment unlabeled data?
[Figure: F-measure (y-axis, 0.74 to 0.84) per time frame (semester) from 91a to 98b (x-axis); curves for configurations (i,98b,98b), (i,u[i,...,98a],98b) and (98b,u[i,...,98a],98b)]
Green line: same seeds for all taggers (98b); unlabeled data is enlarged backwards
Blue line: different seeds for each tagger; same unlabeled data for all taggers (98b)
Larger amounts of older unlabeled data do not always result in better performance
Augment unlabeled data?
[Diagram: timeline from 91a to 98b showing seeds Si, test set Tn and unlabeled examples Ui, with one more older set Ui added at each step]
Experiment 5: Enlarge the size of unlabeled data and vary seeds
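The "enlarging backwards" setup of Experiments 4 and 5 can be sketched as below. The sketch reads the plot legend notation u[i,...,98a] literally, as the union of the unlabeled sets from time frame i up to and including 98a; that reading, and all function names, are assumptions made for illustration.

```python
SEMESTERS = [f"{year}{half}" for year in range(91, 99) for half in "ab"]


def accumulated_unlabeled(start, end="98a"):
    """Time frames whose unlabeled sets are pooled for u[start,...,end]."""
    lo, hi = SEMESTERS.index(start), SEMESTERS.index(end)
    if lo > hi:
        raise ValueError(f"{start} is later than {end}")
    return SEMESTERS[lo:hi + 1]


def experiment4(test="98b"):
    """Contemporary seeds, growing unlabeled pool: (98b, u[i,...,98a], 98b)."""
    return [(test, accumulated_unlabeled(i), test) for i in SEMESTERS[:-1]]


def experiment5(test="98b"):
    """Seeds from frame i, growing unlabeled pool: (i, u[i,...,98a], 98b)."""
    return [(i, accumulated_unlabeled(i), test) for i in SEMESTERS[:-1]]


exp4 = experiment4()
exp5 = experiment5()
```

Moving i back towards 91a adds one more semester of older text to the pool at each step, so the pool for i = 91a spans 15 time frames.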
Updating the unlabeled data is better than accumulating older unlabeled data
[Figure: F-measure (y-axis, 0.74 to 0.84) per time frame (semester) from 91a to 98b (x-axis); curves for configurations (i,98b,98b), (i,u[i,...,98a],98b) and (98b,u[i,...,98a],98b)]
Violet line: seeds in the same time frame as the unlabeled set being added; unlabeled data is enlarged backwards
Blue line: seeds are the same as in the violet line; same unlabeled data for all taggers (98b)
Green line: same seeds for all taggers (98b); unlabeled data is enlarged backwards
Larger amounts of unlabeled data are worse than training with contemporary unlabeled data
Larger amounts of unlabeled data do not outperform the tagger trained with contemporary seeds and unlabeled data
Final remarks
Contemporary unlabeled data are better data
But...
Why doesn’t the labeled data impact the performance more?
Are other semi-supervised approaches also sensitive?
Acknowledgments
This research work was funded by Fundação para a Ciência e a Tecnologia (doctoral scholarship SFRH/BD/3237/2000)
Example of (mis)classification
Test set 98b includes two instances of “Tizi Ouzou”:
Tizi Ouzou tem (en: Tizi Ouzou has)
manifestações em Tizi Ouzou (en: demonstrations in Tizi Ouzou)
Does not occur in u 91a, so classification depends on contexts:
("n v" "tem") ORGANIZATION 0.52
("type" "nprop v") PERSON 0.43
("len" 2) PERSON 0.62
But occurs in u 98b:
noite em Tizi (en: night in Tizi)
ruas de Tizi Ouzou (en: streets of Tizi Ouzou)
ir a Tizi-Ouzou (en: go to Tizi-Ouzou)
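To make the example concrete, here is a small sketch of how such rules might be applied, assuming a decision-list style in which the matching rule with the highest confidence decides the label. The slide does not state the exact decision procedure, so this is an assumption; only the three rules and their confidences come from the slide.

```python
# (feature, label, confidence) triples copied from the slide.
RULES = [
    (("n v", "tem"), "ORGANIZATION", 0.52),
    (("type", "nprop v"), "PERSON", 0.43),
    (("len", 2), "PERSON", 0.62),
]


def classify(features, rules):
    """Return (label, confidence) of the strongest matching rule, or None."""
    matching = [(conf, label) for feat, label, conf in rules if feat in features]
    if not matching:
        return None
    conf, label = max(matching)  # highest confidence wins (assumed policy)
    return label, conf


# Hypothetical feature set for the instance "Tizi Ouzou tem".
features = {("n v", "tem"), ("type", "nprop v"), ("len", 2)}
result = classify(features, RULES)
```

Under this assumed policy the instance is labeled PERSON on the strength of the name-length rule alone, since no rule learned from the 91a unlabeled data mentions the name itself.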
NE tagger: Identification
[Diagram: processing stages Raw text, Lexical analysis, Chunking, NE + context identification; resources Portuguese dictionary, Priority dictionaries, Morphological grammars, Chunking grammars, NE + context grammars; outputs Pairs (NE, context) and Text with unclassified NE]
Identification designed with NooJ (Silberztein, 2004)
1 Elisa Ferreira começou por criticar Cavaco Silva
2 [Elisa Ferreira]SEQM [começou por criticar]V+Complexo+Pred=criticar [Cavaco Silva]SEQM
3 [Elisa Ferreira]nprop v+criticar começou por criticar [Cavaco Silva]v nprop+criticar
4 [Elisa Ferreira]nprop v+criticar [Cavaco Silva]v nprop+criticar
NE tagger: Classification
[Diagram: co-training loop with steps Label with name rules, Infer context rules, Label with context rules, Infer name rules, followed by Label with name + context rules and Infer name + context rules; inputs Seeds and List of examples; intermediate data Labeled examples, Context rules, Name rules, Name + context rules]
Spelling features ← SEEDS: (Elisa Ferreira, PESSOA, 0.9999)
1 LABEL: Elisa Ferreira, criticar ← PESSOA
2 INFER: (criticar, PESSOA, 0.98)
3 LABEL: Cavaco Silva, criticar ← PESSOA
4 INFER: (Silva, PESSOA, 0.97)
5 REPEAT
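The LABEL / INFER alternation above can be sketched as a toy loop. This is a simplified illustration, not the tagger's implementation: it uses the whole name as the only spelling feature (the slide also infers rules for parts of a name, such as "Silva"), scores rules by plain relative frequency instead of a smoothed confidence, and has no cap on the rules added per round. All function names and the `min_conf` threshold are assumptions.

```python
from collections import Counter, defaultdict

NAME, CONTEXT = 0, 1  # positions of the two views in a (name, context) pair


def infer_rules(pairs, labels, view, min_conf=0.9):
    """Infer feature -> label rules for one view from the labeled pairs."""
    counts = defaultdict(Counter)
    for pair, label in zip(pairs, labels):
        if label is not None:
            counts[pair[view]][label] += 1
    rules = {}
    for feature, by_label in counts.items():
        label, hits = by_label.most_common(1)[0]
        if hits / sum(by_label.values()) >= min_conf:
            rules[feature] = label
    return rules


def apply_rules(pairs, rules, view):
    """Label each pair using one view only; None when no rule matches."""
    return [rules.get(pair[view]) for pair in pairs]


def co_train(pairs, seed_rules, rounds=5):
    name_rules, context_rules = dict(seed_rules), {}
    for _ in range(rounds):
        labels = apply_rules(pairs, name_rules, NAME)          # LABEL with name rules
        context_rules = infer_rules(pairs, labels, CONTEXT)    # INFER context rules
        labels = apply_rules(pairs, context_rules, CONTEXT)    # LABEL with context rules
        updated = {**infer_rules(pairs, labels, NAME), **seed_rules}  # INFER name rules
        if updated == name_rules:  # nothing new learned, stop early
            break
        name_rules = updated
    return name_rules, context_rules


pairs = [("Elisa Ferreira", "criticar"), ("Cavaco Silva", "criticar")]
name_rules, context_rules = co_train(pairs, {"Elisa Ferreira": "PESSOA"})
```

On the two pairs from the slide, the seed labels the first pair, the context "criticar" becomes a PESSOA rule, and that rule in turn labels "Cavaco Silva".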
NE tagger performance decreases over time (Mota & Grishman, 2008)
Detailed analysis using six-month periods (instead of periods of 1 year)
Linear fit y = a + bx for configurations (Si, Ui, Tj):
     a      b        R2
P    0.827  -0.0024  0.24824
R    0.773  -0.0022  0.19393
F    0.799  -0.0023  0.23765
[Figure: F-measure (y-axis, 0.74 to 0.82) against time gap (x-axis, 0 to 15; 1 = 6 months), with regression line y = −0.00232x + 0.79906, R2 = 0.2376]
The performance decreases at an estimated rate of:
0.00232 in F-measure every 6 months (0.0348 after 8 years)
The low R-squared values show that not all variation is attributable to the increasing time gap
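The 0.0348 figure follows from the fitted line if the 8-year span is read as 16 six-month periods, so that the largest possible gap is 15 steps. A short check under that assumption (the constant and function names are illustrative):

```python
# Regression coefficients for F-measure, taken from the slide.
SLOPE = -0.00232
INTERCEPT = 0.79906

# 8 years = 16 six-month periods, so the largest gap is 15 steps (assumption).
MAX_GAP = 15


def predicted_f(gap):
    """F-measure predicted by the fitted line for a given time gap."""
    return INTERCEPT + SLOPE * gap


drop = predicted_f(0) - predicted_f(MAX_GAP)
print(f"estimated drop over {MAX_GAP} steps: {drop:.4f}")
```

Because the R-squared is low, this is an average trend over all training-test configurations rather than a prediction for any single pair of time frames.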
Updating the unlabeled data is better than updating the seeds (complete training-test configurations)
[Figures: F-measure against time gap (x-axis, 0 to 15; 1 = 6 months), one plot per update strategy, with regression lines:
no update: y = −0.00232x + 0.79906, R2 = 0.2376
updated seeds: y = −0.00189x + 0.80025, R2 = 0.1917
updated unlabeled data: y = −0.00051x + 0.80769, R2 = 0.0189]
Update?    a      b        R2
No         0.799  -0.0023  0.238
Seeds      0.800  -0.0019  0.192
Unlabeled  0.807  -0.0005  0.019
Confusion matrices
91a   335   12   22  |  330   16   20  |  393   12   22
       52  453   79  |   52  456   69  |   12  463   38
       23   21  330  |   28   14  342  |    5   11  371
92b   368   19   42  |  368   16   40  |  391   11   22
       19  435   55  |   23  445   39  |   14  463   29
       23   32  334  |   19   25  352  |    5   12  380
95b   375   14   34  |  387   14   30  |  394   12   26
       22  465   78  |   13  461   73  |   12  463   43
       13    7  319  |   10   11  328  |    4   11  362
98a   390   16   31  |  386   16   28  |  395   11   28
       11  458   58  |   13  460   48  |   11  464   39
        9   12  342  |   11   10  355  |    4   11  364
98b   394    9   20  |  394    9   20  |  394    9   20
        8  467   29  |    8  467   29  |    8  467   29
        8   10  382  |    8   10  382  |    8   10  382
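The slide does not label the rows and columns of these matrices. One way to explore them is sketched below: the column sums turn out to be identical for the two matrices used here, which would be consistent with columns holding the reference classes of the shared test set, but that reading is an inference and not something the slide states. The variable and function names are illustrative.

```python
def column_sums(matrix):
    """Sum each column of a square confusion matrix."""
    return [sum(col) for col in zip(*matrix)]


def accuracy(matrix):
    """Fraction of counts on the diagonal (independent of row/column roles)."""
    diagonal = sum(matrix[k][k] for k in range(len(matrix)))
    return diagonal / sum(sum(row) for row in matrix)


# First matrix in the 91a row and the matrix in the 98b row, copied from the slide.
m_91a = [[335, 12, 22], [52, 453, 79], [23, 21, 330]]
m_98b = [[394, 9, 20], [8, 467, 29], [8, 10, 382]]

same_columns = column_sums(m_91a) == column_sums(m_98b)
gain = accuracy(m_98b) - accuracy(m_91a)
```

For these two matrices the diagonal share is higher for 98b than for 91a, in line with the trend reported in the experiments.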