Cross-Domain Generalization of Neural Constituency Parsers (talk transcript, 2019-08-03)
-
Cross-Domain Generalization of Neural Constituency Parsers
Daniel Fried*, Nikita Kitaev*, and Dan Klein
-
Cross-Domain Transfer
Penn Treebank: But Coleco bounced back with the introduction of the Cabbage Patch dolls, whose sales hit $600 million in 1985.
Genia: Several of the heterogeneous clinical manifestations of systemic lupus erythematosus have been associated with specific autoantibodies.
English Web Treebank: Where can I get morcillas in tampa bay, I will like the Argentinian type, but I will to try another please?
-
Penn Treebank Parsing by the Numbers
[Figure: Penn Treebank F1 (89 to 96) by publication year (2000 to 2020), non-neural vs. neural parsers: Charniak, Collins; Charniak & Johnson; Petrov+; Sagae & Lavie; McClosky+; Carreras+; Zhang+; Shindo+; Zhu+; Henderson; Socher+; Durrett+; Dyer+; Choe & Charniak; Dyer+; Stern+; Liu & Zhang; Fried+; Kitaev & Klein; Kitaev & Klein; Joshi+; Kitaev+]
-
Penn Treebank Parsing by the Numbers
[Figure repeated, with annotation: No Structure]
-
Penn Treebank Parsing by the Numbers
[Figure repeated, with annotations: Transformers, ELMo, BERT, No Structure]
-
Methodology
Non-neural:
• Berkeley [Petrov and Klein 2007]
• BLLIP [Charniak and Johnson 2005]
• ZPar (Chinese) [Zhang and Clark 2011]
Neural:
• Self-Attentive Chart [Stern et al. 2017; Kitaev and Klein 2018]
• In-Order Recurrent Neural Network Grammars (RNNG) [Dyer et al. 2016; Kuncoro et al. 2017; Liu and Zhang 2017]
-
Methodology
Zero-shot generalization setup: train in-domain, test out-of-domain.
English. Train: newswire (PTB WSJ). Test out-of-domain: literature (Brown); biomedical (Genia); web newsgroups, reviews, and questions (EWT).
Chinese. Train: newswire (CTB v5). Test out-of-domain: TV news (CTB v8); web forums (CTB v8); blogs (CTB v8).
-
Fact or Myth?
Neural parsers transfer less reliably than non-neural?
Pre-trained representations are most useful out-of-domain?
Structure helps in generalization?
-
Neural vs. Non-Neural Generalization
F1, English corpora (in-domain: Penn Treebank; out-of-domain: Brown, Genia, English Web):
Berkeley (non-neural):     PTB 90.1 | Brown 84.6 | Genia 79.1 | EWT 77.4
BLLIP (non-neural):        PTB 91.5 | Brown 85.9 | Genia 79.6 | EWT 79.9
In-Order RNNG (neural):    PTB 91.5 | Brown 85.6 | Genia 80.3 | EWT 79.1
Chart (neural):            PTB 93.3 | Brown 88.0 | Genia 82.7 | EWT 82.2
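The F1 scores above are labeled bracket F1: precision and recall over (label, start, end) constituent spans. A minimal sketch of the metric, assuming a simple nested-tuple tree encoding (the helper names here are illustrative, not the evalb tool the field actually uses):

```python
from collections import Counter

def brackets(tree, start=0):
    """Collect (label, start, end) spans from a tree given as
    (label, children), where each child is a subtree or a token string."""
    label, children = tree
    pos, found = start, []
    for child in children:
        if isinstance(child, str):       # terminal token: advance one position
            pos += 1
        else:
            sub, pos = brackets(child, pos)
            found.extend(sub)
    found.append((label, start, pos))
    return found, pos

def bracket_f1(gold, pred):
    """Labeled bracket F1 between a gold and a predicted tree."""
    g = Counter(brackets(gold)[0])
    p = Counter(brackets(pred)[0])
    matched = sum((g & p).values())       # spans agreeing on label and extent
    if not matched:
        return 0.0
    precision = matched / sum(p.values())
    recall = matched / sum(g.values())
    return 2 * precision * recall / (precision + recall)

gold = ("S", [("NP", ["I"]), ("VP", ["ran"])])
pred = ("S", [("NP", ["I"]), ("NP", ["ran"])])   # wrong label on one span
print(round(bracket_f1(gold, gold), 3))  # 1.0
print(round(bracket_f1(gold, pred), 3))  # 0.667
```

A mislabeled span costs both precision and recall, which is why label errors hurt F1 twice as much as a missed bracket of the same size would hurt recall alone.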
-
Neural vs. Non-Neural Generalization
F1, Chinese corpora (in-domain: CTB v5, mostly newswire; out-of-domain: Broadcast News, Web Forums, Blogs):
ZPar (non-neural):         CTB v5 83.0 | Broadcast News 77.2 | Web Forums 74.3 | Blogs 73.9
In-Order RNNG (neural):    CTB v5 83.7 | Broadcast News 77.8 | Web Forums 75.7 | Blogs 74.7
-
Fact or Myth?
Neural parsers transfer less reliably than non-neural?
Pre-trained representations are most useful out-of-domain?
Structure helps in generalization?
-
Fact or Myth?
Neural and non-neural parsers transfer similarly.
Pre-trained representations are most useful out-of-domain?
Structure helps in generalization?
-
Effects of Pre-Trained Representations
F1, English corpora:
In-Order RNNG:       PTB 91.5 | Brown 85.6 | Genia 80.3 | EWT 79.1
+ Word Embeddings:   PTB 92.1 | Brown 86.8 | Genia 81.6 | EWT 80.5
+ BERT:              PTB 95.7 | Brown 93.5 | Genia 87.8 | EWT 89.3
Relative error reductions over the In-Order RNNG baseline, in-domain and out: -7% to -8% with word embeddings, -38% to -55% with BERT.
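The relative error reductions annotated on this slide follow the standard convention: the share of remaining F1 errors that the model change eliminates. A quick sketch (the function name is mine, not from the talk):

```python
def relative_error_reduction(f1_base, f1_new):
    """Percent of remaining F1 errors eliminated when moving
    from the baseline model to the new one."""
    return 100.0 * (f1_new - f1_base) / (100.0 - f1_base)

# Reproduce two of the slide's annotations (In-Order RNNG -> + BERT):
print(round(relative_error_reduction(91.5, 95.7)))  # Penn Treebank: 49
print(round(relative_error_reduction(80.3, 87.8)))  # Genia: 38
```

Measuring against the remaining error (100 - F1) rather than raw F1 keeps gains comparable across corpora whose baselines differ by ten points or more.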
-
Fact or Myth?
Neural and non-neural parsers transfer similarly.
Pre-trained representations are most useful out-of-domain?
Structure helps in generalization?
-
Fact or Myth?
Neural and non-neural parsers transfer similarly.
Pre-training helps across domains.
Structure helps in generalization?
-
Structured Decoding?
We wanted more structure.
Unstructured: the Self-Attentive Chart parser [Stern et al. 2017; Kitaev and Klein 2018] with BERT conditions on the sentence only.
Structured: the In-Order RNNG [Dyer et al. 2016; Liu and Zhang 2017] with BERT also conditions on the predicted structure built so far (partial constituents such as S, NP, VP).
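To make "also condition on predicted structure" concrete: the In-Order system builds each constituent by shifting its first child, then projecting the constituent's label, then attaching the remaining children, so at every step the model can look at the partial tree built so far. A toy linearization sketch, assuming the same illustrative tree encoding as above (the action strings are mine, not the paper's implementation):

```python
def in_order_actions(tree):
    """Linearize a (label, children) tree into In-Order transitions:
    shift the first child, project the nonterminal, then the rest, then reduce."""
    actions = []

    def walk(node):
        if isinstance(node, str):            # terminal token
            actions.append(f"SHIFT({node})")
            return
        label, children = node
        walk(children[0])                    # first child comes before the label
        actions.append(f"NT({label})")       # project the nonterminal in-order
        for child in children[1:]:
            walk(child)
        actions.append("REDUCE")             # close the constituent

    walk(tree)
    return actions

tree = ("S", [("NP", ["I"]), ("VP", ["ran"])])
print(in_order_actions(tree))
# ['SHIFT(I)', 'NT(NP)', 'REDUCE', 'NT(S)', 'SHIFT(ran)', 'NT(VP)', 'REDUCE', 'REDUCE']
```

Each NT and REDUCE decision is made with the partially built constituents on the stack, which is exactly the structural conditioning the chart parser lacks.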
-
Structure Helps More Out-of-Domain
Labeled bracket F1 (English):
Unstructured (Chart + BERT):     PTB 95.6 | Brown 93.1 | Genia 87.5 | EWT 88.7
Structured (In-Order + BERT):    PTB 95.7 | Brown 93.5 | Genia 87.8 | EWT 89.3
Relative error reductions from structure: -2% (PTB), -6% (Brown), -2% (Genia), -5% (EWT)
-
Structure Helps with Larger Spans
[Figure: labeled span F1 by minimum span length, on English Web, structured vs. unstructured]
-
Structure Improves Exact Match
Exact parse match accuracies (English):
Unstructured (Chart + BERT):     PTB 55.1 | Brown 49.2 | Genia 41.8 | EWT 17.5
Structured (In-Order + BERT):    PTB 57.1 | Brown 52.0 | Genia 44.0 | EWT 18.0
Relative error reductions from structure: -4% (PTB), -6% (Brown), -4% (Genia), -1% (EWT)
-
Fact or Myth?
Neural and non-neural parsers transfer similarly.
Pre-training helps across domains.
Structure helps in generalization?
-
Fact or Myth?
Neural and non-neural parsers transfer similarly.
Pre-training helps across domains.
Structure helps in domain transfer, longer spans, and whole parses.
-
Conclusions
Neural and non-neural parsers transfer similarly.
Pre-training helps across domains.
Structure helps in domain transfer, longer spans, and whole parses.
-
Thank you!
Code and models:
In-Order RNNG + BERT: github.com/dpfried/rnng-bert
Chart + BERT: parser.kitaev.io
-
High Accuracy Parsers Benefit NLP Systems
Syntactic parses can improve system performance, even for neural models
[Roth and Lapata 2016; Andreas et al. 2016; Aharoni and Goldberg 2017; Strubell et al. 2018; Swayamdipta et al. 2018; Hale et al. 2018; Kuncoro et al. 2018; Kim et al. 2019; He et al. 2019]
[Pipeline: Input Text → Parsed Text → Output]
-
Past Evaluations of Generalization
[Figure: the Penn Treebank F1-by-year chart, with past out-of-domain evaluations highlighted: Gildea, 2001; McClosky+, 2006; Petrov & Klein, 2007; Joshi+, 2018]
-
Structure Helps with Larger Spans
[Figure: labeled span F1 by minimum span length, on the Genia corpus, structured vs. unstructured]
-
Effects of Pre-Trained Representations
F1, Chinese corpora:
In-Order RNNG:       CTB v5 83.7 | Broadcast News 77.8 | Web Forums 75.7 | Blogs 74.7
+ Word Embeddings:   CTB v5 85.7 | Broadcast News 81.6 | Web Forums 79.4 | Blogs 78.2
+ BERT:              CTB v5 91.8 | Broadcast News 88.4 | Web Forums 87.0 | Blogs 84.3
Relative error reductions over the In-Order RNNG baseline, in-domain and out: -12% to -17% with word embeddings, -38% to -50% with BERT.