Parsing with Compositional Vector Grammars. Socher, Bauer, Manning, Ng (2013).

Transcript of Parsing with Compositional Vector Grammars, Socher, Bauer, Manning, Ng (2013).

Page 1:

Parsing with Compositional Vector Grammars

Socher, Bauer, Manning, Ng 2013

Page 2:

Problem

• How can we parse a sentence and create a dense representation of it?

– N-grams have obvious problems, the most important being sparsity

• Can we resolve syntactic ambiguity with context? “They ate udon with forks” (the prepositional phrase attaches to ate) vs “They ate udon with chicken” (it attaches to udon)

Page 3:

Standard Recursive Neural Net

[Figure: a binary tree over “I like green eggs”. Vector(I) and Vector(like) are combined through the shared matrix WMain to produce Vector(I-like), which gets a score (and optionally a classifier) at the node. Vector(I-like) then combines with Vector(green) through the same WMain to produce Vector((I-like) green), and so on up the tree.]

Page 4:

Standard Recursive Neural Net

p = f(W[a; b]), where f is usually tanh or the logistic sigmoid. In other words, stack the two child word vectors, multiply through a matrix W, and you get a parent vector p of the same dimensionality as the children a and b.
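The composition step above can be sketched in a few lines of numpy; the names (`W_main`, `v_score`, the dimensionality) are illustrative, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4  # word-vector dimensionality (kept tiny for illustration)

# Shared composition matrix and scoring vector (illustrative initialization)
W_main = rng.normal(scale=0.1, size=(d, 2 * d))
v_score = rng.normal(scale=0.1, size=d)

def compose(a, b):
    """Stack the two child vectors and map back to dimension d."""
    return np.tanh(W_main @ np.concatenate([a, b]))

def score(p):
    """Scalar plausibility score of a candidate constituent."""
    return float(v_score @ p)

vec_i, vec_like = rng.normal(size=d), rng.normal(size=d)
p = compose(vec_i, vec_like)  # Vector(I-like): same dimensionality as children
assert p.shape == (d,)
```

Because the parent has the same dimensionality as each child, the same `compose` can be applied recursively all the way up the tree.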

Page 5:

Syntactically Untied RNN

[Figure: the same tree over “I like green eggs”, but the words are first tagged by a PCFG as N V Adj N. The composition matrix is chosen by the children’s categories: Vector(I) and Vector(like) combine through WN,V to give Vector(I-like); Vector(green) and Vector(eggs) combine through WAdj,N to give Vector(green-eggs). A score and classifier sit on each composed node. First, parse the lower level with the PCFG.]

Page 6:

Syntactically Untied RNN

The weight matrix is determined by the PCFG parse categories of the children a and b. (There is one matrix per category combination.)
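A minimal sketch of the untying, assuming a toy three-category set (the real PCFG category set is much larger) and the identity-plus-noise initialization described later in these slides:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
d = 4
categories = ["N", "V", "Adj"]  # toy category set for illustration

# One composition matrix per ordered pair of child categories, initialized
# to an "averaging" pair of identity blocks plus small random noise.
W = {
    pair: 0.5 * np.hstack([np.eye(d), np.eye(d)])
          + rng.normal(scale=0.01, size=(d, 2 * d))
    for pair in product(categories, repeat=2)
}

def compose(a, cat_a, b, cat_b):
    """Compose children with the matrix untied by their syntactic categories."""
    return np.tanh(W[(cat_a, cat_b)] @ np.concatenate([a, b]))

green, eggs = rng.normal(size=d), rng.normal(size=d)
p = compose(green, "Adj", eggs, "N")  # green-eggs uses W[("Adj", "N")]
```

With three categories this already gives nine matrices; the number of parameters grows quadratically in the category set, which is where the “huge number of parameters” in the conclusions comes from.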

Page 7:

Examples: Composition Matrices

• Notice that Socher initializes them as two identity matrices side by side (in the absence of other information, the composition should simply average the two children)

Page 8:

Learning the Weights

• Errors are backpropagated through structure (Goller and Kuchler, 1996)

Weight derivatives are additive across branches! (Not obvious; there is a good proof/explanation in Socher, 2014.)

The error at a node is δ = f′(x) ⊙ (Wᵀ δparent), where for the logistic f′(x) = f(x)(1 − f(x)); the weight gradient at each node is the outer product of δ with that node’s input.
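The additive rule can be checked numerically in a few lines. This is a sketch under assumed names (`W`, `v`, a two-node right-branching tree), not the paper's code: two nodes share the same W, and the gradient of the root score with respect to W is the sum of the per-node outer products.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
d = 3
W = rng.normal(scale=0.1, size=(d, 2 * d))   # shared composition matrix
v = rng.normal(scale=0.1, size=d)            # toy scoring vector

a, b, c = (rng.normal(size=d) for _ in range(3))

# Forward pass: two nodes share W:  p1 = f(W[a;b]),  p2 = f(W[p1;c])
x1 = np.concatenate([a, b]); p1 = sigmoid(W @ x1)
x2 = np.concatenate([p1, c]); p2 = sigmoid(W @ x2)
loss = v @ p2                                # objective: score of the root

# Backprop through structure; for the logistic, f'(x) = f(x)(1 - f(x)).
delta2 = v * p2 * (1 - p2)                   # error at the top node
delta1 = (W.T @ delta2)[:d] * p1 * (1 - p1)  # error flowing into p1's node

# The derivative for the shared W is additive across the two nodes:
dW = np.outer(delta2, x2) + np.outer(delta1, x1)

# Finite-difference check of one entry confirms the additive rule.
eps = 1e-5
W[0, 0] += eps
p1e = sigmoid(W @ x1)
p2e = sigmoid(W @ np.concatenate([p1e, c]))
num = (v @ p2e - loss) / eps
assert abs(num - dW[0, 0]) < 1e-4
```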

Page 9:

Tricks

• Our good friend, AdaGrad (diagonal variant), applied elementwise

• Initialize matrices with identity + small random noise

• Uses Collobert and Weston (2008) word embeddings to start
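The diagonal AdaGrad update can be sketched as follows; the learning rate and epsilon here are illustrative values, not the paper's:

```python
import numpy as np

# Diagonal AdaGrad: each parameter gets its own effective learning rate,
# scaled down by the accumulated squared gradients (all ops elementwise).
def adagrad_step(theta, grad, cache, lr=0.1, eps=1e-8):
    cache = cache + grad ** 2                  # running sum of squared grads
    theta = theta - lr * grad / (np.sqrt(cache) + eps)
    return theta, cache

theta, cache = np.ones(3), np.zeros(3)
for _ in range(10):
    grad = 2 * theta                           # toy gradient of sum(theta**2)
    theta, cache = adagrad_step(theta, grad, cache)
```

Frequently updated parameters accumulate a large `cache` and take smaller steps, while rarely updated ones keep moving, which suits the sparse per-category matrices of the SU-RNN.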

Page 10:

Learning the Tree

• We want the score of the correct parse tree to beat the score of every incorrect tree by a margin: s(correct) ≥ s(tree) + Δ(tree), where the margin Δ(tree) grows with the number of incorrect spans in the candidate tree

(Correct parse trees are given in the training set.)
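A minimal sketch of this structured max-margin loss, assuming precomputed tree scores and span-error counts; the per-error margin `kappa = 0.1` is an illustrative value:

```python
# Max-margin objective sketch: the gold tree's score must beat every
# candidate tree's score by a structured margin that grows with the
# number of incorrect spans in the candidate.
def margin_loss(gold_score, candidates, kappa=0.1):
    """candidates: (score, n_wrong_spans) pairs for incorrect trees."""
    losses = [max(0.0, s + kappa * n - gold_score) for s, n in candidates]
    return max(losses, default=0.0)

# The gold tree scores 2.0; the close rival (1.95 with two wrong spans)
# violates the margin, the distant one (0.5) does not.
loss = margin_loss(2.0, [(1.95, 2), (0.5, 5)])
```

The loss is zero exactly when every incorrect tree is beaten by its required margin, so training pushes the scoring function to separate the gold tree from near-miss parses.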

Page 11:

Finding the Best Tree (inference)

• Want to find the parse tree with the max score (the sum of the scores of all of its subtrees)

• Too expensive to try every combination

• Trick: use a non-RNN method (a PCFG with the CKY algorithm) to select the best 200 trees, then beam-search these trees with the RNN
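The two-stage scheme reduces to a rerank step. A sketch, where `base_parse_topk` (the fast PCFG/CKY proposer) and `rnn_score` are hypothetical stand-ins for the real components:

```python
# Two-stage inference: a cheap base parser proposes the top-k candidate
# trees, and the expensive RNN scorer picks the best among them.
def rerank(sentence, base_parse_topk, rnn_score, k=200):
    candidates = base_parse_topk(sentence, k)   # fast candidate generation
    return max(candidates, key=rnn_score)       # slow, accurate rescoring

# Toy usage: trees are just labels, scored by a lookup table.
scores = {"tree_a": 0.3, "tree_b": 0.9, "tree_c": 0.1}
best = rerank("they ate udon with forks",
              lambda s, k: list(scores), scores.get)
```

This restricts the RNN to a small candidate set, trading a little search completeness for a large speedup over exhaustively scoring every possible tree.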

Page 12:

Model Comparisons (WSJ Dataset)

[Table: F1 for parse labels on the WSJ dataset, with Socher’s model among the compared parsers.]

Page 13:

Analysis of Errors

Page 14:
Page 15:

Conclusions:

• Not the best model, but fast

• No hand-engineered features

• Huge number of parameters

• Notice that Socher can’t make the standard RNN perform better than the PCFG; there is a pattern here: most of the papers from this group involve very creative modifications to the standard RNN (SU-RNN, RNTN, RNN + max pooling)

Page 16:

• The model in this paper has (probably) been eclipsed by the Recursive Neural Tensor Network. Subsequent work showed the RNTN performing better (in different situations) than the SU-RNN.