Slide credit from Richard Socheryvchen/f105-adl/doc/161020_RecursiveNN.pdfRecursive neural network...

48
Slide credit from Richard Socher 1

Transcript of Slide credit from Richard Socheryvchen/f105-adl/doc/161020_RecursiveNN.pdfRecursive neural network...

Page 1: Slide credit from Richard Socheryvchen/f105-adl/doc/161020_RecursiveNN.pdfRecursive neural network (RvNN) ... Previous Recursive Neural Networks work by Goller & Küchler (1996), Costa

Slide credit from Richard Socher 1

Page 2: Slide credit from Richard Socheryvchen/f105-adl/doc/161020_RecursiveNN.pdfRecursive neural network (RvNN) ... Previous Recursive Neural Networks work by Goller & Küchler (1996), Costa

Sequence ModelingIdea: aggregate the meaning from all words into a vector

Compositionality

Method:◦ Basic combination: average, sum

◦ Neural combination: Recursive neural network (RvNN)

Recurrent neural network (RNN)

Convolutional neural network (CNN)

2

How to compute

這(this)

規格(specification)

有(have)

誠意(sincerity)

N-dim

Page 3: Slide credit from Richard Socheryvchen/f105-adl/doc/161020_RecursiveNN.pdfRecursive neural network (RvNN) ... Previous Recursive Neural Networks work by Goller & Küchler (1996), Costa

Recursive Neural NetworkFrom Words to Phrases

3

Page 4: Slide credit from Richard Socheryvchen/f105-adl/doc/161020_RecursiveNN.pdfRecursive neural network (RvNN) ... Previous Recursive Neural Networks work by Goller & Küchler (1996), Costa

Recursive Neural NetworkIdea: leverage the linguistic knowledge (syntax) for combining multiple words into phrases

Assumption: language is described recursively

4

Page 5: Slide credit from Richard Socheryvchen/f105-adl/doc/161020_RecursiveNN.pdfRecursive neural network (RvNN) ... Previous Recursive Neural Networks work by Goller & Küchler (1996), Costa

Related Work for RvNNPollack (1990): Recursive auto-associative memories

Previous Recursive Neural Networks work by Goller & Küchler(1996), Costa et al. (2003) assumed fixed tree structure and used one-hot vectors.

Hinton (1990) and Bottou (2011): Related ideas about recursive models and recursive operators as smooth versions of logic operations

5

Page 6: Slide credit from Richard Socheryvchen/f105-adl/doc/161020_RecursiveNN.pdfRecursive neural network (RvNN) ... Previous Recursive Neural Networks work by Goller & Küchler (1996), Costa

OutlineProperty◦ Syntactic Compositionality◦ Recursion Assumption

Network Architecture and Definition◦ Standard Recursive Neural Network

◦ Weight-Tied◦ Weight-Untied

◦ Matrix-Vector Recursive Neural Network◦ Recursive Neural Tensor Network

Applications◦ Parsing◦ Paraphrase Detection◦ Sentiment Analysis

6

Page 7: Slide credit from Richard Socheryvchen/f105-adl/doc/161020_RecursiveNN.pdfRecursive neural network (RvNN) ... Previous Recursive Neural Networks work by Goller & Küchler (1996), Costa

OutlineProperty◦ Syntactic Compositionality◦ Recursion Assumption

Network Architecture and Definition◦ Standard Recursive Neural Network

◦ Weight-Tied◦ Weight-Untied

◦ Matrix-Vector Recursive Neural Network◦ Recursive Neural Tensor Network

Applications◦ Parsing◦ Paraphrase Detection◦ Sentiment Analysis

7

Page 8: Slide credit from Richard Socheryvchen/f105-adl/doc/161020_RecursiveNN.pdfRecursive neural network (RvNN) ... Previous Recursive Neural Networks work by Goller & Küchler (1996), Costa

Phrase MappingPrinciple of “Compositionality”◦ The meaning (vector) of a sentence is determined by

1) the meanings of its words and

2) the rules that combine them

8

Idea: jointly learn parse trees and compositional vector representations

the country of my birth

0.40.3

2.13.3

77

44.5

2.33.6

13.5

2.53.8

5.56.1

15

the country of my birth

the place where I was born

Page 9: Slide credit from Richard Socheryvchen/f105-adl/doc/161020_RecursiveNN.pdfRecursive neural network (RvNN) ... Previous Recursive Neural Networks work by Goller & Küchler (1996), Costa

Sentence Syntactic ParsingParsing is a process of analyzing a string of symbols

Parsing tree conveys1) Part-of-speech for each word

2) Phrases

3) Relationships

9

(NN = noun, VB = verb, DT = determiner, IN = Preposition)

The cat sat on the mat.

DT NN VB IN DT NN

NP

PP

VP

NP

S

Page 10: Slide credit from Richard Socheryvchen/f105-adl/doc/161020_RecursiveNN.pdfRecursive neural network (RvNN) ... Previous Recursive Neural Networks work by Goller & Küchler (1996), Costa

Sentence Syntactic ParsingParsing is a process of analyzing a string of symbols

Parsing tree conveys1) Part-of-speech for each word

2) Phrases

3) Relationships

10

The cat sat on the mat.

DT NN VB

(NN = noun, VB = verb, DT = determiner, IN = Preposition)

IN DT NN

NP

PP

VP

NP

S

Page 11: Slide credit from Richard Socheryvchen/f105-adl/doc/161020_RecursiveNN.pdfRecursive neural network (RvNN) ... Previous Recursive Neural Networks work by Goller & Küchler (1996), Costa

Sentence Syntactic ParsingParsing is a process of analyzing a string of symbols

Parsing tree conveys1) Part-of-speech for each word

2) Phrases• Noun phrase (NP): “the cat”, “the mat”

• Preposition phrase (PP): “on the mat”

• Verb phrase (VP): “sat on the mat”

• Sentence: “the cat sat on the mat”

3) Relationships

11

The cat sat on the mat.

DT NN VB

(NN = noun, VB = verb, DT = determiner, IN = Preposition)

IN DT NN

NP

PP

VP

NP

S

Page 12: Slide credit from Richard Socheryvchen/f105-adl/doc/161020_RecursiveNN.pdfRecursive neural network (RvNN) ... Previous Recursive Neural Networks work by Goller & Küchler (1996), Costa

Sentence Syntactic ParsingParsing is a process of analyzing a string of symbols

Parsing tree conveys1) Part-of-speech for each word

2) Phrases

3) Relationships

12

The cat sat on the mat.

DT NN VB

(NN = noun, VB = verb, DT = determiner, IN = Preposition)

IN DT NN

NP

PP

VP

NP

S

subject verb modifier_of_place• “the cat” is the subject of “sat”• “on the mat” is the place

modifier of “sat”

Page 13: Slide credit from Richard Socheryvchen/f105-adl/doc/161020_RecursiveNN.pdfRecursive neural network (RvNN) ... Previous Recursive Neural Networks work by Goller & Küchler (1996), Costa

Learning Structure & RepresentationVector representations incorporate the meaning of wordsand their compositional structures

13

The cat sat on the mat.

NP

PP

VP

NP

S

Page 14: Slide credit from Richard Socheryvchen/f105-adl/doc/161020_RecursiveNN.pdfRecursive neural network (RvNN) ... Previous Recursive Neural Networks work by Goller & Küchler (1996), Costa

OutlineProperty◦ Syntactic Compositionality◦ Recursion Assumption

Network Architecture and Definition◦ Standard Recursive Neural Network

◦ Weight-Tied◦ Weight-Untied

◦ Matrix-Vector Recursive Neural Network◦ Recursive Neural Tensor Network

Applications◦ Parsing◦ Paraphrase Detection◦ Sentiment Analysis

14

Page 15: Slide credit from Richard Socheryvchen/f105-adl/doc/161020_RecursiveNN.pdfRecursive neural network (RvNN) ... Previous Recursive Neural Networks work by Goller & Küchler (1996), Costa

Recursion AssumptionAre languages recursive?

Recursion helps describe natural language◦ Ex. “the church which has nice windows”, a noun phrase containing a

relative clause that contains a noun phrases

◦ NP NP PP

15

debatable

Page 16: Slide credit from Richard Socheryvchen/f105-adl/doc/161020_RecursiveNN.pdfRecursive neural network (RvNN) ... Previous Recursive Neural Networks work by Goller & Küchler (1996), Costa

Recursion AssumptionCharacteristics of recursion1. Helpful in disambiguation

2. Helpful for some tasks to refer to specific phrases:◦ John and Jane went to a big festival. They enjoyed the trip and the music there.

◦ “they”: John and Jane; “the trip”: went to a big festival; “there”: big festival

3. Works better for some tasks to use grammatical tree structure

16

Language recursion is still up to debate

Page 17: Slide credit from Richard Socheryvchen/f105-adl/doc/161020_RecursiveNN.pdfRecursive neural network (RvNN) ... Previous Recursive Neural Networks work by Goller & Küchler (1996), Costa

OutlineProperty◦ Syntactic Compositionality◦ Recursion Assumption

Network Architecture and Definition◦ Standard Recursive Neural Network

◦ Weight-Tied◦ Weight-Untied

◦ Matrix-Vector Recursive Neural Network◦ Recursive Neural Tensor Network

Applications◦ Parsing◦ Paraphrase Detection◦ Sentiment Analysis

17

Page 18: Slide credit from Richard Socheryvchen/f105-adl/doc/161020_RecursiveNN.pdfRecursive neural network (RvNN) ... Previous Recursive Neural Networks work by Goller & Küchler (1996), Costa

Recursive Neural Network ArchitectureA network is to predict the vectors along with the structure◦ Input: two candidate children’s vector representations

◦ Output:

1) vector representations for the merged node

2) score of how plausible the new node would be

18

Neural Network

on the mat.

NP

PPscore

Page 19: Slide credit from Richard Socheryvchen/f105-adl/doc/161020_RecursiveNN.pdfRecursive neural network (RvNN) ... Previous Recursive Neural Networks work by Goller & Küchler (1996), Costa

Recursive Neural Network Definition1) vector representations for the merged node

2) score of how plausible the new node would be

19

Neural Network

same W parameters at all nodes of the tree weight-tied

Page 20: Slide credit from Richard Socheryvchen/f105-adl/doc/161020_RecursiveNN.pdfRecursive neural network (RvNN) ... Previous Recursive Neural Networks work by Goller & Küchler (1996), Costa

3.1 0.3 0.1 0.4 2.3

Sentence Parsing via RvNN

20

Neural Network

Neural Network

Neural Network

Neural Network

Neural Network

Page 21: Slide credit from Richard Socheryvchen/f105-adl/doc/161020_RecursiveNN.pdfRecursive neural network (RvNN) ... Previous Recursive Neural Networks work by Goller & Küchler (1996), Costa

1.1

0.1 0.4 2.3

Sentence Parsing via RvNN

21

Neural Network

Neural Network

Neural Network

Neural Network

Page 22: Slide credit from Richard Socheryvchen/f105-adl/doc/161020_RecursiveNN.pdfRecursive neural network (RvNN) ... Previous Recursive Neural Networks work by Goller & Küchler (1996), Costa

1.1

0.1

3.6

Sentence Parsing via RvNN

22

Neural Network

Neural Network

Neural Network

Page 23: Slide credit from Richard Socheryvchen/f105-adl/doc/161020_RecursiveNN.pdfRecursive neural network (RvNN) ... Previous Recursive Neural Networks work by Goller & Küchler (1996), Costa

1.1

3.8

Sentence Parsing via RvNN

23

Neural Network

Neural Network

Page 24: Slide credit from Richard Socheryvchen/f105-adl/doc/161020_RecursiveNN.pdfRecursive neural network (RvNN) ... Previous Recursive Neural Networks work by Goller & Küchler (1996), Costa

Sentence parsing score

Sentence Parsing via RvNN

24

Neural Network

Sentence vector embeddings

Page 25: Slide credit from Richard Socheryvchen/f105-adl/doc/161020_RecursiveNN.pdfRecursive neural network (RvNN) ... Previous Recursive Neural Networks work by Goller & Küchler (1996), Costa

Backpropagation through StructurePrincipally the same as general backpropagation (Goller& Küchler, 1996)

25

1

11

lx

la

j

l

jl

i

Backward Pass

Forward Pass

Three differences Sum derivatives of W from all nodes Split derivatives at each node Add error messages from parent + node

itself

Page 26: Slide credit from Richard Socheryvchen/f105-adl/doc/161020_RecursiveNN.pdfRecursive neural network (RvNN) ... Previous Recursive Neural Networks work by Goller & Küchler (1996), Costa

1) Sum derivatives of W from all nodes

26

Neural Network

Neural Network

Page 27: Slide credit from Richard Socheryvchen/f105-adl/doc/161020_RecursiveNN.pdfRecursive neural network (RvNN) ... Previous Recursive Neural Networks work by Goller & Küchler (1996), Costa

2) Split derivatives at each node During forward propagation, the parent node is computed based on two children

During backward propagation, the errors should be computed wrt each of them

27

Neural Network Neural Network

Page 28: Slide credit from Richard Socheryvchen/f105-adl/doc/161020_RecursiveNN.pdfRecursive neural network (RvNN) ... Previous Recursive Neural Networks work by Goller & Küchler (1996), Costa

3) Add error messagesFor each node, the error message is compose of◦ Error propagated from parent

◦ Error from the current node

28

Neural Network

Page 29: Slide credit from Richard Socheryvchen/f105-adl/doc/161020_RecursiveNN.pdfRecursive neural network (RvNN) ... Previous Recursive Neural Networks work by Goller & Küchler (1996), Costa

OutlineProperty◦ Syntactic Compositionality◦ Recursion Assumption

Network Architecture and Definition◦ Standard Recursive Neural Network

◦ Weight-Tied◦ Weight-Untied

◦ Matrix-Vector Recursive Neural Network◦ Recursive Neural Tensor Network

Applications◦ Parsing◦ Paraphrase Detection◦ Sentiment Analysis

29

Page 30: Slide credit from Richard Socheryvchen/f105-adl/doc/161020_RecursiveNN.pdfRecursive neural network (RvNN) ... Previous Recursive Neural Networks work by Goller & Küchler (1996), Costa

Composition Matrix W

30

Neural Network

Issue: using the same network Wfor different compositions

Page 31: Slide credit from Richard Socheryvchen/f105-adl/doc/161020_RecursiveNN.pdfRecursive neural network (RvNN) ... Previous Recursive Neural Networks work by Goller & Küchler (1996), Costa

Syntactically Untied RvNN

31

Idea: the composition function is conditioned on the syntactic categories

Neural Network

Benefit• Composition function are syntax-dependent• Allows different composition functions for word pairs,

e.g. Adv + AdjP, VP + NP

Issue: speed due to many candidates

Page 32: Slide credit from Richard Socheryvchen/f105-adl/doc/161020_RecursiveNN.pdfRecursive neural network (RvNN) ... Previous Recursive Neural Networks work by Goller & Küchler (1996), Costa

Compositional Vector GrammarCompute score only for a subset of trees coming from a simpler, faster model (Socher et al, 2013)

◦ Prunes very unlikely candidates for speed

◦ Provides coarse syntactic categories of the children for each beam candidate

32Socher et al., “Parsing with Compositional Vector Grammars,” in ACL, 2013.

Probability context-free grammar (PCFG) helps decrease the search space

Page 33: Slide credit from Richard Socheryvchen/f105-adl/doc/161020_RecursiveNN.pdfRecursive neural network (RvNN) ... Previous Recursive Neural Networks work by Goller & Küchler (1996), Costa

Labels for RvNNThe score can be passed through a softmax function to compute the probability of each category

33

x1

x2

x3

x4

y1

y2

y3Neural Network

softmax

NP

Socher et al., “Parsing Natural Scenes and Natural Language with Recursive Neural Networks,” in ICML, 2011.

Softmax loss cross-entropy error for optimization

Page 34: Slide credit from Richard Socheryvchen/f105-adl/doc/161020_RecursiveNN.pdfRecursive neural network (RvNN) ... Previous Recursive Neural Networks work by Goller & Küchler (1996), Costa

OutlineProperty◦ Syntactic Compositionality◦ Recursion Assumption

Network Architecture and Definition◦ Standard Recursive Neural Network

◦ Weight-Tied◦ Weight-Untied

◦ Matrix-Vector Recursive Neural Network◦ Recursive Neural Tensor Network

Applications◦ Parsing◦ Paraphrase Detection◦ Sentiment Analysis

34

Page 35: Slide credit from Richard Socheryvchen/f105-adl/doc/161020_RecursiveNN.pdfRecursive neural network (RvNN) ... Previous Recursive Neural Networks work by Goller & Küchler (1996), Costa

Recursive Neural Network

35

Neural Network

Issue: some words act mostly as an operator, e.g. “very” in “very good”

Page 36: Slide credit from Richard Socheryvchen/f105-adl/doc/161020_RecursiveNN.pdfRecursive neural network (RvNN) ... Previous Recursive Neural Networks work by Goller & Küchler (1996), Costa

Matrix-Vector Recursive Neural Network

36

Neural Network

Idea: each word can additionally serve as an operator

Page 37: Slide credit from Richard Socheryvchen/f105-adl/doc/161020_RecursiveNN.pdfRecursive neural network (RvNN) ... Previous Recursive Neural Networks work by Goller & Küchler (1996), Costa

OutlineProperty◦ Syntactic Compositionality◦ Recursion Assumption

Network Architecture and Definition◦ Standard Recursive Neural Network

◦ Weight-Tied◦ Weight-Untied

◦ Matrix-Vector Recursive Neural Network◦ Recursive Neural Tensor Network

Applications◦ Parsing◦ Paraphrase Detection◦ Sentiment Analysis

37

Page 38: Slide credit from Richard Socheryvchen/f105-adl/doc/161020_RecursiveNN.pdfRecursive neural network (RvNN) ... Previous Recursive Neural Networks work by Goller & Küchler (1996), Costa

Recursive Neural Tensor Network

38

Idea: allow more interactions of vectors

Page 39: Slide credit from Richard Socheryvchen/f105-adl/doc/161020_RecursiveNN.pdfRecursive neural network (RvNN) ... Previous Recursive Neural Networks work by Goller & Küchler (1996), Costa

OutlineProperty◦ Syntactic Compositionality◦ Recursion Assumption

Network Architecture and Definition◦ Standard Recursive Neural Network

◦ Weight-Tied◦ Weight-Untied

◦ Matrix-Vector Recursive Neural Network◦ Recursive Neural Tensor Network

Applications◦ Parsing◦ Paraphrase Detection◦ Sentiment Analysis

39

Page 40: Slide credit from Richard Socheryvchen/f105-adl/doc/161020_RecursiveNN.pdfRecursive neural network (RvNN) ... Previous Recursive Neural Networks work by Goller & Küchler (1996), Costa

Language Compositionality

40

Page 41: Slide credit from Richard Socheryvchen/f105-adl/doc/161020_RecursiveNN.pdfRecursive neural network (RvNN) ... Previous Recursive Neural Networks work by Goller & Küchler (1996), Costa

Image Compositionality

41

Idea: image can be composed by the visual segments (same as natural language parsing)

Page 42: Slide credit from Richard Socheryvchen/f105-adl/doc/161020_RecursiveNN.pdfRecursive neural network (RvNN) ... Previous Recursive Neural Networks work by Goller & Küchler (1996), Costa

OutlineProperty◦ Syntactic Compositionality◦ Recursion Assumption

Network Architecture and Definition◦ Standard Recursive Neural Network

◦ Weight-Tied◦ Weight-Untied

◦ Matrix-Vector Recursive Neural Network◦ Recursive Neural Tensor Network

Applications◦ Parsing◦ Paraphrase Detection◦ Sentiment Analysis

42

Page 43: Slide credit from Richard Socheryvchen/f105-adl/doc/161020_RecursiveNN.pdfRecursive neural network (RvNN) ... Previous Recursive Neural Networks work by Goller & Küchler (1996), Costa

Paraphrase for Learning Sentence Vectors

43

A pair-wise sentence comparison of nodes in parsed trees for learning sentence embeddings

Page 44: Slide credit from Richard Socheryvchen/f105-adl/doc/161020_RecursiveNN.pdfRecursive neural network (RvNN) ... Previous Recursive Neural Networks work by Goller & Küchler (1996), Costa

OutlineProperty◦ Syntactic Compositionality◦ Recursion Assumption

Network Architecture and Definition◦ Standard Recursive Neural Network

◦ Weight-Tied◦ Weight-Untied

◦ Matrix-Vector Recursive Neural Network◦ Recursive Neural Tensor Network

Applications◦ Parsing◦ Paraphrase Detection◦ Sentiment Analysis

44

Page 45: Slide credit from Richard Socheryvchen/f105-adl/doc/161020_RecursiveNN.pdfRecursive neural network (RvNN) ... Previous Recursive Neural Networks work by Goller & Küchler (1996), Costa

Sentiment Analysis

45

Sentiment analysis for sentences with negation words can benefit from RvNN

Page 46: Slide credit from Richard Socheryvchen/f105-adl/doc/161020_RecursiveNN.pdfRecursive neural network (RvNN) ... Previous Recursive Neural Networks work by Goller & Küchler (1996), Costa

Sentiment AnalysisSentiment Treebank with richer annotations

46

Phrase-level sentiment labels indeed improve the performance

Socher et al., “Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank,” in EMNLP, 2013.

Page 47: Slide credit from Richard Socheryvchen/f105-adl/doc/161020_RecursiveNN.pdfRecursive neural network (RvNN) ... Previous Recursive Neural Networks work by Goller & Küchler (1996), Costa

Sentiment Tree Illustration Stanford live demo: http://nlp.stanford.edu/sentiment/

47Socher et al., “Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank,” in EMNLP, 2013.

Phrase-level annotations learn the specific compositional functions for sentiment

Page 48: Slide credit from Richard Socheryvchen/f105-adl/doc/161020_RecursiveNN.pdfRecursive neural network (RvNN) ... Previous Recursive Neural Networks work by Goller & Küchler (1996), Costa

Concluding RemarksRecursive Neural Network◦ Idea: syntactic compositionality

& language recursion

Network Variants◦ Standard Recursive Neural

Network◦ Weight-Tied

◦ Weight-Untied

◦ Matrix-Vector Recursive Neural Network

◦ Recursive Neural Tensor Network

48