Page 1

Parsing Natural Scenes and Natural Language with Recursive Neural Networks

Richard Socher, Cliff Chiung-Yu Lin, Andrew Y. Ng, Christopher D. Manning

Slides & Speech: Rui Zhang

Page 2

Outline

• Motivation & Contribution
• Recursive Neural Network
• Scene Segmentation using RNN
• Learning and Optimization
• Language Parsing using RNN
• Experiments

Page 3

Motivation

• Data naturally contains recursive structures
  • Image: scenes split into objects; objects split into parts
  • Language: a noun phrase contains a clause, which contains noun phrases of its own

Page 4

Motivation

• The recursive structure helps to
  • Identify the components of the data
  • Understand how the components interact to form the whole

Page 5

Contribution

• First deep learning method to achieve state-of-the-art performance on scene segmentation and annotation
• Learned deep features outperform hand-crafted ones (e.g., Gist)
• Can be generalized to other tasks, e.g., language parsing

Page 6

Recursive Neural Network

• Similar to a one-layer fully connected network
• Models the transformation from child nodes to their parent node
• Recursively applied to a tree structure
  • The parent at one layer becomes a child at the layer above
  • Parameters are shared across layers

[Figure: children c1 and c2 are combined through W_recur into parent h; h and c3 are combined through the same W_recur into x, one layer up]
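The composition step above can be sketched in a few lines of numpy; the tanh nonlinearity, feature size, and random weights here are illustrative placeholders, not the trained model:

```python
import numpy as np

def compose(c1, c2, W_recur, b):
    """Compute a parent vector from two child vectors.

    The same W_recur is reused at every merge, which is what
    makes the network recursive rather than just deep.
    """
    children = np.concatenate([c1, c2])        # shape (2n,)
    return np.tanh(W_recur @ children + b)     # shape (n,)

n = 4                                          # illustrative feature size
rng = np.random.default_rng(0)
W_recur = rng.normal(size=(n, 2 * n))
b = np.zeros(n)

c1, c2 = rng.normal(size=n), rng.normal(size=n)
h = compose(c1, c2, W_recur, b)                # parent of c1 and c2
c3 = rng.normal(size=n)
root = compose(h, c3, W_recur, b)              # same weights, next layer up
```

Because the parent has the same dimensionality as each child, the output of one merge can feed straight into the next, all the way up the tree.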

Page 7

Recursive vs. Recurrent NN

• There are two models called RNN: Recursive and Recurrent
• Similar
  • Both have shared parameters that are applied recursively
• Different
  • Recursive NNs apply to trees, while Recurrent NNs apply to sequences
  • A Recurrent NN can be viewed as a Recursive NN over a tree that degenerates into a chain

Page 8

Scene Segmentation Pipeline

1. Over-segment the image into superpixels
2. Extract features for each superpixel
3. Map the features onto the semantic space
4. Enumerate all possible merges (pairs of adjacent nodes)
5. Compute a score for each merge with the RNN
6. Merge the pair of nodes with the highest score
7. Repeat steps 4-6 until only one node is left

Page 9

Input Data Representation

• Image
  • Over-segmented into superpixels
  • Hand-crafted features are extracted from each superpixel
  • Features are mapped onto a semantic space by one fully connected layer, giving a feature vector
  • Each superpixel has a class label

Page 10

Tree Construction

• Scene parse trees are constructed bottom-up
  • Leaf nodes are the over-segmented superpixels
  • Hand-crafted features are extracted and mapped onto the semantic space by one fully connected layer
  • Each leaf has a feature vector
• An adjacency matrix records which segments neighbor each other

[Figure: adjacency matrix]
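A minimal sketch of such a matrix; the five segments and their neighbor pairs here are made up for illustration:

```python
import numpy as np

# Hypothetical neighbor pairs among 5 superpixels (indices are invented).
neighbor_pairs = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 4)]

def build_adjacency(num_segments, pairs):
    """Symmetric boolean matrix: A[i, j] is True iff segments i and j touch."""
    A = np.zeros((num_segments, num_segments), dtype=bool)
    for i, j in pairs:
        A[i, j] = A[j, i] = True
    return A

A = build_adjacency(5, neighbor_pairs)
```

During parsing, only pairs marked True are candidate merges, which keeps the search restricted to spatially coherent regions.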

Page 11

Greedy Merging

• Nodes are merged greedily
• In each iteration
  • Enumerate all possible merges (pairs of adjacent nodes)
  • Compute a score for each possible merge: a fully connected transformation of the children c1 and c2 gives the candidate parent h12
  • Merge the pair with the highest score
    • c1 and c2 are replaced by the new node
    • h12 becomes the feature of the new node
    • The union of the neighbors of c1 and c2 becomes the neighbors of the new node
• Repeat until only one node is left

[Figure: c1 and c2 combine through W_recur into h12; W_score maps h12 to a scalar merge score]
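The loop above can be sketched as follows; the random weights, the tanh nonlinearity, and the tiny three-segment example are illustrative stand-ins, not the trained model:

```python
import numpy as np

def greedy_parse(features, adjacency, W_recur, b, W_score):
    """Greedily merge the highest-scoring adjacent pair until one node is left.

    Returns the merge sequence as (child_i, child_j, new_node_id) triples.
    """
    nodes = {i: f for i, f in enumerate(features)}
    adj = {i: set(js) for i, js in adjacency.items()}
    next_id, merges = len(features), []
    while len(nodes) > 1:
        best = None
        # Enumerate all pairs of adjacent nodes and score each candidate merge.
        for i in nodes:
            for j in adj[i]:
                if i < j:
                    h = np.tanh(W_recur @ np.concatenate([nodes[i], nodes[j]]) + b)
                    s = float(W_score @ h)          # scalar merge score
                    if best is None or s > best[0]:
                        best = (s, i, j, h)
        s, i, j, h = best
        nodes[next_id] = h                          # h becomes the new node's feature
        adj[next_id] = (adj[i] | adj[j]) - {i, j}   # union of the children's neighbors
        for k in adj[next_id]:
            adj[k] = (adj[k] - {i, j}) | {next_id}
        del nodes[i], nodes[j], adj[i], adj[j]
        merges.append((i, j, next_id))
        next_id += 1
    return merges

rng = np.random.default_rng(1)
n = 3
feats = [rng.normal(size=n) for _ in range(3)]      # 3 superpixels in a row
adjacency = {0: {1}, 1: {0, 2}, 2: {1}}
W_recur = rng.normal(size=(n, 2 * n))
W_score = rng.normal(size=n)
merges = greedy_parse(feats, adjacency, W_recur, np.zeros(n), W_score)
```

With three leaves there are always exactly two merges, and the last merge produces the root of the parse tree.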

Page 12

Training (1)

• Max-margin estimation
• Structured margin loss
  • Penalizes merging a segment with a segment of a different label before merging with all its neighbors of the same label
  • Equals the number of subtrees not appearing in the correct trees
• Tree score
  • Sum of the merge scores over all non-leaf nodes
• Class label
  • Softmax over the node feature vector
• Correct trees
  • Adjacent nodes with the same label are merged first
  • One image may have more than one correct tree

Page 13

Training (2)

• Intuition: we want the score of the highest-scoring correct tree to be larger than that of any other tree by a margin
• Formulation
  • Margin: Δ(x_i, l_i, ŷ) = κ Σ_{d ∈ N(ŷ)} 1{subTree(d) ∉ Y(x_i, l_i)}
  • Loss: r_i(θ) = max_{ŷ ∈ T(x_i)} [ s(x_i, ŷ) + Δ(x_i, l_i, ŷ) ] − max_{y ∈ Y(x_i, l_i)} s(x_i, y)
  • J(θ) = (1/N) Σ_i r_i(θ) + (λ/2) ||θ||² is minimized
• Notation: θ is all model parameters; i is the index of a training image; x_i is the training image; l_i is the labels of x_i; Y(x_i, l_i) is the set of correct trees of x_i; T(x_i) is the set of all possible trees of x_i; s is the tree score function; d is a node in the parse tree; N(ŷ) is the set of nodes of ŷ
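The structured margin and hinge loss on this slide can be sketched for a toy case, representing each tree as the set of its subtree spans; the value of κ here is an arbitrary illustrative choice:

```python
KAPPA = 0.1   # illustrative penalty per incorrect subtree

def structured_margin(tree, correct_subtrees):
    """Delta: KAPPA times the number of subtrees absent from the correct trees."""
    return KAPPA * sum(1 for sub in tree if sub not in correct_subtrees)

def hinge_risk(scores, trees, correct_trees, correct_subtrees):
    """Max over all trees of (score + margin), minus the best correct-tree score."""
    worst = max(s + structured_margin(t, correct_subtrees)
                for s, t in zip(scores, trees))
    best_correct = max(s for s, t in zip(scores, trees) if t in correct_trees)
    return worst - best_correct

# Three segments 0, 1, 2: the "correct" tree merges 0 and 1 first.
t_good = frozenset({(0, 1), (0, 1, 2)})
t_bad = frozenset({(1, 2), (0, 1, 2)})
trees = [t_good, t_bad]
correct_subtrees = {(0, 1), (0, 1, 2)}
risk = hinge_risk([2.0, 1.5], trees, {t_good}, correct_subtrees)
```

When the correct tree outscores every alternative by at least the margin, the risk is zero; if the wrong tree wins, the risk grows with both the score gap and the number of incorrect subtrees.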

Page 14

Training (3)

• The label of a node is predicted by a softmax over its feature vector
• The margin is not differentiable
  • Therefore only a subgradient is computed
• The subgradient is obtained by back-propagation
• The gradient of the label prediction is also obtained by back-propagation

[Figure: c1 and c2 combine through W_recur into h12; W_score produces the merge score from h12, and W_label produces the label]
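The softmax label prediction on this slide, as a small numpy sketch (the dimensions and random weights are illustrative):

```python
import numpy as np

def predict_label(h, W_label):
    """Class distribution for a node: softmax over W_label @ h."""
    z = W_label @ h
    z = z - z.max()          # shift logits for numerical stability
    p = np.exp(z)
    return p / p.sum()

rng = np.random.default_rng(0)
n, num_classes = 4, 3
h = rng.normal(size=n)                    # a node's feature vector
W_label = rng.normal(size=(num_classes, n))
p = predict_label(h, W_label)
```

For cross-entropy training, the gradient with respect to the logits is simply p minus the one-hot target, which is where back-propagation starts at each labeled node.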

Page 15

Language Parsing

• Language parsing is similar to scene parsing
• Differences
  • Input is a natural-language sentence
  • Adjacency is strictly left and right
  • Class labels are syntactic classes
    • Word level
    • Phrase level
    • Clause level
  • Each sentence has only one correct tree

Page 16

Experiments Overview

• Image
  • Scene segmentation and annotation
  • Scene classification
  • Nearest-neighbor scene subtrees
• Language
  • Supervised language parsing
  • Nearest-neighbor phrases

Page 17

Scene Segmentation and Annotation

• Dataset
  • Stanford Background Dataset
• Task
  • Segment and label foreground and different types of background, pixelwise
• Result
  • 78.1% pixelwise accuracy
  • 0.6% above the previous state of the art

Page 18

Scene Classification

• Dataset
  • Stanford Background Dataset
• Task
  • Three classes: city, countryside, sea-side
• Method
  • Feature: average of all node features, or the top-node feature only
  • Classifier: linear SVM
• Result
  • 88.1% accuracy with the average feature
    • 4.1% above Gist, the state-of-the-art feature
  • 71.0% accuracy with the top feature
• Discussion
  • The learned RNN feature better captures the semantic information of the scene
  • The top feature loses some lower-level information

Page 19

Nearest Neighbor Scene Subtrees

• Dataset
  • Stanford Background Dataset
• Task
  • Retrieve similar segments from all images
  • A subtree whose nodes all have the same label corresponds to a segment
• Method
  • Feature: top-node feature of the subtree
  • Metric: Euclidean distance
• Result
  • Similar segments are retrieved
• Discussion
  • The RNN feature captures segment-level characteristics

Page 20

Supervised Language Parsing

• Dataset
  • Penn Treebank, Wall Street Journal section
• Task
  • Generate a parse tree with labeled nodes
• Result
  • Unlabeled bracketing F-measure of 90.29%, comparable to the 91.63% of the Berkeley Parser

Page 21

Nearest Neighbor Phrases

• Dataset
  • Penn Treebank, Wall Street Journal section
• Task
  • Retrieve the nearest neighbor of a given sentence
• Method
  • Feature: top-node feature
  • Metric: Euclidean distance
• Result
  • Similar sentences are retrieved

Page 22

Discussion

• Understanding the semantic structure of data is essential for applications like fine-grained search or captioning
• Recursive NN predicts the tree structure along with node labels in an elegant way
• Recursive NN can be incorporated with a CNN
• If we can jointly learn Recursive NN with