Lecture 10: Learning in Stochastic Image Grammar


Transcript of Lecture 10: Learning in Stochastic Image Grammar

Page 1

Lecture 10: Learning in Stochastic Image Grammar Unsupervised learning and learning rate

Song-Chun Zhu

Center for Vision, Cognition, Learning and Arts, University of California, Los Angeles

At CVPR, Providence, Rhode Island, June 16, 2012

SIG-12: Tutorial on Stochastic Image Grammar

Page 2

Two objectives of learning

1, Explaining the data
--- Generative approaches
--- Hierarchy, dictionary
--- Max. likelihood, min. KL-divergence

2, Minimizing the loss --- driven by the task
--- Discriminative approaches
--- Long features
--- Max margin, min. loss

Both have their merits. In this lecture, we focus on learning hierarchical grammatical models to explain the data. One can then boost performance with a structured SVM once the task and loss functions are known.

Page 3

Outline of this lecture

1, Learning And-Or Trees from a data matrix
   case study 0: toy example
   case study 1: object S-AOT
   case study 2: event T-AOT
   case study 3: causal C-AOT (in Lecture 8)
2, A unifying principle: learning by information projection
   --- pursuit of stochastic sets in the image universe
3, PAC-learning: the learning rate of AOG models

Page 4

1, Case study 0: a toy example

Suppose we design a 2-layer And-Or graph (rewritten as a tree) which generates many examples. Can we recover this AOG by unsupervised learning? What are the key factors affecting learning accuracy?

Page 5

Phrase instances sampled from the AoG. Noisy letters of random length s are inserted between words.

Page 6

Issue: large redundancy and ambiguity due to the partition problem at this level.

Top 3-letter or 4-letter words and their frequencies.

Method: we slide windows of length 3 or 4 over the text and count the word frequencies (2nd column), hoping that the background/noise will be averaged out while the foreground/signal pops out. The 3rd column is the information gain measure used in information projection (discussed later).

We treat them as our first-level dictionary ∆(1).
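As a rough sketch of this counting step (the toy sentences and window lengths below are illustrative assumptions, not the lecture's actual data), the sliding-window count can be written as:

```python
from collections import Counter

def count_ngrams(sentences, lengths=(3, 4)):
    """Slide windows of the given lengths over each sentence and count how
    often every substring (candidate word) appears."""
    counts = Counter()
    for s in sentences:
        for k in lengths:
            for i in range(len(s) - k + 1):
                counts[s[i:i + k]] += 1
    return counts

# Toy illustration: true words 'cat' and 'dog' separated by random noise letters.
sentences = ["xqcatzzdogp", "catrdogww", "mcatvdogk"]
print(count_ngrams(sentences).most_common(5))
# Frequent windows such as 'cat' and 'dog' rise above the noise background.
```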

Page 7

Top 6-letter to 8-letter words and their frequencies

The top few words recover the true words in the generating grammar.

We replace the letter sequences in ∆(1) by symbols and thus shorten the sentences. Then we slide windows of length 2 over the shortened sentences to obtain the word counts.
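A minimal sketch of this second level, continuing the toy example above (the placeholder symbols 'A', 'B' and the helper substitute are illustrative):

```python
def substitute(sentence, dictionary):
    """Replace every occurrence of a level-1 word by a single placeholder
    symbol (A, B, ...), thus shortening the sentence."""
    symbols = {w: chr(ord('A') + i) for i, w in enumerate(dictionary)}
    for word, sym in symbols.items():
        sentence = sentence.replace(word, sym)
    return sentence

shortened = [substitute(s, ["cat", "dog"]) for s in ["xqcatzzdogp", "catrdogww"]]
print(shortened)   # ['xqAzzBp', 'ArBww']
# Sliding windows of length 2 over these shortened sentences (as in
# count_ngrams above, with lengths=(2,)) then counts frequent symbol pairs,
# which form the second-level dictionary.
```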

Page 8

Two graph compression operators to regularize the AOT

1, Lossless compression. Should always perform this operation.

2, Lossy compression. Should perform this operation only when the loss is small.
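The loss criterion for a lossy merge is not spelled out here; as one plausible way to score such a merge (an assumption for illustration, not the lecture's exact measure), two OR nodes with branch-probability vectors over the same children can be compared against their usage-weighted merged distribution:

```python
import numpy as np

def merge_loss(p1, p2, w1=1.0, w2=1.0):
    """Score a candidate lossy merge of two OR nodes.
    p1, p2: branch probabilities over the same children; w1, w2: usage counts.
    The merged node uses the usage-weighted average distribution; the loss is
    the expected KL divergence from the originals to the merged distribution."""
    p1, p2 = np.asarray(p1, float), np.asarray(p2, float)
    merged = (w1 * p1 + w2 * p2) / (w1 + w2)
    kl = lambda p, q: float(np.sum(p * np.log((p + 1e-12) / (q + 1e-12))))
    return (w1 * kl(p1, merged) + w2 * kl(p2, merged)) / (w1 + w2)

# Merge only when the loss is small, e.g.:
print(merge_loss([2/3, 1/3, 0.0], [2/5, 2/5, 1/5], w1=3, w2=5))
```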

Page 9

Two graph compression operators to regularize the AOT

[Figure: the underlying AoT, the learned AoT after the lossless compression, and after the lossy compression --- a successful recovery.]

Page 10

The recovered grammar! We continue this process from ∆(1) to ∆(2) and ∆(3); then we work downward to remove the redundancy with the two operators, obtaining the AOT below.

Ref: Si and Zhu, “Learning And-Or Templates for Object Recognition and Detection,” 2011-12.

Page 11

Key factors affecting the learning accuracy

[Figure: ROC curves comparing the learned dictionary vs. the underlying (true) dictionary, and AUC as a function of the separation parameter s and the sample number n.]

Ref: Si and Zhu, “Learning And-Or Templates for Object Recognition and Detection,” 2011-12.

Page 12

Evaluation

Dis(AoT1, AoT2) = ?

How to estimate the distance between two AND-OR templates?

Estimated using the Monte Carlo method, i.e., integration by importance sampling:
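As a sketch of the kind of estimator meant here (assuming the distance is a KL divergence between the two template models p1 and p2, with q an arbitrary proposal distribution; the slide's exact definition may differ):

```latex
% Sketch: KL distance between two AoT models, estimated by importance sampling.
\mathrm{Dis}(\mathrm{AoT}_1,\mathrm{AoT}_2)
  \;=\; \mathrm{KL}\big(p_1 \,\|\, p_2\big)
  \;\approx\; \frac{1}{n}\sum_{i=1}^{n} \frac{p_1(x_i)}{q(x_i)}
  \Big[\log p_1(x_i) - \log p_2(x_i)\Big],
  \qquad x_i \sim q .
```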

Page 13

Complexity parameter α: controlling model complexity in graph compression.

To investigate the effect of the parameter α and the training sample size n on model generalizability, we perform repeated cross-validations. The result is shown in the figure (left). The horizontal axis is the logarithm of α, which is sampled at seven points (10^-6, 10^-3, 10^-2, 0.1, 0.2, 0.5, 0.8), and the vertical axis is the distance between the learned model AoT_α and the true model AoT*.

The right figure shows at what sample sizes n and α values we can successfully recover the generating grammar.
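A skeletal version of this experiment (learn_aot, aot_distance, and sample_from are hypothetical stand-ins for the learning procedure, the Monte-Carlo distance above, and sampling from the true grammar; the sample sizes are illustrative):

```python
import numpy as np

def cross_validate(true_aot, learn_aot, aot_distance, sample_from,
                   alphas=(1e-6, 1e-3, 1e-2, 0.1, 0.2, 0.5, 0.8),
                   sample_sizes=(10, 50, 100, 500), n_repeats=5):
    """For each (alpha, n), repeatedly sample n training examples from the
    true model, learn an AoT with complexity parameter alpha, and record the
    average distance of the learned model to the true one."""
    results = {}
    for n in sample_sizes:
        for alpha in alphas:
            dists = [aot_distance(learn_aot(sample_from(true_aot, n), alpha),
                                  true_aot)
                     for _ in range(n_repeats)]
            results[(alpha, n)] = float(np.mean(dists))
    return results
```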

Evaluation

Page 14

[Figure: data matrix of real-valued entries in [0, 1].]

Case study 1: Learning AOT for objects

Dictionary of atomic elements (sketch, texture, flatness, color) D

Page 15

Recursive block pursuit
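A greedy sketch of a single block-pursuit step, assuming the data matrix M holds real-valued responses in [0, 1] with rows as training examples and columns as dictionary elements (the block size and the response threshold are illustrative parameters):

```python
import numpy as np

def pursue_block(M, n_cols=5, min_response=0.5):
    """Greedily grow one block: a subset of rows (examples) and columns
    (elements) with consistently high responses. Each step picks the column
    with the largest total response over the active rows, then keeps only the
    rows that respond strongly to it."""
    rows = np.arange(M.shape[0])          # start with every example active
    cols = []
    for _ in range(n_cols):
        scores = M[rows].sum(axis=0)
        if cols:
            scores[cols] = -np.inf        # never reselect a chosen column
        best = int(np.argmax(scores))
        cols.append(best)
        rows = rows[M[rows, best] > min_response]
    return rows, cols

# Recursive block pursuit: after a block is extracted, suppress its explained
# entries in M and repeat the procedure to pursue the next block.
```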


Page 16

Block pursuit: geometric OR nodes

Initially collected patches

Transformed patches

Learned template. One example: a bear's ear.

There are other hidden variables besides the rows of the block: the geometric transformation of the block template when it is matched to each image instance.

Geometric OR
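A tiny sketch of the geometric OR idea, with 90-degree rotations standing in for the real transformation set (translation, rotation, scaling) as an illustrative assumption:

```python
import numpy as np

def best_transform(patch, template):
    """Treat a small set of transformations as a geometric OR node: the hidden
    variable is the rotation under which the patch best matches the template
    (square arrays assumed; the score is a simple correlation)."""
    candidates = [np.rot90(patch, k) for k in range(4)]
    scores = [float((c * template).sum()) for c in candidates]
    k = int(np.argmax(scores))
    return k, scores[k]   # chosen transform index and its matching score
```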

Page 17

Cases of Graph Compression

[Figure: And-Or graph fragments before and after compression, annotated with branching probabilities (e.g. 1/3, 2/3, 2/5, 3/5) and frequency weights.]

Page 18

Learned And-Or templates for animal faces

Ref: Si and Zhu, “Learning And-Or Templates for Object Recognition and Detection,” 2011-12.

Page 19

Example: learned horse AOT

AND-OR Template for articulated horses.

Page 20

Example: learned egret AOT in comparison to the deformable parts model

Page 21

Going beyond classification and detection: a comparison on locating key points using the learned parts.

Page 22

Some neurons in the pre-motor area encode actions

Mirror neurons: firing when performing an action or seeing other people perform the same action.

Case study 2: Unsupervised learning of actions/events

Page 23

Some of the atomic actions learned by pursuing the co-occurrence of relations.

We ground actions as spatio-temporal relations between body parts and objects in the scene.

Ref: M.T. Pei, Z.Z. Si, B. Yao, and S.C. Zhu, "Video Event Parsing and Learning with Goal and Intent Prediction," 2012.
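As a sketch of how co-occurring relations might be counted (the relation names and thresholds are made up for illustration; each frame is assumed to be summarized by the set of relations that hold in it):

```python
from collections import Counter
from itertools import combinations

def frequent_relation_sets(frames, max_size=3, min_count=5):
    """frames: a list of sets, each containing the spatio-temporal relations
    (e.g. 'hand-near-cup') that hold in one frame. Count how often every small
    subset of relations holds simultaneously; frequent subsets are candidate
    atomic actions."""
    counts = Counter()
    for rels in frames:
        for k in range(2, max_size + 1):
            for combo in combinations(sorted(rels), k):
                counts[combo] += 1
    return [(c, n) for c, n in counts.most_common() if n >= min_count]
```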

Page 24

Block for temporal co-occurrence patterns

Ref: Si, Pei, et al., ICCV 2011.

Page 25

Learned And-Or graph for events in office scenes.

Page 26

2, The principle of information projection --- pursuit of stochastic sets in the universe of images

Recall that in Lecture 2 we discussed that visual concepts can be defined as three types of stochastic sets:

• High-dimensional ensembles, by statistical physics and MRF models;
• Low-dimensional subspaces or manifolds, in sparse coding;
• Language, by grammars that compose the above primitive spaces.

Therefore, the objective of generative learning is to discover these stochastic sets in the vast universe of images.

Page 27

Pursuit of stochastic sets in the universe of images

f: target distribution; p: our model; q: initial model.

The image universe: every point is an image; a model ~ an image set ~ a manifold ~ a cluster.

Two canonical cases: 1) q = unif(): push, for texture; 2) q = δ(): pull, for texton.

Page 28

Model pursuit by information projection

Given only positive examples from a class c, we pursue a series of models p to approach an underlying "true" probability f. At each step, we augment the current model p to a new model, subject to a projection constraint.
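In symbols, a sketch consistent with the standard information-projection formulation (the statistic h_t and the notation are assumptions; the slide's own symbols may differ): each step multiplies the current model by an exponential factor of the newly selected statistic, and the constraint matches its expectation under the new model to that observed under f:

```latex
% Sketch of one model-augmentation step with its projection constraint.
p_{t+1}(x) \;=\; \frac{1}{z_t}\, p_t(x)\, \exp\big\{ \lambda_t\, h_t(x) \big\},
\qquad \text{s.t.} \quad
\mathbb{E}_{p_{t+1}}\!\big[ h_t(x) \big] \;=\; \mathbb{E}_{f}\!\big[ h_t(x) \big].
```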

Page 29

Manifold pursuit by information projection

Solving the constrained optimization problem leads to the Euler-Lagrange equation for the augmented model. For q being a uniform distribution, this reduces to a maximum-entropy model.
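A sketch of the usual solution form (again with assumed notation): the constrained optimization yields an exponential-family tilt of q, and when q is uniform this is a pure maximum-entropy model:

```latex
% Sketch of the Euler-Lagrange solution of the projection step.
p(x) \;=\; \frac{1}{Z(\lambda)}\, q(x)\,
  \exp\Big\{ \textstyle\sum_{i} \lambda_i\, h_i(x) \Big\},
\qquad
Z(\lambda) \;=\; \int q(x)\,
  \exp\Big\{ \textstyle\sum_{i} \lambda_i\, h_i(x) \Big\}\, dx ;
% for uniform q, p(x) \propto \exp\{ \sum_i \lambda_i h_i(x) \}.
```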

Page 30

Information projection (Della Pietra, Della Pietra, and Lafferty, 1997; Zhu, Wu, and Mumford, 1997)

We pursue a series of models p to approach an underlying "true" probability f.

It converges monotonically as long as we have a sufficient set of features to choose from, just as AdaBoost converges as long as effective weak classifiers are available.

Page 31

Information projection

The algorithm iterates two steps:

max-step: choosing a distinct statistic to augment the structure;
min-step: computing the parameter given the selected feature constraint.

The key issue: what and where to look?
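One compact way to write the two steps (a sketch; the slide's exact objective may differ): the max-step picks the statistic with the largest information gain, which by the Pythagorean identity of information projection equals the KL divergence between the new and the old model, and the min-step fits the parameter by matching expectations, i.e. minimizing KL(f || p_{t+1}) over λ:

```latex
% Sketch of the iterated max/min steps.
\text{max-step:}\quad
  h_t \;=\; \arg\max_{h \in \Delta}\;
  \Big[ \mathrm{KL}\big(f \,\|\, p_t\big) - \mathrm{KL}\big(f \,\|\, p_{t+1}^{(h)}\big) \Big]
  \;=\; \arg\max_{h \in \Delta}\; \mathrm{KL}\big(p_{t+1}^{(h)} \,\|\, p_t\big),
\\[4pt]
\text{min-step:}\quad
  \lambda_t \;=\; \arg\min_{\lambda}\; \mathrm{KL}\big(f \,\|\, p_{t+1}\big)
  \;\;\Longleftrightarrow\;\;
  \mathbb{E}_{p_{t+1}}\!\big[h_t(x)\big] \;=\; \mathbb{E}_{f}\!\big[h_t(x)\big].
```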

Page 32

Structure learning is like an IQ test

Deep issues: tabula rasa learning? Nature vs. nurture?

Ref: Susan Carey, The Origin of Concepts, Oxford Univ. Press, 2009.

What and where do you look?

Page 33

3, Theoretical topics in the space of AOGs

This is a theoretical topic of ongoing research by Prof. Adrian Barbu at Florida State Univ. (a former student of mine) and Maria Pavlovskaia (a current student). If we believe the AOG is the proper mathematical model, then various objects, scenes, and events are instances in the space of AOGs. Studying the properties of this space (the hypothesis or model space) will lead to answers to many fundamental questions, such as:

1, The learning rate --- how many examples are needed to learn a concept;
2, Learnability and identifiability;
3, Transfer, curriculum, and analogical learning;
4, Regimes of models/patterns in the space; this will explain why certain algorithms, such as AdaBoost, SVMs, and decision trees, work well in one regime but fail in others.
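For flavor, the classical PAC bound for a finite hypothesis class shows what a "learning rate" statement looks like; applying such bounds to the space of AOGs (counting or covering that space) is exactly the kind of question above. This is the standard realizable-case result, not specific to AOGs:

```latex
% Classical realizable-case PAC bound for a finite hypothesis class H:
n \;\ge\; \frac{1}{\epsilon}\Big( \ln |H| + \ln \tfrac{1}{\delta} \Big)
\quad\Longrightarrow\quad
\Pr\big[\, \mathrm{err}(\hat{h}) > \epsilon \,\big] \;\le\; \delta ,
% where \hat{h} is any hypothesis consistent with the n training examples.
```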