Page 1:

One-Shot Learning with a Hierarchical Nonparametric Bayesian Model

R. Salakhutdinov, J. Tenenbaum and A. Torralba, MIT Technical Report, 2010

Presented by Esther Salazar, Duke University

June 10, 2011


Page 2:

Summary

Contribution: The authors develop a hierarchical nonparametric Bayesian model for learning a novel category based on a single training example.

The model discovers how to group categories into meaningful super-categories that express different priors for new classes.

Given a single example of a novel category, the model is able to quickly infer which super-category the new basic-level category should belong to.

Page 3:

Motivation

Categorizing an object requires information about the category’s mean and variance in an appropriate feature space.

Similarity-based approach:

- the mean represents the category prototype
- the inverse variances (precisions) correspond to the dimensional weights in a category-specific similarity metric

In this context, one-shot learning is not possible because a single example can provide information about the mean but not about the variances (i.e. the similarity metric).

Proposal: The proposed model leverages higher-order knowledge, abstracted from previously learned categories, to estimate the new category’s prototype (mean) as well as an appropriate similarity metric (variances) from just a single example (or a few).


Page 4:

Hierarchical Bayesian model

Consider observing a set of N input feature vectors

{x_1, . . . , x_N},  x_n ∈ R^D

Features are derived from high-dimensional data such as images (e.g. D = 50,000).

Assumption: fixed two-level category hierarchy.

Suppose that the N objects are partitioned into C basic-level (level-1) categories. That partition is represented by a vector z^b = (z^b_1, . . . , z^b_N) such that z^b_n ∈ {1, . . . , C}.

The C basic-level categories are partitioned into K super-categories (level-2), represented by z^s = (z^s_1, . . . , z^s_C) such that z^s_c ∈ {1, . . . , K}.
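As a small illustration of this two-level assignment representation (an assumed sketch with 0-based indices and toy sizes, not taken from the paper):

import numpy as np

N, C, K = 6, 3, 2                    # objects, basic-level categories, super-categories (toy sizes)
zb = np.array([0, 0, 1, 1, 2, 2])    # z^b: basic-level category of each object
zs = np.array([0, 0, 1])             # z^s: super-category of each basic-level category

# Composing the two assignments gives each object's super-category:
super_of_object = zs[zb]             # array([0, 0, 0, 0, 1, 1])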


Page 5:

For any basic-level category c, the distribution over the observed feature vectors is

P(x_n \mid z^b_n = c, \theta^1) = \prod_{d=1}^{D} \mathcal{N}\!\left(x_{nd} \mid \mu^c_d,\ 1/\tau^c_d\right) \qquad (1)

where θ^1 = {µ^c, τ^c}_{c=1}^C denotes the level-1 category parameters.

Let k = z^s_c, i.e. level-1 category c belongs to level-2 category k, and let θ^2 = {µ^k, τ^k, α^k}_{k=1}^K denote the level-2 parameters.

P(\mu^c, \tau^c \mid \theta^2, z^s) = \prod_{d=1}^{D} P(\mu^c_d, \tau^c_d \mid \theta^2, z^s)

P(\mu^c_d, \tau^c_d \mid \theta^2) = P(\mu^c_d \mid \tau^c_d, \theta^2)\, P(\tau^c_d \mid \theta^2)
                                   = \mathcal{N}\!\left(\mu^c_d \mid \mu^k_d,\ 1/(\nu\tau^c_d)\right) \Gamma\!\left(\tau^c_d \mid \alpha^k_d,\ \alpha^k_d/\tau^k_d\right) \qquad (2)
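Read generatively, Eqs. (1)-(2) say that a basic-level category first draws per-dimension precisions and means around its super-category's parameters, and then generates feature vectors. Below is a minimal sketch of this sampling process, assuming Γ(·|a, b) is parameterized by shape and rate (which is what gives E[τ^c] = τ^k) and using hypothetical level-2 values:

import numpy as np

rng = np.random.default_rng(0)
D = 5                                      # toy feature dimensionality
mu_k = np.zeros(D)                         # level-2 mean µ^k
tau_k = np.ones(D)                         # level-2 precision τ^k
alpha_k = 2.0 * np.ones(D)                 # level-2 shape α^k
nu = 1.0                                   # scaling constant ν (assumed value)

# Eq. (2): τ^c_d ~ Γ(α^k_d, α^k_d/τ^k_d), rate α^k_d/τ^k_d, i.e. numpy scale τ^k_d/α^k_d
tau_c = rng.gamma(shape=alpha_k, scale=tau_k / alpha_k)
# Eq. (2): µ^c_d ~ N(µ^k_d, 1/(ν τ^c_d))
mu_c = rng.normal(loc=mu_k, scale=np.sqrt(1.0 / (nu * tau_c)))

# Eq. (1): observations of category c are diagonal Gaussians around µ^c with precisions τ^c
x = rng.normal(loc=mu_c, scale=np.sqrt(1.0 / tau_c), size=(10, D))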

Page 6:

Recall, from Eq. (2),

P(\mu^c_d, \tau^c_d \mid \theta^2) = P(\mu^c_d \mid \tau^c_d, \theta^2)\, P(\tau^c_d \mid \theta^2) = \mathcal{N}\!\left(\mu^c_d \mid \mu^k_d,\ 1/(\nu\tau^c_d)\right) \Gamma\!\left(\tau^c_d \mid \alpha^k_d,\ \alpha^k_d/\tau^k_d\right)

It can easily be derived that

E[µ^c] = µ^k,  E[τ^c] = τ^k

That is, the expected values of the level-1 parameters θ^1 are given by the corresponding level-2 parameters θ^2.

The parameter α^k controls the variability of τ^c around its mean.
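A short check of these statements, assuming the Gamma above is written in shape-rate form (the parameterization that yields the stated means):

\begin{aligned}
\tau^c_d \sim \Gamma\!\left(\alpha^k_d,\ \alpha^k_d/\tau^k_d\right)
  \;&\Rightarrow\;
  E[\tau^c_d] = \frac{\alpha^k_d}{\alpha^k_d/\tau^k_d} = \tau^k_d,
  \qquad
  \operatorname{Var}[\tau^c_d] = \frac{\alpha^k_d}{(\alpha^k_d/\tau^k_d)^2} = \frac{(\tau^k_d)^2}{\alpha^k_d},\\
\mu^c_d \mid \tau^c_d \sim \mathcal{N}\!\left(\mu^k_d,\ 1/(\nu\tau^c_d)\right)
  \;&\Rightarrow\;
  E[\mu^c_d] = E\!\left[\,E[\mu^c_d \mid \tau^c_d]\,\right] = \mu^k_d.
\end{aligned}

In particular, the variance of τ^c_d shrinks as α^k_d grows, which is exactly the sense in which α^k controls the spread of τ^c around τ^k.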

For the level-2 parameters θ^2, the authors place analogous priors; the specific forms are given in the paper.


Page 7:

Modeling the number of super-categories

The model is presented with a two-level partition z = {z^s, z^b} that assumes a fixed hierarchy for sharing parameters.

Aim: to infer the distribution over possible category structures z.

Proposal: The authors place a nonparametric two-level nested Chinese Restaurant Process (nCRP) prior (Blei et al. 2003, 2010) over z.

The nCRP(γ) extends the CRP to a nested sequence of partitions, one for each level of the tree (with prior γ ∼ Γ(1, 1)).

In this case, each observation n is first assigned to a super-category z^s_n using Eq. 4 (a schematic CRP-style sketch of this assignment rule is given below).

Its assignment to a basic-level category z^b_n (under the super-category z^s_n) is again drawn recursively from Eq. 4.

The full generative model is shown in the right panel of Fig. 1 of the paper.
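Eq. 4 itself is not reproduced in this transcript; the sketch below uses the standard CRP assignment rule (weight proportional to current cluster sizes, with weight γ for a new cluster), which is the rule the nCRP applies recursively at each level of the tree. Function name and example values are illustrative only:

import numpy as np

def crp_assignment_probs(counts, gamma):
    # Probability of joining each existing cluster (proportional to its size)
    # or of starting a new cluster (proportional to the concentration gamma).
    weights = np.append(np.asarray(counts, dtype=float), gamma)
    return weights / weights.sum()

# e.g. three existing super-categories with 5, 2 and 1 members and γ = 1:
crp_assignment_probs([5, 2, 1], gamma=1.0)   # -> [5/9, 2/9, 1/9, 1/9]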


Page 8:

Inference

Sampling level-1 parameters: given θ^2 and z, the conditional distribution P(µ^c, τ^c | θ^2, z, x) is also Normal-Gamma.

Sampling level-2 parameters: given z and θ^1, the conditional distributions over the mean µ^k and the precision τ^k take Gaussian and Inverse-Gamma forms; α^k is sampled from its conditional distribution as well.

Sampling assignments z: given the model parameters θ = {θ^1, θ^2}, the posterior over the assignment z_n is obtained by combining the likelihood term with the nCRP(γ) prior; a schematic version of this step is sketched below.
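A schematic version of that assignment step for a single observation, assuming diagonal-Gaussian likelihoods as in Eq. (1) and a single-level CRP-style prior (the actual prior is the nested nCRP, and the new-category option should integrate over Eq. (2); both simplifications are only for illustration):

import numpy as np

def log_gaussian_diag(x, mu, tau):
    # log N(x | mu, 1/tau), summed over dimensions, with diagonal precisions tau
    return 0.5 * np.sum(np.log(tau) - np.log(2 * np.pi) - tau * (x - mu) ** 2)

def sample_assignment(x_n, mus, taus, counts, gamma, rng):
    # Combine the data likelihood of each existing category with its CRP weight;
    # the "new category" option gets weight gamma (its likelihood term is omitted here).
    log_w = [np.log(c) + log_gaussian_diag(x_n, m, t)
             for c, m, t in zip(counts, mus, taus)]
    log_w.append(np.log(gamma))
    log_w = np.asarray(log_w)
    w = np.exp(log_w - log_w.max())
    return rng.choice(len(w), p=w / w.sum())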


Page 9:

One-shot learning

Consider observing a single new instance x∗ of a novel category c∗

Conditioned on the level-2 parameters θ^2 and the tree structure z:

- Compute the posterior distribution over the assignment z*_c using Eq. 5 (i.e. which super-category the novel category belongs to).
- The new category can either be placed under one of the existing super-categories or create its own super-category.

Given an inferred assignment z*_c, and using Eq. 2, we can infer the posterior mean and precision terms {µ*, τ*} for the novel category.

Finally, the conditional probability that a new test input x_t belongs to the novel category c* is obtained from the likelihood of x_t under the inferred parameters {µ*, τ*}, combined with the prior over categories (the exact expression is given in the paper); a sketch of this classification step is given below.
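A minimal sketch of this classification step, assuming the novel category is summarized by point estimates mu_star, tau_star plugged into the diagonal-Gaussian likelihood of Eq. (1) and a uniform prior over candidate categories (the paper's expression integrates over the posterior; all names here are hypothetical):

import numpy as np

def predict_category(x_t, mus, taus):
    # Posterior probabilities over categories for a test input x_t, where mus/taus
    # hold each category's mean and diagonal precisions (including the novel one
    # inferred from a single example).
    log_p = np.array([
        0.5 * np.sum(np.log(tau) - np.log(2 * np.pi) - tau * (x_t - mu) ** 2)
        for mu, tau in zip(mus, taus)
    ])
    p = np.exp(log_p - log_p.max())      # uniform prior over categories assumed
    return p / p.sum()

# e.g. predict_category(x_t, mus=[mu_star, mu_cow, mu_sheep], taus=[tau_star, tau_cow, tau_sheep])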


Page 10:

Experimental results

The authors present experimental results on the MNIST handwritten digit and MSR Cambridge object recognition image datasets.

The proposed model was compared with the following baseline models (a short sketch contrasting the Euclidean and learned metrics is given after the list):

Euclidean: uses the Euclidean metric, i.e. all precision terms are set to one and are never updated

HB-Flat: always uses a single super-category

HB-Var: is based on clustering only the covariance matrices, without taking into account the means of the super-categories

MLE: ignores hierarchical Bayes altogether and estimates a category-specific mean and precision from sample averages

Oracle: a model that always uses the correct, instead of the inferred, similarity metric
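To make the role of the similarity metric concrete, here is an assumed sketch (not code from the paper) of how a test point could be scored against a category prototype: the Euclidean baseline weights every dimension equally, whereas the hierarchical model weights dimensions by the inferred precisions:

import numpy as np

def squared_distance(x, mu, tau=None):
    # tau=None reproduces the plain Euclidean metric (all weights equal to one);
    # otherwise the per-dimension precisions tau act as learned dimension weights.
    tau = np.ones_like(mu) if tau is None else tau
    return np.sum(tau * (x - mu) ** 2)

# Euclidean baseline:            squared_distance(x, mu_c)
# Learned (category-specific):   squared_distance(x, mu_c, tau_c)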


Page 11:
Page 12:

MSR Cambridge Dataset

The MSR Cambridge dataset contains images of 24 different categories.

The figure shows the 24 basic-level categories along with a typical partition that the model discovers, where many super-categories contain semantically similar basic-level categories.

