One-Shot Learning with a Hierarchical
Nonparametric Bayesian Model
R. Salakhutdinov, J. Tenenbaum and A. Torralba
MIT Technical Report, 2010
Presented by Esther Salazar
Duke University
June 10, 2011
E. Salazar (Reading group) June 10, 2011 1 / 13
Summary
Contribution: The authors develop a hierarchical nonparametric Bayesian model for learning a novel category from a single training example.
The model discovers how to group categories into meaningful super-categories that express different priors for new classes.
Given a single example of a novel category, the model can quickly infer which super-category the new basic-level category should belong to.
Motivation
Categorizing an object requires information about the category’s mean and variance in an appropriate feature space.
Similarity-based approach:
- the mean represents the category prototype
- the inverse variances (precisions) correspond to the dimensional weights in a category-specific similarity metric
In this context, one-shot learning is not possible: a single example can provide information about the mean, but not about the variances (or similarity metric).
Proposal: The proposed model leverages higher-order knowledge abstracted from previously learned categories to estimate the new category’s prototype (mean) as well as an appropriate similarity metric (variances) from just one example (or a few).
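As a rough sketch of the similarity-based view above (all values hypothetical): the prototype is the category mean, and each feature dimension is weighted by its precision.

```python
import numpy as np

# Hypothetical illustration: a category-specific similarity metric where
# the prototype is the category mean and the dimensional weights are the
# inverse variances (precisions).
def similarity_distance(x, mu, tau):
    """Precision-weighted squared distance of x from prototype mu."""
    return np.sum(tau * (x - mu) ** 2)

mu = np.array([0.0, 1.0])    # category prototype (mean)
tau = np.array([4.0, 0.25])  # precisions: dimension 0 is weighted more
x = np.array([0.5, 2.0])
d = similarity_distance(x, mu, tau)  # 4*0.25 + 0.25*1.0 = 1.25
```

With only one example the mean can be set directly, but tau cannot be estimated; that is exactly the gap the hierarchical prior fills.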
Hierarchical Bayesian model
Consider observing a set of N input feature vectors
{x_1, . . . , x_N}, x_n ∈ R^D
Features are derived from high-dimensional data such as images (e.g. D = 50,000).
Assumption: fixed two-level category hierarchy.
Suppose that the N objects are partitioned into C basic-level (level-1) categories. That partition is represented by a vector z^b = (z^b_1, . . . , z^b_N) such that z^b_n ∈ {1, . . . , C}.
The C basic-level categories are partitioned into K super-categories (level-2), represented by z^s = (z^s_1, . . . , z^s_C) such that z^s_c ∈ {1, . . . , K}.
For any basic-level category c, the distribution over the observed feature vectors is

P(x_n | z^b_n = c, θ^1) = ∏_{d=1}^D N(x_{nd} | μ^c_d, 1/τ^c_d)    (1)

where θ^1 = {μ^c, τ^c}_{c=1}^C denotes the level-1 category parameters.
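A minimal sketch of the likelihood in Eq. 1 (hypothetical parameter values), treating each basic-level category as a diagonal Gaussian over the D feature dimensions:

```python
import numpy as np

# Diagonal-Gaussian log-likelihood of Eq. 1 for one category c:
# log P(x | z^b = c) = sum_d [ 0.5*log(tau_d/(2*pi)) - 0.5*tau_d*(x_d - mu_d)^2 ]
def log_lik(x, mu_c, tau_c):
    return np.sum(0.5 * np.log(tau_c / (2 * np.pi))
                  - 0.5 * tau_c * (x - mu_c) ** 2)

x = np.array([1.0, -0.5])       # a D=2 feature vector (hypothetical)
mu_c = np.array([1.0, 0.0])     # category mean
tau_c = np.array([2.0, 1.0])    # category precisions
ll = log_lik(x, mu_c, tau_c)
```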
Let k = z^s_c, i.e. level-1 category c belongs to level-2 category k, and let θ^2 = {μ^k, τ^k, α^k}_{k=1}^K denote the level-2 parameters. Then

P(μ^c, τ^c | θ^2, z^s) = ∏_{d=1}^D P(μ^c_d, τ^c_d | θ^2, z^s)

P(μ^c_d, τ^c_d | θ^2) = P(μ^c_d | τ^c_d, θ^2) P(τ^c_d | θ^2)    (2)
                      = N(μ^c_d | μ^k_d, 1/(ν τ^c_d)) Γ(τ^c_d | α^k_d, α^k_d / τ^k_d)
It can be easily derived that

E[μ^c] = μ^k,   E[τ^c] = τ^k

That means: the expected values of the level-1 parameters θ^1 are given by the corresponding level-2 parameters θ^2.
The parameter α^k controls the variability of τ^c around its mean.
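These expectations can be checked numerically. Below is a minimal sketch of the generative process in Eq. 2 for one dimension, with μ^k, τ^k, α^k, ν set to hypothetical values; the Monte Carlo averages recover E[μ^c] = μ^k and E[τ^c] = τ^k.

```python
import numpy as np

# Draw level-1 parameters from the level-2 Normal-Gamma prior of Eq. 2:
#   tau^c ~ Gamma(shape=alpha^k, rate=alpha^k/tau^k)  => E[tau^c] = tau^k
#   mu^c | tau^c ~ N(mu^k, 1/(nu * tau^c))            => E[mu^c]  = mu^k
rng = np.random.default_rng(0)
mu_k, tau_k, alpha_k, nu = 2.0, 3.0, 5.0, 1.0
n = 200_000

# NumPy's gamma takes a scale parameter, i.e. scale = 1/rate = tau_k/alpha_k
tau_c = rng.gamma(shape=alpha_k, scale=tau_k / alpha_k, size=n)
mu_c = rng.normal(loc=mu_k, scale=1.0 / np.sqrt(nu * tau_c))

print(mu_c.mean())   # close to mu_k = 2.0
print(tau_c.mean())  # close to tau_k = 3.0
```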
For the level-2 parameters θ^2, the authors assume:
Modeling the number of super-categories
The model is presented with a two-level partition z = {z^s, z^b} that assumes a fixed hierarchy for sharing parameters.
Aim: to infer the distribution over possible partitions z.
Proposal: the authors place a nonparametric two-level nested Chinese Restaurant Process prior (nCRP) (Blei et al. 2003, 2010) over z.
The nCRP(γ) extends the CRP to a nested sequence of partitions, one for each level of the tree (prior: γ ~ Γ(1, 1)).
In this case, each observation n is first assigned to the super-category z^s_n using Eq. 4.
Its assignment to the basic-level category z^b_n (under super-category z^s_n) is again recursively drawn from Eq. 4.
Full generative model in Fig. 1 right panel
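To illustrate the building block the nCRP nests at each level, here is a minimal sketch of a single CRP assignment step (the counts and γ are hypothetical):

```python
import numpy as np

# One Chinese Restaurant Process step: the next item joins existing
# group j with probability n_j/(n + gamma), or starts a new group with
# probability gamma/(n + gamma). The nCRP applies this draw at each
# level of the tree (super-category first, then basic-level category).
def crp_assign(counts, gamma, rng):
    """Sample a group index; index len(counts) means 'new group'."""
    n = sum(counts)
    probs = np.array(counts + [gamma]) / (n + gamma)
    return rng.choice(len(counts) + 1, p=probs)

rng = np.random.default_rng(1)
counts = [3, 1]  # two existing groups, sizes 3 and 1
k = crp_assign(counts, gamma=1.0, rng=rng)  # probabilities 0.6, 0.2, 0.2
```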
Inference
Sampling level-1 parameters: given θ^2 and z, the conditional distribution P(μ^c, τ^c | θ^2, z, x) is also Normal-Gamma.
Sampling level-2 parameters: given z and θ^1, the conditional distributions over the mean μ^k and precision τ^k take Gaussian and Inverse-Gamma forms; similarly for α^k.
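The Normal-Gamma conjugacy behind the level-1 step can be sketched with the standard conjugate update (hypothetical prior values; in the model, the prior parameters come from the assigned super-category via Eq. 2):

```python
import numpy as np

# Standard Normal-Gamma conjugate update for one dimension:
# prior  tau ~ Gamma(a0, rate=b0),  mu | tau ~ N(m0, 1/(nu0*tau))
# posterior given data x is again Normal-Gamma with the parameters below.
def normal_gamma_posterior(x, m0, nu0, a0, b0):
    n, xbar = len(x), np.mean(x)
    ss = np.sum((x - xbar) ** 2)
    nu_n = nu0 + n
    m_n = (nu0 * m0 + n * xbar) / nu_n
    a_n = a0 + n / 2
    b_n = b0 + 0.5 * ss + 0.5 * nu0 * n * (xbar - m0) ** 2 / nu_n
    return m_n, nu_n, a_n, b_n

x = np.array([1.0, 2.0, 3.0])  # observations assigned to one category
m_n, nu_n, a_n, b_n = normal_gamma_posterior(x, m0=0.0, nu0=1.0, a0=2.0, b0=1.0)
# m_n = 1.5, nu_n = 4.0, a_n = 3.5, b_n = 3.5
```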
Sampling assignments z: given the model parameters θ = {θ^1, θ^2}, combining the likelihood term with the nCRP(γ) prior, the posterior over the assignment z_n can be computed as follows:
One-shot learning
Consider observing a single new instance x^* of a novel category c^*.
Conditioned on the level-2 parameters θ^2 and the tree structure z:
- compute the posterior distribution over the assignments z^*_c using Eq. 5 (i.e. which super-category the novel category belongs to)
- the new category can either be placed under one of the existing super-categories or create its own super-category
Given an inferred assignment z^*_c and using Eq. 2, we can infer the posterior mean and precision terms {μ^*, τ^*} for the novel category.
Also, the conditional probability that a new test input x_t belongs to the novel category c^* is:
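As a rough illustration of this classification step (hypothetical categories; equal priors assumed, whereas the model’s full expression also involves the inferred hierarchy): score the test input under each category’s inferred diagonal Gaussian and normalize.

```python
import numpy as np

# Posterior over categories for a test input x_t, given each category's
# inferred prototype mu and precisions tau (equal priors assumed).
def classify(x_t, mus, taus):
    ll = np.array([np.sum(0.5 * np.log(t / (2 * np.pi))
                          - 0.5 * t * (x_t - m) ** 2)
                   for m, t in zip(mus, taus)])
    p = np.exp(ll - ll.max())  # subtract max for numerical stability
    return p / p.sum()

mus = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]   # two category prototypes
taus = [np.array([1.0, 1.0]), np.array([1.0, 1.0])]  # unit precisions
p = classify(np.array([2.5, 2.9]), mus, taus)
# nearly all the mass goes to the second category
```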
Experimental results
The authors present experimental results on the MNIST handwritten digit and MSR Cambridge object recognition image datasets.
The proposed model was compared with four baseline models and an oracle:
Euclidean: uses the Euclidean metric, i.e. all precision terms are set to one and never updated
HB-Flat: always uses a single super-category
HB-Var: is based on clustering only covariance matrices, without taking into account the means of the super-categories
MLE: ignores hierarchical Bayes altogether and estimates a category-specific mean and precision from sample averages
Oracle: always uses the correct, instead of the inferred, similarity metric
MSR Cambridge Dataset
The Cambridge Dataset contains images of 24 different categories.
The figure shows the 24 basic-level categories along with a typical partition that the model discovers, where many super-categories contain semantically similar basic-level categories.