One-Shot Learning with a Hierarchical
Nonparametric Bayesian Model
R. Salakhutdinov, J. Tenenbaum and A. Torralba
MIT Technical Report, 2010
Presented by Esther Salazar
Duke University
June 10, 2011
E. Salazar (Reading group) June 10, 2011 1 / 13
Summary
Contribution: The authors develop a hierarchical nonparametric Bayesian model for learning a novel category from a single training example.
The model discovers how to group categories into meaningful super-categories that express different priors for new classes.
Given a single example of a novel category, the model can quickly infer which super-category the new basic-level category should belong to.
Motivation
Categorizing an object requires information about the category’s mean and variance in an appropriate feature space.
Similarity-based approach:
- the mean represents the category prototype
- the inverse variances (precisions) correspond to the dimensional weights in a category-specific similarity metric
In this context, one-shot learning is not possible: a single example can provide information about the mean, but not about the variances (or similarity metric).
Proposal: The proposed model leverages higher-order knowledge abstracted from previously learned categories to estimate the new category’s prototype (mean) as well as an appropriate similarity metric (variances) from just one example (or a few).
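As a rough sketch of the similarity-based view above (all values hypothetical): the prototype is the category mean, and each feature dimension is weighted by its precision.

```python
import numpy as np

# Hypothetical illustration: a category-specific similarity metric where
# the prototype is the category mean and the dimensional weights are the
# inverse variances (precisions).
def similarity_distance(x, mu, tau):
    """Precision-weighted squared distance of x from prototype mu."""
    return np.sum(tau * (x - mu) ** 2)

mu = np.array([0.0, 1.0])    # category prototype (mean)
tau = np.array([4.0, 0.25])  # precisions: dimension 0 is weighted more
x = np.array([0.5, 2.0])
d = similarity_distance(x, mu, tau)  # 4*0.25 + 0.25*1.0 = 1.25
```

With only one example the mean can be set directly, but tau cannot be estimated; that is exactly the gap the hierarchical prior fills.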
Hierarchical Bayesian model
Consider observing a set of N input feature vectors
{x_1, . . . , x_N}, x_n ∈ R^D
Features are derived from high-dimensional data such as images (e.g. D = 50,000).
Assumption: fixed two-level category hierarchy.
Suppose that the N objects are partitioned into C basic-level (level-1) categories. That partition is represented by a vector z^b = (z^b_1, . . . , z^b_N) such that z^b_n ∈ {1, . . . , C}.
The C basic-level categories are partitioned into K super-categories (level-2), represented by z^s = (z^s_1, . . . , z^s_C) such that z^s_c ∈ {1, . . . , K}.
For any basic-level category c, the distribution over the observed feature vectors is

P(x_n | z^b_n = c, θ^1) = ∏_{d=1}^D N(x_{nd} | μ^c_d, 1/τ^c_d)    (1)

where θ^1 = {μ^c, τ^c}_{c=1}^C denotes the level-1 category parameters.
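A minimal sketch of the likelihood in Eq. 1 (hypothetical parameter values), treating each basic-level category as a diagonal Gaussian over the D feature dimensions:

```python
import numpy as np

# Diagonal-Gaussian log-likelihood of Eq. 1 for one category c:
# log P(x | z^b = c) = sum_d [ 0.5*log(tau_d/(2*pi)) - 0.5*tau_d*(x_d - mu_d)^2 ]
def log_lik(x, mu_c, tau_c):
    return np.sum(0.5 * np.log(tau_c / (2 * np.pi))
                  - 0.5 * tau_c * (x - mu_c) ** 2)

x = np.array([1.0, -0.5])       # a D=2 feature vector (hypothetical)
mu_c = np.array([1.0, 0.0])     # category mean
tau_c = np.array([2.0, 1.0])    # category precisions
ll = log_lik(x, mu_c, tau_c)
```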
Let k = z^s_c, i.e. level-1 category c belongs to level-2 category k, and let θ^2 = {μ^k, τ^k, α^k}_{k=1}^K denote the level-2 parameters. Then

P(μ^c, τ^c | θ^2, z^s) = ∏_{d=1}^D P(μ^c_d, τ^c_d | θ^2, z^s)

P(μ^c_d, τ^c_d | θ^2) = P(μ^c_d | τ^c_d, θ^2) P(τ^c_d | θ^2)    (2)
                      = N(μ^c_d | μ^k_d, 1/(ν τ^c_d)) Γ(τ^c_d | α^k_d, α^k_d / τ^k_d)
It can be easily derived that

E[μ^c] = μ^k,   E[τ^c] = τ^k

That means: the expected values of the level-1 parameters θ^1 are given by the corresponding level-2 parameters θ^2.
The parameter α^k controls the variability of τ^c around its mean.
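These expectations can be checked numerically. Below is a minimal sketch of the generative process in Eq. 2 for one dimension, with μ^k, τ^k, α^k, ν set to hypothetical values; the Monte Carlo averages recover E[μ^c] = μ^k and E[τ^c] = τ^k.

```python
import numpy as np

# Draw level-1 parameters from the level-2 Normal-Gamma prior of Eq. 2:
#   tau^c ~ Gamma(shape=alpha^k, rate=alpha^k/tau^k)  => E[tau^c] = tau^k
#   mu^c | tau^c ~ N(mu^k, 1/(nu * tau^c))            => E[mu^c]  = mu^k
rng = np.random.default_rng(0)
mu_k, tau_k, alpha_k, nu = 2.0, 3.0, 5.0, 1.0
n = 200_000

# NumPy's gamma takes a scale parameter, i.e. scale = 1/rate = tau_k/alpha_k
tau_c = rng.gamma(shape=alpha_k, scale=tau_k / alpha_k, size=n)
mu_c = rng.normal(loc=mu_k, scale=1.0 / np.sqrt(nu * tau_c))

print(mu_c.mean())   # close to mu_k = 2.0
print(tau_c.mean())  # close to tau_k = 3.0
```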
For the level-2 parameters θ^2, the authors assume:
Modeling the number of super-categories
The model is presented with a two-level partition z = {z^s, z^b} that assumes a fixed hierarchy for sharing parameters.
Aim: to infer the distribution over possible partitions z.
Proposal: the authors place a nonparametric two-level nested Chinese Restaurant Process prior (nCRP) (Blei et al. 2003, 2010) over z.
The nCRP(γ) extends the CRP to a nested sequence of partitions, one for each level of the tree (prior: γ ~ Γ(1, 1)).
In this case, each observation n is first assigned to the super-category z^s_n using Eq. 4.
Its assignment to the basic-level category z^b_n (under super-category z^s_n) is again recursively drawn from Eq. 4.
Full generative model in Fig. 1 right panel
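To illustrate the building block the nCRP nests at each level, here is a minimal sketch of a single CRP assignment step (the counts and γ are hypothetical):

```python
import numpy as np

# One Chinese Restaurant Process step: the next item joins existing
# group j with probability n_j/(n + gamma), or starts a new group with
# probability gamma/(n + gamma). The nCRP applies this draw at each
# level of the tree (super-category first, then basic-level category).
def crp_assign(counts, gamma, rng):
    """Sample a group index; index len(counts) means 'new group'."""
    n = sum(counts)
    probs = np.array(counts + [gamma]) / (n + gamma)
    return rng.choice(len(counts) + 1, p=probs)

rng = np.random.default_rng(1)
counts = [3, 1]  # two existing groups, sizes 3 and 1
k = crp_assign(counts, gamma=1.0, rng=rng)  # probabilities 0.6, 0.2, 0.2
```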
Inference
Sampling level-1 parameters: given θ^2 and z, the conditional distribution P(μ^c, τ^c | θ^2, z, x) is also Normal-Gamma.
Sampling level-2 parameters: given z and θ^1, the conditional distributions over the mean μ^k and precision τ^k take Gaussian and Inverse-Gamma forms; similarly for α^k.
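The Normal-Gamma conjugacy behind the level-1 step can be sketched with the standard conjugate update (hypothetical prior values; in the model, the prior parameters come from the assigned super-category via Eq. 2):

```python
import numpy as np

# Standard Normal-Gamma conjugate update for one dimension:
# prior  tau ~ Gamma(a0, rate=b0),  mu | tau ~ N(m0, 1/(nu0*tau))
# posterior given data x is again Normal-Gamma with the parameters below.
def normal_gamma_posterior(x, m0, nu0, a0, b0):
    n, xbar = len(x), np.mean(x)
    ss = np.sum((x - xbar) ** 2)
    nu_n = nu0 + n
    m_n = (nu0 * m0 + n * xbar) / nu_n
    a_n = a0 + n / 2
    b_n = b0 + 0.5 * ss + 0.5 * nu0 * n * (xbar - m0) ** 2 / nu_n
    return m_n, nu_n, a_n, b_n

x = np.array([1.0, 2.0, 3.0])  # observations assigned to one category
m_n, nu_n, a_n, b_n = normal_gamma_posterior(x, m0=0.0, nu0=1.0, a0=2.0, b0=1.0)
# m_n = 1.5, nu_n = 4.0, a_n = 3.5, b_n = 3.5
```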
Sampling assignments z: given the model parameters θ = {θ^1, θ^2}, combining the likelihood term with the nCRP(γ) prior, the posterior over the assignment z_n can be computed as follows:
One-shot learning
Consider observing a single new instance x^* of a novel category c^*.
Conditioned on the level-2 parameters θ^2 and the tree structure z:
- compute the posterior distribution over the assignments z^*_c using Eq. 5 (i.e. which super-category the novel category belongs to)
- the new category can either be placed under one of the existing super-categories or create its own super-category
Given an inferred assignment z^*_c and using Eq. 2, we can infer the posterior mean and precision terms {μ^*, τ^*} for the novel category.
Also, the conditional probability that a new test input x_t belongs to the novel category c^* is:
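As a rough illustration of this classification step (hypothetical categories; equal priors assumed, whereas the model’s full expression also involves the inferred hierarchy): score the test input under each category’s inferred diagonal Gaussian and normalize.

```python
import numpy as np

# Posterior over categories for a test input x_t, given each category's
# inferred prototype mu and precisions tau (equal priors assumed).
def classify(x_t, mus, taus):
    ll = np.array([np.sum(0.5 * np.log(t / (2 * np.pi))
                          - 0.5 * t * (x_t - m) ** 2)
                   for m, t in zip(mus, taus)])
    p = np.exp(ll - ll.max())  # subtract max for numerical stability
    return p / p.sum()

mus = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]   # two category prototypes
taus = [np.array([1.0, 1.0]), np.array([1.0, 1.0])]  # unit precisions
p = classify(np.array([2.5, 2.9]), mus, taus)
# nearly all the mass goes to the second category
```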
Experimental results
The authors present experimental results on the MNIST handwritten digit and MSR Cambridge object recognition image datasets.
The proposed model was compared with four baseline models and an oracle:
Euclidean: uses the Euclidean metric, i.e. all precision terms are set to one and never updated
HB-Flat: always uses a single super-category
HB-Var: is based on clustering only covariance matrices, without taking into account the means of the super-categories
MLE: ignores hierarchical Bayes altogether and estimates a category-specific mean and precision from sample averages
Oracle: always uses the correct, instead of the inferred, similarity metric
MSR Cambridge Dataset
The Cambridge Dataset contains images of 24 different categories.
The figure shows the 24 basic-level categories along with a typical partition that the model discovers, where many super-categories contain semantically similar basic-level categories.