Chris Bishop’s PRML Ch. XII: Continuous Latent Variables

Chris Bishop's PRML Ch. XII: Continuous Latent Variables
Caroline Bernard-Michel & Herve Jegou
June 12, 2008
Caroline BernardMichel & Herve Jegou Chris Bishops PRML Ch. XII: Continuous Latent Variables

Introduction
- Aim of this chapter: dimensionality reduction
- Can be interesting for lossy data compression, feature extraction and data visualization.
- Example: synthetic data set
  - Choice of one of the off-line digit images
  - Creation of multiple copies with a random displacement and rotation
- Individuals = images (28 × 28 = 784)
- Variables = pixel grey levels
- Only three latent degrees of freedom: the translation (horizontal and vertical) and the rotation

Chapter content
- Principal Component Analysis
  - Maximum variance formulation
  - Minimum-error formulation
  - Applications of PCA
  - PCA for high-dimensional data
- Probabilistic PCA
  - Problem setup
  - Maximum likelihood PCA
  - EM algorithm for PCA
  - Bayesian PCA
  - Factor analysis
- Kernel PCA
- Nonlinear Latent Variable Models
  - Independent component analysis
  - Autoassociative neural networks
  - Modelling nonlinear manifolds

Maximum variance formulation
- Consider a data set of observations $\{x_n\}$ where n = 1, ..., N ($x_n$ with dimensionality D).
- Idea of PCA: project this data onto a space of lower dimensionality M < D, called the principal subspace, while maximizing the variance of the projected data

Notations
We will denote by:
- D the dimensionality
- M the fixed dimension of the principal subspace
- $\{u_i\}$, i = 1, ..., M the basis vectors ((D × 1) vectors) of the principal subspace
- The sample mean ((D × 1) vector) by:
$$\bar{x} = \frac{1}{N}\sum_{n=1}^{N} x_n \qquad (1.90)$$
- The sample variance/covariance matrix ((D × D) matrix) by:
$$S = \frac{1}{N}\sum_{n=1}^{N} (x_n - \bar{x})(x_n - \bar{x})^T \qquad (1)$$

Idea of PCA with one-dimensional principal subspace
- Consider a D-dimensional unit vector $u_1$ ($u_1^T u_1 = 1$)
- Each point $x_n$ is then projected onto the scalar $u_1^T x_n$
- The mean of the projected data is:
$$u_1^T \bar{x} \qquad (2)$$
- The variance of the projected data is:
$$\frac{1}{N}\sum_{n=1}^{N} \{u_1^T x_n - u_1^T \bar{x}\}^2 = u_1^T S u_1 \qquad (3)$$
Idea of PCA: maximize the projected variance $u_1^T S u_1$ with respect to $u_1$ under the normalization constraint $u_1^T u_1 = 1$

Idea of PCA with one-dimensional principal subspace
- Trick: introduce the Lagrange multiplier $\lambda_1$
- Unconstrained maximization of $u_1^T S u_1 + \lambda_1 (1 - u_1^T u_1)$
- The solution must satisfy:
$$S u_1 = \lambda_1 u_1 \qquad (4)$$
- $u_1$ must be an eigenvector of S having eigenvalue $\lambda_1$!
- The variance of the projected data is $\lambda_1$ ($u_1^T S u_1 = \lambda_1$), so $\lambda_1$ has to be the largest eigenvalue!
- Additional principal components are obtained by maximizing the projected variance amongst all possible directions orthogonal to those already considered!
- PCA = calculating the eigenvectors of the data covariance matrix corresponding to the largest eigenvalues!
- Note: $\sum_{i=1}^{D} \lambda_i$ is generally called the total inertia or the total variance. The fraction of inertia explained by one component $u_i$ is then
$$\frac{\lambda_i}{\sum_{i=1}^{D} \lambda_i}$$
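The eigenvector recipe above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function name `pca_eig` and the synthetic data are assumptions made for the example. It checks that the variance of the data projected onto $u_1$ equals the largest eigenvalue $\lambda_1$.

```python
import numpy as np

def pca_eig(X, M):
    """Return the sample mean, the top-M eigenvectors of S (as columns)
    and all eigenvalues of S in descending order."""
    x_bar = X.mean(axis=0)
    Xc = X - x_bar
    S = Xc.T @ Xc / X.shape[0]            # covariance with 1/N, as in Eq. (1)
    eigvals, eigvecs = np.linalg.eigh(S)  # eigh returns ascending eigenvalues
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]
    return x_bar, eigvecs[:, :M], eigvals

# Synthetic data with one dominant direction (an assumption for the demo).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3)) * np.array([3.0, 1.0, 0.2])

x_bar, U, lam = pca_eig(X, M=1)
u1 = U[:, 0]
projected_variance = np.var(X @ u1)       # np.var also normalizes by N
explained_ratio = lam / lam.sum()         # fraction of inertia per component
```

`explained_ratio` is the quantity $\lambda_i / \sum_i \lambda_i$ from the note above.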

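Minimum-error formulation
- Based on projection error minimization
- Consider a complete orthonormal set of D-dimensional basis vectors $\{u_i\}$, i = 1, ..., D, satisfying $u_i^T u_j = \delta_{ij}$
- Each data point $x_n$ can be represented by:
$$x_n = \sum_{i=1}^{D} \alpha_{ni} u_i \quad \text{where } \alpha_{ni} = x_n^T u_i \qquad (5)$$
- $x_n$ can be approximated by
$$\tilde{x}_n = \sum_{i=1}^{M} z_{ni} u_i + \sum_{i=M+1}^{D} b_i u_i \qquad (6)$$
- Idea of PCA: minimize the distortion J introduced by the reduction in dimensionality
$$J = \frac{1}{N}\sum_{n=1}^{N} \| x_n - \tilde{x}_n \|^2 \qquad (7)$$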

Minimum-error formulation (2)
- Setting the derivatives with respect to $z_{nj}$ and $b_j$ to zero, one obtains:
$$z_{nj} = x_n^T u_j \quad \text{and} \quad b_j = \bar{x}^T u_j \qquad (8)$$
- J can then be expressed as:
$$J = \sum_{i=M+1}^{D} u_i^T S u_i \qquad (9)$$
- The minimum is obtained when $\{u_i\}$, i = M + 1, ..., D are the eigenvectors of S associated with the smallest eigenvalues.
- The distortion is then given by $J = \sum_{i=M+1}^{D} \lambda_i$
- $x_n$ is approximated by:
$$\tilde{x}_n = \sum_{i=1}^{M} (x_n^T u_i) u_i + \sum_{i=M+1}^{D} (\bar{x}^T u_i) u_i = \bar{x} + \sum_{i=1}^{M} (x_n^T u_i - \bar{x}^T u_i) u_i \qquad (10)$$
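As a quick numerical check of the statement that the distortion equals the sum of the discarded eigenvalues, the sketch below reconstructs each point with Eq. (10) and evaluates J with Eq. (7). The data and variable names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
N, D, M = 400, 5, 2
# Synthetic correlated data (an assumption for the demo).
X = rng.normal(size=(N, D)) @ rng.normal(size=(D, D))

x_bar = X.mean(axis=0)
S = (X - x_bar).T @ (X - x_bar) / N
lam, U = np.linalg.eigh(S)
lam, U = lam[::-1], U[:, ::-1]                    # descending eigenvalues

U_M = U[:, :M]
X_tilde = x_bar + (X - x_bar) @ U_M @ U_M.T       # Eq. (10), one row per point
J = np.mean(np.sum((X - X_tilde) ** 2, axis=1))   # Eq. (7)
discarded = lam[M:].sum()                         # sum of the D - M smallest eigenvalues
```

Up to floating-point error, `J` and `discarded` agree.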

Application of PCA: data compression
- Individuals = images
- Variables = grey levels of each pixel (784)
[Figure: panels (a) and (b), horizontal axes from 0 to 600, vertical scales ×10^5 and ×10^6; image labelled "Mean"]

Application of PCA: data compression (2)
- Compression using the PCA approximation for $x_n$
$$\tilde{x}_n = \bar{x} + \sum_{i=1}^{M} \{x_n^T u_i - \bar{x}^T u_i\} u_i \qquad (11)$$
- For each data point we have replaced the D-dimensional vector $x_n$ with an M-dimensional vector having components $(x_n^T u_i - \bar{x}^T u_i)$
[Figure: image panel labelled "Original"]

Application of PCA: data preprocessing
- Usually, each variable is standardized individually so that it has zero mean and unit variance. The variables are still correlated.
- Use of PCA for standardization:
  - Write the eigenvector equation $SU = UL$, where L is a D × D diagonal matrix with elements $\lambda_i$ and U is a D × D orthogonal matrix with columns given by $u_i$
  - Define: $y_n = L^{-1/2} U^T (x_n - \bar{x})$
  - $y_n$ has zero mean and identity covariance matrix (the new variables are decorrelated)
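The whitening transform $y_n = L^{-1/2} U^T (x_n - \bar{x})$ can be sketched as follows. The two-dimensional correlated data set is an assumption made for the example; the check is that the transformed data has identity covariance.

```python
import numpy as np

rng = np.random.default_rng(2)
# Correlated synthetic data (an assumption for the demo).
X = rng.normal(size=(1000, 2)) @ np.array([[2.0, 0.0], [1.5, 0.5]])

x_bar = X.mean(axis=0)
S = (X - x_bar).T @ (X - x_bar) / X.shape[0]
lam, U = np.linalg.eigh(S)                 # S U = U L, with L = diag(lam)

# Row n of Y is y_n^T = (x_n - x_bar)^T U L^{-1/2}.
Y = (X - x_bar) @ U / np.sqrt(lam)
cov_Y = Y.T @ Y / Y.shape[0]
```

Dividing by `np.sqrt(lam)` scales column i by $\lambda_i^{-1/2}$, which is the row-wise form of the formula above.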

Application of PCA: data preprocessing
- Comparison: PCA chooses the direction of maximum variance, whereas Fisher's linear discriminant takes account of the class labels (see Chap. 4).
[Figure: scatter plot comparing the two projection directions]
- Visualization: projection of the oil flow data onto the first two principal components. Three geometrical configurations of the oil, water and gas phases.
[Figure: the three flow configurations (homogeneous, stratified, annular), with labels Mix, Gas, Water, Oil]

PCA for high-dimensional data
- The number of data points N is smaller than the dimensionality D
- At least D − N + 1 of the eigenvalues are equal to zero!
- Finding the eigenvectors of the D × D matrix S directly is generally computationally infeasible.
- Let us denote by X the (N × D)-dimensional centred data matrix.
- The covariance matrix can be written as $S = N^{-1} X^T X$
- It can be shown that S has D − N + 1 eigenvalues of value zero and shares its remaining N − 1 eigenvalues with $N^{-1} X X^T$
- If we denote the (unit-norm) eigenvectors of $X X^T$ by $v_i$, the normalized eigenvectors $u_i$ of S can be deduced by:
$$u_i = \frac{1}{(N \lambda_i)^{1/2}} X^T v_i \qquad (12)$$
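A small sketch of this trick, assuming synthetic data with N = 20 and D = 500: the eigenproblem is solved on the N × N matrix and the eigenvectors of S are recovered with Eq. (12). The full D × D matrix S is built only to verify the result.

```python
import numpy as np

rng = np.random.default_rng(3)
N, D = 20, 500
X = rng.normal(size=(N, D))               # synthetic data (an assumption)
Xc = X - X.mean(axis=0)                   # centred data matrix

# Eigen-decompose the small N x N matrix (1/N) X X^T.
lam, V = np.linalg.eigh(Xc @ Xc.T / N)
lam, V = lam[::-1], V[:, ::-1]            # descending order
keep = N - 1                              # centring removes one degree of freedom
lam, V = lam[:keep], V[:, :keep]

U = Xc.T @ V / np.sqrt(N * lam)           # Eq. (12), one column per u_i

S = Xc.T @ Xc / N                         # D x D, formed here only as a check
```

The columns of `U` are orthonormal eigenvectors of `S` with eigenvalues `lam`.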

Probabilistic PCA
Advantages:
- Allows deriving an EM algorithm for PCA that is computationally efficient
- Allows dealing with missing values in the data set
- Mixtures of probabilistic PCA models
- Basis for the Bayesian treatment of PCA, in which the dimensionality of the principal subspace can be found automatically.
- ...

Probabilistic PCA (2)
Related to factor analysis:
- A latent variable model seeks to relate a D-dimensional observation vector x to a corresponding M-dimensional Gaussian latent variable z
$$x = Wz + \mu + \epsilon \qquad (13)$$
where
  - z is an M-dimensional Gaussian latent variable
  - W is a (D × M) matrix (its columns span the latent subspace)
  - $\epsilon$ is a D-dimensional Gaussian noise
  - $\epsilon$ and z are independent
  - $\mu$ is a parameter vector that permits the model to have nonzero mean
- Factor analysis: $\epsilon \sim N(0, \Psi)$
- Probabilistic PCA: $\epsilon \sim N(0, \sigma^2 I)$

Probabilistic PCA (3)
- The use of the isotropic Gaussian noise model for $\epsilon$ implies that the z-conditional probability distribution over x-space is given by
$$x \mid z \sim N(Wz + \mu, \sigma^2 I) \qquad (14)$$
- Defining $z \sim N(0, I)$, the marginal distribution of x is obtained by integrating out the latent variables and is likewise Gaussian
$$x \sim N(\mu, C) \qquad (15)$$
with $C = W W^T + \sigma^2 I$
To do: estimate the parameters $\mu$, W and $\sigma^2$

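Maximum likelihood PCA
Given a data set $X = \{x_n\}$ of observed data points, the log likelihood is given by:
$$L = -\frac{ND}{2}\ln(2\pi) - \frac{N}{2}\ln|C| - \frac{1}{2}\sum_{n=1}^{N} (x_n - \mu)^T C^{-1} (x_n - \mu) \qquad (16)$$
Setting the derivative with respect to $\mu$ to zero gives
$$\mu = \bar{x} \qquad (17)$$
Because the log likelihood is quadratic in $\mu$, this solution represents the unique maximum.
Back-substituting, we can write:
$$L = -\frac{N}{2}\{D\ln(2\pi) + \ln|C| + \mathrm{Tr}(C^{-1}S)\} \qquad (18)$$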

Maximum likelihood PCA (2)
Maximization with respect to W and $\sigma^2$ is more complex but has an exact closed-form solution
$$W_{ML} = U_M (L_M - \sigma^2 I)^{1/2} R \qquad (19)$$
where
- $U_M$ is a (D × M) matrix whose columns are given by the eigenvectors of S whose eigenvalues are the M largest
- $L_M$ is an (M × M) diagonal matrix given by the corresponding eigenvalues $\lambda_i$
- R is an arbitrary (M × M) orthogonal matrix.
$$\sigma^2_{ML} = \frac{1}{D - M}\sum_{i=M+1}^{D} \lambda_i \qquad (20)$$
Average variance of the discarded dimensions
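The closed-form solution can be sketched directly from Eqs. (19) and (20). This is an illustrative implementation that fixes R = I; the function name `ppca_ml` and the synthetic data are assumptions. It checks that the model covariance $C = WW^T + \sigma^2 I$ reproduces the M largest eigenvalues of S and its total variance.

```python
import numpy as np

def ppca_ml(X, M):
    """Closed-form ML estimates for probabilistic PCA, taking R = I."""
    N, D = X.shape
    mu = X.mean(axis=0)
    S = (X - mu).T @ (X - mu) / N
    lam, U = np.linalg.eigh(S)
    lam, U = lam[::-1], U[:, ::-1]                   # descending eigenvalues
    sigma2 = lam[M:].mean()                          # Eq. (20)
    W = U[:, :M] * np.sqrt(lam[:M] - sigma2)         # Eq. (19) with R = I
    return mu, W, sigma2, S

rng = np.random.default_rng(4)
# Synthetic data with two dominant directions (an assumption for the demo).
X = rng.normal(size=(300, 6)) * np.array([4.0, 3.0, 1.0, 0.8, 0.6, 0.5])

mu, W, sigma2, S = ppca_ml(X, M=2)
C = W @ W.T + sigma2 * np.eye(6)                     # model covariance
top_C = np.linalg.eigvalsh(C)[::-1][:2]
top_S = np.linalg.eigvalsh(S)[::-1][:2]
```

The remaining D − M eigenvalues of `C` all equal `sigma2`, the average of the discarded eigenvalues of `S`.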

Maximum likelihood PCA (3)
- R (an M × M matrix) can be interpreted as a rotation in the M-dimensional latent space
- The predictive density is unchanged by rotations
- If R = I, the columns of W are the principal component eigenvectors scaled by $\sqrt{\lambda_i - \sigma^2}$
- The model correctly captures the variance of the data along the principal axes and approximates the variance in all remaining directions with a single average value $\sigma^2$, which accounts for the variance lost in the projection.

Maximum likelihood PCA (4)
- PCA is generally expressed as a projection of points from the D-dimensional data space onto an M-dimensional subspace
- Use of the posterior distribution
$$z \mid x \sim N(M^{-1} W^T (x - \mu), \sigma^2 M^{-1}) \qquad (21)$$
where $M = W^T W + \sigma^2 I$
The mean is given by
$$E(z \mid x) = M^{-1} W_{ML}^T (x - \bar{x}) \qquad (22)$$
Note: this takes the same form as the solution of a regularized linear regression!
This projects to a point in data space given by
$$W E(z \mid x) + \mu \qquad (23)$$

EM algorithm for PCA
- In spaces of high dimensionality, EM can have computational advantages!
- Can be extended to factor analysis, for which there is no closed-form solution
- Can be used when values are missing, for mixture models...
Requires the complete-data log likelihood function, which takes the form:
$$L_c = \sum_{n=1}^{N} \{\ln p(x_n \mid z_n) + \ln p(z_n)\} \qquad (24)$$
In the following, $\mu$ is replaced by the sample mean $\bar{x}$

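EM algorithm for PCA
- Initialize the parameters W and $\sigma^2$
- E-step
$$E[L_c] = -\sum_{n=1}^{N} \Big\{ \frac{D}{2}\ln(2\pi\sigma^2) + \frac{1}{2}\mathrm{Tr}(E[z_n z_n^T]) + \frac{1}{2\sigma^2}\|x_n - \mu\|^2 - \frac{1}{\sigma^2} E[z_n]^T W^T (x_n - \mu) + \frac{1}{2\sigma^2}\mathrm{Tr}(E[z_n z_n^T] W^T W) \Big\}$$
with
$$E[z_n] = M^{-1} W^T (x_n - \bar{x})$$
$$E[z_n z_n^T] = \sigma^2 M^{-1} + E[z_n] E[z_n]^T$$
- M-step
$$W_{new} = \Big[\sum_{n=1}^{N} (x_n - \bar{x}) E[z_n]^T\Big] \Big[\sum_{n=1}^{N} E[z_n z_n^T]\Big]^{-1} \qquad (25)$$
$$\sigma^2_{new} = \frac{1}{ND}\sum_{n=1}^{N} \big\{ \|x_n - \bar{x}\|^2 - 2 E[z_n]^T W_{new}^T (x_n - \bar{x}) + \mathrm{Tr}(E[z_n z_n^T] W_{new}^T W_{new}) \big\}$$
- Check for convergence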
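The E-step and M-step above translate almost line by line into NumPy. The sketch below is one possible implementation under stated assumptions: random initialisation, a fixed number of iterations instead of a convergence test, and hypothetical function names. It records the log likelihood of Eq. (18) after every iteration, which EM should never decrease.

```python
import numpy as np

def ppca_log_likelihood(Xc, W, sigma2):
    """Log likelihood of centred data Xc, in the form of Eq. (18)."""
    N, D = Xc.shape
    C = W @ W.T + sigma2 * np.eye(D)
    S = Xc.T @ Xc / N
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * N * (D * np.log(2 * np.pi) + logdet
                       + np.trace(np.linalg.solve(C, S)))

def ppca_em(X, M, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    N, D = X.shape
    Xc = X - X.mean(axis=0)                       # mu is fixed to the sample mean
    W, sigma2 = rng.normal(size=(D, M)), 1.0      # arbitrary initialisation
    history = []
    for _ in range(n_iter):
        # E-step: posterior moments of the latent variables.
        Minv = np.linalg.inv(W.T @ W + sigma2 * np.eye(M))
        Ez = Xc @ W @ Minv                        # row n is E[z_n]^T
        sum_Ezz = N * sigma2 * Minv + Ez.T @ Ez   # sum over n of E[z_n z_n^T]
        # M-step: Eq. (25), then the sigma^2 update.
        W = (Xc.T @ Ez) @ np.linalg.inv(sum_Ezz)
        sigma2 = (np.sum(Xc ** 2) - 2 * np.sum((Xc @ W) * Ez)
                  + np.trace(sum_Ezz @ W.T @ W)) / (N * D)
        history.append(ppca_log_likelihood(Xc, W, sigma2))
    return W, sigma2, history

rng = np.random.default_rng(5)
# Synthetic data (an assumption for the demo).
X = rng.normal(size=(200, 5)) * np.array([3.0, 2.0, 0.5, 0.4, 0.3])
W, sigma2, history = ppca_em(X, M=2)
```

In practice one would stop when the change in `history` falls below a tolerance rather than run a fixed number of iterations.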

EM algorithm for PCA
- When $\sigma^2 \to 0$, the EM approach corresponds to standard PCA
- Define $\tilde{X}$, a matrix of size N × D whose nth row is given by $x_n - \bar{x}$
- Define $\Omega$, a matrix of size M × N whose nth column is given by the vector $E[z_n]$
- The E-step becomes
$$\Omega = (W_{old}^T W_{old})^{-1} W_{old}^T \tilde{X}^T \qquad (26)$$
Orthogonal projection onto the current estimate of the principal subspace
- The M-step takes the form
$$W_{new} = \tilde{X}^T \Omega^T (\Omega \Omega^T)^{-1} \qquad (27)$$
Re-estimation of the principal subspace, minimizing the squared reconstruction error with the projections held fixed
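A minimal sketch of this zero-noise limit, alternating Eqs. (26) and (27). The function name, the random initialisation and the synthetic data are assumptions. Because W is only determined up to an invertible M × M transform, the result is compared with the eigenvector solution through the orthogonal projectors onto the two subspaces.

```python
import numpy as np

def em_pca_zero_noise(X, M, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)                                   # X tilde, N x D
    W = rng.normal(size=(X.shape[1], M))                      # arbitrary start
    for _ in range(n_iter):
        Omega = np.linalg.solve(W.T @ W, W.T @ Xc.T)          # E-step, Eq. (26): M x N
        W = Xc.T @ Omega.T @ np.linalg.inv(Omega @ Omega.T)   # M-step, Eq. (27): D x M
    return W

rng = np.random.default_rng(6)
# Synthetic data with a clear two-dimensional principal subspace (an assumption).
X = rng.normal(size=(300, 5)) * np.array([5.0, 3.0, 0.5, 0.3, 0.1])
W = em_pca_zero_noise(X, M=2)

Q, _ = np.linalg.qr(W)                    # orthonormal basis of span(W)
Xc = X - X.mean(axis=0)
_, U = np.linalg.eigh(Xc.T @ Xc / X.shape[0])
U_M = U[:, -2:]                           # eigh sorts ascending: last two are the largest
P_em, P_pca = Q @ Q.T, U_M @ U_M.T        # projectors onto the two subspaces
```

Each iteration needs only products with the data matrix, never the D × D covariance, which is where the computational advantage in high dimension comes from.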

EM algorithm for PCA
[Figure: six panels, (a) to (f), each with both axes running from -2 to 2]

Idea of Bayesian PCA
- Useful for choosing the dimensionality M of the principal subspace
- Cross-validation with a validation data set: computationally costly!
- Define an independent Gaussian prior over each column $w_i$ of W. The variance is governed by a precision parameter $\alpha_i$
$$p(W \mid \alpha) = \prod_{i=1}^{M} \Big(\frac{\alpha_i}{2\pi}\Big)^{D/2} \exp\Big\{-\frac{1}{2}\alpha_i w_i^T w_i\Big\} \qquad (28)$$
- Values of $\alpha_i$ are estimated iteratively by maximizing the logarithm of the marginal likelihood function:
$$p(X \mid \alpha, \mu, \sigma^2) = \int p(X \mid W, \mu, \sigma^2)\, p(W \mid \alpha)\, dW \qquad (29)$$
- The effective dimensionality of the principal subspace is determined by the number of finite $\alpha_i$ values. Principal subspace = the span of the corresponding $w_i$.

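Idea of Bayesian PCA
- Maximization with respect to $\alpha_i$:
$$\alpha_i^{new} = \frac{D}{w_i^T w_i} \qquad (30)$$
- These re-estimations are interleaved with the EM algorithm, in which the update of $W_{new}$ is modified:
$$W_{new} = \Big[\sum_{n=1}^{N} (x_n - \bar{x}) E[z_n]^T\Big] \Big[\sum_{n=1}^{N} E[z_n z_n^T] + \sigma^2 A\Big]^{-1} \qquad (31)$$
with $A = \mathrm{diag}(\alpha_i)$
- Example: 300 points in dimension D sampled from a Gaussian distribution having M = 3 directions with larger variance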

Factor analysis
- Closely related to probabilistic PCA, but the covariance of $p(x \mid z)$ is diagonal instead of isotropic:
$$x \mid z \sim N(Wz + \mu, \Psi) \qquad (64)$$
where $\Psi$ is a D × D diagonal matrix.
- The independent variance of each coordinate (along the natural axes) is explained by $\Psi$
- The observed covariance structure is captured by W
- Consequences
  - For PCA, a rotation of data space gives the same fit, with W rotated by the same matrix
  - For factor analysis, the analogous property is: a componentwise rescaling is absorbed into a rescaling of the elements of $\Psi$

Factor analysis
- The marginal density of the observed variable is
$$x \sim N(\mu, W W^T + \Psi) \qquad (65)$$
- As in probabilistic PCA, the model is invariant with respect to rotations in the latent space
- $\mu$, W and $\Psi$ can be determined by maximum likelihood
- $\mu = \bar{x}$, as in probabilistic PCA
- But there is no closed-form ML solution for W, so it is estimated iteratively using EM

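Parameter estimation using EM
- E step
$$E[z_n] = G W^T \Psi^{-1} (x_n - \bar{x}) \qquad (66)$$
$$E[z_n z_n^T] = G + E[z_n] E[z_n]^T \qquad (67)$$
where $G = (I + W^T \Psi^{-1} W)^{-1}$
- M step
$$W^{new} = \Big[\sum_{n=1}^{N} (x_n - \bar{x}) E[z_n]^T\Big] \Big[\sum_{n=1}^{N} E[z_n z_n^T]\Big]^{-1} \qquad (69)$$
$$\Psi^{new} = \mathrm{diag}\Big\{ S - W^{new} \frac{1}{N}\sum_{n=1}^{N} E[z_n] (x_n - \bar{x})^T \Big\} \qquad (70)$$
where the diag operator zeros all non-diagonal elements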
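The updates (66) to (70) can be sketched as below. This is an illustrative implementation, not the authors' code: the function names, the initialisation (random W, $\Psi$ set to the diagonal of S) and the synthetic data drawn from a factor analysis model are all assumptions. $\Psi$ is stored as a vector holding its diagonal.

```python
import numpy as np

def fa_log_likelihood(S, W, psi, N):
    """Log likelihood under x ~ N(x_bar, W W^T + Psi), Eq. (65)."""
    D = S.shape[0]
    C = W @ W.T + np.diag(psi)
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * N * (D * np.log(2 * np.pi) + logdet
                       + np.trace(np.linalg.solve(C, S)))

def factor_analysis_em(X, M, n_iter=200, seed=0):
    rng = np.random.default_rng(seed)
    N, D = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / N
    W, psi = rng.normal(size=(D, M)), np.diag(S).copy()   # psi = diagonal of Psi
    history = []
    for _ in range(n_iter):
        # E step, Eqs. (66)-(67).
        WtPinv = W.T / psi                                 # W^T Psi^{-1}, M x D
        G = np.linalg.inv(np.eye(M) + WtPinv @ W)
        Ez = Xc @ WtPinv.T @ G                             # row n is E[z_n]^T
        sum_Ezz = N * G + Ez.T @ Ez
        # M step, Eqs. (69)-(70).
        W = (Xc.T @ Ez) @ np.linalg.inv(sum_Ezz)
        psi = np.diag(S - W @ (Ez.T @ Xc) / N).copy()
        history.append(fa_log_likelihood(S, W, psi, N))
    return W, psi, history

# Synthetic data from a factor analysis model (an assumption for the demo).
rng = np.random.default_rng(7)
W_true = rng.normal(size=(6, 2))
noise_std = np.array([0.3, 0.5, 0.8, 1.0, 0.4, 0.6])
X = rng.normal(size=(500, 2)) @ W_true.T + rng.normal(size=(500, 6)) * noise_std
W, psi, history = factor_analysis_em(X, M=2)
```

As with probabilistic PCA, the recovered W is only defined up to a rotation of the latent space, so it should not be compared with `W_true` column by column.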

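Kernel PCA
Applying the ideas of kernel substitution (see Chapter 6) to PCA
[Figure: data space with axes $x_1$, $x_2$ and feature space with axes $\phi_1$, $\phi_2$, showing the eigenvector $v_1$]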

Kernel PCA: preliminaries
Kernel substitution: express each step of PCA in terms of the inner product $x^T x'$ between data vectors, so that the inner product can be generalized
- Recall that the principal components are defined as
$$S u_i = \lambda_i u_i \qquad (71)$$
with $\|u_i\|^2 = u_i^T u_i = 1$ and the covariance matrix S (for zero-mean data) defined as
$$S = \frac{1}{N}\sum_{n=1}^{N} x_n x_n^T \qquad (72)$$
- Consider a nonlinear transformation $\phi(x)$ into an M-dimensional feature space, which maps any data point $x_n$ onto $\phi(x_n)$

Kernel PCA
- Let us assume that $\sum_n \phi(x_n) = 0$
- The M × M sample covariance matrix C in feature space is given by
$$C = \frac{1}{N}\sum_{n=1}^{N} \phi(x_n)\phi(x_n)^T \qquad (73)$$
with eigenvector expansion
$$C v_i = \lambda_i v_i, \quad i = 1, ..., M \qquad (74)$$
- Goal: solve this eigenvalue problem without working explicitly in the feature space
- The eigenvector $v_i$ can be written as a linear combination of the $\phi(x_n)$, of the form
$$v_i = \sum_{n=1}^{N} a_{in} \phi(x_n) \qquad (76)$$

Kernel PCA
Note: typo in (12.78)
- The eigenvector equation can then be expressed in terms of the kernel function as
$$K^2 a_i = \lambda_i N K a_i \qquad (79)$$
where $a_i = (a_{i1}, \ldots, a_{iN})^T$, unknown at this point.
- The $a_i$ can be found by solving the eigenvalue problem:
$$K a_i = \lambda_i N a_i \qquad (80)$$
- The normalization condition for the $a_i$ is obtained by requiring that the eigenvectors in feature space be normalized:
$$1 = v_i^T v_i = a_i^T K a_i = \lambda_i N a_i^T a_i \qquad (81)$$

Kernel PCA
- The resulting principal component projections can also be cast in terms of the kernel function
- A point x is projected onto eigenvector i as
$$y_i(x) = \phi(x)^T v_i = \sum_{n=1}^{N} a_{in} k(x, x_n) \qquad (82)$$
- Remarks:
  - At most D linear principal components
  - The number of nonlinear principal components can exceed D
  - The number of nonzero eigenvalues cannot exceed the number of data points N
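The eigenproblem (80), the normalization (81) and the projection (82) can be sketched as follows. The function names are hypothetical. To make the result checkable, the example uses the linear kernel $k(x, x') = x^T x'$ on centred data (an assumption made for the demo), in which case kernel PCA should reproduce the ordinary PCA scores up to sign.

```python
import numpy as np

def kernel_pca_fit(K, n_components):
    """Solve K a_i = lambda_i N a_i for an already centred Gram matrix K."""
    N = K.shape[0]
    eigvals, A = np.linalg.eigh(K)                   # eigenvalues of K are lambda_i * N
    eigvals = eigvals[::-1][:n_components]
    A = A[:, ::-1][:, :n_components]
    A = A / np.sqrt(eigvals)                         # lambda_i N a_i^T a_i = 1, Eq. (81)
    return eigvals / N, A

def kernel_pca_project(K_new, A):
    """Eq. (82): y_i(x_m) = sum_n a_in k(x_m, x_n), with K_new[m, n] = k(x_m, x_n)."""
    return K_new @ A

rng = np.random.default_rng(8)
X = rng.normal(size=(50, 3)) * np.array([3.0, 1.0, 0.3])   # synthetic data
Xc = X - X.mean(axis=0)      # for a linear kernel, centring in data space suffices
K = Xc @ Xc.T                # linear kernel Gram matrix
lam, A = kernel_pca_fit(K, n_components=2)
Y_kernel = kernel_pca_project(K, A)

# Reference: ordinary PCA scores on the same data.
lam_pca, U = np.linalg.eigh(Xc.T @ Xc / X.shape[0])
Y_pca = Xc @ U[:, ::-1][:, :2]
```

Swapping in another kernel only changes how `K` and `K_new` are computed; the Gram matrix then has to be centred in feature space first.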

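Kernel PCA
- Up to now, we assumed that the projected data has zero mean
$$\sum_{n=1}^{N} \phi(x_n) = 0$$
- This mean cannot simply be computed and subtracted, since we want to avoid working in feature space
- However, the projected data points after centring can be written as
$$\tilde{\phi}(x_n) = \phi(x_n) - \frac{1}{N}\sum_{l=1}^{N} \phi(x_l) \qquad (83)$$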

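Kernel PCA
- The corresponding elements of the Gram matrix are given by
$$\tilde{K}_{nm} = \tilde{\phi}(x_n)^T \tilde{\phi}(x_m) = k(x_n, x_m) - \frac{1}{N}\sum_{l=1}^{N} k(x_l, x_m) - \frac{1}{N}\sum_{l=1}^{N} k(x_n, x_l) + \frac{1}{N^2}\sum_{j=1}^{N}\sum_{l=1}^{N} k(x_j, x_l) \qquad (84)$$
i.e.,
$$\tilde{K} = K - 1_N K - K 1_N + 1_N K 1_N \qquad (85)$$
where $1_N$ is the N × N matrix in which every element equals 1/N.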
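Equation (85) is a one-liner in NumPy. To check it, the sketch below uses the kernel $k(x, x') = (x^T x')^2$ on two-dimensional inputs, for which an explicit feature map is known; both the kernel choice and the helper names are assumptions made for the example. The centred Gram matrix should match the Gram matrix of explicitly centred features.

```python
import numpy as np

def centre_gram(K):
    """Eq. (85): K_tilde = K - 1_N K - K 1_N + 1_N K 1_N, with 1_N full of 1/N."""
    N = K.shape[0]
    one_N = np.full((N, N), 1.0 / N)
    return K - one_N @ K - K @ one_N + one_N @ K @ one_N

def quadratic_features(X):
    """Explicit feature map of k(x, x') = (x^T x')^2 for 2-D inputs."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

rng = np.random.default_rng(9)
X = rng.normal(size=(30, 2))              # synthetic data
K = (X @ X.T) ** 2                        # Gram matrix, no feature map needed
K_tilde = centre_gram(K)

Phi = quadratic_features(X)               # used only to verify the result
Phi_c = Phi - Phi.mean(axis=0)
K_explicit = Phi_c @ Phi_c.T
```

For kernels without a tractable feature map, such as the Gaussian kernel, only the `centre_gram` route is available.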

Kernel PCA: example and remark
- PCA is often used to reconstruct a sample $x_n$ with good accuracy from its projections onto the first principal components
- In kernel PCA, this is not possible in general, as we cannot map points explicitly from the feature space back to the data space

Independent component analysis
- Consider models in which
  - the observed variables are related linearly to the latent variables
  - the latent distribution is non-Gaussian
- Important class of such models: independent component analysis, for which the distribution of the latent variables factorizes:
$$p(z) = \prod_{j=1}^{M} p(z_j) \qquad (86)$$

Application case: blind source separation
- Setup:
  - Two people talking at the same time
  - Their voices recorded using two microphones
- Objective: to reconstruct the two signals separately. "Blind" because we are given only the mixed data. We have not observed
  - the original sources
  - the mixing coefficients
- Under some assumptions (no time delay and no echoes)
  - the signals received by the microphones are linear combinations of the voice amplitudes
  - the coefficients of this linear combination are constant

Application case: blind source separation
- Hereafter: a possible approach (see MacKay, 2003) that does not consider the temporal aspect of the problem
- Consider a generative model with
  - the latent variables: unobserved speech signal amplitudes
  - the two observed signal values $o = [o_1\ o_2]^T$ at the microphones
- The distribution of the latent variables factorizes as $p(z) = p(z_1) p(z_2)$
- No need to include noise: observed variables = deterministic linear combinations of the latent variables, as
$$o = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} z$$

Application case: blind source separation
- Given a set of observations
  - the likelihood function is a function of the coefficients $a_{ij}$
  - the log likelihood is maximized using gradient-based optimization: a particular case of independent component analysis
- This requires that the latent variables have non-Gaussian distributions
  - Probabilistic PCA: latent-space distribution = zero-mean isotropic Gaussian
  - With a Gaussian, there is no way to distinguish between two choices for the latent variables when these differ by a rotation in the latent space
- Common choice for the latent-variable distribution:
$$p(z_j) = \frac{1}{\pi \cosh(z_j)} = \frac{2}{\pi (e^{z_j} + e^{-z_j})} \qquad (90)$$
[Figure: plot with horizontal axis from -5 to 5 and logarithmic vertical axis from $10^{-3}$ to $10^{0}$]
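A small sketch of gradient-based maximum likelihood for the noise-free two-microphone model, using the prior of Eq. (90). Several choices here are assumptions rather than anything stated in the slides: the model is parametrized by the unmixing matrix $W = A^{-1}$, the update is the natural-gradient form of the likelihood gradient, the sources are synthetic Laplace samples, and the step size and iteration count are arbitrary.

```python
import numpy as np

def log_prior(y):
    """ln p(y) for p(y) = 1 / (pi cosh y), written to avoid overflow in cosh."""
    return -np.log(np.pi) + np.log(2.0) - np.logaddexp(y, -y)

def log_likelihood(W, X):
    """Average log likelihood per observation for unmixing matrix W."""
    Y = X @ W.T                                    # row n is y_n = W x_n
    return np.log(abs(np.linalg.det(W))) + log_prior(Y).sum(axis=1).mean()

def ica_fit(X, n_iter=500, lr=0.05):
    D = X.shape[1]
    W = np.eye(D)
    for _ in range(n_iter):
        Y = X @ W.T
        # d/dy ln p(y) = -tanh(y); natural-gradient ascent step on the likelihood.
        W = W + lr * (np.eye(D) - np.tanh(Y).T @ Y / X.shape[0]) @ W
    return W

rng = np.random.default_rng(10)
S_true = rng.laplace(size=(2000, 2))               # heavy-tailed synthetic sources
A = np.array([[1.0, 0.6], [0.4, 1.0]])             # hypothetical mixing coefficients
X = S_true @ A.T                                   # observed microphone signals

W_hat = ica_fit(X)
ll_before = log_likelihood(np.eye(2), X)
ll_after = log_likelihood(W_hat, X)

# Sanity check of Eq. (90): the density integrates to one.
z = np.linspace(-40.0, 40.0, 800001)
prior_mass = np.exp(log_prior(z)).sum() * (z[1] - z[0])
```

If the separation succeeds, the product `W_hat @ A` is close to a scaled permutation matrix, since the sources can only be recovered up to ordering and scale.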

Autoassociative neural networks
- Chapter 5: neural networks for predicting outputs given inputs
- They can also be used for dimensionality reduction
[Figure: network with inputs $x_1, \ldots, x_D$, hidden units $z_1, \ldots, z_M$ and outputs $x_1, \ldots, x_D$]
- Network that performs an autoassociative mapping
- #outputs = #inputs > number of hidden units, so no perfect reconstruction
- Find the network parameters w minimizing a given error function, for instance the sum-of-squares error
$$E(w) = \frac{1}{2}\sum_{n=1}^{N} \|y(x_n, w) - x_n\|^2 \qquad (91)$$

Autoassociative neural networks
- Linear activation functions:
  - unique global minimum
  - the network performs projections onto the M-dimensional principal component subspace
  - this subspace is spanned by the vectors of weights
- Even with nonlinear hidden units, the minimum error is obtained by the principal component subspace, so there is no advantage in using two-layer neural networks to perform dimensionality reduction: use standard PCA techniques

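Autoassociative neural networks
[Figure: network with inputs $x_1, \ldots, x_D$ and outputs $x_1, \ldots, x_D$, composed of two nonlinear mappings $F_1$ and $F_2$]
- Using additional hidden layers (a four-layer network here), the approach becomes worthwhile
[Figure: $F_1$ maps the data space ($x_1$, $x_2$, $x_3$) to the latent space ($z_1$, $z_2$); $F_2$ maps it back onto a surface S in data space]
- Training the network involves nonlinear optimization techniques (with the risk of reaching a suboptimal solution)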

Modelling nonlinear manifolds
- Data may lie on a manifold of lower dimensionality than the observed data space
- Capturing this property explicitly may improve the density modelling
- Possible approach: nonlinear manifold modelled by a piecewise linear approximation, e.g.,
  - k-means + PCA for each cluster
  - better: use the reconstruction error for cluster assignment
- These are limited by not having an overall density model
- Tipping and Bishop: full probabilistic model using a mixture distribution in which the components are probabilistic PCA models, with both discrete and continuous latent variables

Modelling nonlinear manifolds
Alternative approach: use a single nonlinear model
- Principal curves:
  - Extension of PCA (which finds a linear subspace)
  - A curve is described by a vector-valued function $f(\lambda)$
  - Natural parametrization: the arc length along the curve
  - Given a point x, we can find the closest point $\lambda = g_f(x)$ on the curve in terms of the Euclidean distance
  - A principal curve is a curve for which every point on the curve is the mean of all points in data space that project to it, so that
$$E[x \mid g_f(x) = \lambda] = f(\lambda) \qquad (92)$$
  - There may be many principal curves for a continuous distribution
- Hastie et al.: two-stage iterative procedure for finding principal curves

Modelling nonlinear manifolds: MDS
- PCA is often used for the purpose of visualization
- Another technique with a similar aim: multidimensional scaling (MDS, Cox and Cox 2000)
  - preserve as closely as possible the pairwise distances between data points
  - involves finding the eigenvectors of the distance matrix
  - equivalent results to PCA when the distance is Euclidean
  - but can be extended to a wide variety of data types specified in terms of a similarity matrix
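The equivalence with PCA for Euclidean distances can be checked numerically. The sketch below uses classical (Torgerson) MDS, which eigen-decomposes the doubly centred matrix of squared distances; choosing this variant is an assumption, since the slide does not say which form of MDS is meant. The function name and data are also illustrative.

```python
import numpy as np

def classical_mds(D2, n_components):
    """Classical MDS from a matrix of squared pairwise distances."""
    N = D2.shape[0]
    J = np.eye(N) - np.full((N, N), 1.0 / N)        # centring matrix
    B = -0.5 * J @ D2 @ J                            # Gram matrix of the centred points
    eigvals, V = np.linalg.eigh(B)
    eigvals = eigvals[::-1][:n_components]
    V = V[:, ::-1][:, :n_components]
    return V * np.sqrt(eigvals)                      # embedding coordinates

rng = np.random.default_rng(11)
X = rng.normal(size=(40, 4)) * np.array([3.0, 2.0, 0.5, 0.2])   # synthetic data
sq_norms = np.sum(X ** 2, axis=1)
D2 = sq_norms[:, None] + sq_norms[None, :] - 2 * X @ X.T         # squared Euclidean distances
Y_mds = classical_mds(D2, n_components=2)

# Reference: PCA scores on the same data.
Xc = X - X.mean(axis=0)
_, U = np.linalg.eigh(Xc.T @ Xc)
Y_pca = Xc @ U[:, ::-1][:, :2]
```

The two embeddings coincide up to the sign of each axis; with a non-Euclidean dissimilarity only the MDS route remains available.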

Modelling nonlinear manifolds: LLE
- Locally linear embedding (LLE, Roweis and Saul 2000)
- Compute the set of coefficients that best reconstruct each data point from its neighbours
- The coefficients are arranged to be invariant to rotations, translations and scaling: they characterize the local geometrical properties of the neighbourhood
- LLE maps the high-dimensional data to a lower-dimensional subspace while preserving these coefficients
- These weights are used to reconstruct the data points in the low-dimensional space as in the high-dimensional space
- Albeit nonlinear, LLE does not exhibit local minima

Modelling nonlinear manifolds: ISOMAP
- Isometric feature mapping (ISOMAP, Tenenbaum et al. 2000)
- Goal: data projected to a lower-dimensional space using MDS, but with dissimilarities defined in terms of the geodesic distances on the manifold
- Algorithm:
  - First define the neighbourhood of each point using K nearest neighbours or an ε-radius search
  - Construct a neighbourhood graph with weights corresponding to the Euclidean distances
  - Approximate the geodesic distance by the sum of Euclidean distances along the shortest path connecting two points
  - Apply MDS to the geodesic distances

Modelling nonlinear manifolds: other techniques
- Latent traits: models having continuous latent variables together with discrete observed variables. Can be used to visualize binary vectors, analogously to PCA for continuous variables
- Density network: the nonlinear function is governed by a multilayered neural network. Flexible model but computationally intensive
- Generative topographic mapping (GTM): restricted forms for the nonlinear function, giving a model that is nonlinear and efficient to train
  - latent distribution defined by a finite regular grid over the latent space (of dimensionality 2, typically)
  - can be seen as a probabilistic version of the self-organizing map (SOM, Kohonen)