
Math 285 Project, Fall 2015

Wilson A. Florero-Salinas

Dan Li

Out of sample extensions of PCA, kernel PCA, and MDS


TABLE OF CONTENTS

1. Introduction
2.1 Principal Component Analysis (PCA)
2.2 The out of sample extension of PCA
2.3 The out of sample extension of PCA (demo)
3.1 Kernel PCA
3.2 The out of sample extension of Kernel PCA
3.3 The out of sample extension of KPCA (demo)
4.1 Multidimensional Scaling (MDS)
4.2 The out of sample extension of MDS
4.3 The out of sample extension of MDS (demo)
4.4 The out of sample extension of MDS (m > 1)
5. Conclusion
6. Appendix
7. References


1. INTRODUCTION

Classification is the problem of categorizing a new observation based on a set of observations, called the "training set", whose membership is already known. Classification problems usually involve training a model on the training set and later using it to predict or classify new observations into one of the known categories. In recent years the collection of huge amounts of data has been eased by improvements in technology, and it is now common to have observations with thousands, if not millions, of features. Even with this jump in technology, many modern computers are still unable to efficiently handle observations with a very large number of features, which in some cases makes model training infeasible. However, in many cases it is still possible to train a model with a subset of the features, or with a transformation of the feature space into a smaller space in which feature selection is possible. This leads us to the idea of dimensionality reduction.

Dimensionality reduction (DR) is the process of reducing the number of variables under consideration for the purpose of feature selection or feature extraction. To this end, dimensionality reduction allows the modeler to train models using fewer variables and, in some cases, to obtain a visualization of the data set in two or three dimensions. Three common DR techniques in the literature are Principal Component Analysis (PCA), Kernel PCA, and Classical Multidimensional Scaling (MDS). Each of these methods uses the entire data set to perform the corresponding transformation. The question is now: "If new data becomes available, how can these new observations be incorporated into the new feature space?"

In some cases redoing the DR is enough, but that is not our present concern. In other cases the data set may be so large that retraining is no longer feasible. In this context, we need a way to incorporate these new observations into the new feature space without retraining and, if possible, while recycling information already obtained from the first time we performed dimensionality reduction. This idea of bringing new observations into the new feature space is known in the literature as "out-of-sample extensions", which will be the focus of this paper. In the following we briefly review PCA, kernel PCA, and MDS before considering their corresponding out-of-sample extensions (the reader is referred to the references for additional details).

2.1 PRINCIPAL COMPONENT ANALYSIS (PCA)

Principal Component Analysis (PCA) is a linear DR feature extraction tool. PCA attempts to find a linear subspace of lower dimension than the original feature space, in which the new features have the largest variance [B2006]. One way to derive the principal components of a data set $\{x_i\} \subset \mathbb{R}^d$ is by maximizing the trace of the covariance matrix of the data points $\{y_i\} \subset \mathbb{R}^k$, given by

$$S_Y := \frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})(y_i - \bar{y})^T, \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n}y_i,$$

where we assume there is a $V^T$ such that $y_i = V^T x_i$. An equivalent way to derive the principal components is by solving the problem of finding the "best" $k$-dimensional subspace ("line") that minimizes the orthogonal distances. We take this approach here. Concretely, let $\{x_i\} \subset \mathbb{R}^d$, $i = 1, 2, \ldots, n$. The task is to solve [S2003]

$$\min_{S}\ \sum_i \| x_i - P_S(x_i) \|_2^2$$

where $P_S(\cdot)$ is the projection onto the subspace $S$. Let $m \in \mathbb{R}^d$ represent a fixed point and $B \in \mathbb{R}^{d\times k}$ an orthonormal basis of $S$. If $x = m + B\alpha$ is a parametric equation for the plane, then $P_S(x_i) = m + BB^T(x_i - m)$. It can be shown that the above minimization problem is equivalent to solving

$$\min_{B}\ \| X - XBB^T \|_F^2$$

This minimum is achieved when $XBB^T = X_k$ and $m = \frac{1}{n}\sum_{i=1}^{n}x_i$. Here $X_k$ is the best rank-$k$ approximation to $X$ under the Frobenius norm, and $B = V_{d\times k}$, whose columns are the first $k$ columns of $V$ in the SVD of $X$, i.e., $X = U\Sigma V^T$. (Throughout this paper, given a matrix $A \in \mathbb{R}^{M\times N}$, we write $A_{M\times k}$ for the submatrix consisting of its first $k$ columns; we also use this notation to indicate dimensions.) The above can then be summarized in the following theorem.

Theorem: The projection of $X$ onto the best-fit $k$-plane is

$$X_k = U_{n\times k}\Sigma_{k\times k}V_{d\times k}^T.$$

The new coordinates with respect to the basis $V_{d\times k}$, i.e., the rows of $XV_{d\times k} = U_{n\times k}\Sigma_{k\times k}$, are called the principal components.

This theorem allows us to easily find the first $k$ principal components of $X$ using the following algorithm.

Algorithm 1a: Principal Component Analysis (PCA)
Input: Data set $X = [x_1, x_2, \ldots, x_n]^T$
Output: Top $k$ principal components
1. Center the data: $x_i \leftarrow x_i - \bar{x}$ for all $i$
2. Perform the SVD $X = U\Sigma V^T$
3. Return the rows of $XV_{d\times k} = U_{n\times k}\Sigma_{k\times k}$
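In MATLAB, the three steps of Algorithm 1a reduce to a few lines. The following is a minimal sketch, assuming X is an n-by-d data matrix (rows are observations) and k is the target dimension; the full function appears in Appendix 6.1.

Xc = X - repmat(mean(X), size(X,1), 1);   % step 1: center the data
[U,S,V] = svds(Xc, k);                    % step 2: truncated SVD, Xc ~ U*S*V'
Y = U*S;                                  % step 3: rows are the top k principal components (Xc*V = U*S)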

2.2 THE OUT OF SAMPLE EXTENSION OF PCA

In the previous section we obtained the matrix $V_{d\times k}$, which maps points $x_i \in \mathbb{R}^d$ to points $y_i \in \mathbb{R}^k$ via a linear map. If the points are centered to begin with, then $P_S(x_i) = V_{d\times k}V_{d\times k}^T x_i$, or in matrix form, $P_S(X) = XV_{d\times k}V_{d\times k}^T$. To obtain the points in the subspace of dimension $k$ we simply consider $XV_{d\times k}$, where the subspace $S$ was constructed using the data set $X = [x_1, x_2, \ldots, x_n]^T$. If a new data set becomes available, how can PCA be extended to it? We illustrate this extension with Figures 1 and 2 in Section 2.3. Assume we have a data set in $\mathbb{R}^2$. According to Algorithm 1a, we center the data; using this centered data we construct the line $S$ and map the points onto it. If a new data set $Z \in \mathbb{R}^{m\times 2}$ becomes available, we can map it onto the line using the matrix $V_{d\times k}$. However, because the original data set has been centered, the data points and the line (subspace) live in a new set of axes. To bring the new data set $Z$ into the current set of axes, it must be centered in exactly the same way $X$ was centered, that is, using the training mean $\bar{x}$. Finally, we project the centered data set $Z$ via the matrix $V_{d\times k}$ [G1966]. This is summarized in Algorithm 1b and visualized in Section 2.3.

Algorithm 1b: Out of sample extension of PCA
Input: New data set $Z = [z_1, z_2, \ldots, z_m]^T$ and $V_{d\times k}$
Output: Top $k$ principal components of the new data
1. Center the data: $z_i \leftarrow z_i - \bar{x}$ for all $i$ (where $\bar{x}$ is the mean of the original training data)
2. Return the rows of $ZV_{d\times k}$

Both Algorithms 1a and 1b, implemented as a MATLAB function, can be found in Appendix 6.1.
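As a usage sketch, the function from Appendix 6.1 could be exercised as follows. The data below is hypothetical and only the call signature comes from the appendix; the function is assumed to be saved as PCA.m on the MATLAB path.

% Hypothetical usage of the PCA function from Appendix 6.1.
rng(0);                                  % for reproducibility
C    = [2 0 0; 0 1 0; 0 0 0.1];          % shape of an elongated Gaussian cloud in R^3
Xtr  = randn(100,3)*C;                   % training data (rows are observations)
Xtst = randn(20,3)*C;                    % new, out-of-sample observations
k = 2;                                   % target dimension
[XtrPCA, V, XtstPCA] = PCA(Xtr, Xtst, k);
% XtrPCA  : 100-by-2 principal components of the training data
% XtstPCA : 20-by-2 out-of-sample extension, centered with the training mean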

2.3 THE OUT OF SAMPLE EXTENSION OF PCA (DEMO)

Figure 1: (L) Original data set (M) centered data along with best-fit line (R) current projection space and new uncentered data (red points).

Figure 2: (L) New data points brought to current axis by centering (R) new points projected onto the current space.

It is worth comparing the out-of-sample extension with a model retrained on the entire data set: the two projections generally differ, since retraining changes both the centering and the fitted subspace.


3.1 KERNEL PCA

PCA is a linear method and cannot properly handle nonlinear data. If the data is nonlinear, the main idea of Kernel PCA is to use a map $\phi(\cdot)$ that takes each data vector $x_i$ to a vector $\phi(x_i)$ in a higher dimensional space (called the feature space), where PCA can be applied [W2012]. Concretely, let $n$ data points $x_i \in \mathbb{R}^d$ be given, and suppose $\phi : \mathbb{R}^d \to \mathbb{R}^D$, where $D \gg d$. Assume further that $\frac{1}{n}\sum_{i=1}^{n}\phi(x_i) = 0$, meaning that the feature vectors have zero mean. Define $\Phi := [\phi(x_1)\ \phi(x_2)\ \cdots\ \phi(x_n)]^T \in \mathbb{R}^{n\times D}$ and consider the SVD $\Phi = U_{n\times n}\Sigma_{n\times D}V_{D\times D}^T$. Then, applying PCA to $\Phi$ via Algorithm 1a, the new coordinates with respect to the basis $V_{D\times k}$ are given by the rows of $\Phi V_{D\times k} = U_{n\times k}\Sigma_{k\times k}$. Usually the $\phi(x_i)$ are unknown and it is not possible to work out the decomposition explicitly. To remedy this, define $\kappa(x_i, x_j) := \phi(x_i)^T\phi(x_j)$ and consider

$$\Phi\Phi^T = \big[\phi(x_i)^T\phi(x_j)\big] = \big[\kappa(x_i, x_j)\big] =: K.$$

The matrix $K$ is called the kernel matrix, which, under a proper mapping $\phi(\cdot)$, is positive semi-definite. If the data is not centered in the feature space, it can be shown that by considering $\tilde\phi(x_i) = \phi(x_i) - \frac{1}{n}\sum_{j=1}^{n}\phi(x_j)$ and constructing $\tilde\Phi \in \mathbb{R}^{n\times D}$, we obtain the matrix

$$\tilde{K} = K - \frac{1}{n}\mathbf{1}_{n\times n}K - \frac{1}{n}K\mathbf{1}_{n\times n} + \frac{1}{n^2}\mathbf{1}_{n\times n}K\mathbf{1}_{n\times n},$$

where $\mathbf{1}_{n\times n} \in \mathbb{R}^{n\times n}$ denotes the matrix of all ones; equations similar to those below are obtained by replacing $K$ with $\tilde{K}$ (place a tilde on all variables and the results carry over) [S1998]. Proceeding with our analysis, we have

$$K = \Phi\Phi^T = U_{n\times n}\Sigma_{n\times D}V_{D\times D}^TV_{D\times D}\Sigma_{n\times D}^TU_{n\times n}^T = U_{n\times n}\Sigma_{n\times D}\Sigma_{n\times D}^TU_{n\times n}^T, \qquad\text{so that}\qquad KU_{n\times n} = U_{n\times n}\Sigma_{n\times n}^2,$$

which is an eigenvalue problem. In other words, by solving for the eigenvalues and eigenvectors of $K$ we are able to obtain the matrices needed in the SVD of $\Phi$. Note that if we consider $\Phi^T\Phi = V_{D\times D}\Sigma^T\Sigma V_{D\times D}^T$ we obtain $\Phi^T\Phi V_{D\times k} = V_{D\times k}\Sigma_{k\times k}^2$. For the purpose of principal component extraction we choose the feature-space directions so that $v_i^Tv_i = 1$, which leads to $u_i^Tu_i = 1/\lambda_i$, where $\lambda_i = \sigma_i^2$ is the $i$-th eigenvalue of $K$ (i.e., $Ku_i = \lambda_iu_i$), for $i = 1, \ldots, k$. This amounts to scaling unit-norm eigenvectors of $K$ by $1/\sqrt{\lambda_i}$; with unit-norm $u_i$, the embedded training points are the rows of $U_{n\times k}\Sigma_{k\times k} = [\sqrt{\lambda_1}\,u_1\ \cdots\ \sqrt{\lambda_k}\,u_k]$. KPCA is summarized in the following Algorithm 2a.

Algorithm 2a: Kernel PCA
Input: Data set $X = [x_1, x_2, \ldots, x_n]^T$
Output: Top $k$ principal components
1. Construct the kernel matrix $K = [\kappa(x_i, x_j)]$
2. Center $K_{n\times n}$ via $\tilde{K} = K - \frac{1}{n}\mathbf{1}_{n\times n}K - \frac{1}{n}K\mathbf{1}_{n\times n} + \frac{1}{n^2}\mathbf{1}_{n\times n}K\mathbf{1}_{n\times n}$
3. Solve the eigenvalue problem $\tilde{K}U_{n\times n} = U_{n\times n}\Sigma_{n\times n}^2$
4. Return the rows of $U_{n\times k}\Sigma_{k\times k} = [\sqrt{\lambda_1}\,u_1\ \cdots\ \sqrt{\lambda_k}\,u_k]$, where the $u_i$ are unit-norm eigenvectors and $\lambda_i = \sigma_i^2$
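The centering in step 2 is a one-line operation in MATLAB. The following sketch assumes K is an already computed n-by-n kernel matrix; it matches the centering used in Appendix 6.2.

n  = size(K,1);
On = ones(n,n);                              % the all-ones matrix 1_{n x n}
Kc = K - On*K/n - K*On/n + On*K*On/n^2;      % centered kernel K_tilde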


3.2 THE OUT OF SAMPLE EXTENSION OF KERNEL PCA

To obtain an out-of-sample extension we proceed as before: center the new data with respect to the training set, and then apply the projection matrix. Here the transformation $V_{D\times k}$ is not explicitly known, but we can use kernels to our advantage as in the previous section. Assume that $Z = [z_1\ z_2\ \cdots\ z_m]^T$ is the new data set. Define $\Phi_Z := [\phi(z_1)\ \phi(z_2)\ \cdots\ \phi(z_m)]^T \in \mathbb{R}^{m\times D}$. This set may need centering, so define $\tilde\phi(z_i) = \phi(z_i) - \frac{1}{n}\sum_{j=1}^{n}\phi(x_j)$ to obtain the matrix $\tilde\Phi_Z := [\tilde\phi(z_1)\ \tilde\phi(z_2)\ \cdots\ \tilde\phi(z_m)]^T$. Applying $V_{D\times k}$ and using $\Phi^T\Phi V_{D\times k} = V_{D\times k}\Sigma_{k\times k}^2$ gives

$$\Phi_Z V_{D\times k} = \Phi_Z\Phi^T\Phi V_{D\times k}\Sigma_{k\times k}^{-2},$$

where we are assuming that $\Phi$ contains centered data (otherwise replace $\Phi_Z$, $\Phi$, and $K$ below by their centered counterparts $\tilde\Phi_Z$, $\tilde\Phi$, and $\tilde K$). Let $K_Z := [\phi(z_i)^T\phi(x_j)] = \Phi_Z\Phi^T \in \mathbb{R}^{m\times n}$; then

$$\Phi_Z V_{D\times k} = K_Z U_{n\times k}\Sigma_{k\times k}\Sigma_{k\times k}^{-2} = K_Z U_{n\times k}\Sigma_{k\times k}^{-1}.$$

Note that $K_Z$ should be constructed using centered data in the feature space, which is explicitly unknown; however, it is possible to write the centered kernel in terms of the original kernels as

$$\tilde{K}_Z = K_Z - \frac{1}{n}\mathbf{1}_{n\times m}^TK - \frac{1}{n}K_Z\mathbf{1}_{n\times n} + \frac{1}{n^2}\mathbf{1}_{n\times m}^TK\mathbf{1}_{n\times n},$$

where $\mathbf{1}_{n\times m}^T \in \mathbb{R}^{m\times n}$ is the transpose of the $n\times m$ all-ones matrix [S1998]. The out of sample extension of KPCA is summarized in Algorithm 2b.

Algorithm 2b: out of sample Kernel PCA
Input: New data set $Z = [z_1\ z_2\ \cdots\ z_m]^T$, together with $K$, $U_{n\times k}$, and $\Sigma_{k\times k}$ from Algorithm 2a
Output: Top $k$ principal components for the new data
1. Construct the kernel matrix $K_Z = [\kappa(z_i, x_j)]$
2. Center $K_Z$ via $\tilde{K}_Z = K_Z - \frac{1}{n}\mathbf{1}_{n\times m}^TK - \frac{1}{n}K_Z\mathbf{1}_{n\times n} + \frac{1}{n^2}\mathbf{1}_{n\times m}^TK\mathbf{1}_{n\times n}$
3. Return the rows of $\tilde{K}_ZU_{n\times k}\Sigma_{k\times k}^{-1} = \tilde{K}_Z\big[\tfrac{1}{\sqrt{\lambda_1}}u_1\ \cdots\ \tfrac{1}{\sqrt{\lambda_k}}u_k\big]$

Both Algorithms 2a and 2b, implemented as a MATLAB function, can be found in Appendix 6.2.
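As a usage sketch, the KPCA function from Appendix 6.2 could be called on two well separated clusters. The data and kernel width below are hypothetical; only the call signature comes from the appendix, and the file is assumed to be saved as KPCA.m on the path.

% Hypothetical usage of the KPCA function from Appendix 6.2.
rng(0);
Xtr  = [randn(50,2); randn(50,2) + 6];     % two training clusters
Xtst = [randn(10,2); randn(10,2) + 6];     % new, out-of-sample observations
k   = 2;                                   % number of kernel principal components
var = 2;                                   % Gaussian kernel parameter sigma^2
[XtrKPCA, XtstKPCA] = KPCA(Xtr, Xtst, k, var);
% XtrKPCA  : 100-by-2 embedding of the training data
% XtstKPCA : 20-by-2 out-of-sample extension, centered with the training kernel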


3.3 THE OUT OF SAMPLE EXTENSION OF KPCA (DEMO)

Figure 3: Left: Data set in original dimensions. Green points correspond to out-of-sample observations. Middle & Right: out-of-sample extensions. Green observations are mapped correctly to their corresponding clusters.

Figure 4: A 2D perspective of the above plot.

4.1 MULTIDIMENSIONAL SCALING (MDS)

Multidimensional scaling (MDS) visualizes a set of high dimensional points in lower dimensions (usually two or three) based on their pairwise distances [Y1985]. The problem MDS solves is to map the original data into lower dimensions while preserving pairwise distances: two points with a large distance remain far apart in the reduced dimension, and points that are close by are mapped close to each other. Mathematically: given a set of $n$ points and their pairwise distances $d_{ij}$, find $n$ points $\{y_i\} \subset \mathbb{R}^k$ such that

$$\sum_{i,j}\big(\|y_i - y_j\|_2^2 - d_{ij}^2\big)^2$$

is minimized. To solve this problem consider the proximity matrix $D = [d_{ij}^2]$. The proximity matrix is invariant to changes in location and rotation, and we can obtain a unique solution provided we assume $\sum_i y_i = 0$. If we consider the equality $\|y_i - y_j\|^2 = d_{ij}^2$, it can be shown that $\tilde{D} = YY^T$, where $Y = [y_1\ y_2\ \cdots\ y_n]^T \in \mathbb{R}^{n\times k}$ and $\tilde{D}$ is the centered proximity matrix. (Some properties of $\tilde{D}$: (1) $\tilde{D} = \tilde{D}^T$, (2) $\tilde{D} = [y_i^Ty_j]$, and (3) $\tilde{D}\mathbf{1} = 0$ and $\mathbf{1}^T\tilde{D} = 0$.) To explicitly find the $y_i$ we use the fact that $\tilde{D}$ is unitarily diagonalizable and obtain $Y = U_{n\times k}\Lambda_{k\times k}^{1/2}$, where $\tilde{D} = U\Lambda U^T$. This approach has been seen before: from data points $\{x_i\}_{i=1}^n$, create a "neighborhood" or similarity matrix $D$, center this matrix if needed, and then solve an eigenvalue problem [B2003; pg 1]. Concretely, if we let $S_i = \sum_j D_{ij}$ be the $i$th row sum of $D$, the centering is done via

$$\tilde{D}_{ij} = -\frac{1}{2}\Big(D_{ij} - \frac{1}{n}S_i - \frac{1}{n}S_j + \frac{1}{n^2}\sum_k S_k\Big)$$

and the embedding corresponds to the rows of $Y = [\sqrt{\lambda_1}\,u_1\ \cdots\ \sqrt{\lambda_k}\,u_k]$ [B2003; pg 2]. This is summarized in Algorithm 3a:

Algorithm 3a: MDS
Input: Data set $X = [x_1, x_2, \ldots, x_n]^T$ (or skip to step 2 if only pairwise distances are given)
Output: Embedding $Y = [y_1\ y_2\ \cdots\ y_n]^T \in \mathbb{R}^{n\times k}$
1. Construct the pairwise squared distance matrix $D = [d_{ij}^2]$
2. Center $D_{n\times n}$ via $\tilde{D}_{ij} = -\frac{1}{2}\big(D_{ij} - \frac{1}{n}S_i - \frac{1}{n}S_j + \frac{1}{n^2}\sum_k S_k\big)$
3. Diagonalize $\tilde{D} = U\Lambda U^T$
4. Return the rows of $U_{n\times k}\Lambda_{k\times k}^{1/2} = [\sqrt{\lambda_1}\,u_1\ \cdots\ \sqrt{\lambda_k}\,u_k]$
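A minimal MATLAB sketch of Algorithm 3a follows. Here Dist, an n-by-n symmetric matrix of pairwise distances, and the target dimension k are assumed inputs; the full function, including the out-of-sample step, is given in Appendix 6.3.

% Minimal sketch of Algorithm 3a (classical MDS), consistent with Appendix 6.3.
n  = size(Dist,1);
D  = Dist.^2;                                                  % squared distances, D = [d_ij^2]
S  = sum(D,2);                                                 % row sums S_i
Dt = -0.5*(D - S*ones(1,n)/n - ones(n,1)*S'/n + sum(S)/n^2);   % step 2: double centering
[U,L] = eig((Dt+Dt')/2,'vector');                              % step 3: diagonalize (symmetrized for safety)
[L,idx] = sort(L,'descend');                                   % keep the k largest eigenvalues
U  = U(:,idx);
Y  = U(:,1:k)*diag(sqrt(max(L(1:k),0)));                       % step 4: embedding = U_k * Lambda_k^(1/2)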

4.2 THE OUT OF SAMPLE EXTENSION OF MDS

One fact to consider in the out-of-sample extension of MDS is that simultaneously embedding $m$ new objects is not equivalent to individually embedding the same objects one at a time, as individual embeddings do not attempt to approximate dissimilarities between pairs of new objects [TP2008; pg 3]. For simplicity let $m = 1$; the extension to $m > 1$ is discussed in Section 4.4. Suppose that $d \in \mathbb{R}^n$ denotes the squared dissimilarities of the new object $x$ from the original $n$ objects, i.e., $d = [d_1^2\ d_2^2\ \cdots\ d_n^2]^T$ with $d_i = \|x - x_i\|$. Construct the new matrix

$$A = \begin{bmatrix} D & d \\ d^T & 0 \end{bmatrix} \in \mathbb{R}^{(n+1)\times(n+1)}.$$

Before we formalize the out-of-sample extension, let us introduce some notation and a few definitions.

Definition: Given $w \in \mathbb{R}^m$, we say that $y_1, \ldots, y_m \in \mathbb{R}^k$ are $w$-centered if and only if $\sum_{j=1}^m w_jy_j = 0$.

Definition: For $w \in \mathbb{R}^m$ with nonnegative entries such that $\mathbf{1}_m^Tw = 1$, define

$$\tau_w(C) = -\frac{1}{2}\big(I - \mathbf{1}_mw^T\big)\,C\,\big(I - w\mathbf{1}_m^T\big).$$

Notice that $\tau_{\frac{1}{n}\mathbf{1}_n}(D)$ gives the same matrix that was used in the centering of $D$ in Algorithm 3a. Moreover, under appropriate choices of $w$, the matrix $\tau_w(C)$ has special properties that allow it to be factored for the purpose of MDS [TP2008], exactly as we did with the matrix $\tilde{D}$. The usefulness of $\tau_w(\cdot)$ will be shown in the discussion that follows, but before we start our analysis let us revisit the question of why applying MDS to $A$ does not solve the out-of-sample extension problem. The reason is that applying MDS to $D$ entails approximating the inner products $\tau_{\frac{1}{n}\mathbf{1}_n}(D)$, which are computed with respect to the centroid of the original $n$ points. On the other hand, if we let $w = \frac{1}{n+1}[\mathbf{1}_n^T\ 1]^T \in \mathbb{R}^{n+1}$, applying MDS to $A$ entails approximating the inner products $\tau_{\frac{1}{n+1}\mathbf{1}_{n+1}}(A)$, computed with respect to the centroid of all $n+1$ points [TP2008; pg 5]. Thus, to solve the out-of-sample problem we must preserve the original centering. To do this, let $w = \frac{1}{n}[\mathbf{1}_n^T\ 0]^T$ and construct

$$\tau_w(A) =: B = \begin{bmatrix} \tau_{\frac{1}{n}\mathbf{1}_n}(D) & b \\ b^T & \beta \end{bmatrix},$$

where $b \in \mathbb{R}^n$ and the scalar $\beta$ denote the last column (excluding its final entry) and the bottom-right entry of $B$, respectively. The goal is to find a $y_x \in \mathbb{R}^k$ that corresponds to the out-of-sample extension. Let $Y = [y_1\ y_2\ \cdots\ y_n]^T \in \mathbb{R}^{n\times k}$ represent the embedding of the original $n$ points and construct the new matrix $Y_* = [Y^T\ y_x]^T \in \mathbb{R}^{(n+1)\times k}$. Then the problem reduces to approximating $B$ by $Y_*Y_*^T$, i.e., minimizing $\|B - Y_*Y_*^T\|_F^2$ over $y_x$, which (up to a term that does not depend on $y_x$) is

$$\min_{y_x\in\mathbb{R}^k}\ 2\sum_{i=1}^n\big(b_i - y_i^Ty_x\big)^2 + \big(\beta - y_x^Ty_x\big)^2.$$

If the term $\big(\beta - y_x^Ty_x\big)^2$ is dropped, the objective becomes convex with solution $Y^TY\,y_x = Y^Tb$, where $y_x$ represents an approximation of the out-of-sample extension corresponding to the new (and unknown) data point $x \in \mathbb{R}^d$; to obtain the optimal solution, the full nonlinear optimization problem must be solved numerically. If $Y$ has full rank, then $y_x = (Y^TY)^{-1}Y^Tb$. There is a strong relationship between PCA and CMDS [G1966] (both attempt to find the most accurate representation of the data in a lower dimensional space: MDS preserves the "similarities" of the original data set, while PCA reduces the dimension by preserving most of the covariance of the data), from which it can be shown that $y_x$ is a solution to $X_k^TX_k\,y_x = X_k^Tb$ and that $b = \big[(x_i - \bar{x})^T(x - \bar{x})\big]_{i=1}^n$, so that $y_x = \Lambda_{k\times k}^{-1/2}U_{n\times k}^Tb$ [T2010, pg 3-4]. In other words, to obtain an out-of-sample extension for CMDS we would first have to compute PCA on the original data set $X$, which is unknown by assumption, and thus this is not very useful. To overcome this difficulty, we can explicitly compute $b$ and write it in terms of known values. By definition,

$$B = -\frac{1}{2}\big(I - \mathbf{1}_{n+1}w^T\big)A\big(I - w\mathbf{1}_{n+1}^T\big), \qquad w = \frac{1}{n}\begin{bmatrix}\mathbf{1}_n\\ 0\end{bmatrix}, \qquad A = \begin{bmatrix} D & d\\ d^T & 0\end{bmatrix}.$$

Carrying out the block multiplication, the first $n$ entries of the last column of $A\big(I - w\mathbf{1}_{n+1}^T\big)$ are $d - \frac{1}{n}D\mathbf{1}_n$, and left-multiplying by $\big(I - \mathbf{1}_{n+1}w^T\big)$ subtracts the average of these entries from each of them. Since we are only interested in $b$ (the last column, or equivalently the last row, of $B$ without its final entry), we obtain

$$b = -\frac{1}{2}\Big(d - \frac{1}{n}\big(\mathbf{1}_n^Td\big)\mathbf{1}_n - \frac{1}{n}D\mathbf{1}_n + \frac{1}{n^2}\big(\mathbf{1}_n^TD\mathbf{1}_n\big)\mathbf{1}_n\Big).$$
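In MATLAB this vector is formed directly from the known quantities. The sketch below assumes D = [d_ij^2] is the stored n-by-n squared-dissimilarity matrix and d is the n-by-1 vector of squared dissimilarities to the new object; it is consistent with the implementation in Appendix 6.3.

n = length(d);                                              % number of original objects
b = -0.5*(d - mean(d) - D*ones(n,1)/n + sum(D(:))/n^2);     % matches the formula above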

The out-of-sample extension for the case $m = 1$ is summarized in Algorithm 3b, and its implementation for the approximate case is found in Appendix 6.3.

Algorithm 3b: out of sample MDS (m = 1)
Input: $D$, together with $U_{n\times k}$ and $\Lambda_{k\times k}^{1/2}$ from MDS, and $d$, the squared dissimilarities between the new object and the original $n$ objects
Output: out of sample extension $y_x$
1. Construct the vector $b = -\frac{1}{2}\big(d - \frac{1}{n}(\mathbf{1}_n^Td)\mathbf{1}_n - \frac{1}{n}D\mathbf{1}_n + \frac{1}{n^2}(\mathbf{1}_n^TD\mathbf{1}_n)\mathbf{1}_n\big)$
2. For the approximate out-of-sample extension, return the (column) vector $y_x = \Lambda_{k\times k}^{-1/2}U_{n\times k}^Tb$
3. For the optimal out-of-sample extension, construct $Y_* = [Y^T\ y_x]^T$ and $B = \begin{bmatrix}\tilde{D} & b\\ b^T & \beta\end{bmatrix}$, where $\beta = \frac{1}{n}\sum_{i=1}^n d_i^2 - \frac{1}{2n^2}\sum_{i,j=1}^n d_{ij}^2$, and return the $y_x$ that minimizes $\|B - Y_*Y_*^T\|_F^2$

Notice that the objective function for the optimal solution is a fourth degree polynomial in $y_x$, which can be minimized using gradient-based numerical methods [TP2008; pg 10].
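As a usage sketch of the function in Appendix 6.3, one object is held out and then placed by its out-of-sample extension. The planar "cities" below are synthetic stand-ins for the demo data in Section 4.3; only the call signature comes from the appendix, and the file is assumed to be saved as mds.m on the path.

% Hypothetical usage of the mds function from Appendix 6.3.
rng(1);
P    = randn(12,2)*[3 0; 1 2];          % 12 synthetic planar "cities"
Dist = squareform(pdist(P));            % 12-by-12 pairwise distance matrix
keep = 1:11;                            % objects used to build the embedding
X    = Dist(keep,keep);                 % distances among the retained objects
d    = Dist(12,keep);                   % distances from the held-out object (row vector)
[Y, y_x, stress] = mds(X, d, 2);        % Y: 11-by-2 embedding, y_x: out-of-sample point
% y_x can be compared with the coordinates obtained by rerunning MDS on all 12 objects.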

4.3 THE OUT OF SAMPLE EXTENSION OF MDS (DEMO)

Figure 5: Map of 12 Chinese cities based on their pairwise distances. Top left: MDS applied to the entire data set except Lhasa. Bottom left: out-of-sample extension for Lhasa. Top right and bottom right: the same analysis and comparison done for Hohhot.

Figure 6: MDS on the seed data set (available at http://archive.ics.uci.edu/ml/datasets/seeds). Left: MDS applied to the entire data set; red crosses correspond to points that will be traced in the out-of-sample extension. Right: MDS on a subset of the data set; green points correspond to the out-of-sample extensions.

4.4 THE OUT OF SAMPLE EXTENSION OF MDS (m > 1)

In Section 4.2 we provided the out-of-sample algorithm for MDS for the case $m = 1$. For the case $m > 1$, analogously define $A \in \mathbb{R}^{(n+m)\times(n+m)}$ to be the matrix of squared dissimilarities between all $n + m$ objects. Applying MDS to all $n + m$ objects at once would amount to factoring the matrix $\tau_{\frac{1}{n+m}\mathbf{1}_{n+m}}(A)$, which yields a joint MDS embedding of all $n + m$ objects. As before, let $Y = [y_1\ y_2\ \cdots\ y_n]^T \in \mathbb{R}^{n\times k}$ represent the embedding of the original $n$ points, and let $Z = [z_1\ z_2\ \cdots\ z_m]^T \in \mathbb{R}^{m\times k}$ be the matrix containing the $k$-dimensional embedding of the $m$ new objects. To find the out-of-sample extension for these new $m$ objects, let $w = \frac{1}{n}[\mathbf{1}_n^T\ 0\ \cdots\ 0]^T \in \mathbb{R}^{n+m}$, construct the matrix [TP2008; pg 2]

$$\tau_w(A) =: B = \begin{bmatrix} \tau_{\frac{1}{n}\mathbf{1}_n}(D) & B_{YZ} \\ B_{YZ}^T & B_{ZZ} \end{bmatrix},$$

and obtain the optimal $Z$ by solving the optimization problem

$$\min_{Z\in\mathbb{R}^{m\times k}}\ 2\,\|B_{YZ} - YZ^T\|_F^2 + \|B_{ZZ} - ZZ^T\|_F^2.$$
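This objective can be minimized with a simple gradient method. The following is a minimal sketch under stated assumptions, not the implementation used for the demos: Byz (n-by-m), Bzz (m-by-m, symmetric), and Y (n-by-k) are assumed to have been computed as above, Z is initialized from the convex cross-block term, and the step size and iteration count are arbitrary placeholders.

% Minimal gradient-descent sketch for the m > 1 out-of-sample MDS problem.
Z = Byz'*Y*pinv(Y'*Y);              % initialize from min_Z ||Byz - Y*Z'||_F^2
step = 1e-3;                        % fixed step size (would need tuning in practice)
for iter = 1:5000
    R = Byz - Y*Z';                 % residual of the training/new cross block
    S = Bzz - Z*Z';                 % residual of the new/new block
    G = -4*(R'*Y + S*Z);            % gradient of f(Z) = 2*||R||_F^2 + ||S||_F^2
    Z = Z - step*G;                 % gradient step
end
% The rows of Z are the k-dimensional out-of-sample coordinates of the m new objects.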

5. CONCLUSION

In this paper we discussed the out-of-sample extensions of PCA, Kernel PCA, and MDS. All three extensions share a "centering" step, which is straightforward in the PCA case and which must be applied to the kernel matrix or to the dissimilarity matrix in the Kernel PCA and MDS cases, respectively. To find the out-of-sample extension for PCA, centering the new data and mapping it onto the best-fit subspace is enough. As for Kernel PCA, an additional kernel matrix between the training and new data needs to be created, and centering is again required before applying the previously built matrices. For the third method, MDS, the construction of matrices involving the training and new data sets, together with centering, is also required; unlike the previous two methods, an exact analytic solution cannot be found, but rather an approximate solution (based on PCA) or an optimal solution (by solving an optimization problem) can be computed. We have also provided examples showing that out-of-sample extensions are not equivalent to an embedding or transformation of the entire data set, especially for methods that are sensitive to outliers.


6. APPENDIX

Appendix 6.1

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%% Principal Component Analysis (PCA)

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% Xtr = training data set. Each row is an observation.

% Xtst = new observations (out of sample)

% k = # of components to keep.

function [XtrPCA,V,XtstPCA] = PCA(Xtr,Xtst,k)

meanXtr = mean(Xtr);

% center the data

X_tilde = Xtr - repmat(meanXtr, size(Xtr,1), 1);

[U,S,V] = svds(X_tilde,k);

XtrPCA = U*S;

V = V(:,1:k); % projection matrix

% out of sample extension

XtstPCA = (Xtst-repmat(meanXtr, size(Xtst,1), 1))*V;


Appendix 6.2

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% Kernel PCA
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Xtr = data in which each row is an observation.
% Xtst = new observations (out of sample)
% k = reduced dimension; var = sigma^2
% Here we use the Gaussian kernel. The code can easily be modified for
% other kernels.

function [XtrKPCA,XtstKPCA] = KPCA(Xtr,Xtst,k,var)

% Calculate the pairwise distance matrix
Dtr = pdist2(Xtr,Xtr);
% Construct the kernel matrix using the Gaussian kernel
K = Kn(Dtr, var);
% Centering
n = size(Xtr,1);
Kc = K - K*ones(n,n)/n - ones(n,n)*K/n + ones(n,n)*K*ones(n,n)/(n^2);

% Obtain the eigenvectors of K_tilde that correspond to the largest eigenvalues.
% Scaled by the square roots of the eigenvalues, they give the projections of
% the training data onto the respective principal components.
[U,S] = eig(Kc,'vector');
[~,indx] = sort(S,'descend');
S = S(indx);
S = diag(S);
U = U(:,indx);
Sk = abs(S(1:k,1:k));
XtrKPCA = U(:,1:k)*sqrt(Sk);

% Out of sample extension of KPCA
Dtst = pdist2(Xtst,Xtr);
Kz = Kn(Dtst, var);
% Centering with respect to the training kernel
m = size(Xtst,1);
Kzc = Kz - ones(m,n)*K/n - Kz*ones(n,n)/n + ones(m,n)*K*ones(n,n)/(n^2);
% XtstKPCA = Kzc*U(:,1:k)*inv(sqrt(Sk));
XtstKPCA = bsxfun(@rdivide,Kzc*XtrKPCA,diag(Sk)');
end

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Gram matrix function
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function K = Kn(D, var)
% k(xi,xj) = exp(-0.5*||xi-xj||^2/var)
K = exp(bsxfun(@rdivide, -0.5*D.^2, var));
end


Appendix 6.3

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% Math 285 Project Function: MDS
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% INPUT:  X = distance matrix, assumed to be symmetric.
%         d = distances of the new point to the n original points (row vector).
%         k = target dimension.
% OUTPUT: Y = new (lower) coordinates in dimension k
%         y_x = out-of-sample extension of the new point (row vector)
%         stress = a measure of the embedding error

function [Y, y_x, stress] = mds(X,d,k)

n = size(X,1);
D = X.^2;                         % squared distances
meanD = mean(D,1);                % row/column means (X is symmetric)
mmeanD = mean(meanD);             % overall mean of D
% Double centering: D_tilde = -0.5*(D - S_i/n - S_j/n + sum_k(S_k)/n^2)
D_tilde = 0.5*(repmat(meanD', 1, n) + repmat(meanD, n, 1) - D - mmeanD);

% Construct the embedding Y
[U,S] = eig(D_tilde,'vector');
[~,indx] = sort(S,'descend');
S = S(indx);
S = abs(diag(S));
U = U(:,indx);
Y = U(:,1:k)*sqrt(S(1:k,1:k));
% Compute the stress
stress = sqrt(2*sum((squareform(X) - pdist(Y)).^2)/mmeanD)/n;

% Out-of-sample extension
d = d'.^2;                        % squared dissimilarities of the new point
% b = d - ones(n,n)*d/n - D*ones(n,1)/n + ones(n,n)*D*ones(n,1)/n^2;
b = -0.5*(d - mean(d) - mean(D,2) + mmeanD);
y_x = b'*Y/S(1:k,1:k);            % equals (S_k^(-1/2)*U_k'*b)', returned as a row vector


7. REFERENCES

[A2003] Anderson, M. J. and Robinson, J. Generalized discriminant analysis based on distances. Australian & New Zealand Journal of Statistics, 45:301-318, 2003.

[B1997] Borg, I. and Groenen, P. Modern Multidimensional Scaling: Theory and Applications. Springer, New York, 1997.

[B2003] Bengio, Y., Paiement, J.-F., and Vincent, P. Out-of-sample extensions for LLE, Isomap, MDS, eigenmaps, and spectral clustering. Technical Report 1238, Département d'Informatique et Recherche Opérationnelle, Université de Montréal, Montréal, Québec, Canada, July 2003.

[B2006] Bishop, C. M. Pattern Recognition and Machine Learning. Springer, 2006.

[G1966] Gower, J. C. Some distance properties of latent root and vector methods in multivariate analysis. Biometrika, 53:325-338, 1966.

[S1998] Schölkopf, B., Smola, A., and Müller, K.-R. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 1998.

[S1999] Schölkopf, B., Smola, A., and Müller, K.-R. Kernel principal component analysis. In B. Schölkopf, C. J. C. Burges, and A. J. Smola, editors, Advances in Kernel Methods - Support Vector Learning, pages 327-352. MIT Press, Cambridge, MA, 1999.

[S2003] Shlens, J. A Tutorial on Principal Component Analysis: Derivation, Discussion, and Singular Value Decomposition. 2003.

[TP2008] Trosset, M. W. and Priebe, C. E. The out-of-sample problem for classical multidimensional scaling. Computational Statistics and Data Analysis, 52:4635-4642, June 2008.

[T2010] Trosset, M. W. and Tang, M. The out-of-sample problem for classical multidimensional scaling: Addendum. November 2010.

[W2012] Wang, Q. Kernel Principal Component Analysis and its Applications in Face Recognition and Active Shape Models. CoRR, 2012.

[Y1985] Young, F. W. (University of North Carolina). In Kotz-Johnson (Ed.), Encyclopedia of Statistical Sciences, Volume 5. John Wiley & Sons, 1985.