Multidimensional scaling (MDS)
people.math.umass.edu/~anna/stat697F/Chapter10_part3.pdf


Multidimensional scaling (MDS)

Just like SOM and principal curves or surfaces, MDS aims to map data points in $\mathbb{R}^p$ to a lower-dimensional coordinate system. However, MDS approaches the problem somewhat differently.

- Let $x_1, \ldots, x_N \in \mathbb{R}^p$ be observations and $d_{ij}$ be the distance between observations $i$ and $j$. MDS seeks values $z_1, z_2, \ldots, z_N \in \mathbb{R}^k$ to minimize the stress function:

$$S_M(z_1, z_2, \ldots, z_N) = \sum_{i \neq i'} \left( d_{ii'} - \|z_i - z_{i'}\| \right)^2$$

This is known as least squares or Kruskal-Shepard scaling.

- Sammon mapping:

$$S_{Sm}(z_1, z_2, \ldots, z_N) = \sum_{i \neq i'} \frac{\left( d_{ii'} - \|z_i - z_{i'}\| \right)^2}{d_{ii'}}$$

where more emphasis is put on preserving smaller pairwise distances.
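Neither stress criterion has a closed-form minimizer, so both are minimized numerically. The sketch below is this write-up's own illustration, not part of the slides: plain NumPy gradient descent, with a `sammon` flag switching between the least-squares and Sammon weightings. The function name, learning rate, and iteration count are arbitrary choices.

```python
import numpy as np

def mds_stress(D, k=2, sammon=False, n_iter=500, lr=0.005, seed=0):
    """Minimize Kruskal-Shepard (or Sammon) stress by gradient descent.

    D is an N x N matrix of pairwise dissimilarities d_ii'; the result
    is an N x k configuration z_1, ..., z_N.
    """
    N = D.shape[0]
    Z = np.random.default_rng(seed).standard_normal((N, k))
    # weights: 1 for least-squares stress, 1/d_ii' for the Sammon mapping
    W = 1.0 / np.where(D > 0, D, 1.0) if sammon else np.ones_like(D)
    np.fill_diagonal(W, 0.0)
    for _ in range(n_iter):
        diff = Z[:, None, :] - Z[None, :, :]      # z_i - z_i', all pairs
        dist = np.linalg.norm(diff, axis=-1)
        np.fill_diagonal(dist, 1.0)               # avoid division by zero
        # gradient (up to a constant factor) of
        # sum_{i != i'} w_ii' (d_ii' - ||z_i - z_i'||)^2 with respect to z_i
        coef = -2.0 * W * (D - dist) / dist
        Z -= lr * (coef[:, :, None] * diff).sum(axis=1)
    return Z
```

Gradient descent on the stress can stall in local minima, so in practice one restarts from several random configurations and keeps the lowest-stress solution.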


- Classical scaling:

$$S_C(z_1, z_2, \ldots, z_N) = \sum_{i, i'} \left( s_{ii'} - \langle z_i - \bar{z}, z_{i'} - \bar{z} \rangle \right)^2$$

where $s_{ii'}$ is the similarity between $x_i$ and $x_{i'}$, usually defined as the centered inner product $s_{ii'} = \langle x_i - \bar{x}, x_{i'} - \bar{x} \rangle$.
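Unlike the least-squares criterion, classical scaling has a closed-form solution (Torgerson's algorithm): double-center the squared dissimilarities to recover the centered inner products, then eigendecompose. A minimal NumPy sketch; the function name and defaults are this write-up's choices:

```python
import numpy as np

def classical_scaling(D, k=2):
    """Classical scaling of an N x N dissimilarity matrix D into R^k."""
    N = D.shape[0]
    J = np.eye(N) - np.ones((N, N)) / N   # centering matrix
    B = -0.5 * J @ (D ** 2) @ J           # centered inner products s_ii'
    eigvals, eigvecs = np.linalg.eigh(B)  # ascending eigenvalues
    idx = np.argsort(eigvals)[::-1][:k]   # keep the top-k eigenpairs
    scale = np.sqrt(np.maximum(eigvals[idx], 0.0))
    return eigvecs[:, idx] * scale        # N x k configuration
```

When D contains Euclidean distances, B is exactly the centered Gram matrix, and the solution coincides (up to sign) with the principal component scores of the data.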

- Shepard-Kruskal nonmetric scaling seeks to minimize

$$S_{NM}(z_1, z_2, \ldots, z_N, \theta) = \frac{\sum_{i \neq i'} \left[ \|z_i - z_{i'}\| - \theta(d_{ii'}) \right]^2}{\sum_{i \neq i'} \|z_i - z_{i'}\|^2}$$

over the $z_i$ and an arbitrary increasing function $\theta$.
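The nonmetric criterion is typically minimized by alternating two steps: with $\theta$ fixed, improve the $z_i$ (e.g. by gradient descent); with the $z_i$ fixed, fit the best increasing $\theta$ by isotonic regression. The sketch below shows only the $\theta$ step, using a hand-rolled pool-adjacent-violators (PAVA) routine; all names here are this write-up's own:

```python
import numpy as np

def pava(y):
    """Pool adjacent violators: least-squares increasing fit to sequence y."""
    blocks = []  # list of [mean, count] blocks
    for v in np.asarray(y, dtype=float):
        blocks.append([v, 1])
        # merge backwards while monotonicity is violated
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            v2, n2 = blocks.pop()
            v1, n1 = blocks[-1]
            blocks[-1] = [(v1 * n1 + v2 * n2) / (n1 + n2), n1 + n2]
    return np.concatenate([[v] * n for v, n in blocks])

def nonmetric_theta(D, Z):
    """Best increasing fit theta(d_ii') to the current embedded distances."""
    iu = np.triu_indices(D.shape[0], k=1)      # each pair i < i' once
    zdist = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)[iu]
    order = np.argsort(D[iu])                  # only the ranks of d are used
    theta = np.empty_like(zdist)
    theta[order] = pava(zdist[order])
    return theta
```

Note that the $d_{ii'}$ enter this step only through their ordering, which is exactly why nonmetric scaling depends on the dissimilarities through their ranks alone.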


- Classical scaling with the centered inner product is equivalent to principal components. It is not equivalent to least squares scaling, in which the mapping can be nonlinear.

- Nonmetric scaling effectively uses only the ranks of the distances, rather than the actual dissimilarities or similarities.

- MDS tries to preserve all pairwise distances, while principal surfaces and SOMs do not.

- MDS requires only the dissimilarities $d_{ij}$, in contrast to the SOM and principal curves and surfaces, which need the data points $x_i$.


Finding latent variables of multivariate data

Multivariate data are often viewed as multiple indirect measurements arising from an underlying source, which typically cannot be directly measured. Examples include the following:

- Educational and psychological tests use the answers to questionnaires to measure the underlying intelligence and other mental abilities of subjects.

- EEG brain scans measure the neuronal activity in various parts of the brain indirectly via electromagnetic signals recorded at sensors placed at various positions on the head.

- The trading prices of stocks change constantly over time, and reflect various unmeasured factors such as market confidence, external influences, and other driving forces that may be hard to identify or measure.


PCA has a latent variable representation

- The correlated $X_j$ are each represented as a linear expansion in the uncorrelated, unit-variance variables $S_l$.

- The problem with PCA latent variables is that they are not unique: any orthogonal transformation of $S_1, \ldots, S_p$ is also uncorrelated with unit variance and satisfies the PCA expansion.
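This non-uniqueness is easy to check numerically: rotating the latent variables by any orthogonal matrix leaves both their uncorrelated, unit-variance structure and the reconstruction of $X$ intact. A small NumPy verification (dimensions, seed, and the random orthogonal matrix are arbitrary choices of this sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 1000, 3
S = rng.standard_normal((N, p))    # uncorrelated, unit-variance latents
A = rng.standard_normal((p, p))    # arbitrary loadings
X = S @ A.T                        # rows follow the expansion X = A S

# any orthogonal Q yields an equally valid set of latent variables
Q, _ = np.linalg.qr(rng.standard_normal((p, p)))
S_rot = S @ Q.T                    # rotated latents
A_rot = A @ Q.T                    # compensating rotated loadings

assert np.allclose(X, S_rot @ A_rot.T)   # identical expansion of X
# sample covariance of the rotated latents is still close to the identity
print(np.round(np.cov(S_rot, rowvar=False), 2))
```

The same argument applies verbatim to the factor model below, which is what opens the door to factor rotation.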


Factor analysis

The idea is that the latent variables $S_l$ are common sources of variation amongst the $X_j$ and account for their correlation structure, while the uncorrelated $\varepsilon_j$ are unique to each $X_j$ and pick up the remaining unaccounted variation.


- Factor analysis faces the same problem as PCA: any orthogonal transformation of $S_1, \ldots, S_p$ is also uncorrelated with unit variance and satisfies the factorization equation.

- This leaves a certain subjectivity in the use of factor analysis, since the user can search for rotated versions of the factors that are more easily interpretable. This aspect has left many analysts skeptical of factor analysis and may account for its lack of popularity in contemporary statistics.


Differences between PCA and factor analysis

Because of the separate disturbances $\varepsilon_j$ for each $X_j$, factor analysis can be seen to be modeling the correlation structure of the $X_j$ rather than the covariance structure, as PCA does.

Example (Exercise 14.15): Generate 200 observations of the three variates $X_1, X_2, X_3$ according to

$$X_1 = Z_1, \qquad X_2 = X_1 + 0.001 Z_2, \qquad X_3 = 10 Z_3$$

where $Z_1, Z_2, Z_3$ are independent standard normal variates. It turns out the leading principal component aligns itself with the maximal-variance direction $X_3$, while the leading factor essentially ignores the uncorrelated component $X_3$ and picks up the correlated component $X_2 + X_1$.
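This simulation takes only a few lines of NumPy. As a deliberate simplification (this sketch's assumption, not the slides' method), the "leading factor" is proxied by the leading eigenvector of the correlation matrix, which is enough to exhibit the correlation-versus-covariance contrast:

```python
import numpy as np

# Exercise 14.15: two highly correlated variates plus one high-variance one
rng = np.random.default_rng(0)
Z = rng.standard_normal((200, 3))
X = np.column_stack([Z[:, 0],                    # X1 = Z1
                     Z[:, 0] + 0.001 * Z[:, 1],  # X2 = X1 + 0.001 Z2
                     10.0 * Z[:, 2]])            # X3 = 10 Z3

# PCA works on the covariance: its leading eigenvector chases X3's variance
_, cov_vecs = np.linalg.eigh(np.cov(X, rowvar=False))
pc1 = cov_vecs[:, -1]                            # largest-eigenvalue vector

# the correlation matrix ignores scale: its leading eigenvector loads on
# the correlated pair (X1, X2) and essentially ignores X3
_, corr_vecs = np.linalg.eigh(np.corrcoef(X, rowvar=False))
f1 = corr_vecs[:, -1]

print(np.abs(pc1))   # dominated by the X3 coordinate
print(np.abs(f1))    # dominated by the X1, X2 coordinates
```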


Independent component analysis (ICA)

The ICA model has exactly the same form as the PCA expansion, $X = AS$, except that the $S_l$ are assumed to be statistically independent rather than uncorrelated.

- Since the multivariate Gaussian distribution is determined by its covariance matrix, any Gaussian independent components can be determined only up to a rotation. ICA therefore seeks $S_l$ that are independent and non-Gaussian.

- ICA looks for a sequence of orthogonal projections such that the projected data look as far from Gaussian as possible.


Finding ICA

ICA finds an orthogonal matrix $A$ such that the components of $A^T X$ are as independent as possible. Let $Y = A^T X$ and let $I(Y)$ be the Kullback-Leibler distance between the density $g(y)$ of $Y$ and its independence version $\prod_{j=1}^p g_j(y_j)$, where $g_j(y_j)$ is the marginal density of $Y_j$:

$$I(Y) = \sum_{j=1}^p H(Y_j) - H(Y)$$

where

$$H(Y) = -\int g(y) \log g(y) \, dy$$

is the entropy of the random variable $Y$ with density $g(y)$.


It turns out that, since $H(A^T X) = H(X)$ for orthogonal $A$,

$$I(Y) = \sum_{j=1}^p H(Y_j) - H(X)$$

- Finding $A$ is equivalent to minimizing the sum of the entropies of the separate components of $Y$.

- A well-known result in information theory says that among all random variables with equal variance, Gaussian variables have the maximum entropy.

- Therefore, finding $A$ is equivalent to maximizing the departure of the components of $A^T X$ from Gaussianity, separately for each component.
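This conclusion can be illustrated on a two-dimensional toy problem. After whitening, the unmixing matrix is orthogonal, i.e. a rotation in 2-D, so one can simply grid-search the angle that maximizes the departure from Gaussianity, here measured by absolute excess kurtosis as a crude surrogate for the entropy criterion (a sketch of the idea, not the FastICA algorithm):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 2000
# two independent, decidedly non-Gaussian (uniform) unit-variance sources
S = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(N, 2))
X = S @ np.array([[1.0, 0.5], [0.5, 1.0]]).T     # observed mixtures

# whiten: after this step the remaining unmixing matrix is orthogonal
Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
Xw = Xc @ eigvecs @ np.diag(eigvals ** -0.5)

def kurt(y):
    """Excess kurtosis: 0 for Gaussian, -1.2 for a uniform variable."""
    return np.mean(y ** 4) / np.mean(y ** 2) ** 2 - 3.0

def rot(theta):
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

# grid-search the rotation making both projections maximally non-Gaussian
angles = np.linspace(0.0, np.pi / 2, 500)
best = max(angles, key=lambda t: sum(abs(kurt(y)) for y in (Xw @ rot(t)).T))
Y = Xw @ rot(best)    # recovered components: close to the uniform sources
```

At the selected angle both components show kurtosis near the uniform value of $-1.2$, far from the Gaussian value of $0$; rotations away from it blend the sources and pull the kurtosis toward $0$.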


Subjects wear a cap embedded with a lattice of 100 EEG electrodes, which record brain activity at different locations on the scalp. Figure 14.41 (top panel) shows 15 seconds of output from a subset of nine of these electrodes from a subject performing a standard "two-back" learning task over a 30-minute period. The subject is presented with a letter (B, H, J, C, F, or K) at roughly 1500-ms intervals, and responds by pressing one of two buttons to indicate whether the letter presented is the same as or different from that presented two steps back. Depending on the answer, the subject earns or loses points, and occasionally earns bonus or loses penalty points. The time-course data show spatial correlation in the EEG signals: the signals of nearby sensors look very similar.