Introduction

• Given a matrix of distances D (square and symmetric, with zeros on the main diagonal), find variables that could approximately generate these distances.

• The matrix can also be a similarity matrix: square and symmetric, but with ones on the main diagonal and values between zero and one elsewhere.

• Broadly: distance (0 ≤ d ≤ 1) = 1 − similarity
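A minimal sketch of the similarity-to-distance conversion stated above. The similarity values are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical similarity matrix among three items (illustrative values only):
# square, symmetric, ones on the diagonal, values in [0, 1] elsewhere.
S = np.array([[1.0, 0.8, 0.3],
              [0.8, 1.0, 0.5],
              [0.3, 0.5, 1.0]])

# Broad rule from the slide: distance = 1 - similarity.
D = 1.0 - S
print(D)
```

The result is again square and symmetric, now with zeros on the main diagonal, so it has the form required of a distance matrix.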

Principal Coordinates (Metric Multidimensional Scaling)

• Given the matrix of distances D, can we find a set of variables able to generate it?

• Can we find a data matrix X able to generate D?

• Main idea of the procedure:

(1) Understand how to obtain D when X is known.

(2) Then work backwards to build the matrix X given D.

Procedure

Remember that, given a data matrix X, we obtain a zero-mean data matrix by the transformation: X̃ = (I − (1/n)11')X = PX

With this matrix we can compute two square and symmetric matrices:

The first is the covariance matrix S = X̃'X̃/n

The second is the matrix Q = X̃X̃' of scalar products among observations

The matrix of scalar products Q is closely related to the distance matrix D that we are interested in. The relation between D and Q is as follows:

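Main result: given the matrix Q we can obtain the matrix D

Elements of Q: q_ij = x_i'x_j, the scalar product between the (centered) observations i and j

Elements of D: d_ij^2 = (x_i − x_j)'(x_i − x_j) = q_ii + q_jj − 2 q_ij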
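A minimal sketch of the forward step (from X to Q to D). It assumes the standard definitions Q = Xc Xc' for the centered data Xc and squared distances d_ij^2 = q_ii + q_jj − 2 q_ij. The data matrix is hypothetical.

```python
import numpy as np

# Hypothetical 4 x 2 data matrix (rows are observations); illustrative values only.
X = np.array([[1.0, 2.0],
              [3.0, 0.0],
              [0.0, 1.0],
              [4.0, 5.0]])
n = X.shape[0]

# Centering matrix P = I - (1/n) 1 1' and zero-mean data Xc = P X.
P = np.eye(n) - np.ones((n, n)) / n
Xc = P @ X

# Matrix of scalar products among observations.
Q = Xc @ Xc.T
q = np.diag(Q)

# Squared distances obtained from Q alone: d_ij^2 = q_ii + q_jj - 2 q_ij.
D = q[:, None] + q[None, :] - 2.0 * Q

# Squared Euclidean distances computed directly from X, for comparison.
diff = X[:, None, :] - X[None, :, :]
D_direct = (diff ** 2).sum(axis=2)

print(np.allclose(D, D_direct))
```

Centering does not change the distances, which is why D computed from Q matches the distances computed from the raw X.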

How to recover Q given D?

t = trace(Q)

Note that, because the variables have zero mean, the sum of any row of Q must be zero.

1. Method to recover Q given D
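A minimal sketch of this step, assuming D holds squared distances and using double centering, Q = −(1/2)PDP with P = I − (1/n)11'. The function name `recover_Q` and the points are hypothetical.

```python
import numpy as np

def recover_Q(D):
    """Double centering: Q = -(1/2) P D P, with D a matrix of squared distances."""
    n = D.shape[0]
    P = np.eye(n) - np.ones((n, n)) / n
    return -0.5 * P @ D @ P

# Hypothetical points, used only to build a squared-distance matrix D.
X = np.array([[0.0, 0.0],
              [3.0, 0.0],
              [0.0, 4.0]])
diff = X[:, None, :] - X[None, :, :]
D = (diff ** 2).sum(axis=2)

Q = recover_Q(D)

# Q should equal the scalar products of the centered points.
Xc = X - X.mean(axis=0)
print(np.allclose(Q, Xc @ Xc.T))

# Column sums of D equal t + n * q_jj, where t = trace(Q).
t = np.trace(Q)
print(np.allclose(D.sum(axis=0), t + D.shape[0] * np.diag(Q)))
```

The second check uses the fact that every row of Q sums to zero, which is what makes Q recoverable from the row, column, and overall sums of D.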

2. Obtain X given Q

We cannot recover X exactly, because this problem has many solutions.

If Q = XX', then also

Q = X A A^(-1) X' = (XA)(XA)' for any orthogonal matrix A (since A^(-1) = A'). Thus XA is also a solution.
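A small numerical illustration of this non-uniqueness. The configuration X and the rotation angle are arbitrary choices made for the example.

```python
import numpy as np

# Hypothetical centered configuration (each column sums to zero).
X = np.array([[ 1.0,  0.0],
              [-1.0,  2.0],
              [ 0.0, -2.0]])

# An orthogonal matrix: rotation by an arbitrary angle.
theta = 0.7
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

Y = X @ A  # rotated configuration

# Both configurations give the same matrix of scalar products.
same_Q = np.allclose(X @ X.T, Y @ Y.T)
print(same_Q)
```

The coordinates in Y differ from those in X, yet both reproduce Q, so Q (and hence D) determines X only up to an orthogonal transformation.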

The standard solution: compute the spectral decomposition of the matrix Q,

Q = A B A', where A contains the eigenvectors associated with the nonzero eigenvalues of Q and B is the diagonal matrix of those eigenvalues,

and take as solution X = A B^(1/2)
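A minimal sketch of this step using `numpy.linalg.eigh`. The function name `coordinates_from_Q` and the tolerance used to decide which eigenvalues count as nonzero are assumptions of the sketch. It keeps only strictly positive eigenvalues.

```python
import numpy as np

def coordinates_from_Q(Q, tol=1e-10):
    """Return X = A B^(1/2) from the spectral decomposition Q = A B A'."""
    eigvals, eigvecs = np.linalg.eigh(Q)          # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]             # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > tol * np.abs(eigvals).max()  # drop (numerically) zero eigenvalues
    A = eigvecs[:, keep]
    B = eigvals[keep]
    return A * np.sqrt(B)                         # scales column j by sqrt(B_j)

# Hypothetical centered data, used only to build a valid Q.
X0 = np.array([[-1.0,  0.0],
               [ 1.0, -2.0],
               [-2.0, -1.0],
               [ 2.0,  3.0]])
Q = X0 @ X0.T

X = coordinates_from_Q(Q)
print(X.shape)
print(np.allclose(X @ X.T, Q))
```

The recovered X reproduces Q but need not equal X0; it is one representative of the family of solutions, with mutually orthogonal columns.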

Note that:

Conclusion

• We say that D is compatible with a Euclidean metric if the matrix Q = −(1/2)PDP is positive semidefinite (all eigenvalues nonnegative).
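A minimal sketch of this compatibility check, assuming D holds squared distances. The function name `is_euclidean`, the tolerance, and both example matrices are hypothetical.

```python
import numpy as np

def is_euclidean(D, tol=1e-8):
    """True if Q = -(1/2) P D P has no eigenvalue below a small negative tolerance."""
    n = D.shape[0]
    P = np.eye(n) - np.ones((n, n)) / n
    Q = -0.5 * P @ D @ P
    eigvals = np.linalg.eigvalsh((Q + Q.T) / 2.0)   # symmetrize against rounding
    return bool(eigvals.min() >= -tol * max(1.0, np.abs(eigvals).max()))

# Squared distances among the points 0, 1 and 3 on a line: Euclidean.
D_ok = np.array([[0.0, 1.0, 9.0],
                 [1.0, 0.0, 4.0],
                 [9.0, 4.0, 0.0]])

# Distances 1, 1 and 5 violate the triangle inequality (squared: 1, 1, 25).
D_bad = np.array([[ 0.0, 1.0, 25.0],
                  [ 1.0, 0.0,  1.0],
                  [25.0, 1.0,  0.0]])

print(is_euclidean(D_ok), is_euclidean(D_bad))
```

A tolerance is needed because eigenvalues that are exactly zero in theory come out as tiny positive or negative numbers in floating point.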

Summary of the procedure

Example 1. Cities

(Note that the rows and columns each add up to zero. The matrix has been divided by 10000.)

Example 1. Eigenstructure of Q:

Final coordinates for the cities taking two dimensions:

Example 1. Plot

Similarities matrix

Example 2: similarity between products

Example 2

Relationship with PC

• PC: eigenvalues and eigenvectors of S

• Principal coordinates: eigenvalues and eigenvectors of Q

If the data are metric, both are identical. Principal coordinates generalize PC to data that are not exactly metric.
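A small numerical check of this relationship for metric data. The sketch assumes S = Xc'Xc/n and Q = Xc Xc' for centered data Xc. The data values are hypothetical.

```python
import numpy as np

# Hypothetical 4 x 2 data matrix; illustrative values only.
X = np.array([[1.0, 2.0],
              [3.0, 0.0],
              [0.0, 1.0],
              [4.0, 5.0]])
Xc = X - X.mean(axis=0)
n = Xc.shape[0]

S = Xc.T @ Xc / n   # covariance matrix (divisor n is an assumption)
Q = Xc @ Xc.T       # scalar products among observations

# eigh returns ascending eigenvalues; reverse to descending.
s_vals, s_vecs = np.linalg.eigh(S)
q_vals, q_vecs = np.linalg.eigh(Q)
s_vals, s_vecs = s_vals[::-1], s_vecs[:, ::-1]
q_vals, q_vecs = q_vals[::-1], q_vecs[:, ::-1]

k = 2
pc_scores = Xc @ s_vecs[:, :k]                 # principal component scores
pcoords = q_vecs[:, :k] * np.sqrt(q_vals[:k])  # principal coordinates

# Nonzero eigenvalues of Q are n times those of S; coordinates agree up to sign.
print(np.allclose(q_vals[:k], n * s_vals))
print(np.allclose(np.abs(pc_scores), np.abs(pcoords)))
```

The sign of each eigenvector is arbitrary, so the two sets of coordinates are compared in absolute value.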

Biplots

Jointly represent the observations by the rows of V2 and the variables by the coordinates D2^(1/2) A2'.

They are called biplots because they give a two-dimensional approximation to the data matrix.
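A minimal sketch of the two-dimensional approximation behind a biplot, using the singular value decomposition Xc = V D A' of the centered data. The symmetric scaling of the row markers (V2 D2^(1/2)) is an assumption of this sketch, chosen so that the row and column markers multiply back to the rank-two approximation. The data values are hypothetical.

```python
import numpy as np

# Hypothetical 5 x 3 data matrix; illustrative values only.
X = np.array([[1.0, 2.0, 0.5],
              [3.0, 0.0, 1.5],
              [0.0, 1.0, 2.0],
              [4.0, 5.0, 0.0],
              [2.0, 2.5, 3.0]])
Xc = X - X.mean(axis=0)

# Thin SVD: Xc = V diag(d) A'
V, d, At = np.linalg.svd(Xc, full_matrices=False)
V2, d2, A2t = V[:, :2], d[:2], At[:2, :]

# Best rank-two approximation to the centered data matrix.
X_rank2 = (V2 * d2) @ A2t

# Markers for a symmetric-scaling biplot (assumed scaling).
row_markers = V2 * np.sqrt(d2)                   # observations, 5 x 2
col_markers = (np.sqrt(d2)[:, None] * A2t).T     # variables (D2^(1/2) A2')', 3 x 2

print(np.allclose(row_markers @ col_markers.T, X_rank2))
```

Plotting the rows of `row_markers` as points and the rows of `col_markers` as arrows on the same axes gives the joint display. Other scalings of the row markers are also in use.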

Biplot

Non-metric multidimensional scaling

A common method

• Idea: if x and y are related by a monotone function, then the ranks of the two variables are related by an exact linear relationship.

• Use ordered (monotone) regression, or assign ranks and regress ranks on ranks, iterating.
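A small illustration of the rank idea only, not of the full iterative procedure. The helper `ranks` and the data are hypothetical, and the helper assumes there are no ties.

```python
import numpy as np

def ranks(v):
    """Ranks 1..n of the entries of v (assumes no ties)."""
    order = np.argsort(v)
    r = np.empty(len(v), dtype=float)
    r[order] = np.arange(1, len(v) + 1)
    return r

x = np.array([0.3, 2.0, 1.1, 4.5, 3.2])
y = np.exp(x)   # monotone increasing in x, but not linear

rx, ry = ranks(x), ranks(y)

corr_raw = np.corrcoef(x, y)[0, 1]     # below 1: the relation is not linear
corr_rank = np.corrcoef(rx, ry)[0, 1]  # equals 1: the ranks coincide

print(corr_raw, corr_rank)
```

Because an increasing transformation preserves order, the two rank vectors are identical, which is the exact linear relationship the slide refers to.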
