Post on 21-Dec-2015
Introduction
• Given a matrix of distances D (square and symmetric, with zeros on the main diagonal), find variables that can approximately generate these distances.
• The matrix can also be a similarity matrix: square and symmetric, but with ones on the main diagonal and values between zero and one elsewhere.
• Broadly: distance ($0 \le d \le 1$) $= 1 -$ similarity
Principal Coordinates (Metric Multidimensional Scaling)
• Given the matrix of distances D, can we find a set of variables able to generate it?
• Can we find a data matrix X able to generate D?
• Main idea of the procedure:
(1) Understand how to obtain D when X is known,
(2) Then work backwards to build the matrix X given D
Procedure
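Remember that, given a data matrix $\tilde{X}$ with $n$ rows, we obtain a zero-mean data matrix through the transformation:
$X = \left(I - \frac{1}{n}\mathbf{1}\mathbf{1}'\right)\tilde{X} = P\tilde{X}$
With this matrix we can compute two square and symmetric matrices.
The first is the covariance matrix S: $S = \frac{1}{n}X'X$
The second is the matrix Q of scalar products among observations: $Q = XX'$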
The matrix of products Q is closely related to the distance matrix D we are interested in. The relation between D and Q is as follows:
Elements of Q: $q_{ij} = x_i'x_j$
Elements of D (squared distances): $d_{ij}^2 = (x_i - x_j)'(x_i - x_j) = q_{ii} + q_{jj} - 2q_{ij}$
Main result: given the matrix Q we can obtain the matrix D
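The relation $d_{ij}^2 = q_{ii} + q_{jj} - 2q_{ij}$ can be checked numerically. The following is a minimal sketch using NumPy; the random data, the seed, and the names `X_raw`, `D`, and `D_direct` are illustrative assumptions, not part of the slides.

```python
import numpy as np

rng = np.random.default_rng(0)          # arbitrary seed, illustrative data only
X_raw = rng.normal(size=(6, 3))         # 6 observations, 3 variables

n = X_raw.shape[0]
P = np.eye(n) - np.ones((n, n)) / n     # centering matrix P = I - (1/n) 11'
X = P @ X_raw                           # zero-mean data matrix

Q = X @ X.T                             # scalar products among observations

# Squared distances built from Q alone: d_ij^2 = q_ii + q_jj - 2 q_ij
q = np.diag(Q)
D = q[:, None] + q[None, :] - 2 * Q

# Squared Euclidean distances computed directly from the rows of X
diff = X[:, None, :] - X[None, :, :]
D_direct = (diff ** 2).sum(axis=2)

print(np.allclose(D, D_direct))         # the two constructions should agree
print(np.allclose(Q.sum(axis=1), 0.0))  # rows of Q sum to zero for centered data
```

Here D holds squared distances, which is the form the relation with Q requires.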
1. How to recover Q given D?
Let $t = \operatorname{trace}(Q)$, the sum of the diagonal elements of Q.
Note that, since the variables have zero mean, the sum of any row of Q must be zero.
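One way to carry out this step, written out as a worked derivation. It assumes that D holds the squared distances $d_{ij}^2 = q_{ii} + q_{jj} - 2q_{ij}$ and uses the zero row sums of Q; the averaged quantities $\bar{d}_{i\cdot}^2$, $\bar{d}_{\cdot j}^2$ and $\bar{d}_{\cdot\cdot}^2$ are notation introduced here for the row, column and overall means of D.

```latex
\begin{align*}
\sum_{i=1}^{n} d_{ij}^2 &= \sum_{i} q_{ii} + n\,q_{jj} - 2\sum_{i} q_{ij} = t + n\,q_{jj}, \\
\sum_{j=1}^{n} d_{ij}^2 &= t + n\,q_{ii}, \\
\sum_{i=1}^{n}\sum_{j=1}^{n} d_{ij}^2 &= 2\,n\,t .
\end{align*}
% Solve for the diagonal terms and substitute into
% q_{ij} = -\tfrac{1}{2}\,(d_{ij}^2 - q_{ii} - q_{jj}):
\begin{align*}
q_{ii} &= \bar{d}_{i\cdot}^2 - \frac{t}{n}, \qquad
q_{jj} = \bar{d}_{\cdot j}^2 - \frac{t}{n}, \qquad
\frac{2t}{n} = \bar{d}_{\cdot\cdot}^2, \\
q_{ij} &= -\frac{1}{2}\left(d_{ij}^2 - \bar{d}_{i\cdot}^2 - \bar{d}_{\cdot j}^2 + \bar{d}_{\cdot\cdot}^2\right),
\end{align*}
% which in matrix form, with P = I - \tfrac{1}{n}\mathbf{1}\mathbf{1}', reads
\[
Q = -\frac{1}{2}\,P\,D\,P .
\]
```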
2. Obtain X given Q
We cannot recover X exactly, because this problem has many solutions.
If $Q = XX'$, then also
$Q = XAA^{-1}X' = (XA)(XA)'$ for any orthogonal matrix A (since $A^{-1} = A'$). Thus $XA$ is also a solution.
The standard solution: compute the spectral decomposition of the matrix Q,
$Q = ABA'$, where B is the diagonal matrix of the nonzero eigenvalues of Q and A contains the corresponding eigenvectors,
and take as the solution $X = AB^{1/2}$
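The two steps can be put together in a short NumPy sketch. It assumes that the input holds squared distances and that Q is obtained by the double centering $Q = -\frac{1}{2}PDP$; the function name `principal_coordinates`, the tolerance `tol`, and the four example points are hypothetical choices made for illustration.

```python
import numpy as np

def principal_coordinates(D2, tol=1e-10):
    """Return X = A B^{1/2} from a square symmetric matrix of squared distances."""
    n = D2.shape[0]
    P = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    Q = -0.5 * P @ D2 @ P                        # step 1: recover Q from D
    eigvals, eigvecs = np.linalg.eigh(Q)         # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]            # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > tol * np.abs(eigvals).max() # keep the nonzero (positive) part
    return eigvecs[:, keep] * np.sqrt(eigvals[keep])  # step 2: X = A B^{1/2}

# Illustrative data: corners of a 3 x 4 rectangle in the plane
points = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]])
diff = points[:, None, :] - points[None, :, :]
D2 = (diff ** 2).sum(axis=2)

X = principal_coordinates(D2)
diff_rec = X[:, None, :] - X[None, :, :]
D2_rec = (diff_rec ** 2).sum(axis=2)

print(X.shape)                     # number of coordinates recovered
print(np.allclose(D2_rec, D2))     # recovered coordinates reproduce the distances
```

The recovered X generally differs from `points` by a translation and an orthogonal transformation, which is the non-uniqueness noted above; only the distances are expected to match.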
Note that:
Conclusion
• We say that D is compatible with a Euclidean metric if the matrix Q obtained as
$Q = -\frac{1}{2}PDP$ is nonnegative definite (all its eigenvalues are nonnegative)
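This criterion can be turned into a simple numerical check. The sketch below assumes D holds squared distances; the function name `is_euclidean`, the tolerance, and the two three-point examples are assumptions chosen for illustration. The second example violates the triangle inequality (3 > 1 + 1), so no Euclidean configuration can produce it.

```python
import numpy as np

def is_euclidean(D2, tol=1e-10):
    """Check whether Q = -(1/2) P D P has only nonnegative eigenvalues."""
    n = D2.shape[0]
    P = np.eye(n) - np.ones((n, n)) / n
    Q = -0.5 * P @ D2 @ P
    eigvals = np.linalg.eigvalsh(Q)
    scale = max(np.abs(eigvals).max(), 1.0)      # guard against round-off noise
    return bool(eigvals.min() >= -tol * scale), eigvals

# Distances 1, 1, 2: three collinear points, compatible with a Euclidean metric
D2_ok = np.array([[0.0, 1.0, 4.0],
                  [1.0, 0.0, 1.0],
                  [4.0, 1.0, 0.0]])

# Distances 1, 1, 3: the triangle inequality fails
D2_bad = np.array([[0.0, 1.0, 9.0],
                   [1.0, 0.0, 1.0],
                   [9.0, 1.0, 0.0]])

ok_flag, ok_eigs = is_euclidean(D2_ok)
bad_flag, bad_eigs = is_euclidean(D2_bad)
print(ok_flag, bad_flag)
```

In the second case the diagonal element $q_{22}$ is negative, which already rules out a nonnegative definite Q.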
Relationship with Principal Components (PC)
• Principal components: eigenvalues and eigenvectors of S
• Principal coordinates: eigenvalues and eigenvectors of Q
If the data are metric, both are identical. Principal coordinates generalize principal components to data that are not exactly metric
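For metric data the equivalence can be verified directly. The following sketch assumes a covariance matrix with divisor n, random illustrative data, and distinct eigenvalues, so that the coordinates agree up to the sign of each column; all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)          # arbitrary seed, illustrative data only
n, p = 8, 3
X = rng.normal(size=(n, p))
X = X - X.mean(axis=0)                  # zero-mean data matrix

# Principal components: eigen-decomposition of S (divisor n assumed)
S = X.T @ X / n
lam, V = np.linalg.eigh(S)
lam, V = lam[::-1], V[:, ::-1]          # descending order
Z = X @ V                               # principal component scores

# Principal coordinates: eigen-decomposition of Q
Q = X @ X.T
b, A = np.linalg.eigh(Q)
b, A = b[::-1][:p], A[:, ::-1][:, :p]   # keep the p nonzero eigenvalues
Y = A * np.sqrt(b)                      # X = A B^{1/2}

same_eigs = np.allclose(b, n * lam)                 # eigenvalues of Q are n times those of S
same_coords = np.allclose(np.abs(Z), np.abs(Y))     # columns agree up to sign
print(same_eigs, same_coords)
```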
Biplots
Jointly represent the observations by the rows of $V_2$ and the variables by the coordinates $D_2^{1/2}A_2'$.
They are called biplots because a two-dimensional approximation to the data matrix is made.
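A rough sketch of the two-dimensional approximation behind a biplot. It assumes the notation refers to the singular value decomposition $X = VD^{1/2}A'$ of the zero-mean data matrix, truncated to its first two terms; that reading, the random data, and the names `row_markers` and `col_markers` are assumptions rather than something the slides state.

```python
import numpy as np

rng = np.random.default_rng(2)          # arbitrary seed, illustrative data only
X = rng.normal(size=(10, 4))
X = X - X.mean(axis=0)                  # zero-mean data matrix

# Thin SVD: X = V diag(s) A', where s plays the role of D^{1/2}
V, s, At = np.linalg.svd(X, full_matrices=False)

row_markers = V[:, :2]                          # observations: rows of V_2
col_markers = (np.diag(s[:2]) @ At[:2]).T       # variables: columns of D_2^{1/2} A_2'

X2 = row_markers @ col_markers.T                # rank-2 approximation of X
err = np.linalg.norm(X - X2)                    # Frobenius norm of the residual
print(row_markers.shape, col_markers.shape, err)
```

Plotting the rows of `row_markers` as points and the rows of `col_markers` as arrows on the same axes would give the joint display; the residual `err` is determined by the singular values left out of the approximation.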