Self-Organizing Maps

Transcript of Self-Organizing Maps

  • Self-Organizing Maps
    - Projection of p-dimensional observations onto a two- (or one-) dimensional grid space
    - Constrained version of K-means clustering
    - Prototypes lie in a one- or two-dimensional manifold (constrained topological map; Teuvo Kohonen, 1993)
    - K prototypes: rectangular grid or hexagonal grid
    - Each prototype is indexed by an integer pair $l_j \in Q_1 \times Q_2$, where $Q_1 = \{1, \ldots, q_1\}$ and $Q_2 = \{1, \ldots, q_2\}$ ($K = q_1 \times q_2$)
    - High-dimensional observations are projected to the two-dimensional coordinate system

  • SOM Algorithm
    - Prototypes $m_j$, $j = 1, \ldots, K$, are initialized
    - Each observation $x_i$ is processed one at a time to find the closest prototype $m_j$ in Euclidean distance in the p-dimensional space
    - All neighbors of $m_j$, say $m_k$, move toward $x_i$ as $m_k \leftarrow m_k + a (x_i - m_k)$
    - Neighbors are all $m_k$ such that the distance between $m_j$ and $m_k$ is smaller than a threshold r (the neighborhood includes $m_j$ itself)
    - Distance is defined on $Q_1 \times Q_2$, not on the p-dimensional space
    - SOM performance depends on the learning rate a and the threshold r
    - Typically, a is decreased from 1 to 0 and r from R (a predefined value) to 1, a little at each iteration, over, say, 3000 iterations
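
The update loop on this slide can be sketched in plain Python. This is a minimal illustration, not the presenter's or GeneCluster's implementation: the name `train_som`, the initialization from randomly chosen observations, the linear decay of a and r, cycling through the observations in order, and the default R = max(q1, q2) are all assumptions made for the example.

```python
import math
import random


def train_som(data, q1, q2, n_iter=3000, r_start=None, seed=0):
    """Online SOM on a rectangular q1 x q2 grid (illustrative sketch)."""
    rng = random.Random(seed)
    # Grid coordinates l_j: integer pairs in Q1 x Q2.
    grid = [(u, v) for u in range(1, q1 + 1) for v in range(1, q2 + 1)]
    # Assumption: prototypes start at randomly chosen observations.
    protos = [list(rng.choice(data)) for _ in grid]
    if r_start is None:
        r_start = float(max(q1, q2))  # stands in for the predefined R
    for t in range(n_iter):
        frac = t / n_iter
        a = 1.0 - frac                        # learning rate a: 1 -> 0
        r = r_start + (1.0 - r_start) * frac  # threshold r: R -> 1
        x = data[t % len(data)]               # one observation at a time
        # Closest prototype m_j, measured in the p-dimensional data space.
        j = min(range(len(protos)), key=lambda k: math.dist(x, protos[k]))
        for k, m_k in enumerate(protos):
            # Neighborhood is measured on the grid, not in data space.
            if math.dist(grid[j], grid[k]) < r:
                for d in range(len(x)):
                    m_k[d] += a * (x[d] - m_k[d])
    return grid, protos
```

Calling `train_som(data, 4, 3)` on a list of equal-length numeric tuples returns the 12 grid positions and the 12 fitted prototypes in matching order.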

  • SOM properties
    - If r is small enough, each neighborhood contains only one point; the spatial connection between prototypes is then lost and the algorithm converges to a local minimum of K-means clustering
    - Need to check whether the constraint is reasonable: compute and compare the reconstruction error $e = \|x - m\|^2$ for both methods (the SOM's e is bigger, but the two should be similar)
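
One way to run this check is to score any set of prototypes with the same error function and compare the SOM prototypes against K-means centroids fitted with the same K. The sketch below is an assumption-laden illustration: the name `reconstruction_error` is hypothetical, and averaging over observations is a choice made here (the slide gives only e = ||x - m||^2).

```python
import math


def reconstruction_error(data, prototypes):
    """Mean squared distance from each observation to its closest prototype."""
    total = 0.0
    for x in data:
        # Each x is reconstructed by its nearest prototype m.
        total += min(math.dist(x, m) ** 2 for m in prototypes)
    return total / len(data)
```

If the value for the SOM prototypes is far above the value for the K-means centroids, the grid constraint is probably too restrictive for the data.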

  • Tamayo et al. (1999; GeneCluster)
    - Self-organizing maps (SOM) on microarray data
    - Hematopoietic cell lines (HL60, U937, Jurkat, and NB4): 4x3 SOM
    - Yeast data in Eisen et al. reanalyzed by 6x5 SOM

  • Principal Component Analysis
    - Data $x_i$, $i = 1, \ldots, n$, are from the p-dimensional space ($n \ge p$)
    - Data matrix: $X_{n \times p}$
    - Singular value decomposition $X = U L V^T$, where L is a non-negative diagonal matrix with decreasing diagonal entries $l_i$ (the singular values), $U_{n \times p}$ has orthonormal columns ($u_i^T u_j = 1$ if $i = j$, $= 0$ if $i \ne j$), and $V_{p \times p}$ is an orthogonal matrix
    - The principal components are the columns of $XV$ ($= UL$)
    - The rank of X equals the number of non-zero singular values, which is at most p
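
The decomposition on this slide can be reproduced with NumPy's thin SVD. This is a sketch under assumptions: `pca_svd` is a hypothetical helper name, and the columns are centered inside the function, which the slide does not state explicitly.

```python
import numpy as np


def pca_svd(X):
    """Return (XV, U, diagonal of L, V) for the column-centered data matrix."""
    Xc = X - X.mean(axis=0)  # assumption: PCA is applied to centered data
    # Thin SVD: U is n x p, s holds the diagonal of L, Vt is V^T.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt.T
    scores = Xc @ V          # principal components, the columns of XV
    return scores, U, s, V
```

Multiplying `U * s` scales column i of U by the i-th diagonal entry of L, so it can be compared directly with the returned `scores` to confirm XV = UL.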

  • PCA properties
    - The first column of $XV$ (or $UL$) is the 1st principal component, which represents the direction with the largest variance (the first eigenvalue represents its magnitude)
    - The second column is the direction with the second-largest variance, uncorrelated with the first, and so on
    - The first q columns, q < p, of $XV$ are the linear projection of X into q dimensions with the largest variance
    - Let $\hat{X} = U L_q V^T$, where $L_q$ is L with all but the first q diagonal entries set to zero; $\hat{X}$ is the best possible approximation of X with rank q (in the least-squares sense)
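
The rank-q approximation U L_q V^T can be sketched with NumPy as follows. The helper name `rank_q_approximation` is hypothetical, and X is assumed to be centered already.

```python
import numpy as np


def rank_q_approximation(X, q):
    """Return U L_q V^T, keeping only the q largest diagonal entries of L."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s_q = s.copy()
    s_q[q:] = 0.0            # zero out all but the first q singular values
    return (U * s_q) @ Vt
```

The squared Frobenius norm of `X - rank_q_approximation(X, q)` equals the sum of the squared discarded singular values, which gives a quick numerical check of the approximation error.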

  • Traditional PCA
    - Variance-covariance matrix S from data $X_{n \times p}$ (columns centered)
    - Eigenvalue decomposition: $S = C D C^T$, with C an orthogonal matrix
    - $(n-1) S = X^T X = (U L V^T)^T U L V^T = V L U^T U L V^T = V L^2 V^T$
    - Thus, $D = L^2/(n-1)$ and $C = V$
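
The identity D = L^2/(n-1), C = V can be checked numerically. The sketch below assumes NumPy and uses a hypothetical helper name, `compare_pca_routes`; the eigenvectors agree with the columns of V only up to sign, and only when the eigenvalues are distinct.

```python
import numpy as np


def compare_pca_routes(X):
    """Return (D, C) from the covariance route and (L^2/(n-1), V) from the SVD."""
    Xc = X - X.mean(axis=0)                 # center the columns
    n = Xc.shape[0]
    S = Xc.T @ Xc / (n - 1)                 # variance-covariance matrix
    evals, C = np.linalg.eigh(S)            # eigh returns ascending eigenvalues
    evals, C = evals[::-1], C[:, ::-1]      # reorder to decreasing
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return evals, C, s ** 2 / (n - 1), Vt.T
```

Comparing the first and third return values tests D = L^2/(n-1); comparing `np.abs(C.T @ V)` with the identity matrix tests C = V up to the sign of each column.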

  • Principal Curves and Surfaces
    - Let $f(l)$ be a parameterized smooth curve in the p-dimensional space
    - For data x, let $l_f(x)$ define the closest point on the curve to x
    - Then $f(l)$ is the principal curve for random vector X if $f(l) = E[X \mid l_f(X) = l]$
    - Thus, $f(l)$ is the average of all data points that project to it

  • Algorithm for finding the principal curve
    - Let $f(l)$ have coordinates $f(l) = [f_1(l), f_2(l), \ldots, f_p(l)]$, where random vector $X = [X_1, X_2, \ldots, X_p]$
    - Then iterate the following alternating steps until convergence:
      (a) $f_j(l) \leftarrow E[X_j \mid l_f(X) = l]$, $j = 1, \ldots, p$
      (b) $l_f(x) \leftarrow \arg\min_l \|x - f(l)\|^2$
    - The solution is the principal curve for the distribution of X

  • Multidimensional scaling (MDS)
    - Observations $x_1, x_2, \ldots, x_n$ in the p-dimensional space with all pairwise distances (or dissimilarity measures) $d_{ij}$
    - MDS tries to preserve the structure of the original pairwise distances as much as possible
    - Then, seek the vectors $z_1, z_2, \ldots, z_n$ in the k-dimensional space (k
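  • Other MDS approaches
    - Sammon mapping minimizes $\sum_{i \ne j} (d_{ij} - \|z_i - z_j\|)^2 / d_{ij}$
    - Classical scaling is based on a similarity measure $s_{ij}$
    - Often the centered inner product $s_{ij} = \langle x_i - \bar{x}, x_j - \bar{x} \rangle$ is used
    - Then, minimize $\sum_i \sum_j (s_{ij} - \langle z_i - \bar{z}, z_j - \bar{z} \rangle)^2$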
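
The Sammon criterion can be evaluated for a candidate configuration with a few lines of plain Python. This is a sketch only: the name `sammon_stress` is hypothetical, the sum is taken over all ordered pairs with i different from j (an assumption about the index set), and every off-diagonal d_ij is assumed to be strictly positive.

```python
import math


def sammon_stress(d, z):
    """Sum over pairs i != j of (d_ij - ||z_i - z_j||)^2 / d_ij."""
    n = len(z)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                # Dividing by d_ij gives small original distances more weight.
                total += (d[i][j] - math.dist(z[i], z[j])) ** 2 / d[i][j]
    return total
```

Here `d` is an n x n nested list of original distances and `z` is a list of n low-dimensional points; a configuration that reproduces every distance exactly has stress 0.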