Analytic Organization of High Dimensional Observational Databases as a Tool for Learning and Inference

Analytic Organization of high dimensional observational Databases as a tool for learning and inference. R. Coifman, Mathematics, Yale; M. Gavish, Statistics, Stanford


Page 1

Analytic Organization of high dimensional observational Databases as a tool for learning and inference

R. Coifman, Mathematics, Yale    M. Gavish, Statistics, Stanford

Page 2

We describe a mathematical framework to learn and organize databases without incorporating expert information. In other words, we organize point clouds in very high dimension, a setting where standard global metrics lose their utility unless points are very close. The database could be the matrix of a linear transformation, in which case the goal is to reorganize the matrix so as to achieve compression and fast algorithms. Or the database could be a collection of documents and their vocabulary, an array of sensor measurements such as EEG, a financial time series, or segments of recorded music. We view the database as a questionnaire. We organize the responder population into a contextual demographic diffusion geometry, and the questions into a conceptual geometry. This is an iterative process in which each organization informs the other, with the goal of reducing the entropy of the whole database.

Page 3

This organization, being totally data agnostic, applies to the other examples, thereby automatically generating a data-driven duality of conceptual/contextual pairing.

We will describe the basic underlying tools from harmonic analysis for measuring success in extracting structure, tools which enable functional regression, prediction, and, more broadly, signal processing methodologies.

In particular, we build bi-hierarchical organizations and an efficient estimation structure.

This work is directly related to recent work of D. Blei and M. Jordan [1] on the organization of relational databases of text documents.

Page 4

We illustrate the outcome of such organization on the MMPI (Minnesota Multiphasic Personality Inventory) questionnaire.

The bi-hierarchical tree engenders tensor Haar bases, enabling quantitative assessments, such as filtering out anomalous responses by measuring consistency, and providing detailed “analysis” (pun intended).

Strömberg’s and Smolyak’s [13] observations about the efficiency of approximation of functions of bounded mixed variation in the tensor Haar basis are particularly useful in the statistical data analysis context of a database (or transposable array).

Page 5

Start by considering the problem of unraveling the geometric structure in a matrix. We view the columns or the rows as collections of points in high dimension whose geometry we need to discover. In principle we would like to permute rows and columns so that “nearby locations” after permutation will have similar values.

The matrix on the left is a permutation of the rows and columns of the matrix below it.

The challenge is to unravel the various simple submatrices.
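The goal stated above (nearby locations should hold similar values) can be made concrete with a small numerical sketch. The matrix, the `roughness` score, and the random permutations below are illustrative assumptions, not taken from the slides; the point is only that a smooth matrix scores low, its scrambled version scores high, and inverting the permutations restores the original.

```python
import numpy as np

def roughness(M):
    """Mean squared difference between adjacent rows plus adjacent columns.

    A hypothetical score: small when neighboring entries have similar values.
    """
    row_term = np.mean((M[1:, :] - M[:-1, :]) ** 2)
    col_term = np.mean((M[:, 1:] - M[:, :-1]) ** 2)
    return row_term + col_term

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 64)
y = np.linspace(0.0, 1.0, 48)

# An organized matrix: smooth in both the row and the column variable.
organized = np.exp(-4.0 * (x[:, None] - y[None, :]) ** 2)

# Scramble rows and columns, as in the matrix shown on the slide.
row_perm = rng.permutation(len(x))
col_perm = rng.permutation(len(y))
scrambled = organized[np.ix_(row_perm, col_perm)]

# If the permutations were known, inverting them would undo the scrambling.
recovered = scrambled[np.ix_(np.argsort(row_perm), np.argsort(col_perm))]

print(roughness(organized), roughness(scrambled))
```

In practice the permutations are unknown, which is why the geometry of the rows and columns has to be learned from the data.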

Page 6
Page 7

ARO MURI Opportunistic Sensing, October 2009

Page 8


Page 9


Page 10
Page 11

A permutation of the rows and columns of the matrix sin(kx).

On the left we recover the one-dimensional geometry of x (which is oversampled), while on the right we recover the one-dimensional geometry of k.

Page 12
Page 13
Page 14

The same approach of organizing an image as a questionnaire is effective for texture segmentation. Here we associate with each pixel the log values of the Fourier coefficients of the 11x11 square centered at the pixel. The middle image shows folders at the level before last; observe the spot in the middle of the brown. The image on the right is a good segmentation of the textures. Observe that no assumptions or filters were given; this can be done as easily without using the FT.
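The per-pixel features described above might be computed along the following lines. This is a minimal sketch under stated assumptions: the function name `patch_fourier_features`, the reflect padding at the image border, and the use of `log1p` on the coefficient magnitudes (to avoid taking the log of zero) are choices made here for illustration and are not specified in the slides.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def patch_fourier_features(image, size=11):
    """Log-magnitude Fourier coefficients of the size x size patch at each pixel.

    Returns an array of shape (H, W, size * size): one feature vector per pixel.
    """
    half = size // 2
    # Assumption: reflect the image at the border so every pixel has a full patch.
    padded = np.pad(image, half, mode="reflect")
    # Shape (H, W, size, size): the patch centered at each pixel.
    patches = sliding_window_view(padded, (size, size))
    # 2D FFT over the last two axes, i.e. over each patch separately.
    spectra = np.fft.fft2(patches)
    # Assumption: log(1 + |coefficient|) stands in for "log values".
    features = np.log1p(np.abs(spectra))
    return features.reshape(image.shape[0], image.shape[1], -1)

image = np.arange(16 * 20, dtype=float).reshape(16, 20)
features = patch_fourier_features(image)
print(features.shape)
```

Each pixel then plays the role of a responder, and each of the 121 coefficients plays the role of a question.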

Page 15

The next slide represents a similar organization of the vocabulary of a body of Science News documents. The vocabulary is grouped by functional usage within the documents.

The geometry of the vocabulary is presented in such a way that the Euclidean distance in the display represents the affinity of the words as measured by the documents.

Page 16
Page 17
Page 18
Page 19
Page 20

The simplest joint organization is achieved as follows.

Assuming an initial hierarchical organization of the columns of the database (see later) into contextual folders (for example, groups of responders which are similar at different “scales”), use these folders to assign new response coordinates to each row (question), for example the average response of each demographic group.

Use the augmented response coordinates to organize the rows into a conceptual hierarchy of folders of rows which are similar across the population of columns.

We then use the conceptual folders to augment the responses of the columns and to reorganize them into a more precise contextual hierarchy.

This process is iterated as long as an “entropy” of the database is being reduced.
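The augmentation step in this procedure can be sketched as follows. The helper names `folder_averages` and `augment_rows`, and the encoding of each scale of the column hierarchy as a list of folder labels, are assumptions made for illustration; the slides do not prescribe a data structure.

```python
import numpy as np

def folder_averages(M, col_folders):
    """Average each row of M over every column folder at one scale."""
    labels = np.asarray(col_folders)
    return np.column_stack(
        [M[:, labels == k].mean(axis=1) for k in np.unique(labels)]
    )

def augment_rows(M, folder_levels):
    """Append to each row (question) its folder averages at every scale."""
    extra = [folder_averages(M, labels) for labels in folder_levels]
    return np.hstack([M] + extra)

# Rows are questions, columns are responders.
M = np.array([[1, 1, 0, 0],
              [0, 0, 1, 1],
              [1, 0, 1, 0]], dtype=float)

# Two scales of contextual folders: two groups of two, then everyone together.
folder_levels = [[0, 0, 1, 1], [0, 0, 0, 0]]
augmented = augment_rows(M, folder_levels)
print(augmented)
```

The same function applied to the transposed matrix, with row folders in place of column folders, gives the augmentation of the columns used in the next step of the iteration.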

Page 21

More precisely, the bi-hierarchical geometry described above is obtained through a process of mutual learning, using Haar features which are selected according to their effectiveness in capturing the information of the database. We view this organization as the underlying observational “organized memory”.

We extend the process to achieve learning, or functional extrapolation, as follows. Introduce a function whose values are known on a small subset of the data. Start by extending it to the rest of the data using Haar extrapolation; for example, pick the minimizer of the l1 norm of the Haar coefficients over all extrapolations.

Add the extended function as a new row of the database to force a reorganization of “memory” by its relevance to the function and its variability, and iterate.
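One way to compute such an l1-minimizing extension is as a linear program. The sketch below is an illustration under assumptions: it uses the classical dyadic Haar basis on an ordered set of n points (n a power of two) in place of a learned partition tree, it penalizes all coefficients including the constant one, and it relies on SciPy's `linprog`. The names `haar_matrix` and `l1_haar_extend` are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def haar_matrix(n):
    """Orthonormal dyadic Haar matrix (rows are basis vectors), n a power of two."""
    if n == 1:
        return np.ones((1, 1))
    coarse = np.kron(haar_matrix(n // 2), [1.0, 1.0])   # coarser functions, upsampled
    fine = np.kron(np.eye(n // 2), [1.0, -1.0])         # finest-scale differences
    return np.vstack([coarse, fine]) / np.sqrt(2.0)

def l1_haar_extend(n, known_idx, known_values):
    """Extend known samples to all n points by minimizing the l1 norm of H f."""
    H = haar_matrix(n)
    I = np.eye(n)
    # Variables z = [f, s]; minimize sum(s) subject to -s <= H f <= s.
    cost = np.concatenate([np.zeros(n), np.ones(n)])
    A_ub = np.block([[H, -I], [-H, -I]])
    b_ub = np.zeros(2 * n)
    # Equality constraints pin f to the known values.
    A_eq = np.zeros((len(known_idx), 2 * n))
    A_eq[np.arange(len(known_idx)), known_idx] = 1.0
    bounds = [(None, None)] * n + [(0, None)] * n
    result = linprog(cost, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=known_values,
                     bounds=bounds, method="highs")
    if not result.success:
        raise RuntimeError(result.message)
    return result.x[:n]

f = l1_haar_extend(8, [0, 5], [1.0, 3.0])
print(np.round(f, 3))
```

The minimizer need not be unique, so the printed extension is one of possibly several optimal ones.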

Page 22

Observe that whenever we have a partition of data into a tree of subsets, we can associate with the tree an orthonormal basis constructed by orthogonalization of the characteristic functions of subsets of a parent node, first to the parent, and then to each other, as seen below.

This is precisely the construction of Haar wavelets on the binary tree of dyadic intervals or on a quadtree of dyadic squares.
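The construction can be sketched for an arbitrary partition tree as follows. The tree encoding (nested Python lists whose leaves are integer point indices) and the function names are assumptions made for this illustration. At each node, the indicator functions of all but the last child are orthogonalized against the parent's indicator and against each other by Gram-Schmidt.

```python
import numpy as np

def leaves(node):
    """All point indices below a node of the nested-list tree."""
    if isinstance(node, int):
        return [node]
    return [i for child in node for i in leaves(child)]

def haar_basis(tree, n):
    """Orthonormal Haar-like basis (rows) for a partition tree on n points."""
    basis = [np.ones(n) / np.sqrt(n)]          # constant function on the root

    def visit(node):
        if isinstance(node, int):
            return
        parent = np.zeros(n)
        parent[leaves(node)] = 1.0
        local = [parent / np.linalg.norm(parent)]
        # The last child's indicator is determined by the others, so skip it.
        for child in node[:-1]:
            v = np.zeros(n)
            v[leaves(child)] = 1.0
            for u in local:                     # Gram-Schmidt step
                v -= (v @ u) * u
            v /= np.linalg.norm(v)
            local.append(v)
            basis.append(v)
        for child in node:
            visit(child)

    visit(tree)
    return np.array(basis)

# An unbalanced tree on 5 points, and the dyadic tree on 8 points.
B5 = haar_basis([[0, 1], [2, [3, 4]]], 5)
B8 = haar_basis([[[0, 1], [2, 3]], [[4, 5], [6, 7]]], 8)
print(np.round(B8[1] * np.sqrt(8)))
```

On the dyadic tree the second basis vector is the coarsest classical Haar function, positive on the first half and negative on the second.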

Page 23

The tensor product basis indexed by bi-folders, or rectangles in the database, is used to expand the full database.

The geometry is iterated until we can no longer reduce the entropy of the tensor-Haar expansion of the database.
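As a rough illustration of the tensor-Haar expansion and of an entropy that rewards good organization, the sketch below expands a small matrix in the tensor product of two dyadic Haar bases and uses the l1 norm of the coefficients as the entropy. The dyadic trees, the choice of the l1 norm, and the function names are assumptions made here; a learned pair of partition trees would take the place of the dyadic ones.

```python
import numpy as np

def haar_matrix(n):
    """Orthonormal dyadic Haar matrix (rows are basis vectors), n a power of two."""
    if n == 1:
        return np.ones((1, 1))
    coarse = np.kron(haar_matrix(n // 2), [1.0, 1.0])
    fine = np.kron(np.eye(n // 2), [1.0, -1.0])
    return np.vstack([coarse, fine]) / np.sqrt(2.0)

def tensor_haar_coefficients(M):
    """Coefficients a_R of M in the tensor product of row and column Haar bases."""
    H_rows = haar_matrix(M.shape[0])
    H_cols = haar_matrix(M.shape[1])
    return H_rows @ M @ H_cols.T

def l1_entropy(M):
    """Sum of absolute tensor-Haar coefficients (an assumed entropy measure)."""
    return np.abs(tensor_haar_coefficients(M)).sum()

# An organized database: two constant diagonal blocks.
organized = np.zeros((8, 8))
organized[:4, :4] = 1.0
organized[4:, 4:] = 1.0

rng = np.random.default_rng(1)
scrambled = organized[np.ix_(rng.permutation(8), rng.permutation(8))]

print(l1_entropy(organized), l1_entropy(scrambled))
```

The organized matrix needs only two nonzero coefficients. A scrambling of its rows and columns keeps the same total energy but cannot lower the l1 norm of the coefficients, so reorganizing toward lower entropy moves back toward the block structure.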

Page 24

$$h_R(x, y) = h_I(x)\, h_J(y)$$

$$a_R = \int f(x, y)\, h_R(x, y)\, dx\, dy, \qquad f(x, y) = \sum_R a_R\, h_R(x, y)$$

$$|a_R| < c\, |R|^{1/2 + \beta} \iff f(x, y') = f(x, y) + f(x', y') - f(x', y) + O\big(d(x, x')^{\beta}\, D(y, y')^{\beta}\big)$$

In the setting of a tensor product of two trees, we relate predictability to entropy. Let R = I x J be a bi-folder, where I is a folder in the column tree with associated metric d and J is a folder in the row tree with associated metric D; |R| = |I||J| is the volume of the “rectangle” R, and f represents a database matrix or a function on the product of the column graph with the row graph.

Page 25

Let f be such that $e_\alpha \le 1$. Then there is a decreasing sequence of sets $E_l$ such that $|E_l| \le 2^{-l}$, and a decomposition (of Calderón-Zygmund type)

$$f = g_l + b_l,$$

where $b_l$ is supported on $E_l$ and $g_l$ is bi-Hölder with exponent $\beta = 1/\alpha - 1/2$ and constant $2^{(l+1)/\alpha}$, or equivalently has Haar coefficients satisfying

$$|a_R| < 2^{(l+1)/\alpha}\, |R|^{1/\alpha}.$$

Page 26

Diffusions between A and B have to go through the bottleneck, while C is easily reachable from B. The Markov matrix defining a diffusion could be given by a kernel, or by inference (infection) between neighboring nodes. The diffusion distance d accounts for the preponderance of inference links of length t. The shortest path between A and C is roughly the same as between B and C. The diffusion distance, however, is larger, since diffusion occurs through a bottleneck.

Diffusion Geometry
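The bottleneck effect can be reproduced on a toy graph. The sketch below is an illustration under assumptions: the graph (two cliques joined by a single edge), the node labels, and the standard definition of diffusion distance as a weighted l2 distance between t-step transition probabilities are choices made here, not read off the slide.

```python
import numpy as np

def two_cliques(size=5):
    """Adjacency matrix of two cliques joined by a single bottleneck edge."""
    n = 2 * size
    W = np.zeros((n, n))
    W[:size, :size] = 1.0
    W[size:, size:] = 1.0
    np.fill_diagonal(W, 0.0)
    W[size - 1, size] = W[size, size - 1] = 1.0   # the bottleneck
    return W

def diffusion_distance(W, t, i, j):
    """Distance between the t-step transition probabilities from nodes i and j."""
    degree = W.sum(axis=1)
    P = W / degree[:, None]                 # row-stochastic Markov matrix
    Pt = np.linalg.matrix_power(P, t)
    stationary = degree / degree.sum()
    return np.sqrt(np.sum((Pt[i] - Pt[j]) ** 2 / stationary))

W = two_cliques(5)
A, B, C = 4, 5, 6     # A and B sit on either side of the bottleneck; C is in B's clique
d_AC = diffusion_distance(W, 3, A, C)
d_BC = diffusion_distance(W, 3, B, C)
print(d_AC, d_BC)
```

A is two hops from C and B is one hop from C, so the path lengths are comparable, yet the diffusion distance from A to C comes out larger because a walk from A must cross the single bottleneck edge.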

Page 27

A simple empirical diffusion matrix A can be constructed as follows.

Starting from normalized data, we “soft truncate” the covariance matrix.

A is a renormalized Markov version of this matrix.

The eigenvectors of this matrix provide a local nonlinear principal component analysis of the data; their entries are the diffusion coordinates. These are also the eigenfunctions of a discrete graph Laplace operator.

The map defined by the diffusion coordinates is a diffusion (at time t) embedding into Euclidean space.
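A possible numerical version of this construction is sketched below. The formulas on the slide did not survive in the transcript, so the specific soft truncation used here, exp(-(1 - x_i · x_j) / eps) on unit-normalized points, is an assumption, as are the function name `diffusion_map` and the parameter values.

```python
import numpy as np

def diffusion_map(X, eps, t=1, n_coords=2):
    """Diffusion coordinates of the rows of X, plus the sorted eigenvalues."""
    X = X / np.linalg.norm(X, axis=1, keepdims=True)    # normalize each point
    K = np.exp(-(1.0 - X @ X.T) / eps)                  # assumed soft truncation
    degree = K.sum(axis=1)
    # A = D^-1 K is Markov; S = D^-1/2 K D^-1/2 is symmetric with the same spectrum.
    S = K / np.sqrt(np.outer(degree, degree))
    values, vectors = np.linalg.eigh(S)
    order = np.argsort(values)[::-1]
    values, vectors = values[order], vectors[:, order]
    psi = vectors / np.sqrt(degree)[:, None]            # right eigenvectors of A
    # Skip the trivial constant eigenvector; weight the rest by eigenvalue^t.
    coords = (values[1:n_coords + 1] ** t) * psi[:, 1:n_coords + 1]
    return coords, values

# Points on an arc of the unit circle, presented in random order.
rng = np.random.default_rng(0)
theta = rng.permutation(np.linspace(0.0, 2.0, 50))
X = np.column_stack([np.cos(theta), np.sin(theta)])

coords, values = diffusion_map(X, eps=0.01, t=2, n_coords=2)
print(abs(np.corrcoef(coords[:, 0], theta)[0, 1]))
```

For this one-dimensional example the first nontrivial diffusion coordinate tracks the angle along the arc, so sorting by it approximately recovers the hidden ordering of the points.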

Page 28

Observe that in general any positive kernel with spectrum as above can give rise to a natural orthogonal basis as well as a natural multiscale analysis.

Page 29

The first two eigenfunctions organize the small images, which were provided in random order, in fact assembling the 3D puzzle.

Page 30

Diffusion as a search mechanism. Starting with a few labeled points in two classes, the points are identified by the “preponderance of evidence” (Szummer, Slonim, Tishby…).

Page 31

The image on the left is projected into the three-dimensional space spanned by the eigenvectors 5, 8, and 10, which are active on the scarf.

The image above is viewed as a database of all sub-images of size 5x5; natural structures are discovered through projections on various subspaces.

Page 32

The multiscale organization algorithm to build a hierarchy proceeds as follows. Start with a disjoint partition of the graph into clusters of diameter between 1 and 2 relative to the distance at scale 1. Consider the new graph formed by letting the elements of the partition be the vertices. Using the distance between sets and the affinity between sets described above, we repeat.

Page 33

On this graph we partition again into clusters of diameter between 1 and 2 relative to the set distance (we double the time scale) and redefine the affinity between clusters of clusters using the previously defined affinity between subclusters.

Iterate until only disjoint clusters are left. Another, approximate version of this algorithm is to embed the data using a diffusion map into Euclidean space and pull back a Euclidean-based version of the above.
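The Euclidean variant mentioned last can be sketched in a few lines. This is a simplified illustration under assumptions: clusters are formed greedily as balls around an unassigned point, each cluster is represented at the next level by its centroid rather than by a set distance or affinity, and the loop stops when a single folder remains. The function names and the test data are hypothetical.

```python
import numpy as np

def greedy_partition(points, radius):
    """Partition points into disjoint clusters of diameter at most 2 * radius."""
    labels = -np.ones(len(points), dtype=int)
    next_label = 0
    for i in range(len(points)):
        if labels[i] >= 0:
            continue
        distance = np.linalg.norm(points - points[i], axis=1)
        members = (labels < 0) & (distance <= radius)   # unassigned points near i
        labels[members] = next_label
        next_label += 1
    return labels

def build_hierarchy(points, radius):
    """Bottom-up tree: levels[l][k] is the parent of node k at level l."""
    levels = []
    current = np.asarray(points, dtype=float)
    while len(current) > 1:
        labels = greedy_partition(current, radius)
        levels.append(labels)
        # Assumption: represent each cluster by its centroid at the next level.
        current = np.array([current[labels == k].mean(axis=0)
                            for k in range(labels.max() + 1)])
        radius *= 2.0                                   # double the scale
    return levels

# Four Gaussian clouds in the plane.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
points = np.vstack([c + 0.5 * rng.standard_normal((25, 2)) for c in centers])

levels = build_hierarchy(points, radius=1.0)
print([int(level.max()) + 1 for level in levels])
```

The printed list gives the number of folders at each level, from the finest partition up to the single root folder.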

Page 34

4 Gaussian Clouds

Page 35

A simple example: black disk on white background:

Above are represented the first 4 prolates in the image space (image domain vs. prolate value).

1. Prolates 1 and 2 capture the ratio of black pixels over white pixels.

2. Prolates 3 and 4 capture the angle θ.

3. Locally, 2 prolates are sufficient to describe the data.

Page 36

If a set in high dimensions can be parametrized by, say, the unit square in 2 dimensions, such a parametrization will define an induced metric on the square.

For example, the set of images of 8x8 squares below is naturally parametrized by the average and the orientation of the edge. Their distance in 64 dimensions is roughly the square root of the usual metric.
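The square-root behavior can be checked directly in a simplified case. The sketch below fixes the orientation (a vertical edge) and varies only the edge position, which also changes the average; the binary images and the helper names are assumptions made for this illustration.

```python
import numpy as np

def edge_image(position, size=8):
    """size x size binary image with a vertical edge: ones in the first columns."""
    image = np.zeros((size, size))
    image[:, :position] = 1.0
    return image

def distance(p1, p2):
    """Euclidean distance between two edge images, viewed as points in 64 dimensions."""
    return np.linalg.norm(edge_image(p1) - edge_image(p2))

# Shifting the edge by s columns flips 8 * s pixels, so the distance is sqrt(8 * s).
for shift in (1, 2, 4):
    print(shift, distance(0, shift))
```

Quadrupling the shift in the parameter only doubles the distance in image space, which is the square-root relation between the induced metric and the usual metric on the parameters.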

Page 37
Page 38

“Conceptual folders of patches” correspond to original patches, small curvelets, and regional boundaries. These are the “concepts” for any black and white image with smooth boundaries.

Page 39

References

[1] Blei, D.M., Griffiths, T.L., Jordan, M.I. (2010). The Nested Chinese Restaurant Process and Bayesian Nonparametric Inference of Topic Hierarchies. Journal of the ACM, Vol. 57, No. 2, Article 7, January 2010.
[2] R. Coifman and G. Weiss, Analyse Harmonique Noncommutative sur Certains Espaces Homogenes, Springer-Verlag, 1971.
[3] R. Coifman, G. Weiss, Extensions of Hardy spaces and their use in analysis. Bull. of the A.M.S., 83, #4, 1977, 569-645.
[4] Belkin, M., & Niyogi, P. (2001). Laplacian eigenmaps and spectral techniques for embedding and clustering. Advances in Neural Information Processing Systems 14 (NIPS 2001) (p. 585).
[5] Belkin, M., & Niyogi, P. (2003). Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15(6), 1373-1396.
[6] Coifman, R. R., Lafon, S., Lee, A., Maggioni, M., Nadler, B., Warner, F., & Zucker, S. (2005). Geometric diffusions as a tool for harmonic analysis and structure definition of data. Part I: Diffusion maps. Proc. of Nat. Acad. Sci., 102(21), 7426-7431.
[7] Coifman R.R., S. Lafon, Diffusion maps, Applied and Computational Harmonic Analysis, 21: 5-30, 2006.
[8] Coifman R.R., B. Nadler, S. Lafon, I.G. Kevrekidis, Diffusion maps, spectral clustering and reaction coordinates of dynamical systems, Applied and Computational Harmonic Analysis, 21: 113-127, 2006.

Page 40

[9] Ronald R. Coifman, Mauro Maggioni, Steven W. Zucker and Ioannis G. Kevrekidis, “Geometric diffusions for the analysis of data from sensor networks”, Current Opinion in Neurobiology 2005, 15:576-584.

[10] Ham J, Lee DD, Mika S, Schölkopf B: “A kernel view of the dimensionality reduction of manifolds”. In Proceedings of the XXI Conference on Machine Learning, Banff, Canada, 2004.

[11] R. Coifman, M. Gavish, Harmonic Analysis on Digital Data Bases. To appear in 20 Years of Wavelets conference proceedings, 2011.

[12] R. Coifman, M. Gavish, Tensor product based approximation of empirical functions and analysis of data bases. To appear, ACHA, 2011.

[13] Smolyak, 1963, Quadrature and interpolation formulas for tensor products of certain classes of functions, Soviet Math. Dokl. 4, 240-243. Russian original in Dokl. Akad. Nauk SSSR 148 (1963), 1042-1045.

[14] A detailed video lecture on this topic can be obtained at http://videolectures.net/mlss09us_coifman_mghadb/