Random Projections in Dimensionality Reduction


RANDOM PROJECTIONS IN DIMENSIONALITY REDUCTION

    APPLICATIONS TO IMAGE AND TEXT DATA

Ella Bingham and Heikki Mannila

Angelo Cardoso
IST/UTL, November 2009


    Outline

1. Dimensionality Reduction: Motivation
2. Methods for dimensionality reduction
   1. PCA
   2. DCT
   3. Random Projection
3. Results on Image Data
4. Results on Text Data
5. Conclusions


Dimensionality Reduction: Motivation

- Many applications have high-dimensional data:
  - Market basket analysis: wealth of alternative products
  - Text: large vocabulary
  - Image: large image window
- We want to process the data
  - High dimensionality of the data restricts the choice of data processing methods
  - The time needed to use processing methods is too long
  - Memory requirements make it impossible to use some methods


Dimensionality Reduction: Motivation

- We want to visualize high-dimensional data
- Some features may be irrelevant
- Some dimensions may be highly correlated with some others, e.g. height and foot size
- The intrinsic dimensionality may be smaller than the number of features
- The data can be best described and understood by a smaller number of dimensions


    Methods for dimensionality reduction

- The main idea is to project the high-dimensional (d) space into a lower-dimensional (k) space
- A statistically optimal way is to project into a lower-dimensional orthogonal subspace that captures as much of the variation of the data as possible for the chosen k
- The best (in terms of mean squared error) and most widely used way to do this is PCA
- How to compare different methods?
  - Amount of distortion caused
  - Computational complexity


Principal Components Analysis (PCA): Intuition

- Given an original space in 2-d
- How can we represent the points in a k-dimensional space (k < d)?


Principal Components Analysis (PCA): Algorithm

- Eigenvalues: a measure of how much data variance is explained by each eigenvector
- Singular Value Decomposition (SVD) can be used to find the eigenvectors and eigenvalues of the covariance matrix
- To project into the lower-dimensional space: subtract the mean of X in each dimension and multiply by the principal components (PCs)
- To restore into the original space: multiply the projection by the principal components and add the mean of X in each dimension

Algorithm (a NumPy sketch follows below):

1. X <- N x d data matrix, with one row vector x_n per data point
2. X <- subtract the mean x̄ from each dimension of X
3. Σ <- covariance matrix of X
4. Find the eigenvectors and eigenvalues of Σ
5. PCs <- the k eigenvectors with the largest eigenvalues
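As a concrete illustration of the steps above, here is a minimal NumPy sketch (not from the original slides; the function names, the use of np.linalg.eigh, and the illustrative data are assumptions):

```python
import numpy as np

def pca_project(X, k):
    """Project the N x d data matrix X onto its top-k principal components."""
    mean = X.mean(axis=0)                      # per-dimension mean (step 2)
    Xc = X - mean                              # center the data
    cov = np.cov(Xc, rowvar=False)             # d x d covariance matrix (step 3)
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigen-decomposition, ascending order (step 4)
    order = np.argsort(eigvals)[::-1][:k]      # indices of the k largest eigenvalues (step 5)
    pcs = eigvecs[:, order]                    # d x k matrix of principal components
    return Xc @ pcs, pcs, mean                 # N x k projection

def pca_restore(Y, pcs, mean):
    """Map the k-dimensional projection back to the original d-dimensional space."""
    return Y @ pcs.T + mean

# Purely illustrative usage on random data
X = np.random.rand(100, 50)
Y, pcs, mean = pca_project(X, k=5)
X_restored = pca_restore(Y, pcs, mean)
```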


Random Projection (RP): Idea

- PCA, even when calculated using SVD, is computationally expensive
  - Complexity is O(dcN), where d is the number of dimensions, c is the average number of non-zero entries per column, and N is the number of points
- Idea: what if we randomly constructed the principal component vectors?
- Johnson-Lindenstrauss lemma: if points in a vector space are projected onto a randomly selected subspace of suitably high dimension, then the distances between the points are approximately preserved (a quantitative statement is given below)
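For reference, one common quantitative form of the lemma, not shown on the slide (ε is the distortion tolerance and n the number of points):

```latex
% Johnson-Lindenstrauss lemma (one common quantitative form):
% for any 0 < \epsilon < 1 and any n points in \mathbb{R}^d,
% if k = O(\epsilon^{-2} \log n) there is a linear map f : \mathbb{R}^d \to \mathbb{R}^k with
(1-\epsilon)\,\lVert u - v \rVert^2
  \;\le\; \lVert f(u) - f(v) \rVert^2
  \;\le\; (1+\epsilon)\,\lVert u - v \rVert^2
  \qquad \text{for all pairs of points } u, v .
```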


Random Projection (RP): Idea

- Use a random matrix R in place of the principal components matrix
  - R is usually Gaussian distributed
  - Complexity is O(kcN)
- The generated random matrix R is usually not orthogonal
  - Making R orthogonal is computationally expensive
  - However, we can rely on a result by Hecht-Nielsen: in a high-dimensional space, there exists a much larger number of almost orthogonal than orthogonal directions; thus vectors with random directions are close enough to orthogonal
- Euclidean distances in the projected space can be scaled back to the original space by the factor √(d/k) (see the sketch below)
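A minimal sketch of this projection, assuming R has i.i.d. N(0, 1) entries with each column normalized to unit length and pairwise distances rescaled by √(d/k) as on the slide (the names and illustrative data are mine):

```python
import numpy as np

def random_projection(X, k, rng=None):
    """Project the N x d matrix X onto k random directions (Gaussian R, unit-length columns)."""
    rng = np.random.default_rng(rng)
    d = X.shape[1]
    R = rng.standard_normal((d, k))            # entries ~ N(0, 1)
    R /= np.linalg.norm(R, axis=0)             # unit-length random directions (not orthogonalized)
    return X @ R                               # N x k projected data

def scaled_distance(yi, yj, d, k):
    """Estimate the original-space Euclidean distance from the projected one via sqrt(d/k)."""
    return np.sqrt(d / k) * np.linalg.norm(yi - yj)

# Illustrative usage: 50x50 images flattened to 2500-d vectors
X = np.random.rand(200, 2500)
Y = random_projection(X, k=50, rng=0)
estimated = scaled_distance(Y[0], Y[1], d=2500, k=50)
true = np.linalg.norm(X[0] - X[1])
```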


Random Projection: Simplified Random Projection (SRP)

- The random matrix is usually Gaussian distributed (mean 0, standard deviation 1)
- Achlioptas showed that a much simpler distribution can be used (sketched below)
- This implies further computational savings, since the matrix is sparse and the computations can be performed using integer arithmetic
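The slide's formula for the simpler distribution did not survive transcription; as given by Achlioptas, each entry of R is +√3 with probability 1/6, 0 with probability 2/3, and -√3 with probability 1/6. A sketch under that assumption:

```python
import numpy as np

def sparse_random_matrix(d, k, rng=None):
    """Achlioptas-style sparse random projection matrix:
    entries are +sqrt(3) w.p. 1/6, 0 w.p. 2/3, -sqrt(3) w.p. 1/6."""
    rng = np.random.default_rng(rng)
    entries = rng.choice([1, 0, -1], size=(d, k), p=[1 / 6, 2 / 3, 1 / 6])
    return np.sqrt(3) * entries                # the +-1 / 0 pattern itself is integer-valued

def srp_project(X, k, rng=None):
    """Simplified random projection of the N x d matrix X into k dimensions."""
    R = sparse_random_matrix(X.shape[1], k, rng)
    return X @ R

# Roughly two thirds of R's entries are zero, so the product X @ R touches only
# about a third of the data, and the sparse +-1 pattern needs only additions.
```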


Discrete Cosine Transform (DCT)

- Widely used method for image compression
- Optimal for the human eye: distortions are introduced at the highest frequencies, which humans tend to neglect as noise
- DCT is not data-dependent, in contrast to PCA, which needs the eigenvalue decomposition
- This makes DCT orders of magnitude cheaper to compute (a sketch follows below)
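A minimal sketch of DCT-based reduction on a single image, assuming SciPy's scipy.fft.dctn/idctn and the simple strategy of keeping only the top-left k x k block of low-frequency coefficients (the exact coefficient-selection scheme used in the experiments is not stated here):

```python
import numpy as np
from scipy.fft import dctn, idctn

def dct_reduce(image, k):
    """Keep only the k x k lowest-frequency 2-D DCT coefficients of an image."""
    coeffs = dctn(image, norm='ortho')         # data-independent transform, no training step
    return coeffs[:k, :k]                      # low frequencies carry most perceptual content

def dct_restore(reduced, shape):
    """Zero-pad the kept coefficients and invert the transform."""
    full = np.zeros(shape)
    full[:reduced.shape[0], :reduced.shape[1]] = reduced
    return idctn(full, norm='ortho')

# Illustrative usage on a 50x50 "image": 100 coefficients instead of 2500 pixels
img = np.random.rand(50, 50)
small = dct_reduce(img, k=10)
approx = dct_restore(small, img.shape)
```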


Results: Noiseless Images

[Result figures omitted from the transcript]

- Original space: 2500-d (100 image pairs with 50x50 pixels)
- Error measurement: average error in the Euclidean distance between 100 pairs of images in the original and the reduced space (a sketch of this measure follows below)
- Amount of distortion
  - RP and SRP give accurate results already for very small k (k > 10)
    - Distance scaling might be an explanation for this success
  - PCA gives accurate results for k > 600
    - In PCA such scaling is not straightforward
  - DCT still has a significant error even for k > 600
- Computational complexity: the number of floating point operations for RP and SRP is on the order of 100 times less than for PCA
- RP and SRP clearly outperform PCA and DCT at the smallest dimensions
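One plausible reading of that error measure, sketched below; the exact pairing and normalization are not given in the transcript, so treat this as an assumption (for RP the reduced-space distances would first be rescaled by √(d/k)):

```python
import numpy as np

def pairwise_distance_error(X_orig, X_red, pairs, scale=1.0):
    """Average absolute difference between the Euclidean distance of each pair
    in the original space and in the (optionally rescaled) reduced space."""
    errors = []
    for i, j in pairs:
        d_orig = np.linalg.norm(X_orig[i] - X_orig[j])
        d_red = scale * np.linalg.norm(X_red[i] - X_red[j])
        errors.append(abs(d_red - d_orig))
    return float(np.mean(errors))

# Illustrative pairing: 100 disjoint pairs (0,1), (2,3), ..., (198,199)
pairs = [(2 * m, 2 * m + 1) for m in range(100)]
```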


Results: Noisy Images

- Images were corrupted by salt-and-pepper impulse noise with probability 0.2
- Error is computed in the high-dimensional noiseless space
- RP, SRP, PCA and DCT perform quite similarly to the noiseless case


Results: Text Data

- Data set: Newsgroups corpus (sci.crypt, sci.med, sci.space, soc.religion)
- Pre-processing
  - Term frequency vectors
  - Some common terms were removed, but no stemming was used
  - Document vectors were normalized to unit length
  - The data was not made zero mean
- Size: 5000 terms, 2262 newsgroup documents
- Error measurement: 100 pairs of documents were randomly selected, and the error between their cosine similarity before and after the dimensionality reduction was calculated (a sketch follows below)
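A corresponding sketch for the text experiments; the exact error definition is an assumption (absolute difference of the cosine similarity of each document pair before and after reduction):

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def pairwise_cosine_error(X_orig, X_red, pairs):
    """Average absolute difference between the cosine of each document pair
    before and after the dimensionality reduction."""
    return float(np.mean([abs(cosine(X_red[i], X_red[j]) - cosine(X_orig[i], X_orig[j]))
                          for i, j in pairs]))

# e.g. 100 random pairs drawn from the 2262 x 5000 term-frequency matrix.
```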


Results: Text Data

- The cosine was used as the similarity measure since it is more common for this task
- RP is not as accurate as SVD
  - The Johnson-Lindenstrauss result states that Euclidean distances, not cosines, are retained well under random projection
- The RP error may be neglected in most applications
- RP can be used on large document collections with less computational complexity than SVD


    Conclusion

- Random Projection is an effective dimensionality reduction method for high-dimensional real-world data sets
- RP preserves the similarities even when the data is projected into a moderate number of dimensions
- RP is beneficial in applications where the distances of the original space are meaningful
- RP is a good alternative to traditional dimensionality reduction methods, which can be infeasible for high-dimensional data; RP does not suffer from the curse of dimensionality in the same way


Questions