Second Exam Presentation: Low Rank Matrix Approximation
John Svadlenka
City University of New York, Graduate Center
Date Pending
Outline
1 Introduction
2 Classical Results
3 Approximation and Probabilistic Results
4 Randomized Algorithms - Strategies and Benefits
5 Research Activity
6 Open Problems and Future Research Directions
Given an m × n matrix A, we are often interested in approximating A as the product of an m × k matrix B and a k × n matrix C.

A ≈ B · C

Why?

Provided it is true that k ≪ min(m, n):

Arithmetic cost of a matrix-vector product is 2(m + n)k
Storage space of the factors B and C is (m + n)k
(m + n)k ≪ m × n

We denote the product B · C as a rank-k approximation of A.
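As a concrete illustration (a minimal numpy sketch; the sizes and rank below are arbitrary example choices):

import numpy as np

m, n, k = 5000, 4000, 20            # example sizes with k << min(m, n)
B = np.random.randn(m, k)           # m x k factor
C = np.random.randn(k, n)           # k x n factor
x = np.random.randn(n)

# Matrix-vector product in factored form: about 2(m + n)k flops,
# versus about 2mn if A = B @ C were formed explicitly.
y = B @ (C @ x)

# Storage: (m + n)k numbers for the factors versus mn for A itself.
print((m + n) * k, "vs", m * n)     # 180000 vs 20000000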
More formally, we seek a rank-k matrix approximation Ã of matrix A for some ε > 0 such that:

‖A − Ã‖ ≤ (1 + ε)‖A − A_k‖

A_k is the theoretical best rank-k approximation of A
Matrix norms are the Frobenius norm ‖·‖_F or the spectral norm ‖·‖_2:

‖A‖_F² := ∑_{i,j=1}^{m,n} |a_ij|²        ‖A‖_2 := sup_{‖v‖_2=1} ‖Av‖_2

A_k can be computed from the SVD with cost O((m + n)mn). So we seek less costly approaches. Why?
Suppose m = n and compare mn(m + n) = 2n³ with n² log n:

n         n³          n² log n
10        1,000       332
100       1.00e+06    66,400
1,000     1.00e+09    1.00e+07
10,000    1.00e+12    1.33e+09
Consider the above statistics in light of some recent trends:
Conventional LRA does not scale for Big Data purposes
Approximation algorithms are increasingly preferred
Applications utilizing numerical linear algebra are expanding beyond traditional scientific and engineering disciplines
Conventional LRA algorithms generate decompositions; the most important of these are the SVD, Rank-Revealing QR (RRQR), and Rank-Revealing LU (RRLU):
Singular Value Decomposition (SVD) [Eckart-Young]
Let A be an m × n matrix with r = rank(A) whose elements may be complex. Then there exist two unitary matrices U and V, and an m × n diagonal matrix Σ with nonnegative elements σ_i, where σ_1 ≥ σ_2 ≥ · · · ≥ σ_r > 0 and σ_j = 0 for j > r, such that:

A = UΣV∗

U and V are m × m and n × n, respectively.
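For reference, the decomposition and its rank-k truncation are a few lines in numpy (a hedged sketch; the matrix and k are arbitrary):

import numpy as np

A = np.random.randn(200, 100)
k = 10

U, s, Vh = np.linalg.svd(A, full_matrices=False)    # A = U @ diag(s) @ Vh
Ak = U[:, :k] @ np.diag(s[:k]) @ Vh[:k, :]          # rank-k truncated SVD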
QR Decomposition
Let A be an m × n matrix with m ≥ n whose elements may be complex. Then there exist an m × n matrix Q and an n × n matrix R such that

A = QR

where the columns of Q are orthonormal and R is upper triangular.

Cost O(mn min(m, n)) is lower than that for the SVD.
There are several efficient strategies to orthogonalize A.
Column i of A is a linear combination of the columns of Q with the coefficients given by column i of R.
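A small numpy check of this column relation (illustrative only):

import numpy as np

A = np.random.randn(6, 4)
Q, R = np.linalg.qr(A)          # reduced QR: Q is 6 x 4, R is 4 x 4

i = 2                           # any column index
assert np.allclose(A[:, i], Q @ R[:, i])   # column i of A = Q times column i of R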
The LRA problem is also significant for these related subjects:
Principal Component Analysis
Clustering Algorithms
Tensor Decomposition
Rank Structured Matrices
But a series of recent trends has provided impetus for new approaches to LRA...
Consider these examples of Emerging Applications and Big Data:
New disciplines: Machine Learning, Data Science, Image Processing
Modern Massive Data Sets from Physical Systems Modelling, Sensor Measurements, the Internet
New Fields: Recommender Systems, Complex Systems Science
Classical LRA algorithms and their implementations, though well-developed over many years, are characterized by:
Limited parallelization opportunities
Relatively high computational complexity
Memory bottlenecks with out-of-core data sets
[Eckart-Young Theorem]
Let A ∈ C^{m×n} and let A_k be the truncated SVD of rank k, where U_k, V_k, and Σ_k are m × k, n × k, and k × k, respectively. We have:

A_k = U_k Σ_k V_k∗

Then the approximation errors are defined as below. Furthermore, these are the smallest errors of any rank-k approximation of A:

‖A − A_k‖_2 = σ_{k+1}

‖A − A_k‖_F = √( ∑_{j=k+1}^{min(m,n)} σ_j² )
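Both identities can be checked numerically (a sketch on a random test matrix; note that s[k] below is σ_{k+1} under zero-based indexing):

import numpy as np

A = np.random.randn(50, 40)
U, s, Vh = np.linalg.svd(A, full_matrices=False)
k = 5
Ak = U[:, :k] @ np.diag(s[:k]) @ Vh[:k, :]

assert np.isclose(np.linalg.norm(A - Ak, 2), s[k])           # spectral error
assert np.isclose(np.linalg.norm(A - Ak, 'fro'),
                  np.sqrt(np.sum(s[k:] ** 2)))               # Frobenius error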
Given a rank-k SVD representation of a matrix we may generate its low rank format:

A_k = U_k · (Σ_k V_k∗)

Other decompositions consist of matrix factors that are orthogonal or that form a row and/or column subset of the original matrix:

RRQR
UTV
CUR
Interpolative Decomposition (ID) (one-sided and two-sided)

We may generate a low rank format similarly with:
W = CUR = [CU] · R = C · [UR]
W = UTV = (UT) · V = U · (TV)
Existence of a QR factorization for any matrix can be proven in many ways. For example, it follows from Gram-Schmidt orthogonalization:
Theorem
Suppose (a_1, a_2, . . . , a_n) is a linearly independent list of vectors of a fixed dimension. Then there is an orthonormal list of vectors (q_1, q_2, . . . , q_n) such that span(a_1, a_2, . . . , a_n) = span(q_1, q_2, . . . , q_n).
Shortcomings of the Gram-Schmidt QR algorithm with respect to LRA:

Problem: The algorithm may fail if rank(A) < n
Solution: Introduce a column pivoting strategy
Impact: A = QRP where P is a permutation matrix

Problem: Rounding error impacts orthogonalization
Solution: Normalize q_i before computing q_{i+1}
Solution: Compute the q_i's up to some epsilon tolerance
Skeleton (CUR) Decomposition Theorem
Let A be an m × n real matrix with rank(A) = k. Then there exists a nonsingular k × k submatrix Â of A.
Moreover, let I and J be the index sets of the rows and columns of Â, respectively, in A. Then A = CUR, where U = Â^{-1}, C = A(1..m, J), and R = A(I, 1..n).

A set of k columns and rows captures A's column and row spaces
The skeleton is in contrast to the SVD's left and right singular vectors
Can use QRP or LUP algorithms to find the submatrix Â
Interpolative Decomposition Lemma
Suppose A is an m × n matrix of rank k whose elements may be complex. Then there exist an m × k matrix B consisting of a subset of the columns of A and a k × n matrix P such that:

A = B · P

The identity matrix I_k appears in some column subset of P
|p_ij| ≤ 1 for all i and j

The ID is more appropriate for data analysis purposes
Also appropriate if properties of A are required in the decomposition
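In practice a one-sided ID can be computed from a pivoted QR factorization. The sketch below is illustrative (one_sided_id is my own helper name); plain pivoted QR does not enforce the |p_ij| ≤ 1 bound of the lemma, whereas strong RRQR [Gu and Eisenstat 1996] does, at extra cost:

import numpy as np
from scipy.linalg import qr

def one_sided_id(A, k):
    # Rank-k ID: A ~ A[:, J] @ P, with J chosen by QR column pivoting.
    Q, R, piv = qr(A, mode='economic', pivoting=True)
    T = np.linalg.solve(R[:k, :k], R[:k, k:])   # coefficients for remaining columns
    P = np.zeros((k, A.shape[1]))
    P[:, piv[:k]] = np.eye(k)                   # I_k sits in the selected columns
    P[:, piv[k:]] = T
    return A[:, piv[:k]], P

A = np.random.randn(100, 8) @ np.random.randn(8, 80)   # exactly rank 8
B, P = one_sided_id(A, 8)
print(np.linalg.norm(A - B @ P))                        # ~ machine precision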
What type of decomposition is better? It depends...

The NLA theoretician's point of view: orthogonal matrices are better.
Input error propagation is minimized
Orthogonal bases reduce the amount of arithmetic
They preserve vector and matrix properties in multiplication
But they are not easy to understand for data analysis

The data analyst's perspective: submatrices are better.
They preserve structural properties of the original matrix
They are easier to understand in application terms
But they may not be well-conditioned
The case for approximation approaches to LRA? A large set of results concerning:
Random Matrices and subspace projections
Existential Results for rank k approximations
Column and/or Row Sampling
Matrix skeletons (CUR) and volume maximization
New algorithmic approaches:
Process some matrix much smaller than the original
Provide arbitrary accuracy up to machine precision
Employ adaptive and non-adaptive strategies
Separate randomized and deterministic processing
Johnson-Lindenstrauss Lemma [1984]
Let X_1, X_2, . . . , X_n ∈ R^d. Then for ε ∈ (0, 1) there exists Φ ∈ R^{k×d} for k = O(ε^{-2} log n) such that:

(1 − ε)‖X_i − X_j‖_2 ≤ ‖ΦX_i − ΦX_j‖_2 ≤ (1 + ε)‖X_i − X_j‖_2

Distances among vectors in Euclidean space are approximately preserved in the lower dimensional space, independent of d
Matrix-vector multiplication is O(d log n) for each X_i
Dasgupta and Gupta (2003) proved that standard Gaussian matrices with i.i.d. N(0, 1) entries can be used for Φ
Achlioptas (2003) showed that random {+1, −1} entries suffice.
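A minimal numpy illustration of the lemma with a Gaussian Φ (the constant inside k and all sizes are arbitrary example choices):

import numpy as np

n, d, eps = 500, 10000, 0.25
k = int(np.ceil(8 * np.log(n) / eps ** 2))   # k = O(log n / eps^2); constant illustrative
X = np.random.randn(n, d)

Phi = np.random.randn(k, d) / np.sqrt(k)     # scaled i.i.d. N(0, 1) entries
Y = X @ Phi.T                                # embed all n points at once

for _ in range(5):                           # spot-check a few pairwise distances
    i, j = np.random.choice(n, size=2, replace=False)
    ratio = np.linalg.norm(Y[i] - Y[j]) / np.linalg.norm(X[i] - X[j])
    print(ratio)                             # should lie roughly in [1 - eps, 1 + eps]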
The next major result: matrix-vector multiplication in O(d log d + |P|).

Fast Johnson-Lindenstrauss Transform [Ailon Chazelle 2006]
Let Φ = PHD, where P ∈ R^{k×d}, H, D ∈ R^{d×d}, and d = 2^l:

P_ij ∼ N(0, q^{-1}) with probability q, and P_ij = 0 with probability 1 − q, where q = min(Θ((log² n)/d), 1)

H is defined recursively by

H_2 = [ d^{-1/2}  d^{-1/2} ; d^{-1/2}  −d^{-1/2} ]   and   H_{2q} := [ H_q  H_q ; H_q  −H_q ],   q = 2^h, h = 1, . . . , l

D is a diagonal matrix with d_ii drawn uniformly from {1, −1}.

Then we have with probability 2/3 that:

(1 − ε)k‖X_i‖_2 ≤ ‖ΦX_i‖_2 ≤ (1 + ε)k‖X_i‖_2
Relative-Error Bound (Frobenius norm) [Sarlos 2006]
Let A ∈ R^{m×n}. If Φ is an r × n J-L transform with i.i.d. zero-mean entries {−1, +1} for r = Θ(k/ε + k log k), and if ε ∈ (0, 1), then with probability ≥ .5 we have that:

‖A − Proj_{AΦᵀ,k}(A)‖_F ≤ (1 + ε)‖A − A_k‖_F

where Proj_{AΦᵀ,k}(A) is the best rank-k approximation of the projection of A onto the column space of AΦᵀ.

Papadimitriou et al. (2000) first applied random projections for Latent Semantic Indexing (LSI) and derived an additive error bound result.
A relative-error bound in the spectral norm uses a power iteration to offset any slow singular value decay.

Relative-Error Bound (Spectral norm) [Halko et al. 2011]
Let A ∈ R^{m×n}. If B is an n × 2k Gaussian matrix and Y = (AA∗)^q AB, where q is a small non-negative integer and 2k is the target rank of the approximation with 2 ≤ k ≤ 0.5 min{m, n}, then:

E‖A − Proj_{Y,2k}(A)‖_2 ≤ [ 1 + 4 √( 2 min(m, n) / (k − 1) ) ]^{1/(2q+1)} ‖A − A_k‖_2

The power iteration raises A's singular values to the power 2q + 1, sharpening their decay and improving accuracy
A refined proof [Woodruff 2014] gave a rank-k approximation
From the Relative-Error Bound results of Sarlos and Halko et al.:

With l > k random linear combinations of A's columns, we can obtain a rank-k approximation of A.

How and why?

Multiplying A by a random vector x gives y ∈ colspace(A)
With high probability the resulting y's are linearly independent
We get an approximate basis of dimension l for the column space of A
Project A onto this basis
Get a rank-k matrix approximation of this projection
Consider the existence result of Ruston (1962) for a collection C of k columns of A ∈ R^{m×n}:

‖A − CC†A‖_2 ≤ √(1 + k(n − k)) ‖A − A_k‖_2

The CX approximation is A ≈ CX where X := C†A
Sampling with Euclidean norms of matrix columns [Frieze Kannan Vempala 2004] gives additive error bounds
Sampling according to the top right singular vectors [Boutsidis Mahoney Drineas 2010] gives relative error bounds
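A sketch of the norm-squared column sampling just cited (sampling with replacement and the usual 1/√(c p_j) rescaling; sample_columns is my own illustrative helper, and all sizes are arbitrary):

import numpy as np

def sample_columns(A, c, rng=np.random.default_rng()):
    # p_j = ||A_j||^2 / ||A||_F^2: sample columns proportionally to squared norms.
    p = np.sum(A * A, axis=0)
    p = p / p.sum()
    idx = rng.choice(A.shape[1], size=c, replace=True, p=p)
    return A[:, idx] / np.sqrt(c * p[idx]), idx

A = np.random.randn(300, 200)
C, idx = sample_columns(A, 40)
X = np.linalg.pinv(C) @ A                    # CX approximation: A ~ C @ X
print(np.linalg.norm(A - C @ X, 'fro') / np.linalg.norm(A, 'fro'))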
Another approach to LRA extends column sampling to also include row sampling:
Extensions to both CX probability distribution approaches
Approximation error proportional to square of the CX error
General Approach:
Sample c columns of A to get C as in CX
Sample r rows from A using a probability distributionconstructed from C
Re-scale the selected rows and columns
Additional processing steps to get an LRA
More recent directions include CUR with volume sampling:

Pseudo-Skeleton Approximation [Goreinov et al 1997]
Suppose A ∈ R^{m×n}. Then there exist a set of k columns and rows, C and R, of A, given by their index sets c and r respectively, and a matrix U ∈ R^{k×k} such that:

‖A − CUR‖_2 ≤ O(√k (√m + √n)) ‖A − A_k‖_2

Maximal Volume for LRA [Goreinov and Tyrtyshnikov 2001]
Suppose Ã is a CUR approximation of the form given above with U = A(r, c)^{-1}. If A(r, c) has maximal determinant modulus among all k × k submatrices of A, then:

‖A − Ã‖_C ≤ (k + 1)‖A − A_k‖_2

where ‖·‖_C denotes the Chebyshev norm, the largest entry in absolute value.
CUR approximation of A depends on finding a sufficiently large volume submatrix:
Submatrix is the intersection of C and R in the CUR
Volume quantifies the orthogonality of matrix columns
It is NP-hard to find a submatrix of maximal volume
Greedy algorithms find approximate maximal volume
This random projection algorithm follows from the J-L Lemma and the Relative-Error Bound results:

Input: A ∈ R^{m×n}, rank k, oversampling parameter p
Output: B ∈ R^{m×(k+p)}, C ∈ R^{(k+p)×n}

1. l ← k + p
2. Construct a random Gaussian matrix G ∈ R^{n×l}
3. Y ← A · G
4. Get an orthogonal basis matrix Q for Y
5. B ← Q
6. C ← Q∗ · A
7. Output B, C

Algorithm 1: Dimension Reduction [Halko et al 2011]
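A direct numpy rendering of Algorithm 1 (a sketch; dimension_reduction is my own helper name, and the optional power iteration of Halko et al. is exposed via q, defaulting to 0 so it matches the listing above; practical codes re-orthonormalize between power steps for stability):

import numpy as np

def dimension_reduction(A, k, p=10, q=0):
    # Returns B (m x l) and C (l x n) with A ~ B @ C, where l = k + p.
    m, n = A.shape
    l = k + p
    G = np.random.randn(n, l)          # random Gaussian multiplier
    Y = A @ G                          # sample the range of A
    for _ in range(q):                 # optional power iterations: Y = (A A^T)^q A G
        Y = A @ (A.T @ Y)
    Q, _ = np.linalg.qr(Y)             # orthogonal basis for range(Y)
    return Q, Q.T @ A                  # B = Q, C = Q* A

A = np.random.randn(400, 300)
B, C = dimension_reduction(A, k=20)

# Rank-l SVD of the approximation, as described on the next slide:
U, s, Vh = np.linalg.svd(C, full_matrices=False)
U = B @ U                              # left singular vectors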
To get a rank-l SVD approximation for A using the algorithm output:

1. Run an SVD algorithm on the matrix C = UΣV∗
2. U ← B · U

Comments on the algorithm:

The algorithm itself uses conventional steps on smaller matrices
Matrix-matrix multiplication (a block operation) is preferable for A
The costliest step is Y ← A · G, requiring O(mnl) ops
The QR factorization may avoid the overhead of column pivoting
The oversampling parameter is typically higher with other random matrices
Other possibilities:

Introduce parallelism for matrix-matrix multiplication
SRFT/SRHT random multipliers reduce the multiplication cost to O(mn log l)
Superfast abridged (sparse) versions of SRFT/SRHT allow further cost reduction, though with no probability guarantee.
The Subsampled Random Hadamard Transform (SRHT) is √(n/l) · DHR, where:

D ∈ C^{n×n} is a diagonal matrix of random {−1, +1} entries
H is the n × n Hadamard matrix
R ∈ R^{n×l} has l random columns from the identity matrix

Gaussian random matrices:
Have to generate n × l entries; multiplication is also expensive
Probability of failure is 3e^{−p}

Fast SRFT/SRHT:
Recursive divide and conquer ⇒ smaller complexity cost
Only n + l random entries needed
Probability of failure rises: O(1/k) for a rank-k approximation
Non-sequential memory access ⇒ memory bottlenecks
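An illustrative dense sketch of the SRHT multiplier (srht_multiplier is my own helper name; scipy.linalg.hadamard forms the ±1 Hadamard matrix explicitly, whereas a true fast version applies H by an O(n log n) recursion instead of dense multiplication):

import numpy as np
from scipy.linalg import hadamard

def srht_multiplier(n, l, rng=np.random.default_rng()):
    # Dense n x l sketching matrix sqrt(n/l) * D H R; n must be a power of 2.
    d = rng.choice([-1.0, 1.0], size=n)              # diagonal of D: random signs
    H = hadamard(n) / np.sqrt(n)                     # orthonormal Hadamard matrix
    cols = rng.choice(n, size=l, replace=False)      # R: random identity columns
    return np.sqrt(n / l) * (d[:, None] * H[:, cols])

n, l = 1024, 32
A = np.random.randn(500, n)
Y = A @ srht_multiplier(n, l)                        # sketched 500 x 32 matrix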
In general, desirable properties of random multipliers include:
Orthogonal
Sparse (but not too sparse)
Structured
Questions to consider with regard to SRFT/SRHT:
Are there alternatives that do not have the memory issues?
Concerns of FFT with limited parallelization
Alternatives - trade off arithmetic complexity for better memory performance and parallelization?
Can we have the best of both worlds?

Results on different multipliers from my own research will be shown...
CUR Cross-Approximation

[Figure: the first three recursive steps of a Cross Approximation algorithm, outputting three striped matrices W1, W2, and W3]

Adapted from Low Rank Approximation: New Insights, Accurate Superfast Algorithms, Pre-processing and Extensions, Victor Y. Pan, Qi Luan, John Svadlenka, Liang Zhao 2017
To complete the CUR approximation:

1. Form the matrix U by taking the inverse of A(I, J)
2. Set C = A(:, J) and R = A(I, :)

How to approximate the maximum volume:

Use RRLU or RRQR algorithms
Example: LU factorization to generate an upper triangular matrix [CT Pan 2000]
For a triangular matrix T ∈ R^{n×n}: det(T) = ∏_{i=1}^{n} t_ii
The goal is to maximize the absolute values on T's diagonal
This involves column interchanges and searching for maximum absolute-valued elements
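For illustration, pivoted QR can play the role of the rank-revealing step; the sketch below (cur_via_pivoted_qr is my own helper name) selects k columns, then k rows, and inverts their intersection. This is a generic heuristic, not the specific [CT Pan 2000] RRLU procedure:

import numpy as np
from scipy.linalg import qr

def cur_via_pivoted_qr(A, k):
    # Choose k columns of A, then k rows, via QR column pivoting.
    _, _, col_piv = qr(A, mode='economic', pivoting=True)
    J = col_piv[:k]                        # column index set
    _, _, row_piv = qr(A[:, J].T, mode='economic', pivoting=True)
    I = row_piv[:k]                        # row index set
    U = np.linalg.inv(A[np.ix_(I, J)])     # inverse of the k x k intersection
    return A[:, J], U, A[I, :]

A = np.random.randn(200, 12) @ np.random.randn(12, 150)   # exactly rank 12
C, U, R = cur_via_pivoted_qr(A, 12)
print(np.linalg.norm(A - C @ U @ R))                       # ~ machine precision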
Some comments on the CUR Cross Approximation:

As with Dimension Reduction, it runs an algorithm on a smaller matrix than A
Each pass through the algorithm's loop requires only O((m + n)k²) ops
Implications of not using all matrix entries in the algorithm?
How to parallelize this algorithm? Perhaps a divide and conquer approach with small blocks.
Formulate random multipliers with the strategy:

1. Utilize structured, sparse primitive matrices of random (Gaussian, Bernoulli) variables to form families of random multipliers B
2. B ∈ R^{n×l}, B = ∑_{i=1}^{t} B_i, where t is a small constant
3. The B_i are chosen and applied from the following classes:
   Abridged and Permuted Hadamard APH (with optional scaling S)
   Orthogonal Permutation matrix P
   Inverse bidiagonal matrix IBD: (I + SZ)^{-1}, where S is a diagonal matrix and Z is the down-shift matrix, with ones on the first subdiagonal and zeros elsewhere
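A sketch of applying an IBD multiplier under the definition above (apply_ibd_multiplier is my own helper name; the inverse is never formed, a triangular solve applies it, and exploiting the bidiagonal structure would make the solve O(nl); the random subdiagonal stands in for S, and all sizes are illustrative):

import numpy as np
from scipy.linalg import solve_triangular

def apply_ibd_multiplier(A, l, rng=np.random.default_rng()):
    # Y = A @ B_ibd, where B_ibd = first l columns of (I + S Z)^{-1}.
    n = A.shape[1]
    sub = rng.standard_normal(n - 1)            # subdiagonal entries of S Z
    M = np.eye(n) + np.diag(sub, k=-1)          # I + S Z: unit lower bidiagonal
    B = solve_triangular(M, np.eye(n, l), lower=True, unit_diagonal=True)
    return A @ B

A = np.random.randn(300, 256)
Y = apply_ibd_multiplier(A, 32)                 # sketched 300 x 32 matrix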
Numerical Experiments: Relative errors with various multipliers

                   SVD-generated Matrices     Laplacian Matrices
Multiplier Sum     Mean        Std            Mean        Std
Gaussian           1.07E-08    3.82E-09       2.05E-13    1.62E-13
ASPH, 2 IBD        1.23E-08    5.84E-09       1.69E-13    1.34E-13
ASPH, 3 IBD        1.33E-08    1.00E-08       1.98E-13    1.30E-13
3 IBD              1.18E-08    6.23E-09       1.78E-13    1.42E-13
APH, 3 IBD         1.28E-08    1.40E-08       2.33E-13    3.44E-13
APH, 2 IBD         1.43E-08    1.87E-08       1.78E-13    1.61E-13
ASPH, 1 P          1.22E-08    1.26E-08       2.21E-13    2.83E-13
ASPH, 2 P          1.51E-08    1.18E-08       3.57E-13    9.27E-13
ASPH, 3 P          1.19E-08    6.93E-09       2.24E-13    1.76E-13
APH, 3 P           1.26E-08    1.16E-08       2.15E-13    1.70E-13
APH, 2 P           1.31E-08    1.18E-08       1.25E-14    5.16E-14
Investigate novel approaches that decrease computation:
Sum of IBD’s without APH, ASPH
IBD is a rank structured matrix: low rank off-diagonal blocks
Matrix Matrix Multiplication with IBD is O((n + l)m) ops
Good spatial and temporal locality (unlike SRFT/SRHT)
Generalize to other rank structured matrices?
Our numerical experiments are promising, but new directions remain to be investigated from a computational perspective:

Incorporate approximate leverage scores
Avoid random memory access (max element searching, column and row interchanges)
Look instead for matrix-matrix multiplication possibilities
Extensions to tensors?
CUR Cross Approximation Benchmark Results

Inputs             rank    mean        std
baart              6       1.94e-07    3.57e-09
shaw               12      3.02e-07    6.84e-09
gravity            25      3.35e-07    1.97e-07
wing               4       1.92e-06    8.78e-09
foxgood            10      7.25e-06    1.09e-06
inverse Laplace    25      2.40e-07    6.88e-08

Table: CUR approximation of benchmark 1000 × 1000 input matrices (at the numerical rank of the input matrices) of discretized Integral Equations from the San Jose State University Singular Matrix Database
Open Problems:

Do there exist random multipliers for Dimension Reduction such that matrix-matrix multiplication can be done faster than O(mn log n)?
Does there exist a CUR approximation algorithm with a relative error (1 + ε) bound in the spectral norm?

Future Research Directions:

Theoretical, algorithmic, and computational research in Low Rank Approximation, its applications, and related problem areas
Acknowledgements

I would like to thank my mentor, Professor Victor Pan, for his thoughtful guidance, insight, and support throughout my doctoral education. I am also grateful to Professors Feng Gu and xxxxx for their participation and interest as committee members for my Second Exam. Thank you.
References I
N. Halko, P. G. Martinsson, J. A. Tropp, Finding Structure with Randomness: Probabilistic Algorithms for Approximate Matrix Decompositions, SIAM Review, 53, 2, 217–288, 2011.
M. W. Mahoney, Randomized Algorithms for Matrices and Data, Foundations and Trends in Machine Learning, NOW Publishers, 3, 2, 2011. Preprint: arXiv:1104.5557 (2011) (Abridged version in: Advances in Machine Learning and Data Mining for Astronomy, edited by M. J. Way et al., 647–672, 2012.)
Woodruff, David P., Sketching as a tool for numerical linear algebra, Foundations and Trends in Theoretical Computer Science, 10, 1–2, 1–157, 2014.
References II
T. Sarlos, Improved Approximation Algorithms for Large Matrices via Random Projections, Proceedings of IEEE Symposium on Foundations of Computer Science (FOCS), 143–152, 2006.
Golub, Gene H., and Christian Reinsch, Singular value decomposition and least squares solutions, Numerische Mathematik, 14, 5, 403–420, 1970.
Axler, Sheldon Jay, Linear Algebra Done Right, Springer, New York, NY, 1997 (second edition).
References III
S. A. Goreinov, E. E. Tyrtyshnikov and N. L. Zamarashkin, A theory of pseudo-skeleton approximation, Linear Algebra and Its Applications, 261, 1–21, 1997.
E. Liberty, F. Woolfe, P. G. Martinsson, V. Rokhlin and M. Tygert, Randomized algorithms for the low rank approximation of matrices, PNAS, 104, 51, 20167-20172, 2007.
F. Woolfe, E. Liberty, V. Rokhlin, and M. Tygert, A fast randomized algorithm for the approximation of matrices, Technical Report YALEU/DCS/TR-1380, Yale University Department of Computer Science, New Haven, CT, 2007.
References IV
W. B. Johnson and J. Lindenstrauss, Extensions of Lipschitz mappings into a Hilbert space, Proc. of modern analysis and probability, Contemporary Mathematics, 26, 189-206, 1984.
N. Ailon and B. Chazelle, Approximate nearest neighbors and the fast Johnson-Lindenstrauss transform, STOC 2006: Proc. 38th Ann. ACM Theory of Computing, 557-563, 2006.
Drineas, Petros, Michael W. Mahoney, and S. Muthukrishnan, Relative-error CUR matrix decompositions, SIAM Journal on Matrix Analysis and Applications, 30, 2, 844-881, 2008.
References V
C. H. Papadimitriou, P. Raghavan, H. Tamaki and S. Vempala, Latent Semantic Indexing: A probabilistic analysis, Journal of Computer and System Sciences, 61, 2, 217-235, 2000.
S. A. Goreinov, N. L. Zamarashkin and E. E. Tyrtyshnikov, Pseudo-skeleton approximations by matrices of maximal volume, Mathematical Notes, 62, 4, 515-519, 1997.
S. A. Goreinov and E. E. Tyrtyshnikov, The maximal-volume concept in approximation by low-rank matrices, Contemporary Mathematics, 208, 47-51, 2001.
References VI
D. Achlioptas, Database-friendly random projections, Proc. ACM Symp. on the Principles of Database Systems, 274-281, 2001.
C.-T. Pan, On the existence and computation of rank-revealing LU factorizations, Linear Algebra and its Applications, 316, 199–222, 2000.
V. Y. Pan, Structured Matrices and Polynomials: Unified Superfast Algorithms, Birkhauser/Springer, Boston/New York, 2001.
Pan, Victor, John Svadlenka, and Liang Zhao, Fast Derandomized Low-rank Approximation and Extensions, CoRR, abs/1607.05801, 2016.
References VII
Rudelson, Mark, and Roman Vershynin, Non-asymptotic theory of random matrices: extreme singular values, CoRR, abs/1003.2990v2, 2010.
Dasgupta, Sanjoy, and Anupam Gupta, An elementary proof of a theorem of Johnson and Lindenstrauss, Random Structures and Algorithms, 22, 1, 60-65, 2003.
A. Frieze, R. Kannan and S. Vempala, Fast Monte-Carlo algorithms for finding low-rank approximations, Journal of the ACM, 51, 6, 1025-1041, 2004.
References VIII
Akin, Berkin, Franz Franchetti, and James C. Hoe, FFTs with near-optimal memory access through block data layouts, Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on, IEEE, 2014.
Barba, Lorena A., and Rio Yokota, How will the fast multipole method fare in the exascale era, SIAM News, 46, 6, 1-3, 2013.
M. Gu and S. C. Eisenstat, Efficient algorithms for computing a strong rank-revealing QR factorization, SIAM Journal of Scientific Computing, 17, 4, 848-869, 1996.
References IX
Lindtjorn, Olav, et al., Beyond traditional microprocessors for geoscience high-performance computing applications, IEEE Micro, 31, 2, 41-49, 2011.
Ruston, A., Auerbach's theorem and tensor products of Banach spaces, Mathematical Proceedings of the Cambridge Philosophical Society, 58, 3, doi:10.1017/S0305004100036744, 476-480, 1962.
Cheng, Hongwei, et al., On the compression of low rank matrices, SIAM Journal on Scientific Computing, 26, 4, 1389-1404, 2005.
APPENDIX: Traditional Applications
Applications of matrix computations have typically included:
Physical Sciences and Engineering
Data Collection and Analysis
Computer Graphics
Biological and Life Sciences
The Theoretical Computer Science (TCS) perspective is increasingly important:
Cross-fertilization of research in both fields
Demands of new applications of interest to TCS
Shortcomings of conventional LRA algorithms
APPENDIX: Two-sided Interpolative Decomposition
Two-sided Interpolative Decomposition Theorem [Cheng et al 2005]
Let A be an m × n matrix and k ≤ min(m, n). Then there exists a k × k submatrix A_s of A such that:

A = P_L [ I_k ; S ] A_s [ I_k | T ] P_R∗ + X

where [ I_k ; S ] stacks I_k above S and [ I_k | T ] places them side by side. P_L and P_R are permutation matrices, and S ∈ C^{(m−k)×k}, T ∈ C^{k×(n−k)}, and X satisfy:

‖S‖_F ≤ √(k(m − k))
‖T‖_F ≤ √(k(n − k))
‖X‖_2 ≤ σ_{k+1}(A) √(1 + k(min(m, n) − k))
APPENDIX: Deterministic Algorithms
Theorem
Gram-Schmidt and QR Factorization: Suppose (a_1, a_2, . . . , a_n) is a linearly independent list of vectors in an inner product space V. Then there is an orthonormal list of vectors (q_1, q_2, . . . , q_n) such that span(a_1, a_2, . . . , a_n) = span(q_1, q_2, . . . , q_n).
Proof.
Let proj(r, s) := (⟨s, r⟩/⟨s, s⟩) s denote the projection of r onto s.

w_1 := a_1
w_2 := a_2 − proj(a_2, w_1)
⋮
w_n := a_n − proj(a_n, w_1) − proj(a_n, w_2) − · · · − proj(a_n, w_{n−1})
q_1 = w_1/‖w_1‖, q_2 = w_2/‖w_2‖, . . . , q_n = w_n/‖w_n‖
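The construction transcribes directly into code (a classical Gram-Schmidt sketch; classical_gram_schmidt is my own helper name, and modified Gram-Schmidt is preferred numerically, per the shortcomings discussed earlier):

import numpy as np

def classical_gram_schmidt(A):
    # Columns of A assumed linearly independent; returns Q with orthonormal columns.
    m, n = A.shape
    Q = np.zeros((m, n))
    for i in range(n):
        w = A[:, i].copy()
        for j in range(i):                        # subtract projections on earlier q's
            w -= (Q[:, j] @ A[:, i]) * Q[:, j]
        Q[:, i] = w / np.linalg.norm(w)
    return Q

Q = classical_gram_schmidt(np.random.randn(8, 5))
print(np.allclose(Q.T @ Q, np.eye(5)))            # True up to rounding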
APPENDIX: Deterministic Algorithms
Re-arranging the equations for w_1, w_2, . . . , w_n to be equations with a_1, a_2, . . . , a_n on the left-hand side and replacing the w_i with q_i gives A = Q · R, where

A = [a_1, a_2, . . . , a_n]
Q = [q_1, q_2, . . . , q_n]

R = [ ⟨q_1, a_1⟩   ⟨q_1, a_2⟩   ⟨q_1, a_3⟩   . . .   ⟨q_1, a_n⟩
      0            ⟨q_2, a_2⟩   . . .        ⟨q_2, a_{n−1}⟩   ⟨q_2, a_n⟩
      0            0            ⟨q_3, a_3⟩   . . .   ⟨q_3, a_n⟩
      ⋮            ⋮            ⋮            ⋱       ⋮
      0            0            0            0       ⟨q_n, a_n⟩ ]
APPENDIX: Deterministic Algorithms
As an alternative to Gram-Schmidt QR, consider an orthogonal matrix product Q_1 Q_2 . . . Q_n that transforms A to upper triangular form R:

(Q_n . . . Q_2 Q_1) A = R

Multiplying both sides by (Q_n . . . Q_2 Q_1)^{-1}, we have that:

(Q_n . . . Q_2 Q_1)^{-1} (Q_n . . . Q_2 Q_1) A = (Q_n . . . Q_2 Q_1)^{-1} R
A = Q_1 Q_2 . . . Q_n R

A product of orthogonal matrices is also orthogonal, so allowing for column pivoting we have that:

AΠ = Q_1 Q_2 . . . Q_n R

A Householder reflection matrix is used for each Q_i, i = 1, 2, . . . , n, to transform A to R column-wise...
APPENDIX: Deterministic Algorithms
A Householder matrix-vector multiplication Hx = (I − 2vvᵀ)x reflects a vector x across the hyperplane normal to v.

A unit vector v is constructed for each Q_i Householder matrix so that the entries of column i below the diagonal of A vanish:

x = (a_{i,i}, a_{i+1,i}, . . . , a_{m,i}) is the part of column i on and below the diagonal
v depends upon x and the standard basis vector e_i
The matrix product Q_i · A is applied
The above steps are repeated for each column of A

Impact to the QR algorithm:

Householder matrices improve numerical stability
But each matrix Q_i is applied separately to A
Therefore, parallelism options are limited
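A compact numpy sketch of Householder QR along these lines (householder_qr is my own helper name; the reflector acts on the trailing subcolumn, and accumulating Q explicitly is for illustration only):

import numpy as np

def householder_qr(A):
    # QR of an m x n matrix with m >= n: returns Q (m x m) and R (m x n).
    m, n = A.shape
    R = A.astype(float).copy()
    Q = np.eye(m)
    for i in range(n):
        x = R[i:, i]                                         # column i, on/below diagonal
        v = x.copy()
        v[0] += (np.sign(x[0]) or 1.0) * np.linalg.norm(x)   # v = x + sign(x_1)||x|| e_1
        v /= np.linalg.norm(v)
        R[i:, :] -= 2.0 * np.outer(v, v @ R[i:, :])          # left-apply H = I - 2 v v^T
        Q[:, i:] -= 2.0 * np.outer(Q[:, i:] @ v, v)          # accumulate Q = Q_1 Q_2 ... Q_n
    return Q, R

A = np.random.randn(6, 4)
Q, R = householder_qr(A)
print(np.allclose(Q @ R, A))                                 # True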
APPENDIX: Deterministic Algorithms
The SVD decomposition A = UΣV∗ is computed in two distinct steps:

1st Step: Use two sequences of Householder transformations to reduce A to upper bidiagonal form:

B = Q_n . . . Q_2 Q_1 A P_1 P_2 . . . P_{n−2}

Therefore, we have that: A = Q_1 Q_2 . . . Q_n B P_{n−2} . . . P_2 P_1

2nd Step: Use two sequences of Givens rotations (orthogonal transformations) to reduce B to diagonal form Σ:

Σ = G_{n−1} . . . G_2 G_1 B F_1 F_2 . . . F_{n−1}

Likewise, we have that: B = G_1 G_2 . . . G_{n−1} Σ F_{n−1} . . . F_2 F_1

Set U := Q_1 Q_2 . . . Q_n G_1 G_2 . . . G_{n−1}
Set V := (F_1 F_2 . . . F_{n−1})∗ (P_1 P_2 . . . P_{n−2})∗
APPENDIX: Deterministic Algorithms
SVD cost is O(mn max(m, n))
QR cost is O(kmn) for a rank-k approximation
Random memory access (e.g., column pivoting) contributes to memory bottlenecks
This is especially the case for out-of-core data sets
The standard QR algorithm forms Q from a product of Householder reflector matrices, which permits better numerical stability.