Transcript of “Numerical Methods for Data Science: Latent Factor Models, Part I” (slides at …bindel/present/2019-04-umcp...)
Numerical Methods for Data Science: Latent Factor Models, Part I

David Bindel, 16 April 2019

Department of Computer Science, Cornell University
Cornell CS 6241: Spring 2018 + Fall 2019

SJTU CS 258: June 2018 + May 2019
How do we teach numerical methods in a “data science” age?
• CS 4220 (S12): Data science projects in NA course
• CS 6241 (S18): “Numerical methods for data science”
• SJTU CS 258 (Jun 18): Undergrad version
• CS 3220 (F19): “Computational math for CS”
Sabbatical project: book for use in Fall 19. Ask to hear more!
Lecture plan
Three threads from “lay of the land” to current research:
• Week 1: Latent Factor Models
  • Tue, Apr 16: Matrix data and decompositions
  • Thu, Apr 18: NMF and topic models
• Week 2: Scalable Kernel Methods
  • Tue, Apr 23: Structure and interpretation of kernels
  • Thu, Apr 25: Making kernel methods scale
• Week 3: Spectral Network Analysis
  • Tue, Apr 30: Network spectra, optimization, and dynamics
  • Wed, May 1: Network densities of states
Matrix Data: Relations and Ranking
[Ratings matrix: rows = users (Alice, Bob, Carol, Dan, …), columns = movies (Casablanca, Forrest Gump, Rocky, The Matrix, …); entries are known ratings (e.g. 5, 1, 2), with many entries missing.]
Matrix Data: Images and Functions
Matrix Data: Networks and Relations
[Directed graph on nodes Paul, Alice, Bob, Carol, David.]

$$
W = \begin{bmatrix}
0 & 0 & w_{AC} & 0 & w_{AP} \\
0 & 0 & 0 & 0 & w_{BP} \\
w_{CA} & 0 & 0 & 0 & w_{CP} \\
0 & 0 & 0 & 0 & w_{DP} \\
0 & 0 & 0 & 0 & 0
\end{bmatrix}
$$
Matrix Data
• Relations between objects and features, e.g.
  • Documents and word frequencies (“bag of words” models)
  • User rankings of movies, songs, etc.
  • Images and pixel values
  • Indicators for DNA markers in organisms
  • Treatments and outcome histograms
  • Snapshots of a state vector at different times
• Grid samples of bivariate functions
  • Image pixels indexed by row and column
  • PMF values for a discrete 2D random variable
• Relations between pairs of objects
  • Network relations (e.g. friendships)
  • Co-occurrence statistics, interaction frequencies, etc.
Can generalize beyond pairwise data with tensor methods.
Latent Factor Modeling
A ≈ L M R
Map objects and features to latent coordinates, and model the data as a bilinear function of those coordinates:

$$a_{ij} \approx l_{i,:} \, M \, r_{:,j}.$$
Unknowns: latent coordinates for points, bilinear form.
This is underdetermined — need constraints on L, M, R for uniqueness.
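The bilinear model and its non-uniqueness can be sketched in a few lines of numpy (illustrative, with hypothetical sizes m, n, r): any invertible change of latent coordinates leaves the product unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 6, 5, 2
L = rng.standard_normal((m, r))   # latent coordinates for objects (rows)
M = rng.standard_normal((r, r))   # bilinear form on the latent space
R = rng.standard_normal((r, n))   # latent coordinates for features (columns)
A = L @ M @ R                     # entrywise: a_ij = l_{i,:} M r_{:,j}

i, j = 2, 3
assert np.isclose(A[i, j], L[i, :] @ M @ R[:, j])

# Underdetermined: (L S, S^{-1} M T^{-1}, T R) reproduces the same A
# for any invertible S, T (shifted by 3I here so they are invertible).
S = rng.standard_normal((r, r)) + 3 * np.eye(r)
T = rng.standard_normal((r, r)) + 3 * np.eye(r)
L2 = L @ S
M2 = np.linalg.solve(S, M) @ np.linalg.inv(T)
R2 = T @ R
assert np.allclose(L2 @ M2 @ R2, A)
```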
Latent Factor Modeling
Model data arranged as a matrix A ∈ Rm×n by
$$A \approx LMR, \qquad L \in \mathbb{R}^{m \times r},\ M \in \mathbb{R}^{r \times r},\ R \in \mathbb{R}^{r \times n}$$
perhaps with constraints on L, M, and R.
Latent Factor Modeling
Model data arranged as a matrix A ∈ Rm×n by
$$A \approx LMR, \qquad L \in \mathbb{R}^{m \times r},\ M \in \mathbb{R}^{r \times r},\ R \in \mathbb{R}^{r \times n}$$
perhaps with constraints on L, M, and R.
From this, we would like to
• Compress data
• Remove “noise” (including outliers)
• Fill in missing data
• Cluster and classify objects
• Find meaningful “parts” to data
with the Magic of Matrices.
What is The Matrix?
A ∈ Rm×n is an m-by-n array of real numbers:
$$A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}$$
Is this right? Consider some meanings of aij:

• Image pixel at row i and column j
• Departure of train j from station i
• Frequency of word j in document i
None seem all that linear algebraic!
Seek useful puns between matrices and arrays.
What is The Matrix?
In LA, a matrix represents (w.r.t. basis choice):
| Object | Definition | Matrix action |
| --- | --- | --- |
| Linear map | L : V → W | w = Av |
| Linear operator | L : V → V | w = Av |
| Sesquilinear form | b : V × W → R | b = w∗Av |
| Quadratic form | ϕ : V × V → R | ϕ = v∗Av |
Different possible attitudes toward bases:

• Pure LA: The LA object is the thing
  • Think basis independent (except canonical choices)
• Numerical LA: Sometimes bases matter!
  • Sparsity, shape, non-negativity
• Data: The matrix (array) is the thing
Matrices in Linear Algebra: Canonical Forms
| | General bases | Orthonormal bases |
| --- | --- | --- |
| Linear map (or bilinear form) | Rank/nullity: A = X Iₖ Y∗ | SVD: A = UΣV∗ |
| Linear operator | Jordan form: A = VJV⁻¹ | Schur form: A = UTU∗ |
| Quadratic form | Sylvester inertia: A = X₁X₁∗ − X₂X₂∗ | Symm eigendecomp: A = VΛV∗ |
What if basis choice (identity of rows and columns) matters?
• Natural transformations are basis permutations.
• Permutation matrices ⊂ orthogonal matrices.
• Orthogonal decompositions make good building blocks!
The SVD and the Symmetric Eigenvalue Problem
SVD: A = UΣV∗

• U, V orthonormal
• Σ = diag(σ1, σ2, …)
• σj descending, nonnegative
• Full: Σ rectangular
• Economy: U rectangular

SEP: A = VΛV∗

• V orthonormal
• Λ = diag(λ1, λ2, …)
• λj descending
• A = A∗
• Agrees with SVD if A is SPD
Construction
Variational construction of SVD (similar for SEP):
• Maximize ∥Av∥ over ∥v∥ = 1
  • Unique maximum value, no local maxima!
  • Result: Av1 = σ1u1 where u1 = Av1/∥Av1∥
• Maximize ∥Av∥ over ∥v∥ = 1 and v ⊥ v1
  • Again a “nice” optimization problem
  • Result: Av2 = σ2u2
• Continue in this fashion
Completely greedy is OK in principle! No need to backtrack.
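The greedy construction can be sketched directly. Here the inner maximization is done by power iteration on A∗A restricted to the orthogonal complement of the earlier maximizers (an illustrative choice; the helper `greedy_singular_pairs` and the test matrix are hypothetical):

```python
import numpy as np

def greedy_singular_pairs(A, k, iters=200):
    """Greedy variational SVD sketch: maximize ||A v|| over unit v,
    orthogonal to the maximizers found so far."""
    n = A.shape[1]
    rng = np.random.default_rng(0)
    Vs, sigmas = [], []
    for _ in range(k):
        v = rng.standard_normal(n)
        for _ in range(iters):
            for w in Vs:                  # keep v ⊥ earlier maximizers
                v -= (w @ v) * w
            v /= np.linalg.norm(v)
            v = A.T @ (A @ v)             # one power step toward the max
        for w in Vs:
            v -= (w @ v) * w
        v /= np.linalg.norm(v)
        Vs.append(v)
        sigmas.append(np.linalg.norm(A @ v))
    return np.array(sigmas), np.column_stack(Vs)

# Test matrix with known, well-separated singular values 4, 2, 1, 0.5, 0.25.
rng = np.random.default_rng(1)
U0 = np.linalg.qr(rng.standard_normal((8, 5)))[0]
V0 = np.linalg.qr(rng.standard_normal((5, 5)))[0]
A = U0 * np.array([4.0, 2.0, 1.0, 0.5, 0.25]) @ V0.T
sig, V = greedy_singular_pairs(A, 3)
assert np.allclose(sig, [4.0, 2.0, 1.0])
```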
Matrices and Data: Low-Rank Approximation by SVD
$$A = U\Sigma V^* \approx U_k \Sigma_k V_k^*$$
Matrices and Data: Low-Rank Approximation by SVD
Consider the (economy) SVD of A ∈ Rm×n for m ≥ n:

$$A = U\Sigma V^*, \qquad U \in \mathbb{R}^{m \times n},\ \Sigma \in \mathbb{R}^{n \times n},\ V \in \mathbb{R}^{n \times n},$$

where U∗U = I, V∗V = I, and Σ is diagonal with

$$\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n \ge 0.$$

Eckart–Young: the best rank-k approximation is the truncated SVD

$$A_k = U_k \Sigma_k V_k^*.$$

This holds in both the Frobenius norm and the spectral norm, with errors

$$\|A - A_k\|_F^2 = \sum_{j=k+1}^{n} \sigma_j^2, \qquad \|A - A_k\|_2 = \sigma_{k+1}.$$
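The two error formulas are easy to check numerically on a random matrix (an illustrative sketch with hypothetical sizes m = 8, n = 6, k = 2):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 8, 6, 2
A = rng.standard_normal((m, n))

U, s, Vt = np.linalg.svd(A, full_matrices=False)
Ak = U[:, :k] * s[:k] @ Vt[:k, :]          # truncated SVD A_k

err_F2 = np.linalg.norm(A - Ak, 'fro') ** 2
err_2 = np.linalg.norm(A - Ak, 2)
assert np.isclose(err_F2, np.sum(s[k:] ** 2))   # Frobenius: tail sum of sigma_j^2
assert np.isclose(err_2, s[k])                  # spectral: sigma_{k+1}
```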
Actual Computation
SVD and SEP admit similar computational schemes
• Small and dense (up to a couple 1000; MATLAB eig/svd):
  • Orthogonal reduction to bidiagonal / tridiagonal (O(n³))
  • Cheaper reduction the rest of the way to the decomposition
• Subspace iteration:
  • Start with random orthonormal columns
  • Multiply by A (and maybe A∗) a couple of times
  • RandNLA: few multiplies needed for OK accuracy (with oversampling)
  • Good for cache use, great for modest accuracy
• Krylov methods (MATLAB eigs and svds):
  • Repeatedly multiply a few vectors by A (and maybe A∗)
  • Orthogonalize at each step (parallel bottleneck)
  • Various restarting and acceleration tricks
  • High accuracy for a few extreme pairs
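The randomized subspace-iteration idea can be sketched in a few lines (in the RandNLA style mentioned above; the helper name, oversampling p, and iteration count q are illustrative choices, not a specific published code):

```python
import numpy as np

def randomized_svd(A, k, p=10, q=2, seed=0):
    """Random start, a few multiplies by A and A^T with re-orthonormalization,
    then a small dense SVD of the projected matrix."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    Q = np.linalg.qr(A @ rng.standard_normal((n, k + p)))[0]
    for _ in range(q):                     # power (subspace) iterations
        Q = np.linalg.qr(A.T @ Q)[0]
        Q = np.linalg.qr(A @ Q)[0]
    U, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
    return (Q @ U)[:, :k], s[:k], Vt[:k, :]

# Matrix with rapidly decaying spectrum, so modest effort gives high accuracy.
rng = np.random.default_rng(1)
U0 = np.linalg.qr(rng.standard_normal((50, 50)))[0]
V0 = np.linalg.qr(rng.standard_normal((40, 40)))[0]
s0 = 2.0 ** -np.arange(40.0)
A = U0[:, :40] * s0 @ V0.T
U, s, Vt = randomized_svd(A, k=5)
assert np.allclose(s, s0[:5], rtol=1e-6)
```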
Some Computational Aspects
Big takeaways:
• We have good codes that (mostly) “just work”
• People have thought through how to make them run fast
• Have good backward stability properties
• No misconvergence or sensitivity to starting point
These are great building blocks
Example: PCA
[Scatter plot: centered 2D point cloud with its principal axes.]
• Matrix rows represent different objects’ properties
• Typically center (subtract column means); may scale
• Run SVD on the resulting matrix
• Dominant singular vectors give the “principal components”
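The recipe above, sketched with numpy on synthetic 2D data (the stretching matrix is a hypothetical example):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: isotropic cloud stretched so one direction dominates.
X = rng.standard_normal((200, 2)) @ np.array([[2.0, 0.3], [0.0, 0.2]])

Xc = X - X.mean(axis=0)                 # center: subtract column means
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T                      # coordinates along principal axes

var = s ** 2 / (X.shape[0] - 1)         # variance captured per component
assert var[0] > var[1]                  # first component dominates
# Component variances sum to the total variance of the centered data.
assert np.isclose(var.sum(), Xc.var(axis=0, ddof=1).sum())
```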
Example: LDA
[Scatter plot: two labeled classes with the discriminant direction.]
• Matrix rows represent different objects’ properties
• Rows are labeled, and we want to discriminate these labels
• Solution involves a generalized SEP
• May be very different from PCA directions
Example: Latent semantic analysis
Vector space model (Salton and Yang, 1975)
• Columns are documents, row entries are word frequencies
• Scale raw data (tf-idf)
• Take the SVD of the resulting matrix
• Rows of U ≈ latent word representations
• Columns of V ≈ latent document representations
• Useful for comparing words to words, docs to docs
• Also useful for searching for docs by keywords
Problem: latent coordinates are hard to interpret!
I learned about this as an undergrad at UMD. Salton was a founding member of Cornell CS.
Example: Spectral Clustering
Goal: Cluster objects (rows) according to features
• Compute a truncated SVD: A ≈ UrΣrV∗r
• Treat rows of UrΣr as latent coordinates
• Run k-means to cluster points
Essentially gives

$$A \approx LC^*$$

where

$$l_{ij} = \begin{cases} 1, & \text{item } i \text{ in class } j \\ 0, & \text{otherwise} \end{cases}
\qquad C_{:,j} = \text{centroid of class } j.$$
Several types of spectral clustering – will discuss again later!
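The recipe can be sketched end to end on synthetic data with two planted row clusters (the tiny hand-rolled k-means keeps the sketch self-contained; it is not a production clustering code):

```python
import numpy as np

def kmeans(X, k, iters=20):
    """Tiny Lloyd's algorithm, deterministically seeded from spread-out rows."""
    C = X[np.linspace(0, len(X) - 1, k).astype(int)].copy()
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - C[None]) ** 2).sum(-1), axis=1)
        C = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels

rng = np.random.default_rng(1)
# Two planted clusters of rows sharing feature patterns, plus a little noise.
A = np.vstack([np.tile([5.0, 5.0, 0.0, 0.0], (10, 1)),
               np.tile([0.0, 0.0, 5.0, 5.0], (10, 1))])
A += 0.1 * rng.standard_normal(A.shape)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
coords = U[:, :2] * s[:2]              # rows of U_r Sigma_r, with r = 2
labels = kmeans(coords, 2)

# The two planted groups are recovered as the two clusters.
assert len(set(labels[:10])) == 1 and len(set(labels[10:])) == 1
assert labels[0] != labels[10]
```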
The Story So Far
SVD and SEP provide useful dimension reduction
• Lower-dimensional “latent spaces” for further analysis
• Latent space clarifies object similarity / dissimilarity
• But interpreting the coordinate system is hard!
So what do we want beyond SVD and SEP?
• Linear dimension reduction with interpretable structure (this week)
• Nonlinear dimension reduction (a little coming up)
Recall: The Latent Factor Framework
Factor data matrix A ∈ Rm×n as
$$A \approx LMR, \qquad L \in \mathbb{R}^{m \times r},\ M \in \mathbb{R}^{r \times r},\ R \in \mathbb{R}^{r \times n}$$
with different structures on L, M, and R.
• R (or MR) maps original features to “latent” features
• Can impose constraints on L, M, and R
• Orthogonality constraints get us to SVD (easy)
• Other types of constraints are harder
• …but other constraints improve interpretability
Factor Selection and Skeletonization
Idea: Express latent factors via rows/columns of A
• Improves interpretability (maybe — more tricks help)
• May improve cost of working with the matrix
  • Can form part of a row/column without forming all of it
  • Useful in both experiments and computation
• But how do we choose good representative rows/cols?
  • Want them to be as linearly independent as possible
  • Also want good approximation quality
  • SVD provides a “speed of light” bound
Interpolative Decomposition / Skeletonization
$$A\Pi \approx C \begin{bmatrix} I & T \end{bmatrix}$$

• C consists of the leading k columns of AΠ
• $T = C^\dagger (A\Pi)_{:,k+1:n}$ chosen to minimize the Frobenius norm error
• There exists some Π — not easily computed — such that:
  • Entries of T are at most 2 in magnitude
  • Singular values of $\begin{bmatrix} I & T \end{bmatrix}$ lie in $[1,\ 1 + \sqrt{k(n-k)}]$
  • Approximation error is within $1 + \sqrt{k(n-k)}$ of optimal
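In practice an interpolative decomposition is often computed from a rank-revealing pivoted QR rather than the hard-to-compute optimal Π. A minimal sketch on an exactly rank-k matrix (sizes and the rank-k construction are hypothetical):

```python
import numpy as np
from scipy.linalg import qr

rng = np.random.default_rng(0)
k = 3
A = rng.standard_normal((20, k)) @ rng.standard_normal((k, 12))   # rank k

Q, R, piv = qr(A, mode='economic', pivoting=True)
C = A[:, piv[:k]]                              # k representative columns
T = np.linalg.solve(R[:k, :k], R[:k, k:])      # interpolation coefficients

# The remaining columns are reproduced from the representatives: A Pi = C [I T].
assert np.allclose(A[:, piv], C @ np.hstack([np.eye(k), T]))
```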
CUR Decomposition
Idea: A ≈ CUR where
• C consists of columns of A
• R consists of rows of A
• U = C†AR† is optimal given C, R
• Selection of good C and R is again the challenge
Can also consider symmetric variants where R = C∗.
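A minimal CUR sketch, again on an exactly rank-k matrix. The norm-based column/row pick here is a crude illustrative heuristic (good selection is the hard part, as noted above); U = C†AR† is the optimal middle factor:

```python
import numpy as np

rng = np.random.default_rng(0)
k = 3
A = rng.standard_normal((15, k)) @ rng.standard_normal((k, 10))   # rank k

cols = np.argsort(-np.linalg.norm(A, axis=0))[:k]   # crude heuristic pick
rows = np.argsort(-np.linalg.norm(A, axis=1))[:k]
C, R = A[:, cols], A[rows, :]
U = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)       # optimal U given C, R

# Since C and R span A's column/row spaces here, CUR recovers A exactly.
assert np.allclose(C @ U @ R, A)
```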
Pivoted QR
Pivoted QR decomposition is
AΠ = QR
where R is upper triangular, rii are positive and decreasing.
Idea: Be greedy (as in SVD), but choose among columns of A
• Choose the column ai with maximum norm
  • Set r11 = ∥ai∥ and q1 = ai/r11 (so ai = q1r11)
  • Orthogonalize against q1: a′j = aj − q1r1j with r1j = q1ᵀaj
• Choose the modified column a′i with maximum norm
  • Set r22 = ∥a′i∥ and q2 = a′i/r22
  • Orthogonalize against q2 to get a new A′ matrix
• Keep repeating the process (re-ordered Gram–Schmidt)

Unlike the SVD this is not optimal — but often pretty good.
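The greedy process above can be sketched directly as modified Gram–Schmidt with column pivoting (illustrative, not production quality; real codes use Householder-based pivoted QR):

```python
import numpy as np

def pivoted_qr(A, k):
    """Greedy column-pivoted QR: at each step, bring the max-norm
    remaining column to the front, normalize it, and orthogonalize."""
    A = np.array(A, dtype=float)
    m, n = A.shape
    Q = np.zeros((m, k))
    R = np.zeros((k, n))
    piv = np.arange(n)
    for j in range(k):
        p = j + int(np.argmax(np.linalg.norm(A[:, j:], axis=0)))
        A[:, [j, p]] = A[:, [p, j]]        # swap columns (pivot)
        R[:, [j, p]] = R[:, [p, j]]        # keep earlier R rows consistent
        piv[[j, p]] = piv[[p, j]]
        R[j, j] = np.linalg.norm(A[:, j])
        Q[:, j] = A[:, j] / R[j, j]
        R[j, j + 1:] = Q[:, j] @ A[:, j + 1:]
        A[:, j + 1:] -= np.outer(Q[:, j], R[j, j + 1:])   # orthogonalize
    return Q, R, piv

A0 = np.random.default_rng(0).standard_normal((8, 6))
Q, R, piv = pivoted_qr(A0, 6)
assert np.allclose(Q @ R, A0[:, piv])              # A Pi = QR
assert np.all(np.diff(np.diag(R)) <= 1e-12)        # r_jj nonincreasing
```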
Computing Pivoted QR
As with SVD, constructive definition = usual computation.
• The dense case is a standard building block (qr in MATLAB)
• Ongoing work to improve parallelism and cache efficiency
• Clever tournament pivoting (TSQR) for m ≫ n
Again: A great building block to borrow from someone else!
The Geometry of Pivoted QR
The chosen column is on the convex hull of a 1D projection ⟹ it is a point on the convex hull of the original columns.
We will use this fact on Thursday...
Pivoted QR to Pivoted Cholesky
Symmetric positive definite matrices ≡ Gram matrices:

$$H = A^T A.$$

Decompose:

$$A\Pi = QR \implies \Pi^T H \Pi = R^T Q^T Q R = R^T R.$$
Running pivoted Cholesky is equivalent to pivoted QR.
Especially useful when A is implicit (we can only access H). Let’s look briefly at an example we’ll see again next week.
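A minimal pivoted Cholesky sketch that accesses only H, mirroring the equivalence above. The greedy pivot is the largest remaining diagonal entry, which is the quantity shown shrinking in the example that follows (illustrative helper, not a production code):

```python
import numpy as np

def pivoted_cholesky(H, k):
    """Right-looking pivoted Cholesky: at each step, symmetrically swap the
    largest remaining diagonal entry to the front, then eliminate."""
    H = np.array(H, dtype=float)
    n = H.shape[0]
    L = np.zeros((n, k))
    piv = np.arange(n)
    for j in range(k):
        p = j + int(np.argmax(np.diag(H)[j:]))
        H[[j, p], :] = H[[p, j], :]        # symmetric row/column swap
        H[:, [j, p]] = H[:, [p, j]]
        L[[j, p], :] = L[[p, j], :]
        piv[[j, p]] = piv[[p, j]]
        L[j, j] = np.sqrt(H[j, j])
        L[j + 1:, j] = H[j + 1:, j] / L[j, j]
        H[j + 1:, j + 1:] -= np.outer(L[j + 1:, j], L[j + 1:, j])
    return L, piv

rng = np.random.default_rng(0)
A = rng.standard_normal((10, 6))
H = A.T @ A                               # SPD Gram matrix; only H is touched
L, piv = pivoted_cholesky(H, 6)
assert np.allclose(L @ L.T, H[np.ix_(piv, piv)])   # L L^T = Pi^T H Pi
```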
Example: Pivoted Cholesky on a Kernel Matrix

[Animation over several slides: successive steps of pivoted Cholesky on a kernel matrix. The selected (largest remaining) diagonal element at each step: 1.00e+00, 6.77e-02, 1.91e-02, 5.11e-04, 1.19e-04, 4.18e-05, 8.54e-07, 3.58e-07, 1.92e-07.]
Improving the Decomposition
Recall the interpolative decomposition:

$$A\Pi \approx C \begin{bmatrix} I & T \end{bmatrix},$$

and we can keep all |tij| ≤ 2 with the right Π.

Idea: Start from pivoted QR and refine.

• Compute a truncated pivoted QR:

$$A\Pi = QR = \begin{bmatrix} Q_1 & Q_2 \end{bmatrix}
\begin{bmatrix} R_{11} & R_{12} \\ 0 & R_{22} \end{bmatrix}
\approx Q_1 \begin{bmatrix} R_{11} & R_{12} \end{bmatrix}$$

• Set $C = Q_1 R_{11}$ and $T = R_{11}^{-1} R_{12}$.
• For large entries (|tij| > 2), swap columns πi and πk+j.
• Recompute T; repeat swaps until happy.
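A simple, unoptimized sketch of the swap-and-refactor loop (the helper `refine_id` is hypothetical; a real implementation would update the factorization after each swap instead of recomputing it):

```python
import numpy as np
from scipy.linalg import qr

def refine_id(A, k, max_sweeps=100):
    """Start from column-pivoted QR; while some |t_ij| > 2, swap columns
    pi_i and pi_{k+j} and refactor. Each swap grows |det R11| by > 2,
    so the loop terminates quickly."""
    _, _, piv = qr(A, mode='economic', pivoting=True)
    T = None
    for _ in range(max_sweeps):
        _, R = qr(A[:, piv], mode='economic')
        T = np.linalg.solve(R[:k, :k], R[:k, k:])
        i, j = np.unravel_index(int(np.argmax(np.abs(T))), T.shape)
        if abs(T[i, j]) <= 2:
            break
        piv[[i, k + j]] = piv[[k + j, i]]
    return piv, T

A = np.random.default_rng(0).standard_normal((30, 12))
piv, T = refine_id(A, k=4)
assert np.all(np.abs(T) <= 2)     # all interpolation coefficients bounded
```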
Next Time: Topics, Parts, and NMF
What if we want a low-rank factorization with more structure?
• Non-negativity?
• Sparsity of factors?
• Normalization for a probabilistic interpretation?
Hard in general, but there are some effective approaches.
Thursday: From non-negative matrix factorization to spectral topic modeling