Fast matrix primitives for ranking, link-prediction and more
Fast matrix primitives for ranking, communities, and more.
David F. Gleich, Computer Science, Purdue University
Netflix David Gleich · Purdue 1
Models and algorithms for high-performance matrix and network computations
P. G. Constantine, D. F. Gleich, Y. Hou, and J. Templeton
[Figure: four color maps, (a) Error, s = 0.39 cm; (b) Std, s = 0.39 cm; (c) Error, s = 1.95 cm; (d) Std, s = 1.95 cm]
Fig. 4.5: Error in the reduced-order model compared to the prediction standard deviation for one realization of the bubble locations at the final time, for two values of the bubble radius, s = 0.39 and s = 1.95 cm. (Colors are visible in the electronic version.)
the varying conductivity fields took approximately twenty minutes to construct using Cubit after substantial optimizations.
Working with the simulation data involved a few pre- and post-processing steps: interpret 4 TB of Exodus II files from Aria, globally transpose the data, compute the TSSVD, and compute predictions and errors. The preprocessing steps took approximately 8-15 hours. We collected precise timing information, but we do not report it, as these times are from a multi-tenant, unoptimized Hadoop cluster where other jobs with sizes ranging between 100 GB and 2 TB of data sometimes ran concurrently. Also, during our computations, we observed failures in hard disk drives and issues causing entire nodes to fail. Given that the cluster has 40 cores, at most 2400 CPU-hours were consumed by these calculations, compared to the 131,072 hours it took to compute 4096 heat transfer simulations on Red Sky. Thus, evaluating the ROM was about 50 times faster than computing a full simulation.
We used 20,000 reducers to convert the Exodus II simulation data. This choice determined how many map tasks each subsequent step utilized: around 33,000. We also found it advantageous to store matrices in blocks of about 16 MB per record. The reduction in the data enabled us to use a laptop to compute the coefficients of the ROM and apply it to the far face for the UQ study in Section 4.4.
Here are a few pertinent challenges we encountered while performing this study. Generating 8192 meshes with different material properties and running independent
Tensor eigenvalues and a power method
Tensor methods for network alignment
Network alignment is the problem of computing an approximate isomorphism between two networks. In collaboration with Mohsen Bayati, Amin Saberi, Ying Wang, and Margot Gerritsen, the PI has developed a state-of-the-art belief propagation method (Bayati et al., 2009).
FIGURE 6 – Previous work from the PI tackled network alignment with matrix methods for edge overlap; this proposal is for matching triangles using tensor methods. [Diagram: an edge (i, j) in A overlapping an edge (i′, j′) in B through the matching L, and a triangle (i, j, k) in A overlapping a triangle (i′, j′, k′) in B.] If x_i, x_j, and x_k are indicators associated with the edges (i, i′), (j, j′), and (k, k′), then we want to include the product x_i x_j x_k in the objective, yielding a tensor problem.
We propose to study tensor methods to perform network alignment with triangle and other higher-order graph moment matching. Similar ideas were proposed by Svab (2007); Chertok and Keller (2010) also proposed using triangles to aid in network alignment problems. In Bayati et al. (2011), we found that triangles were a key missing component in a network alignment problem with a known solution. Given that preserving a triangle requires three edges between two graphs, this yields a tensor problem:
\[
\begin{aligned}
\text{maximize}\quad & \sum_{i \in L} w_i x_i \;+\; \sum_{i \in L} \sum_{j \in L} x_i x_j S_{i,j} \;+\; \underbrace{\sum_{i \in L} \sum_{j \in L} \sum_{k \in L} x_i x_j x_k T_{i,j,k}}_{\text{triangle overlap term}} \\
\text{subject to}\quad & x \text{ is a matching.}
\end{aligned}
\]
Here, T_{i,j,k} = 1 when the edges corresponding to i, j, and k in L result in a triangle in the induced matching. Maximizing this objective is an intractable problem. We plan to investigate a heuristic based on a rank-1 approximation of the tensor T and a maximum-weight-matching-based rounding. Similar heuristics have been useful in other matrix-based network alignment algorithms (Singh et al., 2007; Bayati et al., 2009). The work involves enhancing the Symmetric-Shifted-Higher-Order Power Method due to Kolda and Mayo (2011) to handle extremely large and sparse tensors. On this aspect, we plan to collaborate with Tamara G. Kolda. In an initial evaluation of this triangle matching on synthetic problems, the tensor rank-1 approximation alone produced results that identified the correct solution, whereas all matrix approaches could not.
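To make the triangle-overlap term concrete: for a 0/1 matching indicator x, the cubic sum simply counts triangles preserved by the matching. The small helper below is an illustrative sketch, not code from the proposal; the input format (triangles as index triples, the matching as a node-to-node dict) is an assumption for the example.

```python
def triangle_overlap(tri_A, tri_B, matching):
    """Count triangles preserved by a matching between graphs A and B,
    i.e., the value of the cubic term sum_ijk T_ijk x_i x_j x_k when x
    is the 0/1 indicator of the matching."""
    mapped = dict(matching)  # node in A -> matched node in B
    tri_B_set = {tuple(sorted(t)) for t in tri_B}
    count = 0
    for (u, v, w) in tri_A:
        if u in mapped and v in mapped and w in mapped:
            image = tuple(sorted((mapped[u], mapped[v], mapped[w])))
            if image in tri_B_set:  # the matched triple is a triangle in B
                count += 1
    return count
```

A matching that sends a triangle of A onto a triangle of B contributes 1; breaking any one of the three edges contributes 0, which is why preserving triangles requires a tensor rather than a matrix objective.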
Vision for the future
All of these projects fit into the PI's vision for modernizing the matrix-computation paradigm to match the rapidly evolving space of network computations. This vision extends beyond the scope of the current proposal. For example, the web is a huge network with over one trillion unique URLs (Alpert and Hajaj, 2008), and search engines have indexed over 180 billion of them (Cuil, 2009). Yet, why do we need to compute with the entire network? By way of analogy, note that we do not often solve partial differential equations or model macro-scale physics by explicitly simulating the motion or interaction of elementary particles. We need something equivalent for the web and other large networks. Such investigations may take many forms: network models, network geometry, or network model reduction. It is the vision of the PI that the language, algebra, and methodology of matrix computations will
\[
\text{maximize} \; \sum_{ijk} T_{ijk}\, x_i x_j x_k \quad \text{subject to} \; \|x\|_2 = 1
\]
Human protein interaction networks: 48,228 triangles. Yeast protein interaction networks: 257,978 triangles. The tensor T has ~100,000,000,000 nonzeros; we work with it implicitly.
\[
[x^{(\text{next})}]_i = \rho \cdot \Big( \sum_{jk} T_{ijk}\, x_j x_k + \gamma x_i \Big)
\]
where \(\rho\) ensures the unit 2-norm.
SSHOPM method due to Kolda and Mayo
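The update above can be sketched with the triangle tensor stored implicitly as a list of index triples, so the ~10^11 nonzeros are never materialized. This is an illustrative sketch in the spirit of Kolda and Mayo's SSHOPM, not their code; the shift γ and iteration count are assumptions for the example.

```python
import math

def sshopm_implicit(triangles, n, gamma=2.0, iters=300):
    """Shifted symmetric higher-order power method on a triangle tensor
    stored implicitly: T[i,j,k] = 1 for every permutation of each triple."""
    x = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        y = [gamma * xi for xi in x]  # the shift term gamma * x
        for (i, j, k) in triangles:
            # (T x x)_i sums T_ijk x_j x_k over all 6 symmetric orderings,
            # so each triangle contributes 2 * x_j * x_k to coordinate i
            y[i] += 2.0 * x[j] * x[k]
            y[j] += 2.0 * x[i] * x[k]
            y[k] += 2.0 * x[i] * x[j]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]  # rho: project back to the unit sphere
    return x
```

On a toy tensor with one triangle on nodes {0, 1, 2}, the iteration concentrates all mass on those three nodes, which is the behavior the rank-1 matching heuristic relies on.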
Big data methods: SIMAX '09, SISC '11, MapReduce '11, ICASSP '12
Network alignment: ICDM '09, SC '11, TKDE '13
Fast & scalable network centrality: SC '05, WAW '07, SISC '10, WWW '10, …
Data clustering: WSDM '12, KDD '12, CIKM '13, …
Ax = b,  min ‖Ax − b‖,  Ax = λx
Massive matrix computations on multi-threaded and distributed architectures
Image from rockysprings, deviantart, CC share-alike
Everything in the world can be explained by a matrix, and we see how deep the rabbit hole goes. The talk ends; you believe whatever you want to.
Matrix computations in a red-pill
Solve a problem better by exploiting its structure!
WHY NO PREPROCESSING?
David F. Gleich (Purdue), Emory Math/CS Seminar
Top-k predicted “links” are movies to watch!
Pairwise scores give user similarity
Problem 1 – (Faster) Recommendation as link prediction
Problem 2 – (Better) Best movies
Matrix computations in a red-pill
Solve a problem better by exploiting its structure!
Matrix structure
Movies “liked” (>3 stars?)
[Illustration: a small ratings matrix with entries 5, 1, 1, 4, 5, 5, shown both as the Netflix matrix and as the Netflix graph]
Problem 1: adjacency matrix, normalized Laplacian matrix, random walk matrix.
Problem 2: pairwise comparison matrix.
Problem 1 – (Faster) Recommendation as link prediction
Matrix-based link predictors
A MODERN TAKE
The Katz score (node-based) is
The Katz score (edge-based) is
\[
\text{pred.\ on movie} = \sum_{\ell=1}^{\infty} \alpha^\ell \left(\text{num.\ paths of length } \ell \text{ from user to movie}\right)
\]
\[
\underbrace{k}_{\text{movie prediction vector}} = \sum_{\ell=1}^{\infty} (\alpha^\ell A^\ell)\, \underbrace{\left(\text{user ind.\ vec.}\right)}_{\equiv\, e_i}
\]
Matrix-based link predictors
\[
k = \sum_{\ell=1}^{\infty} (\alpha^\ell A^\ell)\, \underbrace{\left(\text{user ind.\ vec.}\right)}_{\equiv\, e_i}
\qquad\Longrightarrow\qquad
(I - \alpha A)\,k = e_i
\]
Carl Neumann: the Neumann series \(\sum_{k=0}^{\infty} (tA)^k\).
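The Neumann series gives a direct way to approximate the Katz vector by summing damped path counts term by term. A minimal dense sketch (not the talk's implementation), assuming an unweighted graph as adjacency lists and a damping α small enough that the series converges (α < 1/‖A‖):

```python
def katz_scores(adj, i, alpha=0.1, nterms=50):
    """Katz link-prediction vector k = sum_{l>=1} (alpha*A)^l e_i,
    computed by truncating the Neumann series after nterms terms."""
    n = len(adj)
    term = [0.0] * n
    term[i] = 1.0  # e_i, the indicator of the query node
    k = [0.0] * n
    for _ in range(nterms):
        # term <- alpha * A * term : extend every path by one edge
        nxt = [0.0] * n
        for u in range(n):
            for v in adj[u]:
                nxt[v] += alpha * term[u]
        term = nxt
        k = [kv + tv for kv, tv in zip(k, term)]
    return k
```

Each pass multiplies by αA once, so the ℓ-th term is exactly α^ℓ times the count of length-ℓ paths from node i, matching the sum above.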
Matrix-based link predictors
(I − αA)k = e_i  (Katz)
(I − αP)x = e_i  (PageRank)
(I − αL)x = e_i  (semi-supervised learning)
exp{αP}x = e_i  (heat kernel)
They all look at sums of damped paths but change the details slightly.
Matrix-based link predictors are localized!
PageRank scores for one node: crawl of flickr from 2006, ~800k nodes, 6M edges, alpha = 1/2
[Plots: plot(x) over the node index, and the error ‖x_true − x_nnz‖₁ as a function of the number of nonzeros retained]
KATZ SCORES ARE LOCALIZED
Up to 50 neighbors is 99.65% of the total mass
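The localization statistic quoted here (the fraction of a score vector's total mass held by its largest entries) is easy to compute. This small helper is an illustrative sketch, not the code behind the slide's 99.65% figure:

```python
def topk_mass_fraction(x, k):
    """Fraction of a score vector's 1-norm captured by its k largest
    entries -- the sense in which Katz/PageRank vectors are 'localized'."""
    mags = sorted((abs(v) for v in x), reverse=True)
    total = sum(mags)
    return sum(mags[:k]) / total if total else 0.0
```

A strongly localized vector (e.g., geometrically decaying scores) puts almost all its mass on a handful of entries, which is what makes the local algorithms on the next slides effective.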
Matrix computations in a red-pill
Solve a problem better by exploiting its structure!
How do we compute them fast?
w/ access to in-links & degs.: PageRankPull
w/ access to out-links: PageRankPush

PageRankPull (j = blue node, with in-neighbors a, b, c):
\[
x_j^{(k+1)} - \alpha \sum_{i \to j} x_i^{(k)}/\deg_i = f_j,
\qquad\text{e.g.}\qquad
x_j^{(k+1)} - \alpha x_a^{(k)}/6 - \alpha x_b^{(k)}/2 - \alpha x_c^{(k)}/3 = f_j.
\]
Solve for \(x_j^{(k+1)}\).

PageRankPush (j = blue node): let \(x_j^{(k+1)} = x_j^{(k)} + r_j\); then update \(r^{(k+1)}\):
\[
r_a^{(k+1)} = r_a^{(k)} + \alpha r_j^{(k)}/3,\quad
r_b^{(k+1)} = r_b^{(k)} + \alpha r_j^{(k)}/3,\quad
r_c^{(k+1)} = r_c^{(k)} + \alpha r_j^{(k)}/3,\quad
r_j^{(k+1)} = 0.
\]

PageRank: \(x_j = \alpha \sum_{i \text{ neigh.\ of } j} x_i/\deg(i)\), plus 1 if \(j\) is the target user.
We have good theory for this algorithm … and even better empirical performance.
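The push scheme above can be sketched in a few lines. This is a minimal sketch, not the speaker's implementation: it assumes an unweighted graph given as a dict of neighbor lists and a simple queue of nodes whose residual exceeds a tolerance; the core is the slide's rule of moving r_j into x_j and pushing α r_j / deg(j) to each neighbor.

```python
import collections

def pagerank_push(neighbors, target, alpha=0.5, tol=1e-10):
    """Approximately solve x_j = alpha * sum_i x_i/deg(i) + [j == target]
    by the push/residual method, touching only nodes with large residual."""
    x = collections.defaultdict(float)
    r = collections.defaultdict(float)
    r[target] = 1.0
    queue = collections.deque([target])
    while queue:
        j = queue.popleft()
        rj = r[j]
        if rj < tol:
            continue
        x[j] += rj          # move residual mass into the solution
        r[j] = 0.0
        nbrs = neighbors[j]
        for i in nbrs:      # push alpha * rj / deg(j) to each neighbor
            push = alpha * rj / len(nbrs)
            if r[i] < tol and r[i] + push >= tol:
                queue.append(i)  # i's residual just crossed the threshold
            r[i] += push
    return dict(x)
```

On a two-node cycle with α = 1/2, the exact solution of (I − αP)x = e_target is x = (4/3, 2/3), and the push iteration converges to it geometrically while only ever touching nodes with nonnegligible residual; this is the "local access" property the slide emphasizes.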
Theory
Andersen, Chung, Lang (2006): for PageRank, “fast runtimes” and “localization.” Bonchi, Esfandiar, Gleich, et al. (2010/2013): for Katz, “fast runtimes.” Kloster, Gleich (2013): for Katz and the heat kernel, “fast runtimes” and “localization” (assuming power-law degrees).
Accuracy vs. work (heat kernel)
For the dblp collaboration graph, we study the precision in finding the 100 largest nodes as we vary the work. This set of 100 does not include the node's immediate neighbors. (One column, but representative.)
[Plot: precision vs. effective matrix-vector products for dblp-cc at tolerances 10^{−4} and 10^{−5}, evaluated @10, @25, @100, @1000]
dblp collaboration graph, 225k vertices
Empirical runtime (Katz): TIMING
Never got to try it …
Data sources: We had two data sources: Microsoft's toolbar logs and HelloMovies.com analytics logs.

Microsoft toolbar logs:
Machine:6AF023 URL:slashdot.org/ Action:Click
Machine:6AF023 URL:google.com/search?&q=yhoo Action:Entry
Machine:6AF023 URL:google.com/q?yhoo Action:Entry

HelloMovies.com analytics:
18:27:59 P C 1 1 1 0 /find_friends/1
18:28:17 P C 2 1 2 0 /register_success
18:28:28 P C 3 1 3 0 /
18:28:29 A 4 1 3 1 /find/get_form
18:28:30 AC 5 1 4 1 /find/get_movies
18:28:36 P C 6 1 5 1 /movie/the-lord-of-the-[...]
18:28:52 P C 7 1 6 1 /movie/lion-witch-wardrobe-[...]
18:28:15 P 8 1 6 1 /actor/cate-blanchett

Note: I collaborate with the company behind HelloMovies.com.
David F. Gleich (Sandia) Measuring alpha WWW2010 11 / 26
Ran out of money once we had the algorithms … promising initial results, though!
Need to test on the Netflix matrix now.
Problem 2 – (Better) Best movies
Which is a better list of good DVDs?

Nuclear-norm-based rank aggregation (not matrix completion on the Netflix rating matrix):
Lord of the Rings 3: The Return of …
Lord of the Rings 1: The Fellowship
Lord of the Rings 2: The Two Towers
Star Wars V: Empire Strikes Back
Raiders of the Lost Ark
Star Wars IV: A New Hope
Shawshank Redemption
Star Wars VI: Return of the Jedi
Lord of the Rings 3: Bonus DVD
The Godfather

Standard rank aggregation (the mean rating):
Lord of the Rings 3: The Return of …
Lord of the Rings 1: The Fellowship
Lord of the Rings 2: The Two Towers
Lost: Season 1
Battlestar Galactica: Season 1
Fullmetal Alchemist
Trailer Park Boys: Season 4
Trailer Park Boys: Season 3
Tenchi Muyo!
Shawshank Redemption
Rank aggregation: Given partial orders on subsets of items, rank aggregation is the problem of finding an overall ordering.
Voting: find the winning candidate. Program committees: find the best papers given reviews. Dining: find the best restaurant in Chicago.
Ranking is really hard.
All rank aggregations involve some measure of compromise.
A good ranking is the “average” ranking under a permutation distance.
Ken Arrow; John Kemeny; Dwork, Kumar, Naor, and Sivakumar.
It is NP-hard to compute Kemeny's ranking.
Suppose we had scores
Let s_i be the score of the ith movie/song/paper/team to rank. Suppose we can compare the ith to the jth: Y_{ij} = s_i − s_j. Then Y is skew-symmetric and rank 2. Also works for ratios with an extra log.
Kemeny and Snell, Mathematical Models in the Social Sciences (1978)
Numerical ranking is intimately intertwined with skew-symmetric matrices.
David F. Gleich (Purdue), KDD 2011
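A quick numerical check of the claim: built from scores as Y_{ij} = s_i − s_j, the matrix is skew-symmetric, has rank at most 2 (it equals se^T − es^T by construction), and its row means return the centered scores (the Borda count). This is an illustrative script, not code from the talk.

```python
import random

# Build Y_ij = s_i - s_j from hidden scores and verify the slide's claims.
random.seed(1)
n = 6
s = [random.random() for _ in range(n)]
Y = [[s[i] - s[j] for j in range(n)] for i in range(n)]

# Skew-symmetry: Y^T = -Y
assert all(Y[i][j] == -Y[j][i] for i in range(n) for j in range(n))

# Rank <= 2 by construction: Y = s e^T - e s^T, a sum of two rank-1 terms.

# Row means of Y recover the centered scores (the Borda count).
mean_s = sum(s) / n
borda = [sum(row) / n for row in Y]
assert all(abs(borda[i] - (s[i] - mean_s)) < 1e-12 for i in range(n))
```

The row-mean identity is why, with a fully observed Y, extracting scores is trivial; the interesting case on the following slides is when most of Y is missing.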
Using ratings as comparisons
Arithmetic mean; log-odds.
Ratings induce various skew-symmetric matrices. From David (1988), The Method of Paired Comparisons.
Extracting the scores
Given Y with all entries, then s = Ye/n is the Borda count, the least-squares solution to min_s ‖se^T − es^T − Y‖.
How many entries do we have? Most. Do we trust all of them? Not really. Netflix data: 17k movies, 500k users, 100M ratings; Y is 99.17% filled.
[Plot: number of movie pairs against the number of comparisons supporting each pair]
Only partial info? COMPLETE IT!
Let Y_{ij} be known for (i, j) ∈ Ω. We trust these scores.
Goal: Find the simplest skew-symmetric matrix that matches the data, in both the noiseless and the noisy formulation.
Both of these are NP-hard, too.
Solution: GO NUCLEAR!
From a French nuclear test in 1970; image from http://picdit.wordpress.com/2008/07/21/8-insane-nuclear-explosions/
The ranking algorithm
The Ranking Algorithm
0. INPUT: R (ratings data) and c (for trust on comparisons)
1. Compute Y from R
2. Discard entries of Y with fewer than c comparisons
3. Set Ω and b to the indices and values of what's left
4. X = SVP(Ω, b)
5. OUTPUT: the scores extracted from X
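Steps 3-5 can be sketched as below. This is a simplified, dense stand-in for the SVP solver of Jain et al., not the paper's implementation: the unit step size and iteration count are assumptions for the example, and it assumes Ω contains both (i, j) and (j, i) so skew-symmetrization does not distort the sampled entries.

```python
import numpy as np

def svp_skew(n, omega, b, rank=2, step=1.0, iters=500):
    """SVP-style completion sketch for a skew-symmetric matrix: a gradient
    step on the sampled entries, a skew-symmetrization, then projection
    onto the best rank-`rank` approximation via the SVD."""
    X = np.zeros((n, n))
    for _ in range(iters):
        # gradient of 0.5 * || P_omega(X) - b ||^2
        G = np.zeros((n, n))
        for idx, (i, j) in enumerate(omega):
            G[i, j] = X[i, j] - b[idx]
        X = X - step * G
        X = 0.5 * (X - X.T)  # keep the iterate skew-symmetric
        U, S, Vt = np.linalg.svd(X)
        X = (U[:, :rank] * S[:rank]) @ Vt[:rank, :]  # rank-`rank` projection
    return X
```

With a rank-2 Y = se^T − es^T and only a few symmetric pairs of entries withheld, the iteration recovers Y; the paper's point is that the skew structure survives every step of the solver.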
Exact recovery
Exact recovery results
David Gross showed how to recover Hermitian matrices, i.e., the conditions under which we get the exact completion. Note that ıY is Hermitian. Thus our new result! (Gross, arXiv 2010.)
indices. Instead we view the following theorem as providing intuition for the noisy problem.
Consider the operator basis for Hermitian matrices: H = S ∪ K ∪ D, where
S = {(1/√2)(e_i e_j^T + e_j e_i^T) : 1 ≤ i < j ≤ n};
K = {(ı/√2)(e_i e_j^T − e_j e_i^T) : 1 ≤ i < j ≤ n};
D = {e_i e_i^T : 1 ≤ i ≤ n}.
Theorem 5. Let s be centered, i.e., s^T e = 0. Let Y = se^T − es^T, where θ = max_i s_i²/(s^T s) and ρ = ((max_i s_i) − (min_i s_i))/‖s‖. Also, let Ω ⊂ H be a random set of elements with size |Ω| ≥ O(2nν(1 + β)(log n)²), where ν = max((nθ + 1)/4, nρ²). Then the solution of
minimize ‖X‖_*  subject to  trace(X* W_i) = trace((ıY)* W_i),  W_i ∈ Ω
is equal to ıY with probability at least 1 − n^{−β}.
The proof of this theorem follows directly by Theorem 4 if ıY has coherence ν with respect to the basis H. We now show this result.
Definition 6 (Coherence, Gross [2010]). Let A be n × n, rank-r, and Hermitian. Let UU* be an orthogonal projector onto range(A). Then A has coherence ν with respect to an operator basis {W_i}_{i=1}^{n²} if both
max_i trace(W_i UU* W_i) ≤ 2νr/n, and
max_i trace(sign(A) W_i)² ≤ νr/n².
For A = ıY with s^T e = 0:
UU* = ss^T/(s^T s) + (1/n) ee^T  and  sign(A) = (1/(‖s‖√n)) A.
Let S_p ∈ S, K_p ∈ K, and D_p ∈ D. Note that because sign(A) is Hermitian with no real-valued entries, both quantities trace(sign(A) D_i)² and trace(sign(A) S_i)² are 0. Also, because UU* is symmetric, trace(K_i UU* K_p) = 0. The remaining basis elements satisfy:
trace(S_p UU* S_p) = 1/n + (s_i² + s_j²)/(2 s^T s) ≤ (1/n) + θ
trace(D_p UU* D_p) = 1/n + s_i²/(s^T s) ≤ (1/n) + θ
trace(sign(A) K_p)² = 2(s_i − s_j)²/(n s^T s) ≤ (2/n)ρ².
Thus, A has coherence ν, with ν from Theorem 5, with respect to H. And we have our recovery result. Although, this theorem provides little practical benefit unless both θ and ρ are O(1/n), which occurs when s is nearly uniform.
6. RESULTS
We implemented and tested this procedure in two synthetic scenarios, along with Netflix, MovieLens, and Jester joke-set ratings data. In the interest of space, we only present a subset of these results for Netflix.
[Figure 2 plots: fraction of trials recovered vs. samples (noiseless case), and noise level vs. samples (noisy case), with reference lines at 5n, 2n log(n), and 6n log(n)]
Figure 2: An experimental study of the recoverability of a ranking vector. These show that we need about 6n log n entries of Y to get good recovery in both the noiseless (left) and noisy (right) case. See §6.1 for more information.
6.1 Recovery
The first experiment is an empirical study of the recoverability of the score vector in the noiseless and noisy case. In the noiseless case, Figure 2 (left), we generate a score vector with uniformly distributed random scores between 0 and 1. These are used to construct a pairwise comparison matrix Y = se^T − es^T. We then sample elements of this matrix uniformly at random and compute the difference between the true score vector s and the output of steps 4 and 5 of Algorithm 2. If the relative 2-norm difference between these vectors is less than 10^{−3}, we declare the trial recovered. For n = 100, the figure shows that, once the number of samples is about 6n log n, the correct s is recovered in nearly all of the 50 trials.
Next, for the noisy case, we generate a uniformly spaced score vector between 0 and 1. Then Y = se^T − es^T + εE, where E is a matrix of random normals. Again, we sample elements of this matrix randomly, and declare a trial successful if the order of the recovered score vector is identical to the true order. In Figure 2 (right), we indicate the fraction of successful trials as a gray value between black (all failure) and white (all successful). Again, the algorithm is successful for a moderate noise level, i.e., the value of ε, when the number of samples is larger than 6n log n.
6.2 Synthetic
Inspired by Ho and Quinn [2008], we investigate recovering item scores in an item-response scenario. Let a_i be the center of user i's rating scale, and b_i be the rating sensitivity of user i. Let t_j be the intrinsic score of item j. Then we generate ratings from users on items as:
R_{i,j} = L[a_i + b_i t_j + E_{i,j}]
where L[α] is the discrete levels function:
L[α] = max(min(round(α), 5), 1)
and E_{i,j} is a noise parameter. In our experiment, we draw a_i ∼ N(3, 1), b_i ∼ N(0.5, 0.5), t_j ∼ N(0.1, 1), and E_{i,j} ∼ εN(0, 1). Here, N(µ, σ) is a standard normal, and ε is a noise parameter. As input to our algorithm, we sample ratings uniformly at random by specifying a desired number of average ratings per user. We then look at the Kendall τ correlation coefficient between the true scores t_j and the output of our algorithm using the arithmetic mean pairwise aggregation. A τ value of 1 indicates a perfect ordering correlation between the two sets of scores.
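The generative model above can be sketched directly. This is an illustrative reimplementation of the stated distributions, not the authors' experiment code; the function names and input format are assumptions.

```python
import random

def discrete_level(a):
    """L[alpha] = max(min(round(alpha), 5), 1): clamp to a 1..5 star scale."""
    return max(min(round(a), 5), 1)

def generate_ratings(n_users, n_items, eps, rng):
    """Synthetic item-response ratings R_ij = L[a_i + b_i t_j + eps*E_ij],
    using the distributions stated in the text."""
    a = [rng.gauss(3, 1) for _ in range(n_users)]      # user rating centers
    b = [rng.gauss(0.5, 0.5) for _ in range(n_users)]  # user sensitivities
    t = [rng.gauss(0.1, 1) for _ in range(n_items)]    # intrinsic item scores
    R = [[discrete_level(a[i] + b[i] * t[j] + eps * rng.gauss(0, 1))
          for j in range(n_items)] for i in range(n_users)]
    return R, t
```

Feeding such ratings through the pairwise aggregation and then comparing the recovered ordering to t via Kendall's τ reproduces the shape of the experiment in §6.2.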
Recovery Discussion and Experiments
Confession: If …, then just look at differences from a connected set. Constants? Not very good. Intuition for the truth.
Recovery Experiments
Evaluation
[Figure 3 plots: median Kendall's τ vs. error, for average numbers of ratings per user of 20, 10, 5, 2, and 1.5]
Figure 3: The performance of our algorithm (left) and the mean rating (right) at recovering the ordering given by item scores in an item-response theory model with 100 items and 1000 users. The various thick lines correspond to the average number of ratings each user performed (see the in-place legend). See §6.2 for more information.
Figure 3 shows the results for 1000 users and 100 items with 1.1, 1.5, 2, 5, and 10 ratings per user on average. We also vary the parameter ε between 0 and 1. Each thick line with markers plots the median value of τ in 50 trials. The thin adjacent lines show the 25th and 75th percentiles of the 50 trials. At all error levels, our algorithm outperforms the mean rating. Also, when there are few ratings per user and moderate noise, our approach is considerably more correlated with the true score. This evidence supports the anecdotal results from Netflix in Table 2.
6.3 Netflix
See Table 2 for the top movies produced by our technique in a few circumstances using all users. The arithmetic mean results in that table use only elements of Y with at least 30 pairwise comparisons (it is an am all 30 model in the code below). And see Figure 4 for an analysis of the residuals generated by the fit for different constructions of the matrix Y. Each residual evaluation of Netflix is described by a code. For example, sb all 0 is a strict-binary pairwise matrix Y from all Netflix users and c = 0 in Algorithm 2 (i.e., accept all pairwise comparisons). Alternatively, am 6 30 denotes an arithmetic-mean pairwise matrix Y from Netflix users with at least 6 ratings, where each entry in Y had 30 users supporting it. The other abbreviations are gm: geometric mean; bc: binary comparison; and lo: log-odds ratio.
These residuals show that we get better rating fits by only using frequently compared movies, but that there are only minor changes in the fits when excluding users that rate few movies. The difference between the score-based residuals ‖Ω(se^T − es^T) − b‖ (red points) and the SVP residuals ‖Ω(USV^T) − b‖ (blue points) shows that excluding comparisons leads to "overfitting" in the SVP residual. This suggests that increasing the parameter c should be done with care and good checks on the residual norms.
To check that a rank-2 approximation is reasonable, we increased the target rank in the SVP solver to 4 to investigate. For the arithmetic mean (6,30) model, the relative residual at rank 2 is 0.2838 and at rank 4 is 0.2514. Meanwhile, the nuclear norm increases from around 14000 to around 17000. These results show that the change in the fit is minimal, and our rank-2 approximation and its scores should represent a reasonable ranking.
[Figure 4 plot: relative residuals (roughly 0.2-0.7) for the models am, gm, sb, bc, and lo, with user thresholds {all, 6} and comparison thresholds {0, 30, 100}]
Figure 4: The labels on each residual show how we generated the pairwise scores and truncated the Netflix data. Red points are the residuals from the scores, and blue points are the final residuals from the SVP algorithm. Please see the discussion in §6.3.
7. CONCLUSION
Existing principled techniques, such as computing a Kemeny-optimal ranking or finding a minimum feedback arc set, are NP-hard. These approaches are inappropriate in large-scale rank aggregation settings. Our proposal is to (i) measure pairwise scores Y and (ii) solve a matrix completion problem to determine the quality of items. This idea is both principled and functional with significant missing data. The results of our rank aggregation on the Netflix problem (Table 2) reveal popular and high-quality movies. These are interesting results and could easily have a home on a "best movies in Netflix" web page. Such a page exists, but is regarded as having strange results. Computing a rank aggregation with this technique is not NP-hard. It only requires solving a convex optimization problem with a unique global minimum. Although we did not record computation times, the most time-consuming piece of work is computing the pairwise comparison matrix Y. In a practical setting, this could easily be done with a MapReduce computation.
To compute these solutions, we adapted the SVP solver for matrix completion [Jain et al., 2010]. This process involved (i) studying the singular value decomposition of a skew-symmetric matrix (Lemmas 1 and 2) and (ii) showing that the SVP solver preserves a skew-symmetric approximation through its computation (Theorem 3). Because the SVP solver computes with an explicitly chosen rank, these techniques work well for large-scale rank aggregation problems.
We believe the combination of pairwise aggregation and matrix completion is a fruitful direction for future research. We plan to explore optimizing the SVP algorithm to exploit the skew-symmetric constraint, extending our recovery result to the noisy case, and investigating additional data.
Acknowledgements. The authors would like to thank Amy Langville, Carl Meyer, and Yuan Yao for helpful discussions.
Nuclear norm ranking vs. mean rating
Tie-in with PageRank
Another way to compute the scores is through a close relative of PageRank and the link-prediction methods: the Massey or Colley methods.
(2I + D − A)s = “differences”
(L + 2D⁻¹)x = “scaled differences”
Ongoing Work
Finding communities in large networks: we have the best community finder (as of CIKM 2013); Whang, Gleich, Dhillon (CIKM).
Fast clique detection: we have the fastest solver for max-clique problems, useful for computing temporal strong components (Rossi, Gleich, et al., arXiv).
Scalable network alignment; low-rank clustering with features + links; scalable, distributed implementations of fast graph kernels; evolving network analysis.
[Diagram: nodes r, s, t, u, v, w; edge overlap between graphs A and B through the matching L]
References
Papers: Gleich & Lim, KDD 2011 (nuclear norm ranking); Esfandiar, Gleich, Bonchi, et al., WAW 2010 and J. Internet Math. 2013; Kloster & Gleich, WAW 2013, arXiv:1310.3423.
Code: www.cs.purdue.edu/homes/dgleich/codes and bit.ly/dgleich-code
Supported by NSF CAREER 1149756-CCF. www.cs.purdue.edu/homes/dgleich