Tools for large graph mining - Stanford Computer Sciencejure/talks/ · CMU SCS Tools for large...
CMU SCS
Tools for large graph mining
WWW 2008 tutorial
Part 3: Matrix tools for graph mining
Jure Leskovec and Christos Faloutsos
Machine Learning Department
Joint work with: Deepay Chakrabarti, Tamara Kolda and Jimeng Sun.
Tutorial outline
Part 1: Structure and models for networks
- What are the properties of large graphs?
- How do we model them?
Part 2: Dynamics of networks
- Diffusion and cascading behavior
- How do viruses and information propagate?
Part 3: Matrix tools for mining graphs
- Singular value decomposition (SVD)
- Random walks
Part 4: Case studies
- The MSN instant messenger network (240 million users)
- Graph projections: what does the web look like?
Part 3‐2Leskovec&Faloutsos, WWW 2008
About part 3
Introduce matrix and tensor tools through real mining applications
Goal: find patterns, rules, clusters, outliers, … in matrices and in tensors
What is this part about?
Connection of matrix tools and networks
Matrix tools:
- Singular Value Decomposition (SVD)
- Principal Component Analysis (PCA)
- Webpage ranking algorithms: HITS, PageRank
- CUR decomposition
- Co-clustering (in part 4 of the tutorial)
Tensor tools:
- Tucker decomposition
Applications
Why matrices? Examples
- Social networks
- Documents and terms
- Authors and terms

Example (social network, person x person):

| | John | Peter | Mary | Nick | ... |
|---|---|---|---|---|---|
| John | 0 | 11 | 22 | 55 | ... |
| Peter | 5 | 0 | 6 | 7 | ... |
| ... | ... | ... | ... | ... | ... |
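Why tensors? Example
Tensor: n-dimensional generalization of a matrix. For instance, an author x keyword matrix for SIGMOD’07:

| | data | mining | classif. | tree | ... |
|---|---|---|---|---|---|
| John | 13 | 11 | 22 | 55 | ... |
| Peter | 5 | 4 | 6 | 7 | ... |
| Mary | ... | ... | ... | ... | ... |
| Nick | ... | ... | ... | ... | ... |
| ... | | | | | |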
Stacking such author x keyword matrices for several conferences (SIGMOD’05, SIGMOD’06, SIGMOD’07, ...) gives a 3rd-order tensor: author x keyword x conference.
Tensors are useful for 3 or more modes
Terminology: ‘mode’ (or ‘aspect’):
[Figure: the author x keyword x conference tensor (keywords: data, mining, classif., tree, ...), with its three axes labeled Mode (== aspect) #1, Mode #2 and Mode #3.]
Motivating applications: why are matrices important? Why are tensors useful?
- P1: social networks
- P2: web & text mining
- P3: network forensics
- P4: sensor networks

[Figures: social networks; network forensics (source x destination traffic matrices, normal traffic vs. abnormal traffic); sensor networks (temperature value vs. time (min)).]
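Static data model: tensor
Formally, an Mth-order tensor is an array $\mathcal{X} \in \mathbb{R}^{N_1 \times N_2 \times \cdots \times N_M}$.
It generalizes matrices and is represented as a multi-array (~ data cube).

| Order | 1st | 2nd | 3rd |
|---|---|---|---|
| Correspondence | Vector | Matrix | 3D array |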
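As a minimal sketch of the "multi-array" view, the NumPy snippet below builds a small author x keyword x venue count tensor from (author, keyword, venue, count) records. The records and names are hypothetical, chosen only to echo the SIGMOD example; each slice `X[:, :, v]` is then an ordinary author x keyword matrix.

```python
import numpy as np

# Hypothetical (author, keyword, venue, count) records, for illustration only.
records = [
    ("John", "data", "SIGMOD'05", 3),
    ("John", "mining", "SIGMOD'06", 2),
    ("Peter", "tree", "SIGMOD'07", 4),
    ("Mary", "data", "SIGMOD'07", 1),
]

# One sorted index per mode (entity type).
authors = sorted({r[0] for r in records})
keywords = sorted({r[1] for r in records})
venues = sorted({r[2] for r in records})

# A 3rd-order tensor is simply a 3-D array: author x keyword x venue.
X = np.zeros((len(authors), len(keywords), len(venues)))
for author, keyword, venue, count in records:
    X[authors.index(author), keywords.index(keyword), venues.index(venue)] += count

print(X.shape)                                  # (3, 3, 3)
slice_07 = X[:, :, venues.index("SIGMOD'07")]   # author x keyword matrix for one venue
```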
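Dynamic data model: tensor streams
A sequence of Mth-order tensors $\mathcal{X}_1, \mathcal{X}_2, \ldots, \mathcal{X}_t$, where t is increasing over time.

| Order | 1st | 2nd | 3rd |
|---|---|---|---|
| Correspondence | Multiple streams | Time-evolving graphs | 3D arrays |

[Example figure: a stream of author x keyword slices arriving over time.]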
SVD: Examples of Matrices
Example/Intuition: documents and terms. Find patterns, groups, concepts.

| | data | mining | classif. | tree | ... |
|---|---|---|---|---|---|
| Paper#1 | 13 | 11 | 22 | 55 | ... |
| Paper#2 | 5 | 4 | 6 | 7 | ... |
| Paper#3 | ... | ... | ... | ... | ... |
| Paper#4 | ... | ... | ... | ... | ... |
| ... | | | | | |
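Singular Value Decomposition (SVD): $X = U \Sigma V^T$

$$[x^{(1)}\; x^{(2)}\; \cdots\; x^{(M)}] = [u_1\; u_2\; \cdots\; u_k] \cdot \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_k) \cdot [v_1\; v_2\; \cdots\; v_k]^T$$

- X: input data
- U: left singular vectors
- Σ: singular values
- $V^T$: right singular vectors (as rows)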
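A quick way to see the definition in action is NumPy's `numpy.linalg.svd`, shown here on a random 6 x 4 matrix (an illustrative assumption, not data from the tutorial). With `full_matrices=False` it returns the thin factors, and the singular values come back as a 1-D array sorted from largest to smallest.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 4))          # input data, m x n

# Thin SVD: U is 6 x 4, s holds the 4 singular values, Vt is 4 x 4.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

X_rebuilt = U @ np.diag(s) @ Vt          # X = U Sigma V^T
print(np.allclose(X, X_rebuilt))         # True
print(np.allclose(U.T @ U, np.eye(4)))   # True: left singular vectors are orthonormal
print(np.all(s[:-1] >= s[1:]))           # True: sorted, largest first
```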
SVD as spectral decomposition
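Best rank-k approximation in the L2 and Frobenius norms.
SVD only works for static matrices (a single 2nd-order tensor).

$$A_{m \times n} \approx U \Sigma V^T = \sigma_1\, u_1 \circ v_1 + \sigma_2\, u_2 \circ v_2 + \cdots \quad \text{(keeping the top k terms)}$$

where $\circ$ denotes the vector outer product.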
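The rank-k truncation can be sketched as follows, assuming a dense NumPy matrix; `rank_k_approx` is an illustrative helper name. By the Eckart–Young theorem, the Frobenius error of the truncation is the square root of the sum of the dropped squared singular values, and the L2 (spectral) error is the first dropped singular value; the snippet checks both on a random matrix.

```python
import numpy as np

def rank_k_approx(A, k):
    """Best rank-k approximation: the sum of the top-k terms sigma_i * u_i o v_i."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    A_k = sum(s[i] * np.outer(U[:, i], Vt[i]) for i in range(k))
    return A_k, s

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 5))
A2, s = rank_k_approx(A, 2)

fro_err = np.linalg.norm(A - A2, "fro")   # Frobenius error
l2_err = np.linalg.norm(A - A2, 2)        # L2 (spectral) error
print(np.isclose(fro_err, np.sqrt(np.sum(s[2:] ** 2))))  # True
print(np.isclose(l2_err, s[2]))                          # True
```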
Vector outer product – intuition:
[Figure: a 2-d histogram A of car type (VW, Volvo, BMW) x owner age (20, 30, 40), approximated by the outer product of the two 1-d histograms under an independence assumption.]
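Under the independence assumption, the 2-d histogram is just the outer product of the two 1-d histograms, i.e., a single rank-one term of the kind SVD sums up. The fractions below are made up for illustration; they are not taken from the slide.

```python
import numpy as np

car_types = ["VW", "Volvo", "BMW"]
ages = [20, 30, 40]
# Hypothetical 1-d histograms (fractions of owners).
p_car = np.array([0.5, 0.3, 0.2])
p_age = np.array([0.2, 0.5, 0.3])

# Independence: P(car, age) = P(car) * P(age), i.e. an outer product.
joint = np.outer(p_car, p_age)

print(joint.shape)                    # (3, 3): car type x owner age
print(np.isclose(joint.sum(), 1.0))   # True: still a probability table
print(np.linalg.matrix_rank(joint))   # 1: a single rank-one term
```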
SVD ‐ Example
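$A = U \Sigma V^T$, example (rows: four CS documents, then three MD documents; columns: the terms data, inf., retrieval, brain, lung):

$$\underbrace{\begin{bmatrix}1&1&1&0&0\\2&2&2&0&0\\1&1&1&0&0\\5&5&5&0&0\\0&0&0&2&2\\0&0&0&3&3\\0&0&0&1&1\end{bmatrix}}_{A} = \underbrace{\begin{bmatrix}0.18&0\\0.36&0\\0.18&0\\0.90&0\\0&0.53\\0&0.80\\0&0.27\end{bmatrix}}_{U} \times \underbrace{\begin{bmatrix}9.64&0\\0&5.29\end{bmatrix}}_{\Sigma} \times \underbrace{\begin{bmatrix}0.58&0.58&0.58&0&0\\0&0&0&0.71&0.71\end{bmatrix}}_{V^T}$$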
Reading the example:
- The first column of U and the first row of $V^T$ form the CS-concept; the second ones form the MD-concept.
- U is the document-to-concept similarity matrix: the four CS documents load only on the CS-concept, the three MD documents only on the MD-concept.
- $\sigma_1 = 9.64$ is the 'strength' of the CS-concept.
- V is the term-to-concept similarity matrix: 'data', 'inf.' and 'retrieval' belong to the CS-concept, 'brain' and 'lung' to the MD-concept.
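The numbers in this example can be checked by hand and in code. Each block of A is an outer product: the CS block is $(1,2,1,5)^T (1,1,1)$, so $\sigma_1 = \sqrt{31}\cdot\sqrt{3} = \sqrt{93} \approx 9.64$, and the MD block is $(2,3,1)^T (1,1)$, so $\sigma_2 = \sqrt{14}\cdot\sqrt{2} = \sqrt{28} \approx 5.29$. The NumPy sketch below reproduces these values; singular vectors are only defined up to sign, so it compares absolute values.

```python
import numpy as np

# Document-to-term matrix from the example (rows: 4 CS docs, then 3 MD docs;
# columns: data, inf., retrieval, brain, lung).
A = np.array([
    [1, 1, 1, 0, 0],
    [2, 2, 2, 0, 0],
    [1, 1, 1, 0, 0],
    [5, 5, 5, 0, 0],
    [0, 0, 0, 2, 2],
    [0, 0, 0, 3, 3],
    [0, 0, 0, 1, 1],
], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(np.round(s[:2], 2))            # [9.64 5.29]; the other singular values are ~0

# Signs of singular vectors are arbitrary, so look at absolute values.
print(np.round(np.abs(U[:, :2]), 2))  # doc-to-concept
print(np.round(np.abs(Vt[:2]), 2))    # concept-to-term
```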
SVD ‐ Interpretation
'Documents', 'terms' and 'concepts':
Q: if A is the document-to-term matrix (n documents x m terms), what is $A^T A$?
A: the term-to-term ([m x m]) similarity matrix.
Q: $A A^T$?
A: the document-to-document ([n x n]) similarity matrix.
SVD properties
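The columns of V are the eigenvectors of the covariance matrix $A^T A$ (strictly, proportional to the covariance matrix when the columns of A are mean-centered).
The columns of U are the eigenvectors of the Gram (inner-product) matrix $A A^T$.
In both cases the corresponding eigenvalues are $\sigma_i^2$.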
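Reusing the document-to-term matrix from the SVD example, the following NumPy sketch checks both properties numerically: $A^T A$ (5 x 5, term-to-term) and $A A^T$ (7 x 7, document-to-document) have the singular vectors as eigenvectors, with eigenvalues $\sigma_i^2$.

```python
import numpy as np

A = np.array([[1, 1, 1, 0, 0], [2, 2, 2, 0, 0], [1, 1, 1, 0, 0], [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2], [0, 0, 0, 3, 3], [0, 0, 0, 1, 1]], dtype=float)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

term_term = A.T @ A   # 5 x 5 term-to-term similarity
doc_doc = A @ A.T     # 7 x 7 document-to-document similarity

# (A^T A) v_i = sigma_i^2 v_i  and  (A A^T) u_i = sigma_i^2 u_i
for i in range(2):
    v, u = Vt[i], U[:, i]
    print(np.allclose(term_term @ v, s[i] ** 2 * v),
          np.allclose(doc_doc @ u, s[i] ** 2 * u))   # True True
```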
Further reading:
1. Ian T. Jolliffe, Principal Component Analysis (2nd ed.), Springer, 2002.
2. Gilbert Strang, Linear Algebra and Its Applications (4th ed.), Brooks Cole, 2005.
Principal Component Analysis (PCA)
PCA is an important application of SVD. Note that U and V are dense and may have negative entries.
[Figure: the m x n matrix A ≈ U Σ V^T truncated to a few components, with labels 'PCs' and 'Loading' on the factors.]
PCA interpretation: the best axis to project on ('best' = minimum sum of squares of projection errors)
[Figure: documents as points in the Term1 ('data') x Term2 ('lung') plane.]
PCA ‐ interpretation
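PCA projects points onto the 'best' axis, v1 (the first singular vector), which gives the minimum RMS error.
[Figure: points in the Term1 ('data') x Term2 ('retrieval') plane, with the axis v1 drawn through them; decomposition U Σ V^T.]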
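A minimal sketch of this interpretation, assuming PCA is computed as the SVD of mean-centered data: the snippet draws hypothetical 2-d points stretched along one direction, takes v1 from `numpy.linalg.svd`, and compares its sum of squared projection errors with that of 181 other axes through the origin. The leftover error equals the dropped $\sigma_2^2$; `projection_error` is an illustrative helper name.

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical 2-d points (e.g. two term counts per document), stretched along one direction.
t = rng.standard_normal(200)
points = np.column_stack([3 * t, 1.5 * t]) + 0.3 * rng.standard_normal((200, 2))

centered = points - points.mean(axis=0)   # PCA works on mean-centered data
_, s, Vt = np.linalg.svd(centered, full_matrices=False)
v1 = Vt[0]                                # first right singular vector

def projection_error(X, axis):
    """Sum of squared distances from the rows of X to the line spanned by `axis`."""
    axis = axis / np.linalg.norm(axis)
    projected = np.outer(X @ axis, axis)
    return np.sum((X - projected) ** 2)

best = projection_error(centered, v1)
angles = np.linspace(0, np.pi, 181)       # 181 candidate axes through the origin
others = [projection_error(centered, np.array([np.cos(a), np.sin(a)])) for a in angles]
print(best <= min(others) + 1e-9)         # True: no other axis does better
print(np.isclose(best, s[1] ** 2))        # True: leftover error is sigma_2^2
```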
Kleinberg’s algorithm HITS
Problem definition: given the web and a query, find the most 'authoritative' web pages for this query.
Step 0: find all pages containing the query terms (the root set).
Step 1: expand this set by one move forward and backward, i.e., add the pages it points to and the pages that point to it.
Further reading: J. Kleinberg. Authoritative sources in a hyperlinked environment. SODA 1998.
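Steps 0 and 1 amount to a simple set expansion. The sketch below assumes Step 0 (finding pages that contain the query terms) has already been done by some text index and that the link graph is available as (source, destination) pairs; `base_set` is a hypothetical helper name, and the tiny edge list is made up.

```python
def base_set(edges, root):
    """Expand a root set by one link forward and one link backward.

    edges: iterable of (src, dst) pairs; root: set of pages matching the query.
    """
    expanded = set(root)
    for src, dst in edges:
        if src in root:   # forward move: pages the root set points to
            expanded.add(dst)
        if dst in root:   # backward move: pages pointing into the root set
            expanded.add(src)
    return expanded

edges = [("a", "b"), ("c", "a"), ("b", "d"), ("e", "f")]
print(sorted(base_set(edges, {"a"})))   # ['a', 'b', 'c']
```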
Kleinberg’s algorithm HITS
On the resulting graph, give a high authority score ('authorities') to nodes that many important nodes point to, and a high hub score ('hubs') to nodes that point to good authorities.
[Figure: hubs pointing to authorities.]
Kleinberg’s algorithm HITS
Observations:
- The definition is recursive!
- Each node (say, the i-th node) has both an authority score $a_i$ and a hub score $h_i$.
Kleinberg’s algorithm: HITS
Let A be the adjacency matrix: the (i, j) entry is 1 if the edge from i to j exists (and 0 otherwise).
Let h and a be [n x 1] vectors with the hub and authority scores.
Kleinberg’s algorithm: HITS
Then:
$a_i = h_k + h_l + h_m$ (for a node i pointed to by nodes k, l and m), that is,
$a_i = \sum_{j:\,(j,i)\ \text{edge exists}} h_j$
or, in matrix form,
$a = A^T h$
Kleinberg’s algorithm: HITS
Symmetrically, for the hub score:
$h_i = a_n + a_p + a_q$ (for a node i pointing to nodes n, p and q), that is,
$h_i = \sum_{j:\,(i,j)\ \text{edge exists}} a_j$
or, in matrix form,
$h = A a$
Kleinberg’s algorithm: HITS
In conclusion, we want vectors h and a such that
$h = A a$ and $a = A^T h$,
that is (up to normalization):
$a \propto A^T A\, a$
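Kleinberg's algorithm: HITS
a is a right singular vector of the adjacency matrix A (by definition!), a.k.a. an eigenvector of $A^T A$.
Starting from a random a' and iterating (normalizing at each step), we will eventually converge.
Q: to which of all the eigenvectors? Why?
A: to the one with the strongest eigenvalue $\lambda_1$: each iteration multiplies the component of a' along the i-th eigenvector by $\lambda_i$, so after k steps the $\lambda_1$ component dominates (assuming $\lambda_1 > \lambda_2$ and that a' has a nonzero component along that eigenvector). For that eigenvector,
$(A^T A)^k a = \lambda_1^k a$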
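Putting the update rules together gives the power iteration below, a NumPy sketch rather than a reference implementation. The 4-page graph is hypothetical; starting from all-ones hub scores, the normalized authority vector converges to the top right singular vector of A, and the hub vector to the top left singular vector.

```python
import numpy as np

def hits(A, n_iter=100):
    """HITS by power iteration; A[i, j] = 1 if page i links to page j."""
    n = A.shape[0]
    h = np.ones(n)
    for _ in range(n_iter):
        a = A.T @ h               # a = A^T h: sum of hub scores of in-neighbors
        a /= np.linalg.norm(a)
        h = A @ a                 # h = A a: sum of authority scores of out-neighbors
        h /= np.linalg.norm(h)
    return a, h

# Hypothetical 4-page graph: 0->2, 0->3, 1->2, 2->0, 3->2.
A = np.array([[0, 0, 1, 1],
              [0, 0, 1, 0],
              [1, 0, 0, 0],
              [0, 0, 1, 0]], dtype=float)
a, h = hits(A)
print(np.argmax(a), np.argmax(h))   # 2 0: page 2 is the top authority, page 0 the top hub
```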
Kleinberg’s algorithm ‐ discussion
‘authority’ score can be used to find ‘similar pages’ (how?)
closely related to ‘citation analysis’, social networks / ‘small world’ phenomena
Motivating problem: PageRank
Given a directed graph, find its most interesting/central node
A node is important if it is connected with important nodes (recursive, but OK!)
Motivating problem – PageRank solution
Given a directed graph, find its most interesting/central node
Proposed solution: a random walk; spot the most 'popular' node (-> steady-state probability (ssp)).
A node has high ssp if it is connected with high-ssp nodes (recursive, but OK!)
(Simplified) PageRank algorithm
Let A be the transition matrix (= adjacency matrix), and let B be its transpose, column-normalized. Then $B\,p = p$, where $p = (p_1, \ldots, p_5)$.
[Figure: an example graph on nodes 1-5 and the matrix B (rows: To, columns: From) with entries 1 and 1/2.]
(Simplified) PageRank algorithm
$B\,p = p$
(Simplified) PageRank algorithm
$B\,p = 1 \cdot p$
Thus, p is the eigenvector that corresponds to the highest eigenvalue (= 1, since the matrix is column-normalized).
Why does such a p exist? A positive p exists if B is n x n, nonnegative and irreducible [Perron–Frobenius theorem].
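A minimal power-iteration sketch of $B\,p = p$ follows. The 3-node graph is hypothetical (the slide's example graph cannot be recovered from the text); it is strongly connected and aperiodic, and the code assumes every node has at least one out-link so that the column normalization is defined. Solving $B\,p = p$ by hand for this graph gives $p = (0.2, 0.4, 0.4)$.

```python
import numpy as np

def column_normalized_transpose(A):
    """B = A^T with column i divided by the out-degree of node i (assumes no dangling nodes)."""
    out_deg = A.sum(axis=1)
    return A.T / out_deg          # broadcasting divides column i by out_deg[i]

def simplified_pagerank(A, n_iter=200):
    """Power iteration p <- B p, starting from the uniform distribution."""
    B = column_normalized_transpose(A)
    p = np.full(A.shape[0], 1.0 / A.shape[0])
    for _ in range(n_iter):
        p = B @ p                 # one step of the random walk
    return p

# Hypothetical graph: 0->1, 1->2, 2->0, 2->1.
A = np.array([[0, 1, 0],
              [0, 0, 1],
              [1, 1, 0]], dtype=float)
p = simplified_pagerank(A)
print(np.round(p, 3))             # [0.2 0.4 0.4]
```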
(Simplified) PageRank algorithm
In short: imagine a particle randomly moving along the edges, and compute its steady-state probabilities (ssp).
Full version of the algorithm: with occasional random jumps.
Why? To make the matrix irreducible.
Full Algorithm
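With probability c, follow a random outgoing edge; with probability 1 - c, fly out to a random node. Then we have

$$p = c\,B\,p + \frac{1-c}{n}\,\mathbf{1} \;\Rightarrow\; p = \frac{1-c}{n}\,(I - c\,B)^{-1}\,\mathbf{1}$$

where $\mathbf{1}$ is the all-ones vector.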
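Both forms of the full algorithm are easy to compare numerically. The sketch below assumes c = 0.85, a commonly used value that the slide does not specify, and a hypothetical 3-node graph in which node 2 has no in-links, so the plain random walk would give it zero probability. The closed form is computed with `numpy.linalg.solve` rather than an explicit inverse; the function names are illustrative.

```python
import numpy as np

def pagerank_closed_form(B, c=0.85):
    """Solve (I - cB) p = (1 - c)/n * 1 instead of forming the inverse."""
    n = B.shape[0]
    return np.linalg.solve(np.eye(n) - c * B, np.full(n, (1 - c) / n))

def pagerank_iterative(B, c=0.85, n_iter=300):
    """Iterate p <- c B p + (1 - c)/n * 1."""
    n = B.shape[0]
    p = np.full(n, 1.0 / n)
    for _ in range(n_iter):
        p = c * (B @ p) + (1 - c) / n   # follow an edge w.p. c, fly out w.p. 1 - c
    return p

# Hypothetical graph 0->1, 1->0, 2->0, as a column-normalized transpose B.
B = np.array([[0, 1, 1],
              [1, 0, 0],
              [0, 0, 0]], dtype=float)
p_exact = pagerank_closed_form(B)
p_iter = pagerank_iterative(B)
print(np.allclose(p_exact, p_iter))   # True
print(round(p_exact[2], 4))           # 0.05 = (1 - c)/n: node 2 is reached only by jumps
```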
Motivation of CUR or CMD
SVD and PCA both transform the data into some abstract space (specified by a set of basis vectors), which causes:
- an interpretability problem
- loss of sparsity
CUR
Example-based projection: use actual rows and columns to specify the subspace.
Given a matrix $A \in \mathbb{R}^{m \times n}$, find three matrices $C \in \mathbb{R}^{m \times c}$, $U \in \mathbb{R}^{c \times r}$, $R \in \mathbb{R}^{r \times n}$ such that $\|A - CUR\|$ is small.
U is the pseudo-inverse of X (orthogonal projection).
U is the pseudo-inverse of X: $U = X^{\dagger} = (X^T X)^{-1} X^T$ (for X with full column rank).
CUR (cont.)
Key question: how to select/sample the columns and rows?
- Uniform sampling
- Biased sampling
  - CUR with absolute error bound
  - CUR with relative error bound

References:
1. Tutorial: Randomized Algorithms for Matrices and Massive Datasets, SDM'06.
2. Drineas et al., Subspace Sampling and Relative-error Matrix Approximation: Column-Row-Based Methods, ESA 2006.
3. Drineas et al., Fast Monte Carlo Algorithms for Matrices III: Computing a Compressed Approximate Matrix Decomposition, SIAM Journal on Computing, 2006.
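The CUR definition translates into a few lines of NumPy. The sketch below assumes $U = C^{+} A R^{+}$, the choice that makes CUR the orthogonal projection of A onto the column space of C and the row space of R (and the best U in Frobenius norm for the chosen C and R); the slide's own construction of U via the matrix X may differ. It reuses the document-to-term matrix from the SVD example, first with hand-picked columns and rows, then with one simple form of biased sampling (probabilities proportional to squared norms). The function names are illustrative.

```python
import numpy as np

def cur(A, col_idx, row_idx):
    """CUR sketch: C = chosen columns of A, R = chosen rows, U = C^+ A R^+."""
    C = A[:, col_idx]
    R = A[row_idx, :]
    # With this U, C U R projects A onto span(C) (columns) and span(R) (rows).
    U = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)
    return C, U, R

def squared_norm_probs(A, axis):
    """Sampling probabilities proportional to squared column (axis=0) or row (axis=1) norms."""
    w = np.sum(A ** 2, axis=axis)
    return w / w.sum()

# Document-to-term matrix from the SVD example.
A = np.array([[1, 1, 1, 0, 0], [2, 2, 2, 0, 0], [1, 1, 1, 0, 0], [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2], [0, 0, 0, 3, 3], [0, 0, 0, 1, 1]], dtype=float)

# Hand-picked: columns 'data' and 'brain', rows of one CS and one MD document.
C, U, R = cur(A, col_idx=[0, 3], row_idx=[0, 4])
print(np.allclose(A, C @ U @ R))                  # True: A has rank 2, C and R cover both concepts
print(np.count_nonzero(C), np.count_nonzero(R))   # 7 5: C and R keep A's zeros

# Biased sampling: draw 2 columns and 2 rows with norm-based probabilities.
rng = np.random.default_rng(0)
cols = rng.choice(A.shape[1], size=2, replace=False, p=squared_norm_probs(A, axis=0))
rows = rng.choice(A.shape[0], size=2, replace=False, p=squared_norm_probs(A, axis=1))
C2, U2, R2 = cur(A, cols, rows)
```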
The sparsity property – pictorially:
[Figure: SVD/PCA, A = U Σ V^T, destroys sparsity; CUR, A = C U R, maintains sparsity.]
The sparsity property
SVD: $A = U \Sigma V^T$, where A is big but sparse, U and V are big and dense, and Σ is sparse (diagonal) and small.
CUR: $A = C U R$, where A is big but sparse, C and R are big but sparse, and U is dense but small.
Matrix tools ‐ summary
SVD: optimal for L2; VERY popular (HITS, PageRank, Karhunen-Loève, Latent Semantic Indexing, PCA, etc.)
C-U-R (CMD, etc.): near-optimal; sparsity; interpretability
TENSORS
Reminder: SVD
Best rank‐k approximation in L2
$A_{m \times n} \approx U\,\Sigma\,V^T$
Reminder: SVD
Best rank‐k approximation in L2
$A_{m \times n} \approx \sigma_1\, u_1 \circ v_1 + \sigma_2\, u_2 \circ v_2 + \cdots$
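Goal: extension to 3 or more modes
[Figure: an I x J x K tensor ≈ a sum of rank-one terms (... + ...), written with factor matrices A (I x R) and B (J x R) and an R x R x R core.]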
Tensors: Main points
2 major types of tensor decompositions: Kruskal and Tucker.
Both can be solved with 'alternating least squares' (ALS).
Details follow; we start with terminology:
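Kruskal’s Decomposition ‐ intuition
[Figure: the I x J x K tensor ≈ a sum of R rank-one terms; in factor form, A (I x R) and B (J x R), plus a third factor for the K mode, combined through a super-diagonal R x R x R core.]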
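Kruskal's decomposition is also known as CP or PARAFAC. Below is a compact NumPy sketch of the alternating least squares (ALS) fit mentioned above, for a 3rd-order tensor: each factor is updated in turn by an exact least-squares solve with the other two fixed, so the fit error never increases from one sweep to the next. The helper names (`unfold`, `khatri_rao`, `cp_als`) are our own, not a library API, and the random rank-2 test tensor is an assumption.

```python
import numpy as np

def unfold(X, mode):
    """Mode-n unfolding: bring axis `mode` to the front and flatten the others."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def khatri_rao(B, C):
    """Column-wise Kronecker product: column r is kron(B[:, r], C[:, r])."""
    return (B[:, None, :] * C[None, :, :]).reshape(-1, B.shape[1])

def cp_to_tensor(A, B, C):
    """Sum over r of the rank-one terms a_r o b_r o c_r."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)

def cp_als(X, R, n_iter=100, seed=0):
    """Fit X ~ sum_r a_r o b_r o c_r by alternating least squares."""
    rng = np.random.default_rng(seed)
    A, B, C = (rng.standard_normal((dim, R)) for dim in X.shape)
    errors = []
    for _ in range(n_iter):
        # Exact least-squares update of one factor with the other two held fixed;
        # (B^T B) * (C^T C) is the Gram matrix of khatri_rao(B, C).
        A = unfold(X, 0) @ khatri_rao(B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = unfold(X, 1) @ khatri_rao(A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = unfold(X, 2) @ khatri_rao(A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
        errors.append(np.linalg.norm(X - cp_to_tensor(A, B, C)) / np.linalg.norm(X))
    return A, B, C, errors

# Usage: an exactly rank-2 tensor of size 4 x 5 x 3 built from random factors.
rng = np.random.default_rng(42)
A0, B0, C0 = rng.standard_normal((4, 2)), rng.standard_normal((5, 2)), rng.standard_normal((3, 2))
X = cp_to_tensor(A0, B0, C0)
A, B, C, errors = cp_als(X, R=2)
print(errors[0], errors[-1])   # relative error after the first and the last sweep
```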
Tucker Decomposition ‐ intuition
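$$\underbrace{\mathcal{X}}_{I\times J\times K} \approx \mathcal{G} \times_1 A \times_2 B \times_3 C$$
with A (I × R), B (J × S), C (K × T) and core G (R × S × T); e.g., for an author × keyword × conference tensor:
A: author × author-group
B: keyword × keyword-group
C: conference × conference-group
G: how the groups relate to each other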
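A NumPy sketch of one simple way to compute a Tucker model: the truncated higher-order SVD (HOSVD), which takes each factor from the leading left singular vectors of the corresponding unfolding and then forms the core. Note this swaps in HOSVD for the ALS solver named in the slides; ALS for Tucker (HOOI) typically starts from this HOSVD and refines it. The test tensor is synthetic, with exact multilinear rank (2, 3, 2).

```python
import numpy as np

def unfold(X, mode):
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def hosvd(X, ranks):
    """Truncated HOSVD: orthonormal factors A, B, C and core G."""
    A, B, C = (np.linalg.svd(unfold(X, n), full_matrices=False)[0][:, :r]
               for n, r in enumerate(ranks))
    # Core: G = X x_1 A^T x_2 B^T x_3 C^T
    G = np.einsum('ijk,ir,js,kt->rst', X, A, B, C)
    return G, A, B, C

def tucker_to_tensor(G, A, B, C):
    """X = G x_1 A x_2 B x_3 C."""
    return np.einsum('rst,ir,js,kt->ijk', G, A, B, C)

rng = np.random.default_rng(0)
G0 = rng.standard_normal((2, 3, 2))
X = tucker_to_tensor(G0, rng.standard_normal((6, 2)),
                     rng.standard_normal((5, 3)), rng.standard_normal((4, 2)))
G, A, B, C = hosvd(X, ranks=(2, 3, 2))
print(G.shape, np.allclose(tucker_to_tensor(G, A, B, C), X))
```

Unlike the Kruskal model, the core G is a full R × S × T array, so each author-group can interact with every keyword-group and conference-group.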
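2-D analog of the Tucker decomposition (co-clustering), e.g., terms × documents:
$$
\underbrace{\begin{bmatrix}
.05 & .05 & .05 & 0 & 0 & 0\\
.05 & .05 & .05 & 0 & 0 & 0\\
0 & 0 & 0 & .05 & .05 & .05\\
0 & 0 & 0 & .05 & .05 & .05\\
.04 & .04 & 0 & .04 & .04 & .04\\
.04 & .04 & .04 & 0 & .04 & .04
\end{bmatrix}}_{m\times n}
\approx
\begin{bmatrix}
.054 & .054 & .042 & 0 & 0 & 0\\
.054 & .054 & .042 & 0 & 0 & 0\\
0 & 0 & 0 & .042 & .054 & .054\\
0 & 0 & 0 & .042 & .054 & .054\\
.036 & .036 & .028 & .028 & .036 & .036\\
.036 & .036 & .028 & .028 & .036 & .036
\end{bmatrix}
=
\underbrace{\begin{bmatrix}
.5 & 0 & 0\\ .5 & 0 & 0\\ 0 & .5 & 0\\ 0 & .5 & 0\\ 0 & 0 & .5\\ 0 & 0 & .5
\end{bmatrix}}_{m\times k}
\underbrace{\begin{bmatrix}
.3 & 0\\ 0 & .3\\ .2 & .2
\end{bmatrix}}_{k\times l}
\underbrace{\begin{bmatrix}
.36 & .36 & .28 & 0 & 0 & 0\\
0 & 0 & 0 & .28 & .36 & .36
\end{bmatrix}}_{l\times n}
$$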
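A quick NumPy check of the co-clustering example: multiplying the term × term-group factor, the group × group matrix and the doc-group × doc factor reproduces the block-structured approximation. The entries are our transcription of the slide's 6 × 6 example (treat them as an assumption); both the data and its approximation sum to 1, i.e., they can be read as joint distributions over (term, document).

```python
import numpy as np

# Terms x documents matrix of the example (our transcription of the slide)
A = np.array([[.05, .05, .05, 0, 0, 0],
              [.05, .05, .05, 0, 0, 0],
              [0, 0, 0, .05, .05, .05],
              [0, 0, 0, .05, .05, .05],
              [.04, .04, 0, .04, .04, .04],
              [.04, .04, .04, 0, .04, .04]])
R = np.array([[.5, 0, 0], [.5, 0, 0], [0, .5, 0],
              [0, .5, 0], [0, 0, .5], [0, 0, .5]])      # term x term-group
G = np.array([[.3, 0], [0, .3], [.2, .2]])             # term-group x doc-group
Ct = np.array([[.36, .36, .28, 0, 0, 0],
               [0, 0, 0, .28, .36, .36]])              # doc-group x doc

A_hat = R @ G @ Ct
print(np.round(A_hat, 3))
print(f"entries sum to {A.sum():.2f} and {A_hat.sum():.2f}")
```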
The same terms × documents example, with the factors labeled:
term × term-group factor (m × k): groups of medical terms, CS terms and common terms
term-group × doc-group matrix (k × l): how the term groups relate to the document groups
doc × doc-group factor (n × l): groups of medical docs and CS docs
Tensor tools ‐ summary
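Two main tools: PARAFAC (Kruskal’s decomposition) and Tucker
Both find row-, column- and tube-groups, but in PARAFAC the three groupings are tied together: every mode has the same number R of groups, and component r links row-group r, column-group r and tube-group r
To solve: Alternating Least Squares (ALS)
Toolbox: Tensor Toolbox from Tamara Kolda: http://csmr.ca.sandia.gov/~tgkolda/TensorToolbox/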
P1: Environmental sensor monitoring
[Figure: sensor readings over time (min) for four measurement types: Temperature, Light, Voltage, Humidity]
P1: sensor monitoring
[Figure: location × type × time sensor tensor (types: voltage, humidity, temperature, light) and its 1st factor (scaling factor 250), shown as three panels: type (Volt, Humid, Temp, Light), location, and time (min)]
1st factor consists of the main trends:
Daily periodicity on time
Uniform on all locations
Temp, Light and Volt are positively correlated with one another and negatively correlated with Humid
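We read each factor's "scaling factor" (250 for the 1st, 154 for the 2nd) as the weight λ_r of a normalized CP model; that reading is our assumption, not stated on the slides. A sketch of the normalization: rescale each component's factor columns to unit norm, collect the removed norms into λ_r, and sort components by weight. The factors here are random stand-ins for a location × type × time model.

```python
import numpy as np

def normalize_cp(A, B, C):
    """Unit-norm factor columns; norms move into weights lambda_r (sorted)."""
    na, nb, nc = (np.linalg.norm(M, axis=0) for M in (A, B, C))
    lam = na * nb * nc
    order = np.argsort(lam)[::-1]           # largest weight first
    return (lam[order], A[:, order] / na[order],
            B[:, order] / nb[order], C[:, order] / nc[order])

rng = np.random.default_rng(3)
# Hypothetical factors: 60 locations, 4 types, 100 time steps, 3 components
A, B, C = (rng.standard_normal((n, 3)) for n in (60, 4, 100))
lam, An, Bn, Cn = normalize_cp(A, B, C)
X = np.einsum('ir,jr,kr->ijk', A, B, C)
X_again = np.einsum('r,ir,jr,kr->ijk', lam, An, Bn, Cn)
print(np.round(lam, 1), np.allclose(X, X_again))
```

With unit-norm columns, the signs within the type column (e.g., Temp, Light and Volt versus Humid) can be compared directly, while λ_r ranks the components by how much of the tensor they explain.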
P1: sensor monitoring
2nd factor (scaling factor 154) captures an atypical trend:
Uniform across all time
Concentrated on 3 locations
Mainly due to voltage
Interpretation: two sensors have low battery, and the other one has high battery.
[Figure: 2nd factor shown as three panels: type (Volt, Humid, Temp, Light), location, and time (min)]
P3: Social network analysis
Multiway latent semantic indexing (LSI)
Monitor the change of the community structure over time
[Figure: example author and keyword patterns, e.g., authors Philip Yu and Michael Stonebraker, keywords ‘Query’ and ‘Pattern’]
P3: Social network analysis (cont.)
| Group | Authors | Keywords | Year |
|---|---|---|---|
| DB | michael carey, michael stonebraker, h. jagadish, hector garcia-molina | queri, parallel, optimization, concurr, objectorient | 1995 |
| DB | surajit chaudhuri, mitch cherniack, michael stonebraker, ugur çetintemel | distribut, systems, view, storage, servic, process, cache | 2004 |
| DM | jiawei han, jian pei, philip s. yu, jianyong wang, charu c. aggarwal | streams, pattern, support, cluster, index, gener, queri | 2004 |
• Two groups are correctly identified: Databases (DB) and Data mining (DM)
• People and concepts are drifting over time
P4: Network anomaly detection
Reconstruction error gives an indication of anomalies. The prominent difference between normal and abnormal traffic is mainly due to unusual scanning activity (confirmed by the campus admin).
[Figure: reconstruction error over time (hours); source × destination traffic matrices for normal and abnormal traffic]
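A simplified sketch of reconstruction-error anomaly detection, assuming hourly source × destination matrices: learn rank-k source and destination subspaces from normal traffic, project each new matrix onto them, and treat a large relative residual as a warning sign. All data here are synthetic and hypothetical, and this 2-D projection stands in for the full tensor model used in the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
n_src, n_dst, k = 30, 40, 3
U0, V0 = rng.random((n_src, k)), rng.random((n_dst, k))

def normal_hour():
    """Hypothetical normal hour: low-rank source x destination traffic + noise."""
    return U0 @ np.diag(rng.random(k) + 1) @ V0.T + 0.01 * rng.random((n_src, n_dst))

train = [normal_hour() for _ in range(50)]
# Rank-k source and destination subspaces learned from normal traffic only
U = np.linalg.svd(np.hstack(train), full_matrices=False)[0][:, :k]
V = np.linalg.svd(np.vstack(train), full_matrices=False)[2][:k].T

def recon_error(X):
    """Relative error after projecting X onto the learned subspaces."""
    X_hat = U @ (U.T @ X @ V) @ V.T
    return np.linalg.norm(X - X_hat) / np.linalg.norm(X)

normal = normal_hour()
scan = normal_hour()
scan[7, :] += 5.0   # one source suddenly contacts every destination (scan-like)
print(f"normal: {recon_error(normal):.3f}  scan: {recon_error(scan):.3f}")
```

The scan-like row does not fit the subspaces learned from normal hours, so most of it lands in the residual.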
P5: Web graph mining
How to order the importance of web pages?
Kleinberg’s algorithm HITS
PageRank
Tensor extension on HITS (TOPHITS)
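Kleinberg’s Hubs and Authorities (the HITS method)
Sparse adjacency matrix (rows: from, columns: to) and its SVD:
$$A_{\text{from}\times\text{to}} \approx \sigma_1\,\mathbf{h}_1\circ\mathbf{a}_1 + \sigma_2\,\mathbf{h}_2\circ\mathbf{a}_2 + \cdots$$
h_1, h_2: hub scores for the 1st and 2nd topic (left singular vectors); a_1, a_2: authority scores for the 1st and 2nd topic (right singular vectors)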
Kleinberg, JACM, 1999
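A NumPy sketch of HITS by power iteration on a small hypothetical graph: authorities are updated as a = Aᵀh and hubs as h = Aa, with normalization after each step. The fixed points are the leading right and left singular vectors of A, which the last lines check against `np.linalg.svd`.

```python
import numpy as np

def hits(adj, n_iter=100):
    """HITS: a = A^T h, h = A a, normalized; converges to top singular vectors."""
    h = np.ones(adj.shape[0])
    for _ in range(n_iter):
        a = adj.T @ h
        a /= np.linalg.norm(a)
        h = adj @ a
        h /= np.linalg.norm(h)
    return h, a

# Hypothetical toy graph: adj[i, j] = 1 if page i links to page j (from x to)
adj = np.array([[0, 1, 1, 0],
                [0, 0, 1, 0],
                [0, 0, 0, 0],
                [1, 1, 1, 0]], dtype=float)
h, a = hits(adj)
U, s, Vt = np.linalg.svd(adj)
print(np.round(h, 3), np.round(a, 3))
```

Page 2, which receives the most links, gets the top authority score; the abs() in a comparison with `U[:, 0]` and `Vt[0]` only removes the arbitrary sign of the SVD.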
HITS Authorities on Sample Data
We started our crawl from http://www-neos.mcs.anl.gov/neos, and crawled 4700 pages, resulting in 560 cross-linked hosts.
1st Principal Factor
.97 www.ibm.com
.24 www.alphaworks.ibm.com
.08 www-128.ibm.com
.05 www.developer.ibm.com
.02 www.research.ibm.com
.01 www.redbooks.ibm.com
.01 news.com.com
2nd Principal Factor
.99 www.lehigh.edu
.11 www2.lehigh.edu
.06 www.lehighalumni.com
.06 www.lehighsports.com
.02 www.bethlehem-pa.gov
.02 www.adobe.com
.02 lewisweb.cc.lehigh.edu
.02 www.leo.lehigh.edu
.02 www.distance.lehigh.edu
.02 fp1.cc.lehigh.edu
3rd Principal Factor
.75 java.sun.com
.38 www.sun.com
.36 developers.sun.com
.24 see.sun.com
.16 www.samag.com
.13 docs.sun.com
.12 blogs.sun.com
.08 sunsolve.sun.com
.08 www.sun-catalogue.com
.08 news.com.com
4th Principal Factor
.60 www.pueblo.gsa.gov
.45 www.whitehouse.gov
.35 www.irs.gov
.31 travel.state.gov
.22 www.gsa.gov
.20 www.ssa.gov
.16 www.census.gov
.14 www.govbenefits.gov
.13 www.kids.gov
.13 www.usdoj.gov
6th Principal Factor
.97 mathpost.asu.edu
.18 math.la.asu.edu
.17 www.asu.edu
.04 www.act.org
.03 www.eas.asu.edu
.02 archives.math.utk.edu
.02 www.geom.uiuc.edu
.02 www.fulton.asu.edu
.02 www.amstat.org
.02 www.maa.org
Three‐Dimensional View of the Web
Observe that this tensor is very sparse!
Kolda, Bader, Kenny, ICDM05
Topical HITS (TOPHITS)
Main idea: extend the idea behind the HITS model to incorporate term (i.e., topical) information.
$$\mathcal{X}_{\text{from}\times\text{to}\times\text{term}} \approx \lambda_1\,\mathbf{h}_1\circ\mathbf{a}_1\circ\mathbf{t}_1 + \lambda_2\,\mathbf{h}_2\circ\mathbf{a}_2\circ\mathbf{t}_2 + \cdots$$
h_r, a_r, t_r: hub, authority and term scores for the r-th topic
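A rank-1 TOPHITS sketch on a sparse from × to × term tensor stored as a dictionary of nonzeros, which suits the very sparse web tensor: each pass updates hubs, authorities and terms in turn (one ALS step of a rank-1 CP model), touching only the nonzeros. TOPHITS itself fits a multi-component PARAFAC model with weighted entries; here the toy graph is hypothetical and every weight is simply 1.

```python
import numpy as np

def tophits_rank1(entries, shape, n_iter=100):
    """Rank-1 hub/authority/term scores for {(from, to, term): weight}."""
    idx = np.array(list(entries.keys()))
    vals = np.array(list(entries.values()), dtype=float)
    i, j, k = idx.T
    h, a, t = (np.ones(n) for n in shape)
    for _ in range(n_iter):
        # Cost per update is linear in the number of nonzeros
        h = np.bincount(i, weights=vals * a[j] * t[k], minlength=shape[0])
        h /= np.linalg.norm(h)
        a = np.bincount(j, weights=vals * h[i] * t[k], minlength=shape[1])
        a /= np.linalg.norm(a)
        t = np.bincount(k, weights=vals * h[i] * a[j], minlength=shape[2])
        t /= np.linalg.norm(t)
    return h, a, t

# Hypothetical toy web: pages 0 and 3 link to pages 1 and 2 with term 0
# ("java"); page 4 links to page 0 with term 1 ("weather").
entries = {(0, 1, 0): 1, (0, 2, 0): 1, (3, 1, 0): 1, (3, 2, 0): 1, (4, 0, 1): 1}
h, a, t = tophits_rank1(entries, shape=(5, 5, 2))
print(np.round(h, 3), np.round(a, 3), np.round(t, 3))
```

The dominant "java" block wins: pages 0 and 3 become the hubs, pages 1 and 2 the authorities, and term 0 carries the topic.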
TOPHITS Terms & Authorities on Sample Data
(each row: term score and term | authority score and host)
1st Principal Factor
.23 JAVA | .86 java.sun.com
.18 SUN | .38 developers.sun.com
.17 PLATFORM | .16 docs.sun.com
.16 SOLARIS | .14 see.sun.com
.16 DEVELOPER | .14 www.sun.com
.15 EDITION | .09 www.samag.com
.15 DOWNLOAD | .07 developer.sun.com
.14 INFO | .06 sunsolve.sun.com
.12 SOFTWARE | .05 access1.sun.com
.12 NO-READABLE-TEXT | .05 iforce.sun.com
2nd Principal Factor
.20 NO-READABLE-TEXT | .99 www.lehigh.edu
.16 FACULTY | .06 www2.lehigh.edu
.16 SEARCH | .03 www.lehighalumni.com
.16 NEWS
.16 LIBRARIES
.16 COMPUTING
.12 LEHIGH
3rd Principal Factor
.15 NO-READABLE-TEXT | .97 www.ibm.com
.15 IBM | .18 www.alphaworks.ibm.com
.12 SERVICES | .07 www-128.ibm.com
.12 WEBSPHERE | .05 www.developer.ibm.com
.12 WEB | .02 www.redbooks.ibm.com
.11 DEVELOPERWORKS | .01 www.research.ibm.com
.11 LINUX
.11 RESOURCES
.11 TECHNOLOGIES
.10 DOWNLOADS
4th Principal Factor
.26 INFORMATION | .87 www.pueblo.gsa.gov
.24 FEDERAL | .24 www.irs.gov
.23 CITIZEN | .23 www.whitehouse.gov
.22 OTHER | .19 travel.state.gov
.19 CENTER | .18 www.gsa.gov
.19 LANGUAGES | .09 www.consumer.gov
.15 U.S | .09 www.kids.gov
.15 PUBLICATIONS | .07 www.ssa.gov
.14 CONSUMER | .05 www.forms.gov
.13 FREE | .04 www.govbenefits.gov
6th Principal Factor
.26 PRESIDENT | .87 www.whitehouse.gov
.25 NO-READABLE-TEXT | .18 www.irs.gov
.25 BUSH | .16 travel.state.gov
.25 WELCOME | .10 www.gsa.gov
.17 WHITE | .08 www.ssa.gov
.16 U.S | .05 www.govbenefits.gov
.15 HOUSE | .04 www.census.gov
.13 BUDGET | .04 www.usdoj.gov
.13 PRESIDENTS | .04 www.kids.gov
.11 OFFICE | .02 www.forms.gov
12th Principal Factor
.75 OPTIMIZATION | .35 www.palisade.com
.58 SOFTWARE | .35 www.solver.com
.08 DECISION | .33 plato.la.asu.edu
.07 NEOS | .29 www.mat.univie.ac.at
.06 TREE | .28 www.ilog.com
.05 GUIDE | .26 www.dashoptimization.com
.05 SEARCH | .26 www.grabitech.com
.05 ENGINE | .25 www-fp.mcs.anl.gov
.05 CONTROL | .22 www.spyderopts.com
.05 ILOG | .17 www.mosek.com
13th Principal Factor
.46 ADOBE | .99 www.adobe.com
.45 READER
.45 ACROBAT
.30 FREE
.30 NO-READABLE-TEXT
.29 HERE
.29 COPY
.05 DOWNLOAD
16th Principal Factor
.50 WEATHER | .81 www.weather.gov
.24 OFFICE | .41 www.spc.noaa.gov
.23 CENTER | .30 lwf.ncdc.noaa.gov
.19 NO-READABLE-TEXT | .15 www.cpc.ncep.noaa.gov
.17 ORGANIZATION | .14 www.nhc.noaa.gov
.15 NWS | .09 www.prh.noaa.gov
.15 SEVERE | .07 aviationweather.gov
.15 FIRE | .06 www.nohrsc.nws.gov
.15 POLICY | .06 www.srh.noaa.gov
.14 CLIMATE
19th Principal Factor
.22 TAX | .73 www.irs.gov
.17 TAXES | .43 travel.state.gov
.15 CHILD | .22 www.ssa.gov
.15 RETIREMENT | .08 www.govbenefits.gov
.14 BENEFITS | .06 www.usdoj.gov
.14 STATE | .03 www.census.gov
.14 INCOME | .03 www.usmint.gov
.13 SERVICE | .02 www.nws.noaa.gov
.13 REVENUE | .02 www.gsa.gov
.12 CREDIT | .01 www.annualcreditreport.com
TOPHITS uses 3D analysis to find the dominant groupings of web pages and terms.
Tensor weighting: w_k = number of unique links using term k
Conclusions
Real data are often in high dimensions with multiple aspects (modes)
Matrices and tensors provide elegant theory and algorithms
Several research problems are still open: skewed distributions, anomaly detection, streaming algorithms, distributed/parallel algorithms, efficient out-of-core processing
References
Slides borrowed from the SIGMOD ’07 tutorial by Faloutsos, Kolda and Sun.