1
A High-Performance Interactive Tool for Exploring Large Graphs
John R. Gilbert, University of California, Santa Barbara
Aydin Buluc & Viral Shah (UCSB), Brad McRae (NCEAS), Steve Reinhardt (Interactive Supercomputing)
with thanks to Alan Edelman (MIT & ISC) and Jeremy Kepner (MIT-LL)
Support: DOE Office of Science, NSF, DARPA, SGI, ISC
2
3D Spectral Coordinates
3
2D Histogram: RMAT Graph
4
Strongly Connected Components
5
Social Network Analysis in Matlab: 1993
Co-author graph from the 1993 Householder symposium
6
Social Network Analysis in Matlab: 1993
Which author has the most collaborators?
>> [count, author] = max(sum(A))
count = 32
author = 1
>>name(author,:)
ans = Golub
Sparse Adjacency Matrix
7
Social Network Analysis in Matlab: 1993
Have Gene Golub and Cleve Moler ever been coauthors?
>> A(Golub,Moler)
ans = 0
No.
But how many coauthors do they have in common?
>> AA = A^2;
>> AA(Golub,Moler)
ans = 2
And who are those common coauthors?
>> name( find ( A(:,Golub) .* A(:,Moler) ), :)
ans =
Wilkinson
VanLoan
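The Matlab session above translates almost line-for-line to SciPy. A small sketch, using a made-up four-author adjacency matrix rather than the real Householder data:

```python
# SciPy analogue of the coauthor queries above.  The edge list is
# illustrative only, chosen to reproduce the answers on the slide.
import numpy as np
from scipy.sparse import csr_matrix

names = ["Golub", "Moler", "Wilkinson", "VanLoan"]
# Undirected coauthor edges: Golub-Wilkinson, Golub-VanLoan,
# Moler-Wilkinson, Moler-VanLoan (hypothetical).
edges = [(0, 2), (0, 3), (1, 2), (1, 3)]
rows = [i for i, j in edges] + [j for i, j in edges]
cols = [j for i, j in edges] + [i for i, j in edges]
A = csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(4, 4))

golub, moler = names.index("Golub"), names.index("Moler")
print(A[golub, moler])       # 0.0 -- never direct coauthors
AA = A @ A
print(AA[golub, moler])      # 2.0 -- two coauthors in common
common = (A[:, golub].toarray().ravel() * A[:, moler].toarray().ravel()) != 0
print([names[k] for k in np.where(common)[0]])   # ['Wilkinson', 'VanLoan']
```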
8
Outline
• Infrastructure: Array-based sparse graph computation
• An application: Computational ecology
• Some nuts and bolts: Sparse matrix multiplication
9
Combinatorial Scientific Computing
Emerging large scale, high-performance applications:
• Web search and information retrieval
• Knowledge discovery
• Computational biology
• Dynamical systems
• Machine learning
• Bioinformatics
• Sparse matrix methods
• Geometric modeling
• . . .
How will combinatorial methods be used by nonexperts?
10
Analogy: Matrix Division in Matlab
x = A \ b;
• Works for either full or sparse A
• Is A square?
no => use QR to solve least squares problem
• Is A triangular or permuted triangular?
yes => sparse triangular solve
• Is A symmetric with positive diagonal elements?
yes => attempt Cholesky after symmetric minimum degree
• Otherwise
=> use LU on A(:, colamd(A))
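The dispatch logic above can be sketched with SciPy's dense solvers. This is a schematic illustration of the polyalgorithm, not Matlab's actual implementation; the property tests are simplified and assume a dense A:

```python
# Schematic backslash polyalgorithm (illustrative only):
# rectangular -> least squares; triangular -> substitution;
# symmetric w/ positive diagonal -> try Cholesky; else LU.
import numpy as np
import scipy.linalg as la

def backslash(A, b):
    m, n = A.shape
    if m != n:
        # Least-squares solve (NumPy uses SVD here; Matlab uses QR).
        return np.linalg.lstsq(A, b, rcond=None)[0]
    if np.allclose(A, np.triu(A)) or np.allclose(A, np.tril(A)):
        return la.solve_triangular(A, b, lower=np.allclose(A, np.tril(A)))
    if np.allclose(A, A.T) and np.all(np.diag(A) > 0):
        try:
            c, low = la.cho_factor(A)       # attempt Cholesky
            return la.cho_solve((c, low), b)
        except la.LinAlgError:
            pass                            # not positive definite
    return la.lu_solve(la.lu_factor(A), b)  # general LU fallback

A = np.array([[4.0, 1.0], [1.0, 3.0]])      # SPD: takes the Cholesky path
b = np.array([1.0, 2.0])
x = backslash(A, b)
```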
11
Matlab*P
A = rand(4000*p, 4000*p);
x = randn(4000*p, 1);
y = zeros(size(x));
while norm(x-y) / norm(x) > 1e-11
y = x;
x = A*x;
x = x / norm(x);
end;
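The Matlab*P loop above is power iteration for the dominant eigenvector; a plain NumPy analogue, with the size shrunk from 4000 to 400 so the sketch runs quickly:

```python
# Power iteration, mirroring the Matlab*P loop above.
import numpy as np

rng = np.random.default_rng(0)
n = 400
A = rng.random((n, n))
x = rng.standard_normal(n)
y = np.zeros_like(x)
while np.linalg.norm(x - y) / np.linalg.norm(x) > 1e-11:
    y = x
    x = A @ x
    x = x / np.linalg.norm(x)
# x now approximates the dominant eigenvector; the Rayleigh
# quotient estimates the dominant eigenvalue.
lam = x @ A @ x
```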
12
Star-P Architecture
[Diagram: a MATLAB® client holding ordinary Matlab variables connects through the Star-P client, server, and package managers to processors #0 … #n-1; the matrix manager holds distributed dense/sparse matrices; server-side components include ScaLAPACK, FFTW, sort, an FPGA interface, and UPC/MPI user code]
13
Distributed Sparse Array Structure
[Figure: rows of the sparse matrix partitioned by block across processors P0, P1, P2, … Pn]
Each processor stores local vertices & edges in a compressed row structure.
Has been scaled to >10^8 vertices, >10^9 edges in an interactive session.
14
The sparse( ) Constructor
• A = sparse (I, J, V, nr, nc);
• Input: dense vectors I, J, V, dimensions nr, nc
• Output: A(I(k), J(k)) = V(k)
• Sum values with duplicate indices
• Sorts triples < i, j, v > by < i, j >
• Inverse: [I, J, V] = find(A);
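SciPy's COO constructor follows the same conventions: values with duplicate indices are summed, and `find` inverts the construction. A quick illustration (the matrix here is made up):

```python
# sparse(I, J, V, nr, nc) analogue: duplicates at (1, 2) are summed.
import numpy as np
from scipy.sparse import coo_matrix, find

I = np.array([0, 1, 1, 2])        # 0-based, vs. Matlab's 1-based
J = np.array([1, 2, 2, 0])
V = np.array([10.0, 3.0, 4.0, 5.0])
A = coo_matrix((V, (I, J)), shape=(3, 3)).tocsr()

print(A[1, 2])                    # 7.0 -- the two (1, 2) values summed
i, j, v = find(A)                 # inverse of the constructor
```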
15
Sparse Array and Matrix Operations
• dsparse layout, same semantics as ordinary full & sparse
• Matrix arithmetic: +, max, sum, etc.
• matrix * matrix and matrix * vector
• Matrix indexing and concatenation
A (1:3, [4 5 2]) = [ B(:, J) C ] ;
• Linear solvers: x = A \ b; using SuperLU (MPI)
• Eigensolvers: [V, D] = eigs(A); using PARPACK (MPI)
16
Large-Scale Graph Algorithms
• Graph theory, algorithms, and data structures are ubiquitous in sparse matrix computation.
• Time to turn the relationship around!
• Represent a graph as a sparse adjacency matrix.
• A sparse matrix language is a good start on primitives for computing with graphs.
• Leverage the mature techniques and tools of high-performance numerical computation.
17
Sparse Adjacency Matrix and Graph
• Adjacency matrix: sparse array w/ nonzeros for graph edges
• Storage-efficient implementation from sparse data structures
[Figure: directed graph on vertices 1–7 and its sparse adjacency matrix A^T, with vectors x and A^T x]
18
Breadth-First Search: Sparse mat * vec
[Figure: frontier vector x, adjacency matrix A^T, and product A^T x on the 7-vertex graph]
• Multiply by adjacency matrix → step to neighbor vertices
• Work-efficient implementation from sparse data structures
20
Breadth-First Search: Sparse mat * vec
[Figure: second multiply; (A^T)^2 x reaches vertices two hops from the start]
• Multiply by adjacency matrix → step to neighbor vertices
• Work-efficient implementation from sparse data structures
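The frontier-expansion idea on these slides can be sketched with SciPy. The 7-vertex edge list below is illustrative, not the graph in the figures:

```python
# BFS as repeated sparse mat-vec: each multiply by A^T advances the
# frontier one level from source vertex 0.
import numpy as np
from scipy.sparse import csr_matrix

edges = [(0, 1), (0, 3), (1, 2), (3, 4), (4, 5), (2, 6), (5, 6)]
n = 7
rows, cols = zip(*edges)
A = csr_matrix((np.ones(len(edges)), (rows, cols)), shape=(n, n))

x = np.zeros(n); x[0] = 1                     # frontier = {0}
level = -np.ones(n, dtype=int); level[0] = 0  # -1 = unvisited
for d in range(1, n):
    x = (A.T @ x != 0).astype(float)          # step to neighbors
    newly = (x != 0) & (level < 0)
    if not newly.any():
        break
    level[newly] = d
print(level)                                  # BFS level of each vertex
```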
21
SSCA#2: “Graph Analysis” Benchmark (spec version 1)
• Many tight clusters, loosely interconnected
• Input data is edge triples < i, j, label(i,j) >
• Vertices and edges permuted randomly
• Fine-grained, irregular data access
• Searching and clustering
22
Clustering by Breadth-First Search
% Grow each seed to vertices
% reached by at least k
% paths of length 1 or 2
C = sparse(seeds, 1:ns, 1, n, ns);
C = A * C;
C = C + A * C;
C = C >= k;
• Grow local clusters from many seeds in parallel
• Breadth-first search by sparse matrix * matrix
• Cluster vertices connected by many short paths
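A SciPy rendering of the seed-growing code above, on a made-up toy graph: two triangles joined by one edge, seeds 0 and 4, and k = 2:

```python
# Grow each seed to vertices reached by at least k paths of length
# 1 or 2, via sparse matrix * matrix, as in the Matlab code above.
import numpy as np
from scipy.sparse import csr_matrix

n, k = 6, 2
und = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
rows = [i for i, j in und] + [j for i, j in und]
cols = [j for i, j in und] + [i for i, j in und]
A = csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(n, n))

seeds = [0, 4]
ns = len(seeds)
C = csr_matrix((np.ones(ns), (seeds, range(ns))), shape=(n, ns))
C = A @ C                     # paths of length 1
C = C + A @ C                 # plus paths of length 2
C = C.toarray() >= k          # keep vertices reached by >= k paths
print(C.astype(int))          # column s = cluster grown from seed s
```

Each column recovers one of the two triangles, since the bridging edge 2-3 contributes only a single short path across.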
23
Toolbox for Graph Analysis and Pattern Discovery
Layer 1: Graph Theoretic Tools
• Graph operations
• Global structure of graphs
• Graph partitioning and clustering
• Graph generators
• Visualization and graphics
• Scan and combining operations
• Utilities
24
Typical Application Stack
Distributed Sparse Matrices
Arithmetic, matrix multiplication, indexing, solvers (\, eigs)
Graph Analysis & PD Toolbox
Graph querying & manipulation, connectivity, spanning trees,
geometric partitioning, nested dissection, NNMF, . . .
Preconditioned Iterative Methods
CG, BiCGStab, etc. + combinatorial preconditioners (AMG, Vaidya)
Applications
Computational ecology, CFD, data exploration
25
Landscape Connectivity Modeling
• Landscape type and features facilitate or impede movement of members of a species
• Different species have different criteria, scales, etc.
• Habitat quality, gene flow, population stability
• Corridor identification, conservation planning
26
Pumas in Southern California
[Map: habitat quality model for the region spanning L.A., Palm Springs, and Joshua Tree N.P.]
27
Predicting Gene Flow with Resistive Networks
Circuit model predictions (N = 100, m = 0.01):
[Figure: genetic vs. geographic distance]
28
Early Experience with Real Genetic Data
• Good results with wolverines, mahogany, pumas
• Matlab implementation
• Needed:
– Finer resolution
– Larger landscapes
– Faster interaction
[Map: 5 km resolution (too coarse)]
29
Combinatorics in Circuitscape
• Initial grid models connections to 4 or 8 neighbors.
• Partition landscape into connected components with GAPDT
• Graph contraction from GAPDT contracts habitats into single nodes in resistive network. (Need current flow between entire habitats.)
• Data-parallel computation on large graphs - graph construction, querying and manipulation.
• Ideally, model landscape at 100m resolution (for pumas). Tradeoff between resolution and time.
30
Numerics in Circuitscape
• Resistance computations for pairs of habitats in the landscape
• Direct methods are too slow for largest problems
• Use iterative solvers via Star-P:
– Hypre (PCG+AMG)
– Experimenting with support graph preconditioners
31
Parallel Circuitscape Results
• Pumas in southern California:
– 12 million nodes
– Under 1 hour (16 processors)
– Original code took 3 days at coarser resolution
• Targeting much larger problems:
– Yellowstone-to-Yukon corridor
Figures courtesy of Brad McRae, NCEAS
32
Sparse Matrix times Sparse Matrix
• A primitive in many array-based graph algorithms:
– Parallel breadth-first search
– Shortest paths
– Graph contraction
– Subgraph / submatrix indexing
– Etc.
• Graphs are often not mesh-like, i.e. they lack geometric locality and good separators.
• Often do not want to optimize for one repeated operation, as in matvec for iterative methods
33
Sparse Matrix times Sparse Matrix
• Current work:
– Parallel algorithms with 2D data layout
– Sequential hypersparse algorithms
– Matrices over semirings
34
ParSpGEMM
[Figure: C(I,J) += A(I,K) * B(K,J) over block row I, block column J, and blocks K]
• Based on SUMMA
• Simple for non-square matrices, etc.
35
How Sparse? HyperSparse !
[Figure: splitting the matrix into p blocks drives the average number of nonzeros per local column from c toward zero]
Any local data structure that depends on local submatrix dimension n (such as CSR or CSC) is too wasteful.
36
SparseDComp Data Structure
• “Doubly compressed” data structure
• Maintains both DCSC and DCSR
• C = A*B needs only A.DCSC and B.DCSR
• 4*nnz values communicated for A*B in the worst case (though we usually get away with much less)
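A minimal sketch of the doubly compressed column idea: the column-pointer array covers only columns that actually hold nonzeros, so storage is O(nnz) rather than O(n). The array names (JC, CP, IR, NUM) are illustrative, not necessarily Star-P's; DCSR is the same structure on rows:

```python
# Build a DCSC-style structure from (i, j, v) triples:
# JC = indices of nonempty columns, CP = pointers into IR/NUM,
# IR = row indices, NUM = values.
from collections import defaultdict

def to_dcsc(triples):
    by_col = defaultdict(list)
    for i, j, v in triples:
        by_col[j].append((i, v))
    JC, CP, IR, NUM = [], [0], [], []
    for j in sorted(by_col):            # only nonempty columns appear
        JC.append(j)
        for i, v in sorted(by_col[j]):
            IR.append(i)
            NUM.append(v)
        CP.append(len(IR))
    return JC, CP, IR, NUM

# A matrix with a million columns but 3 nonzeros: CSC would need a
# million-entry pointer array; DCSC needs four short arrays.
JC, CP, IR, NUM = to_dcsc([(5, 999999, 1.0), (2, 7, 2.0), (9, 7, 3.0)])
print(JC)    # [7, 999999]
print(CP)    # [0, 2, 3]
```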
37
Sequential Operation Counts
• Matlab: O(n + nnz(B) + f)
• SpGEMM: O(nzc(A) + nzr(B) + f·log k)
[Plot: break-even point; x-axis: number of columns of A containing at least one nonzero; y-axis: required nonzero operations (flops)]
38
Parallel Timings
• 16-processor Opteron, HyperTransport, 64 GB memory
• R-MAT * R-MAT
• n = 2^20
• nnz = {8, 4, 2, 1, 0.5} × 2^20
[Plot: time vs. n/nnz, log-log]
39
Matrices over Semirings
• Matrix multiplication C = AB (or matrix * vector):
C(i,j) = A(i,1)·B(1,j) + A(i,2)·B(2,j) + ··· + A(i,n)·B(n,j)
• Replace the scalar operations · and + by ⊗ and ⊕
• ⊗: associative, distributes over ⊕, identity 1
• ⊕: associative, commutative, identity 0; 0 annihilates under ⊗
• Then C(i,j) = A(i,1)⊗B(1,j) ⊕ A(i,2)⊗B(2,j) ⊕ ··· ⊕ A(i,n)⊗B(n,j)
• Examples: (·, +) ; (and, or) ; (+, min) ; . . .
• Same data reference pattern and control flow
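A toy illustration of swapping the semiring: the same triple loop computes ordinary matmul under (·, +) and a shortest-path relaxation under (min, +). The helper name and dense representation are my own sketch, not Star-P's API:

```python
# Matrix multiply parameterized by a semiring (add, mul, zero).
# With (min, +) on a distance matrix D, D2[i][j] is the shortest
# i -> j path length using at most two edges.
import numpy as np

INF = np.inf

def semiring_matmul(A, B, add, mul, zero):
    n, t = A.shape
    t2, m = B.shape
    assert t == t2
    C = np.full((n, m), zero)
    for i in range(n):
        for j in range(m):
            acc = zero                      # semiring identity 0
            for k in range(t):
                acc = add(acc, mul(A[i, k], B[k, j]))
            C[i, j] = acc
    return C

D = np.array([[0.0, 3.0, INF],
              [INF, 0.0, 1.0],
              [INF, INF, 0.0]])
D2 = semiring_matmul(D, D, min, lambda a, b: a + b, INF)
print(D2)    # D2[0, 2] = 4.0: path 0 -> 1 -> 2 of length 3 + 1
```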
40
Remarks
• Tools for combinatorial methods built on parallel sparse matrix infrastructure
• Easy-to-use interactive programming environment
– Rapid prototyping tool for algorithm development
– Interactive exploration and visualization of data
• Sparse matrix * sparse matrix is a key primitive
• Matrices over semirings like (min,+) as well as (+,*)