1
A High-Performance Interactive Tool for Exploring Large Graphs
John R. Gilbert, University of California, Santa Barbara
Aydin Buluc & Viral Shah (UCSB), Brad McRae (NCEAS), Steve Reinhardt (Interactive Supercomputing)
with thanks to Alan Edelman (MIT & ISC) and Jeremy Kepner (MIT-LL)
Support: DOE Office of Science, NSF, DARPA, SGI, ISC
2
3D Spectral Coordinates
3
2D Histogram: RMAT Graph
4
Strongly Connected Components
5
Social Network Analysis in Matlab: 1993
Co-author graph from the 1993 Householder symposium
6
Social Network Analysis in Matlab: 1993
Which author has the most collaborators?
>> [count, author] = max(sum(A))
count = 32
author = 1
>>name(author,:)
ans = Golub
Sparse Adjacency Matrix
7
Social Network Analysis in Matlab: 1993
Have Gene Golub and Cleve Moler ever been coauthors?
>> A(Golub,Moler)
ans = 0
No.
But how many coauthors do they have in common?
>> AA = A^2;
>> AA(Golub,Moler)
ans = 2
And who are those common coauthors?
>> name( find ( A(:,Golub) .* A(:,Moler) ), :)
ans =
Wilkinson
VanLoan
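The Matlab session above translates almost line-for-line to SciPy. A small sketch, using a made-up four-author adjacency matrix rather than the real Householder data:

```python
# SciPy analogue of the coauthor queries above.  The edge list is
# illustrative only, chosen to reproduce the answers on the slide.
import numpy as np
from scipy.sparse import csr_matrix

names = ["Golub", "Moler", "Wilkinson", "VanLoan"]
# Undirected coauthor edges: Golub-Wilkinson, Golub-VanLoan,
# Moler-Wilkinson, Moler-VanLoan (hypothetical).
edges = [(0, 2), (0, 3), (1, 2), (1, 3)]
rows = [i for i, j in edges] + [j for i, j in edges]
cols = [j for i, j in edges] + [i for i, j in edges]
A = csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(4, 4))

golub, moler = names.index("Golub"), names.index("Moler")
print(A[golub, moler])       # 0.0 -- never direct coauthors
AA = A @ A
print(AA[golub, moler])      # 2.0 -- two coauthors in common
common = (A[:, golub].toarray().ravel() * A[:, moler].toarray().ravel()) != 0
print([names[k] for k in np.where(common)[0]])   # ['Wilkinson', 'VanLoan']
```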
8
Outline
• Infrastructure: Array-based sparse graph computation
• An application: Computational ecology
• Some nuts and bolts: Sparse matrix multiplication
9
Combinatorial Scientific Computing
Emerging large scale, high-performance applications:
• Web search and information retrieval
• Knowledge discovery
• Computational biology
• Dynamical systems
• Machine learning
• Bioinformatics
• Sparse matrix methods
• Geometric modeling
• . . .
How will combinatorial methods be used by nonexperts?
10
Analogy: Matrix Division in Matlab
x = A \ b;
• Works for either full or sparse A
• Is A square?
no => use QR to solve least squares problem
• Is A triangular or permuted triangular?
yes => sparse triangular solve
• Is A symmetric with positive diagonal elements?
yes => attempt Cholesky after symmetric minimum degree
• Otherwise
=> use LU on A(:, colamd(A))
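The dispatch logic above can be sketched with SciPy's dense solvers. This is a schematic illustration of the polyalgorithm, not Matlab's actual implementation; the property tests are simplified and assume a dense A:

```python
# Schematic backslash polyalgorithm (illustrative only):
# rectangular -> least squares; triangular -> substitution;
# symmetric w/ positive diagonal -> try Cholesky; else LU.
import numpy as np
import scipy.linalg as la

def backslash(A, b):
    m, n = A.shape
    if m != n:
        # Least-squares solve (NumPy uses SVD here; Matlab uses QR).
        return np.linalg.lstsq(A, b, rcond=None)[0]
    if np.allclose(A, np.triu(A)) or np.allclose(A, np.tril(A)):
        return la.solve_triangular(A, b, lower=np.allclose(A, np.tril(A)))
    if np.allclose(A, A.T) and np.all(np.diag(A) > 0):
        try:
            c, low = la.cho_factor(A)       # attempt Cholesky
            return la.cho_solve((c, low), b)
        except la.LinAlgError:
            pass                            # not positive definite
    return la.lu_solve(la.lu_factor(A), b)  # general LU fallback

A = np.array([[4.0, 1.0], [1.0, 3.0]])      # SPD: takes the Cholesky path
b = np.array([1.0, 2.0])
x = backslash(A, b)
```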
11
Matlab*P
A = rand(4000*p, 4000*p);
x = randn(4000*p, 1);
y = zeros(size(x));
while norm(x-y) / norm(x) > 1e-11
y = x;
x = A*x;
x = x / norm(x);
end;
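The Matlab*P loop above is power iteration for the dominant eigenvector; a plain NumPy analogue, with the size shrunk from 4000 to 400 so the sketch runs quickly:

```python
# Power iteration, mirroring the Matlab*P loop above.
import numpy as np

rng = np.random.default_rng(0)
n = 400
A = rng.random((n, n))
x = rng.standard_normal(n)
y = np.zeros_like(x)
while np.linalg.norm(x - y) / np.linalg.norm(x) > 1e-11:
    y = x
    x = A @ x
    x = x / np.linalg.norm(x)
# x now approximates the dominant eigenvector; the Rayleigh
# quotient estimates the dominant eigenvalue.
lam = x @ A @ x
```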
12
Star-P Architecture
[Diagram: a MATLAB® client holding ordinary Matlab variables connects through the Star-P client, server, and package managers to processors #0 … #n-1; the matrix manager holds distributed dense/sparse matrices; server-side components include ScaLAPACK, FFTW, sort, an FPGA interface, and UPC/MPI user code]
13
Distributed Sparse Array Structure
[Figure: rows of the sparse matrix partitioned by block across processors P0, P1, P2, … Pn]
Each processor stores local vertices & edges in a compressed row structure.
Has been scaled to >10^8 vertices, >10^9 edges in an interactive session.
14
The sparse( ) Constructor
• A = sparse (I, J, V, nr, nc);
• Input: dense vectors I, J, V, dimensions nr, nc
• Output: A(I(k), J(k)) = V(k)
• Sum values with duplicate indices
• Sorts triples < i, j, v > by < i, j >
• Inverse: [I, J, V] = find(A);
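SciPy's COO constructor follows the same conventions: values with duplicate indices are summed, and `find` inverts the construction. A quick illustration (the matrix here is made up):

```python
# sparse(I, J, V, nr, nc) analogue: duplicates at (1, 2) are summed.
import numpy as np
from scipy.sparse import coo_matrix, find

I = np.array([0, 1, 1, 2])        # 0-based, vs. Matlab's 1-based
J = np.array([1, 2, 2, 0])
V = np.array([10.0, 3.0, 4.0, 5.0])
A = coo_matrix((V, (I, J)), shape=(3, 3)).tocsr()

print(A[1, 2])                    # 7.0 -- the two (1, 2) values summed
i, j, v = find(A)                 # inverse of the constructor
```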
15
Sparse Array and Matrix Operations
• dsparse layout, same semantics as ordinary full & sparse
• Matrix arithmetic: +, max, sum, etc.
• matrix * matrix and matrix * vector
• Matrix indexing and concatenation
A (1:3, [4 5 2]) = [ B(:, J) C ] ;
• Linear solvers: x = A \ b; using SuperLU (MPI)
• Eigensolvers: [V, D] = eigs(A); using PARPACK (MPI)
16
Large-Scale Graph Algorithms
• Graph theory, algorithms, and data structures are ubiquitous in sparse matrix computation.
• Time to turn the relationship around!
• Represent a graph as a sparse adjacency matrix.
• A sparse matrix language is a good start on primitives for computing with graphs.
• Leverage the mature techniques and tools of high-performance numerical computation.
17
Sparse Adjacency Matrix and Graph
• Adjacency matrix: sparse array w/ nonzeros for graph edges
• Storage-efficient implementation from sparse data structures
[Figure: directed graph on vertices 1–7 and its sparse adjacency matrix A^T, with vectors x and A^T x]
18
Breadth-First Search: Sparse mat * vec
[Figure: frontier vector x, adjacency matrix A^T, and product A^T x on the 7-vertex graph]
• Multiply by adjacency matrix → step to neighbor vertices
• Work-efficient implementation from sparse data structures
20
Breadth-First Search: Sparse mat * vec
[Figure: second multiply; (A^T)^2 x reaches vertices two hops from the start]
• Multiply by adjacency matrix → step to neighbor vertices
• Work-efficient implementation from sparse data structures
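The frontier-expansion idea on these slides can be sketched with SciPy. The 7-vertex edge list below is illustrative, not the graph in the figures:

```python
# BFS as repeated sparse mat-vec: each multiply by A^T advances the
# frontier one level from source vertex 0.
import numpy as np
from scipy.sparse import csr_matrix

edges = [(0, 1), (0, 3), (1, 2), (3, 4), (4, 5), (2, 6), (5, 6)]
n = 7
rows, cols = zip(*edges)
A = csr_matrix((np.ones(len(edges)), (rows, cols)), shape=(n, n))

x = np.zeros(n); x[0] = 1                     # frontier = {0}
level = -np.ones(n, dtype=int); level[0] = 0  # -1 = unvisited
for d in range(1, n):
    x = (A.T @ x != 0).astype(float)          # step to neighbors
    newly = (x != 0) & (level < 0)
    if not newly.any():
        break
    level[newly] = d
print(level)                                  # BFS level of each vertex
```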
21
SSCA#2: “Graph Analysis” Benchmark (spec version 1)
• Many tight clusters, loosely interconnected
• Input data is edge triples < i, j, label(i,j) >
• Vertices and edges permuted randomly
• Fine-grained, irregular data access
• Searching and clustering
22
Clustering by Breadth-First Search
% Grow each seed to vertices
% reached by at least k
% paths of length 1 or 2
C = sparse(seeds, 1:ns, 1, n, ns);
C = A * C;
C = C + A * C;
C = C >= k;
• Grow local clusters from many seeds in parallel
• Breadth-first search by sparse matrix * matrix
• Cluster vertices connected by many short paths
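A SciPy rendering of the seed-growing code above, on a made-up toy graph: two triangles joined by one edge, seeds 0 and 4, and k = 2:

```python
# Grow each seed to vertices reached by at least k paths of length
# 1 or 2, via sparse matrix * matrix, as in the Matlab code above.
import numpy as np
from scipy.sparse import csr_matrix

n, k = 6, 2
und = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
rows = [i for i, j in und] + [j for i, j in und]
cols = [j for i, j in und] + [i for i, j in und]
A = csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(n, n))

seeds = [0, 4]
ns = len(seeds)
C = csr_matrix((np.ones(ns), (seeds, range(ns))), shape=(n, ns))
C = A @ C                     # paths of length 1
C = C + A @ C                 # plus paths of length 2
C = C.toarray() >= k          # keep vertices reached by >= k paths
print(C.astype(int))          # column s = cluster grown from seed s
```

Each column recovers one of the two triangles, since the bridging edge 2-3 contributes only a single short path across.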
23
Toolbox for Graph Analysis and Pattern Discovery
Layer 1: Graph Theoretic Tools
• Graph operations
• Global structure of graphs
• Graph partitioning and clustering
• Graph generators
• Visualization and graphics
• Scan and combining operations
• Utilities
24
Typical Application Stack
Distributed Sparse Matrices
Arithmetic, matrix multiplication, indexing, solvers (\, eigs)
Graph Analysis & PD Toolbox
Graph querying & manipulation, connectivity, spanning trees,
geometric partitioning, nested dissection, NNMF, . . .
Preconditioned Iterative Methods
CG, BiCGStab, etc. + combinatorial preconditioners (AMG, Vaidya)
Applications
Computational ecology, CFD, data exploration
25
Landscape Connectivity Modeling
• Landscape type and features facilitate or impede movement of members of a species
• Different species have different criteria, scales, etc.
• Habitat quality, gene flow, population stability
• Corridor identification, conservation planning
26
Pumas in Southern California
[Map: habitat quality model for the region spanning L.A., Palm Springs, and Joshua Tree N.P.]
27
Predicting Gene Flow with Resistive Networks
Circuit model predictions (N = 100, m = 0.01):
[Figure: genetic vs. geographic distance]
28
Early Experience with Real Genetic Data
• Good results with wolverines, mahogany, pumas
• Matlab implementation
• Needed:
– Finer resolution
– Larger landscapes
– Faster interaction
[Map: 5 km resolution (too coarse)]
29
Combinatorics in Circuitscape
• Initial grid models connections to 4 or 8 neighbors.
• Partition landscape into connected components with GAPDT
• Graph contraction from GAPDT contracts habitats into single nodes in resistive network. (Need current flow between entire habitats.)
• Data-parallel computation on large graphs - graph construction, querying and manipulation.
• Ideally, model landscape at 100m resolution (for pumas). Tradeoff between resolution and time.
30
Numerics in Circuitscape
• Resistance computations for pairs of habitats in the landscape
• Direct methods are too slow for largest problems
• Use iterative solvers via Star-P:
– Hypre (PCG+AMG)
– Experimenting with support graph preconditioners
31
Parallel Circuitscape Results
• Pumas in southern California:
– 12 million nodes
– Under 1 hour (16 processors)
– Original code took 3 days at coarser resolution
• Targeting much larger problems:
– Yellowstone-to-Yukon corridor
Figures courtesy of Brad McRae, NCEAS
32
Sparse Matrix times Sparse Matrix
• A primitive in many array-based graph algorithms:
– Parallel breadth-first search
– Shortest paths
– Graph contraction
– Subgraph / submatrix indexing
– Etc.
• Graphs are often not mesh-like, i.e. they lack geometric locality and good separators.
• Often do not want to optimize for one repeated operation, as in matvec for iterative methods
33
Sparse Matrix times Sparse Matrix
• Current work:
– Parallel algorithms with 2D data layout
– Sequential hypersparse algorithms
– Matrices over semirings
34
ParSpGEMM
[Figure: C(I,J) += A(I,K) * B(K,J) over block row I, block column J, and blocks K]
• Based on SUMMA
• Simple for non-square matrices, etc.
35
How Sparse? HyperSparse !
[Figure: splitting the matrix into p blocks drives the average number of nonzeros per local column from c toward zero]
Any local data structure that depends on local submatrix dimension n (such as CSR or CSC) is too wasteful.
36
SparseDComp Data Structure
• “Doubly compressed” data structure
• Maintains both DCSC and DCSR
• C = A*B needs only A.DCSC and B.DCSR
• 4*nnz values communicated for A*B in the worst case (though we usually get away with much less)
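A minimal sketch of the doubly compressed column idea: the column-pointer array covers only columns that actually hold nonzeros, so storage is O(nnz) rather than O(n). The array names (JC, CP, IR, NUM) are illustrative, not necessarily Star-P's; DCSR is the same structure on rows:

```python
# Build a DCSC-style structure from (i, j, v) triples:
# JC = indices of nonempty columns, CP = pointers into IR/NUM,
# IR = row indices, NUM = values.
from collections import defaultdict

def to_dcsc(triples):
    by_col = defaultdict(list)
    for i, j, v in triples:
        by_col[j].append((i, v))
    JC, CP, IR, NUM = [], [0], [], []
    for j in sorted(by_col):            # only nonempty columns appear
        JC.append(j)
        for i, v in sorted(by_col[j]):
            IR.append(i)
            NUM.append(v)
        CP.append(len(IR))
    return JC, CP, IR, NUM

# A matrix with a million columns but 3 nonzeros: CSC would need a
# million-entry pointer array; DCSC needs four short arrays.
JC, CP, IR, NUM = to_dcsc([(5, 999999, 1.0), (2, 7, 2.0), (9, 7, 3.0)])
print(JC)    # [7, 999999]
print(CP)    # [0, 2, 3]
```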
37
Sequential Operation Counts
• Matlab: O(n + nnz(B) + f)
• SpGEMM: O(nzc(A) + nzr(B) + f·log k)
[Plot: break-even point; x-axis: number of columns of A containing at least one nonzero; y-axis: required nonzero operations (flops)]
38
Parallel Timings
• 16-processor Opteron, HyperTransport, 64 GB memory
• R-MAT * R-MAT
• n = 2^20
• nnz = {8, 4, 2, 1, 0.5} × 2^20
[Plot: time vs. n/nnz, log-log]
39
Matrices over Semirings
• Matrix multiplication C = AB (or matrix * vector):
C(i,j) = A(i,1)·B(1,j) + A(i,2)·B(2,j) + ··· + A(i,n)·B(n,j)
• Replace the scalar operations · and + by ⊗ and ⊕
• ⊗: associative, distributes over ⊕, identity 1
• ⊕: associative, commutative, identity 0; 0 annihilates under ⊗
• Then C(i,j) = A(i,1)⊗B(1,j) ⊕ A(i,2)⊗B(2,j) ⊕ ··· ⊕ A(i,n)⊗B(n,j)
• Examples: (·, +) ; (and, or) ; (+, min) ; . . .
• Same data reference pattern and control flow
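A toy illustration of swapping the semiring: the same triple loop computes ordinary matmul under (·, +) and a shortest-path relaxation under (min, +). The helper name and dense representation are my own sketch, not Star-P's API:

```python
# Matrix multiply parameterized by a semiring (add, mul, zero).
# With (min, +) on a distance matrix D, D2[i][j] is the shortest
# i -> j path length using at most two edges.
import numpy as np

INF = np.inf

def semiring_matmul(A, B, add, mul, zero):
    n, t = A.shape
    t2, m = B.shape
    assert t == t2
    C = np.full((n, m), zero)
    for i in range(n):
        for j in range(m):
            acc = zero                      # semiring identity 0
            for k in range(t):
                acc = add(acc, mul(A[i, k], B[k, j]))
            C[i, j] = acc
    return C

D = np.array([[0.0, 3.0, INF],
              [INF, 0.0, 1.0],
              [INF, INF, 0.0]])
D2 = semiring_matmul(D, D, min, lambda a, b: a + b, INF)
print(D2)    # D2[0, 2] = 4.0: path 0 -> 1 -> 2 of length 3 + 1
```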
40
Remarks
• Tools for combinatorial methods built on parallel sparse matrix infrastructure
• Easy-to-use interactive programming environment
– Rapid prototyping tool for algorithm development
– Interactive exploration and visualization of data
• Sparse matrix * sparse matrix is a key primitive
• Matrices over semirings like (min,+) as well as (+,*)