Fast Katz and Commuters

FAST KATZ AND COMMUTERS

Quadrature Rules and Sparse Linear Solvers

for Link Prediction Heuristics

David F. Gleich

Sandia National Labs

Berkeley LAPACK Seminar

February 3, 2011

With Pooya Esfandiar, Francesco Bonchi, Chen Grief,

Laks V. S. Lakshmanan, and Byung-Won On

David F. Gleich (Sandia) Berkeley SCMC Seminar

Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin

Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.

1 / 43

MAIN RESULTS

A – adjacency matrix

L – Laplacian matrix

Katz score :

Commute time:

For Commute

Compute one fast

For Katz Compute one fast Compute top fast

2 of 43

OUTLINE

Why study these measures?

Katz Rank and Commute Time

How else do people compute them?

Quadrature rules for pairwise scores

Sparse linear systems solves for top-k

As many results as we have time for…

David F. Gleich (Sandia) Berkeley SCMC Seminar 3 of 43

WHY? LINK PREDICTION

Liben-Nowell and Kleinberg 2003, 2006 found that path based link prediction was more efficient

Neighborhood based

Path based

4 of 43

All graphs are undirected

All graphs are connected

LEO KATZ

NOT QUITE, WIKIPEDIA

: adjacency, : random walk

PageRank

These are equivalent if has constant degree

WHAT KATZ ACTUALLY SAID

Leo Katz 1953, A New Status Index Derived from Sociometric Analysis, Psychometria 18(1):39-43

“we assume that each link independently has the

same probability of being effective” …

“we conceive a constant , depending

on the group and the context of the particular

investigation, which has the force of a probability

of effectiveness of a single link. A k-step chain

then, has probability of being effective.”

“We wish to find the column sums of the matrix”

A MODERN TAKE

The Katz score (node-based) is

The Katz score (edge-based) is

RETURNING TO THE MATRIX

Carl Neumann

I’ve heard the Neumann series called the “von Neumann”

series more than I’d like! In fact, the von Neumann kernel

of a graph should be named the “Neumann” kernel!

Wikipedia page

11 / 43

PROPERTIES OF KATZ’S MATRIX

is symmetric

exists when

is spd when

Note that 1/max-degree suffices

COMMUTE TIME

Picture taken from Google images, seems to be

Bay Bridge Traffic by Jim M. Goldstein.

13 / 43

COMMUTE TIME

Consider a uniform random walk on a graph

Also called the hitting

time from node i to j, or

the first transition time

14 of 43

SKIPPING DETAILS

: graph Laplacian

is the only null-vector

WHAT DO OTHER PEOPLE DO?

1) Just work with the linear algebra formulations

2) For Katz, Truncate the Neumann series as a few (3-5) terms

3) Use low-rank approximations from EVD(A) or EVD(L)

4) For commute, use Johnson-Lindenstrauss inspired random sampling

5) Approximately decompose into smaller problems

Liben-Nowell and Kleinberg CIKM2003, Acar et al. ICDM2009,

Spielman and Srivastava STOC2008, Sarkar and Moore UAI2007, Wang et al. ICDM2007 16 of 43

THE PROBLEM

All of these techniques are

preprocessing based because

most people’s goal is to compute

all the scores.

We want to avoid

preprocessing the graph.

There are a few caveats here! i.e. one could solve the system instead of looking for the matrix inverse

17 of 43

WHY NO PREPROCESSING?

The graph is constantly changing

as I rate new movies.

WHY NO PREPROCESSING?

Top-k predicted “links”

are movies to watch!

Pairwise scores give

user similarity

19 of 43

PAIRWISE ALGORITHMS

Commute

Golub and Meurant

to the rescue!

20 of 43

MMQ - THE BIG IDEA

Quadratic form

Weighted sum

Stieltjes integral

Quadrature approximation

Matrix equation David F. Gleich (Sandia) Berkeley SCMC Seminar

A is s.p.d. use EVD

“A tautology”

Lanczos

21 of 43

LANCZOS

, k-steps of the Lanczos method produce

22 of 43

PRACTICAL LANCZOS

Only need to store the last 2 vectors in

Updating requires O(matvec) work

is not orthogonal

MMQ PROCEDURE

1. Run k-steps of Lanczos on starting with

2. Compute , with an additional eigenvalue at ,

set 3. Compute , with an additional eigenvalue at , set

4. Output as lower and upper bounds on

Correspond to a Gauss-Radau rule, with

u as a prescribed node

Correspond to a Gauss-Radau rule, with

l as a prescribed node

24 of 43

PRACTICAL MMQ

Increase k to become more accurate

Bad eigenvalue bounds yield worse results

and are easy to compute

not required, we can iteratively

update it’s LU factorization

PRACTICAL MMQ

ONE LAST STEP FOR KATZ

KATZ SCORES ARE LOCALIZED

Up to 50 neighbors is 99.65% of the total mass

28 of 43

TOP-K ALGORITHM FOR KATZ

Approximate

where is sparse

Keep sparse too

Ideally, don’t “touch” all of

INSPIRATION - PAGERANK

Approximate

where is sparse

Keep sparse too? YES!

Ideally, don’t “touch” all of ? YES!

McSherry WWW2005, Berkhin 2007, Anderson et al. FOCS2008 – Thanks to Reid Anderson for telling me McSherry did this too.

30 of 43

THE ALGORITHM - MCSHERRY

Start with the Richardson iteration

Rewrite

Richardson converges if

THE ALGORITHM

Note is sparse.

If , then is sparse.

only add one component of to

THE ALGORITHM

How to pick ?

THE ALGORITHM FOR KATZ

Pick as max David F. Gleich (Sandia) Berkeley SCMC Seminar

Storing the non-zeros of the residual in a heap makes picking the max log(n) time. See Anderson et al. FOCS2008 for more

34 of 43

CONVERGENCE?

The “standard” proof works for 1/max-degree.

For , then for symmetric , this algorithm is the Gauss-Southwell procedure and it still converges.

35 of 43

RESULTS – DATA, PARAMETERS

All unweighted, connected graphs

KATZ BOUND CONVERGENCE

COMMUTE BOUND CONVERG.

KATZ SET CONVERGENCE

For arXiv graph.

39 of 43

TIMING

CONCLUSIONS

These algorithms are faster than many alternatives.

For pairwise commute, stopping criteria are simpler with bounds

For top-k problems, we often need less than 1 matvec for good enough results

WARTS AND TODOS

Stopping criteria on our top-k algorithm can be a bit hairy… we should refine it.

The top-k approach doesn’t work right for commute time… we have an alternative.

requires the entire diagonal of

Take advantage of new research to “seed” commute time better! David F. Gleich (Sandia) Berkeley SCMC Seminar

von Luxburg et al. NIPS2010

42 of 43

By AngryDogDesign on DeviantArt

Paper at WAW2010

Slides should be online soon

Code is online already

stanford.edu/~dgleich/

publications/2010/codes/fast-katz

David F. Gleich (Sandia) Berkeley SCMC Seminar 43

F-MEASURE

MAIN RESULTS – SLIDE ONE

A – adjacency matrix

L – Laplacian matrix

Katz score :

Commute time:

MAIN RESULTS – SLIDE TWO

For Katz Compute one fast

Compute top fast

For Commute

Compute one fast

For almost commute

Compute top $F_{i,:}$ fast

MAIN RESULTS – SLIDE THREE

KATZ BOUND CONVERGENCE

KATZ ERROR CONVERGENCE

COMMUTE BOUND CONVERG.

COMMUTE ERROR CONVERG.

KATZ SET CONVERGENCE

KATZ ORDER CONVERGENCE

Fast Katz and Commuters

Documents

Transcript of Fast Katz and Commuters

Fast katz-presentation

The Ten Commandments For Commuters

Movilidad laboral o commuters ¿fenómeno urbano? · 2019. 5. 15. · Movilidad laboral o commuters ¿fenómeno urbano? Los commuters son las personas que laboran en un municipio

Futures – Interim Report · Clean and Green. In 2050, 45% of commuters use autos. Back to the Future. In 2050, 69% of commuters use autos. In 2015, 77% of commuters use an automobile

Telework Programs: - Best Workplaces for Commuters

Commuters' Attitudes Toward Traffic Information Systems ...onlinepubs.trb.org/Onlinepubs/trr/1988/1168/1168-002.pdf · Commuters' Attitudes Toward Traffic Information Systems ...

Sy Katz, Ph.D. S. Katz Associates, Inc. 4388 Knightsbridge ...foundrygate.com/upload/artigos/CUPOLA FURNACE COMPUTER PRO… · Sy Katz, Ph.D. S. Katz Associates, Inc. 4388 Knightsbridge

Soweto Line Presentation Reaching 325,652 commuters daily!

Determinants of self-employment among commuters and non-commuters · For commuters, on the other hand we find that working in a locality with rich business networks increase the probability

Katz® Extractor - InHealthKatz Extractor Instructions for Use 37900-01A I 3 TABLE OF CONTENTS ENG Katz Extractor 4 SQI Ekstraktori Katz 6 9 Katz Extractor ARA BUL Katz Extractor 10

Helena Katz

KATZ Certificate Declaration Forminet.katz.pitt.edu/studentnet/mba/Documents/KATZ Certificate... · KATZ Certificate Declaration Form . Katz certificates provide students with the

Katz Plotkin

Tyrrell Katz

A. E. Archibald, King of the Commuters

Fast matrix computations for pair-wise and column-wise Katz scores and commute times

93rd Annual Alameda Commuters Golf Tournament Offical ...

Target Audience Commuters

Connie Ruth, EPA Best Workplaces for Commuters

Toilets in Delhi Metro - A Commuters Perspective