Page 1: Applications of Relative Importance

Applications of Relative Importance

Why is relative importance interesting? Large graphs arise on the Web and in social networks, citation graphs, and biological data, and such graphs quickly become too complex for manual analysis.

Existing Techniques

Web: PageRank (Google)
Social networks: ‘centrality’ measures

All of these focus on global measures of node importance; we are interested in importance relative to a set of root nodes R.

Use Existing Techniques?

Could we simply run a global algorithm on the subgraph surrounding the root nodes?

No: that gives the root nodes no preferential treatment; it just ranks the nodes around them.

Organization: Relative Importance Algorithms

Notation
Problem Formulation
General Framework
Algorithms

Notation

Digraph: G = (V, E)
Edges: ordered pairs of nodes (u, v). Graphs are directed, unweighted, and simple.
Walk from u to v: a sequence of nodes u, u_1, u_2, ..., u_k, v joined by the edges

$$(u, u_1), (u_1, u_2), \ldots, (u_k, v)$$

A path is a walk with no repeated nodes.
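A minimal sketch of these definitions in Python, assuming a digraph stored as a dict that maps each node to the set of its successors (a representation chosen for illustration). The helper names `is_walk`, `is_path`, and `walk_edges` are hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Assumed representation: each node maps to the set of its successors.
Graph = Dict[str, Set[str]]


def is_walk(g: Graph, nodes: Sequence[str]) -> bool:
    """True if every consecutive pair (u, u1), (u1, u2), ..., (uk, v) is an edge of g."""
    return len(nodes) >= 2 and all(b in g.get(a, set()) for a, b in zip(nodes, nodes[1:]))


def is_path(g: Graph, nodes: Sequence[str]) -> bool:
    """A path is a walk with no repeated nodes."""
    return is_walk(g, nodes) and len(set(nodes)) == len(nodes)


def walk_edges(nodes: Sequence[str]) -> List[Tuple[str, str]]:
    """The edge sequence traversed by a walk."""
    return list(zip(nodes, nodes[1:]))


g: Graph = {"u": {"a"}, "a": {"b", "u"}, "b": {"v"}, "v": set()}
print(is_walk(g, ["u", "a", "u", "a", "b", "v"]))  # True
print(is_path(g, ["u", "a", "u", "a", "b", "v"]))  # False: u and a repeat
print(is_path(g, ["u", "a", "b", "v"]))            # True
```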

Notation

k-short paths
P(u,v): the set of paths between u and v
$S_{out}(u)$: the set of distinct outgoing edges from u, and $d_{out}(u) = |S_{out}(u)|$
Similarly, $S_{in}(u)$ is the set of distinct incoming edges to u, and $d_{in}(u) = |S_{in}(u)|$
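The edge sets and degrees can be sketched directly from an edge list. This assumes edges are given as `(u, v)` tuples; `edge_sets` is a hypothetical helper, and the duplicated edge in the example only illustrates why the sets keep *distinct* edges. Nodes with no incoming (or outgoing) edges simply do not appear in the corresponding dict.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]


def edge_sets(edges: Iterable[Edge]) -> Tuple[Dict[str, Set[Edge]], Dict[str, Set[Edge]]]:
    """Build S_out(u) and S_in(u): the distinct outgoing / incoming edges of each node."""
    s_out: Dict[str, Set[Edge]] = defaultdict(set)
    s_in: Dict[str, Set[Edge]] = defaultdict(set)
    for u, v in edges:
        s_out[u].add((u, v))  # a set keeps only distinct edges
        s_in[v].add((u, v))
    return s_out, s_in


edges = [("a", "b"), ("a", "c"), ("a", "b"), ("c", "b")]  # ("a", "b") listed twice
s_out, s_in = edge_sets(edges)
d_out = {u: len(s) for u, s in s_out.items()}  # d_out(u) = |S_out(u)|
d_in = {v: len(s) for v, s in s_in.items()}    # d_in(v) = |S_in(v)|
print(d_out)  # {'a': 2, 'c': 1}
print(d_in)   # {'b': 2, 'c': 1}
```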

Problem Formulation

1. Given G and nodes r and t, where $\{r, t\} \subseteq V$, compute the “importance” of t with respect to the root node r:

$$I(t \mid r)$$

Problem Formulation

2. Given G and a node $r \in V$, rank all vertices in T(G), where $T \subseteq V$, with respect to r.

Problem Formulation

3. Given G, a set of nodes T(G) to rank, and a set of root nodes R(G), where $R \subseteq V$, rank all vertices in T with respect to R.

This is similar to the previous case, except that we compute $I(t \mid R)$ rather than $I(t \mid r)$.

Average importance:

$$I(t \mid R) = \frac{1}{|R|} \sum_{r \in R} I(t \mid r)$$

Problem Formulation (3, cont’d)

Rather than averaging each node’s per-root importance scores, we could define

$$I(t \mid R) = \min \{ I(t \mid r) : r \in R \}$$

This requires an ‘important’ node to have a high importance score with respect to every node in R.
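The two ways of combining per-root scores into $I(t \mid R)$ can be compared in a short sketch. It assumes the scores $I(t \mid r)$ have already been computed by some method and stored as `scores[r][t]`; the helper names and the score values are made up for illustration.

```python
from typing import Dict, Hashable, Iterable, List

# scores[r][t] holds a precomputed I(t | r); the values below are made up.
Scores = Dict[Hashable, Dict[Hashable, float]]


def importance_wrt_set(scores: Scores, roots: List[Hashable], t: Hashable, how: str = "mean") -> float:
    """Combine the per-root scores I(t|r) into I(t|R)."""
    per_root = [scores[r][t] for r in roots]
    if how == "mean":
        return sum(per_root) / len(per_root)  # (1/|R|) * sum over r in R of I(t|r)
    if how == "min":
        return min(per_root)                  # min{ I(t|r) : r in R }
    raise ValueError(f"unknown aggregation: {how!r}")


def rank_targets(scores: Scores, roots: Iterable[Hashable], targets: Iterable[Hashable],
                 how: str = "mean") -> List[Hashable]:
    """Rank the nodes in T with respect to R, most important first."""
    roots = list(roots)
    return sorted(targets, key=lambda t: importance_wrt_set(scores, roots, t, how), reverse=True)


scores: Scores = {"r1": {"x": 0.9, "y": 0.5}, "r2": {"x": 0.2, "y": 0.5}}
print(rank_targets(scores, ["r1", "r2"], ["x", "y"], how="mean"))  # ['x', 'y']
print(rank_targets(scores, ["r1", "r2"], ["x", "y"], how="min"))   # ['y', 'x']
```

Under the mean, x's strong tie to r1 compensates for its weak tie to r2; under the min it does not, so y, which is moderately close to both roots, ranks first.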

Problem Formulation

4. Given G, rank all nodes, taking $R = T = V$.

General Framework: Weighted Paths

Nodes are related according to the paths that connect them. The longer the path, the less importance it contributes:

$$I(t \mid r) = \sum_{p_i \in P(r,t)} \frac{1}{\beta^{|p_i|}}$$

where β > 1 is a scalar coefficient, P(r,t) is the set of paths from r to t, $p_i$ is the ith path in P, and $|p_i|$ is its length. Importance therefore decays exponentially with path length.
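A brute-force sketch of the weighted-path score, assuming the per-path weight $1/\beta^{|p_i|}$ written above with $|p_i|$ counted in edges. It enumerates every path from r to t, which is exponential in the worst case, so it only suits small graphs. The graph, `beta=2.0`, and the helper names are illustrative.

```python
from typing import Dict, Iterator, List, Set

Graph = Dict[str, Set[str]]


def simple_paths(g: Graph, r: str, t: str) -> Iterator[List[str]]:
    """Depth-first enumeration of every path (no repeated nodes) from r to t; assumes r != t."""
    stack = [[r]]
    while stack:
        path = stack.pop()
        for nxt in g.get(path[-1], set()):
            if nxt == t:
                yield path + [t]  # a path ends as soon as it reaches t
            elif nxt not in path:
                stack.append(path + [nxt])


def weighted_path_importance(g: Graph, r: str, t: str, beta: float = 2.0) -> float:
    """I(t|r) = sum over p_i in P(r,t) of 1 / beta**|p_i|, with |p_i| counted in edges."""
    return sum(1.0 / beta ** (len(p) - 1) for p in simple_paths(g, r, t))


g: Graph = {"r": {"a", "b"}, "a": {"t"}, "b": {"c"}, "c": {"t"}, "t": set()}
# Paths r-a-t (2 edges) and r-b-c-t (3 edges) contribute 1/4 + 1/8.
print(weighted_path_importance(g, "r", "t"))  # 0.375
```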

How to choose P(r,t)?

Path examples

[Figure: two example graphs, (a) and (b), on nodes R, T, A, B, C, D, E, F]

The shortest paths from R to T are {R-C-T, R-D-T}, which fail to capture much of the connectivity from R to T.

Shortest Path

Example: transporting cargo from r to t.

The shortest path does not always give a good approximation of importance; the Web (graph b) is one example.
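Collecting every shortest path between two nodes takes one breadth-first search plus backtracking over recorded predecessors. The sketch below assumes a successor-set dict; `all_shortest_paths` is a hypothetical helper, and the edges in the example are invented (they reuse the figure's node names but are not read off the figure).

```python
from collections import deque
from typing import Dict, List, Set

Graph = Dict[str, Set[str]]


def all_shortest_paths(g: Graph, r: str, t: str) -> List[List[str]]:
    """All minimum-length paths from r to t, found by BFS plus backtracking."""
    dist = {r: 0}
    preds: Dict[str, List[str]] = {r: []}
    queue = deque([r])
    while queue:
        u = queue.popleft()
        for v in g.get(u, set()):
            if v not in dist:             # first time seen: record its BFS level
                dist[v] = dist[u] + 1
                preds[v] = [u]
                queue.append(v)
            elif dist[v] == dist[u] + 1:  # another equally short way into v
                preds[v].append(u)
    if t not in dist:
        return []
    paths: List[List[str]] = []

    def backtrack(node: str, suffix: List[str]) -> None:
        if node == r:
            paths.append([r] + suffix)
            return
        for p in preds[node]:
            backtrack(p, [node] + suffix)

    backtrack(t, [])
    return sorted(paths)


# Hypothetical edges: two length-2 routes plus two longer routes from R to T.
g: Graph = {
    "R": {"A", "C", "D", "E"}, "A": {"B"}, "B": {"T"}, "C": {"T"},
    "D": {"T"}, "E": {"F"}, "F": {"T"}, "T": set(),
}
print(all_shortest_paths(g, "R", "T"))  # [['R', 'C', 'T'], ['R', 'D', 'T']]
```

The routes through A-B and E-F also connect R to T, but a shortest-path measure never sees them.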

k-Short Paths

Paths of length K. Idea: there are often paths longer than the shortest ones that are important to take into account. This fixes the shortest-path method’s neglect of longer, important paths (e.g., graph b with 3-short paths).

Problem: this ignores capacity constraints (e.g., in a network topology).
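The slide does not pin down exactly how the K paths are chosen. This sketch assumes ‘k-short’ means every path with at most K edges (another common reading is the K shortest paths). The edge list and the helper name `short_paths` are hypothetical.

```python
from typing import Dict, List, Set

Graph = Dict[str, Set[str]]


def short_paths(g: Graph, r: str, t: str, k: int) -> List[List[str]]:
    """Every path (no repeated nodes) from r to t using at most k edges, shortest first."""
    found: List[List[str]] = []

    def extend(path: List[str]) -> None:
        if len(path) - 1 == k:  # edge budget used up
            return
        for nxt in g.get(path[-1], set()):
            if nxt == t:
                found.append(path + [t])
            elif nxt not in path:
                extend(path + [nxt])

    extend([r])
    return sorted(found, key=lambda p: (len(p), p))


g: Graph = {
    "R": {"A", "C", "D", "E"}, "A": {"B"}, "B": {"T"}, "C": {"T"},
    "D": {"T"}, "E": {"F"}, "F": {"T"}, "T": set(),
}
print(short_paths(g, "R", "T", 2))  # [['R', 'C', 'T'], ['R', 'D', 'T']]
print(short_paths(g, "R", "T", 3))  # also includes ['R', 'A', 'B', 'T'] and ['R', 'E', 'F', 'T']
```

Nothing stops the selected paths from sharing nodes or edges, which is the capacity problem noted above.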

k-Short Node-Disjoint Paths

No node or edge is repeated across the selected paths, which implicitly enforces capacity constraints. This is motivated by ‘mass flow’, where importance can ‘flow’ along paths (e.g., graph b).

Breadth-first with some heuristic, with some K and some
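One simple breadth-first heuristic, not necessarily the one the slide refers to, is to repeatedly take a shortest path and then forbid its interior nodes and its edges before searching again. The sketch below assumes a successor-set dict; `bfs_path` and `k_node_disjoint_paths` are hypothetical names, and the bottleneck graph is invented.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

Graph = Dict[str, Set[str]]


def bfs_path(g: Graph, r: str, t: str, banned_nodes: Set[str],
             banned_edges: Set[Tuple[str, str]]) -> Optional[List[str]]:
    """One shortest r -> t path that avoids the banned nodes and edges, or None."""
    parent: Dict[str, Optional[str]] = {r: None}
    queue = deque([r])
    while queue:
        u = queue.popleft()
        if u == t:
            break
        for v in sorted(g.get(u, set())):  # sorted only to make tie-breaking deterministic
            if v in parent or v in banned_nodes or (u, v) in banned_edges:
                continue
            parent[v] = u
            queue.append(v)
    if t not in parent:
        return None
    path = [t]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path[::-1]


def k_node_disjoint_paths(g: Graph, r: str, t: str, k: int) -> List[List[str]]:
    """Greedily collect up to k r -> t paths that share no interior node and no edge."""
    banned_nodes: Set[str] = set()
    banned_edges: Set[Tuple[str, str]] = set()
    paths: List[List[str]] = []
    while len(paths) < k:
        p = bfs_path(g, r, t, banned_nodes, banned_edges)
        if p is None:
            break
        paths.append(p)
        banned_nodes.update(p[1:-1])        # interior nodes can carry only one path
        banned_edges.update(zip(p, p[1:]))  # also blocks reusing a direct r -> t edge
    return paths


# Hypothetical bottleneck: both routes out of r must pass through z.
g: Graph = {"r": {"x", "y"}, "x": {"z"}, "y": {"z"}, "z": {"t"}, "t": set()}
print(k_node_disjoint_paths(g, "r", "t", k=3))  # [['r', 'x', 'z', 't']]
```

Because the selection is greedy, it can return fewer disjoint paths than an exact max-flow computation would find.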

Markov Chains & Relative Importance

The graph is viewed as a stochastic process. A Markov chain can be pictured as a token traversing the chain, moving from node to node along edges. This makes Markov chains a natural fit for modeling the Web.
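A minimal sketch of the token picture, assuming the token leaves each node along one of its out-edges chosen uniformly at random, so $P_{uv} = 1/d_{out}(u)$. That uniform choice and the helper names are assumptions for illustration, and the sketch refuses graphs with dead ends rather than choosing a convention for them.

```python
from typing import Dict, Set

Graph = Dict[str, Set[str]]
Dist = Dict[str, float]


def transition_probs(g: Graph) -> Dict[str, Dist]:
    """P[u][v] = 1 / d_out(u) for every edge (u, v): the token picks an out-edge uniformly."""
    probs: Dict[str, Dist] = {}
    for u, succ in g.items():
        if not succ:
            raise ValueError(f"node {u!r} has no out-edges; this sketch assumes there are none")
        probs[u] = {v: 1.0 / len(succ) for v in succ}
    return probs


def step(dist: Dist, probs: Dict[str, Dist]) -> Dist:
    """Advance the distribution of the token's location by one step of the chain."""
    nxt = {u: 0.0 for u in probs}
    for u, mass in dist.items():
        for v, p in probs[u].items():
            nxt[v] += mass * p
    return nxt


g: Graph = {"a": {"b", "c"}, "b": {"c"}, "c": {"a"}}
P = transition_probs(g)
dist = {"a": 1.0, "b": 0.0, "c": 0.0}  # token starts at a
dist = step(dist, P)
print(dist)  # {'a': 0.0, 'b': 0.5, 'c': 0.5}
dist = step(dist, P)
print(dist)  # {'a': 0.5, 'b': 0.0, 'c': 0.5}
```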

Markov Chains & Relative Importance

Markov Centrality: Mean First Passage Time

$$m_{rt} = \sum_{n=1}^{\infty} n \, f_{rt}(n)$$

$m_{rt}$: the expected number of steps until the first arrival at node t, starting at node r
$f_{rt}(n)$: the probability that the chain, starting at r, first reaches state t in exactly n steps
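Rather than summing the series over $f_{rt}(n)$ directly, an equivalent route (first-step analysis) solves $m_{rt} = 1 + \sum_{k \neq t} P_{rk} m_{kt}$ as a linear system for a fixed target t. The sketch below does this with plain Gauss-Jordan elimination. It assumes an irreducible chain, so the system has a unique solution, and it sets $m_{tt} = 0$ rather than the mean return time. `mean_first_passage` and the 3-state chain are hypothetical.

```python
from typing import List


def mean_first_passage(P: List[List[float]], t: int) -> List[float]:
    """m[r] = expected number of steps to first reach state t from state r (m[t] set to 0).

    Solves m_r = 1 + sum_{k != t} P[r][k] * m_k for all r != t, i.e. (I - Q) m = 1,
    where Q is P with the row and column of t removed. Assumes an irreducible chain.
    """
    n = len(P)
    idx = [i for i in range(n) if i != t]  # unknowns: every state except t
    size = len(idx)
    # Augmented matrix [I - Q | 1].
    A = [[(1.0 if i == j else 0.0) - P[i][j] for j in idx] + [1.0] for i in idx]
    for col in range(size):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, size), key=lambda row: abs(A[row][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for row in range(size):
            if row != col:
                f = A[row][col] / A[col][col]
                A[row] = [a - f * b for a, b in zip(A[row], A[col])]
    m = [0.0] * n
    for k, i in enumerate(idx):
        m[i] = A[k][size] / A[k][k]
    return m


# Hypothetical 3-state chain: a -> {b, c}, b -> c, c -> a (states 0, 1, 2).
P = [[0.0, 0.5, 0.5], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
print(mean_first_passage(P, t=0))  # [0.0, 2.0, 1.0]
```

Each target needs its own cubic-time solve, which is where the expense of this family of methods comes from.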

Markov Chains & Relative Importance

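Markov centrality is biased toward ‘central’ nodes, and it is computationally expensive:

Time: O(|V|^3) (inversion of a |V| x |V| matrix derived from the transition matrix)
Space: O(|V|^2)

$$I(t \mid R) = \frac{1}{\frac{1}{|R|} \sum_{r \in R} m_{rt}}$$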
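Given a table of mean first passage times $m_{rt}$, computing $I(t \mid R)$ is a short step: invert the average passage time from the roots into t. This sketch assumes t is not itself a root, so only first-arrival times are needed. The table values and the name `markov_centrality` are made up for illustration.

```python
from typing import Dict, Hashable, Iterable

MFPT = Dict[Hashable, Dict[Hashable, float]]  # m[r][t] = mean first passage time r -> t


def markov_centrality(m: MFPT, roots: Iterable[Hashable], t: Hashable) -> float:
    """I(t|R): the inverse of the average mean first passage time from the roots into t."""
    roots = list(roots)
    mean_passage = sum(m[r][t] for r in roots) / len(roots)
    return 1.0 / mean_passage


# Made-up passage times from roots a and b to candidate targets x and y.
m: MFPT = {"a": {"x": 1.0, "y": 4.0}, "b": {"x": 3.0, "y": 4.0}}
print(markov_centrality(m, ["a", "b"], "x"))  # 0.5
print(markov_centrality(m, ["a", "b"], "y"))  # 0.25
```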

Markov Chains & Relative Importance

PageRank uses backlinks to assign importance to web pages.

Markov Chains & Relative Importance

PageRank is less complex. The number of iterations it needs to converge grows roughly logarithmically with the size of the graph: 322 million links were processed in 52 iterations.
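A plain power-iteration sketch of PageRank. It assumes the customary damping factor d = 0.85, assumes that a dangling page spreads its rank uniformly over all pages, and requires every successor to also be a key of the dict. These are common conventions, not details given on the slide. The function returns the scores together with the number of iterations used, so convergence can be observed on small graphs.

```python
from typing import Dict, Set, Tuple

Graph = Dict[str, Set[str]]


def pagerank(g: Graph, d: float = 0.85, tol: float = 1e-10,
             max_iter: int = 1000) -> Tuple[Dict[str, float], int]:
    """Power iteration for PageRank; returns (scores, iterations used)."""
    nodes = list(g)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for it in range(1, max_iter + 1):
        new = {u: (1.0 - d) / n for u in nodes}  # random jump to any page
        for u in nodes:
            targets = g[u] if g[u] else nodes    # dangling page: jump anywhere
            share = d * rank[u] / len(targets)
            for v in targets:                    # rank flows to v along its backlinks
                new[v] += share
        delta = sum(abs(new[u] - rank[u]) for u in nodes)  # L1 change this round
        rank = new
        if delta < tol:
            return rank, it
    return rank, max_iter


g: Graph = {"a": {"b"}, "b": {"c"}, "c": {"a", "b"}}
ranks, iterations = pagerank(g)
print(max(ranks, key=ranks.get))  # b (the only page with two backlinks)
```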

Markov Chains & Relative Importance

Retrofit PageRank so that all nodes in R receive a uniform bias at the start.

A ‘surfer’ begins at a root node and traverses the graph, returning to the root set R with some fixed probability at each time step.

I(t|R) = the probability that the surfer visits t during a walk
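A sketch of the rooted variant. It assumes a restart probability of 0.15 (the slide leaves the value unspecified), a fixed number of iterations, and that a surfer at a dead end jumps back to R. It computes the long-run fraction of time the surfer spends at each node, which is one way to read ‘the probability that the surfer visits t’. `rooted_pagerank` and the example graph are hypothetical.

```python
from typing import Dict, Iterable, Set

Graph = Dict[str, Set[str]]


def rooted_pagerank(g: Graph, roots: Iterable[str], restart: float = 0.15,
                    iters: int = 200) -> Dict[str, float]:
    """Long-run visit probabilities of a surfer who keeps jumping back to the root set R."""
    roots = list(roots)
    bias = {u: (1.0 / len(roots) if u in roots else 0.0) for u in g}  # uniform over R
    p = dict(bias)                                  # the surfer starts in R
    for _ in range(iters):
        new = {u: restart * bias[u] for u in g}     # return to R with probability `restart`
        for u, mass in p.items():
            targets = g[u] if g[u] else roots       # dead end: also return to R
            for v in targets:
                new[v] += (1.0 - restart) * mass / len(targets)
        p = new
    return p


g: Graph = {"r": {"a"}, "a": {"b"}, "b": {"c"}, "c": {"r"}}
p = rooted_pagerank(g, ["r"])
print(sorted(p, key=p.get, reverse=True))  # ['r', 'a', 'b', 'c']
```

Scores fall off with distance from the root, unlike global PageRank, which gives every node on this cycle the same score.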

Experiments (Simulated Data)

[Figure: simulated graph on nodes A through J]

Experiments (Simulated Data)

[Figure: modified simulated graph on nodes A through J]

The graph was made more complex: in- and out-degrees changed, and shortest-path lengths between nodes changed (e.g., A-B).

In the analysis that follows, R = {A, F}.

Experiments (Simulated Data)

Rank  HITSPa     HITSPh
1     A  .252    F  .225
2     F  .241    A  .186
3     G  .128    D  .162
4     C  .110    B  .119
5     E  .099    E  .090
6     H  .052    I  .067
7     D  .032    H  .061
8     J  .025    J  .050
9     I  .032    G  .028
10    B  .024    C  .008

Experiments (Simulated Data)

Rank  MarkovC    KSMarkov
1     J  .180    H  .146
2     C  .133    G  .142
3     G  .130    E  .142
4     H  .129    J  .140
5     E  .111    C  .120
6     I  .101    I  .098
7     F  .069    F  .087
8     D  .051    D  .061
9     A  .047    A  .034
10    B  .044    B  .024

Experiments (9/11 Terrorist Network)

63 nodes (terrorists) and 308 edges (interactions)

Rank PRankP HITSP WKPaths MarkovC KSMarkov

1 Khemais Khemais Beghal Atta Khemais

2 Beghal Beghal Khemais Al-Shehhi Beghal

3 Moussaoui Atta Moussaoui Al-Shibh Moussaoui

4 Maaroufi Moussaoui Maaroufi Moussaoui Maaroufi

5 Qatada Maaroufi Bensakhria Jarrah Qatada

6 Daoudi Qatada Daoudi Hanjour Daoudi

7 Courtaillier Bensakhria Qatada Al-Omari Bensakhria

8 Bensakhria Daoudi Walid Khemais Courtaillier

9 Walid Courtaillier Courtaillier Qatada Walid

10 Khammoun Khammoun Khammoun Bahaji Khammoun

Conclusion

This work provides a first step toward addressing relative importance.

Scaling can be an issue for algorithms such as the Markov chain methods.

Using different algorithms and comparing their results can reveal interesting information.

…Paper Analysis…

References

White, Smyth. Algorithms for Estimating Relative Importance in Networks. SIGKDD ’03.

Page, Brin, Motwani, Winograd. The PageRank Citation Ranking: Bringing Order to the Web. Technical Report, Stanford University, Computer Science Department.

Wikipedia: Markov chain. http://en.wikipedia.org/wiki/Markov_chain

Wikipedia: Examples of Markov chains. http://en.wikipedia.org/wiki/Examples_of_Markov_chains