SFU, CMPT 741, Fall 2009, Martin Ester
Graph Mining and Social Network Analysis
Outline
• Graphs and networks
• Graph pattern mining [Borgwardt & Yan 2008]
• Graph classification [Borgwardt & Yan 2008]
• Graph clustering
• Graph evolution [Leskovec & Faloutsos 2007]
• Social network analysis [Leskovec & Faloutsos 2007]
• Trust-based recommendation
[Han and Kamber 2006, sections 9.1 and 9.2]
References to research papers are given at the end of this chapter.
Graphs and Networks
Basic Definitions
• Graph G = (V, E)
V: set of vertices / nodes
$E \subseteq V \times V$: set of edges
• Adjacency matrix (sociomatrix): alternative representation of a graph
$y_{i,j} = 1$ if $(v_i, v_j) \in E$, and $y_{i,j} = 0$ otherwise
• Network: used as a synonym for graph, the more application-oriented term
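As a minimal sketch of the adjacency matrix (sociomatrix) representation, the following builds y from an edge list. The function name `adjacency_matrix`, the 0-based integer node ids, and the `directed` flag are assumptions made for illustration.

```python
def adjacency_matrix(num_nodes, edges, directed=False):
    """Return y with y[i][j] = 1 if (v_i, v_j) is an edge, and 0 otherwise."""
    y = [[0] * num_nodes for _ in range(num_nodes)]
    for i, j in edges:
        y[i][j] = 1
        if not directed:
            y[j][i] = 1  # undirected graph: the matrix is symmetric
    return y


# Hypothetical 4-node example graph.
example_edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
y = adjacency_matrix(4, example_edges)
```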
Graphs and Networks
Basic Definitions
• Labeled graph
set of labels L
$f: V \to L$ or $f: E \to L$
|L| typically small
• Attributed graph
set of attributes with domains $D_1, \ldots, D_d$
$f: V \to D_1 \times \ldots \times D_d$
$|D_i|$ typically large, can be a continuous domain
Graphs and Networks
Examples
Graphs and Networks
More Definitions
Neighbors $N_i$ of node $v_i$: $N_i = \{ v_j \in V \mid (v_i, v_j) \in E \}$
Degree $\deg(v_i)$ of node $v_i$: $\deg(v_i) = |N_i|$
Clustering coefficient of node v: fraction of pairs of neighbors of v that are connected
Betweenness of node v: number of shortest paths (between any pair of nodes) in G that go through v
Betweenness of edge e: number of shortest paths in G that go through e
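The neighbor, degree and clustering coefficient definitions can be sketched as follows. The dict-of-sets adjacency structure, the function names, and the convention of returning 0 for nodes with fewer than two neighbors are assumptions for illustration; the example graph is hypothetical.

```python
from itertools import combinations


def degree(adj, v):
    """deg(v) = |N(v)|, where adj[v] is the neighbor set of v."""
    return len(adj[v])


def clustering_coefficient(adj, v):
    """Fraction of pairs of neighbors of v that are themselves connected."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumed convention: no neighbor pairs, report 0
    linked = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return linked / (k * (k - 1) / 2)


# Hypothetical undirected graph: triangle 0-1-2 plus a pendant node 3.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
```

Node 0 has three neighbor pairs, of which only (1, 2) is connected, so its clustering coefficient is 1/3, while node 1 has coefficient 1.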
Graphs and Networks
More Definitions
Shortest path distance between nodes v1 and v2: length of the shortest path between v1 and v2, also called minimum geodesic distance
Diameter of graph G: maximum shortest path distance over all pairs of nodes in G
Effective diameter of graph G: distance within which 90% of all connected pairs of nodes can be reached
Mean geodesic distance of graph G: average minimum geodesic distance over all pairs of nodes in G
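A minimal sketch of these distance measures for an unweighted, undirected graph, using breadth-first search. The function names are hypothetical, and the effective diameter is computed here as the smallest integer distance covering at least 90% of the connected pairs (no interpolation), which is an assumption.

```python
import math
from collections import deque


def bfs_distances(adj, source):
    """Shortest path distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def pairwise_distances(adj):
    """Sorted distances over all connected unordered pairs of nodes."""
    nodes = sorted(adj)
    dists = []
    for u in nodes:
        d = bfs_distances(adj, u)
        dists.extend(d[w] for w in nodes if w > u and w in d)
    return sorted(dists)


def effective_diameter(dists, quantile=0.9):
    """Smallest distance within which `quantile` of the connected pairs lie."""
    return dists[math.ceil(quantile * len(dists)) - 1]


# Hypothetical example: the path graph 0-1-2-3-4.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
dists = pairwise_distances(path)
diameter = dists[-1]
mean_geodesic = sum(dists) / len(dists)
eff_diameter = effective_diameter(dists)
```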
Graphs and Networks
More Definitions
Small-world network: network with a "small" mean geodesic distance / effective diameter
[Figure: Microsoft Messenger network]
Graphs and Networks
More Definitions
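Scale-free networks: networks with a power-law degree distribution
$P(k) \propto k^{-\gamma}$, with $\gamma$ typically between 2 and 3
scale-free property: $f(cx) \propto f(x)$
[Figure: degree distribution, P(k) plotted against degree k]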
Graphs and Networks
Data Mining Scenarios
One large graph
• mine dense subgraphs or clusters
• analyze evolution
Many small graphs
• mine frequent subgraphs
Two collections of many small graphs
• classify graphs
Graph Pattern Mining
Frequent Pattern Mining
• Given a graph dataset DB, i.e. a set of labeled graphs $G_1, \ldots, G_n$, and a minimum support $\sigma$, $0 \le \sigma \le 1$
• Find the graphs that are contained in at least $\sigma n$ of the graphs of DB
• Assumption: the more frequent a graph, the more interesting
• G is contained in $G_i$: G is isomorphic to a subgraph of $G_i$
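The support test can be sketched with a brute-force containment check, which is only feasible for very small graphs. The graph representation (a dict of node labels plus an edge list), the names `contains` and `frequent`, the parameter name `sigma` for the minimum support, and the tiny molecule-like dataset are all assumptions for illustration; real miners such as gSpan avoid this exhaustive search.

```python
from itertools import permutations


def contains(pattern, graph):
    """True if pattern is isomorphic to a (not necessarily induced) subgraph."""
    p_labels, p_edges = pattern
    g_labels, g_edges = graph
    g_edge_set = {frozenset(e) for e in g_edges}
    p_nodes = list(p_labels)
    # Try every injective mapping of pattern nodes to graph nodes.
    for image in permutations(g_labels, len(p_nodes)):
        m = dict(zip(p_nodes, image))
        labels_ok = all(p_labels[u] == g_labels[m[u]] for u in p_nodes)
        edges_ok = all(frozenset((m[u], m[v])) in g_edge_set for u, v in p_edges)
        if labels_ok and edges_ok:
            return True
    return False


def frequent(patterns, db, sigma):
    """Patterns contained in at least sigma * n of the n graphs in db."""
    return [p for p in patterns
            if sum(contains(p, g) for g in db) >= sigma * len(db)]


db = [
    ({0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)]),
    ({0: "C", 1: "O"}, [(0, 1)]),
    ({0: "C", 1: "C", 2: "N"}, [(0, 1), (1, 2)]),
]
p_co = ({"a": "C", "b": "O"}, [("a", "b")])
p_cc = ({"a": "C", "b": "C"}, [("a", "b")])
p_cn = ({"a": "C", "b": "N"}, [("a", "b")])
result = frequent([p_co, p_cc, p_cn], db, sigma=0.6)
```

With three graphs and a minimum support of 0.6, a pattern must occur in at least two graphs, so the C-O and C-C edges are frequent and the C-N edge is not.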
Graph Pattern Mining
Example
Graph Pattern Mining
Anti-Monotonicity Property
• If a graph is frequent, all of its subgraphs are frequent.
• Can prune all candidate patterns that have an infrequent subgraph, i.e. disregard them from further consideration.
• The higher the minimum support, the more effective the pruning
Graph Pattern Mining
Algorithmic Schemes
Graph Pattern Mining
Duplicate Elimination
• Given existing patterns $G_1, \ldots, G_m$ and a newly discovered pattern G: is G a duplicate?
• Method 1 (slow)
check graph isomorphism of G with each of the $G_i$
the graph isomorphism test is a very expensive operation
• Method 2 (faster)
transform each graph $G_i$ into a canonical form and hash it into a hash table
transform G in the same way and check whether there is already a graph $G_i$ with the same hash value
test for graph isomorphism only if such a $G_i$ already exists
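Method 2 can be sketched as below for small unlabeled graphs. The canonical form used here (the lexicographically smallest edge list over all node relabelings) is an assumption chosen for clarity; it is exponential in the number of nodes and is not the canonical form of any particular mining algorithm. Because this toy form is exact, equal keys already imply isomorphism; with a cheaper, non-exact invariant, a matching key would still require the isomorphism test mentioned above.

```python
from itertools import permutations


def canonical_form(num_nodes, edges):
    """Smallest sorted edge tuple over all relabelings of nodes 0..n-1."""
    best = None
    for perm in permutations(range(num_nodes)):
        relabeled = tuple(sorted(
            tuple(sorted((perm[u], perm[v]))) for u, v in edges
        ))
        if best is None or relabeled < best:
            best = relabeled
    return (num_nodes, best)


seen_patterns = {}  # hash table: canonical form -> first pattern seen


def is_duplicate(num_nodes, edges):
    key = canonical_form(num_nodes, edges)
    if key in seen_patterns:
        return True
    seen_patterns[key] = (num_nodes, edges)
    return False


first = is_duplicate(3, [(0, 1), (1, 2)])           # path with center 1
second = is_duplicate(3, [(2, 0), (0, 1)])          # same path, center 0
third = is_duplicate(3, [(0, 1), (1, 2), (2, 0)])   # triangle
```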
Graph Pattern Mining
Duplicate Elimination
• Method 3 (fastest)
define a canonical order of subgraphs and explore them in that order
e.g., graphs are in the same equivalence class if they have the same canonical spanning tree, and an order is defined on the spanning trees
does not need isomorphism tests
Graph Pattern Mining
Conclusion
• Lots of sophisticated algorithms for mining frequent graph patterns: MoFa, gSpan, FFSM, Gaston, . . .
• But: the number of frequent patterns is exponential
• This implies three related problems:
- very high runtimes
- resulting sets of patterns hard to interpret
- minimum support threshold hard to set.
Graph Pattern Mining
Research Directions
• Mine only closed or maximal frequent graphs
i.e. frequent graphs such that no supergraph has the same support (closed), or such that no supergraph has at least the minimum support (maximal)
• Summarize graph patterns
e.g., find the top k most representative graphs
• Constraint-based graph pattern mining
find only patterns that satisfy certain conditions on their size, density, diameter . . .
Graph Pattern Mining
Dense Graph Mining
• Assumption: the denser a graph, the more interesting
• Can add a density constraint to frequent graph mining
• In the scenario of one large graph, just want to find the dense subgraphs
• Density of graph G = (V, E): $density(G) = \frac{2|E|}{|V|(|V|-1)}$
• Want to find all subgraphs with density at least $\alpha$
• Problem is notoriously hard, even to solve approximately
Graph Pattern Mining
Weak Anti-Monotonicity Property
• If a graph of size k is dense, (at least) one of its subgraphs of size k-1 is dense.
• Cannot prune all candidate patterns that have a subgraph which is not dense.
• But can still enumerate patterns in a level-wise manner, extending only dense patterns by another node
[Figure: G' is denser than its subgraph G; density(G) = 8/12, density(G') = 14/20]
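The weak anti-monotonicity property can be checked on a small example. The graph below is hypothetical; it is chosen only so that its densities match the 8/12 and 14/20 values quoted above, and it is not necessarily the graph shown on the slide. Density is computed as 2|E| / (|V|(|V|-1)), and the function names are assumptions.

```python
from fractions import Fraction
from itertools import combinations


def density(nodes, edges):
    """2|E| / (|V| (|V| - 1)), kept exact with Fraction."""
    n = len(nodes)
    return Fraction(2 * len(edges), n * (n - 1))


def induced_subgraph(nodes, edges):
    """Node set together with the edges that have both endpoints in it."""
    nodes = set(nodes)
    return nodes, [(u, v) for u, v in edges if u in nodes and v in nodes]


# G: a 4-cycle (4 nodes, 4 edges). G': G plus node 4 linked to 0, 1 and 2.
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
g_prime_nodes = {0, 1, 2, 3, 4}
g_prime_edges = cycle + [(4, 0), (4, 1), (4, 2)]

# Densities of all induced subgraphs of G' with 4 nodes.
sub_densities = {
    keep: density(*induced_subgraph(keep, g_prime_edges))
    for keep in combinations(sorted(g_prime_nodes), 4)
}
```

For a density threshold of 0.7, G' (14/20) is dense while its subgraph G (8/12) is not, so pruning on a single non-dense subgraph would lose G'. At least one size-4 subgraph, here {0, 1, 2, 4} with density 10/12, is dense, which is what the level-wise enumeration relies on.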
Graph Pattern Mining
Quasi-Cliques
• graph G = (V, E) is a $\gamma$-quasi-clique if every node $v \in G$ has at least $\gamma (|V| - 1)$ edges
Graph Pattern Mining
Mining Quasi-Cliques [Pei, Jiang & Zhang 05]
• for $\gamma < 1$, the $\gamma$-quasi-clique property is not anti-monotone, not even weakly anti-monotone
Example: G is a 0.8-quasi-clique, but none of the size-5 subgraphs of G is a 0.8-quasi-clique, since they all have a node with degree 3 < 0.8(5-1) = 3.2
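A counterexample of this kind can be verified mechanically. The 6-node graph below (every node linked to all others except one, i.e. an octahedron) is a hypothetical instance that behaves as described; whether it is the graph on the slide is not known. The function name and the use of `Fraction` for the threshold 0.8 are assumptions.

```python
from fractions import Fraction
from itertools import combinations


def is_quasi_clique(nodes, edges, gamma):
    """Every node must have at least gamma * (|V| - 1) edges inside `nodes`."""
    nodes = set(nodes)
    degree = {v: 0 for v in nodes}
    for u, v in edges:
        if u in nodes and v in nodes:
            degree[u] += 1
            degree[v] += 1
    return all(d >= gamma * (len(nodes) - 1) for d in degree.values())


nodes = range(6)
non_edges = {(0, 1), (2, 3), (4, 5)}
edges = [e for e in combinations(nodes, 2) if e not in non_edges]
gamma = Fraction(4, 5)  # 0.8, exact

whole = is_quasi_clique(nodes, edges, gamma)
size5 = [is_quasi_clique(sub, edges, gamma) for sub in combinations(nodes, 5)]
```

Every node of the full graph has degree 4 = 0.8 * 5, but removing any node leaves four nodes of degree 3 < 3.2, so no size-5 subgraph qualifies.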
Graph Pattern Mining
Mining Quasi-Cliques
• enumerate (all) the subgraphs
• prune based on the maximum diameter of a $\gamma$-quasi-clique G
Graph Pattern Mining
Mining Cohesive Patterns [Moser, Colak and Ester 2009]
• Cohesive pattern: subgraph G' satisfying three conditions:
(1) subspace homogeneity, i.e. attribute values are within a range of at most w in at least d dimensions,
(2) density, i.e. has at least $\alpha$ of all possible edges, and
(3) connectedness, i.e. each pair of nodes has a connecting path in G'.
• Task
Find all maximal cohesive patterns.
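The three conditions can be sketched as a single predicate on a candidate node set. The function name `is_cohesive`, the argument names `alpha`, `w` and `d`, the attribute representation (a tuple of numbers per node), and the example data are assumptions for illustration; this checks one candidate and does not enumerate maximal patterns.

```python
def is_cohesive(nodes, edges, attrs, alpha, w, d):
    nodes = set(nodes)
    inner = [(u, v) for u, v in edges if u in nodes and v in nodes]

    # (1) subspace homogeneity: value range <= w in at least d dimensions
    num_dims = len(next(iter(attrs.values())))
    homogeneous = sum(
        1 for k in range(num_dims)
        if max(attrs[v][k] for v in nodes) - min(attrs[v][k] for v in nodes) <= w
    )
    if homogeneous < d:
        return False

    # (2) density: at least alpha of all possible edges
    n = len(nodes)
    if n > 1 and 2 * len(inner) < alpha * n * (n - 1):
        return False

    # (3) connectedness: depth-first search inside the subgraph
    adj = {v: set() for v in nodes}
    for u, v in inner:
        adj[u].add(v)
        adj[v].add(u)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for x in adj[u] - seen:
            seen.add(x)
            stack.append(x)
    return seen == nodes


edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
attrs = {0: (1.0, 5.0), 1: (1.0, 9.0), 2: (1.0, 2.0), 3: (4.0, 2.0)}
triangle_ok = is_cohesive({0, 1, 2}, edges, attrs, alpha=0.7, w=0, d=1)
whole_ok = is_cohesive({0, 1, 2, 3}, edges, attrs, alpha=0.7, w=0, d=1)
```

The triangle {0, 1, 2} agrees exactly on the first attribute, has all of its possible edges and is connected, whereas the full node set has no dimension in which all values agree.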
Graph Pattern Mining
Example
[Figure: example with parameter values 0.7, 3 and 0.0, showing two cohesive patterns with density 7/10 and density 8/10]
Graph Pattern Mining
Algorithm
• The Cohesive Pattern Mining problem is NP-hard
the decision version is reducible from the Max-Clique problem
• A constraint is anti-monotone if, for each network G of size n that satisfies the constraint, all induced subnetworks G' of G of size n - 1 satisfy the constraint
• Can prune all candidate networks that have a subnetwork not satisfying the constraint
but the cohesive pattern constraints are not anti-monotone
Graph Pattern Mining
Algorithm CoPaM
• A constraint is loose anti-monotone if, for each network G of size n that satisfies the constraint, there is at least one induced subnetwork G' of G of size n - 1 satisfying the constraint.
• For $\alpha \ge 0.5$, the cohesive pattern constraints are loose anti-monotone
• The CoPaM algorithm performs a level-wise search of the lattice structure in a bottom-up manner
construct only connected subgraphs
• Prune all candidates that do not satisfy the constraints of density and homogeneity
Graph Pattern Mining
Example
[Figure: example with parameter values 0.8, 2 and 0.5]
Graph Classification
Introduction
• given two (or more) collections of (labeled) graphs, one for each of the relevant classes
• e.g., collections of program flow graphs to distinguish faulty graphs from correct ones
Graph Classification
Feature-based Graph Classification
• define a set of graph features
global features such as diameter, degree distribution
local features such as occurrence of certain subgraphs
• choice of relevant subgraphs
based on domain knowledge: domain expert
based on frequency: pattern mining algorithm [Huan et al 04]
Graph Classification
Kernel-based Graph Classification
• kernel-based: map two graphs x and x' into feature space via a function $\phi$, and compute their similarity (inner product) in feature space
$k(x, x') = \langle \phi(x), \phi(x') \rangle$
kernel k avoids the actual mapping to feature space
• many graph kernels have been proposed, e.g. [Kashima et al 2003]
• graph kernels should capture relevant graph features and be efficient to compute [Borgwardt & Kriegel 2005]
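To make the inner-product view concrete, here is a deliberately simple kernel with an explicit feature map: it counts node labels and edge label pairs and takes the dot product of the two count vectors. This is an illustrative assumption, not one of the kernels cited above; practical graph kernels use richer features such as walks or shortest paths and avoid building the feature vector explicitly.

```python
from collections import Counter


def phi(labels, edges):
    """Explicit feature map: counts of node labels and of edge label pairs."""
    feats = Counter(("node", lab) for lab in labels.values())
    for u, v in edges:
        feats[("edge",) + tuple(sorted((labels[u], labels[v])))] += 1
    return feats


def kernel(g1, g2):
    """k(x, x') = <phi(x), phi(x')> as a sparse dot product."""
    f1, f2 = phi(*g1), phi(*g2)
    return sum(count * f2[key] for key, count in f1.items())


g1 = ({0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)])
g2 = ({0: "C", 1: "O"}, [(0, 1)])
k12 = kernel(g1, g2)
k11 = kernel(g1, g1)
```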
Graph Clustering
Introduction
• group nodes into clusters such that nodes within a cluster have similar relationships (edges), while nodes in different clusters have dissimilar relationships
• compared to graph classification: unsupervised
• compared to graph pattern mining: global patterns, typically every node belongs to exactly one cluster
• main approaches
- hierarchical graph clustering
- graph cuts
- block models
Graph Clustering
Divisive Hierarchical Clustering [Girvan and Newman 2002]
• for every edge, compute its betweenness
• remove the edge with the highest betweenness
• recompute the edge betweenness
• repeat until no more edges exist or until the specified number of clusters has been produced
• runtime $O(m^2 n)$ where m = |E| and n = |V|
produces meaningful communities, but does not scale to large networks
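The divisive procedure above can be sketched in plain Python. Edge betweenness is accumulated with a Brandes-style breadth-first pass from every node; when several shortest paths exist between a pair, each path contributes a fractional share, which is an assumption about how ties are counted. The function names, the dict-of-sets input, and the stopping rule by number of clusters are illustrative choices.

```python
from collections import deque


def edge_betweenness(adj):
    """Shortest-path counts per undirected edge (each pair counted twice)."""
    bet = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        dist, sigma, preds, order = {s: 0}, {s: 1.0}, {s: []}, []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v], sigma[v], preds[v] = dist[u] + 1, 0.0, []
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]  # number of shortest paths s -> v
                    preds[v].append(u)
        delta = {v: 0.0 for v in order}
        for v in reversed(order):
            for u in preds[v]:
                share = sigma[u] / sigma[v] * (1.0 + delta[v])
                bet[frozenset((u, v))] += share
                delta[u] += share
    return bet


def components(adj):
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = {s}, [s]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in comp:
                    comp.add(v)
                    stack.append(v)
        seen |= comp
        comps.append(comp)
    return comps


def girvan_newman(adj, num_clusters):
    adj = {u: set(vs) for u, vs in adj.items()}  # work on a copy
    while len(components(adj)) < num_clusters and any(adj.values()):
        bet = edge_betweenness(adj)          # recompute after every removal
        u, v = max(bet, key=bet.get)         # edge with highest betweenness
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)


# Hypothetical example: two triangles joined by the bridge 2-3.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
communities = girvan_newman(graph, 2)
```

The bridge carries all nine shortest paths between the two triangles, so it is removed first and the two triangles remain as communities.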
Graph Clustering
Example
friendship network from Zachary’s karate club
hierarchical clustering (dendrogram)
shapes denote the true community
Graph Clustering
Agglomerative Hierarchical Clustering [Newman 2004]
• the divisive hierarchical algorithm always produces a clustering, whether there is some natural cluster structure or not
• define the modularity of a partitioning to measure its meaningfulness (deviation from randomness)
• $e_{ij}$: percentage of edges between partitions i and j
• $a_i = \sum_j e_{ij}$
• modularity $Q = \sum_i (e_{ii} - a_i^2)$
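A small worked example of the modularity, under the assumptions that $a_i = \sum_j e_{ij}$, that $Q = \sum_i (e_{ii} - a_i^2)$, and that each off-diagonal entry $e_{ij}$ holds half of the fraction of edges between partitions i and j, so that all entries sum to 1. The graph is hypothetical: two triangles joined by a single edge (7 edges in total), partitioned into the two triangles.

```latex
\begin{align*}
e_{11} &= e_{22} = \tfrac{3}{7}, \qquad e_{12} = e_{21} = \tfrac{1}{14} \\
a_1 &= a_2 = \tfrac{3}{7} + \tfrac{1}{14} = \tfrac{1}{2} \\
Q &= \sum_i \left( e_{ii} - a_i^2 \right)
   = 2 \left( \tfrac{3}{7} - \tfrac{1}{4} \right)
   = \tfrac{5}{14} \approx 0.357
\end{align*}
```

Putting all nodes into a single partition gives Q = 1 - 1 = 0, so the two-triangle partition deviates clearly from what a random assignment of edges would produce.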
Graph Clustering
Agglomerative Hierarchical Clustering
• start with singleton clusters
• in each step, perform the merge of two clusters
that leads to the largest increase of the modularity
• terminate when no more merges improve modularity
or when specified number of clusters reached
• need to consider only connected pairs of clusters
• runtime O((m+n) n) where m = |E| and n = |V|
scales much better than divisive algorithm
clustering quality quite comparable
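The merge loop can be sketched as follows. This is a naive version that recomputes the modularity from scratch for every candidate merge, so it does not achieve the runtime quoted above; it only illustrates the greedy rule. The function names are hypothetical, cluster ids are assumed to be comparable (integers here), and modularity is computed as the sum over clusters of (fraction of edges inside the cluster) minus (fraction of edge ends attached to the cluster) squared.

```python
def modularity(edges, cluster_of):
    m = len(edges)
    inside, ends = {}, {}
    for u, v in edges:
        cu, cv = cluster_of[u], cluster_of[v]
        ends[cu] = ends.get(cu, 0) + 1
        ends[cv] = ends.get(cv, 0) + 1
        if cu == cv:
            inside[cu] = inside.get(cu, 0) + 1
    return sum(inside.get(c, 0) / m - (ends[c] / (2 * m)) ** 2 for c in ends)


def greedy_modularity(nodes, edges):
    cluster_of = {v: v for v in nodes}  # start with singleton clusters
    q = modularity(edges, cluster_of)
    while True:
        # Only pairs of clusters connected by at least one edge are candidates.
        connected = {
            frozenset((cluster_of[u], cluster_of[v]))
            for u, v in edges if cluster_of[u] != cluster_of[v]
        }
        best_q, best_merge = q, None
        for pair in connected:
            a, b = sorted(pair)
            trial = {v: (a if c == b else c) for v, c in cluster_of.items()}
            trial_q = modularity(edges, trial)
            if trial_q > best_q + 1e-12:
                best_q, best_merge = trial_q, trial
        if best_merge is None:  # no merge improves modularity
            return cluster_of, q
        cluster_of, q = best_merge, best_q


# Hypothetical example: two triangles joined by the bridge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
clusters, q = greedy_modularity(range(6), edges)
groups = {}
for node, c in clusters.items():
    groups.setdefault(c, set()).add(node)
```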
Graph Clustering
college football network, shapes denote conferences (true communities)
Graph Clustering
Graph Cuts
• a graph cut is a set of edges whose removal partitions the set of vertices V into two (disconnected) sets S and T
• the cost of a cut is the sum of the weights of the cut edges
• edge weights can be derived from node attributes, e.g. similarity of attributes (attribute vectors)
• a minimum cut is a cut with minimum cost
Graph Clustering
Graph Cuts [Shi & Malik 2000]
• minimum cut tends to cut off very small, isolated components
• normalized cut
$Ncut(A, B) = \frac{cut(A, B)}{assoc(A, V)} + \frac{cut(A, B)}{assoc(B, V)}$
where assoc(A, V) = sum of weights of all edges in V that touch A
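A minimal sketch that evaluates the normalized cut of a given bipartition. It follows the definition as worded above, counting every edge that touches A once in assoc(A, V); formulations that sum over node pairs would count edges inside A twice, so absolute values can differ between definitions. The function name, the weight dictionary, and the example graph are assumptions.

```python
def normalized_cut(weights, part_a):
    """weights: {(u, v): w} for undirected edges; part_a: the node set A."""
    cut = assoc_a = assoc_b = 0.0
    for (u, v), w in weights.items():
        u_in, v_in = u in part_a, v in part_a
        if u_in != v_in:      # cut edge: touches both A and B
            cut += w
            assoc_a += w
            assoc_b += w
        elif u_in:            # edge inside A
            assoc_a += w
        else:                 # edge inside B
            assoc_b += w
    return cut / assoc_a + cut / assoc_b


# Hypothetical example: two triangles joined by the bridge 2-3, unit weights.
weights = {e: 1.0 for e in
           [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]}
balanced = normalized_cut(weights, {0, 1, 2})
lopsided = normalized_cut(weights, {0})
```

Cutting the bridge gives 1/4 + 1/4 = 0.5, while splitting off the single node 0 gives 2/2 + 2/7, so the normalization penalizes the small, isolated component.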
Graph Clustering
Graph Cuts
• the minimum normalized cut problem is NP-hard
• but an approximation can be computed by solving a generalized eigenvalue problem
Graph Clustering
Block Models [Faust & Wasserman 1992]
• actors in a social network are structurally equivalent if they have identical relational ties to and from all the actors in a network
• partition V into subsets of nodes that have the same relationships, i.e., edges to the same subset of V
• graph represented as sociomatrix
• partitions are called blocks
Graph Clustering
Example
[Figure: graph (sociomatrix) and block model (permuted and partitioned sociomatrix)]
Graph Clustering
Algorithms
• agglomerative hierarchical clustering
• CONCOR algorithm
repeated calculations of correlations between rows (or columns) will eventually result in a correlation matrix consisting of only +1 and -1
- calculate correlation matrix C1 from the sociomatrix
- calculate correlation matrix C2 from C1
- iterate until the entries are either +1 or -1
Graph Clustering
Stochastic Block Models
• the requirement of structural equivalence is often too strict
• relax to stochastic equivalence:
two actors are stochastically equivalent if the actors are "exchangeable" with respect to the probability distribution
• Infinite Relational Model [Kemp et al 2006]
CMPT 884, SFU, Martin Ester, 1-09
Graph Clustering
Generative Model
• assign nodes to clusters
• determine link (edge) probability between clusters
• determine edges between nodes
Graph Clustering
Generative Model
• assumption: edges are conditionally independent given the cluster assignments
• prior P(z) assigns a probability to all possible partitions of the nodes
• find z that maximizes P(z|R)
$P(z \mid R) \propto P(R \mid z) \, P(z)$
$P(R \mid z) = \prod_{a,b} \frac{B(m_{ab} + \beta, \bar{m}_{ab} + \beta)}{B(\beta, \beta)}$
where $m_{ab}$ is the number of edges between clusters a and b, $\bar{m}_{ab}$ is the number of missing edges between clusters a and b, and B(.,.) is the Beta function
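The likelihood term can be evaluated in log space with the log-gamma function. The sketch assumes that P(R|z) is a product over ordered cluster pairs (a, b) of B(m_ab + beta, mbar_ab + beta) / B(beta, beta), with a symmetric Beta hyperparameter `beta` set to 1, a directed relation R given as a 0/1 matrix, and no self-links; these choices and the function names are assumptions for illustration.

```python
from math import exp, lgamma


def log_beta(x, y):
    """log B(x, y) = log Gamma(x) + log Gamma(y) - log Gamma(x + y)."""
    return lgamma(x) + lgamma(y) - lgamma(x + y)


def log_likelihood(relation, z, beta=1.0):
    """log P(R | z) for cluster assignment z (z[i] = cluster of node i)."""
    counts = {}  # (a, b) -> (edges present, edges missing)
    n = len(z)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # self-links are ignored (assumption)
            present, missing = counts.get((z[i], z[j]), (0, 0))
            if relation[i][j]:
                present += 1
            else:
                missing += 1
            counts[(z[i], z[j])] = (present, missing)
    return sum(
        log_beta(m + beta, mbar + beta) - log_beta(beta, beta)
        for m, mbar in counts.values()
    )


# Hypothetical relation: nodes 0,1 link to each other, and so do nodes 2,3.
relation = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]
ll_good = log_likelihood(relation, [0, 0, 1, 1])
ll_bad = log_likelihood(relation, [0, 1, 0, 1])
```

The assignment that matches the block structure yields P(R|z) = 1/225, compared with 1/8100 for the mismatched assignment, so a search over z guided by this term (together with the prior) prefers the former.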
Graph Clustering
Inference
• sample from the posterior P(z|R)
using Markov Chain Monte Carlo
• possible moves:
- move a node from one cluster to another
- split a cluster
- merge two clusters
• at the end, the link probabilities between clusters a and b can be recovered
Graph Evolution
Introduction
• so far, have considered only the static structure of networks
• but many real-life networks are very dynamic and evolve rapidly in the course of time
• two aspects of graph evolution
- evolution of the structure (edges): generative models
- evolution of the attributes: diffusion models
• questions, e.g.
does the graph diameter increase or decrease?
how does information about a new product spread?
what nodes should be targeted for viral marketing?
Graph Evolution
Generative Models
• Erdős-Rényi model
- connect each pair of nodes i.i.d. with probability p
lots of theory, but does not produce a power-law degree distribution
• Preferential attachment model
- add a new node, create m out-links to existing nodes
- probability of linking to an existing node is proportional to its degree
produces a power-law in-degree distribution, but all nodes have the same out-degree
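Both generators can be sketched in a few lines. The function names, the explicit `rng` argument, and the seed graph for preferential attachment (a small clique on m + 1 nodes, needed so that every existing node has non-zero degree) are assumptions for illustration.

```python
import random


def erdos_renyi(n, p, rng):
    """Connect each pair of nodes independently with probability p."""
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if rng.random() < p]


def preferential_attachment(n, m, rng):
    """Each new node creates m out-links, chosen proportionally to degree."""
    # Seed clique on nodes 0..m; links point from newer to older nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Every node appears once per incident edge, so a uniform draw from
    # this list picks a node with probability proportional to its degree.
    endpoints = [v for e in edges for v in e]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in sorted(targets):
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges


rng = random.Random(1)
er_full = erdos_renyi(5, 1.0, rng)
er_empty = erdos_renyi(5, 0.0, rng)
pa_edges = preferential_attachment(50, 2, rng)
```

In the preferential attachment graph, every node added after the seed has out-degree exactly m, as noted above, while the in-degrees become skewed toward early, well-connected nodes.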
Graph Evolution
Generative Models
• Copy model
- add a node and choose k, the number of edges to add
- with probability β, select k random vertices and link to them
- with probability 1-β, edges are copied from a randomly chosen node
generates power-law degree distributions with exponent 1/(1-β)
generates communities
Graph Evolution
Diffusion Models
• each edge (u,v) has a probability $p_{uv}$ / a weight $w_{uv}$
• initially, some nodes are active (e.g., a, d, e, g, i)
Graph Evolution
Diffusion Models
• Threshold model [Granovetter 78]
- each node has a threshold t
- node u is activated when $\sum_{v \in active(u)} w_{uv} \ge t$, where active(u) are the active neighbors of u
- deterministic activation
• Independent contagion model [Dodds & Watts 2004]
- when node u becomes active, it activates each of its neighbors v with probability $p_{uv}$
- a node has only one chance to influence its neighbors
- probabilistic activation
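Both diffusion models can be simulated directly. In this sketch, `weights[u][v]` is the influence weight of neighbor v on u, `thresholds[u]` is the threshold of u, and `prob[u][v]` is the probability that a newly active u activates v; these names, the data layout, and the example networks are assumptions.

```python
import random


def threshold_diffusion(weights, thresholds, seeds):
    """Activate u once the weights of its active neighbors reach its threshold."""
    active = set(seeds)
    changed = True
    while changed:  # repeat until no further node can be activated
        changed = False
        for u in weights:
            if u in active:
                continue
            influence = sum(w for v, w in weights[u].items() if v in active)
            if influence >= thresholds[u]:
                active.add(u)
                changed = True
    return active


def independent_contagion(prob, seeds, rng):
    """Each newly active node gets one chance to activate each neighbor."""
    active, frontier = set(seeds), list(seeds)
    while frontier:
        newly_active = []
        for u in frontier:
            for v, p in prob.get(u, {}).items():
                if v not in active and rng.random() < p:
                    active.add(v)
                    newly_active.append(v)
        frontier = newly_active
    return active


weights = {"a": {"b": 0.5}, "b": {"a": 0.6},
           "c": {"a": 0.3, "b": 0.3}, "d": {"c": 0.2}}
thresholds = {node: 0.5 for node in weights}
final_threshold = threshold_diffusion(weights, thresholds, {"a"})

prob = {"a": {"b": 1.0}, "b": {"c": 1.0, "d": 0.0}}
final_contagion = independent_contagion(prob, ["a"], random.Random(0))
```

The probabilities in the second example are set to 0 or 1 only to make the outcome reproducible; with intermediate values the final active set varies between runs.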
Social Network Analysis
Viral Marketing
• Customers are becoming less susceptible to mass marketing
• Mass marketing is impractical for the unprecedented variety of products online
• Viral marketing successfully utilizes social networks for marketing products and services
• We are more influenced by our friends than by strangers
• 68% of consumers consult friends and family before purchasing home electronics (Burke 2003)
• E.g., Hotmail gained 18 million users in 12 months, spending only $50,000 on traditional advertising
Social Network Analysis
Most Influential Nodes [Kempe et al 2003]
• S: initial active node set
• f(S): expected size of final active set
• Most influential set of size k: the set S of k nodes producing largest f(S), if activated
Social Network Analysis
Most Influential Nodes
• Can use various diffusion models
• Diminishing returns: pv(u,S) ≥ pv(u,T) if S ⊆ T, where pv(u,S) denotes the marginal gain of f(S) when adding u to S
• Independent contagion model has diminishing returns
• Greedy algorithm: repeatedly select the node with maximum marginal gain
• Performance guarantee: the solution of the greedy algorithm achieves at least (1 − 1/e) ≈ 63% of the optimal solution
• Reason: f is submodular
- f submodular: if S ⊆ T then f(S ∪ {x}) − f(S) ≥ f(T ∪ {x}) − f(T)
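The greedy selection can be sketched as follows. This is an illustrative sketch under stated assumptions: f(S) is estimated by Monte Carlo simulation of the independent contagion model, and the names `simulate_spread`, `expected_spread`, `greedy_influential_set` and the `runs` parameter are hypothetical, not taken from [Kempe et al 2003].

```python
import random


def simulate_spread(prob, seeds, rng):
    """One run of the independent contagion model; returns the final active set size."""
    active = set(seeds)
    frontier = list(active)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in prob.get(u, {}).items():
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(active)


def expected_spread(prob, seeds, rng, runs=200):
    """Monte Carlo estimate of f(S), the expected size of the final active set."""
    return sum(simulate_spread(prob, seeds, rng) for _ in range(runs)) / runs


def greedy_influential_set(prob, nodes, k, rng, runs=200):
    """Add, k times, the node whose inclusion gives the largest estimated f(S)."""
    chosen = []
    for _ in range(k):
        candidates = [u for u in nodes if u not in chosen]
        if not candidates:
            break
        # Maximizing f(S + u) is the same as maximizing the marginal gain,
        # because f(S) is identical for every candidate in this round.
        scored = [(expected_spread(prob, chosen + [u], rng, runs), u)
                  for u in candidates]
        _, best_node = max(scored, key=lambda pair: pair[0])
        chosen.append(best_node)
    return chosen


if __name__ == "__main__":
    prob = {"a": {"b": 0.4, "c": 0.4}, "d": {"e": 0.9}}
    print(greedy_influential_set(prob, ["a", "b", "c", "d", "e"], 2, random.Random(0)))
```

Because f(S) is only estimated, the approximation guarantee holds up to the sampling error; more runs give a tighter estimate at higher cost.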
Social Network Analysis
Viral Marketing
Probability of buying increases with the first 10 recommendations
Diminishing returns for further recommendations (saturation)
DVD purchases
Social Network Analysis
Viral Marketing
Probability of joining community increases sharply with the first 10 friends in the community
Absolute values of probabilities are very small
LiveJournal community membership
Social Network Analysis
Role of Communities
• Consider connectedness of friends
• E.g., x and y both have three friends in the community
- x's friends are independent
- y's friends are all connected
• Who is more likely to join the community?
Social Network Analysis
Role of Communities
• Competing sociological theories
- Information argument [Granovetter 1973]: unconnected friends give independent support
- Social capital argument [Coleman 1988]: safety / trust advantage in having friends who know each other
• In LiveJournal, community joining probability increases with more connections among friends in community
• Independent contagion model too simplistic for real-life data
Trust-Based Recommendation
Introduction
• Collaborative filtering
- given a user-item rating matrix
- predict missing ratings by aggregating ratings of users with similar rating profiles
- standard method for recommender systems
• Online social networks
• Trust-based recommendation
- given additionally a trust (social) network
- aggregate ratings of trusted neighbors
Trust-Based Recommendation
Introduction
• Explore the trust network to find raters.
• Aggregate their ratings.
• Advantage: can better deal with cold start users
• Challenge: the larger the distance, the noisier the ratings, but low probability of finding a rater at small distances
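The explore-then-aggregate idea and its distance tradeoff can be sketched as a depth-limited breadth-first search. This is a hypothetical illustration: the function names, the dictionary layout of the trust network and ratings, and the use of a plain mean as the aggregate are all assumptions made for the example.

```python
from collections import deque


def trusted_raters(trust, ratings, source, item, max_depth):
    """Breadth-first search over the trust network.

    trust[u] lists the users u trusts; ratings[u][i] is u's rating of item i.
    Returns {rater: distance} for users within max_depth who rated `item`.
    """
    seen = {source}
    queue = deque([(source, 0)])
    found = {}
    while queue:
        user, depth = queue.popleft()
        if user != source and item in ratings.get(user, {}):
            found[user] = depth
        if depth < max_depth:  # stop expanding at the depth limit
            for neighbor in trust.get(user, ()):
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append((neighbor, depth + 1))
    return found


def predict_from_trust(trust, ratings, source, item, max_depth):
    """Plain mean of the ratings found (an assumed aggregate); None if no rater."""
    raters = trusted_raters(trust, ratings, source, item, max_depth)
    if not raters:
        return None
    return sum(ratings[u][item] for u in raters) / len(raters)


if __name__ == "__main__":
    trust = {"u0": ["a", "b"], "a": ["c"], "b": [], "c": []}
    ratings = {"a": {"x": 4}, "b": {"y": 5}, "c": {"x": 2}}
    for depth in (0, 1, 2):
        print(depth, predict_from_trust(trust, ratings, "u0", "x", depth))
```

A small `max_depth` keeps only close, presumably more reliable raters but often finds none; a larger value finds more raters at the cost of noisier ratings.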
Trust-Based Recommendation
Introduction
• How far to go in the network?
- tradeoff between precision and recall
• Instead of distant neighbors with the same item, use a near neighbor with a similar item
Trust-Based Recommendation
TrustWalker
• Random walk-based method
• Start from source user u_0.
• In step k, at node u:
- If u has rated i, return r_{u,i}.
- With probability Φ_{u,i,k}, the random walk stops: randomly select an item j rated by u and return r_{u,j}.
- With probability 1 − Φ_{u,i,k}, continue the random walk to a direct neighbor of u.
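A single walk of this kind can be sketched as below. Several parts are assumptions made for illustration, not taken from the slides: the stopping probability (largest item similarity scaled by a sigmoid of the step k), the uniform choice of the item j and of the next neighbor, the `max_steps` cutoff, and all function names. Averaging many walks gives an estimate of the expected returned rating.

```python
import math
import random


def stop_probability(sim, rated_items, target, k):
    """Assumed form of the stopping probability: it grows with the best
    similarity between the target item and the user's rated items, and with k."""
    best = max(sim(target, j) for j in rated_items)
    return max(0.0, best) / (1.0 + math.exp(-k / 2.0))


def trust_walk(trust, ratings, sim, source, target, rng, max_steps=6):
    """One random walk; returns a rating, or None if the walk finds nothing."""
    user = source
    for k in range(max_steps + 1):
        rated = ratings.get(user, {})
        if target in rated:
            return rated[target]  # exact target item found
        if rated and rng.random() < stop_probability(sim, rated, target, k):
            j = rng.choice(list(rated))  # uniform choice (assumption)
            return rated[j]
        neighbors = trust.get(user, [])
        if not neighbors:
            return None  # dead end in the trust network
        user = rng.choice(neighbors)
    return None


def predict(trust, ratings, sim, source, target, rng, walks=1000):
    """Average of the ratings returned by repeated walks; None if none returned."""
    results = [trust_walk(trust, ratings, sim, source, target, rng)
               for _ in range(walks)]
    returned = [r for r in results if r is not None]
    return sum(returned) / len(returned) if returned else None


if __name__ == "__main__":
    trust = {"u0": ["a", "b"], "a": ["c"], "b": [], "c": []}
    ratings = {"a": {"y": 3.0}, "b": {"x": 5.0}, "c": {"x": 1.0}}
    print(predict(trust, ratings, lambda i, j: 0.8, "u0", "x", random.Random(0)))
```

Here `sim` is any function returning a similarity for a pair of items; how that similarity is computed is left open.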
Trust-Based Recommendation
TrustWalker
• Φ_{u,i,k} depends on
- sim(i,j): similarity of target item i and item j rated by user u
- k: the step of the random walk
Trust-Based Recommendation
TrustWalker
• Prediction = expected value returned by random walk.
Trust-Based Recommendation
TrustWalker
• Special cases of TrustWalker
- Φ_{u,i,k} = 1: random walk never starts; item-based collaborative filtering.
- Φ_{u,i,k} = 0: pure trust-based recommendation; continues until finding the exact target item; aggregates the ratings weighted by probability of reaching them; existing methods approximate this.
• Confidence
- How confident is the prediction?
Graph Mining and Social Network Analysis
References
• R. Albert and A.L. Barabasi: Emergence of scaling in random networks, Science, 1999
• Karsten M. Borgwardt, Hans-Peter Kriegel: Shortest-Path Kernels on Graphs, ICDM 2005
• Karsten Borgwardt, Xifeng Yan: Graph Mining and Graph Kernels, Tutorial KDD 2008
• Peter Sheridan Dodds and Duncan J. Watts: Universal Behavior in a Generalized Model of Contagion, Phys. Rev. Letters, 2004
• P. Erdos and A. Renyi: On the evolution of random graphs, Publications of the Mathematical Institute of the Hungarian Academy of Sciences, 1960
• K. Faust and S. Wasserman: Blockmodels: Interpretation and evaluation, Social Networks, 14, 1992
• M. Girvan and M. E. J. Newman: Community structure in social and biological networks, Proc. Natl. Acad. Sci. USA, 2002
Graph Mining and Social Network Analysis
References (contd.)
• Mark Granovetter: Threshold Models of Collective Behavior, American Journal of Sociology, Vol. 83, No. 6, 1978
• M. Jamali, M. Ester: TrustWalker: A Random Walk Model for Combining Trust-based and Item-based Recommendation, KDD 2009
• H. Kashima, K. Tsuda, and A. Inokuchi: Marginalized kernels between labeled graphs, ICML 2003
• C. Kemp, J. B. Tenenbaum, T. L. Griffiths, T. Yamada, and N. Ueda: Learning systems of concepts with an infinite relational model, AAAI 2006
• D. Kempe, J. Kleinberg, É. Tardos: Maximizing the spread of influence through a social network, KDD 2003
• J. Kleinberg, S. R. Kumar, P. Raghavan, S. Rajagopalan and A. Tomkins: The web as a graph: Measurements, models and methods, COCOON 1998
• Jure Leskovec and Christos Faloutsos: Mining Large Graphs, Tutorial ECML/PKDD 2007
Graph Mining and Social Network Analysis
References (contd.)
• F. Moser, R. Colak, A. Rafiey, and M. Ester: Mining cohesive patterns from graphs with feature vectors, SDM 2009
• M. E. J. Newman: Fast algorithm for detecting community structure in networks, Phys. Rev. E 69, 2004
• Jian Pei, Daxin Jiang, Aidong Zhang: On Mining Cross-Graph Quasi-Cliques, KDD 2005
• Jianbo Shi and Jitendra Malik: Normalized Cuts and Image Segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 22, No. 8, 2000