Overlapping Communities
Graph Mining course Winter Semester 2017
Davide Mottin, Hasso Plattner Institute
Acknowledgements
§ Most of this lecture is taken from: http://web.stanford.edu/class/cs224w/slides
GRAPH MINING WS 2017 2
Lecture road
Introduction to graph clustering
Hierarchical approaches
Spectral clustering
Identifying Communities
Nodes: Football teams. Edges: Games played
Can we identify node groups? (communities, modules, clusters)
College Football Network
Figure labels: Atlantic; Football cups/conferences
Nodes: Football teams. Edges: Games played
Protein-Protein Interactions
Can we identify functional modules?
Nodes: Proteins. Edges: Physical interactions
Figure: protein-protein interaction network with its functional modules
Facebook Network
Can we identify social communities?
Nodes: Facebook users. Edges: Friendships
Figure: social communities in a Facebook network (High school, Summer internship, Stanford (Squash), Stanford (Basketball))
Overlapping Communities
§ Non-overlapping vs. overlapping communities
Non-overlapping Communities
Figure: a network and its adjacency matrix (rows and columns: nodes)
Communities as Tiles!
§ What is the structure of community overlaps: Edge density in the overlaps is higher!
Communities as “tiles”
Recap so far…
This is what we want! Communities in a network
Plan of attack
§ 1) Given a model, we generate the network
§ 2) Given a network, find the “best” model
Figure: generative model for networks and the generated network (nodes A to H)
Model of networks
§ Goal: Define a model that can generate networks
• The model will have a set of “parameters” that we will later want to estimate (and detect communities)
§ Q: Given a set of nodes, how do communities “generate” edges of the network?
Figure: generative model for networks (nodes A to H)
Community-Affiliation Graph
§ Generative model $B(V, C, M, \{p_c\})$ for graphs:
• Nodes $V$, communities $C$, memberships $M$
• Each community $c$ has a single probability $p_c$
• Later we fit the model to networks to detect communities
Figure: the model (communities $C$ with probabilities $p_A$, $p_B$; nodes $V$; memberships $M$) and the network it generates
AGM: Generative Process
§ AGM generates the links:
• For each pair of nodes in community $A$, we connect them with prob. $p_A$
• The overall edge probability is:
$$P(u,v) = 1 - \prod_{c \in M_u \cap M_v} (1 - p_c)$$
• $M_u$ … set of communities node $u$ belongs to
• If $u, v$ share no communities: $P(u,v) = \varepsilon$
• Think of this as an “OR” function: if at least one community says “YES”, we create an edge
Figure: community affiliations (communities $C$ with $p_A$, $p_B$; nodes $V$; memberships $M$) and the generated network
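The AGM edge rule can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the toy memberships, the community probabilities and the background value `eps=1e-4` are assumptions made up for the example.

```python
import random
from itertools import combinations

def agm_edge_prob(memberships, p, u, v, eps=1e-4):
    """P(u, v) = 1 - prod over shared communities c of (1 - p[c])."""
    shared = memberships[u] & memberships[v]
    if not shared:
        return eps  # assumed background probability for pairs sharing no community
    prob_no_edge = 1.0
    for c in shared:
        prob_no_edge *= 1.0 - p[c]  # community c fails to create the edge
    return 1.0 - prob_no_edge

def sample_agm(memberships, p, eps=1e-4, seed=0):
    """Flip one biased coin per node pair and return the set of sampled edges."""
    rng = random.Random(seed)
    nodes = sorted(memberships)
    return {
        (u, v)
        for u, v in combinations(nodes, 2)
        if rng.random() < agm_edge_prob(memberships, p, u, v, eps)
    }

# Hypothetical toy model: two overlapping communities A and B.
memberships = {"a": {"A"}, "b": {"A", "B"}, "c": {"A", "B"}, "d": {"B"}}
p = {"A": 0.5, "B": 0.8}
edges = sample_agm(memberships, p)
```

Nodes `b` and `c` share both communities, so their edge probability is 1 - (1 - 0.5)(1 - 0.8) = 0.9, higher than for pairs that share only one community.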
Recap: AGM networks
Figure: AGM model and the network it generates
AGM: Flexibility
§ AGM can express a variety of community structures: non-overlapping, overlapping, nested
Detecting Communities
§ Detecting communities with AGM:
• Given a graph $G(V, E)$, find the model:
1) Affiliation graph $M$
2) Number of communities $C$
3) Parameters $p_c$
Figure: model and network (nodes A to H); the model generates the network, and we infer the model from the network
Maximum Likelihood Estimation
§ Maximum Likelihood Principle (MLE):
• Given: Data $X$
• Assumption: Data is generated by some model $f(\Theta)$
⁃ $f$ … model
⁃ $\Theta$ … model parameters
• Want to estimate $P_f(X \mid \Theta)$:
⁃ The probability that our model $f$ (with parameters $\Theta$) generated the data
• Now let’s find the most likely model that could have generated the data:
$$\arg\max_{\Theta} P_f(X \mid \Theta)$$
Example: MLE
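§ Imagine we are given a set of coin flips
§ Task: Figure out the bias of a coin!
• Data: Sequence of coin flips: $X = [1, 0, 0, 0, 1, 0, 0, 1]$
• Model: $f(\Theta)$ = return 1 with prob. $\Theta$, else return 0
• What is $P_f(X \mid \Theta)$? Assuming coin flips are independent:
⁃ So, $P_f(X \mid \Theta) = P_f(1 \mid \Theta) \cdot P_f(0 \mid \Theta) \cdot P_f(0 \mid \Theta) \cdots P_f(1 \mid \Theta)$
▪ What is $P_f(1 \mid \Theta)$? Simple, $P_f(1 \mid \Theta) = \Theta$
⁃ Then, $P_f(X \mid \Theta) = \Theta^3 (1 - \Theta)^5$
⁃ For example:
▪ $P_f(X \mid \Theta = 0.5) = 0.003906$
▪ $P_f(X \mid \Theta = 3/8) = 0.005029$
• What did we learn? Our data was most likely generated by a coin with bias $\Theta = 3/8$
Figure: plot of $P_f(X \mid \Theta)$ against $\Theta$, with the maximum at $\Theta^* = 3/8$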
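The coin example can be checked numerically. The sketch below evaluates the likelihood on a grid of candidate biases; the grid resolution of 0.001 is an arbitrary choice for illustration, and a closed-form solution (fraction of ones) would of course give the same answer.

```python
def coin_likelihood(flips, theta):
    """Likelihood of independent coin flips under bias theta."""
    like = 1.0
    for x in flips:
        like *= theta if x == 1 else 1.0 - theta
    return like

flips = [1, 0, 0, 0, 1, 0, 0, 1]

# Grid search over candidate biases 0.000, 0.001, ..., 1.000 (illustrative only).
grid = [i / 1000 for i in range(1001)]
theta_hat = max(grid, key=lambda t: coin_likelihood(flips, t))

print(coin_likelihood(flips, 0.5))    # about 0.003906
print(coin_likelihood(flips, 3 / 8))  # about 0.005029
print(theta_hat)                      # 0.375
```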
MLE for Graphs
§ How do we do MLE for graphs?
• Model generates a probabilistic adjacency matrix
• We then flip all the entries of the probabilistic matrix to obtain the binary adjacency matrix $A$
Probabilistic adjacency matrix (for every pair of nodes $u, v$, AGM gives the prob. $p_{uv}$ of them being linked):
0     0.9   0.4   0.04
0.1   0     0.85  0.75
0.1   0.77  0     0.6
0.04  0.65  0.7   0
Flip biased coins to obtain the binary adjacency matrix $A$:
0 1 0 0
1 0 1 1
0 1 0 1
0 1 1 0
§ The likelihood of AGM generating graph $G$:
$$P(G \mid \Theta) = \prod_{(u,v) \in E} P(u,v) \prod_{(u,v) \notin E} (1 - P(u,v))$$
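The “flip biased coins” step could look roughly as follows. This sketch assumes an undirected graph, so it flips one coin per unordered pair using the upper-triangle entry `P[u][v]` and mirrors the result; the function name and the `seed` parameter are hypothetical.

```python
import random

def flip_biased_coins(P, seed=0):
    """Sample a symmetric 0/1 adjacency matrix from edge probabilities P."""
    rng = random.Random(seed)
    n = len(P)
    A = [[0] * n for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            # One biased coin per unordered pair (assumption: undirected graph).
            if rng.random() < P[u][v]:
                A[u][v] = A[v][u] = 1
    return A

# Probabilistic adjacency matrix from the slide.
P_slide = [
    [0.0, 0.9, 0.4, 0.04],
    [0.1, 0.0, 0.85, 0.75],
    [0.1, 0.77, 0.0, 0.6],
    [0.04, 0.65, 0.7, 0.0],
]
A_sample = flip_biased_coins(P_slide)
```

Each call with a different seed yields a different binary matrix; the matrix shown on the slide is one such draw.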
Graphs: Likelihood P(G|Θ)
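Given graph $G(V, E)$ and $\Theta$, we calculate the likelihood that $\Theta$ generated $G$: $P(G \mid \Theta)$
Adjacency matrix of $G$:
0 1 1 0
1 0 1 0
1 1 0 1
0 0 1 0
Edge probabilities under $\Theta = B(V, C, M, \{p_c\})$:
0    0.9  0.9  0
0.9  0    0.9  0
0.9  0.9  0    0.9
0    0    0.9  0
$$P(G \mid \Theta) = \prod_{(u,v) \in E} P(u,v) \prod_{(u,v) \notin E} (1 - P(u,v))$$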
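For the two matrices on this slide, the likelihood can be evaluated directly. The sketch assumes an undirected graph and therefore counts each unordered node pair once; the function name is made up for the example.

```python
def graph_likelihood(A, P):
    """P(G | Theta): product of P(u,v) over edges and 1 - P(u,v) over non-edges."""
    n = len(A)
    like = 1.0
    for u in range(n):
        for v in range(u + 1, n):  # each unordered pair once (undirected graph)
            like *= P[u][v] if A[u][v] == 1 else 1.0 - P[u][v]
    return like

# Adjacency matrix of G and edge probabilities under Theta, as on the slide.
A = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
P = [
    [0.0, 0.9, 0.9, 0.0],
    [0.9, 0.0, 0.9, 0.0],
    [0.9, 0.9, 0.0, 0.9],
    [0.0, 0.0, 0.9, 0.0],
]
likelihood = graph_likelihood(A, P)
```

The graph has four edges, each with probability 0.9, and two non-edges with probability 0, so the result is $0.9^4 = 0.6561$.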
MLE for Graphs
§ Our goal: Find $\Theta = B(V, C, M, \{p_C\})$ such that:
$$\arg\max_{\Theta} P(G \mid \Theta)$$
§ How do we find $B(V, C, M, \{p_C\})$ that maximizes the likelihood?
MLE for AGM
§ Our goal is to find $B(V, C, M, \{p_C\})$ such that:
$$\arg\max_{B(V, C, M, \{p_C\})} \prod_{(u,v) \in E} P(u,v) \prod_{(u,v) \notin E} (1 - P(u,v))$$
§ Problem: Finding $B$ means finding the bipartite affiliation network.
• There is no nice way to do this.
• Fitting $B(V, C, M, \{p_C\})$ is too hard, let’s change the model (so it is easier to fit)!
From AGM to BigCLAM
§ Relaxation: Memberships have strengths
• $F_{uA}$: The membership strength of node $u$ to community $A$ ($F_{uA} = 0$: no membership)
• Each community $A$ links nodes independently: $P_A(u,v) = 1 - \exp(-F_{uA} \cdot F_{vA})$
Figure: nodes $u$ and $v$ with membership strength $F_{uA}$
Factor Matrix $F$
§ Community membership strength matrix $F$ (rows: nodes, columns: communities)
• $F_{uA}$ … strength of $u$’s membership to $A$
• $F_u$ … vector of community membership strengths of $u$
§ $P_A(u,v) = 1 - \exp(-F_{uA} \cdot F_{vA})$
• Probability of connection increases with the product of strengths (roughly proportional when the product is small)
• Notice: If one node doesn’t belong to the community ($F_{uC} = 0$) then $P_C(u,v) = 0$
§ Prob. that at least one common community $C$ links the nodes:
• $P(u,v) = 1 - \prod_{C} (1 - P_C(u,v))$
From AGM to BigCLAM
§ Community $A$ links nodes $u, v$ independently: $P_A(u,v) = 1 - \exp(-F_{uA} \cdot F_{vA})$
§ Then prob. at least one common $C$ links them:
$$P(u,v) = 1 - \prod_{C} (1 - P_C(u,v)) = 1 - \exp\Big(-\sum_{C} F_{uC} \cdot F_{vC}\Big) = 1 - \exp(-F_u \cdot F_v^T)$$
§ Example $F$ matrix (node community membership strengths):
$F_u$: 0    1.2  0  0.2
$F_v$: 0.5  0    0  0.8
$F_w$: 0    1.8  1  0
Then: $F_u \cdot F_v^T = 0.16$
And: $P(u,v) = 1 - \exp(-0.16) \approx 0.15$
But: $P(u,w) = 0.88$
$P(v,w) = 0$
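The numbers in this example can be reproduced with a short sketch. Only the three membership vectors come from the slide; the function name and the dictionary layout are chosen for illustration.

```python
import math

def bigclam_edge_prob(F_u, F_v):
    """P(u, v) = 1 - exp(-F_u . F_v^T) for non-negative membership vectors."""
    dot = sum(a * b for a, b in zip(F_u, F_v))
    return 1.0 - math.exp(-dot)

# Membership strength vectors from the slide (four communities).
F = {
    "u": [0.0, 1.2, 0.0, 0.2],
    "v": [0.5, 0.0, 0.0, 0.8],
    "w": [0.0, 1.8, 1.0, 0.0],
}

p_uv = bigclam_edge_prob(F["u"], F["v"])  # dot product 0.16
p_uw = bigclam_edge_prob(F["u"], F["w"])  # dot product 2.16
p_vw = bigclam_edge_prob(F["v"], F["w"])  # no shared community, dot product 0
```

Nodes $u$ and $w$ share a community with strong memberships on both sides, which is why their link probability is much higher than that of $u$ and $v$.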
BigCLAM: How to find F
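§ Task: Given a network $G(V, E)$, estimate $F$
• Find $F$ that maximizes the likelihood:
$$\arg\max_{F} \prod_{(u,v) \in E} P(u,v) \prod_{(u,v) \notin E} (1 - P(u,v))$$
⁃ where: $P(u,v) = 1 - \exp(-F_u \cdot F_v^T)$
⁃ Many times we take the logarithm of the likelihood, and call it log-likelihood: $l(F) = \log P(G \mid F)$
§ Goal: Find $F$ that maximizes $l(F)$:
$$l(F) = \sum_{(u,v) \in E} \log\big(1 - \exp(-F_u F_v^T)\big) - \sum_{(u,v) \notin E} F_u F_v^T$$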
BigCLAM: V1.0
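§ Compute gradient of a single row $F_u$ of $F$:
$$\nabla l(F_u) = \sum_{v \in \mathcal{N}(u)} F_v \frac{\exp(-F_u F_v^T)}{1 - \exp(-F_u F_v^T)} - \sum_{v \notin \mathcal{N}(u)} F_v$$
⁃ $\mathcal{N}(u)$ … set of outgoing neighbors of $u$
§ Coordinate gradient ascent:
• Iterate over the rows of $F$:
⁃ Compute gradient $\nabla l(F_u)$ of row $u$ (while keeping others fixed)
⁃ Update the row $F_u$: $F_u \leftarrow F_u + \eta \nabla l(F_u)$
⁃ Project $F_u$ back to a non-negative vector: If $F_{uC} < 0$: $F_{uC} = 0$
§ This is slow! Computing $\nabla l(F_u)$ takes time linear in the number of nodes!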
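One sweep of the coordinate gradient ascent described on this slide might be sketched as below. It is a naive version that loops over all nodes for every row. The step size `eta`, the guard constant `MIN_DENOM` (which avoids division by zero when a dot product is 0) and the toy graph are assumptions for illustration, not part of the original method description.

```python
import math

MIN_DENOM = 1e-6  # assumed guard against division by zero when F_u . F_v = 0

def row_gradient(F, neighbors, u):
    """Gradient of the log-likelihood with respect to row F[u] (naive, O(|V|))."""
    k = len(F[u])
    grad = [0.0] * k
    for v in range(len(F)):
        if v == u:
            continue
        if v in neighbors[u]:
            x = sum(a * b for a, b in zip(F[u], F[v]))
            # exp(-x) / (1 - exp(-x)) equals 1 / (exp(x) - 1)
            weight = 1.0 / max(math.expm1(x), MIN_DENOM)
            for c in range(k):
                grad[c] += weight * F[v][c]
        else:
            for c in range(k):
                grad[c] -= F[v][c]
    return grad

def ascent_sweep(F, neighbors, eta=0.01):
    """Update every row once: gradient step, then projection onto F >= 0."""
    for u in range(len(F)):
        grad = row_gradient(F, neighbors, u)
        F[u] = [max(0.0, f + eta * g) for f, g in zip(F[u], grad)]
    return F

# Hypothetical toy graph: nodes 0 and 1 are linked, node 2 is isolated.
neighbors = {0: {1}, 1: {0}, 2: set()}
F_demo = [[1.0], [1.0], [1.0]]
g0 = row_gradient(F_demo, neighbors, 0)
F_after = ascent_sweep([row[:] for row in F_demo], neighbors, eta=0.1)
```

For the isolated node the gradient is purely negative, so repeated sweeps push its membership strength towards zero, while the linked pair keeps a positive strength.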
BigCLAM: V2.0
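§ However, we notice:
$$\sum_{v \notin \mathcal{N}(u)} F_v = \sum_{v} F_v - F_u - \sum_{v \in \mathcal{N}(u)} F_v$$
• We cache $\sum_{v} F_v$
• So, computing $\sum_{v \notin \mathcal{N}(u)} F_v$ now takes time linear in the degree $|\mathcal{N}(u)|$ of $u$
⁃ In networks the degree of a node is much smaller than the total number of nodes in the network, so this is a significant speedup!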
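The caching trick can be illustrated by comparing a naive sum over all non-neighbors with the cached version. The function names and the small example graph are hypothetical; the membership vectors reuse the earlier $F$ example plus one extra made-up row.

```python
def column_sums(F):
    """Cached quantity: the sum of all rows of F."""
    return [sum(col) for col in zip(*F)]

def non_neighbor_sum_naive(F, neighbors, u):
    """Sum of F[v] over all v that are neither u nor a neighbor of u: O(|V|)."""
    total = [0.0] * len(F[u])
    for v in range(len(F)):
        if v != u and v not in neighbors[u]:
            for c, f in enumerate(F[v]):
                total[c] += f
    return total

def non_neighbor_sum_cached(F, neighbors, u, cached_total):
    """Same quantity from the cached total: O(degree of u)."""
    result = [t - f for t, f in zip(cached_total, F[u])]
    for v in neighbors[u]:
        for c, f in enumerate(F[v]):
            result[c] -= f
    return result

F_rows = [
    [0.0, 1.2, 0.0, 0.2],
    [0.5, 0.0, 0.0, 0.8],
    [0.0, 1.8, 1.0, 0.0],
    [0.3, 0.0, 0.4, 0.0],  # made-up extra node
]
nbrs = {0: {2}, 1: set(), 2: {0}, 3: set()}
cached = column_sums(F_rows)
naive = non_neighbor_sum_naive(F_rows, nbrs, 0)
fast = non_neighbor_sum_cached(F_rows, nbrs, 0, cached)
```

In an actual fitting loop the cached total has to be refreshed whenever a row of $F$ is updated, for example by subtracting the old row and adding the new one.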
BigCLAM: Scalability
§ BigCLAM takes 5 minutes for 300k-node networks
• Other methods take 10 days
§ Can process networks with 100M edges!
Extension: Beyond Clusters
Extension: Directed AGM
§ Extension: Make community membership edges directed!
• Outgoing membership: Node “sends” edges
• Incoming membership: Node “receives” edges
Example: Model and Network
Directed AGM
§ Everything is almost the same except now we have 2 matrices: $F$ and $H$
• $F$ … outgoing community memberships
• $H$ … incoming community memberships
§ Edge prob.: $P(u,v) = 1 - \exp(-F_u H_v^T)$
§ Network log-likelihood:
$$l(F, H) = \sum_{(u,v) \in E} \log\big(1 - \exp(-F_u H_v^T)\big) - \sum_{(u,v) \notin E} F_u H_v^T$$
which we optimize the same way as before
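A small sketch shows how the two matrices make the edge probability asymmetric. The membership values are invented for illustration: node `u` only “sends” in the first community and node `v` only “receives” in it.

```python
import math

def directed_edge_prob(F_u, H_v):
    """P(u -> v) = 1 - exp(-F_u . H_v^T): outgoing strengths of u, incoming of v."""
    dot = sum(a * b for a, b in zip(F_u, H_v))
    return 1.0 - math.exp(-dot)

# Hypothetical memberships in two communities.
F_out = {"u": [1.0, 0.0], "v": [0.0, 0.0]}  # outgoing memberships
H_in = {"u": [0.0, 0.0], "v": [2.0, 0.0]}   # incoming memberships

p_u_to_v = directed_edge_prob(F_out["u"], H_in["v"])  # 1 - exp(-2)
p_v_to_u = directed_edge_prob(F_out["v"], H_in["u"])  # 0.0
```

With $F = H$ the model reduces to the undirected edge probability $1 - \exp(-F_u F_v^T)$.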
Predator-prey Communities
Questions?
References
§ J. Yang, J. Leskovec. Community-Affiliation Graph Model for Overlapping Network Community Detection. IEEE International Conference on Data Mining (ICDM), 2012.
§ J. Yang, J. Leskovec. Overlapping Community Detection at Scale: A Nonnegative Matrix Factorization Approach. ACM International Conference on Web Search and Data Mining (WSDM), 2013.
§ J. Yang, J. McAuley, J. Leskovec. Detecting Cohesive and 2-mode Communities in Directed and Undirected Networks. ACM International Conference on Web Search and Data Mining (WSDM), 2014.
§ J. Yang, J. McAuley, J. Leskovec. Community Detection in Networks with Node Attributes. IEEE International Conference on Data Mining (ICDM), 2013.