Dynamic degree-corrected blockmodels for social networks: a … · 2018. 9. 4. · I Overlapping...

Introduction Static model Dynamic models Inference Applications Conclusion

Dynamic degree-corrected blockmodels for social networks: a nonparametric approach

Linda Tan

(Joint work with Maria De Iorio)

National University of Singapore

Bayesian Computation for High-Dimensional Statistical Models



Introduction

• Understand the community structure of social networks
  - Social networks often exhibit community structure (nodes are more densely connected within each group than across groups).
  - Community structure may be present due to similar interests, social stature, physical locations, etc.

• Challenges in detecting community structure
  - The number of communities is typically unknown, and the communities can vary in size and rate of interaction.
  - Nodes within a community can also have different activity levels, and community detection results can be distorted if degree heterogeneity is not taken into account (Karrer and Newman, 2011).

• Propose a nonparametric Bayesian approach
  - Use independent Dirichlet processes to (1) capture block structure in the social network and (2) induce clustering in the activity levels of nodes.



Stochastic blockmodel

• Assumes that the nodes in a network are partitioned into groups.

• Distribution of ties between nodes depends only on
  1. Group membership of the nodes.
  2. Probabilities of interactions between different groups.

[Figure: block matrix with rows and columns labelled 1 to 4.]

• A variety of network structures (community, hierarchical, core-periphery) can be produced through different choices of the probability block matrix.

• Blockmodeling
  - a priori: exogenous actor attribute data are used to partition the nodes.
  - a posteriori: discover block structures from relational data.
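As a rough illustration of the blockmodel idea, the following Python sketch samples an undirected network in which each tie depends only on the groups of the two nodes. The group labels, the block probability matrix and the function name `sample_sbm` are illustrative assumptions, not values from the talk.

```python
import random

def sample_sbm(z, block_probs, seed=0):
    """Sample a symmetric 0/1 adjacency matrix with zero diagonal.

    z[i] is the group of node i; block_probs[a][b] is the tie probability
    between a node in group a and a node in group b (hypothetical values).
    """
    rng = random.Random(seed)
    n = len(z)
    y = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            p = block_probs[z[i]][z[j]]
            # One Bernoulli draw per unordered pair, mirrored for symmetry.
            y[i][j] = y[j][i] = int(rng.random() < p)
    return y

# Community structure: dense diagonal blocks, sparse off-diagonal blocks.
z = [0, 0, 0, 1, 1, 1]
block_probs = [[0.9, 0.1],
               [0.1, 0.9]]
y = sample_sbm(z, block_probs)
```

Swapping the large and small entries of `block_probs` would give a different structure from the same code, which is the sense in which the block matrix determines the type of network.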



Extensions of stochastic blockmodel

• Relax the restriction that each node can belong to only one group
  - Mixed membership stochastic blockmodels (Airoldi et al., 2008)
  - Overlapping stochastic blockmodels (Latouche et al., 2011)

• Account for degree heterogeneity
  - Degree-corrected stochastic blockmodels (Karrer and Newman, 2011; Peng and Carvalho, 2016)
  - Assortative MMSB with node popularities (Gopalan et al., 2013)

• Determine the number of groups automatically (Chinese restaurant process)
  - Infinite relational model (Kemp et al., 2006)
  - Infinite degree-corrected stochastic blockmodel (Herlau et al., 2014)

• Models for dynamic networks
  - State-space mixed membership blockmodel (Xing et al., 2010)



Contributions

• Develop degree-corrected stochastic blockmodels for community detection in social networks using a nonparametric Bayesian approach.

• Formulate the static model using probit regression. Use the Dirichlet process (DP) to
  1. detect communities in the network,
  2. induce clustering among the popularity parameters.

• Flexible approach: allows the number of communities and popularity clusters to be determined automatically by the data.

• Integrates the approach of Kemp et al. (2006), who use the Chinese restaurant process to detect community structure, and Ghosh et al. (2010), who use the DP to induce clustering among the “popularity” parameters.

• Present a model for static networks and extensions to dynamic networks. Posterior inference is obtained using Gibbs samplers. The proposed models are illustrated using real social networks.



Static model

• $N = \{1, \dots, n\}$: set of actors.

• $y = [y_{ij}]$: $n \times n$ adjacency matrix. $y_{ij}$ is an indicator of a link from $i$ to $j$. Consider an undirected network: $y$ is symmetric with zero diagonal.

\[ y_{ij} \mid p_{ij} \overset{\text{indep}}{\sim} \mathrm{Bernoulli}(p_{ij}), \]
\[ \Phi^{-1}(p_{ij}) = \theta_i + \theta_j + \sum_{k=1}^{K} \beta^*_k \, \mathbf{1}\{z_i = z_j = k\}, \qquad (1 \le i < j \le n). \]

  - $\theta_i$: popularity of actor $i$.
  - $\Phi(\cdot)$: cumulative distribution function of $N(0,1)$.
  - $K \le n$: number of communities (unknown).
  - $\beta^*_k$: rate of interaction in the $k$th community (high $\beta^*_k$: close-knit group).
  - $z_i \in \{1, \dots, K\}$: group membership of actor $i$.
  - $\mathbf{1}\{\cdot\}$: indicator function.

• Probability of interaction between $i$ and $j$ depends on
  - their individual popularities,
  - the interaction rate of their group (if they belong to the same group).
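To make the probit link concrete, here is a minimal Python sketch that evaluates the tie probability of the static model for a pair of actors. The parameter values and the function names `normal_cdf` and `tie_probability` are hypothetical and chosen only for illustration.

```python
import math

def normal_cdf(x):
    """Standard normal CDF, Phi(x), via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def tie_probability(i, j, theta, z, beta_star):
    """p_ij = Phi(theta_i + theta_j + beta*_k 1{z_i = z_j = k})."""
    linear_predictor = theta[i] + theta[j]
    if z[i] == z[j]:
        # The community term enters only when both actors share a group.
        linear_predictor += beta_star[z[i]]
    return normal_cdf(linear_predictor)

# Hypothetical values: actor 0 is popular; actors 0 and 1 share community 0.
theta = [0.5, -1.0, -1.0]
z = [0, 0, 1]
beta_star = {0: 1.5, 1: 1.0}

p_same = tie_probability(0, 1, theta, z, beta_star)  # Phi(0.5 - 1.0 + 1.5) = Phi(1.0)
p_diff = tie_probability(0, 2, theta, z, beta_star)  # Phi(0.5 - 1.0) = Phi(-0.5)
```

Actors 1 and 2 have the same popularity here, so the gap between `p_same` and `p_diff` comes entirely from the shared-community term.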



Static model

• The last term represents a stochastic blockmodel in which the off-diagonal entries of the probability matrix are set to a common value (not necessarily zero).

• Interaction between actors from different groups is driven by their popularities. The popularity parameters $\{\theta_i\}$ and group assignments $\{z_i\}$ compete to explain the network.

• A DP is used to induce clustering among the popularity parameters $\{\theta_i\}$:

\[ \theta_i \mid G \overset{\text{iid}}{\sim} G \quad (i = 1, \dots, n), \qquad G \sim \mathrm{DP}(\alpha, G_0), \]

where $G_0$ is $N(0, \sigma_\theta^2)$ and $\alpha \sim \mathrm{Gamma}(a_\alpha, b_\alpha)$.

• A second DP, $H$ (independent of $G$), is used to detect the communities in the network. Introduce a $\beta_i$ for each actor $i$, where $\beta_i = \beta^*_{z_i}$, and assume

\[ \beta_i \mid H \overset{\text{iid}}{\sim} H \quad (i = 1, \dots, n), \qquad H \sim \mathrm{DP}(\nu, H_0), \]

where $H_0$ is $N(0, \sigma_\beta^2)$ and $\nu \sim \mathrm{Gamma}(a_\nu, b_\nu)$.
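The clustering induced by a DP prior can be seen by simulating from it. The sketch below draws $\theta_1, \dots, \theta_n$ using the Pólya urn representation of the DP, with the random measure $G$ integrated out. The function name `sample_dp_values` and the chosen values of $n$, $\alpha$ and $\sigma_\theta$ are assumptions for illustration, and this is not the authors' sampler.

```python
import random

def sample_dp_values(n, alpha, sigma, seed=0):
    """Draw theta_1..theta_n from G ~ DP(alpha, N(0, sigma^2)) via the Polya urn.

    The i-th value (1-indexed) is a fresh draw from the base measure with
    probability alpha / (alpha + i - 1); otherwise it copies one of the
    earlier values chosen uniformly at random.
    """
    rng = random.Random(seed)
    theta = []
    for i in range(n):  # i is 0-indexed, so the new-value odds use alpha + i
        if rng.random() < alpha / (alpha + i):
            theta.append(rng.gauss(0.0, sigma))
        else:
            theta.append(rng.choice(theta))
    return theta

theta = sample_dp_values(n=34, alpha=1.0, sigma=1.0)
num_clusters = len(set(theta))  # distinct values correspond to clusters
```

Repeated values in `theta` are what "clustering among the popularity parameters" means in practice: actors sharing a value fall in the same popularity cluster.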



Infer number of clusters

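• The number of communities $K$ and the number of popularity clusters $L$ among $\{\theta_i\}$ are inferred from the data.

• The prior distribution of $L$ depends on the concentration parameter $\alpha$ of the DP (larger $\alpha$ implies larger $L$).

• We specify a Gamma prior on $\alpha$, which also facilitates computations. If $\alpha \sim \mathrm{Gamma}(a_0, b_0)$,

\[ E(L) \approx \frac{a_0}{b_0} A, \qquad \mathrm{Var}(L) \approx E(L) + \frac{a_0^2}{b_0^2} B + \left\{ \frac{a_0}{b_0} B + A \right\}^2 \frac{a_0}{b_0^2}, \]

where

\[ A = \psi_0\!\left(\frac{a_0 + n b_0}{b_0}\right) - \psi_0\!\left(\frac{a_0}{b_0}\right), \qquad B = \psi_1\!\left(\frac{a_0 + n b_0}{b_0}\right) - \psi_1\!\left(\frac{a_0}{b_0}\right) \]

(Jara et al., 2007). Here $\psi_0(\cdot)$ is the digamma function and $\psi_1(\cdot)$ is the trigamma function. These results serve as a reference for setting $a_0$ and $b_0$.

• The relation between $K$ and $\nu$ is similar.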
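The approximations for $E(L)$ and $\mathrm{Var}(L)$ are easy to evaluate when choosing $a_0$ and $b_0$. The sketch below assumes that $\mathrm{Gamma}(a_0, b_0)$ uses a rate parametrization (mean $a_0/b_0$, variance $a_0/b_0^2$), which is what the formula suggests. It replaces the digamma and trigamma differences with finite sums that are exact for integer $n$. The function name `prior_cluster_moments` is hypothetical.

```python
def prior_cluster_moments(a0, b0, n):
    """Approximate prior mean and variance of the number of clusters L.

    Uses the identities psi0(x + n) - psi0(x) = sum_{i<n} 1 / (x + i) and
    psi1(x + n) - psi1(x) = -sum_{i<n} 1 / (x + i)^2, valid for integer n,
    so no special-function library is needed.
    """
    m = a0 / b0  # prior mean of alpha (assumes b0 is a rate)
    A = sum(1.0 / (m + i) for i in range(n))
    B = -sum(1.0 / (m + i) ** 2 for i in range(n))
    mean_L = m * A
    var_L = mean_L + m ** 2 * B + (m * B + A) ** 2 * a0 / b0 ** 2
    return mean_L, var_L

# a0 = b0 = 5 with n = 34 actors (illustrative choice of inputs).
mean_L, var_L = prior_cluster_moments(a0=5.0, b0=5.0, n=34)
```

With $a_0 = b_0$ the prior mean of $\alpha$ is 1, and `mean_L` reduces to the harmonic number $1 + 1/2 + \dots + 1/n$, which gives a quick sanity check of the formula.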



Dynamic model I

• Suppose we observe networks $y_t = [y_{t,ij}]$ for $t = 1, \dots, T$. Consider

\[ y_{t,ij} \mid p_{t,ij} \overset{\text{indep}}{\sim} \mathrm{Bernoulli}(p_{t,ij}), \qquad (t = 1, \dots, T, \; 1 \le i < j \le n), \]
\[ \Phi^{-1}(p_{t,ij}) = \theta_{it} + \theta_{jt} + \sum_{k=1}^{K} \beta^*_k \, \mathbf{1}\{z_i = z_j = k\}. \]

  - Assume community memberships remain unchanged but the popularities of the actors can vary with time.
  - This assumption is appropriate for data where communities arise due to factors that are unlikely to vary drastically over time (physical locations, job positions). Changes in ties are attributed to variations in the popularities of nodes.
  - As in the static model, we induce clustering among $\{\theta_{it}\}$ using a DP,

\[ \theta_{it} \mid G \overset{\text{iid}}{\sim} G \quad (i = 1, \dots, n, \; t = 1, \dots, T), \qquad G \sim \mathrm{DP}(\alpha, G_0), \]

    where $G_0$ is $N(0, \sigma_\theta^2)$ and $\alpha \sim \mathrm{Gamma}(a_\alpha, b_\alpha)$.
  - The $\{\beta^*_k\}$ and $\{z_i\}$ are modeled using a DP as in the static model.



Dynamic model II

• Allows the tie between nodes $i$ and $j$ at time $t$ to depend on the existence of the tie at the previous time point. Assumes that the popularities and community memberships of the actors remain unchanged over time.

\[ \Phi^{-1}(p_{t,ij}) = \eta \, y_{t-1,ij} \, \mathbf{1}\{t > 1\} + \theta_i + \theta_j + \sum_{k=1}^{K} \beta^*_k \, \mathbf{1}\{z_i = z_j = k\}, \]

where $\eta \sim N(0, \sigma_\eta^2)$.

• $\eta$ measures the persistence of ties in the network.
  - $\eta > 0$: a tie is more likely to be present at time $t$ if it was present at $t - 1$.
  - $\eta < 0$: a tie is more likely to be present at time $t$ if it was absent at $t - 1$.

• $\{\theta_i\}$, $\{z_i\}$ and $\{\beta^*_k\}$ are modeled as in the static model.

• The popularities and communities inferred from this model smooth out the data and provide an overview of the behavior of actors over time.
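The role of $\eta$ can be checked numerically. This short Python sketch evaluates the tie probability of dynamic model II with and without a tie at the previous time point. All parameter values and the function name `tie_probability_t` are hypothetical.

```python
import math

def normal_cdf(x):
    """Standard normal CDF, Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def tie_probability_t(t, y_prev_ij, eta, theta_i, theta_j, beta_shared):
    """p_{t,ij} under dynamic model II.

    beta_shared is beta*_k if i and j share community k, and 0 otherwise.
    y_prev_ij is the tie indicator at time t - 1 (ignored when t == 1).
    """
    persistence = eta * y_prev_ij if t > 1 else 0.0
    return normal_cdf(persistence + theta_i + theta_j + beta_shared)

# Hypothetical parameters with a positive persistence effect.
params = dict(eta=0.58, theta_i=-0.5, theta_j=-0.5, beta_shared=1.0)
p_after_tie = tie_probability_t(t=2, y_prev_ij=1, **params)     # Phi(0.58)
p_after_no_tie = tie_probability_t(t=2, y_prev_ij=0, **params)  # Phi(0.0)
```

Because $\eta > 0$ in this example, `p_after_tie` exceeds `p_after_no_tie`; a negative $\eta$ would reverse the ordering.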



Posterior inference

• We use Gibbs samplers to obtain posterior inference for the proposed models.

• Sampling from the DP is performed using the methods in Neal (2000), while the concentration parameters $\alpha$ and $\nu$ are sampled using the method in Escobar and West (1995).

• The algorithms are coded in Julia. It is possible to use standard software, e.g. OpenBUGS, to obtain posterior inference by considering a truncated DP (Ishwaran and Zarepour, 2000). However, the runtime in OpenBUGS is significantly longer than in Julia when the number of nodes is large.

• For the applications, we initialize multiple MCMC chains from random starting points and use diagnostic plots to check for convergence.
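For readers unfamiliar with the concentration-parameter step, the following Python sketch shows the auxiliary-variable update usually attributed to Escobar and West (1995), assuming a $\mathrm{Gamma}(a, b)$ prior with rate $b$. It is not the authors' Julia code. The function name `update_concentration` and the inputs ($k = 3$ clusters, $n = 34$ items) are assumptions for illustration.

```python
import math
import random

def update_concentration(alpha, k, n, a, b, rng):
    """One auxiliary-variable update of a DP concentration parameter.

    Assumes alpha ~ Gamma(a, b) with rate b, k occupied clusters and n items.
    """
    x = rng.betavariate(alpha + 1.0, n)          # auxiliary variable in (0, 1)
    rate = b - math.log(max(x, 1e-300))          # guard against log(0)
    odds = (a + k - 1.0) / (n * rate)
    # Mixture of two Gamma distributions with weights odds : 1.
    shape = a + k if rng.random() < odds / (1.0 + odds) else a + k - 1.0
    return rng.gammavariate(shape, 1.0 / rate)   # gammavariate takes a scale

rng = random.Random(1)
alpha = 1.0
draws = []
for _ in range(1000):
    alpha = update_concentration(alpha, k=3, n=34, a=5.0, b=5.0, rng=rng)
    draws.append(alpha)
```

In a full Gibbs sampler, `k` would change from sweep to sweep as the cluster assignments are resampled; it is held fixed here only to keep the sketch short.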



Cluster analysis

• From the MCMC output, we compute the $n \times n$ posterior similarity matrix $S$, where $S_{ij}$ is the posterior probability that actors $i$ and $j$ belong to the same cluster.
  - Estimate $S_{ij}$ by the proportion of iterations in which $i$ and $j$ are in the same cluster.
  - $S_{ij}$ is not affected by “label-switching” (cluster labels may change during MCMC runs) or by the number of clusters varying across iterations.

• Compute a (hard) clustering estimate using Binder’s loss function (total number of disagreements between the estimated and true clustering among all pairs of actors).

• The function minbinder from the R package mcclust can be used to find the clustering $c^* = [c^*_1, \dots, c^*_n]$ that minimizes the posterior expectation of this loss. The posterior expected loss can be written as

\[ \sum_{i<j} \left| \mathbf{1}\{c^*_i = c^*_j\} - S_{ij} \right|, \]

where the sum is taken over all possible pairs of actors.
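The two steps above can be sketched in plain Python: estimate $S$ from sampled label vectors, then evaluate the posterior expected Binder loss of candidate clusterings. Restricting the search to the sampled partitions is a simplification made here for brevity, and the toy draws and function names are hypothetical.

```python
def similarity_matrix(samples):
    """S[i][j] = share of MCMC draws in which i and j have the same label."""
    n = len(samples[0])
    S = [[0.0] * n for _ in range(n)]
    for labels in samples:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    S[i][j] += 1.0 / len(samples)
    return S

def binder_loss(c, S):
    """Posterior expected loss: sum over i < j of |1{c_i = c_j} - S_ij|."""
    n = len(c)
    return sum(abs(float(c[i] == c[j]) - S[i][j])
               for i in range(n) for j in range(i + 1, n))

# Toy draws for 4 actors; the second draw relabels the clusters of the first.
samples = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 1]]
S = similarity_matrix(samples)
best = min(samples, key=lambda c: binder_loss(c, S))
```

The first two draws describe the same partition under different labels, and both contribute identically to `S`, which illustrates why the similarity matrix is unaffected by label-switching.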



Karate club network

• This dataset contains 78 undirected friendship links among 34 members. Due to disputes over lesson prices, the club was divided into two factions, led by instructor Mr Hi (actor 1) and president John A. (actor 34).

• The club eventually split into two separate clubs. All members joined the club corresponding to their own faction except actor 9.

• Static model (3 parallel chains, 40,000 iterations each, total time: 172 s). Set $a_\nu = b_\nu = a_\alpha = b_\alpha = 5$ and $\sigma_\theta^2 = \sigma_\beta^2 = 1$.

[Figure: two panels. “Community”: posterior probability of K and density of ν. “Popularity”: posterior probability of L and density of α.]

Figure: Posterior distributions of K, ν, L and α. For ν and α, prior distributions are shown in dotted lines and posterior distributions in solid lines.


Posterior similarity matrices and popularities

[Figure: posterior similarity matrices (heat maps on a scale from 0.0 to 1.0) for the “Community” and “Popularity” clusterings, with actors ordered by cluster.]

[Figure: “Popularity”: posterior mean of $\theta_i$ against each actor’s degree.]

Mr Hi (1), John A. (34) and a few other actors {2, 3, 33} have high popularity, but the rest have much lower activity levels.


Hard clustering estimates

[Figure: two plots of the karate club network, “Community” and “Popularity”, with nodes colored according to the hard clustering estimates.]

Legend (Community): Gp 1: 1.8 (0.27); Gp 3: 1.74 (0.26)

Legend (Popularity): Gp 1: 0.56 (0.15); Gp 2: −0.24 (0.2); Gp 3: −1.51 (0.14)

• Plots of the network: nodes are colored according to the hard clustering estimates. Singletons are not colored. The legends give the posterior means of $\beta^*$ and $\theta^*$, with their standard deviations in brackets, conditional on the clustering structure.

• For communities, Gp 3 is exactly the faction led by John A. (Zachary, 1977), while Gp 1 together with {3} is the faction led by Mr Hi (actor 3 has a posterior probability of 0.4 of being clustered with members of Gp 1 and 0.05 of being clustered with members of Gp 3).



Interpreting results

• If we drop $\{\theta_i\}$ from the static model and consider just the blockmodel, we obtain five clusters: {1}, {3}, {33}, {34} and all other members.
  - The network is split into high-degree nodes and low-degree nodes.
  - This shows the importance of accounting for degree variation in blockmodels (Karrer and Newman, 2011).

  Our static model tries to address these issues using a nonparametric approach via the automatic clustering structures induced by the DP.

• While hard partitions of the network are easier to interpret, the posterior similarity matrices reveal finer details regarding the degree of affiliation of actors towards the clusters that they are assigned to in the hard split.
  - E.g. actor 10 is assigned to the cluster led by John A., but he has a slightly lower posterior probability (≈ 0.5) of being with other members in this cluster than the rest, and also has a posterior probability (≈ 0.2) of being in the same cluster as members of Mr Hi’s faction.



Kapferer’s tailor shop network

• Data on interactions among 39 workers in a tailor shop in Southern Africa, from June 1965 to Feb 1966 (Kapferer, 1972).

• The workers’ duties can be classified into eight categories:

  More prestigious: head tailor, cutter, line 1 tailor, button machiner
  Less prestigious: line 3 tailor, ironer, cotton boy, line 2 tailor

• Focus on symmetric “sociational” networks recorded at two time points:
  1. before an aborted strike,
  2. after a successful strike for higher wages.

• The network at the second time point (223 edges) is denser than the first (158 edges), as the workers strove to be more united (thereby expanding their social relations) in their efforts to change the wage system.



Dynamic model I

• Assume communities remain unchanged and that the emergence or dissolution of ties is due to changes in the activity levels of actors.

• Set $a_\nu = b_\nu = a_\alpha = b_\alpha = 10$ and $\sigma_\theta^2 = \sigma_\beta^2 = 1$ (3 parallel chains, 15,000 iterations each, total time: 139 s).

[Figure: two panels. “Community”: posterior probability of K and density of ν. “Popularity”: posterior probability of L and density of α.]

Figure: Marginal posterior distributions of K, ν, L and α. For ν and α, prior distributions are shown in dotted lines and posterior distributions in solid lines.


Hard clustering estimates

[Figure: four plots of the tailor shop network, “t=1: Community”, “t=2: Community”, “t=1: Popularity” and “t=2: Popularity”, with nodes colored according to the hard clustering estimates. Node legend by job: head tailor (19), cutter (16), line 1 tailor, button machiner, line 3 tailor, ironer, cotton boy, line 2 tailor.]

Legend (Community): Gp 1: 1.65 (0.13); Gp 2: 1.46 (0.31); Gp 3: 1.45 (0.29); Gp 9: 1.55 (0.14)

Legend (Popularity): Gp 1: −1.41 (0.09); Gp 2: −0.46 (0.03); Gp 3: 0.57 (0.08)


Dynamic model I results

• High degree of job homophily in the communities even though the networks are based on casual interactions. Groups 1 and 2: cutter, line 1 and line 2 tailors (more prestigious jobs); Gp 3: line 3 tailors; Gp 9: ironers and cotton boys.

[Figure: left, “Popularity”: count of workers in the low, avg and high popularity clusters at t=1 and t=2. Right: proportion of workers with low, avg and high popularity at t=1 and t=2 for each job (head tailor, cutter, line 1 tailor, button machiner, line 3 tailor, ironer, cotton boy, line 2 tailor).]

• Three popularity clusters, −1.41 (Gp 1), −0.46 (Gp 2) and 0.57 (Gp 3), representing “low”, “average” and “high” popularity.

• The number of workers with low popularity decreased from t = 1 to t = 2, while the number with average or high popularity increased (this reflects the efforts of workers in expanding social ties after the first unsuccessful strike).


Dynamic model I results

• The proportion of workers with low and average popularity remained unchanged over the two time points for the ironers, cotton boys and line 2 tailors (positions with lower prestige).

• Changes in popularity arise mainly from line 1 tailors, button machiners and line 3 tailors.

• These observations are consistent with the analysis of Kapferer (1972), who noted that line 1 tailors made a strong attempt to expand their links after the first unsuccessful strike, as they stood to benefit the most from the change in wage system.


Dynamic model II

• The probability that a tie is formed depends on whether a tie exists at the previous time point as well as the community membership of the nodes and their popularities.

• Consider 3 parallel chains, 15,000 iterations, total runtime: 106 s.

[Figure: five panels showing the posterior probability of K, density of ν, posterior probability of L, density of α and density of η.]

Figure: Marginal posterior distributions of K, ν, L, α and η. For ν and α, prior distributions are shown in dotted lines and posterior distributions in solid lines.

• The posterior mean of η is 0.58 and its posterior mass is concentrated on positive values. This indicates that a tie is likely to persist at the second time point given that it existed at the first time point.


Dynamic model II results

[Figure: four plots of the tailor shop network, “t=1: Community”, “t=2: Community”, “t=1: Popularity” and “t=2: Popularity”, with nodes colored according to the hard clustering estimates. Node legend by job: head tailor (19), cutter (16), line 1 tailor, button machiner, line 3 tailor, ironer, cotton boy, line 2 tailor.]

Legend (Community): Gp 1: 1.33 (0.13); Gp 2: 1.38 (0.39); Gp 4: 1.39 (0.28); Gp 6: 0.54 (0.8); Gp 10: 1.76 (0.15)

Legend (Popularity): Gp 1: −0.81 (0.04); Gp 2: −0.16 (0.05); Gp 3: −1.39 (0.11); Gp 4: 0.74 (0.11); Gp 5: −0.47 (0.1); Gp 6: 0.24 (0.16)


Dynamic model II results

• The communities detected are similar to those of dynamic model I except for changes to individuals {14, 16, 19, 21}.

• The number of popularity clusters increased from three in dynamic model I to six in model II. In model II, the popularity of an actor summarizes his activity level across all time points.

[Figure: two scatter plots of the mean of $\theta_i$ against degree, one for t=1 and one for t=2, with points labelled by actor number.]

Figure: Mean of $\theta_i$ against actor $i$’s degree at t = 1 (left) and t = 2 (right).

• The head tailor (19) and cutter (16) have significantly higher popularity than the rest, followed by actor 24 and the actors in popularity Gp 2 (which includes individuals who play significant roles in the factory’s social relationships).


Conclusion

• We propose a nonparametric Bayesian approach for detecting communities in social networks, using degree-corrected stochastic blockmodels.

• The number of communities and popularity clusters is inferred from the data through use of the Dirichlet process.

• The inferred popularity clusters summarize the popularities of the actors and help in identifying key players in the network.

• Extensions of the static model to dynamic networks.
  - Dynamic model I: studies changes in the activity levels of actors over time.
  - Dynamic model II: measures the persistence of links in the network.

• While Gibbs samplers are feasible for small networks, they do not scale well to large networks, and more efficient methods of estimation, such as variational approximation methods, can be developed.

Tan, L. S. L. and De Iorio, M. (2018). Dynamic degree-corrected blockmodels for social networks: a nonparametric approach. Statistical Modelling.
