Models of networks and mixed membership stochastic blockmodels · Models of networks and mixed...

55
Edo Airoldi Department of Statistics, Harvard Broad Institute of Harvard and MIT Guest lecture for EE380L at UT Austin (Prof. J Ghosh), November 10, 2011 Models of networks and mixed membership stochastic blockmodels

Transcript of Models of networks and mixed membership stochastic blockmodels · Models of networks and mixed...

Page 1: Models of networks and mixed membership stochastic blockmodels · Models of networks and mixed membership stochastic blockmodels. Guest lecture for EE380L (November 2011) 2 Agenda

Edo Airoldi

Department of Statistics, Harvard Broad Institute of Harvard and MIT

Guest lecture for EE380L at UT Austin (Prof. J Ghosh), November 10, 2011

Models of networks and mixed membership stochastic blockmodels

Page 2: Models of networks and mixed membership stochastic blockmodels · Models of networks and mixed membership stochastic blockmodels. Guest lecture for EE380L (November 2011) 2 Agenda

Guest lecture for EE380L (November 2011) 2

Agenda

•  Overview •  Models of networks •  Mixed membership blockmodels

1. Inference 2. Results

•  Concluding remarks

Page 3: Models of networks and mixed membership stochastic blockmodels · Models of networks and mixed membership stochastic blockmodels. Guest lecture for EE380L (November 2011) 2 Agenda

Overview

•  Structured data vs. latent dependence structure Leveraging observed (noisy) structure for estimation As opposed to dim redux, graphical models, sparsity, …

•  Technical challenges Abandon convenient representations of dependence Deal with structured measurements and interfering units

•  This talk Statistical problems when structure is expressed by a graph

3

Page 4: Models of networks and mixed membership stochastic blockmodels · Models of networks and mixed membership stochastic blockmodels. Guest lecture for EE380L (November 2011) 2 Agenda

What is a complex network?

•  Define as a collection of measurements on pairs of sampling units and of unit-specific attributes

•  Traditionally, can only choose 2 out of 3 1.  Large scale, e.g. millions of nodes 2.  Realistic 3.  Completely mapped, or to a large extent

•  Today, a number of systems fall under this data setting that satisfy all three characteristics

Guest lecture for EE380L (November 2011) 4

Page 5: Models of networks and mixed membership stochastic blockmodels · Models of networks and mixed membership stochastic blockmodels. Guest lecture for EE380L (November 2011) 2 Agenda

A few examples

•  Internet, WWW and Wikipedia •  Signaling pathways and metabolic networks •  JStor and scientific literature •  Cell-phone data, e.g. Rwanda, UK, ATT •  Yahoo and other instant messaging systems •  Linked-In and Facebook •  Blogs and Twitter

Guest lecture for EE380L (November 2011) 5

Page 6: Models of networks and mixed membership stochastic blockmodels · Models of networks and mixed membership stochastic blockmodels. Guest lecture for EE380L (November 2011) 2 Agenda

Rich, interdisciplinary literature

•  Historical notes Moreno formalizes the sociogram (’34), Sociometry (‘37)

50s: Sociology (Coleman et al. ‘57), Mathematics (Erdos & Reniy ‘59, Gilbert ‘59), Psychology (Milgram ‘67, ’69)

70s: Statistics (Holland, Leinhardt, Fienberg, Wasserman)

90s: Computer Science (Faloutsos3 ’99), Physics (Huberman & Adamic ‘99, Albert & Barabasi ‘99)

Guest lecture for EE380L (November 2011) 6

Page 7: Models of networks and mixed membership stochastic blockmodels · Models of networks and mixed membership stochastic blockmodels. Guest lecture for EE380L (November 2011) 2 Agenda

Statistical issues in network analysis

•  Representation and compressed sensing How to smoothly represent the space of all graph structures? Motifs, metrics, spectral, …, semi-parametric

•  Population models Sample size? Notions of variability? (See survey paper)

•  Diffusion of information on a network How to infer who talks to whom from aggregate traffic?

Guest lecture for EE380L (November 2011) 7

Page 8: Models of networks and mixed membership stochastic blockmodels · Models of networks and mixed membership stochastic blockmodels. Guest lecture for EE380L (November 2011) 2 Agenda

Statistical issues in network analysis

•  Confidence sets, tests, GoF, model selection How to establish confidence sets for network structure? The Newman-Girvan modularity score is inconsistent

•  Inference from a sample CDC sponsored more than 90 studies to date using RDS Are network sampling designs ignorable? No.

•  Causal inference with interference How to separate peer-influence effects from homophily?

Guest lecture for EE380L (November 2011) 8

Page 9: Models of networks and mixed membership stochastic blockmodels · Models of networks and mixed membership stochastic blockmodels. Guest lecture for EE380L (November 2011) 2 Agenda

Some details to think about

•  Easy to measure things. Hard to pose questions. May not really know what any node or link means.

•  What does Yij=0 mean?

•  Valued measurements and censoring.

•  Notion of variability. (sample size, populations)

•  Global properties must be non-trivial outcomes of the composition of local properties and structures

Guest lecture for EE380L (November 2011) 9

Page 10: Models of networks and mixed membership stochastic blockmodels · Models of networks and mixed membership stochastic blockmodels. Guest lecture for EE380L (November 2011) 2 Agenda

Guest lecture for EE380L (November 2011) 10

Agenda

•  Overview •  Models of networks •  Mixed membership blockmodels

1. Inference 2. Results

•  Concluding remarks

Page 11: Models of networks and mixed membership stochastic blockmodels · Models of networks and mixed membership stochastic blockmodels. Guest lecture for EE380L (November 2011) 2 Agenda

Network modeling 101

•  Graphs or networks?

•  Usually a graph is defined as, G = (V,E)

•  For the purpose of this seminar, G = (1:N,YN✕N)

•  Complex networks, G = (1:N,YN✕N,XN✕P)

•  Random graphs via P(G|Θ) or P(Y|Θ)

•  Frequentist or Bayes?

Guest lecture for EE380L (November 2011) 11

Page 12: Models of networks and mixed membership stochastic blockmodels · Models of networks and mixed membership stochastic blockmodels. Guest lecture for EE380L (November 2011) 2 Agenda

Erdös-Renyi-Gilbert

•  The most widely known random graph model

•  Binary edges are sampled independently G(N,θ): sample Yij from Bernoulli(θ) for i,j=1..N G(N,M): sample Y from SRS(θ,M)

•  Likelihood for G(N,θ) P(Y|Θ) = Πij θYij (1-θ)(1-Yij)

Guest lecture for EE380L (November 2011) 12

Page 13: Models of networks and mixed membership stochastic blockmodels · Models of networks and mixed membership stochastic blockmodels. Guest lecture for EE380L (November 2011) 2 Agenda

Emergence of the giant component

•  ER studied G(N,M) as θ=M/ increases in [0,1]

•  For a graph with N nodes, θ=1/N is a critical value 1.  If θ<1/N, no connected components of size larger than

O(log N) will exist in the graph, as N↑∞

2.  If θ=1/N, largest connected component of size O(N2/3) will exist in the graph, as N↑∞

3.  If θ>1/N, unique connected component of size O(N) will exist in the graph, as N↑∞. No other components with more than O(log N) will exist, as N↑∞

Guest lecture for EE380L (November 2011) 13

!

N2"

# $ %

& '

Page 14: Models of networks and mixed membership stochastic blockmodels · Models of networks and mixed membership stochastic blockmodels. Guest lecture for EE380L (November 2011) 2 Agenda

0.00 0.01 0.02 0.03 0.04 0.05

020

40

60

80

100

probability of an edge

siz

e o

f la

rgest c-c

om

p (

mean)

5e-05 5e-04 5e-03 5e-02

020

40

60

80

100

probability of an edge

siz

e o

f la

rgest c-c

om

p (

mean)

0.00 0.01 0.02 0.03 0.04 0.05

05

10

15

probability of an edge

siz

e o

f la

rgest c-c

om

p (

st.dev)

5e-05 5e-04 5e-03 5e-02

05

10

15

probability of an edge

siz

e o

f la

rgest c-c

om

p (

st.dev)

0.00 0.01 0.02 0.03 0.04 0.05

0.0

0.2

0.4

0.6

0.8

1.0

probability of an edge

pro

b o

f gia

nt com

p (

mean)

5e-05 5e-04 5e-03 5e-02

0.0

0.2

0.4

0.6

0.8

1.0

probability of an edge

pro

b o

f gia

nt com

p (

mean)

0.00 0.01 0.02 0.03 0.04 0.05

0.0

0.1

0.2

0.3

0.4

0.5

probability of an edge

pro

b o

f gia

nt com

p (

st.dev)

5e-05 5e-04 5e-03 5e-02

0.0

0.1

0.2

0.3

0.4

0.5

probability of an edge

pro

b o

f gia

nt com

p (

st.dev)

14

Page 15: Models of networks and mixed membership stochastic blockmodels · Models of networks and mixed membership stochastic blockmodels. Guest lecture for EE380L (November 2011) 2 Agenda

p* or ERG models

Pr (Y=y|Θ=θ) = exp{ Σk θkSk(y) + A(θ) }

where Sk(y) counts specific structure k, such as •  edges S1(y) = Σ1≤i≤j≤n yij

•  triangles S3(y) = Σ1≤i≤j≤h≤n yij yih yjh.

Frank & Strauss (JASA, 1986), Snijders et al. (Soc. Met., 2004), Hanneke & Xing (LNCS, 2007)

Guest lecture for EE380L (November 2011) 15

Page 16: Models of networks and mixed membership stochastic blockmodels · Models of networks and mixed membership stochastic blockmodels. Guest lecture for EE380L (November 2011) 2 Agenda

Towards exchangeable graphs

•  Symmetry suggests the nodes should be treated as exchangeable in the following sense

•  A result by Hoover and Aldous: any model that satisfies this condition for any N is of the form

for ui,uj i.i.d. and εij i.i.d node/pair-specific effects

Guest lecture for EE380L (November 2011) 16

Page 17: Models of networks and mixed membership stochastic blockmodels · Models of networks and mixed membership stochastic blockmodels. Guest lecture for EE380L (November 2011) 2 Agenda

Exchangeable graph models

•  Alternative specifications of h(µ,ui,uj,εij) lead to different models. With some generality

P(Yij=1|µ,ui,uj,εij) = h’(µ + α(ui,uj) + εij) = θij

•  Likelihood P(Y|c) = ∫Θ P(Θ|c) ⋅ Πij θij

Yij (1-θij)(1-Yij) dθij

Guest lecture for EE380L (November 2011) 17

Page 18: Models of networks and mixed membership stochastic blockmodels · Models of networks and mixed membership stochastic blockmodels. Guest lecture for EE380L (November 2011) 2 Agenda

Guest lecture for EE380L (November 2011) 18

Approach

•  Issues: scalability, global vs. local perspectives

Data

Probabilistic Hierarchical Models

Bayesian Posterior Inference Hidden Mechanism

(Statistician)

Domain Knowledge and Hypotheses

(Domain expert)

Page 19: Models of networks and mixed membership stochastic blockmodels · Models of networks and mixed membership stochastic blockmodels. Guest lecture for EE380L (November 2011) 2 Agenda

Three basic models

•  Latent space model α(ui,uj) = -|ui-uj|; ui real vectors, for i=1…N

•  Latent eigenmodel α(ui,uj) = ui

’Λuj; ui real vectors, for i=1…N; Λ diag. K×K

•  Latent class model α(ui,uj) = Bui,uj; ui =1…K, for i=1…N; B symm. K×K

Guest lecture for EE380L (November 2011) 19

Page 20: Models of networks and mixed membership stochastic blockmodels · Models of networks and mixed membership stochastic blockmodels. Guest lecture for EE380L (November 2011) 2 Agenda

Latent space models

log-odds (Yij=1|ui,uj,µ) = µ – |ui–uj| = ηij

where ui is a point in Rk, for all nodes i in N.

Idea: close points in Rk are likely to be connected.

Here uis are constants; θij = [1+exp{–ηij}]-1 and likelihood is P(Y|U,µ) = Σij [ηijYij – log(1+exp{ηij}) ]

Hoff et at. (JASA, 2002), Handcock et al. (JRSS/A, 2007), Krivitsky et al. (Soc. Net., 2009)

Guest lecture for EE380L (November 2011) 20

Page 21: Models of networks and mixed membership stochastic blockmodels · Models of networks and mixed membership stochastic blockmodels. Guest lecture for EE380L (November 2011) 2 Agenda

Guest lecture for EE380L (November 2011) 21

Shortcomings so far

•  ERG models (Wasserman et al., Handcock et al.)

Summarize graphs using exp model on motif-counts Issues: cannot offer node-specific predictions, ..

•  Latent space models (Hoff et al. 02; Hoff 03)

Project adjacency matrix onto a latent RK via logistic regression; closer points increase chance of connectivity

Issues: MCMC does not scale, hard identifiability problem, no clustering effect

Page 22: Models of networks and mixed membership stochastic blockmodels · Models of networks and mixed membership stochastic blockmodels. Guest lecture for EE380L (November 2011) 2 Agenda

Model specifications

πi ~ Dirichlet (α), for all nodes i=1..N yij|πi,πj ~ Bernoulli (πi`B πj), for all pairs (i,j)

where πi is a point in the K-simplex, and B is K×K.

Nodes in the same block share similar connectivity.

Loraine & White (JMS, 1971), Fienberg et al. (JASA, 1985), Nowicki & Snijders (JASA, 2001), Airoldi et al. (JMLR, 2008)

22

1 2

3

4 5

6

8

9 7

Page 23: Models of networks and mixed membership stochastic blockmodels · Models of networks and mixed membership stochastic blockmodels. Guest lecture for EE380L (November 2011) 2 Agenda

Guest lecture for EE380L (November 2011) 23

Agenda

•  Overview •  Models of networks •  Mixed membership blockmodels

1. Inference 2. Results 3. Remarks

•  Concluding remarks

Page 24: Models of networks and mixed membership stochastic blockmodels · Models of networks and mixed membership stochastic blockmodels. Guest lecture for EE380L (November 2011) 2 Agenda

The cell

Guest lecture for EE380L (November 2011) 24

(Source: fig.cox.miami.edu)

Page 25: Models of networks and mixed membership stochastic blockmodels · Models of networks and mixed membership stochastic blockmodels. Guest lecture for EE380L (November 2011) 2 Agenda

Functions & mechanisms

•  Cytoplasm is a busy place Proteins, small molecules

•  Taxonomy of functions Gene Ontology annotations (e.g., cell division)

•  Mechanisms Pathways as complex graphs (e.g., carbon metabolism)

Guest lecture for EE380L (November 2011) 25

(Source: SGD and own work)

Page 26: Models of networks and mixed membership stochastic blockmodels · Models of networks and mixed membership stochastic blockmodels. Guest lecture for EE380L (November 2011) 2 Agenda

26

(Source: Nature, and BMC Bioinformatics)

Domain knowledge

Proteins form stable protein complexes to carry out functions in the cell

Protein interaction data

Page 27: Models of networks and mixed membership stochastic blockmodels · Models of networks and mixed membership stochastic blockmodels. Guest lecture for EE380L (November 2011) 2 Agenda

Guest lecture for EE380L (November 2011) 27

Scientific questions

•  Can interaction motifs: –  indicate proteins’ multifaceted functional role? –  reveal protein complexes and relations among them?

Protein interaction data Functions (GO Slim) Yeast cell

(Source: fig.cox.miami.edu, SGD, and own work)

Page 28: Models of networks and mixed membership stochastic blockmodels · Models of networks and mixed membership stochastic blockmodels. Guest lecture for EE380L (November 2011) 2 Agenda

•  Structural equivalence (Lorrain & White, 1971) –  Nodes with similar connectivity collapsed into a block

•  Instantiated by –  Blockmodel (B) (≈ Nowiki & Snijders, 01, Airoldi et al. 05, 07, 08)

•  Combined with –  Mixed membership (Π) (Airoldi et al. 05, 07, 08)

Guest lecture for EE380L (November 2011) 28

Two modeling ideas 1 2

3

4 5

6

8

9 7

(7,8,9)

(1,2,3) (4,5,6)

9

Page 29: Models of networks and mixed membership stochastic blockmodels · Models of networks and mixed membership stochastic blockmodels. Guest lecture for EE380L (November 2011) 2 Agenda

Guest lecture for EE380L (November 2011) 29

Blockmodel, B

•  Captures salient structure at the block level

•  Connectivity among nodes within the same block (across blocks) is only specified on average

A B C

1.0 0 0.3 A

0.3 1.0 0 B

0 0.3 0 C

C = (7,8,9)

A = (1,2,3) B = (4,5,6)

1 2

3

4 5

6

8

9 7

From

To

Page 30: Models of networks and mixed membership stochastic blockmodels · Models of networks and mixed membership stochastic blockmodels. Guest lecture for EE380L (November 2011) 2 Agenda

Guest lecture for EE380L (November 2011) 30

Mixed membership, Π

•  Nodes can be mapped to multiple blocks

•  Extends the idea of a mixture (i.e., local weights)

•  Node-specific weights useful for prediction

A B C node

1.0 0 0 1

1.0 0 0 2

. . . .

0.1 0.1 0.8 9

1 2

3

4 5

6

8

9 7

9

A B C

Page 31: Models of networks and mixed membership stochastic blockmodels · Models of networks and mixed membership stochastic blockmodels. Guest lecture for EE380L (November 2011) 2 Agenda

Guest lecture for EE380L (November 2011) 31

Model: projecting Y onto B via Π

Mixed Membership

Stochastic Blockmodel

Page 32: Models of networks and mixed membership stochastic blockmodels · Models of networks and mixed membership stochastic blockmodels. Guest lecture for EE380L (November 2011) 2 Agenda

Blockmodel + node-specific memberships

Likelihood

Note: the matrix B has size K✕K Guest lecture for EE380L (November 2011) 32

Model: variant for prediction

Y (n, m) ! Bernoulli (!"!

nB !"m), (n, m) " [1, N ]2

!"n ! Dirichlet (#), n " [1, N ]

!(Y |", B) =!!

"n

p(#$n|")"

nmp(Y (n, m)|#$n,#$m, B) d!

Page 33: Models of networks and mixed membership stochastic blockmodels · Models of networks and mixed membership stochastic blockmodels. Guest lecture for EE380L (November 2011) 2 Agenda

Guest lecture for EE380L (November 2011) 33

Model: variant for de-noising

Blockmodel + relation-specific memberships

Note: the matrix B has size K✕K

!"n ! Dirichlet (#), n " [1, N ]

!znm! ! multinomial (!"n, 1), (n, m) " [1, N ]2

!znm! ! multinomial (!"m, 1), (n, m) " [1, N ]2

Y (n, m) ! Bernoulli (!z!nm"B !znm#), (n, m) " [1, N ]2

Page 34: Models of networks and mixed membership stochastic blockmodels · Models of networks and mixed membership stochastic blockmodels. Guest lecture for EE380L (November 2011) 2 Agenda

Guest lecture for EE380L (November 2011) 34

Agenda

•  Overview •  Models of networks •  Mixed membership blockmodels

1. Inference 2. Results

•  Concluding remarks

Page 35: Models of networks and mixed membership stochastic blockmodels · Models of networks and mixed membership stochastic blockmodels. Guest lecture for EE380L (November 2011) 2 Agenda

Guest lecture for EE380L (November 2011) 35

Revisiting EM

•  Data Y, latent variables X =(Π,Z), and constants Θ =(α,B)

log

Page 36: Models of networks and mixed membership stochastic blockmodels · Models of networks and mixed membership stochastic blockmodels. Guest lecture for EE380L (November 2011) 2 Agenda

q ! q!(X) " p(X | Y ) at !! = !!(Y )

Guest lecture for EE380L (November 2011) 36

Variational EM

•  EM maximizes the lower bound over (q,Θ) •  In EM we set

•  If not feasible, we can posit approximation for q using free parameters Δ ⎯ this is vEM

q = p(X | Y,!)

Page 37: Models of networks and mixed membership stochastic blockmodels · Models of networks and mixed membership stochastic blockmodels. Guest lecture for EE380L (November 2011) 2 Agenda

Eq!

!

log p(Y, X | !) ! log q!(X)"

=: L(q!, !)

Guest lecture for EE380L (November 2011) 37

Variational EM (cont.)

•  Leads to approximate lower bound

•  Iterate

Variational E-step:

M-step:

!! = arg max! L(q!, ")

!! = arg max! L(q"! , !)

Page 38: Models of networks and mixed membership stochastic blockmodels · Models of networks and mixed membership stochastic blockmodels. Guest lecture for EE380L (November 2011) 2 Agenda

38

Nested variational EM

•  Mean field:

Vanilla vEM (Jordan et al. 99) E-step: initialize γ1:N, ϕ1:N,1:N 1. update ϕ1:N,1:N 2. update γ1:N

M-step: update α, B

Nested vEM (Airoldi et al. 05, 08)

E-step: initialize γ1:N loop pairs (n,m) 1. init & optimize ϕn,m 2. partially update γ n,γm

M-step: update α, B

q!(!, Z) =!

n q!"n(!"n) ·

!nm q!#nm

(!znm)

Page 39: Models of networks and mixed membership stochastic blockmodels · Models of networks and mixed membership stochastic blockmodels. Guest lecture for EE380L (November 2011) 2 Agenda

Guest lecture for EE380L (November 2011) 39

Agenda

•  Overview •  Models of networks •  Mixed membership blockmodels

1. Inference 2.  Results

•  Concluding remarks

Page 40: Models of networks and mixed membership stochastic blockmodels · Models of networks and mixed membership stochastic blockmodels. Guest lecture for EE380L (November 2011) 2 Agenda

Guest lecture for EE380L (November 2011) 40

•  Functional content in P(Y | )

•  Model reveals information about functional modules (cross-validation: K*=50; gold standard in Myers et al. 06)

Evaluation: recovering function

3

2

1

Prec

isio

n

Recall

Page 41: Models of networks and mixed membership stochastic blockmodels · Models of networks and mixed membership stochastic blockmodels. Guest lecture for EE380L (November 2011) 2 Agenda

41

Evaluation: identifying blocks

•  Two model variants capture a different number of functional processes, with equally high accuracy

GO functional processes (Area under the curve, red = high)

2 1 3

Page 42: Models of networks and mixed membership stochastic blockmodels · Models of networks and mixed membership stochastic blockmodels. Guest lecture for EE380L (November 2011) 2 Agenda

42

Evaluation: mixed membership

•  Amount of mixed membership is substantial •  Membership reveals multifaceted functional roles

Mixed membership

Estimated memberships 0 1

Estim

ated

mem

bers

hips

15 high level functions 15 high level functions

NAT-1 GAL-4 NOP-1

NOP-58 MET-31

Page 43: Models of networks and mixed membership stochastic blockmodels · Models of networks and mixed membership stochastic blockmodels. Guest lecture for EE380L (November 2011) 2 Agenda

Guest lecture for EE380L (November 2011) Edo Airoldi

National study on adolescents

•  A friendship network among 69 students in grades 7-12

Original data node-specific relation-specific (prediction) (de-noising)

Page 44: Models of networks and mixed membership stochastic blockmodels · Models of networks and mixed membership stochastic blockmodels. Guest lecture for EE380L (November 2011) 2 Agenda

Columbia University, Nov. 26th, 2007, New York, NY Edo Airoldi

Page 45: Models of networks and mixed membership stochastic blockmodels · Models of networks and mixed membership stochastic blockmodels. Guest lecture for EE380L (November 2011) 2 Agenda

Sampson’s monastery data

•  Multivariate sociometric relations among novices in a NE monastery, over two years.

•  Anthropological observations as ground truth

•  Two factions, plus social outcasts and waverers

•  After two years John and Greg get expelled, most young turks leave and the order dissolves

Guest lecture for EE380L (November 2011) 45

Page 46: Models of networks and mixed membership stochastic blockmodels · Models of networks and mixed membership stochastic blockmodels. Guest lecture for EE380L (November 2011) 2 Agenda

Guest lecture for EE380L (November 2011) 46

Expressing connectivity

•  Two variants provide increasing levels of definition

Original data node-specific relation-specific (prediction) (de-noising)

Page 47: Models of networks and mixed membership stochastic blockmodels · Models of networks and mixed membership stochastic blockmodels. Guest lecture for EE380L (November 2011) 2 Agenda

Social structure: blockmodel

Young Turks

Loyal

Opposition

Outcasts

0.9

0.9 0.5

0.3

0.4

Guest lecture for EE380L (November 2011) 47

Page 48: Models of networks and mixed membership stochastic blockmodels · Models of networks and mixed membership stochastic blockmodels. Guest lecture for EE380L (November 2011) 2 Agenda

Columbia University, Nov. 26th, 2007, New York, NY Edo Airoldi

Social structure: membership

Page 49: Models of networks and mixed membership stochastic blockmodels · Models of networks and mixed membership stochastic blockmodels. Guest lecture for EE380L (November 2011) 2 Agenda

Guest lecture for EE380L (November 2011) 49

Evaluation: nested variational EM

(Simulated data; 300 nodes, 10 blocks)

— Vanilla

— Nested

Variational EM

(Airoldi et al. 07)

(Jordan et al. 99) Hel

d-ou

t log

like

lihoo

d

Run time (seconds)

Page 50: Models of networks and mixed membership stochastic blockmodels · Models of networks and mixed membership stochastic blockmodels. Guest lecture for EE380L (November 2011) 2 Agenda

Guest lecture for EE380L (November 2011) 50

Model extensions

•  Sparsity, general formulation, informative priors and full Bayes (Airoldi, Blei, Fienberg & Xing, 05, 06, 08)

•  Node attributes (Airoldi, Markowetz, Blei & Troyanskaya)

•  Dynamic (Airoldi, Fienberg & Krackhardt, 08)

•  Extensions by others (Hofman & Wiggins 07; Eliassi-Rad, Griffiths & Jordan; Nallapati, Cohen & Lafferty; Frey et al., 06, Chang & Blei)

Page 51: Models of networks and mixed membership stochastic blockmodels · Models of networks and mixed membership stochastic blockmodels. Guest lecture for EE380L (November 2011) 2 Agenda

Guest lecture for EE380L (November 2011) 51

Y

Dynamics of social failure

•  Analysis suggests a theory of social failure in isolated communities. Try longitudinal model

•  Data:

Y

Page 52: Models of networks and mixed membership stochastic blockmodels · Models of networks and mixed membership stochastic blockmodels. Guest lecture for EE380L (November 2011) 2 Agenda

Guest lecture for EE380L (November 2011) 52

Whom do like (epoch 1) Whom do like (epoch 2)

Whom do like (epoch 3)

Page 53: Models of networks and mixed membership stochastic blockmodels · Models of networks and mixed membership stochastic blockmodels. Guest lecture for EE380L (November 2011) 2 Agenda

Guest lecture for EE380L (November 2011) 53

Agenda

•  Overview •  Models of networks •  Mixed membership blockmodels •  Concluding remarks

Page 54: Models of networks and mixed membership stochastic blockmodels · Models of networks and mixed membership stochastic blockmodels. Guest lecture for EE380L (November 2011) 2 Agenda

Take home points

•  Complex networks are an exciting research area that is generating new statistical problems

•  The familiar notions of sampling variability and sampling designs are challenged

•  Potential for impact in the sciences, from biology to communications, and from computational social science to healthcare survey design and analysis

Guest lecture for EE380L (November 2011) 54

Page 55: Models of networks and mixed membership stochastic blockmodels · Models of networks and mixed membership stochastic blockmodels. Guest lecture for EE380L (November 2011) 2 Agenda

Acknowledgements and pointers

CDC, Facebook, Bell Labs. S Fienberg, E Xing, D Blei, B Singer, A Gelman, Z Ghahramani, J Leskovec, J Kleinberg, D Rubin.

1.  Getting started in probabilistic graphical models. Airoldi, PLoS Computational Biology, 2007.

2.  Mixed membership stochastic blockmodels. Airoldi, Blei, Fienberg & Xing, Journal of Machine Learning Research, 2008. (in R: iGraph, LDA)

3.  A survey of statistical network models. Goldenberg, Zheng, Fienberg & Airoldi. Foundations & Trends in Machine Learning, 2009.

4.  Deconvolution of mixing time series on a graph. Blocker & Airoldi. Uncertainty in Artificial Intelligence (UAI), 2011.