DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR...

84
DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness, Ph.D. Supported by the US National Institutes of Health NME Workshop 1

Transcript of DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR...

Page 1: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

DAY 2:

STATISTICAL METHODS FOR NETWORK ANALYSIS

Martina Morris, Ph.D.Steven M. Goodreau, Ph.D.Samuel M. Jenness, Ph.D.

Supported by the US National Institutes of Health

NME Workshop 1

Page 2: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

Today we will cover

NME Workshop 2

Three classes of statistical models for networks

Simple null models

Generative models for static networks

Generative models for dynamic networks

Ending with an example How we can use these models to answer questions about

epidemic dynamics and interventions

morning

afternoon

Page 3: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

Basic null hypothesis statistical tests Does your network differ from a simple random graph?

Simple null models to test against: CUG and BRG

ERGMs : generative models to test for multiple structural properties simultaneously in static networks Components of an ERG model

Interpretation of coefficients

Estimation algorithms (and when they fail)

Model Diagnostics (estimation, and goodness of fit)

Simulation from fitted models

Network data Types

Principles for estimation

Outline for the morning session

NME Workshop 3

This is what we’ll use ERGMs for in Epi modeling

Page 4: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

Note

NME Workshop 4

We’ll cover a lot of ground here some of the material and vocabulary may be unfamiliar

Don’t worry if you don’t understand everything Focus on getting the big picture, not the details

EpiModel puts a lot of this behind the curtain So you don’t have to deal with it, for the most part

The details do matter when you have a problematic model though

And – don’t be afraid to ask questions

Page 5: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

Getting started

NME Workshop 5

Recall, the two ways to access statnetWeb https://statnet.shinyapps.io/statnetWeb/ library(statnetWeb); run_sw();

Open statnetWeb and load the faux.mesa.high network

Page 6: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

How do you know if your network is significantly different than a simple random graph?

Statistical Testing: Basics6

NME Workshop

Page 7: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

Description vs. Inference in statistics

NME Workshop 7

So far we have been using descriptive statistics to explore our network data Density

Degree and geodesic distributions

Mixing matrices

Component size distributions

Next, we might want to compare these statistics to what we would expect by chance What do we mean “by chance”?

Is there a natural “null hypothesis test” in this context?

Page 8: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

Recap

NME Workshop 8

Does the structure of our social network differ from a simple random graph?

What are some structural differences you can see?

faux.mesa.high network Simple random graph with the same tie probability

Page 9: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

Consider triangles

Suppose kids have a tendency to become friends with their friends’ friends

And this is the only generative process occurring.

Presumably, this would mean that you would observe more triangles than expected by chance in the graph.

How would you test this for a specific network?

NME Workshop 9

Page 10: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

A basic statistical test for triangles

Begin by counting the # triangles in your network

Say this is “T”, your test statistic

Then determine the probability of observing T or more triangles in this network …

And see if it is less than 5%

NME Workshop 10

But … how do you determine that probability?

For that you need a null hypothesis of some sort

Page 11: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

What is the natural null hypothesis?

NME Workshop 11

It turns out there’s more than one …

But they all get used the same way when constructing a statistical test.

To create a sampling distribution consistent with the null

And compare your observed value to that distribution

Page 12: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

Null probability distribution (1)

Unconditional: For a network this size (size = # nodes)

enumerate all possible networks for a fixed number of nodes,

count the number of triangles in each network, and

construct the frequency distribution of these counts.

Where does the number of triangles in your network lie in this distribution?

Top 5%?

Bottom 5%

Near the middle?

NME Workshop 12

Page 13: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

Null probability distribution (1)

For example: Take a network with 3 nodes

How many dyads are there?

32=

3!

2! 3−2 != 3

How many different networks on these dyads? Every dyad has 2 possible values, and there are 3 dyads

So the number of possible networks is: 23 = 8

What is the distribution of triangle counts? 7 networks have 0 triangles

1 network has 1 triangle

So if your network has 1 triangle, what do you think?

NME Workshop 13

0

2

4

6

8

0 1

Triangle distribution for 3 node networks

Page 14: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

Null probability distribution (1)

One problem with the unconditional null distribution

enumerate all possible networks for a fixed number of nodes

this is not so easy in practice

for 4 nodes: # of dyads is 4*3/2 = 6

# of possible networks = 26 = 64

for 10 nodes: # of dyads is 10*9/2 = 45

# of possible networks = 245 ≈ 35 trillion

for 20 nodes: # of dyads is 20*19/2 = 190

# of possible networks = 2190 ≈ 1057

We can solve this problem by sampling from the space of networks.

NME Workshop 14

Page 15: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

Null probability distribution (1)

More important question for the unconditional null distribution

Do you really care about comparing your network to networks with zero ties?

Or all possible ties?

Or does it make more sense to compare your network to other networks with the same number of ties?

Controlling for density, does your network have more or less triangles than expected?

NME Workshop 15

Page 16: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

Null probability distribution (2)

Condition on the density, the number of nodes and links

This is the Conditional Uniform Graph test (CUG)

enumerate all possible networks for a fixed number of nodes and links,

count the number of triangles in each network,

construct the frequency distribution of the counts

compare the value in your network

This also reduces the sample space

but it’s still a lot of graphs… 𝑛2𝑒= 𝑛2! /𝑒! ( 𝑛

2− 𝑒)!

so we will still need to sample from this space in practice

NME Workshop 16

Page 17: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

The CUG is implemented as a permutation test

NME Workshop 17

Since full enumeration is typically not possible

We sample the enumeration space by permutation

Randomly choose a tied dyad, and a dyad without a tie

Permute the tie and the non-tie This preserves the exact density in the network

Count the number of triangles in the new network

Repeat until you have the desired sample size

Permutation tests are often used in statistics When the distribution of a sample statistic is not known

Page 18: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

Null probability distribution (3)

Condition on the probability of a tie

This is the Bernoulli Random Graph test (BRG)

Similar to the CUG, but treats density as a random variable

Implemented via Markov Chain Monte-Carlo (MCMC)

Randomly choose a dyad

Flip a coin with probability(tie) = density of the network This will not preserve the exact density for each network, but will preserve it on average

Repeat many times, then count the number of triangles in the final network

Repeat until you have a sample of the desired size

NME Workshop 18

Page 19: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

Null models in statnetWeb

NME Workshop

Select a summary measure for the observed data

Compare it to the distribution simulated from a null model

In statnetWeb:

We can plot null distribution overlays on degree and geodesic distributions

And plot the CUG and BRG distributions for selected network summary measures

19

Page 20: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

In statnetWeb: Degree distribution

Compare the degree distribution in faux.mesa.high to what we would expect by chance

20NME Workshop

Network Descriptives

Degree Distribution

Select CUG and BRG

null models

What do you see now?

Overlays the mean and 95% confidence intervals from 100 simulations

Page 21: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

Test for the number of isolates

Compare the number of isolates in faux.mesa.high to what we would expect by chance

21NME Workshop

Network Descriptives

MoreNull model

tests

How likely is the number of isolates we observed in our network, under the null model?

Page 22: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

CUG test for triangles

NME Workshop 22

Are there more triangles in the observed network?

Choose the triangle term from the dropdown menu and run 100 simulations to see how our network compares to the two null models

“CUG” and “BRG”

Page 23: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

Indeed…

NME Workshop 23

Page 24: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

Yes the observed triangle count is high

NME Workshop 24

But why?

… a simple null hypothesis test doesn’t provide any insight about that.

Page 25: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

Limitations of simple null hypotheses

If we are only interested in whether the triangle counts are different than expected given the density of the graph

One can use these simple null hypothesis tests

But if we want to understand the underlying generative process, quantify the impact of each process on our network, and control for other network features …

This requires a more general statistical modeling framework

NME Workshop 25

Page 26: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

Can you control for more than just density?

What if you want to test more than one network feature?

And you want a model grounded in generative social theory?

… That’s when you turn to ERGMs

Statistical Testing: Beyond Basics26

NME Workshop

Page 27: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

Motivation

NME Workshop 27

Why are there so many more triangles?

What do you see when color-coding the nodes by their attributes?

faux.mesa.high network Simple random graph with the same tie probability

Page 28: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

(At least) two theories about the process that generates triangles:

1. Homophily: People tend to chose friends who are like them, in terms of

grade, race, etc. (“birds of a feather”), triad closure is a by-product

2. Transitivity: People who have friends in common tend to become friends

(“friend of a friend”), triad closure is the key process

So, for three actors in

the same grade

A cycle-closing tie may

form due to transitivityBut it may be due

instead to homophily

Friend of a friend, or birds of a feather?

NME Workshop 28

Page 29: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

But not completely. Any tie may be classified by whether it is:

The cells represent how the processes jointly influence that tie, so the distribution of ties in this table is informative.

This suggests we should be able to disentangle the two processes statistically

Transitivity and homophily are confounded

NME Workshop 29

Within Grade:

Triangle forming:

Yes No

Yes Both Homophily

No Transitivity Neither

partially

Page 30: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

ERGMs: Basic idea

NME Workshop 30

We want to model the probability of a tie as a function of:

Nodal attributes (that influence degree and mixing)

The propensity for certain “configurations” (like triangles)

The dyads may be dependent

Nodal attribute effects do not induce dyad dependence

But triad closure does

So we model the joint distribution directly

Page 31: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

where: g(y) = vector of network statistics

= vector of model parameters

k( ) = numerator summed over all possible networks on node set y

Probability of observing a graph (set of relationships) y on a fixed set of nodes:

• Exponential family model• Well understood statistical properties (≠ well understood models)• Very general and flexible

Exponential Random Graph Model (ERGM)

NME Workshop 31

𝑃 𝑌 = 𝑦 ) =exp(𝜽′𝒈 𝒚 )

𝑘( )

Page 32: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

Probability of observing a graph (set of relationships) y on a fixed set of nodes:

If you’re not familiar with this kind of compact vector notation, the numerator is just:

exp(𝜽1𝑥1 + 𝜽2𝑥2 +⋯+ 𝜽𝑝𝑥𝑝)

Kind of like a linear model, but a bit different (watch out for this later)

Exponential Random Graph Model (ERGM)

NME Workshop 32

𝑃 𝑌 = 𝑦 ) =exp(𝜽′𝒈 𝒚 )

𝑘( )

Page 33: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

where 𝝏 𝒈 𝒚 represents the change in 𝒈 𝒚 when Yij is toggled between 0 and 1

The conditional odds of a tie

NME Workshop 33

can be re-expressed as𝑃 𝑌 = 𝑦 ) =exp(𝜽′𝒈 𝒚 )

𝑘( )

𝑙𝑜𝑔𝑖𝑡 𝑃 𝑌𝑖𝑗 = 1 rest of the graph ) = 𝑙𝑜𝑔𝑃 𝑌𝑖𝑗 = 1 rest of the graph )

𝑃 𝑌𝑖𝑗 = 0 rest of the graph )

= 𝜽′𝝏 𝒈 𝒚

The probability of the graph

The conditional log odds of a specific tie

This is an auto logistic regression (auto because of the possible dependence)

Page 34: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

ERGM specification: 𝜽′𝒈 𝒚

NME Workshop 34

The 𝒈 𝒚 terms in the model are summary “network statistics”

Counts of network configurations, for example:1. Edges: 𝑦𝑖𝑗

2. Within-group ties: 𝑦𝑖𝑗𝐼(𝑖 ∈ 𝐶, 𝑗 ∈ 𝐶)

3. 2-stars: 𝑦𝑖𝑗𝑦𝑖𝑘

4. 3-cycles: 𝑦𝑖𝑗𝑦𝑖𝑘𝑦𝑗𝑘

A key distinction in the types of terms: Dyad independent (1 & 2 are examples)

Dyad dependent (3 & 4 are examples)

Page 35: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

ERGM specification: 𝜽′𝒈 𝒚

NME Workshop 35

Model specification involves:

1. Choosing the set of network statistics 𝒈 𝒚

From minimal : # of edges

To saturated: one term for every dyad in the network

NB: statnetWeb allows you to choose from the list of terms and retrieve documentation for each one

2. Choosing “homogeneity constraints” on the parameter , for example, with edges:

all homogeneous

group specific (e.g., sex or age specific )

dyad specific

Page 36: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

Let’s explore the Florentine marriage network

… to StatnetWeb36

NME Workshop

Page 37: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

Flomarriage: Bernoulli Model

Load the flomarriage network

Network of marriage ties between families in Renaissance Florence

On the Fit Model page, look up the documentation on the edges term

37NME Workshop

Page 38: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

Flomarriage: Bernoulli Model

Add edges to the ergm formula

Fit the model

What does this model imply? Homogeneous edge probability

Every tie is equally likely

Not a very interesting model

38NME Workshop

Step 1

Step 2

Page 39: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

Interpreting the coefficients

The log-odds of any tie existing is:

Corresponding probability:

39NME Workshop

= −1.609 × change in # ties

= −1.609 × 1

=exp −1.609

1 + exp −1.609

= 0.1667

You can confirm that this is the density of the network

𝜃 = 𝑙𝑛𝑝

(1 − 𝑝)

𝑝 =𝑒𝜃

1 + 𝑒𝜃

Page 40: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

Flomarriage: Triad Formation

NME Workshop 40

The “triangle” term is a measure of clustering

Read the documentation for the triangle term

Fit the model edges + triangle

Hint: you can just add the triangle term if edges is already in your formula

Then click Fit Model

Triangle is a dyad dependent term, so the estimation algorithm changes to MCMC (more on this later)

Page 41: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

Flomarriage: Triad Formation

NME Workshop 41

Now how to interpret the coefficients?

Conditional log-odds of two actors having a tie:

(−1.68 × change in the # of ties) + (0.16 × change in # of triangles)

always=1 how many triangles can one tie change?

For a tie that will create zero triangles −1.68 + 0 = −1.68

One triangle −1.68 + (0.16 × 1) = −1.52

Two triangles −1.68 + (0.16 × 2) = −1.36

Note, not significant

Still unlikely, but a bit less so

Page 42: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

Flomarriage: Nodal covariates

NME Workshop 42

What do you notice?

We can test whether edge probabilities are a function of wealth

This is a quantitative nodal attribute, so we use the ergm term “nodecov”

flomarriage sized by wealth

Page 43: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

Flomarriage: Nodal covariates

NME Workshop 43

Reset the ergm formula and fit the following model:

There is a significant positive wealth effect on the odds of a tie

What does the positive coefficient mean?

Not that there is homophily by wealth

Just that wealthy nodes have more ties

Note that the wealth effect operates on both nodes in a dyad.

Page 44: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

Flomarriage: Nodal covariates

NME Workshop 44

The conditional log-odds of a tie between two actors is:

For a tie between two nodes with minimum wealth (3)

For a tie between two nodes with maximum wealth (146)

For a tie between nodes with maximum and minimum wealth

Note: To specify homophily on wealth, you would use the ergm-term absdiff

−2.59 × change in # ties + 0.01 × wealth of node 1 + 0.01 × wealth of node 2

−2.59 + 0.01 × 3 + 3 = −2.53

−2.59 + 0.01 × 146 + 146 = 0.33

−2.59 + 0.01 × 146 + 3 = −1.1

Page 45: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

Estimation (in one slide)

NME Workshop 45

There is no “closed form” or analytic solution for the estimated coefficients (as there is in OLS: 𝛽 = 𝑋′𝑋 −1(𝑋′𝑌))

Instead, we rely on a defining property of Maximum Likelihood Estimates (MLEs) for exponential family models

At the MLE of the coefficients:

expected values of the statistics under the model = the observed statistics

And we find these MLEs using an iterative search algorithm

A “Markov Chain Monte Carlo” (MCMC) algorithm Start with some initial θ values, simulate a sample of networks from those values

Compare the means of the simulated statistics to the observed values

Update the values of θ based on the deviations

Repeat until the (expected – observed) < epsilon

Page 46: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

Estimation (ok, I needed 2 slides)

NME Workshop 46

What does it mean to “simulate networks from those values”?

Pick a dyad at random

Toss a coin to set the tie status

The probability of the tie is determined by the model

And the details of the MCMC sampling algorithm (Gibbs, Metropolis, Metropolis-Hastings)

Repeat (many many many times)

This produces a Markov Chain of networks

Sample from this chain, every 1000th element (say)

Calculate the mean of the model statistics from this sample

And compare the this mean to the observed network statistics

Page 47: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

Computationally intensive estimation

NME Workshop 47

Has been key to statistical estimation of complex (i.e., realistic and interesting) models for dependent data

And to the emergence of the field of “data science”

In most cases, it works really well

And there is lots of mathematical theory proving it has good convergence properties

… but, it can run into trouble

especially if the model you’re trying to fit is not a good one for the observed network

Page 48: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

Model Degeneracy

NME Workshop 48

Models with dyad dependent terms can behave differently than we expect

They look simple, almost like logistic regression

But they represent effects that cascade through a network via a chain of dependence (this is the “watch out” from earlier)

Homogeneous triangle and k-star terms turn out to be some of the worst offenders

Page 49: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

Model Degeneracy

NME Workshop 49

Technical Definition:

When a model places almost all probability on a small number of uninteresting graphs

Most common “uninteresting” graphs: Complete (all links exist)

Empty

Model degeneracy = misspecification The model you specified would almost never produce the network you

observed

Page 50: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

Model Degeneracy

NME Workshop 50

Switch to the faux.mesa.high network

Fit a model with: edges + triangle

What happens?

Trying to fit this model, the algorithm heads off into networks that are much more dense than the observed network.

What does this mean? That this model would not have produced this network, for any combination of parameter estimates for the two terms

i.e., this is a model misspecification problem

Page 51: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

Degeneracy Plot (for the 2 star model)

NME Workshop 51

Only the white area has networks with some interesting variation

The dark areas are complete graphs, or empty graphs (+/-1 or 2 edges)

This model does not produce many useful networksFrom Mark Handcock’s 2003 tech report:

https://www.csss.washington.edu/Papers/2003/wp39.pdf

Page 52: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

Solution: better network statistics

NME Workshop 52

Old statistic: t(x) = # of triangles in the graph

Here, every additional 3-cycle has the same impact,

New statistic: Set declining marginal returns for each additional 3-cycle involving the

same edge

The specific function we place on this shared partner distribution involves a geometric weighting

Hence the name: geometrically weighted edge-wise shared partners A.k.a. GWESP

The parameter that specifies the rate of decline in marginal returns is α

The smaller the α, the more rapid the decline

Page 53: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

Solution: better network statistics

NME Workshop 53

spi = # of edges with i shared partners𝑔𝑤𝑒𝑠𝑝 = 𝑒α

𝑖=1

𝑛−2

1 − 1 − 𝑒−α 𝑖 𝑠𝑝𝑖

This configuration contains:• 1 edge with 3 shared partners • 6 edges with 1 shared partner

α GWESP(α)

0 𝑒0 1 − 1 − 𝑒−0 1 × 6 + 𝑒0 1 − 1 − 𝑒−0 3 × 1 = 7

0.5 𝑒0.5 1 − 1 − 𝑒−0.5 1 × 6 + 𝑒0.5 1 − 1 − 𝑒−0.5 3 × 1 = 7.55

1 𝑒1 1 − 1 − 𝑒−1 1 × 6 + 𝑒1 1 − 1 − 𝑒−1 3 × 1 = 8.03

The # of edges with 1+ shared partners

Page 54: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

Solution: better network statistics

NME Workshop 54

spi = # of edges with i shared partners𝑔𝑤𝑒𝑠𝑝 = 𝑒α

𝑖=1

𝑛−2

1 − 1 − 𝑒−α 𝑖 𝑠𝑝𝑖

Count of edges in each triangle (i.e. # of triangles x 3)

Count of edges in at least one triangle (because only an edge’s first triangle counts)

Page 55: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

Adding a gwesp term to the faux.mesa.high model

And conducting model assessments

… to StatnetWeb55

NME Workshop

Page 56: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

Fitting and diagnosing a model

NME Workshop 56

Convergence is the first assessment

Dyad independent models will always converge

Dyad dependent models may or may not

Next step depends on the model:

Dyad independent Dyad dependent

Goodness of fit assessment: GOF plots

Convergence assessment: MCMC diagnostics

Page 57: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

What are MCMC Diagnostics?

NME Workshop 57

MCMC Diagnostics tell us if the estimation algorithm is mixing well

These look pretty good

The traceplots on the left show random walks around the target value (a bit of correlation in the sampled networks, but not enough to cause concern)

The distribution of sampled statistics on the right is centered on the target values

These are taken from the penultimate MCMC chain, which is stored in the ergmoutput object

Page 58: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

Goodness of Fit (GOF)

NME Workshop 58

Traditional GOF stats can be used AIC, BIC are included in the model summary

We also take another approach

We are interested in how well we fit aggregate properties of the network structure that we did not include as model terms

This helps to identify what the model gets wrong

We use 3 “higher order” statistics:

Degree distribution

Shared partner distribution (non-parametric) (local clustering)

Geodesic distance distribution (global clustering)

Page 59: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

DATA

MODEL

ESTIMATEDCOEFFICIENTS

SIMULATED DATA(draws from the prob. dist.)

HIGHER ORDERGRAPH STATISTICS

OF DATA

HIGHER ORDERGRAPH STATISTICS

OF SIMULATED DATA

GOODNESS OF FIT OF MODEL

TO DATA

NME Workshop 59

We’ll show how to do this next

Page 60: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

Take a break?60

NME Workshop

Page 61: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

So let’s run and compare several models

NME Workshop 61

These will allow us to examine the evidence for homophily vs. transitivity

We’ll assess the convergence of the different models

As well as the goodness of fit

And the implications for the generative process of high school friendship patterns in this network

Page 62: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

Fit the Bernoulli model to faux.mesa.high

62

Estimate, and run the default set of GOF terms for this model:

NME Workshop

faux.mesa.high ~ edges

Is this a dyad independent or dyad dependent model?

Dyad independent models are not fit with MCMC, so we don’t need to check MCMC diagnostics

We can move directly to GOF

Page 63: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

Save the model

NME Workshop 63

This will keep the results so we can compare them later

Page 64: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

Run the GOF for this model

64

Go to the Goodness of Fit tab

NME Workshop

Run the default GOF

This will take a moment because GOF is simulating 100 networks from the model, and calculating the default summary statistics for each one

Page 65: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

Data:

Model: Bernoulli

(i.e. edges only)

Goodness of fit measure 1: degree distribution

NME Workshop 65

Boxplots show 100 simulations from the Bernoulli model

Black line shows the observed data from faux.mesa.high

Page 66: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

Data:

Model: Bernoulli

(i.e. edges only)

This edge has an ESP value of 3

Goodness of fit measure 2: ESP distribution (local clustering)

NME Workshop 66

Page 67: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

Data:

Model: Bernoulli

(i.e. edges only)

A/B have geodesic 2

A/C have geodesic ∞

AB

C

Goodness of fit measure 3: geodesic distribution(global clustering)

NME Workshop 67

Page 68: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

faux.mesa.high ~ edges

degree edgewise shared partner geodesic

Goodness of fit measures assembled

NME Workshop 68

Summary: Not a good fit to any of the aggregate structural properties observed

Page 69: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

Fit a model with gwesp

69

Estimate, save and assess this model:

NME Workshop

faux.mesa.high ~ edges + gwesp(0.25, fixed = TRUE)

Save this model too.

This is a dyad dependent model

It converges (unlike the triangle model)

It is fit with MCMC

So we should check the MCMC diagnostics

Page 70: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

Run the MCMC diagnostics for this model

70

Go to the MCMC diagnostics tab

NME Workshop

Select Model 2

Looks pretty good

Page 71: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

Run the Goodness of Fit

71

edge-wise shared partnersdegree minimum geodesic distance

NME Workshop

faux.mesa.high ~ edges + gwesp(0.25, fixed = TRUE)

Much better, though the ESP distribution fit isn’t great

Page 72: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

And, a quick eyeball test…

NME Workshop 72

So, back to our original question:How much of the clustering is due to homophily, and how much to transitivity?

The global structure looks kinda similar now, ... But something is not right

Observed network Network simulated from model*

* We’ll get to simulation in just a bit

Page 73: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

Test this by comparing four models

Model Network Statistics g(y)

Edges # of edges

Edges + GWESP

(transitivity)

# of edges

weighted shared partners

Edges + Attributes

(homophily)

# of edges

# of edges for each race, sex, grade

# of edges that are within-race, within-grade, within-sex

Edges + Attributes + GWESP

(both)

# of edges

# of edges for each race, sex, grade

# of edges that are within-race, within-grade, within-sex

weighted shared partners

NME Workshop 73

Page 74: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

Fitting and saving models

NME Workshop 74

statnetWeb allows you to save up to 5 models – we’ll fit 4 (you can cut and paste from here to statnetWeb):

1. edgesFit model, save model, reset formula

2. edges + gwesp(0.25, fixed = T)Fit model, save model, reset formula

3. edges + nodefactor("Grade") + nodefactor("Race") + nodefactor("Sex") + nodematch("Grade", diff = T) + nodematch("Race", diff = F) + nodematch("Sex", diff = F) Fit model, save model, reset formula

4. edges + nodefactor("Grade") + nodefactor("Race") + nodefactor("Sex") + nodematch("Grade", diff = TRUE) + nodematch("Race", diff = FALSE) + nodematch("Sex", diff = FALSE) + gwesp(0.25, fixed = TRUE)Fit model, save model

You’ve already fit and saved these

Page 75: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

Model Comparison

NME Workshop 75

Note how the gwespestimate changes from model 2 to 4

About 25% smaller

That’s the impact of controlling for attribute effects, including homophily

Homophilyestimates change also, once you control for transitivity

Page 76: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

GOF comparison for all 4 models:

1. Edges

AIC: 2288

3. Edges + Attributes

AIC: 1809

2. Edges + GWESP

AIC: 1999

4. Edges + Attributes +

GWESP

AIC: 1648

76NME Workshop

This will take some time to run

Page 77: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

Summary

NME Workshop 77

Both transitivity and homophily play a role in clustering these friendships• Homophily accounts for the distribution of path lengths (geodesics)

• Transitivity (Triadic closure)

• Accounts for the large number of isolates

• Captures the local clustering (ESP) reasonably well

• ~25% of the transitivity effect is a by-product of homophily

• The gwesp coefficient drops by ~25% when homophily is added to the model

The GOF suggests the ESP distribution is still not well fit• You could tinker some more, if this was a real research question

• But we’ll move on…

Page 78: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

Simulating networks from the model

NME Workshop 78

A fitted model describes a probability distribution across all networks of this size

The model assigns a probability to every possible network

The model terms and the estimated coefficients make some networks more likely than others

You can simulate networks from this distribution

Using the same MCMC algorithm that was used for estimation

And the simulated networks will be centered on the network statistics in the original observed network

This, of course, is why these models are really useful for network epidemiology

Page 79: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

Simulations

NME Workshop 79

Choose one of the models that you have saved and run 100 simulations with the default control settings

Choose the model on the Simulations page next to “ergm formula”

Do you see autocorrelation in the simulation statistics?

Increase the MCMC interval to 10,000 and re-run the simulations to see how this changes the autocorrelation

Page 80: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

Some common statistics in ergms

NME Workshop 80

Term Formula Unit Value(s)

~edges # of edges edges 8

~nodefactor(“color”) Sum of degrees for nodes of each color nodes/edges* [8,] 6, 2

~nodefactor(“color”,

base=2)

Sum of degrees for nodes of each color nodes/edges* 8, [6,] 2

~nodematch(“color”) # of edges between nodes of same color edges 6

~nodematch(“color”, diff

= TRUE)

# of edges between nodes of same color, for each color

edges 3, 2, 1

undirected network of 10 nodes, including nodal attribute “color”, with values:

1=black,

2=red,

3=green

Page 81: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

Some common statistics in ergms

NME Workshop 81

Term Formula Unit Value(s)

~nodemix(“color”, base=1) # of edges between nodes of each color combo edges [3,] 2, 2, 0, 0, 1

~degree(0) # of nodes of degree 0 nodes 2

~degree(2:5) # of nodes of degrees 2, 3, 4, 5 each nodes 1, 2, 1, 0

~concurrent # of nodes of at least degree 2 nodes 4

undirected network of 10 nodes, including nodal attribute “color”, with values:

1=black,

2=red,

3=green

Page 82: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

Some common statistics in ergms

NME Workshop 82

Term Formula Unit Value(s)

~triangle # of triangles (beware!) triangles 2

~gwesp(0) # of edges in at least one triangle edges 5

~gwesp(∞) # of edges in triangles total (=3 * # triangles) triangles 6

undirected network of 10 nodes, including nodal attribute “color”, with values:

1=black,

2=red,

3=green

Page 83: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

Next

NME Workshop 83

Collecting network data

There are different types

And the type you collect has implications for the statistical methods that you can use

Page 84: DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness,

Selected References

NME Workshop 84

Journal of Statistical Software (v42) 2008 – Eight papers on the statnet software; covering theory, algorithms and usage

Goodreau, S., et al. (2009). "Birds of a Feather, or Friend of a Friend? Using Statistical Network Analysis to Investigate Adolescent Social Networks." Demography 46(1): 103–125.

Handcock MS. (2003) Assessing Degeneracy in Statistical Models of Social Networks. CSSS working paper 39. https://www.csss.washington.edu/research/working-papers/39

Hunter DR. Curved Exponential Family Models for Social Networks. (2007) Social networks. 29(2):216-30. doi: 10.1016/j.socnet.2006.08.005. PubMed PMID: PMC2031865.

Hunter DR, Handcock MS. Inference in Curved Exponential Family Models for Networks. (2006) Journal of Computational and Graphical Statistics. 15(3):565-83. doi: 10.1198/106186006X133069.

Hunter DR, Goodreau SM, Handcock MS. Goodness of fit of social network models. (2008) J Am Stat Assoc. 103(481):248-58. doi: 10.1198/016214507000000446. PubMed PMID: WOS:000254311500029.