Using the Social Network Data From Add Health 2000 Add Health Users Workshop August 1 & 2 Bethesda...

Using the Social Network Data From Add Health

2000 Add Health Users Workshop
August 1 & 2

Bethesda, Maryland

James Moody

• Introduction: What and Why
  • Levels of Network Data
  • Composition & Pattern
  • Networks on both sides of the equation

• Network Data structures
  • Adjacency Matrices
  • Adjacency Lists

• Network Analysis Programs

• Network Data in Add Health
  • In School Friendship Nominations
  • In Home Friendship Nominations

• Constructing Networks
  • Total Networks
  • Local Networks
  • Peer Groups

• Analyses Using Networks
  • Networks as dependent variables
  • Networks as independent variables

Levels of Network Data

[Figure: ego shown at three levels of network data: Best Friends, Local Network, Peer Group.]

The Social Structure of “Countryside” School District

[Figure: network plot, points colored by grade (7th, 8th, 9th, 10th, 11th, 12th).]

The Social Structure of “Countryside” School District

[Figure: network plot, points colored by race (White, Black, Mixed/Other).]

Measuring Network Context
Patterns

Pattern measures capture some feature of the distribution of relations across nodes in the network. These include:

• Density: % of all possible ties actually made
• Reciprocity: likelihood that, given a tie from i to j, there will also be a tie from j to i
• Transitivity: extent to which friends of friends are also friends
• Hierarchy: Is there a status order to nominations? How is it patterned?
• Clustering: Are there significant groups? How so?
• Segregation: Do attributes (such as race) and nominations correspond?
• Distance: How many steps separate the average pair of persons in the school? Is this larger or smaller than expected?
• Block models: What is the implied role structure underlying patterns of relations?

These features (usually) require having nomination data from each person in the network.
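As a rough illustration of the first three pattern measures, the sketch below computes density, reciprocity, and transitivity from a small directed adjacency matrix. It is written in Python rather than SAS, the four-person network is hypothetical, and the specific formulas (for example, transitivity as the share of two-step paths that are closed) are one common operationalization assumed here, not necessarily the one used in SPAN.

```python
def density(adj):
    """Share of all possible directed ties (self-ties excluded) that are present."""
    n = len(adj)
    ties = sum(adj[i][j] for i in range(n) for j in range(n) if i != j)
    return ties / (n * (n - 1))


def reciprocity(adj):
    """Share of ties i -> j for which the tie j -> i is also present."""
    n = len(adj)
    sent = [(i, j) for i in range(n) for j in range(n) if i != j and adj[i][j]]
    returned = sum(1 for i, j in sent if adj[j][i])
    return returned / len(sent) if sent else 0.0


def transitivity(adj):
    """Share of two-step paths i -> j -> k that are closed by a tie i -> k."""
    n = len(adj)
    paths = closed = 0
    for i in range(n):
        for j in range(n):
            if i == j or not adj[i][j]:
                continue
            for k in range(n):
                if k in (i, j) or not adj[j][k]:
                    continue
                paths += 1
                closed += adj[i][k]
    return closed / paths if paths else 0.0


# Hypothetical network: rows are nominators, columns are nominees.
adj = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]

print(density(adj), reciprocity(adj), transitivity(adj))
```

All three functions need the full matrix, which is why these measures usually require nomination data from everyone in the network.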

Measuring Network Context
Composition

Composition measures capture characteristics of the population of people within a given network level. These include:

• Heterogeneity: How dispersed are actors with respect to a given attribute?
• Means: What is the mean GPA of ego’s friends? How likely is it that most of ego’s friends will go to college?
• Dispersion: What is the age-range of people ego hangs out with?

These features can often be measured from the simple ego network.
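A minimal Python sketch of the three composition measures for a single ego network follows. The friend records are hypothetical, and the heterogeneity index used (one minus the sum of squared category shares) is an assumed choice among several possible dispersion indices.

```python
from collections import Counter


def heterogeneity(values):
    """1 - sum of squared category shares: 0 when all alike, larger when dispersed."""
    counts = Counter(values)
    total = len(values)
    return 1 - sum((c / total) ** 2 for c in counts.values())


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def value_range(values):
    """Difference between the largest and smallest value."""
    return max(values) - min(values)


# Hypothetical attributes for the four friends ego nominated.
friends = [
    {"race": "White", "gpa": 3.0, "age": 15},
    {"race": "White", "gpa": 3.5, "age": 16},
    {"race": "Black", "gpa": 2.5, "age": 17},
    {"race": "Black", "gpa": 3.0, "age": 15},
]

race_heterogeneity = heterogeneity([f["race"] for f in friends])
mean_gpa = mean([f["gpa"] for f in friends])
age_range = value_range([f["age"] for f in friends])
```

Only ego's own nominations and the friends' attributes are needed here, which is why these measures can often be built from the simple ego network.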

Analysis with Social Network data

Networks as Dependent Variables
• Interest is in explaining the observed patterns of relations.
• Examples:
  • Why are some schools segregated and others not?
  • What accounts for differences in hierarchy across schools?
  • What accounts for homophily in friendship choice?
• Tools:
  • Descriptive tools to capture properties
  • Standard analysis tools at the level of networks to explain the measures
  • p* and other specialized network statistical and simulation models

Networks as Independent Variables
• Interest is in explaining behavior with network context (peer influence / context models)
• Examples:
  • Is ego’s probability of smoking related to the smoking levels of those he/she hangs out with? (compositional context)
  • Is the transition to first intercourse affected by the peer context?
  • Are isolated students more likely to carry weapons to school than those in dense peer groups? (positional context)
• Tools:
  • Depends on the dependent variable
  • Peer influence models
  • Dyad models
  • Contextual models, with network level as nested context (students within peer groups)


Network Data Structures

[Figure: a directed graph on five nodes (1 to 5), shown alongside its Adjacency Matrix, Arc List, and Node List.]

Arc List
Send  Recv
1     2
1     3
2     4
3     2
4     1
4     2
4     3
4     5
5     1
5     3
5     4
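The same network can be stored in any of the three structures named on this slide. The Python sketch below starts from the arc list (the Send and Recv values read as single digits from the slide, which is an assumption about the garbled extraction) and derives the adjacency matrix and the node list from it. Variable names are illustrative only.

```python
# Arc list: one (sender, receiver) pair per directed tie.
send = [1, 1, 2, 3, 4, 4, 4, 4, 5, 5, 5]
recv = [2, 3, 4, 2, 1, 2, 3, 5, 1, 3, 4]
arcs = list(zip(send, recv))

# Universe of nodes and their row/column positions in the matrix.
nodes = sorted(set(send) | set(recv))
index = {node: pos for pos, node in enumerate(nodes)}

# Adjacency matrix: adj[i][j] = 1 if node i sends a tie to node j.
adj = [[0] * len(nodes) for _ in nodes]
for s, r in arcs:
    adj[index[s]][index[r]] = 1

# Node list (adjacency list): each node followed by the nodes it sends to.
node_list = {node: [r for s, r in arcs if s == node] for node in nodes}

for row in adj:
    print(row)
print(node_list)
```

The arc list and node list grow with the number of ties, while the matrix grows with the square of the number of nodes, which matters for large schools.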

Network Analysis Programs

1) UCINET
• General network analysis program, runs in Windows
• Good for computing measures of network topography for single nets
• Input-output of data is a little clunky, but workable.
• Not optimal for large networks
• Available from:

Analytic [email protected]

2) STRUCTURE
• “A General Purpose Network Analysis Program providing Sociometric Indices, Cliques, Structural and Role Equivalence, Density Tables, Contagion, Autonomy, Power and Equilibria In Multiple Network Systems.”
• DOS interface with somewhat awkward syntax
• Great for role and structural equivalence models
• Manual is a very nice, substantive introduction to network methods
• Available from a link at the INSNA web site:

http://www.heinz.cmu.edu/project/INSNA/soft_inf.html


3) NEGOPY
• Program designed to identify cohesive sub-groups in a network, based on the relative density of ties.
• DOS-based program; data must be in arc-list format
• Moving the results back into an analysis program is difficult.
• Available from:

William D. Richards
http://www.sfu.ca/~richards/Pages/negopy.htm

4) PAJEK
• Program for analyzing and plotting very large networks
• Intuitive Windows interface
• Used for all of the real data plots in this presentation
• Mainly a graphics program, but its analytic capabilities are expanding
• Free
• Available from:


5) SPAN - SAS Programs for Analyzing Networks (Moody, ongoing)
• A collection of IML and Macro programs that allow one to:
  a) create network data structures from the Add Health nominations
  b) import/export data to/from the other network programs
  c) calculate measures of network pattern and composition
  d) analyze network models
• Allows one to work with multiple, large networks
• Easy to move from creating measures to analyzing data
• All of the Add Health data are already in SAS
• Available by sending an email to:

[email protected]

Network Data Collected in Add Health
In-School Network Data

• Complete network data were collected in every school
• Each student was asked to name up to 5 male and 5 female friends
• These data provide the basic information needed to construct network context measures.
• Due to response rates, we computed data on 129 of the 144 total schools.
• Variables are named MF<#>AID for male friends and FF<#>AID for female friends.


Nomination Categories:
• Matchable People Inside Ego’s School or Sister School
  • People who were present that day: ID starts with 9 and is in the sample
  • People who were absent that day: ID starts with 9, but is not in the school sample
• People in ego’s school, but not on the directory: nomination appears as 99999999
• People in ego’s sister school, but not on the directory: nomination appears as 88888888
• People not in ego’s school or the sister school: nomination appears as 77777777
• Other Special Codes: nomination appears as 99959995

Nominator Categories
• Matchable Nominator: person who was on the roster; ID starts with 9.
• Unmatchable Nominator: person who was NOT on the roster; ID starts with 5 or 8.
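The nomination categories above amount to a small lookup rule. A hedged Python sketch follows: the special codes are taken from this slide, while the function name, the category labels, and the sample IDs are hypothetical. IDs are treated as strings, and special codes are checked first because several of them also begin with 9.

```python
# Special nomination codes as listed on the slide.
SPECIAL_CODES = {
    "99999999": "in ego's school, not on directory",
    "88888888": "in sister school, not on directory",
    "77777777": "outside ego's school and sister school",
    "99959995": "other special code",
}


def classify_nomination(code, sampled_ids):
    """Return the nomination category for one nominated ID."""
    code = str(code)
    if code in SPECIAL_CODES:          # must come before the leading-9 test
        return SPECIAL_CODES[code]
    if code.startswith("9"):
        if code in sampled_ids:
            return "matchable, sampled"
        return "matchable, not sampled"
    return "unclassified"


# Hypothetical set of IDs for students who completed the survey.
sampled_ids = {"90000001", "90000002"}

for nominee in ["90000001", "90000003", "99999999", "77777777"]:
    print(nominee, "->", classify_nomination(nominee, sampled_ids))
```

Only nominations classified as matchable and sampled can both send and receive ties in the constructed network.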


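Tie Accounts Table
Rows are nominators; columns are the people nominated.

Nominator: Matchable, Sampled
• To Matchable, Sampled: Full information on this cell.
• To Matchable, Not-Sampled: Will appear as nominations to 9999. Non-matched people can send, but not receive ties.
• To In School, Not On Roster: Will appear as 9999 or 8888 nominations
• To Out of School (special codes): Will appear as 7777 nominations

Nominator: Matchable, Not-Sampled
• To Matchable, Sampled: Missing data
• To Matchable, Not-Sampled: Missing data
• To In School, Not On Roster: Missing data
• To Out of School (special codes): Missing data

Nominator: In School, Not on Roster
• To Matchable, Sampled: Valid nominating data.
• To Matchable, Not-Sampled: 9999s or 8888s
• To In School, Not On Roster: 9999s or 8888s
• To Out of School (special codes): 7777s

Nominator: Out of School
• To Matchable, Sampled: Missing data
• To Matchable, Not-Sampled: Missing data
• To In School, Not On Roster: Missing data
• To Out of School (special codes): Missing data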


Example 1. Ego is a matchable person in the School

[Figure: two panels, True Network and Observed Network, each showing Ego with alters labeled M, Out, and Un.]


Example 2. Ego is not on the school roster

[Figure: two panels, True Network and Observed Network, each showing nodes labeled M and Un.]


Characteristics of the Add Health School Sample

                                   All Schools   Schools w. network data
Sample Characteristics
  Number of schools                144           129
  Number of students               90,118        75,871
School Type
  Public                           89.6%         89.9%
  Private                          10.4          10.1
Grade Range
  Junior High School               40.6%         40.3%
  High School                      43.4          43.4
  7 - 12                           16.1          16.3
Region***
  West                             19.4%         15.5%
  Midwest                          22.9          24.0
  South                            40.9          42.6
  North East                       16.7          17.8
Demographic Characteristics
  % of schools >70% single race    52.7%         55%
  Family SES                       6.03          6.02
Behavioral Characteristics
  Smoke Regularly                  14.4%         14.7%
  Sexually active                  32.3          32.9
  Expect to go to College          76.2          76.3
  Active in school activities

*p<.05, ** p<=.01, ***p<=.001.

Local-Network Characteristics (Std. Dev. in parentheses)

                                            Same Sex                      Cross Sex
                              Total         Male          Female        Male: Female  Female: Male
In-school nominations (a)     5.68 (3.45)   3.08 (1.98)   3.57 (1.74)   2.19 (2.08)   2.54 (1.95)
Out-of-school nominations     1.04 (1.87)   0.42 (0.98)   0.45 (0.93)   0.42 (1.09)   0.78 (1.28)
Local network density (a)     0.18 (0.19)   0.22 (0.24)   0.26 (0.26)   0.19 (0.25)   0.15 (0.23)
Reciprocity rate (b)          0.40 (0.30)   0.40 (0.35)   0.51 (0.34)   0.29 (0.35)   0.27 (0.34)
  7th - 8th grade             0.36 (0.29)   0.38 (0.35)   0.46 (0.33)   0.23 (0.33)   0.20 (0.30)
  9th - 10th grade            0.38 (0.30)   0.39 (0.35)   0.52 (0.34)   0.25 (0.33)   0.26 (0.34)
  11th - 12th grade           0.45 (0.31)   0.43 (0.36)   0.56 (0.34)   0.37 (0.37)   0.32 (0.36)

a) Includes nominations to people not sampled
b) Proportion of ego's nominations that are reciprocated


Network Data Collected in Add Health
In-Home Network Data

• Network data were collected in both the Wave 1 and Wave 2 surveys
• There were two procedures:
  • Saturated Settings
    • Attempted to survey every student from the In-School sample.
    • 2 large schools, and 10 small schools.
    • Was supposed to replicate the in-school design exactly.
  • Unsaturated Settings
    • Each person was only asked to name one other person
• In both cases, the design was not always carried out. As such, some of the students in the saturated settings were allowed to name only one male and one female friend, while some students in the non-saturated settings were asked to nominate a full slate of 5 and 5.


Data Usage Notes:

• Romantic Relation Overlap
For the W1 and W2 friendship data, any friendship that was also a romantic relation was recoded to 55555555, to protect the romantic relation nominations.

• Bad Machine on Wave 2 Data
Data from one school in Wave 2 seem to be corrupted. We have no way to show this for certain, but data from machines 200065 or 200106 appear to be incorrect. We suspect this because almost everyone who used these two machines “nominated” the same person multiple times. This results in one person having an abnormally large in-degree.

• All nomination #s are now valid
Unlike the in-school data, IDs starting with something other than ‘9’ can be nominated.

• Same out-of-sample special codes
All other special codes for these data are the same as in the in-school data.
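One way to screen for the repeated-nomination problem described above is to flag respondents who list the same ID more than once and to count in-degree from distinct nominators only. The Python sketch below illustrates that idea under stated assumptions: the respondent IDs and nominations are hypothetical, and special codes (such as 55555555) are assumed to have been removed beforehand.

```python
from collections import Counter


def duplicate_nominators(nominations):
    """Respondents whose nomination list contains the same ID more than once."""
    return sorted(
        ego for ego, named in nominations.items()
        if len(named) != len(set(named))
    )


def in_degree(nominations):
    """Number of distinct respondents who nominate each person."""
    counts = Counter()
    for named in nominations.values():
        for alter in set(named):      # count each nominator at most once
            counts[alter] += 1
    return counts


# Hypothetical nomination lists keyed by respondent ID.
nominations = {
    "A": ["X", "X", "X"],   # same person named three times
    "B": ["X", "C"],
    "C": ["A"],
}

print(duplicate_nominators(nominations))
print(in_degree(nominations).most_common(3))
```

Comparing this distinct-nominator in-degree with a raw count of nominations shows how much a repeated entry would inflate one person's popularity.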


Descriptive Statistics for Saturated Settings

Constructing Network Measures
Total Network

To construct the social network from the nomination data, we need to integrate each person’s nominations with every other nomination.

Methods:
1) Export the nomination data to construct the network in another program

Most of the other programs require you to pre-process the data a great deal before they can read them. As such, it is usually easier to create the files in SAS first, then bring them into UCINET or a similar program.

2) Construct the network in SAS

The best way to do this is to combine IML and the MACRO language. SAS IML lets you work with matrices in a (fairly) straightforward language, and the SAS MACRO language makes it easy to work with all of the schools at once.

Programs already set up to do this are available in SPAN.

Constructing Network Measures
Adjacency Matrices

The key to analyzing / measuring the total network is constructing either an adjacency matrix or an adjacency list. These data structures allow you to directly identify both the people ego nominates and the people that nominate ego. Thus, the first step in any network analysis will be to construct the adjacency matrix.

To do this you need to:
1) Identify the universe of possible people in the network. This is usually the same as the set of people that you have sampled. However, if you want to include ties to non-sampled people you may make the universe include all people named by anyone.

2) Create a blank matrix with n rows and n columns.

3) Loop over all respondents, placing a value in the column that corresponds to each person they nominate. This can be binary (named or not) or valued (number of activities they do with alter).
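The three steps above can be sketched in a few lines of Python (shown here instead of SAS IML purely for illustration). The function name, the `include_nonsampled` switch, and the respondent IDs are hypothetical, the matrix is binary, and special nomination codes are assumed to have been dropped already.

```python
def build_adjacency(nominations, include_nonsampled=False):
    """Build a binary adjacency matrix from respondent -> nominee lists."""
    # Step 1: identify the universe of people in the network.
    universe = set(nominations)
    if include_nonsampled:
        for named in nominations.values():
            universe.update(named)
    ids = sorted(universe)
    position = {pid: k for k, pid in enumerate(ids)}

    # Step 2: create a blank n x n matrix.
    adj = [[0] * len(ids) for _ in ids]

    # Step 3: loop over respondents and mark each person they nominate.
    for ego, named in nominations.items():
        for alter in named:
            if alter in position and alter != ego:
                adj[position[ego]][position[alter]] = 1
    return ids, adj


# Hypothetical nomination data; 999 was named but is not a respondent.
nominations = {101: [102, 103], 102: [101], 103: [102, 999]}

ids, adj = build_adjacency(nominations)
ids_all, adj_all = build_adjacency(nominations, include_nonsampled=True)
```

Row sums of the result give the number of ties each person sends and column sums give the number received; a non-sampled person such as 999 can have a non-zero column but always has an empty row.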

Constructing Network Measures
Total Network

Data for 12th grade males in a small school.

Constructing Network Measures
Total Network: Program for creating a network and exporting it to PAJEK

    proc iml;
      /* load the pre-programmed SPAN modules */
      %include 'c:\moody\sas\programs\modules\adj.mod';
      %include 'c:\moody\sas\programs\modules\pajwrite.mod';
      %include 'c:\moody\sas\programs\modules\pajpart.mod';

      /* read respondent ids and male-friend nominations */
      use work.d;
      read all var{aidr} into id;
      read all var{mf1aid mf2aid mf3aid mf4aid mf5aid} into noms;

      adjmat=adj(id,noms);            /* adj(*) is a pre-programmed module */
      adj_id=adjmat[,1];              /* first column holds the ids */

      /* identify people who are also in the sub-sample */
      insamp=j(nrow(adj_id),1,0);
      do i=1 to nrow(insamp);
        iloc=loc(id=adj_id[i]);
        if type(iloc)='N' then do;
          insamp[i]=1;
        end;
        free iloc;
      end;

      adjmat=adjmat[,2:ncol(adjmat)]; /* drop the id column */

      /* write the PAJEK network (.net) and partition (.clu) files */
      file 'c:\moody\conferences\add_health\ptp15_paj.net';
      call pajwrite(adjmat,adj_id,2);
      file 'c:\moody\conferences\add_health\ptp15_paj.clu';
      call pajpart(insamp);
    quit;

Constructing Network Measures
Total Network: Resulting network as displayed by PAJEK.

[Figure: network plot; senior male subsample in red.]

Constructing Network Measures
Local Networks.

•To create and calculate measures based only on the people ego nominates, you can work directly from the nomination list (don’t need to construct the adjacency matrix).

•To create and calculate measures based on the received or reciprocated ties, you need to have a list of people who nominate ego, which is easiest to get given the adjacency matrix.

•To calculate positional measures (density, reciprocity, etc.) all you need is the nomination data.

•To calculate compositional data, you need both the nomination data and matching attribute data.
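To make the distinction between sent, received, and reciprocated ties concrete, here is a small Python sketch that reads all three from an adjacency matrix. The matrix and function name are hypothetical; ego's reciprocity rate is computed as the proportion of ego's nominations that are returned, following the table note earlier in the deck.

```python
def local_ties(adj, ego):
    """Return the sets of people ego names, is named by, and both."""
    n = len(adj)
    sent = {j for j in range(n) if adj[ego][j]}        # ego's row
    received = {i for i in range(n) if adj[i][ego]}    # ego's column
    return sent, received, sent & received


# Hypothetical 4-person network; rows nominate columns.
adj = [
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
]

sent, received, reciprocated = local_ties(adj, ego=0)
reciprocity_rate = len(reciprocated) / len(sent) if sent else None
print(sent, received, reciprocated, reciprocity_rate)
```

The sent set needs only ego's own nomination list, while the received and reciprocated sets need a scan of everyone else's nominations, which is what the adjacency matrix column provides.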

Constructing Network Measures
Local Networks. An example network:

All senior males from a small (n~350) public HS.

[Figure: Adjacency Matrix]


Example 2: Suppose you want to identify ego’s friends, calculate what proportion of ego’s female friends are older than ego, and how many male friends they have (this example came up in a model of fertility behavior).

You need to:
• Construct a dataset with
  (a) ego's id (aid*1, to make it a number instead of a character),
  (b) age of each person,
  (c) the friendship nomination variables.

• Write a macro that loops over each community/school

• For each community, do
  a) Identify ego's friends
  b) Identify their age
  c) Compare it to ego's age
  d) Count it if it is greater than ego's.

An example SAS program to do this is in the handouts.
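The handout program is in SAS; as a rough stand-in, the Python sketch below walks through the same logic for one community. All records are hypothetical, and the second quantity is read here as the number of male friends ego nominates, which is an assumption about the intended measure.

```python
# Hypothetical records for one community: age, sex, and nominated friends.
people = {
    1: {"age": 16, "sex": "F", "friends": [2, 3, 4]},
    2: {"age": 17, "sex": "F", "friends": [1]},
    3: {"age": 15, "sex": "F", "friends": [1, 2]},
    4: {"age": 18, "sex": "M", "friends": [1]},
}


def friend_summary(ego_id, people):
    """Proportion of ego's female friends older than ego, and ego's male friend count."""
    ego = people[ego_id]
    # a) identify ego's friends (skip nominees with no record)
    friends = [people[f] for f in ego["friends"] if f in people]
    # b)-d) compare each female friend's age to ego's and count the older ones
    female = [f for f in friends if f["sex"] == "F"]
    older = [f for f in female if f["age"] > ego["age"]]
    prop_older = len(older) / len(female) if female else None
    n_male = sum(1 for f in friends if f["sex"] == "M")
    return prop_older, n_male


for ego_id in people:
    print(ego_id, friend_summary(ego_id, people))
```

Looping this function over each community's records plays the role of the macro loop over schools.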

Constructing Network Measures
Peer Groups.

Identifying cohesive peer groups requires first specifying what a cohesive peer group is. Potential definitions could be:

a) all people within k steps of ego (extended ego-network)
b) a set of people who interact with each other often (relative density)
c) a set of people with a particular pattern of ties (a closed loop, for example)

UCINET, STRUCTURE, NEGOPY and SPAN all provide methods for identifying cohesive groups. They all differ on the underlying definition of what constitutes a group. The FACTIONS algorithm in UCINET and NEGOPY’s algorithm use relative density. The CROWD algorithm in SPAN uses a combination of relative density and pattern.

Once you have constructed the adjacency matrix, you can export to these other programs fairly easily. However, most of them are QUITE time consuming (FACTIONS, for example, is a bear), so be sure you have identified exactly what you want before you start processing.
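Definition (a), everyone within k steps of ego, is the simplest of the three to compute. A minimal Python sketch using breadth-first search over an adjacency list follows; the friendship data are hypothetical, and the search follows ties in the direction they are sent, which is an assumption (one could also treat ties as undirected). This is not the FACTIONS, NEGOPY, or CROWD algorithm.

```python
from collections import deque


def within_k_steps(adj_list, ego, k):
    """Everyone reachable from ego in at most k steps (ego excluded)."""
    dist = {ego: 0}
    queue = deque([ego])
    while queue:
        node = queue.popleft()
        if dist[node] == k:           # do not expand beyond k steps
            continue
        for alter in adj_list.get(node, []):
            if alter not in dist:
                dist[alter] = dist[node] + 1
                queue.append(alter)
    return {person for person in dist if person != ego}


# Hypothetical nominations: A names B, B names C, C names D.
friends = {"A": ["B"], "B": ["C"], "C": ["D"], "D": []}

print(within_k_steps(friends, "A", 1))
print(within_k_steps(friends, "A", 2))
```

Because each person and tie is visited at most once, this definition is cheap to compute even for large schools, unlike the density-based group searches.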

Constructing Network Measures
Peer Groups Characteristics.

Identifying Cohesive Sub-Groups

• Cohesion: The group is difficult to separate; the connection of the group does not depend on one relation or person.

• Groupness: Relative to the rest of the network, a cohesive sub-group has high relational volume.

• Inclusion: Some people are not in groups while others bridge groups.

Examples of Peer Groups within Add Health High Schools
Crowds Algorithm

Observed Clustering within Adolescent Social Networks

• On average, 65% of a school’s adolescents are in cohesive sub-groups.
• 87% of all relations are within sub-groups.
• The average sub-group has 22 members.
• The average diameter for a sub-group is 3 steps.
• The mean segregation index is .96 (1=Complete, 0=Random)

Network Characteristics of Sub Groups

Observed Clustering within Adolescent Social Networks
Distribution of Characteristic within groups, relative to school distribution

• Grade: 34%
• Race: 65%
• College: 84%
• GPA: 86%
• Activities: 79%
• Smoking: 74%

[Figure panels: Groups 23 & 24; Group 1; Group 15; Group 18]

Constructing Network Data
School Level

[Figure: network of numbered peer groups within a school. Legend: Mostly Seniors, Mostly Juniors, Mostly Sophomores, Mostly Freshmen, Mixed Grades, Directed Arrow.]

Constructing Network Data
School Level

Inter-Group Relations

Analysis Using Network Data
Nets as Dependent Variable: Racial Segregation

[Figure: scatterplot, "Same race friendship preference by racial heterogeneity". X-axis: Racial Heterogeneity; y-axis: Same Race Friendship Preference (b1); labeled point: Countryside h.s.]

Analysis Using Network Data
Nets as Dependent Variable: Modeling the network

[Figure: bar chart, "Network Model Coefficients, In school Networks". Bars: Same Race, SES, GPA, Both Smoke, College, Drinking, Fight, Reciprocity, Same Sex, Same Clubs, Transitivity, Intransitivity, Same Grade.]

Analysis Using Network Data
Nets as Independent Variable: Suicide

[Figure: "Relational Structures and Forms of Suicide". Dimensions: Integration and Regulation, each running from Low to High; forms labeled: Anomic, Altruistic, Egotistic, Fatalistic.]

Measuring Isolation and Anomie.

[Figure: diagram with labels Ego, Alter, Third, School, Peer Anomie, Intransitivity, Isolation.]


Effect of Friendship Structure on Suicidal Thoughts
Net of demographic, family, school, religion and personal characteristics.

                            Males                     Females
                            OR     95% CI             OR     95% CI
Network Isolation           0.665  (0.307 - 1.445)    2.010  (1.073 - 3.765)
Intransitivity Index        0.747  (0.358 - 1.558)    2.198  (1.221 - 3.956)
Friend Attempted Suicide    2.725  (2.187 - 3.395)    2.374  (2.019 - 2.791)
Trouble with People         0.999  (0.912 - 1.095)    1.027  (0.953 - 1.106)


Analysis Using Network Data
Nets as Independent Variable: Weapons

16.3% of American adolescent males and 5.14% of adolescent females report ever bringing weapons to school.

[Figure: bar chart, "By Race and Gender". Y-axis: Percent. Categories: White, Black, Hispanic, Asian, Mix / Other; series: Male, Female. Bar labels as extracted: 15.71, 13.92, 19.45, 11.13, 27.67, 3.23, 10.29, 8.08, 3.52, 8.51.]

[Figure: bar chart, "By position in the school friendship network". Categories: Males, Females; series: Member of a group, Bridges groups, Not a group member. Bar labels as extracted: 14.13, 4.53, 16.2, 4.93, 18.7, 7.43.]

Analysis Using Network Data
Nets as Independent Variable: Sexual Debut

[Figure: bar chart, "The Effect of Peer Group Composition on Sexual Debut*". X-axis: Proportion of High-Risk Adolescents in Peer Group (0%, 1-25%, 26-50%, 51-75%, 76-100%); y-axis: Estimated Probability of Sexual Debut. Group sizes as extracted: N=380, N=1898, N=2026, N=660, N=88.]

*Probability of experiencing sexual debut during the 18 months following the in-school survey. Controlling for age, socio-demographic characteristics, family and peer group characteristics (see table A1, model 6). Bearman and Bruckner, 199

Analysis Using Network Data
Nets as Independent Variable: Pregnancy

[Figure: bar chart, "The Effect of Close Friends' Risk Status on Pregnancy Risk*". X-axis: Proportion of Low-Risk Male and Female Close Friends (no friends, 0%, 1-25%, 26-50%, 51-75%, 76-100%); y-axis: Estimated Probability of Pregnancy. Group sizes as extracted: N=308, N=932, N=100, N=517, N=550, N=427.]

*Probability of experiencing a pregnancy during the 18 months following the in-school survey. Controlling for age, socio-demographic and individual characteristics, family characteristics, and popularity (see table B1, model 3). Bearman and Bruckner, 1999.
