Social Sub-groups Overview Background: How do we characterize the social structure of a ‘group’?...



Social Sub-groups: Overview

Background:
• How do we characterize the social structure of a ‘group’?

Exemplar: Ken Frank and Jeffrey Yasumoto
• A discussion of how action is situated within and between social groups. A nice application of a group-detection algorithm to interesting data.

Linton Freeman
• UC-Irvine. Long-standing editor of the journal Social Networks. Writes today on the theoretical necessities of a ‘group’.

Rise of StatMath: Modularity from Newman, Porter, Mucha

Methods: Algorithms & Measures


Frank & Yasumoto: Action and Structure

“...subgroups may define the essential components that contextualize actors’ social ties and relations.”

The predominance of subgroups in the literature,

“...leaves unanswered how and why rational actors simultaneously sustain their subgroups and the linkages between them.”


They argue that actors seek social capital, defined as the access to resources through social ties, and emphasize two mechanisms:

a) Reciprocity Transactions: Actors seek to build obligations with others, and thereby gain in the ability to extract resources.

b) Enforceable Trust: “Social capital is generated by individual members’ disciplined compliance with group expectations.” An indirect, group-level effect that comes through the judicious non-use of negative action. (p.646)


They expect to find evidence of enforceable trust within social subgroups and evidence of reciprocity between such groups.

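To do so, they must identify primary subgroups within the network, which they do using a density-based criterion. Frank’s algorithm iteratively reassigns nodes to subgroups until the model parameter that captures the concentration of ties within subgroups can no longer be increased. The basic model is:

$\operatorname{logit} P(Y_{ij} = 1) = \theta_0 + \theta_1 \, \mathrm{samegroup}_{ij}$

where $Y_{ij}$ indicates a tie between i and j, and $\mathrm{samegroup}_{ij} = 1$ if i and j are assigned to the same subgroup (0 otherwise).

The goal is to find an assignment of nodes to groups (g) that maximizes fit. This results in a ‘block diagonal’ adjacency matrix, where most of the ties fall along the diagonal.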
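To make the role of the same-group term concrete, here is a minimal Python sketch (not Frank's KliqueFinder code). It assumes a directed tie list and a fixed, hypothetical group assignment, and uses the fact that a logit model with a single 0/1 predictor can be fit in closed form from the within-group and between-group densities. The function name, the toy network and the group labels are all illustrative, and the shortcut only works when both densities lie strictly between 0 and 1.

```python
import math
from itertools import permutations

def fit_same_group_logit(ties, group):
    """Closed-form fit of logit P(tie) = theta0 + theta1 * samegroup."""
    # counts[same] = [number of ties, number of ordered dyads]
    counts = {True: [0, 0], False: [0, 0]}
    for i, j in permutations(group, 2):       # every ordered pair i != j
        same = group[i] == group[j]
        counts[same][1] += 1
        counts[same][0] += (i, j) in ties     # True counts as 1

    def logit(p):
        return math.log(p / (1 - p))

    within = counts[True][0] / counts[True][1]      # in-group density
    between = counts[False][0] / counts[False][1]   # between-group density
    theta0 = logit(between)
    theta1 = logit(within) - logit(between)         # log odds ratio
    return theta0, theta1

# Hypothetical toy network: two groups of three actors.
group = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2}
ties = {("a", "b"), ("b", "a"), ("a", "c"), ("c", "a"), ("b", "c"),
        ("d", "e"), ("e", "d"), ("d", "f"), ("f", "d"), ("e", "f"),
        ("c", "d"), ("d", "c")}

theta0, theta1 = fit_same_group_logit(ties, group)
print(round(theta0, 3), round(theta1, 3))
```

A search routine in this spirit would move nodes between groups and keep the moves that raise theta1; that search loop is deliberately left out here.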


Relations among the French Financial Elite (as drawn by F&Y)

Group-weighted MDS

Within-group relations are weighted more heavily than between-group relations to generate this picture:


Treating all edges equally yields a somewhat less clear pattern:


Relations among the French Financial Elite: Group-to-group density table

# of ties:
       n    A    B    C    D
n      2    5    2    2    1
A      5   42    8    1    4
B      2    8   22    3    1
C      2    1    3    8    1
D      1    4    1    1    4

Density:
       n      A      B      C      D
n  0.167  0.139  0.095  0.167  0.111
A  0.139  0.467  0.127  0.028  0.148
B  0.095  0.127  0.629  0.143  0.067
C  0.167  0.028  0.143  0.667  0.111
D  0.111  0.148  0.067  0.111  0.667


Given a subgroup structure, how do these groups relate to social capital? Enforceable trust: Look for acts of hostility.

A hostile act was any action on the part of one actor that would deprive another actor of access to resources.

Note that these were rare. Only 15 overall, likely indicating some level of cohesion in the system as a whole.

On the whole, they find that -- net of other focal features and direct ties -- being members of the same sub-group lowers the probability of a negative action within the dyad.


They repeat the exercise with positive support.

They find that supportive actions are better predicted by friendship (reciprocity) than by subgroup membership.

They conclude that this supports the hypothesis that “the potential for enforceable trust within subgroups reduces the relative need to pursue social capital through reciprocity transactions within subgroups.” (p.647) Instead, they find that support occurs between subgroups.


Lin Freeman: The sociological concept of “Group”

Focus on collectivities that are: “Relatively small, informal, and involve close personal ties.” What we would call “Primary Groups”

What (network) structure characterizes such a group?

Goal: Identify (a) non-overlapping groups that allow one to (b) identify internal group structure.


Winship’s Model:

1) Assign people to equivalence classes that are hierarchically nested:


To assign people to a class, you must first identify the strength of the relation between each pair. Winship’s model says that you define proximity based on interaction such that:

1) $S_{xy} = 1 \iff x = y$

2) $S_{xy} = S_{yx}$

3) $S_{xy} \ge \min(S_{xz}, S_{yz})$


In words, this means that whatever metric you define, a person is closer to themselves than to anyone else, that the relation be symmetric, and that triads be transitive (which, given the symmetric condition, means that they be complete).

You can then identify partitions by scaling the proximity, such that these three conditions are met.


    A  B  C  D  E  F  G  H  I  J  K
A   .  5  5  4  4  4  4  3  3  3  3
B   5  .  5  4  4  4  4  3  3  3  3
C   5  5  .  4  4  4  4  3  3  3  3
D   4  4  4  .  5  5  5  3  3  3  3
E   4  4  4  5  .  5  5  3  3  3  3
F   4  4  4  5  5  .  5  3  3  3  3
G   4  4  4  5  5  5  .  3  3  3  3
H   3  3  3  3  3  3  3  .  5  5  5
I   3  3  3  3  3  3  3  5  .  5  5
J   3  3  3  3  3  3  3  5  5  .  5
K   3  3  3  3  3  3  3  5  5  5  .


Resulting nested classes:

total
  {A-G}  {H-K}
    {A-C}  {D-G}
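The nesting can be reproduced by cutting the proximity matrix at successive levels. The Python sketch below is an illustration, not Freeman's or Winship's code: it rebuilds the example proximities (5 within the innermost classes, 4 among A-G, 3 otherwise), checks the transitivity condition on every triple, and reads off the classes at each level as connected components of the "proximity at least this level" relation. The helper names `S` and `classes_at` are made up for the example.

```python
from itertools import permutations

labels = "ABCDEFGHIJK"
inner = ["ABC", "DEFG", "HIJK"]   # innermost classes in the example

def S(x, y):
    """Proximity between two distinct people, as in the example matrix."""
    if any(x in b and y in b for b in inner):
        return 5
    if x in "ABCDEFG" and y in "ABCDEFG":
        return 4
    return 3

# Transitivity condition: S(x, z) >= min(S(x, y), S(y, z)) for every triple.
ultrametric = all(S(x, z) >= min(S(x, y), S(y, z))
                  for x, y, z in permutations(labels, 3))

def classes_at(level):
    """Equivalence classes formed by linking pairs with S >= level."""
    remaining, out = list(labels), []
    while remaining:
        seed = remaining.pop(0)
        comp, frontier = {seed}, [seed]
        while frontier:
            x = frontier.pop()
            for y in list(remaining):       # iterate over a copy
                if S(x, y) >= level:
                    remaining.remove(y)
                    comp.add(y)
                    frontier.append(y)
        out.append("".join(sorted(comp)))
    return out

for level in (3, 4, 5):
    print(level, classes_at(level))
```

Because the proximities satisfy the three conditions, every cut level yields a clean partition, and the partitions nest inside one another.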


Granovetter’s Model:

Proceed exactly as in Winship, but treat intransitivity differently when looking at strong or weak ties.

If x and y are strongly connected, and y and z are strongly connected, then x and z should be at least weakly connected.
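A direct way to test this rule is to list the triples that break it. The sketch below is a hypothetical Python helper (the name `g_intransitive_triples` and the toy ties are assumptions): ties are stored as unordered pairs labeled "strong" or "weak", and a triple is flagged when x-y and y-z are both strong but x and z share no tie at all.

```python
from itertools import permutations

def g_intransitive_triples(nodes, strength):
    """Return (x, y, z) with strong x-y and y-z ties but no x-z tie.

    `strength` maps frozenset({u, v}) to "strong" or "weak";
    a missing key means the pair has no tie.
    """
    bad = []
    for x, y, z in permutations(nodes, 3):
        if x < z:   # report each violation once (x-y-z equals z-y-x)
            if (strength.get(frozenset((x, y))) == "strong"
                    and strength.get(frozenset((y, z))) == "strong"
                    and frozenset((x, z)) not in strength):
                bad.append((x, y, z))
    return bad

# Hypothetical toy data.
nodes = ["a", "b", "c", "d"]
strength = {
    frozenset(("a", "b")): "strong",
    frozenset(("b", "c")): "strong",
    frozenset(("a", "c")): "weak",     # a weak tie is enough to close a-b-c
    frozenset(("c", "d")): "strong",
}
violations = g_intransitive_triples(nodes, strength)
print(violations)

# Adding even a weak b-d tie repairs the remaining violation.
repaired = dict(strength)
repaired[frozenset(("b", "d"))] = "weak"
print(g_intransitive_triples(nodes, repaired))
```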


An example of a graph fitting the prohibition against G-intransitive relations.


The Davis - “Old South” Example


The Davis - “Old South” Example: Ties > 2


The Davis - “Old South” Example: Ties > 3


The Davis - “Old South” Example: Ties > 4

Meets the G-transitivity condition


The Davis - “Old South” Example: Ties > 5

Stronger than the G-transitivity condition


Freeman argues that the G-intransitivity model fits the data best for each of the 7 groups he studies.

Substantively, the types of groups this model predicts are very similar to those predicted by the general transitivity model, except re-cast as a valued relation.

Empirically, if you want to identify groups based on levels like this, you can use PAJEK and walk through the model just as we did with “Old South,” or you can use UCI-NET (or program it yourself; it’s not hard).


Methods: How do we identify primary groups in a network?

A) Classic graph theoretical methods: cliques and extensions of cliques
• Cliques
• k-cores
• k-plexes
• Freeman (1992) models
• K-components (we talked about these already)

B) Algorithmic methods: search through a network trying to maximize for a particular pattern (i.e., like Frank & Yasumoto)
• Adjust assignment of actors to groups until a particular pattern of ties (block diagonal, usually) is identified.
• Standard models:
  - Factions (UCI-NET)
  - KliqueFinder (Frank)
  - RNM/CROWDS/JIGGLE (Moody)
  - Principal component analysis (PCA)
  - Flow models (MCL)
  - Modularity maximization routines
  - General distance & clustering methods


Graph Theoretical Models.

Start with a clique. A clique is defined as a maximal subgraph in which every member is connected to every other member. Cliques are collections of nodes where density = 1.0.

Properties of cliques:
• Density: 1.0
• Everyone is connected to n-1 alters
• Distance between every pair is 1
• Ratio of within-group ties to between-group ties is infinite
• All triads are transitive
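For intuition about what a clique routine returns, here is a short Python sketch of the basic Bron-Kerbosch enumeration of maximal cliques. It is offered as an illustration only, not as the routine UCINET uses, and the toy graph (two triangles sharing node c) is hypothetical.

```python
def maximal_cliques(adj):
    """Enumerate maximal cliques; adj maps node -> set of neighbours."""
    found = []

    def expand(r, p, x):
        # r: current clique, p: candidates to add, x: already-explored nodes
        if not p and not x:
            found.append(sorted(r))     # nothing can extend r: it is maximal
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return sorted(found)

# Hypothetical toy graph: triangles a-b-c and c-d-e share node c.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d", "e"},
    "d": {"c", "e"},
    "e": {"c", "d"},
}
cliques = maximal_cliques(adj)
print(cliques)
```

Even in this tiny graph the two cliques overlap at c, which is the practical problem with using cliques as groups.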


In practice, complete cliques are not very useful. They tend to overlap heavily and are limited in their size.

Graph theorists have thus relaxed the complete connectivity requirement (with varying degrees of success). See the Moody & White paper on cohesion for a discussion of many of these attempts.


k-cores: Every person is connected to at least k other people in the core.

Ideally, they would look something like this (here two 3-cores).

However, adding a single tie from A to B would make the whole graph a 3-core.
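The fragility described above is easy to reproduce. This Python sketch (an illustration with hypothetical node names, not a UCINET routine) finds the k-core by repeatedly pruning nodes with fewer than k surviving neighbours, then splits what is left into connected components. Two separate 4-person cliques give two 3-cores; one extra A-B tie fuses them into one.

```python
from itertools import combinations

def make_adj(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def k_core_components(adj, k):
    """Connected pieces of the k-core (nodes with >= k neighbours in the core)."""
    alive = set(adj)
    changed = True
    while changed:                       # prune until nothing more drops out
        changed = False
        for v in list(alive):
            if len(adj[v] & alive) < k:
                alive.remove(v)
                changed = True
    comps, seen = [], set()
    for start in sorted(alive):
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            stack.extend((adj[v] & alive) - comp)
        seen |= comp
        comps.append(sorted(comp))
    return comps

left = ["A", "a1", "a2", "a3"]
right = ["B", "b1", "b2", "b3"]
edges = list(combinations(left, 2)) + list(combinations(right, 2))

separate = k_core_components(make_adj(edges), 3)
bridged = k_core_components(make_adj(edges + [("A", "B")]), 3)
print(len(separate), len(bridged))
```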


Extensions of this idea include:

k-plex: Every member is connected to at least n-k other people in the group, where n is the group size (recall that in a clique everyone is connected to n-1, so this relaxes that condition).

n-clique: Every pair of people is connected by a path of length n or less (recall that in a clique every distance = 1).

n-clan: Same as an n-clique, but all paths must be inside the group.
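The three definitions can be checked mechanically for a proposed set of members. The Python sketch below uses hypothetical helper names and a toy graph, and it tests only the defining condition of each model, not maximality. The example shows a set that passes the n-clique distance test only because its short paths run through an outsider, so it fails the n-clan test.

```python
from collections import deque
from itertools import combinations

def make_adj(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def is_k_plex(adj, members, k):
    """Each member has at least (group size - k) ties inside the group."""
    members = set(members)
    return all(len(adj[v] & members) >= len(members) - k for v in members)

def distance(adj, src, dst, allowed):
    """Shortest path length from src to dst using only nodes in `allowed`."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        v, d = queue.popleft()
        if v == dst:
            return d
        for w in (adj[v] & allowed) - seen:
            seen.add(w)
            queue.append((w, d + 1))
    return float("inf")

def within_n(adj, members, n, inside_only):
    """n-clique condition (inside_only=False) or n-clan condition (True)."""
    members = set(members)
    allowed = members if inside_only else set(adj)
    return all(distance(adj, u, v, allowed) <= n
               for u, v in combinations(members, 2))

# Hypothetical toy graph: a-b-x triangle with c hanging off x.
adj = make_adj([("a", "b"), ("a", "x"), ("b", "x"), ("x", "c")])
group = ["a", "b", "c"]

n_clique_ok = within_n(adj, group, 2, inside_only=False)   # paths may use x
n_clan_ok = within_n(adj, group, 2, inside_only=True)      # paths must stay inside
print(n_clique_ok, n_clan_ok)
print(is_k_plex(adj, ["a", "b", "x", "c"], 2), is_k_plex(adj, ["a", "b", "x", "c"], 3))
```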

I’ve never had much luck with any of these methods empirically. Real data is usually too messy to work well. You should try them, and gain some intuition for yourself. The place to start is in UCINET.


UCINET will compute all of the best-known graph theoretic treatments for subgroups


Consider running different methods on a known group structure:

Cliques

The only way to get something meaningful from this is to analyze the clique overlap matrix using cluster analysis, which is what the “Clique by partion” dataset does.


K-Cores

(See example, but in this case it works very poorly)

n-Clique: (Everyone linked by a path of at most length n)


n-Clan: (Everyone linked by a path of at most length n, but the path is INSIDE the group)

K-plex: (each member of a K-plex of size N has at least N-K ties to other members)


Strategies for identifying primary groups:

Search:
1) Fit measure: Identify a measure of groupness (usually a function of the number of ties that fall within groups compared to the number that fall between groups).
2) Algorithm to maximize fit: Once we have the index, we need a clever method for searching through the network to maximize the fit. See “Jiggle”, “Factions”, etc.

Destroy:
Break apart the network in strategic ways, removing the weakest parts first; what’s left are your primary groups. See “edge betweenness” and “MCL”.

Evade:
Don’t look directly; instead, find a simpler problem that correlates. Examples: generalized cluster analysis, factor analysis, RM.


Segregation Index (Freeman, L. C. 1978. "Segregation in Social Networks." Sociological Methods and Research 6:411-30.)

Freeman asked how we could identify segregation in a social network. Theoretically, he argues, if a given attribute (group label) does not matter for social relations, then relations should be distributed randomly with respect to the attribute. Thus, the difference between the number of cross-group ties expected by chance and the number observed measures segregation.

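$\mathrm{Seg} = \dfrac{E(X) - X}{E(X)}$

where X is the observed number of cross-group ties and E(X) is the number expected by chance.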

Search: Optimize a partition to fit


Consider the (hypothetical) network below. There are two attributes in this network: people with Blue eyes and Brown eyes and people who are square or not (they must be hip).


Segregation Index

Mixing Matrix (eye color):
         Blue  Brown
Blue        6     17
Brown      17     16
Seg = -0.25

Mixing Matrix (hip vs. square):
          Hip  Square
Hip        20       3
Square      3      30
Seg = 0.78
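Both values can be recovered from the mixing matrices alone. The Python sketch below assumes that the chance expectation E(X) is taken from the row and column margins of the mixing matrix (an assumption made here because it reproduces the reported figures; other ways of computing the expectation exist). The function name is illustrative.

```python
def segregation(mix):
    """Seg = (E(X) - X) / E(X), with X the number of cross-group ties."""
    k = len(mix)
    total = sum(sum(row) for row in mix)
    row_sum = [sum(row) for row in mix]
    col_sum = [sum(col) for col in zip(*mix)]
    observed_cross = sum(mix[i][j] for i in range(k) for j in range(k) if i != j)
    # Expected cross-group ties if ties were allocated independently of group.
    expected_cross = sum(row_sum[i] * col_sum[j]
                         for i in range(k) for j in range(k) if i != j) / total
    return (expected_cross - observed_cross) / expected_cross

eye_color = [[6, 17], [17, 16]]
hipness = [[20, 3], [3, 30]]
seg_eye = segregation(eye_color)
seg_hip = segregation(hipness)
print(round(seg_eye, 2), round(seg_hip, 2))
```

A negative value, as for eye color, means there are more cross-group ties than chance alone would produce.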


One problem with the segregation index is that it is not ‘margin free.’ That is, if you were to change the distribution of the category of interest (say race) by a constant, but not the core association between race and friendship choice, you could get a different segregation level.

One antidote to this problem is to use odds ratios. In this case, an odds ratio tells us the relative likelihood that two people in the same category will choose each other as friends.
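As a rough illustration, the odds ratio can be computed by cross-classifying every dyad as same-category or not and as tied or not. The Python sketch below uses a hypothetical undirected toy network and an illustrative function name; it would fail with a division by zero if any of the four cells were empty.

```python
from itertools import combinations

def same_group_odds_ratio(edges, group):
    """Odds of a tie in same-category dyads over odds in mixed dyads."""
    tied = {frozenset(e) for e in edges}
    # cells[(same_category, has_tie)] = number of dyads
    cells = {(s, t): 0 for s in (True, False) for t in (True, False)}
    for i, j in combinations(sorted(group), 2):
        cells[(group[i] == group[j], frozenset((i, j)) in tied)] += 1
    return (cells[(True, True)] * cells[(False, False)]) / (
        cells[(True, False)] * cells[(False, True)])

# Hypothetical toy network.
group = {"a": "blue", "b": "blue", "c": "blue",
         "d": "brown", "e": "brown", "f": "brown"}
edges = [("a", "b"), ("b", "c"), ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]

odds_ratio = same_group_odds_ratio(edges, group)
print(odds_ratio)
```

Here 5 of the 6 same-category dyads are tied against 1 of the 9 mixed dyads, so the odds of a tie are (5/1) / (1/8) = 40 times higher within category.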


The second problem is that the Segregation index has no clear maximum – if every node is assigned to a single group the value can be higher than if everyone is assigned to the “right” group. This means you can’t just keep adjusting nodes until you see a best fit, but instead have to look for changes in fit.

The modularity score solves this problem by re-organizing the expectation in a way that forces the value to 0 if everyone is in a single group.


We can also measure the extent that ties fall within clusters with the modularity score:

$M = \sum_{s} \left[ \dfrac{l_s}{L} - \left( \dfrac{d_s}{2L} \right)^{2} \right]$

Where:
s indexes clusters in the network
$l_s$ is the number of lines in cluster s
$d_s$ is the sum of the degrees of the nodes in s
L is the total number of lines

M has the advantage of going to 0 if there is only 1 group, which means maximizing the score is sensible
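The score is simple to compute from an edge list and a cluster assignment. The following Python sketch is a minimal implementation of the formula for an undirected, unweighted network; the function name and the toy network (two triangles joined by one bridge) are illustrative.

```python
def modularity(edges, cluster):
    """M = sum over clusters s of l_s / L - (d_s / (2L))**2."""
    L = len(edges)
    lines, degree = {}, {}
    for u, v in edges:
        for w in (u, v):                 # each line adds 1 to both endpoints' degree
            degree[cluster[w]] = degree.get(cluster[w], 0) + 1
        if cluster[u] == cluster[v]:     # line falls inside a cluster
            lines[cluster[u]] = lines.get(cluster[u], 0) + 1
    return sum(lines.get(s, 0) / L - (d / (2 * L)) ** 2
               for s, d in degree.items())

# Hypothetical toy network: triangles 0-1-2 and 3-4-5 joined by the line 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
two_groups = {0: "x", 1: "x", 2: "x", 3: "y", 4: "y", 5: "y"}
one_group = {n: "all" for n in range(6)}

m_two = modularity(edges, two_groups)
m_one = modularity(edges, one_group)
print(round(m_two, 4), m_one)
```

For the two-triangle split each cluster has 3 internal lines and degree sum 7, so M = 2 * (3/7 - 1/4) = 5/14; with everyone in one cluster the two terms cancel and M = 0.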


Modularity Scores Comparison to Segregation Index – comparing values for known solutions

Modularity Score Plotted against Segregation Index for various nets


[Figure: axes are number of groups and in-group density]


• Factions in UCI-NET
  Multiple options for the exact factor maximized. I recommend either the density or the correlation function, and I would calculate the distance in each case.
• Frank’s KliqueFinder (the AJS paper we just read)
• Moody’s CROWDS / JIGGLE has elements of this
• Generalized blockmodel in PAJEK
• igraph (R) has a couple of routines of this sort


Factions in UCI-NET


Reduced BlockMatrix

       1   2   3   4   5   6
      --  --  --  --  --  --
  1   59   1   2  14   1   0
  2    1  54   0   1  12   2
  3    1   2  55   0   1  12
  4    9   1   1  51   0   0
  5    0  12   2   0  62   1
  6    1   0   9   2   0  64

Fit perfectly


UCINET: The biggest drawbacks of FACTIONS are:

A) It is slow.
B) You have to specify the number of groups.


There are many similar approaches; my sense is that the best ones use a combination of strategies.

- CROWDS: Starts with an RNM-based clustering, then shifts nodes to maximize fit. Includes sub-loops to merge and re-split groups.

- JIGGLE: Starts with a PCA on a weighted matrix, then proceeds as in CROWDS (without the group-splitting trick).

- “Generalized Blockmodel” in PAJEK: uses a simulated annealing procedure to try to maximize fit directly.

Search: Find a “cheap” indicator, and cluster/optimize that

PAJEK – Generalized Blockmodel


Fits fine, but it’s slow!


R – “Fast Greedy”

This is a direct optimization of Modularity
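To show what direct optimization of modularity means, here is a deliberately naive Python sketch of greedy agglomeration: start with every node in its own cluster and repeatedly apply the merge that raises the modularity score most, stopping when no merge helps. It illustrates the idea only; it is not the R implementation, it recomputes the score from scratch for every candidate merge, and the function names and toy network are hypothetical.

```python
def modularity(edges, cluster):
    """M = sum over clusters s of l_s / L - (d_s / (2L))**2."""
    L = len(edges)
    lines, degree = {}, {}
    for u, v in edges:
        for w in (u, v):
            degree[cluster[w]] = degree.get(cluster[w], 0) + 1
        if cluster[u] == cluster[v]:
            lines[cluster[u]] = lines.get(cluster[u], 0) + 1
    return sum(lines.get(s, 0) / L - (d / (2 * L)) ** 2
               for s, d in degree.items())

def greedy_communities(edges):
    """Merge clusters pairwise while modularity keeps improving."""
    nodes = sorted({n for e in edges for n in e})
    cluster = {n: i for i, n in enumerate(nodes)}    # start with singletons
    best = modularity(edges, cluster)
    while True:
        labels = sorted(set(cluster.values()))
        candidate = None
        for idx, a in enumerate(labels):
            for b in labels[idx + 1:]:
                # Trial partition with cluster b folded into cluster a.
                trial = {n: (a if c == b else c) for n, c in cluster.items()}
                score = modularity(edges, trial)
                if score > best + 1e-12:
                    best, candidate = score, trial
        if candidate is None:            # no merge improves the score
            return cluster, best
        cluster = candidate

# Hypothetical toy network: triangles 0-1-2 and 3-4-5 joined by the line 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
cluster, score = greedy_communities(edges)
print(cluster, round(score, 4))
```

Because the score falls back toward 0 as everything collapses into one group, the procedure has a natural stopping point, which is exactly the property the segregation index lacks.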


Cluster analysis

In addition to tools like FACTIONS, we can use the distance information contained in a network to cluster observations that are ‘close’ to each other. In general, cluster analysis is a set of techniques that allows you to identify collections of objects that are similar to each other to some degree.

A very good reference is the SAS/STAT manual section called, “Introduction to clustering procedures.” (http://wks.uts.ohio-state.edu/sasdoc/8/sashtml/stat/chap8/index.htm)

(See also Wasserman and Faust, though the coverage is spotty).

We are going to start with the general problem of hierarchical clustering applied to any set of analytic objects based on similarity, and then transfer that to clustering nodes in a network.

Evade: Find a “cheap” indicator, and cluster/optimize that

Imagine a set of objects (say people) arrayed in a two dimensional space. You want to identify groups of people based on their position in that space.

How do you do it?

[Scatter plot: “How Cool you are” (horizontal axis) by “How Smart you are” (vertical axis)]


Start by choosing a pair of people who are very close to each other (such as 15 & 16) and now treat that pair as one point, with a value equal to the mean position of the two nodes.


Now repeat that process for as long as possible.


This process is captured in the cluster tree (called a dendrogram)
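The merge-and-repeat procedure can be written in a few lines. The Python sketch below is one reading of it: the closest pair of clusters is replaced by a single point at their mean position (weighted by cluster size, which is an assumption once clusters hold more than one person), and the sequence of merges is recorded, which is the information a dendrogram draws. Point names and coordinates are hypothetical.

```python
import math

def centroid_clustering(points):
    """Merge the two closest clusters until one remains; return merge history."""
    # Each cluster: tuple of member names -> (centre position, size)
    clusters = {(name,): (xy, 1) for name, xy in points.items()}
    history = []
    while len(clusters) > 1:
        keys = sorted(clusters)
        a, b = min(((p, q) for i, p in enumerate(keys) for q in keys[i + 1:]),
                   key=lambda pq: math.dist(clusters[pq[0]][0],
                                            clusters[pq[1]][0]))
        (ca, na), (cb, nb) = clusters.pop(a), clusters.pop(b)
        gap = math.dist(ca, cb)
        # New point sits at the size-weighted mean position of the pair.
        centre = tuple((na * u + nb * v) / (na + nb) for u, v in zip(ca, cb))
        clusters[tuple(sorted(a + b))] = (centre, na + nb)
        history.append((a, b, round(gap, 3)))
    return history

# Hypothetical positions (cool, smart).
points = {"p1": (0, 0), "p2": (0, 1), "p3": (5, 5), "p4": (5, 6), "p5": (10, 0)}
history = centroid_clustering(points)
for step in history:
    print(step)
```

Reading the history from first merge to last gives the tree from leaves to root; cutting it after a chosen number of merges yields the cluster solution.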


As with the network cluster algorithms, there are many options for clustering. The three that I use most are:

• Ward’s Minimum Variance -- the one I use almost 95% of the time
• Average Distance -- the one used in the example above
• Median Distance -- very similar

Again, the SAS manual is the best single place I’ve found for information on each of these techniques.

Some things to keep in mind:

Units matter. The example above draws together pairs horizontally because the range there is smaller. Get around this by standardizing your data.

This is an inductive technique. You can find clusters in a purely random distribution of points. Consider the following example.


The data in this scatter plot are produced using this code:

data random;
  do i=1 to 20;
    x=rannor(0);
    y=rannor(0);
    output;
  end;
run;

Resulting dendrogram

Resulting cluster solution

Cluster analysis works by building a distance matrix between each pair of points. In the example above, it used the Euclidean distance, which in two dimensions is simply the physical distance between the points in a plot.

It can work in any number of dimensions.

To use cluster analysis in a network, we base the distance on the path-distance between pairs of people in the network.
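Path distances of this kind can be obtained with a breadth-first search from each node. The Python sketch below assumes an undirected, connected, unweighted network stored as a neighbour dictionary; the four-node chain is a hypothetical example.

```python
from collections import deque

def path_distances(adj):
    """All-pairs shortest path lengths; adj maps node -> set of neighbours."""
    dist = {}
    for src in adj:
        d = {src: 0}
        queue = deque([src])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in d:           # first visit gives the shortest path
                    d[w] = d[v] + 1
                    queue.append(w)
        dist[src] = d
    return dist

# Hypothetical chain a - b - c - d.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
dist = path_distances(adj)
matrix = [[dist[i][j] for j in sorted(adj)] for i in sorted(adj)]
print(matrix)
```

Pairs with no connecting path would simply be missing from the result, so a disconnected network needs an explicit rule (for example, a large fill-in distance) before clustering.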

Consider again the blue-eye hip example:


Distance Matrix
0 1 3 2 3 3 4 3 3 2 3 2 2 1 1
1 0 2 2 2 3 3 3 2 1 2 2 1 2 1
3 2 0 3 2 4 3 3 2 1 1 1 2 2 3
2 2 3 0 1 1 2 1 1 2 3 3 3 2 1
3 2 2 1 0 2 1 1 1 1 2 2 3 3 2
3 3 4 1 2 0 1 1 2 3 4 4 4 3 2
4 3 3 2 1 1 0 2 2 2 3 3 4 4 3
3 3 3 1 1 1 2 0 1 2 3 3 4 3 2
3 2 2 1 1 2 2 1 0 1 2 2 3 3 2
2 1 1 2 1 3 2 2 1 0 1 1 2 2 2
3 2 1 3 2 4 3 3 2 1 0 1 2 2 3
2 2 1 3 2 4 3 3 2 1 1 0 1 1 2
2 1 2 3 3 4 4 4 3 2 2 1 0 2 2
1 2 2 2 3 3 4 3 3 2 2 1 2 0 1
1 1 3 1 2 2 3 2 2 2 3 2 2 1 0


The distance matrix implies a space in which the nodes are embedded. Using a technique such as multidimensional scaling (MDS), we can represent that space in two dimensions. This is the image of the network you get when you do that.


Cluster analysis

When you use variables, the cluster analysis program generates a distance matrix. We can instead use the network distance matrix directly. If we do that with this example network, we get the following:


Cluster analysis


Cluster analysis

In SAS you use two commands to get a cluster analysis. The first does the hierarchical clustering. The second analyzes the cluster output to create the tree.

Example 1. Using variables to define the space (like income and musical taste):

proc cluster data=a method=ave out=clustd std;
  var x y;
  id node;
run;

proc tree data=clustd ncl=5 out=cluvars;
run;


Cluster analysis

Example 2. Using a pre-defined distance matrix to define the space (as in a social network).

You first create the distance matrix (in IML), then use it in the cluster program.

proc iml;
%include 'c:\moody\sas\programs\modules\reach.mod';

/* blue eye example */
mat2=j(15,15,0);
mat2[1,{2 14 15}]=1;
/* lines cut here */
mat2[15,{1 14 2 4}]=1;

dmat=reach(mat2);
mattrib dmat format=1.0;
print dmat;

id=1:nrow(dmat);
id=id`;
ddat=id||dmat;

create ddat from ddat;   /* creates the dataset */
append from ddat;
quit;

data ddat (type=dist);   /* tells SAS it is a distance matrix */
  set ddat;
run;


Cluster analysis

Example 2. Using a pre-defined distance matrix to define the space (as in a social network).

Once you have it, the cluster program is just the same.

proc cluster data=ddat method=ward out=clustd;
  id col1;
run;

proc tree data=clustd ncl=3 out=netclust;
  copy col1;
run;

proc freq data=netclust;
  tables cluster;
run;

proc print data=netclust;
  var col1 cluster;
run;
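For readers without SAS, the same idea can be sketched in plain Python. This is only an approximation of the steps above: it uses average linkage (as in Example 1's method=ave) rather than Ward's method, breaks ties by taking the first pair found, and so may not reproduce the SAS cluster assignments. The matrix is the 15-node distance matrix shown earlier; the function name is illustrative.

```python
def average_linkage(dmat, n_clusters):
    """Agglomerative clustering on a precomputed distance matrix.

    Repeatedly merges the two clusters with the smallest average
    pairwise distance until n_clusters remain.
    """
    clusters = [[i] for i in range(len(dmat))]

    def avg_dist(a, b):
        return sum(dmat[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = avg_dist(clusters[p], clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        clusters[p] = clusters[p] + clusters[q]  # merge q into p
        del clusters[q]
    return clusters

# Path-distance matrix for the 15-node example network from the slides.
ROWS = """
0 1 3 2 3 3 4 3 3 2 3 2 2 1 1
1 0 2 2 2 3 3 3 2 1 2 2 1 2 1
3 2 0 3 2 4 3 3 2 1 1 1 2 2 3
2 2 3 0 1 1 2 1 1 2 3 3 3 2 1
3 2 2 1 0 2 1 1 1 1 2 2 3 3 2
3 3 4 1 2 0 1 1 2 3 4 4 4 3 2
4 3 3 2 1 1 0 2 2 2 3 3 4 4 3
3 3 3 1 1 1 2 0 1 2 3 3 4 3 2
3 2 2 1 1 2 2 1 0 1 2 2 3 3 2
2 1 1 2 1 3 2 2 1 0 1 1 2 2 2
3 2 1 3 2 4 3 3 2 1 0 1 2 2 3
2 2 1 3 2 4 3 3 2 1 1 0 1 1 2
2 1 2 3 3 4 4 4 3 2 2 1 0 2 2
1 2 2 2 3 3 4 3 3 2 2 1 2 0 1
1 1 3 1 2 2 3 2 2 2 3 2 2 1 0
"""
dmat = [[int(x) for x in line.split()] for line in ROWS.strip().splitlines()]

groups = average_linkage(dmat, n_clusters=3)  # analogous to ncl=3 in proc tree
for label, members in enumerate(groups, start=1):
    print(label, sorted(m + 1 for m in members))  # report 1-based node ids
```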


Moody’s CROWDS algorithm combines the search approach with an initial cluster analysis and a routine for determining how many clusters are in the network. It does so by using the segregation index together with all of the information in the cluster hierarchy, combining two groups only if the merge improves the segregation fit for both groups.

[Figure: cluster hierarchy annotated with segregation index values, with a "Total" label]


The logic behind these algorithms is to remove some weak links and see what is left. The most popular is the “edge betweenness” algorithm.
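A minimal Python sketch of one splitting step in the edge-betweenness approach (usually credited to Girvan and Newman) follows. It assumes an unweighted, undirected network with no self-ties; the function names and the toy network are illustrative only. Here the links removed are those that carry the most shortest paths between other pairs of nodes.

```python
from collections import deque

def edge_betweenness(adj):
    """Shortest-path betweenness of each edge (Brandes-style accumulation)."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        dist, sigma = {s: 0}, {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order, queue = [], deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        # Walk back from the farthest nodes, pushing path credit onto edges.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for u in preds[w]:
                credit = sigma[u] / sigma[w] * (1.0 + delta[w])
                score[frozenset((u, w))] += credit
                delta[u] += credit
    # Each undirected path was counted once from each endpoint.
    return {e: c / 2.0 for e, c in score.items()}

def components(adj):
    """Connected components as a list of node sets."""
    seen, parts = set(), []
    for start in adj:
        if start in seen:
            continue
        part, queue = {start}, deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in part:
                    part.add(v)
                    queue.append(v)
        seen |= part
        parts.append(part)
    return parts

def split_once(adj):
    """Remove highest-betweenness edges until the network breaks apart."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    start = len(components(adj))
    while len(components(adj)) == start:
        scores = edge_betweenness(adj)  # recomputed after every removal
        if not scores:
            break
        u, v = max(scores, key=scores.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)

# Hypothetical network: two triangles joined by a single bridge (3-4).
ties = [(1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 6), (3, 4)]
adj = {v: set() for v in range(1, 7)}
for a, b in ties:
    adj[a].add(b)
    adj[b].add(a)

parts = split_once(adj)
print(sorted(sorted(p) for p in parts))
```

Calling `split_once` again on each resulting piece would continue the division; deciding when to stop needs a separate fit criterion.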

Methods: How do we identify primary groups in a network?
Destroy: Remove lines/nodes until what is left over reveals something of interest


UCINET has the MCL algorithm programmed.


“Evade” – look for something that correlates with your split

Newman’s Leading Eigenvector (in R – this is the “bottom” partition, not the best fit, which aggregates/joins from here)


The Recursive Neighborhood Means algorithm creates the variables that are then used in the cluster analysis to identify groups.

•Start by randomly assigning every node a value on k variables
•Then calculate the average for each variable for the people each person is tied to
•Repeat this process multiple times

This results in people who have many ties to each other having similar values on the k random variables. This similarity then gets picked up in a cluster analysis.
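The three steps can be sketched in Python as follows. Several details are assumptions not stated on the slide: a node's own value is excluded from the average, isolates keep their values, no rescaling is done between rounds, and the number of rounds is a free parameter. The function name and toy network are illustrative.

```python
import random

def recursive_neighborhood_means(adj, k=2, iterations=5, seed=0):
    """Replace each node's k values with the mean of its neighbors' values,
    repeated a fixed number of times, starting from random values."""
    rng = random.Random(seed)
    values = {v: [rng.random() for _ in range(k)] for v in adj}
    for _ in range(iterations):
        updated = {}
        for v, nbrs in adj.items():
            if nbrs:
                updated[v] = [sum(values[u][d] for u in nbrs) / len(nbrs)
                              for d in range(k)]
            else:
                updated[v] = list(values[v])  # assumption: isolates are unchanged
        values = updated  # all nodes update at once from the previous round
    return values

# Hypothetical network: two four-person cliques joined by one tie (3-4).
adj = {v: set() for v in range(8)}
for group in (range(0, 4), range(4, 8)):
    for a in group:
        for b in group:
            if a != b:
                adj[a].add(b)
adj[3].add(4)
adj[4].add(3)

scores = recursive_neighborhood_means(adj, k=2, iterations=3)
for node in sorted(scores):
    print(node, [round(x, 3) for x in scores[node]])
```

The resulting `scores` play the role of the variables passed to the cluster analysis. On a connected network, many rounds of averaging tend to pull everyone toward the same value, so a small number of iterations is probably the safer choice.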


Example of the RNM procedure

[Figure panels: Time 1, Time 2, Time 3]


Example of the RNM procedure


As an example, consider the process applied to a network known to be clustered, starting with k = 2 random variables.

You get something like this, where the nodes are now placed according to their resulting values on the 2 variables.


The algorithm does a good job uncovering clusters in fake datasets.



Compared to real data:
