Anuska Ferligojˇ Vladimir Batagelj University of...
Transcript of Anuska Ferligojˇ Vladimir Batagelj University of...
'
&
$
%
Blockmodeling
Anuska FerligojVladimir Batagelj
University of Ljubljana
European Science FoundationQMSS Workshop
Ljubljana, Saturday, July 2, 2005, 9:00-12:30
version: June 29, 2005 / 01 : 19
A. Ferligoj, V. Batagelj: Blockmodeling 2'
&
$
%
Outline1 Matrix rearrangement view on blockmodeling . . . . . . . . . . . . . . . . . . . 1
2 Blockmodeling as a clustering problem . . . . . . . . . . . . . . . . . . . . . . 2
13 Indirect Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
22 Example: Support network among informatics students . . . . . . . . . . . . . . 22
27 Direct Approach and Generalized Blockmodeling . . . . . . . . . . . . . . . . . 27
37 Pre-specified blockmodeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
44 Blockmodeling in 2-mode networks . . . . . . . . . . . . . . . . . . . . . . . . 44
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
A. Ferligoj, V. Batagelj: Blockmodeling 1'
&
$
%
Matrix rearrangement view on blockmodelingSnyder & Kick’s World trade network / n = 118, m = 514
Pajek - shadow 0.00,1.00 Sep- 5-1998World trade - alphabetic order
afg alb alg arg aus aut bel bol bra brm bul bur cam can car cha chd chi col con cos cub cyp cze dah den dom ecu ege egy els eth fin fra gab gha gre gua gui hai hon hun ice ind ins ire irn irq isr ita ivo jam jap jor ken kmr kod kor kuw lao leb lib liy lux maa mat mex mla mli mon mor nau nep net nic nig nir nor nze pak pan par per phi pol por rum rwa saf sau sen sie som spa sri sud swe swi syr tai tha tog tri tun tur uga uki upv uru usa usr ven vnd vnr wge yem yug zai
afg
a
lb
alg
a
rg
aus
a
ut
bel
b
ol
bra
b
rm
bul
b
ur
cam
c
an
car
c
ha
chd
c
hi
col
c
on
cos
c
ub
cyp
c
ze
dah
d
en
dom
e
cu
ege
e
gy
els
e
th
fin
fra
gab
g
ha
gre
g
ua
gui
h
ai
hon
h
un
ice
ind
ins
ire
irn
irq
isr
ita
ivo
jam
ja
p
jo
r
k
en
km
r
k
od
kor
k
uw
lao
leb
lib
liy
lux
maa
m
at
mex
m
la
mli
mon
m
or
nau
n
ep
net
n
ic
nig
n
ir
n
or
nze
p
ak
pan
p
ar
per
p
hi
pol
p
or
rum
r
wa
saf
s
au
sen
s
ie
som
s
pa
sri
sud
s
we
sw
i
s
yr
tai
tha
tog
tri
tun
tur
uga
u
ki
upv
u
ru
usa
u
sr
ven
v
nd
vnr
w
ge
yem
y
ug
zai
Pajek - shadow 0.00,1.00 Sep- 5-1998World Trade (Snyder and Kick, 1979) - cores
uki net bel lux fra ita den jap usa can bra arg ire swi spa por wge ege pol aus hun cze yug gre bul rum usr fin swe nor irn tur irq egy leb cha ind pak aut cub mex uru nig ken saf mor sud syr isr sau kuw sri tha mla gua hon els nic cos pan col ven ecu per chi tai kor vnr phi ins nze mli sen nir ivo upv gha cam gab maa alg hai dom jam tri bol par mat alb cyp ice dah nau gui lib sie tog car chd con zai uga bur rwa som eth tun liy jor yem afg mon kod brm nep kmr lao vnd
uki
n
et
bel
lu
x
fr
a
it
a
d
en
jap
usa
c
an
bra
a
rg
ire
sw
i
s
pa
por
w
ge
ege
p
ol
aus
h
un
cze
y
ug
gre
b
ul
rum
u
sr
fin
sw
e
n
or
irn
tur
irq
egy
le
b
c
ha
ind
pak
a
ut
cub
m
ex
uru
n
ig
ken
s
af
mor
s
ud
syr
is
r
s
au
kuw
s
ri
th
a
m
la
gua
h
on
els
n
ic
cos
p
an
col
v
en
ecu
p
er
chi
ta
i
k
or
vnr
p
hi
ins
nze
m
li
s
en
nir
ivo
upv
g
ha
cam
g
ab
maa
a
lg
hai
d
om
jam
tr
i
b
ol
par
m
at
alb
c
yp
ice
dah
n
au
gui
li
b
s
ie
tog
car
c
hd
con
z
ai
uga
b
ur
rw
a
s
om
eth
tu
n
li
y
jo
r
y
em
afg
m
on
kod
b
rm
nep
k
mr
lao
vnd
Alphabetic order of countries (left) and rearrangement (right)
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
A. Ferligoj, V. Batagelj: Blockmodeling 2'
&
$
%
Blockmodeling as a clustering problem
The goal of blockmodeling is to reducea large, potentially incoherent networkto a smaller comprehensible structurethat can be interpreted more readily.Blockmodeling, as an empirical proce-dure, is based on the idea that units ina network can be grouped according tothe extent to which they are equivalent,according to some meaningful defini-tion of equivalence.
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
A. Ferligoj, V. Batagelj: Blockmodeling 3'
&
$
%
Cluster, clustering, blocks
One of the main procedural goals of blockmodeling is to identify, in a givennetwork N = (U , R), R ⊆ U × U , clusters (classes) of units that sharestructural characteristics defined in terms of R. The units within a clusterhave the same or similar connection patterns to other units. They form aclustering C = {C1, C2, . . . , Ck} which is a partition of the set U . Eachpartition determines an equivalence relation (and vice versa). Let us denoteby ∼ the relation determined by partition C.
A clustering C partitions also the relation R into blocks
R(Ci, Cj) = R ∩ Ci × Cj
Each such block consists of units belonging to clusters Ci and Cj and allarcs leading from cluster Ci to cluster Cj . If i = j, a block R(Ci, Ci) iscalled a diagonal block.
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
A. Ferligoj, V. Batagelj: Blockmodeling 4'
&
$
%
Blockmodel
A blockmodel consists of structures obtained by identifying all units fromthe same cluster of the clustering C. For an exact definition of a blockmodelwe have to be precise also about which blocks produce an arc in the reducedgraph and which do not. The reduced graph can be presented by relationalmatrix, called also image matrix.
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
A. Ferligoj, V. Batagelj: Blockmodeling 5'
&
$
%
Structural and regular equivalence
Regardless of the definition of equivalence used, there are two basicapproaches to the equivalence of units in a given network (compare Faust,1988):
• the equivalent units have the same connection pattern to the sameneighbors;
• the equivalent units have the same or similar connection pattern to(possibly) different neighbors.
The first type of equivalence is formalized by the notion of structuralequivalence and the second by the notion of regular equivalence with thelatter a generalization of the former.
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
A. Ferligoj, V. Batagelj: Blockmodeling 6'
&
$
%
Structural equivalence
Units are equivalent if they are connected to the rest of the network inidentical ways (Lorrain and White, 1971). Such units are said to bestructurally equivalent.
In other words, X and Y are structurally equivalent iff:
s1. XRY ⇔ YRX s3. ∀Z ∈ U \ {X,Y} : (XRZ ⇔ YRZ)s2. XRX ⇔ YRY s4. ∀Z ∈ U \ {X,Y} : (ZRX ⇔ ZRY)
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
A. Ferligoj, V. Batagelj: Blockmodeling 7'
&
$
%
. . . structural equivalence
The blocks for structural equivalence are null or complete with variationson diagonal in diagonal blocks.
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
A. Ferligoj, V. Batagelj: Blockmodeling 8'
&
$
%
Regular equivalence
Integral to all attempts to generalize structural equivalence is the idea thatunits are equivalent if they link in equivalent ways to other units that arealso equivalent.
White and Reitz (1983): The equivalence relation ≈ on U is a regularequivalence on network N = (U , R) if and only if for all X,Y,Z ∈ U ,X ≈ Y implies both
R1. XRZ ⇒ ∃W ∈ U : (YRW ∧W ≈ Z)R2. ZRX ⇒ ∃W ∈ U : (WRY ∧W ≈ Z)
Another view of regular equivalence is based on colorings (Everett, Borgatti1996).
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
A. Ferligoj, V. Batagelj: Blockmodeling 9'
&
$
%
. . . regular equivalence
Theorem 1 (Batagelj, Doreian, Ferligoj, 1992) Let C = {Ci} be apartition corresponding to a regular equivalence ≈ on the network N =(U , R). Then each block R(Cu, Cv) is either null or it has the propertythat there is at least one 1 in each of its rows and in each of its columns.Conversely, if for a given clustering C, each block has this property thenthe corresponding equivalence relation is a regular equivalence.
The blocks for regular equivalence are null or 1-covered blocks.
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
A. Ferligoj, V. Batagelj: Blockmodeling 10'
&
$
%
An Example
r r r rr r
rr
JJ
J
JJ
J�
���
ZZ
ZZ
1
2
3
4 5
6
7 8
For the relation R given with the graph in figure the equivalences mentionedare
≈str = {{4, 5}, {7, 8}, {1}, {2}, {3}, {6}}≈reg = {{1, 4, 5, 7, 8}, {2, 3, 6}}
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
A. Ferligoj, V. Batagelj: Blockmodeling 11'
&
$
%
Establishing Blockmodels
The problem of establishing a partition of units in a network in terms of aselected type of equivalence is a special case of clustering problem that canbe formulated as an optimization problem (Φ, P ) as follows:
Determine the clustering C? ∈ Φ for which
P (C?) = minC∈Φ
P (C)
where Φ is the set of feasible clusterings and P is a criterion function.
Since the set of units U is finite, the set of feasible clusterings is alsofinite. Therefore the set Min(Φ, P ) of all solutions of the problem (optimalclusterings) is not empty.
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
A. Ferligoj, V. Batagelj: Blockmodeling 12'
&
$
%
Criterion function
Criterion functions can be constructed
• indirectly as a function of a compatible (dis)similarity measure betweenpairs of units, or
• directly as a function measuring the fit of a clustering to an idealone with perfect relations within each cluster and between clustersaccording to the considered types of connections (equivalence).
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
A. Ferligoj, V. Batagelj: Blockmodeling 13'
&
$
%
Indirect Approach
−→−→
−−−−→−→
R
Q
D
hierarchical algorithms,
relocation algorithm, leader algorithm, etc.
RELATION
DESCRIPTIONSOF UNITS
original relation
path matrix
triadsorbits
DISSIMILARITYMATRIX
STANDARDCLUSTERINGALGORITHMS
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
A. Ferligoj, V. Batagelj: Blockmodeling 14'
&
$
%
Dissimilarities
We consider the following list of dissimilarities between units xi and xj
where the description of the unit consists of the row and column of theproperty matrix Q = [qij ].
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
A. Ferligoj, V. Batagelj: Blockmodeling 15'
&
$
%
. . . Dissimilarities
Manhattan distance: dm(xi, xj) =n∑
s=1
(|qis − qjs|+ |qsi − qsj |)
Euclidean distance:
dE(xi, xj) =
√√√√ n∑s=1
((qis − qjs)2 + (qsi − qsj)2)
Truncated Manhattan distance: ds(xi, xj) =n∑
s=1s 6=i,j
(|qis − qjs|+ |qsi − qsj |)
Truncated Euclidean distance (Faust, 1988):
dS(xi, xj) =
√√√√√ n∑s=1
s 6=i,j
((qis − qjs)2 + (qsi − qsj)2)
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
A. Ferligoj, V. Batagelj: Blockmodeling 16'
&
$
%
. . . Dissimilarities
Corrected Manhattan-like dissimilarity (p ≥ 0):
dc(p)(xi, xj) = ds(xi, xj) + p · (|qii − qjj |+ |qij − qji|)
Corrected Euclidean-like dissimilarity (Burt and Minor, 1983):
de(p)(xi, xj) =√
dS(xi, xj)2 + p · ((qii − qjj)2 + (qij − qji)2)
Corrected dissimilarity:
dC(p)(xi, xj) =√
dc(p)(xi, xj)
The parameter, p, can take any positive value. Typically, p = 1 or p = 2,where these values count the number of times the corresponding diagonalpairs are counted.
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
A. Ferligoj, V. Batagelj: Blockmodeling 17'
&
$
%
. . . DissimilaritiesIt is easy to verify that all expressions from the list define a dissimilarity(i.e. that d(x, y) ≥ 0; d(x, x) = 0; and d(x, y) = d(y, x)). Each of thedissimilarities from the list can be assessed to see whether or not it is also adistance: d(x, y) = 0 ⇒ x = y and d(x, y) + d(y, z) ≥ d(x, z).
The dissimilarity measure d is compatible with a considered equivalence ∼if for each pair of units holds
Xi ∼ Xj ⇔ d(Xi, Xj) = 0
Not all dissimilarity measures typically used are compatible with structuralequivalence. For example, the corrected Euclidean-like dissimilarity iscompatible with structural equivalence.
The indirect clustering approach does not seem suitable for establishingclusterings in terms of regular equivalence since there is no evident wayhow to construct a compatible (dis)similarity measure.
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
A. Ferligoj, V. Batagelj: Blockmodeling 18'
&
$
%
The Hierarchical Approach
Agglomerative hierarchical clustering algorithms usually assume that allrelevant information on the relationships between the n units from the set Uis summarized by a symmetric pairwise dissimilarity matrix D = [dij ]. Thescheme of the agglomerative hierarchical algorithm is:
Each unit is a cluster: Ci = {xi} , xi ∈ U , i = 1, 2, . . . , n;repeat while there exist at least two clusters:
determine the nearest pair of clusters Cp and Cq:d(Cp, Cq) = minu,v d(Cu, Cv) ;
fuse the clusters Cp and Cq to form a new clusterCr = Cp ∪ Cq;
replace Cp and Cq by the cluster Cr;determine the dissimilarities between the cluster Cr
and other clusters.
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
A. Ferligoj, V. Batagelj: Blockmodeling 19'
&
$
%
Dissimilarity update
hhhhhhhhh
���������
�
�
�
�sCj
sCi s Ckd(Ci, Cj)
d(Cj , Ck)
d(Ci, Ck)
According to the last step of this al-gorithm, we have to determine thedissimilarity d between the newlyformed cluster Cr and all other, pre-viously established, clusters. Thiscan be done in many different ways,each of which determines a differ-ent hierarchical clustering method.
The resulting clustering (hierarchy) can be represented graphically by theclustering tree (dendrogram).
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
A. Ferligoj, V. Batagelj: Blockmodeling 20'
&
$
%
Methods
Suppose further, that the clusters Ci and Cj are the closest. They are fusedto form a new cluster Ci ∪ Cj . The methods of creating the dissimilaritybetween the new cluster and an extant cluster Ck include the following:
• The Minimum method, or single linkage, (Florek K. et al., 1951;Sneath P., 1957):
d(Ci ∪ Cj , Ck) = min(d(Ci, Ck), d(Cj , Ck))
• The Maximum method, or complete linkage, (McQuitty L., 1960):
d(Ci ∪ Cj , Ck) = max(d(Ci, Ck), d(Cj , Ck))
• The McQuitty method (McQuitty L., 1966; 1967):
d(Ci ∪ Cj , Ck) =d(Ci, Ck) + d(Cj , Ck)
2
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
A. Ferligoj, V. Batagelj: Blockmodeling 21'
&
$
%
. . . Methods
• The Average method (Sokal R. and Michener C., 1958):
d(Ci ∪ Cj , Ck) =1
(ni + nj)nk
∑u∈Ci∪Cj
∑v∈Ck
d(u, v)
where ni denotes the number of units in the cluster Ci.
• The Gower method (Gower J., 1967):
d(Ci ∪ Cj , Ck) = d2(tij , tk)
where tij denotes the centroid of the fused cluster Ci ∪ Cj and tk thecenter of the cluster Ck.
• The Ward method (Ward, 1963):
d(Ci ∪ Cj , Ck) =(ni + nj)nk
(ni + nj + nk)d2(tij , tk)
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
A. Ferligoj, V. Batagelj: Blockmodeling 22'
&
$
%
Example: Support network among informaticsstudents
The analyzed network consists of social support exchange relation amongfifteen students of the Social Science Informatics fourth year class(2002/2003) at the Faculty of Social Sciences, University of Ljubljana.Interviews were conducted in October 2002.
Support relation among students was identified by the following question:
Introduction: You have done several exams since you are in thesecond class now. Students usually borrow studying material fromtheir colleagues.
Enumerate (list) the names of your colleagues that you have mostoften borrowed studying material from. (The number of listedpersons is not limited.)
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
A. Ferligoj, V. Batagelj: Blockmodeling 23'
&
$
%
Class network
b02
b03
g07
g09
g10
g12
g22 g24
g28
g42
b51
g63
b85
b89
b96
class.net
Vertices represent studentsin the class; circles – girls,squares – boys. Oppositepairs of arcs are replaced byedges.
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
A. Ferligoj, V. Batagelj: Blockmodeling 24'
&
$
%
Indirect approachPajek - Ward [0.00,7.81]
b51
b89
b02
b96
b03
b85
g10
g24
g09
g63
g12
g07
g28
g22
g42
Using Corrected Euclidean-likedissimilarity and Ward clusteringmethod we obtain the followingdendrogram.From it we can determine the num-ber of clusters: ’Natural’ cluster-ings correspond to clear ’jumps’ inthe dendrogram.If we select 3 clusters we get thepartition C.
C = {{b51, b89, b02, b96, b03, b85, g10, g24},{g09, g63, g12}, {g07, g28, g22, g42}}
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
A. Ferligoj, V. Batagelj: Blockmodeling 25'
&
$
%
Partition in 3 clusters
b02
b03
g07
g09
g10
g12
g22 g24
g28
g42
b51
g63
b85
b89
b96
On the picture, ver-tices in the samecluster are of thesame color.
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
A. Ferligoj, V. Batagelj: Blockmodeling 26'
&
$
%
MatrixPajek - shadow [0.00,1.00]
b02
b03
g10
g24 C1
b51
b85
b89
b96
g07
g22 C2
g28
g42
g09
g12 C3
g63
b02
b03
g10
g24
C
1
b51
b85
b89
b96
g07
g22
C
2
g28
g42
g09
g12
C
3
g63
The partition can be usedalso to reorder rows andcolumns of the matrix repre-senting the network. Clus-ters are divided using bluevertical and horizontal lines.
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
A. Ferligoj, V. Batagelj: Blockmodeling 27'
&
$
%
Direct Approach and GeneralizedBlockmodeling
The second possibility for solving the blockmodeling problem is to constructan appropriate criterion function directly and then use a local optimizationalgorithm to obtain a ‘good’ clustering solution.
Criterion function P (C) has to be sensitive to considered equivalence:
P (C) = 0 ⇔ C defines considered equivalence.
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
A. Ferligoj, V. Batagelj: Blockmodeling 28'
&
$
%
Generalized BlockmodelingA blockmodel consists of structures ob-tained by identifying all units from thesame cluster of the clustering C. Foran exact definition of a blockmodel wehave to be precise also about whichblocks produce an arc in the reducedgraph and which do not, and of whattype. Some types of connections arepresented in the figure on the next slide.The reduced graph can be representedby relational matrix, called also imagematrix.
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
A. Ferligoj, V. Batagelj: Blockmodeling 29'
&
$
%
Block Types and Matrices
1 1 1 1 1 1 0 0
1 1 1 1 0 1 0 1
1 1 1 1 0 0 1 0
1 1 1 1 1 0 0 0
0 0 0 0 0 1 1 1
0 0 0 0 1 0 1 1
0 0 0 0 1 1 0 1
0 0 0 0 1 1 1 0
C1 C2
C1 complete regular
C2 null complete
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
A. Ferligoj, V. Batagelj: Blockmodeling 30'
&
$
%
Block Types
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
A. Ferligoj, V. Batagelj: Blockmodeling 31'
&
$
%
Generalized equivalence / Block Types
Y1 1 1 1 1
X 1 1 1 1 11 1 1 1 11 1 1 1 1complete
Y0 1 0 0 0
X 1 1 1 1 10 0 0 0 00 0 0 1 0
row-dominant
Y0 0 1 0 0
X 0 0 1 1 01 1 1 0 00 0 1 0 1
col-dominant
Y0 1 0 0 0
X 1 0 1 1 00 0 1 0 11 1 0 0 0
regular
Y0 1 0 0 0
X 0 1 1 0 01 0 1 0 00 1 0 0 1
row-regular
Y0 1 0 1 0
X 1 0 1 0 01 1 0 1 10 0 0 0 0col-regular
Y0 0 0 0 0
X 0 0 0 0 00 0 0 0 00 0 0 0 0
null
Y0 0 0 1 0
X 0 0 1 0 01 0 0 0 00 0 0 1 0
row-functional
Y1 0 0 00 1 0 0
X 0 0 1 00 0 0 00 0 0 1
col-functional
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
A. Ferligoj, V. Batagelj: Blockmodeling 32'
&
$
%
Characterizations of Types of Blocksnull nul all 0 ∗
complete com all 1 ∗
regular reg 1-covered rows and columns
row-regular rre each row is 1-covered
col-regular cre each column is 1 -covered
row-dominant rdo ∃ all 1 row ∗
col-dominant cdo ∃ all 1 column ∗
row-functional rfn ∃! one 1 in each row
col-functional cfn ∃! one 1 in each column
non-null one ∃ at least one 1∗ except this may be diagonal
A block is symmetric iff ∀X,Y ∈ Ci × Cj : (XRY ⇔ YRX).
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
A. Ferligoj, V. Batagelj: Blockmodeling 33'
&
$
%
Criterion function
One of the possible ways of constructing a criterion function that directlyreflects the considered equivalence is to measure the fit of a clustering toan ideal one with perfect relations within each cluster and between clustersaccording to the considered equivalence.
Given a clustering C = {C1, C2, . . . , Ck}, let B(Cu, Cv) denote the set ofall ideal blocks corresponding to block R(Cu, Cv). Then the global error ofclustering C can be expressed as
P (C) =∑
Cu,Cv∈C
minB∈B(Cu,Cv)
d(R(Cu, Cv), B)
where the term d(R(Cu, Cv), B) measures the difference (error) betweenthe block R(Cu, Cv) and the ideal block B. d is constructed on the basis ofcharacterizations of types of blocks. The function d has to be compatiblewith the selected type of equivalence.
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
A. Ferligoj, V. Batagelj: Blockmodeling 34'
&
$
%
. . . criterion function
For example, for structural equivalence, the term d(R(Cu, Cv), B) can beexpressed, for non-diagonal blocks, as
d(R(Cu, Cv), B) =∑
X∈Cu,Y∈Cv
|rX Y − bX Y|.
where rX Y is the observed tie and bX Y is the corresponding value in anideal block. This criterion function counts the number of 1s in erstwhilenull blocks and the number of 0s in otherwise complete blocks. These twotypes of inconsistencies can be weighted differently.
Determining the block error, we also determine the type of the best fittingideal block (the types are ordered).
The criterion function P (C) is sensitive iff P (C) = 0 ⇔ µ (determinedby C) is an exact blockmodeling. For all presented block types sensitivecriterion functions can be constructed (Batagelj, 1997).
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
A. Ferligoj, V. Batagelj: Blockmodeling 35'
&
$
%
Solving the blockmodeling problem
The obtained optimization problem can be solved by local optimization(e.g., relocation algorithm).
Determine the initial clustering C;repeat:
if in the neighborhood of the current clustering Cthere exists a clustering C′ such that P (C′) < P (C)then move to clustering C′ .
The neighborhood in this local optimization procedure is determined by thefollowing two transformations:
• moving a unit Xk from cluster Cp to cluster Cq (transition);
• interchanging units Xu and Xv from different clusters Cp and Cq
(transposition).
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
A. Ferligoj, V. Batagelj: Blockmodeling 36'
&
$
%
Benefits from Optimization Approach• ordinary / inductive blockmodeling: Given a network N and set of
types of connection T , determine the model M;
• evaluation of the quality of a model, comparing different models,analyzing the evolution of a network (Sampson data, Doreian andMrvar 1996; states / continents): Given a network N , a model M, andblockmodeling µ, compute the corresponding criterion function;
• model fitting / deductive blockmodeling: Given a network N , set oftypes T , and a family of models, determine µ which minimizes thecriterion function (Batagelj, Ferligoj, Doreian, 1998).
• we can fit the network to a partial model and analyze the residualafterward;
• we can also introduce different constraints on the model, for example:units X and Y are of the same type; or, types of units X and Y are notconnected; . . .
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
A. Ferligoj, V. Batagelj: Blockmodeling 37'
&
$
%
Pre-specified blockmodelingIn the previous slides the inductive approaches for establishing blockmodelsfor a set of social relations defined over a set of units were discussed.Some form of equivalence is specified and clusterings are sought that areconsistent with a specified equivalence.
Another view of blockmodeling is deductive in the sense of starting with ablockmodel that is specified in terms of substance prior to an analysis.
In this case given a network, set of types of ideal blocks, and a reducedmodel, a solution (a clustering) can be determined which minimizes thecriterion function.
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
A. Ferligoj, V. Batagelj: Blockmodeling 38'
&
$
%
Pre-Specified Blockmodels
The pre-specified blockmodeling starts with a blockmodel specified, interms of substance, prior to an analysis. Given a network, a set of idealblocks is selected, a family of reduced models is formulated, and partitionsare established by minimizing the criterion function.
The basic types of models are:
* * *
* 0 0
* 0 0
* 0 0
* * 0
? * *
* 0 0
0 * 0
0 0 *
center - hierarchy clustering
periphery
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
A. Ferligoj, V. Batagelj: Blockmodeling 39'
&
$
%
Prespecified blockmodeling example
We expect that center-periphery model exists in the network: some studentshaving good studying material, some not.
Prespecified blockmodel: (com/complete, reg/regular, -/null block)
1 2
1 [com reg] -
2 [com reg] -
Using local optimization we get the partition:
C = {{b02, b03, b51, b85, b89, b96, g09},
{g07, g10, g12, g22, g24, g28, g42, g63}}
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
A. Ferligoj, V. Batagelj: Blockmodeling 40'
&
$
%
2 clusters solution
b02
b03
g07
g09
g10
g12
g22 g24
g28
g42
b51
g63
b85
b89
b96
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
A. Ferligoj, V. Batagelj: Blockmodeling 41'
&
$
%
ModelPajek - shadow [0.00,1.00]
g07
g10
g12
g22
g24
g28
g42
g63
b85
b02
b03
g09
b51
b89
b96
g07
g10
g12
g22
g24
g28
g42
g63
b85
b02
b03
g09
b51
b89
b96
Image and Error Matrices:
1 2
1 reg -
2 reg -
1 2
1 0 3
2 0 2
Total error = 5center-periphery
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
A. Ferligoj, V. Batagelj: Blockmodeling 42'
&
$
%
The Student Government at the University of Ljubljana in1992
The relation is determined by the following question (Hlebec, 1993):
Of the members and advisors of the Student Government, whomdo you most often talk with about the matters of the StudentGovernment?
m p m m m m m m a a a1 2 3 4 5 6 7 8 9 10 11
minister 1 1 · 1 1 · · 1 · · · · ·p.minister 2 · · · · · · · 1 · · ·minister 2 3 1 1 · 1 · 1 1 1 · · ·minister 3 4 · · · · · · 1 1 · · ·minister 4 5 · 1 · 1 · 1 1 1 · · ·minister 5 6 · 1 · 1 1 · 1 1 · · ·minister 6 7 · · · 1 · · · 1 1 · 1minister 7 8 · 1 · 1 · · 1 · · · 1adviser 1 9 · · · 1 · · 1 1 · · 1adviser 2 10 1 · 1 1 1 · · · · · ·adviser 3 11 · · · · · 1 · 1 1 · ·
minister1
pminister
minister2
minister3
minister4
minister5
minister6
minister7
advisor1
advisor2
advisor3
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
A. Ferligoj, V. Batagelj: Blockmodeling 43'
&
$
%
A Symmetric Acyclic Blockmodel of Student Government
The obtained clustering in 4clusters is almost exact. Theonly error is produced by thearc (a3,m5).
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
A. Ferligoj, V. Batagelj: Blockmodeling 44'
&
$
%
Blockmodeling in 2-mode networksWe already presented some ways of rearranging 2-mode network matricesat the beginning of this lecture.
It is also possible to formulate this goal as a generalized blockmodelingproblem where the solutions consist of two partitions — row-partition andcolumn-partition.
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
A. Ferligoj, V. Batagelj: Blockmodeling 45'
&
$
%
Supreme Court Voting for Twenty-Six Important DecisionsIssue Label Br Gi So St OC Ke Re Sc ThPresidential Election PE - - - - + + + + +
Criminal Law CasesIllegal Search 1 CL1 + + + + + + - - -Illegal Search 2 CL2 + + + + + + - - -Illegal Search 3 CL3 + + + - - - - + +Seat Belts CL4 - - + - - + + + +Stay of Execution CL5 + + + + + + - - -
Federal Authority CasesFederalism FA1 - - - - + + + + +Clean Air Action FA2 + + + + + + + + +Clean Water FA3 - - - - + + + + +Cannabis for Health FA4 0 + + + + + + + +United Foods FA5 - - + + - + + + +NY Times Copyrights FA6 - + + - + + + + +
Civil Rights CasesVoting Rights CR1 + + + + + - - - -Title VI Disabilities CR2 - - - - + + + + +PGA v. Handicapped Player CR3 + + + + + + + - -
Immigration Law CasesImmigration Jurisdiction Im1 + + + + - + - - -Deporting Criminal Aliens Im2 + + + + + - - - -Detaining Criminal Aliens Im3 + + + + - + - - -Citizenship Im4 - - - + - + + + +
Speech and Press CasesLegal Aid for Poor SP1 + + + + - + - - -Privacy SP2 + + + + + + - - -Free Speech SP3 + - - - + + + + +Campaign Finance SP4 + + + + + - - - -Tobacco Ads SP5 - - - - + + + + +
Labor and Property Rights CasesLabor Rights LPR1 - - - - + + + + +Property Rights LPR2 - - - - + + + + +
The Supreme Court Justices andtheir ‘votes’ on a set of 26 “impor-tant decisions” made during the2000-2001 term, Doreian and Fu-jimoto (2002).The Justices (in the order inwhich they joined the SupremeCourt) are: Rehnquist (1972),Stevens (1975), O’Conner (1981),Scalia (1982), Kennedy (1988),Souter (1990), Ginsburg (1993)and Breyer (1994).
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
A. Ferligoj, V. Batagelj: Blockmodeling 46'
&
$
%
. . . Supreme Court Voting / a (4,7) partition
Pajek - shadow
[0.00,1.00]
Cam
paignFi
VotingR
igh
DeportA
lie
IllegalS.1
IllegalS.2
StayE
xecut
PG
AvP
layer
Privacy
Imm
igratio
LegalAidP
o
Crim
inalAl
IllegalS.3
CleanA
irAc
Cannabis
NY
Tim
es
SeatB
elt
UnitedF
ood
Citizenshi
PresE
lecti
Federalism
FreeS
peech
CleanW
ater
TitleIV
Tobacco
LaborIssue
Property
Rehnquest
Thomas
Scalia
Kennedy
OConner
Breyer
Ginsburg
Souter
Stevens
upper – conservative / lower – liberal
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6