HIERARCHICAL AND PYRAMIDAL CLUSTERING
FOR SYMBOLIC DATA
Paula Brito
Univ. Porto, Portugal
OUTLINE
• The hierarchical model
• The pyramidal model
• Numerical hierarchical / pyramidal clustering
• Symbolic clustering
• The property of completeness
• The generality degree
• The clustering algorithm
• Examples
• The HIPYR module
Hierarchical model: set of nested partitions
Let E be the observations set (the set being clustered).
Hierarchy on E:
Family H of non-empty subsets of E such that
• E ∈ H
• ∀ a ∈ E, {a} ∈ H
• ∀ h, h' ∈ H, h ∩ h' = Ø or h ⊆ h' or h' ⊆ h
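The three conditions can be checked mechanically. The sketch below is only an illustration: the function name `is_hierarchy` and the example family `H` are assumptions, not taken from the slides.

```python
from itertools import combinations

def is_hierarchy(family, universe):
    """Check the three hierarchy conditions for a family of subsets of `universe`."""
    fam = {frozenset(h) for h in family}
    E = frozenset(universe)
    # The family holds non-empty subsets only and must contain E itself.
    if frozenset() in fam or E not in fam:
        return False
    # Every singleton {a} must belong to the family.
    if any(frozenset([a]) not in fam for a in E):
        return False
    # Any two clusters are either disjoint or nested.
    return all(not (h & g) or h <= g or g <= h for h, g in combinations(fam, 2))

E = {"x1", "x2", "x3", "x4", "x5"}
# Hypothetical hierarchy on E, chosen for illustration.
H = [{x} for x in E] + [{"x1", "x2"}, {"x4", "x5"}, {"x3", "x4", "x5"}, E]
print(is_hierarchy(H, E))
```

Adding a cluster such as {x2, x3}, which overlaps {x1, x2} without being nested in it, makes the check fail.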
[Figure: a hierarchy on x1, ..., x5, shown in two views, with two clusters labelled h and h']
PYRAMIDAL REPRESENTATION
PYRAMID: set of nested overlappings
P = { {x1}, {x2}, {x3}, {x4}, {x5}, {x1, x2}, {x3, x4},
{x4, x5}, {x1, x2, x3}, {x1, x2, x3, x4}, {x3, x4, x5},
{x1, x2, x3, x4, x5} }
[Figure: the pyramid P on x1, ..., x5, shown in two views]
Diday (1984, 1986), Bertrand & Diday (1985), Bertrand (1986)
PYRAMID P on E
Family of non-empty subsets of E such that:
• E ∈ P
• ∀ a ∈ E, {a} ∈ P
• ∀ p, p' ∈ P, p ∩ p' = Ø or p ∩ p' ∈ P
• there exists a linear order θ such that every element of P is an
interval of θ
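A minimal sketch of these four conditions follows. It checks a family against one given order θ rather than searching for an order, and the name `is_pyramid` is an assumption. The family tested is the set P listed on the pyramidal representation slide.

```python
from itertools import combinations

def is_pyramid(family, order):
    """Check the pyramid conditions for `family` with respect to the given linear order."""
    fam = {frozenset(p) for p in family}
    E = frozenset(order)
    pos = {x: i for i, x in enumerate(order)}
    if frozenset() in fam or E not in fam:
        return False
    if any(frozenset([a]) not in fam for a in E):
        return False
    # Non-empty intersections of two clusters must themselves be clusters.
    for p, q in combinations(fam, 2):
        common = p & q
        if common and common not in fam:
            return False
    # Every cluster must occupy consecutive positions in the order.
    for p in fam:
        ranks = sorted(pos[x] for x in p)
        if ranks[-1] - ranks[0] + 1 != len(ranks):
            return False
    return True

order = ["x1", "x2", "x3", "x4", "x5"]
P = [{"x1"}, {"x2"}, {"x3"}, {"x4"}, {"x5"}, {"x1", "x2"}, {"x3", "x4"},
     {"x4", "x5"}, {"x1", "x2", "x3"}, {"x1", "x2", "x3", "x4"},
     {"x3", "x4", "x5"}, set(order)]
print(is_pyramid(P, order))
```

P is not a hierarchy, since {x3, x4} and {x4, x5} overlap without being nested, but it satisfies the pyramid conditions for the order x1, ..., x5.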
Pyramidal model :
[Figure: pyramid on x1, ..., x5]
Hierarchy : nested partitions
Pyramid : nested overlappings
Clustering
Seriation
SUCCESSOR AND PREDECESSOR
p is a SUCCESSOR of p' if
- p ⊂ p'
- there is no p'' in the structure with p ⊂ p'' ⊂ p' (strict inclusions)
p' is then a PREDECESSOR of p
Hierarchy: each cluster has at most ONE predecessor
[Figure: hierarchy on x1, ..., x5 with clusters h and h']
Pyramid: each cluster has at most TWO predecessors
[Figure: pyramid on x1, ..., x5]
ASCENDING CLUSTERING ALGORITHM
Starting with the single-element clusters,
merge at each step the MERGEABLE clusters for which
the dissimilarity (aggregation index) is MINIMUM
MERGEABLE CLUSTERS:
→ if the structure is a hierarchy: none of them has
been aggregated before;
→ if the structure is a pyramid: none of them has been
aggregated twice, and the resulting cluster p is an interval of a total order θ
on E
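For the hierarchy case, the loop above can be sketched as follows. This is an illustration under assumptions: it uses the Minimum (single-link) aggregation index, it treats exactly the clusters not yet aggregated as mergeable, and the function name and the toy dissimilarity matrix are hypothetical. The pyramid case, with its order constraint, is not covered.

```python
def ascending_hierarchy(d, labels):
    """Ascending clustering on a dissimilarity matrix `d`; returns (cluster, height) pairs."""
    idx = {x: i for i, x in enumerate(labels)}
    # Mergeable clusters in a hierarchy: those not aggregated before.
    active = [frozenset([x]) for x in labels]

    def agg(p, q):
        # Minimum aggregation index between two clusters (an assumed choice).
        return min(d[idx[a]][idx[b]] for a in p for b in q)

    merges = []
    while len(active) > 1:
        candidates = [(agg(p, q), p, q)
                      for i, p in enumerate(active) for q in active[i + 1:]]
        height, p, q = min(candidates, key=lambda t: t[0])
        active = [c for c in active if c != p and c != q] + [p | q]
        merges.append((p | q, height))
    return merges

labels = ["a", "b", "c", "d"]
d = [[0, 1, 4, 5],
     [1, 0, 3, 6],
     [4, 3, 0, 2],
     [5, 6, 2, 0]]
merges = ascending_hierarchy(d, labels)
for cluster, height in merges:
    print(sorted(cluster), height)
```

On this toy matrix the clusters {a, b}, {c, d} and {a, b, c, d} are formed at heights 1, 2 and 3.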
SYMBOLIC CLUSTERING
OBJECTIVE:
Given a set of symbolic objects,
build a hierarchical / pyramidal clustering
such that each cluster is a pair
- EXTENSION: its members
- INTENSION: its description
EACH CLUSTER HAS AN AUTOMATIC SYMBOLIC
REPRESENTATION
BY A SYMBOLIC OBJECT
Symbolic clustering methods need:
• Generalization operator:
if C ⊆ C', then s' (representing C') is more general than
s (representing C)
• Generality degree measure
GENERALISATION
s is more general than s' if its extent contains
the extent of s'
s' is then more specific than s
Generalisation of two symbolic objects s and s':
determining s'' such that s'' is more general than both s and s'.
s ≤ s ∪ s' and s' ≤ s ∪ s'
ext (s ∪ s') ⊇ ext (s) and ext (s ∪ s') ⊇ ext (s')
This procedure differs according to the variable type:
1) Interval variables
s1 = [ y ∈ [a1, b1] ]
s2 = [ y ∈ [a2, b2] ]
s1 ∪ s2 = [ y ∈ [min {a1, a2}, max {b1, b2}] ]
Example:
s1 = [ time ∈ [5, 15] ]
s2 = [ time ∈ [10, 20] ]
s1 ∪ s2 = [ time ∈ [5, 20] ]
2) Multi-valued categorical variables
s1 = [ y ∈ V1 ]
s2 = [ y ∈ V2 ]
s1 ∪ s2 = [ y ∈ V1 ∪ V2 ]
Example:
s1 = [ job ∈ {secretary, teacher} ]
s2 = [ job ∈ {employee} ]
s1 ∪ s2 = [ job ∈ {secretary, teacher, employee} ]
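The interval and multi-valued categorical rules can be sketched together. The representation is an assumption made for illustration: a symbolic object is a dict, an interval is a `(low, high)` tuple and a multi-valued categorical description is a set. All function names are hypothetical.

```python
def join_interval(i1, i2):
    """Smallest interval containing both intervals."""
    (a1, b1), (a2, b2) = i1, i2
    return (min(a1, a2), max(b1, b2))

def join_categories(v1, v2):
    """Union of the two category sets."""
    return set(v1) | set(v2)

def generalise(s1, s2):
    """Variable-wise generalisation s1 ∪ s2 of two symbolic objects on the same variables."""
    out = {}
    for var in s1:
        v1, v2 = s1[var], s2[var]
        if isinstance(v1, tuple):          # interval variable
            out[var] = join_interval(v1, v2)
        else:                              # multi-valued categorical variable
            out[var] = join_categories(v1, v2)
    return out

s1 = {"time": (5, 15), "job": {"secretary", "teacher"}}
s2 = {"time": (10, 20), "job": {"employee"}}
print(generalise(s1, s2))
```

This reproduces the two slide examples: time ∈ [5, 20] and job ∈ {secretary, teacher, employee}.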
3) Modal variables
Two possibilities proposed :
take for each category the Minimum of its frequencies
take for each category the Maximum of its frequencies
a) Generalisation by the Maximum
$$[y = \{m_1(p_1^1), \ldots, m_k(p_k^1)\}] \cup [y = \{m_1(p_1^2), \ldots, m_k(p_k^2)\}] = [y = \{m_1(p_1), \ldots, m_k(p_k)\}]$$
with $p_j = \max\{p_j^1, p_j^2\}$, $j = 1, \ldots, k$
Example:
[Type of Job ∈ { (0.3) administration, (0.7) teaching }] ∪
[Type of Job ∈ { (0.6) admin., (0.2) teaching, (0.2) secretary }]
= [Type of Job ∈ { (0.6) admin., (0.7) teaching, (0.2) secret. }]
Extension: $\{a : p_j^a \le p_j,\ j = 1, \ldots, k\}$
"at most" principle
b) Generalisation by the Minimum
$$[y = \{m_1(p_1^1), \ldots, m_k(p_k^1)\}] \cup [y = \{m_1(p_1^2), \ldots, m_k(p_k^2)\}] = [y = \{m_1(p_1), \ldots, m_k(p_k)\}]$$
with $p_j = \min\{p_j^1, p_j^2\}$, $j = 1, \ldots, k$
Example:
[Type of Job ∈ { (0.3) administration, (0.7) teaching }] ∪
[Type of Job ∈ { (0.6) admin., (0.2) teaching, (0.2) secretary }]
= [Type of Job ∈ { (0.3) admin., (0.2) teaching }]
Extension: $\{a : p_j^a \ge p_j,\ j = 1, \ldots, k\}$
"at least" principle
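Both modal generalisations take, category by category, the maximum or the minimum of the two frequencies. A minimal sketch, assuming a modal description is a dict from category to frequency and that a missing category has frequency 0 (the function name `generalise_modal` is hypothetical):

```python
def generalise_modal(d1, d2, how="max"):
    """Category-wise maximum or minimum of two frequency distributions."""
    if how not in ("max", "min"):
        raise ValueError("how must be 'max' or 'min'")
    pick = max if how == "max" else min
    categories = set(d1) | set(d2)
    merged = {c: pick(d1.get(c, 0.0), d2.get(c, 0.0)) for c in categories}
    # Categories whose resulting frequency is 0 are dropped, as in the slide example.
    return {c: p for c, p in merged.items() if p > 0}

s1 = {"administration": 0.3, "teaching": 0.7}
s2 = {"administration": 0.6, "teaching": 0.2, "secretary": 0.2}
print(generalise_modal(s1, s2, "max"))
print(generalise_modal(s1, s2, "min"))
```

With the Type of Job example this gives (0.6, 0.7, 0.2) for the maximum and (0.3, 0.2) for the minimum, with secretary dropped in the second case.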
COMPLETENESS AND COMPLETE OBJECTS
COMPLETE symbolic object
• defined by all the properties that characterize its
extension
• the most specific to fulfill this condition
Galois connections, lattice theory:
Barbut & Monjardet (1970); Wille (1982); Ganter (1984)
Union preserves completeness
Example:

|    | y1: Age | y2: Weight | y3: Sex |
|----|---------|------------|---------|
| w1 | 20 | 45 | F |
| w2 | 50 | 55 | F |
| w3 | 30 | 50 | F |
| w4 | 60 | 60 | M |

a = [ Weight = [40, 50] ] ∧ [ Age = [20, 50] ]
ext a = {w1, w3}
a IS NOT COMPLETE
a' = [ Weight = [45, 50] ] ∧ [ Age = [20, 30] ] ∧ [ Sex = {F} ]
IS COMPLETE
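The example can be reproduced by computing the extent of a and then the most specific description of that extent. This is a sketch under assumptions: intervals are `(low, high)` tuples, categorical descriptions are sets, the extent is non-empty, and the helper names are hypothetical.

```python
data = {
    "w1": {"Age": 20, "Weight": 45, "Sex": "F"},
    "w2": {"Age": 50, "Weight": 55, "Sex": "F"},
    "w3": {"Age": 30, "Weight": 50, "Sex": "F"},
    "w4": {"Age": 60, "Weight": 60, "Sex": "M"},
}

def satisfies(row, obj):
    """True if the individual `row` fulfils every condition of the symbolic object."""
    for var, cond in obj.items():
        if isinstance(cond, tuple):
            low, high = cond
            if not low <= row[var] <= high:
                return False
        elif row[var] not in cond:
            return False
    return True

def extent(obj, data):
    return {w for w, row in data.items() if satisfies(row, obj)}

def complete_description(members, data):
    """Most specific description covering `members` (assumed non-empty)."""
    desc = {}
    for var in next(iter(data.values())):
        values = [data[w][var] for w in members]
        if all(isinstance(v, (int, float)) for v in values):
            desc[var] = (min(values), max(values))
        else:
            desc[var] = set(values)
    return desc

a = {"Weight": (40, 50), "Age": (20, 50)}
ext_a = extent(a, data)
completed = complete_description(ext_a, data)
print(sorted(ext_a), completed)
```

The completed object has the same extent {w1, w3} as a but tighter intervals and an added condition on Sex, which is why a itself is not complete.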
THE BASIC ALGORITHM
STARTING WITH THE ONE-OBJECT CLUSTERS
{ai}, i = 1,…,n
At each step, FORM A CLUSTER p, union of p1, p2,
REPRESENTED BY s such that
• p1, p2 can be merged together
• s is complete
• s is more general than s1, s2 ⇒ union s1 ∪ s2
• extE s = p
Non-uniqueness ⇒ numerical criterion
Clusters with more specific descriptions
are formed first
GENERALITY DEGREE
$e_i = [\, y_i \in V_i \,]$, $V_i \subseteq O_i$, $O_i$ bounded
$$G(a) = \prod_{i=1}^{p} \frac{m(V_i)}{m(O_i)} = \prod_{i=1}^{p} G(e_i) \quad \text{if } a = \bigwedge_{i=1}^{p} e_i$$
PROPORTION of the description space covered by a:
the more possible members of the extension of a,
the greater the generality degree of a
Interval variables
m(Vi) = max Vi – min Vi (range)
Example:
Describing groups of people
defined on variables age and salary.
Age ranges from 15 to 60, salary ranges from 0 to 10000.
Consider a group described by a symbolic object
s1 = [age ∈ [20, 45]] ∧ [salary ∈ [1000, 3000]] = e11 ∧ e12
$$G(e_{11}) = \frac{45 - 20}{60 - 15} = \frac{25}{45} \approx 0.556$$
$$G(e_{12}) = \frac{3000 - 1000}{10000 - 0} = \frac{2000}{10000} = 0.2$$
$$G(s_1) = G(e_{11}) \times G(e_{12}) \approx 0.11$$
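The computation for interval variables is a product of range ratios. A small sketch, assuming intervals are `(low, high)` tuples and with the hypothetical name `generality_interval`:

```python
def generality_interval(obj, domains):
    """Product over variables of (range of V_i) / (range of O_i)."""
    g = 1.0
    for var, (low, high) in obj.items():
        dom_low, dom_high = domains[var]
        g *= (high - low) / (dom_high - dom_low)
    return g

domains = {"age": (15, 60), "salary": (0, 10000)}
s1 = {"age": (20, 45), "salary": (1000, 3000)}
g_s1 = generality_interval(s1, domains)
print(round(g_s1, 2))
```

For the age and salary example this gives (25/45) × (2000/10000) = 1/9, about 0.11.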
Multi-valued categorical variables
m(Vi) = card Vi
Example:
Describing groups of people from the EU,
defined on variables sex and nationality.
Consider one group described by:
s1 = [sex ∈ {M}] ∧ [nationality ∈ {French, English}] = e11 ∧ e12
$$G(e_{11}) = \frac{1}{2} = 0.5 \qquad G(e_{12}) = \frac{2}{25} = 0.08$$
$$G(s_1) = 0.5 \times 0.08 = 0.04$$
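For categorical variables the same product uses cardinalities. In this sketch the domain sizes (2 values for sex, 25 for nationality) are read off the denominators in the example; the function name is hypothetical.

```python
def generality_categorical(obj, domain_sizes):
    """Product over variables of card(V_i) / card(O_i)."""
    g = 1.0
    for var, values in obj.items():
        g *= len(values) / domain_sizes[var]
    return g

domain_sizes = {"sex": 2, "nationality": 25}
s1 = {"sex": {"M"}, "nationality": {"French", "English"}}
g_s1 = generality_categorical(s1, domain_sizes)
print(round(g_s1, 2))
```

The result is (1/2) × (2/25) = 0.04, as on the slide.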
Modal variables
a) Generalising by the Maximum
$$G_1(a) = \prod_{j=1}^{p} \sum_{i=1}^{k_j} \sqrt{p_{ij} \times \frac{1}{k_j}}$$
which is the affinity coefficient (Matusita, 1951) between
(p1,…,pk) and the uniform distribution
This means that we consider an object
the more general the more similar it is
to the uniform distribution
G1(a) is maximum (=1) when pi = 1/k, i = 1,…,k: uniform
b) Generalising by the Minimum
$$G_2(a) = \prod_{j=1}^{p} \sum_{i=1}^{k_j} \frac{1}{\sqrt{k_j (k_j - 1)}} \sqrt{1 - p_{ij}}$$
Again, G2(a) is maximum (=1)
when pi = 1/k, i = 1,…,k: uniform
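Both modal generality degrees can be evaluated numerically. The sketch assumes that G1 is the product over variables of Σ √(p_ij · 1/k_j) and that G2 is the product of Σ √(1 − p_ij) / √(k_j (k_j − 1)), with each variable given as a list of frequencies over all its k_j ≥ 2 categories, zeros included. The names `g1` and `g2` are hypothetical.

```python
from math import prod, sqrt

def g1(obj):
    """Generality degree for generalisation by the maximum (affinity with the uniform)."""
    return prod(sum(sqrt(p / len(dist)) for p in dist) for dist in obj)

def g2(obj):
    """Generality degree for generalisation by the minimum (needs k_j >= 2)."""
    return prod(
        sum(sqrt(1 - p) for p in dist) / sqrt(len(dist) * (len(dist) - 1))
        for dist in obj
    )

uniform = [[0.25, 0.25, 0.25, 0.25]]
peaked = [[1.0, 0.0, 0.0, 0.0]]
print(g1(uniform), g2(uniform))   # both close to 1
print(g1(peaked), g2(peaked))     # both below 1
```

Both measures reach 1 for the uniform distribution and are smaller for a distribution concentrated on one category.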
ALGORITHM
Starting with the one-object clusters {ai}, i = 1,…,n
At each step, FORM A CLUSTER p, union of p1, p2,
REPRESENTED BY s such that
• p1, p2 can be merged together
• s = s1 ∪ s2 (complete)
• extE s = p
• G(s) is minimum
AUTOMATIC SYMBOLIC REPRESENTATION
OF THE CLUSTERS
The algorithm builds a hierarchy / pyramid on E, such that
each cluster is represented by a
"complete" symbolic object
whose extension is the cluster itself
CLUSTER = (p, s), p = ext s
|    | y1 | y2 | y3 | y4 |
|----|----|----|----|----|
| w1 | 1 | 1 | 1 | 2 |
| w2 | 1 | 2 | 1 | 3 |
| w3 | 1 | 2 | 2 | 2 |
| w4 | 2 | 1 | 1 | 2 |
| w5 | 3 | 3 | 2 | 1 |
[Figure: pyramid on the ordered objects w4, w1, w2, w3, w5 with clusters P1 to P10; the height axis is marked 2/54, 4/54, 8/54, 16/54, 24/54, 32/54 and 1]
P6 : s6 = [ y1 = {1} ] ∧ [ y2 = {1,2} ] ∧ [ y4 = {2,3} ]
P7 : s7 = [ y1 = {1,2} ] ∧ [ y2 = {1,2} ] ∧ [ y4 = {2,3} ]
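The heights on the figure axis can be reproduced from the data table. In this sketch the domain of each variable is assumed to be the set of values observed in the table (3 × 3 × 2 × 3 = 54 combinations), a cluster is described by the set of values taken by its members on each variable, and the memberships tested for P6 and P7 are inferred from s6 and s7. The helper names are hypothetical.

```python
from fractions import Fraction
from math import prod

table = {
    "w1": (1, 1, 1, 2),
    "w2": (1, 2, 1, 3),
    "w3": (1, 2, 2, 2),
    "w4": (2, 1, 1, 2),
    "w5": (3, 3, 2, 1),
}
n_vars = 4
# Assumed domains: the values observed for each variable.
domain_sizes = [len({row[j] for row in table.values()}) for j in range(n_vars)]

def describe(cluster):
    """Value sets taken by the cluster members on each variable."""
    return [{table[w][j] for w in cluster} for j in range(n_vars)]

def generality(cluster):
    """Generality degree as an exact fraction of the description space."""
    return Fraction(prod(len(v) for v in describe(cluster)), prod(domain_sizes))

for cluster in (["w1", "w4"], ["w1", "w2"], ["w1", "w2", "w3"], ["w1", "w2", "w3", "w4"]):
    print(cluster, generality(cluster))
```

The clusters {w1, w2, w3} and {w1, w2, w3, w4} get generality 8/54 and 16/54, matching s6 and s7 (a variable omitted from a description, such as y3, covers its whole domain).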
Application
INE Labour Force Survey data:
12 sex * age groups
Each of the 12 groups is described
by interval variables, representing the variation
of the underlying variables in the class.
P36 [Looking_for_job=[2.010,3.595]U[4.579,6.707]]^
[Financial Act.=[3.639,5.851]]^
[Public_ Admin=[1.336,2.620]U[3.004,4.871]]^
[Hotels_Rest_=[1.874,3.497]]^
[Commerce=[9.582,13.161]]^
[Construction=[11.244,14.526]]^
[Education=[4.589,8.829]]^
[Elect__gas_water=[0.347,1.757]]^
[Industry=[32.700,37.277]U[38.671,43.205]]^
[Other_Serv_=[4.316,7.551]]^
[Primary=[5.921,9.799]]^
[Health=[1.857,3.622]]^
[Transp_Comunic_=[2.352,4.675]]^
[None=[2.603,4.285]U[4.609,6.839]]^...
Its members,
Men 15 to 54 and
Women 15 to 24,
are the only elements
fulfilling these conditions.
Application
Cultural activities of 11 socio-professional groups
1509 individual observations grouped
The HIPYR Module
Objective:
Perform hierarchical or pyramidal clustering
on a set of symbolic objects (SOs)
• from a dissimilarity matrix
→ numerical clustering
• directly based on the data set
→ symbolic clustering: clusters are "concepts"
Main Parameters
Structure:
Hierarchy or Pyramid
Data Source:
• Dissimilarity Matrix (Numerical Clustering)
• Symbolic objects (Symbolic Clustering)
Aggregation Index:
• Numerical Clustering: Maximum, Minimum, Average,
Diameter
• Symbolic Clustering: Minimum Generality,
Minimum Increase in Generality
Main Parameters
• Use Taxonomies for generalization
(nominal or categorical multi-valued variables): Y, N
• Select "best" classes: Y, N
• Write induced dissimilarity/generality matrix: Y, N
• Order Variable (optional): quantitative single variable;
to impose an order compatible with the pyramid
• Modal variables generalization:
  • Maximum
  • Minimum
Induced dissimilarity/generality matrix
For each pair of SOs si, sj, i, j = 1,..., n,
d*(si, sj) = index (height) of the "smallest" class
that contains si and sj
d*(si, sj) = Min {f(C) : si ∈ C, sj ∈ C}
Evaluation of the obtained indexed hierarchy /
pyramid:
Comparison between the initial and the induced
dissimilarity/generality matrices.
Evaluation value
For si, sj, i, j = 1,..., n, d(si, sj) is:
• the given dissimilarity matrix (numerical clustering)
• the generality degree of si ∪ sj (symbolic clustering)
$$EV = \frac{\sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \left( d(s_i, s_j) - d^*(s_i, s_j) \right)^2}{\sum_{i=1}^{n-1} \sum_{j=i+1}^{n} d(s_i, s_j)}$$
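A sketch of the evaluation value, assuming EV is the sum over pairs i < j of (d − d*)² divided by the sum over the same pairs of d. The function name and both toy matrices are hypothetical; `d_star` holds the heights of the smallest class containing each pair.

```python
def evaluation_value(d, d_star):
    """Squared gap between initial and induced matrices, relative to the sum of d."""
    n = len(d)
    pairs = [(i, j) for i in range(n - 1) for j in range(i + 1, n)]
    numerator = sum((d[i][j] - d_star[i][j]) ** 2 for i, j in pairs)
    denominator = sum(d[i][j] for i, j in pairs)
    return numerator / denominator

# Toy initial dissimilarities on four objects.
d = [[0, 1, 4, 5],
     [1, 0, 3, 6],
     [4, 3, 0, 2],
     [5, 6, 2, 0]]
# Induced matrix for a hierarchy with classes at heights 1, 2 and 3.
d_star = [[0, 1, 3, 3],
          [1, 0, 3, 3],
          [3, 3, 0, 2],
          [3, 3, 2, 0]]
ev = evaluation_value(d, d_star)
print(ev)
```

Here the numerator is 1 + 4 + 9 = 14 and the denominator is 21, so EV = 2/3. A structure that reproduces the initial matrix exactly gives EV = 0.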
Cluster selection
Identify the most interesting clusters:
A cluster is "interesting" if its variability is
small as compared to its predecessors.
Variability is indicated by the index values f(h).
Compute the mean value and standard deviation
of the height increase values.
A class is selected if the corresponding increase value
is more than 2 standard deviations over the mean value.
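The selection rule can be sketched as a threshold test. Assumptions: the height increase of each class is supplied by the caller, the population standard deviation is used (the slides do not say which estimator applies), and the class names and values are hypothetical.

```python
from statistics import mean, pstdev

def select_clusters(increases, n_std=2.0):
    """Classes whose height increase exceeds mean + n_std * standard deviation."""
    values = list(increases.values())
    threshold = mean(values) + n_std * pstdev(values)
    return [c for c, inc in increases.items() if inc > threshold]

# Hypothetical height increases: nine small ones and one large one.
increases = {f"C{i}": 0.1 for i in range(1, 10)}
increases["C10"] = 2.0
selected = select_clusters(increases)
print(selected)
```

With these values the mean is 0.29 and the standard deviation 0.57, so the threshold is 1.43 and only C10 is selected.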
HIPYR OUTPUT
• Text file
• Sodas file
• Interactive Graphical Representation (VPYR)
The output listing contains:
• The labels of the individuals
• The labels of the variables
• The description of each node:
  • the symbolic object associated to each node
  • its extension
• Evaluation value
• Selected clusters, if asked for
• The induced matrix, if asked for
Graphical Representation
Options of the Graphical
Representation
A cluster is selected by clicking on it.
• description of the cluster in terms of
a list of chosen variables
• representation by a Zoom Star
HIPYR - VPYR
Options of the Graphical
Representation
Pruning the hierarchy or pyramid using the
aggregation heights as a criterion.
Suppressing cluster p if:
f(p') - f(p) < α f(E) ∧ p has a single predecessor
(p' being the predecessor of p)
Rate of simplification α
chosen by the user,
new graphic window with
the simplified structure.
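The pruning test can be sketched as follows, reading p' as the single predecessor of p. The data layout (dicts of heights and of predecessor lists), the function name and the toy structure are assumptions for illustration.

```python
def prunable(heights, predecessors, root, alpha):
    """Clusters p with one predecessor p' such that f(p') - f(p) < alpha * f(E)."""
    out = []
    for p, preds in predecessors.items():
        if len(preds) == 1:
            parent = preds[0]
            if heights[parent] - heights[p] < alpha * heights[root]:
                out.append(p)
    return out

# Hypothetical indexed structure: A below B, B and C below E, D with two predecessors.
heights = {"A": 0.20, "B": 0.25, "C": 0.50, "D": 0.24, "E": 1.00}
predecessors = {"A": ["B"], "B": ["E"], "C": ["E"], "D": ["B", "C"]}
to_remove = prunable(heights, predecessors, root="E", alpha=0.1)
print(to_remove)
```

Only A is suppressed: it sits 0.05 below its single predecessor, under the 0.1 threshold, while D is kept because it has two predecessors.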
Pruning
Rule Generation
If the hierarchy/pyramid is built from a symbolic
data table, rules may be generated and saved in a
specified file.
Fission method:
s ⇒ s1 ∨ s2
Fusion method
(pyramids only):
s1 ∧ s2 ⇒ s
[Figure: two diagrams, each relating the clusters (p1, s1) and (p2, s2) to the cluster (p, s)]
Options of the Graphical
Representation
Reduction
Should the user be interested in a particular cluster,
they may obtain a window with the structure
restricted to this cluster and its successors.