Post on 14-Dec-2015
Graph-based cluster labeling using Growing Hierarchical SOM
The second International conference of Applied Science & natural
Prepared by:
Mahmoud Rafeek Alfarra, College of Science & Technology, m.farra@cst.ps
Ayman Shehda Ghabayen, College of Science & Technology, a.ghabayen@cst.ps
Outline
Labeling: What and Why?
Graph-based Representation
Growing Hierarchical SOM
Extraction of cluster labels
Labeling: What and Why?
Cluster labeling: the process of selecting descriptive labels (keywords) for the clusters produced by a clustering algorithm.
Labeling: What and Why?
Cluster labeling is an increasingly important task because:
1. Document collections keep growing larger.
2. It helps with processing news, email threads, blogs, reviews, and search results.
Labeling: What and Why?
[Figure: overall pipeline. A documents collection passes through a preprocessing step into the DIG model, then through the clustering process with labeling (a Hierarchical Growing SOM whose node graphs are arranged in layers 0, 1, and 2), yielding labeled document clusters.]
Graph-based Representation
[Figure: example document graph with labeled nodes and the phrases ph1 to ph5.]
Graph-based Representation
Capture the salient features of the data.
DIG (Document Index Graph) model: a directed graph.
A document is represented as a vector of sentences.
Phrase indexing information is stored in the graph nodes themselves, in the form of document tables.
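[Figure: DIG fragment with the nodes river, rafting, adventures, and fishing, connected by the edges e0, e1, and e2, next to the document table of one node.]
Document Table
Doc | TF | ET
1 | {0,0,3} | e0: S1(1), S2(2), S3(1)
2 | {0,0,2} | e0: S2(1); e2: S1(2)
3 | {0,0,1} | e1: S4(1)
In an ET entry such as S1(2), S1 is the sentence number and (2) is the position of the term in that sentence.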
Graph-based Representation
Example
Document 1
River rafting
Mild river rafting
River rafting trips
Document 2
Wild river adventures
River rafting vacation plan
Document 3
Fishing trips
Fishing vacation plan
Booking fishing trips
River fishing
[Figure: the cumulative graph after each document is added. Document 1 gives the nodes mild, river, rafting, trips; Document 2 adds wild, adventures, vacation, plan; Document 3 adds booking, fishing.]
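As a minimal sketch of how such a graph could be built, the Python below indexes the example sentences into term nodes and directed edges between consecutive words, recording for each edge the sentence number and term position per document. The name `build_dig` and the dictionary layout are illustrative assumptions, not the authors' implementation. The sketch also assumes that the four fishing sentences form a third document, as the Doc 3 entry of the document table suggests, and it keeps a single term count per document instead of the three-part TF vector.

```python
from collections import defaultdict


def build_dig(documents):
    """Build a simplified document index graph (hypothetical layout).

    documents: dict mapping doc id -> list of sentences.
    Returns (nodes, edges) where
      nodes[term][doc_id] is the term count in that document, and
      edges[(u, v)][doc_id] is a list of (sentence_no, position_of_u), 1-based.
    """
    nodes = defaultdict(lambda: defaultdict(int))
    edges = defaultdict(lambda: defaultdict(list))
    for doc_id, sentences in documents.items():
        for s_no, sentence in enumerate(sentences, start=1):
            words = sentence.lower().split()
            for pos, word in enumerate(words, start=1):
                nodes[word][doc_id] += 1
                if pos < len(words):
                    # words[pos] is the word right after the 1-based position pos
                    edges[(word, words[pos])][doc_id].append((s_no, pos))
    return nodes, edges


docs = {
    1: ["river rafting", "mild river rafting", "river rafting trips"],
    2: ["wild river adventures", "river rafting vacation plan"],
    # Assumption: the fishing sentences belong to a third document.
    3: ["fishing trips", "fishing vacation plan",
        "booking fishing trips", "river fishing"],
}
nodes, edges = build_dig(docs)
print(dict(nodes["river"]))            # {1: 3, 2: 2, 3: 1}
print(edges[("river", "rafting")][1])  # [(1, 1), (2, 2), (3, 1)]
```

Under these assumptions, the entries for the term river agree with the document table: the edge to rafting occurs in sentences 1, 2, and 3 of Document 1 at positions 1, 2, and 1.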
Growing Hierarchical SOM
Determining the winning node
[Figure: the SOM node graphs and the input document graph, drawn over the vertices v1 to v7 and the edges e0 to e5.]
Gs: the graphs of the n nodes in the SOM
Gi: the input document graph
Matching score between Gi and Gs:
Phrases Significance(Gi, Gs) / length(Gi)
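The winner search can be sketched in Python under strong simplifying assumptions: each graph is reduced to a set of directed edges (ordered term pairs), phrase significance is taken to be the number of edges shared by Gi and Gs, and length(Gi) is taken to be the number of edges in Gi. The slides do not define these quantities in detail, so `match_score` and `find_winner` are hypothetical names for illustration only.

```python
def match_score(gi_edges, gs_edges):
    """Share of the input graph's edges that also occur in a SOM node graph.

    Assumption: phrase significance = number of shared edges,
    length(Gi) = number of edges in Gi.
    """
    if not gi_edges:
        return 0.0
    return len(gi_edges & gs_edges) / len(gi_edges)


def find_winner(gi_edges, som_graphs):
    """Return the key of the SOM node graph with the highest match score."""
    return max(som_graphs, key=lambda name: match_score(gi_edges, som_graphs[name]))


gi = {("river", "rafting"), ("rafting", "trips")}
som_graphs = {
    "n0": {("river", "rafting"), ("mild", "river")},
    "n1": {("fishing", "trips")},
}
winner = find_winner(gi, som_graphs)
print(winner, match_score(gi, som_graphs[winner]))  # n0 0.5
```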
Growing Hierarchical SOM
Neuron updating in the graph domain
[Figure: two graphs, G1 and G2, over the nodes A, B, C, D, E, X, and Y with the edges e0 to e5, illustrating the neuron update.]
We update a neuron's graph by increasing its matching phrases rather than its terms (nodes), because phrases have a stronger effect. Adding a matching phrase can also be viewed as adding an ordered pair of nodes.
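One way to picture this update is the sketch below, which treats each graph as a set of ordered term pairs and copies a fraction of the input graph's missing phrases into the neuron. The `rate` parameter, the alphabetical choice of which phrases to copy, and the function name `update_neuron` are assumptions made for illustration; the slides do not specify how many phrases are added per step.

```python
import math


def update_neuron(neuron_edges, input_edges, rate=0.5):
    """Return a new neuron edge set that shares more phrases with the input.

    Assumption: a fraction `rate` of the input phrases the neuron lacks
    is copied over, chosen in sorted order to keep the result deterministic.
    """
    missing = sorted(input_edges - neuron_edges)
    n_add = math.ceil(rate * len(missing))
    return neuron_edges | set(missing[:n_add])


neuron = {("A", "B"), ("B", "C")}
doc = {("A", "B"), ("B", "D"), ("D", "E")}
updated = update_neuron(neuron, doc)
print(sorted(updated))     # [('A', 'B'), ('B', 'C'), ('B', 'D')]
print(len(updated & doc))  # 2 matching phrases, up from 1
```

Each copied phrase is an ordered pair, so any node it mentions that the neuron did not yet have (D in this example) enters the neuron's graph as a side effect.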
Overall document clustering process
Extracting cluster labels
To extract the keywords, we build a table for each cluster with the following columns:
Term | TF-Locations {T, L, B, b} | No. of matching phrases (MP) | Weight
Weight = (f1*T + f2*L + f3*B + f4*b) * 0.4 + MP * 0.6
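The weight formula translates directly into a small Python function. The slides do not give values for f1 to f4, so the factors and the location counts in the example call are made-up numbers chosen only to show the arithmetic; `term_weight` is a hypothetical name.

```python
def term_weight(locations, mp, factors):
    """Weight = (f1*T + f2*L + f3*B + f4*b) * 0.4 + MP * 0.6.

    locations: the four location frequencies (T, L, B, b) of a term.
    mp: number of matching phrases the term takes part in.
    factors: (f1, f2, f3, f4); their values are not given in the slides.
    """
    f_weight = sum(f * count for f, count in zip(factors, locations))
    return 0.4 * f_weight + 0.6 * mp


# Hypothetical inputs: (T, L, B, b) = (1, 0, 2, 5), factors (4, 3, 2, 1), MP = 3.
# Weighted frequency = 4 + 0 + 4 + 5 = 13, so the weight is 5.2 + 1.8 = 7.0.
w = term_weight((1, 0, 2, 5), 3, (4, 3, 2, 1))
print(round(w, 2))  # 7.0
```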
Extracting cluster labels
[Figure: example cluster graph over the terms T1 to T11.]
Term | F-weight | # MP | Net weight
T2 | 12.4 | 2: (T2,T3), (T2,T5) | 4.96 + 1.2 = 6.16
T3 | 10.2 | 2: (T2,T3), (T5,T3) | 4.08 + 1.2 = 5.28
T5 | 16.6 | 3: (T2,T5), (T8,T5), (T5,T3) | 6.64 + 1.8 = 8.44
T8 | 14.4 | 1: (T8,T5) | 5.76 + 0.6 = 6.36
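To turn the table into labels, the terms can be ranked by net weight. The sketch below recomputes each net weight as 0.4 times the F-weight plus 0.6 times the number of matching phrases, then sorts in descending order. Choosing the top-ranked terms as the cluster labels is an assumption about the final step, which the slides do not spell out.

```python
# (F-weight, number of matching phrases) per term, taken from the table.
rows = {
    "T2": (12.4, 2),
    "T3": (10.2, 2),
    "T5": (16.6, 3),
    "T8": (14.4, 1),
}

# Net weight = F-weight * 0.4 + MP * 0.6, rounded to two decimals.
net = {term: round(0.4 * fw + 0.6 * mp, 2) for term, (fw, mp) in rows.items()}

# Rank terms from highest to lowest net weight.
ranked = sorted(net, key=net.get, reverse=True)
print(net)     # {'T2': 6.16, 'T3': 5.28, 'T5': 8.44, 'T8': 6.36}
print(ranked)  # ['T5', 'T8', 'T2', 'T3']
```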
Thank You … Questions