Hypergraph Mining for Social Networks
Giacomo Bergami
Università di Bologna
July 17, 2014
Giacomo Bergami (Università di Bologna), Hypergraph Mining for Social Networks, July 17, 2014, 1 / 19
Contents
1 Goals & State of the Art
2 Why Hypergraphs?
3 Data Mining Algebra
4 gSpan: The Original Algorithm, Specialization Proposal
5 Conclusions & Future work
Goals & . . .
- Define a hypergraph data structure for both data and relations.
- Define some algebra operators.
- Open question: how to automate the mining process.
- Evaluations: online social network (OSN) data representation.
- Experiments: graph clustering of the Iris data set.
. . . & State of the Art
Toon Calders’s Data Mining (Relational) Algebra.
Data structures that have already been studied in detail:
- Data mining operations have been developed for both graphs and relational graphs.
- Well-known algorithms (with optimal computational cost) and operators exist.
While hypergraphs may be an intuitive representation of higher-order similarities, it seems (anecdotally at least) that graphs lie at the heart of this problem.
Why Hypergraphs?
[Figure: the values tommy32, Tom S. and (lat1,long1) shown as separate items and then grouped together as a single user entity.]
Data Structures: E_D as data entities’ representation
[Figure: the user entities (tommy32, Tom S., (lat1,long1)) and (sam, Adeola S., (lat2,long2)), and the post entities 47561, 10231, 75234, 80235 and 11230.]
Database with Uncertain Data

Users <: Object (Entities E_D)
UserID  | UserLocation | UserName      | w | ϕ
tommy32 | (lat1,long1) | Tom Sawyer    | 1 | 1000
sam     | (lat2,long2) | Adeola Samuel | 1 | 10002

Posts <: Object (Entities E_D)
PostID | PostLocation | PostContent           | UserID  | w | ϕ
47561  | (lat3,long3) | Have a nice day!      | tommy32 | 1 | 1
80235  | (lat4,long4) | Some great news...    | sam     | 1 | 2
10231  | (NULL,NULL)  | J.S. Bach and ...     | tommy32 | 1 | 3
75234  | (lat5,long5) | Telemann & Xenakis... | tommy32 | 1 | 4
11230  | (NULL,NULL)  | Yet another plot!     | sam     | 1 | 5

Legend: collection, attributes, primary key, foreign key (Posts.UserID references Users.UserID).
Data Structures: E_E as binary relationships’ representation
[Figure: the same user and post entities, now connected by binary relationships, linking tommy32 to posts 10231, 75234 and 47561, sam to post 80235, and including two edges labelled "follows".]
Atomizing the information makes it possible to identify relationships between data automatically (e.g. users’ posts). If we keep binary E_E relations between E_D entities, we can retain a time complexity linear in the size of a binary graph (O(|G|)), where the E_D entities are mapped to G’s vertices.
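A minimal sketch of the linear-time claim, assuming each E_D entity is a row keyed by its primary key and that a foreign key (the UserID column of the posts table) is what identifies an E_E relationship. The names `users`, `posts` and `build_graph`, and the `"posted"` edge label, are hypothetical. A single pass over the rows builds an adjacency list, so the cost is O(|V| + |E|) = O(|G|).

```python
from collections import defaultdict

# E_D entities keyed by primary key (values from the tables above).
users = {"tommy32": "Tom Sawyer", "sam": "Adeola Samuel"}
posts = {  # PostID -> UserID (foreign key)
    47561: "tommy32",
    80235: "sam",
    10231: "tommy32",
    75234: "tommy32",
    11230: "sam",
}

def build_graph(users, posts):
    """Map every E_D entity to a vertex and every foreign-key reference to a
    binary E_E edge. Each row is visited once, hence linear in |G|."""
    vertices = set(users) | set(posts)
    adjacency = defaultdict(list)
    for post_id, user_id in posts.items():
        if user_id in users:  # keep only references to existing users
            adjacency[user_id].append(("posted", post_id))  # hypothetical label
    return vertices, dict(adjacency)

vertices, adjacency = build_graph(users, posts)
print(len(vertices))         # 7
print(adjacency["tommy32"])  # [('posted', 47561), ('posted', 10231), ('posted', 75234)]
```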
DHImp, the final step (DHImp = (db, T))
- The relational database expresses data relations (E_D), while tensors express data correlations (E_E).
- This definition also allows tensors τi ∈ T to be defined for non-binary relations (OSN relations are mainly binary).
- It separates the operations over data from the operations over relations.
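One way to picture DHImp = (db, T) is sketched below, assuming `db` maps table names to rows keyed by primary key and `T` maps each relation name to a sparse tensor stored as a dictionary from entity tuples to weights. The class layout, the `follows` direction and the ternary `sharedTopic` relation are illustrative assumptions, not the author's implementation. Keeping data and relations in separate fields mirrors the separation of data operations from relation operations.

```python
from dataclasses import dataclass, field

@dataclass
class DHImp:
    """Hypothetical sketch of DHImp = (db, T)."""
    # db: table name -> {primary key: row tuple}              (data, E_D)
    db: dict = field(default_factory=dict)
    # T: relation name -> sparse tensor {entity tuple: weight}  (relations, E_E)
    T: dict = field(default_factory=dict)

    def relate(self, relation, *entities, weight=1.0):
        """Store one tensor entry; any arity works, binary is the usual OSN case."""
        self.T.setdefault(relation, {})[tuple(entities)] = weight

    def weight(self, relation, *entities):
        """Missing entries read as 0, as in a sparse tensor."""
        return self.T.get(relation, {}).get(tuple(entities), 0.0)

h = DHImp()
h.db["Users"] = {"tommy32": ("(lat1,long1)", "Tom Sawyer"),
                 "sam": ("(lat2,long2)", "Adeola Samuel")}
h.relate("follows", "tommy32", "sam")
h.relate("sharedTopic", 47561, 10231, 75234, weight=0.5)  # hypothetical ternary relation
print(h.weight("follows", "tommy32", "sam"))  # 1.0
print(h.weight("follows", "sam", "tommy32"))  # 0.0
```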
(Relational) Data Mining Algebra
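[Diagram: the Data (D), Extensional (E) and Intensional (I) worlds, connected by the operators Pop, π_A, π_RDA, κ and λ.]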
Calders’s DMA identifies different “data” categories: the Data World, the Intensional World and the Extensional World. Some internal-world and external-world operations are defined, and data mining algorithms can be described by this algebra.
- Is it possible to map these worlds over our hypergraphs? Yes.
- Is it possible to define an algebra for (weighted and indexed) hypergraphs? Yes.
Is it possible to map these worlds over our hypergraphs?
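[Figure: DHImp seen as a hypergraph, where entity e groups the vertices e_user, e_pos and e_id and entity f groups f_user, f_pos and f_id; next to it, the pure I-hypergraph over {e}, {f} and {e, f}.]
HDM = (h, H, E_L)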
Is it possible to define an algebra for (weighted and indexed) hypergraphs?
Definition (Database operations)
$\mathit{op}(DB) = \{\, \mathit{op}(t) \mid t \in DB \,\}$
$DB_1 \bowtie DB_2 = \{\, t_1 \bowtie t_2 \mid t_1 \in DB_1 \wedge t_2 \in DB_2 \,\}$
Definition (Index-consistency): a unary database operation $\mathit{op}$ is said to be index-consistent iff, for all the tables of the current database, the indices among the tables are kept distinct (similarly for binary operations).
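The two database-level definitions can be read as "lift a table operator to every table" and "join every pair of tables". The sketch below assumes a database is a plain list of tables and a table is a list of row dictionaries; the helper names are hypothetical, and the w and ϕ columns that the slides add are deliberately not modelled.

```python
from itertools import product

# Assumed representation: database = list of tables, table = list of row dicts.
# Bag semantics throughout; the w and phi columns are ignored in this sketch.

def lift_unary(op, db):
    """op(DB) = { op(t) | t in DB }: apply a table operator to every table."""
    return [op(table) for table in db]

def natural_join(t1, t2):
    """Natural join of two tables on the attributes they share."""
    joined = []
    for r1, r2 in product(t1, t2):
        shared = r1.keys() & r2.keys()
        if all(r1[a] == r2[a] for a in shared):
            joined.append({**r1, **r2})
    return joined

def lift_join(db1, db2):
    """DB1 join DB2 = { t1 join t2 | t1 in DB1 and t2 in DB2 }."""
    return [natural_join(t1, t2) for t1, t2 in product(db1, db2)]

users = [{"UserID": "tommy32", "UserName": "Tom Sawyer"},
         {"UserID": "sam", "UserName": "Adeola Samuel"}]
posts = [{"PostID": 47561, "UserID": "tommy32"},
         {"PostID": 80235, "UserID": "sam"},
         {"PostID": 10231, "UserID": "tommy32"}]

def project_user_id(table):
    return [{"UserID": row["UserID"]} for row in table]

projected = lift_unary(project_user_id, [users, posts])
joined = lift_join([users], [posts])
```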
Relational algebra operators over the Data World should be redefined to account for weight updates and indexing properties.
- The tables obtained as a result of algebraic operations are reindexed via dovetailing.
- All the relational algebra operations have to be proved index-consistent.
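The slides do not say which dovetailing scheme is used; the sketch below assumes Cantor's pairing function, which enumerates ℕ × ℕ diagonal by diagonal and is therefore injective. Under that assumption, a joined row built from rows with indices ϕ(t1) and ϕ(t2) can receive the index cantor_pair(ϕ(t1), ϕ(t2)), and distinct source pairs always get distinct indices within the join result. The function names are hypothetical.

```python
from math import isqrt

def cantor_pair(a: int, b: int) -> int:
    """Cantor pairing: walks N x N along the diagonals a + b = 0, 1, 2, ...
    (dovetailing), so distinct pairs always map to distinct integers."""
    s = a + b
    return s * (s + 1) // 2 + b

def cantor_unpair(z: int) -> tuple:
    """Inverse of cantor_pair, recovering the two source indices."""
    w = (isqrt(8 * z + 1) - 1) // 2  # index of the diagonal containing z
    b = z - w * (w + 1) // 2
    return w - b, b

def reindex(pairs):
    """Assign each joined row (phi(t1), phi(t2)) the index cantor_pair(phi(t1), phi(t2))."""
    return {cantor_pair(i, j): (i, j) for i, j in pairs}

# Joining two 5-row tables with indices 1..5 yields 25 distinct new indices.
new_index = reindex([(i, j) for i in range(1, 6) for j in range(1, 6)])
print(len(new_index))     # 25
print(cantor_pair(2, 3))  # 18
```

Because the pairing is invertible, the original indices of both parents can also be recovered from the new one.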
Hypergraph Data Mining Example: H. Clustering via Binary Graph Clustering
[Figure: two graphs over the 150 Iris samples, with vertices numbered 1 to 150.]
Real data (left) vs. Clustered (right) values.
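The slide names the strategy (cluster a hypergraph by clustering a binary graph) without giving the algorithm. The sketch below assumes clique expansion as the hypergraph-to-graph reduction and uses connected components (via union-find) as a deliberately simple stand-in for the unspecified graph clustering step; the function names and toy hyperedges are hypothetical, and this is not the code behind the Iris figure.

```python
from itertools import combinations

def clique_expansion(hyperedges):
    """Reduce a hypergraph to a binary graph: every hyperedge becomes a clique."""
    edges = set()
    for he in hyperedges:
        for u, v in combinations(sorted(he), 2):
            edges.add((u, v))
    return edges

def connected_components(vertices, edges):
    """Stand-in clustering: each connected component of the binary graph is a cluster."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in edges:
        parent[find(u)] = find(v)
    clusters = {}
    for v in vertices:
        clusters.setdefault(find(v), set()).add(v)
    return sorted(clusters.values(), key=min)

# Toy hypergraph: vertex 7 belongs to no hyperedge and stays alone.
clusters = connected_components(range(1, 8), clique_expansion([{1, 2, 3}, {3, 4}, {5, 6}]))
```

Any binary graph clustering algorithm could replace `connected_components` without changing the reduction step.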
Hypergraph Data Operators Example: 2nd Neighbours and Node Degree
<gyankos,Jack Bergus>
<jsbach,Johann Sebastian Bach>
<handel,G F Haendel>
<mozart,W A Mozart>
<faux,P.D.Q. Bach>
[Figure: graph G over these five users; every displayed weight is 1.]
$G'' \leftarrow \mathrm{Calc}_{f(x)=1 \text{ as } Count}\left(\rho_{\{NScreenName \leftarrow ScreenName,\, NUserId \leftarrow UserId\}}(G)\right)$
<Jack Bergus,gyankos,1>
<Johann Sebastian Bach,jsbach,1>
<G F Haendel,handel,1>
<W A Mozart,mozart,1>
<P.D.Q. Bach,faux,1>
[Figure: graph G′′ over the renamed tuples; every displayed weight is 1.]
$G''' \leftarrow \sigma_{P_{e \mapsto w(e) \geq 0.5}}\left(G \bowtie_{x,y \mapsto \phi(x)=\phi(y) \vee T[x,y,hasFriend] \neq 0} G''\right)$
<jsbach,Johann Sebastian Bach,G F Haendel,handel,1>
<jsbach,Johann Sebastian Bach,P.D.Q. Bach,faux,1>
<gyankos,Jack Bergus,Jack Bergus,gyankos,1>
<handel,G F Haendel,G F Haendel,handel,1>
<mozart,W A Mozart,W A Mozart,mozart,1>
<gyankos,Jack Bergus,P.D.Q. Bach,faux,1>
<jsbach,Johann Sebastian Bach,Johann Sebastian Bach,jsbach,1>
<mozart,W A Mozart,G F Haendel,handel,1>
<faux,P.D.Q. Bach,Johann Sebastian Bach,jsbach,1>
<faux,P.D.Q. Bach,P.D.Q. Bach,faux,1>
[Figure: the joined graph G′′′, with displayed edge weights of 0.5 and 1.]
$G''' \leftarrow \sigma_{P_{e \mapsto w(e) \geq 0.25}}\left(\Gamma^{\mathrm{sum}(Count)-1 \text{ as } Count}_{\langle ScreenName,\, UserId \rangle}(G''')\right)$
<W A Mozart,mozart,1>
<Jack Bergus,gyankos,1>
<P.D.Q. Bach,faux,1>
<Johann Sebastian Bach,jsbach,2>
<G F Haendel,handel,0>
[Figure: the resulting graph, with displayed edge weights between 0.25 and 0.75.]
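The rename/Calc, join and grouping steps of this example can be replayed on the slide's five users to check the final counts. This sketch assumes UserId equality as a stand-in for ϕ(x) = ϕ(y), reads the hasFriend entries of T off the joined tuples shown on the slide, and ignores the edge weights and the σ weight filters. The helper names are hypothetical.

```python
# Vertices of G as <UserId, ScreenName>, taken from the slide.
G = [
    ("gyankos", "Jack Bergus"),
    ("jsbach", "Johann Sebastian Bach"),
    ("handel", "G F Haendel"),
    ("mozart", "W A Mozart"),
    ("faux", "P.D.Q. Bach"),
]
# hasFriend entries of T (x, y), inferred from the joined tuples on the slide.
HAS_FRIEND = {("jsbach", "handel"), ("jsbach", "faux"), ("gyankos", "faux"),
              ("mozart", "handel"), ("faux", "jsbach")}

def rename_and_count(g):
    # rho renames ScreenName/UserId to NScreenName/NUserId; Calc adds Count = 1.
    return [{"NScreenName": name, "NUserId": uid, "Count": 1} for uid, name in g]

def join(g, g2):
    # Theta join: same entity (UserId stands in for phi) or T[x, y, hasFriend] != 0.
    return [
        {"UserId": uid, "ScreenName": name, **row}
        for uid, name in g
        for row in g2
        if uid == row["NUserId"] or (uid, row["NUserId"]) in HAS_FRIEND
    ]

def degree(joined):
    # Gamma: group by <ScreenName, UserId>; sum(Count) - 1 drops the self match.
    totals = {}
    for row in joined:
        key = (row["ScreenName"], row["UserId"])
        totals[key] = totals.get(key, 0) + row["Count"]
    return {key: total - 1 for key, total in totals.items()}

joined = join(G, rename_and_count(G))
result = degree(joined)
print(len(joined))                                  # 10
print(result[("Johann Sebastian Bach", "jsbach")])  # 2
```

The ten joined rows and the counts (jsbach 2, handel 0, the others 1) agree with the tuples on the slides.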
gSpan: The Original Algorithm
gSpan is a frequent subgraph mining algorithm that works as follows:

    gSpan(DB, minsupp, Solution) {
        Ξ ← sort(FrequentEdges(DB, minsupp))
        Solution ← Ξ; NStack ← Solution
        while (g ← NStack.pop()) {
            if g ≠ minDfsCode(g) continue;      // prune non-canonical codes
            Solution ← Solution ∪ {g}
            ∀e ∈ Ξ. if (e ⋄re g) ⇒ {          // if e ⋄re g is a rightmost expansion of g by e
                if GS(e ⋄re g, DB) ≥ minsupp    // GS: support of the expanded graph in DB
                    NStack.push(e ⋄re g)
            }
        }
    }

Listing 1: gSpan
gSpan: Specialization Proposal (https://github.com/jackbergus/gSpanExtended)
Suppose we view DHImp = (db, T) as a directed (binary) graph:
$\left(\{\phi(e) \mid t \in db \wedge e \in t\},\ \{(e, f) \mid \exists k.\ T[e, f, k] \neq 0\}\right)$
- We suppose that our HDM contains only one vertex per entity instance. Hence each vertex has a unique label, namely its data representation (D(e)) or its index (ϕ(e)).
- Suppose that the relation label is λ(e) = λ(v, w) = (ϕ(v), T(e), ϕ(w)). Each edge then has a unique label, because each vertex is unique.
- The gSpan algorithm guarantees that the minimal DFS code is unique for each graph; this specialization strengthens that property.
- Subgraph isomorphism over distinct edges and vertices reduces to an approximated ordered-subset test that can be implemented in linear time.
- Under these assumptions, the overall original algorithm has polynomial time complexity.
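To illustrate why unique labels simplify matching, the sketch below reduces each graph to its set of edge labels (ϕ(v), k, ϕ(w)), so a pattern occurs in a graph exactly when its label set is a subset of the graph's. It is a plain level-wise search over connected edge sets, assumed for illustration only: it is neither gSpan's DFS-code enumeration nor the code in the gSpanExtended repository, and all names and toy graphs are hypothetical. Python's frozenset subset test is hash-based rather than the ordered-subset merge mentioned above, but it is likewise linear in the pattern size.

```python
def to_edge_labels(T):
    """Binary view of DHImp: each non-zero entry T[v, w, k] becomes the edge
    label (phi(v), k, phi(w)); labels are unique because vertices are."""
    return frozenset((v, k, w) for (v, w, k), weight in T.items() if weight != 0)

def support(pattern, db):
    # With unique labels, subgraph isomorphism degenerates to a subset test.
    return sum(1 for graph in db if pattern <= graph)

def frequent_patterns(db, minsupp):
    """Grow connected edge sets one frequent edge at a time; with unique labels
    the frozenset itself serves as a canonical code."""
    edges = {e for graph in db for e in graph}
    singles = [e for e in edges if support(frozenset([e]), db) >= minsupp]
    result = {frozenset([e]) for e in singles}
    frontier = set(result)
    while frontier:
        grown = set()
        for pattern in frontier:
            nodes = {x for (v, _, w) in pattern for x in (v, w)}
            for e in singles:
                v, _, w = e
                if e in pattern or (v not in nodes and w not in nodes):
                    continue  # skip repeated or disconnected extensions
                candidate = pattern | {e}
                if candidate not in result and support(candidate, db) >= minsupp:
                    grown.add(candidate)
        result |= grown
        frontier = grown
    return result

g1 = to_edge_labels({("a", "b", "follows"): 1, ("b", "c", "follows"): 1})
g2 = to_edge_labels({("a", "b", "follows"): 1, ("b", "c", "follows"): 1,
                     ("c", "a", "follows"): 1})
g3 = to_edge_labels({("a", "b", "follows"): 1, ("b", "c", "follows"): 0})
patterns = frequent_patterns([g1, g2, g3], minsupp=2)
print(len(patterns))  # 3
```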
Conclusions & Future work
Conclusion
- We used R to experiment with data mining operations. Some Java implementations of the algebra were provided (https://github.com/jackbergus/hypergraphalgebra/).
- Hypergraphs express n-ary relations, and hypergraph operations can change both data and relations.
- Hypergraph problems can be reduced to graph ones.
Future Work
- We could study the time complexity of each data mining operator for hypergraphs and complete the definition of the algebra.
- We could set up an environment in which to execute such hypergraph algebraic operations with distributed or parallel algorithms.