Large Scale Topic Detection using Node-Cut Partitioning on Dense Weighted-Graphs
Kambiz Ghoorchian, Šarūnas Girdzijauskas
• Motivation
• Solution
• Results
• Conclusion
What is a Topic (Trending Topic)?
#ChewbaccaMom #Aylan
#uselections2016
#susanboyle
#Apple
#Wimbledon
#FacebookIsDown
#Superbowl
#Politics
#JobMarket
#StefanLöfven #Sport #Euro2016
#TweetDeck
#FindingDory
#رمضان (#Ramadan)
#IranElection
#Immigration
#Russia
#Trump
Why Are Topics (Trends) Important?
What is Topic Detection?

Given a large number of documents (e.g., tweets), how can we extract the most frequent (significant) topics (trends)?
Current Solutions

• Statistical Topic Modeling
  • Matrix Factorization
  • Latent Dirichlet Allocation (LDA) [1]
  • Hierarchical LDA (HLDA)
• Machine Learning
  1. Document Modeling
     • Vector Modeling
     • Graph Modeling
  2. Topic Detection
     • Unsupervised - Clustering
     • Supervised - Classification

Document-Term:

|    | W1 | W2 | W3 | W4 | … |
|----|----|----|----|----|---|
| D1 | 1  | 0  | 1  | 1  | … |
| D2 | 0  | 1  | 0  | 1  | … |
| D3 | 0  | 0  | 1  | 1  | … |
| Dn | 1  | 1  | 0  | 1  | … |

Word-Topic:

|    | T1   | T2  | T3   | … | Tk   |
|----|------|-----|------|---|------|
| W1 | 0.1  | 0.6 | 0.01 | … | 0.2  |
| W2 | 0.7  | 0.1 | 0.1  | … | 0.02 |
| W3 | 0.01 | 0.1 | 0.4  | … | 0.4  |
| Wm | 0.2  | 0.4 | 0.4  | … | 0.0  |

Document-Topic:

|    | T1   | T2  | T3   | … | Tk   |
|----|------|-----|------|---|------|
| D1 | 0.1  | 0.6 | 0.01 | … | 0.2  |
| D2 | 0.7  | 0.1 | 0.1  | … | 0.02 |
| D3 | 0.01 | 0.1 | 0.4  | … | 0.4  |
| Dn | 0.2  | 0.4 | 0.4  | … | 0.0  |

1. David M. Blei, Andrew Y. Ng, Michael I. Jordan: "Latent Dirichlet Allocation", JMLR 3(Jan):993-1022, 2003.
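For context, the factorization in these tables can be reproduced with any off-the-shelf LDA implementation; the snippet below is a minimal sketch (the toy corpus and the choice of scikit-learn are illustrative assumptions, not part of the presented work):

```python
# Minimal LDA sketch: factor a Document-Term matrix into Document-Topic
# and Word-Topic matrices, as in the tables above. The toy corpus and
# use of scikit-learn are illustrative assumptions.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "apple releases new computer",
    "apple fruit healthy snack",
    "new computer chip released",
    "healthy fruit snack",
]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)        # Document-Term matrix (counts)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topic = lda.fit_transform(X)          # Document-Topic matrix (n_docs x k)
word_topic = lda.components_              # Word-Topic matrix, transposed (k x n_words)

print(doc_topic.shape, word_topic.shape)
```

Each row of `doc_topic` is a distribution over the k topics, and each row of `word_topic` scores every vocabulary word for one topic.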
Limitations

• Sparsity
  • Short messages have less informative co-occurrence patterns, which results in [1]:
    1. False segmentation of topics.
    2. Difficulty identifying ambiguous words (e.g., Apple: computer vs. fruit).
• Dynamism
  • Constant emergence of new phrases and acronyms
    (e.g., Selfie, Unlike, Phablet, IAVS = "I am very sorry", IWSN = "I want sex now").
• Scalability
  • 310M active users/month [2]
  • 500M messages/day [2]

[1] Hong, L. and Davison, B. D.: "Empirical Study of Topic Modeling in Twitter", SOMA 2010.
[2] http://www.statista.com/statistics/282087/number-of-monthly-active-twitter-users/
Solution

Unsupervised learning in two steps:
1. Graph Modeling - documents (D1, D2, D3, …) are combined with a Random Indexing knowledge base (word → RI vector) to build a graph.
2. Node-Cut Partitioning - the resulting graph is partitioned to extract topics.
1 - Graph Modeling using Random Indexing
Random Indexing (RI)

• A dimensionality reduction method (similar to hashing).

Documents:
D1 = {W1, W4, W8, …}
D2, D3, D4, D5, D6, …

Random Indexing Knowledge Base:

| Word | RI Vector |
|------|-----------|
| W1 | V1 = {a1, b1, c1, d1, e1, f1} |
| W4 | V4 = {a4, b4, c4, d4, e4, f4} |
| W8 | V8 = {a8, b8, c8, d8, e8, f8} |
| …  | … |

The RI vectors are:
1. Unique
2. Fixed length
3. Capture the co-occurrence patterns of the words
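A minimal sketch of how such a knowledge base could be built follows; the dimensionality, sparsity, and word-seeded randomness below are illustrative assumptions (real RI systems use vectors of hundreds of dimensions):

```python
# Random Indexing sketch: each word gets a fixed-length, sparse random
# index vector (unique and stable because it is seeded on the word);
# a word's context vector accumulates the index vectors of the words it
# co-occurs with. DIM and NONZERO are toy values, an assumption.
import random

DIM = 8        # fixed vector length
NONZERO = 2    # half the number of non-zero (+1/-1) entries

def index_vector(word, dim=DIM, nonzero=NONZERO):
    """Deterministic sparse ternary index vector for a word."""
    rng = random.Random(word)            # seeding on the word -> unique & stable
    vec = [0] * dim
    for pos in rng.sample(range(dim), 2 * nonzero):
        vec[pos] = rng.choice([-1, 1])
    return vec

def build_knowledge_base(documents):
    """Accumulate a co-occurrence context vector per word."""
    kb = {}
    for words in documents:
        for w in words:
            ctx = kb.setdefault(w, [0] * DIM)
            for other in words:
                if other != w:           # add index vectors of co-occurring words
                    iv = index_vector(other)
                    for i in range(DIM):
                        ctx[i] += iv[i]
    return kb

docs = [["w1", "w4", "w8"], ["w4", "w1", "w3"], ["w3", "w4", "w8"]]
kb = build_knowledge_base(docs)
```

The three slide properties map directly: seeding on the word gives uniqueness, `DIM` gives fixed length, and the accumulated context vectors capture co-occurrence.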
Graph Modeling

Documents:
D1 = {W1, W4, W8, …}
D2 = {W2, W3, W7, …}
D3 = {W4, W1, W3, …}
D4 = {W2, W6, W9, …}
D5 = {W3, W4, W8, …}
D6 = {W1, W3, W7, …}
…

RI Knowledge Base:

| Word | RI Vector |
|------|-----------|
| W1 | V1 = {a1, b1, c1, d1, e1, f1} |
| W2 | V2 = {a2, b2, c2, d2, e2, f2} |
| W3 | V3 = {a3, b3, c3, d3, e3, f3} |
| W4 | V4 = {a4, b4, c4, d4, e4, f4} |
| W5 | V5 = {a5, b5, c5, d5, e5, f5} |
| W6 | V6 = {a6, b6, c6, d6, e6, f6} |
| W7 | V7 = {a7, b7, c7, d7, e7, f7} |
| W8 | V8 = {a8, b8, c8, d8, e8, f8} |
| …  | … |

(Figure: each document's words are replaced by their RI vectors, forming a small graph over the vector dimensions a-f; the per-document graphs are then merged into one dense weighted graph.)
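The merge step can be approximated by accumulating co-occurrence weights; the sketch below uses words as nodes for brevity, whereas the actual method builds the graph over RI-vector dimensions (a simplifying assumption):

```python
# Simplified graph-modeling sketch: merge per-document co-occurrence
# graphs into one weighted graph. Using words as nodes (instead of
# RI-vector dimensions) is a simplifying assumption for illustration.
from itertools import combinations
from collections import Counter

def build_graph(documents):
    """Return {frozenset({u, v}): weight} for all co-occurring pairs."""
    edges = Counter()
    for words in documents:
        for u, v in combinations(sorted(set(words)), 2):
            edges[frozenset((u, v))] += 1    # repeated pairs gain weight
    return edges

docs = [
    ["w1", "w4", "w8"],   # D1
    ["w2", "w3", "w7"],   # D2
    ["w4", "w1", "w3"],   # D3
]
graph = build_graph(docs)
print(graph[frozenset(("w1", "w4"))])  # w1 and w4 co-occur in D1 and D3 -> 2
```

Edges that recur across documents accumulate weight, which is what makes the merged graph dense and weighted.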
2 - Node-Cut Partitioning
Node-Cut Partitioning: Ja-Be-Ja-VC [1]

A balanced, k-way partitioning algorithm for unweighted graphs, based on node-cut minimization.

1. F. Rahimian, A. H. Payberah, S. Girdzijauskas, S. Haridi: "Distributed Vertex-Cut Partitioning", Distributed Applications and Interoperable Systems, 186-200, 2014.
(Figure: with k = 2, nodes are randomly colored Blue (C) or Red (C′); in each iteration, candidate node pairs (e, e′) exchange colors when the HeatGain utility improves, until the cut size is minimized.)
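The swap loop in the figure can be condensed into a toy, centralized version. Note that the real Ja-Be-Ja-VC is distributed, minimizes node cut rather than edge cut, and uses a simulated-annealing HeatGain, so everything below is a simplifying assumption:

```python
# Much-simplified, centralized sketch of swap-based partitioning.
# Ja-Be-Ja-VC itself is distributed, minimizes *node* cut, and uses a
# simulated-annealing HeatGain; here we greedily swap the colors of two
# nodes whenever the swap strictly reduces the plain edge cut. These
# simplifications are assumptions for illustration only.
def cut_size(edges, color):
    """Number of edges whose endpoints lie in different partitions."""
    return sum(1 for u, v in edges if color[u] != color[v])

def partition(nodes, edges, k=2):
    # Balanced initialization: colors assigned round-robin (the slides use
    # random initialization; round-robin keeps this sketch deterministic).
    color = {n: i % k for i, n in enumerate(nodes)}
    improved = True
    while improved:                       # iterate until no swap helps
        improved = False
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                if color[u] == color[v]:
                    continue
                before = cut_size(edges, color)
                color[u], color[v] = color[v], color[u]      # tentative swap
                if cut_size(edges, color) < before:
                    improved = True                          # keep the swap
                else:
                    color[u], color[v] = color[v], color[u]  # undo
    return color

# Two triangles joined by one bridge edge; the best balanced 2-way
# partition cuts only the bridge (c, d).
nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
color = partition(nodes, edges)
print(cut_size(edges, color))  # prints 1: only the bridge is cut
```

Because swaps always exchange the colors of two differently-colored nodes, the partition sizes stay balanced throughout, which is the same invariant the distributed algorithm maintains.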
Modifications

• Same utility function (HeatGain)
• Weighted gain factor
• Weighted cut
(Figure: on an unweighted graph, two candidate cuts both score (5, 5) and cannot be distinguished; on the weighted version of the same graph they score (11, 11) vs. (13, 9), so the weighted gain prefers the cut crossing lighter edges. A further example shows a swap improving a weighted cut from (11, 11) to (12, 10).)

1. Scalability
2. Convergence
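The effect of the weighted-cut modification can be reproduced on a toy graph; the graph and edge weights below are illustrative assumptions, not the slide's exact figure:

```python
# Weighted-cut sketch: two cuts that tie on an unweighted graph are
# distinguished once edge weights are summed. The tiny 5-node cycle and
# its weights are illustrative assumptions.
def cut_weight(edges, partition_of):
    """Sum of weights of edges crossing the partition."""
    return sum(w for u, v, w in edges if partition_of[u] != partition_of[v])

# Edges as (u, v, weight) on a 5-node cycle.
edges = [("a", "b", 1), ("b", "c", 3), ("c", "d", 5),
         ("d", "e", 3), ("e", "a", 1)]

cut1 = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 1}   # crosses b-c and e-a
cut2 = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}   # crosses c-d and e-a

# Unweighted, both cuts cross exactly 2 edges; weighted, they differ:
print(cut_weight(edges, cut1))  # 3 + 1 = 4
print(cut_weight(edges, cut2))  # 5 + 1 = 6
```

A weighted gain factor therefore steers the swaps toward `cut1`, which severs lighter edges, exactly the distinction the unweighted cut size cannot make.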
Experiments

1. Accuracy (Quantitative)
• SNAP Twitter Trending Topics from 2009 [1]
  • EXP1 - 3 topics, 2531 documents, K = 100, Sam = 20%
  • EXP2 - 8 topics, 23175 documents, K = 100, Sam = 20%
2. Scalability (Qualitative)
• TREC Tweets 2011 - 16M tweets [2]
  • EXP3 - 275336 documents

SNAP Twitter 2009:

| Topic | Acronym | EXP1 | EXP2 |
|-------|---------|------|------|
| Harry Potter | HP | 1457 | — |
| American Idol | AI | — | 4241 |
| Dollhouse | DH | — | 1262 |
| Slumdog Millionaire | SM | — | 280 |
| Susan Boyle | SB | 555 | 992 |
| Swine Flu | SF | 519 | 1944 |
| Tiger Woods | TW | — | 2242 |
| TweetDeck | TD | — | 5860 |
| Wimbledon | WI | — | 6354 |

1. https://snap.stanford.edu/data/
2. http://trec.nist.gov/data/tweets/
Experiments

• Comparison
  • GibbsLDA - baseline [1]
  • BiTerm - best known solution [2]

1. David M. Blei, Andrew Y. Ng, Michael I. Jordan: "Latent Dirichlet Allocation", JMLR 3(Jan):993-1022, 2003.
2. Xiaohui Yan, Jiafeng Guo, Yanyan Lan, Xueqi Cheng: "A Biterm Topic Model for Short Texts", WWW '13.
Experiments - Evaluation

• F1-Score (Quantitative): range [0, 1]
• Average Coherence Score (Qualitative): range [log(k/n), log(1 + k/n)] ≈ [-∞, 0.000001]
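The coherence range above corresponds to a UMass-style pairwise score; the slides do not give the exact formula, so the particular variant below is an assumption:

```python
# Sketch of a UMass-style topic coherence score. The slides report
# average coherence values but do not give the exact formula, so this
# variant (log((D(wi, wj) + 1) / D(wi))) is an assumption.
import math
from itertools import combinations

def coherence(top_words, documents):
    """Sum log((D(wi, wj) + 1) / D(wi)) over pairs of topic top-words.

    D(...) counts documents containing all given words; top words are
    assumed to occur in the corpus (so D(wi) > 0).
    """
    docsets = [set(d) for d in documents]
    def df(*words):
        return sum(1 for d in docsets if all(w in d for w in words))
    score = 0.0
    for i, j in combinations(range(len(top_words)), 2):
        wi, wj = top_words[i], top_words[j]
        score += math.log((df(wi, wj) + 1) / df(wi))
    return score

docs = [["apple", "fruit", "snack"], ["apple", "computer", "chip"],
        ["fruit", "snack", "healthy"], ["apple", "fruit"]]

# A strongly co-occurring pair scores near the maximum (0.0)...
print(coherence(["apple", "fruit"], docs))      # 0.0
# ...while a weakly co-occurring pair scores more negative.
print(coherence(["apple", "computer"], docs))   # log(2/3) ≈ -0.405
```

Scores are non-positive in practice and closer to 0 for more coherent topics, matching the negative values in the result tables that follow.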
EXP1 - SNAP 3 Topics - F-Score
(Figure: F-score comparison of LDA, BiTerm, and our algorithm.)

EXP2 - SNAP 8 Topics - F-Score
(Figure: F-score comparison of LDA, BiTerm, and our algorithm.)
EXP3 - TREC Large Dataset - Coherency

• Tweets: 300K • Edges: 7.9M • Vertices: 4000 • Avg_Deg: 3948 • Partitions: 500
• Duration: LDA 1684 s, BiTerm 1973 s, Our Algorithm 7000 s (centralized)

Average Coherence Score - K = 500:

| Num Top Words | 20 | 10 | 5 |
|---------------|----|----|---|
| LDA | -637.75 | -162.96 | -41.52 |
| BiTerm | -597.5 | -143.45 | -34.3 |
| Our Algorithm | -582.0 | -166.15 | -49.59 |
EXP1 - SNAP 3 Topics - Coherency

• Tweets: 2K • Edges: 2.3M • Vertices: 3994 • Avg_Deg: 1175 • Partitions: 100
• Duration: LDA 1.3 s, BiTerm 2 s, Our Algorithm 6000 s (centralized)

Average Coherence Score - K = 100:

| Num Top Words | 20 | 10 | 5 |
|---------------|----|----|---|
| LDA | -37.94 | -15.85 | -5.3 |
| BiTerm | -32.05 | -12.57 | -4.32 |
| Our Algorithm | -20.62 | -9.12 | -3.25 |
EXP2 - SNAP 8 Topics - Coherency

• Tweets: 2K • Edges: 7.5M • Vertices: 4000 • Avg_Deg: 3779 • Partitions: 100
• Duration: LDA 7 s, BiTerm 24 s, Our Algorithm 6000 s (centralized)

Average Coherence Score - K = 100:

| Num Top Words | 20 | 10 | 5 |
|---------------|----|----|---|
| LDA | -162.89 | -52.52 | -13.88 |
| BiTerm | -141.37 | -42.16 | -11.15 |
| Our Algorithm | -124.67 | -37.24 | -9.18 |
Scalability
(Figure: duration growth rate, in percentage.)
Conclusion

• Achievements
  • An efficient and scalable solution for topic detection.
  • Addresses sparsity and dynamism using the RI knowledge base.
  • Achieves scalability using graph partitioning.
• Future work
  • Enhance initialization and language modeling.
  • Extend the algorithm to a streaming model, since graph construction is incremental.
Thank You
Questions?

Bibliography
1. Sahlgren, M. (2005). "An Introduction to Random Indexing". Proceedings of the Methods and Applications of Semantic Indexing Workshop at the 7th International Conference on Terminology and Knowledge Engineering, TKE 2005, August 16, Copenhagen, Denmark.
2. Kanerva, P. (1993). "Sparse Distributed Memory and Related Models". Associative Neural Memories, Oxford University Press.
3. Kanerva, P., Kristoferson, J., and Holst, A. (2000). "Random indexing of text samples for latent semantic analysis". In Gleitman, L. R. and Joshi, A. K., editors, Proceedings of the 22nd Annual Conference of the Cognitive Science Society, page 1036, Mahwah, New Jersey. Erlbaum.
4. Johnson, W. and Lindenstrauss, J. (1984). "Extensions of Lipschitz mappings into a Hilbert space". In Conference on Modern Analysis and Probability, volume 26 of Contemporary Mathematics, pages 189-206. American Mathematical Society.
5. Ghoorchian, K., Rahimian, F., and Girdzijauskas, S. (2015). "Semi-Supervised Multiple Disambiguation". IEEE Trustcom/BigDataSE/ISPA, 88-95.