Clustering Pathways Using Graph Mining Approach

Mahmud Shahriar Hossain, Monika Akbar, Pramodh Pochu, Venkata Sesha Sanagavarapu


Page 2

Design Pipeline

Preprocessor

Frequent Subgraph Discovery

Graph Objects of Pathways

Mined Data

Pathway Clustering

STKE Dataset

NN Search

Pathway Relations

Page 3

Dataset Properties (size)

Total Pathways = 50

[Plot: number of k-edge pathways (y-axis, 0 to 4) against size of pathway, k (x-axis, 0 to 110)]

Page 4

Dataset Properties (size)

Total Pathways = 50

[Plot: number of pathways in size range (y-axis, 0 to 13) against size range (x-axis, bins 0-10, 11-20, 21-30, 31-40, 41-50, 51-60, 61-70, 71-80, 81-90, 91-100, 100-110)]

Page 5

pf-ipf (tf-idf)

Transaction          Items bought
David Lopez          Orange Juice (2), Potato chip (3), Pepsi (1)
Robbie Lamb          Potato chip (3), Pepsi (3), Beer (1)
Jonathan Branden     Potato chip (1), Pepsi (1)
John Paxton          Potato chip (2), Coconut Cookies (2), Pepsi (1)
Rafal Angryk         Swiss Army Knife (15)
Jeannete Radclif     Potato chip (2), Coconut Cookies (3)
Rocky Ross           Orange Juice (2), Coconut Cookies (3)
Richard MaClaster    Coconut Cookies (3), Beer (1)
…                    …
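The slide title pairs pf-ipf with tf-idf, so the shopping table can be scored the way terms are scored in documents. The following is a minimal sketch, assuming the textbook weighting (an item's share of the quantities in a transaction, times the log of total transactions over transactions containing the item). The deck does not state its exact formula, the function name `tf_idf` is hypothetical, and only the eight rows listed on the slide are used.

```python
import math

# Transactions copied from the slide: customer -> {item: quantity}.
transactions = {
    "David Lopez": {"Orange Juice": 2, "Potato chip": 3, "Pepsi": 1},
    "Robbie Lamb": {"Potato chip": 3, "Pepsi": 3, "Beer": 1},
    "Jonathan Branden": {"Potato chip": 1, "Pepsi": 1},
    "John Paxton": {"Potato chip": 2, "Coconut Cookies": 2, "Pepsi": 1},
    "Rafal Angryk": {"Swiss Army Knife": 15},
    "Jeannete Radclif": {"Potato chip": 2, "Coconut Cookies": 3},
    "Rocky Ross": {"Orange Juice": 2, "Coconut Cookies": 3},
    "Richard MaClaster": {"Coconut Cookies": 3, "Beer": 1},
}


def tf_idf(transactions):
    """Score every item in every transaction (assumed textbook tf-idf)."""
    n = len(transactions)
    # Document frequency: number of transactions that contain each item.
    df = {}
    for items in transactions.values():
        for item in items:
            df[item] = df.get(item, 0) + 1
    scores = {}
    for name, items in transactions.items():
        total = sum(items.values())
        scores[name] = {
            item: (qty / total) * math.log(n / df[item])
            for item, qty in items.items()
        }
    return scores


scores = tf_idf(transactions)
for name, items in scores.items():
    top = max(items, key=items.get)
    print(f"{name}: {top} ({items[top]:.3f})")
```

Under this weighting, an item that appears in a single transaction (Swiss Army Knife) receives the highest score, while Potato chip, present in five of the eight listed transactions, is discounted.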

Page 6

Dataset Properties (pf-ipf)

Number of Edges in MPG = 1376

[Plot: number of edges left (y-axis, 0 to 1400) against min_pfipf (x-axis, 0.00 to 0.20)]
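This slide tracks how many of the 1376 MPG edges remain as min_pfipf increases. A minimal sketch of such a threshold filter follows, assuming each edge carries one precomputed pf-ipf score and is kept when that score is at least min_pfipf. The keep rule, the function name `prune_edges`, and the edge scores below are illustrative assumptions, not values from the deck.

```python
def prune_edges(edge_scores, min_pfipf):
    """Keep edges whose score is at least min_pfipf (assumed keep rule)."""
    return {edge: s for edge, s in edge_scores.items() if s >= min_pfipf}


# Hypothetical edges (source, target) with made-up pf-ipf scores.
edge_scores = {
    ("A", "B"): 0.01,
    ("B", "C"): 0.05,
    ("C", "D"): 0.12,
    ("A", "D"): 0.20,
}

for threshold in (0.00, 0.04, 0.10, 0.20):
    kept = prune_edges(edge_scores, threshold)
    print(f"min_pfipf={threshold:.2f}: {len(kept)} edges left")
```

With a threshold of 0.00 nothing is removed, which matches the plot starting from the full edge count.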

Page 7

Dataset Properties (pf-ipf)

Total Pathways=50

[Plot: number of pathways left (y-axis, 20 to 50) against min_pfipf (x-axis, 0.00 to 0.20)]

Page 8

Subgraph Discovery

k      # of Subgraphs generated    Time (sec.)
1      1,376                       Existing
2      5,380                       41
3      29,565                      149
4      187,508                     971
5      1,274,852                   7,518
---    ----                        -----

min_sup=2%

• What is so novel about pruning edges?

Page 9

Subgraph Discovery

Contour graph for number of subgraphs

[Contour plot: number of subgraphs over min_sup (x-axis, 4 to 20) and pf-ipf threshold (y-axis, 0.01 to 0.10); contour levels 1000 to 4000, color scale 0 to 6000]

[Surface plot: number of subgraphs (vertical axis, 0 to 6000) over min_sup (4 to 20) and pf-ipf threshold (0.02 to 0.09)]

Total Run: 10 x 9

Page 10

Subgraph Discovery

minsup = 4.0%, min_tfidf = 0.01

[Plot: time in ms (y-axis, 0 to 400x10^3) against k (x-axis, 3 to 21); series: FSG, SEM]

Page 11

Subgraph Discovery

minsup = 4.0%, min_tfidf = 0.01

[Plot: time in ms (y-axis, 0 to 3000) against k (x-axis, 3 to 21); series: FSG, SEM]

Page 12

Subgraph Discovery

minsup = 4.0%, min_tfidf = 0.01

[Plot: number of attempts (y-axis, 0 to 1,250,000) against k (x-axis, 3 to 21); series: FSG, SEM]

k     Number of Subgraphs    Time Saved (%)    Attempts Saved (%)
2     186                    99.83             98.98
3     246                    98.33             86.15
4     305                    98.57             86.38
5     323                    98.95             86.91
6     313                    98.96             85.64
7     279                    98.88             83.25
8     263                    98.67             78.91
9     292                    98.38             74.76
10    364                    98.58             74.75
11    470                    98.76             78.08
12    608                    99.04             81.84
13    785                    99.22             85.02
14    980                    99.38             87.63
15    1117                   99.48             89.48
16    1075                   99.53             90.26
17    804                    99.51             89.40
18    430                    99.34             85.22
19    141                    98.76             71.22
20    20                     96.15             9.19
21    1                      75.74             -574.47

Overall attempts saved = 89.52%

Overall time saved = 99.39%
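The negative entry at k = 21 is easier to read once the percentage is unpacked. A small sketch follows, assuming "saved" means the relative reduction of SEM against FSG, that is (FSG - SEM) / FSG x 100. The deck does not spell out this definition, and the helper names `percent_saved` and `cost_ratio` are hypothetical.

```python
def percent_saved(baseline, candidate):
    """Relative saving of candidate against baseline, in percent."""
    return (baseline - candidate) / baseline * 100.0


def cost_ratio(saved_percent):
    """Invert percent_saved: candidate cost as a multiple of the baseline."""
    return 1.0 - saved_percent / 100.0


# Attempts saved at k = 21 and k = 2, taken from the table.
print(round(cost_ratio(-574.47), 4))
print(round(cost_ratio(98.98), 4))
```

Under that assumption, SEM made roughly 6.7 times as many attempts as FSG at k = 21, where only one subgraph is found, while at k = 2 it needed about 1% of FSG's attempts.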

Page 13

Clustering

Average SC Mesh plot for 10 clusters using different min_sup and pf-ipf threshold

[Mesh plot: average SC (vertical axis, 0.04 to 0.24) over min_sup (6 to 20) and pf-ipf threshold (0.01 to 0.10); color scale 0.04 to 0.22]

Page 14

Clustering

Average SC Contour Graph for 10 clusters using different min_sup and pf-ipf

[Contour plot: average SC over min_sup (x-axis, 4 to 20) and pf-ipf threshold (y-axis, 0.01 to 0.10); contour levels 0.08 to 0.20]
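Both clustering slides report an average SC. Assuming SC stands for the silhouette coefficient (the deck does not expand the abbreviation), a minimal sketch of its average over a precomputed distance matrix follows. The function name `average_silhouette` is hypothetical, and the points and labels are toy values, not pathway data.

```python
def average_silhouette(dist, labels):
    """Mean silhouette coefficient from a square distance matrix.

    Assumes at least two clusters and at least one non-zero distance.
    """
    n = len(labels)
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    total = 0.0
    for i in range(n):
        own = [j for j in clusters[labels[i]] if j != i]
        if not own:
            continue  # singleton cluster: silhouette taken as 0
        # a: mean distance to the other members of the same cluster.
        a = sum(dist[i][j] for j in own) / len(own)
        # b: smallest mean distance to any other cluster.
        b = min(
            sum(dist[i][j] for j in members) / len(members)
            for lab, members in clusters.items()
            if lab != labels[i]
        )
        total += (b - a) / max(a, b)
    return total / n


# Toy example: four points on a line, two well-separated clusters.
points = [0.0, 1.0, 10.0, 11.0]
labels = [0, 0, 1, 1]
dist = [[abs(p - q) for q in points] for p in points]
print(round(average_silhouette(dist, labels), 4))
```

Values near 1 indicate compact, well-separated clusters, so the averages of roughly 0.04 to 0.24 on the plot axes would correspond to weakly separated clusters under this reading.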

Page 15

Nearest Neighbors

Each bar indicates the time for 100 executions of the NN search for a pathway

[Bar chart: Cover Tree and Brute-force method; time in ms (y-axis, 0 to 16000) for each sample pathway (x-axis, 0 to 25); series: Cover Tree, Brute-force]
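For reference, the brute-force baseline in this comparison amounts to a linear scan over all pathways. A minimal sketch follows, assuming each pathway is represented as a set of subgraph identifiers and compared by Jaccard distance. Both the representation and the distance are illustrative assumptions rather than details from the deck, and the pathway contents are made up.

```python
def jaccard_distance(a, b):
    """1 minus the share of common elements; 0.0 for two empty sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def nearest_neighbor(query, pathways, distance=jaccard_distance):
    """Linear scan: (name, distance) of the closest pathway other than query."""
    best_name, best_dist = None, float("inf")
    for name, features in pathways.items():
        if name == query:
            continue
        d = distance(pathways[query], features)
        if d < best_dist:
            best_name, best_dist = name, d
    return best_name, best_dist


# Hypothetical pathways described by the subgraph ids they contain.
pathways = {
    "p1": {1, 2, 3},
    "p2": {2, 3, 4},
    "p3": {7, 8},
}
print(nearest_neighbor("p1", pathways))
```

Every query touches every pathway, which is the cost a cover tree avoids; a cover tree requires the distance to be a metric, which holds for Jaccard distance.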

Page 16

Pathway Relations (StoryTelling)

Bidirectional Search

[Diagram: bidirectional search with nodes S, p1, p2, p3 on one side and T, p7, p8, p9 on the other]
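The bidirectional search on this slide can be read as two frontiers growing toward each other, one from S and one from T. A minimal sketch of bidirectional breadth-first search over an undirected neighbor graph follows, assuming a story is a chain of pathways connecting S to T. The adjacency list is hypothetical and only reuses the node names from the slide.

```python
from collections import deque


def bidirectional_search(graph, source, target):
    """Return a path from source to target, or None if they are disconnected."""
    if source == target:
        return [source]
    # Parent maps double as visited sets for each direction.
    parents_s, parents_t = {source: None}, {target: None}
    queue_s, queue_t = deque([source]), deque([target])

    def expand(queue, parents, other_parents):
        # Expand one whole BFS level; return the meeting node if frontiers touch.
        for _ in range(len(queue)):
            node = queue.popleft()
            for nxt in graph.get(node, ()):
                if nxt not in parents:
                    parents[nxt] = node
                    if nxt in other_parents:
                        return nxt
                    queue.append(nxt)
        return None

    while queue_s and queue_t:
        meet = expand(queue_s, parents_s, parents_t)
        if meet is None:
            meet = expand(queue_t, parents_t, parents_s)
        if meet is not None:
            # Walk back from the meeting node to the source, then on to the target.
            path, node = [], meet
            while node is not None:
                path.append(node)
                node = parents_s[node]
            path.reverse()
            node = parents_t[meet]
            while node is not None:
                path.append(node)
                node = parents_t[node]
            return path
    return None


# Hypothetical undirected neighbor graph (every edge listed in both directions).
graph = {
    "S": ["p1", "p2"],
    "p1": ["S", "p3"],
    "p2": ["S"],
    "p3": ["p1", "p7"],
    "p7": ["p3", "p8"],
    "p8": ["p7", "T"],
    "p9": ["T"],
    "T": ["p8", "p9"],
}
print(bidirectional_search(graph, "S", "T"))
```

Restricting each node's neighbor list to its b nearest pathways would correspond to the branching factor b varied on the following slides; that reading is an assumption.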

Page 17

Pathway Relations (StoryTelling)

Numbers of varying length stories for different branching factor

[Plot: number of t-length stories (y-axis, 0 to 350) against story length, t (x-axis, 3 to 16); series: b=2, b=4, b=6, b=8]

Page 18

Pathway Relations (StoryTelling)

Numbers of varying length stories for different branching factor

[Plot: number of t-length stories (y-axis, 0 to 350) against story length, t (x-axis, 3 to 16); series: b=2, b=3, b=4, b=5, b=6, b=7, b=8, b=9, b=10]

Page 19

Pathway Relations (StoryTelling)

[Plot: total stories from all pairs (y-axis, 0 to 1000) against branching factor, b (x-axis, 2 to 10)]

[Plot: time to generate all stories in ms (y-axis, 0 to 1.4x10^6) against branching factor, b (x-axis, 2 to 10)]

[Plot: length of the longest story (y-axis, 4 to 16) against branching factor, b (x-axis, 2 to 10)]

Page 20

Questions?