Copyright © 2005 by Limsoon Wong
Convexity in Itemset Spaces
Limsoon Wong, Institute for Infocomm Research
Plan
• Frequent itemsets
  – Convexity
  – Equivalence classes, generators, & closed patterns
  – Plateau representation
  – Efficient mining of generators & closed patterns
• Emerging patterns
• Odds ratio patterns
• Relative risk patterns
Frequent Itemsets
Association Rules
• Buyer’s behaviour in supermarket
• Mgmt are interested in rules such as
Frequent Itemsets
• List of items: ℐ = {a, b, c, d, e, f}
• List of transactions: T = {T1, T2, T3, T4, T5}
  – T1 = {a, c, d}
  – T2 = {b, c, e}
  – T3 = {a, b, c, e, f}
  – T4 = {b, e}
  – T5 = {a, b, c, e}
• For each itemset I ⊆ ℐ, sup(I,T) = |{ Ti ∈ T | I ⊆ Ti}|
• Freq itemsets: FT = F(ms,T) = {I ⊆ ℐ | sup(I,T) ≥ ms}
A Priori Property
• Freq itemsets from our example (ms = 2): all 16 subsets of {a, b, c, e}
• A priori property: I ∈ FT ⇒ ∀I' ⊆ I, I' ∈ FT
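The definitions of sup(I,T) and F(ms,T) can be checked on the five example transactions with a brute-force sketch. This is only an illustration under the assumption that exhaustive enumeration is acceptable for six items; the function names `sup` and `frequent_itemsets` are hypothetical and this is not the mining algorithm discussed later in the talk.

```python
from itertools import combinations

# Items and transactions T1..T5 from the example slide.
ITEMS = "abcdef"
TRANSACTIONS = [set("acd"), set("bce"), set("abcef"), set("be"), set("abce")]


def sup(itemset, transactions):
    """sup(I,T): number of transactions that contain every item of I."""
    return sum(1 for t in transactions if set(itemset) <= t)


def frequent_itemsets(items, transactions, ms):
    """F(ms,T) by exhaustive enumeration (exponential; illustration only)."""
    found = {}
    for size in range(len(items) + 1):
        for combo in combinations(sorted(items), size):
            s = sup(combo, transactions)
            if s >= ms:
                found[frozenset(combo)] = s
    return found


FT = frequent_itemsets(ITEMS, TRANSACTIONS, ms=2)
for itemset in sorted(FT, key=lambda i: (len(i), sorted(i))):
    print("{" + ", ".join(sorted(itemset)) + "}", FT[itemset])
```

With ms = 2 the result contains exactly the subsets of {a, b, c, e}, which is why every subset of a frequent itemset is again frequent in this example.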
Lattice of Freq Itemsets
• FT can be very large
• Is there a concise rep?
• Observation:
  – {a, b, c, e} is maximal
  – { } is minimal
  – everything else is betw them
⇒ Is ⟨{ }, {a, b, c, e}⟩ a concise rep for FT?
Convexity
• An itemset space S is convex if, for all X, Y ∈ S s.t. X ⊆ Y, we have Z ∈ S whenever X ⊆ Z ⊆ Y
• An itemset X is most general in S if there is no proper subset of X in S. These itemsets form the left bound L of S
• An itemset X is most specific in S if there is no proper superset of X in S. These itemsets form the right bound R of S
⇒ ⟨L, R⟩ is a concise rep of S
• [L, R] = { Z | ∃X ∈ L, ∃Y ∈ R, X ⊆ Z ⊆ Y} = S
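A small sketch may help make the border representation concrete. It computes the left and right bounds of a finite itemset space and checks convexity by testing whether [L, R] reproduces the space. The helper names (`subsets`, `borders`, `interval`, `is_convex`) are hypothetical, and the code is a naive illustration rather than an efficient procedure.

```python
from itertools import combinations


def subsets(itemset):
    """All subsets of an itemset, as frozensets."""
    items = sorted(itemset)
    return [frozenset(c) for n in range(len(items) + 1)
            for c in combinations(items, n)]


def borders(space):
    """Left bound L (most general) and right bound R (most specific) of S."""
    left = {x for x in space if not any(y < x for y in space)}
    right = {x for x in space if not any(y > x for y in space)}
    return left, right


def interval(left, right):
    """[L, R]: every Z with X <= Z <= Y for some X in L and some Y in R."""
    return {z for y in right for z in subsets(y) if any(x <= z for x in left)}


def is_convex(space):
    """A finite space S is convex exactly when [L, R] reproduces S."""
    space = set(space)
    return interval(*borders(space)) == space


# The frequent itemsets of the running example at ms = 2.
FT = set(subsets("abce"))
print(borders(FT))
print(is_convex(FT))
# {a} and {a, b, c} without {a, b} or {a, c} in between is not convex.
print(is_convex({frozenset("a"), frozenset("abc")}))
```

The convexity test relies on the fact that every member of a finite space lies between some minimal and some maximal member, so S ⊆ [L, R] always holds and equality is the only thing left to check.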
Convexity of Freq Itemsets
• Proposition 1: The freq itemset space is convex
⇒ ⟨L, R⟩ is a concise rep for a freq itemset space
Is it good enough?
⇒ ⟨{ }, {a, b, c, e}⟩ can be a concise rep for FT
• But we can't get support values for elems in FT
What is a good concise rep?
• A good concise rep for FT should enable these tasks below efficiently, w/o accessing T again:
  – Task 1: Enumerate {I ∈ FT}
  – Task 2: Enumerate {(I, sup(I,T)) | I ∈ FT}
  – Task 3: Given I, decide if I ∈ FT, & if so report sup(I,T)
  – Task 4: Enumerate itemsets w/ sup in a given range
  – etc.
Closed Itemset Rep
• A pattern is a closed pattern if each of its proper supersets has a smaller support than it
• The closed itemset rep of FT is
  CR = {(I, sup(I,T)) | I ∈ FT, I is closed pattern}
• Proposition 2: {(I, sup(I,T)) | I ∈ FT} =
  {(I, max{sup(I',T) | (I', sup(I',T)) ∈ CR, I ⊆ I'}) | I ∈ FT}
⇒ May be inefficient for Tasks 2, 3, 4
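Proposition 2 can be exercised on the running example (T1..T5, ms = 2) with the following sketch. It builds CR by brute force and then recovers sup(I,T) as the maximum support over closed supersets. The function names are hypothetical, and the closedness test uses one-item extensions only, which suffices because support never increases when items are added.

```python
from itertools import combinations

TRANSACTIONS = [frozenset("acd"), frozenset("bce"), frozenset("abcef"),
                frozenset("be"), frozenset("abce")]
ITEMS = sorted(set().union(*TRANSACTIONS))


def sup(itemset):
    """Number of transactions containing the itemset."""
    return sum(1 for t in TRANSACTIONS if itemset <= t)


def frequent(ms):
    """All frequent itemsets with their supports (exhaustive enumeration)."""
    candidates = (frozenset(c) for n in range(len(ITEMS) + 1)
                  for c in combinations(ITEMS, n))
    return {i: sup(i) for i in candidates if sup(i) >= ms}


def closed_rep(ms):
    """CR: frequent itemsets whose every one-item extension has smaller support."""
    return {i: s for i, s in frequent(ms).items()
            if all(sup(i | {x}) < s for x in ITEMS if x not in i)}


def support_from_cr(itemset, cr):
    """Proposition 2: sup(I) is the max support among closed supersets of I."""
    candidates = [s for closed, s in cr.items() if itemset <= closed]
    return max(candidates) if candidates else None  # None: I is not frequent


CR = closed_rep(2)
print(sorted("".join(sorted(i)) for i in CR))
print(support_from_cr(frozenset("ae"), CR))
```

Each lookup scans all of CR, which illustrates why the slide calls this rep potentially inefficient for Tasks 2, 3 and 4.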
Generator Rep
• A pattern is a generator if each of its proper subsets has a larger support than it
• The generator rep of FT is
  GR = ⟨{(I, sup(I,T)) | I ∈ FT, I is generator}, GBd-⟩
  where GBd- are the minimal infrequent itemsets
• Proposition 3: {(I, sup(I,T)) | I ∈ FT} =
  {(I, min{sup(I',T) | I' ∈ GR, I' ⊆ I}) | I ∈ FT}
⇒ May be inefficient for Tasks 2, 3, 4
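The generator rep and Proposition 3 can be illustrated in the same brute-force style on the running example (T1..T5, ms = 2). The sketch assumes the negative border GBd- is used only to reject infrequent queries; all function names are hypothetical.

```python
from itertools import combinations

TRANSACTIONS = [frozenset("acd"), frozenset("bce"), frozenset("abcef"),
                frozenset("be"), frozenset("abce")]
ITEMS = sorted(set().union(*TRANSACTIONS))


def sup(itemset):
    """Number of transactions containing the itemset."""
    return sum(1 for t in TRANSACTIONS if itemset <= t)


def all_itemsets():
    return [frozenset(c) for n in range(len(ITEMS) + 1)
            for c in combinations(ITEMS, n)]


def generator_rep(ms):
    """GR: frequent generators with supports, plus the negative border GBd-."""
    ft = {i: sup(i) for i in all_itemsets() if sup(i) >= ms}
    # Generator: dropping any single item strictly increases the support.
    gens = {i: s for i, s in ft.items()
            if all(sup(i - {x}) > s for x in i)}
    # GBd-: infrequent itemsets whose one-item-smaller subsets are all frequent.
    neg_border = {i for i in all_itemsets()
                  if i not in ft and all((i - {x}) in ft for x in i)}
    return gens, neg_border


def support_from_gr(itemset, gens, neg_border):
    """Proposition 3: sup(I) is the min support among generators inside I."""
    if any(b <= itemset for b in neg_border):
        return None  # contains a minimal infrequent itemset, so not frequent
    return min(s for g, s in gens.items() if g <= itemset)


GENS, NEG = generator_rep(2)
print(sorted("".join(sorted(g)) for g in GENS))
print(sorted("".join(sorted(b)) for b in NEG))
print(support_from_gr(frozenset("abce"), GENS, NEG))
```

As with the closed rep, each query scans the whole generator set, which is the inefficiency the slide points out.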
Freq Itemset Plateaus
• Decompose freq itemset lattice into plateaus wrt itemset support, S = ∪i Pi,
  with Pi = {I ∈ S | sup(I,T) = i}
• Proposition 6: Each Pi is convex
⇒ S = ∪i [Li, Ri], where [Li, Ri] = Pi
From Generators & Closed Patterns To Equivalence Classes
• The equivalence class of an itemset I is
  [I]T = { I' | {Ti ∈ T | I' ⊆ Ti} = {Tj ∈ T | I ⊆ Tj}}
• Proposition 4: [I]T is convex. Furthermore, if [L,R] = [I]T, then L = min [I]T, and R = max [I]T is a singleton
• Proposition 5:
  – An itemset I is a generator iff I ∈ min [I]T
  – An itemset I is a closed pattern iff I ∈ max [I]T
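Propositions 4 and 5 can be observed directly by grouping the frequent itemsets of the running example (T1..T5, ms = 2) by the set of transactions that contain them. The sketch below is a brute-force illustration with hypothetical helper names: the minimal members of each class are its generators and the single maximal member is its closed pattern.

```python
from collections import defaultdict
from itertools import combinations

TRANSACTIONS = [frozenset("acd"), frozenset("bce"), frozenset("abcef"),
                frozenset("be"), frozenset("abce")]
ITEMS = sorted(set().union(*TRANSACTIONS))


def tidset(itemset):
    """Indices of the transactions that contain the itemset."""
    return frozenset(k for k, t in enumerate(TRANSACTIONS) if itemset <= t)


def equivalence_classes(ms):
    """Group frequent itemsets that occur in exactly the same transactions."""
    classes = defaultdict(set)
    for n in range(len(ITEMS) + 1):
        for combo in combinations(ITEMS, n):
            itemset = frozenset(combo)
            tids = tidset(itemset)
            if len(tids) >= ms:
                classes[tids].add(itemset)
    return dict(classes)


def minimal(space):
    return {x for x in space if not any(y < x for y in space)}


def maximal(space):
    return {x for x in space if not any(y > x for y in space)}


CLASSES = equivalence_classes(2)
for tids, members in CLASSES.items():
    gens = sorted("".join(sorted(g)) or "{}" for g in minimal(members))
    closed = ["".join(sorted(c)) or "{}" for c in maximal(members)]
    print(sorted(tids), "generators:", gens, "closed:", closed)
```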
Plateaus = Generators + Closed Patterns
• Theorem 7:
  Let [Li,Ri] = Pi be a freq itemset plateau of FT. Then
  – Pi = [X1]T ∪ … ∪ [Xk]T, where Ri = {X1, …, Xk}
  – Ri are the closed patterns in Pi
  – Li = ∪j min [Xj]T are the generators in Pi
Freq Itemset Plateau Rep
• The freq itemset plateau rep of FT is
  PR = {(Li, Ri, i) | i ≥ ms}
  where [Li,Ri] is plateau at support level i in FT
• Proposition 8: {(I, sup(I,T)) | I ∈ FT} =
  {(I, i) | (Li, Ri, i) ∈ PR, ∃X ∈ Li, ∃Y ∈ Ri, X ⊆ I ⊆ Y}
⇒ All 4 tasks are obviously efficient
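The plateau rep and the lookup in Proposition 8 can be sketched as follows for the running example (T1..T5, ms = 2). The plateaus are built by brute force here, which is an assumption made for illustration and not the mining method of the talk; `plateau_rep` and `lookup` are hypothetical names.

```python
from collections import defaultdict
from itertools import combinations

TRANSACTIONS = [frozenset("acd"), frozenset("bce"), frozenset("abcef"),
                frozenset("be"), frozenset("abce")]
ITEMS = sorted(set().union(*TRANSACTIONS))


def sup(itemset):
    return sum(1 for t in TRANSACTIONS if itemset <= t)


def plateau_rep(ms):
    """PR: one (Li, Ri, i) triple per support level i >= ms."""
    by_support = defaultdict(set)
    for n in range(len(ITEMS) + 1):
        for combo in combinations(ITEMS, n):
            itemset = frozenset(combo)
            s = sup(itemset)
            if s >= ms:
                by_support[s].add(itemset)
    rep = []
    for s in sorted(by_support):
        plateau = by_support[s]
        left = {x for x in plateau if not any(y < x for y in plateau)}
        right = {x for x in plateau if not any(y > x for y in plateau)}
        rep.append((left, right, s))
    return rep


def lookup(itemset, rep):
    """Proposition 8 (Task 3): find the plateau whose borders enclose I."""
    for left, right, s in rep:
        if any(x <= itemset for x in left) and any(itemset <= y for y in right):
            return s
    return None  # I is not frequent


PR = plateau_rep(2)
for left, right, s in PR:
    print(s, sorted("".join(sorted(x)) or "{}" for x in left),
          sorted("".join(sorted(y)) or "{}" for y in right))
print(lookup(frozenset("bc"), PR))
```

Task 4 (itemsets with support in a given range) reduces to selecting the triples whose level i falls in the range and expanding their intervals.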
Remarks
• PR is a good concise rep for freq itemsets
• PR is more flexible compared to other reps
• PR unifies diff notions used in data mining
• Nice ... But can we mine PR fast?
Mining PR Fast
• To mine PR fast, mine its borders fast
• To mine its borders fast, mine equiv classes in the plateau fast
• To mine equiv classes fast, mine generators & closed patterns of equivalence classes fast
From SE-Tree To Trie To FP-Tree
[Figure: SE-tree of possible itemsets over {a, b, c, d} (root { }; nodes a, b, c, d, ab, ac, ad, bc, bd, cd, abc, abd, acd, bcd, abcd); trie of transactions T1 = {a,c,d}, T2 = {b,c,d}, T3 = {a,b,c,d}, T4 = {a,d}; FP-tree head table a, b, c, d]
<1: right-to-left, top-to-bottom traversal of SE-tree
GC-growth: Fast Simultaneous Mining of Generators & Closed Patterns
Step 1: FP-tree construction
Step 2: Right-to-left, top-to-bottom traversal
Step 5: Confirm Xi is generator
Proposition 9: Generators enjoy the apriori property. That is, every subset of a generator is also a generator
Step 7: Find closed pattern of Xi
Proposition 10: Let X be a generator. Then the closed pattern of X is ∩{X'' | X' ∈ H[last(X)], X ⊆ X', X' prefix of X'', T[X''] = true}.
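Steps 5 and 7 can be mimicked without an FP-tree. The sketch below is a naive stand-in and not GC-growth itself: it scans the example transactions T1..T5 from the earlier slides directly instead of walking the head table H, it takes the closed pattern of X to be the intersection of all transactions containing X, and it tests the generator property by dropping one item at a time. The names `closure` and `is_generator` are hypothetical.

```python
from itertools import combinations

TRANSACTIONS = [frozenset("acd"), frozenset("bce"), frozenset("abcef"),
                frozenset("be"), frozenset("abce")]


def sup(itemset):
    return sum(1 for t in TRANSACTIONS if itemset <= t)


def closure(itemset):
    """Closed pattern of X: intersection of all transactions containing X."""
    covering = [t for t in TRANSACTIONS if itemset <= t]
    if not covering:
        return None  # X occurs in no transaction
    return frozenset.intersection(*covering)


def is_generator(itemset):
    """X is a generator if removing any one item strictly raises the support."""
    s = sup(itemset)
    return all(sup(itemset - {x}) > s for x in itemset)


x = frozenset("ab")
print(is_generator(x), "".join(sorted(closure(x))))

# Proposition 9 on this example: every subset of the generator {a, b}
# is itself a generator.
print(all(is_generator(frozenset(c))
          for n in range(len(x) + 1) for c in combinations(sorted(x), n)))
```

The apriori property of generators is what lets a search prune every extension of a non-generator, in the same way frequent-itemset mining prunes extensions of infrequent itemsets.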
Correctness of GC-growth
• Theorem 11: GC-growth is sound and complete for mining generators and closed patterns
Performance of GC-growth
• GC-growth mines both generators and closed patterns
• Yet it is comparable in speed to the fastest algorithms that mine only closed patterns
Emerging Patterns
Differentiation and Contrast
[Figure: EPs contrasting edible mushrooms vs poisonous mushrooms; support labels x% and 0%]
Example: {odor=none, gill_size=broad, ring_number=1} 64% (edible) vs 0% (poisonous)
Emerging Patterns
• An emerging pattern is a set of conditions
  – usually involving several features
  – that most members of a class P satisfy
  – but none or few of the other class N satisfy
⇒ I is emerging pattern if sup(I,P) / sup(I,N) > k, for some fixed threshold k
NB: For this talk, we restrict ourselves to “jumping” emerging patterns
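The ratio test can be sketched as below. Several things here are assumptions made for illustration: supports are taken as fractions of each class so that classes of different sizes are comparable, a pattern with zero support in N and positive support in P is treated as having an infinite ratio (the "jumping" case), and the toy records are invented rather than taken from the mushroom data.

```python
import math


def rel_sup(pattern, transactions):
    """Fraction of transactions that contain the pattern."""
    return sum(1 for t in transactions if pattern <= t) / len(transactions)


def growth_rate(pattern, pos, neg):
    """sup(I,P) / sup(I,N), with the jumping case mapped to infinity."""
    sp, sn = rel_sup(pattern, pos), rel_sup(pattern, neg)
    if sn == 0:
        return math.inf if sp > 0 else 0.0
    return sp / sn


def is_emerging(pattern, pos, neg, k):
    return growth_rate(pattern, pos, neg) > k


def is_jumping(pattern, pos, neg):
    """Jumping EP: occurs in P but never in N."""
    return rel_sup(pattern, pos) > 0 and rel_sup(pattern, neg) == 0


# Hypothetical toy records (not real data).
POS = [frozenset({"odor=none", "gill=broad"}),
       frozenset({"odor=none", "gill=broad"}),
       frozenset({"odor=none", "gill=narrow"})]
NEG = [frozenset({"odor=foul", "gill=broad"}),
       frozenset({"odor=none", "gill=narrow"})]

print(growth_rate(frozenset({"odor=none"}), POS, NEG))
print(is_jumping(frozenset({"odor=none", "gill=broad"}), POS, NEG))
```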
Convexity of Emerging Patterns
• Theorem 12:
  Let E be an EP space and Pi = { I ∈ E | sup(I) = i}. Then E = ∪i Pi, E is convex, and each Pi is convex. That is, E can be decomposed into convex plateaus
EP Plateau Rep
• A concise rep for E = ∪i Pi is EP plateau rep:
  EP_PR = { (Li, Ri, i) | [Li, Ri] = Pi}
• Proposition 13: {(I, sup(I)) | I ∈ E} =
  { (I, i) | (Li, Ri, i) ∈ EP_PR, ∃X ∈ Li, ∃Y ∈ Ri, X ⊆ I ⊆ Y}
⇒ All 4 tasks are obviously efficient
Efficient Mining of EP_PR
• Modify GC-growth so that for each equiv class C, it outputs its support in +ve transactions Spos[C] & in -ve transactions Sneg[C]
• Then [R[C], C] are emerging patterns if Spos[C] / Sneg[C] > k
NB. Assume the threshold for EP is k
Odds Ratio Patterns
[Figure: EPs contrasting edible mushrooms vs poisonous mushrooms; support labels x% and 0%]
Example: {odor=none, gill_size=broad, ring_number=1} 64% (edible) vs 0% (poisonous)
Is an emerging pattern that is absent in most of the positive transactions a “real” pattern? What if the 64% were 4%? 0.4%? 0.04%?
Odds Ratio
• Odds ratio for a (compound) factor P in a case-control study D is
  OR(P,D) = (P_{D,ed} / P_{D,-d}) / (P_{D,e-} / P_{D,--})
⇒ P is an odds ratio pattern if OR(P,D) > k, for some threshold k
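The formula can be turned into a small calculation. The reading of the subscripts is an assumption based on the usual case-control 2x2 table: `ed` counts subjects with the factor and the disease, `-d` without the factor but with the disease, `e-` with the factor and no disease, and `--` with neither. The function names and the toy records are hypothetical.

```python
import math


def contingency(pattern, records):
    """Counts (ed, nd, en, nn) for a pattern over (itemset, diseased) records."""
    ed = nd = en = nn = 0
    for items, diseased in records:
        exposed = pattern <= items
        if exposed and diseased:
            ed += 1          # P_{D,ed}
        elif diseased:
            nd += 1          # P_{D,-d}
        elif exposed:
            en += 1          # P_{D,e-}
        else:
            nn += 1          # P_{D,--}
    return ed, nd, en, nn


def odds_ratio(ed, nd, en, nn):
    """(ed / nd) / (en / nn), computed as (ed * nn) / (nd * en)."""
    num, den = ed * nn, nd * en
    if den == 0:
        return math.inf if num > 0 else math.nan
    return num / den


# Hypothetical toy study: (features of a subject, has the disease?).
RECORDS = [(frozenset("ab"), True), (frozenset("ab"), True),
           (frozenset("a"), False), (frozenset("b"), True),
           (frozenset("c"), False), (frozenset("c"), False)]

counts = contingency(frozenset("a"), RECORDS)
print(counts, odds_ratio(*counts))
```

Cross-multiplying avoids dividing by a zero cell in the intermediate ratios; when the denominator product is zero the sketch returns infinity or NaN rather than raising.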
Nonconvexity of Odds Ratio Pattern Space
• Proposition 14: Let S^k_OR(ms,D) = { P ∈ F(ms,D) | OR(P,D) ≥ k}. Then S^k_OR(ms,D) is not convex
Convexity of Odds Ratio Pattern Space Plateaus
• Theorem 15:
  Let S^{n,k}_OR(ms,D) = { P ∈ F(ms,D) | P_{D,ed} = n, OR(P,D) ≥ k}. Then S^{n,k}_OR(ms,D) is convex
The space of odds ratio patterns is not convex in general, but becomes convex when stratified into plateaus based on support levels
The space of odds ratio patterns can be concisely represented by plateau borders
Efficient Mining of Odds Ratio Pattern Space Plateaus
How to find these fast is key!
GC-growth can find these fast :-)
Performance
• FPClose* and CLOSET+
  – closed patterns only
• Our method computes
  – closed patterns
  – generators, and
  – odds ratio patterns (OR > 2.5)
⇒ Patterns that are much more statistically sophisticated than frequent patterns can now be mined efficiently
Relative Risk Patterns
Relative Risk
• Relative risk for a (compound) factor P in a prospective study D is
P is a relative risk pattern if RR(P,D) > k, for some threshold k
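The formula for RR(P,D) did not survive in this transcript, so the sketch below falls back on the standard epidemiological definition as an assumption: the disease rate among subjects with the factor divided by the disease rate among subjects without it. The argument names follow the same hypothetical 2x2 counts as for the odds ratio (`ed`, `nd`, `en`, `nn`) and may not match the notation of the original slide.

```python
import math


def relative_risk(ed, nd, en, nn):
    """[ed / (ed + en)] / [nd / (nd + nn)], standard definition (assumed).

    ed: factor present, diseased      en: factor present, not diseased
    nd: factor absent, diseased       nn: factor absent, not diseased
    """
    exposed, unexposed = ed + en, nd + nn
    # Cross-multiplied form avoids intermediate divisions by zero.
    num, den = ed * unexposed, nd * exposed
    if den == 0:
        return math.inf if num > 0 else math.nan
    return num / den


def is_relative_risk_pattern(ed, nd, en, nn, k):
    """P is a relative risk pattern if RR(P,D) > k."""
    return relative_risk(ed, nd, en, nn) > k


# Hypothetical counts: 30 of 50 subjects with the factor are diseased,
# 10 of 50 subjects without it are diseased.
print(relative_risk(30, 10, 20, 40))
print(is_relative_risk_pattern(30, 10, 20, 40, k=2.5))
```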
Nonconvexity of Relative Risk Pattern Space
• Proposition 16: Let S^k_RR(ms,D) = { P ∈ F(ms,D) | RR(P,D) ≥ k}. Then S^k_RR(ms,D) is not convex
Convexity of Relative Risk Pattern Space Plateaus
• Theorem 17:
  Let S^{n,k}_RR(ms,D) = { P ∈ F(ms,D) | P_{D,ed} = n, RR(P,D) ≥ k}. Then S^{n,k}_RR(ms,D) is convex
The space of relative risk patterns is not convex in general, but becomes convex when stratified into plateaus based on support levels
The space of relative risk patterns can be concisely represented by plateau borders
Efficient Mining of Relative Risk Pattern Space Plateaus
How to find these fast is key!
GC-growth can find these fast :-)
x := RR(R,D);
Concluding Remarks
• Equiv classes & plateaus are fundamental in
  – Frequent itemsets
  – Emerging patterns
  – Odds ratio patterns
  – Relative risk patterns, ...
• Equiv classes & plateaus of these complex patterns are convex spaces
⇒ Complex pattern spaces are concisely representable by borders
⇒ Complex pattern spaces can be efficiently and completely mined
Future Works
Improve Implementations
• Impact of item ordering
• Impact of pushing complex statistical filters deeper into equivalence class generators
• Modular pattern mining by construction of a fast equiv class generator and multiple statistical condition filters
[Figure: Generate borders of equiv classes & support levels, feeding Test for odds ratio, Test for relative risk, Test for χ²]
Apply to Classification
• Develop classifiers based on the mined patterns
  – Simple ensemble
  – PCL
• Impact on accuracy of using generators vs closed patterns
f(X) = argmax_{c ∈ C} Σ_{r ∈ Rc, r > 50% accuracy} r(X)
Enrich Data Mining Foundations
• Increase statistical sophistication of patterns mined
• Increase dimensions and size of data handled
Acknowledgements
• Haiquan Li
• Jinyan Li
• Mengling Feng
• Yap Peng Tan