LCM: An Efficient Algorithm for Enumerating Frequent Closed Item Sets
Linear time Closed itemset Miner
Takeaki Uno (National Institute of Informatics)
Tatsuya Asai (Kyushu University)
Hiroaki Arimura (Kyushu University)
Yuzo Uchida (Kyushu University)
19/Nov/2003 FIMI 2003
Motivation
- We want to solve difficult problems in a short time
[Figure: map of the benchmark datasets. Labels: "small supports", "large supports", "few solutions for small support", "many solutions even for large support", "#closed sets = #freq. sets", "#closed sets << #freq. sets". Datasets shown: retail, accidents, IBMdatas, chess, connect, mushroom, kosarak, pumsb*, pumsb, BMS-POS, BMS-web1,2.]
Techniques marked on the figure:
- database reduction, removing infrequent items
- sparse/dense (occ-deliv/diffsets)
- exact enumeration of closed item sets
- generation of all/maximal item sets from closed item sets
Outline of Our Research
- Exact enumeration of closed item sets
(no sophisticated pruning, no post-processing, and no memory for
obtained closed item sets)
- Enumerate all/maximal frequent item sets using closed item sets
- Algorithms for updating occurrences and checking maximality
in dense/sparse cases, and their adaptive hybrid
- Save additional memory use
(right first sweep, adjacency matrix only for large transactions)
Exact Enumeration of Closed Item Sets
- Introduce an acyclic parent-child relationship on freq. closed sets
( it induces a tree-shaped traversal route )
- Traverse the route in a depth-first manner
( find a child, and go to it )
- Exact enumeration (time linear in #closed sets)
- Any child is found by taking a closure (in short time)
- No need to store obtained item sets (small memory)
- Can enumerate all closed item sets (even without min. support)
Definition of Parent
- root (= φ)
- X : closed item set
- Closure = maximal item set with the same occurrences
- parent of X = closure of X ∩ {1,…,i},
where i is the maximum s.t. X ≠ closure of X ∩ {1,…,i}
- parent of X ⊆ X, so the relation is acyclic
- X' = child of X ⇔ X' is the closure of X ∪ {i} for some i,
and (cond) X' \ X includes no item < i
- All children are found by taking the closure of X ∪ {i}
- (cond) can be checked in short time by using some algorithms
[Figure: a closed item set X and its child X']
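The definitions above can be tried out on a toy database. The following is a minimal Python sketch, not the LCM implementation: transactions are assumed to be `frozenset`s of integer items numbered from 1, and the function names (`occurrences`, `closure`, `parent`, `enumerate_closed`) are made up for illustration. The test `i <= core` is not stated on the slide; it is added here as an assumption (only items larger than the one that produced X are tried) so that each closed set is reached from its parent only.

```python
def occurrences(itemset, transactions):
    """Indices of the transactions that contain every item of `itemset`."""
    return [t for t, items in enumerate(transactions) if itemset <= items]


def closure(itemset, transactions):
    """Maximal item set with the same occurrences (their intersection).

    Assumes `itemset` occurs in at least one transaction.
    """
    occ = occurrences(itemset, transactions)
    return frozenset.intersection(*(transactions[t] for t in occ))


def parent(X, transactions):
    """Closure of the longest prefix of X whose closure differs from X."""
    items = sorted(X)
    for m in range(len(items) - 1, -1, -1):
        c = closure(frozenset(items[:m]), transactions)
        if c != X:
            return c
    return None  # X is the root


def enumerate_closed(transactions, minsup=1):
    """Depth-first walk; returns {closed set: the set it was reached from}."""
    all_items = sorted(frozenset().union(*transactions))
    tree = {}

    def visit(X, core, came_from):
        tree[X] = came_from  # nothing else about earlier sets is consulted
        for i in all_items:
            if i <= core or i in X:  # assumption: only try items above `core`
                continue
            if len(occurrences(X | {i}, transactions)) < minsup:
                continue
            child = closure(X | {i}, transactions)
            if not any(j < i for j in child - X):  # (cond)
                visit(child, i, X)

    visit(closure(frozenset(), transactions), 0, None)
    return tree


T = [frozenset({1, 2, 3}), frozenset({2, 3}), frozenset({1})]
for closed_set, reached_from in enumerate_closed(T).items():
    print(sorted(closed_set), "<-", reached_from and sorted(reached_from))
```

On this database the set {1,2,3} passes (cond) both from {1} with i = 2 and from {2,3} with i = 1; the `core` test keeps only the first, which is the parent given by the definition.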
Computation of Occurrences of X ∪ {i} for Sparse and Dense Cases
- In the sparse case, by tracing the items of each occurrence of X
(occurrence deliver : maybe a known technique)
- In the dense case, use diffsets (proposed by Zaki)
Adaptive Hybrid Algorithm
- In each iteration, we choose the better one according to estimates of the computation time
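To make the two ways of obtaining the occurrences of X ∪ {i} concrete, here is a small Python sketch. It is an illustration under assumptions, not LCM's code: the names `occurrence_deliver`, `diffset` and `extend_diffset` are hypothetical, transactions are plain Python sets, and the cost estimates used by the hybrid are not given on the slide and are not modelled here.

```python
def occurrence_deliver(occ, transactions):
    """Sparse case: scan each occurrence of X once and append its index
    to the bucket of every item it contains.

    Afterwards buckets[i] is the occurrence list of X ∪ {i}.
    """
    buckets = {}
    for t in occ:
        for i in transactions[t]:
            buckets.setdefault(i, []).append(t)
    return buckets


def diffset(occ, occ_i):
    """Dense case: occurrences of X that are NOT occurrences of X ∪ {i}."""
    return set(occ) - set(occ_i)


def extend_diffset(d_i, d_j):
    """Diffset of X ∪ {i, j} relative to the occurrences of X ∪ {i}."""
    return d_j - d_i


transactions = [{1, 2, 3}, {1, 2}, {1, 3}, {1, 2, 3}]
occ_X = [0, 1, 2, 3]  # occurrences of X = {1}
buckets = occurrence_deliver(occ_X, transactions)

d2 = diffset(occ_X, buckets[2])  # small when the data is dense
d3 = diffset(occ_X, buckets[3])
support_123 = len(buckets[2]) - len(extend_diffset(d2, d3))
print(buckets[2], buckets[3], support_123)
```

With diffsets the support of X ∪ {i, j} comes from a set difference of two short lists instead of an intersection of two long ones, which is why they suit dense data.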
Maximal and All Frequent Sets
- Maximal frequent sets are generated from closed item sets
- All frequent sets (hypercube decomposition)
-- decompose classes of closed item sets into complete sublattices
-- enumerate pairs of greatest/least elements of the sublattices
-- generate the others from the pairs
[Figure: 01 lattice from 000…0 to 111…1; labels: "closed item set", "class"]
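The last step of the hypercube decomposition, generating the other sets from a greatest/least pair, can be sketched as follows. This only illustrates that one step under the assumption that every set between the least and the greatest element belongs to the sublattice; how LCM finds the pairs is not shown, and `expand_hypercube` is a hypothetical name.

```python
from itertools import combinations


def expand_hypercube(least, greatest):
    """Yield every item set S with least ⊆ S ⊆ greatest."""
    free = sorted(greatest - least)  # items that may be present or absent
    for r in range(len(free) + 1):
        for extra in combinations(free, r):
            yield frozenset(least) | frozenset(extra)


for s in expand_hypercube({1}, {1, 2, 3}):
    print(sorted(s))
```

A pair whose elements differ in k items stands for 2^k item sets, so the pairs are a compact description of the full output.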
Result
[Figure: map of the benchmark datasets (retail, accidents, IBMdatas, chess, connect, mushroom, kosarak, pumsb*, pumsb, BMS-POS, BMS-web1,2) between small supports and large supports, with regions marked "fast", "fast if support is small", "fast or usual", and "slower than others".]
Conclusion
- For data sets s.t. #freq. closed sets << #freq. sets
- large business datasets: BMS-web1,2, retail
- machine learning datasets with small supports: UCI repository
exact enumeration of closed item sets and
hypercube decomposition perform well
(fast without pruning, trie, or other existing methods)
For further speed-up
- These techniques are orthogonal to other techniques
( database reduction, pruning infrequent items, … ),
so we can do better for large supports / accidents (blue area).
- The parameter of the hybrid is not tuned:
not fast for kosarak, IBMdatas (now faster)
We think…
● What is the real problem (bottleneck)?
---- Mining structured item sets
(closed item sets, association rules with threshold, … )
● Is it only a counting problem?
---- For all frequent item set mining, yes.
The problem is how to make the occurrences of an item set
from other item sets (choose best way, represent
● Is the maximal item set useful?
---- The closed item set is useful!!
It has applications in classification and association rule mining
Some Observations
Pruning of infrequent sets really necessary?
- Computing occurrences for infrequent items from X
- Usually, < 1/2. Really need to prune?
[Figure: frequency of X, X ∪ {1}, X ∪ {2}, X ∪ {3}, X ∪ {4}, X ∪ {5}]
Need for accelerating occurrence computation?
- Almost all computation is for updating occurrences
- There is a best e to get the occurrences of X from X - e
- Can we design an algorithm choosing e in each iteration? How do we find this e? Does this accelerate? ( we can evaluate the lower bound of occurrence computation )
Right First Sweep
- Generate recursive calls in decreasing order of items
- Clear the memory after each recursive call
- Re-use the memory in the following recursive calls
Child iterations need no memory
[Figure: occurrence lists (transactions A to E) held for X ∪ {10}, X ∪ {11}, X ∪ {12}, X ∪ {13}, X ∪ {14}]
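A possible reading of the right first sweep, as a hedged Python sketch: one array of occurrence lists is shared by all recursion levels, items are processed from the largest down, and each list is cleared as soon as its recursive call returns. For brevity the sketch enumerates all frequent item sets rather than closed ones, items are assumed to be integers `0..n_items-1`, and `frequent_itemsets` is a hypothetical name.

```python
def frequent_itemsets(transactions, n_items, minsup):
    """All non-empty frequent item sets, found by backtracking."""
    buckets = [[] for _ in range(n_items)]  # shared by every recursion level
    found = []

    def recurse(itemset, occ, tail):
        # On entry, buckets[j] is empty for every item j > tail.
        for t in occ:  # fill the lists of the larger items
            for j in transactions[t]:
                if j > tail:
                    buckets[j].append(t)
        for i in range(n_items - 1, tail, -1):  # decreasing order of items
            if len(buckets[i]) >= minsup:
                found.append(itemset + [i])
                # The call only writes to buckets[j] with j > i, which were
                # already processed and cleared at this level.
                recurse(itemset + [i], buckets[i], i)
            buckets[i].clear()  # free the list for later calls

    recurse([], range(len(transactions)), -1)
    return found


db = [[0, 1, 2], [0, 1], [0, 2], [1, 2]]
print(frequent_itemsets(db, n_items=3, minsup=2))
```

Because the call for item i only touches lists of items larger than i, the lists still waiting at the current level (items smaller than i) are never overwritten, and no per-level copy is needed.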
Occurrence Deliver
- Compute T(X ∪ {i}) by tracing each occurrence of X
- In sparse cases, fast
[Figure: occurrences A, B, C, D, E of X delivered to the lists of X ∪ {10}, X ∪ {11}, X ∪ {12}, X ∪ {13}, X ∪ {14}]
Checking (cond) of Closure
- Check (cond): (closure of X ∪ {i}) \ X includes no item < i
- In the sparse case, find an occurrence not including j,
for all possible items j
- In the dense case, update the occurrences of all frequent X ∪ {j},
and compute T(X ∪ {i} ∪ {j})
Much faster than computing the closure of X ∪ {i}
[Figure: occurrences A, B, C, … compared across X ∪ {1}, X ∪ {2}, …, X ∪ {i}, …, X ∪ {14}]
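The sparse-case check can be sketched in a few lines of Python. This is an illustration under assumptions rather than LCM's routine: items are integers numbered from 1 as on the slides, transactions are sets, `occ_i` is the non-empty occurrence list of X ∪ {i}, and `cond_holds` is a hypothetical name.

```python
def cond_holds(X, i, occ_i, transactions):
    """True iff (closure of X ∪ {i}) minus X contains no item j < i."""
    for j in range(1, i):
        if j in X:
            continue
        # j belongs to the closure only if EVERY occurrence contains j,
        # so a single occurrence without j is enough to rule it out.
        if all(j in transactions[t] for t in occ_i):
            return False
    return True


db = [{1, 2, 3}, {2, 3}, {1}]
print(cond_holds({1}, 3, [0], db))       # item 2 < 3 is in every occurrence
print(cond_holds(set(), 2, [0, 1], db))  # transaction 1 lacks item 1
```

The scan for an item j stops at the first occurrence that lacks j, and items larger than i are never examined, so the full closure does not have to be built just to reject a candidate.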
Results (all / closed / maximal)
[Figure: running time (sec, log scale) versus minsup (%) in four panels: connect (minsup 95 to 30), IBM T10I4D100K (minsup 0.15 to 0.025), BMS-WebView-2 (minsup 0.1 to 0.01), BMS-WebView-1 (minsup 0.1 to 0.01). Programs compared: LCMfreq, LCM, LCMmax, fpgrowth, fp_eclat, fp_apriori, mafia_fi, mafia_fci, mafia_mfi.]