1 Mining Sequential Patterns with Constraints in Large Database Jian Pei, Jiawei Han,Wei Wang Proc....
1
Mining Sequential Patterns with Constraints in Large Database
Jian Pei, Jiawei Han, Wei Wang. Proc. of the 2002 IEEE International Conference on Data Mining (ICDM’02)
Adviser: Jia-Ling Koh Speaker: Yu-ting Kung
2
Introduction
In past studies, two problems remain:
1. Many practical constraints are not covered.
2. There is no systematic method for pushing various constraints into the mining process.
In this paper: we develop a framework, prefix-growth, built on a new property called prefix-monotone. Under this framework, the constraints can be pushed deep into sequential pattern mining effectively and efficiently.
3
Categories of constraints
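1. Item constraints
$C_{item}(\alpha) \equiv (\varphi i : 1 \le i \le len(\alpha), \alpha[i] \,\theta\, V)$, or
$C_{item}(\alpha) \equiv (\varphi i : 1 \le i \le len(\alpha), \alpha[i] \cap V \neq \emptyset)$,
where V is a subset of items, $\varphi \in \{\forall, \exists\}$ and $\theta \in \{\subseteq, \not\subseteq, \supseteq, \not\supseteq, \in, \notin\}$.
For example: $C_{bookstore}(\alpha) \equiv (\forall i : 1 \le i \le len(\alpha), \alpha[i] \subseteq B)$, where B is the set of books.
2. Length constraint
The length of a pattern is the number of transactions or the number of item occurrences in it. For example: $C_{len}(\alpha) \equiv (len(\alpha) \ge 50)$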
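The two constraint families above are plain predicates over a sequence, so they can be written as small Boolean functions. The sketch below assumes a sequence is a Python list of `frozenset` itemsets and measures length in transactions. The names `item_constraint`, `length_constraint`, the `phi`/`theta` parameters and the book titles in `B` are all hypothetical, chosen only for illustration:

```python
from typing import Callable, FrozenSet, List

# Hypothetical representation: a sequence is a list of itemsets (transactions),
# and len(alpha) counts transactions.
Sequence = List[FrozenSet[str]]
Relation = Callable[[FrozenSet[str], FrozenSet[str]], bool]


def item_constraint(alpha: Sequence, V: FrozenSet[str], phi: str = "forall",
                    theta: Relation = lambda e, v: e <= v) -> bool:
    """C_item(alpha) = (phi i : 1 <= i <= len(alpha), alpha[i] theta V).

    phi is "forall" or "exists"; theta defaults to subset (alpha[i] is a subset of V).
    """
    checks = (theta(itemset, V) for itemset in alpha)
    return all(checks) if phi == "forall" else any(checks)


def length_constraint(alpha: Sequence, min_len: int) -> bool:
    """C_len(alpha) = (len(alpha) >= min_len), length counted in transactions."""
    return len(alpha) >= min_len


B = frozenset({"novel", "atlas", "cookbook"})  # hypothetical set of books
alpha = [frozenset({"novel"}), frozenset({"atlas", "cookbook"})]
print(item_constraint(alpha, B))     # True: every transaction is a subset of B
print(length_constraint(alpha, 50))  # False: only 2 transactions
# The alternative form (alpha[i] intersects V) is just another theta:
overlaps: Relation = lambda e, v: bool(e & v)
print(item_constraint([frozenset({"pen", "atlas"})], B, "exists", overlaps))  # True
```

Passing the relation in as a function mirrors the role of θ in the definition. Each choice of θ and φ gives one member of the item-constraint family.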
4
Categories of constraints (Cont.)
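3. Super-pattern constraint
$C_{pat}(\alpha) \equiv (\exists \gamma \in P \text{ s.t. } \gamma \sqsubseteq \alpha)$, where P is a given set of patterns.
For example: $C_{pat}(\alpha) \equiv (\langle (PC)(digital\_camera) \rangle \sqsubseteq \alpha)$, i.e., buying a PC and later a digital camera.
4. Aggregate constraint
Aggregate functions: sum, avg, max, min, etc. For example: we want sequential patterns in which the average price of all the items in the pattern is over $100.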
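Checking a super-pattern constraint comes down to a subsequence-containment test (γ ⊑ α). An aggregate constraint is just an aggregate compared against a threshold. The sketch below assumes the same list-of-`frozenset` representation. It uses a greedy earliest-match containment test, which is a standard way to decide ⊑ for itemset sequences. The prices and function names are hypothetical:

```python
from typing import Dict, FrozenSet, List

Sequence = List[FrozenSet[str]]  # hypothetical: list of itemsets in time order


def is_subsequence(gamma: Sequence, alpha: Sequence) -> bool:
    """gamma ⊑ alpha: the itemsets of gamma are contained, in order, in distinct itemsets of alpha."""
    j = 0
    for itemset in alpha:
        # Greedily match the next itemset of gamma to the earliest itemset that contains it.
        if j < len(gamma) and gamma[j] <= itemset:
            j += 1
    return j == len(gamma)


def super_pattern_constraint(alpha: Sequence, P: List[Sequence]) -> bool:
    """C_pat(alpha) = (exists gamma in P s.t. gamma ⊑ alpha)."""
    return any(is_subsequence(gamma, alpha) for gamma in P)


def avg_price(alpha: Sequence, price: Dict[str, float]) -> float:
    """Average price over all item occurrences in alpha."""
    values = [price[item] for itemset in alpha for item in itemset]
    return sum(values) / len(values)


PRICE = {"PC": 900.0, "digital_camera": 300.0, "battery": 5.0}  # hypothetical prices
P = [[frozenset({"PC"}), frozenset({"digital_camera"})]]
alpha = [frozenset({"PC"}), frozenset({"battery"}), frozenset({"digital_camera"})]
print(super_pattern_constraint(alpha, P))  # True
print(avg_price(alpha, PRICE) > 100)       # True: (900 + 5 + 300) / 3 is about 401.67
```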
5
Categories of constraints (Cont.)
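5. Regular expression constraints
Constraints specified as a regular expression. For example: Travel (New York | New York City) (Hotels | Hotels and Motels | Lodging)
6. Duration constraints
A sequence α is a sequential pattern in SDB under the duration constraint Dur(α) ≤ t if and only if
$|\{\beta \in SDB \mid \exists\, 1 \le i_1 < \cdots < i_{len(\alpha)} \le len(\beta) \text{ s.t. } \alpha[1] \subseteq \beta[i_1], \ldots, \alpha[len(\alpha)] \subseteq \beta[i_{len(\alpha)}], \text{ and } (\beta[i_{len(\alpha)}].time - \beta[i_1].time) \le t\}| \ge min\_sup$
7. Gap constraints
For example: find purchasing patterns such that “the gap between each pair of consecutive purchases is less than 1 month”.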
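Under a duration constraint, support only counts data sequences that contain an embedding of α whose first and last matched transactions lie within t of each other. The sketch below computes that support. It assumes each data sequence is a time-ordered list of `(time, itemset)` pairs. For every candidate first match it completes the embedding as early as possible, which minimises the end time for that start. All names and data are hypothetical, and gap constraints are not covered:

```python
from typing import FrozenSet, List, Tuple

Pattern = List[FrozenSet[str]]
# Hypothetical representation: a data sequence is a time-ordered list of (time, itemset).
TimedSequence = List[Tuple[float, FrozenSet[str]]]


def embeds_within(alpha: Pattern, beta: TimedSequence, t: float) -> bool:
    """Is there i_1 < ... < i_len(alpha) with alpha[k] ⊆ beta[i_k] and
    beta[i_len(alpha)].time - beta[i_1].time <= t?"""
    if not alpha:
        return True
    for start, (t0, itemset) in enumerate(beta):
        if not alpha[0] <= itemset:
            continue
        # For a fixed i_1, matching the rest as early as possible minimises the end time.
        j, last_time = 1, t0
        for time, other in beta[start + 1:]:
            if j == len(alpha):
                break
            if alpha[j] <= other:
                j, last_time = j + 1, time
        if j == len(alpha) and last_time - t0 <= t:
            return True
    return False


def is_pattern_under_duration(alpha: Pattern, sdb: List[TimedSequence],
                              t: float, min_sup: int) -> bool:
    """Support counts sequences with at least one embedding of duration <= t."""
    support = sum(1 for beta in sdb if embeds_within(alpha, beta, t))
    return support >= min_sup


fs = frozenset
sdb = [  # hypothetical data; times in days
    [(1, fs("a")), (5, fs("b")), (20, fs("c"))],
    [(2, fs("ab")), (3, fs("c"))],
    [(1, fs("a")), (40, fs("c"))],
]
alpha = [fs("a"), fs("c")]
print(is_pattern_under_duration(alpha, sdb, t=10, min_sup=2))  # False: only sequence 2
print(is_pattern_under_duration(alpha, sdb, t=30, min_sup=2))  # True: sequences 1 and 2
```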
6
Characterization of constraints
Anti-monotonic: a constraint C_A is anti-monotonic if, whenever a sequence α satisfies C_A, every non-empty subsequence of α also satisfies C_A. For example: Dur(α) < 3.
Monotonic: a constraint C_M is monotonic if, whenever a sequence α satisfies C_M, every super-sequence of α also satisfies C_M. For example: len(α) ≥ 10, super-pattern constraints.
Succinct constraints. For example: item constraints.
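A brute-force check on a single small sequence can illustrate the two definitions. Note that such a check can refute a property but never prove it. The sketch below assumes time-stamped transactions and forms subsequences by dropping whole transactions; dropping items inside a transaction would not change the duration. It uses len(α) ≥ 3 as a smaller stand-in for the slide's len(α) ≥ 10. All names are hypothetical:

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterator, List, Tuple

# Hypothetical representation: a time-ordered list of (time, itemset) transactions.
TimedSequence = List[Tuple[int, FrozenSet[str]]]
Constraint = Callable[[TimedSequence], bool]


def duration(alpha: TimedSequence) -> int:
    """Time between the first and last transactions."""
    return alpha[-1][0] - alpha[0][0]


def subsequences(alpha: TimedSequence) -> Iterator[TimedSequence]:
    """Non-empty subsequences obtained by dropping whole transactions."""
    for k in range(1, len(alpha) + 1):
        for idx in combinations(range(len(alpha)), k):
            yield [alpha[i] for i in idx]


def holds_on_all_subsequences(c: Constraint, alpha: TimedSequence) -> bool:
    """Anti-monotonicity test on one instance: every subsequence must satisfy c."""
    return all(c(s) for s in subsequences(alpha))


dur_lt_3: Constraint = lambda s: duration(s) < 3
len_ge_3: Constraint = lambda s: len(s) >= 3
alpha = [(0, frozenset("a")), (1, frozenset("bc")), (2, frozenset("d"))]
print(dur_lt_3(alpha), holds_on_all_subsequences(dur_lt_3, alpha))  # True True
print(len_ge_3(alpha), holds_on_all_subsequences(len_ge_3, alpha))  # True False
# len(alpha) >= 3 is monotonic instead: adding a transaction keeps it satisfied.
print(len_ge_3(alpha + [(5, frozenset("e"))]))                      # True
```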
7
Characterization of constraints (Cont.)
8
Prefix-Monotone Property
Prefix anti-monotonic: a constraint C is prefix anti-monotonic if, for each sequence α satisfying C, every prefix of α also satisfies C.
Prefix monotonic: a constraint C is prefix monotonic if, for each sequence α satisfying C, every sequence having α as a prefix also satisfies C.
A constraint is called prefix-monotone if it is prefix anti-monotonic or prefix monotonic.
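The two prefix properties can be probed on concrete instances in the same way. The sketch below simplifies by taking only transaction-level prefixes `alpha[:k]`; true sequential-pattern prefixes may also cut inside the last itemset. It checks max(α) ≤ 40, which is anti-monotonic and therefore prefix anti-monotonic, and a hypothetical "starts with a" constraint, which is prefix monotonic. Item values and function names are assumptions for illustration:

```python
from typing import Callable, FrozenSet, List

Sequence = List[FrozenSet[str]]
Constraint = Callable[[Sequence], bool]


def prefixes(alpha: Sequence) -> List[Sequence]:
    """Transaction-level prefixes alpha[:1], ..., alpha[:len(alpha)] (simplification)."""
    return [alpha[:k] for k in range(1, len(alpha) + 1)]


def prefix_anti_monotone_on(c: Constraint, alpha: Sequence) -> bool:
    """If alpha satisfies c, every prefix of alpha must satisfy c."""
    return not c(alpha) or all(c(p) for p in prefixes(alpha))


def prefix_monotone_on(c: Constraint, alpha: Sequence, suffixes: List[Sequence]) -> bool:
    """If alpha satisfies c, every sequence alpha + suffix (alpha as a prefix) must satisfy c."""
    return not c(alpha) or all(c(alpha + s) for s in suffixes)


VALUE = {"a": 10, "b": 50, "c": 30}  # hypothetical item values
max_le_40: Constraint = lambda s: max(VALUE[x] for e in s for x in e) <= 40
starts_with_a: Constraint = lambda s: bool(s) and "a" in s[0]

alpha = [frozenset("a"), frozenset("c")]
suffixes = [[frozenset("b")], [frozenset("bc"), frozenset("a")]]
print(prefix_anti_monotone_on(max_le_40, alpha))           # True
print(prefix_monotone_on(starts_with_a, alpha, suffixes))  # True
print(prefix_monotone_on(max_le_40, alpha, suffixes))      # False: appending b breaks it
```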
9
Theorem
All the commonly used constraints discussed above, except for g_sum and average (avg), have the prefix-monotone property.
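A three-item counterexample shows why avg is excluded: a constraint such as avg(α) ≤ 25 is neither prefix anti-monotonic nor prefix monotonic. The item values below are hypothetical and chosen to make both failures visible:

```python
from typing import FrozenSet, List

Sequence = List[FrozenSet[str]]
VALUE = {"a": 10, "b": 50, "c": 10}  # hypothetical item values


def avg(alpha: Sequence) -> float:
    """Average value over all item occurrences in alpha."""
    values = [VALUE[x] for itemset in alpha for x in itemset]
    return sum(values) / len(values)


def c_avg(alpha: Sequence) -> bool:
    """C(alpha) = avg(alpha) <= 25."""
    return avg(alpha) <= 25


abc = [frozenset("a"), frozenset("b"), frozenset("c")]
a, ab = abc[:1], abc[:2]
# <abc> satisfies C (70/3, about 23.3) but its prefix <ab> does not (30): not prefix anti-monotonic.
print(c_avg(abc), c_avg(ab))  # True False
# <a> satisfies C (10) but <ab>, which has <a> as a prefix, does not: not prefix monotonic.
print(c_avg(a), c_avg(ab))    # True False
```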
10
Push Prefix-Monotone Constraints into Sequential Pattern Mining
Regular expression constraint: $C \equiv a\{bb \mid (bc)d \mid dd\}$
min_sup = 2
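One way to push a regular expression constraint is to keep growing a prefix only while it is still the prefix of some sequence the expression accepts. The sketch below reads the slide's constraint as a{bb | (bc)d | dd}. Because that language is finite and tiny, it lists the three accepted sequences explicitly instead of building an automaton. This brute-force encoding and every name in it are assumptions; prefix-growth's actual mechanism may differ. Items inside an itemset are assumed to be in alphabetical order:

```python
from typing import FrozenSet, List

Sequence = List[FrozenSet[str]]


def seq(*itemsets: str) -> Sequence:
    """seq("a", "bc", "d") builds <a(bc)d>, one string per itemset."""
    return [frozenset(s) for s in itemsets]


# Hypothetical encoding: C = a{bb | (bc)d | dd} written out as its accepted sequences.
ACCEPTED = [seq("a", "b", "b"), seq("a", "bc", "d"), seq("a", "d", "d")]


def is_prefix(beta: Sequence, alpha: Sequence) -> bool:
    """beta is a prefix of alpha: equal itemsets except the last one, which may be a
    subset whose missing items all sort after its own items."""
    m = len(beta)
    if m == 0 or m > len(alpha):
        return m == 0
    if beta[:m - 1] != alpha[:m - 1]:
        return False
    last, target = beta[-1], alpha[m - 1]
    rest = target - last
    return last <= target and (not rest or (bool(last) and min(rest) > max(last)))


def satisfies(alpha: Sequence) -> bool:
    return alpha in ACCEPTED


def can_grow(beta: Sequence) -> bool:
    """Pruning test: can beta still be extended into a pattern accepted by C?"""
    return any(is_prefix(beta, acc) for acc in ACCEPTED)


print(can_grow(seq("a")), can_grow(seq("b")))                   # True False
print(can_grow(seq("a", "b")), satisfies(seq("a", "bc", "d")))  # True True
```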
11
Push Prefix-Monotone Constraints into Sequential Pattern Mining (Cont.)
Mining steps:
1. Find the length-1 sequential patterns and remove irrelevant sequences. Patterns <a>, <b>, <c>, <d>, <e> are identified as length-1 patterns; the infrequent item <f> is removed. The sequence with S_id = 10 is removed because it fails the constraint.
2. Divide the set of sequential patterns into non-overlapping subsets: those with prefix <a>, <b>, <c>, <d>, and <e>. The subsets with prefixes <b>, <c>, <d>, and <e> are pruned, since no pattern starting with those items can satisfy C.
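Step 1 can be sketched as one counting pass followed by a filtering pass. The database below is hypothetical and is not the one on the slide. The rule for dropping a sequence is also an assumption: because every pattern satisfying C starts with a, a sequence with no a left after removing infrequent items cannot support any such pattern:

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Set

Sequence = List[FrozenSet[str]]


def seq(*itemsets: str) -> Sequence:
    return [frozenset(s) for s in itemsets]


def frequent_items(sdb: Dict[int, Sequence], min_sup: int) -> Set[str]:
    """Length-1 sequential patterns: items contained in at least min_sup sequences."""
    counts: Counter = Counter()
    for s in sdb.values():
        counts.update(set().union(*s))  # count each item once per sequence
    return {item for item, c in counts.items() if c >= min_sup}


def reduce_db(sdb: Dict[int, Sequence], frequent: Set[str], first: str) -> Dict[int, Sequence]:
    """Drop infrequent items, then drop sequences that cannot hold a pattern starting with `first`."""
    reduced = {}
    for sid, s in sdb.items():
        kept = [itemset & frequent for itemset in s]
        kept = [itemset for itemset in kept if itemset]
        if any(first in itemset for itemset in kept):
            reduced[sid] = kept
    return reduced


sdb = {  # hypothetical database, not the one on the slide
    10: seq("b", "ce", "f"),
    20: seq("ab", "bc", "d", "d"),
    30: seq("ae", "abc", "d", "d"),
    40: seq("a", "d", "d", "c", "b"),
}
freq = frequent_items(sdb, min_sup=2)
print(sorted(freq))                             # ['a', 'b', 'c', 'd', 'e']
print(sorted(reduce_db(sdb, freq, first="a")))  # [20, 30, 40]
```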
12
Push Prefix-Monotone Constraints into Sequential Pattern Mining (Cont.)
3. Construct the <a>-projected database and mine it: SDB|<a> = {<(_b)(bc)dd>, <(_e)(abc)(dd)>, <ddcb>}. The locally frequent items that also satisfy the constraint give the prefixes <ab>, <ac>, and <ad>.
4. Recursive mining: to mine patterns with prefixes <ab>, <ac>, and <ad>, form their projected databases and mine them recursively.
5. Final patterns output: {<a(bc)d>, <add>}
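Steps 3 and 4 rely on projected databases. The sketch below builds an <item>-projected database in a PrefixSpan-like style, which is an assumption about how prefix-growth projects. It keeps the postfix after the first occurrence of the item, with any later items of the same itemset held as the "(_...)" part. It then counts local supports: `x` stands for the extension <item x>, and `("_", x)` for <(item x)>. The input sequences are hypothetical, and the constraint filter from step 3 is not applied here:

```python
from collections import Counter
from typing import FrozenSet, List, Tuple

Sequence = List[FrozenSet[str]]
# A projected (postfix) sequence: the "(_...)" partial itemset plus the remaining itemsets.
Postfix = Tuple[FrozenSet[str], Sequence]


def seq(*itemsets: str) -> Sequence:
    return [frozenset(s) for s in itemsets]


def project(sdb: List[Sequence], item: str) -> List[Postfix]:
    """<item>-projected database: the postfix after the first occurrence of item."""
    out = []
    for s in sdb:
        for pos, itemset in enumerate(s):
            if item in itemset:
                partial = frozenset(x for x in itemset if x > item)  # the "(_...)" part
                out.append((partial, s[pos + 1:]))
                break
    return out


def local_counts(projected: List[Postfix], item: str) -> Counter:
    """Support of each extension: x means <item x>, ("_", x) means <(item x)>."""
    counts: Counter = Counter()
    for partial, rest in projected:
        seen = {("_", x) for x in partial}
        for itemset in rest:
            seen.update(itemset)
            if item in itemset:  # a later itemset containing item can also extend it
                seen.update(("_", x) for x in itemset if x > item)
        counts.update(seen)  # count once per postfix
    return counts


sdb = [seq("ab", "bc", "d", "d"), seq("ae", "abc", "d", "d"), seq("a", "d", "d", "c", "b")]
proj = project(sdb, "a")
counts = local_counts(proj, "a")
print(sorted(k for k, c in counts.items() if c >= 2 and isinstance(k, str)))  # ['b', 'c', 'd']
```

Recursion (step 4) would repeat the same projection on each surviving extension, growing the prefix by one item each time.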
13
Handling tough aggregate constraints
Constraint: $C \equiv avg(\alpha) \le 25$, min_sup = 2.
An item i is called a small item if its value i.value ≤ 25; otherwise, it is called a big item.
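The small/big item split at value 25 supports at least one simple pruning rule. If every item in a sequence is big, any pattern drawn from it averages above 25, so that sequence cannot support a pattern with average at most 25. The sketch below shows only this elementary consequence, not the paper's full technique for tough constraints. Item values and names are hypothetical:

```python
from typing import FrozenSet, List

Sequence = List[FrozenSet[str]]
VALUE = {"a": 10, "b": 40, "c": 20, "d": 60}  # hypothetical item values
THRESHOLD = 25


def is_small(item: str) -> bool:
    """Small item: value <= 25; otherwise big."""
    return VALUE[item] <= THRESHOLD


def can_hold_avg_le(s: Sequence) -> bool:
    """If every item in s is big, any pattern drawn from s averages above 25,
    so s cannot support a pattern satisfying avg <= 25."""
    return any(is_small(x) for itemset in s for x in itemset)


print(sorted(x for x in VALUE if is_small(x)))             # ['a', 'c']
print(can_hold_avg_le([frozenset("bd"), frozenset("d")]))  # False: only big items
print(can_hold_avg_le([frozenset("b"), frozenset("a")]))   # True
```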
14
Experimental results
Efficiency of mining sequential patterns without constraints
15
Experimental results (Cont.)
Efficiency of mining sequential patterns with constraints: capability of GSP and prefix-growth in pushing the anti-monotonic constraint Dur(α) ≤ t
16
Experimental results (Cont.)
Experimental results on mining with regular expression constraint
17
Experimental results (Cont.)
Scalability of prefix-growth with constraint avg(α) ≤ v
Number of projected databases in prefix-growth with constraint avg(α) ≤ v
18
Experimental results (Cont.)
Scalability of prefix-growth w.r.t. support threshold
19
Experimental results (Cont.)
Scalability of prefix-growth w.r.t. database size
20
Conclusion
The prefix-monotone property covers many commonly used constraints.
Experimental results and the performance study show that prefix-growth is efficient and scalable in mining large databases.