CLOPE: a Fast and Effective Clustering Algorithm for Transactional Data
![Page 1: CLOPE: a Fast and Effective Clustering Algorithm for Transactional Data](https://reader036.fdocuments.in/reader036/viewer/2022062309/568137f2550346895d9faeb1/html5/thumbnails/1.jpg)
CLOPE: a Fast and Effective Clustering Algorithm for Transactional Data
Advisor: Dr. Hsu
Graduate: Sheng-Hsuan Wang
Authors: Yiling Yang, Xudong Guan, Jinyuan You
![Page 2: CLOPE: a Fast and Effective Clustering Algorithm for Transactional Data](https://reader036.fdocuments.in/reader036/viewer/2022062309/568137f2550346895d9faeb1/html5/thumbnails/2.jpg)
Outline
- Motivation
- Objective
- Introduction
- Clustering With SLOPE
- Implementation
- Experiments
- Conclusions
- Personal opinion
![Page 3: CLOPE: a Fast and Effective Clustering Algorithm for Transactional Data](https://reader036.fdocuments.in/reader036/viewer/2022062309/568137f2550346895d9faeb1/html5/thumbnails/3.jpg)
Motivation
This paper studies the problem of categorical data clustering, especially for transactional data characterized by high dimensionality and large volume.
![Page 4: CLOPE: a Fast and Effective Clustering Algorithm for Transactional Data](https://reader036.fdocuments.in/reader036/viewer/2022062309/568137f2550346895d9faeb1/html5/thumbnails/4.jpg)
Objective
To present CLOPE, a fast and effective clustering algorithm for transactional data.
![Page 5: CLOPE: a Fast and Effective Clustering Algorithm for Transactional Data](https://reader036.fdocuments.in/reader036/viewer/2022062309/568137f2550346895d9faeb1/html5/thumbnails/5.jpg)
Introduction
Clustering is an important data mining technique.
- Objective: to group data into sets so that intra-cluster similarity is maximized and inter-cluster similarity is minimized.

The basic idea behind CLOPE:
- It uses a global criterion function that tries to increase the intra-cluster overlapping of transaction items by increasing the height-to-width ratio of the cluster histogram.
- A parameter controls the tightness of the clusters.
![Page 6: CLOPE: a Fast and Effective Clustering Algorithm for Transactional Data](https://reader036.fdocuments.in/reader036/viewer/2022062309/568137f2550346895d9faeb1/html5/thumbnails/6.jpg)
Introduction
![Page 7: CLOPE: a Fast and Effective Clustering Algorithm for Transactional Data](https://reader036.fdocuments.in/reader036/viewer/2022062309/568137f2550346895d9faeb1/html5/thumbnails/7.jpg)
Clustering With SLOPE
The criterion function can be defined locally or globally.
- Locally: the criterion function is built on the pair-wise similarity between transactions.
- Globally: clustering quality is measured at the cluster level, utilizing information like the sets of large and small items in the clustering.
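To make the local/global distinction concrete, here is a small Python sketch. It assumes the Jaccard coefficient as one possible pair-wise similarity and a simple support threshold as one possible way to separate large from small items; the names `jaccard`, `large_items`, and `min_support` are illustrative and do not come from the slides.

```python
from collections import Counter

def jaccard(t1, t2):
    """Local view: similarity of one pair of transactions (sets of items)."""
    return len(t1 & t2) / len(t1 | t2)

def large_items(cluster, min_support):
    """Global view: items that appear in at least min_support of the cluster."""
    occ = Counter(item for t in cluster for item in t)
    return {item for item, n in occ.items() if n / len(cluster) >= min_support}

cluster = [{"a", "b"}, {"a", "b", "c"}, {"a", "c", "d"}]
pair_similarity = jaccard(cluster[0], cluster[1])   # 2 shared items out of 3
frequent = large_items(cluster, min_support=0.6)    # items in >= 60% of transactions
print(pair_similarity, sorted(frequent))
```

The pair-wise measure looks at two transactions at a time, while the cluster-level measure summarizes the whole cluster in a single pass over its item counts.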
![Page 8: CLOPE: a Fast and Effective Clustering Algorithm for Transactional Data](https://reader036.fdocuments.in/reader036/viewer/2022062309/568137f2550346895d9faeb1/html5/thumbnails/8.jpg)
Clustering With SLOPE
We define the size S(C) and width W(C) of a cluster C below:

$$S(C) = \sum_{i \in D(C)} \mathrm{Occ}(i, C) = \sum_{t_i \in C} |t_i|$$

$$W(C) = |D(C)|$$

The height of a cluster is defined as

$$H(C) = S(C) / W(C)$$

We use the gradient G(C) instead of H(C) as the quality measure for cluster C:

$$G(C) = H(C) / W(C) = S(C) / W(C)^2$$
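The size, width, height, and gradient defined above can be computed directly from item counts. The following Python sketch assumes each transaction is a set of items; the function name `cluster_stats` and the sample cluster are illustrative.

```python
from collections import Counter

def cluster_stats(cluster):
    """Return (S, W, H, G) for a cluster given as a list of item sets."""
    occ = Counter(item for t in cluster for item in t)  # Occ(i, C) per item
    size = sum(occ.values())    # S(C): total item occurrences
    width = len(occ)            # W(C): number of distinct items
    height = size / width       # H(C) = S(C) / W(C)
    gradient = height / width   # G(C) = S(C) / W(C)^2
    return size, width, height, gradient

cluster = [{"a", "b"}, {"a", "b", "c"}, {"a", "c", "d"}]
print(cluster_stats(cluster))  # → (8, 4, 2.0, 0.5)
```

Here item a occurs three times, b and c twice each, and d once, so the histogram has area 8 and width 4.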
![Page 9: CLOPE: a Fast and Effective Clustering Algorithm for Transactional Data](https://reader036.fdocuments.in/reader036/viewer/2022062309/568137f2550346895d9faeb1/html5/thumbnails/9.jpg)
Clustering With SLOPE
![Page 10: CLOPE: a Fast and Effective Clustering Algorithm for Transactional Data](https://reader036.fdocuments.in/reader036/viewer/2022062309/568137f2550346895d9faeb1/html5/thumbnails/10.jpg)
Clustering With SLOPE
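For a clustering C = {C_1, ..., C_k}, the criterion function is

$$\mathrm{profit}(C) = \frac{\sum_{i=1}^{k} G(C_i) \times |C_i|}{\sum_{i=1}^{k} |C_i|} = \frac{\sum_{i=1}^{k} \frac{S(C_i)}{W(C_i)^2} \times |C_i|}{\sum_{i=1}^{k} |C_i|}$$

Replacing the exponent 2 with a parameter r gives

$$\mathrm{profit}_r(C) = \frac{\sum_{i=1}^{k} \frac{S(C_i)}{W(C_i)^r} \times |C_i|}{\sum_{i=1}^{k} |C_i|}$$

Here, r is a positive real number called repulsion, used to control the level of intra-cluster similarity.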
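As a worked illustration of the profit criterion, the sketch below evaluates it for two alternative clusterings of five small transactions. The function name `profit`, the default r = 2, and the sample data are assumptions made for the example.

```python
def profit(clustering, r=2.0):
    """profit_r for a clustering given as a list of clusters (lists of item sets)."""
    numerator = 0.0
    total = 0
    for cluster in clustering:
        if not cluster:
            continue                                  # empty clusters contribute nothing
        size = sum(len(t) for t in cluster)           # S(C_i)
        width = len(set().union(*cluster))            # W(C_i)
        numerator += size / width ** r * len(cluster)
        total += len(cluster)                         # |C_i|
    return numerator / total

t = [{"a", "b"}, {"a", "b", "c"}, {"a", "c", "d"}, {"d", "e"}, {"d", "e", "f"}]
clustering_1 = [t[:3], t[3:]]   # {ab, abc, acd} and {de, def}
clustering_2 = [t[:2], t[2:]]   # {ab, abc} and {acd, de, def}
print(profit(clustering_1), profit(clustering_2))
```

With r = 2 the first clustering scores about 0.522 and the second about 0.414, so the criterion prefers the grouping whose clusters share more items.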
![Page 11: CLOPE: a Fast and Effective Clustering Algorithm for Transactional Data](https://reader036.fdocuments.in/reader036/viewer/2022062309/568137f2550346895d9faeb1/html5/thumbnails/11.jpg)
Clustering With SLOPE
- Problem definition: Given D and r, find a clustering C that maximizes $\mathrm{profit}_r(C)$.
- The sketch of the CLOPE algorithm.
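The algorithm sketch itself survives only as a slide image, so the following is a minimal Python sketch of one way to pursue the stated goal, not a transcription of the slide. It assumes a two-phase scheme: a first pass places each transaction in the existing or new cluster that raises profit most, and later passes move transactions until a full pass moves nothing. The names `Cluster`, `clope`, and `max_iter` are illustrative, and transactions are assumed to be non-empty sets.

```python
from collections import Counter

class Cluster:
    def __init__(self):
        self.n = 0            # number of transactions in the cluster
        self.size = 0         # S(C)
        self.occ = Counter()  # item -> occurrences; W(C) = len(self.occ)

    def value(self, r):
        # This cluster's term in the profit numerator: S * N / W^r.
        return self.size * self.n / len(self.occ) ** r if self.n else 0.0

    def delta_add(self, t, r):
        new_width = len(self.occ) + sum(1 for i in t if i not in self.occ)
        return (self.size + len(t)) * (self.n + 1) / new_width ** r - self.value(r)

    def add(self, t):
        self.n += 1
        self.size += len(t)
        self.occ.update(t)

    def remove(self, t):
        self.n -= 1
        self.size -= len(t)
        self.occ.subtract(t)
        self.occ = +self.occ  # drop items whose count fell to zero

def clope(transactions, r=2.0, max_iter=100):
    clusters = []

    def place(t):
        # Candidates: every existing cluster plus one fresh, empty cluster.
        candidates = clusters + [Cluster()]
        best = max(candidates, key=lambda c: c.delta_add(t, r))
        if best is candidates[-1]:
            clusters.append(best)
        best.add(t)
        return best

    # Phase 1: initial placement in a single pass.
    assign = [place(t) for t in transactions]

    # Phase 2: keep moving transactions while any move increases profit.
    for _ in range(max_iter):
        moved = False
        for k, t in enumerate(transactions):
            old = assign[k]
            old.remove(t)
            new = place(t)
            if new is not old:
                assign[k] = new
                moved = True
        if not moved:
            break

    live = [c for c in clusters if c.n > 0]
    return [live.index(c) for c in assign]

transactions = [{"a", "b"}, {"a", "b", "c"}, {"a", "c", "d"}, {"d", "e"}, {"d", "e", "f"}]
labels = clope(transactions, r=2.0)
print(labels)  # → [0, 0, 0, 1, 1]
```

Because the denominator of the profit is the fixed number of transactions, comparing the change in the numerator is enough to pick the best cluster for each transaction.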
![Page 12: CLOPE: a Fast and Effective Clustering Algorithm for Transactional Data](https://reader036.fdocuments.in/reader036/viewer/2022062309/568137f2550346895d9faeb1/html5/thumbnails/12.jpg)
![Page 13: CLOPE: a Fast and Effective Clustering Algorithm for Transactional Data](https://reader036.fdocuments.in/reader036/viewer/2022062309/568137f2550346895d9faeb1/html5/thumbnails/13.jpg)
Implementation
- RAM data structure: We keep in memory only the current transaction and a small amount of information for each cluster. This information is called the cluster features.
- Remark: The total memory required for item occurrences is approximately M*K*4 bytes using arrays of 4-byte integers, where M is the number of distinct items and K is the maximum number of clusters.
- The computation of profit: Computing the delta value of adding t to C.
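The cluster features and the delta computation can be sketched as follows. This assumes items are encoded as integer ids in the range 0..M-1, that a transaction is a list of distinct ids, and that the delta is the change in the cluster's S*N/W^r term; the names `ClusterFeatures`, `delta_add`, and `occurrence_bytes` are illustrative. Python's `array("i")` stores C ints, which are 4 bytes on most platforms but not guaranteed to be.

```python
from array import array

class ClusterFeatures:
    """Per-cluster summary: transaction count, size S, width W, item occurrences."""

    def __init__(self, num_items):
        self.n = 0
        self.size = 0
        self.width = 0
        self.occ = array("i", [0] * num_items)  # one integer counter per item id

    def delta_add(self, t, r):
        """Change in S * N / W^r if transaction t were added to this cluster."""
        new_size = self.size + len(t)
        new_width = self.width + sum(1 for i in t if self.occ[i] == 0)
        old_value = self.size * self.n / self.width ** r if self.n else 0.0
        return new_size * (self.n + 1) / new_width ** r - old_value

    def add(self, t):
        for i in t:
            if self.occ[i] == 0:
                self.width += 1   # first occurrence widens the histogram
            self.occ[i] += 1
        self.n += 1
        self.size += len(t)

def occurrence_bytes(m, k, bytes_per_count=4):
    """Approximate memory for item occurrences: M * K * 4 bytes."""
    return m * k * bytes_per_count

c = ClusterFeatures(num_items=6)   # ids 0..5 stand for items a..f
c.add([0, 1])                      # ab
c.add([0, 1, 2])                   # abc
delta = c.delta_add([0, 2, 3], r=2.0)      # acd would add one new item, d
print(delta, occurrence_bytes(10_000, 1_000))
```

Only the counters are touched when evaluating a candidate cluster, so the delta costs time proportional to the length of the transaction. With the illustrative figures of 10,000 items and 1,000 clusters, the occurrence arrays take about 40 million bytes.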
![Page 14: CLOPE: a Fast and Effective Clustering Algorithm for Transactional Data](https://reader036.fdocuments.in/reader036/viewer/2022062309/568137f2550346895d9faeb1/html5/thumbnails/14.jpg)
Implementation
![Page 15: CLOPE: a Fast and Effective Clustering Algorithm for Transactional Data](https://reader036.fdocuments.in/reader036/viewer/2022062309/568137f2550346895d9faeb1/html5/thumbnails/15.jpg)
Implementation
The time and space complexity
- Suppose the average length of a transaction is A, the total number of transactions is N, and the maximum number of clusters is K.
- The time complexity for one iteration is O(N × K × A).
- The space requirement for CLOPE is approximately the memory size of the cluster features.
![Page 16: CLOPE: a Fast and Effective Clustering Algorithm for Transactional Data](https://reader036.fdocuments.in/reader036/viewer/2022062309/568137f2550346895d9faeb1/html5/thumbnails/16.jpg)
Experiments
We analyze the effectiveness and execution speed of CLOPE with two real-life datasets.
- For effectiveness, we compare the clustering quality of CLOPE on a labeled dataset with those of LargeItem and ROCK.
- For execution speed, we compare CLOPE with LargeItem on a large web log dataset.
![Page 17: CLOPE: a Fast and Effective Clustering Algorithm for Transactional Data](https://reader036.fdocuments.in/reader036/viewer/2022062309/568137f2550346895d9faeb1/html5/thumbnails/17.jpg)
Mushroom
Mushroom dataset (real-life)
- The mushroom dataset from the UCI machine learning repository has been used by both ROCK and LargeItem for effectiveness tests.
- It contains 8124 records with two classes: 4208 edible mushrooms and 3916 poisonous mushrooms.
![Page 18: CLOPE: a Fast and Effective Clustering Algorithm for Transactional Data](https://reader036.fdocuments.in/reader036/viewer/2022062309/568137f2550346895d9faeb1/html5/thumbnails/18.jpg)
![Page 19: CLOPE: a Fast and Effective Clustering Algorithm for Transactional Data](https://reader036.fdocuments.in/reader036/viewer/2022062309/568137f2550346895d9faeb1/html5/thumbnails/19.jpg)
Mushroom

$$\mathrm{Cost}_{\theta,w}(C) = w \times \mathrm{Intra} + \mathrm{Inter}$$
![Page 20: CLOPE: a Fast and Effective Clustering Algorithm for Transactional Data](https://reader036.fdocuments.in/reader036/viewer/2022062309/568137f2550346895d9faeb1/html5/thumbnails/20.jpg)
Mushroom
![Page 21: CLOPE: a Fast and Effective Clustering Algorithm for Transactional Data](https://reader036.fdocuments.in/reader036/viewer/2022062309/568137f2550346895d9faeb1/html5/thumbnails/21.jpg)
Mushroom
- The result of CLOPE on mushroom is better than that of LargeItem and close to that of ROCK.
- Sensitivity to data order: The results are different but very close to the original ones. This shows that CLOPE is not very sensitive to the order of input data.
![Page 22: CLOPE: a Fast and Effective Clustering Algorithm for Transactional Data](https://reader036.fdocuments.in/reader036/viewer/2022062309/568137f2550346895d9faeb1/html5/thumbnails/22.jpg)
Berkeley web logs
Web log data is another typical category of transactional databases.
We use the web log files from http://www.cs.berkeley.edu/log/ as the dataset for our second experiment and test the scalability as well as the performance of CLOPE. There are about 7 million entries in the raw log file, and 2 million of them are kept after non-HTML entries are removed.
![Page 23: CLOPE: a Fast and Effective Clustering Algorithm for Transactional Data](https://reader036.fdocuments.in/reader036/viewer/2022062309/568137f2550346895d9faeb1/html5/thumbnails/23.jpg)
![Page 24: CLOPE: a Fast and Effective Clustering Algorithm for Transactional Data](https://reader036.fdocuments.in/reader036/viewer/2022062309/568137f2550346895d9faeb1/html5/thumbnails/24.jpg)
![Page 25: CLOPE: a Fast and Effective Clustering Algorithm for Transactional Data](https://reader036.fdocuments.in/reader036/viewer/2022062309/568137f2550346895d9faeb1/html5/thumbnails/25.jpg)
Conclusions
The CLOPE algorithm is proposed based on the intuitive idea of increasing the height-to-width ratio of the cluster histogram.
The idea is generalized with a repulsion parameter that controls tightness of transactions in a cluster.
Experiments show that CLOPE is quite effective in finding interesting clusterings.
Moreover, CLOPE is not very sensitive to data order and requires little domain knowledge in controlling the number of clusters.
![Page 26: CLOPE: a Fast and Effective Clustering Algorithm for Transactional Data](https://reader036.fdocuments.in/reader036/viewer/2022062309/568137f2550346895d9faeb1/html5/thumbnails/26.jpg)
Personal Opinion
The idea behind CLOPE is very simple, which makes it easy for beginners to learn.