Intelligent Data Analysis and Probabilistic Inference
Data Mining Tutorial 3: Clustering and Association Rules
1.
i. Explain the operation of the k-means clustering algorithm using pseudo code.
ii. Given the eight points in the table below, and assuming that the initial cluster centroids are A, B and C and that a Euclidean distance function is used for measuring the distance between points, use k-means to show only the three clusters and calculate their new centroids after the second round of execution.
2.
i. Explain the meaning of support and confidence in the context of association rule
discovery algorithms and explain how the a priori heuristic can be used to improve the
efficiency of such algorithms.
ii. Given the transactions described below, find all rules between single items that have
support >= 60%. For each rule report both support and confidence.
1: (Beer)
2: (Cola, Beer)
3: (Cola, Beer)
4: (Nuts, Beer)
5: (Nuts, Cola, Beer)
6: (Nuts, Cola, Beer)
7: (Crisps, Nuts, Cola)
8: (Crisps, Nuts, Cola, Beer)
9: (Crisps, Nuts, Cola, Beer)
10: (Crisps, Nuts, Cola, Beer)
[email protected], [email protected] 16th Dec 2003
Points for question 1 (ii):
ID   X   Y
A    2  10
B    2   5
C    8   4
D    5   8
E    7   5
F    6   4
G    1   2
H    4   9
3. a. Explain how hierarchical clustering algorithms work. Make sure your answer describes what is meant by a linkage method and how it is used.
b. Explain the advantages and disadvantages of hierarchical clustering compared to k-means clustering.
4. The following table shows the distance matrix between five genes:
     G1  G2  G3  G4  G5
G1    0
G2    9   0
G3    3   7   0
G4    6   5   9   0
G5   11  10   2   8   0
i. Based on a complete linkage method, show the distance matrix between the first formed cluster and the other data points.
ii. Draw a dendrogram showing the full hierarchical clustering tree for the five points based on complete linkage.
iii. Draw a dendrogram showing the full hierarchical clustering tree for the five points based on single linkage.
Data Mining Tutorial 3: Answers
1.
Clusters after 1st iteration
Cluster1: A (2,10), D (5,8), H (4,9)
Cluster2: B (2,5), G (1,2)
Cluster3: C (8,4), E (7,5), F (6,4)
Centroids after 1st iteration
Cluster1: centroid: (3.67, 9)
Cluster2: centroid: (1.5, 3.5)
Cluster3: centroid: (7, 4.33)
Clusters after 2nd iteration (no change)
Cluster1: A (2,10), D (5,8), H (4,9)
Cluster2: B (2,5), G (1,2)
Cluster3: C (8,4), E (7,5), F (6,4)
Centroids after 2nd iteration (no change)
Cluster1: centroid: (3.67, 9)
Cluster2: centroid: (1.5, 3.5)
Cluster3: centroid: (7, 4.33)
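The answer above can be checked with a short script. The following is a minimal sketch, not part of the original tutorial: the function names (assign, update, k_means) are illustrative, and it assumes no cluster ever becomes empty, which holds for this data set.

```python
from math import dist

# Points and initial centroids taken from question 1 (ii).
points = {
    "A": (2, 10), "B": (2, 5), "C": (8, 4), "D": (5, 8),
    "E": (7, 5), "F": (6, 4), "G": (1, 2), "H": (4, 9),
}
initial_centroids = [points["A"], points["B"], points["C"]]


def assign(points, centroids):
    """Assign each point to the cluster of its nearest centroid (Euclidean)."""
    clusters = [[] for _ in centroids]
    for name, p in points.items():
        nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        clusters[nearest].append(name)
    return clusters


def update(points, clusters):
    """Recompute each centroid as the mean of its members.

    Assumes every cluster has at least one member.
    """
    centroids = []
    for members in clusters:
        xs = [points[m][0] for m in members]
        ys = [points[m][1] for m in members]
        centroids.append((sum(xs) / len(xs), sum(ys) / len(ys)))
    return centroids


def k_means(points, centroids, rounds):
    """Run a fixed number of assign/update rounds."""
    clusters = []
    for _ in range(rounds):
        clusters = assign(points, centroids)
        centroids = update(points, clusters)
    return clusters, centroids


clusters, centroids = k_means(points, initial_centroids, rounds=2)
for members, (cx, cy) in zip(clusters, centroids):
    print(members, round(cx, 2), round(cy, 2))
```

Running two rounds gives the same three clusters as one round, which is why the answer reports "no change" for the second iteration.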
2.
Initial Supports
Beer: Support = 9/10
Cola: Support = 8/10
Nuts: Support = 7/10
Crisps: Support = 4/10 (Drop Crisps)
Beer, Cola: Support = 7/10
Beer, Nuts: Support = 6/10
Cola, Nuts: Support = 6/10
Beer->Cola (Support = 70%, Confidence = 7/9 = 77.8%)
Cola->Beer (Support = 70%, Confidence = 7/8 = 87.5%)
Beer->Nuts (Support = 60%, Confidence = 6/9 = 66.7%)
Nuts->Beer (Support = 60%, Confidence = 6/7 = 85.7%)
Cola->Nuts (Support = 60%, Confidence = 6/8 = 75%)
Nuts->Cola (Support = 60%, Confidence = 6/7 = 85.7%)
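The supports and confidences above can be reproduced mechanically. The sketch below is illustrative only: the helper name count and the variable names are assumptions, and it covers just the two levels (single items, then pairs) that this question needs. Dropping Crisps before forming pairs is the a priori pruning step: a pair cannot be frequent unless both of its items are.

```python
from itertools import combinations

# Transactions from question 2 (ii).
transactions = [
    {"Beer"},
    {"Cola", "Beer"},
    {"Cola", "Beer"},
    {"Nuts", "Beer"},
    {"Nuts", "Cola", "Beer"},
    {"Nuts", "Cola", "Beer"},
    {"Crisps", "Nuts", "Cola"},
    {"Crisps", "Nuts", "Cola", "Beer"},
    {"Crisps", "Nuts", "Cola", "Beer"},
    {"Crisps", "Nuts", "Cola", "Beer"},
]
MIN_SUPPORT = 0.6
n = len(transactions)


def count(itemset):
    """Number of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions)


# Level 1: keep only the frequent single items (Crisps is dropped here).
items = sorted(set().union(*transactions))
frequent_items = [i for i in items if count({i}) / n >= MIN_SUPPORT]

# Level 2: candidate pairs are built only from frequent single items.
rules = {}
for a, b in combinations(frequent_items, 2):
    pair_count = count({a, b})
    if pair_count / n >= MIN_SUPPORT:
        # confidence(X -> Y) = count(X and Y) / count(X)
        rules[(a, b)] = (pair_count / n, pair_count / count({a}))
        rules[(b, a)] = (pair_count / n, pair_count / count({b}))

for (lhs, rhs), (s, c) in rules.items():
    print(f"{lhs} -> {rhs}: support={s:.0%}, confidence={c:.1%}")
```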
4. The first cluster will be formed from G3 and G5 since they have the minimum distance.
      G35  G1  G2  G4
G35     0
G1     11   0
G2     10   9   0
G4      9   6   5   0
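The G35 row of this matrix follows from the complete linkage rule: the distance from a cluster to another point is the largest distance from that point to any member of the cluster. A minimal sketch of that single update step, with illustrative variable names that are not part of the tutorial:

```python
# Pairwise distances from the question 4 table, keyed by unordered gene pair.
distances = {
    frozenset(("G1", "G2")): 9,
    frozenset(("G1", "G3")): 3,
    frozenset(("G1", "G4")): 6,
    frozenset(("G1", "G5")): 11,
    frozenset(("G2", "G3")): 7,
    frozenset(("G2", "G4")): 5,
    frozenset(("G2", "G5")): 10,
    frozenset(("G3", "G4")): 9,
    frozenset(("G3", "G5")): 2,
    frozenset(("G4", "G5")): 8,
}
genes = ["G1", "G2", "G3", "G4", "G5"]

# The closest pair forms the first cluster.
first_pair = min(distances, key=distances.get)
merged = sorted(first_pair)
others = [g for g in genes if g not in merged]

# Complete linkage: take the maximum distance to any member of the cluster.
to_cluster = {
    g: max(distances[frozenset((g, m))] for m in merged) for g in others
}
print(merged, to_cluster)
```

The distances among G1, G2 and G4 themselves are unchanged by the merge, so only the G35 row needs recomputing.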
[Figure: two dendrograms, labelled "Single Linkage" and "Complete Linkage"]
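The merge order and merge heights that each dendrogram should show can be derived from the distance matrix. The sketch below is a naive agglomerative procedure written for this five-point example; the function name agglomerate and its interface are assumptions made for illustration, and it does not attempt to handle ties in any particular way (none occur in this data).

```python
from itertools import combinations

genes = ["G1", "G2", "G3", "G4", "G5"]
# Lower-triangular rows of the question 4 distance matrix (diagonal omitted).
lower = [[], [9], [3, 7], [6, 5, 9], [11, 10, 2, 8]]
d = {}
for i, row in enumerate(lower):
    for j, value in enumerate(row):
        d[frozenset((genes[i], genes[j]))] = value


def agglomerate(linkage):
    """Return the merge sequence as (cluster_a, cluster_b, height) tuples.

    Pass max for complete linkage and min for single linkage.
    """
    clusters = [(g,) for g in genes]
    merges = []
    while len(clusters) > 1:
        def height(pair):
            a, b = pair
            return linkage(d[frozenset((x, y))] for x in a for y in b)

        # Merge the two clusters with the smallest linkage distance.
        a, b = min(combinations(clusters, 2), key=height)
        merges.append((a, b, height((a, b))))
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(tuple(sorted(a + b)))
    return merges


complete = agglomerate(max)
single = agglomerate(min)
for name, merges in (("Complete linkage", complete), ("Single linkage", single)):
    print(name)
    for a, b, h in merges:
        print("  merge", a, "+", b, "at height", h)
```

Under complete linkage this gives G3+G5 at 2, G2+G4 at 5, G1 joining G2,G4 at 9, and the final merge at 11. Under single linkage it gives G3+G5 at 2, G1 joining G3,G5 at 3, G2+G4 at 5, and the final merge at 6.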