4. ASSOCIATION ANALYSIS: BASIC CONCEPTS & ALGORITHMS
This chapter presents a methodology known as association analysis, which is useful for discovering interesting relationships hidden in large data sets. The uncovered relationships can be represented in the form of association rules or sets of frequent items. For example, the following rule can be extracted from the data set shown in Table 6.1:
Problem Definition
Binary Representation. Market basket data can be represented in a binary format as shown in Table 6.2, where each row corresponds to a transaction and each column corresponds to an item. An item can be treated as a binary variable whose value is one if the item is present in a transaction and zero otherwise. Because the presence of an item in a transaction is often considered more important than its absence, an item is an asymmetric binary variable.
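The binary format described above can be sketched in a few lines of Python. The transactions and item names below are hypothetical and are not the contents of Table 6.1 or Table 6.2; they only illustrate how each transaction becomes a row of ones and zeros.

```python
# Hypothetical transactions (an assumption for illustration only;
# these are NOT the contents of Table 6.1 or Table 6.2).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "cola"},
]

# Fix a column order so every row uses the same item positions.
items = sorted(set().union(*transactions))

# One row per transaction, one column per item:
# 1 if the item is present in the transaction, 0 otherwise.
binary_rows = [[1 if item in t else 0 for item in items] for t in transactions]

print(items)
for row in binary_rows:
    print(row)
```

Only the ones carry information about what was bought, which is why the variable is called asymmetric: two transactions that both lack an item are not considered similar for that reason.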
6.2 Frequent Itemset Generation
The Apriori Algorithm: Finding Frequent Itemsets Using Candidate Generation
The name of the algorithm is based on the fact that the algorithm uses prior knowledge of frequent itemset properties.
Apriori property: All nonempty subsets of a frequent itemset must also be frequent.
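Read in the contrapositive, the Apriori property says that a candidate k-itemset cannot be frequent if any of its (k-1)-item subsets is infrequent. The following is a minimal sketch of that check; the function name `has_infrequent_subset` and the set of frequent 2-itemsets are assumptions made for illustration.

```python
from itertools import combinations


def has_infrequent_subset(candidate, frequent_smaller):
    """Return True if some (k-1)-subset of the k-item candidate is not frequent."""
    k = len(candidate)
    return any(
        frozenset(subset) not in frequent_smaller
        for subset in combinations(candidate, k - 1)
    )


# Hypothetical frequent 2-itemsets (illustrative data, not from the text).
frequent_2 = {
    frozenset(pair)
    for pair in [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d")]
}

# {a, b, c}: all three 2-item subsets are frequent, so it survives the check.
print(has_infrequent_subset(frozenset("abc"), frequent_2))
# {a, b, d}: the subset {a, d} is not frequent, so it can be discarded unseen.
print(has_infrequent_subset(frozenset("abd"), frequent_2))
```

A candidate rejected this way never needs to be counted against the database, which is where the property saves work.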
Table 5.1
Example 5.3 Apriori. Let’s look at a concrete example, based on the AllElectronics transaction database, D, of Table 5.1. There are nine transactions in this database, that is, |D| = 9. We use Figure 5.2 to illustrate the Apriori algorithm for finding frequent itemsets in D.
1. In the first iteration of the algorithm, each item is a member of the set of candidate 1-itemsets, C1. The algorithm simply scans all of the transactions in order to count the number of occurrences of each item.
2. Suppose that the minimum support count required is 2, that is, min_sup = 2. (Here, we are referring to absolute support because we are using a support count. The corresponding relative support is 2/9 ≈ 22%.) The set of frequent 1-itemsets, L1, can then be determined. It consists of the candidate 1-itemsets satisfying minimum support. In our example, all of the candidates in C1 satisfy minimum support.
3. To discover the set of frequent 2-itemsets, L2, the algorithm joins L1 with itself to generate a candidate set of 2-itemsets, C2. C2 consists of every 2-item combination of the items in L1. Note that no candidates are removed from C2 during the prune step because each subset of the candidates is also frequent.
4. Next, the transactions in D are scanned and the support count of each candidate itemset in C2 is accumulated, as shown in the middle table of the second row in Figure 5.2.
5. The set of frequent 2-itemsets, L2, is then determined, consisting of those candidate 2-itemsets in C2 having minimum support.
7. The transactions in D are scanned in order to determine L3, consisting of those candidate 3-itemsets in C3 having minimum support (Figure 5.2).
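The level-wise procedure in the steps above (count, filter by minimum support, join, prune, rescan) can be sketched as follows. This is an illustrative implementation under stated assumptions, not the textbook's pseudocode: the function name `apriori` is hypothetical, the join takes unions of frequent (k-1)-itemsets instead of the usual prefix-based join (after pruning, both leave the same candidates), and the nine transactions are assumed for this sketch because Table 5.1 is not reproduced in this text.

```python
from itertools import combinations


def apriori(transactions, min_sup):
    """Return {frozenset(itemset): support_count} for all frequent itemsets."""
    # First scan: count every single item (C1), keep those meeting min_sup (L1).
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_sup}
    frequent = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Join step: unions of two frequent (k-1)-itemsets that have k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: drop any candidate with an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Scan D to accumulate the support count of each remaining candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_sup}
        frequent.update(current)
        k += 1
    return frequent


# Assumed nine-transaction database (illustrative; Table 5.1 is not shown here).
D = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]

result = apriori(D, min_sup=2)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    # Print absolute support and the corresponding relative support.
    print(sorted(itemset), count, f"{count / len(D):.0%}")
```

The loop stops as soon as a level produces no frequent itemsets, since by the Apriori property no larger itemset can then be frequent.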
6.6 Mining Frequent Itemsets without Candidate Generation (FP Growth)
As we have seen, in many cases the Apriori candidate generate-and-test method significantly reduces the size of candidate sets, leading to a good performance gain. However, it can suffer from two nontrivial costs: