FINDING FUZZY SETS FOR QUANTITATIVE ATTRIBUTES FOR MINING OF FUZZY ASSOCIATE RULES

1


By H.N.A. Pham, T.W. Liao, and E. Triantaphyllou

Department of Industrial Engineering

3128 CEBA Building

Louisiana State University

Baton Rouge, LA 70803-6409

Email: [email protected], [email protected], and [email protected]

2

Outline

Introduction

Background

A fuzzy approach for mining association rules

Experimental evaluation

Conclusions

3

Introduction

• Association analysis is a new and attractive research area in data mining

• The Apriori algorithm (R. Agrawal, IBM, 1993) is a key technique for association analysis

• Although the Apriori principle considerably reduces the search space, the technique still requires a huge amount of computation, particularly for large databases

• This research proposes an approach that finds fuzzy sets for the quantitative attributes in a database by using clustering techniques, and then mines fuzzy association rules based on those fuzzy sets

4

Outline

Introduction

Background
    Association rules and the Apriori algorithm
    Necessity to find fuzzy sets for quantitative attributes

A fuzzy approach for mining fuzzy association rules

Experimental evaluation

Conclusions

5

Association rules: Market basket analysis

• Analyzes customer buying habits by finding associations between the different items that customers place in their “shopping baskets” (rules of the form X → Y, where X and Y are sets of items)

• I = {I1 = beer, I2 = cake, I3 = onigiri}

• A transactional database:

TID1: {I1, I2, I3}
TID2: {I1, I2}
TID3: {I2, I3}
TID4: {I2}
TID5: {I1, I2}

• An association rule: {I1} → {I3}

How often do people buy beer and onigiri together?

6

Rule measures: Support and Confidence

Association rule: X → Y

support s = probability that a transaction contains both X and Y

confidence c = conditional probability that a transaction containing X also contains Y

Transaction ID   Items Bought
2000             A, B, C
1000             A, C
4000             A, D
5000             B, E, F

A → C (s = 50%, c = 66.6%)

C → A (s = 50%, c = 100%)

[Figure: Venn diagram of customers who buy onigiri, customers who buy beer, and customers who buy both]
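The two measures can be checked against the four-transaction table on this slide. The following is a minimal Python sketch; the function names `support` and `confidence` are illustrative choices, not part of the original slides.

```python
# Transactions from the table on this slide (IDs and items as given there).
transactions = {
    2000: {"A", "B", "C"},
    1000: {"A", "C"},
    4000: {"A", "D"},
    5000: {"B", "E", "F"},
}

def support(itemset, db):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= items for items in db.values()) / len(db)

def confidence(antecedent, consequent, db):
    """support(X and Y) / support(X), i.e. the conditional probability of Y given X."""
    both = set(antecedent) | set(consequent)
    return support(both, db) / support(antecedent, db)

print(support({"A", "C"}, transactions))        # support of A -> C and of C -> A
print(confidence({"A"}, {"C"}, transactions))   # confidence of A -> C
print(confidence({"C"}, {"A"}, transactions))   # confidence of C -> A
```

Both rules share the same support because support depends only on the combined itemset {A, C}; the confidences differ because the denominators support({A}) and support({C}) differ.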

7

Association mining: the Apriori algorithm

It is composed of two steps:

1. Find all frequent itemsets: by definition, each of these itemsets occurs at least as frequently as a pre-determined minimum support count

2. Generate strong association rules from the frequent itemsets: by definition, these rules must satisfy minimum support and minimum confidence

(Agrawal, 1993)

8

Association mining: the Apriori principle

Transaction ID   Items Bought
2000             A, B, C
1000             A, C
4000             A, D
5000             B, E, F

Min. support 50%
Min. confidence 50%

Frequent Itemset   Support
{A}                75%
{B}                50%
{C}                50%
{A, C}             50%

For rule A → C:

support = support({A, C}) = 50%

confidence = support({A, C}) / support({A}) = 50% / 75% = 66.6%

The Apriori principle: any subset of a frequent itemset must be frequent (equivalently, if an itemset is not frequent, none of its supersets is frequent)

9

The Apriori algorithm: Finding frequent itemsets using candidate generation

1. Find the frequent itemsets: the sets of items whose support is at least the minimum support

• A subset of a frequent itemset must also be a frequent itemset, i.e., if {A, B} is a frequent itemset, both {A} and {B} must be frequent itemsets

• Iteratively find the frequent itemsets Lk with cardinality from 1 to k (k-itemsets) from the candidate itemsets Ck (Lk ⊆ Ck):

C1 → … → Li-1 → Ci → Li → Ci+1 → … → Lk

2. Use the frequent itemsets to generate association rules.

10

Example (min_sup_count = 2)

Transactional data D:

TID    List of item_IDs
T100   I1, I2, I5
T200   I2, I4
T300   I2, I3
T400   I1, I2, I4
T500   I1, I3
T600   I2, I3
T700   I1, I3
T800   I1, I2, I3, I5
T900   I1, I2, I3

Scan D for the count of each candidate to obtain C1:

Itemset   Sup. count
{I1}      6
{I2}      7
{I3}      6
{I4}      2
{I5}      2

Compare each candidate support count with the minimum support count to obtain L1:

Itemset   Sup. count
{I1}      6
{I2}      7
{I3}      6
{I4}      2
{I5}      2

11

Example (min_sup_count = 2)

Generate candidates C2 from L1 by using the Apriori principle:

C2 = {I1, I2}, {I1, I3}, {I1, I4}, {I1, I5}, {I2, I3}, {I2, I4}, {I2, I5}, {I3, I4}, {I3, I5}, {I4, I5}

C2 with support counts:

Itemset    S. count
{I1, I2}   4
{I1, I3}   4
{I1, I4}   1
{I1, I5}   2
{I2, I3}   4
{I2, I4}   2
{I2, I5}   2
{I3, I4}   0
{I3, I5}   1
{I4, I5}   0

Compare each candidate support count with the minimum support count to obtain L2:

Itemset    S. count
{I1, I2}   4
{I1, I3}   4
{I1, I5}   2
{I2, I3}   4
{I2, I4}   2
{I2, I5}   2

Generate candidates C3 from L2 by using the Apriori principle:

C3 = {I1, I2, I3}, {I1, I2, I5}

C3 with support counts:

Itemset        S. count
{I1, I2, I3}   2
{I1, I2, I5}   2

Compare each candidate support count with the minimum support count to obtain L3:

Itemset        S. count
{I1, I2, I3}   2
{I1, I2, I5}   2
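The level-wise search in this example can be sketched in a few lines of Python. This is a minimal illustration of candidate generation with Apriori pruning, run on the nine transactions above; the function name `apriori` and the data layout are assumptions made for the sketch, not the authors' implementation.

```python
from itertools import combinations

# Transactions T100 ... T900 from the example.
D = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]

def apriori(db, min_sup_count):
    """Return {k: {itemset: support_count}} for every non-empty level L_k."""
    items = sorted({i for t in db for i in t})
    candidates = [frozenset([i]) for i in items]  # C1
    levels, k = {}, 1
    while candidates:
        # Scan the database for the support count of each candidate.
        counts = {c: sum(c <= t for t in db) for c in candidates}
        # Keep the candidates that meet the minimum support count (L_k).
        frequent = {c: n for c, n in counts.items() if n >= min_sup_count}
        if not frequent:
            break
        levels[k] = frequent
        # Join step: unions of two frequent k-itemsets that have k + 1 items.
        joined = {a | b for a in frequent for b in frequent if len(a | b) == k + 1}
        # Prune step (Apriori principle): every k-subset must be frequent.
        candidates = [c for c in joined
                      if all(frozenset(s) in frequent for s in combinations(c, k))]
        k += 1
    return levels

levels = apriori(D, min_sup_count=2)
for k, itemsets in levels.items():
    print(k, {tuple(sorted(s)): n for s, n in itemsets.items()})
```

With `min_sup_count=2` the search stops after level 3: the only 4-item candidate, {I1, I2, I3, I5}, is pruned because its subset {I1, I3, I5} is not frequent.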

12

Necessity to find fuzzy sets for quantitative attributes

Transaction ID   Age   Married   NumCars
100              33    Yes       2
200              39    Yes       2
300              35    No        1
400              20    No        0

A quantitative association rule with min_sup = min_conf = 50%:

(Age = 33 or 39) and (Married = Yes) -> (NumCars = 2)

A quantitative association rule over an interval with min_sup = min_conf = 50%:

(Age = 33..39) and (Married = Yes) -> (NumCars = 2)

A fuzzy association rule with min_sup = min_conf = 50%:

(Age = middle-aged) and (Married = Yes) -> (NumCars = 2)

13

Solution: Sharp boundary intervals

It is composed of two steps:

1. Partition the attribute domains into small intervals and combine adjacent intervals into larger ones such that the combined intervals have enough support

2. Replace each original attribute by its attribute-interval pairs, so that the quantitative problem is transformed into a Boolean one

(Srikant and Agrawal, 1996)

14

Example: Sharp boundary intervals

Transaction ID   Age   Married   NumCars
100              33    Yes       2
200              39    Yes       2
300              35    No        1
400              20    No        0

After replacing each attribute by its attribute-interval pairs:

Transaction ID   Age: 18-30   Age: 31-39   Married   NumCars: 0-1   NumCars: 2-3
100              No           Yes          Yes       No             Yes
200              No           Yes          Yes       No             Yes
300              No           Yes          No        Yes            No
400              Yes          No           No        Yes            No

• Algorithms ignore or over-emphasize the elements near the boundaries of the intervals in the mining process

• The use of sharp boundary intervals is also not intuitive with respect to human perception
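The Boolean transformation in this example can be sketched as follows. This is a minimal Python illustration; the interval lists `AGE_BINS` and `CAR_BINS` and the function `to_boolean` are names chosen for the sketch, with the interval bounds taken from the table on this slide.

```python
# Rows of the original table on this slide.
rows = [
    {"tid": 100, "age": 33, "married": "Yes", "num_cars": 2},
    {"tid": 200, "age": 39, "married": "Yes", "num_cars": 2},
    {"tid": 300, "age": 35, "married": "No", "num_cars": 1},
    {"tid": 400, "age": 20, "married": "No", "num_cars": 0},
]

AGE_BINS = [(18, 30), (31, 39)]  # interval bounds from the slide
CAR_BINS = [(0, 1), (2, 3)]

def to_boolean(row):
    """Replace each quantitative attribute by one Boolean per interval."""
    out = {"tid": row["tid"], "Married": row["married"] == "Yes"}
    for lo, hi in AGE_BINS:
        out[f"Age: {lo}-{hi}"] = lo <= row["age"] <= hi
    for lo, hi in CAR_BINS:
        out[f"NumCars: {lo}-{hi}"] = lo <= row["num_cars"] <= hi
    return out

for row in rows:
    print(to_boolean(row))
```

The sketch also makes the boundary problem visible: an age of 30 and an age of 31 land in different intervals, and nothing in the Boolean encoding records that the two values are close.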

15

Solution: Experts

• A user or expert must provide the algorithm with the required fuzzy sets of the quantitative attributes and their corresponding membership functions

• Fuzzy sets and membership functions provided by experts may not be suitable for mining fuzzy association rules in the database at hand

16

Solution: Fuzzy sets for quantitative attributes

It is composed of three steps:

Step 1: Transform the original database into one with positive integers

Step 2: For each attribute i

    Cluster the values of the i-th attribute into k medoids

    Classify the i-th attribute into k fuzzy sets

    Generate membership functions for each fuzzy set

End for

Step 3: Transform the database based on the fuzzy sets

(Ada, 1998)

Drawback: because each attribute is processed separately, the association between attributes is lost in the mining approach

17

Outline

Introduction

Background

A fuzzy approach for mining fuzzy association rules
    Fuzzy approach
    Fuzzy mining of association rules

Experimental evaluation

Conclusions

18

Fuzzy approach

It is composed of five steps:

Step 1: Transform the original database into one with positive integers

Step 2: Cluster values of attributes into k medoids.

Step 3: Classify attributes into k fuzzy sets

Step 4: Generate membership functions for each fuzzy set

Step 5: Transform the database based on fuzzy sets

19

Fuzzy approach: Step 2

Clustering:

• The clustering method considers the search space of a database with n attributes as an n-dimensional space

• Use the Matlab fuzzy toolbox

• Because all attributes are clustered together, the association between attributes is not lost in the mining approach

20


Classify:

• Let {m1, m2, …, mk} be the k medoids found in Step 2, where mi = {ai1, ai2, …, ain} is the i-th medoid:

m1 = (a11, …, a1j, …, a1n)
…
mk = (ak1, …, akj, …, akn)

• Let the j-th attribute have the range [minj, maxj], and let {a1j, a2j, …, akj} be the set of mid-points of the j-th attribute, taken in increasing order. The k fuzzy sets of this attribute then have the ranges

[minj, a2j], [a1j, a3j], …, [a(i-1)j, a(i+1)j], …, and [a(k-1)j, maxj]

[Figure: the i-th fuzzy set of attribute j on the axis from minj to maxj, spanning a(i-1)j to a(i+1)j around the mid-point aij]
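The range construction for one attribute can be sketched in Python as below. The function name `fuzzy_ranges` and the sample numbers are hypothetical; the sketch only implements the rule that the i-th fuzzy set runs from the previous mid-point to the next one, with the attribute minimum and maximum closing the two ends.

```python
def fuzzy_ranges(midpoints, lo, hi):
    """Ranges [min, a2], [a1, a3], ..., [a(k-1), max] for sorted mid-points."""
    a = sorted(midpoints)
    k = len(a)
    ranges = []
    for i in range(k):
        left = lo if i == 0 else a[i - 1]        # previous mid-point, or min
        right = hi if i == k - 1 else a[i + 1]   # next mid-point, or max
        ranges.append((left, right))
    return ranges

# Hypothetical attribute with range [0, 50] and three mid-points.
print(fuzzy_ranges([10, 20, 40], lo=0, hi=50))
```

Neighbouring ranges overlap by construction, which is what lets a value belong to two fuzzy sets with different membership degrees.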

21


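Generate membership functions (triangular function):

The membership function of each fuzzy set is a triangular function fij(x : minj, akj, maxj), defined piecewise: it equals 1 where x = akj, has one branch for x between minj and akj and another branch for x between akj and maxj, and equals 0 otherwise.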


Transform the database based on fuzzy sets:

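• Let Tij be the value of the i-th transaction at the j-th attribute

• Tij is replaced by the l-th fuzzy label of attribute j, where l is the fuzzy set with the largest membership degree: flj(Tij) = max over k of fkj(Tij)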
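The labelling rule can be sketched in Python as follows. The sketch assumes the textbook triangular membership function (1 at the peak, falling linearly to 0 at both ends of the range), which may differ in detail from the formula on the previous slide; the names `triangular` and `fuzzy_label` and the sample fuzzy sets are hypothetical.

```python
def triangular(x, left, peak, right):
    """Textbook triangular membership: 1 at `peak`, 0 outside [left, right]."""
    if x == peak:
        return 1.0
    if left <= x < peak:
        return (x - left) / (peak - left)
    if peak < x <= right:
        return (right - x) / (right - peak)
    return 0.0

def fuzzy_label(x, fuzzy_sets):
    """Return (label, membership) of the fuzzy set with the largest membership."""
    scores = {label: triangular(x, *params) for label, params in fuzzy_sets.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Hypothetical fuzzy sets for one attribute, given as (left, peak, right).
sets = {"Low": (0, 10, 20), "Medium": (10, 20, 40), "High": (20, 40, 50)}
print(fuzzy_label(12, sets))
```

Keeping the membership degree alongside the label matters for the mining step, because fuzzy support is computed from the degrees rather than from the labels alone.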

23

Example of fuzzy approach

Original database:

ID   Salary   IQ
1    10000    120
2    7000     100
3    30000    183
4    9000     110
5    15000    140
6    20000    165
7    5000     85

Step 2: fuzzy sets found for each attribute

Fuzzy label   Range           Mid-point
Low_S         4000 – 10000    7000
Medium_S      7000 – 20000    15000
High_S        15000 – 32000   30000

Fuzzy label   Range       Mid-point
Low_I         50 – 120    100
Medium_I      100 – 165   140
High_I        140 – 200   183

Steps 3, 4, 5: transformed database and membership degrees

ID   Salary     IQ
1    Low_S      Low_I
2    Low_S      Low_I
3    High_S     High_I
4    Low_S      Low_I
5    Medium_S   Medium_I
6    Medium_S   Medium_I
7    Low_S      Low_I

ID   IQ’s membership   Salary’s membership
1    0.8               0.71
2    0.83              0.71
3    0.67              0.37
4    0.86              0.86
5    0.74              0.83
6    0.74              0.56
7    0.31              0.14

24

Fuzzy mining Associate rules

(Attilia, 2000)

It is composed of two steps:

1. Find all itemsets whose fuzzy support (FS<X,A>) is above the user-specified minimum support. These itemsets are called frequent itemsets.

2. Use the frequent itemsets to generate the desired rules. Let X and Y be frequent itemsets. The rule X => Y holds if its fuzzy confidence FC<<X,A>,<Y,B>> is larger than the user-specified minimum confidence.

25

Fuzzy mining Associate rules - cont

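Fuzzy support of an itemset <X,A>:

\[ FS_{\langle X,A\rangle} = \frac{\sum_{t_i \in D} \prod_{x_j \in X} m_{a_j \in A}(t_i.x_j)}{|D|} \]

Fuzzy confidence of a rule <X,A> => <Y,B>:

\[ FC_{\langle\langle X,A\rangle,\langle Y,B\rangle\rangle} = \frac{\sum_{t_i \in D} \prod_{z_j \in Z} m_{c_j \in C}(t_i.z_j)}{\sum_{t_i \in D} \prod_{x_j \in X} m_{a_j \in A}(t_i.x_j)} \]

• D = {t1, t2, …, tn}: the set of transactions

• <X,A>: X is a set of attributes and A is the set of corresponding fuzzy sets for the attributes in X

• m_{a_j}(t_i.x_j): the membership degree, in fuzzy set a_j, of the value of attribute x_j in transaction t_i

• Z = X ∪ Y, C = A ∪ B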
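Fuzzy support and fuzzy confidence can be sketched in Python as below, assuming that each transaction already stores the membership degree of each (attribute, fuzzy set) pair. The data values and the function names `fuzzy_support` and `fuzzy_confidence` are hypothetical and chosen only for illustration.

```python
from math import prod

# Hypothetical membership degrees: one dict per transaction, keyed by
# (attribute, fuzzy set). The values are illustrative, not from the slides.
D = [
    {("Salary", "Low"): 0.5, ("IQ", "Low"): 0.5},
    {("Salary", "Low"): 0.5, ("IQ", "Low"): 1.0},
    {("Salary", "Low"): 0.0, ("IQ", "Low"): 0.25},
    {("Salary", "Low"): 1.0, ("IQ", "Low"): 0.5},
]

def fuzzy_support(itemset, db):
    """Sum over transactions of the product of memberships, divided by |D|."""
    return sum(prod(t.get(item, 0.0) for item in itemset) for t in db) / len(db)

def fuzzy_confidence(antecedent, consequent, db):
    """Fuzzy support of Z = X union Y divided by the fuzzy support of X."""
    z = list(antecedent) + list(consequent)
    return fuzzy_support(z, db) / fuzzy_support(antecedent, db)

X = [("Salary", "Low")]
Y = [("IQ", "Low")]
print(fuzzy_support(X, D))         # 0.5
print(fuzzy_support(X + Y, D))     # 0.3125
print(fuzzy_confidence(X, Y, D))   # 0.625
```

Because the denominator |D| cancels, the fuzzy confidence is simply the ratio of the two sums of products, which mirrors the crisp definition confidence = support(X and Y) / support(X).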

26

Outline

Introduction

Background

A fuzzy approach for mining fuzzy association rules

Experimental evaluation

Conclusions

27

Experiments: Synthetic datasets

• Using synthetic datasets of varying sizes:

Name        |D|    |T|   Size (MB)
D100k.T10   100K   10    3
D100k.T20   100K   20    6
D320k.T30   320K   30    18

|D| = number of transactions

|T| = average number of items per transaction

28

Experiment environment

• Software
    Database: Microsoft Access 2003
    Languages: C++, Visual Basic, and Matlab
    Platform: Windows

• Hardware
    PC with a 2.66 GHz Pentium IV processor and 1 GB of RAM

29

Evaluate meaning of rules

From the Salary and IQ database, the approach yields the following rules with minimum support = 43% and minimum confidence = 50%:

Rule 1: If the 1st variable is low, approximately 7000 [4000, 10000], then the 2nd variable is low, approximately 100 [50, 120]

Rule 2: If the 1st variable is medium, approximately 15000 [7000, 20000], then the 2nd variable is medium, approximately 140 [100, 165]

Frequent itemsets with minimum support = 43%:

The Apriori algorithm:

No frequent itemsets

Quantitative mining algorithm with the fuzzy approach:

Frequent Itemset 1: 1st variable is low, approximately 7000 [4000, 10000]; 2nd variable is low, approximately 100 [50, 120]

Frequent Itemset 2: 1st variable is medium, approximately 15000 [7000, 20000]; 2nd variable is medium, approximately 140 [100, 165]

30

Evaluate meaning of rules - cont

Frequent itemsets with minimum support = 15%:

The Apriori algorithm:

Frequent Itemset 1: 1st variable is 5000, 2nd variable is 85
Frequent Itemset 2: 1st variable is 7000, 2nd variable is 100
Frequent Itemset 3: 1st variable is 9000, 2nd variable is 110
Frequent Itemset 4: 1st variable is 10000, 2nd variable is 120
Frequent Itemset 5: 1st variable is 15000, 2nd variable is 140
Frequent Itemset 6: 1st variable is 20000, 2nd variable is 165
Frequent Itemset 7: 1st variable is 30000, 2nd variable is 183

Quantitative mining algorithm with the fuzzy approach:

Frequent Itemset 1: 1st variable is low, approximately 7000 [4000, 10000]; 2nd variable is low, approximately 100 [50, 120]

Frequent Itemset 2: 1st variable is high, approximately 30000 [15000, 32000]; 2nd variable is high, approximately 183 [140, 200]

Frequent Itemset 3: 1st variable is medium, approximately 15000 [7000, 20000]; 2nd variable is medium, approximately 140 [100, 165]

31

Evaluate fuzziness

Ada:

ID   IQ’s membership   Salary’s membership
1    0.8               0.71
2    0.83              0.71
3    0.67              0.37
4    0.86              0.86
5    0.74              0.83
6    0.74              0.56
7    0.31              0.14

New approach:

ID   IQ’s membership   Salary’s membership
1    0.85              0.74
2    0.93              0.91
3    0.67              0.57
4    0.9               0.9
5    0.84              0.83
6    0.84              0.66
7    0.51              0.34

Using Yager’s fuzziness with p = 1:

\[ f_p(\tilde{A}) = 1 - \frac{D_p(\tilde{A}, \neg\tilde{A})}{|\mathrm{Supp}(\tilde{A})|^{1/p}}, \qquad D_p(\tilde{A}, \neg\tilde{A}) = \left[ \sum_{i=1}^{n} \left| \mu_{\tilde{A}}(X_i) - \mu_{\neg\tilde{A}}(X_i) \right|^{p} \right]^{1/p} \]

• Ada_fuzziness_Salary ≈ 0.357 ≤ NewApproach_fuzziness_Salary ≈ 0.425

• Ada_fuzziness_IQ ≈ 0.51 ≤ NewApproach_fuzziness_IQ ≈ 0.59

The new approach is fuzzier than Ada
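Yager's fuzziness measure can be sketched in Python as below. The sketch assumes the standard complement, with membership 1 − μ, and takes the support to be the elements with positive membership; the function name `yager_fuzziness` is hypothetical, and the sketch is not claimed to reproduce the figures reported on this slide.

```python
def yager_fuzziness(memberships, p=1):
    """f_p(A) = 1 - D_p(A, not A) / |Supp(A)|**(1/p)."""
    # Distance between A and its complement: |mu - (1 - mu)| = |2*mu - 1|.
    distance = sum(abs(2 * m - 1) ** p for m in memberships) ** (1 / p)
    # Assumption: the support is the set of elements with positive membership.
    support_size = sum(1 for m in memberships if m > 0)
    if support_size == 0:
        raise ValueError("fuzziness is undefined for an empty support")
    return 1 - distance / support_size ** (1 / p)

print(yager_fuzziness([0.5, 0.5, 0.5]))   # maximally fuzzy set
print(yager_fuzziness([1.0, 1.0, 1.0]))   # crisp set
```

A set whose memberships are all 0.5 is indistinguishable from its complement and scores 1, while a crisp set scores 0; membership values closer to 0.5 therefore yield a higher fuzziness.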

32

Evaluate fuzziness - cont

[Figure: two panels, “Ada’s approach” and “New approach”, with minimum support = 15%]

In Ada’s approach, the mid-points of the ranges are moved away from the centre values. This changes the meaning of the frequent itemsets.

33

Execution time (sec.) with different minimum support thresholds

             Min_sup = 35%       Min_sup = 40%       Min_sup = 50%
Name         Apriori   Fuzzy*    Apriori   Fuzzy*    Apriori   Fuzzy*
D100k.T30    80860     42558     4158      1980      485       244
D100k.T20    155440    77720     30005     15792     27012     13506
D320k.T30    329532    147673    69011     28425     52322     20259

*: does not include the transfer time

Name         Time to transfer the database into fuzzy sets
D100k.T30    95
D100k.T20    5062
D320k.T30    9112
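Since the Fuzzy columns exclude the transfer time, a fair comparison adds the transfer time to each fuzzy mining time. The following Python sketch does that arithmetic with the figures from the two tables on this slide; the variable and function names are illustrative.

```python
# Mining times in seconds from the first table, as (Apriori, Fuzzy) pairs
# for Min_sup = 35%, 40%, and 50%; transfer times from the second table.
mining = {
    "D100k.T30": [(80860, 42558), (4158, 1980), (485, 244)],
    "D100k.T20": [(155440, 77720), (30005, 15792), (27012, 13506)],
    "D320k.T30": [(329532, 147673), (69011, 28425), (52322, 20259)],
}
transfer = {"D100k.T30": 95, "D100k.T20": 5062, "D320k.T30": 9112}

def fuzzy_totals(name):
    """Fuzzy mining time plus the transfer time, per support threshold."""
    return [fuzzy + transfer[name] for _, fuzzy in mining[name]]

for name, rows in mining.items():
    for (apriori_t, _), total in zip(rows, fuzzy_totals(name)):
        print(name, apriori_t, total, total < apriori_t)
```

In all nine cases the fuzzy total (transfer plus mining) stays below the Apriori time reported in the table.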

34

Execution time (sec.) with different minimum support thresholds - cont

[Figure: three charts of execution time (sec.) for Apriori and Fuzzy over the three datasets (x-axis 1 to 3), one panel each for Min_sup = 35%, Min_sup = 40%, and Min_sup = 50%]

• The execution time (transfer + mining time) of the fuzzy method is better than that of Apriori.

• Moreover, the meaning of the rules is more “understandable”

35

Conclusions

• Proposed an approach to find fuzzy sets for quantitative attributes for mining association rules

• An experimental evaluation shows that, when the fuzzy approach is used in mining association rules, the meaning of the rules and the execution time are better than those of the other algorithms

• Future work:
    Improve the fuzzy mining approach
    Develop incremental algorithms for association analysis using Support Vector Machines

36

THANK YOU
