Title: Attribute reduction in decision systems based on relation matrix

Xi’an Jiaotong University

Authors: Cheng Zhong and Jin-hai Li

Contents

1. Introduction

2. Some basic notions related to RST

3. Two indices for measuring the significance of the attributes in a decision system

4. A heuristic attribute-reduction algorithm for decision systems

5. Numerical experiments

1. Introduction

Rough set theory (RST), proposed by Pawlak in 1982 [1], is an effective mathematical tool for processing vague and uncertain knowledge. RST has been applied in a variety of fields, such as artificial intelligence, data mining, pattern recognition, and knowledge discovery [2-7].

As is well known, attribute reduction is one of the key issues in RST. It is performed in information systems by means of the notion of a reduct, which is based on a specialization of the notion of independence due to Marczewski [8]. Much attention has been paid to this issue, and many attribute-reduction methods have been proposed for decision systems. For example, the reduction approaches in [9-13] are based, respectively, on partitions, the discernibility matrix, conditional information entropy, the positive region, and ant colony optimization.

Although many reduction methods already exist for decision systems, attribute reduction still merits further investigation. On the one hand, the Boolean reasoning-based algorithms for finding a minimal reduct of a decision system are computationally expensive and may be infeasible for a large dataset; on the other hand, it is hard for heuristic reduction methods to check whether the obtained reduct is minimal when the given decision system is large.

In [14], numerical experiments show that a reduction algorithm designed from the viewpoint of the relation matrix can efficiently find a minimal reduct of an information system when run in MATLAB, which is well suited to matrix computations. In this study, a new heuristic attribute-reduction algorithm for decision systems is proposed from the viewpoint of the relation matrix, and numerical experiments are conducted to assess the performance of the proposed algorithm.

2. Some basic notions related to RST

Definition 1. An information system is a quadruple $(U, A, V, f)$, where U is a nonempty finite set of objects, A is a nonempty finite set of attributes, $V := \bigcup_{a \in A} V_a$ with $V_a$ being the domain of attribute a, and f is an information function such that $f(x, a) \in V_a$ for every $x \in U$ and every $a \in A$.

A decision system is an information system $(U, C \cup D, V, f)$ with $C \cap D = \emptyset$, where C and D are called the conditional and decision attribute sets, respectively.

For a subset P of A, define the corresponding equivalence relation as

$$\mathrm{IND}(P) := \{(x, y) \in U \times U \mid f(x, a) = f(y, a) \text{ for every } a \in P\} \tag{1}$$

and denote the equivalence class of IND(P) that contains the object $x \in U$ by $[x]_P$, i.e.

$$[x]_P := \{y \in U \mid (x, y) \in \mathrm{IND}(P)\}. \tag{2}$$

The set of all equivalence classes of IND(P) is denoted by U/P, i.e. $U/P := \{[x]_P \mid x \in U\}$.
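As a minimal sketch of Eqs. (1) and (2), the partition U/P can be obtained by grouping objects that agree on every attribute in P. The toy table `rows`, the encoding of attributes as column indices, and the function name `partition` are illustrative assumptions and do not come from the original slides.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Return U/P as a list of equivalence classes (lists of object indices).

    rows  : list of tuples, one tuple of attribute values per object
    attrs : column indices that make up the attribute subset P
    """
    classes = defaultdict(list)
    for i, row in enumerate(rows):
        # Objects with the same values on all attributes of P share a key,
        # which is exactly the condition (x, y) in IND(P).
        key = tuple(row[a] for a in attrs)
        classes[key].append(i)
    return list(classes.values())


# Hypothetical toy table: columns 0..2 are conditional attributes, column 3 is the decision.
rows = [(0, 0, 0, 0), (0, 1, 0, 1), (1, 0, 0, 1), (1, 1, 1, 1)]

print(partition(rows, [0]))     # [[0, 1], [2, 3]]
print(partition(rows, [0, 1]))  # [[0], [1], [2], [3]]
```

Adding attributes to P can only split classes further, which is why the second partition refines the first.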

Definition 2. Let $(U, A, V, f)$ be an information system and $P \subseteq A$. For a subset X of U, $\underline{R}_P(X) := \{x \in U \mid [x]_P \subseteq X\}$ and $\overline{R}_P(X) := \{x \in U \mid [x]_P \cap X \neq \emptyset\}$ are called the P-lower and P-upper approximations of X, respectively.

Definition 3. Let $(U, A, V, f)$ be an information system and let P and Q be two subsets of A. Then $\mathrm{POS}_P(Q) := \bigcup_{X \in U/Q} \underline{R}_P(X)$ is called the P-positive region of Q, where $\underline{R}_P(X)$ is the P-lower approximation of X.
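Definitions 2 and 3 can be illustrated with a short sketch that works directly on sets of object indices. The toy table and the helper names (`eq_class`, `lower`, `upper`, `positive_region`) are assumptions made for illustration only.

```python
def eq_class(rows, attrs, i):
    """[x_i]_P: indices of all objects that agree with object i on attrs."""
    return {j for j, r in enumerate(rows)
            if all(r[a] == rows[i][a] for a in attrs)}


def lower(rows, attrs, X):
    """P-lower approximation: objects whose class lies entirely inside X."""
    return {i for i in range(len(rows)) if eq_class(rows, attrs, i) <= X}


def upper(rows, attrs, X):
    """P-upper approximation: objects whose class intersects X."""
    return {i for i in range(len(rows)) if eq_class(rows, attrs, i) & X}


def positive_region(rows, P, Q):
    """POS_P(Q): union of the P-lower approximations of the classes in U/Q."""
    pos = set()
    for i in range(len(rows)):
        pos |= lower(rows, P, eq_class(rows, Q, i))
    return pos


# Hypothetical toy table: columns 0..2 are conditional attributes, column 3 is the decision.
rows = [(0, 0, 0, 0), (0, 1, 0, 1), (1, 0, 0, 1), (1, 1, 1, 1)]
X = {1, 2, 3}  # objects whose decision value is 1

print(lower(rows, [0], X))                 # {2, 3}
print(upper(rows, [0], X))                 # {0, 1, 2, 3}
print(positive_region(rows, [0], [3]))     # {2, 3}
print(positive_region(rows, [0, 1], [3]))  # {0, 1, 2, 3}
```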

Definition 4. Let $S := (U, C \cup D, V, f)$ be a decision system, $a \in C$, and $P \subseteq C$. If $\mathrm{POS}_C(D) = \mathrm{POS}_{C \setminus \{a\}}(D)$, then a is said to be D-dispensable in C; otherwise, a is said to be D-indispensable in C. The set of all the D-indispensable attributes is called the core of S and denoted by Core(S). Furthermore, if $\mathrm{POS}_P(D) = \mathrm{POS}_C(D)$ and each attribute of P is D-indispensable in P, then P is called a reduct of S.
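The reduct condition of Definition 4 can be checked directly from positive regions, as in the sketch below. It assumes the usual reading in which indispensability is tested inside P, and it uses the fact that POS_P(D) consists of the objects x with [x]_P contained in [x]_D. The toy table and the function names are hypothetical.

```python
def positive_region(rows, P, D):
    """POS_P(D): objects x whose class [x]_P is contained in [x]_D."""
    n = len(rows)

    def same(i, j, attrs):
        return all(rows[i][a] == rows[j][a] for a in attrs)

    return {i for i in range(n)
            if all(same(i, j, D) for j in range(n) if same(i, j, P))}


def is_reduct(rows, P, C, D):
    """True if POS_P(D) = POS_C(D) and no attribute of P can be dropped."""
    target = positive_region(rows, C, D)
    if positive_region(rows, P, D) != target:
        return False
    # Every attribute b must be indispensable in P: removing it changes the
    # positive region (which equals target at this point).
    return all(
        positive_region(rows, [a for a in P if a != b], D) != target
        for b in P
    )


# Hypothetical toy table: columns 0..2 are conditional attributes, column 3 is the decision.
rows = [(0, 0, 0, 0), (0, 1, 0, 1), (1, 0, 0, 1), (1, 1, 1, 1)]
C, D = [0, 1, 2], [3]

print(is_reduct(rows, [0, 1], C, D))     # True
print(is_reduct(rows, [0, 1, 2], C, D))  # False: attribute 2 is dispensable
print(is_reduct(rows, [0], C, D))        # False: positive region shrinks
```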

3. Two indices for measuring the significance of the attributes in a decision system

To propose a heuristic attribute-reduction algorithm from the viewpoint of the relation matrix, we design two indices below that measure the significance of the attributes in a decision system. Before doing so, we briefly show how positive regions are connected with relation matrices.

Definition 5. Let $(U, A, V, f)$ be an information system and let P be a subset of A. The relation matrix of U×U under P, denoted by P(U), is defined as $P(U) := (P_{ij})_{n \times n}$, where n is the cardinality of U, and $P_{ij} = 1$ if $(x_i, x_j) \in \mathrm{IND}(P)$; otherwise, $P_{ij} = 0$.

It follows from Definition 5 that

$$P(U) = \prod_{a \in P} a(U), \tag{3}$$

where the product is taken elementwise: $a(U)$ times $b(U)$ equals $c(U)$ with elements $c_{ij} = a_{ij} b_{ij}$.
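The following sketch builds relation matrices as nested lists and checks Eq. (3) on a toy table. The slides run their experiments in MATLAB; plain Python is used here only for illustration, and the table and the function names `relation_matrix` and `hadamard` are assumptions.

```python
def relation_matrix(rows, attrs):
    """P(U): entry (i, j) is 1 if objects i and j agree on every attribute in attrs."""
    n = len(rows)
    return [[int(all(rows[i][a] == rows[j][a] for a in attrs))
             for j in range(n)] for i in range(n)]


def hadamard(A, B):
    """Elementwise product c_ij = a_ij * b_ij of two equally sized matrices."""
    return [[a * b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]


# Hypothetical toy table: columns 0..2 are conditional attributes, column 3 is the decision.
rows = [(0, 0, 0, 0), (0, 1, 0, 1), (1, 0, 0, 1), (1, 1, 1, 1)]

M0 = relation_matrix(rows, [0])
M1 = relation_matrix(rows, [1])
M01 = relation_matrix(rows, [0, 1])

print(M0)                        # [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
print(M01 == hadamard(M0, M1))   # True, as stated by Eq. (3)
```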

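For a given decision system $S := (U, C \cup D, V, f)$, let $U := \{x_1, x_2, \ldots, x_n\}$, $\mathrm{POS}_C(D) := \{x_{C(1)}, x_{C(2)}, \ldots, x_{C(m)}\}$, and $P \subseteq C$. We give an index

$$\omega(P) := \sum_{i \in \{C(1), \ldots, C(m)\}} \; \sum_{j \in \{1, 2, \ldots, n\}} P_{ij}\,|P_{ij} - D_{ij}| \tag{4}$$

to connect the positive regions $\mathrm{POS}_P(D)$ and $\mathrm{POS}_C(D)$ with the relation matrices P(U) and D(U). Each term equals 1 exactly when $x_i$ and $x_j$ agree on P but differ on D, so enlarging P can only remove terms. Then the following two conditions hold:

(ⅰ) If $P \subseteq Q \subseteq C$, then $\omega(Q) \le \omega(P)$.

(ⅱ) $\mathrm{POS}_P(D) = \mathrm{POS}_C(D)$ if and only if $\omega(P) = 0$.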
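A small sketch of the index in Eq. (4) follows. It sums, over the rows that belong to the positive region of the full conditional attribute set, the entries where the relation matrix of P is 1 and that of D is 0. The toy table and the function names are illustrative assumptions.

```python
def relation_matrix(rows, attrs):
    """Entry (i, j) is 1 if objects i and j agree on every attribute in attrs."""
    n = len(rows)
    return [[int(all(rows[i][a] == rows[j][a] for a in attrs))
             for j in range(n)] for i in range(n)]


def omega(rows, P, C, D):
    """Index of Eq. (4) for an attribute subset P of C."""
    n = len(rows)
    RP, RC, RD = (relation_matrix(rows, X) for X in (P, C, D))
    # Rows of the positive region POS_C(D): [x_i]_C is contained in [x_i]_D.
    pos = [i for i in range(n) if all(RD[i][j] for j in range(n) if RC[i][j])]
    return sum(RP[i][j] * abs(RP[i][j] - RD[i][j]) for i in pos for j in range(n))


# Hypothetical toy table: columns 0..2 are conditional attributes, column 3 is the decision.
rows = [(0, 0, 0, 0), (0, 1, 0, 1), (1, 0, 0, 1), (1, 1, 1, 1)]
C, D = [0, 1, 2], [3]

print(omega(rows, [], C, D))      # 6
print(omega(rows, [0], C, D))     # 2
print(omega(rows, [0, 1], C, D))  # 0
```

On this table the index does not increase as attributes are added, and it reaches 0 for {0, 1}, whose positive region equals that of C.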

Now, we are ready to define two indices to measure the significance of the attributes in a decision system.

Definition 6. Let $S := (U, C \cup D, V, f)$ be a decision system and let P be a subset of C. The significance of each attribute b of P is defined by

$$\mathrm{SIG}(P \mid b, D) := \omega(P \setminus \{b\}) - \omega(P). \tag{5}$$

By Definition 6, the significance of each $b \in P$, measured by the difference between $\omega(P \setminus \{b\})$ and $\omega(P)$, indicates how much the index $\omega(P)$ changes when b is removed from P.

Definition 7. Let $S := (U, C \cup D, V, f)$ be a decision system and let P be a proper subset of C. The significance of each attribute b of $C \setminus P$ with respect to P is defined by

$$\mathrm{SIG}(b \mid P, D) := \omega(P) - \omega(P \cup \{b\}). \tag{6}$$

Note that $\mathrm{SIG}(b \mid P, D)$ differs from $\mathrm{SIG}(P \mid b, D)$: the former is defined for $b \in C \setminus P$, while the latter is defined for $b \in P$. By Definition 7, the significance of each $b \in C \setminus P$ with respect to P is measured by how much the index $\omega(P)$ changes when b is added to P.
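The two significance measures of Eqs. (5) and (6) can be sketched as thin wrappers around the index of Eq. (4). The names `sig_inner` and `sig_outer`, as well as the toy table, are hypothetical and chosen only for this illustration.

```python
def relation_matrix(rows, attrs):
    """Entry (i, j) is 1 if objects i and j agree on every attribute in attrs."""
    n = len(rows)
    return [[int(all(rows[i][a] == rows[j][a] for a in attrs))
             for j in range(n)] for i in range(n)]


def omega(rows, P, C, D):
    """Index of Eq. (4), summed over the rows of POS_C(D)."""
    n = len(rows)
    RP, RC, RD = (relation_matrix(rows, X) for X in (P, C, D))
    pos = [i for i in range(n) if all(RD[i][j] for j in range(n) if RC[i][j])]
    return sum(RP[i][j] * abs(RP[i][j] - RD[i][j]) for i in pos for j in range(n))


def sig_inner(rows, P, b, C, D):
    """SIG(P|b, D) of Eq. (5): change of omega when b is removed from P."""
    return omega(rows, [a for a in P if a != b], C, D) - omega(rows, P, C, D)


def sig_outer(rows, b, P, C, D):
    """SIG(b|P, D) of Eq. (6): change of omega when b is added to P."""
    return omega(rows, P, C, D) - omega(rows, list(P) + [b], C, D)


# Hypothetical toy table: columns 0..2 are conditional attributes, column 3 is the decision.
rows = [(0, 0, 0, 0), (0, 1, 0, 1), (1, 0, 0, 1), (1, 1, 1, 1)]
C, D = [0, 1, 2], [3]

print([sig_inner(rows, C, a, C, D) for a in C])  # [2, 2, 0]
print(sig_outer(rows, 1, [0], C, D))             # 2
print(sig_outer(rows, 2, [0], C, D))             # 0
```

In this example attribute 2 has significance 0 in both senses, so it contributes nothing to separating the decision classes.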

4. A heuristic attribute-reduction algorithm for decision systems

According to the above two indices, we propose a heuristic attribute-reduction algorithm below for decision systems. To this end, we first give some properties related to these two indices.

Proposition 1. Let S:=(U, C∪D, V, f) be a decision system. The following two conditions are satisfied:

(ⅰ) $a \in C$ is D-indispensable in C if and only if $\mathrm{SIG}(C \mid a, D) > 0$.

(ⅱ) $\mathrm{Core}(S) = \{a \in C \mid \mathrm{SIG}(C \mid a, D) > 0\}$.
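Proposition 1 suggests computing the core from the significance values alone. The sketch below does this and, as a sanity check on one toy table, compares the result with the positive-region test of Definition 4. All names and the data are assumptions made for illustration.

```python
def relation_matrix(rows, attrs):
    """Entry (i, j) is 1 if objects i and j agree on every attribute in attrs."""
    n = len(rows)
    return [[int(all(rows[i][a] == rows[j][a] for a in attrs))
             for j in range(n)] for i in range(n)]


def positive_region(rows, P, D):
    """Indices i whose P-class is contained in their D-class."""
    n = len(rows)
    RP, RD = relation_matrix(rows, P), relation_matrix(rows, D)
    return [i for i in range(n) if all(RD[i][j] for j in range(n) if RP[i][j])]


def omega(rows, P, C, D):
    """Index of Eq. (4), summed over the rows of POS_C(D)."""
    RP, RD = relation_matrix(rows, P), relation_matrix(rows, D)
    return sum(RP[i][j] * abs(RP[i][j] - RD[i][j])
               for i in positive_region(rows, C, D) for j in range(len(rows)))


def core_by_significance(rows, C, D):
    """Core(S) via Proposition 1: attributes a with SIG(C|a, D) > 0."""
    full = omega(rows, C, C, D)
    return [a for a in C
            if omega(rows, [c for c in C if c != a], C, D) - full > 0]


def core_by_definition(rows, C, D):
    """Core(S) via Definition 4: removing a changes the positive region."""
    target = positive_region(rows, C, D)
    return [a for a in C
            if positive_region(rows, [c for c in C if c != a], D) != target]


# Hypothetical toy table: columns 0..2 are conditional attributes, column 3 is the decision.
rows = [(0, 0, 0, 0), (0, 1, 0, 1), (1, 0, 0, 1), (1, 1, 1, 1)]
C, D = [0, 1, 2], [3]

print(core_by_significance(rows, C, D))  # [0, 1]
print(core_by_definition(rows, C, D))    # [0, 1]
```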

Let $S := (U, C \cup D, V, f)$ be a decision system and let P be a proper subset of C. An attribute $b \in C \setminus P$ is called unimportant with respect to P if

$$\mathrm{SIG}(b \mid P, D) = 0. \tag{7}$$

Proposition 2. Let $S := (U, C \cup D, V, f)$ be a decision system, $P \subseteq Q \subseteq C$, and $b \in C \setminus Q$. If b is unimportant with respect to P, then b is also unimportant with respect to Q.

Proposition 3. Let S:=(U, C∪D, V, f) be a decision system and let P be a subset of C. If ω(P)=0, and SIG (P|b, D) >0 for any b ∈P, then P is a reduct of S.

Now, we are ready to present a heuristic attribute-reduction algorithm for decision systems.

Input: A decision system $S := (U, C \cup D, V, f)$

Output: A reduct of S

Step 1: Set Core(S) = ∅ and E = ∅.

Step 2: Compute SIG(C|a, D) for every a ∈ C; if SIG(C|a, D) > 0, then update Core(S) to Core(S) ∪ {a}.

Step 3: If ω(Core(S)) = 0, set E = Core(S) and go to Step 7; otherwise, go to Step 4.

Step 4: Set E = Core(S).

Step 5: Choose an attribute b from C\E with SIG(b|E, D) = max over a ∈ C\E of SIG(a|E, D), delete all the unimportant attributes with respect to E from C\E, and set E = E ∪ {b}.

Step 6: If ω(E) = 0, go to Step 7; otherwise, go back to Step 5.

Step 7: If there exists an attribute e ∈ E such that SIG(E|e, D) = 0, then go to Step 8; otherwise, go to Step 9.

Step 8: Update E to E\{e} and go back to Step 7.

Step 9: Output E and end the algorithm.
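The steps above can be sketched in plain Python as follows. This is an illustrative reading of the procedure, not the authors' MATLAB implementation: ties in Step 5 are broken by attribute order, the redundancy check of Steps 7 and 8 is repeated until no attribute can be removed, and the two toy tables are hypothetical.

```python
def relation_matrix(rows, attrs):
    """Entry (i, j) is 1 if objects i and j agree on every attribute in attrs."""
    n = len(rows)
    return [[int(all(rows[i][a] == rows[j][a] for a in attrs))
             for j in range(n)] for i in range(n)]


def make_omega(rows, C, D):
    """Return a function P -> omega(P) as in Eq. (4) for fixed C and D."""
    n = len(rows)
    RC, RD = relation_matrix(rows, C), relation_matrix(rows, D)
    pos = [i for i in range(n) if all(RD[i][j] for j in range(n) if RC[i][j])]

    def omega(P):
        RP = relation_matrix(rows, P)
        return sum(RP[i][j] * abs(RP[i][j] - RD[i][j])
                   for i in pos for j in range(n))

    return omega


def acmr(rows, C, D):
    """Heuristic reduct search following Steps 1 to 9 (sketch)."""
    omega = make_omega(rows, C, D)
    full = set(C)
    # Steps 1-2: the core collects attributes with SIG(C|a, D) > 0.
    core = {a for a in C if omega(full - {a}) - omega(full) > 0}
    # Steps 3-4: start the search from the core.
    E = set(core)
    candidates = [a for a in C if a not in E]
    # Steps 5-6: add the most significant attribute until omega(E) = 0.
    while omega(E) > 0:
        base = omega(E)
        sig = {a: base - omega(E | {a}) for a in candidates}
        b = max(candidates, key=lambda a: sig[a])  # first maximum wins ties
        # Drop b and every attribute that is unimportant with respect to E.
        candidates = [a for a in candidates if a != b and sig[a] > 0]
        E.add(b)
    # Steps 7-8: remove attributes whose significance inside E is zero.
    changed = True
    while changed:
        changed = False
        for e in sorted(E):
            if omega(E - {e}) - omega(E) == 0:
                E.remove(e)
                changed = True
                break
    # Step 9: output the reduct.
    return sorted(E)


# Hypothetical toy tables: columns 0..2 are conditional attributes, column 3 is the decision.
rows_a = [(0, 0, 0, 0), (0, 1, 0, 1), (1, 0, 0, 1), (1, 1, 1, 1)]
rows_b = [(0, 0, 0, 0), (0, 0, 1, 0), (1, 1, 0, 1), (1, 1, 1, 1)]

print(acmr(rows_a, [0, 1, 2], [3]))  # [0, 1]  (the core is already a reduct)
print(acmr(rows_b, [0, 1, 2], [3]))  # [0]     (empty core, one greedy step)
```

In the second table attributes 0 and 1 are duplicates, so the core is empty and the greedy loop of Steps 5 and 6 does the work.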

For convenience, the above algorithm is referred to as ACMR.

Note that in Step 5, gradually deleting unimportant attributes from the search space does not affect the effectiveness of the algorithm and can improve its efficiency: by Proposition 2, an attribute that is unimportant with respect to E remains unimportant after E grows. The time complexity of the algorithm ACMR is $O(|A|^2 |U|^2)$, where |A| = |C| + |D|.

Proposition 4. The algorithm ACMR is complete. That is, the attribute set output by ACMR is a reduct of the input decision system S with certainty.

5. Numerical experiments

To assess the performance of the algorithm ACMR, we chose six databases from the UCI (University of California, Irvine) repository for the experiments: Iris Plants Database, BUPA Liver Disorders, Balance Scale Weight, Tic-Tac-Toe Endgame, Zoo, and Chess End-Game. The results of running the algorithm ACMR in MATLAB are reported in Table 1.

Table 1. Experimental results output by the algorithm ACMR

Decision system          |U|     |C|    |R|    Running time (s)
Iris Plants Database     150     4      3      0.24
BUPA Liver Disorders     345     6      3      0.56
Balance Scale Weight     625     4      4      0.82
Tic-Tac-Toe Endgame      958     9      8      17.18
Zoo                      101     16     5      0.34
Chess End-Game           3196    36     29     226.99

* |U| is the cardinality of the set of objects, |C| the cardinality of the set of conditional attributes, and |R| the cardinality of the set output by the algorithm ACMR.

Table 1 shows that ACMR runs in under one second on four of the six databases and takes 226.99 seconds on the largest one, Chess End-Game.

Table 2 below is used to check whether or not the reduct output by the algorithm ACMR is minimal.

Decision system          |R|    |MR|    Output reduct minimal?
Iris Plants Database     3      3       Yes
BUPA Liver Disorders     3      3       Yes
Balance Scale Weight     4      4       Yes
Tic-Tac-Toe Endgame      8      8       Yes
Zoo                      5      5       Yes
Chess End-Game           29     ---     Yes

* |MR| is the cardinality of a minimal reduct. To check whether or not the reduct output by the algorithm ACMR is minimal, the Boolean reasoning-based algorithm in [3] is used to compute minimal reducts of the above six decision systems. The notation “---” means that the Boolean reasoning-based algorithm did not produce a result within three days. However, we can still conclude that the reduct output by the algorithm ACMR is minimal because the core of the Chess End-Game dataset contains 27 attributes.
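For very small attribute sets, minimality can also be cross-checked by exhaustive search, as sketched below. This is not the Boolean reasoning-based algorithm of [3]; it simply tries subsets of C in order of increasing size and returns the first one with ω(P) = 0, which is only feasible when |C| is tiny. The toy table and the function names are assumptions.

```python
from itertools import combinations


def relation_matrix(rows, attrs):
    """Entry (i, j) is 1 if objects i and j agree on every attribute in attrs."""
    n = len(rows)
    return [[int(all(rows[i][a] == rows[j][a] for a in attrs))
             for j in range(n)] for i in range(n)]


def omega(rows, P, C, D):
    """Index of Eq. (4), summed over the rows of POS_C(D)."""
    n = len(rows)
    RP, RC, RD = (relation_matrix(rows, X) for X in (P, C, D))
    pos = [i for i in range(n) if all(RD[i][j] for j in range(n) if RC[i][j])]
    return sum(RP[i][j] * abs(RP[i][j] - RD[i][j]) for i in pos for j in range(n))


def smallest_reduct(rows, C, D):
    """Exhaustive search: the first subset, by increasing size, with omega = 0.

    Exponential in |C|; intended only as a check on toy examples.
    """
    for k in range(len(C) + 1):
        for P in combinations(C, k):
            if omega(rows, list(P), C, D) == 0:
                return list(P)
    return list(C)  # not reached, since omega(C) = 0


# Hypothetical toy table: columns 0..2 are conditional attributes, column 3 is the decision.
rows = [(0, 0, 0, 0), (0, 1, 0, 1), (1, 0, 0, 1), (1, 1, 1, 1)]

print(smallest_reduct(rows, [0, 1, 2], [3]))  # [0, 1]
```

A heuristic output with the same cardinality as this result is a minimal reduct for the toy table.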

Furthermore, a comparison of the running times of the algorithm ACMR and the algorithm in [8] (denoted by algorithm a) is given below:

[Figure: running times of Algorithm a and Algorithm ACMR on the datasets; dataset labels dataset1, dataset3, dataset5; axis scale from 0 to 400.]

The algorithm ACMR runs faster than algorithm a for two reasons: 1) in the process of finding a minimal reduct, ACMR gradually deletes unimportant attributes from the search space; 2) ACMR is designed from the viewpoint of the relation matrix, and MATLAB is well suited to matrix computations.
