On Applications of Rough Sets Theory to Knowledge Discovery

Frida Coaquira

UNIVERSITY OF PUERTO RICO, MAYAGÜEZ CAMPUS

[email protected]

Introduction

One goal of Knowledge Discovery is to extract meaningful knowledge. Rough Sets theory was introduced by Z. Pawlak (1982) as a mathematical tool for data analysis.

Rough sets have many applications in the field of Knowledge Discovery: feature selection, the discretization process, data imputation, and the creation of decision rules.

Rough sets were introduced as a tool to deal with uncertain knowledge in Artificial Intelligence applications.

Equivalence Relation

Let X be a set and let x, y, and z be elements of X. An equivalence relation R on X is a relation on X such that:

Reflexive property: xRx for all x in X.

Symmetric property: if xRy, then yRx.

Transitive property: if xRy and yRz, then xRz.

Rough Sets Theory

Let T = (U, A, C, D) be a decision system,

where: U is a non-empty, finite set called the universe;

A is a non-empty finite set of attributes; C and D are subsets of A, the condition and decision attribute subsets respectively.

Each attribute is a function a : U → V_a for a ∈ A, where V_a is called the value set of a. The elements of U are objects, cases, states, or observations. The attributes are interpreted as features, variables, characteristics, conditions, etc.

Indiscernibility Relation

The indiscernibility relation IND(P) is an equivalence relation.

Let P ⊆ A. The indiscernibility relation IND(P) is defined as follows:

IND(P) = {(x, y) ∈ U × U : a(x) = a(y) for all a ∈ P}
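As a concrete sketch of the definition above (the small table and attribute names below are hypothetical, not from the slides), the equivalence classes of IND(P) can be computed by grouping objects on their value vectors over P:

```python
from collections import defaultdict

# Hypothetical decision table: object -> {attribute: value}.
table = {
    "x1": {"a1": 1, "a2": 0},
    "x2": {"a1": 1, "a2": 0},
    "x3": {"a1": 2, "a2": 1},
}

def ind_partition(table, P):
    """Equivalence classes of IND(P): objects grouped by their values on P."""
    classes = defaultdict(set)
    for obj, row in table.items():
        classes[tuple(row[a] for a in P)].add(obj)
    return set(map(frozenset, classes.values()))

print(ind_partition(table, ["a1", "a2"]))  # x1 and x2 are indiscernible
```

Two objects land in the same class exactly when they agree on every attribute in P, which is the relation IND(P) defined above.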

Indiscernibility Relation

The indiscernibility relation defines a partition of U.

For P ⊆ A, U/IND(P) denotes the family of all equivalence classes of the relation IND(P), called elementary sets.

Two other families of equivalence classes, U/IND(C) and U/IND(D), called the condition and decision equivalence classes respectively, can also be defined.

R-lower Approximation

Let R ⊆ C and X ⊆ U, where R is a subset of condition features. The R-lower approximation of X is the set of all elements of U which can be classified with certainty as elements of X:

R̲X = ∪{Y ∈ U/R : Y ⊆ X}

The R-lower approximation of X is a subset of X.

R-upper Approximation

The R-upper approximation of X is the set of all elements of U whose equivalence class intersects X:

R̄X = ∪{Y ∈ U/R : Y ∩ X ≠ ∅}

X is a subset of its R-upper approximation. The R-upper approximation contains all objects which can possibly be classified as belonging to X.

The R-boundary set of X is defined as:

BN_R(X) = R̄X − R̲X

Representation of the Approximation Sets

If R̲X = R̄X, then X is R-definable (the boundary set is empty). If R̲X ≠ R̄X, then X is rough with respect to R.

ACCURACY := Card(R̲X) / Card(R̄X)
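A minimal sketch of the two approximations and the accuracy measure, using the equivalence classes and the set Y1 from the worked example later in this deck:

```python
# Equivalence classes of R and the set Y1 from the deck's worked example.
classes = [{"x1", "x3", "x5"}, {"x2", "x4"}, {"x6", "x7", "x8"}]
Y1 = {"x1", "x2", "x4"}

def lower(classes, X):
    """R-lower approximation: union of classes fully contained in X."""
    out = set()
    for c in classes:
        if c <= X:
            out |= c
    return out

def upper(classes, X):
    """R-upper approximation: union of classes that intersect X."""
    out = set()
    for c in classes:
        if c & X:
            out |= c
    return out

lo, up = lower(classes, Y1), upper(classes, Y1)
boundary = up - lo
accuracy = len(lo) / len(up)
print(lo, boundary, accuracy)  # lower = {x2, x4}, accuracy = 2/5 = 0.4
```

Since the lower and upper approximations differ, Y1 is rough with respect to R, with boundary set X1 = {x1, x3, x5}.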

Decision Class

The decision d determines the partition

CLASS_T(d) = {X_1, …, X_{r(d)}}

of the universe U, where X_k = {x ∈ U : d(x) = k} for 1 ≤ k ≤ r(d).

CLASS_T(d) will be called the classification of objects in T determined by the decision d.

The set X_k is called the k-th decision class of T.

Decision Class

This example data system has 3 classes; we represent the partition: lower approximation, upper approximation and boundary set.

Rough Sets Theory

Let us consider U = {x1, x2, x3, x4, x5, x6, x7, x8} and the equivalence relation R with the equivalence classes X1 = {x1, x3, x5}, X2 = {x2, x4} and X3 = {x6, x7, x8}, which form a partition.

Let the classification C = {Y1, Y2, Y3} be such that Y1 = {x1, x2, x4}, Y2 = {x3, x5, x8}, Y3 = {x6, x7}.

Only Y1 has a non-empty lower approximation, i.e. R̲Y1 = X2.

Positive Region and Reduct

Positive region

POS_R(d), the positive region of the classification CLASS_T(d), is equal to the union of all lower approximations of the decision classes.

Reducts are defined as minimal subsets of condition attributes which preserve the positive region defined by the set of all condition attributes, i.e.

A subset R ⊆ C is a relative reduct iff

1) POS_R(D) = POS_C(D),

2) for every proper subset R′ ⊂ R, condition 1) is not true.

Dependency Coefficient

The dependency coefficient is a measure of association. The dependency coefficient between condition attributes A and a decision attribute d is defined by the formula:

γ(A, d) = Card(POS_A(d)) / Card(U)

where Card represents the cardinality of a set.
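The positive region and the dependency coefficient can be sketched as follows, using the decision table from the deck's small example later on (the representation as index tuples is my own choice):

```python
from collections import defaultdict
from fractions import Fraction

# Decision table from the deck's small example: C = {a1, a2, a3, a4}, decision d.
rows = {
    "x1": ((1, 0, 2, 1), 1), "x2": ((1, 0, 2, 0), 1),
    "x3": ((1, 2, 0, 0), 2), "x4": ((1, 2, 2, 1), 0),
    "x5": ((2, 1, 0, 0), 2), "x6": ((2, 1, 1, 0), 2),
    "x7": ((2, 1, 2, 1), 1),
}

def gamma(rows, attr_idx):
    """Dependency coefficient Card(POS_R(d)) / Card(U), R given by attribute indices."""
    classes = defaultdict(set)
    for obj, (vals, _) in rows.items():
        classes[tuple(vals[i] for i in attr_idx)].add(obj)
    pos = set()
    for cls in classes.values():
        if len({rows[o][1] for o in cls}) == 1:  # decision-consistent class -> positive region
            pos |= cls
    return Fraction(len(pos), len(rows))

print(gamma(rows, [1]), gamma(rows, [0, 1, 2, 3]))  # 2/7 for {a2}, 1 for all of C
```

With R = {a2} only the class {x1, x2} is decision-consistent, giving γ = 2/7; with all four condition attributes every class is a singleton, so γ = 1.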

Discernibility Matrix

Let U = {x1, x2, x3, …, xn} be the universe of a decision system. The discernibility matrix is defined by:

m_ij = {a ∈ C : a(x_i) ≠ a(x_j) and there exists d ∈ D with d(x_i) ≠ d(x_j)},  i, j = 1, 2, 3, …, n

where m_ij is the set of all attributes that classify objects x_i and x_j into different decision classes of the U/D partition.

CORE(C) = {a ∈ C : m_ij = {a} for some i, j}

Dispensable Feature

Let R be a family of equivalence relations and let P ∈ R. P is dispensable in R if IND(R) = IND(R − {P}); otherwise P is indispensable in R.

CORE

The set of all indispensable relations in C will be called the core of C:

CORE(C) = ∩RED(C), where RED(C) is the family of all reducts of C.

Small Example

Let U = {x1, x2, x3, x4, x5, x6, x7} be the universe set,

C = {a1, a2, a3, a4} the condition feature set,

D = {d} the decision feature set.

      a1  a2  a3  a4 | d
x1     1   0   2   1 | 1
x2     1   0   2   0 | 1
x3     1   2   0   0 | 2
x4     1   2   2   1 | 0
x5     2   1   0   0 | 2
x6     2   1   1   0 | 2
x7     2   1   2   1 | 1


Discernibility Matrix

        x1                x2            x3                x4                x5         x6
x2  –
x3  {a2, a3, a4}     {a2, a3}
x4  {a2}             {a2, a4}      {a3, a4}
x5  {a1, a2, a3, a4} {a1, a2, a3}  –                 {a1, a2, a3, a4}
x6  {a1, a2, a3, a4} {a1, a2, a3}  –                 {a1, a2, a3, a4}  –
x7  –                –             {a1, a2, a3, a4}  {a1, a2}          {a3, a4}  {a3, a4}

Example

Then, CORE(C) = {a2}.

The partition produced by the core is

U/{a2} = {{x1, x2}, {x5, x6, x7}, {x3, x4}},

and the partition produced by the decision feature d is

U/{d} = {{x4}, {x1, x2, x7}, {x3, x5, x6}}
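The discernibility matrix and core above can be verified mechanically. A sketch (reusing the small example's table; the pairwise representation is my own choice):

```python
from itertools import combinations

# The deck's small example table: C = {a1, a2, a3, a4}, decision d.
rows = {
    "x1": ((1, 0, 2, 1), 1), "x2": ((1, 0, 2, 0), 1),
    "x3": ((1, 2, 0, 0), 2), "x4": ((1, 2, 2, 1), 0),
    "x5": ((2, 1, 0, 0), 2), "x6": ((2, 1, 1, 0), 2),
    "x7": ((2, 1, 2, 1), 1),
}
attrs = ("a1", "a2", "a3", "a4")

def discernibility_core(rows, attrs):
    """Build m_ij for pairs in different decision classes; singleton entries form the core."""
    matrix, core = {}, set()
    for (xi, (vi, di)), (xj, (vj, dj)) in combinations(rows.items(), 2):
        if di == dj:
            continue  # only pairs with different decisions enter the matrix
        m = {attrs[k] for k in range(len(attrs)) if vi[k] != vj[k]}
        matrix[(xi, xj)] = m
        if len(m) == 1:
            core |= m
    return matrix, core

matrix, core = discernibility_core(rows, attrs)
print(core)  # the singleton entry m(x1, x4) = {a2} gives CORE(C) = {a2}
```

The only singleton cell is m(x1, x4) = {a2}, which recovers the slide's CORE(C) = {a2}.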

Similarity Relation

The similarity class of an object x is

SIM_T(x) = {y ∈ U : y SIM_T x}

It contains all objects similar to x.

Lower approximation:

SIM_T(X) = {x ∈ X : SIM_T(x) ⊆ X}, for X ⊆ U,

is the set of all elements of U which can be classified with certainty as elements of X.

Upper approximation:

SIM_T(X) = ∪_{x ∈ X} SIM_T(x)

SIM-positive region of a partition:

Let {X_i : i = 1, …, r(d)} with X_i = {x ∈ U : d(x) = i}. Then

POS_SIM_T({d}) = ∪_{i=1}^{r(d)} SIM_T(X_i)

Similarity Measures

For a numerical attribute a:

S_a(v_i, v_j) = 1 − |v_i − v_j| / (a_max − a_min)

where a_max and a_min are parameters; this measure is not symmetric.

Similarity for a nominal attribute:

S_a(v_i, v_j) = 1 if v_i = v_j, 0 otherwise.

A similarity measure can also be defined from the conditional decision distributions:

S_a(v_i, v_j) = Σ_{k=1}^{r(d)} P(d = k | a = v_i) · P(d = k | a = v_j) / (r(d) · P(d = k))

Quality of Approximation of a Classification

This is the ratio of all correctly classified objects to all objects:

γ_T(d) = Card(POS_SIM_T({d})) / Card(U)

Relative Reduct

R ⊆ A is a relative reduct for SIM_A{d} iff

1) POS_SIM_R({d}) = POS_SIM_A({d}),

2) for every proper subset of R, condition 1) is not true.

Attribute Reduction

The purpose is to select a subset of attributes from the original set of attributes to use in the rest of the process.

Selection criterion: the reduct concept. A reduct is the essential part of the knowledge, which defines all basic concepts.

Other methods are:
• Discernibility matrix (n×n)
• Generate all combinations of attributes and then evaluate the classification power or dependency coefficient (complete search)

Discretization Methods

The purpose is to develop an algorithm that finds a consistent set of cut points which minimizes the number of resulting regions. Discretization methods based on Rough Set theory try to find these cut points.

The underlying decision problem: given a set S of points P1, …, Pn in the plane R², partitioned into two disjoint categories S1, S2, and a natural number T, is there a consistent set of lines such that the partition of the plane into regions defined by them consists of at most T regions?
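A minimal sketch of the usual first step such a discretizer takes (the data below is hypothetical): generate the candidate cut points, i.e. midpoints between consecutive attribute values whose decision classes differ, which the algorithm then searches for a small consistent subset.

```python
def candidate_cuts(values, labels):
    """Midpoints between consecutive distinct attribute values whose decision
    classes differ -- the pool of cuts a rough-set discretizer searches over."""
    pairs = sorted(zip(values, labels))
    cuts = []
    for (v1, l1), (v2, l2) in zip(pairs, pairs[1:]):
        if v1 != v2 and l1 != l2:
            cuts.append((v1 + v2) / 2)
    return cuts

print(candidate_cuts([1, 2, 4, 5], ["L", "L", "H", "H"]))  # [3.0]
```

Only boundaries between classes can affect consistency, so restricting the search to these midpoints loses nothing while shrinking the search space.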

Consistent

Def. A set of cuts P is consistent with A (or A-consistent) iff ∂_A = ∂_{A_P}, where ∂_A and ∂_{A_P} are the generalized decisions of A and A_P respectively.

Def. A set P_irr of cuts is A-irreducible iff P_irr is A-consistent and every proper subfamily P′ ⊂ P_irr is not A-consistent.

Level of Inconsistency

Let B be a subset of A and let {X_i}, i = 1, 2, …, n, be a classification of U, with X_i ⊆ U and X_i ∩ X_j = ∅ for i ≠ j. Define

L_c = Σ_i Card(B̲X_i) / Card(U)

L_c represents the percentage of instances which can be correctly classified into class X_i with respect to subset B.

Imputation of Data

The rules of the system should be maximal in terms of consistency.

The set of relevant attributes for x is defined by

rel_R(x) = {a ∈ R : a(x) is defined}

and the relation R_c by

x R_c y iff a(x) = a(y) for all a ∈ rel_R(x) ∩ rel_R(y).

x and y are consistent if x R_c y.

Example

Let x = (1, 3, ?, 4), y = (2, ?, 5, 4) and z = (1, ?, 5, 4).

x and z are consistent (x R_c z);

x and y are not consistent.
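A sketch of this consistency check (encoding the slide's "?" as None; the function names are mine):

```python
def rel(x):
    """Indices of the attributes that are defined (not missing) in x."""
    return {i for i, v in enumerate(x) if v is not None}

def consistent(x, y):
    """x R_c y: x and y agree on every attribute defined in both."""
    return all(x[i] == y[i] for i in rel(x) & rel(y))

# The slide's example: ? encoded as None.
x = (1, 3, None, 4)
y = (2, None, 5, 4)
z = (1, None, 5, 4)
print(consistent(x, z), consistent(x, y))  # True False
```

x and z agree wherever both are defined (positions 1 and 4), while x and y already disagree on the first attribute.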

Decision Rules

      F1  F2  F3  F4 | D | Rules
O3     0   0   0   1 | L | R1
O5     0   0   1   3 | L | R1
O1     0   1   0   2 | L | R2
O4     0   1   1   0 | M | R3
O2     1   1   0   2 | H | R4

Rule 1: if (F2 = 0) then (D = L)
Rule 2: if (F1 = 0) then (D = L)
Rule 3: if (F4 = 0) then (D = M)
Rule 4: if (F1 = 1) then (D = H)

The algorithm should minimize the number of features included in the decision rules.

References

[1] Gediga, G. and Düntsch, I. (2002) Maximum Consistency of Incomplete Data via Non-invasive Imputation. Artificial Intelligence.

[2] Grzymala-Busse, J. and Siddhaye, S. (2004) Rough Set Approach to Rule Induction from Incomplete Data. Proceedings of IPMU'2004, the 10th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems.

[3] Pawlak, Z. (1995) Rough Sets. Proceedings of the 1995 ACM 23rd Annual Conference on Computer Science.

[4] Tay, F. and Shen, L. (2002) A Modified Chi2 Algorithm for Discretization. IEEE Transactions on Knowledge and Data Engineering, Vol. 14, No. 3, May/June.

[5] Zhong, N. (2001) Using Rough Sets with Heuristics for Feature Selection. Journal of Intelligent Information Systems, 16, 199-214, Kluwer Academic Publishers.

THANK YOU!