
Fuzzy-rough data mining

Richard Jensen
Advanced Reasoning Group, University of Aberystwyth
rkj@aber.ac.uk
http://users.aber.ac.uk/rkj


Outline

• Knowledge discovery process

• Fuzzy-rough methods
  – Feature selection and extensions
  – Instance selection
  – Classification/prediction
  – Semi-supervised learning


Knowledge discovery

• The process

• The problem of too much data
  – Requires storage
  – Intractable for data mining algorithms
  – Noisy or irrelevant data is misleading/confounding


Feature Selection


Feature selection

• Why dimensionality reduction/feature selection?
  – Growth of information - need to manage this effectively
  – Curse of dimensionality - a problem for machine learning and data mining
  – Data visualisation - graphing data

[Diagram: high-dimensional data → dimensionality reduction → low-dimensional data → processing system; feeding the high-dimensional data straight into the processing system is intractable]


Why do it?

• Case 1: We’re interested in features
  – We want to know which are relevant
  – If we fit a model, it should be interpretable

• Case 2: We’re interested in prediction
  – Features are not interesting in themselves
  – We just want to build a good classifier (or other kind of predictor)


Feature selection process

• Feature selection (FS) preserves data semantics by selecting rather than transforming

• Subset generation: forwards, backwards, random…
• Evaluation function: determines the ‘goodness’ of subsets
• Stopping criterion: decides when to stop the subset search

[Diagram: feature set → subset generation → evaluation (subset suitability) → stopping criterion (continue/stop) → validation]


Fuzzy-rough feature selection


Fuzzy-rough set theory

• Problems:
  – Rough set methods (usually) require data discretization beforehand
  – Extensions, e.g. tolerance rough sets, require thresholds
  – Also no flexibility in the approximations
    • E.g. objects either belong fully to the lower (or upper) approximation, or not at all


Fuzzy-rough sets

[Diagram: a rough set is generalised to a fuzzy-rough set, with the lower approximation defined via an implicator and the upper approximation via a t-norm]


Fuzzy-rough feature selection

• Based on fuzzy similarity

• Lower/upper approximations

$R_a(x, y) = 1 - \frac{|a(x) - a(y)|}{|a_{\max} - a_{\min}|}$   (e.g.)

$R_P(x, y) = \mathcal{T}_{a \in P}\{ R_a(x, y) \}$, where $\mathcal{T}$ is a t-norm
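Read concretely, these formulas can be implemented in a few lines. The sketch below is a minimal illustration, assuming numeric attributes, known per-attribute ranges, and the minimum t-norm; none of these choices is fixed by the slides.

```python
# Minimal sketch of the fuzzy similarity relation from the slide.
# Assumptions: attributes are numeric, and per-attribute similarities are
# combined with the minimum t-norm (other t-norms are equally valid).

def similarity(a_x, a_y, a_min, a_max):
    """R_a(x, y) = 1 - |a(x) - a(y)| / |a_max - a_min|."""
    return 1.0 - abs(a_x - a_y) / abs(a_max - a_min)

def combined_similarity(x, y, ranges, tnorm=min):
    """R_P(x, y): t-norm over the per-attribute similarities."""
    return tnorm(
        similarity(x[a], y[a], lo, hi) for a, (lo, hi) in enumerate(ranges)
    )

# Toy usage: two objects described by two attributes.
ranges = [(0.0, 10.0), (0.0, 1.0)]        # (min, max) per attribute
x, y = [2.0, 0.3], [4.0, 0.35]
print(combined_similarity(x, y, ranges))  # 0.8 = min(0.8, 0.95)
```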


FRFS: evaluation function

• Fuzzy positive region #1

• Fuzzy positive region #2 (weak)

• Dependency function
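The formulas themselves are not reproduced in this transcript, so the following is a hedged sketch using definitions common in the FRFS literature: the lower approximation via an implicator, the positive region as the supremum over decision classes, and the dependency degree as the average positive-region membership. The function names and the Łukasiewicz implicator are illustrative assumptions.

```python
# Hedged sketch of a fuzzy positive region and dependency degree, using
# definitions commonly found in the FRFS literature (the slide's own formulas
# are not shown here).  I(a, b) is the Lukasiewicz implicator.

def implicator(a, b):
    return min(1.0, 1.0 - a + b)

def lower_approx(y, cls, objects, labels, sim):
    """Lower approximation membership of y to class cls (crisp classes)."""
    return min(implicator(sim(x, y), 1.0 if lx == cls else 0.0)
               for x, lx in zip(objects, labels))

def positive_region(y, objects, labels, sim):
    """Positive-region membership: supremum over the decision classes."""
    return max(lower_approx(y, c, objects, labels, sim) for c in set(labels))

def dependency(objects, labels, sim):
    """Dependency degree: average positive-region membership."""
    return sum(positive_region(y, objects, labels, sim) for y in objects) / len(objects)

# Toy usage with a one-attribute similarity (about 0.733 for this data).
objs, labs = [0.1, 0.2, 0.9], ['A', 'A', 'B']
sim = lambda x, y: max(0.0, 1.0 - abs(x - y))
print(dependency(objs, labs, sim))
```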


FRFS: finding reducts

• Fuzzy-rough QuickReduct (see the sketch below)
  – Evaluation: use the dependency function (or other fuzzy-rough measure)
  – Generation: greedy hill-climbing
  – Stopping criterion: when the maximal evaluation function value is reached (or to degree α)
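A minimal sketch of that greedy loop, assuming `evaluate` is any fuzzy-rough subset measure in [0, 1] (such as the dependency degree above); the function name and the exact form of the α-based stopping rule are assumptions.

```python
# Sketch of the greedy (hill-climbing) QuickReduct search.  `evaluate` is any
# fuzzy-rough subset measure; `alpha` allows stopping once the measure is
# 'close enough' to the value achievable with all features.

def quickreduct(features, evaluate, alpha=1.0):
    full_score = evaluate(set(features))        # best achievable value
    reduct, score = set(), 0.0
    while score < alpha * full_score:
        # add the single feature that improves the measure most
        best_feat, best_score = None, score
        for f in features:
            if f in reduct:
                continue
            s = evaluate(reduct | {f})
            if s > best_score:
                best_feat, best_score = f, s
        if best_feat is None:                    # no further improvement
            break
        reduct.add(best_feat)
        score = best_score
    return reduct
```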


FRFS

• Other search methods
  – GAs, PSO, EDAs, Harmony Search, etc.
  – Backward elimination, plus-L minus-R, floating search, SAT, etc.

• Other subset evaluations
  – Fuzzy boundary region
  – Fuzzy entropy
  – Fuzzy discernibility function


Ant-based FS


[Diagram: a set X with its lower approximation, upper approximation and boundary region, built from the equivalence classes [x]B]


FRFS: boundary region

• The fuzzy lower and upper approximations define a fuzzy boundary region
• For each concept, minimise the boundary region (a sketch of such a measure follows below)
  – (also applicable to crisp RSFS)
• Results seem to show this is a more informed heuristic (but more computationally complex)
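A hedged sketch of one way to score a subset by its boundary region: take the boundary membership of an object for a concept as the gap between its upper and lower approximation memberships, and sum this over all concepts and objects (smaller is better). The helpers `mu_lower`/`mu_upper` are placeholders for whatever fuzzy-rough model is used.

```python
# Hedged sketch of a fuzzy boundary-region measure.  The boundary membership
# of an object y for a concept c is taken as mu_upper(y, c) - mu_lower(y, c);
# a candidate feature subset minimising the total boundary mass is preferred.

def boundary_measure(objects, concepts, mu_lower, mu_upper):
    total = 0.0
    for c in concepts:
        for y in objects:
            total += mu_upper(y, c) - mu_lower(y, c)  # boundary membership
    return total
```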


Finding smallest reducts

• Usually too expensive to search exhaustively for reducts with minimal cardinality

• Reducts found via discernibility matrices through, e.g.:
  – Converting from CNF to DNF (expensive)
  – Hill-climbing search using clauses (non-optimal)
  – Other search methods - GAs etc. (non-optimal)

• SAT approach
  – Solve directly in SAT formulation
  – DPLL approach ensures optimal reducts


Fuzzy discernibility matrices

• Extension of the crisp approach
  – Previously, attributes had {0,1} membership to clauses
  – Now they have membership in [0,1]

• Fuzzy DMs can be used to find fuzzy-rough reducts


Formulation

• Fuzzy satisfiability

• In crisp SAT, a clause is fully satisfied if at least one variable in the clause has been set to true

• For the fuzzy case, clauses may be satisfied to a certain degree depending on which variables have been assigned the value true
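One possible reading of this, not necessarily the slide's exact formulation: a clause's satisfaction degree is a t-conorm (here, max) over the memberships of the variables assigned true, and a formula's degree is a t-norm (here, min) over its clauses.

```python
# Illustrative reading of fuzzy satisfiability.  A clause maps each variable
# to a membership in [0, 1]; its satisfaction under a truth assignment is the
# maximum membership among variables set to true, and a formula's satisfaction
# is the minimum over its clauses.

def clause_satisfaction(clause, assignment):
    return max((m for v, m in clause.items() if assignment.get(v)), default=0.0)

def formula_satisfaction(clauses, assignment):
    return min(clause_satisfaction(c, assignment) for c in clauses)

# Toy example: two fuzzy clauses over variables a, b, c.
clauses = [{'a': 0.8, 'b': 0.3}, {'b': 0.6, 'c': 1.0}]
print(formula_satisfaction(clauses, {'a': True, 'b': True}))  # min(0.8, 0.6) = 0.6
```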


Example


DPLL algorithm


Experimentation: results


FRFS: issues

• Problem – noise tolerance!


Vaguely quantified rough sets

Pawlak rough set:
• y belongs to the lower approximation of A iff all elements of Ry belong to A
• y belongs to the upper approximation of A iff at least one element of Ry belongs to A

VQRS:
• y belongs to the lower approximation of A iff most elements of Ry belong to A
• y belongs to the upper approximation of A iff at least some elements of Ry belong to A


VQRS-based feature selection

• Use the quantified lower approximation, positive region and dependency degree
  – Evaluation: the quantified dependency (can be crisp or fuzzy)
  – Generation: greedy hill-climbing
  – Stopping criterion: when the quantified positive region is maximal (or to degree α)

• Should be more noise-tolerant, but is non-monotonic (see the sketch of a quantified lower approximation below)
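A hedged sketch of a quantified lower approximation in the VQRS spirit: apply a smooth ‘most’-type quantifier to the proportion of an object's fuzzy neighbourhood that lies in the concept. The quantifier shape, its (0.1, 0.6) parameters and the min intersection are illustrative assumptions.

```python
# Hedged sketch of a VQRS-style quantified lower approximation.  Q is a smooth
# 'most'-type fuzzy quantifier applied to the proportion of y's fuzzy
# neighbourhood that belongs to the concept A.

def quantifier(p, alpha=0.1, beta=0.6):
    if p <= alpha:
        return 0.0
    if p >= beta:
        return 1.0
    mid = (alpha + beta) / 2.0
    if p <= mid:
        return 2.0 * ((p - alpha) / (beta - alpha)) ** 2
    return 1.0 - 2.0 * ((p - beta) / (beta - alpha)) ** 2

def vqrs_lower(y, objects, in_concept, sim):
    """in_concept(x) gives x's membership to the concept A, in [0, 1]."""
    neighbourhood = sum(sim(x, y) for x in objects)
    overlap = sum(min(sim(x, y), in_concept(x)) for x in objects)
    return quantifier(overlap / neighbourhood)
```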


Progress

[Diagram: progression from rough set theory (qualitative data) to fuzzy-rough set theory (quantitative data) to noise-tolerant extensions such as VQRS, OWA-FRFS and fuzzy VPRS (noisy data); monotonicity is also indicated]


More issues...

• Problem #1: how to choose fuzzy similarity?

• Problem #2: how to handle missing values?


Interval-valued FRFS

• Answer #1: Model uncertainty in fuzzy similarity by interval-valued similarity

[Diagram: interval-valued (IV) fuzzy similarity gives rise to an interval-valued fuzzy-rough set]


Interval-valued FRFS

• When comparing two object values for a given attribute – what to do if at least one is missing?

• Answer #2: Model missing values via the unit interval
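A minimal sketch of that idea: when either value is missing, the similarity is modelled as the whole unit interval [0, 1]; otherwise the usual similarity is returned as a degenerate interval.

```python
# Sketch of interval-valued similarity with missing values: if either value
# is missing (None), nothing is known about the similarity, so it is the
# whole unit interval; otherwise it collapses to a single point [s, s].

def iv_similarity(a_x, a_y, a_min, a_max):
    if a_x is None or a_y is None:
        return (0.0, 1.0)                    # complete uncertainty
    s = 1.0 - abs(a_x - a_y) / abs(a_max - a_min)
    return (s, s)

print(iv_similarity(2.0, 4.0, 0.0, 10.0))    # (0.8, 0.8)
print(iv_similarity(None, 4.0, 0.0, 10.0))   # (0.0, 1.0)
```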


Other measures

• Boundary region

• Discernibility function


Initial experimentation

[Diagram: the original dataset is split into cross-validation folds; in one branch Type-1 FRFS produces reduced folds that are evaluated with JRip; in the other branch the data is corrupted, the IV-FRFS methods produce reduced folds, and these are again evaluated with JRip]


Initial experimentation


Initial results: lower approx


Instance Selection


Instance selection: basic ideas

• Remove objects so that the underlying approximations are left unchanged (such objects are not needed)


Instance selection: basic ideas

• Remove objects whose positive region membership is < 1 (noisy objects)
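A minimal sketch of this selection rule; `pos_membership` stands for any positive-region computation (e.g. the one sketched earlier), and the threshold τ = 1 reproduces the rule above (smaller values keep more objects).

```python
# Sketch of the basic fuzzy-rough instance selection rule: keep only objects
# whose positive-region membership reaches a threshold tau.

def select_instances(objects, labels, pos_membership, tau=1.0):
    kept = [(x, lx) for x, lx in zip(objects, labels)
            if pos_membership(x) >= tau]
    return kept
```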


FRIS-I


FRIS-II


FRIS-III


Fuzzy rough instance selection

• Time complexity is a problem for FRIS-II and FRIS-III

• Less complex: fuzzy-rough prototype selection
  – More on this later...


Fuzzy-rough classification and prediction


FRNN/VQNN




Further developments

• FRNN and VQNN have limitations (for classification problems) – see the sketch of the FRNN idea below
  – FRNN only uses one neighbour
  – VQNN is equivalent to FNN if the same similarity relation is used

• POSNN uses the positive region to also consider the quality of neighbours
  – E.g. instances in overlapping class regions are less interesting
  – More on this later...
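For orientation, a hedged sketch in the spirit of FRNN: a test object is assigned the class for which the mean of its lower and upper approximation memberships is highest. The Kleene–Dienes implicator, the min t-norm and the use of all training objects as neighbours are simplifying assumptions, not the slide's exact formulation.

```python
# Hedged sketch in the spirit of FRNN.  For a test object y, the lower
# approximation uses the Kleene-Dienes implicator max(1 - R(x,y), C(x)) and
# the upper approximation uses the min t-norm min(R(x,y), C(x)); the class
# with the highest average of the two memberships is predicted.

def frnn_classify(y, objects, labels, sim):
    best_class, best_score = None, -1.0
    for c in set(labels):
        lower = min(max(1.0 - sim(x, y), 1.0 if lx == c else 0.0)
                    for x, lx in zip(objects, labels))
        upper = max(min(sim(x, y), 1.0 if lx == c else 0.0)
                    for x, lx in zip(objects, labels))
        score = (lower + upper) / 2.0
        if score > best_score:
            best_class, best_score = c, score
    return best_class
```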


Discovering rules via RST

• Equivalence classes
  – Form the antecedent part of a rule
  – The lower approximation tells us if this is predictive of a given concept (certain rules)

• Typically done in one of two ways:
  – Overlaying reducts
  – Building rules by considering individual equivalence classes (e.g. LEM2)


QuickRules framework

• The fuzzy tolerance classes used during this process can be used to create fuzzy rules

• When a reduct is found, the resulting rules cover all instances

[Diagram: the feature selection loop (feature set → generation → evaluation and stopping criterion, with subset suitability deciding continue/stop → validation), extended with a rule induction step driven by the candidate subsets]


Harmony search approach

• R. Diao and Q. Shen. A harmony search based approach to hybrid fuzzy-rough rule induction. Proceedings of the 21st International Conference on Fuzzy Systems, 2012.


Harmony search approach

Harmony memory (example):

  a   b   c   score
  2   3   1       3
  2   3   2       4
  4   4   2       9
  3   4   5      21

Minimise $(a - 2)^2 + (b - 3)^4 + (c - 1)^2 + 3$

[Diagram: musicians each play a note; together the notes form a harmony, which is stored in the harmony memory and scored by its fitness]
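The example objective above can be minimised with a very small harmony search; the sketch below is self-contained, and the memory size, HMCR, PAR, bandwidth and variable ranges are chosen purely for illustration.

```python
# Minimal harmony search minimising the slide's example objective
#   f(a, b, c) = (a - 2)^2 + (b - 3)^4 + (c - 1)^2 + 3.
import random

def objective(h):
    a, b, c = h
    return (a - 2) ** 2 + (b - 3) ** 4 + (c - 1) ** 2 + 3

def harmony_search(iters=2000, hms=5, hmcr=0.9, par=0.3, bw=0.5, rng=(0.0, 6.0)):
    # harmony memory: hms randomly improvised solutions
    memory = [[random.uniform(*rng) for _ in range(3)] for _ in range(hms)]
    for _ in range(iters):
        new = []
        for i in range(3):
            if random.random() < hmcr:               # pick from memory...
                value = random.choice(memory)[i]
                if random.random() < par:            # ...and maybe pitch-adjust
                    value += random.uniform(-bw, bw)
            else:                                    # or improvise randomly
                value = random.uniform(*rng)
            new.append(min(max(value, rng[0]), rng[1]))
        worst = max(memory, key=objective)
        if objective(new) < objective(worst):        # replace the worst harmony
            memory[memory.index(worst)] = new
    return min(memory, key=objective)

best = harmony_search()
print(best, objective(best))   # should approach (2, 3, 1) with score 3
```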


Key notion mapping

  Harmony Search | Hybrid Rule Induction | Numerical Optimisation
  Harmony        | Rule set              | Solution
  Fitness        | Combined evaluation   | Evaluation
  Musician       | Fuzzy rule rx         | Variable
  Note           | Feature subset        | Value


Comparison vs QuickRules

  HarmonyRules: 56.33 ± 10.00
  QuickRules:   63.1 ± 11.89

[Figure: rule cardinality distribution for the ‘web’ dataset (2556 features)]


Fuzzy-rough semi-supervised learning


Semi-supervised learning (SSL)

• Lies somewhere between supervised and unsupervised learning

• Why use it?
  – Data is expensive to label/classify
  – Labels can also be difficult to obtain
  – Large amounts of unlabelled data are available

• When is SSL useful?
  – Small number of labelled objects but large number of unlabelled objects


Semi-supervised learning

• A number of methods for SSL – self-learning, generative models, etc.
  – Labelled data objects – usually small in number
  – Unlabelled data objects – usually large in number
  – A set of features describes the objects
  – The class label tells us only which class the labelled objects belong to

• SSL therefore attempts to learn labels (or structure) for data which has no labels
  – Labelled data provides ‘clues’ for the unlabelled data


Co-training

[Diagram: co-training – Learner 1 and Learner 2 are trained on different feature subsets (subset 1, subset 2) of the labelled dataset, and each makes predictions on the unlabelled data]


Self-learning

[Diagram: self-learning – a learner trained on the labelled dataset makes predictions on the unlabelled data, and the newly labelled data objects are added back to the labelled dataset]


Fuzzy-rough self learning (FRSL)

• Basic idea is to propagate labels using the upper and lower approximations (a sketch follows below)
  – Label only those objects which belong to the lower approximation of a class to a high degree
  – Can use the upper approximation to decide on ties

• Attempts to minimise mis-labelling and subsequent reinforcement

• Paper: N. Mac Parthalain and R. Jensen. Fuzzy-Rough Set based Semi-Supervised Learning. Proceedings of the 20th International Conference on Fuzzy Systems (FUZZ-IEEE’11), pp. 2465-2471, 2011.
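A hedged sketch of that self-learning loop: unlabelled objects whose lower approximation membership for some class reaches a threshold are given that label and added to the labelled pool, and the process repeats. Tie-breaking via the upper approximation is omitted for brevity, and `lower_membership` stands for whatever fuzzy-rough lower approximation is used.

```python
# Hedged sketch of an FRSL-style self-learning loop.  `lower_membership(y, c,
# labelled)` is assumed to return y's lower approximation membership to class
# c, computed from the currently labelled (object, label) pairs; theta = 1.0
# labels only objects that belong fully to some class's lower approximation.

def frsl(labelled, unlabelled, classes, lower_membership, theta=1.0):
    labelled = list(labelled)            # (object, label) pairs
    unlabelled = list(unlabelled)
    changed = True
    while changed and unlabelled:
        changed = False
        remaining = []
        for y in unlabelled:
            scores = {c: lower_membership(y, c, labelled) for c in classes}
            best = max(scores, key=scores.get)
            if scores[best] >= theta:    # confident enough: propagate the label
                labelled.append((y, best))
                changed = True
            else:
                remaining.append(y)
        unlabelled = remaining
    return labelled, unlabelled
```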


FRSL

[Diagram: a fuzzy-rough learner trained on the labelled dataset makes predictions for the unlabelled data; objects whose lower approximation membership = 1 are added to the labelled data objects (Yes branch), the rest remain unlabelled (No branch)]


Experimentation (Problem 1)


SS-FCM


FNN


FRSL


Experimentation (Problem 2)

[Figure legend: cluster 1, cluster 2, labelled 1, labelled 2]


SS-FCM


FNN


FRSL


Conclusion

• Looked at fuzzy-rough methods for data mining
  – Feature selection, finding optimal reducts
  – Handling missing values and other problems
  – Classification/prediction
  – Instance selection
  – Semi-supervised learning

• Future work
  – Imputation, better rule induction and instance selection methods, more semi-supervised methods, optimizations, instance/feature weighting


FR methods in Weka

• Weka implementations of all fuzzy-rough methods can be downloaded from:
  http://users.aber.ac.uk/rkj/book/wekafull.jar

• KEEL version available soon (hopefully!)