
Page 1:

Data Mining: Concepts and Techniques

— Chapter 7 —

Classification Ensemble Learning

Page 2:

Drawback of Single Classifier

• If we train several classifiers and pick the best one, which one is the best?
– More than one classifier may meet the selection criterion (e.g., the same training accuracy), especially in the following situations:
• There is not sufficient training data
• The learning algorithm easily converges to different local optima

Page 3:

Drawback of Single Classifier – Cont.

• Potentially valuable information may be lost when the results of less-successful classifiers are discarded
– E.g., the discarded classifiers may still classify some samples correctly
• The final decision must be wrong whenever the output of the selected classifier is wrong
• The trained classifier may not be complex enough to handle the problem

Page 4:

Ensemble Methods

• Combining a number of trained classifiers leads to better performance than any single one
– The errors of one classifier can be compensated by the correct classifications of the others
– Different classifiers have different knowledge regarding the problem
• Ensembles also decompose a complex problem into sub-problems whose solutions are simpler to understand, implement, manage, and update

Page 5:

General Idea

[Diagram] Starting from the original training data D:
Step 1: Create multiple data sets D1, D2, …, Dt-1, Dt
Step 2: Build multiple classifiers C1, C2, …, Ct-1, Ct
Step 3: Combine the classifiers into C* (a minimal sketch follows)
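As a rough illustration of these three steps, here is a minimal sketch; scikit-learn, the three base learners, and bootstrap sampling in Step 1 are all assumptions made for illustration, not prescribed by the slide.

```python
# A minimal sketch of the three steps of the general ensemble idea.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
rng = np.random.default_rng(0)

# Step 1: create multiple data sets D1..Dt (here, by sampling with replacement)
datasets = [
    (X[idx], y[idx])
    for idx in (rng.integers(0, len(X), size=len(X)) for _ in range(3))
]

# Step 2: build one classifier Ci per data set Di
learners = [DecisionTreeClassifier(), LogisticRegression(max_iter=1000), GaussianNB()]
classifiers = [m.fit(Xi, yi) for m, (Xi, yi) in zip(learners, datasets)]

# Step 3: combine the classifiers into C* by majority vote
def combined_predict(X_new):
    votes = np.stack([c.predict(X_new) for c in classifiers])   # shape (t, n)
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)

print(combined_predict(X[:5]), y[:5])
```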

Page 6:

Affecting Factors in Ensemble Methods

• Three factors affect the accuracy:
– Accuracy of individual classifiers: how good are the individual classifiers?
– Fusion method: how are the classifiers combined?
– Diversity among classifiers: how different are the decisions of the various classifiers?

Page 7:

Accuracy of Individual Classifier

• The performance of an individual classifier is affected by
– the training dataset (samples and features)
– the learning model (type of classifier)
– the model's parameters (e.g., the number of neurons in a neural network)

Page 8:

Examples of Ensemble Methods

• How can an ensemble of classifiers be generated?
– Bootstrap aggregating (bagging)
– Boosting (e.g., AdaBoost)

Page 9:

Bagging: Bootstrap Aggregation

• Real-world analogy: a diagnosis based on the majority vote of multiple doctors
• Training
– Given a set D of d tuples, at each iteration i a training set Di of d tuples is sampled with replacement from D (i.e., a bootstrap sample)
– A classifier model Mi is learned from each training set Di
• Classification: to classify an unknown sample X
– Each classifier Mi returns its class prediction
– The bagged classifier M* counts the votes and assigns to X the class with the most votes
• Prediction: bagging can also be applied to the prediction of continuous values by taking the average of the individual predictions for a given test tuple
• Accuracy
– Often significantly better than a single classifier derived from D
– On noisy data: not considerably worse, more robust
– Proved improved accuracy in prediction

Page 10:

Bagging

• Sampling with replacement
• Build a classifier on each bootstrap sample

Original Data:      1  2  3  4  5  6  7  8  9  10
Bagging (Round 1):  7  8  10 8  2  5  10 10 5  9
Bagging (Round 2):  1  4  9  1  2  3  2  7  3  2
Bagging (Round 3):  1  8  5  10 5  5  9  6  3  7
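A hedged sketch of the procedure above, using scikit-learn's BaggingClassifier with decision trees as the base learner; both the library and the choice of base learner are assumptions, since the slides name neither.

```python
# Bagging sketch: each of the 10 trees is trained on a bootstrap sample of D,
# and the bagged classifier M* predicts by majority vote, as described above.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_informative=5, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

single = DecisionTreeClassifier(random_state=42).fit(X_tr, y_tr)
bagged = BaggingClassifier(DecisionTreeClassifier(), n_estimators=10,
                           random_state=42).fit(X_tr, y_tr)

print("single tree accuracy:", single.score(X_te, y_te))
print("bagged trees accuracy:", bagged.score(X_te, y_te))
# For continuous targets, sklearn.ensemble.BaggingRegressor averages the
# individual predictions, matching the "Prediction" bullet above.
```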

Page 11:

Boosting

• Example: consult several doctors and combine their diagnoses with weights assigned according to each doctor's previous diagnostic accuracy
• How does boosting work?
– A weight is assigned to each training tuple
– A series of k classifiers is iteratively learned
– After a classifier Mi is learned, the weights are updated so that the subsequent classifier, Mi+1, pays more attention to the training tuples that were misclassified by Mi
– The final M* combines the votes of the individual classifiers, where the weight of each classifier's vote is a function of its accuracy
• The boosting algorithm can be extended to the prediction of continuous values
• Compared with bagging: boosting tends to achieve greater accuracy, but it also risks overfitting the model to misclassified data

Page 12:

Boosting

• Records that are wrongly classified will have their weights increased
• Records that are classified correctly will have their weights decreased

Original Data:      1  2  3  4  5  6  7  8  9  10
Boosting (Round 1): 7  3  2  8  7  9  4  10 6  3
Boosting (Round 2): 5  4  9  4  2  5  1  7  4  2
Boosting (Round 3): 4  4  8  10 4  5  4  6  3  4

• Example 4 is hard to classify
• Its weight is increased, so it is more likely to be chosen again in subsequent rounds (see the sampling sketch below)
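To make the resampling effect concrete, here is a small NumPy sketch of how a tuple with an increased weight, like example 4, becomes more likely to be drawn; the weights and numbers are made up for illustration.

```python
# Weighted sampling with replacement: a tuple whose weight has been increased
# (here, tuple 4) is drawn more often in the next round.
import numpy as np

rng = np.random.default_rng(0)
ids = np.arange(1, 11)            # tuples 1..10
w = np.full(10, 0.1)              # weights start uniform
w[3] *= 4.0                       # tuple 4 was misclassified; weight increased
w /= w.sum()                      # renormalize into a probability distribution

next_round = rng.choice(ids, size=10, replace=True, p=w)
print(next_round)                 # tuple 4 tends to appear multiple times
```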

Page 13:

AdaBoost (Freund and Schapire, 1997)

• Given a set of d class-labeled tuples, (X1, y1), …, (Xd, yd)
• Initially, all tuple weights are set to the same value, 1/d
• Generate k classifiers in k rounds. At round i:
– Tuples from D are sampled (with replacement) to form a training set Di of the same size
– Each tuple's chance of being selected is based on its weight
– A classification model Mi is derived from Di
– Its error rate is calculated using Di as a test set
– If a tuple is misclassified, its weight is increased; otherwise, it is decreased
• Error rate: err(Xj) is the misclassification error of tuple Xj. The error rate of classifier Mi is the sum of the weights of the misclassified tuples:

$\mathrm{error}(M_i) = \sum_{j} w_j \times \mathrm{err}(X_j)$

• The weight of classifier Mi's vote is

$\log \dfrac{1 - \mathrm{error}(M_i)}{\mathrm{error}(M_i)}$
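Below is a compact from-scratch sketch of this loop. Decision stumps as the base learner, labels in {0, 1}, and scikit-learn itself are assumptions not fixed by the slide; the error, reweighting, and vote-weight formulas are the ones given above.

```python
# AdaBoost sketch following the slide. For simplicity the error rate is
# computed on all of D rather than on Di only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=1)
d = len(X)
w = np.full(d, 1.0 / d)                      # all tuple weights start at 1/d
models, alphas = [], []
rng = np.random.default_rng(1)

for i in range(10):                          # k = 10 rounds
    idx = rng.choice(d, size=d, replace=True, p=w)     # sample Di by weight
    Mi = DecisionTreeClassifier(max_depth=1).fit(X[idx], y[idx])
    err = (Mi.predict(X) != y).astype(float)           # err(Xj): 1 if misclassified
    error = float(np.sum(w * err))                     # error(Mi) = sum_j w_j * err(Xj)
    if error <= 0 or error >= 0.5:                     # skip degenerate rounds
        continue
    alpha = np.log((1 - error) / error)                # weight of Mi's vote
    w = w * np.exp(alpha * err)      # misclassified weights go up; after
    w /= w.sum()                     # renormalizing, the rest go down
    models.append(Mi)
    alphas.append(alpha)

def boosted_predict(X_new):
    votes = np.zeros((len(X_new), 2))
    for Mi, a in zip(models, alphas):
        votes[np.arange(len(X_new)), Mi.predict(X_new)] += a   # weighted vote
    return votes.argmax(axis=1)

print("training accuracy:", np.mean(boosted_predict(X) == y))
```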