Data mining technique for classification and feature evaluation using stream mining(ranjit banshpal)

Data Mining Technique For Classification and Feature Evaluation Using Stream Mining

Ranjit R. Banshpal

OUTLINE

• Introduction
• Data stream classification
• Decision Tree
• VFDT
• Challenges
• Applications
• Conclusion
• References

Introduction

• What is data mining?
  – Extracting knowledge from historical data.
• What is data stream mining?
  – Extracting knowledge from real-time, high-speed data streams.
• Why do we use data stream mining?

Introduction (Cont…)

Examples of continuous-flow data:

• Network traffic data
• Sensor data
• Call center data

Data Stream Classification

• Uses past labeled data to build a classification model.
• Predicts the labels of future instances using the model.
• Helps decision making.

[Figure: stream classification workflow. Labels: network traffic, firewall, classification model, attack traffic (block and quarantine), benign traffic (server), expert analysis and labeling, model update]
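The build, predict, and update cycle described on this slide can be sketched with a toy incremental classifier. This is a minimal illustration, not the method used in the talk: the class name `StreamingNaiveBayes`, its methods, and the two made-up traffic features (protocol, packet rate) are all assumptions chosen for the example.

```python
import math
from collections import defaultdict


class StreamingNaiveBayes:
    """Toy incremental Naive Bayes over categorical features (hypothetical helper)."""

    def __init__(self):
        self.class_counts = defaultdict(int)
        self.feature_counts = defaultdict(int)  # (index, value, label) -> count
        self.values_seen = defaultdict(set)     # index -> distinct values seen

    def learn_one(self, features, label):
        # Model update: fold one labeled instance into the running counts.
        self.class_counts[label] += 1
        for i, value in enumerate(features):
            self.feature_counts[(i, value, label)] += 1
            self.values_seen[i].add(value)

    def predict_one(self, features):
        # Score each class with Laplace-smoothed log probabilities.
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / total)
            for i, value in enumerate(features):
                k = len(self.values_seen.get(i, ())) or 1
                count = self.feature_counts.get((i, value, label), 0)
                score += math.log((count + 1) / (n + k))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Past labeled data (invented values) builds the model.
model = StreamingNaiveBayes()
labeled = [
    (("tcp", "high"), "attack"),
    (("tcp", "low"), "benign"),
    (("udp", "high"), "attack"),
    (("udp", "low"), "benign"),
]
for x, y in labeled:
    model.learn_one(x, y)

# A future instance is labeled by the model.
prediction = model.predict_one(("tcp", "high"))
```

When an expert later labels new instances, calling `learn_one` on them plays the role of the "model update" arrow in the figure, so the model keeps changing as the stream continues.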

Decision Trees

• A decision tree is a classification model. Its structure is like a general tree or a flow chart.
  – Internal node: tests the value of an attribute.
  – Leaf node: holds a class label.

Fig: Decision Tree of Weather
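The internal-node and leaf-node roles can be made concrete with a small sketch. The figure itself is not reproduced in this transcript, so the tree below is assumed to be the common textbook weather example; the attribute names, values, and the `classify` helper are illustrative only.

```python
# Internal nodes are dicts: {"attribute": name, "branches": {value: subtree}}.
# Leaves are plain strings holding the class label.
weather_tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": {
            "attribute": "humidity",
            "branches": {"high": "no", "normal": "yes"},
        },
        "overcast": "yes",
        "rainy": {
            "attribute": "windy",
            "branches": {"true": "no", "false": "yes"},
        },
    },
}


def classify(tree, instance):
    """Walk from the root to a leaf, testing one attribute per internal node."""
    node = tree
    while isinstance(node, dict):
        value = instance[node["attribute"]]
        node = node["branches"][value]
    return node


label = classify(
    weather_tree,
    {"outlook": "sunny", "humidity": "high", "windy": "false"},
)
```

Each loop iteration corresponds to one internal node test; the loop ends when a leaf (a class label) is reached.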

Decision Tree (cont...)

• Limitations
  – Classic decision tree learners assume that all training data can be stored in main memory at the same time.
  – Disk-based decision tree learners repeatedly read the training data from disk sequentially.

VFDT

• VFDT (Very Fast Decision Tree) takes less time than a conventional decision tree learner.
• To find the best attribute at a node, it uses only a small subset of the training examples that pass through that node.
  – Given a stream of examples, use the first ones to choose the root attribute.
  – Once the root attribute is chosen, the successive examples are passed down to the corresponding leaves and used to choose the attribute there, and so on recursively.
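The slide does not say how many "first" examples are enough. In the published VFDT algorithm that decision is made with the Hoeffding bound: a leaf is split once the gap between the best and second-best attribute exceeds the bound. The sketch below shows only that test; the function names and the example gain values are assumptions for illustration.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Epsilon such that, with probability 1 - delta, the true mean lies
    within epsilon of the mean observed over n examples."""
    return math.sqrt((value_range ** 2) * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain, second_gain, value_range, delta, n):
    """Split when the observed gap between the two best attributes
    is larger than the Hoeffding bound for n examples."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


# With few examples the bound is wide, so the leaf keeps waiting.
early = should_split(0.30, 0.25, value_range=1.0, delta=1e-7, n=200)

# With more examples the bound shrinks and the same gap justifies a split.
later = should_split(0.30, 0.25, value_range=1.0, delta=1e-7, n=5000)
```

Because the bound shrinks as `n` grows, a node needs only as many examples as it takes to separate the two leading attributes, which is why a small subset of the stream is sufficient at each node.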

VFDT (cont...)

[Figure: a decision tree grown from a data stream. Attribute labels: Gender, Car_Type. Node tests: Age<30?, Car Type=Sports Car?, Car Type=normal. Branches: Yes / No]

Challenges

• Infinite length
• Concept-drift
• Concept-evolution
• Feature-evolution

• Input: the data stream is divided into equal-sized chunks.
• Classifier ensemble M.
• Outlier detection module: outlier instances are stored in a buffer.
• Instances in the buffer are clustered.
• Each cluster is transformed into a pseudopoint data structure (centroid, weight, radius), giving the set of pseudopoints H.
• A q-NSC value is calculated and assigned to every instance in a pseudopoint.
• If tp is greater than the threshold, the corresponding classifier votes in favor of another class.

Fig: Workflow for identifying concept evolution.
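The pseudopoint summary (centroid, weight, radius) and the outlier test from this workflow can be sketched as follows. The slide names the three fields but does not define them, so this sketch assumes that weight is the number of points in the cluster and radius is the distance from the centroid to the farthest member. The names `Pseudopoint`, `summarize`, and `is_outlier` are hypothetical, and the q-NSC computation is not shown.

```python
import math
from dataclasses import dataclass


@dataclass
class Pseudopoint:
    centroid: tuple
    weight: int      # assumed: number of instances in the cluster
    radius: float    # assumed: distance from centroid to farthest member


def summarize(cluster):
    """Replace a cluster of raw points with a compact pseudopoint."""
    dims = len(cluster[0])
    centroid = tuple(
        sum(point[d] for point in cluster) / len(cluster) for d in range(dims)
    )
    radius = max(math.dist(centroid, point) for point in cluster)
    return Pseudopoint(centroid, len(cluster), radius)


def is_outlier(instance, pseudopoints):
    """An instance is an outlier if it falls outside every pseudopoint."""
    return all(
        math.dist(instance, pp.centroid) > pp.radius for pp in pseudopoints
    )


cluster = [(0.0, 0.0), (2.0, 0.0), (1.0, 1.0), (1.0, -1.0)]
pp = summarize(cluster)

inside = is_outlier((1.5, 0.0), [pp])   # within the radius
outside = is_outlier((5.0, 5.0), [pp])  # far from the only pseudopoint
```

Keeping only the summary means the raw points of a chunk can be discarded once clustered, which suits a stream of unbounded length.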

Feature-Evolution

Applications

• Applicable to many domains, such as:
  – Intrusion detection systems.
  – Share market data.
  – Security monitoring.
  – Network monitoring and traffic engineering.
  – Business: credit card transaction flows.
  – Telecommunication calling records.
  – Web logs and web page click streams.

Conclusion

• In data stream classification, the VFDT algorithm efficiently classifies high-dimensional data into another class.
• VFDT also shows two key mechanisms of the another-class detection technique: outlier detection and multiple-class detection.


Any Questions?

THANK YOU