Data Stream Classification and Novel Class Detection


Transcript of Data Stream Classification and Novel Class Detection

Page 1: Data Stream Classification and Novel Class Detection

Data Stream Classification and Novel Class Detection

Mehedy Masud, Latifur Khan, Qing Chen, and Bhavani Thuraisingham
Department of Computer Science, University of Texas at Dallas

Jing Gao, Jiawei Han
Department of Computer Science, University of Illinois at Urbana-Champaign

Charu Aggarwal
IBM T. J. Watson

This work was funded in part by

Aug 10, 2011

Page 2: Data Stream Classification and Novel Class Detection

Outline of the Presentation

Background
Data Stream Classification
Novel Class Detection

Page 3: Data Stream Classification and Novel Class Detection

Introduction

Characteristics of data streams:
◦ Continuous flow of data
◦ Examples:
  Network traffic
  Sensor data
  Call center records

Page 4: Data Stream Classification and Novel Class Detection

Data Stream Classification

Uses past labeled data to build a classification model
Predicts the labels of future instances using the model
Helps decision making

[Figure: network traffic flows into a classification model; attack traffic is sent to the firewall to be blocked and quarantined, benign traffic proceeds to the server; expert analysis and labeling feed model updates back into the model.]

Page 5: Data Stream Classification and Novel Class Detection

Data Stream Classification (cont.)

What are the applications?
◦ Security monitoring
◦ Network monitoring and traffic engineering
◦ Business: credit card transaction flows
◦ Telecommunication calling records
◦ Web logs and web page click streams

Page 6: Data Stream Classification and Novel Class Detection

Challenges

Infinite length
Concept-drift
Concept-evolution
Feature-evolution

Page 7: Data Stream Classification and Novel Class Detection

Infinite Length

Impractical to store and use all historical data
◦ Requires infinite storage
◦ and infinite running time

Page 8: Data Stream Classification and Novel Class Detection

Concept-Drift

[Figure: a data chunk of positive and negative instances; the decision hyperplane shifts from its previous position to the current one, and instances falling between the two hyperplanes are victims of concept-drift.]

Page 9: Data Stream Classification and Novel Class Detection

Concept-Evolution

[Figure: a feature space partitioned by thresholds x1, y1, y2 into regions A, B, C, D, populated by existing classes + and −; a novel class (denoted by x) then arrives inside this space.]

Classification rules:
R1. if (x > x1 and y < y2) or (x < x1 and y < y1) then class = +
R2. if (x > x1 and y > y2) or (x < x1 and y > y1) then class = −

Existing classification models misclassify novel class instances.

Page 10: Data Stream Classification and Novel Class Detection

Dynamic Features

Why do new features evolve?
◦ Infinite data stream
  Normally, the global feature set is unknown
  New features may appear
◦ Concept drift
  As concepts drift, new features may appear
◦ Concept evolution
  A new class normally brings a new set of features

Different chunks may have different feature sets.

Page 11: Data Stream Classification and Novel Class Detection

Dynamic Features

Existing classification models require a complete, fixed feature set applied to all chunks. Global features are difficult to predict. One solution is to use all English words to generate the feature vector, but the dimension of that vector would be far too high.

[Figure: pipeline from feature extraction & selection over the ith and (i+1)st chunks, through feature space conversion, to training a new model and classification & novel class detection; the ith chunk, the (i+1)st chunk, and the models have different feature sets, e.g. {runway, climb}, {runway, clear, ramp}, {runway, ground, ramp}.]

Page 12: Data Stream Classification and Novel Class Detection

Outline of the Presentation

Introduction
Data Stream Classification
Novel Class Detection

Page 13: Data Stream Classification and Novel Class Detection

Data Stream Classification (cont.)

Single-model incremental classification
Ensemble (model-based) classification
◦ Supervised
◦ Semi-supervised
◦ Active learning

Page 14: Data Stream Classification and Novel Class Detection

Overview

Single-model incremental classification
Ensemble (model-based) classification
◦ Data selection
◦ Semi-supervised
◦ Skewed data

Page 15: Data Stream Classification and Novel Class Detection

Ensemble of Classifiers

[Figure: an unlabeled input (x, ?) is classified by individual classifiers C1, C2, C3, whose outputs (+, +, −) are combined by voting into the ensemble output +.]

Page 16: Data Stream Classification and Novel Class Detection

Ensemble Classification of Data Streams

Divide the data stream into equal-sized chunks
◦ Train a classifier from each data chunk
◦ Keep the best L such classifiers as the ensemble
◦ Example: for L = 3

[Figure: labeled chunks D1, D2, D3 train classifiers C1, C2, C3, which form the ensemble used to predict the unlabeled chunk D4; once D4 is labeled it trains C4, which can replace a weaker ensemble member, and the process continues with D5, D6, ...]

Addresses infinite length and concept-drift.
Note: Di may contain data points from different classes. (A code sketch follows below.)
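To make the chunked-ensemble loop concrete, here is a minimal Java sketch, assuming a generic Classifier interface (hypothetical names; the deck says the authors implemented in Java, but this is not their code): train a model per labeled chunk, keep the best L by accuracy on the latest chunk, and classify by majority vote.

```java
import java.util.*;

/** Minimal sketch of chunk-based ensemble classification. */
interface Classifier {
    void train(List<double[]> chunk, List<Integer> labels);
    int predict(double[] x);
    double accuracyOn(List<double[]> chunk, List<Integer> labels);
}

class ChunkEnsemble {
    private final int L;                              // ensemble size
    private final List<Classifier> ensemble = new ArrayList<>();

    ChunkEnsemble(int L) { this.L = L; }

    /** Train a new model on the latest labeled chunk and keep the best L
     *  models, evaluated on that same chunk (a common heuristic). */
    void update(Classifier fresh, List<double[]> chunk, List<Integer> labels) {
        fresh.train(chunk, labels);
        ensemble.add(fresh);
        ensemble.sort(Comparator.comparingDouble(
                c -> -c.accuracyOn(chunk, labels)));  // best first
        while (ensemble.size() > L) ensemble.remove(ensemble.size() - 1);
    }

    /** Classify by majority vote of the ensemble members. */
    int classify(double[] x) {
        Map<Integer, Integer> votes = new HashMap<>();
        for (Classifier c : ensemble)
            votes.merge(c.predict(x), 1, Integer::sum);
        return votes.entrySet().stream()
                .max(Map.Entry.comparingByValue()).get().getKey();
    }
}
```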

Page 17: Data Stream Classification and Novel Class Detection

Concept-Evolution Problem

A completely new class of data arrives in the stream.

[Figure: (a) a decision tree that first tests x < x1, then y < y1 (true branch) or y < y2 (false branch), labeling the leaves + and −; (b) the corresponding feature space partitioning into regions A, B, C, D; (c) a novel class (denoted by x) arrives in the stream.]

ECSMiner

Page 18: Data Stream Classification and Novel Class Detection

ECSMiner: Overview

[Figure: overview of the ECSMiner algorithm. Older (labeled) instances from the last labeled chunk are used for training, producing a new model that updates the ensemble of L models M1, M2, ..., ML. Each newly arrived (unlabeled) instance xnow goes through outlier detection: if it is not an outlier it is classified immediately; otherwise it is buffered for novel class detection.]

Based on: Mohammad M. Masud, Jing Gao, Latifur Khan, Jiawei Han, and Bhavani Thuraisingham. "Integrating Novel Class Detection with Classification for Concept-Drifting Data Streams". In Proc. 2009 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD '09), Bled, Slovenia, Sept. 7-11, 2009, pp. 79-94 (extended version in IEEE Transactions on Knowledge and Data Engineering (TKDE)).

Page 19: Data Stream Classification and Novel Class Detection

Algorithm

◦ Training
◦ Novel class detection and classification

(Pseudocode shown on slide.)

ECSMiner

Page 20: Data Stream Classification and Novel Class Detection

Novel Class Detection

Non-parametric: does not assume any underlying model of the existing classes.

Steps:
1. Create and save a decision boundary during training
2. Detect and filter outliers
3. Measure cohesion and separation among test and training instances

ECSMiner

Page 21: Data Stream Classification and Novel Class Detection

Training: Creating Decision Boundary

[Figure: raw training data (+ and − instances in regions A, B, C, D delimited by x1, y1, y2) are clustered; each cluster is summarized as a pseudopoint, and the pseudopoints form the decision boundary.]

Addresses the infinite length problem. (A code sketch follows below.)

ECSMiner
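A minimal sketch of how a training chunk can be summarized as pseudopoints, assuming plain k-means and hypothetical class names (not the authors' exact implementation): each cluster is reduced to a centroid, radius, and count, so the raw data can be discarded.

```java
import java.util.*;

/** Summarize a training chunk as K pseudopoints via plain k-means. */
class Pseudopoint {
    final double[] centroid; final double radius; final int count;
    Pseudopoint(double[] c, double r, int n) { centroid = c; radius = r; count = n; }
}

class DecisionBoundary {
    static double dist(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(s);
    }

    static List<Pseudopoint> build(List<double[]> chunk, int K, int iters) {
        int dim = chunk.get(0).length, n = chunk.size();   // assumes non-empty chunk
        double[][] centers = new double[K][];
        Random rnd = new Random(42);
        for (int k = 0; k < K; k++) centers[k] = chunk.get(rnd.nextInt(n)).clone();
        int[] assign = new int[n];
        for (int it = 0; it < iters; it++) {
            for (int i = 0; i < n; i++) {                  // assignment step
                double best = Double.MAX_VALUE;
                for (int k = 0; k < K; k++) {
                    double d = dist(chunk.get(i), centers[k]);
                    if (d < best) { best = d; assign[i] = k; }
                }
            }
            double[][] sum = new double[K][dim]; int[] cnt = new int[K];
            for (int i = 0; i < n; i++) {                  // update step
                cnt[assign[i]]++;
                for (int j = 0; j < dim; j++) sum[assign[i]][j] += chunk.get(i)[j];
            }
            for (int k = 0; k < K; k++)
                if (cnt[k] > 0)
                    for (int j = 0; j < dim; j++) centers[k][j] = sum[k][j] / cnt[k];
        }
        List<Pseudopoint> boundary = new ArrayList<>();
        for (int k = 0; k < K; k++) {                      // cluster summaries
            double r = 0; int c = 0;
            for (int i = 0; i < n; i++)
                if (assign[i] == k) { r = Math.max(r, dist(chunk.get(i), centers[k])); c++; }
            if (c > 0) boundary.add(new Pseudopoint(centers[k], r, c));
        }
        return boundary;  // raw data can now be discarded: addresses infinite length
    }
}
```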

Page 22: Data Stream Classification and Novel Class Detection

Outlier Detection and Filtering

[Figure: a test instance inside the decision boundary (regions A, B, C, D) is not an outlier; a test instance outside the decision boundary is a raw outlier, or Routlier.]

A test instance x is checked against the ensemble of L models M1, M2, ..., ML. If x is an Routlier for ALL models (logical AND), it is a filtered outlier (Foutlier), a potential novel class instance; otherwise it is an existing class instance.

Routliers may appear as a result of a novel class, concept-drift, or noise. Therefore they are filtered to reduce noise as much as possible. (A code sketch follows below.)

ECSMiner
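Continuing the sketch above (reusing the Pseudopoint class and DecisionBoundary.dist), the outlier filter might look like this: an instance is a raw outlier (Routlier) for one model if it falls outside every pseudopoint's radius, and a filtered outlier (Foutlier) only if all L models agree.

```java
import java.util.List;

/** Routlier / Foutlier checks over an ensemble of pseudopoint models. */
class OutlierFilter {
    static boolean isRoutlier(List<Pseudopoint> model, double[] x) {
        for (Pseudopoint p : model)
            if (DecisionBoundary.dist(x, p.centroid) <= p.radius)
                return false;                      // inside the decision boundary
        return true;                               // outside every pseudopoint
    }

    static boolean isFoutlier(List<List<Pseudopoint>> ensemble, double[] x) {
        for (List<Pseudopoint> model : ensemble)
            if (!isRoutlier(model, x)) return false;   // some model claims x
        return true;   // potential novel class instance: buffer it
    }
}
```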

Page 23: Data Stream Classification and Novel Class Detection

Novel Class Detection

[Flowchart: (Step 1) a test instance x is checked against the ensemble of L models M1, M2, ..., ML; if it is an Routlier for all models (AND), it is a filtered outlier (Foutlier), otherwise an existing class instance. (Step 2) Foutliers are buffered. (Step 3) q-NSC is computed with all models and the other Foutliers. (Step 4) If q-NSC > 0 for more than q Foutliers (q' > q) with all models, a novel class is found; otherwise the instances are treated as existing class.]

ECSMiner

Page 24: Data Stream Classification and Novel Class Detection

Computing Cohesion & Separation

Let λo,q(x) be the q nearest Foutlier neighbors of an Foutlier x, and λc,q(x) the q nearest existing class instances of class c (e.g., λ+,5(x) and λ−,5(x) in the figure).

a(x) = mean distance from an Foutlier x to the instances in λo,q(x)
bc(x) = mean distance from x to the instances in λc,q(x)
bmin(x) = minimum among all bc(x) (e.g., b+(x) in the figure)

q-Neighborhood Silhouette Coefficient (q-NSC):

q-NSC(x) = (bmin(x) − a(x)) / max(bmin(x), a(x))

If q-NSC(x) is positive, x is closer to the Foutliers than to any existing class. (A code sketch follows below.)

[Figure: an Foutlier x with its neighborhoods λo,5(x), λ+,5(x), λ−,5(x) and the corresponding mean distances a(x), b+(x), b−(x).]

ECSMiner
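A minimal sketch of the q-NSC computation as defined above (hypothetical helper names, reusing DecisionBoundary.dist from the earlier sketch; it assumes x itself is not contained in the Foutlier list):

```java
import java.util.*;

/** q-NSC for one Foutlier: cohesion a(x) to the q nearest fellow Foutliers
 *  vs. separation b_min(x), the smallest mean distance to the q nearest
 *  instances of any existing class. */
class Cohesion {
    static double qNSC(double[] x, List<double[]> foutliers,
                       Map<String, List<double[]>> classInstances, int q) {
        double a = meanNearestDist(x, foutliers, q);
        double bMin = Double.MAX_VALUE;
        for (List<double[]> cls : classInstances.values())
            bMin = Math.min(bMin, meanNearestDist(x, cls, q));
        return (bMin - a) / Math.max(bMin, a);  // positive: closer to Foutliers
    }

    static double meanNearestDist(double[] x, List<double[]> pts, int q) {
        double[] d = pts.stream()
                .mapToDouble(p -> DecisionBoundary.dist(x, p)).sorted().toArray();
        double s = 0; int m = Math.min(q, d.length);
        for (int i = 0; i < m; i++) s += d[i];   // average over q nearest
        return s / m;
    }
}
```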

Page 25: Data Stream Classification and Novel Class Detection

Speeding Up

Computing q-NSC for every Foutlier instance takes time quadratic in the number of Foutliers. To make the computation faster, we:
◦ create Ko pseudopoints (Fpseudopoints) from the Foutliers using K-means clustering, where Ko = (No / S) * K; here S is the chunk size, K the number of pseudopoints per chunk, and No the number of Foutliers
◦ perform the computations on the Fpseudopoints instead

Thus the time complexity
◦ to compute the q-NSC of all the Fpseudopoints is O(Ko(Ko + K)),
◦ which is constant, since both Ko and K are independent of the input size.
◦ However, by gaining speed we lose some precision, although the loss is negligible (analyzed shortly).

Page 26: Data Stream Classification and Novel Class Detection

Algorithm To Detect Novel Class

(Pseudocode shown on slide.)

ECSMiner

Page 27: Data Stream Classification and Novel Class Detection

"Speedup" Penalty

As discussed earlier, by speeding up the computation in step 3 we lose some precision, since the result deviates from the exact result. This analysis shows that the deviation is negligible.

[Figure: illustrating the computation of deviation using the squared distances (i−j)², (x−j)², and (x−i)². Here i is an Fpseudopoint, i.e., a cluster of Foutliers, and j is an existing class pseudopoint, i.e., a cluster of existing class instances. In this particular example, all instances in i belong to a novel class.]

Page 28: Data Stream Classification and Novel Class Detection

"Speedup" Penalty

Approximate: (equation on slide)
Exact: (equation on slide)
Deviation: (equation on slide)

Page 29: Data Stream Classification and Novel Class Detection

Experiments - Datasets

We evaluated our approach on two synthetic and two real datasets:
• SynC - synthetic data with only concept-drift, generated using a hyperplane equation; 2 classes, 10 attributes, 250K instances
• SynCN - synthetic data with concept-drift and novel classes, generated using Gaussian distributions; 20 classes, 40 attributes, 400K instances
• KDD Cup 1999 intrusion detection (10% version) - real dataset; 23 classes, 34 attributes, 490K instances
• Forest Cover - real dataset; 7 classes, 54 attributes, 581K instances

Page 30: Data Stream Classification and Novel Class Detection

Experiments - Setup

Development:
◦ Language: Java

H/W:
◦ Intel P-IV with
◦ 2GB memory and
◦ 3GHz dual-processor CPU

Parameter settings:
◦ K (number of pseudopoints per chunk) = 50
◦ N (minimum number of instances required to declare a novel class) = 50
◦ M (ensemble size) = 6
◦ S (chunk size) = 2,000

Page 31: Data Stream Classification and Novel Class Detection

Experiments - Baseline

Competing approaches:
◦ i) MineClass (MC): our approach
◦ ii) WCE-OLINDDA_Parallel (W-OP)
◦ iii) WCE-OLINDDA_Single (W-OS)

WCE-OLINDDA is a combination of the Weighted Classifier Ensemble (WCE) and the novel class detector OLINDDA, with default parameter settings for both. We use this combination because, to the best of our knowledge, no other approach can classify and detect novel classes simultaneously.

OLINDDA assumes there is only one normal class and that all other classes are novel. Therefore we apply two variations:
◦ W-OP keeps parallel OLINDDA models, one for each class
◦ W-OS keeps a single model that absorbs a novel class when encountered

Page 32: Data Stream Classification and Novel Class Detection

Experiments - Results

Evaluation metrics:
◦ Mnew = % of novel class instances Misclassified as existing class = Fn * 100 / Nc
◦ Fnew = % of existing class instances Falsely identified as novel class = Fp * 100 / (N − Nc)
◦ ERR = total misclassification error (%), including Mnew and Fnew = (Fp + Fn + Fe) * 100 / N

where
◦ Fn = total novel class instances misclassified as existing class,
◦ Fp = total existing class instances misclassified as novel class,
◦ Fe = total existing class instances misclassified (other than Fp),
◦ Nc = total novel class instances in the stream,
◦ N = total instances in the stream.

Page 33: Data Stream Classification and Novel Class Detection

Experiments - Results

[Charts: results on Forest Cover, KDD Cup, and SynCN.]

Page 34: Data Stream Classification and Novel Class Detection

Experiments - Results

[Charts shown on slide.]

Page 35: Data Stream Classification and Novel Class Detection

Experiments - Parameter Sensitivity

[Parameter sensitivity charts shown on slide.]

Page 36: Data Stream Classification and Novel Class Detection

Experiments - Runtime

[Runtime comparison charts shown on slide.]

Page 37: Data Stream Classification and Novel Class Detection

Dynamic Features

Solution:
◦ Global features
◦ Local features
◦ Union

Mohammad Masud, Qing Chen, Latifur Khan, Jing Gao, Jiawei Han, and Bhavani Thuraisingham, "Classification and Novel Class Detection of Data Streams in a Dynamic Feature Space," in Proc. of Machine Learning and Knowledge Discovery in Databases, European Conference, ECML PKDD 2010, Barcelona, Spain, Sept. 2010, Springer, pp. 337-352.

Page 38: Data Stream Classification and Novel Class Detection

Feature Mapping Across Models and Test Data Points

The feature set varies across chunks. In particular, when a new class appears, new features should be selected and added to the feature set.

Strategy 1 - Lossy fixed (Lossy-F) conversion / Global
◦ Use the same fixed feature set for the entire stream.
◦ We call this a lossy conversion because future models and instances may lose important features due to this mapping.

Strategy 2 - Lossy local (Lossy-L) conversion / Local
◦ We call this a lossy conversion because it may lose feature values during mapping.

Strategy 3 - Dimension preserving (D-Preserving) mapping / Union

Page 39: Data Stream Classification and Novel Class Detection

Feature Space Conversion - Lossy-L Mapping (Local)

Assume each data chunk has a different feature vector.
When a classification model is trained, we save the feature vector with the model.
When an instance is tested, its feature vector is mapped (i.e., projected) onto the model's feature vector.

Page 40: Data Stream Classification and Novel Class Detection

Feature Space Conversion - Lossy-L Mapping

For example:
◦ Suppose the model has two features (x, y)
◦ The instance has two features (y, z)
◦ When testing, the instance is treated as having the model's two features (x, y),
◦ where x = 0 and the y value is kept as it is. (A code sketch follows below.)
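A minimal sketch of the Lossy-L projection, assuming instances arrive as feature-name to value maps (a hypothetical representation):

```java
import java.util.*;

/** Lossy-L: project a test instance onto the model's saved feature list.
 *  Features the model does not know are dropped; features the instance
 *  lacks become 0. */
class LossyL {
    static double[] map(Map<String, Double> instance, List<String> modelFeatures) {
        double[] mapped = new double[modelFeatures.size()];
        for (int i = 0; i < modelFeatures.size(); i++)
            mapped[i] = instance.getOrDefault(modelFeatures.get(i), 0.0);
        return mapped;
    }
}

// Example: model features (x, y); instance has (y, z).
// map({y=2.0, z=5.0}, [x, y]) -> [0.0, 2.0]; the instance's z is lost.
```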

Page 41: Data Stream Classification and Novel Class Detection

Conversion Strategy II - Lossy-L Mapping

Graphically: [figure showing the projection of the instance's feature vector onto the model's feature vector.]

Page 42: Data Stream Classification and Novel Class Detection

Conversion Strategy III - D-Preserving Mapping

When an instance is tested, both the model's feature vector and the instance's feature vector are mapped (i.e., projected) onto the union of their feature vectors.
◦ The feature dimension is increased.
◦ In the mapping, the features of both the test instance and the model are preserved; the extra features are filled with 0s.

Page 43: Data Stream Classification and Novel Class Detection

Conversion Strategy III - D-Preserving Mapping

For example:
◦ Suppose the model has three features (a, b, c)
◦ The instance has four features (b, c, d, e)
◦ When testing, we project both the model's feature vector and the instance's feature vector onto (a, b, c, d, e)
◦ Therefore, in the model, d and e are treated as 0s, and in the instance, a is treated as 0. (A code sketch follows below.)
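A minimal sketch of the D-Preserving (union) mapping under the same map-based representation assumed earlier:

```java
import java.util.*;

/** D-Preserving: project both the model's and the instance's feature
 *  vectors onto the union of their feature sets; missing dimensions on
 *  either side become 0. */
class DPreserving {
    static List<String> unionFeatures(List<String> model, Set<String> instance) {
        LinkedHashSet<String> union = new LinkedHashSet<>(model);
        union.addAll(instance);              // preserves both feature sets
        return new ArrayList<>(union);
    }

    static double[] project(Map<String, Double> values, List<String> union) {
        double[] v = new double[union.size()];
        for (int i = 0; i < union.size(); i++)
            v[i] = values.getOrDefault(union.get(i), 0.0);  // absent -> 0
        return v;
    }
}

// Example: model (a, b, c), instance (b, c, d, e) -> union (a, b, c, d, e);
// the model's d and e become 0s, and the instance's a becomes 0.
```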

Page 44: Data Stream Classification and Novel Class Detection

Conversion Strategy III - D-Preserving Mapping

[Figure: the previous example shown graphically.]

Page 45: Data Stream Classification and Novel Class Detection

Discussion

Local does not favor the novel class; it favors existing classes.
◦ Local features are enough to model existing classes.

Union favors the novel class.
◦ New features may be discriminating for the novel class, hence Union works.

Page 46: Data Stream Classification and Novel Class Detection

Comparison

Which strategy is better?

Assumption: the lossless conversion (union) preserves the properties of a novel class. In other words, if an instance belongs to a novel class, it remains outside the decision boundary of every model Mi of the ensemble M in the converted feature space.

Lemma: If a test point x belongs to a novel class, it will be misclassified by the ensemble M as an existing class instance under certain conditions when the Lossy-L conversion is used.

Page 47: Data Stream Classification and Novel Class Detection

Comparison

Proof:
Let X1, ..., XL, XL+1, ..., XM be the dimensions of the model, and
let X1, ..., XL, XM+1, ..., XN be the dimensions of the test point.
Suppose the radius of the closest cluster (in the higher dimension) is R.
Also, let the test point be a novel class instance.
Combined feature space = X1, ..., XL, XL+1, ..., XM, XM+1, ..., XN

Page 48: Data Stream Classification and Novel Class Detection

Comparison

Proof (continued):
Combined feature space = X1, ..., XL, XL+1, ..., XM, XM+1, ..., XN
Centroid of the cluster (original space): X1 = x1, ..., XL = xL, XL+1 = xL+1, ..., XM = xM, i.e., (x1, ..., xL, xL+1, ..., xM)
Centroid of the cluster (combined space): (x1, ..., xL, xL+1, ..., xM, 0, ..., 0)
Test point (original space): X1 = x'1, ..., XL = x'L, XM+1 = x'M+1, ..., XN = x'N, i.e., (x'1, ..., x'L, x'M+1, ..., x'N)
Test point (combined space): (x'1, ..., x'L, 0, ..., 0, x'M+1, ..., x'N)

Page 49: Data Stream Classification and Novel Class Detection

Comparison

Proof (continued):
Centroid (combined space): (x1, ..., xL, xL+1, ..., xM, 0, ..., 0)
Test point (combined space): (x'1, ..., x'L, 0, ..., 0, x'M+1, ..., x'N)

Since the test point is a novel class instance, it lies outside the closest cluster in the combined space:

R² < ((x1 − x'1)² + ... + (xL − x'L)² + x²L+1 + ... + x²M) + (x'²M+1 + ... + x'²N)

Call the first parenthesized term a² (the squared distance that remains under the Lossy-L conversion) and the second term b². Then:

R² < a² + b²
R² = a² + b² − e²  (for some e² > 0)
a² = R² + (e² − b²)
a² < R²  (provided that e² < b²)

Therefore, under the Lossy-L conversion the test point will not be an outlier, and hence is misclassified as an existing class instance.

Page 50: Data Stream Classification and Novel Class Detection

Baseline Approaches

WCE is the Weighted Classifier Ensemble [1], a multi-class ensemble classifier.
OLINDDA is a novel class detector [2] that works only for binary classes.
FAE is an ensemble classifier that addresses feature evolution [3] and concept drift.
ECSMiner is a multi-class ensemble classifier that addresses concept drift and concept evolution [4].

Page 51: Data Stream Classification and Novel Class Detection

Approaches Comparison

Proposed techniques vs. challenges:

Technique   Infinite length   Concept-drift   Concept-evolution   Dynamic features
OLINDDA     -                 -               ✓ (binary only)     -
WCE         ✓                 ✓               -                   -
FAE         ✓                 ✓               -                   ✓
ECSMiner    ✓                 ✓               ✓                   -
DXMiner     ✓                 ✓               ✓                   ✓

Page 52: Data Stream Classification and Novel Class Detection

Experiments: Datasets

We evaluated our approach on different datasets:

Data Set      Concept Drift   Concept Evolution   Dynamic Feature   # of Instances   # of Classes
KDD           ✓               ✓                   -                 492K             7
Forest Cover  ✓               ✓                   -                 387K             7
NASA          ✓               ✓                   ✓                 140K             21
Twitter       ✓               ✓                   ✓                 335K             21

Page 53: Data Stream Classification and Novel Class Detection

Experiments: Results

Evaluation metrics: let
◦ Fn = total novel class instances misclassified as existing class,
◦ Fp = total existing class instances misclassified as novel class,
◦ Fe = total existing class instances misclassified (other than Fp),
◦ Nc = total novel class instances in the stream,
◦ N = total instances in the stream

Page 54: Data Stream Classification and Novel Class Detection

Experiments: Results

We use the following performance metrics to evaluate our technique:
◦ Mnew = % of novel class instances Misclassified as existing class, i.e., Fn * 100 / Nc
◦ Fnew = % of existing class instances Falsely identified as novel class, i.e., Fp * 100 / (N − Nc)
◦ ERR = total misclassification error (%), including Mnew and Fnew, i.e., (Fp + Fn + Fe) * 100 / N

Page 55: Data Stream Classification and Novel Class Detection

Experiments: Setup

Development:
◦ Language: Java

H/W:
◦ Intel P-IV with
◦ 3GB memory and
◦ 3GHz dual-processor CPU

Parameter settings:
◦ K (number of pseudopoints per chunk) = 50
◦ q (minimum number of instances required to declare a novel class) = 50
◦ L (ensemble size) = 6
◦ S (chunk size) = 1,000

Page 56: Data Stream Classification and Novel Class Detection

Experiments: Baseline

Competing approaches:
◦ i) DXMiner (DXM): our approach, with the following variations:
  Lossy-F conversion
  Lossy-L conversion
  D-Preserving conversion
◦ ii) FAE-WCE-OLINDDA_Parallel (W-OP)
  Assumes there is only one normal class and that all other classes are novel. W-OP keeps parallel OLINDDA models, one for each class.
  We use this combination because, to the best of our knowledge, no other approach can classify and detect novel classes simultaneously with feature evolution.
◦ iii) FAE-ECSMiner

Page 57: Data Stream Classification and Novel Class Detection

Twitter Results

[Results chart shown on slide.]

Page 58: Data Stream Classification and Novel Class Detection

Twitter Results

Method   D-Preserving   Lossy-Local   Lossy-Global   O-F
AUC      0.88           0.83          0.76           0.56

Page 59: Data Stream Classification and Novel Class Detection

NASA Dataset

Method   Deviation   Info Gain   O-F
AUC      0.996       0.967       0.876

Page 60: Data Stream Classification and Novel Class Detection

Forest Cover Results

[Results chart shown on slide.]

Page 61: Data Stream Classification and Novel Class Detection

Forest Cover Results

Method   D-Preserving   O-F
AUC      0.97           0.74

Page 62: Data Stream Classification and Novel Class Detection

KDD Results

[Results chart shown on slide.]

Page 63: Data Stream Classification and Novel Class Detection

KDD Results

Method   D-Preserving   FAE-Olindda
AUC      0.98           0.96

Page 64: Data Stream Classification and Novel Class Detection

Summary Results

[Summary table shown on slide.]

Page 65: Data Stream Classification and Novel Class Detection

Improved Outlier Detection and Multiple Novel Class Detection

Challenges
◦ High false positive (FP) rates (existing classes detected as novel) and false negative (FN) rates (missed novel classes)
◦ Two or more novel classes arriving at a time

Solutions [1]
◦ Dynamic decision boundary, based on previous mistakes: inflate the decision boundary if FP is high, deflate it if FN is high
◦ Build a statistical model to filter noisy data and concept drift out of the outliers
◦ Multiple novel classes are detected by:
  constructing a graph in which each outlier cluster is a vertex,
  merging the vertices based on the silhouette coefficient, and
  counting the number of connected components in the resultant (i.e., merged) graph

[1] Mohammad M. Masud, Qing Chen, Jing Gao, Latifur Khan, Charu Aggarwal, Jiawei Han, and Bhavani Thuraisingham, "Addressing Concept-Evolution in Concept-Drifting Data Streams," in Proc. ICDM '10, Sydney, Australia, Dec. 14-17, 2010.

Proposed Methods

Page 66: Data Stream Classification and Novel Class Detection

Outlier Threshold (OUTTH)

To declare a test instance an outlier, using the cluster radius r alone is not enough, because of noise in the data.

[Figure: an instance x just beyond the radius of a model cluster, with its neighborhoods λo,5(x) and λ+,5(x) and distances a(x), b+(x).]

◦ So, beyond the radius r, a threshold (OUTTH) is set up so that most noisy data around a model cluster is classified immediately.

Proposed Methods

Page 67: Data Stream Classification and Novel Class Detection

Outlier Threshold (OUTTH)

Every instance outside the cluster range has a weight,

wt(x) = exp(r − b(x)),

where b(x) is the distance from x to the center of the nearest model cluster and r is that cluster's radius.
◦ If wt(x) >= OUTTH, the instance is considered an existing class instance.
◦ If wt(x) < OUTTH, the instance is an outlier.

Pros:
◦ Noisy data is classified immediately

Cons:
◦ OUTTH is hard to determine
  Noisy data and novel class instances may occur simultaneously
  Different datasets may need different OUTTH values

(A code sketch follows below.)

Proposed Methods
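A small sketch of the OUTTH test, assuming the exponential weight form wt(x) = exp(r − b(x)) reconstructed above (the exact form of the weight in the authors' paper may differ):

```java
/** OUTTH test sketch: wt equals 1 on the cluster boundary and decays as
 *  the instance moves farther outside the radius r; b is the distance
 *  from the instance to the cluster center. */
class OutthTest {
    static boolean isExistingClass(double b, double r, double outth) {
        double wt = Math.exp(r - b);   // assumed weight form, see above
        return wt >= outth;            // below OUTTH: treat as outlier
    }
}
```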

Page 68: Data Stream Classification and Novel Class Detection

Outlier Threshold (OUTTH)

If the threshold is too high, noisy data may become outliers
◦ the FP rate will go up
If the threshold is too low, novel class instances will be labeled as existing class
◦ the FN rate will go up

[Figure: an instance x near the cluster boundary; where should OUTTH be set?]

We need to balance these two.

Proposed Methods

Page 69: Data Stream Classification and Novel Class Detection

Introduction
Data Stream Classification
Clustering
Novel Class Detection
• Finer Grain Novel Class Detection
• Dynamic Novel Class Detection
• Multiple Novel Class Detection

Page 70: Data Stream Classification and Novel Class Detection

Dynamic Threshold Setting

Proposed Methods

◦ Defer approach
  After a testing chunk has been labeled, update the OUTTH based on the marginal FP and FN rates of this testing chunk, and then apply the new OUTTH to the next testing chunk.
◦ Eager approach
  Once a marginal FP or marginal FN instance is detected, update OUTTH with a step function and apply the updated OUTTH to the next testing instance.

[Figure: instances near the OUTTH boundary illustrating marginal FP and marginal FN.]

Page 71: Data Stream Classification and Novel Class Detection

Dynamic Threshold Setting

[Figure shown on slide.]

Proposed Methods

Page 72: Data Stream Classification and Novel Class Detection

Defer Approach and Eager Approach Comparison

In the Defer approach, OUTTH updates only after a data chunk is labeled:
◦ Too late - within the testing chunk, many marginal FPs or FNs may occur due to an improper OUTTH threshold
◦ Overreact - if there are many marginal FP or FN instances in the labeled testing chunk, the OUTTH update may overreact for the next testing chunk

In the Eager approach, OUTTH updates aggressively whenever a marginal FP or FN happens:
◦ The model is more tolerant to noisy data and concept drift
◦ The model is more sensitive to novel class instances

Proposed Methods

Page 73: Data Stream Classification and Novel Class Detection

Outlier Statistics

For each outlier instance, we calculate the novelty probability Pnov:
◦ If Pnov is large (close to 1), the outlier has a high probability of being a novel class instance.

Pnov has two parts:
◦ The first part measures how far the outlier is from the model cluster
◦ The second part, Psc, is the silhouette coefficient, which measures the cohesion and separation of the q-neighbors of the outlier with respect to the model cluster

Pnov(x) = (1 − wt(x)) / (1 − min(wt(x))) · Psc

(A code sketch follows below.)

Proposed Methods
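A one-line sketch of the reconstructed Pnov formula (the weight terms are assumed to come from the OUTTH sketch above, and the formula itself is reconstructed from the garbled slide):

```java
/** Novelty probability of one outlier: the distance part, normalized by
 *  the smallest weight seen among the outliers, times the silhouette
 *  coefficient Psc of the outlier's q-neighbors. */
class NoveltyProbability {
    static double pnov(double wtX, double wtMin, double psc) {
        return (1.0 - wtX) / (1.0 - wtMin) * psc;  // close to 1: likely novel
    }
}
```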

Page 74: Data Stream Classification and Novel Class Detection

Outlier Statistics

Three scenarios may occur simultaneously:
• Noise data
• Concept drift
• Novel class

Proposed Methods

Page 75: Data Stream Classification and Novel Class Detection

Outlier Statistics: Gini Analysis

The Gini coefficient is a measure of statistical inequality. The discrete Gini coefficient is:

G(s) = (1/n) * ( n + 1 − 2 * Σ_{i=1..n} (n + 1 − i) y_i / Σ_{i=1..n} y_i )

If we divide [0, 1] into n equal-sized bins and put every outlier's Pnov into its corresponding bin, we obtain the cumulative distribution y_i:
◦ If all Pnov are very low, in the extreme the cdf is y_i = 1 for all i, and
  G(s) = (1/n) * ( n + 1 − 2 * (n(n+1)/2) / n ) = 0
◦ If all Pnov are very high, in the extreme y_i = 0 for all i except y_n = 1, and
  G(s) = (1/n) * ( n + 1 − 2 ) = (n − 1)/n

Proposed Methods

Page 76: Data Stream Classification and Novel Class Detection

Outlier Statistics: Gini Analysis

◦ If the outliers' Pnov values are distributed evenly, y_i = i/n, and

G(s) = (1/n) * ( n + 1 − 2 * Σ_{i=1..n} (n + 1 − i)(i/n) / Σ_{i=1..n} (i/n) )
     = (1/n) * ( n + 1 − 2(n + 2)/3 )
     = (n − 1) / (3n)

After obtaining the outliers' Pnov distribution, calculate G(s):
◦ If G(s) > (n − 1)/(3n), declare a novel class
◦ If G(s) <= (n − 1)/(3n), classify the outliers as existing class instances

When n → ∞, (n − 1)/(3n) → 1/3 ≈ 0.33.

(A code sketch of the test follows below.)

Proposed Methods
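A minimal sketch of the Gini test on the outliers' Pnov values, binning [0, 1] into n equal bins and applying the discrete formula above (a hypothetical helper; assumes a non-empty Pnov array):

```java
/** Discrete Gini coefficient of the outliers' Pnov values. */
class GiniTest {
    static double giniOfPnov(double[] pnov, int n) {
        int[] hist = new int[n];
        for (double p : pnov)
            hist[Math.min((int) (p * n), n - 1)]++;   // bin index for Pnov
        double[] y = new double[n];
        double running = 0;
        for (int i = 0; i < n; i++) {                 // cdf over the bins
            running += hist[i];
            y[i] = running / pnov.length;
        }
        double num = 0, den = 0;
        for (int i = 1; i <= n; i++) {
            num += (n + 1 - i) * y[i - 1];
            den += y[i - 1];
        }
        return (n + 1 - 2 * num / den) / n;
    }

    /** G(s) above the threshold (n-1)/(3n) suggests a novel class. */
    static boolean novelClassDetected(double[] pnov, int n) {
        return giniOfPnov(pnov, n) > (n - 1.0) / (3.0 * n);
    }
}
```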

Page 77: Data Stream Classification and Novel Class Detection

Outlier Statistics: Gini Analysis Limitation

◦ In the extreme, it is impossible to differentiate concept drift from concept evolution using the Gini coefficient, when the concept drift simply "looks like" concept evolution.

Proposed Methods

Page 78: Data Stream Classification and Novel Class Detection

Introduction
Data Stream Classification
Clustering
Novel Class Detection
• Finer Grain Novel Class Detection
• Dynamic Novel Class Detection
• Multiple Novel Class Detection

Page 79: Data Stream Classification and Novel Class Detection

Multi Novel Class Detection

Proposed Methods

[Figure: a data stream whose feature space contains positive and negative instances plus two groups of novel instances, novel class A and novel class B.]

If we always assume the novel instances belong to one novel class, one of the two types of novel instances, either A or B, will be misclassified.

Page 80: Data Stream Classification and Novel Class Detection

Multi Novel Class Detection

Proposed Methods

The main idea in detecting multiple novel classes is to construct a graph and identify the connected components in the graph. The number of connected components determines the number of novel classes.

Page 81: Data Stream Classification and Novel Class Detection

Multi Novel Class Detection

Proposed Methods

Two phases:
◦ Building the connected graph
  Build a directed nearest-neighbor graph: from each vertex (outlier cluster), add an edge to its nearest neighbor.
  If the silhouette coefficient from a vertex to its nearest neighbor is larger than some threshold, the edge is removed.
  Problem: linkage circles
◦ Component merging phase
  Gaussian-distribution-centric decision

(A code sketch of the graph construction and component counting follows below.)
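A minimal sketch of the graph-building and component-counting phase, assuming outlier-cluster centroids and a precomputed pairwise silhouette matrix as inputs (hypothetical representation); union-find counts the connected components, i.e., the number of detected novel classes.

```java
import java.util.*;

/** Count connected components among outlier clusters (vertices). Each
 *  cluster gets an edge to its nearest neighbor unless the pair's
 *  silhouette coefficient says they are well separated. */
class NovelClassCounter {
    static int countNovelClasses(double[][] centroids, double[][] silhouette,
                                 double threshold) {
        int n = centroids.length;
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;
        for (int i = 0; i < n; i++) {
            int nn = -1; double best = Double.MAX_VALUE;
            for (int j = 0; j < n; j++) {             // find nearest neighbor
                if (j == i) continue;
                double d = 0;                         // squared Euclidean distance
                for (int k = 0; k < centroids[i].length; k++)
                    d += (centroids[i][k] - centroids[j][k])
                       * (centroids[i][k] - centroids[j][k]);
                if (d < best) { best = d; nn = j; }
            }
            if (nn >= 0 && silhouette[i][nn] <= threshold)
                union(parent, i, nn);                 // not well separated: link
        }
        Set<Integer> roots = new HashSet<>();
        for (int i = 0; i < n; i++) roots.add(find(parent, i));
        return roots.size();   // connected components = novel classes
    }

    static int find(int[] p, int i) { return p[i] == i ? i : (p[i] = find(p, p[i])); }
    static void union(int[] p, int a, int b) { p[find(p, a)] = find(p, b); }
}
```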

Page 82: Data Stream Classification and Novel Class Detection

Multi Novel Class Detection

Proposed Methods

◦ Component merging phase
  In probability theory, "the normal (or Gaussian) distribution is a continuous probability distribution that is often used as a first approximation to describe real-valued random variables that tend to cluster around a single mean value" [1].
  If two Gaussian components (g1, g2) can be separated, the following condition holds:

  d = centroid_dist(g1, g2) >= μ1 + μ2

  where μ is the mean absolute deviation of each component. For a zero-mean Gaussian, E|x| = σ * sqrt(2/π), so μ is proportional to σ, and the condition becomes

  d = centroid_dist(g1, g2) >= c(σ1 + σ2)

  If this condition holds, the two components remain separated; otherwise the two components are merged.

[1] Shun'ichi Amari and Hiroshi Nagaoka. Methods of Information Geometry. Oxford University Press. ISBN 0-8218-0531-2, 2000.

Page 83: Data Stream Classification and Novel Class Detection

Experiments: Datasets

Experiment Results

We evaluated our approach on different datasets:

Data Set      Concept Drift   Concept Evolution   Dynamic Feature   # of Instances   # of Classes
KDD           ✓               ✓                   -                 492K             7
Forest Cover  ✓               ✓                   -                 387K             7
NASA          ✓               ✓                   ✓                 140K             21
Twitter       ✓               ✓                   ✓                 335K             21
SynED         ✓               ✓                   ✓                 400K             20

Page 84: Data Stream Classification and Novel Class Detection

Experiments: Setup

Development:
◦ Language: Java

H/W:
◦ Intel P-IV with
◦ 3GB memory and
◦ 3GHz dual-processor CPU

Parameter settings:
◦ K (number of pseudopoints per chunk) = 50
◦ q (minimum number of instances required to declare a novel class) = 50
◦ L (ensemble size) = 6
◦ S (chunk size) = 1,000

Experiment Results

Page 85: Data Stream Classification and Novel Class Detection

Experiments: Baseline

Competing approaches:
◦ i) DEMminer, our approach, with 5 variations:
  Lossy-F conversion
  Lossy-L conversion
  Lossless conversion - DEMminer
  Dynamic OUTTH + lossless conversion - DEMminer-Ex (without Gini)
  Dynamic OUTTH + Gini + lossless conversion - DEMminer-Ex
◦ ii) WCE-OLINDDA (O-W)
◦ iii) FAE-WCE-OLINDDA_Parallel (O-F)

We use this combination because, to the best of our knowledge, no other approach can classify and detect novel classes simultaneously with feature evolution.

Page 86: Data Stream Classification and Novel Class Detection

Experiments: Results

Evaluation metrics:
◦ Fn = total novel class instances misclassified as existing class,
◦ Fp = total existing class instances misclassified as novel class,
◦ Fe = total existing class instances misclassified (other than Fp),
◦ Nc = total novel class instances in the stream,
◦ N = total instances in the stream

Experiment Results

Page 87: Data Stream Classification and Novel Class Detection

Twitter Results

[Results chart shown on slide.]

Experiment Results

Page 88: Data Stream Classification and Novel Class Detection

Twitter Results

Method   DEMminer   Lossy-L   Lossy-F   O-F
AUC      0.88       0.83      0.76      0.56

Experiment Results

Page 89: Data Stream Classification and Novel Class Detection

Twitter Results

[Results chart shown on slide.]

Experiment Results

Page 90: Data Stream Classification and Novel Class Detection

Twitter Results

Method   DEMminer-Ex   DEMminer   OW
AUC      0.94          0.88       0.56

Experiment Results

Page 91: Data Stream Classification and Novel Class Detection

Forest Cover Results

[Results chart shown on slide.]

Experiment Results

Page 92: Data Stream Classification and Novel Class Detection

Forest Cover Results

Method   DEMminer   DEMminer-Ex (without Gini)   DEMminer-Ex   OW
AUC      0.97       0.99                         0.97          0.74

Experiment Results

Page 93: Data Stream Classification and Novel Class Detection

NASA Dataset

[Results chart shown on slide.]

Experiment Results

Page 94: Data Stream Classification and Novel Class Detection

NASA Dataset

Method   Deviation   Info Gain   FAE
AUC      0.996       0.967       0.876

Experiment Results

Page 95: Data Stream Classification and Novel Class Detection

KDD Results

[Results chart shown on slide.]

Experiment Results

Page 96: Data Stream Classification and Novel Class Detection

KDD Results

Method   DEMminer   O-F
AUC      0.98       0.96

Experiment Results

Page 97: Data Stream Classification and Novel Class Detection

Result Summary

Dataset       Method                 ERR    Mnew   Fnew   AUC     FP     FN
Twitter       DEMminer               4.2    30.5   0.8    0.877   -      -
              Lossy-F                32.5   0.0    32.6   0.834   -      -
              Lossy-L                1.6    82.0   0.0    0.764   -      -
              O-F                    3.4    96.7   1.6    0.557   -      -
ASRS          DEMminer               0.02   -      -      0.996   0.00   0.1
              DEMminer (info-gain)   1.4    -      -      0.967   0.04   10.3
              O-F                    3.4    -      -      0.876   0.00   24.7
Forest Cover  DEMminer               3.6    8.4    1.3    0.973   -      -
              O-F                    5.9    20.6   1.1    0.743   -      -
KDD           DEMminer               1.2    5.9    0.9    0.986   -      -
              O-F                    4.7    9.6    4.4    0.967   -      -

Experiment Results

Page 98: Data Stream Classification and Novel Class Detection

Result Summary

Dataset       Method        ERR   Mnew   Fnew   AUC
Twitter       DEMminer      4.2   30.5   0.8    0.877
              DEMminer-Ex   1.8   0.7    0.6    0.944
              OW            3.4   96.7   1.6    0.557
Forest Cover  DEMminer      3.6   8.4    1.3    0.974
              DEMminer-Ex   3.1   4.0    0.68   0.990
              OW            5.9   20.6   1.1    0.743

Experiment Results

Page 99: Data Stream Classification and Novel Class Detection

Running Time Comparison

              Time (sec) / 1K points        Points/sec                    Speed gain
Dataset       DEMminer  Lossy-F  O-F        DEMminer  Lossy-F  O-F        DEMminer over O-F
Twitter       23        3.5      66.7       43        289      15         2.9
ASRS          21        4.3      38.5       47        233      26         1.8
Forest Cover  1.0       1.0      4.7        967       1003     212        4.7
KDD           1.2       1.2      3.3        858       812      334        2.5

Experiment Results

Page 100: Data Stream Classification and Novel Class Detection

Multi Novel Detection Results

[Results charts shown on slide.]

Experiment Results

Page 101: Data Stream Classification and Novel Class Detection

Multi Novel Detection Results

[Results charts shown on slide.]

Experiment Results

Page 102: Data Stream Classification and Novel Class Detection

Conclusion

• Our data stream classification technique addresses
  • Infinite length
  • Concept-drift
  • Concept-evolution
  • Feature-evolution
• Existing approaches address only the first two issues
• Applicable to many domains, such as
  • Intrusion/malware detection
  • Text categorization
  • Fault detection, etc.

Page 103: Data Stream Classification and Novel Class Detection

References

J. Gehrke, V. Ganti, R. Ramakrishnan, and W. Loh. BOAT-Optimistic Decision Tree Construction. In Proc. SIGMOD, 1999.

P. Domingos and G. Hulten. Mining high-speed data streams. In Proc. SIGKDD, pages 71-80, 2000.

B. Wenerstrom and C. Giraud-Carrier. Temporal data mining in dynamic feature spaces. In: Perner, P. (ed.) ICDM 2006, LNCS (LNAI), vol. 4065, pp. 1141-1145. Springer, Heidelberg, 2006.

E. J. Spinosa, A. P. de Leon F. de Carvalho, and J. Gama. Cluster-based novel concept detection in data streams applied to intrusion detection in computer networks. In Proc. 2008 ACM Symposium on Applied Computing, pages 976-980, 2008.

M. Scholz and R. Klinkenberg. An ensemble classifier for drifting concepts. In Proc. ICML/PKDD Workshop on Knowledge Discovery in Data Streams, 2005.

Page 104: Data Stream Classification and Novel Class Detection

References (contd.)

J. Brutlag. Aberrant behavior detection in time series for network monitoring. In Proc. Usenix Fourteenth System Admin. Conf. (LISA XIV), New Orleans, LA, Dec. 2000.

E. Eskin, A. Arnold, M. Prerau, L. Portnoy, and S. Stolfo. A geometric framework for unsupervised anomaly detection: Detecting intrusions in unlabeled data. Applications of Data Mining in Computer Security, Kluwer, 2002.

W. Fan. Systematic data selection to mine concept-drifting data streams. In Proc. KDD '04.

J. Gao, W. Fan, and J. Han. On Appropriate Assumptions to Mine Data Streams. 2007.

J. Gao, W. Fan, J. Han, and P. S. Yu. A General Framework for Mining Concept-Drifting Data Streams with Skewed Distributions. SDM 2007.

J. Goebel and T. Holz. Rishi: Identify bot contaminated hosts by IRC nickname evaluation. In Usenix/Hotbots '07 Workshop, 2007.

J. B. Grizzard, V. Sharma, C. Nunnery, B. B. Kang, and D. Dagon. Peer-to-peer botnets: Overview and case study. In Usenix/Hotbots '07 Workshop, 2007.

Page 105: Data Stream Classification and Novel Class Detection

References (contd.)

E. J. Keogh and M. J. Pazzani. Scaling up dynamic time warping for data mining applications. In ACM SIGKDD, 2000.

R. Lemos. Bot software looks to improve peerage. SecurityFocus. http://www.securityfocus.com/news/11390, 2006.

C. Livadas, B. Walsh, D. Lapsley, and T. Strayer. Using machine learning techniques to identify botnet traffic. In 2nd IEEE LCN Workshop on Network Security (WoNS 2006), November 2006.

LURHQ Threat Intelligence Group. Sinit p2p trojan analysis. http://www.lurhq.com/sinit.html, 2004.

M. A. Rajab, J. Zarfoss, F. Monrose, and A. Terzis. A multifaceted approach to understanding the botnet phenomenon. In Proc. 6th ACM SIGCOMM Internet Measurement Conference (IMC), 2006.

Kagan Tumer and Joydeep Ghosh. Error correlation and error reduction in ensemble classifiers. Connection Science, 8(3-4):385-403, 1996.

Page 106: Data Stream Classification and Novel Class Detection

References (contd.)

Mohammad Masud, Jing Gao, Latifur Khan, Jiawei Han, and Bhavani Thuraisingham. A Multi-Partition Multi-Chunk Ensemble Technique to Classify Concept-Drifting Data Streams. In Proc. 13th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD-09), pages 363-375, Bangkok, Thailand, April 2009.

Mohammad Masud, Jing Gao, Latifur Khan, Jiawei Han, and Bhavani Thuraisingham. A Practical Approach to Classify Evolving Data Streams: Training with Limited Amount of Labeled Data. In Proc. 2008 IEEE International Conference on Data Mining (ICDM 2008), Pisa, Italy, pages 929-934, December 2008.

Clay Woolam, Mohammed Masud, and Latifur Khan. Lacking Labels in the Stream: Classifying Evolving Stream Data with Few Labels. In Proc. 18th International Symposium on Methodologies for Intelligent Systems (ISMIS), pages 552-562, September 2009, Prague, Czech Republic.

Page 107: Data Stream Classification and Novel Class Detection

References (contd.)

Mohammad Masud, Qing Chen, Latifur Khan, Charu Aggarwal, Jing Gao, Jiawei Han, and Bhavani Thuraisingham. Addressing Concept-Evolution in Concept-Drifting Data Streams. In Proc. 10th IEEE International Conference on Data Mining (ICDM 2010), Sydney, Australia, December 2010.

Mohammad M. Masud, Qing Chen, Jing Gao, Latifur Khan, Jiawei Han, and Bhavani Thuraisingham. Classification and Novel Class Detection of Data Streams in a Dynamic Feature Space. In Proc. European Conference on Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2010), Barcelona, Spain, September 20-24, 2010, Springer, ISBN 978-3-642-15882-7, pages 337-352.

Mohammad M. Masud, Jing Gao, Latifur Khan, Jiawei Han, and Bhavani Thuraisingham. Classification and Novel Class Detection in Data Streams with Active Mining. In Proc. 14th Pacific-Asia Conference on Knowledge Discovery and Data Mining, June 21-24, 2010, pages 311-324, Hyderabad, India.

Page 108: Data Stream Classification and Novel Class Detection

References (contd.)

Mohammad M. Masud, Jing Gao, Latifur Khan, Jiawei Han, and Bhavani Thuraisingham. Classification and Novel Class Detection in Concept-Drifting Data Streams under Time Constraints. IEEE Transactions on Knowledge and Data Engineering (TKDE), IEEE Computer Society, June 2011, Vol. 23, No. 6, pages 859-874.

Charu C. Aggarwal, Jiawei Han, Jianyong Wang, and Philip S. Yu. A Framework for Clustering Evolving Data Streams. In Proc. VLDB '03, 29th International Conference on Very Large Data Bases, Volume 29.

H. Wang, W. Fan, P. S. Yu, and J. Han. Mining concept-drifting data streams using ensemble classifiers. In Proc. Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 226-235, Washington, DC, USA, August 2003. ACM.

Mohammad M. Masud, Jing Gao, Latifur Khan, Jiawei Han, and Bhavani Thuraisingham. Integrating Novel Class Detection with Classification for Concept-Drifting Data Streams. In Proc. 2009 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD '09), Bled, Slovenia, September 7-11, 2009.

Page 109: Data Stream Classification and Novel Class Detection

Questions