Transcript of "Using Artificial Anomalies to Detect Known and Unknown Network Intrusions" (22 pages)

Page 1:


Wei Fan, IBM Research

Matt Miller, Sal Stolfo, Columbia University

Wenke Lee, Georgia Tech

Philip Chan, Florida Tech

December 1, 2001

Using Artificial Anomalies to Detect Known and Unknown Network Intrusions

Page 2:

Anomaly Detection and Classification

Differences:
A classification system builds models to detect repeated patterns of known event types.

Anomaly detection tracks inconsistencies that deviate from the "known" and "expected".

Example: Intrusion Detection Systems
Misuse detection: detects known intrusion types.
Anomaly detection: detects network events different from normal events and known intrusions. Anomalies are likely to be newly launched intrusions.

Training Data
Classification: clearly labeled examples.
Anomaly detection: no labeled anomalous data. Otherwise, they would not be anomalies.

Page 3:

Problem

Problem: How can we use inductive learning for anomaly detection? A wide range of inductive learners is available, and they produce comprehensible models.

Solution: Compute artificial anomaly data from classification training data to convert anomaly detection into classification.

All artificial anomalies are assigned the label "anomaly". For example, use normal and known-intrusion data to compute artificial anomalies.

Page 4:

Some Observations on Inductive Learning Algorithms

Inductive learners only discover boundaries that separate data with different given labels.

Data of unknown types will always be misclassified as one of the given classes.

Example: How do we distinguish a bear from a cat? An inductive model might be:

If Weight(x) < 5 lb, x is a cat; otherwise, x is a bear.

However, if x is a horse, the model will mistakenly predict it to be a bear. The ideal answer would be "I don't know. It is neither a bear nor a cat."
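To make this concrete, here is a minimal scikit-learn sketch of the failure mode, with made-up weights (the data and the choice of learner are illustrative, not from the slides):

```python
# Minimal sketch: a learner trained only on cats and bears
# has no way to answer "neither".
from sklearn.tree import DecisionTreeClassifier

X_train = [[3.5], [4.0], [9.0], [300.0], [450.0]]  # Weight(x) in lb
y_train = ["cat", "cat", "cat", "bear", "bear"]

model = DecisionTreeClassifier().fit(X_train, y_train)

# A 1000 lb horse is forced into one of the two known classes:
print(model.predict([[1000.0]]))  # -> ['bear'], never "I don't know"
```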

Page 5:

Solution Summary

Generate artificial anomalies with the label "anomaly" to delineate the boundary between the known and the unknown.

How do we generate artificial anomalies? Compute examples that are close to, but different from, those with given labels.

Where do we place the artificial anomalies? Put more artificial anomalies around infrequent examples or sparse regions in the training data.

Page 6:

Digging into the Dataset

Assume that the boundary (between known and unknown) is close to the known data.

Randomly change the value of one feature of a given datum while leaving the other features unaltered.

Concentrate on areas of the training data that are "sparse". Sparse regions are characterized by infrequent feature values. Example:

A panda is a very "sparse" example in the bear class: it has a white body, black eye patches, and black legs.

Generate more artificial anomalies around sparse regions. For example, something with a white body, white eye patches, black legs, and a weight above 200 lb is an anomaly.

Based on the frequency of feature values, we compensate for sparse regions by filling them in with more artificial anomalies.

Page 7:

Overall Effect

Sparse regions become the focus, and very specific rules are generated to cover them.

For example:
IF white body, black eye patches, black legs, and weight > 200 lb, THEN it is a (panda) bear
ELSE try other rules
ELSE IF none of the rules are satisfied, THEN predict "We don't know what it is based on our limited knowledge."

Without the artificial anomalies, an animal with a "white body, white eye patches, and black legs" would, at best, be misclassified as a bear.

Page 8:

Distribution-based Artificial Anomaly Generation Algorithm

Iterate through every feature value:
fmax is the most frequent value of feature F; count(fmax) is its frequency count.
fi is another value of the same feature F; count(fi) is its frequency count.
countdiff = count(fmax) - count(fi).
Generate countdiff artificial anomalies for feature value fi.

For a datum whose feature F has value fi, change fi to any value that is not fi, while leaving all other features unchanged.

Change its label to "anomaly".
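A minimal runnable sketch of this procedure, assuming categorical features and a dataset represented as a list of dicts (all function and variable names here are illustrative, not from the paper):

```python
import random
from collections import Counter

def generate_artificial_anomalies(data, feature_names, seed=0):
    rng = random.Random(seed)
    anomalies = []
    for f in feature_names:
        counts = Counter(d[f] for d in data)
        values = list(counts)
        if len(values) < 2:
            continue  # no alternative value to mutate to
        count_max = max(counts.values())
        for fi, count_fi in counts.items():
            countdiff = count_max - count_fi  # sparser values get more anomalies
            carriers = [d for d in data if d[f] == fi]
            for _ in range(countdiff):
                anomaly = dict(rng.choice(carriers))
                # change fi to any other value of F; leave all other features unchanged
                anomaly[f] = rng.choice([v for v in values if v != fi])
                anomaly["label"] = "anomaly"
                anomalies.append(anomaly)
    return anomalies
```

Note how infrequent feature values receive proportionally more artificial anomalies, which is exactly the compensation for sparse regions described on the previous slide.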

Page 9:

Application of Artificial Anomaly

Pure Anomaly Detection:
Training data have only one class, such as normal. Detect any data that are different from the given single class. Artificial anomalies are computed from this single class.

Combined Misuse and Anomaly Detection:
Classification and anomaly detection are performed at the same time. For example, detect bear, cat, and neither-bear-nor-cat at the same time.

Efficient, since both classification and anomaly detection are done at the same time.

One single module. Efficient model deployment.
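As a rough sketch of how such a combined detector could be trained, reusing generate_artificial_anomalies from the earlier sketch (a decision tree stands in for RIPPER, which the slides use; the helper names are illustrative):

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.tree import DecisionTreeClassifier

def train_combined_model(records, feature_names):
    # augment normal + known-intrusion records with artificial anomalies
    augmented = records + generate_artificial_anomalies(records, feature_names)
    labels = [r["label"] for r in augmented]
    features = [{f: r[f] for f in feature_names} for r in augmented]
    vectorizer = DictVectorizer()  # one-hot encodes categorical values
    model = DecisionTreeClassifier().fit(vectorizer.fit_transform(features), labels)
    return vectorizer, model  # the model predicts known classes plus "anomaly"
```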

Page 10:

Experiment on Pure Anomaly Detection

Measurements:
False alarm rate: predicted anomalies that are actually normal.
Detection rate: true anomalies correctly detected.
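In code, the two measurements might look like these illustrative helpers, assuming test labels are collapsed to "normal" and "anomaly":

```python
def false_alarm_rate(predicted, actual):
    # among events predicted "anomaly", the fraction that are actually normal
    flagged = [a for p, a in zip(predicted, actual) if p == "anomaly"]
    return sum(a == "normal" for a in flagged) / len(flagged)

def detection_rate(predicted, actual):
    # among true anomalies, the fraction correctly predicted as "anomaly"
    true_anomalies = [p for p, a in zip(predicted, actual) if a == "anomaly"]
    return sum(p == "anomaly" for p in true_anomalies) / len(true_anomalies)
```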

Original Dataset: 1998 DARPA Intrusion Detection Evaluation dataset (also the 1999 KDD Cup dataset).

The original dataset contains both normal data and intrusion data. There are 4 basic types of intrusions, and each type has a few subclasses:

U2R: User to Root
R2L: Remote to Local
DOS: Denial of Service
PRB: Probing

Page 11:

Intrusions and Categories

Page 12:

Experiment Setup

Training Set: normal data and artificial anomalies computed from normal data. No intrusions are included.

Test Set: both normal and all intrusion data.
Goal: can we detect all intrusion data as "anomalies" without having them in the training data?

Learner: RIPPER, an inductive rule learner.

Page 13:

Pure Anomaly Result

False Alarm Rate: 2%
Anomaly Detection Rate:

Page 14:

Experiment on Combined Misuse and Anomaly Detection

One single module that detects both known intrusions and unknown events that are neither normal nor known intrusions.

Group the different types of intrusions into 13 clusters. Similar intrusions are grouped together; knowledge of intrusions in one cluster may not help detect intrusions in another cluster.

Page 15:

Experiment Setup

Training Set: normal data plus a few clusters of intrusions PLUS artificial anomalies.

Test Set: all data, normal and all intrusions.

Goal: Can we detect unseen intrusions (excluded intrusion clusters) as anomalies?

Do we have to compromise performance in detecting known intrusions (included intrusion clusters)?

There are 13! possible orderings in which to introduce the intrusion clusters into training and test sets.

We use 3 unique sequences to introduce intrusion clusters, adding one cluster at a time (see the sketch below).
Training: normal + clusters 1 to i.
Testing: normal and all types of intrusions.
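A sketch of one such sequence, under the same assumptions as the earlier sketches (clusters is a hypothetical list of 13 record lists; train_combined_model is reused from the combined-detection sketch):

```python
def run_sequence(normal, clusters, feature_names, test_records):
    test_features = [{f: r[f] for f in feature_names} for r in test_records]
    predictions_per_step = []
    for i in range(1, len(clusters) + 1):
        # training set: normal data + intrusion clusters 1..i
        # (train_combined_model adds the artificial anomalies itself)
        train = normal + [r for cluster in clusters[:i] for r in cluster]
        vectorizer, model = train_combined_model(train, feature_names)
        predictions_per_step.append(model.predict(vectorizer.transform(test_features)))
    return predictions_per_step
```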

Page 16:

Measurements

True Class Detection Rate
Anomaly Detection Rate

Page 17:

True Class Detection Rate: intrusion i correctly detected as intrusion i.

Page 18:

Anomaly Detection Rate: percentage of anomalies or unknown intrusions correctly detected as anomalies.

Page 19:

Efficient Model Deployment

(summary)

Efficient learning and deployment of models to detect new attacks.

When data about new attacks are collected, we do not want to retrain the model for all intrusions from scratch.

Instead, we only train a lightweight model to detect the new attacks.

Using artificial anomalies, the older model can detect anomalies. The new lightweight model and the older model are combined to detect both new and existing attacks.

When an event is detected as an anomaly, it is sent to the new lightweight model to check whether it is the new attack or just an anomaly.

Experiments show that accuracy remains unchanged, while this approach is 150 times faster.
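A minimal sketch of this two-stage prediction path (names are illustrative; both models are assumed to expose a scikit-learn-style predict()):

```python
def combined_predict(event_vector, old_model, new_attack_model):
    label = old_model.predict(event_vector)[0]
    if label != "anomaly":
        return label  # normal traffic or a known intrusion
    # only events flagged as anomalies reach the lightweight model,
    # which decides: the new attack, or still just an anomaly
    return new_attack_model.predict(event_vector)[0]
```

Because the lightweight model only sees events the older model already flagged, most traffic never touches it, which is where the efficiency gain comes from.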

Page 20:

Related Work

(incomplete list)

Anomaly Detection:

SRI's IDES uses probability distributions of past activities to measure the abnormality of host events; we measure network events instead.

Forrest et al. use the absence of subsequences to measure abnormality. Lane and Brodley employ a similar approach but use incremental learning to update stored sequences of UNIX shell commands.

Ghosh and Schwartzbard use a neural network to learn a profile of normality and a distance function to detect abnormality.

Generating Artificial Data:
Nigam et al. assign labels to unlabeled data using a classifier trained on labeled data.

Chang and Lippmann applied voice transformation techniques to add artificial training talkers and increase variability.

Page 21:

Summary and Future Work

Proposed a feature-value-distribution-based artificial anomaly generation algorithm.

Applied this algorithm to both pure anomaly detection and combined misuse and anomaly detection for intrusion detection.

It remains to be seen whether the same approach works for other domains.

Page 22:

Distribution-Based Artificial Anomaly