Information-Theoretic Measures for Anomaly Detection


Transcript of Information-Theoretic Measures for Anomaly Detection

Page 1: Information-Theoretic Measures for Anomaly Detection

Information-Theoretic Measures for Anomaly Detection

Wenke Lee and Dong Xiang (North Carolina State University)

IEEE Security and Privacy, 2001

Speaker: Chang Huan Wu

2009/4/14

Page 2: Information-Theoretic Measures for Anomaly Detection


Outline

Introduction

Information-Theoretic Measures

Case Studies

Conclusions

Page 3: Information-Theoretic Measures for Anomaly Detection


Introduction (1/2)

Misuse detection – uses the “signatures” of known attacks

Anomaly detection – uses established normal profiles

The basic premise for anomaly detection: there is regularity in audit data that is consistent with normal behavior and thus distinct from abnormal behavior

Page 4: Information-Theoretic Measures for Anomaly Detection


Introduction (2/2)

Most anomaly detection models are built solely on “expert” knowledge or intuition

Goal: provide theoretical foundations as well as useful tools that can facilitate the IDS development process and improve the effectiveness of ID technologies

Page 5: Information-Theoretic Measures for Anomaly Detection


Information-Theoretic Measures (1/7)

Entropy

Use entropy as a measure of the regularity of audit data
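For reference (the formula itself is not reproduced in this transcript), the standard Shannon entropy being used here can be written as

H(X) = \sum_{x \in C_X} P(x) \log \frac{1}{P(x)}

where C_X is the set of classes (distinct items) in dataset X and P(x) is the relative frequency of class x; smaller entropy means more regular data.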

Page 6: Information-Theoretic Measures for Anomaly Detection


Information-Theoretic Measures (2/7)

Conditional Entropy

Let X be a collection of sequences, each of the form (e1, e2, …, en-1, en), where each ei is an audit event; let Y be the collection of subsequences, each of the form (e1, e2, …, ek), with k < n

H(X | Y) tells us how much uncertainty remains for the rest of the audit events in a sequence x after we have seen y
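The corresponding conditional entropy (the standard definition, consistent with the usage here) is

H(X \mid Y) = \sum_{x, y} P(x, y) \log \frac{1}{P(x \mid y)}

where P(x, y) is the joint probability of a sequence x and its prefix y, and P(x | y) is the probability of x given y.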

Page 7: Information-Theoretic Measures for Anomaly Detection


Information-Theoretic Measures (3/7)

Relative Entropy

Relative entropy measures the distance between the regularities of two datasets – e.g., the training dataset and the testing dataset
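In its standard (Kullback-Leibler) form, with p the distribution of one dataset and q the distribution of the other, this is

relEnt(p \mid q) = \sum_{x} p(x) \log \frac{p(x)}{q(x)}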

Page 8: Information-Theoretic Measures for Anomaly Detection


Information-Theoretic Measures (4/7)

When we use conditional entropy to measure the regularity of sequential dependencies, we can use relative conditional entropy to measure the distance between two audit datasets
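Analogously, a relative conditional entropy between distributions p and q can be written (a sketch of the standard form, consistent with the definitions above) as

relCondEnt(p \mid q) = \sum_{x, y} p(x, y) \log \frac{p(x \mid y)}{q(x \mid y)}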

Page 9: Information-Theoretic Measures for Anomaly Detection


Information-Theoretic Measures (5/7)

Intrusion detection can be cast as a classification problem

When constructing a classifier, a classification algorithm needs to search for features with high information gain – when the dataset is partitioned according to the values of such a feature, the subsets have lower entropy

Page 10: Information-Theoretic Measures for Anomaly Detection


Information-Theoretic Measures (6/7)

Information Gain
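The formula on this slide is not reproduced in the transcript; the standard information gain of an attribute A over a dataset X is

Gain(X, A) = H(X) - \sum_{v \in \mathrm{Values}(A)} \frac{|X_v|}{|X|} H(X_v)

where X_v is the subset of X whose value of attribute A is v.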

Page 11: Information-Theoretic Measures for Anomaly Detection


Information Gain

H(X) = -((4/16)*log2(4/16) + (12/16)*log2(12/16)) = 0.8113

E(age) = (6/16)*H(<35) + (10/16)*H(>35) = 0.7946

Gain(age) = H(X) - E(age) = 0.0167

Gain(age) = 0.0167, Gain(gender) = 0.0972, Gain(household income) = 0.0177
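The numbers above can be reproduced with a short Python check. The per-branch class counts used below (1 of 6 positives in the <35 branch, 3 of 10 in the >35 branch) are an assumption chosen only because it is consistent with the slide's figures; the slide itself does not list them:

    from math import log2

    def H(pos, total):
        # Binary entropy of a subset with `pos` positives out of `total`.
        out = 0.0
        for p in (pos / total, 1 - pos / total):
            if p > 0:
                out -= p * log2(p)
        return out

    H_X = H(4, 16)                                     # 0.8113
    E_age = (6 / 16) * H(1, 6) + (10 / 16) * H(3, 10)  # assumed split -> 0.7946
    print(round(H_X, 4), round(E_age, 4), round(H_X - E_age, 4))  # 0.8113 0.7946 0.0167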

Page 12: Information-Theoretic Measures for Anomaly Detection


Information-Theoretic Measures (7/7)

Intuitively, the more information we have, the better the detection performance – but there is always a cost for any gain

We can define information cost as the average time for processing an audit record and checking against the detection model
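A minimal sketch (my own construction, not from the paper) of measuring such a cost: average the wall-clock time the detection model spends per audit record.

    import time

    def information_cost(check_record, records):
        # Average seconds to process one audit record, where check_record is
        # any callable that checks a record against the detection model
        # (a hypothetical stand-in for the real model).
        start = time.perf_counter()
        for record in records:
            check_record(record)
        return (time.perf_counter() - start) / len(records)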

Page 13: Information-Theoretic Measures for Anomaly Detection


UNM sendmail System Call Data (1/6)

University of New Mexico (UNM) sendmail system call data

Each trace contains the consecutive system calls made by the run-time processes

Used the first 80% of the traces as the training data and the last 20% as part of the testing data

Page 14: Information-Theoretic Measures for Anomaly Detection


UNM sendmail System Call Data (2/6)

H(length-n sequences | subsequences of length n-1) measures the regularity of how the first n-1 system calls determine the n-th system call

=> Conditional entropy drops as the sequence length increases
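A minimal sketch of this measurement, assuming a trace is simply a list of system call names; the sliding length-n windows play the role of X and their length-(n-1) prefixes the role of Y:

    from collections import Counter
    from math import log2

    def conditional_entropy(trace, n):
        # H(length-n windows | their length-(n-1) prefixes) for one trace.
        windows = [tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)]
        joint = Counter(windows)                    # counts of (prefix + next call)
        prefix = Counter(w[:-1] for w in windows)   # counts of the prefixes alone
        total = len(windows)
        # H(X|Y) = -sum_{x,y} P(x, y) * log2 P(x | y)
        return -sum((c / total) * log2(c / prefix[w[:-1]])
                    for w, c in joint.items())

    # Toy trace with some branching; the value shrinks as n grows.
    trace = ["open", "read", "read", "write", "close",
             "open", "read", "write", "write", "close"] * 20
    for n in (2, 3, 5):
        print(n, round(conditional_entropy(trace, n), 4))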

Page 15: Information-Theoretic Measures for Anomaly Detection


UNM sendmail System Call Data (3/6)

For normal data, the trend of misclassification rate coincides with the trend of conditional entropy

Page 16: Information-Theoretic Measures for Anomaly Detection


UNM sendmail System Call Data (4/6)

Misclassification rates for the intrusion traces are much higher

This suggests that we can use the range of the misclassification rate as an indicator of whether a given trace is normal or abnormal (an intrusion)
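A sketch of one way to compute such a misclassification rate, assuming (as the slides imply) that the model predicts the n-th system call from the preceding n-1 calls and that unseen prefixes count as misclassifications; the exact classifier used in the paper is not specified in this transcript:

    from collections import Counter, defaultdict

    def train_predictor(traces, n):
        # For each length-(n-1) prefix seen in training, remember the most
        # frequent next system call.
        counts = defaultdict(Counter)
        for t in traces:
            for i in range(len(t) - n + 1):
                counts[tuple(t[i:i + n - 1])][t[i + n - 1]] += 1
        return {prefix: c.most_common(1)[0][0] for prefix, c in counts.items()}

    def misclassification_rate(model, trace, n):
        # Fraction of windows whose n-th call differs from the prediction;
        # prefixes never seen in training also count as misclassified.
        windows = [(tuple(trace[i:i + n - 1]), trace[i + n - 1])
                   for i in range(len(trace) - n + 1)]
        if not windows:
            return 0.0
        wrong = sum(1 for prefix, nxt in windows if model.get(prefix) != nxt)
        return wrong / len(windows)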

Page 17: Information-Theoretic Measures for Anomaly Detection


UNM sendmail System Call Data (5/6)

When the training and testing normal datasets differ more, the misclassification rate on the testing normal data is also higher

Page 18: Information-Theoretic Measures for Anomaly Detection


UNM sendmail System Call Data (6/6)

The cost is a linear function of the sequence length – length ↑, accuracy ↑, but cost ↑ as well

Page 19: Information-Theoretic Measures for Anomaly Detection


MIT Lincoln Lab sendmail BSM Data (1/6)

BSM data developed and distributed by MIT Lincoln Lab for the 1999 DARPA evaluation

Each audit record corresponds to a system call made by sendmail – contains additional information (e.g., user and group IDs, the object name)

Page 20: Information-Theoretic Measures for Anomaly Detection


MIT Lincoln Lab sendmail BSM Data (2/6)

UNM data: (s1, s2, …, sl)

BSM data

– so: (s1_o1, s2_o2, …, sl_ol)

– s-o: (s1, o1, s2, o2, …, sl, ol)

– s: system call, o: object name (system, user, or other)
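A small sketch (function names are my own) of the three encodings, assuming each BSM record has been reduced to a (system call, object category) pair:

    def encode_s(records):
        # s: system call names only
        return [syscall for syscall, obj in records]

    def encode_so(records):
        # so: each system call fused with its object category into one event
        return [f"{syscall}_{obj}" for syscall, obj in records]

    def encode_s_o(records):
        # s-o: system call and object category interleaved as separate events
        seq = []
        for syscall, obj in records:
            seq.extend([syscall, obj])
        return seq

    records = [("open", "user"), ("read", "user"), ("write", "system")]
    print(encode_so(records))   # ['open_user', 'read_user', 'write_system']
    print(encode_s_o(records))  # ['open', 'user', 'read', 'user', 'write', 'system']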

Page 21: Information-Theoretic Measures for Anomaly Detection


MIT Lincoln Lab sendmail BSM Data (3/6)

Conditional entropy drops as sequence length increases

Page 22: Information-Theoretic Measures for Anomaly Detection


MIT Lincoln Lab sendmail BSM Data (4/6)

For in-bound mails, the testing data have clearly higher misclassification rates than the training data

Page 23: Information-Theoretic Measures for Anomaly Detection


MIT Lincoln Lab sendmail BSM Data (5/6)

Out-bound mails have much smaller relative conditional entropy than in-bound mails

Page 24: Information-Theoretic Measures for Anomaly Detection


MIT Lincoln Lab sendmail BSM Data (6/6)

Though the performance with the object name is slightly better, once cost is considered, it is actually better to use the system call name only

Page 25: Information-Theoretic Measures for Anomaly Detection


MIT Lincoln Lab Network Data (1/4)

tcpdump data developed and distributed by MIT Lincoln Lab for the 1998 DARPA evaluation

Each record describes a connection using the following features: timestamp, duration, source port, source host, service…

Page 26: Information-Theoretic Measures for Anomaly Detection


MIT Lincoln Lab Network Data (2/4)

Destination host was used for partitioning the data into per-host subsets
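A minimal sketch of this partitioning, assuming each connection record is a dict with a "dst_host" field (a hypothetical field name):

    from collections import defaultdict

    def partition_by_dst_host(connections):
        # Group connection records so a separate model can be built per host.
        subsets = defaultdict(list)
        for conn in connections:
            subsets[conn["dst_host"]].append(conn)
        return subsets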

Page 27: Information-Theoretic Measures for Anomaly Detection


MIT Lincoln Lab Network Data (3/4)

We can see from the figure that intrusion datasets have much higher misclassification rates

Models from the (more) partitioned datasets have much better performance

Page 28: Information-Theoretic Measures for Anomaly Detection


MIT Lincoln Lab Network Data (4/4)

Conditional entropy decreases as the window size grows

Page 29: Information-Theoretic Measures for Anomaly Detection


Conclusion

Proposed using information-theoretic measures – entropy, conditional entropy, relative entropy, relative conditional entropy, information gain, and information cost – for anomaly detection

Page 30: Information-Theoretic Measures for Anomaly Detection


Comments

Provides theoretical foundations and uses concrete numbers to present the results

Plentiful experimental results