Benchmarking Anomaly-based Detection Systems
Ashish Gupta, Network Security
May 2004
Overview
• The motivation for this paper
  – The Waldo example
• The approach
• Structure in data
• Generating the data and anomalies
• Injecting anomalies
• Results
  – Training and testing: the method
  – Scoring
  – Presentation
  – The ROC curves: somewhat obvious
Motivation
Does anomaly detection depend on the regularity/randomness of the data?
Where’s Waldo!
The aim
• Hypothesis:
  – Differences in data regularity affect anomaly detection
  – Different environments have different regularity
• Regularity
  – Highly redundant or random?
  – Example of an environment’s effect
010101010101010101010101
or
0100011000101000100100101
Consequences
One IDS: different false-alarm rates
Need a custom system/training for each environment?
Temporal effects: regularity may vary over time?
Structure in data
Measuring randomness
010101010101010101010101
or
0100011000101000100100101
Measuring Randomness
Relative entropy + sequential dependence → conditional relative entropy
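As a rough illustration of the regularity measure, the sketch below computes a first-order conditional entropy H(X_t | X_{t-1}) of a symbol sequence and divides it by log2 of the alphabet size, so the result lies between 0 (each symbol fully determined by the previous one) and 1 (random). The function name, the use of only the immediately preceding symbol as context, and this particular normalization are assumptions for illustration; the paper's exact definition may differ.

```python
import math
from collections import Counter

def conditional_relative_entropy(seq, alphabet_size):
    """Hypothetical regularity index: H(X_t | X_{t-1}) / log2(alphabet_size).

    0.0 -> perfectly regular (next symbol predictable from the previous one)
    1.0 -> random (next symbol uniform regardless of the previous one)
    """
    if alphabet_size < 2 or len(seq) < 2:
        return 0.0
    pairs = Counter(zip(seq, seq[1:]))   # counts of (previous, next) pairs
    prev_counts = Counter(seq[:-1])      # counts of each "previous" symbol
    n_pairs = len(seq) - 1
    h = 0.0
    for (prev, nxt), c in pairs.items():
        p_joint = c / n_pairs            # P(previous, next)
        p_cond = c / prev_counts[prev]   # P(next | previous)
        h -= p_joint * math.log2(p_cond)
    return h / math.log2(alphabet_size)

# The two example strings from the slides:
print(round(conditional_relative_entropy("010101010101010101010101", 2), 2))   # 0.0
print(round(conditional_relative_entropy("0100011000101000100100101", 2), 2))  # 0.85
```

The alternating string scores 0 because every "0" is followed by "1" and vice versa; the second string scores close to 1 because knowing the previous symbol barely helps predict the next.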
The benchmark datasets
• Three types:
  – Training data (the background data)
  – Anomalies
  – Testing data (background + anomalies)
• Generating the sequences:
  – 5 sets of 11 files each (one file per level of increasing regularity)
  – Each set uses a different alphabet size
  – The alphabet size determines the complexity
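One way to picture "one set, 11 files of increasing regularity" is a tunable Markov generator. The sketch below is hypothetical and not the paper's generator: with probability `1 - randomness` the next symbol is the successor of the current one in a fixed cycle (A→B→C→…→A), otherwise it is drawn uniformly. The names `generate_sequence`, `randomness`, and the 0.1 step between files are illustrative assumptions.

```python
import random
import string

def generate_sequence(alphabet_size, randomness, length, seed=0):
    """Hypothetical generator: randomness=0.0 gives a perfectly regular cycle,
    randomness=1.0 gives i.i.d. uniform symbols."""
    if not 1 <= alphabet_size <= 26:
        raise ValueError("alphabet_size must be between 1 and 26")
    if length <= 0:
        return ""
    alphabet = string.ascii_uppercase[:alphabet_size]
    rng = random.Random(seed)
    idx = rng.randrange(alphabet_size)
    out = [alphabet[idx]]
    for _ in range(length - 1):
        if rng.random() < randomness:
            idx = rng.randrange(alphabet_size)   # random jump
        else:
            idx = (idx + 1) % alphabet_size      # deterministic successor
        out.append(alphabet[idx])
    return "".join(out)

# One "set" for a single alphabet size: 11 files, randomness 1.0 down to 0.0,
# i.e. regularity increasing from file to file.
levels = [round(1.0 - i / 10, 1) for i in range(11)]
benchmark_set = {r: generate_sequence(4, r, 1000, seed=i)
                 for i, r in enumerate(levels)}
```

Repeating this for five alphabet sizes would give five such sets; the fixed seeds only make the sketch reproducible.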
Anomaly Generation
• What is a surprise?
  – Something that differs from the expected probability
• Types:
  – Juxtapositional: different arrangements of the data
    • 001001001001001001111
  – Temporal
    • Unexpected periodicities
  – Other types?
Types in this paper
• Foreign symbol
  – AAABABBBABABCBBABABBA
• Foreign n-gram
  – AAABABAABAABAAABBBBA
• Rare n-gram
  – AABBBABBBABBBABBBABBBABBAA
• Injecting anomalies
  – Keep the injected anomalies to no more than 0.24% of the data
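The three anomaly types can be told apart by comparing a test n-gram against the training data, and the injection cap can be enforced by budgeting how many copies are inserted. This is a minimal sketch under assumptions: the rarity cutoff `rare_fraction`, the helper names, and inserting (rather than overwriting) anomalies at random positions are all illustrative choices, not taken from the paper.

```python
import random
from collections import Counter

def ngram_counts(seq, n):
    """Count every length-n substring of seq."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))

def anomaly_type(candidate, training, rare_fraction=0.01):
    """Classify a test n-gram against the training (background) data."""
    alphabet = set(training)
    if any(sym not in alphabet for sym in candidate):
        return "foreign symbol"      # contains a symbol never seen in training
    counts = ngram_counts(training, len(candidate))
    seen = counts[candidate]
    if seen == 0:
        return "foreign n-gram"      # known symbols, unseen combination
    if seen / sum(counts.values()) < rare_fraction:
        return "rare n-gram"         # seen, but below the (assumed) rarity cutoff
    return None                      # ordinary n-gram

def inject(background, anomaly, max_fraction=0.0024, seed=0):
    """Insert copies of `anomaly` at distinct random positions so injected
    symbols make up at most max_fraction of the result. Returns the new
    sequence and the start offset of each injected copy."""
    rng = random.Random(seed)
    m = len(anomaly)
    k = int(max_fraction * len(background)) // m    # copies within budget
    positions = sorted(rng.sample(range(len(background) + 1), k))
    pieces, starts, prev = [], [], 0
    for i, p in enumerate(positions):
        pieces.append(background[prev:p])
        starts.append(p + i * m)     # output offset, shifted by i earlier copies
        pieces.append(anomaly)
        prev = p
    pieces.append(background[prev:])
    return "".join(pieces), starts
```

Because `k * m <= max_fraction * len(background)`, the injected share of the (longer) output can only be smaller than the cap.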
The experiments
The Hypothesis is true
• The hypothesis:
  – The nature of the “normal” background noise affects signal detection
• The anomaly detector
  – Detects anomalous subsequences
  – Learning phase: builds an n-gram probability table
  – Unexpected event → anomaly!
  – The anomaly threshold decides the level of surprise
• Example of anomaly detection (trigram probability table):
  AAA   0.12
  AAB   0.13
  ABA   0.20
  BAA   0.17
  BBB   0.15
  BBA   0.12
  AAC   not in table → ANOMALY!
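The learn-then-flag loop in the example can be sketched in a few lines: learn n-gram frequencies from training data, then flag test n-grams whose surprise reaches a threshold. The surprise score here, `1 - P(gram) / P(most frequent gram)`, is a hypothetical choice that maps the most common n-gram to 0 and an unseen n-gram (like AAC) to 1; the paper's detector may score surprise differently.

```python
from collections import Counter

def train(training, n=3):
    """Learning phase: relative frequency of each n-gram in the background data."""
    counts = Counter(training[i:i + n] for i in range(len(training) - n + 1))
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()}

def surprise(table, gram, p_max):
    """Hypothetical score in [0, 1]: 0 for the most frequent training n-gram,
    1 for an n-gram that never occurred in training."""
    return 1.0 - table.get(gram, 0.0) / p_max

def detect(table, test, threshold, n=3):
    """Start offsets of test n-grams whose surprise reaches the threshold."""
    p_max = max(table.values())
    return [i for i in range(len(test) - n + 1)
            if surprise(table, test[i:i + n], p_max) >= threshold]

table = train("AAB" * 100)
print(detect(table, "AABAAC", threshold=0.99))  # [3]: "AAC" was never seen
```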
Scoring
• Event outcomes
  – Hits
  – Misses
  – False alarms
• Threshold
  – Decides the level of surprise
  – 0 = completely unsurprising, 1 = astonishing
  – Needs to be calibrated
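Counting the three outcomes needs a rule for matching alarms to injected anomalies. The sketch below assumes one such rule: an alarm is a window `[a, a + n)`, an anomaly counts as a hit if any alarm window overlaps it (otherwise a miss), and an alarm overlapping no anomaly is a false alarm. The overlap criterion and the function name `score` are assumptions for illustration.

```python
def score(alarms, anomalies, n=3):
    """alarms: start offsets of flagged n-gram windows [a, a + n).
    anomalies: (start, length) pairs for the injected anomalies.
    Returns (hits, misses, false_alarms)."""
    def overlaps(a, s, m):
        # Window [a, a + n) intersects anomaly span [s, s + m).
        return a < s + m and s < a + n

    hits = sum(1 for s, m in anomalies
               if any(overlaps(a, s, m) for a in alarms))
    misses = len(anomalies) - hits
    false_alarms = sum(1 for a in alarms
                       if not any(overlaps(a, s, m) for s, m in anomalies))
    return hits, misses, false_alarms

print(score([3, 10], [(5, 1), (20, 1)]))  # (1, 1, 1)
```

Here the alarm at 3 covers the anomaly at 5 (a hit), the anomaly at 20 is never flagged (a miss), and the alarm at 10 covers nothing injected (a false alarm).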
Presentation of results
• Presents two aspects:
  – % correct detections
  – % false detections
• The detector operates across a range of sensitivities
  – Higher sensitivity: more hits, but also more false alarms
  – Need to choose the right sensitivity
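Sweeping the threshold and recording (% false detections, % correct detections) at each setting traces one ROC curve per data set. A minimal sketch, assuming each test window already has a surprise score and a ground-truth label saying whether it contains an injected anomaly; the window-level labelling and the name `roc_points` are assumptions.

```python
def roc_points(scores, labels, thresholds):
    """scores[i]: surprise of window i; labels[i]: True if window i is anomalous.
    Returns one (false_alarm_rate, hit_rate) pair per threshold."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for t in thresholds:
        flagged = [s >= t for s in scores]
        tp = sum(f and l for f, l in zip(flagged, labels))       # hits
        fp = sum(f and not l for f, l in zip(flagged, labels))   # false alarms
        points.append((fp / neg if neg else 0.0, tp / pos if pos else 0.0))
    return points

print(roc_points([0.1, 0.5, 0.9, 1.0], [False, False, True, True], [0.0, 0.6, 1.1]))
# [(1.0, 1.0), (0.0, 1.0), (0.0, 0.0)]
```

Lowering the threshold (raising sensitivity) can only add flagged windows, so both rates rise together; this is the trade-off the curves show.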
Interpretation
• None of the ROC curves overlap: regularity affects detection!
• What does this mean?
• Detection metrics are data-dependent
• We cannot say:
  – “My XYZ product will flag 75% of anomalies with a 10% false-alarm rate!”
  – “Sir, are you sure?”
Real world data
• Regularity index of system calls for different users
• Is this surprising?
• What about network traffic?
Conclusions
Data structure → anomaly detection effectiveness
Evaluation is data-dependent
Change in regularity → use a different system
or
→ change the parameters
Quirks?
• Assumes rather naïve detection systems
  – “Simple retraining will not suffice”
• An intelligent detector can take this into account.
• What really is an anomaly?
  – If the data is highly irregular, won’t randomness by itself produce some anomalies?
• Anomaly is a relative term
  – Here, anomalies are generated independently