[IEEE 2008 IEEE International Performance Computing and Communications Conference (IPCCC) - Austin,...
Noise-Resistant Payload Anomaly Detection for
Network Intrusion Detection Systems
Sun-il Kim
Department of Computer Science
Information Technology and Systems Center
University of Alabama in Huntsville
Email: [email protected]
Nnamdi Nwanze
Department of Electrical and Computer Engineering
State University of New York at Binghamton
Email: [email protected]
Abstract—Anomaly-based intrusion detection systems are an essential part of a global security solution and effectively complement signature-based detection schemes. Their strength in detecting previously unknown, never-before-seen attacks makes them attractive, but they are more prone to false positives. In this paper, we present a simple payload-based intrusion detection scheme that is resilient to contaminated traffic that may unintentionally be used during training. Our results show that, by adjusting the two tuning parameters used in our approach, the ability to detect attacks while maintaining low false positives is not hindered, even when 10% of the training traffic consists of attacks. Test results also show that our approach is not sensitive to changes in the parameters, and a wide range of values can be used to yield high per-packet detection rates (over 99.5%) while keeping false positives low (below 0.3%).
I. INTRODUCTION
In the cat-and-mouse game of computer network security,
purveyors of malicious content always seek to be one step
ahead of security providers. As a result of the great strides
made in the field of computer security, there has been an
increase in the sophistication of exploits as well as in the
number of computer based attacks [1]. The prevalent use of
computers that permeates almost every aspect of daily life
necessitates that computer systems be protected to ensure
safe and uninterrupted computer usage.
Current security offerings include virus checkers, firewalls
and intrusion detection systems. While virus checkers are a
useful tool in abating infections, they tend to be designed
to detect exploits that have already entered a host system.
Traditional firewalls have the capability to prevent unwanted
network traffic from reaching a host by blocking access to
open ports. However, malicious users still have access to open
ports. Intrusion detection systems are a solution that have
shown promise in being able to provide additional protection
from a bevy of attacks.
The operation of Intrusion Detection Systems (IDS) can
be placed under two main classifications: signature-based and
anomaly-based systems. Signature-based or misuse systems,
such as Snort [2] and EMERALD [3], operate by patterning
This work was made possible with the support from the Information Trust Institute of the University of Illinois at Urbana-Champaign and the Hewlett-Packard Company through its Adaptive Enterprise Grid Program. The content of the information does not necessarily reflect the position or the policy of these organizations.
misuses of the system and alerting any activity that matches
attack signatures included in a database. Anomaly-based IDS
operate by patterning the normal and alerting activity that
deviates from the normal model. Both modes of operation have
their advantages and shortcomings. Signature-based systems,
in general, tend to be simpler to operate, and have higher de-
tection accuracy. However, they are prone to miss attacks that
do not appear within their database of signatures. In addition,
they can be circumvented through simple variations made on
attacks that may be included within their signatures database.
On the other hand, anomaly-based systems have the ability
to detect new attacks and variations of known attacks since
they pattern normal operation rather than the attack pattern.
However, the implementation and operational complexities of
anomaly-based systems often detract from the feasibility of
such systems and simplifying system complexities can often
result in reduced system accuracy. Anomaly-based systems
are also more prone to false positives than signature-based
implementations.
In order to take advantage of the benefits of anomaly-
based detection, a number of research efforts ([4], [5], [6], [7],
[8], [9], [10]) have proposed various approaches to intrusion
detection. While a majority of these approaches rely on
connection information (such as source and destination IP
addresses, source and destination ports, TCP flags, etc.) or
flow statistics only a few have considered using full packet
payload bytes as a feature for intrusion detection.
Although IDS that contend with packet payloads are plagued by packet-size and high-dimensionality issues, there are advantages to using packet payloads as features for intrusion detection.
The work described in this paper presents a novel approach
to anomaly-based network intrusion detection. The payload-
based approach presented is stateless and uses simple statisti-
cal spread analysis (only needed during training) to differen-
tiate normal network traffic from anomalous and potentially
intrusive traffic. Since the approach is stateless, it is resistant
to evasion techniques that attempt to gain access (in fail-open
implementations) or elicit a Denial of Service (in fail-close
implementations) by overwhelming the IDS with an abun-
dance of network traffic. In addition, because the approach is
stateless, detection decisions are made on a per packet basis
with the goal of reaching a swift and accurate decision as the
packet traverses through the IDS. The approach is designed
to work on a per service basis (i.e. http, ftp, smtp, etc) and
therefore includes tunable parameters that allow it to adapt
to different networks and traffic types. In experiments with
collected network data and the 1999 DARPA data sets we show
that the approach is able to achieve 100% attack detection with
low false positive rates. The system is also designed to operate
separately on inbound and outbound traffic. Advantages of
this configuration include 1) Faster operation - Working on
separate traffic flows puts less of a burden on detection systems
and 2) More accurate detection models - nuances, subtle or not,
between inbound and outbound traffic flows can be captured
during training and used to more accurately detect and separate
insider and outsider attacks.
We also demonstrate the approach’s resistance to "noisy"
training data by using a poisoned training dataset to train
the system and then detecting attacks. Although the approach can
accommodate the use of full byte packet histograms with
minimal processing, we show that comparable results can be
achieved by using partial packet histograms that capture data
pertinent to describing features of network packet payloads.
We envision this solution working in tandem with a signature-
based system, as part of a complete security solution as we
believe that a layered approach to security is the best approach.
The paper is organized as follows: We first discuss related
works in intrusion detection in Section II. Section III provides
a detailed description of the approach covering the training
and detection process. In Section IV, applicable dimensionality
reduction techniques are discussed. Further discussion of the
approach including evaluation results and testing with contam-
inated data is covered in Section V. Section VI concludes this
paper with a summary and discussion of future work.
II. BACKGROUND
As mentioned earlier, there are IDS research works that
use packet header information and flow statistics as features
in detecting attacks. The authors of [11] use source and
destination ports and IP addresses, protocol type and packet
length to form a 12 point description vector to describe traffic.
The work presented in [12] uses a 125-coordinate system made up of protocol type, flag and service attributes to describe
connections. The approach described in [13], NATE, solely
uses packet header information as features in building its
detection model. The work discussed in [6], NETAD, is a
packet-level approach that uses the first 48 bytes of every
network packet as a feature vector, including at most 8 bytes of
the packet’s payload. Due to the lack of payload information used as features, none of these approaches sufficiently characterizes the payload.
The authors of [5], on the other hand, have developed an
approach that uses byte frequency distributions of packet pay-
loads. The distribution is arranged in order of frequency and
grouped into six coarsely defined ranges. The work described
in [14], PAYL, uses the full byte frequency distributions (256-
bin histograms) over different connection window sizes and
uses a simplified Mahalanobis distance measure to separate normal and intrusive traffic. The authors of [15] also incorporate the use of
packet payloads in their approach, Anagram. The approach
bases its detection on high order (n>1) n-gram analysis, taking
advantage of anomalous n-grams that are inherent to common
attacks and advanced attacks that use mimicry to alter their
byte frequency distribution in an effort to appear normal. The
work described in [10] also makes use of packet payload
bytes and incorporates binning and bit-pattern hash functions
to create models of normal packet payloads. Compared to the
approaches described in [10] and [15], the approach described
in this paper is more resistant to the effects of noisy training
data because the overall performance depends on statistical
average and standard deviation rather than potential occur-
rences of anomalous packets. In addition, by being able to
analyze full and partial byte histograms, the approach is able to
reach a middle ground between the approaches described in [5]
and [14], achieving a balance between the need for providing
generality in describing packet payloads and reducing the size
of the feature space.
III. ANOMALY DETECTION USING SIMPLE STATISTICAL
SPREAD
In this section, we describe our approach to detecting
anomalous packets based on statistical spread of the frequency
of byte values. We first start with a few definitions (Our
approach along with the definitions are summarized in Algo-
rithm 1). In a packet, there are 256 byte (or character) values.
We use the term bin to refer to each byte value. The 256
bins are simply ordered according to their byte value. Not all
256 bins are necessarily needed for our detection approach.
Therefore, we define a set B, which is the set of bins that are actually used. b_i^k is the number of characters that fall into the ith bin in packet k. u_i is the overall average count (expected value) for the ith bin, obtained from the training data consisting of normal/sanitized traffic. Likewise, obtained from the same data set, σ_i is the standard deviation for the ith bin.
B - set of bins used for training and detection
b_i^k - frequency count for ith bin in packet k
u_i - average count for ith bin (obtained from training)
σ_i - standard deviation for ith bin (obtained from training)
Ψu - average score (obtained from training)
Ψσ - standard deviation of the scores (obtained from training)
ω - global tuning parameter
τ = Ψu + (ωΨσ)
α - per-bin tuning parameter
min_i = u_i − (ασ_i)
max_i = u_i + (ασ_i)
score_k = |S|, S ⊆ B, i ∈ S, b_i^k > max_i or b_i^k < min_i

For each packet k:
1: score_k ← 0
2: for all i ∈ B do
3:   if (b_i^k > max_i) or (b_i^k < min_i) then
4:     score_k ← score_k + 1
5: if (score_k > τ) then
6:   k is an anomalous packet

Algorithm 1: Simple detection algorithm.
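As a concrete illustration of the detection stage, the loop in Algorithm 1 can be sketched in Python (the helper names and toy values are ours, not from the paper; the per-bin bounds min_i, max_i and the threshold τ would come from training):

```python
def packet_histogram(payload, bins):
    """Count how many bytes of the payload fall into each used bin (byte value)."""
    counts = {i: 0 for i in bins}
    for byte in payload:
        if byte in counts:
            counts[byte] += 1
    return counts

def score_packet(payload, bins, min_i, max_i):
    """score_k: number of bins whose count falls outside [min_i, max_i]."""
    counts = packet_histogram(payload, bins)
    return sum(1 for i in bins if counts[i] > max_i[i] or counts[i] < min_i[i])

def is_anomalous(payload, bins, min_i, max_i, tau):
    """Flag packet k as anomalous when score_k exceeds the global threshold tau."""
    return score_packet(payload, bins, min_i, max_i) > tau
```

Since the per-bin comparison is a fixed pair of threshold checks, the detection stage is stateless and constant-time per byte, which is what makes a per-packet decision feasible at line rate.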
Fig. 1: Tolerance range for each bin with α=1 (u ± σ). [Two plots of count vs. bin ID, showing one standard deviation around the per-bin average.]
A. Detection Approach
To determine the characteristics of anomalous packets, we
define a simple metric based on how much each bin in a
packet, b_i^k, deviates from the norm, u_i. For each packet k, we
compute a score (measure the degree of anomaly), scorek,
which counts the number of bins that fall outside the selected
range (to be described shortly).
Our approach is split into two stages—training and de-
tection. During training, we compute the average score, Ψu,
for all packets in the training data set. We also compute the
standard deviation of the scores, Ψσ . A global threshold, τ ,
is then computed based on the Ψu and the Ψσ , as well as
a global tuning parameter, ω. ω is chosen by the system
manager depending on the characteristics of the system’s
traffic (described in a later section), and is the multiplier
which is used to adjust the threshold. In other words, it
determines how many standard deviations (not necessarily an integer value) away from the average is considered “safe”.
Therefore, τ = Ψu + (ωΨσ).
Once the τ is obtained, it can be used in the detection stage
as follows. scorek is computed by counting the number of
bins that fall outside the tolerance range defined for each bin.
min_i is the lower bound of the tolerance range and max_i is the upper bound for the ith bin. Both min_i and max_i are
set by using a tuning parameter, α. α determines how many
Fig. 2: False positive rates for various α and ω values (WATSON). [False positive (%) vs. ω, 256 bins; α ∈ {0.1, 0.2, 0.3, 0.4, 0.5, 1.0, 1.5, 2.0}.]
Fig. 3: Per-packet detection vs. false positives using all 256 bins (|B|=256; α ∈ {0.1, 0.2, 0.3, 0.4, 0.5, 1.0, 1.5, 2.0}). Note that all attacks are detected (WATSON).
standard deviations above (for max) or below (for min) the
average the system allows for traffic defined to be normal.
Therefore, min_i = u_i − (ασ_i) and max_i = u_i + (ασ_i). score_k is then the number of bins that fall outside this range. That is, score_k = |S|, S ⊆ B, i ∈ S, b_i^k > max_i or b_i^k < min_i. If score_k exceeds τ, the packet is considered anomalous.
We experimented with weighted scores (each packet’s score determined not only by how many bins violate the normal range, but also by how much). This performed worse than making a binary decision for each bin (abnormal vs. normal) and summing the number of violations to obtain the score, as described above.
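The training stage described above can be sketched as follows (a minimal illustration with names of our choosing; a real deployment would train over the full sanitized packet trace for a given service):

```python
from statistics import mean, pstdev

def train(histograms, bins, alpha, omega):
    """Derive per-bin bounds and the global threshold from sanitized traffic.

    histograms: list of per-packet bin-count dicts from the training set.
    """
    # Per-bin statistics: u_i and sigma_i over all training packets.
    u = {i: mean(h[i] for h in histograms) for i in bins}
    sigma = {i: pstdev([h[i] for h in histograms]) for i in bins}

    # Per-bin tolerance range: u_i - alpha*sigma_i .. u_i + alpha*sigma_i.
    min_i = {i: u[i] - alpha * sigma[i] for i in bins}
    max_i = {i: u[i] + alpha * sigma[i] for i in bins}

    # Score every training packet, then set tau = Psi_u + omega * Psi_sigma.
    scores = [sum(1 for i in bins if h[i] > max_i[i] or h[i] < min_i[i])
              for h in histograms]
    psi_u, psi_sigma = mean(scores), pstdev(scores)
    tau = psi_u + omega * psi_sigma
    return min_i, max_i, tau
```

Note that all of the statistical work (means, standard deviations, score distribution) happens here; the detection stage only performs comparisons against the precomputed bounds.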
B. Test Data Sets and Attacks
We utilize both the DARPA data set and traffic collected at
the State University of New York (termed WATSON from here
on). The DARPA data set consists of attack-free and attack-
laden data available for use. Being that traffic collected in
the wild is likely to contain known and unknown attacks, the
WATSON traffic was sanitized using methods consistent with
other related works where Snort, a widely used signature based
detection tool, was used to remove any known attacks from
the data. Given the sanitized data sets, we computed the u and
σ for all bins. Figure 1 shows the tolerance range (min_i < x < max_i) for each bin where α is 1 (arbitrarily chosen
for illustration purposes only). About 65,000 packets are used
for training and 15,000 packets are used for testing from the
DARPA set. The WATSON data set was created by collecting
approximately 2 weeks of traffic. 72 hours of traffic is used
for training and about 24,000 packets are used for testing. We
use 19 attacks included in DARPA as well as 4 attacks that
were not known at the time the data set was created (webDAV,
Nimda, DoS and CodeRed).
C. Training and Parameter Selection
Using simple methods of system tuning consistent with
other works, we empirically select the operating points (using
ROC curves [14]) as well as the two tunable parameters—the
global tolerance tuning parameter, ω and the per-bin tolerance
tuning parameter, α. In order for a system to be effective,
system parameters such as the per-bin tolerance must not be
too sensitive. We tested a wide range of values and the results
show that our detection algorithm is capable of accepting a
wide range of values for these parameters.
First, a value for α is chosen in order to compute the score
for each packet in the sanitized data as well as the attack
data. We then compute the Ψu and the Ψσ using the training
data set. Using a wide range of values for ω, we test varying
values of the final threshold, τ , against the normal traffic in the
sanitized, test data set (to obtain the false positive rate). Then,
an acceptable false positive rate, for example, 1.5%, is chosen.
Parameter values that force the system over this number are
then discarded. Figure 2 illustrates the false positive rates for
various values of α and ω. We test remaining values against
the known attack traffic (to obtain the per-packet detection
rate). A ROC curve is then used to compare the per-packet
detection rate to the false positive rate. Figure 3 shows many
valid operating points, with the group clustered around the top
left corner representing the best combination of true positives
(TP) and false positives (FP).
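The parameter-selection procedure just described amounts to a small grid search. A sketch (with a hypothetical evaluate() callback, of our own design, that returns the (TP, FP) pair for a candidate (α, ω), and the 1.5% budget from the example above):

```python
def select_operating_point(alphas, omegas, evaluate, fp_budget=0.015):
    """Keep (alpha, omega) pairs under the false-positive budget,
    then pick the one with the highest per-packet detection rate."""
    candidates = []
    for a in alphas:
        for w in omegas:
            tp, fp = evaluate(a, w)
            if fp <= fp_budget:            # discard points over the FP budget
                candidates.append((tp, a, w))
    if not candidates:
        return None
    tp, a, w = max(candidates)             # best TP among surviving points
    return a, w, tp
```

In practice the surviving points would be plotted as an ROC curve and inspected, as in Figure 3, rather than selected fully automatically.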
IV. REDUCING THE NUMBER OF BINS
Depending on the typical traffic pattern for various services
and systems, some bins may not be as useful as others. In
some cases, some bins may even be ignored without affecting
the performance of the detection scheme. In this section, we
introduce a technique based on applying Principal Component Analysis (PCA) to determine the significance of each bin.
The results are then used in our evaluation to compare the
effectiveness of using a limited number of bins versus all 256 bins.
Performing Principal Component Analysis on a set of
data takes a few simple steps. Consider some data formed
into a matrix, X. The matrix consists of n observation (or
measurement) vectors x1,x2, ...,xn where each vector has m
dimensions. The first step in the process involves getting a zero
mean or "centered" version of the data. This entails calculating
the mean, u, across all dimensions of the data where
u = (1/N) ∑_{n=1}^{N} x_n    (1)
Upon achieving a zero-mean version of the data, Xzm, the
next step in the PCA process is to calculate the covariance ma-
trix, C, of the resulting centered data matrix. The expression
for C is
C ≡ (1/(n − 1)) X_zm X_zm^T    (2)
The next stage in the process entails computing eigenvectors
ei, and corresponding eigenvalues λi, of the covariance matrix,
for i = 1, 2, ...,m. The eigenvalues in a diagonal matrix D,
sorted by descending value and the corresponding eigenvectors
in a matrix, V, provide the principal components ranked in
order of contribution. The expression for the diagonal matrix,
D, is given below.
D_{j,k} = { λ_j   if j = k and λ_j > λ_{j+1}
          { 0     if j ≠ k    (3)
Extracting pertinent features from network packets using
PCA entails generating 256-byte packet histograms from a
learning dataset and performing PCA on the dataset. The
histograms from the dataset make up 256-dimensional vectors
that are used to form an n × m matrix, where n is equal to
the number of packets used for learning and m = 256. Upon performing PCA, the principal components that best represent
the data can be used for feature extraction. There exists in
the literature ([16], [17],[18]) works that detail graphical and
mathematical methods for selecting the principal components
that should be retained for analysis. One of the most prevalent
methods mentioned in the literature is the scree test. The scree
test is a graphical method where sorted eigenvalues are plotted.
Principal components associated with points to the left of where the drop-off becomes gradual are retained, and the rest are discarded as they contribute less to the representation of
the data. The number of principal components that are retained
can be represented by the following equations
N_PC = δ(1)    (4)

δ(n) = { 1 + δ(n + 1)   if n ≤ (rank(D) − 2) and C_n > C_{n+1} + C_{n+2}
       { 0              otherwise    (5)
The number of principal components that are retained is
represented by N_PC. The rank function returns the number of linearly independent rows and columns of D, and C_n is the
variance contribution (in percent) of the nth eigenvalue along
the diagonal of D.
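As an illustrative sketch of the PCA step and the retention rule of Equations (4) and (5) (using NumPy, which the original work does not prescribe; function and variable names are ours):

```python
import numpy as np

def pca_eigenvalues(X):
    """PCA on an n x m matrix of packet histograms (rows = packets)."""
    Xzm = X - X.mean(axis=0)                  # zero-mean ("centered") data
    C = (Xzm.T @ Xzm) / (X.shape[0] - 1)      # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)      # symmetric eigendecomposition
    order = np.argsort(eigvals)[::-1]         # sort by descending eigenvalue
    return eigvals[order], eigvecs[:, order]

def n_retained(eigvals):
    """Scree-style rule of Eqs. (4)-(5): keep component n while its percent
    variance contribution exceeds the sum of the next two."""
    contrib = 100.0 * eigvals / eigvals.sum()
    n = 0
    while n <= len(contrib) - 3 and contrib[n] > contrib[n + 1] + contrib[n + 2]:
        n += 1
    return n
```

The bins to keep are then those with the largest-magnitude loadings in the retained eigenvectors, as described next.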
The final step in the feature extraction process deals with
examining the coefficients of the retained eigenvectors or the
principal component loadings. Since the absolute values of
component loadings are suggestive of their contribution to
their respective bins, the features that better describe the nature
α ω TP FP
0.1 2.0∼2.2 0.9956 0.0119
0.2 2.2∼2.3 0.9972 0.0100
0.3 2.8∼2.9 0.9968 0.0029
0.4 3.0∼3.1 0.9851 0.0013
0.5 3.3 0.9847 0.0018
0.6 3.1 0.9956 0.0045
0.7 3.5 0.9827 0.0038
0.8 3.3 0.9919 0.0065
0.9 3.4 0.9904 0.0064
1.0 3.5 0.9883 0.0066
1.1 3.4 0.9924 0.0075
1.2 3.7 0.9928 0.0078
1.3 3.8 0.9932 0.0076
1.4 4.0 0.9924 0.0077
1.5 3.7 0.9932 0.0097
1.6 3.8 0.9924 0.0086
1.7 4.0 0.9908 0.0082
1.8 4.0∼4.1 0.9912 0.0105
1.9 4.4 0.9904 0.0082
2.0 4.2 0.9919 0.0089
Fig. 4: Selected operating points for various α’s.
Fig. 5: Performance of using all 256 bins with varying α and ω parameters. [False positive rate (%) and per-packet detection rate (%) vs. ω; α ∈ {0.3, 0.4, 0.5, 0.6, 0.7}.] 100% of attacks are detected at all data points shown.
of the traffic data are the bins associated with higher-valued (negative or positive) component loadings.
V. EVALUATION AND DISCUSSION
We tested the detection approach against the 4 newer attacks
(webDAV, Nimda, DoS and CodeRed) and 10 out of the 19
attacks from DARPA, which were assumed to be unknown
during training time and when the parameters are selected.
The other 9 attacks (randomly chosen) were used during
training/setup. All attacks were detected with high per-packet
detection rates and low false positives. We next present the
results followed by a discussion of the effect of having noisy
training data.
A. Results
We first show a wide range of operating points for WAT-
SON. In Figure 4, α values and ω values are shown with
corresponding per-packet detection rates (TP) and false posi-
tive rates (FP). Note that a wide range of values can be used
label attack note
a yaga1 last packet
b apache2 239th packet
c back last packet; *normal payload
d webDAV last packet
e DoS 1st packet; *this is a normal packet
f Nimda 7th of 17 packets
g Nimda 16th of 17 packets
h Nimda last packet

Fig. 6: Score vs. packet id for normal traffic (with threshold shown) and attack traffic. Three attack packets with scores over 0.68 and an attack packet with a score of over 0.48 are not shown in the graph (α=0.3 & ω=2.8).
to achieve similar levels of performance. It is also important
to note that even though only a single ω value is shown for
each α, a wide range of values of ω can also be used. Figure 5
illustrates this result using a few α values.
Figure 6 shows what a typical score plot looks like for all
packets in the test data set as well as the attack packets (in this
figure we show all 23 attacks for illustration purposes). For the
normal traffic, packets that map to scores above the threshold
are the false positives. For the attack traffic, packets with
scores that fall below the threshold are missed by the detection
scheme. The table below the figure describes exactly which
packets (in which attacks) slipped by the detection scheme.
Note that some of the packets are actually normal packets
that happened to be a part of a sequence/set of packets that
make up an attack.
Figure 7 and Figure 8 show ROC curves for DARPA using
all bins (|B|=256) and 35 bins selected using the PCA method.
The results show that feature space reduction is effective, utilizing less than 14% of the total number of bins, but some
questions about the validity of the DARPA data set have been
raised due to the way it was generated. Although the DARPA
data set shows a small amount of artificial characteristics (for
example, no packets use higher end byte values), the results
when compared to using real, collected traffic (WATSON)
show that it is still very useful in performing such tests.
Figure 9 and Figure 10 show ROC curves for WATSON with
|B|=256 and |B|=100. The results are similar to those for DARPA: again, a large number of bins was eliminated without having a significant impact on overall performance.
Fig. 7: DARPA: TP vs. FP using all bins. [Per-packet detection (%) vs. false positive (%); |B|=256; α ∈ {0.1, 0.2, 0.3, 0.4, 0.5, 1.0, 1.5, 2.0}.]
Fig. 8: DARPA: TP vs. FP using top 35 bins. [Per-packet detection (%) vs. false positive (%); |B|=35; α ∈ {0.1, 0.2, 0.3, 0.4, 0.5, 1.0, 1.5, 2.0}.]
B. Robustness to Contaminated Training Data
Unlike anomaly detection schemes that rely on catching
new instances of anomalous packets, the detection approach
presented in this work is robust to contaminated training data.
We took the sanitized training data and added attack traffic that falls outside the norm.
Figure 11 shows the Ψu when the training data is clean as well as when it contains varying percentages of anomalous
traffic. Figure 12 shows the corresponding Ψσ . Note that the
mean score is extremely robust even with 10% of the traffic
being poisoned with potentially unknown attacks. As expected,
the standard deviation migrates from the norm incrementally
as more anomalous packets are introduced. This however does
not have a significant impact on the overall performance of
the detection scheme, especially as we consider the important
problem of detecting new, previously never seen anomalies.
Figure 13 shows results from running the test with the
same normal test traffic and attack traffic used in the previous
section. In this experiment, however, the training data was
poisoned as described above. The same parameter selection
method is used to generate the operating points, and the
results show that the detection scheme indeed works well
even when a significant portion of the training data was
Fig. 9: WATSON: TP vs. FP using all bins. [Per-packet detection (%) vs. false positive (%); |B|=256; α ∈ {0.1, 0.2, 0.3, 0.4, 0.5, 1.0, 1.5, 2.0}.]
Fig. 10: WATSON: TP vs. FP using top 100 bins. [Per-packet detection (%) vs. false positive (%); |B|=100; α ∈ {0.1, 0.2, 0.3, 0.4, 0.5, 1.0, 1.5, 2.0}.]
contaminated. The key reason for its robustness lies in the
fact that characterization of what is deemed normal is done
at a larger scale using the variation in the per-bin statistics.
In contrast to detection methods that rely on never having seen a particular pattern in the normal traffic (during training), our approach is able to easily overcome the possibility of having a contaminated data set for training.
VI. CONCLUSION AND FUTURE WORK
A wide range of security measures must be utilized in order
to provide a system with the highest level of protection. With
respect to intrusion detection, again, a wide array of techniques
can be used together to make the system more secure. We pro-
posed an effective anomaly-based network intrusion detection
scheme that is resilient to contamination of the training data.
Test results, using both the DARPA data set as well as real
traffic that was collected, showed that our approach allows
the system to detect packet payload anomalies with a low
false positive rate (for example, significantly lower than 1%).
We also showed a feature space reduction technique using
principal component analysis, where the number of byte values
can be reduced significantly without affecting performance.
Finally, we showed that the detection scheme’s performance
Fig. 11: Effect of contaminated training data on Ψu. [Ψu vs. α for clean and 1%–10% poisoned training traffic.]
Fig. 12: Effect of contaminated training data on Ψσ. [Ψσ vs. α for clean and 1%–10% poisoned training traffic.]
Fig. 13: ROC curves generated using the system trained with contaminated data (per-packet detection rate vs. false positive rate, α=0.3). Ranges of ω where 100% of attacks are detected with less than 1% false positives:

% poisoned:  clean    1%       2%       3%       4%       5%       10%
ω:           2.6∼2.9  2.3∼2.6  2.4∼2.6  2.5∼2.8  2.3∼2.7  2.3∼2.6  2.0∼2.1
does not degrade even when we poisoned the training data set
on purpose.
Given the simple nature of our runtime procedure (all com-
plex operations are done only during training), we believe that
cost-effective hardware implementation is feasible. Currently,
we are in the process of performing real-time tests by setting
up a high-performance server to gauge how well such intru-
sion detection systems can be deployed without affecting the
normal traffic flow. We are also experimenting with advanced
attacks (for example, attacks that may try to blend in with
the normal traffic). As discussed previously, we believe that a
multi-layered detection measure is needed, and the burden of a do-it-all approach should be avoided. However, we have made
recent progress in utilizing an added layer of characterization
by examining correlation in bin-based statistics. A detailed
discussion is outside the scope of this paper and will be
presented in the near future.
REFERENCES
[1] R. Richardson, 2007 CSI Computer Crime and Security Survey, Computer Security Institute.
[2] M. Roesch, “Snort - lightweight intrusion detection for networks,” http://www.snort.org/docs/lisapaper.txt.
[3] P. G. Neumann and P. A. Porras, “Experiences with EMERALD to date,” in RAID 1999.
[4] E. Eskin, A. Arnold, M. Prerau, L. Portnoy, and S. Stolfo, “A geometric framework for unsupervised anomaly detection: Detecting intrusions in unlabeled data,” in Data Mining for Security Applications, Kluwer, 2002.
[5] C. Kruegel, T. Toth, and E. Kirda, “Service specific anomaly detection for network intrusion detection,” in Applied Computing (SAC), ACM Digital Library, 2002.
[6] M. Mahoney, “Network traffic anomaly detection based on packet bytes,” in 18th ACM Symp. Applied Computing, 2003.
[7] A. Gupta and R. Sekar, “An approach for detecting self-propagating email using anomaly detection,” in International Symp. on Recent Advances in Intrusion Detection, 2003.
[8] D. Summerville, N. Nwanze, and V. Skormin, “Anomalous packet identification for network intrusion detection,” in 5th IEEE Systems, Man and Cybernetics Information Assurance Workshop, 2004.
[9] I.-V. Onut and A. Ghorbani, “SVision: A novel visual network-anomaly identification technique,” Computers & Security, vol. 26, issue 3, pp. 201-212, May 2007.
[10] N. Nwanze and D. Summerville, “Detection of anomalous network packets using lightweight stateless payload inspection,” in 4th IEEE LCN Workshop on Network Security (WNS), 2008.
[11] K. Labib and V. R. Vemuri, “An application of principal component analysis to the detection and visualization of computer network attacks,” Annals of Telecommunications, Nov./Dec. 2005.
[12] Y. Bouzida, F. Cuppens, N. Cuppens-Boulahia, and S. Gombault, “Efficient intrusion detection using principal component analysis,” www.rennes.enst-bretagne.fr/ fcuppens/articles/sar04.pdf.
[13] C. Taylor and J. Alves-Foss, “NATE - network analysis of anomalous traffic events, a low cost approach,” in NSPW 2001.
[14] K. Wang and S. J. Stolfo, “Anomalous payload-based network intrusion detection,” Columbia University Technical Report, Feb. 2, 2004, http://www1.cs.columbia.edu/ids/publications/Payl-AD.02.01.04-final.PDF.
[15] K. Wang, J. J. Parekh, and S. J. Stolfo, “Anagram: A content anomaly detector resistant to mimicry attack,” in RAID 2006.
[16] G. Raîche, M. Riopel, and J. G. Blais, “Non-graphical solutions for Cattell’s scree test,” 2006.
[17] L. Hansen, “Generalizable patterns in neuroimaging: How many principal components?” 1998.
[18] I. T. Jolliffe, Principal Component Analysis. New York: Springer, 1986.