A Practical Anomaly-based Intrusion Detection by Outlier Mining in ...
Anomaly-based Intrusion Detection from Traffic Datamining on
Transcript of Anomaly-based Intrusion Detection from Traffic Datamining on
Qin and Hwang, USC, Dec.17, 2003 Page 1 of 35
Anomaly-based Intrusion Detection from Traffic Datamining on Internet Connections*
Min Qin and Kai Hwang Internet and Grid Computing Laboratory, EEB 231
University of Southern California, Los Angeles, CA 90089-2562
Abstract: In this paper, we present a new datamining approach to generating frequent episode
rules for the construction of anomaly-based, intrusion detection systems (IDS). These rules are
derived from normal network traffic profiles. An anomaly is detected when the rule deviates
significantly from the normal patterns. Three rule pruning techniques are devised to reduce the
rule search space by 50-80%. This reduction makes datamining viable in detecting unknown
network attacks. The new approach accelerates the entire process of machine learning and profile
matching for intrusion detection. Testing our new scheme over DARPA 1999 IDS evaluation
data sets, we find a 13% reduction in false alarms over 50 network attack incidents.
The network episode rules reveal inter-relationship among sequences of network
connection events. We detect unknown attacks embedded in telnet, http, ftp, smtp, and other
requests of TCP, UDP or ICMP connections. Our IDS leads to an intrusion detection rate up to
47% for DoS (denial of service), R2L (remote-to-local), and probe attacks. Our scheme detects
many attacks that cannot be detected by Snort, including the smurf, Apache2, Guesstelnet, Dict,
Neptune, and Udpstorm. We recommend the use of the proposed anomaly detection scheme
jointly with signature-based IDS to yield even better results. These results prove the viability of
using the new scheme to build automated intrusion detection and response systems in real time.
Index Terms: Network security, intrusion detection, traffic datamining, anomaly detection, association rules, frequent episode rules, false alarm rate, Snort evaluation, and distributed Grid computing.
_________________________________________________
• Manuscript submitted to IEEE Transactions on Dependable and Secure Computing, Dec. 19, 2003. This paper was extended significantly from a preliminary version, �Effectively Generating Frequent Episode Rules for Anomaly-based Intrusion Detection�, submitted to the 2004 IEEE Symposium on Security and Privacy for consideration of presentation. The research support of this work from NSF/ITR Grant ACI-0325409 is acknowledged. All rights are reserved by the coauthors. Min Qin can reached via [email protected] and Kai Hwang by [email protected]
Qin and Hwang, USC, Dec.17, 2003 Page 2 of 35
1. Introduction
Cyber crimes have become major threats to Internet computing, web and Grid services.
Network security cannot be assured if unwanted intrusions are not stopped or removed in a
timely manner. In August 2003, the outbreak of the MS Blast worm has caused millions of
machines to become defenseless with interrupted Internet services [34]. An effective intrusion
detection system (IDS) should be able to detect such attacks at the early stage. The purpose is to
raise the alarms timely to prevent major damages on network or client resources [37].
Extensive research has been reported on the design and evaluation of IDS in the past. The
NSS Group [29] in UK has evaluated various commercial IDSs from security companies.
Gaffney et al [12] have proposed a decision theoretic approach to evaluate IDS. Method for
reducing false alarm rate of IDS was introduced by Axelsson [3], who identified base-rate fallacy
and implementation barriers. Integrating access control and intrusion detection was introduced
by Ryutov et al [36]. Other recent studies on IDS can be found in Burroughs et al [8],
Gopalakrishna [13], Ranum [33], and Sekar et al [38].
According to the detection methods used, the IDSs are generally classified into two major
categories: signature-based versus anomaly-based. The signature-based IDS applies a misuse-
detection model, by which the attacks are checked against saved signatures (characteristics) from
known attacks previously detected. The Snort [35] and the STAT [18] are good examples of this
kind of IDS. Just like most anti-virus packages, the misuse model is based on pattern matching,
which is only good in detecting known attacks with signatures collected.
Anomaly-based IDSs are based on a normal-use detection model. Good examples are the
IDES [22] and EMERALD [31]. The normal-use model checks the attack patterns against
normal network behavior. The incoming traffic is compared with normal characteristics to reveal
any significant deviations. The advantage of using anomaly detection lies in its ability to cope
with unknown attack patterns. The major drawback of anomaly detection lies in higher false
alarms than using signature matching [2]. Most existing anomaly-based IDSs concentrate on
Qin and Hwang, USC, Dec.17, 2003 Page 3 of 35
detecting traffic anomalies. Other techniques, such as detecting anomalies of packet header, were
reported by Mahoney [24].
Datamining [11] is a commonly explored technique to build anomaly-based IDS [28]. In
order to distinguish between intrusive and normal behavior, algorithms are needed to generate
the association rules [1] or frequent episode rules (FER) [25] from audit traffic data. The
concept of generating FERs using minimal occurrences started with the work of Mannila and
Toivonen [26]. Associations are often used to capture the intra-record patterns, while FERs are
used to detect inter-record patterns. With a huge amount of audit records, datamining often
generates many long FERs with a high degree of redundancy or repetitions. In this paper, we aim
to remove those unwanted redundancy or ineffective episode rules.
Statistically generated sequential rules for detecting anomalies were introduced in [39].
Instead of using datamining, a time-based inductive learning machine was adapted to the changes
in normal user behavior. Anomaly is detected when a sequence of events deviates significantly
from the normal sequential rules. Hofmeyr, et al. [15] uses a similar approach by analyzing a
sequence of system calls to detect intrusions. In [19], Lane et al. transformed discrete temporal
sequence into a metric space and use a clustering technique to reduce the size of the user model.
With datamining, there are several approaches to effective IDS construction [6, 20]. In an
earlier work by Lee et al [20], they use axis and reference attributes to constrain the number of
rules generated. Their method can reduce the number of rules to some extent. The JAM (Java
Agent for Meta-learning) project [20] uses datamining to generate rules to provide temporal
features. JAM uses RIPPER [9] in building classifiers that can detect signature of attacks. This is
essentially a misuse IDS. Fan et al [10] extended Lee�s work by introducing artificial anomalies
to discover accurate boundaries between known classes and anomalies.
Bridge et al [7] applies fuzzy frequent episode and fuzzy association rules to the problem
of intrusion detection. The ADAM project [4, 5] offered a datamining framework for detecting
network intrusions. Unlike JAM, ADAM is an anomaly based detection system. ADAM uses a
Qin and Hwang, USC, Dec.17, 2003 Page 4 of 35
sliding window to scan frequent associations in TCP connection data. These associations are
compared with normal profiles that have been constructed. ADAM has the ability to detect novel
attacks through a pseudo-Bayes estimator with a low false alarm rate. In this paper, we reduce
further the applicable FER rule space. Our method differs from both Lee�s scheme and ADAM
by using a FER-matching methodology.
The rest of the paper is organized as follows: In Section 2, we introduce basic techniques
for mining audit data. Axis and reference attributes are revisited. Section 3 presents an anomaly-
based IDS architecture using datamining capabilities. We introduce a base-support algorithm for
generating useful FERs to detect intrusions. Our new FER generation algorithm compares
favorably over the level-wise algorithm developed by Lee�s group [20]. Advantages of our
mining algorithm are justified here.
In Section 4, three pruning techniques are introduced to reduce the FER search space. These
pruning laws are illustrated with concrete traffic connection events. In Section 5, we propose a
new algorithm for pruning ineffective episode rules by applying the reduction laws,
systematically. We also outline the anomaly generation processes here. In section 6, the
experimental results are reported in terms of intrusion detection rate and false alarm rate. Finally,
we summarize the contributions and make a few suggestions for further research effort.
2. Mining of Audit Profiles in Network Traffic
In order to build effective network IDS against intrusions, we use datamining to find the
patterns of both normal and intrusion behaviors from system audit data. We adopted the idea of
axis and reference attributes introduced by Lee et al. [20], since it includes domain-specific
knowledge and is able to describe relationships among traffic records. The tasks of datamining
are described by either association rules or frequent episode rules. An association rule is aimed
at finding interesting intra-relationship inside a connection record. The FER describes the inter-
relationship among multiple connection records.
Qin and Hwang, USC, Dec.17, 2003 Page 5 of 35
2.1 Association Rules vs. Frequent Episode Rules
Let T be a set of traffic connection records and A be a set of attributes defined over the
connections. For example, the set A can be chosen as {timestamp, duration, service, srchost,
desthost} for TCP connections. Let I be a set of values defined on A, such as I = { timestamp =
10, duration = 1, service = http, srchost = 128.125.1.1, desthost = 128.125.1.10 }. Any subset
of I is called an Itemset representing the characteristics of connection events.
Let X be a traffic itemset (event) under evaluation. The support value for X, denoted
Support (X), is defined by the percentage of connection records in T that satisfies X. For
example, X = {timestamp=10, duration=1} is an itemset. Y = {service = http} is another
itemset. In this example, φ=YX I . The union of the two itemsets X U Y = {timestamp =100,
duration=1, service=http} represents the characteristics of the three traffic attributes as listed.
Association Rules: An association rule is defined between two traffic itemsets, X and Y. These
two itemsets are disjoint with φ=YX I . The rule is denoted by :
X → Y, ( c, s ) (1.a)
The association rule is characterized by a support value s and a confidence level c. These are
probabilities of the corresponding traffic events, defined by:
Y)(XSupports U= and (X)Support
Y)(XSupportc U= (1.b)
Both s and c are fractional numbers calculated directly from the Support functions on the
itemsets X and on the joint itemset X U Y as exemplified above.
Frequent Episode Rules: In general, an FER is expressed by the expression:
L1, L2, �, Ln → R1, � , Rm, (c, s, window) (2.a)
where Li (1 ≤ i ≤ n) and Rj (1 ≤ j ≤ m) are ordered itemsets in a traffic record set T. We call L1,
L2, �Ln the LHS (left hand side) episode and R1,�.Rm the RHS (right hand side) episode of the
Qin and Hwang, USC, Dec.17, 2003 Page 6 of 35
rule. Note that all itemsets are sequentially ordered, that is L1, L2, �Ln, R1,�., Rm must occur in
the ordering as listed. However, other itemsets could be embedded within our episode sequence.
We define the support and confidence of rule (2.a) by the following two expressions:
0121 )......( sRRLLSupports m ≥= UUU (2.b)
021
121
)...()......(
cLLLSupport
RRLLSupportc
n
m ≥=UU
UUUUU (2.c)
We consider the minimal occurrence [26] of the episode sequence in the entire traffic
stream. The support value s is defined by the percentage of occurrences of the episode within the
parentheses out of the total number of traffic records audited. The confidence level c is the joint
probability of the minimal occurrence of the joint episodes out of the support for the LHS
episode. Both parameters are lower bounded, by so and co, which are the minimum support value
and the minimum confidence level, respectively. The window size is an upper bound on the time
duration of the entire episode sequence.
Example 1: Association rules and episode rules
Consider the following association rule for an http connection event:
(service = http) → (duration = 1) (0.8, 0.1)
The rule indicates that 80% of all the http connections have duration less than one second. There
are 10% of all network connections that are initiated from http requests with a duration less than
one second.
Now, consider the following frequent episode rule for a sequence of network events:
(service = authentication) → (service = smtp) (service = smtp) (0.6, 0.1, 2 sec)
This rule specifies an authentication event. If the authentication service is requested at time t,
there is a confidence level of c = 60% that two smtp services will follow before the time t + w,
Qin and Hwang, USC, Dec.17, 2003 Page 7 of 35
where the event window w = 2 sec. The support of 3 traffic events (service = authentication),
(service = smtp), (service = smtp) accounts for 10% of all network connections. !
The traffic connections on both sides of a FER need not be disjoint in an episode
sequence of events. Episode rules can be used to characterize attacks. The SYN flood attack is
specified by the following episode rule:
(service = http, flag = S0) (service = http, flag = S0) → (service = http, flag = S0)
where the event (service = http, flag = S0) is an association. Flag �S0� means that only the SYN
packet was seen for a particular connection. The combination of associations and FERs reveals
useful information on normal and intrusive behaviors. Theses rules can be applied to build IDS
to defend against both known and unknown attacks.
2.2 Axis Attributes vs. Reference Attributes
The basic rule generation algorithm does not take any domain specific knowledge into
consideration. Often, too many ineffective rules are generated to be useful. For example, the
association rule: Srcbytes = 200 → destbytes = 300 is of little interest to the intrusion detection
process, since the number of bytes sent by the source (srcbytes) and destination (destbytes) is
irrelevant to the traffic and threat conditions.
In order to address this issue, Lee et al [20] has introduced the concepts of axis attributes
and reference attributes to constrain the generation of mining rules. For each association rules, it
must contain some values of axis attributes. Those association rules that do not contain any axis
attributes are considered irrelevant to the context. Axis attributes are selected from essential
attributes [20] such as srchost (source host), desthost (destination host), srcport (source port),
and service (destination port).
Different combinations of the essential attributes form the axis attributes. We also
incorporated connection flag as an essential attribute, since some flags are pretty rare in daily
Qin and Hwang, USC, Dec.17, 2003 Page 8 of 35
network traffic and is hard to mine. However, flag has to be combined with at least one another
essential attribute to form axis attributes. All Itemsets or traffic events in an FER must contain
some axis attributes. The reference attributes demand itemsets to have the same reference value.
3. Datamining for Anomaly Intrusion Detection
Our long-term goal is to build an intelligent intrusion detection system that can help
secure any distributed computing infrastructure such as a Grid system. The system can detect not
only known intrusion patterns, but also novel unknown intrusions. In order to achieve this
objective, we use datamining to profile frequent network patterns for detecting anomalies.
3.1 The Network Datamining Architecture
Three major components of our IDS are the datamining engine, the intrusion detection
engine, and the alarm generation engine as shown in Fig. 1. In this paper, we apply the normal
profile database and construct the anomaly detection engine. The alarm generation is beyond the
scope of this report. In order to correctly detect intrusion patterns, we extract two levels of
information from raw audit data of the network traffic. Although connection level information is
very effective against flood and scan attacks, it can detect only a small portion of attacks. Most
R2L and U2R attacks cannot be discovered at the connection level.
It should be noted that we combine the anomaly intrusion detection with the signature-
based detection mechanisms in Figure 1. An attack can be detected by either mechanisms,
whichever confirms the intrusion first. Once an intrusion anomaly is discovered in the traffic
profile, its signatures will be added to the signature database. Initially, we generate the episode
rules from the 1999 MIT Lincoln IDS evaluation data sets. [14, 17]. Eventually, we will update
the database and extend the rules using more recent traffic connection records. The whole traffic
database will be periodically updated for experiment purpose at USC.
Qin and Hwang, USC, Dec.17, 2003 Page 9 of 35
Audit data
Data preprocessor
Anomaly Detection
Engine
Normal profile
database Signature Database
Data mining Engine
Alarm generation Security policy
Intrusion Detection
Engine
Alarm generator
Feature extraction
Rules from real-time traffic
Attack-free episode rules
Figure 1 Our datamining architecture for anomaly-based intrusion detection
We added some additional features extracted from the packet level data to detect new
attacks. In order to do so, we use the IDS tool Bro [30] to extract the features from both
connection and packet information. The key features are generated for all traffic connections in
Table 1.
Table 1 Key Features Extracted from Traffic Connection Records
Feature Name Description
Timestamp Time when the first packet of the connection is seen
Duration Length of the connection in seconds, ignored for UDP packets
Srchost IP address of the source host
Srcport Port number of the source host
Srcbyte Number of bytes sent by the source host
Destbyte Number of bytes sent by the destination host
Desthost IP address of the destination
Destport Port Number of the destination host
Flag Connection status flag. Typical flag values given below. SF: both SYN and FIN packets are known for a connection S0: only the SYN packet was seen for a TCP connection REJ: the connection was rejected by the destination
Urgent Number of urgent flags in the connection
Frag_Error Number of Fragment errors in the connection
Qin and Hwang, USC, Dec.17, 2003 Page 10 of 35
For each connection, we check whether they violate any RFC protocols. For example,
TCP three-way handshake protocol can be easily verified by looking at the packets for
establishing the connection. Also during the preprocessing stage, packets with infrequent
properties are identified for the purpose of anomaly detection. We keep a strong interest in those
infrequent attribute values since attackers often utilize them. For example, packets with same
destination and source address will normally indicate some potential attacks. In order to detect
more R2L attacks, we extracted functional features from the network traffic as listed in Table 2.
Table 2 Functional Features Extracted for Specific Services Feature Name Description
Login_Failed Determine whether a request is failed to yield ftp, telnet, rlogin services, etc
Sensitive_files Sensitive files that are visited/created by the user, ex. .rhost, .password, etc
http_request Number of http requests in an http connection
Privileged port Whether the srcport is a privileged port (port number <1024), or only for TCP connections
Guest Whether the user login as guest/anonymous, mostly for ftp connections
Root_login Whether the user login as root
Instead of comparing frequent episodes, we use FERs as an indicator to detect anomalies
since it describes the relationship among a series of connections. If the FERs generated by the
datamining engine deviates significantly from all normal FER rules, an alarm is raised. We then
calculate some temporal statistics from current traffic data to analyze the connection data.
3.2 A New Network Datamining Algorithm
Most mining techniques exclude infrequent traffic patterns. This may cause the IDS to be
ineffective in detecting rare network events. For example, the authentication service is infrequent
in ordinary network traffic. If we lower the support threshold, then a large number of
uninteresting patterns associated with frequent services are discovered. Lee, et. al. [20] used
level-wise mining to iteratively lower the minimum support value.
Qin and Hwang, USC, Dec.17, 2003 Page 11 of 35
Initially, they use a high minimum support value si to find the episodes related to high
frequency axis attribute values. Then the procedure iteratively lowers the support threshold by
half so that each new candidate itemset must contain at least one �new� axis value. The
procedure terminates, when a very small threshold s0 is reached.
The distribution of the services in the training data set has a great impact on using Lee�s
algorithm. We have tested Lee�s level-wise algorithm using the 1999 DARPA intrusion data set
[14, 17]. The result shows that some FERs contain unrelated associations. For example, the
following rule reveals the service as the axis attribute and the srchost as the reference attribute.
(service = http, flag = SF) (service = http, flag = SF, srcbyte = 5000)
(service = telnet, flag = SF) → (service = http, flag = SF, srcbyte = 5000) (0.7, 0.0025)
The telnet operation normally is not related the http operation. This episode rule is not
useful to describe normal traffic pattern. Though telnet is a frequent service, the episode rules
related to telnet are rare. The above episode rule has the highest support value among all rules
relating to telnet. It is probable for a common service to appear in an episode rule with extremely
low support value, if individual connections are independent of each another.
We introduce a base-support mining algorithm to address this problem. For a frequent
episode X, we define base-support as the ratio of the minimum occurrences that contain X to the
number of records containing the most uncommon axis attribute value in X. For an FER to be
generated, its base-support value must exceed a threshold value.
To construct normal network patterns, the attack-free training data of the first and the
third week of DARPA 1999 intrusion data set [17, 23] are fed into our base-support mining
engine. We use a simplified approach in merging frequent episode rules from multiple days.
After finding FERs from each day�s audit record, we simply merge them into a large rule set by
removing the redundant rules.
Qin and Hwang, USC, Dec.17, 2003 Page 12 of 35
Our base-support mining algorithm is specified below. The inputs are the base support
threshold value, and all axis attribute values. The output of the mining process is the set of FERs
generated from the traffic connection records. The algorithm involves rule support calculation,
database scanning, sorting of serial traffic events, and finally the generation of meaningful
frequent episode rules.
Algorithm 1: Base-Support Traffic Datamining Input: Base-support value threshold s and its axis attribute(s)
Output: Frequent episode rules
Begin (1) For all axis attribute(s) values, calculate their support in the database
(2) Scan database to form L = {large 1-itemsets that meet s × saxis, where saxis is the support value of the axis attribute(s) of this 1-itemsets}
(3) While (there are new rules generated) do begin (4) Find serial episode from L: The episode must have support value
larger than s × min{saxis}, where min{saxis} is the minimum support value of all axis attribute(s) in this episode
(5) Append the generated episode rules to the output rule set
(6) end while
end
Figure 2 shows the result of base-support and level-wise mining algorithms for all the
TCP connections in two weeks� of attack-free data. For both algorithms, the minimum
confidence value is 0.6 and window size is chosen as 30 second. We choose source host as the
reference attribute and service as the axis attribute.
In the level-wise mining algorithm, increasing the initial support value may not be able to
help generate fewer frequent episode rules. When the training set contains 10 days of network
data, the level-wise algorithm with initial support value 0.3 will generate more rules than that
with initial value 0.1. Because the initial support value is divided by two at each iteration, it has
Qin and Hwang, USC, Dec.17, 2003 Page 13 of 35
less and less impact after several iterations. Thus it is hard for user to control the generation of
rules related to infrequent axis attributes.
On the other hand, our base-support mining algorithm is �fair� to different axis attribute
values, since the same percentage of records related to different axis attribute values is required
for a candidate rule. High minimum base-support value will result in fewer frequent episode
rules. Under the context of different axis attribute values, the base-support mining algorithm
provides the same minimum support value as a normal datamining algorithm does.
0
50
100
150
200
250
300
350
400
1 4 7 10
Training sets in days
Num
ber o
f epi
sode
rule
s ge
nera
ted
Base-support algorithm, minimumbase-support=0.1
Base-support algorithm withminimum base-support=0.3
level-wise algorithm, initialsupport value=0.1
level-wise algorithm, initialsupport value=0.3
Figure 2. Experiment on attack-free TCP connections of 1999 DARPA Intrusion Detection Evaluation data set using our base-support mining algorithm with minimum confidence value of 0.6 and a window size of 30 sec, reference attribute = srchost, axis attribute = service
4. Episode Rule Transformation Laws
Because of the large number of records in the TCPdump data, there are still a large
number of uninteresting rules generated by using our base-support mining. In order to reduce the
number of rules generated and to provide a simplified view of data patterns, we propose the
following pruning techniques to reduce the rule space. Without reduction, the rule search space
Qin and Hwang, USC, Dec.17, 2003 Page 14 of 35
may escalate too large to be practical for just one day�s collection of network traffic records. We
consider an FER effective, if it is more applicable and more frequently used. An episode rule is
said to be ineffective, if it is rarely used in detecting anomalies in network traffic. The following
traffic laws generally apply to today�s open network environment.
4.1 Transposition of Episode Rules
Some FER rules differ from each other only in the position of �→�, which seperates the
LHS from the RHS of the FER rule. Keeping all those rules will largely increase the search space.
We use the following law to remove these redundant rules.
Theorem 1: Transposition Law
Comparing the following two frequent episode rules:
L1, L2, �, Ln → R1, �., Rm (c1, s1) (4.a)
L1, L2, �, Ln-1 → Ln, R1, �., Rm (c2, s2) (4.b)
The second rule in Eq.(4.b) is transposed by moving the event Ln from the LHS to the
RHS. We consider the second rule more effective than the first one.
Proof: These two rules have the same support value s1 = s2 = support(L1, L2, �Ln, R1,�., Rm) as
defined in (2.b). However their confidence levels are different:
21n1
m1n1
n1
m1n11 c
)...L(L)....RR...L(L
)...L(L)....RR...L(Lc =≥=
−U
UU
U
UU
SupportSupport
SupportSupport (4.c)
The transposed rule in Eq.(4.b) has a smaller confidence value than the original rule in
Eq.(4.a). If the confidence level c2 is above the minimum value c, c1 must be larger than c. Thus
the first rule is always implied by the second. So we can prune it from the rule set. Q.E.D.
We can generate many rules from a frequent episode as long as the minimum confidence
level is satisfied. Theorem 1 specifies that for a given frequent episode, we only keep the rule
with shortest LHS that satisfies the minimum confidence.
Qin and Hwang, USC, Dec.17, 2003 Page 15 of 35
Example 2: Application of transposition law
Comparing the following two FER rules, the first one is more effective than the second
one, because of satisfaction of the transposition law. Both frequent episode rules describes the
normal http behavior. We need only to include the first one in our normal FER rule set
(service = http, flag = SF) → (service = http, flag = SF), (service = http, flag = SF)
(service = http, flag = SF), (service = http, flag = SF) → (service = http, flag = SF)
During detection phase, if the second rule is in the output after applying the transposition
law, we regard it as a normal rule since it is implied by the first rule in our normal rule set.
Because we only generate one FER from a frequent episode, a large number of redundant
comparisons are not necessary. However, datamining may generate some longer rules for
describing similar normal behavior. A good example is given below:
(service = http, flag = SF), (service = http, flag = SF) →
(service = http, flag = SF), (service = http, flag = SF).
Compared with the above shorter rules, this rule has the same power in describing normal http
behavior. We need to remove redundancy in such rules as introduced below. !
4.2 Elimination of Episode Rules
Shorter rules or rules with shorter LHS (left hand side) are considered more effective
than longer rules or rules with longer LHS. This is because shorter rules are often easier to apply
or to compare. Clustering of shorter rules is also much easier. Thus how to reduce long rules to
shorter rules will be very useful for enhancing the performance of an IDS system.
Theorem 2: Rule Elimination Law
The following FER rule:
L1, L2 → R1 (c1, s1) (5.a)
Qin and Hwang, USC, Dec.17, 2003 Page 16 of 35
becomes ineffective, if one of the following two conditions is met.
(i). Rule L1 → R1 does not exist and the rule L2 → R1 (c2, s2) exist in the rule set, where
δ<−
2
21
ccc (5.b)
(ii). Rule L2 → R1 does not exist and rule L1 → R1 (c3,s3) exist in the rule set, where
δ<−
3
31
ccc (5.c)
The parameter δ is called the similarity threshold, which reveals the degree of similarity
between rule (5.a) and the rule L2 → R1 (c2, s2) or rule L1 → R1 (c3, s3) above.
Proof: The rule L2 → R1 (c2, s2) specifies that if L2 happens, there is a chance of c2 that R1 will
follow L2. We suppose that L1 does not exclude R1, here �exclusion� means that if L1 takes place
then R1 will not take place in the same window. Thus if L2 takes place after L1, there is still a
great chance that R1 will follow L2. If L1 and R1 are not closely related (do not exist in the rule
set because of the low support), there is no need to put L1 into the FER. So the rule L1, L2 → R1
is implied by rule L2 → R1 as long as L1 does not exclude R1. A similar conclusion can be drawn
for rule L1 → R1 (c3, s3). Q.E.D.
Example 3: Application of elimination law
Based on the above law, the following rule is considered ineffective
(service = http) (service = authentication) → (service = smtp) (0.6, 0.1)
because the following rule exists
(service = authentication) → (service = smtp) (0.65, 0.1)
The authentication is related only to the smtp operation, the appearance of http does not affect
the other two itemsets. We keep only the following rule: (service = http) → (service = http)
Qin and Hwang, USC, Dec.17, 2003 Page 17 of 35
and remove the dummy rule: (service = telnet) (service = http) → (service = http). Because
many ineffective rules are not in the normal traffic profile, removing them can help us reduce the
false alarms to some extent as demonstrated in Section 6. !
We are mainly interested in daily network traffic. In particular, we pay attention to the
TCPdump data. Most intrusive FERs normally contain only one or two specific associations,
thus this rule normally establishes. For intrusion FER with many associations, we can identify
the key associations inside the FER, the elimination law is applicable here.
4.3 Reconstruction of Episode Rules
Many FERs detected from the network traffic has some transitive patterns. If we have two
FERs A → B and B → C in the rule set, it is very often that FER A → B, C also exists in the rule
set. Here A, B and C are associations. The rule A → B, C seems redundant, since we can
reconstruct it from the previous two rules.
Theorem 3: Reconstruction Law
Consider the following three FERs:
L1 → R1 (c1, s1), R1 → R2 (c2, s2) and L1 → R1, R2 (c3, s3) (6.a)
We consider the last rule L1 → R1, R2 (c3, s3) ineffective, if
σ<−
3
213
cccc (6.b)
where σ is a transitive threshold, which indicates the strength of the relation among three events
L1, R1, R2.
Proof: Based on the first rule L1 → R1 (c1, s1), each connection record containing R1, the
confidence (probability) of L1 in the same window is equal to:
)(
)(
1
11
RSupportRLSupport
cU
=′ (6.c)
Qin and Hwang, USC, Dec.17, 2003 Page 18 of 35
For a connection record containing R1, the confidence of R2 following R1 is c2 in the
same time window. If L1 and R2 are independent, the confidence of a connection containing R1
follows L1 and appears ahead of R2 is approximated by the product 2ccc ⋅′=′′ . Here we called it
�approximated�, since the episode L1, R1, R2 may use a longer window. The confidence of the
new rule L1 →R1 , R2 is computed by:
211
211
1
1
1
2113 )(
)()(
)()(
)(cc
LSupportcRLSupport
LSupportcRSupport
LSupportRRLSupport
c =⋅
=′′⋅
≈=UUU (6.d)
Rules L1 and R2 were assumed independent. This law is only used to justify the existence
of rule L1 →R1, R2 , when we have L1 → R1 (c1, s1) and R1 → R2 (c2, s2). In practice, the rule L1
→R1, R2 implies the existence of rule L1 → R1. So if we find rule R1 → R2 in the rule set, we
can remove rule L1 → R1, R2 , if (6.c) is satisfied. This is because we can reconstruct the first
rule from the other two rules. Q.E.D.
This reconstruction law is particularly powerful when the window size is large. For
smaller window sizes, the occurrence of the episode L1, R1 , R2 may often have a duration longer
than the window size, which violates our �approximated� assumption. The following example
clarifies the reconstruction process.
Example 4: Application of reconstruction law
The following episode rule is ineffective
(service = ftp, srcbyte = 1000) → (service = smtp) (service = authentication)
because it can be reconstructed from the following two rules already in the rule set:
(Service = ftp, srcbyte = 1000) → (service = smtp)
(service = smtp) → (service = authentication).
To reconstruct the first rule, we simply follow the following transitive path in order.
Qin and Hwang, USC, Dec.17, 2003 Page 19 of 35
(service = ftp, srcbyte =1000), (service = smtp), (service = auth).
The reconstruction law helps us split long FERs into short ones, making them easier for
clustering analysis in the detection process. !
5. Pruning of Ineffective Episode Rules
In order to apply our rule pruning laws, we integrated them into our base-support mining
engine. The following procedure is given to generate the most effective episode rules in an open
network environment.
5.1 The Episode Rule Pruning Algorithm
We apply Algorithm 2 to reduce the episode rules with different window sizes over the
same data set in Figure 3. Here, we choose the minimum base-support of 0.1 and the minimum
confidence level of 0.6. The service is chosen as the axis attribute and the destination host is
chosen as the reference attribute. This figure plots the rule set growth against the window size
under two different traffic conditions.
We classify the audit data into these two categories: intra-LAN connections and inter-
LAN connections. For DAPRA 1999 IDS evaluation data set, a large potion of intra-LAN traffic
consists of user applications (destport > 1024) and smtp services. However, in inter-LAN traffic,
the http services dominate. The result shows that 80% of the original rules can be pruned by
application of the above three pruning laws. This is very helpful, especially when the window
size increases.
Most rules have the LHS reduce to just one episode event, which is ideal for clustering
and comparison purposes. The message being conveyed here is that the pruning laws are in deed
effective in reducing the rule space significantly. This reduction will simplify the whole process
of anomaly detection as seen in the next two sections. This rule-space reduction makes it
possible to construct anomaly-based IDS, that can detect unknown intrusions in real-time.
Qin and Hwang, USC, Dec.17, 2003 Page 20 of 35
Algorithm 2: Pruning Ineffective Episode Rules Input: Candidate frequent episode of events Output: The reduced frequent episode rule set L after pruning
Begin (1) L= Ø and S= Ø, where S is a temporary set for storing all episodes
(2) While there are more candidate frequent episodes, do begin (3) Calculate the rule r that satisfies the minimum confidence level
with the shortest LHS, if no such rule exists, then r = null
(4) S = S U {r}
end while (5) For all the rules in S, do
(6) For the current rule r, find rules in S that can apply the elimination (7) rule or transitive rule, if not found, L= L U {r}
(8) end For all
end
0 20 40 60 80 100 120 140 1600
100
200
300
400
500
600
700
Num
ber o
f Rul
es G
ener
ated
Window Size(Sec)
Inter-LAN FERs Pruned inter-LAN FERs Intra-LAN FERs Pruned intra-LAN FERs
0 20 40 60 80 100 120 140 160
0
10
20
30
40
50
60
70
80
Prun
ing
Rat
e(%
)
Window size(seconds)
Pruning rate for intra-LAN FERs Pruning rate for inter-LAN FERs
(a) Variation of FER space size (b) Variation of pruning rates
Figure 3 The effects of pruning on the space of frequent episode rule (FER) for inter-LAN and intra-LAN traffic events, where the base-support = 0.1, the minimum confidence = 0.6, the reference attributes = destination, and axis attributes = service
Dec.17, 2003 Page 21 of 35
To apply the association and frequent episode rules in real-time intrusion analysis, we
use a scanning period (sliding window) to collect the connection data. The scanning period is
continuously moving forward by a fixed amount of time, called step size. For each period, we
apply our datamining framework to generate the FERs and the association rules with high
support. This approach is similar to association rules used by Barbara, et al [4]. However, we
generate both association and episode rules. We build the normal profiles using the DARPA
1999 intrusion data sets [17].
To apply a FER within a given scanning period, we calculate the minimal number of
occurrences of this rule as an additional feature to better characterize the traffic. For this
purpose, we amend the format of FER to as follows:
X → Y, (c, s, m) (7.a)
where c, s, m are the confidence, support, and minimal occurrence of this FER, respectively.
During the training phase, the maximum occurrence number is calculated for a given FER for a
large number of attack-free scanning periods. We denote this maximum value as M, An FER is
anomalous, if its minimal occurrence number exceeds the maximum number to some extent, as
formally specified below:
Mm ⋅≥ γ (7.b)
where γ ≥ 1 is a relaxation factor. If γ is large, we can tolerate more occurrences of a given
FER within a scanning period.
5.2 Anomaly Intrusion Detection Process
The process of datamining for anomaly-based intrusion detection is illustrated in Fig. 4.
To evaluate pure anomaly detection using our FERs, we use the audit data sets collected from
the first and the third week of the MIT/Lincoln IDS training process. Only TCPdump data were
applied in our work. We use the last two weeks� test data from MIT/LL to generate the FERs
and to compare the rules generated in normal profiles.
Dec.17, 2003 Page 22 of 35
In order to detect anomalies, we use Bro to generate features from the incoming
packets. If any anomaly is detected from an individual packet, an alarm is generated. To detect
attacks consisting of many packets/connections, we generate the rules from the traffic data
using the base-support mining technique specified in Algorithm 1. Anomaly detected by
individual packets is checked first. Then we proceed with related packet streams.
Feature extraction using Bro system [30]
Traffic connection records
A nom aly detected in individual packets?
G enerating Frequent Episode Rules from sequence of connection records
N o
Is FER matches norm al traffic patterns in rule set?
Is occurrence num ber exceeding the relaxation
threshold? (9.b)
The traffic connections are
considered norm alA nom aly detected in traffic connections
N o
Y es
N o
Y es
Error flags or abnorm al netw ork
statistics
Y es
Y es
N o
Figure 4 The process of anomaly-based intrusion detection through network traffic datamining
The major step in Fig.4 is to determine the minimum occurrence number. When one
cannot find the matching FER or the rule does not satisfy the condition in (7.b), we consider it
an anomaly rule. When many anomaly rules are generated in a particular scanning period, we
Dec.17, 2003 Page 23 of 35
calculate the temporal statistics from the network connections. These features include the
average network traffic, number of error flags, number of connections to the same destination
host, etc. If the pattern is verified as an intrusion, we add the rules to our signature database.
We have experimented different γ values and their effect on number of false alarms by
extracting all the TCP connections from DAPRA 1999 data set. Here the scanning period is
chosen from 100 seconds to 1000 seconds and the episode window is from 2 seconds to 10
seconds. The result shown in Fig.5 indicates that after the initial sharp drop in number of false
alarms. Increasing the relaxation factor γ beyond certain limit (say 1.2 in Fig.5) does not reduce
the false alarms further for short scanning periods.
1.0 1.1 1.2 1.3 1.4 1.50
2
4
6
8
10
12
14
16
Num
ber o
f fal
se a
larm
s
Relaxation Factor (γ )
Scan period = 100s Scan period = 300s Scan period = 500s Scan period = 1000s
Figure 5 The effect of γ on false alarms associated with all TCP connections
6. Testing on Traffic Data from 1999 DARPA IDS Evaluation
In 1999, an MIT Lincoln Lab research group [14, 17] has conducted an intensive
evaluation of many IDSs under DARPA sponsorship. We have tested our new IDS scheme at
USC Internet and Grid Computing Laboratory over the same traffic data sets collected by the
MIT/LL group. Reported below are our testing results and interpretation of the results.
Dec.17, 2003 Page 24 of 35
6.1 Data Sets used in DARPA IDS Evaluation
Attacks from the Darpa IDS evaluation have tested in our experiments. Without any
prior knowledge, we can still detect a large number of unknown intrusions. However, we have
to admit a shortcoming of this data set. The attack-free training set has to be comprehensive.
The DARPA 1999 data set does not contain enough background traffic patterns as criticized by
McHugh [27].
The main problem of intrusion detection is the response delay in detecting attacks. To
illustrate the situation, we simulated the distribution of ICMP connections when a typical
smurf attack takes place. In Fig.6, the smurf attack is demonstrated with a traffic spike. This
spike corresponds to a large number of unexpected ICMP connections between 16 � 20
seconds. The attack lasted for 4 seconds, our scanning period is 15 seconds and the step size is
10 seconds.
Figure 6 The impact of step size on the response time of a typical smurf attack,
with step size=15 sec and scanning period size=10 sec
Initially, the scanning period starts from the beginning. At the next iteration, the
scanning period moved to the 10th second. For scanning period 1, no anomalous FER was
generated. However, by collecting all the connections in scanning period 2, we were able to
detect the anomalies.
Dec.17, 2003 Page 25 of 35
We can only detect the attacks after we have collected all the connections in scanning
period 2, at which point the smurf attack has already finished. Thus choosing a suitable step
size of scanning period is very important in detecting attacks in real time. If the step size is too
small, we have to mine the connection data frequently, which may waste many CPU cycles,
when there are no attacks in the traffic. If the step size is too large, attacks that take place
during a very short time interval will be detected after they already finish.
The MIT/LL IDS evaluation data sets are sufficient to prove the new anomaly detection
concept proposed in this paper. However, we are ware of the limits of the Darpa IDS evaluation
data sets in covering large-scale attacks like DDoS type. In the future, we plan to continue the
experiments on using real traffic records from reported attack incidents by CERT (Computer
Emergency Report Team) at CMU in recent years. The purpose is to further prove the
effectiveness of the new detection scheme. With real attack patterns tested, the results will be
more accurate to reflect the impacts of Internet traffic connections.
6.2 Intrusion Detection Results and False Alarms
We have tested the effect of our pruning algorithms on different window sizes. The
testing results are shown in Fig.7. If the scanning period is small, the connection records are too
few to generate useful FERs. On the other hand, if the period is very large, frequent itemsets
are missed for taking place in a very short duration of time.
From Fig. 7(a), our rule pruning algorithm demonstrates a large reduction of the
searching time by almost 50% without affecting the detection rate. Because some stealthy
attacks cannot be detected by small periods, choosing different periods is very important to the
detection rate. We found that the combination of period of 100 seconds and 7200 seconds is the
best way in detecting most possible attacks, known or unknown.
Because some probing attacks may last for time intervals that are larger than 120
seconds, we treat connections with special flags (other than REJ, SF, etc) differently since most
probing attacks end up with some special connection status. Instead of mining FERs by time,
Dec.17, 2003 Page 26 of 35
we mine FERs by their connection number. We sort connections with same rare flags according
to their timestamp.
0
500
1000
1500
2000
2500
100 300 500 1000 7200
Scanning period (sec)
num
ber o
f ano
mal
y ru
les
gene
rate
d
Detection without rule pruningDetection with rule pruning
02468
1012141618
100 300 500 1000 7200Scanning period (sec)
fals
e al
arm
rais
ed
(a) Anomaly rules generated (b) False alarm rate
Figure 7 Generation of anomaly rules and false alarm rate in using our frequent episode intrusion detection scheme with 5 scanning periods corresponding to 5 event window sizes from 2, 3, 5, 10, and 120 seconds, respectively
We regard the sequence number of each connection as the timestamp of this connection
and use the similar approach mentioned above. This procedure is similar to that of [21]. Most
probe attacks involve a large number of network services or destination addresses. We added
two more statistical features here for helping detect probe attacks. One is the number of
different desthost, the other is the number of different services in all connections of a FER that
desthost or service does not serve as the reference attribute, respectively.
Example 5: Comparing Association and Frequent Episode Rules
Consider the following FER generated without using the pruning rules. This is an
anomaly rule leading to a false alarm, since there is no such rules in the normal profile.
(service = user_application, flag = REJ), (service = http, flag = REJ)
→ (service = http, flag = REJ) (0.6, 0.000007, 3)
By applying the elimination law, the above rule is eliminated, because the following episode
rule exists in the normal profile.
Dec.17, 2003 Page 27 of 35
(service = http, flag = REJ) → (service = http, flag = REJ) (0.63, 0.000022, 7)
and the rule (service = user_application, flag = REJ) → (service = http, flag = REJ) does
not exist in the normal rule set. This example explains why rule pruning can reduce the number
of false alarms in the intrusion detection process. !
Most individual connection attacks do not show any anomalies in FERs. The attacks that
can be identified by FERs and those cannot are given in Fig.8. We found that a large portion of
the attacks detected by our approach is through FER anomalies. We have compared our
detection results with those reported by the ADAM group [4][5]. They use an association-based
anomaly detection system.
0
5
10
15
20
25
30
35
DoS R2L ProbeAttack category
Tota
l num
ber o
f atta
cks
dete
cted
Intrusions detected bysingle packet/connectionanomaliesIntrusions detected byFER anomalies
Figure 8. Number of intrusive attacks detected by checking
against the frequent episode rules
We choose to evaluate the intrusion detection rate corresponding to unknown attacks or
misclassifications in [5]. This metric is calculated by dividing all detected attacks by the total
number of incidents. The comparison result is shown in Fig. 9. Our FER scheme results in a
higher successful detection rate than the ADAM results on DoS and Probe attacks. However,
the ADAM system performs much better in R2L type of attacks. Over all, our FER scheme has
an average detection rate of 32%, compared with 20% reported by the ADAM group.
Dec.17, 2003 Page 28 of 35
0.0%5.0%
10.0%15.0%20.0%25.0%30.0%35.0%40.0%45.0%50.0%
DOS R2L Probe Total
Attack category
Ave
rage
Det
ectio
n Ra
teassociation [4]based approach
Our approach byFERs
Figure 9. Anomaly intrusion detection rate and false alarm rate in using our frequent episode rules, compared with using the association rules in ADAM report
In our opinion, these two approaches are complementary in nature. When choosing the
desthost as reference attribute, the rule (service = smtp, flag = SF) will be considered as a
normal association. However, our episode rule (service = smtp, flag = SF) → (service = smtp,
flag = SF) detects a Mailbomb attack, if the number of occurrence is high. This corresponds to
an intrusion by sending a lot of smtp packets to the victim server during a short time period.
Thus combining the two methods may upgrade the overall detection rate further.
6.3 Anomaly Detection Compared with Snort Results
Snort is a packet-level signature-based IDS, it is pretty effective in real-time intrusion
detection of known attacks. When attack signatures appear in the network traffic, Snort
immediately raises an alarm. However, Snort is rather weak against attacks such as smurf and
mailbomb. This is because each connection in a mailbomb attack is a legitimate connection.
We have to maintain temporal traffic statistics to identify anomalous network behaviors in real
time. However, traffic pattern normally has a large variation. Maintaining a suitable threshold
is crucial to differentiate anomalous from normal behaviors.
To reduce the response delay of our anomaly IDS, we use a different approach to
generate FERs over a large amount of traffic data during a relatively short observation time. If
the number of connections in a scanning period exceeds a certain threshold, we immediately
Dec.17, 2003 Page 29 of 35
feed these records to our datamining engine instead of waiting for the rest connections in the
same scanning period.
We have tested our IDS against the use of Snort on DARPA 1999 intrusion dataset in
simulated real-time experiments. If both systems report an attack, we choose the one that
responded first. For anomaly detection, the step size is chosen as 10 seconds and the scanning
period size is chosen as 100 seconds. The detection results by combining Snort and our
anomaly detection engine are shown in Table 3.
Table 3 Intrusion Detection Results of our Anomaly IDS compared with using Snort over the DARPA 1999 IDS Evaluation Data Set
Attack name
Category Start timestamp*
Finish timestamp*
First Detected by
Detected at timestamp*
Pod Dos 923315990 923315990 Snort 923315990
Pod Dos 923316510 923316510 Snort 923316510
Smurf Dos 923319250 923319254 Anomaly IDS 923319250
Portsweep Probe 923319788 923320010 Snort 923319788
Apache2 Dos 923322559 923322666 Anomaly IDS 923322561
Guesstelnet R2L 923324290 923324479 Anomaly IDS 923324318
Dosnuke Dos 923327133 923327133 Snort 923327133
Smurf Dos 923332686 923332688 Anomaly IDS 923332686
Apache2 Dos 923335537 923335584 Anomaly IDS 923335541
Pod Dos 923336543 923336544 Snort 923336543
Dict R2L 923344325 923345308 Anomaly IDS 923344358
Neptune Dos 923349835 923350245 Anomaly IDS 923349841
Dosnuke Dos 923356070 923356070 Snort 923356070
Udpstorm Dos 923356827 923358123 Anomaly IDS 923356827
*Timestamps listed here are the number of seconds passed since the attack began
To evaluate the real-time intrusion response time, we define intrusion detection
efficiency α as follows: Let β be the duration of the attack and δ be the overhead time needed to
Dec.17, 2003 Page 30 of 35
detect the attack. The efficiency α is defined by the following ratio. The average detection
efficiency of different attacks listed Table 3 is plotted in Figure 10.
α = ( β - δ ) / β (8)
Snort has a slightly better performance over detectable attacks such as Pod and
Portsweep. However, our anomaly-based IDS can detect many unknown attacks such as
Smurf, Apache2, Guesstelnet, Dict, and Neptune in Fig.10. These attacks are not detectable
by Snort. Snort performs much better than our scheme only in the case of a Dosnuke attack.
0%
20%
40%
60%
80%
100%
Pod
Smur
f
Ports
wee
p
Apac
he2
Gue
sste
lnet
Dos
nuke Dic
t
Nep
tune
Udp
stor
m
Attack name
Det
ectio
n ef
ficie
ncy
Snort
datamining basedanomaly detection
Figure 10 The detection efficiency of our anomaly detection system compared with using the Snort on nine attacks listed in Table 3
7. Conclusions and Future Work
In this paper, we have developed a new datamining scheme to detecting intrusions in an
open network. In order to generate the frequent episode rules more effectively, we introduced a
new base-support mining algorithm. This algorithm makes the analysis of audit network data
systematically. Three episode rule pruning techniques are introduced. As the window size
increases to 20 - 40 sec, the episode rule pruning rate increases sharply to 80% for inter-LAN
traffic and about 50% for intra-LAN traffic. As the window size increases beyond 50 sec, the
pruning rate drops to 60% and 35%, respectively.
Dec.17, 2003 Page 31 of 35
The illustrative examples demonstrate hoe to use our episode mining technique to
discover unknown attacks. The three episode rule-pruning laws enable fast detection of
anomaly out of a much reduced rule search space. Algorithm 2 specified the procedure to apply
these pruning rules effectively. These techniques reduce the search space as much as 80% for
anomaly intrusion detection. Through testing the new pruning techniques over the DARPA
1999 IDS evaluation data sets [17], we demonstrate 13% reduction in false alarms over 50
attack incidents of U2R and R2L to DoS attacks.
Our method is particularly effective to detect DoS and Probe types of attacks. The use
of FERs leads to a detection rate of 47% for DoS attacks, 19% for R2L attacks, and 47% for
probe attacks. These results are very encouraging. The FER are shown more effective than
using the association rules alone. We find that the FERs perform much better in revealing inter-
relationship among subsequent connection request records. The use of association rules is
better for revealing intra-relationship inside a single audited traffic record.
We recommend the use the proposed anomaly detection scheme jointly with signature-
based IDS to yield even better results. These results prove the viability of using the new
scheme in building automated intrusion detection and response systems in any open networks.
For future research, our continued effort will cover the following four aspects. These
research tasks are currently in progress. We expect to generate a lot more benchmark
experimental results in security experiments tied to distributed Grid applications in the future.
• Implement a distributed testbed for automated intrusion detection and response jointly by
coordinated effort from multiple security managers located at various domains.
• Compare our method with other mining technique in improving the intrusion detection
model. We will build our own attack databases and collect our own training sets.
• Investigate rare connection properties associated with detecting single packet attacks by
combining our anomaly detection scheme with the use of encrypted tunnels in virtual
private networks.
Dec.17, 2003 Page 32 of 35
• The MIT intrusion data set considered only DoS attacks from a single source. We plan to
extend our work to defend against distributed denial-of service (DDoS) attacks in follow-
up experiments.
Acknowledgements: The financial support of this work from the NSF ITR Research Grant
ACI-0325409 to USC is appreciated. This work was benefited from group discussions with our
team members: Shanshan Song, Ching-hua Chuan, and Rakesh Rajbanshi of the USC Internet
and Grid Computing Laboratory.
References:
[1] R. Agrawal, T. Imielinski, A. Swami, �Mining Associations between Sets of Items in Massive Databases�, Proc. of the ACM-SIGMOD 1993 Int'l Conference on Management of Data, Washington D.C., May 1993, 207-216.
[2] J. Allen, A. Christie, W. Fithen, J. McHugh, J. Pickel and E. Stoner, �State of the Practice of Intrusion Detection Technologies�, Carnegie Mellon Software Engineering Institute, Jan. 2000.
[3] S. Axelsson. �The base-rate fallacy and its implications for the difficulty of intrusion detection�. 6th ACM Conference on computer and communications security, pages 1--7, Kent Ridge Digital Labs, Singapore, Nov. 1999.
[4] D. Barbara, J. Couto, S. Jajodia, L. Popyack, and N. Wu. �ADAM: Detecting Intrusions by Data Mining�. IEEE Workshop on Information Assurance and Security, 2001.
[5] D. Barbara, N. Wu, and S. Jajodia. �Detecting Novel Network Intrusions using Bayes Estimators�, First SIAM Conf. on Data Mining, Chicago, IL. 2001.
[6] E. Bloedorn, A. D. Christiansen, W. Hill, C. Skorupka, L. M. Talbot and J. Tivel. “Data Mining
for Network Intrusion Detection: How to Get Started”, MITRE Technical Report, August 2001.
[7] S. M. Bridges and R. M. Vaughn, �Fuzzy Data Mining and Genetic Algorithms Applied to Intrusion Detection,� Proc. of 23rd National Information Systems Security Conference, Baltimore, Maryland, October 2000.
[8] D. J. Burroughs, L. F. Wilson, and G. V. Cybenko. �Analysis of Distributed Intrusion Detection Systems Using Bayesian Methods�. Proc. of IEEE International Performance Computing and Communication Conference, April 2002.
[9] W. Cohen, �Fast Effective Rule Induction�, Proc. of the 12th International Conference on Machine Learning, 1995.
Dec.17, 2003 Page 33 of 35
[10] W. Fan, M. Miller, S. Stolfo, W. Lee, and P. Chan. �Using Artificial Anomalies to Detect Unknown and Known Network Intrusions.� Proc. of The First IEEE International Conference on Data Mining, San Jose, CA, November 2001.
[11] U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, R. Uthurusamy, �Advances in Knowledge Discovery and Data Mining�, AAAI Press, 1996.
[12] J. Gaffney and J. Ulvila. �Evaluation of Intrusion Detectors: A Decision Theory Approach�. IEEE Symp. on Security and Privacy, Oakland, CA., May 14-16, 2001
[13] R. Gopalakrishna. �A Framework for Distributed Intrusion Detection using Interest Driven Cooperating Agents�, Dept. of Computer Sciences, Purdue Univ., May 2001.
[14] J. Haines, L. Rossey, R. Lippmann and R. Cunnigham, "Extending the 1999 Evaluation". Proc. of DISCEX 2001, Anaheim, California, June 11-12, 2001
[15] S. A. Hofmeyr, S. Forrest, and A. Somayaji. �Intrusion Detection using Sequences of System Calls�. Journal of Computer Security, vol. 6, pages 151-180, 1998.
[16] Lincoln Laboratory, �1999 DAPRA Intrusion Detection Scoring Truth�, http://www.ll.mit.edu/SST/ideval/docs/1999/master-listfile-condensed.txt.
[17] R. Lippmann. �The 1999 DARPA Off-line Detection Intrusion Detection Evaluation�. MIT Lincoln Lab, 2000.
[18] K. Llgun, R. A. Kemmerer, and P. A. Porras. �State Transition Analysis: A Rule-based Intrusion Detection Approach�, IEEE Transactions on Software Engineering, 21(3):181-199, March 1995.
[19] T. Lane and C.E. Brodley. �Temporal Sequence Learning and Data Reduction for Anomaly Detection�, Proc. of the fifth ACM conf. on Computer and Comm. Security, pages 150-158, 1998.
[20] W. Lee, S. J. Stolfo, and K. Mok. �Adaptive Intrusion Detection: a Data Mining Approach�, Artificial Intelligence Review, Kluwer Academic Publishers, 14(6):533-567, December 2000.
[21] W. Lee and S. Stolfo. �A Framework for Constructing Features and Models for Intrusion Detection Systems�, ACM Trans. on Information and System Security, Volume 3, No. 4, Nov. 2000.
[22] T. Lunt, A. Tamaru, F. Gilham, R. Jagannathan, P. Neumann, and H. Javitz, A. Valdes, and T. Garvey. �A Real-time Intrusion Detection Expert System (IDES)�, Technical Report, Computer Science Laboratory, SRI International, Menlo Park, California, February 1992.
[23] M. V. Mahoney and P. K. Chan. �An Analysis of the 1999 DARPA/Lincoln Laboratory Evaluation Data for Network Anomaly Detection�. Proceedings of International Symp. on Recent Advances in Intrusion Detection (RAID), pp. 220-237, September 2003.
Dec.17, 2003 Page 34 of 35
[24] M. V. Mahoney and P. K. Chan. �PHAD: Packet Header Anomaly Detection for Indentifying Hostile Network Traffic�, Florida Tech. Technical Report CS-2001-4, April 2001.
[25] H. Mannila, H. Toivonen, and A. I. Verkamo. �Discovery of Frequent Episodes in Event Sequences�, Data Mining and Knowledge Discovery, 1(3), 1997.
[26] H. Mannila and H. Toivonen. “Discovering Generalized Episodes using Minimal Occurrences”,
Proc. of the Second Int’l Conf. on knowledge discovery and datamining, Portland, Oregon,
August, 1996.
[27] J. McHugh. �Testing Intrusion Detection Systems: A Critique of the 1998 and 1999 DARPA Off-line Intrusion Detection System Evaluation as Performed by Lincoln Laboratory�. ACM Trans. on Information and System Security, 3(4). November, 2000.
[28] S. Noel, D. Wijesekera, and C. Youman, �Modern Intrusion Detection, Data Mining, and Degrees of Attack Guilt� in Applications of Data Mining in Computer Security, Daniel Barbarà and Sushil Jajodia, eds. Kluwer Academic Publishers, Boston, 2002.
[29] The NSS Group, �Intrusion Detection Systems Group Test (Edition 2)�. Technical Report, Oakwood House, Wennington, Cambridgeshire, December 2001.
[30] V. Paxson. Bro: a system for detecting network intruders in real-time. USENIX Security Symposium, pages 31--51. USENIX Association, 1998.
[31] P. A. Porras and P. G. Neumann. �EMERALD: Event Monitoring Enabling Responses to Anomalous Live Disturbances�, Proc. of the 19th National Computer Security Conference, pages 353-365, Baltimore, Maryland, 22-25 October 1997.
[32] M. Qin and K. Hwang. �Effectively Generating Frequent Episode Rules for Anomaly-based Intrusion Detection�, submitted to 2004 IEEE Symposium on Security and Privacy, Oakland, CA.
[33] M. J. Ranum. �Experiences Benchmarking Intrusion Detection Systems�, NFR Security White Paper, December, 2001.
[34] L. Robert. �Slapdash monster roams the Net�, http://zdnet.com.com/2100-1105_2-5062998.html, CNET News.com, August 13, 2003.
[35] M. Roesch, “Snort - lightweight intrusion detection for networks”. Proc. of USENIX Thirteenth
Systems Administration Conference (LISA '99), pages 229-238, Berkeley, California. 1999.
[36] T. Ryutov, C. Neuman, D. Kim, and L. Zhou, “Integrated Access Control and Intrusion Detection
for Web Servers”, IEEE Transactions on Parallel and Distributed Systems. Vol. 14, No 9, pp.
841-850, September 2003.
[37] P. Sommer. “Intrusion detection systems as evidence”. Computer Networks: International Journal
of Computer and Telecomm. Networking, Vol.31 No..23-24, p.2477-2487, Dec. 1999
Dec.17, 2003 Page 35 of 35
[38] R. Sekar, Y. Guang, S. Verma, and T. Shanbhag. “A High-Performance Network Intrusion
Detection System”. Proc. of ACM Conf. Computer and Comm. Security, pages 8-17, Nov. 1999.
[39] H. S. Teng, K. Chen, and S. Lu, �Adaptive Real-time Anomaly Detection using Inductively Generated Sequential Patterns�, IEEE Proc. of Symposium in Security and Privacy, Oakland, CA., pages 278-284, May 7-9, 1990.
Biographical Sketches:
Min Qin is presently pursuing his Ph.D. degree in the Computer Science Department at the
University of Southern California. He received his B.E. and M.E. degrees in Computer
Science from Shanghai Jiaotong University in China. His current research interest includes
Internet security, datamining, distributed systems, and database systems. He can be reached at
Kai Hwang is a Professor and Director of Internet and Grid Computing Laboratory at the
University of Southern California (USC). He received the Ph.D. from the University of
California, Berkeley. An IEEE Fellow, he specializes in computer architecture, parallel
processing, Internet and wireless security, and distributed computing systems. He has authored
or coauthored 6 scientific books and 170 Journal/conference papers in the above areas.
Dr. Hwang is the founding Editor-in-Chief of the Journal of Parallel and Distributed
Computing. He has performed advisory and consulting work for IBM Fishkill, Intel SSD, MIT
Lincoln Lab., ETL in Japan, and GMD in Germany. Presently, he leads a USC research group
in developing distributed intrusion detection and response system for protecting clusters,
Intranets, and Grid resources. The NetShield software and GridSec testbed are currently under
construction at USC for trusted grid, cluster, and Internet computing. He can be reached at