Qin and Hwang, USC, Dec.17, 2003 Page 1 of 35

Anomaly-based Intrusion Detection from Traffic Datamining on Internet Connections*

Min Qin and Kai Hwang

Internet and Grid Computing Laboratory, EEB 231, University of Southern California, Los Angeles, CA 90089-2562

Abstract: In this paper, we present a new datamining approach to generating frequent episode rules for the construction of anomaly-based intrusion detection systems (IDS). These rules are derived from normal network traffic profiles. An anomaly is detected when the rules mined from current traffic deviate significantly from the normal patterns. Three rule pruning techniques are devised to reduce the rule search space by 50-80%. This reduction makes datamining viable for detecting unknown network attacks and accelerates the entire process of machine learning and profile matching for intrusion detection. Testing our new scheme on the DARPA 1999 IDS evaluation data sets, we find a 13% reduction in false alarms over 50 network attack incidents.

The network episode rules reveal inter-relationships among sequences of network connection events. We detect unknown attacks embedded in telnet, http, ftp, smtp, and other requests of TCP, UDP, or ICMP connections. Our IDS achieves intrusion detection rates of up to 47% for DoS (denial of service), R2L (remote-to-local), and probe attacks. Our scheme detects many attacks that Snort cannot detect, including smurf, Apache2, Guesstelnet, Dict, Neptune, and Udpstorm. We recommend using the proposed anomaly detection scheme jointly with a signature-based IDS to yield even better results. These results demonstrate the viability of using the new scheme to build automated, real-time intrusion detection and response systems.

Index Terms: Network security, intrusion detection, traffic datamining, anomaly detection, association rules, frequent episode rules, false alarm rate, Snort evaluation, and distributed Grid computing.

_________________________________________________

* Manuscript submitted to IEEE Transactions on Dependable and Secure Computing, Dec. 19, 2003. This paper was extended significantly from a preliminary version, "Effectively Generating Frequent Episode Rules for Anomaly-based Intrusion Detection", submitted to the 2004 IEEE Symposium on Security and Privacy for consideration for presentation. This work was supported by NSF/ITR Grant ACI-0325409. All rights are reserved by the coauthors. Min Qin can be reached via [email protected] and Kai Hwang by [email protected]

1. Introduction

Cyber crimes have become major threats to Internet computing, web services, and Grid services. Network security cannot be assured if unwanted intrusions are not stopped or removed in a timely manner. In August 2003, the outbreak of the MS Blast worm left millions of machines defenseless and disrupted Internet services [34]. An effective intrusion detection system (IDS) should be able to detect such attacks at an early stage, so that alarms are raised in time to prevent major damage to network or client resources [37].

Extensive research on the design and evaluation of IDSs has been reported in the past. The NSS Group [29] in the UK has evaluated various commercial IDSs from security companies. Gaffney et al. [12] proposed a decision-theoretic approach to evaluating IDSs. Axelsson [3] studied how to reduce the false alarm rate of IDSs, identifying the base-rate fallacy and implementation barriers. Ryutov et al. [36] introduced the integration of access control and intrusion detection. Other recent studies on IDSs can be found in Burroughs et al. [8], Gopalakrishna [13], Ranum [33], and Sekar et al. [38].

According to the detection method used, IDSs are generally classified into two major categories: signature-based and anomaly-based. A signature-based IDS applies a misuse-detection model, in which traffic is checked against stored signatures (characteristics) of previously detected attacks. Snort [35] and STAT [18] are good examples of this kind of IDS. Like most anti-virus packages, the misuse model is based on pattern matching, which is effective only against known attacks whose signatures have been collected.

Anomaly-based IDSs rely on a normal-use detection model. Good examples are IDES [22] and EMERALD [31]. The normal-use model compares incoming traffic with the characteristics of normal network behavior to reveal any significant deviations. The advantage of anomaly detection lies in its ability to cope with unknown attack patterns. Its major drawback is a higher false alarm rate than signature matching [2]. Most existing anomaly-based IDSs concentrate on detecting traffic anomalies. Other techniques, such as detecting anomalies in packet headers, were reported by Mahoney [24].

Datamining [11] is a commonly explored technique for building anomaly-based IDSs [28]. To distinguish between intrusive and normal behavior, algorithms are needed to generate association rules [1] or frequent episode rules (FERs) [25] from audit traffic data. The generation of FERs using minimal occurrences originated with the work of Mannila and Toivonen [26]. Association rules are often used to capture intra-record patterns, while FERs are used to detect inter-record patterns. With a huge number of audit records, datamining often generates many long FERs with a high degree of redundancy. In this paper, we aim to remove such redundant or ineffective episode rules.

Statistically generated sequential rules for detecting anomalies were introduced in [39]. Instead of datamining, that work used a time-based inductive learning machine that adapts to changes in normal user behavior. An anomaly is detected when a sequence of events deviates significantly from the normal sequential rules. Hofmeyr et al. [15] used a similar approach, analyzing sequences of system calls to detect intrusions. In [19], Lane et al. transformed discrete temporal sequences into a metric space and used a clustering technique to reduce the size of the user model.

With datamining, there are several approaches to effective IDS construction [6, 20]. In earlier work, Lee et al. [20] used axis and reference attributes to constrain the number of rules generated, which reduces the number of rules to some extent. The JAM (Java Agent for Meta-learning) project [20] uses datamining to generate rules that provide temporal features. JAM uses RIPPER [9] to build classifiers that detect attack signatures; it is therefore essentially a misuse IDS. Fan et al. [10] extended Lee's work by introducing artificial anomalies to discover accurate boundaries between known classes and anomalies.

Bridge et al. [7] applied fuzzy frequent episodes and fuzzy association rules to the problem of intrusion detection. The ADAM project [4, 5] offered a datamining framework for detecting network intrusions. Unlike JAM, ADAM is an anomaly-based detection system. ADAM uses a sliding window to scan for frequent associations in TCP connection data and compares these associations with previously constructed normal profiles. ADAM can detect novel attacks with a low false alarm rate through a pseudo-Bayes estimator. In this paper, we further reduce the applicable FER rule space. Our method differs from both Lee's scheme and ADAM in its use of an FER-matching methodology.

The rest of the paper is organized as follows. In Section 2, we introduce basic techniques for mining audit data and revisit axis and reference attributes. Section 3 presents an anomaly-based IDS architecture with datamining capabilities. There we introduce a base-support algorithm for generating useful FERs to detect intrusions, show that it compares favorably with the level-wise algorithm developed by Lee's group [20], and justify its advantages.

In Section 4, three pruning techniques are introduced to reduce the FER search space. These pruning laws are illustrated with concrete traffic connection events. In Section 5, we propose a new algorithm that prunes ineffective episode rules by systematically applying the reduction laws, and we also outline the anomaly generation process. In Section 6, the experimental results are reported in terms of intrusion detection rate and false alarm rate. Finally, we summarize the contributions and suggest directions for further research.

2. Mining of Audit Profiles in Network Traffic

To build an effective network IDS, we use datamining to find the patterns of both normal and intrusive behavior in system audit data. We adopt the axis and reference attributes introduced by Lee et al. [20], since they incorporate domain-specific knowledge and can describe relationships among traffic records. The datamining tasks are expressed by either association rules or frequent episode rules. An association rule aims at finding interesting intra-relationships inside a connection record, whereas an FER describes the inter-relationships among multiple connection records.

2.1 Association Rules vs. Frequent Episode Rules

Let T be a set of traffic connection records and A be a set of attributes defined over the connections. For example, A can be chosen as {timestamp, duration, service, srchost, desthost} for TCP connections. Let I be a set of values defined on A, such as I = {timestamp = 10, duration = 1, service = http, srchost = 128.125.1.1, desthost = 128.125.1.10}. Any subset of I is called an itemset; it represents characteristics of connection events.

Let X be a traffic itemset (event) under evaluation. The support value of X, denoted Support(X), is defined as the percentage of connection records in T that satisfy X. For example, X = {timestamp = 10, duration = 1} is an itemset and Y = {service = http} is another itemset. In this example, $X \cap Y = \emptyset$. The union of the two itemsets, $X \cup Y$ = {timestamp = 10, duration = 1, service = http}, represents the characteristics of the three traffic attributes listed.

Association Rules: An association rule is defined between two disjoint traffic itemsets X and Y, with $X \cap Y = \emptyset$. The rule is denoted by:

$$X \rightarrow Y, \quad (c, s) \qquad (1.a)$$

The association rule is characterized by a support value s and a confidence level c. These are probabilities of the corresponding traffic events, defined by:

$$s = \mathrm{Support}(X \cup Y) \quad \text{and} \quad c = \frac{\mathrm{Support}(X \cup Y)}{\mathrm{Support}(X)} \qquad (1.b)$$

Both s and c are fractions computed directly from the Support function applied to the itemset X and to the joint itemset $X \cup Y$, as exemplified above.
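A minimal Python sketch of (1.a) and (1.b) follows. It assumes, for illustration only, that a connection record is a dict of attribute values and an itemset is a dict of required attribute=value pairs; the function names `support` and `association_rule` are hypothetical. The toy data are constructed to reproduce the (0.8, 0.1) http rule of Example 1.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]   # one connection record, e.g. {"service": "http", "duration": 1}
Itemset = Dict[str, object]  # attribute=value pairs that a record must satisfy


def support(records: List[Record], itemset: Itemset) -> float:
    """Fraction of records that satisfy every attribute=value pair in `itemset`."""
    if not records:
        return 0.0
    hits = sum(all(r.get(k) == v for k, v in itemset.items()) for r in records)
    return hits / len(records)


def association_rule(records: List[Record], x: Itemset, y: Itemset) -> Tuple[float, float]:
    """Return (c, s) for the rule X -> Y as defined in (1.b)."""
    # Simplifying assumption: X and Y constrain different attributes,
    # which guarantees that the two itemsets are disjoint.
    if set(x) & set(y):
        raise ValueError("X and Y must constrain different attributes")
    s = support(records, {**x, **y})  # Support(X u Y)
    sx = support(records, x)          # Support(X)
    c = s / sx if sx else 0.0
    return c, s


# Toy data: 40 connections, 5 of them http, 4 of those with duration = 1.
records = ([{"service": "http", "duration": 1}] * 4
           + [{"service": "http", "duration": 5}]
           + [{"service": "smtp", "duration": 1}] * 35)
c, s = association_rule(records, {"service": "http"}, {"duration": 1})
print(round(c, 2), round(s, 2))  # 0.8 0.1
```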

Frequent Episode Rules: In general, an FER is expressed as:

$$L_1, L_2, \ldots, L_n \rightarrow R_1, \ldots, R_m, \quad (c, s, \mathit{window}) \qquad (2.a)$$

where $L_i$ ($1 \le i \le n$) and $R_j$ ($1 \le j \le m$) are ordered itemsets in a traffic record set T. We call $L_1, L_2, \ldots, L_n$ the LHS (left-hand side) episode and $R_1, \ldots, R_m$ the RHS (right-hand side) episode of the rule. Note that all itemsets are sequentially ordered; that is, $L_1, L_2, \ldots, L_n, R_1, \ldots, R_m$ must occur in the order listed. However, other itemsets may be embedded within the episode sequence. We define the support and confidence of rule (2.a) by the following two expressions:

$$s = \mathrm{Support}(L_1 \cup L_2 \cup \cdots \cup L_n \cup R_1 \cup \cdots \cup R_m) \ge s_0 \qquad (2.b)$$

$$c = \frac{\mathrm{Support}(L_1 \cup L_2 \cup \cdots \cup L_n \cup R_1 \cup \cdots \cup R_m)}{\mathrm{Support}(L_1 \cup L_2 \cup \cdots \cup L_n)} \ge c_0 \qquad (2.c)$$

We consider the minimal occurrences [26] of the episode sequence in the entire traffic stream. The support value s is the number of minimal occurrences of the whole episode in (2.b), expressed as a percentage of the total number of traffic records audited. The confidence level c is the ratio of the support of the joint episode (LHS followed by RHS) to the support of the LHS episode alone, i.e., the conditional probability that the RHS follows an occurrence of the LHS. Both parameters are lower-bounded by $s_0$ and $c_0$, the minimum support value and the minimum confidence level, respectively. The window size is an upper bound on the time span of the entire episode sequence.

Example 1: Association rules and episode rules

Consider the following association rule for an http connection event:

(service = http) → (duration = 1) (0.8, 0.1)

The rule indicates that 80% of all http connections have a duration of less than one second, and that http connections with a duration of less than one second account for 10% of all network connections.

Now, consider the following frequent episode rule for a sequence of network events:

(service = authentication) → (service = smtp) (service = smtp) (0.6, 0.1, 2 sec)

This rule describes what follows an authentication event. If the authentication service is requested at time t, there is a confidence level of c = 60% that two smtp services will follow before time t + w, where the event window w = 2 sec. The episode of the three traffic events (service = authentication), (service = smtp), (service = smtp) accounts for 10% of all network connections. □
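The minimal-occurrence counting behind (2.b) and (2.c) can be sketched in Python as follows. This is an illustration under our own assumptions, not the authors' implementation: records are dicts sorted by a numeric `time` field, an itemset matches a record when all of its attribute=value pairs agree, support is normalized by the total number of records, and the helper names are hypothetical. The toy stream applies the authentication rule of Example 1 with w = 2 sec; its confidence comes out as 2/3 rather than 0.6 because the data are invented.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]   # connection record with a numeric "time" field
Itemset = Dict[str, object]  # attribute=value pairs a record must match


def matches(record: Record, itemset: Itemset) -> bool:
    return all(record.get(k) == v for k, v in itemset.items())


def minimal_occurrences(records: List[Record], episode: List[Itemset],
                        window: float) -> List[Tuple[int, int]]:
    """(start, end) indices of minimal occurrences of a serial episode whose
    time span is at most `window`; `records` must be sorted by time."""
    found: List[Tuple[int, int]] = []
    used_starts = set()
    for end, rec in enumerate(records):
        if not matches(rec, episode[-1]):
            continue
        # Match the remaining itemsets backwards as late as possible: this gives
        # the latest start of any occurrence ending exactly at `end`.
        pos, k = end, len(episode) - 2
        while k >= 0 and pos > 0:
            pos -= 1
            if matches(records[pos], episode[k]):
                k -= 1
        if k >= 0:
            continue  # the episode does not end here
        # Latest starts never decrease as `end` grows, so a repeated start means
        # this interval contains an earlier, shorter occurrence.
        if pos in used_starts:
            continue
        used_starts.add(pos)
        if records[end]["time"] - records[pos]["time"] <= window:
            found.append((pos, end))
    return found


def fer_confidence_support(records, lhs, rhs, window):
    """(c, s) of the rule lhs -> rhs following (2.b)-(2.c)."""
    joint = len(minimal_occurrences(records, lhs + rhs, window))
    left = len(minimal_occurrences(records, lhs, window))
    s = joint / len(records) if records else 0.0
    c = joint / left if left else 0.0
    return c, s


services = [(0, "authentication"), (1, "smtp"), (1.5, "smtp"), (5, "http"),
            (10, "authentication"), (11, "smtp"), (20, "smtp"),
            (30, "authentication"), (31, "smtp"), (31.5, "smtp")]
stream = [{"time": t, "service": svc} for t, svc in services]
auth, smtp = {"service": "authentication"}, {"service": "smtp"}
print(minimal_occurrences(stream, [auth, smtp, smtp], window=2))  # [(0, 2), (7, 9)]
c, s = fer_confidence_support(stream, [auth], [smtp, smtp], window=2)
print(round(c, 3), s)  # 0.667 0.2
```

The authentication request at t = 10 is followed by a second smtp only at t = 20, outside the 2-second window, so it contributes to the LHS count but not to the joint episode.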

The traffic connections on the two sides of an FER need not be disjoint in an episode sequence of events. Episode rules can be used to characterize attacks. For example, the SYN flood attack is specified by the following episode rule:

(service = http, flag = S0) (service = http, flag = S0) → (service = http, flag = S0)

where the event (service = http, flag = S0) is an association. Flag "S0" means that only the SYN packet was seen for a particular connection. The combination of associations and FERs reveals useful information on normal and intrusive behaviors. These rules can be applied to build IDSs that defend against both known and unknown attacks.

2.2 Axis Attributes vs. Reference Attributes

The basic rule generation algorithm does not take any domain-specific knowledge into consideration, so it often generates too many ineffective rules to be useful. For example, the association rule srcbytes = 200 → destbytes = 300 is of little interest to the intrusion detection process, since the numbers of bytes sent by the source (srcbytes) and the destination (destbytes) are irrelevant to the traffic and threat conditions.

To address this issue, Lee et al. [20] introduced the concepts of axis attributes and reference attributes to constrain the generation of mining rules. Each association rule must contain values of some axis attributes; association rules that contain no axis attribute are considered irrelevant to the context. Axis attributes are selected from essential attributes [20] such as srchost (source host), desthost (destination host), srcport (source port), and service (destination port).

Different combinations of the essential attributes form the axis attributes. We also incorporate the connection flag as an essential attribute, since some flag values are rare in daily network traffic and hard to mine. However, the flag has to be combined with at least one other essential attribute to form axis attributes. All itemsets, or traffic events, in an FER must contain some axis attributes. A reference attribute requires the itemsets of an episode to have the same reference value, for example the same source host.
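The axis and reference constraints amount to simple filters on candidate itemsets and occurrences. The Python sketch below is illustrative only and does not reproduce Lee et al.'s code: itemsets are dicts, the axis attributes are passed as a set, the function names are hypothetical, and, following the rule stated above, `flag` satisfies the axis requirement only when another axis attribute is also present.

```python
from typing import Dict, Iterable, List, Set

Itemset = Dict[str, object]


def has_axis(itemset: Itemset, axis: Set[str]) -> bool:
    """True if the itemset constrains at least one axis attribute other than flag
    (flag must always be combined with another essential attribute)."""
    return bool((set(itemset) & axis) - {"flag"})


def episode_has_axis(episode: Iterable[Itemset], axis: Set[str]) -> bool:
    """Every itemset (traffic event) of a candidate FER must contain axis attributes."""
    return all(has_axis(item, axis) for item in episode)


def same_reference(occurrence: List[Dict[str, object]], reference: str) -> bool:
    """The records forming one occurrence must share the reference value, e.g. srchost."""
    return len({rec.get(reference) for rec in occurrence}) == 1


axis = {"service", "flag"}
print(has_axis({"srcbytes": 200, "destbytes": 300}, axis))              # False: no axis attribute
print(has_axis({"flag": "S0"}, axis))                                   # False: flag alone
print(episode_has_axis([{"service": "http", "flag": "S0"}] * 3, axis))  # True
```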

3. Datamining for Anomaly Intrusion Detection

Our long-term goal is to build an intelligent intrusion detection system that can help secure distributed computing infrastructures such as Grid systems. The system should detect not only known intrusion patterns but also novel, unknown intrusions. To achieve this objective, we use datamining to profile frequent network patterns for detecting anomalies.

3.1 The Network Datamining Architecture

The three major components of our IDS, shown in Fig. 1, are the datamining engine, the intrusion detection engine, and the alarm generation engine. In this paper, we use the normal profile database and construct the anomaly detection engine; alarm generation is beyond the scope of this paper. To detect intrusion patterns correctly, we extract two levels of information, connection level and packet level, from the raw audit data of the network traffic. Although connection-level information is very effective against flood and scan attacks, it can detect only a small portion of attacks. Most R2L and U2R (user-to-root) attacks cannot be discovered at the connection level.

Note that we combine anomaly-based intrusion detection with the signature-based detection mechanism in Figure 1. An attack can be detected by either mechanism, whichever confirms the intrusion first. Once an intrusion anomaly is discovered in the traffic profile, its signature is added to the signature database. Initially, we generate the episode rules from the 1999 MIT Lincoln Laboratory IDS evaluation data sets [14, 17]. Eventually, we will update the database and extend the rules using more recent traffic connection records. The whole traffic database will be updated periodically for experimental purposes at USC.

[Figure 1 block diagram with components: audit data, data preprocessor, feature extraction, data mining engine, attack-free episode rules, normal profile database, rules from real-time traffic, anomaly detection engine, signature database, intrusion detection engine, alarm generator, alarm generation, security policy.]

Figure 1 Our datamining architecture for anomaly-based intrusion detection

We also add features extracted from packet-level data to detect new attacks. To do so, we use the Bro IDS tool [30] to extract features from both connection and packet information. Table 1 lists the key features generated for all traffic connections.

Table 1 Key Features Extracted from Traffic Connection Records

| Feature Name | Description |
|---|---|
| Timestamp | Time when the first packet of the connection is seen |
| Duration | Length of the connection in seconds (ignored for UDP packets) |
| Srchost | IP address of the source host |
| Srcport | Port number of the source host |
| Srcbyte | Number of bytes sent by the source host |
| Destbyte | Number of bytes sent by the destination host |
| Desthost | IP address of the destination host |
| Destport | Port number of the destination host |
| Flag | Connection status flag. Typical values: SF, both SYN and FIN packets were seen for the connection; S0, only the SYN packet was seen for a TCP connection; REJ, the connection was rejected by the destination |
| Urgent | Number of urgent flags in the connection |
| Frag_Error | Number of fragment errors in the connection |

For each connection, we check whether it violates any RFC protocol specification. For example, compliance with the TCP three-way handshake can easily be verified by examining the packets that establish the connection. During the preprocessing stage, packets with infrequent properties are also identified for the purpose of anomaly detection. We keep a strong interest in such infrequent attribute values, since attackers often exploit them. For example, packets with the same source and destination address normally indicate a potential attack. To detect more R2L attacks, we extract the functional features from the network traffic listed in Table 2.

Table 2 Functional Features Extracted for Specific Services

| Feature Name | Description |
|---|---|
| Login_Failed | Whether a login request to ftp, telnet, rlogin, or similar services failed |
| Sensitive_files | Sensitive files visited or created by the user, e.g., .rhosts, .password |
| http_request | Number of http requests in an http connection |
| Privileged port | Whether the srcport is a privileged port (port number < 1024); TCP connections only |
| Guest | Whether the user logged in as guest/anonymous, mostly for ftp connections |
| Root_login | Whether the user logged in as root |
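Several of the preprocessing checks described above, and the privileged-port feature of Table 2, reduce to simple predicates. The Python sketch below assumes a hypothetical packet representation (dicts with `src`, `dst`, and a set of TCP `flags`) and hypothetical function names; it is not Bro's interface, and a complete handshake check would also verify sequence and acknowledgment numbers.

```python
from typing import Dict, List

Packet = Dict[str, object]  # hypothetical: {"src": ip, "dst": ip, "flags": {"SYN", ...}}


def same_src_dst(pkt: Packet) -> bool:
    """Infrequent property: identical source and destination addresses."""
    return pkt["src"] == pkt["dst"]


def valid_handshake(packets: List[Packet]) -> bool:
    """Check that a TCP connection opens with SYN, SYN/ACK, ACK between two hosts."""
    if len(packets) < 3:
        return False
    syn, synack, ack = packets[:3]
    client, server = syn["src"], syn["dst"]
    return (syn["flags"] == {"SYN"}
            and (synack["src"], synack["dst"]) == (server, client)
            and synack["flags"] == {"SYN", "ACK"}
            and (ack["src"], ack["dst"]) == (client, server)
            and "ACK" in ack["flags"] and "SYN" not in ack["flags"])


def privileged_srcport(srcport: int, protocol: str) -> bool:
    """Table 2 'Privileged port' feature: source port below 1024, TCP only."""
    return protocol == "tcp" and srcport < 1024


opening = [{"src": "10.0.0.1", "dst": "10.0.0.2", "flags": {"SYN"}},
           {"src": "10.0.0.2", "dst": "10.0.0.1", "flags": {"SYN", "ACK"}},
           {"src": "10.0.0.1", "dst": "10.0.0.2", "flags": {"ACK"}}]
print(valid_handshake(opening))          # True
print(valid_handshake(opening[:1] * 3))  # False: SYN repeated, no SYN/ACK
print(privileged_srcport(80, "tcp"), privileged_srcport(80, "udp"))  # True False
```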

Instead of comparing frequent episodes, we use FERs as indicators for detecting anomalies, since they describe the relationships among a series of connections. If an FER generated by the datamining engine from current traffic deviates significantly from all normal FERs, an alarm is raised. We then calculate temporal statistics from the current traffic data to analyze the connections.

3.2 A New Network Datamining Algorithm

Most mining techniques exclude infrequent traffic patterns, which can make an IDS ineffective in detecting rare network events. For example, the authentication service is infrequent in ordinary network traffic. If we lower the support threshold to capture it, a large number of uninteresting patterns associated with frequent services are discovered as well. Lee et al. [20] used level-wise mining to lower the minimum support value iteratively.

Initially, they use a high minimum support value $s_i$ to find the episodes related to high-frequency axis attribute values. The procedure then halves the support threshold at each iteration and requires each new candidate itemset to contain at least one "new" axis value. The procedure terminates when a very small threshold $s_0$ is reached.
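The halving schedule of the level-wise procedure, and the iteration at which each axis value first becomes "new", can be illustrated with a short Python sketch. The helper names and the axis support values are made up for illustration.

```python
from typing import Dict, List


def levelwise_thresholds(s_init: float, s_min: float) -> List[float]:
    """Support thresholds s_i, s_i/2, s_i/4, ... down to (but not below) s_0."""
    thresholds, s = [], s_init
    while s >= s_min:
        thresholds.append(s)
        s /= 2
    return thresholds


def new_axis_values(axis_support: Dict[str, float], thresholds: List[float]) -> List[List[str]]:
    """For each iteration, the axis values that first become frequent there; every
    candidate itemset at that level must contain at least one of them."""
    levels, previous = [], float("inf")
    for t in thresholds:
        levels.append(sorted(v for v, sup in axis_support.items() if t <= sup < previous))
        previous = t
    return levels


ths = levelwise_thresholds(0.3, 0.01)  # [0.3, 0.15, 0.075, 0.0375, 0.01875]
print(new_axis_values({"http": 0.5, "smtp": 0.2, "telnet": 0.05, "auth": 0.02}, ths))
# [['http'], ['smtp'], [], ['telnet'], ['auth']]
```

An infrequent value such as `auth` is reached only at the fifth threshold, after four halvings of $s_i$.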

The distribution of services in the training data set strongly affects Lee's algorithm. We tested Lee's level-wise algorithm on the 1999 DARPA intrusion data set [14, 17]. The results show that some FERs contain unrelated associations. For example, the following rule was mined with service as the axis attribute and srchost as the reference attribute:

(service = http, flag = SF) (service = http, flag = SF, srcbyte = 5000) (service = telnet, flag = SF) → (service = http, flag = SF, srcbyte = 5000) (0.7, 0.0025)

A telnet operation is normally unrelated to an http operation, so this episode rule is not useful for describing normal traffic patterns. Although telnet is a frequent service, episode rules related to telnet are rare; the above rule has the highest support value among all rules involving telnet. If individual connections are independent of each other, it is likely that a common service appears in an episode rule only with an extremely low support value.

We introduce a base-support mining algorithm to address this problem. For a frequent episode X, we define the base-support as the ratio of the number of minimal occurrences that contain X to the number of records containing the least frequent axis attribute value in X. For an FER to be generated, its base-support value must exceed a threshold value.
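Written as a formula, with notation of our own ($\mathrm{mo}(X)$ for the set of minimal occurrences of $X$ and $N(v)$ for the number of records containing axis value $v$), the base-support condition reads:

```latex
\text{base-support}(X) = \frac{|\mathrm{mo}(X)|}{\min_{v \in \mathrm{axis}(X)} N(v)} > s
```

Dividing numerator and denominator by |T| gives $\mathrm{Support}(X) > s \times \min\{s_{axis}\}$, which is the test applied in step (4) of Algorithm 1.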

To construct normal network patterns, the attack-free training data of the first and third weeks of the DARPA 1999 intrusion data set [17, 23] are fed into our base-support mining engine. We use a simplified approach to merging frequent episode rules from multiple days: after finding the FERs in each day's audit records, we merge them into one large rule set and remove the redundant rules.

Our base-support mining algorithm is specified below. The inputs are the base-support threshold value and all axis attribute values. The output of the mining process is the set of FERs generated from the traffic connection records. The algorithm involves rule support calculation, database scanning, sorting of serial traffic events, and finally the generation of meaningful frequent episode rules.

Algorithm 1: Base-Support Traffic Datamining

Input: base-support threshold value s and the axis attribute(s)

Output: frequent episode rules

Begin

(1) For all axis attribute values, calculate their support in the database

(2) Scan the database to form L = {large 1-itemsets that meet s × s_axis, where s_axis is the support value of the axis attribute(s) of the 1-itemset}

(3) While (there are new rules generated) do begin

(4) Find serial episodes from L: each episode must have a support value larger than s × min{s_axis}, where min{s_axis} is the minimum support value of all axis attribute(s) in the episode

(5) Append the generated episode rules to the output rule set

(6) end while

End
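A compact Python sketch of Algorithm 1 follows. It rests on simplifying assumptions that are ours, not the paper's: each connection is reduced to a tuple of chosen attribute values, a single axis attribute is used, the reference-attribute constraint is omitted, episodes grow by appending one large 1-itemset at a time up to a hypothetical `max_len`, the loop runs while new frequent episodes appear, and, anticipating Theorem 1 in Section 4, only the rule with the shortest LHS reaching the minimum confidence `c0` is emitted per frequent episode. The toy stream is invented.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Event = Tuple[str, ...]      # values of the chosen attributes for one connection
Episode = Tuple[Event, ...]  # serial episode: events that must occur in this order


def count_mo(times: Sequence[float], events: Sequence[Event],
             episode: Episode, window: float) -> int:
    """Count minimal occurrences of `episode` spanning at most `window` seconds."""
    count, used_starts = 0, set()
    for end, ev in enumerate(events):
        if ev != episode[-1]:
            continue
        pos, k = end, len(episode) - 2
        while k >= 0 and pos > 0:  # backward greedy match: latest start ending here
            pos -= 1
            if events[pos] == episode[k]:
                k -= 1
        if k >= 0 or pos in used_starts:  # no occurrence here, or not minimal
            continue
        used_starts.add(pos)
        if times[end] - times[pos] <= window:
            count += 1
    return count


def base_support_mining(records: List[Dict], attrs: Sequence[str], axis: str,
                        s: float, c0: float, window: float, max_len: int = 4):
    """Return (LHS, RHS, confidence, support) tuples mined under the base-support rule."""
    if not records:
        return []
    records = sorted(records, key=lambda r: r["time"])
    times = [r["time"] for r in records]
    events = [tuple(r[a] for a in attrs) for r in records]
    n, ax = len(records), list(attrs).index(axis)

    # Step (1): support of every axis attribute value.
    axis_sup = {v: cnt / n for v, cnt in Counter(e[ax] for e in events).items()}

    def sup(ep: Episode) -> float:
        return count_mo(times, events, ep, window) / n

    def min_axis(ep: Episode) -> float:
        return min(axis_sup[e[ax]] for e in ep)

    # Step (2): large 1-itemsets, each judged against its own axis value's support.
    large = [(e,) for e in sorted(set(events)) if sup((e,)) >= s * min_axis((e,))]

    rules, frontier = [], large
    # Steps (3)-(6): extend serial episodes while new frequent ones are found.
    while frontier and len(frontier[0]) < max_len:
        grown = []
        for ep in frontier:
            for (e,) in large:
                cand = ep + (e,)
                sc = sup(cand)
                if sc <= s * min_axis(cand):  # base-support test of step (4)
                    continue
                grown.append(cand)
                for i in range(1, len(cand)):  # shortest LHS reaching c0 (Theorem 1)
                    denom = sup(cand[:i])
                    if denom and sc / denom >= c0:
                        rules.append((cand[:i], cand[i:], sc / denom, sc))
                        break
        frontier = grown
    return rules


# Toy connection stream (service only); the values are illustrative, not DARPA data.
conns = [(0, "auth"), (1, "smtp"), (1.5, "smtp"), (10, "auth"),
         (11, "smtp"), (11.5, "smtp"), (20, "http"), (40, "http")]
records = [{"time": t, "service": svc} for t, svc in conns]
rules = base_support_mining(records, ["service"], "service", s=0.5, c0=0.6, window=2)
for lhs, rhs, conf, support in rules:
    print(lhs, "->", rhs, round(conf, 2), support)
# (('auth',),) -> (('smtp',),) 1.0 0.25
# (('auth',),) -> (('smtp',), ('smtp',)) 1.0 0.25
```

Note how the threshold scales with the rarest axis value in each episode: auth, smtp passes with support 0.25 > 0.5 × 0.25, while smtp, smtp is rejected because its support of 0.25 does not exceed 0.5 × 0.5.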

Figure 2 shows the results of the base-support and level-wise mining algorithms for all TCP connections in two weeks of attack-free data. For both algorithms, the minimum confidence value is 0.6 and the window size is 30 seconds. We choose the source host as the reference attribute and the service as the axis attribute.

In the level-wise mining algorithm, increasing the initial support value does not necessarily reduce the number of frequent episode rules generated. When the training set contains 10 days of network data, the level-wise algorithm with an initial support value of 0.3 generates more rules than with an initial value of 0.1. Because the support threshold is halved at each iteration, the initial value has less and less impact after several iterations. It is therefore hard for the user to control the generation of rules related to infrequent axis attributes.

On the other hand, our base-support mining algorithm is "fair" to different axis attribute values, since a candidate rule must cover the same percentage of the records related to its axis attribute values, whichever values those are. A higher minimum base-support value results in fewer frequent episode rules. Relative to the records of each axis attribute value, the base-support mining algorithm thus applies the same minimum support as a conventional datamining algorithm does.

[Figure 2 plot: number of episode rules generated (y-axis, 0 to 400) versus training set size in days (x-axis, 1 to 10), with four curves: base-support algorithm with minimum base-support = 0.1; base-support algorithm with minimum base-support = 0.3; level-wise algorithm with initial support value = 0.1; level-wise algorithm with initial support value = 0.3.]

Figure 2. Experiment on attack-free TCP connections of the 1999 DARPA Intrusion Detection Evaluation data set using our base-support mining algorithm and the level-wise algorithm, with a minimum confidence value of 0.6 and a window size of 30 sec; reference attribute = srchost, axis attribute = service

4. Episode Rule Transformation Laws

Because of the large number of records in the tcpdump data, our base-support mining still generates a large number of uninteresting rules. To reduce the number of rules generated and to provide a simplified view of the data patterns, we propose the following pruning techniques for reducing the rule space. Without reduction, the rule search space may grow too large to be practical, even for a single day's collection of network traffic records. We consider an FER effective if it is widely applicable and frequently used; an episode rule is ineffective if it is rarely used in detecting anomalies in network traffic. The following traffic laws generally apply to today's open network environment.

4.1 Transposition of Episode Rules

Some FERs differ from each other only in the position of "→", which separates the LHS from the RHS of the rule. Keeping all of those rules would greatly increase the search space. We use the following law to remove such redundant rules.

Theorem 1: Transposition Law

Compare the following two frequent episode rules:

$$L_1, L_2, \ldots, L_n \rightarrow R_1, \ldots, R_m \quad (c_1, s_1) \qquad (4.a)$$

$$L_1, L_2, \ldots, L_{n-1} \rightarrow L_n, R_1, \ldots, R_m \quad (c_2, s_2) \qquad (4.b)$$

The second rule, Eq. (4.b), is obtained by transposing the event $L_n$ from the LHS to the RHS. We consider the second rule more effective than the first one.

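Proof: The two rules have the same support value $s_1 = s_2 = \mathrm{Support}(L_1, L_2, \ldots, L_n, R_1, \ldots, R_m)$, as defined in (2.b). However, their confidence levels differ. Since every occurrence of the episode $L_1, \ldots, L_n$ contains an occurrence of its prefix $L_1, \ldots, L_{n-1}$, the prefix is at least as frequent, i.e., $\mathrm{Support}(L_1 \cup \cdots \cup L_{n-1}) \ge \mathrm{Support}(L_1 \cup \cdots \cup L_n)$, and therefore:

$$c_1 = \frac{\mathrm{Support}(L_1 \cup \cdots \cup L_n \cup R_1 \cup \cdots \cup R_m)}{\mathrm{Support}(L_1 \cup \cdots \cup L_n)} \ge \frac{\mathrm{Support}(L_1 \cup \cdots \cup L_n \cup R_1 \cup \cdots \cup R_m)}{\mathrm{Support}(L_1 \cup \cdots \cup L_{n-1})} = c_2 \qquad (4.c)$$

The transposed rule in Eq. (4.b) thus has a confidence no larger than that of the original rule in Eq. (4.a). If the confidence level $c_2$ is at least the minimum value $c_0$, then $c_1$ is also at least $c_0$. Thus the first rule is always implied by the second, and we can prune it from the rule set. Q.E.D.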

We can generate many rules from a frequent episode as long as the minimum confidence level is satisfied. Theorem 1 specifies that, for a given frequent episode, we keep only the rule with the shortest LHS that satisfies the minimum confidence.
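To make Theorem 1 concrete, here is a minimal Python sketch (not the authors' implementation) that, for one serial frequent episode, scans the possible split points from left to right and returns the first rule whose confidence Support(episode) / Support(LHS) meets the minimum confidence. It assumes the supports of the episode and of all its prefixes are already available in a dictionary; that layout, the function name `shortest_lhs_rule`, and the counts in the usage lines are illustrative assumptions. Raw window counts stand in for base-support fractions, which leaves the confidence ratios unchanged.

```python
from typing import Dict, Optional, Tuple

Episode = Tuple[str, ...]


def shortest_lhs_rule(
    episode: Episode,
    support: Dict[Episode, float],
    min_conf: float,
) -> Optional[Tuple[Episode, Episode, float]]:
    """Return (lhs, rhs, confidence) for the shortest-LHS rule of `episode`.

    `support` is a hypothetical lookup holding the support of the full
    episode and of each of its prefixes. Moving the split point to the
    right can only shrink Support(lhs), so confidence never decreases;
    the first split that meets `min_conf` therefore has the shortest LHS,
    and every longer-LHS rule from the same episode is implied by it.
    """
    whole = support[episode]
    for split in range(1, len(episode)):
        lhs, rhs = episode[:split], episode[split:]
        conf = whole / support[lhs]
        if conf >= min_conf:
            return lhs, rhs, conf
    return None  # no rule from this episode meets the minimum confidence


# Hypothetical window counts for three consecutive normal http events.
http = "service=http,flag=SF"
counts = {(http,): 40, (http, http): 32, (http, http, http): 30}
print(shortest_lhs_rule((http, http, http), counts, 0.6))
```

With a stricter threshold such as 0.9, the first split (confidence 0.75) fails and the function falls back to the two-event LHS (confidence 0.9375), which mirrors the monotonicity argument in (4.c).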


Example 2: Application of transposition law

Of the following two FERs, the first is more effective than the second according to the transposition law. Both rules describe normal http behavior, so we need to include only the first one in our normal FER rule set:

(service = http, flag = SF) → (service = http, flag = SF), (service = http, flag = SF)

(service = http, flag = SF), (service = http, flag = SF) → (service = http, flag = SF)

During the detection phase, if the second rule appears in the output, we regard it as normal, since by the transposition law it is implied by the first rule in our normal rule set.

Because we generate only one FER per frequent episode, a large number of redundant comparisons are avoided. However, datamining may still generate longer rules that describe similar normal behavior. A good example is given below:

(service = http, flag = SF), (service = http, flag = SF) →

(service = http, flag = SF), (service = http, flag = SF).

Compared with the shorter rules above, this rule has the same power in describing normal http behavior and is therefore redundant. We remove redundancy in such rules using the law introduced below. □

4.2 Elimination of Episode Rules

Shorter rules, or rules with a shorter LHS (left-hand side), are considered more effective than longer rules or rules with a longer LHS, because shorter rules are usually easier to apply and to compare. Clustering shorter rules is also much easier. Thus, reducing long rules to shorter ones is very useful for enhancing the performance of an IDS.

Theorem 2: Rule Elimination Law

The following FER rule:

$$L_1, L_2 \rightarrow R_1 \quad (c_1, s_1) \tag{5.a}$$


becomes ineffective if either of the following two conditions is met:

(i) Rule L1 → R1 does not exist, and rule L2 → R1 (c2, s2) exists in the rule set, where

$$\frac{|c_1 - c_2|}{c_2} < \delta \tag{5.b}$$

(ii) Rule L2 → R1 does not exist, and rule L1 → R1 (c3, s3) exists in the rule set, where

$$\frac{|c_1 - c_3|}{c_3} < \delta \tag{5.c}$$

The parameter δ is called the similarity threshold; it sets the required degree of similarity between rule (5.a) and rule L2 → R1 (c2, s2) or rule L1 → R1 (c3, s3) above.

Proof: The rule L2 → R1 (c2, s2) specifies that if L2 occurs, R1 follows L2 with probability c2. Suppose L1 does not exclude R1, where "exclusion" means that if L1 occurs, then R1 does not occur in the same window. Then, if L2 occurs after L1, there is still a high probability that R1 will follow L2. If L1 and R1 are not closely related (i.e., rule L1 → R1 is absent from the rule set because of low support), there is no need to put L1 into the FER, and condition (5.b) guarantees that adding L1 to the LHS changes the confidence by less than a relative amount δ. So the rule L1, L2 → R1 is implied by rule L2 → R1 as long as L1 does not exclude R1. A similar conclusion holds for rule L1 → R1 (c3, s3) under condition (5.c). Q.E.D.

Example 3: Application of elimination law

Based on the above law, the following rule is considered ineffective

(service = http) (service = authentication) → (service = smtp) (0.6, 0.1)

because the following rule exists

(service = authentication) → (service = smtp) (0.65, 0.1)

Since authentication is related only to the smtp operation, the appearance of http does not affect the other two itemsets. Similarly, we keep only the rule (service = http) → (service = http)


and remove the redundant rule (service = telnet) (service = http) → (service = http). Because many ineffective rules are not in the normal traffic profile, removing them helps reduce false alarms to some extent, as demonstrated in Section 6. □

We are mainly interested in daily network traffic, in particular the TCPdump data. Most intrusive FERs contain only one or two specific associations, so this law usually holds. For an intrusive FER with many associations, we can identify the key associations inside the FER, and the elimination law applies to them.
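The elimination law can be checked mechanically. The sketch below is a hypothetical Python rendering, assuming the rule set is a dictionary that maps (LHS, RHS) event tuples to confidence values and assuming the absolute-value form of conditions (5.b) and (5.c). The usage lines apply it to Example 3; the threshold δ = 0.1 is an illustrative choice, not a value given in the text.

```python
from typing import Dict, Tuple

Rule = Tuple[Tuple[str, ...], Tuple[str, ...]]  # (LHS events, RHS events)


def is_eliminable(rule: Rule, c1: float, rules: Dict[Rule, float], delta: float) -> bool:
    """Elimination-law test for a rule L1, L2 -> R1 with confidence c1.

    `rules` maps (LHS, RHS) tuples to confidence values (assumed layout).
    Returns True when condition (i) or (ii) holds, i.e. the rule is ineffective.
    """
    (l1, l2), rhs = rule  # the law is stated for a two-event LHS
    l1_rule = ((l1,), rhs)
    l2_rule = ((l2,), rhs)
    # Condition (i): L1 -> R1 absent, L2 -> R1 present with a similar confidence.
    if l1_rule not in rules and l2_rule in rules:
        c2 = rules[l2_rule]
        if abs(c1 - c2) / c2 < delta:
            return True
    # Condition (ii): L2 -> R1 absent, L1 -> R1 present with a similar confidence.
    if l2_rule not in rules and l1_rule in rules:
        c3 = rules[l1_rule]
        if abs(c1 - c3) / c3 < delta:
            return True
    return False


# Example 3: (http) (authentication) -> (smtp) with confidence 0.6.
rule_set = {(("service=authentication",), ("service=smtp",)): 0.65}
candidate = (("service=http", "service=authentication"), ("service=smtp",))
print(is_eliminable(candidate, 0.6, rule_set, delta=0.1))
```

Here |0.6 − 0.65| / 0.65 ≈ 0.077, so the rule is pruned with δ = 0.1 but would be kept with a tighter δ = 0.05.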

4.3 Reconstruction of Episode Rules

Many FERs detected from network traffic exhibit transitive patterns. If two FERs A → B and B → C are in the rule set, the FER A → B, C very often also exists in the rule set. Here A, B, and C are associations. The rule A → B, C appears redundant, since we can reconstruct it from the previous two rules.

Theorem 3: Reconstruction Law

Consider the following three FERs:

$$L_1 \rightarrow R_1 \ (c_1, s_1), \qquad R_1 \rightarrow R_2 \ (c_2, s_2), \qquad L_1 \rightarrow R_1, R_2 \ (c_3, s_3) \tag{6.a}$$

We consider the last rule, L1 → R1, R2 (c3, s3), ineffective if

$$\frac{|c_3 - c_1 c_2|}{c_3} < \sigma \tag{6.b}$$

where σ is a transitive threshold indicating the strength of the relation among the three events L1, R1, and R2.

Proof: Given the first rule L1 → R1 (c1, s1), for each connection record containing R1, the confidence (probability) that L1 occurs in the same window is

$$c' = \frac{\mathrm{Support}(L_1 \cup R_1)}{\mathrm{Support}(R_1)} \tag{6.c}$$


For a connection record containing R1, the confidence that R2 follows R1 in the same time window is c2. If L1 and R2 are independent, the confidence that a connection containing R1 follows L1 and precedes R2 is approximated by the product c'' = c' · c2. We call it "approximated" because the episode L1, R1, R2 may span a longer window. The confidence of the new rule L1 → R1, R2 is then

$$c_3 = \frac{\mathrm{Support}(L_1 \cup R_1 \cup R_2)}{\mathrm{Support}(L_1)} \approx \frac{\mathrm{Support}(R_1) \cdot c''}{\mathrm{Support}(L_1)} = \frac{\mathrm{Support}(L_1 \cup R_1) \cdot c_2}{\mathrm{Support}(L_1)} = c_1 \cdot c_2 \tag{6.d}$$

Events L1 and R2 were assumed to be independent. This law is used only to justify the existence of rule L1 → R1, R2 when we have L1 → R1 (c1, s1) and R1 → R2 (c2, s2). In practice, rule L1 → R1, R2 implies the existence of rule L1 → R1. So if we also find rule R1 → R2 in the rule set, we can remove rule L1 → R1, R2 if (6.b) is satisfied, because this rule can be reconstructed from the other two rules. Q.E.D.

This reconstruction law is particularly powerful when the window size is large. For smaller window sizes, an occurrence of the episode L1, R1, R2 may often last longer than the window, which violates our "approximated" assumption. The following example clarifies the reconstruction process.

Example 4: Application of reconstruction law

The following episode rule is ineffective

(service = ftp, srcbyte = 1000) → (service = smtp) (service = authentication)

because it can be reconstructed from the following two rules already in the rule set:

(service = ftp, srcbyte = 1000) → (service = smtp)

(service = smtp) → (service = authentication).

To reconstruct the first rule, we simply follow the transitive path below, in order:


(service = ftp, srcbyte = 1000), (service = smtp), (service = authentication).

The reconstruction law helps us split long FERs into short ones, making them easier to cluster in the detection process. □
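A corresponding sketch for the reconstruction law follows, under the same assumed dictionary layout and the absolute-value form of (6.b): it looks up the two short rules and compares c3 with the product c1 · c2 predicted by (6.d). The confidence values and σ = 0.05 below are hypothetical, since Example 4 does not list them.

```python
from typing import Dict, Tuple

Rule = Tuple[Tuple[str, ...], Tuple[str, ...]]  # (LHS events, RHS events)


def is_reconstructible(long_rule: Rule, rules: Dict[Rule, float], sigma: float) -> bool:
    """Reconstruction-law test for L1 -> R1, R2 given L1 -> R1 and R1 -> R2."""
    (l1,), (r1, r2) = long_rule  # exactly one LHS event and two RHS events
    first = ((l1,), (r1,))   # L1 -> R1, confidence c1
    second = ((r1,), (r2,))  # R1 -> R2, confidence c2
    if first not in rules or second not in rules:
        return False  # the long rule cannot be rebuilt from existing rules
    c1, c2, c3 = rules[first], rules[second], rules[long_rule]
    # (6.d): if L1 and R2 are independent, c3 is approximately c1 * c2.
    return abs(c3 - c1 * c2) / c3 < sigma


ftp = "service=ftp,srcbyte=1000"
smtp = "service=smtp"
auth = "service=authentication"
example_rules = {
    ((ftp,), (smtp,)): 0.8,        # hypothetical c1
    ((smtp,), (auth,)): 0.75,      # hypothetical c2
    ((ftp,), (smtp, auth)): 0.62,  # hypothetical c3
}
print(is_reconstructible(((ftp,), (smtp, auth)), example_rules, sigma=0.05))
```

With these values, c1 · c2 = 0.6 and the relative gap to c3 is about 0.032, so the long rule is pruned at σ = 0.05 but kept at σ = 0.02.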

5. Pruning of Ineffective Episode Rules

To apply our rule-pruning laws, we integrated them into our base-support mining engine. The following procedure generates the most effective episode rules in an open network environment.

5.1 The Episode Rule Pruning Algorithm

We applied Algorithm 2 to reduce the episode rules generated with different window sizes over the same data set; the results are shown in Figure 3. Here, we choose a minimum base-support of 0.1 and a minimum confidence level of 0.6. The service is chosen as the axis attribute and the destination host as the reference attribute. Figure 3 plots the rule-set size against the window size under two different traffic conditions.

We classify the audit data into two categories: intra-LAN connections and inter-LAN connections. In the DARPA 1999 IDS evaluation data set, a large portion of intra-LAN traffic consists of user applications (destport > 1024) and smtp services, whereas http services dominate inter-LAN traffic. The results show that up to 80% of the original rules can be pruned by applying the three pruning laws above. This is very helpful, especially as the window size increases.

Most rules have their LHS reduced to just one episode event, which is ideal for clustering and comparison. The message here is that the pruning laws are indeed effective in reducing the rule space significantly. This reduction simplifies the whole process of anomaly detection, as seen in the next two sections, and makes it possible to construct anomaly-based IDSs that can detect unknown intrusions in real time.


Algorithm 2: Pruning Ineffective Episode Rules
Input: Candidate frequent episodes of events
Output: The reduced frequent episode rule set L after pruning

Begin
(1) L = Ø and S = Ø, where S is a temporary set for storing candidate rules
(2) While there are more candidate frequent episodes, do begin
(3)     Calculate the rule r with the shortest LHS that satisfies the minimum confidence level; if no such rule exists, then r = null
(4)     If r ≠ null, then S = S ∪ {r}
    end while
(5) For each rule r in S, do
(6)     Find rules in S with which the elimination law or the reconstruction law applies to r
(7)     If no such rules are found, then L = L ∪ {r}
(8) end for
End
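The following Python sketch strings the three laws together in the order of Algorithm 2. It is an illustration under stated assumptions rather than the authors' mining engine: episodes are tuples of event labels, `support` is a hypothetical dictionary of window counts for every episode and prefix, conditions (5.b), (5.c), and (6.b) are taken in absolute-value form, and the thresholds in the usage lines are arbitrary.

```python
from typing import Dict, Iterable, Optional, Tuple

Episode = Tuple[str, ...]
Rule = Tuple[Episode, Episode]  # (LHS, RHS)


def best_rule(ep: Episode, support: Dict[Episode, float],
              min_conf: float) -> Optional[Tuple[Rule, float]]:
    """Step (3): the shortest-LHS rule meeting min_conf (transposition law)."""
    for split in range(1, len(ep)):
        conf = support[ep] / support[ep[:split]]
        if conf >= min_conf:
            return (ep[:split], ep[split:]), conf
    return None


def eliminable(rule: Rule, conf: float, s: Dict[Rule, float], delta: float) -> bool:
    """Elimination law for L1, L2 -> R1."""
    lhs, rhs = rule
    if len(lhs) != 2:
        return False
    a, b = ((lhs[0],), rhs), ((lhs[1],), rhs)
    for present, absent in ((b, a), (a, b)):  # conditions (i) and (ii)
        if present in s and absent not in s and abs(conf - s[present]) / s[present] < delta:
            return True
    return False


def reconstructible(rule: Rule, conf: float, s: Dict[Rule, float], sigma: float) -> bool:
    """Reconstruction law for L1 -> R1, R2."""
    lhs, rhs = rule
    if len(lhs) != 1 or len(rhs) != 2:
        return False
    first, second = (lhs, rhs[:1]), (rhs[:1], rhs[1:])
    if first not in s or second not in s:
        return False
    return abs(conf - s[first] * s[second]) / conf < sigma


def prune(episodes: Iterable[Episode], support: Dict[Episode, float],
          min_conf: float, delta: float, sigma: float) -> Dict[Rule, float]:
    s: Dict[Rule, float] = {}
    for ep in episodes:                      # steps (2)-(4)
        found = best_rule(ep, support, min_conf)
        if found is not None:
            rule, conf = found
            s[rule] = conf
    kept: Dict[Rule, float] = {}
    for rule, conf in s.items():             # steps (5)-(8)
        if not (eliminable(rule, conf, s, delta) or reconstructible(rule, conf, s, sigma)):
            kept[rule] = conf
    return kept


A, B, C = "ftp", "smtp", "auth"
counts = {(A,): 50, (B,): 60, (A, B): 40, (B, C): 45, (A, B, C): 30}
print(prune([(A, B), (B, C), (A, B, C)], counts, min_conf=0.5, delta=0.1, sigma=0.05))
```

In this toy run, the rule ftp → smtp, auth (confidence 0.6) is dropped because it matches the product of ftp → smtp (0.8) and smtp → auth (0.75). As in step (6), the checks consult the full candidate set S rather than the growing output set L.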

[Figure 3 plots: (a) Variation of FER space size: number of rules generated vs. window size (sec), for inter-LAN FERs, pruned inter-LAN FERs, intra-LAN FERs, and pruned intra-LAN FERs; (b) Variation of pruning rates: pruning rate (%) vs. window size (sec), for intra-LAN and inter-LAN FERs]

Figure 3 The effects of pruning on the space of frequent episode rules (FERs) for inter-LAN and intra-LAN traffic events, where the base-support = 0.1, the minimum confidence = 0.6, the reference attribute = destination host, and the axis attribute = service


To apply the association and frequent episode rules in real-time intrusion analysis, we use a scanning period (sliding window) to collect the connection data. The scanning period moves continuously forward by a fixed amount of time, called the step size. For each period, we apply our datamining framework to generate the FERs and the association rules with high support. This approach is similar to the association-rule approach of Barbara et al. [4], except that we generate both association and episode rules. We build the normal profiles using the DARPA 1999 intrusion data sets [17].
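A sliding scanning period with a fixed step size can be sketched as a generator over time-sorted connection records. This is a minimal illustration, assuming each record is a (timestamp, service) pair and that periods are half-open intervals [start, start + period); the record layout and the `origin` parameter are assumptions, not details given in the paper.

```python
from typing import Iterator, List, Sequence, Tuple

Record = Tuple[float, str]  # (timestamp in seconds, service): assumed layout


def scanning_periods(records: Sequence[Record], period: float, step: float,
                     origin: float = 0.0) -> Iterator[Tuple[float, float, List[Record]]]:
    """Yield (start, end, records_in_period) for successive scanning periods."""
    if period <= 0 or step <= 0:
        raise ValueError("period and step must be positive")
    ordered = sorted(records)
    if not ordered:
        return
    last = ordered[-1][0]
    start = origin
    while start <= last:
        end = start + period
        yield start, end, [r for r in ordered if start <= r[0] < end]
        start += step  # the scanning period slides forward by one step


conns = [(1.0, "http"), (5.0, "smtp"), (12.0, "http"), (17.0, "icmp"),
         (19.0, "icmp"), (24.0, "http")]
for start, end, window in scanning_periods(conns, period=15, step=10):
    print(start, end, len(window))
```

Because the step (10 s) is shorter than the period (15 s), consecutive periods overlap and the record at 12 s is mined twice; each period's records would then be handed to the rule generator. The per-period list comprehension keeps the sketch short at the cost of rescanning all records.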

To apply an FER within a given scanning period, we calculate the minimal number of occurrences of the rule as an additional feature to better characterize the traffic. For this purpose, we amend the FER format as follows:

$$X \rightarrow Y \quad (c, s, m) \tag{7.a}$$

where c, s, and m are the confidence, support, and minimal occurrence number of this FER, respectively. During the training phase, the maximum occurrence number of a given FER is calculated over a large number of attack-free scanning periods. We denote this maximum value by M. An FER is anomalous if its minimal occurrence number m exceeds M by some margin, as formally specified below:

$$m \ge \gamma \cdot M \tag{7.b}$$

where γ ≥ 1 is a relaxation factor. The larger γ is, the more occurrences of a given FER we tolerate within a scanning period.
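Training the per-FER maximum M and applying test (7.b) can be sketched as follows. FERs are represented here by hypothetical string keys, and the default γ = 1.2 is borrowed from the discussion of Fig. 5 in Section 5.2 purely as an example value.

```python
from typing import Dict, Iterable, Mapping


def train_max_occurrences(periods: Iterable[Mapping[str, int]]) -> Dict[str, int]:
    """M for each FER: its largest occurrence count over attack-free scanning periods."""
    max_occ: Dict[str, int] = {}
    for counts in periods:
        for fer, m in counts.items():
            max_occ[fer] = max(max_occ.get(fer, 0), m)
    return max_occ


def exceeds_relaxed_max(fer: str, m: int, max_occ: Mapping[str, int],
                        gamma: float = 1.2) -> bool:
    """Condition (7.b): m >= gamma * M for an FER present in the normal profile."""
    if gamma < 1:
        raise ValueError("the relaxation factor gamma must be at least 1")
    return m >= gamma * max_occ[fer]


normal = train_max_occurrences([
    {"smtp->smtp": 3, "http->http": 7},
    {"smtp->smtp": 5},
])
print(normal, exceeds_relaxed_max("smtp->smtp", 7, normal))
```

With M = 5 for the smtp rule, seven occurrences in one period cross the relaxed bound γ · M = 6, while five occurrences do not.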

5.2 Anomaly Intrusion Detection Process

The process of datamining for anomaly-based intrusion detection is illustrated in Fig. 4. To evaluate pure anomaly detection using our FERs, we use the audit data sets collected in the first and third weeks of the MIT/LL IDS training data. Only TCPdump data were used in our work. We use the test data from the last two weeks of the MIT/LL data to generate FERs and compare them with the rules in the normal profiles.


To detect anomalies, we use Bro to extract features from the incoming packets. If an anomaly is detected in an individual packet, an alarm is generated. To detect attacks consisting of many packets or connections, we generate rules from the traffic data using the base-support mining technique specified in Algorithm 1. Anomalies in individual packets are checked first; then we proceed to the related packet streams.

[Figure 4 flowchart boxes: feature extraction using the Bro system [30]; traffic connection records; anomaly detected in individual packets?; generating frequent episode rules from the sequence of connection records; does the FER match normal traffic patterns in the rule set?; is the occurrence number exceeding the relaxation threshold (7.b)?; error flags or abnormal network statistics; anomaly detected in traffic connections; the traffic connections are considered normal]

Figure 4 The process of anomaly-based intrusion detection through network traffic datamining

The major step in Fig. 4 is to determine the minimal occurrence number. When no matching FER can be found, or the matching FER satisfies the anomaly condition (7.b), we consider it an anomaly rule. When many anomaly rules are generated in a particular scanning period, we


calculate the temporal statistics from the network connections. These features include the

average network traffic, number of error flags, number of connections to the same destination

host, etc. If the pattern is verified as an intrusion, we add the rules to our signature database.
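A hypothetical sketch of the per-period decision in Fig. 4 follows: a packet-level alarm short-circuits the check; otherwise an FER is an anomaly rule if it has no match in the normal profile or satisfies (7.b), and temporal statistics are then computed. The record layout (destination host, flag, bytes), the treatment of every flag other than SF as an error flag, the function names, and the simplification of computing statistics as soon as at least one anomaly rule appears (the text says "many") are all assumptions for illustration.

```python
from collections import Counter
from typing import List, Mapping, Sequence, Tuple

Conn = Tuple[str, str, int]  # (destination host, connection flag, bytes): assumed layout


def anomaly_rules(fer_counts: Mapping[str, int], max_occ: Mapping[str, int],
                  gamma: float = 1.2) -> List[str]:
    """FERs with no normal match, or whose occurrence count satisfies (7.b)."""
    return [fer for fer, m in fer_counts.items()
            if fer not in max_occ or m >= gamma * max_occ[fer]]


def temporal_stats(conns: Sequence[Conn], period: float) -> Tuple[float, int, int]:
    """(average bytes per second, error-flag count, max connections to one host)."""
    avg_traffic = sum(c[2] for c in conns) / period
    error_flags = sum(1 for c in conns if c[1] != "SF")  # assumption: non-SF = error
    per_host = Counter(c[0] for c in conns)
    return avg_traffic, error_flags, max(per_host.values(), default=0)


def check_period(packet_alarm: bool, fer_counts: Mapping[str, int],
                 max_occ: Mapping[str, int], conns: Sequence[Conn], period: float):
    """Return (verdict, temporal statistics or None) for one scanning period."""
    if packet_alarm:
        return "anomaly in individual packets", None
    if not anomaly_rules(fer_counts, max_occ):
        return "normal", None
    return "anomaly in traffic connections", temporal_stats(conns, period)


profile = {"http->http": 7}
conns = [("h1", "SF", 100), ("h1", "S0", 0), ("h2", "REJ", 0)]
print(check_period(False, {"http->http": 3, "smtp->smtp": 2}, profile, conns, 10.0))
```

The returned statistics would then be inspected to verify whether the pattern is an intrusion before its rules are added to the signature database.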

We have experimented with different γ values and their effect on the number of false alarms, using all the TCP connections extracted from the DARPA 1999 data set. Here the scanning period ranges from 100 to 1000 seconds, and the episode window from 2 to 10 seconds. The results in Fig. 5 indicate that, after an initial sharp drop in the number of false alarms, increasing the relaxation factor γ beyond a certain limit (about 1.2 in Fig. 5) does not reduce false alarms further for short scanning periods.

[Figure 5 plot: number of false alarms vs. relaxation factor (γ) from 1.0 to 1.5, for scan periods of 100 s, 300 s, 500 s, and 1000 s]

Figure 5 The effect of γ on false alarms associated with all TCP connections

6. Testing on Traffic Data from 1999 DARPA IDS Evaluation

In 1999, an MIT Lincoln Lab research group [14, 17] conducted an intensive evaluation of many IDSs under DARPA sponsorship. We have tested our new IDS scheme at the USC Internet and Grid Computing Laboratory over the same traffic data sets collected by the MIT/LL group. Our testing results and their interpretation are reported below.


6.1 Data Sets used in DARPA IDS Evaluation

We tested attacks from the DARPA IDS evaluation in our experiments. Without any prior knowledge of these attacks, we can still detect a large number of them as unknown intrusions. However, we must acknowledge a shortcoming of this data set: anomaly detection requires a comprehensive attack-free training set, and the DARPA 1999 data set does not contain enough background traffic patterns, as criticized by McHugh [27].

A major problem in intrusion detection is the response delay in detecting attacks. To illustrate the situation, we simulated the distribution of ICMP connections when a typical smurf attack takes place. In Fig. 6, the smurf attack appears as a traffic spike, which corresponds to a large number of unexpected ICMP connections between 16 and 20 seconds. The attack lasted 4 seconds; our scanning period is 15 seconds and the step size is 10 seconds.

Figure 6 The impact of step size on the response time of a typical smurf attack, with step size = 10 sec and scanning period size = 15 sec

Initially, scanning period 1 starts from the beginning and covers seconds 0 to 15. At the next iteration, the scanning period moves forward to the 10th second, so scanning period 2 covers seconds 10 to 25. No anomalous FER was generated for scanning period 1. However, by collecting all the connections in scanning period 2, we were able to detect the anomalies.


We can detect the attack only after we have collected all the connections in scanning period 2, at which point the smurf attack has already finished. Thus choosing a suitable step size for the scanning period is very important for detecting attacks in real time. If the step size is too small, we must mine the connection data frequently, which may waste many CPU cycles when there are no attacks in the traffic. If the step size is too large, attacks that take place within a very short time interval will be detected only after they have finished.
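The timing argument can be checked with a small calculation. The Python sketch below assumes, as a simplification, that an attack is detected only when one scanning period covers the whole attack and the data are mined at the end of that period. Under that assumption it reproduces the Fig. 6 scenario (detection at the 25th second, after the attack ended at the 20th) and shows how other step sizes change the outcome. The function name and the containment assumption are illustrative, not part of the paper.

```python
import math
from typing import Optional


def first_detection_time(attack_start: float, attack_end: float, period: float,
                         step: float, origin: float = 0.0) -> Optional[float]:
    """End time of the first scanning period [s, s + period] that covers the attack.

    Returns None when no period covers the whole attack, e.g. when
    step > period leaves gaps between consecutive scanning periods.
    """
    if attack_end - attack_start > period:
        return None  # the attack never fits inside a single scanning period
    # Smallest k >= 0 whose period ends no earlier than the attack ends.
    k = max(0, math.ceil((attack_end - period - origin) / step))
    start = origin + k * step
    if start > attack_start:
        return None  # that period already starts after the attack began
    return start + period


print(first_detection_time(16, 20, period=15, step=10))  # 25.0 (Fig. 6 scenario)
print(first_detection_time(16, 20, period=15, step=5))   # 20.0
print(first_detection_time(16, 20, period=15, step=20))  # None
```

Under this model, halving the step to 5 seconds detects the attack just as it ends, at the cost of mining twice as often, while a step longer than the period can miss the attack entirely.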

The MIT/LL IDS evaluation data sets are sufficient to demonstrate the new anomaly detection concept proposed in this paper. However, we are aware of the limits of the DARPA IDS evaluation data sets in covering large-scale attacks such as DDoS. In the future, we plan to continue the experiments using real traffic records from attack incidents reported in recent years by CERT (Computer Emergency Response Team) at CMU, to further demonstrate the effectiveness of the new detection scheme. Testing with real attack patterns will make the results reflect the impact of Internet traffic connections more accurately.

6.2 Intrusion Detection Results and False Alarms

We have tested the effect of our pruning algorithms with different window sizes; the results are shown in Fig. 7. If the scanning period is small, there are too few connection records to generate useful FERs. On the other hand, if the period is very large, frequent itemsets that occur within a very short time span may be missed.

As Fig. 7(a) shows, our rule-pruning algorithm reduces the search time by almost 50% without affecting the detection rate. Because some stealthy attacks cannot be detected with short scanning periods, the choice of periods is very important for the detection rate. We found that combining scanning periods of 100 seconds and 7200 seconds is the best way to detect most attacks, known or unknown.

Because some probing attacks may last longer than 120 seconds, we treat connections with special flags (other than SF, REJ, etc.) differently, since most probing attacks end up with some special connection status. Instead of mining FERs by time,


we mine FERs by connection number: we sort the connections that share the same rare flag according to their timestamps.

[Figure 7 plots, both vs. scanning period (100, 300, 500, 1000, and 7200 sec): (a) Anomaly rules generated: number of anomaly rules generated, for detection without rule pruning and detection with rule pruning; (b) False alarm rate: number of false alarms raised]

Figure 7 Generation of anomaly rules and false alarm rate using our frequent episode intrusion detection scheme with five scanning periods (100, 300, 500, 1000, and 7200 seconds) corresponding to five event window sizes of 2, 3, 5, 10, and 120 seconds, respectively

We regard the sequence number of each connection as its timestamp and apply the approach described above. This procedure is similar to that of [21]. Most probe attacks involve a large number of network services or destination addresses, so we added two statistical features to help detect them: the number of distinct destination hosts and the number of distinct services among all connections of an FER, used when the destination host or the service, respectively, is not the reference attribute.
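A minimal sketch of the rare-flag handling and the two probe features is shown below. The `Conn` record type, the set of "common" flags (SF and REJ only), the attribute names `dst_host` and `service`, and the sample addresses are all hypothetical choices for illustration; the sequence numbers produced by `rare_flag_sequences` would take the place of timestamps when the episodes are mined.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Conn(NamedTuple):
    timestamp: float
    flag: str
    dst_host: str
    service: str


COMMON_FLAGS = frozenset({"SF", "REJ"})  # assumption: flags treated as ordinary


def rare_flag_sequences(conns: Iterable[Conn]) -> Dict[str, List[Tuple[int, Conn]]]:
    """Group connections by rare flag, sort by time, and number them 0, 1, 2, ...

    The sequence number then stands in for the timestamp when mining FERs.
    """
    groups: Dict[str, List[Conn]] = defaultdict(list)
    for conn in sorted(conns, key=lambda x: x.timestamp):
        if conn.flag not in COMMON_FLAGS:
            groups[conn.flag].append(conn)
    return {flag: list(enumerate(seq)) for flag, seq in groups.items()}


def probe_features(conns: Iterable[Conn], reference_attribute: str) -> Dict[str, int]:
    """Distinct destination hosts and/or services, skipping the reference attribute."""
    conns = list(conns)
    features: Dict[str, int] = {}
    if reference_attribute != "dst_host":
        features["distinct_dst_hosts"] = len({c.dst_host for c in conns})
    if reference_attribute != "service":
        features["distinct_services"] = len({c.service for c in conns})
    return features


sample = [
    Conn(3.0, "S0", "172.16.0.2", "telnet"),
    Conn(1.0, "S0", "172.16.0.1", "ftp"),
    Conn(2.0, "SF", "172.16.0.1", "http"),
    Conn(4.0, "RSTO", "172.16.0.3", "smtp"),
]
seqs = rare_flag_sequences(sample)
s0 = [c for _, c in seqs["S0"]]
print(probe_features(s0, reference_attribute="service"))
```

A probe that touches many hosts or ports with half-open connections would show up as a rare-flag sequence with a high distinct-host or distinct-service count, regardless of how long it spreads out in time.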

Example 5: Comparing Association and Frequent Episode Rules

Consider the following FER generated without using the pruning laws. It is an anomaly rule that leads to a false alarm, since no such rule exists in the normal profile.

(service = user_application, flag = REJ), (service = http, flag = REJ)

→ (service = http, flag = REJ) (0.6, 0.000007, 3)

By applying the elimination law, the above rule is eliminated, because the following episode

rule exists in the normal profile.


(service = http, flag = REJ) → (service = http, flag = REJ) (0.63, 0.000022, 7)

and the rule (service = user_application, flag = REJ) → (service = http, flag = REJ) does not exist in the normal rule set. This example explains why rule pruning can reduce the number of false alarms in the intrusion detection process. □

Most attacks confined to an individual connection do not show any anomalies in FERs. Fig. 8 compares the attacks detected through single packet/connection anomalies with those detected through FER anomalies. A large portion of the attacks detected by our approach are found through FER anomalies. We have compared our detection results with those reported by the ADAM group [4][5], who use an association-based anomaly detection system.

[Figure 8 bar chart: total number of attacks detected per attack category (DoS, R2L, Probe), split into intrusions detected by single packet/connection anomalies and intrusions detected by FER anomalies]

Figure 8. Number of intrusive attacks detected by checking against the frequent episode rules

We evaluate the intrusion detection rate for the unknown attacks or misclassifications considered in [5]. This metric is calculated by dividing the number of detected attacks by the total number of incidents. The comparison is shown in Fig. 9. Our FER scheme achieves a higher detection rate than ADAM on DoS and Probe attacks, whereas the ADAM system performs much better on R2L attacks. Overall, our FER scheme has an average detection rate of 32%, compared with 20% reported by the ADAM group.


[Figure 9 bar chart: average detection rate (0% to 50%) per attack category (DoS, R2L, Probe, Total) for the association-based approach [4] and our FER-based approach]

Figure 9. Anomaly intrusion detection rate and false alarm rate using our frequent episode rules, compared with using the association rules in the ADAM report

In our opinion, the two approaches are complementary in nature. When the destination host is chosen as the reference attribute, the association (service = smtp, flag = SF) is considered normal. However, our episode rule (service = smtp, flag = SF) → (service = smtp, flag = SF) detects a Mailbomb attack when its number of occurrences is high, which corresponds to an intruder sending many smtp packets to the victim server within a short time period. Thus combining the two methods may further improve the overall detection rate.

6.3 Anomaly Detection Compared with Snort Results

Snort is a packet-level, signature-based IDS that is quite effective for real-time detection of known attacks. When attack signatures appear in the network traffic, Snort immediately raises an alarm. However, Snort is rather weak against attacks such as smurf and mailbomb, because each connection in a mailbomb attack, for example, is a legitimate connection. Detecting such attacks requires maintaining temporal traffic statistics to identify anomalous network behavior in real time. However, traffic patterns normally vary widely, so choosing a suitable threshold is crucial to differentiate anomalous from normal behavior.

To reduce the response delay of our anomaly IDS, we use a different approach to generate FERs when a large amount of traffic data arrives within a relatively short observation time. If the number of connections in a scanning period exceeds a certain threshold, we immediately


feed these records to our datamining engine instead of waiting for the remaining connections in the same scanning period.
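The early-mining rule can be sketched as a small buffer that hands its contents to the mining engine as soon as a connection-count threshold is reached. The `mine` callback is a hypothetical hook, and the behavior after an early flush (remaining records of the same period start a fresh batch that is mined when the period closes) is an assumption; the text does not specify it.

```python
from typing import Any, Callable, List


class EarlyMiningCollector:
    """Buffers connection records for a scanning period and mines early on bursts."""

    def __init__(self, threshold: int, mine: Callable[[List[Any]], None]) -> None:
        if threshold <= 0:
            raise ValueError("threshold must be positive")
        self.threshold = threshold
        self.mine = mine  # hypothetical hook into the datamining engine
        self.buffer: List[Any] = []

    def add(self, record: Any) -> None:
        self.buffer.append(record)
        if len(self.buffer) >= self.threshold:
            self._flush()  # burst: do not wait for the scanning period to end

    def close_period(self) -> None:
        """Called when the scanning period ends; mines whatever is left."""
        if self.buffer:
            self._flush()

    def _flush(self) -> None:
        batch, self.buffer = self.buffer, []
        self.mine(batch)


batches: List[List[Any]] = []
collector = EarlyMiningCollector(threshold=3, mine=batches.append)
for rec in range(7):
    collector.add(rec)
collector.close_period()
print(batches)
```

During a flood such as a smurf attack, batches reach the miner as soon as the threshold fills rather than at the end of a fixed period, which is what shortens the response delay.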

We have compared our IDS with Snort on the DARPA 1999 intrusion data set in simulated real-time experiments. If both systems report an attack, we credit the one that responded first. For anomaly detection, the step size is 10 seconds and the scanning period is 100 seconds. The detection results obtained by combining Snort and our anomaly detection engine are shown in Table 3.

Table 3 Intrusion Detection Results of our Anomaly IDS compared with using Snort over the DARPA 1999 IDS Evaluation Data Set

Attack name  Category  Start timestamp*  Finish timestamp*  First detected by  Detected at timestamp*
Pod          DoS       923315990         923315990          Snort              923315990
Pod          DoS       923316510         923316510          Snort              923316510
Smurf        DoS       923319250         923319254          Anomaly IDS        923319250
Portsweep    Probe     923319788         923320010          Snort              923319788
Apache2      DoS       923322559         923322666          Anomaly IDS        923322561
Guesstelnet  R2L       923324290         923324479          Anomaly IDS        923324318
Dosnuke      DoS       923327133         923327133          Snort              923327133
Smurf        DoS       923332686         923332688          Anomaly IDS        923332686
Apache2      DoS       923335537         923335584          Anomaly IDS        923335541
Pod          DoS       923336543         923336544          Snort              923336543
Dict         R2L       923344325         923345308          Anomaly IDS        923344358
Neptune      DoS       923349835         923350245          Anomaly IDS        923349841
Dosnuke      DoS       923356070         923356070          Snort              923356070
Udpstorm     DoS       923356827         923358123          Anomaly IDS        923356827

*Timestamps are given in seconds since the Unix epoch (00:00:00 UTC, January 1, 1970)

To evaluate the real-time intrusion response time, we define intrusion detection

efficiency α as follows: Let β be the duration of the attack and δ be the overhead time needed to


detect the attack. The efficiency α is defined by the following ratio:

$$\alpha = \frac{\beta - \delta}{\beta} \tag{8}$$

The average detection efficiency for the different attacks listed in Table 3 is plotted in Figure 10.
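Equation (8) can be evaluated directly from Table 3. The Python sketch below assumes that δ is the time from an attack's start timestamp to its detection timestamp; this reading of δ is an assumption. Several attacks in Table 3 (for example Pod and Dosnuke) have identical start and finish timestamps, so β = 0 and (8) is undefined for them; the function returns None in that case. The sketch is not claimed to reproduce the values plotted in Figure 10.

```python
from typing import NamedTuple, Optional


class Table3Row(NamedTuple):
    name: str
    start: int     # attack start timestamp (s)
    finish: int    # attack finish timestamp (s)
    detected: int  # timestamp of the first alarm (s)


def detection_efficiency(row: Table3Row) -> Optional[float]:
    """Equation (8): alpha = (beta - delta) / beta."""
    beta = row.finish - row.start     # attack duration
    delta = row.detected - row.start  # assumed detection overhead
    if beta <= 0:
        return None  # (8) is undefined for zero-duration attacks
    return (beta - delta) / beta


rows = [
    Table3Row("Apache2", 923322559, 923322666, 923322561),
    Table3Row("Guesstelnet", 923324290, 923324479, 923324318),
    Table3Row("Dict", 923344325, 923345308, 923344358),
    Table3Row("Neptune", 923349835, 923350245, 923349841),
    Table3Row("Pod", 923315990, 923315990, 923315990),
]
for row in rows:
    print(row.name, detection_efficiency(row))
```

For example, the first Apache2 attack gives β = 107 s and δ = 2 s, so α = 105/107 ≈ 0.98.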

Snort performs slightly better on the attacks it can detect, such as Pod and Portsweep. However, as Fig. 10 shows, our anomaly-based IDS detects many attacks unknown to Snort, such as Smurf, Apache2, Guesstelnet, Dict, and Neptune, which Snort cannot detect. Snort performs much better than our scheme only in the case of the Dosnuke attack.

[Figure 10 bar chart: detection efficiency (0% to 100%) of Snort and of datamining-based anomaly detection for Pod, Smurf, Portsweep, Apache2, Guesstelnet, Dosnuke, Dict, Neptune, and Udpstorm]

Figure 10 The detection efficiency of our anomaly detection system compared with Snort on the nine attacks listed in Table 3

7. Conclusions and Future Work

In this paper, we have developed a new datamining scheme for detecting intrusions in an open network. To generate frequent episode rules more effectively, we introduced a new base-support mining algorithm, which makes the analysis of network audit data systematic. We also introduced three episode-rule pruning techniques. As the window size increases to 20 to 40 sec, the episode-rule pruning rate increases sharply to 80% for inter-LAN traffic and about 50% for intra-LAN traffic. As the window size increases beyond 50 sec, the pruning rate drops to 60% and 35%, respectively.


The illustrative examples demonstrate how to use our episode mining technique to discover unknown attacks. The three episode rule-pruning laws enable fast detection of anomalies within a much reduced rule search space, and Algorithm 2 specifies the procedure for applying these laws effectively. These techniques reduce the search space by as much as 80% for anomaly intrusion detection. By testing the new pruning techniques over the DARPA 1999 IDS evaluation data sets [17], we demonstrate a 13% reduction in false alarms over 50 attack incidents ranging from U2R and R2L to DoS attacks.

Our method is particularly effective in detecting DoS and Probe attacks. The use of FERs leads to a detection rate of 47% for DoS attacks, 19% for R2L attacks, and 47% for probe attacks. These results are very encouraging. FERs are shown to be more effective than association rules alone: FERs perform much better in revealing inter-relationships among successive connection request records, while association rules are better for revealing intra-relationships inside a single audited traffic record.

We recommend using the proposed anomaly detection scheme jointly with a signature-based IDS to yield even better results. These results demonstrate the viability of using the new scheme to build automated intrusion detection and response systems in open networks.

For future research, our continued effort will cover the following four aspects; these research tasks are currently in progress. We expect to generate many more benchmark results from security experiments tied to distributed Grid applications.

• Implement a distributed testbed for automated intrusion detection and response, coordinated jointly by multiple security managers located in different domains.

• Compare our method with other mining techniques for improving the intrusion detection model. We will build our own attack databases and collect our own training sets.

• Investigate rare connection properties associated with detecting single packet attacks by

combining our anomaly detection scheme with the use of encrypted tunnels in virtual

private networks.


• The MIT intrusion data set considered only DoS attacks from a single source. We plan to extend our work to defend against distributed denial-of-service (DDoS) attacks in follow-up experiments.

Acknowledgements: The financial support of this work from the NSF ITR Research Grant ACI-0325409 to USC is appreciated. This work benefited from group discussions with our team members Shanshan Song, Ching-hua Chuan, and Rakesh Rajbanshi of the USC Internet and Grid Computing Laboratory.

References:

[1] R. Agrawal, T. Imielinski, and A. Swami, "Mining Associations between Sets of Items in Massive Databases", Proc. of the ACM-SIGMOD 1993 Int'l Conference on Management of Data, Washington D.C., May 1993, pp. 207-216.

[2] J. Allen, A. Christie, W. Fithen, J. McHugh, J. Pickel, and E. Stoner, "State of the Practice of Intrusion Detection Technologies", Carnegie Mellon Software Engineering Institute, Jan. 2000.

[3] S. Axelsson, "The base-rate fallacy and its implications for the difficulty of intrusion detection", Proc. of the 6th ACM Conference on Computer and Communications Security, Kent Ridge Digital Labs, Singapore, Nov. 1999, pp. 1-7.

[4] D. Barbara, J. Couto, S. Jajodia, L. Popyack, and N. Wu, "ADAM: Detecting Intrusions by Data Mining", IEEE Workshop on Information Assurance and Security, 2001.

[5] D. Barbara, N. Wu, and S. Jajodia, "Detecting Novel Network Intrusions using Bayes Estimators", First SIAM Conf. on Data Mining, Chicago, IL, 2001.

[6] E. Bloedorn, A. D. Christiansen, W. Hill, C. Skorupka, L. M. Talbot, and J. Tivel, "Data Mining for Network Intrusion Detection: How to Get Started", MITRE Technical Report, August 2001.

[7] S. M. Bridges and R. M. Vaughn, "Fuzzy Data Mining and Genetic Algorithms Applied to Intrusion Detection", Proc. of the 23rd National Information Systems Security Conference, Baltimore, Maryland, October 2000.

[8] D. J. Burroughs, L. F. Wilson, and G. V. Cybenko, "Analysis of Distributed Intrusion Detection Systems Using Bayesian Methods", Proc. of IEEE International Performance Computing and Communication Conference, April 2002.

[9] W. Cohen, "Fast Effective Rule Induction", Proc. of the 12th International Conference on Machine Learning, 1995.

[10] W. Fan, M. Miller, S. Stolfo, W. Lee, and P. Chan, “Using Artificial Anomalies to Detect Unknown and Known Network Intrusions”, Proc. of the First IEEE International Conference on Data Mining, San Jose, CA, November 2001.

[11] U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, “Advances in Knowledge Discovery and Data Mining”, AAAI Press, 1996.

[12] J. Gaffney and J. Ulvila, “Evaluation of Intrusion Detectors: A Decision Theory Approach”, IEEE Symp. on Security and Privacy, Oakland, CA, May 14-16, 2001.

[13] R. Gopalakrishna, “A Framework for Distributed Intrusion Detection using Interest Driven Cooperating Agents”, Dept. of Computer Sciences, Purdue Univ., May 2001.

[14] J. Haines, L. Rossey, R. Lippmann, and R. Cunningham, “Extending the 1999 Evaluation”, Proc. of DISCEX 2001, Anaheim, California, June 11-12, 2001.

[15] S. A. Hofmeyr, S. Forrest, and A. Somayaji, “Intrusion Detection using Sequences of System Calls”, Journal of Computer Security, vol. 6, pp. 151-180, 1998.

[16] Lincoln Laboratory, “1999 DARPA Intrusion Detection Scoring Truth”, http://www.ll.mit.edu/SST/ideval/docs/1999/master-listfile-condensed.txt.

[17] R. Lippmann, “The 1999 DARPA Off-line Intrusion Detection Evaluation”, MIT Lincoln Lab, 2000.

[18] K. Ilgun, R. A. Kemmerer, and P. A. Porras, “State Transition Analysis: A Rule-based Intrusion Detection Approach”, IEEE Transactions on Software Engineering, 21(3):181-199, March 1995.

[19] T. Lane and C. E. Brodley, “Temporal Sequence Learning and Data Reduction for Anomaly Detection”, Proc. of the Fifth ACM Conf. on Computer and Comm. Security, pp. 150-158, 1998.

[20] W. Lee, S. J. Stolfo, and K. Mok, “Adaptive Intrusion Detection: a Data Mining Approach”, Artificial Intelligence Review, Kluwer Academic Publishers, 14(6):533-567, December 2000.

[21] W. Lee and S. Stolfo, “A Framework for Constructing Features and Models for Intrusion Detection Systems”, ACM Trans. on Information and System Security, vol. 3, no. 4, Nov. 2000.

[22] T. Lunt, A. Tamaru, F. Gilham, R. Jagannathan, P. Neumann, H. Javitz, A. Valdes, and T. Garvey, “A Real-time Intrusion Detection Expert System (IDES)”, Technical Report, Computer Science Laboratory, SRI International, Menlo Park, California, February 1992.

[23] M. V. Mahoney and P. K. Chan, “An Analysis of the 1999 DARPA/Lincoln Laboratory Evaluation Data for Network Anomaly Detection”, Proc. of the International Symp. on Recent Advances in Intrusion Detection (RAID), pp. 220-237, September 2003.

[24] M. V. Mahoney and P. K. Chan, “PHAD: Packet Header Anomaly Detection for Identifying Hostile Network Traffic”, Florida Tech. Technical Report CS-2001-4, April 2001.

[25] H. Mannila, H. Toivonen, and A. I. Verkamo, “Discovery of Frequent Episodes in Event Sequences”, Data Mining and Knowledge Discovery, 1(3), 1997.

[26] H. Mannila and H. Toivonen, “Discovering Generalized Episodes using Minimal Occurrences”, Proc. of the Second Int’l Conf. on Knowledge Discovery and Data Mining, Portland, Oregon, August 1996.

[27] J. McHugh, “Testing Intrusion Detection Systems: A Critique of the 1998 and 1999 DARPA Off-line Intrusion Detection System Evaluation as Performed by Lincoln Laboratory”, ACM Trans. on Information and System Security, 3(4), November 2000.

[28] S. Noel, D. Wijesekera, and C. Youman, “Modern Intrusion Detection, Data Mining, and Degrees of Attack Guilt”, in Applications of Data Mining in Computer Security, Daniel Barbarà and Sushil Jajodia, eds., Kluwer Academic Publishers, Boston, 2002.

[29] The NSS Group, “Intrusion Detection Systems Group Test (Edition 2)”, Technical Report, Oakwood House, Wennington, Cambridgeshire, December 2001.

[30] V. Paxson, “Bro: a system for detecting network intruders in real-time”, USENIX Security Symposium, USENIX Association, pp. 31-51, 1998.

[31] P. A. Porras and P. G. Neumann, “EMERALD: Event Monitoring Enabling Responses to Anomalous Live Disturbances”, Proc. of the 19th National Computer Security Conference, pp. 353-365, Baltimore, Maryland, 22-25 October 1997.

[32] M. Qin and K. Hwang, “Effectively Generating Frequent Episode Rules for Anomaly-based Intrusion Detection”, submitted to 2004 IEEE Symposium on Security and Privacy, Oakland, CA.

[33] M. J. Ranum, “Experiences Benchmarking Intrusion Detection Systems”, NFR Security White Paper, December 2001.

[34] L. Robert, “Slapdash monster roams the Net”, http://zdnet.com.com/2100-1105_2-5062998.html, CNET News.com, August 13, 2003.

[35] M. Roesch, “Snort - lightweight intrusion detection for networks”, Proc. of the USENIX Thirteenth Systems Administration Conference (LISA '99), pp. 229-238, Berkeley, California, 1999.

[36] T. Ryutov, C. Neuman, D. Kim, and L. Zhou, “Integrated Access Control and Intrusion Detection for Web Servers”, IEEE Transactions on Parallel and Distributed Systems, vol. 14, no. 9, pp. 841-850, September 2003.

[37] P. Sommer, “Intrusion detection systems as evidence”, Computer Networks: International Journal of Computer and Telecomm. Networking, vol. 31, no. 23-24, pp. 2477-2487, Dec. 1999.

[38] R. Sekar, Y. Guang, S. Verma, and T. Shanbhag, “A High-Performance Network Intrusion Detection System”, Proc. of the ACM Conf. on Computer and Comm. Security, pp. 8-17, Nov. 1999.

[39] H. S. Teng, K. Chen, and S. Lu, “Adaptive Real-time Anomaly Detection using Inductively Generated Sequential Patterns”, Proc. of the IEEE Symposium on Security and Privacy, Oakland, CA, pp. 278-284, May 7-9, 1990.

Biographical Sketches:

Min Qin is presently pursuing his Ph.D. degree in the Computer Science Department at the University of Southern California. He received his B.E. and M.E. degrees in Computer Science from Shanghai Jiaotong University in China. His current research interests include Internet security, datamining, distributed systems, and database systems. He can be reached at [email protected]

Kai Hwang is a Professor and Director of the Internet and Grid Computing Laboratory at the University of Southern California (USC). He received his Ph.D. from the University of California, Berkeley. An IEEE Fellow, he specializes in computer architecture, parallel processing, Internet and wireless security, and distributed computing systems. He has authored or coauthored six scientific books and 170 journal and conference papers in these areas. Dr. Hwang is the founding Editor-in-Chief of the Journal of Parallel and Distributed Computing. He has performed advisory and consulting work for IBM Fishkill, Intel SSD, MIT Lincoln Lab., ETL in Japan, and GMD in Germany. Presently, he leads a USC research group developing a distributed intrusion detection and response system for protecting clusters, intranets, and Grid resources. The NetShield software and GridSec testbed are currently under construction at USC for trusted grid, cluster, and Internet computing. He can be reached at [email protected]