ppt slides

Areej Al-Bataineh


Page 1: ppt slides

Areej Al-Bataineh

Page 2: ppt slides

Data Mining Basics
  ▪ Definition
  ▪ Some techniques: Association Rules, Classification, Clustering
Data mining meets Intrusion Detection
  ▪ Detection Approaches
  ▪ Data mining use in IDS
Case Study
  ▪ Behavioral Feature for Network Anomaly Detection
Conclusions

04/10/23 Data Mining in Intrusion Detection

Page 3: ppt slides

Knowledge Discovery in Databases (KDD)
  ▪ “Process of extracting useful information from large databases”
KDD basic steps
  1. Understanding the application domain
  2. Data integration and selection
  3. Data mining
  4. Pattern evaluation
  5. Knowledge representation
Related Fields
  ▪ Machine learning, statistics, others


Page 4: ppt slides

Data mining: “concerned with uncovering patterns, associations, changes, anomalies, and statistically significant structures and events in data”
Why Data Mining?
  ▪ Understand existing data
  ▪ Predict new data
Components
  Representation
    ▪ Decide what kind of model can be built.
    ▪ A model is a compact summary of the examples.
  Learning Element
    ▪ Builds a model from a set of examples
  Performance Element
    ▪ Applies the model to new observations


Page 5: ppt slides

Techniques well known and used in Intrusion Detection
  ▪ Association Rules [Descriptive]
  ▪ Classification [Predictive]
  ▪ Clustering [Descriptive]
Preliminary step
  ▪ Raw Data -> Database Table (Training set)
  ▪ Columns: Attributes
  ▪ Rows: Records


Page 6: ppt slides

Motivated by market-basket analysis
Generate rules that capture implications between attribute values
Rule Example
  ▪ Lettuce & Tomato -> Salad Dressing [0.4, 0.9]
Parameters [s, c]
  ▪ Support (s) = % of records that satisfy both LHS and RHS
  ▪ Confidence (c) = P(satisfies RHS | satisfies LHS)
Mining Problem
  ▪ “Find all association rules that have support and confidence > user-defined minimum value”
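The support and confidence parameters defined above can be computed directly from a table of records. The following is a minimal sketch, assuming each record is a set of items; the function name `support_confidence` and the toy baskets are made up for illustration and do not reproduce the [0.4, 0.9] figures of the slide's example.

```python
def support_confidence(records, lhs, rhs):
    """Return (support, confidence) of the rule lhs -> rhs.

    support    = fraction of records that satisfy both LHS and RHS
    confidence = P(satisfies RHS | satisfies LHS)
    """
    lhs, rhs = frozenset(lhs), frozenset(rhs)
    n = len(records)
    n_lhs = sum(1 for r in records if lhs <= r)            # records containing LHS
    n_both = sum(1 for r in records if (lhs | rhs) <= r)   # records containing LHS and RHS
    support = n_both / n if n else 0.0
    confidence = n_both / n_lhs if n_lhs else 0.0
    return support, confidence


# Hypothetical market-basket data (not from the slides).
baskets = [
    {"lettuce", "tomato", "salad dressing"},
    {"lettuce", "tomato", "salad dressing"},
    {"lettuce", "tomato"},
    {"bread", "milk"},
    {"lettuce", "bread"},
]

s, c = support_confidence(baskets, {"lettuce", "tomato"}, {"salad dressing"})
print(f"support={s:.2f} confidence={c:.2f}")
```

A rule miner would evaluate many candidate rules this way and keep only those whose support and confidence both exceed the user-defined minimums.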


Page 7: ppt slides

Predefined set of classes
Training set has Class as one of the attributes
  ▪ Supervised Learning
Mining Problem
  ▪ “Find a model for class attribute as a function of the values of other attributes”
Use model to predict class for new records
Classifier representation
  ▪ If-then Rules
  ▪ Decision Trees
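To make the "model for the class attribute as a function of other attributes" concrete, here is a minimal sketch of a one-rule (1R) learner that produces if-then rules on a single attribute. 1R is chosen only for brevity; the slides do not name a specific learner, and the attribute names (`service`, `flag`, `label`) and their values are hypothetical.

```python
from collections import Counter, defaultdict


def learn_one_rule(records, class_attr):
    """Pick the single attribute whose value -> majority-class rules make the fewest training errors."""
    best = None
    attrs = [a for a in records[0] if a != class_attr]
    for attr in attrs:
        counts = defaultdict(Counter)
        for r in records:
            counts[r[attr]][r[class_attr]] += 1
        # One if-then rule per attribute value: predict the majority class.
        rules = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(1 for r in records if rules[r[attr]] != r[class_attr])
        if best is None or errors < best[2]:
            best = (attr, rules, errors)
    # Fallback class for attribute values never seen in training.
    default = Counter(r[class_attr] for r in records).most_common(1)[0][0]
    return best[0], best[1], default


def predict(model, record):
    """Apply the learned rules to a new record."""
    attr, rules, default = model
    return rules.get(record[attr], default)


# Hypothetical training set: the class attribute is "label".
train = [
    {"service": "http", "flag": "SF", "label": "normal"},
    {"service": "http", "flag": "S0", "label": "attack"},
    {"service": "ftp", "flag": "SF", "label": "normal"},
    {"service": "ftp", "flag": "S0", "label": "attack"},
    {"service": "smtp", "flag": "SF", "label": "normal"},
    {"service": "smtp", "flag": "REJ", "label": "attack"},
]

model = learn_one_rule(train, "label")
print(model[0], predict(model, {"service": "http", "flag": "S0"}))
```

A decision tree generalizes this idea by applying further attribute tests inside each branch instead of stopping after one attribute.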


Page 8: ppt slides

Given: Data Set and Similarity Measure
  ▪ Unsupervised Learning
Mining Problem
  ▪ “Group records into clusters such that all records within a cluster are more similar to one another, and records in separate clusters are less similar to one another”
Similarity Measures
  ▪ Euclidean Distance if attributes are continuous
  ▪ Other problem-specific measures
Clustering Methods
  Partitioning
    ▪ Divide data into disjoint partitions
  Hierarchical
    ▪ Root is the complete data set, leaves are individual records, and intermediate layers are partitions
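As an illustration of the partitioning approach with Euclidean distance, the sketch below uses k-means. k-means is one common partitioning method, not one the slides name; seeding the centroids with the first k records is an assumption made here to keep the example deterministic, and the sample points are invented.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two records with continuous attributes."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iterations=20):
    """Partition points into k disjoint clusters; returns (assignment, centroids)."""
    centroids = [tuple(p) for p in points[:k]]  # assumption: seed with the first k records
    assignment = []
    for _ in range(iterations):
        # Assign every record to its nearest centroid.
        assignment = [
            min(range(k), key=lambda i: euclidean(p, centroids[i])) for p in points
        ]
        # Recompute each centroid as the mean of its members.
        new_centroids = []
        for i in range(k):
            members = [p for p, a in zip(points, assignment) if a == i]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return assignment, centroids


# Hypothetical two-attribute records forming two obvious groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (8.2, 7.9), (0.9, 1.1), (7.9, 8.1)]
assignment, centroids = kmeans(points, k=2)
print(assignment)
```

Hierarchical methods differ in that they build nested partitions, from the full data set at the root down to single records at the leaves.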


Page 9: ppt slides

Detection Approach
  Misuse Detection
    ▪ Based on known malicious patterns (signatures)
  Anomaly Detection
    ▪ Based on deviations from established normal patterns (profiles)
Data Source
  Network-based (NIDS)
    ▪ Network traffic
  Host-based (HIDS)
    ▪ Audit trails
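The difference between the two detection approaches can be sketched in a few lines. This is an illustration under stated assumptions, not a real IDS: the signature strings, the choice of a mean/standard-deviation profile, and the threshold `k` are all hypothetical.

```python
import statistics

# Hypothetical signatures of known malicious payloads (misuse detection).
SIGNATURES = [b"/etc/passwd", b"cmd.exe"]


def misuse_alert(payload: bytes) -> bool:
    """Alert when the payload contains any known signature."""
    return any(sig in payload for sig in SIGNATURES)


def build_profile(normal_values):
    """Profile of normal behavior: mean and sample standard deviation."""
    return statistics.mean(normal_values), statistics.stdev(normal_values)


def anomaly_alert(value, profile, k=3.0):
    """Alert when an observation deviates more than k standard deviations from the profile."""
    mean, std = profile
    return abs(value - mean) > k * std


# Hypothetical measurements, e.g. connections per minute during normal operation.
profile = build_profile([10, 12, 11, 9, 10, 11])
print(misuse_alert(b"GET /../../etc/passwd"), anomaly_alert(100, profile))
```

Misuse detection can only flag what is already in the signature list, while anomaly detection can flag unseen behavior at the cost of false alarms whenever normal traffic drifts from the profile.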


Page 10: ppt slides

Signature extraction
Rule matching
Alarm data analysis
  ▪ Reduce false alarms
  ▪ Eliminate redundant alarms
Feature selection
Training data cleaning


Page 11: ppt slides

Behavioral Feature for Network Anomaly Detection
  ▪ Training set = normal network traffic
  ▪ Feature provides semantics of the values of data
  ▪ Feature selection is important
  ▪ Proposed method:
    ▪ Feature extraction based on protocol behavior
    ▪ Many attacks use protocols improperly
      ▪ Ping of Death
      ▪ SYN Flood
      ▪ Teardrop
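Two of the attacks listed above misuse IP fragmentation, which makes "improper protocol use" easy to illustrate from header fields alone. The sketch below relies on the IPv4 facts that the Fragment Offset field counts 8-byte units and that a datagram may not exceed 65535 bytes; the function names and the tuple layout `(offset_units, total_length)` are hypothetical.

```python
MAX_IP_DATAGRAM = 65535  # largest total length an IPv4 datagram may have


def fragment_span(offset_units, total_length, ihl_words=5):
    """Return (start, end) byte positions of a fragment's payload in the reassembled datagram.

    offset_units: Fragment Offset field, counted in 8-byte units
    total_length: Total Length field of this fragment (header + payload)
    ihl_words:    IHL field, header length in 32-bit words
    """
    start = offset_units * 8
    end = start + (total_length - ihl_words * 4)
    return start, end


def exceeds_max_datagram(offset_units, total_length, ihl_words=5):
    """Ping-of-Death style misuse: reassembly would exceed the maximum datagram size."""
    _, end = fragment_span(offset_units, total_length, ihl_words)
    return ihl_words * 4 + end > MAX_IP_DATAGRAM


def overlaps(prev_fragment, next_fragment):
    """Teardrop style misuse: the next fragment starts before the previous one ends."""
    _, prev_end = fragment_span(*prev_fragment)
    next_start, _ = fragment_span(*next_fragment)
    return next_start < prev_end


print(exceeds_max_datagram(8189, 1500), overlaps((0, 1500), (100, 1500)))
```

Checks like these are hand-written rules; the proposed method instead aims to learn what normal protocol behavior looks like so that such misuse stands out without a dedicated rule per attack.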


Page 12: ppt slides

Attributes
  ▪ Packet header fields
Feature
  ▪ Single or multiple attributes
Protocol Specifications
  ▪ Policy for interaction
  ▪ Define attributes and the range of values
Flow
  ▪ Collection of packets exchanged between entities engaged in a protocol
  ▪ Client/Server flows


Page 13: ppt slides

Inter-Flow vs Intra-Flow Analysis (IVIA)
First step
  ▪ Identify attributes used in partitioning traffic data into flows -> Src/Dst ports
  ▪ Result: HTTP flows, DNS flows, etc.
Next step
  ▪ Examine change of attribute values
    ▪ Between flows (inter-flow)
    ▪ Within a flow (intra-flow)
Results
  ▪ Operationally Variable Attributes (OVAs)
  ▪ Flow Descriptors
  ▪ Operationally Invariant attributes


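IP header attributes by inter-flow and intra-flow change:

                         Intra-flow changes: Yes                          Intra-flow changes: No
Inter-flow changes: Yes  IHL, Service Type, Total Length,                 Source Address, Destination Address,
                         Identification, Flags_DF, Flags_MF,              Protocol
                         Fragment Offset, Time to Live, Options
Inter-flow changes: No   (none)                                           Version, Flags_reserved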
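The two IVIA steps can be sketched as follows: partition packets into flows by source and destination port, then check for each attribute whether its value changes within a flow and between flows. This is a simplified illustration; packets are plain dictionaries with hypothetical field names, and flows are treated as one-directional, whereas the slides speak of client/server flows.

```python
from collections import defaultdict


def partition_into_flows(packets, key_attrs=("sport", "dport")):
    """Step 1: group packets into flows keyed by the partitioning attributes."""
    flows = defaultdict(list)
    for p in packets:
        flows[tuple(p[a] for a in key_attrs)].append(p)
    return flows


def classify_attribute(flows, attr):
    """Step 2: classify an attribute by where its value changes."""
    intra = any(len({p[attr] for p in pkts}) > 1 for pkts in flows.values())
    inter = len({p[attr] for pkts in flows.values() for p in pkts}) > 1
    if intra:
        return "operationally variable"   # changes within a flow
    if inter:
        return "flow descriptor"          # changes only between flows
    return "operationally invariant"      # never changes


# Hypothetical packets from two flows.
packets = [
    {"sport": 1025, "dport": 80, "src": "10.0.0.1", "version": 4, "ttl": 64},
    {"sport": 1025, "dport": 80, "src": "10.0.0.1", "version": 4, "ttl": 63},
    {"sport": 1026, "dport": 53, "src": "10.0.0.2", "version": 4, "ttl": 64},
    {"sport": 1026, "dport": 53, "src": "10.0.0.2", "version": 4, "ttl": 64},
]

flows = partition_into_flows(packets)
for attr in ("ttl", "src", "version"):
    print(attr, "->", classify_attribute(flows, attr))
```

On this toy data the three attributes fall into the same categories as Time to Live, Source Address, and Version do in the table.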

Page 14: ppt slides

Uses 1999 DARPA IDS Evaluation data set

Build association rules for IP fragments using OVAs

Result - Top 8 ranking rules


Rule                                              Support   Strength
ipFlagsMF = 1 & ipTTL = 63 -> ipTLen = 28         0.526     0.981
ipID < 2817 & ipFlagsMF = 1 -> ipTLen > 28        0.309     0.968
ipID < 2817 & ipTTL > 63 -> ipTLen > 28           0.299     1.000
ipTLen > 28 -> ipID < 2817                        0.309     1.000
ipID < 2817 -> ipTLen > 28                        0.309     0.927
ipTTL > 63 -> ipTLen > 28                         0.299     0.988
ipTLen > 28 -> ipTTL > 63                         0.299     0.967
ipTLen > 28 & ipOffset > 118 -> ipTTL > 63        0.291     1.000

Page 15: ppt slides

Transform OVAs into features that capture the protocol behavior
Behavior features
  ▪ Attribute observed over time/event
For an attribute, observe
  ▪ Entropy
  ▪ Mean and standard deviation
  ▪ Percentage of events with a given value
  ▪ Percentage of events that are monotonic
  ▪ Step size in attribute value
Training data requirements are reduced
  ▪ Normal: acceptable uses of the protocol
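The per-attribute observations listed above can be computed over a sequence of values of one attribute. The sketch below is one possible reading: "monotonic" is interpreted as the share of consecutive events where the value increases, and step size as the mean difference between consecutive values; both interpretations, the function names, and the sample values are assumptions.

```python
import math
import statistics
from collections import Counter


def entropy(values):
    """Shannon entropy (bits) of the observed value distribution."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def behavior_features(values, value_of_interest):
    """Summarize how one attribute behaves over a sequence of events."""
    steps = [b - a for a, b in zip(values, values[1:])]
    return {
        "entropy": entropy(values),
        "mean": statistics.mean(values),
        "stdev": statistics.pstdev(values),
        # Percentage of events where the attribute has the given value.
        "pct_with_value": 100.0 * sum(v == value_of_interest for v in values) / len(values),
        # Percentage of consecutive events where the value increases (assumed meaning of "monotonic").
        "pct_monotonic": 100.0 * sum(s > 0 for s in steps) / len(steps) if steps else 0.0,
        # Mean change between consecutive values (assumed meaning of "step size").
        "mean_step": statistics.mean(steps) if steps else 0.0,
    }


# Hypothetical IP Identification values observed within one flow.
features = behavior_features([100, 101, 102, 103], value_of_interest=100)
print(features)
```

Because features like these describe how an attribute is used rather than which values it takes, a model trained on them needs to see acceptable protocol usage, not every legitimate value.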

Page 16: ppt slides

Uses aggregate attribute values for some window of packets
  ▪ Window size = 10
  ▪ Examples
    ▪ TcpPerFIN = % of packets with FIN set
    ▪ meanIAT = mean inter-arrival time
50 flows for each protocol = 250 flows
  ▪ Number of packets per flow: 5 to 37,000
Use decision tree classifier (C5)
  ▪ FTP, SSH, Telnet, SMTP, HTTP
Classifier tested on DARPA data set
  FTP     SSH     Telnet  SMTP    WWW
  100%    100%    100%    82%     98%
Real network traffic: 85% to 100%
  ▪ Kazaa: 100%
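The two example features, TcpPerFIN and meanIAT, can be computed per window of 10 packets roughly as follows. This is a sketch under assumptions: the slides do not say whether windows overlap (non-overlapping windows are used here), and the packet representation with `time` and `fin` keys is hypothetical.

```python
def window_features(packets, window_size=10):
    """Compute aggregate features over consecutive, non-overlapping windows of packets."""
    features = []
    for start in range(0, len(packets) - window_size + 1, window_size):
        window = packets[start:start + window_size]
        # TcpPerFIN: percentage of packets in the window with the FIN flag set.
        fin_pct = 100.0 * sum(1 for p in window if p["fin"]) / window_size
        # meanIAT: mean gap between consecutive packet arrival times in the window.
        times = [p["time"] for p in window]
        gaps = [b - a for a, b in zip(times, times[1:])]
        mean_iat = sum(gaps) / len(gaps)
        features.append({"TcpPerFIN": fin_pct, "meanIAT": mean_iat})
    return features


# Hypothetical flow of 20 packets, one every 0.5 s, with FIN set on the last one.
flow = [{"time": i * 0.5, "fin": i == 19} for i in range(20)]
windows = window_features(flow)
print(windows)
```

Each window's feature vector, labeled with the flow's protocol, would then serve as one training record for the decision tree classifier.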


Page 17: ppt slides

[Figure: tree diagram with threshold splits on its branches (<=0.01 / >0.01, <=0.4 / >0.4, <=0.79 / >0.79, <=0.03 / >0.03, <=73 / >73, >546773); node labels not recoverable]

Page 18: ppt slides

Behavioral Features for Network Anomaly Detection
  ▪ Attribute values cannot be used as features
  ▪ Interpretation of protocol specifications
  ▪ Transform attributes into behavior features
  ▪ Aggregation of the attribute values
Data Mining Challenges
  ▪ Self-tuning data mining techniques
  ▪ Pattern-finding and prior knowledge
  ▪ Modeling of temporal data
  ▪ Scalability
  ▪ Incremental mining


Page 19: ppt slides

Tools
  KDnuggets
    ▪ Web portal: http://www.kdnuggets.com
  WEKA
    ▪ Most comprehensive and free collection of tools
    ▪ http://www.cs.waikato.ac.nz/ml/weka
Data Sets
  Machine Learning Database Repository
  Knowledge Discovery in Databases Archive
    ▪ http://kdd.ics.uci.edu
  MIT Lincoln Labs
    ▪ http://www.ll.mit.edu/IST/ideval


Page 20: ppt slides

“Applications of Data Mining in Computer Security” By Barbara and Jajodia

“Machine Learning and Data Mining for Computer Security” By Maloof

“Data Mining: Challenges and Opportunities for Data Mining During the Next Decade” By Grossman

“Data Mining: Concepts and Techniques” By Han and Kamber

SANS IDS FAQs https://www2.sans.org/resources/idfaq/

ACM Crossroads: IDS http://www.acm.org/crossroads/xrds2-4/intrus.html


Page 21: ppt slides

Old
  ▪ Represent rules as a decision tree in memory
  ▪ Very inefficient
  ▪ Matching time is linear in the number of rules
  ▪ The number of rules is growing fast
New
  ▪ Multi-pattern search algorithm
  ▪ Apply multiple rules in parallel
  ▪ Set-wise methodology
  ▪ Fire the rule with the longest match
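The slide does not say which multi-pattern search algorithm is meant. As an illustration of the idea, the sketch below uses Aho-Corasick, a classical algorithm that matches a whole set of patterns in a single pass over the input; the class name, the `longest_match` helper, and the sample patterns are hypothetical.

```python
from collections import deque


class MultiPatternMatcher:
    """Aho-Corasick automaton: finds all occurrences of a set of patterns in one pass."""

    def __init__(self, patterns):
        self.goto = [{}]   # per-state transitions: char -> next state
        self.fail = [0]    # failure link of each state
        self.out = [[]]    # patterns recognized when reaching each state
        for pattern in patterns:
            self._add(pattern)
        self._build_failure_links()

    def _add(self, pattern):
        state = 0
        for ch in pattern:
            if ch not in self.goto[state]:
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
                self.goto[state][ch] = len(self.goto) - 1
            state = self.goto[state][ch]
        self.out[state].append(pattern)

    def _build_failure_links(self):
        # Breadth-first so that a state's failure target is finished before the state itself.
        queue = deque(self.goto[0].values())
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                # Inherit patterns that end at the failure target (proper suffixes).
                self.out[nxt] = self.out[nxt] + self.out[self.fail[nxt]]

    def search(self, text):
        """Return (start_index, pattern) for every pattern occurrence in text."""
        state = 0
        matches = []
        for i, ch in enumerate(text):
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            for pattern in self.out[state]:
                matches.append((i - len(pattern) + 1, pattern))
        return matches


def longest_match(matches):
    """Pick the match with the longest pattern, mirroring 'fire the rule with the longest match'."""
    return max(matches, key=lambda m: len(m[1]), default=None)


matcher = MultiPatternMatcher(["he", "she", "his", "hers"])
found = matcher.search("ushers")
print(found, longest_match(found))
```

The scan cost depends on the length of the input and the number of matches rather than on the number of patterns, which is what removes the per-rule linear cost noted for the old approach.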
