A TRANSDUCTIVE SCHEME BASED INFERENCE TECHNIQUES FOR NETWORK FORENSIC ANALYSIS

BY: AKSHAYA ARUNAN

M1 NE [IT]

GECBH

22-Jul-16 A TRANSDUCTIVE SCHEME BASED INFERENCE TECHNIQUES FOR NFA 1


OUTLINE

Objective

Introduction

Literature Survey

Proposed System

Conclusion

References


OBJECTIVE

To develop a network intrusion forensics system based on a “transductive scheme” that can:

efficiently detect and analyze computer crime

extract digital evidence


INTRODUCTION

Rapid development of network connectivity

Growing network complexity and scale

Increase in the number of computer crimes

Every connected system is a potential target for malicious attacks


These attacks can affect:

physical or digital assets

funds

consumer confidence

national security

human life


Network Forensics

Goal: to discover the source of security breaches or other information assurance problems [1].

Evidence is captured from networks

Interpretation is substantially based on knowledge of network attacks

Allows us to make forensic determinations based on the observed traffic [2]


LITERATURE SURVEY

tcpdump [4], [5]

Wireshark [5]

Artificial Neural Network [1]

Support Vector Machine [5], [6]


tcpdump

A free, open-source, widely used packet analyzer that runs from the command line.

Some of its functions:

Prints the contents of network packets

Displays TCP/IP and other packets being transmitted or received

Can read packets from a network interface card

Can write packets to standard output or a file
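When tcpdump writes packets to a file (its `-w` option), it uses the classic libpcap file format. As a sketch of what such a file contains, the Python below builds a tiny capture in memory and parses its global header and per-packet record headers with the standard `struct` module. It assumes the classic (non-pcapng) format with microsecond timestamps in little-endian byte order; the packet bytes are dummy placeholders, not real traffic.

```python
import struct
from typing import List, Tuple

# Classic libpcap file layout (not pcapng):
#   global header: magic, version major/minor, thiszone, sigfigs, snaplen, linktype
#   per packet:    ts_sec, ts_usec, incl_len, orig_len, then incl_len bytes of data
GLOBAL_HDR = struct.Struct("<IHHiIII")   # 24 bytes, little-endian
RECORD_HDR = struct.Struct("<IIII")      # 16 bytes
PCAP_MAGIC = 0xA1B2C3D4                  # microsecond-resolution timestamps
LINKTYPE_ETHERNET = 1


def build_pcap(packets: List[Tuple[int, int, bytes]], snaplen: int = 65535) -> bytes:
    """Serialize (ts_sec, ts_usec, data) tuples into a classic pcap byte string."""
    out = bytearray(GLOBAL_HDR.pack(PCAP_MAGIC, 2, 4, 0, 0, snaplen, LINKTYPE_ETHERNET))
    for ts_sec, ts_usec, data in packets:
        out += RECORD_HDR.pack(ts_sec, ts_usec, len(data), len(data))
        out += data
    return bytes(out)


def read_pcap(blob: bytes) -> List[Tuple[float, int, bytes]]:
    """Return (timestamp, original_length, captured_bytes) for each packet."""
    magic, _major, _minor, _zone, _sigfigs, _snaplen, _linktype = GLOBAL_HDR.unpack_from(blob, 0)
    if magic != PCAP_MAGIC:
        raise ValueError("not a little-endian, microsecond-resolution pcap file")
    offset = GLOBAL_HDR.size
    records = []
    while offset + RECORD_HDR.size <= len(blob):
        ts_sec, ts_usec, incl_len, orig_len = RECORD_HDR.unpack_from(blob, offset)
        offset += RECORD_HDR.size
        data = blob[offset:offset + incl_len]       # the captured packet bytes
        offset += incl_len
        records.append((ts_sec + ts_usec / 1e6, orig_len, data))
    return records


if __name__ == "__main__":
    capture = build_pcap([(1000, 500000, b"\x00" * 60), (1001, 0, b"\xff" * 42)])
    for ts, length, _ in read_pcap(capture):
        print(f"{ts:.6f} len={length}")
```

A real analysis tool would normally read such files through libpcap itself; hand-parsing is shown only to make the record structure visible.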


Wireshark

Wireshark is a free and open source packet analyzer.

Wireshark is similar to tcpdump, but has a graphical front end plus some integrated sorting and filtering options.

It is used for:

network troubleshooting

network analysis

software and communications protocol development

education


Artificial Neural Network [1]

An ANN is an interconnected group of nodes, akin to the vast network of neurons in a brain.

ANNs can be used to infer a function from observations.

Application areas include data processing and robotics.


Artificial neural network with input, hidden, and output layers

In the figure, each node represents an artificial neuron and an arrow represents a connection from the output of one neuron to the input of another.
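The figure's input-hidden-output structure can be sketched as a forward pass in plain Python. The layer sizes, weights, and sigmoid activation below are arbitrary illustrative choices (assumptions), not the networks used in [1]; the sketch only shows how each neuron's output feeds the inputs of the next layer.

```python
import math
from typing import List

Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs: List[float], weights: Matrix, biases: List[float]) -> List[float]:
    """Fully connected layer: neuron j outputs sigmoid(sum_i w[j][i] * x[i] + b[j])."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward(x: List[float], hidden_w: Matrix, hidden_b: List[float],
            out_w: Matrix, out_b: List[float]) -> List[float]:
    hidden = layer(x, hidden_w, hidden_b)   # input -> hidden connections (arrows)
    return layer(hidden, out_w, out_b)      # hidden -> output connections


if __name__ == "__main__":
    # Hypothetical 3-input, 4-hidden, 2-output network with hand-picked weights.
    hidden_w = [[0.2, -0.1, 0.4], [0.7, 0.3, -0.5], [-0.3, 0.8, 0.1], [0.5, 0.5, 0.5]]
    hidden_b = [0.0, 0.1, -0.1, 0.0]
    out_w = [[0.6, -0.4, 0.3, 0.1], [-0.2, 0.5, 0.4, -0.3]]
    out_b = [0.05, -0.05]
    print(forward([1.0, 0.5, -1.0], hidden_w, hidden_b, out_w, out_b))
```

Training (adjusting the weights from labeled traffic) is omitted; that is where an ANN actually "infers a function from observations".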


Support Vector Machine [5], [6]

Constructs a hyperplane or a set of hyperplanes in a high- or infinite-dimensional space, which can be used for classification, regression, or other tasks.

Supervised learning models that analyze data and recognize patterns

Hyperplane: a subspace whose dimension is one less than that of its ambient space
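In the linear case, the separating hyperplane is the set of points x with w·x + b = 0, and a point is classified by the sign of w·x + b. The sketch below trains such a hyperplane by subgradient descent on the regularized hinge loss (a Pegasos-style primal method, chosen here for brevity; it is not the solver used in [5] or [6]). The toy data, regularization strength `lam`, and epoch count are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def train_linear_svm(points: Sequence[Sequence[float]], labels: Sequence[int],
                     lam: float = 0.1, epochs: int = 1000) -> Tuple[Vector, float]:
    """Minimize lam/2*||w||^2 + mean hinge loss with Pegasos-style subgradient steps.

    A constant 1 is appended to every point so the last weight acts as the bias
    (this also regularizes the bias, a common simplification).
    """
    data = [list(p) + [1.0] for p in points]
    w = [0.0] * len(data[0])
    t = 0
    for _ in range(epochs):
        for x, y in zip(data, labels):         # fixed order keeps the sketch deterministic
            t += 1
            eta = 1.0 / (lam * t)              # decreasing step size
            violates = y * dot(w, x) < 1.0     # inside the margin or misclassified
            w = [(1.0 - eta * lam) * wi for wi in w]                 # regularizer step
            if violates:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]      # hinge subgradient
    return w[:-1], w[-1]


def classify(w: Vector, b: float, x: Sequence[float]) -> int:
    """Which side of the hyperplane w.x + b = 0 the point lies on."""
    return 1 if dot(w, x) + b >= 0 else -1


if __name__ == "__main__":
    pts = [(2, 2), (3, 3), (2, 3), (3, 2), (-2, -2), (-3, -3), (-2, -3), (-3, -2)]
    ys = [1, 1, 1, 1, -1, -1, -1, -1]
    w, b = train_linear_svm(pts, ys)
    print(w, b, [classify(w, b, p) for p in pts])
```

Non-linear SVMs replace the dot product with a kernel, which is what allows a hyperplane in a high- or infinite-dimensional feature space.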


Disadvantages

ANN and SVM:

They were used to identify significant features for network forensics [1]

These methods are effective in reducing processing time

But they are insufficient for forensic analysis

tcpdump and Wireshark:

These tools are designed to help debug network problems, not specifically for forensic analysis


PROPOSED SYSTEM

First, we propose an efficient TCM-KNN [3] based inference technique

It is much more effective than approaches based on single or multiple traffic thresholds

Second, to boost the real-time network forensic performance of TCM-KNN, we use a simulated annealing (SA) algorithm [10] for instance selection, which:

reduces the computational cost

makes the system more suitable for real network environments


Transductive Confidence Machines for K-Nearest Neighbors

A commonly used machine learning and data mining method

Effective in fraud detection, pattern recognition, and outlier detection

The confidence measure used in TCM is based on universal tests for randomness, or their approximations


Transductive scheme based network forensics

We develop a network intrusion forensics system based on a transductive scheme (NIFSTC) that can efficiently detect and analyze:

network crime, and

digital evidence


NIFSTC consists of the following components:

Network Traffic Capturer

Instance Selection and Feature Extractor

TCM-KNN Based Network Forensic Analyzer

Evidence Analyzer


NIFSTC system architecture


Traffic capturer

The first component of the NIFSTC system:

Captures network traffic

Prepares it for traffic analysis

Provides the base information for the other components of the forensics system

The traditional packet capture library, Libpcap [4], provides implementation-independent access to the underlying packet capture facility


Problems with Libpcap:

Under heavy traffic, captured data is transferred by the kernel to user processes through system calls and memory copies.

In a high-throughput network, the total number of CPU cycles consumed this way is not negligible.

System overhead: too many memory copy operations consume a large amount of CPU and memory resources.


In order to improve the packet capture performance of NIFSTC, it is necessary to:

reduce the intermediate steps during packet transmission,

bypass the OS kernel, and

eliminate the kernel's memory copy.


An efficient user-level packet capture mechanism based on a semi-polling driven technique [7, 8].

With the semi-polling driven mechanism:

1) the interrupt frequency is lowered

2) processing performance for short messages is significantly improved
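Why semi-polling lowers the interrupt frequency can be illustrated with a toy simulation. It compares a purely interrupt-driven receiver (one interrupt per packet) with a generic hybrid scheme in which an interrupt switches the receiver into polling mode and interrupts are re-enabled only once a poll finds the queue empty. This is an assumption-laden illustration of the general interrupt-mitigation idea, not the mechanism described in [7, 8]; the arrival pattern and the per-tick polling `budget` are hypothetical.

```python
from typing import Sequence, Tuple


def interrupt_driven(arrivals: Sequence[int]) -> int:
    """Baseline: every arriving packet raises its own interrupt."""
    return sum(arrivals)


def semi_polling(arrivals: Sequence[int], budget: int) -> Tuple[int, int]:
    """Return (interrupts raised, packets processed) under the hybrid scheme."""
    interrupts = processed = queue = 0
    polling = False                      # False means interrupts are enabled
    for n in arrivals:                   # one loop iteration = one time tick
        queue += n
        if not polling and queue > 0:
            interrupts += 1              # first packet of a burst raises an interrupt
            polling = True               # handler masks interrupts, switches to polling
        if polling:
            served = min(queue, budget)  # poll: drain at most `budget` packets per tick
            queue -= served
            processed += served
            if queue == 0:
                polling = False          # queue empty: re-enable interrupts
    return interrupts, processed


if __name__ == "__main__":
    bursty = [5, 5, 5, 0, 0, 3, 0, 1]    # packets arriving in each tick
    print("interrupt-driven:", interrupt_driven(bursty))
    print("semi-polling (interrupts, processed):", semi_polling(bursty, budget=4))
```

Under bursty traffic, the hybrid scheme raises one interrupt per burst rather than one per packet, which is the effect the slide attributes to semi-polling.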


TCM-KNN based network forensic analyzer

TCM-KNN is an algorithm that effectively combines TCM [9] and the KNN algorithm

In the KNN algorithm, we denote the sorted sequence (in ascending order) of the distances of point $i$ from the other points with the same classification $y$ as $D_i^y$

In this paper, Euclidean distance is used to calculate the distances between points

We assign to every point a measure called the individual strangeness measure

This measure defines the strangeness of the point in relation to the rest of the points

In our case, the strangeness measure for a point $i$ belonging to the normal class $y$ is defined as:

$$\alpha_i^y = \sum_{j=1}^{k} D_{ij}^y \qquad (1)$$

The same measure is computed for a test point that may be an anomaly

$D_{ij}^y$ is the $j$th shortest distance in the sequence $D_i^y$

$k$ is the number of neighbors used
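Equation (1) can be computed directly: sort a point's Euclidean distances to the other points of class $y$ (the sequence $D_i^y$) and add up the $k$ smallest. The sketch assumes points are tuples of numeric features and that a training point is excluded from its own distance sequence; the helper names, the data, and $k$ are illustrative.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def distance_sequence(point: Point, same_class: List[Point]) -> List[float]:
    """D_i^y: ascending Euclidean distances from `point` to the points of class y."""
    return sorted(math.dist(point, other) for other in same_class)


def strangeness(point: Point, same_class: List[Point], k: int) -> float:
    """alpha_i^y: sum of the k shortest distances in D_i^y (Equation (1))."""
    return sum(distance_sequence(point, same_class)[:k])


def training_strangeness(normal: List[Point], k: int) -> List[float]:
    """Strangeness of every training point, each scored against the remaining ones."""
    return [strangeness(p, normal[:i] + normal[i + 1:], k) for i, p in enumerate(normal)]


if __name__ == "__main__":
    normal = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    print(training_strangeness(normal, k=2))        # compact cluster: small values
    print(strangeness((5.0, 5.0), normal, k=2))     # distant point: large value
```

A point far from the normal traffic accumulates large neighbor distances, so its strangeness is high; Equation (2) turns that value into a p-value.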

Using the strangeness values from Equation (1), the p-value of a new point is computed as follows:

$$p(\alpha_{new}) = \frac{\#\{i : \alpha_i \ge \alpha_{new}\}}{n+1} \qquad (2)$$

$\#$ denotes the cardinality of the set, and $n$ is the number of training points

$\alpha_{new}$ is the strangeness value for the test point

The probability that $\alpha_{new}$ is among the $j$ largest strangeness values is at most $j/(n+1)$

The p-value is a non-universal test (Proedrou et al.): a measure of how well the data support a null hypothesis (here, that the new point belongs to the normal class). The smaller the p-value, the greater the evidence against the null hypothesis.
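Given the strangeness values of the $n$ training points and of a new point, Equation (2) reduces to a count. The sketch reads the index $i$ in Equation (2) as ranging over the $n$ training points (some TCM formulations also count the test point itself in the numerator), and the 0.05 `significance` threshold used to flag an anomaly is a hypothetical choice, not a value from the source.

```python
from typing import Sequence


def p_value(train_alphas: Sequence[float], alpha_new: float) -> float:
    """Equation (2): share of strangeness values at least as large as alpha_new."""
    n = len(train_alphas)
    count = sum(1 for a in train_alphas if a >= alpha_new)
    return count / (n + 1)


def is_anomaly(train_alphas: Sequence[float], alpha_new: float,
               significance: float = 0.05) -> bool:
    """Reject the null hypothesis ("the point is normal") when the p-value is small.

    `significance` is a hypothetical threshold chosen for illustration.
    """
    return p_value(train_alphas, alpha_new) <= significance


if __name__ == "__main__":
    alphas = [2.0, 2.5, 1.8, 2.2, 3.0]   # strangeness of the normal training points
    for a in (1.0, 2.2, 10.0):
        print(a, p_value(alphas, a), is_anomaly(alphas, a))
```

With very few training points the smallest non-zero p-value is $1/(n+1)$, so in practice $n$ must be large enough for the chosen threshold to be meaningful.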

Feature extractor

Extracts features from the network traffic captured by the Traffic Capturer component.

A feature set is a data structure that characterizes network traffic.

The data structure used for network event analysis is the connection log.

Some of its secondary attributes are:

1) TCP flags

2) connection duration

3) volume of data passed in each direction
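The connection-log idea can be sketched as grouping packets by connection and accumulating the secondary attributes listed above. The packet record layout (timestamp, source and destination endpoints, payload size, TCP flag letters) is a hypothetical simplification of what a traffic capturer would supply, and keying connections by the unordered endpoint pair is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Endpoint = Tuple[str, int]                 # (IP address, port)


@dataclass
class Packet:                              # hypothetical, simplified packet record
    ts: float
    src: Endpoint
    dst: Endpoint
    size: int                              # bytes carried by this packet
    flags: str                             # TCP flags seen, e.g. "S", "SA", "A", "F"


@dataclass
class Connection:                          # one row of the connection log
    client: Endpoint                       # endpoint that sent the first packet
    server: Endpoint
    start: float
    end: float
    bytes_out: int = 0                     # client -> server
    bytes_in: int = 0                      # server -> client
    flags: Set[str] = field(default_factory=set)

    @property
    def duration(self) -> float:
        return self.end - self.start


def build_connection_log(packets: List[Packet]) -> List[Connection]:
    log: Dict[frozenset, Connection] = {}
    for p in sorted(packets, key=lambda pkt: pkt.ts):
        key = frozenset((p.src, p.dst))    # same key for both directions
        conn = log.get(key)
        if conn is None:
            conn = log[key] = Connection(client=p.src, server=p.dst, start=p.ts, end=p.ts)
        conn.end = p.ts
        conn.flags.update(p.flags)         # union of individual flag letters
        if p.src == conn.client:
            conn.bytes_out += p.size
        else:
            conn.bytes_in += p.size
    return list(log.values())
```

Each resulting `Connection` carries the TCP flags, duration, and per-direction volume that the slide lists, ready to be turned into a feature vector for TCM-KNN.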


Simulated annealing based instance selection

A local search technique that simulates the physical process of “annealing” [10].

Suitable for highly non-linear problems.

Begins with a random solution and, at each step of the process, searches the neighborhood of the current solution for the next one.

Moves are accepted according to a probability function.

The acceptance of a downhill move depends on the size of the reduction in the value of the objective function and on how far the search has progressed
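The annealing loop above can be sketched generically for a maximization problem: a random neighbor is always accepted if it is better, and a worse (downhill) one is accepted with probability exp(Δ/T), which shrinks both as the loss Δ grows and as the temperature T is lowered over time. The geometric cooling schedule, its parameters (`t0`, `cooling`, `steps`), and the toy objective are illustrative assumptions, not settings taken from [10] or from the source.

```python
import math
import random
from typing import Callable, Tuple, TypeVar

S = TypeVar("S")


def simulated_annealing(initial: S,
                        objective: Callable[[S], float],
                        neighbor: Callable[[S, random.Random], S],
                        t0: float = 10.0, cooling: float = 0.99,
                        steps: int = 2000, seed: int = 0) -> Tuple[S, float]:
    """Maximize `objective`; return the best solution seen and its value."""
    rng = random.Random(seed)
    current, current_val = initial, objective(initial)
    best, best_val = current, current_val
    temperature = t0
    for _ in range(steps):
        candidate = neighbor(current, rng)
        candidate_val = objective(candidate)
        delta = candidate_val - current_val            # > 0 means uphill
        # Metropolis rule: always accept improvements; accept downhill moves with
        # probability exp(delta / T), which falls as |delta| grows or T drops.
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_val = candidate, candidate_val
            if current_val > best_val:
                best, best_val = current, current_val
        temperature *= cooling                          # geometric cooling schedule
    return best, best_val


def toy_objective(x: int) -> float:
    """Hypothetical objective with its maximum at x = 3."""
    return -float((x - 3) ** 2)


def toy_neighbor(x: int, rng: random.Random) -> int:
    """Step one unit left or right, staying inside [-10, 10]."""
    return max(-10, min(10, x + rng.choice((-1, 1))))


if __name__ == "__main__":
    print(simulated_annealing(0, toy_objective, toy_neighbor))
```

Early on, the high temperature lets the search escape poor regions; as T falls, the loop behaves more and more like greedy hill climbing.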


SA-based instance selection keeps the most contributing examples and omits useless ones, as judged by a fitness function.

To apply SA, two important problems must be addressed:

Specification of the representation of the solutions

Definition of the fitness function


1) Representation:

TR denotes the training dataset.

The search space associated with instance selection on TR consists of all subsets of TR.

Each subset of TR is encoded as a chromosome using a binary representation.

A chromosome consists of genes with two possible states, 0 and 1, one gene per instance of TR.

If a gene is 1, its associated instance is included in the subset of TR represented by the chromosome.

If it is 0, the instance is excluded.

Result: the instances selected by the best chromosome form the reduced training dataset for TCM-KNN.
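The binary representation can be sketched as a bit list with one gene per training instance: decoding keeps the instances whose gene is 1, and a natural SA neighbor move flips one randomly chosen gene (adding or removing a single instance). The single-bit-flip move and the helper names are illustrative assumptions; the source does not specify the neighborhood.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")
Chromosome = List[int]          # one gene (0 or 1) per instance of TR


def random_chromosome(size: int, rng: random.Random) -> Chromosome:
    """Random starting solution for SA."""
    return [rng.randint(0, 1) for _ in range(size)]


def decode(chromosome: Chromosome, training_set: Sequence[T]) -> List[T]:
    """Subset of TR represented by the chromosome: keep instances whose gene is 1."""
    if len(chromosome) != len(training_set):
        raise ValueError("chromosome needs exactly one gene per training instance")
    return [inst for gene, inst in zip(chromosome, training_set) if gene == 1]


def flip_one_gene(chromosome: Chromosome, rng: random.Random) -> Chromosome:
    """Neighbor move: copy the chromosome and toggle one randomly chosen gene."""
    neighbor = list(chromosome)
    i = rng.randrange(len(neighbor))
    neighbor[i] = 1 - neighbor[i]
    return neighbor


if __name__ == "__main__":
    tr = ["x1", "x2", "x3", "x4", "x5"]           # placeholder training instances
    rng = random.Random(42)
    chrom = random_chromosome(len(tr), rng)
    print(chrom, decode(chrom, tr))
    print(flip_one_gene(chrom, rng))
```

The subset decoded from the best chromosome found by SA is what TCM-KNN would then use as its reduced training dataset.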


2) Fitness function:

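Let $X$ be a subset of the instances of TR to be evaluated, coded by a chromosome, and let $F(X)$ be its fitness.

Three measures must be considered:

TP (true positives)

FP (false positives)

Percentage reduction of the training dataset

Thus, the fitness function combines three values:

the detection rate (detect_rate) together with the false alarm rate (fal_rate)

the reduction rate (reduce_rate) of the instances of $X$ with respect to TR

$$F(X) = C \cdot (\text{detect\_rate} - \text{fal\_rate}) + (1 - C) \cdot \text{reduce\_rate} \qquad (3)$$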

$$\text{reduce\_rate} = \frac{|TR| - |S|}{|TR|} \times 100 \qquad (4)$$

$|TR|$ is the number of instances in the original training dataset

$|S|$ is the number of instances in the training dataset reduced by SA

$C$ is an adjustment constant set empirically

The objective of SA is to maximize the fitness function defined above, that is, to:

maximize the detection rate

minimize the number of instances retained as well as the FP rate
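Equations (3) and (4) translate directly into code. The sketch assumes detect_rate and fal_rate are fractions in [0, 1], obtained by evaluating TCM-KNN trained on the selected subset S against labeled validation data (that evaluation step is not shown), and it keeps reduce_rate as a percentage exactly as Equation (4) defines it. The default `c = 0.5` is a hypothetical value, since the source only says C is set empirically.

```python
def reduce_rate(tr_size: int, subset_size: int) -> float:
    """Equation (4): percentage of TR removed by instance selection."""
    if tr_size <= 0 or not 0 <= subset_size <= tr_size:
        raise ValueError("need |TR| > 0 and 0 <= |S| <= |TR|")
    return (tr_size - subset_size) / tr_size * 100


def fitness(detect_rate: float, fal_rate: float, tr_size: int, subset_size: int,
            c: float = 0.5) -> float:
    """Equation (3): F = C * (detect_rate - fal_rate) + (1 - C) * reduce_rate."""
    return c * (detect_rate - fal_rate) + (1 - c) * reduce_rate(tr_size, subset_size)


if __name__ == "__main__":
    # Hypothetical evaluation of one candidate subset: 250 of 1000 instances kept.
    print(reduce_rate(1000, 250))                   # 75.0
    print(fitness(0.9, 0.1, 1000, 250, c=0.5))
```

Because reduce_rate is on a 0 to 100 scale while the rate difference lies in [-1, 1], the reduction term can dominate unless C is chosen close to 1; this is one reason C has to be tuned. In an SA run, this F(X) would be the objective being maximized.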


Evidence analyzer

Can connect abnormal events that are distant from each other or incomplete

A set of evidence-analyzing utilities examines different aspects of correlated events efficiently

These utilities are integrated into the NIFSTC system

The evidence analyzer works in one of two modes:

1) count mode, or

2) weighted analysis mode

The evidence analyzer produces an undirected evidence graph:

Nodes: attribute values

Node size: the weight of the node

Edges: a relationship between two attribute values

An example evidence graph is shown in the figure.


Evidence Graph
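One way to picture the evidence graph is as an undirected graph whose nodes are attribute values taken from correlated abnormal events and whose edges link values that occur together in the same event. In the sketch, count mode weights each node by how many events mention it, while weighted mode sums a per-event weight. Both the event format and this reading of the two modes are assumptions for illustration, since the source does not define them precisely.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

Node = Tuple[str, str]                 # (attribute name, attribute value)


def build_evidence_graph(events: List[Dict[str, str]],
                         event_weights: Optional[List[float]] = None
                         ) -> Tuple[Dict[Node, float], Set[FrozenSet[Node]]]:
    """Return (node weights, undirected edges) for a list of abnormal events.

    Count mode: event_weights is None, so each event adds 1 to each of its nodes.
    Weighted mode: each event adds its own (hypothetical) weight instead.
    """
    weights: Dict[Node, float] = defaultdict(float)
    edges: Set[FrozenSet[Node]] = set()
    for idx, event in enumerate(events):
        w = 1.0 if event_weights is None else event_weights[idx]
        nodes = sorted(event.items())          # attribute values become nodes
        for node in nodes:
            weights[node] += w                 # drawn as node size
        for a, b in combinations(nodes, 2):
            edges.add(frozenset((a, b)))       # values seen together in one event
    return dict(weights), edges


if __name__ == "__main__":
    events = [{"src_ip": "10.0.0.9", "dst_port": "22"},
              {"src_ip": "10.0.0.9", "dst_port": "23"}]
    print(build_evidence_graph(events))                  # count mode
    print(build_evidence_graph(events, [0.5, 2.0])[0])   # weighted mode
```

A frozenset is used for each edge so that the pair (a, b) and the pair (b, a) are the same undirected edge.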


CONCLUSION

TCM-KNN is a modern and precise algorithm for detecting network crimes and analyzing forensic data.

The evidence analyzer reports the number of pieces of evidence found and their corresponding weighted values.


REFERENCES

1) S. Mukkamala, A. Sung, “Identifying significant features for network forensic analysis using artificial intelligent techniques”, International Journal of Digital Evidence, 2003.

2) M. I. Cohen, “PyFlag: An advanced network forensic framework”, Digital Investigation (Elsevier), 2008.

3) Y. Li, L. Guo, “An active learning based TCM-KNN algorithm for supervised network intrusion detection”, Computers & Security (Elsevier), 2007.

4) Libpcap, http://www.tcpdump.org/release/libcap-0.7.2.tar.gz, 2002.

5) Wikipedia, www.wikipedia.com

6) E. Eskin, A. Arnold, M. Prerau, L. Portnoy, S. Stolfo, “A geometric framework for unsupervised anomaly detection: detecting intrusions in unlabeled data”, in D. Barbara and S. Jajodia (eds.), Applications of Data Mining in Computer Security, Kluwer, 2002.

7) Z. H. Tian, B. X. Fang, X. C. Yun, “User-level message passing mechanism based on semi-polling driven in RTLinux”, Journal of Software, 2004.

8) Z. H. Tian, M. Z. Hu, B. Li, “Semi-polling based interrupt mitigation for high performance packet processing”, High Technology Letters, 2005.

9) A. Gammerman, V. Vovk, “Prediction algorithms and confidence measures based on algorithmic randomness theory”, Theoretical Computer Science, 2002.

10) E. Aarts and van Laarhoven, “Simulated annealing: A pedestrian review of the theory and some applications”, in J. Kittler and P. A. Devijver (eds.), Pattern Recognition and Applications, Springer-Verlag, Berlin, 1987.