A REVIEW OF MACHINE LEARNING BASED …rafea/CSCE590/Spring2015/Presentations/A...Outline...

A REVIEW OF MACHINE LEARNING BASED ANOMALY DETECTION

By Mohamed Elfadly

Outline

Introduction

CyberSecurity Systems

Review of CyberSecurity Solutions

Machine Learning

Machine Learning for Anomaly Detection

Machine Learning Based Techniques

Machine Learning Applications

Introduction

As technology moves forward, users have become more technically aware than before. People communicate and cooperate efficiently through the Internet using their personal computers, PDAs, or mobile phones.

Through these digital devices linked by the Internet, hackers also attack personal privacy using a variety of weapons, such as viruses, Trojans, worms, botnet attacks, rootkits, adware, spam, and social engineering platforms.

Introduction

These different forms of attack are considered cyber-threats, which can be categorized into one of three groups according to the intruder's purpose:

Stealing confidential information

Manipulating the components of cyber infrastructure

Denying the functions of the infrastructure

CyberSecurity System

CyberSecurity Systems

However, building defense systems against discovered attacks is not easy because cyber attacks evolve constantly.

That is why higher-level, adaptive methodologies are required to discover embedded cyber intrusions.

Many higher-level adaptive cyber defense systems can be partitioned into the following components [1]:

Data-capturing tools, such as Libpcap for Linux and Winpcap for Windows, capture events from the audit trails of resource information sources (e.g., network).

The data-preprocessing module filters out the attacks for which good signatures have been learned.

A feature extractor derives basic features that are useful in event analysis engines, including a sequence of system calls, start time, duration of a network flow, source IP and source port, destination IP and destination port, protocol, number of bytes, and number of packets.

In an analysis engine, various intrusion detection methods are implemented to investigate the behavior of the cyber-infrastructure, which may or may not have appeared before in the record, e.g., to detect anomalous traffic.
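As a rough illustration of the feature extractor described above, the following sketch groups already-captured packets into flows and derives the listed per-flow features (start time, duration, addresses, ports, protocol, bytes, packets). The Packet record, the extract_flow_features function, and the sample values are hypothetical names chosen for this example; they are not taken from the slides or from any capture library.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Packet:
    """Hypothetical, simplified view of one captured packet."""
    timestamp: float  # seconds since capture start
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str
    size: int  # bytes


def extract_flow_features(packets):
    """Group packets by their 5-tuple and derive basic per-flow features."""
    flows = defaultdict(list)
    for p in packets:
        key = (p.src_ip, p.src_port, p.dst_ip, p.dst_port, p.protocol)
        flows[key].append(p)

    features = []
    for (src_ip, src_port, dst_ip, dst_port, protocol), pkts in flows.items():
        start = min(p.timestamp for p in pkts)
        end = max(p.timestamp for p in pkts)
        features.append({
            "src_ip": src_ip,
            "src_port": src_port,
            "dst_ip": dst_ip,
            "dst_port": dst_port,
            "protocol": protocol,
            "start_time": start,
            "duration": end - start,
            "bytes": sum(p.size for p in pkts),
            "packets": len(pkts),
        })
    return features


# Made-up sample traffic: three packets of one TCP flow and one DNS packet.
sample_packets = [
    Packet(10.0, "10.0.0.5", 50000, "192.0.2.1", 80, "TCP", 60),
    Packet(10.5, "10.0.0.5", 50000, "192.0.2.1", 80, "TCP", 1500),
    Packet(12.0, "10.0.0.5", 50000, "192.0.2.1", 80, "TCP", 40),
    Packet(11.0, "10.0.0.7", 40000, "192.0.2.9", 53, "UDP", 80),
]
flow_features = extract_flow_features(sample_packets)
```

Each resulting dictionary is one record that an analysis engine could consume. A real system would also handle flow timeouts and bidirectional traffic, which this sketch ignores.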

Solutions to cybersecurity problems:

Proactive Approaches: anticipate and eliminate vulnerabilities in the cyber system, while remaining prepared to defend effectively and rapidly against attacks

Reactive Approaches: such as intrusion detection systems (IDSs). IDSs detect intrusions based on the information from log files and network flow, so that the extent of damage can be determined, hackers can be tracked down, and similar attacks can be prevented in the future.

Review of Cyber Security Solutions

Proactive security solutions are designed to maintain the overall security of a system, even if individual components of the system have been compromised by an attack.

Researchers also consider data-mining algorithms from the viewpoint of privacy preservation. This line of research, introduced by Verykios et al., is called privacy-preserving data mining (PPDM) [4].

Reactive Security Systems

An IDS intelligently monitors activities that occur in a computing resource, e.g., network traffic and computer usage, to analyze the events and to generate reactions.

Intrusion detection can be classified into the following modules [1]:

Misuse/Signature detection

Anomaly Detection

Hybrid Detection

Scan detector and Profiling modules.

IDS Modules

Misuse/Signature Detection: an IDS triggering method that generates alarms when a known cyber misuse occurs.

Anomaly Detection: triggers alarms when the detected object behaves significantly differently from the predefined normal patterns.

Hybrid Detection: combines anomaly and misuse detection techniques to overcome the drawbacks of each.

Scan Detection and Profiling Module: Scan detection generates alerts when attackers scan services or computer components in network systems before launching attacks. The Profiling modules group similar network connections and search for dominant behaviors using clustering algorithms.

Purpose

Most reactive security solutions depend heavily on machine learning approaches to solve cybersecurity problems.

That is why a literature review is conducted on anomaly detection using machine learning.

Machine Learning

Machine learning is one of the cornerstone fields of Artificial Intelligence, where machines learn to act autonomously and react to new situations without being pre-programmed. It is about designing algorithms that allow computers to learn.

Machine Learning

Machine learning algorithms are categorized according to the desired outcome of the algorithm:

Supervised Learning

Unsupervised Learning


Lust for victory will not give you the victory. You must receive the victory from your opponent. He has no choice but to give it to you because he will sense your heart as better or truer. Nature is your friend; it helps you to win. Your enemy will have unnatural movement; therefore you will be able to know what he is going to do before he does it.

Masaaki Hatsumi

Secret Ninjutsu

Anomaly Detection

The goal of anomaly detection is to target any event falling outside of a predefined set of normal behaviors.

Anomaly detection first defines a profile of normal behaviors, which reflects the health and sensitivity of a cyber-infrastructure. Correspondingly, an anomalous behavior is defined as a pattern in data that does not conform to the expected behaviors.

Anomaly Detection

Anomaly detection relies on a clear boundary between normal and anomalous behaviors, where the profile of normal behaviors is defined as different from anomaly events. The profile must fit a set of criteria as explained by Gong[10].

For example, if a user who usually logs in around 10 am from a university dormitory logs in at 5:30 am from an IP address in China, then an anomaly has occurred.
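The login example can be sketched as a tiny profile-based detector. This is only an illustration under stated assumptions: the Login record, the "HOME" country label, the 3-standard-deviation threshold, and the 0.25-hour floor on the spread are all hypothetical choices made for this example, not criteria from Gong [10].

```python
from dataclasses import dataclass


@dataclass
class Login:
    hour: float   # local hour of day, e.g. 5.5 means 5:30 am
    country: str  # country resolved from the source IP (hypothetical label)


def build_profile(history):
    """Summarize past logins as a mean hour, its spread, and seen countries."""
    hours = [login.hour for login in history]
    mean = sum(hours) / len(hours)
    variance = sum((h - mean) ** 2 for h in hours) / len(hours)
    return {
        "mean_hour": mean,
        "std_hour": variance ** 0.5,
        "countries": {login.country for login in history},
    }


def is_anomalous(login, profile, z_threshold=3.0):
    """Flag a login whose hour or country deviates from the normal profile.

    Assumption: hours are compared linearly, so wrap-around at midnight
    is ignored in this sketch.
    """
    std = max(profile["std_hour"], 0.25)  # floor avoids division by ~0
    z_score = abs(login.hour - profile["mean_hour"]) / std
    unusual_place = login.country not in profile["countries"]
    return z_score > z_threshold or unusual_place


# Made-up history: the user logs in around 10 am from the usual location.
history = [Login(h, "HOME") for h in (9.75, 10.0, 10.25, 10.0, 10.5)]
profile = build_profile(history)

suspicious = is_anomalous(Login(5.5, "CN"), profile)
ordinary = is_anomalous(Login(10.1, "HOME"), profile)
```

With this history, the 5:30 am login from an unseen country is flagged, while a 10:06 am login from the usual location is not.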

Challenges

1. The key challenge is that the huge volume of data with a high-dimensional feature space is difficult to analyze and monitor manually. Such analysis and monitoring require highly efficient computational algorithms for data processing and pattern learning.

2. In the huge volume of network data, the same malicious data occur repeatedly, while the amount of malicious data is much smaller than the amount of normal data.

3. Much of the data is streaming data, which requires online analysis.

4. The concept of an anomaly/outlier varies among application domains, and labeled anomalies are not available for training/validation.


Workflow of an anomaly detection system

However, anomaly detection approaches have a major drawback: they may trigger high rates of false alarms, because any significant deviation from the baseline can be flagged as an intrusion.

Hackers often modify malicious code or data to make it resemble normal patterns. When such an attack occurs, the detector judges it to be part of the normal profile and misses it; that is, a false negative occurs.

The persistent problem is how to minimize both the false negative and the false positive rates.
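To make the two error rates concrete, the sketch below counts them from labeled outcomes, using the common definitions (false positive rate = false alarms over all normal events; false negative rate = missed attacks over all attacks). The function name and the sample labels are hypothetical and only illustrate the arithmetic.

```python
def error_rates(y_true, y_pred):
    """Return (false_positive_rate, false_negative_rate).

    Labels: 1 = attack, 0 = normal.
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1  # normal event flagged as an intrusion (false alarm)
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1  # attack judged to be part of the normal profile
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return fpr, fnr


# Made-up outcomes: 7 normal events and 3 attacks.
y_true = [0, 0, 0, 0, 1, 1, 0, 0, 1, 0]
y_pred = [0, 1, 0, 0, 1, 0, 0, 0, 1, 0]
fpr, fnr = error_rates(y_true, y_pred)
```

Here one of the seven normal events is a false alarm (rate 1/7) and one of the three attacks is missed (rate 1/3). Tightening the detector to lower one rate typically raises the other, which is the trade-off described above.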

Machine Learning Based Techniques

Technique: Pros/Cons

Fuzzy Logic:

- Reasoning is approximate rather than precise

- Effective, especially against port scans and probes

- High resource consumption involved

Genetic Algorithm:

- Biologically inspired and employs an evolutionary algorithm

- Uses properties like Selection, Crossover, and Mutation

- Capable of deriving classification rules and selecting optimal parameters

Neural Network:

- Ability to generalize from limited, noisy and incomplete data

- Has potential to recognize future unseen patterns

Bayesian Network:

- Encodes probabilistic relationships among the variables of interest

- Ability to incorporate both prior knowledge and data

Machine Learning Applications

1. Fusion of BVM and ELM for Anomaly Detection

2. Anomaly Detection Using Neural Network Optimized with GSA Algorithm

Fusion of BVM and ELM for Anomaly Detection

Cai et al., in their paper “Fusion of BVM and ELM for Anomaly Detection in Computer Networks”, state that a fusion or ensemble of classifiers is generally better than a single classifier. The fusion of classifiers for anomaly detection therefore not only improves accuracy but also sustains low false alarm rates with high reliability and scalability [13]. They use the extreme learning machine (ELM) and the ball vector machine (BVM) as the two kinds of single classifier.

Suitable features for representing the network traffic flow can be divided into three groups:

Content features: information about the data content of packets that could be relevant to an anomaly or intrusion.

Intrinsic features: general information related to the connection.

Traffic features: for example, statistics related to past connections similar to the current one.

Fusion Method

Step 1: Prepare the three kinds of features, which should be labeled.

Step 2: Each kind of feature is used to train a BVM and an ELM separately. The classifiers are denoted bvm(i) and elm(i), i = 1, 2, 3. Label(i), i = 1, ..., 6, is each classifier's output.

Step 3: Train a single-hidden-layer BP neural network with 6 input nodes, 30 hidden nodes, and 6 output nodes using the labeled data of BVM and ELM from Step 2 (that is, using Label(i) of bvm(i) and elm(i) as the BP neural network's input).

Step 4: Using the acquired Label(i) as the input of the neural network, train the BP neural network and obtain Train U as the output.

In the predicting process, the BP neural network receives the labels from the trained ELM and BVM classifiers and obtains Label(i) and w(i), i = 1, ..., 6. A weighted majority vote is then used to process the weight values, if
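The final voting step can be illustrated with a generic weighted majority vote over six classifier outputs. This is a minimal sketch, not the authors' exact rule: the decision condition is cut off in the slide, so the rule used here (pick the label with the largest total weight) and the sample labels and weights are assumptions made for illustration.

```python
def weighted_majority_vote(labels, weights):
    """Return the label whose supporting classifiers carry the most weight.

    Assumption: ties are resolved in favor of the label seen first.
    """
    if len(labels) != len(weights):
        raise ValueError("labels and weights must have the same length")
    totals = {}
    for label, weight in zip(labels, weights):
        totals[label] = totals.get(label, 0.0) + weight
    return max(totals, key=totals.get)


# Made-up outputs of six classifiers (three BVMs and three ELMs).
labels = ["normal", "attack", "attack", "normal", "attack", "normal"]

# Hypothetical weights standing in for w(1), ..., w(6).
trust_normal_voters = [0.9, 0.4, 0.3, 0.8, 0.2, 0.7]
trust_attack_voters = [0.1, 0.9, 0.8, 0.2, 0.7, 0.1]

decision_a = weighted_majority_vote(labels, trust_normal_voters)
decision_b = weighted_majority_vote(labels, trust_attack_voters)
```

Both cases have a 3-to-3 split in raw votes, so an unweighted majority could not decide; the weights break the deadlock in opposite directions.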

Experiments & Results

                     BVM       ELM       BVM+ELM+BP
Accuracy             97.7%     93.32%    99.06%
False alarm rate     0.28%     0.36%     0.13%

They randomly selected 20000 examples from the whole dataset to compose an experiment dataset.

The features are divided into three parts: the content features, which have 13 attributes; the intrinsic features, which have 9 attributes; and the traffic features, which have 19 attributes.

Fusion Method VS SVM

A comparison between the proposed fusion method and another fusion method that uses SVM as the single classifier, combined with a BP neural network under the same fusion scheme.

                     ELM+BVM+BP    SVM+BP
Training time        86s           102s
Accuracy             98.06%        98.02%
False alarm rate     0.13%         0.11%

Anomaly Detection Using Neural Network Optimized with GSA Algorithm

In their paper “Flow-Based Anomaly Detection Using Neural Network Optimized with GSA Algorithm” [11], the authors propose an anomaly-based network IDS (NIDS), an important tool for protecting computer networks from attacks.

Traditional packet-based NIDSs are time-intensive because they analyze all network packets. A state-of-the-art NIDS should be able to handle a high volume of traffic in real time. Flow-based intrusion detection is an effective method for high-speed networks since it inspects only packet headers. Anomaly-based intrusion detection is a well-known method capable of detecting unknown attacks. The authors therefore proposed a GSA-based flow anomaly detection system (GFADS), built on a multi-layer perceptron (MLP) neural network with one hidden layer.

They used the gravitational search algorithm (GSA) to overcome the slow convergence and the local minima caused by the back-propagation used to train MLPs. GSA is memory-less and uses the distance between agents in its updating procedure. It has an adaptive learning rate and converges faster.
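For readers unfamiliar with GSA, the following is a simplified sketch of the gravitational search idea: agents are candidate solutions, better agents get larger masses, and every agent is accelerated toward the others with a force that shrinks with distance. It is not the GFADS implementation. The function name, parameter values, and the toy sphere fitness are assumptions for illustration; in the paper's setting each agent would instead encode the MLP weights and the fitness would be the training error. For brevity all agents attract each other, whereas the standard algorithm restricts attraction to a shrinking set of the best agents.

```python
import math
import random


def gsa_minimize(fitness, dim, n_agents=20, iterations=100,
                 bounds=(-5.0, 5.0), g0=100.0, alpha=20.0, seed=0):
    """Minimize `fitness` with a simplified gravitational search."""
    rng = random.Random(seed)
    low, high = bounds
    pos = [[rng.uniform(low, high) for _ in range(dim)]
           for _ in range(n_agents)]
    vel = [[0.0] * dim for _ in range(n_agents)]
    best_x, best_f = None, math.inf
    eps = 1e-12  # avoids division by zero for coincident agents

    for t in range(iterations):
        fit = [fitness(x) for x in pos]
        # Bookkeeping only: the update itself never uses past best
        # positions, which is what "memory-less" refers to.
        for x, f in zip(pos, fit):
            if f < best_f:
                best_x, best_f = list(x), f

        # Masses: the better the fitness, the heavier the agent.
        best, worst = min(fit), max(fit)
        if worst == best:
            raw = [1.0] * n_agents
        else:
            raw = [(worst - f) / (worst - best) for f in fit]
        total = sum(raw)
        mass = [m / total for m in raw]

        # The gravitational constant decays over time.
        g = g0 * math.exp(-alpha * t / iterations)

        for i in range(n_agents):
            acc = [0.0] * dim
            for j in range(n_agents):
                if i == j:
                    continue
                dist = math.dist(pos[i], pos[j])
                coef = rng.random() * g * mass[j] / (dist + eps)
                for d in range(dim):
                    acc[d] += coef * (pos[j][d] - pos[i][d])
            for d in range(dim):
                vel[i][d] = rng.random() * vel[i][d] + acc[d]

        # Move all agents only after every velocity has been computed.
        for i in range(n_agents):
            for d in range(dim):
                pos[i][d] = min(high, max(low, pos[i][d] + vel[i][d]))

    return best_x, best_f


def sphere(x):
    """Toy fitness with its minimum at the origin."""
    return sum(v * v for v in x)


best_x, best_f = gsa_minimize(sphere, dim=3)
```

Because heavier (better) agents pull the others toward them while the gravitational constant decays, the search moves from exploration to fine-tuning without any gradient information, which is why it can be used in place of back-propagation for weight search.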

Performance

They compared GSA with five gradient descent algorithms and PSO:

1. Gradient descent with momentum and an adaptive learning rate (traingdx)

2. Gradient descent backpropagation (traingd)

3. Gradient descent with adaptive learning rate backpropagation (traingda)

4. Gradient descent with momentum backpropagation (traingdm)

5. Sequential order incremental training with learning function (trains)

6. Particle Swarm Optimization Algorithm (PSO)

Future Work

Review research on hybrid approaches, in which anomaly and misuse (signature-based) detection are combined, since each of these methods has pros and cons.

One of the most important disadvantages of anomaly detection is its high false alarm ratio, whereas misuse detection is incapable of recognizing new attacks.

Thus, if they are combined in a smart way, the proposed model could use the strengths of both methods to cover the weaknesses of each one.

References

1. Sumeet Dua and Xian Du. Data Mining and Machine Learning in Cybersecurity. Auerbach Publications, April 25, 2011.

2. Canetti, R., R. Gennaro, A. Herzberg, and D. Naor. Proactive security: Long-term protection against break-ins. CryptoBytes 3 (1997): 1–8.

3. Barak, B., A. Herzberg, D. Naor, and E. Shai. The proactive security toolkit and applications. In: Proceedings of the 6th ACM Conference on Computer and Communications Security, Singapore, 1999, pp. 18–27.

4. Verykios, V.S., E. Bertino, I.N. Fovino, L.P. Provenza, Y. Saygin, and Y. Theodoridis. State-of-the-art in privacy preserving data mining. ACM SIGMOD Record 33 (2004): 50–57.

5. Denning, D. An intrusion-detection model. IEEE Transactions on Software Engineering 13 (2) (1987): 118–131.

6. Tom M Mitchell. Machine Learning, volume 4. Burr Ridge, IL: McGraw Hill, June 1997.

7. Phil Simon. Too Big to Ignore: The Business Case for Big Data. Wiley, 2013

8. Taiwo Oladipupo Ayodele. New Advances in Machine Learning. InTech, 2010.

9. Harjinder Kaur, Gurpreet Singh, Jaspreet Minhas, “A Review of Machine Learning based Anomaly Detection Techniques”

10. Gong, F. Deciphering detection techniques: Part II. Anomaly-based intrusion detection. white paper, Mcafee Network Security Technologies Group, 2003.

11. Zahra Jadidi, Mansour Sheikhan, “Flow-Based Anomaly Detection Using Neural Network Optimized with GSA Algorithm”

12. Eskin, E., A. Arnold, M. Prerau, L. Portnoy, and S. Stolfo. A geometric framework for unsupervised anomaly detection: Detecting intrusions in unlabeled data. In: Applications of Data Mining in Computer Security, edited by S. Jajodia and D. Barbara. Dordrecht: Kluwer, 2002, Chap. 4.

13. Changning Cai, Guojian Cheng, Huaxian Pan, “Fusion of BVM and ELM for Anomaly Detection in Computer Networks”
