0-Day Detection in Malware Software Dataset · 3.1 Oracle Relational Database Management System...

Salman Bin Abdulaziz University College of Engineering and Computer science Department of Computer Science 0-Day Detection in Malware Software Dataset By Sultan Mohamed Al-Ajmi [email protected] Under Supervision of Dr. Mohammad Alhawarat 1433/2012


Salman Bin Abdulaziz University

College of Engineering and Computer science

Department of Computer Science

0-Day Detection in

Malware Software Dataset

By

Sultan Mohamed Al-Ajmi

[email protected]

Under Supervision of

Dr. Mohammad Alhawarat

1433/2012


Table of Contents

1. Introduction:

1.1 Signature-Based:

1.2 Heuristic Scanning (0-Day Detection):

2. Literature Review:

3. Experiment Setup:

3.1 Oracle Relational Database Management System (RDBMS):

3.2 Malicious Software Dataset:

3.3 Loading Dataset into the Database using OraLoader:

3.4 Oracle Data Mining:

4. Experiment Execution:

5. Results and Discussion:

6. Future Work:

7. References:


List of Tables & Figures

Tables:

Table 1: Dataset Summary

Figures:

Figure 1: Oracle RDBMS Site

Figure 2: OraLoader Login

Figure 3: OraLoader wizard (Open File Window)

Figure 4: OraLoader wizard (Table Options)

Figure 5: OraLoader wizard (Preview)

Figure 6: Oracle Data Mining Connection

Figure 7: Selecting the Dataset Table

Figure 8: Build Activity

Figure 9: Build Wizard (Model Choosing)

Figure 10: Build Wizard (Table and Primary key)

Figure 11: Build Wizard (Data Usage)

Figure 12: Build Wizard (Model Completion)

Figure 13: Apply Activity

Figure 14: Apply Activity (Model)

Figure 15: Apply Activity (Attributes Options)

Figure 16: Apply Activity (Results)


Abstract

Security is a vital aspect of computer science: it is a real problem if private files or information are lost, damaged, or shared without permission. One way to address this is to analyze the trace of a program and determine whether the software is benign or malware. We use Oracle Data Mining to analyze the trace files of a malware dataset with an anomaly detection technique. The technique takes a set of records (the training set), each containing a trace and the type of the software (benign or malware), as input and builds a model that can predict the type of new software from its trace. We applied anomaly detection to the Malicious Software Dataset from the data mining competition associated with ICONIP 2010 and obtained a model for predicting whether software is benign or malware. The model's accuracy is 81%. Signature-based techniques can add a malware signature to the antivirus database only after the malware has been studied, by which time it has already spread and infected many computers. A 0-day detection mechanism, by contrast, should detect malware immediately so that it can be reported for analysis.


1. Introduction:

Previously, strong passwords and firewalls were all that was required to secure corporate networks. Nowadays, intruders' attack methodologies have become more targeted and sophisticated, so new techniques are needed to defend against them. Intrusion detection (ID) is a type of security management for computers and networks. An intrusion detection system (IDS) monitors network traffic for suspicious activity and either alerts network administrators or responds by taking a predefined action. ID uses vulnerability assessment (scanning), a technology developed to assess the security of a computer system or network. What makes ID important is that an IDS can detect any type of intrusion or misuse that falls outside normal system operation, as opposed to signature-based systems, which can only detect attacks for which a signature has previously been created.

Antivirus software uses two strategies to detect malware: signature-based detection and heuristic scanning. Both are discussed in the following subsections.

1.1 Signature-Based:

Signature-based detection is a very old method. It works by comparing strings in a scanned object or file (usually at very specific places) against the string patterns of known malware stored in a virus signature database. The antivirus provider populates and updates the database regularly; some anti-malware products release several updates a day to keep up with the rapid creation of new malware. To add a new signature to its database, the provider needs reports of infections from its users, so there will always be some unlucky users who are among the first to be infected.
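
The comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the signature names and byte patterns are hypothetical, and real antivirus engines use far larger databases and also record where in the file each pattern is expected.

```python
from typing import Dict, List

# Hypothetical signature database: signature name -> byte pattern.
# These entries are invented for illustration only.
SIGNATURES: Dict[str, bytes] = {
    "Example.Dropper.A": b"\xde\xad\xbe\xef\x13\x37",
    "Example.Worm.B": b"EVIL_PAYLOAD_MARKER",
}


def scan_bytes(data: bytes, signatures: Dict[str, bytes] = SIGNATURES) -> List[str]:
    """Return the names of all signatures whose pattern occurs in data."""
    return [name for name, pattern in signatures.items() if pattern in data]


if __name__ == "__main__":
    infected = b"header" + b"EVIL_PAYLOAD_MARKER" + b"trailer"
    print(scan_bytes(infected))                   # one signature matches
    print(scan_bytes(b"ordinary file contents"))  # nothing matches
```

A file is flagged only if one of its byte patterns is already in the database, which is why a brand-new malware passes unnoticed until its signature has been added.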


1.2 Heuristic Scanning (0-Day Detection):

Heuristic scanning is more advanced than signature-based scanning. The antivirus software detects a threat by recognizing certain instructions or commands in the scanned object or file: it compares the instructions or commands found in the file against a set of rules describing behavior commonly used by malicious software, and it raises an alarm when a rule matches. Because it does not depend on a previously recorded signature, heuristic scanning is much better than signature-based scanning at detecting newly created malware.
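
As a rough sketch of rule matching, the Python below scores a sequence of API calls against a small rule set and raises an alarm when the score reaches a threshold. The rules, weights, and threshold are hypothetical and chosen only for illustration; the API names are ordinary Windows calls, but no real antivirus product is being reproduced here.

```python
from typing import Iterable, List, Sequence, Tuple

# Hypothetical rules: (description, API calls that must appear in order, weight).
RULES: List[Tuple[str, Tuple[str, ...], int]] = [
    ("writes into another process", ("OpenProcess", "WriteProcessMemory"), 3),
    ("starts a remote thread", ("CreateRemoteThread",), 3),
    ("sets a registry value", ("RegSetValueEx",), 1),
]


def contains_in_order(trace: Iterable[str], pattern: Sequence[str]) -> bool:
    """True if the calls in pattern occur in trace in that order (gaps allowed)."""
    remaining = iter(trace)
    # "call in remaining" advances the iterator, so order is enforced.
    return all(call in remaining for call in pattern)


def heuristic_score(trace: Sequence[str]) -> Tuple[int, List[str]]:
    """Return the total weight of matched rules and their descriptions."""
    matched = [(desc, w) for desc, pat, w in RULES if contains_in_order(trace, pat)]
    return sum(w for _, w in matched), [desc for desc, _ in matched]


def is_suspicious(trace: Sequence[str], threshold: int = 4) -> bool:
    """Raise an alarm when the matched rules weigh at least threshold."""
    return heuristic_score(trace)[0] >= threshold


if __name__ == "__main__":
    trace = ["OpenProcess", "VirtualAllocEx", "WriteProcessMemory", "CreateRemoteThread"]
    print(heuristic_score(trace), is_suspicious(trace))
```

No signature of the specific program is needed: any file whose behavior matches enough rules is flagged, including one that has never been seen before.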

2. Literature Review:

Recent studies are concerned with the performance of IDSs, real-time IDSs, and the improvement of IDS detection.

A decision-tree-based lightweight intrusion detection system using a wrapper approach is discussed in [3]. The objective of that paper is to construct a lightweight intrusion detection system (IDS) aimed at detecting anomalies in networks. Its goals are (i) removing redundant instances so that the learning algorithm is not biased, (ii) identifying a suitable subset of features by employing a wrapper-based feature selection algorithm, and (iii) realizing the proposed IDS with a neurotree to achieve better detection accuracy.

An efficient intrusion detection system based on support vector machines and the gradually feature removal method is presented in [4]. In that work, 19 critical features are chosen to represent the various network visits. By combining a clustering method, an ant colony algorithm, and a support vector machine (SVM), the authors develop an efficient and reliable classifier that judges whether a network visit is normal. The accuracy reaches 98.6249% in 10-fold cross-validation, and the average Matthews correlation coefficient (MCC) reaches 0.861161.
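
For readers unfamiliar with the MCC, it is computed from the four confusion-matrix counts as (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)) and ranges from -1 to 1. The short Python function below is a minimal sketch of that formula; the example counts are hypothetical and are not taken from [4].

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical counts for an imbalanced test set (10 attacks, 90 normal).
    print(matthews_corrcoef(tp=5, tn=90, fp=0, fn=5))
```

Unlike plain accuracy, the MCC takes all four counts into account, which is why it is often reported alongside accuracy when the classes are imbalanced.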

A differentiated one-class classification method with applications to intrusion detection is discussed in [2]. The paper proposes a new one-class classification method with differentiated anomalies to enhance intrusion detection performance for harmful attacks. The authors also propose new extracted features for host-based intrusion detection based on three viewpoints of system activity: dimension, structure, and contents. Experiments with a simulated dataset and the DARPA 1998 BSM dataset show that their differentiated intrusion detection method performs better than existing techniques at detecting specific types of attacks.

In [6] the authors propose a real-time intrusion detection approach using

a supervised machine learning technique. Their approach is simple and

efficient, and can be used with many machine learning techniques. They

applied different well-known machine learning techniques to evaluate the

performance of their IDS approach. Their experimental results show that

the Decision Tree technique can outperform the other techniques.

Therefore, they have further developed a real-time intrusion detection

system (RT-IDS) using the Decision Tree technique to classify on-line

network data as normal or attack data.

A Gaussian-distributed wireless sensor network (WSN) cannot effectively detect an intruder that starts from the network boundary [5]. In view of this, the paper introduces a novel k-Gaussian deployment strategy that leverages the advantages of both uniform and Gaussian random sensor deployment for efficient and effective intrusion detection. The key idea is to employ multiple deployment points in the area of interest; a subset of the sensors is deployed around each deployment point following a Gaussian distribution, and together they form a k-Gaussian distributed WSN.
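
The deployment idea can be illustrated with a small Python sketch that scatters an equal number of sensors around each of k deployment points. The function name, the equal split of sensors, and the single shared standard deviation are assumptions made for this illustration; [5] should be consulted for the actual deployment model.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def k_gaussian_deploy(
    deployment_points: Sequence[Point],
    sensors_per_point: int,
    sigma: float,
    seed: Optional[int] = None,
) -> List[Point]:
    """Place sensors_per_point sensors around each deployment point.

    Each coordinate is drawn from a Gaussian centred on the deployment
    point with standard deviation sigma (assumed equal for all points).
    """
    rng = random.Random(seed)
    sensors: List[Point] = []
    for cx, cy in deployment_points:
        for _ in range(sensors_per_point):
            sensors.append((rng.gauss(cx, sigma), rng.gauss(cy, sigma)))
    return sensors


if __name__ == "__main__":
    points = [(0.0, 0.0), (100.0, 0.0), (50.0, 80.0)]  # k = 3, hypothetical
    sensors = k_gaussian_deploy(points, sensors_per_point=20, sigma=10.0, seed=1)
    print(len(sensors), sensors[0])
```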

3. Experiment Setup:

3.1 Oracle Relational Database Management System (RDBMS):

Oracle RDBMS is one of the most widely used and best-known database management systems. It can handle a huge number of records easily and efficiently. To use Oracle Database, one of the recent versions has to be installed; I chose Oracle 10g. The installation steps are as follows.

First, paste this URL into your browser's address bar:

http://www.oracle.com/technetwork/database/10204-winx64-vista-win2k8-082253.html

Figure 1: Oracle RDBMS Site


After the download is complete, extract the file and carry out the following steps:

1- Enter the name of the database and the password for your "sys" account, and then click Next.

2- If this window reports an error, your PC is not compatible with Oracle RDBMS. If it does not, click Next.

3- Check the summary information and click Next.

4- Oracle RDBMS is now being installed; wait until it finishes and click Finish.

After the installation is complete, go to Start, choose Run, and launch SQL Plus. Enter sys as the username, followed by its password. Your Oracle DBMS is now ready for the dataset to be imported.

3.2 Malicious Software Dataset:

Title                      Malicious Software Dataset
Source                     International Conference on Neural Information Processing
Features (Attributes)      Trace: sequence of API calls
Class Attribute            Type of the software: 0 for benign, 1 for malware
No. of Instances           252
Missing Attribute Values   No

Table 1: Dataset Summary

This dataset is composed of a selection of Windows API/system-call trace files, intended for testing classifiers that deal with sequence patterns. It was downloaded from the website of the CS Mining Group [9].
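
To make "sequence patterns" concrete, the following Python sketch turns one trace into counts of consecutive API-call pairs (bigrams), a common way of feeding call sequences to a classifier. The record layout assumed here (a 0/1 label, a comma, then space-separated API names) is a hypothetical format chosen for illustration and may differ from the actual files.

```python
from collections import Counter
from typing import List, Tuple


def parse_record(line: str) -> Tuple[int, List[str]]:
    """Split a record of the assumed form '<label>,<call> <call> ...'."""
    label, _, calls = line.strip().partition(",")
    return int(label), calls.split()


def api_ngrams(trace: List[str], n: int = 2) -> Counter:
    """Count every run of n consecutive API calls in the trace."""
    return Counter(tuple(trace[i:i + n]) for i in range(len(trace) - n + 1))


if __name__ == "__main__":
    # Hypothetical record: label 1 (malware) followed by its API calls.
    label, trace = parse_record("1,OpenProcess WriteProcessMemory OpenProcess WriteProcessMemory")
    print(label, api_ngrams(trace))
```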


3.3 Loading Dataset into the Database using OraLoader:

One of the tools provided with Oracle DBMS is SQL*Loader, which loads a table or data from Excel, CSV, or text files into your database. OraLoader provides an easy interface on top of SQL*Loader. The steps are shown in the following screenshots:

Figure 2: OraLoader Login

Figure 3: OraLoader wizard (Open File Window)


Figure 4: OraLoader wizard (Table Options)

Figure 5: OraLoader wizard (Preview)

Click the Close button; the dataset is now loaded into the database.
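
For readers without OraLoader, the same load step can be sketched in Python. This is a stand-in, not the procedure used in this study: it uses the standard-library sqlite3 module instead of Oracle, and the table name, column names, and CSV layout (label, then space-separated trace) are assumptions made for illustration.

```python
import csv
import io
import sqlite3


def load_dataset(conn: sqlite3.Connection, csv_text: str) -> int:
    """Create the (hypothetical) table and insert one row per CSV record."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS malware_dataset ("
        " id INTEGER PRIMARY KEY,"   # surrogate key, one per program
        " label INTEGER NOT NULL,"   # 0 for benign, 1 for malware
        " trace TEXT NOT NULL)"      # space-separated API calls
    )
    rows = [(int(r[0]), r[1]) for r in csv.reader(io.StringIO(csv_text)) if r]
    conn.executemany("INSERT INTO malware_dataset (label, trace) VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    sample = "0,LoadLibraryA GetProcAddress\n1,OpenProcess WriteProcessMemory\n"
    conn = sqlite3.connect(":memory:")
    print(load_dataset(conn, sample))
    print(conn.execute("SELECT id, label FROM malware_dataset").fetchall())
```

The id column is included because the build wizard shown later asks for a table and a primary key.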


3.4 Oracle Data Mining:

Oracle Data Mining (ODM) embeds data mining within the Oracle

database. ODM algorithms operate natively on relational tables or views,

thus eliminating the need to extract and transfer data into standalone

tools or specialized analytic servers. Download the software from the Oracle

website and run the executable file:

Figure 6: Oracle Data Mining Connection

4. Experiment Execution:

With the requirements from the previous section in place, the following figures show the steps for applying anomaly detection to the dataset:

Figure 7: Selecting the Dataset Table


Figure 8: Build Activity

Figure 9: Build Wizard (Model Choosing)


Figure 10: Build Wizard (Table and Primary key)

Figure 11: Build Wizard (Data Usage)


After all activities have completed, click Result, as shown in the following figure:

Figure 12: Build Wizard (Model Completion)

Next, apply the model to the dataset to see how accurate it is, as shown in the following figures:

Figure 13: Apply Activity


Figure 14: Apply Activity (Model)

Figure 15: Apply Activity (Attributes Options)


The result of applying the model to the dataset is shown in Figure 16:

Figure 16: Apply Activity (Results)
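
The build-then-apply workflow shown in the figures can be imitated outside ODM with a deliberately simple detector. The Python sketch below is not ODM's algorithm; it is a swapped-in, much cruder technique used only to show the two phases: the build step records which pairs of consecutive API calls occur in benign traces, and the apply step flags a trace as anomalous when too large a share of its pairs was never seen. The class name and the 0.5 threshold are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Bigram = Tuple[str, str]


def bigrams(trace: List[str]) -> Set[Bigram]:
    """Set of consecutive API-call pairs in a trace."""
    return set(zip(trace, trace[1:]))


class UnseenBigramDetector:
    """Toy anomaly detector: build on benign traces, apply to any trace."""

    def __init__(self, threshold: float = 0.5) -> None:
        self.threshold = threshold
        self.known: Set[Bigram] = set()

    def fit(self, benign_traces: Iterable[List[str]]) -> "UnseenBigramDetector":
        """Build step: remember every call pair seen in benign software."""
        for trace in benign_traces:
            self.known |= bigrams(trace)
        return self

    def score(self, trace: List[str]) -> float:
        """Fraction of the trace's call pairs that were never seen in training."""
        grams = bigrams(trace)
        if not grams:
            return 0.0
        return len(grams - self.known) / len(grams)

    def predict(self, trace: List[str]) -> int:
        """Apply step: 1 (anomalous, possible malware) or 0 (normal)."""
        return 1 if self.score(trace) > self.threshold else 0


if __name__ == "__main__":
    detector = UnseenBigramDetector().fit([["A", "B", "C"], ["A", "C", "B"]])
    print(detector.predict(["A", "B", "C"]), detector.predict(["X", "Y", "Z"]))
```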


5. Results and Discussion:

After creating the model, we applied it to the dataset. The model is able to detect malware programs without waiting for the antivirus vendor to update its signature database, so malware can be detected instantly. The signature-based method relies on user reports for its updates, so there are always at least some victims of a new malware, whereas an intrusion detection system might detect it within milliseconds.

The accuracy of the model created in this study is 81%; that is, it predicted the correct type for 81% of the programs it was applied to. If that rate carried over to previously unseen malware, the model could flag roughly 81% of 0-day malware before any signature exists and reduce the number of victims accordingly.
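
Accuracy on its own does not fix the share of malware that is caught. The Python sketch below computes accuracy and detection rate from confusion-matrix counts for two hypothetical outcomes. All counts are invented for illustration: the report gives only the 252 instances and the 81% accuracy, and the 126/126 class split is an assumption.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Share of all programs whose type was predicted correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def detection_rate(tp: int, fn: int) -> float:
    """Share of malware programs that were flagged as malware."""
    return tp / (tp + fn)


if __name__ == "__main__":
    # Two hypothetical outcomes on 252 programs (assumed 126 malware, 126 benign).
    outcomes = {
        "balanced errors": dict(tp=102, tn=102, fp=24, fn=24),
        "many missed malware": dict(tp=80, tn=124, fp=2, fn=46),
    }
    for name, c in outcomes.items():
        print(name, round(accuracy(**c), 2), round(detection_rate(c["tp"], c["fn"]), 2))
```

Both hypothetical outcomes have the same accuracy but different detection rates, so reporting the confusion matrix alongside the accuracy would make the 0-day claim easier to assess.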

6. Future Work:

I would like to design, or participate in designing, an IDS or network IDS that generates a report for each detected malware, so that 0-day malware is detected effectively. The next improvement to such an IDS is the ability to learn and provide better detection over time; this requires an algorithm that keeps learning as it detects malware.


7. References:

[1] Pang-Ning Tan, Michael Steinbach, and Vipin Kumar. 2006. Introduction to Data Mining. Pearson Addison-Wesley.

[2] Inho Kang, Myong K. Jeong, and Dongjoon Kong. 2012. A differentiated one-class classification method with applications to intrusion detection. Expert Syst. Appl. 39, 4 (March 2012), 3899-3905.

[3] Siva S. Sivatha Sindhu, S. Geetha, and A. Kannan. 2012. Decision tree based light weight intrusion detection using a wrapper approach. Expert Syst. Appl. 39, 1 (January 2012), 129-141.

[4] Yinhui Li, Jingbo Xia, Silan Zhang, Jiakai Yan, Xiaochuan Ai, and Kuobin Dai. 2012. An efficient intrusion detection system based on support vector machines and gradually feature removal method. Expert Syst. Appl. 39, 1 (January 2012), 424-430.

[5] Yun Wang and Zhengdong Lun. 2011. Intrusion detection in a K-Gaussian distributed wireless sensor network. J. Parallel Distrib. Comput. 71, 12 (December 2011), 1598-1607.

[6] Phurivit Sangkatsanee, Naruemon Wattanapongsakorn, and Chalermpol Charnsripinyo. 2011. Practical real-time intrusion detection using machine learning approaches. Comput. Commun. 34, 18 (December 2011), 2227-2235.

[7] Oracle Documentation Library, http://tahiti.oracle.com/pls/db102/homepage [Access Date 5/2/2012]

[8] Association for Computing Machinery, http://www.acm.org/ [Access Date 4/2/2012]

[9] CS Mining Group, http://csmining.org/index.php/ [Access Date 5/2/2012]