e-Informatica Software Engineering Journal, Volume 8, Issue 1, 2014, pages: 65–78, DOI 10.5277/e-Inf140105

Malicious JavaScript Detection by Features Extraction

Gerardo Canfora∗, Francesco Mercaldo∗, Corrado Aaron Visaggio∗

∗Department of Engineering, University of Sannio
[email protected], [email protected], [email protected]

Abstract

In recent years, JavaScript-based attacks have become one of the most common and successful types of attack. Existing techniques for detecting malicious JavaScripts could fail for different reasons. Some techniques are tailored to specific kinds of attacks, and are ineffective against others. Some other techniques require costly computational resources to be implemented. Other techniques could be circumvented with evasion methods. This paper proposes a method for detecting malicious JavaScript code based on five features that capture different characteristics of a script: execution time, external referenced domains and calls to JavaScript functions. Mixing different types of features could result in a more effective detection technique, and overcome the limitations of existing tools created for identifying malicious JavaScript. The experimentation carried out suggests that a combination of these features is able to successfully detect malicious JavaScript code (in the best cases we obtained a precision of 0.979 and a recall of 0.978).

1. Introduction

JavaScript [1] is a scripting language usually embedded in web pages with the aim of creating interactive HTML pages. When a browser downloads a page, it parses, compiles, and executes the script. As with other mobile code schemes, malicious JavaScript programs can take advantage of the fact that they are executed in a foreign environment that contains private and valuable information. As an example, a U.K. researcher developed a technique based on JavaScript timing attacks for stealing information from the victim machine and from the sites the victim visits during the attack [2]. JavaScript code is used by attackers for exploiting vulnerabilities in the user’s browser, browser’s plugins, or for tricking the victim into clicking on a link hosted by a malicious host. One of the most widespread attacks accomplished with malicious JavaScript is drive-by-download [3, 4], consisting of downloading (and running) malware on the victim’s machine. Another example of JavaScript-based attack is represented by scripts that abuse system resources, such as opening windows that never close or creating a large number of pop-up windows [5].

JavaScript can be exploited for accomplishing web based attacks also with emerging web technologies and standards. As an example, this is happening with Web Workers [6], a technology recently introduced in HTML 5. A Web Worker is a JavaScript process that can perform computational tasks, and send and receive messages to the main process or to other workers. A Web Worker differs from a worker thread in Java or Python in a fundamental aspect of the design: there is no sharing of state. Web Workers were designed to execute portions of JavaScript code asynchronously, without affecting the performance of the web page. The operations performed by Web Workers are therefore transparent from the point of view of the user, who remains unaware of what is happening in the background.


The literature offers many techniques to detect malicious JavaScripts, but all of them show some limitations. Some existing detection solutions leverage previous knowledge about malware, so they could be very effective against well-known attacks, but they are ineffective against zero-day attacks [7]. Another limitation of many detectors of malicious JavaScript code is that they are designed for recognizing specific kinds of attack; thus, to circumvent them, attackers usually mix up different attack types [7]. This paper proposes a method to detect malicious JavaScript that consists of extracting five features from the web page under analysis (WUA in the remainder of the paper), and using them for building a classifier. The main contribution of this method is that the proposed features are independent of the technology used and the attack implemented, so it should be robust against zero-day attacks and against JavaScripts which combine different types of attacks.

1.1. Assumptions and Research Questions

The features have been defined on the basis of three assumptions. The first assumption is that a malicious website could require more resources than a trusted one. This could be due to the need to iterate several attempts of an attack until at least one succeeds, to executing botnet functions, or to examining and scanning machine resources. Based on this assumption, two features have been identified. The first feature (avgExecTime) computes the average execution time of a JavaScript function. As discussed in [8, 9], malware is expected to be more resource-consuming than a trusted application. The second feature (maxExecTime) computes the maximum execution time of a JavaScript function.

The second assumption is that a malicious web page generally calls a limited number of JavaScript functions to perform an attack. This could have different justifications, e.g. malicious code could perform the same type of attack over and over again with the aim of maximizing the probability of success: this may mean that a reduced number of functions is called many times. Conversely, a benign JavaScript usually exploits more functions to implement the business logic of a web application [10]. One feature has been defined on this assumption (funcCalls), which counts the number of function calls done by each JavaScript.

The third assumption is that a JavaScript function can make use of malicious URLs for many purposes, e.g. performing drive-by download attacks or sending data stolen from the victim’s machine. The fourth feature (totalUrl) counts the total number of URLs in a JavaScript function, while the fifth feature (extUrl) computes the percentage of URLs outside the domain of the WUA.

We build a classifier by using these five features in order to distinguish malicious web applications from trusted ones; six classification algorithms are run to build and evaluate the classifier.

The paper poses two research questions:
– RQ1: can the five features be used for discriminating malicious from trusted web pages?
– RQ2: does a combination of the features exist that is more effective than a single feature to distinguish malicious web pages from trusted ones?

The paper proceeds as follows: the next section discusses related work; the following section illustrates the proposed method; the fourth section discusses the evaluation; and, finally, conclusions are drawn in the last section.

2. Related Work

A number of approaches have been proposed in the literature to detect malicious web pages. Traditional anti-virus tools use static signatures to match patterns that are commonly found in malicious scripts [11]. As a countermeasure, complex obfuscation techniques have been devised in order to hide malicious code from detectors that scan the code for extracting the signature. Blacklisting of malicious URLs and IPs [7] requires that the user trusts the blacklist provider and entails high costs for managing the database, especially for guaranteeing the dependability of the information provided. Malicious websites, in fact, frequently change their IP addresses, especially when they are blacklisted.

Other approaches have been proposed for observing, analysing, and detecting JavaScript attacks in the wild, for example using high-interaction honeypots [12–14] and low-interaction honeypots [15–17]. High-interaction honey-clients assess the system integrity by searching for changes to registry entries and network connections, alterations of the file system, and suspect usage of physical resources. This category of honey-clients is effective, but entails high computational costs: they have to load and run the web application in order to analyse it, and nowadays websites contain a large number of heavy components. Furthermore, high-interaction honey-clients are ineffective with time-based attacks, and most honey-clients’ IPs are blacklisted in the deep web, or they can be identified by an attacker employing CAPTCHAs [7].

Low-interaction honey-clients automatically reproduce the interaction of a human user with the website, within a sandbox. These tools compare the execution trace of the WUA with a sample of signatures, which makes this technique fail against zero-day attacks.

Different systems have been proposed for off-line analysis of JavaScript code [3, 18–20]. While all these approaches are successful with regard to malicious code detection, they suffer from a serious weakness: they require a significant time to perform the analysis, which makes them inadequate for protecting users at run-time. Dewald [21] proposes an approach based on a sandbox to analyse JavaScript code by merging different approaches: static analysis of source code, searching for forbidden IFrames, and dynamic analysis of the JavaScript code’s behaviour.

Concurrently with these offline approaches, several authors focused on the detection of specific attack types, such as heap-spraying attacks [22, 23] and drive-by downloads [24]. These approaches search for symptoms of certain attacks, for example the presence of shell-code in JavaScript strings. Of course, the main limitation is that such approaches cannot be used for all threats.

Recent work has combined JavaScript analysis with machine learning techniques for deriving automatic defences. Most notable are the learning-based detection systems Cujo [25], Zozzle [26], and IceShield [27]. They classify malware by using different features, respectively: q-grams from the execution of JavaScript, context attributes obtained from the AST, and some characteristics of the DOM tree. Revolver [28] aims at finding high similarity between the WUA and a sample of known signatures. The authors extract and compare the AST structures of the two JavaScripts. Blanc et al. [29] make use of AST fingerprints for characterizing obfuscating transformations found in malicious JavaScripts. The main limitation of this technique is the high false negative rate due to quasi-similar subtrees.

Clone detection is a direction explored by some researchers [2, 30], which consists in finding similarity between the WUA and known JavaScript fragments. This technique can be effective in many cases but not in all, because some attacks can be completely original.

Wang et al. [31] propose a method for blocking JavaScript extensions by intercepting Cross-Platform Component Object Model calls. The method is based on the recognition of patterns of malicious calls; misclassification could occur with this technique, so innocent JavaScript extensions could be signalled as malicious. Barua et al. [32] also faced the problem of protecting browsers from JavaScript injections of malicious code, by transforming the original and legitimate code with a key. In this way, the injected code is not recognized after the deciphering process and is thus detected. This method is applicable only to code injection attacks. Sayed et al. [33] deal with the problem of detecting sensitive information leakage performed by malicious JavaScript. Their approach relies on a dynamic taint analysis of the web page which identifies those parts of the information flow that could be indicators of a data theft. This method does not apply to those attacks which do not entail sensitive data exfiltration. Schutt et al. [34] propose a method for early identification of threats within JavaScripts at runtime, by building a classifier which uses the events produced by the code as features. A relevant weakness of the method is represented by evasion techniques, described by the authors in the paper, which are able to decrease the performance of the classification. Tripp et al. [35] substitute concrete values with some specific properties of the document object. This allows for a preliminary analysis of threats within the JavaScript. The method does not seem to solve the problem of code injection. Xu and colleagues [36] propose a method which captures some essential characteristics of obfuscated malicious code, based on the analysis of function invocation. The method demonstrated to be effective, but the main limitation is its purpose: it just detects obfuscated (malicious) JavaScripts, but does not recognize other kinds of threats.

Cova et al. [3] make use of a set of features to identify malicious JavaScript, including the number and target of redirections, the browser personality and history-based differences, the ratio of string definitions to string uses, the number of dynamic code executions and the length of dynamically evaluated code. They propose an approach based on an anomaly detection system; our approach is similar, but differs in that it uses classification.

Wang et al. [37] combine static analysis and program execution to obtain a call graph using the abstract syntax tree. This could be very effective against attacks that reproduce other attacks (a practice that is very common among inexperienced attackers, also known as “script kiddies”), but it is ineffective against zero-day attacks.

Yue et al. [38] focus on two types of insecure practices: insecure JavaScript inclusion and insecure JavaScript dynamic generation. Their work is a measurement study focusing on the counting of URLs, as well as on the counting of the eval() and document.write() functions.

Techniques known as language-based sandboxing [33, 39–42] aim at isolating the untrusted JavaScript content from the original webpage. BrowserShield [43], FBJS from Facebook [44], Caja from Google [45], and ADsafe, which is widely used by Yahoo [39], are examples of this technique. It is very effective when coping with widget and mashup webpages, but it fails if the web page contains embedded malicious code. A relevant limitation of this technique is that third-party developers are forced to use the Software Development Kits delivered by the sandboxes’ producers.

Ismail et al. [46] developed a method which detects XSS attacks with a proxy that analyses the HTTP traffic exchanged between the client (web browser) and the web application. This approach has two main limitations. Firstly, it only detects reflected XSS, also known as non-persistent XSS, where the attack is performed through a single request and response. Secondly, the proxy is a possible performance bottleneck, as it has to analyse all the requests and responses transmitted between the client and the server. In [47], Kirda et al. propose a web proxy that analyses dynamically generated links in web pages and compares those links with a set of filtering rules to decide whether they are trusted or not. The authors leverage a set of heuristics to generate filtering rules and then leave the user to allow or disallow suspicious links. A drawback of this approach is that involving users might negatively affect their browsing experience.

3. The Proposed Method

Our method extracts three classes of features from a web application: JavaScript execution time, calls to JavaScript functions, and URLs referred to by the WUA’s JavaScript.

To gather the required information we use:
1. dynamic analysis, for collecting information about the execution time of the JavaScript code within the WUA and the called functions;
2. static analysis, for identifying all the URLs referred to in the WUA, within and outside the scope of the JavaScript.

The first feature computes the average execution time required by a JavaScript function:

$$\mathit{avgExecTime} = \frac{1}{n}\sum_{i=1}^{n} t_i$$

where $t_i$ is the execution time of the i-th JavaScript function, and $n$ is the number of JavaScript functions in the WUA.

The second feature computes the maximum execution time over all JavaScript functions:

$$\mathit{maxExecTime} = \max_{i}(t_i)$$

where $t_i$ is the execution time of the i-th JavaScript function in the WUA.

The third feature computes the number of function calls made by the JavaScript code:

$$\mathit{funcCalls} = \sum_{i=1}^{n} c_i$$

where $n$ is the number of JavaScript functions in the WUA, and $c_i$ is the number of calls for the i-th function.

The fourth feature computes the total number of URL references retrieved in a web page:

$$\mathit{totalUrl} = \sum_{i=1}^{m} u_i$$

where $u_i$ is the number of times the i-th URL is called by a JavaScript function, and $m$ is the number of different URLs referenced within the WUA.

The fifth feature computes the percentage of external URLs referenced within the JavaScript:

$$\mathit{extUrl} = \frac{\sum_{k=1}^{j} u_k}{\sum_{i=1}^{m} u_i} \cdot 100$$

where $u_k$ is the number of times the k-th external URL is called by a JavaScript function, for $j$ different external URLs referenced within the JavaScript, and $u_i$ is the number of times the i-th URL is called by a JavaScript function, for $m$ total URLs referenced within the JavaScript.
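To make the definitions concrete, the following Python sketch shows how the five features could be computed from a per-function profiling log and from the URL reference counts gathered by the static analysis. It is only a minimal illustration under assumed data structures: the FunctionProfile record and the url_counts mapping are hypothetical, not the authors’ actual implementation.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class FunctionProfile:
    name: str         # JavaScript function name, as reported by the profiler
    exec_time: float  # cumulative execution time of the function (e.g. in ms)
    calls: int        # number of times the function was called

def extract_features(profiles: List[FunctionProfile],
                     url_counts: dict, wua_domain: str) -> dict:
    """Compute the five features of the model for one WUA.

    `profiles` is the per-function profiling log (dynamic analysis);
    `url_counts` maps each URL referenced by the scripts to the number of
    times it is used (static analysis); `wua_domain` is the WUA's domain.
    """
    times = [p.exec_time for p in profiles] or [0.0]
    total_refs = sum(url_counts.values())
    ext_refs = sum(c for url, c in url_counts.items()
                   if wua_domain not in url)            # crude external check
    return {
        "avgExecTime": sum(times) / len(times),         # mean execution time
        "maxExecTime": max(times),                      # max execution time
        "funcCalls":   sum(p.calls for p in profiles),  # total function calls
        "totalUrl":    total_refs,                      # all URL references
        "extUrl":      100.0 * ext_refs / total_refs if total_refs else 0.0,
    }
```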

We used these features for building several classifiers. Specifically, six different algorithms were run for the classification, by using the Weka suite [48]: J48, LADTree, NBTree, RandomForest, RandomTree and RepTree.

3.1. Implementation

The features extracted from the WUA by dynamic analysis were:
– execution time;
– calls to JavaScript functions;
– number of function calls made by the JavaScript code.
The features extracted from the WUA by static analysis were:
– number of URLs retrieved in the WUA;
– URLs referenced within the WUA.
The dynamic features were captured with the Chrome developer tools [49], a publicly available tool for profiling web applications. Each WUA was opened with a Chrome browser for a fixed time of 50 seconds, and the Chrome developer tool performed a default exploration of the WUA, mimicking user interaction and collecting the data with the Mouse and Keyboard Recorder tool [50], a software able to record all mouse and keyboard actions and then repeat all the actions accurately.

The static analysis aimed at capturing all the URLs referenced in the JavaScript files included in the WUA. URLs were recognized through regular expressions: when a URL was found, it was compared with the domain of the WUA; if the URL’s domain was different from the WUA’s domain, it was tagged as an external URL.
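A possible implementation of this step is sketched below in Python; the regular expression and the domain comparison are illustrative assumptions, since the paper does not report the exact pattern used.

```python
import re
from urllib.parse import urlparse

# Simple pattern for absolute http/https URLs; the actual expression
# used by the authors is not reported in the paper.
URL_RE = re.compile(r"https?://[^\s'\"<>)]+")

def classify_urls(js_source: str, wua_domain: str):
    """Return (total URL references, external URL references) for one script."""
    urls = URL_RE.findall(js_source)
    external = [u for u in urls
                if urlparse(u).hostname
                and not urlparse(u).hostname.endswith(wua_domain)]
    return len(urls), len(external)

# Example: a script referencing one internal and one external resource.
total, ext = classify_urls(
    "load('http://example.com/a.js'); track('http://ads.other.net/p.gif');",
    "example.com")
print(total, ext)  # -> 2 1
```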

We have created a script to automate the data extraction process. The script takes as input a list of URLs to analyse and performs the following steps:
– step 1: start the Chrome browser;
– step 2: open the Chrome Dev Tools on the Profiles panel;
– step 3: start the profiling tool;
– step 4: pass the URL under analysis to the browser and wait for the time required to collect the profiling data;
– step 5: stop profiling;
– step 6: save the profiling data to the file system;
– step 7: close Chrome;
– step 8: start the Java program that parses the saved profile;
– step 9: extract the set of dynamic features;
– step 10: save the source code of the WUA;
– step 11: extract the set of static features;
– step 12: save the values of the extracted features into a database.
The dynamic features of the WUA are extracted from the log obtained with the profiling.
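The overall orchestration could resemble the following Python sketch. The browser binary name, the 50-second window and the parser invocation (profile-parser.jar) are assumptions for illustration only; the sketch also omits the DevTools profiling session and the Mouse and Keyboard Recorder interaction, which the authors drive through GUI automation.

```python
import subprocess
import time

CHROME = "google-chrome"  # assumed browser binary name
PROFILE_TIME = 50         # seconds, as reported in the paper

def analyse(urls):
    for url in urls:
        # steps 1-4: open the WUA in Chrome and let it run for a fixed time
        browser = subprocess.Popen([CHROME, url])
        time.sleep(PROFILE_TIME)
        # steps 5-7: stop the browser (profiling data assumed saved beforehand)
        browser.terminate()
        browser.wait()
        # steps 8-12: parse the saved profile and extract/store the features
        # (hypothetical parser command; the paper uses a custom Java program)
        subprocess.run(["java", "-jar", "profile-parser.jar", url], check=False)
```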

4. Experimentation

The aim of the experimentation is to evaluate the effectiveness of the proposed features, expressed through the research questions RQ1 and RQ2.

The experimental sample included a set of 5000 websites classified as “malicious”, while the control sample included a set of 5000 websites classified as “trusted”.

The trusted sample includes URLs belonging to a number of categories, in order to make the results of the experimentation independent of the type of website: Audio-video, Banking, Cooking, E-commerce, Education, Gardening, Government, Medical, Search Engines, News, Newspapers, Shopping, Sport News, Weather.

As done by other authors [33], the trusted URLs were retrieved from the “Alexa” repository [51], which is an index of the most visited websites. For the analysis, the top ranked websites for each category were selected, which were mostly official websites of well-known organizations. In order to have a stronger guarantee that the websites were not phishing websites and did not contain threats, we submitted the URLs to a web-based engine, VirusTotal [52], which checks the reliability of the websites by using anti-malware software and by searching for the website URLs and IPs in different blacklists of well-known antivirus companies.

The “malicious” sample was built from the hpHosts repository [53], which provides a classification of websites containing threats, sorted by the type of malicious attack they perform. Similarly to the trusted sample, websites belonging to different threat types were chosen, in order to make the results of the analysis independent of the type of threat. We retrieved URLs from various categories: sites engaged in malware distribution, in selling fraudulent applications, in the use of misleading marketing tactics and browser hijacking, and sites engaged in the exploitation of browser and OS vulnerabilities. For each URL belonging to the two samples, we extracted the five features defined in section 3.

Two kinds of analysis were performed on the data: hypothesis testing and classification. The test of hypothesis was aimed at understanding whether the two samples show a statistically significant difference for the five features. The features that yield the most relevant differences between the two samples were then used for the classification.

We tested the following null hypothesis:

H0: malware and trusted websites have similar values of the proposed features.

H0 states that, given the i-th feature $f_i$, if $f_{iT}$ denotes the value of the feature $f_i$ measured on a trusted website, and $f_{iM}$ denotes the value of the same feature measured on a malicious website, then:

$$\sigma(f_{iT}) = \sigma(f_{iM}) \quad \text{for } i = 1, \ldots, 5$$

where $\sigma(f_i)$ denotes the mean of the (control or experimental) sample for the feature $f_i$.

The null hypothesis was tested with the Mann-Whitney test (with the p-level fixed at 0.05) and with the Kolmogorov-Smirnov test (with the p-level fixed at 0.05). Two different tests of hypothesis were performed in order to have a stronger internal validity, since the purpose is to establish that the two samples (trusted and malicious websites) do not belong to the same distribution.
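These two tests can be reproduced, for instance, with SciPy; the sketch below assumes the feature values of the two samples are available as numeric arrays (the variable names are hypothetical).

```python
from scipy.stats import mannwhitneyu, ks_2samp

def test_feature(trusted_values, malicious_values, alpha=0.05):
    """Run both tests on one feature and report whether H0 can be rejected."""
    _, p_mw = mannwhitneyu(trusted_values, malicious_values,
                           alternative="two-sided")
    _, p_ks = ks_2samp(trusted_values, malicious_values)
    return {
        "mann_whitney_p": p_mw,
        "kolmogorov_smirnov_p": p_ks,
        "reject_H0": p_mw < alpha and p_ks < alpha,
    }
```

Both p-values must fall below the 0.05 level for H0 to be rejected for a feature, mirroring the procedure described above.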

The classification analysis was aimed at assessing whether the features were able to correctly classify malicious and trusted WUAs. Six classification algorithms were used: J48, LadTree, NBTree, RandomForest, RandomTree, and RepTree. Similarly to hypothesis testing, different classification algorithms were used to strengthen the internal validity.

These algorithms were first applied to each of the five features and then to groups of features. As a matter of fact, in many cases a classification is more effective if based on groups of features rather than a single feature.


4.1. Analysis of Data

Figure 1 illustrates the boxplots of each feature. The features avgExecTime, maxExecTime and funcCalls exhibit a greater gap between the distributions of the two samples.

The features totalUrl and extUrl do not exhibit an evident difference between the trusted and malicious samples. We recall here that totalUrl counts the total number of URLs in the JavaScript, while extUrl is the percentage of URLs outside the WUA domain contained in the script. A possible reason why these two features are similar for both samples is that trusted websites may include external URLs due to external banners or to external legitimate functions and components that the JavaScript needs for execution (images, flash animations, functions of other websites that the author of the WUA needs to recall). Using external resources in a malicious JavaScript is not so uncommon: examples are drive-by download and session hijacking. External resources can be used when the attacker injects a malicious web page into a benign website and needs to lead the website user to click on a malicious link (which cannot be part of the benign website being injected).

We expect that extending this analysis to the complete WUA (not limited to the JavaScript code) could produce different results: this goal will be addressed in future work.

On the contrary, the features avgExecTime, maxExecTime and funcCalls seem to be more effective in distinguishing malicious from trusted websites, which supports our assumptions.

Malware requires more execution time than trusted script code for many reasons (avgExecTime, maxExecTime). Malware may require more computational time for performing many attempts of the attack until it succeeds. Examples may be: complete memory scanning, alteration of parameters, and resource occupation.

Some kinds of malware aim at obtaining control of the victim machine; once the victim is infected, the command centre could occupy the victim’s computational resources for sending and executing remote commands. Furthermore, some other kinds of malware could require time because they activate secondary tasks like downloading and running additional malware, as in the case of drive-by-download.

Figure 1. Boxplots of features

The feature funcCalls suggests that trusted websites have a larger number of called functions, or function calls. Our hypothesis for explaining this finding is that trusted websites need to call many functions for executing the business logic of the website, like data field controls, third-party functions such as digital payment, elaboration of user inputs, and so on. On the contrary, malicious websites have the sole goal of performing the attack, so they are poor in business functions, with the only exception of the payload to execute. Instead, they perform their attack at regular intervals; for this reason malicious WUAs show a higher value of avgExecTime and maxExecTime with respect to the trusted ones.

In order to optimize the client-server interaction, a trusted website could have many functions, but usually with a low computational time, in order to avoid impacting the website usability. This allows, for example, performing controls such as data input validation on the client side and sending only valid data to the server.

The hypothesis test produced evidence that the features have different distributions in the control and experimental samples, as shown in Table 1.

Summing up, the null hypothesis can be rejected for the features avgExecTime, maxExecTime, funcCalls, totalUrl and extUrl.

With regard to classification, the training set T consisted of a set of labelled web applications (WUA, l), where the label l ∈ {trusted, malicious}. For each WUA we built a feature vector F ∈ R^y, where y is the number of features used in the training phase (1 ≤ y ≤ 5). To answer RQ1 we performed five different classifications, each with a single feature (y = 1), while for RQ2 we performed three classifications with 2 ≤ y ≤ 5.

We used k-fold cross-validation: the dataset was randomly partitioned into k subsets of data. A single subset was retained as the validation data for testing the model, while the remaining k − 1 subsets were used as training data. We repeated the process k times, so that each of the k subsets was used once as validation data. To obtain a single estimate we computed the average of the k results from the folds.
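This evaluation protocol can be approximated in Python with scikit-learn, as in the sketch below; the original experiments used the Weka implementations of the six algorithms, so the random forest here merely stands in as an example classifier, and X, y denote the feature matrix and the trusted/malicious labels built as described above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import precision_score, recall_score, roc_auc_score

def cross_validate(X, y, n_splits=10, seed=42):
    """10-fold cross-validation reporting precision, recall and ROC area.

    X: NumPy array of feature vectors (one row per WUA);
    y: NumPy array of labels, 1 = malicious (positive class), 0 = trusted.
    """
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    precision, recall, auc = [], [], []
    for train_idx, test_idx in skf.split(X, y):
        clf = RandomForestClassifier(random_state=seed)
        clf.fit(X[train_idx], y[train_idx])
        pred = clf.predict(X[test_idx])
        score = clf.predict_proba(X[test_idx])[:, 1]
        precision.append(precision_score(y[test_idx], pred))
        recall.append(recall_score(y[test_idx], pred))
        auc.append(roc_auc_score(y[test_idx], score))
    # single estimate: average of the k fold results
    return np.mean(precision), np.mean(recall), np.mean(auc)
```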

Specifically, we performed a 10-fold cross-validation. Results are shown in Table 2. The rows represent the features, while the columns report the values of the three metrics used to evaluate the classification results (precision, recall and ROC area) for the recognition of the malware and trusted samples. The recall has been computed as the proportion of examples that were assigned to class X among all examples that truly belong to the class, i.e. how much of the class was captured. The recall is defined as:

$$\mathit{Recall} = \frac{tp}{tp + fn}$$

where $tp$ indicates the number of true positives and $fn$ is the number of false negatives.

The precision has been computed as the proportion of the examples that truly belong to class X among all those which were assigned to the class, i.e.:

$$\mathit{Precision} = \frac{tp}{tp + fp}$$

where $fp$ indicates the number of false positives. The ROC area is the area under the ROC curve (AUC); it is defined as the probability that a randomly chosen positive instance is ranked above a randomly chosen negative one.

The classification analysis with the single features suggests several considerations.

With regard to the recall:
– generally the classification of malicious websites is more precise than the classification of trusted websites.

Table 1. Results of the test of the null hypothesis H0

Variable       Mann-Whitney   Kolmogorov-Smirnov
avgExecTime    0.000000       p < .001
maxExecTime    0.000000       p < .001
funcCalls      0.000000       p < .001
totalUrl       0.000000       p < .001
extUrl         0.002233       p < .001


Table 2. Precision, Recall and RocArea obtained by classifying the Malicious and Trusted datasets, using the single features of the model, with the algorithms J48, LADTree, NBTree, RandomForest, RandomTree and RepTree

Feature       Algorithm      Precision          Recall             RocArea
                             Malware  Trusted   Malware  Trusted   Malware  Trusted
avgExecTime   J48            0.872    0.688     0.597    0.898     0.741    0.741
              LADTree        0.836    0.691     0.606    0.881     0.789    0.789
              NBTree         0.872    0.686     0.59     0.895     0.771    0.771
              RandomForest   0.744    0.758     0.759    0.744     0.762    0.762
              RandomTree     0.773    0.762     0.762    0.763     0.766    0.766
              RepTree        0.971    0.735     0.704    0.819     0.725    0.725
maxExecTime   J48            0.657    0.635     0.606    0.684     0.675    0.675
              LADTree        0.638    0.663     0.69     0.606     0.691    0.691
              NBTree         0.672    0.634     0.587    0.713     0.654    0.654
              RandomForest   0.683    0.703     0.718    0.667     0.775    0.775
              RandomTree     0.678    0.708     0.731    0.653     0.782    0.782
              RepTree        0.663    0.686     0.706    0.641     0.724    0.724
funcCalls     J48            0.928    0.677     0.582    0.876     0.722    0.722
              LADTree        0.816    0.678     0.587    0.868     0.75     0.75
              NBTree         0.824    0.677     0.582    0.896     0.727    0.727
              RandomForest   0.784    0.719     0.629    0.772     0.672    0.672
              RandomTree     0.782    0.683     0.646    0.765     0.696    0.696
              RepTree        0.763    0.675     0.686    0.788     0.787    0.787
totalUrl      J48            0.615    0.552     0.381    0.762     0.603    0.603
              LADTree        0.566    0.555     0.511    0.609     0.6      0.6
              NBTree         0.607    0.533     0.284    0.717     0.565    0.565
              RandomForest   0.624    0.653     0.689    0.585     0.691    0.691
              RandomTree     0.619    0.655     0.7      0.57      0.691    0.691
              RepTree        0.617    0.609     0.595    0.631     0.66     0.66
extUrl        J48            0.514    0.7       0.993    0.061     0.527    0.527
              LADTree        0.51     0.704     0.972    0.066     0.512    0.512
              NBTree         0.514    0.716     0.992    0.062     0.527    0.527
              RandomForest   0.513    0.513     0.992    0.061     0.532    0.532
              RandomTree     0.514    0.787     0.993    0.061     0.527    0.527
              RepTree        0.514    0.597     0.593    0.561     0.527    0.527

This could be due to the fact that some trusted websites have values for the features comparable with the ones measured for malicious ones. This is evident by looking at the boxplots (Figure 1), which show an area of overlap between the boxplots of the trusted and malicious websites. The problem is that some trusted websites could have values comparable with the malware while others do not. As a matter of fact, some trusted WUAs can contain more business functions than others, require more client machine resources, and so on. This depends on the specific business goals of each trusted website and, consequently, on the type and number of functions that must be implemented to support those business goals. Except for funcCalls, the trusted websites’ sample includes a greater number of outliers than the malware sample, which is the main cause of the misclassification of trusted websites and supports our explanation.

– the feature extUrl is the best in terms of recall regarding the malicious websites; in fact, its value is 0.993 using the classification algorithms J48 and RandomTree. This feature is able to reduce the false negatives in malicious detection because external URLs are commonly used by malicious websites, for the reasons previously discussed.


– regarding the recall for the recognition of the trusted websites, the best feature is avgExecTime (recall is 0.898 with the J48 classification algorithm). This confirms the conjecture that malicious scripts tend to be more resource demanding than trusted ones.

With regard to the precision:
– the features avgExecTime and funcCalls are the best for the detection of malicious JavaScript, with values of 0.971 (with the algorithm RepTree) and 0.928 (with the algorithm J48), respectively. This strengthens the conjecture that trusted websites make use of less computational time and a larger number of functions than malicious websites.

– the precision in the classification of sites categorized as trusted reaches a maximum value of 0.787 (classification of the feature extUrl with the algorithm RandomTree). This value is largely unsatisfactory, and it will be improved by using combinations of features, as discussed later in this section.

With regard to the ROC area:
– the performances of all the algorithms are roughly the same for malware and trusted applications;
– the feature avgExecTime presents the maximum ROC-area value, equal to 0.789 with the LADTree algorithm. The reasons have been discussed previously, even if this cannot be considered a good value.

In order to make the classification more effective, we ran the classification algorithms by using groups of features. The first group includes the features avgExecTime and funcCalls, while the second includes avgExecTime, funcCalls, and extUrl. Finally, the last group is made up of all five extracted features. The groups were formed on the basis of the classification results of the individual features, in order to improve both the precision and the recall of the classification.

avgExecTime and funcCalls were the best in class, so we grouped them together. In particular, these features were grouped together in order to obtain the maximum precision value for detecting malicious web applications.

avgExecTime, funcCalls, and extUrl were grouped together in order to obtain the maximum precision value in the detection of trusted applications. We excluded maxExecTime and totalUrl from the second phase of classification, because they produced the worst results in the first phase.

The classification of the groups of features (shown in Table 3) confirms our expectations. The first set of features, avgExecTime and funcCalls, presents the maximum precision regarding malicious websites, corresponding to 0.982 with the classification algorithm J48, while in the detection of the trusted websites the precision is 0.841 with the classification algorithm RepTree. Compared to the individual features we therefore have an improvement; in fact, avgExecTime had a precision of 0.971 while funcCalls showed a precision of 0.928 in the recognition of malware websites. The recall for malicious websites is 0.873 with the classification algorithm J48, while for trusted sites it is 0.897 with the classification algorithm NBTree. With respect to the recognition of malicious websites we have registered an improvement, as with individual features the obtained values were respectively 0.762 (avgExecTime) and 0.686 (funcCalls). With respect to the trusted websites, the situation is pretty similar, as the values for the single features were 0.898 (avgExecTime) and 0.896 (funcCalls), i.e. slightly greater.

The second group (avgExecTime, funcCalls, extUrl) is very close to the first group (two values are slightly higher and two are slightly lower), but overall precision and recall are higher for the second group.

We can conclude that the best classification is based on:
– avgExecTime and funcCalls, i.e. the average execution time (avgExecTime) and the cumulative number of function calls done by each portion of JavaScript code (funcCalls);
– avgExecTime, funcCalls and extUrl, i.e. the set of features of the first group classified along with the percentage of external domain URLs that do not belong to the web application’s domain (extUrl).


Table 3. Precision, Recall and RocArea obtained by classifying the Malicious and Trusted datasets, using the three groups of features of the model, with the algorithms J48, LADTree, NBTree, RandomForest, RandomTree and RepTree

Features        Algorithm      Precision          Recall             RocArea
                               Malware  Trusted   Malware  Trusted   Malware  Trusted
avgExecTime,    J48            0.982    0.823     0.873    0.88      0.888    0.888
funcCalls       LADTree        0.87     0.801     0.784    0.873     0.872    0.857
                NBTree         0.848    0.686     0.59     0.897     0.779    0.779
                RandomForest   0.84     0.825     0.815    0.797     0.985    0.985
                RandomTree     0.824    0.818     0.779    0.768     0.977    0.977
                RepTree        0.871    0.841     0.824    0.856     0.913    0.913
avgExecTime,    J48            0.873    0.842     0.835    0.879     0.885    0.885
funcCalls,      LADTree        0.86     0.801     0.784    0.873     0.857    0.857
extUrl          NBTree         0.848    0.69      0.599    0.893     0.789    0.789
                RandomForest   0.969    0.978     0.978    0.969     0.985    0.985
                RandomTree     0.97     0.979     0.98     0.97      0.979    0.979
                RepTree        0.867    0.852     0.849    0.87      0.918    0.918
avgExecTime,    J48            0.875    0.878     0.879    0.874     0.922    0.922
maxExecTime,    LADTree        0.858    0.804     0.788    0.87      0.858    0.858
funcCalls,      NBTree         0.847    0.72      0.657    0.881     0.827    0.827
totalUrl,       RandomForest   0.979    0.984     0.985    0.979     0.992    0.992
extUrl          RandomTree     0.982    0.978     0.978    0.982     0.98     0.98
                RepTree        0.877    0.871     0.87     0.879     0.927    0.927

Although the proposed features prove to be effective in detecting malicious JavaScript, misclassification nevertheless occurs. The explanation may be the following: each feature represents an indicator of the possibility that the JavaScript is malicious, rather than offering certainty. The fact that on average a malicious JavaScript requires a longer execution time (as shown by the boxplots) does not mean that all benign JavaScripts require a short execution time (as the outliers in the boxplots show). Many payloads contained in malicious JavaScript entail a long execution time, but some business logic of benign JavaScripts may also require a long time to execute; for instance, a benign JavaScript may handle a multimedia file. The same explanation applies to the presence of misclassification for all the other features. Benign files could have a smaller fragmentation because they have a simpler business logic or because of the style of the programmer who wrote the code.

Finally, the number of external URLs may be high in a benign website for several reasons: the benign website makes use of many resources or services hosted on other websites, or it has many advertisement links in its pages.

5. Conclusion and Future Work

In this paper we propose a method for detecting malicious websites that uses a classification based on five features.

Current detection techniques usually fail against zero-day attacks and websites that merge several attack techniques. The proposed method should overcome these limitations, since it is independent of the implementation of the attack and the type of the attack.

The selected features, combining static and dynamic analysis, compute the average and maximum execution time of a JavaScript function, the number of functions invoked by the JavaScript code, and finally the number of URLs contained in the JavaScript code and the percentage of those URLs that are outside the domain of the WUA.

The analysis of the data collected by analysing a sample of 5000 trusted and 5000 untrusted websites demonstrated that considering groups of features for classification, rather than single features, produces better performance. As a matter of fact, the group (avgExecTime, funcCalls) and the group (avgExecTime, funcCalls, extUrl) produce high values of precision and recall, both for the recognition of malicious websites and for trusted websites. Regarding the second group (avgExecTime, funcCalls, extUrl), the precision is 0.979 for malware websites and 0.969 for trusted ones, while the recall is 0.978 for malware websites and 0.969 for trusted ones.

In summary, the two groups of features seem effective for detecting current malicious JavaScripts.

Possible evasion techniques that attackers could adopt against this detection method are the following.

Concerning avgExecTime and funcCalls, the attacker should reduce the script execution time and increase the fragmentation of the code. The first workaround is very difficult to implement, because a large amount of time is often a necessary condition of the attacks performed. Increasing the fragmentation of the code is possible, but as the attacker would need to produce a number of functions similar to that of a typical trusted website, the required effort could make the development of the malicious website very expensive, and this could be discouraging. As shown in the boxplots (Figure 1), the gap to fill is rather large. Our opinion is that extUrl is the weakest feature and so the easiest to evade, but it must be considered that in the group of features (avgExecTime, funcCalls, extUrl) the strength of the other ones may compensate for its weakness.

Obfuscation is an evasion technique that could be effective especially with regard to the funcCalls, totalUrl and extUrl features; future work will address this problem by studying: i) the impact of obfuscated JavaScript on the classification performance of our method; and ii) de-obfuscation methods to precisely calculate these features. Many benign websites may make use of external libraries which are highly time-consuming. In future work we will investigate the possibility of recognizing these time-consuming external libraries in order to exclude them from the computation of the feature. Additionally, we plan to strengthen the reliability of our findings by extending the experimentation to a larger sample, in order to improve the external validity. Another improvement of our method consists of extending the search for URLs to the complete WUA, instead of limiting it to the JavaScript scope.

References

[1] D. Flanagan, JavaScript: The Definitive Guide, 4th ed. O’Reilly Media, 2001. [Online]. http://shop.oreilly.com/product/9780596000486.do
[2] “JavaScript and timing attacks used to steal browser data,” Blackhat 2013, last visit 19th June 2014. [Online]. http://threatpost.com/JavaScript-and-timing-attacks-used-to-steal-browser-data/101559
[3] M. Cova, C. Kruegel, and G. Vigna, “Detection and analysis of drive-by-download attacks and malicious JavaScript code,” in Proc. of the International World Wide Web Conference (WWW), 2010, pp. 281–290.
[4] C. Eilers, HTML5 Security. Developer Press, 2013.
[5] O. Hallaraker and G. Vigna, “Detecting malicious JavaScript code in Mozilla,” in Proceedings of the 10th IEEE International Conference on Engineering of Complex Computer Systems, 2005, pp. 85–94.
[6] “Web Workers, W3C candidate recommendation,” 2012, last visit 19th June 2014. [Online]. http://www.w3.org/TR/workers/
[7] B. Eshete, “Effective analysis, characterization, and detection of malicious web pages,” in Proceedings of the 22nd International Conference on World Wide Web Companion. International World Wide Web Conferences Steering Committee, 2013, pp. 355–360.
[8] L. Martignoni, R. Paleari, and D. Bruschi, “A framework for behavior-based malware analysis in the cloud,” in Proceedings of the 5th International Conference on Information Systems Security, 2009, pp. 178–192.
[9] M. F. Zolkipli and A. Jantan, “An approach for malware behavior identification and classification,” in Proceedings of the International Conference on Computer Research and Development, 2011.
[10] C. Ardito, P. Buono, D. Caivano, M. Costabile, and R. Lanzilotti, “Investigating and promoting UX practice in industry: An experimental study,” International Journal of Human-Computer Studies, Vol. 72, No. 6, 2014, pp. 542–551.
[11] “ClamAV. Clam antivirus,” last visit 19th June 2014. [Online]. http://clamav.net
[12] N. Provos, P. Mavrommatis, M. A. Rajab, and F. Monrose, “All your iFRAMEs point to us,” in Proc. of USENIX Security Symposium, 2008.
[13] C. Seifert and R. Steenson, “Capture honeypot client (Capture-HPC),” Victoria University of Wellington, NZ, 2006. [Online]. https://projects.honeynet.org/capture-hpc
[14] Y. M. Wang, D. Beck, X. Jiang, R. Roussev, C. Verbowski, S. Chen, and S. T. King, “Automated web patrol with Strider HoneyMonkeys: Finding web sites that exploit browser vulnerabilities,” in Proc. of the Network and Distributed System Security Symposium (NDSS), 2006.
[15] A. Büscher, M. Meier, and R. Benzmüller, “Throwing a MonkeyWrench into web attackers plans,” in Proc. of Communications and Multimedia Security (CMS), 2010, pp. 28–39.
[16] A. Ikinci, T. Holz, and F. Freiling, “Monkey-Spider: Detecting malicious websites with low-interaction honeyclients,” in Proc. of the Conference “Sicherheit, Schutz und Zuverlässigkeit” (SICHERHEIT), 2008, pp. 891–898.
[17] J. Nazario, “A virtual client honeypot,” in Proc. of the USENIX Workshop on Large-Scale Exploits and Emergent Threats (LEET), 2009.
[18] D. Canali, M. Cova, G. Vigna, and C. Kruegel, “Prophiler: A fast filter for the large-scale detection of malicious web pages,” in Proc. of the International World Wide Web Conference (WWW), 2011, pp. 197–206.
[19] S. Karanth, S. Laxman, P. Naldurg, R. Venkatesan, J. Lambert, and J. Shin, “ZDVUE: Prioritization of JavaScript attacks to discover new vulnerabilities,” in Proceedings of the Fourth ACM Workshop on Artificial Intelligence and Security (AISec 2011), 2011, pp. 637–652.
[20] C. Kolbitsch, B. Livshits, B. Zorn, and C. Seifert, “Rozzle: De-cloaking internet malware,” Microsoft Research, Tech. Rep. MSR-TR-2011-94, 2011. [Online]. http://research.microsoft.com/pubs/152601/rozzle-tr-10-25-2011.pdf
[21] A. Dewald, T. Holz, and F. Freiling, “ADSandbox: Sandboxing JavaScript to fight malicious websites,” in Proceedings of the 2010 ACM Symposium on Applied Computing (SAC ’10), 2010, pp. 1859–1864.
[22] M. Egele, P. Wurzinger, C. Kruegel, and E. Kirda, “Defending browsers against drive-by downloads: Mitigating heap-spraying code injection attacks,” in Detection of Intrusions and Malware & Vulnerability Assessment (DIMVA), 2009, pp. 88–106.
[23] P. Ratanaworabhan, B. Livshits, and B. Zorn, “Nozzle: A defense against heap-spraying code injection attacks,” in Proc. of USENIX Security Symposium, 2009.
[24] L. Lu, V. Yegneswaran, P. A. Porras, and W. Lee, “Blade: An attack-agnostic approach for preventing drive-by malware infections,” in Proc. of the Conference on Computer and Communications Security (CCS), 2010, pp. 440–450.
[25] K. Rieck, T. Krueger, and A. Dewald, “Cujo: Efficient detection and prevention of drive-by-download attacks,” in 26th Annual Computer Security Applications Conference (ACSAC), 2010, pp. 31–39.
[26] C. Curtsinger, B. Livshits, B. Zorn, and C. Seifert, “Zozzle: Fast and precise in-browser JavaScript malware detection,” in Proc. of USENIX Security Symposium, 2010, pp. 3–3.
[27] M. Heiderich, T. Frosch, and T. Holz, “IceShield: Detection and mitigation of malicious websites with a frozen DOM,” in Proceedings of Recent Advances in Intrusion Detection (RAID), 2011, pp. 281–300.
[28] A. Kapravelos, Y. Shoshitaishvili, M. Cova, C. Kruegel, and G. Vigna, “Revolver: An automated approach to the detection of evasive web-based malware,” in Proceedings of the 22nd USENIX Conference on Security, 2013, pp. 637–652.
[29] G. Blanc, D. Miyamoto, M. Akiyama, and Y. Kadobayashi, “Characterizing obfuscated JavaScript using abstract syntax trees: Experimenting with malicious scripts,” in Proceedings of the International Conference on Advanced Information Networking and Applications Workshops, 2012.
[30] C. K. Roy and J. R. Cordy, “A survey on software clone detection research,” School of Computing, Queen’s University at Kingston, Ontario, TR 2007-541, 2007.
[31] P. Wang, L. Wang, J. Xiang, P. Liu, N. Gao, and J. Jing, “MJBlocker: A lightweight and run-time malicious JavaScript extensions blocker,” in Proceedings of the International Conference on Software Security and Reliability, 2013.
[32] A. Barua, M. Zulkernine, and K. Weldemariam, “Protecting web browser extensions from JavaScript injection attacks,” in Proceedings of the International Conference on Engineering of Complex Computer Systems, 2013.
[33] B. Sayed, I. Traore, and A. Abdelhalim, “Detection and mitigation of malicious JavaScript using information flow control,” in Proceedings of the Twelfth Annual Conference on Privacy, Security and Trust (PST), 2014.
[34] K. Schutt, M. Kloft, A. Bikadorov, and K. Rieck, “Early detection of malicious behaviour in JavaScript code,” in Proceedings of AISec 2012, 2012.
[35] O. Tripp, P. Ferrara, and M. Pistoia, “Hybrid security analysis of web JavaScript code via dynamic partial evaluation,” in Proceedings of the International Symposium on Software Testing and Analysis, 2014.
[36] W. Xu, F. Zhang, and S. Zhu, “JStill: Mostly static detection of obfuscated malicious JavaScript code,” in Proceedings of the International Conference on Data and Application Security and Privacy, 2013.
[37] Q. Wang, J. Zhou, Y. Chen, Y. Zhang, and J. Zhao, “Extracting URLs from JavaScript via program analysis,” in Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering, 2013, pp. 627–630.
[38] C. Yue and H. Wang, “Characterizing insecure JavaScript practices on the web,” in Proceedings of the 18th International Conference on World Wide Web, 2009, pp. 961–970.
[39] J. Politz, S. Eliopoulos, A. Guha, and S. Krishnamurthi, “ADsafety: Type-based verification of JavaScript sandboxing,” in Proceedings of the 20th USENIX Conference on Security, 2011.
[40] A. Guha, C. Saftoiu, and S. Krishnamurthi, “The essence of JavaScript,” in ECOOP 2010 – Object-Oriented Programming, 2011, pp. 1–25.
[41] M. Finifter, J. Weinberger, and A. Barth, “Preventing capability leaks in secure JavaScript subsets,” in Proceedings of the Network and Distributed System Security Symposium, 2010.
[42] A. Taly, U. Erlingsson, J. Mitchell, M. Miller, and J. Nagra, “Automated analysis of security-critical JavaScript APIs,” in 2011 IEEE Symposium on Security and Privacy, 2011, pp. 363–379.
[43] C. Reis, J. Dunagan, H. J. Wang, O. Dubrovsky, and S. Esmeir, “BrowserShield: Vulnerability-driven filtering of dynamic HTML,” ACM Transactions on the Web, Vol. 1, No. 3, 2007.
[44] “Facebook SDK for JavaScript,” last visit 13th October 2014. [Online]. https://developers.facebook.com/docs/javascript
[45] “Google Caja,” last visit 13th October 2014. [Online]. https://developers.google.com/caja/
[46] O. Ismail, M. Etoh, Y. Kadobayashi, and S. Yamaguchi, “A proposal and implementation of automatic detection/collection system for cross-site scripting vulnerability,” in Proceedings of the 18th International Conference on Advanced Information Networking and Applications, Vol. 2, 2014.
[47] E. Kirda, C. Kruegel, G. Vigna, and N. Jovanovic, “Noxes: A client-side solution for mitigating cross-site scripting attacks,” in Proceedings of the 2006 ACM Symposium on Applied Computing, 2006, pp. 330–337.
[48] “Weka 3: Data mining software in Java,” last visit 19th June 2014. [Online]. http://www.cs.waikato.ac.nz/ml/weka/
[49] “Chrome DevTools overview,” last visit 19th June 2014. [Online]. https://developers.google.com/chrome-developer-tools/
[50] “Robot Soft – Mouse and Keyboard Recorder,” last visit 13th October 2014. [Online]. http://www.robot-soft.com/
[51] “Alexa – Actionable analytics for the web,” last visit 19th June 2014. [Online]. http://www.alexa.com/
[52] “VirusTotal,” last visit 19th June 2014. [Online]. https://www.virustotal.com/
[53] “hpHosts online,” last visit 19th June 2014. [Online]. http://www.hosts-file.net/