Identifying Mobile Repackaged Applications through Formal ... · and Corrado Aaron Visaggio 2 1...

10
Identifying Mobile Repackaged Applications through Formal Methods Fabio Martinelli 1 , Francesco Mercaldo 1 , Vittoria Nardone 2 , Antonella Santone 2 and Corrado Aaron Visaggio 2 1 Institute for Informatics and Telematics, National Research Council of Italy (CNR), Pisa, Italy 2 Department of Engineering, University of Sannio, Benevento, Italy Keywords: Security, Malware, Model Checking, Android, Testing. Abstract: Smartphones and tablets are rapidly become indispensable in every day activities. Android has become the most popular operating system for mobile environments in the world. These devices, owing to the open nature of Android, are continuously exposed to attacks, mostly to data exfiltration and monetary fraud. There are many techniques to embed the bad code, i.e. the instructions able to perform a malicious behaviour, into a legitimate application: the most diffused one is the so-called repackaged, that consists of reverse engineer the application in order to embed the malicious code and then (re)distribute them in the official and/or third party markets. In this paper we propose a technique to localize malicious payload of GinMaster family, one of the most representative repackaged trojan in Android environment. We obtain encouraging results, achieving an accuracy equal to 0.9. 1 INTRODUCTION In 2015, the volume of mobile malware continued to grow. From 2004 to 2013 security experts of Se- curList detected nearly 200,000 samples of malicious mobile code. In 2014 there were 295,539 new pro- grams, while the number was 884,774 in 2015. Each malware sample has several installation packages: in 2015, they detected 2,961,727 malicious mobile in- stallation packages (SecureList, 2015). Signature-based malware detection, which is the most common technique adopted by commercial an- timalware for mobile, is often ineffective. Moreover it is costly, as the process for obtaining and classifying a malware signature is laborious and time-consuming. In order to mitigate the malware trend in Febru- ary 2011, Google introduced Bouncer (GoogleMo- bile, 2014) to screen submitted apps for detecting malicious behaviors, but this has not eliminated the problem, as it is discussed in (Oberheide and Miller, 2012). The Fraunhofer Research Institution for Applied and Integrated Security has performed an evaluation of antivirus for Android (Fedler et al., 2014): the con- clusion is there are many techniques for evading the detection of most antivirus and for installing mali- cious payload. The most employed installation technique is the so-called repackaging (Zhou and Jiang, 2012): the attacker decompiles a trusted application to get the source code, then adds the malicious payload and re- compiles the application with the payload to make it available on various market alternatives, and some- times also on the official market. The user is often encouraged to download such malicious applications because they are free versions of trusted applications sold on the official market. Scientific community in last years has proposed a lot of static (Canfora et al., 2016; Liang and Du, 2014; Yerima et al., 2013; Arp et al., 2014; Spreitzenbarth et al., 2013) and dynamic (Isohara et al., 2011; Tchak- ount and Dayang, 2013; Reina et al., 2013) techniques basically based on machine learning methods in order to solve the problem: the main limitation in this case is due to the false positive ratio. Indeed existing solutions for protecting privacy and security on smartphones are still ineffective in many facets (Marforio et al., 2011), and many ana- lysts warn that the malware families and their vari- ants for Android are rapidly increasing. This scenario calls for new security models and tools to limit the spreading of malware for smartphones. For these reasons, in this paper we evaluate the effectiveness of a model-checking approach to iden- tify Android malware. We evaluate our method us- ing GinMaster malware, one of the most widespread Martinelli, F., Mercaldo, F., Nardone, V., Santone, A. and Visaggio, C. Identifying Mobile Repackaged Applications through Formal Methods. DOI: 10.5220/0006287906730682 In Proceedings of the 3rd International Conference on Information Systems Security and Privacy (ICISSP 2017), pages 673-682 ISBN: 978-989-758-209-7 Copyright c 2017 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved 673

Transcript of Identifying Mobile Repackaged Applications through Formal ... · and Corrado Aaron Visaggio 2 1...

Identifying Mobile Repackaged Applications through Formal Methods

Fabio Martinelli1, Francesco Mercaldo1, Vittoria Nardone2, Antonella Santone2

and Corrado Aaron Visaggio2

1Institute for Informatics and Telematics, National Research Council of Italy (CNR), Pisa, Italy2Department of Engineering, University of Sannio, Benevento, Italy

Keywords: Security, Malware, Model Checking, Android, Testing.

Abstract: Smartphones and tablets are rapidly become indispensable in every day activities. Android has become themost popular operating system for mobile environments in the world. These devices, owing to the open natureof Android, are continuously exposed to attacks, mostly to data exfiltration and monetary fraud. There aremany techniques to embed the bad code, i.e. the instructions able to perform a malicious behaviour, into alegitimate application: the most diffused one is the so-called repackaged, that consists of reverse engineer theapplication in order to embed the malicious code and then (re)distribute them in the official and/or third partymarkets. In this paper we propose a technique to localize malicious payload of GinMaster family, one of themost representative repackaged trojan in Android environment. We obtain encouraging results, achieving anaccuracy equal to 0.9.

1 INTRODUCTION

In 2015, the volume of mobile malware continuedto grow. From 2004 to 2013 security experts of Se-curList detected nearly 200,000 samples of maliciousmobile code. In 2014 there were 295,539 new pro-grams, while the number was 884,774 in 2015. Eachmalware sample has several installation packages: in2015, they detected 2,961,727 malicious mobile in-stallation packages (SecureList, 2015).

Signature-based malware detection, which is themost common technique adopted by commercial an-timalware for mobile, is often ineffective. Moreover itis costly, as the process for obtaining and classifying amalware signature is laborious and time-consuming.

In order to mitigate the malware trend in Febru-ary 2011, Google introduced Bouncer (GoogleMo-bile, 2014) to screen submitted apps for detectingmalicious behaviors, but this has not eliminated theproblem, as it is discussed in (Oberheide and Miller,2012).

The Fraunhofer Research Institution for Appliedand Integrated Security has performed an evaluationof antivirus for Android (Fedler et al., 2014): the con-clusion is there are many techniques for evading thedetection of most antivirus and for installing mali-cious payload.

The most employed installation technique is the

so-called repackaging (Zhou and Jiang, 2012): theattacker decompiles a trusted application to get thesource code, then adds the malicious payload and re-compiles the application with the payload to make itavailable on various market alternatives, and some-times also on the official market. The user is oftenencouraged to download such malicious applicationsbecause they are free versions of trusted applicationssold on the official market.

Scientific community in last years has proposed alot of static (Canfora et al., 2016; Liang and Du, 2014;Yerima et al., 2013; Arp et al., 2014; Spreitzenbarthet al., 2013) and dynamic (Isohara et al., 2011; Tchak-ount and Dayang, 2013; Reina et al., 2013) techniquesbasically based on machine learning methods in orderto solve the problem: the main limitation in this caseis due to the false positive ratio.

Indeed existing solutions for protecting privacyand security on smartphones are still ineffective inmany facets (Marforio et al., 2011), and many ana-lysts warn that the malware families and their vari-ants for Android are rapidly increasing. This scenariocalls for new security models and tools to limit thespreading of malware for smartphones.

For these reasons, in this paper we evaluate theeffectiveness of a model-checking approach to iden-tify Android malware. We evaluate our method us-ing GinMaster malware, one of the most widespread

Martinelli, F., Mercaldo, F., Nardone, V., Santone, A. and Visaggio, C.Identifying Mobile Repackaged Applications through Formal Methods.DOI: 10.5220/0006287906730682In Proceedings of the 3rd International Conference on Information Systems Security and Privacy (ICISSP 2017), pages 673-682ISBN: 978-989-758-209-7Copyright c© 2017 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved

673

family in mobile malware landscape, with over 6,000known variants belonging to this family.

The approach can be easily applied to otherrepackaged families: as reported in (citazione), theso-called repackaged malware contains the maliciouspayload at installation time, this is the reason whythe payload is embedded into the application and thelogic rules, able to verify the existance of the mali-cious payload, can be verified without run the sample.

The reason why we explore the effectiveness ofour approach on GinMaster family is represented bythe fact that this is one of the most widespread trojanmalware family in mobile environment. The payloadbelonging to this family is embedded into legitimateapplications using repackaging technique. GinMasterhas gone through three significant generations sinceit was first found by researchers from North CarolinaState University on 17 August 2011.

GinMaster is distributed in third-party app mar-kets in China. The attackers injected GinMastercode into thousands of legitimate game, ringtone andpicture applications. These applications have morechance to lure mobile users into installing the mali-cious payload.

The trojan contains a malicious service able toroot specific devices in order to escalate privileges.It also has the ability to modify and delete contents inthe SD card of device, steal confidential informationand send it to a remote website, execute command-and-control services from the remote website, as wellas download and install applications regardless of userinteraction.

The approach can be easily applied to otherrepackaged families: as reported in (Zhou and Jiang,2012), the so-called repackaged malware contains themalicious payload at installation time, this is the rea-son why the payload is embedded into the applica-tion and the logic rules, able to verify the existence ofthe malicious payload, can be verified without run thesample.

The salient characteristics of our methodologyare:

• the use of formal methods;

• the inspection of Java Bytecode and not on thesource code;

• the use of static analysis;

• the capture of malicious behaviours at a finergranularity.

In practice, from the Java Bytecode applicationfiles we generate CCS processes, which are succes-sively used for checking properties expressing themost common behaviours exhibit by GinMaster fam-ily samples.

Performing automatic analysis on the Bytecodeand not directly on the source code has several ad-vantages:• independence of the source programming lan-

guage;• identification of GinMaster without decompila-

tion even when source code is lacking;• ease of parsing a lower-level code;• independence from obfuscation.

The paper proceeds as follows: Section 2 dis-cusses related work; Section 3 describes and moti-vates our approach; Section 4 illustrates the resultsof experiments; finally, conclusions are drawn in theSection 5.

2 RELATED WORK

In this section we review the current literature relatedto malware identification with particular regards tomalicious family identification.

A very thorough dissecting of GinMaster family isprovided in (Yu, 2013). The paper gives an overviewof three generations of the GinMaster family, exam-ines the core malicious functionality, tracks their evo-lution from source code, and presents notable tech-niques utilized by the specific variants. Basically,Ginmaster is able to set-up via mobile botnet mali-cious code hidden in the affected app. However, in-stead of directly taking advantage of these zombiesdevices to make profit from end-users, the malwarecontroller employs a botnet to generate millions of in-stallations and large volumes of advertising traffic tolegitimate developers and advertising services.

In (Canfora et al., 2016) authors experimentallyevaluate two techniques for detecting Android mal-ware: the first one is based on Hidden Markov Model(HMM), while the second one exploits Structural En-tropy. They demonstrate that these methods obtain aprecision of 0.96 to discriminate a malware applica-tion. In addition they also analyse ransomware sam-ples, obtaining a precision of 0.961 with LADTreeclassification algorithm using the Structural Entropymethod, and a precision of 0.824 with J48 algorithmusing the HMM one in ransomware identification.

Song et al. (Song et al., 2016) propose a frame-work to statically detect Android malware, consistingof four layers of filtering mechanisms: the messagedigest values, the combination of malicious permis-sions, the dangerous permissions, and the dangerousintention. As additional contribute, they propose anovel threat degree threshold model of dangerous per-missions on malware detection. They experiment the

ForSE 2017 - 1st International Workshop on FORmal methods for Security Engineering

674

method on real mobile devices, using 83 real mobiledevices and achieving a 98.8% pass rate, where theversions of Android range from 2.3 to 5.1.

Neuhaus et al. (Neuhaus and Zimmermann, 2010)crawl the vulnerability reports in the Common Vul-nerability and Exposures database by using topicmodels to find prevalent vulnerability types andnew trends semi-automatically. They analyze 39393unique reports until the end of 2009, with the aim ofcharacterizing many vulnerability trends: SQL injec-tion (PHP), buffer overflows, format strings, cross-site request forgery and so on.

Zhao et al. (Zhao et al., 2014) propose a LDA-based method to analyze the trends of network secu-rity which consist of three steps: collect data fromweb sites, extract topics from the collected data, andmakes the curves of trends over time. They select620 documents sorted by time and extract 10 topicfrom each document. Six interesting topics are dis-covered by LDA model, according to the autors: dns-ddos, vulnerability, mobile-malware, mac-malware,Browser malware and Java-vulnerability.

Formal methods have been applied for studyingmalware in some recent papers. In (Kinder et al.,2005) the authors introduce the specification languageCTPL (Computation Tree Predicate Logic) confirm-ing the malicious behavior of thirteen Windows mal-ware variants using as dataset a set of worms datingfrom 2002 to 2004.

Song et al. present an approach to model Mi-crosoft Windows XP binary programs as a PushDownSystem (PDS) (Song and Touili, 2001). They evalu-ate 200 malware variants (generated by NGVCK andVCL32 engines) and 8 benign programs.

The tool PoMMaDe (Song and Touili, 2013) isable to detect 600 real malware, 200 malware gen-erated by two malware generators (NGVCK andVCL32), and proves the reliability of benign pro-grams: a Microsoft Windows binary program is mod-eled as a PDS which allows to track the stack of theprogram.

Song et al. model mobile applications using aPDS in order to discovery private data leaking work-ing at Smali code level (Song and Touili, 2014). Ille-gal flow of information in Java bytecode has been alsostudied in (Bernardeschi et al., 2004), using a staticanalysis approach.

Jacob and colleagues provide a basis for a mal-ware model, founded on the Join-Calculus: they con-sider the system call sequences to build the model (Ja-cob et al., 2010).

Recently, the possibility to identify the maliciouspayload in Android malware using a model checkingbased approach has been explored in (Battista et al.,

2016; Mercaldo et al., 2016a; Mercaldo et al., 2016b).Starting from payload behavior definition they for-mulate logic rules and then test them by using areal world dataset composed by Ransomware, Droid-KungFu, Opfake families and update attack samples.

As it emerges from the literature in the last years,formal methods have been applied to detect mobilemalware, but at the best knowledge of the authors theyhave never been applied for identifying specificallythe repackaged attack provided by GinMaster familyon Android malware.

3 THE METHOD

In this section a model checking-based approach forthe detection of GinMaster apps is presented. Whilemodel checking (Barbuti et al., 2005) was originallydeveloped to verify the correctness of systems againstspecifications, recently it has been highlighted in con-nection with a variety of disciplines such as biol-ogy (De Ruvo et al., 2015), clone detection (Santone,2011), secure information flow (Barbuti et al., 2002),among others. In this paper we present the use ofmodel checking in the security field. Fig. 1 describesall the phases of our approach.

During the first phase, we generate a formal modelfrom the Java Bytecode of the .class files derived bythe app under analysis. As formal specification lan-guage, we use Milner’s Calculus of CommunicatingSystems (CCS) (Milner, 1989), one of the most wellknown process algebras. CCS contains basic oper-ators to build finite processes, communication op-erators to express concurrency, and some notion ofrecursion to capture the infinite behaviour. Thus,the formal model is obtained by transforming eachJava Bytecode instruction in CCS processes. Moreprecisely, from the CCS we generate an automatonsuch that the nodes represent instruction addresseswhile the edges (labeled with opcodes) represent thecontrol-flow transitions from one instruction to it suc-cessors(s).

In the second phase, we try to discover Androidmalware GinMaster apps. The behavior of the Gin-Master family is encoded into a property ϕ expressedin a branching temporal logic: the mu-calculus logic(Stirling, 1989). Temporal logics are logical for-malisms for expressing properties such as livenessand safety properties. The syntax of the mu-calculusis the following, where K ranges over sets of actionsand Z ranges over variables:

φ ::= tt | ff |Z | φ∧φ | φ∨φ | [K]φ |〈K〉φ | νZ.φ | µZ.φ

Identifying Mobile Repackaged Applications through Formal Methods

675

Figure 1: The Workflow of The Approach.

A fixpoint formula has the form µZ.φ (resp. νZ.φ)where µZ (resp. νZ) binds free occurrences of Z in φ.An occurrence of Z is free if it is not within the scopeof a binder µZ (resp. νZ). A formula is closed if itcontains no free variables. µZ.φ is the least fixpoint ofthe recursive equation Z = φ, while νZ.φ is the great-est one. From now on we consider only closed formu-lae.

The satisfaction of a formula φ by a state s of atransition system is defined as follows:

• each state satisfies tt and no state satisfies ff;

• a state satisfies φ1∨φ2 (φ1∧φ2) if it satisfies φ1 or(and) φ2.

• [K] φ is satisfied by a state which, for every perfor-mance of an action in K, evolves to a state obeyingφ.

• 〈K〉 φ is satisfied by a state which can evolve to astate obeying φ by performing an action in K.

For the precise definition of the satisfaction of aclosed formula ϕ by a state s (written s |=ϕ) the readercan refer to (Stirling, 1989).

For example, µY.〈a〉 tt∧〈−a,b〉Y means that “itis possible to perform the action a non preceded bythe action b”.

The CCS formal model, generated by the JavaBytecode during the first phase, is now used to provethe property ϕ: using the model checking we deter-mine the detection of Ginmaster malware apps.

In Table 1 we show the GinMaster formulae togive the reader the flavour of the approach followed.

Table 1 shows the logic formulae to catch the Gin-Master malicious payload: the first one, i.e., ϕ1, is

able to catch the root ability of the GinMaster mali-cious payload; the second one, i.e., ϕ2, identifies thegather of user information, one of the most represen-tative mobile trojan behavior, while ϕ3 identifies thecommands to communicate with the C&C server. Fig-ure 2 shows two snippets of code belonging to a Gin-Master sample. The reported snippets are related tothe two malicious behaviours catch by the logic for-mulae ϕ1 and ϕ2.

More precisely, formula ϕ1 is able to catch follow-ing actions:

• “new javalangStringBuilder”: it represents muta-ble sequence of characters, typically used to buildpath where resources, i.e. files, are located. In thecase of GinMaster the built string represents thepath of the script able to root the device;

• “pushchmod”: to successfully run the script themalware needs to set the admin privileges tothe root script, this action is performed with the“chmod” command;

• “invokeappend”: this represent a method of theStringBuilder class used to concatenate differentString;

• “invokeexec”: this instruction is able to run thespecified command and arguments in a separateprocess with the specified environment and work-ing directory. The “exec” method, belonging to“Runtime” class, represents the command able torun the script to root the phone that already ob-tained the admin privileges;

• “invokewaitFor”: this instruction causes the cur-rent thread to wait, if necessary, until the process

ForSE 2017 - 1st International Workshop on FORmal methods for Security Engineering

676

Table 1: The formulae for GinMaster payload detection.

ϕ1 = µX .〈new javalangStringBuilder〉ϕ11 ∨〈−new javalangStringBuilder〉Xϕ11 = µX .〈pushchmod〉ϕ12 ∨〈−pushchmod〉Xϕ12 = µX .〈invokeappend〉ϕ13 ∨〈−invokeappend〉Xϕ13 = µX .〈invokeexec〉ϕ14 ∨〈−invokeexec〉Xϕ14 = µX .〈invokewaitFor〉ϕ15 ∨〈−invokewaitFor〉Xϕ15 = µX .〈push f ile〉 tt∨〈−push f ile〉Xϕ2 = µX .〈pushphone〉ϕ21 ∨〈−pushphone〉Xϕ21 = µX .〈invokegetSystemService〉ϕ22 ∨〈−invokegetSystemService〉Xϕ22 = µX .〈checkcastandroidtelephonyTelephonyManager〉ϕ23∨

〈−checkcastandroidtelephonyTelephonyManager〉Xϕ23 = µX .〈invokegetDeviceId〉ϕ24 ∨〈−invokegetDeviceId〉Xϕ24 = µX .〈invokegetSubscriberId〉ϕ25 ∨〈−invokegetSubscriberId〉Xϕ25 = µX .〈invokegetSimSerialNumber〉ϕ26 ∨〈−invokegetSimSerialNumber〉Xϕ26 = µX .〈invokegetLineUunoNumber〉ϕ27 ∨〈−invokegetLineUunoNumber〉Xϕ27 = µX .〈new javaioBu f f eredReader〉 tt∨〈−new javaioBu f f eredReader〉Xϕ3 = µX .〈A,B,C,D,E,F,G,H, I,J,K,L,M,N〉 tt∨〈−A,B,C,D,E,F,G,H, I,J,K,L,M,N〉X

Figure 2: Code Snippet Related to Logic Formulae ϕ1 and ϕ2.

represented by this Process object has terminated.This method returns immediately if the subpro-cess has already terminated. If the subprocesshas not yet terminated, the calling thread will beblocked until the subprocess exits. In this case thesubprocess is represented by the execution of thescript to root the device;

• “pushphone”: it represents the retrieval of a filefrom an external source and the storage on the de-vice; in this case the stored file in the device is thescript the will be run in a separate thread.Instead, formula ϕ2 identifies the personal infor-

mation gathering capability of the malicious payloadperforming following actions:• “invokegetSystemService”: it represents an in-

terface to global information about the appli-cation environment. This is an abstract class

whose implementation is provided by the An-droid system. It allows access to application-specific resources and classes, as well as up-callsfor application-level operations such as launchingactivities, broadcasting and receiving intents;

• “checkcastandroidtelephonyTelephonyManager”:it provides access to information about thetelephony services on the device. Applicationscan use the methods in this class to determinetelephony services and states, as well as toaccess some types of subscriber information.Applications can also register a listener to re-ceive notification of telephony state changes;“invokegetDeviceId”: it is a method of the“TelephonyManager” class provided by An-droid environment and it returns the uniquedevice ID, for instance, the IMEI for GSM

Identifying Mobile Repackaged Applications through Formal Methods

677

and the MEID or ESN for CDMA phones;“invokegetSubscriberId”: another method pro-vided by “TelephonyManager” class Returns theunique subscriber ID, for example, the IMSI for aGSM phone;

• “invokegetSimSerialNumber”: this method, be-longing to “TelephonyManager” class, it returnsthe serial number of the SIM inserted into the de-vice;

• “invokegetLineUunoNumber”: this method re-turns the phone number string for line 1;

• “new javaioBu f f eredReader”: an object belong-ing to the BufferedReader class is able to read textfrom a character-input stream and buffer charac-ters. In this case the buffer contains the personalinformation retrieved previously by the maliciouspayload.

The formula ϕ3 identifies the Command and Con-trol (C&C) list of instructions. Table 2 shows all thestrings of commands and reports a brief description ofthem. In some sample these strings are encrypted andthe algorithm used for decryption is shown in Figure3. The decryption module (as Figure 3 shows) usesthe XOR Byte to Byte with key 0x18 after decodingin Base 64. The first generation of GinMaster fam-ily exhibits the C&C instruction not encrypted, whilein the second one the encryption of the commands isintroduced. In order to evade detection by antimal-ware software, the second generation obfuscates classnames and encrypts URLs as well as C&C instruc-tions. It is impossible to catch this variant by detect-ing the class name or URLs. In both the generations,the malware has the capability of reporting packageinformation relating to packages installed/uninstalledin the system, searching and listing package infor-mation from remote websites, and downloading addi-tional applications to the device without the consentof the user.

The model checker accepts two inputs: the formalmodel of the app and the property expressing the mal-ware characteristics of the GinMaster family. If themodel checker returns true it means that we considerthe app belonging to the GinMaster family, while ifit returns false it means that the app can be eithertrusted of belong to another malware family. As for-mal verification environment in this paper we use theConcurrency Workbench of New Century (CWB-NC)(Cleaveland and Sims, 1996) which supports severaldifferent specification languages, among which CCS.

Since CWB-NC is no longer in active develop-ment, as future work we want to substitute CWB-NC with CALL (standing for Concurrency workbenchdeveloped at AALborg university) (Andersen et al.,

2015), which supports CCS as input specification lan-guage (as CWB-NC), but uses a more efficient algo-rithm to perform model checking. Actually, there ex-ist mature tools with modern designs like CADP (Gar-avel et al., 2013) with expressive input languages andefficient analysis methods. However, our aim is to de-velop an initial rapid research prototype to evaluatehow our approach is able to identify GinMaster apps.For the same reason, logic properties are formulatedmanually, but as future work we plan to build rulesautomatically using, for instance, machine learning orclone detection.

An important feature of our approach is that an au-tomatic dissection of an app can be achieved, with theadvantage of the localization into the code of the in-structions that implement the malicious behavior. Tolocalize the payload, no manual inspection is needed.In fact, our approach is able to identify the exact po-sition of the instructions characterizing the maliciousbehaviour, with a precision at method level.

This is a very novel result in the malware analy-sis. In addition, starting from the consideration thatwe analyze Java bytecode, our methodology is able tocorrectly identifies the malicious payload also whentrivial obfuscation techniques (i.e., nop insertion, junkcode, call reordering) are applied to Java source code(Mercaldo et al., 2016b).

4 EXPERIMENT

In this section we discuss the experiment we per-formed to evaluate the effectiveness of our approachin recognizing GinMaster payload, discriminatingsamples belonging to other Android malware fami-lies.

4.1 Dataset

The real world samples examined in the experimentwere gathered from the Drebin project’s dataset (Arpet al., 2014; Spreitzenbarth et al., 2013): a very wellknown collection of malware used in many scien-tific works, which includes the most diffused Androidfamilies.

Malware dataset is also partitioned according tothe malware family: each family contains sampleswhich have in common several characteristics, likepayload installation, the kind of attack and events thattrigger malicious payload (Zhou and Jiang, 2012).

Table 3 shows the 10 malware families with thelargest number of applications in our malware datasetwith installation type, kind of attack and event acti-vating malicious payload.

ForSE 2017 - 1st International Workshop on FORmal methods for Security Engineering

678

Table 2: The list of C&C commands.

ACTION STRING DESCRIPTIONA ”htt p : //client.go360days.com/report/ f irst run.do” Report the starting of GinMaster.B ”htt p : //client.go360days.com/request/tableclass.do” show information stored in SQLite databaseC ”htt p : //client.go360days.com/request/con f ig.do” Change the frequency configuration for checking into the server.D ”htt p : //client.go360days.com/request/alert.do” alert last idE ”htt p : //client.go360days.com/request/push.do” soft last idF ”htt p : //client.go360days.com/report/return con f ig.do” show configurationG ”htt p : //client.go360days.com/report/return alert.do” send alertH ”htt p : //client.go360days.com/report/return push.do” push file into the deviceI ”htt p : //client.go360days.com/report/install list.do” Report information when installing a list of packages.J ”htt p : //client.go360days.com/report/listener.do” check the communication between server and deviceK ”htt p : //client.go360days.com/client.php?action = so f t&so f t id = ” Get a link to a specified software.L ”htt p : //client.go360days.com/client.php?action = so f tlist&type = search&word = ” Search a list of software with a specified word.M ”htt p : //client.go360days.com/client.php?action = so f tlist” Get the list of the available softwareN ”htt p : //client.go360days.com/client.php?action = list&list id = 9” Get the software with the specified id

Figure 3: Decryption algorithm used to decode the C&C list of commands.

The malware was retrieved from the Drebinproject (Arp et al., 2014; Spreitzenbarth et al., 2013)taking into account the top 10 most populous families.

We briefly describe the malicious payload actionfor the top 10 populous families in our dataset.

1. The samples of FakeInstaller family have themain payload in common but have different codeimplementations, and some of them also have anextra payload. FakeInstaller malware is server-side polymorphic, which means the server couldprovide different .apk files for the same URL re-quest. There are variants of FakeInstaller that notonly send SMS messages to premium rate num-bers, but also include a backdoor to receive com-mands from a remote server. There is a largenumber of variants for this family, and it has dis-tributed in hundreds of websites and alternativemarkets. The members of this family hide theirmalicious code inside repackaged version of pop-ular applications. During the installation processthe malware sends expensive SMS messages topremium services owned by the malware authors.

2. DroidKungFu installs a backdoor that allows at-tackers to access the smartphone when they wantand use it as they please. They could even turnit into a bot. This malware encrypts two knownroot exploits, exploit and rage against the cage, tobreak out of the Android security container. When

it runs, it decrypts these exploits and then contactsa remote server without the user knowing.

3. Plankton uses an available native functionality(i.e., class loading) to forward details like IMEIand browser history to a remote server. It ispresent in a wide number of versions as harmfuladware that download unwanted advertisementsand it changes the browser homepage or add un-wanted bookmarks to it.

4. The Opfake samples make use of an algorithm thatcan change shape over time so to evade the anti-malware. The Opfake malware demands paymentfor the application content through premium textmessages. This family represents an example ofpolymorphic malware in Android environment: itis written with an algorithm that can change shapeover time so to evade any detection by signaturebased antimalware.

5. GinMaster family contains a malicious servicewith the ability to root devices to escalate priv-ileges, steal confidential information and sendto a remote website, as well as install applica-tions without user interaction. It is also a tro-jan application and similarly to the DroidKungFufamily the malware starts its malicious servicesas soon as it receives a BOOT COMPLETEDor USER PRESENT intent. The malware cansuccessfully avoid detection by mobile anti-virus

Identifying Mobile Repackaged Applications through Formal Methods

679

Table 3: Families in Drebin dataset with details of the installation method (standalone, repackaging, update), the kind of attack(trojan, botnet), the events that trigger the malicious payload and a brief family description.

Family Installation Attack Activation DescriptionFakeInstaller s t,b server-side polymorphic familyPlankton s,u t,b it uses class loading to forward detailsDroidKungFu r t boot,batt,sys it installs a backdoorGinMaster r t boot malicious service to root devicesBaseBridge r,u t boot,sms,net,batt it sends information to a remote serverAdrd r t net,call it compromises personal dataKmin s t boot it sends info to premium-rate numbersGeinimi r t boot,sms first Android botnetDroidDream r b main botnet, it gained root accessOpfake r t first Android polymorphic malware

Table 4: Performance Evaluation.

GinMaster Malware TP FP FN TN PR RC Fm Acc100 761 81 2 19 759 0.98 0.81 0.89 0.98

software by using polymorphic techniques to hidemalicious code, obfuscating class names for eachinfected object, and randomizing package namesand self-signed certificates for applications.

6. BaseBridge malware sends information to a re-mote server running one ore more malicious ser-vices in background, like IMEI, IMSI and otherfiles to premium-rate numbers. BaseBridge mal-ware is able to obtain the permissions to use Inter-net and to kill the processes of antimalware appli-cation in background.

7. Kmin malware is similar to BaseBridge, but doesnot kill antimalware processes.

8. Geinimi is the first Android malware in the wildthat displays botnet-like capabilities. Once themalware is installed, it has the potential to receivecommands from a remote server that allows theowner of that server to control the phone. Gein-imi makes use of a bytecode obfuscator. The mal-ware belonging to this family is able to read, col-lect, delete SMS, send contact informations to aremote server, make phone call silently and alsolaunch a web browser to a specific URL to startfiles download.

9. Adrd family is very close to Geinimi but with lessserver side commands, it also compromises per-sonal data such as IMEI and IMSI of infected de-vice. In addiction to Geinimi, this one is able tomodify device settings.

10. DroidDream is another example of botnet, itgained root access to device to access unique iden-tification information. This malware could alsodownloads additional malicious programs withoutthe user’s knowledge as well as open the phone up

to control by hackers. The name derives from thefact that it was set up to run between the hours of11pm and 8am when users were most likely to besleeping and their phones less likely to be in use.

4.2 Evaluation

To estimate the detection performance of our method-ology we compute the metrics of precision and recall,F-measure (Fm) and Accuracy (Acc), defined as fol-lows:

PR =T P

T P+FP; RC =

T PT P+FN

;

Fm =2PR RCPR+RC

; Acc =T P+T N

T P+FN +FP+T Nwhere T P is the number of malware that was correctlyidentified in the GinMaster family (True Positives),T N is the number of malware correctly identified asnot belonging to the GinMaster family (True Nega-tives), FP is the number of malware that was incor-rectly identified in the GinMaster family (False Posi-tives), and FN is the number of malware that was notidentified as belonging to the GinMaster family (FalseNegatives).

Table 4 shows the results obtained using ourmethod.

We consider in the column GinMaster the sam-ples belonging to GinMaster family, while in the col-umn Malware the malware samples belonging to oth-ers families considered in the study: the detail aboutthe malicious payload of the family we considered isshown in Table 3. We demonstrate the effectivenessof our approach evaluating 100 malware belongingto GinMaster family and 761 malware belonging toother families.

ForSE 2017 - 1st International Workshop on FORmal methods for Security Engineering

680

Results in Table 4 seems to be very promising: weobtain an Accuracy equal to 0.9. Concerning the Gin-Master results, we are not able to identify the mali-cious payloads of just 2 samples on 100. It is worthnoting the above values are also due to the fact that thedataset is unbalanced, i.e., 100 malware belonging toGinMaster family and 761.

5 CONCLUSION AND FUTUREWORK

The most common way to inject malicious payloadin Android environment is represented by the repack-aging attack, that basically consists to distribute le-gitimate well-known applications with the maliciousbehaviour in order to lure users. In this paper wepropose an approach, based on formal methods, ableto catch the malicious payload related to GinMasterfamily, one of the most populous repackaged trojanembed in legitimate Android applications. GinMasterfamily is able to root Android devices in order to ex-ecute shell scripts with admin privileges, in additionit is able to send personal user information to the at-tacker using C&C server. We identified a set of rulesspecific to GinMaster payload behaviour and we eval-uate the effectiveness of our approach using a datasetof real-world malware, obtaining an accuracy equal to0.9. As future work, we plan to test our approach onmobile malware belonging to other families that ex-hibit trojan behaviour to evaluate the rule set on fam-ilies with similar payload.

ACKNOWLEDGEMENTS

This work has been partially supported by H2020EU-funded projects NeCS and C3ISP and EIT-DigitalProject HII.

REFERENCES

Andersen, J. R., Andersen, N., Enevoldsen, S., Hansen,M. M., Larsen, K. G., Olesen, S. R., Srba, J., andWortmann, J. K. (2015). CAAL: concurrency work-bench, aalborg edition. In Theoretical Aspects ofComputing - ICTAC 2015 - 12th International Col-loquium Cali, Colombia, October 29-31, 2015, Pro-ceedings, pages 573–582.

Arp, D., Spreitzenbarth, M., Huebner, M., Gascon, H., andRieck, K. (2014). Drebin: Efficient and explainabledetection of android malware in your pocket. In Pro-ceedings of 21th Annual Network and Distributed Sys-tem Security Symposium (NDSS). IEEE.

Barbuti, R., De Francesco, N., Santone, A., and Tesei,L. (2002). A notion of non-interference for timedautomata. Fundamenta Informaticae, 51(1-2):1–11.cited By 6.

Barbuti, R., Francesco, N. D., Santone, A., and Vaglini, G.(2005). Reduced models for efficient CCS verifica-tion. Formal Methods in System Design, 26(3):319–350.

Battista, P., Mercaldo, F., Nardone, V., Santone, A., and Vis-aggio, C. A. (2016). Identification of android malwarefamilies with model checking. In International Con-ference on Information Systems Security and Privacy.SCITEPRESS.

Bernardeschi, C., De Francesco, N., Lettieri, G., and Mar-tini, L. (2004). Checking secure information flow injava bytecode by code transformation and standardbytecode verification. Software - Practice and Expe-rience, 34(13):1225–1255.

Canfora, G., Mercaldo, F., and Visaggio, C. A. (2016). Anhmm and structural entropy based detector for androidmalware: An empirical study. Computers & Security,61:1–18.

Cleaveland, R. and Sims, S. (1996). The ncsu concurrencyworkbench. In CAV. Springer.

De Ruvo, G., Nardone, V., Santone, A., Ceccarelli, M.,and Cerulo, L. (2015). Infer gene regulatory networksfrom time series data with probabilistic model check-ing. pages 26–32. cited By 7.

Fedler, R., Schutte, J., and Kulicke, M. (2014). Onthe effectiveness of malware protection on an-droid: An evaluation of android antivirus apps,http://www.aisec.fraunhofer.de/.

Garavel, H., Lang, F., Mateescu, R., and Serwe, W. (2013).CADP 2011: a toolbox for the construction and anal-ysis of distributed processes. STTT, 15(2):89–107.

GoogleMobile (2014). http://googlemobile.blogspot.it/2012/02/android-and-security.html.

Isohara, T., Takemori, K., and Kubota, A. (2011). Kernel-based behavior analysis for android malware detec-tion. In Proceedings of Seventh International Confer-ence on Computational Intelligence and Security, pp.1011-1015.

Jacob, G., Filiol, E., and Debar, H. (2010). Formalization ofviruses and malware through process algebras. In In-ternational Conference on Availability, Reliability andSecurity (ARES 2010). IEEE.

Kinder, J., Katzenbeisser, S., Schallhart, C., and Veith, H.(2005). Detecting malicious code by model checking.Springer.

Liang, S. and Du, X. (2014). Permission-combination-based scheme for android mobile malware detec-tion. In International Conference on Communica-tions, pages 2301–2306.

Marforio, C., Aurelien, F., and Srdjan, C. (2011).Application collusion attack on the permission-based security model and its implications for mod-ern smartphone systems, ftp://ftp.inf.ethz.ch/doc/tech-reports/7xx/724.pdf.

Mercaldo, F., Nardone, V., Santone, A., and Visaggio, C. A.(2016a). Download malware? No, thanks. How for-mal methods can block update attacks. In Formal

Identifying Mobile Repackaged Applications through Formal Methods

681

Methods in Software Engineering (FormaliSE), 2016IEEE/ACM 4th FME Workshop on. IEEE.

Mercaldo, F., Nardone, V., Santone, A., and Visaggio,C. A. (2016b). Ransomware steals your phone. for-mal methods rescue it. In International Conferenceon Formal Techniques for Distributed Objects, Com-ponents, and Systems, pages 212–221. Springer.

Milner, R. (1989). Communication and concurrency. PHISeries in computer science. Prentice Hall.

Neuhaus, S. and Zimmermann, T. (2010). Security trendanalysis with cve topic models. In Software reliabil-ity engineering (ISSRE), 2010 IEEE 21st internationalsymposium on, pages 111–120. IEEE.

Oberheide, J. and Miller, C. (2012). Dissect-ing the android bouncer. In SummerCon,https://jon.oberheide.org/files/summercon12-bouncer.pdf.

Reina, A., Fattori, A., and Cavallaro, L. (2013). A systemcall-centric analysis and stimulation technique to au-tomatically reconstruct android malware behaviors. InProceedings of EuroSec.

Santone, A. (2011). Clone detection through process alge-bras and java bytecode. pages 73–74. cited By 10.

SecureList (2015). https://securelist.com/analysis/kaspersky-security-bulletin/73839/mobile-malware-evolution-2015/.

Song, F. and Touili, T. (2001). Efficient malware detectionusing model-checking. Springer.

Song, F. and Touili, T. (2013). Pommade: Pushdownmodel-checking for malware detection. In Proceed-ings of the 2013 9th Joint Meeting on Foundations ofSoftware Engineering. ACM.

Song, F. and Touili, T. (2014). Model-checking for androidmalware detection. Springer.

Song, J., Han, C., Wang, K., Zhao, J., Ranjan, R., andWang, L. (2016). An integrated static detection andanalysis framework for android. Pervasive and Mo-bile Computing.

Spreitzenbarth, M., Echtler, F., Schreck, T., Freling, F. C.,and Hoffmann, J. (2013). Mobilesandbox: Lookingdeeper into android applications. In 28th InternationalACM Symposium on Applied Computing (SAC). ACM.

Stirling, C. (1989). An introduction to modal and temporallogics for ccs. In Concurrency: Theory, Language,And Architecture, pages 2–20.

Tchakount, F. and Dayang, P. (2013). System calls analysisof malwares on android. In International Journal ofScience and Tecnology (IJST) Volume, 2 No. 9.

Yerima, S. Y., Sezer, S., McWilliams, G., and Muttik, I.(2013). A new android malware detection approachusing bayesian classification. In International Confer-ence on Advanced Information Networking and Appli-cations, pages 121–128.

Yu, R. (2013). Ginmaster: a case study in android malware.In Virus bulletin conference, pages 92–104.

Zhao, Y.-B., Liu, S.-M., and Guo, S.-Q. (2014). Extractionand prediction of hot topics in network security. InComputer Science and Network Security, 2014 Inter-national Conference on, pages 347–353.

Zhou, Y. and Jiang, X. (2012). Dissecting android malware:Characterization and evolution. In 2012 IEEE Sympo-sium on Security and Privacy, pages 95–109. IEEE.

ForSE 2017 - 1st International Workshop on FORmal methods for Security Engineering

682