[IEEE 2012 19th Working Conference on Reverse Engineering (WCRE) - Kingston, ON, Canada...

Code Defactoring: Evaluating the Effectiveness of Java Obfuscations

Andrea Capiluppi
Brunel University
Kingston Lane, Uxbridge, London UB8 3PH
Email: [email protected]

Paolo Falcarin
ACE - University of East London
4 - 6 University Way, London, UK. E16 4LZ
Email: [email protected]

Cornelia Boldyreff
ACE - University of East London
4 - 6 University Way, London, UK. E16 4LZ
Email: [email protected]

Abstract: Obfuscation is a very common protection against reverse engineering attacks: it modifies a program's structure to make it harder for the adversary to analyse and understand it. Conceptually, obfuscation is the opposite of refactoring: the code should become more complex to understand, bloated, and excessive from the design point of view.

This paper evaluates the code complexity introduced by different obfuscation algorithms by means of software engineering metrics. Using structural metrics, it illustrates how the various types of obfuscation algorithms perform in terms of the OO attributes that refactoring aims to keep low.

Results show that the majority of the selected algorithms produce no changes in the structural attributes or the average complexity, but they produce more “dead” code. We argue that this may not be the optimal way to protect the code: when protecting against reverse engineering attacks, preference should be given to those algorithms that increase the complexity and alter the structural metrics.

Keywords: security metrics; obfuscation; structural metrics; cyclomatic complexity

I. INTRODUCTION

Attacks performed by a trusted user on a computer system are called man-at-the-end (MATE) attacks. MATE attacks take many forms: in a tampering attack, the user breaks the integrity of a piece of software by modifying it in ways not intended by the software vendor. In a malicious reverse engineering attack, the user violates the confidentiality rights of the vendor by extracting intellectual property contained in the software (such as algorithms) or sensitive data (such as license codes or cryptographic keys). Finally, in a cloning attack, copyright laws are violated through software piracy, by cracking and distributing illegal copies of the software. Methods for protecting software against MATE attacks are collectively known as Software Protection [1].

Software Protection has increasingly become an important requirement for industrial software development: the most recent Global Software Piracy Study published by the Business Software Alliance¹ clearly shows that well over half of the world's computer users admit they use pirated software. Historically, software protection first appeared as attempts at adding license-checking code to computer games, followed by hardware protections like USB dongles and smart cards, and by recent research on white-box cryptography [2] used for digital media piracy protection. The software protection problem is fundamentally harder than other security problems; the reason is that the attack model assumes an almighty adversary who has full access to the chosen software and hardware and can freely examine and modify them. For example, a very common protection against reverse engineering attacks is obfuscation, which modifies a program to make it harder for the adversary to analyse or comprehend: common obfuscation techniques [3] include splitting code into smaller pieces, merging pieces of unrelated code, randomizing code placement and instruction selection, mapping original data structures to complex ones, and flattening or complicating the program control flow.

¹ http://www.bsa.org/globalstudy/

Essentially, obfuscation works in the opposite direction of refactoring: code obfuscators should provide facilities for “defactoring” the source code and binaries of an application, working against the efficiency and the understandability of the code. Given that metrics have been proposed to interpret and guide the refactoring effort, for instance in the presence of bad smells [4], the obfuscation (defactoring) tools should attempt to increase the same metrics that the refactoring tools are designed to decrease. For instance, the broad aim of refactoring is to “decrease the complexity of the code”, hence code obfuscators should provide algorithms to increase such complexity. This aspect, and the fact that there are currently no practical security metrics to measure the quality of the protection, pose the following two questions: how can the effectiveness of different obfuscation algorithms be evaluated? How should one choose between protections when securing a software system?

This paper uses the obfuscation algorithms contained in three obfuscation packages: the Sandmark tool [5], the Allatori suite and the Zelix Klassmaster™ toolkit. The aim is to produce a set of obfuscated classes and to check whether the algorithms have a structural effect, or affect the complexity of the system. The OO metrics described in Chidamber and Kemerer's work [6] are measured both for the pre-obfuscated code and for the systems after each obfuscation algorithm. The objective is to establish which of the algorithms cause a variation of the different metrics measured on the systems under study.


DOI 10.1109/WCRE.2012.17


This paper is structured as follows: Section II discusses the related work in software protection metrics; Section III illustrates the obfuscation algorithms applied in this study; Section IV discusses two prototype case studies and the metrics used to collect the results. Since the results of the obfuscation algorithms can be very large, even for small systems, we use this step to show exactly what to look for in the collection of metrics. Section V shows the results of the obfuscation algorithms on those systems. The approach is then replicated and summarised for larger systems in Section VI, with the aim of confirming (or not) the findings on the smaller systems. Section VII-A discusses the implications and identifies the threats to validity, while Section VIII concludes.

II. SOFTWARE PROTECTION METRICS – RELATED WORK

Obfuscation is a semantics-preserving transformation of computer programs aimed at bringing a program into a form that thwarts the understanding of its algorithms and data structures, or prevents the extraction of some valuable information from the program. With sufficient effort, all obfuscation techniques can be defeated in reasonable time: a formal proof of the non-existence of a perfect obfuscation (one that cannot be reverse-engineered) has been given by Barak et al. [7]. However, current obfuscations are partially succeeding, as the majority of users are not able to understand reverse-engineered obfuscated code, as shown by controlled experiments [8]. The main difficulty in designing an obfuscator is predicting the time within which someone can break the software protection. The long-term objective of software protection metrics is therefore to estimate the extra complexity introduced by obfuscations and to correlate it with an estimated delay that the most sophisticated attacker would incur due to one or more protection techniques on a given application. For example, renewable protection schemes [9] [10], which complicate the attacker's task by updating the running code from a trusted server, could better configure their update times given a good estimate of such delays.

The present work can be considered an initial step towards systematically evaluating which metrics are affected by obfuscation, with a focus on code complexity. The use of metrics for the evaluation of the increased complexity introduced by software protection techniques has been mainly addressed by the pioneering work of Collberg et al. [11]: in their paper they defined “potency” as the ratio between the same complexity metric measured after and before obfuscation, and also identified which metrics are significantly modified by a particular obfuscation, but they did not measure those metrics for different case studies as we did. We believe that ours is an important contribution: the non-obfuscated program “A” can already be complex with respect to some metrics, with the obfuscation only partially incrementing its overall complexity. Another, simpler program “B” could be modified more effectively by an obfuscator, but its resulting, after-obfuscation complexity could still be lower than that of “A” after the obfuscation.

Other works focus on particular attacks and on one or more specific metrics: cyclomatic number and instruction count have been assessed for binary obfuscations [12], while depth of parse tree has been used to estimate source code complexity [13]. Udupa et al. [14] used the amount of time required to perform automatic de-obfuscation to evaluate the effectiveness of control flow flattening obfuscation, relying on a combination of static and dynamic analysis. Another binary instrumentation tool measures the fraction of the obfuscating transformations that the attackers can undo automatically [15].

To prove that metrics are effective predictors of the level of security, experimental evidence should demonstrate that metrics correlate with the actual difficulty of performing attacks on a different set of programs and with different users. Such difficulty has been evaluated (on very few obfuscation techniques) with controlled experiments based on attacks performed by human subjects on obfuscated code [16].

Once empirical experiments have been performed on different types of obfuscated code, we will be able to validate whether a metric is a good predictor in assessing security and code complexity.
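The argument about programs “A” and “B” in this section can be made concrete with a minimal sketch. It assumes the after/before ratio as described in the text above; the class name, method name and all numbers are hypothetical illustrations, not measurements from this study or code from any of the cited tools.

```java
// Hypothetical helper: potency taken as the after/before ratio described in the text.
final class Potency {

    /** Ratio between the same metric measured after and before obfuscation. */
    static double potency(double before, double after) {
        if (before <= 0) {
            throw new IllegalArgumentException("baseline metric must be positive");
        }
        return after / before;
    }

    public static void main(String[] args) {
        // Illustrative numbers only (assumed, not taken from the paper).
        double aBefore = 10.0, aAfter = 12.0; // program "A": already complex
        double bBefore = 2.0, bAfter = 6.0;   // program "B": simple baseline

        System.out.printf("A: potency=%.2f, complexity after=%.1f%n",
                potency(aBefore, aAfter), aAfter);
        System.out.printf("B: potency=%.2f, complexity after=%.1f%n",
                potency(bBefore, bAfter), bAfter);
    }
}
```

With these assumed values the obfuscator is more potent on “B” than on “A”, yet “B” remains less complex than “A” after obfuscation, which is why a ratio alone does not describe the resulting level of protection.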

III. OBFUSCATION ALGORITHMS

In order to obfuscate the targeted Java applications, this work made use of three obfuscation tools: the Sandmark obfuscator [3],² the Allatori Java obfuscator³ (version 4.1) and the Zelix Klassmaster™ Java obfuscator⁴ (version 5.5.0).

The first (Sandmark) was selected because it provides levels of flexibility, customization and openness that other obfuscation tools lack;⁵ the second (Allatori) because it provides an all-in-one suite for obfuscating Java classes, and it provides a free toolkit; and the third (Klassmaster™) because it represents the state-of-the-art obfuscation toolkit, albeit at a licensing price. Below we detail how each tool was used, together with the options that were activated to produce the obfuscated outputs.

A. Sandmark

The Sandmark software tool can perform obfuscations, as well as static and dynamic watermarking of the source code and the binaries of Java systems. Being open-source, it also provides a full list of obfuscators, grouped in three categories: application-, class- and method-level. Method- and class-level obfuscations are more configurable, as they allow developers to select which methods (or classes) to obfuscate; all other algorithms are considered application-level obfuscations. Such partial obfuscation might be useful to exclude some methods from obfuscation for the sake of performance and reliability: for example, obfuscation of reflective code can break the application (i.e. not preserve program semantics), while some obfuscations can penalize performance. Table I summarizes the characteristics of all the algorithms used in this study, clustering them into APP (application-), CL (class-) and MET (method-) level obfuscation algorithms: the information was gathered from the manual provided with the toolkit.

² http://sandmark.cs.arizona.edu/downloads.html
³ http://www.allatori.com/
⁴ http://www.zelix.com/klassmaster/
⁵ On the downside, Sandmark is quite old and it cannot handle the newest Java constructs.

B. Allatori

The second obfuscation tool used, Allatori, can also be streamlined and activated via a command line interface. It provides methods to obfuscate and watermark the Java classes, and it does not require the underlying Java source files to do so. The java -jar allatori.jar invocation requires an XML configuration file to activate the options for the obfuscation and watermarking. Since we were only interested in the effects of the control flow algorithms, the options for the watermarking and renaming were not activated, since they do not logically produce changes in how the applications work. Browsing the documentation provided with the toolkit, we concluded that only two configurations for the modification of the control flow are possible. Therefore two configurations were created and analysed, one with the control flow obfuscation activated (termed cfo):

    <property name="control-flow-obfuscation" value="enable"/>

and another “light” XML configuration, without control-flow obfuscations, but with the renaming of the local variables (termed lvn).

C. Klassmaster™

The third obfuscation tool used, Zelix Klassmaster™, is also a commercial tool that provides several activation points for the obfuscation of the Java classes under study. It also provides a way to exclude methods, classes and packages from obfuscation, so as to target the effects of the obfuscation algorithms very precisely. The tool can be streamlined by the use of powerful scripts, which can be built on top of automated batch files.

Analysing the documentation of the toolkit, it is evident that it gives more control over the output obfuscation than Allatori, albeit less than Sandmark. Therefore, we created two configuration files (“aggressive” and “light” obfuscation scenarios) by activating the following switches, which affect the control flow attributes in more or less depth:⁶

• aggressiveMethodRenaming: “true” for the aggressive (GRR) configuration, or “false” for the simple (light) one;
• keepInnerClassInfo: “false” (GRR) or “true” (light);
• keepGenericsInfo: “false” (GRR) or “true” (light);
• obfuscateFlow: “aggressive” (GRR) or “light” (light);
• encryptStringLiterals: “aggressive” (GRR) or “light” (light);
• exceptionObfuscation: “heavy” (GRR) or “light” (light);
• autoReflectionHandling: “normal” (GRR) or “none” (light);
• lineNumbers: “scramble” for both the GRR and light configurations;
• localVariables: “delete” for both the GRR and light configurations;
• randomize: “true” (GRR) or “false” (light);
• allClassesOpened: “true” for both the GRR and light configurations;
• deriveGroupingsFromInputChangeLog: “false” for both the GRR and light configurations.

⁶ Most of these switches are self-explanatory, but http://www.zelix.com/klassmaster/docs/obfuscateStatement.html provides a full description.

IV. CASE STUDIES AND METRICS

We split the empirical Section in two parts: in the first, the data analysis is based on two prototype Java systems (named “ChatClient” and “CarRace”), and it serves as a proof of concept for the empirical approach and the toolchain. In Section VI, the analysis is then replicated for larger systems, with more classes and more visibility for users and malicious attackers.

The two prototype systems are provided with a “client” part (to be distributed to the customers, and therefore more prone to security issues) and a “server” part. The first system (ChatClient) is a network client/server application that allows people to have text-based conversations through the network. Conversations can be public or private, depending on how they are initiated. The application shows a list of available rooms: when the application starts, the “default” room is accessed, i.e. a public room where all the users are participating by default. More rooms can be joined by selecting a name for the new room from the “Available Rooms” list; doing so opens a new tab. All the messages sent to a conversation within a room are received by all the users registered to that room. Finally, a private conversation (only two users) can be initiated by clicking the name of a user in the “Online Users” list.

The second system is a client/server application with more graphical (i.e., Swing and AWT) classes: this system is interesting, as compared with the previous one, because it could present different obfuscation issues, given the larger sizes of its classes.


Name | Type | Description
Array Folding | APP | takes a one-dimensional array and folds it into a multi-dimensional array.
Array Splitting | APP | takes a one-dimensional array field and splits it into 2 arrays by adding another field of the same type: one array will contain the first half of the elements and the other array will contain the second half.
BLOAT | APP | BLOAT is a Java bytecode optimizer performing many traditional program optimizations such as constant-copy propagation, constant folding, dead code elimination, and peephole optimizations [17].
Block Marker | APP | randomly marks all basic blocks in the program with either 0 or 1.
Class Encrypter | APP | encrypts class files and causes them to be decrypted at runtime.
Constant Pool Reorder | APP | reorders the constants in the bytecode constant pool and assigns random indices to them: there is no change in code as a result of this obfuscation.
Dynamic Inliner | APP | inlines methods at runtime using instanceof checks.
False refactoring | APP | it is performed on two classes that have no common behavior. If both classes have instance variables of the same type, these can be moved into a new parent class, whose methods can be buggy versions of some of the methods from the original classes.
Integer Array Splitting | APP | splits a single array, which is a local variable in a method, into two arrays and consistently modifies all the array initialization, read, write, and array-length references.
Interleave-Methods | APP | finds pairs of methods in the input application and interleaves them into one method. It selects pairs such that both methods have the same signature and are not Java library methods (e.g. toString()).
Overload Names | APP | obfuscates methods so that as many methods as possible have the same name. Method overriding relationships remain intact, whereas existing overloaded methods may be destroyed, and new ones created.
Parameter Alias | APP | looks at each class and tries to find a (non-initializer, non-abstract, non-native) method that takes some object type as a parameter. It then aliases that parameter within the method using the ThreadLocal class.
Rename Registers | APP | renames local variables to random identifiers.
Split-Classes | APP | obfuscates a class file by splitting a node into two, i.e. some of the fields from the class are moved into a newly created class and all references to those fields in the given class are modified to reflect the changes.
String Encoder | APP | obfuscates the literal strings of a program. Each string is obfuscated and any string reference is replaced by a call to a method that de-obfuscates it.
Field Assignment | CL | obfuscates a class by inserting a bogus field into a class and then making assignments to that field in specific locations throughout the code. The specific locations are determined by the random selection of a sibling field.
Method Merger | CL | merges all of the public static methods that have the same signature in each class into one large master method.
Objectify | CL | takes a class and replaces all the fields with fields of the same name that have type Object; the algorithm runs through the entire application and fixes the proper references to the modified fields.
Publicize Fields | CL | makes the fields of a class public.
Simple Opaque Predicates | CL | implements simple boolean identities and adds them to the code. Opaquely true constructs are embedded in the code, e.g. some constructs based on algebraic properties and known facts in mathematics.
Static Method Bodies | CL | splits all of the non-static methods into a static helper method and a non-static stub that calls it.
Bludgeon Signatures | MET | converts all methods to take an Object[] parameter and return Object.
Boolean Splitter | MET | detects boolean variables and arrays and modifies their uses and definitions, by splitting each into 2.
Branch Inverter | MET | exchanges the “if” and the “else” part of an if-else statement. It also negates the if condition so that the semantics is preserved.
Buggy code | MET | selects a random method from the class file, and a random basic block in the method: a copy of the basic block is made and some additional bug codes are also introduced in this new basic block which changes the local variable values. This basic block is bypassed from execution.
Duplicate Registers | MET | creates an additional variable that has its value changed according to an original local variable. Each reference to that variable value may have been changed to reference the new variable instead.
Inliner | MET | inlines static method bodies throughout the code replacing method invocations.
Insert Opaque Predicate | MET | inserts an opaque predicate into every boolean expression. The boolean expressions are all relational operators that compare integers, so the opaque predicates will simply add an opaquely false value (i.e. value==0) to one of the integer operands.
Irreducibility | MET | adds conditional branches to a method via opaque predicates so that the control flow graph of the resulting method is irreducible.
Merge Local Integers | MET | combines two int variables into a single long variable, making access to either more confusing.
Opaque Branch Insertion | MET | randomly inserts branches into a method.
Promotion Primitive Registers | MET | replaces all the local int variables in a function with local java.lang.Integer. This is possible through bytecode manipulation of all of the instructions that depend on retrieving and storing int values.
Primitive Promoter | MET | changes all primitives in every method into instances of the respective wrapper classes.
Random Dead Code | MET | adds bogus statements onto the end of a Java method. The appended code may include a variety of other instructions including return instructions. Methods not ending in a return statement will impede reverse engineering tools.
Reorder-Instructions | MET | tries to reorder the instructions within each basic block of a method. The algorithm first creates a list of expression trees within each block. Once the dependency graph is obtained it writes out the instructions by doing a topological sort of the nodes in the dependency graph.
Parameter Reorderer | MET | shuffles the argument orders for all methods.
Transparent Branch Insertion | MET | randomly inserts branches into a method. The branch will test to see if an Object field of the class is null, and if so it will branch.
Variable Reassigner | MET | reallocates the local variables in a method, in order to minimize the number of local variable slots used.

Table I. SANDMARK – DESCRIPTION OF TYPES AND OBFUSCATION ALGORITHMS
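To make two of the descriptions in Table I more tangible, the following is a hand-written sketch in the spirit of the “Branch Inverter” and “Simple Opaque Predicates” entries. It is an assumption-laden illustration at source level: it is not output produced by Sandmark (which works on bytecode), and the class and method names are hypothetical.

```java
// Illustrative sketch only: hand-written Java, not actual Sandmark output.
final class ObfuscationSketch {

    // Original method: one decision point.
    static int abs(int v) {
        if (v < 0) {
            return -v;
        } else {
            return v;
        }
    }

    // "Branch Inverter" style: the condition is negated and the two branches are swapped.
    static int absInverted(int v) {
        if (!(v < 0)) {
            return v;
        } else {
            return -v;
        }
    }

    // "Simple Opaque Predicates" style: v * (v + 1) is always even, even after
    // int overflow, so the guard is always true and the last return is dead code.
    static int absOpaque(int v) {
        if ((v * (v + 1)) % 2 == 0) {
            if (v < 0) {
                return -v;
            } else {
                return v;
            }
        }
        return v + 42; // never executed
    }

    public static void main(String[] args) {
        int[] samples = {-5, 0, 7, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int v : samples) {
            boolean same = abs(v) == absInverted(v) && abs(v) == absOpaque(v);
            System.out.println(v + " -> " + abs(v) + " (variants agree: " + same + ")");
        }
    }
}
```

Counting decision points plus one, the inverted variant keeps the same cyclomatic number as the original (2), whereas the opaque predicate adds one decision point (3) together with a statement that never runs. This mirrors the distinction drawn in this paper between algorithms that alter the measured complexity and those that mainly add dead code.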


A. Object-Oriented Metrics

The Chidamber and Kemerer (C&K) object-oriented metrics framework is a suite of metrics for OO design, and is composed of six design metrics: Weighted Methods Per Class (WMC), Depth of Inheritance Tree (DIT), Number of Children (NOC), Coupling Between Object Classes (CBO), Response For a Class (RFC), and Lack of Cohesion in Methods (LCOM). These metrics have been used extensively by researchers in the past few years, and they have been validated for OO languages.

The C&K metrics were extracted by the ckjm tool [18].⁷ The ckjm program calculates the C&K metrics by processing the bytecode of compiled Java files: this feature was very useful, since the Sandmark tool only produces the bytecode of the obfuscated classes as an output. The ckjm program also calculates (for each class) the Number of Afferent Couplings (Ca), and the Number of Public Methods (NPM).
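As a rough illustration of what some of these class-level measures capture, the sketch below approximates DIT, a unit-weight WMC and NPM using Java reflection on loaded classes. This is an assumption for illustration only: it is not how ckjm works internally (ckjm processes bytecode files), its counting conventions (for example for constructors, or for the depth assigned to classes extending Object) may differ from ckjm's, and all names are hypothetical.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical reflection-based approximation of three class-level metrics.
final class ReflectionMetrics {

    /** DIT approximated as the number of ancestors up to and including Object. */
    static int dit(Class<?> c) {
        int depth = 0;
        for (Class<?> s = c.getSuperclass(); s != null; s = s.getSuperclass()) {
            depth++;
        }
        return depth;
    }

    /** WMC with unit weights: declared, non-synthetic methods (constructors excluded). */
    static int declaredMethods(Class<?> c) {
        int n = 0;
        for (Method m : c.getDeclaredMethods()) {
            if (!m.isSynthetic()) {
                n++;
            }
        }
        return n;
    }

    /** NPM: declared, non-synthetic methods that are public. */
    static int publicMethods(Class<?> c) {
        int n = 0;
        for (Method m : c.getDeclaredMethods()) {
            if (!m.isSynthetic() && Modifier.isPublic(m.getModifiers())) {
                n++;
            }
        }
        return n;
    }

    // Two tiny example classes to measure.
    static class Shape {
        public double area() { return 0.0; }
        void reset() { }
    }

    static class Circle extends Shape {
        @Override
        public double area() { return Math.PI; }
    }

    public static void main(String[] args) {
        System.out.println("DIT(Circle) = " + dit(Circle.class));
        System.out.println("WMC(Shape)  = " + declaredMethods(Shape.class));
        System.out.println("NPM(Shape)  = " + publicMethods(Shape.class));
    }
}
```

An obfuscation that moves fields or methods into a new parent class would change the value returned by `dit`, while one that only rewrites method bodies would leave all three values untouched.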

B. Complexity Metrics

The complexity attribute that was examined in this study is the McCabe cyclomatic complexity index [19]. One issue that emerges here is that the McCabe index was used to evaluate OO methods, although it was designed for procedural (e.g., C, ADA, etc.) functions: the usage of a metric which is essentially designed for procedural languages in an OO context has often been discussed and partially validated by other authors [20], [21].

As mentioned above, the three obfuscation tools produce bytecode as an output: in order to evaluate the cyclomatic complexity of the OO methods, it was necessary to decompile the classes to obtain the corresponding source code. Invocations of the jad decompiler [22] were piped after the production of the obfuscated bytecode, and the resulting source code was analysed by the Understand tool.⁸ An overview of the toolchain is visualized in Figure 1: scripts were written to perform these operations in sequence, for all the algorithms and throughout the toolchain.

Since several of the obfuscation algorithms produce spurious classes, in some cases the obfuscated system results in more classes and methods than the clean, pre-obfuscation system: in the analysis of the complexity, it was decided to analyse the average cyclomatic complexity (AvgCC) of the overall system, by summing up all the McCabe indices of the methods in the clean (or obfuscated) systems, and then dividing this sum by the number of items counted (e.g., methods, constructors, interfaces).
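The averaging step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the per-method McCabe values are supplied by hand (in the study they come from the Understand tool), and the class name, method name and numbers are hypothetical.

```java
// Hypothetical helper: average cyclomatic complexity over all counted items.
final class AvgCyclomatic {

    /** Sum of the per-item McCabe indices divided by the number of items. */
    static double avgCC(int[] mccabe) {
        if (mccabe.length == 0) {
            throw new IllegalArgumentException("no methods to average");
        }
        long sum = 0;
        for (int v : mccabe) {
            sum += v;
        }
        return (double) sum / mccabe.length;
    }

    public static void main(String[] args) {
        // Illustrative values only (assumed, not taken from the studied systems).
        int[] clean = {1, 1, 2, 4};
        // Same methods plus two spurious straight-line methods added by an obfuscator.
        int[] obfuscated = {1, 1, 2, 4, 1, 1};

        System.out.printf("AvgCC clean      = %.2f%n", avgCC(clean));
        System.out.printf("AvgCC obfuscated = %.2f%n", avgCC(obfuscated));
    }
}
```

In this assumed example the added methods contain no branches, so the average decreases even though the system has grown. An algorithm that only inserts trivial spurious methods can therefore leave AvgCC flat or lower it.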

V. OBFUSCATION RESULTS – PROTOTYPE SYSTEMS

This Section describes the results of the obfuscations obtained with the three toolkits used, as performed on the two prototype systems discussed above.

⁷ Available at http://www.spinellis.gr/sw/ckjm/
⁸ http://www.scitools.com/

A. Sandmark Obfuscations

The results on the complexity and the C&K metrics obtained by applying the 39 Sandmark obfuscation algorithms are reported and summarized in Table II. Due to space limitations, only the results on the “client side” of the Chat application are reported. The results on the other studied systems (Chat “server side”, Car-Race “client” and Car-Race “server”) share similar patterns: the analysis of one of these systems covers most of the results of all the others.

The Table reports, as its first line, the baseline (i.e., pre-obfuscation) characteristics of the Chat client subsystem: it has 72 methods in 13 classes, and their average size and complexity are 9.99 LOCs and 1.61 (as McCabe index), respectively. The C&K characteristics in the baseline are also summarized as averages: for example, given the 13 classes of the client subsystem, the average number of methods per class (WMC) is 5.54, the average Depth of Inheritance Tree (DIT) is 1.92 and so on. Since we were interested in the collective effect of the obfuscation algorithms, using averages is acceptable: all the classes and methods should be manipulated by the obfuscation tools, and the average values should reflect these manipulations.

The other rows summarise the effect of the various Sandmark algorithms on the baseline code: the delta change of each measured attribute x was recorded, defined as the relative change with respect to a baseline (in this case, the pre-obfuscation system):

\[ \mathrm{Delta}(x) = \frac{x_{obf} - x_{bl}}{x_{bl}} \qquad (1) \]

that is, by dividing the difference between the obfuscated ($x_{obf}$) and the baseline ($x_{bl}$) values by the baseline observation. For example, the second row of Table II indicates that the “Bloated Code” algorithm has increased the average size of the methods by some 5%. Where no value is reported, no changes were recorded, i.e. the Delta was 0.

The first observation made during the runs of the obfuscation algorithms by Sandmark is that the outputs and their attributes have a non-deterministic nature: different runs produce slightly different observed values, introducing some degree of randomness to the obfuscation process. These differences only minimally modify the Delta values reported below: the reported values have been averaged across runs.

The effects of the obfuscation algorithms on the C&K measures are summarized between the 4th and the 12th columns of Table II. As visible, 9 out of 15 application-level, 5 out of 7 class-level and 11 out of 17 method-level obfuscation algorithms do not produce a measurable effect on the OO metrics proposed by the C&K framework. Since the OO metrics are defined at the class and the method level, it is quite surprising that only 2 of the class-level algorithms produce some effect.
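The Delta of Equation (1) can be computed with a few lines of Java. The sketch below uses the baseline average method size of the Chat client (9.99 LOCs) reported in the text; the obfuscated value is a hypothetical number chosen so that the result is close to 5%, and the class and method names are assumptions.

```java
// Hypothetical helper implementing Equation (1): relative change w.r.t. the baseline.
final class DeltaMetric {

    /** (obfuscated - baseline) / baseline, as a fraction (0.05 means +5%). */
    static double delta(double baseline, double obfuscated) {
        if (baseline == 0.0) {
            throw new IllegalArgumentException("baseline must be non-zero");
        }
        return (obfuscated - baseline) / baseline;
    }

    public static void main(String[] args) {
        double avgSizeBaseline = 9.99;    // from the text: Chat client, pre-obfuscation
        double avgSizeObfuscated = 10.49; // assumed value, for illustration only

        System.out.printf("Delta(AvgSize) = %.1f%%%n",
                100.0 * delta(avgSizeBaseline, avgSizeObfuscated));
    }
}
```

A Delta of 0 corresponds to an “X” entry in Table II, i.e. an attribute that the algorithm left unchanged.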


Figure 1. Toolchain used

On the other hand, only 1 application-level and 6 method-level obfuscation algorithms produce a small effect: most of the classes have the same metrics in both the clean and the obfuscated versions, but in some cases a small set of classes presents changes in some OO metric.⁹ In general, at the method level, the obfuscation algorithms have no or small effects.

Finally, 5 application-level and 2 class-level algorithms produce some visible effects on the C&K metrics: for the majority of the classes, the clean and the obfuscated versions differ in some or many of the measured characteristics. The “Class Splitter” algorithm is particularly relevant, since it adds several spurious classes with high OO metrics, while most of the existing classes show a drop in C&K values in their obfuscated version, due to the splitting effect.

B. Allatori Obfuscations

As done for the Sandmark algorithms, the same applications were obfuscated with the Allatori tool, by using the two configurations outlined above, one implementing a specific obfuscation of the control flow, and the other without such option. The results are presented in Table III: for each of the four components, the baseline (i.e., pre-obfuscation) average metrics are shown (first four rows of the Table, in bold). The other rows summarise the Delta values of the attributes measured (complexity, size, and the design attributes) per component, and per obfuscation algorithm (either “cfo” or “lvn”).

The observation on non-deterministic runs for the obfuscation algorithms also applies to the Allatori tool: different values of the design characteristics, size and complexity are recorded in different runs, which as above reflects some degree of randomness in the algorithms of Allatori. A further observation concerns the limited reconfigurability of this tool: apart from the renaming or the watermarking facilities, which do not affect the structural characteristics, the obfuscation of the control flow can only be either on or off.

⁹ For instance, the “Overload Names” algorithm increases the LCOM in only one obfuscated class.

Table III also reports the Delta values of the C&K metrics: as above, the averages of the baselines are compared to the averages of the obfuscated code, and summarized per configuration. It is evident that the obfuscation process does not affect the number of classes, which remains constant in all the runs of Allatori on the systems: this means that an attacker will at least figure out the correct number of classes if the obfuscation is performed with this tool. Also the metrics relative to the inheritance (DIT and NOC) remain structurally unchanged, hence falling short of the defactoring techniques that would make the classes more complex and less understandable, by decreasing their encapsulation (DIT), or the number of children classes (NOC).

On the other hand, it is also clear that the factors related to the message passing (RFC), and the coupling between objects (CBO), are properly protected in the two configurations of Allatori: the values of the attributes in the “cfo” configuration are typically larger than the ones in the “lvn” configuration, reflecting a more specific effort in the obfuscation of the control flow in the former. For both metrics, it would be desirable to increase the values in the obfuscated methods, since it would increase their complexity, and decrease their understandability. Finally, the decompiled code shows an increasing lack of cohesion in the resulting methods (LCOM), which therefore adds complexity and decreases the understandability, although this could be due to the LCOM of the new, fake methods, rather than an increased LCOM of existing methods.

| Algorithm | AvgSize | AvgCC | Methods | Classes | WMC | DIT | NOC | CBO | RFC | LCOM |
|---|---|---|---|---|---|---|---|---|---|---|
| ChatC (clean) | 9.99 | 1.61 | 72 | 13 | 5.54 | 1.92 | 0.15 | 2 | 23.92 | 8.08 |
| Delta’s (by obfuscation algorithm) | | | | | | | | | | |
| APP-BL | 5% | -6% | X | X | X | X | X | X | X | X |
| APP-af | X | X | X | X | X | X | X | X | X | X |
| APP-as | X | X | X | X | X | X | X | X | X | X |
| APP-bm | X | X | X | X | X | X | X | X | X | X |
| APP-ce | 41% | 46% | -134 | -92% | 8.33% | 4% | -10 | -10 | 37.94% | -38.1% |
| APP-cpr | X | X | X | X | X | X | X | X | X | X |
| APP-di | 44% | 23% | X | X | 1.39% | X | X | 5 | 12.54% | -82.86% |
| APP-fr | X | X | X | X | X | X | X | X | X | X |
| APP-ias | X | X | X | X | X | X | X | X | X | X |
| APP-im | 64% | 64% | -38% | X | -19.44% | X | X | | -3.54% | -82.86% |
| APP-on | X | X | X | X | X | X | X | | X | 1.9% |
| APP-pa | X | X | X | X | X | X | X | X | X | X |
| APP-rr | -3% | X | X | X | X | X | X | X | X | X |
| APP-sc | 3% | 3% | 15% | 10% | -39.58% | -24% | -5% | 9.62% | -43.41% | -86.19% |
| APP-se | 2% | 2% | 1% | 8% | -5.85% | -3.43% | -7.14% | 21.43% | -2.96% | -7.14% |
| CL-cs | -16% | -12% | 28% | 10 | -29.86% | -5% | 275% | 36.54% | -38.26% | 32.86% |
| CL-fa | X | X | X | X | X | X | X | X | X | X |
| CL-mm | X | X | X | X | X | X | X | X | X | X |
| CL-ob | X | X | X | X | X | X | X | X | X | X |
| CL-pf | X | X | X | X | X | X | X | X | X | X |
| CL-smb | -132% | -28% | 26% | X | 80.56% | X | X | X | 18.65% | 929.52% |
| CL-sop | 7% | -1% | X | X | X | X | X | X | X | X |
| MET-bc | 63% | 36% | -9% | X | X | X | X | X | X | X |
| MET-bi | X | X | X | X | X | X | X | X | X | X |
| MET-bs | 41% | -1 | X | X | X | X | X | X | X | X |
| MET-bsp | X | X | X | X | X | X | X | X | X | X |
| MET-dr | X | X | X | X | X | X | X | X | X | X |
| MET-il | -2% | 1% | X | X | 1.39% | X | X | X | 1.29% | 0.95% |
| MET-iop | 5% | 2% | X | X | X | X | X | X | X | X |
| MET-ir | 33% | 28% | -16% | X | X | X | X | X | X | X |
| MET-mli | X | X | X | X | X | X | X | X | X | X |
| MET-obi | 11% | -7% | X | X | X | X | X | X | X | X |
| MET-ppr | 1 | -9% | -3% | X | X | X | X | X | 1.93% | X |
| MET-ppt | 14% | -9% | -3% | X | X | X | X | X | 4.18% | X |
| MET-rdc | X | X | X | X | X | X | X | X | X | X |
| MET-ri | X | X | X | X | X | X | X | X | X | X |
| MET-rp | 2% | X | X | X | X | X | X | X | X | X |
| MET-tbi | 1% | 2% | X | X | X | X | X | X | 0.64% | X |
| MET-vr | -3% | X | X | X | X | X | X | X | X | X |

Table II. SANDMARK - EFFECTS OF OBFUSCATION ON SIZE, COMPLEXITY AND DESIGN METRICS. AN “X” SIGNALS “NO CHANGES” CONTRIBUTED TO THE ATTRIBUTE BY THE SANDMARK ALGORITHM

C. Klassmaster™ Obfuscations

Finally, the Zelix Klassmaster™ tool was used to analyse the characteristics of the obfuscated code when applying two configurations: one with extreme values for the obfuscation of the underlying classes and methods, and one implementing fewer and less extreme algorithms to protect the code. The results of the runs (also non-deterministic in the output values) are presented in Table IV.

Several observations were possible when analysing the results of this tool on the OO metrics: on the one hand, and similarly to the Allatori runs, the jad decompiler recovers the same number of classes as originally present in the pre-obfuscation system, for both run configurations of Zelix Klassmaster™.

Also similarly to the Allatori runs, the inheritance characteristics are generally not modified, in either the NOC or the DIT values, for any of the systems under study. On the other hand, the message-passing characteristics (CBO and RFC) show that, depending on the algorithm, the obfuscated code will have a larger or a smaller increase in the number of couplings between methods, which is a desirable outcome for the defactoring and the obfuscation of the classes and their methods.

Finally, the LCOM attribute, after the decompilation by jad, shows that the classes increase their cohesion after the obfuscation process, but this again could be due to the fact that fewer methods were successfully decompiled, and the LCOM measures (at the class level) could be reflecting that.
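The sensitivity of LCOM to added or missing methods can be sketched as follows. This is a minimal illustration of the Chidamber and Kemerer definition (the number of method pairs sharing no instance field minus the number of pairs sharing at least one, floored at zero); the class, its method names and its fields are hypothetical, and the measurement tool used in the study may compute LCOM differently.

```python
from itertools import combinations


def lcom(methods: dict[str, set[str]]) -> int:
    """C&K LCOM: disjoint method pairs minus sharing pairs, floored at 0."""
    disjoint = sharing = 0
    for fields_a, fields_b in combinations(methods.values(), 2):
        if fields_a & fields_b:
            sharing += 1
        else:
            disjoint += 1
    return max(disjoint - sharing, 0)


# Hypothetical class: each method maps to the instance fields it uses.
original = {"open": {"conn"}, "send": {"conn"}, "log": {"buffer"}}
# An injected fake method that touches no field of the class.
with_fake = {**original, "fake": set()}
# The same class when one method is not recovered by the decompiler.
without_log = {k: v for k, v in original.items() if k != "log"}

print(lcom(original), lcom(with_fake), lcom(without_log))  # 1 4 0
```

In this toy case a single fake method raises LCOM and a single lost method lowers it, without any change to the methods that remain, which is the confounding effect mentioned for both tools.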

VI. REPLICATION OF BASE STUDY

The findings of the previous Section could be biased due to the small sample, or to the size of the considered prototypes. In order to replicate the experiments with more realistic systems, we considered the most active Java projects hosted on one of the largest Open Source portals (SourceForge).¹⁰ Ten of the most downloaded and most contributed-to systems were studied from the same perspective: their classes were obfuscated, and the resulting outcomes were compared with the original classes and their structural characteristics.

¹⁰ http://sourceforge.net


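| System | Config. | Avg size | AvgCC | Methods | Classes | WMC | DIT | NOC | CBO | RFC | LCOM |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Clean | | | | | | | | | | | |
| CarC | | 7.98 | 1.59 | 108 | 13 | 8.38 | 1.54 | 0.08 | 2.15 | 22.31 | 14.38 |
| ChatC | | 9.99 | 1.61 | 72 | 13 | 5.54 | 1.92 | 0.15 | 2 | 23.92 | 8.08 |
| carserver | | 14.73 | 2.31 | 26 | 15 | 8.4 | 1.33 | 0 | 2.27 | 21.13 | 14 |
| chatserver | | 11.96 | 2.36 | 56 | 8 | 7.25 | 1.5 | 0 | 1.63 | 28.75 | 10.13 |
| Delta’s (by obfuscation algorithm) | | | | | | | | | | | |
| CarC | cfo | 50.99% | -12.42% | 25.00% | X | 1.83% | X | X | 14.29% | 5.17% | 2.67% |
| | lvn | 5.36% | -2.38% | 42.86% | X | 1.83% | X | X | 10.71% | 4.14% | 2.67% |
| ChatC | cfo | 50.91% | -8.75% | 33.33% | X | 2.78% | X | X | 34.62% | 6.43% | 6.67% |
| | lvn | 2.53% | -8.83% | 29.41% | X | 2.78% | X | X | 30.77% | 5.47% | 6.67% |
| carserver | cfo | 18.72% | -44.55% | 54.39% | X | 1.59% | X | X | 17.65% | 5.36% | 2.86% |
| | lvn | -86.01% | -53.85% | 58.06% | X | 1.59% | X | X | 23.53% | 5.05% | -9.52% |
| chatserver | cfo | 45.71% | -21.71% | 29.11% | X | 13.98% | 4.55% | X | 45.83% | 19.50% | X |
| | lvn | 9.35% | -5.67% | 38.46% | X | 13.98% | 4.55% | X | 45.83% | 17.86% | 25.39% |

Table III. ALLATORI - EFFECTS OF OBFUSCATION ON SIZE, COMPLEXITY AND DESIGN METRICS. AN “X” SIGNALS “NO CHANGES”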

| System | Config. | Avg size | AvgCC | Methods | Classes | WMC | DIT | NOC | CBO | RFC | LCOM |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Delta’s (by obfuscation algorithm) | | | | | | | | | | | |
| CarC | GRR | 38.36% | 6.15% | -9.09% | X | 2.75% | X | X | 10.71% | 4.14% | -3.21% |
| | light | 18.41% | 19.57% | -6.93% | X | 1.83% | X | X | 10.71% | 3.79% | -3.21% |
| ChatC | GRR | 34.32% | 13.35% | -12.5 | X | 11.11% | X | X | 3.85% | 10.29% | -8.57% |
| | light | 5.03% | 12.84% | -9.09% | X | 11.11% | X | X | X | 10.29% | -8.57% |
| carserver | GRR | 16.54% | 3.99% | 77.19% | X | 4.76% | X | X | 23.53% | 6.94% | -10.48% |
| | light | -31.86% | 3.85% | 25.71% | X | 3.17% | X | X | 20.59% | 6.62% | -5.24% |
| chatserver | GRR | 32.46% | 15.92% | X | X | 7.94% | X | X | 27.78% | 9.09% | 4.71% |
| | light | 17.04% | 11.61% | 1.75% | X | 22.67% | X | X | 27.78% | 13.21% | 42.55% |

Table IV. KLASSMASTER™ - EFFECTS OF OBFUSCATION ON SIZE, COMPLEXITY AND DESIGN METRICS. AN “X” SIGNALS “NO CHANGES”

A summary of the characteristics and the domains of the selected systems is available in Table V: the applications vary both in their topics and in their sizes, but at the time of writing (July 2012) they represent 10 of the 25 most downloaded and active projects on SourceForge.¹¹

| Name | Domain | Nr of classes |
|---|---|---|
| Catacombae | File-systems | 345 |
| Freemind | Visualisations, Mind maps | 406 |
| Ipscan | Network scanner | 423 |
| JBoss Community | Front-end to JBoss installer | 711 |
| SQuirrel | SQL front-end | 396 |
| SweetHome3D | 3D Modeling | 178 |
| TripleA | Games | 1,700 |
| TuxGuitar | Audio Editor | 478 |
| Vuze | File Sharing | 3,253 |
| Weka | Data Mining | 1,168 |

Table V. REPLICATION OF BASE EXPERIMENT – SUMMARY OF SYSTEMS

¹¹ As provided in the “Recently updated” Section of the Java applications, http://sourceforge.net/directory/language:java/os:linux/freshness:recently-updated/.

For the replication of the prototype studies, the same approach was used as explained above, and the three toolkits were applied with the same configurations and switches, in order to benefit from the automation of the obfuscation and data-gathering process. For the sake of simplicity, when several archives of classes (e.g., “jars”) were found in a project, only the main archive was considered (typically named after the project). The application domains and the number of classes in each project’s main archive are reported in Table V.

Table VI summarises the effects of the obfuscation algorithms on the structural characteristics: for space reasons, the Table reports the results of the Allatori toolkit only. The first observation concerns the resulting size of the obfuscated systems: although the two Allatori configurations (i.e., cfo and lvn above) produce very similar results, differently from Klassmaster™ and from the prototype systems, Allatori generates a large number of spurious classes in the obfuscated systems. This code often doubles the total number of classes detected when reverse engineering the obfuscated code; it does not add to the overall structural complexity, but only artificially increases the size of the obfuscated outcome.

Furthermore, the DIT and NOC attributes, detected as globally “not changed” in the prototype systems, show the same behaviour in the Klassmaster™ output for the larger systems, whereas the Allatori toolkit tends even to decrease the average depth-of-inheritance-tree and number-of-children per class. Therefore, instead of producing a defactoring, the resulting obfuscated code tends to be less structured in terms of the hierarchy of the member classes.

Finally, the WMC measure indicates that more “bloated” classes do not produce an equivalent number of methods, hence decreasing the number of methods per class. The coupling-between-objects (CBO) and the lack-of-cohesion (LCOM) also tend to decrease overall, making the obfuscated system larger overall, but not necessarily more structurally complex.
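The way spurious classes can pull per-class averages down can be illustrated with simple arithmetic. The sketch below is a hypothetical model and not part of the study's tooling: it assumes that obfuscation leaves the original classes untouched and adds spurious classes that all share one metric value (`spurious_value`); the function name and its parameters are illustrative.

```python
def diluted_average_delta(growth: float, spurious_value: float,
                          original_average: float) -> float:
    """Percent change of a per-class average after adding spurious classes.

    growth: added classes as a fraction of the original count (1.16 = +116%).
    spurious_value: metric value assumed for every added class.
    original_average: per-class average before obfuscation.
    """
    new_average = (original_average + growth * spurious_value) / (1.0 + growth)
    return (new_average - original_average) / original_average * 100.0


# +116% classes (the Freemind growth in Table VI), each assumed to be empty.
print(round(diluted_average_delta(1.16, 0.0, 10.0), 2))  # -53.7
```

Under this simplified model, the +116% class growth reported for Freemind would by itself lower a per-class average such as WMC by about 53.7% if the added classes were empty. This has the same sign as the -40.26% reported in Table VI, so a falling average does not necessarily mean that the original classes became simpler.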

VII. DISCUSSION AND IMPLICATIONS

The obfuscation results obtained with the different tools show that at least two strands of research should be pursued. First, the obfuscation process should reach the point where decompilation (with commercial or freely available tools) yields a number of source items (classes, methods, namespaces, etc.) significantly different from the original, pre-obfuscation system. The tools analysed in this paper all, to various degrees, increase the number of methods or make them non-decompilable (resulting in fewer methods); on the other hand, the tested obfuscation tools do not modify the number of classes, which are more clearly identified in the decompilation process. This already gives the attacker an advantage when trying to isolate the main building blocks of the application at hand.

Secondly, the obfuscation process should consider the relevant OO metrics and systematically produce code that works against refactoring efforts: differently from other OO metrics, the DIT and NOC metrics do not seem to be targeted by any action of the obfuscation process, and the cyclomatic complexity is only partially covered by the tested tools. DIT and NOC relate to the encapsulation of the code: the levels of inheritance that the obfuscated code uses should be maximized to decrease the understandability for the attacker. Similarly, decompiling code that has a lower degree of complexity (such as the portion of methods that increase the size but decrease the overall complexity) makes it likely easier to detect (and discard) the bloated code by isolating large, non-complex code.
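The last point can be made concrete with a small sketch of how an attacker might isolate large, non-complex code. Everything here is hypothetical: the `MethodMetrics` record, the `likely_bloat` helper and the two thresholds are illustrative assumptions, not values or tools used in the study.

```python
from dataclasses import dataclass


@dataclass
class MethodMetrics:
    name: str
    size: int  # e.g. statements in the decompiled method
    cc: int    # cyclomatic complexity of the method


def likely_bloat(methods, min_size=40, max_cc=2):
    """Return the names of methods that are large but almost branch-free."""
    return [m.name for m in methods if m.size >= min_size and m.cc <= max_cc]


decompiled = [
    MethodMetrics("a", size=120, cc=1),  # long straight-line code: suspicious
    MethodMetrics("b", size=35, cc=9),   # short and branchy: probably real logic
    MethodMetrics("c", size=60, cc=14),  # large and complex: kept
]
print(likely_bloat(decompiled))  # ['a']
```

An obfuscation that raised size and cyclomatic complexity together would not be separable by such a simple filter, which is the argument for targeting complexity as well as size.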

A. Threats to Validity

In the following, threats to internal validity (whether confounding factors can influence the findings), external validity (whether results can be generalized), and construct validity (the relationship between theory and observation) are illustrated.

Regarding internal validity, the selected tools should be complemented with others: further obfuscators¹² should be used to cross-check which algorithms provide the obfuscation that optimizes which of the structural metrics or complexity characteristics. Regarding external validity, a few obfuscated systems may not be enough to draw conclusions on the effectiveness of the studied algorithms: as a partial remedy, the process of obfuscating the classes and extracting the results has now been fully automated, so it is not an issue to replicate it on more and larger systems.

Regarding construct validity, it was assumed that obfuscations that affect the naming of variables and constants are less structurally oriented and increase the complexity of the underlying source code “less”. While this is not always true, the resolution of obfuscated names is typically easier once the logic of the naming obfuscation has been understood. We also assumed that obfuscation should always affect the structure of a program, but in some cases the C&K metrics could be too coarse to detect such structural changes.

¹² Available obfuscation tools are ProGuard, yGuard, JODE, JavaGuard, RetroGuard, jarg, etc.

VIII. CONCLUSION AND FURTHER WORK

The obfuscation of source code and binaries should have measurable and visible effects: this paper attempted to quantify the effects of various obfuscation techniques on the structural and complexity metrics of Java code. The suite of algorithms provided by the Sandmark tool, two configurations of the Allatori tool, and two further configurations of the Zelix Klassmaster™ suite were used to produce obfuscated code, which was then measured using the Chidamber and Kemerer suite of metrics.

It was found that the majority of the Sandmark algorithms have no effect on the structural and complexity attributes; they often insert more code, which could create issues of understandability, without adding more complexity. When more complexity is added to the obfuscated system, the same algorithms also produce changes in the structural metrics: this is observed consistently in the application-level obfuscation algorithms. On the other hand, it was also found that the tested commercial tools (Allatori and Zelix Klassmaster™) produce obfuscated code that does not affect certain OO metrics (DIT and NOC) or the number of classes, and that they do not act consistently towards the defactoring of the underlying code in terms of the obfuscation of the control flow.

This work should be expanded in the future by using a more diverse array of obfuscation tools, and by applying a sequence of obfuscation algorithms to obtain a more complex final output.

IX. ACKNOWLEDGEMENTS

The authors would like to thank the Zelix Klassmaster™ developers for the full evaluation copy of their tool, and for the feedback provided.


| Delta’s (cfo and lvn) | Classes | WMC | DIT | NOC | CBO | RFC | LCOM |
|---|---|---|---|---|---|---|---|
| Catacombae | 77% | -22.23% | -12.93% | -7.96% | -3.67% | -18.91% | -37.98% |
| Freemind | 116% | -40.26% | 9.91% | -44.34% | -26.17% | -37.7% | -53.03% |
| Ipscan | 34% | -21.73% | 8.91% | -14.77% | 8.52% | -16.38% | -25.56% |
| JBoss Community | 44% | -19.11% | 2.29% | -3.52% | 1.22% | -15.95% | -28.58% |
| SQuirrel | 74% | -35.43% | -5.15% | -7.05% | -4.39% | -26.29% | -42.5% |
| SweetHome3D | 701% | -75.24% | -22.55% | -32.38% | -65.65% | -74.38% | -87.11% |
| TripleA | 111% | -39.15% | -10.51% | -7.44% | -14.7% | -34.46% | -52.12% |
| TuxGuitar | 99% | -37.00% | -8.69% | -42.54% | -25.64% | -36.70% | -49.33% |
| Vuze | 137% | -42.08% | -9.07% | 10.12% | -19.75% | -36.86% | -55.94% |
| Weka | 114% | -42.6% | 6% | -35.22% | -24.01% | -38.25% | -52.65% |

Table VI. REPLICATION OF EXPERIMENTS - EFFECTS OF OBFUSCATION ON SIZE, COMPLEXITY AND DESIGN METRICS (ALLATORI)

REFERENCES

[1] P. Falcarin, C. Collberg, M. Atallah, and M. Jakubowski, “Guest editors’ introduction: Software protection,” IEEE Softw., vol. 28, pp. 24–27, March 2011. [Online]. Available: http://dx.doi.org/10.1109/MS.2011.34

[2] B. Wyseur, “White-box cryptography,” Ph.D. dissertation, Katholieke Universiteit Leuven, 2009. [Online]. Available: http://www.cosic.esat.kuleuven.be/publications/talk-98.pdf

[3] C. S. Collberg and C. Thomborson, “Watermarking, tamper-proofing, and obfuscation: tools for software protection,” IEEE Trans. Softw. Eng., vol. 28, pp. 735–746, August 2002. [Online]. Available: http://dl.acm.org/citation.cfm?id=636196.636198

[4] F. Simon, F. Steinbruckner, and C. Lewerentz, “Metrics based refactoring,” in Proc. of the Fifth European Conference on Software Maintenance and Reengineering, ser. CSMR ’01. IEEE, 2001, pp. 30–.

[5] C. Collberg, G. Myles, and A. Huntwork, “Sandmark–a tool for software protection research,” IEEE Security and Privacy, vol. 1, pp. 40–49, July 2003. [Online]. Available: http://dl.acm.org/citation.cfm?id=939830.939941

[6] S. R. Chidamber and C. F. Kemerer, “A metrics suite for object oriented design,” IEEE Trans. Softw. Eng., vol. 20, pp. 476–493, June 1994.

[7] B. Barak, O. Goldreich, R. Impagliazzo, S. Rudich, A. Sahai, S. Vadhan, and K. Yang, “On the (im)possibility of obfuscating programs,” in Advances in Cryptology CRYPTO 2001, ser. LNCS. Springer, 2001, vol. 2139, pp. 1–18.

[8] M. Ceccato, M. Di Penta, J. Nagra, P. Falcarin, F. Ricca, M. Torchiano, and P. Tonella, “Towards experimental evaluation of code obfuscation techniques,” in Proceedings of the 4th ACM workshop on Quality of protection, ser. QoP ’08. New York, NY, USA: ACM, 2008, pp. 39–46. [Online]. Available: http://doi.acm.org/10.1145/1456362.1456371

[9] R. Scandariato, Y. Ofek, P. Falcarin, and M. Baldi, “Application-oriented trust in distributed computing,” in ARES 08. IEEE, 2008, pp. 434–439.

[10] P. Falcarin, R. Scandariato, and M. Baldi, “Remote trust with aspect-oriented programming,” in AINA 2006. IEEE, 2006, pp. 451–456.

[11] C. Collberg, C. Thomborson, and D. Low, “A taxonomy of obfuscating transformations,” Tech. Rep. 148, Jul. 1997. [Online]. Available: http://www.cs.auckland.ac.nz/collberg/Research/Publications/CollbergThomborsonLow97a/index.html

[12] B. Anckaert, M. Madou, B. De Sutter, B. De Bus, K. De Bosschere, and B. Preneel, “Program obfuscation: a quantitative approach,” in Proceedings of the 2007 ACM workshop on Quality of protection, ser. QoP ’07. New York, NY, USA: ACM, 2007, pp. 15–20. [Online]. Available: http://dx.doi.org/10.1145/1314257.1314263

[13] H. Goto, M. Mambo, K. Matsumura, and H. Shizuya, “An approach to the objective and quantitative evaluation of tamper-resistant software,” in Proc. of the Third Int. Workshop on Information Security, ser. ISW ’00. Springer, 2000, pp. 82–96.

[14] S. K. Udupa, S. K. Debray, and M. Madou, “Deobfuscation: Reverse engineering obfuscated code,” in Proc. of the 12th Working Conference on Reverse Engineering. IEEE, 2005, pp. 45–54.

[15] I. Sutherland, G. E. Kalb, A. Blyth, and G. Mulley, “An empirical examination of the reverse engineering process for binary files,” Computers & Security, vol. 25, no. 3, pp. 221–228, 2006.

[16] M. Ceccato, M. D. Penta, J. Nagra, P. Falcarin, F. Ricca, M. Torchiano, and P. Tonella, “The effectiveness of source code obfuscation: An experimental assessment,” in ICPC. IEEE Computer Society, 2009, pp. 178–187.

[17] A. L. Hosking, N. Nystrom, D. Whitlock, Q. Cutts, and A. Diwan, “Partial redundancy elimination for access path expressions,” Software: Practice and Experience, vol. 31, no. 6, pp. 577–600, 2001. [Online]. Available: http://dx.doi.org/10.1002/spe.371

[18] M. Jureczko and D. Spinellis, Using Object-Oriented Design Metrics to Predict Software Defects, ser. Monographs of System Dependability. Wroclaw, Poland: Oficyna Wydawnicza Politechniki Wroclawskiej, 2010, vol. Models and Methodology of System Dependability, pp. 69–81.

[19] T. J. McCabe, “A complexity measure.” IEEE Trans. Software Eng., pp. 308–320, 1976.

[20] R. Vasa and J.-g. Schneider, “Evolution of cyclomatic complexity in object oriented software,” Proceedings of 7th ECOOP Workshop on Quantitative Approaches in Object-Oriented Software Engineering QAOOSE 03, pp. 1–5, 2003. [Online]. Available: http://www.it.swin.edu.au/personal/jschneider/Pub/qaoose03.pdf

[21] Z. Lv, S. Ri, D. E. Uhvhdufk, D. Dw, Y. Wkh, X. Ri, W. Srsxodu, Q. D. S. Zrun, and Z. H. Vkrzhg, “On the relationship between cyclomatic complexity and oo ness,” 9th ECOOP Workshop on Quantitative Approaches in Object-Oriented Software Engineering, 2005.

[22] P. Kouznetsov, “Jad - the fast JAva Decompiler.” [Online]. Available: http://www.kpdus.com/jad.html
