Chapter 4 Analysis and Detection technology of Malicious code Malicious code defense in mobile...

Chapter 4

Analysis and Detection technology of Malicious code

Malicious code defense in mobile networks Funded by Intel Corp.

OUTLINE

• Static analysis techniques

• Dynamic Analysis and Virtualization Technology

• Information Flow Analysis

• Rootkit Analysis

4.1 Static Analysis techniques


• What is Static Analysis techniques• Static analysis, static projection, and static scoring are terms

for simplified analysis wherein the effect of an immediate change to a system is calculated without respect to the longer term response of the system to that change. Such analysis typically produces poor correlation to empirical results.

• Its opposite, dynamic analysis or dynamic scoring, is an attempt to take into account how the system is likely to respond to the change. One common use of these terms is budget policy in the United States, although it also occurs in many other statistical disputes.


• Applying Machine Learning for Phishing Detection– Machine learning involves building computer

applications that can learn and improve from experience.

– However, unlike predicting spam, only a few studies have used machine learning techniques to predict phishing.


– In the literature, there exist several machine learning techniques for binary classification—that is, classifiers that assign instances into two groups of data.

– For example, spam or phishing prediction is a binary classification problem since e-mails are either classified as legitimate or phishing based according to certain characteristics.


• Most of the machine learning algorithms discussed here are categorized as supervised machine learning, where an algorithm (classifier) is used to map inputs to desired outputs using a specific function.


Bayesian Additive Regression Trees•Bayesian Additive Regression Trees (BART) is a new learning technique, proposed by Chipman et al.,3 to discover the unknown relationship between a continuous output and a dimensional vector of inputs.


• BART discovers the unknown relationship f between a continuous output Y and a p dimensional vector of inputs x = (x1,…,xp).

• Assume Y = f(x) + ε, where ε ∼ N(0,s2) is the random error. Motivated by ensemble methods in general, and boosting algorithms in particular, the basic idea of BART is to model or at least approximate f(x) using a sum of regression trees,


• where each gi denotes a binary tree with arbitrary structure, and contributes a small amount to the overall model as a weak learner, when m is chosen large. .

Note that the BART contains multiple binary trees since it is an additive model. Each node in the tree represents a feature in the dataset, while the terminal nodes represent the probability that a specific e-mail is phishing, given that it contains certain features


An Example of a Binary Tree


• Classification and Regression Trees– CART, or Classification and Regression Trees, is a model that

describes the conditional distribution of y given x. The model consists of two components: a tree T with b terminal nodes; and a parameter vector Θ = (θ1, θ2, …, θb), where θi is associated

with the ith terminal node. – The model can be considered a classification tree if the

response y is discrete, or a regression tree if y is continuous. A binary tree is used to partition the predictor space recursively into distinct homogenous regions, where the terminal nodes of the tree correspond to the distinct regions.


An Example

of CART


• Logistic Regression– Logistic regression is the most widely used

statistical model in many fields for binary data(0/1 response) prediction, due to its simplicity and great interpretability.

– Logistic regression performs well when the relationship in the data is approximately linear.


• Neural Networks– A neural network is structured as a set of

interconnected identical units (neurons). The interconnections are used to send signals from one neuron to the other. In addition, the interconnections have weights to enhance the delivery among neurons.


• Random Forests– Random forests are classifiers that combine many

tree predictors, where each tree depends on the values of a random vector sampled independently.

– Random forests can handle large numbers of variables in a dataset. Also, during the forest building process they generate an internal unbiased estimate of the generalization error. In addition, they can estimate missing data well.


• Support Vector Machines– Support Vector Machines (SVM) is one of the most popular

classifiers these days. The idea here is to find the optimal separating hyperplane (line; N+1) between two classes by maximizing the margin between the classes’ closest points.

– Assume that we have a linear discriminating function and two linearly separable classes with target values +1 and –1.


Support Vector Machines


• Examining the General Analysis Process– Preparing an Isolated Environment– Collecting the Necessary Tools– Performing a Static Analysis– Dynamic Analysis


Detailing the Analysis of FlexiSPY– FlexiSPY is a unique code that serves as

an example of why debugging skills are necessary.

– A deep analysis of this code provides a researcher not only with knowledge of how the program works, but also exposes flaws in this gray ware that can be exploited to make it much more malicious.


• What Is FlexiSPY– FlexiSPY represents a unique example of

malware for mobile devices. This program is essentially spyware, in the most classic sense.

– Its main function is to sit behind the scenes and monitor e-mails, text messages, phone logs, and URLs visited, and then post this data to a central site that can be viewed by the phone’s alleged owner.


– In addition to this, the software allows a remote person to call the phone and listen into local conversations, as well as listen into live phone calls.

– To most members of the public, this kind of software is threatening and is unwanted—if not out and out malware.


• Static Analysis of FlexiSPY– Before looking at this example during execution,

we need to first examine it as a set of files. The following breaks down how we handled this process.

• Installer Analysis– FlexiSPY comes in the form of a CAB file, which serves

as an executable installation package. Contained in the file are all the pieces and parts needed to allow the program to hook into the various communication aspects of the phone.


• In addition to this, the CAB file contains instructions for the installation process in the _setup.xml file:

• 1. Create the \Windows\VPhone directory.• 2. Extract RBackup.exe to \Windows\VPhone.• 3. Extract config to \Windows\VPhone.• 4. Extract setting to \Windows\VPhone.• 5. Extract VCStatus to \Windows\VPhone.


• 6. Extract 1.sys, 2.sys, and 3.sys files to \Windows\VPhone.

• 7. Extract Response.txt to \Windows\VPhone.• 8. Extract VPhone.dll to \Windows directory.• 9. Extract FPMapi.dll to \Windows directory.• 10. Extract VRILLibCM.dll to \Windows directory.• 11. Create HKLM\Software\Microsoft\Inbox\Svc\SMS\

Rules\{F1488272-B6ED-455d-8D38-F3F00F6DA55F} in Registry and assign it a value of 1.


• 12. Create HKCR\CLSID\{F1488272-B6ED-455d-8D38-F3F00F6DA55F}\InProcServer32 and assign it a value of FPMapi.dll.

• 13. Create HKLM\Services\VPhone and create the following values:a. Dll = VPhone.dll

• b. Prefix = FPS• c. Order = 9• d. Keep = 1• e. Index = 0• f. Context = 0• g. DisplayName = FP Service• h. Description = FP Service


• 14. Create HKLM\Software\VPhone\UC key and assign it a value of 1.

• From this, we know where the core files are located and how the application is staged to intercept communications.


• File Analysis– In this case, the next step was to sit down with IDA

and a hex editor and examine the files to determine what they did and give an idea of where to take the research.

• We first loaded up each of the core DLL files into IDA and examined them for anything of interest.

• This included a close look at the Strings and Names data, which tend to provide numerous valuable tips. The following are some things we learned.


• VPhone.DLL This file is the core component to FlexiSPY and is responsible for managing the other pieces of the program.

• VRILLibCM.dll This file is responsible for obtaining cell tower information.

• fpmapi.dll This file collects the data related to e-mail, text messages, and more.

• rbackup.exe This file handles the posting of data to the Internet, verifies the program is properly activated, and that it is associated with the right phone number.

• 1.sys, 2.sys, 3.sys Files to which data is stored.

• Setting An encrypted file that holds the setting information.


• Setting File Analysis– Of all the files, the setting file was the most

interesting because it was encrypted.– we took a look at the first segment in the file:

f&r g&v f&u f&y h&r g&v.– When we looked at it in its HEX equivalent, we

noted a pattern (## 26 ## 20 ## 26 ## 20…):

66 26 72 20 67 26 76 20 66 26 75 20 66 26 79 20 68 26 72 20 67 26 76


• Subtract 0x36 from the left side of the “&” character.

• Subtract 0x41 from the right side of the “&”.

• We get the result:• 66 26 72 20 67 26 76 20 66 26 75 20 66 26 79 20 68 26

72 20 67 26 76 -36 -41 -36 -41 -36 -41 -36 -41 -36 -41 -36 -41 30 31 31 35 30 34 30 38 32 31 31 35

• =011504082115


• With the ability to view this file, victims can access the hidden control panel of the software and learn who is spying on them.

• This includes the mobile number that is permitted to remotely monitor the device, the phone numbers in the watch list, as well as what the software is monitoring.


• 0345612356655 ← Access code to control panel

• +017173236542 ← Remote number

• 323165498843894 ← SIM number

• mobile.flexispy.com/service ← Address where data is posted

• mobile.aabackup.info/service

• mobile.000-111-222-333.info/service






• mobile.555-666-777-888.info/service• mobile.666-777-888-999.info/service• mobile.777-888-999-111.info/service• mobile.888-999-111-222.info/service• mobile.999-111-222-333.info/service• vervata.com/t4l-mcli/cmd/productactivate• aabackup.com/t4l-mcli/cmd/productactivate• 000-111-222-333.com/t4l-mcli/cmd/productactivate


• 111-222-333-444.com/t4l-mcli/cmd/productactivate• 222-333-444-555.com/t4l-mcli/cmd/productactivate• 333-444-555-666.com/t4l-mcli/cmd/productactivate• 444-555-666-777.com/t4l-mcli/cmd/productactivate• 555-666-777-888.com/t4l-mcli/cmd/productactivate• 666-777-888-999.com/t4l-mcli/cmd/productactivate


• The following PHP code will allow you to decrypt your own file:

// THIS FUNCTION BORROWED BY adlerweb AT

//www.thescripts.com/forum/thread519762.html

function ascii2hex($ascii) {

$hex = ‘’;

for ($i = 0; $i < strlen($ascii); $i++) {

$byte = strtoupper(dechex(ord($ascii{$i})));

$byte = str_repeat(‘0’, 2 - strlen($byte)).$byte;

$hex.=$byte;

}

return $hex;

}


// THIS FUNCTION BORROWED BY adlerweb AT

//www.thescripts.com/forum/thread519762.html

function hex2ascii($hex){

$ascii=‘’;

$hex=str_replace(“ “, ““, $hex);

for($i=0; $i<strlen($hex); $i=$i+2) {

$ascii.=chr(hexdec(substr($hex, $i, 2)));

}

return($ascii);

}


$handle = @fopen(‘<input file>’, “r”);

if ($handle) {

while (!feof($handle)) {

$lines[] = fgets($handle, 4096);

}

fclose($handle);

foreach ($lines as &$value) {

$temp=ascii2hex($value);

$lineArray=str_split($temp,2);


foreach ($lineArray as $char){

if ((($char == “26”) and ($lineArray[$i+2]==”20”))){

$orgString=$orgString.hex2ascii($lineArray[$i1]).hex2ascii($char).hex2ascii

($line Array[$i+1]);

print hex2ascii(dechex(hexdec($lineArray[$i-1]) hexdec(36))).hex2ascii(dechex

(hexdec($lineArray[$i+1])-hexdec(41)));

$breakFlag=”on”;

}


elseif (($char == “26”) and ($lineArray[$i-2]==”20”) and ($lineArray[$i+2] != “26”)){

$orgString=$orgString.hex2ascii($char).hex2ascii($lineArray[$i-1]);

print hex2ascii(dechex(hexdec($lineArray[$i-1])-hexdec(36)));

$breakFlag=”on”;

}

If ($char == “00” and $breakFlag==”on”){

print “<br>”;//.$orgString.”<br>”;

$breakFlag=”off”;

$orgString=””;

}

}

}}

4.2 Dynamic analysis andVirtualization technology

Introduction

• This section will introduce analysis techniques for mobile malware. It will transfer well known techniques from the common computer world to the platforms of mobile devices.

• One item growing in popularity is the dynamic analysis of programs. A program will be started in an environment, where all of its actions are logged at the level of system calls.

• This section explains how to design a software tool (a sandbox) for dynamic software analysis and how to use the tool MobileSandbox for dynamic software analysis.

Introduction

• The main idea of dynamic analysis is executing a given sample in a controlled environment, monitoring its behavior, and obtaining information about its nature and purpose. – This is especially important in the field of malware

research because a malware analyst must be able to assess a program’s threat and create proper countermeasures.

– While static analysis might provide more precise results, the sheer mass of newly emerging malware each day makes it impossible to conduct a static analysis for even a small portion of today’s malware.

Outline

• 4.2.1 Learning about Dynamic Software Analysis– Designing a Sandbox Solution– Import Address Table Patching– Kernel-Level Interception– Porting to Other Mobile Operating Systems– Notes on Interception Completeness

• 4.2.2 Using MobileSandbox– Using the Local Interface– Using the Web Interface– Analyzing within the Device Emulator– Analyzing on a Real Device– Reading an Analysis Report

Designing a Sandbox Solution

• When designing a sandbox, the questions arise: – What extent of the behavioral data of a

sample should be detected and logged? – The second design decision is the

environment in which the sandbox works. – Another design decision is defining a place

to store the log data. – A remaining question is: How much time do

we want to analyze a sample?


• Components of MobileSandbox• The sandbox consists of the following files:• ■ MSandboxDLL.dll This is where the user-level

hooking and the main part of the hook-handling are implemented. The DLL is injected into each analyzed process.

• ■ KernelHookService.dll This DLL contains all the kernel-level system call interception code. It is injected into the kernel process nk.exe. See the section “Kernel-Level Interception” later in this chapter.


• ■Start.exe This program initializes the process, which should then be analyzed, and thus performs the injection of MSandboxDLL.

• ■ Host.exe In contrast to the already mentioned files, Host.exe is a Win32 PC program. It holds a TCP connection to an attached Windows Mobile device via ActiveSync. It is responsible for the initialization of an analysis and receives log data directly from the device’s MSandboxDLL.


• Prolog and Epilog• There exist two different central functions,

named MainProlog and MainEpilog. The former gets called before the execution is passed on to the original system call, while the latter is called directly after the original system call has finished.

• In addition to these general functions, each hooked system call needs individual stubs that prepare the entrance of MainProlog and MainEpilog and perform cleanup operations when the hook is finished.


• Therefore, four stubs are set up at runtime for every system call: PreProlog, PostProlog, PreEpilog, PostEpilog.

• Each stub is made up of a small number of ARM assembler instructions. This is necessary because we need direct access to the CPU registers to not corrupt the parameters, which would inevitably lead to program inconsistency sooner or later.

• MainProlog is responsible for logging the hook and also handles special system calls that need to be intercepted explicitly in order to sustain the completeness and integrity of the sandbox.


• Extracting Additional API Parameter Information• Now that we have shown how the sandbox intercepts API

calls in a generic way, the question arises as to what additional call information it detects and extracts. Of course, we would also like to log the parameters of a hooked system call.

• Since we have a generic handler, we need to have a database that holds information about all the relevant system calls and their number of parameters—ideally, also the name and type of each parameter for increased expressiveness.


• In order to generate this database in an automatic and therefore convenient way, we made use of the tools doxygen and dumpbin.– Along with the Windows Mobile Platform SDK (available from

Microsoft for free), we can then parse the standard Windows include files with doxygen, dump the linking information from the corresponding LIB files with the help of dumpbin, and afterwards combine both results in an automatic way with a self-made Perl script.

– The result is a database that holds the number of parameters with their individual type and name for all standard Windows Mobile APIs.


• DLL Injection• The injection procedure, in detail, is as follows:• 1. The sample is started in “suspended mode,”

which means that the executable file is loaded into the device memory, but the main thread is not started.

• 2. MobileSandbox saves a part of the sample’s program code and overwrites it with its own instructions.

• 3. The CPU context is changed with SetThreadContext so that the PC register points to the custom code of MobileSandbox.


• 4. Now the sample’s main thread is started. The custom code then uses LoadLibrary to load MSandboxDLL into the sample’s address space. Subsequently, MSandboxDLL initializes the hooking.

• 5. Finally, the sample is suspended, its original state is restored, and the main thread is started again.


• Talking with the Host Computer• An important requirement to ensure the integrity of the

analysis is logging to a remote place rather than saving the log on the device only. MobileSandbox implements this communication of the device to a host system with a TCP connection over ActiveSync.

• The ActiveSync connection is a feature of Windows Mobile and is established automatically when a device is connected to the host via USB.

• An ActiveSync connection between the emulator and the host can also be set up with the help of the freely available Device Emulator Manager. Both endpoints get an IP address and can subsequently establish a TCP communication.


• In order to access the device from the connected host, ActiveSync provides the Remote API (RAPI) functions. Therefore, it is possible to perform file system operations or start processes on the device.

After the successful injection of MSandboxDLL, a TCP connection to the host system is established and every log entry is sent immediately upon occurrence.


• Dereferencing Pointer Parameters• MobileSandbox tries to dereference pointer

parameters automatically when possible. Whenever this fails, for example when a pointer points to more complex data structures, we provide a manual solution for a given subset of system calls.

• MobileSandbox is hence able to dereference the pointers and additionally log the data structure when it is required. This is especially true for the mobile messaging methods.

Import Address Table Patching

• Environment• CWSandbox rewrites the first portion of the method in

the DLLs. This is impossible in Windows Mobile because many DLLs are saved in read-only memory. We use another standard method instead, patching the import address table (IAT).

• When an executable starts, the Windows loader looks up the addresses of each used system call and inserts them into the IAT, because these addresses are not known at compile time. A system call in the program reads the system call’s address out of the IAT, and then jumps to this address.


• Patching the Loaded Executable• After the Windows loader filled the IAT, MobileSandbox

does some steps that address of every entry in the IAT is changed.– For every changed address, four functions are set up

(PreProlog, PostProlog, PreEpilog, and PostEpilog). They handle saving and restoring the current processor state and calling the two main functions of MobileSandbox (MainProlog and MainEpilog).

– The IAT entry for each system call now points to its corresponding PreProlog function, which is the unique entry point for every system call.


• Unfortunately, a malware sample is able to circumvent the MobileSandbox method.

• A program does not need to use the IAT, but may calculate the system call address itself in advance.

• Whenever it wants to use a system call, it sets the address and sets the system into kernel mode.

• MobileSandbox is not able to log this event with the IAT patching technique because it has no access to the kernel structures.

Kernel-Level Interception

• Windows CE System Calls• System calls are typically implemented by

executing dedicated software interrupts like int2e in Windows NT. Subsequently, a handler function is executed in the kernel, the requested system call is processed, and finally the kernel gives execution back to the initiator of the system call in user space.

• The requested function and the parameters are given by the parameters of the interrupt call and the user space stack. Windows CE uses a slightly different approach.

Kernel-Level Interception• The transition from user space to kernel space is

achieved by jumping to a specially crafted invalid memory address consisting of an architecture-dependent fixed offset, an APISet number and a method number.

• Consequently, the exception dispatcher is executed and checks whether or not the address is assigned to a certain system call. Therefore, a special area of the memory is reserved for such system call traps (called the “kernel trap area”).

• On ARM processors, this area is located between the memory addresses 0xF0008000 and 0xF0010000, and kernel trap addresses can be computed by the formula

• 0xF0010000–((APISetID<<8)_MethodID)*4


• Protected Server Libraries– Windows CE loads device drivers as non-privileged user

mode processes. As a consequence, system calls are processed in separate processes, whose executions must take place in kernel mode.

– Each device driver process that exports system call APIs must register its own APISet first by calling the special functions CreateAPISet and RegisterAPISet.

– The number of different APISets is limited to 32, where the lower 16 identifiers are reserved for the kernel.

Kernel-Level Interception• Windows CE lets threads migrate between both

processes in a system call for the sake of performance. Therefore, the current process of a thread does not necessarily have to be the thread’s owner. – This information can be obtained by calling

GetCurrentProcess, GetOwnerProcess, and GetCallerProcess.

• The latter returns the caller process of the current protected server library (PSL) API, while GetOwnerProcess obtains the process which really owns the thread performing the function call.

• a system call in its original form goes through the following stages:

Kernel-Level Interception• 1. The program initiates an API call by invoking the designated

export in a DLL (usually CoreDLL).

• 2. The DLL jumps to the corresponding kernel trap address. This step is omitted if the program performs the jump directly.

• 3. The kernel exception dispatcher extracts the APISet and method number, switches to the process belonging to the APISet, and jumps to the requested method by checking the method pointer table.

• 4. After the method has finished, it returns to the exception handler.

• 5. A context switch to the caller process takes place and execution continues.

Kernel-Level Interception• Internal Kernel Data Structures• Each APISet contains all its information in a CINFO

structure. This includes all the parameters that were passed to CreateAPISet, as well as the dispatch type.

• Currently, Windows CE distinguishes handle-based from implicit APISets, the former ones being direct system calls, while the latter ones are attached to handles. A handle-based API is given by its handle and the method identifier.

• In order to access each implicit APISet’s data, the kernel maintains an array that holds all CINFO structures.

• A pointer to this array can be found in the UserKInfo array, which is always located at the fixed offset 0xFFFFCB00 on the ARM architecture.


• Since even the kernel mode APISets are registered when the system boots, all the relevant pointers are contained in writable memory pages. Thus, they can simply be altered and redirected to different functions.

• On the other hand, for each handle, there exists a CINFO structure that is allocated when the handle is created, and deallocated when it is closed.

• For the purpose of completely intercepting system calls, the attached CINFO pointer must be changed after its creation.


• Preventing Kernel Mode• The sandbox wants to hide its presence from other

programs, so that investigated malware does not alter its behavior because of the sandbox. This is only effective if it is the only process besides system processes that has superior access to the operating system.

• Fortunately, there are only a limited number of ways of doing this. The separation between user mode and kernel mode is effective in Windows CE, so the only way to enter kernel mode is to use a system call. And all system calls are hooked by our solution, so we are always able to prevent a program from entering kernel mode, if all ways into kernel mode are intercepted.


• The simplest way to gain kernel mode privileges is to call the SetKMode. Apart from that, an application might also register its own APISet and perform a system call.

• As system calls are always executed in kernel mode, the application temporarily has full privileges. Both examples must be handled and the remaining approaches must be taken into account for a dependable solution.

Porting to Other Mobile Operating Systems

• It is an interesting question as to whether presented techniques for Windows Mobile can be used for other mobile operating systems as well. Unfortunately, the answer to this is “generally, no.” – The system architectures are very different from

Windows Mobile. Our approach is based on the fact that it is very easy for untrusted software to run as a kernel-mode process.

– Other operating systems are more restricted, so the support of the operating system manufacturer would be required to get a sufficient trust level for the sandbox program.

Porting to Other Mobile Operating Systems

• Examples of the more restricted operating systems are Symbian OS and the iPhone operating system.

• The upcoming Linux phones promise to be more accessible because of the open-source nature of their operating system.

• Examples are the Open Handset Alliance (Android), the LiMo foundation, and Openmoko. But the future still must determine which of these platforms will really be used and gain wide acceptance.

Notes on Interception Completeness

• Interception• The most important part is to see every system call. We

change the central pointer for the data structures to point to our own data structures, and there is no other way for a program to enter kernel mode when using system calls. However, there are several special cases to consider.

• A special case is our own KernelHookServiceDLL. It provides some services that are necessary for the system, but that are not intercepted.


• Interception• Handle-based system

calls load the kernel space addresses at the handle’s creation time. Therefore, it is necessary to change the addresses there so that these system calls do not circumvent our system.

• An example system call is CreateFile, where pointers to handle-based system calls (such as ReadFile, WriteFile) are maintained in an individual CINFO structure, which is connected to the handle object. Hence, one has to patch the handle right after it was created.


• Signature Recognition• The signatures of the system calls can be found in

the header files of the shared Windows CE source code that is distributed with the Platform Builder.

• These header files have a unique format that can be parsed by some scripts. The system calls are grouped into different APISets. These are documented as comments in the header files.

• The source code can be parsed with a tool like doxygen and the actual signatures can be assigned to the system call in its corresponding APISet.


• Signature Recognition• Some undocumented system calls are not present

in the shared source header files.• Typical examples are the GWES (graphics,

window, and event subsystem) API functions.

• All of these are intercepted, but it might happen that their signature is unknown. This case requires manual effort to locate the signature. This can be solved by using a debugger (like IDA Pro) and decompiling the library file.

4.2.2 Using MobileSandbox

• This section explains how the MobileSandbox tool can be used for dynamic malware analysis.

• It presents the two interfaces and shows the differences between analyzing within the device emulator and on a real device.

Using the Local Interface

• Connecting the Device• For Windows Mobile, we need an ActiveSync

connection. This connection is automatically set up when connecting a real Windows Mobile device via USB with the host computer.

• When using the device emulator, one has to set up a DMA connection with the Device Emulator Manager program.

• The host computer will prepare the malware sample for analysis using the Microsoft RAPI for performing file system operations or managing processes.

Using the Local Interface

• Choosing an Analysis Mode• Based on what we want to analyze, the analysis mode

must be chosen.– The manual mode lets the analyst choose the analysis

parameters himself. The device emulator and the ActiveSync connection must be set up manually. The analysis target can be chosen in the host program.

– The automatic mode uses command-line parameters to set all necessary parameters. It starts the ActiveSync connection and if needed the device emulator and the Device Emulator Manager. The analysis is started, and after an arbitrary time interval the analysis is terminated.

Using the Web Interface

• The Web interface simplifies usage of MobileSandbox even more by taking care of most parameters by itself. The main parameter is the sample to be analyzed.

• The automatic analysis mode will be chosen and the device connection will be set up automatically.

• This has many advantages for getting a quick analysis of an unknown sample without the need to know about the fields of reverse-engineering or malware analysis.

Analyzing within the Device Emulator

• As already said, MobileSandbox can use the device emulator or a real device. The device emulator has two main advantages, especially for the automatic environment that the public Web interface provides. – First, restoring the original state is simple after a sample has

been executed. It just needs restoring its directories on the host file system and restarting the emulator. This will effectively remove any changes that the malware might have made to the emulated operating system.

– Second, it is easily possible to execute the sample on a variety of different operating system versions. Since the device emulator is an official part of the software development kit, an emulator image is available for every operating syste

Analyzing within the Device Emulator

• However, the device emulator has one major drawback: It only has limited networking functionality for the messaging and phone APIs because it does not have a SIM card, and therefore no connection to the mobile network.m version of Windows Mobile.

• Another drawback of the emulator is the possibility that malware recognizes being run in an emulated environment and because of that it might not show its malicious behavior.

Analyzing on a Real Device

• One advantage of using a real device is its connectivity to the mobile network, so an analysis is not restricted by a nonfunctional network connection. But it is unclear if malicious code should be analyzed with a possible worldwide connectivity. So this is no real advantage over the device emulator.

• As another advantage, you can be sure you have running code because there might be differences between the device emulator and a real device.

Analyzing on a Real Device

• Real devices pose many challenges to be solved:• ■ Real devices are expensive and need care. They also need

to be managed, and so on. • ■ Reinitializing the device after an analysis is much more

complicated. The device emulator has the host system as an umbrella environment. But to reliably set a real device to a defined starting state, its firmware should be flashed.

• ■ When you plan the automatic environment of a public Web interface, you’ll need to answer the following question: How can the previous two points be automated reliably?

Reading an Analysis Report

• This section explains the format in which the reports are displayed by the Web interface in their most human-readable presentation. For automatic processing, the XML or text logs can be used.

• Figure 4.1 shows the header of an analysis. It displays some metadata of the analysis and most notably the result of an antivirus scan.

• The example shows that Avira AntiVir did recognize the sample as the Duts virus. Afterwards, the detailed system call log starts—in this case, with a message box


Figure 4.1 The Analysis Header


• Figure 4.2 shows some noteworthy parts of an analysis. – The first two are a system call sequence that shows an

interesting behavior of Windows Mobile software.

– System call ID #12 shows the log of a C library call to wcsncmp as part of the Process32Next call.

– This happens when programs are compiled using the Visual Studio compiler because it does not optimize these calls with inline code. So a malware analyst is lucky to get more information.


Figure 4.2 Examples of Logged System Calls


• Call IDs #13 and #15 show calls to delete a Registry key (RegDeleteKeyW ) and to display a message box (MessageBoxW).

• Even without deep knowledge of the Windows API, a malware analyst is able to understand what is going on there.

• Call ID #14 shows how pointers are dereferenced. The left part shows the value of the pointer, that means the address of the structure.

• The right part shows the content of the referenced data structure, revealing the useful information of this system call: what the message content of this call to SmsSendMessage was ( pbData).

Chapter 4 Analysis and Detection technology of Malicious code Malicious code defense in mobile...

Documents

Transcript of Chapter 4 Analysis and Detection technology of Malicious code Malicious code defense in mobile...