Saumya Debray The University of Arizona Tucson, AZ 85721.

UNDERSTANDING SOFTWARE THAT DOESN’T WANT TO BE UNDERSTOOD: REVERSE ENGINEERING OBFUSCATED BINARIES


The Problem

Rapid analysis and understanding of malware code is essential for swift response to new threats
‒ malicious software is usually heavily obfuscated against analysis

Existing approaches to reverse engineering such code are primitive
‒ not a lot of high-level tool support
‒ requires a lot of manual intervention
‒ slow, cumbersome, potentially error-prone

This delays development of countermeasures

Develop automated techniques for analysis and reverse engineering of obfuscated binaries

semantics-based
‒ output is functionally equivalent to, but simpler than, the input program

generality
‒ should work on any obfuscation, even ones we haven’t thought of yet!
‒ should minimize assumptions about obfuscations

Challenges

can’t make assumptions about obfuscations
‒ what do we leverage for deobfuscation?
‒ distinguishing code we care about from code we don’t: how do we know which instructions we care about?

scale
‒ “needle in haystack”: no. of instructions executed increases by 270x (VMprotect) to 4300x (Themida) [Lau 2008]

anti-analysis defenses
‒ runtime unpacking
‒ anti-emulation, anti-debug checks

Our Approach

no obfuscation-specific assumptions
‒ treat programs as input-to-output transformations
‒ use semantics-preserving transformations to simplify execution traces

dynamic analysis to handle runtime unpacking

Taint analysis (bit-level)
‒ map flow of values from input to output

Semantics-preserving transformations
‒ simplify logic of input-to-output transformation

Control flow reconstruction
‒ reconstruct logic of simplified computation
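The first stage above can be illustrated with a minimal sketch. It assumes a hypothetical trace format in which each executed instruction records one destination and its source locations; the `Instr` type and `relevant_instructions` function are illustrative names, not taken from the slides, and the analysis is at the granularity of whole locations rather than bits. A forward pass marks instructions whose operands derive from the input, and a backward pass keeps only those whose results reach the output.

```python
from typing import NamedTuple

class Instr(NamedTuple):
    text: str     # human-readable form, for display only
    dst: str      # location written (hypothetical one-destination model)
    srcs: tuple   # locations read

def relevant_instructions(trace, inputs, outputs):
    """Return indices of trace entries that carry input values to outputs."""
    # Forward pass: a location is tainted if it was computed from an input.
    tainted = set(inputs)
    forward = set()
    for i, ins in enumerate(trace):
        if any(s in tainted for s in ins.srcs):
            tainted.add(ins.dst)
            forward.add(i)
        else:
            tainted.discard(ins.dst)   # overwritten with untainted data
    # Backward pass: keep tainted instructions whose result reaches an output.
    needed = set(outputs)
    keep = []
    for i in range(len(trace) - 1, -1, -1):
        ins = trace[i]
        if ins.dst not in needed:
            continue
        needed.discard(ins.dst)        # this write defines the needed value
        if i in forward:
            needed.update(ins.srcs)
            keep.append(i)
    return sorted(keep)

TRACE = [
    Instr("mov r1, in",     "r1",  ("in",)),
    Instr("mov r9, 7",      "r9",  ()),            # chaff
    Instr("add r9, r9, r9", "r9",  ("r9",)),       # chaff
    Instr("add r2, r1, r1", "r2",  ("r1",)),
    Instr("xor r3, r1, r9", "r3",  ("r1", "r9")),  # tainted, never output
    Instr("mov out, r2",    "out", ("r2",)),
]
KEPT = relevant_instructions(TRACE, {"in"}, {"out"})
```

On this six-instruction trace only the three instructions that move the input value to `out` survive; the "chaff" instructions are dropped without knowing anything about how they were produced.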

Ex 1: Emulation-based Obfuscation

examination of the code reveals only the emulator’s logic
‒ actual program logic embedded in byte code

lots of “chaff” during execution
‒ separating emulator logic from payload logic is tricky

emulators can be nested

[Figure: Obfuscator. Labels: input program, random seed, mutation engine, bytecode logic (data), emulator (code)]
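A toy illustration of why the code alone reveals only the emulator: in the sketch below, `run` is a generic fetch-decode-dispatch loop, while the factorial logic lives entirely in the `FACTORIAL` data. The instruction set, register names, and tuple encoding are invented for this example; real emulation-based obfuscators use packed, randomized byte encodings and far more elaborate dispatch.

```python
def run(bytecode, n):
    """Interpreter loop of a tiny hypothetical register VM."""
    regs = {"n": n}
    pc = 0
    while True:
        op, *args = bytecode[pc]      # fetch and decode
        pc += 1
        if op == "set":               # set reg, constant
            regs[args[0]] = args[1]
        elif op == "mul":             # mul dst, a, b
            regs[args[0]] = regs[args[1]] * regs[args[2]]
        elif op == "dec":             # dec reg
            regs[args[0]] -= 1
        elif op == "jz":              # jz reg, target
            if regs[args[0]] == 0:
                pc = args[1]
        elif op == "jmp":             # jmp target
            pc = args[0]
        elif op == "ret":             # ret reg
            return regs[args[0]]
        else:
            raise ValueError(f"unknown opcode {op!r}")

# The payload: nothing in run() mentions factorial.
FACTORIAL = [
    ("set", "acc", 1),            # 0
    ("jz", "n", 5),               # 1: loop exit test
    ("mul", "acc", "acc", "n"),   # 2
    ("dec", "n"),                 # 3
    ("jmp", 1),                   # 4
    ("ret", "acc"),               # 5
]
```

Every instruction executed by `run` other than the multiply and decrement is dispatch overhead, which is the "chaff" a trace-based analysis has to separate from the payload.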

Ex 2: Return-Oriented Programs (ROP)

Originally designed to bypass anti-code-injection defenses
‒ stitches together existing code fragments (“gadgets”), e.g., in system libraries

Logic can be difficult to discern
‒ gadgets are typically scattered across many different functions and/or libraries
‒ gadgets can overlap in memory in weird ways
‒ control flow structures (if-else, loops, function calls) are typically implemented using non-standard idioms
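The gadget-stitching idea can be modeled in a few lines. This is a simulation, not an exploit: the gadget functions, their addresses, and the two-register machine state are all made up for illustration. Each gadget does a small piece of work and then "returns" by popping the next gadget address off a stack that the chain's author controls, so the program's logic lives in the stack contents rather than in any one function.

```python
def pop(st):
    """Pop the next word off the chain-controlled stack."""
    word = st["stack"][st["sp"]]
    st["sp"] += 1
    return word

def g_pop_a(st):          # models a gadget like: pop eax ; ret
    st["a"] = pop(st)

def g_pop_b(st):          # models a gadget like: pop ebx ; ret
    st["b"] = pop(st)

def g_add(st):            # models a gadget like: add eax, ebx ; ret
    st["a"] += st["b"]

# Hypothetical addresses, standing in for fragments scattered across libraries.
GADGETS = {0x1000: g_pop_a, 0x2040: g_pop_b, 0x3337: g_add}

def run_chain(chain):
    st = {"a": 0, "b": 0, "stack": list(chain), "sp": 0}
    while st["sp"] < len(st["stack"]):
        GADGETS[pop(st)](st)   # the "ret" that ends each gadget
    return st["a"]

# Gadget addresses interleaved with the data they consume: computes 2 + 40.
CHAIN = [0x1000, 2, 0x2040, 40, 0x3337]
```

Even in this toy, reading the gadget bodies says little about what `CHAIN` computes; loops and conditionals in real chains are built by manipulating the stack pointer, which is what makes the control flow hard to recover statically.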

Example 1 (emulation-obfuscation)

[Figure: factorial (Themida)]

Example 2 (ROP)

[Figure: factorial, original and ROP versions]

Interactions between Obfuscations

Example: Unpacking + Emulation

[Figure: execution trace. Labels: unpack; output; instructions “tainted” as propagating values from input to output; input-to-output computation (further simplified); control flow]

Results

Ex. 1. binary search: Themida

[Figure panels: original, obfuscated (cropped), deobfuscated]

Results

Ex. 2. Hunatcha (drive infection code): ExeCryptor

[Figure panels: original, obfuscated (cropped), deobfuscated]

Results

Ex. 3. fibonacci: ROP

[Figure panels: original, obfuscated, deobfuscated]

Results

Ex. 4. Win32/Kryptik.OHY: Code Virtualizer

[Figure panels: obfuscated, deobfuscated]

multiple layers of runtime code generation/unpacking

initial unpacker is emulation-obfuscated

the CFG shown materializes incrementally

Results: CFG Similarity

[Figure: CFG similarity. Series: OBFUSCATED, DEOBFUSCATED; axis label: Programs]

Lessons and Issues

Static vs. dynamic analysis
‒ multiple layers of runtime code generation/unpacking limits utility of static analysis
‒ dynamic analysis can run into problems of scale
  ‒ O(n²) algorithms impractical; even O(n log n) can be problematic
  ‒ trade memory space for execution time/complexity
  ‒ code coverage: multi-path exploration?

Taint propagation
‒ byte/word-level analyses may not be precise enough
  ‒ we use (enhanced) bit-level taint propagation
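A small worked case shows where the precision difference comes from. The sketch below is a plain bit-mask taint propagator over three operations with constant operands; it is an assumption-laden illustration, not the enhanced analysis the slides refer to. Taint is an 8-bit mask with a 1 wherever the bit may depend on the input. In byte-level mode, any tainted bit taints the whole result.

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1

def propagate(ops, taint, bit_level=True):
    """Propagate a taint mask through (op, constant) steps on one byte."""
    for op, k in ops:
        if not bit_level:
            # Byte-level rule: the result is fully tainted if the operand
            # carries any taint at all.
            taint = MASK if taint else 0
            continue
        if op == "and":
            taint &= k                     # bits cleared by k become constant
        elif op == "shl":
            taint = (taint << k) & MASK    # taint moves with the bits
        elif op == "shr":
            taint >>= k
        else:
            raise ValueError(f"unknown op {op!r}")
    return taint

# y = ((x & 0x0F) << 4) & 0x0F, which is 0 for every x.
OPS = [("and", 0x0F), ("shl", 4), ("and", 0x0F)]
BIT_RESULT = propagate(OPS, MASK, bit_level=True)
BYTE_RESULT = propagate(OPS, MASK, bit_level=False)
```

Bit-level propagation reports that the result carries no input-dependent bits, so the sequence can be treated as chaff; byte-level propagation keeps it tainted and would retain it in the simplified trace.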

Simplified trace → CFG: NP-hard
‒ semantic considerations?

Conclusions

Rapid analysis and understanding of malware code is essential for swift response to new threats
‒ need to deal with advanced code obfuscations
‒ obfuscation-specific solutions tend to be fragile

We describe a semantics-based framework for automatic code deobfuscation
‒ no assumptions about the obfuscation(s) used
‒ promising results on obfuscators (e.g., Themida) not handled by prior research

ADDITIONAL MATERIAL

Semantics-based simplification

Quasi-invariant locations: locations that have the same value at each use.
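The definition can be checked directly against a trace. The sketch below assumes a hypothetical log of `(location, value)` pairs, one per use; the function name and log format are illustrative only.

```python
from collections import defaultdict

def quasi_invariant(reads):
    """Map each location to its value if every use observed the same value.

    reads: iterable of (location, value) pairs, one per use in the trace.
    """
    seen = defaultdict(set)
    for loc, val in reads:
        seen[loc].add(val)
    return {loc: next(iter(vals)) for loc, vals in seen.items() if len(vals) == 1}

# "key" is written once and read twice with the same value; "n" changes.
READS = [("key", 0x5A), ("n", 5), ("key", 0x5A), ("n", 4)]
INVARIANTS = quasi_invariant(READS)
```

Note that the property is relative to the observed trace: a location that looks invariant on one execution might take a different value on another input.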

Our transformations (currently):
‒ Arithmetic simplification
  ‒ adaptation of constant folding to execution traces
  ‒ consider quasi-invariant locations as constants
  ‒ controlled to avoid over-simplification
‒ Data movement simplification
  ‒ use pattern-driven rules to identify and simplify data movement
‒ Dead code elimination
  ‒ need to consider implicit destinations, e.g., condition code flags
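The point about implicit destinations can be made concrete with a backward liveness pass over a straight-line trace. This is a minimal sketch under simplifying assumptions: the `Instr` record and `eliminate_dead` function are hypothetical names, only the zero flag is modeled, and memory aliasing and other side effects are ignored.

```python
from typing import NamedTuple

class Instr(NamedTuple):
    text: str
    writes: frozenset   # explicit AND implicit destinations (e.g. "ZF")
    reads: frozenset

def eliminate_dead(trace, live_out):
    """Drop trace entries none of whose results are used later."""
    live = set(live_out)
    kept = []
    for ins in reversed(trace):
        if ins.writes & live:     # some result is still needed
            live -= ins.writes
            live |= ins.reads
            kept.append(ins)
    kept.reverse()
    return kept

TRACE = [
    Instr("mov edi, 3", frozenset({"edi"}), frozenset()),
    # ecx is never read again, but the flag this sets is.
    Instr("sub ecx, 1", frozenset({"ecx", "ZF"}), frozenset({"ecx"})),
    Instr("cmovz esi, eax", frozenset({"esi"}),
          frozenset({"ZF", "eax", "esi"})),
]
KEPT = [ins.text for ins in eliminate_dead(TRACE, {"esi"})]
```

If `sub ecx, 1` were recorded as writing only `ecx`, the pass would delete it and change the behavior of the following `cmovz`, which is why the flag has to be listed as a destination.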
