Binary Analysis for Botnet Reverse Engineering & Defense

Post on 25-Feb-2016


Dawn Song
UC Berkeley

Binary Analysis Is Important for Botnet Defense

• Botnet programs: no source code, only binary
• Botnet defense needs internal understanding of botnet programs
  – C&C reverse engineering
    • Different possible commands, encryption/decryption
  – Botnet traffic rewriting
  – Botnet infiltration
  – Botnet vulnerability discovery

BitBlaze Binary Analysis Infrastructure: Architecture

• The first infrastructure:
  – Novel fusion of static, dynamic, formal analysis methods
    • Loop extended symbolic execution
    • Grammar-aware symbolic execution
  – Whole system analysis (including OS kernel)
  – Analyzing packed/encrypted/obfuscated code

[Figure: the BitBlaze Binary Analysis Infrastructure and its three components: Vine (static analysis component), TEMU (dynamic analysis component), and Rudder (mixed execution component)]

[Figure: applications built on the BitBlaze Binary Analysis Infrastructure: dissecting malware, detecting vulnerabilities, generating filters]

BitBlaze: Security Solutions via Program Binary Analysis

Unified platform to accurately analyze security properties of binaries

Security evaluation & audit of third-party code

Defense against morphing threats

Faster & deeper analysis of malware

The BitBlaze Approach & Research Foci

Semantics based, focus on root cause: Automatically extracting security-related properties from binary code for effective vulnerability detection & defense

1. Build a unified binary analysis platform for security
  – Identify & cater to common needs of different security applications
  – Leverage recent advances in program analysis, formal methods, binary instrumentation/analysis techniques for new capabilities

2. Solve real-world security problems via binary analysis
  • Extracting security-related models for vulnerability detection
  • Generating vulnerability signatures to filter out exploits
  • Dissecting malware for real-time diagnosis & offense: e.g., botnet infiltration
  • More than a dozen security applications & publications

Plans

• Building on BitBlaze to develop new techniques
• Automatic reverse engineering of botnet C&C protocols
• Automatic rewriting of botnet traffic to facilitate botnet infiltration
• Vulnerability discovery in botnets

Preliminary Work

• Dispatcher: Enabling Active Botnet Infiltration using Automatic Protocol Reverse-Engineering

• Binary code extraction and interface identification for botnet traffic rewriting

• Botnet analysis for vulnerability discovery

Dispatcher: Enabling Active Botnet Infiltration using Automatic Protocol Reverse-Engineering

Juan Caballero, Pongsin Poosankam, Christian Kreibich, Dawn Song

Automatic Protocol Reverse-Engineering

• Process of extracting the application-level protocol used by a program, without the specification
  – Automatic process
  – Many undocumented protocols (C&C, Skype, Yahoo)

• Encompasses extracting:
  1. the Protocol Grammar
  2. the Protocol State Machine

• Message format extraction is a prerequisite

Challenges for Active Botnet Infiltration

1. Understand both sides of C&C protocol
  – Message structure
  – Field semantics

2. Access to one side of dialog only

3. Handle encryption/obfuscation

• Goal: Rewrite C&C messages on either dialog side

Technical Contributions

1. Buffer deconstruction, a technique to extract the format of sent messages
  – Earlier work only handles received messages

2. Field semantics inference techniques, for messages sent and received

3. Designing and developing Dispatcher

4. Extending a technique to handle encryption

5. Rewriting a botnet dialog using information extracted by Dispatcher

Message Format Extraction

• Extract format of a single message
• Required by Grammar and State Machine extraction

[Figure: two example messages, "GET / HTTP/1.1" and "HTTP/1.1 200 OK", labeled [Polyglot] and [Dispatcher]]

Message Field Tree

Example message: HTTP/1.1 200 OK\r\n\r\n

MSG [0:18]
  Status Line [0:16]
    Version [0:7]
    Delimiter [8:8]
    Status-Code [9:11]
    Delimiter [12:12]
    Reason [13:14]
    Delimiter [15:16]
  Delimiter [17:18]

Example attributes of one field:
  Field Range: [3:3]
  Field Boundary: Fixed
  Field Semantics: Delimiter
  Field Keywords: <none>
  Target: Version

Message format extraction has 2 steps:
  1. Extract tree structure
  2. Extract field attributes
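A message field tree like the one above can be held in a small recursive data structure. The following is only an illustrative sketch: the class name FieldNode, the helper functions, and the choice to store attributes as optional strings are assumptions made here, not Dispatcher's actual representation. The offsets are the inclusive ranges from the HTTP example.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class FieldNode:
    """One node of a message field tree (hypothetical representation)."""
    name: str
    start: int                        # inclusive start offset in the message
    end: int                          # inclusive end offset in the message
    boundary: Optional[str] = None    # e.g. "Fixed", "Length", "Delimiter"
    semantics: Optional[str] = None   # e.g. "Delimiter", "IP address"
    children: List["FieldNode"] = field(default_factory=list)

    def leaves(self) -> Iterator["FieldNode"]:
        """Yield the leaf fields from left to right."""
        if not self.children:
            yield self
        for child in self.children:
            yield from child.leaves()


def check_tree(node: FieldNode) -> bool:
    """Return True if every node's children tile its range exactly."""
    if not node.children:
        return True
    pos = node.start
    for child in node.children:
        if child.start != pos:
            return False
        pos = child.end + 1
    return pos == node.end + 1 and all(check_tree(c) for c in node.children)


def field_bytes(message: bytes, node: FieldNode) -> bytes:
    """Return the bytes of the message covered by a field."""
    return message[node.start:node.end + 1]


MESSAGE = b"HTTP/1.1 200 OK\r\n\r\n"

TREE = FieldNode("MSG", 0, 18, children=[
    FieldNode("Status Line", 0, 16, children=[
        FieldNode("Version", 0, 7),
        FieldNode("Delimiter", 8, 8, semantics="Delimiter"),
        FieldNode("Status-Code", 9, 11),
        FieldNode("Delimiter", 12, 12, semantics="Delimiter"),
        FieldNode("Reason", 13, 14),
        FieldNode("Delimiter", 15, 16, semantics="Delimiter"),
    ]),
    FieldNode("Delimiter", 17, 18, semantics="Delimiter"),
])
```

Step 1 of the extraction would produce the nodes and their ranges; step 2 would fill in the remaining attributes such as boundary and semantics.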

Sent vs. Received

• Both protocol directions from single binary
• Different problems
  – Taint information harder to leverage
  – Focus on how message is constructed, not processed
• Different techniques needed:
  – Tree structure: Buffer Deconstruction
  – Field attributes: New heuristics

Outline

• Introduction
• Problem
• Techniques
  – Buffer Deconstruction
  – Field Semantics Inference
  – Handling encryption
• Evaluation

Buffer Deconstruction

• Intuition
  – Programs keep fields in separate memory buffers
  – Combine those buffers to construct sent message

• Output buffer
  – Holds message when "send" function invoked
  – Or holds unencrypted message before encryption

• Recursive process
  – Decompose a buffer into buffers used to fill it
  – Starts with output buffer
  – Stops when there's nothing to recurse

Buffer Deconstruction

• Message field tree = inverse of output buffer structure
• Output is structure of message field tree
  – No field attributes, except range

[Figure: deconstruction of the output buffer for the message HTTP/1.1 200 OK\r\n\r\n. Buffer sizes are in parentheses, followed by the range and the corresponding field.]

Output Buffer (19)  [0:18]  MSG
  A (17)  [0:16]  Status Line
    C (8)  [0:7]  Version
    D (1)  [8:8]  Delimiter
    E (3)  [9:11]  Status-Code
    F (1)  [12:12]  Delimiter
    G (2)  [13:14]  Reason
    H (2)  [15:16]  Delimiter
  B (2)  [17:18]  Delimiter
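The recursive decomposition can be sketched in a few lines. This is a simplified illustration, not Dispatcher's implementation: it assumes the execution trace has already been reduced to a table saying, for each destination buffer, which source buffers were copied into it and at what offset. That table format (COPIES) and the names Node and deconstruct are hypothetical; the buffer names and sizes follow the figure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    """A buffer and the inclusive byte range it occupies in the output."""
    buffer: str
    start: int
    end: int
    children: List["Node"] = field(default_factory=list)


# Hypothetical, pre-digested trace: destination buffer ->
# list of (offset in destination, source buffer, length copied).
Copies = Dict[str, List[Tuple[int, str, int]]]

COPIES: Copies = {
    "Output": [(0, "A", 17), (17, "B", 2)],
    "A": [(0, "C", 8), (8, "D", 1), (9, "E", 3),
          (12, "F", 1), (13, "G", 2), (15, "H", 2)],
}


def deconstruct(buffer: str, size: int, copies: Copies, base: int = 0) -> Node:
    """Decompose a buffer into the buffers used to fill it.

    Recursion starts at the output buffer and stops at buffers that no
    recorded copy wrote into.
    """
    node = Node(buffer, base, base + size - 1)
    for offset, src, length in sorted(copies.get(buffer, [])):
        node.children.append(deconstruct(src, length, copies, base + offset))
    return node


def dump(node: Node, depth: int = 0) -> List[str]:
    """Render the tree as indented lines such as 'A [0:16]'."""
    lines = ["  " * depth + f"{node.buffer} [{node.start}:{node.end}]"]
    for child in node.children:
        lines.extend(dump(child, depth + 1))
    return lines


TREE = deconstruct("Output", 19, COPIES)
```

Printing "\n".join(dump(TREE)) gives the same ranges as the figure. Only ranges are recovered at this stage; field names and other attributes come from later inference.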

Field Attributes Inference

• Attributes capture extra information
  – E.g., inter-field relationships

Attribute         Value
Field Range       [StartOffset : EndOffset]
Field Boundary    Fixed, Length, Delimiter
Field Semantics   IP address, Timestamp, …
Field Keywords    <list of keywords in field>

• Techniques identify
  – Keywords
  – Length fields
  – Delimiters
  – Variable-length fields
  – Arrays

Field Semantics

Field Semantics
Cookies            Keyboard input
Error codes        Keywords
File data          Length
File information   Padding
Filenames          Ports
Hash / Checksum    Registry data
Hostnames          Sleep timers
Host information   Stored data
IP addresses       Timestamps

• A field attribute in the message field tree
• Captures the type of data in the field
• Programs contain much semantic info: leverage it!
• Semantics in well-defined functions and instructions
  – Prototype
• Similar to type inference
• Differs for received and sent messages

Field Semantic Inference

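Received message:
  GET /index.html HTTP/1.1

Call made by the program:
  stat("index.html", &file_info);

Prototype:
  int stat(const char *path, struct stat *buf);
  path is an input (IN); buf and the return value are outputs (OUT)

  struct stat {
    …
    off_t st_size; /* total size in bytes */
    …
  }

Sent message:
  HTTP/1.1 200 OK
  Content-Length: 25

  <html>Hello world!</html>

Labels in the figure: File path, File length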
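The stat example suggests a simple lookup scheme: when data from a received message reaches an input parameter with known meaning, or an output with known meaning reaches a sent message, the corresponding byte range inherits that meaning. The sketch below is an assumption-laden illustration, not Dispatcher's code. The PROTOTYPES table, the event format, and the helper find_range are hypothetical, and find_range uses a substring search as a stand-in for the taint tracking a real dynamic analysis would perform. The CRLF line endings in RESPONSE are also assumed.

```python
from typing import Dict, List, Tuple

Range = Tuple[int, int]  # inclusive (start, end) offsets in a message

# Hypothetical prototype table: (parameter, direction, semantics).
# "IN" semantics label received fields; "OUT" semantics label sent fields.
PROTOTYPES: Dict[str, List[Tuple[str, str, str]]] = {
    "stat": [
        ("path", "IN", "File path"),
        ("buf.st_size", "OUT", "File length"),
    ],
}

Event = Tuple[str, str, Range]  # (function, parameter, message byte range)


def infer_semantics(events: List[Event], direction: str) -> Dict[Range, str]:
    """Label message ranges using the prototypes of the calls they touch."""
    labels: Dict[Range, str] = {}
    for func, param, msg_range in events:
        for name, param_dir, semantics in PROTOTYPES.get(func, []):
            if name == param and param_dir == direction:
                labels[msg_range] = semantics
    return labels


def find_range(message: bytes, value: bytes) -> Range:
    """Stand-in for taint tracking: locate a value inside a message."""
    start = message.index(value)
    return (start, start + len(value) - 1)


REQUEST = b"GET /index.html HTTP/1.1"
RESPONSE = (b"HTTP/1.1 200 OK\r\nContent-Length: 25\r\n\r\n"
            b"<html>Hello world!</html>")

# Received side: bytes of the request flow into stat's path argument.
recv_events = [("stat", "path", find_range(REQUEST, b"index.html"))]
# Sent side: st_size, written by stat, flows into the response.
sent_events = [("stat", "buf.st_size", find_range(RESPONSE, b"25"))]

RECV_LABELS = infer_semantics(recv_events, "IN")
SENT_LABELS = infer_semantics(sent_events, "OUT")
```

The two directions differ in where the evidence comes from: received fields are labeled by how the program consumes them, sent fields by where their values originate.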

Detecting Encoding Functions

• Encoding functions = (de)compression, (en/de)cryption, (de)obfuscation, …
• High ratio of arithmetic & bitwise instructions
• Use read/write set to identify buffers
• Work-in-progress on extracting and reusing encoding functions
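The instruction-ratio heuristic can be illustrated with a small classifier over the mnemonics executed by one function. This is a sketch under stated assumptions: the mnemonic set, the minimum instruction count, and the ratio threshold are illustrative values chosen here, not figures given in the slides.

```python
from typing import Iterable, List

# Assumed set of x86 arithmetic and bitwise mnemonics (not exhaustive).
ARITH_BITWISE = {
    "add", "sub", "mul", "imul", "div", "idiv", "inc", "dec", "neg",
    "and", "or", "xor", "not", "shl", "shr", "sar", "rol", "ror",
}


def encoding_ratio(mnemonics: Iterable[str]) -> float:
    """Fraction of executed instructions that are arithmetic or bitwise."""
    executed: List[str] = [m.lower() for m in mnemonics]
    if not executed:
        return 0.0
    hits = sum(1 for m in executed if m in ARITH_BITWISE)
    return hits / len(executed)


def looks_like_encoding(mnemonics: Iterable[str],
                        min_instructions: int = 20,
                        min_ratio: float = 0.5) -> bool:
    """Flag a function as a candidate encoding function.

    Both thresholds are assumptions for illustration; very short
    functions are skipped because their ratio is not meaningful.
    """
    executed = list(mnemonics)
    return (len(executed) >= min_instructions
            and encoding_ratio(executed) > min_ratio)


# Synthetic traces: an XOR-style loop versus a plain copy loop.
XOR_LOOP = ["mov", "xor", "rol", "add", "inc", "cmp", "jne"] * 4
COPY_LOOP = ["mov", "mov", "inc", "cmp", "jne"] * 6
```

With these synthetic traces, the XOR-style loop is flagged and the copy loop is not. Once a function is flagged, its read and write sets would indicate which buffers hold the encoded and decoded data.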

MegaD C&C protocol

type MegaD_Message = record {
  msg_len : uint16;
  encrypted_payload : bytestring &length = 8*msg_len;
} &byteorder = bigendian;

type encrypted_payload = record {
  version : uint16;
  mtype : uint16;
  data : MegaD_data (mtype);
};

type MegaD_data (msg_type: uint16) = case msg_type of {
  0x00 -> m00 : msg_0;
  […]
  default -> unknown : bytestring &restofdata;
};

• C&C on tcp/443 using proprietary encryption

• Use Dispatcher's output to generate grammar
  – 15 different messages seen (7 recv, 8 sent)
  – 11 field semantics
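To make the grammar concrete, the outer framing can be parsed with a few lines of Python. This is a minimal sketch based only on the record definitions above. The decryption step is omitted because the slides say only that the encryption is proprietary, so parse_payload assumes it is handed an already-decrypted payload. That the inner record is also big-endian is an assumption, and the function names and the sample values are hypothetical.

```python
import struct
from typing import Dict, Tuple, Union


def parse_megad_message(data: bytes) -> Tuple[int, bytes, bytes]:
    """Split one framed message into (msg_len, encrypted_payload, rest).

    msg_len is a big-endian uint16 counting 8-byte units of payload.
    """
    if len(data) < 2:
        raise ValueError("need at least 2 bytes for msg_len")
    (msg_len,) = struct.unpack_from(">H", data, 0)
    end = 2 + 8 * msg_len
    if len(data) < end:
        raise ValueError("truncated payload")
    return msg_len, data[2:end], data[end:]


def parse_payload(plaintext: bytes) -> Dict[str, Union[int, bytes]]:
    """Parse a payload that has ALREADY been decrypted (assumption)."""
    if len(plaintext) < 4:
        raise ValueError("payload too short for version and mtype")
    version, mtype = struct.unpack_from(">HH", plaintext, 0)
    return {"version": version, "mtype": mtype, "data": plaintext[4:]}


def build_megad_message(payload: bytes) -> bytes:
    """Frame a payload, recomputing msg_len from its length."""
    if len(payload) % 8 != 0:
        raise ValueError("payload length must be a multiple of 8")
    return struct.pack(">H", len(payload) // 8) + payload


# Synthetic example: version 1, mtype 0x00, four bytes of data.
SAMPLE_PAYLOAD = struct.pack(">HH", 1, 0x00) + b"\x00" * 4
SAMPLE_MESSAGE = build_megad_message(SAMPLE_PAYLOAD)
```

Recomputing msg_len in build_megad_message matters for rewriting: if a rewritten payload changes size, the length field has to change with it.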

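MegaD Dialog

[Figure: MegaD dialog. Participants: C&C Server, SMTP Test Server. Message labels: Cmd?, Test SMTP, EHLO, Failed.]

MegaD Rewriting

[Figure: MegaD rewriting. Participants: C&C Server, SMTP Test Server, Template Server. Message labels: Cmd?, Test SMTP, EHLO, Failed, Success, Get Template, Template?. The rewriting step is labeled Grammar.]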

Summary

• Buffer deconstruction, a technique to extract the format of sent messages
• Field semantics inference techniques, for messages sent and received
• Designed and developed Dispatcher
• Extended technique to handle encryption
• Rewrote MegaD dialog using information extracted by Dispatcher