Binary Analysis for Botnet Reverse Engineering & Defense

Dawn Song, UC Berkeley


Page 1: Binary Analysis for  Botnet  Reverse Engineering & Defense

Binary Analysis for Botnet Reverse Engineering & Defense

Dawn Song, UC Berkeley

Page 2

Binary Analysis Is Important for Botnet Defense

• Botnet programs: no source code, only binary
• Botnet defense needs internal understanding of botnet programs
  – C&C reverse engineering
    • Different possible commands, encryption/decryption
  – Botnet traffic rewriting
  – Botnet infiltration
  – Botnet vulnerability discovery

Page 3

BitBlaze Binary Analysis Infrastructure: Architecture

• The first infrastructure:
  – Novel fusion of static, dynamic, and formal analysis methods
    • Loop-extended symbolic execution
    • Grammar-aware symbolic execution
  – Whole-system analysis (including the OS kernel)
  – Analyzing packed/encrypted/obfuscated code

Figure: BitBlaze Binary Analysis Infrastructure
  – Vine: Static Analysis Component
  – TEMU: Dynamic Analysis Component
  – Rudder: Mixed Execution Component

Page 4

BitBlaze: Security Solutions via Program Binary Analysis

Figure: the BitBlaze Binary Analysis Infrastructure supporting Dissecting Malware, Detecting Vulnerabilities, and Generating Filters

• Unified platform to accurately analyze security properties of binaries
• Security evaluation & audit of third-party code
• Defense against morphing threats
• Faster & deeper analysis of malware

Page 5

The BitBlaze Approach & Research Foci

Semantics-based, focus on root cause: automatically extracting security-related properties from binary code for effective vulnerability detection & defense

1. Build a unified binary analysis platform for security
   – Identify & cater to common needs of different security applications
   – Leverage recent advances in program analysis, formal methods, and binary instrumentation/analysis techniques for new capabilities
2. Solve real-world security problems via binary analysis
   • Extracting security-related models for vulnerability detection
   • Generating vulnerability signatures to filter out exploits
   • Dissecting malware for real-time diagnosis & offense: e.g., botnet infiltration
   • More than a dozen security applications & publications

Page 6

Plans

• Building on BitBlaze to develop new techniques
• Automatic reverse engineering of botnet C&C protocols
• Automatic rewriting of botnet traffic to facilitate botnet infiltration
• Vulnerability discovery in botnets

Page 7

Preliminary Work

• Dispatcher: Enabling Active Botnet Infiltration using Automatic Protocol Reverse-Engineering

• Binary code extraction and interface identification for botnet traffic rewriting

• Botnet analysis for vulnerability discovery

Page 8

Dispatcher: Enabling Active Botnet Infiltration using Automatic Protocol Reverse-Engineering

Juan Caballero, Pongsin Poosankam, Christian Kreibich, Dawn Song

Page 9

Automatic Protocol Reverse-Engineering

• Process of extracting the application-level protocol used by a program, without access to the specification
  – Automatic process
  – Many undocumented protocols (C&C, Skype, Yahoo)
• Encompasses extracting:
  1. the Protocol Grammar
  2. the Protocol State Machine
• Message format extraction is a prerequisite

Page 10

Challenges for Active Botnet Infiltration

1. Understand both sides of the C&C protocol
   – Message structure
   – Field semantics
2. Access to one side of the dialog only
3. Handle encryption/obfuscation

• Goal: Rewrite C&C messages on either side of the dialog

Page 11

Technical Contributions

1. Buffer deconstruction, a technique to extract the format of sent messages (earlier work only handles received messages)
2. Field semantics inference techniques, for messages sent and received
3. Designing and developing Dispatcher
4. Extending a technique to handle encryption
5. Rewriting a botnet dialog using information extracted by Dispatcher

Page 12

Message Format Extraction
• Extract format of a single message
• Required by Grammar and State Machine extraction

GET / HTTP/1.1 [Polyglot]
HTTP/1.1 200 OK [Dispatcher]

Page 13

Message Field Tree

HTTP/1.1 200 OK\r\n\r\n

MSG [0:18]
  Status Line [0:16]
    Version [0:7]
    Delimiter [8:8]
    Status-Code [9:11]
    Delimiter [12:12]
    Reason [13:14]
    Delimiter [15:16]
  Delimiter [17:18]

Example field attributes (callout): Field Range: [3:3]; Field Boundary: Fixed; Field Semantics: Delimiter; Field Keywords: <none>; Target: Version

Message format extraction has 2 steps:
1. Extract tree structure
2. Extract field attributes
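As a concrete illustration of the message field tree above, the following Python sketch encodes the HTTP/1.1 200 OK example as nested nodes with inclusive [start:end] ranges and checks that each node's children cover its range exactly, in order. The names FieldNode, check and leaves are hypothetical and not taken from Dispatcher; only the field names and ranges come from the slide.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class FieldNode:
    """One node of a message field tree; [start:end] is an inclusive byte range."""
    name: str
    start: int
    end: int
    semantics: Optional[str] = None  # e.g. "Delimiter"
    children: List["FieldNode"] = field(default_factory=list)

    def value(self, msg: bytes) -> bytes:
        # Inclusive end offset, so slice up to end + 1.
        return msg[self.start:self.end + 1]

    def check(self) -> bool:
        """True if the children cover this node's range exactly, in order, with no gaps."""
        if not self.children:
            return True
        pos = self.start
        for child in self.children:
            if child.start != pos or not child.check():
                return False
            pos = child.end + 1
        return pos == self.end + 1


def leaves(node: FieldNode) -> Iterator[FieldNode]:
    """Yield the leaf fields from left to right."""
    if not node.children:
        yield node
    for child in node.children:
        yield from leaves(child)


MSG = b"HTTP/1.1 200 OK\r\n\r\n"

tree = FieldNode("MSG", 0, 18, children=[
    FieldNode("Status Line", 0, 16, children=[
        FieldNode("Version", 0, 7),
        FieldNode("Delimiter", 8, 8, semantics="Delimiter"),
        FieldNode("Status-Code", 9, 11),
        FieldNode("Delimiter", 12, 12, semantics="Delimiter"),
        FieldNode("Reason", 13, 14),
        FieldNode("Delimiter", 15, 16, semantics="Delimiter"),
    ]),
    FieldNode("Delimiter", 17, 18, semantics="Delimiter"),
])

if __name__ == "__main__":
    assert tree.check()
    for leaf in leaves(tree):
        print(f"{leaf.name:12} [{leaf.start}:{leaf.end}] {leaf.value(MSG)!r}")
```

Running it prints each leaf field with its range and the bytes it covers; the tiling check captures the property that a node's range is the concatenation of its children's ranges.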

Page 14

Sent vs. Received

• Both protocol directions from a single binary
• Different problems
  – Taint information harder to leverage
  – Focus on how the message is constructed, not processed
• Different techniques needed:
  – Tree structure: Buffer Deconstruction
  – Field attributes: new heuristics

Page 15

Outline

• Introduction
• Problem
• Techniques
  – Buffer Deconstruction
  – Field Semantics Inference
  – Handling encryption
• Evaluation

Page 16

Buffer Deconstruction
• Intuition
  – Programs keep fields in separate memory buffers
  – They combine those buffers to construct the sent message
• Output buffer
  – Holds the message when the “send” function is invoked
  – Or holds the unencrypted message before encryption
• Recursive process
  – Decompose a buffer into the buffers used to fill it
  – Start with the output buffer
  – Stop when there is nothing left to decompose

Page 17

Buffer Deconstruction

• Message field tree = inverse of the output buffer structure
• Output is the structure of the message field tree
  – No field attributes, except range

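Figure: deconstructing the output buffer for HTTP/1.1 200 OK\r\n\r\n (buffer sizes in bytes; ranges are offsets in the message)
Output Buffer (19) [0:18] → MSG
  A (17) [0:16] → Status Line
    C (8) [0:7] → Version
    D (1) [8:8] → Delimiter
    E (3) [9:11] → Status-Code
    F (1) [12:12] → Delimiter
    G (2) [13:14] → Reason
    H (2) [15:16] → Delimiter
  B (2) [17:18] → Delimiter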
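The recursive process can be sketched in Python, assuming a hypothetical trace that records, for each buffer write, which source buffer filled which destination offset. The sketch further assumes that every source buffer is copied whole and that no buffer is filled from itself. Dispatcher derives this information from its own dynamic analysis; the Copy record and the deconstruct function are illustrative names only. The example trace reproduces buffers A to H from the HTTP/1.1 200 OK figure.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Copy:
    """Hypothetical trace record: dst[dst_off : dst_off + length] was filled from src."""
    dst: str
    dst_off: int
    src: str
    length: int


def deconstruct(buf: str, base: int, size: int, copies: List[Copy]) -> dict:
    """Recursively decompose `buf` into the buffers used to fill it.

    `base` is where `buf` ends up in the output buffer, so each node carries
    its absolute (inclusive) range in the sent message.
    """
    node = {"buffer": buf, "range": (base, base + size - 1), "children": []}
    fills = sorted((c for c in copies if c.dst == buf), key=lambda c: c.dst_off)
    for c in fills:
        # Assumes the whole source buffer was copied (c.length == size of src).
        node["children"].append(deconstruct(c.src, base + c.dst_off, c.length, copies))
    return node  # a buffer with no recorded fills is a leaf: nothing left to recurse


def show(node: dict, depth: int = 0) -> None:
    start, end = node["range"]
    print("  " * depth + f"{node['buffer']} [{start}:{end}]")
    for child in node["children"]:
        show(child, depth + 1)


# Trace matching the figure: buffers A..H assembled into the 19-byte output buffer.
copies = [
    Copy("out", 0, "A", 17), Copy("out", 17, "B", 2),
    Copy("A", 0, "C", 8), Copy("A", 8, "D", 1), Copy("A", 9, "E", 3),
    Copy("A", 12, "F", 1), Copy("A", 13, "G", 2), Copy("A", 15, "H", 2),
]

if __name__ == "__main__":
    show(deconstruct("out", 0, 19, copies))
```

The ranges fall out of the recursion by adding each copy's destination offset to its parent's base, which is why the output tree mirrors the message field tree with ranges as its only attribute.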

Page 18

Field Attributes Inference

• Attributes capture extra information
  – E.g., inter-field relationships

Attribute          Value
Field Range        [StartOffset : EndOffset]
Field Boundary     Fixed, Length, Delimiter
Field Semantics    IP address, Timestamp, …
Field Keywords     <list of keywords in field>

• Techniques identify
  – Keywords
  – Length fields
  – Delimiters
  – Variable-length fields
  – Arrays
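The slide does not say how each attribute is detected. As a hedged illustration of one item, length fields, the Python sketch below uses a deliberately naive value-matching heuristic: an unsigned integer at some offset is reported if its value equals the number of bytes that follow it. This is a swapped-in technique for illustration, not Dispatcher's method; the function name and the 1/2/4-byte widths are assumptions.

```python
from typing import List, Sequence, Tuple


def find_length_fields(msg: bytes, widths: Sequence[int] = (1, 2, 4)) -> List[Tuple[int, int, str]]:
    """Naive heuristic: report (offset, width, byteorder) for every unsigned
    integer whose value equals the number of bytes that follow it."""
    hits = []
    for width in widths:
        for off in range(len(msg) - width + 1):
            remaining = len(msg) - (off + width)
            if remaining == 0:
                continue  # nothing follows, so this cannot be a length prefix
            for order in ("big", "little"):
                if width == 1 and order == "little":
                    continue  # byte order is irrelevant for a single byte
                if int.from_bytes(msg[off:off + width], order) == remaining:
                    hits.append((off, width, order))
    return hits


if __name__ == "__main__":
    # 2-byte big-endian length prefix followed by 5 payload bytes.
    print(find_length_fields(b"\x00\x05hello"))
```

On b"\x00\x05hello" the sketch reports both the 2-byte big-endian field at offset 0 and its low byte at offset 1. That ambiguity is why value matching alone is weak, and why analyses that observe how a value was computed from the target field are preferable.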

Page 19

Field Semantics

Field semantics:
Cookies              Keyboard input
Error codes          Keywords
File data            Length
File information     Padding
Filenames            Ports
Hash / Checksum      Registry data
Hostnames            Sleep timers
Host information     Stored data
IP addresses         Timestamps

• A field attribute in the message field tree
• Captures the type of data in the field
• Programs contain much semantic information: leverage it!
• Semantics in well-defined functions and instructions
  – Prototype
• Similar to type inference
• Differs for received and sent messages

Page 20

Field Semantic Inference

Received:  GET /index.html HTTP/1.1          (index.html → File path)
Program:   stat("index.html", &file_info);
           int stat(const char *path, struct stat *buf);   path: IN, buf: OUT, return value: OUT
           struct stat { … off_t st_size; /* total size in bytes */ … }
Sent:      HTTP/1.1 200 OK
           Content-Length: 25                  (25 → File length)
           <html>Hello world!</html>
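The stat() example can be expressed as a lookup from annotated function prototypes to field semantics. In this Python sketch the data-flow records (which received bytes reached stat's path argument, and which sent bytes came from st_size) are supplied by hand; in practice they would come from dynamic taint tracking, which is not implemented here. SEMANTICS, Flow and infer_semantics are hypothetical names, and the sent message adds the CRLF separators that the slide omits.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical annotations derived from the prototype
#   int stat(const char *path, struct stat *buf);
# "path" is an IN parameter; st_size is a field of the OUT parameter buf.
SEMANTICS: Dict[Tuple[str, str], str] = {
    ("stat", "path"): "File path",
    ("stat", "st_size"): "File length",
}


@dataclass
class Flow:
    """Hypothetical data-flow record: bytes [start:end] (inclusive) of message
    `msg` reached, or were produced from, parameter `param` of `func`."""
    func: str
    param: str
    msg: str
    start: int
    end: int


def infer_semantics(flows: List[Flow]) -> Dict[Tuple[str, int, int], str]:
    """Label each message byte range with the semantics of the parameter it touched."""
    labels = {}
    for f in flows:
        sem = SEMANTICS.get((f.func, f.param))
        if sem is not None:
            labels[(f.msg, f.start, f.end)] = sem
    return labels


received = b"GET /index.html HTTP/1.1"
sent = b"HTTP/1.1 200 OK\r\nContent-Length: 25\r\n\r\n<html>Hello world!</html>"

# Supplied by hand here; a real tool would obtain these from taint tracking.
p = received.index(b"index.html")
n = sent.index(b"25")
flows = [
    Flow("stat", "path", "received", p, p + len(b"index.html") - 1),
    Flow("stat", "st_size", "sent", n, n + 1),
]

if __name__ == "__main__":
    for (msg, start, end), sem in infer_semantics(flows).items():
        print(f"{msg}[{start}:{end}] -> {sem}")
```

The same table serves both directions: an IN parameter labels bytes of a received message that flow into the call, while a field of an OUT parameter labels bytes of a sent message that were derived from it.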

Page 21

Detecting Encoding Functions

• Encoding functions = (de)compression, (en/de)cryption, (de)obfuscation, …
• High ratio of arithmetic & bitwise instructions
• Use read/write set to identify buffers
• Work in progress on extracting and reusing encoding functions
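The ratio heuristic can be sketched as follows. The set of x86 mnemonics counted as arithmetic/bitwise, the 0.55 threshold and the per-function trace format are all assumptions chosen for illustration, since the slide gives no concrete values; the read/write-set step for locating buffers is not modeled.

```python
from typing import Dict, List

# Assumed set of x86 mnemonics counted as arithmetic or bitwise.
ARITH_BITWISE = {
    "add", "sub", "adc", "sbb", "mul", "imul", "div", "idiv", "inc", "dec", "neg",
    "and", "or", "xor", "not", "shl", "shr", "sar", "rol", "ror",
}


def arith_ratio(mnemonics: List[str]) -> float:
    """Fraction of executed instructions that are arithmetic or bitwise."""
    if not mnemonics:
        return 0.0
    hits = sum(1 for m in mnemonics if m.lower() in ARITH_BITWISE)
    return hits / len(mnemonics)


def likely_encoders(trace: Dict[str, List[str]], threshold: float = 0.55) -> List[str]:
    """Names of functions whose arithmetic/bitwise ratio reaches the (assumed) threshold."""
    return sorted(name for name, ms in trace.items() if arith_ratio(ms) >= threshold)


# Hypothetical per-function instruction traces.
trace = {
    "decrypt_loop": ["mov", "xor", "rol", "add", "xor", "inc", "cmp", "jne"],
    "build_message": ["mov", "push", "call", "mov", "add", "ret"],
}

if __name__ == "__main__":
    for name in likely_encoders(trace):
        print(name, round(arith_ratio(trace[name]), 3))
```

Here decrypt_loop has 5 arithmetic/bitwise instructions out of 8 (0.625) and is flagged, while build_message (1 of 6) is not; a real detector would tune the instruction classes and threshold against known encoders.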

Page 22

MegaD C&C protocol

type MegaD_Message = record {
    msg_len           : uint16;
    encrypted_payload : bytestring &length = 8*msg_len;
} &byteorder = bigendian;

type encrypted_payload = record {
    version : uint16;
    mtype   : uint16;
    data    : MegaD_data(mtype);
};

type MegaD_data(msg_type: uint16) = case msg_type of {
    0x00    -> m00     : msg_0;
    […]
    default -> unknown : bytestring &restofdata;
};

• C&C on tcp/443 using proprietary encryption
• Use Dispatcher’s output to generate grammar
  – 15 different messages seen (7 received, 8 sent)
  – 11 field semantics
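The framing layer of the MegaD grammar can be parsed directly: a big-endian uint16 msg_len followed by 8*msg_len payload bytes. The Python sketch below splits a TCP byte stream accordingly and parses the inner version/mtype header. It does not implement MegaD's proprietary decryption; the example input pretends the payload is already plaintext, and big-endian byte order for the inner record is an assumption (the grammar states it only for MegaD_Message). The function names are hypothetical.

```python
import struct
from typing import List, Tuple


def split_megad_stream(stream: bytes) -> Tuple[List[bytes], bytes]:
    """Split a TCP byte stream into MegaD_Message payloads.

    Framing per the grammar: uint16 msg_len (big-endian), then
    8 * msg_len bytes of encrypted_payload. Returns the complete payloads
    and any trailing bytes of an incomplete message.
    """
    payloads = []
    pos = 0
    while len(stream) - pos >= 2:
        (msg_len,) = struct.unpack_from(">H", stream, pos)
        total = 2 + 8 * msg_len
        if len(stream) - pos < total:
            break  # incomplete message: wait for more data
        payloads.append(stream[pos + 2:pos + total])
        pos += total
    return payloads, stream[pos:]


def parse_decrypted_payload(plain: bytes) -> dict:
    """Parse the encrypted_payload record AFTER decryption (not implemented here).
    Big-endian is assumed for the inner record as well."""
    version, mtype = struct.unpack_from(">HH", plain, 0)
    return {"version": version, "mtype": mtype, "data": plain[4:]}


if __name__ == "__main__":
    # One 8-byte payload (msg_len = 1) followed by the start of a second message.
    payload = b"\x00\x01\x00\x00ABCD"  # pretend-plaintext: version 1, mtype 0
    stream = b"\x00\x01" + payload + b"\x00\x02\xff"
    msgs, rest = split_megad_stream(stream)
    print([parse_decrypted_payload(m) for m in msgs], rest)
```

Dispatching on mtype (the MegaD_data case) would be the next step; it is omitted because the slide elides the individual message types.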

Page 23

MegaD Dialog

Figure: MegaD dialog involving the C&C Server and an SMTP Test Server (messages shown: Cmd?, Test SMTP, EHLO, Failed)

Page 24

MegaD Rewriting

Figure: rewritten MegaD dialog involving the C&C Server, a Template Server, and an SMTP Test Server (labels shown: Cmd?, Test SMTP, EHLO, Failed, Success, Get Template, Template?, Grammar)

Page 25

Summary

• Buffer deconstruction, a technique to extract the format of sent messages

• Field semantics inference techniques, for messages sent and received

• Designed and developed Dispatcher
• Extended technique to handle encryption
• Rewrote MegaD dialog using information extracted by Dispatcher