Binary Analysis for Grammar and Model Extraction: Techniques and ...
![Page 1: Binary Analysis for Grammar and Model Extraction: Techniques and ...](https://reader033.fdocuments.in/reader033/viewer/2022061302/5491d3cab4795976558b46ee/html5/thumbnails/1.jpg)
Binary Analysis for Botnet Reverse Engineering & Defense
Dawn Song
UC Berkeley
![Page 2: Binary Analysis for Grammar and Model Extraction: Techniques and ...](https://reader033.fdocuments.in/reader033/viewer/2022061302/5491d3cab4795976558b46ee/html5/thumbnails/2.jpg)
Binary Analysis Is Important for Botnet Defense
• Botnet programs: no source code, only binary
• Botnet defense needs internal understanding of botnet programs
  – C&C reverse engineering
    • Different possible commands, encryption/decryption
  – Botnet traffic rewriting
  – Botnet infiltration
  – Botnet vulnerability discovery
![Page 3: Binary Analysis for Grammar and Model Extraction: Techniques and ...](https://reader033.fdocuments.in/reader033/viewer/2022061302/5491d3cab4795976558b46ee/html5/thumbnails/3.jpg)
BitBlaze Binary Analysis Infrastructure: Architecture
• The first infrastructure:
  – Novel fusion of static, dynamic, formal analysis methods
    • Loop extended symbolic execution
    • Grammar-aware symbolic execution
  – Whole system analysis (including OS kernel)
  – Analyzing packed/encrypted/obfuscated code
Vine: Static Analysis Component
TEMU: Dynamic Analysis Component
Rudder: Mixed Execution Component
BitBlaze Binary Analysis Infrastructure
![Page 4: Binary Analysis for Grammar and Model Extraction: Techniques and ...](https://reader033.fdocuments.in/reader033/viewer/2022061302/5491d3cab4795976558b46ee/html5/thumbnails/4.jpg)
Dissecting Malware
BitBlaze Binary Analysis Infrastructure
Detecting Vulnerabilities
Generating Filters
BitBlaze: Security Solutions via Program Binary Analysis
Unified platform to accurately analyze security properties of binaries
Security evaluation & audit of third-party code
Defense against morphing threats
Faster & deeper analysis of malware
![Page 5: Binary Analysis for Grammar and Model Extraction: Techniques and ...](https://reader033.fdocuments.in/reader033/viewer/2022061302/5491d3cab4795976558b46ee/html5/thumbnails/5.jpg)
The BitBlaze Approach & Research Foci
Semantics based, focus on root cause: automatically extracting security-related properties from binary code for effective vulnerability detection & defense
1. Build a unified binary analysis platform for security
  – Identify & cater to common needs of different security applications
  – Leverage recent advances in program analysis, formal methods, binary instrumentation/analysis techniques for new capabilities
2. Solve real-world security problems via binary analysis
  • Extracting security-related models for vulnerability detection
  • Generating vulnerability signatures to filter out exploits
  • Dissecting malware for real-time diagnosis & offense: e.g., botnet infiltration
  • More than a dozen security applications & publications
![Page 6: Binary Analysis for Grammar and Model Extraction: Techniques and ...](https://reader033.fdocuments.in/reader033/viewer/2022061302/5491d3cab4795976558b46ee/html5/thumbnails/6.jpg)
Plans
• Building on BitBlaze to develop new techniques
• Automatic reverse engineering of C&C protocols of botnets
• Automatic rewriting of botnet traffic to facilitate botnet infiltration
• Vulnerability discovery of botnets
![Page 7: Binary Analysis for Grammar and Model Extraction: Techniques and ...](https://reader033.fdocuments.in/reader033/viewer/2022061302/5491d3cab4795976558b46ee/html5/thumbnails/7.jpg)
Preliminary Work
• Dispatcher: Enabling Active Botnet Infiltration using Automatic Protocol Reverse-Engineering
• Binary code extraction and interface identification for botnet traffic rewriting
• Botnet analysis for vulnerability discovery
![Page 8: Binary Analysis for Grammar and Model Extraction: Techniques and ...](https://reader033.fdocuments.in/reader033/viewer/2022061302/5491d3cab4795976558b46ee/html5/thumbnails/8.jpg)
Dispatcher: Enabling Active Botnet Infiltration using Automatic Protocol Reverse-Engineering
Juan Caballero
Pongsin Poosankam
Christian Kreibich
Dawn Song
![Page 9: Binary Analysis for Grammar and Model Extraction: Techniques and ...](https://reader033.fdocuments.in/reader033/viewer/2022061302/5491d3cab4795976558b46ee/html5/thumbnails/9.jpg)
Automatic Protocol Reverse-Engineering
• Process of extracting the application-level protocol used by a program, without the specification
  – Automatic process
  – Many undocumented protocols (C&C, Skype, Yahoo)
• Encompasses extracting:
  1. the Protocol Grammar
  2. the Protocol State Machine
• Message format extraction is a prerequisite
![Page 10: Binary Analysis for Grammar and Model Extraction: Techniques and ...](https://reader033.fdocuments.in/reader033/viewer/2022061302/5491d3cab4795976558b46ee/html5/thumbnails/10.jpg)
Challenges for Active Botnet Infiltration
1. Understand both sides of C&C protocol
  – Message structure
  – Field semantics
2. Access to one side of dialog only
3. Handle encryption/obfuscation
• Goal: Rewrite C&C messages on either dialog side
![Page 11: Binary Analysis for Grammar and Model Extraction: Techniques and ...](https://reader033.fdocuments.in/reader033/viewer/2022061302/5491d3cab4795976558b46ee/html5/thumbnails/11.jpg)
Technical Contributions
1. Buffer deconstruction, a technique to extract the format of sent messages
  – Earlier work only handles received messages
2. Field semantics inference techniques, for messages sent and received
3. Designing and developing Dispatcher
4. Extending a technique to handle encryption
5. Rewriting a botnet dialog using information extracted by Dispatcher
![Page 12: Binary Analysis for Grammar and Model Extraction: Techniques and ...](https://reader033.fdocuments.in/reader033/viewer/2022061302/5491d3cab4795976558b46ee/html5/thumbnails/12.jpg)
Message Format Extraction
• Extract format of a single message
• Required by Grammar and State Machine extraction
GET / HTTP/1.1
HTTP/1.1 200 OK
[Polyglot]
[Dispatcher]
![Page 13: Binary Analysis for Grammar and Model Extraction: Techniques and ...](https://reader033.fdocuments.in/reader033/viewer/2022061302/5491d3cab4795976558b46ee/html5/thumbnails/13.jpg)
Message Field Tree
Example attributes of one tree node:
  Field Range: [3:3]
  Field Boundary: Fixed
  Field Semantics: Delimiter
  Field Keywords: <none>
  Target: Version

HTTP/1.1 200 OK\r\n\r\n

MSG [0:18]
  Status Line [0:16]
    Version [0:7]
    Delimiter [8:8]
    Status-Code [9:11]
    Delimiter [12:12]
    Reason [13:14]
    Delimiter [15:16]
  Delimiter [17:18]

Message format extraction has 2 steps:
1. Extract tree structure
2. Extract field attributes
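The message field tree for the HTTP response on this slide can be written down as a small data structure. The sketch below is only an illustration: the `Field` class, `covers_children`, `leaves`, and `field_bytes` are hypothetical names, not part of Dispatcher. It encodes the ranges shown above (inclusive byte offsets) and checks that each node's children tile the parent's range without gaps or overlaps.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Field:
    """One node of a message field tree (hypothetical representation)."""
    name: str
    start: int  # first byte offset, inclusive
    end: int    # last byte offset, inclusive
    children: List["Field"] = field(default_factory=list)

    def covers_children(self) -> bool:
        """True if the children tile this node's range with no gaps or overlaps."""
        if not self.children:
            return True
        pos = self.start
        for child in self.children:
            if child.start != pos or not child.covers_children():
                return False
            pos = child.end + 1
        return pos == self.end + 1

    def leaves(self) -> Iterator["Field"]:
        """Yield the leaf fields from left to right."""
        if not self.children:
            yield self
        for child in self.children:
            yield from child.leaves()


MESSAGE = b"HTTP/1.1 200 OK\r\n\r\n"

# Ranges copied from the slide's tree.
tree = Field("MSG", 0, 18, [
    Field("Status Line", 0, 16, [
        Field("Version", 0, 7),
        Field("Delimiter", 8, 8),
        Field("Status-Code", 9, 11),
        Field("Delimiter", 12, 12),
        Field("Reason", 13, 14),
        Field("Delimiter", 15, 16),
    ]),
    Field("Delimiter", 17, 18),
])


def field_bytes(f: Field) -> bytes:
    """Return the message bytes covered by a field."""
    return MESSAGE[f.start:f.end + 1]


for leaf in tree.leaves():
    print(f"{leaf.name:12} [{leaf.start}:{leaf.end}] {field_bytes(leaf)!r}")
```

Step 1 of the extraction corresponds to filling in the nodes and ranges; step 2 would attach further attributes (boundary, semantics, keywords) to each node.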
![Page 14: Binary Analysis for Grammar and Model Extraction: Techniques and ...](https://reader033.fdocuments.in/reader033/viewer/2022061302/5491d3cab4795976558b46ee/html5/thumbnails/14.jpg)
Sent vs. Received
• Both protocol directions from single binary
• Different problems
  – Taint information harder to leverage
  – Focus on how message is constructed, not processed
• Different techniques needed:
  – Tree structure → Buffer Deconstruction
  – Field attributes → New heuristics
![Page 15: Binary Analysis for Grammar and Model Extraction: Techniques and ...](https://reader033.fdocuments.in/reader033/viewer/2022061302/5491d3cab4795976558b46ee/html5/thumbnails/15.jpg)
Outline
Introduction
Problem
Techniques
  Buffer Deconstruction
  Field Semantics Inference
  Handling encryption
Evaluation
![Page 16: Binary Analysis for Grammar and Model Extraction: Techniques and ...](https://reader033.fdocuments.in/reader033/viewer/2022061302/5491d3cab4795976558b46ee/html5/thumbnails/16.jpg)
Buffer Deconstruction
• Intuition
  – Programs keep fields in separate memory buffers
  – Combine those buffers to construct sent message
• Output buffer
  – Holds message when “send” function invoked
  – Or holds unencrypted message before encryption
• Recursive process
  – Decompose a buffer into buffers used to fill it
  – Starts with output buffer
  – Stops when there’s nothing to recurse
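The recursive process can be sketched in a few lines. This is a minimal illustration, not Dispatcher's implementation: it assumes a hypothetical "copy log" that records, for each destination buffer, which source buffers were copied into it and at what offset. In practice that information would have to be recovered from an execution trace. The buffer names and sizes (Output, A to H) follow the HTTP example on the next slide.

```python
from typing import Dict, List, Tuple

# Hypothetical copy log: destination buffer -> list of
# (offset in destination, source buffer name, length in bytes).
CopyLog = Dict[str, List[Tuple[int, str, int]]]


def deconstruct(buf: str, size: int, copies: CopyLog, base: int = 0) -> dict:
    """Recursively decompose `buf` into the buffers used to fill it.

    `base` is the offset of `buf` within the output buffer, so every node's
    range is expressed in message offsets (inclusive on both ends).
    """
    node = {"buffer": buf, "range": (base, base + size - 1), "children": []}
    # Recursion stops naturally when no copies into `buf` were recorded.
    for offset, src, length in sorted(copies.get(buf, [])):
        node["children"].append(deconstruct(src, length, copies, base + offset))
    return node


copies: CopyLog = {
    "Output": [(0, "A", 17), (17, "B", 2)],
    "A": [(0, "C", 8), (8, "D", 1), (9, "E", 3),
          (12, "F", 1), (13, "G", 2), (15, "H", 2)],
}

tree = deconstruct("Output", 19, copies)


def show(node: dict, depth: int = 0) -> None:
    """Print the tree with one buffer per line."""
    start, end = node["range"]
    print("  " * depth + f"{node['buffer']} [{start}:{end}]")
    for child in node["children"]:
        show(child, depth + 1)


show(tree)
```

The result carries only ranges; field attributes such as boundaries and semantics would be inferred in a separate step.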
![Page 17: Binary Analysis for Grammar and Model Extraction: Techniques and ...](https://reader033.fdocuments.in/reader033/viewer/2022061302/5491d3cab4795976558b46ee/html5/thumbnails/17.jpg)
Buffer Deconstruction
• Message field tree = inverse of output buffer structure
• Output is structure of message field tree
  – No field attributes, except range

Output buffer structure (buffer sizes in bytes) and the corresponding message field tree for HTTP/1.1 200 OK\r\n\r\n:

Output Buffer (19)    MSG [0:18]
  A(17)               Status Line [0:16]
    C(8)              Version [0:7]
    D(1)              Delimiter [8:8]
    E(3)              Status-Code [9:11]
    F(1)              Delimiter [12:12]
    G(2)              Reason [13:14]
    H(2)              Delimiter [15:16]
  B(2)                Delimiter [17:18]
![Page 18: Binary Analysis for Grammar and Model Extraction: Techniques and ...](https://reader033.fdocuments.in/reader033/viewer/2022061302/5491d3cab4795976558b46ee/html5/thumbnails/18.jpg)
Field Attributes Inference
• Attributes capture extra information
  – E.g., inter-field relationships

Attribute         Value
Field Range       [StartOffset : EndOffset]
Field Boundary    Fixed, Length, Delimiter
Field Semantics   IP address, Timestamp, …
Field Keywords    <list of keywords in field>

• Techniques identify
  – Keywords
  – Length fields
  – Delimiters
  – Variable-length fields
  – Arrays
![Page 19: Binary Analysis for Grammar and Model Extraction: Techniques and ...](https://reader033.fdocuments.in/reader033/viewer/2022061302/5491d3cab4795976558b46ee/html5/thumbnails/19.jpg)
Field Semantics
Field semantics identified:
  Cookies              Keyboard input
  Error codes          Keywords
  File data            Length
  File information     Padding
  Filenames            Ports
  Hash / Checksum      Registry data
  Hostnames            Sleep timers
  Host information     Stored data
  IP addresses         Timestamps

• A field attribute in the message field tree
• Captures the type of data in the field
• Programs contain much semantic info: leverage it!
• Semantics in well-defined functions and instructions
  – Prototype
• Similar to type inference
• Differs for received and sent messages
![Page 20: Binary Analysis for Grammar and Model Extraction: Techniques and ...](https://reader033.fdocuments.in/reader033/viewer/2022061302/5491d3cab4795976558b46ee/html5/thumbnails/20.jpg)
Field Semantic Inference
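Received message:
  GET /index.html HTTP/1.1
  File path: /index.html

Function call made by the program:
  stat("index.html", &file_info);

Prototype, with parameter directions:
  int stat(const char *path, struct stat *buf);
    path: IN
    buf: OUT
    return value: OUT
  struct stat { … off_t st_size; /* total size in bytes */ … }

Sent message:
  HTTP/1.1 200 OK
  Content-Length: 25

  <html>Hello world!</html>
  File length: 25 (the Content-Length value)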
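The idea on this slide can be sketched as a lookup from function prototypes to field semantics. Everything below is an assumption made for illustration: the `PROTOTYPES` table, the `ArgFlow` record, and the premise that some data-flow analysis has already linked each parameter to a byte range of a received or sent message. Input parameters label fields of received messages; output parameters label fields of sent messages. The `\r\n` line endings in the response are also assumed.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Range = Tuple[int, int]  # inclusive byte offsets within a message

# Hypothetical table of known parameter semantics, keyed by function name.
# Each entry maps a parameter to (direction, semantics).
PROTOTYPES: Dict[str, Dict[str, Tuple[str, str]]] = {
    "stat": {
        "path": ("in", "File path"),
        "buf.st_size": ("out", "File length"),
    },
}


@dataclass
class ArgFlow:
    """A link between a function parameter and message bytes (assumed input)."""
    function: str
    param: str
    message: str       # "received" or "sent"
    byte_range: Range  # bytes the parameter was derived from or flowed into


def infer_semantics(flows: List[ArgFlow]) -> Dict[Tuple[str, Range], str]:
    """Label message byte ranges with the semantics of matching parameters."""
    labels: Dict[Tuple[str, Range], str] = {}
    for flow in flows:
        entry = PROTOTYPES.get(flow.function, {}).get(flow.param)
        if entry is None:
            continue
        direction, semantics = entry
        expected = "received" if direction == "in" else "sent"
        if flow.message == expected:
            labels[(flow.message, flow.byte_range)] = semantics
    return labels


def span(message: bytes, token: bytes) -> Range:
    """Inclusive range of the first occurrence of `token` in `message`."""
    start = message.index(token)
    return (start, start + len(token) - 1)


request = b"GET /index.html HTTP/1.1"
response = (b"HTTP/1.1 200 OK\r\nContent-Length: 25\r\n\r\n"
            b"<html>Hello world!</html>")

flows = [
    ArgFlow("stat", "path", "received", span(request, b"index.html")),
    ArgFlow("stat", "buf.st_size", "sent", span(response, b"25")),
]

for (direction, byte_range), semantics in infer_semantics(flows).items():
    print(direction, byte_range, semantics)
```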
![Page 21: Binary Analysis for Grammar and Model Extraction: Techniques and ...](https://reader033.fdocuments.in/reader033/viewer/2022061302/5491d3cab4795976558b46ee/html5/thumbnails/21.jpg)
Detecting Encoding Functions
• Encoding functions = (de)compression, (en/de)cryption, (de)obfuscation…
• High ratio of arithmetic & bitwise instructions
• Use read/write set to identify buffers
• Work-in-progress on extracting and reusing encoding functions
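The instruction-ratio heuristic could look roughly like the sketch below. It assumes each function is available as a list of x86 mnemonics from a disassembly or trace. The mnemonic set and the two thresholds (`min_instructions`, `min_ratio`) are illustrative choices, not values stated on the slide.

```python
from typing import Sequence

# Illustrative set of x86 arithmetic and bitwise mnemonics (an assumption).
ARITH_BITWISE = {
    "add", "sub", "mul", "imul", "div", "idiv", "inc", "dec", "neg",
    "and", "or", "xor", "not", "shl", "shr", "sar", "rol", "ror",
}


def encoding_ratio(mnemonics: Sequence[str]) -> float:
    """Fraction of instructions that are arithmetic or bitwise."""
    if not mnemonics:
        return 0.0
    hits = sum(1 for m in mnemonics if m.lower() in ARITH_BITWISE)
    return hits / len(mnemonics)


def looks_like_encoding(mnemonics: Sequence[str],
                        min_instructions: int = 20,
                        min_ratio: float = 0.55) -> bool:
    """Flag a function as a candidate encoding function.

    Both thresholds are hypothetical defaults chosen for this example.
    """
    return (len(mnemonics) >= min_instructions
            and encoding_ratio(mnemonics) >= min_ratio)


# A made-up XOR/rotate loop body versus a made-up copy/dispatch routine.
xor_loop = ["mov", "xor", "rol", "add", "xor",
            "shr", "and", "mov", "add", "xor"] * 3
copy_routine = ["mov", "mov", "cmp", "jne", "push",
                "call", "pop", "ret", "add", "mov"] * 3

print(looks_like_encoding(xor_loop))      # candidate encoding function
print(looks_like_encoding(copy_routine))  # not a candidate
```

A minimum instruction count keeps very short functions, where the ratio is noisy, from being flagged.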
![Page 22: Binary Analysis for Grammar and Model Extraction: Techniques and ...](https://reader033.fdocuments.in/reader033/viewer/2022061302/5491d3cab4795976558b46ee/html5/thumbnails/22.jpg)
MegaD C&C protocol

type MegaD_Message = record {
    msg_len           : uint16;
    encrypted_payload : bytestring &length = 8*msg_len;
} &byteorder = bigendian;

type encrypted_payload = record {
    version : uint16;
    mtype   : uint16;
    data    : MegaD_data(mtype);
};

type MegaD_data(msg_type: uint16) = case msg_type of {
    0x00    -> m00 : msg_0;
    […]
    default -> unknown : bytestring &restofdata;
};

• C&C on tcp/443 using proprietary encryption
• Use Dispatcher’s output to generate grammar
  – 15 different messages seen (7 recv, 8 sent)
  – 11 field semantics
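To make the grammar concrete, here is a minimal parser sketch for the two outer records. It follows the layout above: a big-endian `uint16` length followed by `8*msg_len` payload bytes, then `version`, `mtype`, and the remaining data. Two things are assumptions: the `decrypt` function is an identity placeholder because the cipher is not described here, and the inner record is read as big-endian like the outer one. The per-type `MegaD_data` cases are left unparsed.

```python
import struct


def decrypt(payload: bytes) -> bytes:
    """Placeholder for the proprietary cipher (identity; an assumption)."""
    return payload


def parse_megad_message(raw: bytes) -> dict:
    """Parse MegaD_Message and the header of encrypted_payload."""
    if len(raw) < 2:
        raise ValueError("message too short for msg_len")
    (msg_len,) = struct.unpack_from(">H", raw, 0)
    payload_len = 8 * msg_len  # &length = 8*msg_len
    payload = raw[2:2 + payload_len]
    if payload_len < 4 or len(payload) != payload_len:
        raise ValueError("truncated or empty payload")
    clear = decrypt(payload)
    version, mtype = struct.unpack_from(">HH", clear, 0)
    return {
        "msg_len": msg_len,
        "version": version,
        "mtype": mtype,
        "data": clear[4:],  # MegaD_data(mtype), left unparsed
    }


# Made-up example bytes: one 8-byte block with version 0x0100, type 0x0000.
example = struct.pack(">H", 1) + struct.pack(">HH", 0x0100, 0x0000) + b"\x00" * 4
print(parse_megad_message(example))
```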
![Page 23: Binary Analysis for Grammar and Model Extraction: Techniques and ...](https://reader033.fdocuments.in/reader033/viewer/2022061302/5491d3cab4795976558b46ee/html5/thumbnails/23.jpg)
MegaD Dialog
[Diagram: message exchange between the bot, the C&C Server, and the SMTP Test Server. Message labels: Cmd?, Test SMTP, EHLO, Failed.]
![Page 24: Binary Analysis for Grammar and Model Extraction: Techniques and ...](https://reader033.fdocuments.in/reader033/viewer/2022061302/5491d3cab4795976558b46ee/html5/thumbnails/24.jpg)
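MegaD Rewriting
[Diagram: message exchange between the bot, the C&C Server, the SMTP Test Server, and the Template Server, with a Grammar component used for rewriting. Message labels: Cmd?, Test SMTP, EHLO, Failed, Success, Get Template, Template?.]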
![Page 25: Binary Analysis for Grammar and Model Extraction: Techniques and ...](https://reader033.fdocuments.in/reader033/viewer/2022061302/5491d3cab4795976558b46ee/html5/thumbnails/25.jpg)
Summary
• Buffer deconstruction, a technique to extract the format of sent messages
• Field semantics inference techniques, for messages sent and received
• Designed and developed Dispatcher
• Extended technique to handle encryption
• Rewrote MegaD dialog using information extracted by Dispatcher