Reversing Data Formats: What Data Can Reveal

46
Reversing Data Formats: What Data Can Reveal by Anton Dorfman ZeroNights 0x03, Moscow, 2013

description

Reversing Data Formats: What Data Can Reveal. by Anton Dorfman ZeroNights 0x03, Moscow, 2013. About me. Fan of & Fun with Assembly language Researcher Scientist Teach Reverse Engineering since 2001 Candidate of technical science. Observed topics. Samples Basics Thoughts Fields - PowerPoint PPT Presentation

Transcript of Reversing Data Formats: What Data Can Reveal

Page 1: Reversing Data Formats: What Data Can Reveal

Reversing Data Formats:What Data Can Reveal

by Anton Dorfman ZeroNights 0x03, Moscow, 2013

Page 2: Reversing Data Formats: What Data Can Reveal

About me

• Fan of & Fun with Assembly language• Researcher• Scientist• Teach Reverse Engineering since 2001• Candidate of technical science

Page 3: Reversing Data Formats: What Data Can Reveal

Observed topics

• Samples• Basics• Thoughts• Fields• Related works• Applications

Page 4: Reversing Data Formats: What Data Can Reveal

Samples

Page 5: Reversing Data Formats: What Data Can Reveal

What is it?

Page 6: Reversing Data Formats: What Data Can Reveal

What is it? Hint 1

Page 7: Reversing Data Formats: What Data Can Reveal

What is it? Hint 2

Page 8: Reversing Data Formats: What Data Can Reveal

What is it?

Page 9: Reversing Data Formats: What Data Can Reveal

What is it?

Page 10: Reversing Data Formats: What Data Can Reveal

What is it? Hint 1

Page 11: Reversing Data Formats: What Data Can Reveal

What is it? Hint 2

Page 12: Reversing Data Formats: What Data Can Reveal

Basics

Page 13: Reversing Data Formats: What Data Can Reveal

Standard way

• Hex editor• Researcher• Brain (equally important as the Hex editor)• Basic knowledge how data can be organized

(in brain)• Analysis of the executable file that

manipulates with data format

Page 14: Reversing Data Formats: What Data Can Reveal

3 ways of researching

• Dynamic analysis – use reverse engineering of application that manipulate with data format

• Static analysis – try to extract header, structures, fields and try to find relationship between data

• Statistic analysis – when have some amount of samples and use statistics of changes and ranges of values

Page 15: Reversing Data Formats: What Data Can Reveal

Breaking the rules

• There is the rule RTFM (Read The F**king Manual)

• Nobody likes it• I’m not exception

• First of all I start my research, and second – try to find related works and generalize ideas from them

Page 16: Reversing Data Formats: What Data Can Reveal

Definitions

• Field – some value used to describe Data Format

• Structure – way for organizing various fields• Data – information for representing which

Data Format is developed• Header – common structure before data, may

contain substructures

Page 17: Reversing Data Formats: What Data Can Reveal

Thoughts

Page 18: Reversing Data Formats: What Data Can Reveal

Main Idea

• Data format is developed by human (not pets or aliens)

• Sometimes looks like that no human works on it…• Format developers use common data organization

concepts and similar thoughts when creating new data formats

• If we find regularities in data format organization rules we can automate searching of them

Page 19: Reversing Data Formats: What Data Can Reveal

Three types of data format

• File formats – multimedia formats, database formats, internal formats for exchanging between program components and etc.

• Protocols – network protocols, hardware device interaction protocols, protocols of interaction between driver and user space application and etc.

• Structures in memory – OS structures, application structures and etc.

Page 20: Reversing Data Formats: What Data Can Reveal

Levels of abstraction

• Bit Order• Byte Order• Fields Size• Field Basic Type• Field Type• Structure• Field Semantics

Page 21: Reversing Data Formats: What Data Can Reveal

Tasks to solve

• Extract header, separate it from data• Find field boundaries• Find structures and substructures• Find types of fields• Detect bit and byte ordering• Determine semantics of fields• Interpret goal of field – it’s semantics

Page 22: Reversing Data Formats: What Data Can Reveal

Bit order

• There are most significant bit – MSB and least significant bit - LSB

• MSB is a left-most bit – commonly used – value is 10010101b – 95h - 149

• LSB is a left-most bit - value is 10101001b – A9h - 169

Page 23: Reversing Data Formats: What Data Can Reveal

Byte order

• Little-endian – Intel byte order• Big-endian – network byte order

• Middle-endian or mixed-endian – mixed byte order – when length of value more then default processor machine word

Page 24: Reversing Data Formats: What Data Can Reveal

Fields

Page 25: Reversing Data Formats: What Data Can Reveal

Types of fields

• Service fields – for describing Data Format (size of structure and etc.)

• Common fields – “fields from life” (time, date and etc.)

• Specific fields – we can find range for that type (bit flags and etc.)

• Application specific – can be interpret only by application, we should use dynamic analysis

Page 26: Reversing Data Formats: What Data Can Reveal

Field Size and Levels of field interpretation

• Commonly field is a byte sequence• Sometimes field is a bit sequence

• Fixed size in bytes (1, 2, 3, 4, …)• Fixed size in bits (1, 2, 3, 4, …)• Variable size in bytes• Variable size in bits

Page 27: Reversing Data Formats: What Data Can Reveal

Basic field types

• Hex value – by default• Decimal value• Character value (up to 4 symbols)• String (ASCII or Unicode)• Float value• Bits value

Page 28: Reversing Data Formats: What Data Can Reveal

Types of field

• Text• Offset• Size• Quantity• ID• Flag• Counter• DateTime• Protect

Page 29: Reversing Data Formats: What Data Can Reveal

Text

• Usually fixed size, remaining space after text filled with 0

• Variable sized text ended with 0, or there are size of this text field

• Usually text string contain their meanings

Page 30: Reversing Data Formats: What Data Can Reveal

Offset

• Offset to another field• Offset to data• Pointer to another structure (may be child or

parent)• Pointer to another instance of such structure

(next or previous in linked list)

• Can be relative and absolute

Page 31: Reversing Data Formats: What Data Can Reveal

Export in PE-EXE

IMAGE_DOS_HEADER

+03Che_lfanew DWORD

...

...

Module ImageBase

+078h

IMAGE_NT_HEADERS

OptionalHeaderIMAGE_OPTIONAL_HEADER

Signature DWORD

IMAGE_DIRECTORY_ENTRY_EXPORT

FileHeader IMAGE_FILE_HEADER

...

+024h

IMAGE_EXPORT_DIRECTORY

AddressOfNames

...

AddressOfNameOrdinals

AddressOfFunctions

...

+020h

+01Ch

Names (Offsets)

Offset FuncName X+1

Offset FuncName X

Offset FuncName X+2

Name Ordinals

Index FuncAddr X+1

Index FuncAddr X

...

Functions Addresses

FuncAddr X+1

...

...

Function Names

...

‘LoadLibraryA’,0

...

...

Index FuncAddr X+2

...

‘LoadLibraryExA’,0

FuncAddr X

FuncAddr X+2

+ImageBase +ImageBase

...

+ImageBase

+ImageBase

+ImageBase

‘LoadLibraryExW’,0LoadLibraryExA

+ImageBase

LoadLibraryExA EP:

Index Index=(Elem size - dword)

(Elem size - word)

(Elem size - dword)

mov edi,edipush ebp...

+ImageBase

Find

Page 32: Reversing Data Formats: What Data Can Reveal

Size

• Size of data• Size of structure• Size of substructure• Size of field

• Can be not only in bytes, but also in bits, words (2 byte), double words, paragraphs (16 byte) and etc.

Page 33: Reversing Data Formats: What Data Can Reveal

Quantity

• Quantity of substructures• Quantity of elements

• IMAGE_NT_HEADERS.IMAGE_FILE_HEADER. NumberOfSections

• IMAGE_EXPORT_DIRECTORY. NumberOfFunctions

Page 34: Reversing Data Formats: What Data Can Reveal

ID

• Identifier of data format, often called magic number

• ID of structure• ID of substructure

• Often ID is value consist of char symbols• Often ID looks like “magic” - BE BA FE CA

Page 35: Reversing Data Formats: What Data Can Reveal

Flag

• Bit Flags• Enum values as a flags• Special case – Bool value

Page 36: Reversing Data Formats: What Data Can Reveal

Counter

• Increment (possibly decrement)• Starts with 0 / begins with another value• Changes by 1 or another value

Page 37: Reversing Data Formats: What Data Can Reveal

DateTime

• Storing format• Resolution• Moment of beginning• Range

Page 38: Reversing Data Formats: What Data Can Reveal

Protect

• CRC value of data• CRC value of substructure• Various Hash functions and etc.

Page 39: Reversing Data Formats: What Data Can Reveal

Related works

Page 40: Reversing Data Formats: What Data Can Reveal

Projects, articles and authors• Protocol Informatics Project – based on “Network Protocol Analysis

using Bioinformatics Algorithms” by Marshall A. Beddoe• Discoverer - “Discoverer: Automatic Protocol Reverse Engineering

from Network Traces” by Weidong Cui, Jayanthkumar Kannan, Helen J. Wang

• Laika – “Digging For Data Structures” by Anthony Cozzie, Frank Stratton, Hui Xue, and Samuel T. King

• RolePlayer – “Protocol-Independent Adaptive Replay of Application Dialog” by Weidong Cuiy, Vern Paxsonz, Nicholas C. Weaverz, Randy H. Katzy

• Tupni – “Tupni: Automatic Reverse Engineering of Input Formats” by Weidong Cui, Marcus Peinado, Karl Chen, Helen J. Wang, Luiz Irun-Briz

Page 41: Reversing Data Formats: What Data Can Reveal

Protocol Informatics Project• Global and local sequence alignment -

Needleman Wunsch and Smith Waterman algorithms

Page 42: Reversing Data Formats: What Data Can Reveal

Discoverer

Page 43: Reversing Data Formats: What Data Can Reveal

Laika

• Using Bayesian unsupervised learning• For fixed size structures only

Page 44: Reversing Data Formats: What Data Can Reveal

Tupni

• Based on taint tracking engine

Page 45: Reversing Data Formats: What Data Can Reveal

Applications

• RE any program• RE undocumented/proprietary file formats• RE undocumented/proprietary network protocols• RE undocumented structures in memory• Fuzzing • Examination of protocol implementation• Replay network interaction• Zero-day vulnerability signature generation