Page 1: PANDEMONIUM: Automated Identification of Cryptographic Algorithms using Dynamic Binary Instrumentation and Fuzzy Hashing by Yuma Kurogome - CODE BLUE 2015

PANDEMONIUM:

Automated Identification of Cryptographic Algorithms using Dynamic Binary Instrumentation and Fuzzy Hashing

Yuma Kurogome

CODE BLUE 2015 [U-25]

2015.10.29

This material is partially based upon work supported by the Asian Office of Aerospace Research and Development, U.S. Air Force Office of Scientific Research, under Award No. FA2386-15-1-4068.

$ whoami

• Yuma Kurogome (@ntddk)

• ntddk.github.io

• Peer review
• Security Camp lecturer
• AVTOKYO speaker

Abstract

• Malware uses many cryptographic algorithms
  • To conceal messages and configurations

• DBI (Dynamic Binary Instrumentation)
  • Dynamic analysis on PANDA (QEMU)
  • Translate x86 code to LLVM IR (intermediate representation) per BB (basic block: one entry, one exit)
  • Remove obfuscated code by optimization

• Fuzzy-hash-based pattern matching
  • Detect and avoid anti-analysis code
  • Identify cryptographic algorithms from the similarity of the code that handles received data

Malware and crypto-algorithms

Malware uses many crypto-algorithms to conceal messages and configurations

• Banking trojans
  • Decrypt configuration files
• Ransomware
  • Encrypt victim files

This research deals with banking trojans

(Figure labels: "Server (C&C) has the key"; "Key is hardcoded in its own body")

Evolution of banking trojans

Malware families are born one after another on the black market

• Many variants were born from the leaked Zeus source code
  • Citadel
  • IceIX
  • GameOver
  • KINS
• New species have also been born
  • Dyre
  • Vawtrak
  • Chthonic

http://www.wontok.com/wp-content/uploads/2014/10/wdt0185_MalwareTimeline_largeV2.jpg

Banking trojans and crypto-algorithms

Many banking trojans use encrypted configuration files and commands

• E.g., communication between Dyre and its C&C server

We have to identify crypto-algorithms promptly

(Figure: Dyre/C&C traffic; labels: ……, Key + IV, Encrypted data)

Related work (1/2)

Identify crypto-algorithms by focusing on arithmetic/bitwise operations

• Dispatcher [CCS'09]
  • Finds crypto-routines from the ratio of such insns within each call...ret region
  • Cannot find crypto-routines that are split across multiple subroutines
• ReFormat [ESORICS'09]
  • Finds crypto-routines from the peak of such insns in the overall execution log
  • Cannot find them when multiple algorithms are implemented
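A Dispatcher-style ratio heuristic can be sketched as follows. This is a minimal illustration, not Dispatcher's implementation: the set of mnemonics counted as arithmetic/bitwise and the 0.55 threshold are assumptions chosen only for the example.

```python
# Minimal sketch of a Dispatcher-style heuristic (assumed mnemonic set and threshold).
ARITH_BITWISE = {"add", "sub", "xor", "and", "or", "not",
                 "shl", "shr", "rol", "ror", "mul", "imul"}

def arith_ratio(mnemonics):
    """Fraction of arithmetic/bitwise insns in one call...ret region."""
    if not mnemonics:
        return 0.0
    hits = sum(1 for m in mnemonics if m in ARITH_BITWISE)
    return hits / len(mnemonics)

def looks_like_crypto(mnemonics, threshold=0.55):
    # Hypothetical threshold; a real tool would tune this empirically.
    return arith_ratio(mnemonics) >= threshold

func = ["mov", "xor", "rol", "add", "xor", "shr", "mov", "ret"]
print(arith_ratio(func), looks_like_crypto(func))  # 0.625 True
```

If a routine is split across several subroutines, each call...ret region holds only part of the arithmetic, so each individual ratio can fall below the threshold.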

Related work (2/2)

Identify crypto-algorithms by focusing on loop structures

• Aligot [CCS'11]
  • Extracts the inputs of loop structures and feeds them to implementations of known algorithms
  • If the outputs are the same, the algorithms are the same
  • The computational cost is high (O(n^2)), and it can only identify known crypto-algorithms
• Kerckhoffs [RAID'11]
  • Extracts the inputs of loop structures and compares them with signatures of known algorithms
  • If a pattern matches, the code is regarded as a crypto-routine
  • Can only identify known crypto-algorithms
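The Aligot-style idea of feeding extracted loop inputs to reference implementations can be sketched as below. The sketch assumes the key, input and output of a loop have already been extracted (the hard part, not shown), and its reference set is hypothetical: RC4 plus a repeating-key XOR.

```python
def rc4(key: bytes, data: bytes) -> bytes:
    """Reference RC4 (KSA + PRGA)."""
    s = list(range(256))
    j = 0
    for i in range(256):                      # key-scheduling algorithm
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    i = j = 0
    out = bytearray()
    for byte in data:                         # pseudo-random generation algorithm
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)

def xor_repeat(key: bytes, data: bytes) -> bytes:
    """Hypothetical second candidate: repeating-key XOR."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

REFERENCES = {"RC4": rc4, "XOR": xor_repeat}

def identify(key: bytes, observed_in: bytes, observed_out: bytes):
    """Same input/output relation as a reference => same algorithm."""
    return [name for name, impl in REFERENCES.items()
            if impl(key, observed_in) == observed_out]

# Published RC4 test vector: key "Key", plaintext "Plaintext".
print(identify(b"Key", b"Plaintext", bytes.fromhex("bbf316e8d940af0ad3")))  # ['RC4']
```

Every extracted loop has to be run against every reference, which is where the cost grows. An algorithm missing from `REFERENCES` can never match.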

Downsides of related work

Method       Known algorithms   Unknown algorithms   Anti anti-analysis
Dispatcher                                           ☓
ReFormat                                             ☓
Aligot                          ☓                    ☓
Kerckhoffs                      ☓                    ☓

• Previous approaches assume the execution log is infallible

• PANDEMONIUM can analyze malware even if it has anti-analysis routines or has been obfuscated

Anti-analysis

Many malware samples try to detect debuggers and sandboxes to avoid analysis

As a result, we often cannot obtain the expected analysis results

There is no silver bullet

Analysis platforms have not been able to keep up with the complex techniques of malware

We need an extensible analysis platform

PANDEMONIUM

Combine different approaches to identify the decrypt-routines of malware

(Figure: PANDA runs the malware in a guest OS and produces LLVM IR (dynamic analysis); PANDEMONIUM analyzes the LLVM IR (static analysis) and outputs an analysis log. Components: avoid anti-analysis, network communication, remove obfuscated code, identify crypto-algorithms)

Emulation by QEMU

• TCG (Tiny Code Generator)

1. Disassemble the target code and create BBs (basic blocks) delimited by branch insns
2. Translate each BB into RISC-like TCG IR
3. Translate the TCG IR into host code
4. Chain the translated BBs together and execute them
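The four TCG steps above can be illustrated with a toy translator. Everything in it is hypothetical: a three-instruction guest "ISA" stored in a dict, Python closures standing in for generated host code, and a lookup loop standing in for QEMU's direct chaining of translated blocks.

```python
# Toy guest program: address -> (mnemonic, operand). Hypothetical ISA.
PROGRAM = {
    0: ("movi", 3),    # r = 3
    1: ("dec", None),  # r -= 1
    2: ("jnz", 1),     # if r != 0: goto 1
    3: ("halt", None),
}
BRANCHES = {"jnz", "halt"}

def translate_block(pc):
    """Steps 1-3: decode from pc up to and including the next branch,
    then return a closure that plays the role of translated host code."""
    insns = []
    while True:
        mnem, op = PROGRAM[pc]
        insns.append((mnem, op))
        if mnem in BRANCHES:
            break
        pc += 1
    fallthrough = pc + 1

    def host_code(state):
        for mnem, op in insns:
            if mnem == "movi":
                state["r"] = op
            elif mnem == "dec":
                state["r"] -= 1
            elif mnem == "jnz":
                return op if state["r"] != 0 else fallthrough
            elif mnem == "halt":
                return None
    return host_code

def emulate():
    """Step 4: run blocks one after another, translating each start address once."""
    cache, state, pc = {}, {"r": 0}, 0
    while pc is not None:
        if pc not in cache:
            cache[pc] = translate_block(pc)
        pc = cache[pc](state)
    return state, sorted(cache)

print(emulate())  # ({'r': 0}, [0, 1, 3])
```

The blocks starting at 0 and at 1 overlap, because a block is simply "decode until the next branch" from wherever execution lands; QEMU's translated blocks behave similarly.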

PANDA [REcon'14]

• DBI (Dynamic Binary Instrumentation)

1. Disassemble the target code and create BBs (basic blocks) delimited by branch insns
2. Translate each BB into RISC-like TCG IR
3. Translate the TCG IR into LLVM IR
4. Translate the TCG IR into host code
5. Chain the translated BBs together and execute them

• Callbacks before/after translation
• Taint analysis and symbolic execution can be applied

We can obtain the LLVM IR corresponding to the malware code:

x86:
    push esp
    push ebp
    push ebx
TCG IR:
    movi_i64 tmp12,$0x8260a634
    st_i64 tmp12,env,$0xdae0
    ld_i64 tmp12,env,$0xdad0
LLVM IR:
    %2 = add i64 %env_v, 128
    %3 = inttoptr i64 %2 to i64*
    store i64 2187372084, i64* %3

github.com/moyix/panda
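The "callbacks before/after translation" idea can be sketched with a toy instrumentation hub. The class, event names and address below are hypothetical. They only loosely mirror the before/after block-translation hooks that PANDA plugins register, and the "IR" is a list of placeholder strings.

```python
from collections import defaultdict

class ToyDBI:
    """Hypothetical DBI hub: plugins register callbacks around block translation."""

    def __init__(self):
        self.callbacks = defaultdict(list)

    def register(self, event, fn):
        self.callbacks[event].append(fn)

    def translate(self, pc, guest_insns):
        for fn in self.callbacks["before_translate"]:
            fn(pc)
        ir = [f"ir.{insn}" for insn in guest_insns]  # placeholder for TCG/LLVM IR
        for fn in self.callbacks["after_translate"]:
            fn(pc, ir)
        return ir

# Usage: a plugin that records the IR of every translated block.
dbi = ToyDBI()
seen, ir_log = [], {}

def record_ir(pc, ir):
    ir_log[pc] = list(ir)

dbi.register("before_translate", seen.append)
dbi.register("after_translate", record_ir)
dbi.translate(0x401000, ["push esp", "push ebp", "push ebx"])
print(seen, ir_log)
```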

Extract decrypt-routines (1/5)

Combine different approaches to identify the decrypt-routines of malware

(Figure: OS; malware; obfuscated code; anti-analysis routine; handler for received data; ……; decrypt-routine; obfuscated code)

(Figure: FS:[0x1c] → KPCR; KPCR +0x34 → KdVersionBlock → KDEBUGGER_DATA32, which holds PsLoadedModuleList and PsActiveProcessHead (offsets +0x70 and +0x78); PsActiveProcessHead (Flink/Blink) heads a doubly linked list of EPROCESS structures chained through ActiveProcessLinks (Flink/Blink), each with its PEB; FS:[0x30] is also shown)

An EPROCESS structure is created whenever a process is created

panda/qemu/panda_plugins/osi_winxpsp3x86/osi_winxpsp3x86.cpp

Extract the malware process from the running guest OS

(The register differs on Windows 7 and later)
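Walking the process list in the figure can be sketched against fake guest memory. This is not the osi_winxpsp3x86 plugin's code. The _EPROCESS offsets (UniqueProcessId 0x84, ActiveProcessLinks 0x88, ImageFileName 0x174) are assumed values for 32-bit Windows XP SP3 that should be checked against symbols, and `GuestMemory` is a hypothetical stand-in for reading guest memory through the emulator.

```python
import struct

# Assumed 32-bit Windows XP SP3 _EPROCESS offsets (verify against symbols).
EPROC_PID_OFF = 0x84     # UniqueProcessId
EPROC_LINKS_OFF = 0x88   # ActiveProcessLinks (LIST_ENTRY: Flink, Blink)
EPROC_NAME_OFF = 0x174   # ImageFileName (16 bytes)

class GuestMemory:
    """Hypothetical stand-in for reading guest memory from the emulator."""

    def __init__(self):
        self.cells = {}

    def write(self, addr, data):
        for i, b in enumerate(data):
            self.cells[addr + i] = b

    def read(self, addr, size):
        return bytes(self.cells.get(addr + i, 0) for i in range(size))

    def read_u32(self, addr):
        return struct.unpack("<I", self.read(addr, 4))[0]

def walk_processes(mem, list_head):
    """Follow Flink pointers from PsActiveProcessHead until we return to the head."""
    procs = []
    entry = mem.read_u32(list_head)          # head.Flink
    while entry != list_head:
        eproc = entry - EPROC_LINKS_OFF      # CONTAINING_RECORD(entry, EPROCESS, ActiveProcessLinks)
        pid = mem.read_u32(eproc + EPROC_PID_OFF)
        name = mem.read(eproc + EPROC_NAME_OFF, 16).split(b"\0", 1)[0].decode()
        procs.append((pid, name))
        entry = mem.read_u32(entry)          # entry.Flink
    return procs

def build_demo():
    """Fake memory holding a circular list: head -> System -> malware.exe -> head."""
    mem, head = GuestMemory(), 0x1000
    eprocs = [(0x2000, 4, b"System"), (0x3000, 1337, b"malware.exe")]
    links = [head] + [base + EPROC_LINKS_OFF for base, _, _ in eprocs]
    for i, link in enumerate(links):
        # LIST_ENTRY = (Flink, Blink)
        mem.write(link, struct.pack("<II", links[(i + 1) % len(links)], links[i - 1]))
    for base, pid, name in eprocs:
        mem.write(base + EPROC_PID_OFF, struct.pack("<I", pid))
        mem.write(base + EPROC_NAME_OFF, name)
    return mem, head

print(walk_processes(*build_demo()))  # [(4, 'System'), (1337, 'malware.exe')]
```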

Extract decrypt-routines (2/5)

Combine different approaches to identify the decrypt-routines of malware

(Figure: malware; obfuscated code; anti-analysis routine; handler for received data; ……; decrypt-routine; obfuscated code)

LLVM (1/2)

LLVM optimization passes can remove some obfuscated code

(Figure: x86 → TCG IR → LLVM IR, with PANDA as the frontend; source: llvm.org)

Remove obfuscated code

LLVM optimization passes can remove some obfuscated code

• Inserted dead/nop-equivalent insns
  • -dse, -simplifycfg
• Substituted equivalent insns / reordered insns
  • -constprop
  • -instcombine

These passes also absorb instruction differences that stem from different compiler implementations

(x = 14; y = x + 8) → (x = 14; y = 22)
(y = 3; ...; y = x + 1) → (...; y = x + 1)
(y = x + 2; z = y + 3) → (z = x + 5)

Cf. opticode.coseinc.com
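The three rewrites above can be reproduced on a toy three-address form. This only illustrates what passes like -constprop, -instcombine and -dse do; it is not LLVM code. Each statement is `(dst, expr)`, where `expr` is an int constant or `(var, offset)` meaning var + offset, and `live_out` (an assumption of the sketch) names the variables still needed after the block.

```python
def propagate(prog):
    """Constant propagation plus folding of chained adds
    (in the spirit of -constprop / -instcombine)."""
    known, out = {}, []
    for dst, expr in prog:
        if isinstance(expr, tuple):
            base, off = expr
            val = known.get(base)
            if isinstance(val, int):
                expr = val + off                  # x = 14; y = x + 8  ->  y = 22
            elif isinstance(val, tuple):
                expr = (val[0], val[1] + off)     # y = x + 2; z = y + 3  ->  z = x + 5
        # dst is redefined: forget facts that were expressed in terms of it
        for v in [v for v, e in known.items() if isinstance(e, tuple) and e[0] == dst]:
            del known[v]
        if isinstance(expr, tuple) and expr[0] == dst:
            known.pop(dst, None)                  # e.g. y = y + 1 gives no reusable fact
        else:
            known[dst] = expr
        out.append((dst, expr))
    return out

def dead_store_elim(prog, live_out):
    """Drop assignments whose value is never read (in the spirit of -dse)."""
    live, kept = set(live_out), []
    for dst, expr in reversed(prog):
        if dst not in live:
            continue                              # overwritten or unused: dead
        live.discard(dst)
        if isinstance(expr, tuple):
            live.add(expr[0])
        kept.append((dst, expr))
    return kept[::-1]

def optimize(prog, live_out):
    return dead_store_elim(propagate(prog), live_out)

print(optimize([("x", 14), ("y", ("x", 8))], {"x", "y"}))                  # [('x', 14), ('y', 22)]
print(optimize([("y", 3), ("w", ("v", 0)), ("y", ("x", 1))], {"w", "y"}))  # [('w', ('v', 0)), ('y', ('x', 1))]
print(optimize([("y", ("x", 2)), ("z", ("y", 3))], {"z"}))                 # [('z', ('x', 5))]
```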

Extract decrypt-routines (3/5)

Combine different approaches to identify the decrypt-routines of malware

(Figure: malware; anti-analysis routine; handler for received data; ……; decrypt-routine; obfuscated code)

Anti-emulation

We also have to consider anti-emulation

Fuzzy hashing (1/2)

Techniques for identifying data that is partially different but similar

• ssdeep
  • "World leading security researchers will come together for this unique international conference in Tokyo" → Bb7g86hvE/
  • "W0rld leading security researchers will come together for this unique international conference in Tokyo" → GT7g86hvE/

Create signatures of some anti-analysis routines and crypto-algorithms
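The idea behind ssdeep, context-triggered piecewise hashing, can be sketched in a simplified form. This is not byte-compatible with ssdeep. The window size, the rolling value, the per-chunk FNV-1a hash and the edit-distance score are all assumptions chosen for brevity; ssdeep's real rolling hash and scoring differ.

```python
B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
WINDOW = 7
FNV_OFFSET, FNV_PRIME = 0x811C9DC5, 0x01000193

def fuzzy_hash(data: bytes, block_size: int = 3) -> str:
    """Simplified context-triggered piecewise hash (not ssdeep-compatible)."""
    sig, window = [], []
    h, chunk_len = FNV_OFFSET, 0
    for b in data:
        h = ((h ^ b) * FNV_PRIME) & 0xFFFFFFFF       # FNV-1a over the current chunk
        chunk_len += 1
        window = (window + [b])[-WINDOW:]
        # Rolling value: depends only on the last WINDOW bytes
        # (recomputed here for clarity; ssdeep updates it incrementally).
        roll = sum(window) + sum((i + 1) * x for i, x in enumerate(window))
        if roll % block_size == block_size - 1:      # context trigger: end the chunk
            sig.append(B64[h % 64])
            h, chunk_len = FNV_OFFSET, 0
    if chunk_len:
        sig.append(B64[h % 64])                      # trailing partial chunk
    return "".join(sig)

def edit_distance(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def similarity(h1: str, h2: str) -> int:
    """0-100 score from edit distance (ssdeep's real scoring differs)."""
    longest = max(len(h1), len(h2))
    if longest == 0:
        return 100
    return round(100 * (1 - edit_distance(h1, h2) / longest))

s1 = b"World leading security researchers will come together for this unique international conference in Tokyo"
s2 = b"W0rld leading security researchers will come together for this unique international conference in Tokyo"
print(fuzzy_hash(s1))
print(fuzzy_hash(s2))
print(similarity(fuzzy_hash(s1), fuzzy_hash(s2)))
```

Because a chunk boundary depends only on the last seven bytes, changing one character can only move boundaries within the next few bytes. The two signatures therefore typically differ only near the start, much like Bb7g86hvE/ and GT7g86hvE/ on the slide.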

Fuzzy hashing (2/2)

Techniques for identifying data that is partially different but similar

• Create a fuzzy hash per BB
  • Normalize operands
• Anti-analysis
  • NtDelayExecution(), WaitForSingleObject(), GetCursorPos(), ……
• Crypto-algorithms
  • MD5, DES, RC4, ……

Create signatures of some anti-analysis routines and crypto-algorithms

Sources: BeeCrypt, Crypto++, OpenSSL
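Operand normalization before per-BB hashing might look like the sketch below. The rules are assumptions made for illustration: every LLVM value name becomes `%R` and every integer literal becomes `C`. The slide does not specify PANDEMONIUM's actual rules.

```python
import re

VALUE_NAME = re.compile(r"%[\w.]+")              # %2, %env_v, %tmp.1, ...
INT_LITERAL = re.compile(r"(?<![\w%])-?\d+\b")   # 128, 2187372084 (not the 64 in i64)

def normalize_bb(ir_lines):
    """Rewrite one BB of LLVM IR so value numbering and constants do not affect its hash."""
    out = []
    for line in ir_lines:
        line = VALUE_NAME.sub("%R", line)
        line = INT_LITERAL.sub("C", line)
        out.append(line.strip())
    return "\n".join(out)

bb1 = ["%2 = add i64 %env_v, 128", "%3 = inttoptr i64 %2 to i64*", "store i64 2187372084, i64* %3"]
bb2 = ["%7 = add i64 %env_v, 136", "%8 = inttoptr i64 %7 to i64*", "store i64 42, i64* %8"]
print(normalize_bb(bb1))
print(normalize_bb(bb1) == normalize_bb(bb2))  # True
```

The normalized text, rather than the raw IR, would then be fed to the fuzzy hash, so that two BBs differing only in register numbering or constants hash alike.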

LLVM (2/2)

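Modify the TCG IR before execution, based on pattern matching of the LLVM IR

(Figure: x86 → TCG IR → LLVM IR, with PANDA as the frontend; the LLVM IR is pattern-matched against a fuzzy hash table (red-black tree), and the result is fed back to the TCG IR; source: llvm.org)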
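The match-and-feedback loop can be sketched as below. Several pieces are stand-ins. The signature strings and labels are made up. A plain dict replaces the red-black tree (C++'s `std::map` is the usual red-black-tree-backed container). `difflib.SequenceMatcher`'s ratio substitutes for the fuzzy-hash comparison, and the returned action names are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical signature table: fuzzy hash of a normalized BB -> (kind, label).
SIGNATURES = {
    "k3Jd9Qa7Lp": ("anti-analysis", "Sleep() patch check"),
    "Xq93LmZa0+": ("crypto", "RC4 keystream loop"),
}

def match_bb(bb_hash, threshold=0.75):
    """Return the best-matching signature entry, or None below the threshold."""
    if bb_hash in SIGNATURES:                  # exact hit (a red-black tree gives O(log n) lookup)
        return SIGNATURES[bb_hash]
    best, best_score = None, 0.0
    for sig, entry in SIGNATURES.items():      # otherwise: similarity scan
        score = SequenceMatcher(None, bb_hash, sig).ratio()
        if score > best_score:
            best, best_score = entry, score
    return best if best_score >= threshold else None

def feedback(bb_hash):
    """Decide what to do with this BB's TCG IR before it executes."""
    hit = match_bb(bb_hash)
    if hit is None:
        return "execute unchanged"
    return "rewrite branch" if hit[0] == "anti-analysis" else "log crypto candidate"

print(feedback("k3Jd9Qa7Lx"))  # rewrite branch
```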

Symbolic execution (1/2)

Technique for extracting path constraints by operating on symbolic variables

Source code:
    if (x != 2015) Invalid.
Trace log:
    cmp eax, 0x7DF
    je 0xdeadbaad
Counterexample:
    ASSERT( INPUT_*_*_* = 0hex7DF );

The value 2015 (0x7DF) affects the branch
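A toy version of this idea treats eax as a symbolic input, tracks it through the trace, and solves the branch condition for the input that takes the `je`. The trace format and the restriction to `add`/`sub` on a single symbolic register are simplifications made for this sketch. Real tools hand such path constraints to a constraint solver.

```python
MASK = 0xFFFFFFFF  # 32-bit registers

def path_constraint(trace):
    """Track eax = INPUT + offset (mod 2**32); return the constraint for taking the je."""
    offset, cmp_imm = 0, None
    for mnem, imm in trace:
        if mnem == "add":
            offset = (offset + imm) & MASK
        elif mnem == "sub":
            offset = (offset - imm) & MASK
        elif mnem == "cmp":                    # cmp eax, imm
            cmp_imm = imm
        elif mnem == "je":
            if cmp_imm is None:
                raise ValueError("je without a preceding cmp")
            return offset, cmp_imm             # INPUT + offset == cmp_imm
    return None

def solve(constraint):
    """Solve INPUT + offset == target for INPUT (mod 2**32)."""
    offset, target = constraint
    return (target - offset) & MASK

trace = [("cmp", 0x7DF), ("je", 0xDEADBAAD)]
print(solve(path_constraint(trace)))                 # 2015
print(solve(path_constraint([("add", 5)] + trace)))  # 2010
```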

Symbolic execution (2/2)

Technique for extracting path constraints by operating on symbolic variables

• Insns must be in SSA (static single assignment) form
  • On x86, assignments to the same register may collide

    mov esi, 0x13
    mov edx, 0x7DF
→ (esi == 0x13) and (edx == 0x7DF)

    mov esi, 0x13
    …
    mov esi, 0x7DF
→ (esi == 0x13) and (esi == 0x7DF), which no single value of esi can satisfy

LLVM IR, being in SSA form, is suitable for symbolic execution
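Renaming every assignment, as SSA form does, removes the collision. The sketch below uses a hypothetical list of mov-style `(destination, source)` pairs, where version 0 denotes a register's value on entry.

```python
def to_ssa_constraints(insns):
    """insns: (dst, src) pairs for mov-like insns; src is an int immediate or a register name."""
    version, constraints = {}, []
    for dst, src in insns:
        rhs = f"{src}_{version.get(src, 0)}" if isinstance(src, str) else f"{src:#x}"
        version[dst] = version.get(dst, 0) + 1  # every assignment gets a fresh name
        constraints.append(f"{dst}_{version[dst]} == {rhs}")
    return constraints

print(" and ".join(to_ssa_constraints([("esi", 0x13), ("esi", 0x7DF)])))
# esi_1 == 0x13 and esi_2 == 0x7df
```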

Anti anti-analysis

    #include <windows.h>

    static inline int IsSleepPatched(void)
    {
        DWORD time1 = GetTickCount();
        Sleep(500);
        DWORD time2 = GetTickCount();
        if ((time2 - time1) > 450)
            return 0;   /* the full delay elapsed: Sleep() is intact */
        else
            return 1;   /* the delay was skipped: Sleep() is patched */
    }

Avoid anti-analysis code that matches a pattern by using symbolic execution

• E.g., avoid detection of a patched Sleep()
  • RDTSC, GetTickCount(), ……
• Which branch should be taken?
  1. Take a snapshot
  2. Rewrite the branch constraints
  3. Take the long-lasting branch, or the one on which the expected number of clock cycles is spent

(Check 50 insns)
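The snapshot / rewrite / take-the-long-branch steps can be mimicked on a Python model of IsSleepPatched(). The fake tick counter and the `sleep_effect_ms` field are hypothetical. They model a sandbox that fast-forwards Sleep(), which is exactly what the check detects. PANDEMONIUM works on branch constraints in LLVM IR, not on Python state, so this only illustrates the decision logic.

```python
import copy

def is_sleep_patched(state):
    """Model of IsSleepPatched(): state['ticks'] plays the role of GetTickCount()."""
    t1 = state["ticks"]
    state["ticks"] += state["sleep_effect_ms"]   # what Sleep(500) did to the clock
    t2 = state["ticks"]
    return 0 if (t2 - t1) > 450 else 1

def force_long_branch(state, expected_ms=500):
    snapshot = copy.deepcopy(state)              # 1. take a snapshot
    snapshot["sleep_effect_ms"] = expected_ms    # 2. rewrite what the branch condition sees
    return is_sleep_patched(snapshot)            # 3. follow the long-lasting branch

sandbox = {"ticks": 1000, "sleep_effect_ms": 0}  # Sleep() fast-forwarded
print(is_sleep_patched(copy.deepcopy(sandbox)), force_long_branch(sandbox))  # 1 0
```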

Extract decrypt-routines (4/5)

Combine different approaches to identify the decrypt-routines of malware

(Figure: malware; handler for received data; ……; decrypt-routine; obfuscated code)

Taint analysis (1/2)

Technique that analyzes dependencies between data by propagating tags

(Figure: a VMM running the guest OS; example insn: mov eax, edx)

Taint analysis (2/2)

The handler BBs for data received from the virtual NIC are likely to contain the decrypt-routines

• Taint source (origin of tags)
  • Virtual NIC
• Taint sink (where tags are checked)
  • End of BB
• Propagation rule
  • References to registers and memory
  • E.g., r3 = Load(r2) → tr3 = tr2
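The source/sink/propagation setup above can be sketched with shadow tags held in Python sets. Register names, addresses and the op encoding are hypothetical. Because the sketch tracks no real register values, `load()` takes the concrete address alongside the address register. It unions the loaded memory's tag with the address register's tag, so it covers both ordinary data-flow propagation and the slide's tr3 = tr2 rule.

```python
class TaintTracker:
    """Shadow-tag sketch: source = virtual NIC, sink = end of BB."""

    def __init__(self):
        self.reg, self.mem = {}, {}   # register name / address -> set of tags
        self.tainted_bbs = []
        self._hit = False

    def _write_reg(self, dst, tags):
        self.reg[dst] = set(tags)
        self._hit |= bool(tags)

    def nic_receive(self, addr, data, tag="nic"):      # taint source
        for i in range(len(data)):
            self.mem[addr + i] = {tag}

    def mov(self, dst, src):                            # dst = src
        self._write_reg(dst, self.reg.get(src, set()))

    def load(self, dst, addr_reg, addr):                # dst = Load(addr_reg), addr = its value
        self._write_reg(dst, self.mem.get(addr, set()) | self.reg.get(addr_reg, set()))

    def store(self, addr, src):                         # Store(addr, src)
        tags = set(self.reg.get(src, set()))
        self.mem[addr] = tags
        self._hit |= bool(tags)

    def binop(self, dst, a, b):                         # dst = a op b
        self._write_reg(dst, self.reg.get(a, set()) | self.reg.get(b, set()))

    def run_bb(self, name, ops):                        # taint sink: check at the end of the BB
        self._hit = False
        for op, *args in ops:
            getattr(self, op)(*args)
        if self._hit:
            self.tainted_bbs.append(name)

t = TaintTracker()
t.nic_receive(0x5000, b"\x13\x37")
t.run_bb("bb_parse", [("load", "r3", "r2", 0x5000), ("binop", "r4", "r3", "r1")])
t.run_bb("bb_ui", [("mov", "r5", "r6")])
print(t.tainted_bbs, t.reg["r4"])  # ['bb_parse'] {'nic'}
```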

Anti taint analysis

Obfuscation technique that interrupts the propagation of taint tags

• Under-tainting
  • Data is not assigned directly

But we have LLVM

    x = get_input();
    if (x == "a") {
        uri = "c2.php";
        msg = "a";
    }
    send(uri, msg);

    x = get_input();
    if (x > "a") {
        tmp = x + "a";
        msg = tmp - x;
    }
    send(uri, msg);

-early-cse, -constprop, -instcombine
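Algebraic simplification shows one way the LLVM passes help here: once (x + c) - x is simplified to c, msg no longer has any data dependency on x. The obfuscated data flow is then normalized away before taint is reasoned about. The sketch applies a single rewrite rule of the kind -instcombine performs. Expressions are hypothetical `(op, lhs, rhs)` tuples, with variables as strings and constants as ints.

```python
def simplify(expr):
    """Apply (v + c) - v -> c and (c + v) - v -> c, bottom-up."""
    if not isinstance(expr, tuple):
        return expr
    op, lhs, rhs = expr
    lhs, rhs = simplify(lhs), simplify(rhs)
    if op == "sub" and isinstance(lhs, tuple) and lhs[0] == "add":
        if lhs[1] == rhs:
            return lhs[2]
        if lhs[2] == rhs:
            return lhs[1]
    return (op, lhs, rhs)

def depends_on(expr, var):
    """True if var appears anywhere in expr."""
    if isinstance(expr, tuple):
        return depends_on(expr[1], var) or depends_on(expr[2], var)
    return expr == var

A = ord("a")
msg = ("sub", ("add", "x", A), "x")     # tmp = x + "a"; msg = tmp - x
print(depends_on(msg, "x"), simplify(msg), depends_on(simplify(msg), "x"))  # True 97 False
```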

Extract decrypt-routines (5/5)

Combine different approaches to identify the decrypt-routines of malware

(Figure: malware; handler for received data; ……; decrypt-routine)

Now what?

The handler BBs for data received from the virtual NIC are likely to contain the decrypt-routines

(Figure label: Decrypt)

1. Execute the malware
2. Avoid anti-analysis
3. Remove obfuscated code
4. Extract the handler BBs for received data
5. Identify crypto-algorithms

Criteria for crypto-algorithms

Is a per-BB fuzzy hash useful for identifying crypto-algorithms?

• Per-BB comparison cannot preserve uniqueness as a signature
  • There are many similar insns, which leads to many false positives
• Distinctive features do not emerge the way they do for anti-analysis routines
• Instead, compare all the code that references the received data
  • Concatenate their fuzzy hashes and calculate the LCS
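The concatenate-then-LCS comparison can be sketched as follows. The sketch assumes LCS means longest common subsequence and that BB hashes are concatenated in execution order. The 0-100 scaling and the example hashes are likewise illustrative choices, not taken from the slide.

```python
def lcs_len(a: str, b: str) -> int:
    """Length of the longest common subsequence (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def routine_similarity(bb_hashes_a, bb_hashes_b) -> float:
    """Concatenate per-BB fuzzy hashes and score 0-100 by LCS length."""
    a, b = "".join(bb_hashes_a), "".join(bb_hashes_b)
    if not a and not b:
        return 100.0
    return 200.0 * lcs_len(a, b) / (len(a) + len(b))

print(lcs_len("ABCBDAB", "BDCABA"))                                 # 4
print(routine_similarity(["Bb7g", "86hvE/"], ["GT7g", "86hvE/"]))  # 80.0
```

Comparing whole routines this way lets a single common BB carry little weight on its own, which is the false-positive problem noted above for per-BB matching.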

Experiments

Experiments on crypto-algorithm identification using PANDEMONIUM

• Experiment A: Obfuscated sample program

• Experiment B: Real-world malware

Experiment A

Analysis of an obfuscated sample program

(Table: per-algorithm results for MD5, DES, RC4, AES, Blowfish, and RSA under Obf A and Obf B)

A) Inserted dead/nop-equivalent insns
B) Substituted equivalent insns / reordered insns (≈ under-tainting)

The sample program receives a packet and decrypts it (using Crypto++)

Experiment B (1/3)

Analysis of real-world malware

• Dyre samples
  • 999bc5e16312db6abff5f6c9e54c546f
  • b44634d90a9ff2ed8a9d0304c11bf612
  • dd207384b31d118745ebc83203a4b04a
  • b44634d90a9ff2ed8a9d0304c11bf612
  • 999bc5e16312db6abff5f6c9e54c546f
• Anti-analysis using PEB.NumberOfProcessors

Experiment B (2/3)

Analysis of real-world malware

• KINS (ZeusVM) samples
  • eee1bdb8d4ad98cce0031ed6ca43274a
  • 84826d5e65987c131a80b1a3aa53ce17
  • a2a7d4f75fc263648824facb0757a3c7
• Obfuscation by a custom code virtualizer
  • E.g., nop (0x90) is represented as 0x32, 0x26, 0xF3

• Use

Experiment B (3/3)

Analysis of real-world malware

Malware   Detection ratio   Algorithm   Cause
Dyre      4/5               RSA
KINS      0/3               RC4         VM

• PANDEMONIUM was able to avoid Dyre's anti-analysis
• The taint tag might not have been propagated
  • The optimization might have removed a point that needed to be analyzed
• LLVM is not suitable for analyzing modern code virtualizers
  • Themida, ZeusVM, ……

Considerations

• Is LLVM suitable for analyzing malware?
  • LLVM IR does little to model carry flags
  • If the implementation were improved, more features of the algorithms might emerge
• Or does the detection rate vary depending on the type of encryption algorithm?
  • It varies among implementations
  • For now, we cannot say whether criteria such as Feistel vs. SPN structure matter
• PANDEMONIUM compared routines by concatenating the fuzzy hashes of their BBs
  • It may be necessary to weight large blocks

Tasks

• Extract encryption keys
• Analyze unknown algorithms
  • Should we focus on the density and data length of a function's input and output?
• Analyze code virtualizers
  • Should we implement an optimization pass?

We need an analysis platform that can keep up with the evolution of malware
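If "density" in the first open question is read as byte entropy (an assumption, since the slide does not define it), a first cut at profiling a function's input and output could look like this sketch. Encrypted or compressed buffers tend toward 8 bits per byte, while plaintext and structured data score lower. The `io_profile` feature triple is hypothetical.

```python
import math
from collections import Counter

def byte_entropy(buf: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 to 8.0)."""
    if not buf:
        return 0.0
    n = len(buf)
    return sum(c / n * math.log2(n / c) for c in Counter(buf).values())

def io_profile(inp: bytes, out: bytes):
    """Hypothetical features: input entropy, output entropy, output/input length ratio."""
    ratio = len(out) / len(inp) if inp else 0.0
    return byte_entropy(inp), byte_entropy(out), ratio

print(byte_entropy(b"\x00" * 16), byte_entropy(bytes(range(256))))  # 0.0 8.0
```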

Summary

• Malware uses many cryptographic algorithms
  • To conceal messages and configurations

• DBI (Dynamic Binary Instrumentation)
  • Dynamic analysis on PANDA (QEMU)
  • Translate x86 code to LLVM IR (intermediate representation) per BB (basic block: one entry, one exit)
  • Remove obfuscated code by optimization

• Fuzzy-hash-based pattern matching
  • Detect and avoid anti-dynamic-analysis code
  • Identify cryptographic algorithms from the similarity of the code that handles received data