Harnessing Intelligence from Malware Repositories · 2018-05-11 · Harnessing Intelligence from...
Harnessing Intelligence from Malware Repositories
Arun Lakhotia and Vivek Notani
Software Research Lab
University of Louisiana at Lafayette
[email protected], [email protected]
7/22/2015 (C) 2015 U. Louisiana at Lafayette 1
Self Introduction
Software Research Lab
• 10 years of research on malware
• Graduate course on malware analysis
• Active interaction with industry
• Funded by AFOSR, ARO, DARPA, ONR, and the State of Louisiana
Research Focus
• How does malware evade detection?
• How to detect stealthy malware?
• Malware analysis in the large
Results
• Papers: 50+ peer-reviewed
• Patents: one granted
• Degrees: 6 Ph.D., 8 M.Sc.
• Research funding: $5MM+
Targeted Attacks
Machine Generation of Malware
CyberSecurity Paradox
Physical world: FAILED attempts are INVESTIGATED
Cyber World: FAILED attempts are CELEBRATED
Extract Intelligence from Malware
• Connect families
• Discover trends
• Malware evolution
• Connect actors
Requirement: “Google” for Malware
[Figure: a query malware is submitted to a “Google”-like search over the malware collection; results are ranked by similarity score (0.90, 0.82, 0.76, 0.30), and the top-ranked result is the nearest match.]
Challenges
❶ Remove obfuscation
❷ Map to document
❸ Create index
❹ Search
Key Innovation: VM Introspection in the Cloud
[Figure: VMs running on top of a hypervisor.]
Key Innovation: Semantic Fingerprints
STATE OF PRACTICE
• Fingerprint: bit shreds
(380091df) (0091df96) (91df96f6) (df96f633)
• Susceptible to obfuscation
VIRUSBATTLE
• Fingerprint: semantics
eax = def(ebp)
ebp = -4+def(esp)
esp = -8+def(esp)
memdw(-8+def(esp)) = def(ebp)
memdw(-4+def(esp)) = def(ebp)
memdw(4+def(esp)) = def(memdw(def(esp)))
• Obfuscation resistant
Semantics Enabled: Connecting Malware through Code
Aldibot
Ponyloader
Smokeloader
Darkcomet
Semantically similar binaries between malware families
Unpacking Malware
Challenge 1: Packing
Classes of Packers (Protectors)
• Classification parameter
  • Based on execution behavior
  • When and how much of the original code is decrypted
• Traditional Packer
  • Entire original code is decrypted at one time
  • Entire original code is in clear text before it is executed
• Paged Packer
  • Just-In-Time decryption of a page when it is executed
  • Only a ‘page’ of the original code is in clear text at any time
• Virtual Machine Protectors
  • Decrypt a single instruction at a time
  • None of the original code is ever in clear text
Unpacking: State-of-Practice
[Figure: stack diagram with layers labeled Malware, Windows Guest OS, Unpacker Code, and Virtualization Stack.]
Innovation: Unpacking using VM Introspection
[Figure: stack diagram with layers labeled Malware, Windows Guest OS, QEMU Hypervisor, Linux Guest OS, Virtualization Stack, and VB Unpacker Code.]
Observe malware below ring 0
Unpacking Traditionally Packed Malware
1. Execute malware
2. Detect when to stop
3. Extract revealed code
When to Stop: Hump and Dump
• Traditional Packer
  • Decryption in a loop
  • High instruction execution frequency
  • Spike in frequency graph
• Hump & Dump Algorithm
  • Detect spike: hump
  • Detect end: flat
[Figure: instruction execution frequency, with addresses ordered by last execution time; PE Compact 2.5 (calc.exe), linear scale.]
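The hump-then-flat idea can be illustrated with a short Python sketch. This is only one plausible reading of the slide, not the VirusBattle implementation: the function name `hump_and_dump`, the input layout (one execution count per address, ordered by last execution time), and the thresholds `spike`, `flat_max`, and `flat_run` are all assumptions made for illustration.

```python
def hump_and_dump(counts, spike=100, flat_max=2, flat_run=5):
    """Return the index where the flat region after the first hump begins.

    counts   -- execution count per address, ordered by last execution time
    spike    -- hypothetical count at or above which an address is in a hump
    flat_max -- hypothetical count at or below which an address is "flat"
    flat_run -- hypothetical number of consecutive flat entries required
    Returns None when no hump followed by a flat run is found.
    """
    seen_hump = False
    run = 0
    for i, count in enumerate(counts):
        if count >= spike:
            # Still inside (or entering) a decryption-loop spike.
            seen_hump = True
            run = 0
        elif seen_hump and count <= flat_max:
            run += 1
            if run == flat_run:
                return i - flat_run + 1
        else:
            run = 0
    return None


profile = [1, 1, 1, 500, 800, 650, 1, 1, 1, 1, 1, 1]
stop_at = hump_and_dump(profile)
no_hump = hump_and_dump([1] * 10)
```

When the function returns None, no hump was detected, which is the case the TimeOut heuristic is meant to cover.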
When to Stop: TimeOut
• What if Hump is never detected?
  • TimeOut
  • Limits execution time
Constructing PE
• Modify OEP using last PC value
• Fix Section Headers
• Copy Memory Contents to new PE
Extracting Memory Contents: Challenge
• Extracting memory through hypervisor
• Memory contents may be paged out by GuestOS
• Solution:
  • Determine memory is paged out
  • Analyze execution profile
  • Re-run unpacker with new parameters
  • Catch before memory is paged out
Case Study
Dataset Description:
• File Type: PE-32
• Source: FBI
• Availability: Upon request
• Collection period: 1 year

| Bot Family | # Executables |
|---|---|
| Aldibot | 19 |
| Armageddon | 1 |
| Blackenergy | 65 |
| Darkcomet | 339 |
| Darkshell | 379 |
| Ddoser | 5 |
| Illusion | 17 |
| Nitol | 11 |
| Optima | 160 |
| Ponyloader | 1,312 |
| Smokeloader | 31 |
| Umbraloader | 25 |
| Yzf | 4 |
| Zeus | 41 |
| Total | 2,409 |
Case Study: Results
Unpacked binary “very similar” to original => poor unpacking

| Heuristic | Original | Unique Unpacked | Poor Unpacking | Poor Unpacking (%) |
|---|---|---|---|---|
| Hump and Dump | 1,671 | 1,523 | 205 | 12.27 |
| TimeOut | 515 | 500 | 46 | 8.93 |
| Self-tuning | 168 | 163 | 23 | 13.69 |
| TOTAL | 2,354 | 2,186 | 274 | 11.64 |

• Input: 2,409
• Unpacked: 2,354
• Output: 2,185
Unpacker’s Impact: Analysis Cost Reduction
[Figure labels: Unpacked Malware, CNC IP]

| Malware Binaries | # | % |
|---|---|---|
| Original binaries | 2,409 | |
| Unpacked successfully | 2,354 | 97.7% |
| Unique unpack result | 2,185 | 92.8% |
| Reduction in analyst work | 169 | 7.2% |

Multiple malware produce the same unpacked result
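As a quick check, the percentages on this slide can be reproduced from the three counts it reports. The sketch below assumes (this is not stated on the slide) that the unpack rate is taken relative to the original binaries, and that the unique and reduction rates are taken relative to the successfully unpacked binaries. The helper name `pct` is made up for illustration.

```python
original, unpacked, unique = 2409, 2354, 2185


def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100.0 * part / whole, 1)


unpack_rate = pct(unpacked, original)   # share of inputs that were unpacked
unique_rate = pct(unique, unpacked)     # share of unpacked results that are unique
saved = unpacked - unique               # binaries an analyst no longer has to examine
saved_rate = pct(saved, unpacked)
```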
Matching Code
Challenge 2: Code Obfuscation
push ecx
mov ecx,ebp
add ecx,33
push esi
mov esi,ecx
sub esi,34
mov [esi-2],eax
pop esi
pop ecx
push ecx
mov ecx, ebp
push eax
mov eax, 33
add ecx, eax
pop eax
push esi
mov esi, ecx
push edx
mov edx, 34
sub esi, edx
pop edx
mov [esi - 2], eax
pop esi
pop ecx
push ecx
mov ecx, [ebp + 10]
mov ecx, ebp
push eax
add eax, 2342
mov eax, 33
add ecx, eax
pop eax
mov eax, esi
push eax
mov esi, ecx
push edx
xor edx, 778f
mov edx, 34
sub esi, edx
pop edx
mov [esi-2], eax
pop esi
pop ecx
push ecx
mov ecx,ebp
add ecx,33
mov [ecx-36],eax
pop ecx
Requirements
• Scale requirement
  • Search in collection of thousands to millions of malware
• Performance requirement
  • Provide results in seconds, or less
• Quality requirement
  • Error rates should be comparable to pairwise matching
Representations for Matching Binaries
Map binary to ‘document’
• Abstracted bytecode: Word = N-Bytes
(380091df) (0091df96) (91df96f6) (df96f633)
• Disassemble: Word = N-mnemonic
(je push)(push mov)(mov pop)(pop xor)
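The two word definitions above are sliding-window n-grams. A minimal Python sketch follows; the function names `byte_words` and `mnemonic_words` are hypothetical, and the inputs are the example values shown on the slide.

```python
def byte_words(data: bytes, n: int = 4):
    """Words made of n consecutive bytes, rendered as hex strings."""
    return [data[i:i + n].hex() for i in range(len(data) - n + 1)]


def mnemonic_words(mnemonics, n: int = 2):
    """Words made of n consecutive instruction mnemonics."""
    return [" ".join(mnemonics[i:i + n]) for i in range(len(mnemonics) - n + 1)]


# Seven raw bytes yield four overlapping 4-byte words.
raw = bytes.fromhex("380091df96f633")
print(byte_words(raw))

# Five mnemonics yield four overlapping 2-mnemonic words.
print(mnemonic_words(["je", "push", "mov", "pop", "xor"]))
```

Both representations depend on the exact byte or instruction sequence, which is why a small obfuscating rewrite changes most of the words.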
VirusBattle Strategy
Map binary to CFG to document
Binary → Disassembly → CFG
Word = Block, represented as one of:
• Abstracted bytecode
• Abstracted disassembly
• Semantics
• Juice
Code to Semantics
Code:
push ebp
mov ebp,esp
sub esp,4
mov eax, DWORD ebp+4
mov DWORD ebp+8,eax
mov eax, DWORD ebp
mov DWORD ebp-4,eax
Semantics:
eax = def(ebp)
ebp = -4+def(esp)
esp = -8+def(esp)
memdw(-8+def(esp)) = def(ebp)
memdw(-4+def(esp)) = def(ebp)
memdw(4+def(esp)) = def(memdw(def(esp)))
• Code: sequential; focus on operations
• Semantics: parallel; captures effect
Interpret: seq(Instruction) -> State -> State
State = LValue -> RValue
LValue = Register + Mem
RValue = Int + def(RValue) + RValue op RValue + op RValue
• def(RValue): value in previous state
• RValue op RValue, op RValue: unsimplified
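The mapping from code to semantics can be sketched as a tiny symbolic interpreter. This is an illustrative toy, not the VirusBattle analyzer. It assumes that `DWORD ebp+4` on the slide denotes a 32-bit memory operand, it supports only the five operations needed for the example, and the instruction tuples (`"load"`, `"store"`, and so on) are a made-up encoding.

```python
def fmt(value):
    """Render a (symbol, offset) pair in the slide's notation, e.g. -4+def(esp)."""
    sym, off = value
    return sym if off == 0 else f"{off}+{sym}"


def interpret(block):
    """Symbolically execute a block; return the final state as strings."""
    regs, mem = {}, {}

    def reg(name):
        # A register that was never written still holds its entry value.
        return regs.get(name, (f"def({name})", 0))

    def addr(base, disp):
        sym, off = reg(base)
        return (sym, off + disp)

    def load(a):
        # Memory that was never written still holds its entry value.
        return mem.get(a, (f"def(memdw({fmt(a)}))", 0))

    for op, *args in block:
        if op == "push":
            value = reg(args[0])
            regs["esp"] = addr("esp", -4)
            mem[regs["esp"]] = value
        elif op == "mov":        # register to register
            regs[args[0]] = reg(args[1])
        elif op == "sub":        # register minus constant
            regs[args[0]] = addr(args[0], -args[1])
        elif op == "load":       # register <- memdw(base + disp)
            regs[args[0]] = load(addr(args[1], args[2]))
        elif op == "store":      # memdw(base + disp) <- register
            mem[addr(args[0], args[1])] = reg(args[2])
        else:
            raise ValueError(f"unsupported operation: {op}")

    state = {name: fmt(v) for name, v in regs.items()}
    state.update({f"memdw({fmt(a)})": fmt(v) for a, v in mem.items()})
    return state


block = [
    ("push", "ebp"),
    ("mov", "ebp", "esp"),
    ("sub", "esp", 4),
    ("load", "eax", "ebp", 4),
    ("store", "ebp", 8, "eax"),
    ("load", "eax", "ebp", 0),
    ("store", "ebp", -4, "eax"),
]
semantics = interpret(block)
```

For this block the resulting state agrees with the six equations listed on the slide.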
Limitations of (Block) Semantics
• Does not capture:
  • Register renaming
  • Memory address reassignment
  • Code motion between blocks
  • Evolutionary changes
    • Hashes good for strict equality
• Solution:
  • Generalize semantics
    • Juice
  • Use n-Block semantics
  • Use fuzzy hashes
Semantics to ‘words’
• Challenge:
  • How to map equal semantics to the same ‘word’?
• Solution:
  • Define canonical ordering
    • RValue structures are ground
    • Use ordering over symbols
    • Account for commutativity
    • Sum-of-product form
    • Simplify
  • Word = Hash (md5, SHA1) of linearized semantics
RValue = Int + def(RValue) + RValue op RValue + op RValue
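A minimal sketch of the last step, hashing linearized semantics, is shown below. It only puts the assignments in a fixed order and strips whitespace. It does not implement the commutativity handling, sum-of-product form, or simplification listed above, so it is a simplification of the approach. The function name `word` and the dictionary layout are assumptions.

```python
import hashlib


def word(semantics, algo="md5"):
    """Hash a block's semantics after sorting its assignments canonically.

    semantics -- dict mapping each LValue string to its RValue string
    algo      -- any name accepted by hashlib.new, e.g. "md5" or "sha1"
    """
    lines = sorted(f"{lhs}={rhs}".replace(" ", "") for lhs, rhs in semantics.items())
    return hashlib.new(algo, "\n".join(lines).encode()).hexdigest()


first = {"eax": "def(ebp)", "ebp": "-4+def(esp)"}
second = {"ebp": "-4 + def(esp)", "eax": "def(ebp)"}   # same meaning, different order
other = {"eax": "def(ebp)", "ebp": "-8+def(esp)"}

same_word = word(first) == word(second)
different_word = word(first) != word(other)
```

Because the hash is exact, any remaining difference in the linearized text produces a different word, which is the limitation that juice and fuzzy hashes are meant to address.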
Computing Juice
Problem: Establish constraints between induced variables?
Solution
1. Track simplification steps
2. Generalize simplification steps
Semantics and Juice
Code:
push ebp
mov ebp,esp
sub esp,4
mov eax, DWORD ebp+4
mov DWORD ebp+8,eax
mov eax, DWORD ebp
mov DWORD ebp-4,eax
Semantics:
eax = def(ebp)
ebp = -4+def(esp)
esp = -8+def(esp)
memdw(-8+def(esp)) = def(ebp)
memdw(-4+def(esp)) = def(ebp)
memdw(4+def(esp)) = def(memdw(def(esp)))
Juice:
A = def(B)
B = N1+def(C)
C = N2+def(C)
memdw(N2+def(C)) = def(B)
memdw(N1+def(C)) = def(B)
memdw(N3+def(C)) = def(memdw(def(C)))
where A, B, C are ‘registers’ and N1, N2, N3 are ‘Int’
• Inductive generalization: replace registers and constants by variables
RValue = Int + def(RValue) + RValue op RValue + op RValue + Variable
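The renaming part of juice can be sketched with a single regular-expression pass over the semantics. This covers only the replacement of registers and constants by variables. It does not track or generalize simplification steps, so it produces no constraints between the induced variables. The register list, the naming scheme (letters for registers, N1, N2, ... for integers, assigned in order of first appearance), and the function name `juice` are assumptions chosen to match the slide's example.

```python
import re
from string import ascii_uppercase

# Matches an x86 general-purpose register name or a (possibly negative) integer.
TOKEN = re.compile(r"\b(?:eax|ebx|ecx|edx|esi|edi|ebp|esp)\b|-?\d+")


def juice(semantics_lines):
    """Replace registers and integer constants by variables, consistently."""
    names = {}
    counts = {"reg": 0, "int": 0}

    def rename(match):
        token = match.group(0)
        if token not in names:
            if token[0].isalpha():
                names[token] = ascii_uppercase[counts["reg"]]
                counts["reg"] += 1
            else:
                counts["int"] += 1
                names[token] = f"N{counts['int']}"
        return names[token]

    return [TOKEN.sub(rename, line) for line in semantics_lines]


semantics = [
    "eax = def(ebp)",
    "ebp = -4+def(esp)",
    "esp = -8+def(esp)",
    "memdw(-8+def(esp)) = def(ebp)",
    "memdw(-4+def(esp)) = def(ebp)",
    "memdw(4+def(esp)) = def(memdw(def(esp)))",
]
generalized = juice(semantics)
```

Two blocks that differ only in register choice or constant values map to the same generalized text under this renaming, provided their semantics are listed in the same order.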
Challenge 3: Scalable Search
Featurization Process
[Figure: featurization pipeline with stages labeled Binary, Unpack, Disassembly, Procedure (several), Hash (one per procedure), MinHash, Feature Vector, and Compiler Attributes.]
MinHash: A form of LSH
| A | Feature | B |
|---|---|---|
| Light Brown | Hair Color | Dark Brown |
| Long | Hair Length | Long |
| Brown | Eye Color | Brown |

[Figure labels: MinHash Signature-1, MinHash Signature-2]
Feature = MinHash Function
Set of Features = MinHash Signature
Compose for deterministic manipulations
Architecture
VirusBattle Webservice Architecture
Empirical Results
Dataset
[Figure: bar chart of bots harvested in 2013, with series “original” and “unpacked”; y-axis from 0 to 3000.]
“Interesting” Procedures
Libraries ID’ed by IDA
Transitive Library via Semantics
RE Cost Reduction
• 1.7 Million+ procedures in binaries
• 105K+ IDA unique procedures
• 32K+ semantically unique procedures

| Procedures | All | IDA Unique | Juice Unique |
|---|---|---|---|
| Lib Procs | 65,113 | 11,482 | 4,382 |
| Non-Lib Procs | 1,644,355 (96.2%) | 93,916 (89.1%) | 27,859 (86.4%) |
| Total | 1,709,468 | 105,398 | 32,241 |
Intelligence: Connecting Families
Aldibot
Ponyloader
Smokeloader
Darkcomet
Semantically similar binaries between malware families
Intelligence: Code Sharing
Non-lib unpacked procedure:
• Optima: shared in 13/159 samples
• DarkComet: shared in 65/319 samples
Intelligence: Code Evolution
[Figure: ddoser, 10 samples. Number of procedures shared (y-axis) against percent of binaries (x-axis); annotated points: 200 in 100%, 100 in 10%.]
Intelligence: Needle in Haystack
[Figure: darkcomet, 649 samples. Number of procedures shared (y-axis) against percent of binaries (x-axis); annotated points: 20 in 60-65%, 930 in 0-5%.]
Performance
95th percentile:
• Semantic analysis: < 15s
• Unpacking: < 30s
• Malware search: < 7s
• Procedure search: < 100ms
[Figure: distribution of analysis time]
VirusBattle: In a nutshell
Application
• Rapidly extract intelligence from malware
Key Innovations
• Automated unpacking using VM introspection
• Semantic fingerprints, as opposed to bit-based fingerprints
• Innovative 2-tier search algorithm for fast searches
• Search at various granularities: whole binary, procedures, blocks, strings
• Interfaces with Palantir’s Forensic Investigation platform
Impact
• Order of magnitude improvement in malware analysts’ capability
• Unpacking time: reduced from days/weeks to minutes
• Analysis work: reduced from weeks/months to minutes
• New capability: build a knowledge base of analysis indexed on similar code; share analysts’ experience across malware families
Performance

| Component | Time(*) | Accuracy |
|---|---|---|
| Unpacker | 30 sec | 97% |
| Semantic Juice | 15 sec | |
| Binary search | 7 sec | 95% |
| Procedure search | 100ms | |

* Based on analysis of 2,500 botnet binaries; ** Max time to process 95% of files
Blackhat Sound Bytes
• Malware repositories are a great source of intelligence
• Semantic juice peers through code obfuscation
• Semantic hashing enables fast search over large repositories
• VM Introspection gives you X-Ray vision over malware
• VirusBattle.com: Malware Intelligence Mining in the Cloud
Contact
Prof. Arun Lakhotia
Vivek Notani
University of Louisiana at Lafayette, Lafayette, LA, USA
Extras
MinHash: A form of LSH
• Consider Set A and Set B
• Let $h(x) \to \mathrm{int}$ be a function that takes a member of A or B and gives an integer
• Let $h_{\min}(s)$ represent the minimum member of set $s$ with respect to $h$
• Then,
$$\Pr\big(h_{\min}(A) = h_{\min}(B)\big) = J(A,B) = \frac{|A \cap B|}{|A \cup B|}$$
Problem: High Variance!
MinHash Signatures:
• Compose $d$ minhash functions:
  • A signature match then implies that each of the $d$ functions agrees on the match
  • $\Pr(\mathrm{sig}(A) = \mathrm{sig}(B)) = J(A,B)^d$
Problem: Too many False Negatives!
• Check $r$ minhash signatures:
  • A match then implies that at least one of the $r$ signatures agrees on the match
  • $\Pr(\mathrm{match}(A,B)) = 1 - (1 - J(A,B)^d)^r$
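The composition of d minhash functions into a signature, and the check over r signatures, can be sketched in Python as follows. This is an illustrative sketch, not the VirusBattle search engine: seeding SHA-1 with a (band, index) pair to obtain a family of hash functions is an assumption, and the names `h`, `signature`, `is_match`, `jaccard`, and `match_probability` are hypothetical.

```python
import hashlib


def h(seed, item):
    """One member of a hash-function family, selected by `seed`."""
    digest = hashlib.sha1(f"{seed}:{item}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def signature(items, d, band=0):
    """Compose d minhash functions: the tuple of d minimum hash values."""
    return tuple(min(h((band, k), x) for x in items) for k in range(d))


def is_match(a, b, d, r):
    """True when at least one of r signatures (each of size d) agrees."""
    return any(signature(a, d, band) == signature(b, d, band) for band in range(r))


def jaccard(a, b):
    """Jaccard similarity J(A,B) of two non-empty sets."""
    return len(a & b) / len(a | b)


def match_probability(j, d, r):
    """Probability of a match for Jaccard similarity j: 1 - (1 - j^d)^r."""
    return 1 - (1 - j ** d) ** r


procs_a = {"p1", "p2", "p3", "p4"}
procs_b = {"p2", "p3", "p4", "p5"}
similarity = jaccard(procs_a, procs_b)
chance = match_probability(similarity, d=2, r=3)
found = is_match(procs_a, procs_a, d=2, r=3)
```

Raising d makes a single signature stricter and so reduces false positives, while raising r gives more chances for some signature to agree and so reduces false negatives.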