My Script Engines Know What You Did In The Dark:
Converting Engines into Script API Tracers
Toshinori Usui†‡, Yuto Otsuki†, Yuhei Kawakoya†,
Makoto Iwamura†, Jun Miyoshi†, Kanta Matsuura‡
†NTT Secure Platform Laboratories‡Institute of Industrial Science, The University of Tokyo
Session: Malware
ACSAC 35, San Juan, Puerto Rico, USA 2019
• Malicious scripts are used in multiple stages of attacks
– Exploitation: malicious spam with scripts, drive-by download attacks
– Persistent attack: script-based fileless malware
• Detailed behavior analysis is essential to counter them
2
Attacks with Malicious Scripts
[Figure: an attacker sends scripts to the attack target. Exploitation: .docm, .js, .vbs, .wsf, .ps. Persistent attack: .js, .ps.]
Solution: Script API※ Tracer
※Script API: built-in functions and objects of script languages
[Figure: a suspicious script (obfuscated code) is the input; the script API tracer executes it and outputs script API logs (behavior) to the analyst]
Calls in the obfuscated code:
CreateObject("Microsoft.XMLHTTP")
CreateObject("Wscript.shell")
Log:
[CreateObject] Param: Microsoft.XMLHTTP
[CreateObject] Param: Wscript.shell
[Figure: sample of script source code shown as the obfuscated input]
3
Generic Design of Script API Tracer
[Figure: layer diagram. User land: Script, Script engine, Application library (e.g., COM DLLs), System library (e.g., kernel32.dll). Kernel land: Kernel, reached through system calls. Hooks can be placed at the script level, the script engine level, or the system level.]
Insert logging code with hooks
Output logs of executed APIs
⇒ Behavior analysis
4
We assume three requirements:
1. Universal applicability
– Applicable independent of the script language specification
2. Preservability of script semantics※
– Possible to output logs with script semantics
3. Binary applicability
– Possible to build using only script engine binaries
※Logs are output from the same perspective as the script
– E.g., if the script calls CreateObject, "CreateObject" is logged
Requirements of Script API Tracer Design
5
Problems: System-Level Monitoring
[Figure: the layer diagram (Script, Script engine, libraries, Kernel) with system-level hooks. The script-level call Document.Cookie.Set is observed as WriteFile in the system library and as NtWriteFile at the system call.]
6
The semantics of "setting a cookie" are lost
→ Not script-semantics-preserving
Problems: Script-Level Monitoring
[Figure: the layer diagram (Script, Script engine, libraries, Kernel) with hooks at the script level]
7
Possible only when a hook mechanism
is provided by the target language
→ Not universally applicable
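To picture why script-level monitoring depends on the language, here is a minimal sketch in Python (not one of the engines discussed in this talk). It works only because Python lets the caller supply a replacement for a built-in; `traced`, `api_log` and `env` are hypothetical names chosen for this illustration.

```python
api_log = []  # hypothetical in-memory log of script API calls


def traced(name, func):
    """Wrap a built-in so that each call is logged with its arguments."""
    def wrapper(*args):
        api_log.append((name, args))
        return func(*args)
    return wrapper


# The "script" under analysis: it only calls the Eval-like API once.
script = 'x = eval("1+1")'

# Script-level hook: the script sees the wrapped eval instead of the real one.
env = {"eval": traced("eval", eval)}
exec(script, env)

print(api_log)    # [('eval', ('1+1',))]
print(env["x"])   # 2
```

A language without such a rebinding or hook mechanism offers no equivalent, which is the limitation named on this slide.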
Problems: Script Engine-Level Monitoring
[Figure: the layer diagram (Script, Script engine, libraries, Kernel) with hooks at the script engine level]
Requires implementation details
of (sometimes proprietary) script engines
→ Not binary applicable
8
• No design pattern fulfills all three requirements
Problem Summary
Existing script API tracers
[1] Script-level: jäk, Revelo, box-js, jsunpack-n, JSDetox
[2] System-level: Ether, CWSandbox, API Chaser, Alkanet
[3] Script engine-level: Sulo, JSAND, FlashDetect, ViperMonkey
9
Design | 1. Universal applicability | 2. Preservability of script semantics | 3. Binary applicability
Script-level [1] | ✘ | ✔ | ✔
System-level [2] | ✔ | ✘ | ✔
Script engine-level [3] | ✔ | ✔ | ✘
• Fulfill all three requirements
Our Goal
Design | 1. Universal applicability | 2. Preservability of script semantics | 3. Binary applicability
Script-level [1] | ✘ | ✔ | ✔
System-level [2] | ✔ | ✘ | ✔
Script engine-level [3] | ✔ | ✔ | ✘
Our method | ✔ | ✔ | ✔
Does it become applicable by binary analysis?
10
• Problem
– Script engine-level monitoring is inapplicable to (proprietary) binaries
• Root cause: where to insert hooks and what to output to logs are unknown
• Key idea
– Automatically analyze script engine binaries to discover these unknowns
Problem Definition and Key Idea
Hook point: Where to insert hooks?
Tap point: What memory should be logged, and as what type?
[Figure: Input: Script engine → Our method → Output: Script Analyzer]
11
• For simplicity, we assume:
– A hook point is the top of a subroutine
– A tap point is one of the arguments at the corresponding hook point
Assumption on Hook and Tap Point
Hook point
Tap point: any one of
• ebp+8
• ebp+0xC
• ebp+0x10
12
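The offsets above correspond to the first three 32-bit arguments of a conventional x86 stack frame. The sketch below reads them from a simulated frame; `read_args`, the frame contents, and the assumption of a standard `push ebp; mov ebp, esp` prologue are illustrative and not taken from the slides.

```python
import struct


def read_args(stack: bytes, ebp_offset: int, count: int = 3):
    """Return `count` little-endian 32-bit arguments starting at ebp+8."""
    args = []
    for i in range(count):
        start = ebp_offset + 8 + 4 * i   # ebp+8, ebp+0xC, ebp+0x10, ...
        (value,) = struct.unpack_from("<I", stack, start)
        args.append(value)
    return args


# Simulated frame: saved ebp, return address, then three arguments.
frame = struct.pack("<5I", 0x0012FF80, 0x7797DEB7, 0x11, 0x22, 0x33)
print([hex(a) for a in read_args(frame, 0)])  # ['0x11', '0x22', '0x33']
```

Each returned value is a tap point candidate; whether it is meaningful depends on how it is interpreted (integer, pointer to a string, and so on).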
Method Overview
[Figure: test scripts (written using knowledge of the language specification) and the script engine binary are the input to our method; the script API tracer is the output]
STEP 1 (manual)
• Create test scripts for dynamic analysis
STEP 2 (automated dynamic analysis with test scripts)
• Acquire execution traces (execution trace logging)
• Detect hook/tap points (hook point detection, tap point detection)
STEP 3
• Insert logging code
• Use this for malicious script analysis
13
• Scripts that only call a script API, with no error
– Used to specify the script API that is the analysis target
– Work as an indicator of the arguments in memory
• To learn where the arguments appear in the process memory of the script engine
Test Script
Eval(“1+1”)
Example: test script for Eval API
Formal definitions and more examples are available in our paper
14
• Logs all branches and system API calls in the target script engine
Execution Trace Logging
Example of execution trace
Branch trace:
Type: jmp Src: 0x7797da46 Dest: 0x759fbdbf
Type: call Src: 0x7797de9b Dest: 0x77b2522b
…
System API call trace:
APIname: LoadLibraryA RetAddr: 0x7797deb7 Arg1: kernel32.dll
…
15
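For later processing, trace lines in the shape shown on this slide can be turned into records. This is a small sketch that assumes exactly the line layout of the example; the real trace format of the prototype may differ, and `parse_trace_line` is a hypothetical helper.

```python
import re

BRANCH_RE = re.compile(r"Type: (\w+) Src: (0x[0-9a-fA-F]+) Dest: (0x[0-9a-fA-F]+)")
API_RE = re.compile(r"APIname: (\w+) RetAddr: (0x[0-9a-fA-F]+) Arg1: (\S+)")


def parse_trace_line(line):
    """Parse one trace line into a dict, or return None if it is not recognized."""
    line = line.strip()
    m = BRANCH_RE.fullmatch(line)
    if m:
        kind, src, dest = m.groups()
        return {"record": "branch", "type": kind,
                "src": int(src, 16), "dest": int(dest, 16)}
    m = API_RE.fullmatch(line)
    if m:
        name, ret_addr, arg1 = m.groups()
        return {"record": "api", "name": name,
                "ret_addr": int(ret_addr, 16), "args": [arg1]}
    return None


branch = parse_trace_line("Type: jmp Src: 0x7797da46 Dest: 0x759fbdbf")
api = parse_trace_line("APIname: LoadLibraryA RetAddr: 0x7797deb7 Arg1: kernel32.dll")
```

A branch can then be identified by its `(type, src, dest)` triple when two traces are compared.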
• Analysis by diffing multiple execution traces obtained under
different conditions
16
Hook Point Detection
by Differential Execution Analysis
[Figure: left, the execution trace of a test script that calls Eval("1+1") only once; right, the execution trace of a test script that calls it N times. The Eval-related branches appear once on the left and N times on the right.]
Extract the subsequence which is observed:
• once in the left trace
• N times in the right trace
→ Use an algorithm borrowed from bioinformatics
→ Hook point candidates
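A rough way to picture the differential idea is a frequency filter: keep the branches that occur once in the single-call trace and N times in the N-call trace. This is a deliberate simplification and not the alignment-based algorithm of the talk; the symbols S, A, B, C, M, E stand for branches, and `candidate_branches` is a hypothetical name.

```python
from collections import Counter


def candidate_branches(trace_once, trace_n, n):
    """Branches seen exactly once in trace_once and exactly n times in trace_n."""
    once = Counter(trace_once)
    many = Counter(trace_n)
    # dict.fromkeys keeps first-seen order while dropping duplicates
    return [b for b in dict.fromkeys(trace_once)
            if once[b] == 1 and many[b] == n]


trace_once = list("SABCE")            # the API is called once
trace_n = list("SABCMABCMABCME")      # the API is called three times
print(candidate_branches(trace_once, trace_n, 3))  # ['A', 'B', 'C']
```

Counting ignores ordering, so it cannot tell a repeated contiguous sequence from unrelated branches that merely occur N times; sequence alignment addresses that.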
Preliminary:
(Original) Smith-Waterman Algorithm
• Algorithm for finding common subsequences by dynamic programming
– Rule: cells get a high score if matched (left example) and a low score if not (right example)
[Figure: left, row A against column A, diagonal score 2 → 4; right, row B against column A, diagonal score 2 → 1; cell-score annotations +2, -1, -2]
STEP 1
• Fill the table of dynamic programming (DP table)
DP table for Seq. 1 and Seq. 2:
      S  A  B  C  M
   0  0  0  0  0  0
S  0  2  0  0  0  0
A  0  0  4  2  0  0
B  0  0  2  6  4  2
C  0  0  0  4  8  6
E  0  0  0  0  6  7
17
Preliminary:
(Original) Smith-Waterman Algorithm
STEP 2
• Find the point of max score
[Figure: the DP table from STEP 1]
18
Preliminary:
(Original) Smith-Waterman Algorithm
STEP 3
• Backtrack the path of max score
[Figure: the DP table from STEP 1]
19
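The three steps can be sketched as a textbook Smith-Waterman implementation. The scores +2, -1 and -2 are read from the slide's annotations; assigning them to match, mismatch and gap is an assumption, and individual cells may differ from the table shown on the slide.

```python
def smith_waterman(seq1, seq2, match=2, mismatch=-1, gap=-2):
    """Local alignment of seq1 (columns) and seq2 (rows).

    Returns (max score, matched symbols, DP table).
    """
    rows, cols = len(seq2) + 1, len(seq1) + 1
    table = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    # STEP 1: fill the DP table
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if seq2[i - 1] == seq1[j - 1] else mismatch
            table[i][j] = max(0, table[i - 1][j - 1] + s,
                              table[i - 1][j] + gap, table[i][j - 1] + gap)
            # STEP 2: remember the point of max score
            if table[i][j] > best:
                best, best_pos = table[i][j], (i, j)
    # STEP 3: backtrack from the max score until a zero cell is reached
    i, j = best_pos
    matched = []
    while i > 0 and j > 0 and table[i][j] > 0:
        s = match if seq2[i - 1] == seq1[j - 1] else mismatch
        if table[i][j] == table[i - 1][j - 1] + s:
            if s == match:
                matched.append(seq1[j - 1])
            i, j = i - 1, j - 1
        elif table[i][j] == table[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, matched[::-1], table


best, matched, table = smith_waterman("SABCM", "SABCE")
print(best, "".join(matched))  # 8 SABC
```

On the example sequences the maximum score is 8 and the backtracked common subsequence is S A B C.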
Our Customized Algorithm
• We add the notion of how many times a common subsequence appears, which does not exist in the original algorithm
– "extract common subsequences that appeared N times"
DP table (columns: execution trace which calls multiple times; rows: execution trace which calls just once):
      S  A  B  C  M  A  B  C  M  A  B  C  M  E
   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
S  0  2  0  0  0  0  0  0  0  0  0  0  0  0  0
A  0  0  4  2  0  0  2  0  0  0  2  0  0  0  0
B  0  0  2  6  4  2  0  4  2  0  0  4  2  0  0
C  0  0  0  4  8  6  4  2  6  4  2  2  6  4  2
E  0  0  0  0  6  7  5  3  4  5  0  1  4  5  3
STEP 1
• Same as the original algorithm until backtrack
20
Our Customized Algorithm
STEP 2
• Recursively extract N-1 sequences from the same rows of the rest of the table
[Figure: the same DP table (columns: execution trace which calls multiple times; rows: execution trace which calls just once)]
21
Our Customized Algorithm
[Figure: the DP table split into three parts along the execution trace which calls multiple times (columns); rows: execution trace which calls just once]
      S  A  B  C
   0  0  0  0  0
S  0  2  0  0  0
A  0  0  4  2  0
B  0  0  2  6  4
C  0  0  0  4  8
E  0  0  0  0  6

      M  A  B  C
   0  0  0  0  0
S  0  0  0  0  0
A  0  0  2  0  0
B  0  0  0  4  2
C  0  0  0  2  6
E  0  0  0  0  4

      M  A  B  C  M  E
   0  0  0  0  0  0  0
S  0  0  0  0  0  0  0
A  0  0  2  0  0  0  0
B  0  0  0  4  2  0  0
C  0  0  0  2  6  4  2
E  0  0  0  0  4  5  3
STEP 3
• Check if the similarity is above threshold t
22
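One way to picture the N-times extraction is to align, cut off the consumed part of the long trace, and align again on the rest. The sketch below is a simplification under stated assumptions: the scores (+2, -1, -2), the overlap-ratio similarity, and the names `local_align` and `extract_repeated` are illustrative choices, not the definitions used in the paper.

```python
def local_align(short, long_, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman; returns (score, end column in long_, matched symbols)."""
    table = [[0] * (len(long_) + 1) for _ in range(len(short) + 1)]
    best, pos = 0, (0, 0)
    for i in range(1, len(short) + 1):
        for j in range(1, len(long_) + 1):
            s = match if short[i - 1] == long_[j - 1] else mismatch
            table[i][j] = max(0, table[i - 1][j - 1] + s,
                              table[i - 1][j] + gap, table[i][j - 1] + gap)
            if table[i][j] > best:
                best, pos = table[i][j], (i, j)
    i, j = pos
    matched = []
    while i > 0 and j > 0 and table[i][j] > 0:   # backtrack
        s = match if short[i - 1] == long_[j - 1] else mismatch
        if table[i][j] == table[i - 1][j - 1] + s:
            if s == match:
                matched.append(long_[j - 1])
            i, j = i - 1, j - 1
        elif table[i][j] == table[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, pos[1], matched[::-1]


def extract_repeated(short, long_, n, t=0.5):
    """Extract n successive common subsequences; None if any is too dissimilar."""
    found, offset = [], 0
    for _ in range(n):
        score, end, matched = local_align(short, long_[offset:])
        if score == 0:
            return None
        found.append(matched)
        offset += end                      # continue on the rest of the trace
    base = found[0]
    for other in found[1:]:
        overlap = sum(1 for sym in other if sym in base) / len(base)
        if overlap < t:                    # assumed similarity check against t
            return None
    return found


result = extract_repeated("SABCE", "SABCMABCMABCME", 3)
print(["".join(r) for r in result])  # ['SABC', 'ABC', 'ABC']
```

With these example traces the part shared by all three extractions is A B C, which would be kept as hook point candidates.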
• Heuristically explore the arguments (on the stack/in registers) of the
obtained hook point candidates and dereference them as various types
• If a value matches the test script argument: detect it as a tap point and settle on that hook point candidate
Tap Point Detection
[Figure: the test script Eval("1+1") is executed; the argument memory of a hook point candidate is dereferenced as various types (int, int *, char *, wchar *, …), yielding values such as 34214738, 5701715, "" and "1+1". The value that matches the test script argument marks the tap point.]
23
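The type exploration can be sketched against a simulated memory image: each argument is viewed as an integer, as a pointer to a byte string, and as a pointer to a wide string, and compared with the argument written in the test script. The memory layout, addresses, and the helpers `read_cstring` and `find_tap_points` are hypothetical.

```python
def read_cstring(memory, base, addr, width):
    """Read a NUL-terminated string of `width`-byte characters at addr, or None."""
    off = addr - base
    if off < 0 or off >= len(memory):
        return None                      # not a valid pointer into this image
    end = off
    while end + width <= len(memory) and memory[end:end + width] != b"\x00" * width:
        end += width
    try:
        return memory[off:end].decode("ascii" if width == 1 else "utf-16-le")
    except UnicodeDecodeError:
        return None


def find_tap_points(args, memory, base, expected):
    """Return (argument index, type) pairs whose view equals `expected`."""
    hits = []
    for index, value in enumerate(args):
        views = {
            "int": str(value),
            "char *": read_cstring(memory, base, value, 1),
            "wchar *": read_cstring(memory, base, value, 2),
        }
        for type_name, seen in views.items():
            if seen == expected:
                hits.append((index, type_name))
    return hits


base = 0x1000
memory = b"\x00" * 8 + "1+1".encode("utf-16-le") + b"\x00\x00" + b"junk\x00"
args = [34214738, 0x1008, 0x1010]        # simulated argument values
print(find_tap_points(args, memory, base, "1+1"))  # [(1, 'wchar *')]
```

Here only the second argument, read as a wide string, reproduces the test script argument, so it would be reported as the tap point.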
• STAGER: our prototype system
– Implemented with C++, Python and Intel Pin
• Experimental setup
Evaluation
OS: Windows 7 (32-bit)
CPU: Intel Core i7-6600U CPU @ 2.60GHz
RAM: 2GB
Target script engines:
VBA: VBE7.dll (Version 7.1.10.48)
VBScript: vbscript.dll (Version 5.8.9600.18698)
VBScript: vbscript.dll (ReactOS 0.4.9)
PowerShell: PowerShell Core 6.0.3
Note: open source implementations are used for confirming the detection results
24
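Detection Accuracy
Script | Script API | # of all branches | # of hook point candidates
VBA | CreateObject | 93000090 | 53
VBA | Invoke (COM) | 101993701 | 98
VBA | Declare | 94281492 | 34
VBA | Open | 85641170 | 42
VBA | Print | 90024821 | 29
VBScript (Microsoft) | CreateObject | 390836 | 48
VBScript (Microsoft) | Invoke (COM) | 1148225 | 92
VBScript (Microsoft) | Eval | 369070 | 121
VBScript (Microsoft) | Execute | 371040 | 134
VBScript (ReactOS) | CreateObject | 89213 | 32
VBScript (ReactOS) | Invoke (COM) | 128511 | 43
VBScript (ReactOS) | Eval | Not implemented | Not applicable
VBScript (ReactOS) | Execute | Not implemented | Not applicable
PowerShell | New-Object | 210852 | 54
PowerShell | Import-Module | 185192 | 48
PowerShell | New-Item | 198327 | 93
PowerShell | Set-Content | 200822 | 54
PowerShell | Start-Process | 152841 | 119
PowerShell | Invoke-WebRequest | 315380 | 98
PowerShell | Invoke-Expression | 271054 | 82
[Columns "Hook and tap point detected" and "Proper log available": marks not preserved in this transcript]
# of all branches → # of hook point candidates: filter by hook point detection
Final decision by tap point detection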
25
Case study: Detected Hook and Tap Points
• Confirm the location of detected points in the source code
[Table: Script API vs. hook and tap points, for CreateObject, Invoke (COM dispatch), Import-Module, Start-Process, Invoke-Expression; the detected points are not preserved in this transcript]
26
Execution Duration
[Figure: bar chart "Execution duration for each step". X-axis: steps (execution trace logging (once), hook point detection, tap point detection, overall). Y-axis: avg. execution duration (sec), 0 to 40. Series: VBA, VBScript, PowerShell.]
27
Overhead of execution with Intel Pin
Execution Duration
[Figure: the same bar chart of execution duration for each step]
28
Our algorithms seem to be quick enough
Execution Duration
[Figure: the same bar chart of execution duration for each step]
29
About 30 sec per script API
⇒ realistic time
• We gathered 10 people from a CS department
• We asked them to create valid test scripts for STAGER (in slide P.23)
– We measured the required time
– Note: they had previously spent time understanding the language specification
30
Human Effort: Test Script Preparation
30~100 sec per script API
→ Much less effort than manual reverse engineering
[CreateObject] ProgID: Microsoft.XMLHTTP
[CreateObject] ProgID: Wscript.shell
[CreateObject] ProgID: Adodb.streaM
[CreateObject] ProgID: shell.Application
[Invoke] ActiveX Object: Wscript.shell
[Invoke] ActiveX Method: Environment
[Invoke] Param_1: Process
[Invoke] ActiveX Method: Environment
[Invoke] Param_1: TeMP
[Invoke] ActiveX Object: Microsoft.XMLHTTP
[Invoke] ActiveX Method: Open
[Invoke] Param_1: GeT
[Invoke] Param_2: http://zhongjianbao.com/8yfh4gfff
[Invoke] Param_3: False
[Invoke] ActiveX Object: Microsoft.XMLHTTP
[Invoke] ActiveX Method: setRequestHeader
[Invoke] Param_1: User-Agent
[Invoke] Param_2: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0
[Invoke] ActiveX Object: Microsoft.XMLHTTP
[Invoke] ActiveX Method: Send
[Invoke] ActiveX Object: Microsoft.XMLHTTP
[Invoke] ActiveX Method: Status
Quick Example of Analysis Log (1/2)
Annotations: "Obtain Temp folder path" (the Environment calls); "Download with HTTP GET" (the Microsoft.XMLHTTP calls)
Source: about 1,500 lines of obfuscated VBA code
31
[Invoke] ActiveX Object: Adodb.streaM
[Invoke] ActiveX Method: Type
[Invoke] Param_1: Empty
[Invoke] Param_2: 1
[Invoke] ActiveX Object: Adodb.streaM
[Invoke] ActiveX Method: Open
[Invoke] ActiveX Object: Microsoft.XMLHTTP
[Invoke] ActiveX Method: responseBody
[Invoke] ActiveX Object: Adodb.streaM
[Invoke] ActiveX Method: Write
[Invoke] Param_1: ^/A# _|LDXpETRr?:$U}
>3:
**
[Invoke] ActiveX Object: Adodb.streaM
[Invoke] ActiveX Method: saveToFile
[Invoke] Param_1: C:\Users\ntt\AppData\Local\Temp\kkloepp8
[Invoke] Param_2: 2
[Invoke] ActiveX Object: shell.Application
[Invoke] ActiveX Method: Open
[Invoke] Param_1: C:\Users\ntt\AppData\Local\Temp\miniramon8.exe
Save as a file
to Temp folder
Execute the saved file
Quick Example of Analysis Log (2/2)
We now use this script API tracer to analyze 10K+ malicious scripts
32
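Logs in this shape are easy to post-process. The sketch below groups the lines into one record per call, assuming exactly the `[API] Field: value` layout of the example; `group_calls` and the record keys are hypothetical.

```python
import re

LINE_RE = re.compile(r"\[(\w+)\] ([^:]+): (.*)")


def new_invoke(api, obj):
    return {"api": api, "object": obj, "method": None, "params": []}


def group_calls(lines):
    """Group log lines into one dict per script API call."""
    calls = []
    for line in lines:
        m = LINE_RE.fullmatch(line.strip())
        if not m:
            continue                      # skip annotations and wrapped lines
        api, field, value = m.groups()
        if field == "ProgID":
            calls.append({"api": api, "ProgID": value})
        elif field == "ActiveX Object":
            calls.append(new_invoke(api, value))
        elif field == "ActiveX Method":
            last = calls[-1] if calls else {}
            if last.get("method", "") is not None:
                # a further method on the same object, logged without an object line
                calls.append(new_invoke(api, last.get("object")))
            calls[-1]["method"] = value
        elif field.startswith("Param_") and calls and "params" in calls[-1]:
            calls[-1]["params"].append(value)
    return calls


log = [
    "[CreateObject] ProgID: Wscript.shell",
    "[Invoke] ActiveX Object: Wscript.shell",
    "[Invoke] ActiveX Method: Environment",
    "[Invoke] Param_1: Process",
    "[Invoke] ActiveX Method: Environment",
    "[Invoke] Param_1: TeMP",
]
calls = group_calls(log)
for call in calls:
    print(call)
```

Records like these can then be filtered, for example by method name, when many scripts are analyzed.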
• We proposed a dynamic analysis method for script engines
– Hook point detection
– Tap point detection
• Achieved script-level monitoring using only engine binaries
• The evaluation of STAGER showed that:
– It properly detects hook/tap points
– It is quick enough for real-world application
– Its logs are usable for understanding malicious behavior
Conclusion
33
Q & A ?
Please tell me if you can accept a visiting student in the malware analysis field!