CODE BLUE Feb.17 (Mon) - 18 (Tue), 2014 Tokyo, Japan
o-checker: Malicious document file detection tool
- File sizes tell whether the document file is malicious or not -
Yuhei Otsubo
Agenda
1. Background
2. Structure of malicious document files
3. Overview of o-checker
4. Detection mechanism
5. Demo
6. Application
7. Conclusion
1. BACKGROUND
Increase in targeted email attacks (1/3)
Source: http://www.symantec.com/threatreport/topic.jsp?aid=industrial_espionage&id=malicious_code_trends
Increase in targeted email attacks (2/3)
Number of security advisories on targeted email attacks to government institutions
※GSOC: Government Security Operation Coordination team
Increase in targeted email attacks (3/3)
Rate of companies that had experienced targeted attacks: 5.4% (2007) → 33% (2011)
Source: research commissioned by METI (Ministry of Economy, Trade and Industry), 2007 and 2011
Example of a targeted attack
Attacker → Victim (network of companies and private individuals)
1. The attacker sends emails with malware.
2. The victim opens an attachment.
3. The victim's machine is infected with malware.
4. Data (secret information) is exfiltrated to the attacker.
File types of targeted email attacks
• Executable files: 59%
• Document files: 41%
Source: Trend of the extension of the attachment of targeted email attacks, Trend Micro Japan (2013), http://is702.jp/special/1431/
2. STRUCTURE OF MALICIOUS DOCUMENT FILES
Structure of malicious document files
A malicious document file contains four parts:
• Exploit: abuses a vulnerability in the browsing software.
• Shellcode: creates a malware executable file and a decoy file, then executes/opens them.
• Malware executable file
• Decoy file (for display)
The malware executable file and the decoy file are encoded in various ways and have no relation to the document contents.
Example of malicious document files
[Figure: bitmap view and hex view of a malicious document file]
Exploit(1/2)
• Object 29: a JavaScript action. Its script is stored in object 31.
• Object 31: a JavaScript script (the exploit), compressed with the Flate method.
Exploit(2/2)
After decoding the Flate-compressed data, the script contains shellcode encoded by the escape() function.
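For analysis, the two decoding steps described on this slide can be reproduced roughly as follows. This is a minimal sketch, not part of o-checker: the helper names are hypothetical, and it assumes the shellcode uses the `%uXXXX` / `%XX` notation produced by JavaScript's escape(), with 16-bit units written out little-endian.

```python
import re
import zlib


def inflate_stream(raw: bytes) -> bytes:
    """Decode the body of a /FlateDecode stream (zlib-wrapped deflate)."""
    return zlib.decompress(raw)


def unescape_to_bytes(text: str) -> bytes:
    """Turn %uXXXX and %XX escapes into raw bytes (hypothetical helper).

    Assumption: each %uXXXX unit is stored little-endian, as is usual
    for x86 shellcode. Characters that are not escaped are copied as-is.
    """
    pattern = re.compile(r"%u([0-9a-fA-F]{4})|%([0-9a-fA-F]{2})")
    out = bytearray()
    pos = 0
    while pos < len(text):
        m = pattern.match(text, pos)
        if m is None:
            out += text[pos].encode("latin-1", errors="replace")
            pos += 1
        elif m.group(1):
            out += int(m.group(1), 16).to_bytes(2, "little")
            pos = m.end()
        else:
            out.append(int(m.group(2), 16))
            pos = m.end()
    return bytes(out)
```

The decoded bytes are only meant to be inspected (for example in a hex view), never executed.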
Shellcode
• Decoder: 40 bytes
• The shellcode is encoded with printable characters.
Executable file (1/2)
[Figure: the encoded executable file]
Executable file (2/2)
[Figure: the executable file after decoding]
Decoy file
3. OVERVIEW OF O-CHECKER
Traditional methods of malicious document detection
A malicious document file contains an exploit, shellcode, a malware executable file, and a decoy file (for display).
Traditional methods utilize particular patterns in the malicious code, found through static/dynamic analysis.
Problems
• Particular patterns can be changed by encoding.
• Some exploits only work in specific environments, which makes dynamic analysis difficult.
Vicious circle
If a detection method focuses on code that can be written arbitrarily by attackers, a vicious circle arises:
• Defender: creates a signature based on malicious code.
• Attacker: changes the malicious code so that it can avoid detection.
Breaking the vicious circle
If a detection method focuses on file formats, which cannot be written arbitrarily by attackers, the vicious circle is broken:
• Defender: creates a signature based on structural analysis of file formats.
• Attacker: changing the malicious code no longer avoids detection; the attacker would have to change the file format syntax to avoid detection.
Characteristics of malicious documents, based on file format
A document file is an aggregate of pictures, text, and auxiliary data. Each piece of data has its purpose, so there is no data which the browsing software does not process.
Whether the browsing software processes each part of a malicious document, and the reason:
• Exploit: processed, in order to abuse a vulnerability in the browsing software.
• Shellcode: processed, because the exploit code includes the shellcode.
• Executable file: not processed. If the browsing software parsed it, the contents would be displayed garbled or the browsing software would malfunction.
• Decoy file: not processed. If the browsing software parsed it, the contents would be displayed garbled or the browsing software would malfunction.
Detection mechanism (simplified)
Detection is based on structural analysis of document formats.
• Normal document files: all structures match the contents directly.
• Malicious document files: part of the structure does not match the contents.
Performance of o-checker
• High speed and high detection rate: detection rate 98.9%, average execution time 0.3 s
• Almost maintenance-free
Tool                 Updating frequency  Remarks
Anti-virus software  Every day           200,000 new types of malware per day (2012) ※
o-checker            Almost none         Needs an update if a new document file format comes out.
Usage: a document file is given as input to msanalysis.py or pdfanalysis.py, and an alert is raised for document files embedded with executable files.
※: http://www.kaspersky.com/about/news/virus/2012/2012_by_the_numbers_Kaspersky_Lab_now_detects_200000_new_malicious_programs_every_day
4. DETECTION MECHANISM
Inspection items
Rich Text
• (A) Attached data after EOF
CFB
• (B) Anomaly file size
• (C) Data not referred from FAT
• (D) Free sector in the last sector
• (E) Unaccounted-for sector
PDF
• (F) Unaccounted-for section
• (G) Unreferenced object
• (H) Camouflaged stream
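Because each inspection item belongs to one file format, a checker first has to decide which format it is looking at. The sketch below illustrates one way to do that from the leading bytes of the file. It is an assumption for illustration only: the slides do not say how o-checker selects the format, and `sniff_format` and `ITEMS_BY_FORMAT` are hypothetical names. The signatures themselves (`{\rtf`, the CFB header signature, `%PDF-`) are the well-known ones.

```python
# CFB header signature as defined in [MS-CFB].
CFB_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")

# Inspection items (A)-(H) grouped by the format they apply to.
ITEMS_BY_FORMAT = {
    "rtf": ["A"],
    "cfb": ["B", "C", "D", "E"],
    "pdf": ["F", "G", "H"],
    "unknown": [],
}


def sniff_format(data: bytes) -> str:
    """Guess the document format from its first bytes (hypothetical helper)."""
    if data.startswith(b"{\\rtf"):
        return "rtf"
    if data.startswith(CFB_MAGIC):
        return "cfb"
    # Assumption: accept a PDF header anywhere in the first 1024 bytes.
    if b"%PDF-" in data[:1024]:
        return "pdf"
    return "unknown"
```

Sniffing the content rather than trusting the extension matters here, since a Rich Text file can carry a doc extension.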
Structure of Rich Text files
RTF files are usually 7-bit ASCII plain text. RTF consists of control words, control symbols, and groups. (Source: Wikipedia)
An example of RTF code:
{\rtf
Hello!\par
This is some {\b bold} text.\par
}
• "{\rtf" is the signature that indicates an RTF file.
• The "}" corresponding to the first "{" is located at the end of the file (EOF).
(A) Attached data after EOF
An RTF file embedded with an executable file:
{\rtf Hello!\par This is some {\b bold} text.\par}MZ [binary data] This program cannot be run in DOS mode. [binary data] Rich [binary data] PE [binary data]
• An executable file is inserted at the end of the file so that it does not affect the display.
• Data exists after EOF.
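Check (A) can be sketched as a brace-matching pass: find the `}` that closes the first `{` and see whether anything other than whitespace follows it. This is a simplified illustration, not o-checker's code; `rtf_trailing_data` is a hypothetical name, and the sketch does not handle `\bin` binary runs.

```python
def rtf_trailing_data(data: bytes) -> bytes:
    """Return the bytes found after the '}' that closes the first '{'.

    An empty result means nothing (apart from whitespace/NUL padding)
    follows the end of the RTF group. Simplification: \\bin data is ignored.
    """
    if not data.startswith(b"{\\rtf"):
        raise ValueError("not an RTF file")
    depth = 0
    i = 0
    while i < len(data):
        c = data[i]
        if c == 0x5C:      # backslash: skip the escaped character (\{, \}, \\)
            i += 2
            continue
        if c == 0x7B:      # '{' opens a group
            depth += 1
        elif c == 0x7D:    # '}' closes a group
            depth -= 1
            if depth == 0:
                return data[i + 1:].strip(b"\r\n\t \x00")
        i += 1
    return b""             # braces never balanced; nothing to report here


clean = b"{\\rtf Hello!\\par This is some {\\b bold} text.\\par}"
infected = clean + b"MZ\x90\x00\x03"
print(rtf_trailing_data(clean), rtf_trailing_data(infected)[:2])
```

A non-empty result corresponds to "Data exists after EOF" on the slide.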
CFB (doc, xls, ppt, jtd/jtdc)
CFB: Compound File Binary
• A layered structure can be stored in one file.
• An archive format developed by Microsoft Corp.
• It is used by Microsoft Word etc. (doc, ppt, xls, jtd/jtdc ※)
In file-system terms: Stream → File, Storage → Folder
[Figure: example hierarchy with a Root Storage containing Storages 1 to 3 and Streams A to C]
Source: [MS-CFB] – v20130118 Compound File Binary Format (http://msdn.microsoft.com/en-us/library/dd942138.aspx)
※: jtd and jtdc are used by Ichitaro (一太郎), a Japanese word processor developed by JustSystems Corp.
Structure of CFB files
Physical structure and FAT (File Allocation Table); in the FAT, -2 marks the end of a chain and -1 marks a free sector:
Index  Sector           FAT entry
-      Header           -
0      FAT0             -2
1      Directory Entry  -2
2      Stream A         3
3      Stream A         -2
4      Free Sector      -1
5      Stream B         -2
Directory Entry:
• Storage  Name: root   Size: -    Index: -
• Stream   Name: a.txt  Size: 696  Index: 2
• Stream   Name: b.txt  Size: 318  Index: 5
Physical structure of CFB
• Header: 512 bytes
• Sectors (FAT0, Directory Entry, Stream A, Stream A, Free Sector, Stream B): (512 or 4096) × N bytes
FileSize = 512 + (512 or 4096) × N = 512 × M
The file size of a regular CFB file is therefore always a multiple of 512.
(B) Anomaly file size
Malware is appended after the last regular sector (index 5, Stream B) and occupies sector 6 and part of sector 7.
The size of the file excluding the header is not a multiple of the sector size: dividing the file size by 512 leaves a remainder.
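Check (B) reduces to a single modulo operation on the file size. The following is a minimal sketch under the slide's model (512-byte header, sectors of 512 or 4096 bytes); the function names are hypothetical.

```python
import os


def cfb_size_remainder(file_size: int) -> int:
    """Remainder of the file size divided by 512; 0 for a regular CFB file."""
    return file_size % 512


def has_anomalous_size(path: str) -> bool:
    """True if the file on disk is not a multiple of 512 bytes long."""
    return cfb_size_remainder(os.path.getsize(path)) != 0


# The slide's example: header + 6 sectors, then malware of arbitrary length.
regular = 512 + 6 * 512
print(cfb_size_remainder(regular), cfb_size_remainder(regular + 700))
```

The check needs no parsing at all, which is one reason this kind of inspection can be fast.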
(C) Data not referred from FAT
Malware is stored in sectors for which the FAT has no entry, so the file exceeds the area that the FAT can refer to.
The area which can be referred to by the FAT: (the number of FAT sectors) × 128 × 512 bytes
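The bound used by check (C) can be computed from two header fields. The sketch below assumes the [MS-CFB] header layout (sector shift at byte offset 30, number of FAT sectors at byte offset 44) and the slide's model of a 512-byte header; the function names are hypothetical. For 512-byte sectors it reduces to the slide's formula, FAT sectors × 128 × 512; using `sector_size // 4` entries per FAT sector for other sector sizes is a generalization not stated on the slide.

```python
import struct

CFB_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
HEADER_SIZE = 512  # slide's model: the header occupies the first 512 bytes


def fat_addressable_bytes(header: bytes) -> int:
    """Size of the area the FAT can refer to, computed from the CFB header."""
    if len(header) < HEADER_SIZE or header[:8] != CFB_MAGIC:
        raise ValueError("not a CFB header")
    (sector_shift,) = struct.unpack_from("<H", header, 30)
    (num_fat_sectors,) = struct.unpack_from("<I", header, 44)
    sector_size = 1 << sector_shift            # 512 when the shift is 9
    entries_per_fat_sector = sector_size // 4  # 128 for 512-byte sectors
    return num_fat_sectors * entries_per_fat_sector * sector_size


def exceeds_fat_area(header: bytes, file_size: int) -> bool:
    """True if data lies beyond the sectors that the FAT has entries for."""
    return file_size - HEADER_SIZE > fat_addressable_bytes(header)


# Hand-built header for illustration: 512-byte sectors, one FAT sector.
demo = bytearray(HEADER_SIZE)
demo[:8] = CFB_MAGIC
struct.pack_into("<H", demo, 30, 9)
struct.pack_into("<I", demo, 44, 1)
demo_header = bytes(demo)
```

With one FAT sector the addressable area is 128 × 512 = 65,536 bytes, so any file longer than 512 + 65,536 bytes would be flagged.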
(D) Free sector in the last sector
The sector corresponding to the end of the file (the n-th sector) is a free sector: the malware sector (index 6) has the FAT entry -1.
When the sector size is 512, n = (file size - 512) / 512.
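Check (D) can be illustrated on the simplified FAT from the slides. The sketch assumes the FAT has already been read into a list of integers and that sector indices are zero-based, so the n-th sector has index n - 1; both the function name and that indexing convention are assumptions made for this example.

```python
FREESECT = -1      # FAT value for a free sector, as on the slides
HEADER_SIZE = 512


def last_sector_is_free(fat, file_size, sector_size=512):
    """True if the sector at the end of the file is marked free in the FAT."""
    n = (file_size - HEADER_SIZE) // sector_size  # number of sectors in the file
    last_index = n - 1                            # assumption: zero-based indices
    if last_index < 0 or last_index >= len(fat):
        return False  # no FAT entry at all; that case belongs to check (C)
    return fat[last_index] == FREESECT


normal_fat = [-2, -2, 3, -2, -1, -2]         # FAT from "Structure of CFB files"
infected_fat = [-2, -2, 3, -2, -1, -2, -1]   # malware in sector 6, marked free
print(last_sector_is_free(normal_fat, 512 + 6 * 512),
      last_sector_is_free(infected_fat, 512 + 7 * 512))
```

A regular writer has no reason to end the file with a free sector, which is why a free entry in that position is suspicious.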
(E) Unaccounted-for sector
There is a sector which cannot be classified as FAT (DIFAT and mini FAT are included), directory entry (DE), stream, or free sector.
In the example, the malware sector (index 6) has the FAT entry -2 but belongs to none of these.
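Check (E) amounts to marking every sector that something accounts for and reporting the rest. The sketch below works on the slide's simplified model: `structural` holds the indices of FAT and directory sectors, and `stream_starts` holds the start index of each stream taken from the directory entries. All names are hypothetical, and a real implementation would also follow the DIFAT, mini FAT, and directory chains.

```python
ENDOFCHAIN = -2
FREESECT = -1


def unaccounted_sectors(fat, structural, stream_starts):
    """Indices of sectors that are neither structural, stream, nor free."""
    accounted = set(structural)
    for start in stream_starts:
        sector = start
        # Follow the chain; stop at end-of-chain, out-of-range, or a revisit.
        while 0 <= sector < len(fat) and sector not in accounted:
            accounted.add(sector)
            sector = fat[sector]
    return [i for i, entry in enumerate(fat)
            if entry != FREESECT and i not in accounted]


# Slide example: a.txt starts at sector 2, b.txt at sector 5.
normal_fat = [-2, -2, 3, -2, -1, -2]
infected_fat = [-2, -2, 3, -2, -1, -2, -2]   # sector 6 holds the malware
print(unaccounted_sectors(infected_fat, {0, 1}, [2, 5]))
```

Sector 6 is reported because its FAT entry says "in use" while no directory entry leads to it.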
Structure of PDF: Physical structure
A PDF file is an aggregate of many objects (numeric, string, a sequence of bytes, etc.).
A PDF document consists of 4 elements:
• Comment (Header)
• Body: a sequence of indirect objects (fonts, pages, and sampled images), e.g. 1 0 obj, 2 0 obj, ..., n 0 obj
• Cross-reference table
• Trailer, followed by Comment (EOF), the end-of-file marker
Example of an object: x 0 obj <</R2 /P-64 /V 2 /O (dfhjaklgk… …>>
Structure of PDF:Types of objects
• Basic objects: (A) Numeric, (B) String, (C) Name, (D) Boolean, (E) Null
• Composite objects: (F) Array, (G) Dictionary
• Others: (H) Stream (a sequence of bytes), (I) Indirect (referring to other objects)
(PDF32000-1:2008 7.3)
Structure of PDF:Stream filter
Stream filters indicate how to decode stream data. The standard filters are summarized in the following table (PDF32000-1:2008 7.4).
Filter name: Description
• /ASCIIHexDecode: Decodes data encoded in an ASCII hexadecimal representation, reproducing the original binary data.
• /ASCII85Decode: Decodes data encoded in an ASCII base-85 representation, reproducing the original binary data.
• /LZWDecode: Decompresses data encoded using the LZW adaptive compression method, reproducing the original text or binary data.
• /FlateDecode: Decompresses data encoded using the zlib/deflate compression method, reproducing the original text or binary data.
• /RunLengthDecode: Decompresses data encoded using a byte-oriented run-length encoding algorithm, reproducing the original text or binary data.
• /CCITTFaxDecode: Decompresses data encoded using the CCITT facsimile standard, reproducing the original data.
• /JBIG2Decode: Decompresses data encoded using the JBIG2 standard, reproducing the original monochrome image data.
• /DCTDecode: Decompresses data encoded using a DCT technique based on the JPEG standard, reproducing image sample data that approximates the original data.
• /JPXDecode: Decompresses data encoded using the wavelet-based JPEG2000 standard, reproducing the original image data.
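Three of the filters in the table map directly onto Python's standard library, which makes them easy to try out. The sketch below is an illustration only: `decode_stream` is a hypothetical helper, it handles a single filter without decode parameters, and the remaining filters are deliberately left unimplemented.

```python
import base64
import binascii
import zlib


def decode_stream(data: bytes, filter_name: str) -> bytes:
    """Decode a PDF stream body for a few standard filters (sketch)."""
    if filter_name == "/ASCIIHexDecode":
        # Data ends at '>'; whitespace is ignored; an odd digit count is
        # padded with a trailing 0.
        digits = b"".join(data.split(b">")[0].split())
        if len(digits) % 2:
            digits += b"0"
        return binascii.unhexlify(digits)
    if filter_name == "/ASCII85Decode":
        body = data.strip()
        if body.endswith(b"~>"):   # strip the end-of-data marker
            body = body[:-2]
        return base64.a85decode(body)
    if filter_name == "/FlateDecode":
        return zlib.decompress(data)
    raise NotImplementedError(filter_name)


print(decode_stream(b"48 65 6C6C 6F>", "/ASCIIHexDecode"))
```

Image-oriented filters such as /DCTDecode or /JBIG2Decode need dedicated decoders and are outside the scope of this sketch.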
Structure of PDF:Document structure
Structure of a PDF document (objects connected by links):
• Trailer → Document information, Document catalog
• Document catalog → Outline hierarchy, Page tree
• Page tree → Page, Page
• Page → Content stream, Annotations
• Page → Content stream, Thumbnail image
By following the links from the trailer, all objects can be referred to.
Structure of PDF:Encryption
Structure of an encrypted PDF document
Encryption applies to almost all strings and streams in the PDF file. Leaving the other object types unencrypted allows random access to the objects within a document (except for objects stored in an ObjStm).
Structure of PDF:ObjStm (Object Streams)
ObjStm (object streams) were introduced in PDF 1.5. The purpose of an ObjStm is to allow indirect objects other than streams to be stored more compactly by using the facilities provided by stream compression filters (PDF32000-1:2008 7.5.7).
Packing → Compressing → Encryption
(F) Unaccounted-for section
PDF document: Comment (Header), Body, Cross-reference table, Trailer, Comment (EOF), with an executable file inserted.
When the contents are classified into the four elements from the head of the file, there is data which cannot be classified.
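The classification pass of check (F) can be approximated with a few regular expressions: walk the file from the head, label every range that looks like a comment, object, cross-reference table, trailer, or startxref, and report whatever is left as unknown. This is a rough sketch and not o-checker's parser; `PATTERNS` and `classify` are hypothetical names, and binary data that happens to contain `%` or `endobj` will confuse it.

```python
import re

# (label, pattern) pairs tried in order at each position.
PATTERNS = [
    ("obj", re.compile(rb"\d+\s+\d+\s+obj\b.*?\bendobj", re.S)),
    ("xref", re.compile(rb"\bxref\b.*?(?=\btrailer\b)", re.S)),
    ("trailer", re.compile(rb"\btrailer\b.*?(?=startxref)", re.S)),
    ("startxref", re.compile(rb"startxref\s+\d+")),
    ("comment", re.compile(rb"%[^\r\n]*")),
]


def classify(data: bytes):
    """Return (start, end, label) ranges; unclassifiable bytes are 'unknown'."""
    ranges = []
    pos = 0
    unknown_start = None
    while pos < len(data):
        if data[pos:pos + 1].isspace():
            pos += 1
            continue
        for label, pattern in PATTERNS:
            m = pattern.match(data, pos)
            if m:
                if unknown_start is not None:
                    ranges.append((unknown_start, pos, "unknown"))
                    unknown_start = None
                ranges.append((pos, m.end(), label))
                pos = m.end()
                break
        else:
            if unknown_start is None:
                unknown_start = pos
            pos += 1
    if unknown_start is not None:
        ranges.append((unknown_start, len(data), "unknown"))
    return ranges


sample_pdf = (b"%PDF-1.4\n"
              b"1 0 obj\n<< /Type /Catalog >>\nendobj\n"
              b"xref\n0 2\n0000000000 65535 f \n0000000009 00000 n \n"
              b"trailer\n<< /Size 2 /Root 1 0 R >>\n"
              b"startxref\n45\n"
              b"%%EOF\n")
```

Appending bytes such as `b"MZ\x90\x00"` to `sample_pdf` yields a final range labelled `unknown`, which is the condition check (F) looks for.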
(G) Unreferenced object
A PDF file embedded with an executable file:
When an executable file is inserted as an object in disregard of the document structure, it is often unreferenced: following the links from the trailer (document information, document catalog, outline hierarchy, page tree, pages, content streams, annotations, thumbnail images) does not reach the object embedded with the executable file.
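Check (G) is a reachability problem: start from the references in the trailer, follow every `n g R` reference, and list the objects that were never reached. The sketch below is a simplified illustration with hypothetical names; it identifies objects by object number only (the generation number is ignored) and does not look inside ObjStm or cross-reference streams.

```python
import re

OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.S)
REF_RE = re.compile(rb"(\d+)\s+(\d+)\s+R\b")
TRAILER_RE = re.compile(rb"\btrailer\b(.*?)startxref", re.S)


def unreferenced_objects(data: bytes):
    """Object numbers that cannot be reached by following links from the trailer."""
    bodies = {int(m.group(1)): m.group(3) for m in OBJ_RE.finditer(data)}
    trailer = TRAILER_RE.search(data)
    queue = [int(n) for n, _ in REF_RE.findall(trailer.group(1))] if trailer else []
    seen = set()
    while queue:
        num = queue.pop()
        if num in seen or num not in bodies:
            continue
        seen.add(num)
        queue.extend(int(n) for n, _ in REF_RE.findall(bodies[num]))
    return sorted(set(bodies) - seen)


sample_pdf = (b"%PDF-1.4\n"
              b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
              b"2 0 obj\n<< /Type /Pages /Kids [3 0 R] /Count 1 >>\nendobj\n"
              b"3 0 obj\n<< /Type /Page /Parent 2 0 R >>\nendobj\n"
              b"9 0 obj\n<< /Length 4 >>\nstream\nMZ..\nendstream\nendobj\n"
              b"trailer\n<< /Size 10 /Root 1 0 R >>\nstartxref\n0\n%%EOF\n")
print(unreferenced_objects(sample_pdf))
```

In the sample, object 9 carries the payload and nothing links to it, so it is the only object reported.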
(H) Camouflaged Stream
An executable file is hidden in a stream in one of two ways:
• Putting it at the end of a stream: a regular stream is followed by the end-of-data (EOD) marker. The data before the marker is used for decoding (decoding is successful); the executable file placed after the marker is data which is not used for decoding.
• Camouflaged filter: the attacker disguises the object as using a filter whose entropy is similar to that of executable files (decoding goes wrong).
This item applies when the filter is FlateDecode, DCTDecode, or JBIG2Decode.
Entropy:
• Plain text: small
• FlateDecode: big
• Executable file: big
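For FlateDecode, both cases of check (H) can be observed with Python's `zlib` module: data placed after the end of the compressed stream shows up as `unused_data`, and a stream that merely claims to be Flate fails to decode. The sketch below covers only FlateDecode and uses hypothetical function names; DCTDecode and JBIG2Decode would need their own end-of-data logic. A Shannon entropy helper is included to illustrate the small/big comparison on the slide.

```python
import math
import zlib
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte (0.0 for constant data, 8.0 at most)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum(c / total * math.log2(c / total)
                for c in Counter(data).values())


def inspect_flate_stream(raw: bytes):
    """Return (verdict, leftover) for a stream declared as /FlateDecode."""
    decoder = zlib.decompressobj()
    try:
        decoder.decompress(raw)
    except zlib.error:
        return ("decode-error", b"")       # camouflaged filter: not Flate data
    if not decoder.eof:
        return ("truncated", b"")          # compressed data ends too early
    leftover = decoder.unused_data.strip(b"\r\n ")
    if leftover:
        return ("trailing-data", leftover)  # data not used for decoding
    return ("ok", b"")


good = zlib.compress(b"hello " * 100)
print(inspect_flate_stream(good)[0],
      inspect_flate_stream(good + b"MZ\x90\x00\x03")[0],
      inspect_flate_stream(b"MZ\x90\x00 not zlib")[0])
```

Entropy alone cannot separate a compressed stream from an executable file, since both are "big", which is why the sketch relies on the decoder's behaviour instead.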
Experiment
• Malicious documents: document files used for targeted email attacks from 2009 to 2012 ※1
• Clean documents: clean document files classified according to contagio, a malware dump site ※2
                       Malicious documents        Clean documents
File type  Extension   Quantity  Avg. size (KB)   Quantity  Avg. size (KB)
Rich Text  rtf         98        266.5            199       516.2
CFB        doc         36        252.2            1,195     106.1
CFB        xls         49        180.4            298       191.7
CFB        jtd/jtdc    17        268.5            -         -
PDF        pdf         164       351.2            9,109     101.7
Total                  364       291.8            10,801    322.7
※1: Rich Text files whose extension was camouflaged as doc are counted as rtf.
※2: http://contagiodump.blogspot.jp/2013/03/16800-clean-and-11960-malicious-files.html
Detection rate of o-checker
Detection rate by inspection item:
File type  Inspection item                     Detection rate
Rich Text  (A) Attached data after EOF         99.0%
CFB        (B) Anomaly file size               77.5%
CFB        (C) Data not referred from FAT      90.2%
CFB        (D) Free sector in the last sector  97.1%
CFB        (E) Unaccounted-for sector          96.1%
PDF        (F) Unaccounted-for section         49.4%
PDF        (G) Unreferenced object             43.9%
PDF        (H) Camouflaged stream              63.4%
Detection rate of o-checker by file type:
Rich Text  99.0%
CFB        98.0%
PDF        99.4%
5. DEMO
Output of o-checker
C:\tmp>pdfanalysis.py a.pdf
00000000-00000008:comment,
00000009-0000000F:comment,
00000010-00000110:obj 25 0 old(not used)
00000111-00000197:obj 26 0 old(not used)
:
00003622-000036B0:trailer
000036B1-000036C2:startxref 00003617
000036C3-000036C9:comment,
000036CA-0000E9E2:unknown
0000E9E3-0000E9E9:comment,
0000E9EA-0000EAEA:obj 25 0 ObjStm [7, 8, 13]
:
0001209D-000120A3:comment,
000120A4-000120A7:unknown
FFFFFFFF-FFFFFFFF:obj 7 0 xref from None
FFFFFFFF-FFFFFFFF:obj 8 0 xref from None
:
Each line shows an offset address range followed by its classification result.
Slide annotations: Decoy Document, ObjStm
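The report shown on this slide is line-oriented, so it is easy to post-process. The sketch below parses lines of the form `START-END:label` and picks out the ranges classified as unknown. The format is inferred only from the output shown on the slide, and `parse_report` and `unknown_ranges` are hypothetical helpers, not part of o-checker.

```python
import re

# Eight hex digits, '-', eight hex digits, ':', then the classification text.
LINE_RE = re.compile(r"^([0-9A-F]{8})-([0-9A-F]{8}):(.*)$")


def parse_report(text: str):
    """Parse report lines into (start, end, label) tuples; other lines are skipped."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            label = m.group(3).rstrip(",").strip()
            rows.append((int(m.group(1), 16), int(m.group(2), 16), label))
    return rows


def unknown_ranges(rows):
    """Offset ranges whose classification result is 'unknown'."""
    return [(start, end) for start, end, label in rows if label == "unknown"]


report = """\
000036C3-000036C9:comment,
000036CA-0000E9E2:unknown
0000E9E3-0000E9E9:comment,
0000E9EA-0000EAEA:obj 25 0 ObjStm [7, 8, 13]
:
000120A4-000120A7:unknown
"""
print(unknown_ranges(parse_report(report)))
```

The lone `:` lines, which mark omitted output on the slide, do not match the pattern and are skipped.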
Judgment option
C:\tmp>pdfanalysis.py a.pdf -j
00000000-00000008:comment,
00000009-0000000F:comment,
00000010-00000110:obj 25 0 old(not used)
00000111-00000197:obj 26 0 old(not used)
:
:
0001209D-000120A3:comment,
FFFFFFFF-FFFFFFFF:obj 7 0 xref from None
FFFFFFFF-FFFFFFFF:obj 8 0 xref from None
:
Malicious!
"-j" is a judgment option. One of the three judgment types, "Malicious!", "Suspicious!", or "None!", is shown at the end of the output.
DEMO 1
6. APPLICATION
Application to NIDS
NIDS pipeline: Network cable → Packet capture → Reconstruct e-mails → o-checker → Alert
o-checker can be introduced into an existing system without updating.
Problems of the application
NIDS pipeline: Network cable → Packet capture → Reconstruct e-mails → o-checker → Alert
• Failure to recover e-mails: ~2%
• Broken document files: ~2%
• False positives: up ~2%
False positives increase because of the performance of the e-mail recovery software.
Enhanced o-checker
NIDS pipeline: Network cable → Packet capture → Reconstruct e-mails → new o-checker → Alert
The new o-checker deselects broken document files based on structural analysis of file formats.
• Failure to recover e-mails: ~2%
• Broken document files: ~2%
• False positives: up ~0%
Application to Android (1)
Components: mail server, o-checker
• Manual check
• Manual delete
DEMO 2
Application to Android (2)
Components: mail server, e-mailer, o-checker
• Auto check
• Auto delete
7. CONCLUSION
Conclusion
• Traditional detection methods have reached their limit.
• Structural analysis of file formats is effective for detecting malicious document files that have embedded executable files.
• Various applications of o-checker are possible, because it can detect malicious documents with high probability at high speed.
Thank you!