Embracing the new threat: towards automatically, self ... · – Fast – Low false positive rate...
Transcript of Embracing the new threat: towards automatically, self ... · – Fast – Low false positive rate...
![Page 1: Embracing the new threat: towards automatically, self ... · – Fast – Low false positive rate Disadvantages – Precision limited to known-bad samples. Static analysis-based detection](https://reader033.fdocuments.in/reader033/viewer/2022042812/5fab37c787ac237f8e35ec59/html5/thumbnails/1.jpg)
Embracing the new threat: towards automatically, self-diversifying malware
Mathias Payer <[email protected]>UC Berkeley and (soon) Purdue University
Image (c) http://ucrtoday.ucr.edu/9768/assassin-bugs
![Page 2: Embracing the new threat: towards automatically, self ... · – Fast – Low false positive rate Disadvantages – Precision limited to known-bad samples. Static analysis-based detection](https://reader033.fdocuments.in/reader033/viewer/2022042812/5fab37c787ac237f8e35ec59/html5/thumbnails/2.jpg)
Malware landscape is changing
Image (c) Wikimedia
![Page 3: Embracing the new threat: towards automatically, self ... · – Fast – Low false positive rate Disadvantages – Precision limited to known-bad samples. Static analysis-based detection](https://reader033.fdocuments.in/reader033/viewer/2022042812/5fab37c787ac237f8e35ec59/html5/thumbnails/3.jpg)
The ongoing malware arms race
Generate newmalwareinstance
Attack a bunchof targets
AV vendorgets first sample
Malwareanalysis
Signaturesupdated
![Page 4: Embracing the new threat: towards automatically, self ... · – Fast – Low false positive rate Disadvantages – Precision limited to known-bad samples. Static analysis-based detection](https://reader033.fdocuments.in/reader033/viewer/2022042812/5fab37c787ac237f8e35ec59/html5/thumbnails/4.jpg)
Defense limitations
● Newly diversified samples are not detected– Basically a “new” attack
● New malware spreads fast– Time lag between analysis and updated signatures
● Can we automate this process?
![Page 5: Embracing the new threat: towards automatically, self ... · – Fast – Low false positive rate Disadvantages – Precision limited to known-bad samples. Static analysis-based detection](https://reader033.fdocuments.in/reader033/viewer/2022042812/5fab37c787ac237f8e35ec59/html5/thumbnails/5.jpg)
Fully automatic diversity
*.cpp Compiler
Malware Malware Malware
?
![Page 6: Embracing the new threat: towards automatically, self ... · – Fast – Low false positive rate Disadvantages – Precision limited to known-bad samples. Static analysis-based detection](https://reader033.fdocuments.in/reader033/viewer/2022042812/5fab37c787ac237f8e35ec59/html5/thumbnails/6.jpg)
Outline
State of the art:Malware detection
A new threat:Malware diversification
Possible mitigation:Better security practices
![Page 7: Embracing the new threat: towards automatically, self ... · – Fast – Low false positive rate Disadvantages – Precision limited to known-bad samples. Static analysis-based detection](https://reader033.fdocuments.in/reader033/viewer/2022042812/5fab37c787ac237f8e35ec59/html5/thumbnails/7.jpg)
State of the art: Malware detection
Image (c) Wikimedia
![Page 8: Embracing the new threat: towards automatically, self ... · – Fast – Low false positive rate Disadvantages – Precision limited to known-bad samples. Static analysis-based detection](https://reader033.fdocuments.in/reader033/viewer/2022042812/5fab37c787ac237f8e35ec59/html5/thumbnails/8.jpg)
Malware detection is limited
● Performance– Don't slow down a user's machine (too much)
● Precision– Behavioral, generic matching
● Latency– Time lag between spread and protection
![Page 9: Embracing the new threat: towards automatically, self ... · – Fast – Low false positive rate Disadvantages – Precision limited to known-bad samples. Static analysis-based detection](https://reader033.fdocuments.in/reader033/viewer/2022042812/5fab37c787ac237f8e35ec59/html5/thumbnails/9.jpg)
Detection mechanisms
Image (c) Wikimedia
![Page 10: Embracing the new threat: towards automatically, self ... · – Fast – Low false positive rate Disadvantages – Precision limited to known-bad samples. Static analysis-based detection](https://reader033.fdocuments.in/reader033/viewer/2022042812/5fab37c787ac237f8e35ec59/html5/thumbnails/10.jpg)
Signature-based detection
● Compare against database of known-bad– Extract pattern
– Match sequence of bytes or regular expression
● Advantages– Fast
– Low false positive rate
● Disadvantages– Precision limited to known-bad samples
![Page 11: Embracing the new threat: towards automatically, self ... · – Fast – Low false positive rate Disadvantages – Precision limited to known-bad samples. Static analysis-based detection](https://reader033.fdocuments.in/reader033/viewer/2022042812/5fab37c787ac237f8e35ec59/html5/thumbnails/11.jpg)
Static analysis-based detection
● Search potentially bad patterns– API calls
– System calls
● Advantages– Low overhead
● Disadvantages– False positives
– Based on well-known heuristics
![Page 12: Embracing the new threat: towards automatically, self ... · – Fast – Low false positive rate Disadvantages – Precision limited to known-bad samples. Static analysis-based detection](https://reader033.fdocuments.in/reader033/viewer/2022042812/5fab37c787ac237f8e35ec59/html5/thumbnails/12.jpg)
Behavioral-based detection
● Execute “file” in a virtual machine– Detect modifications
● Advantages– Most precise
● Disadvantages– High overhead
– Precision limited due to emulation detection
![Page 13: Embracing the new threat: towards automatically, self ... · – Fast – Low false positive rate Disadvantages – Precision limited to known-bad samples. Static analysis-based detection](https://reader033.fdocuments.in/reader033/viewer/2022042812/5fab37c787ac237f8e35ec59/html5/thumbnails/13.jpg)
Summary: Malware protection
● Arms race due to manual diversification– Signature-based techniques loose effectiveness
● Cope with limited resources– On the target machine, for the analysis, and to push
new signatures/heuristics
● No perfect solution– Either false positives and/or negatives or huge
performance impact
![Page 14: Embracing the new threat: towards automatically, self ... · – Fast – Low false positive rate Disadvantages – Precision limited to known-bad samples. Static analysis-based detection](https://reader033.fdocuments.in/reader033/viewer/2022042812/5fab37c787ac237f8e35ec59/html5/thumbnails/14.jpg)
New threat: Malware diversification
Image (c) Wikimedia
![Page 15: Embracing the new threat: towards automatically, self ... · – Fast – Low false positive rate Disadvantages – Precision limited to known-bad samples. Static analysis-based detection](https://reader033.fdocuments.in/reader033/viewer/2022042812/5fab37c787ac237f8e35ec59/html5/thumbnails/15.jpg)
Software diversification
*.cpp Compiler
Program Program Program
?
![Page 16: Embracing the new threat: towards automatically, self ... · – Fast – Low false positive rate Disadvantages – Precision limited to known-bad samples. Static analysis-based detection](https://reader033.fdocuments.in/reader033/viewer/2022042812/5fab37c787ac237f8e35ec59/html5/thumbnails/16.jpg)
C/C++ liberties
● Data layout changes– Data structure layout on stack
– Layout for heap objects (limited for structs)
● Code changes– Register allocation (shuffle or starve)
– Instruction selection
– Basic block splitting, merging, shuffling
![Page 17: Embracing the new threat: towards automatically, self ... · – Fast – Low false positive rate Disadvantages – Precision limited to known-bad samples. Static analysis-based detection](https://reader033.fdocuments.in/reader033/viewer/2022042812/5fab37c787ac237f8e35ec59/html5/thumbnails/17.jpg)
Malware diversification
● Generate unique binaries– Minimize common substrings (code or data)
– Performance overhead not an issue
● Diversify code and data layout● Diversify static data as well
![Page 18: Embracing the new threat: towards automatically, self ... · – Fast – Low false positive rate Disadvantages – Precision limited to known-bad samples. Static analysis-based detection](https://reader033.fdocuments.in/reader033/viewer/2022042812/5fab37c787ac237f8e35ec59/html5/thumbnails/18.jpg)
Implementation
● Prototype built on LLVM 3.4– Small changes in code generator, code layouter,
register allocator, stack frame layouter, some data obfuscation passes
● Input: LLVM bitcode● Output: diversified binary
● Source: http://github.com/gannimo/MalDiv
![Page 19: Embracing the new threat: towards automatically, self ... · – Fast – Low false positive rate Disadvantages – Precision limited to known-bad samples. Static analysis-based detection](https://reader033.fdocuments.in/reader033/viewer/2022042812/5fab37c787ac237f8e35ec59/html5/thumbnails/19.jpg)
Similarity limitations
12
34
56
78
910
1112
1314
1516
1718
1920
2122
2324
2526
2728
2930
3132
3334
3536
3738
3940
4142
4344
4546
4748
4950
5152
5354
5556
57
1
10
100
1000
10000
100000
1000000Common subsequences in diversified binaries
400.perlbench401.bzip2429.mcf433.milc444.namd445.gobmk450.soplex453.povray456.hmmer458.sjeng462.libquantum464.h264ref470.lbm471.omnetpp473.astar482.sphinxperlbench vs. bzip2perlbench vs. gobmksoplex vs. omnetppnmapsimple port scanner
Lenght of subsequence
Num
ber
of s
ubse
quen
ces
(log
sca
le)
![Page 20: Embracing the new threat: towards automatically, self ... · – Fast – Low false positive rate Disadvantages – Precision limited to known-bad samples. Static analysis-based detection](https://reader033.fdocuments.in/reader033/viewer/2022042812/5fab37c787ac237f8e35ec59/html5/thumbnails/20.jpg)
Demo
● Simple hello world– Let's see how far we can push this!
#include <stdio.h, string.h>
const char foo[] = "foobar";char bar[7];
int main(int argc, char* argv[]) {strcpy(bar, "barfoo");printf("Hello World %s %s\n", foo, bar);printf("Arguments: %d, executable: %s\n",
argc, argv[0]);return 0;
}
![Page 21: Embracing the new threat: towards automatically, self ... · – Fast – Low false positive rate Disadvantages – Precision limited to known-bad samples. Static analysis-based detection](https://reader033.fdocuments.in/reader033/viewer/2022042812/5fab37c787ac237f8e35ec59/html5/thumbnails/21.jpg)
Scenario 1: malware generator
*.cpp Compiler
Malware Malware Malware
?
![Page 22: Embracing the new threat: towards automatically, self ... · – Fast – Low false positive rate Disadvantages – Precision limited to known-bad samples. Static analysis-based detection](https://reader033.fdocuments.in/reader033/viewer/2022042812/5fab37c787ac237f8e35ec59/html5/thumbnails/22.jpg)
Scenario 2: self-diversifying MW
LLVM Opt
Malware bc
LLVM bc
Malware
LLVM Opt*
Malware*
Malware bc*
LLVM bc*
LLVM Opt*
Malware*
Malware bc*
LLVM bc*
LLVM Opt*
Malware*
Malware bc*
LLVM bc*
![Page 23: Embracing the new threat: towards automatically, self ... · – Fast – Low false positive rate Disadvantages – Precision limited to known-bad samples. Static analysis-based detection](https://reader033.fdocuments.in/reader033/viewer/2022042812/5fab37c787ac237f8e35ec59/html5/thumbnails/23.jpg)
Possible mitigation:Better security practices
Image (c) Wikimedia
![Page 24: Embracing the new threat: towards automatically, self ... · – Fast – Low false positive rate Disadvantages – Precision limited to known-bad samples. Static analysis-based detection](https://reader033.fdocuments.in/reader033/viewer/2022042812/5fab37c787ac237f8e35ec59/html5/thumbnails/24.jpg)
Mitigation
● Recover high-level semantics from code– Hard (and results in an arms race)
● Full behavioral analysis– Harder
● Prohibit initial intrusion– Fix broken software & educate users
– Hardest
![Page 25: Embracing the new threat: towards automatically, self ... · – Fast – Low false positive rate Disadvantages – Precision limited to known-bad samples. Static analysis-based detection](https://reader033.fdocuments.in/reader033/viewer/2022042812/5fab37c787ac237f8e35ec59/html5/thumbnails/25.jpg)
Conclusion
Image (c) Wikimedia
![Page 26: Embracing the new threat: towards automatically, self ... · – Fast – Low false positive rate Disadvantages – Precision limited to known-bad samples. Static analysis-based detection](https://reader033.fdocuments.in/reader033/viewer/2022042812/5fab37c787ac237f8e35ec59/html5/thumbnails/26.jpg)
Conclusion
● Diversity evades malware detection– Fully automatic, built into compiler
– No need for packers anymore
● Adopts to new similarity metrics● New arms race between defenders and
compiler writers● Don't rely on simple, static similarity!
![Page 27: Embracing the new threat: towards automatically, self ... · – Fast – Low false positive rate Disadvantages – Precision limited to known-bad samples. Static analysis-based detection](https://reader033.fdocuments.in/reader033/viewer/2022042812/5fab37c787ac237f8e35ec59/html5/thumbnails/27.jpg)
Questions?
Mathias Payer <[email protected]>Project: https://github.com/gannimo/MalDiv Homepage: https://nebelwelt.net