Detecting Software Theft via System Call Based Birthmarks Xinran Wang, Yoon-Chan Jhi, Sencun Zhu,...
Detecting Software Theft via System Call Based Birthmarks
Xinran Wang, Yoon-Chan Jhi, Sencun Zhu, Peng Liu
ACSAC 2009
OUTLINE
Introduction and Related Work
System Call Based Birthmarks
System Design and Implementation
Evaluation
Discussion and Conclusion
Software Theft (or plagiarism)
Reuse someone else’s code
◦Even only a small part of the original program
Obfuscation techniques
◦Different compilers
◦Different compiler optimization levels
◦SandMark
Defender
Software watermark
◦Theoretically, any watermark can be removed
Software birthmark
◦A unique characteristic that a program inherently possesses
Defender (Cont.)
Requirements
◦R1: Resiliency to obfuscation techniques
◦R2: Capability to detect theft of components
◦R3: Scalability to large-scale programs
◦R4: Applicability to binary executables
◦R5: Independence of platforms
Related Work
Software Birthmark
◦Static source code based birthmark
◦Static executable code based birthmark
◦Dynamic whole program path (WPP) based birthmark
◦Dynamic API based birthmark
Clone Detection
◦String-based, AST-based, Token-based and PDG-based
Cannot satisfy all requirements
System Call Based Birthmarks
Behavior based birthmarks
◦Unique behaviors in features and implementation details
SCSSB (System Call Short Sequence Birthmark)
IDSCSB (Input Dependant System Call Subsequence Birthmark)
SCSSB (System Call Short Sequence Birthmark)
Definition 1: (System Call Trace)
Definition 2: (System Call Sequence Set)
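The two definitions above can be illustrated with a short sketch. It assumes that a system call trace is the ordered list of system call names observed in one run, and that the sequence set S(p, I, k) is the set of all contiguous windows of length k in that trace. Both readings are assumptions made for illustration, and the function name and the sample trace are hypothetical.

```python
from typing import FrozenSet, Sequence, Tuple


def sequence_set(trace: Sequence[str], k: int) -> FrozenSet[Tuple[str, ...]]:
    """Return every contiguous length-k window of a system call trace.

    Assumption: this sliding-window reading stands in for S(p, I, k).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # range() is empty when the trace is shorter than k, giving an empty set.
    return frozenset(tuple(trace[i:i + k]) for i in range(len(trace) - k + 1))


# Hypothetical trace, for illustration only.
trace = ["open", "read", "read", "write", "close"]
windows = sequence_set(trace, 3)
```

Storing the windows as a set discards ordering and repetition across the run, which is why only short local orderings (length k) are compared in this sketch.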
SCSSB (System Call Short Sequence Birthmark)
Definition 3: (SCSSB: System Call Short Sequence Birthmark)
SCSSB(p, I, k) is a subset of set S(p, I, k) that satisfies
SCSSB (System Call Short Sequence Birthmark)
Definition 4: (Containment) The containment of A in B is defined as:
Here A is the birthmark of a plaintiff program or its component, and B is the birthmark of a suspect program.
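A minimal sketch of one way to score containment, assuming it is the fraction of A's elements that also appear in B, that is |A ∩ B| / |A|. This formula is an assumption chosen for illustration, and the function name and sample birthmarks are hypothetical.

```python
from typing import AbstractSet, Hashable


def containment(a: AbstractSet[Hashable], b: AbstractSet[Hashable]) -> float:
    """Fraction of the plaintiff birthmark `a` found in the suspect birthmark `b`.

    Assumption: containment(A, B) = |A ∩ B| / |A|.
    """
    if not a:
        raise ValueError("the plaintiff birthmark must not be empty")
    return len(a & b) / len(a)


# Hypothetical birthmarks made of length-2 system call sequences.
plaintiff = {("open", "read"), ("read", "write"), ("write", "close")}
suspect = {("open", "read"), ("read", "write"), ("read", "read"), ("stat", "open")}
score = containment(plaintiff, suspect)  # two of the three plaintiff sequences match
```

Under this reading the score is asymmetric: extra behavior in the suspect program does not lower it, which fits the case where only a component of the plaintiff program was reused.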
System Design and Implementation
System Call Tracer
System Call Abstraction
Birthmark Generator
Input Dependant System Call Subsequence Birthmarks
System Call Tracer
The simplest way
◦strace
With thread identifier
◦SATracer based on Valgrind
Prepare a list of all subroutines of the component in SATracer
◦The list is automatically generated by Elsa
SATracer checks the execution stack of the running thread when a system call is called
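For the strace route, a trace has to be turned into records before any birthmark can be computed. The sketch below parses completed-call lines as strace prints them when following child processes, where each line may carry a "[pid N]" prefix or a bare leading PID. It does not model SATracer or its stack check. The record type and function name are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class SyscallRecord(NamedTuple):
    tid: Optional[int]  # thread/process id prefix, if strace printed one
    name: str           # system call name
    args: str           # raw argument text, left unparsed
    result: str         # everything after "=", e.g. "3" or "-1 ENOENT (...)"


LINE_RE = re.compile(
    r"^(?:\[pid\s+(?P<tid1>\d+)\]\s+|(?P<tid2>\d+)\s+)?"
    r"(?P<name>\w+)\((?P<args>.*)\)\s+=\s+(?P<result>.+)$"
)


def parse_line(line: str) -> Optional[SyscallRecord]:
    """Parse one completed system call line; return None for anything else.

    Signal notices, exit notices, and unfinished/resumed fragments do not
    match the pattern and are skipped in this sketch.
    """
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    tid = m.group("tid1") or m.group("tid2")
    return SyscallRecord(
        int(tid) if tid else None,
        m.group("name"),
        m.group("args"),
        m.group("result"),
    )
```

Keeping the thread identifier on each record makes it possible to split an interleaved trace into one sequence per thread before building sequences.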
System Call Abstraction
Ignore the system calls that do not represent the behavior characteristic
◦brk, mmap
Consider aliases or multiple versions of a system call as the same
◦Ex: fstat(int fd, struct stat *sb) and stat(const char *path, struct stat *sb)
Ignore failed system calls
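The three abstraction rules above can be sketched as a single filtering pass. The ignore list and alias table below contain only the examples named on the slide; the real lists are presumably longer. Treating a negative return value as failure is a simplifying assumption, and the function name is hypothetical.

```python
from typing import Iterable, List, Tuple

# Only the examples given on the slide; treat both tables as incomplete.
IGNORED = {"brk", "mmap"}
ALIASES = {"fstat": "stat"}


def abstract_trace(calls: Iterable[Tuple[str, int]]) -> List[str]:
    """Apply the abstraction rules to (name, return_value) pairs."""
    out = []
    for name, ret in calls:
        if ret < 0:           # assumption: a negative return marks a failed call
            continue
        if name in IGNORED:   # calls that do not characterize program behavior
            continue
        out.append(ALIASES.get(name, name))  # fold aliases into one name
    return out


# Hypothetical raw trace, for illustration only.
raw = [("brk", 0), ("open", 3), ("fstat", 0), ("open", -1),
       ("stat", 0), ("mmap", 4096), ("read", 10)]
cleaned = abstract_trace(raw)
```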
Birthmark Generator
Remove those loading-environment-dependent system calls
◦Run multiple times with the same input
Remove the (noisy) system calls
◦Establish a database of common system call short sequences
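One plausible reading of these two steps, sketched under stated assumptions: sequences that do not appear in every run with the same input are treated as environment-dependent and dropped, and sequences found in a database of common short sequences are treated as noise and dropped. Whether the tool works exactly this way is not stated on the slide. All names and sample data are hypothetical.

```python
from typing import FrozenSet, Iterable, Set, Tuple

Seq = Tuple[str, ...]


def stable_sequences(runs: Iterable[Set[Seq]]) -> FrozenSet[Seq]:
    """Keep only the sequences observed in every run with the same input."""
    frozen = [frozenset(r) for r in runs]
    if not frozen:
        return frozenset()
    return frozenset.intersection(*frozen)


def remove_common(seqs: Iterable[Seq], common_db: Iterable[Seq]) -> FrozenSet[Seq]:
    """Drop sequences listed in a database of common (noisy) short sequences."""
    return frozenset(seqs) - frozenset(common_db)


# Hypothetical data: three runs of the same program on the same input.
a, b, c, d = ("open", "read"), ("read", "close"), ("getpid", "read"), ("time", "read")
runs = [{a, b, c}, {a, b, d}, {a, b}]
common_db = {b}  # e.g. sequences seen in many unrelated programs
birthmark = remove_common(stable_sequences(runs), common_db)
```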
Input Dependant System Call Subsequence Birthmarks
Definition 7: (IDSCSB: Input Dependant System Call Subsequence Birthmark)
Containment:
Input Dependant System Call Subsequence Birthmarks
“file id” and “process id” are ignored
Large parameters are hashed with MD5
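The parameter handling above can be sketched as a normalization step applied to each system call before comparison. The argument labels, the size cutoff for a "large" parameter, and the function name are all hypothetical, since the slide does not give them. Only the use of MD5 and the dropping of file and process ids come from the slide.

```python
import hashlib
from typing import Sequence, Tuple

ID_ARG_KINDS = {"fd", "pid"}  # hypothetical labels for file id and process id
LARGE_ARG_BYTES = 32          # hypothetical cutoff; the slide gives no number


def normalize_call(
    name: str, args: Sequence[Tuple[str, bytes]]
) -> Tuple[str, Tuple[str, ...]]:
    """Drop run-specific ids and replace large parameters by their MD5 digest."""
    kept = []
    for kind, value in args:
        if kind in ID_ARG_KINDS:
            continue  # ids differ from run to run, so they are ignored
        if len(value) > LARGE_ARG_BYTES:
            kept.append(hashlib.md5(value).hexdigest())  # fixed-size stand-in
        else:
            kept.append(value.decode("utf-8", errors="replace"))
    return name, tuple(kept)


# Hypothetical call: write(fd=3, buf=<100 bytes>, count=100)
call = normalize_call("write", [("fd", b"3"), ("buf", b"x" * 100), ("count", b"100")])
```

Hashing keeps comparisons cheap while still distinguishing calls whose large buffers differ, which is what makes this birthmark depend on the input.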
Evaluation
SCSSB and IDSCSB:
◦Against some advanced obfuscation techniques and 15 real-world large applications
SandMark implements 39 byte code obfuscators
x86 Linux executable
GCJ 4.1.2
Evaluation (Cont.)
Programs
◦bzip2.c, gzip.c and oggenc.c
Impact of Compiler Optimization Levels
◦five optimization switches (-O0, -O1, -O2, -O3 and -Os) of GCC (e.g., bzip2-O0, bzip2-O3, etc.)
Impact of Different Compilers
◦GCC, TCC and Watcom (e.g., bzip2-gcc, bzip2-tcc)
SCSSB Experiment I (JLex and JFlex)
SCSSB Experiment I (Cont.)
JLex and JFlex
SCSSB Experiment I (Cont.)
Containment scores
◦JLex CO: 87.9% DO: 85.2%
◦JFlex CO: 96% DO: 96%
SCSSB Experiment II (Gecko)
Gecko: Layout engine used in all Mozilla software and its derivatives
SCSSB Experiment II (Cont.)
IDSCSB Experiment I (JLex and JFlex)
The containment scores between original and obfuscated JLex are all 100%
The scores between JLex and obfuscated JFlex are all less than 46%
The scores between JLex/JFlex and other programs are no more than 7%
IDSCSB Experiment II (Gecko)
Discussion
Counterattacks
◦System call injection attack
◦System call reordering attack
Limitations
◦If the program does not involve any system calls…
◦Need unique system call behaviors
◦The detection result of our tool depends on the threshold a user defines
Conclusion
A novel type of birthmark
Resilient to code obfuscation by SandMark, a state-of-the-art obfuscator
The first birthmark that:
◦Detects software component theft
◦Scales to detect large-scale software theft