Page 1: Detecting Software Theft via System Call Based Birthmarks Xinran Wang, Yoon-Chan Jhi, Sencun Zhu, Peng Liu ACSAC 2009.

Page 2: Outline

Introduction and Related Work

System Call Based Birthmarks

System Design and Implementation

Evaluation

Discussion and Conclusion

Page 3: Software Theft (or plagiarism)

Reuse someone else’s code
◦ Even only a small part of the original program

Obfuscation techniques
◦ Different compilers
◦ Different compiler optimization levels
◦ SandMark

Page 4: Defender

Software watermark
◦ Theoretically, any watermark can be removed

Software birthmark
◦ A unique characteristic that a program inherently possesses

Page 5: Defender (Cont.)

Requirements
◦ R1: Resiliency to obfuscation techniques
◦ R2: Capability to detect theft of components
◦ R3: Scalability to large-scale software
◦ R4: Applicability to binary executables
◦ R5: Independence of platforms

Page 6: Related Work

Software Birthmark
◦ Static source code based birthmark
◦ Static executable code based birthmark
◦ Dynamic whole program path (WPP) based birthmark
◦ Dynamic API based birthmark

Clone Detection
◦ String-based, AST-based, Token-based and PDG-based

None of these approaches satisfies all of the requirements

Page 7: System Call Based Birthmarks

Behavior based birthmarks
◦ Unique behaviors in features and implementation details

SCSSB (System Call Short Sequence Birthmark)

IDSCSB (Input Dependant System Call Subsequence Birthmark)

Page 8: SCSSB (System Call Short Sequence Birthmark)

Definition 1: (System Call Trace)

Definition 2: (System Call Sequence Set)
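
The bodies of these two definitions are not preserved in this transcript. A minimal sketch, assuming that a system call trace is the ordered list of system call names observed when program p runs on input I, and that S(p, I, k) collects every contiguous window of length k from that trace. The function name `sequence_set` and the sample trace are illustrative, not taken from the slides.

```python
def sequence_set(trace, k):
    """Return the set of all contiguous length-k windows of a trace.

    Assumption: this sliding-window reading of S(p, I, k) is inferred
    from the notation used on later slides, not copied from them.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return {tuple(trace[i:i + k]) for i in range(len(trace) - k + 1)}


# Hypothetical trace, for illustration only.
trace = ["open", "read", "write", "read", "write", "close"]
windows = sequence_set(trace, 3)
```

A trace shorter than k yields an empty set under this reading.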

Page 9: SCSSB (System Call Short Sequence Birthmark)

Page 10: SCSSB (System Call Short Sequence Birthmark)

Definition 3: (SCSSB: System Call Short Sequence Birthmark)

SCSSB(p, I, k) is a subset of set S(p, I, k) that satisfies

Page 11: SCSSB (System Call Short Sequence Birthmark)

Definition 4: (Containment) The containment of A in B is defined as:

Here A is the birthmark of a plaintiff program or its component, and B is the birthmark of a suspect program.
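
The formula on this slide is not preserved in the transcript. A minimal sketch, assuming the usual set-containment measure |A ∩ B| / |A|, where both birthmarks are sets of short system call sequences. The function name and the sample birthmarks are hypothetical.

```python
def containment(a, b):
    """Fraction of the plaintiff birthmark A that also appears in B.

    Assumption: containment(A, B) = |A ∩ B| / |A|, a standard
    set-containment measure; the slide's own formula is not preserved.
    """
    a, b = set(a), set(b)
    if not a:
        raise ValueError("plaintiff birthmark A must be non-empty")
    return len(a & b) / len(a)


# Hypothetical birthmarks made of length-2 system call sequences.
plaintiff = {("open", "read"), ("read", "write"), ("write", "close")}
suspect = {
    ("open", "read"),
    ("read", "write"),
    ("stat", "open"),
    ("write", "fsync"),
}
score = containment(plaintiff, suspect)
```

Under this assumption the measure is asymmetric: it asks how much of the plaintiff's birthmark shows up in the suspect, so extra sequences in the suspect do not lower the score.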

Page 12: System Design and Implementation

Page 13: System Design and Implementation

System Call Tracer

System Call Abstraction

Birthmark Generator

Input Dependant System Call Subsequence Birthmarks

Page 14: System Call Tracer

The simplest way
◦ strace

With thread identifier
◦ SATracer based on Valgrind

Prepare a list of all subroutines of the component in SATracer
◦ The list is automatically generated by Elsa

SATracer checks the execution stack of the running thread when a system call is called
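
The stack check can be illustrated with a small sketch: a system call is attributed to the component only if some frame on the calling thread's stack belongs to the component's subroutine list. This is not SATracer or Valgrind code; the record shapes, the function name `calls_from_component`, and the subroutine names are all hypothetical.

```python
def calls_from_component(events, component_subroutines):
    """Keep system calls whose call stack passes through the component.

    events: iterable of (thread_id, syscall_name, call_stack) tuples,
            where call_stack is a list of function names.
    component_subroutines: set of function names in the component.
    Both shapes are hypothetical stand-ins for a tracer's records.
    """
    kept = []
    for thread_id, syscall, stack in events:
        if any(frame in component_subroutines for frame in stack):
            kept.append((thread_id, syscall))
    return kept


# Illustrative data only; the function names are made up.
subroutines = {"layout_frame", "parse_css"}
events = [
    (1, "open", ["main", "parse_css", "read_file"]),
    (1, "write", ["main", "log_message"]),
    (2, "read", ["worker", "layout_frame"]),
]
component_calls = calls_from_component(events, subroutines)
```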

Page 15: System Call Abstraction

Ignore the system calls that do not represent the behavior characteristic
◦ brk, mmap

Consider aliases or multiple versions of a system call as the same
◦ Ex: fstat(int fd, struct stat *sb) and stat(const char *path, struct stat *sb)

Ignore failed system calls
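
The three abstraction rules can be sketched as a single filtering pass. The ignored calls and the fstat/stat pair come from the slide; the record shape, the use of a negative return value to mark failure, and the choice of "stat" as the canonical name are assumptions made for illustration.

```python
IGNORED = {"brk", "mmap"}      # named on the slide as non-characteristic
ALIASES = {"fstat": "stat"}    # alias pair from the slide's example


def abstract_trace(records):
    """Apply the three abstraction rules to (name, return_value) records.

    Assumptions: a negative return value marks a failed call, and the
    fstat/stat group is represented by the name "stat".
    """
    result = []
    for name, retval in records:
        if retval < 0:
            continue  # rule 3: ignore failed system calls
        if name in IGNORED:
            continue  # rule 1: ignore non-characteristic calls
        result.append(ALIASES.get(name, name))  # rule 2: merge aliases
    return result


# Hypothetical raw records, for illustration only.
records = [
    ("brk", 0),
    ("open", 3),
    ("fstat", 0),
    ("open", -1),
    ("stat", 0),
    ("mmap", 4096),
    ("read", 512),
]
abstracted = abstract_trace(records)
```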

Page 16: Birthmark Generator

Remove those loading-environment-dependent system calls
◦ Run multiple times with the same input

Remove the (noisy) system calls
◦ Establish a database of common system call short sequences
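
A rough sketch of both filtering steps, assuming that environment-dependent calls show up as differences between runs on the same input, so that keeping only the sequences present in every run removes them, and that the common-sequence database is simply a set to subtract. Function names and sample data are hypothetical.

```python
def stable_sequences(runs, k):
    """Keep only length-k sequences seen in every run on the same input."""
    sets = [
        {tuple(t[i:i + k]) for i in range(len(t) - k + 1)}
        for t in runs
    ]
    return set.intersection(*sets) if sets else set()


def remove_common(sequences, common_db):
    """Drop sequences listed in a database of common short sequences."""
    return sequences - common_db


# Two hypothetical runs of the same program on the same input.
runs = [
    ["open", "read", "write", "close"],
    ["open", "read", "write", "getpid", "close"],
]
stable = stable_sequences(runs, 2)

# Hypothetical database of sequences that many unrelated programs share.
common_db = {("open", "read")}
birthmark = remove_common(stable, common_db)
```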

Page 17: Input Dependant System Call Subsequence Birthmarks

Definition 7: (IDSCSB: Input Dependant System Call Subsequence Birthmark)

Containment:
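
Neither the definition nor the containment formula survives in this transcript. The word "subsequence" suggests that matching need not be contiguous. One possible reading, assumed here purely for illustration, scores the longest common subsequence of the two birthmarks against the length of the plaintiff's birthmark.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def subsequence_containment(a, b):
    """Assumed measure: |LCS(A, B)| / |A|; not taken from the slides."""
    if not a:
        raise ValueError("plaintiff birthmark A must be non-empty")
    return lcs_length(a, b) / len(a)


# Hypothetical birthmarks as ordered lists of system call names.
plaintiff = ["open", "read", "write", "close"]
padded = ["open", "getpid", "read", "time", "write", "close"]
shuffled = ["read", "open", "close"]

padded_score = subsequence_containment(plaintiff, padded)
shuffled_score = subsequence_containment(plaintiff, shuffled)
```

Under this assumed measure, extra calls interleaved in the suspect leave the score unchanged, while reordered calls lower it.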

Page 18: Input Dependant System Call Subsequence Birthmarks

“file id” and “process id” are ignored

Large parameters are hashed with MD5
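
A minimal sketch of this parameter handling, using Python's standard `hashlib`. The parameter labels `fd` and `pid`, the dictionary record shape, and the 32-byte cutoff for "large" are assumptions; the slide does not specify them.

```python
import hashlib

IGNORED_PARAMS = {"fd", "pid"}  # stand-ins for "file id" and "process id"
LARGE_PARAM_BYTES = 32          # hypothetical cutoff; the slide gives none


def normalize_call(name, params):
    """Return (name, normalized_params) for one system call.

    params: dict mapping a parameter label to a bytes value
            (a hypothetical record shape).
    """
    normalized = []
    for label in sorted(params):
        if label in IGNORED_PARAMS:
            continue  # ids differ between runs, so they are dropped
        value = params[label]
        if len(value) > LARGE_PARAM_BYTES:
            # Replace a large payload with its fixed-size MD5 digest.
            value = hashlib.md5(value).hexdigest().encode()
        normalized.append((label, value))
    return name, tuple(normalized)


# Two writes of the same payload through different descriptors.
call_a = normalize_call("write", {"fd": b"3", "buf": b"x" * 100})
call_b = normalize_call("write", {"fd": b"7", "buf": b"x" * 100})
small = normalize_call("open", {"path": b"/tmp/a"})
```

With ids dropped and the payload hashed, the two writes compare equal even though they used different descriptors.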

Page 19: Evaluation

SCSSB and IDSCSB:
◦ Against some advanced obfuscation techniques and 15 real-world large applications

SandMark implements 39 byte code obfuscators

x86 Linux executable

GCJ 4.1.2

Page 20: Evaluation (Cont.)

Programs
◦ bzip2.c, gzip.c and oggenc.c

Impact of Compiler Optimization Levels
◦ five optimization switches (-O0, -O1, -O2, -O3 and -Os) of GCC (e.g., bzip2-O0, bzip2-O3, etc.)

Impact of Different Compilers
◦ GCC, TCC and Watcom (e.g., bzip2-gcc, bzip2-tcc)

Page 21: SCSSB Experiment I (JLex and JFlex)

Page 22: SCSSB Experiment I (Cont.)

JLex and JFlex

Page 23: SCSSB Experiment I (Cont.)

Containment scores
◦ JLex: CO: 87.9% DO: 85.2%
◦ JFlex: CO: 96% DO: 96%

Page 24: SCSSB Experiment II (Gecko)

Gecko: Layout engine used in all Mozilla software and its derivatives

Page 25: SCSSB Experiment II (Cont.)

Page 26: IDSCSB Experiment I (JLex and JFlex)

The containment scores between original and obfuscated JLex are all 100%

The scores between JLex and obfuscated JFlex are all less than 46%

The scores between JLex/JFlex and other programs are no more than 7%

Page 27: IDSCSB Experiment II (Gecko)

Page 28: Discussion

Counterattacks
◦ System call injection attack
◦ System call reordering attack

Limitations
◦ If the program does not involve any system calls…
◦ Need unique system call behaviors
◦ The detection result of our tool depends on the threshold a user defines
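
The threshold dependence can be made concrete with a tiny sketch. The decision rule (flag when the score reaches the threshold), the labels, and the sample numbers are assumptions for illustration, not the tool's actual logic.

```python
def verdict(score, threshold):
    """Flag a suspect when its containment score reaches the threshold.

    Assumption: a simple ">=" rule on a score in [0, 1].
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must lie in [0, 1]")
    return "suspicious" if score >= threshold else "not flagged"


# The same hypothetical score under two user-chosen thresholds.
score = 0.46
strict = verdict(score, 0.8)
lenient = verdict(score, 0.4)
```

The same score yields opposite verdicts under the two thresholds, which is the limitation the last bullet describes.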

Page 29: Conclusion

A novel type of birthmarks

Resilient to obfuscation by SandMark, a state-of-the-art obfuscator

The first birthmark that:
◦ Detects software component theft
◦ Scales to detect large-scale software theft