Deriving Input Syntactic Structure From Execution
Zhiqiang Lin Xiangyu Zhang
Purdue University
November 11th, 2008
The 16th ACM SIGSOFT International Symposium on the Foundations of Software Engineering (FSE’08)
Motivation -- Most software takes structured input
Applications -- Software Testing/Debugging
Using Input Grammar to Generate Test Cases
  K. Hanford. Automatic Generation of Test Cases. In IBM Systems Journal, 9(4), 1970.
  P. Purdom. A sentence generator for testing parsers. In BIT Numerical Mathematics, 12(3), 1972.
  Grammar-based whitebox fuzzing [PLDI’08]
Delta Debugging
  Reducing large failure input [TSE’02]
  Hierarchical Delta Debugging (HDD) [ICSE’06]
Execution Fast Forwarding
  Reducing event log for failure replay [FSE’06]
Applications -- Computer Security
Malware, attack instance
  Signature generation
  Exploit (input), signature
  Payload length, keywords, field structure…
Penetration testing
  Software vulnerability
  Play with input (fuzz)
  Packet Vaccine [CCS’06], ShieldGen [IEEE S&P’07]
Malware protocol replayer
  Malware feature
  Replay the protocol
  Input format
Challenges
Input structure exists in a machine-unfriendly way
  Plain text (ASCII stream, e.g., C file)
  Binary code (protocol message stream)
Known specification (RFC)
  Implementation deviation
Unknown specification
  Malware
    Bot, botnet protocol
  Legal software
    SAMBA protocol (12 years for open source community)
Challenges
May not have access to the source code
  Penetration testing
  Malware analysis
  Legal software
Working on binary
Our Contributions
Two different approaches to handling two types of parsers
  Using dynamic control dependency to handle top-down parsers
  A new dynamic analysis to handle bottom-up parsers by identifying and analyzing the parsing stack
Experimental results show that the proposed analyses are highly effective in producing very precise input syntax trees
Outline
Motivation
Technical Description
  Handling Inputs with a Top-down Parser
  Handling Inputs with a Bottom-up Parser
Evaluation
Discussion
Related Work
Conclusion
I. Top-down Parser
Parse input in a top-down manner.
Grammar:
  S → H B
  H → h N
  N → 1 | 2
  B → b B | ε
Input: h1bbε
Parse tree: S(H(h, N(1)), B(b, B(b, B(ε))))
Implementation
void Parser() {
    char c = getchar();
    if (c == 'h') {                      /* line 2: H → h N */
        c = getchar();
        if (c == '1' || c == '2') {      /* line 4: N → 1 | 2 */
            c = getchar();
        } else error();
    } else error();
    while (c == 'b') {                   /* line 9: B → b B | ε */
        c = getchar();
        if (c == 'ε') {                  /* line 11 */
            break;
        }
    }
    error();
}
Execution Trace
Input: h1bbε
  c = getchar()
  if (c == 'h')                uses h
  c = getchar()
  if (c == '1' || c == '2')    uses 1
  c = getchar()
  while (c == 'b')             uses b1 (the first b)
  c = getchar()
  if (c == 'ε')                uses b2 (the second b)
  while (c == 'b')             uses b2
  c = getchar()
  if (c == 'ε')                uses ε
  break
Control Dependency: A statement Y is control-dependent on a statement X iff X directly determines whether Y executes.
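The trace for h1bbε can be reproduced with a hand-instrumented copy of the slide's parser. This is only a sketch under stated assumptions, not the authors' analysis, which works on binaries: Tracer, TraceEntry, next_char, record, and traced_parse are hypothetical names, '$' is assumed to stand in for the ε end marker, input comes from a string rather than getchar(), and, unlike the slide's code, reaching the end marker returns success instead of falling through to error().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define END_MARK '$'   /* assumption: '$' stands in for the slide's ε end marker */
#define MAX_TRACE 64
#define MAX_LINE 16    /* the slide's parser has 16 numbered lines */

/* One executed predicate instance and the input byte it tested. */
typedef struct {
    int line;       /* predicate's line number on the slide: 2, 4, 9 or 11 */
    int instance;   /* 1 for the first execution of that line, 2 for the second, ... */
    size_t offset;  /* offset of the tested byte in the input */
    char used;      /* the tested byte itself */
} TraceEntry;

typedef struct {
    const char *input;
    size_t pos;                    /* offset of the next unread byte */
    size_t cur;                    /* offset of the byte returned by the last read */
    TraceEntry entries[MAX_TRACE];
    int count;
    int executions[MAX_LINE + 1];  /* how often each line's predicate has run */
} Tracer;

/* Stand-in for getchar(): next input byte, or END_MARK once the input is exhausted. */
static char next_char(Tracer *t) {
    t->cur = t->pos;
    if (t->input[t->pos] == '\0') return END_MARK;
    return t->input[t->pos++];
}

/* Log that the predicate on `line` tested byte `c`. */
static void record(Tracer *t, int line, char c) {
    assert(line >= 0 && line <= MAX_LINE);
    if (t->count == MAX_TRACE) return;   /* trace buffer full: drop the entry */
    TraceEntry *e = &t->entries[t->count++];
    e->line = line;
    e->instance = ++t->executions[line];
    e->offset = t->cur;
    e->used = c;
}

/* Instrumented parser; returns 1 if the input is accepted, 0 otherwise. */
int traced_parse(Tracer *t, const char *input) {
    memset(t, 0, sizeof *t);
    t->input = input;
    char c = next_char(t);
    record(t, 2, c);                      /* if (c == 'h') */
    if (c != 'h') return 0;
    c = next_char(t);
    record(t, 4, c);                      /* if (c == '1' || c == '2') */
    if (c != '1' && c != '2') return 0;
    c = next_char(t);
    for (;;) {
        record(t, 9, c);                  /* while (c == 'b') */
        if (c != 'b') return 0;
        c = next_char(t);
        record(t, 11, c);                 /* if (c == end marker) */
        if (c == END_MARK) return 1;
    }
}
```

Calling traced_parse on "h1bb" yields six entries. The fourth and fifth entries (line 11, instance 1 and line 9, instance 2) both carry offset 3, which is the b2 shared by two predicates in the trace.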
Control dependency graph for the execution trace
A control dependency graph is a graph in which each node directly controls the execution of its child nodes.
[Figure: control dependency graph rooted at START. Nodes: c = getchar(), if (c == 'h'), if (c == '1' || c == '2'), while (c == 'b'), if (c == 'ε'), break. Input annotations: h, 1, b1, b2, b2, ε.]
Eliminate non data use node
[Figure: the control dependency graph rooted at START, with the same statement nodes and the input annotations h, 1, b1, b2, b2, ε.]
Add Data Use Leaf Node
[Figure: tree rooted at START containing only the predicates, each with the input it uses attached as a leaf.]
  if (c == 'h')                h
  if (c == '1' || c == '2')    1
  while (c == 'b')             b1
  if (c == 'ε')                b2
  while (c == 'b')             b2
  if (c == 'ε')                ε
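Eliminate Redundant Node
[Figure: tree rooted at START. Each node is labeled with its source line and, as a subscript, its execution instance (9_1 is the first execution of line 9). Callout: Identical Node.]
  2      if (c == 'h')                h
  4      if (c == '1' || c == '2')    1
  9_1    while (c == 'b')             b1
  11_1   if (c == 'ε')                b2
  9_2    while (c == 'b')             b2
  11_2   if (c == 'ε')                ε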
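One plausible reading of this step is that a predicate node is redundant when it uses the same input byte as the node that controls it, as 11_1 and 9_2 both use b2. The sketch below illustrates that reading only. The slide does not spell out the rule or which of the two nodes survives, so the merge direction, the Node layout, and the names NO_PARENT and eliminate_redundant are all assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define NO_PARENT (-1)   /* hypothetical marker for a direct child of START */

typedef struct {
    int parent;      /* index of the controlling node, or NO_PARENT */
    size_t offset;   /* offset of the input byte this predicate instance used */
    int removed;     /* set to 1 when the node is merged into its parent */
} Node;

/* Merge each node into its nearest surviving ancestor when both use the same
   input byte, and re-attach the children of merged nodes to that ancestor.
   Assumes parents precede children in the array, which holds when nodes are
   listed in execution order. Returns the number of nodes merged away. */
int eliminate_redundant(Node *nodes, int n) {
    int removed = 0;
    for (int i = 0; i < n; i++) {
        int p = nodes[i].parent;
        assert(p >= NO_PARENT && p < i);
        /* A removed parent already points at a surviving ancestor. */
        if (p != NO_PARENT && nodes[p].removed) p = nodes[p].parent;
        nodes[i].parent = p;
        if (p != NO_PARENT && nodes[p].offset == nodes[i].offset) {
            nodes[i].removed = 1;
            removed++;
        }
    }
    return removed;
}
```

For the six predicate instances of the h1bbε trace, with the chain 9_1, 11_1, 9_2, 11_2 assumed as the parent structure, only 9_2 is merged away and 11_2 is re-attached under 11_1.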
II. Bottom-up Parser
Parse input in a bottom-up manner
  Programming languages
  lex/yacc
Grammar:
  S → A B
  A → a a
  B → b
Input: aab
Parse tree: S(A(a, a), B(b))
A General Bottom-Up Parsing Algorithm
while (…) {
    if (stack should not be reduced) {
        stack.push(c);
        …
    } else {  // A → β
        stack.pop(|β|);
        stack.push(A);
    }
}
Grammar: S → A B, A → a a, B → b
Input: aab
Trace:
  while (…); if (stack should not be reduced); stack.push(a)
  while (…); if (stack should not be reduced); stack.push(a)
  while (…); if (stack should not be reduced); stack.pop(aa); stack.push(A)
  …
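The push/pop pattern of the algorithm can be made concrete for the grammar S → A B, A → a a, B → b. The sketch below is a naive greedy reducer written for illustration, not the table-driven LR parser that yacc would generate and not the authors' tool. Parser, RULES, log_op, try_reduce, and shift_reduce are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_STACK 32
#define MAX_LOG 256

typedef struct {
    char stack[MAX_STACK];  /* parsing stack of grammar symbols */
    int top;                /* number of symbols on the stack */
    char log[MAX_LOG];      /* comma-separated record of stack operations */
} Parser;

/* Grammar from the slide: A → aa, B → b, S → AB. */
static const struct { char lhs; const char *rhs; } RULES[] = {
    {'A', "aa"}, {'B', "b"}, {'S', "AB"}
};

/* Append "Op(symbols)" to the log, truncating if the buffer is full. */
static void log_op(Parser *p, const char *op, const char *sym, size_t len) {
    size_t used = strlen(p->log);
    assert(used < MAX_LOG);
    snprintf(p->log + used, MAX_LOG - used, "%s%s(%.*s)",
             used ? ", " : "", op, (int)len, sym);
}

/* If the top of the stack matches a rule's right-hand side, pop it and push
   the left-hand side. Returns 1 if a reduction happened, 0 otherwise. */
static int try_reduce(Parser *p) {
    for (size_t r = 0; r < sizeof RULES / sizeof RULES[0]; r++) {
        size_t n = strlen(RULES[r].rhs);
        if ((size_t)p->top >= n &&
            memcmp(p->stack + p->top - n, RULES[r].rhs, n) == 0) {
            log_op(p, "Pop", RULES[r].rhs, n);
            p->top -= (int)n;
            p->stack[p->top++] = RULES[r].lhs;
            log_op(p, "Push", &RULES[r].lhs, 1);
            return 1;
        }
    }
    return 0;
}

/* Shift each input symbol, reducing greedily after every shift.
   Returns 1 if the stack ends up holding only S, 0 otherwise. */
int shift_reduce(Parser *p, const char *input) {
    p->top = 0;
    p->log[0] = '\0';
    for (const char *c = input; *c; c++) {
        if (p->top >= MAX_STACK) return 0;
        p->stack[p->top++] = *c;
        log_op(p, "Push", c, 1);
        while (try_reduce(p)) { }
    }
    return p->top == 1 && p->stack[0] == 'S';
}
```

After shift_reduce(&p, "aab"), p.log holds the sequence of Push and Pop operations performed on the stack, which is the kind of record a stack-based dynamic analysis would observe.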
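Tree Construction
Grammar: S → A B, A → a a, B → b
Input: aab
Stack operation trace: Push(a), Push(a), Pop(aa), Push(A), Push(b), Pop(b), Push(B), Pop(AB), Push(S)
[Figure: syntax tree built from the stack operations, with node labels Push(S), Push(A), Push(B), Push(a), Push(a), Push(b), Pop(b).]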
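A stack operation trace of this shape is enough to rebuild the tree if every symbol pushed right after a pop is taken to be the parent of the symbols just popped. That pairing rule is an assumption drawn from the reduce step (pop β, push A), and the sketch below is illustrative rather than the authors' implementation. StackOp, TreeNode, Builder, build_tree, and render are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_NODES 64
#define MAX_KIDS 8

typedef enum { PUSH, POP } OpKind;
typedef struct { OpKind kind; const char *symbols; } StackOp;

typedef struct {
    char symbol;
    int kids[MAX_KIDS];   /* child node indices, left to right */
    int nkids;
} TreeNode;

typedef struct {
    TreeNode nodes[MAX_NODES]; int nnodes;
    int stack[MAX_NODES]; int top;         /* node indices mirroring the parsing stack */
    int pending[MAX_KIDS]; int npending;   /* nodes popped but not yet given a parent */
} Builder;

/* Replay the trace. Returns the root's node index, or -1 if the trace is
   malformed or does not leave exactly one symbol on the stack. */
int build_tree(Builder *b, const StackOp *ops, int nops) {
    b->nnodes = b->top = b->npending = 0;
    for (int i = 0; i < nops; i++) {
        size_t len = strlen(ops[i].symbols);
        if (ops[i].kind == POP) {
            if (len > (size_t)b->top || len > MAX_KIDS) return -1;
            b->npending = (int)len;
            for (int k = 0; k < (int)len; k++)
                b->pending[k] = b->stack[b->top - (int)len + k];
            b->top -= (int)len;
        } else {
            if (len != 1 || b->nnodes >= MAX_NODES) return -1;
            TreeNode *n = &b->nodes[b->nnodes];
            n->symbol = ops[i].symbols[0];
            n->nkids = b->npending;        /* adopt the nodes popped just before */
            memcpy(n->kids, b->pending, sizeof(int) * (size_t)b->npending);
            b->npending = 0;
            b->stack[b->top++] = b->nnodes++;
        }
    }
    return b->top == 1 ? b->stack[0] : -1;
}

/* Append `s` to the NUL-terminated buffer `out`, truncating at `cap`. */
static void append(char *out, size_t cap, const char *s) {
    size_t used = strlen(out);
    if (used + 1 < cap) snprintf(out + used, cap - used, "%s", s);
}

/* Append the subtree at `node` to `out` in bracket notation. */
void render(const Builder *b, int node, char *out, size_t cap) {
    assert(node >= 0 && node < b->nnodes);
    char sym[2] = { b->nodes[node].symbol, '\0' };
    append(out, cap, sym);
    if (b->nodes[node].nkids == 0) return;
    append(out, cap, "(");
    for (int k = 0; k < b->nodes[node].nkids; k++) {
        if (k) append(out, cap, " ");
        render(b, b->nodes[node].kids[k], out, cap);
    }
    append(out, cap, ")");
}
```

Replaying the nine operations of the aab trace and rendering the root into an empty buffer gives S(A(a a) B(b)), the parse tree for aab.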
Identify the parsing stack
Evaluation – Top down grammar
Evaluation – Bottom up grammar
Performance Overhead
5X-45X 6X-8X
Discussion
Grammar categories
  Top-down, bottom-up, any others?
  Possible to evade the control dependency structure in top-down parser implementation.
Individual input Multiple input final grammar
Syntactic Structure Semantics
Related Work
Network Protocol Format Reverse Engineering
  Instruction semantics (comparison, loop keyword, delimiter)
    Polyglot [CCS’07]
    Automatic Network Protocol Analysis [NDSS’08]
    Tupni [CCS’08]
  Execution context (call stack, PC)
    AutoFormat [NDSS’08]
Limitations
  Part of the problem space
    Only top-down parsers.
  Part of the problem’s essence.
    Comparison (predicate), call stack control dependency
Conclusion
Two dynamic analyses to construct input structure from program execution.
No source code access or any symbolic information is required.
The analyses are highly effective and produce high-quality input syntax trees.
Thank you
To further contact us:
{zlin,xyzhang}@cs.purdue.edu
Q & A