Neuro-Symbolic Execution: Augmenting Symbolic Execution ......Problem Inputs: 1. Source code 2....

24
Neuro-Symbolic Execution: Augmenting Symbolic Execution with Neural Constraints Shiqi Shen Shweta Shinde Soundarya Ramesh Abhik Roychoudhury Prateek Saxena National University of Singapore

Transcript of Neuro-Symbolic Execution: Augmenting Symbolic Execution ......Problem Inputs: 1. Source code 2....

Page 1: Neuro-Symbolic Execution: Augmenting Symbolic Execution ......Problem Inputs: 1. Source code 2. Symbolic Variables (e.g., filename& file) 3. Candidate Vulnerability Points (CVPs)-Divide

Neuro-Symbolic Execution:Augmenting Symbolic Execution

with Neural ConstraintsShiqi Shen Shweta Shinde Soundarya Ramesh

Abhik Roychoudhury Prateek Saxena

National University of Singapore

Page 2: Neuro-Symbolic Execution: Augmenting Symbolic Execution ......Problem Inputs: 1. Source code 2. Symbolic Variables (e.g., filename& file) 3. Candidate Vulnerability Points (CVPs)-Divide

Symbolic Execution for Bug Finding

Kite SAGE

jCUTEManticore

S²EAngr

2

Page 3: Neuro-Symbolic Execution: Augmenting Symbolic Execution ......Problem Inputs: 1. Source code 2. Symbolic Variables (e.g., filename& file) 3. Candidate Vulnerability Points (CVPs)-Divide

Recap: Symbolic Execution (SE)

1 def f (x, y): 2 if (x>y)3 x = x+y4 if (x–y > 0)5 assert false 6 return (x, y)

x ↦ Ay ↦ B

x ↦ A+By ↦ B

x ↦ Ay ↦ B

A>B A≤B

x ↦ A+By ↦ B

x ↦ A+By ↦ B

A>0 A≤0

A and B are symbolic variables

Dynamic Symbolic Execution (DSE): A widely used variation of SE

3

assert false

Page 4: Neuro-Symbolic Execution: Augmenting Symbolic Execution ......Problem Inputs: 1. Source code 2. Symbolic Variables (e.g., filename& file) 3. Candidate Vulnerability Points (CVPs)-Divide

Fundamental Limitations of Classic DSE1 int main (…) {2 if (strlen(filename)>1 && filename[0]==‘-’)3 exit(1)4 copy_data(…);5 …6 }7 void copy_data(..., int *file,...) {8 static double data[4096], value;9 read_double_value(file, ... );10 value = fabs (data [0]); 11 for(i=0; i<4096; i++)12 if(file[i] == 0.0) count++;13 data[1] /= (value+count-3);14 …15 }

data[1] /= (value+count-3); Candidate Vulerability

Point (CVP): Divide-by-zero

#1 Limitations of

SMT Solvers

#2 Unmodeled

Semantics

#3 Path Explosion

file

[0]=

0 file[0

]≠0

i=

0

file

[1]=

0

file

[1]=

0

file[1

]≠0

file[1

]≠0

i=

1

file

[..]

=0

file

[..]

=0

file

[..]

=0

file

[..]

=0

file[..]≠

0

file[..]≠

0

file[..]≠

0

… … … … … … … …

i=

…i

= 4

09

5

24096 paths

4

Page 5: Neuro-Symbolic Execution: Augmenting Symbolic Execution ......Problem Inputs: 1. Source code 2. Symbolic Variables (e.g., filename& file) 3. Candidate Vulnerability Points (CVPs)-Divide

Contributions

● Neuro-symbolic execution○ A new approach to tackle the limitations of DSE○ Reasons about exact (symbolic) & approximate constraints (neural nets)

● A Tool − NeuEx○ Enhances the widely used DSE engine (KLEE)

● Evaluation○ Finds 94% more bugs than KLEE in 12 hours

5

Page 6: Neuro-Symbolic Execution: Augmenting Symbolic Execution ......Problem Inputs: 1. Source code 2. Symbolic Variables (e.g., filename& file) 3. Candidate Vulnerability Points (CVPs)-Divide

ProblemInputs:1. Source code2. Symbolic Variables (e.g., filename & file)3. Candidate Vulnerability Points (CVPs)- Divide by zero- Buffer overflow

Outputs: Validated Exploits

1 int main (…) {2 if (strlen(filename)>1 && filename[0]==‘-’)3 exit(1)4 copy_data(…);5 …6 }7 void copy_data(..., int *file,...) {8 static double data[4096], value;9 read_double_value(file, ... );10 value = fabs (data [0]); 11 for(i=0; i<4096; i++)12 if(file[i] == 0.0) count++;13 data[1] /= (value+count-3);14 …15 }

6

filename

file

data[1] /= (value+count-3); CVP: Divide-by-zero

Page 7: Neuro-Symbolic Execution: Augmenting Symbolic Execution ......Problem Inputs: 1. Source code 2. Symbolic Variables (e.g., filename& file) 3. Candidate Vulnerability Points (CVPs)-Divide

Key Insights

Values of Vulnerable Variables in CVP

Values of Symbolic Variables

Learn an approximation with small number of I/O

examples7

1 int main (…) {2 if (strlen(filename)>1 && filename[0]==‘-’)3 exit(1)4 copy_data(…);5 …6 }7 void copy_data(..., int *file,...) {8 static double data[4096], value;9 read_double_value(file, ... );10 value = fabs (data [0]); 11 for(i=0; i<4096; i++)12 if(file[i] == 0.0) count++;13 data[1] /= (value+count-3);14 …15 }

filename

file

data[1] /= (value+count-3); CVP: Divide-by-zero

Page 8: Neuro-Symbolic Execution: Augmenting Symbolic Execution ......Problem Inputs: 1. Source code 2. Symbolic Variables (e.g., filename& file) 3. Candidate Vulnerability Points (CVPs)-Divide

1 int main (…) {2 if (strlen(filename)>1 && filename[0]==‘-’)3 exit(1)4 copy_data(…);5 …6 }7 void copy_data(..., int *file,...) {8 static double data[4096], value;9 read_double_value(file, ... );10 value = fabs (data [0]); 11 for(i=0; i<4096; i++)12 if(file[i] == 0.0) count++;13 data[1] /= (value+count-3);14 …15 }

Key Insights

Approximation:

Machine learning can learn such an

approximation8

filename

file

data[1] /= (value+count-3); CVP: Divide-by-zero

!"#$% ==Σ(∈[+,-+./] 123$ 4256 2 == 0

∧ 9 == 4256 0 + 256 4256 1∧ 1 == 123$ 4256 1 < 127

∧ max ==2×1 − 1 × 9 − 256F × (1 − 1)

Page 9: Neuro-Symbolic Execution: Augmenting Symbolic Execution ......Problem Inputs: 1. Source code 2. Symbolic Variables (e.g., filename& file) 3. Candidate Vulnerability Points (CVPs)-Divide

1 int main (…) {2 if (strlen(filename)>1 && filename[0]==‘-’)3 exit(1)4 copy_data(…);5 …6 }7 void copy_data(..., int *file,...) {8 static double data[4096], value;9 read_double_value(file, ... );10 value = fabs (data [0]); 11 for(i=0; i<4096; i++)12 if(file[i] == 0.0) count++;13 data[1] /= (value+count-3);14 …15 }

Approach

1. Neural nets can represent a large category

of functions (universal approximation theorem).

2. Multiple applications show that neural nets are

learnable for many practical functions.

9

filename

file

data[1] /= (value+count-3); CVP: Divide-by-zero

Approximate Constraint (as a neural net):

file count & value

Page 10: Neuro-Symbolic Execution: Augmenting Symbolic Execution ......Problem Inputs: 1. Source code 2. Symbolic Variables (e.g., filename& file) 3. Candidate Vulnerability Points (CVPs)-Divide

DSE Engine (KLEE)

NeuEx Overview

SourceCode CVPs

SymbolicVariables

Input Grammar (optional)

Bottlenecks:1. Unmodeled APIs2. Loop Unrolled

Count > 10K3. Z3 Timeout (> 10

mins)4. Memory Cap > 3GB

SMT Solver(Z3)

Neural Mode

Neural Mode

Neural Mode

Neural Mode

fork

fork

fork

fork

Validated Exploits

NeuEx

10

Page 11: Neuro-Symbolic Execution: Augmenting Symbolic Execution ......Problem Inputs: 1. Source code 2. Symbolic Variables (e.g., filename& file) 3. Candidate Vulnerability Points (CVPs)-Divide

Reaching CVPs

Collecting I/O Examples

TrainingSolving

Starting point

BottleneckDSE

Random fuzz

Neural Mode

11

Page 12: Neuro-Symbolic Execution: Augmenting Symbolic Execution ......Problem Inputs: 1. Source code 2. Symbolic Variables (e.g., filename& file) 3. Candidate Vulnerability Points (CVPs)-Divide

Generated Constraints

12

1 int main (…) {2 if (strlen(filename)>1 && filename[0]==‘-’)3 exit(1)4 copy_data(…);5 …6 }7 void copy_data(..., int *file,...) {8 static double data[4096], value;9 read_double_value(file, ... );10 value = fabs (data [0]); 11 for(i=0; i<4096; i++)12 if(file[i] == 0.0) count++;13 data[1] /= (value+count-3);14 …15 }

CVP: Divide-by-zero

!: #$%#&' → )*&+', -.+$/

1. Reachability constraints:0/1&'$ %#&'$*2' ≤ 1∨ %#&'$*2' ≠ ′ − ′

2. Vulnerability condition:)*&+' + -.+$/ − 3 == 0

Page 13: Neuro-Symbolic Execution: Augmenting Symbolic Execution ......Problem Inputs: 1. Source code 2. Symbolic Variables (e.g., filename& file) 3. Candidate Vulnerability Points (CVPs)-Divide

!: #$%#&' → )*&+', -.+$/

1. Reachability constraints:0/1&'$ %#&'$*2' ≤ 1∨ %#&'$*2' ≠ ′ − ′

2. Vulnerability condition:)*&+' + -.+$/ − 3 == 0

Purely symbolic constraints:No variable shared with neural constraints

SMT solver

Mixed constraints:Including both neural constraints and symbolic constraints with shared variables

Constraint Solving

13

Page 14: Neuro-Symbolic Execution: Augmenting Symbolic Execution ......Problem Inputs: 1. Source code 2. Symbolic Variables (e.g., filename& file) 3. Candidate Vulnerability Points (CVPs)-Divide

How to Solve Mixed Constraints?

Design Choice 2:Symbolic constraints → Optimization objective of the neural net

Design Choice 1: Neural net → CNF

14

The number of clauses increases drastically withthe neural net complexity

("# ∧ %#) ∨ ("( ∧ %() ∨ ⋯ ∨ ("* ∧ %*)

Page 15: Neuro-Symbolic Execution: Augmenting Symbolic Execution ......Problem Inputs: 1. Source code 2. Symbolic Variables (e.g., filename& file) 3. Candidate Vulnerability Points (CVPs)-Divide

value+count

L

Encoding Symbolic Constraints as an Optimization Objective

Symbolic constraint

Criterion for crafting the loss function:The minimum point of the loss function satisfies the symbolic constraints.

3

15

!: #$%#&' → )*&+', -.+$/ ∧)*&+' + -.+$/ − 3 == 0

One possible encoding:7 = *89()*&+' + -.+$/ − 3)

Page 16: Neuro-Symbolic Execution: Augmenting Symbolic Execution ......Problem Inputs: 1. Source code 2. Symbolic Variables (e.g., filename& file) 3. Candidate Vulnerability Points (CVPs)-Divide

Encoding Symbolic Constraints as an Optimization Objective

16

NeuEx supports many symbolic constraints. Checkout the paper for the complete grammar and

the corresponding loss functions.

Page 17: Neuro-Symbolic Execution: Augmenting Symbolic Execution ......Problem Inputs: 1. Source code 2. Symbolic Variables (e.g., filename& file) 3. Candidate Vulnerability Points (CVPs)-Divide

Solving Mixed Constraints via Gradient Descent

file count,value

257 20 2741 20 180 20 17… … …0 3 0

count value loss Gradient:

file: 000…Concretely validate the exploit

17

∇"#$%&

Page 18: Neuro-Symbolic Execution: Augmenting Symbolic Execution ......Problem Inputs: 1. Source code 2. Symbolic Variables (e.g., filename& file) 3. Candidate Vulnerability Points (CVPs)-Divide

Evaluation

● Recall: Neural mode is only triggered when DSE encounters bottlenecks

● Benchmarks: 7 Programs known to be difficult for classic DSE

○ 4 Real programs

■ cURL: Data transferring

■ SQLite: Database

■ libTIFF: Image processing

■ libsndfile: Audio processing

○ LESE benchmarks

■ BIND, Sendmail, and WuFTP

Include:

1. Complex loops

2. Floating-point variables

3. Unmodeled APIs

18

Page 19: Neuro-Symbolic Execution: Augmenting Symbolic Execution ......Problem Inputs: 1. Source code 2. Symbolic Variables (e.g., filename& file) 3. Candidate Vulnerability Points (CVPs)-Divide

CVP Coverage & Bottlenecks for DSE

KLEE gets stuck# of bottlenecks: 61

● Unmodeled APIs (6)● Complicated loops (53)● Z3 timeout (1)● Memory exhaustion (1)

19

Page 20: Neuro-Symbolic Execution: Augmenting Symbolic Execution ......Problem Inputs: 1. Source code 2. Symbolic Variables (e.g., filename& file) 3. Candidate Vulnerability Points (CVPs)-Divide

CVP Coverage of NeuEx vs KLEE

The number of CVPs reached or covered by NeuEx is 25% higher

than vanilla KLEE.

20

Page 21: Neuro-Symbolic Execution: Augmenting Symbolic Execution ......Problem Inputs: 1. Source code 2. Symbolic Variables (e.g., filename& file) 3. Candidate Vulnerability Points (CVPs)-Divide

# of Bugs Found by NeuEx vs KLEE

NeuEx finds 94% and 89% more bugs than

vanilla KLEE in BFS and RAND mode in 12 hours.

21

Page 22: Neuro-Symbolic Execution: Augmenting Symbolic Execution ......Problem Inputs: 1. Source code 2. Symbolic Variables (e.g., filename& file) 3. Candidate Vulnerability Points (CVPs)-Divide

22

Comparison to DSE Extensions

LESE• Structured Approach for Loop Reasoning

o Two orders of magnitude slower on average

Veritesting• Combination of DSE + static analysis

o Fails to find bugs in 12 hourso Poor instruction coverage

Page 23: Neuro-Symbolic Execution: Augmenting Symbolic Execution ......Problem Inputs: 1. Source code 2. Symbolic Variables (e.g., filename& file) 3. Candidate Vulnerability Points (CVPs)-Divide

Key Takeaways

A new approach: Neuro-Symbolic Execution

● Resolves fundamental bottlenecks of DSE● First to learn unstructured representation not amenable to SMT solver ● Unique use of optimization techniques and SMT for solving constraints

Evaluation

● Finds 94% more bugs than KLEE within 12 hours

23

Page 24: Neuro-Symbolic Execution: Augmenting Symbolic Execution ......Problem Inputs: 1. Source code 2. Symbolic Variables (e.g., filename& file) 3. Candidate Vulnerability Points (CVPs)-Divide

24

DSE: 2

Neural Mode: 10

NeuEx: 12

Backup: New Bugs