Lecture 9. Symbolic Execution - Iowa State...

49
Lecture 9. Symbolic Execution Wei Le Thank Cristian Cadar, Patrice Godefroid, Jeff Foster, Nikolai Tillmann, Vijay Ganesh for Some of the Slides 2014.11

Transcript of Lecture 9. Symbolic Execution - Iowa State...

Page 1: Lecture 9. Symbolic Execution - Iowa State Universityweb.cs.iastate.edu/~weile/cs641/9.SymbolicExecution.pdf · Klee: Unassisted and automatic generation of high-coverage tests for

Lecture 9. Symbolic Execution

Wei LeThank Cristian Cadar, Patrice Godefroid, Jeff Foster, Nikolai Tillmann, Vijay

Ganeshfor Some of the Slides

2014.11

Page 2: Lecture 9. Symbolic Execution - Iowa State Universityweb.cs.iastate.edu/~weile/cs641/9.SymbolicExecution.pdf · Klee: Unassisted and automatic generation of high-coverage tests for

Outline

I What is symbolic execution?I Concrete execution versus symbolic executionI Symbolic execution tree

I Applications of symbolic execution: test input generation, infeasiblepaths detection, bug finding, program repair, debugging

I Code hunt

I History of the research since 1975

I The three challengesI Path explosionI Modeling statements and environmentsI Constraint solving

I Implementation and symbolic execution tools

Page 3: Lecture 9. Symbolic Execution - Iowa State Universityweb.cs.iastate.edu/~weile/cs641/9.SymbolicExecution.pdf · Klee: Unassisted and automatic generation of high-coverage tests for

Outline

I What is symbolic execution?I Concrete execution versus symbolic executionI Symbolic execution tree

I Applications of symbolic execution: test input generation, infeasiblepaths detection, bug finding, program repair, debugging

I Code hunt

I History of the research since 1975

I The three challengesI Path explosionI Modeling statements and environmentsI Constraint solving

I Implementation and symbolic execution tools

Page 4: Lecture 9. Symbolic Execution - Iowa State Universityweb.cs.iastate.edu/~weile/cs641/9.SymbolicExecution.pdf · Klee: Unassisted and automatic generation of high-coverage tests for

Concrete Execution Versus Symbolic Execution

Page 5: Lecture 9. Symbolic Execution - Iowa State Universityweb.cs.iastate.edu/~weile/cs641/9.SymbolicExecution.pdf · Klee: Unassisted and automatic generation of high-coverage tests for

Concrete Execution Versus Symbolic Execution

Page 6: Lecture 9. Symbolic Execution - Iowa State Universityweb.cs.iastate.edu/~weile/cs641/9.SymbolicExecution.pdf · Klee: Unassisted and automatic generation of high-coverage tests for

Concrete Execution Versus Symbolic Execution

Page 7: Lecture 9. Symbolic Execution - Iowa State Universityweb.cs.iastate.edu/~weile/cs641/9.SymbolicExecution.pdf · Klee: Unassisted and automatic generation of high-coverage tests for

Some Insights about Symbolic Execution

I ’Execute’ programs with symbols: we track symbolic state ratherthan concrete input

I ’Execute’ many program paths simultaneously: when execution pathdiverges, fork and add constraints on symbolic values

I When ’execute’ one path, we actually simulate many test runs, sincewe are considering all the inputs that can exercise the same path

Page 8: Lecture 9. Symbolic Execution - Iowa State Universityweb.cs.iastate.edu/~weile/cs641/9.SymbolicExecution.pdf · Klee: Unassisted and automatic generation of high-coverage tests for

Symbolic Execution Tree

Page 9: Lecture 9. Symbolic Execution - Iowa State Universityweb.cs.iastate.edu/~weile/cs641/9.SymbolicExecution.pdf · Klee: Unassisted and automatic generation of high-coverage tests for

Applications of Symbolic Execution

General goal: identifying semantics of programs

Basic applications:

I Detecting infeasible paths

I Generating test inputs

I Finding bugs and vulnerabilities

I Proving two code segments are equivalent (Code Hunt)

Advanced applications:

I Generating program invariants

I Debugging

I Repair programs

Page 10: Lecture 9. Symbolic Execution - Iowa State Universityweb.cs.iastate.edu/~weile/cs641/9.SymbolicExecution.pdf · Klee: Unassisted and automatic generation of high-coverage tests for

Detecting Infeasible Paths

Suppose we require α = β

Page 11: Lecture 9. Symbolic Execution - Iowa State Universityweb.cs.iastate.edu/~weile/cs641/9.SymbolicExecution.pdf · Klee: Unassisted and automatic generation of high-coverage tests for

Test Input Generation

Path 1: α = 1, β = 1Path 2: α = 1, β = 6Path 3 ...

Page 12: Lecture 9. Symbolic Execution - Iowa State Universityweb.cs.iastate.edu/~weile/cs641/9.SymbolicExecution.pdf · Klee: Unassisted and automatic generation of high-coverage tests for

Bug Finding

Page 13: Lecture 9. Symbolic Execution - Iowa State Universityweb.cs.iastate.edu/~weile/cs641/9.SymbolicExecution.pdf · Klee: Unassisted and automatic generation of high-coverage tests for

Bug Finding

Page 14: Lecture 9. Symbolic Execution - Iowa State Universityweb.cs.iastate.edu/~weile/cs641/9.SymbolicExecution.pdf · Klee: Unassisted and automatic generation of high-coverage tests for

Test Input Generation: Code Hunt

Page 15: Lecture 9. Symbolic Execution - Iowa State Universityweb.cs.iastate.edu/~weile/cs641/9.SymbolicExecution.pdf · Klee: Unassisted and automatic generation of high-coverage tests for

Code Hunt Demo

Page 16: Lecture 9. Symbolic Execution - Iowa State Universityweb.cs.iastate.edu/~weile/cs641/9.SymbolicExecution.pdf · Klee: Unassisted and automatic generation of high-coverage tests for

Code Hunt: Behind the Scene

Page 17: Lecture 9. Symbolic Execution - Iowa State Universityweb.cs.iastate.edu/~weile/cs641/9.SymbolicExecution.pdf · Klee: Unassisted and automatic generation of high-coverage tests for

History of Symbolic Execution

Page 18: Lecture 9. Symbolic Execution - Iowa State Universityweb.cs.iastate.edu/~weile/cs641/9.SymbolicExecution.pdf · Klee: Unassisted and automatic generation of high-coverage tests for

Resurgence of Symbolic Execution

The block issues in the past:

I Not scalable: program state has many bits, there are many programpaths

I Not able to go through loops and library calls

I Constraint solver is slow and not capable to handle advancedconstraints

The two key projects that enable the advance:

I DART Godefroid and Sen, PLDI 2005 (introduce dynamicinformation to symbolic execution)

I EXE Cadar, Ganesh, Pawlowski, Dill, and Engler, CCS 2006 (STP:a powerful constraint solver that handles array)

Moving forward:

I More powerful computers and clusters

I Techniques of mixture concrete and symbolic executions

I Powerful constraint solvers

Page 19: Lecture 9. Symbolic Execution - Iowa State Universityweb.cs.iastate.edu/~weile/cs641/9.SymbolicExecution.pdf · Klee: Unassisted and automatic generation of high-coverage tests for

Today: Two Important Tools

KLEE [1]

I Open source symbolic executor

I Runs on top of LLVM

I Has found lots of problems in open-source software

SAGE [3]

I Microsoft internal tool

I Symbolic execution to find bugs in file parsers - E.g., JPEG, DOCX,PPT, etc

I Cluster of n machines continually running SAGE

Page 20: Lecture 9. Symbolic Execution - Iowa State Universityweb.cs.iastate.edu/~weile/cs641/9.SymbolicExecution.pdf · Klee: Unassisted and automatic generation of high-coverage tests for

Other Symbolic Executors

I Cloud9 parallel symbolic execution, also supports threads

I Pex symbolic execution for .NET

I jCUTE symbolic execution for Java

I Java PathFinder a model checker that also supports symbolicexecution

I SymDroid - symbolic execution on Dalvik Bytecode

I Kleenet - testing interaction protocols for sensor network

Page 21: Lecture 9. Symbolic Execution - Iowa State Universityweb.cs.iastate.edu/~weile/cs641/9.SymbolicExecution.pdf · Klee: Unassisted and automatic generation of high-coverage tests for

Internal of Symbolic Executors: KLEE

Page 22: Lecture 9. Symbolic Execution - Iowa State Universityweb.cs.iastate.edu/~weile/cs641/9.SymbolicExecution.pdf · Klee: Unassisted and automatic generation of high-coverage tests for

Three Challenges

I Path explosion

I Modeling program statements and environment

I Constraint solving

Page 23: Lecture 9. Symbolic Execution - Iowa State Universityweb.cs.iastate.edu/~weile/cs641/9.SymbolicExecution.pdf · Klee: Unassisted and automatic generation of high-coverage tests for

Path Explosion

Page 24: Lecture 9. Symbolic Execution - Iowa State Universityweb.cs.iastate.edu/~weile/cs641/9.SymbolicExecution.pdf · Klee: Unassisted and automatic generation of high-coverage tests for

Search Strategies: Naive Approach

DFS (depth first search), BFS (breadth first search)

The two approaches purely are based on the structure of the code

I You cannot enumerate all the paths

I DFS: search can stuck at somewhere in a loop

I BFS: very slow to determine properties for a path if there are manybranches

Page 25: Lecture 9. Symbolic Execution - Iowa State Universityweb.cs.iastate.edu/~weile/cs641/9.SymbolicExecution.pdf · Klee: Unassisted and automatic generation of high-coverage tests for

Search Strategies: Naive Approach

DFS (depth first search), BFS (breadth first search)

The two approaches purely are based on the structure of the code

I You cannot enumerate all the paths

I DFS: search can stuck at somewhere in a loop

I BFS: very slow to determine properties for a path if there are manybranches

Page 26: Lecture 9. Symbolic Execution - Iowa State Universityweb.cs.iastate.edu/~weile/cs641/9.SymbolicExecution.pdf · Klee: Unassisted and automatic generation of high-coverage tests for

Search Strategies: Naive Approach

DFS (depth first search), BFS (breadth first search)

The two approaches purely are based on the structure of the code

I You cannot enumerate all the paths

I DFS: search can stuck at somewhere in a loop

I BFS: very slow to determine properties for a path if there are manybranches

Page 27: Lecture 9. Symbolic Execution - Iowa State Universityweb.cs.iastate.edu/~weile/cs641/9.SymbolicExecution.pdf · Klee: Unassisted and automatic generation of high-coverage tests for

Search Strategies: Naive Approach

DFS (depth first search), BFS (breadth first search)

The two approaches purely are based on the structure of the code

I You cannot enumerate all the paths

I DFS: search can stuck at somewhere in a loop

I BFS: very slow to determine properties for a path if there are manybranches

Page 28: Lecture 9. Symbolic Execution - Iowa State Universityweb.cs.iastate.edu/~weile/cs641/9.SymbolicExecution.pdf · Klee: Unassisted and automatic generation of high-coverage tests for

Search Strategies: Random Search

How to perform a random search?

I Idea 1: pick next path to explore uniformly at random

I Idea 2: randomly restart search if haven’t hit anything interesting ina while

I Idea 3: when have equal priority paths to explore, choose next oneat random

I ...

Drawback: reproducibility, probably good to use psuedo-randomnessbased on seed, and then record which seed is picked

Page 29: Lecture 9. Symbolic Execution - Iowa State Universityweb.cs.iastate.edu/~weile/cs641/9.SymbolicExecution.pdf · Klee: Unassisted and automatic generation of high-coverage tests for

Search Strategies: Coverage Guided Search

Goal: Try to visit statements we haven’t seen before

Approach:

I Select paths likely to hit the new statements

I Favor paths on recently covering new statements

I Score of statement = # times its been seen and how often; Picknext statement to explore that has lowest score

Pros and cons:

I Good: Errors are often in hard-to-reach parts of the program, thisstrategy tries to reach everywhere.

I Bad: Maybe never be able to get to a statement

Page 30: Lecture 9. Symbolic Execution - Iowa State Universityweb.cs.iastate.edu/~weile/cs641/9.SymbolicExecution.pdf · Klee: Unassisted and automatic generation of high-coverage tests for

Search Strategies: Generational Search

I Hybrid of BFS and coverage-guided search

I Generation 0: pick one path at random, run to completion

I Generation 1: take paths from gen 0, negate one branch conditionon a path to yield a new path prefix, find a solution for that pathprefix, and then take the resulting path

I ...

I Generation n: similar, but branching off gen n-1 (also uses acoverage heuristic to pick priority)

Page 31: Lecture 9. Symbolic Execution - Iowa State Universityweb.cs.iastate.edu/~weile/cs641/9.SymbolicExecution.pdf · Klee: Unassisted and automatic generation of high-coverage tests for

Search Strategies: Generational Search [4, 5]

See example of DART

Page 32: Lecture 9. Symbolic Execution - Iowa State Universityweb.cs.iastate.edu/~weile/cs641/9.SymbolicExecution.pdf · Klee: Unassisted and automatic generation of high-coverage tests for

Search Strategies: Combined Search

I Run multiple searches at the same time and alternate between them

I Depends on conditions needed to exhibit bug; so will be as good asbest solution, with a constant factor for wasting time with otheralgorithms

I Could potentially use different algorithms to reach different parts ofthe program

Page 33: Lecture 9. Symbolic Execution - Iowa State Universityweb.cs.iastate.edu/~weile/cs641/9.SymbolicExecution.pdf · Klee: Unassisted and automatic generation of high-coverage tests for

Complex Code and Environment Dependencies

I System calls: open(file)

I Library calls: sin(x), glibc

I Pointers and heap: linklist, tree

I Loops and recursive calls: how many times it should iterate andunfold?

I ...

Page 34: Lecture 9. Symbolic Execution - Iowa State Universityweb.cs.iastate.edu/~weile/cs641/9.SymbolicExecution.pdf · Klee: Unassisted and automatic generation of high-coverage tests for

Solutions

I Build simple versions of library calls

I Summarize the loops

I Simulate system calls

I ...

Page 35: Lecture 9. Symbolic Execution - Iowa State Universityweb.cs.iastate.edu/~weile/cs641/9.SymbolicExecution.pdf · Klee: Unassisted and automatic generation of high-coverage tests for

An Example

Program was initiated with a symbolic file system with up to N files.Open all N files + one open() failure.

Page 36: Lecture 9. Symbolic Execution - Iowa State Universityweb.cs.iastate.edu/~weile/cs641/9.SymbolicExecution.pdf · Klee: Unassisted and automatic generation of high-coverage tests for

Solutions: Concretization [4, 5]

I Concolic (concrete/symbolic) testing: run on concrete randominputs. In parallel, execute symbolically and solve constraints.Generate inputs to other paths than the concrete one along the way.

I Replace symbolic variables with concrete values that satisfy the pathcondition

I So, could actually do system calls

I And can handle cases when conditions too complex for SMT solver

Page 37: Lecture 9. Symbolic Execution - Iowa State Universityweb.cs.iastate.edu/~weile/cs641/9.SymbolicExecution.pdf · Klee: Unassisted and automatic generation of high-coverage tests for

Solutions: Concretization [4, 5]

See example of DART

Page 38: Lecture 9. Symbolic Execution - Iowa State Universityweb.cs.iastate.edu/~weile/cs641/9.SymbolicExecution.pdf · Klee: Unassisted and automatic generation of high-coverage tests for

Constraint Solving - SAT

SAT: find an assignment to a set of Boolean variables that makes theBoolean formula true

Complexity: NP-Complete

Page 39: Lecture 9. Symbolic Execution - Iowa State Universityweb.cs.iastate.edu/~weile/cs641/9.SymbolicExecution.pdf · Klee: Unassisted and automatic generation of high-coverage tests for

Constraint Solving - SMT [2]

SMT (Satisfiability Modulo Theories) = SAT++

I An SMT formula is a Boolean combination of formulas overfirst-order theories

I Example of SMT theories include bit-vectors, arrays, integer and realarithmetic, strings, ...

I The satisfiability problem for these theories is typically hard ingeneral (NP-complete, PSPACE-complete, ...)

I Program semantics are easily expressed over these theories

I Many software engineering problems can be easily reduced to theSAT problem over first-order theories

Page 40: Lecture 9. Symbolic Execution - Iowa State Universityweb.cs.iastate.edu/~weile/cs641/9.SymbolicExecution.pdf · Klee: Unassisted and automatic generation of high-coverage tests for

Constraint Solving - SMT

The State of the Art: Handle linear integer constraints

Challenges:

I Constraints that contain non-linear operands, e.g., sin(), cos()

I Float-point constraints: no theory support yet, convert to bit-vectorcomputation

I String constraints: a = b.replace(’x’, ’y’)

I Quantifies: ∃, ∀I Disjunction

Page 41: Lecture 9. Symbolic Execution - Iowa State Universityweb.cs.iastate.edu/~weile/cs641/9.SymbolicExecution.pdf · Klee: Unassisted and automatic generation of high-coverage tests for

Tool Design KLEE - Path Explosion

I Random, coverage-optimize search

I Compute state weight using:I Minimum distance to an uncovered instructionI Call stack of the stateI Whether the state recently covered new code

I Timeout: one hour per utility when experimenting with coreutils

Page 42: Lecture 9. Symbolic Execution - Iowa State Universityweb.cs.iastate.edu/~weile/cs641/9.SymbolicExecution.pdf · Klee: Unassisted and automatic generation of high-coverage tests for

Tool Design KLEE - Tracking Symbolic States

Trees of symbolic expressions:

I Instruction pointer

I Path condition

I Registers, heap and stack objects

I Expressions are of C language: arithmetic, shift, dereference,assignment

I Checks inserted at dangerous operations: division, dereferencing

Modeling environment:

I 2500 lines of modeling code to customize system calls (e.g. open,read, write, stat, lseek, ftruncate, ioctl)

I How to generate tests after using symbolic env: supply andescription of symbolic env for each test path; a special drivercreates real OS objects from the description

Page 43: Lecture 9. Symbolic Execution - Iowa State Universityweb.cs.iastate.edu/~weile/cs641/9.SymbolicExecution.pdf · Klee: Unassisted and automatic generation of high-coverage tests for

Tool Design KLEE - Constraint Solving

I STP: a decision procedure for Bit-Vectors and Arrays

I Decision procedures are programs which determine the satisfiabilityof logical formulas that can express constraints relevant to softwareand hardware

I STP uses new efficient SAT solvers

I Treat everything as bit vectors: arithmetic, bitwise operations,relational operations.

Page 44: Lecture 9. Symbolic Execution - Iowa State Universityweb.cs.iastate.edu/~weile/cs641/9.SymbolicExecution.pdf · Klee: Unassisted and automatic generation of high-coverage tests for

Tool Usage KLEE

I Using LLVM to compile to bytecode

I Run KLEE with bytecode

Page 45: Lecture 9. Symbolic Execution - Iowa State Universityweb.cs.iastate.edu/~weile/cs641/9.SymbolicExecution.pdf · Klee: Unassisted and automatic generation of high-coverage tests for

Coverage Results: KLEE

Page 46: Lecture 9. Symbolic Execution - Iowa State Universityweb.cs.iastate.edu/~weile/cs641/9.SymbolicExecution.pdf · Klee: Unassisted and automatic generation of high-coverage tests for

Bug Detection Results: KLEE

Mismatch of CoreUtils and BusyBox

Page 47: Lecture 9. Symbolic Execution - Iowa State Universityweb.cs.iastate.edu/~weile/cs641/9.SymbolicExecution.pdf · Klee: Unassisted and automatic generation of high-coverage tests for

Discussions

I Symbolic environment interaction - how reliable can the customizedmodeling really be? think about concurrent programs, inter-processprograms.

I What is more commonly needed - functional testing orsecurity/completeness/crash testing?

Page 48: Lecture 9. Symbolic Execution - Iowa State Universityweb.cs.iastate.edu/~weile/cs641/9.SymbolicExecution.pdf · Klee: Unassisted and automatic generation of high-coverage tests for

Cristian Cadar, Daniel Dunbar, and Dawson Engler.Klee: Unassisted and automatic generation of high-coverage tests forcomplex systems programs.In Proceedings of the 8th USENIX Conference on Operating SystemsDesign and Implementation, OSDI’08, pages 209–224, Berkeley, CA,USA, 2008. USENIX Association.

Leonardo De Moura and Nikolaj Bjørner.Satisfiability modulo theories: Introduction and applications.Commun. ACM, 54(9):69–77, September 2011.

Patrice Godefroid, Adam Kiezun, and Michael Y. Levin.Grammar-based whitebox fuzzing.In Proceedings of the 2008 ACM SIGPLAN Conference onProgramming Language Design and Implementation, PLDI ’08,pages 206–215, New York, NY, USA, 2008. ACM.

Patrice Godefroid, Nils Klarlund, and Koushik Sen.Dart: Directed automated random testing.In Proceedings of the 2005 ACM SIGPLAN Conference onProgramming Language Design and Implementation, PLDI ’05,pages 213–223, New York, NY, USA, 2005. ACM.

Koushik Sen, Darko Marinov, and Gul Agha.

Page 49: Lecture 9. Symbolic Execution - Iowa State Universityweb.cs.iastate.edu/~weile/cs641/9.SymbolicExecution.pdf · Klee: Unassisted and automatic generation of high-coverage tests for

Cute: A concolic unit testing engine for c.In Proceedings of the 10th European Software EngineeringConference Held Jointly with 13th ACM SIGSOFT InternationalSymposium on Foundations of Software Engineering, ESEC/FSE-13,pages 263–272, New York, NY, USA, 2005. ACM.