“Tracking Pointers with Path and Context Sensitivity for Bug Detection in C Programs”
CMSC 838Z – Spring 2004
V. Benjamin Livshits and Monica S. Lam
presented by Mujtaba Ali
(Based on a presentation given at ACM SIGSOFT FSE-11, September 2003)
2
Bugs are Bad
• Costs lots of money to fix after deployment
• Nastiest bugs == security violations
– Hard to discover effects of malicious attacks
– Legal/liability issues
3
CVE Classification
[Chart: vulnerability classes]
Buffer overflow
Malformed input
Shell metacharacters
Format strings
Bad permissions
Symlink following
Privilege handling
Cross-site scripting
Cryptographic error
Directory traversal
[Callout: 62% – Would like to address these]
• Study of security reports in 2002
– source: SecurityFocus.com
4
Motivation
• Goal: Report hard-to-find security violations
– Errors spanning many functions and files
– Reduce false positives
• Applications:
– Buffer overflows
– Format string vulnerabilities
5
Examples
• Two security vulnerabilities:
– Buffer overrun in gzip, a compression utility
– Format string violation in muh, a network game
• Cause: Unsafe use of user-supplied data
– gzip copies data to a statically-sized buffer
  • may result in an overrun
– muh uses data as the format argument in a call to vsnprintf
  • user can maliciously embed %n into the format string
6
Buffer Overrun in gzip
gzip.c:233
    0233 char ifname[MAX_PATH_LEN]; /* input file name */
gzip.c:593
    0592 while (optind < argc) {
    0593     treat_file(argv[optind++]);
    0594 }
gzip.c:716
    0704 local void treat_file(char *iname) {
         ...
    0716     if (get_istat(iname, &istat) != OK) return;
gzip.c:1009
    0997 local int get_istat(char *iname, struct stat *sbuf) {
         ...
    1009     strcpy(ifname, iname);
Need a model of strcpy
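The pattern flagged above can be reproduced in a few lines. The following is a minimal sketch, not gzip's code: `NAME_MAX_LEN`, `name_buf`, `copy_name_unsafe`, and `copy_name_checked` are hypothetical names introduced for illustration. It contrasts the unchecked copy with one possible length-checked variant.

```c
#include <assert.h>
#include <string.h>

#define NAME_MAX_LEN 16              /* hypothetical stand-in for MAX_PATH_LEN */

static char name_buf[NAME_MAX_LEN];  /* statically sized, like ifname */

/* Mirrors the flagged pattern: the length of `src` is never checked,
 * so an input of NAME_MAX_LEN or more characters overruns name_buf. */
void copy_name_unsafe(const char *src) {
    strcpy(name_buf, src);
}

/* One possible fix: reject inputs that do not fit, NUL included.
 * Returns 0 on success, -1 if the input is too long. */
int copy_name_checked(const char *src) {
    assert(src != NULL);
    if (strlen(src) >= sizeof name_buf) {
        return -1;
    }
    strcpy(name_buf, src);           /* safe: length verified above */
    return 0;
}
```

A static tool that tracks data from argv[] to the strcpy call would flag `copy_name_unsafe`; whether it can also prove `copy_name_checked` safe depends on how it models the length test.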
7
Format String Violation in muh
muh.c:839
    0838 s = ( char * )malloc( 1024 );
    0839 while( fgets( s, 1023, messagelog ) ) {
    0841     irc_notice(&c_client, status.nickname, s);
    0842 }
    0843 FREESTRING( s );
irc.c:263
    257 void irc_notice(con_type *con, char nick[], char *format, ... ){
    259     va_list va;
    260     char buffer[ BUFFERSIZE ];
    261
    262     va_start( va, format );
    263     vsnprintf( buffer, BUFFERSIZE - 10, format, va );
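A small sketch of why the call above is dangerous and how such code is commonly repaired. This is not muh's code: `LOG_BUF_SIZE`, `notice`, and `notice_user_text` are hypothetical names. The variadic function interprets its format argument, so the wrapper passes user data only as an argument to a constant "%s" format.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define LOG_BUF_SIZE 64   /* hypothetical, plays the role of BUFFERSIZE */

/* Variadic formatter in the style of irc_notice: `format` is interpreted
 * by vsnprintf, so it must never come from untrusted input. */
int notice(char *out, size_t out_size, const char *format, ...) {
    va_list va;
    int n;
    assert(out != NULL && format != NULL);
    va_start(va, format);
    n = vsnprintf(out, out_size, format, va);
    va_end(va);
    return n;
}

/* Safer wrapper: user data is an argument to a constant "%s" format,
 * so any '%' directives it contains are copied literally. */
int notice_user_text(char *out, size_t out_size, const char *user_text) {
    return notice(out, out_size, "%s", user_text);
}
```

With the wrapper, a line such as "100%n done %s" read from a log file ends up in the output buffer unchanged instead of being treated as a format.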
8
Easy Bugs are Boring
• Programs have security violations despite code reviews and years of use
• Common observation about hard errors:
– Errors on interface boundaries: need to follow data flow between procedures
– Errors occur along complicated control-flow paths: need to follow long definition-use chains
9
Need for Alias Analysis
• Both examples involved complex flow of data
• Tracking data flow in modern languages requires alias analysis
• Steensgaard’s or Andersen’s analysis?
– Flow- and context-insensitive
– Fast, but imprecise: too many false positives
– But flow- and context-sensitive analyses do not scale
10
Tradeoff: Scalability vs Precision
[Figure: precision (low to high) plotted against speed / scalability (slow and expensive to fast). Analyses shown: Steensgaard, Andersen, Wilson & Lam, 3-value logic, This analysis]
11
A Hybrid Analysis
• Analyze precisely:
– Local variables
– Function parameters
– Global variables
– …their dereferences and fields
– These are essentially access paths, e.g. p.next.data
• The rest: Break into equivalence classes
• Represent by abstract locations:
– Recursive data structures
– Arrays
– Locations accessed through pointer arithmetic
• Maintain precision selectively
12
Linearity
• Regular assignments result in strong updates
    x = 1;       x0 = 1
    x = 2;       x1 = 2
    y = x;       y0 = x1
    x is 2
• Assignments to abstract memory locations result in weak updates
    A[i] = 1;    m0 = 1
    A[j] = 2;    m1 = φ(m0, 2)
    b = A[k];    b0 = m1
    Either 1 or 2
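The difference between the two kinds of update can be sketched over a toy abstract domain. The bitmask representation and the names `ValueSet`, `strong_update`, `weak_update`, and `may_be` are assumptions made for this illustration; they are not taken from the presented analysis.

```c
#include <assert.h>
#include <stdint.h>

/* A set of possible small integer values (0..31), one bit per value.
 * This bitmask domain is an assumption made to keep the sketch short. */
typedef uint32_t ValueSet;

static ValueSet singleton(unsigned v) {
    assert(v < 32);
    return (ValueSet)1u << v;
}

/* Strong update: the location is a single concrete variable, so the
 * new value replaces whatever was known before. */
ValueSet strong_update(ValueSet old, unsigned v) {
    (void)old;                /* old facts are killed */
    return singleton(v);
}

/* Weak update: the location summarizes many cells (e.g. a whole array),
 * so the new value is merged with the old ones. */
ValueSet weak_update(ValueSet old, unsigned v) {
    return old | singleton(v);
}

/* Nonzero if value v is among the possibilities recorded in s. */
int may_be(ValueSet s, unsigned v) {
    return (s & singleton(v)) != 0;
}
```

Applying two strong updates with 1 and then 2 leaves only 2, matching "x is 2"; applying two weak updates leaves both values, matching "Either 1 or 2".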
13
Static Single Assignment
• Sparse representation of program
• Propagate facts about definition where needed
– Definition-use relationships
• Each variable is defined only once
– Give new names (subscripts) to definitions
– Solves flow-sensitivity problem
• But a definition may be used many times
14
SSA: Join points
• In standard SSA, use φ function at joins
– d3 = φ(d1, d2)
• In Gated SSA, use γ function at joins
– d3 = γ(<P, d1>, <¬P, d2>)
– Solves path-sensitivity problem
15
IPSSA – Intraprocedurally
• Extension of Gated SSA
• Provides pointer resolution
– Replace indirect pointer dereferences with direct accesses of potentially new temporary locations
16
Example of Pointer Resolution
    int a=0,b=1;         a0 = 0, b0 = 1
    int c=2,d=3;         c0 = 2, d0 = 3
    if(Q){ p = &a; }     p1 = &a
    else { p = &b; }     p2 = &b
                         p3 = γ(<Q, p1>, <¬Q, p2>)
    c = *p;              c1 = γ(<Q, a0>, <¬Q, b0>)     Load resolution
    *p = d;              a1 = γ(<Q, d0>, <¬Q, a0>)     Store resolution
                         b1 = γ(<Q, b0>, <¬Q, d0>)
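One way to convince oneself that the resolved definitions c1, a1, and b1 describe the same program is to execute both forms. In the sketch below, `State`, `run_with_pointer`, `run_resolved`, and `same_state` are hypothetical names; each gated definition is written as a C conditional expression on Q.

```c
#include <assert.h>

typedef struct { int a, b, c, d; } State;

/* Original program: load and store through p. */
State run_with_pointer(int Q) {
    State s = {0, 1, 2, 3};
    int *p = Q ? &s.a : &s.b;
    s.c = *p;
    *p = s.d;
    return s;
}

/* Resolved form: each gated definition becomes a conditional
 * expression on Q, and no pointer dereference remains. */
State run_resolved(int Q) {
    const int a0 = 0, b0 = 1, d0 = 3;
    State s;
    s.c = Q ? a0 : b0;   /* c1: load resolution  */
    s.a = Q ? d0 : a0;   /* a1: store resolution */
    s.b = Q ? b0 : d0;   /* b1: store resolution */
    s.d = d0;
    return s;
}

/* Field-by-field comparison of two final states. */
int same_state(State x, State y) {
    return x.a == y.a && x.b == y.b && x.c == y.c && x.d == y.d;
}
```

Both functions agree for Q true and Q false, which is the property pointer resolution is meant to preserve.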
17
Pointer Resolution Rules
• When resolving definition d, next step depends on RHS of d
• Expressed as conditional rewrite rules
• A few sample rules:
– d = &x, result is x
– d = ι(…), result is d^
– d = γ(<P1, d1>,…,<Pn, dn>), follow d1…dn
• Refer to the paper for details
18
Interprocedural Example
• Data flow in and out of functions:
– Create links between formal and actual parameters
– Reflect stores and assignments to globals at the callee
    int f(int* p){       p0 = ι(<c,q0>)       Formal-actual connection for call site c
      *p = 100;          p^1 = 100
    }
    int main(){
      int x = 0;         x0 = 0
      int *q = &x;       q0 = &x
      c: f(q);           x1 = ρ(<f,100>)      Reflect store inside of f within main
    }
19
Unsound Unaliasing Assumption
A1: No aliased parameters
– Assumption: Locations accessible through different parameters are distinct
– Justification: Matches how good interfaces are written
– Consequence: Context-independent procedure summaries
A2: No aliased abstract locations
– Assumption: Things pulled out of an abstract location are not aliased
– Justification: Holds in most usage cases
– Consequence: Give unique names when we get data from abstract location
20
Interprocedural Algorithm
• Process one SCC of the call graph at a time
– Bottom-up
– SCC == strongly-connected component
• For each SCC, within each procedure:
– Resolve all pointer operations (loads and stores)
– Create links between formal and actual parameters
– Reflect stores and assignments to globals at call sites
• Iterate within SCC until the representation stabilizes
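The bottom-up order over SCCs can be obtained with Tarjan's algorithm, which emits each component only after every component reachable from it. The sketch below is an illustration under that assumption, not the tool's implementation; `CallGraph`, `visit`, and `number_sccs_bottom_up` are hypothetical names, and the adjacency matrix is chosen only for brevity.

```c
#include <assert.h>

#define MAX_PROCS 16

/* Hypothetical call graph: calls[u][v] != 0 means procedure u calls v. */
typedef struct {
    int n;
    int calls[MAX_PROCS][MAX_PROCS];
    int index[MAX_PROCS], low[MAX_PROCS], on_stack[MAX_PROCS];
    int stack[MAX_PROCS], sp, next_index;
    int scc_of[MAX_PROCS];   /* output: SCC id of each procedure */
    int scc_count;           /* output: number of SCCs found */
} CallGraph;

static void visit(CallGraph *g, int u) {
    g->index[u] = g->low[u] = g->next_index++;
    g->stack[g->sp++] = u;
    g->on_stack[u] = 1;
    for (int v = 0; v < g->n; v++) {
        if (!g->calls[u][v]) continue;
        if (g->index[v] < 0) {                 /* tree edge */
            visit(g, v);
            if (g->low[v] < g->low[u]) g->low[u] = g->low[v];
        } else if (g->on_stack[v] && g->index[v] < g->low[u]) {
            g->low[u] = g->index[v];           /* edge back into the stack */
        }
    }
    if (g->low[u] == g->index[u]) {            /* u is the root of an SCC */
        int v;
        do {
            v = g->stack[--g->sp];
            g->on_stack[v] = 0;
            g->scc_of[v] = g->scc_count;
        } while (v != u);
        g->scc_count++;
    }
}

/* Numbers SCCs so that every callee's SCC id is <= its caller's id.
 * Processing SCC 0, 1, 2, ... therefore visits callees first. */
void number_sccs_bottom_up(CallGraph *g) {
    assert(g->n >= 0 && g->n <= MAX_PROCS);
    g->sp = g->next_index = g->scc_count = 0;
    for (int u = 0; u < g->n; u++) { g->index[u] = -1; g->on_stack[u] = 0; }
    for (int u = 0; u < g->n; u++)
        if (g->index[u] < 0) visit(g, u);
}
```

Mutually recursive procedures land in the same SCC, which is where the "iterate until the representation stabilizes" step would apply.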
21
Framework
[Diagram boxes: Program sources; IPSSA construction; IP data flow info; Buffer overruns; Format violations; …others…; Error traces]
• Abstracts away many details. Makes it easy to write tools
• Framework makes it easy to add new analyses
22
Application
1. Start at roots – sources of user input such as
– argv[] elements
– Input functions: fgets, gets, recv, getenv, etc.
2. Follow data flow chains provided by IPSSA: for every definition, IPSSA provides a list of its uses
3. A sink is a potentially dangerous usage such as
– A buffer of a statically defined length
– A format argument of vulnerable functions: printf, fprintf, snprintf, vsnprintf
4. Report bug, record full path
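The four steps above amount to a reachability search over definition-use edges. The following sketch assumes a much simpler graph than IPSSA provides (no path conditions, no call contexts); `DefUse` and `find_tainted_sink` are hypothetical names, and the adjacency matrix is only for brevity.

```c
#include <assert.h>

#define MAX_DEFS 32

/* Hypothetical def-use graph: uses[d][e] != 0 means definition e
 * uses the value of definition d. */
typedef struct {
    int n;
    int uses[MAX_DEFS][MAX_DEFS];
    int is_source[MAX_DEFS];   /* e.g. argv[] element, fgets result */
    int is_sink[MAX_DEFS];     /* e.g. strcpy target, format argument */
    int pred[MAX_DEFS];        /* output: previous definition on the path */
} DefUse;

/* Breadth-first walk from every source along def-use edges.
 * Returns the first tainted sink reached, or -1 if none is reachable.
 * Following pred[] back from the result recovers the reported path. */
int find_tainted_sink(DefUse *g) {
    int queue[MAX_DEFS], head = 0, tail = 0;
    int seen[MAX_DEFS] = {0};
    assert(g->n >= 0 && g->n <= MAX_DEFS);
    for (int d = 0; d < g->n; d++) {
        g->pred[d] = -1;
        if (g->is_source[d]) { seen[d] = 1; queue[tail++] = d; }
    }
    while (head < tail) {
        int d = queue[head++];
        if (g->is_sink[d]) return d;
        for (int e = 0; e < g->n; e++) {
            if (g->uses[d][e] && !seen[e]) {
                seen[e] = 1;       /* each definition is queued once */
                g->pred[e] = d;
                queue[tail++] = e;
            }
        }
    }
    return -1;
}
```

For the gzip example, the source would be the argv[] element, the intermediate definitions the iname parameters, and the sink the strcpy into ifname.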
23
Experimental Setup
• Implementation w/ SUIF2 compiler suite
– Pentium IV 2GHz, 2GB of RAM running Linux
Program     Version   LOC      Procedures   IPSSA constr. time (s)
lhttpd      0.1       888      21           5.2
polymorph   0.4.0     1,015    19           1.0
bftpd       1.0.11    2,946    47           3.2
trollftpd   1.26      3,584    48           11.3
man         1.5h1     4,139    83           29.3
pgp4pine    1.76      4,804    69           17.5
cfingerd    1.4.3     5,094    66           15.5
muh         2.05d     5,695    95           20.4
gzip        1.2.4     8,162    93           17.0
pcre        3.9       13,037   47           22.4
24
Summary of Results
Program     Total #       Buffer     Format string   False       Defs         Procs        Tool's
name        of warnings   overruns   vulner.         positives   spanned      spanned      runtime (s)
lhttpd      1             1          0               0           24           14           99
polymorph   2             2          0               0           7, 8         3, 3         2.4
bftpd       2             1          1               0           5, 7         1, 3         2.3
trollftpd   1             1          0               0           23           5            8.5
man         1             1          0               0           6            4            9.6
pgp4pine    4             4          0               0           5, 5, 5, 5   3, 3, 3, 3   27.1
cfingerd    1             0          1               0           10           4            7.4
muh         1             0          1               0           7            3            7.5
gzip        1             1          0               0           7            5            2.0
pcre        1             0          0               1           6            4            9.2
Total       15            11         3               1
Previously unknown: 6
[Callouts: Many definitions; Many procedures]
25
False Positive in pcre
• Copying “tainted” user data to a statically-sized buffer may be unsafe
– Turns out to be safe in this case
    sprintf(buffer, "%.512s", filename)
– filename: Tainted data
– "%.512s": Limits the length of copied data.
– buffer: Buffer is big enough!
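The precision in a "%.Ns" conversion caps the number of characters copied, which is why this warning is a false positive. A minimal sketch with a smaller limit follows; `COPY_LIMIT` and `copy_bounded` are hypothetical names, not pcre code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define COPY_LIMIT 8   /* plays the role of 512 in "%.512s" */

/* Copies at most COPY_LIMIT characters of `src` into `dst`.
 * `dst` must hold at least COPY_LIMIT + 1 bytes. */
void copy_bounded(char *dst, const char *src) {
    assert(dst != NULL && src != NULL);
    sprintf(dst, "%.*s", COPY_LIMIT, src);   /* precision bounds the copy */
}
```

A tool that models only "tainted data reaches sprintf" reports this call; proving it safe requires relating the precision in the format string to the size of the destination buffer.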
26
Related Work
• xgcc
– Also unsound
– Many more false positives
– Even “regular” developers can specify new analyses
• Cqual
– Interprocedural, sound analysis
– Requires annotations
– Flow-, context-, and path-insensitive
  • More false positives
27
The Good
• Unsoundness gives very few false positives
• Unsoundness allows efficient path- and context-sensitive analysis
• Tested on real code
• Good presentations to steal slides from
28
The Bad
• Cqual is much improved now
– Very few false positives on these types of security vulnerabilities
– Let’s see some more interesting vulnerabilities
• Does not detect buffer overflows due to array bounds violations
• Not tested with large programs
29
Singular Key Idea
• IPSSA coupled with a (slightly) unsound alias analysis facilitates efficient detection of hard-to-find security violations with very few false positives