Client-Driven Pointer Analysis
description
Transcript of Client-Driven Pointer Analysis
1
Client-Driven Pointer Analysis
Samuel Z. Guyer
Calvin Lin
June 2003
T H E U N I V E R S I T Y O F
T E X A SA T A U S T I N
2
Using Pointer Analysis
Pointer analysis: not a stand-alone analysis Supports other client analyses
Current practice: Client analysis – we’ll focus on error detection Pointer analysis algorithm – choose precision
Pointer Analyzer
Client Analysis
Memory Model
OutputErrorsError
Detector
CIFI Context & Flow Insensitive
CIFS Flow Sensitive
CSFS Context & Flow Sensitive
3
Motivation
Real-life scenario:Check for security vulnerabilities in BlackHole mail filter
Manually inspect reported errors One thing in common: a string processing routine Clone procedure = ad hoc context sensitivity Using CIFI, all 85 false positives go away
Can we automate this process?
Pointer Analyzer
Memory Model
Fast analysis;85 possible
errors
Error Detector
CIFICIFSCSFS
25X slower;85 possible
errors
Out of memory;No results
4
Our solution
Problems Cost-benefit tradeoff – severe for pointer analysis Precision choices are too coarse Choice is made a priori by the compiler writer
Solution: Mixed precision analysis Apply higher precision where it’s needed Use cheap analysis elsewhere
Key: Let the needs of client drive precision Customized precision policy created during analysis
5
Client-Driven Pointer Analysis
Algorithm: Start with fast cheap analysis: FI and CI Monitor: how imprecision causes information loss Adapt: Reanalyze with a customized precision policy
Dependence Graph
MonitorInformation Loss
Pointer Analyzer
Client Analysis
Memory Model
Error Reports
CIFI
Adaptor
Custom Policy
6
Overview
Motivation
Our algorithmAutomatically discover what the client needs
ExperimentsReal programs and challenging error detection problems
Related work and conclusions
7
False Positives
Example: sock = socket(AF_INET, SOCK_STREAM, 0);read(sock, buffer, 100);execl(buffer);!
Remote access vulnerability
!
stdin
??
main
socketexeclexecl
read
no errorerror
maybe
Lattice
8
Client-Driven Pointer Analysis
Dependence Graph
MonitorInformation Loss
Pointer Analyzer
Client Analysis
Memory Model
Error Reports
CIFI
Adaptor
Custom Policy
9
Analysis framework
Iterative dataflow analysis Pointer analysis: flow values are points-to sets Client analysis: flow values form typestate lattice
Fine-grained precision policies Context sensitivity: per procedure
CS: Clone or inline procedure invocation CI: Merge values from all call sites
Flow sensitivity: per memory location FS: Build factored use-def chains FI: Merge all assignments into a single flow value
10
Client-Driven Pointer Analysis
Dependence Graph
MonitorInformation Loss
Pointer Analyzer
Client Analysis
Memory Model
Error Reports
CIFI
Adaptor
Custom Policy
11
Monitor Runs alongside main analysis Monitors information loss
Detects polluting assignmentsMerge two accurate flow values ambiguous value
Tracks complicit assignmentsPassing an ambiguous value from one variable to another
Records in a dependence graphFor both pointer and client analyses
12
Dependence graph (I) Polluting assignment
Add a node for the variable – annotate with a diagnosis Complicit assignment
Add an edge from left side back to right side
paramCS: foo
xFS: x x
y
foo( )foo( )
foo(param )
Context insensitive
x =
x =
x
Flow insensitive
x = y;
x
Complicit
13
Dependence graph (II) Indirect assignments
x =(*ptr)
Pointer dereference
x
orptr
Polluted target
y
ptr
Polluted pointer
x
y
x
ptr
14
Adaptor
After analysis... Start at the “maybe error” variables Find all reachable nodes – collect the diagnoses
Often a small subset of all imprecision
?
DependenceGraph
Precision policy
CS: fooCS: barFS: xFS: ptrCS:bar
CS:foo
FS:x
FS:ptr
15
In action...
Monitor analysis
Polluting assignments
Diagnose and apply “fix” In this case: one procedure context-sensitive
Reanalyze
main
socketexeclexecl
read
stdin
??
readread
!
16
Programs 18 real C programs
Unmodified source – all the issues of production code Many are system tools – run in privileged mode
Representative examples:
Name Description Priv Lines of code Procedures CFG nodes
muh IRC proxy 5K (25K) 84 5,191
blackhole E-mail filter 12K (244K) 71 21,370
wu-ftpd FTP daemon 22K (66K) 205 23,107
named DNS server 26K (84K) 210 25,452
nn News reader 36K (116K) 494 46,336
17
Methodology
5 typestate error checkers: Represent non-trivial program properties Stress the pointer analyzer
Compare client-driven with fixed-precision
Goals: First, reduce number of errors reported
Conservative analysis – fewer is better Second, reduce analysis time
18
Increasing number of CFG nodes
Results
10X
0 0 0 0 0 0 07 29 6 85 28 2 31 4 5 93 41
00 0
0
0
00
7 186
85
15
1
26 4
5 8941
07 18
615
1
26 4
5
8841
CS-FI
CI-FS
CI-FI
CS-FS
Client-DrivenRemote access vulnerability
1000X
1
100X
stunnelpfingerm
uh (I)m
uh (II)pureftpfcronapachem
akeblackholew
u-ftp (I)ssh clientprivoxyw
u-ftp (II)nam
edssh servercfengineS
QLite
nn
No
rmal
ized
an
alys
is t
ime
0
0 0
0
7
29
28310
0 0
7 1526
? ? ? ? ? ? ? ? ? ? ??
19
Why it works
Notice: Different clients have different precision requirements Amount of extra precision is small
Name
Total procs
# procedures context-sensitive
Remote
Access
File
Access
FSV RFSV FTP
muh 84 6
apache 313 8 2 2 10
blackhole 71 2 5
wu-ftpd 205 4 4 17
named 210 1 2 1 4
cfengine 421 4 1 3 31
nn 494 2 1 1 30
20
Related work Pointer analysis and typestate error checking
Iterative flow analysis [Plevyak & Chien ‘94]
Demand-driven pointer analysis [Heintze & Tardieu ’01]
Combined pointer analysis [Zhang, Ryder, Landi ’98]
Effects of pointer analysis precision [Hind ’01 & others] More precision is more costly Does it help? Is it worth the cost?
21
Conclusions Client-driven pointer analysis
Precision should match the client and program
Not all pointers are equal
Need fine-grained precision policies
Key: knowing where to add more and what kind
Roadmap for scalabilityUse more expensive analysis on small parts of progams
22
Thank You
23
Time
24
Precision policies
Name
# procedures context-sensitive % variables flow-sensitive
RA File FSV RFSV FTP RA File FSV RFSV FTP
muh 6 0.1 0.07 0.31
apache 8 2 2 10 0.89 0.18 0.91 1.07 0.83
blackhole 2 5 0.24 0.04 0.32
wu-ftpd 4 4 17 0.63 0.09 0.51 0.53 0.23
named 1 2 1 4 0.14 0.01 0.23 0.20 0.42
cfengine 4 1 3 31 0.43 0.04 0.46 0.48 0.03
nn 2 1 1 30 1.82 0.17 1.99 2.03 0.97
25
Thank You
26
Error detection problems
Remote access vulnerabillity:
File access: Format string vulnerability (FSV):
Remote FSV: FTP behavior:
Data from an Internet socket should not specify a program to execute
Files must be open when accessed
Format string may not contain untrusted data
Check if FSV is remotely exploitable
Can this program be tricked into reading and transmitting arbitrary files
27
Annotations (I) Dependence and pointer information
Describe pointer structures Indicate which objects are accessed and modified
procedure fopen(pathname, mode){ on_entry { pathname --> path_string mode --> mode_string }
access { path_string, mode_string }
on_exit { return --> new file_stream }}
28
Annotations (II) Library-specific properties
Dataflow lattices
property State : { Open, Closed} initially Open
property Kind : { File, Socket { Local, Remote } }
SocketFile
Local RemoteOpenClosed
29
Annotations (III) Library routine effects
Dataflow transfer functions
procedure socket(domain, type, protocol){ analyze Kind { if (domain == AF_UNIX) IOHandle <- Local if (domain == AF_INET) IOHandle <- Remote }
analyze State { IOHandle <- Open }
on_exit { return --> new IOHandle }}
30
Annotations (IV) Reports and transformations
procedure execl(path, args){ on_entry { path --> path_string }
report if (Kind : path_string could-be Remote) “Error at “ ++ $callsite ++ “: remote access”;}
procedure slow_routine(first, second){ when (condition) replace-with %{ quick_check($first); fast_routine($first, $second); }%}
31
Type Theory Equivalent to dataflow analysis (heresy?)
Different in practice Dataflow: flow-sensitive problems, iterative analysis Types: flow-insensitive problems, constraint solver
Commonality No magic bullet: same cost for the same precision Extracting the store model is a primary concern
Flow values Types
Transfer functions Inference rules Remember PhilWadler’s talk?
32
Is it correct?
Three separate questions:
Are Sam Guyer’s experiments correct? Yes, to the best of our knowledge
Checked PLAPACK results Checked detected errors against known errors
Is our compiler implemented correctly? Flip answer: who’s is? Better answer: testing suites
How do we validate a set of annotations?
33
Annotation correctness
Not addressed in my dissertation, but... Theoretical approach
Does the library implement the domain? Formally verify annotations against implementation
Practical approach Annotation debugger: interactive Automated assistance in early stages of development
Middle approach Basic consistency checks
34
Error Checking vs Optimization Optimistic
False positives allowed It can even be unsound Tend to be “may” analyses
Correctness is absolute “Black and white” Certify programs bug-free
Cost tolerant Explore costly analysis
Pessimistic Must preserve semantics Soundness mandatory Tend to be “must” analyses
Performance is relative Spectrum of results No guarantees
Cost sensitive Compile-time is a factor
35
Complexity Pointer analysis
Address taken: linear Steensgaard: almost linear (log log n factor) Anderson: polynomial (cubic) Shape analysis: double exponential
Dataflow analysis Intraprocedural: polynomial (height of lattice) Context-sensitivity: exponential (call graph)
Rarely see worst-case
36
Find the error – part 3 State-of-the-art compiler
struct __sue_23 * var_72;struct __sue_25 * new_f = (struct __sue_25 *) malloc(sizeof (struct __sue_25));_IO_no_init(& new_f->fp.file, 1, 0, ((void *) 0), ((void *) 0));(& new_f->fp)->vtable = & _IO_file_jumps;_IO_file_init(& new_f->fp);if (_IO_file_fopen((struct __sue_23 *) new_f, filename, mode, is32) != ((void *) 0)) { var_72 = & new_f->fp.file; if ((var_72->_flags2 & 1) && (var_72->_flags & 8)) { if (var_72->_mode <= 0) ((struct __sue_23 *) var_72)->vtable = & _IO_file_jumps_maybe_mmap; else ((struct __sue_23 *) var_72)->vtable = & _IO_wfile_jumps_maybe_mmap; var_72->_wide_data->_wide_vtable = & _IO_wfile_jumps_maybe_mmap; }}if (var_72->_flags & 8192U) _IO_un_link((struct __sue_23 *) var_72);if (var_72->_flags & 8192U) status = _IO_file_close_it(var_72); else status = var_72->_flags & 32U ? - 1 : 0;((* (struct _IO_jump_t * *) ((void *) (& ((struct __sue_23 *) (var_72))->vtable) + (var_72)->_vtable_offset))->__finish)(var_72, 0);if (var_72->_mode <= 0) if (((var_72)->_IO_save_base != ((void *) 0))) _IO_free_backup_area(var_72);if (var_72 != ((struct __sue_23 *) (& _IO_2_1_stdin_)) && var_72 != ((struct __sue_23 *) (& _IO_2_1_stdout_)) && var_72 != ((struct __sue_23 *) (& _IO_2_1_stderr_))) { var_72->_flags = 0; free(var_72); }bytes_read = _IO_sgetn(var_72, (char *) var_81, bytes_requested);
37
Challenge 2: Scope Call graph:
Objects flow throughout program No scoping constraints Objects referenced through pointers
We need whole-program analysis
main
readsocketsock = (AF_INET, SOCK_STREAM, 0);
(sock, buffer, 100); (ref);execl
!
38
The Broadway Compiler
Broadway – source-to-source C compiler Domain-independent compiler mechanisms
Annotations – lightweight specification language Domain-specific analyses and transformations
Many libraries, one compiler
ApplicationSource code
LibraryAnnotationsHeader filesSource code
Broadway
Analyzer
Optimizer
Error reportsLibrary-specific messages
Application+LibraryIntegrated source code
39
Security vulnerabilities How does remote hacking work?
Most are not direct attacks (e.g., cracking passwords) Idea: trick a program into unintended behavior
Automated vulnerability detection: How do we define “intended”? Difficult to formalize and check application logic
Libraries control all critical system services Communication, file access, process control Analyze routines to approximate vulnerability
40
End backup slides
= ( , )
Conditions
if(cond)
x = x =