Client-Driven Pointer Analysis

40
1 Client-Driven Pointer Analysis Samuel Z. Guyer Calvin Lin June 2003 T H E U N I V E R S I T Y O F T E X A S A T A U S T I N

description

T H E U N I V E R S I T Y O F T E X A S. A T A U S T I N. Client-Driven Pointer Analysis. Samuel Z. Guyer Calvin Lin June 2003. Output. Errors. CIFI. Context & Flow Insensitive. CIFS. Flow Sensitive. CSFS. Context & Flow Sensitive. Using Pointer Analysis. - PowerPoint PPT Presentation

Transcript of Client-Driven Pointer Analysis

Page 1: Client-Driven  Pointer Analysis

1

Client-Driven Pointer Analysis

Samuel Z. Guyer

Calvin Lin

June 2003

T H E U N I V E R S I T Y O F

T E X A SA T A U S T I N

Page 2: Client-Driven  Pointer Analysis

2

Using Pointer Analysis

Pointer analysis: not a stand-alone analysis Supports other client analyses

Current practice: Client analysis – we’ll focus on error detection Pointer analysis algorithm – choose precision

Pointer Analyzer

Client Analysis

Memory Model

OutputErrorsError

Detector

CIFI Context & Flow Insensitive

CIFS Flow Sensitive

CSFS Context & Flow Sensitive

Page 3: Client-Driven  Pointer Analysis

3

Motivation

Real-life scenario:Check for security vulnerabilities in BlackHole mail filter

Manually inspect reported errors One thing in common: a string processing routine Clone procedure = ad hoc context sensitivity Using CIFI, all 85 false positives go away

Can we automate this process?

Pointer Analyzer

Memory Model

Fast analysis;85 possible

errors

Error Detector

CIFICIFSCSFS

25X slower;85 possible

errors

Out of memory;No results

Page 4: Client-Driven  Pointer Analysis

4

Our solution

Problems Cost-benefit tradeoff – severe for pointer analysis Precision choices are too coarse Choice is made a priori by the compiler writer

Solution: Mixed precision analysis Apply higher precision where it’s needed Use cheap analysis elsewhere

Key: Let the needs of client drive precision Customized precision policy created during analysis

Page 5: Client-Driven  Pointer Analysis

5

Client-Driven Pointer Analysis

Algorithm: Start with fast cheap analysis: FI and CI Monitor: how imprecision causes information loss Adapt: Reanalyze with a customized precision policy

Dependence Graph

MonitorInformation Loss

Pointer Analyzer

Client Analysis

Memory Model

Error Reports

CIFI

Adaptor

Custom Policy

Page 6: Client-Driven  Pointer Analysis

6

Overview

Motivation

Our algorithmAutomatically discover what the client needs

ExperimentsReal programs and challenging error detection problems

Related work and conclusions

Page 7: Client-Driven  Pointer Analysis

7

False Positives

Example: sock = socket(AF_INET, SOCK_STREAM, 0);read(sock, buffer, 100);execl(buffer);!

Remote access vulnerability

!

stdin

??

main

socketexeclexecl

read

no errorerror

maybe

Lattice

Page 8: Client-Driven  Pointer Analysis

8

Client-Driven Pointer Analysis

Dependence Graph

MonitorInformation Loss

Pointer Analyzer

Client Analysis

Memory Model

Error Reports

CIFI

Adaptor

Custom Policy

Page 9: Client-Driven  Pointer Analysis

9

Analysis framework

Iterative dataflow analysis Pointer analysis: flow values are points-to sets Client analysis: flow values form typestate lattice

Fine-grained precision policies Context sensitivity: per procedure

CS: Clone or inline procedure invocation CI: Merge values from all call sites

Flow sensitivity: per memory location FS: Build factored use-def chains FI: Merge all assignments into a single flow value

Page 10: Client-Driven  Pointer Analysis

10

Client-Driven Pointer Analysis

Dependence Graph

MonitorInformation Loss

Pointer Analyzer

Client Analysis

Memory Model

Error Reports

CIFI

Adaptor

Custom Policy

Page 11: Client-Driven  Pointer Analysis

11

Monitor Runs alongside main analysis Monitors information loss

Detects polluting assignmentsMerge two accurate flow values ambiguous value

Tracks complicit assignmentsPassing an ambiguous value from one variable to another

Records in a dependence graphFor both pointer and client analyses

Page 12: Client-Driven  Pointer Analysis

12

Dependence graph (I) Polluting assignment

Add a node for the variable – annotate with a diagnosis Complicit assignment

Add an edge from left side back to right side

paramCS: foo

xFS: x x

y

foo( )foo( )

foo(param )

Context insensitive

x =

x =

x

Flow insensitive

x = y;

x

Complicit

Page 13: Client-Driven  Pointer Analysis

13

Dependence graph (II) Indirect assignments

x =(*ptr)

Pointer dereference

x

orptr

Polluted target

y

ptr

Polluted pointer

x

y

x

ptr

Page 14: Client-Driven  Pointer Analysis

14

Adaptor

After analysis... Start at the “maybe error” variables Find all reachable nodes – collect the diagnoses

Often a small subset of all imprecision

?

DependenceGraph

Precision policy

CS: fooCS: barFS: xFS: ptrCS:bar

CS:foo

FS:x

FS:ptr

Page 15: Client-Driven  Pointer Analysis

15

In action...

Monitor analysis

Polluting assignments

Diagnose and apply “fix” In this case: one procedure context-sensitive

Reanalyze

main

socketexeclexecl

read

stdin

??

readread

!

Page 16: Client-Driven  Pointer Analysis

16

Programs 18 real C programs

Unmodified source – all the issues of production code Many are system tools – run in privileged mode

Representative examples:

Name Description Priv Lines of code Procedures CFG nodes

muh IRC proxy 5K (25K) 84 5,191

blackhole E-mail filter 12K (244K) 71 21,370

wu-ftpd FTP daemon 22K (66K) 205 23,107

named DNS server 26K (84K) 210 25,452

nn News reader 36K (116K) 494 46,336

Page 17: Client-Driven  Pointer Analysis

17

Methodology

5 typestate error checkers: Represent non-trivial program properties Stress the pointer analyzer

Compare client-driven with fixed-precision

Goals: First, reduce number of errors reported

Conservative analysis – fewer is better Second, reduce analysis time

Page 18: Client-Driven  Pointer Analysis

18

Increasing number of CFG nodes

Results

10X

0 0 0 0 0 0 07 29 6 85 28 2 31 4 5 93 41

00 0

0

0

00

7 186

85

15

1

26 4

5 8941

07 18

615

1

26 4

5

8841

CS-FI

CI-FS

CI-FI

CS-FS

Client-DrivenRemote access vulnerability

1000X

1

100X

stunnelpfingerm

uh (I)m

uh (II)pureftpfcronapachem

akeblackholew

u-ftp (I)ssh clientprivoxyw

u-ftp (II)nam

edssh servercfengineS

QLite

nn

No

rmal

ized

an

alys

is t

ime

0

0 0

0

7

29

28310

0 0

7 1526

? ? ? ? ? ? ? ? ? ? ??

Page 19: Client-Driven  Pointer Analysis

19

Why it works

Notice: Different clients have different precision requirements Amount of extra precision is small

Name

Total procs

# procedures context-sensitive

Remote

Access

File

Access

FSV RFSV FTP

muh 84 6

apache 313 8 2 2 10

blackhole 71 2 5

wu-ftpd 205 4 4 17

named 210 1 2 1 4

cfengine 421 4 1 3 31

nn 494 2 1 1 30

Page 20: Client-Driven  Pointer Analysis

20

Related work Pointer analysis and typestate error checking

Iterative flow analysis [Plevyak & Chien ‘94]

Demand-driven pointer analysis [Heintze & Tardieu ’01]

Combined pointer analysis [Zhang, Ryder, Landi ’98]

Effects of pointer analysis precision [Hind ’01 & others] More precision is more costly Does it help? Is it worth the cost?

Page 21: Client-Driven  Pointer Analysis

21

Conclusions Client-driven pointer analysis

Precision should match the client and program

Not all pointers are equal

Need fine-grained precision policies

Key: knowing where to add more and what kind

Roadmap for scalabilityUse more expensive analysis on small parts of progams

Page 22: Client-Driven  Pointer Analysis

22

Thank You

Page 23: Client-Driven  Pointer Analysis

23

Time

Page 24: Client-Driven  Pointer Analysis

24

Precision policies

Name

# procedures context-sensitive % variables flow-sensitive

RA File FSV RFSV FTP RA File FSV RFSV FTP

muh 6 0.1 0.07 0.31

apache 8 2 2 10 0.89 0.18 0.91 1.07 0.83

blackhole 2 5 0.24 0.04 0.32

wu-ftpd 4 4 17 0.63 0.09 0.51 0.53 0.23

named 1 2 1 4 0.14 0.01 0.23 0.20 0.42

cfengine 4 1 3 31 0.43 0.04 0.46 0.48 0.03

nn 2 1 1 30 1.82 0.17 1.99 2.03 0.97

Page 25: Client-Driven  Pointer Analysis

25

Thank You

Page 26: Client-Driven  Pointer Analysis

26

Error detection problems

Remote access vulnerabillity:

File access: Format string vulnerability (FSV):

Remote FSV: FTP behavior:

Data from an Internet socket should not specify a program to execute

Files must be open when accessed

Format string may not contain untrusted data

Check if FSV is remotely exploitable

Can this program be tricked into reading and transmitting arbitrary files

Page 27: Client-Driven  Pointer Analysis

27

Annotations (I) Dependence and pointer information

Describe pointer structures Indicate which objects are accessed and modified

procedure fopen(pathname, mode){ on_entry { pathname --> path_string mode --> mode_string }

access { path_string, mode_string }

on_exit { return --> new file_stream }}

Page 28: Client-Driven  Pointer Analysis

28

Annotations (II) Library-specific properties

Dataflow lattices

property State : { Open, Closed} initially Open

property Kind : { File, Socket { Local, Remote } }

SocketFile

Local RemoteOpenClosed

Page 29: Client-Driven  Pointer Analysis

29

Annotations (III) Library routine effects

Dataflow transfer functions

procedure socket(domain, type, protocol){ analyze Kind { if (domain == AF_UNIX) IOHandle <- Local if (domain == AF_INET) IOHandle <- Remote }

analyze State { IOHandle <- Open }

on_exit { return --> new IOHandle }}

Page 30: Client-Driven  Pointer Analysis

30

Annotations (IV) Reports and transformations

procedure execl(path, args){ on_entry { path --> path_string }

report if (Kind : path_string could-be Remote) “Error at “ ++ $callsite ++ “: remote access”;}

procedure slow_routine(first, second){ when (condition) replace-with %{ quick_check($first); fast_routine($first, $second); }%}

Page 31: Client-Driven  Pointer Analysis

31

Type Theory Equivalent to dataflow analysis (heresy?)

Different in practice Dataflow: flow-sensitive problems, iterative analysis Types: flow-insensitive problems, constraint solver

Commonality No magic bullet: same cost for the same precision Extracting the store model is a primary concern

Flow values Types

Transfer functions Inference rules Remember PhilWadler’s talk?

Page 32: Client-Driven  Pointer Analysis

32

Is it correct?

Three separate questions:

Are Sam Guyer’s experiments correct? Yes, to the best of our knowledge

Checked PLAPACK results Checked detected errors against known errors

Is our compiler implemented correctly? Flip answer: who’s is? Better answer: testing suites

How do we validate a set of annotations?

Page 33: Client-Driven  Pointer Analysis

33

Annotation correctness

Not addressed in my dissertation, but... Theoretical approach

Does the library implement the domain? Formally verify annotations against implementation

Practical approach Annotation debugger: interactive Automated assistance in early stages of development

Middle approach Basic consistency checks

Page 34: Client-Driven  Pointer Analysis

34

Error Checking vs Optimization Optimistic

False positives allowed It can even be unsound Tend to be “may” analyses

Correctness is absolute “Black and white” Certify programs bug-free

Cost tolerant Explore costly analysis

Pessimistic Must preserve semantics Soundness mandatory Tend to be “must” analyses

Performance is relative Spectrum of results No guarantees

Cost sensitive Compile-time is a factor

Page 35: Client-Driven  Pointer Analysis

35

Complexity Pointer analysis

Address taken: linear Steensgaard: almost linear (log log n factor) Anderson: polynomial (cubic) Shape analysis: double exponential

Dataflow analysis Intraprocedural: polynomial (height of lattice) Context-sensitivity: exponential (call graph)

Rarely see worst-case

Page 36: Client-Driven  Pointer Analysis

36

Find the error – part 3 State-of-the-art compiler

struct __sue_23 * var_72;struct __sue_25 * new_f = (struct __sue_25 *) malloc(sizeof (struct __sue_25));_IO_no_init(& new_f->fp.file, 1, 0, ((void *) 0), ((void *) 0));(& new_f->fp)->vtable = & _IO_file_jumps;_IO_file_init(& new_f->fp);if (_IO_file_fopen((struct __sue_23 *) new_f, filename, mode, is32) != ((void *) 0)) {  var_72 = & new_f->fp.file;  if ((var_72->_flags2 & 1) && (var_72->_flags & 8)) {    if (var_72->_mode <= 0) ((struct __sue_23 *) var_72)->vtable = & _IO_file_jumps_maybe_mmap;    else ((struct __sue_23 *) var_72)->vtable = & _IO_wfile_jumps_maybe_mmap;    var_72->_wide_data->_wide_vtable = & _IO_wfile_jumps_maybe_mmap;  }}if (var_72->_flags & 8192U) _IO_un_link((struct __sue_23 *) var_72);if (var_72->_flags & 8192U) status = _IO_file_close_it(var_72);  else status = var_72->_flags & 32U ? - 1 : 0;((* (struct _IO_jump_t * *) ((void *) (& ((struct __sue_23 *) (var_72))->vtable) +                              (var_72)->_vtable_offset))->__finish)(var_72, 0);if (var_72->_mode <= 0)  if (((var_72)->_IO_save_base != ((void *) 0))) _IO_free_backup_area(var_72);if (var_72 != ((struct __sue_23 *) (& _IO_2_1_stdin_)) && var_72 != ((struct __sue_23 *) (& _IO_2_1_stdout_)) &&     var_72 != ((struct __sue_23 *) (& _IO_2_1_stderr_))) { var_72->_flags = 0;  free(var_72); }bytes_read = _IO_sgetn(var_72, (char *) var_81, bytes_requested);

Page 37: Client-Driven  Pointer Analysis

37

Challenge 2: Scope Call graph:

Objects flow throughout program No scoping constraints Objects referenced through pointers

We need whole-program analysis

main

readsocketsock = (AF_INET, SOCK_STREAM, 0);

(sock, buffer, 100); (ref);execl

!

Page 38: Client-Driven  Pointer Analysis

38

The Broadway Compiler

Broadway – source-to-source C compiler Domain-independent compiler mechanisms

Annotations – lightweight specification language Domain-specific analyses and transformations

Many libraries, one compiler

ApplicationSource code

LibraryAnnotationsHeader filesSource code

Broadway

Analyzer

Optimizer

Error reportsLibrary-specific messages

Application+LibraryIntegrated source code

Page 39: Client-Driven  Pointer Analysis

39

Security vulnerabilities How does remote hacking work?

Most are not direct attacks (e.g., cracking passwords) Idea: trick a program into unintended behavior

Automated vulnerability detection: How do we define “intended”? Difficult to formalize and check application logic

Libraries control all critical system services Communication, file access, process control Analyze routines to approximate vulnerability

Page 40: Client-Driven  Pointer Analysis

40

End backup slides

= ( , )

Conditions

if(cond)

x = x =