Efficient Dynamic Detection of Input-Related Security Faults

Eric Larson, Dissertation Defense, Advanced Computer Architecture Lab, University of Michigan, April 29, 2004




Security Faults
• Keeping computer data and access to it secure is a hard problem.
• Software errors cost companies millions of dollars.
• Different types of errors can lead to exploits:
  – protocol errors
  – configuration errors
  – implementation errors (most common)
• Even with a well-designed security protocol, a program can be compromised if it contains bugs!


Input-Related Software Faults
• A common implementation error is failing to properly bound input data:
  – checks are missing in many cases
  – when checks are present, they can be wrong
  – this is especially important for network data
• Common security exploit: buffer overflow, through
  – array references
  – string library functions in C
• Widespread problem:
  – 2/3 of CERT security advisories in 2003 were due to buffer overflows
  – buffer overflow bugs have recently been found in Windows and Linux


Example Buffer Overflow Attack
• Attacking the program involves two steps:
  1. Write malicious code onto the stack.
  2. Redirect control to execute the malicious code.

[Figure: the stack, showing the frames for foo and bar, the remainder of the stack, and the injected bad code.]


Overwriting the Return Address

void bar() {
    char buffer[100];
    gets(buffer);
    printf("String is %s", buffer);
}

Stack frame of bar (higher addresses at the top):
  Return address
  temporary value 1
  temporary value 2
  buffer[99]
  buffer[98]
  ...
  buffer[0]

The stack grows toward lower addresses; data written into buffer grows toward higher addresses.


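Overwriting the Return Address (continued)

void bar() {
    char buffer[100];
    gets(buffer);
    printf("String is %s", buffer);
}

Stack frame of bar after gets() reads an over-long input:
  0xbadc0de   (was: return address)
  0xbadc0de   (was: temporary value 1)
  0xbadc0de   (was: temporary value 2)
  buffer[99]
  buffer[98]
  ...
  buffer[0]

The stack grows toward lower addresses; data written into buffer grows toward higher addresses.

The location of the return address is not always known, so the attacker overwrites everything!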
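The root cause in bar() is gets(), which cannot know how large buffer is (gets was removed from the C standard in C11). A minimal sketch of a bounded replacement, assuming input arrives on a FILE stream; read_line_bounded and bar_safe are hypothetical names, while fgets and strcspn are standard C:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line into buf without writing past size
 * bytes. fgets stores at most size - 1 characters followed by a null. */
static char *read_line_bounded(FILE *in, char *buf, size_t size)
{
    if (size == 0 || fgets(buf, (int)size, in) == NULL)
        return NULL;
    buf[strcspn(buf, "\n")] = '\0';   /* drop the trailing newline, if any */
    return buf;
}

/* bar() rewritten to read from a stream with an explicit bound. */
static void bar_safe(FILE *in)
{
    char buffer[100];
    if (read_line_bounded(in, buffer, sizeof buffer) != NULL)
        printf("String is %s\n", buffer);
}
```

An over-long line is truncated to 99 characters instead of overwriting the return address; the rest of that line stays unread in the stream, so a real program would still need to discard or reject it.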


Outline of Talk
• Background and Related Work (Ch. 2)
• Detecting Input-Related Software Faults (Ch. 3)
• MUSE: Instrumentation Infrastructure (Ch. 4)
• Implementation and Results (Ch. 5)
• Reducing Performance Overhead (Ch. 6)
• Conclusions (Ch. 7)


When Should I Look for Software Bugs?
• Compile-time (static) bug detection
  + no dependence on input
  + can prove that a dangerous operation is safe in some cases
  – often computationally infeasible (too many states or paths)
  – limited scope: either a high false alarm rate or a low bug-finding rate
  – hard to analyze heap data
• Run-time (dynamic) bug detection
  + can analyze all variables (including those on the heap)
  + execution follows a real path, so there are fewer false alarms
  – a bug may not manifest as an error in the output
  – depends on program input
  – impacts program performance

Our approach is dynamic, and it addresses the deficiencies of dynamic detection by borrowing ideas from static bug detection.


Contributions of this Thesis
• Dynamically detecting input-related software faults
  – relaxes dependence on input
• MUSE: instrumentation infrastructure
  – developed for rapid prototyping of bug detection tools for this and future research
• Removing unnecessary instrumentation
  – reduces performance overhead
• Improved shadow state management
  – tighter integration with the compiler improves performance


Selected Related Work
• Jones & Kelly: dynamic approach to catching memory access errors; tracks all valid objects in memory using a table
• Tainted Perl: prevents unsafe actions on unvalidated input
• STOBO: uses allocation sizes rather than string sizes
• CCured: uses a type system to catch memory access errors; instrumentation is added where static analysis fails
• BOON: statically derives and solves a system of integer range constraints to find buffer overruns
• CSSV: model-checking system that finds buffer overflows in C; keeps track of potential string lengths and null termination
• MetaCompilation: checks for uses of unbounded input, but does not verify that the checks are correct


Detection of Input-Related Software Faults
• Program instrumentation tracks data derived from input:
  – the possible range of integer variables
  – the maximum size and termination of strings
• Dangerous operations are checked over the entire range of possible values.
• Found 17 bugs in 9 programs, including 2 known high-security faults in OpenSSH.

This relaxes the constraint that the user must provide an input that exposes the bug.


Detecting Array Buffer Overflows
• Interval constraint variables are introduced when external inputs are read:
  – they hold the lower and upper bounds for each input value
  – initial values encompass the entire range
  – control points narrow the bounds
  – arithmetic operations adjust the bounds
• Potentially dangerous operations are checked:
  – array indexing
  – controlling a loop or memory allocation size
  – arithmetic operations (overflow)
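A minimal sketch of how interval bounds might be adjusted by arithmetic and checked for overflow. It assumes a 32-bit int and a 64-bit long long, so computing a bound can never overflow itself; the interval type and the functions from_input, add, mul, and may_overflow are hypothetical names, not the checker's actual API:

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical shadow state for an input-derived int: every value it might
 * hold lies in [lb, ub]. Bounds use long long (assumed 64-bit, with a 32-bit
 * int) so that computing a bound cannot itself overflow. */
typedef struct { long long lb, ub; } interval;

/* A value just read from input could be any int. */
static interval from_input(void)
{
    interval r = { INT_MIN, INT_MAX };
    return r;
}

/* x + y: the extremes of the sum come from the extremes of the operands. */
static interval add(interval x, interval y)
{
    interval r = { x.lb + y.lb, x.ub + y.ub };
    return r;
}

/* x * y: the extremes come from the four corner products (operands are
 * assumed to lie within int range, so each product fits in long long). */
static interval mul(interval x, interval y)
{
    long long p[4] = { x.lb * y.lb, x.lb * y.ub, x.ub * y.lb, x.ub * y.ub };
    interval r = { p[0], p[0] };
    for (int i = 1; i < 4; i++) {
        if (p[i] < r.lb) r.lb = p[i];
        if (p[i] > r.ub) r.ub = p[i];
    }
    return r;
}

/* Overflow check: report if any possible result falls outside int. */
static bool may_overflow(interval r)
{
    return r.lb < INT_MIN || r.ub > INT_MAX;
}
```

For example, a size computed as n * 4 from an unchecked input n is flagged, even if this run's n is small, while the product of two values known to lie in [0, 100] is not.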


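Example code sequence:

int x, y;
int array[5];
x = get_input_int();
if (x < 0 || x > 4) fatal("bounds");
x++;
y = array[x];

After statement                         Range of x                 Value of x
x = get_input_int();                    -MAX_INT ≤ x ≤ +MAX_INT    2
if (x < 0 || x > 4) fatal("bounds");    0 ≤ x ≤ 4                  2
x++;                                    1 ≤ x ≤ 5                  3
y = array[x];                           1 ≤ x ≤ 5                  3

ERROR! When x = 5, the array reference is out of bounds, even though the value in this run (3) is in bounds!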
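The trace in this example can be replayed with a small interval model. This is a sketch under assumed names (interval, narrow, incr, and index_safe are hypothetical); it shows why the check fires even though the concrete index in this run, 3, is in bounds:

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical interval shadow for the int x in the example. */
typedef struct { long long lb, ub; } interval;

/* Control point: on the path that survives "if (x < lo || x > hi) fatal(...)",
 * lo <= x <= hi holds, so intersect the interval with [lo, hi]. */
static interval narrow(interval x, long long lo, long long hi)
{
    if (x.lb < lo) x.lb = lo;
    if (x.ub > hi) x.ub = hi;
    return x;
}

/* x++ shifts both bounds up by one. */
static interval incr(interval x)
{
    x.lb += 1;
    x.ub += 1;
    return x;
}

/* array[x] with len elements is safe only if every possible x is in range. */
static bool index_safe(interval x, long long len)
{
    return x.lb >= 0 && x.ub <= len - 1;
}
```

Starting from [INT_MIN, INT_MAX], narrow(..., 0, 4) followed by incr gives [1, 5], and index_safe(..., 5) fails. Tightening the guard to x > 3 would give [1, 4], which passes.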


Detecting Dangerous String Operations
• Strings are shadowed by:
  – max_str_size: the largest possible size of the string
  – known_null: set if the string is known to contain a null character
• Checking string operations:
  – the source string will fit into the destination
  – source strings are guaranteed to be null terminated
• Operations involving a string length can narrow the maximum string size:
  – our size counts the null character; the strlen function does not
• Integers that store string lengths are shadowed by:
  – the base address of the corresponding string
  – the difference between the integer's value and the actual string length
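The last bullet is what lets a length check narrow a string's size. If an integer n is shadowed as n = strlen(s) + diff, then on the path where n ≤ limit holds, strlen(s) ≤ limit - diff, so s occupies at most limit - diff + 1 bytes counting the null. A sketch of that bookkeeping; the type and function names are hypothetical, and a pointer to the string's shadow stands in for the base address:

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical string shadow: max_str_size counts the terminating null. */
typedef struct {
    size_t max_str_size;
    bool known_null;
} str_shadow;

/* Hypothetical shadow for an int holding a string length:
 * value == strlen(string described by *base) + diff. */
typedef struct {
    str_shadow *base;
    long diff;
} len_shadow;

/* n = strlen(s): n is exactly the length of s. */
static len_shadow on_strlen(str_shadow *s)
{
    len_shadow l = { s, 0 };
    return l;
}

/* n = n + k: n moves away from the real length by k. */
static len_shadow on_add(len_shadow l, long k)
{
    l.diff += k;
    return l;
}

/* On the path where "n <= limit" holds, strlen(s) = n - diff <= limit - diff,
 * so s occupies at most limit - diff + 1 bytes including the null. */
static void on_le(len_shadow l, long limit)
{
    long max_len = limit - l.diff;
    if (max_len < 0)
        max_len = 0;   /* infeasible path; keep the bound sane */
    size_t bound = (size_t)max_len + 1;
    if (bound < l.base->max_str_size)
        l.base->max_str_size = bound;
}
```

For `if (strlen(src) > 16) return NULL;`, the fall-through path is n ≤ 16 with diff 0, which narrows src from MAX_INT to 17; a length computed as strlen(s) - 1 and checked against 15 narrows to the same 17.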


String Fault Detection Example

char *bad_copy(char *src)
{
    char tmp[16];
    char *dst = (char*)malloc(16);
    if (strlen(src) > 16)
        return NULL;
    strncpy(tmp, src, 16);
    strcpy(dst, tmp);
    return dst;
}

Str.   max_str_size   known_null
src    MAX_INT        TRUE


String Fault Detection Example (continued)

Declaring tmp[16] and allocating 16 bytes for dst creates shadow state for both buffers; neither is yet known to contain a null:

Str.   max_str_size   known_null
src    MAX_INT        TRUE
tmp    16             FALSE
dst    16             FALSE


String Fault Detection Example (continued)

Past the check "if (strlen(src) > 16) return NULL;", strlen(src) ≤ 16, so src needs at most 17 bytes including the null:

Str.   max_str_size   known_null
src    MAX_INT        TRUE
tmp    16             FALSE
dst    16             FALSE
src    17             TRUE


String Fault Detection Example (continued)

strncpy(tmp, src, 16) copies at most 16 bytes; since src may hold 16 characters plus the null, tmp may be left without a null:

Str.   max_str_size   known_null
src    MAX_INT        TRUE
tmp    16             FALSE
dst    16             FALSE
src    17             TRUE
tmp    16             FALSE


String Fault Detection Example (continued)

At strcpy(dst, tmp):

Str.   max_str_size   known_null
src    MAX_INT        TRUE
tmp    16             FALSE
dst    16             FALSE
src    17             TRUE
tmp    16             FALSE

ERROR! tmp may not be null terminated during strcpy.
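The walkthrough above can be replayed with a small model of the two library calls. This is a sketch, not the checker's code: str_shadow, model_strncpy, and model_strcpy are hypothetical names, and the alloc_size field stands in for the actual array size that the implementation is described as keeping:

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical shadow state for a character buffer. */
typedef struct {
    size_t alloc_size;    /* actual size of the buffer; 0 if unknown */
    size_t max_str_size;  /* largest possible string, counting the null */
    bool known_null;      /* a null character is known to be present */
} str_shadow;

typedef enum { STR_OK, STR_NOT_TERMINATED, STR_MAY_NOT_FIT } str_check;

/* Model strncpy(dst, src, n): dst ends up null terminated only if src is
 * known to be and its whole string (including the null) fits in n bytes.
 * A full checker would also verify that n <= dst->alloc_size. */
static void model_strncpy(str_shadow *dst, const str_shadow *src, size_t n)
{
    dst->known_null = src->known_null && src->max_str_size <= n;
    dst->max_str_size = src->max_str_size < n ? src->max_str_size : n;
}

/* Check, then model, strcpy(dst, src). */
static str_check model_strcpy(str_shadow *dst, const str_shadow *src)
{
    if (!src->known_null)
        return STR_NOT_TERMINATED;
    if (src->max_str_size > dst->alloc_size)
        return STR_MAY_NOT_FIT;
    dst->max_str_size = src->max_str_size;
    dst->known_null = true;
    return STR_OK;
}
```

With src narrowed to 17, model_strncpy leaves tmp at 16 with known_null false, so model_strcpy(&dst, &tmp) reports STR_NOT_TERMINATED. Copying src directly into the 16-byte dst instead reports STR_MAY_NOT_FIT.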


String Fault Detection Example (second version)

char *bad_copy(char *src)
{
    char *dst = (char*)malloc(16);
    if (strlen(src) > 16)
        return NULL;
    strcpy(dst, src);
    return dst;
}

Str.   max_str_size   known_null
src    MAX_INT        TRUE
dst    16             FALSE
src    17             TRUE

ERROR! src may not fit into dst during strcpy.
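Both versions share an off-by-one: strlen(src) == 16 passes the check but needs 17 bytes. A sketch of one possible fix (good_copy and DST_SIZE are hypothetical names): reject strlen(src) >= 16, which on the fall-through path narrows src to at most 16 bytes, so it fits in dst. The sketch also moves malloc after the check, so the early return no longer leaks dst:

```c
#include <stdlib.h>
#include <string.h>

#define DST_SIZE 16   /* hypothetical name for the 16-byte destination */

/* One possible fix: a 16-byte buffer holds at most 15 characters plus the
 * null, so reject strlen(src) >= DST_SIZE. Allocating after the check also
 * avoids leaking dst on the early return. */
char *good_copy(const char *src)
{
    if (strlen(src) >= DST_SIZE)
        return NULL;
    char *dst = malloc(DST_SIZE);
    if (dst == NULL)
        return NULL;
    strcpy(dst, src);   /* strlen(src) + 1 <= DST_SIZE, so this fits */
    return dst;
}
```

A 15-character string is copied; a 16-character string is rejected.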


MUSE: Instrumentation Infrastructure
• Developed for rapid prototyping of bug detection tools for this and future research
• General-purpose instrumentation tool
  – can also be used to create profilers, coverage tools, and debugging aids
• Implemented in GCC at the abstract syntax tree (AST) level
• A simplification phase breaks up complex C statements:
  – removes C side effects and other nuances
  – allows matching in the middle of a complex expression
• The specification consists of pattern-function pairs:
  – patterns match against statements, expressions, and special events
  – on a match, a call is made to the corresponding external function
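To illustrate what the simplification phase provides, here is one C statement simplified by hand into a three-address style: each statement performs one operation, and the ++ side effect becomes its own assignment. The exact form MUSE produces is not given here, so this is an assumed illustration with hypothetical function names. After simplification, the array read and the array write are separate statements that a pattern can match and instrument:

```c
/* One C statement with a side effect inside an index expression. */
static void original_form(int *x, int *i, const int *y, int j, int k)
{
    x[(*i)++] = y[j] * 2 + k;
}

/* The same statement broken into one operation per statement (hypothetical
 * simplified form). Each array access is now a statement of its own that an
 * instrumentation pattern can match. */
static void simplified_form(int *x, int *i, const int *y, int j, int k)
{
    int t1 = *i;      /* index value before the increment */
    *i = t1 + 1;      /* the ++ side effect, made explicit */
    int t2 = y[j];    /* array read */
    int t3 = t2 * 2;
    int t4 = t3 + k;
    x[t1] = t4;       /* array write */
}
```

Both functions leave x and *i in the same state; only the simplified one exposes the index t1 as a plain variable at the point of the write.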


Testing Process

Source code + instrumentation specification
  → Compile (GCC w/ MUSE) → instrumented executable
  → Run test suite → error reports
  → Debug and fix errors


Input Checker Implementation
• Shadow state stores the checker's bookkeeping information:
  – integers: bounds and string length information
  – arrays: maximum string size, null flag, and actual size
• It is stored in hash tables (shadow state tables):
  – hash tables are indexed by address
  – there are separate hash tables for integers and arrays
• Pointers use the array hash table.
• A debug tracing mode can help find the source of an error.

[Figure: for int x, the shadow state table maps &x to the shadow state for x (lb: 0, ub: 5).]
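A minimal sketch of an integer shadow state table indexed by address, using separate chaining. The names (int_entry, set_int_state, find_int_state, remove_int_state) are hypothetical, and the bucket count and hash function are arbitrary choices; an array table would follow the same pattern with string fields in place of lb and ub:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical integer shadow entry: bounds for the int stored at addr. */
typedef struct int_entry {
    const void *addr;
    long lb, ub;
    struct int_entry *next;   /* chaining for collisions */
} int_entry;

#define NBUCKETS 1024         /* power of two, so masking works as modulo */

static int_entry *int_table[NBUCKETS];

static size_t bucket(const void *addr)
{
    uintptr_t a = (uintptr_t)addr;
    return (size_t)((a >> 2) & (NBUCKETS - 1)); /* ints are usually 4-byte aligned */
}

static int_entry *find_int_state(const void *addr)
{
    for (int_entry *e = int_table[bucket(addr)]; e != NULL; e = e->next)
        if (e->addr == addr)
            return e;
    return NULL;
}

/* Create or overwrite the bounds for addr; returns false if out of memory. */
static bool set_int_state(const void *addr, long lb, long ub)
{
    int_entry *e = find_int_state(addr);
    if (e == NULL) {
        e = malloc(sizeof *e);
        if (e == NULL)
            return false;
        e->addr = addr;
        e->next = int_table[bucket(addr)];
        int_table[bucket(addr)] = e;
    }
    e->lb = lb;
    e->ub = ub;
    return true;
}

/* Drop the shadow state for addr (e.g. when it is overwritten by a constant). */
static void remove_int_state(const void *addr)
{
    int_entry **p = &int_table[bucket(addr)];
    while (*p != NULL) {
        if ((*p)->addr == addr) {
            int_entry *dead = *p;
            *p = dead->next;
            free(dead);
            return;
        }
        p = &(*p)->next;
    }
}
```

Every access to a shadowed int pays for a hash and a chain walk, which is the cost the later shadow-state optimizations aim to avoid.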


Results: Bugs Found

Program    Description                      Defects found   Add'l false alarms
anagram    anagram generator                2               0
ft         fast Fourier transform           2               0
ks         graph partitioning               3               0
yacr2      channel router                   2               1
betaftpd   file transfer protocol daemon    2               1
gaim       instant messaging client         1               1
ghttpd     web server                       3               2
openssh    secure shell client / server     2               0
thttpd     web server                       0               1
TOTAL                                       17              6


Results: Comparison to Static Approaches

Program    My approach   BOON
anagram    2             0
ft         2             0
ks         3             0
yacr2      2             0
betaftpd   2             0
gaim       1             core dump
ghttpd     3             0
openssh    2             core dump
thttpd     0             0

MetaCompilation: could not get access to their bug detection system.


Initial Performance Results

           Run time (seconds)
Program    Orig   New     Ratio    Simple stmts   Static sites   Dynamic sites   Useless (lower bound)
anagram    0.06   3.15    52.50    1,848          538            48,469,011      83.2%
ft         0.18   5.32    29.56    2,881          559            76,221,854      71.7%
ks         0.05   3.96    79.20    2,738          582            58,597,111      59.6%
yacr2      0.12   22.63   188.58   11,891         3,817          300,490,072     76.7%
betaftpd   0.07   0.53    7.57     8,186          2,205          6,320,450       94.5%
ghttpd     0.52   1.08    2.08     4,471          1,256          6,178,897       98.4%
openssh    0.70   1.00    1.43     97,851         26,858         493,716         94.6%
thttpd     0.15   2.57    17.13    23,804         6,362          24,024,093      85.3%


Eliminating Unnecessary Instrumentation
• Many variables do not need shadow state:
  – variables that never hold input data
  – variables that do not produce results used in dangerous operations
• Use static analysis to apply instrumentation only to variables that need shadow state
  – on average, at least 83% of instrumentation sites are useless!
• The algorithm is similar to constant propagation in a compiler.
• Implemented in Dflow, a whole-program dataflow analysis tool we created


Example: Removing Unneeded Instrumentation

int a, b, c, d, x[5];
a = get_input_int();
b = get_input_int();
c = 2;
d = b;
x[a] = 3;
x[c] = 6;
printf("%d\n", d);


Example: Removing Unneeded Instrumentation (instrumented)

int a, b, c, d, x[5];
create_array_state(x);
a = get_input_int();
create_int_bound_state(&a);
b = get_input_int();
create_int_bound_state(&b);
c = 2;
remove_int_state(&c);
d = b;
copy_int_state(&d, &b);
check_array_ref(x, &a);
x[a] = 3;
check_array_ref(x, &c);
x[c] = 6;
printf("%d\n", d);


Example: Removing Unneeded Instrumentation (continued)

The instrumentation involving c (remove_int_state(&c) and check_array_ref(x, &c)) is unnecessary: c never holds input data.


Example: Removing Unneeded Instrumentation (continued)

The instrumentation involving b and its copy d (create_int_bound_state(&b) and copy_int_state(&d, &b)) is unnecessary: the input value in b is never used in a dangerous operation.
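The two analyses can be sketched as fixpoint computations over the example program. This is a simplified, flow-insensitive illustration with hypothetical types and names, not Dflow itself: a forward pass marks variables derived from input, a backward pass marks variables whose values reach a dangerous operation (here, an array index), and only variables in both sets keep shadow state:

```c
#include <stdbool.h>
#include <stddef.h>

/* Variables of the example program, numbered for bitmask sets. */
enum { VA, VB, VC, VD };
#define BIT(v) (1u << (v))

/* Hypothetical statement summary: dst = f(uses), possibly read from input,
 * possibly using variables as an array index (a dangerous operation). */
typedef struct {
    int dst;              /* variable defined, or -1 if none */
    unsigned uses;        /* variables read to compute dst (or just read) */
    bool reads_input;     /* dst = get_input_int() */
    unsigned index_uses;  /* variables used as an array index */
} stmt;

static const stmt prog[] = {
    { VA, 0, true, 0 },          /* a = get_input_int(); */
    { VB, 0, true, 0 },          /* b = get_input_int(); */
    { VC, 0, false, 0 },         /* c = 2; */
    { VD, BIT(VB), false, 0 },   /* d = b; */
    { -1, 0, false, BIT(VA) },   /* x[a] = 3; */
    { -1, 0, false, BIT(VC) },   /* x[c] = 6; */
    { -1, BIT(VD), false, 0 },   /* printf("%d\n", d); */
};
#define NSTMTS (sizeof prog / sizeof prog[0])

/* Forward: a variable is input-derived if assigned from input or from an
 * input-derived variable. Iterate until nothing changes. */
static unsigned input_derived(const stmt *s, size_t n)
{
    unsigned in = 0, old;
    do {
        old = in;
        for (size_t i = 0; i < n; i++)
            if (s[i].dst >= 0 && (s[i].reads_input || (s[i].uses & in)))
                in |= BIT(s[i].dst);
    } while (in != old);
    return in;
}

/* Backward: a variable is dangerous if used as an index, or if it feeds a
 * dangerous variable. Iterate until nothing changes. */
static unsigned reaches_danger(const stmt *s, size_t n)
{
    unsigned d = 0, old;
    do {
        old = d;
        for (size_t i = n; i-- > 0; ) {
            d |= s[i].index_uses;
            if (s[i].dst >= 0 && (d & BIT(s[i].dst)))
                d |= s[i].uses;
        }
    } while (d != old);
    return d;
}
```

On the example, the input-derived set is {a, b, d} and the dangerous set is {a, c}, so only a needs integer shadow state, which matches the two annotations above. Like constant propagation, each variable carries a small lattice value that only moves in one direction, so the iteration terminates.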


Results: Removing Unneeded Instrumentation

[Figure: performance improvement (0% to 100%) for anagram, ft, ks, yacr2, betaftpd, ghttpd, openssh, thttpd, and the average, comparing the input-derived propagation algorithm, the dangerous operation propagation algorithm, and both algorithms.]


Results: Removing Unneeded Instrumentation

           Instrumentation sites reduced after ...     Integers that are ...
Program    Input alg.   Danger alg.   Both algs.       Input-based   Dangerous   Input-based & dangerous
anagram    39.6%        23.7%         72.4%            18.3%         21.4%       3.1%
ft         73.3%        0.1%          99.7%            13.3%         8.6%        0.0%
ks         27.6%        20.3%         92.5%            42.6%         26.6%       17.2%
yacr2      52.3%        0.0%          52.3%            43.3%         22.9%       12.1%
betaftpd   29.3%        3.3%          48.5%            29.2%         10.1%       3.4%
ghttpd     89.0%        0.0%          89.0%            16.2%         13.1%       3.1%
openssh    35.8%        1.1%          39.6%            30.0%         24.4%       3.5%
thttpd     52.4%        0.0%          52.4%            35.5%         15.5%       9.6%


Approaches to Shadow State Management
• Shadow state table (example: Jones & Kelly):
  – slow to maintain and access
  – does not modify the variables within the program
• Fat variables (example: Safe C):
  – fast to access; shadow state is contained within the variable
  – variables no longer fit within a register
  – all variables of a particular type must be instrumented
  – must account for functions that were not compiled using fat variables


Referencing Local Shadow State by Name
• The compiler creates a separate variable to store shadow state for local variables:
  – quick to access; no table lookup is necessary
  – the original variable is not modified in any form
  – created only for local variables that need shadow state
• A shadow state table is still needed for:
  – heap variables
  – aliased local variables (those used with the address-of (&) operator)
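A sketch of what shadow state referenced by name might look like after instrumentation, written by hand rather than produced by the compiler; the names store_instrumented, n_lb, and n_ub are hypothetical. Because the address of n is never taken, its bounds can live in plain locals that may stay in registers, and the array-reference check still runs over the whole interval rather than the single value seen in this run:

```c
#include <limits.h>
#include <stdbool.h>

/* Original (buggy) code, sketched:
 *     int n = input;
 *     if (n > 15) return;
 *     buf[n] = 0;            negative n is never rejected
 *
 * Instrumented with shadow state referenced by name: two ordinary locals
 * n_lb / n_ub replace a hash-table entry keyed by &n. */
static void store_instrumented(char buf[16], int input, bool *report)
{
    int n = input;
    long n_lb = INT_MIN, n_ub = INT_MAX;   /* n holds input: full int range */

    if (n > 15)
        return;
    if (n_ub > 15)
        n_ub = 15;                         /* fall-through path: n <= 15 */

    /* array-reference check over every value n could take */
    if (n_lb < 0 || n_ub > 15)
        *report = true;
    buf[n] = 0;                            /* original store, unchanged */
}
```

Calling it with a benign input such as 3 still sets the report, because negative values were never ruled out; adding n < 0 to the guard would narrow n_lb to 0 and silence it.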


Results: Shadow State by Name (Performance)

[Figure: performance improvement (0% to 100%) for anagram, ft, ks, yacr2, betaftpd, ghttpd, openssh, thttpd, and the average, comparing shadow state by name, useless instrumentation removal, and both optimizations.]


Results: Shadow State by Name (Integer Shadow State Table Accesses)

[Figure: reduction (%) in integer shadow state table accesses for anagram, ft, ks, yacr2, betaftpd, ghttpd, openssh, thttpd, and the average, comparing shadow state by name, useless instrumentation removal, and both optimizations.]


Overall Performance Results (time in seconds; ratio = time / baseline)

           Baseline   Unoptimized      Useless inst. removed   Shadow state by name   Both
Program    Time       Time    Ratio    Time    Ratio           Time    Ratio          Time   Ratio
anagram    0.06       3.15    52.50    1.32    22.00           2.24    37.33          1.12   18.67
ft         0.18       5.32    29.56    0.88    4.89            2.95    16.39          0.90   5.00
ks         0.05       3.96    79.20    0.45    9.00            2.28    45.60          0.33   6.60
yacr2      0.12       22.63   188.58   11.87   98.92           14.53   121.08         8.96   74.67
betaftpd   0.07       0.53    7.57     0.27    3.86            0.29    4.14           0.18   2.57
ghttpd     0.52       1.08    2.08     0.69    1.33            0.73    1.40           0.59   1.13
openssh    0.70       1.00    1.43     0.91    1.30            0.83    1.19           0.78   1.11
thttpd     0.15       2.57    17.13    1.78    11.87           2.14    14.27          1.82   12.13
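The 58% improvement cited in the conclusion is consistent with averaging, over the eight programs, the fractional run-time reduction from the unoptimized checker to the version with both optimizations. A sketch of that arithmetic using the table's values, assuming that is how the figure was computed (the slides do not say); the array and function names are hypothetical:

```c
#include <stddef.h>

/* Run times in seconds from the table, in row order
 * (anagram, ft, ks, yacr2, betaftpd, ghttpd, openssh, thttpd). */
static const double unoptimized[] = { 3.15, 5.32, 3.96, 22.63, 0.53, 1.08, 1.00, 2.57 };
static const double both[]        = { 1.12, 0.90, 0.33,  8.96, 0.18, 0.59, 0.78, 1.82 };

/* Mean over programs of the fractional reduction 1 - after/before. */
static double mean_reduction(const double *before, const double *after, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += 1.0 - after[i] / before[i];
    return sum / (double)n;
}
```

The per-program reductions range from 22% (openssh) to about 92% (ks), and their mean is about 0.578. A total-time or geometric-mean summary would give a different number.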


Conclusion
• Our dynamic approach detects input-related faults, reducing the dependence on the precise input.
• It shadows variables derived from input with additional state:
  – integers: upper and lower bounds
  – strings: maximum string size and known-null flag
• Found 17 bugs in 9 programs
  – including 2 known high-security faults in OpenSSH
• Reduced the checker's run time by 58% on average, by
  – removing unneeded instrumentation sites
  – improving shadow state management


Future Work
• Reduce the dependence on the control path
• Reduce performance overhead by eliminating redundant instrumentation
• Add symbolic analysis support
• Address these common scenarios:
  – pointer walking (manual string handling)
  – concatenating multiple strings into a single buffer
• Add static bug detection to prove operations safe
• Combine MUSE and Dflow into a single standalone tool
• Explore other correctness properties


Questions and Answers