SAFE SOFTWARE UPDATES VIA MULTI-VERSION EXECUTION
srg.doc.ic.ac.uk/files/slides/mx-hotswup-13.pdf

Slide 1: Petr Hosek and Cristian Cadar

Petr Hosek is a recipient of the Google European Fellowship in Software Engineering, and this research is supported in part by this Google Fellowship.


Slide 2

[Timeline figure: monthly ticks across 2009 and 2010]

HTTP ETag hash value computation in etag_mutate, as first shown:

    for (h = 0, i = 0; i < etag->used; ++i)
        h = (h << 5) ^ (h >> 27) ^ (etag->ptr[i]);

HTTP ETag hash value computation in etag_mutate, as shown next on the timeline:

    for (h = 0, i = 0; i < etag->used - 1; ++i)
        h = (h << 5) ^ (h >> 27) ^ (etag->ptr[i]);

Bug diagnosed in issue tracker

File (re)compression in mod_compress_physical, as first shown:

    etag_mutate(con->physical.etag, srv->tmp_buf);

File (re)compression in mod_compress_physical, as shown next on the timeline:

    if (use_etag) {
        etag_mutate(con->physical.etag, srv->tmp_buf);
    }
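The change of the loop bound from `etag->used` to `etag->used - 1` on this slide looks harmless but is not if `used` is unsigned and can be 0. The following is a minimal sketch of that effect, assuming a buffer whose `used` field is a `size_t` that counts the trailing NUL. The type `buf_t`, the helper names and the 32-bit hash width are hypothetical stand-ins, not lighttpd's actual definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a string buffer; not lighttpd's real type. */
typedef struct {
    const char *ptr;  /* buffer contents */
    size_t used;      /* bytes in use, assumed to count the trailing NUL */
} buf_t;

/* Loop bound as written in the changed code: wraps to SIZE_MAX when used == 0. */
static size_t changed_loop_bound(const buf_t *b)
{
    assert(b != NULL);
    return b->used - 1;  /* unsigned arithmetic, so 0 - 1 == SIZE_MAX */
}

/* Same hash, with an explicit guard so an empty buffer is never indexed. */
static uint32_t etag_hash_guarded(const buf_t *b)
{
    assert(b != NULL);
    uint32_t h = 0;
    if (b->used == 0)
        return h;  /* nothing to hash */
    for (size_t i = 0; i < b->used - 1; ++i)
        h = (h << 5) ^ (h >> 27) ^ (uint32_t)(unsigned char)b->ptr[i];
    return h;
}
```

With an empty buffer, `changed_loop_bound` returns `SIZE_MAX`, so an unguarded loop would index far past the buffer. The guard in `etag_hash_guarded` is one possible way to avoid that; the slide itself shows a guard at the call site (`if (use_etag)`) instead.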

Slide 3: A year ago in a city far far away...

Introducing a novel approach for improving software updates:
- Multi-version execution based approach
- Relying on abundance of resources to improve reliability
- Run the new version in parallel with the existing one
- Synchronise the execution of the versions
- Use the output of the correctly executing version

Slide 4: Synchronisation and fail-recovery mechanism

Versions shown: LIGHTTPD 1.4.22 and LIGHTTPD 1.4.23

Request:

    GET /index.html HTTP/1.1
    Host: srg.doc.ic.ac.uk
    Accept-Encoding: gzip

Code snippets shown, in order of appearance:

    for (h = 0, i = 0; i < etag->used; ++i)
        h = (h << 5) ^ (h >> 27) ^ (etag->ptr[i]);

    for (h = 0, i = 0; i < etag->used - 1; ++i)
        h = (h << 5) ^ (h >> 27) ^ (etag->ptr[i]);

    for (h = 0, i = 0; i < etag->used; ++i)
        h = (h << 5) ^ (h >> 27) ^ (etag->ptr[i]);

Crash: Segmentation fault

Steps:
- Synchronisation: compare individual system calls and their arguments
- Checkpointing: use clone to take a snapshot of a process
- Failure recovery: restart the snapshot and replace the code with the code of the new version
- Reconvergence: return to the original code and continue execution
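The checkpointing and failure-recovery steps on this slide can be illustrated at user level. The sketch below is not Mx's mechanism: it uses `fork` (a simpler relative of `clone`) and simplifies the roles, so that the parent process plays the checkpoint and falls back to another function when the forked copy crashes. The names `run_with_checkpoint`, `crashing_version` and `working_version` are hypothetical.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef int (*version_fn)(int);

/* Hypothetical version that crashes, and one that works. */
static int crashing_version(int x) { (void)x; raise(SIGSEGV); return -1; }
static int working_version(int x)  { return x * 2; }

/*
 * Run `primary` in a forked copy of the process. If the copy is killed by a
 * signal, the parent (whose state is still that of the checkpoint) falls back
 * to `fallback`.
 */
static int run_with_checkpoint(version_fn primary, version_fn fallback, int arg)
{
    int fds[2];
    int result = 0;

    assert(primary != NULL && fallback != NULL);
    if (pipe(fds) != 0)
        return fallback(arg);

    pid_t pid = fork();           /* snapshot: child gets a copy-on-write image */
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return fallback(arg);
    }
    if (pid == 0) {               /* child: try the primary version */
        close(fds[0]);
        int r = primary(arg);
        ssize_t n = write(fds[1], &r, sizeof r);
        _exit(n == (ssize_t)sizeof r ? 0 : 1);
    }

    close(fds[1]);
    int status = 0;
    ssize_t got = read(fds[0], &result, sizeof result);
    close(fds[0]);
    waitpid(pid, &status, 0);

    if (got == (ssize_t)sizeof result && WIFEXITED(status) && WEXITSTATUS(status) == 0)
        return result;            /* primary succeeded */
    return fallback(arg);         /* crash detected: recover with the other code */
}
```

Calling `run_with_checkpoint(crashing_version, working_version, 21)` detects the crash in the child and returns the result of `working_version`. A process-level snapshot like this is cheap because the kernel copies pages only when they are written.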

Slide 5: Assumptions

Recovery is considered successful if the versions exhibit the same externally observable behaviour after recovery:
- Assumes small bug propagation distance
- Crashes are the only type of observable divergence
- The non-crashing version is used as an oracle
- If unrecoverable, continue with the non-crashing version

Slide 6: Synchronisation possible at multiple levels of abstraction

[Figure: spectrum between Total Synchronisation and Uncoordinated Execution, with the levels Function Calls, System Calls and Inputs/Outputs marked on it]

Slide 7: System calls define external behaviour

VERSION 1

    void fib(int n)
    {
        int f[n+1];
        f[1] = f[2] = 1;
        for (int i = 3; i <= n; ++i)
            f[i] = f[i-1] + f[i-2];
        printf("%d\n", f[n]);
    }

VERSION 2

    void fib(int n)
    {
        int a = 1, b = 1;
        for (int i = 3; i <= n; ++i) {
            int c = a + b;
            a = b, b = c;
        }
        printf("%d\n", b);
    }

Example testing code, tested with both implementations:

    int main(int argc, char **argv)
    {
        fib(5);
        fib(6);
    }

Snippet of system call trace, obtained using the strace tool:

VERSION 1

    write(1, "5\n", 2) = 2
    write(1, "8\n", 2) = 2

VERSION 2

    write(1, "5\n", 2) = 2
    write(1, "8\n", 2) = 2

Slide 8: External behaviour evolves sporadically

95% of revisions introduce no change.

[Figure: normalised difference (0 to 1) per lighttpd Subversion revision, revisions 2379 to 2635; series: Traces, Source code]

Taken on Linux kernel 2.6.40 and glibc 2.14 using the strace tool and custom post-processing (details in the paper). Measured using the lighttpd regression suite on 164 revisions.

Slide 9: Mx architecture

[Figure labels: CONVENTIONAL APPLICATION on OPERATING SYSTEM (LINUX KERNEL); MULTI-VERSION APPLICATION on Mx Execution Environment (STATIC ANALYSIS, RUNTIME MANIPULATION, SYSTEM CALL INTERPOSITION)]

Slide 10

- Implementation for x86 and x86-64 Linux
- Combines binary static analysis, lightweight checkpointing and runtime code patching
- Completely transparent, runs on unmodified binaries
- Runs two versions with small differences in behaviour
- Focus on application crashes and recovery

Slide 11: Multi-eXecution Monitor

Execute and monitor multi-version applications:
- Intercepting system calls (via the ptrace interface)
- Semantically comparing system call arguments
- Environment virtualisation (e.g. files and sockets)

[Figure labels: MULTI-VERSION APPLICATION, Mx Execution Environment (STATIC ANALYSIS, RUNTIME MANIPULATION, SYSTEM CALL INTERPOSITION)]
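Semantic comparison of system call arguments means comparing what a call does rather than its raw register values, since pointers differ between versions even when the data is identical. The sketch below shows one way this could look for a write-like call, assuming the monitor has already copied the buffer out of each traced process. The type `syscall_record` and the function `syscalls_match` are hypothetical, not part of Mx.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record of an intercepted write-like call; not Mx's real type. */
typedef struct {
    long nr;            /* system call number */
    int fd;             /* file descriptor argument */
    const void *buf;    /* data copied out of the traced process */
    size_t len;         /* number of bytes in buf */
} syscall_record;

/*
 * Compare by meaning: same call, same descriptor, same bytes. The buffer
 * addresses are deliberately ignored because they differ between versions.
 */
static bool syscalls_match(const syscall_record *a, const syscall_record *b)
{
    assert(a != NULL && b != NULL);
    if (a->nr != b->nr || a->fd != b->fd || a->len != b->len)
        return false;
    return a->len == 0 || memcmp(a->buf, b->buf, a->len) == 0;
}
```

Two `write(1, "5\n", 2)` calls issued from different buffers compare equal under this check, while `write(1, "8\n", 2)` does not match either of them.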

Slide 12: Runtime Execution Manipulator

Runtime code patching and fault recovery:
- OS-level checkpointing (using the clone syscall)
- Runtime stack rewriting (libunwind)
- Breakpoint insertion and handling

[Figure labels: MULTI-VERSION APPLICATION, Mx Execution Environment (STATIC ANALYSIS, RUNTIME MANIPULATION, SYSTEM CALL INTERPOSITION)]
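Breakpoint insertion on x86 usually means overwriting one instruction byte with the one-byte `int3` opcode (0xCC) and remembering the original byte so it can be restored. The sketch below does this on an in-memory copy of the code; a real tool would write into the traced process instead, which is not shown. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INT3_OPCODE 0xCC  /* one-byte x86 breakpoint instruction (int3) */

/* Overwrite one byte with int3 and return the byte that was there. */
static uint8_t insert_breakpoint(uint8_t *code, size_t offset)
{
    assert(code != NULL);
    uint8_t saved = code[offset];
    code[offset] = INT3_OPCODE;
    return saved;
}

/* Put the original byte back once the breakpoint has been handled. */
static void remove_breakpoint(uint8_t *code, size_t offset, uint8_t saved)
{
    assert(code != NULL);
    code[offset] = saved;
}
```

The saved byte must be kept per breakpoint, because the instruction it belongs to has to be restored before execution can resume at that address.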

Slide 13: Static Executable Analyser

Create various mappings between the binaries of the two versions:
- Extracting function symbols from binaries (libbfd)
- Machine code disassembling and analysis (libopcodes)
- Binary call graph reconstruction and matching

[Figure labels: MULTI-VERSION APPLICATION, Mx Execution Environment (STATIC ANALYSIS, RUNTIME MANIPULATION, SYSTEM CALL INTERPOSITION)]

Slide 14

Snippet of instruction code, VERSION 1:

    0xdeadbeef <foo>:
      f59: callq 0xdeadcafe <bar>
    0xdeadcafe <bar>:
      b07: mov -0x40(%rbp),%rax
      b0a: callq *%rax

Execution stack, VERSION 1: %rsp 0xdeadbf5e

Snippet of instruction code, VERSION 2:

    0xdeadbef3 <foo>:
      f5e: callq 0xdeadcaff <bar>
    0xdeadcaff <bar>:
      b07: mov -0x40(%rbp),%rax
      b0a: callq *%rax

Execution stack, VERSION 2: %rsp 0xdeadbf64

Snippet of instruction code, VERSION 2' (final build):

    0xdeadbeef <foo>:
      f59: callq 0xdeadcafe <bar>
    0xdeadcafe <bar>:
      aff: int $3
      b07: mov -0x40(%rbp),%rax
      b0a: callq *%rax

Execution stack, VERSION 2': %rsp 0xdeadbf5e
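The stack values on this slide change from 0xdeadbf64 to 0xdeadbf5e when one version takes over the other's code, which suggests that return addresses are translated between the two binaries. The sketch below shows the idea on a copied stack, assuming a precomputed table of equivalent return addresses. The type `addr_map` and the function `rewrite_return_addresses` are hypothetical, and scanning every slot is a simplification: a real implementation would unwind frame by frame so that ordinary data is never mistaken for an address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pair of equivalent return addresses in the two binaries. */
typedef struct {
    uintptr_t from;  /* return address in the version being rewritten */
    uintptr_t to;    /* matching return address in the other version */
} addr_map;

/*
 * Walk a copied stack and translate every slot that holds a known return
 * address. Returns how many slots were rewritten. Slots that do not match any
 * mapping are left untouched.
 */
static size_t rewrite_return_addresses(uintptr_t *stack, size_t slots,
                                       const addr_map *map, size_t entries)
{
    assert(stack != NULL && map != NULL);
    size_t rewritten = 0;
    for (size_t i = 0; i < slots; ++i) {
        for (size_t j = 0; j < entries; ++j) {
            if (stack[i] == map[j].from) {
                stack[i] = map[j].to;
                ++rewritten;
                break;
            }
        }
    }
    return rewritten;
}
```

With a single mapping from 0xdeadbf64 to 0xdeadbf5e, a stack holding 0xdeadbf64 in one slot ends up holding 0xdeadbf5e there, and all other slots keep their values.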

Slide 15

Suitable for the following types of changes and applications:
- Changes which do not affect memory layout (e.g., refactorings, security patches)
- Applications which provide synchronisation points (e.g., servers structured around the main dispatch loop)
- Scenarios where reliability is more important than performance (e.g., interactive apps, some server scenarios)

Slide 16: Survived a number of crash bugs in several popular server applications

Redis (in-memory NoSQL database): regression bug #344 introduced during refactoring

HMGET command implementation in hmgetCommand function, as first shown:

    robj *o = lookupKeyRead(c->db, c->argv[1]);
    if (o == NULL) {
        addReplySds(c,sdscatprintf(sdsempty(), "*%d\r\n",c->argc-2));
        for (i = 2; i < c->argc; i++) {
            addReply(c,shared.nullbulk);
        }
        return;
    } else {
        if (o->type != REDIS_HASH) {
            addReply(c,shared.wrongtypeerr);
            return;
        }
    }
    addReplySds(c,sdscatprintf(sdsempty(), "*%d\r\n",c->argc-2));

HMGET command implementation in hmgetCommand function, as shown next:

    robj *o, *value;
    o = lookupKeyRead(c->db,c->argv[1]);
    if (o != NULL && o->type != REDIS_HASH) {
        addReply(c,shared.wrongtypeerr);
        return;
    }
    addReplySds(c,sdscatprintf(sdsempty(), "*%d\r\n",c->argc-2));
    for (i = 2; i < c->argc; i++) {
        if (o != NULL && (value = hashGet(o,c->argv[i])) != NULL) {
            addReplyBulk(c,value);
            decrRefCount(value);
        } else {
            addReply(c,shared.nullbulk);
        }
    }

Annotation: Missing return statement

Slide 17

Interactive applications:

UTILITY                 BUG                        TIME SPAN
md5sum, sha1sum         Buffer underflow           1,124 revs. (1 year 7 months)
mkdir, mkfifo, mknod    NULL-pointer dereference   2,937 revs. (over 4 years)
cut                     Buffer overflow            1,201 revs. (2 years 3 months)

Server applications:

APPLICATION/ISSUE       BUG                        TIME SPAN
lighttpd #2169          Loop index underflow       87 revs. (2 months 2 days)
lighttpd #2140          Off-by-one error           12 revs. (2 months 1 day)
redis #344              Missing return statement   27 revs. (6 days)

Slide 18: 17.91% overhead on SPEC CPU2006 over single version despite 2x utilisation cost

[Figure: normalised execution time (0 to 2), Native vs Mx, for the SPEC CPU2006 benchmarks 400.perlbench, 401.bzip2, 403.gcc, 429.mcf, 445.gobmk, 456.hmmer, 458.sjeng, 462.libquantum, 464.h264ref, 471.omnetpp, 473.astar, 483.xalancbmk, 410.bwaves, 416.gamess, 433.milc, 434.zeusmp, 435.gromacs, 436.cactusADM, 437.leslie3d, 444.namd, 447.dealII, 450.soplex, 453.povray, 454.calculix, 459.GemsFDTD, 465.tonto, 470.lbm, 481.wrf, 482.sphinx3]

Taken on 3.50 GHz Intel Xeon E3 1280 with 16 GB of RAM, Linux kernel 3.1.9. Measured using SPEC CPU2006 1.2.

Slide 19

Interactive applications:

UTILITY                 INPUT SIZE                 OVERHEAD
md5sum, sha1sum         <1.25MB                    <100ms (imperceptible)
mkdir, mkfifo, mknod    <115 nested directories    <100ms (imperceptible)
cut                     <1.10MB                    <100ms (imperceptible)

Measured using Coreutils 6.10. Taken on 3.50 GHz Intel Xeon E3 1280 with 16 GB of RAM, Linux kernel 3.1.9.

Server applications:

APPLICATION             SCENARIO                   OVERHEAD
lighttpd                localhost/network          2.60x – 3.49x
lighttpd                distant networks           1.01x – 1.04x
redis                   localhost/network          3.74x – 16.72x
redis                   distant networks           1.00x – 1.05x

Taken on 3.50 GHz Intel Xeon E3 1280 with 16 GB of RAM, Linux kernel 3.1.9. Measured using redis-benchmark and http_load.

Slide 20: "The New Mx"

- Better performance overhead: system call binary rewriting
- Tolerance to system call divergences: event streaming

Slide 21

Snippet of system call trace, REDIS run on its own:

    read(6, "PING\r\n", 1024)

Snippet of system call trace, MX:

    ptrace(PTRACE_GETREGS, 7, {...}, NULL)
    ptrace(PTRACE_SETREGS, 7, {...}, {...})
    ptrace(PTRACE_SYSCALL, 7, {...}, NULL)
    read(8, "PING\r\n", 1024)
    ptrace(PTRACE_GETREGS, 7, {...}, NULL)
    process_vm_writev(7, {?}, 1, {?}, 1, 0)
    ptrace(PTRACE_SETREGS, 7, {...}, {...})
    ptrace(PTRACE_SYSCALL, 7, {...}, NULL)

Snippet of system call trace, REDIS under MX:

    --- SIGTRAP ---
    getpid()
    --- SIGTRAP ---

[Figure labels: VMA, VMA]

Slide 22

Snippet of instruction code, as first shown:

    REDIS
    0x4050f0 <anetRead>:
      405130: callq <read@plt>

    GLIBC
    0xdeadbeef <__libc_read>:
      2a: mov $0x0,%eax
      2f: syscall

Snippet of instruction code, as shown next:

    GLIBC
    0xdeadbeef <__libc_read>:
      2a: jmpq $0x13cd0

    NX
    0x13cd0 <syscall_enter>:
      13d31: cmp $0x1,%r10
      13d3a: callq *%r10

[Figure label: VMA]
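On this slide the `mov $0x0,%eax` and `syscall` pair in `__libc_read` is replaced by a jump to a handler. A minimal sketch of such a patch follows, assuming the standard x86-64 encodings (`jmp rel32` is 0xE9 plus a 32-bit little-endian displacement, `mov $imm32,%eax` is 5 bytes, `syscall` is 2 bytes). The function `patch_jump`, the NOP padding and the example addresses are assumptions for illustration, not the tool's actual rewriting code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define JMP_REL32_LEN 5   /* opcode 0xE9 followed by a 32-bit displacement */
#define NOP_OPCODE 0x90

/*
 * Overwrite `patch_len` bytes at code[offset] with a near jump to `target`.
 * `base` is the address at which code[0] is mapped. Leftover bytes are padded
 * with NOPs. Returns 0 on success, -1 if the patch does not fit or the target
 * is out of rel32 range.
 */
static int patch_jump(uint8_t *code, size_t offset, size_t patch_len,
                      uint64_t base, uint64_t target)
{
    assert(code != NULL);
    if (patch_len < JMP_REL32_LEN)
        return -1;

    /* Displacement is relative to the address of the next instruction. */
    int64_t rel = (int64_t)target - (int64_t)(base + offset + JMP_REL32_LEN);
    if (rel < INT32_MIN || rel > INT32_MAX)
        return -1;

    uint32_t disp = (uint32_t)(int32_t)rel;
    code[offset] = 0xE9;
    for (int i = 0; i < 4; ++i)
        code[offset + 1 + i] = (uint8_t)(disp >> (8 * i));  /* little-endian */
    for (size_t i = JMP_REL32_LEN; i < patch_len; ++i)
        code[offset + i] = NOP_OPCODE;
    return 0;
}
```

The 7 bytes of the original instruction pair leave room for the 5-byte jump, which is why the patch can be applied in place. Avoiding a trap into a tracer on every call is the usual motivation for this kind of rewriting.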

Slide 23: System call synchronisation possible at different phases

[Figure labels: Every system call (Mx), Record-replay, Event streaming (Nx)]

Slide 24

[Figure, shown over several builds: an Event log with entries En-14 to En, one APPLICATION labelled LEADER and one APPLICATION labelled FOLLOWER. Successive builds add further FOLLOWER applications and change which application is labelled LEADER.]
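The leader, follower and event log labels in this figure suggest a shared log that one process appends to and others read at their own pace. The sketch below models that as a bounded ring buffer with one read position per follower. It is single-threaded and entirely hypothetical (`event_log`, `leader_publish`, `follower_consume` and the capacities are invented for illustration); a shared-memory version would additionally need atomic updates of the positions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define LOG_CAPACITY 4
#define MAX_FOLLOWERS 2

/* Hypothetical in-memory event log shared by one leader and its followers. */
typedef struct {
    long events[LOG_CAPACITY];
    size_t head;                      /* total events published by the leader */
    size_t tail[MAX_FOLLOWERS];       /* total events consumed per follower */
} event_log;

/* Position of the slowest follower; the leader must not overwrite past it. */
static size_t slowest_tail(const event_log *log)
{
    size_t min = log->tail[0];
    for (size_t i = 1; i < MAX_FOLLOWERS; ++i)
        if (log->tail[i] < min)
            min = log->tail[i];
    return min;
}

/* Leader side: publish an event unless the log is full. */
static bool leader_publish(event_log *log, long event)
{
    assert(log != NULL);
    if (log->head - slowest_tail(log) == LOG_CAPACITY)
        return false;                 /* would overwrite an unread event */
    log->events[log->head % LOG_CAPACITY] = event;
    log->head++;
    return true;
}

/* Follower side: consume the next event, if the leader has published one. */
static bool follower_consume(event_log *log, size_t follower, long *out)
{
    assert(log != NULL && out != NULL && follower < MAX_FOLLOWERS);
    if (log->tail[follower] == log->head)
        return false;                 /* caught up with the leader */
    *out = log->events[log->tail[follower] % LOG_CAPACITY];
    log->tail[follower]++;
    return true;
}
```

Because each follower keeps its own position, the leader can run ahead by up to the log capacity, and it only stalls when the slowest follower has not yet consumed the oldest entry.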

Slide 25: Future Work

Support for more complex code changes:
- Data structure inference & excavation
- Control flow graph isomorphisms
- Call stack reconstruction

Support for non-crashing types of divergences:
- Infinite loops and deadlocks

Slide 26: Summary

Novel approach for improving software updates:
- Based on multi-version execution
- Mx can survive crash bugs in real apps

Many opportunities for future work:
- Better performance overhead
- Tolerance to system call divergences
- Support for more complex code changes
- Support for non-crashing types of divergences

Slide 27

Distinct code bases, manually generated:
- N-version programming: A fault-tolerance approach to reliability of software operation. Chen, L., and Avizienis, A. FTCS'78
- Using replicated execution for a more secure and reliable web browser. Xue, H., Dautenhahn, N., and King, S. T. NDSS'12

Variants of the same code, automatically generated:
- N-variant systems: a secretless framework for security through diversity. Cox, B., Evans, D., Filipi, A., Rowanhill, J., Hu, W., Davidson, J., Knight, J., Nguyen-Tuong, A., and Hiser, J. USENIX Security'06
- Run-time defense against code injection attacks using replicated execution. Salamat, B., Jackson, T., Wagner, G., Wimmer, C., and Franz, M. IEEE Transactions 2011

Online validation of different manually-evolved versions:
- Efficient online validation with delta execution. Tucek, J., Xiong, W., Zhou, Y. ASPLOS'09
- Tachyon: Tandem Execution for Efficient Live Patch Testing. Maurer, M., Brumley, D. USENIX Security'12