Instant OS Updates via Userspace Checkpoint-and-Restart Sanidhya Kashyap, Changwoo Min, Byoungyoung Lee, Taesoo Kim, Pavel Emelyanov

OS updates are prevalent

And OS updates are unavoidable

● Prevent known, state-of-the-art attacks – Security patches
● Adopt new features – New I/O scheduler features
● Improve performance – Performance patches

Unfortunately, system updates come at a cost

● Unavoidable downtime
● Potential risk of system failure

Cost: $109k per minute, plus hidden costs (losing customers)

Example: memcached

● Facebook's memcached servers incur a downtime of 2-3 hours per machine
– Warming cache (e.g., 120 GB) over the network

Our approach updates the OS in 3 secs for 32 GB of data, from v3.18 to v3.19, for Ubuntu / Fedora releases

Existing practices for OS updates

● Dynamic Kernel Patching (e.g., kpatch, ksplice)
– Problem: only supports minor patches
● Rolling Update (e.g., Google, Facebook, etc.)
– Problem: inevitable downtime and requires careful planning

Losing application state is inevitable → Restoring memcached takes 2-3 hours

Goals of this work:
● Support all types of patches
● Least downtime to update to a new OS
● No kernel source modification

Problems of typical OS update

[Diagram: memcached on the OS → Stop service → Soft reboot into the new OS → Start service → memcached on the new OS. Labels: 2-10 minutes of downtime; 2-3 hours of downtime]

Is it possible to keep the application state?

OS updates lose application states

KUP: Kernel update with application checkpoint-and-restore (C/R)

[Diagram: KUP's life cycle. Labels: Stop service, Checkpoint, In-kernel switch (to the new OS), Restore, Start service]

1-10 minutes of downtime

Challenge: how to further decrease the potential downtime?

Techniques to decrease the downtime

[Diagram: Checkpoint → In-kernel switch → Restore]

1) Incremental checkpoint
2) On-demand restore
3) FOAM: a snapshot abstraction
4) PPP: reuse memory without an explicit dump

Incremental checkpoint

● Reduces downtime (up to 83.5%)
● Problem: Multiple snapshots increase the restore time

[Timeline figure comparing the downtime of a naive checkpoint (a single snapshot, S1) with that of an incremental checkpoint (snapshots S1, S2, S3, S4). Si → snapshot instance]
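
The idea can be sketched in a few lines of Python. This is only a toy model, not KUP's or criu's code: memory is a dictionary of pages, dirty tracking is a plain set, and every name (`TrackedMemory`, `dump_dirty`, `naive_checkpoint`) is hypothetical. It shows why the last snapshot, the only one that has to be taken while the application is stopped, can be much smaller than a full dump.

```python
from typing import Dict, List, Set

Page = bytes
Snapshot = Dict[int, Page]  # page number -> page contents


class TrackedMemory:
    """Toy address space that records which pages were written since the last dump."""

    def __init__(self, num_pages: int) -> None:
        self.pages: Dict[int, Page] = {n: b"\0" for n in range(num_pages)}
        # Before the first dump every page counts as dirty.
        self.dirty: Set[int] = set(self.pages)

    def write(self, page_no: int, data: Page) -> None:
        self.pages[page_no] = data
        self.dirty.add(page_no)

    def dump_dirty(self) -> Snapshot:
        """Dump only the pages written since the previous dump, then reset tracking."""
        snap = {n: self.pages[n] for n in sorted(self.dirty)}
        self.dirty.clear()
        return snap


def naive_checkpoint(mem: TrackedMemory) -> List[Snapshot]:
    """One snapshot of every page, all of it taken while the application is stopped."""
    return [dict(mem.pages)]


mem = TrackedMemory(num_pages=8)
snapshots = [mem.dump_dirty()]        # first snapshot: all pages, application still running
mem.write(2, b"a")
mem.write(5, b"b")
snapshots.append(mem.dump_dirty())    # second snapshot: only pages 2 and 5
mem.write(5, b"c")
final = mem.dump_dirty()              # last snapshot: the only one taken during downtime
snapshots.append(final)

pages_in_downtime_naive = len(naive_checkpoint(mem)[0])
pages_in_downtime_incremental = len(final)
```

In this toy run the naive checkpoint dumps 8 pages during the downtime and the incremental one dumps 1, at the price of three snapshots that restore has to combine.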

On-demand restore

● Rebind the memory once the application accesses it
– Only map the memory region with snapshot and restart the application
● Decreases the downtime (up to 99.6%)
● Problem: Incompatible with incremental checkpoint
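
A rough illustration of the mapping step, assuming the snapshot is an ordinary file and using Python's standard `mmap` module rather than anything KUP-specific. The file layout (four pages, each filled with its page number) is made up for the example. Creating the mapping does not read the data; the OS brings a page in only when it is first touched.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE

# Hypothetical snapshot file: four pages, each filled with its own page number.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    for n in range(1, 5):
        f.write(bytes([n]) * PAGE)

with open(path, "rb") as f:
    # The mapping is set up without copying the file contents.
    # ACCESS_COPY makes it private (copy-on-write), so writes never reach the snapshot.
    region = mmap.mmap(f.fileno(), 4 * PAGE, access=mmap.ACCESS_COPY)

first_byte_of_page_3 = region[2 * PAGE]   # first access to page 3 brings it in
region[0] = 0xFF                          # private write to page 1

with open(path, "rb") as f:
    snapshot_first_byte = f.read(1)[0]    # the snapshot file itself is unchanged

region.close()
os.remove(path)
```

The application can resume as soon as the mapping exists, which is where the downtime reduction comes from; the cost of reading each page is paid later, on first access.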

Problem: both techniques together result in inefficient application C/R

● During restore, need to map each page individually
– Individual lookups to find the relevant pages
– Individual page mapping to enable on-demand restore
● Increases the restoration downtime (42.5%)

Example:
● An application has 4 pages as its working set size
● Incremental checkpoint has 2 iterations
– 1st iteration → all 4 pages (1, 2, 3, 4) are dumped
– 2nd iteration → 2 pages (2, 4) are dirtied

[Figure: snapshots S1 and S2 with pages 1, 2, 3, 4]
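
The four-page example can be replayed in Python to count the work restore has to do. This is a sketch under simple assumptions: each snapshot is a dictionary, page contents are placeholder strings, and `locate` is a hypothetical helper that searches the newest snapshot first.

```python
from typing import Dict, List, Tuple

# Snapshots from the example, as page number -> contents (contents are made up).
s1 = {1: "p1@S1", 2: "p2@S1", 3: "p3@S1", 4: "p4@S1"}   # 1st iteration: all 4 pages
s2 = {2: "p2@S2", 4: "p4@S2"}                            # 2nd iteration: dirtied pages
snapshots: List[Dict[int, str]] = [s1, s2]


def locate(page_no: int) -> Tuple[int, int]:
    """Return (snapshot index, lookups performed) for the newest copy of a page."""
    lookups = 0
    for idx in range(len(snapshots) - 1, -1, -1):   # newest snapshot first
        lookups += 1
        if page_no in snapshots[idx]:
            return idx, lookups
    raise KeyError(page_no)


plan = {n: locate(n) for n in (1, 2, 3, 4)}
total_lookups = sum(lookups for _, lookups in plan.values())
mappings = len(plan)   # one map operation per page
```

Pages 2 and 4 are found in S2 after one lookup each, while pages 1 and 3 need two lookups each to reach S1. That gives 6 lookups and 4 separate mappings for a 4-page application, and both numbers grow with the working set size and the number of iterations.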

New abstraction: file-offset based address mapping (FOAM)

● Flat address space representation for the snapshot
– One-to-one mapping between the address space and the snapshot
– No explicit lookups for the pages across the snapshots
– A few map operations to map the entire snapshot with address space
● Use sparse file representation
– Rely on the concept of holes supported by modern file systems
● Simplifies incremental checkpoint and on-demand restore
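
A small sketch of the file-offset idea, assuming a POSIX system and using only Python's standard library. It is not the FOAM implementation: the page numbering, the fill bytes and the helpers `offset_of` and `dump` are hypothetical. Each page is written at the file offset that matches its position in the address space, later iterations overwrite in place, and never-dumped ranges stay as holes that read back as zeros.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE
NUM_PAGES = 4   # flat address space of the toy application, pages 1..4


def offset_of(page_no: int) -> int:
    """File offset equals the page's offset in the address space (one-to-one)."""
    return (page_no - 1) * PAGE


def dump(fd: int, pages: dict) -> None:
    """Write each dumped page at its own offset; newer iterations overwrite in place."""
    for page_no, fill in pages.items():
        os.pwrite(fd, bytes([fill]) * PAGE, offset_of(page_no))


fd, path = tempfile.mkstemp()
os.ftruncate(fd, NUM_PAGES * PAGE)   # sparse file: untouched ranges stay as holes
dump(fd, {1: 0x11, 3: 0x33})         # 1st iteration (hypothetical contents)
dump(fd, {3: 0x34})                  # 2nd iteration: page 3 was dirtied

# Restore: one map operation covers the whole snapshot, with no per-page lookup.
region = mmap.mmap(fd, NUM_PAGES * PAGE, access=mmap.ACCESS_COPY)
restored = [region[offset_of(n)] for n in range(1, NUM_PAGES + 1)]
region.close()
os.close(fd)
os.remove(path)
```

Because the newest copy of a page always sits at a known offset, the incremental checkpoint needs no per-snapshot index, and the single private mapping gives on-demand restore for free.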

Redundant data copy

● Application C/R copies data back and forth
● Not a good fit for applications with huge memory

[Diagram: Running → Checkpoint → In-kernel switch → Restore → Running. Memcached's pages (1, 2, 3, 4) are dumped from RAM into snapshot S1 ("Dump data") and, on the new OS, read back from the snapshot into RAM ("Read data")]

Is it possible to avoid memory copy?

Avoid redundant data copy across reboot

● Reserve the application's memory across reboot
● Inherently rebind the memory without any copy

[Diagram: Running → Checkpoint → In-kernel switch → Restore → Running, from the old OS to the new OS. Snapshot S1 is shown without page contents; memcached's pages (1, 2, 3, 4) stay in RAM]

● Running: memory actively used
● Checkpoint: reserve the memory in the OS
● In-kernel switch: reserve the same memory in the new OS
● Restore: implicitly map the memory region
● Running: memory again in use

Challenge: how to notify the newer OS without modifying its source?

Persist physical pages (PPP) without OS modification

● Reserve virtual-to-physical mapping information
– Static instrumentation of the OS binary
– Inject our own memory reservation function, then further boot the OS
● Handle page-faults for the restored application
– Dynamic kernel instrumentation
– Inject our own page fault handler function for memory binding
● No explicit memory copy
● Does not require any kernel source modification
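
The two injected pieces can be pictured with a userspace simulation. Everything here is an assumption made for illustration: `ram` stands in for physical memory that survives the in-kernel switch, and `OldOS`, `NewOS` and their methods are hypothetical names, not kernel interfaces. The real mechanism works through binary and dynamic kernel instrumentation, which this sketch does not attempt.

```python
from typing import Dict, List

# Toy physical memory: frame number -> contents. Its contents survive the switch.
ram: List[str] = ["free"] * 8


class OldOS:
    def __init__(self) -> None:
        self.page_table: Dict[int, int] = {}   # virtual page -> physical frame

    def write(self, vpage: int, frame: int, data: str) -> None:
        self.page_table[vpage] = frame
        ram[frame] = data

    def checkpoint_mappings(self) -> Dict[int, int]:
        """Record only the virtual-to-physical mapping; page contents are not dumped."""
        return dict(self.page_table)


class NewOS:
    def __init__(self, preserved: Dict[int, int]) -> None:
        # Stand-in for the injected reservation function that runs early in boot:
        # frames named in the preserved mapping are kept away from the allocator.
        self.preserved = preserved
        self.reserved_frames = set(preserved.values())
        self.page_table: Dict[int, int] = {}
        self.faults = 0

    def allocate_frame(self) -> int:
        """Hand out the lowest frame that is not reserved for the application."""
        return next(f for f in range(len(ram)) if f not in self.reserved_frames)

    def read(self, vpage: int) -> str:
        if vpage not in self.page_table:
            # Stand-in for the injected page fault handler: rebind the old frame.
            self.faults += 1
            self.page_table[vpage] = self.preserved[vpage]
        return ram[self.page_table[vpage]]


old = OldOS()
old.write(vpage=1, frame=5, data="key1=value1")
old.write(vpage=2, frame=3, data="key2=value2")
mappings = old.checkpoint_mappings()

new = NewOS(mappings)            # "reboot": ram is untouched, only the mapping is passed on
scratch = new.allocate_frame()   # kernel allocations avoid the reserved frames
value = new.read(1)              # first access faults and rebinds frame 5
value_again = new.read(1)        # already bound, no further fault
```

No page contents are copied at any point: the only state handed from the old OS to the new one is the small mapping table.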

Implementation

● Application C/R → criu
– Works at the namespace level
● In-kernel switch → kexec system call
– A mini boot loader that bypasses BIOS while booting
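
For orientation, the same two building blocks can be driven by hand with the stock `criu` and `kexec` command-line tools. The slides do not say how KUP invokes them, so the sequence below is an assumption that leaves out all of KUP's optimizations. The function name, PID and paths are hypothetical, and the function only builds the argument lists without running anything.

```python
from typing import List


def kup_like_commands(pid: int, images_dir: str, kernel: str, initrd: str) -> List[List[str]]:
    """Return argv lists for a plain checkpoint -> kexec -> restore sequence."""
    return [
        # Checkpoint the process tree rooted at pid into images_dir.
        ["criu", "dump", "-t", str(pid), "-D", images_dir],
        # Load the new kernel, reusing the current kernel command line.
        ["kexec", "-l", kernel, f"--initrd={initrd}", "--reuse-cmdline"],
        # Jump into the loaded kernel without going through the firmware.
        ["kexec", "-e"],
        # Run this one after the new kernel has booted, e.g. from an init script.
        ["criu", "restore", "-D", images_dir],
    ]


commands = kup_like_commands(
    pid=1234,                           # hypothetical PID of the service
    images_dir="/var/lib/kup/images",   # hypothetical; must survive the kernel switch
    kernel="/boot/vmlinuz-new",         # hypothetical path
    initrd="/boot/initrd-new.img",      # hypothetical path
)
```

The images directory has to live on storage that outlives the switch, and `kexec -e` starts the new kernel immediately, so anything that must be flushed to disk needs to be handled before that step.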

Evaluation

● How effective is KUP's approach compared to in-kernel hot patching?

● What is the effective performance of each technique during the update?

KUP can support major and minor updates in Ubuntu

● KUP supports 23 minor/4 major updates (v3.17–v4.1)
● However, kpatch can only update 2 versions
– kpatch failure scenarios: e.g., layout change in data structure

Updating OS with memcached

● PPP has the least degradation
● Storage also affects the performance

[Figure: memcached bandwidth (MB) over the timeline (sec) during the update, one panel per configuration: Basic - SSD, Incremental checkpoint - SSD, On-demand restore - SSD, FOAM - SSD, Basic - RP-RAMFS, Incremental checkpoint - RP-RAMFS, On-demand restore - RP-RAMFS, FOAM - RP-RAMFS, PPP]

Limitations

● KUP does not support checkpoint and restore of all socket implementations
– TCP, UDP and netlink are supported
● Failure during restoration
– System call removal or interface modification

Demo

Summary

● KUP: a simple update mechanism with application checkpoint-and-restore (C/R)
● Employs various techniques:
– New data abstraction for application C/R
– Fast in-kernel switching technique
– A simple mechanism to persist the memory

Thank you!

Backup Slides

Handling in-kernel states

● Handles namespace and cgroups
● ptrace() syscall to handle the blocking system calls, timers, registers, etc.
● Parasite code to fetch / put the application's states
● /proc file system exposes the required information for application C/R
● A new mode (TCP_REPAIR) allows handling the TCP connections

What cannot be checkpointed

● X11 applications
● Tasks with debugger attached
● Tasks running in compat mode (32 bit)

Possible changes after application C/R

● Per-task statistics
● Namespace IDs
● Process start time
● Mount point IDs
● Socket IDs (st_ino)
● VDSO

Suitable applications

● Suitable for all kinds of applications
● PPP approach supports all types of applications
– May fail to restore on the previous kernel
● FOAM is not a good candidate for write-intensive applications
– More confidence in safely restoring the application on the previous kernel

PPP works effectively

[Figure: downtime (sec, 0 to 90) versus WSS (GB, 0 to 72) with 50% write, for FOAM - SSD, FOAM - RP-RAMFS and PPP; annotated "Out of memory error"]

● FOAM on SSD → slow
● FOAM on RP-RAMFS → space inefficient