![Page 1: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network](https://reader033.fdocuments.in/reader033/viewer/2022052804/605373b0aee89535ae5c3d94/html5/thumbnails/1.jpg)
Instant OS Updates via Userspace Checkpoint-and-Restart
Sanidhya Kashyap, Changwoo Min, Byoungyoung Lee, Taesoo Kim, Pavel Emelyanov
![Page 2: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network](https://reader033.fdocuments.in/reader033/viewer/2022052804/605373b0aee89535ae5c3d94/html5/thumbnails/2.jpg)
OS updates are prevalent
![Page 3: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network](https://reader033.fdocuments.in/reader033/viewer/2022052804/605373b0aee89535ae5c3d94/html5/thumbnails/3.jpg)
And OS updates are unavoidable
● Prevent known, state-of-the-art attacks
– Security patches
● Adopt new features
– New I/O scheduler features
● Improve performance
– Performance patches
![Page 4: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network](https://reader033.fdocuments.in/reader033/viewer/2022052804/605373b0aee89535ae5c3d94/html5/thumbnails/4.jpg)
![Page 6: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network](https://reader033.fdocuments.in/reader033/viewer/2022052804/605373b0aee89535ae5c3d94/html5/thumbnails/6.jpg)
Unfortunately, system updates come at a cost
● Unavoidable downtime
● Potential risk of system failure
Cost of downtime: $109k per minute, plus hidden costs (losing customers)
![Page 8: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network](https://reader033.fdocuments.in/reader033/viewer/2022052804/605373b0aee89535ae5c3d94/html5/thumbnails/8.jpg)
Example: memcached
● Facebook's memcached servers incur a downtime of 2-3 hours per machine
– Warming cache (e.g., 120 GB) over the network
Our approach updates the OS in 3 seconds for 32 GB of data, from v3.18 to v3.19, for Ubuntu / Fedora releases
![Page 11: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network](https://reader033.fdocuments.in/reader033/viewer/2022052804/605373b0aee89535ae5c3d94/html5/thumbnails/11.jpg)
Existing practices for OS updates
● Dynamic Kernel Patching (e.g., kpatch, ksplice)
– Problem: only support minor patches
● Rolling Update (e.g., Google, Facebook, etc.)
– Problem: inevitable downtime and requires careful planning
Losing application state is inevitable → Restoring memcached takes 2-3 hours
Goals of this work:
● Support all types of patches
● Minimal downtime when updating to a new OS
● No kernel source modification
![Page 17: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network](https://reader033.fdocuments.in/reader033/viewer/2022052804/605373b0aee89535ae5c3d94/html5/thumbnails/17.jpg)
Problems of typical OS update
[Diagram: OS running Memcached → Stop service → Soft reboot → New OS → Start service → New OS running Memcached; annotated with "2-10 minutes of downtime" and "2-3 hours of downtime"]
Is it possible to keep the application state?
![Page 21: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network](https://reader033.fdocuments.in/reader033/viewer/2022052804/605373b0aee89535ae5c3d94/html5/thumbnails/21.jpg)
OS updates lose application state
[Diagram: the typical path (Stop service → Soft reboot → Start service) shown alongside KUP's path (Checkpoint → In-kernel switch → Restore), which carries Memcached from the OS to the New OS]
KUP: Kernel update with application checkpoint-and-restore (C/R)
![Page 24: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network](https://reader033.fdocuments.in/reader033/viewer/2022052804/605373b0aee89535ae5c3d94/html5/thumbnails/24.jpg)
OS updates lose application state
KUP: Kernel update with application checkpoint-and-restore (C/R)
[Diagram: KUP's life cycle, Checkpoint → In-kernel switch → Restore into the New OS, shown against the Stop service / Start service path; annotated with "1-10 minutes of downtime"]
Challenge: how to further decrease the potential downtime?
![Page 28: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network](https://reader033.fdocuments.in/reader033/viewer/2022052804/605373b0aee89535ae5c3d94/html5/thumbnails/28.jpg)
Techniques to decrease the downtime
[Diagram: Checkpoint → In-kernel switch → Restore]
1) Incremental checkpoint
2) On-demand restore
3) FOAM: a snapshot abstraction
4) PPP: reuse memory without an explicit dump
![Page 34: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network](https://reader033.fdocuments.in/reader033/viewer/2022052804/605373b0aee89535ae5c3d94/html5/thumbnails/34.jpg)
Incremental checkpoint
● Reduces downtime (up to 83.5%)
● Problem: Multiple snapshots increase the restore time
[Diagram: timeline of a naive checkpoint (single snapshot S1) versus an incremental checkpoint (snapshots S1, S2, S3, S4), each with its downtime marked; Si → snapshot instance]
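A minimal sketch of the incremental idea, assuming a toy model in which memory is a dictionary of pages and each round reports which pages the application dirtied. The names `incremental_checkpoint` and `restore` are hypothetical and are not taken from KUP or criu. Only the last, small snapshot would have to be taken while the application is stopped, and the restore has to walk every snapshot, which is the problem the slide names.

```python
def incremental_checkpoint(memory, dirty_rounds):
    """Simulate an incremental (pre-copy) checkpoint.

    memory: dict mapping page number -> bytes (live application pages);
        it is updated in place to model the application still running.
    dirty_rounds: list of dicts, each mapping page number -> new bytes
        written while the previous snapshot was being dumped.
    Returns the snapshots S1..Sn as a list of dicts.
    """
    snapshots = [dict(memory)]  # S1: full dump while the application runs
    for writes in dirty_rounds:
        memory.update(writes)  # the application dirties some pages
        # Later snapshots hold only the pages dirtied since the last dump.
        snapshots.append({page: memory[page] for page in writes})
    return snapshots


def restore(snapshots):
    """Replay snapshots from oldest to newest; newer pages overwrite older."""
    restored = {}
    for snap in snapshots:
        restored.update(snap)
    return restored
```

With four pages and two rounds that dirty two pages and then one page, the snapshots hold 4, 2, and 1 pages, so the stop-the-application dump shrinks from four pages to one.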
![Page 35: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network](https://reader033.fdocuments.in/reader033/viewer/2022052804/605373b0aee89535ae5c3d94/html5/thumbnails/35.jpg)
On-demand restore
● Rebind the memory once the application accesses it
– Only map the memory region to the snapshot, then restart the application
● Decreases the downtime (up to 99.6%)
● Problem: Incompatible with incremental checkpoint
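The lazy mapping described above can be approximated in userspace with a private file mapping. This is a sketch under assumptions, not KUP's restore path: the helper `map_snapshot` is hypothetical, and the snapshot is taken to be a plain file of raw page contents. The operating system reads a page from the file only when the mapping is first touched there.

```python
import mmap
import os


def map_snapshot(path):
    """Map a snapshot file privately; pages are read only when touched."""
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        # ACCESS_COPY creates a private copy-on-write view: reads fault
        # pages in from the file lazily, and writes stay in memory
        # without ever reaching the snapshot file.
        return mmap.mmap(fd, size, access=mmap.ACCESS_COPY)
    finally:
        # The mapping keeps its own reference to the file.
        os.close(fd)
```

The application can be restarted as soon as the mapping exists; the cost of loading each page is paid later, at its first access.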
![Page 37: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network](https://reader033.fdocuments.in/reader033/viewer/2022052804/605373b0aee89535ae5c3d94/html5/thumbnails/37.jpg)
Problem: both techniques together result in inefficient application C/R
● During restore, each page needs to be mapped individually
– Individual lookups to find the relevant pages
– Individual page mapping to enable on-demand restore
● Increases the restoration downtime (42.5%)
● An application has 4 pages as its working set size
● Incremental checkpoint has 2 iterations
– 1st iteration → all 4 pages (1, 2, 3, 4) are dumped
– 2nd iteration → 2 pages (2, 4) are dirtied
[Diagram: snapshot S1 holds pages 1, 2, 3, 4; snapshot S2 holds pages 2, 4]
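The four-page example can be traced with a small sketch. It assumes a simplified model in which each snapshot is just the set of page numbers it contains and one mapping call can cover a run of consecutive pages from the same snapshot; `plan_mappings` is a hypothetical helper, not part of any real tool.

```python
def plan_mappings(snapshots, pages):
    """Plan the mapping calls needed to restore `pages` on demand.

    snapshots: list of sets of page numbers, oldest first.
    pages: page numbers of the working set, in ascending order.
    Returns (first_page, last_page, snapshot_index) tuples, one per call.
    """
    plan = []
    for page in pages:
        # Individual lookup: the newest snapshot that contains this page.
        src = max(i for i, snap in enumerate(snapshots) if page in snap)
        if plan and plan[-1][2] == src and plan[-1][1] == page - 1:
            # Same snapshot and adjacent page: extend the previous call.
            plan[-1] = (plan[-1][0], page, src)
        else:
            plan.append((page, page, src))
    return plan
```

For S1 = {1, 2, 3, 4} and S2 = {2, 4}, the newest copies alternate between the two snapshots, so the plan degenerates to one lookup and one mapping per page.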
![Page 38: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network](https://reader033.fdocuments.in/reader033/viewer/2022052804/605373b0aee89535ae5c3d94/html5/thumbnails/38.jpg)
New abstraction: file-offset based address mapping (FOAM)
● Flat address space representation for the snapshot
– One-to-one mapping between the address space and the snapshot
– No explicit lookups for the pages across the snapshots
– A few map operations to map the entire snapshot with address space
● Use sparse file representation
– Rely on the concept of holes supported by modern file systems
● Simplifies incremental checkpoint and on-demand restore
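One way to read the FOAM bullets is sketched below, assuming a Unix system and a single contiguous region. Each page is written at the file offset equal to its address minus the region's base address, so a later incremental dump overwrites the older copy in place and the whole region can be restored with one mapping. The helpers `dump_pages` and `restore_region` are hypothetical names, not KUP's interface.

```python
import mmap
import os

PAGE = mmap.PAGESIZE


def dump_pages(fd, base_addr, pages):
    """Write each page at file offset (virtual address - base_addr).

    pages: dict mapping virtual address -> one page of bytes.
    Offsets that are never written remain holes in the sparse file.
    """
    for addr, data in pages.items():
        os.pwrite(fd, data, addr - base_addr)


def restore_region(fd, length):
    """Map the whole region with one call; holes read back as zeros."""
    return mmap.mmap(fd, length, access=mmap.ACCESS_COPY)
```

A usage pattern under these assumptions: open the snapshot file with `os.O_RDWR | os.O_CREAT`, call `os.ftruncate` to give it the size of the region, call `dump_pages` once per checkpoint iteration, and call `restore_region` once at restore time. No per-page lookup across snapshots is needed, because the newest copy of a page always sits at its own offset.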
![Page 46: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network](https://reader033.fdocuments.in/reader033/viewer/2022052804/605373b0aee89535ae5c3d94/html5/thumbnails/46.jpg)
Redundant data copy
● Application C/R copies data back and forth
● Not a good fit for applications with huge memory
[Diagram: Running → Checkpoint → In-kernel switch → Restore → Running. Memcached's pages (1, 2, 3, 4) are dumped from RAM into snapshot S1 ("Dump data"), and after the switch from the OS to the New OS they are read from the snapshot back into RAM ("Read data")]
Is it possible to avoid memory copy?
![Page 47: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network](https://reader033.fdocuments.in/reader033/viewer/2022052804/605373b0aee89535ae5c3d94/html5/thumbnails/47.jpg)
Avoid redundant data copy across reboot
● Reserve the application's memory across reboot
● Inherently rebind the memory without any copy
[Diagram: Running → Checkpoint → In-kernel switch → Restore → Running, with Memcached's pages (1, 2, 3, 4) staying in RAM while the Old OS is replaced by the New OS]
● Running: memory actively used
● Checkpoint: reserve the memory in the OS
● In-kernel switch: reserve the same memory in the new OS
● Restore: implicitly map the memory region
● Running: memory again in use
Challenge: how to notify the newer OS without modifying its source?
![Page 54: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network](https://reader033.fdocuments.in/reader033/viewer/2022052804/605373b0aee89535ae5c3d94/html5/thumbnails/54.jpg)
Persist physical pages (PPP) without OS modification
● Reserve virtual-to-physical mapping information
– Static instrumentation of the OS binary
– Inject our own memory reservation function, then continue booting the OS
● Handle page-faults for the restored application
– Dynamic kernel instrumentation
– Inject our own page fault handler function for memory binding
● No explicit memory copy
● Does not require any kernel source modification
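The two injected functions can be illustrated with a toy model. Everything here is an assumption made for illustration: `ToyKernel`, its fields, and the saved virtual-page-to-frame table are hypothetical, and a real kernel does none of this in Python. The sketch only shows the bookkeeping: frames listed in the saved table are kept away from the allocator at boot, and a page fault binds the virtual page to its old frame instead of copying data.

```python
class ToyKernel:
    """Toy model of a freshly booted kernel that honours reserved frames."""

    def __init__(self, ram, reserved_frames=()):
        self.ram = ram  # physical frames; contents survive the kernel switch
        self.reserved = set(reserved_frames)  # frames the allocator must skip
        self.page_table = {}

    def alloc_frame(self):
        """Hand out the first frame that is neither reserved nor in use."""
        for pfn in range(len(self.ram)):
            if pfn not in self.reserved:
                self.reserved.add(pfn)
                return pfn
        raise MemoryError("no free frame")

    def handle_fault(self, vpage, saved_map):
        """Bind a faulting virtual page to its preserved physical frame."""
        pfn = saved_map[vpage]
        self.page_table[vpage] = pfn
        return self.ram[pfn]  # the same frame object, no data copied
```

In this model the only state carried across the switch is the small mapping table; the page contents never leave RAM.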
![Page 55: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network](https://reader033.fdocuments.in/reader033/viewer/2022052804/605373b0aee89535ae5c3d94/html5/thumbnails/55.jpg)
Implementation
● Application C/R → criu
– Works at the namespace level
● In-kernel switch → kexec system call
– A mini boot loader that bypasses BIOS while booting
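As a rough illustration of how the two building blocks fit together, the sketch below assembles command lines for the stock `criu` and `kexec` tools. It is not KUP's own tooling, which the slides describe as adding FOAM and PPP on top. The helper `update_commands`, the directory layout, and the kernel paths are hypothetical; the commands are only built, not executed.

```python
def update_commands(pid, image_dir, kernel, initrd):
    """Build (but do not run) the command sequence for one update.

    image_dir must live on storage that survives the kernel switch.
    The last command is meant to run after the new kernel has booted.
    """
    pre = f"{image_dir}/pre"
    final = f"{image_dir}/final"
    return [
        # Incremental step: dump memory while the process keeps running.
        ["criu", "pre-dump", "-t", str(pid), "-D", pre, "--track-mem"],
        # Final dump: only pages dirtied since the pre-dump are written.
        # --prev-images-dir is given relative to the -D directory.
        ["criu", "dump", "-t", str(pid), "-D", final,
         "--prev-images-dir", "../pre", "--track-mem"],
        # Load the new kernel, then jump into it without going through BIOS.
        ["kexec", "-l", kernel, f"--initrd={initrd}", "--reuse-cmdline"],
        ["kexec", "-e"],
        # After the switch: bring the application back from its images.
        ["criu", "restore", "-D", final],
    ]
```

A real deployment would also need flags that depend on the workload, for example for open TCP connections, which this sketch leaves out.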
![Page 56: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network](https://reader033.fdocuments.in/reader033/viewer/2022052804/605373b0aee89535ae5c3d94/html5/thumbnails/56.jpg)
Evaluation
● How effective is KUP's approach compared to in-kernel hot patching?
● What is the effective performance of each technique during the update?
![Page 57: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network](https://reader033.fdocuments.in/reader033/viewer/2022052804/605373b0aee89535ae5c3d94/html5/thumbnails/57.jpg)
KUP can support major and minor updates in Ubuntu
● KUP supports 23 minor/4 major updates (v3.17–v4.1)
● However, kpatch can only update 2 versions
– kpatch failure scenarios: e.g., layout change in data structure
![Page 58: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network](https://reader033.fdocuments.in/reader033/viewer/2022052804/605373b0aee89535ae5c3d94/html5/thumbnails/58.jpg)
Updating OS with memcached● PPP has the least degradation● Storage also affects the performance
0
50
100
150
190 200 210 220 230 240 250
Bandwidth (MB)
Timeline (sec)
Basic - SSD
![Page 59: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network](https://reader033.fdocuments.in/reader033/viewer/2022052804/605373b0aee89535ae5c3d94/html5/thumbnails/59.jpg)
Updating OS with memcached● PPP has the least degradation● Storage also affects the performance
0
50
100
150
190 200 210 220 230 240 250
Bandwidth (MB)
Timeline (sec)
Incremental checkpoint - SSD
![Page 60: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network](https://reader033.fdocuments.in/reader033/viewer/2022052804/605373b0aee89535ae5c3d94/html5/thumbnails/60.jpg)
Updating OS with memcached● PPP has the least degradation● Storage also affects the performance
0
50
100
150
190 200 210 220 230 240 250
Bandwidth (MB)
Timeline (sec)
On-demand restore - SSD
![Page 61: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network](https://reader033.fdocuments.in/reader033/viewer/2022052804/605373b0aee89535ae5c3d94/html5/thumbnails/61.jpg)
Updating OS with memcached● PPP has the least degradation● Storage also affects the performance
0
50
100
150
190 200 210 220 230 240 250
Bandwidth (MB)
Timeline (sec)
FOAM - SSD
![Page 62: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network](https://reader033.fdocuments.in/reader033/viewer/2022052804/605373b0aee89535ae5c3d94/html5/thumbnails/62.jpg)
[Figure: bandwidth (MB) over timeline (sec); panel: Basic - RP-RAMFS]
![Page 63: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network](https://reader033.fdocuments.in/reader033/viewer/2022052804/605373b0aee89535ae5c3d94/html5/thumbnails/63.jpg)
[Figure: bandwidth (MB) over timeline (sec); panel: Incremental checkpoint - RP-RAMFS]
![Page 64: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network](https://reader033.fdocuments.in/reader033/viewer/2022052804/605373b0aee89535ae5c3d94/html5/thumbnails/64.jpg)
[Figure: bandwidth (MB) over timeline (sec); panel: On-demand restore - RP-RAMFS]
![Page 65: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network](https://reader033.fdocuments.in/reader033/viewer/2022052804/605373b0aee89535ae5c3d94/html5/thumbnails/65.jpg)
[Figure: bandwidth (MB) over timeline (sec); panel: FOAM - RP-RAMFS]
![Page 66: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network](https://reader033.fdocuments.in/reader033/viewer/2022052804/605373b0aee89535ae5c3d94/html5/thumbnails/66.jpg)
[Figure: bandwidth (MB) over timeline (sec); panel: PPP]
![Page 67: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network](https://reader033.fdocuments.in/reader033/viewer/2022052804/605373b0aee89535ae5c3d94/html5/thumbnails/67.jpg)
[Figure: all panels side by side over timeline (sec, 200 to 250): Basic, Incremental checkpoint, On-demand restore and FOAM on SSD; the same four on RP-RAMFS; and PPP]
![Page 68: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network](https://reader033.fdocuments.in/reader033/viewer/2022052804/605373b0aee89535ae5c3d94/html5/thumbnails/68.jpg)
Limitations
● KUP does not support checkpoint and restore of all socket implementations
– TCP, UDP and netlink are supported
● Failure during restoration
– System call removal or interface modification
![Page 69: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network](https://reader033.fdocuments.in/reader033/viewer/2022052804/605373b0aee89535ae5c3d94/html5/thumbnails/69.jpg)
Demo
![Page 70: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network](https://reader033.fdocuments.in/reader033/viewer/2022052804/605373b0aee89535ae5c3d94/html5/thumbnails/70.jpg)
Summary
● KUP: a simple update mechanism with application checkpoint-and-restore (C/R)
● Employs various techniques:
– New data abstraction for application C/R
– Fast in-kernel switching technique
– A simple mechanism to persist the memory
![Page 71: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network](https://reader033.fdocuments.in/reader033/viewer/2022052804/605373b0aee89535ae5c3d94/html5/thumbnails/71.jpg)
Thank you!
![Page 72: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network](https://reader033.fdocuments.in/reader033/viewer/2022052804/605373b0aee89535ae5c3d94/html5/thumbnails/72.jpg)
Backup Slides
![Page 73: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network](https://reader033.fdocuments.in/reader033/viewer/2022052804/605373b0aee89535ae5c3d94/html5/thumbnails/73.jpg)
Handling in-kernel states
● Handles namespace and cgroups
● ptrace() syscall to handle the blocking system calls, timers, registers etc.
● Parasite code to fetch / put the application's states
● /proc file system exposes the required information for application C/R
● A new mode (TCP_REPAIR) allows handling the TCP connections
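As an illustration of the /proc bullet above, the sketch below parses lines in the format of /proc/<pid>/maps, one of the /proc files that lists a process's memory mappings. It is only an example of the kind of information /proc exposes; the names `Mapping`, `parse_maps_line` and `writable_bytes`, and the sample lines, are assumptions made for this example and are not taken from KUP.

```python
from typing import Iterable, NamedTuple


class Mapping(NamedTuple):
    start: int
    end: int
    perms: str
    offset: int
    path: str


def parse_maps_line(line: str) -> Mapping:
    """Parse one line in /proc/<pid>/maps format."""
    # Fields: address range, perms, offset, dev, inode, optional pathname.
    fields = line.split(maxsplit=5)
    start_hex, end_hex = fields[0].split("-")
    path = fields[5].strip() if len(fields) > 5 else ""
    return Mapping(int(start_hex, 16), int(end_hex, 16), fields[1],
                   int(fields[2], 16), path)


def writable_bytes(lines: Iterable[str]) -> int:
    """Total size of writable mappings, i.e. memory a process may have modified."""
    maps = (parse_maps_line(line) for line in lines if line.strip())
    return sum(m.end - m.start for m in maps if "w" in m.perms)


# Hypothetical sample in maps format (not captured from a real process).
SAMPLE = [
    "00400000-00452000 r-xp 00000000 08:02 173521 /usr/bin/memcached",
    "00651000-00652000 rw-p 00051000 08:02 173521 /usr/bin/memcached",
    "7ffd1000-7ffd3000 rw-p 00000000 00:00 0 [stack]",
]
```

On a Linux machine the same functions can be applied to a live process by passing the lines of /proc/self/maps (or another PID's file) instead of `SAMPLE`.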
![Page 74: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network](https://reader033.fdocuments.in/reader033/viewer/2022052804/605373b0aee89535ae5c3d94/html5/thumbnails/74.jpg)
What cannot be checkpointed
● X11 applications
● Tasks with debugger attached
● Tasks running in compat mode (32 bit)
![Page 75: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network](https://reader033.fdocuments.in/reader033/viewer/2022052804/605373b0aee89535ae5c3d94/html5/thumbnails/75.jpg)
Possible changes after application C/R
● Per-task statistics
● Namespace IDs
● Process start time
● Mount point IDs
● Socket IDs (st_ino)
● VDSO
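To make the socket ID item concrete: the st_ino value that fstat() reports for a socket identifies a kernel object, so a program that caches it should expect a different value after checkpoint and restore. The sketch below only shows how such a value is read; the helper name `socket_inode` is an assumption for this example.

```python
import os
import socket
import stat


def socket_inode(sock: socket.socket) -> int:
    """Return the inode number (st_ino) the kernel reports for a socket."""
    st = os.fstat(sock.fileno())
    # Guard against being handed a descriptor that is not a socket.
    if not stat.S_ISSOCK(st.st_mode):
        raise ValueError("file descriptor is not a socket")
    return st.st_ino
```

Calling `socket_inode(s)` on any open socket `s` returns the current ID; comparing values taken before and after a restore would reveal whether it changed.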
![Page 76: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network](https://reader033.fdocuments.in/reader033/viewer/2022052804/605373b0aee89535ae5c3d94/html5/thumbnails/76.jpg)
Suitable applications
● Suitable for all kinds of applications
● PPP approach supports all types of applications
– May fail to restore on the previous kernel
● FOAM is not a good candidate for write-intensive applications
– More confidence in safely restoring the application on the previous kernel
![Page 77: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network](https://reader033.fdocuments.in/reader033/viewer/2022052804/605373b0aee89535ae5c3d94/html5/thumbnails/77.jpg)
PPP works effectively
[Figure: downtime (sec, 0 to 90) vs. WSS (GB, 0 to 72) with 50% write; series: FOAM - SSD]
● FOAM on SSD → slow
● FOAM on RP-RAMFS → space inefficient
![Page 78: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network](https://reader033.fdocuments.in/reader033/viewer/2022052804/605373b0aee89535ae5c3d94/html5/thumbnails/78.jpg)
[Figure: downtime (sec) vs. WSS (GB) with 50% write; series: FOAM - SSD, FOAM - RP-RAMFS; annotation: Out of memory error]
![Page 79: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network](https://reader033.fdocuments.in/reader033/viewer/2022052804/605373b0aee89535ae5c3d94/html5/thumbnails/79.jpg)
[Figure: downtime (sec) vs. WSS (GB) with 50% write; series: FOAM - SSD, FOAM - RP-RAMFS, PPP; annotation: Out of memory error]