Fine-Grained Fault Tolerance using Device Checkpoints
Asim Kadav with Matthew Renzelmann and Michael M. Swift
University of Wisconsin-Madison
Thursday, March 7, 13
The (old) elephant in the room
★ Device drivers (from 3rd party developers) + OS kernel = recipe for disaster
Past work mostly looks at detection and isolation

| Improvement | System | Validation: drivers | Validation: buses | Validation: classes |
|---|---|---|---|---|
| New functionality | Shadow driver migration [OSR 09] | 1 | 1 | 1 |
| | RevNIC [EuroSys 10] | 1 | 1 | 1 |
| Reliability | Nooks [SOSP 03] | 6 | 1 | 2 |
| | XFI [OSDI 06] | 2 | 1 | 1 |
| | CuriOS [OSDI 08] | 2 | 1 | 2 |
| Type safety | SafeDrive [OSDI 06] | 6 | 2 | 3 |
| | Singularity [EuroSys 06] | 1 | 1 | 1 |
| Specification | Nexus [OSDI 08] | 2 | 1 | 2 |
| | Termite [SOSP 09] | 2 | 1 | 2 |
| Static analysis tools | Windows SDV [EuroSys 06] | All | All | All |
| | Coverity [CACM 10] | All | All | All |
| | Coccinelle [EuroSys 08] | All | All | All |

Large kernel subsystems and validation on only a few device types result in limited adoption of research solutions
Limited kernel changes + applicable to lots of drivers => real impact
Goal: Improve recovery with complete solutions that can be applied to many drivers
State of the art in recovery: Shadow drivers
★ Carburizer calls a generic recovery service if a check fails
★ Low-cost transparent recovery
  ★ Based on shadow drivers
  ★ Records the state of the driver at all times
  ★ Transparently restarts the driver and replays recorded state on failure
[Figure: a shadow driver taps the driver-kernel interface of the device driver, which sits above the device]
Swift [OSDI ’04]
Recovery Performance: Device initialization is slow
Device probe sequence: module registration → allocate device structures → map BAR and I/O ports → register device operations → detect chipset capabilities → cold boot hardware, flash device memory → verify EEPROM checksum → set chipset-specific ops → optional self test on boot → allocate driver structures → configure device to working state → device ready for requests
★ Multi-second device probe
  ★ Identify device
  ★ Cold boot device
  ★ Set up device/driver structures
  ★ Configuration/self-test
★ What does it hurt?
  ★ Fault tolerance: driver recovery
  ★ Virtualization: live migration, cloning, consolidation
  ★ OS functions: boot, upgrade, NVM checkpoints
Driver Code Characteristics
[Figure: percentage of LOC (0-50%) per function category (init, cleanup, ioctl, config, power, error, proc, core, intr) for each class of char, block and net drivers]
★ Initialization/cleanup: 36%
★ Core I/O & interrupts: 23%
★ Device configuration: 15%
★ Power management: 7.4%
★ Device ioctl: 6.2%
Initialization code dominates driver LOC and adds to complexity
★ “Understanding Modern Device Drivers” ASPLOS 2012
Recovery works by interposing on class-defined entry points
★ Class definition includes:
  ★ Callbacks registered with the bus, device and kernel subsystem
[Figure: a network driver between the bus, the network card and the kernel's net device subsystem, with entry points probe, xmit and config]
How many drivers follow class behavior?
Restart/replay doesn’t work with all drivers
★ Non-class behavior stems from load-time parameters, procfs and sysfs interactions, and unique ioctls

```c
... qlcnic_sysfs_write_esw_config(...)
{
    ...
    switch (esw_cfg[i].op_mode) {
    case QLCNIC_PORT_DEFAULTS:
        qlcnic_set_eswitch_...(..., &esw_cfg[i]);
        ...
    case QLCNIC_ADD_VLAN:
        qlcnic_set_vlan_config(..., &esw_cfg[i]);
        ...
    case QLCNIC_DEL_VLAN:
        esw_cfg[i].vlan_id = 0;
        qlcnic_set_vlan_config(..., &esw_cfg[i]);
        ...
```

drivers/net/qlcnic/qlcnic_main.c: QLogic driver (network class)
★ “Understanding Modern Device Drivers” ASPLOS 2012
★ Results as measured by our analyses:
  ★ 36% of drivers use load-time parameters
  ★ 16% of drivers use procfs/sysfs support
★ Overall, 44% of drivers do not conform to class behavior, and recovery will not work correctly for these drivers
Limitations of restart/replay recovery
[Figure: a shadow driver taps the driver-kernel interface of the device driver, which sits above the device]
★ Device save/restore is limited to restart/replay
  ★ Slow: device initialization is complex (multiple seconds)
  ★ Incomplete: unique device semantics are not captured
  ★ Hard: must be written for every class of drivers
  ★ Large changes: introduces new, large kernel subsystems
Checkpoint/restore of device and driver state removes the need to reboot the device and replay state
Fine-Grained Fault Tolerance (FGFT)
Goal: A fault isolation and recovery system based on a “pay as you go” failure model
Fine-grained isolation
★ Ability to run selected entry points as transactions
Checkpoint-based recovery
★ Provides fast and correct recovery semantics
★ Requires incremental changes to drivers and has low overhead
Outline
Introduction
Fine-grained isolation
Checkpoint based recovery
Conclusion
FGFT overview
Static modifications
★ Input: driver with checkpoint support, plus user-supplied annotations
★ Source transformation (adds driver transactions) produces a main driver module and an SFI driver module (SFI = software fault isolated)
Run-time support
★ Communication and recovery support (1200 LOC): object tracking, marshaling/demarshaling, kernel undo log
Fault model in FGFT
★ Provide fault tolerance to specific driver entry points
[Figure: network driver with entry points probe, xmit and config, above the network card]
★ Can be applied to untested code and to statically and dynamically detected suspicious entry points
★ Detect and recover from:
  ★ Memory errors like NULL pointer accesses
  ★ Structural errors like malformed structures
  ★ Processor exceptions like divide by zero, stack corruption
Transactional support through code generation
★ Generate code to run driver invocations on a separate stack with a copy of parameters
★ Reduce copy overhead by copying only referenced fields in driver and kernel structures to a range table
★ Instrument all memory references in the SFI module to compare accesses against copied fields in the range table
[Figure: get ringparam of the network driver runs in the SFI network driver; the referenced fields netdev->priv->tx_ring and netdev->priv->rx_ring are recorded in the range table]

| Address | Access rights |
|---|---|
| 0xffffa000 | Read |
| 0xffffa008 | Write |
| 0xffffa00a | Read |
Resource access during isolated execution
★ Device registers and I/O memory
  ★ Grant drivers full access to devices
  ★ Restore device checkpoint in case of failure
★ Locks: spinlocks and semaphores
  ★ Grant read access to locks
  ★ Maintain kernel log of locks acquired
  ★ Release locks at the end of the entry point or on failure
★ Kernel resources like memory
  ★ All allocations generate a range table entry
  ★ Maintain kernel log of all acquired resources
  ★ Free resources on failure
[Figure: a malloc() in the isolated driver adds an entry to the range table]
![Page 34: Fine-grained fault tolerance using device checkpoints](https://reader033.fdocuments.in/reader033/viewer/2022052307/5584595ed8b42adf748b52de/html5/thumbnails/34.jpg)
Outline
17
Introduction
Conclusion
Fine-grained isolation
Checkpoint based recovery
Thursday, March 7, 13
Checkpointing drivers is hard
★ Existing mechanisms are limited to capturing memory state
[Figure: checkpointing a network driver and its network card]
★ Device state is not captured:
  ★ Device configuration space
  ★ Internal device registers and counters
  ★ Memory buffer addresses used for DMA
  ★ Unique for every class, bus and vendor
Device checkpoint/restore from PM code
Suspend: save config state → save register state → disable device → copy-out s/w state → suspend device
Resume: restore config state → restore register state → restore s/w state & reset → re-attach/enable device → device ready
Dropping the steps that disable, suspend and re-attach/enable the device turns this sequence into a checkpoint/restore pair:
Checkpoint: save config state → save register state → copy-out s/w state
Restore: restore config state → restore register state → restore s/w state & reset
Suspend/resume code provides device checkpoint functionality
Intuition with power management
★ Intuition: power management code captures device-specific state for every driver
★ Our study: present in 76% of all common classes
★ Refactor power management code for device checkpoints
  ★ Correct: the developer captures unique device semantics
  ★ Fast: avoids the device probe, keeping recovery latency low for applications
[Figure: suspend() saves the Linux driver's device state to RAM; resume() restores it to the device]
Synergy of isolation and recovery
★ Goal: Improve driver recovery with minor changes to drivers
★ Solution: Run drivers as transactions using device checkpoints
  ★ Device state: developers export checkpoint (C) and restore (R) in drivers
  ★ Driver state: run driver invocations as memory transactions
  ★ Use source transformation to copy parameters and run on a separate stack (network driver → SFI network driver)
Execution model
★ Checkpoint device
★ Execute driver code as memory transactions
★ On failure, roll back and restore device
★ Re-use existing device locks in the driver
![Page 61: Fine-grained fault tolerance using device checkpoints](https://reader033.fdocuments.in/reader033/viewer/2022052307/5584595ed8b42adf748b52de/html5/thumbnails/61.jpg)
SFInetwork
driver
Example transactional execution
22
networkdriver
probe
xmit
get config
get ringparam
Thursday, March 7, 13
![Page 62: Fine-grained fault tolerance using device checkpoints](https://reader033.fdocuments.in/reader033/viewer/2022052307/5584595ed8b42adf748b52de/html5/thumbnails/62.jpg)
SFInetwork
driver
Example transactional execution
22
networkdriver
probe
xmit
get config
get ringparam netdev
Thursday, March 7, 13
![Page 63: Fine-grained fault tolerance using device checkpoints](https://reader033.fdocuments.in/reader033/viewer/2022052307/5584595ed8b42adf748b52de/html5/thumbnails/63.jpg)
SFInetwork
driver
Example transactional execution
22
networkdriver
probe
xmit
get config
get ringparam
C
netdev
Thursday, March 7, 13
![Page 64: Fine-grained fault tolerance using device checkpoints](https://reader033.fdocuments.in/reader033/viewer/2022052307/5584595ed8b42adf748b52de/html5/thumbnails/64.jpg)
SFInetwork
driver
Example transactional execution
22
networkdriver
netdev->priv->tx_ring
probe
xmit
get config
get ringparam
netdev->priv->rx_ringC
netdev
Thursday, March 7, 13
![Page 65: Fine-grained fault tolerance using device checkpoints](https://reader033.fdocuments.in/reader033/viewer/2022052307/5584595ed8b42adf748b52de/html5/thumbnails/65.jpg)
SFInetwork
driver
Address Access rights
0xffffa000 Read
0xffffa008 Write
0xffffa00a Read
Example transactional execution
22
networkdriver
netdev->priv->tx_ring
probe
xmit
get config
get ringparam
netdev->priv->rx_ringRange TableC
netdev
Thursday, March 7, 13
![Page 66: Fine-grained fault tolerance using device checkpoints](https://reader033.fdocuments.in/reader033/viewer/2022052307/5584595ed8b42adf748b52de/html5/thumbnails/66.jpg)
SFInetwork
driver
Address Access rights
0xffffa000 Read
0xffffa008 Write
0xffffa00a Read
Example transactional execution
22
networkdriver
netdev->priv->tx_ring
probe
xmit
get config
get ringparam
netdev->priv->rx_ringRange TableC
netdev
Thursday, March 7, 13
![Page 67: Fine-grained fault tolerance using device checkpoints](https://reader033.fdocuments.in/reader033/viewer/2022052307/5584595ed8b42adf748b52de/html5/thumbnails/67.jpg)
SFInetwork
driver
Address Access rights
0xffffa000 Read
0xffffa008 Write
0xffffa00a Read
Example transactional execution
22
networkdriver
netdev->priv->tx_ring
probe
xmit
get config
get ringparam
netdev->priv->rx_ringRange TableC
netdev
Thursday, March 7, 13
![Page 68: Fine-grained fault tolerance using device checkpoints](https://reader033.fdocuments.in/reader033/viewer/2022052307/5584595ed8b42adf748b52de/html5/thumbnails/68.jpg)
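Example transactional execution
22
[Diagram: the SFI network driver exposes the entry points probe, xmit, get config and get ringparam; the kernel calls get ringparam, passing netdev]
★ C: the device is checkpointed before the entry point runs
★ The driver accesses netdev->priv->tx_ring and netdev->priv->rx_ring; the memory it may touch is tracked in a range table:

| Address | Access rights |
|---|---|
| 0xffffa000 | Read |
| 0xffffa008 | Write |
| 0xffffa00a | Read |

★ Kernel calls the driver makes, such as alloc, are recorded in a kernel log
★ On success, the result is returned to the kernel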
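The range table can be sketched as a list of address ranges with access rights that each driver load or store is checked against. The start addresses below are the ones on the slide; the range lengths, the `range_check` helper and the rule that Write also permits reads are assumptions made for illustration, not details given in the talk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum access { ACC_READ = 1, ACC_WRITE = 2 };

struct range {
    uintptr_t start;
    size_t len;       /* assumed lengths: the slide shows only start addresses */
    unsigned rights;  /* bitmask of enum access */
};

/* Start addresses and rights from the slide; Write is assumed to imply Read. */
static const struct range range_table[] = {
    {0xffffa000, 8, ACC_READ},
    {0xffffa008, 2, ACC_READ | ACC_WRITE},
    {0xffffa00a, 6, ACC_READ},
};

/* SFI-style check (hypothetical helper): an access of `size` bytes at `addr`
   is allowed only if it lies entirely inside one range granting `want`. */
bool range_check(uintptr_t addr, size_t size, unsigned want)
{
    for (size_t i = 0; i < sizeof range_table / sizeof range_table[0]; i++) {
        const struct range *r = &range_table[i];
        if (addr >= r->start && size <= r->len &&
            addr - r->start <= r->len - size)
            return (r->rights & want) == want;
    }
    return false;  /* address not in the table: the access is a fault */
}
```

An access that straddles two ranges or falls outside every range is rejected, which is the conservative choice for isolation.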
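Example failed transaction
23
[Diagram: the kernel again calls get ringparam on the SFI network driver, passing netdev]
★ C: the device is checkpointed; the driver's accesses to netdev->priv->tx_ring and netdev->priv->rx_ring are checked against the range table:

| Address | Access rights |
|---|---|
| 0xffffa000 | Read |
| 0xffffa008 | Write |
| 0xffffa00a | Read |

★ The driver's alloc is recorded in the kernel log
★ The driver then fails: the logged alloc is undone, err is returned to the kernel, and R restores the device from its checkpoint
★ FGFT provides transactional execution of driver entry points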
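The kernel log that lets the failed transaction undo its alloc can be sketched as an undo log of allocations: commit simply forgets the log, while abort frees every recorded allocation, newest first. This is a user-space approximation using `malloc`/`free` in place of kernel allocators; `struct kernel_log`, `logged_alloc`, `log_commit` and `log_abort` are hypothetical names, and the sketch covers allocation only.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical kernel log: records allocations made inside a transaction. */
#define LOG_CAP 16

struct kernel_log {
    void *allocs[LOG_CAP];
    int n;
};

/* Allocation wrapper the driver calls instead of the raw allocator. */
void *logged_alloc(struct kernel_log *klog, size_t size)
{
    assert(klog);
    if (klog->n == LOG_CAP)
        return NULL;               /* log full: fail the allocation */
    void *p = malloc(size);
    if (p)
        klog->allocs[klog->n++] = p;
    return p;
}

/* Commit: the allocations become permanent, so just discard the log. */
void log_commit(struct kernel_log *klog)
{
    klog->n = 0;
}

/* Abort: undo the logged allocations in reverse order. */
void log_abort(struct kernel_log *klog)
{
    while (klog->n > 0)
        free(klog->allocs[--klog->n]);
}
```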
Outline
25
Introduction
Fine-grained isolation
Checkpoint-based recovery
Evaluation & Conclusion
Recovery speedup
26

| Driver | Class | Bus | Restart recovery | FGFT recovery | Speedup (×) |
|---|---|---|---|---|---|
| 8139too | net | PCI | 0.31 s | 70 μs | 4400 |
| e1000 | net | PCI | 1.80 s | 295 ms | 6 |
| r8169 | net | PCI | 0.12 s | 40 μs | 3000 |
| pegasus | net | USB | 0.15 s | 5 ms | 30 |
| ens1371 | sound | PCI | 1.03 s | 115 ms | 9 |
| psmouse | input | serio | 0.68 s | 410 ms | 1.65 |

FGFT provides significant speedup in driver recovery
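The speedup column is the ratio of restart recovery time to FGFT recovery time. The snippet below recomputes it from the table with all times converted to seconds; the results (for example about 4,429 for 8139too and about 8.96 for ens1371) round to the values on the slide. The struct and function names are illustrative only.

```c
#include <assert.h>
#include <stdio.h>

/* Recovery times from the table, converted to seconds. */
struct recovery {
    const char *driver;
    double restart_s;  /* full driver restart */
    double fgft_s;     /* FGFT checkpoint-based recovery */
};

static const struct recovery recovery_table[] = {
    {"8139too", 0.31, 70e-6},
    {"e1000",   1.80, 295e-3},
    {"r8169",   0.12, 40e-6},
    {"pegasus", 0.15, 5e-3},
    {"ens1371", 1.03, 115e-3},
    {"psmouse", 0.68, 410e-3},
};

/* Speedup = restart recovery time / FGFT recovery time. */
double speedup(const struct recovery *r)
{
    assert(r->fgft_s > 0.0);
    return r->restart_s / r->fgft_s;
}

/* Print one line per driver with its computed speedup. */
void print_speedups(void)
{
    for (size_t i = 0; i < sizeof recovery_table / sizeof recovery_table[0]; i++)
        printf("%-8s %9.2fx\n", recovery_table[i].driver,
               speedup(&recovery_table[i]));
}
```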
Static and dynamic fault injection
27

| Driver | Injected faults | Benign faults | Native crashes | FGFT crashes |
|---|---|---|---|---|
| 8139too | 43 | 0 | 43 | None |
| e1000 | 47 | 0 | 47 | None |
| r8169 | 36 | 0 | 36 | None |
| pegasus | 34 | 1 | 33 | None |
| ens1371 | 22 | 1 | 21 | None |
| psmouse | 46 | 0 | 46 | None |
| Total | 258 | 2 | 256 | None |

FGFT survives all of the injected static and dynamic faults
Programming effort
28
Columns: Driver, LOC, isolation annotations (driver annotations, kernel annotations), recovery additions (LOC moved, LOC added)
8139too 1,904 15 20 26 4
e1000 13,973 32 32 10
r8169 2,993 10 17 5
pegasus 1,541 26 12 22 5
ens1371 2,110 23 66 16 6
psmouse 2,448 11 19 19 6
FGFT requires limited programmer effort and needs only 38 lines of new kernel code
Throughput with isolation and recovery
29
e1000 network card, netperf on Intel quad-core machines; throughput as a percentage of the native baseline (844 Mbps)

| Configuration | Throughput (% of baseline) | CPU |
|---|---|---|
| Native | 100 | 2.4% |
| FGFT-off-I/O | 100 | 2.4% |
| FGFT-I/O-1/2 | 96 | 2.9% |
| FGFT-I/O-all | 93 | 3.4% |

FGFT can isolate and recover high-bandwidth devices at low overhead without adding kernel subsystems
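To read the percentages as absolute numbers, multiply by the 844 Mbps baseline: 96% is about 810 Mbps and 93% about 785 Mbps, while CPU utilization rises by at most one percentage point (2.4% to 3.4%). A small helper performing the conversion, with the slide's figures hard-coded (the names are illustrative):

```c
#include <assert.h>

/* e1000 netperf figures from the slide. */
#define BASELINE_MBPS 844.0

struct run {
    const char *config;
    double pct_of_baseline;  /* throughput as % of native */
    double cpu_pct;          /* CPU utilization, % */
};

static const struct run runs[] = {
    {"Native",       100.0, 2.4},
    {"FGFT-off-I/O", 100.0, 2.4},
    {"FGFT-I/O-1/2",  96.0, 2.9},
    {"FGFT-I/O-all",  93.0, 3.4},
};

/* Convert a percentage of the baseline into Mbps. */
double absolute_mbps(const struct run *r)
{
    assert(r->pct_of_baseline >= 0.0);
    return BASELINE_MBPS * r->pct_of_baseline / 100.0;
}

/* Extra CPU relative to the native run, in percentage points. */
double extra_cpu_points(const struct run *r)
{
    return r->cpu_pct - runs[0].cpu_pct;
}
```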
Summary
30
★ Fine-grained fault tolerance based on a pay-as-you-go model
★ Provides fault tolerance at incremental performance cost and programmer effort
★ Introduces fast checkpointing for drivers
★ Device checkpoints average ~20 μs
★ Reduces recovery time significantly
★ Should be explored in domains beyond fault tolerance, such as fast reboot and upgrade