Linux Kernel extensions to minimize effects of Software Aging - CLEI2010


Page 1: Linux Kernel extensions to minimize effects of Software Aging - CLEI2010

Outline Introduction Prototype Summary

Linux Kernel extensions to minimize effects of Software Aging

Ariel Sabiguero, Andrés Aguirre, Fabricio González

Daniel Pedraja, Agustín Van Rompaey

Instituto de Computación, Facultad de Ingeniería, Universidad de la República
J. Herrera y Reissig 565, Montevideo, Uruguay

{asabigue|aaguirre}@fing.edu.uy {fabgonz|danigpc|fenix.uy}@gmail.com

20/10/2010

A. Sabiguero, A. Aguirre, F. Gonzalez, D. Pedraja, A. Van Rompaey Linux Kernel extensions to minimize effects of Software Aging


1 Introduction
Concepts
Finer-grained rejuvenation

2 Prototype
Problem definition
Key challenges addressed
Kernel modifications performed
Validation
Performance testing

3 Summary
...ongoing work
finally...


Concepts

Soft Errors

A soft error is a transient failure in semiconductors that can cause a loss of data integrity in memory.

It manifests as a change to a program instruction or to a data value.

Soft errors do not permanently damage the system's hardware; the only damage is to the data being processed.


Concepts

Software Aging & Rejuvenation

The term software aging refers to the deterioration in the availability of OS resources caused by data corruption.

Software rejuvenation refers to proactive fault-management techniques that restore the system's internal state in order to prevent failures.


Finer-grained rejuvenation

A new approach

Instead of proactively rejuvenating a full process or system, we work at a finer grain.

We take advantage of the fact that program code and parts of program data remain constant during program execution.

We apply reactive rejuvenation to the constant areas of the system when they get modified.
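The reactive idea can be shown with a minimal sketch. It assumes a pristine in-memory copy of each constant region is available. The names `const_region` and `rejuvenate_if_modified` are hypothetical. According to the later slides, the prototype detects changes with CRC32 and repairs them with a Hamming code or by reloading from file, instead of keeping a full copy. This sketch therefore illustrates only the detect-then-restore control flow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor for a region that must stay constant while the
 * program runs (e.g. code or R.O. data). The pristine copy is assumed to
 * have been taken when the region was known to be correct. */
struct const_region {
    unsigned char *live;           /* memory actually used by the program */
    const unsigned char *pristine; /* known-good copy (assumption) */
    size_t len;
};

/* Reactive rejuvenation: do nothing while the region is intact; restore it
 * from the pristine copy only after a modification is detected.
 * Returns true if a repair was performed. */
static bool rejuvenate_if_modified(struct const_region *r)
{
    assert(r != NULL && r->live != NULL && r->pristine != NULL);
    if (memcmp(r->live, r->pristine, r->len) == 0)
        return false;              /* constant area unchanged: nothing to do */
    memcpy(r->live, r->pristine, r->len);
    return true;
}
```

Unlike proactive rejuvenation, this approach does no restore work at all while the checked area stays intact.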


Finer-grained rejuvenation

Relevance of R.O. memory

State-of-the-art software engineering practice discourages writing programs that modify their own instructions.

Modern systems allow certain sections of a program to be declared read-only, that is, constant throughout program execution.

Portions of code and data are marked R.O. at compile time.

Modern compilers support R.O. memory in their native executable formats (ELF, Executable and Linking Format, on Linux; PE, Portable Executable, on Windows), and the loader maps those sections without write permission.
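On Linux, you can observe the effect of compile-time R.O. marking from user space. The sketch below is Linux-specific. It looks up the address of a `const` table in `/proc/self/maps` and reports whether the mapping that contains it lacks write permission. `mapping_is_readonly` is an illustrative helper, not part of the prototype.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* A constant table: compilers emit it into a read-only section
 * (typically .rodata), which the ELF loader maps without write access. */
static const int lookup_table[4] = { 2, 3, 5, 7 };

/* Returns 1 if addr lies in a mapping without write permission,
 * 0 if the mapping is writable, -1 if no mapping was found. */
static int mapping_is_readonly(const void *addr)
{
    FILE *maps = fopen("/proc/self/maps", "r");
    if (maps == NULL)
        return -1;
    unsigned long target = (unsigned long)(uintptr_t)addr;
    char line[4096];
    int result = -1;
    while (fgets(line, sizeof line, maps) != NULL) {
        unsigned long start, end;
        char perms[5];
        /* Each line starts with "start-end perms ...", e.g. "...-... r--p". */
        if (sscanf(line, "%lx-%lx %4s", &start, &end, perms) != 3)
            continue;
        if (target >= start && target < end) {
            result = (perms[1] == 'w') ? 0 : 1;  /* second char is the write bit */
            break;
        }
    }
    fclose(maps);
    return result;
}
```

A writable global (in `.data` or `.bss`) would report 0, whereas `lookup_table` and the program's code should report 1.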


Problem definition

Objective & target platform

Detect and handle the occurrence of soft errors in R.O. memory.

Platform

O.S.: GNU/Linux, kernel 2.6.25.9
Distribution: openSUSE 11.0
Architecture: Intel x86


Key challenges addressed

Read-Only Memory in Linux

Characteristics

Frame granularity
Protection scheme: user space only
Frames shared between tasks

Read-only subset: frames mapped into one or more processes with read-only access in every mapping
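The selection rule on this slide amounts to a predicate over all mappings of a frame. The minimal sketch below assumes a simplified, hypothetical `frame_mapping` record. In the kernel, this information would come from the page tables and reverse mappings, which the sketch does not model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of one mapping of a physical frame. */
struct frame_mapping {
    int  pid;        /* task that maps the frame */
    bool writable;   /* whether this particular mapping allows writes */
};

/* A frame belongs to the read-only subset only if it is mapped by at least
 * one process and none of its mappings grants write access. */
static bool frame_in_readonly_subset(const struct frame_mapping *maps, size_t n)
{
    assert(n == 0 || maps != NULL);
    if (n == 0)
        return false;               /* unmapped frames are not monitored */
    for (size_t i = 0; i < n; i++)
        if (maps[i].writable)
            return false;           /* one writable mapping disqualifies it */
    return true;
}
```

A frame shared by several tasks therefore qualifies only when every one of those tasks maps it read-only.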


Key challenges addressed

Error detection mechanism

Memory change-detection algorithm

Frame level
Error detection code: CRC32
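Frame-level detection can be sketched as follows: record a CRC32 for each frame when it enters the read-only subset, then recompute it on every check. The slides do not say which CRC-32 variant or implementation the prototype uses. This sketch uses the common reflected IEEE 802.3 variant (polynomial `0xEDB88320`) and assumes 4096-byte frames. `crc32_ieee` and `frame_changed` are illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FRAME_SIZE 4096u  /* assumed x86 page/frame size */

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320, initial value and final
 * XOR 0xFFFFFFFF). Table-free and slow; production code would normally use
 * a table-driven or hardware-assisted version. */
static uint32_t crc32_ieee(const uint8_t *data, size_t len)
{
    assert(data != NULL || len == 0);
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Frame-level change detection: compare against the checksum recorded
 * when the frame was first marked as read-only. */
static int frame_changed(const uint8_t *frame, uint32_t reference_crc)
{
    return crc32_ieee(frame, FRAME_SIZE) != reference_crc;
}
```

A CRC detects every single-bit change in the frame, but it does not locate the flipped bit. Correction is handled separately, by the error-handling actions.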

Search Strategies

System-level frame polling
Task-subset polling
Task scheduler checks
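System-level frame polling can be sketched as a cursor that walks the table of monitored frames, checking at most a fixed budget of frames per invocation and resuming where it stopped. The budget and the `frame_check_fn` callback are assumptions of this sketch, not details given in the slides. `simulated_check` stands in for a real CRC comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback: returns true if frame `idx` still matches its
 * reference checksum. */
typedef bool (*frame_check_fn)(size_t idx, void *ctx);

/* Polling state: a cursor over the monitored frames that wraps around. */
struct frame_poller {
    size_t nframes;
    size_t cursor;
};

/* Check at most `budget` frames, resuming where the previous call stopped.
 * Returns the index of the first corrupted frame found, or (size_t)-1. */
static size_t poll_frames(struct frame_poller *p, size_t budget,
                          frame_check_fn check, void *ctx)
{
    assert(p != NULL && check != NULL);
    if (p->nframes == 0)
        return (size_t)-1;
    for (size_t done = 0; done < budget; done++) {
        size_t idx = p->cursor;
        p->cursor = (p->cursor + 1) % p->nframes;
        if (!check(idx, ctx))
            return idx;
    }
    return (size_t)-1;
}

/* Stand-in checker for illustration: ctx is an array of "corrupted" flags. */
static bool simulated_check(size_t idx, void *ctx)
{
    const bool *corrupted = ctx;
    return !corrupted[idx];
}
```

The other strategies could plausibly change only which frames `check` is asked to examine. Task-subset polling would restrict the table to the frames of selected tasks, and scheduler checks would verify a task's frames around the time it is scheduled. Neither is modeled here.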


Key challenges addressed

Error Handling actions

Error correction code: Hamming
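The slide names a Hamming code for correction but not its parameters. The smallest instance, Hamming(7,4), shows the principle: three parity bits let the decoder locate and flip any single corrupted bit. This is an illustrative sketch only. A frame-level scheme would apply a longer Hamming code over larger blocks, and the prototype's exact layout is not described here.

```c
#include <assert.h>
#include <stdint.h>

/* Hamming(7,4): codeword positions 1..7 are stored in bits 0..6.
 * Positions 1, 2, 4 hold parity; positions 3, 5, 6, 7 hold data d0..d3. */
static uint8_t hamming74_encode(uint8_t nibble)
{
    assert(nibble < 16);
    uint8_t d0 = nibble & 1, d1 = (nibble >> 1) & 1,
            d2 = (nibble >> 2) & 1, d3 = (nibble >> 3) & 1;
    uint8_t p1 = d0 ^ d1 ^ d3;   /* covers positions 3, 5, 7 */
    uint8_t p2 = d0 ^ d2 ^ d3;   /* covers positions 3, 6, 7 */
    uint8_t p4 = d1 ^ d2 ^ d3;   /* covers positions 5, 6, 7 */
    return (uint8_t)(p1 | p2 << 1 | d0 << 2 | p4 << 3 |
                     d1 << 4 | d2 << 5 | d3 << 6);
}

/* Returns the corrected data nibble. The syndrome (XOR of the positions of
 * all set bits) is 0 for a valid codeword, or the position of a single
 * flipped bit otherwise. */
static uint8_t hamming74_decode(uint8_t code)
{
    uint8_t syndrome = 0;
    for (int pos = 1; pos <= 7; pos++)
        if ((code >> (pos - 1)) & 1)
            syndrome ^= (uint8_t)pos;
    if (syndrome != 0)
        code ^= (uint8_t)(1u << (syndrome - 1));  /* flip the erroneous bit */
    return (uint8_t)(((code >> 2) & 1) | ((code >> 4) & 1) << 1 |
                     ((code >> 5) & 1) << 2 | ((code >> 6) & 1) << 3);
}
```

Compared with CRC32 alone, the extra parity bits pay for the ability to repair a single-bit error in place without reloading anything.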

Automatic File Rejuvenation
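File rejuvenation relies on the fact that R.O. frames such as program code usually have a pristine source on disk: the file they were loaded from. The sketch below is a user-space analogue that re-reads a corrupted region from its backing file with POSIX `pread`. `rejuvenate_from_path` is a hypothetical helper. A kernel implementation would instead re-read the page through the page cache. Either way, the approach assumes the file has not changed on disk since it was mapped.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Restore `len` bytes of a corrupted file-backed region by re-reading them
 * from `path` at `file_offset`. Returns 0 on success, -1 on error. */
static int rejuvenate_from_path(const char *path, off_t file_offset,
                                unsigned char *region, size_t len)
{
    assert(path != NULL && region != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    size_t done = 0;
    int rc = 0;
    while (done < len) {
        ssize_t n = pread(fd, region + done, len - done,
                          file_offset + (off_t)done);
        if (n < 0 && errno == EINTR)
            continue;                 /* interrupted by a signal: retry */
        if (n <= 0) {                 /* I/O error or file shorter than expected */
            rc = -1;
            break;
        }
        done += (size_t)n;
    }
    close(fd);
    return rc;
}
```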

User space rejuvenation assistance

Error details ⇒ high-granularity actions
Agent notifications ⇒ synchronous actions


Kernel modifications performed

[Figure: Kernel Map]


Validation

Ensuring correctness of the implementation

Motivation: distinguish bugs from soft errors

Challenges

Low error probability in our typical scenario
Generating errors in hardware is difficult and expensive

Fault Injection

Software-based memory error simulation
Kernel-integrated vs. high-level
Exposed as a system call
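In the prototype, fault injection is exposed as a system call inside the kernel. The sketch below is only a user-space analogue. It flips one pseudo-randomly chosen bit of a buffer using a seeded xorshift32 generator, so every injected fault is reproducible and the detector's response can be compared against a known error location. `inject_bit_flip` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* xorshift32 PRNG: small and deterministic, so injected faults are
 * reproducible from the seed. The state must never be 0. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *state = x;
}

/* Simulate a soft error: flip exactly one pseudo-randomly chosen bit of buf.
 * Returns the absolute bit offset that was flipped. */
static size_t inject_bit_flip(unsigned char *buf, size_t len, uint32_t *seed)
{
    assert(buf != NULL && len > 0 && seed != NULL && *seed != 0);
    size_t bit = (size_t)xorshift32(seed) % (len * 8);
    buf[bit / 8] ^= (unsigned char)(1u << (bit % 8));
    return bit;
}
```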


Performance testing

Case study

We evaluated the impact on an I/O-bound application and on a CPU-bound one.

Methodologically, we compare benchmarks run on the modified kernel against a standard (vanilla) one.

The performance impact on a task depends on the resources it uses:

Memory corruption correction routines barely compete with I/O-bound loads.
CPU-bound applications compete with them for the same resource (the CPU), which impacts system performance.
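The comparison reduces to timing the same workload under both kernels and reporting the relative slowdown. Below is a minimal user-space harness that assumes wall-clock time as the metric. The slides do not state which benchmarks or metrics were used, and `cpu_bound` is only a stand-in workload.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Wall-clock duration of one run of `workload`, in seconds. */
static double time_run(void (*workload)(void))
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    workload();
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (double)(t1.tv_sec - t0.tv_sec) +
           (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}

/* Relative slowdown of the modified kernel vs. vanilla, in percent. */
static double overhead_pct(double t_vanilla, double t_modified)
{
    assert(t_vanilla > 0.0);
    return (t_modified - t_vanilla) / t_vanilla * 100.0;
}

/* Stand-in CPU-bound workload; the volatile sink keeps it from being
 * optimized away. */
static volatile unsigned long sink;
static void cpu_bound(void)
{
    unsigned long acc = 0;
    for (unsigned long i = 0; i < 10000000ul; i++)
        acc += i * i;
    sink = acc;
}
```

In practice, you would run the same binary several times on each kernel and compare medians, since a single run is sensitive to noise.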


Performance testing

Case study: performance results

[Figure panels: I/O-bound workload | CPU-bound workload]


...ongoing work

Future work

Address the loss of cache locality.

Consider the power consumption caused by continuous 100% CPU usage.

Focus on embedded solutions

Improve CPU usage (with a different approach than on desktops).
Test on architectures other than x86.

Wish: to be able to test in environments with a higher probability of soft errors (e.g., strong electromagnetic interference, EMI).


finally...

Conclusions

We built and tested a prototype with the expected characteristics.

The rejuvenation mechanism is implemented in software, instead of the traditional hardware-based scheme.

For the kind of errors addressed, our approach avoids a full system restart or a full process restart.

Being simple and non-intrusive, it is applicable to any piece of (Linux) software without modification.


finally...

Thank you for your time

Questions?
