Good Enough Dependability: A Unified Paradigm for Dependable Systems Design Karthik Pattabiraman http://blogs.ubc.ca/karthik

Page 1: Good Enough Dependability: A Unified Paradigm for ...blogs.ubc.ca/karthik/files/2020/07/Good-enough-dsn-rsd.pdf · Good Enough Dependability: Takeaways •Errors and attacks are becoming

Good Enough Dependability: A Unified Paradigm for Dependable Systems Design

Karthik Pattabiraman

http://blogs.ubc.ca/karthik

Page 2:

Computer Systems are Everywhere

Dependability of computer systems is paramount

Page 3:

Traditional Dependability Approaches

Hardware Redundancy

• IBM mainframes, Tandem NonStop – full duplication

• Huge energy and performance overheads

Formal Verification

• Space exploration (e.g., NASA Mars rover)

• Requires significant time and resources, as well as expertise

Page 4:

The “Good Enough” Revolution

Source: WIRED Magazine (Sep 2009) – Robert Capps
http://www.wired.com/gadgets/miscellaneous/magazine/17-09/ff_goodenough

People prefer “cheap and good-enough” over “costly and near-perfect”

Can we build dependable systems with this principle?

Page 5:

“Good Enough” Dependable Systems

• Just reliable enough to get the job done
• Do not provide the illusion of perfection to the end user
• But do not fail catastrophically or cause severe errors
• Depends on the application and its context of use

Low Reliability: Entertainment Applications

High Reliability: Financial Services

Page 6:

[Diagram: Good Enough Dependability, with three application areas: Hardware Error Resilience, Web Application Reliability, and Selective Security Protection]

Page 8:

Why does this approach work?

[Figure: The Cost-Benefit Curve of Selective Duplication (Liquantum). Y-axis: SDC Coverage; X-axis: Protection Overhead]

[Diagram: Soft Error propagating through the Device/Circuit, Architectural, Operating System, and Application Levels, resulting in Impactful Errors]

Software protection techniques are more flexible and cost-effective!

About 80% of silent data corruptions (SDCs) can be mitigated with 20% overhead (80-20 rule)
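As a rough illustration of the kind of software protection this slide refers to, the Python sketch below duplicates one computation and compares the two results, which is the basic idea behind duplication-based error detection. It is not the author's implementation: the names `duplicate_and_compare`, `DetectedError`, `dot`, and the `fault` hook are all hypothetical, and the hook only simulates a transient error for demonstration.

```python
class DetectedError(Exception):
    """Raised when the two copies of a protected computation disagree."""


def dot(xs, ys):
    """Toy computation standing in for a critical piece of the program."""
    return sum(x * y for x, y in zip(xs, ys))


def duplicate_and_compare(compute, *args, fault=None):
    """Run `compute` twice and compare the results.

    `fault` is a hypothetical hook that corrupts the first result, standing in
    for a transient hardware error; real faults do not arrive this way.
    """
    primary = compute(*args)
    if fault is not None:
        primary = fault(primary)
    shadow = compute(*args)  # the duplicated computation
    if primary != shadow:
        raise DetectedError(f"mismatch: {primary} != {shadow}")
    return primary


# Fault-free run: both copies agree and the result is returned.
clean = duplicate_and_compare(dot, [1, 2], [3, 4])

# Faulty run: a single flipped bit is caught instead of propagating silently.
try:
    duplicate_and_compare(dot, [1, 2], [3, 4], fault=lambda v: v ^ 1)
    detected = False
except DetectedError:
    detected = True
```

Duplicating every computation roughly doubles the work, which is why the slide's cost-benefit argument favors applying such checks only where they catch the most SDCs.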

Page 9:

Good Enough Dependability: Approach

• Automated techniques to identify important data
• Selective protection to mitigate errors
• Rigorous validation through fault injection

Page 10:

Step 1: Automated Identification

- Type System [ASPLOS’11][CSF’11]

- Heuristics [DSN’13][TECS][DSN’15]

- Machine Learning [CASES’14][TECS]

- Analytical Models [DSN’16][DSN’18]

[Diagram: Application Data containing Critical Data; corruption due to errors]

Critical data is correlated with high-level static program characteristics

Page 11:

Step 2: Selective Protection

[Diagram: Original Program, Selective Duplication, Target Program]

Instruction: SDC Rate = X%, Overhead = Y%

A Knapsack Problem
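The knapsack framing on this slide can be sketched as follows: each instruction carries an SDC contribution and a duplication overhead, and the goal is to pick the subset with the highest total SDC coverage whose total overhead stays within a budget. The Python sketch below uses a standard 0/1 knapsack dynamic program. The instruction names, the percentage values, and the 20% budget are made-up illustrations, not data from the talk, and the talk does not say which solver the author uses.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    name: str
    sdc_contribution: int  # hypothetical share of the program's SDCs, in percent
    overhead: int          # hypothetical cost of duplicating it, in percent


def select_for_duplication(instrs, budget):
    """0/1 knapsack: maximise covered SDC share with total overhead <= budget.

    Returns (coverage, names of the chosen instructions).
    """
    # best[c] holds the best (coverage, chosen names) using overhead at most c.
    best = [(0, [])] * (budget + 1)
    for ins in instrs:
        # Iterate capacities downwards so each instruction is used at most once.
        for c in range(budget, ins.overhead - 1, -1):
            coverage, chosen = best[c - ins.overhead]
            if coverage + ins.sdc_contribution > best[c][0]:
                best[c] = (coverage + ins.sdc_contribution, chosen + [ins.name])
    return best[budget]


# Hypothetical per-instruction profile with a skewed SDC distribution.
profile = [
    Instr("load_idx", 40, 8),
    Instr("mul_acc", 30, 7),
    Instr("cmp_loop", 15, 6),
    Instr("store_tmp", 10, 9),
    Instr("add_dbg", 5, 10),
]

coverage, chosen = select_for_duplication(profile, budget=20)
print(coverage, chosen)  # 70 ['load_idx', 'mul_acc']
```

With these made-up numbers, protecting two of five instructions covers 70% of the SDCs for 15% overhead, which is the shape of trade-off the cost-benefit curve describes.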

Page 12:

Step 3: Fault Injection Validation

- LLFI [DSN’14][QRS’15]

- PINFI [DSN’14]

- GPU-Qin [ISPASS’14]

- LLFI-GPU [SC’16]

[Flowchart: Program Source Code, FI (fault injection) tool, Overall SDC rate of program, SDC rates of individual instructions, Evaluation (Acceptable?), Protection]

Trident, vTrident [DSN’18A][DSN’18B]
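To make the validation step concrete, here is a minimal Python sketch of a fault-injection campaign: flip one bit in a program's data, rerun it, and compare the output with a fault-free ("golden") run to classify the outcome as benign or as an SDC. It does not use or imitate the interfaces of LLFI, PINFI, or the other tools listed above. The toy `program`, the helper names, and the input values are hypothetical, and the sketch injects exhaustively into every bit position, whereas campaigns on real programs typically sample injection sites.

```python
def flip_bit(value, bit):
    """Return `value` with one bit inverted, modelling a single-bit soft error."""
    return value ^ (1 << bit)


def program(data, inject=None):
    """Toy workload: return the maximum of the input values.

    `inject` is an optional (index, bit) pair naming the bit to flip before
    the computation runs.
    """
    values = list(data)
    if inject is not None:
        index, bit = inject
        values[index] = flip_bit(values[index], bit)
    return max(values)


def run_campaign(data, bits=8):
    """Inject one fault per run at every (index, bit) site and tally outcomes."""
    golden = program(data)  # fault-free reference output
    counts = {"benign": 0, "sdc": 0}
    for index in range(len(data)):
        for bit in range(bits):
            outcome = program(data, inject=(index, bit))
            counts["benign" if outcome == golden else "sdc"] += 1
    return counts


counts = run_campaign([3, 200, 17, 64])
sdc_rate = counts["sdc"] / sum(counts.values())
print(counts, sdc_rate)  # {'benign': 24, 'sdc': 8} 0.25
```

In this toy case only faults in the largest element change the output, so three quarters of the injected faults are masked. That is the observation the approach relies on: many errors do not matter, and the per-site results show which data is worth protecting.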

Page 14:

[Diagram: Good Enough Dependability (Hardware Error Resilience, Web Application Reliability, Selective Security Protection) alongside Internet of Things (IoT) Dependability (Resilient Operation, Programming Models, Adaptive Security)]

Page 15:

Good Enough Dependability: Takeaways

• Errors and attacks are becoming common in commodity systems
• Cost is the all-important factor in these systems
• But most errors (attacks) don’t matter much in many cases!
• Important to focus on the few errors (attacks) that matter
• Provide targeted protection for the important errors (attacks)
• Goal is not to achieve near 100% coverage, but to keep costs low
• Automated techniques to trade off coverage for cost

Page 16:

Thanks ...

Students (Current and Past) - 12 PhD, 20 MS, 30 Undergrad

http://blogs.ubc.ca/karthik