Does Error Resilience Matter in the ...

Does Error Resilience Matter in the age of Approximate Computing? Karthik Pattabiraman, Bo Fang, Guanpeng Li, Qining Lu, Anna Thomas, and Jiesheng Wei, University of British Columbia (UBC), http://blogs.ubc.ca/karthik/

Transcript of Does Error Resilience Matter in the ...

Page 1: Does Error Resilience Matter in the age of Approximate Compu…blogs.ubc.ca/karthik/files/2016/03/SELSE2016-panel.pdf · 2016. 3. 29. · Approximate Computing: Myths and Reality

Does Error Resilience Matter in the age of Approximate Computing?

Karthik Pattabiraman,

Bo Fang, Guanpeng Li, Qining Lu, Anna Thomas, and Jiesheng Wei

University of British Columbia (UBC) http://blogs.ubc.ca/karthik/

Page 2:

What this talk is about

• Approximate Computing: Exact results don’t matter (much) - compromise on correctness

• Error-Resilient Computing: Can we produce correct results in the presence of hardware faults?

• Question: If correctness does not matter, then is it worth the trouble to build error-resilient systems?


Page 3:

What this talk is about

Approximate Computing

Error Resilient Computing

Page 4:

Approximate Computing: Myths and Reality

• Myth 1: Soft computing applications can tolerate almost all errors in their data

• Myth 2: Crashes are harmless. SDCs or output corruptions are what matter in practice.

• Myth 3: Programmers are good at writing correctness checks or annotations in the code

Page 5:

Approximate Computing: Myths and Reality

• Myth 1: Soft computing applications can tolerate almost all errors in their data

• Myth 2: Crashes are harmless. SDCs or output corruptions are what matter in practice.

• Myth 3: Programmers are good at writing correctness checks or annotations in the code

Page 6:

Soft Computing Applications  

Ø Applications in machine learning, multimedia, etc.

Original image (left) versus faulty image: JPEG decoder

Page 7:

EDCs: What are they?

Ø Large or unacceptable deviation in output

EDC image (PSNR 11.37) vs. non-EDC image (PSNR 44.79)
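To make the PSNR comparison on this slide concrete, here is a minimal sketch of how a faulty output could be scored against a fault-free ("golden") output and flagged as an EDC. The 30 dB cutoff, the function names, and the tiny pixel lists are all assumptions for illustration; the slides do not say which threshold or tooling was used.

```python
import math

def psnr(reference, faulty, max_value=255):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(faulty):
        raise ValueError("images must have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((r - f) ** 2 for r, f in zip(reference, faulty)) / len(reference)
    if mse == 0:
        return math.inf  # outputs are identical
    return 10 * math.log10(max_value ** 2 / mse)

# Hypothetical cutoff: not taken from the slides.
EDC_THRESHOLD_DB = 30.0

def is_edc(reference, faulty, threshold_db=EDC_THRESHOLD_DB):
    """Flag the faulty output as an EDC when its PSNR falls below the cutoff."""
    return psnr(reference, faulty) < threshold_db

# Toy 8-pixel "images" (illustrative data only).
golden = [10, 50, 90, 130, 170, 210, 250, 30]
slightly_off = [10, 51, 90, 130, 169, 210, 250, 30]   # two pixels off by 1
badly_off = [10, 50, 218, 130, 170, 82, 250, 30]      # two pixels with bit 7 flipped

print(is_edc(golden, slightly_off))  # small deviation
print(is_edc(golden, badly_off))     # large deviation
```

In this toy data a one-level change in two pixels stays well above the assumed cutoff, while a flip of the most significant bit in two pixels drops the PSNR to roughly 12 dB, the same order as the EDC image on the slide.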

Page 8:

Error Resilience: Initial Study

[Chart: four segments of 6%, 43%, 23%, and 28%; segment labels not recoverable from the extraction]

References: DSN’13, SELSE’13, TECS (in press)

Page 9:

Approximate Computing: Myths and Reality

• Myth 1: Soft computing applications can tolerate almost all errors in their data

• Myth 2: Crashes are harmless. SDCs or output corruptions are what matter in practice

• Myth 3: Programmers are good at writing correctness checks or annotations in the code

Page 10:

Fail-stop Assumption

Page 11:

Long-latency Crash

Send messages

File I/O

Checkpoints

Page 12:

Checkpoint Corruptions

References:  DSN’15,  ISSRE’15  
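The link between long-latency crashes and checkpoint corruption can be illustrated with a toy timeline model. This is a sketch under stated assumptions, not the methodology of the cited papers: a checkpoint is taken at the end of every `checkpoint_interval` steps, a fault corrupts program state at `fault_step`, and the crash occurs `crash_latency` steps later. The function name and all parameters are hypothetical.

```python
def classify_checkpoints(fault_step, crash_latency, checkpoint_interval):
    """Split the checkpoints taken before the crash into clean and corrupted ones.

    Assumed model: a checkpoint taken at step s captures the state after step s,
    so any checkpoint with s >= fault_step contains the corrupted state.
    The step on which the crash happens never completes, so no checkpoint is
    taken at or after it.
    """
    crash_step = fault_step + crash_latency
    taken = range(checkpoint_interval, crash_step, checkpoint_interval)
    clean = [s for s in taken if s < fault_step]
    corrupted = [s for s in taken if s >= fault_step]
    return clean, corrupted

# Fail-stop-like case: the crash follows the fault almost immediately.
print(classify_checkpoints(fault_step=95, crash_latency=2, checkpoint_interval=50))

# Long-latency crash: a checkpoint is written between the fault and the crash.
print(classify_checkpoints(fault_step=95, crash_latency=30, checkpoint_interval=50))
```

In the second case the most recent checkpoint (step 100) was written after the fault, so restarting from it would restore corrupted state. In this model, that is the situation in which a crash is not harmless.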

Page 13:

Approximate Computing: Myths and Reality

• Myth 1: Soft computing applications can tolerate almost all errors in their data

• Myth 2: Crashes are harmless. SDCs or output corruptions are what matter in practice

• Myth 3: Programmers are good at writing correctness checks or annotations in the code

Page 14:

Programmers are busy!

• The last thing they’re going to worry about is annotating data/code as critical or non-critical

• Writing good correctness checks is hard: many checks are either ineffective or wrong

• Finally, programmers tend to be conservative: they will mark everything as critical if in doubt

Page 15:

Approximate Computing: Myths and Reality

• Myth 1: Soft computing applications can tolerate almost all errors in their data

• Myth 2: Crashes are harmless. SDCs or output corruptions are what matter in practice

• Myth 3: Programmers are good at writing correctness checks or annotations in the code

Page 16:

Error Resilience DOES Matter in the age of Approximate Computing!

Karthik Pattabiraman,

Bo Fang, Guanpeng Li, Qining Lu, Anna Thomas, and Jiesheng Wei

University of British Columbia (UBC) http://blogs.ubc.ca/karthik/