GPGPU in HPC - UBC Blogsblogs.ubc.ca/karthik/files/2016/11/SC_Talk-Justin.pdf · 2016-11-15 ·...

GPGPU in HPC

Scientific Applications

Soft Errors

[Figure: the same value shown as 0001 and as 0101, differing by a single flipped bit]

Baumann et al. [TDMR, 2005]

Traditional Method: DMR

Dual Modular Redundancy (DMR)

•  Run 2 copies
•  Compare for divergence

Too much energy consumption!
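A minimal sketch of the DMR idea in Python, assuming the protected computation is a pure function that can simply be called twice. The names `run_with_dmr`, `dot` and `FlakyDot` are hypothetical and only illustrate "run 2 copies, compare for divergence"; they are not part of any tool named in the talk.

```python
def run_with_dmr(compute, *args):
    """Run `compute` twice on the same inputs and compare the two results.

    Returns (result_of_first_copy, diverged).
    """
    first = compute(*args)
    second = compute(*args)
    return first, first != second


def dot(a, b):
    """The computation being protected (illustrative only)."""
    return sum(x * y for x, y in zip(a, b))


class FlakyDot:
    """Stand-in for a soft error: corrupts the result of the second call only."""

    def __init__(self):
        self.calls = 0

    def __call__(self, a, b):
        self.calls += 1
        result = dot(a, b)
        # Flip one bit of the result on the second copy.
        return result ^ 0b0100 if self.calls == 2 else result


clean_result, clean_diverged = run_with_dmr(dot, [1, 2, 3], [4, 5, 6])
faulty_result, faulty_diverged = run_with_dmr(FlakyDot(), [1, 2, 3], [4, 5, 6])
```

Every protected computation executes twice here, which is the doubling of work behind the energy objection above.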

Software Solutions

Advantages

•  No hardware modification
•  Errors can be masked
•  Allows selective protection

Application Level

Operating System Level

Architectural Level

Device/Circuit Level

Impact

Challenges for GPGPU Resilience

•  Different architecture and programming model from CPUs
•  No scalable fault injection tools for HPC GPGPU applications

Our Contributions

LLFI-GPU: Scalable Fault Injector

Characterization of Error Propagation

Implications on Error Mitigations

Existing Publicly Available GPU Fault Injectors

•  Hauberk [IPDPS, 2011]
    •  Source-code-level fault injection
    •  Not representative of hardware errors
•  GPU-Qin [ISPASS, 2014]
    •  Debugger-based
    •  Execution is slow
•  GPGPU-Sim based fault injector [DSN, 2015]
    •  Not full system simulation

Goals of LLFI-GPU

•  Native Speed
    •  Program-level fault injection
    •  Compile to binary
•  Full system simulation
    •  Execute on real hardware
    •  Able to simulate different failure outcomes
•  Representativeness
    •  LLVM IR level fault injection
    •  Close to assembly, yet preserves high-level program symbols

LLVM (Low Level Virtual Machine)

LLFI for CPU: https://github.com/DependableSystemsLab/LLFI

LLFI-GPU: Overview

[Figure: compilation flow from .cu files to LLVM IR, PTX assembly, and SASS assembly; LLFI-GPU applies its instrumentation passes to the LLVM IR]

R0 = add R1, R2
R0 = injectFault(R0)
R4 = mul R0, R3
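The three-instruction example above can be mimicked on the host. This is a sketch only: it assumes 32-bit registers and a single bit-flip, and the Python names `inject_fault` and `run` are hypothetical stand-ins, not the LLFI-GPU implementation.

```python
import random

WORD_BITS = 32                      # assumed register width
WORD_MASK = (1 << WORD_BITS) - 1


def inject_fault(value, bit=None, rng=random):
    """Flip one bit of a 32-bit value; choose the bit at random if none is given."""
    if bit is None:
        bit = rng.randrange(WORD_BITS)
    return (value ^ (1 << bit)) & WORD_MASK


def run(r1, r2, r3, inject=False, fault_bit=None):
    """Execute the instrumented sequence, optionally corrupting R0."""
    r0 = (r1 + r2) & WORD_MASK                # R0 = add R1, R2
    if inject:
        r0 = inject_fault(r0, fault_bit)      # R0 = injectFault(R0)
    r4 = (r0 * r3) & WORD_MASK                # R4 = mul R0, R3
    return r4


golden = run(1, 2, 3)                              # R0 = 3, R4 = 9
faulty = run(1, 2, 3, inject=True, fault_bit=2)    # R0 = 7, R4 = 21
```

Because the corrupted R0 feeds the multiply, the fault propagates into R4, which is the kind of data flow an IR-level injector can follow.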

Advantages of LLFI-GPU

•  Compiles large GPGPU programs
    •  1000x faster than GPU-Qin (MatrixMul)
•  Representative simulation
    •  Full system simulation of soft errors
•  Open-source
    •  https://github.com/DependableSystemsLab/LLFI-GPU

Experiment Setup: Nvidia K20

•  12 Benchmarks
    •  Rodinia & Parboil suites
    •  Lulesh (LLNL), Barnes-Hut (Texas State Univ.), Fiber (Northeastern Univ.), Circuit Solver (Rice Univ.) and NMF (UC Berkeley)
•  Fault Injection
    •  10,000 per application (Error bar: 0.22%-2.99%, 95% confidence level)
•  Fault Model
    •  Single bit-flip
    •  Transient faults in execution units

Failure Outcomes

•  Silent Data Corruption (SDC)
    •  Mismatch in program outputs between the golden run and the fault injection run
•  Crash
    •  CUDA exceptions (e.g., illegal memory address)
    •  Cause kernel execution to halt
•  Benign
    •  No effect on program output
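The three outcomes can be told apart by comparing each fault injection run against the golden run. The sketch below assumes a crash surfaces as a Python `RuntimeError` (a stand-in for a CUDA exception); `classify_outcome` and the three example runs are hypothetical.

```python
from collections import Counter


def classify_outcome(golden_output, run_faulty):
    """Classify one fault injection run as 'crash', 'SDC' or 'benign'."""
    try:
        faulty_output = run_faulty()
    except RuntimeError:
        # Stand-in for a CUDA exception that halts kernel execution.
        return "crash"
    return "benign" if faulty_output == golden_output else "SDC"


golden = [2, 4, 6]


def masked_run():
    return [2, 4, 6]        # fault had no effect on the output


def corrupted_run():
    return [2, 4, 7]        # output silently differs from the golden run


def crashing_run():
    raise RuntimeError("illegal memory address")


runs = (masked_run, corrupted_run, crashing_run, masked_run)
tally = Counter(classify_outcome(golden, run) for run in runs)
```

Dividing each count in `tally` by the number of injections gives the per-outcome rates for a campaign.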

Our Contributions

LLFI-GPU: Scalable Fault Injector

Characterization of Error Propagation

Implications on Error Mitigations

Research Question 1

What is the percentage of SDCs in different memory states?

Memory State

Total Memory (TM)

Result Memory (RM)

Output Memory (OM)

…
cudaMalloc(M1)
cudaMalloc(M2)
cudaMemcpy(M1, …)
cudaMemcpy(M2, …)
…
Kernel<<<>>>, …
…
cudaMemcpy(…, M2)
…
Foreach(M2): if (ele > 0) { print(ele) }
…
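One way to read the pseudocode above, stated here as an assumption rather than the talk's formal definition: TM is everything allocated on the device (M1 and M2), RM is what is copied back to the host (M2), and OM is the part of RM that reaches the program output (elements with `ele > 0`). The Python sketch below mirrors that reading with plain lists; the kernel body is made up for illustration.

```python
def kernel(src, dst):
    """Illustrative kernel: doubles every input element."""
    for i, x in enumerate(src):
        dst[i] = 2 * x


# cudaMalloc(M1), cudaMalloc(M2) and the host-to-device copies
device = {"M1": [5, -3, 8, 1], "M2": [0, 0, 0, 0]}

kernel(device["M1"], device["M2"])      # Kernel<<<>>>
host_m2 = list(device["M2"])            # cudaMemcpy(..., M2) back to the host

total_memory = device["M1"] + device["M2"]              # TM: all device allocations
result_memory = host_m2                                 # RM: copied back to the host
output_memory = [ele for ele in host_m2 if ele > 0]     # OM: what gets printed

rm_share = len(result_memory) / len(total_memory)
om_share = len(output_memory) / len(total_memory)
```

In this toy case RM is half of TM and OM is smaller still, which is the nesting the three labels suggest.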

SDC in Different Memory States

Size of States

            bfs       barneshut   nmf
RM          14.29%    37.50%      0.03%
TM          100%      100%        100%

Average size of RM in all benchmarks: 13.56%

SDC of States

                    bfs      barneshut   nmf
SDC(TM) - SDC(RM)   0.00%    0.20%       0.00%

Average of SDC(TM-RM) in all benchmarks:

Checking RM reduces ~86% overhead while retaining coverage

Most of the faults in TM propagate to RM
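The ~86% figure follows from the average RM size if checking cost is assumed to scale linearly with the amount of state checked. The sketch below only redoes that arithmetic with the percentages reported above; the variable names are hypothetical.

```python
# Average RM size as a share of TM, from the slide.
avg_rm_share = 0.1356

# Assumption: checking cost is proportional to the amount of state checked,
# so checking only RM costs avg_rm_share of the cost of checking all of TM.
overhead_reduction = 1.0 - avg_rm_share          # about 0.8644, i.e. ~86%

# Same calculation per benchmark, using the RM sizes listed on the slide.
rm_share = {"bfs": 0.1429, "barneshut": 0.3750, "nmf": 0.0003}
per_benchmark_reduction = {name: 1.0 - share for name, share in rm_share.items()}
```

The saving varies widely by benchmark: it is smallest for barneshut, whose RM is the largest share of TM.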

Example of Checkers

•  Check value range of particular states
    •  Calculating angle: if (angle > 60 or angle < 0) { error detected }
•  Overhead is directly proportional to the number of states checked
    •  Checking RM reduces ~86% overhead
    •  Small loss of coverage

Pattabiraman et al. [TDSC, 2011] Hari et al. [DSN, 2012]
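A range checker of the kind described above can be sketched in a few lines. The bounds 0 and 60 come from the angle example on the slide; the function name `check_range` and the sample data are hypothetical.

```python
def check_range(values, low, high):
    """Return the indices of values that fall outside the range [low, high]."""
    return [i for i, v in enumerate(values) if v < low or v > high]


# Illustrative result buffer holding computed angles.
angles = [12.0, 59.5, 0.0, 73.2, -4.0]

violations = check_range(angles, 0, 60)
error_detected = bool(violations)
```

The checker visits each value once, so its cost grows with the number of states checked, which is why restricting it to a smaller state lowers the overhead.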

Research Question 2

How long do errors take to propagate to the RM?

Metrics: Kernel Call

… to measure propagation time of error

[Figure: timeline of GPU execution (Kernel1, Kernel2, Kernel3) and CPU execution, with an error detector for each kernel and markers for "Error Occurred" and "Error Detected"]

Error detection latency is 2

Tracking Error Propagation

…
Kernel1<<<>>>
DumpToDisk(TM)
DumpToDisk(RM)
DumpToDisk(OM)
Kernel2<<<>>>
…

Compared with golden copy for any data corruptions
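The dump-and-compare scheme above can be sketched as follows, assuming one dump of a memory state per kernel call and latency counted in kernel calls. The function names and the sample dumps are hypothetical; in-memory lists stand in for the files written by DumpToDisk.

```python
def first_corrupted_kernel(golden_dumps, faulty_dumps):
    """Index of the first kernel call whose dump differs from the golden copy.

    Returns None if every dump matches.
    """
    for k, (golden, faulty) in enumerate(zip(golden_dumps, faulty_dumps)):
        if golden != faulty:
            return k
    return None


def propagation_latency(injected_at, golden_dumps, faulty_dumps):
    """Kernel calls between the injection and the first visible corruption."""
    detected_at = first_corrupted_kernel(golden_dumps, faulty_dumps)
    if detected_at is None:
        return None
    return detected_at - injected_at


# RM dumps taken after Kernel1, Kernel2 and Kernel3 (illustrative data).
golden_rm = [[1, 2], [3, 4], [5, 6]]
faulty_rm = [[1, 2], [3, 4], [5, 7]]

# Fault injected during the first kernel call (index 0).
latency = propagation_latency(0, golden_rm, faulty_rm)
```

Running the same comparison on the TM and OM dumps would show when each state first diverges from the golden copy.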

Propagation Latency to RM

Checking RM provides short detection latency

Implications

•  RM is a narrow tunnel through which faults frequently propagate
    •  Checking RM for SDC is a better trade-off
•  Crash-causing faults rarely propagate across kernel calls
    •  Deploying high-frequency checkpoints for GPGPU can avoid checkpoint corruptions
•  Studied on 2 GPGPU platforms (Nvidia GTX 960 & Nvidia K20)
    •  Results are statistically indistinguishable
•  Investigated error spread & masking, etc.
    •  … more interesting findings can be found in the paper!
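The checkpointing implication can be illustrated with a host-side sketch, assuming a checkpoint is taken after every kernel call and a crash is recovered by restoring the latest checkpoint and re-running only the failed kernel. Everything here (`run_with_checkpoints`, the toy kernels, the retry limit) is hypothetical and not taken from the talk or the paper.

```python
import copy


def run_with_checkpoints(kernels, state, max_restarts=3):
    """Run kernels in order, checkpointing the state after each successful call."""
    checkpoint = copy.deepcopy(state)
    restarts = 0
    i = 0
    while i < len(kernels):
        try:
            kernels[i](state)
        except RuntimeError:
            # Crash: discard the possibly corrupted state and retry this kernel.
            restarts += 1
            if restarts > max_restarts:
                raise
            state = copy.deepcopy(checkpoint)
            continue
        checkpoint = copy.deepcopy(state)   # checkpoint at the kernel boundary
        i += 1
    return state, restarts


def add_one(state):
    state["x"] += 1


crash_seen = {"done": False}


def flaky_double(state):
    """Doubles x, but crashes once after touching the state (transient fault)."""
    state["x"] *= 2
    if not crash_seen["done"]:
        crash_seen["done"] = True
        raise RuntimeError("illegal memory address")


final_state, restarts = run_with_checkpoints(
    [add_one, flaky_double, add_one], {"x": 1}
)
```

If crash-causing faults stay within the kernel call where they occur, the checkpoint taken at the previous kernel boundary is clean, so the rollback distance is a single kernel.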

Summary

•  Designed a scalable fault injector for GPGPUs: LLFI-GPU
•  Characterized error propagation patterns in GPGPU applications
•  Discussed their implications on error mitigation techniques

•  Name: Guanpeng (Justin) Li (gpli@ece.ubc.ca)
•  Website: ece.ubc.ca/~gpli
•  LLFI-GPU:
    •  https://github.com/DependableSystemsLab/LLFI-GPU
•  Results:
    •  https://www.dropbox.com/s/xrvojidskkcrj4y/FI_data.xlsx?dl=0

Acknowledgements
