The Path from Petascale to Exascale: Hardware and Applications Issues
Rick Stevens, Argonne National Laboratory and University of Chicago


Supercomputing & Cloud Computing

•  Two macroarchitectures dominate large-scale (intentional) computing infrastructures (vs. embedded & ad hoc)

•  Supercomputing-type structures
  –  Large-scale integrated, coherent systems
  –  Managed for high utilization and efficiency

•  Emerging cloud-type structures
  –  Large-scale, loosely coupled, lightly integrated
  –  Managed for availability, throughput, reliability


Top500 Trends

Looking to Exascale

A Three-Step Path to Exascale

Top Pinch Points

•  Power Consumption
  –  Proc/mem, I/O, optical, memory, delivery

•  Chip-to-Chip Interface Scaling (pin/wire count)
•  Package-to-Package Interfaces (optics)
•  Fault Tolerance (FIT rates and fault management; see the sketch below)
  –  Reliability of irregular logic, design practice

•  Cost Pressure in Optics and Memory
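To make the fault-tolerance pinch point concrete, here is a minimal back-of-envelope sketch in C. The per-node FIT rate is an assumed illustrative figure, not one from the talk; the point it shows is that system MTBF shrinks linearly with component count.

```c
/* Illustrative FIT-rate arithmetic (assumed numbers).
 * 1 FIT = 1 failure per 10^9 device-hours. */
#include <stdio.h>

int main(void) {
    double fit_per_node = 5000.0;  /* assumed failures per 1e9 hours per node */
    for (long nodes = 1000; nodes <= 1000000; nodes *= 10) {
        double system_fit = fit_per_node * nodes;  /* FITs add across nodes */
        double mtbf_hours = 1e9 / system_fit;
        printf("%7ld nodes: system MTBF = %.1f hours\n", nodes, mtbf_hours);
    }
    return 0;
}
```

Under these assumed rates, a million-node system has an MTBF of roughly twelve minutes, which is why fault management appears on the pinch-point list.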

Programming Models: Twenty Years and Counting

•  In large-scale scientific computing today, essentially all codes are message-passing based (CSP and SPMD; a minimal sketch follows these bullets)

•  Multicore is challenging the sequential part of CSP, but no dominant model has yet emerged to augment message passing
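As a concrete illustration of the message-passing SPMD style referenced above, here is a minimal C/MPI sketch: every rank runs the same program and all communication is explicit. The nearest-neighbor exchange is a hypothetical example, not code from the talk.

```c
/* Minimal SPMD sketch: each rank exchanges one value with a neighbor. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double send = (double)rank, recv = 0.0;
    int right = (rank + 1) % size, left = (rank - 1 + size) % size;

    /* Explicit message passing: send to the right, receive from the left. */
    MPI_Sendrecv(&send, 1, MPI_DOUBLE, right, 0,
                 &recv, 1, MPI_DOUBLE, left, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    printf("rank %d of %d received %.0f from rank %d\n", rank, size, recv, left);
    MPI_Finalize();
    return 0;
}
```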

Quasi-Mainstream Programming Models

•  C, Fortran, C++ and MPI
•  OpenMP, pthreads
•  CUDA, RapidMind
•  ClearSpeed's Cn
•  PGAS (UPC, CAF, Titanium)
•  HPCS languages (Chapel, Fortress, X10)
•  HPC research languages and runtimes
•  HLLs (Parallel MATLAB, gridMathematica, etc.)
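A common way the first two entries on this list are combined in practice is hybrid MPI+OpenMP: threads within a node, message passing across nodes. The sketch below is an assumed illustration, not code from the talk.

```c
/* Hybrid sketch: OpenMP threads inside each MPI rank,
 * MPI reduction across ranks. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int provided, rank;
    /* FUNNELED: only the main thread makes MPI calls. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double local = 0.0;
    /* Threaded parallelism within the node. */
    #pragma omp parallel for reduction(+:local)
    for (int i = 0; i < 1000000; i++)
        local += 1.0 / (1.0 + i);

    double global = 0.0;
    /* Message passing across the machine. */
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) printf("sum = %f\n", global);
    MPI_Finalize();
    return 0;
}
```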

Driver Applications: Basic Science and Emerging

NERSC 2007 Rank Abundance

[Figure: rank-abundance curve; projects ranked 1-361 on the x-axis, normalized cycle usage 0-1.2 on the y-axis]

•  Top 6 use 20% of cycles
•  Top 17 use 40%
•  Top 40 use 60%
•  Top 85 use 80%
•  Fewer than 100 groups use the majority of the cycles
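A sketch of how threshold annotations like these can be derived: sort per-project cycle usage in descending order and accumulate until each share is crossed. The Zipf-like synthetic data below is an assumption standing in for the actual NERSC 2007 numbers, so its output will not match the figures above.

```c
/* Cumulative-share thresholds over a synthetic rank-abundance curve. */
#include <stdio.h>

int main(void) {
    enum { N = 365 };
    double usage[N], total = 0.0;
    /* Synthetic, already-sorted usage: usage[i] ~ 1/(i+1). */
    for (int i = 0; i < N; i++) { usage[i] = 1.0 / (i + 1); total += usage[i]; }

    double thresholds[] = { 0.2, 0.4, 0.6, 0.8 };
    double cum = 0.0;
    int t = 0;
    for (int i = 0; i < N && t < 4; i++) {
        cum += usage[i] / total;               /* cumulative fraction of cycles */
        while (t < 4 && cum >= thresholds[t]) {
            printf("Top %d projects use %.0f%% of cycles\n",
                   i + 1, thresholds[t] * 100);
            t++;
        }
    }
    return 0;
}
```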

Million-Way Concurrency Today

•  Lijle’slawdrivenneedforconcurrency– Tocoverlatencyinmemorypath– Func8onofaggregatememorybandwidthandclockspeed–  Independentoftechnologyandarchitecturetofirstorder

•  Mainstream CPUs (e.g. x86, PPC, SPARC)
  –  8-16 cores, 4-8 hardware threads per core
  –  Total system with 10^3-10^5 nodes => 32K-12M threads
  –  BG/P example at 1 PF: 72 racks x 4K cores/rack = ~300,000 threads (but each thread has to do 4 ops/clock) => ~1.2M ops per clock

•  GPU-based cluster (e.g. 1000 Tesla 1U nodes)
  –  3 x 128 cores x (32-96) threads per core x 1000 nodes = 12M-36M threads
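A minimal sketch tying these bullets together: the Little's-law estimate of outstanding memory references needed to cover latency, plus a check of the thread-count arithmetic above. The bandwidth, latency, and line-size constants are illustrative assumptions, not figures from the talk.

```c
/* Little's law (L = lambda * W) applied to the memory path,
 * plus the slide's concurrency arithmetic. */
#include <stdio.h>

int main(void) {
    /* lambda = bandwidth / bytes per reference, W = memory latency. */
    double bandwidth = 1.0e12;  /* assumed aggregate bandwidth: 1 TB/s */
    double latency   = 100e-9;  /* assumed memory latency: 100 ns */
    double line      = 64.0;    /* one 64-byte cache line per reference */
    printf("outstanding references to cover latency: %.0f\n",
           (bandwidth / line) * latency);

    /* BG/P at ~1 PF: 72 racks x 4K cores/rack, 4 ops/clock per thread. */
    long bgp = 72L * 4096;
    printf("BG/P threads: %ld, ops per clock: %ld\n", bgp, 4 * bgp);

    /* GPU cluster: 3 x 128 cores x 32..96 threads/core x 1000 nodes. */
    printf("GPU threads: %ldM-%ldM\n",
           3L * 128 * 32 * 1000 / 1000000, 3L * 128 * 96 * 1000 / 1000000);
    return 0;
}
```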

Existing Body of Parallel Software

•  How many existing HPC science and engineering codes scale beyond 1000 processors?
  –  My estimate is that it is fewer than 1000 worldwide
  –  Top users at NERSC, OLCF and ALCF: < 200 groups
  –  It appears likely that the bulk of cycles on Top500 systems are used in capacity mode, with the exception of sites with policies that enforce capability runs

•  How quickly are new codes being generated?
  –  Ab initio development
  –  Migration and porting from previous generations

•  Large, established projects and personal explorations of new technologies face different choices

Number of Processors in the Top500

Speculations on The Ship

•  Provisioning by the kilogram vs. discrete units
  –  I/O surface-to-volume effects, flexible topologies, the computer is the computer
•  Reconfigurable hardware vs. porting software
  –  Based on programming models that are inherently parallel and scale invariant, to shift the problem to emulation, not discovery, of concurrency
•  Internally self-powered vs. external power sources
  –  Metabolic logic? Photo-driven? Beta decay? Acoustic?
•  Long service lifetime (100 yr+, zero maintenance) vs. a few years + maintenance
  –  Massively redundant computing elements embedded in structurally useful materials?
•  Adiabatic logic vs. dissipatory logic
  –  Ambient environment, no infrastructure