The Path from Petascale to Exascale Hardware and ...
The Path from Petascale to Exascale Hardware and Applications Issues
Rick Stevens, Argonne National Laboratory
University of Chicago
Supercomputing & Cloud Computing
• Two macro architectures dominate large-scale (intentional) computing infrastructures (vs. embedded & ad hoc)
• Supercomputing type structures
  – Large-scale integrated coherent systems
  – Managed for high utilization and efficiency
• Emerging cloud type structures
  – Large-scale loosely coupled, lightly integrated
  – Managed for availability, throughput, reliability
Top Pinch Points
• Power Consumption
  – Proc/mem, I/O, optical, memory, delivery
• Chip-to-Chip Interface Scaling (pin/wire count)
• Package-to-Package Interfaces (optics)
• Fault Tolerance (FIT rates and Fault Management)
  – Reliability of irregular logic, design practice
• Cost Pressure in Optics and Memory
Programming Models: Twenty Years and Counting
• In large-scale scientific computing today, essentially all codes are message-passing based (CSP and SPMD)
• Multicore is challenging the sequential part of CSP, but no dominant model has emerged to augment message passing
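
To make the SPMD message-passing style concrete, here is a minimal sketch rather than anything taken from the talk. It emulates ranks with Python threads and per-rank inboxes instead of a real MPI library; the function name `spmd_ring_sum` and the ring-sum pattern are illustrative assumptions.

```python
import queue
import threading


def spmd_ring_sum(num_ranks):
    """Every rank runs the same program and cooperates only through messages.

    Hypothetical illustration: threads stand in for processes, and one
    queue per rank stands in for a message-passing library's receive buffer.
    """
    inboxes = [queue.Queue() for _ in range(num_ranks)]
    # Shared list used only to hand results back to the caller,
    # not for communication between ranks.
    results = [None] * num_ranks

    def program(rank):
        # Same code on every rank (SPMD); behaviour differs only via `rank`.
        local = rank + 1                  # this rank's contribution
        total = local
        token = local
        right = (rank + 1) % num_ranks    # neighbour we send to
        for _ in range(num_ranks - 1):
            inboxes[right].put(token)     # send to the right neighbour
            token = inboxes[rank].get()   # blocking receive from the left
            total += token                # accumulate the forwarded value
        results[rank] = total

    threads = [threading.Thread(target=program, args=(r,))
               for r in range(num_ranks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    # Each rank ends up with the sum of all contributions.
    print(spmd_ring_sum(4))
```

Each rank runs sequential code between explicit sends and receives. That sequential part is what multicore puts under pressure.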
Quasi Mainstream Programming Models
• C, Fortran, C++ and MPI
• OpenMP, pthreads
• CUDA, RapidMind
• Clearspeed's Cn
• PGAS (UPC, CAF, Titanium)
• HPCS Languages (Chapel, Fortress, X10)
• HPC Research Languages and Runtime
• HLL (Parallel Matlab, Grid Mathematica, etc.)
NERSC 2007 Rank Abundance
[Figure: rank-abundance plot, series "Series1"; vertical axis from 0 to 1.2, horizontal axis rank from 1 to 361]
• Top 6 use 20%
• Top 17 use 40%
• Top 40 use 60%
• Top 85 use 80%
• <100 groups use the majority of the cycles
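
Figures of the form "top N groups use X% of the cycles" come from ranking groups by usage and accumulating their shares. The sketch below shows that calculation under stated assumptions: the function name `groups_needed` and the toy usage numbers are hypothetical and are not NERSC data.

```python
def groups_needed(usage, share):
    """Smallest number of top-ranked groups whose combined usage
    reaches `share` (a fraction between 0 and 1) of the total."""
    ranked = sorted(usage, reverse=True)   # rank groups, largest first
    total = sum(ranked)
    running = 0.0
    for count, amount in enumerate(ranked, start=1):
        running += amount
        if running >= share * total:
            return count
    return len(ranked)


if __name__ == "__main__":
    # Hypothetical toy allocation (arbitrary units), NOT real usage data.
    toy_usage = [50, 20, 10, 10, 5, 5]
    for share in (0.5, 0.75, 1.0):
        print(share, groups_needed(toy_usage, share))
```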
Million Way Concurrency Today
• Little's law driven need for concurrency
  – To cover latency in memory path
  – Function of aggregate memory bandwidth and clock speed
  – Independent of technology and architecture to first order
• Mainstream CPUs (e.g. x86, PPC, SPARC)
  – 8-16 cores, 4-8 hardware threads per core
  – Total system with 10^3–10^5 nodes => 32K–12M threads
  – BG/P example at 1 PF: 72 x 4K = 300,000 (but each thread has to do 4 ops/clock) => 1.2M ops per clock
• GPU based cluster (e.g. 1000 Tesla 1U nodes)
  – 3 x 128 cores x (32-96) threads per core x 1000 nodes = 12M–36M threads
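
The thread counts on this slide are straightforward products, and the Little's law argument says the required concurrency is latency times throughput. The sketch below reproduces the slide's arithmetic. The helper names, the reading of "4K" as 4096, and the latency, bandwidth and request-size values in the Little's law call are assumptions chosen for illustration only.

```python
def littles_law_concurrency(latency_s, bandwidth_bytes_per_s, request_bytes):
    """Requests that must be in flight to sustain the given bandwidth
    at the given latency (Little's law: concurrency = latency * rate)."""
    return latency_s * bandwidth_bytes_per_s / request_bytes


def total_threads(nodes, cores_per_node, threads_per_core):
    """Hardware threads in a system of identical nodes."""
    return nodes * cores_per_node * threads_per_core


# Mainstream CPU range from the slide.
cpu_low = total_threads(10**3, 8, 4)         # 32,000 ("32K")
cpu_high = total_threads(10**5, 16, 8)       # 12,800,000 ("12M")

# BG/P example: 72 x 4K, taking 4K = 4096 (assumption).
bgp_threads = 72 * 4096                      # 294,912 (about 300,000)
bgp_ops_per_clock = bgp_threads * 4          # 1,179,648 (about 1.2M)

# GPU cluster: 3 x 128 cores per node, 32-96 threads per core, 1000 nodes.
gpu_low = total_threads(1000, 3 * 128, 32)   # 12,288,000 ("12M")
gpu_high = total_threads(1000, 3 * 128, 96)  # 36,864,000 ("36M")

# Hypothetical memory system: 100 ns latency, 100 GB/s, 64-byte requests.
in_flight = littles_law_concurrency(100e-9, 100e9, 64)  # roughly 156 requests

if __name__ == "__main__":
    print(cpu_low, cpu_high, bgp_threads, bgp_ops_per_clock, gpu_low, gpu_high)
    print(round(in_flight))
```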
Existing Body of Parallel Software
• How many existing HPC science and engineering codes scale beyond 1000 processors?
  – My estimate is that it is fewer than 1000 worldwide
  – Top users at NERSC, OLCF and ALCF: < 200 groups
  – It appears likely that the bulk of cycles on the Top500 are used in capacity mode, with the exception of sites with policies that enforce capability runs
• How quickly are new codes being generated?
  – Ab initio development
  – Migration and porting from previous generations
• Large, established projects and personal explorations of new technologies face different choices
Speculations on The Shift
• Provisioning by the kilogram vs. discrete units
  – I/O surface to volume effects, flexible topologies, the computer is the computer
• Reconfigurable hardware vs. porting software
  – Based on programming models that are inherently parallel and scale invariant, to shift the problem to emulation, not discovery of concurrency
• Internally self powered vs. external power sources
  – Metabolic logic? Photo driven? Beta decay? Acoustic?
• Long service lifetime (100 yr+, Zero M) vs. few years + maint
  – Massively redundant computing elements embedded in structurally useful materials?
• Adiabatic logic vs. dissipatory logic
  – Ambient environment, no infrastructure