CnC: A Dependence Programming Model
Motivation for the talk (not motivation for CnC)
• Someone hears about CnC
  – To learn more they check into an implementation
  – It wasn’t designed for what they want
  – They walk away
• Our marketing department needs to make some changes to how we present/write/... about CnC
• Here’s a possible different slant
• You know most of this but I’m looking for
  – Feedback/discussion on the problem
  – Feedback/discussion on the story here
  – Recommendations for changing the presentation
• This seems like the right group
Performance
Eigensolver
Used on:
• DARPA: UHPC project at Intel
• DOE: X-Stack project at Intel
• Dreamworks: “How to train your dragon II”
Intel 2-socket x 4-core Nehalem @ 2.8 GHz + Intel® Math Kernel Libraries 10.2
Chart series: Multithreaded Intel® MKL, Baseline, Concurrent Collections
Intel’s MKL team
Aparna Chandramowlishwaran: one GaTech first-year grad student intern at Intel
Cholesky
Intel 2-socket x 4-core Nehalem @ 2.8 GHz + Intel MKL 10.2
Plasma: Jack Dongarra’s group at Oak Ridge NL
Aparna Chandramowlishwaran: one GaTech first-year grad student intern at Intel
Outline
1. Motivation
2. CnC
3. Wide array of existing CnC tuning approaches
Styles of languages/models
• Serial languages and parallel languages
  – Orderings:
    • Required/tuning/arbitrary
    • Make a modification: must distinguish among these
      – Takes time
      – Leads to errors
    • Too few dependences: errors
    • Too many dependences: lose performance
• Dependence languages
  – Orderings:
    • Required only
    • No tuning orderings
    • No arbitrary orderings
  – Make a modification => the ordering requirements are clear
    • No time
    • No errors
• CnC is a dependence programming model
• The dependences include data and control dependences
  – Both are explicit
  – They are at the same level
  – They are distinct
Some relevant current approaches
• Start with an explicitly serial or an explicitly parallel program. Modify it for some specific target or specific optimization goal
  Have to undo or trip over earlier tunings
• Start with source code and automatically uncover the parallelism
  We’ve been there
• Start with the whiteboard drawing
  It’s how we think about our apps
  Dependences are explicit but: not executable
[Figure: An actual whiteboard sketch: LULESH, a shock hydrodynamics app]
[Figure: A CnC graph of flow of data for LULESH]
Unlike an actual whiteboard sketch, CnC is a formal model
• It is precise
  – It is executable
• Has enough information
  – To prove properties about the program
  – To analyze the program
  – To optimize the program
• At the graph level
  – Without access to computation code
Outline
1. Motivation
2. CnC
  – Goals that support exascale needs
  – CnC details via an example app
3. Wide array of existing CnC tuning approaches
Needs for exascale

Software engineering
• Separation of various activities (ways of thinking)
  – The details of the computation within the computation steps
  – The dependences among steps
  – The runtime
  – The tuning
• Leads to improved reuse
• Domain expert doesn’t need to know about
  – Architecture issues
  – Tuning goals

Tuning
• Given only dependences
• Hides irrelevant low-level computation details
• Each new tuning starts from just dependences
• Use any style of tuning
  Just obey the dependences
• Computer scientist doesn’t need to know about
  – Physics, economics, biochemistry

Tuning expert (computer scientist)
Domain expert (physicist, economist, biochemist, …)
Same or different people
Different activity
Communicate at the level of the graph
Not about physics or about parallel performance
Tuning
• Tuning consumes the bulk of the time and energy
• There is no CnC-specific approach to tuning
  – CnC is not a parallel programming model
• Only one requirement:
  – the tuned app must obey domain spec semantics
• Single domain spec / A wide range of tuning goals & styles
  – Faulty components
  – Distinct tuning goals (time, memory, energy, …)
  – Many different styles of runtimes (static/dynamic)
  – Many different styles of tuning (static/dynamic)
• Domain spec isn’t modified by tunings
  – Tunings may refer to domain spec but are isolated from it
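The single requirement above (a tuned app must obey the domain spec) can be illustrated with a minimal sketch. It assumes a tuning is just a total order of step instances and the domain spec is a list of (before, after) pairs; the function name `obeys_dependences` and the step labels are hypothetical, not part of any CnC implementation.

```python
def obeys_dependences(schedule, dependences):
    """Return True if every 'before' step precedes its 'after' step."""
    position = {step: i for i, step in enumerate(schedule)}
    return all(position[before] < position[after]
               for before, after in dependences)


# Hypothetical required orderings: one Cholesky step feeds two trisolves,
# and both trisolves feed one update.
deps = [("C0", "T0a"), ("C0", "T0b"), ("T0a", "U0"), ("T0b", "U0")]

# Two different tunings of the same domain spec, plus one illegal order.
tuning_1 = ["C0", "T0a", "T0b", "U0"]
tuning_2 = ["C0", "T0b", "T0a", "U0"]
broken = ["T0a", "C0", "T0b", "U0"]
```

Both `tuning_1` and `tuning_2` are legal because the spec leaves the two trisolves unordered; `broken` is rejected because it runs a trisolve before the step that feeds it.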
CnC simplifies tuning by separation of concerns
• Separates the details of the computations and data from the dependences among them
• Domain spec is isolated from tuning
  – It explicitly represents the ordering requirements
  – No orderings for tuning & no arbitrary orderings
Cholesky factorization
[Animated figure over six slides: the factorization proceeds through three kinds of steps: Cholesky, Trisolve, Update]
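As a rough illustration of the three step kinds named on these slides, here is a serial sketch of a right-looking Cholesky factorization. It assumes 1x1 "tiles" (plain numbers) so that it needs no linear algebra library; the function name `cholesky_steps` and the sample matrix are illustrative choices, not taken from the talk.

```python
import math


def cholesky_steps(a):
    """Return the lower-triangular factor of a symmetric positive-definite matrix."""
    n = len(a)
    # Work on a copy of the lower triangle; the upper triangle stays zero.
    w = [[a[r][c] if c <= r else 0.0 for c in range(n)] for r in range(n)]
    for k in range(n):
        # Cholesky step, one instance per iteration: (C:iter)
        w[k][k] = math.sqrt(w[k][k])
        # Trisolve step, one instance per (iteration, row): (T:iter,row)
        for r in range(k + 1, n):
            w[r][k] /= w[k][k]
        # Update step, one instance per (iteration, row, col): (U:iter,row,col)
        for r in range(k + 1, n):
            for c in range(k + 1, r + 1):
                w[r][c] -= w[r][k] * w[c][k]
    return w


A = [[4.0, 12.0, -16.0],
     [12.0, 37.0, -43.0],
     [-16.0, -43.0, 98.0]]
L = cholesky_steps(A)
```

Note that the loop nest fixes one particular order of step instances. Most of that order is arbitrary: for example, the trisolves within one iteration are independent of each other.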
Cholesky CnC domain spec 1: whiteboard drawing
(C:iter)          Computation step collection
(T:iter,row)      Computation step collection
(U:iter,row,col)  Computation step collection
[C:iter]          Data item collection
[T:iter,row]      Data item collection
[U:iter,row,col]  Data item collection
Collections are just sets of instances
By default: data items are dynamic single assignment
By default: computation steps are deterministic
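The dynamic single assignment default can be sketched as a tagged store that rejects a second write to the same tag. The class name `ItemCollection` and its `put`/`get` methods are assumptions made for this illustration; they are not presented as the API of any particular CnC implementation.

```python
class ItemCollection:
    """A set of tagged item instances; each tag may be written at most once."""

    def __init__(self, name):
        self.name = name
        self._items = {}

    def put(self, tag, value):
        # Dynamic single assignment: a second put to the same tag is an error.
        if tag in self._items:
            raise ValueError(f"[{self.name}:{tag}] was already written")
        self._items[tag] = value

    def get(self, tag):
        # Raises KeyError if the item is not available yet.
        return self._items[tag]
```

Because a tag names a value rather than a memory location, any step that reads `[T:0,1]` sees the same value no matter when it runs.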
Cholesky CnC domain spec 2: I/O
(C:iter)
(T:iter,row)
(U:iter,row,col)
[C:iter]
[T:iter,row]
[U:iter,row,col]
Cholesky CnC domain spec 3: Distinguish instances
(C:iter)
(T:iter,row)
(U:iter,row,col)
[C:iter]
[T:iter,row]
[U:iter,row,col]
Tags are just identifiers
Require an equality operator
Often integers: iteration, row, col
But not necessarily: representation of a Sudoku board
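A small sketch of what "tags are just identifiers" can mean in practice: anything with an equality operator works. Here an integer tuple serves as an update tag and a frozen dataclass stands in for a Sudoku board. The `BoardTag` class and its 81-cell layout are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoardTag:
    """Hypothetical non-integer tag: a Sudoku board as 81 digits, 0 = empty."""
    cells: tuple


# The common case: a tag made of integers (iteration, row, col).
update_tag = (2, 5, 3)

# The less common case: a structured value used as a tag.
empty = BoardTag(cells=(0,) * 81)
same = BoardTag(cells=(0,) * 81)

# Frozen dataclasses compare by value and are hashable, so both kinds of
# tag can identify instances in the same set.
seen = {update_tag, empty}
```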
Cholesky CnC domain spec 4: exact set of instances (control)
<I:iter>            Control tag collection
<IR:iter,row>       Control tag collection
<IRC:iter,row,col>  Control tag collection
(C:iter)
(T:iter,row)
(U:iter,row,col)
[C:iter]
[T:iter,row]
[U:iter,row,col]
Computation step instance: In/compute/out/done
No cyclic dependences on a single instance
Edge types: control dependence, data dependence
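To make "exact set of instances" concrete, the sketch below enumerates the three control tag collections for an n-by-n tile grid. The index ranges are an assumption based on the usual right-looking Cholesky (rows below the diagonal for trisolve, the lower triangle below and to the right for update); the slides do not state them, and `control_tags` is a hypothetical helper.

```python
def control_tags(n):
    """Enumerate <I>, <IR> and <IRC> tags for n iterations (assumed ranges)."""
    # <I:iter> prescribes one (C:iter) instance per iteration.
    tags_i = [(k,) for k in range(n)]
    # <IR:iter,row> prescribes (T:iter,row) for rows below the diagonal.
    tags_ir = [(k, r) for k in range(n) for r in range(k + 1, n)]
    # <IRC:iter,row,col> prescribes (U:iter,row,col) in the trailing lower triangle.
    tags_irc = [(k, r, c)
                for k in range(n)
                for r in range(k + 1, n)
                for c in range(k + 1, r + 1)]
    return tags_i, tags_ir, tags_irc


tags_i, tags_ir, tags_irc = control_tags(3)
```

Each tag in a control collection causes exactly one instance of the prescribed step to exist; whether and when it runs is a separate question answered by the data dependences.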
Cholesky CnC domain spec, optional: dependence functions
<I:iter>
<IR:iter,row>
<IRC:iter,row,col>
(C:iter)
(T:iter,row)
(U:iter,row,col)
[C:iter]
[T:iter,row]
[U:iter,row,col]
(C:iter) consumes [U:iter-1,iter,iter]
(C:iter) produces [C:iter]
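The two dependence functions on this slide map a step tag to item tags, so they can be written down without any computation code. The sketch below transcribes them directly; the function names and the (collection, tag) pair representation are assumptions, and the first-iteration boundary case, which the slide does not specify, is left out.

```python
def c_consumes(it):
    """(C:iter) consumes [U:iter-1,iter,iter]."""
    return [("U", (it - 1, it, it))]


def c_produces(it):
    """(C:iter) produces [C:iter]."""
    return [("C", (it,))]
```

With functions like these, a tool working only at the graph level can answer questions such as "which items must exist before (C:2) can run?" without looking inside the step.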
Semantics of CnC: specifies a partial order of execution
[State diagram with states: item avail, tag avail, step controlReady, step dataReady, step ready, step executed]
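The states in the diagram can be sketched as a small state tracker for one step instance. It assumes, as the state names suggest, that a step becomes controlReady when its prescribing tag is available, dataReady when all its input items are available, and ready when both hold. The `StepInstance` class is a hypothetical illustration, not a runtime.

```python
class StepInstance:
    """Tracks one step instance through the states named on the slide."""

    def __init__(self, inputs):
        self.missing = set(inputs)   # input items not yet available
        self.control_ready = False   # becomes True when the tag is available
        self.executed = False

    @property
    def data_ready(self):
        return not self.missing

    @property
    def ready(self):
        return self.control_ready and self.data_ready and not self.executed

    def tag_avail(self):
        self.control_ready = True

    def item_avail(self, item):
        self.missing.discard(item)

    def execute(self):
        if not self.ready:
            raise RuntimeError("step is not ready")
        self.executed = True
```

The tag and the items may arrive in either order, which is why the semantics define a partial order rather than a single schedule.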
Tuning
Tuning starts with the domain spec
• Just the required orderings
Wide range of tuning approaches
• Existing static tunings
• Existing dynamic tunings
• Possible future tunings
Static tuning
• Totally static schedule across time and space
• Automatic conversion of element-level data and computation to tiled versions
• Polyhedral tiling
• Distribution functions across nodes in a cluster (dynamic within each node)
  – For data items or
  – For computation steps
• Minimize RT overhead by eliminating redundant attribute propagation
• PIPES for distributed apps (static and dynamic)
Carl Offner, Alex Nelson
Alina Sbirlea
Alina Sbirlea, Jun Shirako
Frank Schlimbach
Zoran Budimlić, Kath Knobe
Tiago Cogumbreiro
Vivek Sarkar
Martin Kong, Yuhan Peng
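A distribution function, as mentioned in the list above, can be as simple as a pure function from a tag to a node number. The sketch below assumes a row-cyclic policy and tags whose second component is the row; both the policy and the name `node_for_row` are hypothetical choices for illustration.

```python
def node_for_row(tag, num_nodes):
    """Hypothetical row-cyclic distribution for (iter, row) or (iter, row, col) tags."""
    row = tag[1]
    return row % num_nodes
```

Because the function reads only the tag, it can be swapped for a different policy without touching the domain spec or the step code.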
Dynamic tuning
• Runtimes based on OCR, TBB, Babel and Haskell
• Basic: determines when a step is ready
  – Tracks state changes and executes READY steps
• Workflow coordination
• Choice among CPU, GPU, FPGA
  – Dynamic compromise: static step preference and dynamic device availability
• Hierarchical affinity groups for locality
• Memory usage:
  Dynamic single assignment is higher level: identifies values, not places
  Mapping data values to memory is tuning
  – Garbage collection: Get-count
  – Inspector/executor
Ryan Newton
Shams Imam
Nick Vrvilo, Zoran Budimlić, Vivek Sarkar
Frank Schlimbach
Zoran Budimlić, Mike Burke
Sanjay Chatterjee, Kath Knobe, Nick Vrvilo
Alina Sbirlea, Zoran Budimlić
Rice team, Frank Schlimbach
Dragos Sbirlea
Frank Schlimbach
Frank Schlimbach, Yves Vandriessche
Vivek Sarkar
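The get-count idea from the memory usage bullet can be sketched as follows: each item is stored with the number of reads it will receive, and its storage is released after the last one. The class `CountedItemCollection` and the way the count is passed to `put` are assumptions for this illustration, not a description of any specific runtime.

```python
class CountedItemCollection:
    """Single-assignment items that are freed after a declared number of gets."""

    def __init__(self):
        self._items = {}  # tag -> [value, remaining gets]

    def put(self, tag, value, get_count):
        if tag in self._items:
            raise ValueError(f"item {tag} was already written")
        if get_count < 1:
            raise ValueError("get_count must be at least 1")
        self._items[tag] = [value, get_count]

    def get(self, tag):
        entry = self._items[tag]
        entry[1] -= 1
        if entry[1] == 0:
            # Last expected reader: release the storage.
            del self._items[tag]
        return entry[0]

    def __len__(self):
        return len(self._items)
```

The get-count is tuning information: the domain spec still only says which values exist, while the count tells the runtime when a value's memory can be reused.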
Possible future tunings
• Out-of-core computations
  – Based on hierarchical affinity groups
• Checkpoint-continue
  – Automatic continuous asynchronous checkpointing
• Collaborative runtimes
  – under development
• Inspector/executor
  – For computation
• Demand-driven execution
• Speculative execution
Zoran Budimlić
Kath Knobe
Isolation: Tuning from domain spec
• Critical for software engineering for exascale
• Critical for ease of tuning
• Just data and control dependences
  – No arbitrary constraints
  – Except for grain…
• CnC domain spec is totally flexible wrt tuning:
  – More approaches in mind now
  – New architectures
  – New tuning goals
  – New tuning approaches
Tuning grain size: Hierarchy
• Each computation step and data item can be decomposed
• Hierarchy adds more tuning flexibility
  – Choose the best hierarchy
  – Choose the best grain
  – Different tunings can be applied at different levels in the hierarchy
Kath Knobe, Nick Vrvilo
Conclusion
• CnC is a dependence language
  – Control and data dependences
  – Explicit, distinct and at the same level
• Tuning is separate
  – No CnC-specific tuning
• This isolation makes tuning easier and more flexible
Thanks to…
DARPA: UHPC
DOE: X-Stack
Cambridge Research Lab, DEC/Compaq/HP/Intel: Carl Offner, Alex Nelson
Intel: Frank Schlimbach, James Brodman, Mark Hampton, Geoff Lowney, Vincent Cave, Kamal Sharma, X-Stack TG team
Rice: Vivek Sarkar, Zoran Budimlić, Mike Burke, Sanjay Chatterjee, Nick Vrvilo, Martin Kong, Tiago Cogumbreiro
Reservoir: Rich Lethin, Benoit Meister
UC Irvine: Aparna Chandramowlishwaran (Intel intern)
Google: Alina Sbirlea (Rice), Dragos Sbirlea (Rice)
UCSD: Laura Carrington, Pietro Cicotti
GaTech: Rich Vuduc, Hasnain Mandviwala (Intel intern), Kishore Ramachandran
Indiana: Ryan Newton (Intel)
Purdue: Milind Kulkarni, Chenyang Liu
PNNL: John Feo, Ellen Porter
Facebook: Nicolas Vasilache (Reservoir)
Micron: Kyle Wheeler (xxNL)
Dreamworks: Martin Watt
Two Sigma: Shams Imam (Rice), Sagnak Tasirlar (Rice)
• Intel CnC on Intel’s WhatIf site
  http://software.intel.com/en-us/articles/intel-concurrent-collections-for-cc
  Available open source: https://icnc.github.io
• Rice: https://wiki.rice.edu/confluence/display/HABANERO/CNC
• Discuss CnC related topics
  New apps, optimizations, runtime, tuning, …
  [email protected], [email protected], [email protected]
• CnC’16 workshop, Sept 27-28, 2016
  Co-located with LCPC’16 in Rochester, NY
  https://cncworkshop2016.github.io/
Thank you!