Post on 31-Jan-2017
Swift in HPC
Investigations in the use of Swift in High Performance Computing
These are not the Swifts you are looking for!
• http://swift-lang.org/ – A simple tool for fast, easy scripting on big machines
• https://www.swift.com/ – Swift code is the standard format of Bank Identifier Codes
• https://www.boulder.swri.edu/~hal/swift.html – A solar system integration package
• http://taylorswift.com/ – Developed in Pennsylvania – not sure of efficacy in HPC
What is Swift
• Swift has been years in the making
  – simplified memory management with Automatic Reference Counting (ARC)
    • tracks and manages your app’s memory usage.
    • this means that memory management “just works” in Swift – you do not have to do the garbage collecting
    • https://developer.apple.com/library/ios/documentation/Swift/Conceptual/Swift_Programming_Language/AutomaticReferenceCounting.html
  – feels familiar to Objective-C developers
  – friendly to new programmers
• It supports playgrounds (think of emacs)
Introduction
• What’s the point of this talk? Why am I here?
• What is HPC?
• Apple Swift programming language?
  – Why should we use it?
  – How can it be used now?
  – Is it ready for big-time use in HPC?
• Bibliography
• Questions and possible swift answers
Point of this talk
• Can we write programs in Apple Swift on Linux clusters (as we can in Fortran, C, C++, Java)?
  – We know we can use GCD (Grand Central Dispatch), LLVM, Clang, BLAS on Apple platforms
• Because Apple Swift is open source, can we write programs that are intrinsically parallel?
What is HPC
• High-performance computing (HPC) evolved to meet the demands for processing speed. HPC brings together several technologies such as computer architecture, algorithms, programs and electronics, and system software under a single canopy to solve advanced problems/programs effectively, reliably, and quickly. A highly efficient HPC system requires a high-bandwidth, low-latency network to connect multiple nodes and clusters. The term applies especially to systems that function above a teraflop.
Current state of HPC
• Architectures
• Models
• MPI and OpenMP libraries
• Concurrency and threads
Software Crisis in HPC
• Supercomputer applications and software usually live much longer than the hardware
  – Hardware life is typically four to five years at most.
  – Fortran and C are still the main programming models
• Complexity is rising dramatically
  – Challenges for applications on Petaflop systems
  – Improvement of existing codes will become complex, if not impossible
  – The use of O(100K) cores implies dramatic optimization effort
  – Support of a hundred threads in one node implies new parallelization strategies
Software Crisis in HPC
• Programming is stuck
  – Arguably hasn’t changed much since the 70’s: still using Fortran, C
• Software is a major cost component of modern technologies.
  – The tradition in HPC system procurement is to assume that the software is free.
• There is the need for new community codes and languages – Apple Swift?
Architectures
• Vector Processors - Cray-1
• Single instruction multiple data (SIMD), Array Processors
  – Goodyear MPP, MasPar 1 & 2, TMC CM-2
• Parallel Vector Processors (PVP)
  – Cray XMP, YMP, C90, NEC Earth Simulator, SX-6
• Massively Parallel Processors (MPP) - Cray T3D, T3E, TMC CM-5, Blue Gene/L
• Commodity Clusters
  – Beowulf-class PC/Linux clusters
  – Constellations
• Distributed Shared Memory (DSM) - SGI Origin
• HP Superdome
• Hybrid HPC Systems - Roadrunner
• Chinese Tianhe-1A system
• GPGPU systems
Programming Models
• Message Passing (MPI)
• Shared Memory (OpenMP)
• Partitioned Global Address Space Programming (PGAS) Languages
  – UPC, Coarray Fortran, Titanium
• Next Generation Programming Languages and Models
  – Chapel, X10, Fortress
• Languages and Paradigm for Hardware Accelerators
  – CUDA, OpenCL
• Hybrid: MPI + OpenMP + CUDA/OpenCL
Question to Matlab and Maple
• Are there any plans to produce Swift code from Matlab/Maple?
  – We do not publicly comment on future product plans. You would need to ask Sales as they are the only people authorized to talk about such things.
Bindings
• Swift is “parallel” in the same way that C and C++ are without special tools: you can use pthreads or Dispatch (stable release in Swift 3.0).
• Currently there are no OpenMP or MPI bindings for Swift, but these should be coming soon.
From OpenMP
• The runtime can be built with gcc, icc or clang. However, note that a runtime built with clang cannot be guaranteed to work with OpenMP code compiled by the other compilers, since clang does not support a 128-bit float type, and cannot therefore generate the code used for reductions of that type
• The OpenMP runtime is known to work on ARM® architecture processors and PowerPC™ processors
• It is also known to work on 32 and 64 bit X86 processors when compiled with clang, with the Intel compiler or with gcc, and on the Intel® Xeon Phi™ product family when compiled with the Intel compiler.
• Ports to other architectures and operating systems are welcome.
• http://openmp.llvm.org/README.txt
Can I do this in Swift?
    void simple(int n, float *a, float *b)
    {
        int i;
        /* simple OpenMP call */
        #pragma omp parallel for
        for (i = 1; i < n; i++) /* i is private by default */
            b[i] = (a[i] + a[i-1]) / 2.0;
    }
• Not yet, but you can do this:
Wrappers
• Create a Swift Command Line Utility in Xcode
  – First create a simple command line utility in Swift using Xcode.
• In Xcode go to File -> New -> Project
  – in the dialog that appears select Command Line Tool and click Next
• Choose a name for your project (let's call it cli_swift) and type it in the Product Name field in the next dialog
  – make sure Swift is selected in the Language field and click Next
• In the dialog that appears select a location for your project and click Create.
  – you will get a project consisting of one Swift source file, main.swift, which contains:
        import Foundation
        print("Hello, World!")
• Comment out or remove the "import Foundation" line, since we are not using any of the Foundation features.
• Run the project by clicking the Run button in the toolbar, or
  – selecting Run from the Product menu, or
  – typing Command-R.
  – We see "Hello, World!" printed out in the output pane.
Wrappers
• Create a simple C++ static Library Using GCC
  – This could be done in Xcode.
  – The command for compiling/linking C++ code is g++ (or clang++ if you are using clang)
• Choose a location for the library and cd to that directory.
• Create the header file for the library, let's call it junk.h, with the following content:
    class A {
    public:
        A(int);
        int getInt();
    private:
        int m_Int;
    };
• Create the library's implementation file, call it junk.cpp, with the following content:
    #include "junk.h"
    A::A(int _i) : m_Int(_i) {}
    int A::getInt() { return m_Int; }
• Compile junk.cpp to an object file, junk.o, and create a library, libjunkcpp.a, that contains the object file:
    $ g++ -c junk.cpp
    $ ar r libjunkcpp.a junk.o
Wrappers – the “glue”
• Add C++ Wrapper to the Xcode Project to provide the Swift-C++ Interface
  – go back to Xcode and create a C++ file providing the "glue" between our C++ library and command line utility written in Swift.
  – The "glue" has to be in C++ because it will call C++ library code that was not written to be called from C.
  – The "glue" is, however, written in such a way that it can be called from C code, and things that can be called from C can typically be also called from Swift. See the comments in the code below.
• Add the static C++ library to the Xcode project. We need this in order to be able to call the library from the C++ wrapper ("glue").
  – Add the library's header file, junk.h, to our project by going to File -> Add Files to "cli_swift"..., navigating to the file, and clicking Add.
• Go to File -> New -> File, select C++ file in the dialog that appears, and click Next.
• Type wrapper.cpp in the Name field, uncheck "Also create a header file," and click Next.
• In the dialog that appears choose a location for the new file and click Create.
• If at this point Xcode suggests that you create a bridging header, click Yes.
  – Otherwise create a bridging header manually by going to the "Swift Compiler - Code Generation" section under Build Settings and specifying a header name on the "Objective-C Bridging Header" line.
Wrappers – the “glue”
• Modify wrapper.cpp to have the following content:
    #include "junk.h"
    // extern "C" will cause the C++ compiler
    // (remember, this is still C++ code!) to
    // compile the function in such a way that
    // it can be called from C
    // (and Swift).
    extern "C" int getIntFromCPP()
    {
        // Create an instance of A, defined in
        // the library, and call getInt() on it:
        return A(314159).getInt();
    }
Wrappers
• Use C++ from Swift Code via the Wrapper
• Tell Swift about the getIntFromCPP() function.
  – Any C functions that we want to call from Swift must be declared in the bridging header, so we add the following to the bridging header:
        int getIntFromCPP();
• Now we can call the function from Swift by adding the following to main.swift:
        print("The value of PI without a decimal is \(getIntFromCPP())")
• Run the project, and the output is:
        Hello, World!
        The value of PI without a decimal is 314159
Concurrency in Swift
• By default, when you make an application it runs the code in a single-thread environment, a main thread. For example, an iOS application would call the application:didFinishLaunchingWithOptions method on the main thread.
• A simpler example is an OS X Command Line Tool application. It has only one file: main.swift. When you start it, the system creates a single main thread and runs all the code in the main.swift file on that thread.
• For testing code, playgrounds are the best. By default, playgrounds stop after executing the last line of code and don't wait for the concurrent code to finish executing. We can change this behavior by telling the playgrounds to keep running indefinitely. To do that, include these two lines in the playground file:
    import XCPlayground
    XCPSetExecutionShouldContinueIndefinitely()
Threads
• Threads are the lowest-level concurrency application programming interface (API).
• All other concurrency mechanisms are built on top of threads and run multiple threads.
• We can use NSThread from the Foundation framework.
  – The simplest way to do this is to create a new class with a method that will be the starting point for our new thread.
Threads (untangling)
• You can create a new thread in two ways:
  – Use the detachNewThreadSelector function
  – or create an instance of NSThread and use the start function. We have to mark our run function with the @objc attribute because we use it as a selector when creating a thread, and NSThread is an Objective-C class that uses dynamic dispatch for method calling.
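
A minimal sketch of the second approach (create a thread object, then call start). It is written with Swift 3+ naming (Thread rather than NSThread) and, as a deliberate substitution, uses Foundation's block-based Thread initializer instead of the selector-based API described above, so no @objc attribute is needed. The SumWorker class and its members are hypothetical names for illustration; the code assumes Swift 5.5 or later and macOS 10.12+ or Linux Foundation.

```swift
import Foundation

// Hypothetical worker: sums 1...n on a secondary thread.
// @unchecked Sendable: access to `result` is ordered by the semaphore below.
final class SumWorker: @unchecked Sendable {
    private let n: Int
    private(set) var result = 0
    private let done = DispatchSemaphore(value: 0)

    init(n: Int) { self.n = n }

    // Starting point for the new thread.
    func run() {
        result = (1...n).reduce(0, +)
        done.signal()          // tell the waiting thread that we are finished
    }

    // Blocks the caller until run() has completed.
    func wait() { done.wait() }
}

let worker = SumWorker(n: 100)
let thread = Thread { worker.run() }   // block-based initializer
thread.start()
worker.wait()
print("Sum computed on a secondary thread: \(worker.result)")
```

Even in this small case the caller has to arrange its own completion signal, which is the kind of bookkeeping the next slide argues against.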
Solution for Threads
• Instead of solving our initial task that we wanted to run concurrently, now we are spending time managing the complexity of that concurrent execution system. Fortunately we don't need to do that as there is a solution: Don't use threads.
• But what can we use?
  – GCD (Grand Central Dispatch) is a high-level API that is built on top of threads
  – It performs all aspects of thread management for you
Grand Central Dispatch (GCD)
• Queues
  – Main queue
    • let mainQueue = dispatch_get_main_queue()
  – Concurrent queue – the global concurrent queues, or your own
    • func dispatch_get_global_queue(identifier: Int, flags: UInt) -> dispatch_queue_t!
  – Serial
    • let serialQ = dispatch_queue_create("my-s", DISPATCH_QUEUE_SERIAL)
• Tasks
  – Blocks of code
• Adding tasks to queues
Adding Tasks to Queues
• We have a queue and a task that we want to run.
  – To run a task on a particular queue, dispatch it.
  – Two ways:
• Synchronous: dispatch_sync
  – dispatch_sync(queue: dispatch_queue_t, _ block: dispatch_block_t)
• Asynchronous: dispatch_async
  – dispatch_async(queue: dispatch_queue_t, _ block: dispatch_block_t)
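
For readers on Swift 3 or later, the same two dispatch styles can be sketched with the object-style Dispatch API (DispatchQueue.sync and DispatchQueue.async), which replaced the dispatch_* free functions. The TaskLog class is a hypothetical helper added only so the order of events can be inspected; the code assumes Swift 5.5 or later.

```swift
import Dispatch

// Hypothetical helper: a log whose storage is protected by a serial queue.
final class TaskLog: @unchecked Sendable {
    private let queue = DispatchQueue(label: "my-s")   // serial by default
    private var entries: [String] = []

    func add(_ entry: String) { queue.sync { entries.append(entry) } }
    var snapshot: [String] { queue.sync { entries } }
}

let log = TaskLog()
let concurrentQ = DispatchQueue.global(qos: .background)

// Synchronous dispatch: sync returns only after the block has finished.
concurrentQ.sync { log.add("Task 1") }
log.add("1 Done")

// Asynchronous dispatch: async returns immediately, so a DispatchGroup
// is used here to wait for the task before continuing.
let group = DispatchGroup()
concurrentQ.async(group: group) { log.add("Task 2") }
group.wait()
log.add("2 Done")

print(log.snapshot)
```

Without the group.wait() call, "2 Done" could be logged before "Task 2", which is the practical difference between the two styles.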
Synchronous Dispatch
Synchronous dispatch submits a task for execution and waits until the task is done.
    dispatch_sync(queue) { ... }
When you use a concurrent queue and dispatch a task to it synchronously, the queue can run many tasks at the same time, but the dispatch_sync method waits until the task you submitted is finished.
    let queue = dispatch_get_global_queue(QOS_CLASS_BACKGROUND, 0)
    dispatch_sync(queue) { print("Task 1") }
    print("1 Done")
    dispatch_sync(queue) { print("Task 2") }
    print("2 Done")
More on Dispatch
• Never call the dispatch_sync function from a task that is executing in the same queue. This would cause a deadlock for the serial queue and should be avoided for concurrent queues as well.
    dispatch_sync(queue) {
        dispatch_sync(queue) {
            print("Never called") // Don't do this
        }
    }
Concurrency Guide
• GCD also has some powerful tools for synchronizing submitted tasks.
• Read the Concurrency Programming Guide article in the Apple library documentation at https://developer.apple.com/library/ios/documentation/General/Conceptual/ConcurrencyProgrammingGuide
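
One such tool is DispatchQueue.concurrentPerform (dispatch_apply in the C API), which runs a block for every index of a range, possibly in parallel, and returns only when all iterations have finished. As a rough sketch, the OpenMP loop from the "Can I do this in Swift?" slide could be approximated as follows. This is an illustration under Swift 3+ Dispatch naming, not an OpenMP binding; the array-returning signature is an assumption made for the example.

```swift
import Dispatch

// Parallel-for analogue of the OpenMP loop: b[i] = (a[i] + a[i-1]) / 2
func simple(_ a: [Float]) -> [Float] {
    let n = a.count
    var b = [Float](repeating: 0, count: n)
    guard n > 1 else { return b }

    b.withUnsafeMutableBufferPointer { out in
        // Each iteration writes a distinct element of `out`,
        // so no locking is needed.
        DispatchQueue.concurrentPerform(iterations: n - 1) { k in
            let i = k + 1                  // loop runs over i = 1 ..< n
            out[i] = (a[i] + a[i - 1]) / 2
        }
    }
    return b
}

print(simple([1, 3, 5, 7]))   // prints [0.0, 2.0, 4.0, 6.0]
```

For a loop body this small, the scheduling overhead is likely to outweigh the gain; in practice each iteration would process a chunk of indices rather than a single element.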
Timings
• There are at least six ways to measure elapsed time in a Swift program.
  – NSDate
  – CFAbsoluteTime
  – NSProcessInfo.systemUptime
  – mach_absolute_time with mach_timebase_info
  – clock() in POSIX Standard
  – times() in POSIX Standard
• Or you can let Instruments do the heavy lifting
Timing Examples
NSDate completion block
    public class func secElapsed(completion: () -> Void) {
        let startDate: NSDate = NSDate()
        completion()
        let endDate: NSDate = NSDate()
        let timeInterval: Double = endDate.timeIntervalSinceDate(startDate)
        print("seconds: \(timeInterval)")
    }
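
The same stopwatch pattern can be sketched as a free function in Swift 3+ spelling. As an assumption of this sketch, it reads the monotonic DispatchTime clock instead of NSDate, so the measurement is not affected by wall-clock adjustments; secondsElapsed is a hypothetical name.

```swift
import Dispatch

// Hypothetical helper: runs `work` once and returns the elapsed time in seconds.
func secondsElapsed(_ work: () -> Void) -> Double {
    let start = DispatchTime.now()
    work()
    let end = DispatchTime.now()
    // uptimeNanoseconds is a monotonic counter, so end >= start.
    return Double(end.uptimeNanoseconds - start.uptimeNanoseconds) / 1_000_000_000
}

var total = 0
let elapsed = secondsElapsed {
    for i in 0..<1_000_000 { total &+= i }
}
print("seconds: \(elapsed)")
```

A single run like this is noisy; repeating the measurement and taking the minimum or median gives a steadier number.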
Performance – How do we know we are doing well
• It has to be fast
  – Stopwatch – timing functions
  – Metrics – Amdahl’s law, Gustafson-Barsis’s law, Karp-Flatt metric, Isoefficiency relation
  – Write quality code – it has to be solid and flexible
  – Sweat the small stuff – arrays or sets? string or int?
  – Don’t optimize up front
• In the words of Donald Knuth: “Premature optimization is the root of all evil”
Some more Performance
• “Scotty, we need more power!” (50th anniversary of the original Star Trek series)
  – What do we do when Scotty replies “I’m giving you all she’s got!”?
• We can add more servers, but this just delays the problem for some time
• Or we can remove the issue that causes the performance problem.
  – For that, we need to identify the problem, the slow piece of the code, and improve it.
• The key metrics that impact an application’s performance:
  – Operations' performance speed
  – Memory usage
  – Tertiary usage
Performance Issues
• Firstly, don't optimize up front, and secondly, measure first.
• Measure the code's performance characteristics and optimize only those parts that are slow.
Performance Analysis Formulas
• Amdahl’s Law
  – Let f be the fraction of operations in a computation that must be performed sequentially, where 0 ≤ f ≤ 1. The maximum speedup Ψ achievable by a parallel computer with p processors performing the computation is:
        Ψ ≤ 1 ⁄ (f + (1 − f) ⁄ p)
Performance Analysis Formulas
• Gustafson-Barsis’s Law
  – Given a parallel program solving a problem of size n using p processors, let s denote the fraction of total execution time spent in serial code. The maximum speedup Ψ achievable by this program is:
        Ψ ≤ p + (1 − p)s
  – Check out another possible speedup by Gustafson
    • The End of Error: Unum Computing - a new approach to computer arithmetic: the universal number (unum)
Performance Analysis Formulas
• Karp-Flatt Metric
  – Given a parallel computation exhibiting speedup Ψ on p processors, where p > 1, the experimentally determined serial fraction e is defined to be:
        e = (1 ⁄ Ψ − 1 ⁄ p) ⁄ (1 − 1 ⁄ p)
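
To get a feel for the numbers, the three formulas above can be written as small Swift functions. The function names and the sample values (a 10% serial fraction on 8 processors) are illustrative assumptions, not figures from the talk.

```swift
// Amdahl's law: upper bound on speedup, 1 / (f + (1 - f) / p)
func amdahlSpeedup(serialFraction f: Double, processors p: Double) -> Double {
    return 1 / (f + (1 - f) / p)
}

// Gustafson-Barsis's law: upper bound on scaled speedup, p + (1 - p) * s
func gustafsonSpeedup(serialFraction s: Double, processors p: Double) -> Double {
    return p + (1 - p) * s
}

// Karp-Flatt metric: experimentally determined serial fraction,
// (1 / psi - 1 / p) / (1 - 1 / p), defined for p > 1
func karpFlatt(speedup psi: Double, processors p: Double) -> Double {
    return (1 / psi - 1 / p) / (1 - 1 / p)
}

let amdahl = amdahlSpeedup(serialFraction: 0.1, processors: 8)        // about 4.71
let gustafson = gustafsonSpeedup(serialFraction: 0.1, processors: 8)  // about 7.3
let e = karpFlatt(speedup: amdahl, processors: 8)                     // about 0.1
print(amdahl, gustafson, e)
```

Feeding the Amdahl bound back into the Karp-Flatt metric recovers the 10% serial fraction, which is a quick consistency check; with measured speedups, an e that grows with p points to parallel overhead rather than inherently serial code.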
Performance Analysis Formulas
• Isoefficiency Relation
  – Suppose a parallel system exhibits efficiency ε(n,p), where n denotes the problem size and p denotes the number of processors. Define C = ε(n,p) ⁄ (1 − ε(n,p)). Let T(n,1) denote the sequential execution time, and let T0(n,p) denote parallel overhead (total time spent by all processors performing communications and redundant computations). In order to maintain the same level of efficiency as the number of processors increases, problem size must be increased so that the following inequality is satisfied:
        T(n,1) ≥ C T0(n,p)
Debugging
• Performance measuring in unit tests
  – When you create a new project, Xcode creates a unit test target for that project with the name ProjectName + Tests. You can read about testing in Xcode at: https://developer.apple.com/library/ios/documentation/DeveloperTools/Conceptual/testing_with_xcode.
Debugging – REPL
• REPL stands for read-eval-print-loop.
  – Swift REPL is an interactive Swift code interpreter that executes code immediately. To launch Swift REPL, open Terminal.app and execute this command:
        $ xcrun swift
• If there is an error the program will stop and you can examine and continue the code at that point.
• Writing code in the REPL console is not as convenient as in the modern Xcode IDE
• Apple has built more powerful tools,
  – such as Playgrounds, which has a nice source code editor and the flexibility of Swift REPL.
LLDB
• LLDB is a high-performance command-line debugger. It is also available in Xcode. The easiest way to start it is to set a breakpoint and run the application. In the Xcode debug area view, you will find a console in which you can execute LLDB commands. To print the content of a variable, we can use the p LLDB command. Just run p with the variable name.
• LLDB is a very powerful debugger. You can read more about the LLDB debugger at https://developer.apple.com/library/ios/documentation/IDEs/Conceptual/gdb_to_lldb_transition_guide.
Conclusion
• You can do parallel computing with Swift (albeit in a limited way)
  – Use threads
  – Use wrappers
• Powerful debugger (LLDB) and editor in REPL
• Profiler – Instruments
Bibliography
• https://www.safaribooksonline.com/library/view/swift-high-performance/
• https://developer.apple.com/library/prerelease/content/documentation/Swift/Conceptual/Swift_Programming_Language/index.html#//apple_ref/doc/uid/TP40014097-CH3-ID0
• https://developer.apple.com/library/ios/documentation/Cocoa/Conceptual/Multithreading/ThreadSafety/ThreadSafety.html
• https://developer.apple.com/videos/play/wwdc2016/720/ - concurrent programming in Swift 3
Some words to leave with you
Et factum est, fieri cito (Well done is Swiftly done)
Caesar Augustus