
NAS Technical Report NAS-2018-01

May 2018

Evaluating the Suitability of Commercial Clouds for NASA’s High Performance Computing Applications: A Trade Study S. Chang1, R. Hood1, H. Jin2, S. Heistand1, J. Chang1, S. Cheung1, J. Djomehri1, G. Jost3, D. Kokron1

NASA Advanced Supercomputing Division, NASA Ames Research Center

Executive Summary

NASA's High-End Computing Capability (HECC) Project is periodically asked if it could be more cost effective through the use of commercial cloud resources. To answer the question, HECC's Application Performance and Productivity (APP) team undertook a performance and cost evaluation comparing three domains: two commercial cloud providers, Amazon and Penguin, and HECC's in-house resources—the Pleiades and Electra systems.

In the study, the APP team used a combination of the NAS Parallel Benchmarks (NPB) and six full applications from NASA's workload on Pleiades and Electra to compare performance of nodes based on three different generations of Intel Xeon processors—Haswell, Broadwell, and Skylake. Because of export control limitations, the most heavily used applications on Pleiades and Electra could not be used in the cloud; therefore, only one of the applications, OpenFOAM, represents work from the Aeronautics Research Mission Directorate and the Human Exploration and Operations Mission Directorate. The other five applications are from the Science Mission Directorate.

In addition to gathering performance information, the APP team also calculated costs as of May 2018 for the runs. In the case of work done on Pleiades and Electra, it used the "full cost" of running, based on the Project's total annual budget (including all hardware, software, power, maintenance, staff, and facility costs, etc.). In the case of the commercial cloud providers, the team calculated only the compute cost of each run, using published rates and including publicly-known discounts as appropriate. Other infrastructure costs of running in the cloud—such as near-term storage, deep archive storage, network bandwidth, software licensing, and staffing costs for security and security monitoring, application porting and support, problem tracking and resolution, program management and support, and sustaining engineering—were not considered in this study. These "full cloud costs" are likely significant.

While hardware differences across the three domains make it difficult to compare performance in an apples-to-apples manner, the APP team can make some general observations nonetheless. All runs on HECC resources were faster, and sometimes significantly faster, than runs on the most closely matching Amazon Web Services (AWS) resources. The largest differences are most likely due to Amazon's lack of a true high-performance processor interconnect, which provides a communication fabric between cores on different compute nodes.

For some of the NPBs, Penguin On-Demand (POD) resources yielded faster runs; for others it had slower runs than HECC. Differences in the processor interconnects and communication library are likely the causes of the performance differences. For the application benchmarks, performance of resources at Penguin always lagged behind similar resources at HECC.

1 An employee of CSRA LLC, a General Dynamics Information Technology company.
2 Point of contact; please direct any correspondence about the work to [email protected].
3 An employee of Supersmith.

In all cases, the full cost of running on HECC resources was less than the lowest-possible compute-only cost of running on AWS. To run the full set of the NPBs, AWS was 5.8–12 times more expensive than HECC, depending on the processor type used. For the full-sized applications, AWS was in the best case 1.9 times more expensive.

The NPB runs at POD were 4.7 times more expensive than equivalent runs at HECC. The full-sized applications were 5.3 times more expensive.

Based on this analysis of current performance and cost data, this study finds:

Finding 1: Tightly-coupled, multi-node applications from the NASA workload take somewhat more time when run on cloud-based nodes connected with HPC-level interconnects; they take significantly more time when run on cloud-based nodes that use conventional, Ethernet-based interconnects.

Finding 2: The per-hour full cost of HECC resources is cheaper than the (compute-only) spot price of similar resources at AWS and significantly cheaper than the (compute-only) price of similar resources at POD.

Finding 3: Commercial clouds do not offer a viable, cost-effective approach for replacing in-house HPC resources at NASA.

While the study shows conclusively that it would not be cost effective to run the entire HECC workload on the commercial cloud, there may be cases where cloud resources would prove to be a cost-effective supplement to HECC resources. For example, it may make economic sense to use the cloud to provide access to resources that would be underutilized at HECC, such as GPU-accelerated nodes used for data analytics, physics-based modeling and simulation, or machine/deep learning. It also may make sense to offload some of the HECC workload that would run reasonably well on cloud resources—namely single-node jobs—in order to reduce wait times for remaining HECC jobs. In this case, the economic argument for running small jobs in the cloud is based on the opportunity cost of keeping those jobs at HECC. This analysis leads to the fourth finding:

Finding 4: Commercial clouds provide a variety of resources not available at HECC. Several use cases, such as machine learning, were identified that may be cost effective to run there.

As a result of this study, the APP team has identified and started work on three actions:

Action 1: Get a better understanding of the potential benefits and costs that might accrue from running a portion of the HECC workload in the cloud. This has two parts:

• Determine the scheduling impact to large jobs of running 1-node jobs in house.
• Understand the performance characteristics of jobs that might be run on the cloud.

Action 2: Define a comprehensive model that allows accurate comparisons of cost between HECC in-house jobs and jobs running in the cloud.

Action 3: Prepare for a broadening of services offered by HECC to include a portion of its workload running on commercial cloud resources. This has two parts:

• Conduct a pilot study, providing some HECC users with cloud access.
• Develop an approach where HECC can act as a broker for HPC use of cloud resources, including user support and an integrated accounting model.


1. Introduction

Background

After years of development, technologies for cloud computing have become mature, and clouds are being used in a variety of problem domains, including physics-based simulations. There are currently many commercial cloud providers that have the potential to host High Performance Computing (HPC) workloads, including Amazon Web Services (AWS), Penguin On-Demand (POD), Microsoft Azure, and Google Cloud. In addition, several other vendors such as Rescale have begun packaging and reselling cloud resources. To make their offerings attractive, the resellers usually provide a user interface for submitting jobs that simplifies the choice of destination for a job, including the possibility of using in-house computing resources.

The cloud providers and resellers have been aggressive about marketing their products to HPC consumers, usually with analyses of how much money can be saved by moving HPC workloads to the cloud. They make a compelling case for cost efficiency in situations where HPC demand is highly variable. In the case of an HPC facility with a uniformly high utilization rate, however, their argument is less clear and is worthy of investigation.

The Department of Energy undertook such an investigation in 2011 with their Magellan Project [1]. Their comprehensive study found that while clouds have certain features that are attractive for use cases requiring on-demand access to computing resources, the performance of cloud resources was severely lacking for typical HPC workloads with moderate to high levels of communication or I/O. They conclude: "Our detailed performance analysis, use case studies, and cost analysis shows that DOE HPC centers are significantly more cost-effective than public clouds for many scientific workloads, but this analysis should be reevaluated for other use cases and workloads."

The High-End Computing Capability (HECC) project's Application Performance and Productivity (APP) team has periodically evaluated the claims of cloud providers to be an effective substitute for HPC for NASA. In 2011, APP compared the performance of similar computational resources from the Nebula cloud at NASA Ames, AWS, and HECC's Pleiades system. The study [2, 3, 4] used a variety of benchmarks, including full applications from the Pleiades workload, and established that the performance of Nebula for HPC applications was worse than that of the HPC instances of Amazon EC2, which in turn was worse than that of Pleiades, particularly at higher core counts.

A major contributor to the poor performance of Nebula was virtualization of hardware resources. In addition, performance on both cloud platforms suffered from rudimentary processor interconnects that, while cost effective for conventional IT applications, are not sufficient for NASA's HPC workload. 90% or more of that workload is from physics-based Modeling and Simulation (M&S) applications that run on more than one node. By spreading out the work to cores on multiple nodes—potentially a thousand or more—the computationally intensive applications can be sped up to run in a reasonable amount of time. Such multi-node applications typically need to exchange data between cores running on different nodes and use a Message Passing Interface (MPI) library for that purpose [5]. Most of the HPC applications running at NASA exhibit "tightly-coupled" communication patterns, meaning that there is a significant amount of communication interleaved with the computational components. In tightly-coupled executions, the rudimentary interconnects provided by many commercial cloud providers cannot keep up, and application performance is adversely impacted.


Since 2011, the APP team has been monitoring performance of AWS resources. In 2013, POD resources were included in the tests. From 2014, internal studies of cloud resources looked at new features vendors were providing to see how performance compared to equivalent HECC hardware. Evaluations included aspects such as I/O speeds and batch scheduler feature sets. The various tests done in this timeframe confirmed the original finding that cloud-based resources could not match the performance of Pleiades on the benchmarks used.

In this new study, the APP team is evaluating the proposition that modern cloud-based resources can be cost-effective in comparison to in-house HPC resources. This study differs from previous ones in that it introduces cost into the investigation and it considers circumstances under which the most cost-effective strategy for delivering HPC cycles might include the use of cloud-based resources.

Goals of the Study

The main question to be addressed in this study is whether commercial cloud resources should be used as part of a cost-effective implementation of HECC. Specifically, its first goal is to determine whether commercial cloud resources would make a viable, cost-effective substitute for running all of the current HECC workload. In the event that is not the case, there are two additional goals of the study. One is to determine under what conditions commercial cloud resources might make a viable, cost-effective supplement for running some of the current HECC workload. The other is to provide guidance for a follow-up study to identify which elements of the HECC workload could be offloaded to a commercial cloud and what steps would be necessary to add cloud-based resources to HECC.

2. Approach

Evaluation Workload

The common benchmarks used by HECC's APP team in recent years for architecture evaluation include:

• the NAS Parallel Benchmarks [6], a set of codes that are portable and useful for showing performance of different types of full applications,

• the NAS Technology Refresh (NTR) procurement suite, a set of full-sized applications dating to 2007, and

• HECC's Standard Billing Unit (SBU4) suites—both the original one from 2011 [7] and a more recent version from 2017—which have been used to establish the billing rates for HECC resources.

All of the codes in these suites use MPI for inter-process communication. This suite of codes is sized well below the average job workload size on HECC systems, which is currently between 2048 and 4096 ranks, each of which would run on a separate core.

Since the codes would be running on public clouds, the Computational Fluid Dynamics (CFD) codes OVERFLOW, FUN3D, and USM3D, which are in the NTR and SBU suites, could not be used because they are export controlled (ITAR/EAR99). The open source application OpenFOAM was used to represent the CFD members instead. The various benchmarks used in the study are described in the remainder of this section.

4 The HECC project and the NASA Center for Climate Simulation (NCCS) use the SBU for allocating and tracking computer resource usage across dissimilar architectures. Representative codes are run on each architecture and their run times compared to the baseline values calculated on a Pleiades Westmere node, whose SBU rate is defined to be 1.0. For example, an Electra Skylake node has an SBU rate of 6.36, as it was measured to be 6.36 times more effective at processing the benchmark suite than a Pleiades Westmere node.

NAS Parallel Benchmarks (NPBs)

The NPBs are a set of benchmarks designed to help evaluate the performance of parallel supercomputers. Eight of the benchmarks (BT, CG, EP, FT, LU, MG, IS, and SP) mimic the computation and data movement in CFD applications. Class C and Class D are, respectively, the third- and second-largest problem sizes defined for each benchmark. Two of the benchmarks, BT and SP, require the number of MPI ranks to be a perfect square; the other six require a power of two. For example, Class C runs of BT and SP are performed using 16, 25, 36, 64, 121, 256, 484, and 1024 MPI ranks. For the other six benchmarks, Class C runs are performed using 16, 32, 64, 128, 256, 512, and 1024 ranks.
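These decomposition rules are easy to check mechanically. The following is a small illustrative helper in Python (not part of the NPB distribution) that validates a rank count against them:

```python
import math

def valid_npb_ranks(benchmark: str, nranks: int) -> bool:
    """Check an MPI rank count against the NPB decomposition rules above:
    BT and SP need a perfect square; the other six need a power of two."""
    if benchmark.upper() in ("BT", "SP"):
        root = math.isqrt(nranks)
        return root * root == nranks
    return nranks > 0 and (nranks & (nranks - 1)) == 0

# The Class C rank counts used in the study all satisfy these rules:
assert all(valid_npb_ranks("BT", n) for n in (16, 25, 36, 64, 121, 256, 484, 1024))
assert all(valid_npb_ranks("CG", n) for n in (16, 32, 64, 128, 256, 512, 1024))
```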

ATHENA++

ATHENA++ is an astrophysical magneto-hydrodynamics (MHD) code in C++. It is fairly heavily used on Pleiades and is a candidate for future SBU suites. Runs with 512, 1024, and 2048 MPI ranks were attempted at AWS and HECC; not all the AWS runs were successful.

ECCO (MITgcm)

One of the largest users of SBUs in the Science Mission Directorate on Pleiades is the Estimating the Circulation & Climate of the Ocean (ECCO) project. The primary code used in the project is based on the MIT general circulation model (MITgcm), a numerical model designed for study of the atmosphere, ocean, and climate. The test case used for this study is one that was included in the NTR1 (NAS Technology Refresh) benchmarks. Runs with 120 and 240 MPI ranks were performed at AWS and HECC. In the rest of this study, "ECCO" is the label used for the MITgcm results and analysis.

Enzo

Enzo is one of the benchmarks in the SBU2 suite. It is a community-developed, adaptive mesh refinement simulation code, designed for rich, multi-physics hydrodynamic astrophysical calculations. Runs with 196 MPI ranks were performed using the SBU2 dataset.

FVCore

FVCore is one of the benchmarks in the SBU1 suite. It is obtained by extracting the dynamic core of the fvGCM algorithm in GEOS-5, used for Earth science research. The term "fvGCM" has been historically used to refer to the model which was developed in the 1990s at NASA's Goddard Space Flight Center. The grid is a Cubed-Sphere, so there is no singularity at the poles, and it can be partitioned in ways to test the communication performance between regions as well as the computational performance on the core. The vertical direction was set to be 26 layers, and for each layer the horizontal grid is divided into 6 tiles (6 faces in a cube). Runs with 1176 MPI ranks were performed at AWS and HECC using the SBU1 dataset.

WRF / NU-WRF

The Weather Research and Forecasting (WRF) application is an observation-driven regional Earth system modeling and assimilation system at satellite-resolvable scale. Runs with the WRF SBU1 benchmark used 384 MPI ranks.

NASA-Unified Weather Research and Forecasting (NU-WRF) is one of the benchmarks in the SBU2 suite. Runs with 1700 MPI ranks with the SBU2 dataset were attempted; there was a problem running on AWS Skylake nodes, however.


OpenFOAM

OpenFOAM is an open source Computational Fluid Dynamics (CFD) application. The Channel395 test case from the OpenFOAM tutorial was used for this study. Runs with 48, 144, and 288 MPI ranks were made on Haswell-based nodes at HECC and AWS.

Evaluation Systems

The HECC production systems currently have six different generations of Intel Xeons, from Westmere through Skylake. This study compares HECC resources based on the three most recent processor types—Haswell, Broadwell, and Skylake—to similar systems available from cloud providers.

In-house solution: HECC Pleiades and Electra systems

Processor Types and Interconnect: Three types of Intel Xeon processors were tested:

1. Haswell – Intel Xeon E5-2680v3 CPU @2.5GHz with 24 cores and 128 GiB of memory per node, and InfiniBand 4X-FDR 56-Gbps interconnect

2. Broadwell – Intel Xeon E5-2680v4 CPU @2.4GHz with 28 cores and 128 GiB of memory per node, and InfiniBand 4X-FDR 56-Gbps interconnect

3. Skylake – Intel Xeon Gold 6148 CPU @2.4GHz with 40 cores and 192 GiB of memory per node, and InfiniBand 4X-EDR 100-Gbps interconnect

Storage Resources: HECC provides:

(i) 680 GB of space for a /nasa filesystem that hosts the public software modules,
(ii) six /home filesystems, each about 2 TB, and
(iii) six Lustre shared /nobackup filesystems with sizes between 1.7 PB and 20 PB.

The testing for this study was performed mainly with /home7 and /nobackupp8.

Operating System: The SUSE Linux Enterprise Server 12 operating system, provided and supported by HPE, was used on these systems.

MPI Library: MPI applications running on HECC systems are required to use the HPE MPT for its proven scaling performance and prevention of network instability.

Job Scheduling: Test jobs are submitted from one of the seven Pleiades front-end systems (pfe[21-27]) and scheduled to run by the PBS batch job scheduler.

Commercial cloud offering: AWS public cloud available through NASA CSSO/EMCC5

The public cloud resources available for this study were from the AWS US West (Oregon) region. The following instance types were used in this study.

5 All NASA acquisition of cloud-based resources is required to go through the agency OCIO's Cloud Services Service Office (CSSO), which has developed the Enterprise Managed Cloud Computing (EMCC) framework.

Processor Types and Interconnect: Three types of instances based on Intel Xeon processors were tested:


1. c4.8xlarge – Haswell processors, Intel Xeon E5-2666v3 CPU @2.9GHz with 18 physical cores per instance, 60 GiB of memory, no local temporary storage per node, and 10-Gbps interconnect

2. m4.16xlarge – Broadwell processors, Intel Xeon E5-2686v4 CPU @2.3GHz with 32 physical cores per instance, 256 GiB of memory, no local temporary storage per node, and 25-Gbps interconnect

3. c5.18xlarge – Skylake processors, Intel Xeon Platinum 8124M CPU @3.0GHz with 36 physical cores per instance, 144 GiB of memory, no local temporary storage per node, and 25-Gbps interconnect

Storage Resources: A 50-GB EBS volume was configured for the /nasa and /home filesystems, and five 250-GB EBS volumes were bundled together and exported as an NFS /nobackup for testing.

Operating System: The Amazon Linux AMI, which has all the AWS-specific tunings, changes, and drivers needed to run well, was chosen for this study. The AWS cost is smaller using this OS compared to, for example, CentOS or SLES.

MPI Library: There is no HPE MPT available on the AWS cloud. Intel MPI was used for all MPI applications. A user-supplied license is required to build Intel MPI applications on AWS.

Job Scheduling: Without a job scheduler in the test environment, a script called spot_manager was created and used on a front-end instance (hostname nasfe01) for requesting and managing the compute instances used for each run.
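The spot_manager script itself is not reproduced in this report. As a rough illustration of the kind of EC2 calls such a script wraps, the following boto3 sketch requests spot instances for a run and terminates them afterward; the AMI ID, key name, and placement group name are hypothetical placeholders:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-west-2")  # US West (Oregon)

# Request one-time spot instances for a run (AMI ID, key name, and
# placement group are placeholders; c4.8xlarge matches the Haswell runs).
resp = ec2.request_spot_instances(
    InstanceCount=4,
    Type="one-time",
    LaunchSpecification={
        "ImageId": "ami-XXXXXXXX",
        "InstanceType": "c4.8xlarge",
        "KeyName": "nasfe01-key",
        "Placement": {"GroupName": "hpc-placement-group"},
    },
)
request_ids = [r["SpotInstanceRequestId"] for r in resp["SpotInstanceRequests"]]

# Block until the requests are fulfilled, then look up the instance IDs
# so the job launcher can reach the nodes and, later, release them.
ec2.get_waiter("spot_instance_request_fulfilled").wait(
    SpotInstanceRequestIds=request_ids)
fulfilled = ec2.describe_spot_instance_requests(
    SpotInstanceRequestIds=request_ids)["SpotInstanceRequests"]
instance_ids = [r["InstanceId"] for r in fulfilled]

# ... run the MPI job on the instances ...

ec2.terminate_instances(InstanceIds=instance_ids)
```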

Commercial cloud offering: POD

The POD resources available for this study are from a "Proof of Concept" arrangement where the usage was tracked but no actual charges were made. The following resources were used in this study.

Processor Types and Interconnect: Three types of Intel Xeon processors were tested:

1. Haswell processors in MT1 location – Intel Xeon E5-2660v3 CPU @2.6GHz with 20 cores and 128 GiB of memory per node, and InfiniBand 4X-QDR 40-Gbps interconnect

2. Broadwell processors in MT2 location – Intel Xeon E5-2680v4 CPU @2.4GHz with 28 cores and 256 GiB of memory per node, and Intel Omni-Path 100-Gbps interconnect

3. Skylake processors in MT2 location (early access prior to public release) – Intel Xeon Skylake CPUs with 40 cores per node, and Intel Omni-Path 100-Gbps interconnect

Storage Resources: Both MT1 (NAS filesystem) and MT2 (Lustre filesystem) have about 500 TB of total storage space visible to the public. 1 TB was allocated for this testing.

Operating System: POD configures all MT1 resources with CentOS Linux version 6 and MT2 resources with CentOS Linux version 7.


MPI Library: There is no HPE MPT available on POD. Intel MPI is available via their pre-installed Intel compiler modules. However, a user-supplied license is required to use the Intel compiler and MPI library. There are also multiple modules of OpenMPI.

Job Scheduling: A PBS-like batch job system is available. The commands qsub, qstat, and qdel can be used to manage jobs by each user.

Cost Basis

For each application in the workload, the cost associated with each run was calculated as follows:

HECC Cost Calculation: A job run on HECC systems was charged in SBUs [8], which are calculated from

(i) a conversion factor for each processor type (Haswell 3.34; Broadwell 4.04; Skylake 6.36),
(ii) the number of dedicated nodes used, and
(iii) the duration of the job in hours.

The current cost of 1 SBU is $0.16 [8], which is calculated as the total annual HECC costs divided by the total number of SBUs promised to the Mission Directorates in FY18. Note that the total HECC costs utilized in this calculation include:

• the total annual HECC budget, which covers the costs of operating the HECC facility, including hardware and software costs, maintenance, support staff, facility maintenance, and electrical power costs, and
• a $1 million facility depreciation cost (based on 30-year amortization of the initial capital cost of $30 million for the building hosting the resource).

In the current modularized approach to expanding the facility beyond the current building, the costs of the containers that host the HPC hardware are also being paid by the project. Thus, the total annual budget used in the above cost-per-SBU calculation also includes the cost of infrastructure facility development and expansion.
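The charging formula above is simple enough to capture in a few lines. A minimal sketch (a hypothetical helper, using the conversion factors and the $0.16/SBU rate quoted above):

```python
# SBU conversion factors and FY18 full-cost rate, as quoted in the text [8].
SBU_FACTOR = {"haswell": 3.34, "broadwell": 4.04, "skylake": 6.36}
COST_PER_SBU = 0.16  # dollars per SBU

def hecc_job_cost(processor: str, nodes: int, hours: float) -> float:
    """Full cost in dollars of a job on dedicated HECC nodes:
    conversion factor x nodes x hours x $/SBU."""
    sbus = SBU_FACTOR[processor] * nodes * hours
    return sbus * COST_PER_SBU

# Example: one Skylake node-hour costs 6.36 SBUs x $0.16, i.e. about $1.018,
# matching the Skylake/40 full-cost entry in Table 1.
print(f"${hecc_job_cost('skylake', 1, 1.0):.3f}")
```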

AWS Cost Calculation: Because of the difficulty of assessing front-end, storage and bandwidth, technical support, and CSSO/EMCC overhead costs discussed in Appendix I, a full-blown cost for using the AWS resources to run each job is not available for this study. Only the cost of using the compute instances is presented. Depending on the AWS region and the purchase type, availability and cost of an instance type can vary significantly. For example, spot instances are not available for GovCloud. For regions where they are available, the spot price fluctuates; it can go above the on-demand price or go lower than 30% of the on-demand price. Thus, four representative prices (with the Amazon Linux OS) will be calculated:

• the on-demand price of US West (Oregon),
• a sample spot price based on 30% of the US West (Oregon) on-demand price (see Appendix I),
• the on-demand price of GovCloud, and
• a sample pre-leasing price estimated as 70% of the GovCloud on-demand price (see Appendix I).
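All four price points derive from the two published on-demand rates. A small sketch, using the c4.8xlarge (Haswell) rates from Table 1 as input, reproduces the Table 1 entries up to rounding:

```python
def aws_price_points(on_demand: float, govcloud_on_demand: float) -> dict:
    """The four representative AWS hourly prices described above."""
    return {
        "on-demand (US West Oregon)": on_demand,
        "sample spot (30% of on-demand)": 0.30 * on_demand,
        "GovCloud on-demand": govcloud_on_demand,
        "sample pre-leasing (70% of GovCloud)": 0.70 * govcloud_on_demand,
    }

# c4.8xlarge: $1.591 on-demand, $1.915 GovCloud on-demand (Table 1);
# yields $0.477 spot and ~$1.341 pre-leasing estimates.
for label, rate in aws_price_points(1.591, 1.915).items():
    print(f"{label}: ${rate:.3f}")
```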


Table 1 summarizes the various hourly rates of the instance types used in this study. Note that c5 instances are not yet offered for the GovCloud, so no pricing information is available.

Table 1: Hourly costs of compute nodes used in study. AWS costs are compute-only.†

Model/Cores  | HECC Full Cost | AWS Instance | AWS On-Demand | AWS Spot Price | AWS GovCloud On-Demand | AWS GovCloud Pre-Leasing | POD Compute Only
Haswell/18   |                | c4.8xlarge   | $1.591        | $0.477         | $1.915                 | $1.341                   |
Haswell/20   |                |              |               |                |                        |                          | $1.800
Haswell/24   | $0.534         |              |               | $0.636*        |                        |                          | $2.160*
Broadwell/28 | $0.646         |              |               | $0.840*        |                        |                          | $2.800
Broadwell/32 |                | m4.16xlarge  | $3.200        | $0.960         | $4.032                 | $2.822                   |
Skylake/36   |                | c5.18xlarge  | $3.060        | $0.918**       | N/A                    | N/A                      |
Skylake/40   | $1.018         |              |               | $1.020* **     |                        |                          | $4.400

* AWS spot prices marked with an asterisk are approximations for comparison purposes and are pro-rated based on relative core counts versus HECC. The POD 24-core Haswell price has been similarly converted for comparison to HECC.
** This estimated AWS spot price for Skylake is unlikely to be available—see Appendix I.
† Full cost information unavailable, but expected to be significantly higher than compute-only cost.

POD Cost Calculation: Similar to AWS, it is difficult to assess the POD login node and storage costs on a per-job basis. Thus, a full-blown cost for using the POD resources to run each job is not available for this study. Only the cost of using the compute resources is presented. For Haswell, Broadwell, and Skylake, the published rates as of May 2018 of $0.09, $0.10, and $0.11 per core-hour, respectively, will be used in this report. Note that government and volume discounts may be available.

Although POD advertises per-core charging, it turns out that requesting fewer cores than the maximum number of cores per node is not allowed for multi-node jobs. Essentially, the cost of the whole node will be charged: $1.80 per 20-core Haswell node, $2.80 per 28-core Broadwell node, and $4.40 per 40-core Skylake node.
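The effective node-hour rates quoted above follow directly from the per-core rates and node sizes. A small sketch of this whole-node charging rule (a hypothetical helper, not a POD API):

```python
# Published POD per-core rates (May 2018) and cores per node.
POD_CORE_RATE = {"haswell": 0.09, "broadwell": 0.10, "skylake": 0.11}  # $/core-hour
POD_CORES_PER_NODE = {"haswell": 20, "broadwell": 28, "skylake": 40}

def pod_job_cost(processor: str, nodes: int, hours: float) -> float:
    """Compute-only cost in dollars of a multi-node POD job, which is
    billed for whole nodes regardless of the cores actually requested."""
    node_rate = POD_CORE_RATE[processor] * POD_CORES_PER_NODE[processor]
    return node_rate * nodes * hours

# Implied node-hour rates: $1.80 (Haswell), $2.80 (Broadwell), $4.40 (Skylake).
# Example: an 8-node, 2-hour Broadwell job: pod_job_cost("broadwell", 8, 2.0) -> $44.80
```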

3. Results

Performance and Cost

One of the goals of the study was to determine if clouds can provide a cost-effective substitute for running the HECC workload. To this end, a suite of non-ITAR workloads, including the NPBs, ATHENA++, ECCO, Enzo, FVCore, WRF/NU-WRF, and OpenFOAM, was used to assess the performance and cost effectiveness on AWS and POD versus HECC systems.

The NPB Class C benchmark runs, with core counts ranging from 16 to 1024, were performed to check scaling performance. In comparison with HECC systems, the results show that the computation performance scales well for runs on AWS, while communication generally does not, especially across multiple instances. (Scaling graphs for the BT.C benchmark illustrate this behavior; see Appendix III.) The poor communication scaling indicates that the 25-Gbps interconnect (the maximum of current AWS instances) hinders efficient communication between MPI processes across multiple AWS instances. Details of the Class C NPB runs can be found in Appendix III (Figure 2 and Table 6). The scaling results for the Class D NPBs can also be found there (Tables 7 and 8).

Table 2 compares the cost of running the Class D benchmarks on Intel Broadwell-based nodes at HECC, AWS, and POD. For each benchmark, the scaling runs are summarized on a single line, showing the total time and the total number of nodes required by all runs and the total cost for the node hours used by the runs. (See Table 7 in Appendix III for cost details for each scaling run.) In the case of HECC, a "full cost" is used, as was detailed in the Cost Basis discussed above.

Table 2 shows an estimate for the compute charges if the AWS 30% spot pricing is available ($16.42). It also shows an estimate, $48.26, for running in a pre-leased block of nodes on the government resources at AWS (i.e., 70% of the cost of AWS Gov on-demand instances). In the best case, when spot pricing is used, AWS is 5.8 times more expensive. At the published price, POD is 4.7 times more expensive than HECC.

Table 3 shows similar comparative data for Skylake-based nodes at HECC and AWS. Note that the compute-only cost of the NPB Class D runs on AWS is more than 12 times the full cost of the same runs on HECC Skylakes.

Table 2: Selected NPB Class D performance and cost using Broadwell processors on HECC (28 cores/node), POD (28 cores/node), and AWS (32 cores/instance).

Benchmark | # of HECC or POD Broadwell nodes | # of AWS m4.16xlarge instances | Total HECC time (sec) | HECC full cost | Total AWS time (sec) | AWS Oregon compute cost | AWS Gov compute cost | Total POD time (sec) | POD compute cost
bt.D | 47 | 40 | 135.37 | $0.40 | 327.3 | $5.55 | $6.99 | 168.30 | $2.37
cg.D | 140 | 120 | 192.01 | $0.99 | 629.75 | $15.12 | $19.05 | 150.68 | $3.41
ep.D | 140 | 120 | 10.62 | $0.04 | 17.11 | $0.30 | $0.38 | 13.84 | $0.25
ft.D | 29 | 24 | 95 | $0.20 | 552.59 | $5.32 | $6.70 | 58.50 | $0.59
is.D | 29 | 24 | 10.89 | $0.02 | 58.09 | $0.56 | $0.71 | 7.09 | $0.08
lu.D | 140 | 120 | 147.3 | $0.62 | 633.12 | $17.07 | $21.51 | 185.14 | $3.64
mg.D | 140 | 120 | 17.59 | $0.08 | 52.07 | $1.03 | $1.30 | 19.61 | $0.38
sp.D | 47 | 40 | 152.11 | $0.46 | 555.84 | $9.77 | $12.31 | 171.54 | $2.46
Total Cost | | | | $2.81 | | $54.72 | $68.94 | | $13.18
Estimated AWS spot cost (30% of on-demand cost): $16.42
Estimated AWS pre-leasing cost (70% of US-Gov cost): $48.26


Table 4 shows similar data for the application benchmarks described in Section 2, comparing the full cost of running on Haswell-based nodes at HECC to the compute-only cost on similar nodes at AWS. Like Tables 2 and 3, it aggregates multiple scaling runs into a single line for each benchmark; the full data can be found in Appendix III (Table 9). Table 5 shows the data for two applications run on HECC and POD, using Haswell-, Broadwell-, and Skylake-based nodes at both sites. In this case, multiple runs are not being aggregated into a single line.

In the best case, where spot pricing is available to run the entire workload, the compute-only AWS cost is still 1.9 times more than the full cost of running on HECC in-house resources. With a pre-leasing option on GovCloud, the AWS compute cost is about 5.2 times that of the HECC full cost. The POD compute cost is about 5.3 times that of the HECC full cost.

Analysis

The compute-only cost of using the AWS US West (Oregon) spot instances to run the workloads is still a few times more expensive than the full-blown HECC cost. For example, as seen in Table 2, for NPB Class D on the Broadwell nodes, using spot pricing costs $16.42 (compute-only) on AWS and $2.81 (full-blown) on HECC, a ratio of 5.8x. For the NPB Class D on the Skylake nodes, as seen in Table 3, it costs $18.29 (compute-only) on AWS, which is ~12x the HECC full-blown cost of $1.50. For the six full-sized applications using Haswell nodes, as depicted in Table 4, the compute-only cost on AWS, $141.77, is 1.9x the HECC full-blown cost of $76.18.

Table 3: Selected NPB Class D performance and cost using Skylake processors on HECC (40 cores/node) and AWS (36 cores/instance).

Benchmark | # of HECC Skylake nodes | # of AWS c5.18xlarge instances | Total HECC time (sec) | HECC full cost | Total AWS time (sec) | AWS Oregon compute cost
bt.D | 33 | 37 | 100.35 | $0.31 | 238.81 | $3.27
cg.D | 46 | 52 | 65.05 | $0.23 | 1984 | $30.20
ep.D | 46 | 52 | 8.5 | $0.03 | 8.58 | $0.09
ft.D | 46 | 52 | 50.3 | $0.17 | 1118.4 | $14.02
is.D | 46 | 52 | 4.93 | $0.02 | 164.5 | $2.20
lu.D | 46 | 52 | 107.12 | $0.36 | 327.69 | $4.64
mg.D | 46 | 52 | 13.36 | $0.04 | 71.99 | $0.88
sp.D | 33 | 37 | 121.84 | $0.35 | 383.5 | $5.68
Total Cost | | | | $1.50 | | $60.98
Estimated AWS spot cost (30% of on-demand cost): $18.29

Table 4: Selected MPI application performance and cost using Haswell processors on HECC (24 cores/node) and AWS (18 cores/instance).

Benchmark | Case | # of HECC Haswell nodes | # of AWS c4.8xlarge instances | HECC time (sec) | Total HECC full cost | AWS time (sec) | Total AWS Oregon compute cost | Total AWS Gov compute cost
ATHENA++ | SBU2 | 129 | 171 | 3445 | $29.51 | 3672 | $127.11 | $153.00
ECCO | NTR1 | 15 | 21 | 185 | $0.19 | 313 | $1.41 | $1.69
Enzo | SBU2 | 9 | 11 | 1827 | $2.44 | 2266 | $11.02 | $13.26
FVCore | SBU1 | 49 | 66 | 1061 | $7.72 | 1104 | $32.20 | $38.76
nuWRF | SBU2 | 71 | 95 | 529 | $5.58 | 1302 | $54.66 | $65.80
OpenFOAM | Channel395 | 20 | 27 | 27500 | $30.74 | 51430 | $246.31 | $296.46
Total Cost | | | | | $76.18 | | $472.71 | $568.96
Estimated AWS spot cost (30% of on-demand cost): $141.77
Estimated AWS pre-leasing cost (70% of US-Gov cost): $398.27


The ratios are even higher when using the AWS US West (Oregon) on-demand, the US-Gov on-demand, or pre-leasing US-Gov instances (see Appendix I for more details). In addition, AWS does not offer spot instances in their government cloud. Paying for on-demand instances on their public or government cloud or the pre-leasing option would not be cost effective.

The additional cost of utilizing front-ends, filesystems, data transfer, software licenses, and support on AWS is harder to quantify on a per-job basis. One can, however, examine the total charge in different categories. For example, the total AWS charge for the month of December 2017 in conducting the benchmark evaluation was $1944, of which $1068 was spent on using various compute instances, $136 on disk space usage, $738 on having the front-end available (at a $4.256 per hour rate), and about $2 on data transfer. Therefore, there was approximately 82% extra cost ($1944/$1068 = 1.82) beyond the compute-instance cost. Similarly, for January 2018, there was a 97% ($1654/$840) extra cost on top of the compute-instance cost. Since the workloads used in this evaluation neither use very much storage nor perform large file transfers, it is anticipated that in a production environment this overhead due to storage and file transfer will probably increase. On the other hand, the overhead due to the use of the front-end will likely decrease if there are many users sharing the front-end system. Also, since the AWS resources used for this evaluation are through CSSO/EMCC, there will be an additional CSSO/EMCC overhead (which includes fees paid to AWS for support services and other operational costs of CSSO/EMCC), which was not yet reported and thus not included in this cost assessment.

For POD, there is a front-end cost and storage cost in addition to the compute cost. Storage is $0.10 per GB-month, but government discounts will likely apply. There is no cost for data transfer or getting technical support. POD also offers additional volume discounts, but the specifics would not be known until HECC negotiates a cloud services contract with POD.

Even though the POD MT2 resources may provide competitive performance, it is not cost effective compared to running on the HECC resources. For example, as seen in Table 2, the total cost of running the various NPB Class D benchmarks on POD Broadwell nodes ($13.18, compute-only) is 4.7 times that of the HECC Broadwell full-blown cost ($2.81). The two full-sized application benchmarks, Enzo and WRF, when run on three node types, were ~5.3 times higher on POD than the full-blown HECC costs (see Table 5). Note that the storage cost incurred on POD was minimal during this evaluation period. Given that the cost of the POD login node and file transfer will likely be free and there will be volume and government discounts with storage, it is expected that the extra cost on top of compute in a production environment will be smaller with POD than with AWS.

Table 5: Enzo and WRF performance and cost using Haswell, Broadwell, and Skylake processors on HECC and POD. HECC: Haswell – 24 cores/node, Broadwell – 28 cores/node, Skylake – 40 cores/node; POD: Haswell – 20 cores/node, Broadwell – 28 cores/node, Skylake – 40 cores/node.

APP | Case | NCPUS | # of HECC nodes | # of POD nodes | HECC time (sec) | Total HECC full cost | POD time (sec) | Total POD compute cost
ENZO (HAS) | SBU2 | 196 | 9 | 10 | 1827 | $2.44 | 2355 | $11.78
ENZO (BRO) | SBU2 | 196 | 7 | 7 | 1625 | $2.04 | 1870 | $10.18
ENZO (SKY) | SBU2 | 196 | 5 | 5 | 1519 | $2.15 | 1751 | $10.70
WRF (HAS) | SBU1 | 384 | 16 | 20 | 1243 | $2.95 | 1802 | $18.02
WRF (BRO) | SBU1 | 384 | 14 | 14 | 1225 | $3.08 | 1436 | $15.64
WRF (SKY) | SBU1 | 384 | 10 | 10 | 1069 | $3.02 | 1352 | $16.52
Total Cost | | | | | | $15.68 | | $82.84


Based on the results presented above, the HECC systems are not only much better in their performance but also more cost effective than existing AWS and POD offerings. Neither AWS nor POD would be a viable substitute for HECC for running the traditional HPC MPI (i.e., multi-node) applications.

Usability and Potential Technical Issues

If HECC is going to offer cloud-based resources, then usability is a major concern. The study examines multiple areas in that regard. The complete findings can be found in Appendix IV. The key findings are summarized here:

• Porting of NASA's traditional HPC MPI applications to the cloud is not trivial, and the process can encounter frequent failures due to:
(i) system and runtime configuration differences, especially when an application requires a large number of libraries (e.g., for the cases of GEOS-5 from the SBU2 benchmark suite and WRF),
(ii) failures to run for some cases of an application due to unknown issues (e.g., ATHENA++), and
(iii) unexpectedly poor performance whose cause needs further investigation (e.g., NPB CG and FT, WRF, OpenFOAM).
(See Appendix IV for details.) The time, effort, and cost for NASA scientists, HECC staff, or commercial cloud support staff to investigate and resolve such issues can be significant when migrating to the cloud. In addition, the use of licensed software libraries will increase the cost of running in the cloud in addition to the cost of running at HECC.

• AWS offers many operating system options. Since SLES 12 is the current OS used on HECC systems, it is the first choice for trying on AWS, since it is likely to introduce fewer porting issues. However, on AWS, the SLES OS is more expensive than the basic Amazon Linux OS used in this evaluation. POD offers only CentOS.

• AWS offers no pre-installed software stack. POD has pre-installed more than 250 software modules for HPC use. For commercial software, users have to rely on using licenses provided by HECC or they must bring their own to either AWS or POD.

• AWS does not provide a job scheduling system. The evaluation on AWS used a script-based method created by HECC staff to request, check, and stop instances. POD uses the Moab/Torque management system, which is very similar to the PBS batch scheduler used on HECC systems.

• Given the additional work needed for porting, it may be worthwhile to use "container" technology, such as Docker, to package a job's image so that the job can run on both HECC and the cloud. Getting this technology to work properly and satisfy NASA security requirements still requires significant development and testing.

• Enabling a hybrid environment, where job bursting can occur from HECC to the cloud, will require a working job scheduling system. Altair has PBS 18 in beta test with Microsoft Azure. Adaptive Computing advertises availability of their Moab/NODUS cloud bursting solution on AWS, Google Cloud, AliCloud, DigitalOcean, and others. HECC has yet to test the feasibility of these technologies.

• Authorization to Operate (ATO) will be needed to operate on cloud resources such as those at AWS or POD. The best option is likely to extend the NAS moderate security plan, under which HECC operates, to include the use of resources from specific commercial cloud vendors. At that point, export-controlled (ITAR/EAR99) data would be permitted.

Other Considerations

Another consideration for using clouds is that they may be able to provide access to state-of-the-art resources, such as GPUs, that are not readily available at HECC. To this end, the study found:

• POD lagged behind AWS in offering the Intel Skylake processors. AWS released Skylake on November 7, 2017 for use on their public cloud. POD released Skylake for general use on March 12, 2018. Note, however, that the AWS Skylake processors are in high demand, and it can take a long time to get spot instances if one does not want to pay the on-demand price. Also note that no Skylake processors are yet available (as of May 2018) through the AWS government cloud.

• AWS lags behind POD in offering a higher-performance interconnect. The POD MT2 site already offers a 100-Gbps Omni-Path interconnect, while AWS currently offers at most a 25-Gbps interconnect.

• POD lags behind AWS in offering new GPU processors. AWS has the Nvidia M60, K80, and V100, while POD has only the Nvidia K40. Currently, HECC also has the Nvidia K40.

• POD offers the Intel KNL but AWS does not. HECC also has the Intel KNL.

4. Findings for NASA's Current HPC Workload

Note that the findings presented in this section reflect the performance and cost at the time of the study (May 2018). Changes in cloud offerings, especially with regard to pricing, may necessitate a reexamination in the future.

Performance

All runs on HECC resources were faster, and sometimes significantly faster, than runs on the most closely matching Amazon resources. The largest differences are most likely due to Amazon's lack of a true high-performance interconnect for processors.

For some of the NPBs, Penguin resources yielded faster runs; for others it had slower runs than HECC. Differences in the processor interconnects and MPI library are likely the causes of the performance differences. For the application benchmarks, performance of resources at Penguin always lagged behind similar resources at HECC. These performance differences lead to the first finding:

Finding 1: Tightly-coupled, multi-node applications from the NASA workload take somewhat more time when run on cloud-based nodes connected with HPC-level interconnects; they take significantly more time when run on cloud-based nodes that use conventional, Ethernet-based interconnects.

Cost

With the exception of Skylake processors, Table 1 in Section 2 shows that the base compute-only cost of a node-hour at AWS using spot pricing is more expensive than the full cost of a node-hour on a similar resource at HECC. Skylakes are new and, as shown in Appendix I, the spot pricing estimate is very optimistic. Jobs paying that price are likely to be evicted before completion because of the high demand for the nodes.


Table 1 also shows that the compute-only cost of resources at POD is 4.0–4.3 times more expensive than the full cost for similar resources at HECC. These observations lead to the second finding:

Finding 2: The per-hour full cost of HECC resources is cheaper than the (compute-only) spot price of similar resources at AWS and significantly cheaper than the (compute-only) price of similar resources at POD.

Cost Effectiveness

In all cases, the full cost of running on HECC resources was less than the lowest-possible compute-only cost of running on AWS. To run the full set of the NPBs, AWS was 5.8–12 times more expensive than HECC, depending on the processor type used. The full-sized applications were in the best case 1.9 times more expensive.

The NPB runs at Penguin were ~4.7 times more expensive than equivalent runs at HECC. The full-sized applications were ~5.3 times more expensive.

The average use of HECC resources is for jobs using 4,000 cores and running for more than 60 hours. At least three-fourths of the workload is from multi-process jobs using tightly coupled communications, and the SBU suite has been defined to reflect that usage. The requirements of the HECC workload, together with the cost and performance data in this report, lead to the third finding:

Finding 3: Commercial clouds do not offer a viable, cost-effective approach for replacing in-house HPC resources at NASA.

This finding is based on the expectation that other cloud providers can provide on-demand resources only at or above the spot-price level of AWS. In addition, providers who do not use HPC-level node interconnects will face similar performance penalties as seen on AWS.

Clouds as a Supplement to HECC Resources

While it is not cost effective to use clouds to replace in-house HPC resources at NASA, there may be circumstances where it makes economic sense to use them for augmenting in-house resources. In identifying candidate uses for clouds, it is worthwhile to start with an examination of why HECC costs are so low.

The main factor that keeps HECC costs low is its very high overall average utilization rate—more than 80% of the theoretical maximum number of SBUs in 2017 were delivered to users. Coupled with low hardware acquisition costs and a relatively low ratio of staff to hardware infrastructure, it is very difficult for cloud vendors to compete with HECC's cost per node-hour. Multiple cloud vendors have acknowledged this fact with comments such as, "If you (HECC) can keep systems 70% utilized or more, then our pricing cannot compete with yours."

Another factor that helps drive down HECC costs is its standard practice of adding compute capacity without increasing staffing costs. This pattern is expected to continue with the NAS Facility Expansion effort, which is adding additional floor space and compute resources to HECC. Over time, the plan calls for doubling the space available and equipping it with compute resources, with no additional staffing costs.

Using utilization and responsiveness as drivers, there are two areas where clouds might be useful to HECC:


• When HECC has the need for resources whose long-term demand is sporadic or unknown, it may be more cost effective to use the cloud for those requirements. For example, testing a new architecture often requires a period of high demand followed by a lengthy period of almost no activity. Other examples involve users with requirements for GPU-accelerated nodes, such as for machine learning. Here, the demand is likely long-term, but the extent of the demand is unknown at present.

• When the running of many loosely-coupled (or serial) jobs significantly impacts the scheduling of HECC's primary workload—large, tightly-coupled computations—then it may be worth paying the price of running the small jobs in the cloud to improve responsiveness for all users.

This analysis, and the observation that commercial cloud providers offer many more types of compute resources than HECC is able to (see Appendix I), lead to the fourth finding:

Finding 4: Commercial clouds provide a variety of resources not available at HECC. Several use cases, such as machine learning, were identified that may be cost effective to run there.

5. Discussion and Further Work

This section discusses potential cloud use cases in further detail, which leads to some actions the APP team is taking to improve NASA's ability to deliver compute capacity in the most cost-effective way possible.

Using Cloud Resources When User Demand is Unknown or Sporadic

One of the scenarios where it may make sense for HECC to use the cloud rather than acquiring its own resources is when a resource would be lightly or sporadically utilized. For example, HECC is continually procuring small systems to test the usefulness of new technologies for NASA applications, such as the many-integrated-cores architecture-based Intel Xeon Phi processor or the ARM processor. Unless these new technologies are seen to perform very well on a large number of applications, their utilization may be limited to a small number of users even after the testing phase is completed. In such cases, where ongoing, long-term demand for the systems is uncertain, HECC should evaluate the cost effectiveness of using cloud-based resources instead of acquiring its own in-house systems.6

6 Note that given the relatively high cost of commercial clouds (at least 1.9 times on AWS for applications), utilization of an in-house resource has to be below 42% before it becomes cost-effective to use a commercial cloud for the resource.

Another scenario is the use of GPUs for modeling and simulation (M&S) applications. During the time that user demand for GPU cycles for M&S is developing, HECC should consider using cloud-based resources for those jobs. One advantage for users would be access to newer hardware types than HECC has in-house. For example, AWS offers newer GPU processors than either HECC or POD. Although the AWS GPU price is expensive (see Appendix I) and multi-node GPU jobs would face the sort of performance penalties that multi-CPU jobs face, it may still be more cost effective than regularly acquiring new in-house GPUs, and it certainly provides the ability to opportunistically access them as needed.

Similarly, data analytics applications, e.g., machine-learning algorithms, have been shown to run effectively on GPUs. However, since the demand for GPU cycles for such applications is still developing, HECC should consider running such jobs in the cloud. Since the computations are usually loosely coupled, cost effectiveness would likely benefit from running on nodes with many GPUs. Such nodes are particularly expensive and would represent a large investment that might only be used a fraction of the time. Running in the cloud could be the most effective solution for such a workload until such time as demand is high enough to justify an on-premises resource. Note that running data analytics applications in the cloud may also face a performance penalty depending on where the data is located. For example, a data analytics application using datasets already at HECC would face additional cost in accessing that data when running on cloud resources.

In any of the above cases, when user demand grows to the point that it is more economical to run on premises, then bringing a resource in-house can be justified using a purely economic argument.

Using Cloud Resources to Improve Responsiveness of HECC In-House Systems

Opportunity Cost of Running Small Jobs on Pleiades/Electra

Experience with the porting and performance of multi-node MPI jobs on the cloud leads to the conclusion that they are not as well suited to it as single-node jobs may be. A significant portion of usage of HECC in-house resources is by single-node jobs. In 2017, 1.87M one-node jobs (or subjobs) from NASA's Science Mission Directorate (SMD) consumed a total of 9M SBUs on Pleiades and Electra. They represented about 85% of the total number of jobs on those systems but only 3% of the total SBU usage.

A question that then naturally comes up in light of these numbers is how the scheduling of the one-node jobs from SMD would affect the other 15% of the jobs on the system. In particular, do wider jobs, i.e., ones using a larger number of processors, get delayed significantly because of the existence of one-node jobs?

While it seems that the relatively small usage (3%) could be accomplished by finding "holes" in the scheduling of wider jobs, the sheer volume of single-node jobs makes the question worth investigating.

Analysis of Leasing Cloud Resources for Small Jobs

If the productivity impact of leaving all of the small jobs on HECC in-house resources is found to be significant, NASA might consider leasing resources in the cloud in order to run those jobs. Some tradeoff analysis of using leased resources was therefore conducted.

Analysis of PBS historical data shows that the majority of one-node jobs on HECC systems belong to NASA SMD. In addition, it is likely that the SMD jobs have less stringent security requirements. When the effects of delaying the scheduling of wider jobs are considered, migrating some of the one-node SMD jobs from HECC to a commercial cloud might prove to be cost effective for the whole HECC workload. To this end, the study conducted some experiments to determine the utilization percentage and waiting time on the cloud resource that would be seen by HECC when single-node jobs are moved from in-house resources to a fixed-size, leased resource in the cloud. Historical job submission data from all one-node SMD jobs in 2017 was used. The size of the leased resource was varied from 2 to 8 racks of 72 nodes each. The results shown in Figure 1 clearly indicate that it would be difficult to size a lease to handle all single-node jobs from SMD and have the leased resource both be responsive to user requirements and be well utilized. For example, a lease of 4 racks (288 nodes) would be about 95% utilized but would have an average waiting time of more than 750 hours. If 5 racks (360 nodes) were leased, the wait time would drop to about 150 hours, but it would cost 25% more. Thus, any efficient solution to run single-node jobs on a leased resource would need to have a fallback (in the on-demand cloud or on in-house resources) to reduce waiting times when demand exceeds the capability of the leased resources.
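The simulation code itself is not included in the report. The following minimal sketch shows one way such a replay could be structured, assuming FIFO ordering of one-node jobs against a fixed pool of leased nodes (both assumptions, since the actual scheduling policy used is not described):

```python
import heapq

def simulate_lease(jobs, n_nodes):
    """Replay one-node jobs (submit_time_hr, run_hours) FIFO against a
    fixed-size leased pool; return (average wait in hours, utilization)."""
    jobs = sorted(jobs)                    # FIFO by submission time
    free_at = [0.0] * n_nodes              # earliest time each node is free
    heapq.heapify(free_at)
    waits, busy = [], 0.0
    for submit, hours in jobs:
        start = max(submit, heapq.heappop(free_at))  # wait for a free node
        waits.append(start - submit)
        busy += hours                       # accumulate busy node-hours
        heapq.heappush(free_at, start + hours)
    makespan = max(free_at) - jobs[0][0]
    return sum(waits) / len(waits), busy / (n_nodes * makespan)

# e.g., sweep lease sizes of 2-8 racks (72 nodes each) over a 2017 job log:
# for racks in range(2, 9):
#     wait, util = simulate_lease(job_log, racks * 72)
```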

A cost comparison of using 144 Haswell nodes for one year among HECC in-house, AWS, and POD is summarized below. Note that these Haswell nodes are not identical, and the costs presented should be used with caution when drawing conclusions. See Appendix III for more detailed information on how the costs are obtained.

• HECC: using 144 existing Haswell nodes (each with 24 cores and 128 GiB of memory), an "unlimited" amount of storage space, and multiple front-end nodes costs $674K.

• AWS: pre-leasing up-front 144 c4.8xlarge instances (each with 18 cores and 60 GiB of memory per instance) with the Amazon Linux OS in the US West (Oregon) region, 100 TB of EBS gp2 volumes, and 1 m4.16xlarge as a login node costs $1.35M. Pre-leasing instances with another OS (such as SLES, CentOS, etc.), with no-up-front or partial-up-front payment, from a different AWS region, or using EFS instead of EBS will cost more. For example, pre-leasing no-up-front 144 c4.8xlarge instances in the GovCloud region, 100 TB of EFS, and 1 m4.16xlarge as a login node costs $1.92M. Note that the data transfer cost is not counted. Additional overhead from CSSO/EMCC is not included in the estimate.

• POD: reserving 144 Haswell nodes (each with 20 cores and 128 GiB of memory) and 100 TB of storage costs $2.39M. Note that this cost does not take into account possible discounts that POD will offer for government agencies, high-volume usage, long-term contracts, or dedicated resources. Also, data transfer is free.

As detailed above, the cost of pre-leasing commercial cloud resources for one-node jobs is at least double the cost of using existing HECC in-house resources. However, it may be that commercial cloud pricing will become more competitive in the future. NASA should monitor performance and costs as the market changes. NASA should also evaluate the cost effectiveness of running its own small-job cluster in-house. The per-node cost of such a cluster would likely be lower than HECC's typical cluster because the node interconnect would not need to support tightly-coupled communications.

Figure 1: Results of simulations where all one-node jobs in SMD that were submitted in CY 2017 are run instead on leased cloud resources, demonstrating how the size of the leased resource (in number of nodes) gives rise to a trade-off in utilization of the leased resource vs. the average wait time for each job.

Page 19: Evaluating the Suitability of Commercial Clouds for NASA’s ...€¦ · HECC workload on the commercial cloud, there may be cases where cloud resources would prove to be a cost-effective

NAS 2018-01—HECC Cloud Trade Study May 2018 19

Further Work

With a goal of helping HECC use the most cost-effective way to meet NASA's HPC requirements, the APP team has identified and begun work on three actions:

Action 1: Get a better understanding of the potential benefits and costs that might accrue from running a portion of the HECC workload in the cloud. This has two parts:

• Determine the scheduling impact to large jobs of running 1-node jobs in house. • Understand the performance characteristics of jobs that might be run on the cloud.

HECC is already in the process of conducting research into how the one-node jobs affect scheduling. HECC's systems team is simulating how scheduling would have occurred over a historical period of job submissions if single-node jobs from SMD were run on cloud resources instead of HECC in-house systems. The result of this study will quantify an "opportunity cost" of delayed scheduling of wide jobs in order to run one-node jobs in-house. That information can be used to help decide whether moving the one-node jobs is worth the likely increase in cost to do so.

In addition, HECC will develop benchmarks to represent workflows that might be run on the cloud. In the case where cloud resources are used to satisfy a sporadic demand, the benchmarks will be used periodically to evaluate the cost of jobs in the cloud and on potential in-house resources to determine when demand has reached that tipping point. The new benchmarks will also be used together with the ones from this study to extend performance evaluations to providers such as Google and Microsoft and to keep results current as providers have new offerings.

In general, such an evaluation for moving cloud computations to in-house resources should take into account factors such as:

• User demand, current and potential, for the resource
• Up-front cost of acquiring and maintaining the resource in house versus the cost of acquiring the resource via a commercial cloud provider
• Need to test the resource within the HECC environment, e.g., to test its interaction with the in-house file system
• Possibility of repurposing the hardware after the completion of testing

Action 2: Define a comprehensive model that allows accurate comparisons of cost between HECC in-house jobs and jobs running in the cloud.
If HECC is using the cloud as a cost-effective way to provide access to resources that would have low utilization rates, it will periodically need to evaluate when demand is sufficient to warrant purchasing hardware and bringing the computations on premises. When comparing cloud costs to projected in-house costs, it will be important to use apples-to-apples metrics. This study compares the most pessimistic cost of running on HECC resources to the most optimistic cost of running on AWS or POD. The full cost of running on AWS needs to add charges for storage, network bandwidth, and staffing support. It may need to use a higher price for compute resources as well if spot pricing is not available. The full cost of running on POD may need to add charges for staffing support and for storage beyond the modest amount included for free.

An alternative way of comparing costs would be to use a marginal cost for computing at HECC. In this approach, the additional costs associated with adding resources to existing HECC resources (mostly capital costs for compute racks) would be used. Then, the additional cost would be divided by the expected number of SBUs that equipment would deliver, to arrive at the marginal cost of providing additional SBUs.

As an example, if HECC added one HPE E-Cell with 288 Skylake nodes, the estimated marginal cost for the new SBUs provided would be in the range of $0.09–0.10/SBU, depending on the cost of additional storage and network hardware required. This assumes that the new equipment would be utilized at a rate of 80% over a three-year period. This rate is about 60% of the full cost rate per SBU (see footnote 7). Before using this approach to make financial decisions, the true marginal costs associated with storage, networking, and support need to be determined.
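
The marginal-cost arithmetic can be sketched as follows. Note that the capital cost used here is a hypothetical placeholder, since this report does not publish the E-Cell price; the Skylake SBU factor of 6.36 is the one listed in Appendix III.

    # Hedged sketch of the marginal-cost-per-SBU arithmetic. The E-Cell
    # capital cost is a hypothetical placeholder; 6.36 is the Skylake SBU
    # factor from Appendix III.
    def marginal_cost_per_sbu(capital_cost, nodes, sbu_factor, years, utilization):
        # SBUs delivered = nodes x SBU factor x hours in service x utilization
        sbus_delivered = nodes * sbu_factor * years * 8760 * utilization
        return capital_cost / sbus_delivered

    # A hypothetical $3.7M capital cost for 288 Skylake nodes at 80%
    # utilization over three years lands near the $0.09-0.10/SBU range cited.
    print(f"${marginal_cost_per_sbu(3_700_000, 288, 6.36, 3, 0.80):.3f}/SBU")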

In comparison, using a spot-pricing estimate of 30% of the on-demand rate, an equivalent number of Skylake node-hours at AWS would cost 60–80% more than at HECC. (Note that the spot-pricing rate is less than the rate for leasing resources long term from AWS; see Appendix I.) Using POD resources would cost more than 6 times the marginal HECC cost, but government and volume discounts would lower that factor somewhat.

In addition, in order to accurately estimate costs of running in the cloud, HECC needs to gain a better understanding of storage and network I/O bandwidth requirements as they interact with cloud offerings. HECC also needs to understand the requirements for a deep tape archive and the long-term preservation of data, and how those could be integrated with the cloud.

Action 3: Prepare for a broadening of services offered by HECC to include a portion of its workload running on commercial cloud resources. This has two parts:

• Conduct a pilot study, providing some HECC users with cloud access.
• Develop an approach where HECC can act as a broker for NASA's HPC use of cloud resources, including user support and an integrated accounting model.

There are a number of areas of work needed to integrate cloud-based resources into HECC production:

• While NASA's CSSO/EMCC has existing agreements with AWS, HECC should evaluate other potential suppliers of cloud resources, potentially through an RFP. Note that NASA requires that money be funneled through CSSO/EMCC, so that organization would likely need to be involved in the procurement.

• Using cloud-based resources in production will require modifications to the NAS Security Plan under which the HECC resources operate. NAS/HECC should extend their moderate plan to cloud-based resources from the vendors NASA chooses to work with. This will likely involve having those vendors be responsible for some of the security controls.

• HECC needs to consider how its users would access cloud-based resources. While many vendors, especially resellers, offer web-based job submission mechanisms, HECC users are accustomed to script-based job submissions. HECC needs to perform a requirements analysis on a group of users to determine whether web-based systems are preferable before investing money in a software license.

7 The assumption of equipment being in production for three years is pessimistic when compared to typical HECC practices. It is more likely that the equipment would be in use for 5+ years, thereby lowering the marginal cost of an SBU substantially, probably to something in the range of $0.05–0.06, depending on costs incurred for maintenance after three years.

Page 21: Evaluating the Suitability of Commercial Clouds for NASA’s ...€¦ · HECC workload on the commercial cloud, there may be cases where cloud resources would prove to be a cost-effective

NAS 2018-01—HECC Cloud Trade Study May 2018 21

• HECC also needs to consider how to support cloud users, especially with setting up images and porting applications.

In considering moving some of the HECC workload to the cloud, the pilot study is pursuing the following:

• Studying the workflows of candidate SMD one-node applications to identify and resolve potential issues for migration.

• Investigating the feasibility of using container technology, e.g., Charliecloud, for moving one-node jobs between HECC and AWS, POD, or other cloud offerings.

• Testing the cloud bursting capability with the PBS 18 and/or Moab/NODUS job scheduler as a mechanism for HECC users to submit jobs to run in the cloud.

• Designing mechanisms for accounting for cloud usage that integrate cleanly into HECC's current accounting user interface.

To act as a broker for NASA's HPC use of clouds, HECC's approach needs to include:

• Processes for identifying cloud-appropriate workflows, both from current HECC users and others who approach HECC asking for help,

• Processes, documentation, and training for porting applications to run effectively in the cloud, and

• Mechanisms, potentially through a user portal, for non-HECC users to incorporate cloud usage as part of their workflow.

6. Conclusions
The APP team used a combination of the NAS Parallel Benchmarks (NPB) and six full applications from NASA's workload on Pleiades and Electra to compare the performance of nodes based on three different generations of Intel Xeon processors: Haswell, Broadwell, and Skylake. With the exception of an open-source CFD code standing in for export-controlled codes that could not be run in the cloud, the full applications represent typical work done across NASA's Mission Directorates.

In addition to gathering performance information, the APP team also calculated costs for the runs. In the case of work done on Pleiades and Electra, the "full cost" of running jobs was determined. In the case of the commercial cloud providers, the team calculated only the compute cost of each run, using published rates and including publicly known discounts as appropriate. Other infrastructure costs of running in the cloud, such as near-term storage, deep archive storage, network bandwidth, software licensing, and staffing costs for security and security monitoring, application porting and support, problem tracking and resolution, program management and support, and sustaining engineering, were not considered in this study. These "full cloud costs" are likely significant.

Results show that large applications with tightly coupled communications perform worse on cloud resources than on similar resources at HECC. In addition, per-hour use of cloud resources is more expensive than the full cost of using similar resources at HECC. Taken in combination, the data as of May 2018 leads to the conclusion that commercial clouds do not offer a viable, cost-effective approach for replacing in-house HPC resources at NASA.

While it is not cost effective to use clouds to replace in-house HPC resources at NASA, there may be circumstances where it makes economic sense to use them to augment in-house resources. This study identified three actions for HECC in anticipation of clouds being useful for some of the HECC workload. The main themes of the actions are for NASA to better understand the impact and cost of running in the cloud, to define a comprehensive model for more accurate comparisons with HECC in-house resources, and to prepare for the cloud's likely eventual use for a certain fraction of the agency's HPC workload.

Acknowledgments
This study was funded by the NASA High-End Computing Capability, including work under task ARC013.11.01 on contract NNA07CA29C.

Penguin Computing made their cloud-based resources available to our evaluation at no cost; in addition, they provided access to their Skylake processors prior to public release. The authors appreciate their involvement.

The authors would like to thank R. Biswas, B. Ciotti, L. Hogle, P. Mehrotra, and W. Thigpen for their helpful comments and suggestions.


Acronyms
AMI    Amazon Machine Image
AWS    Amazon Web Services
CSSO   (NASA) Computer Services Service Organization
EAR99  A classification of items subject to Export Administration Regulations
EBS    (Amazon) Elastic Block Store
EC2    (Amazon) Elastic Compute Cloud
EFS    (Amazon) Elastic File System
EMCC   (NASA) Enterprise Managed Cloud Computing
HECC   (NASA) High-End Computing Capability
HPC    High Performance Computing
HPE    Hewlett Packard Enterprise
ITAR   International Traffic in Arms Regulations
M&S    (Physics-Based) Modeling & Simulation
MPI    Message Passing Interface, used for parallel programming
NAS    "NASA Advanced Supercomputing Division" or "Network Attached Storage" (depending on context)
NFS    Network File System
PBS    Portable Batch System
POD    Penguin On-Demand
SBU    (HECC) Standard Billing Unit
SMD    (NASA) Science Mission Directorate


References
1. U.S. Department of Energy. The Magellan Report on Cloud Computing for Science. Office of Advanced Scientific Computing Research (ASCR), December 2011.
2. P. Mehrotra, R. Hood, J. Chang, S. Cheung, J. Djomehri, S. Heistand, H. Jin, S. Saini, J. Yan, and R. Biswas. Evaluating NASA's Nebula Cloud for Applications from High Performance Computing. NASA Advanced Supercomputing, 2011.
3. P. Mehrotra, J. Djomehri, S. Heistand, R. Hood, H. Jin, A. Lazanoff, S. Saini, and R. Biswas, "Performance Evaluation of Amazon EC2 for NASA HPC Applications," 21st International ACM Symposium on High-Performance Parallel and Distributed Computing (HPDC'12), Delft, the Netherlands, June 18-22, 2012.
4. S. Saini, S. Heistand, H. Jin, J. Chang, R. Hood, P. Mehrotra, and R. Biswas, "Application Based Performance Evaluation of Nebula, NASA's Cloud Computing Platform," 14th International Conference on High Performance Computing and Communications (HPCC'12), Liverpool, United Kingdom, June 25-27, 2012.
5. Message Passing Interface Standard, Version 3.1, https://www.mpi-forum.org/docs/mpi-3.1/mpi31-report.pdf
6. NAS Parallel Benchmarks, https://www.nas.nasa.gov/publications/npb.html
7. SBU Suite 2011, https://www.hec.nasa.gov/news/features/2011/sbus_042111.html
8. Standard Billing Units, https://www.hec.nasa.gov/user/policies/sbus.html
9. Amazon white paper, April 2017, "Overview of Amazon Web Services", https://aws.amazon.com/whitepapers/overview-of-amazon-web-services/
10. AWS current instance types, https://aws.amazon.com/ec2/instance-types/
11. Amazon white paper, December 2016, "AWS Storage Services Overview", https://aws.amazon.com/whitepapers/storage-options-aws-cloud/
12. AWS Support Plans, https://aws.amazon.com/premiumsupport/compare-plans/
13. Pricing information for AWS on-demand and reserved instances, https://aws.amazon.com/ec2/pricing/reserved-instances/pricing/
14. AWS Accelerated Computing, https://aws.amazon.com/blogs/aws/new-p2-instance-type-for-amazon-ec2-up-to-16-gpus/
15. AWS EBS Volume Types, https://aws.amazon.com/ebs/details/#VolumeTypes
16. AWS EFS Pricing, https://aws.amazon.com/efs/pricing/
17. AWS GovCloud (US) Data Transfer Pricing, https://aws.amazon.com/govcloud-us/pricing/data-transfer/
18. AWS Data Transfer Costs and How to Minimize Them, https://datapath.io/resources/blog/what-are-aws-data-transfer-costs-and-how-to-minimize-them/
19. What is AWS Direct Connect?, https://docs.aws.amazon.com/directconnect/latest/UserGuide/Welcome.html
20. Penguin Computing On-Demand (POD) HPC Cloud, https://www.penguincomputing.com/hpc-cloud/pod-penguin-computing-on-demand/
21. POD Pricing, https://www.penguincomputing.com/hpc-cloud/pricing/
22. Private communication with POD staff, January and February 2018.
23. Moab/NODUS Cloud Bursting Solution, http://www.adaptivecomputing.com
24. AWS Public Sector Contract Center, https://aws.amazon.com/contract-center/


Appendix I – Amazon Web Services

Resource Overview
The AWS Cloud [9] offers more than 90 AWS services in four main categories: computing power, storage options, networking, and databases. Within each category, there are multiple options to choose from for matching different workloads. For example, in the computing power category, AWS EC2 provides resources in instances, where an instance is a virtual server. Loosely speaking, the allocation of AWS resources in units of instances is equivalent to HECC's resource allocation in units of nodes. The chart from [10] below shows the currently available AWS EC2 instance families and instance types. Four instance families (general purpose, compute-optimized, memory-optimized, and storage-optimized) currently use Intel Xeon processors exclusively. For the compute-optimized instances, the c3, c4, and c5 instances use Ivy Bridge, Haswell, and Skylake processors, respectively. The accelerated computing family currently provides Nvidia Tesla processors (K80, M60, and V100) in addition to the Intel Xeon processors on the instance.

As seen in the following table, obtained through the spot_manager script created by HECC staff, each instance type is characterized by:

(i) amount of memory, (ii) number of cores, (iii) number of GPU processors included, (iv) whether there is local temporary storage included and its size, (v) network performance, and (vi) whether launching the instances in a placement group is allowed.

There are two types of placement groups: a "cluster" group for clustering instances into a low-latency group, and a "spread" group for spreading instances across underlying hardware to reduce the risk of simultaneous failures. For network performance, most instance types are configured with 10 Gbps or less. A few instance types are configured with a 25-Gbps interconnect.

Type | Memory | Cores | GPUs | Local Temp Storage | Network Perf | Enhanced Networking | Placement Group
d2.xlarge | 30.5 GiB | 2 | | 6000 GiB | Moderate | Yes | Yes
d2.2xlarge | 61.0 GiB | 4 | | 12000 GiB | High | Yes | Yes
d2.4xlarge | 122.0 GiB | 8 | | 24000 GiB | High | Yes | Yes
d2.8xlarge | 244.0 GiB | 18 | | 48000 GiB | 10 Gigabit | Yes | Yes
r3.large | 15.25 GiB | 1 | | 32 GiB | Moderate | Yes | Yes
r3.xlarge | 30.5 GiB | 2 | | 80 GiB | Moderate | Yes | Yes
r3.2xlarge | 61.0 GiB | 4 | | 160 GiB | High | Yes | Yes
r3.4xlarge | 122.0 GiB | 8 | | 320 GiB | High | Yes | Yes
r3.8xlarge | 244.0 GiB | 16 | | 640 GiB | 10 Gigabit | Yes | Yes
i3.large | 15.25 GiB | 1 | | 475 GiB | Up to 10 Gigabit | Yes | Yes
i3.xlarge | 30.5 GiB | 2 | | 950 GiB | Up to 10 Gigabit | Yes | Yes
i3.2xlarge | 61.0 GiB | 4 | | 1900 GiB | Up to 10 Gigabit | Yes | Yes
i3.4xlarge | 122.0 GiB | 8 | | 3800 GiB | Up to 10 Gigabit | Yes | Yes
i3.8xlarge | 244.0 GiB | 16 | | 7600 GiB | 10 Gigabit | Yes | Yes
i3.16xlarge | 488.0 GiB | 32 | | 15200 GiB | 25 Gigabit | Yes | Yes
c3.large | 3.75 GiB | 1 | | 32 GiB | Moderate | Yes | Yes
c3.xlarge | 7.5 GiB | 2 | | 80 GiB | Moderate | Yes | Yes
c3.2xlarge | 15.0 GiB | 4 | | 160 GiB | High | Yes | Yes
c3.4xlarge | 30.0 GiB | 8 | | 320 GiB | High | Yes | Yes
c3.8xlarge | 60.0 GiB | 16 | | 640 GiB | 10 Gigabit | Yes | Yes
r4.large | 15.25 GiB | 1 | | | Up to 10 Gigabit | Yes | Yes
r4.xlarge | 30.5 GiB | 2 | | | Up to 10 Gigabit | Yes | Yes
r4.2xlarge | 61.0 GiB | 4 | | | Up to 10 Gigabit | Yes | Yes
r4.4xlarge | 122.0 GiB | 8 | | | Up to 10 Gigabit | Yes | Yes
r4.8xlarge | 244.0 GiB | 16 | | | 10 Gigabit | Yes | Yes
r4.16xlarge | 488.0 GiB | 32 | | | 25 Gigabit | Yes | Yes
p3.2xlarge | 61.0 GiB | 4 | 1 | | Up to 10 Gigabit | Yes | Yes
p3.8xlarge | 244.0 GiB | 16 | 4 | | 10 Gigabit | Yes | Yes
p3.16xlarge | 488.0 GiB | 32 | 8 | | 25 Gigabit | Yes | Yes
x1.16xlarge | 976.0 GiB | 32 | | 1920 GiB | 10 Gigabit | Yes | Yes
x1.32xlarge | 1952.0 GiB | 64 | | 3840 GiB | 25 Gigabit | Yes | Yes
x1e.32xlarge | 3904.0 GiB | 64 | | 3840 GiB | 25 Gigabit | Yes | No
t2.large | 8.0 GiB | 1 | | | Low to Moderate | No | No
t2.xlarge | 16.0 GiB | 2 | | | Moderate | No | No
t2.2xlarge | 32.0 GiB | 4 | | | Moderate | No | No
p2.xlarge | 61.0 GiB | 2 | 1 | | High | Yes | Yes
p2.8xlarge | 488.0 GiB | 16 | 8 | | 10 Gigabit | Yes | Yes
p2.16xlarge | 732.0 GiB | 32 | 16 | | 25 Gigabit | Yes | Yes
g3.4xlarge | 122.0 GiB | 8 | 1 | | Up to 10 Gigabit | Yes | Yes
g3.8xlarge | 244.0 GiB | 16 | 2 | | 10 Gigabit | Yes | Yes
g3.16xlarge | 488.0 GiB | 32 | 4 | | 25 Gigabit | Yes | Yes
m4.large | 8.0 GiB | 1 | | | Moderate | Yes | Yes
m4.xlarge | 16.0 GiB | 2 | | | High | Yes | Yes
m4.2xlarge | 32.0 GiB | 4 | | | High | Yes | Yes
m4.4xlarge | 64.0 GiB | 8 | | | High | Yes | Yes
m4.10xlarge | 160.0 GiB | 20 | | | 10 Gigabit | Yes | Yes
m4.16xlarge | 256.0 GiB | 32 | | | 25 Gigabit | Yes | Yes
c4.large | 3.75 GiB | 1 | | | Moderate | Yes | Yes
c4.xlarge | 7.5 GiB | 2 | | | High | Yes | Yes
c4.2xlarge | 15.0 GiB | 4 | | | High | Yes | Yes
c4.4xlarge | 30.0 GiB | 8 | | | High | Yes | Yes
c4.8xlarge | 60.0 GiB | 18 | | | 10 Gigabit | Yes | Yes
c5.large | 4.0 GiB | 1 | | | Up to 10 Gbps | Yes | Yes
c5.xlarge | 8.0 GiB | 2 | | | Up to 10 Gbps | Yes | Yes
c5.2xlarge | 16.0 GiB | 4 | | | Up to 10 Gbps | Yes | Yes
c5.4xlarge | 32.0 GiB | 8 | | | Up to 10 Gbps | Yes | Yes
c5.9xlarge | 72.0 GiB | 18 | | | 10 Gigabit | Yes | Yes
c5.18xlarge | 144.0 GiB | 36 | | | 25 Gigabit | Yes | Yes

Similarly, there are multiple storage services [11] to choose from, as shown in this chart:

Among these services, AWS EBS volumes can be used for persistent local storage. Data in EBS volumes cannot be shared between instances unless exported as an NFS file system from the instance the EBS volume is mounted on. The EBS bandwidth has limits depending on the volume and instance sizes. EFS is an NFS file system; thus, the data is available to one or more EC2 instances and across multiple Availability Zones (AZs) in the same region. S3 is a scalable and durable object-based storage system; data in S3 can be made accessible from any internet location. Point-in-time snapshots of EBS and EFS volumes can be taken and stored in S3. AWS Glacier provides long-term archival storage.


The AWS services are available in more than 16 regions around the world, each with a few AZs. Each region is completely independent. The two regions closest to the HECC facility are US West (N. California) and US West (Oregon). There is also a region marked as AWS GovCloud (US), which is designed to address stringent U.S. government security and compliance requirements. A region has multiple AZs. Each AZ is isolated, but the AZs in a region are connected through low-latency links.

Pricing
Usage on AWS is charged based on three fundamental characteristics: compute, storage, and data transfer out. There is no charge for inbound data transfer or for data transfer between other Amazon Web Services within the same region. In addition, AWS offers four different customer support plans [12]: basic, developer, business, and enterprise. The basic support plan is included but provides no access to technical support resources.

Cost of compute instances
There are four purchase types for AWS EC2 instances:

• On-demand instances: full price. The customer pays for compute capacity by the hour with no long-term commitments. Amazon may change the price once a year.

• Reserved instances: discounted rate from full price. Instances are required to be reserved for 1–3 years. Customers may get volume discounts of up to 10% when reserving more.

• Spot instances: bid price. The customer bids for unused instances, and prices fluctuate about every 5 minutes. Note that it is possible for the spot price to go above the on-demand price. Spot instances can be interrupted by EC2 with 2 minutes of notification when EC2 needs the capacity back.

• Dedicated hosts: on-demand price for instances on physical servers dedicated for your use.

Prices [13] vary with AWS region, OS (categorized as Amazon Linux, RHEL, SLES, Windows, etc.; RHEL and SLES are more expensive than Amazon Linux), number of cores, memory, and other factors. Usage is billed in one-second increments, with a minimum of 60 seconds.
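
As an illustration of this billing rule, the short Python sketch below (illustrative only; the sample hourly rate is an assumption for the example, not a quoted figure) computes the charge for a run:

    # Illustrative sketch of AWS per-second billing with a 60-second minimum.
    def instance_charge(seconds_used, hourly_rate):
        billable_seconds = max(seconds_used, 60)   # 60-second minimum applies
        return billable_seconds * hourly_rate / 3600.0

    # e.g., a 45-second run at a sample rate of $1.591/hour is billed as 60 s
    print(f"${instance_charge(45, 1.591):.4f}")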

The following three charts show sample pricing for a c4.8xlarge instance with the Linux OS in three regions: (i) GovCloud (US), (ii) US West (N. California), and (iii) US West (Oregon). In each chart, reserved and on-demand pricing for a standard 1-year term and a convertible 1-year term are shown. The convertible 1-year term allows the flexibility to change to different instance families, OS, etc. As seen in these charts, pricing for the US West (Oregon) region is lower than for the GovCloud (US) region, while the US West (N. California) region is the most expensive of the three.


Pricing for GovCloud (US)

Pricing for US West (N. California)

Pricing for US West (Oregon)


Note that spot instances are not offered in GovCloud (US). Pricing for spot instances in the AWS public cloud changes frequently. The two charts below show sample pricing for a c4.8xlarge instance and a c5.18xlarge instance in the US West (Oregon) region over a 1-month period. When demand is lowest, one can get a deep discount using spot instances versus on-demand instances. The tricky part of the game is to predict when demand will be low.

• Spot pricing for a c4.8xlarge instance

• Spot pricing for a c5.18xlarge instance

Cost of GPU instances
Current AWS Accelerated Computing instances [14] include the Intel Xeon E5-2686 v4 (Broadwell) CPU and various GPUs. Pricing in the following chart is on a per-hour basis:


Cost for storage space
The charge for storage is based on the amount allocated, not the amount actually used. Pricing is tiered; it is cheaper per gigabyte when more storage is allocated. If no long-term storage such as S3 is needed, the options are EBS and EFS:

EBS
This chart [15] from AWS shows the description, characteristics, and pricing of various EBS volume types.

EFS
AWS EFS functions as a shared file system, which provides a common data source for workloads and applications running on more than one EC2 instance. EFS file systems can be mounted on on-premises servers to migrate data over to EFS. This enables cloud-bursting scenarios.

With AWS EFS, a customer pays only for the amount of file system storage used per month. There is no minimum fee and there are no set-up charges. There are no charges for bandwidth or requests. Pricing for EFS storage in US West (Oregon) is $0.30/GB-month [16]. There is also a charge of $0.01/GB for syncing data to EFS.

Note: getting accurate usage accounting per user is going to be extremely difficult in a shared environment. Thus, the cost of storage space is probably best treated as an overhead of the operation.

Cost for transferring data
In most cases, AWS only charges for transferring data out of an AWS service.

AWS websites only list data transfer pricing [17] for the AWS GovCloud (US) region. This chart shows sample pricing for transferring data from AWS S3 to the internet:

There are also costs for data transfers between EC2 instances, between AZs in the same region, and between different AWS regions [18].

Outbound data transfer is aggregated across multiple Amazon services, such as Amazon EC2, Amazon S3, etc., and then charged at the outbound data transfer rate. This charge appears on the monthly statement as "AWS Data Transfer Out".

Also, outbound data transfer costs are reduced when going over a direct connection into a customer site. There is the cost of having the direct connect [19], as well as a small charge for all data going over the direct connect. The benefit of using a direct connect is small in terms of cost but huge in terms of security. There is a 1-Gbps direct connection between NASA's CSSO/EMCC and AWS.

The cost for outbound data transfer is generally small enough that it is better included as overhead of the operation.


Appendix II – Penguin On Demand (POD)

Resource Overview
Penguin Computing offers on-demand public cloud (i.e., POD) and private cloud resources [20]. POD uses bare-metal (non-virtualized), InfiniBand- and Omni-Path-based clusters in two locations, referred to as MT1 and MT2, which are accessible through a single POD Portal. Each location has its own localized storage. There are high-speed interconnects to facilitate easy migration of data from one location to another.

The MT1 location provides Intel Westmere, Sandy Bridge, Haswell, and Sandy Bridge + Nvidia K40 processors with an InfiniBand QDR 40-Gbps interconnect and high-speed Network Attached Storage volumes. No storage quota is set by default.

The MT2 location provides Intel Broadwell, Skylake (released on March 12, 2018), and KNL processors, all with an Intel Omni-Path 100-Gbps interconnect and a Lustre file system. No storage quota is set by default.

POD also offers customized large file systems, such as GPFS, if needed.

Different login nodes can be created in each location; four choices of login nodes are available. POD also offers the Scyld Cloud Workstation, a remote, 3D-accelerated visualization solution for pre- and post-processing tasks on POD.

For the private cloud, the Penguin team will install and configure the cluster for the customer and take care of all operational and maintenance tasks. Such private clouds can take the form of a portion of POD's public cloud on a monthly or yearly basis, or be designed and hosted with a specific configuration tailored to meet the customer's needs.

Pricing
POD pricing information, except for their Intel KNL offering, is published online [21].

One free login node is allowed per management account (note that there could be multiple users under one management account) per POD location. Other types of login nodes are charged per server-hour.

For most compute resources, per-core-hour charging is advertised. Usage of the Nvidia K40 resources is charged by node-hour.


A user's home directory comes with 1 GB of free storage. Storage beyond 1 GB is charged $0.10 per GB-month. There is no charge for data transfer in and out of POD.

[email protected].

POD may offer discounts for government agencies, volume, long-term contracts, and dedicated resources [22]; thus, the published public prices of $0.09, $0.10, and $0.11 per core-hour for POD Haswell, Broadwell, and Skylake, respectively, and the $0.10 per GB-month storage cost could be reduced.
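
Applied to a full year of dedicated use, these published rates [21] imply the per-node annual costs sketched below (illustrative only; the cores-per-node figures are the queue configurations listed in Appendix IV):

    # Illustrative sketch: POD published per-core-hour rates applied to a
    # year of dedicated use, before any discounts [22].
    RATE = {"Haswell": 0.09, "Broadwell": 0.10, "Skylake": 0.11}  # $/core-hour
    CORES = {"Haswell": 20, "Broadwell": 28, "Skylake": 40}       # cores/node

    for proc in RATE:
        annual = RATE[proc] * CORES[proc] * 24 * 365
        print(f"{proc}: ${annual:,.0f} per node-year")
    # Haswell works out to $15,768/node-year, matching Appendix III.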


Appendix III – Performance and Cost

Cost Basis
See Section 2 for the cost basis used for this study.

Performance and Cost of Workloads
NPB Scaling Study
The NPBs are commonly used to check the scaling performance of a system. Runs were performed with the Broadwell processors offered by HECC (2.4 GHz, 28 cores, 128 GiB, 56 Gbps), AWS (2.3 GHz, 32 cores, 256 GiB, 10 Gbps), and POD (2.4 GHz, 28 cores, 256 GiB, 100 Gbps). Since many of the NPB benchmarks require 2^n MPI ranks and would fit nicely in a 32-core node, using the AWS 32-core Broadwell instances results in needing fewer AWS instances, and possibly fewer occurrences of MPI inter-node communication, than using the 28-core HECC or POD Broadwell nodes for some runs.
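
The node and instance counts in Tables 6 and 7 follow directly from this arithmetic; the short Python sketch below (illustrative only) reproduces the 1,024-rank case:

    # Illustrative sketch of the node-count arithmetic behind Tables 6 and 7:
    # nodes needed = MPI ranks / cores per node, rounded up.
    from math import ceil

    def nodes_needed(ranks, cores_per_node):
        return ceil(ranks / cores_per_node)

    # 1,024 ranks need 37 of HECC's 28-core Broadwell nodes but only 32 of
    # AWS's 32-core m4.16xlarge instances.
    print(nodes_needed(1024, 28), nodes_needed(1024, 32))  # 37 32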

To examine the scaling performance, runs using the eight Class C micro-benchmarks (BT, CG, EP, FT, IS, LU, MG, and SP) from a 16-CPU count up to a 1,024-CPU count were performed on HECC and AWS. The timing of each run was separated into two components: computation and communication. The study's results show that the scaling performance of the computation component, as represented by BT.C in the lower-left graph, is quite good on both the HECC and AWS Broadwell systems:

On the contrary, the scaling performance of the communication component on HECC and AWS Broadwell (as shown on the right above) is drastically different. Figure 2 shows the communication performance for all NPB Class C benchmarks on HECC Broadwell and AWS Broadwell. With the only exception being the EP "embarrassingly parallel" benchmark, the AWS communication times are substantially higher. These results clearly demonstrate the superiority of the HECC Broadwell with 56-Gbps InfiniBand versus the AWS Broadwell with a 25-Gbps network. Table 6 shows the Class C results in tabular form. The results for the Class D benchmarks for AWS in Table 7 show similar characteristics to the Class C results.


Figure 2: NPB Class C, time spent in communication. [Eight log-scale panels, one per benchmark (BT.C, CG.C, FT.C, IS.C, LU.C, MG.C, SP.C, and EP.C), plot time (sec) against NCPUS up to 1,024 for Electra-Bro and AWS-m4.16xlarge.]


On the POD Broadwell system, the story is somewhat different. The Class D results in Table 7 show that the POD Broadwell system with a 100-Gbps network performs quite well. For some benchmarks, such as FT.D and IS.D, POD performs better than HECC. For others, such as EP.D and LU.D, the reverse is true. It should be noted that of the two POD locations available for testing, the MT2 site offers resources with a 100-Gbps Omni-Path interconnect. The NPB Class D performance on POD MT2 Broadwell nodes is closer to the performance on HECC's Broadwell nodes, which have a 4x-FDR (56-Gbps) InfiniBand interconnect (see Table 7). An additional factor is that the HPE MPT MPI library used on HECC scales better than the Intel MPI on POD.

Table 6: Scaling Runs of NPB Class C on Broadwell.

Benchmark | NCPUS | # HECC Broadwell nodes | # AWS m4.16xlarge instances | HECC time (sec) | AWS time (sec)
bt.C | 16 | 1 | 1 | 83.33 | 90.14
bt.C | 25 | 1 | 1 | 52.52 | 61.13
bt.C | 36 | 2 | 2 | 32.81 | 41.51
bt.C | 64 | 3 | 2 | 17.66 | 22.72
bt.C | 121 | 5 | 4 | 8.89 | 19.63
bt.C | 256 | 10 | 8 | 4.87 | 10.86
bt.C | 484 | 18 | 16 | 2.60 | 25.54
bt.C | 1024 | 37 | 32 | 1.58 | 17.14
cg.C | 16 | 1 | 1 | 20.62 | 21.73
cg.C | 32 | 2 | 1 | 7.32 | 9.71
cg.C | 64 | 3 | 2 | 4.37 | 6.87
cg.C | 128 | 5 | 4 | 2.32 | 3.72
cg.C | 256 | 10 | 8 | 1.45 | 2.71
cg.C | 512 | 19 | 16 | 1.05 | 1.74
cg.C | 1024 | 37 | 32 | 1.08 | 1.69
ep.C | 16 | 1 | 1 | 5.54 | 6.11
ep.C | 32 | 2 | 1 | 2.77 | 3.09
ep.C | 64 | 3 | 2 | 1.38 | 1.57
ep.C | 128 | 5 | 4 | 0.69 | 0.84
ep.C | 256 | 10 | 8 | 0.38 | 0.43
ep.C | 512 | 19 | 16 | 0.22 | 0.21
ep.C | 1024 | 37 | 32 | 0.13 | 0.11
ft.C | 16 | 1 | 1 | 17.80 | 16.54
ft.C | 32 | 2 | 1 | 9.45 | 11.29
ft.C | 64 | 3 | 2 | 5.35 | 9.81
ft.C | 128 | 5 | 4 | 3.01 | 19.09
ft.C | 256 | 10 | 8 | 1.79 | 12.28
ft.C | 512 | 19 | 16 | 0.92 | 4.27
ft.C | 1024 | 37 | 32 | 0.86 | 3.06
is.C | 16 | 1 | 1 | 1.34 | 1.22
is.C | 32 | 2 | 1 | 0.72 | 0.77
is.C | 64 | 3 | 2 | 0.47 | 0.96
is.C | 128 | 5 | 4 | 0.31 | 1.48
is.C | 256 | 10 | 8 | 0.17 | 0.65
is.C | 512 | 19 | 16 | 0.14 | 1.02
is.C | 1024 | 37 | 32 | 0.19 | 1.93
lu.C | 16 | 1 | 1 | 50.11 | 46.82
lu.C | 32 | 2 | 1 | 27.07 | 32.66
lu.C | 64 | 3 | 2 | 14.08 | 18.60
lu.C | 128 | 5 | 4 | 8.06 | 24.44
lu.C | 256 | 10 | 8 | 3.74 | 26.19
lu.C | 512 | 19 | 16 | 1.93 | 37.26
lu.C | 1024 | 37 | 32 | 1.20 | 50.81
mg.C | 16 | 1 | 1 | 6.75 | 4.31
mg.C | 32 | 2 | 1 | 3.20 | 4.19
mg.C | 64 | 3 | 2 | 1.49 | 2.08
mg.C | 128 | 5 | 4 | 0.79 | 1.46
mg.C | 256 | 10 | 8 | 0.45 | 0.78
mg.C | 512 | 19 | 16 | 0.22 | 0.67
mg.C | 1024 | 37 | 32 | 0.13 | 0.39
sp.C | 16 | 1 | 1 | 114.10 | 71.34
sp.C | 25 | 1 | 1 | 64.84 | 62.23
sp.C | 36 | 2 | 2 | 38.91 | 48.28
sp.C | 64 | 3 | 2 | 17.92 | 23.38
sp.C | 121 | 5 | 4 | 9.43 | 30.61
sp.C | 256 | 10 | 8 | 4.93 | 47.47
sp.C | 484 | 18 | 16 | 2.81 | 48.36
sp.C | 1024 | 37 | 32 | 2.02 | 31.18

In addition to the timing comparison, Table 7 also includes the cost of running the benchmarks on Broadwell nodes. On the HECC Broadwell nodes, the sum of the full-blown costs of the benchmarks is $2.81. This is ~4.7x cheaper than the compute-only cost on POD of $13.19. Of the four AWS costs listed [(i) the US West (Oregon) on-demand price, (ii) the US West (Oregon) spot price sampled as 30% of the on-demand price, (iii) the GovCloud on-demand price, and (iv) the GovCloud pre-leasing price sampled as 70% of the GovCloud on-demand price], the lowest is the compute-only US West (Oregon) spot price of $16.42, which is still 5.8x the HECC full-blown cost of $2.81. The other three AWS costs are all even more expensive.
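
The sampled estimates in Table 7 are simple scalings of the summed on-demand costs; the following sketch (illustrative only) reproduces them:

    # Illustrative sketch of the sampled AWS cost estimates in Table 7:
    # spot is 30% of the Oregon on-demand total, pre-leasing is 70% of the
    # GovCloud on-demand total.
    oregon_on_demand, govcloud_on_demand, hecc_full = 54.72, 68.94, 2.81

    spot_estimate = 0.30 * oregon_on_demand        # ~$16.42
    prelease_estimate = 0.70 * govcloud_on_demand  # ~$48.26
    print(f"spot ~${spot_estimate:.2f}, {spot_estimate / hecc_full:.1f}x HECC")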

A further comparison was performed between the HECC Skylake with a 100-Gbps network and the AWS Skylake with a 25-Gbps network. The performance and cost results using selected NPB Class D benchmarks are detailed in Table 8. The lowest AWS Skylake total compute-only cost of $18.29, based on a sample spot price, is about 12x more expensive than the $1.50 HECC full-blown cost. Examined individually, some of the benchmarks, such as CG.D and FT.D, perform worse on AWS Skylake than on AWS Broadwell. This indicates that the new AWS Skylake installation may need some tuning, or that there was something wrong with the study's runs.

In conclusion, this NPB study among HECC, AWS, and POD provides a strong argument that the HECC systems not only provide good scaling performance but are also more cost effective than existing AWS and POD offerings.

Table 7: Scaling Runs of NPB Class D on Broadwell.

Benchmark | NCPUS | # HECC/POD Broadwell nodes | # AWS m4.16xlarge instances | HECC time (sec) | HECC full cost | AWS time (sec) | AWS Oregon compute cost | AWS Gov compute cost | POD time (sec) | POD compute cost
bt.D | 256 | 10 | 8 | 104.03 | $0.19 | 176.23 | $1.25 | $1.58 | 117.72 | $0.92
bt.D | 1024 | 37 | 32 | 31.34 | $0.21 | 151.07 | $4.30 | $5.41 | 50.58 | $1.46
cg.D | 256 | 10 | 8 | 71.12 | $0.13 | 230.79 | $1.64 | $2.07 | 56.96 | $0.44
cg.D | 512 | 19 | 16 | 44.98 | $0.15 | 130.84 | $1.86 | $2.34 | 33.91 | $0.50
cg.D | 1024 | 37 | 32 | 45.18 | $0.30 | 127.95 | $3.64 | $4.59 | 33.93 | $0.98
cg.D | 2048 | 74 | 64 | 30.73 | $0.41 | 140.17 | $7.97 | $10.05 | 25.88 | $1.49
ep.D | 256 | 10 | 8 | 5.64 | $0.01 | 6.86 | $0.05 | $0.06 | 6.32 | $0.05
ep.D | 512 | 19 | 16 | 2.82 | $0.01 | 5.91 | $0.08 | $0.11 | 3.25 | $0.05
ep.D | 1024 | 37 | 32 | 1.43 | $0.01 | 2.88 | $0.08 | $0.10 | 3.16 | $0.09
ep.D | 2048 | 74 | 64 | 0.73 | $0.01 | 1.46 | $0.08 | $0.10 | 1.11 | $0.06
ft.D | 256 | 10 | 8 | 58.54 | $0.11 | 357.7 | $2.54 | $3.20 | 38.52 | $0.30
ft.D | 512 | 19 | 16 | 36.46 | $0.09 | 194.89 | $2.77 | $3.49 | 19.98 | $0.30
is.D | 256 | 10 | 8 | 6.67 | $0.01 | 37.02 | $0.26 | $0.33 | 3.55 | $0.03
is.D | 512 | 19 | 16 | 4.22 | $0.01 | 21.07 | $0.30 | $0.38 | 3.54 | $0.05
lu.D | 256 | 10 | 8 | 68.15 | $0.12 | 135.26 | $0.96 | $1.21 | 75.07 | $0.58
lu.D | 512 | 19 | 16 | 40.32 | $0.14 | 170.15 | $2.42 | $3.05 | 48.18 | $0.71
lu.D | 1024 | 37 | 32 | 23.46 | $0.16 | 174.14 | $4.95 | $6.24 | 42.19 | $1.21
lu.D | 2048 | 74 | 64 | 15.37 | $0.20 | 153.57 | $8.74 | $11.01 | 19.70 | $1.13
mg.D | 256 | 10 | 8 | 8.89 | $0.02 | 22.16 | $0.16 | $0.20 | 8.93 | $0.07
mg.D | 512 | 19 | 16 | 4.47 | $0.02 | 12.6 | $0.18 | $0.23 | 4.29 | $0.06
mg.D | 1024 | 37 | 32 | 2.81 | $0.02 | 10.21 | $0.29 | $0.37 | 4.26 | $0.12
mg.D | 2048 | 74 | 64 | 1.42 | $0.02 | 7.1 | $0.40 | $0.51 | 2.13 | $0.12
sp.D | 256 | 10 | 8 | 112.29 | $0.20 | 283.02 | $2.01 | $2.54 | 117.81 | $0.92
sp.D | 1024 | 37 | 32 | 39.82 | $0.26 | 272.82 | $7.76 | $9.78 | 53.73 | $1.55

Total cost: HECC $2.81; AWS Oregon $54.72; AWS Gov $68.94; POD $13.19
Estimated AWS spot cost (30% of on-demand cost): $16.42
Estimated AWS pre-leasing cost (70% of US-Gov cost): $48.26


NTR1/SBU1/SBU2 Workload Performance and Cost Comparison
In addition to NPB, a performance and cost comparison was also made for a few full-sized applications, including ATHENA++, ECCO, Enzo, FVCore, NU-WRF, and OpenFOAM. Table 9 shows the results for the HECC and AWS Haswell systems.

As shown in that table, the HECC full-blown cost was lower than the AWS Oregon compute-only on-demand cost for each of the applications tested. Even if one were able to get the low 30% compute-only spot price, the HECC full-blown cost would still be cheaper. Using the sums, there is a cost ratio of 1.9x ($141.77 AWS compute-only cost versus $76.18 HECC full-blown cost).

Finding an AWS full-blown cost on a per-job basis is not possible. An estimate of the extra cost on top of the compute instance cost was made based on the reported total cost for December 2017 AWS usage. Of the $1,944 charged for December 2017, $136 was spent on disk space usage, $738 on having the front-end available at $4.256 per hour, and less than $2 on data transfer. The rest, $1,068, was spent on various compute instances. Therefore, there is ~82% ($1,944/$1,068 = 1.82) extra cost on top of the compute instance cost. Similarly, for the January 2018 AWS usage, of the $1,654 charged, $137 was spent on disk space usage, $665 on the front-end, $12 on data transfer, and $840 on compute instances. Thus, there was ~97% ($1,654/$840 = 1.97) extra cost on top of the compute instance cost. There is also additional CSSO/EMCC overhead, which has not yet been reported and thus is not included in this calculation.
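
The overhead factors quoted above follow from dividing the total monthly charge by its compute-instance portion, as this illustrative sketch shows:

    # Illustrative sketch of the overhead-factor arithmetic above.
    months = {"Dec 2017": (1944, 1068), "Jan 2018": (1654, 840)}

    for month, (total, compute) in months.items():
        print(f"{month}: {total / compute:.2f}x compute "
              f"(~{(total / compute - 1) * 100:.0f}% extra)")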

Table 9: All Application Benchmark Results for HECC and AWS.

Benchmark | Case | NCPUS | # HECC Haswell nodes | # AWS c4.8xlarge (Haswell) instances | HECC time (sec) | HECC full cost | AWS time (sec) | AWS Oregon compute cost | AWS Gov compute cost
ATHENA++ | SBU2 | 1024 | 43 | 57 | 2268 | $14.48 | 2298 | $57.89 | $69.68
ATHENA++ | SBU2 | 2048 | 86 | 114 | 1177 | $15.03 | 1374 | $69.22 | $83.32
ECCO | NTR1 | 120 | 5 | 7 | 120 | $0.09 | 173 | $0.54 | $0.64
ECCO | NTR1 | 240 | 10 | 14 | 65 | $0.10 | 140 | $0.87 | $1.04
ENZO | SBU2 | 196 | 9 | 11 | 1827 | $2.44 | 2266 | $11.02 | $13.26
FVCore | SBU1 | 1176 | 49 | 66 | 1061 | $7.72 | 1104 | $32.20 | $38.76
nuWRF | SBU2 | 1700 | 71 | 95 | 529 | $5.58 | 1302 | $54.66 | $65.80
OpenFOAM | Channel395 | 48 | 2 | 3 | 4759 | $1.41 | 7646 | $10.14 | $12.20
OpenFOAM | Channel395 | 144 | 6 | 8 | 12547 | $11.17 | 20771 | $73.44 | $88.39
OpenFOAM | Channel395 | 288 | 12 | 16 | 10194 | $18.16 | 23013 | $162.73 | $195.87

Total cost: HECC $76.18; AWS Oregon $472.71; AWS Gov $568.96
Estimated AWS spot cost (30% of on-demand cost): $141.77
Estimated AWS pre-leasing cost (70% of US-Gov cost): $398.27

Table 8: Scaling Runs of NPB Class D on Skylake.

Benchmark | NCPUS | # HECC Skylake nodes | # AWS c5.18xlarge (Skylake) instances | HECC time (sec) | HECC full cost | AWS time (sec) | AWS Oregon compute cost
bt.D | 256 | 7 | 8 | 79.83 | $0.16 | 146.73 | $1.00
bt.D | 1024 | 26 | 29 | 20.52 | $0.15 | 92.08 | $2.27
cg.D | 256 | 7 | 8 | 34.3 | $0.07 | 614.2 | $4.18
cg.D | 512 | 13 | 15 | 17.24 | $0.06 | 650.58 | $8.29
cg.D | 1024 | 26 | 29 | 13.51 | $0.10 | 719.22 | $17.73
ep.D | 256 | 7 | 8 | 4.82 | $0.01 | 4.93 | $0.03
ep.D | 512 | 13 | 15 | 2.44 | $0.01 | 2.46 | $0.03
ep.D | 1024 | 26 | 29 | 1.24 | $0.01 | 1.19 | $0.03
ft.D | 256 | 7 | 8 | 27.41 | $0.05 | 526.08 | $3.58
ft.D | 512 | 13 | 15 | 14.84 | $0.05 | 349.26 | $4.45
ft.D | 1024 | 26 | 29 | 8.05 | $0.06 | 243.06 | $5.99
is.D | 256 | 7 | 8 | 2.64 | $0.01 | 71.5 | $0.49
is.D | 512 | 13 | 15 | 1.45 | $0.01 | 48.28 | $0.62
is.D | 1024 | 26 | 29 | 0.84 | $0.01 | 44.72 | $1.10
lu.D | 256 | 7 | 8 | 57.58 | $0.11 | 121.6 | $0.83
lu.D | 512 | 13 | 15 | 32.37 | $0.12 | 106.79 | $1.36
lu.D | 1024 | 26 | 29 | 17.17 | $0.13 | 99.3 | $2.45
mg.D | 256 | 7 | 8 | 8.18 | $0.02 | 39.99 | $0.27
mg.D | 512 | 13 | 15 | 3.36 | $0.01 | 15.56 | $0.20
mg.D | 1024 | 26 | 29 | 1.82 | $0.01 | 16.44 | $0.41
sp.D | 256 | 7 | 8 | 101.52 | $0.20 | 211.48 | $1.44
sp.D | 1024 | 26 | 29 | 20.32 | $0.15 | 172.02 | $4.24

Total cost: HECC $1.50; AWS Oregon $60.98
Estimated AWS spot cost (30% of on-demand cost): $18.29

For the comparison between HECC (with HPE MPT) and POD (with Intel MPI), only Enzo and WRF were tested. The results are summarized in Table 5 of Section 3. The compute-only costs on POD were 5.3x higher than the full-blown HECC costs. Note that the storage costs incurred on POD were minimal during this evaluation period. Given that the POD login node and file transfer will likely be free and there will be volume discounts for storage, it is expected that the extra cost on top of compute in a production environment would be smaller with POD than with AWS.

In conclusion, running the MPI workloads on AWS or POD is not cost effective compared to running them on the HECC systems.

Cost Comparison: Pre-leasing versus On-Premises for 1-Node Jobs
Section 4 raised the possibility that there could be a cost advantage in moving some one-node jobs to the cloud. To compare the annual cost of using in-house resources versus commercial cloud offerings for some of the 1-node jobs, a resource of 144 nodes was assumed.

Cost of using HECC in-house resources
For existing HECC resources, the full-blown cost is based on an SBU rate of $0.16 and an SBU factor. The following table summarizes the annual cost of 144 nodes for 3 processor types:

Processor Type | SBU Factor | Annual Cost
Haswell | 3.34 | $674,114
Broadwell | 4.04 | $815,395
Skylake | 6.36 | $1,283,641

Cost for pre-leasing 144 AWS compute instances, 1 front-end, and 100-TB storage
The c4.8xlarge instances (with 18 cores and 60 GiB of memory per instance) are used in the cost estimate. Other types of instances, such as c5.18xlarge or m4.16xlarge, would cost more. The cost of data transfer is not counted.

The compute-only cost varies among the three regions:

• Cost for leasing instances from the GovCloud (US) region, for 1 instance:
  No Upfront: $883.30 × 12 months = $10,599.60
  Partial Upfront: $5,055 + $421.21 × 12 months = $10,109.52
  All Upfront: $9,907
  With 144 instances for 1 year, All Upfront, the cost is $1,426,608.


• Cost for leasing instances from the US West (N. California) region, for 1 instance:
  No Upfront: $993.53 × 12 months = $11,922.36
  Partial Upfront: $5,690 + $474.14 × 12 months = $11,379.68
  All Upfront: $11,152
  With 144 instances for 1 year, All Upfront, the cost is $1,605,888.

• Cost for leasing instances from the US West (Oregon) region, for 1 instance:
  No Upfront: $735.84 × 12 months = $8,830.08
  Partial Upfront: $4,231 + $352.59 × 12 months = $8,462.08
  All Upfront: $8,293
  With 144 instances for 1 year, All Upfront, the cost is $1,194,192.

Cost for 1 front-end running
In the test environment, an r4.16xlarge on-demand instance (at $4.256 per hour, with 32 cores, 488 GiB of memory, and a 25-Gbps interconnect) is used as a front-end. Having a front-end running for 1 year will cost $4.256 × 365 × 24 = $37,282.56.

Cost for storage space
The charge for storage is based on the amount allocated, not the amount actually used.

• AWS EBS gp2 volumes are used in the test environment. A capacity of 100 TB using gp2 volumes for a production environment will cost:
  Sample charge for provisioning 100,000 GB (100 TB) of EBS gp2 volumes:
    For 1 month: $0.10 × 100,000 GB = $10,000/month
    For 1 year: $0.10 × 100,000 GB × 12 months = $120,000/year

• If EFS is used instead of EBS, the cost is:
  Sample charge for storing 100 TB of data on EFS:
    $0.30/GB-month × 100,000 GB = $30,000/month
    $30,000/month × 12 months = $360,000/year

Total Cost
The minimum annual cost of the compute, front-end, and storage combined will be $1,194,192 + $37,282 + $120,000 = $1,351,474.
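
The following illustrative sketch totals the three components using the figures above:

    # Illustrative sketch of the minimum AWS pre-leasing total: 144
    # all-upfront c4.8xlarge instances (Oregon), one r4.16xlarge front-end,
    # and 100 TB of EBS gp2 storage.
    compute = 144 * 8_293                 # all-upfront compute, Oregon
    front_end = 4.256 * 24 * 365          # on-demand front-end for a year
    storage = 0.10 * 100_000 * 12         # EBS gp2 at $0.10/GB-month
    print(f"${compute + front_end + storage:,.0f}")
    # ~$1,351,475; the $1,351,474 in the text truncates the front-end cost.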

Cost for pre-leasing 144 POD compute instances, 1 login node, and 100 TB storage
If the use of a free login node is adequate for the 1-node job workload, the cost will include just the 144 compute nodes and the storage. The POD Haswell nodes (20 cores and 128 GiB of memory per node) are used in the cost estimate. Possible discounts (government, volume, long-term contract, or dedicated resources) for both compute and storage resources are not reflected in the estimate.

• Cost for pre-leasing 144 Haswell nodes for 1 year:
  1 node: 20 cores/node × $0.09/core-hour × 24 × 365 = $15,768
  144 nodes: $15,768 × 144 = $2,270,592
• Cost for allocating 100 TB of storage for 1 year:
  $0.10/GB-month × 100,000 GB × 12 = $120,000

The total cost of the compute and storage combined will be $2,270,592 + $120,000 = $2,390,592.


Appendix IV – Usability Considerations
This section documents the study's experiences, besides performance and cost, in using AWS and POD.

Account Creation
AWS – Accounts for HECC staff who participated in the testing were created by an HECC staff member who has been acting as support staff for CSSO/EMCC's AWS cloud usage. The SSH public key of each user was copied from a Pleiades front-end node (PFE) to AWS to facilitate direct login from a PFE to an AWS front-end instance (nasfe01). In the event that accounts for many users are to be created, the administrators supporting NASA AWS should be able to handle it without user involvement.

POD – Through the POD web portal, one can request an individual account and be responsible for the cost. One can also get an account through an invitation from an existing user; in that case, the usage by the invitee will be charged to the inviter. After an account is created (which requires an email address and a 14-to-64-character password), one has to upload an SSH public key for use on MT1 and/or MT2 and choose storage and login node preferences through the web portal. In the event that there are many users from a site, POD staff can take care of sending email invitations to the individual users if an Excel spreadsheet with the users' email addresses is provided. POD staff can provide free training to users on how to use the web portal to create passwords and upload their SSH public keys.

Front-end
AWS – The nasfe01 front-end set up for this testing incurs a $4.256/hr charge. To reduce cost, there is regularly scheduled downtime for nasfe01. Restarting it can be done through a web control interface at https://gpmcepubweb.aws.nasa.gov/ using a username and password.

POD – Each management account can create its own login node. On MT1, a basic, free login node has 1 core and 256 MiB of memory. On MT2, the free login node has 1 core and 2 GiB of memory. Other, larger login nodes cost money, to prevent abuse. In the event that many users under a management account need a large login node, POD may be able to offer it for free [22].

Connection to either an AWS front-end or a POD login node from a user's desktop or a Pleiades front-end is through authentication with the user's SSH public key. The study's benchmarkers had an issue where SSH from HECC PFE nodes to the MT2 free login node would hang, but no issue with the MT1 free login node. It was resolved within a few days by "forcing the MTU to 1500 on POD's uplinks to make sure that anything leaving POD's network has a max MTU of 1500". Similarly, POD staff resolved another issue with scp between a PFE and the MT2 login node by adjusting some network settings.

Filesystem
AWS – AWS offers many storage choices (EBS, EFS, S3, Glacier, etc.) for various needs. /home and /nobackup file systems using EBS volumes were set up by HECC staff for this study. Storage cost is based on the size of the file system allocated for HECC, not the amount actually used. In production, hourly usage for each user would be recorded, and that could be used to allocate the overall storage costs if necessary.

POD – POD offers NAS storage on MT1 and Lustre on MT2. Storage is charged at $0.10 per GB per month. Volume discounts are available [22]. POD can also provide customized large file systems if needed.


Data Transfer
AWS – SSH from a PFE to nasfe01 works fine, and there is no issue with data transfer. Data transfer from a PFE to AWS is free and currently uses CSSO/EMCC's 1-Gbps direct connect. Outbound transfer from AWS is not free. The cost of individual outbound transfers is not tracked by CSSO/EMCC. However, the cost of outbound transfer is generally small enough that CSSO/EMCC includes it in the overhead cost. In the event that the data transfer cost is not small, it would need to be written off as HECC overhead or evenly split across all users.

POD – With a proper SSH public key, one can scp from a PFE to the MT1 and MT2 login nodes. There is no cost for data transfer.

Operating System
AWS – In the testing environment, Amazon Linux was chosen for testing. Other operating systems such as CentOS, Ubuntu, RHEL, SLES, and Windows are also supported by AWS, but they cost more.

POD – MT1 and MT2 are fixed with CentOS 6 and CentOS 7, respectively.

Software Stack
AWS – The test environment does not have any software packages pre-installed. Any software packages required for testing, such as the Intel compiler, netcdf, hdf4/5, Intel MPI, OpenMPI, Metis, and ParMetis, were installed by HECC staff.

POD – More than 250 commercial and third-party software packages have been pre-installed on the MT1 and MT2 clusters. There are small differences in what is available on MT1 and MT2. The POD technical support team will help with installing additional software packages upon request.

For commercial software packages (including the Intel compilers and MPI), either HECC or the individual users will have to provide licenses for use on either AWS or POD.

Batch Job Management
AWS
• The use of the spot_manager script, created by HECC staff, to (i) check pricing and what resources are available, (ii) request/use instances, (iii) terminate instances, and (iv) check the cost of a run, is a temporary solution for staff testing on the AWS cloud. Regular HECC users would have difficulty using the AWS cloud lacking a PBS-like job scheduler; there is no job ID associated with a "job".

POD
• The use of the PBS-like Moab scheduler and Torque resource manager provides a familiar environment for submitting, monitoring, and terminating batch jobs with the qsub, qstat, and qdel commands. Each job has a job ID. The POD-provided utility podstatus shows what resources are not being used, and podstart provides an estimate of when a submitted job will start running. Below is a summary of the available queues. The B30, S30, and Intel KNL queues are on MT2; the rest are on MT1.

  Queue      Compute Nodes                   Cores/Node  RAM/Node
  ----------------------------------------------------------------
  FREE       Free 5 minute, 24 core jobs     12          48GB
  M40        2.9GHz Intel Westmere           12          48GB
  H30        2.6GHz Intel Sandy Bridge       16          64GB
  T30        2.6GHz Intel Haswell            20          128GB
  H30G       H30 with two NVIDIA K40 GPUs    2           64GB
  S30        2.4GHz Intel Skylake            40          384GB
  B30        2.4GHz Intel Broadwell          28          256GB
  Intel KNL  1.3GHz Intel Xeon Phi           256         112GB

• Although POD advertises a per-core charge for all processor types except H30G, our attempt to request fewer cores than the per-node maximum failed for multi-node jobs. For example, for the Enzo 196 test case, we tried to request 14 cores from each of the 28-core Broadwell nodes and got an error. POD support confirmed that this is the intended behavior.

  #PBS -l nodes=14:ppn=14
  qsub error: ERROR: * ERROR: Multinode jobs in B30 must be ppn=28 (full node)

• For 1-node jobs, we verified that requesting fewer cores than the maximum in a node is allowed, and charging is based on the number of cores requested.

Ease of Getting the Resources
AWS – It is difficult to get the new Skylake instances without paying the on-demand price.

POD – Compute resources in most queues are frequently 100% taken. The Haswell nodes seem to be the most likely to be available. Access to the Skylake nodes was given on March 5, one week before the public release date of March 12, 2018.

Porting Experience
Porting MPI applications from HECC to either AWS or POD may run into difficulties in three different ways: (i) compiling/linking issues caused by many package dependencies, (ii) failure to run, and (iii) poor performance from unknown causes. Debugging each of these issues is non-trivial and puts a significant burden on the HECC support staff or the users. Below are some examples.

AWS
• Porting GEOS-5 (one of the SBU-2 applications) to AWS was difficult due to its many dependencies. The HPE MPT used at HECC is not available in the AWS cloud. Instead, one has to rebuild with Intel MPI. This will prevent a seamless migration of MPI applications to the cloud.

• Even for MPI applications that overcame the porting issues with Intel MPI on the AWS cloud, there were cases where some applications (such as GEOS-5) would fail with an Intel MPI error. Some applications (such as ATHENA++) would run with some core counts while failing with others. For example, using the AWS c4.8xlarge instances, the 1024-core case of ATHENA++ consistently runs fine, while the 512-core and 2048-core cases would sometimes fail with an Intel MPI error in socksm.c. On the other hand, using AWS c5.18xlarge instances, ATHENA++ runs with 512 cores but fails with 1024- and 2048-core counts.

• For some applications that ran properly, performance was not great. One example is cg.D and ft.D on AWS Skylake. Another example is WRF on AWS Haswell and Broadwell (not attempted on Skylake), which would take a few days to complete instead of the 1–2 hours of our earlier runs. After experimenting with mpiexec options, we were able to reduce the WRF run time on AWS Skylake significantly, but it still took more than 2x longer than the run on HECC's Skylake.

POD
• The fact that POD's software stack already includes many familiar commercial and third-party software packages helps reduce the amount of work needed during a porting process. For example, POD has provided ready-to-use WRF and OpenFOAM modules with OpenMPI. Some issues we encountered are:


• The POD-provided OpenFOAM module with OpenMPI does not provide good performance for our test case. Performance analysis with tools will be needed to understand this behavior.

• Building WRF with Intel MPI ourselves failed on the POD MT2 system but not on MT1. With help from POD technical staff, this issue was resolved by changing some hard-coded cpp commands.

State-of-the-Art Architecture
AWS – The new Skylake processors were released in early November 2017. Nvidia Tesla GPGPU processors K80, M60, and V100 are available.

POD – POD lags behind AWS in providing state-of-the-art architectures. The Skylake processors were released for public use on March 12, 2018. Intel KNL and Nvidia Tesla K40 are available, but POD currently has no time frame for offering newer Nvidia GPGPU processors.

Cloud Bursting Readiness
The AWS and POD testing systems used for this study do not yet have a mechanism in place to allow seamless migration of jobs from a data center to their clouds. A few vendors have solutions for cloud bursting. For example, Altair PBS has cloud-bursting beta testing with Microsoft Azure. Adaptive Computing advertises availability of their Moab/NODUS cloud bursting solution on AWS, Google Cloud, AliCloud, DigitalOcean, and others [23]. When this technology matures, there will be many issues to work out on both the HECC and cloud ends. In the meantime, a job can be migrated manually if it is set up properly to run under both the HECC and the cloud environments. For example, the NASA NEX group has started using Docker to build "containers" for their jobs that can be run on either HECC or AWS. However, building such containers on a system requires root privileges. Granting root access to general users on HECC systems for building the containers is impossible due to security concerns.

Technical Support
AWS – CSSO/EMCC has a support contract with AWS. We do not know the cost of this contract.

POD – Technical support is available 6 AM to 5 PM PDT. Requests for help are made by email to [email protected]. There is also an online support portal (https://penguincomputing.my.salesforce.com/secur/login_portal.jsp?orgId=00D300000000y1I&portalId=06050000000D4C9) where one can open and track support cases.

Usage/Cost Tracking System
AWS – In the test environment, after stopping a job, the spot_manager script returns the compute cost associated with the run. An HECC staff member also has access to historical data where the cost can be looked up based on the date/time and instance type. There is also a monthly cost report where the total charge is separated into compute instances, front-end, storage, etc. Note that the monthly charge does not include the CSSO/EMCC overhead and the AWS support contract.

POD – The POD web portal provides a "My Account Usage" page for each user. There are downloadable Excel spreadsheets that show usage by job, usage by day, user summary, group summary, and project summary. For users under a managed account, by default, the per-job cost is only available to the person who owns the managed account.


Authorization to Operate (ATO)
AWS – This testing was done on the AWS public cloud under the ATO of CSSO/EMCC. EMCC also has resources on AWS GovCloud, which in principle allows processing ITAR/EAR99 data. However, the CSSO/EMCC project manager does not yet allow anyone to run ITAR/EAR99 work on AWS except in tightly controlled situations. If HECC were to use AWS outside of CSSO/EMCC, a separate ATO between the HECC Security Lead and NASA HQ would be needed.

POD – This testing was done under a "proof of concept" arrangement at no cost. No ATO has been filed by CSSO/EMCC or HECC for using POD in production. POD is SSAE (Statement on Standards for Attestation Engagements) SOC 1 and SOC 2 compliant.

Sales Channel
AWS – Government agencies have many options for buying AWS cloud services, either directly from AWS or through a range of contract vehicles [24]. NASA's Solutions for Enterprise-Wide Procurement (SEWP) works with multiple AWS resellers to provide services to federal agencies. For CSSO/EMCC AWS, the reseller used is Four Points Technology.

POD – Sales are handled by POD's Director of Sales and Federal Accounting Rep.