
Comet – Tales from the Long Tail – Two Years in and 10,000 Users Later. Shawn Strande and the Comet Team. Accelerated Data and Computing ’17, Resource Management Session, July 17-18, San Jose, CA

Transcript of “Comet – Tales from the Long Tail – Two Years in and 10,000 Users Later”

  • Comet – Tales from the Long Tail – Two Years in and 10,000 Users Later

    Shawn Strande and the Comet Team

    Accelerated Data and Computing ’17, Resource Management Session

    July 17-18, San Jose, CA

  • Presented on behalf of the project team at SDSC and Indiana University

    Trevor Cooper – HPC Systems Manager
    Haisong Cai – HPC Storage
    Mike Dwyer – Documentation
    Karen Flammer – EOT
    Geoffrey Fox (IU) – Virtualization
    Jerry Greenberg – User Services
    Jim Hayes – HPC Systems Administration
    Tom Hutton – Networking
    Christopher Irving – HPC Systems Administration
    Gregor von Laszewski (IU) – Virtualization
    Amit Majumdar – Scientific Applications, Science Gateways
    Dmitry Mishin – HPC Systems Programmer
    Sonia Nayak – Financial Reporting
    Mike Norman – Principal Investigator
    Phil Papadopoulos – Chief Architect, Co-PI
    Wayne Pfeiffer – Scientific Applications
    Susan Rathbun – EOT
    Scott Sakai – Security
    Jeff Sale – EOT
    Bob Sinkovits – Scientific Applications, Co-PI
    Shawn Strande – Project Manager, Co-PI
    Mahidhar Tatineni – User Services
    Fugang Wang (IU) – Virtualization
    Nancy Wilkins-Diehr – Science Gateways, Co-PI
    Nicole Wolter – User Services & Accounting
    Kenneth Yoshimoto – HPC Systems Programmer
    Jan Zverina – Communications

    Acknowledgements: NSF ACI: 1341698; 1548562

  • Design and Operations

  • Comet: System Characteristics

    • Total peak flops ~2.1 PF (a back-of-the-envelope check follows this list)
    • Dell primary integrator
    • Intel Haswell processors w/ AVX2
    • Mellanox FDR InfiniBand
    • 1,944 standard compute nodes (46,656 cores)
      • Dual CPUs, each 12-core, 2.5 GHz
      • 128 GB DDR4 2133 MHz DRAM
      • 2 × 160 GB SSDs (local disk)
    • 72 GPU nodes (288 GPUs)
      • 36× with 2 NVIDIA K80 cards, each with dual Kepler GPUs
      • 36× with 4 P100 cards
    • 4 large-memory nodes
      • 1.5 TB DDR4 1866 MHz DRAM
      • Four Haswell processors/node
      • 64 cores/node
    • Hybrid fat-tree topology
      • FDR (56 Gbps) InfiniBand
      • Rack-level (72 nodes, 1,728 cores) full bisection bandwidth
      • 4:1 oversubscription cross-rack
    • Performance Storage (Aeon)
      • 7.6 PB, 200 GB/s; Lustre
      • Scratch & Persistent Storage segments
    • Durable Storage (Aeon)
      • 6 PB, 100 GB/s; Lustre
      • Automatic backups of critical data
    • Home directory storage
    • Gateway hosting nodes
    • Virtual image repository
    • 100 Gbps external connectivity to Internet2 & ESnet
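    The quoted peak can be roughly reproduced from the node counts above. The sketch below (Python) assumes 16 double-precision flops per cycle per core for Haswell with AVX2 FMA and a nominal ~1.45 TF double-precision peak per K80 card; both are vendor figures, not numbers from the slides.

    ```python
    # Back-of-the-envelope peak-flops estimate for Comet (illustrative only).
    # Assumes 16 DP flops/cycle/core (Haswell AVX2 FMA) and ~1.45 TF peak DP
    # per K80 card -- vendor figures, not measured values.
    cpu_cores = 1944 * 24               # standard compute nodes x cores/node
    cpu_peak  = cpu_cores * 2.5e9 * 16  # cores x clock (Hz) x flops/cycle
    k80_peak  = 36 * 2 * 1.45e12        # 36 nodes x 2 K80 cards x peak DP

    print(f"CPU peak:       {cpu_peak / 1e15:.2f} PF")              # ~1.87 PF
    print(f"CPU + K80 peak: {(cpu_peak + k80_peak) / 1e15:.2f} PF") # ~1.97 PF
    ```

    That lands within rounding distance of the ~2.1 PF figure quoted above.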

  • Average job size across XSEDE systems is below 2000 cores

  • One rack of Comet provides full bisection bandwidth up to 1,728 cores


  • Trestles and Gordon – A legacy of highly productive HPC systems for the long tail

    Trestles and Gordon were both designed to serve a broader community than traditional HPC systems and endure well beyond their NSF-funded life.

  • Comet’s operational policies and software are designed to support long-tail users

    • Allocations
      • Individual PIs limited to 10M SUs
      • Gateways can request more than 10M SUs
      • Gateways exempt from “reconciliation” cuts
    • Optimized for throughput
      • Job limits are set at 1,728 cores or less (a single rack)
      • Support for shared-node jobs is a boon for high-throughput computing and utilization (see the sketch after this list)
      • Comet “Trial Accounts” provide 1,000-SU accounts within one day
    • Science gateways reach large communities
      • There are 13 gateways on Comet, reaching thousands of users through easy-to-use web portals
    • Virtual Clusters (VCs) support well-formed communities
      • Near-native IB performance
      • Project-controlled resources and software environments
      • Requires that the allocation team possess systems administration expertise
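    Gateways and HTC users typically submit these shared-node jobs programmatically. A minimal sketch, assuming Comet’s “shared” partition name; the executable, account name, and resource numbers are hypothetical placeholders, not values from the slides.

    ```python
    import subprocess

    # Minimal sketch of submitting a shared-node job on Comet from a script,
    # as a gateway might. "myapp" and the account are placeholders.
    job_script = """#!/bin/bash
    #SBATCH --job-name=shared-example
    #SBATCH --partition=shared   # shared-node partition: small jobs pack
    #SBATCH --nodes=1            # onto one node alongside other users' jobs
    #SBATCH --ntasks=4           # request 4 of the 24 cores on the node
    #SBATCH --time=01:00:00
    #SBATCH --account=abc123     # placeholder allocation account

    ./myapp input.dat
    """

    # Pipe the script to sbatch on stdin rather than writing a temp file.
    result = subprocess.run(["sbatch"], input=job_script,
                            capture_output=True, text=True, check=True)
    print(result.stdout)  # e.g. "Submitted batch job 1234567"
    ```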

  • Smaller job sizes, shared-node jobs, and gateways lead to very high utilization

  • User Experience

  • When asked “What do you like about Comet?”, users commented on the quick turnaround time, usability, and excellent user support

    • “Short queue times and large processor/GPU-to-node ratio, in comparison to other HPC resources I have used.”
    • “Many standard compute nodes with 12 cores per socket. Good scheduling policy resulting in lesser queue times, large memory, large flash memory enabling faster data read/write.”
    • “Smaller jobs run quickly; the system is quite stable.”
    • “It is easily accessed, it has very thorough documentation from both SDSC and XSEDE, and it has a diverse set of nodes, and it is a very large cluster.”
    • “Low pend time. Low compute time. Relatively large number of cores per node works well for my code.”
    • “Excellent help desk. Worked with me personally, immediately to solve all my problems.”

  • Trial Accounts, an SDSC innovation to provide rapid access for evaluation of Comet

    • Lowers barrier to entry for new users/communities.
    • Integrated process with XSEDE.
    • 1,000 core-hours awarded in 1 day.
    • Users request a trial account by clicking on a link on the XSEDE portal or SDSC Comet page. Users need an XSEDE portal account (which they can self-create).
    • Designated User Services staff at SDSC serve as PIs on an allocation that serves as the pool for the rapid user accounts.
    • 566 rapid user accounts created since inception; 236 of these have been converted to full allocations.

  • Users have submitted over 7M jobs since Comet entered production. Job success rate is > 99.8%

  • Keeping the Slurm train running on time

    • Problem: High job submission rates of >50K jobs/day from gateways and HTC workloads like OSG pushed Slurm pre-run database accounting checks to time out, causing all job submissions to stop
    • Solution: Implement a memcached database to hold accounting information and alleviate Slurm database calls (an illustrative sketch of the pattern follows below)
    • Result: Comet can support >50K jobs/day

    Image: By Diego Delso, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=17764259
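    The production fix lives inside Slurm’s own accounting path; purely to illustrate the caching pattern, here is a minimal Python sketch of fronting an expensive accounting lookup with memcached via the pymemcache library. The lookup function, key format, and expiry are hypothetical.

    ```python
    import time
    from pymemcache.client.base import Client

    # Illustrative cache-aside pattern: serve accounting checks from
    # memcached and fall back to the (slow) database only on a miss.
    cache = Client(("127.0.0.1", 11211))

    def lookup_account_limits(account):
        # Hypothetical stand-in for the expensive accounting DB query.
        time.sleep(0.5)                    # simulate a slow DB round trip
        return f"limits-for-{account}"

    def get_account_limits(account):
        key = f"acct:{account}"
        cached = cache.get(key)
        if cached is not None:
            return cached.decode()         # fast path: no DB call
        value = lookup_account_limits(account)
        cache.set(key, value, expire=60)   # refresh at most once per minute
        return value

    print(get_account_limits("abc123"))    # slow first call, cached after
    ```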

  • Gateways on Comet serve roughly 100x as many users per SU as traditional allocations

    User type      Users served   SUs consumed   Users served per 1M SUs
    Gateway        15,276         30M            ~500
    Traditional    2,604          428M           ~6

    -> Gateways lower the barrier to making productive use of high performance computing. No allocation request needed for gateway users!
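    The headline ratio follows directly from the table; a quick check of the arithmetic:

    ```python
    # Users served per million SUs, from the table above.
    gateway_users, gateway_sus         = 15_276, 30e6
    traditional_users, traditional_sus = 2_604, 428e6

    g = gateway_users / (gateway_sus / 1e6)          # ~509 users per 1M SUs
    t = traditional_users / (traditional_sus / 1e6)  # ~6.1 users per 1M SUs
    print(f"gateway: {g:.0f}, traditional: {t:.1f} users per 1M SUs")
    print(f"ratio: {g / t:.0f}x")                    # ~84x, i.e. order 100x
    ```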

  • Expansion factors for small jobs are excellent, but things are getting busy

  • GPU nodes support researchers from over 70 institutions across many fields of science

    • 36 nodes with 144 K80 GPUs
    • 36 nodes with 144 P100 GPUs
    • Support for dedicated and shared-node jobs
    • SDSC is the largest source of GPUs in XSEDE

    P100 nodes use a PCIe switch for high GPU-GPU bandwidth and low latency
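    GPUs that sit behind a common PCIe switch, as on the P100 nodes, can usually access one another’s memory directly. A small probe of the peer-access matrix, sketched here with CuPy (an assumption; any CUDA binding would do), shows which device pairs support direct transfers on a given node:

    ```python
    import cupy as cp

    # Probe which GPU pairs on this node support direct peer-to-peer
    # access, e.g. the four P100s behind one Comet GPU node's PCIe switch.
    # Illustrative sketch; requires CuPy on a multi-GPU node.
    n = cp.cuda.runtime.getDeviceCount()
    for i in range(n):
        peers = [j for j in range(n)
                 if j != i and cp.cuda.runtime.deviceCanAccessPeer(i, j)]
        print(f"GPU {i} can access peers directly: {peers}")
    ```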

  • Science Gateways & Virtual Clusters

  • Science Gateways: Revolutionizing and Democratizing Science

    • Research is digital and increasingly complex
    • Gateways provide broad access to advanced resources and allow all to tackle today’s challenging science questions

    science gateway /sī′ əns gāt′ wā′/ n. 1. an online community space for science and engineering research and education. 2. a Web-based resource for accessing data, software, computing services, and equipment specific to the needs of a science or engineering discipline.

  • SDSC leading on several fronts to adapt to the increase in gateway users

    • Allocations
      • Allocations limited to decrease wait times
      • Comet to award gateways XRAC-recommended CPU
      • No further reductions in “reconciliation” process
    • Scheduling
      • More jobs per user for gateway allocations
    • Documentation
    • SDSC helping to drive gateway-friendly machines as an XSEDE resource type
    • Contributing to cross-XSEDE VM library
      • Led by Jetstream and Bridges teams
    • Science Gateway Community Institute
    • XDMoD per-user gateway queries
      • Talking with XDMoD team to move this up in priority for easier generation of gateway user counts

  • Gateways using Comet

    Gateway                                   Domain                                            CPU core-hours
    CIPRES; UC San Diego                      Systematic and population biology                 11,088,219
    Neuroscience Gateway; UC San Diego        Neuroscience                                      3,115,557
    SEAgrid; IU                               Chemistry, engineering                            2,516,016
    I-TASSER; U Michigan                      Biochemistry, molecular structure and function    905,929
    Ultrascan3; UT Hlth Sci Ctr Houston       Biophysics                                        89,921
    TAS; SUNY Buffalo                         XDMoD jobs                                        49,522
    waterhub; Purdue U                        Earth sciences                                    14,684
    SciGaP; Indiana U                         Gateway development                               1,503
    ChemCompute; Sonoma State U               Chemistry                                         1,216
    COSMIC2; UC San Diego                     Biochemistry, molecular structure and function    355
    dREG; Cornell U                           Veterinary medicine, gene expression              323
    UCI Social Science Gateway; UC Irvine     Anthropology                                      278
    vdjserver; UT Southwestern Med Ctr        Molecular Biosciences                             24

  • In two years, Comet has served nearly 18,000 unique users, easily surpassing the project goal of 10,000

    This was driven in large part by two gateways, I-TASSER and CIPRES, but we are seeing growth in other gateways as well.

    User type             Counts
    Traditional login     2,604
    CIPRES                6,310
    I-TASSER              8,015
    All other gateways    951
    Total unique users    17,880

  • In Q4 2016 gateway users are 77% of active XSEDE users

    This is largely due to the CIPRES and I-TASSER gateways, but others are gaining.

    [Figure: XSEDE user counts over time, all users vs. gateway users, with Comet’s entry into production marked.]

    I-TASSER, for example, is:

    • Consistently ranked as one of the best methods for automated protein structure prediction in the community-wide CASP (Critical Assessment of protein Structure Prediction) experiments
    • One of the most widely used systems in the field for online, full-length protein structure and function prediction

  • Science Gateways Community Institute: Designed to help the community build gateways more effectively

    • Diverse expertise on demand
    • Student opportunities & educator resources
    • Sharing experiences & knowledge as a community
    • Software & visibility for gateways
    • Longer-term, hands-on support

  • Comet’s Virtual Cluster feature provides near-bare-metal performance with full software customization to research teams; on-ramping via the Indiana University team

    Technology          Capability
    KVM                 Virtual machine creation
    SR-IOV              Near-native IB performance in virtual machines
    Rocks               Systems management
    ZFS + IMG-STORAGE   Virtual machine disk image management
    VLANs               Virtual cluster private management network isolation
    PKEYs               Virtual cluster InfiniBand network isolation
    Nucleus API         Virtual machine management and provisioning, status, and console access
    Cloudmesh client    Convenient access and automation support

    Recently, Singularity has become popular for its ability to on-ramp users from their campus/lab software environment to Comet.

  • Virtual Cluster Case Study: Open Science Grid (OSG)

    • Supporting small-scale multi-core (future: large memory & GPU), large-IO jobs; i.e., functions as a portal to Comet via the virtual cluster interface in order to provide capabilities that are otherwise rare on OSG.
    • One admin to manage a VC with many nodes, and to submit jobs on behalf of multiple allocations (LIGO – gravitational waves, CMS – dark matter, IceCube, etc.)
    • Feedback/justification from OSG:
      • The VC interface is a very powerful tool for experts.
      • It was very easy for OSG to use the VC interface.
      • A day or two and we were running close to the full diversity of science on the VC on-ramp environment.
      • Having access to a virtual cluster on Comet that can be customized opens the door for a whole new set of applications we were never able to support before.

    CMS Detector/LHC: http://cms.web.cern.ch/news/what-cms
    IceCube Neutrino Observatory: https://icecube.wisc.edu/science/highlights

  • Any XSEDE Allocated Project Can Run in a Comet OSG Virtual Cluster

  • Science Impact

  • Comet has supported over 1,700 allocations and led to nearly 700 publications

    380 institutions, with increasing diversity, e.g.:

    • Non-profit research institutes, hospitals, zoos, and museums
      • La Jolla Institute for Allergy and Immunology
      • American Museum of Natural History
      • City of Hope
      • San Diego Zoo Institute for Conservation Research
      • Beth Israel Deaconess Medical Center
      • Burnham Institute
    • FFRDCs, state and federal government
      • California Department of Water Resources
      • Federal Reserve Bank of New York
      • Fermi National Accelerator Laboratory
      • National Institute of Standards and Technology

  • OSG Virtual Cluster used to help confirm LIGO discovery

  • Phylogenetic tree inference using the CIPRES gateway

    The tree was generated with RAxML on 48 cores of Comet in a 3-day run via CIPRES. Phyla with red dots are based upon metagenomic analyses without isolated representatives.

    “You are here” in Opisthokonta, which includes animals & fungi.

  • IceCube Neutrino Observatory

    IceCube found the first evidence for astrophysical neutrinos in 2013 and is extending the search to lower-energy neutrinos. The main challenge is to keep the background low, and a precise simulation of signal and background is crucial to the analysis.

    GPUs are ideal for this workload: >100 times faster than CPUs.

    Simulation of direct photon propagation in the ice is required for properly including experimental effects.

    Comet’s GPU nodes are a valuable resource for IceCube, and integration with the experiment’s workload management system was very smooth thanks to previous OSG work on Comet.

  • Summary Metrics for the Long Tail

    • 18,000 unique users
    • 678 publications
    • 745 startups
    • 505 research allocations
    • 68 education allocations
    • 7,000,000 jobs
    • 380 institutions
    • Science Gateways
    • Virtual Clusters

  • Design and Resource Management Lessons from the Long Tail

    • Comet’s design has proven to be a versatile and highly usable resource for a large community of users. It is important to pay careful attention to the interconnect, accelerators, SSDs, scheduler, and software stack.
    • Allocations and operations policies, like capping allocations and supporting shared-node jobs, are important tools for the long tail and drive high utilization.
    • Science gateway allocations serve hundreds more users per core-hour than standard allocations.
    • Comet’s approach to high-performance virtualization has been effective, though on-ramping users can be time consuming. The emergence of Singularity and related technologies is also opening new possibilities for bridging new communities.
    • A tremendous amount of important science can be supported by a modest allocation.

  • Acknowledgments

    NSF ACI: 1341698; 1548562