Comet –Tales from the Long Tail –Two Years in and 10,000 ...Comet –Tales from the Long Tail...
Transcript of Comet –Tales from the Long Tail –Two Years in and 10,000 ...Comet –Tales from the Long Tail...
-
Comet – Tales from the Long Tail – Two Years in and 10,000 Users Later
Shawn Strandeand the Comet Team
Accelerated Data and Computing ’17Resource Management Session
July 17-18San Jose, CA
-
Presented on behalf of the project team at SDSC and Indiana University
Trevor Cooper HPCSystemsManager
Haisong Cai HPCStorage
Mike Dwyer Documentation
KarenFlammer EOT
GeoffreyFox(IU) Virtualization
JerryGreenberg UserServices
JimHayes HPCSystemsAdministration
TomHutton Networking
ChristopherIrving HPCSystemsAdministration
GregorvonLaszewski(IU)
Virtualization
AmitMajumdar ScientificApplications,ScienceGateways
DmitryMishin HPC SystemsProgrammer
SoniaNayak Financialreporting
MikeNorman Principal Investigator
PhilPapadopoulos Chief Architect,Co-PI
WaynePfeiffer Scientific Applications
SusanRathbun EOT
ScottSakai Security
Jeff Sale EOT
Bob Sinkovits Scientific Applications,Co-PI
ShawnStrande Projectmanager, Co-PI
MahidharTatineni UserServices
Fugang Wang(IU) Virtualization
NancyWilkins-Diehr Science Gateways,Co-PI
NicoleWolter UserServices &Accounting
Kenneth Yoshimoto HPCSystems Programmer
JanZverina Communications
Acknowledgements: NSF OCA: 1341698; 1548562
-
Design and Operations
-
Comet: System Characteristics • Totalpeakflops~2.1PF• Dellprimaryintegrator
• IntelHaswellprocessorsw/AVX2• MellanoxFDRInfiniBand
• 1,944standardcomputenodes(46,656cores)• DualCPUs,each12-core,2.5GHz• 128GBDDR42133MHzDRAM• 2*160GBGBSSDs(localdisk)
• 72 GPUnodes (288GPUs)• 36xwith2 NVIDIAK80cards,eachwithdualKepler3GPUs
• 36xwith4P100cards
• 4large-memorynodes• 1.5TBDDR41866MHzDRAM• FourHaswellprocessors/node• 64cores/node
• Hybridfat-treetopology• FDR(56Gbps)InfiniBand
• Rack-level(72nodes,1,728cores)fullbisectionbandwidth
• 4:1oversubscriptioncross-rack
• PerformanceStorage(Aeon)• 7.6PB,200GB/s;Lustre
• Scratch&PersistentStoragesegments
• DurableStorage(Aeon)• 6PB,100GB/s;Lustre
• Automaticbackupsofcriticaldata
• Homedirectorystorage• Gatewayhostingnodes• Virtualimagerepository• 100GbpsexternalconnectivitytoInternet2&ESNet
-
Average job size across XSEDE systems is below 2000 cores
-
One rack of Comet provides full bisection bandwidth up to 1,728 cores
1,728 cores
-
Trestles and Gordon – A legacy of highly productive HPC systems for the long tail
Trestles and Gordon were both designed to serve a broader community than traditional HPC systems and endure well beyond their NSF-funded life.
-
Comet’s operational polices and software are designed to support long tail users• Allocations
• IndividualPIslimitedto10MSU
• Gatewayscanrequestmorethan10MSUs
• Gatewaysexemptfrom”reconciliation”cuts
• Optimizedforthroughput• Joblimitsaresetatjobsof1,728coresorless(asinglerack)
• Supportforsharednodejobsisaboonforhighthroughputcomputingandutilization
• Comet“TrialAccounts”provide1000SUaccountswithinoneday
• Sciencegatewaysreachlargecommunities• There13gatewaysonComet,reachingthousandsofusersthrougheasytousewebportals
• VirtualClusters(VC)supportwell-formedcommunities• NearnativeIBperformance
• Project-controlledresourcesandsoftwareenvironments
• Requirestheallocationteampossesssystemsadministrationexpertise
-
Smaller job sizes, shared node jobs and gateways lead to very high utilization
-
User Experience
-
When asked “What do you like about Comet?”users commented on the quick turnaround time,
usability, and excellent user support
11
• “Short queue times and large processor/GPU-to-node ratio, in comparison to other HPC resources I have used.”
• “Many standard compute nodes with 12 cores per socket. Good scheduling policy resulting in lesser queue times, large memory, large flash memory enabling faster data read/write.”
• “Smaller jobs run quickly; the system is quite stable.”• “It is easily accessed, it has very thorough documentation from both
SDSC and XSEDE, and it has a diverse set of nodes, and it is a very large cluster.”
• “Low pend time. Low compute time. Relatively large number of cores per node works well for my code.
• “Excellent help desk. Worked with me personally, immediately to solve all my problems.”
-
Trial Accounts, an SDSC innovation to provide rapid access for evaluation of Comet• Lowersbarriertoentryfornewusers/communities.• IntegratedprocesswithXSEDE.• 1,000core-hoursawardedin1day.• UsersrequesttrialaccountbyclickingonlinkontheXSEDEportal
orSDSCCometpage.UsersneedaXSEDEportalaccount(whichtheycanselfcreate)
• DesignatedUserServicesstaffatSDSCserveasPIsonanallocationthatservesasthepoolfortherapiduseraccounts.
• 566 rapiduseraccountscreatedsinceinception236 ofthesehavebeenconvertedtofullallocations.
-
Users have submitted over 7M jobs since Comet entered production. Job success rate is > 99.8%
-
Keeping the Slurm train running on time• Problem:Highjobsubmissionratesof>50Kjobs/dayfromgatewaysandHTC
workloadslikeOSGpushedSlurmpre-rundatabaseaccountingcheckstotimeoutcausingalljobsubmissionstostop
• Solution:Implementmemcached databasetoholdaccountinginformationandalleviateSlurmdatabasecalls
• Result:Cometcansupport>50Kjobs/day
By Diego Delso, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=17764259
slurm
-
Gateways on Comet serve 100x the number of users per SU than traditional users
15
Usertype Users served SUsconsumed
Ratio ofSUsconsumedto#users
Gateway 15,276 30M 500Users/1MSU
Traditional 2,604 428M 6Projects/1M SU
->Gatewayslowerthebarriertomakingproductiveuseofhighperformancecomputing.Noallocationrequestneededforgatewayusers!
-
Expansion factors for small jobs are excellent, but things are getting busy
-
GPU nodes support researcher from over 70 institutions across many fields of science
• 36 nodes with144 K80• 36 nodes with144 P100• Support for dedicated and shared-
node jobs• SDSC is the largest source of GPUs
in XSEDE
P100 nodes use a PCIe switch for high GPU-GPU bandwidth, low latency
-
Science Gateways &Virtual Clusters
-
Science GatewaysRevolutionizing and Democratizing Science
• Research is digital and increasingly complex• Gateways provide broad access to advanced
resources and allow all to tackle today’s challenging science questions
science gateway /sī′ əns gāt′ wā′/ n. 1. an online community space for science and engineering research and education.2. a Web-based resource for accessing data, software, computing services, and
equipment specific to the needs of a science or engineering discipline.
-
SDSC leading on several fronts to adapt to increase in gateway users
• Allocations• Allocationslimitedtodecreasewaittimes• ComettoawardgatewaysXRAC-recommendedCPU
• Nofurtherreductionsin“reconciliation”process• Scheduling
• Morejobsperuserforgatewayallocations• Documentation
• SDSChelpingtodrivegateway-friendlymachinesasanXSEDEresourcetype
• Contributingtocross-XSEDEVMlibrary• LedbyJetstreamandBridgesteams
• ScienceGatewayCommunityInstitute• XDMoD perusergatewayqueries
• TalkingwithXDMoD teamtomovethisupinpriorityforeasiergenerationofgatewayusercounts
20
-
21
Gateway Domain CPU
CIPRES;UCSanDiego Systematicand populationbiology 11,088,219
NeuroscienceGateway;UCSanDiego Neuroscience 3,115,557
SEAgrid;IU Chemistry,engineering 2516016
I-TASSER;UMichigan Biochemistry,molecularstructureandfunction 905929
Ultrascan3;UTHlth Sci Ctr Houston Biophysics 89921
TAS;SUNYBuffalo XDMoD jobs 49522
waterhub;PurdueU Earthsciences 14684
SciGaP;IndianaU Gateway development 1503
ChemCompute;SonomaStateU Chemistry 1216
COSMIC2;UCSanDiego Biochemistry,molecularstructureandfunction 355
dREG;CornellU Veterinarymedicine,geneexpressions 323
UCISocialScienceGateway;UCIrvine Anthropology 278
vdjserver;UTSouthwesternMedCtr MolecularBiosciences 24
Gateways using Comet
-
In two years, Comet has served nearly 18,000 unique users, easily surpassing the project goal of 10,000
Thiswasdriveninlargepartbytwogateways,I-TASSERandCIPRES,butweareseeinggrowthinothergatewaysaswell.
22
Usertype Counts
Traditionallogin 2,604
CIPRES 6,310
I-TASSER 8,015
Allothergateways 951
Totalunique usercounts 17,880
-
In Q4 2016 gateway users are 77% of active XSEDE users
23
This is largely due to the CIPRES and I-TASSER gateways, but others are gaining
XSEDE users
Comet goes productionAllusers
Gateways
• Consistentlyrankedasoneofthebestmethodsforautomatedproteinstructurepredictioninthecommunity-wideCASP(CriticalAssessmentofproteinStructurePrediction)experiments
• Oneofthemostwidelyusedsystemsinthefieldforonline,full-lengthproteinstructureandfunctionprediction
-
Science Gateways Community InstituteDesigned to help the community build gateways more effectively
24
Diverse expertise on demand
Student opportunities & educator resources
Sharing experiences & knowledge as a community
Software & visibility for gatewaysLonger-term, hands-
on support
-
Technology Capability
KVM VirtualMachine Creation
SR-IOV NearnativeIBperformance inVirtualMachines
Rocks SystemsManagment
ZFS+IMG-STORAGE
VirtualMachine DiskImageManagement
VLANs VirtualClusterPrivate ManagementNetworkIsolation
PKEYs VirtualClusterInfiniband NetworkIsolation
NucleusAPI Virtual MachineManagementandProvisioning,StatusandConsoleAccess
CloudmeshClient
Convenient AccessandAutomationSupport
Comet’s Virtual Cluster Feature Provides Near-bare metal performance will full software customization to research teamsOn-ramping via Indiana University team
Recently, Singularity has become popular for its ability to on-ramp users from their campus/lab software environment to Comet
-
Virtual Cluster Case Study: Open Science Grid (OSG)• Supportingsmallscalemulti-core,(future:
largememory&GPU),largeIOjobs,i.e.,functionsasaportal toCometviathevirtualclusterinterfaceinordertoprovidecapabilitiesthatareotherwiserareonOSG.• OneadmintomanageaVCwithmany
nodes,andtosubmitjobsonbehalfofmultipleallocations(LIGO-gravitationalwave,CMS-darkmatter,IceCube,etc.)
• Feedback/JustificationfromOSG• TheVCinterfaceisaverypowerfultool
forexperts.• WasveryeasyforOSGtousetheVC
interface.• adayortwoandwewererunningclosetothe
fulldiversityofscienceontheVConrampenvironment.
• HavingaccesstoavirtualclusteronCometthatcanbecustomizedopensthedoorforawholenewsetofapplicationswewereneverabletosupportbefore.
CMS Detector/LHChttp://cms.web.cern.ch/news/what-cms
IceCube Neutrino Observatoryhttps://icecube.wisc.edu/science/highlights
-
Any XSEDE Allocated Project Can Run in a Comet OSG Virtual Cluster
-
Science Impact
-
Comet has supported over 1,700 allocations and led to 700 publications
380institutions,withincreasingdiversity.E.g.,• Non-profitresearchinstitutes,hospitals,
zoosandmuseums• LaJollaInstituteforAllergyand
Immunology• AmericanMuseumofNaturalHistory• CityofHope• SanDiegoZooInstitutefor
ConservationResearch• BethIsraelDeaconessMedicalCenter• BurnhamInstitute
• FFRDCs,stateandfederalgovernment• CaliforniaDepartmentofWater
Resources• FederalReserveBankofNewYork• FermiNationalAccelerator
Laboratory• NationalInstituteofStandardsand
Technology
-
OSG Virtual Cluster used to help confirm LIGO discovery
-
Phylogenetic tree inference using CIPRES gateway
TreewasgeneratedwithRAxMLon48coresofCometina3-dayrunviaCIPRES.Phylawithreddotsarebaseduponmetagenomicanalyseswithoutisolatedrepresentatives
YouarehereinOpisthokonta,whichincludesanimals&fungi
Science Gateway
-
IceCube Neutrino ObservatoryIceCubefoundthefirstevidenceforastrophysicalneutrinosin2013andisextendingthesearchtolowerenergyneutrinos.Themainchallengeistokeepthebackgroundlowandaprecisesimulationofsignalandbackgroundiscrucialtotheanalysis.
GPUsareidealforthisworkload>100timesfasterthanCPUs.
Simulationofdirectphotonpropagationintheiceisrequiredforproperlyincludingexperimentaleffects
Comet’sGPUnodesareavaluableresourceforIceCubeandintegrationwiththeexperimentworkloadmanagementsystemwasverysmooththankstopreviousOSGworkonComet
GPU
-
Summary Metrics for the Long Tail
Rue Comet
18,000 unique users
Comet QuarterRue Comet
Rue Comet
678 publicationsRue Comet
745 startupsRue Comet
505 research allocationsRue Comet
68 education allocations
Rue Comet
7,000,000 jobsRue Comet
Science GatewaysRue Comet
Virtual Clusters
Rue Comet
380 institutions
-
Design and Resource Management Lessons from the Long Tail
• Comet’sdesignhasproventobeaversatileandhighlyusableresourceforalargecommunityofusers.Importanttopaycarefulattentiontointerconnect,accelerators,SSDs,scheduler,softwarestack.
• Allocationsandoperationspolicies,likecappingallocations,andsupportingsharednodejobs,areimportanttoolsforthelongtailanddrivehighutilization.
• Sciencegatewayallocationshundredsmoreuserspercore-hourthanstandardallocations.
• Comet’sapproachtohighperformancevirtualizationhasbeeneffective,throughon-rampinguserscanbetimeconsuming.TheemergenceofSingularityandrelatedtechnologiesisalsoopeningnewpossibilitiesforbridgingnewcommunities.
• Atremendousaboutofimportantsciencecanbesupportedbyamodestallocation.
-
Acknowledgments
NSF ACI: 1341698; 1548562