NVMe-based BeeGFS as a next-generation scratch filesystem for High Performance Computing and Artificial Intelligence/Machine Learning workloads
Greg Lehman, Igor Zupanovic, Jacob Anders, Rene Tyhouse, Garry Swan, Joseph Antony
INFORMATION MANAGEMENT & TECHNOLOGY (IMT)
13 March 2019
Who are we?
At CSIRO we do the extraordinary every day. As Australia's national science agency, we innovate for tomorrow while delivering impact today: for our customers, all Australians and the world.
CSIRO innovations
• Extended-wear contacts
• Polymer banknotes
• Relenza flu treatment
• WiFi WLAN
• Aerogard
• Total Wellbeing Diet
• RAFT polymerisation
• BARLEYmax™
• Self-twisting yarn
• Softly washing liquid
• Hendra vaccine
• Novacq™ prawn feed
CSIRO IM&T Scientific Computing
• ~100 talented staff
• 80+ collaborative eResearch projects every 6 months
• Working with over 2,600 customers
• 1,500 m² data centre floor space across Australia
• ~2 Petaflops aggregate performance
• ~40 PB primary data holdings
• 2,000 published collections in data.csiro.au
• ~5 million CPU hours per month
Enabling Science Impact: CSIRO and Scientific Computing
• Information Services
• Workflow Services
• Outreach & eResearch Planning
• Research Data Services
• Scientific Computing & Visualisation
• Advanced Collaboration
IMT eResearch Supports End-to-End Science
• Research Question
• Research Design
• Data Collection
• Processing & Analysis
• Data & Workflow Archiving
• Disseminate & Publish
• Data Re-Use
• Measure Impact
IMT eEnablement Services: High Speed Networks, Application Development, Tele-presence, Office Productivity, Collaboration Tools
eResearch services supporting the science workflow (diagram based on Bath University's 'Research360 Institutional Research Lifecycle')
Scientific Computing
[Organisation chart: Scientific Computing comprises a Platforms Group and a Services Group, with teams including Science Applications, Visualisation, User Services, Systems, Data, Facilities, Data Processing and National Collaborations]
The Systems team manages:
• Pearcey: general-purpose cluster, upgraded to 230 Haswell nodes, 4,480 cores, FDR InfiniBand
• Ruby: SGI UV3000 NUMA system hosting 8 TB and 640 cores under a single operating system
• Bragg: 384 Nvidia Kepler GPUs plus a Xeon Phi-enabled system; 128 nodes; a Top500 system with ~1M CUDA cores
• HTCondor: cycle-harvesting service across ~4,400 desktops (360 CPU-years of compute in the last year)

Systems services are:
• Used by >2,600 CSIRO scientists and affiliates
  • ~4 million CPU hours of HPC jobs per month
  • ~1 million CPU hours of HTCondor jobs per month
• An essential contribution to CSIRO's science and research portfolio
The CSIRO ‘Bragg’ and ‘Pearcey’ supercomputers
Scientific Use-Cases Driving Storage
GPU-based Tomographic Reconstruction
3D CT reconstruction of an excised human breast containing a tumour (in red). Imaged at the Imaging and Medical Beamline (IMBL) at the Australian Synchrotron.
Simulations of 5G Wireless and Beyond
Evaluation of large-scale network end-points from 4G, 5G wireless networks and beyond
3D Vegetation Mapping and Analysis
Generating vegetation cover maps in 3D from data acquired via a Zebedee handheld laser scanner
Maia X-Ray Imaging
• Synchrotron X-ray fluorescence (SXRF) imaging is a powerful technique used in the biological, geological, materials and environmental sciences, medicine and cultural heritage
• Digital images of microscopic or nanoscopic detail are built, pixel by pixel, by scanning the sample through the beam
• The resulting X-ray fluorescence radiation is characteristic of the chemical elements in that pixel. This is used to quantify the chemical composition of the sample, including important trace elements, and to build up element images of the sample
• CSIRO worked with the Brookhaven National Laboratory (BNL) to develop the Maia X-ray microprobe detector system
• The system combines BNL's custom detector arrays and application-specific integrated circuits with our high-speed data capture hardware and real-time spectral analysis algorithms
• Reconstruction algorithms run on HPC resources and need fast storage
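The pixel-by-pixel build-up of element images described above can be sketched as follows. This is an illustrative toy, not the actual Maia pipeline: the event-tuple format and element names are assumptions for the example.

```python
# Illustrative sketch (not the actual Maia pipeline): build per-element
# images pixel by pixel from a stream of fluorescence events.
from collections import defaultdict

def build_element_maps(events, width, height):
    """Return one 2D intensity map per element.

    `events` is an iterable of (x, y, element, counts) tuples -- a
    simplified stand-in for the real-time spectral analysis output.
    """
    maps = defaultdict(lambda: [[0] * width for _ in range(height)])
    for x, y, element, counts in events:
        maps[element][y][x] += counts
    return dict(maps)

# Toy 2x1 scan: iron signal in both pixels, arsenic in one.
events = [(0, 0, "Fe", 5), (1, 0, "Fe", 3), (1, 0, "As", 2)]
maps = build_element_maps(events, width=2, height=1)
```

In the real system this accumulation happens in real time on the detector's data-capture hardware; the HPC reconstruction step operates on the resulting per-element images.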
Maia RGB image collected at the Australian Synchrotron of a clay sample from the Mt Gibson gold deposit in Western Australia (green = iron, blue = bromine, red = arsenic).
Capable Storage Underpins Next-Generation Applied Industrial Science Applications
Storage Drivers
• The challenge faced by the IM&T Scientific Computing Team was to deliver a solution that would:
  • simultaneously optimise for high-IOPS and high-bandwidth workloads
  • be extremely power- and rack-efficient
  • be a parallel, POSIX-compliant filesystem
  • support both HPC and AI/ML workloads
• We ended up choosing an NVMe-based system driven by the BeeGFS filesystem
Hardware Building Blocks
• Current Networking Topology
• Metadata Service Building Blocks
• Storage Service Building Blocks
Switch-Centric View of Compute and Storage Clusters at the CDC Facility
• Mellanox CS7520: 216-port EDR switch
• Pearcey CPU cluster: 430 nodes; 150 TFlops
• Bracewell GPU cluster: 113 nodes; Nvidia P100s; 1.5 PFlops
• Bowen storage: 40 PB
• BeeGFS: 2 PB NVMe
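For scale, a rough back-of-the-envelope: EDR InfiniBand runs at 100 Gb/s per port (a standard figure, not stated on the slides), so a fully populated 216-port CS7520 offers on the order of 21.6 Tb/s of aggregate port bandwidth.

```python
# Back-of-the-envelope aggregate port bandwidth of the 216-port EDR switch.
EDR_GBPS = 100   # EDR InfiniBand line rate per port, in Gb/s
ports = 216

aggregate_tbps = ports * EDR_GBPS / 1000   # total port bandwidth in Tb/s
print(aggregate_tbps)                      # 21.6
```

This is raw port bandwidth; usable throughput between compute and storage also depends on cabling, routing and host capabilities.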
Metadata Service Building Blocks
• 4 metadata servers:
  • Dell EMC R440
  • Dual Intel 6154 (3.0 GHz, 12 cores), 384 GB RAM
  • Dual ConnectX-5 EDR
• Intel P4600 NVMe: 24 × 1.6 TB
  • 3D NAND TLC
  • Random reads: ~5.6 million IOPS
  • Random writes: ~1.8 million IOPS
  • Active power: 14.2 W (write); 9 W (read)
  • Idle power: <5 W
Storage Service Building Blocks
• 32 storage servers:
  • Dell EMC R740xd
  • Dual Intel 6148 (2.4 GHz, 20 cores), 192 GB RAM
  • Dual ConnectX-5 EDR
• Intel P4600 NVMe: 24 × 3.2 TB
  • 3D NAND TLC
  • Random reads: ~6.4 million IOPS
  • Random writes: ~2.3 million IOPS
  • Active power: 21 W (write); 10 W (read)
  • Idle power: <5 W
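As a sanity check, the raw capacity implied by the storage building blocks lines up with the "2 PB NVMe" figure quoted elsewhere in the deck, assuming 24 drives in each of the 32 servers and decimal units:

```python
# Raw NVMe capacity implied by the storage building blocks
# (assumes 24 x 3.2 TB drives per server; decimal units, 1 PB = 1000 TB).
servers = 32
drives_per_server = 24
drive_tb = 3.2

raw_tb = servers * drives_per_server * drive_tb
print(round(raw_tb / 1000, 2))   # ~2.46 PB raw, before filesystem overhead
```

The gap between ~2.46 PB raw and 2 PB usable is consistent with filesystem and redundancy overheads.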
IO500 Benchmark
https://www.vi4io.org/_media/17-benchmarking-ws-io500.pdf
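The IO500 benchmark reduces a system's results to a single score: the geometric mean of a bandwidth score (GiB/s, over the IOR phases) and a metadata score (kIOP/s, over the mdtest phases). A minimal sketch of that scoring scheme, using made-up phase numbers rather than results from this system:

```python
# Sketch of IO500-style scoring: the final score is the geometric mean of a
# bandwidth score and a metadata score, each itself a geometric mean over
# its benchmark phases.
from math import prod

def geomean(values):
    return prod(values) ** (1.0 / len(values))

# Hypothetical phase results -- NOT measurements from the CSIRO system.
bw_phases_gibps = [20.0, 5.0, 18.0, 6.0]      # e.g. IOR easy/hard write/read
md_phases_kiops = [400.0, 50.0, 300.0, 90.0]  # e.g. mdtest phases

bw_score = geomean(bw_phases_gibps)
md_score = geomean(md_phases_kiops)
io500_score = geomean([bw_score, md_score])
```

The geometric mean rewards balanced systems: a filesystem that is fast for streaming bandwidth but slow for metadata (or vice versa) scores poorly, which is exactly why the team optimised for both IOPS and bandwidth.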
IO500 10-Node Challenge (ZFS backend)
10 clients; 16 threads. SC'18 results.
Summary
• Capable storage building blocks are needed to drive next-generation applied industrial scientific applications
• CSIRO has invested in a 2 PB NVMe solution which met its performance and power criteria
• The POSIX-compliant BeeGFS parallel filesystem will be rolled out to users in Q1 2019
CSIRO. We imagine. We collaborate. We innovate.
INFORMATION MANAGEMENT & TECHNOLOGY (IMT)
Thank you