Post on 16-Mar-2018
ClusterVision Engineer Innovate Integrate
100% Focus on HPC Cluster Solutions
Alex Ninaber
Technical Manager, ClusterVision
Overview
About ClusterVision
ClusterVision Technologies:
• Remote Administration
• Power Saving HPC Chassis
• Application Resource Balancing
Other Technologies:
• Bright Cluster Manager
ClusterVision Machine Evaluation Workshop 2012
About ClusterVision
• Specialists in Compute Clusters
• End-to-End solutions, hardware & software: advice, design, onsite implementation, support
• ClusterVision: 38 employees
• Bread & Butter: European tenders
• 4 focus regions: United Kingdom, France, Germany, Benelux
• Active in whole of EMEA & Asia: customers in Saudi Arabia, India, Qatar, Norway, Sweden, Finland, Spain, Switzerland, Ireland, Italy, Austria, etc.
• ISO 9001:2008 & ISO 14001 certified
• Financially strong & growing
• More than 400 projects and 250 customers in 10 years
• Direct flights between Amsterdam and United Kingdom: Leeds, London, Bristol, Durham, Newcastle, Glasgow, Manchester, Liverpool, Edinburgh, Southampton, Birmingham, Belfast, Aberdeen, Cardiff, Exeter, Humberside, Dundee, Isle of Man, Guernsey, Norwich, Kent
• UK: Growth of development, project and (pre)-sales team
ClusterVision: End-to-End HPC Cluster Solutions
• Capability Assessment
• Benchmarking
• New System Design
• Replacement & Upgrades
• Assembly
• Configuration
• POC
• Racking
• Provisioning
• Compatibility
• Certification
• Project Management
• Application Analytics
• Support & Maintenance
• Remote Administration
• Education
• Maintenance
• GPU/Accelerators
• Infiniband
• Parallel Filesystems
• Oil, Direct, Backdoor Cooling
• Application Tuning
• Hardware System Design
• HPC, BigData & Cloud
• Software Development
• HPC software
Integrating Leading Brand Manufacturers
Benefit
• Trusted brands
• High-quality
• Proven capability
• Compatible components
• Reliable & Robust
• Active Warranty ..
ClusterVision Benchmark Applications include ..
Manufacturing/CAE
• Abaqus/Simulia (Explicit FEA)
• LS-DYNA (LSTC) (Crash/Impact)
• MAGMASOFT (Casting)
• Nastran/Dytran/Marc (Explicit FEA)
• PAM-CRASH (Crash/Impact)
Fluid Dynamics (CFD)
• Fluent/CFX (ANSYS)
• Star-CCM+ (CD-adapco)
• OpenFOAM/FOAMPro (SGI/Icon)
• NUMECA
• SWAN (Wave Modelling)
• ROMS (Ocean Modelling)
Chemistry
• CASTEP (Atomic Modelling)
• CHARMM (Molecular Mechanics)
• CPMD (Molecular Dynamics)
• GAUSSIAN (Molecular Electronics)
• GROMACS (Molecular Dynamics)
• MOLPRO (Quantum Chemistry)
• NAMD (Biomolecular Modelling)
• NWChem (Molecular Mechanics)
• Quantum ESPRESSO
• SCM ADF (Electronic Structure)
• TURBOMOLE (Quantum Chemistry)
• VASP (Molecular Dynamics)
Physics & Astronomy
• FLASH (Astrophysics)
• HEP-SPEC (CERN Benchmarking)
• Monte-Carlo Particle (Quantum Physics)
Mathematics
• MATLAB (Mathworks)
• MAPLE/MAPLESim (MapleSoft)
• Mathematica (Symbolic Calculation)
• Monte-Carlo Particle (Quantum Physics)
BioInformatics
• Amber 11 (Biochemistry)
• BLAST (DNA Sequencing)
• ClustalW (DNA Sequencing)
• COPIA (Pattern Detection/DNA)
• EMBOSS (Molecular Biology)
• FASTA (Protein Analysis)
• HMMER 3 (HMM Protein Sequencing)
• MrBayes (Phylogeny/Evolution)
• PatternHunter (Genome Analysis)
• PROSPECT (Protein Evaluation)
• RasMol (DNA Structure)
Climatology/Exploration
• ALADIN (European Weather Prediction)
• Hadley Centre Model (MetOffice)
• HIRLAM (European Weather Prediction)
• WRF (Weather Prediction)
• ECLIPSE (Reservoir Simulation)
• ROXAR (Oil & Gas Recovery)
Remote System Administration
Features
Professional service packages
• User Accounts & Support
• Environment Monitoring & Reporting
• Systems & Sub-systems
• Tools & Applications
• Audit & Security
Default: entry level base package
• In combination with Bright Health-Checking
• Offered to all large clusters by default
• Full end-to-end monitoring: cooling, PDU, UPS, servers, switches, Infiniband, MCEs, core switches, cables, services: Q-system, Bright, NFS, LMGRD etc.
• Know instantly when something is wrong
• Software updates
• Remote and onsite repair
• Notification and explanation for actions
• Management reporting
RSA Service Packages
ClusterVision Chassis design for HPC
“Build one of the most efficient computing infrastructures at the lowest possible cost.”
• Project for German market / specific for Max Planck
• Open Compute Facebook:
• Optimised for HPC: use repetition to cut cost where possible
• Easy to maintain, possible customer repair
• Reusable, 5-8 year life-span, stay COTS where possible
• Half-width ATX-standard boards: Intel, ASUS, SMC, Tyan, Flextronics, MSI etc.
• COTS components (fans, power supplies, etc.)
• Low power, big case fans, keeping air pressure on node failure
• Independent fan, power and temperature monitoring, Pi?
• Front cables = high room temperature
• Water cooled rack door, direct CPU/GPU cooling, oil cooling?
• When is the saving worth it? >10% purchasing, >10% power, >15% next upgrade?
Cooling options for Chassis
Water Cooled Doors
• UK manufactured by Usystems
• 42U/48U, up to 45 kW per rack
• 14 °C, 21 °C and new 24 °C input temperatures
• Almost free-air cooling
• Remote monitoring, twice a year onsite checks
ASETEK: Direct CPU/GPU Water cooling
• Cooling direct to the CPU/GPU and heat-exchange in chassis
• Can be fitted to existing racks
• Funded & produced in Denmark
• Chassis fans reduced to minimum, 5 W CPU/GPU pumps
• Fully closed enclosure, fully self-cooling. No CRAC units required
• Water input ~40 °C, full free-air cooling 24/7, 365 days
• COTS: 40,000 units per month produced, originally a desktop product
Green Revolution Cooling: Oil Cooling
• No chassis, just blades
• No fans
• Just oil pumps
• Water input ~40 °C, full free-air cooling 24/7, 365 days
ClusterVision Application Analytics
AIM: increase overall true cluster utilisation by 20%
Developing a 'Queuing User Analytics' system based on Slurm.
• Estimate: 30%-50% of cluster resources are under-utilised because of bad codes
• Define BAD: wrong compiler/MPI/math library usage, faulty multi-core/node scaling, bad programming, bad cluster configuration
• Goal: automatically and non-intrusively profile all applications
• Using information gathered during the application run: CPU performance counters, memory usage, Infiniband, GPU efficiency, I/O usage, power consumption
• Generic applications & applications with known profile
o Existing: Linpack on Intel SandyBridge > 90% efficient
o Generic: monitor balances between Infiniband, FPU usage, memory etc.
• Inform the user at the end of the run, inform the administrator of badly running codes
• Research engineering project! A lot to learn ….
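One way to picture the "inform the administrator of badly running codes" step is a simple CPU-efficiency check per job. The sketch below is only an illustration under stated assumptions, not ClusterVision's analytics system: the `JobRecord` fields, the function names and the 50% threshold are all hypothetical, and the input numbers are assumed to come from scheduler accounting data (for example Slurm accounting fields such as elapsed time, total CPU time and allocated cores).

```python
from dataclasses import dataclass


@dataclass
class JobRecord:
    """Hypothetical per-job accounting record (field names are illustrative)."""
    job_id: str
    user: str
    cores: int            # cores allocated to the job
    wall_seconds: float   # elapsed wall-clock time
    cpu_seconds: float    # CPU time consumed, summed over all cores


def cpu_efficiency(job: JobRecord) -> float:
    """Fraction of the allocated core-seconds that the job actually used."""
    allocated = job.cores * job.wall_seconds
    if allocated <= 0:
        return 0.0
    return min(job.cpu_seconds / allocated, 1.0)


def flag_bad_jobs(jobs, threshold=0.5):
    """Return (job, efficiency) pairs below the threshold, worst first.

    The 0.5 threshold is an assumption chosen for illustration only.
    """
    scored = [(job, cpu_efficiency(job)) for job in jobs]
    flagged = [(job, eff) for job, eff in scored if eff < threshold]
    return sorted(flagged, key=lambda pair: pair[1])


if __name__ == "__main__":
    jobs = [
        # Well-scaling job: uses about 95% of its 16 allocated cores.
        JobRecord("1001", "alice", 16, 3600.0, 16 * 3600.0 * 0.95),
        # Badly scaling job: 16 cores allocated, only one kept busy.
        JobRecord("1002", "bob", 16, 3600.0, 3600.0),
    ]
    for job, eff in flag_bad_jobs(jobs):
        print(f"job {job.job_id} ({job.user}): {eff:.0%} CPU efficiency")
```

CPU efficiency is only one of the signals listed above; a fuller profile would also weigh memory, Infiniband, GPU, I/O and power data.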
Bright Cluster Manager
Advanced HPC cluster management made easy
Bright Cluster Manager
Feature comparison: Bright, Rocks(+), PCM, xCAT
• Integrated Cloud Bursting
• Cluster health checking ? ?
• Automatic OS Failover
• Monitoring & Actions
• Tight Node-Switch integration
• vSMP auto-configuration
• CLI to configure all & Command line ?
• Yum updates: Compilers, MPI, Bright etc.
Come to our Stand for Video Demo!
ClusterVision Service Portal
On-line Community Resource Library
• Personal User Login
• Whitepapers Library
• Troubleshooting help
• Digitalized site survey
• Open support calls
• Code examples
• Application setup
• Common Procedures
www.clustervision.com
Remote System Administration
Benefits
• Efficient use of in-house resources
• Cost-effective service solution
• Remote access to expertise
• Rapid response service
• Professional quality
• Accountable
• Secure & Confidential
• Comprehensive coverage
• Scalable: Service Packages/Credits
• Enhances existing services
• Single point of contact …
Delivering Skills & Resources for Maximised ROI
• Capability Assessment
• New & Upscale Design
• Detailed Specification
• On-site Assembly
• Configuration
• Application Tuning
• QA & Burn-in Testing
• HPL Benchmarking
• Certification
• Remote Administration
• User Support
• SA & User Training
• Maintenance & Repair ..
Pre-Delivery Professional Services
• Capability Assessment
• New & Upscale Design
• Detailed Specification
• Proof of Concept
Benefit
• Impartial, informed review of customer history, requirements & constraints
• Benefit from ClusterVision’s knowledge & connections to maximise ROI
• Independent selection from best-in-class technology, compatibility assured
• Open access to existing reference installations
• Advise/ensure optimised performance of systems and user applications ..
Point-of-Delivery Professional Services
• Professional ISO Quality Assembly
• Cable Management
• Provisioning & Configuration
• Burn-in Testing
• Industry Standard Certification
• HPL Benchmarking
Benefit
• Reduced time/effort for SAs, specialist skills/resources required
• Consistent, quality installation, collaborative customisation to site needs
• Quality assurance, diagnostics & consistency in acceptance process
• Identifies non-compliance, ensures compatibility, future-proofing
• Establishes/documents performance, allows diagnostics & tuning ..
Energy Efficient HPC Cluster at the Technical University Ilmenau, Germany
TU Ilmenau, Germany
• 49 Dell PowerEdge R815 servers
• AMD Opteron™ 6134 Processors
• 192 terabytes storage capacity
• Reduced Consumption by 10-15%
“Together, Dell and ClusterVision offered the specialised expertise we needed, and were able to provide a customised, detailed proposal.” Hennig Schwanbeck, IT Manager of Datacentre Administration
38.8 Tflops HPC Cluster at the University of Bordeaux, France
University of Bordeaux
• 528 Intel Xeon X5675 Processors
• 3168 cores
• Dell PowerEdge C6100 servers
• QLogic QDR Infiniband
“This new cluster is a huge step in the long story of supercomputers in Bordeaux. We have made a powerful system for the whole scientific community and small and medium enterprises of Aquitaine.” Jean-Christophe Soetens, Scientific Management of the MCIA.
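The 38.8 Tflops headline figure can be cross-checked against the processor count on this slide. The sketch below is a back-of-the-envelope calculation, not a measured result. It assumes values that are not stated on the slide: a 3.06 GHz base clock for the Xeon X5675 and 4 double-precision floating-point operations per core per cycle. The 6 cores per processor follow from 3168 cores / 528 processors. The function name is hypothetical.

```python
def theoretical_peak_tflops(cores: int, clock_ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak = cores x clock (GHz) x FLOPs per cycle, converted to TFLOPS."""
    return cores * clock_ghz * flops_per_cycle / 1000.0


# From the slide: 528 processors and 3168 cores, i.e. 6 cores per processor.
PROCESSORS = 528
CORES_PER_PROCESSOR = 6
# Assumptions (not on the slide): X5675 base clock and double-precision width.
CLOCK_GHZ = 3.06
FLOPS_PER_CYCLE = 4

cores = PROCESSORS * CORES_PER_PROCESSOR
peak = theoretical_peak_tflops(cores, CLOCK_GHZ, FLOPS_PER_CYCLE)
print(f"{cores} cores, theoretical peak {peak:.1f} TFLOPS")
# prints "3168 cores, theoretical peak 38.8 TFLOPS"
```

Under these assumptions the quoted figure is the theoretical peak; a measured HPL result would be lower.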
Dortmund LIDO Cluster, Driving Research at the Virtual Numerics Laboratory
University of Dortmund
• 420 AMD Opteron Processors
• 1.3 TBytes memory
• 26 TBytes storage
• 1 TFLOPS peak performance
“We decided on ClusterVision because of their excellent reputation, and the outstanding price/performance ratio. ClusterVision also best understood and fulfilled our requirements”, Jorg Gehrke, Division Leader Server & HPC
Top500 Cluster at Ghent University is the Fastest Academic Facility in Belgium
University of Ghent, Belgium
• 196 IBM Blade Servers
• 60 TBytes Disk
• 20 GB/s Infiniband Network
• 15.7 TFLOPS peak performance
“ClusterVision have successfully delivered the fastest academic Supercomputer in Belgium” Danny Schellemans, Director of ICT Ghent University
Germany's Fastest & Europe’s Most Efficient Commodity Supercomputer
Goethe University Frankfurt
• 530 Dual core AMD Opteron
• Myrinet/Ethernet
• 4.2 TFLOPS performance
“The new dual-core installation from ClusterVision is an excellent match to our requirements” Prof. Stefan Schramm, Director of Centre for Scientific Computing
Driving Research in Manufacturing Technology at the University of Cambridge
University of Cambridge, UK
• 1152 Intel Xeon Processors
• 576 Dell PowerEdge Servers
• 60 TByte Storage
• QLogic Infiniband Network
• 27 TFLOP performance
“ClusterVision’s role has been key in rapidly turning the Dell-supplied hardware into a usable and manageable cluster, ready for Top500 benchmarks” Dr. Paul Calleja, Director of High Performance Computing
“Blue Crystal” Predicting Global Climate Change at the University of Bristol
University of Bristol, UK
• 3360 Intel Xeon cores
• 420 IBM x3450 servers
• QLogic Infiniband network
• 200 TBytes storage
• 40 TFLOPS peak performance
“ClusterVision’s flexibility and efficient deployment allowed us to have a far more capable service within our budget and schedule” Dr. Ian Stewart, Director Advanced Computing Research Centre
Human Genome Discovery with LEGION at UCL, London’s Global University
University College London, UK
• 2500 Processor Cores
• Infiniband Network
• 24 TFLOP performance
“ClusterVision’s professional on-site engineering team completed the installation of a highly distributed HPC system in a challenging environment” Clare Gryce, Research Computing Manager UCL