HP Enterprise Business Template Angle Light 4:3 Purple Short · DL2000G6 (2x DL170hG6) 2x192GB 2x...

©2010 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice 1 Patrick Van Reeth Lugano March 22 BUILD your competitive advantage with HP Code Migration to CPU-GPU Hybrid : An Economic Approach

Transcript of HP Enterprise Business Template Angle Light 4:3 Purple Short · DL2000G6 (2x DL170hG6) 2x192GB 2x...

Page 1:

Patrick Van Reeth

Lugano March 22

BUILD your competitive advantage with HP

Code Migration to CPU-GPU Hybrid: An Economic Approach

Page 2:

Current Computing Challenges

• Power
• Cooling
• Density
• Price/perf

Insatiable demand for more Flops and bigger data sets

GPUs: a supplement to multi-core, with more silicon devoted to computation

Page 3:

Migrate a CPU application to a CPU + GPU Hybrid Platform

Page 4:

Application Speed-up key factors

The tool box:
• Multi-core CPUs
• Memory bandwidth/latency
• Interconnect
• GPUs
• SW environments: C & Fortran compilers, CUDA, HMPP, OpenCL, Allinea, TotalView…

Understand the existing code:
• Track the most computationally expensive areas (inner loops…)
• Probe the load-balancing on servers, CPUs & cores
• Examine the data set splits
• Probe the communications

Be agnostic first!
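Tracking the most expensive areas matters because the overall gain is bounded by the share of run time that is actually accelerated. The sketch below applies Amdahl's law to make that bound concrete. The function name and the sample fractions are illustrative assumptions, not figures from the study.

```python
def amdahl_speedup(accelerated_fraction: float, kernel_speedup: float) -> float:
    """Overall speed-up when only a fraction of the run time is accelerated.

    accelerated_fraction: share of the original run time spent in the
        ported hot spots (0.0 to 1.0), as measured by a profiler.
    kernel_speedup: speed-up obtained on those hot spots alone.
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    untouched = 1.0 - accelerated_fraction
    return 1.0 / (untouched + accelerated_fraction / kernel_speedup)


if __name__ == "__main__":
    # Hypothetical profiles: the hot spots take 50%, 90% or 99% of the run time.
    for fraction in (0.5, 0.9, 0.99):
        overall = amdahl_speedup(fraction, kernel_speedup=20.0)
        print(f"{fraction:.0%} of run time accelerated 20x -> {overall:.2f}x overall")
```

A code whose hot spots cover only half of the run time cannot gain more than 2x, whatever the accelerator, which is one reason to stay agnostic until the profile is known.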

Page 5:

GPU Success Factors

TRACK THE MINES!

• Redesign the code architecture to scale to the right levels: node, CPU, core, GPU
• With GPU:
  • Embarrassingly parallel is good!
  • Minimize memory accesses relative to calculation. Too few calculations per memory read/write is bad.
  • Use the memory hierarchy! Use cache when possible.
  • Thread/core mapping is very important
  • Minimize CPU-GPU transfers
  • Partition the data sets to stay within the onboard memory boundary
  • Data set split: shift along the correct X, Y, Z axis!
  • Hide communications
• Balance CPU (e.g. summations, fewer flops…) and GPU (e.g. convolutions, transpositions…) execution time appropriately.

[Figure: data set axes X and Y with origin O]
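Keeping each partition inside the onboard memory boundary comes down to simple arithmetic. The following is a minimal sketch that counts the chunks needed and splits one axis into contiguous ranges. The helper names, the 3 GiB card size and the 80% usable fraction are assumptions chosen for illustration only.

```python
import math

GIB = 1024 ** 3


def chunks_needed(dataset_bytes: int, gpu_memory_bytes: int,
                  usable_fraction: float = 0.8) -> int:
    """Number of partitions so that each one fits in GPU onboard memory.

    usable_fraction leaves headroom for temporaries and buffers
    (an assumed margin, to be tuned per application).
    """
    usable = int(gpu_memory_bytes * usable_fraction)
    if usable <= 0:
        raise ValueError("no usable GPU memory")
    return max(1, math.ceil(dataset_bytes / usable))


def split_ranges(n_items: int, n_chunks: int) -> list:
    """Split range(n_items) into n_chunks contiguous, nearly equal [start, stop) ranges."""
    base, extra = divmod(n_items, n_chunks)
    ranges = []
    start = 0
    for i in range(n_chunks):
        stop = start + base + (1 if i < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


if __name__ == "__main__":
    # Hypothetical case: a 10 GiB data set on a card with 3 GiB of memory.
    n = chunks_needed(10 * GIB, 3 * GIB)
    print(n, split_ranges(1000, n))
```

The split should run along the axis that keeps each chunk contiguous in memory, which is what "shift along the correct axis" refers to.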

Page 6:

Determine the right platform

• The CPU/GPU execution time ratio will determine the server type, the number of cores and the number of GPUs per node
• PCI-E communication between CPU & GPU will determine whether a PCI-E link can be shared by multiple GPUs
• Single or double precision will determine the GPU type
• Data set size will determine the node type
• $$$
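The PCI-E criterion can be checked with a back-of-the-envelope estimate before buying hardware. The sketch below adds transfer time to kernel time and compares the sum with the CPU-only time. It assumes an effective link rate of 6 GB/s (PCIe Gen2 x16 peaks at 8 GB/s per direction in theory) and an even bandwidth split between GPUs sharing a link; both are illustrative assumptions, as are the function names.

```python
def transfer_seconds(n_bytes: float, link_gb_per_s: float,
                     gpus_sharing_link: int = 1) -> float:
    """Time to move n_bytes over a PCI-E link shared evenly by several GPUs."""
    effective = link_gb_per_s / gpus_sharing_link  # assumed even split
    return n_bytes / (effective * 1e9)


def offload_pays_off(cpu_seconds: float, gpu_kernel_seconds: float,
                     bytes_moved: float, link_gb_per_s: float = 6.0,
                     gpus_sharing_link: int = 1):
    """Return (is_faster, gpu_total_seconds) for one offloaded region."""
    gpu_total = gpu_kernel_seconds + transfer_seconds(
        bytes_moved, link_gb_per_s, gpus_sharing_link)
    return gpu_total < cpu_seconds, gpu_total


if __name__ == "__main__":
    # Hypothetical region: 10 s on CPU, 2 s on GPU, 6 GB moved in total.
    print(offload_pays_off(10.0, 2.0, 6e9))                       # dedicated link
    print(offload_pays_off(10.0, 2.0, 6e9, gpus_sharing_link=2))  # shared link
```

If sharing the link still leaves the GPU path well ahead, a denser node with more GPUs per link is an option; otherwise dedicated x16 lanes are worth their cost.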

Page 7:

GPU-enabled servers

System | Mem Max | Enabled devices
DL160(se)G6 | 192GB/288GB | FX3800/Q4000; Ext. Quadroplex & Tesla
DL380G7 | 192GB | 2x FX3800 or Q4000; Ext. Quadroplex & Tesla
DL370G6 | 144GB | 3x [FX5800, Q4000, Q5000, Q6000, C1060, C2050, C2070]; Ext. Quadroplex & Tesla
DL2000G6 (2x DL170hG6) | 2x192GB | 2x [FX5800, Q4000, Q5000, Q6000]; 2x [C2050, C2070]; Ext. Quadroplex & Tesla
DL580G7 / DL585G7 / DL980G7 | 1TB / 512GB / 2TB | 3x [FX5800, Q4000, Q5000, Q6000]; 3x [C1060, C2050, C2070]; Ext. Quadroplex & Tesla
SL390G7 | 192GB | 2U SL390: 3x [M2050, M2070, M2070Q]; 4U SL390: 8x [M2050, M2070, M2070Q]
WS460cG6 | 192GB | 1 or 2 FX880M, FX2800M, FX3600M; x2 FX3800, FX4800, FX5800, Q5000, Q6000, C1060

Page 8:

ECONOMIC APPROACH ON 2 APPLICATIONS

Joint study with

Page 9:

CAPS entreprise (www.caps-entreprise.com)

Providing worldwide solutions for application deployment on manycore systems

Tools
• HMPP Workbench: source-to-source hybrid compiler
• GPU Migration Tools suite

Porting Methodology

Services
• Assessment of GPU speed-up
• Code porting to hybrid architectures
• CPU & GPU code optimization
• Training in parallelism & hybrid computing (OpenMP, MPI, CUDA, OpenCL, HMPP, porting methodology)

Page 10:

• CAP-EX (Capital Expenses):
  • System acquisition costs:
    • All systems included: servers, GPUs, interconnect
    • Systems amortized over a 3-year period
    • (SW license price not included)
  • Migration costs
    • In man-months
    • Migration effort done once for 10 years
• OP-EX (Operational Expenses):
  • Energy costs (system consumption + cooling)
  • Maintenance costs (system maintenance, support)
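The cost model above can be turned into a per-run figure by amortizing each capital expense over its lifetime. The sketch below is one possible reading, assuming the machine runs the workload back to back for the whole period; the function names and sample prices are hypothetical, and the 10% maintenance rate is the one quoted later in the deck.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def amortized_cost_per_run(total_cost_eur: float, lifetime_years: float,
                           run_seconds: float) -> float:
    """Share of a one-off cost charged to a single run (continuous use assumed)."""
    return total_cost_eur * run_seconds / (lifetime_years * SECONDS_PER_YEAR)


def cost_per_run(system_cost_eur: float, migration_cost_eur: float,
                 energy_cost_eur_per_run: float, run_seconds: float,
                 system_years: float = 3.0, migration_years: float = 10.0,
                 maintenance_rate: float = 0.10) -> dict:
    """CAPEX + OPEX for one run, broken down by component."""
    system = amortized_cost_per_run(system_cost_eur, system_years, run_seconds)
    migration = amortized_cost_per_run(migration_cost_eur, migration_years, run_seconds)
    maintenance = maintenance_rate * system  # maintenance as a share of system cost
    total = system + migration + maintenance + energy_cost_eur_per_run
    return {"system": system, "migration": migration,
            "maintenance": maintenance, "energy": energy_cost_eur_per_run,
            "total": total}


if __name__ == "__main__":
    # Hypothetical prices: 30 kEUR of hardware, 10 kEUR of migration effort.
    print(cost_per_run(30000.0, 10000.0, 0.10, run_seconds=1800.0))
```

Because migration is spread over ten years and the hardware over three, the migration share per run shrinks quickly as the machine grows.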

Page 11:

Conditions

– 4-node system (4x SL390)
• CPU: 2 x Intel Xeon X5670, 6 cores at 2.93 GHz
• (option CPU: 2 x Intel Xeon E5506, 4 cores at 2.13 GHz)
• Main memory: 48 GB per node
• GPUs: up to 3 Tesla C2050 per node
• IB on board

– Base energy power
• System power measured with an ammeter during code execution, rated at 0.07€/kWh
• Cooling costs = 50% of server consumption
• (Interconnect is neglected)
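The energy rules above translate directly into a cost per run. This is a minimal sketch using the stated rate of 0.07€/kWh and the 50% cooling overhead; the function name and the sample power draw are assumptions, since the measured power figures are not reproduced in this transcript.

```python
def energy_cost_per_run(avg_power_watts: float, run_seconds: float,
                        eur_per_kwh: float = 0.07,
                        cooling_overhead: float = 0.5) -> float:
    """Energy cost of one run, cooling included.

    avg_power_watts: average system power measured during execution.
    cooling_overhead: cooling cost as a fraction of server consumption.
    """
    kwh = avg_power_watts * run_seconds / 3.6e6  # W*s (joules) to kWh
    return kwh * (1.0 + cooling_overhead) * eur_per_kwh


if __name__ == "__main__":
    # Hypothetical draw: 4 nodes at 450 W each for a 1000 s run.
    print(f"{energy_cost_per_run(4 * 450.0, 1000.0):.3f} EUR")
```

A GPU node draws more power, but if the run finishes sooner the energy per run can still drop, which is the effect the cost table measures.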

Page 12:

Applications Used

Application 1
• Field: Monte Carlo simulation for thermal radiation
• MPI code
• Migration cost: 1 man-month

Application 2
• Field: astrophysics, hydrodynamics
• MPI code
• Requires 3 GPUs per node to have enough memory space
• Migration cost: 2 man-months

Page 13:

Application 1: Power consumption results

Page 14:

Application 2: Power consumption results

Page 15:

CAPEX-OPEX Overview

• Comparison on an equivalent workload
• CAPEX = System costs + Migration costs
• OPEX = Energy cost + Computer maintenance cost (10% of computer costs)

Configuration | Execution time (s) | System Costs | Maintenance Costs | Energy Costs | CAPEX+OPEX

Application 1 (Migration cost = 1 man-month)
4 nodes | 6862 | 1.87€ | 0.19€ | 0.37€ | 2.43€
4 nodes + 4 GPUs | 1744 | 0.71€ | 0.07€ | 0.12€ | 0.90€
4 nodes + 8 GPUs | 1000 | 0.51€ | 0.05€ | 0.08€ | 0.64€
4 nodes + 12 GPUs | 731 | 0.45€ | 0.04€ | 0.08€ | 0.57€

Application 2 (Migration cost = 2 man-months)
4 nodes | 713 | 0.19€ | 0.02€ | 0.025€ | 0.239€
4 nodes + 12 GPUs | 485 | 0.30€ | 0.03€ | 0.034€ | 0.358€
4 nodes (slow ck) + 12 GPUs | 500 (estim.) | 0.24€ | 0.02€ | 0.034€ | 0.302€

* no infiniband cost counted
*** 0.72 of a 3 year period
**** Use GPU mainly – projection not actual run
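The table is easier to read as ratios against the CPU-only baseline. The sketch below recomputes the speed-up and the relative cost per run from the execution times and CAPEX+OPEX totals listed above; the data layout and function name are illustrative choices.

```python
# (configuration, execution time in s, CAPEX+OPEX per run in EUR), from the table above.
RUNS = {
    "Application 1": [
        ("4 nodes", 6862, 2.43),
        ("4 nodes + 4 GPUs", 1744, 0.90),
        ("4 nodes + 8 GPUs", 1000, 0.64),
        ("4 nodes + 12 GPUs", 731, 0.57),
    ],
    "Application 2": [
        ("4 nodes", 713, 0.239),
        ("4 nodes + 12 GPUs", 485, 0.358),
        ("4 nodes (slow ck) + 12 GPUs", 500, 0.302),  # estimated, not an actual run
    ],
}


def compare_to_baseline(rows):
    """Return (name, speed-up, cost ratio) for every row after the first one."""
    _, base_time, base_cost = rows[0]
    return [(name, base_time / seconds, cost / base_cost)
            for name, seconds, cost in rows[1:]]


if __name__ == "__main__":
    for app, rows in RUNS.items():
        print(app)
        for name, speedup, cost_ratio in compare_to_baseline(rows):
            print(f"  {name}: {speedup:.2f}x faster, {cost_ratio:.2f}x the cost per run")
```

For Application 1 the cost ratio falls well below 1 as GPUs are added, while for Application 2 it stays above 1 despite the shorter run time.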

Page 16:

Cost per run

[Figure: two stacked bar charts of cost per run in €. Application 1: no GPU, 1 GPU/node, 2 GPU/node, 3 GPU/node (axis from 0 to 3.00 €). Application 2: no GPU, 3 GPU/node, 3 GPU/node (slow ck) (axis from 0 to 0.45 €). Stacked components: Migration Costs (4 nodes), Energy Costs (power + cooling), Maintenance Costs (10% CC), System Costs.]

Page 17:

Conclusions

– Be agnostic first… GPUs can do miracles… CPUs too…
– Hybrid system architecture is key (server, CPU, # GPU, and associated connection)
– Do consider TCO (CAP-EX + OP-EX)!
– Application 1: GPU extra cost is easily amortized
– Application 2: the GPU extra cost is not compensated by the speed-up (1.5x), so price/perf does not improve. Better on the next GPU generation?
– ISVs' licensing policies can drastically change the rules!
– On a large machine, migration cost can be negligible

Page 18:

Thank you !

Page 19:

What is HMPP? (Hybrid Manycore Parallel Programming)

• A directive-based multi-language programming environment
o Helps keep software independent from hardware targets
o Provides an incremental tool to exploit GPUs in legacy applications
o Avoids exit cost; can be a future-proof solution
• HMPP provides
o Code generators from C and Fortran to GPU (CUDA or OpenCL)
o A compiler driver that handles all low-level details of GPU compilers
o A runtime to allocate and manage GPU resources
• Source-to-source compiler
o CPU code does not require a compiler change
o Complements existing parallel APIs (OpenMP or MPI)

www.caps-entreprise.com 19 February 2011

Page 20:

WHERE ARE GPUs WORKING?

– Oil and Gas
• Seismic exploration and reservoir modeling
• Enables Reverse Time Migration
– Financial Services (FSI banks)
• Option pricing and risk modeling
– Bioscience
• Genetic sequencing and chemistry
• Molecular dynamics
• Drug discovery (protein docking)
– Government
• Searching and encryption engines

Page 21:

Tokyo Institute of Technology TSUBAME 2.0

Page 22:

TSUBAME 2.0 Overview

• Compute nodes: 2.4PFlops (CPU+GPU)

• SL390s G7 (1408) thin nodes, each with 2 Westmere-EP and 3 NVIDIA M2050

− 1347 with 54GB and SSD 60GB x2, 41 with 96GB and SSD 120GB x2

− Suse Linux Enterprise Server or Windows HPC Server

• DL580 G7 Medium (24) and Fat (10) nodes, with 2 NVIDIA S1070

− Medium: 128GB plus SSD 120GB x4

− Fat: 256GB plus SSD 120GB x4

• QDR InfiniBand, full bisection, non-blocking

• Spine: Voltaire Grid Director 4700 12 x 324port

• Edge: Voltaire Grid Director 4036 179 x 36 port and 4036E 6 x 34port/10GbE 2 port

• Storage: 7.13PB

• Lustre file system 5.93PB: DDN SFA 10000 (10/rack, 5 racks) and DL360 G6 (30)

• Home file system 1.2PB: DDN SFA 10000 (10/rack, 1 rack), BlueArc Mercury 100 (2) and DL360 G6 (4)

• Press release (Japanese):

• http://www.gsic.titech.ac.jp/sites/default/files/pdf/TSUBAME/press.pdf

Page 23:

TSUBAME 2.0 System Overview

Compute nodes: 2.4PFlops (CPU+GPU)

“THIN” nodes: 1408 nodes (32 nodes x 44 racks)
• SL390s G7, 1408 nodes
• CPU: Intel Westmere-EP 2.93GHz (Turbo boost 3.196GHz), 12 cores/node
• Mem: 54GB (1347 nodes), 96GB (41 nodes)
• GPU: NVIDIA M2050 515GFlops, 3 GPUs/node
• SSD: 60GB x 2 = 120GB (54GB nodes), 120GB x 2 = 240GB (96GB nodes)
• OS: Suse Linux Enterprise Server, Windows HPC Server
• CPU Total: 215.99TFLOPS (Turbo boost 3.196GHz); CPU+GPU: 2391.35TFlops; Memory Total: 80.55TB; SSD Total: 173.88TB

“Med” nodes
• DL580 G7, 24 nodes
• CPU: Intel Nehalem-EX 2.0GHz, 32 cores/node
• Mem: 137GB (=128GiB)
• SSD: 120GB x 4 = 480GB
• OS: Suse Linux Enterprise Server
• CPU Total: 6.14TFLOPS

“Fat” nodes
• DL580 G7, 10 nodes
• CPU: Intel Nehalem-EX 2.0GHz, 32 cores/node
• Mem: 274GB (=256GiB) on 8 nodes, 549GB (=512GiB) on 2 nodes
• SSD: 120GB x 4 = 480GB
• OS: Suse Linux Enterprise Server
• CPU Total: 2.56TFLOPS

GSIC: NVIDIA Tesla S1070 GPU; PCI-E gen2 x16, 2 slots/node

Interconnect: Full bisection, non-blocking
• Core Switch: Voltaire Grid Director 4700, 12 switches, IB QDR: 324 ports
• Edge Switch: Voltaire Grid Director 4036, 179 switches, IB QDR: 36 ports
• Edge Switch (with 10GbE ports): Voltaire Grid Director 4036E, 6 switches, IB QDR: 34 ports, 10GbE: 2 ports

Storage system: Total 7.13PB (Lustre + home)
• Lustre file system (DDN SFA10K) 5.93PB
  • MDS, OSS: HP DL360 G6, 30 nodes (OSS x20, MDS x10)
  • Storage: DDN SFA10000 x5 (10 enclosures x5)
  • Lustre (5 file systems): OSS: 20, OST: 5.9PB; MDS: 10, MDT: 30TB
• Home directory region 1.2PB
  • Storage Server: HP DL380 G6, 4 nodes; BlueArc Mercury 100 x2
  • Storage: DDN SFA10000 x1 (10 enclosures x1)
  • For NFS, CIFS: x4; for NFS, CIFS, iSCSI: x2

Other elements: Management nodes; Existing system; Tape System; SuperTitanet; SuperSinet3
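The peak figures quoted for TSUBAME 2.0 can be cross-checked from the node counts, clock rates and per-GPU rating given above. The sketch assumes 4 double-precision flops per cycle per core for Westmere-EP and Nehalem-EX (an assumption consistent with the quoted totals, not a figure stated in the slides); the function name is illustrative.

```python
def cpu_peak_gflops(nodes: int, cores_per_node: int, ghz: float,
                    flops_per_cycle: int = 4) -> float:
    """Theoretical double-precision CPU peak in GFlops.

    flops_per_cycle=4 is an assumption for these CPU generations.
    """
    return nodes * cores_per_node * ghz * flops_per_cycle


# Thin nodes: 1408 x SL390s G7, 12 cores at 3.196 GHz (turbo), 3 x M2050 at 515 GFlops.
thin_cpu = cpu_peak_gflops(1408, 12, 3.196)
thin_gpu = 1408 * 3 * 515
thin_total_tflops = (thin_cpu + thin_gpu) / 1000.0

# Med and Fat nodes: DL580 G7, 32 cores at 2.0 GHz.
med_cpu = cpu_peak_gflops(24, 32, 2.0)
fat_cpu = cpu_peak_gflops(10, 32, 2.0)

if __name__ == "__main__":
    print(f"thin CPU : {thin_cpu / 1000:.3f} TFlops")
    print(f"thin GPU : {thin_gpu / 1000:.3f} TFlops")
    print(f"thin all : {thin_total_tflops:.3f} TFlops")
    print(f"med CPU  : {med_cpu / 1000:.2f} TFlops")
    print(f"fat CPU  : {fat_cpu / 1000:.2f} TFlops")
```

Under that assumption the thin-node CPU and CPU+GPU values agree with the quoted 215.99 and 2391.35 TFlops up to truncation, which suggests the 2.4PFlops headline counts the thin nodes only.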

Page 24:

Why use GPUs?

– GPU offers a massively parallel environment
– GPU offers a high (potential) peak performance
• 515 GFlops DP for FERMI compared to 140 GFlops for a dual-socket X5670 @ 2.93 GHz
– GPU internal bandwidth is really high
• 86.4 GB/s for FERMI vs 40 GB/s on Intel Westmere or 53 GB/s on AMD Magny-Cours
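Peak flops and memory bandwidth together bound what a kernel can reach, which is why the peak is only a potential one. The sketch below applies a simple roofline estimate to the figures quoted on this slide. The function names and the sample arithmetic intensities are assumptions for illustration.

```python
def attainable_gflops(peak_gflops: float, bandwidth_gb_s: float,
                      flops_per_byte: float) -> float:
    """Roofline bound: limited by compute peak or by memory traffic."""
    return min(peak_gflops, bandwidth_gb_s * flops_per_byte)


def ridge_point(peak_gflops: float, bandwidth_gb_s: float) -> float:
    """Arithmetic intensity (flops/byte) above which the peak is reachable."""
    return peak_gflops / bandwidth_gb_s


# Figures quoted on the slide.
FERMI = (515.0, 86.4)    # GFlops DP, GB/s
WESTMERE = (140.0, 40.0)  # dual-socket X5670, GB/s

if __name__ == "__main__":
    for intensity in (0.25, 1.0, 4.0, 16.0):  # hypothetical kernels
        gpu = attainable_gflops(*FERMI, intensity)
        cpu = attainable_gflops(*WESTMERE, intensity)
        print(f"{intensity:5.2f} flops/byte: GPU {gpu:6.1f}  CPU {cpu:6.1f}  "
              f"ratio {gpu / cpu:.2f}")
```

With these figures a bandwidth-bound kernel gains about 2.2x at most, while a compute-bound one can approach 3.7x, so kernels with too few calculations per memory access leave most of the GPU peak unused.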

Page 25:

HP PORTFOLIO

Page 26:

[GPU+GRAPHIC]-ENABLED SERVERS

System | Mem Max | Enabled devices **
DL160(se)G6 | 192GB/288GB | FX3800/Q4000; Ext. Quadroplex & Tesla
DL380G7 | 192GB | 2x FX3800 or Q4000; Ext. Quadroplex & Tesla
DL370G6 | 144GB | 3x [FX5800, Q4000, Q5000, Q6000, C1060, C2050, C2070]; Ext. Quadroplex & Tesla
DL2000 (2x DL170eG6) | 2x192GB | 2x [FX5800, Q4000, Q5000, Q6000]; 2x [C2050, C2070]; Ext. Quadroplex & Tesla
DL580G7 / DL585G7 / DL980G7 | 1TB / 512GB / 2TB | 3x [FX5800, Q4000, Q5000, Q6000]; 3x [C1060, C2050, C2070]; Ext. Quadroplex & Tesla
SL390G7 | 192GB | 2U SL390: 3x [M2050, M2070, M2070Q]; 4U SL390: 8x [M2050, M2070, M2070Q]
WS460cG6 | 192GB | 1 or 2 FX880M, FX2800M, FX3600M; x2 FX3800, FX4800, FX5800, Q5000, Q6000, C1060

** Not all combinations may be qualified yet; please consult HP

Page 27:

GPU-BASED SYSTEM SET-UP

HP Servers

D-HIC / HIC: TESLA S1070 & S2050: 4 GPUs (2+2)

OR

TESLA C1060, C2050, C2070

TESLA M1060, M2050, M2070, M2070Q

Page 28:

HP ProLiant SL390s G7 Server

Page 29:

PURPOSE BUILT SOLUTIONS FOR SCALE

 | DL | BL | SL
Design center | Rack | Blade enclosure in rack | Rack
Design focus | Versatility & value | Integrated & optimized, maximum redundancy | ROI optimized for extreme scale out
Management | Essential and advanced management, HP Insight Dynamics | Advanced management: accelerated service delivery & change in minutes | In-house developed management tools; basic management via IPMI/DCMI

DL: Density optimized for the data center
BL: Shared infrastructure for accelerated service delivery
SL: Extreme scale out datacenters with lean management

#1 Energy Efficiency, #1 Blade Density, #1 Virtualization

Page 30:

HIGHLY FLEXIBLE S6500 CHASSIS

Multi-node, Shared Power & Cooling Architecture

Benefits: Low cost, high efficiency chassis

• 4U Chassis for deployment flexibility

• Standard 19” racks, with front I/O cabling

• Unrestricted airflow (no mid-plane or I/O connectors)

• Reduced weight

• Individually Serviceable Nodes

• Variety of optimized Node Modules

• SL Advanced Power Manager Support

• Dynamic Power Capping

• Power Monitoring

• Node Level Power Off/On

• Shared Power & Fans

• Optional Hot-Plug Redundant PSU

• Energy efficient Hot Plug fans

• 3 Phase Load Balancing

• 94% Platinum Common Slot Power Supplies

• N +1 Capable Power Supplies

Page 31:

HP S6000 CHASSIS HOT PLUG FANS & POWER SUPPLIES

Page 32:

HP ProLiant SL6500 – 300 Series trays

Tray | Ideal environments | Ideal application
HP ProLiant SL390s G7 1U half width | Dense, fabric-heavy HPC with manageability and SNS | HPC compute intensive; highly managed nodes
HP ProLiant SL390s G7 2U half width | Dense GPU offloaded HPC with heavy fabric, manageability and SNS | Balanced GPU & compute intensive HPC
HP ProLiant SL390s G7 4U half width | Maximum GPU offloaded HPC | Extreme GPU

Page 33:

HP ProLiant SL390s G7 Server Trays

Feature | 1U half width | 2U half width | 4U half width
Processor | 2P Intel (Westmere) (all trays)
Chipset | Intel® 5520 Chipset (Tylersburg rev C2) (all trays)
Max Memory | 12 DDR3 slots, R & U DIMMs; max memory: 192GB (all trays)
Storage controller | HP Smart Array B110i SATA SW RAID [standard]; HP SmartArray SAS controller family [upgrade] (all trays)
Storage drives | 2 LFF SATA/SAS, or 4 SFF SATA/SAS | 4 SFF HP SATA/SAS, and 2 SFF SATA/SAS | 8 SFF HP SATA/SAS
Networking | Dual port 1GbE NIC [standard]; 10GbE SFP+ (via ConnectX2) [standard]; QDR IB QSFP (via ConnectX2) [upgrade] (all trays)
Slots/GPU | 1 x16 LP PCIe G2 | 1 x8 LP PCIe G2; 3 x16 PCIe G2 for up to 3 GPUs | 1 x8 LP PCIe G2; 8 x16 PCIe G2 via PLX for up to 8 GPUs
Remote Mgmt. | Integrated LightsOut 3 (support for IPMI 2.0 and DCMI 1.0), sideband or dedicated port (all trays)
Additional features | High efficiency shared PS (>94%) @ 50%+ load; shared fan topology (all trays)
Density | 8 servers in 4U | 4 servers in 4U | 2 servers in 4U
Form Factor | ½ W, 1U, SNS | ½ W, 2U, SNS | ½ W, 4U, SNS
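The density and slot rows above combine into a GPU count per chassis. The following is a small sketch of that arithmetic for the two GPU-capable trays; the dictionary layout and function name are illustrative, and the maximum GPU counts are the "up to" values from the table.

```python
CHASSIS_HEIGHT_U = 4

# tray name -> (servers per 4U chassis, max GPUs per server), from the table above
TRAYS = {
    "2U half width": (4, 3),
    "4U half width": (2, 8),
}


def gpus_per_chassis(servers: int, gpus_per_server: int) -> int:
    """Maximum number of GPUs in one fully populated chassis."""
    return servers * gpus_per_server


if __name__ == "__main__":
    for name, (servers, gpus) in TRAYS.items():
        total = gpus_per_chassis(servers, gpus)
        print(f"{name}: {total} GPUs per chassis, "
              f"{total / CHASSIS_HEIGHT_U:.0f} GPUs per rack unit")
```

The 2U tray works out to 3 GPUs per rack unit with one full server per 3 GPUs, while the 4U tray raises GPU density at the price of fewer CPU cores per GPU.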

Page 34:

HP PROLIANT SL390S FAST FABRIC COMPUTE TRAY

½ width, 1U tall; 2 hot plug nodes in 1U

• 2 LFF (3.5”) or 4 SFF (2.5”) Quick Release HDD
• Dual 1GbE
• Dedicated management iLO3 LAN & 2 USB ports
• VGA
• UID LED & Button
• Health LED
• Serial (RJ45)
• Power Button
• QSFP (QDR IB)
• SFP+
• PCIe Gen2 x8 LP Slot
• Cx2 LOM
• Dual GbE
• Dedicated iLO3 port

Page 35:

HP PROLIANT SL390S, 0 TO 3 GPUS

½ WIDTH, 2U TALL (EFFECTIVE DENSITY = 1U); 4 TRAYS PER 4U CHASSIS

• 4 hot plug SFF (2.5”) HDDs
• 2 non-hot plug SFF (2.5”) HDD
• 1 GPU module in the rear, lower 1U
• 2 GPU modules in upper 1U
• Std IOH for 1 GPU; 2nd IOH for 2 & 3 GPUs …dedicated x16 lanes
• 3 GPUs per 1U w/node
• Dual 1GbE
• Dedicated management iLO3 LAN & 2 USB ports
• VGA
• UID LED & Button
• Health LED
• Serial (RJ45)
• Power Button
• QSFP (QDR IB)
• SFP+
• PCIe Gen2 x8 LP Slot

Page 36:

SL390s Block Diagram 0 to 3 GPUs

[Block diagram. Recoverable labels: CPU1 and CPU2 with DDR3 DIMMs, linked by QPI; Tylersburg-36D IOH with x16 Gen2 and x8 Gen2 PCI slots on a PCI-E riser; Mellanox ConnectX (10G & IB) on x8 Gen2 with SFP+ and x4 QSFP; ICH10 over ESI with 6x SATA, LPC, USB, internal Micro SD; Intel Kawela 82576 on x4 Gen2 (NIC1, NIC2, RJ45, NC-SI); RN50/ES1000 video on PCI (DVI); serial (RJ45) and 2x USB; optional 2nd IOH (Tylersburg-36D, via QPI riser) & 2 PCIe x16 G2 slots.]

Page 37:

HP ProLiant SL390s G7 4U half width tray

• 8 hot plug SFF (2.5”) HDDs
• 4 GPU modules in the front
• 4 GPU modules in the back
• Dual 1GbE
• Dedicated management iLO3 LAN & 2 USB ports
• VGA
• UID LED & Button
• Health LED
• Serial (RJ45)
• Power Button
• QSFP (QDR IB)
• SFP+

Page 38:

22 March 2011

SL390 8 GPU Block Diagram

[Block diagram. Recoverable labels: CPU1 and CPU2 with DDR3 DIMMs, linked by QPI; two Tylersburg-36D IOHs (the second via QPI riser); Mellanox ConnectX (10G or IB) on x8 Gen2 with SFP+ and QSFP; ICH10 over ESI with 6x SATA, LPC, USB, micro SD; Intel 82576 on x4 Gen2 (NIC1, NIC2, RJ45); RN50 video on PCI-33 (DVI); serial (RJ45) and 2x USB; one x8 PCI slot; PLX PEX8647 and two PLX PEX8664 switches feeding eight x16 GPU PCIe slots.]

Page 39:

Cluster Management with integrated GPU support

Easily customizable utility with full GUI and command line interface
• Scalable provisioning
• Configurable monitoring
• Cluster commands

Well adapted for large-scale Linux clusters

Proven and scalable: hundreds of customers including Top500 sites

Enhanced for Nvidia Tesla Modules
• Monitoring of GPU health & temp
• Handling of ECC
• Parallel installation/update of GPU drivers
• Installation of CUDA tools
• CUDA, GPU-aware job-scheduling
• Etc.

Page 40:

Thank you