…updates…


Transcript of …updates…

Page 1: …updates…

Powering high-end x86 systems

Aggregate. Scale. Simplify. Save.

…updates…

04/22/2023

Page 2: …updates…

Aggregate. Scale. Simplify. Save. Confidential and Proprietary


New Server Virtualization Paradigm

Existing: Partitioning

ENTERPRISE APPLICATIONS: applications requiring a fraction of the physical server resources

[Diagram: several virtual machines, each with its own App/OS, running on a single hypervisor or VMM]

New: Aggregation

HIGH PERFORMANCE COMPUTING: applications requiring a superset of the physical server resources

[Diagram: a single virtual machine (App/OS) spanning four hypervisor/VMM instances]

Page 3: …updates…


Existing HPC Deployment Models

Applications requiring a superset of the physical server resources

Scale-Out: Break the problem to fit the hardware

Scale-Up: Fit the hardware to the problem size

Page 4: …updates…


Existing HPC Deployment Models

PROS AND CONS

Scale-Out: Break the problem to fit the hardware
+ Leverages industry-standard servers
+ Low cost
+ Open architecture
- High installation & management cost
- Complex parallel programming
- Multiple operating systems
- Cluster file systems, etc.

Scale-Up: Fit the hardware to the problem size
+ Simplified IT infrastructure
+ Simple and flexible programming
+ Single system to manage
+ Consolidated I/O
- Proprietary hardware design
- High cost
- Architecture lock-in

Page 5: …updates…


Existing HPC Deployment Models

Scale-Out / Scale-Up: PROS AND CONS

Aggregation keeps the Scale-Out advantages:
+ Leverages industry-standard servers
+ Low cost
+ Open architecture

and adds the Scale-Up advantages:
+ Simplified IT infrastructure
+ Simple and flexible programming
+ Single system to manage
+ Consolidated I/O

[Diagram: Aggregation, a single virtual machine (App/OS) spanning four hypervisor/VMM instances]

Page 6: …updates…


vSMP Foundation: Background

THE NEED FOR AGGREGATION: TYPICAL USE CASES

[Diagram: vSMP Foundation presenting a single virtual machine (App/OS) across four hypervisor/VMM instances]

Capabilities: up to 16 nodes
• 32 processors (128 cores)
• 4 TB RAM
More at: http://www.scalemp.com/spec

Cluster Management
• Requirements driven by IT to simplify cluster deployment:
– Single OS
– InfiniBand complexity removal
– Simplified I/O: faster scratch storage
– Large memory is a plus
• OPEX savings

SMP Replacement
• Requirements driven by the end users per application characteristics:
– Large memory
– High core count
– IT simplification is a plus
• CAPEX savings
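The capability figures are consistent with 16 identical nodes, each with 2 quad-core processors and 256 GB of RAM. Those per-node values are assumptions obtained by dividing the stated maxima by 16, not a published node specification. The minimal Python sketch below shows how aggregation sums the nodes' resources into one system view, and how "fitting the hardware to the problem size" could mean choosing how many nodes to aggregate. `Node`, `aggregate` and `nodes_needed` are hypothetical names, not a vSMP Foundation API.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """One physical server (hypothetical model, not a vSMP Foundation API)."""
    processors: int
    cores_per_processor: int
    ram_gb: int


MAX_NODES = 16  # stated maximum for one aggregated system


def aggregate(nodes: list[Node]) -> dict[str, int]:
    """Sum per-node resources into the single system image a guest OS would see."""
    if not 1 <= len(nodes) <= MAX_NODES:
        raise ValueError(f"expected 1 to {MAX_NODES} nodes, got {len(nodes)}")
    return {
        "processors": sum(n.processors for n in nodes),
        "cores": sum(n.processors * n.cores_per_processor for n in nodes),
        "ram_gb": sum(n.ram_gb for n in nodes),
    }


def nodes_needed(node: Node, ram_gb: int, cores: int) -> int:
    """Smallest number of identical nodes whose combined RAM and cores cover a workload."""
    by_ram = math.ceil(ram_gb / node.ram_gb)
    by_cores = math.ceil(cores / (node.processors * node.cores_per_processor))
    count = max(by_ram, by_cores, 1)
    if count > MAX_NODES:
        raise ValueError("workload exceeds the aggregated maximum")
    return count


if __name__ == "__main__":
    server = Node(processors=2, cores_per_processor=4, ram_gb=256)  # assumed node spec
    print(aggregate([server] * MAX_NODES))  # {'processors': 32, 'cores': 128, 'ram_gb': 4096}
    print(nodes_needed(server, ram_gb=1000, cores=24))  # 4
```

Here 4096 GB corresponds to the slide's 4 TB (binary units). A memory-bound job of about 1 TB needs 4 such nodes even though 3 nodes would already supply enough cores.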

Page 7: …updates…


Why Aggregate?

OVERCOMING LIMITATIONS OF EXISTING DEPLOYMENT MODELS

Fit the hardware to the problem size
• Alternative to costly and proprietary RISC systems
• Large-memory x86 resource
– Enable larger workloads that cannot be run otherwise
• High core-count x86 shared-memory resource with high memory bandwidth
– Allow threaded applications to benefit from shared-memory systems
– Reduce development time of custom code by using OpenMP (vs. MPI)

[Diagram: a single App/OS on a costlier system ($$$$$) and on a less expensive one ($$$)]

Page 8: …updates…


Why Aggregate?

OVERCOMING LIMITATIONS OF EXISTING DEPLOYMENT MODELS

Break the problem to fit the hardware
• Ease of use: one system to manage; fewer, larger nodes mean less cluster-management overhead
– Single operating system
– Avoid cluster file systems
– Hide InfiniBand complexities
• Shared I/O
– A single process can utilize the I/O bandwidth of multiple systems

[Diagram: a single App/OS instance ($$$$$$$$) compared with five separate App/OS instances]

Page 9: …updates…


Simplified Cluster - Example


Page 10: …updates…


Customers and Partners


Commercial

Federal

Supported Platforms

Educational

Page 11: …updates…


Target Environments and Applications

Target Environments
• Users seeking to simplify cluster complexities
• Applications that use a large memory footprint (even with one processor)
• Applications that need multiple processors and shared memory

Typical end-user applications

Manufacturing
– CSM (Computational Structural Mechanics): ABAQUS/Explicit, ABAQUS/Standard, ANSYS Mechanical, LSTC LS-DYNA, ALTAIR Radioss
– CFD (Computational Fluid Dynamics): FLUENT, ANSYS CFX, STAR-CD, AVL FIRE, Tgrid
– Other: inTrace OpenRT

Life Sciences: Gaussian, VASP, AMBER, Schrödinger Jaguar, Schrödinger Glide, NAMD, DOCK, GAMESS, GOLD, mpiBLAST, GROMACS, MOLPRO, OpenEye FRED, OpenEye OMEGA, SCM ADF, HMMER

Energy: Schlumberger ECLIPSE, Paradigm GeoDepth, 3DGEO 3DPSDM, Norsar 3D

EDA: Mentor, Cadence, Synopsys

Finance: Wombat, KX

Others: The MathWorks MATLAB, R, Octave, Wolfram MATHEMATICA, ISC STAR-P

Page 12: …updates…


vSMP Foundation 2.0

Support for Intel® Nehalem Processor Family
– First Nehalem solution with more than 2 processors
– Up to 3 times better performance compared to Harpertown systems
– Optimized performance with intra-board memory placement and QDR InfiniBand

High availability with dual-rail InfiniBand
– 2 InfiniBand switches (dual-rail) in an active-active configuration
– Automatic failover on link errors (cable) or switch failure
– Improved performance with switch load-balancing (both switches used in parallel)
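Active-active dual-rail operation can be pictured as spreading traffic over both switches and falling back to the surviving rail when a link or switch fails. The following Python sketch is a toy model under that assumption: round-robin load balancing plus a per-rail health flag. The class and method names are hypothetical, and the sketch does not describe how vSMP Foundation actually implements failover.

```python
from itertools import cycle


class DualRail:
    """Toy model of two InfiniBand rails used in parallel, with automatic failover."""

    def __init__(self, rails=("switch1", "switch2")):
        self.healthy = {rail: True for rail in rails}
        self._order = cycle(rails)  # round-robin over both rails (load balancing)

    def mark_failed(self, rail: str) -> None:
        """Record a link (cable) error or switch failure on one rail."""
        self.healthy[rail] = False

    def mark_restored(self, rail: str) -> None:
        """Return a repaired rail to the load-balancing rotation."""
        self.healthy[rail] = True

    def pick_rail(self) -> str:
        """Return the next healthy rail, skipping failed rails instead of stalling."""
        for _ in range(len(self.healthy)):
            rail = next(self._order)
            if self.healthy[rail]:
                return rail
        raise RuntimeError("no healthy rail available")


if __name__ == "__main__":
    fabric = DualRail()
    print([fabric.pick_rail() for _ in range(4)])  # ['switch1', 'switch2', 'switch1', 'switch2']
    fabric.mark_failed("switch2")
    print([fabric.pick_rail() for _ in range(3)])  # ['switch1', 'switch1', 'switch1']
```

With both rails healthy, traffic alternates between the switches. After a failure, every request goes to the remaining rail until `mark_restored` puts the repaired one back into rotation.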

Partitioning
– Hardware-level isolated partitions, each able to run a different OS
– Up to 8 partitions, minimum 2 servers per partition
– Requires add-on license
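The partitioning limits above (at most 8 partitions, at least 2 servers each) can be expressed as a simple layout check. Note that 8 partitions of 2 servers use exactly 16 servers, which matches the 16-node maximum. The minimal sketch below assumes a layout is just a list of server counts drawn from one system. `validate_partitions` is a hypothetical helper, not part of vSMP Foundation, and it ignores the licensing requirement.

```python
MAX_PARTITIONS = 8
MIN_SERVERS_PER_PARTITION = 2


def validate_partitions(servers_per_partition: list[int], total_servers: int) -> list[str]:
    """Return the rule violations for a proposed layout (an empty list means valid)."""
    problems = []
    if len(servers_per_partition) > MAX_PARTITIONS:
        problems.append(
            f"{len(servers_per_partition)} partitions exceeds the maximum of {MAX_PARTITIONS}"
        )
    for index, count in enumerate(servers_per_partition):
        if count < MIN_SERVERS_PER_PARTITION:
            problems.append(
                f"partition {index} has {count} server(s); minimum is {MIN_SERVERS_PER_PARTITION}"
            )
    used = sum(servers_per_partition)
    if used > total_servers:
        problems.append(f"layout uses {used} servers but only {total_servers} exist")
    return problems


if __name__ == "__main__":
    print(validate_partitions([2] * 8, total_servers=16))    # []
    print(validate_partitions([4, 4, 1], total_servers=16))  # ['partition 2 has 1 server(s); minimum is 2']
```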

Emulex LightPulse® Fibre-Channel HBA Support

[Diagram: Servers A, B and C connected to InfiniBand Switch 1 and InfiniBand Switch 2, with automatic failover and load-balancing; shown both as a single partition and as multiple partitions]

Page 13: …updates…


vSMP Foundation 2.0

COMPLETE SYSTEM VIEW: NOW AVAILABLE FOR ACADEMIC INSTITUTES!


[Screenshots: Before and After]

Page 14: …updates…

Some Performance Data

GAUSSIAN

Page 15: …updates…


Some Performance Data


GAUSSIAN

Page 16: …updates…


vSMP Foundation Performance

STREAM (OMP): MB/SEC. (HIGHER IS BETTER)

HW characteristics (Source: ScaleMP):
• 1333MHz: 32 x Intel XEON E5345 QC (Clovertown), 2.33GHz, 2x4MB L2, 1333MHz; 900/960GB (vSMP Foundation 1.7)
• 1600MHz: 32 x Intel XEON E5472 QC (Harpertown), 3.00GHz, 2x6MB L2, 1600MHz; 249/288GB (vSMP Foundation 1.7)
• QPI 6.4GT/s: 4 x Intel XEON X5570 QC (Nehalem), 2.93GHz, 8MB L3, QPI 6.4; 9/16GB (vSMP Foundation 1.7)

Bandwidth in MB/sec., with efficiency compared to 8 cores / 1 board in parentheses:

Cores | 1333MHz FSB (128 cores / 16 boards) | 1600MHz FSB (128 cores / 16 boards) | QPI 6.4GT/s (32 cores / 4 boards)
8 | 6,157 (100%) | 9,714 (100%) | 33,279 (100%)
16 | 12,018 (98%) | 19,145 (99%) | 66,536 (100%)
32 | 23,981 (97%) | 38,197 (98%) | 131,462 (99%)
64 | 44,111 (90%) | 75,634 (97%) | -
128 | 73,218 (74%) | 142,280 (92%) | -
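The efficiency figures compare each run with the 8-core / 1-board result scaled linearly by the number of boards. The minimal Python sketch below assumes efficiency = bandwidth / (baseline bandwidth × boards) with 8 cores per board. That formula is our reading of the chart label, not one stated on the slide. It reproduces the 1333MHz column from the published bandwidths.

```python
# Scaling efficiency relative to a one-board baseline, assuming perfect
# scaling would multiply the baseline bandwidth by the number of boards.
CORES_PER_BOARD = 8  # assumption: 8 cores per board in every configuration

# STREAM (OMP) bandwidth in MB/sec. for the 1333MHz FSB system, keyed by core count.
stream_1333 = {8: 6157, 16: 12018, 32: 23981, 64: 44111, 128: 73218}


def scaling_efficiency(results: dict[int, float],
                       cores_per_board: int = CORES_PER_BOARD) -> dict[int, float]:
    """Return efficiency (as a fraction) for each core count versus the smallest run."""
    base_cores = min(results)
    base_boards = base_cores // cores_per_board
    per_board_baseline = results[base_cores] / base_boards
    efficiency = {}
    for cores, value in sorted(results.items()):
        boards = cores // cores_per_board
        efficiency[cores] = value / (per_board_baseline * boards)
    return efficiency


if __name__ == "__main__":
    for cores, eff in scaling_efficiency(stream_1333).items():
        print(f"{cores:>4} cores: {eff:.0%}")  # 100%, 98%, 97%, 90%, 74%
```

The same function applied to the 1600MHz and QPI columns yields the other reported percentages (for example 92% at 128 cores on the 1600MHz system).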

Page 17: …updates…


vSMP Foundation Performance

SPECint_rate_base2000: RATE (HIGHER IS BETTER)

HW characteristics (Source: ScaleMP):
• vSMP Foundation™ (QC-8 core): 2 x Intel XEON 5345 QC (Clovertown), 2.33GHz, 2x4MB L2; 908/960GB (vSMP Foundation 1.7)
• vSMP Foundation™ (QC-128 core): 32 x Intel XEON 5345 QC (Clovertown), 2.33GHz, 2x4MB L2; 908/960GB (vSMP Foundation 1.7)

Benchmark | vSMP Foundation (8 cores / 1 board) | vSMP Foundation (128 cores / 16 boards) | Efficiency (128 cores to 8 cores)
164.gzip | 110 | 1,560 | 89%
175.vpr | 78 | 922 | 74%
176.gcc | 147 | 1,803 | 77%
181.mcf | 32 | 334 | 65%
186.crafty | 242 | 3,733 | 96%
197.parser | 84 | 1,008 | 75%
252.eon | 291 | 4,328 | 93%
253.perlbmk | 197 | 2,332 | 74%
254.gap | 94 | 1,007 | 67%
255.vortex | 213 | 2,944 | 86%
256.bzip2 | 113 | 1,498 | 83%
300.twolf | 176 | 2,282 | 81%
SPECint_rate2000 | 128 | 1,623 | 79%
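SPEC reports its composite metrics as the geometric mean of the per-benchmark results, and the composite row above is consistent with that. The short Python check below uses the table's rates. The inputs are the rounded values shown on the slide, so small rounding differences from SPEC's own composite are possible.

```python
import math

# Per-benchmark SPECint_rate2000 results from the table above (164.gzip ... 300.twolf).
rates_8_cores = [110, 78, 147, 32, 242, 84, 291, 197, 94, 213, 113, 176]
rates_128_cores = [1560, 922, 1803, 334, 3733, 1008, 4328, 2332, 1007, 2944, 1498, 2282]


def geometric_mean(values: list[float]) -> float:
    """Geometric mean computed in log space to avoid overflow on long products."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


if __name__ == "__main__":
    composite_8 = geometric_mean(rates_8_cores)
    composite_128 = geometric_mean(rates_128_cores)
    print(round(composite_8), round(composite_128))  # 128 1623
    # Composite efficiency with the table's definition: 128 cores = 16 boards of 8 cores.
    print(f"{composite_128 / (16 * composite_8):.0%}")  # 79%
```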

Page 18: …updates…


vSMP Foundation Performance

SPECint_rate_base2006: RATE (HIGHER IS BETTER)

HW characteristics (Source: ScaleMP): QPI 6.4GT/s: 4 x Intel XEON X5570 QC (Nehalem), 2.93GHz, 8MB L3, QPI 6.4; 9/16GB (vSMP Foundation 1.7)

Benchmark | 16 Threads (1 board) | 64 Threads (4 boards) | Efficiency
400.perlbench | 198 | 796 | 101%
401.bzip2 | 145 | 560 | 97%
403.gcc | 189 | 736 | 97%
429.mcf | 253 | 997 | 99%
445.gobmk | 223 | 889 | 100%
456.hmmer | 168 | 670 | 100%
458.sjeng | 209 | 831 | 99%
462.libquantum | 692 | 2,380 | 86%
464.h264ref | 293 | 1,160 | 99%
471.omnetpp | 167 | 651 | 97%
473.astar | 144 | 561 | 97%
483.xalancbmk | 250 | 969 | 97%
SPECint_rate_base2006 | 220 | 857 | 97%

Page 19: …updates…


vSMP Foundation Performance

SPECfp_rate_base2000: RATE (HIGHER IS BETTER)

HW characteristics (Source: ScaleMP):
• vSMP Foundation™ (QC-8 core): 2 x Intel XEON 5345 QC (Clovertown), 2.33GHz, 2x4MB L2; 908/960GB (vSMP Foundation 1.7)
• vSMP Foundation™ (QC-128 core): 32 x Intel XEON 5345 QC (Clovertown), 2.33GHz, 2x4MB L2; 908/960GB (vSMP Foundation 1.7)

Benchmark | vSMP Foundation (8 cores / 1 board) | vSMP Foundation (128 cores / 16 boards) | Efficiency (128 cores to 8 cores)
168.wupwise | 100 | 1,282 | 80%
171.swim | 50 | 600 | 75%
172.mgrid | 44 | 469 | 67%
173.applu | 138 | 1,420 | 64%
177.mesa | 210 | 3,132 | 93%
178.galgel | 265 | 2,864 | 68%
179.art | 164 | 1,494 | 57%
183.equake | 46 | 561 | 76%
187.facerec | 105 | 1,061 | 63%
188.ammp | 96 | 1,267 | 83%
189.lucas | 56 | 594 | 66%
191.fma3d | 63 | 679 | 67%
200.sixtrack | 82 | 1,264 | 97%
301.apsi | 103 | 1,374 | 83%
SPECfp_rate2000 | 93 | 1,097 | 73%

Page 20: …updates…


vSMP Foundation Performance

SPECfp_rate_base2006: RATE (HIGHER IS BETTER)

HW characteristics (Source: ScaleMP): QPI 6.4GT/s: 4 x Intel XEON X5570 QC (Nehalem), 2.93GHz, 8MB L3, QPI 6.4; 9/16GB (vSMP Foundation 1.7)

Benchmark | 16 Threads (1 board) | 64 Threads (4 boards) | Efficiency
410.bwaves | 166 | 621 | 93%
416.gamess | 197 | 784 | 100%
433.milc | 131 | 499 | 95%
434.zeusmp | 193 | 759 | 98%
435.gromacs | 187 | 744 | 99%
436.cactusADM | 217 | 848 | 98%
437.leslie3d | 119 | 449 | 95%
444.namd | 174 | 695 | 100%
447.dealII | 237 | 932 | 98%
450.soplex | 131 | 473 | 91%
453.povray | 259 | 1,030 | 99%
454.calculix | 220 | 870 | 99%
459.GemsFDTD | 107 | 402 | 94%
465.tonto | 196 | 794 | 101%
470.lbm | 108 | 396 | 92%
481.wrf | 206 | 780 | 95%
482.sphinx3 | 188 | 702 | 93%
SPECfp_rate_base2006 | 173 | 666 | 96%

Page 21: …updates…

Powering high-end x86 systems

Aggregate. Scale. Simplify. Save.

Shai Fultheim, Founder and President

+1 (408) 480 1612
