Swiss-T1 : A Commodity MPI computing solution, March 1999, Ralf Gruber, EPFL-SIC/CAPA/Swiss-Tx, Lausanne

Page 1:

Swiss-T1 : A Commodity MPI computing solution

March 1999

Ralf Gruber, EPFL-SIC/CAPA/Swiss-Tx, Lausanne

Page 2:

Swiss-T1 : A Commodity MPI computing solution

March 2000

Content:

1. Distributed Commodity HPC
2. Characterisation of machines and applications
3. Swiss-Tx project

Page 3:

July 1998

Past : SUPERCOMPUTER

Manufacturer          What happened
Cray Research         Taken over by SGI
Convex                Taken over by HP
Connection Machines   Disappeared
KSR                   Disappeared
Intel Paragon         Stopped supercomputing
Japanese companies    Still existing (not main)
Teracomputers         In development for 6 years

Why it happened:

Produced own processors
Developed own memory switches
Needed special memories
Developed own operating system
Developed own compiler
Special I/O : HW and SW
Own communication system

Page 4:

Processor performance evolution

July 1998

Page 5:

July 1998

SMP/NUMA

Manufacturer   Parallel server
DIGITAL        Wildfire
SUN            Starfire
IBM            SP-2
HP             Exemplar
SGI            Origin 2000
.....          .....

Present situation:

Off the shelf processors
Off the shelf memory switches
Off the shelf memories
Special parts of operating system
Special compiler extensions
Special I/O and SW
Own communication system

What is the trend ?

Page 6:

March 2000

Commodity Computing (MPI/PCI)

PC clusters/Linux:
  Fast Ethernet: Beowulf

SOS cooperation (Alpha):
  Myrinet/DS10: C-Plant (SNL)
  T-Net/DS20: Swiss-T1 (EPFL)

Customised commodity:
  Quadrics/ES40: Compaq/Sierra

Off the shelf processors
Off the shelf memory switches
Off the shelf memories
Off the shelf local I/O HW and SW
Off the shelf operating systems
Off the shelf compilers

New communication system

New distributed file/IO system

Page 7:

March 2000

4th SOS workshop on Distributed Commodity HPC

Participants: SNL, ORNL, Swiss-Tx, LLNL, LANL, ANL, NASA, LBL, PSC, DOE, UNM, Syracuse, Compaq, IBM, Cray, Sun, SMEs

Content: Vision, Clusters, Interconnects, Integration, OS, I/O, Applications, Usability, Crystal ball

Page 8:

March 2000

Distributed commodity HPC User's Group

Goals:

Characterise the machines
Characterise the applications
Match machines to applications

Page 9:

Characterise processors, machines, and applications

Performance

Processors: Vmac
  Vmac = peak processor performance / peak memory bandwidth

Parallel machines: mac
  mac = effective processor performance / effective network performance

Applications: app
  app = operation count / words to be sent

Page 10:

15 June 1998

In a box: Vmac values

Vmac = R [Mflop/s] / M [Mword/s]

Table: Vmac values for Alpha 21164 and 21264 boxes and NEC SX-4

Machine            N  R     M     Vmac
Alpha server 1200  2  2133  138   15
DS20               2  2000  667   3
DS20+              2  2667  667   4
NEC SX-4           1  2000  2000  1
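The Vmac column can be checked directly from the R and M columns. The following is a minimal sketch that recomputes it; the names `BOXES`, `vmac` and `VMAC` are illustrative and not part of the slides.

```python
# (machine, R in Mflop/s, M in Mword/s), copied from the table above.
BOXES = [
    ("Alpha server 1200", 2133, 138),
    ("DS20", 2000, 667),
    ("DS20+", 2667, 667),
    ("NEC SX-4", 2000, 2000),
]


def vmac(r_mflops: float, m_mwords: float) -> float:
    """Peak floating-point operations available per word of peak memory bandwidth."""
    return r_mflops / m_mwords


# Vmac per machine; the slide prints these rounded to integers.
VMAC = {name: vmac(r, m) for name, r, m in BOXES}

for name, value in VMAC.items():
    print(f"{name:18s} Vmac = {value:.1f}")
```

A lower Vmac means the memory system can feed the processor more easily, which is why the vector machine sits at 1 while the older Alpha server sits near 15.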

Page 11:

Between boxes: mac value

mac = N * R [Mflop/s] * <d> / C [Mword/s]

Table: mac of different machines

Machine    Type      Nproc  Peak  Eff perf  Eff bw  mac
Gravitor   Beowulf   128    50    6.4*      0.064   100
Swiss-T1   T-Net     64     64    13        0.32    40
Swiss-T1   FE        64     64    13        0.032   400
Baby T1    C+PCI     12     12    2.4       0.072   30
Origin2K   NUMA/MPI  80     32    9         1       9
NEC SX4    vector    8      16    8         6.4     1.3

Effective performance measured with MATMULT, * estimated. Effective bandwidth measured with point to point.

Page 12:

The app value

app = Operations/Communicated words

Material sciences (3D Fourier analysis): app ~ 50
  Beowulf insufficient, Swiss-T1 just about right

Crash analysis (3D non-linear FE): app > 1000
  Beowulf sufficient, latency?
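One way to read these two examples is that a machine suits an application when app is at least as large as the machine's mac value. The sketch below applies that rule to mac values printed in the "mac of different machines" table. The threshold rule and the names `is_balanced` and `suitable_machines` are assumptions made for illustration; the slide does not state the rule explicitly, and latency (flagged on the slide with a question mark) is ignored.

```python
# mac values as printed in the "mac of different machines" table.
MAC = {
    "Gravitor (Beowulf)": 100,
    "Swiss-T1 (T-Net)": 40,
    "Swiss-T1 (FE)": 400,
}


def is_balanced(app: float, mac: float) -> bool:
    """Assumed rule: the network keeps up when the application performs
    at least `mac` operations per communicated word."""
    return app >= mac


def suitable_machines(app: float) -> list[str]:
    """Machines from MAC whose mac value does not exceed the given app value."""
    return [name for name, mac in MAC.items() if is_balanced(app, mac)]


print(suitable_machines(50))    # 3D Fourier analysis, app ~ 50
print(suitable_machines(1000))  # crash analysis, lower bound app = 1000
```

Under this reading, app ~ 50 is matched only by the T-Net machine (mac 40), which agrees with "Beowulf insufficient, Swiss-T1 just about right".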

Page 13:

The app value for Finite Elements

app = Operations/Communicated words

FE: Ops depend on
  Nb of volume nodes
  Nb of variables per node squared
  Nb of non-zero matrix elements
  Nb of operations per matrix element

FE: Comm depends on
  Nb of surface nodes
  Nb of variables per node

FE: app depends on
  Nb of nodes in one direction
  Nb of variables per node
  Nb of non-zero matrix elements
  Nb of operations per matrix element
  Nb of surfaces

Page 14:

The app value

Statistics for 3D brick problem (Finite elements)

Nb of  Nb of  Nb interface  Mflop   Mflop        kB/data   kB           app
Subd   Nodes  Nodes         /cycle  /cycle/proc  transfer  /cycle/proc
1      5049   0             13.5    13.5         0.0       0.0
2      5202   153           13.5    6.8          7.2       3.6          15074
4      5508   459           13.5    3.4          21.5      5.4          5028
16     6366   1317          13.5    0.8          61.7      3.9          1755
32     6960   1911          13.6    0.4          89.6      2.8          1211
64     7572   2523          13.6    0.2          118.3     1.8          918
128    8796   3747          13.6    0.1          175.6     1.4          620

Table: Current day case, 4096 elements
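The app column appears to follow from the Mflop/cycle and kB/data transfer columns once the transferred bytes are converted to words. The sketch below reproduces it under two assumptions that are not stated on the slide: 8-byte words and 1 kB = 1000 bytes. With these, the recomputed values agree with the printed ones to within about one percent.

```python
BYTES_PER_WORD = 8    # assumption: 64-bit words
BYTES_PER_KB = 1000   # assumption: decimal kilobytes

# (subdomains, Mflop per cycle, kB transferred per cycle, app as printed)
ROWS = [
    (2, 13.5, 7.2, 15074),
    (4, 13.5, 21.5, 5028),
    (16, 13.5, 61.7, 1755),
    (32, 13.6, 89.6, 1211),
    (64, 13.6, 118.3, 918),
    (128, 13.6, 175.6, 620),
]


def app_value(mflop_per_cycle: float, kb_per_cycle: float) -> float:
    """app = operations / communicated words for one cycle."""
    operations = mflop_per_cycle * 1e6
    words = kb_per_cycle * BYTES_PER_KB / BYTES_PER_WORD
    return operations / words


for subd, mflop, kb, printed in ROWS:
    print(f"{subd:4d} subdomains: app = {app_value(mflop, kb):8.0f} (slide: {printed})")
```

The work per cycle stays near 13.5 Mflop while the interface traffic grows with the number of subdomains, so app falls steadily as the problem is split further.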

Page 15:

March 2000

Fat-tree/Crossbars 16x16

N=8, P=8, N*P=64 PUs, X=12, BiW=32, L=64

Page 16:

March 2000

Circulant graphs/Crossbars 12x12

K=2 (1/3): N=8, P=8, X=8, BiW=8, L=16
K=3 (1/3/5): N=11, P=6, X=11, BiW=18, L=33
K=4 (1/3/5/7): N=16, P=4, X=16, BiW=32, L=64

Page 17:

March 2000

Fat-tree/Circulant graphs

Table : Comparison of Fat-tree and circulant graph architectures

Parameter       Fat-tree  Circulant graph  Circulant graph  Circulant graph
                          K=2 (1/3)        K=3 (1/3/5)      K=4 (1/3/5/7)
Crossbar 16x16  12        -                -                -
Crossbar 12x12  -         8                11               16
N               8         8                11               16
P               8         8                6                4
N*P             64        64               66               64
D               2         2                2                2
Dm              1.75      1.25             1.28             1.38
BiW             32        8                18               32
L               64        16               33               64
w               1         3                3                3
T=wP^2          64        192              108              48

N : Number of computing nodes
P : Number of boxes per node
N*P : Total number of boxes
D : Maximum distance between two nodes
Dm : Average distance between two nodes (load for a point-to-point operation)
BiW : Bisectional width
L : Number of links
w : Load factor for an all-to-all communication operation
T : Number of steps, or load, to perform an all-to-all operation
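The circulant-graph columns can be recomputed with a breadth-first search. The sketch below assumes that in a circulant graph with jumps (1/3/...) node i is linked to nodes i ± j (mod N) for every jump j, and that Dm is the sum of distances from one node divided by N (so the zero distance to the node itself is included). Both are readings of the table rather than definitions given on the slides; with them L and D match exactly and Dm matches to within rounding. The function names are illustrative.

```python
from collections import deque


def circulant_metrics(n: int, jumps: tuple[int, ...]) -> tuple[int, int, float]:
    """Return (L, D, Dm) for a circulant graph with n nodes and the given jumps."""
    # Count undirected links once each.
    edges = {frozenset((i, (i + j) % n)) for i in range(n) for j in jumps}

    # Breadth-first search from node 0; the graph is symmetric, so one source suffices.
    dist = {0: 0}
    queue = deque([0])
    while queue:
        node = queue.popleft()
        for j in jumps:
            for nxt in ((node + j) % n, (node - j) % n):
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)

    diameter = max(dist.values())
    mean_distance = sum(dist.values()) / n  # assumption: averaged over all n nodes
    return len(edges), diameter, mean_distance


def all_to_all_load(w: int, p: int) -> int:
    """T = w * P^2, as in the last table row."""
    return w * p ** 2


for n, jumps, p in [(8, (1, 3), 8), (11, (1, 3, 5), 6), (16, (1, 3, 5, 7), 4)]:
    links, d, dm = circulant_metrics(n, jumps)
    print(f"N={n:2d} jumps={jumps}: L={links}, D={d}, Dm={dm:.3f}, T={all_to_all_load(3, p)}")
```

The T row shows the trade-off the table is making: more nodes with fewer boxes each (K=4) lowers the all-to-all load from 192 to 48, at the cost of more crossbars and links.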

Page 18:

The Swiss-Tx machines

September 1998

Machine         Installation  #P   Peak     Memory  Disk    Archive  Operating system          Connection system
                Date   Place       Gflop/s  GBytes  GBytes  TBytes
Swiss-T0        12.97  EPFL   8    8        2       64      1**      Digital Unix              EasyNet bus, FE bus
Swiss-T0(Dual)  10.98  EPFL   16   16       8       170     -        Windows NT, Digital Unix  EasyNet bus, FE switch
Baby T1*        8.99   EPFL   16   16       8       170     -        Tru64 Unix                Crossbar 12x12, FE switch
                4.00   DGM
Swiss-T1        1.00   EPFL   70   70       35      950     1**      Tru64 Unix                Crossbar 12x12, FE switch
Swiss-T2        -             504  1008     252     9000    ?        ? Not decided             Crossbar 12x12, FE switch

* Baby T1 is an upgrade of T0(Dual)
** Archive ported from T0 to T1

Page 19:

March 2000

Swiss-T1

Page 20:

Swiss-T1

Components
  32 computational DS20E
  2 frontend DS20E
  1 development DS20E
  300 GB RAID disks
  600 GB distributed disks
  1 TB DLT archive
  Fast/Gigabit Ethernet
  Tru64/TruCluster Unix
  LSF, GRD/Codine
  Totalview, Paradyn
  MPICH/PVM

T-Net network technology
  (8+1) 12x12 crossbar 100 MB/s
  32 bit PCI adapter 75 MB/s
  (64 bit PCI adapter 180 MB/s)
  Flexible, non-blocking
  Reliable
  Optimal routing
  FCI 5 µs
  MPI 18 µs
  Monitoring system
  Remote control
  Up to 3 Tflop/s ( < 100)

Page 21:

March 2000

Swiss-T1 Architecture

Page 22:

March 2000

Swiss-T1 Routing table

Table: Routing table for the Swiss-T1 machine

     1  2  3  4  5  6  7  8
1    -  2  2  4  4  6  8  8
2    1  -  3  7  5  3  7  5
3    2  2  -  4  4  6  8  8
4    1  7  3  -  5  7  7  3
5    4  2  4  4  -  6  6  8
6    1  3  3  7  5  -  7  1
7    8  2  8  4  6  6  -  8
8    1  5  3  3  5  1  7  -
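The table can be read as a next-hop table: the entry in row i, column j names the first node visited on the way from i to j, and equals j when the two are directly linked. That reading is an assumption (the slide gives no legend), but it is consistent with the K=2 (1/3) circulant graph on 8 nodes. The sketch below follows next hops until the destination is reached; `ROUTING` and `route` are illustrative names.

```python
# Rows of the routing table above; None marks the diagonal.
ROUTING = {
    1: [None, 2, 2, 4, 4, 6, 8, 8],
    2: [1, None, 3, 7, 5, 3, 7, 5],
    3: [2, 2, None, 4, 4, 6, 8, 8],
    4: [1, 7, 3, None, 5, 7, 7, 3],
    5: [4, 2, 4, 4, None, 6, 6, 8],
    6: [1, 3, 3, 7, 5, None, 7, 1],
    7: [8, 2, 8, 4, 6, 6, None, 8],
    8: [1, 5, 3, 3, 5, 1, 7, None],
}


def route(src: int, dst: int) -> list[int]:
    """Follow next-hop entries from src to dst and return the visited nodes."""
    path = [src]
    while path[-1] != dst:
        path.append(ROUTING[path[-1]][dst - 1])
        if len(path) > len(ROUTING):  # guard against a malformed table
            raise RuntimeError(f"routing loop between {src} and {dst}")
    return path


print(route(1, 3))  # goes through node 2
print(route(1, 7))  # goes through node 8
```

Under this reading every pair of nodes is at most two hops apart, in line with D = 2 in the comparison table.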

Page 23:

Swiss-T1: Software in a Box

March 2000

*Digital Unix   Compaq  Operating system in each box
*F77/F90        Compaq  Fortran compilers
*HPF            Compaq  High performance Fortran
*C/C++          Compaq  C and C++ compilers
*DXML           Compaq  Digital math library in each box
*MPI            Compaq  SMP message passing interface
*Posix threads  Compaq  Threading in a box
*OpenMP         Compaq  Multiprocessor usage in a box through directives
*KAP-F          KAI     To parallelise a Fortran code in a multiprocessor box
*KAP-C          KAI     To parallelise a C program in a multiprocessor box

Page 24:

Swiss-T1: Software between Boxes

March 2000

*LSF         Platform Inc.  Load Sharing Facility for resource management
*Totalview   Dolphin        Parallel debugger
*Paradyn     Madison/CSCS   Profiler to help parallelising programs
*MPI-1/FCI   SCS AG         Message passing interface between boxes running over T-Net
*MPICH       Argonne        Message passing interface running over Fast Ethernet
*PVM         UTK            Parallel virtual machine running over Fast Ethernet
*BLACS       UTK            Basic linear algebra communication subprograms
*ScaLAPACK   UTK            Linear algebra matrix solvers

MPI I/O      SCS/LSP        Message passing interface for I/O
MONITOR      EPFL           Monitoring of system parameters
NAG          NAG            Math library package
Ensight      Ensight        4D visualisation
MEMCOM       SMR SA         Data management system for distributed architectures
Shmem        EPFL           Interface Cray to Swiss-Tx

Page 25:

March 2000

Baby T1 Architecture

Page 26:

Swiss-T1 : Alternative network

March 2000

Page 27:

March 2000

Swiss-T2 : K-Ring architecture

Page 28:

Create SwissTx Company

Commercialise T-Net

Commercialise dedicated machines

Transfer knowhow in parallel application technology

Page 29:

Between boxes: mac value

mac = N * R [Mflop/s] * <d> / C [Mword/s]

Table : The mac values for Swiss-T0, Swiss-T0(Dual) and Swiss-T1 for MATMUL

Machine              N     R      %    N * R   C      <d>   mac
T0 (Bus)             8     8000   5*   400*    4*     1     100
T0(Dual) (Bus)       8*2   16533  6*   1000*   4*     1     250
Baby T1 (Switch)     6*2   12000  20*  2400*   90*    1     27
T1(local) (Switch)   4*2   8000   20*  1600*   60**   1     27
T1(global) (Switch)  32*2  64000  20*  12800*  400**  1.25  40
T1 (Fast Ethernet)   32*2  64000  20*  12800*  80**   1     160

* measured (SAXPY and Parkbench) ** expected
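The mac column can be recomputed from the other columns. In the sketch below the column printed as "N * R" is taken to be the effective aggregate performance in Mflop/s (roughly the R column times the percentage), which is an interpretation of the table rather than something the slide states; `ROWS` and `mac` are illustrative names.

```python
# (machine, effective N*R in Mflop/s, C in Mword/s, <d>, mac as printed)
ROWS = [
    ("T0 (Bus)", 400, 4, 1.0, 100),
    ("T0(Dual) (Bus)", 1000, 4, 1.0, 250),
    ("Baby T1 (Switch)", 2400, 90, 1.0, 27),
    ("T1(local) (Switch)", 1600, 60, 1.0, 27),
    ("T1(global) (Switch)", 12800, 400, 1.25, 40),
    ("T1 (Fast Ethernet)", 12800, 80, 1.0, 160),
]


def mac(effective_mflops: float, c_mwords: float, mean_distance: float) -> float:
    """mac = N * R * <d> / C, with N * R given as effective Mflop/s."""
    return effective_mflops * mean_distance / c_mwords


for name, eff, c, d, printed in ROWS:
    print(f"{name:20s} mac = {mac(eff, c, d):6.1f} (slide: {printed})")
```

The average distance <d> only matters for the global T-Net case, where messages cross more than one crossbar on average and the effective load on the network rises by a quarter.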

Page 30:

Time Schedule

March 2000

Time axis: 1.1.98, 1.1.99, 1.1.00
Dates marked: 1.6.98, 1.11.99, 31.10.00
Phases: 1st phase, 2nd phase

EasyNet bus based prototypes:
  Swiss-T0(Dual), 16 processors, Windows NT
  Swiss-T0(Dual), 16 processors, Digital Unix

T-Net switch based prototype/production machines:
  Baby T1, 12 processors, Digital Unix
  Swiss-T1, 68 processors, Digital Unix
  Swiss-T2, 504 processors, OS not defined

Page 31:

March 2000

Phase I: Machines installed

Swiss-T0: 23 December 97 (accepted 25 May 98)

Swiss-T0(Dual): 29 September 98 (accepted 11 Dec. 98 / NT)

Swiss-T0(Dual): 29 September 98 (accepted 22 Jan. 99 / Unix)

Swiss-T1 Baby: 19 August 99 (accepted 18 Oct. 99 / Unix)

Swiss-T1: 21 Jan. 2000

Page 32:

Swiss-T1 Node Architecture

March 1999

Page 33:

March 2000

2nd Phase Swiss-Tx: The 8 WPs

Managing Board: Michel Deville
Technical Team: Ralf Gruber
Management: Jean-Michel Lafourcade

WP1: Hardware development (Roland Paul, SCS)
WP2: Communication software development (Martin Frey, SCS)
WP3: System and user environment (Michel Jaunin, SIC-EPFL)
WP4: Data management issues (Roger Hersch, DI-EPFL)
WP5: Applications (Ralf Gruber, CAPA/SIC-EPFL)
WP6: Swiss-Tx concept (Pierre Kuonen, DI-EPFL)
WP7: Management (Jean-Michel Lafourcade, CAPA/DGM-EPFL)
WP8: SwissTx Spin-off Company (Jean-Michel Lafourcade, CAPA/DGM-EPFL)

Page 34:

March 2000

2nd Phase Swiss-Tx: The MUSTs

WP1: PCI adapter page table / 64 bit PCI adapter
WP2: Dual processor FCI / Network monitoring / Shmem
WP3: Management / Automatic SI / Monitoring / PE / Libraries
WP4: MPI-I/O / Distributed file management
WP5: Applications
WP6: Swiss-Tx architecture / Autoparallelisation
WP7: Management
WP8: SwissTx Spin-off Company