HPS Switch and Adapter: Architecture, Design & Performance
Rama K Govindaraju (ramag@us.ibm.com)
HiPC Conference, Bangalore, India
December 19-22, 2004
Team
• Architecture
  – Peter Hochschild, Don Grice, Kevin Gildea, Rama Govindaraju
• Hardware
  – Carl A Bender, Jay Herring, Piyush Chaudhary, Steven Martin, Jason Goscinski, John Houston, …
• Software
  – Chulho Kim, Robert Blackmore, Rajeev Sivaram, Hanhong Xue, …
• And many others contributed to this effort
Outline
• What is HPS?
• Example HPS customers
• Interconnect Historical Performance
• HPS switch architecture
• HPS adapter architecture
• HPS software architecture
• Transport Modes
• HPS Performance
• Lessons Learned and Future Work
What is HPS?
• HPS (High Performance Switch)
  – 4th generation switch and adapter to interconnect IBM's Power processor based nodes (Power 4 and 5)
  – To be used in many of the world's fastest supercomputers
    • 20 of the top 100 today use HPS
  – Addressing the requirements of
    • HPC labs, DOE, and others
    • Weather forecasting, petroleum, automotive and aerospace sectors
    • NSA and DOD
  – Core infrastructure for the 100TF ASCI Purple system to be delivered in June 2005
Example HPS Customers
• More than 30 and growing
• Several over 1000 CPUs
• Total: over 200TF
Historical Interconnect Performance
                     1993      1996      1998          2000          2004
Adapter              TB2       TB3       TBMX          Colony        HPS
Switch               HPS       TBS       TBS           SP-Switch2    HPS
Processor            Power 2   Power 2   Power PC/3    Power 3       Power 4
Peak link bandwidth  40MB/s    150MB/s   150MB/s       500MB/s       2GB/s
MPI bandwidth        35MB/s    110MB/s   135MB/s       375MB/s       1.8-14GB/s
MPI latency          40us      24us      21us          17us          <4.2us
Links/node (server)  1         1         1             1, 2          2, 4, 6, 8

IBM-developed switch interconnects and adapters
HPS Switch Fabric
[Diagram: Power4 and Power5 based servers attach HPS adapters over the GX bus; each adapter carries RAM and an LDC link-driver chip, and connects through copper or fiber-optic link drivers (12 meter copper cables or 40-80 meter fiber cables, Agilent optics) to switch boards built from Canopus HPS switch chips.]
4K end points, 59ns latency, 2GB/s bandwidth per link per direction
HPS Adapter Microcode Model
[Diagram: the adapter microcode engine sits between a server interface and a fabric interface, with 8M bytes of SRAM. Channel buffers (512 byte) hold general parameters (256 byte); packets carry a 128 byte header and 256 byte - 2K byte of data (16K byte total). The engine comprises a Formatter (mask, rotate, merge) with a 256-entry Formatter RAM, an ALU (parallel mask, shift, arithmetic & branch), an Instruction RAM of 4K entries of 64 bits, a program counter, General Registers (16 entries of 64 bits), Task Registers (16 entries of 64 bits) and control/status logic, plus IAMover, PacketMover and DataMover units with ports for MMIO (32+64), memory fetch (16), memory store (16), SRAM (16), IA read/write (8/8) and packet movers PM0/PM1 (16).]
HPS Software Architecture
[Diagram: applications run over IBM's MPI, Parallel ESSL and LAPI in user space, layered on HAL and the IF_LS interface; sockets, TCP/UDP and IP, VSD and GPFS run in kernel space. All paths go through the device driver (DD), the hypervisor (HYP), the HPS adapter and the HPS switch fabric. Management components include the HMC, FNM, service processor, LL and CSM.]
FIFO versus RDMA models
[Diagram: in user space, MPI sits over LAPI over HAL; in kernel space, sockets, TCP/UDP and the IP interface sit over HAL buffers. Both stacks reach the Federation adapter through its interface layer. Three data paths from the user buffer are shown: FIFO with copy, FIFO with DMA, and RDMA directly between the user buffer and the adapter.]
Supported Communication Modes
• FIFO Mode
  – Message chopped into 2K packet chunks on the host and copied by the CPU (sketched after the diagram below)
  – Memory bus crossings depend on caching; at least 1 IO bus crossing
• RDMA enablement
  – No slave side protocol
  – CPU offload
  – Enhanced programming model
  – 1 IO bus crossing
[Diagram: paths between the user buffer and the adapter. In FIFO mode the CPU loads/stores data through a network FIFO that the adapter DMAs; in RDMA mode the adapter moves data directly to and from the user buffer.]
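To make the FIFO-mode description concrete, here is a minimal C sketch of host-side packetization. The helpers hal_next_send_slot() and hal_commit_slot() are hypothetical stand-ins for the adapter's send-FIFO interface, not part of the actual HPS software; the sketch only illustrates the 2K chopping and CPU copy described above.

    #include <stddef.h>
    #include <string.h>

    #define PACKET_PAYLOAD 2048                 /* 2K packet chunks, as above */

    /* Hypothetical stand-ins for the adapter's send-FIFO interface. */
    extern void *hal_next_send_slot(void);      /* next free 2K FIFO slot */
    extern void  hal_commit_slot(size_t len);   /* hand the slot to the adapter */

    static void fifo_mode_send(const char *msg, size_t len)
    {
        size_t off = 0;
        while (off < len) {
            size_t chunk = (len - off > PACKET_PAYLOAD) ? PACKET_PAYLOAD
                                                        : len - off;
            /* CPU copy into the network FIFO: the memory-bus crossing. */
            memcpy(hal_next_send_slot(), msg + off, chunk);
            /* The adapter then DMAs the packet across the IO bus. */
            hal_commit_slot(chunk);
            off += chunk;
        }
    }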
RDMA value proposition
• Possible overlap of computation and communication
  – Fragmentation/reassembly offloaded to the adapter
  – Minimizes packet-arrival interrupts
  – Requires the application to be written to take advantage of overlap
• One-sided programming model
• Zero-copy transport and reduced memory subsystem load
• Striping advantage
• KEY DIFFERENTIATOR: reliable RDMA protocol over an unreliable datagram transport (sketched below)
  – Allows striping across multiple paths
  – Out-of-order arrival
  – Reduces hot-spotting and contention
• Cons
  – Pinned memory usage
  – Resource management and fairness issues
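One way to picture the reliable-RDMA-over-unreliable-datagram point: if every packet carries the absolute offset it targets, packets striped across several links can be placed on arrival in any order. The struct and function below are purely illustrative and are not the HPS wire format or software.

    #include <stdint.h>
    #include <string.h>

    /* Illustrative packet descriptor, not the actual HPS wire format. */
    struct rdma_packet {
        uint64_t msg_id;        /* which message this chunk belongs to */
        uint64_t offset;        /* absolute offset within the target buffer */
        uint32_t len;           /* payload length in bytes */
        const void *payload;
    };

    /* Receive side: placement does not depend on arrival order, so chunks
     * striped across several links may arrive in any order; the message is
     * complete once the received byte count equals the message length. */
    static void place_packet(char *target_buf, uint64_t *bytes_received,
                             const struct rdma_packet *p)
    {
        memcpy(target_buf + p->offset, p->payload, p->len);
        *bytes_received += p->len;
    }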
Federation Performance
• Summary:
  – Latency: Power 4, 1.9GHz, HPS
    • MPI latency: 4.34us
    • Interrupt latency: adds 10us
    • 8-task latency: adds 1us
  – Bandwidth: Power 4, 1.9GHz, HPS
    • FIFO mode:
      – Unidirectional bandwidth: ~1.8GB/s
      – Bidirectional bandwidth: 2.1GB/s
    • RDMA mode:
      – Unidirectional bandwidth: ~1.8GB/s
      – Bidirectional bandwidth: ~3.0GB/s
      – Linear striping performance up to 8 links
        » Unidirectional: 14GB/s, Bidirectional: 24GB/s
• These are preliminary measurements
HPS: MPI Latency

Machine Type     Latency Measurement
1.9GHz, p690+    4.34us
1.7GHz, p690+    4.72us
1.7GHz, p655+    4.70us
1.5GHz, p690+    5.15us
1.3GHz, p690     5.5us

All measurements made using IBM's thread-safe MPI libraries.
8-task latency adds approximately 1 additional microsecond.
Interrupt latency adds approximately 10-12 microseconds.
All measurements are preliminary.
Unidirectional Bandwidth Peak
Machine Type     Peak Uni-dir Bandwidth
1.9GHz, p690+    1.800GB/s
1.7GHz, p690+    1.686GB/s
1.7GHz, p655+    1.800GB/s
1.5GHz, p690+    1.470GB/s
1.3GHz, p690     1.170GB/s
All measurements are preliminary
Unidirectional Bandwidth Profile
[Chart: delivered bandwidth (MB/s, 0-2000) versus message size (bytes) on a P655, 1.7GHz based system; M1/2 = 32K, M3/4 = 128K.]
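As an aside, M1/2 and M3/4 are read here in the usual way (an interpretation, not stated on the slide): under a Hockney-style latency/bandwidth model they are the message sizes at which half and three-quarters of the asymptotic bandwidth are reached.

    % t_0: zero-byte latency, B_inf: asymptotic (peak) bandwidth
    \[
      t(m) = t_0 + \frac{m}{B_\infty}, \qquad
      B(m) = \frac{m}{t(m)} = \frac{B_\infty\, m}{m + t_0 B_\infty}
    \]
    \[
      B\!\left(M_{1/2}\right) = \tfrac{1}{2} B_\infty, \qquad
      B\!\left(M_{3/4}\right) = \tfrac{3}{4} B_\infty
    \]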
Bidirectional Bandwidth Profile
[Chart: delivered bandwidth (MB/s, 0-2500) versus message size (bytes) on a P655, 1.7GHz based system; M1/2 = 16K, M3/4 = 64K.]
Striping Options
[Diagram: timelines of communication time by thread/task (T1, T2, T3) under three models: a) Asynchronous Model, b) Synchronous Model, c) Aggregate Comm Thread Model.]
Striping Models
[Diagram: two ways to stripe across adapters. Left, the "multiple threads doing copies" model: the MPI layer drives several LAPI layers, each over its own HAL instance and adapter. Right, the "single thread with pipelined RDMA" model: the MPI layer drives one LAPI layer over several HAL instances and adapters.]
Second approach:
- More elegant failover model
- Fewer synchronization issues and less CPU contention, via RDMA
(A minimal sketch of the pipelined-RDMA striping idea follows.)
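The sketch below illustrates the single-thread pipelined-RDMA idea only: one thread walks the message and posts successive stripes to the adapters in round-robin order, so all links stay busy without per-link copy threads. rdma_post() and the stripe size are assumptions for illustration, not the actual LAPI/HAL interface or a tuned value.

    #include <stddef.h>

    #define NUM_LINKS   8
    #define STRIPE_SIZE (256 * 1024)    /* illustrative stripe size only */

    /* Hypothetical: ask the adapter on 'link' to RDMA msg[off .. off+len). */
    extern void rdma_post(int link, const char *msg, size_t off, size_t len);

    static void stripe_message(const char *msg, size_t len)
    {
        int link = 0;
        for (size_t off = 0; off < len; off += STRIPE_SIZE) {
            size_t chunk = (len - off > STRIPE_SIZE) ? (size_t)STRIPE_SIZE
                                                     : len - off;
            rdma_post(link, msg, off, chunk);   /* adapter DMAs this stripe */
            link = (link + 1) % NUM_LINKS;      /* next stripe, next link */
        }
    }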
RDMA Unidirectional Bandwidth (preliminary)
[Chart: bandwidth (MB/s, 0-16000) versus message size (16 KB to 256 MB) for a single link, two links, four links and eight links.]
RDMA Bidirectional Bandwidth (preliminary)
[Chart: bandwidth (MB/s, 0-25000) versus message size (16 KB to 256 MB) for a single link, two links, four links and eight links.]
How can users exploit RDMA?
• Overlap computation and communication
  – Non-blocking calls (see the sketch after this list)
  – Reuse communication buffers if possible
  – User-exposed RDMA in 11/05
• Minimize interrupts for large transfers
• Reduce contention for memory
• Better raw bandwidth for messages over 80KB
• Possibility of overlapping collectives better (via striping)
• IP transport much more efficient (translates to improved GPFS performance)
• Select striping when sending large messages
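A hedged example of the "non-blocking calls" guidance, using only standard MPI (nothing HPS-specific is assumed): the RDMA-capable adapter can move data while the CPU computes, and reusing the same buffers across iterations presumably lets their pinned/translated state be reused.

    #include <mpi.h>

    extern void do_local_work(void);     /* application compute phase */

    /* Exchange n doubles with 'peer' while computing, then complete both. */
    void exchange_and_compute(double *sendbuf, double *recvbuf, int n, int peer)
    {
        MPI_Request reqs[2];

        MPI_Irecv(recvbuf, n, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &reqs[0]);
        MPI_Isend(sendbuf, n, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &reqs[1]);

        do_local_work();                 /* overlaps with adapter-side
                                            fragmentation/reassembly */

        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
    }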
Future Work
• Enabling HPS for Power 5 based nodes
• Exploit SMT in the Power 5 processor for FIFO mode
• Further attack MPI latency
• Use RDMA to improve MPI collectives performance
• Further exploitation of IP over RDMA by parallel file systems (GPFS)
• Take lessons learned into the PERCS project