Infiniband and RDMA Technology
Doug Ledford
Top 500 Supercomputers, Nov 2005
● #5 Sandia National Labs: 4500 machines, 9000 CPUs, 38 TFlops, 1 big headache
● Performance great... but...
  ● Adding new machines problematic due to software interactions
  ● Diagnosing and locating faults very difficult
OpenFabrics Software Stack
[Figure: layered diagram of the OpenFabrics stack, listed bottom to top]
● Hardware: InfiniBand HCA, iWARP NIC
● Provider: a Hardware Specific Driver for each device
● Core / Verbs / API Layer: Common Verbs / API, InfiniBand Specific Verbs, iWARP Specific Verbs/API, Connection Manager Abstraction (CMA), InfiniBand Connection Manager (CM), iWARP Connection Manager (CM), InfiniBand MAD, InfiniBand Subnet Admin Client (SA Client)
● Upper Layer Protocol: IPoIB, SDP, SRP, iSER, RDS, NFS-RDMA RPC, Cluster FS, Other FS
● User APIs: User Level Verbs, UDAPL, SDP Library, User Level MAD API, OpenSM, Diagnostic Tools
● Application Level: Various MPIs, Clustered DB Access, Sockets Based Access, IP Based App Access, Access to File Systems, Block Storage Access
Headache
[Figure: 10GigE w/ TOE vs. 20Gig InfiniBand, throughput (MB/s) and CPU utilization (%)]
Interconnect        Throughput (MB/s)   CPU Utilization (%)
10GigE w/ TOE       690                 30
20Gig InfiniBand    1400                6
Source: "Head to TOE" from OSU, "InfiniBand and 10-Gigabit Ethernet for I/O in Cluster Computing" from Sandia National Laboratories, and Mellanox
Cancels per Trade (CAGR 269%)
2003: 1.2   2004: 4.3   2005*: 16.1   2006 (projected): 60.3

Quotes per Trade (CAGR 72%)
2003: 12.6   2004: 20.5   2005*: 36.1   2006 (projected): 63.6

2005* = Aug 05. Source: NASDAQ
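The CAGR figures on these two charts follow from the 2003 value and the projected 2006 value over three annual periods, using CAGR = (end / start)^(1 / years) - 1. The sketch below reproduces that arithmetic; the function names are illustrative, and the n-th root is computed with Newton's method so the example needs no math library.

```c
#include <stdio.h>

/* x raised to a non-negative integer power (plain loop, no libm needed). */
static double ipow(double x, int n) {
    double r = 1.0;
    for (int i = 0; i < n; i++)
        r *= x;
    return r;
}

/* n-th root of a (a > 0, n >= 1) by Newton's method on f(x) = x^n - a.
 * Starting at max(a, 1), which is >= the root, the iterates decrease
 * monotonically toward it, so we stop as soon as they stop decreasing. */
static double nth_root(double a, int n) {
    double x = a > 1.0 ? a : 1.0;
    for (int i = 0; i < 200; i++) {
        double next = x - (ipow(x, n) - a) / (n * ipow(x, n - 1));
        if (next >= x)          /* no further progress: converged */
            break;
        x = next;
    }
    return x;
}

/* Compound annual growth rate: (end / start)^(1 / years) - 1. */
static double cagr(double start, double end, int years) {
    return nth_root(end / start, years) - 1.0;
}
```

With 2003 to 2006 counted as three periods, `cagr(1.2, 60.3, 3)` comes out at about 2.69 and `cagr(12.6, 63.6, 3)` at about 0.715, which round to the 269% and 72% shown on the slides.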
Aggregated One Minute Peak MPS Rates (CTS, CQS, OPRA, NQDS)
2000: 4,799   2001: 7,063   2002: 9,650   2003: 12,906   2004: 25,869   Feb 05: 55,105   Jun 05: 80,000   Dec 2005 (projected): 120,000
Source: SIAC, OPRA, and NASDAQ
Wall Street Trading Environment Challenges
● Performance
  – Low latency
  – High bandwidth
  – Efficient CPU utilization
  – Reliable transport
[Figure: interconnect positioning by number of concurrent applications. InfiniBand, 10GigE, 1GigE, and Fibre Channel are mapped against Enterprise LANs, Web Servers, Aggregation, Middleware Servers, Database Servers, Storage, High Performance Computing, and High Performance Embedded Interconnects]
● Price/Performance
  – $69 (OEM) adapter IC vs. $500 for a similar 10GigE adapter IC solution
  – $200 (OEM) adapter card vs. $2000 for a comparable 10GigE card
  – 1.4 GB/s and 2.7 µs latency
● Virtualization
  – Highest utilization of computing and storage resources
  – Simplifies adding resources for rapidly expanding data centers
[Figure: virtual machines running on a hypervisor, with 20 Gb/s IB HCAs alongside a GbE NIC]
New Top500 Clusters
[Figure: number of new InfiniBand, Myrinet, and Quadrics clusters on the Top500 list, 2004 vs. 2005]
Infiniband/RDMA use climbing rapidly
[Figure: simplified stack. Application Level: IP Based App Access, Sockets Based Access, Various MPIs. User APIs: SDP Library, UDAPL, User Level Verbs. Kernel Provided Interface: IPoIB, SDP, Common Verbs / API]
Level 1 – IPoIB
● Easiest to use, requires no modification of applications
● Lowest overall payback
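To make "no modification" concrete: an IPoIB interface is just another IP interface, so an ordinary sockets client like the minimal sketch below runs over InfiniBand simply by being given the peer's IPoIB address instead of its Ethernet address. The function name `tcp_connect` is illustrative; nothing in it is InfiniBand-specific.

```c
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Open a TCP connection to host:port with plain BSD sockets.
 * If `host` is the peer's IPoIB address, the kernel routes the traffic
 * over the IPoIB interface; the code is identical to the Ethernet case.
 * Returns a connected socket, or -1 on failure. */
static int tcp_connect(const char *host, const char *port) {
    struct addrinfo hints, *res, *ai;
    int fd = -1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;        /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;    /* TCP */
    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;

    /* Try each resolved address until one connects. */
    for (ai = res; ai != NULL; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            break;                      /* connected */
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;
}
```

The flip side is the "lowest payback" point above: every byte still passes through the kernel TCP/IP stack, so the application sees none of the RDMA benefits.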
Level 2 – SDP
● You might be able to use the libsdp library to enable SDP in your application without any code changes or recompiles
● If not, the code changes needed to support SDP natively are minimal
● This method gets a good deal of the RDMA benefit
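The libsdp route typically works by preloading the library (for example via `LD_PRELOAD`) so that existing `socket()` calls are redirected to SDP. For the native route, the change is usually confined to the address family passed to `socket()`. The sketch below assumes the OFED-era convention that SDP is exposed as address family `AF_INET_SDP` with the value 27; that constant is not part of glibc, and on current Linux kernels 27 is `AF_IB`, which does not provide stream sockets, so there the call fails and the sketch falls back to plain TCP. The function name is hypothetical.

```c
#include <sys/socket.h>

/* Assumption: OFED-era SDP used this address family number.
 * It is not defined by glibc; on modern kernels 27 is AF_IB. */
#ifndef AF_INET_SDP
#define AF_INET_SDP 27
#endif

/* Create a stream socket that uses SDP when the kernel offers it,
 * otherwise fall back to ordinary TCP. Apart from the address family,
 * the rest of the program (connect/send/recv) is unchanged.
 * *used_sdp is set to 1 for SDP, 0 for TCP. Returns the fd or -1. */
static int open_stream_socket(int *used_sdp) {
    int fd = socket(AF_INET_SDP, SOCK_STREAM, 0);
    if (fd >= 0) {
        *used_sdp = 1;
        return fd;
    }
    *used_sdp = 0;                      /* SDP unavailable: use TCP */
    return socket(AF_INET, SOCK_STREAM, 0);
}
```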
Level 3 – Verbs/MPI
● Code must be written to either the verbs API or the MPI API
● Code changes are not minimal, and in some cases require rethinking the application's design
● This method gets the full benefit of RDMA capabilities
OpenFabrics Software Stack
[Figure: the same layered diagram, listed bottom to top, with each component marked as common, IB specific, or iWARP specific]
● Hardware: InfiniBand HCA, iWARP NIC*
● Provider: Hardware Specific Driver (InfiniBand), Hardware Specific Driver* (iWARP)
● Core / Verbs / API Layer: Common Verbs / API, InfiniBand Specific Verbs, iWARP Specific Verbs/API*, Connection Manager Abstraction (CMA), InfiniBand Connection Manager (CM), iWARP Connection Manager (CM)*, InfiniBand MAD, InfiniBand Subnet Admin Client (SA Client)
● Upper Layer Protocol: IPoIB, SDP, SRP, iSER*, RDS*
● User APIs: User Level Verbs, UDAPL, SDP Library, User Level MAD API, OpenSM, Diagnostic Tools
● Application Level: Various MPIs, Clustered DB Access, Sockets Based Access, IP Based App Access, Block Storage Access
Key: Common, IB Specific, iWARP Specific; * Future