NORDUnet 2003 © 2003, Cisco Systems, Inc. All rights reserved.
High-end Routers & Modern Supercomputers
Bob Newhall & Dan Lenoski
Cisco Systems, Routing Technology Group
NORDUnet 2003, Reykjavik – August 2003
Agenda
• Traditional Routers and Supercomputers
• Modern Routers and Supercomputers
• Comparison of Subsystems
• Conclusions
What’s a Router? Traditionally…
[Figure: block diagram of a traditional router. Labeled blocks: CPU (MIPS), Secondary Cache SRAM, CPU Bus, System Controller, SDRAM (256 MB), PCI Bus0, PCI Bus1, PCI Bus2, PB, PA-1 to PA-6, I/O Bus, ROM, Flash, NVRAM, Con/Aux, FE, PCMCIA-1, PCMCIA-2]
Architecturally, routers have been like normal computers except for:
- Mechanical form factors, especially for I/O
- Embedded forwarding and routing SW
What’s a Supercomputer? Traditionally… Cray Y-MP
250 Gbyte/sec of interconnect bandwidth
Cray Y-MP C90
Evolution of High-End Routers
• Increasing bandwidth of external connections:
  - T1 -> DS3 -> OC3 -> OC12 -> OC48 -> OC192 -> OC768
  - 1 Mbit/sec -> 40 Gbit/sec
• Line speed increases require changes in router architecture: the central memory bottleneck is removed and replaced with distributed memories and a central interconnect fabric
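The line-rate progression above can be made concrete with nominal rates. The sketch below is illustrative only: the T1 and DS3 rates and the 51.84 Mbit/sec SONET base rate are standard figures rather than numbers from the slides, and the function and table names are made up for this example.

```python
# Nominal line rates in Mbit/s for the interface types named on the slide.
# OC-n runs at n times the 51.84 Mbit/s SONET base rate (standard value,
# not taken from the slides).
OC_BASE_MBPS = 51.84


def oc_rate_mbps(n: int) -> float:
    """Nominal rate of an OC-n interface in Mbit/s."""
    return n * OC_BASE_MBPS


LINE_RATES_MBPS = {
    "T1": 1.544,    # standard T1 rate
    "DS3": 44.736,  # standard DS3 rate
    **{f"OC{n}": oc_rate_mbps(n) for n in (3, 12, 48, 192, 768)},
}


def growth_factor(first: str, last: str) -> float:
    """How many times faster `last` is than `first`."""
    return LINE_RATES_MBPS[last] / LINE_RATES_MBPS[first]


if __name__ == "__main__":
    for name, rate in LINE_RATES_MBPS.items():
        print(f"{name:>6}: {rate:>10.3f} Mbit/s")
    print(f"T1 -> OC768: about {growth_factor('T1', 'OC768'):,.0f}x")
```

The slide's "1 Mbit/sec -> 40 Gbit/sec" rounds both ends of this table: T1 is about 1.5 Mbit/sec and OC768 is just under 40 Gbit/sec.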
Evolution of High-End Routers
• Increased computational power for routing, forwarding and feature processing
• Larger systems (more line cards) desired by end customers to exploit DWDM capabilities and simplify operation of POPs
What’s a High-End Router today?
[Figure: single-chassis router with Switch Fabric, Route Processor(s), and Linecards (8-16)]
• T1 to OC-192 interfaces
• Distributed architecture with crossbar switch fabric
• Multi-gigabit switching capacity
The next-generation of High-End Routers
[Figure: multi-chassis router with Switch Fabric, Route Processor(s), and Linecards (100s to 1000s)]
• T1 to OC-768 interfaces
• Multi-terabit switching capacity
• Multi-chassis, distributed architecture with multi-stage switch fabric
Evolution of Supercomputers
• Move from globally clocked, ECL vector processors to distributed-memory, microprocessor-based multiprocessors
  - 250 MHz C90 to 1-2 GHz Pentium 4, Alpha, Power3
• This architecture change was driven by:
  - Complexity and economics of building the highest-performance processors
  - Commoditization of smaller-scale computers
  - Not driven by programming desires of end users
• Note that state-of-the-art processors can generate less than 10 Gbit/sec of communication data
What’s a Supercomputer today? ASCI White at LLNL
• 8K processors in 512 nodes, 12 TFLOPS
• Interconnect has connection BW of 1 TByte/sec
• Diagram and photo from LLNL ASCI webpage
Major components of a Router
• Distributed Control Plane
  - Used to run routing protocols (= distributed computer)
• Distributed Data Plane
  - Packet Processing: examine L2-L7 protocol information (determine QoS, VPN ID, policy, etc.)
  - Packet Forwarding: make appropriate routing, switching, and queuing decisions
• System Interconnect
  - Control Plane: can be combined with the data plane or dedicated
  - Data Interconnect: at least the sum of external BW required
Major components of a Supercomputer
• Distributed Control / Computational nodes
  - Small number of processors per node (4-16) with local memory
• Distributed I/O Subsystem
  - Typically tied to a subset of nodes, but if fully distributed these can be viewed as a sink/source of external bandwidth, similar to router external connections
• System interconnect
  - BW driven primarily by data-sharing requirements and often limited by the CPU’s ability to generate data
Router – Supercomputer Analogy
High-End Router      Supercomputer
Route Processors     CPU Nodes
Line Cards           I/O Nodes
Switch Fabric        Interconnection Network
Route Processors ~ CPU Nodes
• Route Processors execute routing protocols and maintain routing and forwarding information bases
  - Large networks dictate gigabytes of memory to hold routing and interface databases
  - Also require high peak computation rates to reconverge network topology and download table updates to line cards
  - 1000 MIPS per eight 40 Gbit/sec interfaces for the control plane
• CPU nodes in a supercomputer run applications and source and sink processor communication traffic
  - 1-2 Gflops and 1000 MIPS per processor
  - 1-2 Gbytes of memory per processor
Router Line Card ~ SC I/O Node
• Packet forwarding, classification and feature processing require complex look-ups and queuing decisions to be made on a per-packet basis
  - Even with HW assist (TCAMs, etc.), approximately 500 instructions per packet
  - At 40 Gbps and minimum-size packets => 100 MPPS
  - Total of 50,000 MIPS per 40 Gbps line rate
• Queuing and TCP/IP congestion semantics imply 200 milliseconds of buffering on ingress and egress
  - 0.2 sec x 40 Gbps x 2 = 16 Gbits = 2 Gbyte per 40 Gbps line rate
  - Fragmentation typically requires 4x BW for queuing
  - 40 Gbps => 160 Gbps per queue x 2 (ingress and egress) => 320 Gbps
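The per-line-card arithmetic on this slide can be reproduced in a few lines. This is only a sketch of the slide's own numbers; the constant and function names are invented for illustration, and the 100 MPPS figure is taken from the slide as given (at 40 Gbps it corresponds to roughly 50 bytes per packet).

```python
# Per-line-card budget, using the figures quoted on the slide.
LINE_RATE_BPS = 40e9            # 40 Gbps line rate
PACKETS_PER_SEC = 100e6         # 100 MPPS at minimum packet size (slide figure)
INSTRUCTIONS_PER_PACKET = 500   # approximate, even with hardware assist
BUFFER_SECONDS = 0.2            # 200 ms of buffering
QUEUING_BW_FACTOR = 4           # 4x bandwidth for queuing
DIRECTIONS = 2                  # ingress and egress


def forwarding_mips(pps: float, instr_per_packet: float) -> float:
    """Instruction rate needed for per-packet processing, in MIPS."""
    return pps * instr_per_packet / 1e6


def buffer_bytes(line_rate_bps: float, seconds: float, directions: int = DIRECTIONS) -> float:
    """Buffer memory needed to hold `seconds` of traffic in each direction."""
    return line_rate_bps * seconds * directions / 8


def queue_bandwidth_bps(line_rate_bps: float, factor: float, directions: int = DIRECTIONS) -> float:
    """Aggregate buffer-memory bandwidth across ingress and egress queues."""
    return line_rate_bps * factor * directions


def implied_packet_bytes(line_rate_bps: float, pps: float) -> float:
    """Average packet size implied by a line rate and a packet rate."""
    return line_rate_bps / pps / 8


if __name__ == "__main__":
    print(f"forwarding: {forwarding_mips(PACKETS_PER_SEC, INSTRUCTIONS_PER_PACKET):,.0f} MIPS")
    print(f"buffering:  {buffer_bytes(LINE_RATE_BPS, BUFFER_SECONDS) / 1e9:.1f} Gbyte")
    print(f"queue BW:   {queue_bandwidth_bps(LINE_RATE_BPS, QUEUING_BW_FACTOR) / 1e9:.0f} Gbps")
    print(f"packet:     {implied_packet_bytes(LINE_RATE_BPS, PACKETS_PER_SEC):.0f} bytes")
```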
Distributed Memory Router Line Card
[Figure: line card block diagram. Labeled blocks: Optics, Framer, L2 Buffering, Receive Fwd Engine, Input Queuing, To Fabric, From Fabric, Fabric Re-Assem., Transmit Fwd Engine, Output Queuing, Linecard Control CPU, Control CPU Mem, 512+ MB DRAM. The diagram shows two sets of Table SRAM and Fwd/Class TCAMs, and two sets of RTT Buffer Mem (1 GB) + pointer SRAM.]
Supercomputer I/O Nodes
• Disk and network attachment dominate requirements
• Computational requirements on data typically limit effective throughput
• 52 nodes of 512 on ASCI White, each with approx. 1-2 Gbyte/sec of I/O BW
• Data must be moved from I/O to local node memory and then IPC’d to other computational nodes
  - Limited by node-to-interconnect BW
Router Switch Fabric ~ SC Interconnect Network
• Critical design parameters are:
  - Throughput
  - Traffic isolation
  - Fault tolerance
• Router switch fabric must have over-speed of fabric BW relative to line BW to provide traffic isolation and deal with packet fragmentation
  - Minimum 1.5x, with at least 2x line rate desirable
  - 60-100 Gbps per 40 Gbps line rate
• Depending on the size of the system, topology varies:
  - Crossbar
  - Multistage network (e.g., Benes, Clos)
  - Must be symmetric: all-to-all (like an old-style supercomputer)
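The over-speed figures relate fabric bandwidth to line rate by a simple multiplier. The following sketch (function names are hypothetical) shows that relationship for the slide's numbers; note that the quoted upper bound of 100 Gbps works out to 2.5x a 40 Gbps line rate, somewhat above the 2x named as desirable.

```python
def fabric_bandwidth_gbps(line_rate_gbps: float, speedup: float) -> float:
    """Fabric bandwidth per line card for a given over-speed factor."""
    return line_rate_gbps * speedup


def implied_speedup(fabric_gbps: float, line_rate_gbps: float) -> float:
    """Over-speed factor implied by a fabric bandwidth and a line rate."""
    return fabric_gbps / line_rate_gbps


if __name__ == "__main__":
    line_rate = 40.0  # Gbps, the line rate used on the slide
    for speedup in (1.5, 2.0):
        print(f"{speedup}x -> {fabric_bandwidth_gbps(line_rate, speedup):.0f} Gbps")
    for fabric in (60.0, 100.0):
        print(f"{fabric:.0f} Gbps -> {implied_speedup(fabric, line_rate)}x")
```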
Supercomputer Interconnect Network
• Critical parameters are:
  - Throughput
  - Latency (end-to-end)
• Actual supercomputer interconnects vary substantially, but usually provide <1 Gbyte/sec per processor
• Topology varies, but generally exploits locality:
  - Hypercube
  - Torus or mesh
  - Multi-stage networks
Overall Comparison
Feature                   Router (512 linecards, 40 Gbps/LC)   Supercomputer (512-node, 8K-processor ASCI White)
Control MIPS              64 GIPS                              8000 GIPS
Data MIPS                 25600 GIPS                           N/A
Total Memory Storage      1024 Gbytes                          4096 Gbytes
Total Memory Bandwidth    20 Tbyte/sec                         8 Tbyte/sec
Interconnect Bandwidth    4 Tbyte/sec                          2 Tbyte/sec
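The router column appears to follow from the per-line-card figures given on the earlier slides, scaled to 512 line cards. The sketch below is a reconstruction, not a derivation shown in the slides: it assumes one 40 Gbps interface per line card, the minimum 1.5x fabric over-speed (60 Gbps per card), and 1 Tbyte = 1000 Gbyte. All names are illustrative.

```python
# Per-line-card figures quoted on earlier slides.
LINE_CARDS = 512
DATA_MIPS_PER_CARD = 50_000        # forwarding and feature processing
CONTROL_MIPS_PER_8_CARDS = 1_000   # 1000 MIPS per eight 40 Gbit/sec interfaces
BUFFER_GBYTES_PER_CARD = 2         # 200 ms of buffering, ingress and egress
QUEUE_BW_GBPS_PER_CARD = 320       # 4x queuing BW, ingress and egress
FABRIC_GBPS_PER_CARD = 60          # assumption: minimum 1.5x over-speed


def router_totals(cards: int = LINE_CARDS) -> dict:
    """System-wide totals for a router with `cards` line cards."""
    return {
        "control_gips": cards / 8 * CONTROL_MIPS_PER_8_CARDS / 1000,
        "data_gips": cards * DATA_MIPS_PER_CARD / 1000,
        "memory_gbytes": cards * BUFFER_GBYTES_PER_CARD,
        # Gbit/s -> Gbyte/s (divide by 8) -> Tbyte/s (divide by 1000)
        "memory_bw_tbytes_per_sec": cards * QUEUE_BW_GBPS_PER_CARD / 8 / 1000,
        "interconnect_tbytes_per_sec": cards * FABRIC_GBPS_PER_CARD / 8 / 1000,
    }


if __name__ == "__main__":
    for name, value in router_totals().items():
        print(f"{name:>28}: {value:g}")
```

Under these assumptions the memory bandwidth comes to 20.48 Tbyte/sec and the interconnect bandwidth to 3.84 Tbyte/sec, which the table rounds to 20 and 4.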
Overall Technology Required
• Traditionally, networking equipment exploited off-the-shelf silicon, FPGA, and standard ASIC technology
• High-end routers with OC-192 support are approaching supercomputers
  - 0.25 µm and 0.18 µm ASICs shipped in early 2001
• High-end routers with OC-768 support require the leading edge of technology
  - ASICs using 0.13 µm technology and >1500-pin packages
  - Latest memory technology: Rambus, FCRAM, RLDRAM, and QDR SRAM
  - Power per rack comparable to the 9.5 kW for IBM’s SP2
Conclusions
• Explosive data rates and optics capabilities have pushed router technology tremendously in the last decade
  - From embedded single-board computers in the 80s
  - To distributed-memory computers with specialized forwarding, queuing and feature processing capabilities
• In nearly every metric of system technology, today’s high-end routers match or exceed the capability of an equivalent supercomputer
• In addition, high-end routers have a critical requirement of system fault tolerance
• Going forward, advances in high-end routers and supercomputers are technology-limited
Thank you!
Bob Newhall, [email protected]