High Node Count - Scalability Challenges for Interconnection Networks
Professor Olav Lysne
Simula Research Laboratory
Overview
Congestion control
Fault Tolerance
Scalable Modular Routing
State Of The Art:
State of Technology
State of Knowledge
State of Problem
CONGESTION CONTROL
Congestion tree
HOL blocked traffic (Victim)
FECN
BECN
The InfiniBand CC mechanism relies on a closed-loop feedback control system to remove the congestion tree.
Shared network resources could lead to network congestion and head-of-line (HOL) blocking.
Switch
- Threshold
- Marking Rate
- Packet Size
Host
- CCT
- CCT Index Increase
- CCT Index Min
- CCT Index Limit
- CCT Index Timer
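A minimal sketch of how the host-side parameters interact, assuming the usual InfiniBand behaviour: each received BECN raises the CCT index (CCTI) by CCTI Increase up to CCTI Limit, each expiry of the CCTI Timer lowers it by one down to CCTI Min, and the index selects an injection delay from the congestion control table (CCT). The class name, method names and table contents are hypothetical; the defaults follow the parameter values quoted in the talk.

```python
class HostCongestionControl:
    """Illustrative model of the source-side CC state for one flow."""

    def __init__(self, cct, ccti_increase=1, ccti_min=0, ccti_limit=127):
        self.cct = cct                      # table of injection delays
        self.ccti_increase = ccti_increase  # step applied per received BECN
        self.ccti_min = ccti_min            # lower bound for the index
        self.ccti_limit = ccti_limit        # upper bound for the index
        self.ccti = ccti_min                # current index into the CCT

    def on_becn(self):
        """A BECN arrived: throttle harder, but never past the limit."""
        self.ccti = min(self.ccti + self.ccti_increase, self.ccti_limit)

    def on_timer_expiry(self):
        """The CCTI timer fired: recover one step, but never below the minimum."""
        self.ccti = max(self.ccti - 1, self.ccti_min)

    def injection_delay(self):
        """Delay currently applied between packets of this flow."""
        return self.cct[self.ccti]


# Hypothetical table: delay grows linearly with the index.
cct = [2 * i for i in range(128)]
host = HostCongestionControl(cct)
for _ in range(3):
    host.on_becn()
host.on_timer_expiry()
print(host.ccti, host.injection_delay())
```

A short timer lets the index fall quickly after congestion ends, while a long timer keeps sources throttled for longer; the later slides explore that trade-off against the switch Marking Rate.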
Experiments show that the HOL blocking leads to performance degradation when CC is not activated.
The InfiniBand CC mechanism is able to remove both the HOL blocking and the parking lot problem.
[Figure panels: Without CC, With CC]
Parameter Values:
Threshold 15, Marking Rate 1, Packet Size 8
CCTI Increase 1, CCTI Limit 127, CCTI Min 0, CCTI Timer 150
The average throughput of the victim flow as a function of the Marking_Rate (sw) and the CCTI_Timer (host).
The average combined throughput of the contributors as a function of the Marking_Rate and the CCTI_Timer.
Contributors may experience unfairness if an unfortunate CCTI_Timer value is chosen
Contributors experience unfairness among each other for an extended period of time each time a new contributor is added when an unfortunate timer value is chosen.
The “treatment variation variable” (TVV):
∆ = (max value) – (min value)
TVV = Var(∆1, ∆2, ..., ∆n)
The “treatment variation variable” rules out a large part of the parameter space.
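A minimal sketch of the TVV computation, under assumptions the slide does not state: each ∆i is taken over one measurement window of per-contributor throughput, and Var is the population variance. The function names and the sample values are illustrative only.

```python
from statistics import pvariance


def spread(samples):
    """Delta for one window: (max value) - (min value)."""
    return max(samples) - min(samples)


def treatment_variation(windows):
    """TVV = Var(delta_1, ..., delta_n) over all windows."""
    return pvariance([spread(window) for window in windows])


# Hypothetical per-contributor throughput (Gbps), one inner list per window.
even = [[2.0, 2.1, 1.9], [2.0, 2.0, 2.1], [1.9, 2.0, 2.1]]
uneven = [[0.5, 3.5, 2.0], [2.0, 2.0, 2.0], [0.1, 3.9, 2.0]]

print(treatment_variation(even) < treatment_variation(uneven))
```

Under these assumptions, a parameter setting whose spread swings widely from window to window gets a high TVV and can be ruled out.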
InfiniBand Congestion Control in M9 (SUN™ DATACENTER INFINIBAND SWITCH 648)
• 20% of the nodes send to everyone
• 80% of the nodes send to 8 hotspots
• IBTA Specification 1.2 compliant
• 648 QDR/DDR/SDR 4x InfiniBand ports
• Three-stage internal full Clos network (non-blocking)
[Figure: results in Gbps for the cases !HS, !CC; HS, !CC; HS, CC; HS, CC – QP; and !HS, CC]
Further simulation studies:
- Different traffic patterns
- Other topologies (M24: SUN DATA CENTER SWITCH 3456)
Congestion Control - State of the art
• State of technology
- InfiniBand Congestion Control FECN/BECN
- Datacenter Ethernet - TBD
- Much more to be expected
• State of Knowledge
- Regional Explicit Congestion Notification
- Improvements on FECN/BECN
- Parametrizations
- Dynamics…?
- Impact on applications…?
- Much more to do
Fault Tolerance - Living with faults
Static
Reconfiguration-based
End-to-End Rerouting
Local Rerouting
What is network deadlock?
• Deadlock is a cycle of packets, each waiting for the next packet in the cycle to proceed before it can proceed itself
• Routing functions may be deadlock free; topologies may not. For almost all topologies there exist reasonable but deadlocking routing functions, as well as reasonable and deadlock-free routing functions.
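The point that the same topology admits both deadlocking and deadlock-free routing functions can be illustrated with the classic channel dependency graph check: a cycle in the graph means a cycle of waiting packets is possible. This is a sketch, not the speaker's tooling; the four-node ring, the two routing functions and all names are illustrative assumptions.

```python
def channel_dependency_graph(paths):
    """Map each channel (directed link) to the channels a packet holding it may request next."""
    deps = {}
    for path in paths:
        channels = list(zip(path, path[1:]))
        for channel in channels:
            deps.setdefault(channel, set())
        for held, wanted in zip(channels, channels[1:]):
            deps[held].add(wanted)
    return deps


def has_cycle(deps):
    """Depth-first search; reaching a channel that is still on the stack means a cycle."""
    on_stack, done = set(), set()

    def visit(channel):
        on_stack.add(channel)
        for nxt in deps[channel]:
            if nxt in on_stack or (nxt not in done and visit(nxt)):
                return True
        on_stack.remove(channel)
        done.add(channel)
        return False

    return any(ch not in done and visit(ch) for ch in deps)


N = 4  # nodes 0..3 connected in a ring
pairs = [(s, d) for s in range(N) for d in range(N) if s != d]


def clockwise(s, d):
    """Always route clockwise around the ring."""
    path = [s]
    while path[-1] != d:
        path.append((path[-1] + 1) % N)
    return path


def along_line(s, d):
    """Never use the link between node 3 and node 0."""
    step = 1 if d > s else -1
    return list(range(s, d + step, step))


ring_cdg = channel_dependency_graph(clockwise(s, d) for s, d in pairs)
line_cdg = channel_dependency_graph(along_line(s, d) for s, d in pairs)
print(has_cycle(ring_cdg))  # True: clockwise routing can deadlock
print(has_cycle(line_cdg))  # False: this routing is deadlock free
```

Both routing functions connect every pair of nodes on the same physical ring; only the second avoids the cyclic dependency.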
Static Fault Tolerance
Checkpoint – Reconfigure – Rollback – Restart
Requires topology agnostic routing algorithms
LASH, TOR, LASH/TOR, L-turn, Segment-based, Up*/Down*
Dynamic Reconfiguration
A deadlock will contain old packets waiting behind new packets as well as new packets waiting behind old packets
Key idea: make sure that new packets never wait behind old packets
[Animation frames; labels: Dependencies of Rold, Dependencies of Rnew, TOKEN, STOP. The sequence starts with only the dependencies of Rold and ends with only the dependencies of Rnew.]
Views – Fully Connected Subnetworks for endpoint fault-tolerance
The fat-tree is divided into a set of sub-networks. Each of these constitutes a view.
Close-up of a subtree with 3 views
One link is present in one, and only one, view
Any path through the network is contained entirely within one view
Only bottom-tier switches (and the endnode connections) will contain traffic for several views.
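The two view properties above can be checked mechanically. The following is a minimal sketch, assuming views are given as sets of switch-to-switch links and leaving endnode connections out; the tiny two-level topology, the view names and the function names are hypothetical.

```python
def links_of(path):
    """Undirected links traversed by a path given as a list of switch names."""
    return {frozenset(hop) for hop in zip(path, path[1:])}


def is_valid_partition(views, all_links):
    """True if every link is present in one, and only one, view."""
    counted = [link for links in views.values() for link in links]
    return len(counted) == len(set(counted)) and set(counted) == set(all_links)


def view_of_path(path, views):
    """Name of the view containing the whole path, or None if it spans views."""
    hops = links_of(path)
    for name, links in views.items():
        if hops <= links:
            return name
    return None


# Hypothetical fat-tree fragment: bottom switches s1, s2 and top switches t1, t2.
views = {
    "A": {frozenset(("s1", "t1")), frozenset(("s2", "t1"))},
    "B": {frozenset(("s1", "t2")), frozenset(("s2", "t2"))},
}
all_links = set().union(*views.values())

print(is_valid_partition(views, all_links))
print(view_of_path(["s1", "t1", "s2"], views))
print(view_of_path(["s1", "t1", "s2", "t2"], views))
```

In this fragment only the bottom-tier switches s1 and s2 appear in both views, which matches the observation that only they carry traffic for several views.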
FROOTS – Dynamic Fault Tolerance
[Figure: Configuration 1 and Configuration 2, each with nodes numbered 1 to 8]
A full VL for the traffic not affected by faults, and two VLs for the traffic affected by faults.
Handling faults
Fault tolerance – State Of The Art
• State of technology
- Topology agnostic routing algorithms (OFED)
- Static Reconfiguration with LASH (OFED)
- Endpoint Dynamic Reconfiguration (APM in IBA)
• State of Knowledge
- Dynamic Reconfiguration
- Local Rerouting
New “Compatible” Routing function
Modularity of routing
What is the problem?
Dependencies aggregate
The aggregated dependencies in a switch fabric must either be identified and removed, or taken into consideration in how the fabric is used
So...
A configuration of a “network of networks” is free from deadlocks if its channel dependency graph, extended with the aggregated dependencies in the switches, is free from cycles.
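A minimal sketch of that check, assuming dependencies are given as (held channel, wanted channel) pairs and using the standard-library `graphlib` cycle detection. The two fabrics A and B, the hosts h1 and h2, and the channel names are illustrative assumptions.

```python
from graphlib import CycleError, TopologicalSorter


def is_deadlock_free(*dependency_sets):
    """True if the union of all given dependency sets forms an acyclic graph."""
    graph = {}
    for deps in dependency_sets:
        for held, wanted in deps:
            graph.setdefault(held, set()).add(wanted)
            graph.setdefault(wanted, set())
    try:
        TopologicalSorter(graph).prepare()  # raises CycleError on a cycle
    except CycleError:
        return False
    return True


# Channel dependencies of the top-level routing between fabrics A and B.
top_level = [
    ("h1->A", "A->B"), ("A->B", "B->h2"),
    ("h2->B", "B->A"), ("B->A", "A->h1"),
]
# Aggregated dependencies inside each fabric: traffic arriving on one
# external channel may be forwarded out on the other.
inside_a = [("B->A", "A->B")]
inside_b = [("A->B", "B->A")]

print(is_deadlock_free(top_level))                      # True
print(is_deadlock_free(top_level, inside_a))            # True
print(is_deadlock_free(top_level, inside_a, inside_b))  # False
```

Each piece is acyclic on its own; the cycle appears only once the aggregated dependencies of both fabrics are added to the top-level graph.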
Well…what about local fault tolerance…?
Modularity of routing
• State of technology
- Not present
• State of Knowledge
- Wide open… but there is an approach
There is a way to do it better – find it! T. A. Edison
Simplicity is the ultimate sophistication. L. da Vinci