High Node Count - Scalability Challenges for Interconnection Networks Professor Olav Lysne Simula Research Laboratory


Page 1: High Node Count - Scalability Challenges for ...ra.ziti.uni-heidelberg.de/coeht/pages/events/20110208/presentations/... · - CCT Index Limit - CCT Index Timer Host - CCT - CCT Index

High Node Count - Scalability Challenges for Interconnection Networks

Professor Olav Lysne

Simula Research Laboratory


Overview

Congestion control

Fault Tolerance

Scalable Modular Routing

State of the art:
- State of technology
- State of knowledge
- State of problem


CONGESTION CONTROL


Shared network resources can lead to network congestion and head-of-line (HOL) blocking.

The InfiniBand CC mechanism relies on a closed-loop feedback control system to remove the congestion tree.

[Figure: a congestion tree with HOL-blocked victim traffic, and the FECN (forward explicit congestion notification) and BECN (backward explicit congestion notification) signals]

Switch
- Threshold
- Marking Rate
- Packet Size

Host
- CCT
- CCT Index Increase
- CCT Index Min
- CCT Index Limit
- CCT Index Timer
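
The host-side parameters can be read as a small state machine. The following is a minimal sketch, assuming the commonly described InfiniBand behaviour: each received BECN raises the CCT index by CCTI Increase (capped at CCTI Limit), and each expiry of the CCTI Timer lowers it by one (floored at CCTI Min). The class name, the method names and the linear contents of the CCT are hypothetical and not taken from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class HostCongestionState:
    """Toy model of the per-flow congestion state kept at a source host."""
    ccti_increase: int = 1
    ccti_limit: int = 127
    ccti_min: int = 0
    # Hypothetical CCT: entry i is an injection delay in arbitrary units.
    cct: list = field(default_factory=lambda: list(range(128)))
    ccti: int = 0

    def on_becn(self) -> None:
        """A BECN arrived: move further into the CCT (throttle harder)."""
        self.ccti = min(self.ccti + self.ccti_increase, self.ccti_limit)

    def on_timer_expired(self) -> None:
        """The CCTI timer fired: step back toward the unthrottled state."""
        self.ccti = max(self.ccti - 1, self.ccti_min)

    def injection_delay(self) -> int:
        """Delay currently applied between packets of this flow."""
        return self.cct[self.ccti]


state = HostCongestionState()
for _ in range(5):
    state.on_becn()          # five congestion notifications in a row
state.on_timer_expired()     # one timer expiry relaxes the throttle by one step
print(state.ccti, state.injection_delay())
```

In this reading, the timer value sets how quickly a throttled source recovers, which is why it interacts with the switch-side marking rate.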


Experiments show that HOL blocking degrades performance when CC is not activated.


The InfiniBand CC mechanism is able to remove both the HOL blocking and the parking lot problem.

[Figure panels: Without CC, With CC]

Parameter values:
Switch: Threshold 15, Marking Rate 1, Packet Size 8
Host: CCTI Increase 1, CCTI Limit 127, CCTI Min 0, CCTI Timer 150


The average throughput of the victim flow as a function of the Marking_Rate (sw) and the CCTI_Timer (host).


The average combined throughput of the contributors as a function of the Marking_Rate and the CCTI_Timer.


Contributors may experience unfairness if an unfortunate CCTI_Timer value is chosen

Contributors experience unfairness among each other for an extended period of time each time a new contributor is added when an unfortunate timer value is chosen.

The “treatment variation variable” (TVV):

∆ = (max value) - (min value)

TVV = Var(∆1, ∆2, ..., ∆n)
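
As a worked illustration of the TVV formula, the sketch below assumes that each ∆i is the gap between the highest and the lowest contributor throughput in measurement interval i, and that Var is the population variance. The slides do not define the sampling, so both are assumptions, and the throughput numbers are invented for the example.

```python
from statistics import pvariance


def treatment_variation(samples):
    """samples[i] holds the throughput of every contributor in interval i."""
    deltas = [max(interval) - min(interval) for interval in samples]
    return pvariance(deltas)


# Hypothetical throughput values (Gbps) for three contributors.
fair = [[2.0, 2.0, 2.0], [2.1, 2.0, 1.9], [2.0, 2.1, 2.0]]
unfair = [[2.0, 2.0, 2.0], [5.0, 0.5, 0.5], [2.0, 2.0, 2.0]]

# A burst of unequal treatment produces a much larger TVV.
print(treatment_variation(fair) < treatment_variation(unfair))
```

A low TVV under this reading means the spread between contributors stays stable over time; a high TVV flags parameter settings where the spread swings widely.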


The “treatment variation variable” rules out a large part of the parameter space.

Parameter values:
Switch: Threshold 15, Marking Rate 1, Packet Size 8
Host: CCTI Increase 1, CCTI Limit 127, CCTI Min 0, CCTI Timer 150


InfiniBand Congestion Control in M9 (SUN™ DATACENTER INFINIBAND SWITCH 648)

• IBTA Specification 1.2 compliant
• 648 QDR/DDR/SDR 4x InfiniBand ports
• Three-stage internal full Clos network (non-blocking)

Traffic pattern:
• 20% of the nodes send to everyone
• 80% of the nodes send to 8 hotspots

[Figure: two charts with a Gbps axis, each covering the configurations !HS, !CC; HS, !CC; HS, CC; HS, CC ☺; HS, CC – QP ☺; !HS, CC]

Further simulation studies:
- Different traffic patterns
- Other topologies (M24: SUN DATA CENTER SWITCH 3456)
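
The mixed traffic pattern of the M9 study (20% of the nodes sending to everyone, 80% sending to 8 hotspots) can be sketched as a destination generator. This is an illustration only: it assumes one end node per switch port (648 nodes), draws a single destination per source, and picks the hotspots and the uniform senders at random. None of these choices is stated in the slides, and the function name is hypothetical.

```python
import random


def pick_destinations(num_nodes=648, num_hotspots=8, uniform_share=0.2, seed=0):
    """Return one destination per source node for the mixed pattern."""
    rng = random.Random(seed)
    nodes = list(range(num_nodes))
    hotspots = rng.sample(nodes, num_hotspots)
    uniform_senders = set(rng.sample(nodes, int(uniform_share * num_nodes)))
    destinations = {}
    for src in nodes:
        if src in uniform_senders:
            # Uniform sender: any node except itself.
            dst = rng.choice([n for n in nodes if n != src])
        else:
            # Hotspot sender: one of the hotspots, never itself.
            dst = rng.choice([h for h in hotspots if h != src])
        destinations[src] = dst
    return hotspots, uniform_senders, destinations


hotspots, uniform_senders, destinations = pick_destinations()
to_hotspots = sum(1 for dst in destinations.values() if dst in hotspots)
print(len(uniform_senders), to_hotspots)
```

Calling the generator repeatedly with different seeds would approximate "send to everyone" for the uniform senders over time.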


Congestion Control - State of the art

• State of technology
- InfiniBand Congestion Control FECN/BECN
- Datacenter Ethernet - TBD
- Much more to be expected

• State of knowledge
- Regional Explicit Congestion Notification
- Improvements on FECN/BECN
- Parametrizations
- Dynamics…?
- Impact on applications…?
- Much more to do


Fault Tolerance - Living with faults

- Static
- Reconfiguration-based
- End-to-End Rerouting
- Local Rerouting


What is network deadlock?

• Deadlock is a cycle of packets, each waiting for the next packet in the cycle to proceed before it can proceed itself.

• Routing functions may be deadlock free, topologies may not
- for almost all topologies there exist reasonable but deadlocking routing functions, as well as reasonable and deadlock-free routing functions.
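
To make the point concrete, the sketch below applies two routing functions to the same four-node ring and checks the resulting channel dependency graph for cycles. The ring, the two routing functions and all function names are illustrative assumptions, not material from the slides.

```python
def channel_dependencies(routes):
    """Map each channel (u, v) to the channels a packet may request next."""
    deps = {}
    for path in routes:
        hops = list(zip(path, path[1:]))
        for hop in hops:
            deps.setdefault(hop, set())
        for held, wanted in zip(hops, hops[1:]):
            deps[held].add(wanted)
    return deps


def has_cycle(deps):
    """Depth-first search for a cycle in the dependency graph."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {c: WHITE for c in deps}

    def visit(c):
        colour[c] = GREY
        for nxt in deps[c]:
            if colour[nxt] == GREY:
                return True
            if colour[nxt] == WHITE and visit(nxt):
                return True
        colour[c] = BLACK
        return False

    return any(colour[c] == WHITE and visit(c) for c in deps)


n = 4
# Routing A: every packet travels clockwise around the ring.
clockwise = [[(s + k) % n for k in range(d + 1)]
             for s in range(n) for d in range(1, n)]
# Routing B: the link between node 3 and node 0 is never used.
line = [list(range(s, t + 1)) if s < t else list(range(s, t - 1, -1))
        for s in range(n) for t in range(n) if s != t]

print(has_cycle(channel_dependencies(clockwise)))
print(has_cycle(channel_dependencies(line)))
```

Routing A yields a cyclic dependency graph and can therefore deadlock, while routing B on the same topology yields an acyclic one, at the cost of leaving one link unused.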


Static Fault Tolerance

Checkpoint – Reconfigure – Rollback – Restart

Requires topology-agnostic routing algorithms

LASH, TOR, LASH/TOR, L-turn, Segment-based, Up*/Down*
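
As an example of a topology-agnostic scheme, the sketch below follows the usual textbook description of Up*/Down*: switches are levelled by breadth-first search from a root, each link gets an "up" direction toward the root, and a legal path never takes an up hop after a down hop. The example topology and the function names are hypothetical; after a fault, the same code would simply be rerun on the surviving topology.

```python
from collections import deque


def bfs_levels(adjacency, root):
    """Breadth-first distance of every switch from the chosen root."""
    level = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in level:
                level[v] = level[u] + 1
                queue.append(v)
    return level


def is_up(u, v, level):
    """A hop u -> v is 'up' if v is closer to the root (ties: lower id)."""
    return (level[v], v) < (level[u], u)


def is_legal(path, level):
    """Legal paths use zero or more up hops, then zero or more down hops."""
    gone_down = False
    for u, v in zip(path, path[1:]):
        if is_up(u, v, level):
            if gone_down:
                return False
        else:
            gone_down = True
    return True


# Hypothetical 4-switch ring: 0-1, 1-2, 2-3, 3-0, rooted at switch 0.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
level = bfs_levels(ring, root=0)
print(is_legal([1, 0, 3], level))   # up, then down
print(is_legal([1, 2, 3], level))   # down, then up
```

The first path is legal and the second is not, because it climbs back up after descending; forbidding that turn is what breaks every possible dependency cycle.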


Dynamic Reconfiguration

A deadlock will contain old packets waiting behind new packets as well as new packets waiting behind old packets

Key idea: make sure that new packets never wait behind old packets

[Figure, animated across slides 16 to 27: the dependencies of Rold are shown first; TOKEN and STOP markers then appear together with the dependencies of Rnew; the final frames show only the dependencies of Rnew.]

Views – Fully Connected Subnetworks for endpoint fault-tolerance

The fat-tree is divided into a set of sub-networks. Each of these constitutes a view.


Views – Fully Connected Subnetworks for endpoint fault-tolerance

Close-up of a subtree with 3 views

One link is present in one, and only one, view

Any path through the network is contained entirely within one view

Only bottom-tier switches (and the endnode-connections) will contain traffic for several views.
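
The two view properties (each link belongs to exactly one view, and every path stays inside one view) can be illustrated with a toy model. The sketch assumes a two-level fat-tree with leaf switches L0 and L1 and spines S0 to S2, and assumes that view i consists of the switch-to-switch links attached to spine Si. The topology, the view assignment and the function names are all hypothetical.

```python
# Hypothetical two-level fat-tree: leaf switches L0, L1 and spines S0..S2.
# Assumption: view i consists of the links that attach to spine Si.
link_view = {
    frozenset((leaf, f"S{i}")): i
    for leaf in ("L0", "L1")
    for i in range(3)
}


def view_of_path(path, link_view):
    """Return the one view that contains every link of path, else None."""
    views = {link_view.get(frozenset(hop)) for hop in zip(path, path[1:])}
    if len(views) == 1 and None not in views:
        return views.pop()
    return None


def usable_views(src_leaf, dst_leaf, failed_links, link_view):
    """Views in which src_leaf can still reach dst_leaf through one spine."""
    usable = []
    for i in sorted(set(link_view.values())):
        path = [src_leaf, f"S{i}", dst_leaf]
        hops = [frozenset(h) for h in zip(path, path[1:])]
        if all(h in link_view and h not in failed_links for h in hops):
            usable.append(i)
    return usable


failed = {frozenset(("L0", "S0"))}
print(view_of_path(["L0", "S1", "L1"], link_view))
print(usable_views("L0", "L1", failed, link_view))
```

Because a failed link affects only its own view, an endpoint in this model can recover by moving its traffic to any view that is still intact.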


FROOTS – Dynamic Fault Tolerance

[Figure: Configuration 1 and Configuration 2 of a network with nodes numbered 1 to 8]


Full VL for the unaffected traffic, and two VLs (virtual lanes) for the traffic affected by faults.


Handling faults

[Figure: two diagrams of the network with nodes numbered 1 to 8]


Fault tolerance – State Of The Art

• State of technology
- Topology-agnostic routing algorithms (OFED)
- Static Reconfiguration with LASH (OFED)
- Endpoint Dynamic Reconfiguration (APM in IBA)

• State of knowledge
- Dynamic Reconfiguration
- Local Rerouting

New “Compatible” Routing function


Modularity of routing


What is the problem?


Dependencies aggregate


The aggregated dependencies in a switch fabric must either be identified and removed, or taken into consideration in how the fabric is used


So...

A configuration of a “network of networks” is free from deadlocks if its channel dependency graph, extended with the aggregated dependencies in the switches, is free from cycles.
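
The condition can be checked mechanically. The sketch below is a minimal illustration: it takes the dependencies created by the routing between fabrics, adds the dependencies aggregated inside the fabrics, and tests the union for cycles with a topological sort (Kahn's algorithm). The channel names c1 to c4 and both dependency sets are invented for the example.

```python
from collections import defaultdict


def is_acyclic(edges):
    """Kahn's algorithm: True if every node can be topologically ordered."""
    successors = defaultdict(set)
    indegree = defaultdict(int)
    nodes = set()
    for a, b in edges:
        nodes.update((a, b))
        if b not in successors[a]:
            successors[a].add(b)
            indegree[b] += 1
    ready = [n for n in nodes if indegree[n] == 0]
    ordered = 0
    while ready:
        n = ready.pop()
        ordered += 1
        for m in successors[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                ready.append(m)
    return ordered == len(nodes)


# Hypothetical channels c1..c4 between two switch fabrics.
routing_deps = {("c1", "c2"), ("c3", "c4")}
# Hypothetical dependencies created inside the fabrics, where traffic
# entering on one channel shares internal links with traffic leaving on another.
aggregated_deps = {("c2", "c3"), ("c4", "c1")}

print(is_acyclic(routing_deps))
print(is_acyclic(routing_deps | aggregated_deps))
```

Here the routing looks safe on its own, but the extended graph contains the cycle c1, c2, c3, c4, c1, so the combined configuration could deadlock.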


Well…what about local fault tolerance…?


Modularity of routing

• State of technology
- Not present

• State of knowledge
- Wide open…but there is an approach


There is a way to do it better – find it! (T. A. Edison)

Simplicity is the ultimate sophistication. (L. da Vinci)