1 Lecture 25: Interconnection Networks Topics: communication latency, centralized and decentralized...
-
date post
22-Dec-2015 -
Category
Documents
-
view
215 -
download
2
Transcript of 1 Lecture 25: Interconnection Networks Topics: communication latency, centralized and decentralized...
![Page 1: 1 Lecture 25: Interconnection Networks Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E) Review session,](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d775503460f94a59a31/html5/thumbnails/1.jpg)
1
Lecture 25: Interconnection Networks
• Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E)
• Review session, Wednesday Dec 1st, 10-12, LCR (MEB 3147)
• Final exam reminders• Come early, 10:35 – 12:15• Same rules as first midterm, open books/notes/…, • Can use calculators and laptops (no search or internet)• 20% from first midterm material; remaining 80% from caches, multiprocs, TM• 20% new problems• Attempt every question
![Page 2: 1 Lecture 25: Interconnection Networks Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E) Review session,](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d775503460f94a59a31/html5/thumbnails/2.jpg)
2
Topologies
• Internet topologies are not very regular – they grew incrementally
• Supercomputers have regular interconnect topologies and trade off cost for high bandwidth
• Nodes can be connected with centralized switch: all nodes have input and output wires going to a centralized chip that internally handles all routing decentralized switch: each node is connected to a switch that routes data to one of a few neighbors
![Page 3: 1 Lecture 25: Interconnection Networks Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E) Review session,](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d775503460f94a59a31/html5/thumbnails/3.jpg)
3
Centralized Crossbar Switch
P1
P2
P3
P4
P5
P6
P7
P0
Crossbarswitch
![Page 4: 1 Lecture 25: Interconnection Networks Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E) Review session,](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d775503460f94a59a31/html5/thumbnails/4.jpg)
4
Centralized Crossbar Switch
P1
P2
P3
P4
P5
P6
P7
P0
![Page 5: 1 Lecture 25: Interconnection Networks Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E) Review session,](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d775503460f94a59a31/html5/thumbnails/5.jpg)
5
Crossbar Properties
• Assuming each node has one input and one output, a crossbar can provide maximum bandwidth: N messages can be sent as long as there are N unique sources and N unique destinations
• Maximum overhead: WN2 internal switches, where W is data width and N is number of nodes
• To reduce overhead, use smaller switches as building blocks – trade off overhead for lower effective bandwidth
![Page 6: 1 Lecture 25: Interconnection Networks Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E) Review session,](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d775503460f94a59a31/html5/thumbnails/6.jpg)
6
Switch with Omega Network
P1
P2
P3
P4
P5
P6
P7
P0000
001
010
011
100
101
110
111 111
110
101
100
011
010
001
000
![Page 7: 1 Lecture 25: Interconnection Networks Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E) Review session,](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d775503460f94a59a31/html5/thumbnails/7.jpg)
7
Omega Network Properties
• The switch complexity is now O(N log N)
• Contention increases: P0 P5 and P1 P7 cannot happen concurrently (this was possible in a crossbar)
• To deal with contention, can increase the number of levels (redundant paths) – by mirroring the network, we can route from P0 to P5 via N intermediate nodes, while increasing complexity by a factor of 2
![Page 8: 1 Lecture 25: Interconnection Networks Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E) Review session,](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d775503460f94a59a31/html5/thumbnails/8.jpg)
8
Tree Network
• Complexity is O(N)• Can yield low latencies when communicating with neighbors• Can build a fat tree by having multiple incoming and outgoing links
P0 P3P2P1 P4 P7P6P5
![Page 9: 1 Lecture 25: Interconnection Networks Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E) Review session,](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d775503460f94a59a31/html5/thumbnails/9.jpg)
9
Bisection Bandwidth
• Split N nodes into two groups of N/2 nodes such that the bandwidth between these two groups is minimum: that is the bisection bandwidth
• Why is it relevant: if traffic is completely random, the probability of a message going across the two halves is ½ – if all nodes send a message, the bisection bandwidth will have to be N/2
• The concept of bisection bandwidth confirms that the tree network is not suited for random traffic patterns, but for localized traffic patterns
![Page 10: 1 Lecture 25: Interconnection Networks Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E) Review session,](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d775503460f94a59a31/html5/thumbnails/10.jpg)
10
Distributed Switches: Ring
• Each node is connected to a 3x3 switch that routes messages between the node and its two neighbors
• Effectively a repeated bus: multiple messages in transit
• Disadvantage: bisection bandwidth of 2 and N/2 hops on average
![Page 11: 1 Lecture 25: Interconnection Networks Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E) Review session,](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d775503460f94a59a31/html5/thumbnails/11.jpg)
11
Distributed Switch Options
• Performance can be increased by throwing more hardware at the problem: fully-connected switches: every switch is connected to every other switch: N2 wiring complexity, N2 /4 bisection bandwidth
• Most commercial designs adopt a point between the two extremes (ring and fully-connected):
Grid: each node connects with its N, E, W, S neighbors Torus: connections wrap around Hypercube: links between nodes whose binary names differ in a single bit
![Page 12: 1 Lecture 25: Interconnection Networks Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E) Review session,](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d775503460f94a59a31/html5/thumbnails/12.jpg)
12
Topology Examples
GridHypercube
Torus
Criteria Bus Ring 2Dtorus 6-cube Fully connected
Performance
Bisection bandwidth
Cost
Ports/switch
Total links
![Page 13: 1 Lecture 25: Interconnection Networks Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E) Review session,](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d775503460f94a59a31/html5/thumbnails/13.jpg)
13
Topology Examples
GridHypercube
Torus
Criteria Bus Ring 2Dtorus 6-cube Fully connected
Performance
Bisection bandwidth
1 2 16 32 1024
Cost
Ports/switch
Total links 1
3
128
5
192
7
256
64
2080
![Page 14: 1 Lecture 25: Interconnection Networks Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E) Review session,](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d775503460f94a59a31/html5/thumbnails/14.jpg)
14
k-ary d-cube
• Consider a k-ary d-cube: a d-dimension array with k elements in each dimension, there are links between elements that differ in one dimension by 1 (mod k)
• Number of nodes N = kd
Number of switches :Switch degree :Number of links :Pins per node :
Avg. routing distance:Diameter :Bisection bandwidth :Switch complexity :
Should we minimize or maximize dimension?
![Page 15: 1 Lecture 25: Interconnection Networks Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E) Review session,](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d775503460f94a59a31/html5/thumbnails/15.jpg)
15
k-ary d-Cube
• Consider a k-ary d-cube: a d-dimension array with k elements in each dimension, there are links between elements that differ in one dimension by 1 (mod k)
• Number of nodes N = kd
Number of switches :Switch degree :Number of links :Pins per node :
Avg. routing distance:Diameter :Bisection bandwidth :Switch complexity :
N2d + 1Nd2wd
d(k-1)/2d(k-1)2wkd-1
Should we minimize or maximize dimension?
(2d + 1)2
(with no wraparound)
![Page 16: 1 Lecture 25: Interconnection Networks Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E) Review session,](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d775503460f94a59a31/html5/thumbnails/16.jpg)
16
Routing
• Deterministic routing: given the source and destination, there exists a unique route
• Adaptive routing: a switch may alter the route in order to deal with unexpected events (faults, congestion) – more complexity in the router vs. potentially better performance
• Example of deterministic routing: dimension order routing: send packet along first dimension until destination co-ord (in that dimension) is reached, then next dimension, etc.
![Page 17: 1 Lecture 25: Interconnection Networks Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E) Review session,](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d775503460f94a59a31/html5/thumbnails/17.jpg)
17
Deadlock
• Deadlock happens when there is a cycle of resource dependencies – a process holds on to a resource (A) and attempts to acquire another resource (B) – A is not relinquished until B is acquired
![Page 18: 1 Lecture 25: Interconnection Networks Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E) Review session,](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d775503460f94a59a31/html5/thumbnails/18.jpg)
18
Deadlock Example
Packets of message 1
Packets of message 2
Packets of message 3
Packets of message 4
4-way switchOutput ports
Each message is attempting to make a left turn – it must acquire anoutput port, while still holding on to a series of input and output ports
Input ports
![Page 19: 1 Lecture 25: Interconnection Networks Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E) Review session,](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d775503460f94a59a31/html5/thumbnails/19.jpg)
19
Deadlock-Free Proofs
• Number edges and show that all routes will traverse edges in increasing (or decreasing) order – therefore, it will be impossible to have cyclic dependencies
• Example: k-ary 2-d array with dimension routing: first route along x-dimension, then along y
1 2 32 1 0
1 2 32 1 0
1 2 32 1 0
1 2 32 1 0
17
18
19
18
17
16
![Page 20: 1 Lecture 25: Interconnection Networks Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E) Review session,](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d775503460f94a59a31/html5/thumbnails/20.jpg)
20
Breaking Deadlock I
• The earlier proof does not apply to tori because of wraparound edges
• Partition resources across multiple virtual channels
• If a wraparound edge must be used in a torus, travel on virtual channel 1, else travel on virtual channel 0
![Page 21: 1 Lecture 25: Interconnection Networks Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E) Review session,](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d775503460f94a59a31/html5/thumbnails/21.jpg)
21
Breaking Deadlock II
• Consider the eight possible turns in a 2-d array (note that turns lead to cycles)
• By preventing just two turns, cycles can be eliminated
• Dimension-order routing disallows four turns
• Helps avoid deadlock even in adaptive routing
West-First North-Last Negative-First Can allowdeadlocks
![Page 22: 1 Lecture 25: Interconnection Networks Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E) Review session,](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d775503460f94a59a31/html5/thumbnails/22.jpg)
22
Title
• Bullet