Cs668 Lec7 Interconnection
![Page 1: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/1.jpg)
Lecture 10 Outline
Material from Chapter 2
Interconnection networks
Processor arrays
Multiprocessors
Multicomputers
Flynn’s taxonomy
![Page 2: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/2.jpg)
Interconnection Networks
Uses of interconnection networks
  Connect processors to shared memory
  Connect processors to each other
Interconnection media types
  Shared medium
  Switched medium
![Page 3: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/3.jpg)
Shared versus Switched Media
![Page 4: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/4.jpg)
Shared Medium
Allows only one message at a time
Messages are broadcast
Each processor “listens” to every message
Arbitration is decentralized
Collisions require resending of messages
Ethernet is an example
![Page 5: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/5.jpg)
Switched Medium
Supports point-to-point messages between pairs of processors
Each processor has its own path to the switch
Advantages over shared media
  Allows multiple messages to be sent simultaneously
  Allows scaling of network to accommodate increase in processors
![Page 6: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/6.jpg)
Switch Network Topologies
View switched network as a graph
  Vertices = processors or switches
  Edges = communication paths
Two kinds of topologies
  Direct
  Indirect
![Page 7: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/7.jpg)
Direct Topology
Ratio of switch nodes to processor nodes is 1:1
Every switch node is connected to
  1 processor node
  At least 1 other switch node
![Page 8: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/8.jpg)
Indirect Topology
Ratio of switch nodes to processor nodes is greater than 1:1
Some switches simply connect other switches
![Page 9: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/9.jpg)
Evaluating Switch Topologies
Diameter
Bisection width
Number of edges / node (degree)
Constant edge length? (yes/no)
Layout area
![Page 10: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/10.jpg)
2-D Mesh Network
Direct topology
Switches arranged into a 2-D lattice
Communication allowed only between neighboring switches
Variants allow wraparound connections between switches on edge of mesh
![Page 11: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/11.jpg)
2-D Meshes
![Page 12: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/12.jpg)
Evaluating 2-D Meshes
Diameter: $\Theta(n^{1/2})$
Bisection width: $\Theta(n^{1/2})$
Number of edges per switch: 4
Constant edge length? Yes
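As a sanity check on the square-root growth of the diameter, the sketch below measures the diameter of a square mesh without wraparound links by breadth-first search. The function name `mesh_diameter` and the restriction to a q x q mesh (so n = q * q switches) are assumptions made for illustration; they are not part of the lecture.

```python
from collections import deque

def mesh_diameter(q):
    """Diameter of a q x q mesh with no wraparound links, found by BFS from every node."""
    def neighbors(r, c):
        # Each switch talks only to its (up to 4) lattice neighbors.
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < q and 0 <= nc < q:
                yield nr, nc

    def eccentricity(start):
        # Longest shortest-path distance from start to any other node.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbors(*node):
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        return max(dist.values())

    return max(eccentricity((r, c)) for r in range(q) for c in range(q))

for q in (2, 4, 8):
    print("n =", q * q, "diameter =", mesh_diameter(q))
```

For this variant the measured diameter is 2(q - 1), the corner-to-corner distance, which grows in proportion to the square root of n.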
![Page 13: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/13.jpg)
Binary Tree Network
Indirect topology
$n = 2^d$ processor nodes, $n - 1$ switches
![Page 14: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/14.jpg)
Evaluating Binary Tree Network
Diameter: 2 log n
Bisection width: 1
Edges / node: 3
Constant edge length? No
![Page 15: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/15.jpg)
Hypertree Network
Indirect topology
Shares low diameter of binary tree
Greatly improves bisection width
From “front” looks like k-ary tree of height d
From “side” looks like upside-down binary tree of height d
![Page 16: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/16.jpg)
Hypertree Network
![Page 17: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/17.jpg)
Evaluating 4-ary Hypertree
Diameter: log n
Bisection width: n / 2
Edges / node: 6
Constant edge length? No
![Page 18: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/18.jpg)
Butterfly Network
Indirect topology
$n = 2^d$ processor nodes connected by $n(\log n + 1)$ switching nodes
[Figure: butterfly network with 8 processors (0 through 7) and four ranks of switching nodes (Rank 0 through Rank 3), each switching node labeled rank,column from 0,0 to 3,7]
![Page 19: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/19.jpg)
Butterfly Network Routing
![Page 20: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/20.jpg)
Evaluating Butterfly Network
Diameter: log n
Bisection width: n / 2
Edges per node: 4
Constant edge length? No
![Page 21: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/21.jpg)
Hypercube
Direct topology
2 x 2 x … x 2 mesh
Number of nodes a power of 2
Node addresses 0, 1, …, $2^k - 1$
Node i connected to k nodes whose addresses differ from i in exactly one bit position
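The adjacency rule can be sketched in a few lines: flipping each of the k address bits in turn yields the k neighbors of a node. The helper names `hypercube_neighbors` and `hop_distance` are hypothetical, chosen only for this illustration.

```python
def hypercube_neighbors(i, k):
    """Addresses adjacent to node i in a k-dimensional hypercube (one bit flipped each)."""
    return [i ^ (1 << bit) for bit in range(k)]

def hop_distance(a, b):
    """Minimum number of links between nodes a and b: the count of differing address bits."""
    return bin(a ^ b).count("1")

print([format(a, "04b") for a in hypercube_neighbors(0b0110, 4)])
print(hop_distance(0b0000, 0b1111))
```

Because two k-bit addresses differ in at most k positions, the largest hop distance is k = log n.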
![Page 22: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/22.jpg)
Hypercube Addressing
[Figure: 4-dimensional hypercube with 16 nodes labeled by their 4-bit addresses, 0000 through 1111]
![Page 23: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/23.jpg)
Hypercubes Illustrated
![Page 24: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/24.jpg)
Evaluating Hypercube Network
Diameter: log n
Bisection width: n / 2
Edges per node: log n
Constant edge length? No
![Page 25: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/25.jpg)
Shuffle-exchange
Direct topology
Number of nodes a power of 2
Nodes have addresses 0, 1, …, $2^k - 1$
Two outgoing links from node i
  Shuffle link to node LeftCycle(i)
  Exchange link to node xor(i, 1)
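The two link rules can be written out directly. In this sketch, `left_cycle` is taken to mean a one-position left cyclic rotation of the k-bit address, which is the usual reading of LeftCycle; the Python function names are assumptions for illustration.

```python
def left_cycle(i, k):
    """Shuffle link: rotate the k-bit address i left by one position."""
    msb = (i >> (k - 1)) & 1                  # bit that wraps around to the low end
    return ((i << 1) & ((1 << k) - 1)) | msb

def exchange(i):
    """Exchange link: flip the least significant address bit."""
    return i ^ 1

k = 3
for i in range(2 ** k):
    print(i, "shuffle ->", left_cycle(i, k), "exchange ->", exchange(i))
```

Note that nodes 0 and $2^k - 1$ shuffle to themselves, since rotating an all-zeros or all-ones address leaves it unchanged.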
![Page 26: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/26.jpg)
Shuffle-exchange Illustrated
[Figure: shuffle-exchange network with 8 nodes, numbered 0 through 7]
![Page 27: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/27.jpg)
Shuffle-exchange Addressing
[Figure: shuffle-exchange network with 16 nodes labeled by their 4-bit addresses, 0000 through 1111]
![Page 28: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/28.jpg)
Evaluating Shuffle-exchange
Diameter: 2 log n - 1
Bisection width: n / log n
Edges per node: 2
Constant edge length? No
![Page 29: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/29.jpg)
Comparing Networks
All have logarithmic diameter except 2-D mesh
Hypertree, butterfly, and hypercube have bisection width n / 2
All have constant edges per node except hypercube
Only 2-D mesh keeps edge lengths constant as network size increases
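To compare the networks at a concrete size, the sketch below tabulates the formulas quoted on the evaluation slides. It assumes logarithms are base 2, and it uses the square root of n as a stand-in for the 2-D mesh entries, which the slides give only as orders of growth. The function name `topology_metrics` is hypothetical.

```python
import math

def topology_metrics(n):
    """Return {name: (diameter, bisection_width, edges_per_node)} for n processors.

    Values follow the formulas quoted on the slides, with log taken base 2.
    The 2-D mesh entries are orders of growth, not exact counts.
    """
    lg = math.log2(n)
    return {
        "2-D mesh":         (math.sqrt(n), math.sqrt(n), 4),
        "binary tree":      (2 * lg, 1, 3),
        "4-ary hypertree":  (lg, n / 2, 6),
        "butterfly":        (lg, n / 2, 4),
        "hypercube":        (lg, n / 2, lg),
        "shuffle-exchange": (2 * lg - 1, n / lg, 2),
    }

for name, (diam, bisect, degree) in topology_metrics(64).items():
    print(f"{name:17s} diameter={diam:5.1f} bisection={bisect:5.1f} degree={degree:.0f}")
```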
![Page 30: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/30.jpg)
Vector Computers
Vector computer: instruction set includes operations on vectors as well as scalars
Two ways to implement vector computers
  Pipelined vector processor: streams data through pipelined arithmetic units
  Processor array: many identical, synchronized arithmetic processing elements
![Page 31: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/31.jpg)
Why Processor Arrays?
Historically, high cost of a control unit
Scientific applications have data parallelism
![Page 32: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/32.jpg)
Processor Array
![Page 33: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/33.jpg)
Data/instruction Storage
Front end computer
  Program
  Data manipulated sequentially
Processor array
  Data manipulated in parallel
![Page 34: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/34.jpg)
Processor Array Performance
Performance: work done per time unit
Performance of processor array
  Speed of processing elements
  Utilization of processing elements
![Page 35: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/35.jpg)
Performance Example 1
1024 processors
Each adds a pair of integers in 1 μsec
What is performance when adding two 1024-element vectors (one per processor)?
$$\text{Performance} = \frac{1024\ \text{operations}}{1\ \mu\text{sec}} = 1.024 \times 10^{9}\ \text{ops/sec}$$
![Page 36: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/36.jpg)
Performance Example 2
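512 processors
Each adds two integers in 1 μsec
Performance adding two vectors of length 600?
The 600 additions need two passes (512 elements, then the remaining 88), so the elapsed time is 2 μsec
$$\text{Performance} = \frac{600\ \text{operations}}{2\ \mu\text{sec}} = 3 \times 10^{8}\ \text{ops/sec}$$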
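Both performance examples follow one rule: the array needs as many passes as it takes to cover the vector, and every pass costs one addition time. The sketch below encodes that rule, assuming each addition takes 1 μsec; the function and parameter names are hypothetical.

```python
import math

def processor_array_performance(num_pes, vector_len, add_time_sec):
    """Operations per second when a processor array adds two vectors of length vector_len."""
    # Each pass handles up to num_pes elements in one addition time.
    passes = math.ceil(vector_len / num_pes)
    return vector_len / (passes * add_time_sec)

print(processor_array_performance(1024, 1024, 1e-6))  # Example 1: one pass
print(processor_array_performance(512, 600, 1e-6))    # Example 2: two passes
```

In Example 2 the second pass keeps only 88 of the 512 processing elements busy, which is why throughput falls well below the 5.12 × 10^8 ops/sec the array could reach at full utilization.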
![Page 37: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/37.jpg)
2-D Processor Interconnection Network
Each VLSI chip has 16 processing elements
![Page 38: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/38.jpg)
if (COND) then A else B
![Page 39: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/39.jpg)
if (COND) then A else B
![Page 40: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/40.jpg)
if (COND) then A else B
![Page 41: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/41.jpg)
Processor Array Shortcomings
Not all problems are data-parallel
Speed drops for conditionally executed code
Don’t adapt to multiple users well
Do not scale down well to “starter” systems
Rely on custom VLSI for processors
Expense of control units has dropped
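The slowdown for conditional code can be illustrated with a small model of "if (COND) then A else B" on a processor array. Since all processing elements share one instruction stream, branch A and branch B run one after the other, and each element sits idle during the branch it did not take. The sketch assumes a non-empty array, positive branch costs, and a control unit that skips a branch no element needs; the function name and cost parameters are hypothetical.

```python
def masked_if_else(conds, cost_a=1, cost_b=1):
    """Return (elapsed_time, utilization) for 'if (COND) then A else B' on a processor array.

    conds holds one boolean per processing element.
    """
    num_pes = len(conds)
    took_a = sum(1 for c in conds if c)
    took_b = num_pes - took_a
    # Branches execute serially; a branch is skipped only if no element takes it.
    elapsed = (cost_a if took_a else 0) + (cost_b if took_b else 0)
    useful = took_a * cost_a + took_b * cost_b
    return elapsed, useful / (num_pes * elapsed)

print(masked_if_else([True, False] * 4))   # half the elements take each branch
print(masked_if_else([True] * 8))          # every element takes branch A
```

With equal branch costs and a mixed condition, utilization is 50 percent regardless of how the elements are split between the branches.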
![Page 42: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/42.jpg)
Multiprocessors
Multiprocessor: multiple-CPU computer with a shared memory
Same address on two different CPUs refers to the same memory location
Avoid three problems of processor arrays
  Can be built from commodity CPUs
  Naturally support multiple users
  Maintain efficiency in conditional code
![Page 43: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/43.jpg)
Centralized Multiprocessor
Straightforward extension of uniprocessor
Add CPUs to bus
All processors share same primary memory
Memory access time same for all CPUs
  Uniform memory access (UMA) multiprocessor
  Symmetrical multiprocessor (SMP)
![Page 44: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/44.jpg)
Centralized Multiprocessor
![Page 45: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/45.jpg)
Private and Shared Data
Private data: items used only by a single processor
Shared data: values used by multiple processors
In a multiprocessor, processors communicate via shared data values
![Page 46: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/46.jpg)
Problems Associated with Shared Data
Cache coherence
  Replicating data across multiple caches reduces contention
  How to ensure different processors have same value for same address?
Synchronization
  Mutual exclusion
  Barrier
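The two synchronization primitives can be sketched with Python's standard `threading` module, using threads as stand-ins for processors that share memory. A lock provides mutual exclusion around a shared counter, and a barrier keeps any thread from moving on until all have finished incrementing. The constants and variable names are assumptions for illustration.

```python
import threading

NUM_THREADS = 4
INCREMENTS = 1000

shared = {"counter": 0}                    # shared data, visible to every thread
lock = threading.Lock()                    # mutual exclusion
barrier = threading.Barrier(NUM_THREADS)   # barrier for all worker threads
seen_after_barrier = []

def worker():
    for _ in range(INCREMENTS):
        with lock:                         # only one thread updates the counter at a time
            shared["counter"] += 1
    barrier.wait()                         # wait until every thread has finished incrementing
    with lock:
        seen_after_barrier.append(shared["counter"])

threads = [threading.Thread(target=worker) for _ in range(NUM_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(shared["counter"], seen_after_barrier)
```

Every value recorded after the barrier equals the final total, because no thread passes the barrier while another is still incrementing.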
![Page 47: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/47.jpg)
Cache-coherence Problem
[Figure: CPU A and CPU B, each with its own cache, share one memory. Memory: X = 7. Cached copies of X: none.]
![Page 48: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/48.jpg)
Cache-coherence Problem
[Figure: CPU A, CPU B, and shared memory. Memory: X = 7. Cached copies of X: 7 (one cache).]
![Page 49: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/49.jpg)
Cache-coherence Problem
[Figure: CPU A, CPU B, and shared memory. Memory: X = 7. Cached copies of X: 7 and 7 (both caches).]
![Page 50: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/50.jpg)
Cache-coherence Problem
[Figure: CPU A, CPU B, and shared memory. Memory: X = 2. Cached copies of X: 7 and 2, so the two caches now disagree.]
![Page 51: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/51.jpg)
Write Invalidate Protocol
[Figure: CPU A and CPU B. Memory: X = 7. Cached copies of X: 7 and 7. Label: cache control monitor.]
![Page 52: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/52.jpg)
Write Invalidate Protocol
[Figure: CPU A and CPU B. Memory: X = 7. Cached copies of X: 7 and 7. Message: Intent to write X.]
![Page 53: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/53.jpg)
Write Invalidate Protocol
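[Figure: CPU A and CPU B. Memory: X = 7. Cached copies of X: 7 (one cache; the other copy has been invalidated). Message: Intent to write X.]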
![Page 54: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/54.jpg)
Write Invalidate Protocol
[Figure: CPU A and CPU B. Memory: X = 2. Cached copies of X: 2 (one cache).]
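The write-invalidate sequence in these slides can be mimicked with a toy model. The sketch assumes a write-through policy (memory is updated on every write, as the final slide suggests) and a bus on which every cache sees every write intent. The class name `SnoopyBus`, its methods, and the choice of CPU B (index 1) as the writer are assumptions, not the lecture's notation.

```python
class SnoopyBus:
    """Toy write-invalidate protocol: caches snoop a shared bus for write intents."""

    def __init__(self, memory, num_cpus):
        self.memory = dict(memory)
        self.caches = [dict() for _ in range(num_cpus)]

    def read(self, cpu, addr):
        cache = self.caches[cpu]
        if addr not in cache:                  # read miss: fetch the block from memory
            cache[addr] = self.memory[addr]
        return cache[addr]

    def write(self, cpu, addr, value):
        # The writer broadcasts its intent; every other cache drops its copy.
        for other, cache in enumerate(self.caches):
            if other != cpu:
                cache.pop(addr, None)
        self.caches[cpu][addr] = value
        self.memory[addr] = value              # assumption: write-through to memory

bus = SnoopyBus({"X": 7}, num_cpus=2)          # CPU A is index 0, CPU B is index 1
bus.read(0, "X")
bus.read(1, "X")                               # both caches now hold X = 7
bus.write(1, "X", 2)                           # CPU B writes; CPU A's copy is invalidated
print(bus.caches, bus.memory)
print(bus.read(0, "X"))                        # CPU A misses and fetches the new value
```

Because the stale copy is removed before the write completes, the next read by the other CPU misses and fetches the current value instead of returning 7.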
![Page 55: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/55.jpg)
Distributed Multiprocessor
Distribute primary memory among processors
Increase aggregate memory bandwidth and lower average memory access time
Allow greater number of processors
Also called non-uniform memory access (NUMA) multiprocessor
![Page 56: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/56.jpg)
Distributed Multiprocessor
![Page 57: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/57.jpg)
Cache Coherence
Some NUMA multiprocessors do not support it in hardware
  Only instructions, private data in cache
  Large memory access time variance
Implementation more difficult
  No shared memory bus to “snoop”
  Directory-based protocol needed
![Page 58: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/58.jpg)
Directory-based Protocol
Distributed directory contains information about cacheable memory blocks
One directory entry for each cache block
Each entry has
  Sharing status
  Which processors have copies
![Page 59: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/59.jpg)
Sharing Status
Uncached
  Block not in any processor’s cache
Shared
  Cached by one or more processors
  Read only
Exclusive
  Cached by exactly one processor
  Processor has written block
  Copy in memory is obsolete
![Page 60: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/60.jpg)
Directory-based Protocol
[Figure: three nodes (CPU 0, CPU 1, CPU 2), each with its own cache, local memory, and directory, joined by an interconnection network]
![Page 61: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/61.jpg)
Directory-based Protocol
[Figure: CPUs 0, 1, 2 on interconnection network. Memory: X = 7. Directory entry for X: U 0 0 0 (sharing status, then bit vector). Cached copies of X: none.]
![Page 62: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/62.jpg)
CPU 0 Reads X
[Figure: CPUs 0, 1, 2 on interconnection network. Memory: X = 7. Directory entry for X: U 0 0 0. Cached copies of X: none. Message: Read Miss.]
![Page 63: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/63.jpg)
CPU 0 Reads X
[Figure: CPUs 0, 1, 2 on interconnection network. Memory: X = 7. Directory entry for X: S 1 0 0. Cached copies of X: none.]
![Page 64: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/64.jpg)
CPU 0 Reads X
[Figure: CPUs 0, 1, 2 on interconnection network. Memory: X = 7. Directory entry for X: S 1 0 0. Cached copies of X: 7.]
![Page 65: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/65.jpg)
CPU 2 Reads X
[Figure: CPUs 0, 1, 2 on interconnection network. Memory: X = 7. Directory entry for X: S 1 0 0. Cached copies of X: 7. Message: Read Miss.]
![Page 66: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/66.jpg)
CPU 2 Reads X
[Figure: CPUs 0, 1, 2 on interconnection network. Memory: X = 7. Directory entry for X: S 1 0 1. Cached copies of X: 7.]
![Page 67: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/67.jpg)
CPU 2 Reads X
[Figure: CPUs 0, 1, 2 on interconnection network. Memory: X = 7. Directory entry for X: S 1 0 1. Cached copies of X: 7 and 7.]
![Page 68: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/68.jpg)
CPU 0 Writes 6 to X
[Figure: CPUs 0, 1, 2 on interconnection network. Memory: X = 7. Directory entry for X: S 1 0 1. Cached copies of X: 7 and 7. Message: Write Miss.]
![Page 69: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/69.jpg)
CPU 0 Writes 6 to X
[Figure: CPUs 0, 1, 2 on interconnection network. Memory: X = 7. Directory entry for X: S 1 0 1. Cached copies of X: 7 and 7. Message: Invalidate.]
![Page 70: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/70.jpg)
CPU 0 Writes 6 to X
[Figure: CPUs 0, 1, 2 on interconnection network. Memory: X = 7. Directory entry for X: E 1 0 0. Cached copies of X: 6.]
![Page 71: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/71.jpg)
CPU 1 Reads X
[Figure: CPUs 0, 1, 2 on interconnection network. Memory: X = 7. Directory entry for X: E 1 0 0. Cached copies of X: 6. Message: Read Miss.]
![Page 72: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/72.jpg)
CPU 1 Reads X
[Figure: CPUs 0, 1, 2 on interconnection network. Memory: X = 7. Directory entry for X: E 1 0 0. Cached copies of X: 6. Message: Switch to Shared.]
![Page 73: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/73.jpg)
CPU 1 Reads X
[Figure: CPUs 0, 1, 2 on interconnection network. Memory: X = 6. Directory entry for X: E 1 0 0. Cached copies of X: 6.]
![Page 74: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/74.jpg)
CPU 1 Reads X
[Figure: CPUs 0, 1, 2 on interconnection network. Memory: X = 6. Directory entry for X: S 1 1 0. Cached copies of X: 6 and 6.]
![Page 75: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/75.jpg)
CPU 2 Writes 5 to X
[Figure: CPUs 0, 1, 2 on interconnection network. Memory: X = 6. Directory entry for X: S 1 1 0. Cached copies of X: 6 and 6. Message: Write Miss.]
![Page 76: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/76.jpg)
CPU 2 Writes 5 to X
[Figure: CPUs 0, 1, 2 on interconnection network. Memory: X = 6. Directory entry for X: S 1 1 0. Cached copies of X: 6 and 6. Message: Invalidate.]
![Page 77: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/77.jpg)
CPU 2 Writes 5 to X
[Figure: CPUs 0, 1, 2 on interconnection network. Memory: X = 6. Directory entry for X: E 0 0 1. Cached copies of X: 5.]
![Page 78: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/78.jpg)
CPU 0 Writes 4 to X
[Figure: CPUs 0, 1, 2 on interconnection network. Memory: X = 6. Directory entry for X: E 0 0 1. Cached copies of X: 5. Message: Write Miss.]
![Page 79: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/79.jpg)
CPU 0 Writes 4 to X
[Figure: CPUs 0, 1, 2 on interconnection network. Memory: X = 6. Directory entry for X: E 1 0 0. Cached copies of X: 5. Message: Take Away.]
![Page 80: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/80.jpg)
CPU 0 Writes 4 to X
[Figure: CPUs 0, 1, 2 on interconnection network. Memory: X = 5. Directory entry for X: E 0 1 0. Cached copies of X: 5.]
![Page 81: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/81.jpg)
CPU 0 Writes 4 to X
[Figure: CPUs 0, 1, 2 on interconnection network. Memory: X = 5. Directory entry for X: E 1 0 0. Cached copies of X: none.]
![Page 82: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/82.jpg)
CPU 0 Writes 4 to X
[Figure: CPUs 0, 1, 2 on interconnection network. Memory: X = 5. Directory entry for X: E 1 0 0. Cached copies of X: 5.]
![Page 83: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/83.jpg)
CPU 0 Writes 4 to X
[Figure: CPUs 0, 1, 2 on interconnection network. Memory: X = 5. Directory entry for X: E 1 0 0. Cached copies of X: 4.]
![Page 84: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/84.jpg)
CPU 0 Writes Back X Block
[Figure: CPUs 0, 1, 2 on interconnection network. Memory: X = 5. Directory entry for X: E 1 0 0. Cached copies of X: 4. Message: Data Write Back, carrying X = 4.]
![Page 85: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/85.jpg)
CPU 0 Writes Back X Block
[Figure: CPUs 0, 1, 2 on interconnection network. Memory: X = 4. Directory entry for X: U 0 0 0. Cached copies of X: none.]
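The walkthrough above can be replayed with a toy model of a single directory entry. This is a minimal sketch under simplifying assumptions: one block X, messages applied instantly, and a "take away" that copies the exclusive owner's value back to memory before anything else happens. The class name `Directory` and its methods are hypothetical; the states U, S, and E and the bit vector follow the slides.

```python
class Directory:
    """Toy directory entry for one memory block X shared by num_cpus processors."""

    def __init__(self, value, num_cpus):
        self.memory = value
        self.state = "U"                    # U = uncached, S = shared, E = exclusive
        self.bits = [0] * num_cpus          # bit vector: which CPUs hold a copy
        self.caches = [None] * num_cpus     # cached value of X per CPU (None = no copy)

    def _take_away(self):
        """Copy the exclusive owner's value back to memory, which was obsolete."""
        owner = self.bits.index(1)
        self.memory = self.caches[owner]

    def read(self, cpu):
        if self.caches[cpu] is None:        # read miss
            if self.state == "E":
                self._take_away()           # the owner keeps its copy, now shared
            self.caches[cpu] = self.memory
            self.bits[cpu] = 1
            self.state = "S"
        return self.caches[cpu]

    def write(self, cpu, value):
        if self.state == "E" and not self.bits[cpu]:
            self._take_away()               # retrieve the block from the current owner
        for other in range(len(self.bits)): # invalidate every other copy
            if other != cpu:
                self.caches[other] = None
                self.bits[other] = 0
        self.caches[cpu] = value
        self.bits[cpu] = 1
        self.state = "E"

    def write_back(self, cpu):
        """Exclusive owner flushes the block to memory and gives up its copy."""
        self.memory = self.caches[cpu]
        self.caches[cpu] = None
        self.bits[cpu] = 0
        self.state = "U"

    def entry(self):
        return self.state, tuple(self.bits), self.memory

d = Directory(7, num_cpus=3)
steps = [
    ("CPU 0 reads X", lambda: d.read(0)),
    ("CPU 2 reads X", lambda: d.read(2)),
    ("CPU 0 writes 6", lambda: d.write(0, 6)),
    ("CPU 1 reads X", lambda: d.read(1)),
    ("CPU 2 writes 5", lambda: d.write(2, 5)),
    ("CPU 0 writes 4", lambda: d.write(0, 4)),
    ("CPU 0 writes back", lambda: d.write_back(0)),
]
for label, action in steps:
    action()
    print(label, d.entry())
```

Each printed tuple gives the sharing status, the bit vector, and the value in memory after the step, which can be compared with the last slide of the corresponding step in the walkthrough.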
![Page 86: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/86.jpg)
Multicomputer
Distributed memory multiple-CPU computer
Same address on different processors refers to different physical memory locations
Processors interact through message passing
Commercial multicomputers
Commodity clusters
![Page 87: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/87.jpg)
Asymmetrical Multicomputer
![Page 88: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/88.jpg)
Asymmetrical MC Advantages
Back-end processors dedicated to parallel computations
  Easier to understand, model, tune performance
Only a simple back-end operating system needed
  Easy for a vendor to create
![Page 89: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/89.jpg)
Asymmetrical MC Disadvantages
Front-end computer is a single point of failure
Single front-end computer limits scalability of system
Primitive operating system in back-end processors makes debugging difficult
Every application requires development of both front-end and back-end program
![Page 90: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/90.jpg)
Symmetrical Multicomputer
![Page 91: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/91.jpg)
Symmetrical MC Advantages
Alleviate performance bottleneck caused by single front-end computer
Better support for debugging
Every processor executes same program
![Page 92: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/92.jpg)
Symmetrical MC Disadvantages
More difficult to maintain illusion of single “parallel computer”
No simple way to balance program development workload among processors
More difficult to achieve high performance when multiple processes on each processor
![Page 93: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/93.jpg)
ParPar Cluster, A Mixed Model
![Page 94: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/94.jpg)
Commodity Cluster
Co-located computers
Dedicated to running parallel jobs
No keyboards or displays
Identical operating system
Identical local disk images
Administered as an entity
![Page 95: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/95.jpg)
Network of Workstations
Dispersed computers
First priority: person at keyboard
Parallel jobs run in background
Different operating systems
Different local images
Checkpointing and restarting important
![Page 96: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/96.jpg)
Flynn’s Taxonomy
Instruction stream
Data stream
Single vs. multiple
Four combinations
  SISD
  SIMD
  MISD
  MIMD
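The four combinations follow mechanically from two yes/no questions: is there more than one instruction stream, and more than one data stream? A minimal sketch, with a hypothetical function name:

```python
def flynn_category(instruction_streams, data_streams):
    """Classify a machine by its number of instruction streams and data streams."""
    i = "S" if instruction_streams == 1 else "M"
    d = "S" if data_streams == 1 else "M"
    return f"{i}I{d}D"

print(flynn_category(1, 1))      # a single-CPU system
print(flynn_category(1, 1024))   # a processor array with 1024 processing elements
print(flynn_category(4, 4))      # a 4-CPU multiprocessor or multicomputer
```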
![Page 97: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/97.jpg)
SISD
Single Instruction, Single Data
Single-CPU systems
Note: co-processors don’t count
  Functional
  I/O
Example: PCs
![Page 98: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/98.jpg)
SIMD
Single Instruction, Multiple Data
Two architectures fit this category
  Pipelined vector processor (e.g., Cray-1)
  Processor array (e.g., Connection Machine)
![Page 99: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/99.jpg)
MISD
Multiple Instruction, Single Data
Example: systolic array
![Page 100: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/100.jpg)
MIMD
Multiple Instruction, Multiple Data
Multiple-CPU computers
  Multiprocessors
  Multicomputers
![Page 101: Cs668 Lec7 Interconnection](https://reader031.fdocuments.in/reader031/viewer/2022030317/577ccf2a1a28ab9e788f0c54/html5/thumbnails/101.jpg)
Summary
Commercial parallel computers appeared in the 1980s
Multiple-CPU computers now dominate
Small-scale: Centralized multiprocessors
Large-scale: Distributed memory architectures (multiprocessors or multicomputers)