Bab 5 Mesin Selari - ftsm.ukm.myweek4parallelmachine).pdf · 4 Type of Processing SISD Single...


Parallel Machine


CPU Usage

A conventional computer has 1 CPU and 1 memory.

Problem: the Von Neumann bottleneck. Processing is slow because the CPU is faster than memory.

Solution: use multiple CPUs or multiple ALUs for simultaneous processing. Such machines are known as parallel computers or multiprocessor computers.

To improve the efficiency of a computer:
Increase the speed of the processor
Improve memory access


Type of Processing

Flynn Taxonomy

Single Instruction: SISD, SIMD
Multiple Instruction: MISD, MIMD


Type of Processing

SISD: Single Instruction Single Data
A single processor executes a single instruction stream to operate on data stored in a single memory. Example: Von Neumann machine.

SIMD: Single Instruction Multiple Data
A single machine instruction controls the simultaneous execution of a number of processing elements on a lockstep basis. Each processing element has an associated data memory, so that each instruction is executed on a different set of data by each processing element.
Two types: vector and parallel.
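The lockstep idea can be illustrated with a minimal Python sketch. It is a sequential simulation only: the function name `simd_step` and the sample data are assumptions made for illustration, and a real SIMD machine would run all processing elements at the same time.

```python
def simd_step(instruction, local_memories):
    """Broadcast one instruction to every processing element (PE).

    Each entry of local_memories stands for the data held in one PE's
    own memory. The loop simulates what the hardware does in lockstep.
    """
    return [instruction(data) for data in local_memories]


# Hypothetical data: four PEs, each holding one value.
pe_data = [1, 2, 3, 4]

# One instruction ("double the value") is applied to every PE's data.
result = simd_step(lambda x: x * 2, pe_data)
print(result)  # [2, 4, 6, 8]
```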


Type of Processing

MISD: Multiple Instruction Single Data
A sequence of data is transmitted to a set of processors, each of which executes a different instruction sequence.
This structure is unusual and is not commercially implemented.

MIMD: Multiple Instruction Multiple Data
A set of processors simultaneously executes different instruction sequences on different data sets.
Examples: SMP (Symmetric Multiprocessor) and NUMA (Non-Uniform Memory Access).
Shared memory: switch, bus
Distributed / local memory: switch, bus


MIMD Distributed Memory

Uses many connected CPUs.
Each CPU controls the execution of its own operations separately.
Can perform various tasks simultaneously.
2 techniques of connection between CPU and memory:
Direct connection
Net / grid connection

Problem with the grid: the path between one corner and the opposite corner is long.
Solution: use a hypercube / n-cube.


Direct Connection


Net Connection


Hypercube Connection

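(Figure: a 3-dimensional hypercube whose eight nodes are labelled 000, 001, 010, 011, 100, 101, 110 and 111.)

Route from 100 to 111:

100 XOR 111 = 011

The two 1 bits in the result mark the two dimensions in which the source and the destination differ. Therefore, the possible routes are through 110 and 101.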
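The XOR step can be sketched in Python. This is a minimal illustration, not a full routing algorithm: the helper names `differing_dimensions` and `first_hops` are assumptions introduced here, and only the first hop of each route is computed.

```python
def differing_dimensions(src, dst):
    """Return the bit positions in which two node addresses differ."""
    diff = src ^ dst  # XOR marks every differing bit with a 1
    return [bit for bit in range(diff.bit_length()) if (diff >> bit) & 1]


def first_hops(src, dst, width=3):
    """Neighbours of src that lie on a shortest route to dst.

    Flipping one differing bit of src moves one step along that
    dimension of the hypercube.
    """
    return [format(src ^ (1 << bit), f"0{width}b")
            for bit in differing_dimensions(src, dst)]


print(format(0b100 ^ 0b111, "03b"))  # 011
print(first_hops(0b100, 0b111))      # ['101', '110']
```

The number of 1 bits in the XOR result is also the number of hops on a shortest route, here 2.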


MIMD Shared Memory – Bus

Using a bus is simple and easy.

(Figure: CPU1, CPU2 and CPU3, each with its own cache memory, share one memory over a bus.)


MIMD Shared Memory – Bus

Using a bus
Problem: Von Neumann bottleneck
Solution: use a cache memory in each CPU

Problem: cache coherence. 2 processors read the same data. When one of them changes the data, the other processor assumes its copy is still the original and does not know that the data has changed.
Solution:
Software
Hardware
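The coherence problem can be reproduced with a small Python simulation. It is a sketch under stated assumptions: the dictionaries, the write-through policy and the function names are all hypothetical, chosen only to show how a stale copy arises when no coherence mechanism exists.

```python
def stale_read_demo():
    """Show a stale read with two private caches and no coherence."""
    memory = {"x": 1}          # shared main memory
    cache_a, cache_b = {}, {}  # private caches of two processors

    def read(cache, addr):
        if addr not in cache:            # miss: copy from main memory
            cache[addr] = memory[addr]
        return cache[addr]               # hit: use the cached copy

    def write(cache, addr, value):
        cache[addr] = value              # update the writer's cache
        memory[addr] = value             # write through to main memory

    read(cache_a, "x")                   # both processors cache x = 1
    read(cache_b, "x")
    write(cache_a, "x", 2)               # processor A changes x
    return memory["x"], read(cache_b, "x")


current, seen_by_b = stale_read_demo()
print(current, seen_by_b)  # 2 1
```

Processor B still sees the old value 1 although memory holds 2, which is the situation the software and hardware solutions are meant to prevent.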


MIMD Shared Memory – Bus

Solution in software

Classify the data:
Shared: read-only or read-write
Unshared

The problem arises only for shared read-write data.
Solution: do not allow that data to be cached.
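The software rule can be written as a short Python sketch. The `DataClass` enumeration and the `may_cache` function are hypothetical names used for illustration; how a real compiler or operating system marks data is not described in the slides.

```python
from enum import Enum


class DataClass(Enum):
    SHARED_READ_ONLY = "shared read-only"
    SHARED_READ_WRITE = "shared read-write"
    UNSHARED = "unshared"


def may_cache(data_class):
    """Only shared read-write data is kept out of the caches."""
    return data_class is not DataClass.SHARED_READ_WRITE


for data_class in DataClass:
    print(data_class.value, "->", "cacheable" if may_cache(data_class) else "not cacheable")
```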


MIMD Shared Memory - Bus

Solution in hardware

Uses a cache memory controller and a cache coherence protocol.
The required block of words is loaded into the cache memory.


MIMD Shared Memory - Switch

Crossbar switch connecting n CPUs with k memories
Advantage: non-blocking network
Disadvantage: uses a lot of crosspoints (the count grows as n^2)


Omega Network

Has log2 n stages / levels with n/2 switches in each stage.
Example: omega network, 8 CPU x 8 memory
Stages: log2 8 = 3
Switches per stage: 8/2 = 4
Total number of switches = 3 * 4 = 12

Fewer crosspoints. Disadvantage: blocking network.

Crossbar switch: 8 CPU x 8 memory = 64 switches
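The two switch counts can be compared with a short Python sketch. The function names are assumptions made for illustration, and the omega formula is applied only when n is a power of two.

```python
def crossbar_switches(n_cpus, n_memories):
    """A crossbar needs one crosspoint per CPU/memory pair."""
    return n_cpus * n_memories


def omega_switches(n):
    """Return (stages, switches per stage, total) for an n x n omega network."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two, at least 2")
    stages = n.bit_length() - 1   # exact log2 for a power of two
    per_stage = n // 2
    return stages, per_stage, stages * per_stage


print(crossbar_switches(8, 8))  # 64
print(omega_switches(8))        # (3, 4, 12)
```

Note that the two figures count different things: a crossbar crosspoint is a simple on/off switch, while each omega switch is a 2 x 2 element.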


Omega Network

(Figure: 8 x 8 omega network. Inputs 000 to 111 on one side and outputs 000 to 111 on the other are connected through twelve switches labelled 1A to 1D, 2A to 2D and 3A to 3D.)


Benes Network

Resolves the blocking problem of the omega network.
Uses more switches and more stages.

Provides more route options from CPU to memory.


SIMD Parallel Computer

Executes the same instruction on many data elements simultaneously.
Simpler, cheaper and very fast.
Example: Connection Machine


Connection Machine

Consists of:
4 quadrants, which can be operated separately
1 quadrant = 2 parts of 8K PE (8192 processors)

Each quadrant has:
ALU
8 Kb memory
4 flag bits
Interface with memory and I/O system
1 router


Connection Machine

The compiler is written in C or LISP.
Each 8K PE section of a quadrant is divided into 2 subcubes of 4K PE (256 processor chips).
Each 4K PE subcube has its own I/O system.
I/O bus width = 64 bits
Has 39 I/O disk drives: 1 disk per bit


SIMD Vector Computer

The Connection Machine is only suitable for solving artificial intelligence problems.
For floating-point arithmetic, such as graphics processing that involves vectors, the Connection Machine is not suitable.
Example of a SIMD vector computer: the CRAY-1 supercomputer


CRAY-1

Consists of multiple ALUs that can operate simultaneously:
2 addressing units to compute addresses
4 integer scalar units for arithmetic operations
6 vector integer units for vector operations


Cache Memory
Characteristics of Memory System

Location: refers to whether memory is internal or external to the computer. Example: main memory and cache (internal); optical disk and magnetic disk (external).

Capacity: number of words or number of bytes
Unit of transfer: word or block
Access method: sequential, direct, random, associative
Performance: access time, cycle time and transfer rate
Physical type: semiconductor, magnetic, optical
Physical characteristics: volatile or nonvolatile, erasable or nonerasable
Organization: memory modules


Cache Memory Principles

Cache memory is intended to give a memory speed approaching that of the fastest memory available.
At the same time, it provides a large memory size at the price of less expensive types of semiconductor memory.
The cache contains a copy of portions of main memory.
When the processor attempts to read a word of memory:

A check is made to determine if the word is in the cache.
If so, the word is delivered to the processor.
If not, a block of main memory is read into the cache and the word is delivered to the processor.

Because of the phenomenon of locality of reference, once a block has been fetched into the cache it is likely that there will be future references to that same memory location or to other words in the block.
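The read steps above can be sketched in Python. This is a simplified model under stated assumptions: the block size `K = 4`, the 64-word `main_memory` and the unbounded dictionary used as a cache are hypothetical, and no replacement policy is modelled.

```python
K = 4                                 # words per block (assumed)
main_memory = list(range(100, 164))   # 64 words of sample data
cache = {}                            # block number -> list of K words


def read_word(address):
    """Return (word, hit) following the cache read steps."""
    block = address // K
    hit = block in cache              # is the word's block in the cache?
    if not hit:                       # miss: read the whole block in
        start = block * K
        cache[block] = main_memory[start:start + K]
    return cache[block][address % K], hit


print(read_word(5))  # (105, False)  miss: block 1 is loaded
print(read_word(6))  # (106, True)   hit: same block, locality of reference
```

The second read hits because address 6 lies in the block that the first read brought in.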


Cache/Main Memory Structure


Cache/Main Memory Principles

Main memory consists of up to 2^n addressable words, with each word having a unique n-bit address.
For mapping purposes, this memory is considered to consist of a number of fixed-length blocks of K words each.
That is, there are M = 2^n / K blocks in main memory.
The cache consists of m blocks, called lines.
Each line contains K words, plus a tag of a few bits.
Each line also includes control bits.


Cache Read Operation


Cache Mapping Function

An algorithm is needed for mapping main memory blocks to cache lines.
This is because there are fewer cache lines than main memory blocks.
The choice of the mapping function dictates how the cache is organized.
Three techniques can be used: direct, associative and set associative.


Example

A line is an adjacent series of bytes in main memory (that is, their addresses are contiguous). Suppose a line is 16 bytes in size. For example, suppose we have a 2^12 = 4K-byte cache with 2^8 = 256 16-byte lines; a 2^24 = 16M-byte main memory, which is 2^12 = 4K times the size of the cache; and a 400-line program, which will not all fit into the cache at once.
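The arithmetic of this example can be checked with a few lines of Python. The constant names are assumptions chosen for illustration; the values are the ones stated in the example.

```python
LINE_SIZE = 16            # bytes per line
CACHE_SIZE = 2 ** 12      # 4K bytes
MEMORY_SIZE = 2 ** 24     # 16M bytes
PROGRAM_LINES = 400       # size of the program, in lines

cache_lines = CACHE_SIZE // LINE_SIZE
memory_to_cache_ratio = MEMORY_SIZE // CACHE_SIZE
program_fits = PROGRAM_LINES <= cache_lines

print(cache_lines)            # 256
print(memory_to_cache_ratio)  # 4096
print(program_fits)           # False
```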


Direct Mapping

Under this mapping scheme, each memory line j maps to cache line j mod 128, so the 16-bit memory address looks like this:

Tag (5 bits) | Line (7 bits) | Word (4 bits)

Here:
The "Word" field selects one from among the 16 addressable words in a line.
The "Line" field defines the cache line where this memory line should reside.
The "Tag" field of the address is then compared with that cache line's 5-bit tag to determine whether there is a hit or a miss. If there's a miss, we need to swap out the memory line that occupies that position in the cache and replace it with the desired memory line.


Direct Mapping

E.g., suppose that we want to read or write a word at the hexadecimal address 357A, whose 16 bits are 0011010101111010. This translates to Tag = 6, Line = 87, and Word = 10 (all in decimal). If line 87 in the cache has the same tag (6), then memory address 357A is in the cache. Otherwise, a miss has occurred and the contents of cache line 87 must be replaced by memory line 001101010111 = 855 before the read or write is executed.

Direct mapping is the fastest cache mapping scheme to look up, because only one tag is compared, but it is also the least effective in its utilization of the cache: it may leave some cache lines unused.
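The field split for direct mapping can be reproduced with bit operations in Python. This is a sketch: the function name `split_direct` is an assumption, and the field widths (5-bit tag, 7-bit line, 4-bit word) are taken from the slide.

```python
def split_direct(address, line_bits=7, word_bits=4):
    """Split a 16-bit address into (tag, line, word) for direct mapping."""
    word = address & ((1 << word_bits) - 1)
    line = (address >> word_bits) & ((1 << line_bits) - 1)
    tag = address >> (word_bits + line_bits)
    return tag, line, word


print(split_direct(0x357A))   # (6, 87, 10)

# The memory line number is the address without its word field,
# and the cache line is that number mod 128.
memory_line = 0x357A >> 4
print(memory_line, memory_line % 128)  # 855 87
```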


Associative Mapping

This mapping scheme attempts to improve cache utilization, but at the expense of speed. Here, the cache line tags are 12 bits, rather than 5, and any memory line can be stored in any cache line. The 16-bit memory address looks like this:

Tag (12 bits) | Word (4 bits)

Here:
The "Tag" field identifies one of the 2^12 = 4096 memory lines; all the cache tags are searched to find out whether or not the Tag field matches one of the cache tags. If so, we have a hit; if not, there's a miss and we need to replace one of the cache lines by this line before reading or writing into the cache.
The "Word" field again selects one from among 16 addressable words (bytes) within the line.


Associative Mapping

For example, suppose again that we want to read or write a word at the address 357A, whose 16 bits are 0011010101111010. Under associative mapping, this translates to Tag = 855 and Word = 10 (in decimal). So we search all of the 128 cache tags to see if any one of them matches 855. If not, there's a miss and we need to replace one of the cache lines with line 855 from memory before completing the read or write.

The search of all 128 tags in the cache is time-consuming. However, the cache is fully utilized, since no line needs to be replaced while unused lines remain (recall that under direct mapping a miss can force a replacement even though the cache is not completely full of active lines).
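The associative lookup can be sketched in Python as a search over all tags. The names `split_associative` and `find_line`, and the choice of line 5 as the place where tag 855 is stored, are assumptions for illustration; real hardware compares all tags in parallel rather than in a loop.

```python
def split_associative(address, word_bits=4):
    """Split a 16-bit address into (tag, word) for associative mapping."""
    return address >> word_bits, address & ((1 << word_bits) - 1)


def find_line(cache_tags, address):
    """Search every cache tag; return the matching line index or None."""
    tag, _word = split_associative(address)
    for index, stored_tag in enumerate(cache_tags):
        if stored_tag == tag:
            return index      # hit
    return None               # miss


cache_tags = [None] * 128     # 128 cache lines, all empty
cache_tags[5] = 855           # hypothetical: memory line 855 sits in line 5

print(split_associative(0x357A))      # (855, 10)
print(find_line(cache_tags, 0x357A))  # 5
print(find_line(cache_tags, 0x1234))  # None
```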


Set Associative Mapping

This scheme is a compromise between the direct and associative schemes described above. Here, the cache is divided into sets of tags, and the set number is directly mapped from the memory address (e.g., memory line j is mapped to cache set j mod 64), as suggested by the diagram below:


Set Associative Mapping

The 16-bit memory address is now partitioned like this:

Tag (6 bits) | Set (6 bits) | Word (4 bits)

Here:
The "Tag" field identifies one of the 2^6 = 64 different memory lines in each of the 2^6 = 64 different "Set" values. Since each cache set has room for only two lines at a time, the search for a match is limited to those two lines (rather than the entire cache). If there's a match, we have a hit and the read or write can proceed immediately. Otherwise, there's a miss and we need to replace one of the two cache lines by this line before reading or writing into the cache.
The "Word" field again selects one from among 16 addressable words inside the line.


Set Associative Mapping

In set-associative mapping, when the number of lines per set is n, the mapping is called n-way associative. For instance, the above example is 2-way associative.

Example: again, suppose that we want to read or write a word at the memory address 357A, whose 16 bits are 0011010101111010. Under set-associative mapping, this translates to Tag = 13, Set = 23, and Word = 10 (all in decimal). So we search only the two tags in cache set 23 to see if either one matches tag 13. If so, we have a hit. Otherwise, one of these two must be replaced by the memory line being addressed (line 855) before the read or write can be executed.
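The 2-way set-associative lookup can be sketched in Python. The names `split_set_associative` and `is_hit`, and the sample cache contents, are assumptions for illustration; the field widths (6-bit tag, 6-bit set, 4-bit word) come from the slides.

```python
def split_set_associative(address, set_bits=6, word_bits=4):
    """Split a 16-bit address into (tag, set, word)."""
    word = address & ((1 << word_bits) - 1)
    set_index = (address >> word_bits) & ((1 << set_bits) - 1)
    tag = address >> (word_bits + set_bits)
    return tag, set_index, word


def is_hit(cache_sets, address):
    """Compare the tag only against the two tags of the selected set."""
    tag, set_index, _word = split_set_associative(address)
    return tag in cache_sets[set_index]


# 64 sets with room for 2 tags each; None marks an empty slot.
cache_sets = [[None, None] for _ in range(64)]
cache_sets[23][1] = 13        # hypothetical: line 855 is already cached

print(split_set_associative(0x357A))  # (13, 23, 10)
print(is_hit(cache_sets, 0x357A))     # True
```

Only two comparisons are needed per access, instead of 128 under associative mapping.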