
Vector/Array Processors

CSCI 4717 – Computer Architecture

CSCI 4717/5717 Computer Architecture

Topic: Vector/Array Processors

Reading: Stallings, Section 18.7

Vector/Array Computing

• Optimized for calculation rather than multitasking and I/O

• Design focus is to perform parallel mathematical operations on a vector or array of data elements

• A scalar processor would need to handle one element at a time

• Limited market: research, government agencies, meteorology

Vector/Array Computing (continued)

• Target applications:
– Data-intensive scientific research, such as:
• Aerodynamics, seismology, meteorology
• Continuous field simulation
– Specialized (high-performance) graphics applications

• These applications are a good fit because of the ever-increasing need for improved resolution and model capabilities

Array Processor

• Alternative to a supercomputer

• Configured as a peripheral to a mainframe or minicomputer

• The array processor runs only the vector portion of the problem; the host computer handles the rest

• The Sony PlayStation 3 uses the Cell processor, which consists of one scalar processor and eight vector processors. It was developed jointly by IBM, Toshiba, and Sony. (Source: http://en.wikipedia.org/wiki/Vector_computer)

Vector/Array Operation

• The power of vector computing comes from special processing instructions (Single Instruction, Multiple Data, or SIMD)

• Lock-step execution: a single instruction is issued to a large number of identical processors (or ALUs), which use a large register set to work on different data elements

• A single master CPU keeps control of the entire process
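The lock-step model can be sketched in Python. Each "lane" stands for one identical processor (or ALU) with its own small register file. The master issues each instruction once, and every lane applies it to its own data element. The names `broadcast` and `master`, the three-register lane, and the multiply-then-add program are illustrative assumptions. The loop inside `broadcast` only simulates what the hardware does simultaneously.

```python
from typing import Callable, List

# One lane per identical processing element; each lane holds its own
# registers R0, R1, R2 (a hypothetical 3-register file).
Lane = List[float]


def broadcast(lanes: List[Lane], op: Callable[[float, float], float],
              dst: int, src1: int, src2: int) -> None:
    """Issue ONE instruction to every lane; all lanes apply it in lock-step."""
    for regs in lanes:  # simulated here; in hardware all lanes act at once
        regs[dst] = op(regs[src1], regs[src2])


def master(a: List[float], b: List[float]) -> List[float]:
    """Master CPU: scatters data, issues the instruction stream, gathers results."""
    lanes = [[x, y, 0.0] for x, y in zip(a, b)]    # one element pair per lane
    broadcast(lanes, lambda x, y: x * y, 2, 0, 1)  # R2 <- R0 * R1 on every lane
    broadcast(lanes, lambda x, y: x + y, 2, 2, 0)  # R2 <- R2 + R0 on every lane
    return [regs[2] for regs in lanes]             # collect each lane's result


if __name__ == "__main__":
    print(master([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```

The instruction stream is decoded once per step, not once per element. That is the source of the savings compared with a scalar loop.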

Speed-Up Not Linear

• As with any parallel processing architecture, the realized speed-up of a vector processor is not linear because of:
– Overhead for managing parallel computations
– Bottlenecks in communication and storage
– An application load that does not always match the available processors

• These problems have a greater effect as the number of processors increases
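Amdahl's law is a common way to put numbers on this effect, although the slide does not name it. The sketch below extends it with a hypothetical overhead term `c` that grows with the processor count. This term stands in for the management and communication costs listed above. The values `f = 0.95` and `c = 0.001` are made up to show the shape of the curve; they are not measurements.

```python
def speedup(p: int, f: float, c: float = 0.0) -> float:
    """Estimated speed-up on p processors (Amdahl's law plus an overhead term).

    f: fraction of the work that can run in parallel (0..1)
    c: hypothetical overhead per additional processor, as a fraction of
       the single-processor run time (an assumption, not a measured value)
    """
    if p < 1:
        raise ValueError("need at least one processor")
    serial = 1.0 - f          # part of the work that cannot be split up
    parallel = f / p          # part divided evenly across p processors
    overhead = c * (p - 1)    # management/communication cost grows with p
    return 1.0 / (serial + parallel + overhead)


if __name__ == "__main__":
    for p in (1, 2, 4, 8, 16, 64):
        print(p, round(speedup(p, f=0.95, c=0.001), 2))
```

With these assumed values, the speed-up stays well below p and eventually declines as processors are added. This matches the slide's point that the problems grow with the number of processors.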

Data Pipelining

• The sequential nature of instructions allows for an instruction pipeline

• Vector computing also tends to work on well-organized data

• This allows the data to be pipelined as well:
– A single decode for the instruction
– Separate stages to fetch the data, process the data, and store the result in a register
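The benefit of a data pipeline can be sketched by counting cycles. The sketch assumes the three stages named above (fetch, process, store), one cycle per stage, one new element entering per cycle, and no stalls. The single instruction decode happens once before streaming begins and is not counted. The function names are hypothetical.

```python
def pipeline_cycles(n_elements: int, n_stages: int) -> int:
    """Cycles to stream n elements through an n_stages-deep pipeline.

    The first element needs n_stages cycles to fill the pipeline; after
    that, one result completes every cycle (assumes no stalls).
    """
    if n_elements == 0:
        return 0
    return n_stages + (n_elements - 1)


def unpipelined_cycles(n_elements: int, n_stages: int) -> int:
    """Cycles if each element must finish all stages before the next starts."""
    return n_stages * n_elements


def schedule(n_elements: int, stages=("fetch", "process", "store")):
    """Per cycle, which element occupies each stage (None means idle)."""
    table = []
    for cycle in range(pipeline_cycles(n_elements, len(stages))):
        row = {}
        for s, name in enumerate(stages):
            elem = cycle - s  # element i reaches stage s at cycle i + s
            row[name] = elem if 0 <= elem < n_elements else None
        table.append(row)
    return table


if __name__ == "__main__":
    for cycle, row in enumerate(schedule(4)):
        print(cycle, row)
    print(pipeline_cycles(100, 3), "vs", unpipelined_cycles(100, 3))
```

For long vectors, the pipelined cost approaches one cycle per element. The unpipelined cost stays at one cycle per stage per element.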

Data Pipelining (continued)

• Example: To add an array of numbers, the processor must have the following information:
– A single "add" instruction
– The start address for the data
– The end address for the data
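The following sketch shows one way to picture an instruction that carries only an opcode and an address range. It reads "add an array of numbers" as summing the elements, treats the end address as inclusive, and models memory as a Python list. All three are assumptions. `VectorInstruction` and `execute` are hypothetical names, not a real instruction set.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class VectorInstruction:
    """Hypothetical encoding: one opcode plus the address range it covers."""
    opcode: str
    start: int   # address of the first element
    end: int     # address of the last element (inclusive, by assumption)


def execute(instr: VectorInstruction, memory: List[float]) -> float:
    """Decode once, then stream every element in [start, end] through the adder."""
    if instr.opcode != "add":  # the single decode step
        raise ValueError(f"unsupported opcode: {instr.opcode}")
    acc = 0.0
    for addr in range(instr.start, instr.end + 1):  # no per-element decode
        acc += memory[addr]
    return acc


if __name__ == "__main__":
    memory = [0.0] * 4 + [1.0, 2.0, 3.0, 4.0] + [0.0] * 2
    print(execute(VectorInstruction("add", 4, 7), memory))
```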

Vector/Array Programming

• The programming goal is to divide a large dataset into independent sets that can be operated on in parallel

• Requires a deep understanding of the algorithm being applied to the data

• Distribute data to different processors

• Initiate parallel processing

• Bring everything back together when parallel processing is complete

Vector/Array Programming (continued)

• Example: Count the number of times a specific value appears in a large array

• Begin by breaking up the array into smaller arrays, one for each array processor

• Each array processor, in parallel, counts the number of occurrences of the value

• Final sum is then computed by adding the results from all of the processors
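The divide / distribute / combine steps of the counting example can be sketched with Python's standard `concurrent.futures.ThreadPoolExecutor`, using threads to stand in for array processors. This is an assumption made for illustration only. CPython's global interpreter lock means this CPU-bound count gets no real speed-up, so the sketch shows the structure of the program rather than its performance. `split` and `parallel_count` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def split(data: Sequence[int], parts: int) -> List[Sequence[int]]:
    """Break the array into `parts` nearly equal contiguous chunks."""
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def parallel_count(data: Sequence[int], value: int, processors: int = 4) -> int:
    """Count occurrences of `value` using one chunk per simulated processor."""
    chunks = split(data, processors)                        # divide the dataset
    with ThreadPoolExecutor(max_workers=processors) as pool:
        # distribute the chunks and start every partial count in parallel
        partial_counts = pool.map(lambda chunk: chunk.count(value), chunks)
        return sum(partial_counts)                          # combine the results


if __name__ == "__main__":
    data = [3, 1, 3, 2, 3, 0, 3, 5, 3]
    print(parallel_count(data, 3, processors=4))
```

The only step that is not parallel is the final `sum`. It is cheap because it adds one partial count per processor.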

Vector/Array Applications

• Which of the following applications would be better served by a vector or array computer than by an SMP, cluster, or scalar processor? What component of the problem is parallel?
– Web search indexing
– Generating the Fibonacci sequence: f(i) = f(i-1) + f(i-2)
– Weather prediction
– Image processing for a game
– Web site server
– Photoshop-type image processing

Scalar Programming

• The following two slides are based on the multiplication of two 100 × 100 matrices A and B (N = 100)

      DO 100 I = 1,N
        DO 100 J = 1,N
          C(I,J) = 0.0
          DO 100 K = 1,N
            C(I,J) = C(I,J) + A(I,K)*B(K,J)
100   CONTINUE

Vector Programming

• The notation (J = 1,N) indicates that the operations on all indices J are carried out on N processors as a single operation

      DO 100 I = 1,N
        C(I,J) = 0.0 (J = 1,N)
        DO 100 K = 1,N
          C(I,J) = C(I,J) + A(I,K)*B(K,J) (J = 1,N)
100   CONTINUE
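The difference between the scalar and vector forms can be sketched in Python. Indices are 0-based here, unlike the Fortran. In `matmul_vector`, the J loop disappears from the program text. Each row-wide list comprehension stands in for one vector instruction that a vector unit would apply to all J at once. On an ordinary CPU the comprehension still runs element by element, so this shows the program structure, not the hardware speed-up. The function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matmul_scalar(A: Matrix, B: Matrix) -> Matrix:
    """Triple loop: one multiply-add per step, as in the scalar version."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C


def matmul_vector(A: Matrix, B: Matrix) -> Matrix:
    """No explicit J loop: each row expression models one vector operation."""
    n = len(A)
    C = []
    for i in range(n):
        row = [0.0] * n  # C(I,J) = 0.0  (J = 1,N)
        for k in range(n):
            a_ik = A[i][k]
            # C(I,J) = C(I,J) + A(I,K)*B(K,J)  (J = 1,N), applied to the whole row
            row = [c + a_ik * b for c, b in zip(row, B[k])]
        C.append(row)
    return C


if __name__ == "__main__":
    A = [[1.0, 2.0], [3.0, 4.0]]
    B = [[5.0, 6.0], [7.0, 8.0]]
    print(matmul_scalar(A, B), matmul_vector(A, B))
```

Both versions perform the same N³ multiply-adds in the same order. The vector form only changes how many instructions are issued: N² row operations instead of N³ scalar ones.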

Fork/Join Parallel Programming

• One method of parallel programming is fork/join

• A program starts as a single process, known as the master thread

• The operation "fork" marks the beginning of sections of the program that are to be executed in parallel

• The operation "join" terminates the parallel threads created by "fork", bringing the program back to a single master thread

Fork/Join Method (continued)

      DO 50 J = 1,N-1
        FORK 100
50    CONTINUE
      J = N
100   DO 200 I = 1,N
        C(I,J) = 0.0
        DO 200 K = 1,N
          C(I,J) = C(I,J) + A(I,K)*B(K,J)
200   CONTINUE
      JOIN N
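The fork/join version can be sketched with Python's standard `threading` module. Each forked thread computes one column J of C. As in the pseudocode, the master thread takes the last column itself. The calls to `join()` bring the program back to a single master thread, as the join operation described on the previous slide does. Threads are an assumption standing in for processors. Under CPython's global interpreter lock, they illustrate the control flow but do not speed up this arithmetic. `column_task` and `matmul_fork_join` are hypothetical names.

```python
import threading
from typing import List

Matrix = List[List[float]]


def column_task(A: Matrix, B: Matrix, C: Matrix, j: int) -> None:
    """Code at label 100: compute every C(I,J) for one fixed column J."""
    n = len(A)
    for i in range(n):
        acc = 0.0
        for k in range(n):
            acc += A[i][k] * B[k][j]
        C[i][j] = acc  # each thread writes only its own column


def matmul_fork_join(A: Matrix, B: Matrix) -> Matrix:
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    workers = []
    for j in range(n - 1):  # DO 50 J = 1,N-1 / FORK 100
        t = threading.Thread(target=column_task, args=(A, B, C, j))
        t.start()
        workers.append(t)
    column_task(A, B, C, n - 1)  # J = N: the master computes the last column
    for t in workers:  # join: wait until every forked thread has finished
        t.join()
    return C


if __name__ == "__main__":
    print(matmul_fork_join([[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]))
```

No locking is needed because the threads write disjoint columns of C and only read A and B.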

Neural Networks

What?! A Blank Slide?! It must be over!!!