Vector/Array Processors (CSCI 4717 – Computer Architecture)
CSCI 4717/5717 Computer Architecture
Topic: Vector/Array Processors
Reading: Stallings, Section 18.7
Vector/Array Computing
• Optimized for calculation rather than multitasking and I/O
• Design focus is to perform parallel mathematical operations on a vector or array of data elements
• By contrast, a scalar processor handles one element at a time
• Limited market -- Research, government agencies, meteorology
Vector/Array Computing (continued)
• Target applications:
  – data-intensive/scientific research such as:
    • Aerodynamics, seismology, meteorology
    • Continuous field simulation
  – specialized (high-performance) graphics applications
• Applicable because of ever-increasing need for improved resolution and model capabilities
Array Processor
• Alternative to supercomputer
• Configured as a peripheral to mainframe or minicomputer
• Processor is only responsible for running vector portion of problem
• Example: the Sony PlayStation 3 uses the Cell processor, developed by IBM, Toshiba and Sony, which consists of one scalar processor and eight vector processors. (Source: http://en.wikipedia.org/wiki/Vector_computer)
Vector/Array Operation
• Power of vector computing comes in the form of special processing instructions (Single Instruction, Multiple Data or SIMD)
• Code executes in lock-step: a single instruction is issued to a large number of identical processors (or ALUs) with a large register set, each working on a different data element
• Single master CPU keeps control of the entire process
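As a rough illustration of the lock-step SIMD idea above, the Python sketch below models a master issuing one instruction that every lane applies to its own data element. The names `simd_issue`, `lanes_a` and `lanes_b` are hypothetical, and the list comprehension only simulates what the hardware would do simultaneously.

```python
import operator

def simd_issue(instruction, lanes_a, lanes_b):
    """Issue ONE instruction; every lane applies it to its own operands.

    Hypothetical model: real hardware performs the lane operations at the
    same time, while this comprehension only simulates that behavior.
    """
    if len(lanes_a) != len(lanes_b):
        raise ValueError("operand vectors must have the same number of lanes")
    return [instruction(a, b) for a, b in zip(lanes_a, lanes_b)]

a = [1, 2, 3, 4]
b = [10, 20, 30, 40]
sums = simd_issue(operator.add, a, b)      # one "add" issued for all four lanes
products = simd_issue(operator.mul, a, b)  # one "multiply" issued for all four lanes
```

The caller plays the role of the master CPU: it chooses the instruction once, and the lanes never decide anything on their own.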
Speed-Up Not Linear
• As with any parallel processing architecture, the realized speed-up of a vector processor is not linear because of:
  – Overhead for managing parallel computations
  – Bottlenecks for communication and storage
  – Application load that doesn't always match the available processors
• These problems have an increasing effect with increases in the number of processors
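The effect can be made concrete with a toy model. The sketch below is an assumption made for illustration, not a measured result or a formula from the slides: each element costs one time unit, work proceeds in rounds of at most `n_processors` elements, and coordinating each processor adds a fixed cost (`overhead_per_processor`, a hypothetical parameter).

```python
import math

def modeled_speedup(n_elements, n_processors, overhead_per_processor=0.0):
    """Toy speed-up model (illustrative assumption only)."""
    serial_time = n_elements
    # The last round may be only partly filled when the load does not
    # divide evenly across the processors.
    rounds = math.ceil(n_elements / n_processors)
    # Management overhead grows with the number of processors.
    parallel_time = rounds + overhead_per_processor * n_processors
    return serial_time / parallel_time

for p in (1, 8, 64, 512):
    speedup = modeled_speedup(1000, p, overhead_per_processor=0.01)
    print(p, round(speedup, 1), "efficiency:", round(speedup / p, 2))
```

Under these assumptions the efficiency (speed-up divided by processor count) drops as processors are added, which is the trend the slide describes.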
Data Pipelining
• The sequential nature of instructions allows for an instruction pipeline
• Vector computing tends to have data that is well organized too
• This allows for pipelining the data too
• Single decode for instruction
• Stages to fetch data, process data, store result in register
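To see why pipelining the data pays off, the sketch below compares cycle counts with and without a data pipeline. It assumes every stage takes exactly one cycle and that the pipeline never stalls; both are simplifications, and the function names are hypothetical.

```python
def unpipelined_cycles(n_elements, n_stages):
    # Each element passes through every stage before the next one starts.
    return n_elements * n_stages

def pipelined_cycles(n_elements, n_stages):
    # The first result appears after n_stages cycles; after that,
    # one further result completes on every cycle.
    if n_elements == 0:
        return 0
    return n_stages + (n_elements - 1)

stages = 3      # fetch data, process data, store result
elements = 100
without_pipeline = unpipelined_cycles(elements, stages)
with_pipeline = pipelined_cycles(elements, stages)
```

For long vectors the pipelined count approaches one cycle per element, so the cost of filling the pipeline is spread over the whole array.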
Data Pipelining (continued)
• Example: To add an array of numbers, processor must have the following information:
  – a single "add" instruction
  – start address for the data
  – end address for the data
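A minimal sketch of that idea follows, reading "add an array of numbers" as summing the elements (one possible interpretation). The `VectorInstruction` class and `execute` function are hypothetical names, and list indices stand in for memory addresses.

```python
from dataclasses import dataclass
from functools import reduce
import operator

@dataclass
class VectorInstruction:
    operation: object   # decoded once, e.g. operator.add
    start: int          # address (list index here) of the first element
    end: int            # address of the last element, inclusive

def execute(instr, memory):
    """Apply the single decoded operation across memory[start..end]."""
    if not (0 <= instr.start <= instr.end < len(memory)):
        raise IndexError("address range outside memory")
    return reduce(instr.operation, memory[instr.start:instr.end + 1])

memory = [5, 1, 2, 3, 4, 9]
total = execute(VectorInstruction(operator.add, start=1, end=4), memory)
```

The instruction is described once; only the address range determines how many elements flow through the operation.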
Vector/Array Programming
• The programming goal is to divide a large dataset into independent sets that can be operated on in parallel
• Requires a deep understanding of the algorithm being applied to the data
• Distribute data to different processors
• Initiate parallel processing
• Bring everything back together when parallel processing is complete
Vector/Array Programming (continued)
• Example: Count the number of times a specific value appears in a large array
• Begin by breaking up array into smaller arrays, one for each array processor
• Each array processor, in parallel, counts the number of occurrences of the value
• Final sum is then computed by adding the results from all of the processors
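The counting example can be sketched in Python as below. This is an illustration under assumptions: worker threads stand in for array processors, the names `split` and `parallel_count` are hypothetical, and CPython threads do not actually run this kind of work simultaneously, so the code shows the structure of the algorithm rather than a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor

def split(data, n_parts):
    """Break data into n_parts nearly equal contiguous chunks."""
    size, extra = divmod(len(data), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        stop = start + size + (1 if i < extra else 0)
        chunks.append(data[start:stop])
        start = stop
    return chunks

def parallel_count(data, target, n_workers=4):
    # Step 1: one smaller array per (simulated) array processor.
    chunks = split(data, n_workers)
    # Step 2: each worker counts occurrences in its own chunk.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_counts = list(pool.map(lambda chunk: chunk.count(target), chunks))
    # Step 3: the final sum combines the partial results.
    return sum(partial_counts)

data = [3, 7, 3, 1, 3, 9, 7, 3, 3, 0]
occurrences = parallel_count(data, 3)
```

The chunks are independent, so no worker needs to see another worker's data until the final sum.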
Vector/Array Applications
Which of the following applications would be better served by a vector or array computer than an SMP, cluster, or scalar processor? What component of the problem is parallel?
– Web search indexing
– Generating Fibonacci Sequence: f(i) = f(i-1) + f(i-2)
– Weather prediction
– Image processing for a game
– Web site server
– Photoshop-type image processing
Scalar Programming
• The following two slides are based on the multiplication of two 100×100 matrices A and B
      DO 100 I = 1,N
      DO 100 J = 1,N
      C(I,J) = 0.0
      DO 100 K = 1,N
      C(I,J) = C(I,J) + A(I,K)*B(K,J)
  100 CONTINUE
Vector Programming
• The notation (J = 1,N) indicates that operations on all indices J are to be carried out on N processors as a single operation
      DO 100 I = 1,N
      C(I,J) = 0.0 (J = 1,N)
      DO 100 K = 1,N
      C(I,J) = C(I,J) + A(I,K)*B(K,J) (J = 1,N)
  100 CONTINUE
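The difference between the scalar and vector forms can be sketched in Python. This is only an illustration: the function names are hypothetical, nested lists stand in for the matrices, and the whole-row update in `matmul_vector` merely simulates what N processors would carry out as a single operation.

```python
def matmul_scalar(A, B):
    """Scalar form: three nested loops, one element at a time."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C

def matmul_vector(A, B):
    """Vector form: the J loop becomes one operation on a whole row."""
    n = len(A)
    C = []
    for i in range(n):
        row = [0.0] * n                      # C(I,J) = 0.0 (J = 1,N)
        for k in range(n):
            a_ik = A[i][k]
            # C(I,J) = C(I,J) + A(I,K)*B(K,J) (J = 1,N)
            row = [c + a_ik * b for c, b in zip(row, B[k])]
        C.append(row)
    return C

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
C_scalar = matmul_scalar(A, B)
C_vector = matmul_vector(A, B)
```

Both functions compute the same product; the vector form simply removes the explicit loop over J, leaving two loops instead of three.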
Fork/Join Parallel Programming
• One method of parallel programming is fork-join
• Programs start as a single process known as a master thread
• The operation "fork" is used to indicate the beginning of sections of the program that are to be executed in parallel
• The operation "join" is used to terminate the parallel threads created by "fork" to bring the program back to a single, master thread
Fork/Join Method (continued)
      DO 50 J = 1,N - 1
      FORK 100
   50 CONTINUE
      J = N
  100 DO 200 I = 1,N
      C(I,J) = 0.0
      DO 200 K = 1,N
      C(I,J) = C(I,J) + A(I,K)*B(K,J)
  200 CONTINUE
      JOIN N
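A comparable fork-join structure can be sketched with Python threads. This is an illustration under assumptions: `threading.Thread.start` plays the role of "fork", `Thread.join` plays the role of "join", the function names are hypothetical, and CPython threads will not actually speed up this arithmetic.

```python
import threading

def compute_column(A, B, C, j):
    """Work done by one forked thread: fill column j of C."""
    n = len(A)
    for i in range(n):
        C[i][j] = 0.0
        for k in range(n):
            C[i][j] += A[i][k] * B[k][j]

def matmul_fork_join(A, B):
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    workers = []
    for j in range(n - 1):                   # fork one thread per column except the last
        t = threading.Thread(target=compute_column, args=(A, B, C, j))
        t.start()
        workers.append(t)
    compute_column(A, B, C, n - 1)           # the master thread takes the last column (J = N)
    for t in workers:                        # join: back to a single master thread
        t.join()
    return C

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
C_parallel = matmul_fork_join(A, B)
```

Each thread writes only to its own column of C, so the threads need no locking; the only synchronization point is the final join.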
Neural Networks
What?! A Blank Slide?! It must be over!!!