(Page 554 – 564) Ping Perez CS 147 Summer 2001 Alternative Parallel Architectures Dataflow ...

(Page 554 – 564)

Ping Perez, CS 147, Summer 2001

Alternative Parallel Architectures

Dataflow

Systolic arrays

Neural networks

To understand how dataflow computers work, it is first necessary to understand dataflow graphs. As a computer program is compiled, it is converted into its equivalent dataflow graph, which shows the data dependencies between statements. The dataflow computer uses this graph to generate the structures it needs to execute the program.

A code segment and its dataflow graph

1. A ← B + C

2. D ← E + F

3. G ← A + H

4. I ← D + G

5. J ← I + K

[Figure: dataflow graph with one + vertex per statement; the external inputs are B, C, E, F, H and K]

As shown in the figure, each vertex of the graph corresponds to the operator performed by one of the instructions. The directed edges going to a vertex correspond to the operands of the function performed by the vertex, and the directed edge leaving the vertex represents the result generated by the function.
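The mapping from statements to vertices and edges can be sketched in a few lines of Python. This is only an illustration, assuming each statement is written as a tuple and that every variable is assigned at most once; the names `PROGRAM` and `build_dataflow_graph` are hypothetical and do not come from the slides.

```python
# Each statement is (number, destination, operand1, operand2); every operation is "+".
PROGRAM = [
    (1, "A", "B", "C"),
    (2, "D", "E", "F"),
    (3, "G", "A", "H"),
    (4, "I", "D", "G"),
    (5, "J", "I", "K"),
]


def build_dataflow_graph(statements):
    """Return one directed edge (source, vertex) per operand of each statement.

    A source is the number of the statement that produces the operand, or the
    operand's own name when it is an external input to the code segment.
    Assumes every variable is assigned at most once.
    """
    producer = {dest: number for number, dest, _, _ in statements}
    edges = []
    for number, _dest, op1, op2 in statements:
        for operand in (op1, op2):
            edges.append((producer.get(operand, operand), number))
    return edges


EDGES = build_dataflow_graph(PROGRAM)
```

Here `EDGES` contains pairs such as `("B", 1)` for an external input entering vertex 1 and `(1, 3)` for the result of statement 1 feeding statement 3.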


Single Assignment Rule

This code segment has four violations of the single assignment rule, starting with statement 2. The value stored by this statement, B, was used as an operand in statement 1, so it must be renamed. We can rename it B1, and change all references to it later in this code. Similarly, values C and D, set by statements 3 and 4, are also used as operands in prior statements and must be renamed.

1. A ← B + C

2. B ← A + D

3. C ← A + B

4. D ← C + B

5. A ← A + C

Single Assignment Rule (cont'd)

Finally, statement 5 stores its result in A, the same variable used to store the result of statement 1, so we must also change this variable's name. Note that statements 2, 3 and 5 all use A as an operand. This is not a violation of the single assignment rule: an operand can be used many times.


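1. A ← B + C

2. B1 ← A + D

3. C1 ← A + B1

4. D1 ← C1 + B1

5. A1 ← A + C1

[Figure: dataflow graph of the renamed code segment with five + vertices; the external inputs are B, C and D]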
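The renaming steps described above can be sketched as a small Python function. It is a minimal illustration rather than a compiler pass: it assumes three-address statements written as tuples, and it assumes that a suffixed name such as `B1` does not already exist in the program. The function name `rename_for_single_assignment` is hypothetical.

```python
def rename_for_single_assignment(statements):
    """Rename destinations so that no name already in use is stored into.

    statements: list of (destination, operand1, operand2) tuples.
    A destination is renamed (B -> B1) when its name was already used
    earlier, either as an operand or as a destination.
    """
    seen = set()      # names that have already appeared
    current = {}      # original name -> latest renamed version
    counts = {}       # original name -> number of renames so far
    result = []
    for dest, op1, op2 in statements:
        # Operands refer to the most recent version of each variable.
        op1 = current.get(op1, op1)
        op2 = current.get(op2, op2)
        seen.update((op1, op2))
        new_dest = dest
        if dest in seen:
            counts[dest] = counts.get(dest, 0) + 1
            new_dest = f"{dest}{counts[dest]}"
            current[dest] = new_dest
        seen.add(new_dest)
        result.append((new_dest, op1, op2))
    return result


ORIGINAL = [
    ("A", "B", "C"),
    ("B", "A", "D"),
    ("C", "A", "B"),
    ("D", "C", "B"),
    ("A", "A", "C"),
]
RENAMED = rename_for_single_assignment(ORIGINAL)
```

Applied to the five statements of the example, the function renames four destinations, one for each violation, and leaves statement 1 unchanged.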

Single Assignment Rule

The dataflow graph describes the dependencies between statements and how data will flow between statements. An edge, however, does not show when data flows from one statement to another. The data that traverses an edge is called a token. When a token is available, it is represented as a dot on the edge.

A vertex is ready to fire, that is, to execute its instruction, when all of its input edges have tokens, meaning that all of the instruction's operands are available.

[Figure: dataflow graph with five + vertices; the external inputs are B, C and D]
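The firing rule can be illustrated with a short Python simulation. This is a sketch under stated assumptions: tokens are modeled as entries in a dictionary, every operation is an addition, and the input values B = 2, C = 3 and D = 4 are chosen only for illustration. The name `run_dataflow` is hypothetical.

```python
# Renamed code segment: (number, destination, operand1, operand2), all "+".
RENAMED_PROGRAM = [
    (1, "A", "B", "C"),
    (2, "B1", "A", "D"),
    (3, "C1", "A", "B1"),
    (4, "D1", "C1", "B1"),
    (5, "A1", "A", "C1"),
]


def run_dataflow(statements, inputs):
    """Fire vertices whenever all of their operand tokens are available.

    Returns the final token values and the list of "waves": the statement
    numbers that could fire together at each step.
    """
    tokens = dict(inputs)          # variable name -> value carried by its token
    pending = list(statements)
    waves = []
    while pending:
        # A vertex is ready when both of its operands have tokens.
        ready = [s for s in pending if s[2] in tokens and s[3] in tokens]
        if not ready:
            raise ValueError("no vertex can fire: an operand never arrives")
        for _number, dest, op1, op2 in ready:
            tokens[dest] = tokens[op1] + tokens[op2]
        waves.append([s[0] for s in ready])
        pending = [s for s in pending if s not in ready]
    return tokens, waves


TOKENS, WAVES = run_dataflow(RENAMED_PROGRAM, {"B": 2, "C": 3, "D": 4})
```

With these inputs, statements 4 and 5 become ready in the same step, which is the kind of parallelism a dataflow machine can exploit.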

I-Structures

Within the computer system, dataflow vertices are usually stored as I-structures. Each I-structure includes the operation to be performed, its operands, and a list of destinations for its result.

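An I-structure and the dataflow graph with I-structures

Example I-structure: + 2 ( ) {2/1}

I-structures of the dataflow graph (operation, operands, destinations):

1. + 2 3 {2/1, 3/1, 5/1}

2. + ( ) 4 {3/2, 4/2}

3. + ( ) ( ) {4/1, 5/2}

4. + ( ) ( ) -

5. + ( ) ( ) -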
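One possible way to model an I-structure in code is sketched below. It assumes that an empty operand slot is represented by `None`, that a destination is a (vertex, operand position) pair, and that the only operation is addition. The destination lists are derived from the renamed statements, and the names `IStructure`, `make_store` and `fire` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class IStructure:
    op: str
    operands: list                                      # None marks an empty slot
    destinations: list = field(default_factory=list)    # (vertex, operand position)

    def ready(self):
        """True when every operand slot has been filled."""
        return all(value is not None for value in self.operands)


def make_store():
    """I-structures for the renamed code segment, with B=2, C=3, D=4 (illustrative)."""
    return {
        1: IStructure("+", [2, 3], [(2, 1), (3, 1), (5, 1)]),
        2: IStructure("+", [None, 4], [(3, 2), (4, 2)]),
        3: IStructure("+", [None, None], [(4, 1), (5, 2)]),
        4: IStructure("+", [None, None]),
        5: IStructure("+", [None, None]),
    }


def fire(store, vertex):
    """Execute a ready vertex and copy its result into each destination slot."""
    node = store[vertex]
    if not node.ready():
        raise ValueError(f"vertex {vertex} is missing an operand")
    result = node.operands[0] + node.operands[1]   # only "+" is modeled here
    for dest_vertex, position in node.destinations:
        store[dest_vertex].operands[position - 1] = result
    return result
```

Firing vertex 1 in a fresh store fills the first operand of vertices 2, 3 and 5, after which vertex 2 is ready and the others are still waiting.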

The architectures of dataflow systems

1. Static architectures

2. Dynamic architectures

Static dataflow computer organization

This figure shows the organization of the static dataflow computer. The I-store unit has two sections. The memory section stores the I-structures of the dataflow program.

[Figure: static dataflow computer organization, showing the I-store unit (memory section and update/ready section), the firing queue and the processors]

What Are Systolic Arrays?

A systolic array incorporates several processing elements into a regular structure, such as a linear array or a mesh. Each processing element performs a single, fixed function and communicates only with its neighboring processing elements.

A 2 × 2 systolic array to multiply two matrices

[Figure: four processing elements labeled 1,1; 1,2; 2,1 and 2,2, each with ports U, L, R and D]

During the first clock cycle we input A1,1 to input L and B1,1 to input U of processing element 1,1. This processing element calculates A1,1B1,1 and adds it to its running total. The running totals of the other processing elements remain 0.

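[Figure: array state after clock cycle 1]

L inputs: A1,1 (row 1), 0 (row 2)

U inputs: B1,1 (column 1), 0 (column 2)

Total at 1,1 = A1,1B1,1

Total at 1,2 = 0

Total at 2,1 = 0

Total at 2,2 = 0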

During the second clock cycle, we input A1,2 to L and B2,1 to U of processing element 1,1. This processing element multiplies them and adds the product to its running total, which becomes A1,1B1,1 + A1,2B2,1, the final value of C1,1.

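[Figure: array state after clock cycle 2]

L inputs: A1,2 (row 1), A2,1 (row 2)

U inputs: B2,1 (column 1), B1,2 (column 2)

Total at 1,1 = A1,1B1,1 + A1,2B2,1

Total at 1,2 = A1,1B1,2

Total at 2,1 = A2,1B1,1

Total at 2,2 = 0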

Clock cycle 3 continues the matrix multiplication. Since C1,1 has already been calculated, we input 0 to the inputs of processing element 1,1 so the running total is not changed. The final values of C1,2 and C2,1 are calculated during this clock cycle, and the first part of C2,2 is generated.

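[Figure: array state after clock cycle 3]

L inputs: 0 (row 1), A2,2 (row 2)

U inputs: 0 (column 1), B2,2 (column 2)

Total at 1,1 = A1,1B1,1 + A1,2B2,1

Total at 1,2 = A1,1B1,2 + A1,2B2,2

Total at 2,1 = A2,1B1,1 + A2,2B2,1

Total at 2,2 = A2,1B1,2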

The final value of C2,2 is calculated during clock cycle 4, as shown in the figure. At this point, the multiplication of the two matrices is complete.

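[Figure: array state after clock cycle 4; A2,2 and B2,2 reach processing element 2,2]

Total at 1,1 = A1,1B1,1 + A1,2B2,1

Total at 1,2 = A1,1B1,2 + A1,2B2,2

Total at 2,1 = A2,1B1,1 + A2,2B2,1

Total at 2,2 = A2,1B1,2 + A2,2B2,2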
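The four clock cycles walked through above can be reproduced with a short simulation. This is a sketch, assuming that each processing element multiplies its L and U inputs, adds the product to its running total, and passes the L value to its right neighbor and the U value to the neighbor below on the next cycle. The function name `systolic_matmul` and the sample matrices are hypothetical.

```python
def systolic_matmul(A, B):
    """Multiply two n x n matrices on a simulated n x n systolic array.

    Row i receives A[i][0], A[i][1], ... on its L input, delayed by i cycles.
    Column j receives B[0][j], B[1][j], ... on its U input, delayed by j cycles.
    All other inputs are 0.
    """
    n = len(A)
    total = [[0] * n for _ in range(n)]
    a_reg = [[0] * n for _ in range(n)]   # value each element saw on L last cycle
    b_reg = [[0] * n for _ in range(n)]   # value each element saw on U last cycle
    for t in range(3 * n - 2):            # 4 clock cycles for a 2 x 2 array
        new_a = [[0] * n for _ in range(n)]
        new_b = [[0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if j == 0:                # external L input on the left edge
                    k = t - i
                    a_in = A[i][k] if 0 <= k < n else 0
                else:                     # value passed from the left neighbor
                    a_in = a_reg[i][j - 1]
                if i == 0:                # external U input on the top edge
                    k = t - j
                    b_in = B[k][j] if 0 <= k < n else 0
                else:                     # value passed from the neighbor above
                    b_in = b_reg[i - 1][j]
                total[i][j] += a_in * b_in
                new_a[i][j] = a_in
                new_b[i][j] = b_in
        a_reg, b_reg = new_a, new_b
    return total


C = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

For the 2 × 2 case the loop runs for four cycles, and the running totals after each cycle follow the same pattern as the figures: element 1,1 finishes after cycle 2, elements 1,2 and 2,1 after cycle 3, and element 2,2 after cycle 4.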

• Neural networks are different from any other computing structure.

• They incorporate thousands or millions of simple processing elements called neurons.

• Each neuron has far less processing power than a CPU.

Unlike traditional computers, which are programmed, neural networks are trained. Training consists of defining system input data and defining the desired system outputs for that input data.

System outputs are generated as a function of the outputs of individual neurons. Each neuron's output, in turn, is a function of the outputs of the neurons to which it is connected. The output of each neuron is multiplied by its weighting factor. All of these weighted values are added together.

Weighted value = (weight 1 × output 1) + (weight 2 × output 2) + ... + (weight n × output n)    (1)

This value is compared to the threshold value for that neuron. If the weighted value is greater than or equal to the threshold value, the neuron's output value is 1; otherwise its output is 0.

Output = 1 if weighted value ≥ threshold value; otherwise output = 0    (2)

Label    1     2     3     4

Weight   0.1   0.2   0.3   0.4

Value    1     1     0     1

Input value = 1*0.1 + 1*0.2 + 0*0.3 + 1*0.4 = 0.7 > 0.65 (N's threshold value)

Since this weighted value, 0.7, is greater than the threshold value, neuron N outputs a logical value of 1.
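The threshold calculation for neuron N can be written as a small Python function. This is a minimal sketch of the weighted-sum-and-threshold rule described above, using the weights, values and threshold from the example; the names `weighted_value` and `neuron_output` are hypothetical.

```python
def weighted_value(values, weights):
    """Sum of each connected neuron's output multiplied by its weighting factor."""
    return sum(v * w for v, w in zip(values, weights))


def neuron_output(values, weights, threshold):
    """Return 1 if the weighted value reaches the threshold, otherwise 0."""
    return 1 if weighted_value(values, weights) >= threshold else 0


VALUES = [1, 1, 0, 1]             # outputs of neurons 1 to 4
WEIGHTS = [0.1, 0.2, 0.3, 0.4]    # weighting factors for neurons 1 to 4
N_OUTPUT = neuron_output(VALUES, WEIGHTS, threshold=0.65)
```

Because the weights are floating-point numbers, the computed sum may differ from 0.7 by a tiny rounding error, which does not affect the comparison with 0.65.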

Where are neural networks used?

A neural network is not appropriate for general-purpose computing; you won't find a neural network running Windows on a personal computer. Instead, neural networks have found applications in tasks that do not run well on conventional architectures. They are also being used in control systems and artificial intelligence applications.
