Pipe Lining concept

7/30/2019 Pipe Lining concept

Pipelining

Advanced Computer Architecture


Pipelining Techniques

Linear Pipeline Processors

Asynchronous and Synchronous Models

Clocking and Timing control

Speedup, Efficiency and Throughput

Non Linear Pipeline Processors

Reservation and Latency Analysis

Collision Free Scheduling

Pipeline Schedule Optimization


A linear pipeline processor is constructed with k processing stages, S1 through Sk.

These stages are linearly connected to perform a specific function.

A data stream flows from one end of the pipeline to the other: external inputs are fed into S1, intermediate results pass from Si to Si+1, and final results move out from Sk.

Linear pipelining is applied to:

Instruction execution

Arithmetic computation

Memory access operations


Asynchronous Model

Data flow is controlled by a handshaking protocol.

When a stage Si is ready to transmit, it sends a ready signal to stage Si+1.

This is followed by the actual data transfer.

After stage Si+1 receives the data, it returns an acknowledge signal to Si.

Source: Kai Hwang
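
The ready / data / acknowledge sequence can be sketched as an ordered log of events. This is only an illustration: the names `handshake_transfer` and `run_async_pipeline` are hypothetical, and real asynchronous stages operate concurrently rather than in a simple loop.

```python
def handshake_transfer(sender, receiver, data, log):
    """Record the three steps of one ready/data/acknowledge exchange."""
    log.append(f"{sender} -> {receiver}: ready")
    log.append(f"{sender} -> {receiver}: data {data!r}")
    log.append(f"{receiver} -> {sender}: acknowledge")


def run_async_pipeline(k, item):
    """Pass one item through stages S1..Sk, one handshake per stage boundary."""
    log = []
    for i in range(1, k):
        handshake_transfer(f"S{i}", f"S{i + 1}", item, log)
    return log


for event in run_async_pipeline(3, "x"):
    print(event)
```

A 3-stage pipeline has two stage boundaries, so the log holds two complete handshakes.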


Synchronous Model

Clocked latches are used to interface between stages.

The latches are flip-flops that isolate inputs from outputs.

Upon arrival of a clock pulse, all latches transfer data to the next stage at the same time.

The pipeline stages themselves are combinational circuits.

Source: Kai Hwang
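
A minimal sketch of the synchronous model, assuming each stage is a pure function and one latch sits at the output of each stage. The function name `run_synchronous_pipeline` and the three example stages are hypothetical. All latches are updated from the old latch values in a single step, which models the simultaneous transfer on a clock pulse.

```python
def run_synchronous_pipeline(stages, inputs):
    """Clock a list of stage functions; return (results, cycles used).

    None marks an empty latch, so None is not usable as a data value here.
    """
    latches = [None] * len(stages)   # latches[i] holds the output of stage i
    pending = list(inputs)
    results = []
    cycles = 0
    # Keep clocking while inputs remain or data is still inside the pipeline.
    while pending or any(v is not None for v in latches[:-1]):
        feed = pending.pop(0) if pending else None
        # Each stage reads the latch before it (or the external input).
        sources = [feed] + latches[:-1]
        # Build the new latch contents from the old ones in one step.
        latches = [None if x is None else f(x) for f, x in zip(stages, sources)]
        cycles += 1
        if latches[-1] is not None:
            results.append(latches[-1])
    return results, cycles


stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
print(run_synchronous_pipeline(stages, [1, 2]))  # ([1, 3], 4)
```

With k = 3 stages and n = 2 inputs the run takes 4 cycles, which agrees with the k + (n-1) count used in the speedup analysis.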


Reservation Table

It specifies the utilization pattern of successive stages in a synchronous pipeline.

It is a space-time graph depicting the precedence relationship in using the pipeline stages.

Source: Kai Hwang
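
For a k-stage linear pipeline the utilization pattern is simple: a task uses stage Si in clock cycle i, so the marks fall on the diagonal of the table. A minimal sketch under that assumption; the helper names `linear_reservation_table` and `render` are illustrative only.

```python
def linear_reservation_table(k):
    """Return a k x k table; entry [row][col] is True when stage row+1
    is used in clock cycle col+1."""
    return [[col == row for col in range(k)] for row in range(k)]


def render(table):
    """Draw the table with one row per stage, X for used and . for idle."""
    lines = []
    for i, row in enumerate(table, start=1):
        lines.append(f"S{i} " + " ".join("X" if used else "." for used in row))
    return "\n".join(lines)


print(render(linear_reservation_table(3)))
# S1 X . .
# S2 . X .
# S3 . . X
```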


Clocking and Timing Control

Clock cycle and throughput:

The clock cycle time (t) of a pipeline is given by

t = tm + d

where

tm denotes the maximum stage delay

d denotes the latch delay

The pipeline frequency f = 1/t is referred to as the maximum throughput of the pipeline (one result per clock cycle)

Clock skewing:

Ideally, clock pulses should arrive at all stages at the same time, but due to clock skewing the same clock pulse may arrive at different stages with an offset of s.

Further, let tmax be the time delay of the longest logic path in a stage and tmin be that of the shortest logic path in a stage; then the clock cycle must satisfy

d + tmax + s ≤ t ≤ tm + tmin - s
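
A small worked example of the timing relations, with made-up delays in nanoseconds. The lower bound d + tmax + s comes from the slide; the upper bound tm + tmin - s used in `skew_bounds` is the form given in Hwang's textbook and is treated as an assumption here. All numeric values and function names are hypothetical.

```python
def clock_period(stage_delays, latch_delay):
    """t = tm + d, where tm is the largest stage delay."""
    return max(stage_delays) + latch_delay


def skew_bounds(t_m, t_max, t_min, d, s):
    """Return (lower, upper) limits on the clock period under skew s."""
    return d + t_max + s, t_m + t_min - s


stage_delays_ns = [8.0, 10.0, 9.0]   # assumed stage delays
d_ns = 1.0                           # assumed latch delay

t_ns = clock_period(stage_delays_ns, d_ns)
f_ghz = 1.0 / t_ns                   # 1 / ns = GHz
low, high = skew_bounds(t_m=10.0, t_max=10.0, t_min=4.0, d=d_ns, s=0.5)

print(f"t = {t_ns} ns, f = {f_ghz:.4f} GHz")
print(f"with skew: {low} ns <= t <= {high} ns")
```

With these numbers the ideal period of 11 ns falls below the lower bound of 11.5 ns, so the skew forces a slightly longer clock cycle.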


Speedup

Case 1: Pipelined processor

Ideally, the number of clock cycles required by a k-stage pipeline to process n tasks is:

Np = k + (n-1)

(k clock cycles for the first task and 1 clock cycle for each of the remaining n-1 tasks)

The total time required is

Tp = (k + (n-1))t

Case 2: Non-pipelined processor

A non-pipelined processor would take time Tnp = nkt

Speedup Factor:

Sk = Tnp / Tp = nkt / ((k + (n-1))t) = nk / (k + (n-1))
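
A quick numeric check of the speedup, taken here as non-pipelined time divided by pipelined time. The values k = 4 and n = 64 are arbitrary, and the function names are illustrative.

```python
def pipelined_cycles(k, n):
    """Np = k + (n - 1): k cycles for the first task, 1 for each later task."""
    return k + (n - 1)


def speedup(k, n):
    """Sk = nk / (k + (n - 1)); the clock period t cancels out."""
    return (n * k) / pipelined_cycles(k, n)


k, n = 4, 64
print(pipelined_cycles(k, n))        # 67
print(round(speedup(k, n), 2))       # 3.82
print(round(speedup(k, 10**6), 2))   # 4.0
```

The last line illustrates that the speedup approaches k as the number of tasks grows, while a single task (n = 1) gains nothing.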


Efficiency & Throughput

Efficiency: It is defined as the speedup divided by the number of stages:

Ek = Sk / k = n / (k + (n-1))

Throughput: It is defined as the number of tasks completed per unit time:

Hk = n / ((k + (n-1))t) = nf / (k + (n-1))

where f = 1/t is the pipeline frequency
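
Efficiency and throughput can be evaluated the same way. A sketch with assumed values (k = 4 stages, n = 64 tasks, t = 10 ns); the function names are illustrative.

```python
def efficiency(k, n):
    """Ek = n / (k + (n - 1))."""
    return n / (k + (n - 1))


def throughput(k, n, t):
    """Hk = n / ((k + (n - 1)) * t), in tasks per unit of t."""
    return n / ((k + (n - 1)) * t)


k, n, t = 4, 64, 10e-9               # t in seconds (assumed)
f = 1.0 / t                          # pipeline frequency

print(f"Ek = {efficiency(k, n):.3f}")
print(f"Hk = {throughput(k, n, t):.3e} tasks/s")
# Hk equals Ek * f, so throughput is the efficiency-scaled clock rate.
print(f"Ek * f = {efficiency(k, n) * f:.3e} tasks/s")
```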


A non-linear pipeline processor has a dynamic pipeline that can be reconfigured to perform different functions at different times.

A dynamic pipeline allows feedback and feedforward connections in addition to the conventional streamline connections.

The output of a non-linear pipeline is not necessarily taken from the last stage.

Source: Kai Hwang


Reservation Tables

Each reservation table describes the evaluation of one function.

The number of columns in a reservation table represents the evaluation time of that function.

A pipeline initiation happens when the input for a function is fed into the pipeline.

Note: A linear pipeline has only a single reservation table.

Source: Kai Hwang


Latency Analysis

The number of time units (clock cycles) between two initiations of a pipeline is called the latency.

Any attempt by two or more initiations to use the same pipeline stage at the same time causes a collision.

Latencies that cause collisions are called forbidden latencies.

Source: Kai Hwang
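
Forbidden latencies can be read off a reservation table: if one stage is used in two different cycles, the distance between those cycles is a latency at which a second initiation would collide with the first. The reservation table figure is not preserved in this transcript, so the table below is a hypothetical three-stage example, and `forbidden_latencies` is an illustrative name.

```python
def forbidden_latencies(table):
    """table maps a stage name to the clock cycles in which it is used.

    Any distance between two marks in the same row is forbidden.
    """
    forbidden = set()
    for cycles in table.values():
        for a in cycles:
            for b in cycles:
                if b > a:
                    forbidden.add(b - a)
    return sorted(forbidden)


# Assumed 8-column table (not taken from the slides).
table = {
    "S1": [1, 6, 8],
    "S2": [2, 4],
    "S3": [3, 5, 7],
}
print(forbidden_latencies(table))  # [2, 4, 5, 7]
```

Every other latency from 1 to 7 (here 1, 3 and 6) is permissible for this table.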


Latency Analysis contd.

A sequence of permissible (non-forbidden) latencies between successive task initiations is called a latency sequence.

A latency sequence that repeats the same subsequence indefinitely is called a latency cycle.

Source: Kai Hwang


Scheduling goal: to obtain the shortest average latency between initiations without causing collisions.

Next, we study a systematic method for achieving collision-free scheduling, based on:

Collision vectors

State diagrams

Simple cycles

Greedy cycles

Minimal average latency (MAL)


Collision Vector

The combined set of permissible and forbidden latencies can be displayed by a collision vector.

It is a binary vector of size 1 x (n-1), where n is the evaluation time:

C = (Cn-1 Cn-2 ... C2 C1)

Ci = 1, if latency i causes a collision

Ci = 0, if latency i is permissible

Examples: Cx = (1011010); Cy = (1010)
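
The two example vectors can be decoded back into latency sets by reading the bits from right to left, since the rightmost bit is C1. A minimal sketch; `decode_collision_vector` is an illustrative name.

```python
def decode_collision_vector(bits):
    """Split latencies 1..m into (forbidden, permissible) lists.

    bits is written left to right as Cm ... C2 C1.
    """
    m = len(bits)
    forbidden, permissible = [], []
    for i in range(1, m + 1):
        c_i = bits[m - i]            # C1 is the last character
        (forbidden if c_i == "1" else permissible).append(i)
    return forbidden, permissible


print(decode_collision_vector("1011010"))  # ([2, 4, 5, 7], [1, 3, 6])
print(decode_collision_vector("1010"))     # ([2, 4], [1, 3])
```

Latencies longer than the vector are not listed because they can never cause a collision.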


State Diagrams

From the collision vector, one can construct a state diagram specifying the permissible state transitions among successive initiations.

The next state at time t+p, where p is a permissible latency, is obtained with the help of a shift register.

Source: Kai Hwang
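
A sketch of the state diagram construction for Cx = (1011010). The slide only mentions a shift register; the rule used here (shift the current state right by p bits, then OR it with the initial collision vector) is the usual construction in Hwang's text and should be read as an assumption. Any latency longer than the vector returns the pipeline to the initial state, which the code records under the single key m+1. The name `state_diagram` is illustrative.

```python
def state_diagram(cv_bits):
    """Map each reachable state to {latency: next_state}, states as bit strings."""
    m = len(cv_bits)
    initial = int(cv_bits, 2)
    diagram = {}
    todo = [initial]
    while todo:
        state = todo.pop()
        if state in diagram:
            continue
        edges = {}
        for p in range(1, m + 1):
            # Bit Cp is the p-th bit from the right; 0 means permissible.
            if not (state >> (p - 1)) & 1:
                edges[p] = (state >> p) | initial
        edges[m + 1] = initial       # stands for every latency above m
        diagram[state] = edges
        todo.extend(edges.values())

    def as_bits(s):
        return format(s, f"0{m}b")

    return {as_bits(s): {p: as_bits(t) for p, t in e.items()}
            for s, e in diagram.items()}


for state, edges in sorted(state_diagram("1011010").items()):
    print(state, edges)
```

For this vector the diagram has three states: 1011010, 1011011 and 1111111.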


Cycles

There are many latency cycles that can be traced from the state diagram.

E.g. (1,8), (1,8,6,8), (3), (6), (3,8), etc.

Among these, only simple cycles are of interest.

A simple cycle is a latency cycle in which each state appears only once.

E.g. (3), (6), (1,8), etc.

Some of these simple cycles are greedy cycles.

A greedy cycle is one whose edges are all made with the minimum latencies from their respective starting states.

E.g. (1,8), (3), etc.
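
Greedy cycles can be found by always following the smallest outgoing latency from each state until a state repeats. The sketch below does this for Cx = (1011010), assuming the shift-right-and-OR transition rule from Hwang's text (the slides do not spell it out). Function names are illustrative. The smallest average latency among the greedy cycles is an upper bound on the MAL.

```python
def next_states(state, initial, m):
    """Permissible latency -> next state; key m+1 stands for any latency > m."""
    edges = {p: (state >> p) | initial
             for p in range(1, m + 1) if not (state >> (p - 1)) & 1}
    edges[m + 1] = initial
    return edges


def greedy_cycle(start, initial, m):
    """Follow minimum-latency edges from start until a state repeats."""
    states, latencies = [], []
    state = start
    while state not in states:
        edges = next_states(state, initial, m)
        p = min(edges)
        states.append(state)
        latencies.append(p)
        state = edges[p]
    cycle = tuple(latencies[states.index(state):])
    # Pick one canonical rotation so (8, 1) and (1, 8) count as the same cycle.
    return min(cycle[i:] + cycle[:i] for i in range(len(cycle)))


def greedy_cycles(cv_bits):
    """Return {greedy cycle: average latency} over all reachable states."""
    m, initial = len(cv_bits), int(cv_bits, 2)
    seen, todo = set(), [initial]
    while todo:
        s = todo.pop()
        if s not in seen:
            seen.add(s)
            todo.extend(next_states(s, initial, m).values())
    cycles = {greedy_cycle(s, initial, m) for s in seen}
    return {c: sum(c) / len(c) for c in cycles}


result = greedy_cycles("1011010")
print(result)                 # cycles (1, 8) and (3,) with averages 4.5 and 3.0
print(min(result.values()))   # 3.0
```

The two cycles found match the greedy cycles listed on the slide, (1,8) and (3).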
