Pipelining and Parallel Processing -...

Post on 30-Jun-2020


Pipelining and Parallel Processing

Shao-Yi Chien

DSP in VLSI Design Shao-Yi Chien 2

Introduction

Pipelining and parallel processing are the most important design techniques in VLSI DSP systems

Both make use of the inherent parallelism of DSP algorithms

Pipelining: different function units working in parallel

Parallel processing: duplicated function units working in parallel


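An Example: Car Production Line

Building one car takes m steps of duration T each, mT in total

Throughput: how many cars are produced in one hour

Latency: how long it takes to produce one car

Single production line: Cost: 1, Throughput: 1/(mT), Latency: mT

Pipelined line with m stages of duration T: Cost: >1, Throughput: 1/T, Latency: >mT

n parallel production lines: Cost: >n, Throughput: n/(mT), Latency: >mT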
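The three sets of figures in the car example can be reproduced with a few lines of Python. This is only an illustrative sketch: the function names are made up for this note, and hand-off overhead between stations is ignored, so the computed latency is the lower bound mT rather than the ">mT" quoted on the slide.

```python
def single_line(m, T):
    """One line builds a whole car: m steps of duration T, one car at a time."""
    return {"throughput": 1 / (m * T), "latency": m * T}


def pipelined_line(m, T):
    """m stations, one step each; once the line is full a car leaves every T."""
    return {"throughput": 1 / T, "latency": m * T}


def parallel_lines(m, T, n):
    """n independent copies of the single line working side by side."""
    return {"throughput": n / (m * T), "latency": m * T}


if __name__ == "__main__":
    m, T, n = 4, 2, 3  # hypothetical numbers: 4 steps, 2 hours per step, 3 lines
    for name, result in [
        ("single", single_line(m, T)),
        ("pipelined", pipelined_line(m, T)),
        ("parallel", parallel_lines(m, T, n)),
    ]:
        print(name, result)
```

Pipelining raises throughput by overlapping different cars in different stations, while the parallel version raises it by duplicating the whole line; neither makes an individual car finish sooner.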


Pipelining of Digital Filters (1/3)

FIR filter

Critical path: TM+2TA


Pipelining of Digital Filters (2/3)

Pipelined FIR filter

$T_{sample} \ge T_M + T_A$

$f_{sample} \le \frac{1}{T_M + T_A}$


Pipelining of Digital Filters (3/3)

Schedule


Pipeline

Can reduce the critical path to increase the working frequency and sample rate

Critical path: TM+2TA → TM+TA


Drawbacks of Pipelining

Increased latency (in cycles)

For an M-level pipelined system, the number of delay elements in any path from input to output is (M-1) greater than in the original system

Increased number of latches (registers)
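Both effects can be seen in a small cycle-by-cycle simulation. The sketch below assumes a 3-tap FIR filter y(n) = a·x(n) + b·x(n-1) + c·x(n-2) and a 2-level pipeline with one latch after the first adder and one extra latch on the path to the third tap; the function names and the placement of the latches are assumptions made for illustration, not something the slides spell out.

```python
def fir_direct(x, a, b, c):
    """Unpipelined 3-tap FIR: two delay registers, critical path TM + 2TA."""
    x1 = x2 = 0  # x(n-1), x(n-2)
    out = []
    for xn in x:
        out.append(a * xn + b * x1 + c * x2)
        x1, x2 = xn, x1
    return out


def fir_pipelined(x, a, b, c):
    """2-level pipelined version: four registers, critical path TM + TA."""
    x1 = x2 = x3 = 0  # x(n-1), x(n-2), x(n-3); x3 is an added pipelining latch
    r = 0             # added pipelining latch holding a*x(n-1) + b*x(n-2)
    out = []
    for xn in x:
        out.append(r + c * x3)   # second adder only sees latched values
        r = a * xn + b * x1      # first adder result is stored for the next cycle
        x1, x2, x3 = xn, x1, x2
    return out


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]
    print(fir_direct(x, 1, 2, 3))
    print(fir_pipelined(x, 1, 2, 3))
```

The pipelined output is the same sequence delayed by one clock cycle (M-1 = 1 extra delay), and the register count grows from two to four.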


How to Do Pipelining?

Put pipelining latches across any feed-forward cutset of the graph

Cutset: a set of edges of a graph such that if these edges are removed from the graph, the graph becomes disjoint

Feed-forward cutset: a cutset in which the data move in the forward direction on all the edges
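The two definitions can be checked mechanically on a small graph. The following Python sketch is an illustration only: the function name `is_feed_forward_cutset`, the node labels and the edge list (a 3-tap FIR signal-flow graph) are assumptions, and the check is deliberately strict in that every cut edge must leave the part of the graph that contains the input.

```python
from collections import defaultdict


def is_feed_forward_cutset(edges, cut, source):
    """Return True if removing `cut` disconnects the graph and every cut edge
    points from the input side to the other side.

    edges  -- list of directed edges (u, v)
    cut    -- set of edges taken from `edges`
    source -- the input node of the graph
    """
    nodes = {n for edge in edges for n in edge}
    neighbours = defaultdict(set)
    for u, v in edges:
        if (u, v) not in cut:          # connectivity ignores edge direction
            neighbours[u].add(v)
            neighbours[v].add(u)

    # Collect everything still reachable from the input.
    reached, stack = {source}, [source]
    while stack:
        node = stack.pop()
        for nxt in neighbours[node]:
            if nxt not in reached:
                reached.add(nxt)
                stack.append(nxt)

    if reached == nodes:
        return False                   # graph is still connected: not a cutset
    return all(u in reached and v not in reached for u, v in cut)


# Hypothetical 3-tap FIR graph: x -> d1 -> d2 is the delay line,
# add1 and add2 are the two adders, y is the output.
FIR_EDGES = [
    ("x", "add1"), ("x", "d1"), ("d1", "add1"), ("d1", "d2"),
    ("d2", "add2"), ("add1", "add2"), ("add2", "y"),
]

if __name__ == "__main__":
    print(is_feed_forward_cutset(FIR_EDGES, {("add1", "add2"), ("d1", "d2")}, "x"))
    print(is_feed_forward_cutset(FIR_EDGES, {("add1", "add2")}, "x"))
```

In the first call the two cut edges split the graph into an input half and an output half with all data moving forward, so latches may be placed there; the second call removes a single edge, which leaves the graph connected and is therefore not a cutset.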


How to Do Pipelining?

Feed-forward cutset


Example

In the SFG in Fig. (a), the computation time of every node is 1 u.t.

(a) Calculate the critical path

(b) The critical path is reduced to 2 u.t. by inserting 3 extra delay elements. Is it a valid pipelining?


Fine-Grain Pipelining

Critical path (TM=10, TA=2)

Original: TM+2TA=14

Pipelined: TM+TA=12

Fine-grain pipelined: TM1=6 or TM2+TA=6


Notes for Pipelining (1/2)

Pipelining is a very simple design technique which maintains the input/output data configuration and the sampling frequency

Tclk=Tsample

Supported in many EDA tools

Still has some limitations:

Pipeline bubbles

Has some problems for recursive systems

Introduces large hardware cost for 2-D or 3-D data

Communication bound


Notes for Pipelining (2/2)

Effective pipelining: put pipelining registers on the critical path

Balanced pipelining: split the path into stages of equal delay

A 10 u.t. path split as 2+8: critical path=8

A 10 u.t. path split as 5+5: critical path=5
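The balance rule amounts to saying that the clock period of a pipeline is set by its slowest stage. A minimal Python sketch of that idea follows; the helper names `clock_period` and `best_split` are invented here, and the brute-force search is only meant for short chains of operations.

```python
from itertools import combinations


def clock_period(stages):
    """Critical path of a pipeline: the total delay of its slowest stage."""
    return max(sum(stage) for stage in stages)


def best_split(delays, n_stages):
    """Try every placement of n_stages - 1 registers in a chain of operations
    and return the split with the shortest critical path."""
    best = None
    for cuts in combinations(range(1, len(delays)), n_stages - 1):
        bounds = (0, *cuts, len(delays))
        stages = [delays[i:j] for i, j in zip(bounds, bounds[1:])]
        if best is None or clock_period(stages) < clock_period(best):
            best = stages
    return best


if __name__ == "__main__":
    print(clock_period([[2], [8]]))   # unbalanced split of a 10 u.t. path
    print(clock_period([[5], [5]]))   # balanced split of the same path
    # Multiplier parts of 6 and 4 u.t. followed by a 2 u.t. adder, two stages.
    print(best_split([6, 4, 2], 2))
```

With the fine-grain numbers used earlier (multiplier parts of 6 and 4 u.t., adder of 2 u.t.), the search places the register inside the multiplier, which gives two stages of 6 u.t. each.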


Parallel of Digital Filters (1/5)

Single-input single-output (SISO) system

Multiple-input multiple-output (MIMO) system

3-Parallel System!


Parallel of Digital Filters (2/5)

Parallel processing is also called block processing

Block size (L): the number of data samples processed at the same time

Block delay (L-slow): a latch in the parallel system is equivalent to L clock cycles at the sample rate


Parallel of Digital Filters (3/5)

The critical path is the same

Tclk is not equal to Tsample: Tsample=Tclk/L

L-slow
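A short simulation makes the block structure concrete. The sketch below assumes a 3-tap FIR filter y(n) = a·x(n) + b·x(n-1) + c·x(n-2) turned into a 3-parallel (L = 3) system; the function names are illustrative, and each pass through the loop in `fir_3_parallel` stands for one clock cycle in which three samples are consumed and three outputs are produced.

```python
def fir_serial(x, a, b, c):
    """Reference SISO filter: one sample per clock cycle."""
    x1 = x2 = 0  # x(n-1), x(n-2)
    out = []
    for xn in x:
        out.append(a * xn + b * x1 + c * x2)
        x1, x2 = xn, x1
    return out


def fir_3_parallel(x, a, b, c):
    """MIMO version with block size L = 3: three samples per clock cycle."""
    L = 3
    if len(x) % L:
        raise ValueError("input length must be a multiple of the block size")
    prev1 = prev2 = 0  # x(3k-1), x(3k-2), held in block-delay latches
    out = []
    for k in range(0, len(x), L):
        x0, x1, x2 = x[k:k + L]             # x(3k), x(3k+1), x(3k+2)
        y0 = a * x0 + b * prev1 + c * prev2  # y(3k)
        y1 = a * x1 + b * x0 + c * prev1     # y(3k+1)
        y2 = a * x2 + b * x1 + c * x0        # y(3k+2)
        out += [y0, y1, y2]
        prev1, prev2 = x2, x1                # one latch update = 3 sample delays
    return out


if __name__ == "__main__":
    x = list(range(1, 10))
    print(fir_serial(x, 1, 2, 3))
    print(fir_3_parallel(x, 1, 2, 3))
```

Each output still needs one multiplication and two additions on its longest path, so the critical path is unchanged, but because three samples are handled per clock the clock period can be three times the sample period.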


Parallel of Digital Filters (4/5)

Whole system


Parallel of Digital Filters (5/5)

Serial-to-parallel converter

Parallel-to-serial converter


Parallel Processing vs. Pipelining

Parallel processing is superior to pipelining when the system is limited by the I/O bottleneck (communication bounded)

Pipelining can only increase the clock rate

System clock rate = sample rate for a pipelined system

Using parallel processing can further lower the required working frequency


Pipelining-Parallel Architecture


Notes for Parallel Processing

The input/output data access scheme should be carefully designed; it can sometimes be costly

Tclk>Tsample, fclk<fsample

Large hardware cost

Can be combined with pipelining


Low Power Issues

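Pipelining and parallel processing are also beneficial for low power design

Propagation delay: $T_{pd} = \frac{C_{charge} V_0}{k (V_0 - V_t)^2}$

Power consumption: $P = C_{total} V_0^2 f$

($V_0$: supply voltage, $V_t$: threshold voltage, $C_{charge}$: capacitance charged or discharged in one clock cycle, $C_{total}$: total capacitance, $f$: clock frequency, $k$: a technology-dependent constant)

Assume the sampling frequency is the same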


Pipelining for Low Power (1/2)

For M-level pipelining

Critical path is reduced to 1/M

The capacitance to be charged is also reduced to Ccharge/M

The supply voltage can be reduced to $\beta V_0$ (with $0 < \beta < 1$ and $V_0$ the original supply voltage), and the propagation delay remains unchanged


Pipelining for Low Power (2/2)

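Power consumption: $P_{pip} = C_{total} \beta^2 V_0^2 f = \beta^2 P_{seq}$, where $\beta V_0$ is the reduced supply voltage

How about the parameter $\beta$?

$T_{seq} = \frac{C_{charge} V_0}{k (V_0 - V_t)^2}$, $T_{pip} = \frac{(C_{charge}/M) \beta V_0}{k (\beta V_0 - V_t)^2}$

Solve the equation $T_{seq} = T_{pip}$ to get $M (\beta V_0 - V_t)^2 = \beta (V_0 - V_t)^2$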
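For reference, the condition for $\beta$ can be written out as an ordinary quadratic. The expansion below is a sketch under the usual delay model $T = C_{charge} V / (k (V - V_t)^2)$, with $V_0$ the original supply voltage, $V_t$ the threshold voltage and $\beta V_0$ the reduced supply; the shorthand $b$ is introduced here only for readability.

```latex
\begin{aligned}
M(\beta V_0 - V_t)^2 &= \beta (V_0 - V_t)^2 \\
M V_0^2\,\beta^2 - \left[\,2 M V_0 V_t + (V_0 - V_t)^2\,\right]\beta + M V_t^2 &= 0 \\
\beta &= \frac{b \pm \sqrt{b^2 - 4 M^2 V_0^2 V_t^2}}{2 M V_0^2},
\qquad b = 2 M V_0 V_t + (V_0 - V_t)^2
\end{aligned}
```

The product of the two roots is $(V_t / V_0)^2$, so whenever the roots differ, one of them gives $\beta V_0 < V_t$. That root would put the supply below the threshold voltage and has to be discarded.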


Example

Parameters:

TM=10 u.t.

TA=2 u.t.

Tm1=6 u.t.

Tm2=4 u.t.

CM=5CA

Vt=0.6V

Normal Vcc=5V

(a) New supply voltage?

(b) Power saving percentage?


Answer

(a)

Original system:

Pipelined system:

(b)

Note: one root is an invalid value, because it gives a supply voltage less than the threshold voltage
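The numbers for this example can be worked out with a short script. The sketch below rests on assumptions that the transcript does not state: the original system charges C_M + C_A = 6C_A per clock, and the fine-grain pipelined system charges 3C_A (the multiplier capacitance is taken to split in proportion to computation time, so Cm1 = 3C_A and Cm2 + C_A = 3C_A). With the clock period unchanged this gives the condition 2(βV0 - Vt)² = β(V0 - Vt)². The function names are illustrative.

```python
import math


def solve_beta(ratio, v0, vt):
    """Both roots of ratio*(beta*v0 - vt)**2 = beta*(v0 - vt)**2, smallest first."""
    a = ratio * v0 ** 2
    b = -(2 * ratio * v0 * vt + (v0 - vt) ** 2)
    c = ratio * vt ** 2
    root = math.sqrt(b * b - 4 * a * c)
    return sorted([(-b - root) / (2 * a), (-b + root) / (2 * a)])


def valid_betas(ratio, v0, vt):
    """Keep only roots whose scaled supply beta*v0 stays above the threshold."""
    return [beta for beta in solve_beta(ratio, v0, vt) if beta * v0 > vt]


if __name__ == "__main__":
    V0, VT = 5.0, 0.6
    ratio = 6 / 3  # assumed charging capacitance: 6*C_A before, 3*C_A after
    (beta,) = valid_betas(ratio, V0, VT)
    print(f"beta = {beta:.4f}")
    print(f"new supply voltage = {beta * V0:.2f} V")
    print(f"power saving = {(1 - beta ** 2) * 100:.1f} %")
```

Under these assumptions the valid root is β of about 0.603, which corresponds to a supply of roughly 3.02 V and a power saving of about 63.6 %. The other root, about 0.024, would mean a supply near 0.12 V, well below the 0.6 V threshold.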


Parallel Processing for Low Power (1/2)

For L-parallel system

Clock period: Tseq → LTseq

Ccharge remains unchanged

Ctotal → LCtotal

There is more time to charge the capacitance, so the supply voltage can be lowered


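Parallel Processing for Low Power (2/2)

Power consumption: $P_{par} = (L C_{total}) (\beta V_0)^2 \frac{f}{L} = \beta^2 P_{seq}$, where $\beta V_0$ is the reduced supply voltage

To derive the parameter $\beta$: $L (\beta V_0 - V_t)^2 = \beta (V_0 - V_t)^2$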


Example

Parameters:

TM=8 u.t.

TA=1 u.t.

Tsample=9 u.t.

CM=8CA

Vt=0.45V

Normal Vcc=3.3V

(a) New supply voltage?

(b) Power saving percentage?


Answer

(a)

Original system:

2-parallel system:

(b)

Note: one root is an invalid value, because it gives a supply voltage less than the threshold voltage
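This example can also be solved numerically, by lowering the supply until the slower circuit just fills the longer clock period. The sketch below uses the delay model T = C·V / (k·(V - Vt)²) and two assumptions about the missing figures: the original critical path is one multiplier plus one adder (C_M + C_A = 9C_A, consistent with Tsample = TM + TA = 9 u.t.), and the 2-parallel critical path is one multiplier plus two adders (C_M + 2C_A = 10C_A). The function names and the choice of bisection are illustrative.

```python
def delay(c_charge, v, vt, k=1.0):
    """Propagation delay model: T = C*V / (k*(V - Vt)^2), valid for V > Vt."""
    return c_charge * v / (k * (v - vt) ** 2)


def scaled_supply(c_orig, c_new, slack, v0, vt, iterations=100):
    """Find V in (vt, v0] such that delay(c_new, V) = slack * delay(c_orig, v0).

    The delay falls monotonically as V rises above vt, so bisection applies.
    """
    target = slack * delay(c_orig, v0, vt)
    if delay(c_new, v0, vt) > target:
        raise ValueError("new circuit is too slow even at the full supply")
    lo, hi = vt + 1e-9, v0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if delay(c_new, mid, vt) > target:
            lo = mid   # still too slow: the supply must be higher
        else:
            hi = mid   # fast enough: try a lower supply
    return hi


if __name__ == "__main__":
    V0, VT, L = 3.3, 0.45, 2
    # Capacitances in units of C_A; both values are assumptions (see above).
    v_new = scaled_supply(c_orig=9, c_new=10, slack=L, v0=V0, vt=VT)
    beta = v_new / V0
    print(f"new supply voltage = {v_new:.3f} V, beta = {beta:.4f}")
    print(f"power saving = {(1 - beta ** 2) * 100:.1f} %")
```

Under these assumptions the search settles at about 2.17 V (β of about 0.659), for a power saving of roughly 56.6 %. Restricting the search to voltages above Vt is what excludes the second, invalid root of the quadratic.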


Pipelining-Parallel for Low Power


Conclusion

Pipelining

Reduce the critical path

Increase the clock speed

Reduce power consumption at the same speed

Parallel processing

Increase effective sampling rate

Reduce power consumption at the same speed