Low Power IP Design Methodology for Rapid Development of DSP Intensive SOC Platforms
T. Arslan, A.T. Erdogan, S. Masupe, C. Chun-Fu, D. Thompson
Contents
• Introduction to power consumption
• Introduction to Main Concepts
• Low Power Design Methodology
• IP implementations
• Results and conclusions
Power Consumption in CMOS-Based DSP Systems
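[Figure: CMOS inverter with supply Vdd, input Vin, output Vout and load capacitance C_L, showing the short-circuit current I_sc and the dynamic current I_dy, with waveforms of Vin, Vout, I_sc and I_dy against time t.]
$I_{dd} = I_{sc} + I_{dy}$
$P_{ave} = k \cdot C \cdot V_{dd}^{2} \cdot f + I_{sc} \cdot V_{dd} + I_{l} \cdot V_{dd}$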
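To make the three terms of the average-power expression concrete, the sketch below evaluates the usual CMOS form, dynamic power k·C·Vdd²·f plus the short-circuit and leakage contributions I_sc·Vdd and I_l·Vdd. The function name, parameter names and numeric values are illustrative assumptions, not figures from the slides.

```python
def average_power(k, c_farads, vdd, f_hz, i_sc, i_leak):
    """Average CMOS power in watts, assuming P = k*C*Vdd^2*f + I_sc*Vdd + I_l*Vdd."""
    dynamic = k * c_farads * vdd ** 2 * f_hz   # switching (dynamic) term
    short_circuit = i_sc * vdd                 # short-circuit current term
    leakage = i_leak * vdd                     # leakage current term
    return dynamic + short_circuit + leakage


# Hypothetical operating point: activity factor 0.5, 1 pF, 1 MHz clock.
p_high = average_power(0.5, 1e-12, 5.0, 1e6, 0.0, 0.0)
p_low = average_power(0.5, 1e-12, 2.5, 1e6, 0.0, 0.0)
print(p_high / p_low)  # ratio of dynamic power when Vdd is halved
```

Because the dynamic term is quadratic in Vdd and linear in the effective switched capacitance k·C, the two reduction routes named on the next slide (lower C*, lower Vdd) act on the same term.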
Power Reduction Methods
Reduce C* = k.C
Reduce Vdd
• Supply Voltage Reduction
• Clock Gating
Disadvantage:
• Added design effort
Common Approaches to Low Power Design
Systematic Low Power Design Approach
Exploit correlations and redundancies within an algorithm, then map the result to hardware.
[Diagram. Block labels: Verilog/VHDL; DSP Algorithm Library; Performance Criteria; Block, Segmentation, etc.; Multiplier SC, Bus SC; CAD Synthesis; Component Library; Ordering algorithm; Data representation; Netlist.]
Systematic Design Implementation Framework
[Flowchart. Block labels: Synthesis (Buildgate); System Design (Verilog); Verification (Behavioural Simulation); Technology-Specific Netlist; Verification (Gate-Level Simulation); Verification (Post-Layout Simulation); Floorplanning, Placement & Routing (Silicon Ensemble); I/O Pads Placement; Tape-out Verification (Dracula DRC/ERC/LVS); System Specifications; Layout.]
Design Flow for Filter IPs
Typical Single Multiplier DSP Processor Architecture
[Block diagram. Labels: Multiplier; Adder; Output register; Control; ADC input x(n); DAC output y(n); Data bus; Coefficient bus; Data memory; Coefficient memory; Multiplier-accumulator (MAC).]
Transpose Direct Form (TDF) FIR Structure
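[Signal-flow diagram: the input x(n) feeds the multipliers h(0), h(1), h(2), ..., h(N-1) of stage 0 to stage N-1; the stage values PCV_0(n), PCV_1(n), PCV_2(n), ..., PCV_{N-1}(n) are chained through z^{-1} delays to the output y(n).]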
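The transposed direct form can be sketched in a few lines. The code below assumes that PCV_k(n) denotes the partial sum held by stage k, updated as PCV_k(n) = h(k)·x(n) + PCV_{k+1}(n-1), with y(n) = PCV_0(n). That indexing is an interpretation of the diagram labels, and the function name is hypothetical.

```python
def tdf_fir(coeffs, samples):
    """Transposed direct form FIR: every stage sees the current input x(n)."""
    n_taps = len(coeffs)
    pcv = [0] * n_taps            # pcv[k]: partial sum of stage k from the previous sample
    outputs = []
    for x in samples:
        new_pcv = [0] * n_taps
        for k in range(n_taps):
            # Delayed partial sum from the next stage (zero for the last stage).
            carry = pcv[k + 1] if k + 1 < n_taps else 0
            new_pcv[k] = coeffs[k] * x + carry
        pcv = new_pcv
        outputs.append(pcv[0])    # y(n) is the partial sum of stage 0
    return outputs


print(tdf_fir([1, 2, 3], [1, 0, 0, 0]))  # impulse response reproduces the coefficients
```

Since each input sample is multiplied by all N coefficients before the next sample arrives, the order in which the coefficients are fetched within one sample period is free, which is the property the coefficient ordering step relies on.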
[Block diagram. Labels: Multiplier; Adder; Control; ADC input x(n); DAC output y(n); Data bus I; Data bus II; Coefficient bus; PCVM; Coefficient memory.]
Modified DSP Processor Architecture for TDF FIR Filter Implementation
Coefficient Memory Configuration with Coefficient Ordering
Order coefficients such that adjacent coefficients are highly correlated.
Flow: Filter Specifications -> Filter Design (Matlab) -> Coefficient Set -> Coefficient Ordering (C Routine) -> Ordered Coefficient Set -> Memory Configuration (C Routine) -> Coefficient Words
Coefficient Word: h(k) | PCVMA | SF
SF : Shift Flag (SF = 1: shift, SF = 0: no shift)
PCVMA : Pre-Calculated Value Memory Address
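One way to read "highly correlated" is "few bit differences between consecutive words on the coefficient bus". The sketch below uses a greedy nearest-neighbour ordering on the Hamming distance of 16-bit two's complement words. This is an assumed stand-in for the C routine mentioned in the flow, which the slides do not describe; all names are hypothetical.

```python
def hamming_distance(a, b, width=16):
    """Number of differing bits between two words in width-bit two's complement."""
    mask = (1 << width) - 1
    return bin((a ^ b) & mask).count("1")


def order_coefficients(coeffs, width=16):
    """Greedy ordering: always fetch next the coefficient closest to the last one."""
    remaining = list(range(len(coeffs)))
    order = [remaining.pop(0)]            # start from the first coefficient
    while remaining:
        last = coeffs[order[-1]]
        nxt = min(remaining, key=lambda j: hamming_distance(last, coeffs[j], width))
        remaining.remove(nxt)
        order.append(nxt)
    return order                          # original stage index of each fetch


def total_toggles(coeffs, order, width=16):
    """Bus transitions when coefficients are fetched in the given order."""
    return sum(hamming_distance(coeffs[a], coeffs[b], width)
               for a, b in zip(order, order[1:]))


coeffs = [3, -4, 2, -3]
order = order_coefficients(coeffs)
print(order, total_toggles(coeffs, [0, 1, 2, 3]), total_toggles(coeffs, order))
```

The returned list keeps the original stage index of every reordered coefficient, which is the kind of bookkeeping an address field such as PCVMA would need to carry. A greedy pass is not guaranteed to find the best ordering; it is only a simple illustration.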
Filter Specifications
Lowpass filter specifications
Filter #  Passband (kHz)  Stopband (kHz)  Passband ripple (dB)  Stopband attenuation (dB)  Window function  Filter length
1         0 - 1.5         2 - 4           0.1                   50                         Hamming          53
2         0 - 1.2         1.7 - 5         0.01                  40                         Kaiser           71
3         0 - 3.375       5.625 - 10      0.002                 90                         -                42
4         0 - 1           1.5 - 5         0.0135                56                         -                61
5         0 - 1.5         2 - 4           0.1                   50                         Blackman         89
Bandpass filter specifications
Filter #  Stopband (kHz)  Passband (kHz)   Stopband (kHz)  Passband ripple (dB)  Stopband attenuation (dB)  Window function  Filter length
1         0 - 0.1         0.15 - 0.25      0.3 - 0.5       0.1                   60                         Kaiser           73
2         0 - 0.45        0.9 - 1.1        1.55 - 7.5      0.8                   30                         -                34
3         0 - 5           8 - 12           15 - 44.14      0.00868               60                         Kaiser           54
4         0 - 1           2 - 3.5          4.25 - 5        0.13                  56.4                       -                32
5         0 - 0.1         1.375 - 3.625    4 - 5           0.1                   68.4                       -                80
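As an illustration of how a coefficient set such as lowpass filter 1 (passband 0 - 1.5 kHz, stopband 2 - 4 kHz, Hamming window, length 53) could be produced and quantised, the sketch below uses a plain windowed-sinc design. The slides only state that Matlab was used, so this is not the authors' procedure; the 8 kHz sampling rate, the 1.75 kHz cutoff (midpoint of the transition band) and the function names are all assumptions.

```python
import math


def hamming_lowpass(num_taps, cutoff_hz, fs_hz):
    """Windowed-sinc lowpass prototype with a Hamming window, unity DC gain."""
    fc = cutoff_hz / fs_hz                      # cutoff in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        t = n - mid
        ideal = 2 * fc if t == 0 else math.sin(2 * math.pi * fc * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    gain = sum(taps)
    return [h / gain for h in taps]             # normalise so the taps sum to 1


def quantise(taps, wordlength=16):
    """Round coefficients to signed integers of the given wordlength."""
    scale = 2 ** (wordlength - 1) - 1
    return [round(h * scale) for h in taps]


taps = hamming_lowpass(53, 1750.0, 8000.0)      # assumed fs and cutoff
words = quantise(taps, 16)
print(len(words), max(words), min(words))
```

The integer words are what would then be passed to the ordering and memory-configuration steps.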
[Bar chart: switched capacitance (pF), axis 0 to 5000, for IP1, IP2 and IP3, broken down into PCVM bus, coefficient bus, data bus and multiplier. Annotated reductions: 25% and 54%.]
Power Reductions Achieved (wordlength = 16 bit)
Power Reductions for IP4 (wordlength = 16 bit)
[Bar chart: switched capacitance (pF), axis 0 to 900, against block size (1, 2, 4, 8, 16), broken down into coefficient bus, data bus and multiplier. Annotated reductions: 40%, 42%, 50% and 53%.]
Reductions in Number of Memory Accesses (%)
[Bar chart: reduction (%), axis 0 to 100, against block size (2, 4, 8, 16), for data memory and coefficient memory.]
[Block diagram. Labels: Coefficient Set; Coefficient Set1; Coefficient Set2; Data Set; Shifter; Multiplier; Adder; Output.]
Coefficient Segmentation Algorithm
Coefficient Segmentation Algorithm for Two’s Complement Coding
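[Flowchart; the branch routing is not recoverable from the transcript. Recoverable steps:]
Begin
$H = (h_0, h_1, \ldots, h_{L-1})$; $i = 0$, $k = 0$
Decision: $2^i \ge h_k$ ? with loop step $i = i + 1$
Decision: $h_k \le 0$ ?
Decision: $2^i \ne h_k$ ?
Assignments: $s_k = 2^{i-1}$; $s_k = -2^i$; $m_k = h_k - s_k$; or $m_k = 0$, $s_k = h_k$
Loop: $k = k + 1$, $i = 0$; Decision: $k > L - 1$ ?
End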
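One plausible reading of this flowchart is sketched below: each coefficient h_k is split as h_k = s_k + m_k, where s_k is a signed power of two (handled by a shift) and m_k is a small non-negative remainder (handled by the multiplier). The branch routing and the use of |h_k| in the first comparison are assumptions, since the transcript loses the arrows and any absolute-value bars; the function name is hypothetical.

```python
def segment_twos_complement(h):
    """Split integer h into (s, m) with h == s + m and m >= 0 (assumed reading)."""
    i = 0
    while (1 << i) < abs(h):      # smallest i with 2^i >= |h| (abs is an assumption)
        i += 1
    if h <= 0:
        s = -(1 << i)             # s_k = -2^i, so the remainder is non-negative
    elif (1 << i) != h:
        s = 1 << (i - 1)          # s_k = 2^(i-1), largest power of two below h
    else:
        s = h                     # h is itself a power of two: m_k = 0
    m = h - s
    return s, m


for h in (5, -5, 8, -8):
    print(h, segment_twos_complement(h))
```

Under this reading the multiplier only ever sees small positive values, whose two's complement words have leading zeros rather than leading ones.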
Coefficient Segmentation Algorithm for Sign-Magnitude Coding
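[Flowchart; the branch routing is not recoverable from the transcript. Recoverable steps:]
Begin
$H = (h_0, h_1, \ldots, h_{L-1})$; $i = 0$, $k = 0$
Decision: $2^i \ge h_k$ ? with loop step $i = i + 1$
Assignments: $m_k = h_k - 2^{i-1}$, $s_k = 2^{i-1}$
Decision: $|h_k - 2^i| < |h_k - 2^{i-1}|$ ? then $m_k = h_k - 2^i$, $s_k = 2^i$
Decision: $h_k < 0$ ? then $m_k = -m_k$, $s_k = -s_k$
Loop: $k = k + 1$, $i = 0$; Decision: $k > L - 1$ ?
End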
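For sign-magnitude coding, the flowchart appears to pick the power of two nearest to the coefficient magnitude and to restore the sign at the end. The sketch below follows that reading. Working on |h_k| and comparing absolute distances are assumptions (the bars are lost in the transcript), as is treating 2^(i-1) as 0 when i = 0; the function name is hypothetical.

```python
def segment_sign_magnitude(h):
    """Split integer h into (s, m) with h == s + m, s a signed power of two or 0."""
    mag = abs(h)
    i = 0
    while (1 << i) < mag:          # smallest i with 2^i >= |h|
        i += 1
    upper = 1 << i                 # 2^i
    lower = upper >> 1             # 2^(i-1); becomes 0 when i == 0 (assumption)
    # Default to 2^(i-1); switch to 2^i only if it is strictly closer.
    s = upper if abs(mag - upper) < abs(mag - lower) else lower
    m = mag - s
    if h < 0:                      # restore the sign on both parts
        s, m = -s, -m
    return s, m


for h in (5, 7, -7, 6):
    print(h, segment_sign_magnitude(h))
```

Here m_k may be negative, which is cheap in sign-magnitude form because only the sign bit differs from the corresponding positive value.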
[Block diagram. Labels: MSB (coefficient); (two’s complement); (sign magnitude); Multiplier (two’s); Add/Sub; Acc; Control; Coefficient Memory; Data Memory; Output.]
Simplified Filter Architecture for Mixed-Mode Multiplication
[Block diagram. Labels: (sign magnitude), twice; Multiplier (sign); Add; Acc; Control; Coefficient Memory; Data Memory; Sign two’s; Output.]
Simplified Filter Architecture for Sign-Magnitude Multiplication
[Bar chart: #Transitions/sample, axis 0 to 45, against bit position (b0 to b14), comparing conventional and segmentation.]
Example Switching Activity Distribution with Two’s Complement Coding (N=89, W=16)
[Bar chart: #Transitions/sample, axis 0 to 50, against bit position (b0 to b14), comparing conventional and segmentation.]
Example Switching Activity Distribution with Sign-Magnitude Coding (N=89, W=16)
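A per-bit switching activity distribution of this kind can be obtained by counting, for each bit position, how often that bit changes between consecutive words presented to the multiplier. The sketch below is a minimal version for two's complement words. The function name and the toy sequences are illustrative assumptions and are unrelated to the N=89, W=16 filter measured in the slides.

```python
def transitions_per_bit(words, width=16):
    """Count bit toggles at each position (index 0 = b0) across consecutive words."""
    mask = (1 << width) - 1
    counts = [0] * width
    for prev, cur in zip(words, words[1:]):
        toggled = (prev ^ cur) & mask          # 1 where the two words differ
        for bit in range(width):
            counts[bit] += (toggled >> bit) & 1
    return counts


# Toy data: coefficients that alternate in sign versus small positive remainders.
conventional = transitions_per_bit([5, -5, 5, -5], width=8)
segmented = transitions_per_bit([1, 3, 1, 3], width=8)
print(sum(conventional), sum(segmented))
```

Alternating signs toggle nearly every upper bit in two's complement, whereas small positive remainders confine the activity to the low-order bits.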
Multiplier size  Algorithm     Two’s complement           Mixed mode                 Sign-magnitude
                               swcap/sample  Reduction    swcap/sample  Reduction    swcap/sample  Reduction
                               (pF)          (%)          (pF)          (%)          (pF)          (%)
8-bit            conventional  497           -            294           -            162           -
8-bit            segmentation  236           52.52        222           24.49        81            50.00
16-bit           conventional  3862          -            2511          -            2173          -
16-bit           segmentation  2058          46.71        1806          28.08        1452          33.18
24-bit           conventional  14795         -            12281         -            11458         -
24-bit           segmentation  11051         25.31        10283         16.27        9367          18.25
Power Reductions Achieved with Coefficient Segmentation
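The reduction figures in the table follow from the switched-capacitance values as (conventional - segmentation) / conventional. The short check below recomputes three of them; the function name is an illustrative assumption.

```python
def reduction_percent(conventional, segmented):
    """Relative saving in switched capacitance, in percent."""
    return 100.0 * (conventional - segmented) / conventional


# (conventional, segmentation) pairs in pF, taken from the table.
for label, conv, seg in [("8-bit two's complement", 497, 236),
                         ("16-bit two's complement", 3862, 2058),
                         ("24-bit sign-magnitude", 11458, 9367)]:
    print(label, round(reduction_percent(conv, seg), 2))
```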
Power Reduction in Multiplier Circuit (wordlength = 16 bit)
[Bar chart: switched capacitance (pF), axis 0 to 4000, against data representation (twos, mixed, sign), comparing conventional and segmentation. Annotated reductions: 47%, 35%, 53%, 44% and 62%.]
[Bar chart: switched capacitance (pF), axis 0 to 4000, against data representation (twos, mixed, sign), broken down into multiplier and shifter. Annotated reductions: 46%, 35%, 51%, 44% and 61%.]
Power Reduction (wordlength = 16 bit)
Power Reduction at Coefficient Bus (wordlength = 16 bit)
[Bar chart: switched capacitance (pF), axis 0 to 400, against data representation (twos, mixed, sign), comparing conventional and segmentation. Annotated reductions: 49%, 37%, 54%, 37% and 54%.]
[Bar chart: area, axis 0 to 16000, against wordlength (8-bit, 16-bit, 24-bit), for IP1, IP2 and IP3.]
Area Comparison
Conclusions
• A methodology for low power implementation of DSP functions has been presented.
• The methodology has been used to develop a number of IPs.
• Significant reductions in power are reported.
• Power reduction is achieved in the multiplier and the system buses.
• The methodology can be used for prototyping other DSP functions.