Chapter 4 Low-Power VLSI Design
Transcript of Chapter 4 Low-Power VLSI Design
![Page 1: Chapter 4 Low-Power VLSI Design](https://reader030.fdocuments.in/reader030/viewer/2022012605/619a162b86cb2e037d1bdd93/html5/thumbnails/1.jpg)
Chapter 4Low-Power VLSI Design
Jin-Fu Li Advanced Reliable Systems (ARES) Lab.
Department of Electrical EngineeringNational Central University
Jhongli, Taiwan
![Page 2: Chapter 4 Low-Power VLSI Design](https://reader030.fdocuments.in/reader030/viewer/2022012605/619a162b86cb2e037d1bdd93/html5/thumbnails/2.jpg)
2EE4012VLSI DesignNational Central University
Outline
• Introduction
• Low-Power Gate-Level Design
• Low-Power Architecture-Level Design
• Algorithmic-Level Power Reduction
• RTL Techniques for Optimizing Power
![Page 3: Chapter 4 Low-Power VLSI Design](https://reader030.fdocuments.in/reader030/viewer/2022012605/619a162b86cb2e037d1bdd93/html5/thumbnails/3.jpg)
3EE4012VLSI DesignNational Central University
Introduction • Most SOC design teams now regard power as one
of their top design concerns
• Why low-power design? Battery lifetime (especially for portable devices) Reliability
• Power consumption Peak power Average power
![Page 4: Chapter 4 Low-Power VLSI Design](https://reader030.fdocuments.in/reader030/viewer/2022012605/619a162b86cb2e037d1bdd93/html5/thumbnails/4.jpg)
4EE4012VLSI DesignNational Central University
Overview of Power Consumption • Average power consumption
Dynamic power consumption Short-circuit power consumption Leakage power consumption Static power consumption
• Dynamic power dissipation during switching
Cinput
CdrainCinput
interconnect
![Page 5: Chapter 4 Low-Power VLSI Design](https://reader030.fdocuments.in/reader030/viewer/2022012605/619a162b86cb2e037d1bdd93/html5/thumbnails/5.jpg)
5EE4012VLSI DesignNational Central University
Overview of Power Consumption • Generic representation of a CMOS logic gate for
switching power calculation
inputerconnectdrain CCC int
pMOS network
nMOS network
Vout
2/
0 2/]))(()([1 T T
Tout
loadoutDDout
loadoutavg dtdt
dVCVVdtdt
dVCVT
P
VA
VB
VA
VB
![Page 6: Chapter 4 Low-Power VLSI Design](https://reader030.fdocuments.in/reader030/viewer/2022012605/619a162b86cb2e037d1bdd93/html5/thumbnails/6.jpg)
6EE4012VLSI DesignNational Central University
Overview of Power Consumption • The average power consumption can be expressed
as
• The node transition rate can be slower than the clock rate. To better represent this behavior, a node transition factor ( ) should be introduced
• The switching power expressed above are derived by taking into account the output node load capacitance
CLKDDloadDDloadavg fVCVCT
P 221
CLKDDloadTavg fVCP 2T
![Page 7: Chapter 4 Low-Power VLSI Design](https://reader030.fdocuments.in/reader030/viewer/2022012605/619a162b86cb2e037d1bdd93/html5/thumbnails/7.jpg)
7EE4012VLSI DesignNational Central University
Overview of Power Consumption
Cload
Cinternal
Vinternal
VA
VB
VA VB
Vout
Vinternal
VA
VB
Vout
CLKDD
ofnodes
iiiTiavg fVVCP
#
1
The generalized expression for the average power dissipation can be rewritten as
![Page 8: Chapter 4 Low-Power VLSI Design](https://reader030.fdocuments.in/reader030/viewer/2022012605/619a162b86cb2e037d1bdd93/html5/thumbnails/8.jpg)
8EE4012VLSI DesignNational Central University
Gate-Level Design – Technology Mapping
• The objective of logic minimization is to reduce the boolean function.
• For low-power design, the signal switching activity is minimized by restructuring a logic circuit
• The power minimization is constrained by the delay, however, the area may increase.
• During this phase of logic minimization, the function to be minimized is
i
iii CPP )1(
![Page 9: Chapter 4 Low-Power VLSI Design](https://reader030.fdocuments.in/reader030/viewer/2022012605/619a162b86cb2e037d1bdd93/html5/thumbnails/9.jpg)
9EE4012VLSI DesignNational Central University
Gate-Level Design – Technology Mapping
• The first step in technology mapping is to decompose each logic function into two-input gates
• The objective of this decomposition is to minimize the total power dissipation by reducing the total switching activity
0384.0
ABCD
A 0.2
B 0.2
C 0.5D 0.5
A 0.2
B 0.2
C 0.5D 0.5
0099.00196.0
0099.0
1875.0
0384.0
![Page 10: Chapter 4 Low-Power VLSI Design](https://reader030.fdocuments.in/reader030/viewer/2022012605/619a162b86cb2e037d1bdd93/html5/thumbnails/10.jpg)
10EE4012VLSI DesignNational Central University
Gate-Level Design – Phase Assignment
A
B
High activity node
C
A
B
High activity node
C
![Page 11: Chapter 4 Low-Power VLSI Design](https://reader030.fdocuments.in/reader030/viewer/2022012605/619a162b86cb2e037d1bdd93/html5/thumbnails/11.jpg)
11EE4012VLSI DesignNational Central University
Gate-Level Design – Pin Swapping
b
a
d
c
ba dc
b
a
d
c
ba dc
Sw
itching activity
Sw
itching activity
ba
dc b
a
dc
![Page 12: Chapter 4 Low-Power VLSI Design](https://reader030.fdocuments.in/reader030/viewer/2022012605/619a162b86cb2e037d1bdd93/html5/thumbnails/12.jpg)
12EE4012VLSI DesignNational Central University
Gate-Level Design – Glitching Power
• Glitches spurious transitions due to imbalanced path delays
• A design has more balanced delay paths has fewer glitches, and thus has less power dissipation
• Note that there will be no glitches in a dynamic CMOS logic
A
B
C
D
E
A
B
CD
E
![Page 13: Chapter 4 Low-Power VLSI Design](https://reader030.fdocuments.in/reader030/viewer/2022012605/619a162b86cb2e037d1bdd93/html5/thumbnails/13.jpg)
13EE4012VLSI DesignNational Central University
Gate-Level Design – Glitching Power
• A chain structure has more glitches
• A tree structure has fewer glitchesA
B
C
D
A
B
CD
Chain structure
Tree structure
![Page 14: Chapter 4 Low-Power VLSI Design](https://reader030.fdocuments.in/reader030/viewer/2022012605/619a162b86cb2e037d1bdd93/html5/thumbnails/14.jpg)
14EE4012VLSI DesignNational Central University
Gate-Level Design – Precomputation
Combinational LogicREG
R1
REG
R2
Combinational LogicREG
R1
REG
R2
Precomputation Logic
![Page 15: Chapter 4 Low-Power VLSI Design](https://reader030.fdocuments.in/reader030/viewer/2022012605/619a162b86cb2e037d1bdd93/html5/thumbnails/15.jpg)
15EE4012VLSI DesignNational Central University
Gate-Level Design – Precomputation
1-bit Comparator(MSB)
REGR4
(n-1)-bitComparator
REGR1
REGR2
REGR3
A<n-1>B<n-1>
A<n-2:0>
B<n-2:0>
Precomputation logicEnable
F
![Page 16: Chapter 4 Low-Power VLSI Design](https://reader030.fdocuments.in/reader030/viewer/2022012605/619a162b86cb2e037d1bdd93/html5/thumbnails/16.jpg)
16EE4012VLSI DesignNational Central University
Gate-Level Design – Gating Clock
D Q D Q D Q D Q
D Q D Q D Q D Q
T
clk
clk
Fail DFT rule checking
Add control pin to solve DFT violation problem
![Page 17: Chapter 4 Low-Power VLSI Design](https://reader030.fdocuments.in/reader030/viewer/2022012605/619a162b86cb2e037d1bdd93/html5/thumbnails/17.jpg)
17EE4012VLSI DesignNational Central University
Gate-Level Design – Input Gating
+
clk
f1
f2
select
![Page 18: Chapter 4 Low-Power VLSI Design](https://reader030.fdocuments.in/reader030/viewer/2022012605/619a162b86cb2e037d1bdd93/html5/thumbnails/18.jpg)
18EE4012VLSI DesignNational Central University
Clock-Gating in Low-Power Flip-Flop
D QD
CK
Source: Prof. V. D. Agrawal
![Page 19: Chapter 4 Low-Power VLSI Design](https://reader030.fdocuments.in/reader030/viewer/2022012605/619a162b86cb2e037d1bdd93/html5/thumbnails/19.jpg)
19EE4012VLSI DesignNational Central University
Reduced-Power Shift Register
D Q D Q D Q
D QD QD Q
D Q
D Q
D
CK(f/2)
mul
tiple
xer
Output
Flip-flops are operated at full voltage and half the clock frequency.
Source: Prof. V. D. Agrawal
![Page 20: Chapter 4 Low-Power VLSI Design](https://reader030.fdocuments.in/reader030/viewer/2022012605/619a162b86cb2e037d1bdd93/html5/thumbnails/20.jpg)
20EE4012VLSI DesignNational Central University
Power Consumption of Shift Register
P = C’VDD2f/n
Degree of parallelism, n1 2 4
Nor
mal
ized
pow
er
1.0
0.5
0.25
0.0
Deg. Of parallelism
Freq (MHz)
Power (μW)
1 33.0 1535
2 16.5 887
4 8.25 738
16-bit shift register, 2μ CMOS
C. Piguet, “Circuit and Logic LevelDesign,” pages 103-133 in W. Nebeland J. Mermet (ed.), Low PowerDesign in Deep SubmicronElectronics, Springer, 1997.
Source: Prof. V. D. Agrawal
![Page 21: Chapter 4 Low-Power VLSI Design](https://reader030.fdocuments.in/reader030/viewer/2022012605/619a162b86cb2e037d1bdd93/html5/thumbnails/21.jpg)
21EE4012VLSI DesignNational Central University
Architecture-Level Design – Parallelism
16x16multiplier
RA
RB
fref
fref
32 16x16multiplier
RA
R
fref/2
32
MUXfref/2
16x16multiplier
R
B R
fref/2
fref/2
32
32
fref
16
16
16
16
16
16
Assume that With the same 16x16 multiplier, the power supply can be reduced from Vref to Vref/1.83.
refrefref
refparallel PfV
CP 33.02
)83.1
(2.2 2
![Page 22: Chapter 4 Low-Power VLSI Design](https://reader030.fdocuments.in/reader030/viewer/2022012605/619a162b86cb2e037d1bdd93/html5/thumbnails/22.jpg)
22EE4012VLSI DesignNational Central University
Architecture-Level Design – Pipelining
HalfmultiplierREG(A ,B)
fref
32REG
Halfmultiplier
32
refrefref
refpipeline PfV
CP 36.0)83.1
(2.1 2
The hardware between the pipeline stages is reduced then the reference voltage Vref can be reduced to Vnew to maintain the same worst case delay. For example, let a 50MHz multiplier is broken into two equal parts as shown below. The delay between the pipeline stages can be remained at 50MHz when the voltage Vnew is equal to Vref/1.83
![Page 23: Chapter 4 Low-Power VLSI Design](https://reader030.fdocuments.in/reader030/viewer/2022012605/619a162b86cb2e037d1bdd93/html5/thumbnails/23.jpg)
23EE4012VLSI DesignNational Central University
Architecture-Level Design – Retiming
Retiming is a transformation technique used to change the locations of delay elements in a circuit without affecting the input/output characteristics of the circuit.
D D 2DD
D
D
2D
y(n)x(n)
w(n)
y(n)x(n)
w1(n)
w2(n)
b
a
(2)
(1) (2)
(1)
a
(2)
(1) (2)
(1)
retiming
Two versions of an IIR filter.
b
![Page 24: Chapter 4 Low-Power VLSI Design](https://reader030.fdocuments.in/reader030/viewer/2022012605/619a162b86cb2e037d1bdd93/html5/thumbnails/24.jpg)
24EE4012VLSI DesignNational Central University
Architecture-Level Design – Retiming
REG
fref
REG C3(4ns)
Retiming for pipeline design
C1(6ns)
C2(2ns)
REG
fref
REG C3(4ns)
C1(6ns)
C2(2ns)
![Page 25: Chapter 4 Low-Power VLSI Design](https://reader030.fdocuments.in/reader030/viewer/2022012605/619a162b86cb2e037d1bdd93/html5/thumbnails/25.jpg)
25EE4012VLSI DesignNational Central University
Architecture-Level Design – Retiming
Clock cycle is 4 gate delays
Clock cycle is 2 gate delays
![Page 26: Chapter 4 Low-Power VLSI Design](https://reader030.fdocuments.in/reader030/viewer/2022012605/619a162b86cb2e037d1bdd93/html5/thumbnails/26.jpg)
26EE4012VLSI DesignNational Central University
Architecture-Level Design – Power Management
C2
C1
C1_FREEZE
C2_FREEZE
C2
C1
C1_FREEZE
C2_FREEZE
![Page 27: Chapter 4 Low-Power VLSI Design](https://reader030.fdocuments.in/reader030/viewer/2022012605/619a162b86cb2e037d1bdd93/html5/thumbnails/27.jpg)
27EE4012VLSI DesignNational Central University
Architecture-Level Design – Bus Segmentation
• Avoid the sharing of resources Reduce the switched capacitance
• For example: a global system bus A single shared bus is connected to all modules, this
structure results in a large bus capacitance due to The large number of drivers and receivers sharing the same
bus The parasitic capacitance of the long bus line
• A segmented bus structure Switched capacitance during each bus access is
significantly reduced Overall routing area may be increased
![Page 28: Chapter 4 Low-Power VLSI Design](https://reader030.fdocuments.in/reader030/viewer/2022012605/619a162b86cb2e037d1bdd93/html5/thumbnails/28.jpg)
28EE4012VLSI DesignNational Central University
Architecture-Level Design – Bus Segmentation
Cbus
Cbus1
Cbus1
Bus
In
terfa
ce
![Page 29: Chapter 4 Low-Power VLSI Design](https://reader030.fdocuments.in/reader030/viewer/2022012605/619a162b86cb2e037d1bdd93/html5/thumbnails/29.jpg)
29EE4012VLSI DesignNational Central University
Algorithmic-Level Design – factivity Reduction
Binary Code Gray Code Decimal Equivalent000001010011100101110111
000001011010110111101100
01234567
Minimization of the switching activity, at high level, is one way to reduce the power dissipation of digital processors.One method to minimize the switching signals, at the algorithmic level, is to use an appropriate coding for the signals rather than straight binary code.The table shown below shows a comparison of 3-bit representation of the binary and Gray codes.
![Page 30: Chapter 4 Low-Power VLSI Design](https://reader030.fdocuments.in/reader030/viewer/2022012605/619a162b86cb2e037d1bdd93/html5/thumbnails/30.jpg)
30EE4012VLSI DesignNational Central University
State Encoding for a Counter• Two-bit binary counter:
State sequence, 00 → 01 → 10 → 11 → 00 Six bit transitions in four clock cycles 6/4 = 1.5 transitions per clock
• Two-bit Gray-code counter State sequence, 00 → 01 → 11 → 10 → 00 Four bit transitions in four clock cycles 4/4 = 1.0 transition per clock
• Gray-code counter is more power efficient.G. K. Yeap, Practical Low Power Digital VLSI Design, Boston:Kluwer Academic Publishers (now Springer), 1998.
Source: Prof. V. D. Agrawal
![Page 31: Chapter 4 Low-Power VLSI Design](https://reader030.fdocuments.in/reader030/viewer/2022012605/619a162b86cb2e037d1bdd93/html5/thumbnails/31.jpg)
31EE4012VLSI DesignNational Central University
Binary Counter: Original Encoding
Present state Next state
a b A B0 0 0 10 1 1 01 0 1 11 1 0 0
A = a’b + ab’B = a’b’ + ab’
A
B
a
b
CKCLR
Source: Prof. V. D. Agrawal
![Page 32: Chapter 4 Low-Power VLSI Design](https://reader030.fdocuments.in/reader030/viewer/2022012605/619a162b86cb2e037d1bdd93/html5/thumbnails/32.jpg)
32EE4012VLSI DesignNational Central University
Binary Counter: Gray Encoding
Present state Next state
a b A B
0 0 0 1
0 1 1 1
1 0 0 0
1 1 1 0
A = a’b + abB = a’b’ + a’b
A
B
a
b
CKCLR
Source: Prof. V. D. Agrawal
![Page 33: Chapter 4 Low-Power VLSI Design](https://reader030.fdocuments.in/reader030/viewer/2022012605/619a162b86cb2e037d1bdd93/html5/thumbnails/33.jpg)
33EE4012VLSI DesignNational Central University
Three-Bit Counters
Binary Gray-code
State No. of toggles State No. of toggles
000 - 000 -
001 1 001 1
010 2 011 1
011 1 010 1
100 3 110 1
101 1 111 1
110 2 101 1
111 1 100 1
000 3 000 1
Av. Transitions/clock = 1.75 Av. Transitions/clock = 1
Source: Prof. V. D. Agrawal
![Page 34: Chapter 4 Low-Power VLSI Design](https://reader030.fdocuments.in/reader030/viewer/2022012605/619a162b86cb2e037d1bdd93/html5/thumbnails/34.jpg)
34EE4012VLSI DesignNational Central University
N-Bit Counter: Toggles in Counting Cycle
• Binary counter: T(binary) = 2(2N – 1)
• Gray-code counter: T(gray) = 2N
• T(gray)/T(binary) = 2N-1/(2N – 1) → 0.5Bits T(binary) T(gray) T(gray)/T(binary)1 2 2 1.02 6 4 0.66673 14 8 0.57144 30 16 0.53335 62 32 0.51616 126 64 0.5079∞ - - 0.5000
Source: Prof. V. D. Agrawal
![Page 35: Chapter 4 Low-Power VLSI Design](https://reader030.fdocuments.in/reader030/viewer/2022012605/619a162b86cb2e037d1bdd93/html5/thumbnails/35.jpg)
35EE4012VLSI DesignNational Central University
FSM State Encoding
11
0100 0.1
0.10.40.3
0.6 0.9
0.6
01
1100 0.1
0.10.40.3
0.6 0.9
0.6
Expected number of state-bit transitions:
1(0.3+0.4+0.1) + 2(0.1) = 1.0
Transitionprobabilitybased on
PI statistics
State encoding can be selected using a power-based cost function.
2(0.3+0.4) + 1(0.1+0.1) = 1.6
Source: Prof. V. D. Agrawal
![Page 36: Chapter 4 Low-Power VLSI Design](https://reader030.fdocuments.in/reader030/viewer/2022012605/619a162b86cb2e037d1bdd93/html5/thumbnails/36.jpg)
36EE4012VLSI DesignNational Central University
FSM: Clock-Gating• Moore machine: Outputs depend only on the
state variables. If a state has a self-loop in the state transition
graph (STG), then clock can be stopped whenever a self-loop is to be executed.
Sj
SiSk
Xi/Zk
Xk/Zk
Xj/ZkClock can be stoppedwhen (Xk, Sk) combinationoccurs.
Source: Prof. V. D. Agrawal
![Page 37: Chapter 4 Low-Power VLSI Design](https://reader030.fdocuments.in/reader030/viewer/2022012605/619a162b86cb2e037d1bdd93/html5/thumbnails/37.jpg)
37EE4012VLSI DesignNational Central University
Clock-Gating in Moore FSM
Combinationallogic
LatchClock
activationlogic
Flip
-flop
s
PI
CK
PO
L. Benini and G. De Micheli,Dynamic Power Management,Boston: Springer, 1998.
Source: Prof. V. D. Agrawal
![Page 38: Chapter 4 Low-Power VLSI Design](https://reader030.fdocuments.in/reader030/viewer/2022012605/619a162b86cb2e037d1bdd93/html5/thumbnails/38.jpg)
38EE4012VLSI DesignNational Central University
Bus Encoding for Reduced Power• Example: Four bit bus
0000 → 1110 has three transitions. If bits of second pattern are inverted, then 0000 →
0001 will have only one transition.
• Bit-inversion encoding for N-bit bus:
Number of bit transitions0 N/2 N
N
N/2
0Num
ber o
f bit
trans
ition
saf
ter i
nver
sion
enc
odin
g
Source: Prof. V. D. Agrawal
![Page 39: Chapter 4 Low-Power VLSI Design](https://reader030.fdocuments.in/reader030/viewer/2022012605/619a162b86cb2e037d1bdd93/html5/thumbnails/39.jpg)
39EE4012VLSI DesignNational Central University
Bus-Inversion Encoding Logic
Polaritydecision
logic
Sen
t dat
a
Rec
eive
d da
ta
Bus register
Polarity bitM. Stan and W. Burleson, “Bus-Invert Coding for Low Power I/O,” IEEE Trans. VLSI Systems, vol. 3, no. 1, pp. 49-58, March 1995.
Source: Prof. V. D. Agrawal
![Page 40: Chapter 4 Low-Power VLSI Design](https://reader030.fdocuments.in/reader030/viewer/2022012605/619a162b86cb2e037d1bdd93/html5/thumbnails/40.jpg)
40EE4012VLSI DesignNational Central University
RTL-Level Design – Signal Gating
module decoder (a, sel); input [1:0[ a;ouput [3:0] sel;reg [3:0] sel; always @(a) begin
case (a)2’b00: sel=4’b0001;2’b01: sel=4’b0010;2’b10: sel=4’b0100;2’b11: sel=4’b1000;
endcase end
endmodule
module decoder (en,a, sel); input en;input [1:0[ a;ouput [3:0] sel;reg [3:0] sel; always @({en,a}) begin
case ({en,a})3’b100: sel=4’b0001;3’b101: sel=4’b0010;3’b110: sel=4’b0100;3’b111: sel=4’b1000;default: sel=4’b0000;
endcase end
endmodule
Decoder with enableSimple Decoder
![Page 41: Chapter 4 Low-Power VLSI Design](https://reader030.fdocuments.in/reader030/viewer/2022012605/619a162b86cb2e037d1bdd93/html5/thumbnails/41.jpg)
41EE4012VLSI DesignNational Central University
RTL-Level Design – Datapath Reordering
Mux
Muxstable
glitchyA<B Mux
Mux
stable
glitchyA<B
ReorderedInitial
![Page 42: Chapter 4 Low-Power VLSI Design](https://reader030.fdocuments.in/reader030/viewer/2022012605/619a162b86cb2e037d1bdd93/html5/thumbnails/42.jpg)
42EE4012VLSI DesignNational Central University
RTL-Level Design – Memory Partition
128x32
MUX
32
128x32
dout
32
32
dout
dout
qdpre_addr
clkwriteaddrdin
dinaddrwrite
addr[7:0] addr[7:1]
addr0
8noe
noe
![Page 43: Chapter 4 Low-Power VLSI Design](https://reader030.fdocuments.in/reader030/viewer/2022012605/619a162b86cb2e037d1bdd93/html5/thumbnails/43.jpg)
43EE4012VLSI DesignNational Central University
RTL-Level Design – Memory Partition
ARMCore
64K bytes
Data
Addr
R/W
Reads
AddrRange
64K28K 4K 32K
• Application-driven memory partition
![Page 44: Chapter 4 Low-Power VLSI Design](https://reader030.fdocuments.in/reader030/viewer/2022012605/619a162b86cb2e037d1bdd93/html5/thumbnails/44.jpg)
44EE4012VLSI DesignNational Central University
RTL-Level Design – Memory Partition
ARMCore
Data
Addr
R/W
CS
Data
Addr
R/W
CS
Data
Addr
R/W
CS
Decoder
• A power-optimal partitioned memory organization