Copyright Agrawal, 2007 ELEC6270 Fall 07, Lecture 10 1 ELEC 5270/6270 Fall 2007 Low-Power Design of...

ELEC 5270/6270 Fall 2007
Low-Power Design of Electronic Circuits
Memory and Multicore Design
Vishwani D. Agrawal
James J. Danaher Professor
Dept. of Electrical and Computer Engineering
Auburn University, Auburn, AL 36849
http://www.eng.auburn.edu/~vagrawal/COURSE/E6270_Fall07/course.html

Memory Architecture
[Figure: memory of N words (Word 0 through Word N-1), each M bits wide, with select lines S0 through SN-1 and an M-bit input-output. A decoder driven by K address lines A0 through AK-1 generates the select signals.]
$K = \log_2 N$, $N = 2^K$
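
The relation between the word count N and the K address lines can be sketched in a few lines of Python. This is only an illustrative behavioral model, not a circuit description; the function names `address_lines` and `decode` are hypothetical, and N is assumed to be a power of two.

```python
def address_lines(num_words):
    """Return K = log2(N), the number of address lines for N words."""
    if num_words < 1 or num_words & (num_words - 1):
        raise ValueError("N is assumed to be a power of two")
    return num_words.bit_length() - 1


def decode(address, k):
    """Return the one-hot select signals S0..S(N-1) for a K-bit address."""
    n = 1 << k  # N = 2^K
    if not 0 <= address < n:
        raise ValueError("address out of range")
    return [1 if i == address else 0 for i in range(n)]


if __name__ == "__main__":
    print(address_lines(1024))
    print(decode(2, 2))
```

Exactly one select line is active for any valid address, which is why the decoder output grows as 2^K while the address bus grows only as K.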

Memory Organization
[Figure: cell array of 2^(K-L) rows by M·2^L columns holding N = 2^K words of M bits each. A row decoder driven by the (K-L)-bit row address AL through AK-1 selects a word line. A column decoder driven by the L-bit column address A0 through AL-1 selects M bit lines, which connect through sense amplifiers/drivers to the M-bit input-output. Each storage cell sits at the crossing of a word line and a bit line.]
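
The row/column split in the figure can be illustrated with a small sketch. It assumes, as the figure labels suggest, that the L low-order address bits form the column address and the remaining K-L bits form the row address; the function names are hypothetical.

```python
def array_shape(k, l, m):
    """Rows and columns of the cell array for N = 2^K words of M bits."""
    rows = 1 << (k - l)   # 2^(K-L) word lines
    cols = m << l         # M * 2^L bit lines
    return rows, cols


def split_address(address, l):
    """Split an address into (row, column) fields."""
    column = address & ((1 << l) - 1)  # bits A0..A(L-1)
    row = address >> l                 # bits AL..A(K-1)
    return row, column


if __name__ == "__main__":
    print(array_shape(10, 3, 8))
    print(split_address(0b1011010, 3))
```

Choosing L trades row count against column count, so the array can be kept close to square and the word and bit lines kept short.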

An SRAM Cell
[Figure: SRAM cell schematic with supply VDD, word line WL, complementary bit lines BL and BL-bar, and internal storage nodes bit and bit-bar.]

Read Operation
[Figure: SRAM cell with supply VDD, word line WL, bit lines BL and BL-bar, and storage nodes bit and bit-bar.]
1. Precharge the bit lines to VDD.
2. Set WL = logic 1.
3. The sense amplifier converts the BL swing to a logic level.

Precharge Circuit
[Figure: SRAM cell (VDD, WL, storage nodes bit and bit-bar) on bit lines BL and BL-bar, with precharge devices controlled by PC connecting the bit lines to VDD, and a differential sense amplifier across the bit lines.]

Reading 1 from Cell
[Figure: waveforms versus time for precharge, WL, BL, BL-bar, and the sense amplifier output. Annotation: pulsed to save bit line charge.]

Write Operation, 1 → 0
[Figure: SRAM cell with supply VDD, word line WL, and bit lines BL and BL-bar driven to 0 and 1.]
1. Set BL = 0, BL-bar = 1.
2. Set WL = 1.

Cell Array Power Management
Smaller transistors
Low supply voltage
Lower voltage swing (0.1 V to 0.3 V for SRAM)
  Sense amplifier restores the full voltage swing for outside use.
Power-down and sleep modes

Sense Amplifier
[Figure: sense amplifier with inputs bit and bit-bar, supply VDD, enable input SE (or CLK), and a full voltage swing output.]
Sense amplifier enable (SE or CLK): low when the bit lines are precharged and equalized.

Sense Amplifier: Precharge
[Figure: sense amplifier with bit = 1, bit-bar = 1, and SE = 0. The schematic marks one transistor OFF and two transistors ON, with node labels 0 and VDD.]

Sense Amplifier: Reading 0
[Figure: sense amplifier with bit = 1 - ∆, bit-bar = 1, and SE = 1. Node labels are 1 and 0; transistor states are marked ON, OFF, ON.]

Sense Amplifier: Reading 1
[Figure: sense amplifier with bit = 1, bit-bar = 1 - ∆, and SE = 1. Node labels are 0 and 1; transistor states are marked ON, OFF, ON.]

Block-Oriented Architecture
A single cell array may contain 64 Kbits to 256 Kbits.
Larger arrays become slow and consume more power.
Larger memories are block oriented.

Hierarchical Organization
[Figure: blocks Block 0 through Block P-1 sharing a global data bus that feeds a global amplifier/driver and the I/O. Row, column, and block addresses enter control circuitry with a block selector.]

Power Saving
Block-oriented memory:
  Lengths of local word and bit lines are kept small.
  Block address is used to activate the addressed block.
  Unaddressed blocks are put in power-saving mode:
    Sense amplifier and row/column decoders are disabled.
    Cell array is put in power-saving mode.

Static Power
[Figure: leakage current (amperes, 100n to 1.3μ) versus supply voltage (0.0 to 1.8 V) for an 8-kbit SRAM in 0.18μ CMOS and 0.13μ CMOS. Annotation: 7x increase.]

Power Saving Modes
Power-down: Disconnect supply. Data is not retained and must be refreshed before use. Example: caches.
Increasing thresholds by body biasing: Negative bias on nonactive cells reduces leakage.
Sleep mode:
  Insert resistance in leakage path; retain data.
  Lower supply voltage.

Adding Resistance in Leakage Path
[Figure: SRAM cells connected between internal rails VDD.int and VSS.int. Low-threshold transistors controlled by the sleep signals connect the internal rails to VDD and GND.]

Lowering Supply Voltage
[Figure: SRAM cells between the supply rail and GND; the sleep signal selects between VDD and a lower supply VDDL.]
VDDL ≥ 100 mV for 0.13μ CMOS
Sleep = 1: data retention mode

Parallelization of Memories
[Figure: two memories, each clocked at f/2. Mem 1 holds instr. A, instr. C, instr. E, ...; Mem 2 holds instr. B, instr. D, instr. F, .... A multiplexer with select values 0 and 1, switching at f/2, alternates between them.]
Power = $C' \cdot (f/2) \cdot V_{DD}^2$
C. Piguet, “Circuit and Logic Level Design,” pp. 124-125 in W. Nebel and J. Mermet (Eds.), Low Power Design in Deep Submicron Electronics, Springer, 1997.
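
The two-way interleaving in the figure can be mimicked with a short behavioral sketch. It assumes instructions at even positions go to Mem 1 and those at odd positions to Mem 2, as the instr. A/C/E and B/D/F labels suggest; the function names are hypothetical and no timing or power is modeled.

```python
def interleave(instructions):
    """Split an instruction stream across two memories."""
    mem1 = instructions[0::2]  # instr. A, C, E, ...
    mem2 = instructions[1::2]  # instr. B, D, F, ...
    return mem1, mem2


def fetch_stream(mem1, mem2):
    """Rebuild the full-rate stream by alternating the multiplexer select."""
    stream = []
    for cycle in range(len(mem1) + len(mem2)):
        bank = mem2 if cycle % 2 else mem1  # select toggles every cycle
        stream.append(bank[cycle // 2])     # each bank is read every other cycle
    return stream


if __name__ == "__main__":
    m1, m2 = interleave(list("ABCDEF"))
    print(m1, m2, fetch_stream(m1, m2))
```

Each memory is accessed only on alternate cycles, which is what allows it to run at f/2 while the multiplexer output still delivers one instruction per cycle.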

References
K. Itoh, VLSI Memory Chip Design, Springer-Verlag, 2001.
J. M. Rabaey, A. Chandrakasan and B. Nikolić, Digital Integrated Circuits, Upper Saddle River, New Jersey: Pearson Education, Inc., 2003, Chapter 12.
S.-M. Kang and Y. Leblebici, CMOS Digital Integrated Circuits Analysis and Design, New York: McGraw-Hill, 1996, Chapter 10.

Low-Power Datapath Architecture
Lower supply voltage:
  This slows down circuit speed.
  Use parallel computing to gain the speed back.
Works well when threshold voltage is also lowered.
About 60% reduction in power obtainable.
Reference: A. P. Chandrakasan and R. W. Brodersen, Low Power Digital CMOS Design, Boston: Kluwer Academic Publishers (now Springer), 1995.

A Reference Datapath
[Figure: input register, combinational logic, and output register between Input and Output, clocked by CK; the total switched capacitance is labeled Cref.]
Supply voltage = $V_{ref}$
Total capacitance switched per cycle = $C_{ref}$
Clock frequency = $f$
Power consumption: $P_{ref} = C_{ref} V_{ref}^2 f$

A Parallel Architecture
[Figure: N copies of the combinational logic (Copy 1 through Copy N), each fed by its own register clocked at f/N, driving an N-to-1 multiplexer and an output register clocked at f. A multiphase clock generator and mux control block is driven by CK.]
Each copy processes every Nth input and operates at reduced voltage.
Supply voltage: $V_N \le V_1 = V_{ref}$
N = degree of parallelism

Level Converter: L to H
[Figure: low-to-high level converter with input Vin_L, output Vout_H, and supplies VDDL and VDDH. Annotation: transistors with thicker oxide and longer channels.]
N. H. E. Weste and D. Harris, CMOS VLSI Design, Third Edition, Section 12.4.3, Addison-Wesley, 2005.

Level Converter: H to L
[Figure: high-to-low level converter with input Vin_H, output Vout_L, and supply VDDL. Annotation: transistors with thicker oxide and longer channels.]
N. H. E. Weste and D. Harris, CMOS VLSI Design, Third Edition, Section 12.4.3, Addison-Wesley, 2005.

Control Signals, N = 4
[Figure: timing diagram of CK and the four clock phases, Phase 1 through Phase 4.]

Power
$P_N = P_{proc} + P_{overhead}$
$P_{proc} = N (C_{inreg} + C_{comb}) V_N^2 \frac{f}{N} + C_{outreg} V_N^2 f = (C_{inreg} + C_{comb} + C_{outreg}) V_N^2 f = C_{ref} V_N^2 f$
$P_{overhead} = C_{overhead} V_N^2 f \approx \delta C_{ref} (N - 1) V_N^2 f$
$P_N = [1 + \delta (N - 1)] C_{ref} V_N^2 f$
$\frac{P_N}{P_1} = [1 + \delta (N - 1)] \frac{V_N^2}{V_{ref}^2}$
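
As a quick numerical check of the final ratio, the sketch below evaluates $P_N/P_1$ for given values. The overhead fraction δ = 0.1 is an assumption chosen for illustration (the slides do not fix δ at this point), and the function name `power_ratio` is hypothetical. The voltages 2.9 V and 5 V for N = 2 are taken from the 1.2μ CMOS figure in this lecture.

```python
def power_ratio(n, v_n, v_ref, delta):
    """P_N / P_1 = [1 + delta*(N - 1)] * (V_N / V_ref)^2."""
    return (1.0 + delta * (n - 1)) * (v_n / v_ref) ** 2


if __name__ == "__main__":
    # N = 2 copies at 2.9 V against a 5 V reference, assumed delta = 0.1
    print(power_ratio(2, 2.9, 5.0, 0.1))
```

With these inputs the ratio is about 0.37, which is in line with the roughly 60% reduction quoted earlier in the lecture, though the exact figure depends on the assumed δ.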

Voltage vs. Speed
Delay of a gate, $T \approx \frac{C_L V_{ref}}{I} = \frac{C_L V_{ref}}{k (W/L) (V_{ref} - V_t)^2}$
where $I$ is the saturation current, $k$ is a technology parameter, $W/L$ is the width-to-length ratio of the transistor, and $V_t$ is the threshold voltage.
[Figure: normalized gate delay T (0.0 to 4.0) versus supply voltage for 1.2μ CMOS, marking Vt, V3 (N = 3), V2 = 2.9 V (N = 2), and Vref = 5 V (N = 1). Annotation: voltage reduction slows down as we get closer to Vt.]
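
Under the first-order delay model above, the supply voltage that slows a gate by a factor of N can be found in closed form by solving a quadratic. The sketch below does this; the function names are hypothetical, and because the model is only approximate it should not be expected to reproduce the measured 2.9 V point of the figure exactly.

```python
import math


def gate_delay(v, v_t):
    """Gate delay up to a constant factor: T ~ V / (V - Vt)^2."""
    return v / (v - v_t) ** 2


def supply_for_parallelism(n, v_ref, v_t):
    """Supply V_N at which the gate delay is N times the delay at v_ref."""
    a = n * v_ref / (v_ref - v_t) ** 2  # target: V / (V - Vt)^2 = a
    # Rearranged: a*V^2 - (2*a*Vt + 1)*V + a*Vt^2 = 0.
    # The product of the roots is Vt^2, so only the larger root exceeds Vt.
    return ((2 * a * v_t + 1) + math.sqrt(4 * a * v_t + 1)) / (2 * a)


if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        print(n, round(supply_for_parallelism(n, 5.0, 0.8), 3))
```

Each extra doubling of N buys a smaller voltage drop as V_N approaches Vt, which is the diminishing return the figure annotation points out.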

Increasing Multiprocessing
[Figure: power ratio PN/P1 (0.0 to 1.0) versus N (1 to 12) for 1.2μ CMOS with Vref = 5 V, with curves for Vt = 0 V (extreme case), Vt = 0.4 V, and Vt = 0.8 V.]

Extreme Cases: $V_t = 0$
Delay, $T \propto 1/V_{ref}$
For N processing elements, the allowed delay is $NT$, so $V_N = V_{ref}/N$.
$\frac{P_N}{P_1} = [1 + \delta (N - 1)] \frac{1}{N^2}$, which falls off as $1/N$ (approaching $\delta/N$) for large N.
For negligible overhead, $\delta \to 0$:
$\frac{P_N}{P_1} \approx \frac{1}{N^2}$
For $V_t > 0$, power reduction is less and there will be an optimum value of N.
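
The two limits of the zero-threshold case can be seen by splitting the ratio into a $1/N^2$ term and a $1/N$ term. The sketch below does that; δ = 0.1 is an assumed value for illustration and the function names are hypothetical.

```python
def power_ratio_zero_vt(n, delta):
    """P_N / P_1 for Vt = 0, where V_N = V_ref / N."""
    return (1.0 + delta * (n - 1)) / n ** 2


def split_terms(n, delta):
    """The same ratio written as (1 - delta)/N^2 + delta/N."""
    return (1.0 - delta) / n ** 2, delta / n


if __name__ == "__main__":
    delta = 0.1  # assumed overhead fraction
    for n in (1, 2, 4, 8, 16):
        quadratic, linear = split_terms(n, delta)
        print(n, power_ratio_zero_vt(n, delta), quadratic, linear)
```

For small N the $1/N^2$ term dominates; once N is large the overhead term $\delta/N$ takes over, so the savings slow from quadratic to linear in N.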

Example: Multiplier Core
Specification:
  200 MHz clock
  15 W dissipation @ 5 V
  Low voltage operation, $V_{DD} \ge 1.5$ volts
  Relative clock rate = $\frac{(V_{DD} - 0.5)^2}{20.25}$
Problem:
  Integrate multiplier core on a SOC
  Power budget for multiplier ~ 5 W

A Multicore Design
[Figure: five multiplier cores (Multiplier Core 1 through Multiplier Core 5), each fed by its own register clocked at 40 MHz, driving a 5-to-1 mux and an output register clocked at 200 MHz. A multiphase clock generator and mux control block is driven by the 200 MHz CK.]
Core clock frequency = 200/N MHz; N should divide 200.

How Many Cores?
For N cores:
  Clock frequency = 200/N MHz
  Supply voltage, $V_{DDN} = 0.5 + (20.25/N)^{1/2}$ volts
  Assuming 10% overhead per core,
  Power dissipation = $15 \, [1 + 0.1 (N - 1)] \left( \frac{V_{DDN}}{5} \right)^2$ watts

Design Tradeoffs
Number of cores, N    Clock (MHz)    Core supply VDDN (Volts)    Total Power (Watts)
1                     200            5.00                        15.0
2                     100            3.68                        8.94
4                     50             2.75                        5.90
5                     40             2.51                        5.29
8                     25             2.10                        4.50
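
The table can be regenerated from the supply and power formulas of the previous slide. The sketch below is a minimal version with hypothetical function names. It uses unrounded voltages, so the last digit of some power values differs slightly from the table, which appears to use voltages rounded to two decimals.

```python
import math

BASE_POWER_W = 15.0   # single core at 5 V and 200 MHz
BASE_VDD = 5.0        # reference supply in volts
OVERHEAD = 0.10       # 10% overhead per additional core
MIN_VDD = 1.5         # lowest allowed supply in volts
CANDIDATES = (1, 2, 4, 5, 8)  # core counts listed in the table


def core_supply(n):
    """Supply at which one core still meets a 200/N MHz clock."""
    return 0.5 + math.sqrt(20.25 / n)


def total_power(n):
    """Total power in watts for N cores, including overhead."""
    v = core_supply(n)
    return BASE_POWER_W * (1 + OVERHEAD * (n - 1)) * (v / BASE_VDD) ** 2


def smallest_n_within(budget_w, candidates=CANDIDATES):
    """First candidate N that meets the power budget and supply limit."""
    for n in candidates:
        if core_supply(n) >= MIN_VDD and total_power(n) <= budget_w:
            return n
    return None


if __name__ == "__main__":
    for n in CANDIDATES:
        print(n, 200 / n, round(core_supply(n), 2), round(total_power(n), 2))
    print("Smallest N within 5 W:", smallest_n_within(5.0))
```

Among the listed core counts, only N = 8 falls under the 5 W budget, and its supply of about 2.1 V stays above the 1.5 V limit.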

Power Reduction in Processors
Just about everything is used.
Hardware methods:
  Voltage reduction for dynamic power
  Dual-threshold devices for leakage reduction
  Clock gating, frequency reduction
  Sleep mode
Architecture:
  Instruction set
  Hardware organization
Software methods

Parallel Architecture
[Figure: a single processor clocked at f between Input and Output, compared with two processors each clocked at f/2 whose results are combined at rate f.]
Single processor: capacitance = C, voltage = V, frequency = f, power = $C V^2 f$
Two parallel processors: capacitance = 2.2C, voltage = 0.6V, frequency = 0.5f, power = $0.396 \, C V^2 f$

Pipeline Architecture
[Figure: a single processor with a register, clocked at f, compared with a two-stage pipeline of two half-processors (½ Proc.) separated by registers, also clocked at f.]
Single processor: capacitance = C, voltage = V, frequency = f, power = $C V^2 f$
Two-stage pipeline: capacitance = 1.2C, voltage = 0.6V, frequency = f, power = $0.432 \, C V^2 f$

Approximate Trend
               n-parallel proc.    n-stage pipeline proc.
Capacitance    nC                  C
Voltage        V/n                 V/n
Frequency      f/n                 f
Power          CV^2 f / n^2        CV^2 f / n^2
Chip area      n times             10-20% increase
G. K. Yeap, Practical Low Power Digital VLSI Design, Boston: Springer, 1998.
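
The idealized trend and the more realistic two-way figures from the parallel and pipeline slides can be compared with one helper that scales $C V^2 f$. This is a sketch with hypothetical names; the scale factors 2.2, 1.2, 0.6, and 0.5 are the ones quoted on those two slides.

```python
def relative_power(cap, volt, freq):
    """Power relative to C*V^2*f, given scale factors for C, V, and f."""
    return cap * volt ** 2 * freq


def ideal_parallel(n):
    """Trend-table entry for n parallel processors: nC, V/n, f/n."""
    return relative_power(n, 1.0 / n, 1.0 / n)


def ideal_pipeline(n):
    """Trend-table entry for an n-stage pipeline: C, V/n, f."""
    return relative_power(1.0, 1.0 / n, 1.0)


# Two-way designs from the preceding slides, with overhead and 0.6V supply.
SLIDE_PARALLEL = relative_power(2.2, 0.6, 0.5)
SLIDE_PIPELINE = relative_power(1.2, 0.6, 1.0)

if __name__ == "__main__":
    print(ideal_parallel(2), ideal_pipeline(2))
    print(SLIDE_PARALLEL, SLIDE_PIPELINE)
```

For n = 2 the idealized trend gives 0.25 for both styles, while the slide values are 0.396 and 0.432, because the supply scales only to 0.6V rather than V/2 and the added registers and multiplexing increase capacitance.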

Multicore Processors
[Figure: performance, based on SPECint2000 and SPECfp2000 benchmarks, versus year (2000, 2004, 2008) for multicore and single-core processors.]
Source: Computer, May 2005, p. 12

Multicore Processors
D. Geer, “Chip Makers Turn to Multicore Processors,” Computer, vol. 38, no. 5, pp. 11-13, May 2005.
A. Jerraya, H. Tenhunen and W. Wolf, “Multiprocessor Systems-on-Chips,” Computer, vol. 38, no. 7, pp. 36-40, July 2005; this special issue contains three more articles on multicore processors.
S. K. Moore, “Winner Multimedia Monster – Cell’s Nine Processors Make It a Supercomputer on a Chip,” IEEE Spectrum, vol. 43, no. 1, pp. 20-23, January 2006.

Cell - Cell Broadband Engine Architecture
[Photo, left to right: Atsushi Kameyama, Toshiba; James Kahle, IBM; Masakazu Suzoki, Sony. © IEEE Spectrum, January 2006]
Nine-processor chip: 192 Gflops

Cell’s Nine-Processor Chip
[Figure: chip photo. © IEEE Spectrum, January 2006]
Eight identical processors, f = 5.6 GHz (max), 44.8 Gflops