Low Power Architecture and Implementation of Multicore Design
Khushboo Sheth, Kyungseok Kim, Fan Wang, Siddharth Dantu
ELEC6270 Low Power Design of Electronic Circuits: Team Project
VLSI D&T Seminar, Nov. 8, 2006
Advisor: Dr. V. Agrawal
Project Objectives
Design and verify a 16-bit ALU with synchronous clocked inputs and outputs.
Study the low-voltage power and delay characteristics of the design.
Redesign the ALU for minimum power and highest speed.
Components of Power Dissipation
Dynamic
  Power due to signal transitions.
  • Logic power (due to logic transitions).
  • Glitch power (due to glitches).
  Short-circuit power
Static
  Leakage power (due to leakage currents).
Power Components in CMOS Circuit
[Figure: CMOS circuit between VDD and Ground. Labels: vi(t), vo(t), Ron, R = large, CL. Annotations: dynamic power, short-circuit power, leakage power.]
Power = $C V_{DD}^2$
1-bit ALU Design
[Figure: block diagram of the 1-bit ALU core with registers Reg A, Reg B, and Reg C.]
1-bit ALU Core Simulation Specification
Technology                        TSMC 0.25 um
Application voltage               2.5 V
NMOS Vth                          0.365 V
PMOS Vth                          -0.5625 V
Temperature                       90 °C
SPICE simulator                   Eldo ver. 6.3.1.1
Supply voltage sweep (6 points)   0, 0.5, 1.0, 1.5, 2.0, 2.5 V
[Figure: 1-bit ALU core schematic with combinational logic and DFFs. Signals: A, B, CLK, CYIN, C, CY, Z. Net labels: NX156, NX80, NX16, NX60.]
1-bit ALU Core Timing (Vdd = 2.5 V)
Longest path in combinational logic: c <= a+b (opcode 0000)
[Figure: timing waveforms for opcode[3:0], COMPOUT, C, CY, and Z.]
Opcodes:
0000    a+b
0001    a-b
0010    equal
0011    not equal
0100    xor
0101    nor
0110    or
0111    and
1000    c<=a
1001    c<=b
1010    nand
others  all-zeros output
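The opcode list above can be captured in a small behavioral model. The following is a minimal Python sketch, not the authors' netlist. The function name `alu`, the routing of the equal / not-equal result to COMPOUT, the borrow convention for a-b, and the default 16-bit width are all assumptions made for illustration; the Z output is not modeled because the slides do not define it.

```python
def alu(a, b, opcode, cyin=0, width=16):
    """Behavioral sketch of the opcode list. Returns (c, cy, compout)."""
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    c, cy, compout = 0, 0, 0
    if opcode == 0b0000:        # a+b
        total = a + b + cyin
        c, cy = total & mask, total >> width
    elif opcode == 0b0001:      # a-b (assumption: cy flags a borrow, cyin unused)
        c, cy = (a - b) & mask, int(a < b)
    elif opcode == 0b0010:      # equal (assumption: result appears on COMPOUT)
        compout = int(a == b)
    elif opcode == 0b0011:      # not equal (assumption: result appears on COMPOUT)
        compout = int(a != b)
    elif opcode == 0b0100:      # xor
        c = a ^ b
    elif opcode == 0b0101:      # nor
        c = ~(a | b) & mask
    elif opcode == 0b0110:      # or
        c = a | b
    elif opcode == 0b0111:      # and
        c = a & b
    elif opcode == 0b1000:      # c<=a
        c = a
    elif opcode == 0b1001:      # c<=b
        c = b
    elif opcode == 0b1010:      # nand
        c = ~(a & b) & mask
    # Any other opcode leaves every output at zero.
    return c, cy, compout


# Example: all-ones plus one wraps to zero with a carry out.
print(alu(0xFFFF, 0x0001, 0b0000))
```

A model like this is mainly useful as a golden reference when checking simulated waveforms against expected outputs.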
1-bit ALU Core: Vdd Sweep from 2.5 V to 0 V
[Figure: analog-mode waveform of output C (NX156) for Vdd = 2.5, 2.0, 1.5, 1.0, 0.5, and 0.0 V.]
1-bit ALU Core Logic Operation Voltage @ 200 MHz
Supply voltage sweep near PMOS Vth = -0.5625 V (vs. NMOS Vth = 0.365 V)
Sweep from Vsupply = 0.50 V to 1.00 V (linear increment 0.05 V, 11 points)
Vsupply = 0.85 V: correct operation
Vsupply = 0.80 V: wrong operation
[Figure: analog-domain input and output waveforms at Vsupply = 0.85 V and Vsupply = 0.80 V for opcode 1000 (c<=a), annotated with overshoot and ripples.]
1-bit ALU Average Power vs. Delay @ 200 MHz
[Figure: average power (uW) vs. Vsupply (V) from 0 to 2.5 V, total 1-bit ALU block vs. 1-bit ALU core.]
[Figure: 1-bit ALU core delay (ns) vs. Vsupply (V) from 0 to 2.5 V.]
Power = $C V_{DD}^2$
16-bit ALU (Single Core) Design
[Figure: Input, Register, Combinational Logic (16-bit ALU), Register, Output; both registers clocked by CK; switched capacitance Cref.]
Supply voltage = Vref
Total capacitance switched per cycle = Cref
Clock frequency = f
Power consumption: $P_{ref} = C_{ref} V_{ref}^2 f$
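The power relation above can be inverted to estimate an effective switched capacitance from a measured power. The sketch below uses the single-core figure reported later in this deck (391.16 uW at 2.5 V and 10 MHz). Treating the whole average power as dynamic power is an assumption, so the result is only a first-order estimate; the function names are illustrative.

```python
def dynamic_power(c_farads, v_volts, f_hz):
    """First-order dynamic power: P = C * V^2 * f."""
    return c_farads * v_volts ** 2 * f_hz


def effective_capacitance(p_watts, v_volts, f_hz):
    """Invert P = C * V^2 * f to get the effective switched capacitance."""
    return p_watts / (v_volts ** 2 * f_hz)


# Assumption: the measured average power is entirely dynamic power.
c_eff = effective_capacitance(391.16e-6, 2.5, 10e6)
p_half_vdd = dynamic_power(c_eff, 1.25, 10e6)

print(f"Effective switched capacitance: {c_eff * 1e12:.2f} pF")
print(f"V^2-scaled power at 1.25 V: {p_half_vdd * 1e6:.2f} uW")
```

Comparing the V²-scaled figure with the simulated power at 1.25 V shows how far the real circuit departs from the pure C·V²·f model.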
16-bit ALU Vectors
          a                 b                 Opcode        cyin
Vector1   1010101010101010  0001010101010101  0001 (sub)    0
Vector2   0101010101010101  1010101010101010  0011 (comp)   0
Vector3   0101010101010101  1010101010101010  0100 (xor)    0
Vector4   1111111111111111  0000000000000001  0000 (add)    0
Vector5   0110011001100110  0000000000000000  1010 (nand)   0
Vector6   0001011001101101  0101010010101010  0001 (sub)    0
*Vector4 activates the critical path; carryout = 1.
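To see why Vector4 exercises the longest path, it helps to trace the carry bit by bit. The sketch below assumes a ripple-carry adder, which the slides do not confirm; the function name `ripple_add` and the "longest carry run" metric are illustrative only.

```python
def ripple_add(a, b, cyin=0, width=16):
    """Bit-serial addition. Returns (sum, carry_out, longest_carry_run).

    longest_carry_run is the longest run of consecutive bit positions
    that each produce a carry into the next position.
    """
    carry, total = cyin, 0
    run = longest = 0
    for i in range(width):
        x, y = (a >> i) & 1, (b >> i) & 1
        total |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
        run = run + 1 if carry else 0
        longest = max(longest, run)
    return total, carry, longest


a4 = int("1111111111111111", 2)  # Vector4 operand a
b4 = int("0000000000000001", 2)  # Vector4 operand b
print(ripple_add(a4, b4))
```

For Vector4 the carry generated at bit 0 propagates through every remaining bit and leaves as carryout = 1, consistent with the note above.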
16-bit ALU Simulation Result
Circuit information: 694 gates
Clock frequency applied: 10 MHz
Temperature: 27 °C
Vectors applied: 6
Technology: TSMC025, Vthn = 0.365 V, Vthp = -0.562 V
Simulator: ELDO (SPICE)
Simulation time: 700 ns

Voltage (V)          2.5      1.25     0.85     0.625    0.45
Static power (nW)    24.55    6.02     3.05     1.84     1.71
Average power (uW)   391.16   62.62    26.66    14.57    3.56
Delay (ns)           2.83     7.14     18.88    73.21    Ckt failed

The 16-bit ALU operates correctly at 2.5 V, 1.25 V, 0.85 V, and 0.625 V for all 6 vectors.
The circuit fails at 0.45 V (< |Vthp|).
Simulated Single Vector Pair
16-bit ALU Power Savings and Delay Increase with Reference @ 2.5 V
Voltage (V)          2.5 (reference, VDD)   1.25 (VDD/2)   0.85 (VDD/3)   0.625 (VDD/4)
Average power (uW)   391.16                 62.62          26.66          14.67
  Reduction                                 P2.5/6.24      P2.5/14.67     P2.5/26.66
  Savings                                   84%            93%            96%
Delay (ns)           2.83                   7.14           18.87          73.21
  Increase                                  2.52*D2.5      6.67*D2.5      25.87*D2.5
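The reduction, savings, and delay-increase entries are simple ratios against the reference row. A minimal sketch of that arithmetic follows, using the 2.5 V and 1.25 V values from the 16-bit ALU simulation result (391.16 uW and 62.62 uW, 2.83 ns and 7.14 ns); the function names are illustrative.

```python
def power_reduction(p_ref, p_low):
    """Factor by which power drops, e.g. P2.5 / P1.25."""
    return p_ref / p_low


def power_savings_percent(p_ref, p_low):
    """Power saved relative to the reference, in percent."""
    return 100.0 * (1.0 - p_low / p_ref)


def delay_increase(d_ref, d_low):
    """Factor by which delay grows, e.g. D1.25 / D2.5."""
    return d_low / d_ref


reduction = power_reduction(391.16, 62.62)
savings = power_savings_percent(391.16, 62.62)
slowdown = delay_increase(2.83, 7.14)

print(f"P2.5 / {reduction:.2f}, savings {savings:.0f}%, {slowdown:.2f} * D2.5")
```

The same three functions apply unchanged to the other voltage columns and to the 1.25 V reference.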
16-bit ALU Power Savings and Delay Increase with Reference @ 1.25 V
Voltage (V)          1.25 (reference)   0.85 (VDD/1.5)   0.625 (VDD/2)
Average power (uW)   62.62              26.66            14.67
  Reduction                             P1.25/2.35       P1.25/4.27
  Savings                               57%              77%
Delay (ns)           7.14               18.87            73.21
  Increase                              2.64*D1.25       10.25*D1.25
Different Technology Impact on Power Saving: 16-bit ALU
Simulation setup: supply voltage 2.5 V; transient simulation time 700 ns; 6 vectors; temperature 27 °C

Technology              TSMC035      TSMC025
Gates after synthesis   734          694
Voltage                 2.5 V        2.5 V
Static power            24.555 nW    24.550 nW
Average power           381.60 uW    391.16 uW
Delay                   3.12 ns      2.83 ns
Temperature Influence on Power
Circuit information: 734 gates; clock frequency applied: 10 MHz; Vdd = 2.5 V; vectors applied: 6; simulation time: 700 ns; TSMC035 technology

Temperature (°C)     0        27       60       90       120      900
Static power (nW)    12.7     24.5     75.51    357.36   4803.3   3.38 mW
Average power (uW)   404.23   381.60   378.15   367.48   363.15   70.43 W
Delay (ns)           2.58     3.12     3.18     3.53     3.91     Ckt fail
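One way to read the temperature data is to track how fast static power grows and how large it is relative to average power. The sketch below does that arithmetic for the 0 °C to 120 °C points from the table; the 900 °C point is left out because the circuit fails there. The helper name is illustrative.

```python
temps_c = [0, 27, 60, 90, 120]
static_nw = [12.7, 24.5, 75.51, 357.36, 4803.3]
average_uw = [404.23, 381.60, 378.15, 367.48, 363.15]


def static_share_percent(static_nw_value, average_uw_value):
    """Static power as a percentage of average power (nW converted to uW)."""
    return 100.0 * (static_nw_value * 1e-3) / average_uw_value


room_temp_static = static_nw[temps_c.index(27)]
for t, s, p in zip(temps_c, static_nw, average_uw):
    growth = s / room_temp_static  # static power relative to 27 °C
    share = static_share_percent(s, p)
    print(f"{t:>3} C: static x{growth:.2f} vs 27 C, {share:.3f}% of average power")
```

Even at 120 °C the static component stays a small fraction of the average power in this data, although it grows by roughly two orders of magnitude over the range.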
Multicore Design Methodology
Lower the supply voltage.
  This slows down circuit speed.
  Use parallel computing to gain the speed back.
Multi-core means placing two or more complete cores within a single module.
This architecture is a "divide and conquer" strategy. By splitting the work between multiple execution cores, a multi-core design can perform more work within a given clock cycle.
A power reduction of more than 60% is observed.
Source: http://www.eng.auburn.edu/~vagrawal/D&TSEMINAR_SPR06/SLIDES/Agrawal_DTSem06.ppt
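The trade described above can be written as a first-order model: N copies of the logic, each clocked at f/N, all running at a lowered supply voltage. The sketch below is an idealized estimate under the assumption that each copy switches the same capacitance as the reference design; the `overhead` parameter, standing for the extra registers and multiplexer, is hypothetical and not taken from the slides.

```python
def multicore_power(c_ref, v, f, n_cores, overhead=0.0):
    """First-order power of n_cores copies, each clocked at f / n_cores.

    c_ref:    capacitance switched per cycle by one copy
    overhead: assumed fractional extra power for registers and the mux
    """
    per_core = c_ref * v ** 2 * (f / n_cores)
    return n_cores * per_core * (1.0 + overhead)


# Normalized units: c_ref = 1, f = 1.
p_ref = multicore_power(1.0, 2.5, 1.0, n_cores=1)
p_ideal = multicore_power(1.0, 1.25, 1.0, n_cores=4)
p_with_overhead = multicore_power(1.0, 1.25, 1.0, n_cores=4, overhead=0.15)

print(f"Ideal 4-core at VDD/2:      {p_ideal / p_ref:.3f} of reference power")
print(f"With 15% assumed overhead:  {p_with_overhead / p_ref:.3f} of reference power")
```

In this model the throughput stays at f because the four copies together accept one input per reference clock cycle, while the power scales with the square of the reduced voltage.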
Parallel Architecture
[Figure: 16-bit ALU parallel architecture. The input at clock rate f feeds four registers clocked at f/4 (Ck0 to Ck3), each driving one copy of the combinational logic (Copy 1 to Copy 4); a 4-to-1 multiplexer driven by the mux control selects among the copies, and the output register is clocked by CK at f.]
Control Signals, N = 4
[Figure: timing diagram of CK and clock Phase 1 to Phase 4, with mux control sequence 00 01 10 11 00 01 10 11 ...]
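The control scheme can be illustrated with a small functional simulation: inputs are handed to the N copies in round-robin order, and the mux select steps through the copies in the same order to rebuild the output stream. This is a sketch of the data flow only; it ignores latency and clock phases, and `op` is a stand-in for the ALU's combinational logic.

```python
def interleave(inputs, op, n_cores=4):
    """Round-robin inputs over n_cores copies, then merge through a mux.

    Returns (outputs, selects), where selects holds the mux control value
    used for each output as a binary string.
    """
    sel_bits = max(1, (n_cores - 1).bit_length())
    lanes = [[] for _ in range(n_cores)]
    for i, x in enumerate(inputs):
        lanes[i % n_cores].append(op(x))  # copy k sees every n-th input
    outputs, selects = [], []
    for i in range(len(inputs)):
        sel = i % n_cores                 # mux control cycles through the copies
        selects.append(format(sel, f"0{sel_bits}b"))
        outputs.append(lanes[sel][i // n_cores])
    return outputs, selects


outs, sels = interleave(list(range(8)), lambda x: x + 1)
print(outs)
print(" ".join(sels))
```

The merged output order matches the input order, which is what lets each copy run at f/4 while the module as a whole still delivers one result per cycle of CK.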
16-bit ALU Multi-core Power Savings and Delay Increase with Reference @ 2.5 V
Circuit information: 2617 gates; clock frequency applied: 10 MHz; temperature: 27 °C; vectors applied: 6
Technology: TSMC025, Vthn = 0.365 V, Vthp = -0.562 V
Simulator: ELDO (SPICE); simulation time: 700 ns

Voltage (V)          2.5 (reference)   1.25 (VDD/2)   0.85 (VDD/3)   0.625 (VDD/4)   0.45
Static power (nW)    96.35             23.56          11.94          7.21            6.37
Average power (uW)   687.86            95.64          40.93          21.13           7.26
  Reduction                            P2.5/7.19      P2.5/16.8      P2.5/32.55
  Savings                              86%            94%            96.9%
Delay (ns)           0.11              0.57           1.52           30.70           Ckt failed
  Increase                             5.18*D2.5      13.8*D2.5      279.1*D2.5
16-bit ALU Multicore Power Savings and Delay Increase with Reference @ 1.25 V
Voltage (V)          1.25 (reference, VDD)   0.85 (VDD/1.5)   0.625 (VDD/2)
Average power (uW)   95.64                   40.93            21.13
  Reduction                                  P1.25/2.33       P1.25/4.52
  Savings                                    57%              78%
Delay (ns)           0.57                    1.52             30.7
  Increase                                   2.67*D1.25       53.86*D1.25
Power and Delay Comparison: 2.5 V Reference Design vs. Multicore Design at Different Voltages
Design               Reference    Multicore      Multicore      Multicore         Multicore       Multicore
Voltage (V)          2.5 (VDD)    1.25 (VDD/2)   0.85 (VDD/3)   0.725 (VDD/3.5)   0.7 (VDD/3.6)   0.625 (VDD/4)
Average power (uW)   391.16       95.64          40.93          25.6              22.35           21.14
  Reduction                       P2.5/4.09      P2.5/9.56      P2.5/15.23        P2.5/17.5       P2.5/18.5
  Savings                         76%            89.5%          93.45%            94.3%           94.6%
Delay (ns)           2.83         0.57           1.52           2.61              3.04            30.7
  Relative delay                  D2.5/4.96      D2.5/1.86      D2.5/1.08         D2.5/0.93       D2.5/0.09
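The comparison above suggests a simple selection rule: among the multicore operating points that are at least as fast as the 2.5 V reference design, pick the one with the lowest power. The sketch below applies that rule to the tabulated values; the rule itself and the function name are illustrative, not something the slides prescribe.

```python
REF_POWER_UW = 391.16  # single-core reference at 2.5 V
REF_DELAY_NS = 2.83

# (voltage in V, average power in uW, delay in ns) for the multicore design
multicore_points = [
    (1.25, 95.64, 0.57),
    (0.85, 40.93, 1.52),
    (0.725, 25.6, 2.61),
    (0.7, 22.35, 3.04),
    (0.625, 21.14, 30.7),
]


def best_operating_point(points, max_delay_ns):
    """Lowest-power point whose delay does not exceed max_delay_ns."""
    feasible = [p for p in points if p[2] <= max_delay_ns]
    return min(feasible, key=lambda p: p[1]) if feasible else None


best = best_operating_point(multicore_points, REF_DELAY_NS)
best_savings = 100.0 * (1.0 - best[1] / REF_POWER_UW)
print(f"Best point: {best[0]} V, {best[1]} uW, {best[2]} ns, "
      f"{best_savings:.2f}% savings vs. reference")
```

With these numbers the rule lands on the 0.725 V point, the lowest voltage in the table whose delay is still below the reference delay.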
Summary
For the single-core ALU design, we get more than 60% power savings at reduced voltage, but at the cost of performance.
With a 2.5 V reference, power drops faster than V² scaling predicts (for example, P2.5/6.24 at VDD/2, where V² scaling gives a factor of 4).
With a 1.25 V reference, the power drop is almost equal to V² scaling.
The multi-core design helps gain the speed back at reduced voltage and consumes less power.
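The two scaling observations in the summary can be checked by setting the measured power reduction against the reduction a pure V² law would give. The sketch below uses single-core values reported in this deck (391.16 uW at 2.5 V, 62.62 uW at 1.25 V, 26.66 uW at 0.85 V); the function name is illustrative.

```python
def reduction_factors(p_ref, p_low, v_ref, v_low):
    """Return (measured power reduction, reduction predicted by V^2 scaling)."""
    measured = p_ref / p_low
    v_squared = (v_ref / v_low) ** 2
    return measured, v_squared


cases = {
    "2.5 V -> 1.25 V": reduction_factors(391.16, 62.62, 2.5, 1.25),
    "2.5 V -> 0.85 V": reduction_factors(391.16, 26.66, 2.5, 0.85),
    "1.25 V -> 0.85 V": reduction_factors(62.62, 26.66, 1.25, 0.85),
}

for label, (measured, v_squared) in cases.items():
    print(f"{label}: measured x{measured:.2f}, V^2 scaling x{v_squared:.2f}, "
          f"ratio {measured / v_squared:.2f}")
```

A ratio well above 1 means power falls faster than V² scaling; a ratio near 1 means the drop is close to V² scaling, which is what the 1.25 V reference case shows.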
References
ELEC6270 Low Power Design Electronics class slides from Dr. Agrawal.
Spring 06, Dr. Agrawal's presentation at the VLSI D&T Seminar, "Multi-Core Parallelism for Low-Power Design."
www.tomshardware.com
N. H. E. Weste and D. Harris, CMOS VLSI Design, Third Edition, Reading, Massachusetts: Addison-Wesley, 2005.
L. Shang and R. P. Dick, "Thermal crisis: challenges and potential solutions," IEEE Potentials, vol. 25, issue 5, 2006.
International Technology Roadmap for Semiconductors, http://public.itrs.net
Alokik Kanwal, "A review of Carbon Nanotube Field Effect Transistors," Version 2.0, 2003.
K. K. Likharev, "Single Electron Devices and their applications," Proc. IEEE, vol. 87, no. 4, pp. 606-632, Apr. 1999.
A. P. Chandrakasan and R. W. Brodersen, Low Power Digital CMOS Design, Boston: Kluwer Academic Publishers (now Springer), 1995.
"Quad-core processor forecast," Alexander Wolfe @ TechWeb.
Thank You !!!