VRCon: Dynamic Reconfiguration of Voltage Regulators in a Multicore Platform
Woojoo Lee, Yanzhi Wang, and Massoud Pedram
University of Southern California
• Introduction
• Preliminary - VR characteristics
• Dynamic reconfiguration of the VR-to-core network
– Proposed multicore platform
– Reactive VRCon
– Proactive VRCon
• Experimental work
• Conclusion
Outline
Mar-27-14 2
• Per-chip DVFS vs. per-core DVFS
– Conventional per-chip DVFS keeps DVFS from achieving its full potential.
– Per-core DVFS allows excellent flexibility in controlling power, but it requires multiple voltage regulators (VRs), which bring drawbacks in footprint, power conversion loss, and control complexity.
• We target multicore platforms that support per-core DVFS.
Introduction (1/3)
• We focus on the power conversion efficiency of the multiple VRs.
– The figure below shows traces of the VR efficiency while it delivers power to a core. Around 24% of the input power is dissipated by a single VR in the high-efficiency region, but more than 53% of the input power is dissipated by the VR in the low-efficiency region.
– Summed over all VRs, these dissipations can amount to a considerable power loss.
Introduction (2/3)
Figure: VR efficiency (%) vs. time (ms), two traces. Mean efficiency: 75.18% in the high-efficiency region and 46.38% in the low-efficiency region.
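The dissipation figures quoted above follow from the mean efficiencies of the two traces, since the dissipated share of the input power is 100% minus the efficiency. A minimal sketch of that arithmetic; the helper names and the 1 W example load are illustrative assumptions, not taken from the slides:

```python
def dissipated_percent(efficiency_percent):
    """Share of the VR input power that is lost, in percent."""
    return 100.0 - efficiency_percent


def input_power_w(output_power_w, efficiency_percent):
    """Input power the VR must draw to deliver output_power_w."""
    return output_power_w / (efficiency_percent / 100.0)


# Mean efficiencies reported for the two traces.
HIGH_EFF = 75.18
LOW_EFF = 46.38

loss_high = dissipated_percent(HIGH_EFF)  # roughly a quarter of the input power
loss_low = dissipated_percent(LOW_EFF)    # more than half of the input power

# Hypothetical 1 W core load (assumption): compare the required input power.
p_in_high = input_power_w(1.0, HIGH_EFF)
p_in_low = input_power_w(1.0, LOW_EFF)
```

With these numbers, the VR in the low-efficiency region has to draw more than twice the power it delivers.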
• We propose a system-level optimization technique to substantially improve the VR efficiency: VR consolidation (VRCon for short).
– The technique starts from the intuition that cores requiring the same voltage level and drawing small load currents can be powered by a single VR.
– Why is this helpful? The VR characteristics on the following slides give the reasons.
– We present two VRCon techniques: a reactive VRCon and a proactive VRCon.
Introduction (3/3)
• We target inductive switching regulators.
– Inductive switching regulators achieve higher conversion efficiencies over a wide range of output loads than other types of VRs, such as low-dropout regulators and switched-capacitor regulators.
– Because its controller supports dynamic voltage setting with fast transient response, the inductive switching regulator is suitable for powering processors.
– The circuit schematic is shown below:
VR characteristics (1/3)
• The load current of the VR affects the VR efficiency.
– The figure below shows efficiency vs. load current, simulated with the VR schematic and the 45nm PTM. The main sources of power loss are the switching and controller losses in Region I and the conduction loss in Region II.
– Modern VRs exhibit high peak efficiency at a specific load current, but their efficiency drops dramatically under adverse load current conditions.
VR characteristics (2/3)
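The shape of the efficiency curve can be illustrated with a very simple loss model. This is only a sketch under stated assumptions: the switching and controller losses are lumped into a load-independent term `p_fixed_w`, the conduction loss is `r_ohm * I^2`, and all parameter values are hypothetical rather than taken from the simulated VR.

```python
import math


def vr_efficiency(i_load_a, v_out=1.0, p_fixed_w=0.3, r_ohm=0.05):
    """Efficiency of a VR under a hypothetical two-term loss model."""
    p_out = v_out * i_load_a
    # Fixed loss dominates at light load (Region I),
    # conduction loss dominates at heavy load (Region II).
    p_loss = p_fixed_w + r_ohm * i_load_a ** 2
    return p_out / (p_out + p_loss)


def peak_current_a(p_fixed_w=0.3, r_ohm=0.05):
    """Load current that maximizes efficiency in this model.

    Maximizing V*I / (V*I + P0 + R*I^2) is the same as minimizing
    P0/I + R*I, which gives I = sqrt(P0 / R).
    """
    return math.sqrt(p_fixed_w / r_ohm)


i_peak = peak_current_a()
eta_light = vr_efficiency(0.2)     # light load: fixed losses dominate
eta_peak = vr_efficiency(i_peak)   # best operating point of the model
eta_heavy = vr_efficiency(10.0)    # heavy load: conduction loss dominates
```

In this model the efficiency falls off on both sides of `i_peak`, which matches the qualitative behavior described for the two regions.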
• VRCon saves power by configuring the VR-to-core network to use a single VR instead of multiple VRs whenever possible.
– If some cores in a multicore processor require the same voltage level and have small load currents, their power domains can be consolidated to share a single VR.
– The VR that powers multiple cores then sees a relatively high load current and hence operates at higher efficiency.
– The VRs that are not used can be turned off to save power.
• Now, let's go into the details of VRCon!
VR characteristics (3/3)
• The proposed platform has several components.
– The power manager (PM) monitors the core status (i.e., performance) reported by the hardware performance monitor (HPM). Unlike PMs in conventional multicore platforms, the PM here determines only tentative voltage and frequency levels for the cores and transmits them to the VRCon manager.
Proposed platform (1/2)
– Network switches implement the reconfigurable VR-to-core network.
– The VRCon manager (VRCM) is added to make the final decision on each core's frequency/voltage level, and to control the operation of the VRs and the ON/OFF states of the network switches.
• The figure below is a conceptual diagram of the proposed multicore platform.
Proposed platform (2/2)
Figure: conceptual diagram of the proposed platform. Blocks: Hardware Performance Monitor, Power Manager, VRCon Manager, sensing circuits, VR groups (VR 1 to VR 9 labeled), VR-to-core distribution network with switch sets 1 to 3, and a multi-core processor with per-core DVFS (Core 1 to Core 12 labeled). Signals: DVFS opinion, DVFS setup, VR output setup, dynamic config.
• The power saving achieved by DVFS strongly depends on the frequency of the decision-making process.
– Equivalently, it depends on the duration of the decision period ($T_{DVFS}$).
– $T_{DVFS}$ should be considered a design variable to be set by the PM, and it needs to be (much) longer than the voltage scaling time of the VR.
• Because reconfiguration only turns network switches on or off, the time to reconfigure the VR-to-core network ($T_{NS}$) is limited only by the transient response of the VR.
– It is in general much shorter than the voltage scaling time.
• We treat the DVFS setting and the network reconfiguration as the global and local power management of VRCon, respectively.
– $T_{DVFS}$ and $T_{NS}$ are the required minimum global and local decision epoch lengths, respectively.
VRCon: overview (1/7)
• As a local management function, the reactive VRCon applies only to cores with the same voltage level.
– The figure below shows an example of applying the reactive VRCon to a dual-core platform.
VRCon: reactive VRCon (2/7)
Yanzhi Wang / University of Southern California
Figure: for each of the two cores, Vdd (levels 0.75, 0.83, 0.95, 1.05, and 1.2 V) and load current (A) plotted over time. Two kinds of regions are marked: one is a valid region for VRCon; the other is not, because of the high load current.
• (cont.)
– The VRCM in this case performs only the network switch control to minimize the total energy consumption.
– The total energy consumption is the sum of the energy losses of the active VRs (including the network switches) and the energy consumption of the cores during the time period $T_{DVFS}$.
• Algorithm for reactive VRCon
– The VRCM first sorts out the cores that have the same voltage level and a load current lower than the maximum driving capability of a single VR.
– The VRCM finds the two cores whose merging maximizes the VR energy saving. The merged cores are then treated as one core.
– The VRCM repeats the above procedure until no core is available for merging.
VRCon: reactive VRCon (3/7)
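The greedy merging procedure of the reactive VRCon can be sketched as follows. This is an illustration, not the authors' implementation: the loss model `vr_loss_w`, the per-merge switch loss, the current limit `i_max_a`, and the example cores are all hypothetical. Savings are compared as power, which is equivalent to comparing energy over a period of fixed length.

```python
from itertools import combinations


def vr_loss_w(current_a, p_fixed_w=0.3, r_ohm=0.05):
    """Hypothetical loss of one active VR: fixed loss plus conduction loss."""
    return p_fixed_w + r_ohm * current_a ** 2


def reactive_vrcon(cores, i_max_a=6.0, switch_loss_w=0.02):
    """Greedy pairwise consolidation under fixed voltage levels.

    cores: dict core_id -> (voltage_v, load_current_a)
    Returns a list of tuples of core ids; each tuple shares one VR.
    """
    # Each group is (core ids, voltage level, total load current).
    groups = [((cid,), v, i) for cid, (v, i) in cores.items()]
    while True:
        best = None  # (saving_w, index_a, index_b)
        for a, b in combinations(range(len(groups)), 2):
            _, v_a, i_a = groups[a]
            _, v_b, i_b = groups[b]
            # Only same-voltage groups within one VR's capability may merge.
            if v_a != v_b or i_a + i_b > i_max_a:
                continue
            saving = (vr_loss_w(i_a) + vr_loss_w(i_b)
                      - vr_loss_w(i_a + i_b) - switch_loss_w)
            if saving > 0 and (best is None or saving > best[0]):
                best = (saving, a, b)
        if best is None:
            return [ids for ids, _, _ in groups]
        _, a, b = best
        # Treat the merged pair as one equivalent core.
        merged = (groups[a][0] + groups[b][0], groups[a][1],
                  groups[a][2] + groups[b][2])
        groups = [g for k, g in enumerate(groups) if k not in (a, b)]
        groups.append(merged)


# Hypothetical example: three lightly loaded cores at 0.95 V, one busy core at 1.2 V.
example = {1: (0.95, 1.0), 2: (0.95, 1.0), 3: (1.2, 5.0), 4: (0.95, 0.5)}
grouping = reactive_vrcon(example)
```

In this example the three 0.95 V cores end up on one VR, while the 1.2 V core keeps its own VR because it is at a different voltage level.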
• For its global power management function, the proactive VRCon exploits DVFS to scale the frequency (and the corresponding voltage level).
– The proactive VRCon takes into account the energy consumption of both the cores and the VRs.
– There can be a trade-off between the energy saving from DVFS (initially determined by the PM) and the reduced energy loss from adaptively turning off VRs and running fewer VRs at higher conversion efficiencies.
– If the VRCM determines that the latter option is better, it will not decrease the frequency/voltage levels of some cores to the minimum possible level; instead, it will adjust the frequency/voltage levels of the cores to increase the chances of applying VRCon.
VRCon: proactive VRCon (4/7)
• The objective here is to find the frequency/voltage level of each core for each $T_{DVFS}$ that minimizes the total energy consumption:
$$\min \left( \sum_{t=1}^{T} E_{T_{DVFS},t}\left(V_{core,1}, V_{core,2}, \ldots, V_{core,N}\right) \right)$$
– $E_{T_{DVFS},t}$ denotes the total energy consumption during the $t$-th time period of $T_{DVFS}$; $V_{core,i}$ is the $i$-th core's voltage level.
– $N$ is the total number of cores; $E_{T_{DVFS},T}$ indicates that all task processing is finished in this period.
• Solving the objective is difficult, because:
– Changing $V_{core,i}$ (for any $i$) in time period $T_{DVFS,t}$ affects the VRCon results in period $T_{DVFS,t+1}$.
– There are locking and synchronization issues among the multi-threaded applications in multi-core processors.
VRCon: proactive VRCon (5/7)
• Therefore, by exploiting the initial DVFS schedule of the PM, we first divide the overall problem into sub-problems, each of which concerns only how to modify the initial DVFS schedule to optimize the energy saving of the reactive VRCon in a given period $T_{DVFS}$.
– To guarantee that performance (i.e., the total execution time of the applications) is not degraded by modifying the DVFS schedule, we impose the constraint that the VRCM can only keep or increase (but never decrease) the frequency/voltage level of each core relative to the original DVFS level suggested by the PM.
– This can be formulated as follows:
• $f(V^{new}_{core,1}, V^{new}_{core,2}, \ldots, V^{new}_{core,N}) < f(V^{others}_{core,1}, V^{others}_{core,2}, \ldots, V^{others}_{core,N})$, s.t. $V^{new}_{core,i} \ge V^{PM}_{core,i}$, for $1 \le i \le N$
VRCon: proactive VRCon (6/7)
• We present a clustering-based heuristic solution as follows:
VRCon: proactive VRCon (7/7)
- We first pick out the cores that draw a small amount of current, so that they can be combined with others.
- Next, we consolidate the two cores (and treat them as one equivalent core) whose merge results in the maximum energy saving.
- The procedure is repeated until no further energy saving can be achieved by VR consolidation.
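The clustering heuristic can be sketched as a greedy loop that also allows a merge at the higher of two voltage levels, so no core ever drops below the level suggested by the PM. Everything numeric here is a hypothetical assumption: the VR loss model, the core power model `activity * V^3`, the current limit, and the example cores. The sketch compares power over one period of fixed length and ignores that a raised frequency may finish work sooner.

```python
from itertools import combinations


def vr_loss_w(current_a, p_fixed_w=0.3, r_ohm=0.05):
    """Hypothetical loss of one active VR."""
    return p_fixed_w + r_ohm * current_a ** 2


def core_power_w(activity, v):
    """Hypothetical core power model: P = activity * V^3."""
    return activity * v ** 3


def group_power_w(activities, v):
    """Power of the cores in one group plus the loss of the VR feeding them."""
    p_cores = sum(core_power_w(a, v) for a in activities)
    return p_cores + vr_loss_w(p_cores / v)


def proactive_vrcon(cores, i_max_a=6.0):
    """cores: dict core_id -> (pm_voltage_v, activity).

    Returns a list of (core ids, voltage level) groups, one VR per group.
    """
    groups = [((cid,), v, (act,)) for cid, (v, act) in cores.items()]
    while True:
        best = None  # (saving_w, index_a, index_b, merged voltage)
        for a, b in combinations(range(len(groups)), 2):
            _, v_a, act_a = groups[a]
            _, v_b, act_b = groups[b]
            v_new = max(v_a, v_b)  # keep or raise a level, never lower it
            acts = act_a + act_b
            current = sum(core_power_w(x, v_new) for x in acts) / v_new
            if current > i_max_a:
                continue
            saving = (group_power_w(act_a, v_a) + group_power_w(act_b, v_b)
                      - group_power_w(acts, v_new))
            if saving > 0 and (best is None or saving > best[0]):
                best = (saving, a, b, v_new)
        if best is None:
            return [(ids, v) for ids, v, _ in groups]
        _, a, b, v_new = best
        merged = (groups[a][0] + groups[b][0], v_new,
                  groups[a][2] + groups[b][2])
        groups = [g for k, g in enumerate(groups) if k not in (a, b)]
        groups.append(merged)


# Hypothetical PM suggestion: two light cores at 0.75 V and 0.83 V, one busy core at 1.2 V.
pm_levels = {1: (0.75, 1.0), 2: (0.83, 1.0), 3: (1.2, 3.0)}
plan = proactive_vrcon(pm_levels)
```

With these numbers, core 1 is raised from 0.75 V to 0.83 V so that it can share a VR with core 2: the extra core power is smaller than the VR loss that is avoided. Raising either light core to 1.2 V would cost more than it saves, so core 3 stays alone.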
• Multicore processor setup
– We performed the multicore processor simulations in the Sniper simulator. The platform configuration was based on the Intel Xeon Nehalem architecture; the topology is shown in the figure below.
– We set the five DVFS levels as follows:
– We modified the code related to the McPAT module in Sniper to collect the power and timing data for per-core DVFS.
– The multi-threaded applications from the PARSEC and SPLASH2 benchmarks were used in the simulation.
Experimental work (1/4)
Figure: simulated topology, drawn as groups of four cores (Core 1 to Core 4, ..., Core 12 to Core 15 as labeled). Each core has its own L1-I (32KB), L1-D (32KB), and L2 (256KB); each group of four cores is drawn with one L3 (8MB) and DRAM.
• Per-core DVFS simulation
– We treat the PM's DVFS recommendation as given a priori, and use an offline DVFS approach as an intermediate step toward the overall aim.
– We adopt an ILP-based algorithm, as follows:
• $\min \left( \sum_{r}^{R} \sum_{s}^{S} P_{r,s} x_{r,s} \right)$, s.t. $\sum_{r}^{R} \sum_{s}^{S} D_{r,s} x_{r,s} < \delta$, and $\sum_{r}^{R} \sum_{s}^{S} x_{r,s} = R$
• $R$ is the total number of intervals, and $S$ is the number of frequency/voltage levels (five). $P_{r,s}$ is the power consumption at the $s$-th frequency/voltage level in the $r$-th interval. Following the same notation as $P_{r,s}$, $D_{r,s}$ denotes the delay incurred under that frequency/voltage condition.
• $\delta$ is a certain performance penalty.
Experimental work (2/4)
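For a tiny instance, the offline DVFS formulation can be checked by exhaustive search instead of an ILP solver. The sketch below assumes that $x_{r,s}$ is binary and that exactly one level is chosen per interval, which is a stricter reading than the aggregate constraint on the slide; the function name, the bound `delta`, and the power and delay tables are hypothetical.

```python
from itertools import product


def offline_dvfs(power, delay, delta):
    """Pick one level per interval to minimize total power.

    power[r][s], delay[r][s]: power and delay of level s in interval r.
    delta: bound on the total delay (strict inequality, as in the formulation).
    Returns (levels, total_power), or None if no assignment is feasible.
    """
    n_levels = len(power[0])
    best = None
    # Enumerate every assignment of one level index to each interval.
    for levels in product(range(n_levels), repeat=len(power)):
        total_delay = sum(delay[r][s] for r, s in enumerate(levels))
        if not total_delay < delta:
            continue
        total_power = sum(power[r][s] for r, s in enumerate(levels))
        if best is None or total_power < best[1]:
            best = (levels, total_power)
    return best


# Hypothetical data: 2 intervals, 3 levels (higher level = more power, less delay).
P = [[1.0, 2.0, 4.0], [1.0, 2.5, 4.0]]
D = [[3.0, 2.0, 1.0], [3.0, 2.0, 1.0]]
choice = offline_dvfs(P, D, delta=5.0)
```

Exhaustive search grows as $S^R$, so it only serves to validate a solver on small cases; a real run over many intervals would need an ILP solver.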
• VR-to-core network setup
– We selected the programmable LTC3816 VR, which can power each core in our processor setup and achieves high efficiency at the average core current obtained from the benchmark simulations.
– We set the number of VRs and cores in one group of the VR-to-core network to 4.
– We determined the width of the network switch to be 8mm based on 45nm technology.
Experimental work (3/4)
Figure: VR efficiency (%) and power loss (W) vs. load current (A), for an input voltage of 12V and output voltages of 1.2V, 1.05V, 0.95V, 0.83V, and 0.75V.
– We performed LTspice simulations to obtain the VR efficiencies for various load currents at the five output voltage levels.
• Simulation results
– We define $G_{VR}$ and $G_{total}$ as the reduction in VR energy loss and the total energy saving, respectively.
– When we ran Streamcluster on the 8-core simulator setup, the reactive VRCon achieved $G_{VR}$ = 24.06% and $G_{total}$ = 9.96%, and the combined reactive and proactive VRCon achieved $G_{VR}$ = 35.86% and $G_{total}$ = 14.85%.
– The results below are from various applications under the different simulator setups.
• (I), (II), and (III) indicate the 16-core, 8-core, and 4-core setups, respectively.
Experimental work (4/4)
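One way to read the two metrics together is sketched below. It assumes that both gains are relative reductions against the baseline without VRCon, which the slides do not spell out, and the helper names are hypothetical. Under the further assumption that the whole saving comes from VR losses, the ratio $G_{total} / G_{VR}$ equals the share of VR losses in the baseline total energy.

```python
def gain_percent(baseline_j, improved_j):
    """Relative reduction against the baseline, in percent (assumed definition)."""
    return 100.0 * (baseline_j - improved_j) / baseline_j


def implied_vr_loss_share(g_vr_percent, g_total_percent):
    """Share of VR losses in the baseline total energy.

    Valid only if the core energy is unchanged, so that the total saving
    equals the saving in VR losses: G_total = G_VR * (E_VR / E_total).
    """
    return g_total_percent / g_vr_percent


# Streamcluster, 8-core setup, numbers reported on the slide.
share_reactive = implied_vr_loss_share(24.06, 9.96)
share_both = implied_vr_loss_share(35.86, 14.85)
```

Both ratios come out at roughly 0.41. For the proactive case this is only a rough reading, because changing the DVFS levels also changes the core energy.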
• We addressed the problem of power conversion efficiency in the multicore platform.
– Significant power is dissipated by the multiple VRs needed to support per-core DVFS.
• We proposed VR consolidation methods with a configurable VR-to-core distribution network.
– The reactive VRCon configures the network to enhance the power conversion efficiency under predetermined DVFS levels.
– The proactive VRCon determines new DVFS levels to maximize system-wide energy saving without performance degradation.
Conclusion
• Thank you!
Q&A