Power Efficient Idle Injection - events.static.linuxfound.org
1
Power Efficient Idle Injection
Jacob Pan
Intel Open Source Technology Center
LinuxCon Japan 2015
2
Agenda
• Introduction to idle injection
• Techniques available in Linux
• Experiment results
• Future work
3
Why Inject Idle?
• Primary: Thermal/Power limiting
• Secondary:
  • Performance management
  • Pay per use
  • Idle power efficiency
4
Understanding Processor Idle States/C-States
5
Motivation For Idle Injection: Increasingly Lower Idle Power
Deep idle power is negligible!
[Figure: Idle Power vs Running Power On Broadwell. Bar chart, power in watts: 95% pc7 = 0.32, 95% pc2 = 1.9, TDP C0 = 14]
*TDP = Thermal Design Power
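To make the motivation concrete, here is a minimal sketch that estimates average power as a time-weighted mix of running power and idle power. The linear model and the name `average_power` are assumptions for illustration, not something taken from the talk; the inputs are the Broadwell figures on this slide, taking 14 W as running power and 0.32 W as deep idle power.

```python
def average_power(idle_ratio, running_watts, idle_watts):
    """Time-weighted average power for an injected idle ratio in [0.0, 1.0].

    Assumes a simple linear model: the CPU is either fully running or fully
    idle, and transition costs are ignored.
    """
    if not 0.0 <= idle_ratio <= 1.0:
        raise ValueError("idle_ratio must be between 0.0 and 1.0")
    return (1.0 - idle_ratio) * running_watts + idle_ratio * idle_watts


if __name__ == "__main__":
    # Sweep a few idle ratios using the slide's Broadwell numbers.
    for pct in range(0, 51, 10):
        watts = average_power(pct / 100.0, running_watts=14.0, idle_watts=0.32)
        print(f"{pct:3d}% idle: {watts:.2f} W")
```

Because deep idle power is so small compared with running power, the estimate falls almost in proportion to the injected idle ratio under this model.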
6
When to use idle injection?
Idle injection at LFM (low frequency mode)
7
Idle Injection in Linux
• Intel PowerClamp driver
• Scheduler throttling, RT or CFS bandwidth control
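As a usage illustration for the first option: the mainline PowerClamp driver registers itself as a thermal cooling device of type `intel_powerclamp`, and the requested idle percentage is written to that device's `cur_state` file. The sketch below locates the device and writes a value. The helper names `find_powerclamp` and `set_idle_percent` are hypothetical, and the `thermal_root` parameter exists only so the code can be pointed at a test directory; on a real system, writing requires root and the driver must be loaded.

```python
import os


def find_powerclamp(thermal_root="/sys/class/thermal"):
    """Return the cooling device directory whose type is intel_powerclamp, or None."""
    if not os.path.isdir(thermal_root):
        return None
    for name in sorted(os.listdir(thermal_root)):
        if not name.startswith("cooling_device"):
            continue
        try:
            with open(os.path.join(thermal_root, name, "type")) as f:
                if f.read().strip() == "intel_powerclamp":
                    return os.path.join(thermal_root, name)
        except OSError:
            # Unreadable or missing type file: skip this entry.
            continue
    return None


def set_idle_percent(device_path, percent):
    """Request an idle injection percentage by writing it to cur_state."""
    with open(os.path.join(device_path, "max_state")) as f:
        max_state = int(f.read().strip())
    if not 0 <= percent <= max_state:
        raise ValueError(f"percent must be between 0 and {max_state}")
    with open(os.path.join(device_path, "cur_state"), "w") as f:
        f.write(str(percent))
```

Typical use would be `dev = find_powerclamp()` followed by `set_idle_percent(dev, 25)`; writing 0 stops the injection.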
8
Intel Power Clamp V1
(current design in mainline kernel)
The idea: play idle!
9
PowerClamp v1 timeline of idle injection
[Figure: timeline of idle injection showing sched tick, throttled and unthrottled periods, and the RT kthread]
10
Limitations of Intel PowerClamp V1
• CPU appears busy while playing idle
• Scheduler ticks not stopped in NOHZ idle
• Removal of tick_nohz_idle_enter/exit() API
• RCU grace period
• Relies on timely jiffies updates
11
Limitations of Intel PowerClamp V1
CPU appears busy while playing idle
12
Limitations of Intel PowerClamp V1
Scheduler ticks not stopped in NOHZ idle
• Interrupted sleep is less efficient in power
• Removal of tick_nohz_idle_enter/exit() API
• RCU grace period
13
Limitations of Intel PowerClamp V1
Relies on secondary timing source
• timely jiffy updates
• periodic timers
14
Scheduler Based Throttling
Normal tasks under completely fair scheduling (CFS) class
› Bandwidth control via CPU control group/container
› Runqueue throttling by enqueue/dequeue of tasks
[Figure: cgroup hierarchy. Root CG contains CG1 (with CG1.1 and CG1.2) and CG2 (with CG2.1)]
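For reference, CFS bandwidth control in the cgroup v1 `cpu` controller is configured through two files per group, `cpu.cfs_period_us` and `cpu.cfs_quota_us`: tasks in the group may run for at most the quota within each period and are throttled once it is used up. The sketch below converts a target idle percentage into a quota value. The function name `cfs_quota_us` and the idea of expressing the limit as an idle percentage are assumptions made for illustration.

```python
def cfs_quota_us(idle_percent, period_us=100000, cpus=1):
    """Quota (in microseconds) that leaves idle_percent of `cpus` CPUs unused.

    The result is meant for cpu.cfs_quota_us, with period_us written to
    cpu.cfs_period_us. 100000 us is the kernel's default period.
    """
    if not 0 <= idle_percent < 100:
        raise ValueError("idle_percent must be in the range 0..99")
    if period_us <= 0 or cpus <= 0:
        raise ValueError("period_us and cpus must be positive")
    # Integer arithmetic avoids floating-point rounding surprises.
    return period_us * cpus * (100 - idle_percent) // 100


if __name__ == "__main__":
    # A group of CPU-bound threads on 4 CPUs, forced idle 25% of the time.
    print(cfs_quota_us(25, cpus=4))
```

Each cgroup is throttled against its own period, which is why this mechanism gives fine-grained per-group control.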
15
Time chart of CFS Bandwidth Control (two cgroups, multithreaded workload)
• Pros: No fake idle task, Finer per cgroup controls
• Cons: No synchronization, loss of package C-state opportunities
[Figure: time chart of throttle and unthrottle events for cgroup1 and cgroup2, occurring at different times]
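The cost of missing synchronization can be sketched numerically. A package C-state is only reachable while every core in the package is idle at the same time, so what matters is the overlap of the per-CPU idle windows. The toy model below is an assumption for illustration: each CPU idles once per period for the same duration, starting at its own offset, and windows do not wrap around the period boundary. The function name is hypothetical.

```python
def package_idle_ms(offsets_ms, idle_ms):
    """Length of the window in which all CPUs are idle simultaneously.

    offsets_ms: start of each CPU's idle window within the period.
    idle_ms: duration of every CPU's idle window.
    """
    if not offsets_ms:
        raise ValueError("need at least one CPU offset")
    latest_start = max(offsets_ms)
    earliest_end = min(offsets_ms) + idle_ms
    # A negative difference means there is no common idle window at all.
    return max(0, earliest_end - latest_start)


if __name__ == "__main__":
    print(package_idle_ms([0, 0, 0, 0], 5))  # aligned idle windows
    print(package_idle_ms([0, 1, 2, 3], 5))  # slightly staggered
    print(package_idle_ms([0, 2, 4, 6], 5))  # widely staggered
```

In this model each CPU is idle for the same 5 ms in all three cases, yet the common window shrinks from 5 ms to 2 ms to 0 ms as the windows drift apart.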
16
Power Clamp V2 (work in progress)
• Runqueue throttling of CFS class
• Synchronization around rounded Ktime instead of jiffies
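One way to picture synchronization around rounded time is the sketch below: each CPU independently rounds its current nanosecond timestamp up to the next multiple of a shared period, so all CPUs arrive at the same boundary without exchanging messages. This is an illustrative assumption about the general technique, not the driver's actual code; `next_aligned_ns` is a hypothetical name.

```python
def next_aligned_ns(now_ns, period_ns):
    """Return the first multiple of period_ns strictly after now_ns."""
    if period_ns <= 0:
        raise ValueError("period_ns must be positive")
    return (now_ns // period_ns + 1) * period_ns


if __name__ == "__main__":
    period = 5_000_000  # 5 ms expressed in nanoseconds
    # Three CPUs read the clock at slightly different moments in one period.
    for now in (10_000_000, 12_345_678, 14_999_999):
        print(now, "->", next_aligned_ns(now, period))
```

Working from a nanosecond clock rather than jiffies means the boundary does not depend on tick-driven counter updates.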
17
Time Chart Powerclamp v1 vs. v2
18
Experiment Data
• Goals:
  • Comparing power efficiency
  • Scalability
  • CPU HW design trend: old vs. new
• Configurations:
  • CPUs: Ivy Bridge/Haswell/Broadwell clients, Haswell EX server
  • Workload: fspin by Len Brown. CPU bound, floating
  • Test case: Inject idle from 0 to 50% in 5% increments
19
Power and Performance Control V1 vs. V2
20
Power Efficiency Comparison On A Client Platform
21
Scalability Tests V1 vs. V2
(144 core 4 socket Haswell EX)
22
Power Efficiency Comparison On A Server Platform
23
Comparing Deep vs. Shallow Package C-States
(powerclamp v2)
24
Conclusions
• Idle injection can effectively reduce power beyond the energy-efficient frequency
• With deeper package C-states, can achieve near-linear performance and power reduction
• Scheduler runqueue throttling results in a cleaner and more efficient solution
• Aligning activities results in significant power savings
25
Future plan
• Better handling of interrupts
• Integration with scheduler
• Synchronize with devices with latency tolerance
• Work with hardware duty cycling
26
Backups
27
Time Chart of Redesigned Power Clamp
28
Entering Idle Injection Period
29
Exiting Idle Injection