Performance directed energy management using BOS technique
Pratap Ramamurthy
Ramanathan Palaniappan
University of Wisconsin-Madison
Outline
  Introduction
  BOS Mechanisms
  Policies
  Results
  Conclusion
Introduction
Energy consumption in mobile devices and laptops
  Memory could consume 50% more power than processors
Is there hardware support?
  RAMBUS devices provide the capability
Problem statement
  How to save power in memory without hurting performance?
Solution: BOS
  Estimate the optimal amount of memory based on paging activity
  Dynamic resizing of memory!
What is BOS?
BOS: Ballooning in-OS technique
  Tracks current memory requirements
  Tracks chip access pattern
  Powers down memory chips
  Minimizes power consumption
BOS Mechanisms
Ballooning in-OS technique
  Power down
    Page migration
    Page reservation
    Invisible buddy
  Chip recovery (power up)
  Chip selection
    Intercept memory accesses
BOS policies
  Power decision
    Disk activity, # chips powered on
  Epoch
    Time interval between two decisions
  Chip selection
    Access pattern
Inferences
  Operating point balances disk accesses and memory power consumption
  Cost of thrashing offsets the power saved from a single chip
  Most media applications do not need the entire RAM
  Least allocated chips are not the least accessed ones
BOS Architecture
[Flowchart: the Kpower_d loop]
  Get system parameters
  Decide to power off/on
    OFF: select chip, then loop over all pages in the chip
      If the page is occupied: migrate
      If the page is free: pin the page
      If locked or reserved: wait and pounce
    ON: wake up, recover power pages, clear flags and unpin
    Otherwise: do nothing
  Update Chip Power Table (CPT)
  Wait an epoch
BOS Mechanisms
Mechanisms: in-OS ballooning techniques
  Page migration
  Page reservation
  Invisible buddy
  Chip recovery
  Chip selection
    Intercept memory accesses
Power down
No actual power down
[Figure: memory chips with the selected chip highlighted. Under memory pressure, a page is replaced using LRU (written to disk), and the pages of the selected chip are then migrated.]
Page Migration
When to migrate?
Kinds of pages
  File pages
  Anonymous pages
  Swap cache pages
  Free pages
Page Migration Mechanism
Mechanism
  Data transfer
  Reverse map (thanks to Linux kernel 2.6)
  Remap
Dependent data structures
  Page table
  Active LRU and inactive LRU
  Page cache
  Swap cache
  Buffers
  Flags
Page Migration: major data structures updated while migrating
[Figure: a page referenced by processes, buffers, and the LRU lists]
  1. Memcopy
  2. Remapping using rmap
  3. Update buffers
  4. Remove from LRU lists
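The four steps listed above can be illustrated with a toy model. This is a minimal sketch under assumptions, not kernel code: the `Page` class, its `rmap` list of (page table, virtual address) pairs, the dictionary-based buffers, and `migrate_page` are hypothetical stand-ins for the corresponding Linux 2.6 structures.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare pages by identity, like physical frames
class Page:
    """Hypothetical stand-in for a physical page frame."""
    data: bytes = b""
    rmap: list = field(default_factory=list)     # (page_table, vaddr) pairs mapping this page
    buffers: list = field(default_factory=list)  # buffer descriptors pointing at this page


def migrate_page(src, dst, lru):
    """Move the contents of src, and every reference to it, over to dst."""
    dst.data = src.data                          # 1. memcopy
    for page_table, vaddr in src.rmap:           # 2. remap each mapping found via rmap
        page_table[vaddr] = dst
    dst.rmap, src.rmap = src.rmap, []
    for buf in src.buffers:                      # 3. update buffers to point at dst
        buf["page"] = dst
    dst.buffers, src.buffers = src.buffers, []
    if src in lru:                               # 4. remove src from the LRU list
        lru[lru.index(src)] = dst                #    (assumption: dst inherits its position)
    src.data = b""
```

After the call, no page table entry, buffer, or LRU slot refers to `src`, so the frame can be pinned as a power page.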
Page Reservation
Ways to pin pages
  Remove from LRU lists
  Reserve page
  Lock page
  Page count
Free Pages
Buddy allocator
  List of free pages
  Buddy order (0-11)
  Locality
How to deal with already free pages?
Invisibility in Action (Buddy allocator)
[Figure: buddy free lists of order 0, 1, and 2]
  Request for a free page of order 0
  Make the 'power' pages invisible
  Request for a page of order 1 (invisibility in action)
Invisible Buddy
  Power bit
  Invisibility: divert page allocation
Other methods?
  Why not always remove the page completely from the buddy allocator?
  Because the free pages are not necessarily in the buddy
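The idea of diverting allocation away from pages that carry a power bit can be sketched with a toy free list. This is an illustration under assumptions, not the kernel's buddy allocator: `ToyFreeList` and its methods are hypothetical, and a single list stands in for the per-order free lists.

```python
class ToyFreeList:
    """Hypothetical single-order free list; a real buddy allocator keeps one list per order."""

    def __init__(self, frames):
        self.free = list(frames)   # free page frame numbers, in allocation order
        self.power_bit = set()     # frames marked as 'power' pages (invisible)

    def hide(self, frame):
        """Set the power bit so the allocator stops handing this frame out."""
        self.power_bit.add(frame)

    def unhide(self, frame):
        """Clear the power bit; the frame becomes allocatable again."""
        self.power_bit.discard(frame)

    def alloc(self):
        """Return the first visible free frame, or None if every free frame is hidden."""
        for i, frame in enumerate(self.free):
            if frame not in self.power_bit:
                return self.free.pop(i)
        return None
```

Hidden frames stay on the free list, so making one visible again only requires clearing its bit.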
Recovery
  Page in the buddy
    Clear power bit
    That's it!
  Pages that were migrated
    Clear power bit
    Add to the buddy
Reverse Recovery
  Recover the last pinned page and move backwards
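Reverse recovery can be sketched as popping a stack of pinned frames. This is a minimal sketch under assumptions: `recover_chip`, the `(frame, was_in_buddy)` records, and the plain list used as the free list are hypothetical names introduced for illustration.

```python
def recover_chip(pinned, power_bit, free_list):
    """Undo pinning in reverse order and return the frames in recovery order.

    pinned: list of (frame, was_in_buddy) records, in the order the frames were pinned.
    power_bit: set of frames whose power bit is currently set.
    free_list: list standing in for the buddy allocator's free pages.
    """
    recovered = []
    while pinned:
        frame, was_in_buddy = pinned.pop()   # last pinned page first
        power_bit.discard(frame)             # clear the power bit
        if not was_in_buddy:
            free_list.append(frame)          # migrated page: add it back to the buddy
        recovered.append(frame)
    return recovered
```

Because the records are consumed from the end, the same routine also serves a partial recovery: it undoes exactly the pages pinned so far.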
Partial Recovery
Why would you abort a chip?
  Locked pages
  Reserved pages
  IO activity
Easy solution
  Abort chip
Better solution
  Wait and Pounce
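The slides do not spell out how Wait and Pounce works. One plausible reading, sketched below purely as an assumption, is to poll the busy page a bounded number of times and pin it as soon as it becomes available, falling back to aborting the chip otherwise. The function name, the callables, and `max_polls` are all hypothetical.

```python
def wait_and_pounce(is_busy, pin, max_polls=5):
    """Poll a busy page and pin it as soon as it becomes available.

    is_busy: callable returning True while the page is locked, reserved, or under I/O.
    pin: callable that pins the page.
    Returns True if the page was pinned, False if the caller should abort the chip.
    """
    for _ in range(max_polls):
        if not is_busy():
            pin()
            return True
        # A real implementation would sleep or reschedule here; omitted in this sketch.
    return False
```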
Policies
Policy
  Epoch
    Time interval between two decisions
  Power decision
    Disk activity, # chips powered on
  Chip select
    Access pattern
Chip Selection Mechanism
Chip pattern table
  Access history over 32 epochs
  Form a number with the bits and select the minimum!
How to monitor chip access?
  Referenced bit in hardware
  Clear bits
  Examine every epoch
[Figure: chip pattern table of per-epoch access bits, shown in two example states in which the least accessed chip is #7 and #6 respectively]
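The chip pattern table can be sketched as one 32-bit history word per chip. This is a minimal sketch under assumptions: the slides do not state the bit order, so the code places the most recent epoch in the most significant bit (recent accesses weigh more), and the function names are hypothetical.

```python
HISTORY_BITS = 32  # access history length in epochs, as stated on the slide


def record_epoch(history, accessed):
    """Shift each chip's history right and store this epoch's access bit on top."""
    for chip in range(len(history)):
        bit = 1 if chip in accessed else 0
        history[chip] = (history[chip] >> 1) | (bit << (HISTORY_BITS - 1))


def least_accessed_chip(history, powered_on):
    """Pick the powered-on chip whose history, read as a number, is the minimum."""
    # Ties are broken by the lower chip number (an arbitrary choice for this sketch).
    return min(powered_on, key=lambda chip: (history[chip], chip))
```

Calling `record_epoch` once per epoch with the set of chips whose referenced bits were found set keeps the table current. `least_accessed_chip` then gives the candidate for the next power down.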
Results
Results: Page Distribution
[Figure: number of pages allocated (axis 0 to 3000) per chip number (0 to 15) for rhn-applet-gui, kiconedit, and kolourpaint]
Chip access pattern
Results: File access time vs. power down
[Figure: file access time in microseconds (axis 0 to 1,800,000) against the number of chips powered OFF (0, 10, 15, 20); series labeled 100, 200, 300, 400]
# chips OFF    Available memory (MB)
0              512
10             352
15             272
20             192
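The table rows are consistent with each powered-off chip removing 16 MB from a 512 MB total. That per-chip size is inferred from the rows, not stated on the slides, and the short check below treats it as an assumption.

```python
TOTAL_MB = 512  # from the table row with 0 chips off
CHIP_MB = 16    # inferred from the rows: (512 - 352) / 10


def available_memory_mb(chips_off):
    """Memory left after powering off the given number of chips."""
    return TOTAL_MB - chips_off * CHIP_MB


for chips_off in (0, 10, 15, 20):
    print(chips_off, available_memory_mb(chips_off))
```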
BOS architecture
[Flowchart repeated: the Kpower_d loop. Get system parameters; decide to power off/on; on OFF, select a chip and migrate, pin, or wait and pounce on each page; on ON, wake up, recover power pages, clear PowerBit and unpin; update the Chip Power Table (CPT); wait an epoch.]
Power down/up decision
if (disk_accesses < α)
    power_down()
else if (disk_accesses > β)
    power_up()
else    /* α <= disk_accesses <= β */
    take_no_action()
α, β: thresholds, with α < β
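The threshold policy above can be turned into a per-epoch update of the number of powered-on chips. This is a minimal sketch under assumptions the slides do not state: one chip is switched per epoch, at least one chip always stays on, and the name `step` and its parameters are hypothetical.

```python
def step(chips_on, disk_accesses, alpha, beta, total_chips):
    """Return the number of powered-on chips after one epoch of the threshold policy."""
    if not alpha < beta:
        raise ValueError("thresholds must satisfy alpha < beta")
    if disk_accesses < alpha and chips_on > 1:
        return chips_on - 1          # little paging: power one chip down
    if disk_accesses > beta and chips_on < total_chips:
        return chips_on + 1          # heavy paging: power one chip up
    return chips_on                  # between the thresholds, or at a limit: no action
```

The gap between α and β acts as a dead band, so the chip count does not oscillate when disk activity hovers near a single threshold.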
Results: Other workloads
Application                                 # chips ON    Memory used
Xmms MP3 player                             7             112 MB
Totem media player (X-server)               25            400 MB
Mozilla, X-server, OpenOffice (together)    12            192 MB
Conclusion
BOS technique is feasible
BOS helps us to vigorously control memory size
Chip allocation and access pattern have no correlation
Several applications do not require the full memory
Threshold based policy gives reasonable performance
An incorrect chip select could increase migration overhead
Summary
  BOS philosophy
  Implementation of BOS
    Mechanism
      Page migration
      Invisible buddy
      Chip recovery
      Chip access pattern
    Policy
      Study of sample workloads
      Threshold based policy
Acknowledgements
  lxr.linux.no
  We would like to thank the following people for their valuable time and for the immensely helpful discussions in the course of this project:
    Remzi H. Arpaci-Dusseau
    Muthian Sivathanu
    Vijayan Prabhakaran
    Lakshmi Bairavasundaram
  Amit Jhawar, for resources
?
References
1. Huang et al., "Design and Implementation of Power-Aware Virtual Memory", USENIX 2003.
2. Lebeck et al., "Power Aware Page Allocation", ASPLOS 2000.
3. Li et al., "Performance directed energy management for main memory and disks", ASPLOS 2004.
4. Delaluz et al., "Scheduler-Based DRAM Energy Management", DAC 2002.
Related Work
  Power Aware Virtual Memory [1]
    Per process chip select
    Localized per process page allocation
    Performance not considered
  Execution time based energy management [ref]
  Hardware support for energy management
    Various power modes in RAMBUS
    Various power modes in disks
Tough nuts to crack
  Absolutely no documentation for Linux 2.6
  memcpy()
  Alternate recovery mechanism
  Multiple chip select policy
  Alternate allocator mechanism
  Removing from LRU