

Caching for Bursts (C-Burst): Let Hard Disks Sleep Well and Work Energetically

Feng Chen and Xiaodong Zhang
Dept. of Computer Science and Engineering
The Ohio State University

Power Management in Hard Disk

Power management is a requirement in computer system design
  Maintenance cost
  Reliability & durability
  Environmental effects
  Battery life

Hard disk drives are big energy consumers, especially for I/O-intensive jobs
  e.g. hard disk drives account for 86% of total energy consumption in EMC Symmetrix 3000 storage systems
  As multi-core CPUs become more energy efficient, disks become relatively less so

Standard Power Management

Dynamic Power Management (DPM)
  When the disk is idle, spin it down to save energy
  When a request arrives, spin it up to service the request
  Frequently spinning a disk up and down incurs a substantial penalty: high latency and energy
Disk energy consumption is highly dependent on the pattern of disk accesses (periodically sleep and work)


Ideal Access Patterns for Power Saving


Ideal disk power saving condition
  Requests to disk form a periodic and bursty pattern
  Hard disk can sleep well and work energetically

[Figure: disk accesses over time, arriving in bursts separated by long disk sleep intervals]

Increasing burstiness of disk accesses is the key to saving disk energy
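To make the burstiness claim concrete, here is a minimal sketch that compares the idle-time energy of two access patterns under a fixed-timeout DPM policy. The power figures are the Hitachi DK23DA values from the emulated disk model table in this talk; the 5-second timeout, the `DiskModel` type, and the simplified accounting (service time ignored, spin-down and spin-up charged as one lump) are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiskModel:
    idle_w: float        # power while spinning but idle (watts)
    standby_w: float     # power while spun down (watts)
    transition_j: float  # spin-down + spin-up energy (joules)
    transition_s: float  # spin-down + spin-up time (seconds)


# Hitachi DK23DA figures taken from the talk's emulated disk model table.
HITACHI = DiskModel(idle_w=1.6, standby_w=0.15,
                    transition_j=5.0 + 2.94, transition_s=1.6 + 2.3)


def gap_energy(gap_s, disk, timeout_s):
    """Energy (J) spent in one idle gap under a fixed-timeout DPM policy."""
    if gap_s <= timeout_s:
        return gap_s * disk.idle_w  # gap too short: the disk never spins down
    asleep_s = max(0.0, gap_s - timeout_s - disk.transition_s)
    return (timeout_s * disk.idle_w       # waiting for the timeout to expire
            + disk.transition_j           # one spin-down plus one spin-up
            + asleep_s * disk.standby_w)  # time actually spent in standby


def total_idle_energy(gaps_s, disk, timeout_s):
    return sum(gap_energy(g, disk, timeout_s) for g in gaps_s)


TIMEOUT_S = 5.0  # assumed spin-down timeout, not a value from the talk
spread = total_idle_energy([4.0] * 150, HITACHI, TIMEOUT_S)  # 600 s of idle as short gaps
bursty = total_idle_energy([600.0], HITACHI, TIMEOUT_S)      # the same 600 s as one long gap
print(f"spread-out: {spread:.1f} J, bursty: {bursty:.1f} J")
```

With the same total idle time, the spread-out pattern never lets the disk reach standby, while the bursty pattern spends almost all of it there.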

Buffer Caches Affect Disk Access Patterns


Buffer caches in DRAM are part of hard disk service
  Disk data are cached in main memory (buffer cache)
  Hits in buffer caches are fast, and avoid disk accesses
The buffer cache is able to filter and change disk access streams

[Figure: the buffer cache sits in front of the hard disk and filters disk accesses]



Existing Solution for Burst in Disks


Forming bursty disk accesses with prefetching
  Predict the data that are likely to be accessed in the future
  Preload the to-be-used data into memory
  Directly condense disk accesses into a sequence of I/O bursts
  Both energy efficiency and performance could be improved

Limitations
  Buffer caching and prefetching share the same buffer space
  Energy-unaware replacement can easily change burst patterns created by prefetching
  Aggressive prefetching shrinks available caching space and demands highly effective caching
  Energy-aware caching policy can effectively complement prefetching
  No prior work has been done on this

Reference – Papathanasiou and Scott, USENIX’04

Caching can Easily Affect Access Patterns


An example

[Figure: original disk accesses; bursty disk accesses organized by prefetching; buffer cache with blocks labeled a, b, c]

3 blocks are to be evicted by energy-unaware caching
Unfortunately, these blocks are to be accessed in a non-bursty way
Disk still cannot sleep well

Solely relying on prefetching is sub-optimal; an energy-aware caching policy is needed to create bursty disk accesses

Caching Policy is Designed for Locality


Standard caching policies
  Identify the data that are unlikely to be accessed in the future (LRU)
  Evict the not-to-be-used data out from memory
  They are performance-oriented and energy-unaware
    Most are locality-based algorithms, e.g. LRU (CLOCK), LIRS (CLOCK-Pro)
    Designed for reducing the number of disk accesses
    No consideration of creating a bursty disk access pattern

C-Burst (Caching for Bursts)
  Our objectives: effective buffer caching
    To create bursty disk accesses for disk energy saving
    To retain the performance (high hit ratios in buffer cache)



Motivation
Scheme Design
  History-based C-Burst (HC-Burst)
  Prediction-based C-Burst (PC-Burst)
  Memory Regions
  Performance Loss Control
Performance Evaluation
  Programming
  Multimedia
  Multi-role Server


Restructuring Buffer Caches


Buffer cache is segmented into two regions
  Priority region (PR)
    Hot blocks are managed using an LRU-based scheme
    Blocks w/ strong locality are protected in PR
    Overwhelming memory misses can be avoided
    Retain the performance
  Energy-aware region (EAR)
    Cached blocks are managed using C-Burst schemes
    Non-bursty accessed blocks are kept here
    Re-accessed block (strong locality) is promoted into PR
Region size is dynamically adjusted
  Both performance and energy saving are considered

[Figure: buffer cache divided into Priority Region (LRU) and Energy Aware Region, annotated "Our focus in this talk"]

History-based C-Burst (HC-Burst)


Distinguish different streams of disk accesses
  Multiple tasks run simultaneously in practice
  History records can help us to distinguish them
Various tasks feature very different access patterns
  Bursty – e.g. grep, CVS, etc.
  Non-bursty – e.g. make, mplayer, etc.
Accesses reaching the hard disk are a mixture of both bursty and non-bursty accesses
In aggregate, the disk access pattern is determined by the most non-bursty one

Basic Idea of History-based C-Burst (HC-Burst)






[Figure: grep + make. Bursty access leaves long disk idle intervals; non-bursty access leaves short disk idle periods; the aggregate result is non-bursty access. Block labels a to g and A to L are shown on the time line and in the buffer cache.]

Cache the blocks being accessed in a non-bursty pattern, so as to reshape disk accesses into a bursty pattern

History-based C-Burst (HC-Burst)


Epoch
  An application’s access pattern may change over time
  Execution is broken into epochs, say T seconds for each
    Too small or too large are both undesired
    Our choice – T = (Disk Time-out Threshold) / 2
      No disk spin-down happens during one epoch with disk accesses
      Distribution of disk accesses during one epoch can be ignored

[Figure: blocks a to g on a time line divided into epochs]
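The epoch rule above can be sketched in a few lines. The function names and the 10-second timeout used in the example are hypothetical; only the relation T = timeout / 2 comes from the slide.

```python
def epoch_length_s(disk_timeout_s):
    """Epoch length T = (disk time-out threshold) / 2, as chosen on the slide."""
    return disk_timeout_s / 2.0


def epoch_id(timestamp_s, disk_timeout_s):
    """Index of the epoch that a timestamp (seconds since start) falls into."""
    return int(timestamp_s // epoch_length_s(disk_timeout_s))


# Example with an assumed 10 s spin-down timeout, so T = 5 s.
for t in (0.0, 4.9, 5.0, 12.3):
    print(t, "-> epoch", epoch_id(t, 10.0))
```

Because any two accesses in the same epoch are less than T apart, and T is half the timeout, the disk cannot time out between them. That is why the distribution of accesses inside an epoch can be ignored.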


History-based C-Burst (HC-Burst)


Block Group
  Blocks accessed during one epoch by the same IOC are grouped into a block group
  Each block group is identified by a process ID and an epoch time
  The size of a block group indicates the burstiness of the data access pattern of one application
  The larger a block group is, the more bursty disk accesses are

[Figure: blocks a to g grouped into block groups by epoch]
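As an illustration of the grouping described above, the following sketch builds block groups from a trace of accesses. The trace format `(timestamp_s, pid, block)` and the function name are assumptions for this example; the grouping key (process ID plus epoch) follows the slide.

```python
from collections import defaultdict


def build_block_groups(accesses, epoch_s):
    """Group accessed blocks by (pid, epoch id).

    accesses: iterable of (timestamp_s, pid, block) tuples (hypothetical format).
    Returns a dict mapping (pid, epoch_id) to the set of blocks in that group.
    """
    groups = defaultdict(set)
    for timestamp_s, pid, block in accesses:
        groups[(pid, int(timestamp_s // epoch_s))].add(block)
    return dict(groups)


trace = [
    (0.1, "grep", "A"), (0.2, "grep", "B"), (0.3, "grep", "C"), (0.4, "grep", "D"),
    (1.0, "make", "a"), (6.0, "make", "b"), (11.0, "make", "c"),
]
groups = build_block_groups(trace, epoch_s=5.0)
for key, blocks in sorted(groups.items()):
    print(key, len(blocks))
```

In this toy trace the grep accesses collapse into one group of four blocks, while make produces three groups of one block each. A larger group signals a burstier stream.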

History-based C-Burst (HC-Burst)


HC-Burst Replacement Policy
  Two types of blocks should be evicted
    Data blocks that are unlikely to be re-accessed
      Blocks with weak locality (e.g. LRU blocks)
    Data blocks that can be re-accessed with little energy
      Blocks being accessed in a bursty pattern
  Victim block group – the largest block group
    Blocks that are frequently accessed would be promoted into PR
      A large block group often holds infrequently accessed blocks
    Blocks that are accessed in a bursty pattern stay in a large BG
      A large block group holds blocks being accessed in bursts

History-based C-Burst (HC-Burst)

Multi-level Queues of Block Groups
  32-level queues of block groups
  A block group of N blocks stays in the queue for its size (in the figure, a 1024-block group sits at level 10 and a 1023-block group at level 9)
  Block groups on one queue are linked in the order of their epoch times
  Block groups may move upwards/downwards if the # of blocks changes
  The victim block group is always the LRU block group on the top queue w/ valid block groups

[Figure: queue levels 0, 1, ..., 9, 10, ordered from least bursty at the bottom to most bursty (MB) at the top; within a level, block groups are ordered by epoch time from least recently used (LRU) to most recently used (MRU). A grep block group (# of blks = 1024, epoch ID = 10) drops to 1023 blocks when a block is promoted to PR. A make block group (# of blks = 1023, epoch # 8) grows to 1024 blocks when a block is demoted to EAR. The victim block group is the LRU + MB one.]
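The victim selection above can be sketched as follows. The mapping from group size to queue level is not spelled out in the extracted text; this sketch assumes level = floor(log2(N)), capped at 31, which matches the figure (1024 blocks at level 10, 1023 at level 9) but remains an assumption. The function names and the dict-based representation are hypothetical.

```python
NUM_LEVELS = 32  # "32-level queues" from the slide


def queue_level(num_blocks):
    """Assumed mapping: floor(log2(N)), capped at the top level."""
    if num_blocks < 1:
        raise ValueError("a valid block group holds at least one block")
    return min(NUM_LEVELS - 1, num_blocks.bit_length() - 1)


def pick_victim(group_sizes):
    """Pick the LRU block group on the highest non-empty level.

    group_sizes: dict mapping (pid, epoch_id) to the number of cached blocks.
    Returns the key of the victim group, or None if no valid group exists.
    """
    valid = {key: n for key, n in group_sizes.items() if n > 0}
    if not valid:
        return None
    top = max(queue_level(n) for n in valid.values())
    on_top = [key for key, n in valid.items() if queue_level(n) == top]
    return min(on_top, key=lambda key: key[1])  # oldest epoch = LRU on that queue


sizes = {("grep", 10): 1024, ("make", 8): 1023}
print(pick_victim(sizes))  # the grep group is alone on level 10

# After one grep block is promoted to PR and one make block is demoted to EAR:
sizes = {("grep", 10): 1023, ("make", 8): 1024}
print(pick_victim(sizes))  # now the make group is on level 10
```

A production version would keep per-level linked lists so that the victim is found without scanning all groups; the scan here only keeps the sketch short.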

Prediction-based C-Burst (PC-Burst)


Main idea
  Certain disk access events are known and can be predicted
  Evict a block that is to be accessed during a short interval and close to a deterministic disk access

[Figure: predicted blocks on a time line. Block A falls in a short disk idle interval; Block B falls in a long disk idle interval.]

With deterministic disk accesses and block re-access times known, selectively evict blocks to be accessed in short intervals and hold blocks to be accessed in long idle intervals

Holding Block B can avoid breaking a long idle interval
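A minimal sketch of this selection rule, assuming the deterministic accesses are available as a sorted list of timestamps (`reference_points`) and each candidate block has a predicted re-access time. Both inputs and both function names are hypothetical, and the tie-breaking by distance to the nearest deterministic access is one reading of "close to a deterministic disk access".

```python
import bisect


def enclosing_interval(t, reference_points):
    """Return the (start, end) pair of deterministic accesses around time t.

    reference_points must be sorted. Returns None if t lies outside them.
    """
    i = bisect.bisect_right(reference_points, t)
    if i == 0 or i == len(reference_points):
        return None
    return reference_points[i - 1], reference_points[i]


def choose_victim(predicted_reaccess, reference_points):
    """Pick the block whose predicted re-access falls in the shortest interval,
    preferring the one closest to a deterministic access on ties."""
    best_key, best_block = None, None
    for block, t in predicted_reaccess.items():
        interval = enclosing_interval(t, reference_points)
        if interval is None:
            continue  # no prediction possible for this block
        start, end = interval
        key = (end - start, min(t - start, end - t))
        if best_key is None or key < best_key:
            best_key, best_block = key, block
    return best_block


rps = [0.0, 2.0, 30.0]                      # assumed deterministic disk accesses
print(choose_victim({"A": 1.0, "B": 15.0}, rps))
```

Here Block A sits in the 2-second interval and Block B in the 28-second one, so A is evicted and B is held, which keeps the long idle interval intact.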

Prediction-based C-Burst (PC-Burst)


Multi-level Queues of Block Groups in PC-Burst
  32-level queues of block groups – each level has two queues
    Prediction Block Group (PBG) – blocks to be accessed in the same future epoch time
    History Block Group (HBG) – blocks being accessed in the same history epoch time
  Reference Points (RP) – represent deterministic disk accesses
  Victim block group
    The PBG on the top level, in the shortest interval, closest to an RP, to be accessed in the furthest future
    If no PBG is found, search the same level queue of HBG

[Figure: queue levels 0 to 10; the victim block group lies in the shortest interval rather than in a long interval]

Performance Loss Control


Why is there performance loss?
  Increase of memory misses due to the energy-oriented caching policy
How to control performance loss
  Basic rule
    Control the size of the Energy Aware Region
  Estimating performance loss
    Ghost buffer (GB)
      LRU replacement policy
    The increase of memory misses (M)
      Blocks not found in EAR, but found in GB
    The average memory miss penalty (P)
      Observed average I/O latency
    Performance loss
      L = M x P
  Automatically tune the Energy Aware Region size
    L < Tolerable Performance Loss
      Enlarge EAR size
    L > Tolerable Performance Loss
      Shrink EAR size

[Figure: main memory holds the Priority Region and the Energy Aware Region, alongside a Ghost Buffer (LRU). On a memory miss, check the ghost buffer; a hit in the ghost buffer increments the performance loss. If perf. loss > tolerable rate, shrink the region size; if perf. loss < tolerable rate, enlarge the region size.]
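The control loop above can be sketched as a pair of small functions. The relation L = M x P and the enlarge/shrink rule come from the slide; the step size, the size bounds, and the function names are assumptions for illustration.

```python
def estimate_loss_s(ghost_hits, avg_miss_penalty_s):
    """L = M x P: extra misses (hits in the ghost buffer) times average I/O latency."""
    return ghost_hits * avg_miss_penalty_s


def tune_ear_size(ear_blocks, loss_s, tolerable_loss_s,
                  step_blocks=64, min_blocks=0, max_blocks=1 << 18):
    """Enlarge the EAR while the loss is tolerable, shrink it otherwise.

    step_blocks, min_blocks and max_blocks are hypothetical tuning knobs.
    """
    if loss_s < tolerable_loss_s:
        return min(max_blocks, ear_blocks + step_blocks)
    if loss_s > tolerable_loss_s:
        return max(min_blocks, ear_blocks - step_blocks)
    return ear_blocks


# 10 ghost-buffer hits at 8 ms each exceed a 50 ms budget, so the EAR shrinks.
loss = estimate_loss_s(ghost_hits=10, avg_miss_penalty_s=0.008)
print(tune_ear_size(1024, loss, tolerable_loss_s=0.05))
```

The ghost buffer only stores block identifiers managed by LRU, so a hit there means a plain LRU cache of the same size would have kept the block. That is what makes M an estimate of the misses added by the energy-oriented policy.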

Performance Evaluation


Implementation
  Linux kernel
  5,500 lines of code in buffer cache management and generic block layer

Experimental Setup
  Intel Pentium 4 3.0GHz
  1024 MB memory
  Western Digital WD160GB 7200RPM hard disk drive
  RedHat Linux WS4
  Linux kernel
  Ext3 file system

Performance Evaluation


Methodology
  Workloads run on the experiment machine
  Disk activities are collected in the kernel on the experiment machine
  Disk events are sent via netconsole to a monitor machine
  Disk energy consumption is calculated off line from the collected log of disk events, using disk power models

[Figure: the experiment machine sends the disk activities log over a Gigabit LAN]

Performance Evaluation


Emulated Disk Models
  Hitachi DK23DA laptop disk
  IBM UltraStar 36Z15 SCSI disk

                 Hitachi DK23DA      IBM UltraStar 36Z15
Capacity         30 GB               18.4 GB
Cache            2 MB                4 MB
RPM              4200                15000
Bandwidth        35 MB/sec           53 MB/sec
Active power     2 W                 13.5 W
Idle power       1.6 W               10.2 W
Standby power    0.15 W              2.5 W
Spin up          1.6 sec / 5 J       10.9 sec / 135 J
Spin down        2.3 sec / 2.94 J    1.5 sec / 13 J
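The table values allow a quick break-even estimate: the shortest idle interval for which spinning down costs no more energy than staying idle. The talk does not give this calculation; the sketch below uses the standard break-even analysis, which is an assumption here, with a hypothetical function name.

```python
def break_even_s(idle_w, standby_w, up_s, up_j, down_s, down_j):
    """Idle interval length at which spin-down starts to pay off.

    Solves idle_w * t = trans_j + standby_w * (t - trans_s) for t, and never
    returns less than the time needed to spin down and back up.
    """
    trans_s = up_s + down_s
    trans_j = up_j + down_j
    t = (trans_j - standby_w * trans_s) / (idle_w - standby_w)
    return max(trans_s, t)


hitachi = break_even_s(idle_w=1.6, standby_w=0.15,
                       up_s=1.6, up_j=5.0, down_s=2.3, down_j=2.94)
ibm = break_even_s(idle_w=10.2, standby_w=2.5,
                   up_s=10.9, up_j=135.0, down_s=1.5, down_j=13.0)
print(f"Hitachi DK23DA: {hitachi:.1f} s, IBM UltraStar 36Z15: {ibm:.1f} s")
```

Under these assumptions the laptop disk breaks even after roughly 5 seconds and the SCSI disk after roughly 15 seconds, so only idle intervals longer than that save energy on the respective disk.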

Performance Evaluation


Eight applications
  3 applications w/ bursty data accesses
  5 applications w/ non-bursty data accesses

Three case studies
  Programming
  Multi-media processing
  Multi-role servers

Name        Description            MB/epoch
Make        Linux kernel builder   1.98         119.7
Vim         Text editor            0.006        0.395
Mpg123      Mp3 player             0.15         3.69
Transcode   Video converter        3.2-6.5      10.9-19.1
TPC-H       Database query #17     7.3          476.7
Grep        Textual search tool    102.2        10186.6
Scp         Remote copy tool       51.5-53.8    135-139
CVS         Version control tool   19.9         1705.7

Performance Evaluation


Case Study I – programming
  Applications: grep, make, and vim
    Grep – bursty workload
    Make, vim – non-bursty workload
  C-Burst schemes protect the data set of make from being evicted by grep
  Disk idle intervals are effectively extended

[Figure annotations: over 30% energy saving; nearly 0% intervals > 16 sec; over 50% intervals > 16 sec]

Performance Evaluation


Case Study II – Multi-media Processing
  Applications: transcode, mpg123, and scp
    Mpg123 – its disk accesses serve as deterministic accesses
    Scp – bursty disk accesses
    Transcode – non-bursty accesses
  PC-Burst achieves better performance than HC-Burst
  PC-Burst can accommodate deeper prefetching by efficiently using caching space

[Figure annotations: around 30% energy saving; over 70% intervals > 16 sec]

Performance Evaluation


Case Study III – Multi-role Server
  Applications: TPC-H #17 and CVS
    TPC-H – non-bursty disk accesses (very random I/O)
    CVS – bursty disk accesses
  Dataset of TPC-H is protected in memory
  Performance of TPC-H is significantly improved

[Figure annotations: no improvement on disk idle interval; reduced I/O latency; over 35% energy saving]

Our Contributions


Design a set of comprehensive energy-aware caching policies, called C-Burst, which leverage the filtering effect of the buffer cache to manipulate disk accesses

Our scheme does not rely on complicated disk power models and requires no disk specification data, which means high compatibility with different hardware

Our scheme does not assume any specific disk hardware, such as multi-speed disks, which means our scheme is beneficial to existing hardware

Our scheme provides flexible performance guarantees to avoid unacceptable performance degradation

Our scheme is fully implemented in the Linux kernel, and experiments under realistic scenarios show up to 35% energy saving with minimal performance loss



Energy efficiency is a critical issue for computer system design

Increasing disk access burstiness is the key to achieving disk energy conservation

Leveraging the filtering effect of the buffer cache can effectively shape disk accesses into an expected pattern

The HC-Burst scheme can distinguish the access patterns of different tasks and create a bursty stream of disk accesses

The PC-Burst scheme can further predict blocks’ re-access times and manipulate the timing of future disk accesses

Our implementation of the C-Burst schemes in the Linux kernel and our experiments show that C-Burst schemes can achieve up to 35% energy saving with minimal performance loss




[USENIX04] A. E. Papathanasiou and M. L. Scott. Energy efficient prefetching and caching. In Proc. of USENIX’04.

[EMC99] EMC Symmetrix 3000 and 5000 enterprise storage systems product description guide, 1999.

Memory Regions


Buffer cache is segmented into two regions
  Priority region (PR)
    Hot blocks are managed using an LRU-based scheme
    Blocks w/ strong locality are protected
    Overwhelming memory misses can be avoided
  Energy-aware region (EAR)
    Cold blocks are managed using C-Burst schemes
    Victim blocks are always reclaimed from EAR
    Re-accessed block is promoted into PR
    Accordingly, the coldest block in PR is demoted
Region size is tuned on line
  Both performance and energy saving are considered

[Figure: buffer cache with the Energy Aware Region; arrows: insert a new block, evict a cold block, promote a cold block, demote a hot block]
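The block movements between the two regions can be sketched as a small class. The class and method names are hypothetical, and the EAR victim choice is deliberately a placeholder (oldest inserted block), standing in for the C-Burst block-group policy rather than implementing it.

```python
from collections import OrderedDict


class TwoRegionCache:
    """Priority region (LRU) plus energy-aware region, as a toy model."""

    def __init__(self, pr_capacity, ear_capacity):
        self.pr = OrderedDict()   # oldest (coldest) block first
        self.ear = OrderedDict()  # insertion order; placeholder for C-Burst queues
        self.pr_capacity = pr_capacity
        self.ear_capacity = ear_capacity

    def access(self, block):
        """Return True on a hit and False on a miss."""
        if block in self.pr:
            self.pr.move_to_end(block)           # refresh LRU position
            return True
        if block in self.ear:
            del self.ear[block]                  # re-accessed: promote into PR
            self.pr[block] = True
            if len(self.pr) > self.pr_capacity:  # demote the coldest PR block
                cold, _ = self.pr.popitem(last=False)
                self.ear[cold] = True
            return True
        self.ear[block] = True                   # new blocks enter the EAR
        if len(self.ear) > self.ear_capacity:
            self.ear.popitem(last=False)         # victims always come from the EAR
        return False


cache = TwoRegionCache(pr_capacity=1, ear_capacity=2)
for b in ["a", "a", "b", "b", "c", "d"]:
    cache.access(b)
print(list(cache.pr), list(cache.ear))
```

Promotion and demotion swap one block each way, so the region sizes stay fixed between tuning steps; resizing would be driven separately by the performance loss control.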



Limitations
  Energy-unaware caching policy can significantly affect periodic bursty patterns created by prefetching
    Improperly evicting a block may easily break a long disk idle interval
  Aggressive prefetching shrinks available caching space and demands highly effective caching
    Effective prefetching needs a large volume of memory space, which raises high memory contention for caching
  Energy-aware caching policy can effectively complement prefetching
    When prefetching works unsatisfactorily, caching can give a hand by carefully selecting blocks for eviction



Prefetching
  Predict the data that are likely to be accessed in the future
  Preload the to-be-used data into memory
  Directly condense disk accesses into a sequence of I/O bursts
  Both energy efficiency and performance could be improved

Caching
  Identify the data that are unlikely to be accessed in the future
  Evict the not-to-be-used data out from memory
  Traditional caching policies are performance-oriented
    Designed for reducing the number of disk accesses
    No consideration of creating a bursty disk access pattern

Prediction-based C-Burst (PC-Burst)


Prediction of Deterministic Disk Accesses
  Track each task’s access history and offer each task credits of [-32,
  Feedback-based prediction
    Compare observed interval with predicted interval
      If prediction is proved wrong, reduce a task’s credits
      If prediction is proved right, increase a task’s credits
    Task w/ credit less than 0 is unpredictable
  Repeated mis-prediction increases the charge of credits exponentially
    Occasional system dynamics only charge a task’s credit slightly
    Real pattern change quickly decreases a task’s credits
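The feedback scheme above can be sketched as a small credit counter. The lower bound of -32 comes from the slide, but the upper bound is cut off in the source, so the symmetric bound of 32 used here is an assumption. The starting credit of 0, the tolerance test, the unit reward, and the doubling charge are also assumptions chosen to match the described behavior.

```python
class TaskCredit:
    """Per-task credit that tracks how predictable a task's disk accesses are."""

    def __init__(self, low=-32, high=32):  # high=32 is assumed, not from the slide
        self.low, self.high = low, high
        self.credit = 0                    # assumed starting value
        self.charge = 1                    # cost of the next mis-prediction

    def update(self, predicted_s, observed_s, tolerance_s):
        """Compare the observed interval with the predicted one and adjust credit."""
        if abs(predicted_s - observed_s) <= tolerance_s:
            self.credit = min(self.high, self.credit + 1)
            self.charge = 1                # an isolated miss stays cheap
        else:
            self.credit = max(self.low, self.credit - self.charge)
            self.charge *= 2               # repeated misses cost exponentially more

    @property
    def predictable(self):
        return self.credit >= 0            # credit below 0 means unpredictable


task = TaskCredit()
for _ in range(3):
    task.update(predicted_s=10.0, observed_s=10.1, tolerance_s=0.5)  # correct
task.update(10.0, 14.0, 0.5)               # one occasional miss: small charge
print(task.credit, task.predictable)
for _ in range(3):
    task.update(10.0, 20.0, 0.5)           # a real pattern change: repeated misses
print(task.credit, task.predictable)
```

A single miss after a run of correct predictions costs one credit, while three misses in a row cost 1 + 2 + 4, which is how a real pattern change pushes a task below zero quickly.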