Informed Prefetching and Caching
R. Hugo Patterson, Garth A. Gibson, Eka Ginting, Daniel Stodolsky, Jim Zelenka
Contribution

Two of the basic functions of a file system:
- Management of disk accesses
- Management of main-memory file buffers

Approach: use hints from I/O-intensive applications to prefetch aggressively enough to eliminate I/O stall time while maximizing buffer availability for caching.

Key problem: how to allocate cache buffers dynamically among competing hinting and non-hinting applications for the greatest performance benefit:
- Balance caching against prefetching
- Distribute cache buffers among competing applications
Motivation

- Storage parallelism is readily available, overall performance increasingly depends on I/O rather than CPU speed, and file caches help only when hit ratios are high.
- I/O-intensive applications:
  - Amount of data processed >> file cache size
  - Locality is poor or limited; accesses are frequently non-sequential
  - I/O stall time is a large fraction of total execution time
  - Access patterns are largely predictable

How can I/O workloads be improved to take full advantage of the hardware that already exists?
ASAP: the four virtues of I/O workloads

- Avoidance: not a scalable solution to the I/O bottleneck
- Sequentiality: scales for writes but not for reads
- Asynchrony: scalable through write buffering; scaling for reads depends on prefetching aggressiveness
- Parallelism: scalable for explicitly parallel I/O requests; for serial workloads, scalable parallelism is achieved by scaling the number of asynchronous requests

Asynchrony eliminates write latency, and parallelism provides throughput. No existing technique scalably relieves the I/O bottleneck for reads; hence aggressive prefetching.
Hints

- Historical information: e.g., the LRU cache-replacement algorithm
- Sequential readahead: prefetching up to 64 blocks ahead when long sequential runs are detected
- Disclosure: hints based on advance knowledge (sketched below)
  - A mechanism for portable I/O optimizations
  - Provides evidence for a policy decision rather than advice
  - Conforms to the software-engineering principle of modularity
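A minimal C sketch of what a disclosure-style hint could look like; tip_disclose and the tip_hint layout are hypothetical stand-ins, not the actual TIP-1 interface. The point is that hints describe what the application will read, not what the system should do:

```c
/* Hypothetical disclosure interface: the application states its
 * future accesses as evidence, leaving policy to the system. */
#include <stddef.h>
#include <stdio.h>

struct tip_hint {
    const char *path;   /* file that will be read            */
    long        offset; /* byte offset of the coming access  */
    long        length; /* number of bytes that will be read */
};

/* Stand-in for a kernel entry point that would queue the hints
 * on the file's hint list for the informed prefetcher. */
static void tip_disclose(const struct tip_hint *hints, size_t n)
{
    for (size_t i = 0; i < n; i++)
        printf("hint: %s [%ld, +%ld)\n",
               hints[i].path, hints[i].offset, hints[i].length);
}

int main(void)
{
    /* Disclose, in access order, reads the application already
     * knows it will make. */
    struct tip_hint plan[] = {
        { "/data/matrix", 0,      8192 },
        { "/data/matrix", 8192,   8192 },
        { "/data/index",  163840, 8192 },
    };
    tip_disclose(plan, sizeof plan / sizeof plan[0]);
    return 0;
}
```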
Informed Prefetching

- System: TIP-1, implemented in OSF/1 (which already has two I/O optimizations of its own)
- Applications: 5 I/O-intensive benchmarks, single-threaded, with data fetched from the file system
- Hardware: DEC 3000/500 workstation, one 150 MHz Alpha 21064 processor, 128 MB RAM, 5 KZTSA fast SCSI-2 adapters, each hosting 3 HP 2247 1 GB disks; 12 MB (1536 x 8 KB) file cache
- Stripe unit: 64 KB; cluster prefetch: 5 prefetches; disk scheduler: SCAN within the striper
[Diagram: up to 512 buffers (1/3 of the cache) may hold unread hinted prefetches; when a hinted block is read, it is released to the LRU queue and the count of unread prefetch buffers is decremented]
Postgres

- Join of two relations (hinting sketched below)
- Outer relation: 20,000 unindexed tuples (3.2 MB)
- Inner relation: 200,000 tuples (32 MB), with a 5 MB index
- Output: about 4,000 tuples, written sequentially
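A hypothetical C sketch of how a join loop can disclose its inner-relation reads before making them; index_lookup, disclose_block, and fetch_and_join are assumed stand-ins, not the real Postgres routines. Once the index probes are done, the exact set of inner-relation blocks is known and can be hinted all at once:

```c
#include <stdio.h>
#include <stdlib.h>

#define N_OUTER 20000

/* Stand-ins for the real executor routines. */
static long index_lookup(long outer_key) { return outer_key % 4096; }
static void disclose_block(long blockno) { (void)blockno; /* queue hint */ }
static void fetch_and_join(long blockno) { (void)blockno; /* demand read */ }

int main(void)
{
    long *blocks = malloc(N_OUTER * sizeof *blocks);
    if (!blocks) return 1;

    /* Pass 1: index probes only; cheap, and they reveal every
     * inner-relation block the join will touch. */
    for (long i = 0; i < N_OUTER; i++)
        blocks[i] = index_lookup(i);

    /* Disclose the whole access plan before any data reads, so the
     * prefetcher can overlap them with the join computation. */
    for (long i = 0; i < N_OUTER; i++)
        disclose_block(blocks[i]);

    /* Pass 2: the actual, now largely prefetched, reads. */
    for (long i = 0; i < N_OUTER; i++)
        fetch_and_join(blocks[i]);

    free(blocks);
    return 0;
}
```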
MCHF Davidson algorithm

- MCHF: a suite of computational-chemistry programs used for atomic-structure calculations
- Davidson algorithm: an element of MCHF that computes, by successive refinement, the extreme eigenvalue-eigenvector pairs of a large, sparse, real, symmetric matrix stored on disk
- Matrix size: 17 MB
- The algorithm repeatedly accesses the same large file sequentially
MCHF Davidson algorithm (cont'd)

- Hints disclose only sequential access in one large file
- OSF/1's aggressive readahead performs better than TIP-1's informed prefetching
- Neither OSF/1 nor informed prefetching alone uses the 12 MB of cache buffers well: the LRU replacement algorithm flushes all of the blocks before any of them are reused (see the toy simulation below)
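To see why LRU fails here, a toy C simulation of a cyclic scan using the slide's numbers (17 MB file, 12 MB cache, 8 KB blocks); the MRU-style policy is idealized as pinning a cache-sized prefix of the file, which is roughly what MRU replacement converges to on a repeated sequential scan:

```c
#include <stdio.h>
#include <string.h>

#define FILE_BLOCKS  2176   /* ~17 MB of 8 KB blocks */
#define CACHE_BLOCKS 1536   /* ~12 MB of 8 KB blocks */
#define PASSES       10

int main(void)
{
    static char cached[FILE_BLOCKS];
    long hits = 0, accesses = (long)FILE_BLOCKS * PASSES;

    /* LRU on a cyclic scan needs no simulation: the resident set
     * always trails the scan point, so every block is evicted
     * before its reuse and the hit ratio stays 0 after pass 1. */

    /* Idealized MRU-style policy: keep the first CACHE_BLOCKS
     * blocks resident and recycle one buffer for the rest. */
    memset(cached, 0, sizeof cached);
    for (int b = 0; b < CACHE_BLOCKS; b++) cached[b] = 1;
    for (int p = 0; p < PASSES; p++)
        for (int b = 0; b < FILE_BLOCKS; b++)
            if (p > 0 && cached[b]) hits++;

    printf("MRU-style hit ratio: %.2f (LRU: 0.00)\n",
           (double)hits / accesses);
    return 0;
}
```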
Informed caching

- Goal: allocate cache buffers to minimize application elapsed time
- Approach: estimate the impact on execution time of alternative buffer allocations and then choose the best allocation
- Three broad uses for each buffer:
  - Caching recently used data in the traditional LRU queue
  - Prefetching data according to hints
  - Caching data that a predictor indicates will be reused in the future
Cost-benefit analysis

- System model: from which the various cost and benefit estimates are derived
- Derivations: one for each component
- Comparison: how to compare the estimates at a global level to find the globally least valuable buffer and the globally most beneficial consumer
System assumptions

- A modern OS with a file buffer cache, running on a uniprocessor with enough memory to make a substantial number of cache buffers available
- Workload emphasis on read-intensive applications
- Every application I/O request is for a single file block that can be read in one disk access, and requests are not too bursty
- System parameters are constant; enough disk parallelism that there is no congestion
System model

$$T = N_{I/O}\,(T_{CPU} + T_{I/O})$$

where $T$ is the application's elapsed time, $N_{I/O}$ is the number of I/O requests, $T_{CPU}$ is the average application CPU time between requests, and $T_{I/O}$ is the average time to service an I/O request.
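A quick worked instance of the model, with illustrative numbers that are assumptions rather than measurements from the paper:

$$T = 10{,}000 \times (200\,\mu\text{s} + 800\,\mu\text{s}) = 10\ \text{s}$$

If prefetching hid all disk latency so that $T_{I/O}$ fell toward the cache-hit time, elapsed time would approach $N_{I/O} \cdot T_{CPU}$ plus the residual hit overhead; the model makes the payoff of eliminating stall directly computable.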
The I/O service time depends on whether the block is in the cache:

$$T_{I/O} = \begin{cases} T_{hit} & \text{cache hit} \\ T_{miss} = T_{hit} + T_{driver} + T_{disk} & \text{cache miss} \end{cases}$$

where $T_{driver}$ is the overhead of allocating a buffer, queuing the request at the drive, and servicing the interrupt when the I/O completes.

The cost of deallocating an LRU buffer. Given $H(n)$, the cache-hit ratio with $n$ buffers:

$$T_{LRU}(n) = H(n)\,T_{hit} + (1 - H(n))\,T_{miss}$$

$$\Delta T_{LRU}(n) = T_{LRU}(n-1) - T_{LRU}(n) = \Delta H(n)\,(T_{miss} - T_{hit}), \qquad \Delta H(n) = H(n) - H(n-1)$$
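A minimal C sketch of the LRU-cost estimator, assuming an illustrative hit-ratio profile H(n) (which a system could obtain by profiling hits per LRU queue position) and made-up timing constants:

```c
#include <stdio.h>

#define T_HIT     0.5e-3   /* s, assumed buffer-cache hit time */
#define T_DRIVER  0.1e-3   /* s, assumed driver overhead       */
#define T_DISK   15.0e-3   /* s, assumed disk access time      */
#define T_MISS   (T_HIT + T_DRIVER + T_DISK)

/* Marginal cost, per access, of shrinking an n-buffer LRU cache
 * by one buffer: DeltaH(n) * (T_miss - T_hit). */
static double lru_cost(const double H[], int n)
{
    double dH = H[n] - H[n - 1];
    return dH * (T_MISS - T_HIT);
}

int main(void)
{
    /* Toy hit-ratio profile H(n) for n = 0..4 buffers. */
    double H[] = { 0.00, 0.40, 0.60, 0.70, 0.75 };
    for (int n = 1; n <= 4; n++)
        printf("Delta T_LRU(%d) = %.3f ms/access\n",
               n, 1e3 * lru_cost(H, n));
    return 0;
}
```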
The benefit of prefetching

Prefetching a block can mask some of the latency of a disk read, so $T_{disk}$ is the upper bound on the benefit of prefetching a block. If the prefetch can be delayed and still complete before the block is needed, we consider there to be no benefit from starting the prefetch now.

[Timeline: a prefetch issued x accesses before the block is consumed overlaps the disk read with the x intervening requests]

The benefit of prefetching a block that will be needed x accesses in the future:

$$B_x = T_{disk} - x\,(T_{CPU} + T_{hit} + T_{driver})$$

Assuming $T_{CPU} \approx 0$ and $T_{driver} \approx 0$:

$$B_x = T_{disk} - x\,T_{hit}, \qquad P = \frac{T_{disk}}{T_{hit}} \ \ \text{(the prefetch horizon)}$$

There is no benefit from prefetching further ahead than P.
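A C sketch of the prefetch-benefit estimate under the slide's simplification ($T_{CPU} \approx 0$, $T_{driver} \approx 0$); the timing constants are illustrative assumptions:

```c
#include <stdio.h>

#define T_HIT   0.5e-3   /* s, assumed */
#define T_DISK 15.0e-3   /* s, assumed */

/* B_x = T_disk - x*T_hit, clamped at zero beyond the horizon. */
static double prefetch_benefit(int x)
{
    double b = T_DISK - x * T_HIT;
    return b > 0 ? b : 0;
}

int main(void)
{
    int P = (int)(T_DISK / T_HIT);   /* prefetch horizon = 30 here */
    printf("prefetch horizon P = %d accesses\n", P);
    for (int x = 1; x <= 32; x *= 2)
        printf("B_%d = %.2f ms\n", x, 1e3 * prefetch_benefit(x));
    return 0;
}
```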
Comparison of LRU cost to prefetching benefit

- Shared resource: cache buffers
- Common currency: time per access, per buffer

With $r_h$ the rate of hinted accesses and $r_d$ the rate of unhinted demand accesses, a buffer should be reallocated from the LRU cache for prefetching when

$$r_h\,\frac{B_x}{x} > r_d\,\Delta H(n)\,(T_{miss} - T_{hit})$$

(the prefetch benefit is amortized over the x accesses for which the buffer is held).
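A C sketch of the resulting reallocation test, simply comparing the two sides of the inequality above; the rates, depth, and timings are assumed values:

```c
#include <stdio.h>

#define T_HIT     0.5e-3
#define T_DRIVER  0.1e-3
#define T_DISK   15.0e-3
#define T_MISS   (T_HIT + T_DRIVER + T_DISK)

int main(void)
{
    double r_h = 800, r_d = 200; /* hinted / demand access rates (1/s) */
    int    x   = 4;              /* prefetch depth being considered    */
    double dH  = 0.002;          /* marginal hit ratio of last buffer  */

    /* B_x with T_CPU, T_driver taken as ~0, as in the slide. */
    double benefit = r_h * (T_DISK - x * T_HIT) / x;  /* r_h * B_x / x */
    double cost    = r_d * dH * (T_MISS - T_HIT);     /* r_d * dT_LRU  */

    printf("benefit %.4f vs cost %.4f -> %s\n", benefit, cost,
           benefit > cost ? "prefetch" : "keep LRU buffer");
    return 0;
}
```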
The cost of flushing a hinted block

When should we flush a hinted block?

[Timeline: a hinted block flushed y accesses before its use is prefetched back at the prefetch horizon, so the buffer is free for y - P accesses]

$$T_{flush}(y) = \begin{cases} \dfrac{T_{driver}}{y - P} & \text{when } y > P \\[6pt] \dfrac{T_{driver} + B_y}{y} & \text{when } 1 \le y \le P \end{cases}$$

Beyond the prefetch horizon, flushing costs only the driver overhead of prefetching the block back, amortized over the $y - P$ accesses for which the buffer is actually freed; within the horizon, the flush also incurs the unmaskable stall $B_y$.
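A C sketch of the flush-cost estimate as reconstructed above, with assumed timings:

```c
#include <stdio.h>

#define T_HIT     0.5e-3
#define T_DRIVER  0.1e-3
#define T_DISK   15.0e-3

static double flush_cost(int y)
{
    int P = (int)(T_DISK / T_HIT);      /* prefetch horizon    */
    if (y > P)
        return T_DRIVER / (y - P);      /* driver overhead only */
    double B_y = T_DISK - y * T_HIT;    /* unmaskable stall     */
    return (T_DRIVER + B_y) / y;        /* per-access cost      */
}

int main(void)
{
    int ys[] = { 5, 20, 31, 60, 300 };
    for (int i = 0; i < 5; i++)
        printf("T_flush(y=%3d) = %.4f ms/access\n",
               ys[i], 1e3 * flush_cost(ys[i]));
    return 0;
}
```

Note how the cost falls off with y: blocks whose reuse is far in the future are cheap to give up, which is what makes the valuation below work.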
Putting it all together: global min-max

Three estimates: $\Delta T_{LRU}(n)$, $B_x$, $T_{flush}(y)$.

- Which block should be replaced when a buffer is needed for prefetching or to service a demand request? The globally least valuable block in the cache.
- Should a cache buffer be used to prefetch data now? Prefetch if the expected benefit is greater than the expected cost of flushing or stealing the least valuable block.
Separate estimators for the LRU cache and for each independent stream of hints.

Value estimators:

- LRU cache, for the block in the i-th position of the LRU queue:

$$\text{value}_{LRU}(i) = \Delta H(i)\,(T_{miss} - T_{hit})$$

- Hint estimators, for a hinted block needed in y accesses:

$$\text{value}_{hint}(y) = \begin{cases} \dfrac{T_{driver}}{y - P} & \text{when } y > P \\[6pt] \dfrac{T_{driver} + B_y}{y} & \text{when } 1 \le y \le P \end{cases}$$

- Global value = max(value_LRU, value_hint)
- Globally least valuable block = min(global value) over all cached blocks
A global min-max valuation of blocks
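A C sketch of the global min-max step under the same assumed timings: each estimator values its own blocks in the common currency, and the replacement victim is the global minimum. (A block claimed by several estimators, e.g. one that is both in the LRU queue and hinted, takes the maximum of its estimates; each candidate here has a single estimate for brevity.)

```c
#include <stdio.h>

#define T_HIT     0.5e-3
#define T_DRIVER  0.1e-3
#define T_DISK   15.0e-3
#define T_MISS   (T_HIT + T_DRIVER + T_DISK)

static double lru_value(double dH)     /* i-th LRU position */
{
    return dH * (T_MISS - T_HIT);
}

static double hint_value(int y)        /* hinted block, needed in y */
{
    int P = (int)(T_DISK / T_HIT);
    if (y > P) return T_DRIVER / (y - P);
    return (T_DRIVER + (T_DISK - y * T_HIT)) / y;
}

int main(void)
{
    /* Candidate victims: the LRU tail vs. two hinted blocks. */
    double v[3] = { lru_value(0.0005), hint_value(40), hint_value(2000) };
    const char *name[3] = { "LRU tail", "hint y=40", "hint y=2000" };

    int victim = 0;
    for (int i = 1; i < 3; i++)
        if (v[i] < v[victim]) victim = i;

    for (int i = 0; i < 3; i++)
        printf("%-12s %.6f ms/access\n", name[i], 1e3 * v[i]);
    printf("replace: %s\n", name[victim]);
    return 0;
}
```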
Informed caching example: MRU
The informed cache manager discovers MRU caching without being specifically coded to implement this policy.
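Why min-max yields MRU here, as a worked consequence of the value formulas above (using the Davidson numbers as illustrative inputs): on a cyclic scan of an N-block file, the block just consumed will not be needed again for $y \approx N$ accesses, so its value $T_{driver}/(y - P)$ is the smallest in the cache, while blocks just ahead of the scan point have small y and large value. The globally least valuable block is therefore always the most recently consumed one, and flushing it is exactly MRU replacement. With $N = 2176$ and $P = 30$, for example, the just-read block is valued at $T_{driver}/2146$, orders of magnitude below any block about to be reused.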