Improving Disk Throughput in Data-Intensive Servers
Enrique V. Carrera and Ricardo Bianchini
Department of Computer Science, Rutgers University
Introduction
Disk drives are often bottlenecks
Several optimizations have been proposed
• Disk arrays
• Fewer disk reads using fancy buffer cache mgmt
• Optimized disk writes using logs
• Optimized disk scheduling
Disk throughput is still a problem for data-intensive servers
Modern Disk Drives
Substantial processing and memory capacity
Disk controller cache
• Independent segments = sequential streams
• If #streams > #segments, LRU segment is replaced
• On access, blocks are read ahead to fill segment
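The segment behavior described above can be sketched as a toy model. This is only an illustration under stated assumptions: the class name `SegmentCache`, its parameters, and the simplification that a miss always refills a whole segment starting at the requested block are hypothetical and not taken from any real drive firmware.

```python
from collections import OrderedDict

class SegmentCache:
    """Toy segment-based controller cache (hypothetical model)."""

    def __init__(self, num_segments, segment_blocks):
        self.num_segments = num_segments
        self.segment_blocks = segment_blocks
        # start block -> range of cached blocks; least recently used first
        self.segments = OrderedDict()

    def read(self, block):
        """Return True on a cache hit, False on a miss."""
        hit_start = next(
            (s for s, blks in self.segments.items() if block in blks), None
        )
        if hit_start is not None:
            self.segments.move_to_end(hit_start)  # mark segment most recently used
            return True
        if len(self.segments) >= self.num_segments:
            self.segments.popitem(last=False)     # replace the LRU segment
        # Read ahead consecutive blocks to fill the whole segment.
        self.segments[block] = range(block, block + self.segment_blocks)
        return False

cache = SegmentCache(num_segments=2, segment_blocks=4)
results = [cache.read(b) for b in (0, 1, 100, 200, 2)]
print(results)
```

With two segments and three interleaved streams, the third stream evicts the first stream's segment, so the later read of block 2 misses even though it had been read ahead earlier.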
Disk arrays
• Array controller may also cache data
• Striping affects read-ahead
Key Problem
Controller caches not designed for servers
• Sequential access to small # of large files
• Read-ahead of consecutive blocks
• Segment is unit of allocation and replacement
Data-intensive servers
• Small files
• Large # of concurrent accesses
• Large # of blocks often miss in the controller cache
This Work
Goal
• Management techniques for disk controller caches that are efficient for servers
Techniques
• File-Oriented Read-ahead (FOR)
• Host-guided Device Caching (HDC)
Exploit processing and memory of drives
Architecture
File-Oriented Read-ahead
Disk controller has no notion of file layout
Read-ahead can be useless for small files
• Disk utilization is not amortized
• Useless blocks pollute the controller cache
FOR only reads ahead blocks of same file
FOR needs to know layout of files on disk
• Bitmap of disk blocks kept by controller
• A 1 bit means the block is the logical continuation of the previous block
• Initialized at boot, updated on metadata writes
# blocks to read ahead = # consecutive 1's following the requested block, capped at the max read-ahead size
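The read-ahead rule above can be illustrated with a short sketch. The function name `read_ahead_count` and the list-of-ints bitmap representation are assumptions made for this example; the slides do not say how the controller stores or scans the bitmap.

```python
def read_ahead_count(bitmap, block, max_read_ahead):
    """Count blocks after `block` that continue the same file.

    bitmap[i] == 1 means block i is the logical continuation of block i - 1.
    The count is capped at max_read_ahead.
    """
    count = 0
    nxt = block + 1
    while nxt < len(bitmap) and bitmap[nxt] == 1 and count < max_read_ahead:
        count += 1
        nxt += 1
    return count

# Hypothetical layout: blocks 0-2 hold one file, 3-4 another, 5 a one-block file.
bitmap = [0, 1, 1, 0, 1, 0]
print(read_ahead_count(bitmap, 0, 8))  # 2: blocks 1 and 2 belong to the same file
print(read_ahead_count(bitmap, 3, 8))  # 1: only block 4 continues the file
print(read_ahead_count(bitmap, 5, 8))  # 0: nothing to read ahead
print(read_ahead_count(bitmap, 0, 1))  # 1: capped by the max read-ahead size
```

A conventional controller would read ahead past block 2 into the next file; under this rule the read stops at the file boundary.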
FOR could underutilize segments, so allocation and replacement are based on blocks
Replacement policy: MRU
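A block-granularity cache with MRU replacement might look like the sketch below. The class `MRUBlockCache` and its `access` method are hypothetical names for illustration only. One plausible rationale for MRU, not stated on the slide, is that a block just delivered to the host now sits in the host buffer cache and is unlikely to be requested from the controller again soon.

```python
from collections import OrderedDict

class MRUBlockCache:
    """Toy block-based cache that evicts the most recently used block."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        # Ordered from least recently used (first) to most recently used (last).
        self.blocks = OrderedDict()

    def access(self, block_id):
        """Return True on a hit; on a miss, cache the block and return False."""
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)  # now the most recently used
            return True
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=True)     # evict the MRU block
        self.blocks[block_id] = None
        return False

cache = MRUBlockCache(capacity=2)
print([cache.access(b) for b in (1, 2, 3, 1, 2)])
print(list(cache.blocks))
```

In this trace, block 3 evicts block 2 (the most recently used at that point), so the later access to block 1 still hits.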
FOR benefits
• Lower disk utilization
• Higher controller cache hit rates
Host-guided Device Caching
Data-intensive servers rely on disk arrays, so a non-trivial amount of controller cache space is available
Current disk controller caches are speed-matching and read-ahead buffers
More useful if each cache can be managed directly by the host processor
Our evaluation:
• Disk controllers permanently cache the data with the most misses in the buffer cache
• Each controller caches data stored on its disk
• Assumes block-based organization
Support for three simple commands
• pin_blk()
• unpin_blk()
• flush_hdc()
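The three commands could be modeled on the controller side roughly as follows. The slides name the commands but give neither arguments nor exact semantics, so everything here is an assumption: `pin_blk` and `unpin_blk` are assumed to take a block number, `flush_hdc` is assumed to write modified pinned blocks back to the medium, and the `HDCController` class, its `read`/`write` methods, and the dict standing in for the disk are all hypothetical.

```python
class HDCController:
    """Toy model of a controller with a host-managed cache region."""

    def __init__(self, disk, hdc_capacity):
        self.disk = disk                  # dict: block id -> bytes (stands in for the medium)
        self.hdc_capacity = hdc_capacity  # number of blocks the host may pin
        self.pinned = {}                  # block id -> cached data
        self.dirty = set()                # pinned blocks modified since last write-back

    def pin_blk(self, block_id):
        """Assumed semantics: keep the block cached until unpinned."""
        if block_id in self.pinned:
            return True
        if len(self.pinned) >= self.hdc_capacity:
            return False
        self.pinned[block_id] = self.disk[block_id]
        return True

    def unpin_blk(self, block_id):
        """Assumed semantics: write back if modified, then release the space."""
        if block_id not in self.pinned:
            return False
        if block_id in self.dirty:
            self.disk[block_id] = self.pinned[block_id]
            self.dirty.discard(block_id)
        del self.pinned[block_id]
        return True

    def flush_hdc(self):
        """Assumed semantics: write all modified pinned blocks to the medium."""
        for block_id in self.dirty:
            self.disk[block_id] = self.pinned[block_id]
        self.dirty.clear()

    def read(self, block_id):
        """Return (data, hit) where hit says whether the pinned copy was used."""
        if block_id in self.pinned:
            return self.pinned[block_id], True
        return self.disk[block_id], False

    def write(self, block_id, data):
        if block_id in self.pinned:
            self.pinned[block_id] = data
            self.dirty.add(block_id)
        else:
            self.disk[block_id] = data

disk = {0: b"a", 1: b"b"}
ctrl = HDCController(disk, hdc_capacity=1)
print(ctrl.pin_blk(0), ctrl.pin_blk(1))
print(ctrl.read(0), ctrl.read(1))
```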
Execution divided into periods to determine:
• How many blocks to cache; which blocks those are; when to cache them
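One way the host might make these per-period decisions is sketched below, assuming it counts buffer cache misses per block during a period and then pins the blocks with the most misses for the next one. The function `plan_next_period` and its parameters are hypothetical; the slides do not describe the actual selection algorithm or period length.

```python
from collections import Counter

def plan_next_period(miss_counts, currently_pinned, hdc_blocks):
    """Choose which blocks to pin and unpin at a period boundary.

    miss_counts:      Counter of buffer cache misses per block in the last period
    currently_pinned: set of block ids pinned in the controller cache right now
    hdc_blocks:       number of blocks the controller lets the host pin
    """
    wanted = {blk for blk, _ in miss_counts.most_common(hdc_blocks)}
    to_pin = wanted - currently_pinned
    to_unpin = currently_pinned - wanted
    return to_pin, to_unpin

# Hypothetical miss counts observed by the host during one period.
misses = Counter({10: 5, 11: 3, 12: 1})
to_pin, to_unpin = plan_next_period(misses, currently_pinned={10, 12}, hdc_blocks=2)
print(to_pin, to_unpin)
```

The host would then issue one unpin command per block in `to_unpin` and one pin command per block in `to_pin`; block 10 stays pinned and costs no extra commands.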
HDC benefits
• Higher cache hit rate
• Lower disk utilization
Tradeoff: space for HDC and read-aheads
Methodology
Simulation of 8 IBM Ultrastar 36Z15 drives attached to a non-caching Ultra160 SCSI card
Logical disk blocks striped across array
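A simple way to picture striping is the round-robin mapping sketched below. It assumes RAID-0 style placement and 4KB blocks, neither of which the slides specify; the function `locate_block` is a hypothetical helper.

```python
def locate_block(logical_block, blocks_per_unit, num_disks):
    """Map a logical block to (disk index, block index on that disk).

    Assumes striping units are assigned to disks round-robin.
    """
    unit = logical_block // blocks_per_unit     # which striping unit
    disk = unit % num_disks                     # round-robin over the array
    unit_on_disk = unit // num_disks            # units preceding it on that disk
    offset = logical_block % blocks_per_unit    # position inside the unit
    return disk, unit_on_disk * blocks_per_unit + offset

# 8 disks, 16KB striping unit, assumed 4KB blocks -> 4 blocks per unit.
for blk in (0, 3, 4, 32, 35):
    print(blk, locate_block(blk, blocks_per_unit=4, num_disks=8))
```

Under this mapping, logical blocks 3 and 4 land on different disks, while blocks 3 and 32 are adjacent on disk 0. A controller that reads ahead consecutive physical blocks can therefore fetch data that is far away in the logical block space.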
Contention for buses, memories, and other components is simulated in detail
Synthetic + real traces (Web, proxy, file)
Real Workloads
Figure: Web workload, I/O time as a function of striping unit size (HDC: 2MB)
Figure: Web workload, I/O time as a function of HDC memory size (striping unit: 16KB)
Summary
• Consistent and significant performance gains
• Combination achieves best overall performance
Related Work
Techniques external to disk controllers
Controller cache is different from other caches
• Lack of temporal locality
• Orders of magnitude smaller than main memory
• Read-ahead restricted to sequential blocks
Explicit grouping
• Grouping needs to be found and maintained
• Segment replacements may eliminate benefits
Controller read-ahead & caching techniques
• None considered file system info, host-guided caching, or block-based organizations
Other disk controller optimizations
• Scheduling of requests
• Utilizing free bandwidth
• Data replication
• FOR and HDC are orthogonal
Conclusions
Current controller cache management is inappropriate for servers
FOR and HDC can achieve significant and consistent increases in server throughput
Real workloads show improvements of 47%, 33%, and 21% for the Web, proxy, and file server workloads, respectively
Extensions
Strategies for servers that use raw I/O
Better approach than bitmap
Array controllers that cache data and hide individual disks
Impact of other replacement policies and sizes for the buffer cache
More Information
http://www.darklab.rutgers.edu
Synthetic Workloads
Figure: I/O time as a function of file size
Figure: I/O time as a function of the number of simultaneous streams
Figure: I/O time as a function of access frequency
Summary
• No read-ahead hurts performance for files > 16KB
• Simply replacing segments with blocks has no effect
• FOR gains increase as file size decreases and # simultaneous streams increases
• HDC gains increase as requests are shifted toward a small # blocks
• FOR gains decrease as % writes increases
Figure: I/O time as a function of percentage of writes