ANALYZING STORAGE SYSTEM WORKLOADS

ANALYZING STORAGE SYSTEM WORKLOADS Paul G. Sikalinda, Pieter S. Kritzinger {psikalin, psk}@cs.uct.ac.za, DNA Research Group Computer Science Department University of Cape Town, and Lourens O. Walters. [email protected] Mosaic Software Rondebosch Cape Town Republic of South Africa.


Page 1: ANALYZING STORAGE SYSTEM WORKLOADS

ANALYZING STORAGE SYSTEM WORKLOADS

Paul G. Sikalinda, Pieter S. Kritzinger

{psikalin, psk}@cs.uct.ac.za, DNA Research Group

Computer Science Department, University of Cape Town,

and Lourens O. Walters.

[email protected] Mosaic Software

Rondebosch

Cape Town Republic of South Africa.

Page 2: ANALYZING STORAGE SYSTEM WORKLOADS

Presentation Outline

Introduction

Motivation and Objectives

Storage Systems

Storage System Workloads

The Storage System Workload Analyzed

Statistical Methodology

Workload Analysis Results

Conclusions

Future Work


Page 3: ANALYZING STORAGE SYSTEM WORKLOADS

Introduction

The DNA Group specializes, among other things, in using theory, formal methods and software tools in the:

– specification of …

– design of …

– modelling of …

– building of …

– security of …

– *workload analysis of …

– correctness analysis of …

– performance analysis of …

concurrent computing systems (CCS).

Page 4: ANALYZING STORAGE SYSTEM WORKLOADS

Introduction (cont’d)

ANALYZING STORAGE SYSTEM WORKLOADS


Page 5: ANALYZING STORAGE SYSTEM WORKLOADS

Introduction (cont’d)

[Diagram labels: RP, RQ, PROCESSOR]

•Start Address

•Operation Type

•Request Size

•Timestamps

•Etc.

Page 6: ANALYZING STORAGE SYSTEM WORKLOADS

Motivation and Objectives

A lot of effort is being spent on improving the I/O subsystem because it is a bottleneck in current computer systems.

-Workload modelling is an important component in the design, performance evaluation and correctness evaluation of storage systems.

Common assumptions are not correct:

-Uniform distribution of start addresses,

-Exponential inter-arrival times.

Therefore storage system workload analysis should be done to come up with correct models.


Page 7: ANALYZING STORAGE SYSTEM WORKLOADS

Motivation and Objectives (cont’d)

-Designing storage systems.

-Designing I/O optimization techniques (read caching, write caching, pre-fetching, I/O parallelism, I/O rescheduling) to improve performance.

-Understanding application behavior and requirements.

-Deciding to pool storage system resources (SSPs).

-Implementing intelligent storage systems.

etc.


Page 8: ANALYZING STORAGE SYSTEM WORKLOADS

Motivation and Objectives (cont’d)

Our aim was to analyze storage system workloads in terms of

(a) inter-arrival times, (b) sizes and (c) “seek distances” of I/O requests

and provide statistics for these parameters to be used to:

(a) derive models for storage system evaluation and

(b) design optimization techniques (read caching, I/O parallelism, etc.)


Page 9: ANALYZING STORAGE SYSTEM WORKLOADS

Storage Systems

Enterprise Storage System (ESS)

[Figure: ESS block diagram. Components: host/bus adapter, cache, array controller, disk drives. Paths: path to host, path to controller, path to cache, path to disks.]

Page 10: ANALYZING STORAGE SYSTEM WORKLOADS

Storage Systems (cont’d)

ESSs are powerful disk storage systems with the following capabilities:

-High performance*,

-Large capacity and availability,

-Protection against physical drive failure, which can be provided using RAID methods.

*But they still cannot match processor speeds because of the mechanical processes in the disk drives.


Page 11: ANALYZING STORAGE SYSTEM WORKLOADS

Storage System Workloads

I/O request servicing and workload classification:

-Logical Workloads (File System Workloads)

-Storage System Workloads (Physical I/O Traffic)

[Figure: I/O request servicing. Labels: Application Software, Operating System, File System, Disk System, I/O request.]

Page 12: ANALYZING STORAGE SYSTEM WORKLOADS

Storage System Workloads (cont’d)

Workload Parameters:

-Logical Volume Number

-*Start Address (seek distances)

-*Request Size

-Operation Type (i.e., read or write)

-*Time Stamp (inter-arrival times)
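The relationship between the recorded parameters and the derived ones (seek distances from start addresses, inter-arrival times from time stamps) can be sketched as follows. This is an illustrative sketch only: the `Request` class, the function names and the comma-separated field order (volume, start address, size, operation, time stamp) are assumptions, and the actual SPC trace layout should be checked before use. The seek distance is taken here as the signed difference between the start addresses of consecutive requests, which is also an assumption about the definition.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Request:
    volume: int       # logical volume number
    start: int        # start address, in blocks
    size: int         # request size, in bytes
    op: str           # operation type: "r" (read) or "w" (write)
    timestamp: float  # arrival time stamp


def parse_trace(text):
    """Parse records laid out as volume,start,size,op,timestamp (assumed order)."""
    rows = csv.reader(io.StringIO(text))
    return [
        Request(int(v), int(a), int(s), o.strip().lower(), float(t))
        for v, a, s, o, t in (row[:5] for row in rows if row)
    ]


def inter_arrival_times(reqs):
    """Difference between the time stamps of consecutive requests."""
    return [b.timestamp - a.timestamp for a, b in zip(reqs, reqs[1:])]


def seek_distances(reqs):
    """Signed difference between the start addresses of consecutive requests."""
    return [b.start - a.start for a, b in zip(reqs, reqs[1:])]


def request_sizes(reqs):
    return [r.size for r in reqs]


# Hypothetical three-record trace, for illustration only.
SAMPLE = "0,1000,8192,R,0.5\n0,1016,8192,R,0.75\n1,400,16384,W,2.0\n"
requests = parse_trace(SAMPLE)
print(inter_arrival_times(requests))
print(seek_distances(requests))
print(request_sizes(requests))
```

Differencing n requests yields n - 1 values, which is consistent with the sample sizes reported in the results (1055449 request sizes, 1055448 inter-arrival times and seek distances).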


Page 13: ANALYZING STORAGE SYSTEM WORKLOADS

The Storage System Workload Analyzed

We analyzed inter-arrival times, request sizes, and "seek distances" of I/O requests from a system running a web search engine.

We obtained the I/O trace files from the Storage Performance Council (SPC) (http://www.storageperformance.org).


Page 14: ANALYZING STORAGE SYSTEM WORKLOADS

Statistical Methodology

-Visual Techniques:

  -Histogram and

  -ECDF graphs.

-Key Data Statistics:

  -Sample mean,

  -Variance and standard deviation,

  -Coefficient of skew, kurtosis, and variation,

  -Five number data summaries (minimum, lower quartile, median, upper quartile, maximum),

  -Lower and upper outlier limits.
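A minimal sketch of how the key data statistics listed above could be computed with the Python standard library. The function name `summarize`, the moment-based definitions of skew and (non-excess) kurtosis, the quartile interpolation method and the 1.5 x IQR rule for the outlier limits are all assumptions; the slides do not state which definitions were used.

```python
import math
import statistics


def summarize(data, k=1.5):
    """Return key statistics for a sample with at least two distinct values."""
    xs = sorted(data)
    n = len(xs)
    mean = statistics.fmean(xs)
    variance = statistics.variance(xs, mean)  # sample variance (divides by n - 1)
    std = math.sqrt(variance)

    # Central moments (dividing by n) for the moment-based skew and kurtosis.
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n

    q1, median, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1

    return {
        "n": n,
        "mean": mean,
        "variance": variance,
        "std": std,
        "cv": std / mean if mean != 0 else math.inf,  # coefficient of variation
        "skew": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,  # equals 3 for a normal distribution
        "five_number": (xs[0], q1, median, q3, xs[-1]),
        "lower_limit": q1 - k * iqr,  # assumed Tukey-style fence
        "upper_limit": q3 + k * iqr,
    }


# Toy data, for illustration only.
for name, value in summarize([1, 2, 3, 4, 5]).items():
    print(name, value)
```

The factor `k` is exposed as a parameter because other multiples of the interquartile range are also in common use for outlier limits.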


Page 15: ANALYZING STORAGE SYSTEM WORKLOADS

Results 1: inter-arrival times (µs)

Sample Size 1055448

Five Number Summary (126, 242, 1695, 4487, 100100)

Sample Mean 2985.761

Sample Variance 12508927

Standard Deviation 3536.796

Coefficient of Variation 1.184554

Coefficient of Skew 2.142186

Coefficient of Kurtosis 8.884555

Upper Outlier Limit 26142


Page 16: ANALYZING STORAGE SYSTEM WORKLOADS

Results 1: inter-arrival times

-Highly variable data. Range (126, 100100 microseconds)

-Coefficient of kurtosis shows that the distribution is heavy tailed.
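As a worked check on the figures reported for the inter-arrival times, the short sketch below recomputes the standard deviation and coefficient of variation from the reported mean and variance, and compares the reported kurtosis with the value 3 of a normal distribution. It assumes the reported kurtosis is the non-excess form (the slides do not say); under that assumption a value well above 3 indicates tails heavier than those of a normal distribution.

```python
import math

# Values copied from the "Results 1" slide (microseconds).
mean = 2985.761
variance = 12508927
std_reported = 3536.796
cv_reported = 1.184554
kurtosis_reported = 8.884555

std = math.sqrt(variance)  # standard deviation = sqrt(variance)
cv = std / mean            # coefficient of variation = std / mean

NORMAL_KURTOSIS = 3.0      # non-excess kurtosis of a normal distribution
heavier_than_normal = kurtosis_reported > NORMAL_KURTOSIS

print(f"std = {std:.3f} (reported {std_reported})")
print(f"cv  = {cv:.6f} (reported {cv_reported})")
print("heavier tails than normal:", heavier_than_normal)
```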


Page 17: ANALYZING STORAGE SYSTEM WORKLOADS

Results 2: Request sizes (bytes)

Sample Size 1055449

Five Number Summary (512, 8192, 8192, 24580, 1138000)

Sample Mean 15510

Sample Variance 102017528

Standard Deviation 10100.37

Coefficient of Variation 0.6512577

Coefficient of Skew 3.441212

Coefficient of Kurtosis 287.6503

Upper Outlier Limit 106520


Page 18: ANALYZING STORAGE SYSTEM WORKLOADS

Results 2: Request sizes

Distribution peaks: 8192 (60%), 16384 (10%), 24576 (9%) and 32768 (20%).

Reason: the OS filesystem block size is 8192 bytes, and the peaks fall at multiples of it.


Page 19: ANALYZING STORAGE SYSTEM WORKLOADS

Results 3: Seek distances (blocks)

Sample Size 1055448

Five Number Summary (-34926160, -8581248, 6.4, 8580496, 34910700)

Sample Mean 27.95

Sample Variance 170691900000000

Standard Deviation 13064910

Coefficient of Skew 0

Coefficient of Variation 467398.8

Upper Outlier Limit 51482656

Lower Outlier Limit -51482528


Page 20: ANALYZING STORAGE SYSTEM WORKLOADS

Results 3: Seek distances

-The distribution of seek distances is symmetrical.


Page 21: ANALYZING STORAGE SYSTEM WORKLOADS

Conclusions

(1) Analyzing storage system workloads is necessary to properly model the workloads:

-To model Web inter-arrival times, the Weibull, lognormal, beta, gamma and exponential probability density functions should be considered.

-To model Web request sizes and seek distances, a probability mass function is more appropriate.

*We intend to use the models in simulations of ESS.
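One way a probability mass function could be built from observed request sizes and then used to drive a simulation is sketched below. The function names and the toy data are hypothetical; this is not the authors' model, only an illustration of an empirical PMF with sampling from it.

```python
import random
from collections import Counter


def empirical_pmf(values):
    """Map each distinct value to its relative frequency in the sample."""
    counts = Counter(values)
    total = sum(counts.values())
    return {v: c / total for v, c in sorted(counts.items())}


def sample_from_pmf(pmf, n, rng):
    """Draw n values with probabilities given by the PMF."""
    values = list(pmf)
    weights = list(pmf.values())
    return rng.choices(values, weights=weights, k=n)


# Toy sample of request sizes in bytes, for illustration only.
observed = [8192] * 6 + [16384] + [24576] + [32768] * 2
pmf = empirical_pmf(observed)
print(pmf)

rng = random.Random(42)  # fixed seed so a simulation run is repeatable
print(sample_from_pmf(pmf, 5, rng))
```

A PMF suits data concentrated on a few discrete values, such as request sizes that are multiples of a block size, because it reproduces the peaks directly instead of smoothing them.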


Page 22: ANALYZING STORAGE SYSTEM WORKLOADS

Conclusions (cont’d)

(2) The analysis results are useful when designing optimization techniques for storage systems, e.g.:

-Cache management block size – 8192 bytes.

-I/O rescheduling and background tasking would be ideal for the workload.

-The storage system handling the workload we analyzed can be optimized to handle the symmetrical behavior*.

*The results are not broadly applicable.


Page 23: ANALYZING STORAGE SYSTEM WORKLOADS

Conclusions (cont’d)

(3) Other conclusions:

-Request sizes are influenced by the filesystem in use.

-Seek distances are not always uniformly distributed.

*In summary, we have provided statistics about the parameters for the storage system workload that we analyzed and have shown how we can use them to derive models and design I/O optimization techniques.


Page 24: ANALYZING STORAGE SYSTEM WORKLOADS

Future Work

-Rigorously find a probability density function matching a given data set of inter-arrival times.

-Analyze the storage system workloads in terms of other parameters (e.g., logical volume numbers and operation types).
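One possible starting point for matching a density function to inter-arrival times is sketched below: fit candidate distributions by maximum likelihood and compare each fitted CDF with the empirical CDF using the Kolmogorov-Smirnov distance. Only the exponential and lognormal candidates are shown because their maximum-likelihood estimates have closed forms; Weibull, gamma and beta fits need numerical optimization. The function names and the synthetic data are assumptions, not part of the original work.

```python
import math
import random


def fit_exponential(xs):
    """Return the CDF of the maximum-likelihood exponential fit."""
    rate = len(xs) / sum(xs)

    def cdf(x):
        return 1.0 - math.exp(-rate * x) if x > 0 else 0.0

    return cdf


def fit_lognormal(xs):
    """Return the CDF of the maximum-likelihood lognormal fit (positive data)."""
    logs = [math.log(x) for x in xs]
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / len(logs))

    def cdf(x):
        if x <= 0:
            return 0.0
        z = (math.log(x) - mu) / (sigma * math.sqrt(2.0))
        return 0.5 * (1.0 + math.erf(z))

    return cdf


def ks_distance(xs, cdf):
    """Largest gap between the empirical CDF of xs and the given CDF."""
    ordered = sorted(xs)
    n = len(ordered)
    d = 0.0
    for i, x in enumerate(ordered):
        f = cdf(x)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


# Synthetic inter-arrival times (microseconds), for illustration only.
rng = random.Random(0)
data = [rng.expovariate(1 / 3000) for _ in range(2000)]
for name, fit in (("exponential", fit_exponential), ("lognormal", fit_lognormal)):
    print(name, ks_distance(data, fit(data)))
```

A smaller distance indicates a closer fit. A rigorous treatment would also need a significance test that accounts for the parameters having been estimated from the same data.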


Page 25: ANALYZING STORAGE SYSTEM WORKLOADS

THANK YOU FOR YOUR ATTENTION!

?
