Data Storage Performance – Equating Demand and Supply
Lalit Mohan
EMC Proven Professional Knowledge Sharing 2009
PMP, ITILv3 (Found.), CIRM, CPIM, EMCTAe (BC), MCSE (W2K), ICWAI (Inter.), B.Eng. (Hon.)
Senior Solutions Architect
[email protected]
Computer Systems (South Asia) Pte Ltd
2009 EMC Proven Professional Knowledge Sharing 2
Table of Contents
Executive Summary ......... 4
Abstract ......... 5
Introduction ......... 6
Essential Terminology ......... 8
Characteristics of ‘Demand’ ......... 10
‘Supply’ Capability ......... 11
Demand-Supply Framework (DSF) ......... 12
Improving Performance Capability ......... 14
Example Case Scenarios ......... 18
    Case Scenario 1: UNIX® Transactional Workload ......... 18
    Case Scenario 2: Windows® Messaging Workload ......... 21
    Case Scenario 3: Mainframe Mixed Workload ......... 24
Recommendations in Conclusion ......... 25
Assumptions, Impact and Remedy ......... 26
Limitations with Improvements ......... 27
Author Biography ......... 29
Disclaimer: The views, processes or methodologies published in this compilation are those of the authors. They do not necessarily reflect EMC Corporation’s views, processes, or methodologies.
List of Figures
Figure 1: Characteristics of Demand ......... 10
Figure 2: Supply Characteristics ......... 11
Figure 3: Demand-Supply Superimposed ......... 12
Figure 4: Demand-Supply Framework (DSF) ......... 13
Figure 5: DSF Pay-Off Tabular Representation ......... 14
Figure 6: Building Change Strategy using DSF ......... 16
Figure 7: DSF for ERP based Vendor Management Application ......... 18
Figure 8: DSF for RDBMS based Telecom ‘Value-Added Services’ Application ......... 20
Figure 9: Healthy scenario DSF for Messaging Application ......... 22
Figure 10: Unhealthy scenario DSF for the Messaging Application ......... 23
Figure 11: DSF for Mainframe Mixed Application ......... 24
Figure 12: Recommended Processes using DSF ......... 26
Executive Summary

Based on the scarcity principle used in economics, we cannot have more of everything; we must make choices. The current climate of austerity is forcing businesses to reduce cost. Caring businesses are choosing to aggressively cut non-people costs first. Achieving the same or even more performance from existing or scaled-down IT infrastructures, including storage, supports this effort by freeing up capacity to bear people cost.
Making this happen requires intense customer engagement. This is a guiding principle proposed in
2009 [1] for scoping, planning and implementing the customer’s strategic transformation and
acquisition initiatives. In this article, “demand-side” is the workload that business activity places on
the IT storage infrastructure; “supply-side” represents the infrastructure’s capability to meet the
former. Procurement, which should include the effort to match the two, is unintentionally restricted to
technical and commercial supply-side aspects, ignoring the demand-side. This approach reduces the
chance of achieving an ‘economically efficient’ outcome.
This article introduces the Demand-Supply Framework (DSF) theory, a 4-cell-matrix that enables
businesses to effectively measure the performance of their current Storage infrastructure in context of
their business activity. This enables informed choices of strategic initiatives with ensuing benefits, e.g.
saving money, improving efficiencies and returns on investment, reducing risk etc. [2].
Shared service centers, virtual or physical, within or, where possible, across organizations, internalize positive externalities that are otherwise lost in duplicated facilities. Losing them leads to under-production of demand-servicing capability. DSF visually illustrates the potential of serving more business
demand with the same or less infrastructure.
"There is nothing so practical as a good theory."
--Kurt Lewin
Abstract

Information Technology performance is the cumulative outcome of a number of individual
components, including storage. When the duration of storage processing is proportionately long,
visualize the workload generated by end-user computer systems as ‘Demand’, and the processing
service provided by the data storage system as ‘Supply’. How well supply meets demand defines the
‘quality of performance’ experienced by the business.
Matching projected demand with capability to supply drives the selection and design of data storage
components. Such a solution would operate at an optimum level, with demand equaling supply. In
this article, we apply the demand-supply analogy to building a 4-Cell matrix using appropriate data
storage domain performance characteristics as proxies to represent demand and supply.
This article will develop this 4-Cell matrix to capture and represent the performance of data storage
solutions with reference to several information technology infrastructure solutions, for example,
‘Messaging’ and ‘Enterprise Resource Planning’ applications in an open systems environment and
mainframe host applications in its proprietary environment.
We will review:
• Relevant terms and definitions
• Characteristics of ‘Demand’ placed on data storage components
• ‘Supply’ capability of the data storage component
• Combining demand and supply into the working framework
• Options for improving performance capability
• Case scenarios to illustrate key points
• Conclusion and recommendations
• Assumptions, impact and remedy; limitations with improvements
You (personnel responsible for evaluating, procuring and implementing) will benefit from this article by
learning how to better plan and design optimum data storage infrastructures to support centralization
of business information assets into efficient shared service centers. This is a necessity in the current
financial climate. This aggregation may produce positive externalities that enhance the value of
information to management, improving return on investment (ROI).
Introduction

The purpose of this article is to evaluate the performance of storage as a sub-system in a typical,
complex, and contemporary IT infrastructure landscape. We will use a 4-cell matrix framework
capable of classifying the sub-system into four categories based upon the patterns observed in the
performance data. These classifications will guide the development of change strategies. A brief
description of each section follows.
Business Proposition as outlined in the Executive Summary:
‘Demand-Supply Framework (DSF)’, an empirical framework, is designed to allow businesses to
derive more benefits from existing or reduced IT storage infrastructures. It is expected to provide
additional monetary resources to invest in existing staff. DSF classifies the storage sub-system based
on performance in context of the enterprise workload. The outcome supports performance
improvement initiatives and cost-benefit analysis to evaluate the investment decision.
Essential Terminology:
At the very beginning, this article describes the terminology used to develop this framework. A
common language is necessary to accomplish the stated objective. The purpose of DSF is introduced
to graphically represent the real, current, and future state of storage sub-system performance.
Characteristics of ‘Demand’:

Our contemporary knowledge pool is dominated by ‘supply-side’ information on the storage sub-system (i.e. capabilities, performance benchmarks and comparisons). This article attempts to enhance the use of this knowledge by introducing typical, real-life characteristics of demand using representative performance data. This data can be captured easily using existing system tools and programs to add the ‘demand-side’ perspective, enabling economical and efficient choices.
‘Supply’ Capability:
Supply characteristics of a storage sub-system are a well-treated subject in reference material available from academic and industry sources. This body of knowledge is also relevant to the framework proposed in this article. Supply has an implicit connotation of quality. ‘Total Device Response Time’, measured in milliseconds, is the most commonly used unit of measurement for transactional or mixed workloads.
Demand-Supply Framework (DSF):

Demand for IT workload processing and the infrastructure’s supply capability are interrelated. This is the cornerstone of the hypothesis developed in this article. Operational and investment efficiency is achieved only when demand equals supply (equilibrium). It may not be practical to develop a closed-form model for matching the two. DSF, based upon data from existing tools and programs, can help enterprises discover their current position relative to the equilibrium.
Improving Performance Capability:
Knowledge of your current position is a powerful motivator to envision the future, desired position.
This vision leads to realizing the objective. DSF is a key tool to assure such transitions. It facilitates
correct interpretation of the large and complex performance dataset, spelling the difference between
success and failure. This section uses a case scenario methodology for identifying deficiencies and
alleviating them.
Example Case Scenarios:

Case Scenario 1: UNIX® Transactional Workload
Case Scenario 2: Windows® Messaging Workload
Case Scenario 3: Mainframe Mixed Workload

No two IT storage workloads are identical, but their constituent types repeat across businesses. Workload is generated when businesses serve their customers’ online transactions. Batch-job workload is generated as the data is processed. People must communicate for the organization to function, generating messaging traffic. The case scenarios in this article are based upon field experience with these important storage workload-generating functions in a contemporary enterprise.
Recommendations in Conclusion:
The conclusion stresses the importance of a clear and objective system to measure and compare
performance of complex options and their combinations. A brief process for using DSF to achieve
economic efficiency is proposed.
Assumptions, Impacts and Remedy; Limitations with Improvements:

Following the above, key assumptions are explained along with their likely impacts. Subsequently,
suggestions are made to overcome the effect of these impacts. This is followed by known limitations
and probable improvements for mitigating them.
The article concludes with a brief author biography.
Keywords: demand, supply, knowledge, business, performance, framework
Essential Terminology

Here are definitions of the terms used in this article. A shared vocabulary helps eliminate misunderstanding, as the same terms are often used to mean different things.
Access Density: Access Density is the measure of storage performance per unit of capacity for a single disk drive unit. It is computed by dividing the throughput in IO/sec that a disk drive can support with reasonable response time by its storage capacity [3].
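As a back-of-the-envelope illustration of this definition, the sketch below computes access density for two drive sizes; the IO/sec and capacity figures are assumed values for illustration, not vendor specifications.

```python
# Access density: throughput (IO/sec at acceptable response time)
# per unit of usable capacity. All figures below are illustrative
# assumptions, not vendor specifications.

def access_density(iops: float, capacity_gb: float) -> float:
    """IO/sec available per gigabyte of capacity."""
    return iops / capacity_gb

# A 73 GB drive and a 1 TB drive, each sustaining ~150 IO/sec:
small = access_density(150, 73)      # ~2.05 IO/sec per GB
large = access_density(150, 1000)    # 0.15 IO/sec per GB
print(f"73 GB drive: {small:.2f} IO/sec per GB")
print(f"1 TB drive:  {large:.2f} IO/sec per GB")
```

Note how the larger drive offers far fewer IO/sec per gigabyte, which is the falling access-density trend discussed later in this article.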
‘Demand-side’ view of storage infrastructure: It is the business activity placed on the IT storage sub-system. A variety of business functions in an enterprise use different applications and generate different workloads. IT storage arrays service this aggregated heterogeneous workload; the demand-side view provides an objective basis for matching the storage arrays’ capability with that workload.
‘Economic-efficiency’ and DSF: Economic-efficiency is the state of operation when differing business applications are receiving their
respective expected ‘quality of performance’ from the storage array that processes their aggregated
workload. Both ‘Supply-side’ and ‘Demand-side’ considerations are needed to match the two. This
matching is embodied in DSF, a 4-cell matrix that depicts the storage array’s current and future state
of performance.
‘Game in strategic form’: It is the tabular presentation of the possibilities between two variables; it lists ‘pay-offs’ for each possible outcome and is used in the application of Game Theory, a branch of applied mathematics.
Going-wide-before-going-deep: A disk drive is a mechanical device, subject to the mechanical laws of inertia. Hence, any application dataset should be spread over as many disk drives as possible for better performance. However, the number of disk drives in any storage array is finite, so a very large dataset may still have to go deeper on existing disk drives even after spreading across all the available disks. This principle is referred to as ‘Going-wide-before-going-deep’.
Hyper: Storage arrays typically contain large physical disks; currently, disks of up to one terabyte are available commercially. These drives are sliced into smaller pieces before they are assigned to individual host computers, in the process of configuring the storage array for use by multiple hosts. Host computers see these slices as if they were physical disks; these slices are called ‘hypers’.
IO/Sec: An input/output (‘IO’) is a request made by the software program running on the host computer to the storage device. It is characterized by size, typically in kilobytes or megabytes, and by nature: sequential (when consecutive IOs are located adjacent to each other) or random (when each successive IO may be located anywhere on the disk, away from the preceding one). The number of IOs completed in one second is expressed as IO/Sec.
Progress-path: The locus of a request through a computing infrastructure made up of a series of components and sub-components that are traversed (e.g. software programs and processes, data structure abstractions or constructs, caches, firmware, hardware, etc.) until the request is complete.
Quality of Performance: Quality of performance in this article refers to the response time for a transaction or set of transactions in context of the expectation. Performance is a relative concept, judged against the quantity and nature of the workload processed. Good performance does not always mean exceptionally low transactional or batch response time in complete disregard of the quantity and nature of the workload.

Rational Ignorance: A deliberate state of incomplete knowledge, maintained because obtaining the missing piece is too costly when compared to the benefits of possessing such knowledge.
‘Supply-side’ view of storage: It refers to matters related only to the capability, features and functionalities of the storage array. It may include performance information of the storage array when subjected only to artificial test workloads.
Response Time:
It is the processing time spent by the transaction in a sub-system, or the time interval between two
successive requests submitted to a purely sequential processing system or sub-system.
Contemporary information technology is capable of concurrently accepting and processing multiple
requests. Average performance of the system or sub-system benefits as a result; individual
transaction response time may increase.
Characteristics of ‘Demand’

The processing time spent by the transaction in a sub-system is its turn-around time; it can be considered ‘Response Time’. The shorter the response time, the more workload can be presented to the sub-system for processing, and vice-versa.
Imagine yourself working on your notebook. Under-configuration gives a lagging experience where
the system is struggling to keep up, and you may find yourself restlessly waiting to enter more
commands for processing. In effect, you are not able to realize your full potential.
As you enhance the system and it responds more quickly, you are able to present more work for processing in the same amount of time, until you reach the maximum rate of presenting work to the system (determined by the nature of your work and your dexterity, among other factors).
It thus makes sense to enhance the system until its coping rate matches your workload presentation rate: the optimum configuration. At this point, your investment in the system is most efficiently utilized. Any system configuration short of the optimum results in under-utilization of your potential; configuration beyond the optimum reduces the rate of return on your investment.
Figure 1: Characteristics of Demand
The ‘Law of Demand’ is a well-known tool in an economist’s toolkit. It is proposed to be applicable even in computing environments. Shorter transaction processing time (a proxy for price paid) yields an increased ability to present work for processing (a proxy for quantity consumed); see Figure 1.
The Demand-Supply based approach embodies the concept of workload elasticity, where more
workload is available as the sub-system is able to process it in less time and vice-versa. This is
missing in the traditional ‘requirement’ based approach.
Accurately measuring and determining the proxy demand curve may not be practical, easy, or even
feasible. Hence, the concept of rational ignorance (ignoring such an analysis) may be justified from
an individual demand perspective.
For an individual, the cost of analysis may be too high and the advantages few. For an enterprise, the advantages could far outweigh the cost of rational ignorance, namely idle capacity or lost opportunity. However,
the determination of such a proposed proxy demand curve for an enterprise may not be feasible. The
4-Cell matrix framework developed in this article overcomes this difficulty.
‘Supply’ Capability

The storage sub-system appears to perform according to the economist’s ‘law of supply’: the longer the transaction processing time (a proxy for price charged) tolerated by business activity, the more workload can be processed (a proxy for quantity supplied).
Figure 2: Supply Characteristics
Figure-2 sketches a typical storage sub-system performance trend. The curve slopes up gently until point ‘C’, which is often referred to as the knee of the curve (systems personnel also sometimes refer to this as a ‘hockey-stick’ curve due to its shape). Beyond this point, the curve rises steeply towards an asymptote.
The relatively flat and gently rising portion of the curve is the operationally useful part, where the
storage sub-system will typically operate. This typical behavior can be predicted or verified by
modeling storage as discrete systems [3].
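One simple way to reproduce this knee behavior is the textbook single-server queueing approximation R = S / (1 - U), where S is service time and U utilization. This is a deliberately simplified sketch, not the discrete-system model cited in [3], and the 5 ms service time is an assumed value.

```python
# Sketch of the "hockey-stick" response-time curve using the
# classic M/M/1 approximation R = S / (1 - U). Illustrative only;
# real storage sub-systems are more complex than a single queue.

def response_time_ms(utilization: float, service_ms: float = 5.0) -> float:
    """Average response time at a given utilization (0 <= U < 1)."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_ms / (1.0 - utilization)

# The curve stays nearly flat at low utilization, then climbs
# steeply past the knee (roughly 70-80% here):
for u in (0.10, 0.50, 0.70, 0.90, 0.95):
    print(f"U = {u:.0%}: R = {response_time_ms(u):6.1f} ms")
```

Even this crude model shows why operating just below the knee is the useful region: response time doubles between idle and 50% utilization, but grows tenfold by 95%.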
The law of supply approach also recognizes elasticity in supply. If the applications are more tolerant
of response from the storage sub-systems, they can service more requests.
Demand-Supply Framework (DSF)

The ultimate performance experience is achieved when ‘demand’ and ‘supply’ are perfectly matched. Simple as it may appear, the quantitative value of ‘demand’ and ‘supply’ is determined by the complex combination of a large number of variables. A closed-form analytical or empirical model or simulation requires complex mathematics and careful assumptions, often leaving the simulation with far less fidelity than desired.
DSF, a 4-Cell Matrix, is designed to harness the abstraction of the Demand-Supply concept into a more easily understood and easy-to-use format. All the variables are hypothesized to roll up into two camps, ‘Demand-side’ and ‘Supply-side’. These two camps counteract each other, predicting an optimum equilibrium that shifts with dynamic changes caused by known and unknown variables.
DSF provides a quantitative view of how efficiently the storage sub-system meets specific enterprise
workloads. A useful picture emerges once this framework is populated with the measurements from
actual or similar operations, revealing whether the sub-system is over, under, or optimally utilized.
Figure 3: Demand-Supply Superimposed
Figure-3 superimposes the characteristics of demand and supply developed earlier. The region around the intersection of the two curves estimates the zone of efficient operation. DSF provides an empirical approximation to evaluate whether the current storage sub-system is operating efficiently. As illustrated in figure-4, a four-quadrant frame is superimposed upon the graph, with the quadrants labeled ‘A’ through ‘D’. An explanation of each quadrant follows the next paragraph.
The choices for acquiring infrastructure components are strategic in nature. Choices made by
infrastructure planners are contingent upon the aggregate effect of actions of multiple business users,
often acting independently. The development of DSF is in part inspired by the tabular representation
of scenarios as a ‘game in strategic form’ drawn from ‘Game Theory.’ The four discrete quadrants,
illustrated in figure-4, categorize the pay-offs of the four possible combinations of actions. These pay-
offs are illustrated in figure-5 below.
Figure 4: Demand-Supply Framework (DSF)
Quadrant ‘A’ is the ideal zone of operation. The cross-point of demand and supply curves is centered
in this quadrant. From a Theory of Constraints (TOC) perspective, a system operating in this region
has no constraints [4]. In other words, all the sub-systems are well matched and operating at the
same rated capacity. Our ‘Goal’ [4] should be to configure and load a system to operate in this zone;
the other quadrants have a mismatch of demand and supply.
Operation in the ‘B’ quadrant indicates over-utilization where demand overwhelms the ability of the
storage sub-system to service the workload. Operation in Quadrant ‘C’ is of concern since the storage
sub-system struggles to service the workload even at low demand. Quadrant ‘D’ is a clear case of
under-utilization of the storage sub-system. The infrastructure owner is not working invested dollars
hard enough, sometimes intentionally, e.g. when an SLA is strict and ambitious.
Improving Performance Capability

Progressing in our journey towards understanding system performance, identifying deficiencies and choosing appropriate remedies, DSF is developed and used in more detail.
From a financial perspective, quadrant ‘A’ operation indicates that the dollars invested are being used well. Quadrants ‘C’ and ‘D’ imply a lower return on investment. Quadrant ‘B’ is pushing the return on investment to its limit; end-user quality of performance may suffer, and the system would not be able to absorb a spike in the operation.
Figure 5: DSF Pay-Off Tabular Representation
How are the quadrants drawn? Referring to figure 5, the horizontal middle line of the 4-Cell matrix is proposed, based on field experience, at a fixed level of twenty milliseconds. The vertical middle line divides the workload into two groups of eighty and twenty percent of the total workload, respectively. It may be chosen at different values for different datasets or specific scenarios. These numerical figures are guidelines; use the values of these variables best suited to your scenario.
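The guideline values above can be sketched as a simple classifier. This is a minimal illustration only: the 20 ms line follows the text, while placing the workload line at the 80th percentile of observed samples is one possible reading of the eighty-twenty split and should be adjusted for your own dataset.

```python
# Minimal DSF quadrant classifier for observed operating points.
# Thresholds are the guideline values: 20 ms response-time line,
# workload line at the 80th percentile of samples (an assumption).

def percentile(values, pct):
    """Nearest-rank percentile of a list of numbers."""
    s = sorted(values)
    idx = int(round(pct / 100.0 * (len(s) - 1)))
    return s[idx]

def dsf_quadrant(workload, response_ms, workload_line, rt_line_ms=20.0):
    high_demand = workload >= workload_line
    good_response = response_ms <= rt_line_ms
    if high_demand and good_response:
        return "A"   # high workload, healthy response: ideal
    if high_demand:
        return "B"   # demand overwhelms supply: over-utilized
    if not good_response:
        return "C"   # poor response even at low demand: investigate
    return "D"       # low workload, healthy response: under-utilized

# Invented (workload in IO/sec, response time in ms) samples:
samples = [(1200, 8), (9500, 12), (9800, 35), (900, 42)]
line = percentile([w for w, _ in samples], 80)
for w, rt in samples:
    print(w, rt, dsf_quadrant(w, rt, line))
```

Classifying each measurement interval this way, rather than a single grand average, also shows how often the system visits each quadrant.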
DSF is based on the observed pattern for a variety of application environments against the desired
outcomes. Specifically, a desired outcome is the customer perception of ‘quality of performance’,
supported by data. It is a proposed measuring device to categorize and evaluate a computing
environment so the appropriate remedial response can be implemented.
The horizontal-axis is the independent variable; workload is modeled as ‘demand’. The proxy for the
storage world may vary depending upon the environment being studied. A common independent
variable is data ‘throughput’ to the storage system measured in terms of Inputs and Outputs per
Second (IO/Sec), or Megabytes per Second (MB/Sec). The Percentage Utilization (%Utilization) of
the storage sub-system or its components is another example.
Other variables may be used as a proxy for the ‘demand’ workload. For example, in an Exchange® messaging environment you might use performance counters such as ‘RPC operations/sec’. This counter is a direct measure of the aggregate workload handed to the Exchange® servers by Office Outlook® clients in MAPI® mode. This workload drives the Exchange® Information Store service on the Exchange® Server, which accesses storage to fulfill client requests.
The vertical-axis almost always represents the transaction processing time or device response time in
milliseconds. This can either be for a batch or for individual transactions depending upon the study.
The choice of variables often depends on the availability of data-gathering tools and collectable data. The key consideration is the analytical
establishment of a definite relationship between the independent variable represented on the
horizontal-axis, and the dependent variable on the vertical-axis to ensure a sufficient-cause
relationship [4]. There may not always be a one-to-one relationship between these two. Other
variable(s) may be involved and have significant influence; these must be kept constant if the
environment permits their control. At the least, they must be measured and recorded so a reasonable
estimate of impact can be made to arrive at an objective conclusion.
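As a minimal screen for such a relationship, a Pearson correlation between the candidate independent and dependent variables can flag the absence of a linear association before the pair is relied upon in the DSF. Correlation cannot prove a sufficient-cause relationship, only support or undermine it; the sample figures below are invented.

```python
# Quick screen: does the chosen independent variable (workload)
# actually move with the dependent variable (response time)?
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Invented measurement intervals: IO/sec vs. response time in ms.
iops = [800, 1500, 3200, 6400, 9000]
rt_ms = [4.1, 4.8, 6.0, 11.5, 28.0]
print(f"correlation(workload, response time) = {pearson(iops, rt_ms):.2f}")
```

A weak or negative coefficient would suggest that another, uncontrolled variable dominates the response time, and a different proxy should be sought.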
Selecting proxy variables is both science and art. It is a science as it draws upon the knowledge of a
cross-section of technologies operating at various layers that work together, namely, software
application, operating system platform, Ethernet network, storage network and the storage array
layers. The best way to arrive at high fidelity proxy variables is to engage subject-matter experts
whose experience can populate technology mechanisms with values or quantities through
observation of infrastructures. This knowledge of what works and what doesn’t is invaluable in
deciding on the proxy variable that can provide a true picture of the system under scrutiny.
Once the measured variables for the system are plotted on the DSF, the current-state of the system
(Quadrant-A, B, C or D) becomes known. The enterprise can then devise a strategy for getting to the
recommended Quadrant ‘A’.
Figure 6: Building Change Strategy using DSF
Quadrant ‘A’ represents a zone of operation where the enterprise is efficiently using its invested IT
dollars. They only need to monitor for business growth and increasing transactions, which will almost
certainly increase system workloads. A high reading for the independent variable represented on the
horizontal-axis (e.g. high utilization) is not in itself a sufficient cause for alarm as long as the
dependent variable, the outcome (e.g. transaction processing time), is within reasonable limits as in
Figure-4.
A system operating in quadrant ‘B’ is under stress. Demand for service is overwhelming the ability to
perform. An enterprise must plan to move along the blue (dotted) arrow leading to quadrant ‘A’. There
are many possible options to make that change. An enterprise may work on the demand or the supply
side depending on the circumstances. Reduction or diversion of the workload is a possible demand-
side tactic. There are many supply-side options, ranging from the purchase of new capacity to using
cache technology to improve the system’s response. For instance, an outright replacement or
capacity upgrade may not be plausible in the current financial climate. Hence ‘caching’ technology
can be deployed at various interfaces along the progress-path of the transaction from application to
the storage. At the computer server layer, this has become a very feasible option. As memory prices
plunge, hosts increasingly have large physical and virtual memories. Progressing to the application
layer, caching may be enabled across the network in the application based upon the client-server
architecture, currently a dominant architecture in IT landscapes.
Running Outlook® MAPI client in cached mode is a well known example supporting this suggestion.
This results in asynchronous system operation that insulates user actions and experiences from the
latencies in the overall system [5]. At the storage level, there is a large multi-player industry devoted
to designing, developing and marketing Intelligent Cached Disk Arrays (ICDA). These were
practically non-existent fifteen years ago outside the Mainframe host computer domain. These
devices enhance storage performance considerably by exploiting fast, electronic memory combined
with a bank of slower mechanical disk devices.
A system operating in Quadrant ‘C’ is a cause for concern. Even in the absence of considerable workload quantity, the quality outcome (the dependent variable) is poor. There is a serious gap in the design or configuration of the system that needs troubleshooting for the system to progress along the blue (dotted) arrow to quadrant ‘A’. Given system complexity, the root cause may lie in any of many components. A typical dilemma at the storage layer is caused by the disk drive technology trend of progressively falling ‘Access Density’ [3]. The tendency is to purchase a
number of drives based upon the useable storage capacity desired by the enterprise. As the low
capacity drives (i.e. 36 gigabytes (GB), 73 GB and soon 146 GB sized disk drives) disappear from the
supplier’s price lists due to technological obsolescence, enterprises purchase fewer disk drives for
more GB. This dilemma is exacerbated by the decrease in IT budgets. Spreading the data by using
knowledge of the workload and the access pattern can help in such circumstances. However, the
dynamic nature of the access pattern in any enterprise raises the challenge and reduces the potential
to benefit from such intelligent but static one-time data spreading.
Here is a word of caution about the performance data generated by tools and programs in the various
computing environments including Mainframe host computers. At a low workload, high response
time is often seen purely due to statistical aberration. Statistical averaging is used extensively in the
reported data for typical tools and programs; this averaging over lower workload can and does result
in higher averages when compared to higher workloads. Because of this effect, a system falling in
Quadrant 'C' must be scrutinized closely before settling on a strategy to move it to quadrant 'A'.
Using a 'weighted average' in place of a pure 'average' is one way to address this distortion.
Quadrant 'D' represents an under-loaded sub-system. From a financial perspective, the enterprise is
not utilizing its investment fully. However, an enterprise may have a policy to restrict the level of system utilization
to maintain the quality and level of service. If this policy does not exist, the enterprise can devise a
strategy to move along the blue arrow to quadrant ‘A’. Once again, changes can be on the demand or
the supply-side. On the demand-side, the enterprise may increase its service levels to the end-users
or divert more workload towards the sub-system by inviting other business units or external
parties. On the supply-side, the enterprise may scale down the sub-system to recoup or free its
investments.
Example Case Scenarios
Case Scenario 1: UNIX® Transactional Workload

These application environments are characterized by read-predominant random access of
information. The end-user experience is driven by the response time of individual transactions.
Consequently, the instruction-processing components of the transaction progress-path are stressed
more than the capability of the information transmission buses. The performance measurement in such
environments is IO/Sec as the independent variable, and the Total Device Response Time in
milliseconds as the dependent variable.
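The quadrant assignment used throughout this article can be expressed as a small rule. The axis conventions below follow the quadrant descriptions given earlier (A efficient, B overloaded, C troubled, D under-loaded); the function name and the example thresholds are illustrative assumptions, since the IO/Sec cutoff is workload-specific:

```python
def dsf_quadrant(io_per_sec, resp_ms, io_threshold, resp_threshold_ms=20.0):
    """Classify one device sample into a DSF quadrant.

    Conventions assumed from the article's quadrant descriptions:
      A: high workload, acceptable response (efficient operation)
      B: high workload, poor response       (overloaded)
      C: low workload,  poor response       (design/configuration gap)
      D: low workload,  acceptable response (under-loaded)
    The 20 ms ceiling follows the article's recommended measurement;
    the IO/Sec threshold must be chosen per environment.
    """
    high_load = io_per_sec >= io_threshold
    poor_resp = resp_ms > resp_threshold_ms
    if high_load:
        return "B" if poor_resp else "A"
    return "C" if poor_resp else "D"

print(dsf_quadrant(500, 8.0, io_threshold=200))   # busy and fast  -> A
print(dsf_quadrant(50, 35.0, io_threshold=200))   # quiet but slow -> C
```

Applied to every device sample in a collection interval, this rule reproduces the scatter-plot quadrant reading programmatically.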
a. ERP based 'Vendor Management' application:

The performance data from a UNIX® server running an ERP-based Vendor Management application is plotted in DSF format in Figure-7.
Figure 7: DSF for ERP based Vendor Management Application
Each ‘▲’ denotes the throughput IO/Sec versus the total device response time for each physical
device as seen by the server during the chosen interval. These devices viewed as physical disks by
the computer host are actually hosted as ‘hypers’ on the disk drives in a high-end storage array. The
measured IO/Sec is the average of the ten minute intervals selected during the peak period. This
picture highlights the distribution of workload among the various devices available to the application.
The nature of the workload and the spread of the application information are such that during the
selected interval, close to ninety percent of the workload is utilizing only four devices. This is a typical
profile for open systems host computers. These devices are still able to respond to the requests well
below the recommended measurement of twenty milliseconds. The current application end-user
experience in this enterprise was rated satisfactory.
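The observation that roughly ninety percent of the workload lands on only four devices can be checked mechanically from per-device IO rates. The helper below and its data are hypothetical, sketching one way to quantify workload concentration:

```python
def devices_for_share(io_by_device, share=0.9):
    """Return how many of the busiest devices carry `share` of total IO."""
    rates = sorted(io_by_device.values(), reverse=True)
    total = sum(rates)
    running, count = 0.0, 0
    for r in rates:
        running += r
        count += 1
        if running >= share * total:
            return count
    return count

# Hypothetical per-device IO/Sec figures for one peak interval:
io_by_device = {"d1": 900, "d2": 700, "d3": 500, "d4": 420,
                "d5": 80, "d6": 60, "d7": 40, "d8": 20}
print(devices_for_share(io_by_device))  # 4 of 8 devices carry 90% of the IO
```

Tracking this count over successive intervals shows whether the concentration profile is stable or drifting, which feeds directly into the 'going-wide-before-going-deep' discussion that follows.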
In spite of the current satisfactory performance, a DSF approach can offer useful insights. It
underscores the importance of spreading the information, 'going-wide-before-going-deep', across the
available devices.
In the current scenario, there are still application information devices with information tables (refer to
figure-7, label 'a') that are living in DSF quadrant 'C'. Not much of the workload traffic is finding its
way to these devices. As a result, higher response times are not reflected in the overall end-user
experience. Based on the nature of information and the transactions, business activity may shift,
channeling increasing amounts of workload traffic on these devices. In such a scenario, the end-user
experience will deteriorate, adversely affecting any formal and informal inter-department service-level
understanding. Capitalizing on the predictive nature of DSF requires collaboration among the business,
application and infrastructure teams.
Additionally, the internal devices of the computer host housing the UNIX operating system are also in
quadrant ‘C’ (refer to figure-7, label ‘b’). This is not entirely unexpected as modern virtual memory
management operating systems access the operating system hosting drives extensively. Any
technique to improve the access response, e.g. using more drives in mirror protection mode, would
provide performance benefits.
Conclusion:

The Vendor Management application is experiencing good storage performance. The enterprise is
getting good returns for dollars spent on the infrastructure with room to scale. However, there are
caveats regarding the nature of the workload stressing the drives that are already exhibiting high device
response time; operating system activity could become a bottleneck.
b. RDBMS based Telecom 'Value-Added Services' application:

'Value-added services' application processing is dominated by transactional workload. Representative
performance data from the UNIX® based server running this application is plotted in figure-8. This
figure has two panes to show the performance picture ‘before’ and ‘after’ the introduction of a
technology upgrade to a mid-range storage array. This case scenario demonstrates the comparative
way to use DSF.
This plot depicts the impact of the technology upgrade only, as only this variable was changed in the
computing infrastructure. This data can be used to baseline the environment for a sensitivity study,
isolating the benefit resulting from a change in a single variable, e.g. new technology, and facilitating
efficient sizing of the environment. Typical examples of the variables that could be changed for the
purpose of a sizing study include disk drive organization for the Logical Units (LUN), file-system
organization at the server, operating system configuration, version or patch level changes, application
technical configuration, business activity changes resulting in quantity and quality of the workload
etc., in addition to any hardware change in the system. More than one variable usually changes
during a real-life infrastructure transformation. The analysis below has
considerable value in helping us to understand the demand and supply view for an enterprise
facilitating economically efficient choices.
Figure 8: DSF for RDBMS based Telecom ’Value-Added Services’ Application
Refer to figure-8 above. The bottom pane depicts the DSF picture of the host computer before the
storage technology upgrade. The groups of storage devices marked ‘a’ carrying the bulk of the
workload are shown to be bordering the boundary of the quadrants ‘B’ and ‘C’. As described earlier,
this is undesirable. A slight upward swing in the workload could drift into ‘B’ and ‘C’ quadrants
providing higher total device response time and degrading application performance.
The DSF in the top pane depicts the situation after the technology upgrade is complete. Observe the
same group of devices as seen earlier, now marked 'b'. The improved outcome is apparent. Not only
has the total device response time of the busy volumes come down; these volumes are actually processing
more workload concurrently. As there were no application, data layout, or distribution changes, this
concurrency always existed in the workload but was previously suppressed. It is now leveraged by
the newer technology at the storage array level in the ‘after’ scenario. These devices are operating
well inside quadrant ‘A’, which may be the desired scenario.
In addition to the above observation, a large number of devices can be seen in quadrant 'D'. Their
current state is 'under-loaded', as in figure-6. If it is possible to introduce changes at the operating
system or application levels to restructure the data layout, it may be possible to utilize the capacity of
these devices to service more workload. The devices lying in quadrant ‘C’ may be studied for the
cause of their unacceptable performance and remedial strategies implemented. However, typical
devices in this quadrant are those that host the operating system swap space. The remedy may be to
use as many drives in the mirrored mode as possible.
Conclusion:

It can be inferred and confirmed that the system depicted in figure-8 has achieved a service
workload capacity boost, justifying the dollars spent on the upgrade. The devices that take the bulk of
the workload from the database are less stressed after the storage upgrade. Hence, the system is
able to accommodate more transactions that may be the result of increased business activity due to
additional users or the addition of value-added services. All of this can be achieved without
deteriorating the existing, committed service levels.
Case Scenario 2: Windows® Messaging Workload
'Messaging' application:

Messaging applications are typically implemented as two-tier client-server architectures. The local
area network links the two distinct tiers, ‘client’ and ‘email server’. This network is a key component in
determining the performance experience of the Messaging system. In this article, we focus on a
dedicated email server located in the server-tier of the setup. This server is attached to a storage
array hosting the message store of the Messaging system. We observe the performance of the
storage array as it services the workload demand created by email traffic during a typical workday,
from 8.00am to 8.00pm. See figure-9.
The proxy indicator for the level of user activity that the mail server can successfully process is the
number of RPC Operations completed by the mail server every second. For Case Scenario 2, this is
the driver or the independent variable that generates the workload on the storage array, i.e. proposed
proxy for demand. As usual, the dependent variable is the device response time of the drive hosting
the message store. The workload on the storage array is a balanced mix of read and write. The size
of the transaction is usually small, approximately four kilobytes.
The four-quadrant DSF framework is superimposed on the mail server performance chart in figure-9.
However, there are differences in the way the DSF has been plotted in case scenario 2. In case
scenario 1, each data point on the chart corresponded to the device response time of a physical
device accessed by the host computer from the storage array. In case scenario 2, each data point
corresponds to a ten-minute average for the drive hosting the message store, providing a temporal view. This logical drive at the operating system level is carved out of multiple slices taken from many
physical disks housed in the storage array.
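The temporal bucketing used in this scenario can be sketched as follows; the sample data and the minute-of-day encoding are illustrative assumptions:

```python
from collections import defaultdict

def ten_minute_averages(samples):
    """Average response time per ten-minute bucket.

    `samples` is a list of (minute_of_day, response_ms) pairs; each
    resulting bucket becomes one data point on the DSF, giving the
    temporal view used in this scenario.
    """
    buckets = defaultdict(list)
    for minute, resp in samples:
        buckets[minute // 10].append(resp)
    return {b * 10: sum(v) / len(v) for b, v in sorted(buckets.items())}

# Hypothetical samples between 8.00am (minute 480) and 8.15am:
samples = [(480, 4.0), (484, 6.0), (491, 10.0), (495, 14.0)]
print(ten_minute_averages(samples))  # {480: 5.0, 490: 12.0}
```

Pairing each bucketed response-time average with the matching RPC Operations/Sec average yields the (demand, supply) points plotted in figure-9.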
Figure 9: Healthy scenario DSF for Messaging Application
People begin to access their email accounts concurrently as the workday begins. This translates into
slowly rising workload on the array, illustrated by the grey line marked '(i) 8.00am – 10.00am' in
figure-9. The storage array meets the increasing demand with an almost linearly increasing response
time that reaches about twelve milliseconds. This is represented by the data points with symbol 'Δ'. From the
DSF perspective, the system moves from the low usage quadrant ‘D’ to the edge of the efficient
operation quadrant ‘A.’
As the workday reaches full swing, random concurrent access by multiple email users fills up the
queues in the various components of the email system including storage and its sub-systems. This is
illustrated by the dark grey ellipse marked by ‘(ii) 10.00am – 12.00pm’, enclosing the data points
depicted by the symbol '■'. The DSF state of the system hovers around the boundary of quadrants 'A'
and 'D'. However, the device response time never rises above seventeen milliseconds.
After the lunch hour, interactions mature and the randomness in email access moderates; even
though throughput is generally higher, the queues are relatively relieved. As a result, device
response time reduces. This is illustrated by the dashed light grey ellipse marked by ‘(iii) 12.00pm –
6.00pm’, enclosing the data points depicted by the symbol ‘♦’. This time the DSF state of the system
hovers largely in quadrant ‘D’, bordering on efficient use of the storage system. The maximum device
response time stays close to fifteen milliseconds.
As the evening advances, email users logout from their email accounts, resulting in a progressive
workload decline. The storage array processes this reduced workload with lower device response
time. This is illustrated by the dark arrow marked by ‘(iv) 6.00pm – 8.00pm’, enclosing the data points
depicted by the symbol '○'. The DSF state of the system slips into quadrant 'A'.
Conclusion:

System transitions from one state to another during the course of the same day reveal their dynamic
footprint on the storage infrastructure. DSF is capable of capturing and displaying a large quantity of
time-variant data in a meaningful way. The system depicted here stays within the confines of
quadrants ‘A’ and ‘D’ throughout the day. This is a healthy state picture of performance. In this
scenario, figure-9 depicts the pattern of the busiest days for the enterprise, hence this email server
can be safely considered to host additional users’ mail accounts. The recommended strategy to
increase ROI is to add users in a phased approach, punctuated by taking DSF snapshots.
Figure-9 illustrates a healthy scenario, whereas figure-10 shows an undesirable situation for
comparison. The period labeled ‘e’, is likely to provide email users with slow and unacceptable
response experiences. To resolve this issue, the system manager may transfer some mailboxes to
reduce the workload on this system, or upgrade the system hardware or storage.
Figure 10: Unhealthy scenario DSF for the Messaging Application
Case Scenario 3: Mainframe Mixed Workload

The combined effect of falling access density and shrinking budgets has pushed infrastructure
planners to choose storage sub-systems with fewer, larger drives. This trend causes anxiety, as they
are aware that storage performance is proportional to the number of drives. Although the
performance of the individual physical disk has lagged the growth of its capacity, causing its access
density to fall, the electronics and firmware in the storage arrays that mediate access to these newer,
larger drives have taken enormous strides in supporting improved performance. Even though improved performance is
expected, data is necessary to support such choices. DSF is the suitable choice for processing and
presenting data for these before-after comparisons. It provides planners and their sponsors with
sufficient proof to support their choices. It will even point to changes that may have taken place in the
IT storage environment that they may not be aware of, as illustrated in this case scenario.
Mainframe based 'Customer Promotion' application in the Telecom Industry:

In this case, DSF is used to validate the improvement before and after a storage array upgrade in a
predominantly read workload environment. As many variables as possible remained unchanged in the
mainframe environment. The study data pertains to the same business day of the week and the peak
business time intervals of the day. Application and volser configuration were frozen. As the number of
disks in the new array was reduced, the volsers were re-laid out on the reduced number of disk
drives. This was done on the basis of the read-miss workload for each volser in the historic
performance data. Volsers with high read-misses were spread across as many different physical disks
as possible. Between the before and after scenarios, the storage array generation and the layout of
the volsers were the only variables expected to change.
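One way to sketch this re-layout step, spreading the volsers with the highest read-miss counts across different spindles, is a greedy round-robin over volsers ranked by read-misses. The volser names and counts below are hypothetical:

```python
def spread_volsers(read_miss_by_volser, num_disks):
    """Assign volsers to physical disks so the highest read-miss volsers
    land on different spindles (greedy round-robin by descending misses)."""
    ranked = sorted(read_miss_by_volser, key=read_miss_by_volser.get, reverse=True)
    layout = {d: [] for d in range(num_disks)}
    for i, volser in enumerate(ranked):
        layout[i % num_disks].append(volser)
    return layout

# Hypothetical read-miss counts from historic performance data:
misses = {"VOL001": 5000, "VOL002": 4200, "VOL003": 300,
          "VOL004": 250, "VOL005": 3900, "VOL006": 100}
print(spread_volsers(misses, num_disks=3))
# the three hottest volsers each land on a different disk
```

Because read-misses are the IOs that actually reach the spindles, separating the high-miss volsers minimizes contention on any single drive, which is the intent described above.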
Figure 11: DSF for Mainframe Mixed Application
The performance data taken from the RMF output was plotted in the DSF format shown in figure-11.
Each data point is a representative ten-minute average of the total device response time for the
production volsers during the peak business activity interval. The symbol '○' represents the before scenario and
the symbol '▲' represents the after scenario. From the DSF perspective, it may be observed
that the 'Customer Promotion System', both before and after, was operating well within the
recommended quadrants with capacity to accommodate growth. However, the improvement is also
clearly evident: 90% of the workload lies in a much tighter rectangle, further inside the 'A'
quadrant, indicating the ability to scale in response to increasing workload. This is a testimony to the
technological improvement in the hardware, firmware and their integration in the newer generation of
arrays. Other considerations, e.g. technological obsolescence and reduced reliability of the older
equipment, may be the driving factors in initiating such an upgrade. The end-user application
performance experience is still a key criterion: we have to prove that it was unchanged or improved
after the change was complete.
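The 'tighter rectangle' claim can be quantified with a rough per-axis percentile box; this helper and its before/after samples are illustrative assumptions, not the actual RMF data:

```python
def tightness_box(points, pct=0.9):
    """Upper-right corner of the rectangle (anchored at the origin) that
    contains `pct` of the points on each axis — a rough per-axis proxy
    for how tightly a DSF cloud is packed."""
    xs = sorted(p[0] for p in points)
    ys = sorted(p[1] for p in points)
    k = max(0, int(pct * len(points)) - 1)
    return xs[k], ys[k]

# Hypothetical (IO/Sec, response_ms) samples:
before = [(100, 18), (120, 22), (90, 25), (150, 30), (80, 12),
          (110, 20), (95, 28), (130, 24), (85, 16), (140, 26)]
after  = [(160, 8), (170, 9), (150, 7), (180, 10), (155, 8),
          (165, 9), (175, 11), (190, 12), (158, 7), (168, 10)]

print(tightness_box(before))  # (140, 28): 90% of points under ~28 ms
print(tightness_box(after))   # (180, 11): more IO at far lower response
```

A shrinking response-time edge combined with a growing throughput edge is exactly the before/after signature described for figure-11.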
Conclusion:

It is evident from the DSF plot shown in figure-11 that the newer system, in spite of the reduced
number of larger drives, demonstrated increased capacity for potential growth while preserving the
current user-experience. This data supports an upgrade. These are dollars well spent on the
infrastructure to support business growth. The storage can accommodate added use by end-users
without adversely impacting their experience.
Recommendations in conclusion

Real-life infrastructures are complex. As stated in the abstract of this article, 'IT performance of
Business transactions is the cumulative outcome of the performance of a number of individual
components, including storage.’ Traditional performance evaluation models and methodologies are
challenged to provide a cumulative picture of performance, given the complexity of contemporary
infrastructures. Consequently, they require a large number of variables, some of which may be
unknown and unanticipated, to reasonably describe performance.
With the DSF approach, as long as the primary proxy variables representing the Demand and Supply
are correctly identified in the system, observations and predictions are an empirical representation of
the current or changed state of the system. Large numbers of variables related to e.g. technology,
configuration and business factors, even the unknown ones, roll-up either to the Demand-side or the
Supply-side. They provide a practical approach for a scenario too complex to analyze using
simulation or closed-form models or representation [3].
Figure 12: Recommended Processes using DSF

Figure-12 describes a process that may be conducted ex post or ex ante. First, describe the change
and quantify it. Second, evaluate the impact of planned or unplanned changes to your infrastructure,
including an analysis of whether the change(s) will affect the Demand-side, the Supply-side or both.
Third, perform a cost-benefit analysis. Finally, close the process with an accept or reject verdict.
Adopt this model as the default to operate at or near an economically efficient infrastructure.
The supply-demand framework discussion resonates well with senior managers and business personnel in
customer organizations. This is important and helpful as they decide whether and when to purchase.
Assumptions, their impact and remedy
Controlled Change:
An important assumption in plotting DSF is that every variable other than the two primary proxy
variables is unchanged between the 'before' and 'after' data collection periods. The real-world customer
environment is dynamic in all the layers and sub-layers of the computing infrastructure facilities; this
assumption may only rarely be valid. The impact is that the DSF picture would show the effect of
unknown or unplanned changes.
The strength of the proposed DSF approach is the ability to capture complex changes in a real-world
infrastructure. The inability to effect a controlled change can be addressed by logical cause-effect
analysis: identify and isolate the cause of the changes captured in the DSF plots, and explain them
using knowledge of the configuration and deployed technology.
Storage-centric view:
The case scenarios presented in this article assume that the storage array is dedicated to the
respective application and the host computer under scrutiny. DSF then produces a complete representation of
the entire demand and supply spectrum for that storage array by looking at the single host computer.
However, in real-world data centre infrastructures, storage arrays are deployed to serve aggregated
workloads from multiple applications residing on multiple host computers. The impact is that looking
at only one host computer will produce a partial picture on the DSF, as the demand placed on the
same array by the other host computers will be invisible and hence missing from the DSF plot.
The entire demand and supply spectrum of a storage array must be taken into account for a
representative, reliable, and complete picture. DSF can be viewed as a storage-centric performance
plotting approach that is suited to real-world complex shared service centers for processing data.
Limitations and improvements
Ex Post nature:
The DSF approach may appear to be ex post: DSF data can be plotted only after data for the
'after' scenario is available, that is, after the implementation of the proposed changes. Consequently,
if roll-back is desirable, it may not be practical or feasible to accomplish.
This conception of DSF has its genesis in the complexity of our customers’ computing environment.
Modeling or simulating these environments suffers from the lack of fidelity due to assumptions that
may be totally or partly invalid. Once the baseline is established, a completed DSF can be used for
predictive and validating purposes, as amply demonstrated by the case scenarios presented in this
article. There is always an existing computing platform before it is improved by scaling up or down. The
combined effect of the 'before' DSF and knowledge of the current infrastructure and technologies, existing
and proposed, unlocks the full benefits offered by DSF used in an ex ante mode.
Estimation:
The reduction of a complex system into a two dimensional empirical representation embodied in the
DSF is a large-scale estimation. We do not really know how much of the variation in the dependent
primary proxy variable (total device response time) is due to the chosen independent primary proxy
variable (IO/Sec, RPC Operations/Sec, etc.) in each of the case scenarios presented. The only way
to find out with certainty is to perform statistical multiple linear regression analyses on a large
amount of data for a large number of variables. This should be followed by calculating confidence-building
indicators such as the correlation coefficient (R²), to reveal how much of the variation in the model's
dependent variable is accounted for by the variation in the independent variables in the model. We
could also compute a t-test to establish the statistical significance of a variable in the model. However, the
potential for 'paralysis by analysis' is substantial.
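For the simplest case of a single independent variable, R² can be computed directly from the sums of squares of a least-squares fit; the sample IO/Sec and response-time figures below are hypothetical:

```python
def r_squared(xs, ys):
    """Coefficient of determination for a least-squares line y = a + b*x,
    i.e. the share of the variance in y explained by x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)

# Hypothetical paired samples of the two primary proxy variables:
io_per_sec = [100, 200, 300, 400, 500]
resp_ms    = [5.0, 6.1, 7.0, 8.2, 9.0]
print(round(r_squared(io_per_sec, resp_ms), 3))  # 0.997
```

An R² this close to 1 would suggest the chosen demand proxy explains almost all of the variation in response time; a low R² would signal that other variables dominate and a different proxy (or a multiple regression) is needed.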
DSF is offered as a pragmatic, workable approach. The limitation of having only two variables can be overcome
by plotting multiple DSF charts with different independent variables to discover which one impacts the
dependent variable the most. Planners have the flexibility to plot the two primary proxy variables for a
single storage device over time (case scenario 2), or to plot them for the same time interval across
different devices (case scenarios 1 and 3).
Automation:
Plotting DSF can be mechanized to eliminate the inherent tedium, especially because it needs to be done
repeatedly. A Graphic User Interface (GUI) driven application could deliver the benefit with less
effort, permitting planners to build a case-specific DSF instance populated with field-captured
performance data. These cases can be stored for a comparative analysis to make strategic
choices.
----------END ----------
Author Biography
Lalit Mohan is a Senior Solutions Architect at EMC Corporation, where he
has been working since 1999. He graduated as an Engineer in 1985 from
the University of Delhi. Prior to EMC, he worked in the Automotive and the IT
industry in South Asia with ‘Engineering Automation’ and ‘Enterprise
Resource Planning’ software applications. During his current assignment at
EMC, he has performed different roles, namely: Systems Engineer, Systems
Engineering Manager, Project Manager and Solutions Architect.
He is a Volunteer with the Singapore Chapter of Project Management Institute (SPMI), contributing by
speaking, publishing articles and participating in social and charity events. These events are attended
by project managers from a cross-section of industries.
Lalit holds prestigious certifications including: Project Management Institute (PMI) - PMP (Project
Management Professional); The Association of Operations Management (APICS) - CIRM (Certified in
Integrated Resource Management) and CPIM (Certified in Production and Inventory Management);
and ITIL Foundation level certification. Additionally, he holds technical certifications from Microsoft®
(MCSE – Windows2000 stream) and EMC Proven Professional Expert level certification in Business
Continuity - EMCTAe. He has been an EMC Symmetrix® Performance Engineering Evaluation Database
(SPEED) 'guru' community member since 2000. He has also qualified for the Cost Accounting
intermediate level certificate from the ICWAI (Institute of Cost & Works Accountants of India).