Data Storage Performance – Equating Demand and Supply
Lalit Mohan
EMC Proven Professional Knowledge Sharing 2009
PMP, ITILv3 (Found.), CIRM, CPIM, EMCTAe (BC), MCSE (W2K), ICWAI (Inter.), B.Eng. (Hon.)
Senior Solutions Architect
[email protected]
Computer Systems (South Asia) Pte Ltd
2009 EMC Proven Professional Knowledge Sharing 2
Table of Contents
Executive Summary ......... 4
Abstract ......... 5
Introduction ......... 6
Essential Terminology ......... 8
Characteristics of ‘Demand’ ......... 10
‘Supply’ Capability ......... 11
Demand-Supply Framework (DSF) ......... 12
Improving Performance Capability ......... 14
Example Case Scenarios ......... 18
    Case Scenario 1: UNIX® Transactional Workload ......... 18
    Case Scenario 2: Windows® Messaging Workload ......... 21
    Case Scenario 3: Mainframe Mixed Workload ......... 24
Recommendations in Conclusion ......... 25
Assumptions, Impact and Remedy ......... 26
Limitations with Improvements ......... 27
Author Biography ......... 29
Disclaimer: The views, processes or methodologies published in this compilation are those of the authors. They do not necessarily reflect EMC Corporation’s views, processes, or methodologies.
List of Figures
Figure 1: Characteristics of Demand ......... 10
Figure 2: Supply Characteristics ......... 11
Figure 3: Demand-Supply Superimposed ......... 12
Figure 4: Demand-Supply Framework (DSF) ......... 13
Figure 5: DSF Pay-Off Tabular Representation ......... 14
Figure 6: Building Change Strategy using DSF ......... 16
Figure 7: DSF for ERP based Vendor Management Application ......... 18
Figure 8: DSF for RDBMS based Telecom ‘Value-Added Services’ Application ......... 20
Figure 9: Healthy scenario DSF for Messaging Application ......... 22
Figure 10: Unhealthy scenario DSF for the Messaging Application ......... 23
Figure 11: DSF for Mainframe Mixed Application ......... 24
Figure 12: Recommended Processes using DSF ......... 26
Executive Summary

Based on the scarcity principle used in economics, we cannot have more of everything; we must make choices. The current climate of austerity is forcing businesses to reduce cost. Caring businesses are choosing to aggressively cut non-people costs first. Achieving the same or even more performance from existing or scaled-down IT infrastructures, including storage, supports this effort by freeing up capacity to bear people cost.
Making this happen requires intense customer engagement. This is a guiding principle proposed in
2009 [1] for scoping, planning and implementing the customer’s strategic transformation and
acquisition initiatives. In this article, “demand-side” is the workload that business activity places on
the IT storage infrastructure; “supply-side” represents the infrastructure’s capability to meet the
former. Procurement, which should include the effort to match the two, is unintentionally restricted to
technical and commercial supply-side aspects, ignoring the demand-side. This approach reduces the
chance of achieving an ‘economically efficient’ outcome.
This article introduces the Demand-Supply Framework (DSF) theory, a 4-cell-matrix that enables
businesses to effectively measure the performance of their current Storage infrastructure in context of
their business activity. This enables informed choices of strategic initiatives with ensuing benefits, e.g.
saving money, improving efficiencies and returns on investment, reducing risk etc. [2].
Shared service centers, virtual or physical, within or, where possible, across organizations, internalize positive externalities that are otherwise lost in duplicated facilities. Losing them leads to under-production of demand-servicing capability. DSF visually illustrates the potential of serving more business
demand with the same or less infrastructure.
"There is nothing so practical as a good theory."
--Kurt Lewin
Abstract

Information Technology performance is the cumulative outcome of a number of individual
components, including storage. When the duration of storage processing is proportionately long,
visualize the workload generated by end-user computer systems as ‘Demand’, and the processing
service provided by the data storage system as ‘Supply’. How well supply meets demand defines the
‘quality of performance’ experienced by the business.
Matching projected demand with capability to supply drives the selection and design of data storage
components. Such a solution would operate at an optimum level, with demand equaling supply. In
this article, we apply the demand-supply analogy to building a 4-Cell matrix using appropriate data
storage domain performance characteristics as proxies to represent demand and supply.
This article will develop this 4-Cell matrix to capture and represent the performance of data storage
solutions with reference to several information technology infrastructure solutions, for example,
‘Messaging’ and ‘Enterprise Resource Planning’ applications in an open systems environment and
mainframe host applications in its proprietary environment.
We will review:
• Relevant terms and definitions
• Characteristics of ‘Demand’ placed on data storage components
• ‘Supply’ capability of the data storage component
• Combining demand and supply into the working framework
• Options for improving performance capability
• Case scenarios to illustrate key points
• Conclusion and recommendations
• Assumptions, impact and remedy; limitations with improvements
You (personnel responsible for evaluating, procuring and implementing) will benefit from this article by
learning how to better plan and design optimum data storage infrastructures to support centralization
of business information assets into efficient shared service centers. This is a necessity in the current
financial climate. This aggregation may produce positive externalities that enhance the value of
information to management, improving return on investment (ROI).
Introduction

The purpose of this article is to evaluate the performance of storage as a sub-system in a typical,
complex, and contemporary IT infrastructure landscape. We will use a 4-cell matrix framework
capable of classifying the sub-system into four categories based upon the patterns observed in the
performance data. These classifications will guide the development of change strategies. A brief
description of each section follows.
Business Proposition as outlined in the Executive Summary:
‘Demand-Supply Framework (DSF)’, an empirical framework, is designed to allow businesses to
derive more benefits from existing or reduced IT storage infrastructures. It is expected to provide
additional monetary resources to invest in existing staff. DSF classifies the storage sub-system based
on performance in context of the enterprise workload. The outcome supports performance
improvement initiatives and cost-benefit analysis to evaluate the investment decision.
Essential Terminology:
At the very beginning, this article describes the terminology used to develop this framework. A
common language is necessary to accomplish the stated objective. The purpose of DSF is introduced
to graphically represent the real, current, and future state of storage sub-system performance.
Characteristics of ‘Demand’:

Our contemporary knowledge pool is dominated by ‘supply-side’ information on the storage sub-system (i.e. capabilities, performance benchmarks and comparisons). This article attempts to enhance the use of this knowledge by introducing typical, real-life characteristics of demand using representative performance data. This data can be captured easily using existing system tools and programs to add the ‘demand-side’ perspective, enabling economical and efficient choices.
‘Supply’ Capability:
Supply characteristics of a storage sub-system are a well-treated subject in reference material available from academic and industry sources. This body of knowledge is also relevant to the framework proposed in this article. Supply has an implicit connotation of quality. ‘Total Device Response Time’, measured in milliseconds, is the most commonly used unit of measurement for transactional or mixed workloads.
Demand-Supply Framework (DSF):

Demand for IT workload processing and the infrastructure’s supply capability are interrelated. This is the cornerstone of the hypothesis developed in this article. Operational and investment efficiency is achieved only when demand equals supply (equilibrium). It may not be practical to develop a closed-form model for matching the two. DSF, based upon data from existing tools and programs, can help enterprises discover their current position relative to the equilibrium.
Improving Performance Capability:
Knowledge of your current position is a powerful motivator to envision the future, desired position.
This vision leads to realizing the objective. DSF is a key tool to assure such transitions. It facilitates
correct interpretation of the large and complex performance dataset, spelling the difference between
success and failure. This section uses a case scenario methodology for identifying deficiencies and
alleviating them.
Example Case Scenarios:

Case Scenario 1: UNIX® Transactional Workload
Case Scenario 2: Windows® Messaging Workload
Case Scenario 3: Mainframe Mixed Workload

No two IT storage workloads are identical, but their constituent types repeat across businesses. Workload is generated when businesses serve their customers’ online transactions. Batch-job workload is generated as the data is processed. People must communicate for the organization to function, generating messaging traffic. The case scenarios in this article are based upon field experience with these important storage workload-generating functions in a contemporary enterprise.
Recommendations in Conclusion:
The conclusion stresses the importance of a clear and objective system to measure and compare
performance of complex options and their combinations. A brief process for using DSF to achieve
economic efficiency is proposed.
Assumptions, Impacts and Remedy; Limitations with Improvements:

Following the above, key assumptions are explained along with their likely impacts. Subsequently,
suggestions are made to overcome the effect of these impacts. This is followed by known limitations
and probable improvements for mitigating them.
The article concludes with a brief author biography.
Keywords: demand, supply, knowledge, business, performance, framework
Essential Terminology

Here are definitions of the terms used in this article. A shared vocabulary helps eliminate misunderstanding, as the same terms are often used to mean different things.
Access Density: Access Density is the measure of storage performance per unit of capacity for a single disk drive unit. It is computed by dividing the throughput in IO/sec that a disk drive can support with reasonable response time by its storage capacity [3].
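As a back-of-the-envelope illustration of this definition, the sketch below computes access density for two drive sizes; the IO/sec and capacity figures are assumed values for illustration, not vendor specifications.

```python
# Access density: throughput (IO/sec at acceptable response time)
# per unit of usable capacity. All figures below are illustrative
# assumptions, not vendor specifications.

def access_density(iops: float, capacity_gb: float) -> float:
    """IO/sec available per gigabyte of capacity."""
    return iops / capacity_gb

# A 73 GB drive and a 1 TB drive, each sustaining ~150 IO/sec:
small = access_density(150, 73)      # ~2.05 IO/sec per GB
large = access_density(150, 1000)    # 0.15 IO/sec per GB
print(f"73 GB drive: {small:.2f} IO/sec per GB")
print(f"1 TB drive:  {large:.2f} IO/sec per GB")
```

Note how the larger drive offers far fewer IO/sec per gigabyte, which is the falling access-density trend discussed later in this article.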
‘Demand-side’ view of storage infrastructure: It is the business activity placed on the IT storage sub-system. A variety of business functions in an enterprise use different applications and generate different workloads. IT storage arrays service this aggregated heterogeneous workload; the demand-side view provides an objective basis for matching the storage arrays’ capability with that workload.
‘Economic-efficiency’ and DSF: Economic-efficiency is the state of operation when differing business applications are receiving their
respective expected ‘quality of performance’ from the storage array that processes their aggregated
workload. Both ‘Supply-side’ and ‘Demand-side’ considerations are needed to match the two. This
matching is embodied in DSF, a 4-cell matrix that depicts the storage array’s current and future state
of performance.
‘Game in strategic form’: It is the tabular presentation of the possibilities between two variables; it lists ‘pay-offs’ for each possible outcome and is used in the application of Game Theory, a branch of applied mathematics.
Going-wide-before-going-deep: A disk drive is a mechanical device, subject to the mechanical laws of inertia. Hence, any application dataset should be spread over as many disk drives as possible for better performance. However, the number of disk drives in any storage array is finite, so a very large dataset may still have to go deeper on existing disk drives even after spreading across all the available disks. This principle is referred to as ‘Going-wide-before-going-deep’.
Hyper: Storage arrays typically contain large physical disks; currently, disks of up to one terabyte are available commercially. These drives are sliced into smaller pieces before they are assigned to individual host computers, in the process of configuring the storage array for use by multiple hosts. Host computers see these slices as if they were physical disks; these slices are called ‘hypers’.
IO/Sec: An input/output (‘IO’) is a request made by the software program running on the host computer to the storage device. It is characterized by size, typically in kilobytes or megabytes, and by nature: sequential (when consecutive IOs are located adjacent to each other) or random (when each successive IO may be located anywhere on the disk, away from the preceding one). The number of IOs completed in one second is expressed as IO/Sec.
Progress-path: The locus of a request through a computing infrastructure made up of a series of components and sub-components that are traversed (e.g. software programs and processes, data structure abstractions or constructs, caches, firmware, hardware, etc.) until the request is complete.
Quality of Performance: Quality of performance in this article refers to the response time for a transaction or set of transactions in context of the expectation. Performance is a relative concept, judged against the quantity and nature of the workload processed. Good performance does not always mean exceptionally low transactional or batch response time in complete disregard of the quantity and nature of the workload.

Rational Ignorance: A deliberate state of incomplete knowledge, maintained because obtaining the missing piece is too costly when compared to the benefits of possessing such knowledge.
‘Supply-side’ view of storage: It refers to matters related only to the capability, features and functionalities of the storage array. It may include performance information of the storage array when subjected only to artificial test workloads.
Response Time:
It is the processing time spent by the transaction in a sub-system, or the time interval between two
successive requests submitted to a purely sequential processing system or sub-system.
Contemporary information technology is capable of concurrently accepting and processing multiple
requests. Average performance of the system or sub-system benefits as a result; individual
transaction response time may increase.
Characteristics of ‘Demand’

The processing time spent by the transaction in a sub-system is its turn-around time; it can be considered ‘Response Time’. The shorter the response time, the more workload can be presented to the sub-system for processing, and vice-versa.
Imagine yourself working on your notebook. Under-configuration gives a lagging experience where
the system is struggling to keep up, and you may find yourself restlessly waiting to enter more
commands for processing. In effect, you are not able to realize your full potential.
As you enhance the system and it responds more quickly, you are able to present more work for processing in the same amount of time, until you reach the maximum rate of presenting work to the system (determined by the nature of your work and your dexterity, among other factors).
It thus makes sense to enhance the system until its coping rate matches your workload presentation rate: the optimum configuration. At this point, your investment in the system is most efficiently utilized. Any system configuration short of the optimum results in under-utilization of your potential; configuration beyond the optimum reduces the rate of return on your investment.
Figure 1: Characteristics of Demand
The ‘Law of Demand’ is a well-known tool in an economist’s toolkit. It is proposed to be applicable even in computing environments. Shorter transaction processing time (a proxy for price paid) yields an increased ability to present work for processing (a proxy for quantity consumed); see Figure 1.
The Demand-Supply based approach embodies the concept of workload elasticity, where more
workload is available as the sub-system is able to process it in less time and vice-versa. This is
missing in the traditional ‘requirement’ based approach.
Accurately measuring and determining the proxy demand curve may not be practical, easy, or even
feasible. Hence, the concept of rational ignorance (ignoring such an analysis) may be justified from
an individual demand perspective.
For an individual, the cost of analysis may be too high and the advantages few. For an enterprise, the advantages could far outweigh the cost of rational ignorance, namely idle capacity or lost opportunity. However,
the determination of such a proposed proxy demand curve for an enterprise may not be feasible. The
4-Cell matrix framework developed in this article overcomes this difficulty.
‘Supply’ Capability

The storage sub-system appears to perform according to the economist’s ‘law of supply’: the longer the transaction processing time (a proxy for price charged) tolerated by business activity, the more workload can be processed (a proxy for quantity supplied).
Figure 2: Supply Characteristics
Figure-2 sketches a typical storage sub-system performance trend. The curve slopes up gently until point ‘C’, which is often referred to as the knee of the curve (systems personnel also sometimes refer to this as a ‘hockey-stick’ curve due to its shape). Beyond this point, the curve rises steeply towards an asymptote.
The relatively flat and gently rising portion of the curve is the operationally useful part, where the
storage sub-system will typically operate. This typical behavior can be predicted or verified by
modeling storage as discrete systems [3].
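One simple way to reproduce this knee behavior is the textbook single-server queueing approximation R = S / (1 - U), where S is service time and U utilization. This is a deliberately simplified sketch, not the discrete-system model cited in [3], and the 5 ms service time is an assumed value.

```python
# Sketch of the "hockey-stick" response-time curve using the
# classic M/M/1 approximation R = S / (1 - U). Illustrative only;
# real storage sub-systems are more complex than a single queue.

def response_time_ms(utilization: float, service_ms: float = 5.0) -> float:
    """Average response time at a given utilization (0 <= U < 1)."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_ms / (1.0 - utilization)

# The curve stays nearly flat at low utilization, then climbs
# steeply past the knee (roughly 70-80% here):
for u in (0.10, 0.50, 0.70, 0.90, 0.95):
    print(f"U = {u:.0%}: R = {response_time_ms(u):6.1f} ms")
```

Even this crude model shows why operating just below the knee is the useful region: response time doubles between idle and 50% utilization, but grows tenfold by 95%.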
The law of supply approach also recognizes elasticity in supply. If the applications are more tolerant
of response from the storage sub-systems, they can service more requests.
Demand-Supply Framework (DSF)

The ultimate performance experience is achieved when ‘demand’ and ‘supply’ are perfectly matched. Simple as it may appear, the quantitative value of ‘demand’ and ‘supply’ is determined by the complex combination of a large number of variables. A closed-form analytical or empirical model or simulation requires complex mathematics and careful assumptions, often leaving the simulation with far less fidelity than desired.
DSF, a 4-Cell Matrix, is designed to harness the abstraction of the Demand-Supply concept into a more easily understood and easy-to-use format. All the variables are hypothesized to roll up into two camps, ‘Demand-side’ and ‘Supply-side’. These two camps counteract each other, predicting an optimum equilibrium that shifts with dynamic changes caused by known and unknown variables.
DSF provides a quantitative view of how efficiently the storage sub-system meets specific enterprise
workloads. A useful picture emerges once this framework is populated with the measurements from
actual or similar operations, revealing whether the sub-system is over, under, or optimally utilized.
Figure 3: Demand-Supply Superimposed
Figure-3 superimposes the characteristics of demand and supply developed earlier. The region around the intersection of the two curves estimates the zone of efficient operation. DSF provides an empirical approximation to evaluate whether the current storage sub-system is operating efficiently. As illustrated in figure-4, a four-quadrant frame is superimposed upon the graph, with the quadrants labeled ‘A’ through ‘D’. An explanation of each quadrant follows the next paragraph.
The choices for acquiring infrastructure components are strategic in nature. Choices made by
infrastructure planners are contingent upon the aggregate effect of actions of multiple business users,
often acting independently. The development of DSF is in part inspired by the tabular representation
of scenarios as a ‘game in strategic form’ drawn from ‘Game Theory.’ The four discrete quadrants,
illustrated in figure-4, categorize the pay-offs of the four possible combinations of actions. These pay-
offs are illustrated in figure-5 below.
Figure 4: Demand-Supply Framework (DSF)
Quadrant ‘A’ is the ideal zone of operation. The cross-point of demand and supply curves is centered
in this quadrant. From a Theory of Constraints (TOC) perspective, a system operating in this region
has no constraints [4]. In other words, all the sub-systems are well matched and operating at the
same rated capacity. Our ‘Goal’ [4] should be to configure and load a system to operate in this zone;
the other quadrants have a mismatch of demand and supply.
Operation in the ‘B’ quadrant indicates over-utilization where demand overwhelms the ability of the
storage sub-system to service the workload. Operation in Quadrant ‘C’ is of concern since the storage
sub-system struggles to service the workload even at low demand. Quadrant ‘D’ is a clear case of
under-utilization of the storage sub-system. The infrastructure owner is not working invested dollars
hard enough, sometimes intentionally, e.g. when an SLA is strict and ambitious.
Improving Performance Capability

Progressing in our journey towards understanding system performance, identifying deficiencies and choosing appropriate remedies, DSF is developed and used in more detail.
From a financial perspective, quadrant ‘A’ operation indicates that the dollars invested are being used well. Quadrants ‘C’ and ‘D’ imply a lower return on investment. Quadrant ‘B’ is pushing the return on investment to its limit; end-user quality of performance may suffer, and the system would not be able to absorb a spike in the operation.
Figure 5: DSF Pay-Off Tabular Representation
How are the quadrants drawn? Referring to figure 5, the horizontal middle line of the 4-Cell matrix is proposed, based on field experience, at a fixed level of twenty milliseconds. The vertical middle line divides the workload into two groups of eighty and twenty percent of the total workload, respectively. It may be chosen at different values for different datasets or specific scenarios. These numerical figures are guidelines; use the values of these variables best suited to your scenario.
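The guideline values above can be sketched as a simple classifier. This is a minimal illustration only: the 20 ms line follows the text, while placing the workload line at the 80th percentile of observed samples is one possible reading of the eighty-twenty split and should be adjusted for your own dataset.

```python
# Minimal DSF quadrant classifier for observed operating points.
# Thresholds are the guideline values: 20 ms response-time line,
# workload line at the 80th percentile of samples (an assumption).

def percentile(values, pct):
    """Nearest-rank percentile of a list of numbers."""
    s = sorted(values)
    idx = int(round(pct / 100.0 * (len(s) - 1)))
    return s[idx]

def dsf_quadrant(workload, response_ms, workload_line, rt_line_ms=20.0):
    high_demand = workload >= workload_line
    good_response = response_ms <= rt_line_ms
    if high_demand and good_response:
        return "A"   # high workload, healthy response: ideal
    if high_demand:
        return "B"   # demand overwhelms supply: over-utilized
    if not good_response:
        return "C"   # poor response even at low demand: investigate
    return "D"       # low workload, healthy response: under-utilized

# Invented (workload in IO/sec, response time in ms) samples:
samples = [(1200, 8), (9500, 12), (9800, 35), (900, 42)]
line = percentile([w for w, _ in samples], 80)
for w, rt in samples:
    print(w, rt, dsf_quadrant(w, rt, line))
```

Classifying each measurement interval this way, rather than a single grand average, also shows how often the system visits each quadrant.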
DSF is based on the observed pattern for a variety of application environments against the desired
outcomes. Specifically, a desired outcome is the customer perception of ‘quality of performance’,
supported by data. It is a proposed measuring device to categorize and evaluate a computing
environment so the appropriate remedial response can be implemented.
The horizontal-axis is the independent variable; workload is modeled as ‘demand’. The proxy for the
storage world may vary depending upon the environment being studied. A common independent
variable is data ‘throughput’ to the storage system measured in terms of Inputs and Outputs per
Second (IO/Sec), or Megabytes per Second (MB/Sec). The Percentage Utilization (%Utilization) of
the storage sub-system or its components is another example.
Other variables may be used as a proxy for the ‘demand’ workload. For example, in an Exchange® messaging environment you might use performance counters such as ‘RPC operations/sec’. This counter is a direct measure of the aggregate workload handed to the Exchange® servers by Office Outlook® clients in MAPI® mode. This workload drives the Exchange® Information Store service on the Exchange® Server, which accesses storage to fulfill client requests.
The vertical-axis almost always represents the transaction processing time or device response time in
milliseconds. This can either be for a batch or for individual transactions depending upon the study.
The choice of variables often depends on the availability of data-gathering tools and collectable data. The key consideration is the analytical
establishment of a definite relationship between the independent variable represented on the
horizontal-axis, and the dependent variable on the vertical-axis to ensure a sufficient-cause
relationship [4]. There may not always be a one-to-one relationship between these two. Other
variable(s) may be involved and have significant influence; these must be kept constant if the
environment permits their control. At the least, they must be measured and recorded so a reasonable
estimate of impact can be made to arrive at an objective conclusion.
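As a minimal screen for such a relationship, a Pearson correlation between the candidate independent and dependent variables can flag the absence of a linear association before the pair is relied upon in the DSF. Correlation cannot prove a sufficient-cause relationship, only support or undermine it; the sample figures below are invented.

```python
# Quick screen: does the chosen independent variable (workload)
# actually move with the dependent variable (response time)?
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Invented measurement intervals: IO/sec vs. response time in ms.
iops = [800, 1500, 3200, 6400, 9000]
rt_ms = [4.1, 4.8, 6.0, 11.5, 28.0]
print(f"correlation(workload, response time) = {pearson(iops, rt_ms):.2f}")
```

A weak or negative coefficient would suggest that another, uncontrolled variable dominates the response time, and a different proxy should be sought.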
Selecting proxy variables is both science and art. It is a science as it draws upon the knowledge of a
cross-section of technologies operating at various layers that work together, namely, software
application, operating system platform, Ethernet network, storage network and the storage array
layers. The best way to arrive at high fidelity proxy variables is to engage subject-matter experts
whose experience can populate technology mechanisms with values or quantities through
observation of infrastructures. This knowledge of what works and what doesn’t is invaluable in
deciding on the proxy variable that can provide a true picture of the system under scrutiny.
Once the measured variables for the system are plotted on the DSF, the current-state of the system
(Quadrant-A, B, C or D) becomes known. The enterprise can then devise a strategy for getting to the
recommended Quadrant ‘A’.
Figure 6: Building Change Strategy using DSF
Quadrant ‘A’ represents a zone of operation where the enterprise is efficiently using its invested IT
dollars. They only need to monitor for business growth and increasing transactions, which will almost
certainly increase system workloads. A high reading for the independent variable represented on the
horizontal-axis (e.g. high utilization) is not in itself a sufficient cause for alarm as long as the
dependent variable, the outcome (e.g. transaction processing time), is within reasonable limits as in
Figure-4.
A system operating in quadrant ‘B’ is under stress. Demand for service is overwhelming the ability to
perform. An enterprise must plan to move along the blue (dotted) arrow leading to quadrant ‘A’. There
are many possible options to make that change. An enterprise may work on the demand or the supply
side depending on the circumstances. Reduction or diversion of the workload is a possible demand-
side tactic. There are many supply-side options, ranging from the purchase of new capacity to using
cache technology to improve the system’s response. For instance, an outright replacement or
capacity upgrade may not be plausible in the current financial climate. Hence ‘caching’ technology
can be deployed at various interfaces along the progress-path of the transaction from application to
the storage. At the computer server layer, this has become a very feasible option. As memory prices
plunge, hosts increasingly have large physical and virtual memories. Progressing to the application
layer, caching may be enabled across the network in the application based upon the client-server
architecture, currently a dominant architecture in IT landscapes.
Running Outlook® MAPI client in cached mode is a well known example supporting this suggestion.
This results in asynchronous system operation that insulates user actions and experiences from the
latencies in the overall system [5]. At the storage level, there is a large multi-player industry devoted
to designing, developing and marketing Intelligent Cached Disk Arrays (ICDA). These were
practically non-existent fifteen years ago outside the Mainframe host computer domain. These
devices enhance storage performance considerably by exploiting fast, electronic memory combined
with a bank of slower mechanical disk devices.
A system operating in Quadrant ‘C’ is a cause for concern. Even in the absence of considerable workload quantity, the quality outcome (the dependent variable) is poor. There is a serious gap in the design or configuration of the system that needs troubleshooting for the system to progress along the blue (dotted) arrow to quadrant ‘A’. Given system complexity, the root cause may lie in any of many components. A typical dilemma at the storage layer is caused by the disk drive technology trend of progressively falling ‘Access Density’ [3]. The tendency is to purchase a
number of drives based upon the useable storage capacity desired by the enterprise. As the low
capacity drives (i.e. 36 gigabytes (GB), 73 GB and soon 146 GB sized disk drives) disappear from the
supplier’s price lists due to technological obsolescence, enterprises purchase fewer disk drives for
more GB. This dilemma is exacerbated by the decrease in IT budgets. Spreading the data by using
knowledge of the workload and the access pattern can help in such circumstances. However, the
dynamic nature of the access pattern in any enterprise raises the challenge and reduces the potential
to benefit from such intelligent but static one-time data spreading.
Here is a word of caution about the performance data generated by tools and programs in the various
computing environments including Mainframe host computers. At a low workload, high response
time is often seen purely due to statistical aberration. Statistical averaging is used extensively in the
reported data for typical tools and programs; this averaging over lower workload can and does result
in higher averages when compared to higher workloads. Because of this effect, a system falling in
Quadrant 'C' must be scrutinized closely before settling on a strategy to move it to quadrant 'A'.
Using a 'weighted average' in place of a pure 'average' is one way to address this distortion.
Quadrant 'D' represents an under-loaded sub-system. From a financial perspective, the enterprise is
not utilizing its investment fully. However, an enterprise may have a policy to restrict the level of system utilization
to maintain the quality and level of service. If this policy does not exist, the enterprise can devise a
strategy to move along the blue arrow to quadrant ‘A’. Once again, changes can be on the demand or
the supply-side. On the demand-side, the enterprise may increase its service levels to the end-users
or divert more workload towards the sub-system by inviting other business units or external
parties. On the supply-side, the enterprise may scale down the sub-system to recoup or free its
investments.
Example Case Scenarios
Case Scenario 1: UNIX® Transactional Workload

These application environments are characterized by read-predominant random access of
information. The end-user experience is driven by the response time of individual transactions.
Consequently, the instruction-processing components of the transaction progress-path are stressed
more than the capability of the information transmission buses. The performance measurement in such
environments is IO/Sec as the independent variable, and the Total Device Response Time in
milliseconds as the dependent variable.
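The quadrant assignment used throughout this article can be expressed as a small rule. The axis conventions below follow the quadrant descriptions given earlier (A efficient, B overloaded, C troubled, D under-loaded); the function name and the example thresholds are illustrative assumptions, since the IO/Sec cutoff is workload-specific:

```python
def dsf_quadrant(io_per_sec, resp_ms, io_threshold, resp_threshold_ms=20.0):
    """Classify one device sample into a DSF quadrant.

    Conventions assumed from the article's quadrant descriptions:
      A: high workload, acceptable response (efficient operation)
      B: high workload, poor response       (overloaded)
      C: low workload,  poor response       (design/configuration gap)
      D: low workload,  acceptable response (under-loaded)
    The 20 ms ceiling follows the article's recommended measurement;
    the IO/Sec threshold must be chosen per environment.
    """
    high_load = io_per_sec >= io_threshold
    poor_resp = resp_ms > resp_threshold_ms
    if high_load:
        return "B" if poor_resp else "A"
    return "C" if poor_resp else "D"

print(dsf_quadrant(500, 8.0, io_threshold=200))   # busy and fast  -> A
print(dsf_quadrant(50, 35.0, io_threshold=200))   # quiet but slow -> C
```

Applied to every device sample in a collection interval, this rule reproduces the scatter-plot quadrant reading programmatically.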
a. ERP based 'Vendor Management' application:

The performance data from a UNIX® server running an ERP-based Vendor Management application is plotted in DSF format in Figure-7.
Figure 7: DSF for ERP based Vendor Management Application
Each ‘▲’ denotes the throughput IO/Sec versus the total device response time for each physical
device as seen by the server during the chosen interval. These devices viewed as physical disks by
the computer host are actually hosted as ‘hypers’ on the disk drives in a high-end storage array. The
measured IO/Sec is the average of the ten minute intervals selected during the peak period. This
picture highlights the distribution of workload among the various devices available to the application.
The nature of the workload and the spread of the application information are such that during the
selected interval, close to ninety percent of the workload is utilizing only four devices. This is a typical
profile for open systems host computers. These devices are still able to respond to the requests well
below the recommended measurement of twenty milliseconds. The current application end-user
experience in this enterprise was rated satisfactory.
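The observation that roughly ninety percent of the workload lands on only four devices can be checked mechanically from per-device IO rates. The helper below and its data are hypothetical, sketching one way to quantify workload concentration:

```python
def devices_for_share(io_by_device, share=0.9):
    """Return how many of the busiest devices carry `share` of total IO."""
    rates = sorted(io_by_device.values(), reverse=True)
    total = sum(rates)
    running, count = 0.0, 0
    for r in rates:
        running += r
        count += 1
        if running >= share * total:
            return count
    return count

# Hypothetical per-device IO/Sec figures for one peak interval:
io_by_device = {"d1": 900, "d2": 700, "d3": 500, "d4": 420,
                "d5": 80, "d6": 60, "d7": 40, "d8": 20}
print(devices_for_share(io_by_device))  # 4 of 8 devices carry 90% of the IO
```

Tracking this count over successive intervals shows whether the concentration profile is stable or drifting, which feeds directly into the 'going-wide-before-going-deep' discussion that follows.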
In spite of the current satisfactory performance, a DSF approach can offer useful insights. It
underscores the importance of spreading the information, 'going-wide-before-going-deep', across the
available devices.
In the current scenario, there are still application information devices with information tables (refer to
figure-7, label 'a') that are living in DSF quadrant 'C'. Not much of the workload traffic is finding its
way to these devices. As a result, higher response times are not reflected in the overall end-user
experience. Based on the nature of information and the transactions, business activity may shift,
channeling increasing amounts of workload traffic on these devices. In such a scenario, the end-user
experience will deteriorate, adversely affecting any formal and informal inter-department service-level
understanding. Capitalizing on the predictive nature of DSF requires collaboration among the business,
application and infrastructure teams.
Additionally, the internal devices of the computer host housing the UNIX operating system are also in
quadrant ‘C’ (refer to figure-7, label ‘b’). This is not entirely unexpected as modern virtual memory
management operating systems access the operating system hosting drives extensively. Any
technique to improve the access response, e.g. using more drives in mirror protection mode, would
provide performance benefits.
Conclusion:

The Vendor Management application is experiencing good storage performance. The enterprise is
getting good returns for dollars spent on the infrastructure with room to scale. However, there are
caveats regarding the nature of the workload stressing the drives that are already exhibiting high device
response time; operating system activity could become a bottleneck.
b. RDBMS based Telecom 'Value-Added Services' application:

'Value-added services' application processing is dominated by transactional workload. Representative
performance data from the UNIX® based server running this application is plotted in figure-8. This
figure has two panes to show the performance picture ‘before’ and ‘after’ the introduction of a
technology upgrade to a mid-range storage array. This case scenario demonstrates the comparative
way to use DSF.
This plot depicts the impact of the technology upgrade only, as only this variable was changed in the
computing infrastructure. This data can be used to baseline the environment for a sensitivity study,
isolating the benefit resulting from a change in a single variable, e.g. new technology, and facilitating
efficient sizing of the environment. Typical examples of the variables that could be changed for the
purpose of a sizing study include disk drive organization for the Logical Units (LUN), file-system
organization at the server, operating system configuration, version or patch level changes, application
technical configuration, business activity changes resulting in quantity and quality of the workload
etc., in addition to any hardware change in the system. More than one variable usually changes
during a real-life infrastructure transformation. The analysis below has
considerable value in helping us to understand the demand and supply view for an enterprise
facilitating economically efficient choices.
Figure 8: DSF for RDBMS based Telecom ’Value-Added Services’ Application
Refer to figure-8 above. The bottom pane depicts the DSF picture of the host computer before the
storage technology upgrade. The groups of storage devices marked ‘a’ carrying the bulk of the
workload are shown to be bordering the boundary of the quadrants ‘B’ and ‘C’. As described earlier,
this is undesirable. A slight upward swing in the workload could drift into ‘B’ and ‘C’ quadrants
providing higher total device response time and degrading application performance.
The DSF in the top pane depicts the situation after the technology upgrade is complete. Observe the
same group of devices as seen earlier, now marked 'b'. The improved outcome is apparent. Not only
has the total device response time of the busy volumes come down; these volumes are actually processing
more workload concurrently. As there were no application, data layout, or distribution changes, this
concurrency always existed in the workload but was previously suppressed. It is now leveraged by
the newer technology at the storage array level in the ‘after’ scenario. These devices are operating
well inside quadrant ‘A’, which may be the desired scenario.
In addition to the above observation, a large number of devices can be seen in quadrant 'D'. Their
current state is 'under-loaded', as in figure-6. If it is possible to introduce changes at the operating
system or application levels to restructure the data layout, it may be possible to utilize the capacity of
these devices to service more workload. The devices lying in quadrant ‘C’ may be studied for the
cause of their unacceptable performance and remedial strategies implemented. However, typical
devices in this quadrant are those that host the operating system swap space. The remedy may be to
use as many drives in the mirrored mode as possible.
Conclusion:

It can be inferred and confirmed that the system depicted in figure-8 has achieved a service
workload capacity boost, justifying the dollars spent on the upgrade. The devices that take the bulk of
the workload from the database are less stressed after the storage upgrade. Hence, the system is
able to accommodate more transactions that may be the result of increased business activity due to
additional users or the addition of value-added services. All of this can be achieved without
deteriorating the existing, committed service levels.
Case Scenario 2: Windows® Messaging Workload
'Messaging' application:

Messaging applications are typically implemented as two-tier client-server architectures. The local
area network links the two distinct tiers, ‘client’ and ‘email server’. This network is a key component in
determining the performance experience of the Messaging system. In this article, we focus on a
dedicated email server located in the server-tier of the setup. This server is attached to a storage
array hosting the message store of the Messaging system. We observe the performance of the
storage array as it services the workload demand created by email traffic during a typical workday,
from 8.00am to 8.00pm. See figure-9.
The proxy indicator for the level of user activity that the mail server can successfully process is the
number of RPC Operations completed by the mail server every second. For Case Scenario 2, this is
the driver or the independent variable that generates the workload on the storage array, i.e. proposed
proxy for demand. As usual, the dependent variable is the device response time of the drive hosting
the message store. The workload on the storage array is a balanced mix of read and write. The size
of the transaction is usually small, approximately four kilobytes.
The four-quadrant DSF framework is superimposed on the mail server performance chart in figure-9.
However, there are differences in the way the DSF has been plotted in case scenario 2. In case
scenario 1, each data point on the chart corresponded to the device response time of a physical
device accessed by the host computer from the storage array. In case scenario 2, each data point
corresponds to a ten-minute average for the drive hosting the message store, providing a temporal view. This logical drive at the operating system level is carved out of multiple slices taken from many
physical disks housed in the storage array.
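The temporal bucketing used in this scenario can be sketched as follows; the sample data and the minute-of-day encoding are illustrative assumptions:

```python
from collections import defaultdict

def ten_minute_averages(samples):
    """Average response time per ten-minute bucket.

    `samples` is a list of (minute_of_day, response_ms) pairs; each
    resulting bucket becomes one data point on the DSF, giving the
    temporal view used in this scenario.
    """
    buckets = defaultdict(list)
    for minute, resp in samples:
        buckets[minute // 10].append(resp)
    return {b * 10: sum(v) / len(v) for b, v in sorted(buckets.items())}

# Hypothetical samples between 8.00am (minute 480) and 8.15am:
samples = [(480, 4.0), (484, 6.0), (491, 10.0), (495, 14.0)]
print(ten_minute_averages(samples))  # {480: 5.0, 490: 12.0}
```

Pairing each bucketed response-time average with the matching RPC Operations/Sec average yields the (demand, supply) points plotted in figure-9.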
Figure 9: Healthy scenario DSF for Messaging Application
People begin to access their email accounts concurrently as the workday begins. This translates into
slowly rising workload on the array, illustrated by the grey line marked '(i) 8.00am – 10.00am' in
figure-9. The storage array meets the increasing demand with an almost linearly increasing response
time that reaches about twelve milliseconds. This is represented by the data points with symbol 'Δ'. From the
DSF perspective, the system moves from the low usage quadrant ‘D’ to the edge of the efficient
operation quadrant ‘A.’
As the workday reaches full swing, random concurrent access by multiple email users fills up the
queues in the various components of the email system including storage and its sub-systems. This is
illustrated by the dark grey ellipse marked by ‘(ii) 10.00am – 12.00pm’, enclosing the data points
depicted by the symbol '■'. The DSF state of the system hovers around the boundary of quadrants 'A'
and 'D'. However, the device response time never rises above seventeen milliseconds.
After the lunch hour, interactions mature and the randomness in email access moderates; even
though throughput is generally higher, the queues are relatively relieved. As a result, device
response time reduces. This is illustrated by the dashed light grey ellipse marked by ‘(iii) 12.00pm –
6.00pm’, enclosing the data points depicted by the symbol ‘♦’. This time the DSF state of the system
hovers largely in quadrant ‘D’, bordering on efficient use of the storage system. The maximum device
response time stays close to fifteen milliseconds.
As the evening advances, email users logout from their email accounts, resulting in a progressive
workload decline. The storage array processes this reduced workload with lower device response
time. This is illustrated by the dark arrow marked by ‘(iv) 6.00pm – 8.00pm’, enclosing the data points
depicted by the symbol '○'. The DSF state of the system slips into quadrant 'A'.
Conclusion:

System transitions from one state to another during the course of the same day reveal their dynamic
footprint on the storage infrastructure. DSF is capable of capturing and displaying a large quantity of
time-variant data in a meaningful way. The system depicted here stays within the confines of
quadrants ‘A’ and ‘D’ throughout the day. This is a healthy state picture of performance. In this
scenario, figure-9 depicts the pattern of the busiest days for the enterprise, hence this email server
can be safely considered to host additional users’ mail accounts. The recommended strategy to
increase ROI is to add users in a phased approach, punctuated by taking DSF snapshots.
Figure-9 illustrates a healthy scenario, whereas figure-10 shows an undesirable situation for
comparison. The period labeled ‘e’, is likely to provide email users with slow and unacceptable
response experiences. To resolve this issue, the system manager may transfer some mailboxes to
reduce the workload on this system, or upgrade the system hardware or storage.
Figure 10: Unhealthy scenario DSF for the Messaging Application
Case Scenario 3: Mainframe Mixed Workload

The combined effect of falling access density and shrinking budgets has pushed infrastructure
planners to choose storage sub-systems with fewer, larger drives. This trend causes anxiety, as they
are aware that storage performance is proportional to the number of drives. Although the
performance of the individual physical disk has lagged the growth of its capacity, causing its access
density to fall, the electronics and firmware in the storage arrays that mediate access to these newer,
larger drives have taken enormous strides in supporting improved performance. Even though improved performance is
expected, data is necessary to support such choices. DSF is the suitable choice for processing and
presenting data for these before-after comparisons. It provides planners and their sponsors with
sufficient proof to support their choices. It will even point to changes that may have taken place in the
IT storage environment that they may not be aware of, as illustrated in this case scenario.
Mainframe based 'Customer Promotion' application in the Telecom Industry:

In this case, DSF is used to validate the improvement before and after a storage array upgrade in a
predominantly read workload environment. As many variables as possible remained unchanged in the
mainframe environment. The study data pertains to the same business day of the week and the peak
business time intervals of the day. Application and volser configuration were frozen. As the number of
disks in the new array was reduced, the volsers were re-laid out on the reduced number of disk
drives. This was done on the basis of the read-miss workload for each volser in the historic
performance data. Volsers with high read-misses were spread across as many different physical disks
as possible. Between the before and after scenarios, the storage array generation and the layout of
the volsers were the only variables expected to change.
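One way to sketch this re-layout step, spreading the volsers with the highest read-miss counts across different spindles, is a greedy round-robin over volsers ranked by read-misses. The volser names and counts below are hypothetical:

```python
def spread_volsers(read_miss_by_volser, num_disks):
    """Assign volsers to physical disks so the highest read-miss volsers
    land on different spindles (greedy round-robin by descending misses)."""
    ranked = sorted(read_miss_by_volser, key=read_miss_by_volser.get, reverse=True)
    layout = {d: [] for d in range(num_disks)}
    for i, volser in enumerate(ranked):
        layout[i % num_disks].append(volser)
    return layout

# Hypothetical read-miss counts from historic performance data:
misses = {"VOL001": 5000, "VOL002": 4200, "VOL003": 300,
          "VOL004": 250, "VOL005": 3900, "VOL006": 100}
print(spread_volsers(misses, num_disks=3))
# the three hottest volsers each land on a different disk
```

Because read-misses are the IOs that actually reach the spindles, separating the high-miss volsers minimizes contention on any single drive, which is the intent described above.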
Figure 11: DSF for Mainframe Mixed Application
The performance data taken from the RMF output was plotted in the DSF format shown in figure-11.
Each data point is a representative ten-minute average of the total device response time for the
production volsers during the peak business activity interval. The symbol '○' represents the before scenario and
the symbol '▲' represents the after scenario. From the DSF perspective, it may be observed
that the 'Customer Promotion System', both before and after, was operating well within the
recommended quadrants with capacity to accommodate growth. However, the improvement is also
clearly evident: 90% of the workload lies in a much tighter rectangle, further inside the 'A'
quadrant, indicating the ability to scale in response to increasing workload. This is a testimony to the
technological improvement in the hardware, firmware and their integration in the newer generation of
arrays. Other considerations, e.g. technological obsolescence and reduced reliability of the older
equipment, may be the driving factors in initiating such an upgrade. The end-user application
performance experience is still a key criterion: we have to prove that it was unchanged or improved
after the change was complete.
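The 'tighter rectangle' claim can be quantified with a rough per-axis percentile box; this helper and its before/after samples are illustrative assumptions, not the actual RMF data:

```python
def tightness_box(points, pct=0.9):
    """Upper-right corner of the rectangle (anchored at the origin) that
    contains `pct` of the points on each axis — a rough per-axis proxy
    for how tightly a DSF cloud is packed."""
    xs = sorted(p[0] for p in points)
    ys = sorted(p[1] for p in points)
    k = max(0, int(pct * len(points)) - 1)
    return xs[k], ys[k]

# Hypothetical (IO/Sec, response_ms) samples:
before = [(100, 18), (120, 22), (90, 25), (150, 30), (80, 12),
          (110, 20), (95, 28), (130, 24), (85, 16), (140, 26)]
after  = [(160, 8), (170, 9), (150, 7), (180, 10), (155, 8),
          (165, 9), (175, 11), (190, 12), (158, 7), (168, 10)]

print(tightness_box(before))  # (140, 28): 90% of points under ~28 ms
print(tightness_box(after))   # (180, 11): more IO at far lower response
```

A shrinking response-time edge combined with a growing throughput edge is exactly the before/after signature described for figure-11.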
Conclusion:

It is evident from the DSF plot shown in figure-11 that the newer system, in spite of the reduced
number of larger drives, demonstrated increased capacity for potential growth while preserving the
current user-experience. This data supports an upgrade. These are dollars well spent on the
infrastructure to support business growth. The storage can accommodate added use by end-users
without adversely impacting their experience.
Recommendations in conclusion

Real-life infrastructures are complex. As stated in the abstract of this article, 'IT performance of
Business transactions is the cumulative outcome of the performance of a number of individual
components, including storage.’ Traditional performance evaluation models and methodologies are
challenged to provide a cumulative picture of performance, given the complexity of contemporary
infrastructures. Consequently, they require a large number of variables, some of which may be
unknown and unanticipated, to reasonably describe performance.
With the DSF approach, as long as the primary proxy variables representing the Demand and Supply
are correctly identified in the system, observations and predictions are an empirical representation of
the current or changed state of the system. Large numbers of variables related to e.g. technology,
configuration and business factors, even the unknown ones, roll-up either to the Demand-side or the
Supply-side. They provide a practical approach for a scenario too complex to analyze using
simulation or closed-form models or representation [3].
Figure 12: Recommended Processes using DSF

Figure-12 describes a process that may be conducted ex post or ex ante. First, describe the change
and quantify it. Second, evaluate the impact of planned or unplanned changes to your infrastructure,
including an analysis of whether the change(s) will affect the Demand-side, the Supply-side or both.
Third, perform a cost-benefit analysis. Finally, close the process with an accept or reject verdict.
Adopt this model as the default to operate at or near an economically efficient infrastructure.
The supply-demand framework discussion resonates well with senior managers and business personnel in
customer organizations. This is important and helpful as they decide whether and when to purchase.
Assumptions, their impact and remedy
Controlled Change:
An important assumption in plotting DSF is that every variable other than the two primary proxy
variables is unchanged between the 'before' and 'after' data collection periods. The real-world customer
environment is dynamic in all the layers and sub-layers of the computing infrastructure facilities; this
assumption may only rarely be valid. The impact is that the DSF picture would show the effect of
unknown or unplanned changes.
The strength of the proposed DSF approach is the ability to capture complex changes in a real-world
infrastructure. The inability to effect a controlled change can be addressed by logical cause-effect
analysis: identify and isolate the cause of the changes captured in the DSF plots, and explain them
using knowledge of the configuration and deployed technology.
Storage-centric view:
The case scenarios presented in this article assume that the storage array is dedicated to the
respective application and the host computer under scrutiny. DSF then produces a complete representation of
the entire demand and supply spectrum for that storage array by looking at the single host computer.
However, in real-world data centre infrastructures, storage arrays are deployed to serve aggregated
workloads from multiple applications residing on multiple host computers. The impact is that looking
at only one host computer will produce a partial picture on the DSF, as the demand placed on the
same array by the other host computers will be invisible and hence missing from the DSF plot.
The entire demand and supply spectrum of a storage array must be taken into account for a
representative, reliable, and complete picture. DSF can be viewed as a storage-centric performance
plotting approach that is suited to real-world complex shared service centers for processing data.
Limitations and improvements
Ex Post nature:
The DSF approach may appear to be ex post: DSF data can be plotted only after data for the
'after' scenario is available, that is, after the implementation of the proposed changes. Consequently,
if roll-back is desirable, it may not be practical or feasible to accomplish.
This conception of DSF has its genesis in the complexity of our customers’ computing environment.
Modeling or simulating these environments suffers from the lack of fidelity due to assumptions that
may be totally or partly invalid. Once the baseline is established, a completed DSF can be used for
predictive and validating purposes, as amply demonstrated by the case scenarios presented in this
article. There is always an existing computing platform before it is improved by scaling up or down. The
combined effect of the 'before' DSF and knowledge of the current infrastructure and technologies, existing
and proposed, unlocks the full benefits offered by DSF used in an ex ante mode.
Estimation:
The reduction of a complex system into a two dimensional empirical representation embodied in the
DSF is a large-scale estimation. We do not really know how much of the variation in the dependent
primary proxy variable (total device response time) is due to the chosen independent primary proxy
variable (IO/Sec, RPC Operations/Sec, etc.) in each of the case scenarios presented. The only way
to find out with certainty is to perform statistical multiple linear regression analyses on a large
amount of data for a large number of variables. This should be followed by calculating confidence-building
indicators such as the correlation coefficient (R²), to reveal how much of the variation in the model's
dependent variable is accounted for by the variation in the independent variables in the model. We
could also compute a t-test to establish the statistical significance of a variable in the model. However, the
potential for 'paralysis by analysis' is substantial.
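For the simplest case of a single independent variable, R² can be computed directly from the sums of squares of a least-squares fit; the sample IO/Sec and response-time figures below are hypothetical:

```python
def r_squared(xs, ys):
    """Coefficient of determination for a least-squares line y = a + b*x,
    i.e. the share of the variance in y explained by x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)

# Hypothetical paired samples of the two primary proxy variables:
io_per_sec = [100, 200, 300, 400, 500]
resp_ms    = [5.0, 6.1, 7.0, 8.2, 9.0]
print(round(r_squared(io_per_sec, resp_ms), 3))  # 0.997
```

An R² this close to 1 would suggest the chosen demand proxy explains almost all of the variation in response time; a low R² would signal that other variables dominate and a different proxy (or a multiple regression) is needed.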
DSF is offered as a pragmatic, workable approach. The limitation of having only two variables can be overcome
by plotting multiple DSF charts with different independent variables to discover which one impacts the
dependent variable the most. Planners have the flexibility to plot the two primary proxy variables for a
single storage device over time (case scenario 2), or to plot them for the same time interval across
different devices (case scenarios 1 and 3).
Automation:
Plotting DSF can be mechanized to eliminate the inherent tedium, especially because it needs to be done
repeatedly. A Graphic User Interface (GUI) driven application could deliver the benefit with less
effort, permitting planners to build a case-specific DSF instance populated with field-captured
performance data. These cases can be stored for a comparative analysis to make strategic
choices.
----------END ----------
Author Biography
Lalit Mohan is a Senior Solutions Architect at EMC Corporation, where he
has been working since 1999. He graduated as an Engineer in 1985 from
the University of Delhi. Prior to EMC, he worked in the Automotive and the IT
industry in South Asia with ‘Engineering Automation’ and ‘Enterprise
Resource Planning’ software applications. During his current assignment at
EMC, he has performed different roles, namely: Systems Engineer, Systems
Engineering Manager, Project Manager and Solutions Architect.
He is a Volunteer with the Singapore Chapter of Project Management Institute (SPMI), contributing by
speaking, publishing articles and participating in social and charity events. These events are attended
by project managers from a cross-section of industries.
Lalit holds prestigious certifications including: Project Management Institute (PMI) - PMP (Project
Management Professional); The Association of Operations Management (APICS) - CIRM (Certified in
Integrated Resource Management) and CPIM (Certified in Production and Inventory Management);
and ITIL Foundation level certification. Additionally, he holds technical certifications from Microsoft®
(MCSE – Windows2000 stream) and EMC Proven Professional Expert level certification in Business
Continuity - EMCTAe. He has been an EMC Symmetrix® Performance Engineering Evaluation Database
(SPEED) 'guru' community member since 2000. He has also qualified for the Cost Accounting
intermediate level certificate from the ICWAI (Institute of Cost & Works Accountants of India).