EMC Proven Professional Knowledge Sharing 2010
Above the Clouds: Best Practices to Create a Sustainable Computing Infrastructure to Achieve Business Value and Growth
Paul Brant, EMC
[email protected]
Table of Contents
Table of Contents ................................................................................................ 2
Table of Figures .................................................................................................. 8
Table of Equations .............................................................................................. 9
Abstract .............................................................................................................. 11
Introduction ....................................................................................................... 13
Sustainability Metrics and what does it mean to be Sustainable ................................. 18
The Challenges to achieve Sustainability .................................................................... 20
The concept of Green .................................................................................................. 23
IT Sustainability and how to measure it ....................................................................... 24
The Carbon Footprint ........................................................................................ 26
Environment Pillar- Green Computing, Growing Sustainability.................... 31
Standards and Regulations ......................................................................................... 32
Best Practice – In the US, Consider Executive Order 13423 and energy-efficiency
legislation regulations ............................................................................................... 32
Best Practice – Use tools and resources to understand environmental impacts ...... 33
IT facilities and Operations .......................................................................................... 34
Best Practice – Place Data Centers in locations of lower risk of natural disasters ... 35
Best Practice – Evaluate Power GRID and Network sustainability for IT Data Centers
................................................................................................................................. 37
Effectiveness Pillar ........................................................................................... 39
Services and Partnerships ........................................................................................... 39
Tools and Best Practices ............................................................................................. 40
Efficiency Pillar ................................................................................................. 41
Information Management ............................................................................................. 42
Best Practice – Implement integrated Virtualized management into environment. .. 42
Best Practice - Having a robust Information Model .................................................. 44
Best Practices in Root Cause Analysis .................................................................... 46
Best practice - Effective root-cause analysis technique must be capable of identifying all
those problems automatically. .................................................................................. 47
Best Practice – Rules-based correlation using CCT ................................................ 48
Best Practice - Reduction of Downstream Suppression ........................................... 49
Self Organizing Systems ............................................................................................. 50
Best Practice - Dynamic Control in a Self Organized System .................................. 53
Best Practice – Utilize STR’s when implementing adaptive controllers .................... 54
Best Practice – Require System Centric Properties ................................................. 55
Application ................................................................................................................ 57
Best Practice – Architect a Designed-for-Run solution ...................................................... 58
Storage ..................................................................................................................... 59
Compression, Archiving and Data Deduplication .............................................................. 60
Autonomic self healing systems ........................................................................................ 66
Storage Media – Flash Disks ............................................................................................ 69
Best Practice – Utilize low power flash technologies ........................................................ 70
Server Virtualization ................................................................................................. 70
Best Practice – Implement DRS ........................................................................................ 71
Network........................................................................................................................ 72
Best Practice - Architect Your Network to Be the Orchestration Engine for Automated
Service Delivery (The 5 S’s) ..................................................................................... 72
Scalable: ........................................................................................................................... 73
Simplified: .......................................................................................................................... 73
Standardized: .................................................................................................................... 73
Shared: .............................................................................................................................. 73
Secure: .............................................................................................................................. 73
Best Practice - Select the Right Cloud Network Platform ........................................ 74
Cloud network infrastructure ............................................................................................. 74
Cloud network operating system (OS) .............................................................................. 74
Cloud network management systems ............................................................................... 74
Cloud network security ...................................................................................................... 74
Best Practice – Consider implementing layer 2 Locator/ID Separation ....................... 75
Best Practice – Build a Case to Maximize Cloud Investments .................................... 76
Best Practice - Service Providers - Maximize and sustain Cloud Investments ............ 76
Best Practice - Enterprises - maximize and sustain Cloud Investments ...................... 77
Best Practice – Understand Information Logistics and Energy transposition tradeoffs 77
Infrastructure Architectures .......................................................................................... 80
Data Center Tier Classifications ............................................................................... 81
Cloud Overview ........................................................................................................ 84
Cloud Layers of Abstraction .............................................................................................. 87
Cloud Type Architecture(s) Computing Concerns .................................................... 88
Failure of Monocultures: .................................................................................................... 89
Convenience vs. Control ................................................................................................... 89
General distrust of external service providers ................................................................... 89
Concern to virtualize the majority of servers and desktop workloads ............................... 90
Fully virtualized environments are hard to manage .......................................................... 90
Many environments can't be virtualized onto x86 and hypervisors ................................... 90
Concerns on security ........................................................................................................ 91
Industry Standards ............................................................................................................ 91
Applications support for virtualized environments, or only the one the vendor sells ......... 91
Environmental Impact Concerns ....................................................................................... 92
Threshold Policy Concerns ................................................................................................ 93
Interoperability issues Concerns ....................................................................................... 93
Hidden Cost Concerns ...................................................................................................... 93
Unexpected behavior concerns ......................................................................................... 95
Private Cloud ............................................................................................................ 96
Best Practice – Implement a dynamic computing infrastructure ....................................... 98
Best Practice – Implement an IT Service-Centric Approach ............................................. 99
Best Practice – Implement a self-service based usage Model .......................................... 99
Best Practice – Implement a minimally or self-managed platform .................................. 100
Best Practice – Implement a consumption-based billing methodology ........................... 100
Public Cloud ........................................................................................................... 101
Community Cloud ................................................................................................... 101
Best Practice in Community Cloud – Use VM’s .............................................................. 106
Best Practice in Community Cloud – Use Peer to Peer Networking ............................... 106
Best Practice in Community Cloud – Distributed Transactions ....................................... 106
Best Practice in Community Cloud – Distributed Persistence Storage ........................... 107
Challenges in the federation of Public and Private Clouds ..................................... 107
Lack of visibility ............................................................................................................... 108
Multi-tenancy Issues ....................................................................................................... 108
Cloud computing needs to cover its assets ..................................................................... 109
Warehouse Scale Machines - Purposely Built Solution Options ............................ 109
Best Practice – WSC’s must achieve high availability .................................................... 111
Best Practice - WSC’s must achieve cost efficiency ....................................................... 111
WSC (Warehouse Scale Computer) Attributes ............................................................... 112
One Data Center vs. Several Data Centers .................................................................... 112
Best Practice – Use Warehouse Scale Computer Architecture designs in certain
scenarios ......................................................................................................................... 113
Architectural Overview of WSC’s .................................................................................... 113
Best Practice – Connect Storage Directly or via NAS in WSC environments ................. 114
Best Practice – WSC should consider using non-standard Replication Models ............. 114
Networking Fabric ........................................................................................................... 114
Best Practice – For WSC’s Create a Two level Hierarchy of networked switches .......... 115
Handling Failures ............................................................................................................ 115
Best Practice - Use Sharding and other requirements in WSC’s .................................... 116
Best Practice – Implement application specific compression .......................................... 118
Utility Computing .................................................................................................... 119
Grid computing ....................................................................................................... 123
Cloud Type Architecture Summary ......................................................................... 123
Infrastructure as a Service and more .............................................................................. 124
Amazon Web services .................................................................................................... 124
Cloud computing ............................................................................................................. 125
Grid Computing ............................................................................................................... 125
Similarities and differences ............................................................................................. 126
Business Practices Pillar ................................................................................ 127
Process Management and Improvement................................................................ 128
Best Practice - Provide incentives that support your primary goals: ............................... 128
Best Practice - Focus on effective resource utilization .................................................... 129
Best Practice - Use virtualization to improve server utilization and increase operational
efficiency ......................................................................................................................... 129
Best Practice - Drive quality up through compliance: ...................................................... 130
Best Practice - Embrace change management ............................................................... 131
Best Practice - Invest in understanding your application workload and behavior: .......... 133
Best Practice - Right-size your server platforms to meet your application requirements 133
Best Practice - Evaluate and test servers for performance, power, and total cost of
ownership ........................................................................................................................ 134
Best Practice - Converge on as small a number of stock-keeping units (SKUs) as you can ...... 134
Best Practice - Take advantage of competitive bids from multiple manufacturers to foster
innovation and reduce costs. .......................................................................................... 135
Standards ............................................................................................................... 135
Best Practice - Use standard interfaces to Cloud Architectures ..................................... 137
Security .................................................................................................................. 138
Best Practice – Determine if cloud vendors can deliver on their security claims ............ 139
Best Practice - Adopt federated identity policies backed by strong authentication practices ...... 139
Best Practice – Preserve segregation of administrator duties ........................................ 140
Best Practice - Set clear security policies ....................................................................... 141
Best Practice - Employ data encryption and tokenization ............................................... 141
Best Practice - Manage policies for provisioning virtual machines.................................. 142
Best Practice – Require transparency into cloud operations to ensure multi-tenancy and
data isolation ................................................................................................................... 142
Governance ............................................................................................................ 143
Best Practices – Do your due diligence of your SLA’s .................................................... 143
Compliance ............................................................................................................ 146
Best Practice - Know Your Legal Obligations ................................................................. 146
Best Practice - Classify / Label your Data & Systems ..................................................... 146
Best Practice - External Risk Assessment ...................................................................... 146
Best Practice - Do Your Diligence / External Reports ..................................................... 147
Best Practice - Understand Where the Data Will Be! ...................................................... 147
Best Practice - Track your applications to achieve compliance. ..................................... 148
Best Practice - With off-site hosting, keep your assets separate. ................................... 148
Best Practice - Protect yourself against power disruptions ............................................. 149
Best Practice - Ensure vendor cooperation in legal matters ........................................... 149
Profitability .............................................................................................................. 150
Business and Profit objectives to achieve Sustainability ................................................. 151
Best Practice - Consumer Awareness and Transparency .............................................. 151
Best Practice – Implement Efficiency Improvement ........................................................ 152
Best Practice - Product Innovation .................................................................................. 152
Best Practice - Carbon Mitigation .................................................................................... 152
Information Technology Sector Initiatives ....................................................................... 152
Best Practice – Virtualization .......................................................................................... 153
Best Practice - Recycling e-Waste .................................................................................. 153
Cloud Profitability and Economics ................................................................................... 153
Cloud Computing Economics .......................................................................................... 158
Best Practice – Consider Elasticity as part of the business deciding metrics ................. 158
Economics Pillar ........................................................................................................ 162
Best Practice – Consider Efficiency as only one part of the Economic Sustainable equation
............................................................................................................................... 164
Conclusion ....................................................................................................... 164
Appendix A – Green IT, SaaS, Cloud Computing Solutions ........................ 165
Appendix B – Abbreviations .......................................................................... 168
Appendix C – References ............................................................................... 173
Author’s Biography ......................................................................................... 175
Index ................................................................................................................. 176
Table of Figures
Figure 1 - Sustainability and Technology Interest Trends ........................................................... 13
Figure 2 – Achieving IT Sustainability ......................................................................................... 19
Figure 3 - US Energy Flows (Quadrillion BTUs)[21] .................................................... 20
Figure 4 – Efficiency - Megawatts to Infowatts to Business Value Solutions .............................. 22
Figure 5 – IT Data Center Sustainability Taxonomy ................................................................... 27
Figure 6 – Top Level Sustainability Ontology (see notes below) ................................................ 30
Figure 7 - U.S. Federal Emergency Management Agency – Disaster MAP[22] ......................... 35
Figure 8 - U.S. Geological Survey Seismological Zones (0=lowest, 4=Highest) ........................ 36
Figure 9 - U.S. NOAA Hurricane Activity in the United States .................................................... 37
Figure 10 - Opportunities for efficiency improvements ............................................................... 40
Figure 11 - EPA Future Energy Use Projections ........................................................................ 42
Figure 12 – Closed Loop System ................................................................................................ 51
Figure 13 – Sustainability Ontology – Self Organizing Systems ................................................. 52
Figure 14 – A Sustainable Information transition lifecycle .......................................................... 59
Figure 15 – Self Organized VM application controller ................................................................. 71
Figure 16 - Energy in Electronic Integrated Circuits ................................................................... 78
Figure 17 - Moore's Law - Switching Energy .............................................................................. 79
Figure 18 - Data by physical vs. Internet transfer ....................................................................... 80
Figure 19 – Sustainability Ontology – Infrastructure Architectures ............................................. 84
Figure 20 - Cloud Topology ........................................................................................................ 86
Figure 21 - Cloud Computing Topology ...................................................................................... 87
Figure 22 - Using a Private Cloud to Federate disparate architectures ...................................... 98
Figure 23 - Community Cloud ................................................................................................... 102
Figure 24 - Community Cloud Architecture ............................................................................... 105
Figure 25 – Sustainability Ontology – Business Practices ........................................................ 128
Figure 26 - A continuous process helps maintain the effectiveness of controls as your
environment changes ................................................................................................................ 131
Figure 27 - Consistent and well-documented processes help ensure smooth changes in the
production environment ............................................................................................................ 132
Figure 28 – Provisioning for peak load ..................................................................................... 160
Figure 29 – Under Provisioning Option 1 .................................................................................. 160
Figure 30 – Under Provisioning Option 2 .................................................................................. 161
Table of Equations
Equation 1 – Computing Energy Efficiency ................................................................................. 25
Equation 2 – Computing Energy Efficiency-Detailed .................................................................. 25
Equation 3 – Computing Energy Efficiency-Detailed as a function on PUE ............................... 25
Equation 4 – IT Long Term Sustainability Goal .......................................................................... 28
Equation 5 – What is Efficient IT ................................................................................................. 39
Equation 6 – Linear model of a Control System ......................................................................... 54
Equation 7 – Energy Consumed by a CMOS ASIC .................................................................... 78
Equation 8 – Power Consumed by a CMOS ASIC ..................................................................... 78
Equation 9 – Cloud Computing - Cost Advantage .................................................................... 156
Equation 10 – Cloud Computing - Cost tradeoff for demand that varies over time ................... 156
Disclaimer: The views, processes or methodologies published in this article are those of the authors. They do not necessarily reflect EMC Corporation’s views, processes or methodologies.
Abstract
The IT industry is embarking on a new paradigm of service delivery. Each previous information technology wave, from mainframe and minicomputer to PC/microprocessor and networked distributed computing, offered new challenges and benefits. We are now embarking on a new wave, one that offers new methods and technologies to achieve sustainable growth and increased business value for companies ranging from small businesses to major multinational corporations.
Businesses can now draw on a variety of new technologies and approaches to follow a more efficient and sustainable growth path. Cloud computing, Cloud services, Private Clouds, and Warehouse-Scale data center design methodologies are just a few of these approaches to sustainable business growth. Other "services on demand" offerings are starting to make their mark on the corporate IT landscape as well.
Every business has its own requirements and challenges when creating a sustainable IT
business model that addresses the need for continued growth and scalability. Can this new
wave, fostered by burgeoning technologies such as Cloud computing and an ever-accelerating information growth curve, turn the IT industry into a flat and level playing field?
What metrics should we implement to allow a systematic determination about technology
selection? What are the standards and best practices in the evaluation of each technology? Will
one technology fit all business, environmental and sustainable possibilities?
For example, to sustain business growth, IT consumers require specific standards so that data
and applications are not held captive by non-interoperable Cloud service providers. Otherwise, we end up with walled gardens, as we had with CompuServe, AOL, and Prodigy before the Internet and World Wide Web emerged. Data and application portability standards must be firmly in place, with solid Cloud service provider backing.
Data centers are changing at a rapid, exponential pace, faster than at any other point in history. Yet amid all the changes to data center facilities and the associated information management technologies, IT professionals face numerous challenges in unifying their peers to solve problems for their companies. Sometimes you may feel as if you are talking different languages or living on different planets. What do virtual computers and three-phase power have in common? Has your IT staff or department ever come to you asking for more power without
considering that additional power/cooling is required? Do you have thermal hot spots in places
you never expected or contemplated? Has virtualization changed your network architecture or
your security protocols? What exactly does Cloud computing mean to your data center?
Is Cloud computing or SaaS (Storage or Software as a Service) already being performed in your data center? More importantly, how do you align the different data center disciplines to understand how new technologies will work together to solve data center sustainability problems? One possible Best Practice is a standardized data center stack framework that addresses these issues, allowing Best Practices to achieve a sustained business value growth trajectory.
How do we tier Data Center efficiency and map it back to business value and growth?
In 2008, American data centers collectively consumed more power than all the televisions in every home and every sports bar in America. That puts the scale of the problem in perspective. This article will address all of these questions and offer possible solutions.
In summary, this article will go above the Cloud, offering Best Practices that will align with the
most important goal, creating a sustainable computing infrastructure to achieve business value
and growth.
Introduction
Deploying IT, SaaS, and Cloud computing solutions to create a sustaining profitability model for businesses centers on identifying processes and technologies that create value propositions for all involved. This can be achieved by producing eco-centric business analytics, metrics, key performance indicators, and sustainability measures that support the development of green and sustainable business models (see the section titled "Sustainability Metrics and what does it mean to be Sustainable" on page 18 for the definition of sustainability).
This can be a daunting task. The good news is that interest in sustainability, and in the technologies that support it, is growing. Some technologies, such as Cloud computing, are on a major upward trend, as shown in Figure 1 - Sustainability and Technology Interest Trends, below. The trend lines show the number of hits on Google's search engine for each topic, normalized to the topic of Sustainability (outlined in blue). Sustainability and Cloud computing are both clearly trending upward. We will find out why.
Figure 1 - Sustainability and Technology Interest Trends1
1 Google Trends
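The normalization behind a chart like Figure 1 can be sketched as follows. This is a minimal, illustrative example only: the function name and the sample figures are hypothetical placeholders, not actual Google Trends data, and Google's exact scaling method is not published in this article.

```python
# Hypothetical sketch of baseline normalization for comparing interest trends.
# Each series is rescaled so the baseline topic's peak value becomes 100,
# letting other topics be read relative to that baseline.

def normalize_to_baseline(series, baseline):
    """Rescale every series so that max(baseline) maps to 100."""
    scale = 100.0 / max(baseline)
    return {name: [round(v * scale, 1) for v in values]
            for name, values in series.items()}

# Illustrative quarterly interest counts (arbitrary units, not real data).
raw = {
    "sustainability": [40, 55, 70, 80],    # baseline topic
    "cloud computing": [10, 30, 60, 120],  # steeper upward trend
}

normalized = normalize_to_baseline(raw, raw["sustainability"])
# Sustainability peaks at 100; cloud computing ends above it, showing
# its steeper relative climb.
```

Plotted over time, such normalized series make it easy to compare the slope of one topic's interest against another's, which is the comparison Figure 1 draws.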
According to Gartner[1], the top 10 strategic technologies for 2010 include:
Cloud Computing: Cloud computing is a style of computing that characterizes a model in which
providers deliver a variety of IT-enabled capabilities to consumers. We can exploit Cloud-based
services in a variety of ways to develop an application or a solution. Using Cloud resources
does not eliminate the costs of IT solutions, but does re-arrange some and reduce others. In
addition, enterprises consuming cloud services will increasingly act as cloud providers and
deliver application, information or business process services to customers and business
partners. Some have joked that Cloud computing is analogous to preferring to pay for the power we use, rather than buying a power plant! In addition, Gartner predicts that by 2012, 20 percent of businesses will own no IT assets.
Several interrelated trends are driving the movement toward decreased IT hardware assets,
such as virtualization, cloud-enabled services, and employees running personal desktops and
notebook systems on corporate networks. The need for computing hardware, either in a data
center or at the desktop, will not go away. However, if the ownership of hardware transitions to
third parties, there will be major shifts throughout the IT hardware industry. For example,
enterprise IT budgets either will shrink or be reallocated to more-strategic projects. Enterprise
IT staff will be either reduced or re-skilled to meet new requirements, and/or hardware
distribution will have to change radically to meet the requirements of the new IT hardware
sustainability model.
Advanced Analytics: Optimization and simulation use analytical tools and models to maximize
business process and decision effectiveness by examining alternative outcomes and scenarios
before, during, and after process implementation and execution. This can be viewed as a third
step in supporting operational business decisions. Fixed rules and prepared policies gave way
to more informed decisions powered by the right information delivered at the right time, whether
through customer relationship management (CRM), enterprise resource planning (ERP), or other
applications. The new step provides simulation, prediction, optimization, and other analytics, not
simply information, to empower even more decision flexibility at the time and place of every
business process action. It looks into the future, predicting what can or will happen.
Client Computing: Virtualization is bringing new ways of packaging client computing
applications and capabilities. As a result, the choice of a particular PC hardware platform, and
eventually the OS platform, becomes less critical. Enterprises should proactively build a five- to
eight-year strategic client computing roadmap that outlines an approach to device standards,
ownership and support, operating system and application selection, deployment and update,
and management and security plans to manage diversity.
IT for Green: IT can enable many green initiatives. The use of IT, particularly among the white-
collar staff, can greatly enhance an enterprise’s green credentials. Common green initiatives
include the use of e-documents, reducing travel via teleconferencing and remote worker support
and tele-working. IT can also provide the analytic tools that others in the enterprise may use to
reduce energy consumption in the transportation of goods or other carbon management
activities.
According to Gartner, by 2014, most IT business cases will include carbon remediation costs.
Today, server virtualization and desktop power management demonstrate substantial savings in
energy costs, and those savings can help justify projects. Including carbon costs in business
cases provides an additional measure of savings and prepares the organization for increased
scrutiny of its carbon impact.
Economic and political pressure to demonstrate responsibility for carbon dioxide emissions will
force more businesses to quantify carbon costs in business cases. Vendors will have to provide
carbon life cycle statistics for their products or face market share erosion. Incorporating carbon
costs in business cases will only slightly accelerate replacement cycles. A reasonable estimate
for the cost of carbon in typical IT operations is an incremental one or two percentage points of
overall cost. Therefore, carbon accounting will more likely shift market share than market size.
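As a back-of-the-envelope illustration of that one-to-two-point adjustment, a business case could fold the carbon charge into overall cost. The sketch below is hypothetical; the function name and the 1.5 percent default are illustrative, not from the text:

```python
def carbon_adjusted_cost(overall_cost, carbon_points=1.5):
    """Add an incremental carbon charge, expressed as percentage points
    of overall cost, per the one-to-two-point estimate above."""
    return overall_cost * (1 + carbon_points / 100.0)

# A $500,000 IT operation carrying a 2-point carbon charge:
adjusted = carbon_adjusted_cost(500_000, carbon_points=2)
```

Because the increment is only a point or two, it nudges vendor selection (market share) far more than it changes the size of the overall budget, which is the point made above.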
In 2012, 60 percent of a new PC's total lifetime greenhouse gas emissions will have occurred
before the user first turns the machine on. Progress toward reducing the power needed to build
a PC has been slow. Over the course of its entire lifetime, a typical PC consumes 10 times its
own weight in fossil fuels, and around 80 percent of a PC's total energy usage still happens
during production and transportation.
Greater awareness among buyers and buying influencers, greater pressure from eco-labels,
and increasing cost and social pressures have raised the IT industry’s awareness of the
problem of greenhouse gas emissions. Requests for proposal (RFPs) now frequently look
for both product and vendor environment-related criteria. Environmental awareness and
legislative requirements will increase recognition of production as well as usage-related carbon
dioxide emissions. Technology providers should expect to provide carbon dioxide emission data
to a growing number of customers.
Reshaping the Data Center: In the past, design principles for data centers were simple: Figure
out what you have, estimate growth for 15 to 20 years, then build to suit. Newly built data
centers often opened with huge areas of white floor space, fully powered and backed by an
uninterruptible power supply (UPS), water- and air-cooled, and mostly empty. However, costs are
actually lower if enterprises adopt a pod-based approach to data center construction and
expansion. If you expect to need 9,000 square feet during the life of a data center, then design
the site to support it, but only build what is needed for five to seven years. Cutting operating
expenses, a large portion of overall IT spending for most clients, frees up money to reallocate to
other projects or investments either in IT or in the business itself.
Social Computing: Workers do not want two distinct environments to support their work – one
for their own work products (whether personal or group) and another for accessing “external”
information. Enterprises must focus on the use of social software and social media in the enterprise,
and participation and integration with externally facing enterprise-sponsored and public
communities. Do not ignore the role of the social profile to bring communities together.
Security – Activity Monitoring: Traditionally, security has focused on putting up a perimeter
fence to keep others out, but it has evolved to monitoring activities and identifying patterns that
would have been missed previously. Information security professionals face the challenge of
detecting malicious activity in a constant stream of discrete events that are usually associated
with an authorized user and are generated from multiple network, system and application
sources. At the same time, security departments are facing increasing demands for ever-greater
log analysis and reporting to support audit requirements. A variety of complementary (and
sometimes overlapping) monitoring and analysis tools help enterprises better detect and
investigate suspicious activity – often with real-time alerting or transaction intervention. By
understanding the strengths and weaknesses of these tools, enterprises can better understand
how to use them to defend the enterprise and meet audit requirements.
Flash Memory: Flash memory is not new, but it is moving up to a new tier in the storage
echelon. Flash memory is a semiconductor memory device, familiar from its use in USB
memory sticks and digital camera cards. It is much faster than rotating disk, but considerably
more expensive, although the differential is shrinking. As prices decline, the technology will
enjoy more than a 100 percent compound annual growth rate during the next few years and
become strategic in many IT areas including consumer devices, entertainment equipment and
other embedded IT systems. In addition, it offers a new layer of the storage hierarchy in servers
and client computers that has key advantages including space, heat, performance and
ruggedness.
Virtualization for Availability: Virtualization has been on the list of top strategic technologies in
previous years. It is on the list this year because Gartner emphasizes new elements such as live
migration for availability that have longer-term implications. Live migration is the movement of a
running virtual machine (VM), while its operating system and other software continue to execute
as if they remained on the original physical server. This takes place by replicating the state of
physical memory between the source and destination VMs, then, at some instant in time, one
instruction finishes execution on the source machine and the next instruction begins on the
destination machine.
However, if replication of memory continues indefinitely while execution of instructions remains
on the source VM, then should the source VM fail, the next instruction simply takes place on
the destination machine. If the destination VM were to fail instead, a new destination is picked
and the indefinite replication restarted, making very high availability possible.
The key value proposition is to displace a variety of separate mechanisms with a single “dial”
that can be set to any level of availability from baseline to fault tolerance, all using a common
mechanism and permitting the settings to be changed rapidly as needed. We could dispense
with expensive high-reliability hardware, with fail-over cluster software and perhaps even fault-
tolerant hardware, but still meet availability needs. This is key to cutting costs, lowering
complexity, and increasing agility as needs shift.
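The replicate-then-cut-over mechanism behind that availability dial can be sketched with a toy model. This is illustrative Python, not any hypervisor's API; the class and method names are invented for the sketch:

```python
class ReplicatedVM:
    """Toy model of live-migration-based availability: every memory
    write on the source is replicated to a destination, so a failover
    loses no replicated state."""

    def __init__(self):
        self.source = {}   # memory of the running (source) VM
        self.dest = {}     # continuously replicated copy

    def write(self, addr, value):
        self.source[addr] = value   # instruction executes on the source...
        self.dest[addr] = value     # ...and its effect is replicated

    def fail_over(self):
        # Source fails: the destination resumes from the replicated state,
        # and a fresh destination is picked to continue replication.
        self.source = self.dest
        self.dest = dict(self.source)
        return self.source
```

Because the destination always holds the replicated state, the "dial" amounts to choosing how aggressively (and to how many destinations) replication runs.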
Mobile Applications: By year-end 2010, 1.2 billion people will carry handsets capable of rich,
mobile commerce providing an environment for the convergence of mobility and the web. There
are already many thousands of applications for platforms such as the Apple iPhone, in spite of
the limited market and need for unique coding. It may take a newer version designed to flexibly
operate on both full PC and miniature systems, but if the operating system interface and
processor architecture were identical, that enabling factor would create a huge turn upwards in
mobile application availability.
Sustainability Metrics and what does it mean to be Sustainable
What are these metrics and how does one wade through technology, environmental, business
and operational requirements looking for best practices in achieving IT Sustainability? All this
will be touched on as we go through the details.
First, what does “Sustainability” mean? Generally, some define it as:
“Meeting the needs of the present without compromising the ability of future generations to meet
their own needs.”2
Or
“Then I say the earth belongs to each generation during its course, fully and in its own right. The
second generation receives it clear of the debts and encumbrances, the third of the second, and
so on. For if the first could charge it with a debt, then the earth would belong to the dead and not
to the living generation. Then, no generation can contract debts greater than may be paid during
the course of its own existence.”3
This article will take an Information Technology (IT) centric approach to this idea of
sustainability. I believe that for the IT industry, including the decision makers, implementers,
vendors, and technologists in general, a definition of IT sustainability would be:
“A pro-active approach to ensure the long-term viability and integrity of the business by
optimizing IT resource needs, reducing environmental, energy and/or social impacts, and
managing resources while not compromising profitability to the business.”
One corollary to this definition: developing a sustainable IT model not only avoids compromising
profitability but, by conforming to best practices, can actually increase it.
2 Washington State Department of Ecology
3 Thomas Jefferson, September 6, 1789
The four pillars or focal points to achieve IT sustainability are “The Environment,” “Efficiency,”
“Effectiveness,” and “Business Practices” as shown in Figure 2 – Achieving IT Sustainability, on
page 19. Achieving sustainability requires a focused effort on many fronts. All of these pillars will
be discussed in great detail in the following sections.
The environment is not only an important social issue but also a responsibility for all of us to
manage. From the business perspective, we can address environmental aspects by working
with manufacturing and the supply chain, and by following environmental standards and
regulations. EMC's internal IT, facilities, and operations departments, for example, have made
great strides in this particular area. It is important not to minimize what each of us can do as an
individual, or what the industry can do as a whole.
Considering business practices and requirements is important as well; understanding each
business's operational, market, and growth models as they relate to sustainability is a given.
To be effective, it is important to have the appropriate tools and best practices for achieving
sustainable growth. Lastly, and arguably most importantly for the IT industry, efficiency is
paramount to what we can do and to how the industry as a whole can make a difference.
Figure 2 – Achieving IT Sustainability
The Challenges to achieve Sustainability
It is interesting to note that every IT process has some interaction with energy, given the fact
that all IT technologies are electrical or mechanical in nature. It is always difficult to determine
what metrics we should use to determine how to achieve sustainable growth. For example, one
aspect is the concept of energy efficiency as it relates to business value.
As shown in Figure 4 – Efficiency - Megawatts to Infowatts to Business Value Solutions on page
22, there are general inefficiencies in power delivery and IT technology today. The question is:
what is the efficiency of information management per watt?
The efficiency challenge is multi-dimensional. Following the energy flow from the power plant to
the application, you can trace the energy loss through the full power delivery path. At the
source, the power plant, upwards of 70% of the energy entering the plant is lost through
generation and power delivery to the data center. Given that most of the power consumed in
the United States comes from fossil fuels, as shown in Figure 3 - US Energy Flows (Quadrillion
BTUs), below, there is a major opportunity to reduce emissions by becoming more efficient.
Figure 3 - US Energy Flows (Quadrillion BTUs)[21]
Of the power entering the data center, 50% is lost; fans and power supply conversions add to
the loss. In terms of the data center facility, solutions such as “Fifth Light,” CHP (Combined
Heat and Power), flywheel, liquid cooling, and other technologies can make a difference.
Within the data center facility, given typical under-utilization of server, storage, and network
bandwidth, as well as the challenges of inefficient and zero-value applications, the Megawatt-to-
Infowatt efficiency can be less than 1%. For example, of every 100 watts of power generated,
only 0.3 watts may actually perform useful work. I believe the IT industry can do better.
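That 0.3-watts-per-100 figure follows from multiplying the per-stage efficiencies along the delivery path. A quick sketch, where the stage values are the rough figures from the text rather than measured data:

```python
def chain_efficiency(stage_efficiencies):
    """Overall Megawatt-to-Infowatt efficiency is the product of the
    efficiency of each stage in the power delivery path."""
    overall = 1.0
    for eff in stage_efficiencies:
        overall *= eff
    return overall

# ~30% survives generation and delivery, ~50% survives data center
# power and cooling overhead, and only ~2% survives IT
# under-utilization and zero-value applications:
watts_out = 100 * chain_efficiency([0.30, 0.50, 0.02])  # ~0.3 W per 100 W
```

The multiplicative structure is the point: improving any single stage helps, but the overall number stays tiny until every stage improves.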
The good news is that there are solutions. However, there is no silver bullet; no single
technology, business model, or process will by itself achieve the efficiency and sustainability
goal.
Figure 4 shows some examples of what can be done at all stages in the Megawatt-to-Infowatt
efficiency cycle. These range from virtualization, consolidation, and network and data
optimization to other environmental solutions, many of which are point technologies. However,
with these point technologies rolled out in tight orchestration, I am sure IT stakeholders can and
will make a difference.
Virtualization, for example, cannot by itself relieve the burden of rising site infrastructure
expenses, and it can be argued that this technology alone cannot achieve sustainability (see
Equation 4 – IT Long Term Sustainability Goal on page 28). Virtualizing four or ten servers onto
a single physical host will indeed cut power consumption and free up data center capacity. For
data centers nearing their limits, virtualization can play a key role in delaying the point at which
an expansion or new facility must be built, but it is not the total solution.
Figure 4 – Efficiency - Megawatts to Infowatts to Business Value Solutions4
The problem is that the cost of electricity and site infrastructure TCO (Total Cost of Ownership)
is greatly outpacing the cost of the server itself. This is true regardless of whether a single
box is running one application or is virtualized to handle multiple tasks. When electricity and
infrastructure costs greatly exceed server cost, any IT deployment decision based on server
cost alone will result in a wildly inaccurate perception of the true total cost. Even when
virtualization frees up wasted site capacity for additional servers without spending new money
on site infrastructure, the opportunity cost (i.e. ensuring that scarce resources are used
efficiently) of deploying that capacity is the same. Data center managers can find themselves
building expensive new capacity sooner than they need to[7].
4 EDS
Furthermore, virtualization is a one-time benefit. After consolidating servers so that they are all
running at full capacity, and planning future deployments so that newly purchased servers will
also be fully utilized, data center operators are still faced with the reality that each year’s
generation of servers will most likely draw more power than the previous hardware release.
After virtualization has taken some of the slack out of underutilized IT hardware, the trend in
power growth will resume.
Conversely, it is also possible that virtualization may allow each new server to be so productive
that it’s worthwhile to divert a greater fraction of the IT budget to pay the increased site
infrastructure and electricity cost, but a business can’t make that decision without considering
the true total cost.
The concept of Green
The concept of “Green” also comes to mind. Sustainability and being green go hand in hand.
Going green means change, but not all green solutions are efficient or sustainable. For
example, one might say, plant trees to become green. Well, it would take 6.6 billion trees to
offset the CO2 generated by all of the data centers in the world.5 Planting trees is green, but not
very efficient.
Green business models must reduce carbon dioxide. Developing them begins by determining
how a company's products, services, or solutions can be produced in ways that reduce carbon
dioxide (CO2) emissions. Market standards measure reductions in units of 1 million metric tons.
The Greenhouse Gas Equivalencies Calculator, discussed in the “Standards and Regulations”
section starting on page 32, translates difficult-to-grasp figures into more commonplace terms,
such as "equivalent to avoiding the carbon dioxide emissions of X cars annually." This
calculator also offers an excellent example of the analytics, metrics, and intelligence measures
that IT, SaaS, and Cloud Computing solutions must deliver across the input and output chains
in business models.
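A minimal sketch of such an equivalency conversion follows. The per-vehicle factor is an assumed, EPA-style figure for illustration only and should be checked against the current calculator:

```python
# Assumed annual emissions of a typical passenger vehicle, in metric
# tons of CO2; the EPA calculator publishes the authoritative value.
TONS_CO2_PER_CAR_PER_YEAR = 4.6

def cars_equivalent(tons_co2_avoided):
    """Translate avoided emissions (metric tons CO2) into the
    'cars taken off the road for a year' phrasing."""
    return tons_co2_avoided / TONS_CO2_PER_CAR_PER_YEAR

# The 1-million-metric-ton market standard mentioned above:
cars = cars_equivalent(1_000_000)   # on the order of 200,000 cars per year
```

The value of the conversion is rhetorical as much as analytical: it turns an abstract tonnage into a figure an executive or regulator can picture.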
5 Robert McFarlane, Principal Data Center and Financial Trading Floor Consultant
IT Sustainability and how to measure it
Even though the relatively newly named paradigm, “The Cloud,” has a great deal of potential to
contribute to a sustainable IT structure, it is just one part of the whole sustainability picture. In
the following sections, Cloud computing and its variants will be discussed in detail.
As mentioned, emerging green IT, SaaS, and Cloud Computing solutions offer great potential to
extend, leverage, and strengthen a company's business model by applying measures that can
be reported and managed. Innovative IT solutions are emerging to address sustainability
analytics and carbon reduction metrics. Third parties that work with companies as they develop
green goods and services business models are well positioned to guide IT, SaaS, and Cloud
Computing companies toward developing solutions that produce green or eco-centric business
metrics.
A list of green IT, SaaS, and Cloud Computing solutions, shown in Appendix A – Green IT,
SaaS, Cloud Computing Solutions starting on page 165, is being developed as solution
providers report the capability of their IT solutions to produce CO2 and sustainability measures,
business analytics, market intelligence, metrics, and key performance indicators that companies
can apply to develop green goods and services.
These metrics will become ever more important because executing profitable green business
plans will depend upon generating more, and deeper, levels of complex greenhouse gas and
sustainability measurements and metrics.
How does one measure computing efficiency? After all, if you cannot measure it, you cannot
improve it.6
For a server, efficiency at its most basic is shown in Equation 1 – Computing Energy Efficiency,
below. Efficiency is the useful work done per unit of energy used; equivalently, it is the rate at
which the work is done (computing speed) divided by the power used.
6 Lord Kelvin
Equation 1 – Computing Energy Efficiency
$$\text{Efficiency} = \frac{\text{Work Done}}{\text{Energy Used}} = \frac{\text{Computing Speed}}{\text{Power}}$$
Breaking it down further, as shown in Equation 2 – Computing Energy Efficiency-Detailed,
below, the efficiency is also a function of the underlying hardware, its properties and the Data
Center as a whole.
Equation 2 – Computing Energy Efficiency-Detailed
$$\text{Efficiency} = \frac{\text{Work Done}}{\text{Energy Used In Chips}} \times \frac{\text{Energy Used In Chips}}{\text{Energy Provided To Computers}} \times \frac{\text{Energy Provided To Computers}}{\text{Energy Entering The Building}}$$
This equation shows the dependency of all of the underlying hardware at all levels of the data
center stack, from the Server, Network, and all parts of the infrastructure.
The efficiency can also be expressed using the commonly cited Power Usage Effectiveness
(PUE) metric, as shown in Equation 3 – Computing Energy Efficiency-Detailed as a function of
PUE, below. The equation shows the dependencies among the business, the individual
components, and the data center as a whole in establishing what efficiency means. For a more
detailed discussion of PUE, please refer to the EMC Proven Professional Knowledge Sharing
article titled “Crossing the Great Divide in Going Green: Challenges and Best Practices in Next
Generation IT Equipment, EMC Knowledge Sharing, 2008”.
Equation 3 – Computing Energy Efficiency-Detailed as a function of PUE

$$\text{Efficiency} = \text{Computing Efficiency} \times \text{Computer Efficiency} \times \text{Data Center Efficiency}\ (= 1/\text{PUE})$$
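Equation 3 can be read directly as a product of factors. A small sketch, with input values made up purely for illustration:

```python
def overall_efficiency(computing_eff, computer_eff, pue):
    """Equation 3: overall efficiency = computing efficiency x computer
    efficiency x data center efficiency, where the data center term
    is 1 / PUE."""
    return computing_eff * computer_eff * (1.0 / pue)

# Example: 50% computing efficiency, 80% hardware efficiency, PUE of 2.0
# (i.e. half the facility power reaches the IT equipment):
eff = overall_efficiency(0.5, 0.8, 2.0)   # 20% overall
```

Note how a facility with an excellent PUE can still have poor overall efficiency if the computing and hardware terms are weak, which is why PUE alone is an incomplete measure.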
When you think about a data center, what do you picture? Almost any aspect could be
imagined: mechanical & electrical systems, network infrastructure, storage, compute
environments, virtualization, applications, security, cloud, grid, fabric, unified computing, open
source, etc. Then consider how these items incorporate into areas of efficiency, sustainability, or
even a total carbon footprint [11].
The Carbon Footprint

The view of a data center quickly becomes significantly more complex, leading to challenges
like answering the question of how efficient a data center is to company executives. Where does
someone start to measure for these types of complexities? Are the right technologies in place to
do so? Which metrics should you use for a particular industry and data center design? Data
Center professionals all over the world are asking the same questions and feeling the same
pressures [13].
Data centers are changing more rapidly than at any other point in history. Yet with all this
change, data center facilities and IT professionals face numerous challenges in unifying their
peers to solve problems for their companies. Has virtualization changed your network
architecture? What about your security protocols? What exactly does Cloud computing mean to
my data center? Is cloud computing being performed in your data center already? More
importantly, how do I align the different data center disciplines to understand how new
technologies work together to solve data center problems?
With ever increasing densities, sleep deprived data center IT professionals still have to keep the
data center operating, while facing additional challenges relating to power efficiencies and
interdepartmental communication.
To compound the problem, ‘Green’ has become the new buzzword in almost every facet of our
lives. Data centers are no exception to green marketing and are sometimes considered easy
targets due to large, concentrated power and water consumption. New green solutions
sometimes are not so green due to limited understanding of data center complexities. They may
disrupt cost saving and efficient technologies already in use.
Corporations are trying to calculate their carbon footprint, put goals in place to reduce it, and
may face pressure to apply a new solution without understanding the entire data center picture
and what options are available. Various government bodies around the world have seen the
increase in data center power consumption and realize it is only trending up. It is only a matter
of time before regulations are put into place that will cause data center operators to comply with
new rules, possibly beyond what a data center was originally designed for. Nevertheless, we all
know that the most visible pressure is that costs are rising, potentially reducing profitability.
Figure 5 – IT Data Center Sustainability Taxonomy
The recent economic uncertainty has everyone looking for ways to cut and optimize data
centers even further. Data centers have reached the CFO's radar and are under never-ending
scrutiny to cut capital investments and operating expenses. So what are data center owners and
operators supposed to do? Invent their own standards? Metrics? Framework? Which industry
standards and metrics apply to your data center and will they help you show results to your
CFO? There has to be a better way.
With the advent of ‘Cloud computing’ and its multi-faceted variants, understanding the data
center interdependencies from top to bottom is a new priority. With that understanding, users
can analyze, for example, potential outsourcing to a cloud technology solution. Figure 5 – IT
Data Center Sustainability Taxonomy, shown above, outlines one approach to defining the
metrics and moving parts of a framework for understanding the challenges and methodologies
required to achieve an efficient approach to IT data center architectures.
At the bottom of the stack are the sustainability metrics. It is imperative to understand all the
metrics that can be used, from useful work (as outlined in Figure 4) and lifecycle management
with a “design for run” mindset, to more eco-centric metrics such as fuel and site selection.
“Design for run” is the concept that you must consider the full life cycle of IT technology: the
energy used to create the product, the power consumed during the operational period, and the
eventual disposition of the IT asset.
A carbon score results from these metrics. The score can be localized to more specific data
center elements in the stack, such as the network, servers, and storage. As a reference, we can
also map the data center into a cloud mapping for a more variable metric score.
The output of the data center stack could be a consistent design and terminology definition,
allowing you to map into a more eco-centric approach that defines a targeted focal point for
optimization, such as the environment, real estate, or the physical data center, as outlined.
From that could come a score and certification that businesses and governments can use to
track and measure sustainability levels.
Therefore, what is the sustainability bottom line? What is the long-term metric for significantly
improving sustainability by restoring the economic productivity of IT?
Equation 4 – IT Long Term Sustainability Goal
$$\Delta\,\text{Efficiency Increase} \;\geq\; \Delta\,\text{Computational Performance Increase}$$
The long-term solution is defined in Equation 4 – IT Long Term Sustainability Goal, shown
above. The goal is to have the rate of energy efficiency increase equal, or exceed, the rate of
computational performance increase.
According to the Uptime Institute, which tracks the efficiency of IT equipment relative to
computational performance, server compute performance has been increasing by a factor of
three every two years, a total factor of 27 over six years (3 x 3 x 3 = 27). However, energy
efficiency is only doubling in the
same period (2 x 2 x 2 = 8).7 This means computational performance increased by a factor of
27 between 2000 and 2006, while energy efficiency went up by only a factor of eight during the
same period.
This means that while power consumption per computational unit dropped dramatically over the
six-year period (by roughly 88 percent), total power consumption still rose by a factor of about
3.4.
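The arithmetic behind those figures is worth making explicit; a simple check of the Uptime Institute numbers quoted above:

```python
# Performance triples every two years over 2000-2006; efficiency only doubles.
perf_gain = 3 ** 3                             # 27x computational performance
eff_gain = 2 ** 3                              # 8x energy efficiency

# Energy per computational unit falls to 1/8th: about an 88 percent drop.
per_unit_drop_pct = (1 - 1 / eff_gain) * 100   # 87.5, quoted above as ~88%

# Total power still rises, because performance outgrows efficiency.
power_rise = perf_gain / eff_gain              # 3.375, quoted above as ~3.4x
```

This is the quantitative content of Equation 4: until the efficiency growth rate matches the performance growth rate, total power consumption keeps climbing.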
Moore’s Law is a major contributor. Moore’s Law describes the doubling of the number of
transistors on a piece of silicon every 18 months,8 which results in a power density increase
within chips that causes temperatures inside and around those chips to rise dramatically.
Virtually everyone involved in large-scale IT computing is now aware of the resulting
temperature and cooling problems data centers are experiencing, but may not fully understand
the risks as they relate to sustainability.
In addition to a common framework as outlined in Figure 5 on page 27, an ontology (a
description of what the interdependencies are) for a sustainable IT framework, consistent with
what defines achieving sustainability, is useful. It is outlined in Figure 6 – Top Level
Sustainability Ontology, on page 30, below.
As the figure shows, the four aspects or pillars of achieving sustainability are “Business
Practices,” “Environment,” “Effectiveness,” and “Efficiency”. These four pillars will be the
common theme of this article, and we will cover each in detail.
7 Uptime Institute - The Invisible Crisis in the Data Center: The Economic Meltdown of Moore’s
Law
8 Gordon Moore, the Intel co-founder, originally predicted in 1965 a doubling every 24 months;
the real-world pace was faster
Figure 6 – Top Level Sustainability Ontology (see notes below)
Notes on the figure above:
• A right-pointing arrow above indicates an ontology sub-diagram, drilling down on that topic
• The sub diagram “Figure 19 – Sustainability Ontology – Infrastructure Architectures” can
be found on page 84
• The sub diagram “Figure 13 – Sustainability Ontology – Self Organizing Systems”, can
be found on page 52
• The sub diagram “Figure 25 – Sustainability Ontology – Business Practices”, can be
found on page 128
Environment Pillar - Green Computing, Growing Sustainability

Green Computing is the efficient use of computing resources; the primary objective is to account
for the triple bottom line (People, Planet, and Profit), an expanded range of values and criteria
for measuring organizational and societal success. Given that computing systems existed
before concern over their environmental impact arose, green computing has generally been
applied retroactively, though some now consider it in the development phase. It is universal in
nature, because increasingly sophisticated modern computer systems rely upon people,
networks, and hardware. Therefore, the elements of a green solution may comprise items such
as end-user satisfaction, management restructuring, regulatory compliance, disposal of
electronic waste, telecommuting, virtualization of server resources, energy use, thin client
solutions, and return on investment.
Data centers are one of the greatest environmental concerns of the IT industry. They have
increased in number over time as business demands have increased, with facilities housing an
increasing amount of ever more powerful equipment. As data centers run into limits related to
power, cooling and space, their ever-increasing operation has created a noticeable impact on
power grids. Data center efficiency has thus become an important global issue, leading to the
creation of the Green Grid9, an international non-profit organization dedicated to increasing the
energy efficiency of data centers. Its approach, virtualization, has improved efficiency, but it
optimizes a flawed model that does not consider the whole system, one in which resource
provision is disconnected from resource consumption [4].
For example, competing vendors must host significant redundancy in their data centers to
manage usage spikes and maintain the illusion of infinite resources. So, one would argue that
as an alternative, a more systemic approach is required, where resource consumption and
provision are connected, to minimize the environmental impact and allow sustainable growth.
9 Crossing the Great Divide in Going Green:
Challenges and Best Practices in Next Generation IT Equipment, EMC Knowledge Sharing,
2008
Standards and Regulations
Information technology has enabled significant improvements in the standards of living of much
of the developed world, and through its contributions to greater transport and energy efficiency,
improved design, reduced materials consumption and other shifts in current practices, may offer
a key to long-term sustainability.
However, the production, purchase, use and disposal of electronic products have also had a
significantly negative environmental impact. As with all products, these impacts occur at multiple
stages of a product’s life: extraction and refining of raw materials, manufacturing to turn raw
materials into finished product, product use, including energy consumption and emissions, and
end-of-life collection, transportation, and recycling/disposal. Since computers and other
electronic products have supply chains and customer bases that span the globe, these
environmental impacts are widely distributed across time and distance.
Best Practice – In the US, Consider Executive Order 13423 and energy-efficiency legislation regulations
Executive Order 13423 (E.O.), "Strengthening Federal Environmental, Energy, and
Transportation Management," signed in January 2007 by then-President Bush, requires that all
federal agencies set an example in energy efficiency and environmental performance by
achieving a number of sustainability goals with target deadlines. To comply with this E.O., IT
solution providers will have to change their current products and offerings, or create new ones,
if they intend to supply government agencies. The suppliers' commercial customers will benefit
as well, fulfilling the E.O.’s ultimate goal.
While the E.O. does not establish non-compliance penalties, other agencies, a number of states,
and at least one city -- New York City -- have enacted legislation to fine companies that violate
these new laws. The impact on IT and data centers is that, under the Clean Air Act, any source
emitting more than 250 tons of a pollutant would be forced to follow certain regulations and
could be exposed to significant financial penalties.
The E.O. is not clear about who would be responsible for carbon dioxide generation, the
corporate power consumer or the power generation facility. One thing is certain, though: the
costs will be passed on to the business. If electric utilities are charged for carbon dioxide
production, they will either pass those charges on to their customers or increase overall electric
utility rates.
Best Practice – Use tools and resources to understand environmental impacts
The EPEAT (Electronic Product Environmental Assessment Tool) program, a tool developed
with the support of the EPA (U.S. Environmental Protection Agency), was launched in 2006 to
help purchasers identify environmentally preferable electronic products. EPEAT developed its
environmental performance criteria through an open, consensus-based, multi-stakeholder
process, supported by U.S. EPA that included participants from the public and private
purchasing sectors, manufacturers, environmental advocates, recyclers, technology researchers
and other interested parties. Bringing these varied constituencies’ needs and perspectives to
bear on standard development enabled the resulting system not only to address significant
environmental issues, but also to fit within the existing structures and practices of the
marketplace, making it easy to use and therefore widely adopted.
To summarize EPEAT’s goals:
• Provide a credible assessment of electronic products based on agreed-upon criteria
• Evaluate products based on environmental performance throughout the life cycle
• Maintain a robust verification system to maintain the credibility of product declarations
• Help to harmonize numerous international environmental requirements
• Promote continuous improvement in the design of electronic products
• Lead to reduced impact on human and environmental health
For example, EPEAT Cumulative Benefits in the United States reflect that 101 million EPEAT
registered products have been sold in the US since the system’s debut in July 2006, and the
benefits of US EPEAT purchasing have increased over time and will continue to be realized
throughout the life of the products. The data in Table 1 - 2006 to 2008 EPEAT US Sales
Environmental Benefits, on page 34, below, shows the benefits of these sales, year to year and
cumulatively.
It is important to understand the standards and regulations, so that from the business as well as
from the purchasing perspective, we can make the appropriate decisions and continue on the
path to a sustainable future.
Table 1 - 2006 to 2008 EPEAT US Sales Environmental Benefits
IT facilities and Operations
A company needs to determine the most effective method of selecting a data center site to
attain sustainability. There are a number of key factors to consider. The site
selection process is key for most companies, not only because the selected site/provider will be
hosting mission-critical business services, but also because the chosen site will likely house
those critical systems and platforms for the foreseeable future. Since you only perform site
selection activities once or twice, it is important that all relevant factors be evaluated.
Geographical factors are often overlooked in site selection activities, or at best incompletely
examined. Many data centers produce information about hardware reliability or facility security,
but geography, as a measure of a facility’s ability to serve and sustain its clients’ needs, is
often neglected [8].
Best Practice – Place Data Centers in locations of lower risk of natural disasters
The prevalence of natural disasters in U.S. regions is another factor by which companies can
measure data center operations as shown in Figure 7 - U.S. Federal Emergency Management
Agency – Disaster MAP, below. Enterprises that outsource or move data center operations to
other potential sites or locations can mitigate certain risks by choosing locations in areas
deemed low risk by historical and analytical data.
Figure 7 - U.S. Federal Emergency Management Agency – Disaster MAP[22]
We can predict where earthquakes may occur by using seismic zone data and fault line
analysis. Seismic zones are determined by compiling statistics about past earthquakes,
specifically magnitude and frequency. The map below titled Figure 8 - U.S. Geological Survey
Seismological Zones (0=lowest, 4=Highest), on page 36, illustrates U.S. seismic zones as
defined by the United States Geological Survey (USGS). For the purposes of illustrating seismic
activity in the United States, the USGS divides the country into zones, numbered from 0 to 4,
indicating occurrences of observed seismic activity and assumed probabilities for future activity.
Figure 8 - U.S. Geological Survey Seismological Zones (0=lowest, 4=Highest)
According to FEMA, flooding is a common event that can occur virtually anywhere in the United
States, including arid and semi-arid regions. The agency has defined flood zones according to
varying levels of risk. Based on current data, Texas maintains the distinction as the country’s
highest-risk flood zone. Note that FEMA is still collecting information and has not yet released
statistical data regarding Hurricane Katrina’s flood impact. Despite recent events, consider the
following flood facts from FEMA collected between 1960 and 1995:
• Texas had the most flood-related deaths during the past 36 years
• Total flood-related deaths in Texas were double that of California
• California ranked second to Texas in number of flood-related deaths
• Texas had more flood-related deaths than any other state 21 out of 36 years
Like flooding, tornadoes can occur in every US state. However, some areas are more prone to
tornadoes than others; most United States tornadoes develop over the country’s vast eastern
plains.
In terms of hurricane activity, most occurrences affect coastal states, particularly those along
the eastern and Gulf coasts of the United States. Based on weather
patterns and historical data, according to Figure 9 – U.S. NOAA Hurricane, below, much of the
eastern United States, especially the Southeast and across the Gulf Coast, are significantly
susceptible to yearly hurricane activity.
Figure 9 - U.S. NOAA Hurricane Activity in the United States
Best Practice – Evaluate Power GRID and Network sustainability for IT Data Centers
Other factors should be considered when performing site-selection activities. Not only should
you consider the presence of various geographical factors, but also the availability of key
resources, such as power and network, to name just a few.
Power availability should be a major factor in any site-selection process. Recent headlines (New
Orleans) point to the potentially disastrous effects of deficient power infrastructures. When
evaluating the power infrastructure in a given area, it is important to ascertain several key
factors, including:
• Access to more than one grid – Is the provider in question connected to more than one
feed from the energy company in question?
• Power grid maturity – Does the grid(s) in question also feed a large number of residential
developments? Is there major construction occurring within the area served by that grid?
• On-site power infrastructure – Is the data center equipped to support major power
requirements, and sustain itself should the main supply of power fail?
As with power, we must also consider the availability and quality of network and carrier
backbones. Key factors include:
• Fiber backbone routes and their proximity to the datacenter – Are major carrier routes
proximate to the datacenter?
• Type of fiber in proximity – For that fiber that is proximate to the data center, is it a major
fiber route, or a smaller spur off the main backbone? How much fiber is already in place,
and how much of it is ‘lit’, or ready for service? How much of it is ‘dark’?
• Carrier presence – While the presence of a fiber backbone is important, it is also
important to understand the presence of the carrier/ telecommunication provider(s) in the
area, from a business and support perspective. A carrier may have fiber in the area, but
if they have little or no presence themselves, or rely on third parties for maintenance,
service to the data center may suffer accordingly.
• Carrier type – Simply having a carrier’s backbone nearby does not necessarily indicate
that the carrier itself is a Tier 1 provider. This is of particular interest when
internet access to/from the data center is being supplied via those carriers. In any event,
it is usually important to understand the internet carriers currently providing service into
the data center, and whether or not they are the telecommunications/fiber provider(s). It
is not enough that one or more carriers are present in a data center. At least one of the
carriers (optimally, more than one) should be a Tier 1 provider, meaning that they peer
directly with other major backbones at private and public peering exchanges. Only a
relatively small number of carriers can claim this status, and all smaller carriers must
purchase access from the major carriers, which often introduces some level
of latency.
Effectiveness Pillar
To sustain an efficient business model, we must have an effective set of tools, best practices,
partnerships, and a service-level component. The following sections discuss these in detail.
Services and Partnerships
In order to achieve sustainability, we must have actionable plans, and we must understand
what resources can be brought to bear to implement new policies and procedures. As shown in
Equation 5 – What is Efficient IT, below, achieving an efficient IT environment means
leveraging external or internal resources to create an assessment strategy, backed by a
technical knowledge base of experience and best practices, some of which are outlined in this
paper.
Equation 5 – What is Efficient IT
As shown in Figure 10 - Opportunities for efficiency improvements, below, applying the
efficiency model of Equation 5 allows one to achieve a sustainable growth path for a company’s
IT infrastructure. This can be done through consolidation and virtualization, implementing a
tiered server and storage infrastructure model, and utilizing common tools, key performance
indicators, resource management best practices, and process automation.
Figure 10 - Opportunities for efficiency improvements
Tools and Best Practices
Executive management, CIOs, and IT personnel should review their IT management tool sets
to establish whether investing in automation and better processes can reduce the percentage of
IT staff time spent keeping the lights on. Such investments should also increase the utilization
of IT assets.
Dealing with server sprawl, improving network utilization, and controlling the growth of
enterprise storage will all help businesses extend their IT budgets for hardware, maintenance,
licensing, software, and staff.
With the latest generation of IT management tools and current best practices, both of which are
covered in the following sections, this effort should be entirely compatible with preserving and
improving business agility and a sustainable growth path [3].
Efficiency Pillar
According to the EPA (U.S. Environmental Protection Agency), the energy used by the nation’s
servers and data centers was about 61 billion kilowatt-hours (kWh) in 2006 (1.5 percent of total
U.S. electricity consumption), for a total electricity cost of about $4.5 billion. This estimated level
of electricity consumption is more than the electricity consumed by the nation’s color televisions
and similar to the amount of electricity consumed by approximately 5.8 million average U.S.
households. Federal servers and data centers alone account for approximately 6 billion kWh (10
percent) of this electricity use, for a total electricity cost of about $450 million annually.
The energy use of the nation’s servers and data centers in 2006 is estimated to have doubled
since 2000. The power and cooling infrastructure that supports IT equipment in data centers
also uses significant energy, accounting for 50 percent of the total consumption of data centers
[5]. Among the different types of data centers, more than one-third (38 percent) of electricity use
is attributable to the nation’s largest (i.e., enterprise-class) and most rapidly growing data
centers.
Under current efficiency trends, national energy consumption by servers and data centers could
nearly double again in another five years (i.e., by 2011) to more than 100 billion kWh,
representing a $7.4 billion annual electricity cost. The peak load on the power grid from these
servers and data centers is currently estimated to be approximately 7 gigawatts (GW),
equivalent to the output of about 15 base load power plants. If current trends continue, this
demand would rise to 12 GW by 2011, which would require an additional 10 power plants.
These forecasts indicate that unless we improve energy efficiency beyond current trends, the
federal government’s electricity cost for servers and data centers could be nearly $740 million
annually by 2011, with a peak load of approximately 1.2 GW. As shown in Figure 11 - EPA
Future Energy Use Projections, shown below, according to the EPA, given historical projections,
annual energy will double every four years.
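These projections follow from simple exponential growth. A minimal sketch of the calculation, assuming a doubling period of about five years (the function name and parameterization are illustrative, not from the EPA report):

```python
def project_energy(base_kwh_billion: float, base_year: int, target_year: int,
                   doubling_period_years: float) -> float:
    """Project annual energy use assuming exponential growth that
    doubles every `doubling_period_years` years."""
    elapsed = target_year - base_year
    return base_kwh_billion * 2 ** (elapsed / doubling_period_years)

# EPA baseline: about 61 billion kWh in 2006; the report projects a
# near-doubling by 2011, i.e. roughly a five-year doubling period.
projection_2011 = project_energy(61, 2006, 2011, 5)
print(f"Projected 2011 consumption: {projection_2011:.0f} billion kWh")  # 122
```

With a five-year doubling period the sketch reproduces the report's "more than 100 billion kWh by 2011" figure.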
Figure 11 - EPA Future Energy Use Projections10
However, there is good news. By implementing the “State of the Art” or “Best Practices”
scenario models outlined in the following sections, it is possible, as shown in the diagram, to
achieve energy consumption sustainability by reversing the curve. Best practices such as
server and storage consolidation, power management, virtualization, and data center facility
infrastructure enhancements, including improved transformers, UPSs, chillers, fans, pumps,
and free-air and liquid cooling, all outlined in Figure 4 – Efficiency - Megawatts to Infowatts to
Business Value Solutions, on page 22, will get us to where we want to go.
Being efficient in IT and achieving sustainability can be summarized in three aspects: we
achieve efficiency by consolidating, optimizing, and automating. In the following sections, we
will discuss best practices that allow businesses to achieve a continued profitability model while
achieving sustainable growth.
Information Management
Best Practice – Implement integrated virtualized management into the environment
In order to manage an efficient and sustainable infrastructure utilizing the most recent
architectural changes happening in the industry today, outlined in the section titled
“Infrastructure Architectures” starting on page 80, some form of management system needs to
be in place. As discussed in the “Infrastructure Architectures” section, “Private” and “Hybrid”
cloud architectures, for example, allow applications, servers, networks, and storage to be
dynamically modified, making it more of a challenge to understand what is where.
10 Report to Congress on Server and Data Center Energy Efficiency, Public Law 109-431, August 2007
The Best Practice is to implement a management solution that allows a business to:
• Monitor virtualized, high-availability clustered and load-balanced configurations and
isolate problems
• View and monitor the critical application processes running on your physical or virtual
machines
• Identify when your critical hardware (server, PC, ESX, Hyper-V, etc.) is operating in a
degraded state so you can proactively use VMotion (VMware’s live migration solution) to
move your critical apps and avoid service disruption
• Monitor the status of virtual machines and track their movement (VMotion or Quick
Migrate) in real time
• Isolate problems when for example, using Microsoft Cluster Services and Symantec
VERITAS Clustering
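The placement-tracking part of such a solution can be sketched as follows. This is a hypothetical illustration, not an actual Ionix or vendor API; all class, field, and method names are invented:

```python
from dataclasses import dataclass, field

@dataclass
class Host:
    """A physical or hypervisor host (e.g. an ESX server)."""
    name: str
    degraded: bool = False  # set when a hardware alert is received

@dataclass
class Inventory:
    """Tracks which VM currently runs on which host."""
    placement: dict = field(default_factory=dict)  # VM name -> Host

    def record_move(self, vm: str, host: Host) -> None:
        # Called whenever a VM is discovered or a migration event is seen
        self.placement[vm] = host

    def vms_at_risk(self) -> list:
        """VMs whose underlying host reports a degraded state --
        candidates for proactive migration."""
        return sorted(vm for vm, h in self.placement.items() if h.degraded)

inv = Inventory()
esx1, esx2 = Host("esx1"), Host("esx2")
inv.record_move("crm-app", esx1)
inv.record_move("mail", esx2)
esx1.degraded = True                # hardware alert on esx1
print(inv.vms_at_risk())            # ['crm-app']
```

The key design point is the continuously updated VM-to-host mapping: without it, a hardware alert on a host cannot be translated into the list of applications that are actually at risk.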
The management system needs to understand, end to end, all of the levels of virtualization and
integrate common information models to suitably scale. EMC’s IONIX family of management
software implements this type of functionality.
EMC’s Ionix Server Manager (EISM) software understands the virtualization abstraction stack.
EISM implements detailed discovery of ESX servers, virtual machines, physical hosts with VMs,
and VirtualCenter instances, and supports dynamic, ongoing discovery of added, deleted, and
moved VMs. EISM also understands dependencies and relationships, such as dynamic
(real-time) discovery of VM topology and the associations of VMs with ESX servers and
physical hosts.
It is also a best practice for the management application software to support more than one
virtualization platform. EMC’s EISM supports both VMware’s (ESX) and Microsoft’s (Hyper-V)
virtualization platforms, as well as numerous clustering and load balancing solutions.
Best Practice – Have a robust Information Model
A Best Practice in the management of Data Center Networks is a methodology that allows a
common management platform to support a common information model that provides key
knowledge for automating management applications.
The ICIM Common Information Model is a potential solution. Its use in EMC’s Smarts product
suite of management applications illustrates how ICIM can be applied. We use this model to
represent a networked infrastructure supporting complex business networks.
An information model, underlying a management platform, provides knowledge about managed
entities that is important to management applications, such as fault, performance, configuration,
security, and accounting. This information must be shared among applications for an integrated
OSS solution.
As Best Practice, an information model must maintain detailed data about the managed system
at multiple layers, spanning infrastructure, applications, and the business services typical in a
Data Center network. A robust information model enables solutions at every level that includes
element management, network management, service management, and business management.
Having a common information model has many benefits, including faster application
development and information maintained in one place, providing a single coherent view of the
managed system. Each application can access the parts of the model pertinent to its operation,
with consistent views across applications.
In a Data Center management system, agents collect operational data on managed elements
(network, systems, applications, etc.) and provide this data to the management system.
Another Best Practice is that an information model should represent the whole range of
managed logical and physical entities: network elements at any layer, attached servers and
desktops, the applications that run on them, the middleware for application interaction, the
services the applications implement, the business processes the applications support, and the
end users and customers of those business processes.
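A minimal sketch of such a layered entity-and-relationship model follows. The class and relationship names here are hypothetical illustrations, not drawn from CIM or ICIM:

```python
class ManagedEntity:
    """Base class: any logical or physical entity in the model."""
    def __init__(self, name: str):
        self.name = name
        self.relations = {}  # relationship name -> list of related entities

    def relate(self, rel: str, other: "ManagedEntity") -> None:
        self.relations.setdefault(rel, []).append(other)

    def related(self, rel: str) -> list:
        return self.relations.get(rel, [])

# Entity classes spanning the layers named in the text
class Switch(ManagedEntity): pass
class Server(ManagedEntity): pass
class Application(ManagedEntity): pass
class BusinessService(ManagedEntity): pass

# Build a slice of the model: infrastructure -> application -> business service
sw  = Switch("core-sw-1")
srv = Server("db-host-3")
app = Application("order-db")
svc = BusinessService("online-orders")
srv.relate("ConnectedVia", sw)
app.relate("RunsOn", srv)
svc.relate("DependsOn", app)

# Walk across layers: which server underpins the business service?
hosting = svc.related("DependsOn")[0].related("RunsOn")[0]
print(hosting.name)  # db-host-3
```

The point of the sketch is that relationships are first-class data: a management application can traverse them in either direction to answer cross-layer questions such as "which business services are affected if this switch fails?"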
The classes used by the DMTF (Distributed Management Task Force) CIM (Common
Information Model) are an excellent starting point for representing the complete range of entities. An
information model must also be able to describe the behaviors of managed entities. Since
events and problem behaviors play a central role in management processing, such as real-time
fault and performance management, network design and capacity planning, and other functions,
formalizing them within the CIM is a key enabler for management automation.
In addition, Best Practices reflect that data structures or repositories should play an important
role in supporting the semantic model. They must be flexible enough to represent the rich set of
information for each class of managed entities. They must be flexible enough to represent the
often-complex web of relationships between entities (logical and physical) within individual
layers, across layers, and across technology domains that is so typical in a Data Center.
Another important Best Practice is to automate the discovery process about entities and their
relationships within and across technology domains as much as possible. The ability to
automatically populate the information repository is a best practice.
For example, in a Data Center, auto-discovery is particularly effective in environments
supporting the TCP/IP protocol suite, including SNMP and other standard protocols that enable
automatic discovery of a large class of logical and physical entities and relationships in Network
ISO Layers 1-7.
Another Best Practice is to have a modeling language that can describe as many entities of a
managed environment as possible and their relationships within and across technology and
business domains in a consistent fashion. A high-level modeling language can simplify
development of managed entity models as well as reduce error.
The ICIM Common Information Model™ and its ICIM Repository provide excellent examples of
a semantics-rich common information model and an efficient information repository that meet all
requirements presented in earlier sections. ICIM is based on the industry-standard DMTF CIM,
a rich model for management information across networks and distributed systems.
CIM reflects a hierarchical object-oriented paradigm with relationship capabilities, allowing the
complex entities and relationships that exist in the real world to be depicted in the schema. ICIM
enhances the rich CIM semantics by adding behavioral modeling to the description of managed
entity classes to automate event correlation and health determination. This behavioral modeling
includes the description of the following information items:
• Events or exceptional conditions – These can be asynchronous alarms, expressions over
MIB variables, or any other measurable or observable event.
• Authentic problems – These are the service-affecting problems that must be fixed to
maximize availability and performance.
• Symptoms of authentic problems – These events can be used to recognize that the
problem occurred.
By adding behavioral modeling, ICIM provides rich semantics that can support more powerful
automation than any other management system.
Best Practices in Root Cause Analysis
Cloud networks are becoming more difficult to manage. The number and heterogeneity of
hardware and software elements in networked systems are increasing exponentially, therefore
increasing the complexity of managing these systems in a similar growth pattern. The
introduction of each new technology adds to the list of potential problems that threaten the
delivery of network-dependent services.
Fixing a problem is often easy once it has been diagnosed. The difficulty lies in locating the root
cause of the myriad events that appear on the management console of a Data Center, cloud or
any infrastructure. It has been shown that 80 to 90 percent of downtime is spent analyzing data
and events in an attempt to identify the problem that needs to be corrected.
For Data Center managers charged with optimizing the availability and performance of large
multi-domain networked systems, it is not sufficient to collect, filter, and present data to
operators. Unscheduled downtime directly affects the bottom line. The need for applications that
apply intelligent analysis to pinpoint root-cause failures and performance problems automatically
is imperative, especially in the consumer driven video and audio vertical. Only when diagnosis is
automated can self-healing networks become a reality.
Many problems threaten service delivery. They include hardware failures, software failures,
congestion, loss of redundancy, and incorrect configurations.
Best Practice – An effective root-cause analysis technique must be capable of identifying all of these problems automatically
This technique must work accurately for any environment and for any topology, including
interrelated logical and physical topologies, with or without redundancy. The solution must be able to
diagnose problems in any type of object - for example, a cable, a switch card, a server, or a
database application - at any layer, no matter how large or complex the infrastructure.
Just as important, accurate root-cause analysis is required to determine the appropriate
corrective action. If management software cannot automate root-cause analysis, that task falls to
operators. Because of the size, complexity, and heterogeneity of today's networks, and the
volume of data and alarms, manual analysis is extremely slow and prone to error.
A Best Practice is to intelligently analyze, adapt, and automate using a Codebook Correlation
Technology described in the following sections. This Best Practice will translate directly for the
customer into major business benefits, enabling organizations to introduce new services faster,
to exceed service-level goals, and to increase profitability.
Rules-based Correlation Limitations and Challenges
Typically, issues that event managers focus on include gathering and displaying ever more data
to users that is ineffective, as well as too much data, de-duplication and filtering. Because of the
lack of intelligence in legacy event managers, users of these systems often resort to developing
their own custom scripts to capture their specific rules for event processing.
Using customized rules is a development-intensive approach doomed to fail for all but the
simplest scenarios in simple static networks.
In this approach, the developer begins by identifying all of the events. Events can include
alarms, alerts, SNMP traps, threshold violations, and other sources that can occur in the
managed system. The user of the management platform then attempts to write network-specific
rules to process each of these events as they occur.
An organization willing to invest the effort necessary to write rules faces enormous challenges.
Typically, there are hundreds and even thousands of network devices in a Data Center network.
The number of rules required for a typical network, without accounting for delay or loss of
alarms, or for resilience, can easily reach millions. The development effort necessary to write
these rules would require many person-years, even for a small network.
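To see how quickly hand-written rules explode, consider a back-of-the-envelope estimate. The device and event counts below are illustrative assumptions, and the pairwise-correlation model is a deliberate simplification:

```python
from math import comb

def rule_estimate(devices: int, events_per_device: int) -> int:
    """Rough upper bound on hand-written correlation rules: one rule per
    pair of devices whose events may need to be correlated, times the
    events that can originate on a pair. Illustrative model only."""
    device_pairs = comb(devices, 2)   # every pair is a potential dependency
    return device_pairs * events_per_device

# Even a modest data center network:
print(rule_estimate(500, 20))   # 2495000 -- already in the millions
```

Even under this simplified model, 500 devices with 20 event types each yields about 2.5 million candidate rules, consistent with the claim above, and the count grows quadratically with network size.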
Changes in the network configuration can render some rules obsolete and require writing new
ones. At the point in time when their proper functioning is needed most, i.e., when network
problems are causing loss and delay, rules-based systems are the least reliable given their
constant maintenance and update cycles.
Due to the overall complexity of development, attempts to add intelligent rules to an unintelligent
event manager have not been successful in practice. In fact, rules-based systems have
consistently failed to deliver a return on the investment (ROI) associated with the huge
development effort.
Best Practice – Codebook-based correlation using CCT
A Best Practice is utilizing Codebook Correlation Technology (CCT). CCT is a
mathematically founded next-generation approach to automating the correlation required for
service assurance. CCT is able to automatically analyze any type of problem in any type of
physical or logical object in any complex environment. It is also able to build intelligent analysis
into off-the-shelf solutions as well as automatically adapt the intelligent analysis to the managed
environment, even as it changes. As a result, CCT provides instant results even in the largest of
Data Center Networks.
CCT solutions dynamically adapt to topology changes, since the analysis logic is automatically
generated. This eliminates the high maintenance costs required by rules-based systems that
demand continual reprogramming.
CCT provides an automated, accurate real-time analysis of root-cause problems and their
effects in networked systems. Other advantages include minimal development. CCT supports
off-the-shelf solutions that embed intelligent analysis and automatically adapt to the
environment.
Any development that is required consists of developing behavior models. The amount of effort
is dependent not on the size of the managed environment, but on the number of problems that
need to be diagnosed.
Since CCT consists of a simple distance computation between events and problem signatures,
CCT solutions execute quickly. In addition, CCT utilizes minimal computing resources and
network bandwidth, since it monitors only the symptoms that are needed to diagnose problems.
Because CCT looks for the closest match between observed events and problem signatures, it
can reach the correct cause even with incomplete information.
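The distance computation described above can be sketched in a few lines. The codebook, symptom names, and function names below are illustrative assumptions for the sake of the example, not EMC's actual implementation:

```python
# Minimal sketch of codebook-style correlation: each problem has a binary
# signature over possible symptoms; the diagnosed root cause is the signature
# closest (in Hamming distance) to the observed symptom vector, which
# tolerates lost or spurious events.

def hamming(a, b):
    """Number of positions where two equal-length binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def diagnose(observed, codebook):
    """Return the problem whose signature best matches the observed symptoms."""
    return min(codebook, key=lambda problem: hamming(observed, codebook[problem]))

# Symptom order: [link_down_alarm, high_latency, snmp_timeout]
codebook = {
    "router_failure": [1, 1, 1],
    "congested_link": [0, 1, 1],
    "agent_crash":    [0, 0, 1],
}

# The snmp_timeout event was lost in transit, yet the closest signature
# still identifies the right problem.
print(diagnose([1, 1, 0], codebook))  # router_failure
```

Because matching is a nearest-neighbor lookup rather than rule evaluation, the codebook can be regenerated automatically whenever the topology changes.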
Leveraging CCT as a Best Practice to automate service assurance provides substantial business benefits as well. These include the ability to roll out new services more quickly and to achieve greater availability and performance of business-critical systems.
Since CCT automatically generates its correlation logic for each specific topology, new Data
Center Network services can be managed immediately and new customers can be added to
new or existing services quickly. By eliminating the need for development, ongoing
maintenance, and manual diagnostic techniques, CCT enables IT organizations to be proactive
and to focus their attention on strategic initiatives that increase revenues and market share.
CCT provides a future-proof foundation for managing any type of complex infrastructure. This
gives CCT users the freedom to adopt new technology, with the assurance that it can be
managed effectively, intelligently, and automatically.
Best Practice - Reduction of Downstream Suppression
Reducing Downstream Event Suppression is another Best Practice. Some management
vendors implement a form of root-cause analysis that is actually a form of downstream event
suppression. Downstream suppression is a path-based technique that is used to reduce the
number of alarms to process when analyzing hierarchical Data Center networks. Downstream
suppression works as follows.
A polling device periodically polls Data Center devices to verify that they are reachable. When a
device fails to respond, downstream suppression does the following:
• Ignores failures from devices downstream (farther away from the "poller”) from the first
device
• Selects the device closest to the “poller” that fails to respond as the "root cause"
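For a strictly hierarchical network, the selection rule in the bullets above can be sketched as follows; the device names and parent map are hypothetical:

```python
# Sketch of downstream event suppression in a strictly hierarchical network:
# each device records its single parent on the path back to the poller. Among
# unresponsive devices, the one closest to the poller is reported as the root
# cause, and failures farther downstream are suppressed.

parents = {              # child -> parent; "poller" is the root
    "core": "poller",
    "dist1": "core",
    "access1": "dist1",
    "access2": "dist1",
}

def depth(device):
    """Hop count from a device back to the poller."""
    d = 0
    while device != "poller":
        device = parents[device]
        d += 1
    return d

def root_cause(unreachable):
    """Pick the unreachable device closest to the poller; suppress the rest."""
    return min(unreachable, key=depth)

failed = {"dist1", "access1", "access2"}
print(root_cause(failed))  # dist1 -- the access1/access2 alarms are suppressed
```

Note that the `depth` computation only works because each device has exactly one parent; with redundant paths the "downstream" relation is undefined, which is exactly the limitation discussed next.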
Downstream suppression requires that the network have a simple hierarchical architecture, with
only one possible path connecting the polling device to each managed device. This is typically
unrealistic. Today’s mission-critical Data Center networks leverage redundancy to increase
resilience. Downstream suppression does not work in redundant architectures because the
relationship of one node being downstream from another is undefined; there are multiple paths
between the manager and managed devices.
The applicability of downstream suppression to today's Data Center networks is limited. This technique
applies only to simple hierarchical networks with no redundancy, and addresses only one
problem, Data Center node failure. Because of these limitations, downstream suppression offers
little in the way of automating problem analysis, and certainly cannot claim to offer root-cause
analysis.
Self Organizing Systems
As the size and complexity of computer systems grow, system administration has become the
predominant factor of ownership cost and a main cause for reduced system dependability. All of
these factors impede an IT department from achieving an efficient and sustainable operational
model.
The research community, in conjunction with the various hardware and software vendors, has
recognized the problem and there have been several advances in this area.
All of these approaches propose some form of self-managed, self-tuned system(s) that minimize
manual administrative tasks. As a result, computers, networks and storage systems are
increasingly being designed as closed loop systems, as shown in Figure 12 – Closed Loop
System. As shown, a controller can automatically adjust certain parameters of the system based
on feedback from the system. This system can be either hardware, software or a combination.
Figure 12 – Closed Loop System (a controller adjusts the system based on measurements fed back to it)
Servers, storage systems, Network, and backup hardware are examples of such closed loop
systems aiming at managing the energy consumption and maximizing the utilization of data
centers. In addition, self-organizing systems can also meet performance goals in file servers through virtualization, using VMware and other virtualization product offerings, as well as in Internet services, databases, and storage tiering. As shown in Figure 13 – Sustainability
Ontology – Self Organizing Systems, on page 52, this methodology can be used in many additional scenarios, ranging from storage, application, and server consolidation to the various levels of virtualization that can be achieved through self-organizing system theory and its applications.
Figure 13 – Sustainability Ontology – Self Organizing Systems
It is important that the resulting closed-loop system is stable (does not oscillate) and converges
quickly to the desired end state when applying dynamic control.
In order to achieve a more sustainable solution, a more rigorous approach is needed for
designing dynamically controlled systems. In particular, it is best practice to use the time-tested approach of control theory because it results in systems that can be shown to work beyond the
narrow range of a particular experimental evaluation.
Computer and infrastructure system designers can take advantage of decades of experience in
the field and can apply well-understood and often automated methodologies for controller
design. Many computer management problems can be formulated, so that standard controllers
or systems are applied to solve them. Therefore, a best practice is that the systems community should stick with systems design; in this case, systems that are amenable to dynamic
feedback-based control. This provides the necessary tunable system parameters and exports
the appropriate feedback metrics, so that a controller (hardware or software) can be applied
without destabilizing the system, while ensuring fast convergence to the desired goals.
Traditionally, control theory and/or feedback systems have been concerned with environments
that are governed by laws of physics (i.e., mechanical devices) and, as a result, have allowed designers to make assertions about the existence or non-existence of certain properties. This is not
necessarily the case with software systems. Checking whether a system is controllable or, even
more, building controllable systems is a challenging task often involving non-intuitive analysis
and system modifications.
As a first step, we propose a set of necessary and sufficient properties that any system must abide
by to be controllable by a standard adaptive controller that needs little or no tuning for the
specific system. This is the goal leading to achievement of sustainability. These properties are
derived from the theoretical foundations of a well-known family of adaptive controllers. From a
control or feedback systems perspective, there are two very specific and diverse IT management problems:
1) Enforcing soft performance goals in networked or storage service by dynamically adjusting
the shares of competing workloads
2) Controlling the number of blades, storage subsystems, network nodes, etc., assigned to a
workload to meet performance goals within power budgets
Best Practice - Dynamic Control in a Self Organized System
Many computer management problems are defined as online optimization problems. The
objective is to have a number of measurements obtained from the system that can converge to
the desired goals by dynamically setting a number of system parameters (or actuators). The
problem is formalized as an objective function that has to be minimized.
Existing research has shown that, in the general case, adaptive controllers are needed to trace
the varying behavior of computer systems and their changing workloads [9].
Best Practice – Utilize STRs when implementing adaptive controllers
Let us focus on one of the best-known families of adaptive controllers, Self-Tuning Regulators (STRs), which have been widely used in practice to solve on-line optimization control problems. Using this technology can help attain sustainability. The term “self-tuning” comes
from the fact that the controller parameters are automatically tuned to obtain the desired
properties of the closed-loop system. The design of closed loop systems involves many tasks
such as modeling, design of control law, implementation, and validation. STR controllers aim to
automate these tasks. Therefore, STRs can be used out-of-the-box for many practical cases.
Other types of adaptive controllers proposed in many feedback system design methodologies
require more intervention by the designer.
An STR consists of two basic modules: the model estimation module, which estimates on-line a model that describes the measurements from the system as a function of a finite history of past actuator values and measurements; and the control law module, which uses that model to set the actuator values. A best practice is using a linear model of the
following form for model estimation in the STR as defined in Equation 6 – Linear model of a
Control System, shown below:
Equation 6 – Linear model of a Control System

y(t) = Σ_{i=1}^{n} Ai y(t−i) + Σ_{i=0}^{n−1} Bi u(t−i−d0)

Where:
y(t) is a vector of the N measurements sampled at time t and
u(t) is a vector capturing the M actuator settings at time t.
Ai and Bi are the model parameters with dimensions compatible with those of vectors y(t) and
u(t).
n is the model order that captures how much history the model takes into account.
d0 is the delay between an actuation and the time the first effects of that actuation are observed.
The unknown model parameters Ai and Bi are estimated using Recursive Least-Squares (RLS) estimation, a standard, computationally fast technique that fits Equation 6 to a number of measurements so that the sum of squared errors between the measurements and
the model curve is minimized. Discrete-time models are assumed. One time unit in this discrete-time model corresponds to an invocation of the controller, i.e., sampling of system measurements, estimation of a model, and setting the actuators. The relation between actuation
and observed system behavior is not always linear. For example, while throughput is a linear
function of the share of resources (e.g., CPU cycles) assigned to a workload, storage processor,
etc, the relation between latency and resource share is nonlinear as Little’s law indicates.
However, even in the case of nonlinear metrics, a linear model is often a good enough local
approximation to be utilized by a controller, as the latter usually only makes small changes to actuator settings. The advantage of linear models is that they can be estimated in computationally efficient ways, resulting in tractable control laws.
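The RLS estimation just described can be sketched for the simplest case of Equation 6 (n = 1, d0 = 1, one measurement and one actuator); the function names and the synthetic system are illustrative assumptions:

```python
import numpy as np

# Sketch of Recursive Least-Squares (RLS) estimation of the Equation 6 model,
# reduced to y(t) = A1*y(t-1) + B0*u(t-1) for a single measurement/actuator
# pair. Names and the synthetic test system are illustrative.

def rls_fit(ys, us, lam=1.0):
    theta = np.zeros(2)              # estimates of [A1, B0]
    P = np.eye(2) * 1000.0           # large initial covariance: low confidence
    for t in range(1, len(ys)):
        phi = np.array([ys[t - 1], us[t - 1]])     # regressor: past y and u
        k = P @ phi / (lam + phi @ P @ phi)        # gain vector
        theta = theta + k * (ys[t] - phi @ theta)  # correct by prediction error
        P = (P - np.outer(k, phi @ P)) / lam       # covariance update
    return theta

# Synthetic system with true A1 = 0.6, B0 = 0.3 driven by a random actuator.
rng = np.random.default_rng(0)
us = rng.uniform(-1, 1, 200)
ys = [0.0]
for t in range(1, 200):
    ys.append(0.6 * ys[t - 1] + 0.3 * us[t - 1])

a1, b0 = rls_fit(np.array(ys), us)
print(round(a1, 2), round(b0, 2))  # close to 0.6 and 0.3
```

Each update is a handful of small matrix operations, which is why the estimation can run on-line at every controller invocation.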
The control law is essentially a function that, based on the estimated system model defined in
Equation 6 at time t, decides what the actuator values u(t) should be to minimize the objective
function. In other words, the STR derives u(t) from a closed-form expression as a function of
previous actuations, previous measurements and the estimated system measurements y(t).
From a systems perspective, the important point is that these computationally efficient
calculations can be performed on-line. The STR requires little system-specific tuning as it uses
a dynamically estimated model of the system and the control law automatically adapts to system
and workload dynamics. For this process to apply and for the resulting closed-loop system to be
stable and to have predictable convergence time, control theory has come up with a list of necessary and sufficient properties that the target system must abide by.
Best Practice – Require System Centric Properties
In the following paragraphs, guidelines are presented on how one can verify whether each property is satisfied and what the challenges are in enforcing it.
Monotonic. The elements of matrix B0 in Equation 6 on page 54 must have known signs that
remain the same over time. The concept behind this property is that the real (non-estimated)
relation between any actuator and any measurement must be monotonic and of known sign.
This property usually refers to some physical law. Therefore, it is generally easy to check for it
and get the signs of B0. For example, in the long term, a process with high fraction of CPU
cycles gets higher throughput and lower latency than one with a smaller fraction.
Accurate models. The estimated model in Equation 6 is a good enough local approximation of
the system’s behavior. As discussed, the model estimation is performed periodically. A
fundamental requirement is that the model around the current operating point of the system
captures the dynamic relation between actuators and measurements sufficiently. In practice,
this means that the estimated model must track only real system dynamics. We use the term ‘noise’ to describe deviations in the system behavior that are not captured by the model. It has
been shown that, to ensure stability in linear systems where there is a known upper bound on the noise amplitude, the model should be updated only when the model error exceeds twice the noise bound.
There are three main sources for the previously discussed noise:
1) un-modeled system dynamics, due, for example, to contention on the network
2) a fundamentally volatile relation between certain actuators and measurements
3) quantization errors, when a linear model is used to locally approximate the behavior of a nonlinear system in an operating range.
Known system delay. We know the delay d0 between an actuation and the time its first effects are observed, relative to the controller's sampling interval.
Known system order. We know an upper bound on the order of the system. Known system delay ensures that the controller knows when to expect the first effects of its actuations, while known system order ensures that the model remembers sufficiently many prior measurements (y) to
capture the dynamics of the system. These properties are needed for the controller to observe
the effects of its actuations and then attempt to correct any error in subsequent actuations. If the
model order were less than the system order, then the controller would not remember having
ever actuated when the measurements finally are affected. The values of d0 and n are derived
experimentally. The designer is faced with a tradeoff: their values must be high enough to
capture the causal relations between actuation and measurements but not too high, so that the
STR remains computationally efficient. Note that d0 = 1 and n = 1 are ideal values.
Minimum phase. Recent actuations have higher impact to the measurements than older
actuations. A minimum phase system is one for which the effects of an actuation can be
corrected or canceled by another, later actuation. It is possible to design STRs that deal with
non-minimum phase systems, but they involve experimentation and non-standard design
processes. In other words, without the minimum phase requirement, we cannot use off-the-shelf
controllers. Typically, physical systems are minimum phase. The causal effects of events in the
system fade as time passes by. Sometimes, this is not the case with computer systems. To
ensure this property, a designer must re-set any internal state that reflects older actuations.
Alternatively, the sample interval can always be increased until the system becomes minimum
phase. However, longer sampling intervals result in slower control response.
Linear independence. The elements of each of the vectors y(t) and u(t) must be linearly
independent. The quality of the estimated model is poor unless this property holds. The
predicted value for y(t) may be way off the actual measurements. The reason is that matrix
inversion in the RLS estimator may result in matrices with very large numbers, which in
combination with the limited resolution of floating point arithmetic of a CPU, result in models that
are not necessarily correct. Sometimes, simple intuition about a system may be sufficient to
ascertain if there are linear dependencies among actuators.
Zero-mean measurements and actuator values. The elements of each of the vectors y(t) and u(t) should have a mean value close to 0. If the actuators or the measurements have a large
constant component to them, RLS tries to accurately predict this constant component and may
fail to capture the comparably small effect the values of the actuators have. If there is a large
constant component in the measurements and it is known, then you can simply deduct it from
the reported measurements. If unknown, you can easily estimate it using a moving average.
Comparable magnitudes of measurements and actuator values. The values of the elements in
y(t) and u(t) should not differ by more than one order of magnitude. If the measurement values
or the actuator values differ considerably, then RLS predicts more accurately the effects of the
higher value. You can easily solve this problem by scaling the measurements and actuators, so
that their values are comparable. This scaling factor can also be estimated using a moving
average.
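The last two conditioning steps, deducting an estimated constant component and scaling to comparable magnitudes, can be sketched together; the names and sample values are illustrative:

```python
# Sketch of the conditioning steps above: a moving average estimates the
# unknown constant component of a signal, which is deducted to give
# near-zero-mean values, and a scale factor brings measurements into the same
# order of magnitude as the actuator values.

def moving_average(prev_avg, sample, alpha=0.1):
    """Exponential moving average used to estimate the constant component."""
    return (1 - alpha) * prev_avg + alpha * sample

def condition(samples, scale=1.0):
    avg = samples[0]
    out = []
    for s in samples:
        avg = moving_average(avg, s)
        out.append((s - avg) * scale)   # deduct estimated mean, then rescale
    return out

# Latency in microseconds (large constant component, large magnitude), scaled
# toward the same order of magnitude as a 0..1 actuator share.
latencies = [5000.0, 5100.0, 4900.0, 5050.0]
print(condition(latencies, scale=0.001))
```

After conditioning, the RLS estimator sees small, comparable, near-zero-mean signals and can concentrate on the dynamics rather than on the constant offsets.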
Application
To design a sustainable IT infrastructure, it is very important to deal with and understand the
application and the resources needed.
Best Practice – Architect a Designed for Run solution

The Designed for Run (an HP trademark) strategy provides enterprises a clear path for transformation and
modernization while controlling costs and protecting the user experience. It considers the
function and expense of the client's IT system and applications as a whole. Designed for Run
ensures that the key factors that affect agility and TCO (rigidity, complexity, and resource utilization) are addressed from the start in the planning phase.
The reason why we need to consider this methodology is that most organizations recognize the
need for IT modernization but fear the time, expense and risk involved. However, failure to
modernize will result in high costs, complex systems that are difficult to fix, outages that have a
business impact, and low resource utilization. These factors weigh heavily in a TCO equation
and compromise sustainability.
The Designed for Run approach offers these benefits:
• Reduces risk by bringing deep industry knowledge, expert planning and thorough testing
to every part of the process
• Extracts value within your existing IT, extending the life of legacy elements by integrating
them to support the business through modernization
• Lowers total cost of ownership by building maximum asset utilization into your IT system
and enabling you to anticipate operational IT expenses
The approach to a “design to run” plan follows; it is defined in four phases: Plan, Build, Run, and Automate.
In the “Planning” phase, we focus on reducing risk. The objective is to reduce system
complexity, poor resource utilization, and outages. After assessing your existing IT applications
and infrastructure, we tie your IT strategy to your business strategy, determine the system's
level of maturity, and use existing industry frameworks to develop a blueprint for your system of
the future.
In the “Building” phase we focus on extracting value by designing high quality into the system,
building it for zero outages. Begin by implementing optimized processes for IT applications and
then implement the elements of a modern architecture onto a modern infrastructure. This
enables us to best use existing systems while gaining the speed, flexibility and innovation
required in the 21st century.
In the “Run” phase, we focus on optimizing for total cost of ownership. After the transformation
is complete, there are no costly surprises for running the finished system. Rigorous, closed-loop
operational processes drive consistency and a relentless pursuit of incident prevention. You will enjoy complete visibility of your enterprise's IT systems, making it possible to avoid costly errors in budgeting and performance, thus increasing sustainability.
In the “Automate” phase, we minimize management effort by automating these processes.
Storage
Information management is at the core of achieving efficiency and sustainability. It is by now common knowledge that the digital footprint, the storage that businesses and individuals alike utilize, is exploding at an exponential rate. Something must be done to address this growth.
Storage has a life cycle, from creation to eventual archival or deletion. In order for any business
to achieve a sustainable growth path, having an information lifecycle management (ILM)
solution is a given. The following sections discuss technologies that share a common approach to data life cycle management, as shown in Figure 14 – A Sustainable Information transition lifecycle. The approach is to start from a highly usable
performance entity, “Thick”, where the data is on a disk drive that has high performance
characteristics with standard provisioning. This leads to eventual “Thin” or virtual provisioning.
Thin or virtual provisioning can also include performance as well as capacity tiering going to a
very low granular level. The data can then become “Small” through data deduplication,
compression and other redundancy methods.
Figure 14 – A Sustainable Information transition lifecycle
At some point in the data’s life cycle, performance may not be much of an issue and the
physical media can spin down in a smart way, going into a “Green” state to eventually be spun
up with hints on when the application may need it. Eventually, the data may never be touched again and can be deleted, arriving at a “Gone” state. We will discuss all of these life stages.
Compression, Archiving and Data Deduplication
In order to achieve sustainability of data growth and therefore gain efficiency, it is imperative to
implement some form of technology to reduce the amount of data through its life cycle from
creation to its eventual archiving and deletion. One way is to use compression to reduce or minimize the number of copies of data as well as the data footprint. The standard example is
when a person sends an email to twenty people with a file attachment. Each person will have a
copy of that file. If each recipient saves that file in the mail archive, there will be twenty copies.
This is obviously not a sustainable approach. Data deduplication is a technology that deals with this unsustainable data growth pattern, and data compression helps achieve efficiency and sustainability.
Best Practice – Implement Deduplication Technology

Data deduplication is an application-specific form of data compression where redundant data is
eliminated, typically to improve storage utilization. In this process, duplicate data is deleted,
leaving only one copy. However, an index of all the data is retained so that it can be retrieved from its various sources should it ever be required. Deduplication is able to reduce the required storage capacity since only the unique data is stored. Backup applications generally benefit the most from deduplication due to the nature of repeated full backups of an existing file system or multiple servers having similar images of an OS [10].
When implementing a data deduplication system, it is important to consider scalability to achieve true sustainability. Performance should remain acceptable as storage capacity grows and deduplication granularity increases. The deduplication algorithm should also not introduce data loss through errors such as hash collisions.
Best Practice – Implement a data-deduplication solution addressing Scaling and hash collisions

It is critical that data deduplication solutions detect duplicate data elements, making the
determination that one file, block or byte is identical to another. Data deduplication products
determine this by processing every data element through a mathematical "hashing" algorithm to
create a unique identifier called a hash number. Each number is then compiled into a list,
defined as the hash index.
When the system processes new data elements, the resulting hash numbers are compared
against the hash numbers already in the index. If a new data element produces a hash number
identical to an entry already in the index, the new data is considered a duplicate, and it is not
saved to disk. A small reference "stub" that relates back to the identical data that has been
stored is put in the original place. If the new hash number is not already in the index, the data
element is considered new and stored to disk normally.
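The hash-index flow just described can be sketched as follows; the class and method names are illustrative, not a product API:

```python
import hashlib

# Minimal sketch of a hash-indexed deduplicating store: each chunk is hashed;
# a chunk whose hash is already in the index is replaced by a stub referencing
# the stored copy, otherwise the chunk is stored and its hash indexed.

class DedupStore:
    def __init__(self):
        self.index = {}      # hash -> location of the stored chunk
        self.blocks = []     # stored unique chunks

    def write(self, chunk):
        h = hashlib.sha1(chunk).hexdigest()
        if h in self.index:
            return ("stub", self.index[h])       # duplicate: reference only
        self.blocks.append(chunk)
        self.index[h] = len(self.blocks) - 1
        return ("stored", self.index[h])

store = DedupStore()
print(store.write(b"report.pdf contents"))   # ('stored', 0)
print(store.write(b"report.pdf contents"))   # ('stub', 0) -- the 20th email copy
print(len(store.blocks))                     # 1 unique copy kept
```

This mirrors the email-attachment example earlier: twenty recipients saving the same file produce one stored copy and nineteen stubs.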
It is possible that a data element can produce an identical hash result even though the data is
not identical to the saved version. Such a false positive, also called a hash collision, can lead to
data loss. There are two ways to reduce false positives:
• Use more than one hashing algorithm on each data element. Using in or out-of-band
indexing with SHA-1 and MD5 algorithms is the best practice approach. This
dramatically reduces the potential for false positives.
• Another best practice to reduce collisions is to use a single hashing algorithm but
perform a bit-level comparison of data elements that register as identical.
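The second mitigation above, performing a full byte-level comparison whenever hashes match, can be sketched as follows (names are illustrative):

```python
import hashlib

# Sketch of collision-safe duplicate detection: a hash match is treated only
# as a candidate; a byte-level comparison against the stored chunk confirms
# it before the new data is discarded, so a hash collision cannot silently
# drop non-identical data.

def is_true_duplicate(new_chunk, stored_chunks_by_hash):
    h = hashlib.md5(new_chunk).hexdigest()
    stored = stored_chunks_by_hash.get(h)
    if stored is None:
        return False                    # hash unseen: definitely new data
    return stored == new_chunk          # byte-level check guards collisions

chunks = {hashlib.md5(b"block-A").hexdigest(): b"block-A"}
print(is_true_duplicate(b"block-A", chunks))  # True
print(is_true_duplicate(b"block-B", chunks))  # False
```

The byte-level read of the stored chunk is exactly the extra I/O and processing cost that the next paragraph describes.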
The challenge with both approaches is that they demand more processing power from the host system, be it the source host (source deduplication) or the target (target deduplication), which reduces index performance and slows the deduplication process. Source deduplication performs data reduction at the server, whereas target deduplication performs it at the target (VTL, disk, etc.). As the deduplication process becomes
more granular and examines smaller chunks of data, the index becomes much larger and the
probability of collisions increases and can exacerbate any performance hit.
Best Practice – Include and consider scaling and encryption in the deduplication Process

Another issue is the relationship between deduplication, more traditional compression and
encryption in a company's storage infrastructure. Ordinary compression removes redundancy
from files, and encryption "scrambles" data so that it is completely random and unreadable.
Both compression and encryption play an important role in data storage, but eliminating
redundancy in the data can impair the deduplication process. Indexing and deduplication should
be performed first if encryption or traditional compression is required along with deduplication.
Each "chunk" of data (i.e., a file, block or bits) is processed using a hash algorithm, such as
MD5 or SHA-1, generating a unique reference for each piece. The resulting hash reference is
then compared to an index of other existing hash numbers. If that hash number is already in the
index, the data does not need to be stored again. If we have a new entry, the new hash number
is added to the index and the new data is stored.
The more granular a deduplication platform is, the larger an index will become. For example,
file-based deduplication may handle an index of millions, or even tens of millions, of unique
hash numbers. Block-based deduplication will involve many more unique pieces of data, often
numbering into the billions. Such granular deduplication demands more processing power to
accommodate the larger index. This can impair performance as the index scales unless the
hardware is designed to accommodate the index properly.
In rare cases, the hash algorithm may produce the same hash number for two different chunks
of data. When such a hash collision occurs, the system fails to store the new data because it
sees that hash number already. Such a "false positive" can result in data loss.
Best Practice – Utilize multiple Hash algorithms, Metadata Hashing, Compression and Data Reduction

It is a best practice to implement multiple hash algorithms, examining metadata to identify data
deduplication, compression and data reduction, and thereby prevent hash collisions and other
abnormalities. Data deduplication is typically used in conjunction with other forms of data
reduction, such as compression and delta differencing. In data compression technology, which
has existed for about three decades, algorithms are applied to data in order to simplify large or
repetitious parts of a file.
Delta differencing, primarily used in archiving or backup, reduces the total volume of stored data by saving only the changes to a file since its initial backup. For example, a file set may
contain 400 GB of data, but if only 100 MB of data has changed since the previous backup, then
only that 100 MB is saved. Delta differencing is frequently used in WAN-based backups to make
the most of available bandwidth in order to minimize the backup window. For additional
information on WAN or network sustainability efficiency options, please refer to the section titled
“Network” starting on page 72.
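Block-level delta differencing can be sketched as follows; the block contents and sizes are illustrative:

```python
# Sketch of block-level delta differencing: only blocks that are new or have
# changed since the previous backup are saved, so a large file set with a
# small amount of change transfers only the changed blocks.

def delta_backup(previous_blocks, current_blocks):
    """Return {block_index: data} for blocks that are new or changed."""
    delta = {}
    for i, block in enumerate(current_blocks):
        if i >= len(previous_blocks) or previous_blocks[i] != block:
            delta[i] = block
    return delta

def restore(previous_blocks, delta):
    """Rebuild the current state from the previous backup plus the delta."""
    blocks = list(previous_blocks)
    for i, block in delta.items():
        if i < len(blocks):
            blocks[i] = block
        else:
            blocks.append(block)
    return blocks

base = [b"aaaa", b"bbbb", b"cccc"]
new = [b"aaaa", b"BBBB", b"cccc", b"dddd"]
d = delta_backup(base, new)
print(sorted(d))                 # [1, 3] -- only two blocks are saved
print(restore(base, d) == new)   # True
```

Only the two changed blocks cross the WAN, which is the bandwidth saving the paragraph above describes.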
Data deduplication also has ancillary benefits. Because less data is stored, backups complete faster, resulting in smaller backup windows, reduced recovery point objectives
(RPOs) and faster recovery time objectives (RTOs). Disk archive platforms are able to store
considerably more files. If tape is the ultimate backup target, smaller backups also use fewer
tapes, resulting in lower media costs and fewer tape library slots being used.
For a virtual tape library (VTL), the reduction in disk space requirements translates into longer retention periods for backups within the VTL itself, and therefore greater sustainability. Data
transfers are accomplished sooner, freeing the network for other tasks, allowing additional data
to be transferred or reducing costs using slower, less-expensive WANs.
In addition to archiving, Flash storage, data compression, and deduplication, we will discuss other technologies that will advance a business's ability to achieve sustainability.
Best Practice – Use Self organizing systems theory for Storage

Storage virtualization products in the market today are a good first step, but enhancements are
needed. Storage virtualization is creating efficiencies by inserting a layer of abstraction between
data and storage hardware, and that same concept can be taken further to present a layer of
abstraction between data and the method in which data is stored. RAID is actually a well-known
form of data virtualization, because the linear sequence of bytes for data is transformed to stripe
the data across the array, and includes the necessary parity bits. RAID’s data virtualization
technique was designed over 20 years ago to improve data reliability and I/O performance.
Even though it is a reliable and proven technology, we must ask whether RAID can keep pace
as we transition from structured data to large quantities of unstructured data.
Two of EMC's most recent technologies, "Virtual LUN" and "Fully Automated Storage Tiering"
(FAST), are examples. FAST automatically and dynamically moves data across storage tiers, so
that it is in the right place at the right time, simply by pooling storage resources, defining the
policy, and applying it to an application, similar to the modeling parameters Ai and Bi in
Equation 6 – Linear model of a Control System on page 54.
FAST enables applications to remain optimized by eliminating trade-offs between capacity and
performance. Automated storage tiering dynamically monitors and automatically relocates data
to increase operational efficiency and lower costs.
By utilizing a “FAST” technology that implements transparent mobility (i.e. the application does
not know that a transfer is going on) and dynamically moving applications across different
storage types, great strides can be made toward sustaining the ability to manage information. As of
this writing, Flash, Fiber Channel, and SATA drive technologies are all currently supported in
EMC’s implementation of FAST.
“Dispersal” is another new technology being considered in storage. It is a natural successor for
RAID for data virtualization because it can be configured with M of N fault tolerance, which can
provide much higher levels of data reliability than RAID. Dispersal essentially packetizes the
data (N packets), and only requires a subset (M packets) to fit perfectly and recreate the data.
There will no longer be a tight coupling between hardware and the storage of the data packets.
This is one major change for data virtualization that will occur as Dispersal replaces RAID and
will eliminate the concept of having copies of data on hardware.
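The reliability advantage of M of N fault tolerance can be illustrated with a short binomial calculation. The node counts and the per-node annual failure probability below are invented for illustration, and independent failures are assumed:

```python
from math import comb

# Hypothetical sketch: annual data-loss probability of an M-of-N dispersal
# layout vs. a simple two-way mirror, assuming independent node failures.
# The 5% per-node failure probability is an assumed figure.

def p_data_loss(n: int, m: int, p_fail: float) -> float:
    """Data is lost only when fewer than m of the n packets survive."""
    p_up = 1.0 - p_fail
    p_survive = sum(comb(n, k) * p_up**k * p_fail**(n - k)
                    for k in range(m, n + 1))
    return 1.0 - p_survive

# A 10-of-16 dispersal tolerates any 6 node losses
dispersal = p_data_loss(16, 10, 0.05)
# A two-way mirror loses data only if both copies fail
mirror = p_data_loss(2, 1, 0.05)
print(f"10-of-16 dispersal: {dispersal:.2e}, 2-way mirror: {mirror:.2e}")
```

Even with these toy numbers, the dispersal layout is orders of magnitude less likely to lose data than the mirror, which is the point of M of N fault tolerance.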
Today’s RAID systems stripe data and parity bits across disks within an array, within an
appliance. When asked, “Where is my data?” the answer is typically “On this piece of
hardware.” This gives people peace of mind: something intangible (since the data is actually
virtualized) feels tangible because it is contained within a physical device.
The question IT storage administrators ask will shift from “Where is my data?”, since data will
be virtualized across multiple devices in multiple locations, to “Is my data protected?”, because the
root of the first question is the second. Once people get comfortable with actually giving up the
control of knowing exactly where their data resides, they will realize the benefits of data
virtualization.
Increased fault tolerance is the largest benefit to storing data packets across multiple hardware
nodes. RAID is structured to provide disk drive fault tolerance. As a disk drive fails, the other
disks can reconstruct the data.
Dispersal provides not only disk drive level fault tolerance, but also device level fault tolerance,
and even location fault tolerance. When an entire device fails, the data can be reconstructed
from virtualized data packets on other devices, whether centrally located or across multiple
sites. Self-organizing systems can be used to manage this reconstruction of virtualized data.
Current products focus on virtualizing storage pools and access from storage hardware. This is a
good step towards avoiding a silo style storage system. However, it appears there is still too
much management burden placed on the storage administrator. The management systems do
have self-discovery, but that simply means listing the hardware nodes that have been added to
the system. The burden is still on the storage administrator to determine where and how to
deploy those nodes.
Another step must occur to simplify the management burden; the system must evolve to self
organize. Self-organizing systems are made up of small units that can determine an inherent
order collectively. Instead of a storage administrator having to determine which pools and tiers
to add storage nodes to, the nodes themselves will evolve to contain metadata and rules, and
inherently place themselves within the storage tiers, as the system requires.
An example of metadata and rules could be related to disk characteristics – SSDs (Solid State
Disks) are suited for tier 1 performance scenarios. Another example would be related to GPS
location – storage nodes could know which data center they have been installed within, and
determine which storage pools they need to join. Attributes of this type of self-organizing system
are currently in development, for example, in EMC’s ATMOS cloud storage offerings.
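A minimal sketch of such self-placement might look like the following. The node attributes, rule table, and tier names are hypothetical, invented for illustration, and are not drawn from any shipping product:

```python
from dataclasses import dataclass

# Hypothetical sketch: each node carries its own metadata and rules, and
# places itself into a tier with no central placement step by an admin.

@dataclass
class StorageNode:
    name: str
    media: str        # "ssd", "fc", "sata"
    datacenter: str   # e.g. learned from a GPS/location service

    def choose_tier(self) -> str:
        # Each node applies its own rules instead of waiting for an admin.
        rules = {"ssd": "tier1-performance",
                 "fc": "tier2-general",
                 "sata": "tier3-archive"}
        return rules[self.media]

nodes = [StorageNode("n1", "ssd", "nyc"), StorageNode("n2", "sata", "nyc")]
tiers: dict = {}
for node in nodes:
    tiers.setdefault((node.datacenter, node.choose_tier()), []).append(node.name)
print(tiers)   # nodes grouped by (site, tier) with no central placement step
```

The administrator's job reduces to maintaining the rule table; the nodes organize themselves as they join.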
Beyond self-organizing the provisioning of hardware within the storage system, self-organizing
systems will also self-organize the tiers. Storage administrators will define the requirements for
tiers (QoS, data reliability, performance), and the storage nodes will self-organize underneath
them. That means that when capacity and performance nodes are added to the system, the
system will also determine which tiers need those resources.
Emergent patterns will surface once the storage nodes have metadata and rules, and the
toolset for managing the system will change. Rather than managing the physical hardware, and
individual storage pools and access, management will occur at a system level – what storage is
required in which locations based on how information is dynamically moving across storage
nodes and tiers.
Autonomic self healing systems
The concept of “Self Healing” systems is a subset of self-organizing systems (see section “Self
Organizing Systems”, starting on page 50, for more information) in that it utilizes a similar closed
loop system theory construct. The fundamental approach or rules are:
• The System must know itself
• The System must be able to reconfigure itself within its operational environment
• The System must preemptively optimize itself
• The System must detect and respond to its own faults as they develop
• The System must detect and respond to intrusions and attacks
• The System must know its context of use
• The System must live in an open world, assuming the security requirements allow
• The System must actively shrink the gap between user/business goals and IT solutions
to achieve sustainability
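These rules amount to a closed-loop controller. The toy sketch below, with an invented utilization model and invented thresholds, shows the monitor/detect/respond cycle converging on its own without operator action:

```python
# Minimal closed-loop sketch of the rules above. The utilization model,
# target, and thresholds are invented for illustration only.

def autonomic_step(utilization: float, capacity: int,
                   target: float = 0.7) -> int:
    """Return a new capacity that steers utilization toward the target."""
    if utilization > target * 1.1:      # detect: running hot
        return capacity + 1             # respond: grow before faults develop
    if utilization < target * 0.5:      # detect: wasteful over-provisioning
        return max(1, capacity - 1)     # respond: shrink to stay efficient
    return capacity                     # already within the healthy band

capacity, demand = 4, 5.6               # demand in 'node-equivalents'
for _ in range(6):                      # the loop runs continuously in practice
    utilization = demand / capacity
    capacity = autonomic_step(utilization, capacity)
print(capacity)   # settles where demand/capacity is near the 70% target
```

Like heart rate or breathing in the biological analogy that follows, the adjustment happens beneath the level of "conscious" (administrator) control.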
Autonomic computing is really about making systems self-managing. If you think about
biological systems like the human body, they are tremendously complex and very robust. The
human body, for example, is constantly making adjustments. Your heart rate is being controlled;
your breathing rate is controlled. All of these things happen beneath the level of conscious
control. Biological systems give a good example for thinking about computer systems. When we
take a look at the attributes of biological systems, we can find attributes that we wish our
computer systems had, like self-healing, self-configuring, and self-protecting. We can begin to
build the attributes that we see in biological systems into complex computer systems. In the
end, it translates into real customer benefits because these more complex systems are easier to
administer, control and are more sustainable.
Best Practice – Utilize Autonomic self-healing systems for Storage
As it relates to storage, in addition to FAST, Dispersal, and RAID technology to store and
protect, another relatively new technology has been implemented and is commonly referred to
as "autonomic self-healing storage." This technology promises to substantially increase
reliability of disk systems. Autonomic self-healing storage is different from RAID, redundant
array of independent nodes (RAIN), snapshots, continuous data protection (CDP) and mirroring.
RAID, RAIN, etc., are designed to restore data from a failure situation.
These technologies, however, actually provide self-healing data, not self-healing storage: they
restore data when there is a storage failure and mask storage failures from the applications.
RAID and RAIN do not restore the actual storage hardware.
Autonomic self-healing systems transparently restore both the data and the storage from a failure.
It has been statistically shown that as HDDs proliferate, so will the number of hard disk drive
failures, which can lead to lost data. Analyzing what happens when a HDD fails illustrates the
issue:
If a hard disk drive fails, the drive must be physically replaced, either manually or from an online
pool of drives. Depending on the RAID level, the HDD's data is rebuilt on the spare. RAID
1/3/4/5/6/10/60 can all rebuild the hard disk drive's data, based on parity or mirroring. RAID 0
cannot rebuild the HDD's data.
The time it takes to rebuild the HDD's data depends on the hard disk drive's capacity, speed, and
RAID type. A 1 TB 7,200 rpm SATA HDD with RAID 5 will take approximately 24 hours to 30
hours to rebuild the data, assuming the process is given a high priority.
If the rebuild process is given a low priority and made a background task to be completed in off
hours, the rebuild can take as long as eight days. The RAID group is subject to a higher risk of a
second disk failure or non-recoverable read error during the rebuild, which would lead to lost
data. This is because the rebuild must read every byte on every drive in the RAID group to
reconstruct the data from parity. (RAID 6 and RAID 60 are exceptions.)
SATA drives typically have a rated non-recoverable read error rate of 1 in 10^14: roughly 1 out of
100,000,000,000,000 bits read will have a non-recoverable read error. This means that a seven-drive
RAID 5 group with 1 TB SATA drives will have approximately a 50% chance of failing during a
rebuild, resulting in the loss of the data in that RAID group.
Enterprise-class drives (Fiber Channel or SAS) are rated at 1 in 10^15 for non-recoverable read
errors, which translates into less than a 5% chance of the RAID 5 group having a failure during
a rebuild. RAID 6 eliminates the risk of data loss should a second HDD fail. You pay for that
peace of mind with decreased write performance vs. RAID 5, and an additional parity drive in
the RAID group. Eventually, the hard disk drive is sent back to the factory. Using typical
MTBFs, there will be approximately 40 HDD "service events" per year.
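These probabilities follow directly from the quoted error rates. The sketch below assumes a rebuild must read the six surviving 1 TB drives (taking 1 TB as 10^12 bytes); under those assumptions it yields roughly 38% for SATA, the same order as the ~50% cited (the exact figure depends on drive count and capacity accounting), and under 5% for enterprise drives:

```python
from math import expm1, log1p

# Checks the rebuild-risk figures above for a seven-drive RAID 5 group of
# 1 TB drives: a rebuild must read the 6 surviving drives end to end.
# Drive count and 10^12-bytes-per-TB are assumptions for illustration.

def p_rebuild_failure(drives_read: int, tb_per_drive: float,
                      ber: float) -> float:
    """Probability of >=1 non-recoverable read error during a rebuild."""
    bits = drives_read * tb_per_drive * 1e12 * 8
    # 1 - (1 - ber)**bits, computed stably for a tiny per-bit error rate
    return -expm1(bits * log1p(-ber))

sata = p_rebuild_failure(6, 1.0, 1e-14)        # ~38%, same order as ~50%
enterprise = p_rebuild_failure(6, 1.0, 1e-15)  # under 5%, as stated
print(f"SATA: {sata:.0%}, enterprise: {enterprise:.1%}")
```

The tenfold better error rate of enterprise drives shrinks the rebuild risk by nearly the same factor, which is the trade-off the text describes.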
Best Practice – Consider Autonomic Storage solutions utilizing Standards
New storage systems, including EMC’s VMAX, tackle end-to-end autonomic self healing and
error detection and correction, including silent data corruption (See section titled “Best Practice
– Implement Undetected data corruption technology into environment”, starting on page 69).
In addition, sophisticated algorithms that attempt to "heal-in-place" failed HDDs before requiring
a RAID data rebuild are also implemented. A technology that is currently being developed is a
relatively new concept of "fail-in-place" so that in the rare circumstance when a HDD truly fails
(i.e., it is no longer usable), no service event is required to replace the hard disk drive for a
RAID data rebuild. This would add to the sustainability equation.
The T10 DIF, described below, is a relatively new standard and applies only to SCSI protocol
HDDs (SAS and Fiber Channel). However, as of this writing, there is no standard specification for end-to-end error
detection and correction for SATA hard disk drives. As a result, EMC and others have devised
proprietary solutions for SATA end-to-end error detection and correction methodologies.
The American National Standards Institute's (ANSI) T10 DIF (Data Integrity Field) specification
calls for data to be written in blocks of 520 bytes instead of the current industry standard 512
bytes. The eight additional bytes or "DIF" provide a super-checksum that is stored on disk with
the data. The DIF is checked on every read and/or write of every sector. This makes it possible
to detect and identify data corruption or errors, including misdirected, lost or torn writes. ANSI
T10 DIF provides three types of data protection:
• Logical block guard for comparing the actual data written to disk
• Logical block application tag to ensure writing to the correct logical unit (virtual LUN)
• Logical block reference tag to ensure writing to the correct virtual block
When errors are detected, they can then be fixed by the storage system's standard correction
mechanisms. Self-healing storage solves tangible operational problems in the data center and
allows a more sustainable and efficient environment. This technology reduces service events,
costs, management, the risk of data loss, and application disruptions.
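The guard/application-tag/reference-tag idea can be sketched as follows. This is a toy model only: real T10 DIF uses a specific CRC-16 polynomial and defined tag layouts, while this sketch substitutes a plain CRC32 purely to stay self-contained:

```python
import zlib

# Toy illustration of the 520-byte-sector idea: 512 bytes of data plus an
# 8-byte integrity field (guard + application tag + reference tag).
# Real T10 DIF uses a CRC-16 guard; CRC32 is substituted here.

def write_sector(data: bytes, lun_tag: int, lba: int) -> bytes:
    assert len(data) == 512
    guard = zlib.crc32(data)                       # detects torn/corrupt data
    dif = (guard.to_bytes(4, "big") + lun_tag.to_bytes(2, "big")
           + lba.to_bytes(2, "big"))               # 8 extra bytes -> 520 total
    return data + dif

def read_sector(sector: bytes, lun_tag: int, lba: int) -> bytes:
    data, dif = sector[:512], sector[512:]
    assert zlib.crc32(data) == int.from_bytes(dif[:4], "big"), "corrupt data"
    assert int.from_bytes(dif[4:6], "big") == lun_tag, "wrong LUN (misdirected)"
    assert int.from_bytes(dif[6:8], "big") == lba, "wrong block (lost write)"
    return data

sector = write_sector(b"\x42" * 512, lun_tag=7, lba=1234)
assert read_sector(sector, 7, 1234) == b"\x42" * 512   # clean read passes
```

Because the checks run on every read and write, a misdirected, lost, or torn write is caught at the I/O that encounters it, rather than surfacing later as silent corruption.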
Best Practice – Implement Undetected data corruption technology into environment
Another problem with HDDs that is rarely mentioned but is quite prevalent is "silent data
corruption." Silent data corruption(s) are storage errors that go unreported and undetected by
most storage systems, resulting in corrupt data being provided to an application with no
warning, logging, error messages, or notification of any kind.
Most storage systems do not detect these errors, which occur on average with 0.6% of SATA
HDDs and 0.06% of enterprise HDDs over 17 months.12 Silent data corruption occurs when
RAID does not detect data corruption errors, such as misdirected or lost writes. It can also occur
with a “torn write”, data that is partially written and merges with older data, so the data ends up
part original data and part new data. Because the hard disk drive does not recognize the errors,
the storage system is not aware of it either, so there is no attempt at a fix. See the section titled
“Autonomic self healing systems” starting on page 66, above for additional information.
Storage Media – Flash Disks
A number of techniques can be applied to reduce power consumption in a storage system and
therefore increase efficiency. Disk drive technology vendors are developing low spin disks,
which can be slowed or stopped to reduce power consumption when not in use. Another
12 "An Analysis of Data Corruption in the Storage Stack," L.N. Bairavasundaram et al., presented
at FAST '08 in San Jose, Calif.
technology is Flash disks. Caching techniques that reduce disk accesses, and the use of 2.5-
inch rather than 3.5-inch form factors, can reduce voltage requirements from 12 volts to 6 volts. An
industry-wide move toward higher-capacity Serial ATA (SATA) drives and 2.5-inch disks is
under way, which some claim will lead to better energy performance.
Best Practice – Utilize low power flash technologies
A best practice is to utilize FLASH storage as applicable to the Data Center Tiered Storage
requirements. Low power technologies are starting to enter the data center. With the advent of
Solid State Disks (SSDs), this enabling semiconductor technology can and will have a major
impact on power efficiencies.
For example, an SSD system can be based on double data rate (DDR) DRAM technology and
integrated with battery backup. It requires a Fiber Channel (FC) interface consistent with
conventional hard drives. This SSD technology has been available for years and has
established itself in a niche market that serves large processing-intensive government projects
and companies involved in high-volume, high-speed/low-latency transactions such as stock
trading systems.
For additional information on Flash Disk Technologies, please refer to the 2008 EMC Proven
Professional Knowledge Sharing article titled “Crossing the Great Divide in Going Green:
Challenges and Best Practices in Next Generation IT Equipment”.
Server Virtualization
With the advent of virtualization at the host level, there are new possibilities to balance
application workloads as per the requirements of self-organized systems. VMware's infrastructure now
provides two new capabilities:
1. resource pools, to simplify control over the resources of a host
2. clusters, to aggregate and manage the combined resources of multiple hosts as a single
collection
In addition, VMware now has functionality called Distributed Resource Scheduling (DRS) that
dynamically allocates and balances computing capacity across the logical resource pools
defined for the infrastructure.
DRS continuously monitors utilization across the resource pools and intelligently allocates
available resources among virtual machines based on resource allocation rules that reflect
business needs and priorities. Virtual machines operating within a resource pool are not tied to
the particular physical server on which they are running at any given point in time. When a
virtual machine experiences increased load, DRS first evaluates its priority against the
established resource allocation rules and then, if justified, allocates additional resources by
redistributing virtual machines among the physical servers. VMotion executes the live migration
of the virtual machine to a different server with complete transparency to end users. The
dynamic resource allocation ensures that capacity is preferentially dedicated to the highest
priority applications, while at the same time maximizing overall resource utilization.
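The redistribution decision can be sketched as a greedy step. The data model, threshold, and selection rules below are invented for illustration; real DRS also weighs shares, reservations, affinity rules, and migration cost:

```python
# Hedged sketch of the DRS idea: pick the busiest host and live-migrate
# its smallest VM to the least-loaded host. All figures are invented.

def rebalance_step(hosts: dict[str, dict[str, int]], threshold: int = 80):
    """hosts maps host -> {vm_name: load%}; returns a proposed migration."""
    load = {h: sum(vms.values()) for h, vms in hosts.items()}
    busiest = max(load, key=load.get)
    calmest = min(load, key=load.get)
    if load[busiest] <= threshold or busiest == calmest:
        return None                       # cluster already balanced enough
    vm = min(hosts[busiest], key=hosts[busiest].get)   # cheapest to move
    hosts[calmest][vm] = hosts[busiest].pop(vm)        # 'VMotion' the VM
    return (vm, busiest, calmest)

cluster = {"esx1": {"web": 50, "db": 40}, "esx2": {"batch": 20}}
print(rebalance_step(cluster))   # ('db', 'esx1', 'esx2')
```

Run in a loop, such a step keeps load converging toward balance as demand shifts, which is the behavior the paragraph above describes.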
Best Practice – Implement DRS
Utilizing DRS and VirtualCenter provides a view and management of all resources in the cluster,
emulating a self-organized solution. As shown in Figure 15 – Self Organized VM application
controller, below, a global scheduler within VirtualCenter enables resource allocation and
monitoring for all virtual machines running on ESX Servers that are part of the cluster.
Figure 15 – Self Organized VM application controller
DRS provides automatic initial virtual machine placement on any of the hosts in the cluster, and
also makes automatic resource relocation and optimization decisions as hosts or virtual
machines are added or removed from the cluster. DRS can also be configured for manual
control, in which case it only recommends that you can review and carry out. DRS provides
several additional benefits to IT operations:
• Day-to-day IT operations are simplified as staff members are less affected by localized
events and dynamic changes in their environment. Loads on individual virtual machines
invariably change, but automatic resource optimization and relocation of virtual
machines reduces the need for administrators to respond, allowing them to focus on the
broader, higher-level tasks of managing their infrastructure.
• DRS simplifies the job of handling new applications and adding new virtual machines.
Starting up new virtual machines to run new applications becomes more a task of
high-level resource planning and determining overall resource requirements than of
reconfiguring and adjusting virtual machine settings on individual ESX Server
machines.
• DRS simplifies the task of extracting or removing hardware when it is no longer needed,
or replacing older host machines with newer and larger capacity hardware. To remove
hosts from a cluster, you can simply place them in maintenance mode, so that all virtual
machines currently running on those hosts are reallocated to other resources of the
cluster. After monitoring the performance of remaining systems to ensure that adequate
resources remain for currently running virtual machines, you can remove the hosts from
the cluster to allocate them to a different cluster, or remove them from the network if the
hardware resources are no longer needed. Adding new resources to the cluster is also
straightforward, as you can simply drag and drop new ESX Server hosts into a cluster.
Network
Best Practice - Architect Your Network to Be the Orchestration Engine for Automated Service Delivery (The 5 S’s)
With respect to network technologies, we must rethink how, going forward, we will build out data
communication networks. This is especially true with new data center architectures being
developed to address the needs for efficiency and sustainability. The challenge is to develop a
set of best practices for a cloud-ready network that automates service delivery. Built correctly,
the network can become the orchestration engine for your cloud and sustainability
strategy. Cloud services need a network that embraces the five architectural goals. Design a
cloud network with these principles[12]:
Scalable: Your cloud network must scale without adding complexity or sacrificing performance. This
means scaling in lockstep with the dynamic consumption of software, storage, and application
resources without “throwing more infrastructure” at the problem.
Simplified: To achieve scale and reduce operational costs, you must simplify your network design. Fewer
moving parts, collapsed network tiers, a single operating system if possible, and fewer
interfaces ensure scalability and pave the way for automation.
Standardized: Cloud computing requires commoditized, standards-based technologies. Likewise, your cloud
network cannot be based on proprietary components that increase the capital and operation
costs associated with delivering cloud services.
Shared: Cloud networks must be built with multi-tenancy in mind. Different customers, departments, and
lines of business will consume various cloud services with their own unique requirements. A
shared network with differentiated service levels is required.
Secure: Cloud networks must embrace security on two levels:
1) controls built into the fabric that prevent wide-scale infrastructure and application
breaches and disruptions
2) overlying identity and data controls to combat regulatory, privacy, and liability concerns.
This security must be coordinated so that the network secures traffic along three key
connections: among virtual machines within the data center, between data centers, and
from clients to data centers.
It is important to note that these five principles are interrelated. Take simplicity, for example. It is
critical to simplify your cloud network infrastructure to ensure scalability as well as reduce the
number of moving parts needed to secure the end-to-end platform. However, to simplify, you
must standardize the components and build on an open network platform as well as use shared
components to get economies of scale.
Best Practice - Select the Right Cloud Network Platform
The cloud network is a platform. However, few companies and even fewer vendors ask the
question “A platform for what?” The answer is automation. Successful cloud networking requires
building a network with automation as part of its core focus. Automation makes troubleshooting,
security, provisioning, and other service delivery components less expensive and more reliable.
To bake automation into your cloud services, select a vendor that embraces automation in four
areas:
Cloud network infrastructure
Routing, switching, and network appliances are the core components of delivering high-
performance cloud services. A best practice is to find a vendor whose infrastructure provides
scalability and standardization, the building blocks for automation. In addition, the network
infrastructure must be application-aware to provide the granularity for quality-of-service and
quality-of-experience delivery requirements.
Cloud network operating system (OS)
Hardware is the core platform, but the OS is the key to automation. Look for a vendor that
provides an OS that is standardized, shared, and simplified across its entire routing, switching,
and appliance infrastructure. A good platform OS has hooks to automate delivery across the
infrastructure, as well as an open development ecosystem for third-party, cloud-specific
applications to run natively on the network.
Cloud network management systems
The infrastructure and OS are responsible for enabling automation, but the management system
is responsible for orchestrating it. A best practice is to select a vendor with a single
management system for its entire portfolio. Cloud management requires rich policy interfaces
and the ability to define differentiated services in the cloud.
Cloud network security
Security appliances are part of your platform infrastructure, but you will also need end-to-end
security automated across the cloud. A best practice is to select a vendor with baked-in security
that protects the network cloud at the macro (broadly across all infrastructure and the OS) and
micro (granularly, to protect individual sessions) levels.
Best Practice – Consider implementing layer 2 Locator/ID Separation
With advances in the ability to move applications via virtualization, such as VMotion, one of the
issues and challenges is the current IETF IP routing and addressing architecture. Having a
single numbering space, the "IP address," serve both host transport session identification
and network routing creates scaling and interoperability issues. This is particularly true with
new infrastructure architectures outlined in the subsequent section titled “Infrastructure
Architectures”, starting on page 80. We can realize a number of scaling benefits by separating
the current IP address into separate spaces for Endpoint Identifiers (EIDs) and Routing Locators
(RLOCs); among them are:
1. Reduction of the routing table size in the "default-free zone" (DFZ). RLOCs would be
assigned by internet providers at client network attachment points greatly improving
aggregation and reducing the number of globally visible, routable prefixes.
2. Cost-effective multi-homing for sites that connect to different service providers, including
Cloud Service Providers, so that providers can control their own policies for packet flow
into the site without using extra routing table resources of core routers.
3. Easing of renumbering burden when clients change providers. Because host EIDs are
numbered from a separate, non-provider assigned and non-topologically-bound space,
they do not need to be renumbered when a client site changes its attachment points to
the internal or external network.
4. Traffic engineering capabilities that can be performed by network elements and do not
depend on injecting additional state into the routing system.
5. Mobility without address changing. Existing mobility mechanisms will be able to work in a
locator/ID separation scenario. It will be possible for a host (or a cluster of physical or
virtual hosts) to move to a different point in the network topology (Internal, external or
hybrid Clouds) either retaining its initial address or acquiring a new address based on
the new network location. A new network location could be a physically different point in
the network topology or the same physical point of the topology with a different provider.
Currently, the IETF (Internet Engineering Task Force) is working on standards13 that will
implement this type of protocol. Cisco is a major driver in this endeavor. By decoupling end
13 IETF Locator/ID Separation Protocol (LISP) - http://tools.ietf.org/html/draft-ietf-lisp-05
point locators (addresses) from the routing information, dynamic network changes become
possible, allowing cloud providers and consumers to be more efficient and sustainable.
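The map-and-encapsulate idea behind the mobility benefit can be sketched as follows. The addresses and mapping table are invented for illustration, and a real LISP tunnel router resolves mappings through the LISP mapping system rather than a local dictionary:

```python
# Hypothetical sketch of the locator/ID split: the host keeps a stable
# EID while the mapping tracks its current RLOC, so a VM can move
# (e.g. via VMotion) without renumbering. All addresses are invented.

mapping = {"10.0.0.5": "203.0.113.1"}        # EID -> current RLOC

def encapsulate(dst_eid: str, payload: str) -> tuple[str, str]:
    """Tunnel-router lookup: outer header uses the RLOC, inner keeps the EID."""
    rloc = mapping[dst_eid]
    return (rloc, f"inner[{dst_eid}] {payload}")

before = encapsulate("10.0.0.5", "hello")
mapping["10.0.0.5"] = "198.51.100.9"         # VM moved to another site
after = encapsulate("10.0.0.5", "hello")
print(before[0], "->", after[0])             # only the locator changed
assert before[1] == after[1]                 # the endpoint identity did not
```

Only the mapping entry changes when the workload moves; sessions bound to the EID are undisturbed, which is what makes cloud-scale mobility practical.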
Best Practice – Build a Case to Maximize Cloud Investments
Regardless of whether the plan is to consume or provide cloud services, a best practice is to
select a vendor that provides the right infrastructure, OS, management, and security. Articulate
a compelling business case by considering these business and technical recommendations.
Best Practice - Service Providers - Maximize and sustain Cloud
Investments
The recommendation is that CEOs and business executives at cloud providers should consider
optimizing revenue with a healthy mix of small, medium-size, and large companies. Smaller
firms provide short sales cycles and quick cash, but enterprises provide long-term profitability.
Focus on monetizing your assets by starting with just one or two of the cloud flavors; do not
overstretch and do IaaS, PaaS, and SaaS out of the gate. Allow customers to cut through the
clutter and focus on cost by providing tiered pricing and a self-service portal where users can
immediately pay by plunking down a credit card.
For CTOs and technical executives at cloud providers, the recommendation is to demonstrate that your
cloud network starts on a low-cost, fixed-price service and quickly scales capacity. Provide a
road map for how you will scale across the IaaS, PaaS, and SaaS flavors with proper network
capacity to consume all services.
Offer cloud network service-level agreements (SLAs) that tackle accessibility, reliability, and
performance. To offer a sustainable offering, consider that cloud services are standardized, but
SLAs are customized. It is important to demonstrate that the offering can tailor SLAs and
provide business-specific granularity.
It is also a best practice to design cloud networks with visibility and quality-of-service reports for
customers to run their own reports and audits, but also dedicate ample resources to
accommodate customers auditing your services to ensure they are SAS 70 Type II- and PCI-
compliant.
Best Practice - Enterprises - maximize and sustain Cloud Investments
It is a best practice that CIOs, line-of-business managers, and other enterprise business
executives should consider focusing on cloud services as providing cost savings in the short
term and automation and flexibility as driving competitive advantages in the long term. Identify
business processes that drive revenue or customer interactions, but do not require mission
critical infrastructure. Demonstrate business value by putting them in the cloud first. Restructure
how you position SLAs with business peers. Focus on being a service provider, and drive
conversations from technical SLAs to business outcome SLAs.
A best practice for Network Architects and senior infrastructure leaders is to build an
environment with a hybrid internal/public cloud in mind. Use basic building blocks like virtualized
machines and high-performance networks to ensure that you can scale quickly. Provide
granular, real-time visibility across your cloud network. This allows service-level monitoring, cost
tracking, integration with security operations, and detailed audit logs. Build identity management
hooks into the cloud to automate user provisioning; enforce proper access management of
partners, suppliers, and customers; and appease auditors.
Best Practice – Understand Information Logistics and Energy
transposition tradeoffs
You must also consider network efficiencies in terms of creating a sustainable business or
environmental model. As will be discussed in the business practices section titled “
Economics,” starting on page 162, it will be shown that the most efficient network may not be
the most macro sustainable network.
As shown in Figure 16 - Energy in Electronic Integrated Circuits, on page 78, each network
switch has a line card, a card that transmits data packets over a digital network. Each line
card has many CMOS ASICs, and each ASIC has millions of CMOS gates.
Figure 16 - Energy in Electronic Integrated Circuits
As shown in Equation 7 – Energy Consumed by a CMOS ASIC and Equation 8 – Power
Consumed by a CMOS ASIC, both shown below, the energy consumed is a function of the
energy of each gate plus the product of the capacitance and the voltage level.
Equation 7 – Energy Consumed by a CMOS ASIC14

Energy = Σ E_Gate + Σ ½ · C_Wire · V²
The power consumed by the ASIC is the product of the energy consumed per bit and the data bit rate. As the bit rate increases, the power increases linearly.
Equation 8 – Power Consumed by a CMOS ASIC15
14 IEEE ASIC Design Journal, Nov 2007
15 IEEE ASIC Design Journal, Nov 2007
Power = [ Σ E_Gate + Σ ½ · C_Wire · V² ] × BitRate
The good news is that Moore’s Law benefits us: switching energy is decreasing over time, as shown in Figure 17 - Moore's Law - Switching Energy, on page 79. However, network use is increasing even faster.
Figure 17 - Moore's Law - Switching Energy16
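Equations 7 and 8 can be made concrete with a small numeric sketch. All device values below (gate energies, wire capacitances, voltage) are assumed round numbers for illustration, not figures from the cited journal:

```python
# Illustrative numeric sketch of Equations 7 and 8 (all device values are
# assumptions): energy per bit is the sum of gate switching energies plus
# 1/2 * C * V^2 wire-charging terms; power scales linearly with bit rate.

def asic_power_watts(gate_energies_j, wire_caps_f, voltage_v, bit_rate_bps):
    """Power = (sum(E_gate) + sum(0.5 * C_wire * V^2)) * bit_rate."""
    energy_per_bit = sum(gate_energies_j) + sum(
        0.5 * c * voltage_v ** 2 for c in wire_caps_f)
    return energy_per_bit * bit_rate_bps

# Hypothetical line-card ASIC: one million gates at 1 fJ per switch,
# 100,000 wires of 10 fF each, operating at 1.0 V.
gates = [1e-15] * 10**6
wires = [10e-15] * 10**5
p_10g = asic_power_watts(gates, wires, 1.0, 10e9)  # at 10 Gb/s
p_40g = asic_power_watts(gates, wires, 1.0, 40e9)  # at 40 Gb/s
print(p_10g, p_40g)  # quadrupling the bit rate quadruples the power
```

Running the sketch shows the linear relationship the text describes: the 40 Gb/s figure is exactly four times the 10 Gb/s figure.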
It is also interesting to note that in some situations it is more energy efficient to use physical transport than the IP network. Based on the equations defining power and energy utilization, when transporting large amounts of data for backup, replication, or general data movement purposes, a physical move is more efficient. Take, for example, the case shown in Figure 18 - Data by physical vs. Internet transfer, on page 80. In this case, transferring 9 PB of data by physically moving it would be more efficient than transferring the data over the Internet in an equivalent time interval. In addition, the kilograms of CO2 emitted are substantially reduced.
16 Intel, 2007
Figure 18 - Data by physical vs. Internet transfer17
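The Figure 18 tradeoff can be estimated with a rough model. The link speed and per-bit transport energy below are assumed round numbers, not measurements from the figure:

```python
# Rough model of the physical-vs-network tradeoff: how long and how much
# energy it takes to push a large dataset over a network link. The 10 Gb/s
# link and the 10 nJ/bit end-to-end energy cost are assumptions.

PB = 10**15  # bytes

def transfer_days(num_bytes, link_bps):
    """Days needed to push the data over a sustained link."""
    return num_bytes * 8 / link_bps / 86400

def network_energy_kwh(num_bytes, joules_per_bit):
    """Network transport energy from an assumed end-to-end cost per bit."""
    return num_bytes * 8 * joules_per_bit / 3.6e6  # joules -> kWh

data = 9 * PB
days = transfer_days(data, 10e9)        # months on a dedicated 10 Gb/s link
kwh = network_energy_kwh(data, 1e-8)    # assuming 10 nJ per transported bit
print(round(days), "days,", kwh, "kWh")
```

Even on a dedicated 10 Gb/s link the transfer takes roughly a quarter of a year, which is why shipping storage media can win on both time and energy at this scale.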
So, what are the best practices for a vendor or business in terms of an efficient network? The first is to choose a vendor with the lowest power and smallest footprint per unit (lambda, port, and bit). Leverage long-haul technologies and ROADMs (reconfigurable optical add-drop
multiplexer) to reduce intermediate regeneration. Push fiber and (passive) WDM closer to the
end user and eliminate local exchanges. Aggregate multiple service networks onto a single
optical backhaul network. Concentrate higher layer routing into fewer, more efficient data
centers and CO’s (Central Offices). Use service demarcation techniques to allow lower layer
switching, aggregation, and backhaul all the way to the core.
Infrastructure Architectures
In order to achieve efficiency and sustainability in the data center or wherever the IT entity is
located, it is important to understand the various architectures that are available. Depending on
17 Rod Tucker, ARC Special Research Centre for Ultra-Broadband Information Networks (CUBIN)
the use case and business requirements, some architecture types may be a better fit. In some
use cases, implementing all or a subset may make sense. Figure 19 – Sustainability Ontology – Infrastructure Architectures, shown below on page 84, outlines the architectures available today.
The first is the legacy data center, which consists of a physical location with centralized IT
equipment, power, cooling and support. The others are utility computing, warehouse scale
machines and cloud computing. Cloud Computing has a few variants that will be discussed in
subsequent sections. Warehouse Scale Machines are unique in the sense that this architecture
supports specific business models. Businesses that support a few specific applications are one
thing; being able to scale to thousands of servers spanning multiple data centers across the
globe, such as Google, is another.
Datacenters are essentially very large devices that consume electrical power and produce heat. The datacenter’s cooling system removes that heat, consuming additional energy in the process. It is not surprising, then, that the bulk of the construction costs of a datacenter are proportional to the amount of power delivered and the amount of heat to be removed. In other words, most of the money is spent either on power conditioning and distribution or on cooling systems.
Data Center Tier Classifications
The overall design of a datacenter is often classified as belonging to “Tier I–IV.” Tier I
datacenters have a single path for power and cooling distribution, without redundant
components. Tier II adds redundant components to this design (N + 1), improving availability.
Tier III datacenters have multiple power and cooling distribution paths but only one active path.
They also have redundant components and are concurrently maintainable, that is, they provide
redundancy even during maintenance, usually with an N + 2 setup. Tier IV datacenters have
two active power and cooling distribution paths, redundant components in each path, and are
supposed to tolerate any single equipment failure without impacting the load. These tier
classifications are not 100% precise. Most commercial datacenters fall somewhere between
tiers III and IV, choosing a balance between construction costs and reliability. Real-world
datacenter reliability is also strongly influenced by the quality of the organization running the
datacenter, not just by the datacenter’s design. Typical availability estimates used in the
industry range from 99.7% availability for tier II datacenters to 99.98% and 99.995% for tiers III
and IV, respectively.
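Those availability percentages translate directly into expected annual downtime, which is often easier to reason about. A minimal sketch, using an 8,760-hour year:

```python
# Convert the tier availability figures quoted above into expected
# hours of downtime per year.

HOURS_PER_YEAR = 24 * 365  # 8,760

def downtime_hours_per_year(availability_pct):
    """Expected downtime per year at a given percentage availability."""
    return (1 - availability_pct / 100) * HOURS_PER_YEAR

for tier, avail in (("Tier II", 99.7), ("Tier III", 99.98), ("Tier IV", 99.995)):
    print(f"{tier} ({avail}%): {downtime_hours_per_year(avail):.2f} h/year")
```

So the quoted figures correspond to roughly 26 hours of downtime per year for Tier II, under 2 hours for Tier III, and under half an hour for Tier IV.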
Datacenter sizes vary widely. Two thirds of US servers are housed in datacenters smaller than
5,000 sq ft and with less than 1 MW of critical power. Most large datacenters are built to host
servers from multiple companies (often called co-location datacenters, or “colos”) and can
support a critical load of 10–20 MW. Very few datacenters today exceed 30 MW of critical
capacity.
The data center as we know it is changing. Not only is the data center changing physically, in terms of power, cooling, and other metrics, but how a business uses its information infrastructure is changing as well. As shown in Figure 19 – Sustainability Ontology – Infrastructure Architectures, on page 84, data center architectures are evolving, and the options available to the business or IT department are rapidly increasing. Beyond the standard data center architecture, options such as Utility Computing, Warehouse Scale Data Centers, and the various flavors of Cloud offerings are now available. So how does one determine the best option, or the best mix of technologies and architectures, to meet sustainability and business requirements? Best practices will be discussed.
“Cloud Computing” is rising quickly, with its data centers growing at an unprecedented rate. However, this growth is accompanied by concerns about privacy, efficiency at the expense of resilience, and environmental sustainability, because of the dependence on Cloud vendors such as Google, Amazon, EMC, and Microsoft. There is, however, an alternative model for the Cloud conceptualization: a paradigm for Clouds in the community, utilizing networked personal computers for liberation from the centralized vendor model.
Community Cloud Computing offers an alternative architecture, created by combining the Cloud
with paradigms from Grid Computing, principles from Digital Ecosystems, and sustainability
from Green Computing, while remaining true to the original vision of the Internet. It is more
technically challenging than Cloud Computing, dealing with distributed computing issues,
including heterogeneous nodes, varying quality of service, and additional security constraints.
However, these challenges are surmountable, and given the need to retain control over our digital lives and the potential environmental consequences, this is a challenge that should be pursued.
The recent development of Cloud Computing provides a compelling value proposition for
organizations to outsource their Information and Communications Technology (ICT)
infrastructure. However, there are growing concerns over the control ceded to large Cloud
vendors, especially the lack of information privacy. In addition, the data centers required for
Cloud Computing are growing exponentially, creating an ever-increasing carbon footprint, and
therefore raising environmental concerns. The distributed resource provision from Grid
Computing, distributed control from Digital Ecosystems, and sustainability from Green
Computing, can remedy these concerns. Therefore, Cloud Computing combined with these
approaches would provide a compelling socio-technical conceptualization for sustainable
distributed computing that utilizes the spare resources of networked personal computers to
collectively provide the facilities of a virtual data center and form a Community Cloud. This
essentially reformulates the Internet to reflect its current uses and scale, while maintaining the original intention of sustainability in the face of adversity. Extra capabilities can be embedded into the infrastructure until they become as fundamental and invisible as moving packets is today.
Cloud Computing is likely to have the same impact on software that foundries have had on the
hardware industry. At one time, leading hardware companies required a captive semiconductor
fabrication facility, and companies had to be large enough to afford to build and operate it
economically. However, processing equipment doubled in price every technology generation. A
semiconductor fabrication line costs over $3B today, so only a handful of major “merchant”
companies with very high chip volumes, such as Intel and Samsung, can still justify owning and
operating their own fabrication lines. This motivated the rise of semiconductor foundries that
build chips for others, such as Taiwan Semiconductor Manufacturing Company (TSMC).
Foundries enable “fab-less” semiconductor chip companies whose value is in innovative chip
design: A company such as nVidia can now be successful in the chip business without the
capital, operational expenses, and risks associated with owning a state-of-the-art fabrication
line. Conversely, companies with fabrication lines can time-multiplex their use among the
products of many fab-less companies, to lower the risk of not having enough successful
products to amortize operational costs. Similarly, the advantages of the economy of scale and
statistical multiplexing may ultimately lead to a handful of Cloud Computing providers who can
amortize the cost of their large datacenters over the products of many “datacenter-less”
companies.
Figure 19 – Sustainability Ontology – Infrastructure Architectures
Cloud Overview
Cloud Computing is the use of Internet-based technologies for the provision of services,
originating from the cloud as a metaphor for the Internet, based on depictions in computer
network diagrams to abstract the complex infrastructure it conceals. It can also be seen as a commercial evolution of academic-oriented Grid Computing, succeeding where Utility Computing struggled, while making greater use of the self-management advances of Autonomic Computing, as discussed in the section titled “Autonomic self healing systems”, on page 66.
Cloud Computing offers the illusion of infinite computing resources available on demand, with
the elimination of upfront commitment from users, and payment for the use of computing
resources on a short-term basis as needed.
Furthermore, it does not require the node providing a service to be present once its service is
deployed. It is being promoted as the cutting-edge of scalable web application development, in
which dynamically scalable and often-virtualized resources are provided as a service over the
Internet, with users having no knowledge of, expertise in, or control over the technology
infrastructure of the Cloud supporting them. It currently has significant momentum at two extremes of the web development industry: the consumer web technology incumbents, who have resource surpluses in their vast data centers, and the consumers and start-ups that do not have access to such computational resources. Cloud Computing conceptually incorporates
Software-as-a-Service (SaaS), Web 2.0 and other technologies with reliance on the Internet,
providing common business applications online through web browsers to satisfy the computing
needs of users, while the software and data are stored on the servers.
The cloud has three core attributes. First, clouds are built differently than traditional IT. Rather than dedicating specific infrastructure elements to specific applications, the cloud uses shared pools that applications can draw on dynamically as needed.
This pooling can save capital expenditures, since business units share the resources, and can provide a better application experience, since more resources are available to an application when it needs them, assuming the right QoS management infrastructure.
Second, clouds are operated differently than traditional IT. Most IT management today is about managing specific point types: devices, applications, network links, and so on. Managing a cloud is all about managing service delivery: one manages outcomes rather than individual components. The cloud takes the concept of "automated" to a new level, an entirely different operational model biased toward low-touch and zero-touch IT operations. Please refer to the section titled “Self Organizing Systems”, starting on page 50, for additional details.
Finally, clouds are consumed differently than traditional IT. You pay for what you use, when you use it, and it is convenient to consume. Compare that with the traditional model of paying for all the physical infrastructure associated with your application, whether you are using it or not: “pay for the power I use, rather than buying a power plant ...”
Figure 20 - Cloud Topology
Figure 20 - Cloud Topology, shown above, depicts the typical run-time configuration of Cloud Computing, in which consumers visit an application served by the central Cloud, housed in one or more data centers. The Cloud encompasses both resource consumption and resource provisioning, and the coordinator role for resource provisioning is centrally controlled.
Even if the central node is implemented as a distributed grid, which is typical of a standard data
center, control is still centralized. Providers, who are the controllers, are usually companies with
other web activities that require large computing resources, and in their efforts to scale their
primary businesses, have gained considerable expertise and hardware. For them, Cloud
Computing is a way to resell these as a new product while expanding into a new market.
Consumers include everyday users, Small and Medium sized Enterprises (SMEs), and
ambitious start-ups whose innovation potentially threatens the incumbent providers.
Figure 21 - Cloud Computing Topology
Cloud Layers of Abstraction
While there is significant buzz around Cloud Computing, there is little clarity over which offerings qualify as typical use cases or how they interrelate with other solutions. The key to resolving this confusion is the realization that the various offerings fall into different levels of abstraction, as shown in Figure 21 - Cloud Computing Topology above, and are focused at different market segments.
Infrastructure-as-a-Service (IaaS): At the most basic level of Cloud Computing offerings, providers such as Amazon and Mosso supply machine instances to developers. These instances essentially behave like dedicated servers that are controlled by the developers, who therefore have full responsibility for their operation. Once a machine reaches its performance limits, the developers have to manually instantiate another machine and scale their application out to it. This service is intended for developers who can write arbitrary software on top of the infrastructure with only small compromises in their development methodology.
Platform-as-a-Service (PaaS): One level of abstraction up, services like Google App Engine provide a programming environment that abstracts machine instances and other technical details from developers. The programs are executed over data centers, without concerning the developers with matters of allocation. In exchange, the developers must accept some constraints that the environment imposes on their application design, for example, the use of key-value stores instead of relational databases.
Note that “key-value stores” is defined as a distributed storage system for structured data that
focuses on scalability, at the expense of the other benefits of relational databases. Examples
include Google’s "BigTable" and Amazon’s SimpleDB.
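As an illustration of the access pattern, here is a minimal in-memory stand-in; it is a hypothetical sketch, since real key-value stores such as BigTable and SimpleDB are distributed and persistent:

```python
# Minimal in-memory stand-in for the key-value access pattern: schema-less
# put/get by key, with no joins or relational queries. Hypothetical sketch;
# production stores are distributed, replicated, and persistent.

class KeyValueStore:
    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

store = KeyValueStore()
store.put("user:42:name", "Alice")   # structure is encoded in the key itself
print(store.get("user:42:name"))
```

The application trades relational features (joins, ad-hoc queries) for an interface that shards and scales easily, which is the tradeoff the paragraph above describes.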
Software-as-a-Service (SaaS): At the consumer-facing level are the most popular examples of
Cloud Computing, with well-defined applications offering users online resources and storage.
This differentiates SaaS from traditional websites or web applications, which do not interface
with user information (e.g. documents) or do so in a limited manner. Popular examples include
Microsoft’s (Windows Live) Hotmail, office suites such as Google Docs and Zoho, and online
business software such as Salesforce.com. We can categorize the roles of the various entities
to better understand Cloud Computing.
The vendor as resource provider has already been discussed. The application developers utilize
the resources provided, building services for the end users. This separation of roles helps define
the stakeholders and their differing interests. However, actors can take on multiple roles, with
vendors also developing services for the end users, or developers utilizing the services of others
to build their own services. Yet, within each Cloud, the role of provider, and therefore the
controller, can only be occupied by the vendor providing the Cloud.
It is also important to consider Cloud interfaces. In order to allow a business to achieve
sustainability utilizing an external or internal cloud resource, it is important to consider
developing a standard interface or API that would allow users to interoperate between various
cloud implementations as well as be able to federate each entity. This topic will be covered in
the section titled “Standards”, starting on page 135.
Cloud Computing Concerns
The Cloud Computing model is not without concerns. They include:
Failure of Monocultures:
The uptime of Cloud Computing-based solutions (uptime being a measure of the time a computer system has been running) is an advantage compared to businesses running their own infrastructure, but we often overlook the co-occurrence of downtime in vendor-driven monocultures. The use of globally decentralized data centers for vendor Clouds minimizes failure, aiding adoption. However, when a Cloud fails, there is a cascade effect, crippling all organizations dependent on that Cloud, and all those dependent upon them.
This was illustrated by the Amazon S3 Cloud outage, which disabled several other dependent businesses. Failures are now system-wide, instead of being partial or localized. Therefore, the efficiencies gained from centralizing infrastructure for Cloud Computing come increasingly at the expense of the Internet’s resilience.
Convenience vs. Control
The growing popularity of Cloud Computing comes from its convenience, but it also brings vendor control, an issue of ever-increasing concern. For example, Google Apps for in-house e-mail typically provides higher uptime, but its failures highlight the lock-in that comes from depending on vendor Clouds. The even greater concern is the loss of information privacy, with vendors having full access to the resources stored on their Clouds. Both the British and US governments are considering a ‘G Cloud’ for government business applications. For SMEs and start-ups in particular, the provider-consumer relationship that Cloud Computing fosters between the owners of resources and their users could be detrimental, as there is a potential conflict of interest for the providers: they profit by providing resources to up-and-coming players, but also wish to maintain dominant positions in their consumer-facing industries.
General distrust of external service providers
As soon as you say the word "cloud," people immediately imagine an ugly world where big portions of critical IT are put in the hands of vendors they do not know and do not trust, similar to outsourcing. The difference is that private clouds are about efficiency, control, and choice, not to mention sustainability.
2010 EMC Proven Professional Knowledge Sharing
90
Choice means the business decides whether everything runs internally, externally, or in any mix it chooses. Choice also means that multiple service providers will compete for your business, and it will be easy to switch between them if you need to for some reason. Switching between providers is an issue and is addressed in the section titled “Standards,” starting on page 135.
Concerns about virtualizing the majority of server and desktop workloads
We can understand that there may be some delay between the actual capabilities of a given technology and the general availability of those capabilities. Virtualization is no exception. Fortunately, a few vendors, such as EMC and Cisco, have collaborated (V-Block solutions) to develop a set of standard designs, allowing businesses to adopt a well-known and proven solution. Seeing is believing, whether through the solution just mentioned or through one of the thousands of enterprise IT environments that are pushing serious workloads and getting great results from desktop virtualization as well.
Fully virtualized environments are hard to manage
This is true if you try to manage them with tools and processes designed for the physical IT world. Indeed, virtualization efforts often stall because IT leadership fails to recognize that the operational model is very different (and vastly improved!) in the virtual world. To get to a private cloud, or indeed any virtualization at scale, the management and operational model will have to be completely re-engineered. Please refer to the section titled “Information Management,” starting on page 42, for additional details regarding management.
The upside of virtualizing is enormous: the business gains the ability to move to a more sustainable operational and business model by responding far more quickly to changing requirements, as well as providing far higher service levels.
Many environments can't be virtualized onto x86 and hypervisors
That is often true. Many legacy applications are difficult, impractical, or not worth the effort to bring over to an Intel instruction set. The question is: should 20 years of legacy equipment on the data center floor stop a business from moving forward? Which part of your environment is growing faster? We would estimate that applications running on x86 instruction sets are growing faster. In three years, how much of your world will be legacy, and how much will be on newer platforms?
A best practice is to cap the investment in legacy, start building new applications on the new
environment, and selectively migrate from old to new when the opportunity presents itself.
Concerns about security
As discussed in the section titled “Security,” starting on page 138, there are issues, but there are also best practices to address them. It can be argued that fully virtualized environments can be made far more secure than anything in the physical world, at lower cost and with less effort. It is interesting to point out that trillions of dollars flow around the globe every day in the financial cloud, a dynamic and federated environment of shared computing resources. So far, I have not lost a dime.
Industry Standards
Unfortunately, usable industry standards usually develop at an abysmally slow pace. Even when we have them, everyone often implements them differently, defeating the purpose. When it comes to private clouds, there are a few basic and usable standards in place (i.e. OVF, the Open Virtualization Format), with a few more coming, but it is going to take time before we as an industry have this sorted out.
A best practice is to keep open standards in mind, but in the short term take advantage of specific technologies that do the job today, and keep your options open. Please refer to the section titled “Standards,” starting on page 135, for additional information.
Application support for virtualized environments, or only the one the vendor sells
Certain software vendors, such as Oracle, have challenges in making their licensing schemes work in virtualized environments, or may claim to have support concerns. Ironically, these same software vendors often use virtualization to develop their products. This is unfortunate, since the obstacle can be a long-term deterrent to staying with that particular application. A best practice is to voice your business infrastructure and sustainability message to your software vendors, rather than conforming to theirs.
Environmental Impact Concerns
The ever-increasing carbon footprint from the exponential growth of the data centers required for Cloud Computing is another concern. IT is expected to exceed the airline industry in carbon footprint by 2020, raising sustainability concerns.
The industry is being motivated to address the problem by legislation, the operational limit of
power grids (being unable to power any more servers in their data centers), and the potential
financial benefits of increased efficiency. The primary solution is the use of virtualization to
maximize resource utilization, but the problem remains. While these issues are common to
Cloud Computing, they are not flaws in the Cloud concept, but the vendor provisioning methods
and the implementation of Clouds. There are attempts to address some of these concerns, such
as a portability layer between vendor Clouds to avoid lock-in. However, this will not alleviate
issues such as inter-Cloud latency.
An open source implementation of the Amazon EC2 Cloud, called Eucalyptus, allows data centers to execute code compatible with Amazon’s Cloud. This allows the creation of private internal Clouds, avoiding vendor lock-in and providing information privacy, but only for those with their own data center, and so is not really Cloud Computing (which, by definition, avoids owning data centers). Therefore, vendor Clouds remain synonymous with Cloud Computing.
One solution is a possible alternative model for the Cloud conceptualization, created by combining the Cloud with paradigms from Grid Computing, principles from Digital Ecosystems, and sustainability from Green Computing, while remaining true to the original vision of the Internet. This option will be covered in the section titled “Community Cloud”, starting on page 101. This cloud type is a challenging solution for the enterprise, but it may be the ultimate cloud architecture in the long term.
One incentive for cloud computing is that it may be more environmentally friendly. First, reducing the number of hardware components needed to run applications in the company's internal data center, and replacing them with cloud computing systems, reduces the energy used for running and cooling hardware. By consolidating these systems in remote centers, they can be managed more efficiently as a group.
Second, techniques for cloud computing promote telecommuting techniques, such as remote
printing and file transfers, potentially reducing the need for office space, buying new furniture,
disposing of old furniture, having your office cleaned with chemicals and trash disposed, and so
on. They also reduce the need to drive to work and the resulting carbon dioxide emissions.
Threshold Policy Concerns
Let us suppose you have a program that does credit card validation in the cloud, and you hit the crunch of the December buying season. Higher demand would be detected and more instances would be created to fill that demand. As we move out of the buying crunch, the need diminishes and the instances of that resource would be de-allocated and put to other use.
A best practice is to test if the program works. Then, develop, or improve and implement, a
threshold policy in a pilot study before moving the program to the production environment.
Check how the policy detects sudden increases in the demand and results in the creation of
additional instances to fill in the demand. Also, check to determine how unused resources are to
be de-allocated and turned over to other work.
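A threshold policy of the kind described can be sketched in a few lines. The per-instance capacity and high/low-water marks below are illustrative assumptions, not figures from any provider:

```python
# Sketch of a threshold policy: allocate an instance when per-instance load
# crosses a high-water mark, release one when it drops below a low-water
# mark. Capacity and thresholds are illustrative assumptions.

def scale(instances, requests_per_sec, capacity_per_instance=100,
          high=0.8, low=0.3, min_instances=1):
    """Return the new instance count after one policy evaluation."""
    load = requests_per_sec / (instances * capacity_per_instance)
    if load > high:
        return instances + 1              # demand spike: create an instance
    if load < low and instances > min_instances:
        return instances - 1              # demand fell: de-allocate
    return instances

n = scale(2, 250)   # December crunch: 250 req/s overloads 2 instances
n = scale(n, 40)    # post-season lull: capacity is released again
print(n)
```

A pilot study would exercise exactly these two paths: a simulated demand spike should trigger instance creation, and a simulated lull should trigger de-allocation, never dropping below the configured minimum.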
Interoperability Concerns
If a company outsources or creates applications with one cloud-computing vendor, it may find it difficult to change to another vendor that has proprietary APIs and different formats for importing and exporting data. This creates problems in achieving interoperability of applications between the two cloud-computing vendors: you may need to reformat data or change the logic in applications. Although industry cloud-computing standards do not exist for APIs or data import and export, IBM and Amazon Web Services have worked together to make interoperability happen.
Hidden Cost Concerns
Cloud computing providers do not tell you what the hidden costs are. For instance, companies could incur higher network charges from their service providers for storage and database applications containing terabytes of data in the cloud. These charges could outweigh the costs saved on new infrastructure, training new personnel, or licensing new software. In another instance of incurring network costs, companies far from the location of cloud providers could experience latency, particularly when there is heavy traffic.
Best Practice - Assess cloud storage migration costs upfront
Another hidden cost is migration. Today, many cloud storage providers quote a basic cost per gigabyte of capacity. For example, at the time of this publication, the basic cost for Amazon Web Services is approximately $0.15. Pricing for Zetta starts at roughly $0.25 and decreases as more data is stored in the cloud. For archive cloud storage providers that offer additional features like WORM (Write Once Read Many) and information lifecycle management, the basic cost is in the realm of $1.00 per GB.
Therefore, a potential customer should be able to calculate how much disk storage they need and then determine a monthly cost for storing data in the cloud. While this sounds simple, few providers actually mention that basic storage costs are only part of the picture. The issue is migration into the storage cloud.
All providers will charge for data transfers in and out of the cloud based on the volume of data
transferred (typical cost is $0.10 per GB). Some will also charge for metadata functions such as
directory or file attribute listings, and copying or deleting files. While these metadata operation costs are generally minuscule on a per-operation basis (a maximum of $0.01 per 1,000 for Amazon), they can add up based on the number of users the customer has accessing cloud storage data.
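Putting these published figures together, a rough monthly estimate can be sketched. The function name and default rates below are illustrative, taken from the 2010-era numbers cited above; real provider price sheets add tiers, minimums, and access-line fees.

```python
def monthly_cloud_storage_cost(
    stored_gb,                 # total data kept in the cloud this month
    transfer_gb,               # data transferred in or out this month
    metadata_ops,              # directory listings, copies, deletes, ...
    storage_rate=0.15,         # $/GB/month (Amazon S3, circa 2010)
    transfer_rate=0.10,        # $/GB transferred (typical, per the text)
    metadata_rate=0.01 / 1000, # $ per metadata operation (Amazon maximum)
):
    """Rough estimate only; ignores tiered discounts and bulk-load fees."""
    return (stored_gb * storage_rate
            + transfer_gb * transfer_rate
            + metadata_ops * metadata_rate)

# 5 TB stored, 1 TB transferred, 2 million metadata calls
estimate = monthly_cloud_storage_cost(5000, 1000, 2_000_000)  # 750 + 100 + 20 = 870.0
```

Even this simple model shows that transfer and metadata charges, not raw capacity, can dominate for access-heavy workloads.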
Another piece of cloud storage pricing is how a customer actually gets to the data stored in the
cloud. Some cloud storage providers, including Autonomy Zantaz and Iron Mountain Inc.,
support private data lines that connect the customer's infrastructure to the cloud storage
infrastructure. Others, such as Zetta, estimate that Telco circuit and cross-connect fees for customer data access can add up to as much as 20% of the total cost per month. Whether or not this will be an issue depends on the type of data storage and the customer's access patterns.
Perhaps the least well understood cost of cloud storage is the mass transfer of data in or out of
the cloud. Some providers, like Zetta, do not charge transfer fees for data migration into the
cloud. Others, such as Amazon, include a stated pricing plan for large-scale data transfers using
a portable medium, charging a time-based fee for the data load and a handling fee for the
portable device.
Consumers should make sure that every cloud storage request counts. Therefore, a data
migration plan is crucial and things like virus scanners, indexing services and backup software
should be carefully configured so as not to access the cloud storage medium as just another
network drive.
As the cloud continues to evolve, cloud storage providers who can provide the most
sophisticated cost analysis tools will be best suited to help potential customers accurately
determine costs. Yet customers must still look at all potential costs, including transfer, bulk load,
network and on-site appliances as discussed in the section titled “Best Practice – Understand
Information Logistics and Energy transposition tradeoffs”, starting on page 77.
Unexpected behavior concerns
Let us suppose your credit card validation application works well at your company's internal
data center. It is important to test the application in the cloud with a pilot study to check for
unexpected behavior. Examples of tests include how the application validates credit cards, and
how, in the scenario of the December buying crunch, it allocates resources and releases
unused resources, turning them over to other work. If the tests show unexpected results of
credit card validation or releasing unused resources, you will need to fix the problem before
running the application in the cloud.
Security issue concerns
In February 2008, Amazon's S3 and EC2 suffered a three-hour outage. Even though an SLA
provides data recovery and service credits for this type of outage, consumers missed sales
opportunities and executives were cut off from critical business information they needed.
Instead of waiting for an outage to occur, consumers should do security testing on their own,
checking how well a vendor can recover data. The test is very simple; no tools are needed. All
you have to do is to ask for old data you have stored and check how long it takes the vendor to
recover. If it takes too long to recover, ask the vendor why and how much service credit you
would get in different scenarios. Verify that the checksums match the original data.
Use a trusted algorithm to encrypt the data on your local computer, and then try to access the data on a remote server in the cloud using the decryption keys. If you cannot read the data once you have accessed it, either the decryption keys are corrupted or the vendor is using its own encryption algorithm. You may need to address the algorithm with the vendor.
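The round trip can be exercised in a few lines. The XOR keystream below is an illustration only, not vetted cryptography; in practice use a reviewed library (for example, the `cryptography` package's Fernet). Decrypting with the wrong key yields unreadable bytes, which is exactly the failure mode described above.

```python
import hashlib, itertools, os

def _keystream(key, nonce):
    """Counter-mode keystream derived from SHA-256 (illustration only)."""
    for counter in itertools.count():
        yield from hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()

def encrypt(key, plaintext):
    nonce = os.urandom(16)  # fresh nonce per message
    return nonce + bytes(b ^ k for b, k in zip(plaintext, _keystream(key, nonce)))

def decrypt(key, blob):
    nonce, cipher = blob[:16], blob[16:]
    return bytes(b ^ k for b, k in zip(cipher, _keystream(key, nonce)))

key = os.urandom(32)
blob = encrypt(key, b"cardholder record")  # what you would store in the cloud
assert decrypt(key, blob) == b"cardholder record"
```

The point of the test is that only the holder of the original key can recover the plaintext pulled back from the remote server.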
Another issue is key management for data in the cloud. You may want to manage your own private keys to protect the data; check with the vendor on private key management. Amazon, for example, will give you a certificate if you sign up for it.
Software development in cloud concerns
To develop software using high-end databases, the most likely choice is to use cloud server pools at the internal corporate data center and extend resources temporarily with Amazon Web Services for testing purposes. This allows project managers to better control costs, manage security, and allocate resources to the cloud to which a project is assigned. Project managers could
also assign individual hardware resources to different cloud types: Web development cloud,
testing cloud, and production cloud. The cost associated with each cloud type may differ. The
cost per hour or usage with the development cloud is most likely lower than the production
cloud, as additional features, such as SLA and security, are allocated to the production cloud.
The managers can limit projects to certain clouds. For instance, services from portions of the
production cloud can be used for the production configuration. Services from the development
cloud can be used for development purposes only. To optimize assets at varying stages of the
project of software development, the managers can get cost-accounting data by tracking usage
by project and user. If the costs are high, managers can use Amazon EC2 to temporarily extend
resources at a very low cost, if security and data recovery issues have been resolved.
Private Cloud
First, what is a private cloud? Since this is a relatively new concept, there are many definitions. The first aspect is that, unlike a public cloud, a private cloud makes no presumptions as to where applications physically run. Applications can run in a data center the business owns and/or at a service provider's location.
Second, there is no requirement to rewrite the applications to get to a private cloud. Many public
clouds require that applications comply with a pre-defined software stack.
Private clouds also differ through virtualization technology. Utilizing hypervisors, anything that runs on an Intel instruction set can be structured into a private cloud. As a result, there is no need to rewrite applications just to get to a private cloud model.
Finally, the private cloud model assumes that control of the private cloud firmly remains in IT's
hands, and not some external service provider. IT controls service delivery, if they choose; and
security and compliance, if they choose (see section titled “Security”, starting on page 138 for
more details). IT controls the mix of internal and external resources, if they choose, or whether
they want an IaaS, PaaS, or SaaS model.
To summarize the definition of a private cloud: it is a fully virtualized computing environment using a next-generation operational and security model, with a flexible consumption model spanning both internal and external resources, and with IT fully in control.
Private clouds are a stepping-stone to external clouds, particularly for financial services. Many
believe that future datacenters will look like internal clouds.
Private clouds can also be designed utilizing various existing technologies that federate multiple aspects of the virtualized data center and cloud computing, as shown in Figure 22 - Using a Private Cloud to Federate disparate architectures.
This creates what many would consider an internal or private cloud. With the cloud resources of
the external cloud and the virtualization resources of the internal cloud information can, if
properly designed, move securely across the pool of resources, and possibly across legacy
resources in the internal cloud and public cloud resources. This architectural advantage is key in
that they are never separate resources; they are all one pool. The resources are aggregated
and federated together so that applications can act on the combined resources as a single pool
of resources, just like the single pool of resources available to us today when one uses
virtualization to join servers from multiple racks in a data center. This forms the private cloud
that enables us to get the best of both worlds. The word “Private” is used because the use and operation of the cloud resources are completely controlled by, and only available to, the enterprise. This cloud resource looks and behaves just like the resources purchased in the past.
This architecture achieves both sustainability and efficiency: trusted, controlled reliability and security together with the flexible, dynamic, on-demand efficiency of a cloud-type architecture.
Figure 22 - Using a Private Cloud to Federate disparate architectures
Let us take it a step further and examine the core principles, or best practices that uniquely
define private cloud computing.
Best Practice – Implement a dynamic computing infrastructure
Private cloud computing requires a dynamic computing infrastructure. The foundation for the
dynamic infrastructure is a standardized, scalable, and secure physical infrastructure. There
should be levels of redundancy to ensure high levels of availability, but mostly it must be easy to
extend as usage growth demands it, without requiring architecture rework.
Next, it must be virtualized. Today, virtualized environments leverage server virtualization
(typically from Microsoft or Xen) as the basis for running services. These services need to be
easily provisioned and de-provisioned via software automation. These service workloads need
to be moved from one physical server to another as capacity demands increase or decrease.
Finally, this infrastructure should be highly utilized, whether provided by an external cloud
provider or an internal IT department. The infrastructure must deliver business value over and
above the investment.
A dynamic computing infrastructure is critical to effectively supporting the elastic nature of
service provisioning and de-provisioning as requested by users in the private cloud, while
maintaining high levels of reliability and security. The consolidation provided by virtualization,
coupled with provisioning automation, creates a high level of utilization and reuse, ultimately
yielding a very effective use of capital equipment.
Best Practice – Implement an IT Service-Centric Approach
Cloud computing is IT (or business) service-centric. This is in stark contrast to more traditional
system or “server”-centric models. In most cases, users of the private cloud generally want to
run some business service or application for a specific, timely purpose. IT administrators do not
want to be bogged down in the system and network administration of the environment. They
would prefer to quickly and easily access a dedicated instance of an application or service. By
abstracting away the server-centric view of the infrastructure, system users can easily access
powerful pre-defined computing environments designed specifically around their service.
An IT Service Centric approach enables user adoption and business agility. The easier and
faster a user can perform an administrative task, the more expedient the business becomes,
reducing costs, driving revenue and approaching an IT sustainable model.
Best Practice – Implement a self-service based usage model
Interacting with the private cloud requires some level of user self-service. Best-of-breed self-
service provides users the ability to upload, build, deploy, schedule, manage, and report on their
business services on demand within the enterprise. A self-service private cloud offering must
provide easy-to-use, intuitive user interfaces that equip users to productively manage the
service delivery lifecycle.
The benefit of self-service from the users' perspective is a level of empowerment and
independence that yields significant business agility. One benefit often overlooked from the
internal service provider's or IT team's perspective is that the more self-service that can be
delegated to users, the less administrative involvement is necessary. This saves time and
money and allows administrative staff to focus on more strategic, high-valued responsibilities.
Best Practice – Implement a minimally or self-managed platform
An IT team or service provider must leverage a technology platform that is self-managed in
order to efficiently provide a cloud for their constituents. Best-of-breed clouds enable self-
management via software automation, leveraging the following capabilities as discussed in the
section titled “Information Management,” starting on page 42:
• A provisioning engine for deploying services and tearing them down, recovering resources for high levels of reuse
• Mechanisms for scheduling and reserving resource capacity
• Capabilities for configuring, managing, and reporting to ensure resources can be
allocated and reallocated to multiple groups of users
• Tools for controlling access to resources and policies for how resources can be used or
operations can be performed
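The capabilities listed above can be sketched as a toy resource pool (class and method names are hypothetical, not a real product API): provisioning draws from the pool, teardown reclaims resources for reuse, and reporting supports reallocation decisions.

```python
class ProvisioningEngine:
    """Toy sketch of a self-managed pool: provision, tear down, report."""

    def __init__(self, capacity):
        self.capacity = capacity   # total resource units in the pool
        self.allocations = {}      # service name -> units currently held

    def provision(self, service, units):
        """Deploy a service, failing loudly when the pool is exhausted."""
        if units > self.available():
            raise RuntimeError(f"pool exhausted: only {self.available()} units left")
        self.allocations[service] = self.allocations.get(service, 0) + units

    def teardown(self, service):
        """Tearing a service down returns its resources for high levels of reuse."""
        return self.allocations.pop(service, 0)

    def available(self):
        return self.capacity - sum(self.allocations.values())

    def report(self):
        """Per-service usage report, the basis for reallocating to other groups."""
        return dict(self.allocations)
```

A real implementation would add scheduling, reservations, and access policies, but the allocate/reclaim/report cycle is the core of the automation.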
All of these capabilities enable business agility while simultaneously enacting critical and
necessary administrative control. This balance of control and delegation maintains security and
uptime, minimizes the level of IT administrative effort, and keeps operating expenses low,
freeing up resources to focus on higher value projects.
Best Practice – Implement a consumption-based billing methodology
Finally, private cloud computing is usage driven. Consumers pay for only the resources they
use and therefore are charged or billed on a consumption-based model. Cloud computing
platforms must provide mechanisms to capture usage information that enables chargeback reporting and/or integration with billing systems.
The value from a user's perspective is that business units pay only for the resources they use, ultimately keeping their costs down. From a provider's perspective, it allows them to track usage for chargeback and billing purposes.
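A minimal usage meter illustrates the mechanism (the rate table and names are illustrative assumptions, not any vendor's actual pricing): record consumption per business unit, then bill only what was consumed.

```python
from collections import defaultdict

# Illustrative rates: $ per CPU-hour and $ per GB-month stored
RATES = {"cpu_hours": 0.10, "gb_stored": 0.15}

class UsageMeter:
    """Capture per-business-unit usage so chargeback reports can be generated."""

    def __init__(self):
        self.usage = defaultdict(lambda: defaultdict(float))

    def record(self, business_unit, metric, amount):
        self.usage[business_unit][metric] += amount

    def chargeback(self, business_unit):
        """Bill only for what was actually consumed."""
        return sum(RATES[m] * qty for m, qty in self.usage[business_unit].items())

meter = UsageMeter()
meter.record("marketing", "cpu_hours", 120)
meter.record("marketing", "gb_stored", 500)
bill = meter.chargeback("marketing")  # 120*0.10 + 500*0.15 = 87.0
```

The same records feed both the consumer-facing bill and the provider's capacity-planning reports.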
In summary, these five best practices are necessary to produce an enterprise private cloud,
capable of achieving compelling business value including savings on capital equipment and
operating costs, reduced support costs, and significantly increased business agility. This
enables corporations to improve their profit margins and competitiveness in the markets they
serve.
Public Cloud
Public cloud solutions are the most well known examples of cloud storage. In a public cloud
implementation, an organization accesses third-party resources (like Amazon S3™, EMC Atmos,
Iron Mountain®, Google™, etc.) on an as-needed basis, without the requirement to invest in
additional internal infrastructure. In this pay-per-use model, public cloud vendors provide
applications, computer platforms and storage to the public, delivering significant economies of
scale. For storage, the difference between the purchase of a dedicated local appliance and the
use of a public cloud is not the functional interface, but merely the fact that the storage is
delivered on demand.
The customer or business unit pays for what they actually use or, in other cases, for what they have allocated for use. As an extension of the financial benefits, public clouds offer a scalability
that is often beyond what a user would be able to otherwise afford. Publicly accessible clouds
offer storage capacity using multi-tenancy solutions, meaning multiple customers are serviced at
once from the same infrastructure. This results in some common concerns when evaluating
public cloud solutions, including security and privacy, as well as the possibilities of latency and
compliance issues. When considering the use of public cloud options for data storage, pay
attention to the management, both now and in the future, of both the clouds and the data, as
well as the integration of the Cloud service usage with internal IT.
Since there are numerous white papers discussing the public cloud, I will refer the reader to those papers for additional details. I will mention, however, that the five (5) best practices outlined in the section titled “Private Cloud”, starting on page 96, are also applicable to the public cloud.
Community Cloud
Community Clouds are digital ecosystems: distributed, adaptive, open socio-technical systems with properties of self-organization, scalability, and sustainability, inspired by natural ecosystems. This is an interesting approach to the sustainability issue, especially from a regional perspective.
In a traditional market-based economy, made up of sellers and buyers, the parties exchange
property. In the new network-based economy, made up of servers and clients, the parties share
access to services and experiences. Digital Ecosystems support network-based economies that
rely on next-generation IT to extend the Service-Oriented Architecture (SOA) concept with the
automatic combination of available and applicable services in a scalable architecture, to meet
business user requests for applications that facilitate business processes. Digital Ecosystems
research is yet to consider scalable resource provision, and therefore risks being subsumed into
vendor Clouds at the infrastructure level, while striving for decentralization at the service level.
Therefore, the realization of their vision requires a form of Cloud Computing, but with their
principle of community-based infrastructure where individual users share ownership.
One aspect of the Community Cloud as it relates to other Cloud architectures is that Community
Clouds are less dependent on vendors and can, in the long run, achieve a higher level of
environmental sustainability. The Community Cloud approach is to combine distributed resource
provisioning from Grid Computing, distributed control from Digital Ecosystems and sustainability
from Green Computing with the use cases of Cloud Computing, while making greater use of
self-management advances from Autonomic Computing. Replacing vendor Clouds by shaping
the underutilized resources of user machines forms a Community Cloud, with nodes potentially
fulfilling all roles: consumer, producer, and, most importantly, coordinator, as shown in Figure 23 - Community Cloud, below.
Figure 23 - Community Cloud
The figure shows an environment that includes nodes of varying functionality (user machines or servers/clients), potentially allowing all nodes to fulfill all roles: consumer, producer, and coordinator. This concept of the Community Cloud draws upon Cloud Computing, Grid
Computing, Digital Ecosystems, Green Computing and Autonomic Computing. This is a model
of Cloud Computing that is part of the community, without dependence on Cloud vendors.
There are a number of advantages:
1) Openness: Removing dependence on vendors makes the Community Cloud the open
equivalent to vendor Clouds, and therefore identifies a new dimension in the open versus
proprietary struggle that has emerged in code, standards and data, but has yet to be expressed
in the realm of hosted services.
2) Community: The Community Cloud is as much a social structure as a technology paradigm.
Community ownership of the infrastructure carries with it a degree of economic scalability,
without which there would be diminished competition and potential stifling of innovation as
risked in vendor Clouds.
3) Individual Autonomy: In the Community Cloud, nodes have their own utility functions in
contrast with data centers, in which dedicated machines execute software as instructed.
Therefore, with nodes expected to act in their own self-interest, centralized control would be
impractical, as with consumer electronics like game consoles. Attempts to control user
machines counter to their self-interest result in cracked systems, from black-market hardware
modifications and arms races over hacking and securing the software (routinely lost by the
vendors). In the Community Cloud, where no concrete vendors exist, it is even more important
to avoid antagonizing the users, instead embracing their self-interest and harnessing it for the
benefit of the community with measures such as a community currency.
4) Identity: In the Community Cloud, each user would inherently possess a unique identity,
which combined with the structure of the Community Cloud should lead to an inversion of the
currently predominant membership model. Therefore, instead of users registering for each
website (or service) as a new user, they could simply add the website to their identity and grant
access, allowing users to have multiple services connected to their identity, instead of creating
new identities for each service. This relationship is reminiscent of recent application platforms,
such as Facebook’s f8 and Apple’s App Store, but decentralized in nature and so free from
vendor control. In addition, it allows for the reuse of the connections between users, akin to
Google’s Friend Connect, instead of reestablishing them for each new application.
5) Graceful Failures: The Community Cloud is not owned or controlled by any one organization,
and therefore not dependent on the lifespan or failure of any one organization. It therefore
should be robust and resilient to failure, and immune to the system-wide cascade failures of
vendor Clouds. Due to the diversity of its supporting nodes, their failure is graceful, non-
destructive, and with minimal downtime, as the unaffected nodes mobilize to compensate for the
failure.
6) Convenience and Control: The Community Cloud, unlike vendor Clouds, has no inherent
conflict between convenience and control. This results from its community ownership that
provides distributed control which is more democratic. However, whether the Community Cloud
can provide a technical quality equivalent or one superior to its centralized counterparts requires
further research.
7) Community Currency: The Community Cloud requires its own currency to support the sharing
of resources, a community currency, which in economics is a medium (currency) not backed by
a central authority (e.g. national government), for exchanging goods and services within a
community. It does not need to be restricted geographically, despite sometimes being called a
local currency. An example is the Fureai kippu system in Japan, which issues credits in
exchange for assistance to senior citizens. Family members living far from their parents can
earn credits by assisting the elderly in their local community, which can then be transferred to
their parents and redeemed by them for local assistance.
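The Fureai kippu mechanics (earn, transfer, redeem) can be sketched as a simple credit ledger. The class and account names are hypothetical; a real community currency would need distributed agreement on balances rather than a single in-memory table.

```python
class CommunityLedger:
    """Sketch of a Fureai-kippu-style credit ledger: earn, transfer, redeem."""

    def __init__(self):
        self.balances = {}

    def earn(self, member, credits):
        """Credits earned by assisting others in the local community."""
        self.balances[member] = self.balances.get(member, 0) + credits

    def transfer(self, src, dst, credits):
        """Send earned credits to someone elsewhere, e.g. a distant parent."""
        if self.balances.get(src, 0) < credits:
            raise ValueError("insufficient credits")
        self.balances[src] -= credits
        self.balances[dst] = self.balances.get(dst, 0) + credits

    def redeem(self, member, credits):
        """Redeem credits for local assistance (moved to a sink account)."""
        self.transfer(member, "__redeemed__", credits)
```

In the Community Cloud the same ledger idea would price resource provision rather than eldercare, but the earn/transfer/redeem cycle is identical.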
8) Quality of Service: Ensuring acceptable quality of service (QoS) in a heterogeneous system
will be a challenge, not least because achieving and maintaining the different aspects of QoS
will require reaching critical mass in the participating nodes and available services. Thankfully,
the community currency could support long-term promises by resource providers and allow the
higher quality providers, through market forces, to command a higher price for their service
provision. Interestingly, the Community Cloud could provide a better QoS than vendor Clouds,
utilizing time-based and geographical variations advantageously in the dynamic scaling of
resource provision.
9) Environmental Sustainability: It is anticipated that the Community Cloud will have a smaller
carbon footprint than vendor Clouds, on the assumption that making use of underutilized user
machines requires less energy than the dedicated data centers require for vendor Clouds. The
server farms within data centers are an intensive form of computing resource provision, while
the Community Cloud is more organic, growing and shrinking in a symbiotic relationship to
support the demands of the community, which in turn supports it.
10) Service Composition: The great promise of service oriented computing is that the marginal
cost of creating the nth application will be virtually zero, as all the software required already
exists to satisfy the requirements of other applications. Only their composition and orchestration
are required to produce a new application. Within vendor Clouds it is possible to make services
that expose themselves for composition and compose these services, allowing the hosting of a
complete service-oriented architecture. However, current service composition technologies have
not gained widespread adoption. Digital Ecosystems advocate service cross-pollination to avoid
centralized control by large service providers, because easy service composition allows
coalitions of SMEs to compete simply by composing simpler services into more complex
services that only large enterprises would otherwise be able to deliver. So, one could extend
decentralization beyond resource provisioning and up to the service layer, to enable service
composition within the Community Cloud.
Figure 24 - Community Cloud Architecture
Figure 24 - Community Cloud Architecture, above, shows an architecture in which the most fundamental layer deals with distributed coordination. One layer above, resource provision and consumption are arranged on top of the coordination framework. Finally, the service layer is where resources are combined into end-user accessible services, which can then themselves be composed into higher-level services.
The concept is the distribution of server functionality across a plurality of nodes provided by user machines, shaping underutilized resources into a virtual data center. Even though this is a simple and straightforward idea, it poses challenges on many different levels. The approach can be divided into three layers: Coordination, Resource (provision and consumption), and Service.
Distributed coordination is taken for granted in homogeneous data centers, where good connectivity, constant presence, and centralized infrastructure can be assumed; in a distributed heterogeneous environment it must be built. One layer above, resource provisioning and consumption are arranged on top of the coordination framework; this, too, is a challenge in such an environment. Finally, the service layer is where resources are combined into end-user accessible services. It is also possible to federate these services into higher-level services.
Best Practice in Community Cloud – Use VMs
To achieve coordination, the nodes need to be deployed as isolated virtual machines, forming a fully distributed network that can provide support for distributed identity, trust, and transactions. Executing arbitrary code on the machine of a resource-providing user requires a sandbox for the guest code, with a Virtual Machine (VM) protecting the host. The role of the VM is to make system resources safely available to the Community Cloud, so that Cloud processes can run without danger to the host machine. In addition to full VMs, possible platforms include Java Virtual Machines and lightweight JavaScript engines. The age of multi-core processors has, in many cases, left unused or underutilized cores in modern personal computers, which lend themselves well to the deployment and background execution of Community Cloud facing VMs.
Best Practice in Community Cloud – Use Peer to Peer Networking
Implement a P2P network. Newer P2P solutions offer sufficient guarantees of distribution, immunity to super-peer failure, and resistance to enforced control. For example, in the Distributed Virtual Super-Peer (DVSP) model, a collection of peers logically combines to form a virtual super-peer that dynamically changes over time to accommodate fluctuating demands.
Best Practice in Community Cloud – Distributed Transactions
A key element of distributed coordination is the ability of nodes to jointly participate in
transactions that influence their individual state. Appropriately defined business processes can
be executed over a distributed network with a transactional model maintaining the properties on
behalf of the initiator. Newer transaction models maintain these properties while increasing
efficiency and concurrency. Focusing on distributing the coordination of transactions is
fundamental to permitting multi-party service composition without centralized control.
Best Practice in Community Cloud – Distributed Persistent Storage
Require storage on participating nodes, taking advantage of the ever-increasing surplus on most personal computers. However, the method of information storage in
the Community Cloud is an issue with multiple aspects. First, information can be file-based or
structured. Second, while constant and instant availability can be crucial, there are scenarios in
which recall times can be relaxed. Such varying requirements call for a combination of
approaches, including distributed storage and distributed databases. Information privacy in the Community Cloud should be provided by encrypting user information while it resides on remote nodes, decrypting it only when the user accesses it. This allows for the secure and distributed storage of information.
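A minimal sketch of such storage (class and node names are hypothetical; a real system adds the encryption described above, plus repair and consistency protocols) distributes replicated chunks across participating nodes so that a single node's failure is non-destructive:

```python
import hashlib

class CommunityStore:
    """Sketch: spread content-addressed chunks across nodes with replication."""

    def __init__(self, nodes, replicas=2):
        self.nodes = {n: {} for n in nodes}  # node name -> {chunk id: bytes}
        self.replicas = replicas

    def put(self, data, chunk_size=4):
        """Split data into chunks and place each on `replicas` distinct nodes."""
        chunk_ids = []
        names = sorted(self.nodes)
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            cid = hashlib.sha256(chunk).hexdigest()
            start = int(cid, 16) % len(names)  # placement by hashing the chunk id
            for r in range(self.replicas):
                self.nodes[names[(start + r) % len(names)]][cid] = chunk
            chunk_ids.append(cid)
        return chunk_ids

    def get(self, chunk_ids):
        """Reassemble from whichever replica still holds each chunk."""
        out = b""
        for cid in chunk_ids:
            for contents in self.nodes.values():
                if cid in contents:
                    out += contents[cid]
                    break
        return out
```

With two replicas over three nodes, clearing any single node still leaves every chunk retrievable, which is the graceful-failure property claimed for the Community Cloud.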
Challenges in the federation of Public and Private Clouds
Cloud computing tops Gartner's “Top 10 Strategic Technologies for 2010.” They define a strategic technology as “one with the potential for significant impact on the enterprise in the next three years.” The fundamental challenge is that the industry has shoehorned anything that can be loosely defined as cloud, virtual, IT consolidation, or anything on the network into the same term: cloud. There is a trend to use public, private, hybrid, and other variant cloud services interchangeably.
Gartner predicts that through 2012, “IT organizations will spend more money on private cloud
computing investments than on offerings from public cloud providers.” There are two primary reasons why the enterprise will not make major strides toward the public cloud in the near term: lack of visibility, and multi-tenancy issues that cloak the real concern about critical data security. Some consider security to be key, and a potential showstopper for public clouds, at least in the short term.
It is interesting to note that recently in the United States, the FBI raided at least two Texas data
centers, serving search-and-seizure warrants for computing equipment, including servers,
routers and storage. The FBI was seeking equipment that may have been involved in fraudulent
business practices by a handful of small VoIP vendors18.
It appears that, in the United States, if the FBI determines that a threat originates from a hosted provider (i.e. a cloud provider), and the servers used for the scam are virtualized, the FBI may confiscate everything, possibly including your data. The reason is that it is much harder to figure out where the server and data are located. For additional details, please refer to the 2010 Proven Professional article titled “How to Trust the Cloud – Be Careful up There”.
Lack of visibility
The public cloud is opaque and lacks a level of true accountability, which will stop any enterprise account from releasing its prized data assets to a set of unknown entities. Look at the value proposition: no one consuming the service has visibility into the infrastructure, and the providers themselves are not looking at the infrastructure either. Are SLAs even relevant? If so, who can enforce or even monitor them?
The public cloud has received so much buzz in large part because it professes to offer
significant cost savings over buying, deploying and maintaining an in-house IT infrastructure.
While this is massively appealing, it does not answer any of the fundamentals of Quality of
Service, network and data security, to name a few. Imagine the concern of opening up your
internal systems with a direct pipe into the ‘cloud.’
Multi-tenancy Issues
Multi-tenancy is the second reason why businesses of any real size will not make the leap to the
public cloud. Wikipedia defines multi-tenancy as “a principle in software architecture where a
single instance of the software runs on a server, serving multiple client organizations (tenants).”
In other words, many people using the same IT assets and infrastructure.
18 http://www.wired.com/threatlevel/2009/04/data-centers-ra/#ixzz0fv24gOsn
So here is the concern: EC2, Google, and others provide true multi-tenancy, but at what cost to compliance and security? What about hot topics such as PCI or forensics? How safe are the tenants on a system? Who is on the same system as you: a hacker, or perhaps your nearest competitor? How secure is the isolation between clients? What data have you trusted to this cloud? If you buy the argument, it will be your patient records, payroll, client lists, and so on: essentially your most important data assets. Please refer to the EMC Proven Professional article titled “How to Trust the Cloud – Be Careful up There” for more information.
Cloud computing needs to cover its assets
Until the public cloud can provide visibility all the way down to the IT infrastructures’ simplest
asset – logs - enterprises simply will not risk it. To be deployed properly, a public cloud needs to
understand logs and log management for security, business intelligence, IT optimization, PCI
forensics, parsing out billing info, and the list goes on.
Until then, in the grand scheme of risk mitigation, enterprises may fear the public cloud and confine ITaaS to a private cloud. Most have taken all of the cloud variants and placed them into a single bucket. In fact, there is tremendous value in cloud computing.
Nevertheless, public clouds and enterprise computing are a world apart and should be treated
as such. In addition, there are many risks to consider along the way. Please refer to the EMC
Proven Professional article titled “How to Trust the Cloud – Be Careful up There” for more
information.
Warehouse Scale Machines - Purposely Built Solution Options
Cloud computing, utility computing, and other cloud paradigms are most certainly on the lists of IT managers and architects. However, other architectures should be considered to achieve sustainability, especially in specific use cases.
The trend toward server-side computing and the exploding popularity of Internet services has
created a new class of computing systems. This architecture has been defined as warehouse-
scale computers, or WSCs. The name calls attention to the most distinguishing feature of these
machines, the massive scale of their software infrastructure, data repositories, and hardware
platform. This perspective is a departure from a view of the computing problem that implicitly
assumes a model where one program runs in a single machine. In addition, this new class deals
with a use case in which a limited number of applications must scale enormously, as with Internet services.
In warehouse-scale computing, the program is an Internet service that may consist of tens or
more individual programs that interact to implement complex end-user services such as email,
search, or maps. These programs might be implemented and maintained by different teams of
engineers, perhaps even across organizational, geographic, and company boundaries as is the
case with mashups. The computing platform required to run such large-scale services bears
little resemblance to a pizza-box server or even the refrigerator-sized high-end multiprocessors
that reigned in the last decade. The hardware for such a platform consists of thousands of
individual computing nodes with their corresponding networking and storage subsystems, power
distribution and conditioning, equipment, and extensive cooling systems. The enclosure for
these systems is in fact a building structure and often indistinguishable from a large warehouse.
Had scale been the only distinguishing feature of these systems, we might simply refer to them
as datacenters. Datacenters are buildings where multiple servers and communication gear are
co-located because of their common environmental requirements and physical security needs,
and for ease of maintenance. In that sense, a WSC could be considered a type of datacenter.
Traditional datacenters, however, typically host a large number of relatively small- or medium-
sized applications, each running on a dedicated hardware infrastructure that is de-coupled and
protected from other systems in the same facility. Those datacenters host hardware and
software for multiple organizational or business units or even different companies. Different
computing systems within such a datacenter often have little in common in terms of hardware,
software, or maintenance infrastructure, and tend not to communicate with each other at all.
WSCs currently power the services offered by companies such as Google, Amazon, Yahoo, and
Microsoft’s online services division. WSCs differ significantly from traditional datacenters in that they belong to a single organization, use a relatively
homogeneous hardware and system software platform, and share a common systems
management layer. Often much of the application, middleware, and system software is built in-
house compared to the predominance of third-party software running in conventional
datacenters.
Most importantly, WSCs run a smaller number of very large applications (or Internet services),
and the common resource management infrastructure allows significant deployment
flexibility. The requirements of homogeneity, single-organization control, and enhanced focus on
cost efficiency motivate designers to take new approaches in constructing and operating these
systems.
Best Practice – WSC’s must achieve high availability
Internet services must achieve high availability, typically aiming for at least 99.99% uptime
(about an hour of downtime per year). Achieving fault-free operation on a large collection of
hardware and system software is difficult and is made more difficult by the large number of
servers involved. Although it might be theoretically possible to prevent hardware failures in a
collection of 10,000 servers, it would surely be extremely expensive. Consequently, WSC
workloads must be designed to gracefully tolerate large numbers of component faults with little
or no impact to service level performance and availability.
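The downtime budget behind that availability target is simple arithmetic; a quick sketch (the 99.99% figure comes from the text, everything else is basic calculation):

```python
# Downtime budget implied by an availability target ("four nines").
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def downtime_budget_minutes(availability):
    """Minutes of downtime per year permitted at a given availability level."""
    return MINUTES_PER_YEAR * (1.0 - availability)

print(f"{downtime_budget_minutes(0.9999):.0f} minutes/year at 99.99% uptime")
```

This works out to roughly 53 minutes per year, i.e. about an hour of downtime, as stated above.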
Best Practice - WSC’s must achieve cost efficiency
Building and operating a large computing platform is expensive, and the quality of a service may
depend on the aggregate processing and storage capacity available, further driving costs up
and requiring a focus on cost efficiency. For example, in information retrieval systems such as
Web search, the growth of computing needs is driven by three main factors.
First, increased service popularity translates into higher request loads. Second, the size of the problem keeps growing: the Web is growing by millions of pages per day, which increases the cost of building and serving a Web index. Third, even if the throughput and data repository could be held constant, the competitive nature of this market continuously drives innovations to improve the quality of results retrieved and the frequency with which the index is updated.
Although smarter algorithms can achieve some quality improvements, most substantial improvements demand additional computing resources for every request. For example, in a search system that also considers synonyms of the search terms in a query, retrieving results is substantially more expensive: either the search needs to retrieve documents that match a more complex query that includes the synonyms, or the synonyms of a term need to be replicated in the index metadata structure for each term. The relentless demand for more computing
capabilities makes cost efficiency a primary metric of interest in the design of WSCs. Cost
efficiency must be defined broadly to account for all the significant components of cost,
including hosting-facility capital and operational expenses (which include power provisioning
and energy costs), hardware, software, management personnel, and repairs.
WSC (Warehouse Scale Computer) Attributes
Today’s successful Internet services are no longer a miscellaneous collection of machines co-located in a facility and wired together. The software running on these systems, such as Gmail or Web search, executes at a scale far beyond a single machine or a single rack; it runs on clusters of hundreds to thousands of individual servers. The machine, therefore, is this large cluster or aggregation of servers itself, and it needs to be considered a single computing unit. The technical challenges of designing WSCs
are no less worthy of the expertise of computer systems architects than any other class of
machines. First, they are a new class of large-scale machines driven by a new and rapidly
evolving set of workloads. Their size alone makes them difficult to experiment with or simulate
efficiently; therefore, system designers must develop new techniques to guide design decisions.
Fault behavior, and power and energy considerations have a more significant impact in the
design of WSCs, perhaps more so than in other smaller scale computing platforms. Finally,
WSCs have an additional layer of complexity beyond systems consisting of individual
servers or small groups of servers; WSCs introduce a significant new challenge to programmer
productivity, a challenge perhaps greater than programming multi-core systems. This additional
complexity arises indirectly from the larger scale of the application domain and manifests itself
as a deeper and less homogeneous storage hierarchy, higher fault rates, and possibly higher
performance variability.
One Data Center vs. Several Data Centers
Multiple datacenters are sometimes used as complete replicas of the same service, with
replication being used primarily for reducing user latency and improving server throughput (a
typical example is a Web search service). In these cases, a given user query tends to be fully
processed within one datacenter, and our machine definition seems appropriate.
However, in cases where a user query may involve computation across multiple datacenters,
our single-datacenter focus is a less obvious fit. Typical examples are services that deal with
nonvolatile user data updates requiring multiple copies for disaster tolerance reasons. For such
computations, a set of datacenters might be the more appropriate system. However, think of the
multi-datacenter scenario as more analogous to a network of computers.
In many cases, there is a huge gap in connectivity quality between intra- and inter-datacenter
communications causing developers and production environments to view such systems as
separate computational resources. As the software development environment for this class of
applications evolves, or if the connectivity gap narrows significantly in the future, a need may
arise to adjust the choice of machine boundaries.
Best Practice – Use Warehouse Scale Computer Architecture designs in certain scenarios
It might seem that only a few large Internet companies can consider WSCs, because their sheer size and cost render them unaffordable to anyone else. This may not be true. It can be argued that the problems that today’s
large Internet services face will soon be meaningful to a much larger constituency because
many organizations will soon be able to afford similarly sized computers at a much lower cost.
Even today, the attractive economics of low-end server class computing platforms puts clusters
of hundreds of nodes within the reach of a relatively broad range of corporations and research
institutions. When combined with the trends toward large numbers of processor cores on a
single die, a single rack of servers may soon have as many or more hardware threads than
many of today’s datacenters. For example, a rack with 40 servers, each with four 8-core dual-
threaded CPUs, would contain more than two thousand hardware threads. Such systems will
arguably be affordable to a very large number of organizations within just a few years, while
exhibiting some of the scale, architectural organization, and fault behavior of today’s WSCs.
Architectural Overview of WSC’s
The hardware implementation of a WSC will differ significantly from one installation to the next.
Even within a single organization such as Google, systems deployed in different years use
different basic elements, reflecting the hardware improvements provided by the industry.
However, the architectural organization of these systems has been relatively stable over the
years.
Best Practice – Connect Storage Directly or via NAS in WSC environments
With Google’s implementation, disk drives are connected directly to each individual server and
managed by a globally distributed file system. Alternately, they can be part of Network Attached
Storage (NAS) devices that are directly connected to the cluster-level switching fabric. NAS
tends to be a simpler solution to deploy initially because it pushes the responsibility for data
management and integrity to a NAS appliance vendor. In contrast, using the collection of disks
directly attached to server nodes requires a fault-tolerant file system at the cluster level. This is
difficult to implement, but can reduce hardware costs (the disks leverage the existing server
enclosure), and networking fabric utilization (each server network port is effectively dynamically
shared between the computing tasks and the file system).
Best Practice – WSC should consider using non-standard Replication Models
The replication model between these approaches is also fundamentally different. A NAS
solution provides extra reliability through replication or error correction capabilities within each
appliance, whereas systems like GFS implement replication across different machines.
However, GFS-like systems are able to keep data available even after the loss of an entire
server enclosure or rack and may allow higher aggregate read bandwidth because the same
data can be sourced from multiple replicas. Trading off higher write overheads for lower cost,
higher availability, and increased read bandwidth was the right solution for many of Google’s
workloads. An additional advantage of having disks co-located with compute servers is that it
enables distributed system software to exploit data locality.
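The rack-aware replication described above can be sketched in a few lines. This is a hypothetical illustration in the spirit of GFS-like systems, not Google's actual placement algorithm: copies are spread across racks so that losing an entire rack or enclosure still leaves the data available.

```python
# Hypothetical sketch of rack-aware replica placement: prefer servers on
# distinct racks so a whole-rack failure cannot take out all copies.

def place_replicas(servers, n_replicas=3):
    """Pick n_replicas servers, preferring distinct racks.
    `servers` is a list of (server_id, rack_id) tuples."""
    chosen, racks_used = [], set()
    # First pass: at most one replica per rack.
    for server_id, rack_id in servers:
        if rack_id not in racks_used:
            chosen.append(server_id)
            racks_used.add(rack_id)
        if len(chosen) == n_replicas:
            return chosen
    # Fall back to any remaining servers if there are fewer racks than replicas.
    for server_id, _ in servers:
        if server_id not in chosen:
            chosen.append(server_id)
        if len(chosen) == n_replicas:
            break
    return chosen

servers = [("s1", "rackA"), ("s2", "rackA"), ("s3", "rackB"), ("s4", "rackC")]
print(place_replicas(servers))  # ['s1', 's3', 's4'] - three different racks
```

Note how "s2" is skipped even though it is next in line, because "s1" already covers rackA; this is exactly the property that lets a GFS-like system survive the loss of a full rack.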
Some WSCs, including Google’s, deploy desktop-class disk drives instead of enterprise-grade
disks because of the substantial cost differential between the two. Because the data is nearly
always replicated in some distributed fashion (as in GFS), this mitigates the possibly higher fault
rates of desktop disks. Moreover, because field reliability of disk drives tends to deviate
significantly from the manufacturer’s specifications, the reliability edge of enterprise drives is not
clearly established.
Networking Fabric
Choosing a networking fabric for WSCs involves a trade-off between speed, scale, and cost.
Typically, 1-Gbps Ethernet switches with up to 48 ports are essentially a commodity component,
costing less than $30/Gbps per server to connect a single rack as of this writing. As a result, bandwidth within a rack of servers tends to have a homogeneous profile.
However, network switches with high port counts, which are needed to tie together WSC
clusters, have a much different price structure and are more than ten times more expensive (per
1-Gbps port) than commodity switches. In other words, a switch that has 10 times the bi-section
bandwidth costs about 100 times as much.
Best Practice – For WSC’s Create a Two level Hierarchy of networked switches
As a result of this cost variation, the networking fabric of WSCs is often organized as a two-level hierarchy. Commodity switches in each rack provide a fraction of their bi-section bandwidth for inter-rack communication through a handful of uplinks to the more costly cluster-level switches.
For example, a rack with 40 servers, each with a 1-Gbps port, might have between four and
eight 1-Gbps uplinks to the cluster-level switch, corresponding to an oversubscription factor
between 5 and 10 for communication across racks. In such a network, programmers must be
aware of the relatively scarce cluster-level bandwidth resources and try to exploit rack-level
networking locality, complicating software development and possibly affecting resource
utilization.
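The oversubscription arithmetic in the example above can be checked directly:

```python
# Rack oversubscription factor: total server-facing bandwidth divided by
# total uplink bandwidth to the cluster-level switch.
def oversubscription(servers_per_rack, gbps_per_server, uplinks, gbps_per_uplink):
    return (servers_per_rack * gbps_per_server) / (uplinks * gbps_per_uplink)

# 40 servers with 1-Gbps ports and four to eight 1-Gbps uplinks, as in the text:
print(oversubscription(40, 1, 4, 1))  # 10.0
print(oversubscription(40, 1, 8, 1))  # 5.0
```

A factor of 5 means that when all servers transmit across racks at once, each sees only one fifth of its port bandwidth, which is why software is pushed toward rack-level locality.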
Alternatively, one can remove some of the cluster-level networking bottlenecks by spending
more money on the interconnect fabric. For example, Infiniband interconnects typically scale to
a few thousand ports but can cost $500–$2,000 per port. Alternatively, lower-cost fabrics can be
formed from commodity Ethernet switches by building “fat tree” networks. How much to spend
on networking vs. spending the equivalent amount on buying more servers or storage is an
application-specific question that has no single correct answer. One assumption is that intra-
rack connectivity is often cheaper than inter-rack connectivity.
Handling Failures
The sheer scale of WSCs requires that Internet services software tolerate relatively high
component fault rates. Disk drives, for example, can exhibit annualized failure rates higher than
4%, and between 1.2 and 16 server-level restarts per year are typical. With such high
component failure rates, an application running across thousands of machines may need to
react to failure conditions on an hourly basis.
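The "hourly basis" claim follows from simple arithmetic. A quick sketch, where the 4% annualized failure rate comes from the text and the 20,000-drive fleet size is an assumed illustrative figure:

```python
# Expected component failures at warehouse scale: even a modest annualized
# failure rate across many thousands of drives means frequent failures.
HOURS_PER_YEAR = 365 * 24  # 8,760

def failures_per_hour(n_components, annual_failure_rate):
    return n_components * annual_failure_rate / HOURS_PER_YEAR

rate = failures_per_hour(20_000, 0.04)  # 20,000 drives is an assumed figure
print(f"{rate:.3f} disk failures per hour, i.e. one roughly every {1/rate:.1f} hours")
```

With these assumptions the fleet loses a disk about every 11 hours, so software-level fault tolerance is not optional.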
The applications that run on warehouse-scale computers (WSCs) dominate many system
design trade-off decisions. Some of the distinguishing characteristics of software that runs in
large Internet services are the system software and tools needed for a complete computing
platform. Here is some terminology to define the different software layers in a typical WSC
deployment:
Platform-level software:
Platform-level software is the common firmware, kernel, operating system distribution, and
libraries expected to be present in all individual servers to abstract the hardware of a single
machine and provide basic server-level services.
Cluster-level infrastructure:
Cluster-level infrastructure is the collection of distributed systems software that manages resources and provides services at the cluster level; ultimately, we consider these services an operating system for a datacenter. Examples are distributed file systems, schedulers, remote procedure call (RPC) layers, as well as programming models that simplify the usage of resources at the scale of datacenters, such as MapReduce, Dryad, and Hadoop.
Application-level software:
Application-level software is the software that implements a specific service. It is often useful to
further divide application-level software into online services and offline computations, because
those tend to have different requirements. Google search, Gmail, and Google Maps are
examples of online services. Offline computations are typically used in large-scale data analysis
or as part of the pipeline that generates the data used in online services; for example, building
an index of the Web or processing satellite images to create map files for the online service.
Best Practice - Use Sharding and related techniques in WSCs
Sharding is splitting a data set into smaller fragments (shards) and distributing them across a
large number of machines. Operations on the data set are dispatched to some or all of the
machines hosting shards, and results are coalesced by the client. The sharding policy can vary
depending on space constraints and performance considerations. Sharding also helps
availability because recovery of small data fragments can be done more quickly than larger
ones.
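The dispatch-and-coalesce pattern described above can be sketched minimally. The class and method names here are illustrative, not any particular system's API; each shard is modeled as an in-memory dict standing in for a machine:

```python
import hashlib

# Minimal sketch of sharding: split a key space across "machines" by hash,
# route each operation to the owning shard, and coalesce results at the client.
class ShardedStore:
    def __init__(self, n_shards):
        self.shards = [dict() for _ in range(n_shards)]

    def _shard_for(self, key):
        # Hash the key so records spread evenly across shards.
        digest = hashlib.md5(key.encode()).hexdigest()
        return self.shards[int(digest, 16) % len(self.shards)]

    def put(self, key, value):
        self._shard_for(key)[key] = value

    def get(self, key):
        return self._shard_for(key).get(key)

    def count(self):
        # A scatter-gather operation: query every shard, coalesce the results.
        return sum(len(shard) for shard in self.shards)

store = ShardedStore(n_shards=4)
for i in range(100):
    store.put(f"doc{i}", i)
print(store.get("doc42"), store.count())  # 42 100
```

Point lookups touch exactly one shard, while `count` fans out to all of them; real systems differ mainly in how the sharding policy is biased for space and load, as discussed next.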
In large-scale services, service-level performance often depends on the slowest responder out
of hundreds or thousands of servers. Reducing response-time variance is critical. In a sharded
service, load balancing can be achieved by biasing the sharding policy to equalize the amount
of work per server.
That policy may need to be informed by the expected mix of requests or by the computing
capabilities of different servers. Note that even homogeneous machines can offer different
performance characteristics to a load-balancing client if multiple applications are sharing a
subset of the load-balanced servers.
In a replicated service, a load-balancing agent can dynamically adjust the load by selecting to
which servers to dispatch a new request. It may still be difficult to approach perfect load
balancing because the amount of work required by different types of requests is not always
constant or predictable. Health checking and watchdog timers are required.
In a large-scale system, failures are often manifested as slow or unresponsive behavior from a
given server. In this environment, no operation can rely on a given server to respond to make
forward progress. In addition, it is critical to quickly determine that a server is too slow or
unreachable and steer new requests away from it. Remote procedure calls (RPC’s) must set
well-informed time-out values to abort long-running requests, and infrastructure-level software
may need to continually check connection-level responsiveness of communicating servers and
take appropriate action when needed.
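A minimal sketch of the timeout-and-steer-away behavior described above, using a thread pool as the stand-in for a remote call; `slow_server` and `fast_replica` are hypothetical stand-ins, not a real RPC library:

```python
import concurrent.futures
import time

def slow_server():
    time.sleep(0.5)          # simulates an unresponsive or overloaded server
    return "late answer"

def fast_replica():
    return "answer"

def call_with_timeout(fn, timeout, fallback):
    """Abort a call that exceeds its deadline and retry on a healthy replica."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(fn)
        try:
            return future.result(timeout=timeout)
        except concurrent.futures.TimeoutError:
            # Server judged too slow: steer the request to a replica.
            return fallback()

print(call_with_timeout(slow_server, timeout=0.05, fallback=fast_replica))  # answer
```

Production RPC layers add hedged requests, health checking, and backoff on top of this basic deadline mechanism, but the core idea is the same: never let forward progress depend on one server responding.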
Integrity checks are required. In some cases, besides unresponsiveness, faults are manifested
as data corruption. Although those may be more rare, they do occur and often in ways that
underlying hardware or software checks do not catch (e.g., there are known issues with the
error coverage of some networking CRC checks). Extra software checks can mitigate these
problems by changing the underlying encoding or adding more powerful redundant integrity
checks. See section titled “Best Practice – Implement Undetected data corruption technology
into environment”, starting on page 69 for additional details.
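A sketch of such an extra software check: pairing each record with a strong application-level checksum catches corruption that weaker lower-layer CRCs can miss. The record contents here are purely illustrative:

```python
import hashlib

# End-to-end integrity check: store a strong checksum alongside each record.
def wrap(payload: bytes):
    return payload, hashlib.sha256(payload).hexdigest()

def verify(payload: bytes, checksum: str) -> bool:
    return hashlib.sha256(payload).hexdigest() == checksum

record, checksum = wrap(b"patient-id=1234;balance=99.50")
print(verify(record, checksum))  # True

# Simulate a single flipped bit somewhere in the storage or network path:
corrupted = bytes([record[0] ^ 0x01]) + record[1:]
print(verify(corrupted, checksum))  # False
```

Because the checksum is computed and verified at the application layer, it survives every hop in between, regardless of which hardware or driver introduced the corruption.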
Best Practice – Implement application specific compression
Often a large portion of the equipment costs in modern datacenters is in the various storage
layers. For services with very high throughput requirements, it is critical in this environment to fit
as much of the working set as possible in DRAM.
This makes compression techniques very important because the extra CPU overhead of
decompressing is still orders of magnitude lower than the penalties involved in going to disks.
Although generic compression algorithms can do quite well on the average, application-level
compression schemes that are aware of the data encoding and distribution of values can
achieve significantly superior compression factors or better decompression speeds.
Eventual consistency: keeping multiple replicas up to date using the traditional guarantees offered by a database management system significantly increases complexity and software infrastructure requirements, and reduces the availability of distributed applications.
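The application-aware compression point made earlier can be illustrated with a toy example in the spirit of a search-index posting list (the data here is synthetic): delta-encoding a sorted list of document IDs before generic compression exploits knowledge of the encoding that a generic compressor alone cannot.

```python
import zlib

# A sorted list of document IDs, as might appear in an inverted index.
doc_ids = list(range(1000, 100000, 97))

# Generic compression of the raw numbers:
raw = ",".join(map(str, doc_ids)).encode()

# Application-aware step: store the first ID plus the gaps between IDs.
# Because the list is sorted, the gaps are small and highly repetitive.
deltas = [doc_ids[0]] + [b - a for a, b in zip(doc_ids, doc_ids[1:])]
delta_bytes = ",".join(map(str, deltas)).encode()

print(len(zlib.compress(raw)), len(zlib.compress(delta_bytes)))
```

The delta-encoded form compresses far smaller because the scheme knows the data is sorted, which is exactly the kind of distribution-aware gain the text describes.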
Fortunately, large classes of applications have more relaxed requirements and can tolerate
inconsistent views for limited periods, provided that the system eventually returns to a stable
consistent state. Response time of large parallel applications can also be improved by the use
of redundant computation techniques. Several situations may cause a given subtask of a large
parallel job to be much slower than its siblings, due either to performance interference with
other workloads or to software/hardware faults. Redundant computation is not as widely deployed
as other techniques because of the obvious overhead.
In some situations, however, the completion of a large job is held up by the execution of a very
small percentage of its subtasks. One such example is the issue of stragglers, as
described in the paper on MapReduce19. In this case, a single slower worker can determine the
response time of a huge parallel task. MapReduce’s strategy is to identify such situations
toward the end of a job and speculatively start redundant workers only on those slower jobs.
This strategy increases resource usage by a few percentage points while reducing a parallel
computation’s completion time by more than 30%.
19 http://labs.google.com/papers/mapreduce.html
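A toy model of that speculative-execution strategy (the task counts and timings are made up for illustration and do not reproduce the paper's measurements):

```python
import random

# Toy model of MapReduce-style backup tasks: near the end of a job, re-run the
# slowest subtasks and take whichever copy finishes first.
random.seed(1)
task_times = [random.uniform(1.0, 2.0) for _ in range(100)]
task_times[7] = 30.0  # one straggler dominates the whole job

# Without backups, the job finishes only when the straggler does.
job_time_plain = max(task_times)

# Speculatively re-run the slowest 5% of tasks; each such task finishes at the
# minimum of its original time and the backup attempt's time.
cutoff = sorted(task_times)[int(0.95 * len(task_times))]
job_time_spec = max(
    min(t, random.uniform(1.0, 2.0)) if t >= cutoff else t for t in task_times
)

print(f"without backups: {job_time_plain:.1f}s, with backups: {job_time_spec:.1f}s")
```

Even though only a handful of extra workers run, the job's completion time collapses from the straggler's 30 seconds down to roughly the typical task duration, which is the effect the MapReduce paper reports.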
Utility Computing
Utility computing can be thought of as a previous incarnation of Cloud Computing with a
business slant.
While utility computing often requires a cloud-like infrastructure, its focus is on the business
model. Simply put, a utility computing service is one in which customers receive computing
resources from a service provider (hardware and/or software) and “pay by the glass,” much as
you do for your water, electric service and other utilities at home.
Amazon Web Services (AWS), despite a recent outage, is the current incumbent for this model
as it provides a variety of services, among them the Elastic Compute Cloud (EC2), in which
customers pay for compute resources by the hour, and Simple Storage Service (S3), for which
customers pay based on storage capacity. Other utility services include Sun’s Network.com,
EMC’s recently launched storage cloud service, and those offered by startups, such as Joyent
and Mosso.
The primary benefit of utility computing is better economics. As mentioned previously, corporate
data centers are notoriously underutilized, with resources such as servers often idle 85 percent
of the time. This is due to over provisioning (buying more hardware than is needed on average
in order to handle peaks).
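A back-of-the-envelope comparison makes the economics concrete. All of the prices and utilization figures below are illustrative assumptions, not quoted rates:

```python
# Rough comparison: owning over-provisioned servers vs. paying per hour.
def owned_cost(n_servers, price_per_server, years=3):
    """Annualized cost of owned hardware amortized over its service life."""
    return n_servers * price_per_server / years

def utility_cost(avg_servers_needed, price_per_hour):
    """Annual cost of renting only the average capacity actually used."""
    return avg_servers_needed * price_per_hour * 365 * 24

# Provision 100 servers for peaks, but average utilization is only 15%:
print(f"owned:   ${owned_cost(100, 3000):,.0f}/year")
print(f"utility: ${utility_cost(15, 0.10):,.0f}/year")
```

Under these assumptions, paying only for the 15 servers needed on average costs a fraction of amortizing 100 owned machines, which is the core appeal of the utility model; real comparisons must also account for bandwidth, storage, and operations staff.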
Any application needs a model of computation, a model of storage and, assuming the
application is even trivially distributed, a model of communication.
The statistical multiplexing necessary to achieve elasticity and the illusion of infinite capacity
requires resources to be virtualized so that the implementation of how they are multiplexed and
shared can be hidden from the programmer.
Different utility computing offerings are distinguished based on the level of abstraction
presented to the programmer and the level of management of the resources. For example,
Amazon EC2 is at one end of the spectrum. An EC2 instance looks much like physical
hardware, and users can control nearly the entire software stack, from the kernel upwards. The
API exposed is “thin”: a few dozen API calls to request and configure the virtualized hardware.
There is no a priori limit on the kinds of applications that can be hosted; the low level of
virtualization (raw CPU cycles, block-device storage, and IP-level connectivity) allows developers to
code whatever they want. On the other hand, this makes it inherently difficult for Amazon to
offer automatic scalability and failover, because the semantics associated with replication and
other state management issues are highly application-dependent.
AWS offers a number of higher-level managed services, including several different managed
storage services for use in conjunction with EC2, such as SimpleDB. However, these offerings
have higher latency and non-standard API’s, and our understanding is that they are not as
widely used as other parts of AWS. At the other extreme of the spectrum are application
domain-specific platforms, such as Google AppEngine and Force.com, the SalesForce business
software development platform. AppEngine is targeted exclusively at traditional web
applications, enforcing an application structure of clean separation between a stateless
computation tier and a stateful storage tier. Furthermore, AppEngine applications are expected
to be request-reply based, and as such, they are severely rationed in how much CPU time they
can use in servicing a particular request. AppEngine’s impressive automatic scaling and high-
availability mechanisms, and the proprietary MegaStore (based on “BigTable”) data storage
available to AppEngine applications, all rely on these constraints. Therefore, AppEngine is not
suitable for general-purpose computing. Similarly, SalesForce.com is designed to support
business applications that run against the salesforce.com database, and nothing else.
Microsoft’s Azure is an intermediate point on this spectrum of flexibility vs. programmer
convenience. Azure applications are written using the .NET libraries, and compiled to the
Common Language Runtime, a language independent managed environment. The system
supports general purpose computing, rather than a single category of application. Users get a
choice of language, but cannot control the underlying operating system or runtime. The libraries
provide a degree of automatic network configuration and failover/scalability, but require the
developer to declaratively specify some application properties in order to do so. Therefore,
Azure is intermediate between complete application frameworks like AppEngine on the one
hand, and hardware virtual machines like EC2 on the other.
Table 2 summarizes how these three classes of offerings virtualize computation, storage, and networking. The scattershot offerings of scalable storage suggest
that scalable storage with an API comparable in richness to SQL remains an open challenge.
Amazon has begun offering Oracle databases hosted on AWS, but the economics and licensing
model of this product makes it a less natural fit for Cloud Computing. Will one model beat out
the others in the Cloud Computing space?
We can draw an analogy with programming languages and frameworks. Low-level languages
such as C and assembly language allow fine control and close communication with the bare
metal, but if the developer is writing a Web application, the mechanics of managing sockets,
dispatching requests, and so on are cumbersome and tedious to code, even with good libraries.
On the other hand, high-level frameworks such as Ruby on Rails make these mechanics
invisible to the programmer, but are only useful if the application readily fits the request/reply
structure and the abstractions provided by Rails; any deviation requires diving into the
framework at best, and may be awkward to code. No reasonable Ruby developer would argue
against the superiority of C for certain tasks, and vice versa. Different tasks will result in demand
for different classes of utility computing.
Continuing the language analogy, just as high-level languages can be implemented in lower
level ones, highly managed cloud platforms can be hosted on top of less-managed ones. For
example, AppEngine could be hosted on top of Azure or EC2; Azure could be hosted on top of
EC2. Of course, AppEngine and Azure each offer proprietary features (AppEngine’s scaling,
failover and MegaStore data storage) or large, complex API’s (Azure’s .NET libraries) that have
no free implementation, so any attempt to “clone” AppEngine or Azure would require re-
implementing those features or API’s.
Table 2 - Examples of Cloud Computing vendors and how each provides virtualized resources (computation, storage, networking)

Computation model (VM)
Google AppEngine: Predefined application structure and framework; programmer-provided “handlers” written in Python, with all persistent state stored in MegaStore (outside the Python code). Automatic scaling up and down of computation and storage; network and server failover; all consistent with the 3-tier Web app structure.
Microsoft Azure: Microsoft Common Language Runtime (CLR) VM; a common intermediate form executed in a managed environment. Machines are provisioned based on declarative descriptions (e.g. which “roles” can be replicated); automatic load balancing.
Amazon Web Services: x86 Instruction Set Architecture (ISA) via Xen VM. Computation elasticity allows scalability, but the developer must build the machinery or use a third-party VAR such as RightScale.

Storage model
Google AppEngine: MegaStore/BigTable.
Microsoft Azure: SQL Data Services (a restricted view of SQL Server); Azure storage service.
Amazon Web Services: Range of models, from block store (EBS) to augmented key/blob store (SimpleDB). Automatic scaling varies from none (EBS) to fully automatic (SimpleDB, S3), and consistency guarantees vary widely, depending on which model is used. APIs vary from standardized (EBS) to proprietary.

Networking model
Google AppEngine: Fixed topology to accommodate the 3-tier Web app structure; scaling up and down is automatic and invisible to the programmer.
Microsoft Azure: Automatic, based on the programmer’s declarative descriptions of app components (roles).
Amazon Web Services: Fixed topology to accommodate the 3-tier Web app structure; scaling up and down is automatic and invisible to the programmer.
Grid computing
Grid Computing is a form of distributed computing in which a virtual supercomputer is
composed of networked, loosely coupled computers acting in concert to perform very large
tasks. A typical configuration is one in which resource provisioning is managed and allocated by
a group of distributed nodes, while the central virtual supercomputer is where the resource is
consumed and provisioned. The role of coordinator for resource provisioning is also centrally
controlled.
It has been applied to computationally intensive scientific, mathematical, and academic
problems through volunteer computing, and used in commercial enterprise for such diverse
applications as drug discovery, economic forecasting, seismic analysis, and back-office
processing to support e-commerce and web services. What distinguishes Grid Computing from
cluster computing is being more loosely coupled, heterogeneous, and geographically dispersed.
In addition, grids are often constructed with general purpose grid software libraries and
middleware, dividing and apportioning pieces of a program to potentially thousands of
computers. What distinguishes Cloud Computing from Grid Computing, in turn, is its web-
centric nature, even though some definitions of the two are conceptually similar (for example,
computing resources being consumed the way electricity is consumed from a power grid).
Cloud Type Architecture Summary
In terms of cloud computing service types and the similarities and differences between cloud,
grid and other cloud types, it may be advantageous to use Amazon Web Services as an
example.
To get cloud computing to work, you need three things: thin clients (or clients with a thick-thin
switch), grid computing, and utility computing. Grid computing links disparate computers to form
one large infrastructure, harnessing unused resources. Utility computing means paying for what
you use on shared servers, much as you pay for a public utility (such as electricity, gas, and so
on).
With grid computing, you can provision computing resources as a utility that can be turned on or
off. Cloud computing goes one step further with on-demand resource provisioning. This
eliminates over-provisioning when used with utility pricing. It also removes the need to over-
provision in order to meet the demands of millions of users.
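The over-provisioning arithmetic can be sketched with a small, hypothetical comparison (the server counts, prices, and demand curve below are illustrative assumptions, not vendor figures):

```python
# Sketch: cost of owning capacity sized for peak load vs. utility
# (pay-per-use) pricing. All workload figures and prices are illustrative.

HOURS_PER_MONTH = 730

def owned_cost(peak_servers, monthly_cost_per_server):
    """Owned capacity must be provisioned for peak demand, 24x7."""
    return peak_servers * monthly_cost_per_server

def utility_cost(hourly_demand, price_per_server_hour):
    """Utility pricing charges only for server-hours actually used."""
    return sum(hourly_demand) * price_per_server_hour

# A spiky workload: 10 servers needed during three evening hours, else 2.
demand = [10 if h % 24 in (18, 19, 20) else 2 for h in range(HOURS_PER_MONTH)]

fixed = owned_cost(peak_servers=max(demand), monthly_cost_per_server=100.0)
metered = utility_cost(demand, price_per_server_hour=0.20)
print(f"over-provisioned: ${fixed:.2f}, utility: ${metered:.2f}")
```

For a workload that mostly idles, the metered bill comes in well below the cost of owning peak capacity, which is the elimination of over-provisioning described above.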
Infrastructure as a Service and more
A consumer can get service from a full computer infrastructure through the Internet. This type of
service is called Infrastructure as a Service (IaaS). Internet-based services such as storage and
databases are part of IaaS. Other types of services on the Internet are Platform as a Service
(PaaS) and Software as a Service (SaaS). PaaS offers a full or partial application development
environment that users can access, while SaaS delivers a complete turnkey application, such
as Enterprise Resource Management, through the Internet.
To get an idea of how Infrastructure as a Service (IaaS) is/was used in real life, consider The
New York Times, which processed terabytes of archival data using hundreds of Amazon's EC2
instances within 36 hours. If The New York Times had not used EC2, it would have taken days
or months to process the data.
The IaaS divides into two types of usage: public and private. Amazon EC2 uses public server
pools in the infrastructure cloud. A more private cloud service uses groups of public or private
server pools from an internal corporate data center. One can use both types to develop software
within the environment of the corporate data center, and, with EC2, temporarily extend
resources at low cost, for example, for testing purposes. The mix may provide a faster way of
developing applications and services with shorter development and testing cycles.
Amazon Web Services
With EC2, customers create their own Amazon Machine Images (AMIs) containing an operating
system, applications, and data, and they control how many instances of each AMI run at any
given time. Customers pay for the instance-hours (and bandwidth) they use, adding computing
resources at peak times and removing them when they are no longer needed. The EC2, Simple
Storage Service (S3), and other Amazon offerings scale up to deliver services over the Internet
in massive capacities to millions of users.
Amazon provides five different types of servers, ranging from single-core x86 servers to eight-
core x86_64 servers. You do not have to know which servers are in use to deliver service
instances. You can place the instances in different geographical locations or availability zones.
Amazon allows elastic IP addresses that can be dynamically allocated to instances.
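The instance-hour model described above amounts to a simple scaling rule; a hypothetical sketch (the per-instance capacity and price are illustrative assumptions, not Amazon’s actual rates):

```python
# Sketch: choosing how many instances of an AMI to run for the current
# load, and the resulting hourly bill. Capacity and price are illustrative.

import math

def instances_needed(requests_per_sec, capacity_per_instance, minimum=1):
    """Round up so provisioned capacity covers current demand."""
    return max(minimum, math.ceil(requests_per_sec / capacity_per_instance))

def hourly_cost(instance_count, price_per_instance_hour):
    return instance_count * price_per_instance_hour

peak = instances_needed(requests_per_sec=950, capacity_per_instance=100)
quiet = instances_needed(requests_per_sec=40, capacity_per_instance=100)
print(peak, quiet)  # 10 1 -- instances added at peak, removed off-peak
```

The customer pays for ten instance-hours during the peak hour and one during the quiet hour, which is the “add at peak, remove when no longer needed” behavior described above.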
Cloud computing
With cloud computing, companies can scale up to massive capacities in an instant without
having to invest in new infrastructure, train new personnel, or license new software. Cloud
computing is of particular benefit to small and medium-sized businesses who wish to completely
outsource their data-center infrastructure, or large companies who wish to get peak load
capacity without incurring the higher cost of building larger data centers internally. In both
instances, service consumers use what they need on the Internet and pay only for what they
use.
The service consumer no longer has to be at a PC, use an application from the PC, or purchase
a specific version that is configured for smart phones, PDAs, and other devices. The consumer
does not own the infrastructure, software, or platform in the cloud. The consumer has lower
upfront costs, capital expenses, and operating expenses. The consumer does not care about
how servers and networks are maintained in the cloud. The consumer can access multiple
servers anywhere on the globe, without knowing which ones and where they are located.
Grid Computing
Cloud computing evolved from grid computing and provides on-demand resource provisioning.
Grid computing may or may not be in the cloud depending on what type of users are using it. If
the users are systems administrators and integrators, they care how things are maintained in
the cloud. The providers install and virtualize servers and applications. If the users are
consumers, they do not care how things are run in the system.
Grid computing requires the use of software that can divide and farm out pieces of a program as
one large system image to several thousand computers. One concern about grid is that if one
piece of the software on a node fails, other pieces of the software on other nodes may fail. This
is minimized if that component has a failover component on another node, but problems can still
arise if components rely on other pieces of software to accomplish one or more grid computing
tasks. Large system images and associated hardware to operate and maintain them can
contribute to large capital and operating expenses.
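The divide-and-farm-out idea, including re-running a failed piece elsewhere, can be sketched in miniature (local threads stand in for grid nodes, and the per-piece work is a placeholder):

```python
# Sketch: dividing a large task into pieces, farming them out to workers,
# and resubmitting a piece if its first attempt fails (failover). Local
# threads stand in for grid nodes; process_piece is a placeholder task.

from concurrent.futures import ThreadPoolExecutor

def process_piece(piece):
    """Placeholder for one unit of grid work."""
    return sum(piece)

def run_with_retry(executor, piece, attempts=2):
    """Mimic failover: a failed piece is resubmitted for another attempt."""
    for attempt in range(attempts):
        try:
            return executor.submit(process_piece, piece).result()
        except Exception:
            if attempt == attempts - 1:
                raise

data = list(range(100))
pieces = [data[i:i + 10] for i in range(0, len(data), 10)]  # divide

with ThreadPoolExecutor(max_workers=4) as pool:             # farm out
    results = [run_with_retry(pool, p) for p in pieces]

print(sum(results))  # recombined result: 4950
```

The retry wrapper is the miniature version of the failover component mentioned above: the failure of one piece need not fail the whole job if another worker can redo it.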
Similarities and differences
Cloud computing and grid computing are both scalable. Scalability is accomplished through load
balancing of application instances running separately on a variety of operating systems and
connected through Web services. CPU and network bandwidth is allocated and de-allocated on
demand. The system's storage capacity goes up and down depending on the number of users,
instances, and the amount of data transferred at a given time.
Both computing types involve multi-tenancy and multitasking, meaning that many customers
can perform different tasks, accessing a single or multiple application instances. Sharing resources
among a large pool of users assists in reducing infrastructure costs and peak load capacities.
Cloud and grid computing provide service-level agreements (SLAs) for guaranteed uptime
availability of, say, 99 percent. If the service slides below the level of the guaranteed uptime
service, the consumer will get service credit for receiving data late.
The Amazon S3 provides a Web services interface for storing and retrieving data in the cloud.
There is no set maximum to the number of objects you can store in S3; an object can be as
small as 1 byte and as large as 5 GB, and total stored data can run to several terabytes. S3
uses the concept of buckets as containers for each storage location of your objects. The data is
stored securely using the
same data storage infrastructure that Amazon uses for its e-commerce Web sites.
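The bucket-and-object model can be illustrated with a toy in-memory stand-in (this is not the real S3 API; only the 1-byte-to-5-GB object size rule is taken from the description above):

```python
# Sketch: a toy in-memory model of S3's bucket/object structure that
# enforces the 1-byte-to-5-GB per-object size rule. Not the real S3 API.

MAX_OBJECT_BYTES = 5 * 1024**3  # 5 GB per-object ceiling

class Bucket:
    """A bucket is a named container mapping keys to object bytes."""
    def __init__(self, name):
        self.name = name
        self._objects = {}

    def put(self, key, data):
        if not 1 <= len(data) <= MAX_OBJECT_BYTES:
            raise ValueError("object must be between 1 byte and 5 GB")
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]

bucket = Bucket("reports")
bucket.put("2010/q1.csv", b"region,revenue\n")
print(bucket.get("2010/q1.csv"))
```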
While the storage computing in the grid is well suited for data-intensive storage, it is not
economically suited for storing objects as small as 1 byte. In a data grid, the amounts of
distributed data must be large for maximum benefit.
A computational grid focuses on computationally intensive operations. Amazon Web Services in
cloud computing offers two types of instances: standard and high-CPU.
Business Practices Pillar
It is important to recognize that there are tough challenges that data center managers, industry
operators, and IT businesses face as they all struggle to support their businesses in the face of
budget cuts and uncertainty about the future. It is natural that environmental sustainability is
taking a back seat in many companies at this time. However, the fact is, being “lean and green”
is good for both the business and the environment, and organizations that focus their attentions
accordingly will see clear benefits. Reducing energy use and waste improves a company’s
bottom line, and increasing the use of recycled materials is a proven way to demonstrate good
corporate citizenship to your customers, employees, and the communities in which you do
business.
That said, it is not always easy to know where to begin in moving to greener and more efficient
operations. As shown in Figure 25 – Sustainability Ontology – Business Practices on page 128,
many methods and best practices can be implemented. The diagram
outlines the structure of how a company can achieve a high level of efficiency and sustainability
through better process improvement and management, and how conforming to standards and
addressing governance and compliance standards can help achieve this goal. With that in mind,
this section enumerates the many best business practices for environmentally sustainable
business. It is hoped that following these best practices will lead to optimal use of resources
and help teams and management stay aligned with the core strategies and goals of achieving
sustainable IT.
Figure 25 – Sustainability Ontology – Business Practices
Process Management and Improvement
Best Practice - Provide incentives that support your primary goals
Incentives can help you achieve remarkable results in a relatively short period of time if you
apply them properly. Take energy efficiency as an example. A broad range of technology
improvements and best practices are already available that companies can use to improve
efficiency in the data center. However, industry adoption for these advances has been relatively
low. One possible reason is that the wrong incentives are in place. For instance, data center
managers are typically compensated based on uptime and not efficiency. Best practice is to
provide specific incentives to reward managers for improving the efficiency of their operations,
using metrics such as Power Usage Effectiveness (PUE), which determines the energy
efficiency of a data center by dividing the amount of power entering a data center by the power
used to run the computer infrastructure within it. Uptime is still an important metric, but the goal
is to balance uptime incentives appropriately against the need to improve energy efficiency [2].
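The PUE metric just defined is a one-line calculation (the facility figures in the example are illustrative):

```python
# Sketch: Power Usage Effectiveness (PUE) = total facility power divided
# by the power used by the IT equipment itself. Figures are illustrative.

def pue(total_facility_kw, it_equipment_kw):
    if it_equipment_kw <= 0:
        raise ValueError("IT load must be positive")
    return total_facility_kw / it_equipment_kw

# 1,500 kW enters the facility; 1,000 kW reaches servers, storage, network:
print(pue(1500.0, 1000.0))  # 1.5 -> 0.5 kW of overhead per kW of IT load
```

A PUE of 1.0 would mean every watt entering the facility reaches the IT gear; the gap above 1.0 is cooling, power conversion, and other overhead.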
Another outmoded incentive in the industry involves how data center hosting costs are allocated
back to internal organizations. Most often, these costs are allocated based on the proportion of
floor space used. These incentives drive space efficiency and ultra-robust data centers, but they
come at a high cost, and typically are not energy efficient. Space-based allocation does not
reflect the true cost of building and maintaining a data center. A best practice is to achieve
substantial efficiency gains by moving to a model that allocates costs to internal customers,
based on the proportion of energy their services consume. With such a model, business units
can be expected to begin evaluating their server utilization data to make sure they do not
already have unused capacity before ordering more servers.
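Energy-based chargeback can be sketched as follows (tenant names and meter readings are illustrative assumptions):

```python
# Sketch: allocating the monthly data center bill by each tenant's share
# of metered energy rather than floor space. Readings are illustrative.

def allocate_by_energy(total_monthly_cost, kwh_by_tenant):
    total_kwh = sum(kwh_by_tenant.values())
    return {tenant: total_monthly_cost * kwh / total_kwh
            for tenant, kwh in kwh_by_tenant.items()}

meters = {"web": 30_000.0, "analytics": 50_000.0, "email": 20_000.0}
bills = allocate_by_energy(100_000.0, meters)
print(bills)  # analytics pays half the bill: it used half the energy
```

Under space-based allocation, all three tenants might pay the same; metering energy makes the heavy consumer bear its true share.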
Best Practice - Focus on effective resource utilization
Energy efficiency is an important element in any company’s business practices, but equally
important is the effective use of deployed resources. For example, if only 50 percent of a data
center’s power capacity is used, then highly expensive capacity is stranded in the uninterruptible
power supplies (UPSs), generators, chillers, and so on. In a typical 12 Megawatt data center,
this could equate to $4-8 million annually in unused capital expenditure [3]. In addition, there is
embedded energy in the unused capacity, since it takes energy to manufacture the UPSs,
generators, chillers, and so on. Stranding capacity will also force organizations to build
additional data centers sooner than necessary.
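The stranded-capacity arithmetic can be made explicit (the per-megawatt carrying cost is an illustrative assumption chosen to be consistent with the $4-8 million range cited above):

```python
# Sketch: annual cost of stranded (built but unused) power capacity.
# The per-MW annual carrying cost is an illustrative assumption.

def stranded_capacity_cost(built_mw, used_fraction, annual_cost_per_mw):
    stranded_mw = built_mw * (1.0 - used_fraction)
    return stranded_mw * annual_cost_per_mw

# A 12 MW data center running at 50% capacity, ~$1M/MW/year carrying cost:
print(stranded_capacity_cost(12.0, 0.50, 1_000_000.0))  # 6000000.0
```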
Best Practice - Use virtualization to improve server utilization and increase operational efficiency
As noted in the best practice above, underutilized servers are a major problem facing many data
center operators; industry analysts have reported that utilization levels are often well below 20
percent. In today’s budgetary climate, IT departments are also being asked to improve
efficiency, not only from a capital perspective, but also with regard to operational overhead.
Migrating applications from physical to virtual machines and consolidating them onto shared
physical hardware addresses both concerns. Using technologies such as Hyper-V to increase
virtualization, and therefore utilization, year after year helps increase the productivity per watt of
operations. Utilizing infrastructure architectures such as Amazon’s, Microsoft’s Windows Azure
cloud operating system, and EMC’s VCE Cloud virtualization is a best practice.
One immediate benefit of virtual environments is improved operational efficiency. Operations
teams can deploy and manage servers in a fraction of the time it would take to deploy the
equivalent physical hardware or perform a physical configuration change. In a virtual
environment, managing hardware failures without disrupting service is as simple as a click of a
button or automated trigger, which rolls virtual machines from the affected physical host to a
healthy host.
A server running virtualization will often need more memory to support multiple virtual machines,
and there is small software overhead for virtualization. However, the overall value proposition
measured in terms of work done per cost and per watt is much better than the dedicated
underutilized physical server case.
Key benefits of virtualization include:
• Reduction in capital expenditures
• Decrease in real estate, power, and cooling costs
• Faster time to market for new products and services
• Reduction in outage and maintenance windows
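A first-order consolidation estimate can be sketched from utilization figures like those above (the server counts, target utilization, and overhead factor are illustrative assumptions):

```python
# Sketch: first-order estimate of hosts needed after consolidating
# underutilized servers onto virtualized hardware. Inputs are illustrative.

import math

def hosts_after_consolidation(physical_servers, avg_utilization,
                              target_utilization=0.60, vm_overhead=1.10):
    """Repack total useful work, padded by virtualization overhead, onto
    hosts run at a healthier target utilization."""
    work = physical_servers * avg_utilization * vm_overhead
    # round before ceil so float noise cannot inflate the host count
    return math.ceil(round(work / target_utilization, 9))

# 200 physical servers idling at 15% average utilization:
print(hosts_after_consolidation(200, 0.15))  # 55
```

Even with a 10 percent virtualization overhead assumed, the fleet shrinks by roughly three quarters, which is where the capital, real estate, power, and cooling savings listed above come from.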
Best Practice - Drive quality up through compliance
Many data center processes are influenced by the need to meet regulatory and security
requirements for availability, data integrity, and consistency. See section titled “Compliance” on
page 146 for additional information. Quality and consistency are tightly linked and can be
managed through a common set of processes. Popular approaches to increasing quality are
almost without exception tied to observing standards and reducing variability.
A continuous process helps maintain the effectiveness of controls as your environment
changes. Compliance boils down to developing a policy and then operating consistently as
measured against that policy. The extended value that can be offered by standardized,
consistent processes that address compliance will also help you achieve higher quality benefits.
A best practice is to achieve certification to the international information security standard,
ISO/IEC 27001:2005. For instance, through monitoring one’s data center systems for policy
compliance, many companies have exposed processes that were causing problems, and found
opportunities for improvements that benefitted multiple projects. This continuous approach,
outlined in Figure 26, is a best practice.
(Figure 26 depicts a continuous control cycle: Design Control → Test Control → Perform Control → Analyze and Correct → Document Issues → Feedback to Design)
Figure 26 - A continuous process helps maintain the effectiveness of controls as your environment changes
Best Practice - Embrace change management
Poorly planned changes to the production environment can have unexpected and sometimes
disastrous results, which can then spill over into the planet’s environment when the impacts
involve higher energy utilization and other inefficient use of resources. Changes may involve
hardware, software, configuration, or process. Standardized procedures for a request, approval,
coordination, and execution of changes can greatly reduce the number and severity of
unplanned outages. Data center organizations should adopt and maintain repeatable, well-
documented processes, where the communication of planned changes enables teams to
identify risks to dependent systems and develop appropriate workarounds in advance.
Figure 27 - Consistent and well-documented processes help ensure smooth changes in the production environment
A best practice is to manage changes to a data center’s hardware and software infrastructure
through a review and planning process that is based on the Information Technology
Infrastructure Library (ITIL) framework. An example of this type of process is shown in Figure 27
- Consistent and well-documented processes help ensure smooth changes in the production
environment. Proposed changes are reviewed prior to approval to ensure that sufficient
diligence has been applied. Additionally, planning for recovery in the case of unexpected results
is crucial. Rollback plans must be scrutinized to ensure that all known contingencies have been
considered. When developing a change management program, it is important to consider the
influences of people, processes, and technology. By employing the correct level of change
management, businesses can increase customer satisfaction and improve service level
performance without placing undue burden on their operations staff.
Other features that your change management process should include:
• Documented policies around communication and timeline requirements
• Standard templates for requesting, communicating, and reviewing changes
• Post-implementation review, including cases where things went well
Best Practice - Invest in understanding your application workload and behavior
The applications in your environment and the particulars of the traffic on your network are
unique. The better you understand them, the better positioned you will be to make
improvements. Moving forward in this regard requires hardware engineering and performance
analysis expertise within your organization, so you should consider staffing accordingly.
Credible and competent in-house expertise is needed to properly evaluate new hardware,
optimize your request for proposal (RFP) process for servers, experiment with new
technologies, and provide meaningful feedback to your vendors. Once you start building this
expertise, the first goal is to focus your team on understanding your environment, and then
working with the vendor community. Make your needs known to them as early as possible. It is
an approach that makes sense for any company in the data center industry that is working to
increase efficiency. If you do not start with efficient servers, you are just going to pass
inefficiencies down the line.
Best Practice - Right-size your server platforms to meet your application requirements
Another best practice in data centers involves “right-sizing the platform.” This can take two
forms. One is where you work closely with server and other infrastructure manufacturers to
optimize their designs and remove items you don’t use, such as more memory slots and
input/output (I/O) slots than you need, and focus on high efficiency power supplies and
advanced power management features. With the volume of servers that many large
corporations purchase, most manufacturers are open to meeting these requests, and will even
partner with customers to drive innovation into the server space that reduces resource
consumption further. Of course, not all companies purchase servers on a scale where it makes sense for
manufacturers to offer customized stock-keeping units (SKUs). That is where the second kind of
right-sizing comes in. It involves being disciplined about developing the exact specifications that
you need servers to meet for your needs, and then not buying machines that exceed your
specifications. It is often tempting to buy the latest and greatest technology, but you should only
do so after you have evaluated and quantified whether the promised gains provide an
acceptable return on investment (ROI).
Consider that you may not need the latest features server vendors are selling. Understand your
workload and then pick the right platform. Conventional wisdom has been to buy something
bigger than your current needs so you can protect your investment. However, with today’s rapid
advances in technology, this can lead to rapid obsolescence. You may find that a better
alternative is to buy for today's needs and then add more capacity as and when you need it.
Also, look for opportunities to use a newer two-socket quad-core platform to replace an older
four-socket dual-core, instead of overreaching with newer, more capable four-socket platforms
with four or six cores per socket. Of course, there is no single answer. Again, analyze your
needs and evaluate your alternatives.
Best Practice - Evaluate and test servers for performance, power, and total cost of ownership
A best practice, and what many large corporations are doing, is to build the procurement
process around testing. Hardware teams run power and performance tests on all “short list”
candidate servers, and then calculate the total cost of ownership, including power usage
effectiveness (PUE) for energy costs. The key is to bring the testing in-house so you can
evaluate performance and other criteria in your specific environment and on your workload. It is
important not to rely solely on published benchmark data, which may not be applicable to your
needs and environment.
For smaller organizations that do not have resources to do their own evaluation and testing,
SPECpower_ssj2008 (the industry-standard SPEC benchmark that evaluates the power and
performance characteristics of volume server class computers) can be used in the absence of
anything else to estimate workload power. In addition to doing its own tests, Microsoft requests
this data from vendors in all of its RFPs. For more information, visit the Standard Performance
Evaluation Corp. web site at www.spec.org/specpower.
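In-house evaluation of this kind often reduces to ranking candidates on measured performance per watt alongside a simple cost figure (all candidate numbers below are illustrative, not real SPECpower results):

```python
# Sketch: ranking "short list" servers on measured performance per watt,
# with a simple 3-year cost including energy at facility PUE. All the
# candidate numbers are illustrative, not real SPECpower results.

def perf_per_watt(throughput_ops, avg_power_watts):
    return throughput_ops / avg_power_watts

def three_year_cost(purchase_price, avg_power_watts,
                    dollars_per_kwh=0.10, pue=1.5):
    """Hardware price plus energy, with facility overhead folded in via PUE."""
    kwh = avg_power_watts / 1000.0 * 24 * 365 * 3 * pue
    return purchase_price + kwh * dollars_per_kwh

candidates = {
    "server_a": {"ops": 280_000, "watts": 220.0, "price": 4_000.0},
    "server_b": {"ops": 260_000, "watts": 170.0, "price": 4_500.0},
}

best = max(candidates,
           key=lambda s: perf_per_watt(candidates[s]["ops"],
                                       candidates[s]["watts"]))
print(best)  # server_b: fewer total ops, but better ops per watt
```

Note how folding PUE into the energy term ties server selection back to the data center's overall efficiency, as the TCO discussion above recommends.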
Best Practice - Converge on as small a number of stock-keeping units (SKUs) as you can
A best practice, seen in leading data center initiatives, is to move to a server “standards”
program where internal customers choose from a consolidated catalogue of servers. Narrowing the
number of SKUs can allow IT departments to make larger volume buys, thereby cutting capital
costs. However, perhaps equally important, it helps reduce operational expenditures and
complexities around installing and supporting a variety of models. This increases operational
consistency and results in better pricing, as long-term orders are more attractive to vendors.
Finally, it provides exchangeable or replaceable assets. For example, if the demand for one
online application decreases while another increases, it is easier to reallocate servers as
needed with fewer SKUs.
Best Practice - Take advantage of competitive bids from multiple manufacturers to foster innovation and reduce costs
Competition between manufacturers encourages thorough, ongoing analysis of proposals from
multiple companies. That puts most of the weight on price, power, and performance. A best
practice is to develop hardware requirements and then share them with multiple manufacturers.
Then, work actively to develop an optimized solution. Energy efficiency, power consumption,
cost effectiveness and application performance per watt each play key roles in hardware
selection. The competition motivates manufacturers to be price competitive, drive innovation,
and provide the most energy efficient, lowest total cost of ownership (TCO) solutions. In many
cases, online services do not fully use the available performance. Hence, it makes sense to give
more weight to price and power. It is important to remember that power affects not only energy
consumption costs, but also data center capital allocation costs.
Standards
To achieve sustainability while utilizing the various architectures described in the section titled
“Infrastructure Architectures,” starting on page 80, a best practice is to create standards that not
only allow the various technologies to interoperate, but also keep the business from getting
“locked in” to a particular vendor or strategy.
Focusing on cloud computing architectures, this technology is an approach to delivering IT
services that promises greater agility and lower costs for consumers, especially up-front
costs. This approach impacts not only the way computing is used, but also the technology and
processes that are used to construct and manage IT within enterprises and service providers.
Coupled with the opportunities and promise of cloud computing are elements of risk and
management complexity. Adopters of cloud computing should consider asking questions such
as:
• How do I integrate computer, network, and storage services from one or more cloud
service providers into my business and IT processes?
• How do I manage security and business continuity risk across several cloud providers?
• How do I manage the lifecycle of a service in a distributed multiple-provider environment
in order to satisfy service-level agreements (SLAs) with my customers?
• How do I maintain effective governance and audit processes across integrated
datacenters and cloud providers?
• How do I adopt or switch to a new cloud provider?
The definitions of cloud computing, including private and public clouds, Infrastructure as a
Service (IaaS), and Platform as a Service (PaaS) are taken from work by the National Institute
of Standards and Technology (NIST). In part, NIST defines cloud computing as “a model for
enabling convenient, on-demand network access to a shared pool of configurable computing
resources (for example, networks, servers, storage, applications, and services) that can be
rapidly provisioned and released with minimal management effort or service provider
interaction.”
NIST defines four cloud deployment models:
• Public clouds (cloud infrastructure made available to the general public or a large
industry group)
• Private clouds (cloud infrastructure operated solely for an organization)
• Community clouds (cloud infrastructure shared by several organizations)
• Hybrid clouds (cloud infrastructure that combines two or more clouds)
An Open Cloud Standards Incubator project is under way that covers all of the deployment
models defined above. The focus of the project is the management aspects of IaaS,
with some work involving PaaS. These aspects include service-level agreements (SLAs), quality
of service (QoS), workload portability, automated provisioning, and accounting and billing.
The fundamental IaaS capability made available to cloud consumers is a cloud service.
Examples of services are computing systems, storage capacity, and networks that meet
specified security and performance constraints. Examples of consumers of cloud services are
enterprise datacenters, small businesses, and other clouds.
Many existing and emerging standards will be important in cloud computing. Some of these,
such as security-related standards, apply generally to distributed computing environments.
Others apply directly to virtualization technologies that are expected to be important building
blocks in cloud implementations.
The dynamic infrastructure enabled by virtualization technologies aligns well with the dynamic
on-demand nature of clouds. Examples of standards include SLA management and compliance,
federated identities and authentication, and cloud interoperability and portability.
Best Practice - Use standard interfaces to Cloud Architectures
There are multiple competing proposals for interfaces to clouds, and given the embryonic stage
of the industry, it is important for users to insist that cloud providers use standard interfaces to
provide flexibility for future extensions and to avoid becoming locked into a vendor. With the
backing of key players in the industry, this aspect of portability is a primary value that standards-
based cloud infrastructure offers. Three scenarios show how cloud consumers and providers
may interact using interoperable cloud standards. These scenarios are examples only; many
more possibilities exist.
1. Building flexibility to do business with a new provider without excessive effort or cost
2. Ways that multiple cloud providers may work together to meet the needs of a consumer
of cloud services
3. How different consumers with different needs can enter into different contractual
arrangements with a cloud provider for data storage services
As previously discussed, many standards bodies are rallying to generate a common standard
allowing varying cloud offerings to interoperate and federate. The DMTF for example, is working
with affiliated industry organizations such as the Open Grid Forum, Cloud Security Alliance,
TeleManagement Forum (TMF), Storage Networking Industry Association (SNIA), and National
Institute of Standards and Technology (NIST). The DMTF has also established formal
synergistic relationships with other standards bodies. The intent of these alliance partnerships is
to provide mutual benefit to the organizations and the related standards bodies.
Alliances play an important role in helping the cloud community to provide a unified view of
management initiatives. For example, SNIA has produced an interface specification for cloud
storage. The Open Cloud Standards Incubator will not only leverage that work but also
collaborate with SNIA to ensure consistent standards. The Incubator expects to leverage
existing DMTF standards including Open Virtualization Format (OVF), Common Information
Model (CIM), CMDB Federation (CMDBf), CIM Simplified Policy Language (CIM-SPL), and the
DMTF's virtualization profiles, as well as standards from affiliated industry groups.
The ultimate goal of the Open Cloud Standards is to enable portability and interoperability
between private clouds within enterprises and hosted or public cloud service providers. A first
step has been initiated in the development of use cases, a service lifecycle, and reference
architecture. This is still a work in progress, but in order for any business to utilize cloud
architectures and leverage these architectures to achieve efficiency and sustainability, an
interface standard is required.
Security
With respect to sustainability and new technologies, such as Cloud Computing, moving to a new
business model such as going to the cloud offers economies of scale and flexibility that are both
good and bad from a security point of view. The massive concentrations of resources and data
present a more attractive target to attackers, but cloud-based defenses can be more robust,
scalable, and cost-effective. For a more detailed discussion of cloud security, refer to the Proven Professional article titled “How to Trust the Cloud – Be Careful up There.”
The new cloud economic/sustainability model has also driven technical change in terms of:
Scale: Commoditization and the drive towards economic sustainability and efficiency have led
to massive concentrations of the hardware resources required to provide services. This
encourages economies of scale for all the kinds of resources required to provide computing
services.
Architecture: Optimal resource use demands computing resources that are abstracted from
underlying hardware. Unrelated customers who share hardware and software resources rely on
logical isolation mechanisms to protect their data. Computing, content storage and processing
are massively distributed. Global markets for commodities demand edge distribution networks
where content is delivered and received as close to customers as possible. This tendency
towards global distribution and redundancy means resources are usually managed in bulk, both
physically and logically.
Given the reduced cost and flexibility it brings, a migration to cloud computing is compelling for many SMEs. However, SMEs migrating to the cloud may have concerns, including the confidentiality of their information and liability for incidents involving the infrastructure (also see “Best Practice - Assess cloud storage migration costs upfront” on page 94 for additional information).
Following are some best practices for managing trust in public and private clouds:
Best Practice – Determine if cloud vendors can deliver on their security claims
Because information security is only as strong as its weakest link, it is essential for
organizations to evaluate the quality of their cloud vendors. Having a high-profile “brand name”
vendor and an explicit SLA is not enough. Organizations must aggressively verify whether cloud
vendors can deliver upon and validate their security claims. Enterprises must make a firm
commitment that they will protect the information assets outside their corporate IT environment
to at least the same high standard of security that would apply if those same information assets
were preserved in-house. In fact, because these assets are stored outside the organization, it
could be argued that the standard for protection should be even higher. Security practitioners
must be particularly diligent in assessing the security profiles of those cloud vendors entrusted
with highly sensitive data or mission-critical functions.
Best Practice - Adopt federated identity policies backed by strong authentication practices
A federated identity allows a user to access various web sites, enterprise applications, and cloud
services using a single sign-on. Federated identities are made possible when organizations
agree to honor each other’s trust relationships, not only in terms of access but also in terms of
entitlements. Establishing “ties of federation” agreements between parties to share a set of
policies governing user identities, authentication and authorization, provides users with a more
convenient and secure way of accessing, using and moving between services, whether those
services reside in the enterprise or in a cloud. Federated identity policies go hand-in-hand with
strong authentication policies. Whereas federation policies bridge the trust gap between
members of the federation, strong authentication policies bridge the security gap, creating the
secure access infrastructure to bring all members of the community together.
The federation of identity and authentication policies will eventually become standard practice in the cloud, not just because users will demand it but also as a matter of convenience. For
organizations, federation also delivers cost benefits and improved security. Companies can
centralize the access and authentication systems maintained by separate business units. They
can reduce potential points of threat, such as unsafe password management practices, as users
will no longer have to enter credentials and passwords in multiple places. For federated identity
policies to become more widely used, the information technology and security industry will have
to knock down barriers to implementing such policies. So far, it appears the barriers are not
economic or technological, but trust-related. Federated identity models, like the strong
authentication services that enforce them, are only as strong as their weakest link. Each
member of the federation must be trusted to comply with the group’s security policies.
Expanding the circle of trust means expanding the threat surface where problems could arise
and increasing the potential for single points of failure in the community of trust. The best way of
ensuring that trust and security are preserved within communities of federation is to require all
community members to enforce a uniform, acceptable level of strong authentication. Some IT
industry initiatives are attempting to establish security standards that facilitate federated
identities and authentication. For instance, the OASIS Security Services Technical Committee
has developed the Security Assertion Markup Language (SAML), an XML-based standard for
exchanging authentication and authorization data between security domains, to facilitate web
browser single sign-on. SAML appears to be evolving into the definitive standard for enterprises
deploying web single sign-on solutions.
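As a concrete illustration, the skeleton of such an assertion can be sketched with Python's standard library. The element names come from the SAML 2.0 assertion namespace; the issuer and subject values below are hypothetical, and a production assertion would also carry a signature, conditions, and timestamps:

```python
import xml.etree.ElementTree as ET

SAML_NS = "urn:oasis:names:tc:SAML:2.0:assertion"

def build_assertion(issuer: str, subject: str) -> str:
    """Build a minimal, unsigned SAML 2.0 assertion skeleton.

    This only shows the structural idea: an issuer (identity provider)
    vouching for a subject across security domains.
    """
    ET.register_namespace("saml", SAML_NS)
    assertion = ET.Element(f"{{{SAML_NS}}}Assertion", {"Version": "2.0"})
    ET.SubElement(assertion, f"{{{SAML_NS}}}Issuer").text = issuer
    subj = ET.SubElement(assertion, f"{{{SAML_NS}}}Subject")
    ET.SubElement(subj, f"{{{SAML_NS}}}NameID").text = subject
    return ET.tostring(assertion, encoding="unicode")

# Hypothetical identity provider and user.
xml_doc = build_assertion("https://idp.example.com", "[email protected]")
```

In practice, the relying party validates the issuer's signature before honoring the assertion; that validation step is what bridges the trust gap between federation members.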
Best Practice – Preserve segregation of administrator duties
While data isolation and preventing data leakage are essential, enterprise systems
administrators still need appropriate levels of access to manage and configure their company’s
applications within the shared infrastructure. Furthermore, in addition to systems administrators
and network administrators, private clouds introduce a new function into the circle of trust: the
cloud administrator. Cloud administrators, the IT professionals working for the cloud provider,
need sufficient access to an enterprise’s virtual facilities to optimize cloud performance while
being prevented from tapping into the proprietary information they are hosting on behalf of their
tenants. Enterprises running private clouds on hosted servers should consider requiring that
their data center operator disable all local administration of hypervisors, using a central
management application instead to better monitor and reduce risks of unauthorized
administrator access.
As an added security measure, enterprises should preserve a separation of administrator duties
in the cloud. The temptation may be to consolidate duties, as many functions can be centrally
administered from the cloud using virtualization management software. However, as with
physical IT environments, in which servers, networks and security functions are split among
several administrators or departments, segregating those functions within the cloud can provide
added security by decentralizing control. Furthermore, organizations can use centralized
virtualization management capabilities to limit administrative access, define roles and
appropriately assign privileges to individual administrators. By segregating administrator duties
and employing a centralized virtualization management console, organizations can safeguard
their private clouds from unauthorized administrator access.
Best Practice - Set clear security policies
Set clear policies to define trust and be equipped to enforce them. In a private cloud, trust relationships are defined and controlled by the organization using the cloud. While every party in the trust relationship will naturally protect information covered by government privacy and compliance regulations (employee tax ID numbers, proprietary financial data, etc.), organizations
will also need to set policies for how other types of proprietary data are shared in the cloud. For
instance, a corporation may classify information such as purchase orders or customer
transaction histories as highly sensitive, even as trade secrets, and may establish risk-based
policies for how cloud providers and business partners store, handle and access that data
outside the enterprise. For “Trust” relationships to work, there must be clear, agreed-upon
policies for what information is privileged, how that data is managed and how cloud providers
will report and validate their performance in enforcing the standards set by the organization.
These agreed-upon standards must be enforced by binding service level agreements (SLAs)
that clearly stipulate the consequences of security breaches and service agreement violations.
Best Practice - Employ data encryption and tokenization
Cloud providers sometimes store enterprise data used in cloud applications, for example in online backups. Encrypting data is often the simplest way to protect proprietary information against unauthorized access, particularly by administrators and other parties within the cloud.
Organizations should encrypt data residing with or accessible to cloud providers. As in
traditional enterprise IT environments, organizations should encrypt data in applications at the
point of ingest. Additionally, they should ensure cloud vendors support data encryption controls
that secure every layer of the IT stack. Segregate sensitive data from the users or identities they
are associated with as an additional precaution to secure data residing in clouds. For instance,
companies storing credit card data often keep credit card numbers in separate databases from
where cardholders’ personal data is stored, reducing the likelihood that security breaches will
result in fraudulent purchases. Companies also can protect sensitive cardholder information in
the cloud through a form of data masking called tokenization. This method of securing data
replaces the original number with a token value that has no explicit relationship to the original
value. The original card number is kept in a separate, secure database called a vault.
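A minimal sketch of this tokenization idea follows, with an in-memory dictionary standing in for the vault; a real vault is a separate, hardened database, and the card number shown is a standard test value:

```python
import secrets

class TokenVault:
    """Illustrative in-memory vault mapping tokens to original values.

    The token is random, so it has no mathematical relationship to the
    original card number; only the vault can map it back.
    """

    def __init__(self):
        self._vault = {}  # token -> original value

    def tokenize(self, card_number: str) -> str:
        token = secrets.token_hex(8)
        self._vault[token] = card_number
        return token

    def detokenize(self, token: str) -> str:
        # Raises KeyError for tokens the vault never issued.
        return self._vault[token]

vault = TokenVault()
token = vault.tokenize("4111111111111111")
```

Systems outside the vault store and pass around only the token, so a breach of those systems exposes no usable card numbers.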
Best Practice - Manage policies for provisioning virtual machines
To secure their virtual infrastructure, companies using private clouds must be able to oversee how virtual machines are provisioned and managed within their clouds. In
particular, managing virtual machine identities is crucial, as they are used for basic
administrative functions, such as identifying the systems and people with which virtual machines
are physically associated, and moving software to new host servers. Organizations establishing
a security position based on virtual machine identities should know how those identities are
created, validated and verified, and what safety measures their cloud vendors have taken to
safeguard those identities. Additionally, information security leaders should set their identity
access and management policies to grant all users, whether human or machine, the lowest level
of access needed for each to perform their authorized functions within the cloud.
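The least-privilege rule above can be sketched as a simple role-to-privilege table; the role and action names below are illustrative, not drawn from any particular product:

```python
# Illustrative role definitions: each role gets only the privileges it
# needs, so a cloud administrator can tune infrastructure but cannot
# touch tenant application data.
ROLE_PRIVILEGES = {
    "cloud-admin": {"migrate_vm", "tune_performance"},
    "tenant-admin": {"configure_app", "read_app_logs"},
    "service-account": {"read_app_logs"},
}

def is_authorized(role: str, action: str) -> bool:
    """Grant only privileges explicitly assigned to the role (deny by default)."""
    return action in ROLE_PRIVILEGES.get(role, set())
```

The key design choice is deny-by-default: an unknown role, or a known role requesting an unlisted action, is always refused.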
Best Practice – Require transparency into cloud operations to ensure multi-tenancy and data isolation
In the virtualized environment of the cloud, many different companies, or “tenants,” may share the same physical computing, storage, and network infrastructure. Cloud providers need to ensure isolation of access so that software, data, and services can be safely partitioned within the cloud and that tenants sharing physical facilities cannot tap into their neighbors’ proprietary information and applications.
The best way to ensure secure data isolation and multi-tenancy is to partition access to
appropriate cloud resources for all tenants. Cloud vendors should furnish log files and reports of
user activities. Some cloud vendors are able to provide an even higher degree of visibility
through applications that allow enterprise IT administrators to monitor the data traversing their
virtual networks and to view events within the cloud in near real time. Specific performance
metrics should be written into managed service agreements and enforced with financial
consequences if those agreed-upon performance conditions are not upheld.
Organizations and businesses with private clouds should work with cloud vendors to ensure
transferability of security controls. In other words, if data or virtual resources are moved to
another server or to a backup data center, the security policies established for the original
server or primary data center should automatically be implemented in the new locations.
Governance
The ability to govern and measure enterprise risk within a company-owned data center is difficult and, surprisingly, still in the early stages of maturation in most organizations. Cloud computing brings new unknowns to governance and enterprise risk.
Online agreements and contracts of this type for the most part are still untested in a court of law
and consumers have yet to experience an extended outage of services that they may someday
determine to need on a 24/7 basis. Questions still remain about the ability of user organizations
to assess the risk of the provider through onsite assessments.
The storage and use of information considered sensitive by nature may be allowed, but it could
be unclear as to who is responsible in the event of a breach. If both the code authored by the
user and the service delivered by the provider are flawed, who is responsible? Current statutes
cover the majority of the United States but how are the laws of foreign countries, especially the
European Union, to be interpreted in the event of disputes? Many questions remain with respect
to Cloud Governance and Enterprise Risk.
Best Practice – Do your due diligence on your SLAs
Cloud consumers considering Cloud Services should perform in-depth due diligence prior to executing any “Terms of Service” or “Service Level Agreement” (SLA), or beginning use.
This due diligence should assess the arrangement of risks known at present and abilities of
partners to work within and contribute to the customer’s enterprise risk management program
for the length of the engagement. Some recommendations include:
1. Consider creating a Private (Virtual) Cloud or a Hybrid Cloud that provides the
appropriate level of controls while maintaining risk at an acceptable level.
2. Review what type of provider you prefer, such as software, infrastructure or platform.
Gain clarity on how pricing is performed with respect to bandwidth and CPU utilization in
a shared environment. Compare usage as measured by the cloud service provider with
your own log data, to ensure accuracy.
3. Request clear documentation on how the facility and services are assessed for risk
and audited for control weaknesses, the frequency of assessments and how control
weaknesses are mitigated in a timely manner. Ask the service provider if they make the
results of risk assessments available to their customers.
4. Require the definition of what the provider considers critical success factors, key
performance indicators, and how they measure them relative to IT Service Management
(Service Support and Service Delivery).
5. Require a listing of all provider third party vendors, their third party vendors, their roles
and responsibilities to the provider, and their interfaces to your services.
6. Request divulgence of incident response, recovery, and resiliency procedures for any
and all sites and associated services.
7. Request a review of all documented policies, procedures and processes associated
with the site and associated services assessing the level of risk associated with the
service.
8. Require the provider to deliver a comprehensive list of the regulations and statutes
that govern the site and associated services, and how compliance with these items is
executed.
9. Perform full contract or terms of use due diligence to determine roles, responsibilities,
and accountability. Ensure legal counsel review, including an assessment of the
enforceability of local contract provisions and laws in foreign or out-of-state jurisdictions.
10. Determine whether due diligence requirements encompass all material aspects of
the cloud provider relationship, such as the provider’s financial condition, reputation
(e.g., reference checks), controls, key personnel, disaster recovery plans and tests,
insurance, communications capabilities and use of subcontractors.
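Recommendation 2 above, comparing provider-metered usage against your own log data, can be sketched as follows; the resource names, figures, and 5% tolerance are illustrative assumptions:

```python
def reconcile_usage(provider_hours: dict, local_hours: dict,
                    tolerance: float = 0.05) -> dict:
    """Flag resources whose provider-billed hours deviate from hours
    derived from local logs by more than `tolerance` (5% by default).

    Returns {resource: (billed, logged)} for each discrepancy.
    """
    discrepancies = {}
    for resource, billed in provider_hours.items():
        logged = local_hours.get(resource, 0.0)
        if logged == 0 or abs(billed - logged) / logged > tolerance:
            discrepancies[resource] = (billed, logged)
    return discrepancies

# Hypothetical figures: the provider bills vm-2 for 240 CPU-hours, but
# local logs only support 200, a 20% gap worth querying.
flagged = reconcile_usage({"vm-1": 100.0, "vm-2": 240.0},
                          {"vm-1": 101.0, "vm-2": 200.0})
```

Small gaps are expected from clock skew and rounding, which is why the check uses a relative tolerance rather than exact equality.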
Request a scope of services including:
• Performance standards
• Rapid provisioning – de-provisioning
• Methods of multi-tenancy and resource sharing
• Pricing
• Controls
• Financial and control reporting
• Right to audit
• Ownership of data and programs
• Procedures to address a Legal Hold
• Confidentiality and security
• Regulatory compliance
• Indemnification
• Limitation of liability
• Dispute resolution
• Contract duration
• Restrictions on, or prior approval for, subcontractors
• Termination and assignment, including timely return of data in a machine-readable format
• Insurance coverage
• Prevailing jurisdiction (where applicable)
• Choice of Law (foreign outsourcing arrangements)
• Regulatory access to data and information necessary for supervision
• Business Continuity Planning.
Consumers, Businesses, Cloud Service Providers, and Information Security and Assurance
professionals must collaborate to focus on the potential issues and solutions listed above, and
to discover the holes. The Cloud Security Alliance (CSA), one of the standards bodies outlined
in the section titled “Standards”, starting on page 135, calls for collaboration in setting standard
terms and requirements that drive governance and enterprise risk issues to a mature and
acceptable state allowing for negotiation. The CSA is working to address these issues so
businesses can take full advantage of the nimbleness, expansive service options, flexible
pricing and cost savings of Cloud Services to achieve a sustainable IT solution.
Compliance
With cloud computing resources as a viable and cost-effective means to outsource entire
systems and increase sustainability, maintaining compliance with your security policy and the
various regulatory and legislative requirements to which your company has adhered can
become even more difficult to demonstrate. The cost of auditing compliance is likely to increase
without proper planning. With that in mind, it is imperative to consider all of your requirements
and options prior to progressing with cloud computing plans [6].
Best Practice - Know Your Legal Obligations
Your organization must fully understand all of the necessary legal requirements.
The regulatory landscape is typically dictated by the industry in which you reside. Depending on
where your organization operates, you are likely subject to a lengthy collection of legislation that
governs how you treat specific types of data, and it is your obligation to understand it and
remain compliant. Without understanding these obligations, an organization cannot formulate its
data processing requirements. It is a best practice to engage internal auditors, external auditors,
and legal counsel to ensure that nothing is left out.
Best Practice - Classify / Label your Data & Systems
Your company must classify data to adequately protect it. Considering the regulatory and
legislative requirements discussed earlier, your organization needs to classify its data to isolate
what data requires the most stringent protection from the public, or otherwise less sensitive
data. The data and systems must also be clearly labeled and the processes surrounding the
handling of the data formalized. At this point, your organization can consider cloud-computing
resources for data and systems not classified at a certain level, which would be subject to
burdensome regulatory requirements.
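One way to operationalize this gate is a simple classification scale with a policy ceiling for public cloud placement; the level names and the ceiling below are illustrative assumptions, not a standard:

```python
from enum import IntEnum

class Classification(IntEnum):
    """Illustrative data classification scale, lowest to highest sensitivity."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    REGULATED = 3  # subject to statutory handling requirements

# Policy choice: data at or above this level stays out of the public cloud.
PUBLIC_CLOUD_CEILING = Classification.CONFIDENTIAL

def eligible_for_public_cloud(label: Classification) -> bool:
    """A system may use public cloud resources only if every data set it
    handles is labeled below the ceiling."""
    return label < PUBLIC_CLOUD_CEILING
```

The value of the scheme is that the cloud-eligibility decision becomes a mechanical check against the label, rather than a per-project debate.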
Best Practice - External Risk Assessment
A third-party risk assessment of the systems and data being considered for cloud resources
should be conducted to ensure all risks are identified and accounted for. This includes a Privacy
Impact Assessment (PIA) as well as other typical Threat Risk Assessments (TRA). Impacts to
other internal systems, such as Data Leakage Protection (DLP) systems should also be
considered. Be prepared to discover extensive risks with costly remediation strategies in order
to consider cloud computing for regulated data.
Best Practice - Do Your Due Diligence / External Reports
At a minimum, you need to understand the security of the organization hosting your cloud computing resources and what they are prepared to offer. If you have very stringent security requirements, you may want to mandate that your cloud provider be certified to ISO/IEC 27001:2005 annually. It is also likely that your organization will need to improve its processes and operational security maturity to manage your cloud provider to that level of security. It is important to utilize the risk assessment and data classification exercises previously mentioned to provide the amount of security required to ensure the appropriate confidentiality, integrity, and availability of your data and systems without overspending. If ISO/IEC 27001:2005 certification is too costly or not available within the class of service you seek, the assurance statement most likely to be available is the Statement on Auditing Standards (SAS) 70 Type II. Work these requirements into the contract and ensure that you see a previous certificate of compliance prior to formalizing an agreement.
Similarly, the business should demand the results of external security scans and penetration
tests on a regular basis due to the unique attack surfaces associated with cloud computing. The
value of certifications such as ISO/IEC 27001:2005 or audit statements like SAS 70 is the source of significant debate among security professionals. Skeptics will point out that, through the scoping process, an organization can exclude critical systems and processes from scrutiny and present an unrealistic picture of organizational security. This is a legitimate issue, and our
recommendation is that domain experts develop standards relating to scoping these and other
certifications, so that over time, the customer will expect broad scoping. Customers must
demand an ISO certification based upon a comprehensive security program. In the end, this will
benefit the cloud provider as well, as a certifiably robust security program will pay for itself in
reduced requests for audit.
Best Practice - Understand Where the Data Will Be!
If your company is considering using cloud-computing resources for regulated data, it is
imperative to understand where the data will be processed and stored under any and all
situations. Of course, this task is far from simple for all parties, including cloud-computing
providers. However, with respect to legislative compliance surrounding where data can and
cannot be transmitted or stored, the cloud computing provider will need to be able to
demonstrate assurance that the data will be where they say it is and only there. This applies to
third parties and other outsourcers used by the cloud computing provider. If the provider has
reciprocal arrangements or other types of potential outsourcing of the resources, strict attention
to how this data is managed, handled, and located must extend to that third party arrangement.
If the potential provider you have engaged cannot do this, investigate others. As this requirement becomes more prevalent, it is likely the option will become available.
Remember, if that assurance cannot be provided, some of your data and processing cannot use
public cloud computing resources as defined in Domain 1 without exception. Private clouds may
be the appropriate option in this case.
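A residency policy of this kind can be checked mechanically against a jurisdiction allowlist covering every replica, including those placed by a provider's subcontractors; the region names below are hypothetical:

```python
# Illustrative allowlist of jurisdictions where regulated data may reside.
ALLOWED_REGIONS = {"eu-west", "eu-central"}

def check_placement(replica_regions):
    """Return the set of regions that violate the residency policy.

    `replica_regions` should include every location where the data is
    stored or processed, including any third-party or subcontractor
    sites; an empty result means the placement is compliant.
    """
    return set(replica_regions) - ALLOWED_REGIONS
```

Requiring the provider to supply the full replica list, and re-running this check on every change, is what turns the contractual assurance into something auditable.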
Best Practice - Track your applications to achieve compliance
To manage an application effectively, you have to know where it is. Establish a "chain of
custody" that enables you to see where applications are running and manage them against any
legal concerns. The chain of custody includes identifying the machine the application is
installed on, what data is associated with that application, who is in control of the machine, and
what controls are in place.
With server virtualization, applications move among different machines, and without careful
control over the chain of custody, you can expose an application or the data to circumstances
where a high-security application may be shifted into a low-security environment. Before you
change anything in the environment, consider whether the change will create unauthorized
access to the application or related data.
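The chain-of-custody elements named above (machine, data, controller, controls) can be captured in a minimal record structure; the field and host names below are illustrative:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class CustodyEvent:
    timestamp: str
    host: str        # machine the application is installed on
    custodian: str   # who controls that machine
    controls: tuple  # security controls in place on that host

@dataclass
class ApplicationCustody:
    """Minimal chain-of-custody log for one application."""
    app_name: str
    data_sets: tuple            # data associated with the application
    history: list = field(default_factory=list)

    def record_move(self, host: str, custodian: str, controls: tuple):
        """Append an event every time the application changes hosts."""
        self.history.append(CustodyEvent(
            datetime.now(timezone.utc).isoformat(), host, custodian, controls))

    def current_host(self) -> str:
        return self.history[-1].host

# Hypothetical history: an app moves from an in-house host to a provider's.
app = ApplicationCustody("billing", ("customer-ledger",))
app.record_move("host-a", "internal-IT", ("disk-encryption",))
app.record_move("host-b", "provider-X", ("disk-encryption", "central-mgmt"))
```

Before approving a migration, the controls recorded for the destination host can be compared against the application's security requirements, catching the high-security-app-on-low-security-host case described above.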
Best Practice - With off-site hosting, keep your assets separate
If a third party controls or hosts one of your servers, keeping your operating assets separate
from those of the host's other customers is critical to avoid potential liability for security
exposures, including improper access. For hosted applications, you also need to ensure that
settings for one application cannot drift or migrate into the control of another, so no other host
customers can access your data.
To do this, you need to evaluate how the host distributes and controls applications and data
stored in its server array. Depending on the configurations of the hosts and client machines,
settings and programmatic adjustments can trickle down and install in an unexpected manner.
This is why you need to make sure that appropriate security controls are in place. You do not
want unexpected updates or configuration controls to gain control over your data or application
versions. Make sure your contract with the hosting company details the technical specifications
that protect your data and users, and that the hosting company provides testing and monitoring
reports that show compliance with your controls.
Best Practice - Protect yourself against power disruptions
Any CIO overseeing a data center knows that power outages are a common occurrence. The reason is simple: the power needed to run and cool a data center is increasingly vulnerable. A 2006 AFCOM survey reported that 82.5% of data center outages in a five-year period were power-related.
If your data center has experienced power-related business interruptions, consider drafting
contract terms for your own customers that protect you from liability if the power supply to your
facilities is disrupted or lost. You may want more than general "acts of God" clauses in your
customer-facing agreements.
If you are considering a shift to a hosted extension of your data center, you need to understand
your hosted site's power supply and capabilities. Make sure your contract precisely defines
those capabilities and allocates the risks for any service disruptions that occur. Account for this
in your own customer contracts as well. Draft them carefully to ensure that power disruptions to
your suppliers do not expose you to liability that you would avoid if your data center were in-house.
Best Practice - Ensure vendor cooperation in legal matters
What happens when virtualization and compliance collide and the matter ends up in court?
When a legal collision between virtualization and e-discovery occurs, such as if a third-party
host was unable to produce documents a business needs for a legal action, a service provider
can be a significant risk variable.
To avoid this scenario, it is best practice that you obtain the provider’s commitment to
cooperate in legal matters. This must be done contractually with the third-party data custodian.
In conclusion, virtualizing any aspect of your data center changes the game for compliance and
e-discovery. It is best practice to make sure you know exactly where your applications are
running, that your server controls are intact, and that your service provider contract provisions
are "virtualization-friendly." There are benefits of a virtual data center or cloud. It is wise to
address the issues above and not worry about whether your compliance controls are falling out
of the cloud.
Profitability
The business of sustainability in Information Technology is the catalyst for sustainable and
profitable growth. To put it another way, profitability and sustainability go hand in hand.
There is a new definition of profitability that has been the mantra of sustainability for some time: the “Triple Bottom Line.” The three bottom lines are the social, environmental, and economic extents, and you should align these extents to profitability. A more detailed definition of the extents is shown in Table 3 - Extent of Sustainability to achieve profitability, below. In principle, the basic idea is as simple as it is compelling. Resources may only
be used at a rate at which they can be replenished naturally. It is obvious that the way in which
the industrialized world operates today is not sustainable and that change is imperative.
Table 3 - Extent of Sustainability to achieve profitability

Social:
• Labor, health, and safety: Address occupational health, safety, working conditions, and so on
• Human rights and diversity: Ensure compliance with human rights and organizational diversity
• Product safety: Ensure consumer safety
• Retention and qualification: Attract, foster, and retain top talent by fostering a “green” profile

Environmental:
• Energy optimization: Manage energy costs via planning, risk management, and process improvements
• Water optimization: Ensure sustainable and cost-effective water supply
• Raw materials optimization: Control raw material–related costs and manage price volatility
• Air and climate change: Reduce or account for greenhouse gas emissions
• Sewage: Manage sewage emissions and impact on water supply
• Land pollution: Avoid or reduce land pollution
• Waste: Manage waste in a sustainable way

Economic:
• Sustainability performance management: Provide key performance indicators to manage sustainability efforts
• Sustainable business opportunity: Enable new goods and services for customers
• Emission trading: Ensure financial optimization (cap and trade)
• Reporting: Comply with external demands for adequate reporting and disclosures
• Sustainable product life cycle: Sustainably develop new products and manage life cycle
Sustainability is highly relevant not only in times of growth, but especially during times of economic challenge. The main drivers of sustainability do not change, for the following reasons:
• Regulation will continue to increase. That is specifically true for carbon emissions, but will likely extend to many other environmental and social aspects in the future.
• Energy prices will continue to fluctuate and, with economic recovery, rise sharply and increase cost pressure.
• Consumer awareness will continue to intensify and force transparency and optimization across entire business networks and supply chains.
Business and Profit objectives to achieve Sustainability

Looking at the big picture, the new sustainability model is all about being environmentally friendly and making money. This sounds great to CEOs thinking about sustainability, but what are companies actually doing to achieve this? The backbone of most programs rests on best business practices consisting of Awareness and Transparency, Efficiency Improvements, Innovation, and Mitigation.
Best Practice - Consumer Awareness and Transparency

Consumer Awareness and Transparency communicates the value of your sustainability initiative and is key to building brand equity. Transparency offers accountability to the program and avoids "green-washing." In addition to getting the word out, awareness programs are often promoted as educational, providing a series of sustainability best practices to improve the industry at large. Whether you view this through a lens of altruism or self-interest, the net result promotes and advances sustainable practices.
Best Practice – Implement Efficiency Improvement

Efficiency improvement, doing more with less, is a central theme in most sustainability programs. Efficiencies improve products or processes, typically without making major changes to the underlying product or technology. Modifying an engine design to be 20% more efficient is an example of product efficiency, whereas redesigning packaging to reduce waste, or transporting components and finished products more efficiently, are examples of process efficiency. The effects of efficiencies are additive, each contributing to the sustainability goals of the company, driving the bottom line, and creating potential for increased brand value.
Best Practice - Product Innovation

Product Innovation is often more challenging than efficiency enhancement because it results in fundamental changes to products and processes. Innovation tends to have a higher barrier to entry than efficiency programs, requiring ideas that challenge the status quo and demand significant R&D and marketing investments. The risks of failure, for both product development and ultimately customer acceptance, are higher for innovations, but so too are the potential rewards. The development of thin-film photovoltaic solar cells and algae-based bio-diesel, both with the potential to significantly change the economics of renewable energy, are examples of innovation.
Best Practice - Carbon Mitigation

Carbon Mitigation offsets greenhouse gas (GHG) emissions through projects that remove carbon from the atmosphere. The Kyoto Protocol's cap and trade mechanism created the framework for trading carbon allowances as a way for companies to meet mandatory GHG emissions targets. It also paved the way for a voluntary carbon offset market targeted at companies without mandatory requirements or those seeking to be carbon-neutral. Carbon-neutral status is also becoming popular with individuals, with a number of sites and affinity credit cards catering to this desire.
Information Technology Sector Initiatives

IT has become pervasive across all sectors and, although invisible in many ways, forms a service backbone for almost all products. People rarely think about it, but computers and communications are invoked for every cell phone call, every online purchase, every item shipped by a courier, every Google search, and every invoice processed. In short, everything in the modern economy has an associated IT carbon footprint.
Network and Server Infrastructure

Until recently, computing and communications were all about capacity and speed, with little
thought to energy requirements. Following Moore’s Law, computing power and speed grow
consistently to the point where, in some markets, it can cost more to power a server than to
purchase it. In response, innovation has turned to designing low power chips that deliver high
performance without the energy penalty. Network and server infrastructure manufacturers are
focusing on reducing energy, space and cooling requirements with a new breed of high-density,
high-capacity platforms using state of the art energy-efficient chipsets and components, not to
mention the Cloud and the new paradigms that arise.
The proposition is fundamentally ROI based and is especially attractive to businesses that have hit power, size, or cooling barriers in their existing installations. As described in the previous sections, the technologies outlined below will go a long way toward achieving sustainability.
Best Practice – Virtualization

When analyzing efficiency improvements, one option is to eliminate a facility altogether. Virtualization favors consolidating many distributed datacenters into a specially designed, centralized "Cloud" facility. An example is Google's advanced data center facility, affectionately known as a Googleplex, which is alleged to be the most efficient and economical datacenter. While some argue the Googleplex is search specific, the concept of achieving Google economies of scale for applications across the board holds merit.
Best Practice - Recycling e-Waste

Equipment vendors are increasing e-waste collection and recycling in efforts to reduce heavy metal and toxin levels in local landfills. Companies such as Dell and HP have long-standing e-cycling programs as part of their cradle-to-grave sustainability programs.
Cloud Profitability and Economics
Cloud Computing, the long-held dream of computing as a utility, has the potential to transform a
large part of the IT industry, making software even more attractive as a service and shaping the
way IT hardware is designed and purchased. Developers with innovative ideas for new Internet
services no longer require large capital outlays in hardware to deploy their service or the human
expense to operate it. They need not be concerned about over provisioning for a service whose popularity does not meet their predictions, thus wasting costly resources, or under provisioning for one that becomes wildly popular, thus missing potential customers and revenue. Moreover, companies with large batch-oriented tasks can get results as quickly as their programs can scale, since using 1000 servers for one hour costs no more than using one server for 1000 hours. This elasticity of resources, without paying a premium for large scale, is unprecedented in the history of IT.
Cloud Computing refers to both the applications delivered as services over the Internet and the
hardware and systems software in the datacenters that provide those services. The services
themselves have long been referred to as Software as a Service (SaaS). The datacenter
hardware and software is what we call a Cloud. When a Cloud is made available in a pay-as-
you-go manner to the public, we call it a Public Cloud; the service being sold is Utility
Computing. We use the term Private Cloud to refer to internal datacenters of a business or other
organization, not made available to the public. Therefore, Cloud Computing is the sum of SaaS
and Utility Computing, but does not include Private Clouds. People can be users or providers of
SaaS, or users or providers of Utility Computing. We focus on SaaS Providers (Cloud Users)
and Cloud Providers, who have received less attention than SaaS Users. From a hardware
point of view, there are three aspects to Cloud Computing.
1. The illusion of infinite computing resources available on demand, thereby eliminating the
need for Cloud Computing users to plan for provisioning
2. The elimination of an up-front commitment by Cloud users, thereby allowing companies to start small and increase hardware resources only when their needs increase
3. The ability to pay for use of computing resources on a short-term basis as needed (e.g.,
processors by the hour and storage by the day), and release them as needed, thereby
rewarding conservation by letting machines and storage go when they are no longer
useful
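The pay-per-use model just described can be sketched as a minimal cost function. The function name and rates below are hypothetical placeholders, not any provider's actual pricing.

```python
def pay_as_you_go_cost(server_hours, storage_gb_days,
                       rate_per_server_hour=0.10, rate_per_gb_day=0.005):
    """Usage-based cost: pay only for compute and storage actually consumed.

    Rates are hypothetical placeholders, not real provider pricing.
    """
    return server_hours * rate_per_server_hour + storage_gb_days * rate_per_gb_day

# No up-front commitment: a burst costs the same whether it runs wide
# (1000 servers for 1 hour) or narrow (1 server for 1000 hours).
wide = pay_as_you_go_cost(1000 * 1, 0)
narrow = pay_as_you_go_cost(1 * 1000, 0)
print(wide, narrow)  # 100.0 100.0
```

The equality of `wide` and `narrow` is the "cost associativity" property discussed later in this section.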
You can argue that the construction and operation of extremely large-scale, commodity-computer datacenters at low-cost locations was the key enabler of Cloud Computing. These locations uncovered factors of 5 to 7 decrease in the cost of electricity, network bandwidth, operations, software, and hardware available at these very large economies of scale. These factors, combined with statistical multiplexing to increase utilization compared to a private cloud,
meant that cloud computing could offer services below the costs of a medium-sized datacenter
and yet still make a profit.
Any application needs a model of computation, a model of storage, and a model of
communication. The statistical multiplexing necessary to achieve elasticity and the illusion of
infinite capacity requires each of these resources to be virtualized to hide the implementation of
how they are multiplexed and shared.
One view is that different utility-computing offerings will be distinguished based on the level of
abstraction presented to the programmer and the level of management of the resources.
Amazon EC2 is at one end of the spectrum. An EC2 instance looks much like physical
hardware, and users can control nearly the entire software stack, from the kernel upwards. This
low level makes it inherently difficult for Amazon to offer automatic scalability and failover,
because the semantics associated with replication and other state management issues are
highly application-dependent.
At the other extreme of the spectrum are application domain specific platforms, such as Google
AppEngine. AppEngine is targeted exclusively to traditional web applications, enforcing an application structure of clean separation between a stateless computation tier and a stateful storage tier. AppEngine's impressive automatic scaling and high-availability mechanisms, and the proprietary MegaStore data storage available to AppEngine applications, all rely on these
constraints. Applications for Microsoft’s Azure are written using the .NET libraries, and compiled
to the Common Language Runtime, a language-independent managed environment. Therefore,
Azure is intermediate between application frameworks like AppEngine and hardware virtual
machines like EC2.
From a business and profitability perspective, when is Utility Computing preferable to running a
Private Cloud?
Case 1: Demand for a service varies with time
Provisioning a data center for the peak load it must sustain a few days per month leads to
underutilization at other times. Instead, Cloud Computing lets an organization pay by the hour
for computing resources, potentially leading to cost savings even if the hourly rate to rent a
machine from a cloud provider is higher than the rate to own one.
Case 2: Demand is unknown in advance
For example, a web startup will need to support a spike in demand when it becomes popular,
followed potentially by a reduction once some visitors turn away.
Case 3: Batch Processing
Organizations that perform batch analytics can use the "cost associativity" of cloud computing to finish computations faster: using 1000 EC2 machines for 1 hour costs the same as using 1 machine for 1000 hours.
For the first case of a web business with varying demand over time and revenue proportional to
user hours, the tradeoff is shown in Equation 9 – Cloud Computing - Cost Advantage, below.
Cloud Computing is more profitable when the following is true:
Equation 9 – Cloud Computing - Cost Advantage
Profit From Using Cloud Computing ≥ Profit From Using a Fixed Capacity Data Center
Equation 10 – Cloud Computing - Cost tradeoff for demand that varies over time

UserHours_Cloud × (NetRevenue − Cost_Cloud) ≥ UserHours_DataCenter × (NetRevenue − Cost_DataCenter / Utilization)
In Equation 10 – Cloud Computing - Cost tradeoff for demand that varies over time, above, the
left-hand side multiplies the net revenue per user-hour by the number of user-hours, giving the
expected profit from using Cloud Computing. The right-hand side performs the same calculation
for a fixed-capacity datacenter by factoring in the average utilization, including nonpeak
workloads. Whichever side is greater represents the opportunity for higher profit.
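Equation 10 can be evaluated directly. The sketch below uses hypothetical per-hour figures (none come from this article) to show how dividing the datacenter's hourly cost by its average utilization charges idle capacity to every used hour.

```python
def cloud_profit(user_hours, net_revenue_per_hour, cloud_cost_per_hour):
    # Left-hand side of Equation 10: expected profit in the cloud.
    return user_hours * (net_revenue_per_hour - cloud_cost_per_hour)

def datacenter_profit(user_hours, net_revenue_per_hour,
                      dc_cost_per_hour, utilization):
    # Right-hand side: the hourly cost is divided by average utilization,
    # so each used hour also carries the cost of idle capacity.
    return user_hours * (net_revenue_per_hour - dc_cost_per_hour / utilization)

# Hypothetical inputs: the cloud's hourly rate is higher (0.40 vs 0.30),
# yet 60% datacenter utilization makes the cloud the more profitable option.
cloud = cloud_profit(7200, 1.00, 0.40)
fixed = datacenter_profit(7200, 1.00, 0.30, 0.60)
print(cloud >= fixed)  # True: ~4320 vs 3600.0
```

Whichever function returns the larger value identifies the more profitable deployment for those inputs.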
Table 4 - Best Practices in Cloud Architectures, shown below, defines best practices to achieve growth of Cloud Computing architectures. The first three best practices concern adoption, the next five deal with growth, and the last two are marketing related. All best practices should favor horizontal scalability of virtual machines over the efficiency of a single VM.
Table 4 - Best Practices in Cloud Architectures

• Increase Availability of Service: Use multiple Cloud Providers; use Elasticity to prevent DDoS (Distributed Denial of Service) attacks
• Avoid Data Lock-In: Standardize APIs; compatible software to enable Surge Computing
• Data Confidentiality and Auditability: Deploy encryption, VLANs, and firewalls; geographical data storage
• Reduce Data Transfer Bottlenecks: FedEx-ing disks; data backup/archival; higher-bandwidth switches
• Minimize Performance Unpredictability: Improved VM support; flash memory; gang-schedule VMs
• Implement a Scalable Storage Solution: Data de-duplication, tiered storage, and other scalable store solutions
• Reduce or Minimize Bugs in Large-Scale Distributed Systems: Implement full-featured fault isolation and root cause analysis, as well as distributed VMs
• Implement Architecture to Scale Quickly: Implement an auto-scaler that relies on machine learning; snapshots to encourage Cloud Computing conservationism
• Reputation Fate Sharing: Offer reputation-guarding services like those for email
• Implement Best-of-Breed Tiered Software Licensing: Pay-for-use licenses; bulk-use sales, etc.
In addition:
1. Applications need to scale down rapidly as well as scale up, which is a new requirement. Such software also needs a pay-for-use licensing model to match the needs of Cloud Computing.
2. Infrastructure Software needs to be aware that it is no longer running on bare metal, but on
VMs. It needs to have billing built in from the beginning.
3. Hardware Systems should be designed at the scale of a container (at least a dozen racks), which will be the minimum purchase size. Cost of operation will match performance and cost of purchase in importance, rewarding energy proportionality, such as by putting idle portions of the memory, disk, and network into low power mode. Processors should work well with VMs, flash memory should be added to the memory hierarchy, and LAN switches and WAN routers must improve in bandwidth and cost.
Cloud Computing Economics
When deciding whether hosting a service in the cloud makes sense over the long term, you can argue that the fine-grained economic models enabled by Cloud Computing make tradeoff decisions more fluid; in particular, the elasticity offered by clouds serves to transfer risk.
Although hardware resource costs continue to decline, they do so at variable rates. For
example, computing and storage costs are falling faster than WAN costs. Cloud Computing can
track these changes and potentially pass them through to the customer more effectively than
building your own datacenter, resulting in a closer match of expenditure to actual resource
usage.
In making the decision about whether to move an existing service to the cloud, you must examine the expected average and peak resource utilization, especially if the application may have highly variable spikes in resource demand; the practical limits on real-world utilization of purchased equipment; and operational costs that vary depending on the type of cloud environment being considered. See the section titled "Economics Pillar" for additional details on economic issues. After all, profitability and economics go hand in hand.
Best Practice – Consider Elasticity as part of the business decision metrics

Although the economic appeal of Cloud Computing and its variants is often described as "converting capital expenses to operating expenses" (CapEx to OpEx), the phrase "pay as you go" may more directly capture the economic benefit to the buyer or consumer. Hours purchased via Cloud Computing can be distributed non-uniformly in time (e.g., use 100 server-hours today and no server-hours tomorrow, and still pay only for what you use). In the networking community, this way of selling bandwidth is already known as usage-based pricing. In addition, the absence of up-front capital expense allows capital to be redirected to core business investment. Even though, as an example, Amazon's pay-as-you-go pricing could be more expensive than buying and depreciating a comparable server over the same period, you can argue that the cost is outweighed by the extremely important Cloud Computing economic benefits of elasticity and transference of risk. This is especially true if the risks of over provisioning (underutilization) and under provisioning (saturation) are paramount.
Starting with elasticity, the key observation is that Cloud Computing’s ability to add or remove
resources at a fine granularity (for example, one server at a time with EC2) and with a lead-time
of minutes rather than weeks allows us to match resources to workload much more closely.
Real world estimates of server utilization in datacenters range from 5% to 20%. This may sound
shockingly low, but it is consistent with the observation that for many services, the peak
workload exceeds the average by factors of 2 to 10. Few users deliberately provision for less
than the expected peak, and therefore they must provision for the peak and allow the resources
to remain idle at nonpeak times. The more pronounced the variation, the more the waste.
A simple example demonstrates how elasticity reduces this waste and can more than compensate for the potentially higher cost per server-hour of paying as you go vs. buying. For example, regarding elasticity, assume a service has a predictable daily demand where the peak requires 500 servers at noon but the trough requires only 100 servers at midnight, as shown in Figure 28 – Provisioning for peak load, below.
Figure 28 – Provisioning for peak load
If the average utilization over the whole day is 300 servers, the actual utilization over the whole day (shaded area under the curve) is 300 x 24 = 7200 server-hours; but since we must provision for the peak of 500 servers, we pay for 500 x 24 = 12000 server-hours, a factor of 1.7 more than needed. Therefore, as long as the pay-as-you-go cost per server-hour over 3 years is less than 1.7 times the cost of buying the server, utility computing saves money.
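The arithmetic of this example is easy to reproduce; the numbers below are exactly those in the text.

```python
peak_servers = 500
average_servers = 300
hours_per_day = 24

# Actual work performed (the shaded area under the demand curve).
used_server_hours = average_servers * hours_per_day    # 7200
# Capacity paid for when provisioning a fixed datacenter to the peak.
paid_server_hours = peak_servers * hours_per_day       # 12000

waste_factor = paid_server_hours / used_server_hours
print(round(waste_factor, 2))  # 1.67
```

The `waste_factor` (about 1.7) is the break-even premium: pay-as-you-go wins as long as its per-server-hour price stays below roughly 1.7 times the cost of an owned server.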
In fact, the example in Figure 28 – Provisioning for peak load underestimates the benefits of elasticity. In addition to the simple diurnal (24-hour) pattern, most nontrivial services also experience seasonal or other periodic demand variations (e.g., e-commerce peaks in December and photo sharing sites peak after holidays) as well as unexpected demand bursts due to external events (e.g., news events). Since it can take weeks to acquire and rack new equipment, the only way to handle such spikes is to provision for them in advance. We already saw that capacity is wasted even when service operators predict spike sizes correctly, and if they overestimate the spike they provision for, the waste is even worse. They may also underestimate the spike, as shown in Figure 29 – Under Provisioning Option 1, accidentally turning away excess users. While the monetary effects of over provisioning are easily measured, those of under provisioning are more difficult to measure, yet equally serious given performance and scalability concerns.
Figure 29 – Under Provisioning Option 1
Not only do rejected users generate zero revenue; they may never come back due to poor service. Figure 30 – Under Provisioning Option 2 aims to capture this behavior. Users will abandon an under provisioned service until the peak user load equals the datacenter's usable capacity, at which point users again receive acceptable service, but with fewer potential users.
Figure 30 – Under Provisioning Option 2
Regarding Figure 28, even if peak load can be correctly anticipated, without elasticity we waste resources (shaded area) during nonpeak times. In the case of Figure 29 – Under Provisioning Option 1, potential revenue from users not served (shaded area) is sacrificed. Lastly, in Figure 30 – Under Provisioning Option 2, some users desert the site permanently after experiencing poor service. This attrition and possible negative press result in a permanent loss of a portion of the revenue stream.
Economics Pillar
It seems to be a foregone conclusion that by increasing energy efficiency, the world as a whole
reduces carbon emissions and overall, aids in the goal of sustainability.
Consider the economics of passenger airline flights. A number of years ago, the thinking was that wide-body aircraft carrying more people per plane would mean fewer flights and therefore greater efficiency (i.e., reduced carbon gas emissions). Interestingly, the opposite happened. At the micro level, passenger-per-flight efficiency did increase, but because the cost of airfare fell, more people started to fly, and greenhouse emissions increased.
It has become an article of faith among environmentalists seeking to reduce greenhouse gas emissions that improving the efficiency of energy use will lead to a reduction in energy consumption. This proposition has even been adopted by many countries, which promote energy efficiency as the most cost-effective solution to global warming.
However, in the United States, there has been a backlash against energy efficiency as an
instrument of energy policy. This has been stimulated partly by disillusionment with the failures
of energy conservation programs undertaken by utilities, and partly by the growing influence of
the 'contrarians', those hostile to government mandated environmental programs.
The debate as to whether energy efficiency is effective (i.e., reduces energy consumption) spread from the pages of obscure energy economics journals in the early 1990s to the leading US science journal, Science, and the New York Times in the mid 1990s. It has recently produced such polemics as Herbert Inhaber's book Why Energy Conservation Fails, which argues, with the aid of an extensive bibliography, that energy efficiency programs are a waste of time and effort.
This debate has also prompted discussion among the climate change community and US energy analysts over the extent of the "rebound" or "take-back" effect: how much of the energy saving produced by an efficiency investment is taken back by consumers in the form of higher consumption, at both the micro and macro levels.
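A common way to quantify the rebound effect, not taken from this article, is to discount the engineering estimate of savings by the fraction consumers take back; a rebound fraction above 1 is the "backfire" outcome the Khazzoom-Brookes postulate warns about. The function name and figures are illustrative.

```python
def net_energy_savings(engineering_savings, rebound_fraction):
    """Net savings after the rebound ("take-back") effect.

    engineering_savings: energy saved by the efficiency measure alone,
        ignoring any behavioral response (e.g., kWh per year).
    rebound_fraction: share of those savings taken back as extra
        consumption (0 = no rebound; > 1 = backfire, where total
        consumption actually rises).
    """
    return engineering_savings * (1.0 - rebound_fraction)

print(net_energy_savings(1000, 0.5))   # 500.0: half the savings taken back
print(net_energy_savings(1000, 1.5))   # -500.0: backfire, consumption rises
```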
The Khazzoom-Brookes postulate [20], first put forward by the US economist Harry Saunders in 1992, holds that energy efficiency improvements that are, on the broadest considerations, economically justified at the micro level lead to higher levels of energy consumption at the macro level than in the absence of such improvements.
It argues against the views of conservationists who promote energy efficiency as a means of reducing energy consumption. Conservationists identify each small benefit from individual acts of energy efficiency and then aggregate them all to produce a macroeconomic total. In effect, the postulate adopts a macroeconomic (top-down) approach rather than the microeconomic (bottom-up) approach used by conservationists.
It warns that although it is possible to reduce energy consumption through improved energy
efficiency, it would be at the expense of loss of economic output. It can be argued that
overzealous pursuit of energy efficiency per se would damage the economy through
misallocation of resources. In other words, reduced energy consumption is possible but at an
economic cost.
The effect of higher energy prices, either through taxes or producer-induced shortages, initially
reduces demand, but in the longer term, encourages greater energy efficiency. This efficiency
response amounts to a partial accommodation of the price rise and therefore the reduction in
demand is blunted. The result is a new balance between supply and demand at a higher level of
supply and consumption than if there had been no efficiency response.
For example, under the economic conditions of falling fuel prices and a free market approach that prevailed in the United Kingdom for most of the twentieth century, energy consumption increased at the same time as energy efficiency improved. During periods of high energy prices, such as 1973-4 and 1979-80, energy consumption fell. Whether this is due to the adverse consequences of higher fuel prices on economic activity, or to energy efficiency improvements, is a matter of fierce dispute.
[20] http://www.zerocarbonnetwork.cc/News/Latest/The-Khazzoom-Brookes-Postulate-Does-Energy-Efficiency-Really-Save-Energy.html
The lower level of energy consumption at times of high energy prices may be at the expense of reduced economic output. This, in turn, is due to the adverse effect of the high price of an important resource on economic productivity as a whole.
Best Practice – Consider Efficiency as only one part of the Economic Sustainability equation

Energy is only one factor of production. Therefore, there are no economic grounds for favoring energy productivity over labor or capital productivity. Governments may have non-economic reasons, such as combating global warming, for singling out energy productivity.
However, climate policies that rely only on energy efficiency technologies may need reinforcement by market instruments such as fuel taxes and other incentive mechanisms. Without such mechanisms, a significant portion of the technologically achievable carbon and energy savings could be lost to the rebound effect.
Conclusion

Data centers are changing at a rapid, exponential pace. Cloud computing and all of its variants have been discussed. How we align the different data center disciplines to understand how new technologies will work together to solve data center sustainability problems remains a key discussion area. We reviewed Best Practices to achieve business value and sustainability. In summary, this article went above the Cloud, offering Best Practices that align with the most important goal of creating a sustainable computing infrastructure to achieve business value and growth.
Appendix A – Green IT, SaaS, Cloud Computing Solutions

Each entry below gives the SaaS or Cloud Computing company name, its webpage address, and the value proposition it offers to green goods and services companies.
APX, Inc. (http://www.apx.com/): Analytics, Technology, Information and Services for the Energy and Environmental Markets. See Environmental Registry and Banking (http://www.apx.com/environmental/environmental-registries.asp) and Market Operations (http://www.apx.com/marketoperations/) for examples.

Carbon Fund Offsets (http://www.carbonfund.org/calculators): Carbon footprint calculator and preset option enables customers to easily and affordably offset their carbon footprint by pressing the "Offset Your Footprint Now!" button after adding a list of items to a shopping list. Offers detailed information to learn more about carbon offsets.

Cloud Computing Expo (http://cloudcomputingexpo.com/): Information about Cloud Computing.

CO2 Stats (http://www.co2stats.com/): CO2Stats makes your site carbon neutral and shows visitors you are environmentally friendly.
Ecorio (http://www.ecorio.org/): Mobile phone capability to track your carbon footprint.

GaBi Software (http://www.pe-international.com/english/gabi/): Software tools and databases for product and process sustainability analyses.

Greenhouse Gas Equivalencies Calculator (http://www.epa.gov/cleanenergy/energy-resources/calculator.html): Green company business models must offer a value proposition that reduces carbon dioxide. Designing the green business model begins by identifying how its products, services, or solutions can reduce carbon dioxide (CO2) emissions. Market standards trend toward measuring reductions by 1 million metric tons. It can be difficult to visualize what a "metric ton of carbon dioxide" really is. This calculator translates difficult-to-understand statements into more commonplace terms, such as "is equivalent to avoiding the carbon dioxide emissions of X number of cars annually." It also offers an excellent example of the kind of green analytics, metrics, and intelligence measures that SaaS / Cloud Computing solutions must address.

Ideal Sports Entertainment (idealsportsent@gmail.com): Promotes a healthy, active lifestyle and aids in the preservation of the environment by demonstrating bike riding to reduce CO2 emissions and provide exercise.
PE
INTERNATIONAL
Experts in
Sustainability
http://www.pe-
international.com/english
/
PE INTERNATIONAL provides
conscientious companies with cutting-edge
tools, in-depth knowledge and an
unparalleled spectrum of experience in
making both corporate operations and
products more sustainable. Applied
methods include implementing
management systems, developing
sustainability indicators, life cycle
assessment (LCA), carbon footprint,
design for environment (DfE) and
environmental product declarations (EPD),
technology benchmarking, or eco-
efficiency analysis and emissions
management. PE INTERNATIONAL offers
two leading software solutions, with the
GaBi software for product sustainability
and the SoFi software for corporate
sustainability.
Planet Metrics http://www.planetmetrics
.com/
Rapid Carbon Modeling (RCM) approach
enables organizations to efficiently assess
their exposure to commodity, climate, and
reputational risks and the implications of
these forces on the corporation, its
suppliers and customers.
Point Carbon
http://www.pointcarbon.com/trading/
Point Carbon Trading Analytics provides the market with independent analysis of the power, gas, and carbon markets. It offers web tools, accessible 24/7, aimed at continuously providing clients with the latest market-moving information and forecasts.
SoFi
http://www.pe-international.com/english/sofi/
SoFi is a leading software system for environmental, sustainability, and corporate social responsibility management; it is currently used in 66 countries. The fast information flow and consistent database in SoFi help improve environmental and sustainability performance. The main product lines are: SoFi EH&S for Environmental Management and Occupational Safety; SoFi CSM for Sustainable Corporate Management; and SoFi EM for Emissions Management and Benchmarking.
Trucost PLC
http://www.trucost.com/
Trucost is an independent environmental research organization, founded in 2000, that works with companies, investors, and government agencies to understand the impacts companies have on the environment.
Appendix B – Abbreviations

DCE: Data Center Efficiency = IT equipment power / Total facility power. Shows a ratio of how well a data center is consuming power.

DCPE: Data Center Performance Efficiency = Effective IT workload / Total facility power. Shows how effectively a data center is consuming power to produce a given level of service or work, such as energy per transaction or energy per business function performed.

PUE: Power Usage Effectiveness = Total facility power / IT equipment power. Inverse of DCE.

Kilowatts (kW): Watts / 1,000. One thousand watts.

Annual kWh: kW x 24 x 365. kWh used in one year.

Megawatts (MW): kW / 1,000. One thousand kW.

BTU/hour: Watts x 3.413. Heat generated in an hour from using energy, in British Thermal Units. 12,000 BTU/hour can equate to 1 ton of cooling.

kWh: 1,000 watt-hours. The energy consumed by a 1,000-watt load in one hour.

Watts: Amps x Volts (e.g. 12 amps x 12 volts = 144 watts). Unit of electrical power.

Watts: BTU/hour x 0.293. Converts BTU/hr to watts.

Volts: Watts / Amps (e.g. 144 watts / 12 amps = 12 volts). The amount of force on electrons.

Amps: Watts / Volts (e.g. 144 watts / 12 volts = 12 amps). The flow rate of electricity.

Volt-Amperes (VA): Volts x Amps. Power is sometimes expressed in volt-amperes.

kVA: Volts x Amps / 1,000. Number of kilovolt-amperes.

kW: kVA x power factor. Power factor is the efficiency of a piece of equipment's use of power.

kVA: kW / power factor. Kilovolt-amperes.

U: 1U = 1.75". EIA metric describing the height of equipment in racks.

Activity / Watt: Amount of work accomplished per unit of energy consumed; this could be IOPS, transactions, or bandwidth per watt. Indicator of how much work is being done and how efficiently energy is being used to accomplish useful work. This metric applies to active workloads or actively used and frequently accessed storage and data. Examples are IOPS per watt, bandwidth per watt, transactions per watt, and users or streams per watt. Activity per watt should be used in conjunction with other metrics, such as capacity supported per watt and total watts consumed, for a representative picture.

IOPS / Watt: Number of I/O operations (or transactions) / energy (watts). Indicator of how effectively energy is being used to perform a given amount of work. The work could be I/Os, transactions, throughput, or another indicator of application activity; for example, SPC-1 / watt, SPEC / watt, TPC / watt, transactions / watt, or IOPS / watt.

Bandwidth / Watt: GBps, TBps, or PBps / watt; the amount of data transferred or moved per second per unit of energy used. Indicates how much data is moved or accessed per second or time interval per unit of energy consumed. Often confused with capacity per watt, given that both bandwidth and capacity reference GByte, TByte, and PByte.

Capacity / Watt: GB, TB, or PB (storage capacity space) / watt. Indicator of how much capacity (space) is supported in a given configuration or footprint per watt of energy. For inactive data or off-line and archive data, capacity per watt can be an effective measurement gauge. However, for active workloads and applications, activity per watt also needs to be examined to get a representative indicator of how energy is being used.

MHz / Watt: Processor performance / energy (watts). Indicator of how effectively energy is being used by a CPU or processor.

Carbon Credit: Carbon offset credit. Offset credits that can be bought and sold to offset your CO2 emissions.

CO2 Emission: Average 1.341 lbs per kWh of electricity generated. The average amount of carbon dioxide (CO2) emitted in generating a kWh of electricity.
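The formulas in this appendix can be exercised with a short script. The sketch below computes PUE, DCE, heat load, annual energy, CO2 output, and real power from made-up input values, using only the conversion factors defined above.

```python
# Illustrative use of the metrics defined in this appendix; all input values are made up.
total_facility_kw = 1000.0   # total data center draw, kW
it_equipment_kw = 550.0      # draw of the IT equipment alone, kW

pue = total_facility_kw / it_equipment_kw   # Power Usage Effectiveness
dce = it_equipment_kw / total_facility_kw   # Data Center Efficiency (inverse of PUE)

# Heat load and cooling: watts x 3.413 gives BTU/hour; 12,000 BTU/hr ~ 1 ton of cooling
btu_per_hour = it_equipment_kw * 1000 * 3.413
tons_of_cooling = btu_per_hour / 12_000

# Annual energy and the table's average emission factor (1.341 lbs CO2 per kWh)
annual_kwh = total_facility_kw * 24 * 365
co2_lbs_per_year = annual_kwh * 1.341

# Apparent vs. real power: kW = kVA x power factor
kva = 800.0
power_factor = 0.9
real_kw = kva * power_factor

print(f"PUE={pue:.2f}, DCE={dce:.2f}, cooling={tons_of_cooling:.1f} tons, "
      f"CO2={co2_lbs_per_year:,.0f} lbs/yr, real power={real_kw:.0f} kW")
```

With these inputs the facility draws 1.82 watts for every watt delivered to IT equipment, which is why PUE reduction is the first lever most data center efficiency programs pull.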
Appendix C – References
[1] Gartner's Top Predictions for IT Organizations and Users, 2010 and Beyond: A New Balance, Brian Gammage, Daryl C. Plummer, Ed Thompson, Leslie Fiering, Hung LeHong, Frances Karamouzis, Claudio Da Rold, Kimberly Collins, William Clark, Nick Jones, Charles Smulders, Meike Escherich, Martin Reynolds, Monica Basso, publication date: 29 December 2009
[2] Cloud Computing Value Chains: Understanding Businesses and Value Creation in the Cloud, Ashraf Bany Mohammed, Jorn Altmann and Junseok Hwang, December 2009
[3] Cloud Data Management Interface Specification, Version 0.80, January 2009
[4] HP and the cloud for industry analysts, Rebecca Lawson, Director of worldwide cloud marketing initiatives, Fall 2009
[5] Belady, C., Electronics Cooling, Volume 13, No. 1, February 2007
[6] White Paper – Creating HIPAA-Compliant Medical Data Applications with Amazon Web Services, April 2009
[7] Guidelines for energy efficient data centers, February 16, 2007
[8] Evaluating Data Center High-Availability Service Delivery, A FORTRUST White Paper, June 2008
[9] Probabilistic Latent Semantic Indexing, Proceedings of the Twenty-Second Annual International SIGIR Conference on Research and Development in Information Retrieval, Thomas Hofmann, International Computer Science Institute, Berkeley, CA & EECS Department, CS Division, UC Berkeley, [email protected]
[10] A closer look at Data de-duplication and VTL, Jan Poos, Sun Microsystems
[11] The Green Data Center: Understanding Energy Regulations, Power Consumption and More; Using chargeback to reduce data center power consumption: Five steps, Search tech Data, IBM, November 2009
[12] Supporting Sustainable Cloud Services: Investing In The Network To Deliver Scalable, Reliable, And Secure Cloud Computing, a commissioned study conducted by Forrester Consulting on behalf of Juniper Networks, October 2009
[13] Proxy Proposals for Measuring Data Center Productivity, contributors: Jon Haas (Intel), Mark Monroe (Sun Microsystems), John Pflueger (Dell), Jack Pouchet (Emerson), Peter Snelling (Sun Microsystems), Andy Rawson (AMD), Freeman Rawson (IBM), White Paper #17, ©2009 The Green Grid
[14] S.V. Garimella, Joshi, Y.K., Bar-Cohen, A., Mahajan, R., Toh, K.C., Carey, V.P., Baelmans, M., Lohan, J., Sammakia, B. and Andros, F., "Thermal Challenges in Next Generation Electronic Systems – Summary of Panel Presentations and Discussions," IEEE Trans. Components and Packaging Technologies, 2002
[15] Shah, A.J., Carey, V.P., Bash, C.E. and Patel, C.D., "Energy Analysis of Data Center Thermal Management Systems," Proceedings of the 2003 IMECE, paper IMECE2003-42527, 2003
[16] Shah, A.J., Carey, V.P., Bash, C.E. and Patel, C.D., "An Energy-Based Control Strategy for Computer Room Air-Conditioning Units in Data Centers," paper IMECE2004-61384, Proceedings of the 2004 IMECE, Anaheim, CA, 2004
[17] Data Center Power Efficiency, technical committee white paper, The Green Grid, February 20, 2007
[18] Power Efficiency and Storage Arrays: Technology concepts and business considerations, EMC, July 2007
[19] IDC, Industry trends and market analysis, October 30, 2007
[20] http://www.ashrae.org/
[21] http://www.fema.gov/hazard/map/index.shtm#disaster
[22] US Department of Energy, Energy Information Industry, Annual Energy Review 2006, June 27, 2007
Author’s Biography
Paul Brant is a Senior Technology Consultant at EMC in the Global Technology Solutions Group, located in New York City. He has over 25 years of experience in semiconductor VLSI design, board-level hardware and software design, and IT solutions, in roles spanning engineering, marketing, and technical sales. He also holds a number of patents in the data communication and semiconductor fields. Paul has Bachelor's and Master's degrees in Electrical Engineering from New York University (NYU), located in downtown Manhattan, as well as a Master of Business Administration (MBA) from Dowling College, located in Suffolk County, Long Island, NY. In his spare time, he enjoys his family of five, bicycling, and various other endurance sports.
Index Amazon ....82, 87, 88, 89, 92, 93, 94, 95, 96,
101, 110, 119, 120, 121, 123, 124, 126,
130, 155, 158, 173
ANSI ......................................................... 68
AppEngine ............................. 120, 121, 155
ASIC ................................................... 77, 78
Autonomic Computing .............. 84, 102, 103
Business Practices ........... 19, 127, 128, 151
CCT .................................................... 48, 49
Chillers ..................................................... 42
CHP ......................................................... 20
CIM .................................................. 45, 138
Cloud ....1, 11, 12, 13, 14, 23, 24, 46, 72, 73,
74, 76, 77, 81, 82, 83, 84, 85, 86, 87, 88,
89, 92, 93, 96, 99, 100, 101, 102, 103,
104, 105, 106, 107, 109, 119, 120, 121,
123, 125, 126, 130, 136, 137, 138, 140,
142, 143, 144, 145, 153, 154, 155, 156,
157, 158, 159, 164, 165, 166
Cloud computing .............. 13, 14, 24, 26, 27
Cloud Computing .24, 81, 82, 83, 85, 86, 88,
89, 92, 97, 102, 121, 154, 156, 158, 173
Consolidation ..................................... 21, 51
CRM ......................................................... 14
data center .....11, 12, 16, 20, 21, 23, 25, 26,
27, 28, 31, 34, 35, 38, 65, 69, 73, 80, 81,
82, 83, 86, 91, 92, 95, 96, 106, 124, 127,
128, 129, 130, 131, 132, 133, 134, 135,
141, 143, 149, 150, 153, 155, 164, 169
Digital Ecosystems ...... 82, 83, 92, 101, 102,
103, 105
DMTF ........................................ 45, 137, 138
Downstream Event Suppression .............. 49
DRAM ............................................... 70, 118
Effectiveness ........................ 19, 29, 39, 129
EISM ......................................................... 43
EMC ................................ 1, 10, 19, 174, 175
Environment ................................. 19, 29, 31
EPA .............................................. 33, 41, 42
EPEAT ................................................ 33, 34
ERP .......................................................... 14
ESX .............................................. 43, 71, 72
Executive Order 13423 ............................. 32
FAST ............................................ 64, 67, 69
FBI .................................................. 107, 108
Federated ....................................... 139, 140
FEMA ........................................................ 36
Fifth Light .................................................. 20
Flash ........................................... 17, 64, 157
Flash Memory ................................. 17, 157
Flywheel ................................................... 20
Google13, 81, 82, 87, 88, 89, 101, 103, 109,
110, 113, 114, 116, 120, 121, 152, 153,
155
Governance ............................................ 143
Green .. 15, 23, 24, 31, 82, 83, 92, 102, 103,
165, 166
Grid Computing ..... 82, 83, 84, 92, 102, 103,
123, 125
HDD .................................................... 67, 68
Hyper-V ............................................ 43, 130
IaaS .............................. 76, 87, 97, 124, 136
IBM ................................................... 93, 173
ICIM ............................................. 44, 45, 46
IONOX ..................................................... 43
ISO ........................................... 45, 131, 147
IT 11, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22,
23, 24, 26, 27, 28, 29, 31, 32, 34, 37, 39,
40, 41, 42, 49, 50, 53, 57, 58, 59, 64, 71,
72, 80, 81, 82, 85, 89, 90, 92, 97, 98, 99,
100, 101, 107, 108, 109, 127, 129, 134,
135, 139, 140, 141, 142, 143, 144,
145, 152, 153, 165, 169, 175
Joyent .................................................... 119
Liquid Cooling .......................................... 20
Little’s law ................................................ 55
Moore’s Law ............................... 29, 78, 153
Mosso .............................................. 87, 119
Optimization ............................................. 14
PaaS ............................ 76, 87, 97, 124, 136
People, Planet, Profit ............................... 31
RAID .......................... 63, 64, 65, 67, 68, 69
RAID 6 ..................................................... 68
RFP ........................................................ 133
Risk ................................................ 143, 146
RLS .................................................... 54, 57
ROI ........................................... 48, 133, 153
Ruby on Rails ......................................... 121
SaaS ..13, 24, 76, 85, 88, 97, 124, 154, 165,
166
SAS ............................................ 68, 76, 147
SATA ................................ 64, 67, 68, 69, 70
Security 16, 74, 95, 97, 137, 138, 139, 140,
145
Self Healing .............................................. 66
Self Organizing Systems .................... 50, 52
Self-organizing .......................................... 65
SNMP ................................................. 45, 47
Social Computing ................................... 16
STR .............................................. 54, 55, 56
Sustainability . 13, 18, 19, 20, 21, 23, 24, 27,
28, 29, 30, 31, 81, 82, 84, 104, 150, 151,
167
T10 DIF ..................................................... 68
TCO ............................................ 22, 58, 135
Telco ......................................................... 94
Texas ........................................................ 36
triple bottom line ....................................... 31
United States ............ 20, 33, 35, 36, 37, 143
VERITAS .................................................. 43
Virtualization .... 14, 17, 42, 51, 70, 90, 138,
153
VM ................ 17, 43, 71, 106, 121, 156, 157
VTL ............................................. 61, 63, 173
Warehouse-Scale ..................................... 11
WDM ......................................................... 80
WORM ...................................................... 94
WSCs ..... 109, 110, 111, 112, 114, 115, 116
Zantaz ....................................................... 94
Zetta ......................................................... 94