Cultivating a Crystal Ball for Data Center Availability and Performance

PLAYBOOK FOR CHANGE


Once Upon a Time . . .

Data center managers had a crystal ball that could see into servers, monitor power and cooling systems, and correct problems before they occurred. They could see where every watt of energy was going and how it was being used.

All of that real-time information was mined and analyzed and integrated so that each person on the management team saw exactly what they needed to see when they needed to see it.

As a result, downtime was a distant memory. Over-provisioning was unnecessary. Stranded capacity didn’t exist. Workloads were managed so that equipment utilization rates were always high, optimizing efficiency and eliminating unnecessary capital investments. They had a better understanding of costs and could make smart decisions about when and how to use virtual and cloud-based resources.

With consistently high availability, dynamic capacity and superior efficiency, the data center management team was always one step ahead of the business.

And, of course, they all lived happily ever after.

Why You Need a Data Center Crystal Ball

• Data center managers use four different software platforms on average to manage their physical infrastructure [1]

• The average data center utilizes only 62% of available rack space [1]

• The average cost of a data center downtime event is $505,000 [2]

• Only 20% of organizations have mechanisms in place to evaluate and justify cloud ROI [3]

• The average data center PUE is 1.8 [4]; state-of-the-art data centers have PUEs of 1.3 or less

Turning Fantasy into Reality

The good news is that this particular fairy tale is quickly becoming a reality. Technologies are emerging today that give IT management unprecedented visibility into real-time operations.

With that kind of visibility into the present, it is much easier to plan for the future.

This playbook tells you how to cultivate your own crystal ball for data center decision making. And, while we can’t promise “happily ever after,” following the suggestions in this playbook will deliver higher availability, better efficiency and improved planning.


[1] Emerson Network Power customer insight studies

[2] Emerson Network Power and Ponemon Institute, Cost of Downtime Study

[3] The Open Group, 2012 Cloud Computing Survey

[4] Uptime Institute, 2012 Data Center Survey


Assessing Current Systems and Practices

Does your data center look the way the design engineers envisioned it would?

If it’s more than three years old, chances are it doesn’t. Even new facilities go through an abrupt transition when designers hand the reins over to operators. From that point forward, the facility is constantly evolving as new hardware is added or moved and new applications are deployed.

Assessments perform two valuable functions. First, they identify vulnerabilities in critical systems that, if left unaddressed, could lead to downtime. Second, they identify opportunities to reduce costs.

Data Center Assessment Services

Electrical: Determines whether the electrical system is adequate to support the data center load both now and in the future. Evaluates the integrity of the facility's power system and isolates potential problems and vulnerabilities.

What Assessments Can Tell You

• Whether power and cooling systems have the capacity to support current and future loads

• Whether data center equipment is at risk of failure from overheating

• Whether there are flaws in the electrical system that could lead to failure

• Where there are opportunities to improve efficiency

Thermal: Enhances operational performance, exposes vulnerabilities in the cooling system that could lead to equipment failure, and ensures that cooling systems can handle current and anticipated loads. Computational fluid dynamics (CFD) modeling can enhance the assessment by documenting airflow patterns and identifying underfloor obstructions and hot spots.

Efficiency assessment: Evaluates electrical and cooling infrastructure, using CFD modeling, to identify opportunities to reduce energy costs.

Infrared scan: Uses a non-invasive method to identify defective components, degraded electrical connections and other conditions that could result in a fire or electrical breakdown.

Short-circuit and coordination studies: Uncover inadequately rated or uncoordinated protection equipment to prevent damage to critical equipment and eliminate nuisance maintenance trips.

One-line diagram: Creates a one-line diagram of the data center's electrical system, which serves as a blueprint for operational, maintenance and testing activities.

Arc flash study: Analyzes electrical hazards and recommends ways to mitigate them, including equipment labeling, personal protective equipment and training.

Even the best crystal ball can’t prevent problems if basic systems are inadequate, poorly configured or not maintained.


Configuring Your Crystal Ball

The data center is a complex ecosystem of interdependent systems. Configuring a management platform to see across those systems, and to consolidate and analyze data to create meaningful information, requires a three-tiered approach.

1. Local monitoring: The foundation of the data center crystal ball is at the device level, with the ability to remotely monitor and access the various data center systems. Monitoring gives data center personnel visibility into equipment operating status and delivers real-time alerts and alarms that notify them of potential problems. It also enables remote access for faster response to equipment problems.

2. Global planning and aggregation: Pulling in data from devices across the data center makes it possible to identify dependencies and optimize systems such as cooling or power. Aggregated data can also be used to address key planning questions, such as:

• Is there enough space, power and cooling to meet future needs?

• How can equipment be commissioned and decommissioned more efficiently?

• Are systems working in concert to optimize efficiency?

3. Intelligence: At the enterprise level, data is turned into business intelligence that enables data center personnel to proactively address issues before they affect operations, respond more quickly to changes in the infrastructure, and make better decisions about future requirements. This “data center intelligence” can be used to extend the life of the data center, reduce mean time to repair, synchronize infrastructure virtualization, minimize capital expenditures and analyze performance against SLAs.

At each level, the data center monitoring and management system becomes more valuable by providing a more holistic view and more meaningful information to IT and facilities personnel, allowing them to optimize performance while maintaining or improving availability.
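To make one of the second tier's planning questions concrete, here is a minimal Python sketch of separating usable power headroom from stranded capacity. It assumes a simplified, hypothetical rule: a rack's unused power is stranded when the rack has no free rack units left for new equipment. Real planning tools also weigh cooling and circuit limits. The `Rack` class, its fields and the sample numbers are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Rack:
    name: str
    power_budget_kw: float  # power provisioned to the rack (hypothetical field)
    power_draw_kw: float    # measured draw reported by the rack PDU
    total_u: int            # rack units (U) of mounting space
    used_u: int             # rack units currently occupied


def headroom_kw(rack: Rack) -> float:
    """Provisioned power not currently drawn (never negative)."""
    return max(rack.power_budget_kw - rack.power_draw_kw, 0.0)


def stranded_kw(rack: Rack) -> float:
    """Simplified rule: headroom is stranded when the rack has no free space."""
    return headroom_kw(rack) if rack.used_u >= rack.total_u else 0.0


def summarize(racks: list[Rack]) -> dict[str, float]:
    stranded = sum(stranded_kw(r) for r in racks)
    usable = sum(headroom_kw(r) for r in racks) - stranded
    space = sum(r.used_u for r in racks) / sum(r.total_u for r in racks)
    return {"usable_kw": usable, "stranded_kw": stranded, "space_utilization": space}


# Hypothetical racks: A01 is full but lightly loaded, A02 has space but less power
racks = [
    Rack("A01", power_budget_kw=8.0, power_draw_kw=3.5, total_u=42, used_u=42),
    Rack("A02", power_budget_kw=8.0, power_draw_kw=6.0, total_u=42, used_u=20),
]
print(summarize(racks))
```

In this example the full rack's 4.5 kW of headroom counts as stranded, so only 2 kW is actually available for new equipment even though 6.5 kW is unused in total.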

What You’ll Learn from a Data Center Crystal Ball

• Device-level power consumption

• Real-time facility PUE

• Power, cooling and server utilization

• Alarms related to environmental conditions and power quality across the data center

• Data center physical configuration

• Location of stranded capacity


[Figure: The three tiers as a pyramid, from base to top: Local Monitoring, Data Aggregation, Enterprise Intelligence]


Plan Globally, Monitor Locally

A comprehensive approach to data center monitoring includes IT equipment, power, cooling and space.

Server health monitoring: Service processors provide visibility into a server's on-board instrumentation to improve asset productivity, lower operating costs and speed mean time to repair. They provide insight into server temperature, CPU status, fan speed and voltages, as well as remote reset and power-cycle capabilities. Service processors also store event logs related to server hardware and can be programmed to trigger alerts when operating thresholds are exceeded.
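Service processors commonly expose these readings through a management interface such as IPMI or the DMTF Redfish REST API. The sketch below assumes a payload shaped like the `Temperatures` array of a Redfish `Thermal` resource (`ReadingCelsius`, `UpperThresholdNonCritical`, `UpperThresholdCritical`) and shows threshold-based alerting on it. The sensor names and values are made up, and fetching the payload over HTTPS with credentials is omitted.

```python
import json

# Made-up payload shaped like the Temperatures array of a Redfish Thermal resource
SAMPLE = json.loads("""
{
  "Temperatures": [
    {"Name": "CPU1 Temp", "ReadingCelsius": 71,
     "UpperThresholdNonCritical": 70, "UpperThresholdCritical": 85},
    {"Name": "Inlet Temp", "ReadingCelsius": 24,
     "UpperThresholdNonCritical": 35, "UpperThresholdCritical": 40},
    {"Name": "CPU2 Temp", "ReadingCelsius": 88,
     "UpperThresholdNonCritical": 70, "UpperThresholdCritical": 85}
  ]
}
""")


def temperature_alerts(thermal: dict) -> list[tuple[str, str]]:
    """Return (sensor name, severity) for readings that exceed an upper threshold."""
    alerts = []
    for sensor in thermal.get("Temperatures", []):
        reading = sensor.get("ReadingCelsius")
        if reading is None:
            continue  # sensor absent or not reporting
        critical = sensor.get("UpperThresholdCritical")
        warning = sensor.get("UpperThresholdNonCritical")
        if critical is not None and reading > critical:
            alerts.append((sensor["Name"], "critical"))
        elif warning is not None and reading > warning:
            alerts.append((sensor["Name"], "warning"))
    return alerts


print(temperature_alerts(SAMPLE))
```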

Temperature monitoring: Uneven temperatures across the data center can damage equipment and waste energy. Installing a network of temperature sensors helps ensure that all equipment is operating within the ASHRAE recommended temperature range (64.4°F to 80.6°F). Sensing temperatures at multiple locations allows the airflow and cooling capacity of precision cooling units to be controlled more precisely, resulting in more efficient operation. Additionally, cooling costs can be cut by allowing safe operation closer to the upper end of the temperature range.
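A minimal sketch of checking a sensor network against the ASHRAE recommended range quoted above. The sensor names and readings are hypothetical. The headroom value shows how close the hottest inlet is to the upper limit, which is the margin the paragraph suggests can be used to cut cooling costs; a negative value means a hot spot.

```python
# ASHRAE recommended range quoted in the text, in degrees Fahrenheit
ASHRAE_MIN_F = 64.4
ASHRAE_MAX_F = 80.6


def f_to_c(temp_f: float) -> float:
    return (temp_f - 32) * 5 / 9


def classify(temp_f: float) -> str:
    """Label one inlet reading relative to the recommended range."""
    if temp_f < ASHRAE_MIN_F:
        return "below range"
    if temp_f > ASHRAE_MAX_F:
        return "above range"
    return "in range"


def headroom_f(readings: dict[str, float]) -> float:
    """Degrees between the hottest sensor and the upper limit (negative = over)."""
    return ASHRAE_MAX_F - max(readings.values())


# Hypothetical rack-inlet sensors at different heights and rows
readings = {"R1-top": 82.4, "R1-mid": 75.0, "R2-bottom": 62.0}
for sensor, temp in readings.items():
    print(f"{sensor}: {temp:.1f} F ({f_to_c(temp):.1f} C) {classify(temp)}")
print(f"headroom to upper limit: {headroom_f(readings):.1f} F")
```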

Data That Fuels the Crystal Ball

• Cooling system supply and return air temperatures

• Server inlet temperatures

• Power consumption at the facility, rack and device level

• Server CPU status and temperature

• UPS system status, capacity and battery health

• Environmental conditions within the rack

• Alarms across all systems

Power monitoring: Monitoring power can prevent overloads and help improve efficiency. Power should be monitored at the Uninterruptible Power Supply (UPS), at the room Power Distribution Unit (PDU) and within the rack. The best view of IT power consumption comes from the PDUs inside racks, which enable continuous monitoring of volts, amps, kilowatts (kW) and kilowatt-hours (kWh). In addition to supporting more effective power management, rack PDUs enable more accurate chargeback of IT services and help identify stranded capacity. UPS batteries should also be monitored, because battery failure is the leading cause of UPS system loss of power.
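Rack PDUs report power (kW) at points in time; the energy figure (kWh) used for chargeback comes from integrating those samples over time. This sketch applies the trapezoidal rule to hypothetical 15-minute readings from one rack PDU and an assumed tariff of $0.10/kWh. The sampling interval, tariff and readings are illustrative, not taken from any particular PDU.

```python
from datetime import datetime, timedelta


def energy_kwh(samples: list[tuple[datetime, float]]) -> float:
    """Integrate time-ordered (timestamp, kW) samples into kWh (trapezoidal rule)."""
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        hours = (t1 - t0).total_seconds() / 3600
        total += (p0 + p1) / 2 * hours  # average power over the interval x duration
    return total


# Hypothetical 15-minute readings from one rack PDU over one hour
start = datetime(2013, 4, 1, 0, 0)
kw_readings = [4.0, 4.2, 4.4, 4.2, 4.0]
samples = [(start + timedelta(minutes=15 * i), kw) for i, kw in enumerate(kw_readings)]

RATE_USD_PER_KWH = 0.10  # assumed tariff for chargeback
kwh = energy_kwh(samples)
print(f"{kwh:.2f} kWh -> chargeback ${kwh * RATE_USD_PER_KWH:.2f}")
```

Integrating rather than multiplying a single reading by elapsed time keeps the estimate accurate when load varies between samples.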

Rack monitoring: Visibility into conditions in the rack can prevent many of the most common threats to rack-based equipment. A rack monitoring unit can be configured to trigger alarms when rack doors are opened, when water or smoke is detected, or when temperature or humidity thresholds are exceeded.
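The alarm conditions listed for a rack monitoring unit can be expressed as a small rule check. This is a sketch with a hypothetical `RackReading` structure and hypothetical thresholds: the temperature limit reuses the ASHRAE upper bound cited earlier, and the humidity band is an assumed site policy. It is not the configuration format of any particular rack monitoring unit.

```python
from dataclasses import dataclass


@dataclass
class RackReading:
    door_open: bool
    water_detected: bool
    smoke_detected: bool
    temp_f: float
    humidity_pct: float


# Hypothetical thresholds: replace with your own site policy
TEMP_MAX_F = 80.6                  # upper end of the ASHRAE recommended range
HUMIDITY_RANGE_PCT = (20.0, 80.0)  # assumed acceptable relative humidity band


def rack_alarms(r: RackReading) -> list[str]:
    """Return the alarm conditions present in one rack reading."""
    alarms = []
    if r.door_open:
        alarms.append("door open")
    if r.water_detected:
        alarms.append("water detected")
    if r.smoke_detected:
        alarms.append("smoke detected")
    if r.temp_f > TEMP_MAX_F:
        alarms.append("temperature high")
    low, high = HUMIDITY_RANGE_PCT
    if not low <= r.humidity_pct <= high:
        alarms.append("humidity out of range")
    return alarms


print(rack_alarms(RackReading(True, False, False, 82.0, 45.0)))
```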

Leak detection: Fluid leaks can cause immediate and permanent damage to IT equipment. Leak detection systems use strategically located sensors to detect leaks across the data center and trigger alarms to prevent damage. Sensors should be positioned wherever fluids are present in the data center, including around water and glycol piping, humidifier supply and drain lines, condensate drains and unit drip pans.


Aggregating Data Across Systems

The data center doesn't just power the digital world; it mirrors its challenges. So much data is being produced today that it can be difficult to capture and aggregate it all.

Rack-Level Data Collection

A new type of data center appliance has emerged to address the challenge of collecting the huge volumes of operating data generated in any given data center rack. These appliances consolidate KVM (keyboard, video and mouse) access, serial console, rack PDU, embedded service processor and environmental management functions in a single device. Compared with deploying separate devices for alerts, telemetry, environmental sensors, and device access and control, this saves rack space and power; however, the biggest benefit is the ability to scale data collection and aggregate the stream of real-time operating data generated within a particular rack.
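The core aggregation job of such an appliance, turning several per-device feeds into one time-ordered stream for the rack, can be sketched with Python's `heapq.merge`, which lazily merges inputs that are each already sorted. The `Event` type, source names and readings are hypothetical; a real appliance would also have to handle late or out-of-order data.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Event(NamedTuple):
    ts: float     # seconds since start of collection
    source: str   # which device produced the reading
    metric: str
    value: float


def rack_stream(*streams: Iterable[Event]) -> Iterator[Event]:
    """Merge per-device streams (each already sorted by ts) into one ordered stream."""
    return heapq.merge(*streams, key=lambda e: e.ts)


def latest_values(events: Iterable[Event]) -> dict[tuple[str, str], float]:
    """Keep the most recent value for each (source, metric) pair."""
    latest = {}
    for e in events:
        latest[(e.source, e.metric)] = e.value
    return latest


# Hypothetical feeds arriving at one rack appliance
pdu = [Event(0.0, "pdu", "kW", 4.1), Event(10.0, "pdu", "kW", 4.3)]
sp = [Event(2.0, "service-processor", "cpu_temp_c", 61.0),
      Event(12.0, "service-processor", "cpu_temp_c", 63.0)]
env = [Event(5.0, "env-sensor", "inlet_f", 74.5)]

for event in rack_stream(pdu, sp, env):
    print(event)
print(latest_values(rack_stream(pdu, sp, env)))
```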

Who Benefits from the Data Center Crystal Ball?

• IT executive management responsible for IT strategy and business alignment

• Data center management

• Facility management

• Data center personnel responsible for server deployment and capacity management

• Data center personnel responsible for infrastructure systems, such as power and cooling

System-Level Management

There are multiple opportunities to use aggregated data to get a more holistic view of data center systems. For example, when temperature-sensor and cooling-unit data from across the data center are brought together, they can be used to manage all cooling units as one system, improving system efficiency and performance.
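One simple way to picture managing all cooling units as one system is a shared proportional controller driven by the hottest inlet reading in the room, rather than each unit reacting only to its own return air. This is a hypothetical illustration, not the control algorithm of any specific product; the setpoint, gain, base speed and clamp limits are assumptions.

```python
def shared_fan_speed(inlet_temps_f: list[float], setpoint_f: float = 77.0,
                     gain_pct_per_f: float = 10.0, base_pct: float = 60.0,
                     min_pct: float = 30.0, max_pct: float = 100.0) -> float:
    """Proportional fan-speed command for all cooling units, from the hottest inlet.

    Every unit receives the same command, so the units act as one system
    instead of each reacting only to its own return-air temperature.
    """
    error_f = max(inlet_temps_f) - setpoint_f
    command = base_pct + gain_pct_per_f * error_f
    return min(max(command, min_pct), max_pct)  # clamp to the allowed fan range


# Hypothetical rack-inlet readings collected from across the room
room = {"row1": [72.0, 74.5, 76.0], "row2": [73.0, 79.0]}
all_inlets = [t for temps in room.values() for t in temps]
print(f"fan speed command: {shared_fan_speed(all_inlets):.0f}%")
```

A production controller would typically add integral action and per-zone weighting; the point here is only that the command is computed from room-wide data.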

On a rack or aisle basis, environmental data can be integrated with power data from the rack PDUs to give a more complete view of what is happening in specific areas of the data center.

Finally, power usage data can be brought together from across the power system to measure Power Usage Effectiveness (PUE) or support other efficiency initiatives.
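PUE is defined as total facility power (or energy over a period) divided by IT equipment power, so a value of 1.0 would mean every watt reaches IT equipment. The sketch below computes it from hypothetical aggregated meter readings and also shows the non-IT overhead implied by the average (1.8) and state-of-the-art (1.3) figures cited earlier for the same IT load.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power Usage Effectiveness = total facility power / IT equipment power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_kw / it_equipment_kw


def overhead_kw(it_equipment_kw: float, pue_value: float) -> float:
    """Non-IT power (cooling, power conversion losses, lighting) implied by a PUE."""
    return it_equipment_kw * (pue_value - 1)


# Hypothetical aggregated readings, in kW
it_load = sum([120.0, 135.5, 144.5])   # rack PDU totals for three rows
cooling = 210.0
ups_and_distribution_losses = 30.0
facility_total = it_load + cooling + ups_and_distribution_losses

print(f"PUE: {pue(facility_total, it_load):.2f}")
for value in (1.8, 1.3):
    print(f"at PUE {value}: {overhead_kw(it_load, value):.0f} kW of overhead")
```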


[Figure: Scalable data aggregation. A rack-level appliance collects server operating data, device-level power consumption, rack environment data and rack alarms.]


[Figure: Roadmap from Assess Current State (electrical, thermal, efficiency) to Monitor (power, cooling, server health, environment), Aggregate (systems, efficiency, environment alarms), Integrate and Manage (DCIM + ITSM) and Control (real-time optimization)]


Transforming Data into Business Intelligence

When real-time operating data from across the data center is analyzed, integrated with IT service management (ITSM) application data, and presented in meaningful ways to facilities and IT management, real-time data center optimization becomes a reality.

ITSM maps the relationships between applications and the IT resources that support them, while data center infrastructure management (DCIM) does the same for IT resources and the facility assets that support them. Together, the two deliver a holistic view of the application support system.
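The combined view can be sketched as two mappings joined on IT resources: follow the DCIM map downstream from a facility asset, then look up which applications run on the affected resources. All asset, server and application names here are hypothetical, and real ITSM and DCIM tools model far richer relationships than these plain dictionaries.

```python
# Hypothetical DCIM map: facility asset -> what it directly supports
DCIM = {
    "UPS-1": ["PDU-A", "PDU-B"],
    "PDU-A": ["srv-01", "srv-02"],
    "PDU-B": ["srv-03"],
    "CRAH-2": ["srv-02", "srv-03"],
}
# Hypothetical ITSM map: application -> IT resources it runs on
ITSM = {
    "billing": ["srv-01"],
    "crm": ["srv-02", "srv-03"],
    "email": ["srv-03"],
}


def downstream(asset: str, graph: dict[str, list[str]]) -> set[str]:
    """Everything reachable from an asset in the dependency graph (iterative DFS)."""
    seen: set[str] = set()
    stack = [asset]
    while stack:
        node = stack.pop()
        for child in graph.get(node, []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def impacted_apps(asset: str) -> list[str]:
    """Applications that depend on any IT resource fed by the given facility asset."""
    resources = downstream(asset, DCIM)
    return sorted(app for app, servers in ITSM.items() if resources & set(servers))


# Which applications are at risk if PDU-B is taken down for maintenance?
print(impacted_apps("PDU-B"))
```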

How the Crystal Ball Drives Efficiency and Availability

• Increase collaboration across IT and facilities in planning and controlling changes

• Proactively manage capacity based on real-time visibility into IT and facilities infrastructures

• Identify and rectify data center issues before they affect operations

• Increase equipment utilization and asset productivity

• Eliminate stranded capacity

• Accurately project the ROI of cloud-based resources

When this information is presented visually to experienced personnel who can interpret it and project it into the future, the integrated system enables data center personnel to:

• Collaborate, plan and control changes

• Proactively prevent downtime

• Discover and use hidden capacity

• Calculate actual costs to support applications or users
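As one illustration of calculating the costs to support applications, the sketch below allocates metered server energy to applications, scales it by PUE to include facility overhead, and applies an electricity price. The allocation rule (shared servers split evenly), the PUE of 1.6, the $0.10/kWh rate and all names are assumptions; real chargeback models often also include space, hardware and labor costs.

```python
# Hypothetical monthly IT energy per server (kWh), e.g. from rack PDU metering
SERVER_KWH = {"srv-01": 300.0, "srv-02": 450.0, "srv-03": 600.0}
# Hypothetical application-to-server map; a shared server is split evenly
APPS = {"billing": ["srv-01"], "crm": ["srv-02", "srv-03"], "email": ["srv-03"]}

PUE = 1.6    # assumed facility PUE: adds cooling and distribution overhead
RATE = 0.10  # assumed electricity price, USD per kWh


def app_energy_costs(server_kwh: dict[str, float], apps: dict[str, list[str]],
                     pue: float, rate: float) -> dict[str, float]:
    # Count how many applications share each server
    sharers: dict[str, int] = {}
    for servers in apps.values():
        for s in servers:
            sharers[s] = sharers.get(s, 0) + 1
    costs = {}
    for app, servers in apps.items():
        it_kwh = sum(server_kwh[s] / sharers[s] for s in servers)
        costs[app] = it_kwh * pue * rate  # facility-level energy times price
    return costs


for app, cost in app_energy_costs(SERVER_KWH, APPS, PUE, RATE).items():
    print(f"{app}: ${cost:.2f}/month")
```

Because every server's energy is fully allocated, the application costs add up to the total metered IT energy times PUE times the rate.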

With the emergence of cloud computing, analytics, the mobile workforce and socially connected markets, businesses will increasingly demand efficient, highly available and agile data centers. That is what integrated, holistic data center management makes possible.

In summary, you don’t need a “crystal ball” just to help you see the future of your data center; you need it to help you create the future of your business.


EmersonNetworkPower.com

Emerson Network Power and the Emerson Network Power logo are trademarks and service marks of Emerson Electric Co. All other trademarks are the property of their respective owners. ©2013 Emerson Electric Co. PB 00002 (04/13)

About Emerson Network Power

Emerson Network Power provides efficient, reliable critical infrastructure solutions for data centers, communications networks, healthcare and industrial facilities around the world. With proven innovations in power, thermal management, IT solutions and a global network of service experts covering more than 150 countries, we make the future of communications and information technology possible.

We understand how data center infrastructure is becoming more complex at almost every level, and more essential to the success of the business than ever before. Get the insight and resources you need to lead your organization into the future at EmersonNetworkPower.com/CIOtopics.