T7 BCP's Big Secret · Dual power supply per server eliminates single point of failure but...


Dave Sonner, Central Area Sales Director, Emerson Network Power - Liebert

BCP’s Big Secret: It’s Powerful and Cool!

Agenda

• Business and IT resiliency
• Protecting the IT systems for resiliency
• Threats to system availability
• New technologies require new power and cooling strategies
• The assessment process
• Evaluating vulnerabilities
• Key power considerations: Redundancy and single points of failure
• Key cooling considerations: High-density equipment
• Key monitoring considerations: Predictive analysis
• Case Study
• Next Steps

Our Focus Today

• Elements
– Power: Maintain availability and integrity
– Cooling: Designed to handle electronic loads 24x7
– Monitoring: Immediate event notification and predictive analysis


CEOs are Aiming to Achieve Four Major Goals

Source: Gartner Inc. - CEO Study February, 2004

• Grow revenue: Through business innovation
• Improve productivity: Through efficiency
• Increase profitability: Through effectiveness
• Manage risk: Through risk management and compliance

Addressing risk management and ensuring effectiveness demands that the organization put in place a comprehensive continuity program.

What we see today is that Business Continuity efforts at many organizations are shifting from recovery (reactive) to resiliency (proactive adaptability).

Resiliency: Shifting from Reactive to Proactive

Business Resilience Is About

• Ensuring the survival of the enterprise
• Positioning the business to seamlessly take advantage of and benefit from change
• Supporting an environment of innovation and anticipation
• Maintaining the ability to achieve regulatory and governance obligations
• Ensuring that process and policy do not deter action and success
• Protecting brand, reputation and integrity


IT Resilience Is About

• Ensuring the organization’s ability to respond to business challenges
• Allowing seamless and continuous business transactions to occur
• Maintaining flexibility to meet changing IT demands
• Providing an infrastructure that complements the rate of change
• Enabling the organization to meet security and privacy obligations
• Monitoring and predicting change to ensure continuity of operations

Infrastructure Resilience Is About

• Reliability: Eliminating all sources of failure, from component to system level
• Total Cost: Ensuring lowest life cycle costs
• Flexibility: Providing path to respond easily to changing business needs

Traditional Approach to Infrastructure

[Figure: scale from Less Critical to Most Critical; labels: Desktop, Network Closets, Computer Rooms, Data Center]


Resiliency Requires New Approach

[Figure: single "Critical" category; labels: Desktop, Network Closets, Computer Rooms, Data Center]

Power and Cooling Matter

• Power and cooling infrastructure is the foundation for network hardware and software

• Typical spend on critical power and cooling is 3% of total data center cost

• Necessary to take ‘system’ view of investment

Threats Include:

• People

• Facilities

• Power

• Natural disasters

• Inadequate service level agreements

• External agencies and organizations

• IT Initiatives


Regulations Raise the Stakes

• Regulations “suggest” availability of systems within a “reasonable” timeframe

• Require sufficient UPS runtime for short outages

• Generator power for prolonged outages

• Business model - outages can be tolerated but not for a prolonged period of time

• Senior Management is highly vested in ensuring data integrity and availability

Power and Cooling Challenges

[Chart: axis label "IT Initiatives"; values shown: 70%, 45%, 65%]

Source: 2006 Emerson Network Power/Continuity Insights Survey

• 76% do not test generator systems by switching their load from utility to generator
• 68% have not determined how long their computers can operate without cooling
• 44% say their cooling systems are not redundant
• 30% are not monitoring power and cooling systems
• 27% have backup power systems without redundancy
• 27% do not know if power and cooling infrastructures can support network expansion

Is Your Power and Cooling Infrastructure Prepared?

Source: 2006 Emerson Network Power/Continuity Insights Survey


How To Assess Your Infrastructure

The Assessment Process

[Diagram: assessment cycle, applied to Power, Cooling, Monitoring, and Management Practices. Steps shown:]

• Evaluate Vulnerabilities
• Determine Actions Required
• Prioritize Actions Required
• Implement Actions
• Review, Adjust, Revise
• Identify Assets
• Conduct Gap Analysis
• Quantify Downtime Costs

Evaluating Vulnerabilities


Key Power Questions

1. Has your load been calculated based on a combination of real power usage and planned expansion?
2. Are your line-drawings up to date so you can identify single points of failure?
3. In a multiple-bus system, are computer systems with multiple power supplies plugged into circuits from different UPS sources?
4. Do you know if your UPS batteries can still provide the runtime originally specified?
5. Do you switch your load to generator to test, instead of just turning on your generator?
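Question 3 lends itself to an automated check against an equipment inventory. The Python sketch below is one possible approach, assuming a hypothetical inventory that maps each device to the UPS source feeding each of its power cords. The device names, source labels, and the function name are invented for illustration and do not come from the presentation.

```python
from typing import Dict, List


def find_single_source_devices(inventory: Dict[str, List[str]]) -> List[str]:
    """Return devices whose power cords do not reach at least two distinct UPS sources."""
    flagged = []
    for device, cord_sources in inventory.items():
        # A dual-corded device only removes the single point of failure
        # when its cords are fed from different UPS sources.
        if len(set(cord_sources)) < 2:
            flagged.append(device)
    return sorted(flagged)


# Hypothetical inventory: device name -> UPS source for each power cord.
inventory = {
    "server-01": ["UPS A", "UPS B"],  # dual-corded, split across sources
    "server-02": ["UPS A", "UPS A"],  # dual-corded, but both cords on one UPS
    "server-03": ["UPS B"],           # single-corded
}

print(find_single_source_devices(inventory))
```

Devices flagged by a check like this are candidates for re-cabling or for an updated one-line diagram.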

Key Cooling Questions

1. Have you calculated the amount of time your computer systems areas can operate without cooling in the event of an outage?
2. Are you using comfort or precision cooling?
3. Are your racks arranged in hot aisle / cold aisle configuration?
4. Do you have adequate cooling redundancy with loads distributed between multiple cooling systems?
5. Do you know the MTBF and expected life span of your cooling equipment components to ensure you have performed adequate preventative maintenance?
6. Do you inspect your racks routinely for hot spots and document temperature measurements for trending?

Key Monitoring Questions

1. Do you monitor for heat, smoke, humidity, and water leakage in your computer rooms?

2. Do you have UPS battery monitoring systems in place and a preventative maintenance program?

3. Do you have UPS and cooling system monitoring in place?

4. How often do you review your monitoring logs?
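The environmental checks in question 1 can be sketched as a simple threshold test on each sensor reading. This is a minimal Python illustration only: the reading format, the field names, and the alarm limits are all hypothetical, and real limits should come from the equipment specifications.

```python
from typing import Dict, List, Tuple

# Hypothetical alarm limits (low, high); not taken from the presentation.
LIMITS: Dict[str, Tuple[float, float]] = {
    "temperature_c": (18.0, 27.0),
    "humidity_pct": (40.0, 60.0),
}

# Hypothetical boolean sensor fields that should always read False.
TRIP_SENSORS = ("smoke_detected", "water_detected")


def check_reading(reading: Dict[str, float]) -> List[str]:
    """Return a list of alert messages for one set of room sensor readings."""
    alerts = []
    for name, (low, high) in LIMITS.items():
        value = reading.get(name)
        if value is None:
            # A silent sensor is itself worth an alert.
            alerts.append(f"{name}: no data")
        elif not low <= value <= high:
            alerts.append(f"{name}: {value} outside {low}-{high}")
    for name in TRIP_SENSORS:
        if reading.get(name):
            alerts.append(f"{name}: sensor tripped")
    return alerts


print(check_reading({"temperature_c": 35.0, "humidity_pct": 50.0, "water_detected": True}))
```

Logging every reading, not just the alerts, is what makes the periodic log review in question 4 useful for trending.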


Best Practices:

Ensuring Redundancy and Eliminating Single Points of Failure

Technology Trends – Data Center Power

[Figure: data center power distribution, 1990's vs. today. Labels: Utility Source, UPS, PDU, Rack 6 kW; Utility Source / Generator #1, Utility Source / Generator #2, UPS 1, UPS 2, LBS, Dual Input PDU, STS, PDU, Load, Rack 24 kW]

IT Requirements and UPS Architecture

• Tier 4
– Requirements: Computer / communication system is the business
– Concerns: Zero downtime
– 9's of Availability: (7-8) 99.99999% to 99.999999%
– UPS System: Dual bus, redundant UPS or DC UPS

• Tier 3
– Requirements: Computer / communication system is critical to business
– Concerns: Increased uptime (business continuity)
– 9's of Availability: (5-6) 99.999% to 99.9999%
– UPS System: Redundant UPS

• Tier 2
– Requirements: Some business interruption acceptable
– Concerns: Orderly shutdown and improved uptime; fewer unexpected interruptions; data integrity
– 9's of Availability: (4) 99.99%
– UPS System: Single Module System

• Tier 1
– Requirements: Protect hardware
– Concerns: Avoid power transients
– 9's of Availability: (3) 99.90%
– UPS System: None
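To make the "9's of availability" figures concrete, an availability percentage can be converted into allowable downtime per year. The short Python sketch below does that arithmetic. The function name is illustrative, and the calculation assumes a 365-day year and treats availability as the fraction of total time the system is up.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def annual_downtime_minutes(availability_pct: float) -> float:
    """Allowable downtime per year, in minutes, for a given availability percentage."""
    return (100.0 - availability_pct) / 100.0 * MINUTES_PER_YEAR


# Availability levels quoted for Tier 1 through Tier 3 above.
for nines, pct in [(3, 99.9), (4, 99.99), (5, 99.999), (6, 99.9999)]:
    print(f"{nines} nines ({pct}%): {annual_downtime_minutes(pct):.2f} minutes/year")
```

Under these assumptions, three nines works out to roughly 8.8 hours of downtime per year, while five nines allows only about 5.3 minutes.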


Eliminate Single Points of Failure: Tier II

[Diagram: UPS A, Electrical Dist. Panel, Racks 1-4; legend marks each single point of failure]

• Single UPS
• Single electrical distribution panel
• Single cable per computer device
• Single power supply per server

Eliminate Single Points of Failure: Tier II

[Diagram: UPS A, Electrical Dist. Panel, Racks 1-4; legend marks each single point of failure]

• Single UPS
• Single electrical distribution panel
• Single cable per computer device
• Dual power supply per server eliminates single point of failure but increases cabling under the floor.

Eliminate Single Points of Failure: Tier II

[Diagram: UPS A, Electrical Dist. Panel, Racks 1-4; legend marks each single point of failure]

• Single UPS
• Single electrical distribution panel
• Smart power strips eliminate excess cabling & improve air flow under the floor.


Eliminate Single Points of Failure: Tier III

[Diagram: UPS A, three Electrical Dist. Panels, Racks 1-4; legend marks each single point of failure]

• Single UPS
• Redundant electrical distribution panel and smart power strips.

Eliminate Single Points of Failure: Tier IV

[Diagram: UPS A, UPS B, three Electrical Dist. Panels, Racks 1-4; legend marks each single point of failure]

• Redundant UPS

System Architecture Flexibility

• Change system configurations to get the right availability

• Trade capacity for availability

• Protect the initial investment

[Diagram: three configurations, "Move in any direction"]

• System A: 1 + 1
• System B: 2 + 1
• System C: 2 + 2
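One way to read the 1 + 1, 2 + 1, and 2 + 2 labels is as N + R: N modules needed to carry the load plus R redundant modules. That reading is an assumption, as is the 100 kW module rating used below. The slide gives neither. With those caveats, this Python sketch shows how capacity and fault tolerance trade off across the three systems.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UpsConfig:
    name: str
    needed: int     # N: modules required to carry the full load (assumed reading)
    redundant: int  # R: extra modules beyond N (assumed reading)

    def load_capacity_kw(self, module_kw: float) -> float:
        """Load the system can carry while keeping all R modules in reserve."""
        return self.needed * module_kw

    def tolerated_failures(self) -> int:
        """Number of module failures the system can absorb at full load."""
        return self.redundant


MODULE_KW = 100.0  # hypothetical module rating, for illustration only

configs = [
    UpsConfig("System A", needed=1, redundant=1),
    UpsConfig("System B", needed=2, redundant=1),
    UpsConfig("System C", needed=2, redundant=2),
]

for c in configs:
    print(f"{c.name}: {c.load_capacity_kw(MODULE_KW):.0f} kW, "
          f"tolerates {c.tolerated_failures()} module failure(s)")
```

Under this reading, moving from System A to System B adds capacity at the same fault tolerance, while moving from System B to System C adds fault tolerance at the same capacity.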


Maintain Current One-Line Diagram

[Diagram labels: Utility, Generator, ATS, Building Switchgear, Npower A, Npower B, FDC A, FDC B, Overhead Power cabling, Data Cabling through Cabinet]

Server Power Supply Requirements Demand Flexibility

[Figure: 1990's vs. today; each panel shows Rack 1 and Rack 2 with electrical distribution panels]

Distribution Flexibility – Provides control at the rack and equipment levels

Adaptive rack distribution delivers control and visibility inside the rack to expand rack power requirements and support dual-corded devices

Adaptive In-rack Power Distribution


Power Best Practices - Summary

• Understand how Power Tiers of Protection can fulfill business resiliency

• Minimize single points of failure

• Document “As Built” Conditions

• Deploy flexible, Adaptive power systems

Heat Density and the Blade Server Challenge

Technology Trends – Data Center Heat Loads

[Chart: Mix of Heat Loads per Rack within a Data Center (based on maximum power and configuration). X-axis: kW per Rack Range (0-5, >5-10, >10-15, >15-20, >20-25, >25-30, >30). Y-axis: % Mix within the DC (0.0% to 60.0%). Series: 2002, 2006, 2008]


Cooling Becomes More Critical

[Chart: Average Cabinet Air Inlet Temperature vs. Time Without Cooling (Minutes) for a 10,000 sf data center, showing time to 40°C at 150 W/sf, 300 W/sf, and 450 W/sf]

Best Practices:

Precision Cooling

Document Environmental Conditions


Hot Aisle / Cold Aisle

[Figure: rows of racks, Back to Front – 10 kW/Rack]

© Copyright 2005 Liebert Corporation. All rights reserved.

Supplemental Cooling

• Supplemental Cooling Module
• Pumping Unit
• Waterless Refrigerant
• Plug-and-Play
• 15 kW Per Connector


Virginia Tech

• Third Fastest Supercomputer in the World, 2003 (10.28 TeraFlops); Seventh Fastest in Nov. 2004

• 1,100 Dual Apple G5

• 3,000 sf Section of a 9,000 sf Data Center

• 18 Inch Raised Floor

• Initial Load = 193 W/sf (Tower model)

• Ultimate Load > 400 W/sf (1U model)

• Website: http://don.cc.vt.edu/
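The load densities quoted above translate into total heat load once multiplied by floor area. The Python sketch below does that arithmetic, assuming the W/sf figures apply uniformly across the 3,000 sf section. The slide does not state that assumption, and the function name is illustrative.

```python
def total_load_kw(area_sf: float, density_w_per_sf: float) -> float:
    """Total load in kW for a floor area (square feet) at a given density (W/sf)."""
    return area_sf * density_w_per_sf / 1000.0


AREA_SF = 3000  # section of the data center quoted on the slide

initial_kw = total_load_kw(AREA_SF, 193)   # tower model
ultimate_kw = total_load_kw(AREA_SF, 400)  # lower bound, since the slide says "> 400 W/sf"

print(f"Initial load: {initial_kw:.0f} kW")
print(f"Ultimate load: more than {ultimate_kw:.0f} kW")
```

On that assumption the same floor space goes from about 579 kW to more than 1,200 kW, which is the scale of change the cooling design has to absorb.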


Rack Flexibility for Better Cooling

• Add expansion channels to make racks deeper for new deeper equipment
• Deeper racks with cable management channels improve rack airflow
• Reduce/remove obstructions under floor
• Use blanking panels in open rack spaces to prevent recirculation
• Air flow enhancers can be used on a limited basis to increase rack air flow

Proper Cable Management Helps Cool the Rack



Best Practices:

Monitoring = Visibility + Control

For example, battery monitoring can prevent the two leading causes of dropped loads:

• End of Discharge: 19%
• Bad Batteries: 18%

Regardless of the problem, monitoring is a critical part of avoidance

Source: Liebert Global Services

Mission-Critical Monitoring

Centralized monitoring

Local and remote monitoring

Remote monitoring of UPS using SNMP


Monitoring to Manage Distribution

• Building Management System or dedicated support system monitoring for UPS and PDU

• Load Management Monitoring

– Panel board monitoring

– Power strip monitoring & receptacle control

– Maximize panel board usage

• Provides Facilities & IT Managers overview

– Total distribution system

– Ability to respond to changing load requirements

Case Study

Case Study: Global Manufacturer

Situation

– Small data center – 400 square feet

– Handles all data for global manufacturing and supply chain

Initial Problem

– Hot spots from adding blade servers

Analysis revealed dozens of risks and vulnerabilities

– Comfort cooling

– Ineffective rack configurations

– Lack of cooling redundancy

– Poor air circulation – raised floor not high enough

– Lack of system visibility without monitoring

– Fire risk due to lack of suppression system


Risk Levels

Vulnerability to Downtime

Tolerance to Downtime


Financial Impact of Disruptions

Relative Tolerance to Downtime

Threat Priorities


Case Study: Global Manufacturer

Policy: Do not allow hot air from computer systems to be forced back onto the computer systems or the device can ultimately be damaged. Cooling systems should be configured in a hot aisle / cold aisle arrangement and guarantee that cold supply air is directed to the cold aisle through perforated tile or an overhead system.
To Do: Configure your space in a hot aisle / cold aisle arrangement and guarantee that cold supply air is directed to the cold aisle through perforated tiles or an overhead system.

Policy: When planning the computer systems area, make sure that sufficient space is set aside for cooling as computer systems are added.
To Do: Make sure that sufficient space is set aside for cooling as computer systems are added. Consider supplemental cooling approaches that require minimal floor space.

Policy: Proper cooling configurations entail multiple units sharing the load at partial capacity. In the event that one unit fails, all cooling is done by the remaining units while the non-working unit is repaired or replaced.
To Do: Set up proper cooling configurations to entail multiple units sharing the load at partial capacity. Monitoring should be set up so that the administrator is notified if there is a unit failure.

Policy: Monitoring should be integrated with the cooling systems so that you can monitor for excessive humidity.
To Do: Integrate monitoring with the cooling systems to monitor for excessive humidity.

Policy: Water intrusion detection systems should be installed, maintained, and integrated with facilities monitoring.
To Do: Install water intrusion detection systems.

Policy: Install the piping infrastructure to support simple capacity expansions before they are required.
To Do: Install a piping infrastructure that allows for easy addition of supplemental cooling capacity.
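The cooling policy above, with multiple units sharing the load at partial capacity, can be checked with a little arithmetic: the remaining units must be able to carry the full load when one unit is out. The following Python sketch illustrates that check. The 60 kW load and 35 kW unit rating are hypothetical numbers, not figures from the case study, and the function names are illustrative.

```python
def unit_utilization(load_kw: float, unit_kw: float, units_running: int) -> float:
    """Fraction of each unit's capacity in use when the load is shared equally."""
    if units_running < 1:
        raise ValueError("at least one unit must be running")
    return load_kw / (unit_kw * units_running)


def survives_one_failure(load_kw: float, unit_kw: float, units_installed: int) -> bool:
    """True if the remaining units can carry the full load after one unit fails."""
    if units_installed < 2:
        return False
    return unit_utilization(load_kw, unit_kw, units_installed - 1) <= 1.0


LOAD_KW = 60.0  # hypothetical room heat load
UNIT_KW = 35.0  # hypothetical cooling capacity per unit

for units in (2, 3):
    normal = unit_utilization(LOAD_KW, UNIT_KW, units)
    ok = survives_one_failure(LOAD_KW, UNIT_KW, units)
    print(f"{units} units: {normal:.0%} utilization each, survives one failure: {ok}")
```

With these example numbers, two units run at a comfortable partial load but cannot cover a failure, while three units can.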

Case Study: Global Manufacturer

Recommendations

– Add precision cooling sized for anticipated growth and redundancy
– Raise the floor height to promote air flow and reconfigure some tiles
– Reconfigure racks in hot aisle / cold aisle arrangement
– Add halon system

IT Manager’s Response: “This is great - we could hang meat in here!”

Next Steps


Resolving Issues

• Work with an outside consultant to perform a gap analysis and compare your current situation with best practices

• Fix the easy problems now

• Gain organizational consensus and approval for more costly improvements

• Update information (one-lines, inventory, etc.):

– Every six months is typical

– Prior to budget review and allocation

– Change with new equipment or business requirements

• Keep abreast of current policies for best practices and standards

• Assign accountability with time and positions