T7 BCP's Big Secret · Dual power supply per server eliminates single point of failure but...
Dave Sonner, Central Area Sales Director, Emerson Network Power - Liebert
BCP’s Big Secret: It’s Powerful and Cool!
Agenda
• Business and IT resiliency
• Protecting the IT systems for resiliency
• Threats to system availability
• New technologies require new power and cooling strategies
• The assessment process
• Evaluating vulnerabilities
• Key power considerations: Redundancy and single points of failure
• Key cooling considerations: High-density equipment
• Key monitoring considerations: Predictive analysis
• Case Study
• Next Steps
Our Focus Today
• Elements
– Power
• Maintain availability and integrity
– Cooling
• Designed to handle electronic loads 24x7
– Monitoring
• Immediate event notification and predictive analysis
CEOs are Aiming to Achieve Four Major Goals
• Grow revenue: through business innovation
• Improve productivity: through efficiency
• Increase profitability: through effectiveness
• Manage risk: through risk management and compliance
Source: Gartner Inc. - CEO Study February, 2004
Addressing risk management and ensuring effectiveness demands that the organization put in place a comprehensive continuity program.
Today, Business Continuity efforts at many organizations are shifting from recovery (reactive) to resiliency (proactive adaptability).
Resiliency: Shifting from Reactive to Proactive
Business Resilience Is About
• Ensuring the survival of the enterprise
• Positioning the business to seamlessly take advantage of and benefit from change
• Supporting an environment of innovation and anticipation
• Maintaining the ability to achieve regulatory and governance obligations
• Ensuring that process and policy do not deter action and success
• Protecting brand, reputation and integrity
IT Resilience Is About
• Ensuring the organization’s ability to respond to business challenges
• Allowing seamless and continuous business transactions to occur
• Maintaining flexibility to meet changing IT demands
• Providing an infrastructure that complements the rate of change
• Enabling the organization to meet security and privacy obligations
• Monitoring and predicting change to ensure continuity of operations
Infrastructure Resilience Is About
• Reliability: Eliminating all sources of failure from component to system level
• Total Cost: Ensuring lowest life cycle costs
• Flexibility: Providing path to respond easily to changing business needs
Traditional Approach to Infrastructure
[Diagram: data center, computer rooms, network closets, and desktop placed on a scale from most critical to less critical]
Resiliency Requires New Approach
[Diagram: data center, computer rooms, network closets, and desktop all marked as critical]
Power and Cooling Matter
• Power and cooling infrastructure is the foundation for network hardware and software
• Typical spend on critical power and cooling is 3% of total data center cost
• Necessary to take ‘system’ view of investment
Threats Include:
• People
• Facilities
• Power
• Natural disasters
• Inadequate service level agreements
• External agency and organizations
• IT Initiatives
Regulations Raise the Stakes
• Regulations “suggest” availability of systems within a “reasonable” timeframe
• Require sufficient UPS runtime for short outages
• Generator power for prolonged outages
• Business model - outages can be tolerated but not for a prolonged period of time
• Senior Management is highly vested in ensuring data integrity and availability
Power and Cooling Challenges
[Chart: IT initiatives; values shown: 70%, 45%, 65%]
Source: 2006 Emerson Network Power/Continuity Insights Survey
Is Your Power and Cooling Infrastructure Prepared?
• 76% do not test generator systems by switching their load from utility to generator
• 68% have not determined how long their computers can operate without cooling
• 44% say their cooling systems are not redundant
• 30% are not monitoring power and cooling systems
• 27% have backup power systems without redundancy
• 27% do not know if power and cooling infrastructures can support network expansion
Source: 2006 Emerson Network Power/Continuity Insights Survey
How To Assess Your Infrastructure
The Assessment Process
[Diagram: assessment process steps]
• Identify Assets
• Quantify Downtime Costs
• Conduct Gap Analysis
• Evaluate Vulnerabilities
• Determine Actions Required
• Prioritize Actions Required
• Implement Actions
• Review, Adjust, Revise
Applied to Power, Cooling, Monitoring, and Management Practices
Evaluating Vulnerabilities
Key Power Questions
1. Has your load been calculated based on a combination of real power usage and planned expansion?
2. Are your line-drawings up to date so you can identify single points of failure?
3. In a multiple-bus system, are computer systems with multiple power supplies plugged into circuits from different UPS sources?
4. Do you know if your UPS batteries can still provide the runtime originally specified?
5. Do you switch your load to generator to test, instead of just turning on your generator?
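The first power question (load based on real usage plus planned expansion) can be illustrated with a small arithmetic sketch. The function name, the sample figures, and the 80% maximum loading target are all assumptions made for illustration; they do not come from the presentation.

```python
def required_ups_capacity_kw(measured_load_kw, planned_expansion_kw, max_loading=0.8):
    """Estimate the UPS capacity (kW) needed to carry today's measured load
    plus planned expansion while staying at or below max_loading.

    max_loading is a hypothetical design target (fraction of rated capacity),
    not a figure from the slides.
    """
    if measured_load_kw < 0 or planned_expansion_kw < 0:
        raise ValueError("loads must be non-negative")
    if not 0 < max_loading <= 1:
        raise ValueError("max_loading must be in (0, 1]")
    projected_load_kw = measured_load_kw + planned_expansion_kw
    return projected_load_kw / max_loading


if __name__ == "__main__":
    # Hypothetical site: 60 kW measured today, 20 kW of planned growth.
    capacity = required_ups_capacity_kw(60.0, 20.0)
    print(f"Required UPS capacity: {capacity:.1f} kW")
```

The point of the sketch is that the sizing input is the measured load, not nameplate ratings, with expansion added before any headroom is applied.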
Key Cooling Questions
1. Have you calculated the amount of time your computer systems areas can operate without cooling in the event of an outage?
2. Are you using comfort or precision cooling?
3. Are your racks arranged in hot aisle / cold aisle configuration?
4. Do you have adequate cooling redundancy with loads distributed between multiple cooling systems?
5. Do you know the MTBF and expected life span of your cooling equipment components to ensure you have performed adequate preventative maintenance?
6. Do you inspect your racks routinely for hot spots and document temperature measurements for trending?
Key Monitoring Questions
1. Do you monitor for heat, smoke, humidity, and water leakage in your computer rooms?
2. Do you have UPS battery monitoring systems in place and a preventative maintenance program?
3. Do you have UPS and cooling system monitoring in place?
4. How often do you review your monitoring logs?
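The questions about documenting temperature measurements for trending and reviewing monitoring logs suggest a simple form of predictive analysis. The sketch below fits a straight line to logged inlet temperatures and estimates when a limit would be crossed. The function names, the sample readings, and the 30 °C limit are hypothetical, and a linear trend is only one possible model.

```python
def fit_line(times, values):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need at least two paired readings")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("all readings share the same timestamp")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    return slope, mean_v - slope * mean_t


def time_until_threshold(times, values, threshold):
    """Return how long after the last reading the fitted trend reaches
    threshold, or None if the trend is flat or falling."""
    slope, intercept = fit_line(times, values)
    if slope <= 0:
        return None
    crossing_time = (threshold - intercept) / slope
    return max(0.0, crossing_time - times[-1])


if __name__ == "__main__":
    # Hypothetical weekly rack inlet readings in degrees C.
    weeks = [0, 1, 2, 3]
    inlet_temps_c = [24.0, 25.0, 26.0, 27.0]
    remaining = time_until_threshold(weeks, inlet_temps_c, threshold=30.0)
    print(f"Weeks until the trend reaches the limit: {remaining}")
```

A result of None means the logged trend is not rising, which is itself worth recording when the logs are reviewed.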
Best Practices:
Ensuring Redundancy and Eliminating Single Points of Failure
Technology Trends – Data Center Power
[Diagram: 1990s vs. today. 1990s: utility source, UPS, PDU, 6 kW rack. Today: utility source / generator #1 and #2, UPS 1 and UPS 2 with LBS, dual input PDUs, STS, loads, 24 kW rack.]
IT Requirements and UPS Architecture
Tier | Requirements | Concerns | 9’s of Availability | UPS System
Tier 4 | Computer / communication system is the business | Zero downtime | (7-8) 99.99999% to 99.999999% | Dual bus; redundant UPS or DC UPS
Tier 3 | Computer / communication system is critical to business | Increased uptime (business continuity) | (5-6) 99.999% to 99.9999% | Redundant UPS
Tier 2 | Some business interruption acceptable | Orderly shutdown and improved uptime; fewer unexpected interruptions; data integrity | (4) 99.99% | Single Module System
Tier 1 | Protect hardware | Avoid power transients | (3) 99.90% | None
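To make the "9's of availability" figures concrete, the following sketch converts an availability percentage into allowed downtime per year. It assumes a 365-day year and treats availability as a simple fraction of total time; the function name is illustrative. Under those assumptions, three nines works out to roughly 8.8 hours per year and five nines to roughly 5.3 minutes.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a 365-day year


def downtime_minutes_per_year(availability_percent):
    """Convert an availability percentage into allowed downtime per year."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


if __name__ == "__main__":
    # Availability levels taken from the tier table above.
    for percent in (99.9, 99.99, 99.999, 99.9999, 99.99999):
        minutes = downtime_minutes_per_year(percent)
        print(f"{percent}% available: about {minutes:.3f} minutes of downtime per year")
```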
Eliminate Single Points of Failure (Tier II)
[Diagram: UPS A, one electrical distribution panel, Racks 1 to 4; ⌧ = single point of failure]
• Single UPS
• Single electrical distribution panel
• Single cable per computer device
• Single power supply per server
Eliminate Single Points of Failure (Tier II)
[Diagram: UPS A, one electrical distribution panel, Racks 1 to 4; ⌧ = single point of failure]
• Single UPS
• Single electrical distribution panel
• Single cable per computer device
• Dual power supply per server eliminates single point of failure but increases cabling under the floor.
Eliminate Single Points of Failure (Tier II)
[Diagram: UPS A, one electrical distribution panel, Racks 1 to 4; ⌧ = single point of failure]
• Single UPS
• Single electrical distribution panel
• Smart power strips eliminate excess cabling & improve air flow under the floor.
Eliminate Single Points of Failure (Tier III)
[Diagram: UPS A, three electrical distribution panels, Racks 1 to 4; ⌧ = single point of failure]
• Single UPS
• Redundant electrical distribution panel and smart power strips.
Eliminate Single Points of Failure (Tier IV)
[Diagram: UPS A and UPS B, three electrical distribution panels, Racks 1 to 4; ⌧ = single point of failure]
• Redundant UPS
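The tier progression above amounts to removing components whose loss would cut every power path to a rack. As a rough sketch of that idea, the code below models a power distribution as a directed graph and reports which components are single points of failure for one load. The component names and topologies are hypothetical simplifications, not a reproduction of the slide diagrams, and the source feed itself is excluded from the check.

```python
from collections import deque


def reachable(links, source, target, removed=None):
    """Breadth-first search from source to target, skipping the removed component."""
    if removed in (source, target):
        return False
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in links.get(node, ()):
            if nxt != removed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def single_points_of_failure(links, source, load):
    """Return components whose individual failure disconnects load from source."""
    components = set(links) | {n for targets in links.values() for n in targets}
    components -= {source, load}
    return sorted(c for c in components
                  if not reachable(links, source, load, removed=c))


if __name__ == "__main__":
    # Hypothetical single-path design: one UPS, one panel.
    single_path = {"utility": ["UPS A"], "UPS A": ["Panel 1"], "Panel 1": ["Rack 1"]}
    # Hypothetical dual-path design: two UPS units, each with its own panel.
    dual_path = {
        "utility": ["UPS A", "UPS B"],
        "UPS A": ["Panel A"], "UPS B": ["Panel B"],
        "Panel A": ["Rack 1"], "Panel B": ["Rack 1"],
    }
    print(single_points_of_failure(single_path, "utility", "Rack 1"))
    print(single_points_of_failure(dual_path, "utility", "Rack 1"))
```

Running this kind of check against an up-to-date one-line diagram is one way to answer the question about identifying single points of failure; it only works if the graph reflects as-built conditions.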
System Architecture Flexibility
• Change system configurations to get the right availability
• Trade capacity for availability
• Protect the initial investment
[Diagram: System A (1 + 1), System B (2 + 1), System C (2 + 2); move in any direction]
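The "trade capacity for availability" point can be illustrated with a short sketch. It assumes the common reading of "N + R" as N modules needed to carry the load plus R redundant modules; the slide does not spell this out. The 100 kW module rating and the function name are hypothetical.

```python
def describe_configuration(needed, redundant, module_kw):
    """Summarize an 'N + R' arrangement of equally rated modules.

    Assumes 'needed' modules carry the load and 'redundant' modules are spare.
    """
    if needed < 1 or redundant < 0 or module_kw <= 0:
        raise ValueError("invalid configuration")
    return {
        "label": f"{needed} + {redundant}",
        "modules_installed": needed + redundant,
        "protected_load_kw": needed * module_kw,
        "module_failures_tolerated": redundant,
    }


if __name__ == "__main__":
    # Hypothetical 100 kW modules in the three arrangements named on the slide.
    for needed, redundant in ((1, 1), (2, 1), (2, 2)):
        print(describe_configuration(needed, redundant, module_kw=100.0))
```

Under these assumptions, the same installed modules can be regrouped to carry more load with less redundancy or less load with more redundancy, which is the trade the slide describes.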
Maintain Current One-Line Diagram
[One-line diagram showing: Utility, Generator, ATS, Building Switchgear, Npower A, Npower B, FDC A, FDC B, overhead power cabling, data cabling through cabinet]
Server Power Supply Requirements Demand Flexibility
[Diagram: 1990s vs. today; Rack 1 and Rack 2 fed from electrical distribution panels]
Adaptive In-rack Power Distribution
Distribution Flexibility – Provides control at the rack and equipment levels
Adaptive rack distribution delivers control and visibility inside the rack to expand rack power requirements and support dual-corded devices
Power Best Practices - Summary
• Understand how Power Tiers of Protection can fulfill business resiliency
• Minimize single points of failure
• Document “As Built” Conditions
• Deploy flexible, adaptive power systems
Heat Density and the Blade Server Challenge
Technology Trends – Data Center Heat Loads
[Chart: Mix of heat loads per rack within a data center (based on maximum power and configuration). X axis: kW per rack range (0-5, >5-10, >10-15, >15-20, >20-25, >25-30, >30). Y axis: % mix within the data center (0% to 60%). Series: 2002, 2006, 2008.]
Cooling Becomes More Critical
[Chart: average cabinet air inlet temperature vs. time without cooling (minutes) for a 10,000 sf data center, showing time to 40°C at 150 W/sf, 300 W/sf, and 450 W/sf]
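For a feel of why heat density shortens the time available without cooling, here is a deliberately crude, air-only estimate. It is not the model behind the chart above: it assumes all heat goes into the room air, ignores the thermal mass of equipment, floor, and walls, and uses a hypothetical 10 ft ceiling and 22 °C starting temperature. It therefore gives a pessimistic lower bound, on the order of tens of seconds at these densities; what it does show is that the time scales inversely with W/sf.

```python
AIR_DENSITY_KG_M3 = 1.2            # approximate, near room temperature
AIR_SPECIFIC_HEAT_J_KG_K = 1005.0  # approximate, at constant pressure
SQFT_TO_M2 = 0.09290304
FT_TO_M = 0.3048


def seconds_to_reach(limit_c, start_c, area_sqft, watts_per_sqft, ceiling_ft=10.0):
    """Air-only lower bound on the time for room air to warm from start_c to limit_c.

    ceiling_ft and start_c are assumed inputs, not values from the slides.
    """
    if watts_per_sqft <= 0 or area_sqft <= 0 or limit_c <= start_c:
        raise ValueError("invalid inputs")
    volume_m3 = area_sqft * SQFT_TO_M2 * ceiling_ft * FT_TO_M
    air_mass_kg = volume_m3 * AIR_DENSITY_KG_M3
    heat_load_w = area_sqft * watts_per_sqft
    energy_j = air_mass_kg * AIR_SPECIFIC_HEAT_J_KG_K * (limit_c - start_c)
    return energy_j / heat_load_w


if __name__ == "__main__":
    for density in (150, 300, 450):
        t = seconds_to_reach(40.0, 22.0, area_sqft=10_000, watts_per_sqft=density)
        print(f"{density} W/sf: about {t:.0f} s (air-only lower bound)")
```

A real answer to "how long can the room run without cooling" needs measured data or a proper thermal study; this sketch only motivates asking the question.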
Best Practices:
Precision Cooling
Document Environmental Conditions
Hot Aisle / Cold Aisle
Back to Front – 10 kW/Rack
[Diagram: rows of racks]
© Copyright 2005 Liebert Corporation. All rights reserved.
Supplemental Cooling
• Supplemental Cooling Module
• Pumping Unit
• Waterless Refrigerant
• Plug-and-Play
• 15 kW Per Connector
Virginia Tech
• Third Fastest Supercomputer in the World, 2003 (10.28 TeraFlops); Seventh Fastest in Nov. 2004
• 1,100 Dual Apple G5
• 3,000 sf Section of a 9,000 sf Data Center
• 18 Inch Raised Floor
• Initial Load = 193 W/sf (Tower model)
• Ultimate Load > 400 W/sf (1 U model)
• Website: http://don.cc.vt.edu/
Rack Flexibility for Better Cooling
• Add expansion channels to make racks deeper for new deeper equipment
• Deeper racks with cable management channels improve rack airflow
• Reduce/remove obstructions under floor
• Use blanking panels in open rack spaces to prevent recirculation
• Air flow enhancers can be used on a limited basis to increase rack air flow
Proper Cable Management Helps Cool the Rack
Best Practices:
Monitoring = Visibility + Control
Regardless of the problem, monitoring is a critical part of avoidance.
For example, battery monitoring can prevent the two leading causes of dropped loads: end of discharge (19%) and bad batteries (18%).
Source: Liebert Global Services
Mission-Critical Monitoring
Centralized monitoring
Local and remote monitoring
Remote monitoring of UPS using SNMP
Monitoring to Manage Distribution
• Building Management System or dedicated support system monitoring for UPS and PDU
• Load Management Monitoring
– Panel board monitoring
– Power strip monitoring & receptacle control
– Maximize panel board usage
• Provides Facilities & IT Managers overview
– Total distribution system
– Ability to respond to changing load requirements
Case Study
Case Study: Global Manufacturer
Situation
– Small data center – 400 square feet
– Handles all data for global manufacturing and supply chain
Initial Problem
– Hot spots from adding blade servers
Analysis revealed dozens of risks and vulnerabilities
– Comfort cooling
– Ineffective rack configurations
– Lack of cooling redundancy
– Poor air circulation – raised floor not high enough
– Lack of system visibility without monitoring
– Fire risk due to lack of suppression system
[Chart: Risk Levels, plotting vulnerability to downtime against tolerance to downtime]
[Chart: Threat Priorities, plotting financial impact of disruptions against relative tolerance to downtime]
Case Study: Global Manufacturer
Policy / To Do
Policy: Do not allow hot air from computer systems to be forced back onto the computer systems or the device can ultimately be damaged. Cooling systems should be configured in a hot aisle / cold aisle arrangement and guarantee that cold supply air is directed to the cold aisle through perforated tiles or an overhead system.
To Do: Configure your space in a hot aisle / cold aisle arrangement and guarantee that cold supply air is directed to the cold aisle through perforated tiles or an overhead system.
Policy: When planning the computer systems area, make sure that sufficient space is set aside for cooling as computer systems are added.
To Do: Make sure that sufficient space is set aside for cooling as computer systems are added. Consider supplemental cooling approaches that require minimal floor space.
Policy: Proper cooling configurations entail multiple units sharing the load at partial capacity. In the event that one unit fails, all cooling is done by the remaining units while the non-working unit is repaired or replaced.
To Do: Set up proper cooling configurations to entail multiple units sharing the load at partial capacity. Monitoring should be set up so that the administrator is notified if there is a unit failure.
Policy: Monitoring should be integrated with the cooling systems so that you can monitor for excessive humidity.
To Do: Integrate monitoring with the cooling systems to monitor for excessive humidity.
Policy: Water intrusion detection systems should be installed, maintained, and integrated with facilities monitoring.
To Do: Install water intrusion detection systems.
Policy: Install the piping infrastructure to support simple capacity expansions before they are required.
To Do: Install a piping infrastructure that allows for easy addition of supplemental cooling capacity.
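The cooling redundancy policy above (multiple units sharing the load, with the remaining units carrying everything if one fails) can be checked with a few lines of arithmetic. The sketch below is a minimal illustration; the function name and the sample unit capacities and heat loads are hypothetical.

```python
def survives_single_unit_failure(unit_capacities_kw, heat_load_kw):
    """Return True if the remaining cooling units can carry the full heat load
    after the loss of any one unit (worst case: the largest unit fails)."""
    if heat_load_kw < 0 or any(c <= 0 for c in unit_capacities_kw):
        raise ValueError("invalid inputs")
    if len(unit_capacities_kw) < 2:
        return False  # a single unit has no redundancy
    remaining_kw = sum(unit_capacities_kw) - max(unit_capacities_kw)
    return remaining_kw >= heat_load_kw


if __name__ == "__main__":
    # Hypothetical room with a 100 kW heat load.
    print(survives_single_unit_failure([50.0, 50.0, 50.0], 100.0))  # three units share the load
    print(survives_single_unit_failure([50.0, 50.0], 100.0))        # two units, no spare capacity
```

Checking the worst case (loss of the largest unit) is a conservative choice; it covers the loss of any smaller unit as well.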
Case Study: Global Manufacturer
Recommendations
– Add precision cooling sized for anticipated growth and redundancy
– Raise the floor height to promote air flow and reconfigure some tiles
– Reconfigure racks in hot aisle / cold aisle arrangement
– Add halon system
IT Manager’s Response: “This is great - we could hang meat in here!”
Next Steps
Resolving Issues
• Work with an outside consultant to perform a gap analysis and compare your current situation with best practices
• Fix the easy problems now
• Gain organizational consensus and approval for more costly improvements
• Update information (one-lines, inventory, etc.):
– Every six months is typical
– Prior to budget review and allocation
– Change with new equipment or business requirements
• Keep abreast of current policies for best practices and standards
• Assign accountability with time and positions