The High Availability Mantra - How DCIM Can Help
Uploaded by greenfield-software-private-limited
1
High Availability Mantra: How DCIM Can Help
2
Today’s Topics
• High Availability Mantra Revisited
• Anatomy of DCIM Software: GFS Crane
• How GFS Crane DCIM Delivers Higher Availability
• How GFS Crane DCIM Helps to Reduce Costs
• GFS Crane DCIM Case Studies
3
The High Availability Mantra Revisited
Amazon data centers (built to Tier 4 standards, with an expected availability of 99.995%) had two outages in 2012, each lasting over 3 hours!
• Tier 3/Tier 4 ratings are defined only by hardware redundancy
• Glaring gaps in the operating procedures meant to prevent fatal human errors
• Lack of purpose-built business continuity planning (BCP) software to predict failures
• Lack of a chain of custody for changes to trace the root cause
Availability % | Downtime per year | Downtime per month* | Downtime per week
99% ("two nines") | 3.65 days | 7.20 hours | 1.68 hours
99.5% | 1.83 days | 3.60 hours | 50.4 minutes
99.8% | 17.52 hours | 86.4 minutes | 20.16 minutes
99.9% ("three nines") | 8.76 hours | 43.2 minutes | 10.1 minutes
99.95% | 4.38 hours | 21.6 minutes | 5.04 minutes
99.99% ("four nines") | 52.56 minutes | 4.32 minutes | 1.01 minutes
99.999% ("five nines") | 5.26 minutes | 25.9 seconds | 6.05 seconds
99.9999% ("six nines") | 31.5 seconds | 2.59 seconds | 0.605 seconds
99.99999% ("seven nines") | 3.15 seconds | 0.259 seconds | 0.0605 seconds
*Monthly figures assume a 30-day month.
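The downtime figures follow from simple arithmetic: allowed downtime = (1 - availability) × length of the period. The Python sketch below reproduces them, assuming a 365-day year, a 30-day month and a 7-day week, and then compares the 99.995% Tier 4 target with the two 2012 outages cited above, taking 3 hours per outage as a lower bound.

```python
# Sketch: downtime allowed by an availability target.
# Assumption: 365-day year, 30-day month, 7-day week (no leap years).
HOURS = {"year": 365 * 24, "month": 30 * 24, "week": 7 * 24}


def downtime_hours(availability_pct: float, period: str) -> float:
    """Maximum downtime, in hours, permitted in `period` at `availability_pct`."""
    return (1 - availability_pct / 100) * HOURS[period]


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99, 99.995, 99.999):
        per_year_min = downtime_hours(pct, "year") * 60
        per_month_min = downtime_hours(pct, "month") * 60
        print(f"{pct:>7}%  {per_year_min:10.2f} min/yr  {per_month_min:8.2f} min/mo")

    # Tier 4 target (99.995%) vs. two outages of at least 3 hours each in one year.
    budget_min = downtime_hours(99.995, "year") * 60  # roughly 26.3 minutes
    outage_hours = 2 * 3.0                             # lower bound from the slide
    achieved_pct = 100 * (1 - outage_hours / HOURS["year"])
    print(f"Tier 4 budget: {budget_min:.1f} min/yr; "
          f"{outage_hours:.0f} h of outage caps availability at {achieved_pct:.3f}%")
```

Even at the lower bound, 6 hours of outage is more than thirteen times the yearly downtime budget implied by 99.995%.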
4
Did You Know?
90% of DC Failures Are From Common Preventable Causes
5
Did You Know?
Average Downtime of an Online System: 36 Hours per Year. That's Only 99.6% Uptime
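The 99.6% figure is the share of an 8,760-hour year that remains after 36 hours of downtime. A minimal Python check, assuming a non-leap year:

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored


def uptime_pct(downtime_hours_per_year: float) -> float:
    """Percentage of the year a system is up, given its annual downtime in hours."""
    return 100 * (1 - downtime_hours_per_year / HOURS_PER_YEAR)


print(f"{uptime_pct(36):.2f}%")  # prints 99.59%, i.e. roughly 99.6%
```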
6
Did You Know?
75% of Businesses Without a Business Continuity (BC) Plan Fail Within 3 Years After a Major Disruption in Their IT Systems
7
Anatomy of DCIM Software: GFS Crane
8
Improves Availability: Predictability, Visibility & Change Tracking
Advanced Alarm Management and analytics help predict failures, shorten turnaround time, and improve availability and SLA compliance. Consolidating alarms from different facilities enables centralized monitoring.
Improved visibility of the power chain and of the relationships among critical infrastructure components supports better impact analysis of device malfunctions or failures, and root cause analysis (RCA).
Change Tracking in the data center environment supports impact analysis of any change and root cause analysis of any outage caused by a change.
Predictive Analytics
Visibility from Power Chain
Change Tracking
9
Improves Availability: Predictability from Proactive Alarms
Proactive real-time alarms: Alarms on power, PUE and environmental conditions such as temperature, humidity, smoke, fire, water leak detection (WLD), door-open and motion. Alarms can be sent by e-mail and SMS.
Alarm dashboard: Alarms from multiple data centers are consolidated on a single dashboard and can be analyzed by severity, type, source, duration, etc.
Advanced Alarm Management helps predict failures, speeds up turnaround time, and improves availability and SLA compliance.
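Threshold alarms feeding a consolidated, multi-site view can be sketched in a few lines of Python. This is not GFS Crane's implementation: the `Reading` and `Alarm` types, the site and sensor names, and the warning/critical limits are all hypothetical, and only two of the monitored conditions are modeled.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass
class Reading:
    site: str     # data center the sensor belongs to (hypothetical names)
    sensor: str   # sensor identifier, e.g. "rack-A01-temp"
    kind: str     # "temperature", "humidity", ...
    value: float


@dataclass
class Alarm:
    site: str
    source: str
    kind: str
    severity: str
    value: float


# Hypothetical (warning, critical) upper limits; real limits are site-specific.
THRESHOLDS = {
    "temperature": (27.0, 32.0),  # deg C at the rack inlet
    "humidity": (60.0, 80.0),     # % relative humidity
}


def evaluate(reading: Reading) -> Alarm | None:
    """Return an alarm if the reading crosses its warning or critical limit."""
    limits = THRESHOLDS.get(reading.kind)
    if limits is None:
        return None  # no limits configured for this kind of sensor
    warning, critical = limits
    if reading.value >= critical:
        severity = "critical"
    elif reading.value >= warning:
        severity = "warning"
    else:
        return None
    return Alarm(reading.site, reading.sensor, reading.kind, severity, reading.value)


def consolidate(readings: list[Reading]) -> list[Alarm]:
    """Evaluate readings from every site into one alarm list for a single dashboard."""
    return [alarm for r in readings if (alarm := evaluate(r)) is not None]


readings = [
    Reading("DC-1", "rack-A01-temp", "temperature", 33.5),
    Reading("DC-1", "crac-2-hum", "humidity", 65.0),
    Reading("DC-2", "rack-B07-temp", "temperature", 24.0),
]
alarms = consolidate(readings)
print(Counter(a.severity for a in alarms))            # analysis by severity
print(Counter((a.site, a.kind) for a in alarms))      # analysis by source and type
```

Grouping by alarm duration would need timestamps on each reading, which this sketch leaves out.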
10
Improves Availability: Visibility from Power Chain
Maps relationships among critical components of the electrical infrastructure: creates the power chain for the electrical infrastructure and maps asset relationships and redundancies from the power source down to customers and applications.
Asset Relationship Mapping
Improved visibility of the power chain and of the relationships among critical infrastructure components supports better impact analysis of device malfunctions or failures, and root cause analysis.
11
Improves Availability: Change Tracking
Maintains an audit trail of all Install/Move/Add/Change (IMAC) activity in the data center. Integration with an existing ITSM tool routes tracked changes through a workflow system for change approval.
Audit Trail of DC Configuration Changes
Tracking changes in the data center environment supports impact analysis of any change and root cause analysis of any outage caused by a change.
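For root cause analysis, an audit trail is most useful when it can be queried for changes made to the affected assets shortly before an outage. A minimal Python sketch, assuming a hypothetical `ChangeRecord` layout, made-up ticket IDs and dates, and a default 24-hour lookback window:

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ChangeRecord:
    when: datetime
    asset: str
    action: str   # "install", "move", "add" or "change"
    by: str
    ticket: str   # hypothetical ITSM change-request ID


# Hypothetical audit trail entries.
AUDIT_TRAIL = [
    ChangeRecord(datetime(2024, 3, 1, 9, 15), "pdu-a", "change", "ops1", "CR-101"),
    ChangeRecord(datetime(2024, 3, 1, 22, 40), "server-7", "move", "ops2", "CR-102"),
    ChangeRecord(datetime(2024, 3, 2, 1, 5), "ups-b", "change", "ops1", "CR-103"),
]


def suspect_changes(outage_start: datetime, affected: set[str],
                    lookback: timedelta = timedelta(hours=24)) -> list[ChangeRecord]:
    """Changes to affected assets within the lookback window, newest first."""
    window_start = outage_start - lookback
    hits = [c for c in AUDIT_TRAIL
            if c.asset in affected and window_start <= c.when <= outage_start]
    return sorted(hits, key=lambda c: c.when, reverse=True)


outage = datetime(2024, 3, 2, 3, 0)
for c in suspect_changes(outage, {"pdu-a", "ups-b", "server-1"}):
    print(c.when, c.asset, c.action, c.ticket)
```

The set of affected assets could itself come from a power chain impact analysis, so that changes upstream of a failed device are also considered.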
12
Reduces Cost: Capex & Opex
Better visibility helps discover under-utilized computing capacity, which defers Capex purchases. Better visibility also helps avoid stranded rack space and power capacity, maximizing use of the available capacity.
Better monitoring and analytics reduce operating costs for power. Automating processes such as Asset Tracking, Provisioning and Monitoring improves productivity. Rationalizing the asset base lowers maintenance costs such as equipment annual maintenance contracts (AMC).
Reduces Capex
Reduces Opex
13
Reduces CapEx: Monitoring IT Utilization
Visibility of hidden compute capacity: calculates the average utilization of all computing devices in the data center and identifies unused compute capacity.
Under-utilized servers can be repurposed: based on power consumption and utilization patterns, hardware specifications and age, 'Repurpose Candidates' are identified, which helps defer new server hardware purchases.
Hidden Computing Capacity
Repurpose Hardware
Discovery of hidden compute capacity defers capital investment on new server hardware and software licenses
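Finding repurpose candidates amounts to filtering servers by utilization and age. The Python sketch below is a simplification: it ignores the power consumption and hardware specifications the slide also mentions, and the 10% CPU and 4-year cut-offs, the `Server` type and the server names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean


@dataclass
class Server:
    name: str
    cpu_samples: list[float]  # CPU utilization samples, in percent
    age_years: float


# Hypothetical policy: idle on average, but young enough to be worth reusing.
MAX_AVG_CPU = 10.0
MAX_AGE_YEARS = 4.0


def average_utilization(servers: list[Server]) -> float:
    """Fleet-wide average of each server's mean CPU utilization."""
    return mean(mean(s.cpu_samples) for s in servers)


def repurpose_candidates(servers: list[Server]) -> list[str]:
    """Servers that are mostly idle and not yet due for replacement."""
    return [s.name for s in servers
            if mean(s.cpu_samples) < MAX_AVG_CPU and s.age_years <= MAX_AGE_YEARS]


fleet = [
    Server("web-01", [45, 60, 52], 2),
    Server("batch-03", [3, 5, 4], 3),
    Server("legacy-db", [2, 1, 3], 7),  # idle but old: a replacement, not reuse, candidate
]
print(f"average utilization: {average_utilization(fleet):.1f}%")
print(repurpose_candidates(fleet))
```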
14
Reduces Capex: Minimizing Stranded Capacities
Visibility of consumed power against the rack's maximum capacity: provides real-time information on the actual IT load in a rack, its maximum power capacity and its available power capacity.
Visibility of occupied rack space against the maximum available space: provides real-time information on the space occupied in the rack in rack units (RU), the maximum space capacity and the available space capacity.
Hidden Power Capacity
Hidden Space Capacity
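Stranded capacity arises when one rack resource runs out while the other still has headroom, for example a rack at its power limit with empty rack units. A Python sketch, assuming a hypothetical standard device (0.5 kW, 1 RU) is used to judge whether leftover capacity is actually usable; the `Rack` type and rack names are also hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical typical device used to decide whether leftover capacity is usable.
DEVICE_KW, DEVICE_RU = 0.5, 1


@dataclass
class Rack:
    name: str
    max_kw: float   # maximum power capacity
    load_kw: float  # actual IT load
    max_ru: int     # maximum space capacity
    used_ru: int    # occupied space

    @property
    def free_kw(self) -> float:
        return max(0.0, self.max_kw - self.load_kw)

    @property
    def free_ru(self) -> int:
        return max(0, self.max_ru - self.used_ru)


def stranded(rack: Rack) -> dict[str, float]:
    """Capacity left over after filling the rack with as many devices as both limits allow."""
    fits_by_power = int(rack.free_kw // DEVICE_KW)
    fits_by_space = rack.free_ru // DEVICE_RU
    usable = min(fits_by_power, fits_by_space)
    return {
        "stranded_kw": rack.free_kw - usable * DEVICE_KW,
        "stranded_ru": rack.free_ru - usable * DEVICE_RU,
    }


power_bound = Rack("R1", max_kw=8.0, load_kw=7.8, max_ru=42, used_ru=20)  # space stranded
space_bound = Rack("R2", max_kw=8.0, load_kw=2.0, max_ru=42, used_ru=41)  # power stranded
for rack in (power_bound, space_bound):
    print(rack.name, stranded(rack))
```

Pairing a power-bound rack with a space-bound one in placement decisions is one way such stranded capacity can be recovered.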
15
Reduces OpEx: Power Costs
Multi-level PUE comparison: compares PUE calculated at multiple levels and identifies power distribution losses that can be rectified to improve efficiency and reduce OpEx on power.
Detect Power Distribution Loss
L1 PUE: UPS Output
L2 PUE: PDU Output
L3 PUE: Device-level reading
Detecting power distribution losses in the electrical infrastructure helps improve the energy efficiency of the data center and reduce operating costs on power.
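PUE is total facility power divided by IT equipment power. Measuring the IT side at the UPS output, the PDU output and the devices themselves gives the three levels listed above, and the gap between successive levels is power lost in distribution. A Python illustration with hypothetical kW readings taken at the same moment:

```python
def pue(total_facility_kw: float, it_kw: float) -> float:
    """Power Usage Effectiveness = total facility power / IT equipment power."""
    return total_facility_kw / it_kw


# Hypothetical simultaneous readings, in kW.
total_facility = 1500.0
ups_output = 1000.0    # L1 measurement point
pdu_output = 960.0     # L2 measurement point
device_input = 930.0   # L3 measurement point (sum of device-level readings)

levels = {
    "L1 (UPS output)": ups_output,
    "L2 (PDU output)": pdu_output,
    "L3 (device level)": device_input,
}
for label, it_kw in levels.items():
    print(f"{label}: PUE = {pue(total_facility, it_kw):.3f}")

# PUE rises as the measurement point moves closer to the devices;
# the difference between levels is distribution loss.
print(f"UPS -> PDU loss: {ups_output - pdu_output:.0f} kW")
print(f"PDU -> device loss: {pdu_output - device_input:.0f} kW")
```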
16
Reduces Opex: Process Automation & Improved Productivity
Automated discovery and inventory of both IT and infrastructure assets:
• Intelligent assets are discovered automatically using SNMP/IPMI
• The Manufacturer Repository holds information on the static attributes of assets
• Asset data can be imported from spreadsheets or an asset management tool
• A single management console manages both IT and non-IT assets
• Maintenance management for assets uses plug-ins that send scheduler-based proactive alerts
• Workflow-based auto-provisioning improves speed and reduces errors
Advanced Asset Management
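Scheduler-based proactive maintenance alerts can be sketched as comparing each asset's next service date with a warning horizon. The `MaintenancePlan` type, the 14-day horizon and the sample assets and dates below are hypothetical, not taken from GFS Crane.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class MaintenancePlan:
    asset: str
    last_service: date
    interval_days: int  # service interval for this asset

    @property
    def next_due(self) -> date:
        return self.last_service + timedelta(days=self.interval_days)


def due_alerts(plans: list[MaintenancePlan], today: date, warn_days: int = 14) -> list[str]:
    """Alerts for assets due, or already overdue, within `warn_days` of `today`."""
    horizon = today + timedelta(days=warn_days)
    alerts = []
    for plan in sorted(plans, key=lambda p: p.next_due):
        if plan.next_due <= horizon:
            status = "OVERDUE" if plan.next_due < today else "due"
            alerts.append(f"{plan.asset}: {status} on {plan.next_due.isoformat()}")
    return alerts


plans = [
    MaintenancePlan("ups-a", date(2024, 1, 10), 180),
    MaintenancePlan("dg-1", date(2024, 5, 1), 30),
    MaintenancePlan("crac-2", date(2024, 3, 10), 90),
]
for alert in due_alerts(plans, today=date(2024, 6, 1)):
    print(alert)
```

In practice a scheduler would run such a check daily and forward the alerts by e-mail or SMS.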
17
Reduces Opex: Asset Rationalization
Asset Rationalization: The Asset Management module tracks and maintains an inventory of all assets (IT and non-IT) in the data center. It helps identify legacy servers and replacement candidates, reducing AMC and space rental costs.
[Diagram: GFS Crane DCIM; Legacy Data Center; Multiple Data Centers; Asset Rationalization; Server Virtualization; Capacity Planning; Server & Rack Consolidation; Data Center Consolidation]
18
How GFS Crane DCIM Helps
• Helps the data center manager avoid unnecessary over-provisioning
• Helps plan investments and new capacity
• Helps reduce capital costs
• Helps reduce power use and other operating costs
• Helps reduce the risk of failures through critical alerts
• Helps adapt to technical and business change more easily
• Supports improvement plans through real-time metrics and dashboards
19
GFS Crane DCIM Case Study 1: Financial Services
Industry: Project Financing & Mutual Funds
Data Center Location: India
Data Center Details: Tier III certified by 451 Research; energy-efficient ‘green’ data center certified by TÜV Rheinland
DCIM Implementation Date: January 2012
Business requirements driving DCIM implementation:
• Improve energy efficiency through better energy management
• Comply with Green Grid recommendations and adopt best practices in data center operations
• Improve data center availability and meet business SLAs through better monitoring, failure prediction and faster turnaround time
Integration Touch Points:
• Power systems: LT transformer panels, UPS, PDUs and distribution panels, busbar panels, multifunction energy meters
• Environmental systems: PAC units, temperature and humidity probes
• Servers, network devices, storage devices
• Siemens Building Management System
20
GFS Crane DCIM Case Study 2: Telecom
Industry: Mobile Operator
Data Center Location: South Asia
Data Center Details: Multiple data centers spread across 4 locations, covering 8,500 sq. ft. of whitespace and housing 320 racks
DCIM Implementation Date: Ongoing
Business requirements driving DCIM implementation:
• Improve data center efficiency through better energy management
• Improve operational efficiency through better asset management, capacity planning and converged infrastructure monitoring
• Improve data center availability and meet business SLAs through better monitoring, failure prediction and faster turnaround time
Integration Touch Points:
• Power systems: LT transformer panels, UPS, AC & DC PDUs and distribution panels, busbar panels, multifunction energy meters
• Environmental systems: PAC units, temperature and humidity probes
• Diesel generator, flow and level sensors
• IBM Netcool (ITSM), VESDA, ACS and IP surveillance
21
Thank You http://www.greenfieldsoft.com Email: [email protected]
See the other two presentations in this series:
• The Modern Data Center Topology: The High Availability Mantra
• Data Center Infrastructure Management: ERP for the Data Center Manager