Presented by Mark Minasi [email protected] copyright 2009 Mark Minasi SVR315.
Ultan kinahan dr - minasi 2010
Who am I?
Ultan Kinahan
• Born and raised in Ireland; moved stateside in 1992, the week after I finished college, and started working in a bar in Greenwich Village, NYC… Landed first role in IT here in late ‘92
• Currently: Regional IT Director of Brown & Brown Insurance Co.
• Previous roles included:
  • AOL – “Mapquest, Digital City & Moviefone”
  • Network Admin for a realty firm in NYC
  • Multiple consulting positions
What constitutes a disaster?
Disaster is Natural, Recovery is Superhuman!
A disaster is an unplanned event that interrupts normal business operations.
Types of events:
• “Acts of God”: floods, earthquakes, volcanoes, snow storms, etc.
• Man-made: acts of terrorism
A few phrases we've all come across
• Risk Management
• Business Continuity
• Disaster Recovery
• Recovery Time Objective (RTO): the target time for making an application available
• Recovery Point Objective (RPO): the age of the data recovered
• Data Center: where we store all our toys
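The RTO and RPO definitions above can be turned into a simple check. The sketch below is illustrative only; the function names `meets_rpo` and `meets_rto` and the sample times are assumptions, not part of the original slides.

```python
from datetime import datetime, timedelta


def meets_rpo(last_good_copy: datetime, failure_time: datetime, rpo: timedelta) -> bool:
    """True if the newest recoverable data is no older than the RPO at the moment of failure."""
    return failure_time - last_good_copy <= rpo


def meets_rto(failure_time: datetime, service_restored: datetime, rto: timedelta) -> bool:
    """True if the application was made available again within the RTO."""
    return service_restored - failure_time <= rto


# Hypothetical incident: last replica at 02:00, failure at 03:30, service back at 14:00.
failure = datetime(2010, 5, 1, 3, 30)
rpo_ok = meets_rpo(datetime(2010, 5, 1, 2, 0), failure, rpo=timedelta(hours=4))
rto_ok = meets_rto(failure, datetime(2010, 5, 1, 14, 0), rto=timedelta(hours=12))
```

In this made-up incident the data is 1.5 hours old (inside a 4-hour RPO) and the outage lasts 10.5 hours (inside a 12-hour RTO), so both checks pass.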
What is DRP (Disaster Recovery Planning)?
Essentially the breakdown goes:
• RM (Risk Management) drives BCP (Business Continuity Planning)
• BCP (Business Continuity Planning) drives DRP (Disaster Recovery Planning)
• DRP is mainly where IT comes into play and involves the planning and implementation of procedures and facilities for use when essential systems are not available for a prolonged period of time
What is BCP (Business Continuity Planning)?
Business continuity planning is the creation and validation of a practiced logistical plan for how an organization will recover and restore partially or completely interrupted critical (urgent) functions within a predetermined time after a disaster or extended disruption. The logistical plan is called a business continuity plan.
In plain language, BCP is working out how to stay in business in the event of disaster. Incidents include local incidents like building fires, regional incidents like earthquakes, or national incidents like pandemic illnesses.
A few Statistics
In 2009 Symantec released the results of its fifth annual Global IT Disaster Recovery survey. According to the report:
• 93% of organizations have had to execute their DR plans; the average cost was $287,000 in the USA and $496,500 in Canada
• The average budget for disaster recovery initiatives worldwide is $50 million USD – not a lot really!
• The average time to "achieve skeleton operations after an outage" is 3 hours
• The average time to be fully "up and running after an outage" is 4 hours
A few Statistics - continued
• Executive-level involvement in DR plans is rising. In 2007, 55% of respondents reported that DR committees involved the CIO, CTO or IT director; this dropped to 33% in 2008, then rose to 67% in 2009
• DR is "becoming a competitive differentiator" within organizations, especially in the financial sector
• Also driven by budgets: upper management is making sure IT spends wisely in the current economy
Disaster Recovery Planning Objectives
Develop the ability to recover key business functions following a disaster
Recover systems based on a timeline as defined by an Internal Audit of operations:
• IT Infrastructure (12-24 hours)
• Processing System (24-48 hours)
• Email (4 days)
• Financials (5 days)
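A recovery timeline like the one above can be kept as data and used to order the restoration work or to review a DR test. This is a minimal sketch: it takes the upper bound of each range as the deadline, and the names `RECOVERY_TARGETS_HOURS`, `recovery_order` and `missed_targets` are illustrative assumptions.

```python
# Upper bound of each recovery window from the timeline, in hours (assumed reading).
RECOVERY_TARGETS_HOURS = {
    "IT Infrastructure": 24,
    "Processing System": 48,
    "Email": 4 * 24,
    "Financials": 5 * 24,
}


def recovery_order(targets: dict) -> list:
    """Systems sorted by deadline, tightest first."""
    return sorted(targets, key=targets.get)


def missed_targets(targets: dict, actual_hours: dict) -> list:
    """Systems whose measured recovery time exceeded the target (or was never recorded)."""
    return [
        name
        for name in recovery_order(targets)
        if actual_hours.get(name, float("inf")) > targets[name]
    ]


# Hypothetical results from a DR test: Financials was never restored.
test_results = {"IT Infrastructure": 20, "Processing System": 60, "Email": 72}
late = missed_targets(RECOVERY_TARGETS_HOURS, test_results)
```

With these made-up test results, the Processing System (60 hours against a 48-hour target) and Financials (not restored) are reported as missed.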
What is mainly affected?
A Business Impact Analysis is essential to determine what core business functions would be most critical to restore following a disaster.
Internal Audits are one of the best options. Typical findings:
• Financials (accounts receivable, accounts payable, etc.)
• HR/Payroll (payroll, benefits processing, etc.)
• IT Infrastructure (to support the business applications above)
What Systems or Services are Required?
Outline all business-critical needs for restoration of services within the timeline determined by the analysis:
• Servers
• Desktops
• Laptops
• Networking gear
• Data lines
• Software
  • Home-grown apps
  • Vendor-based apps
• Vendor contact list
• Etc…
What mediums are available?
Portable & low-cost options (each has its benefits):
• CDs, DVDs
• USB sticks or drives
• Tape
• Disk
• Local replication
Other options:
• Application replication (local site or site to site)
• Site-to-site replication (branch site or colocation)
• Cloud storage or hosted applications
• Thin client computing – VDI
Mmm, What do I need to start?
First things first… funding approval!!!
• Purchase new storage, servers, software, licensing, etc.
• Can you use existing (non-production) systems at a second data center?
• Data lines
• Configuration of servers, software & storage
• Set up external access ports for failover
  • MX records, Terminal Server, Citrix, VPN, etc.
• Testing failover
Bandwidth – The Replication Challenge
If you're lucky enough to get 70% of the bandwidth usable, you're likely to see transfer rates for a dedicated connection similar to those in the table below.

Technology       Mb/s     Theoretical GB/h   Expected GB/h
T1               1.536    0.66               0.46
10Base-T LAN     10       4.39               3.08
DS3 (T3)         43.2     18.98              13.29
100Base-T LAN    100      43.95              30.76
OC3              155      68.12              47.68
OC12             622      273                191.34
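The table figures can be reproduced with a short calculation. The sketch below assumes the conversion the numbers imply (Mb/s × 3600 s ÷ 8 bits ÷ 1024 MB per GB) and the 70% usable fraction from the slide; the function names are illustrative. The T1 row of the table matches a rate of 1.5 Mb/s rather than the listed 1.536, so it comes out slightly higher here.

```python
def gb_per_hour(mbps: float, usable_fraction: float = 1.0) -> float:
    """Convert a line rate in megabits per second to gigabytes per hour."""
    megabytes_per_hour = mbps * 3600 / 8  # seconds per hour, bits per byte
    return megabytes_per_hour / 1024 * usable_fraction


def hours_to_replicate(data_gb: float, mbps: float, usable_fraction: float = 0.7) -> float:
    """Estimated hours to move data_gb over the link at the usable fraction of its rate."""
    return data_gb / gb_per_hour(mbps, usable_fraction)


theoretical_10base_t = gb_per_hour(10)      # theoretical column for 10Base-T
expected_100base_t = gb_per_hour(100, 0.7)  # expected column for 100Base-T
initial_sync_hours = hours_to_replicate(100, 10)  # 100 GB over 10Base-T at 70%
```

Under these assumptions, an initial sync of 100 GB over a 10 Mb/s link takes about 32.5 hours, which is why the first full copy is often seeded by other means before ongoing replication takes over.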
Bandwidth - Continued
Reasons for loss of bandwidth:
• The provider. In most cases with your standard T1 at 1.5 Mb/s you're lucky to get 1.3 Mb/s; then you have…
• The asynchronous replication engine is often based on a protocol or protocol-converter application running on top of IP or TCP
• Transport protocol overhead
• Replication protocol overhead
Option – Double-Take
System Friendly Asynchronous Replication
• Low CPU overhead
• Defined memory and disk usage
• Write-order-intact replication
Bandwidth Friendly Data Movement
• User-definable replication sets
• Compression, scheduling, scheduled bandwidth throttling capabilities
Point-in-Time Recovery
• Integration w/Volume Shadow Copy Services
Application Failover
[Diagram: Source and Target servers, labeled Replication and Failover Monitoring]
• IP ICMP or Heartbeat Monitoring
• Detect Failure in Seconds or Minutes
• Users can reconnect within minutes of failure
• Failover can occur across a LAN, WAN and even NAT
• Failover more than one Server Identity to the Same Target Server
• Failover Scripting for Custom Configurations
Application Failback
[Diagram: Source and Target servers, labeled Restoration, Replication and Failover Monitoring]
• Recover or Replace Source Server
• Restore Data to the Source Server
• Failback Source Identity
• Bring Source Application Online
• Users Reconnect Within Minutes
• Start Replication and Resume Failover Monitoring for Continued Protection
Value of Automated Site Recovery
VMware Site Recovery Manager provides cost savings from:
• Reduced recovery infrastructure requirements
• Fewer hours spent creating and maintaining DR plans and processes
• Significantly reduced cost of DR tests; eliminates IT staff overtime and application impact
• Recovery in a matter of hours, not days or weeks – greatly reducing the financial exposure a company faces during a major outage
The following captures an estimate of the cost savings provided by SRM when used to recover from a major outage or disaster:
• A company that does $25M in revenue a year = ~$96k/weekday
• Assume that SRM can achieve an RTO of 12 hours instead of the 72 hours of a traditional DR plan
Value of Lost Revenue = 2.5 (days of faster recovery) × $96,153 (lost revenue per workday)
Value of Lost Time by Workers = 2.5 (days of faster recovery) × 500 (number of workers) × $300/day (cost of worker wages)
Total = Value of Lost Revenue + Value of Lost Time by Workers = $615,385 (per disaster)
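The savings estimate can be checked with a few lines of arithmetic. The sketch below uses the slide's inputs and two assumptions that reproduce its figures: 260 workdays per year (which gives roughly $96k per weekday) and 24-hour days when converting the 60 hours saved into 2.5 days. The function name `srm_savings` is illustrative.

```python
def srm_savings(
    annual_revenue: float,
    workers: int,
    wage_per_day: float,
    rto_traditional_hours: float = 72,
    rto_srm_hours: float = 12,
    workdays_per_year: int = 260,  # assumption: 52 weeks x 5 weekdays
) -> dict:
    """Estimate the per-disaster savings from a faster recovery."""
    days_faster = (rto_traditional_hours - rto_srm_hours) / 24
    revenue_per_workday = annual_revenue / workdays_per_year
    lost_revenue = days_faster * revenue_per_workday
    lost_worker_time = days_faster * workers * wage_per_day
    return {
        "days_faster": days_faster,
        "revenue_per_workday": revenue_per_workday,
        "lost_revenue": lost_revenue,
        "lost_worker_time": lost_worker_time,
        "total": lost_revenue + lost_worker_time,
    }


estimate = srm_savings(annual_revenue=25_000_000, workers=500, wage_per_day=300)
```

With these inputs the worker-time term is $375,000 and the revenue term is about $240,385, which together round to the $615,385 per disaster shown on the slide.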
Failover Automation
• Detect site failures
  • Raise alert when heartbeat lost
• Initiate failover
  • User confirmation of outage
  • Granular failover initiation
• Manage replication failover
  • Break replication
  • Make replica visible to recovery hosts
• Execute recovery process
  • Use pre-programmed plan
  • Provide visibility into progress
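The four stages above can be sketched as a small state machine. This is not the API of VMware SRM or any other product; the class `FailoverMonitor`, its fields, and the three-missed-heartbeats threshold are all hypothetical, chosen only to show the flow from lost heartbeat to confirmed failover to a pre-programmed plan.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class FailoverMonitor:
    heartbeat_interval: float          # seconds between expected heartbeats
    missed_limit: int = 3              # assumed threshold before raising an alert
    last_heartbeat: float = 0.0
    alert_raised: bool = False
    progress: List[str] = field(default_factory=list)

    def record_heartbeat(self, now: float) -> None:
        """A heartbeat arrived from the protected site; clear any alert."""
        self.last_heartbeat = now
        self.alert_raised = False

    def check(self, now: float) -> bool:
        """Detect site failure: raise an alert once enough heartbeats are missed."""
        if now - self.last_heartbeat > self.heartbeat_interval * self.missed_limit:
            self.alert_raised = True
        return self.alert_raised

    def failover(self, user_confirmed: bool, plan: List[Tuple[str, Callable[[], None]]]) -> bool:
        """Run the pre-programmed plan only after an alert and user confirmation."""
        if not (self.alert_raised and user_confirmed):
            return False
        for name, step in plan:
            step()
            self.progress.append(name)  # visibility into progress
        return True


actions: List[str] = []
plan = [
    ("break replication", lambda: actions.append("replication broken")),
    ("make replica visible to recovery hosts", lambda: actions.append("replica presented")),
    ("execute recovery process", lambda: actions.append("recovery plan run")),
]
monitor = FailoverMonitor(heartbeat_interval=5.0)
monitor.record_heartbeat(now=100.0)
```

Requiring both the alert and an explicit confirmation mirrors the "user confirmation of outage" step: a lost heartbeat alone may only mean the link between sites is down.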
Testing
Testing is key to the success of any DR plan, no matter how big or small the environment.
According to Symantec’s annual Global IT Disaster Recovery survey, one in four DR tests fail. This marks an improvement, however, when compared to previous years:
• In 2007, 50% of DR tests failed
• In 2008, 30% of DR tests failed
• In 2009, 25% of DR tests failed
Symantec goes on to say that only 15% report that they have never experienced a failure in a test.
Q&A
Any questions?
Contact Information
Ultan Kinahan
Business Continuity & Disaster Recovery