Post on 20-May-2018
April 16‐18, 2012 • Talking Stick Resort • Scottsdale, Arizona
Disaster Prevention and Recovery Architecture
A Presentation for Disaster Recovery Planning Professionals
Steven J. Ross, Executive Principal, Risk Masters, Inc.
New York, NY
RMI Risk Masters, Inc.
Many Disaster Recovery plans were written with the assumption that all that was required was an alternate facility in which to run production in an emergency. Businesses now have less tolerance for downtime and data loss, and a combination of technology and economics has made prevention of downtime just as realistic a goal for many organizations as recovery. Server virtualization, cloud computing, data center consolidation, SAN-based replication, virtual tape libraries, ultra-high bandwidth and distributed staffing models have made it possible to design enterprise data centers with continuous operations as a realistic goal. Putting all the pieces together calls for more than a planner; you need an architect. This session presents techniques for the development of an IT Disaster Prevention and Recovery Architecture that is driven by the business needs for continuity in data centers as well as in the offices and factories that depend on them. It distinguishes HA from DR while showing how closely aligned the two concepts are. It will enable attendees to begin the lengthy road to true protection of the enterprise’s IT assets.
Abstract
The Drivers for Disaster Recovery Architecture
• Most Disaster Recovery Plans (DRPs) were developed to recover the operation of a single site
  – Alternate data center, often at a commercial recovery service
  – Recovery of the infrastructure, applications and data maintained in that data center
  – Often recovery of data and software from backup tapes
  – Recovery time measured in hours to days
• Each data center served the business needs of either
  – Functions in geographic proximity
  – Widespread locations in the same business unit
• Rarely did DRPs address the needs of all the data centers taken as a whole, serving the entire enterprise
It Ain’t Your Daddy’s Disaster Recovery Plan
• As the decision to consolidate data centers is made and implemented, it becomes apparent that Disaster Recovery Planning needs to be rethought
  – Applications and data spread across locations
  – Resulting concentration of key applications (e.g., ERP, CRM)
    • Or costly duplication of application instances
  – Inability to back up and restore immense amounts of data
  – Vastly increased risk from the failure of a single data center
• How many data centers is too many?
• How few data centers is too few?
Consolidation Forces DRP Change
• Disaster Recovery infrastructure and practices
• Critical enterprise-wide applications
• The number of data centers
• Backbone routing
• Dependency on the Extended Enterprise
Key Indicators of the Need for a Disaster Recovery Architecture
• In order to recover the applications supporting the business, it is first necessary to recover the infrastructure
  – The physical data center
  – The network
  – System software
  – Storage
• Only then can applications and external connections be recovered
Disaster Recovery Infrastructure and Practices
• With an alternate data center, it is necessary to port infrastructure, network, applications and data
  – Usually less capacity
Disaster Recovery Infrastructure and Practices, continued
• This structure breaks down when applications and infrastructure are spread over multiple data centers
  – What goes to the alternate site and what is retained locally?
• How do shared applications operate when one site is down?
Disaster Recovery Infrastructure and Practices, continued
Critical Enterprise-wide Applications
• In some cases, local data centers serve only local needs
  – In these cases, localized Disaster Recovery is sufficient
• Increasingly we see common applications serving broad-based geographies and user populations
  – Email (Exchange)
  – ERP (SAP, Oracle)
  – Customer Relationship Management (Siebel)
  – Procurement (Ariba)
  – Warehouse Management (often industry specialized)
  – Human Resources (if not in an ERP)
• These may cross data centers and cannot be recovered easily on a site basis
• You know you have too many data centers if…
  – Support and control staffs are duplicated at each site
  – Overall server utilization is less than 10%
  – Data is replicated from site to site
  – Total cost of ownership (TCO) of information resources is growing rapidly
    • Software licenses
    • Operating systems
    • Real estate
  – High cost to replicate security from site to site
  – General inefficiencies of scale
The Number of Data Centers
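Some of the indicators on this slide can be turned into a rough screening check. The sketch below is illustrative only: the `SiteMetrics` fields, the function names and the sample figures are hypothetical, and the unweighted average is a simplification. Only the 10% utilization threshold comes from the slide.

```python
from dataclasses import dataclass


@dataclass
class SiteMetrics:
    """Hypothetical per-site measurements used for screening."""
    name: str
    avg_server_utilization: float  # fraction between 0.0 and 1.0
    has_dedicated_support_staff: bool
    replicates_data_to_other_sites: bool


UTILIZATION_THRESHOLD = 0.10  # "less than 10%" on the slide


def overall_utilization(sites):
    """Unweighted mean utilization across sites (a simplification)."""
    return sum(s.avg_server_utilization for s in sites) / len(sites)


def too_many_data_centers_signals(sites):
    """Return the slide's warning signs that the sample data exhibits."""
    signals = []
    if overall_utilization(sites) < UTILIZATION_THRESHOLD:
        signals.append("overall server utilization below 10%")
    if sum(s.has_dedicated_support_staff for s in sites) > 1:
        signals.append("support staff duplicated across sites")
    if any(s.replicates_data_to_other_sites for s in sites):
        signals.append("data replicated from site to site")
    return signals


# Made-up sample data for three sites.
sites = [
    SiteMetrics("site-a", 0.08, True, True),
    SiteMetrics("site-b", 0.06, True, False),
    SiteMetrics("site-c", 0.12, False, False),
]
signals = too_many_data_centers_signals(sites)
```

A check like this only flags candidates for review; cost indicators such as TCO growth need financial data that is not modeled here.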
• In some cases, the intra-company backbone network runs through one site, usually the major acquirer's original data center
  – This makes the core data center a giant single point of failure
• Reducing the risk requires major network re-architecture, including attention to diversity and redundancy (i.e., Disaster Recovery)
  – Network protocols (MPLS)
  – Carrier diversity or guarantees of route diversity
  – Bandwidth on demand
Backbone Routing
• In other cases, companies have a diverse backbone network
• This reduces the dependence on one site, but it also creates a condition in which regional clusters add cost for network routing without achieving savings through economies of scale
• The risk of a disaster at a concentrator site, while not as great as with a star-shaped backbone, is still significant
Backbone Routing, continued
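The single-point-of-failure concern on the two backbone slides can be checked mechanically once the backbone is written down as a graph. The following is a minimal sketch, assuming the topology is available as an adjacency map; the site names and both sample topologies are hypothetical.

```python
from collections import deque


def is_connected(adjacency, excluded=None):
    """Breadth-first check that all sites except `excluded` can reach each other."""
    nodes = [n for n in adjacency if n != excluded]
    if not nodes:
        return True
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency[current]:
            if neighbor != excluded and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return len(seen) == len(nodes)


def single_points_of_failure(adjacency):
    """Sites whose loss would partition the remaining backbone."""
    return sorted(n for n in adjacency if not is_connected(adjacency, excluded=n))


# Hypothetical star-shaped backbone: everything routes through "core".
star = {"core": {"a", "b", "c"}, "a": {"core"}, "b": {"core"}, "c": {"core"}}

# Hypothetical diverse backbone: a ring with two paths between any two sites.
ring = {"core": {"a", "c"}, "a": {"core", "b"}, "b": {"a", "c"}, "c": {"b", "core"}}

star_spof = single_points_of_failure(star)
ring_spof = single_points_of_failure(ring)
```

The check covers connectivity only. It says nothing about whether the surviving links have enough bandwidth, which is a separate design question.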
• The problems of recovering multiple data centers are exacerbated when one or more of those data centers are owned and operated by third parties
  – Joint ventures/coopetition
  – ASPs
  – Data sources
• The more critical these third-party relationships are to the enterprise, the more difficult it is to manage recoverability on a site-by-site basis
  – Overlapping and contradictory contracts
  – Possible prohibitions on relocating services or connectivity
• Consideration must be given to a disruption caused by an outage at another company’s data center
Dependency on the Extended Enterprise
Disaster Recovery and High Availability
• Disaster Recovery focuses on the restoration of services after physical destruction of facilities, equipment and/or data
  – Policies, processes and plans
  – Accomplished remotely from the primary site
  – The timeframe for doing so is dictated by the needs of the business
• High Availability deals with ensuring a prearranged level of operational performance
  – Redundancy and sound operational methods
  – Applies to all levels of applications and infrastructure
  – Aimed at preventing downtime and ensuring data center availability (i.e., in response to a component failure)
  – Both within and among data centers
• Highly available data centers are also recoverable
A Subtle Difference…But a Real One
Availability Tiers (Uptime Institute)

                                                  | Tier 1                 | Tier 2                 | Tier 3                 | Tier 4
Active Capacity Components to Support the IT Load | N                      | N+1                    | N+1                    | N after any failure (or 2N)
Distribution Paths                                | 1                      | 1                      | 1 active + 1 alternate | 2 simultaneously active
Concurrently Maintainable                         | No                     | No                     | Yes                    | Yes
Fault Tolerance                                   | No                     | No                     | No                     | Yes
Compartmentalization                              | No                     | No                     | No                     | Yes
Continuous Cooling                                | Load Density Dependent | Load Density Dependent | Load Density Dependent | Class A
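The N, N+1 and 2N notation in the first row of the table can be made concrete with a little arithmetic. The sketch below is a simplified illustration, not part of the Uptime Institute definitions: the function names are hypothetical, and it models only component counts, not distribution paths or maintainability.

```python
def installed_components(n_required: int, scheme: str) -> int:
    """Number of capacity components installed under a redundancy scheme.

    n_required is N, the number of components needed to carry the IT load.
    """
    if scheme == "N":
        return n_required
    if scheme == "N+1":
        return n_required + 1
    if scheme == "2N":
        return 2 * n_required
    raise ValueError(f"unknown redundancy scheme: {scheme}")


def survives_failures(n_required: int, scheme: str, failed: int = 1) -> bool:
    """True if at least N components remain after `failed` components are lost."""
    return installed_components(n_required, scheme) - failed >= n_required


# Example: a load that needs 4 units (e.g., UPS modules) to run.
n = 4
results = {scheme: survives_failures(n, scheme) for scheme in ("N", "N+1", "2N")}
```

With N = 4, an N+1 design installs five units and tolerates one failure, while a 2N design installs eight and tolerates up to four.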
• Disaster Recovery is driven by the needs of the business for information systems
  – Business Impact Analysis
• High Availability is driven by the degree of risk minimization that management is prepared to pay for
  – Risk Analysis
• If some downtime is acceptable, is high availability necessary?
  – Especially if there is adequate recoverability
  – As usual, cost is a critical consideration
Different Drivers
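The question of whether high availability is worth paying for can be framed as a comparison between the annual cost of the HA investment and the downtime loss it is expected to avoid. This is a rough sketch under stated assumptions: the simple frequency × duration × hourly cost model and every figure below are hypothetical, not taken from the presentation.

```python
def expected_annual_loss(outages_per_year, hours_per_outage, cost_per_hour):
    """Expected yearly downtime cost under a simple frequency x duration model."""
    return outages_per_year * hours_per_outage * cost_per_hour


def ha_is_justified(loss_without_ha, loss_with_ha, annual_ha_cost):
    """HA pays for itself only if the avoided loss exceeds what it costs."""
    return (loss_without_ha - loss_with_ha) > annual_ha_cost


# Hypothetical figures: one outage every two years, $10,000 per hour of downtime.
loss_dr_only = expected_annual_loss(0.5, 24, 10_000)  # recover in about a day
loss_with_ha = expected_annual_loss(0.5, 1, 10_000)   # fail over in about an hour

worth_it_at_50k = ha_is_justified(loss_dr_only, loss_with_ha, 50_000)
worth_it_at_200k = ha_is_justified(loss_dr_only, loss_with_ha, 200_000)
```

In practice the inputs would come from the Business Impact Analysis and the Risk Analysis named on the slide, and factors such as reputation or legal exposure do not reduce neatly to an hourly cost.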
Disaster Recovery Architecture Methodology
Disaster Recovery Architecture Model
• The ideal
  – Recoverability and resilience are considerations in consolidation, placement and construction decisions
  – All data centers are highly available in accord with management’s perception of risk
  – A consolidated Disaster Recovery Plan is developed in parallel with the consolidation of data centers, created by
    • Business Continuity Management
    • Disaster Recovery Planning
    • Technical Engineering
    • Network Engineering
    • Operations
Assessing the Current State of Resilience
• The reality
  – Recoverability and resilience are afterthoughts in equipment acquisition, consolidation, placement and construction decisions
  – A consolidated Disaster Recovery Plan is developed after the fact of the consolidation of data centers, created by
    • Disaster Recovery Planning, with the aid of
      – Consultants
      – Auditors
      – Anyone who will listen and help out
  – Both recoverability and availability are constrained by budgets, not needs
Assessing the Current State of Resilience, continued
• The needs of the business set the parameters for recoverability, resilience and availability
• Those needs are expressed in different ways that overlap and sometimes contradict one another
Business Parameters for Disaster Recovery
[Diagram labels: Risk, RTO, RPO, IT Strategy, Corporate Strategy, Legal Requirements, Reputation, Profit/Cash Flow]
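Two of the parameters named on this slide, RTO (recovery time objective) and RPO (recovery point objective), lend themselves to a simple gap check against what the current recovery arrangements are estimated to deliver. A minimal sketch follows; the `Application` fields, the application names and all hour figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Application:
    name: str
    rto_hours: float  # maximum tolerable time to restore service
    rpo_hours: float  # maximum tolerable window of lost data


def restoration_order(apps):
    """Restore the applications with the tightest RTO first."""
    return sorted(apps, key=lambda a: a.rto_hours)


def unmet_objectives(app, est_recovery_hours, est_data_loss_hours):
    """Name the objectives that the estimated recovery would miss."""
    gaps = []
    if est_recovery_hours > app.rto_hours:
        gaps.append("RTO")
    if est_data_loss_hours > app.rpo_hours:
        gaps.append("RPO")
    return gaps


# Hypothetical applications and objectives.
apps = [
    Application("HR", rto_hours=72, rpo_hours=24),
    Application("ERP", rto_hours=4, rpo_hours=1),
    Application("Email", rto_hours=8, rpo_hours=4),
]
order = [a.name for a in restoration_order(apps)]

# Estimated recovery from nightly tape: 48 hours to restore, up to 24 hours of data lost.
erp_gaps = unmet_objectives(apps[1], est_recovery_hours=48, est_data_loss_hours=24)
hr_gaps = unmet_objectives(apps[0], est_recovery_hours=48, est_data_loss_hours=24)
```

The other parameters on the slide, such as legal requirements and reputation, usually enter as constraints on how the RTO and RPO values are set rather than as separate numbers.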
• Corporate Strategy
  – Organic growth
  – Acquisition
  – Customer service
  – Cost containment
  – New products and services
  – Shareholder value
• Disaster Recovery
  – Maximum uptime
  – Minimum data loss
  – Service restoration based on criticality
  – Flexibility
  – Balanced cost
Corporate Strategy and Disaster Recovery
While business needs drive disaster recovery requirements, it is also true that achievement of corporate strategic goals is supported by and dependent upon information systems availability
• Information Technology strategy
  – Error-free processing
  – Cost containment
  – Service orientation
  – Standardization (ITIL)
• Disaster Recovery
  – Maximum uptime
  – Minimum data loss
  – Service restoration based on criticality
  – Flexibility
  – Balanced cost
I.T. Strategy and Disaster Recovery
Similarly, the strategic goals of the Information Technology function (which may include resilience and recoverability) are also dependent on Disaster Recovery
Technology Trends Affecting Disaster Recovery
Technology Trends
IT Resilience Technology Drivers

IT Operations
• ITIL framework adoption
• ITIL process improvement
• CMM service improvement
• Data center consolidation
• Service-centricity
• Asset management
• Configuration change management
• Inventory validation with maintenance contracts
• Server provisioning and coordination
• CPU, channel, memory and OS resource management
• Virtualization
• Remote operations
• Cloud computing and recoverability
• Network-based failover

Applications
• Standardization, restacking and rightsizing
• Workload consolidation
• Storage tiering
• Hierarchical Storage Management
• Active-Active applications
• Service and data dependency mapping
• Applications as a service
• Legacy application dependencies

Facilities
• Increased server density ratios
• Green IT and sustainability requirements
• Reduced raised floor area
• Increased power and HVAC requirements
• Voice and data network convergence
• Active-active data centers
• Thin storage provisioning on the desktop
• Network-based failover
• Virtualized bare metal restore
• Virtual tape libraries

Infrastructure
• Heightened platform performance
• Infrastructure as a service
• Commodity servers
• Operating system maturation
• Network security
• Storage consolidation
• Storage as a service
• Storage virtualization
• Resilient network protocols
• Cost of bandwidth
• In the past decade, a number of trends have combined to make the data centers of previous times obsolete
  – Blade servers
  – Power cost
  – Business growth by acquisition
  – Voice and data network convergence
  – Virtualization
It Ain’t Your Daddy’s Data Center Anymore
• Blades are modular, stripped-down computer systems arrayed on a common hardware backbone for power, cooling, network interface, etc.
• Although each blade server draws less power and runs cooler than a rack-mounted PC, their dense concentration results in much higher demands for electricity and air conditioning
  – Since the power feed is to the entire blade enclosure, this has led to the use of additional UPS units
  – Beyond the need for cooling, it is important to have adequate air flow within a data center to avoid hot spots
• There is far greater need for underfloor wiring and overhead space to dissipate heat
Blade Servers
16-unit blade server with three UPS. Photo courtesy of Wikipedia
• At the same time, the cost of electricity has skyrocketed, in part due to increased demand for power and HVAC
• This, more than any other factor, has been the force behind the drive towards Green IT
Power Cost
Source: US Energy Information Administration
• Mergers and acquisitions commonly result in the acquiring company’s retention, at least for a while, of the other companies’ data centers
  – As a result, some very large organizations have data centers spread nationally and globally
  – While this diminishes the risk to the parent corporation of the failure of any one data center, the resulting costs are unsustainable
    • Bandwidth
    • Labor
• The result has been a wave of data center consolidations, made possible in part by blade servers and virtualization
Business Growth by Acquisition
• The advent of voice over IP (VoIP) has increased the demand for bandwidth passing through the data center
• Previously, telephone service was relatively immune to power failures
  – The possibility of losing both the voice and data networks simultaneously has raised the necessity for backup power in the data center
  – Similarly, the need for multiple demarcs (entry points) for the network has been transformed to a need for multiple computing sites to provide assured service
• As a result, the proportion of network terminating equipment has increased relative to servers and storage
Voice and Data Network Convergence
Virtualization
• There are many rationales for server and storage virtualization

Servers
• Capacity optimization
• Rapid server provisioning
• Server portability
• Reduced hardware expense
• Improved disaster recovery

Storage
• Increased utilization of the existing storage environment
• Improved ROI of existing data storage assets
• Reduction in downtime due to data management issues
• Improvement of backup and recovery procedures
• Improved “quality of service” offerings
• Masking of data storage management complexity
• After abortive growth in the early part of the century, colocation firms are well established
• The economies of scale of shared MEP with dedicated computing and networking space alter the build vs. rent decision
• Colocation offers the possibility of geographic diversity without the need for large-scale dispersed staffing
The Growth of Colocation
• Many data centers today are in the same space that housed their legacy mainframe systems
  – Some even built before the advent of CMOS computers
  – Raised floor predominated
  – Provisioning was a major event, planned years in advance
The Data Center of the Past
[Diagram labels: Underfloor 1½ ft.; 12 ft.]
• There is less requirement for raised floor due to miniaturization and virtualization
• Mechanical, electrical and plumbing equipment takes more space, both relatively and absolutely
• More room is needed for wiring, cooling and heat dissipation
Data Center of the Future (and Now)
[Diagram labels: Underfloor 3 ft.; 18 ft.]
• There is a difference between recovering an application and recovering it to the same service level
  – Is the recovery site configuration equivalent to the primary one?
  – Is the network connectivity to the recovery site the same as to the primary data center?
• Are there a sufficient number of data center personnel to recover all applications at each level of criticality within the required timeframes?
  – Operators cannot work continuously for days at a time
• Will virtualized applications work the same way on a different configuration?
  – What will be the effect of a different degree of compression?
  – Will there be the same mix of applications on recovery site servers as in the primary data center?
Consider Capacity and Performance
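The capacity question raised on this slide can be explored with a simple placement exercise: given a recovery site with less capacity than the primary, which applications fit when they are restored in order of criticality? The sketch below is a hypothetical illustration. The capacity units, application names and demand figures are invented, and a greedy fill by criticality rank is just one possible policy.

```python
def apps_that_fit(apps, recovery_capacity):
    """Place applications at the recovery site in order of criticality.

    apps is a list of (name, criticality_rank, demand) tuples, where a lower
    rank means more critical and demand uses the same unit as
    recovery_capacity. Returns (placed, deferred) lists of names.
    """
    remaining = recovery_capacity
    placed, deferred = [], []
    for name, _rank, demand in sorted(apps, key=lambda a: a[1]):
        if demand <= remaining:
            placed.append(name)
            remaining -= demand
        else:
            deferred.append(name)
    return placed, deferred


# Hypothetical demands in arbitrary capacity units (e.g., CPU cores).
apps = [
    ("ERP", 1, 40),
    ("Email", 2, 30),
    ("HR", 3, 20),
    ("Warehouse", 2, 25),
]
placed, deferred = apps_that_fit(apps, recovery_capacity=80)
```

Here the primary site carries 115 units of demand, so an 80-unit recovery site can host only ERP and Email. The deferred applications either wait or run at a degraded service level, which is the distinction the slide draws.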
Disaster Recovery Architecture Selections
Alternate Locations
• Dedicated DR Site
• Shared DR Sites
• Cloud Based Recovery
Servers
• Hot or Warm Servers
• Repurposed Servers
• At Time of Disaster (ATOD)
• Copying
  – Tape
  – Virtual Tape
• Replication
  – Software
  – SAN
Storage
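One way to compare the storage options above is by the worst-case window of data loss each implies, which can then be matched against an application's RPO. The sketch below rests on labeled assumptions: the copy intervals and replication lags are hypothetical, and the pairing of software replication with asynchronous operation and SAN replication with synchronous operation is an illustrative choice, not something the slide states.

```python
# Hypothetical worst-case data loss, in hours, for each protection method.
# For periodic copying this is the interval between copies; for replication
# it is the assumed lag between the primary and the replica.
PROTECTION_OPTIONS = {
    "tape (nightly copy)": 24.0,
    "virtual tape (copy every 4 hours)": 4.0,
    "software replication (assumed asynchronous)": 0.25,
    "SAN replication (assumed synchronous)": 0.0,
}


def options_meeting_rpo(rpo_hours, options=PROTECTION_OPTIONS):
    """Return the protection methods whose worst-case loss is within the RPO."""
    return [name for name, loss in options.items() if loss <= rpo_hours]


one_hour_rpo = options_meeting_rpo(1.0)
one_day_rpo = options_meeting_rpo(24.0)
```

Under these assumptions a one-hour RPO rules out both copying methods, while a one-day RPO leaves all four options open and lets cost decide.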
Networks – Inter-Data Center
• Point to Point (DR dedicated)
• MPLS (shared with users)
Networks – Local Loops
[Diagram labels: Local Central Office, DR site, Direct Connection, Metropolitan Area Network]
Networks – Carriers
[Diagram labels: Carrier Diversity (Central Office Carrier 1, Central Office Carrier 2, DR site); Route Diversity (Local Central Office, DR site)]
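Carrier diversity and route diversity are not the same thing: two carriers can still pass through a common facility. A minimal sketch of that check follows, assuming each circuit can be described as an ordered list of facilities; the facility names and paths are hypothetical.

```python
def shared_facilities(path_a, path_b):
    """Return intermediate facilities that appear on both paths.

    Each path is a list of facility names from the primary site to the DR
    site; the two endpoints are excluded from the comparison.
    """
    return sorted(set(path_a[1:-1]) & set(path_b[1:-1]))


def is_route_diverse(path_a, path_b):
    """True if the two paths have no intermediate facility in common."""
    return not shared_facilities(path_a, path_b)


# Hypothetical example: two carriers that still meet in one central office.
carrier_1 = ["primary", "co-north", "co-east", "dr-site"]
carrier_2 = ["primary", "co-south", "co-east", "dr-site"]
diverse_route = ["primary", "co-south", "co-west", "dr-site"]

overlap = shared_facilities(carrier_1, carrier_2)
diverse = is_route_diverse(carrier_1, diverse_route)
```

In this example the two carriers share `co-east`, so a failure there would take down both circuits despite the carrier diversity.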
• To achieve the objectives of a Disaster Prevention and Recovery Architecture, you need
  – Skill and experience
  – Fortitude and determination
  – Technology and business management support
  – Lawyers, guns and money
  – All the above
Next Steps
Did I mention money?
Feedback?
• An architect's most useful tools are an eraser at the drafting board, and a wrecking bar at the site. – Frank Lloyd Wright
Final Thoughts
RMI Risk Masters™ “Experience Matters”