How to Reduce Disaster Recovery Expenses


clear concise consulting

Copyright © 2014 Langton Blue Ltd

HOW TO REDUCE DISASTER RECOVERY EXPENSES

Best Practices for Virtual Environments

Chris M Evans

Langton Blue Ltd

ABOUT OUR SPEAKERS

• Chris M Evans
  • 27 years' experience in the IT industry across all platforms, including IBM mainframe, Windows & Open Systems.
  • Co-founder and independent consultant at Langton Blue Ltd, a specialist consulting company in the UK.
  • Blogger and part-time analyst at Architecting.IT
  • Twitter: @chrismevans, @architectingIT, @langtonblue
  • Web: www.architecting.it, www.langtonblue.com

WHAT YOU WILL LEARN

• The business need for BC/DR
• Why BC/DR is different from hardware resiliency
• Strategies for implementing BC/DR based on application and business Service Level Objectives
• Choosing between storage array, hypervisor and application-based recovery solutions
• Choosing between VM-level and LUN-level recovery solutions
• Technical solutions for BC/DR with virtual servers, including Microsoft Hyper-V, VMware vSphere and open source platforms
• Vendor roundup – 3rd party recovery solutions for virtual environments

THE NEED FOR BC/DR

• BC – Business Continuity; DR – Disaster Recovery
• Data and IT systems are an increasingly important (and in many cases critical) part of many organisations’ business processes
  • Customer facing websites
  • Purchasing systems
  • ERP (manufacturing, marketing, sales, payments)
  • Email, VDI
• Businesses can afford little or no downtime or service outages
• Recovery is as much about process/people as it is computer systems

THE NEED FOR BC/DR - STATISTICS

• 30% of all businesses that have a major fire go out of business within a year. 70% fail within five years. (Home Office Computing Magazine)

• 31% of PC users have lost all of their files due to events beyond their control.

• 34% of companies fail to test their tape backups, and of those that do, 77% have found tape back-up failures.

• 60% of companies that lose their data will shut down within 6 months of the disaster.

• Every week 140,000 hard drives crash in the United States. (Mozy Online Backup)

• Companies that aren't able to resume operations within ten days of a disaster are not likely to survive. (Strategic Research Institute)

THE NEED FOR BC/DR - EXAMPLE

• Loss or damage to computer systems
  • Fire, power failure, flood, earthquake
• Inability to access facilities
  • Fire, flood or hazard (chemical, radiation, gases)
• Criminal or Malicious Damage
  • Disgruntled employees, hackers
• System or Application Failure
  • Software bug, failed upgrades, data corruption

HARDWARE RESILIENCY IS NOT BC/DR

• Hardware resiliency protects against simple, localised hardware failure
  • Redundant power supplies, multi-pathed storage connections, redundant network connections, RAID storage
• BC/DR provides systems and processes to continue business operations in the event of major disasters
• Developing a BC/DR strategy means creating processes for reinstating systems after the loss of equipment, facilities and staff

RECOVERY REQUIREMENTS – SERVICE LEVELS

Some more definitions:
• RPO – Recovery Point Objective – the point in time (in the past) that systems should be recovered to. Could be 24/48 hours, or 0 for critical banking systems
• RTO – Recovery Time Objective – the time taken to reinstate systems to the recovery point. Could be as low as 0, but typically minutes or hours
• SLO – Service Level Objective – a target measure of the service to be delivered (e.g. 90% of systems restored within 4 hours)
• SLA – Service Level Agreement – a legal agreement, usually with penalties attached to an SLO (e.g. service credits or a £10,000 fine for not restoring within the SLO)
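
To make these definitions concrete, the following Python sketch measures the RPO and RTO actually achieved in an incident and checks an SLO of the "90% of systems restored within 4 hours" kind. The function names, dates and restore times are illustrative assumptions, not taken from any particular product.

```python
from datetime import datetime, timedelta


def achieved_rpo(last_recovery_point: datetime, failure: datetime) -> timedelta:
    """Data-loss window: how old the newest usable copy was at the time of failure."""
    return failure - last_recovery_point


def achieved_rto(failure: datetime, restored: datetime) -> timedelta:
    """Outage length: time from the failure until service is back."""
    return restored - failure


def slo_met(restore_times: list[timedelta], target: timedelta, fraction: float) -> bool:
    """True if at least `fraction` of systems were restored within `target`."""
    within = sum(1 for t in restore_times if t <= target)
    return within / len(restore_times) >= fraction


# Hypothetical incident: nightly backup at 02:00, failure at 12:00, service back at 15:30.
failure = datetime(2014, 6, 1, 12, 0)
print(achieved_rpo(datetime(2014, 6, 1, 2, 0), failure))    # 10:00:00
print(achieved_rto(failure, datetime(2014, 6, 1, 15, 30)))  # 3:30:00

# Hypothetical SLO check: nine systems restored in 2 hours, one in 6 hours.
times = [timedelta(hours=2)] * 9 + [timedelta(hours=6)]
print(slo_met(times, target=timedelta(hours=4), fraction=0.9))  # True
```

Comparing the achieved figures against the objectives for each application shows whether the chosen recovery method is adequate or over-specified.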

BC/DR STRATEGIES

• System recovery costs vary by process
  • Typically, the closer to RTO=0/RPO=0, the more expensive the solution
• Technical Options – DR site
  • Array-based replication
  • Hypervisor-based replication
  • Application-based replication
• Technical Options – no DR site
  • “Traditional” disk/tape backups

RECOVERY STRATEGIES

[Chart: recovery strategies plotted against RTO and RPO axes. Labels: Sync Replication, Async Replication, Async with snapshots, Log Shipping, Backup to Disk, Backup to Tape; one region is marked “Non-strategic/scalable solutions”.]

ARRAY-BASED REPLICATION

✓ Supports synchronous and asynchronous modes (RPO=0)
✓ Scalable – entire array or LUN-based replication
✓ Fast (low RTO)
✓ Agentless deployment

X Expensive licences (usually per TB of capacity)
X Requires duplicate hardware from same vendor
X Not application or hypervisor aware
X Low granularity (LUN/volume based)
X Complex/impossible to support cloud DR

HYPERVISOR-BASED REPLICATION

✓ Supports asynchronous modes
✓ Scalable to many virtual machines
✓ Virtual machine & application aware
✓ Good granularity (the VM), only changed data
✓ Can support cloud models, but not easy

X Licensing is required
X Can require deployment of dedicated VMs to manage replication data

APPLICATION-BASED REPLICATION

✓ Supports asynchronous modes
✓ VM aware (works within the VM)
✓ Application aware (by definition)
✓ Works well with cloud deployments

X Not scalable – requires configuration/deployment for each application
X Licensing is required
X Potentially complex with many sub-applications & databases

LUN OR HYPERVISOR REPLICATION?

• Array-based replication moves data at the LUN/volume or file share level
• The entire LUN has to be “failed over” to the remote site/equipment, so all VMs on the LUN must move with it
• Hypervisor replication provides a more granular approach to replication, unless “one LUN per VM” is used (not scalable)
• LUN-level replication can be efficient if the primary array supports features like deduplication (so that duplicate data is not sent over the WAN)
• Synchronous replication has a direct impact on application latency, depending on how far apart the sites are located
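
As a rough illustration of the distance effect in the last bullet: light travels through optical fibre at roughly 200,000 km/s, so every 100 km between sites adds about 1 ms of round-trip delay to each synchronously replicated write. The Python sketch below computes that lower bound. The function name is hypothetical, the single round trip per write is an assumption, and real links add switching and protocol overhead on top.

```python
# Approximate speed of light in optical fibre: ~200,000 km/s, i.e. 200 km per millisecond.
FIBRE_KM_PER_MS = 200.0


def sync_write_penalty_ms(distance_km: float, round_trips: int = 1) -> float:
    """Minimum latency added to each write by synchronous replication.

    A synchronous write is only acknowledged once the remote site confirms it,
    so it costs at least one round trip (twice the one-way distance).
    """
    return round_trips * 2 * distance_km / FIBRE_KM_PER_MS


for km in (10, 100, 1000):
    print(f"{km} km: at least {sync_write_penalty_ms(km)} ms per write")
```

This is why synchronous replication is normally limited to sites that are relatively close together, with asynchronous modes used over longer distances.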

BEST PRACTICES – SAVING COSTS

• Build a “tiered” DR plan implementing multiple backup/recovery methods with different RTO/RPO

• Assign backup/recovery method based on application requirements/needs

• Understand the requirements for infrastructure recovery (e.g. AD/LDAP, DNS) before backup recovery

• Prioritise application recovery, including dependencies

• Automate – where possible use automation technologies to handle the recovery process
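
One way to sketch the "prioritise application recovery, including dependencies" and "automate" points is to record which services each application depends on and derive the recovery order from that. The Python example below uses the standard-library `graphlib` module (Python 3.9+) to group services into recovery waves. The service names and their dependencies are hypothetical examples, not a recommended layout.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each service lists what must be running before it.
DEPENDENCIES = {
    "DNS": set(),
    "AD/LDAP": {"DNS"},
    "Database": {"AD/LDAP"},
    "Email": {"AD/LDAP"},
    "ERP": {"Database", "AD/LDAP"},
}


def recovery_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group services into waves; everything in a wave can be recovered in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if dependencies are circular
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # services whose dependencies are all restored
        waves.append(ready)
        sorter.done(*ready)
    return waves


for number, wave in enumerate(recovery_waves(DEPENDENCIES), start=1):
    print(f"Wave {number}: {', '.join(wave)}")
```

Infrastructure services such as DNS and AD/LDAP fall naturally into the first waves, which matches the advice to understand infrastructure recovery before anything else.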

VENDOR ROUNDUP - HYPERVISORS

• VMware
  • vSphere Site Recovery Manager – automated management and recovery of virtual machines using either array-based or hypervisor-based replication
  • vSphere Replication – hypervisor-based replication of virtual machines
• Microsoft
  • Hyper-V Replica – virtual machine replication, managed either via the GUI or PowerShell, System Center for automation
  • Storage Replica (due in next Windows release)

VENDOR ROUNDUP – 3RD PARTY

• VM Backup
  • Zerto – BC/DR for Enterprises
  • Veeam – Backup & Replication v7
• Data Replication
  • StarWind – Virtual SAN, Asynchronous Replication
  • Vision Solutions – Double-Take Availability 7.0

CHOOSING THE RIGHT PRODUCT

• RPO = 0, RTO = 0
  • VMware Fault Tolerance
• RPO = 0, RTO ≈ 0
  • Replication, e.g. StarWind, array-based replication, vSphere Replication, Hyper-V Replica
• RPO > 0, RTO ≈ 0
  • Snapshots, e.g. Veeam Backup & Replication
• RPO > 0, RTO > 0
  • Backup solutions, e.g. traditional platforms, NetBackup, Backup Exec, TSM etc.
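
The four cases above can be expressed as a small lookup, sketched here in Python. The slide does not define how small "≈ 0" is, so the `near_zero` threshold (15 minutes by default) is purely an assumption, as are the function name and the handling of combinations the slide does not list.

```python
from datetime import timedelta

ZERO = timedelta(0)


def recovery_category(rpo: timedelta, rto: timedelta,
                      near_zero: timedelta = timedelta(minutes=15)) -> str:
    """Map an application's RPO/RTO targets to the solution category on the slide."""
    if rpo == ZERO and rto == ZERO:
        return "fault tolerance"
    if rpo == ZERO:
        # The slide pairs RPO = 0 with RTO ≈ 0; a longer RTO is treated the
        # same way here, which is an assumption.
        return "replication"
    if rto <= near_zero:
        return "snapshots"
    return "backup"


print(recovery_category(ZERO, ZERO))
print(recovery_category(ZERO, timedelta(minutes=5)))
print(recovery_category(timedelta(hours=1), timedelta(minutes=10)))
print(recovery_category(timedelta(hours=24), timedelta(hours=8)))
```

Running each application's targets through a mapping like this is one way to build the tiered DR plan, with the more expensive categories reserved for the applications that need them.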