Making your Business Unstoppable

Angela Osorio

HPS Solution Manager


Today’s business is about information availability

Evolution of Business Continuity

’80s, ’90s, ’00s
Decision: Optional, Mandatory
Recovery: Hardware; Hardware, Data; Hardware, Data, Applications
Expectation: Days/Hours; Minutes/Seconds; Minutes/Seconds
Magnified by: Disaster; Absence of “Bricks & Mortar”; Dependence on Computers
Driven by: Regulation; e-Commerce; Competition
Requirements: Restore, Recover; High Availability; 24 x 7, Scalable
Business Focus: Traditional; Dot.com; e-Business

Changing Concept of Business Continuity

Availability
Accessibility
Quality

Drivers of Data and Information Flow

[Diagram: Yesterday and Today. Labels: Company, Customer, ASP, MFG, DIST, ISP, SSP, SC, Credit]

Risks to information availability

The Failure Event Spectrum

Component
Administrative Intervention
Building Level Incident
Metropolitan Area Event
Regional Event
Global Event

Source: Gartner Group

Causes of DOWNTIME

1 Planned maintenance
2 Application failure
3 Operator error
4 Operating system failure
5 Hardware failure
6 Power outage
7 Natural disaster

Source: Contingency Planning Research, 2000

Financial cost of downtime is relative to who feels the pain

Industry        Application            Average cost per hour of downtime (US$)
Financial       Brokerage operations   $7,840,000
Financial       Credit card sales      $3,160,000
Media           Pay-per-view           $183,000
Retail          Home shopping (TV)     $137,000
Retail          Catalog sales          $109,000
Transportation  Airline reservations   $108,000
Entertainment   Tele-ticket sales      $83,000
Shipping        Package shipping       $34,000
Financial       ATM fees               $18,000
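To turn the hourly figures into a number for a specific outage, a minimal sketch follows. It assumes the loss accrues evenly over the outage, which the slide does not state; the function name `downtime_cost` is made up for illustration.

```python
# Average cost per hour of downtime (US$), taken from the table on this slide.
COST_PER_HOUR = {
    "Brokerage operations": 7_840_000,
    "Credit card sales": 3_160_000,
    "Pay-per-view": 183_000,
    "Home shopping (TV)": 137_000,
    "Catalog sales": 109_000,
    "Airline reservations": 108_000,
    "Tele-ticket sales": 83_000,
    "Package shipping": 34_000,
    "ATM fees": 18_000,
}


def downtime_cost(application: str, minutes: float) -> float:
    """Estimated loss for one outage.

    Assumption (not from the slide): cost accrues linearly with outage length.
    """
    return COST_PER_HOUR[application] * minutes / 60


# A 30-minute outage of an airline reservation system.
print(downtime_cost("Airline reservations", 30))  # 54000.0
```

The same half hour costs a brokerage far more than an ATM network, which is the point of the slide: who feels the pain decides how much protection is worth buying.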

Disasters are defined by you

Which systems are critical to your business?
– Those which are customer facing are usually more important

What happens if data becomes unavailable?
– Is it merely inconvenient or aggravating?
– Is it life or death?

One person’s inconvenience may be another’s disaster

More disastrous results

Loss of customer service satisfaction

Cost and time of rebuilding lost data

Possible fines and penalties imposed by regulatory agencies

Idle time of employees

Fines and penalties imposed for not meeting contracted delivery times or SLAs

Movement of your customers to your competitor

High Availability and Disaster Tolerance

Disaster Tolerance tends to be:

– Data-centric

– Data integrity-focused

– Geographical

– Recovery point focused

– Longer time horizon

High Availability tends to be:

– Transaction-centric

– Transaction integrity-focused

– Local

– Recovery time focused

– Very short time horizon

Protect your business… Protect your information
The stakes are high!

“Nearly half the companies that lose their data through disaster, never re-open, and 90% are out of business within two years.”

Source: University of Texas Center for Research on Information Systems

Site goes down

Shares down 30 pts.

$4B in stock value lost

Anticipated problems driving need for High Availability and Disaster Tolerance

What types of problems does/will your plan anticipate?

[Chart: three data series per problem, values in %, in extracted order; legend: Under $20M in Revenue, Over $20M in Revenue]

Problem                                                        (1)    (2)    (3)
Network failure                                                86.9   89.5   87.6
Hardware component failure                                     84.8   78.9   89.3
Natural disasters                                              84.4   77.9   90.9
Operating system fault/failure                                 77.6   77.9   76
Software viruses                                               75.5   83.2   69.4
Application failure                                            70.9   71.6   69.4
Malicious physical and computer security breaches (external)   68.4   67.4   68.6
Malicious physical and computer security breaches (internal)   59.1   56.8   59.5
Acts of man (war, terrorism, etc.)                             57.8   56.8   60.3
Service provider failure                                       56.1   60     53.7
Accidental employee-initiated outages                          55.3   47.4   61.2
Attack on company Web site                                     53.6   56.8   52.9

CIO Insight study on Disaster Recovery – November 2001

Events that actually forced companies to declare a disaster

Power Outage
Hardware Failure
Fire
Flood
Earthquake
Hurricane
Software Error
Bombing
Snow/Wind Storm
Network Failure
Contamination
Burst Pipe
Forced Evacuation
HVAC Failure
Delayed Relocation
Riot
DR Testing went wrong

Source: Disaster Recovery Journal

High Availability & Disaster Tolerance
It’s about data and keeping it available

What Is Your Specific Situation?

Questions to ask yourself
– What is your business?
– What is your application?
– What is your environment (flood zone, earthquake)?
– What risks are you willing to take?
– What’s happened in the past?
– What if your critical systems were lost?

High Availability & Disaster Tolerance
It’s about data and keeping it available

Evaluating RPO and RTO

Recovery point objective
– How fresh is your data?
– Not all data needs to be recovered to the same point

Recovery time objective
– How soon after an event do you need to be running?
– Not all applications need to come up at the same time

The quicker your required recovery time and the more thorough and accurate your recovery point, the more robust a solution is required
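One way to put RPO and RTO to work is to write them down per application and test each candidate solution against them. The sketch below is illustrative only: the class names, the applications and every number in it are assumptions, not figures from this presentation.

```python
from dataclasses import dataclass


@dataclass
class Objective:
    application: str
    rpo_minutes: float  # oldest acceptable state of the recovered data
    rto_minutes: float  # longest acceptable time until the application runs again


@dataclass
class Solution:
    name: str
    worst_case_data_loss_minutes: float
    worst_case_restart_minutes: float


def meets(objective: Objective, solution: Solution) -> bool:
    """True when the solution satisfies both the recovery point and the recovery time."""
    return (
        solution.worst_case_data_loss_minutes <= objective.rpo_minutes
        and solution.worst_case_restart_minutes <= objective.rto_minutes
    )


# Hypothetical objectives: payroll can wait, order entry cannot.
payroll = Objective("Payroll", rpo_minutes=24 * 60, rto_minutes=72 * 60)
order_entry = Objective("Order entry", rpo_minutes=5, rto_minutes=15)

# Hypothetical solution: nightly tape backup stored off site.
nightly_tape = Solution(
    "Nightly tape backup",
    worst_case_data_loss_minutes=24 * 60,
    worst_case_restart_minutes=48 * 60,
)

print(meets(payroll, nightly_tape))      # True
print(meets(order_entry, nightly_tape))  # False
```

Keeping one objective per application reflects the two points on the slide: not all data needs the same recovery point, and not all applications need to come up at the same time.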

Rules Of Thumb

[Diagram: environments on a scale from More Forgiving to Less Forgiving, matched to a Disaster Tolerance Methodology]

Disaster Tolerance Methodology
– Backup and drive tape across town
– Campus-Wide Clusters

Environment
– Emergency 911
– Telecommunications
– Defense
– Financial Transactions
– Healthcare
– eCommerce
– Accounting
– Data Warehousing
– Payroll
– Discrete Mfg
– Tech Pubs

High Availability & Disaster Tolerant responses are a balance of three aspects

– Technology
– Services
– Procedures and discipline

Find the balance of three aspects

Techniques to eliminate system downtime

– Data protection
– Remote log shipping
– Data Replication Manager
– Campus-Wide Clusters
– Reliable Transaction Router

Technology, Services
– Insurance
– Assets Recovery
– Cold-site, Mobile recovery
– Stand-Alone systems
– Business Protection Service
– Distributed & Networked systems
– Disaster recovery hot-site
– Redundancy, Hot Swap components, RAID
– Availability clusters
– Data mirroring, SMART
– Dual host/redundancy
– Shared Data clusters
– FDDI, ATM switching

Procedures & Discipline
– Plan
– Question
– Exercise
– Document procedures
– Eliminate single points of failure
– Rolling Upgrades
– Provide shared, direct access to storage
– Minimize environmental risks
– Practice!

[Diagram labels: Services, Custom Systems, Remote Log Shipping, Data Protection, Data Replication Manager, Reliable Transaction Router, Campus Wide Clusters]

Nominal Justifiable Cost of Plan

Does cost of recovery exceed the losses?

[Graph: axes Money and Time to recover; curves COST and LOSS; markers Maximum cost of plan and Acceptable downtime]

Evaluate Alternatives

Does your plan make financial sense?

[Graph: Plan I, Plan II, Plan III, Plan IV; labels Cost and Loss reduction (savings); markers Maximum cost of plan and Acceptable downtime]
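The two questions on these slides can be asked of each plan in turn. The sketch below does that under loud assumptions: losses grow linearly with downtime, a single outage is considered, and every plan name, cost and recovery time is invented for illustration.

```python
from typing import NamedTuple


class Plan(NamedTuple):
    name: str
    cost: float              # what the plan costs (hypothetical, US$)
    hours_to_recover: float  # downtime expected with this plan in place


def evaluate(plans, loss_per_hour, hours_without_plan, acceptable_downtime_hours):
    """Return (plan name, makes sense?) pairs for one outage.

    A plan makes sense here when it recovers within the acceptable downtime
    and costs no more than the loss it avoids. Assumption: loss is linear in time.
    """
    verdicts = []
    for plan in plans:
        loss_reduction = loss_per_hour * (hours_without_plan - plan.hours_to_recover)
        fast_enough = plan.hours_to_recover <= acceptable_downtime_hours
        worth_it = plan.cost <= loss_reduction
        verdicts.append((plan.name, fast_enough and worth_it))
    return verdicts


# All figures below are made up for the example.
plans = [
    Plan("Plan I", cost=50_000, hours_to_recover=48),
    Plan("Plan II", cost=400_000, hours_to_recover=24),
    Plan("Plan III", cost=1_500_000, hours_to_recover=4),
    Plan("Plan IV", cost=8_000_000, hours_to_recover=0.1),
]

for name, ok in evaluate(plans, loss_per_hour=100_000,
                         hours_without_plan=72, acceptable_downtime_hours=8):
    print(name, "makes financial sense" if ok else "does not fit")
```

With these invented numbers, Plans I and II are too slow, Plan IV costs more than it saves, and only Plan III falls inside both limits.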

E-business… putting all of your “eggs-in-a-basket”

[Graph: Risk Level against Dependency on Technology]

Tools to Make Your Business Unstoppable


Preventing a Disaster

You Need:
– copy of applications
– copy of application data
    current: no, or predictable degree of, data loss
    consistent: write ordering across related replicas
– systems to restart and run applications
– reestablished client communications

Spectrum of recovery techniques
– trade off cost, recovery time, data currency
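To make "current" and "consistent" concrete, here is a toy model of a remote copy. It is a sketch under assumptions, not a description of how any product named in this presentation works: each write is assumed to carry a sequence number from the source, and the class and method names are invented.

```python
class OrderedReplica:
    """Remote copy that applies writes strictly in the order the source issued them."""

    def __init__(self):
        self.applied = []   # writes applied so far, in source order
        self.pending = {}   # writes that arrived early, keyed by sequence number
        self.next_seq = 1   # sequence number of the next write that may be applied

    def receive(self, seq, write):
        """Accept a write; apply it only once every earlier write has been applied."""
        self.pending[seq] = write
        while self.next_seq in self.pending:
            self.applied.append(self.pending.pop(self.next_seq))
            self.next_seq += 1

    def writes_behind(self, last_source_seq):
        """Number of source writes that would be lost if the source failed right now."""
        return last_source_seq - len(self.applied)


replica = OrderedReplica()
replica.receive(2, "debit account A")  # arrives early, so it is held back
replica.receive(1, "open account A")   # unblocks both writes, in order
print(replica.applied)                 # ['open account A', 'debit account A']
print(replica.writes_behind(last_source_seq=3))  # 1
```

Holding back out-of-order writes keeps the copy consistent, and the count of writes not yet applied is the "predictable degree of data loss" that the copy trades against cost and recovery time.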

AVAILABILITY… open all night long

Making online healthy and beautiful

“High availability is as important to eCommerce as breathing is to humans. Our Compaq servers stay highly available to customers, giving us an advantage for eCommerce.”

Kal Raman
Chief Information Officer
Drugstore.com, Inc.

SECURITY… solving a devilish problem

“At the Vatican... security was our first criterion in choosing a partner; our second critical factor was availability; another was high performance.”

Stefano Pasquini
IT Planner
Internet Office of the Holy See

God knows what else you need…
Professional Services

Business Continuity Methodologies

[Diagram: applications matched to technologies on a scale from Asynchronous to Synchronous]

Technology
– Simple Backup & Remote Storage Site
– Campus-Wide Clusters
– Remote Log Shipping
– SANworks Data Replication Manager
– Data Protection Technologies
– Reliable Transaction Router

Application
– Emergency 911
– Telecommunications
– Defense
– Financial Transactions
– Healthcare
– eCommerce
– Accounting
– Data Warehousing
– Payroll
– Discrete Mfg
– Tech Pubs