Customer Engagement Workshop: IT Service Continuity
Phoenix, Aston 6th May 2015
Paul Gant, Head of BCM Assurance
David Davies, BCM Assurance Consultant
Agenda
• 11:00 Registration, refreshments and networking.
• 11:30 Why get fit, anyway?
• 11:50 Fictitious live incident.
• 12:10 Post incident review.
• 12:30 Steps to success.
• 12:50 Questions & answers.
• 13:00 Lunch, tours, event close.
• 13:30 BCM Assurance 1-2-1 sessions by appointment.
Why get fit, anyway?
Introducing BCM Assurance – your personal trainers
What if?
Real Recovery (Invocations) is like a Battle
YOUR ENEMIES
• (Lack of) time.
• You can’t recover what you haven’t backed up.
• You can’t upgrade recovery technology during an invocation.
YOUR FRIENDS
• Phoenix.
• Your preparation.
What does “Preparation” involve?
It’s not just about the technology!
But aren’t policies, analysis, plans and reports only there to satisfy the auditor?
Is there any rhyme or reason to them?
Priorities Dependencies Plans Testing Maintenance
IT Service Continuity Management
Focuses on 5 things…
1. What’s needed first?
Sir, is it women and children first…
… or Active Directory and Exchange?
Priorities Dependencies Plans Testing Maintenance
2. What rests on what?
Priorities Dependencies Plans Testing Maintenance
3. Make a plan
Priorities Dependencies Plans Testing Maintenance
4. See if it works
Priorities Dependencies Plans Testing Maintenance
5. Keep it up-to-date
Priorities Dependencies Plans Testing Maintenance
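As a rough illustration of "what rests on what", a dependency map can be turned into a recovery order in which nothing is restored before the services it depends on. This is a minimal sketch, not a method from the workshop: only Active Directory and Exchange are named in the slides, and the other services and the `DEPENDS_ON` map are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency map: each service lists the services it rests on.
# Only Active Directory and Exchange appear in the slides; the rest are illustrative.
DEPENDS_ON = {
    "Network": set(),
    "Active Directory": {"Network"},
    "Exchange": {"Active Directory"},
    "File store": {"Active Directory"},
}


def recovery_order(depends_on):
    """Return services ordered so that every dependency is recovered first."""
    return list(TopologicalSorter(depends_on).static_order())


order = recovery_order(DEPENDS_ON)
print(order)
```

If the map contains a circular dependency, `static_order()` raises `graphlib.CycleError`, which is itself a useful finding before an invocation rather than during one.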
What goes wrong? Issues reported in the media
DATACOM co-location datacentre flood, Melbourne Australia, March 2010
• Heavy rain broke a ceiling panel and poured water into the data centre.
• Water damaged SANs, servers and routers.
• All equipment impacted by a 12-hour power outage.
Camera Corner / Connecting Point datacentre fire, Green Bay, Wisconsin, USA, 19th March 2008
• Fire alarms but no fire suppression.
• 75 hosted servers destroyed.
• “10 day outage” reported, with 98% of services resumed by 1st April.
Phoenix Standby Reasons
Phoenix Invocation Reasons
The recurring dangers that we see
• IT recovery requirements haven’t been agreed with the business (through a BIA).
• IT recovery strategy isn’t joined up (i.e. a full end-to-end solution isn’t there).
• Strategy isn’t supported by plans and isn’t tested rigorously enough (resulting in inefficiencies and failures during actual recovery).
Fictitious Live Incident
(Why have a personal trainer to help you?)
Scenario site layout (diagram)
• Main gate, side gate (footpath), visitor car parking, staff car parking, gardens.
• Offices and server room: 2nd (top) floor.
• Warehouse and second server room (ground floor): backup SAN and tapes.
• Network links shown: 100 Mbps, 100 Mbps, 1 Gbps.
Critical systems
• Recovery Time Objective: 24 hours.
• Recovery Point Objective: 24 hours (disk to disk daily).
Non-critical systems
• Recovery Time Objective: 5 days.
• Recovery Point Objective: 1 day (local tape) and 7 days (offsite tape).
Incident timeline
• 08:07 Fire.
• 12:15 Servers onsite.
• 12:45 Exec report.
• 13:15 Start recovery.
• 09:30 Server recovered?
• 11:45 Recovery stalled.
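The scenario timeline can be checked against the 24-hour Recovery Time Objective for critical systems. The sketch below assumes, since the slides do not say so explicitly, that the 09:30 and 11:45 events fall on the day after the fire; the calendar date is arbitrary and the helper names are hypothetical.

```python
from datetime import datetime, timedelta

RTO_CRITICAL = timedelta(hours=24)  # critical-systems RTO from the scenario

# Assumption: arbitrary date; 09:30 and 11:45 are taken to be the day after the fire.
DAY1 = datetime(2015, 1, 5)
DAY2 = DAY1 + timedelta(days=1)


def at(day, hhmm):
    """Combine a day with an 'HH:MM' clock time."""
    hours, minutes = map(int, hhmm.split(":"))
    return day + timedelta(hours=hours, minutes=minutes)


TIMELINE = [
    ("Fire", at(DAY1, "08:07")),
    ("Servers onsite", at(DAY1, "12:15")),
    ("Exec report", at(DAY1, "12:45")),
    ("Start recovery", at(DAY1, "13:15")),
    ("Server recovered?", at(DAY2, "09:30")),
    ("Recovery stalled", at(DAY2, "11:45")),
]


def elapsed_since_incident(timeline):
    """Return (label, time elapsed since the first event) for each event."""
    start = timeline[0][1]
    return [(label, when - start) for label, when in timeline]


for label, elapsed in elapsed_since_incident(TIMELINE):
    status = "over RTO" if elapsed > RTO_CRITICAL else "within RTO"
    print(f"{label:<18} +{elapsed}  {status}")
```

Under that assumption, just over five hours pass before recovery even starts, and both next-day events fall after the 24-hour objective has already expired.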
Post Incident Review
(What are the consequences of being unfit?)
Post Incident Review
• What went well? (Where were they fit?)
• What went badly? (Where were they unfit?)
• What could the IT manager have done differently during the recovery?
• What could the IT manager have done differently before the recovery?
IT Service Continuity Issues
Have you experienced any of the issues raised?
• Difficulty in getting board engagement.
• No business requirements for IT recovery (i.e. no BIA).
• Single points of failure in key skill sets.
• Lack of recovery documentation (perhaps no spare time to write it?)
• Lack of formal testing and test reporting.
• Any other issues?
The Barriers and Results
• What’s stopping you / stopped you from making changes?
• What would happen if changes aren’t made and you invoke?
• What would happen if you do make the changes?
Steps to Success
(How to become IT service continuity fit.)
What if?
Steps to success
The Steps to Successful IT Service Continuity
1. Engagement and sponsorship at a strategic level.
2. Balance between the technology and ITSC management.
3. Do all of ITSC, and run it as a repeating programme.
1. Strategy: Talk the Language of the Business
I need to upgrade the NAS by 5 terabytes and research getting an enhanced burstable pipe.
Err… good for you.
1. Strategy: Talk the Language of the Business
I’m concerned that our IT recovery could be
inadequate until business requirements are confirmed
in a BIA.
At present, our business may struggle to recover
from an IT outage.
What? We need to do something about this.
1. Strategy: Engage with the Executive Team
Does the Executive Team know:
• What are the impacts if IT fails?
• What are the risks associated with IT failure?
• What are the RTO and RPO of services, and what do these terms mean?
• What is the recovery and hand-back process?
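One way to make the RPO concrete for a non-technical audience is to express it as a data-loss window: the time between the last usable backup and the incident. This is a minimal sketch with hypothetical timestamps and function names; only the 24-hour RPO figure and the 08:07 incident time echo the workshop scenario.

```python
from datetime import datetime, timedelta


def data_loss_window(last_good_backup, incident):
    """Time between the last usable backup and the incident (work that would be lost)."""
    return incident - last_good_backup


def meets_rpo(last_good_backup, incident, rpo):
    """True if the data-loss window is no larger than the Recovery Point Objective."""
    return data_loss_window(last_good_backup, incident) <= rpo


# Hypothetical: a nightly backup completing at 01:00, with an incident at 08:07.
backup = datetime(2015, 1, 5, 1, 0)
incident = datetime(2015, 1, 5, 8, 7)

print(data_loss_window(backup, incident))
print(meets_rpo(backup, incident, rpo=timedelta(hours=24)))
```

The same check fails as soon as a backup is missed or unusable, which is why the RPO depends on backups being verified and not just scheduled.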
2. Balance Technology with ITSC Management
Priorities Dependencies Plans Testing Maintenance
3. Do all of the Programme Steps, and Repeat
(Diagram: Business Impact Analysis, IT Service Continuity Plan and IT Recovery Testing, repeated over time. Other labels: Time, Trigger, Peak BC Readiness.)
Priorities Dependencies Plans Testing Maintenance
What if?
The traps
Trap 1: The Scope Trap
I’ve tested Email and Filestore time and again.
I have complete confidence in their recovery.
Great, what about the other 48 IT services?
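The scope trap can be put as a single number: the share of IT services whose recovery has actually been tested. A minimal sketch, assuming a total of 50 services (the two tested plus "the other 48"); the function name is hypothetical.

```python
def recovery_test_coverage(tested_services, total_services):
    """Fraction of IT services with a tested recovery."""
    return len(set(tested_services)) / total_services


tested = ["Email", "Filestore"]
total = 50  # 2 tested services plus "the other 48"

print(f"{recovery_test_coverage(tested, total):.0%}")  # prints "4%"
```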
Trap 2: The Audit Trap
Quick, we need to dust off the plans to satisfy the auditor.
Then we can forget about ITSC again.
He’ll never know… ha ha!
Trap 3: The Importance and Urgency Trap
We’ve got ten projects going live this quarter.
There’s no time to fully implement and test IT DR, as it will affect “go live” dates.
Well I suppose we can sort it out later.
We don’t want to get in the way of business strategy.
Trap 4: The Gambler’s (or Optimist’s) Trap
It’ll never happen…
If it does, we’ll be all right provided it happens on a Monday and I’ve remembered to take the backup tapes home with me.
Good odds eh?
I’m not bothered, I plan to win the lottery and retire this week.
Trap 5: The Hero Trap
We’ll all pull together and work extra hours to nail it.
Sleep’s for wimps.
Yeah, it’s nothing that a load of pizza and energy drinks can’t solve!
Any Questions?
Thank you for participating.
Lunch is now ready.
Would you like a tour or a meeting?
Thank You