
Copyright © 2015 CSG Systems International, Inc. and/or its affiliates (“CSG International”). All rights reserved.

Strengthening Business Continuity Through Strategic Partnership

Cynthia L. Jenkins,

Lead BCM Analyst, CSG International


Who is CSG International?

First processed Cable TV statements in 1983 as part of FDR

Became independently owned in 1994

First publicly traded on NASDAQ in 1996 (CSGS)

Now the 2nd largest cable billing vendor in the world

• Producing 65M pieces of mail each month

• One of the top 10 US mailers 

• Over 3,600 employees in 24 countries

• Corporate Headquarters in Denver, CO  

• Largest office in Omaha, NE


What Do CSG Products Do?

Manage Cable, Internet, and Telephone services 

Support 90,000 Customer Service Representatives world‐wide

Compose, print, and mail statements monthly

Support on‐line bill pay

Support scheduling and routing of work orders 

Mediate 8 Trillion Call Detail records annually

Enable detailed data analysis and data mining

[Figure: customer interaction channels. Labels: Smart Phone, IVR/SMS, Call Center, Technician, Kiosk, Direct Mail & Statement, Web / E-Mail]


Some of CSG’s 500+ Clients

[Figure: client logos grouped by region. Region labels: APAC, EMEA, Americas]


Corporate BC/DR Strategy

One facility serves one purpose

• Data Centers

• Output Solution Centers

• Office Facilities

Each facility has a unique BC/DR plan

BC/DR plans are written/maintained by CSG resources familiar with the facility, hardware, product, or service

CSG BCM Department

• Organize large annual exercises

• Advise on and assist in determining BC/DR requirements for new products and services

• Track BC plan updates

• Provide liaisons for all the BC/DR partners/vendors


Background for Presentation

CSG North American Service Bureau

• Based in the Omaha Data Center

• Contractual commitments to clients to return to operation within three distinct Recovery Time Objectives (RTO)

• The Minimum Acceptable Recovery Configuration (MARC) level means systems are up and ready for client use at the end of the RTO

• CSG solutions are classified by MARC level:

- MARC I RTO – 48 hours

- MARC II RTO – 3 to 7 days

- MARC III RTO – 8 to 31 days

• The recovery time starts when the disaster is declared by the CSG Emergency Management Team
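
As a rough illustration of how the MARC levels translate into deadlines, the sketch below computes the latest acceptable recovery time from the moment the disaster is declared. The names `MARC_RTO` and `recovery_deadline` are hypothetical, and treating the upper bound of each RTO range (7 and 31 days) as the deadline is an assumption, not something the slides state.

```python
from datetime import datetime, timedelta

# Upper bound of each RTO range from the slide. Using the upper bound as the
# deadline is an assumption made for illustration only.
MARC_RTO = {
    "MARC I": timedelta(hours=48),
    "MARC II": timedelta(days=7),
    "MARC III": timedelta(days=31),
}


def recovery_deadline(declared_at: datetime, marc_level: str) -> datetime:
    """Return the latest time a solution at this MARC level must be ready for client use."""
    return declared_at + MARC_RTO[marc_level]


declared = datetime(2015, 3, 2, 8, 0)  # hypothetical declaration time
for level in MARC_RTO:
    print(level, recovery_deadline(declared, level).isoformat(sep=" "))
```

The clock starts at the declaration time rather than at the time of the incident, which matches the last bullet above.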


CSG – Sungard AS Partnership

Launched in 2001 - Open Systems recovery support only

Mainframe support added in 2010

For the first 10 years, CSG was responsible for Open Systems recoveries

Joined Sungard Managed Recovery Program (MRP) in 2012

CSG now views Sungard engineers as extensions of CSG resources at time of disaster


Reasons for Adopting MRP

CSG Dual Recovery Strategy

• North America Service Bureau Recovery – Sungard

• Internal CSG Products and Corporate Support – Tempe, AZ

Growing number of OS images to recover at Sungard

• 6 different operating systems to recover

• Over 400 OS instances in all

Augment CSG Staff

• CSG Staff responsible for both locations

• Additional trained staff needed to make RTOs


CSG Production Environment

Tightly Integrated Product Set

• Most products dependent on Middleware and Mainframe to function

• Product dependencies change over time

Highly Configurable Environment – experiencing frequent change

• Experiences an average of 30 – 40 changes per day

Large data recovery

• Open Systems – 130 TB

• Mainframe – 54 TB


Production Environment

[Figure: production environment diagram. Component labels: Mainframe, Billing Engine, Data Warehouse, Video Interfaces, APIs, Job Routing and Scheduling, EBPP Solution, APIs to External Transactions, Advanced Product Catalog, External Video Transactions, Interfaces to other CSG products, Statement Files For Composition, Interface to Mail Tracking, Statement Images, Usage Processing, External Usage Files, Provisionable Services]


In Addition….

CSG invested in permanent infrastructure in the Sungard data center, which includes:

• NIM servers

• Kickstart servers

• Jumpstart servers

• NAS Storage 

• SAN Storage

• VMware hosts and vCenter host

• Network support equipment

• Symantec replication appliances (Open Systems replicated data)

• EMC DLm (Mainframe replicated data)

• Communication circuits between Sungard and Omaha Data Center


Getting Started ‐ Knowledge Transfer

Initial MRP Workshop (March, 2012)

• Tightly integrated products

• Standard build process for all OS’s

• Permanent infrastructure at Sungard

Spent over 2 days in discussions

• At the end, Sungard Recovery Solution Architects had everything to produce the first draft of recovery documentation


First Exercise – August, 2012

Huge Learning Experience for Both Sides

• CSG

- Recovery instructions needed to be more explicit

- Needed to “let go” so Sungard engineers could work

- Better collaboration tools needed

- Change management policy needed during the exercise

• Sungard

- More recovery document review needed

- Another Sun OS workshop needed

- Different exercise management structure was needed

- Agreement on better collaboration tools and change management

• Put action plans in place to adjust and correct each situation

• Tracked progress during weekly status meetings


Second Exercise – January, 2013

180 Degree Change!

Augmented staffing showed better results.

Sungard made changes in exercise management so engineers built servers and did not manage shifts.

CSG and Sungard jointly developed a Google Docs spreadsheet to convey build progress 

Change management policy worked very well

The second Sun workshop paid great benefits

• Held in CSG Denver office with Sun and Windows/VMware SMEs

• End result was more detailed recovery documentation

• More emphasis on build verification checklist

- Last step Sungard performs

- First step CSG performs


Collaborative Google Document

Users update this document and others can see the updates real‐time.  This enables CSG System Administrators to see system build progress without interfering with Sungard engineers’ work.


Exercise Change Management Policy

Sungard engineer identifies issue with recovery procedure

Shift lead is notified.  Issue validated with Sungard SME before escalating.

Shift lead engages CSG SME on conference bridge.  Sungard Test Manager and CSG Onsite Manager are also notified.

Issue is either clarified or a change is agreed on between Sungard recovery team and CSG SME.

Request for change is brought to the Sungard Test Manager and CSG Onsite Manager for Approval.

If approved, the change is documented in Sungard Observations & Recommendations and CSG Issue Tracking and noted as “single use” or “re‐usable”.

Sungard engineer is given approval to make change.
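
The approval flow above can be read as an ordered sequence of gates that a change request passes through one at a time. The sketch below models that reading; the class and step names are hypothetical, and the rejection path (a change that is not approved) is deliberately left out.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Step(Enum):
    IDENTIFIED = auto()       # Sungard engineer identifies an issue
    VALIDATED = auto()        # shift lead validates it with a Sungard SME
    CSG_SME_ENGAGED = auto()  # CSG SME engaged on the bridge, managers notified
    AGREED = auto()           # issue clarified or a change agreed on
    APPROVED = auto()         # Sungard Test Manager and CSG Onsite Manager approve
    DOCUMENTED = auto()       # recorded as "single use" or "re-usable"
    CLEARED = auto()          # engineer is given approval to make the change


ORDER = list(Step)  # Enum members iterate in definition order


@dataclass
class ChangeRequest:
    description: str
    reuse: str = "single use"  # or "re-usable"
    step: Step = Step.IDENTIFIED
    history: list = field(default_factory=list)

    def advance(self) -> Step:
        """Move to the next step; steps cannot be skipped."""
        if self.step is Step.CLEARED:
            raise ValueError("change already cleared")
        self.history.append(self.step)
        self.step = ORDER[ORDER.index(self.step) + 1]
        return self.step


# Hypothetical request walked through every gate in order.
cr = ChangeRequest("Build step fails on a Sun server", reuse="re-usable")
while cr.step is not Step.CLEARED:
    cr.advance()
```

Keeping the history on the request mirrors the slide's requirement that each approved change is documented in both the Sungard and CSG tracking records.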


Third Exercise – August, 2013

Showed some significant challenges with the Sun builds

Joint decision to run a Proof of Concept exercise focusing on only Sun servers

The POC was held in January, 2014

• Selected a representative set of Sun servers

• Brought a CSG Sun SME on‐site to work with Sungard engineers

• Experience was absolutely invaluable

- Fostered an understanding of the Sungard working environment

- CSG SME could explain build steps first hand 

- Sungard engineers found areas to improve in the documentation 

- CSG SME found processes on the CSG side needing adjustment


August, 2014

Largest exercise ever attempted by CSG

• Close to 400 OS instances were built

• Over 200 OS instances were fully restored and integrated

• Finished 8 hours before the end of the exercise time.

• Extremely successful exercise.

Lessons Learned

• Entire recovery timeline needed adjustment

• More robust tools needed to manage exercise (or recovery)

• First time using CSG off‐shore employees  ‐ special access procedures are needed for them

• Investigate permanent infrastructure upgrades at Sungard

• Large recovery made possible with the help of Sungard MRP resources


Continual Change Management 

Change Management (outside of exercises)

• Omaha environment is very active – 30 – 40 changes daily

• As August exercise gets closer this becomes a very important topic

- Best practice – Changes after lockdown date

› Product already in production – leave it out

› New product not in production – include it

- Track Changes in Production Environment

- Keep the Sungard equipment reservations up to date!  
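
One possible reading of the lockdown rule of thumb is a small scoping check, sketched below. The function name and parameters are hypothetical, and the treatment of changes made on or before the lockdown date (always in scope) is an assumption the slide does not spell out.

```python
from datetime import date


def include_in_exercise(change_date: date, lockdown: date, already_in_production: bool) -> bool:
    """Decide whether a production change is in scope for the upcoming exercise."""
    if change_date <= lockdown:
        # Assumption: anything changed on or before lockdown is part of the baseline.
        return True
    # After lockdown: leave out changes to products already in production,
    # but include new products that are not yet in production.
    return not already_in_production


lockdown = date(2015, 7, 15)  # hypothetical lockdown date
print(include_in_exercise(date(2015, 7, 20), lockdown, already_in_production=True))
print(include_in_exercise(date(2015, 7, 20), lockdown, already_in_production=False))
```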


CMDB Interface

• The size and complexity of the Omaha environment mandates interfacing the CSG and Sungard CMDBs

• CSG hired a college intern to develop queries that extract data from the CSG CMDB and format it for use by Sungard

- 5 operating systems with an average of 7 queries each

- The two companies use disparate CMDB tools

› Sungard uses HP

› CSG uses BladeLogic

• CSG has just automated this process, leaving files on an FTP server for Sungard to retrieve
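
The slides do not describe the queries or the file format exchanged, so the following is only a sketch of the general shape of such an export: query results are grouped per operating system and rendered as one CSV file each. The field names, host names, and file names are all hypothetical, and the upload step is omitted.

```python
import csv
import io

# Stand-in for rows returned by CMDB queries; all values are hypothetical.
rows = [
    {"hostname": "omaapp01", "os": "Linux", "cpu_count": 8, "memory_gb": 64},
    {"hostname": "omadb02", "os": "Solaris", "cpu_count": 16, "memory_gb": 128},
]
COLUMNS = ["hostname", "os", "cpu_count", "memory_gb"]


def to_csv(records, columns):
    """Render query results as CSV text with a fixed column order."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for rec in records:
        writer.writerow({col: rec[col] for col in columns})
    return buf.getvalue()


def group_by_os(records):
    """Split records per operating system so each OS gets its own export file."""
    grouped = {}
    for rec in records:
        grouped.setdefault(rec["os"], []).append(rec)
    return grouped


# Map of export file name to file content, ready to be placed for pickup.
exports = {
    f"cmdb_{os_name.lower()}.csv": to_csv(recs, COLUMNS)
    for os_name, recs in group_by_os(rows).items()
}
```

A fixed column order matters here because the receiving side uses a different CMDB tool and presumably maps fields by position or header name.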


Best Lessons Learned from MRP Experience

Frequent reviews of the recovery documents ‐ not a one and done!

Foster a close working relationship with your Service Delivery Manager (SDM) and other members of your Sungard Account Team

• Meet weekly to plan 2 disparate annual exercises

• Track changes in environments (Omaha, Atlanta, and Sungard Infrastructure)

Bringing CSG SME on‐site for Proof of Concept was incredibly valuable

Use the smaller April exercise to test items we want to improve on for August

Tracking change management during exercise is critical.

• Develop a policy and make sure all engineers (CSG and Sungard) understand it

Track all issues before and during the exercise

• Include things that worked well to ensure they are not forgotten

Solicit feedback from all teams and compile a Lessons Learned document immediately after the exercise

Exchange Lessons Learned with Sungard after each exercise

Keep the equipment reservations up to date


Benefits To Date

Well-documented recovery procedures at Sungard for both CSG Data Centers

Fully trained Sungard engineers ready to take on server builds for either CSG Data Center at time of disaster (ATOD)

ATOD workload balance between CSG System Administrators and Sungard Engineers

Good understanding of change management between and during exercises

Better understanding of ATOD recovery timeline


[Figure: Sungard AS recovery services overview. Panel titles: Protect My Data, Recover My Environment, Manage My Recovery. Service options: Managed Recovery Program, Customer Managed Recovery, Sungard AS Recovery-As-A-Service. Infrastructure labels: Physical Infrastructure; Virtual Infrastructure; Network, WG, Other. Benefit labels: Reduce Costs & Risks, Improve Data Protection, Improve RTO, Reduce Tape Usage.]

Move data from customer’s location to a recovery center

“Right sizing” solutions with a Tiered Availability by SLA approach

© 2014 Sungard Availability Services, all rights reserved


What is driving the need for specialized Recovery Services?

IT DR Layers and Common Challenges

DR Program: Staffing not focused on DR; poor run books; lack of testing

System Recovery: Not meeting RTOs; complex interdependencies; CAPEX constraints

Data Protection: Not meeting RPOs; backup windows too long; poor backup success rates


MRP Approach

Discover Production

» Infrastructure & Application Discovery

» Populate CMDB in Sungard Systems

» Baseline Scope for Recovery

» Understand Change Management Process

Assess & Design Recovery Strategy

» Analyze Discovered Information & Apply Recovery Best Practices

» Design Recovery Solution Architecture

Define Recovery Plans & Procedures

» Define Core Recovery Configuration (e.g., DNS, AD)

» Define Application Recovery Configuration

» Define Application Recovery Plans & Procedures

Recovery Implementation & Execution

» Implement Recovery Solution (e.g., server / storage replication; setup infrastructure ATOT)

» Test Execution

» Test Management & Reporting

Recovery Lifecycle Management

» Analyze Production Changes for Impact on Recovery

» Update Recovery Design, Plans & Procedures

» Ongoing Recovery Optimization

[Figure: MRP lifecycle diagram. Stage labels: Discover, Assess, Define, Implement & Test, RLCM. Cycle labels: Discover, Design, Run, Manage. Additional label: Recommendations.]

MRP Benefits


Program Kept in Constant State of Readiness

Refined & clearly documented run books/procedures

Recovery environments kept in sync with production by integrating DR readiness into daily change control

Enablement of tiered recoverability

Measurable results & continuous improvement

Implementation of DR best practices & automation tools

24/7/365 state of readiness

SLA-backed, IT availability solution

Focused, Experienced & Available Staff

DR is sole focus & core competency

Expert global staffing model executes at time of test/at time of disaster

Subject matter expertise across all recovery disciplines, system platforms & backup technologies

Over 35 years & 3,300 disasters supported & executed successfully

Optimal Spend on Risk Mitigation

Protect investments made in production IT (both technology & people)

Production staff focused on revenue generating activities

OPEX alternative to CAPEX investments

Define best alternatives for tiered recoverability

Ensure the appropriate levels of spending on risk mitigation

Better Managed Complexity & Reduced Risk

Factual baseline of your production environment

Identification of critical business processes, application interdependencies & underlying infrastructure

Validation of data protection & recoverability state

Repurpose IT staff to focus on event mitigation & data synchronization

Ensure application-level recovery at agreed upon service levels


In Summary…

To get the most out of MRP, communicate often with your Sungard Account Team and stay in touch with changes in your environment.

Thank you!


Contact Information

Cynthia L. Jenkins

Lead BCM Analyst

[email protected]

1-402-431-7401

www.csgi.com


Additional Exercise Planning Information

This information is used during the planning and execution of CSG’s BC/DR Exercises


Tools for Planning an Exercise

One Place to Find All Exercise Related Information

• Unique SharePoint site for each exercise

• All exercise information is found there

Weekly In-house Status Meetings

• Start the meeting by stating how many days until the exercise

• Keeps resources focused on their tasks

• Everyone informed of progress and issues

• Keeps teams aware of exercise timeline

• Follow‐up with teams not represented

• Sungard SDM and RSA are present

• Other Sungard resources invited as exercise gets closer


Tools for Developing Recovery Timeline

Tools Continued

• Detailed Project Plan

- Created using Microsoft Project

- Details the execution from exercise beginning to end (72 hours)

- Shows dependencies between servers and products

- Shows priority of server builds and restores

• Status Tracker

- Excel Spreadsheet using output of detailed project plan for planned start and completion time of the phases of recovery

- Lists every server in exercise and tracks completion time for each phase

- Updated by exercise monitors
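
To make the Status Tracker idea concrete, here is a minimal sketch of the same bookkeeping in code: planned completion times per server and phase come from the project plan, monitors record actual completion times, and the tracker reports late phases and overall progress. The class, the server names, and the phase names `build` and `restore` are hypothetical; CSG's actual tracker is an Excel spreadsheet.

```python
from datetime import datetime


class StatusTracker:
    def __init__(self, plan):
        # plan: {server: {phase: planned completion datetime}}
        self.plan = plan
        self.actual = {server: {} for server in plan}

    def complete(self, server, phase, when):
        """Record the actual completion time of one phase for one server."""
        self.actual[server][phase] = when

    def late(self):
        """Return (server, phase) pairs that finished after the planned time."""
        return [
            (server, phase)
            for server, done in self.actual.items()
            for phase, when in done.items()
            if when > self.plan[server][phase]
        ]

    def progress(self, phase):
        """Fraction of servers that have completed the given phase."""
        finished = sum(1 for done in self.actual.values() if phase in done)
        return finished / len(self.plan)


# Hypothetical two-server plan and the updates a monitor might enter.
plan = {
    "srv1": {"build": datetime(2015, 8, 10, 6), "restore": datetime(2015, 8, 10, 18)},
    "srv2": {"build": datetime(2015, 8, 10, 8), "restore": datetime(2015, 8, 11, 2)},
}
tracker = StatusTracker(plan)
tracker.complete("srv1", "build", datetime(2015, 8, 10, 5, 30))
tracker.complete("srv2", "build", datetime(2015, 8, 10, 9, 15))
```

The per-phase progress fraction is the kind of figure that could feed a chart of server recovery progress.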


Tools for Managing an Exercise 

Tools for Managing the Exercises

• Collaboration – Screen Sharing and Conference Bridges

- Screen sharing and conferencing tools defined before exercise

- Publish the phone numbers and URLs in multiple places

- Main Conference Bridge – open for the entire exercise

- Secondary Bridges – used to work on and resolve specific issues

• Send out meeting invitations with pertinent information for entire exercise to all known participants

- Include the main bridge number

- Send secondary bridge invitations to monitoring staff

• Instant Messaging

- Used for one-off conversations

- Keeps conference bridge chatter down

- Use small group IM sessions for targeted conversations on specific issues


Tools for Monitoring the Exercise

Monitoring Exercise Progress – CSG uses the following strategy:

• On-site team to interface with Sungard and publish exercise status

• 3 other monitors on conference bridge

- 2 monitors to track server status changes and facilitate turnover

- 1 monitor to record and track issues

• Hold training classes for monitors so they know what is expected of them

Issue Tracking – CSG uses Remedy

• Develop templates to prepopulate as many fields as possible

• Develop dropdown lists to make quick selections

• Encourage exercise participants to put technical information in IM windows so details are not lost and can be easily copied into tickets


Tools for Communicating with Executive Management

Go/No Go Report

• Used to gain Executive Management approval

• Shows high level timeline 

• Documents readiness for all teams

• Lists all possible change collisions 

• Lists exercise risks and mitigation strategies

• States all internal team and external client communications and dates sent

6 Hour Progress Reports

• Predefined email list

• Recaps progress since last report

• Lists significant issues resolved since last report

• Lists significant issues since last report

• Shows graphs of server recovery progress