Bank of Canada

IT Operations and Infrastructure Services

Bank of Canada (BoC) White Paper

Business Continuity Plan (BCP) versus Disaster Recovery Plan (DRP)

for presentation to

XXXIV MEETING ON CENTRAL BANK SYSTEMATIZATION

Sept 7-9, 2011, SANTIAGO, CHILE

Date of Issue: August 17, 2011

Bank of Canada, Information Technology Services (ITS)

Daniel Schaffler, Victor Baez with Daniel Lamoureux

TABLE OF CONTENTS

1. WHO WE ARE
1.1. HISTORY
1.2. OUR CHANGING WORLD
1.2.1. COMPLEXITY OF ENVIRONMENT – ADAPTING TO EMERGING TECHNOLOGIES
1.2.2. INCREASED RISK SPECTRUM

2. OVERVIEW OF RISK MANAGEMENT FRAMEWORK, CONTINUITY OF OPERATIONS PROGRAM, AND IT SERVICE CONTINUITY MANAGEMENT
2.1. RISK MANAGEMENT FRAMEWORK
2.2. CONTINUITY OF OPERATIONS (COOP)
2.2.1. MANDATE
2.2.2. PRINCIPLES AND FRAMEWORK
2.2.3. RELATION BETWEEN COOP AND THE RISK MANAGEMENT FRAMEWORK
2.2.4. BUSINESS IMPACT ANALYSIS (BIA)
2.3. IT SERVICE CONTINUITY MANAGEMENT (ITSCM)
2.3.1. MANDATE
2.3.2. PRINCIPLES AND FRAMEWORK
2.3.3. RELATION BETWEEN ITSCM AND COOP
2.3.4. INPUT FROM THE BIA
2.3.5. ITSCM PROCESS RELATION WITH ITIL AND OPERATIONAL PROCESSES
2.3.6. CURRENT RECOVERY POSTURE
2.3.7. SPLIT OPERATIONS
2.3.8. THE FUTURE POSTURE, A MORE RESILIENT ENVIRONMENT
2.4. HUMAN CAPITAL
2.5. AWARENESS
2.6. TESTING AND EXERCISES
2.6.1. ITSCM DRIVEN EXERCISES
2.6.2. DISASTER RECOVERY EXERCISES
2.6.3. TABLE TOP EXERCISES
2.7. COOP DRIVEN EXERCISES
2.7.1. BANK-WIDE CONTINUITY TESTS
2.7.2. CALL TREE EXERCISES
2.7.3. SIMULATIONS
2.8. EVALUATING OUR PREPAREDNESS AND RESPONSE

3. LESSONS LEARNED FROM THE OCCURRENCE OF REAL CONTINGENCIES
3.1. CONTINUOUS IMPROVEMENT

1. Who we are

The Bank of Canada is the nation’s central bank, with four main areas of responsibility:

• Monetary policy - The Bank contributes to solid economic performance and rising living standards

for Canadians by keeping inflation low, stable and predictable. Since 1991, the Bank’s monetary

policy actions toward this goal have been guided by a clearly defined inflation target

• Currency - The Bank designs, produces and distributes Canada’s bank notes and replaces worn

notes. It deters counterfeiting through leading-edge bank note design, public education and

collaboration with law-enforcement agencies

• Financial System - The Bank promotes a stable and efficient financial system in Canada and

internationally. To this end, the Bank oversees Canada’s key payment, clearing and settlement

systems; acts as lender of last resort; assesses risks to financial stability; and contributes to the

development of financial system policies

• Funds Management - The Bank provides effective and efficient funds-management services for the

Government of Canada, as well as on its own behalf and for other clients. For the government, the

Bank provides treasury-management services and administers and advises on the public debt and

foreign exchange reserves. In addition, the Bank provides banking services to critical payment,

clearing and settlement systems

Our principal role, as defined in the Bank of Canada Act, is to “promote the economic and financial

welfare of Canada”.

The Bank was founded in 1934 as a privately owned corporation. In 1938, it became a Crown

corporation belonging to the federal government. Since that time, the Minister of Finance has held the

entire share capital issued by the Bank. Ultimately, the Bank is owned by the people of Canada.

The Bank is not a government department and conducts its activities with considerable independence

compared with most other federal institutions. For example:

• The Governor and Senior Deputy Governor are appointed by the Bank's Board of Directors (with the

approval of Cabinet), not by the federal government

• The Deputy Minister of Finance sits on the Board of Directors but has no vote

• The Bank submits its expenditures to its Board of Directors. Federal government departments

submit theirs to the Treasury Board

• Bank employees are regulated by the Bank itself, not by federal public service agencies

• The Bank's books are audited by external auditors appointed by Cabinet on the recommendation of

the Minister of Finance, not by the Auditor General of Canada

1.1. History

Canadians have always been firm believers in the value of ‘insurance’, and the institution of the

Bank of Canada is no exception. Making the business case for Business Continuity Management

(BCM) and Disaster Recovery Planning (DRP) has never been at issue for the Bank. The necessity of

fulfilling our legislated mandate, practicing good governance, and preparedness for provision of

uninterrupted, essential services vital to the national and global financial community have all been

well understood and accepted, and indeed have been an integral part of the Bank’s culture from the

early days of the mainframe to the more recent distributed computing environment.

The IT computing environment running the Bank’s core applications in the latter part of the 20th

century consisted of a mainframe computer dedicated to the production environment, and another

mainframe computer in a separate, geographically disparate location dedicated to the testing and

development environments. The production environment was very tightly controlled and restricted,

with a high degree of policy and procedure in place to manage access to, and migration of changes

from the development to the production environment.

Disaster recovery plans were in place to facilitate complete recovery of the production mainframe

computing and network environment, and included off-site vaulting of what were deemed as ‘vital

records’ (e.g., tape backups of the system, software, applications and data).

A dedicated Disaster Recovery Coordinator position was staffed, with responsibility for overseeing

the creation and management of disaster recovery plans and procedures to ensure business

continuity.

Disaster recovery plans were exercised and tested twice yearly and entailed full recovery of the

production mainframe environment onto the development mainframe. Additional exercises, limited

to the IT Services department (at the time known as the Automation Services Department (ASD)), were

also undertaken periodically to ensure that ASD was well positioned to provide IT services to meet

the business continuity requirements of the larger Bank. Those exercises also provided a means by

which ASD could ensure that changes enacted into the production environment had been factored

into business continuity plans.

With the advent and adoption of the distributed computing environment, the complexity of

business continuity management increased significantly, but through it all the Bank has placed, and

will continue to place, high importance on the need to have proven strategies and plans in place.

1.2. Our Changing World

“Everything flows and nothing stays.” Heraclitus, Greek Philosopher (c.535-c.475 BC)

In keeping with the long-term trend in the history of computing hardware described by Moore’s law

(exponentially increasing capacity), the complexity of both the business and computing

environments has grown, along with the spectrum of risks to be dealt with.

In today’s world, business and IT organizations are:

• Striving to leverage ‘lessons learned’ and apply ‘best practices’

• Increasingly required to demonstrate their value (profitability, cost-effectiveness)

• Re-defining relationships between internal and external business units

• Dealing with changing business models

• Addressing rapidly changing technologies, standards and practices

The Bank of Canada, like every other organization, has been subject to the technological, natural

and environmental forces that shape the nature of our business and delivery of our services, and the

changes that need to be undertaken to stay current and sustainable.

1.2.1. Complexity of environment – adapting to emerging technologies

Most, if not all, IT organizations are additionally trying to manage IT as a business, amidst a

dramatically changing technology landscape that is both much more complex and challenging

than previous computing environments. Computing platform changes (mainframe vs.

distributed vs. cloud; centralized vs. decentralized), networking advances (multimedia; VOIP;

data and voice convergence; fiber), server and storage evolution (clusters; virtualization; NAS;

SAN; backup and recovery technology), applications development (SaaS; multi-threading) and

services all come with various benefits and risks, and can dramatically alter the business

continuity posture of an IT organization. Vigilance with respect to availability, reliability and

sustainability, along with the imperative need to ensure alignment between business

requirements, service level expectations and the cost of doing business, is required and

continuously on-going.

1.2.2. Increased risk spectrum

The spectrum of risks facing an organization from natural and environmental forces has changed

significantly over the course of the last two decades.

Thinking back from 1998 to the present, the Bank has rapidly responded to, and successfully

dealt with, the following naturally occurring events, with marginal disruption to the conduct of

business and the delivery of services:

• 1998 - ice storm, affecting power distribution lines across central Canada

• 2010 - a magnitude 5.0 earthquake occurs in central Canada

On the environmental front, the same can be said for the following actual and on-going events:

• 1999 - the new millennium (Y2K)

• 2001 - heightened terrorism threat after September 11

• 2002 - Severe Acute Respiratory Syndrome (SARS) outbreak

• 2002 - 2 day "Take the Capital" protest in Ottawa against G8 meeting being held in Alberta

• 2003 - The largest power outage in North American history

• 2008 - Suspicious package deposited outside the Bank’s head office

• 2009 - Potential influenza pandemic

• 2010 – Major equipment failure affecting the Bank’s telephony system

• 2011 - heightened occurrence of cyber attacks

• On-going – succession planning and the loss of corporate knowledge

• On-going – supply chain (including outsourcing) and vendor induced disruptions

The above mentioned events and threats, as well as the potential for future ones, have all

contributed directly to the constant re-assessment, strengthening and evolution of the Bank of

Canada’s risk management, business continuity and IT service continuity management plans and

posture.

2. Overview of Risk Management Framework, Continuity of Operations

Program, and IT Service Continuity Management

2.1. Risk Management Framework

The Bank developed its risk management framework in 1971 in consultation with the Board of

Directors. Risk management is viewed by the Bank as particularly important to sound governance,

decision making and accountability. The framework supports informed decision making by ensuring

that the appropriate competencies, analytic tools, consultation and communication form the

foundation for innovation and responsible risk taking.

The risk management framework is fully integrated with the Bank’s corporate management

processes. It is incorporated into the annual planning, priority-setting, budget process, and

quarterly/yearly stewardship processes. It is supported by an in-house tool that allows tracking and

classification of operational risk events which gives it insight into the nature of problems that arise

with its processes and systems.

The Bank has had a long-standing, well-established security and administrative framework for

safeguarding its personnel and assets (physical, information and financial). The safeguards include:

policies and standards; personnel screening; physical and logical security equipment and processes;

business continuity planning; and security awareness programs.

2.2. Continuity of Operations (COOP)

2.2.1. Mandate

The Bank of Canada delivers services that are essential to the economic well-being of the nation.

To ensure that those services, and the Bank’s role in the global financial community, continue to

be delivered during a disruptive event, the Bank has created a Continuity of Operations (COOP)

program. The COOP program encompasses all disciplines necessary to enable recovery of

essential Bank services subsequent to a disruptive event, with emphasis on the protection of

Bank employees and property.

2.2.2. Principles and framework

The Bank of Canada Continuity of Operations program has been explicitly designed to meet the

standards set by applicable sections of the Bank Security Policy (BSP) and the National Fire

Protection Association (NFPA) 1600 Standard on Disaster/Emergency Management and

Continuity of Operations Programs. Compliance with the BSP is mandatory, but compliance with

NFPA 1600 is voluntary.

The Continuity of Operations program is an ongoing management and governance process

mandated and supported by senior management, and resourced to ensure that the necessary

steps are taken to identify the impact of potential losses, maintain viable recovery strategies

and plans, and help ensure continuity of key functions and processes through exercising,

rehearsal, testing, training and maintenance.

2.2.3. Relation between COOP and the Risk Management Framework

The COOP program is a key part of the Bank’s risk framework for safeguarding its personnel and

assets (physical, information and financial), as the COOP program guides, supports, and

promotes the Bank’s plans to:

• Help ensure the safe evacuation of all persons on Bank property in an emergency

• Continue its critical business in the event of a disaster or crisis

2.2.4. Business Impact Analysis (BIA)

The fundamental goal of the Continuity of Operations discipline is to identify mission-critical

processes that support key Bank activities, the maximum recovery timeframes for those

processes after a disruptive event, and the procedures used to restore those

processes after a business interruption.

To maintain an inventory of Bank-wide business processes and functions (not all of which are IT

related or have an IT connotation), the COOP program conducts a Business Impact Analysis

(BIA), as the mechanism for the identification and prioritization of the Bank’s critical business

processes based on impact, injury and loss that would result if a process were to become

unavailable for any reason.

The Business Impact Analysis is necessarily a point in time snapshot that reflects the functions of

the Bank and the recovery priorities and timeframes as they exist when the BIA is created.

However, the Bank is an evolving organization, and the COOP program recognizes the fact that

functions and priorities may shift over time. Therefore, the COOP Program Office is charged with

responsibility for facilitating a review of the BIA every two years, to ensure that each

Department’s inputs accurately represent the Bank's current operational state.

2.3. IT Service Continuity Management (ITSCM)

2.3.1. Mandate

Information Technology Service Continuity Management (ITSCM) is concerned with managing

the organization’s ability to continue to provide a pre-determined and pre-approved level of IT

service to support the minimum business requirements following an interruption to the

business. This may range from an application or system failure, to a complete loss of the

business premises.

2.3.2. Principles and framework

ITSCM is based on the IT Infrastructure Library (ITIL). ITIL is a publicly available framework, and it

is used by organizations worldwide to establish and improve capabilities in IT Service

Management and to provide value to customers in the form of services (a service being something

that provides value to customers).

The main benefits of ITIL include:

• Alignment with business needs

• Negotiated achievable service levels

• Predictable, consistent processes

• Efficiency in service delivery, with well-defined processes

• Measurable, improvable services and processes

• Common language and terms

ITSCM is a mature process within the IT Service Management group of the IT Services (ITS)

department. It is supported by IT senior management, and resourced to ensure that the

necessary steps are taken to identify the impact of potential losses, maintain viable IT recovery

strategies and plans, and help ensure IT continuity of key functions and processes through

exercising, rehearsal, testing, training and maintenance.

2.3.3. Relation between ITSCM and COOP

ITSCM is a key part of the overall Continuity of Operations process and is dependent upon

information derived through this process. ITSCM is focused on the continuity of IT services to

the business, and the COOP program is concerned with the Business Continuity management

process that incorporates all services upon which the business depends, one of which is IT.

ITSCM supports the overall COOP process by ensuring the required IT infrastructure,

applications, and services identified as critical by the business, can be recovered within the

required, and agreed upon, business timescales.

To accomplish this goal, ITSCM ensures that proactive measures are:

• in place to minimize or avoid business disruptions caused by IT outages,

• supported as part of normal IT service deliverables, and

• factored into all IT projects and initiatives

ITSCM provides a framework that minimizes risk for the management and provision of IT

services (for either actual or potential disruptions) to defined service levels. Accordingly, ITSCM

not only focuses on reactive contingency measures, but also on proactive measures to avoid

serious business disruptions.

2.3.4. Input from the BIA

Information technology is often a critical resource that is required to restore operations of many

Bank processes and functions; it is a critical enabling resource. Therefore, ITSCM is responsible

for reviewing the BIA and ensuring that the Bank's information technology recovery plan accurately

reflects the business priorities and timeframes as they are represented in the BIA.

The BIA also provides the means to categorize the business processes in Tiers based on the

maximum allowable downtime. These Tiers allow IT to:

• Define service levels for applications and infrastructure by tier, instead of defining service

levels for each application or service

• Identify the Recovery Time Objectives (RTO) and Recovery Point Objectives (RPO) for the

applications and IT infrastructure, depending on the business process they support

The categorization by Tiers from the BIA:

• Covers all business line applications and foundational, enterprise-wide services (e.g.,

network connectivity; application hosting; storage management; email; backup/recovery;

remote access; etc.)

• Provides clarity on services provided by ITS, and links them to the costs incurred

• Provides clarity on Disaster Recovery posture, since Disaster Recovery solutions can be

offered to the business in Tiers (e.g., Critical, Standard, etc.)

• Applies criteria of critical vs. non-critical services, providing guidelines for initial

prioritization and cost saving opportunities (where to focus attention and resources)
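The tier-based approach described above can be illustrated with a minimal sketch. The tier names "Critical" and "Standard" come from the text; the "Deferred" fallback tier, the hour thresholds, the RTO/RPO values and the process names are all hypothetical and do not reflect the Bank's actual service levels.

```python
from dataclasses import dataclass

# Hypothetical tier table (illustrative values only).
# Each entry: (tier name, max allowable downtime in hours, RTO hours, RPO hours)
TIERS = [
    ("Critical", 4, 2, 0.25),
    ("Standard", 72, 48, 24),
]
# Hypothetical catch-all for processes that tolerate longer outages.
FALLBACK_TIER = ("Deferred", float("inf"), 168, 24)


@dataclass
class BusinessProcess:
    name: str
    max_downtime_hours: float  # maximum allowable downtime recorded in the BIA


def assign_tier(process):
    """Return the first tier whose downtime ceiling covers the process."""
    for tier in TIERS:
        if process.max_downtime_hours <= tier[1]:
            return tier
    return FALLBACK_TIER


def service_levels(processes):
    """Map each process name to its tier name, RTO and RPO."""
    result = {}
    for p in processes:
        tier_name, _, rto, rpo = assign_tier(p)
        result[p.name] = {"tier": tier_name, "rto_hours": rto, "rpo_hours": rpo}
    return result


if __name__ == "__main__":
    sample = [
        BusinessProcess("payments settlement", 2),
        BusinessProcess("staff directory", 24),
    ]
    for name, level in service_levels(sample).items():
        print(name, level)
```

Defining service levels once per tier, rather than once per application, is the design choice the sketch mirrors: adding a new application only requires knowing the maximum allowable downtime of the business process it supports.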

2.3.5. ITSCM Process relation with ITIL and operational processes

Ensuring the continuity of IT services in the event of a disruption requires a thorough

understanding of IT services provided and how they operate under normal circumstances. The

ITSCM process must be aware of and take account of any factors that affect the operation of IT

services - ITSCM is a process that is engaged in all activities in ITS. Consequently it receives feeds

from various operational processes and entities such as:

• Configuration Management: IT components and relationships

• Change Management: Ensuring the currency and accuracy of the Continuity Plans through

the identification of changes affecting, or modifying the IT continuity posture

• Problem Management: Impact analysis of problems that are affecting or may affect the

continuity solutions and ITSCM plans

• Incident Management: Early notification of incidents that can potentially interrupt IT

services and could require the activation of ITSCM plans

• Service Level Management: Service Levels based on BIA criticality, detailing what service

levels must be maintained under normal circumstances and in a disaster situation

• IT Project Management and Delivery Process: Assessment of ITSCM requirements and

proposed continuity solutions, based on ITSCM policy

• IT Enterprise Architecture: Assessment of long-term architecture vision and design

compliance with approved continuity solutions and plans

2.3.6. Current Recovery Posture

The Bank’s main data center is located at Head Office with an alternate center located 20 km

away, which serves as the recovery site in the event of a business disruption. The alternate site

is equipped with computer systems, data links, and staff work areas that enable the Bank to

continue critical operations if the Head Office location is inaccessible or unavailable.

The current alternate posture provides:

• On-site support for a number of users based on BIA requirements

• Support for remote access connectivity

• Local workstations configured with business line applications

• A flexible recovery workspace: corporate applications and business line tools can be

provided via “workspace virtualization”, so when users start their personal workspace they

see their own familiar, personalized work space, where they can access their files,

applications, settings and entire desktop. The workspace is not dependent on a pre-defined

workstation.

2.3.7. Split operations

The Bank conducts pre-defined critical business functions from two sites simultaneously, so that

should an event affect either site, the remaining site will settle the day’s work. The

implementation of split operations strengthened and deepened the Bank’s operational

resiliency.

2.3.8. The future posture, a more resilient environment

The Bank recently launched a project to increase the resilience of its environment by relocating the

main data centre. The strategy was developed through a review of the threats with potential to

impact the Bank’s operations, the extent to which those threats can be mitigated by increasing

geographic separation between sites and the associated operational risks, as well as what other

central banks and similar organizations are doing in this area.

The strategy is to:

• implement split operations for pre-defined critical operations

• locate the Bank’s main data centre and business recovery site 6 to 20 km from Head Office, and

• locate the Bank’s alternate data centre 20 to 50 km from the main data centre
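The distance bands in the strategy above lend themselves to a simple check. The following sketch computes great-circle distances with the haversine formula and tests them against the 6 to 20 km and 20 to 50 km bands stated in the text. The function names and the coordinates in the usage example are hypothetical and do not correspond to any real Bank site.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def check_separation(head_office, main_dc, alternate_dc):
    """Check candidate sites against the distance bands stated in the strategy."""
    d_main = haversine_km(head_office, main_dc)   # Head Office to main data centre
    d_alt = haversine_km(main_dc, alternate_dc)   # main to alternate data centre
    return {
        "main_km": d_main,
        "alternate_km": d_alt,
        "main_within_band": 6 <= d_main <= 20,
        "alternate_within_band": 20 <= d_alt <= 50,
    }


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    print(check_separation((45.0, -75.0), (45.1, -75.0), (45.4, -75.0)))
```

Straight-line distance is only a first filter; the review described in the text also weighs which threats the separation actually mitigates and the operational risks it introduces.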

2.4. Human capital

A fundamental best practice for all COOP planning is to plan to the worst-case scenario, to ensure

that the Bank is prepared for a variety of situations, even though we do not necessarily know to

what degree we may be challenged. In this instance, one of the worst cases would be coping with

a severely reduced workforce. To mitigate this scenario, managers from across the Bank worked with

the COOP Office to develop a categorization matrix identifying those functions in their areas that

were time-critical and determined whether or not those functions could be performed remotely.

They then identified those individuals who currently fulfill those functions (referred to as the core

group), as well as a pool of individuals with the skill sets who could do the necessary work if people

in the core group were unable to.

This category matrix also provides the guidelines for:

• Identifying areas of vulnerability

• Establishing remote access priority, ensuring that the pre-determined staff can continue to

conduct business-critical operations
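
One way to picture the categorization matrix is as a small record per business function. The sketch below is an illustration under stated assumptions, not the Bank's actual matrix: the field names, the `min_cover` threshold, and the sample entries are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class BusinessFunction:
    name: str
    department: str
    time_critical: bool
    remote_capable: bool
    core_group: list = field(default_factory=list)   # staff who do this today
    backup_pool: list = field(default_factory=list)  # staff with the skills to step in


def vulnerable(functions, min_cover=2):
    """Time-critical functions with fewer than min_cover distinct people
    (core group plus backup pool) able to perform them.
    min_cover is an assumed threshold, not a figure from the paper."""
    return [f.name for f in functions
            if f.time_critical
            and len(set(f.core_group) | set(f.backup_pool)) < min_cover]


def remote_access_priority(functions):
    """Staff to prioritize for remote access: core groups first, then backup
    pools, restricted to time-critical functions that can be done remotely."""
    ordered = []
    for group in ("core_group", "backup_pool"):
        for f in functions:
            if f.time_critical and f.remote_capable:
                for person in getattr(f, group):
                    if person not in ordered:
                        ordered.append(person)
    return ordered


# Hypothetical sample data.
matrix = [
    BusinessFunction("settlement", "Operations", True, True, ["ana"], ["ben"]),
    BusinessFunction("cash handling", "Currency", True, False, ["cy"], []),
    BusinessFunction("newsletter", "Communications", False, True, ["dee"], []),
]
print(vulnerable(matrix))
print(remote_access_priority(matrix))
```

Keeping the core group and the backup pool as separate fields mirrors the distinction drawn in the text, and lets the same records answer both questions: where coverage is thin, and who needs remote access first.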

2.5. Awareness

The success of ITSCM depends on a continuing commitment at all levels in the organization and on

people's awareness of their respective responsibilities. IT service continuity requirements are

factored alongside operational activities.

Each department has a Departmental Emergency Response Coordinator, who is responsible for the

coordination and logistics of the department’s Continuity of Operations plan, and is also responsible

for the dissemination of information provided by the COOP program office.

2.6. Testing and exercises

2.6.1. ITSCM driven exercises

ITSCM policies mandate that IT continuity solutions and plans must be tested on a regular basis

to evaluate recovery capability effectiveness, and to identify and address any deficiencies.

The purpose of this policy is to validate that the applications and infrastructure at the alternate

site can operate isolated from the primary site and meet the required Recovery Time and

Recovery Point Objectives. It also serves to identify and resolve problems in the IT infrastructure

that could impact the recovery capabilities of critical Bank processes, and ensures compliance with

the Audit Department’s requirements for regular ITSCM assessments and reviews of tests and

events for operational readiness.

2.6.2. Disaster Recovery exercises

The objective of the Disaster Recovery (DR) exercises is to demonstrate the ability to activate the

production applications and IT environment at the alternate site, within the prescribed Recovery

Time Objectives (RTO) and Recovery Point Objectives (RPO). These tests validate ITS’s

preparedness to recover operations.

The DR exercises are conducted twice yearly, during a weekend, with the participation of all IT

groups and Bank business lines to validate critical systems. The DR exercise has predefined

conditions, control points, success criteria, and a strict command and control structure.

A report on the test results is distributed to the COOP program, IT Management, and the Audit

Department. This report measures compliance with Recovery Time and Recovery Point Objectives,

quality of the individual test results and reports, and highlights updates required to IT plans and

IT infrastructure.
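
The compliance measurement in such a report reduces to comparing measured values against targets for each system. The following is a minimal sketch, assuming each exercise records a recovery time and a data-loss window per system; the class, field names, and figures are hypothetical and not taken from the Bank's reports.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class ExerciseResult:
    system: str
    rto: timedelta            # target: maximum time to restore service
    rpo: timedelta            # target: maximum tolerable window of lost data
    recovery_time: timedelta  # measured during the exercise
    data_loss: timedelta      # measured: age of the most recent recovered data


def compliance_report(results):
    """One row per system, stating whether each objective was met."""
    return [
        {
            "system": r.system,
            "rto_met": r.recovery_time <= r.rto,
            "rpo_met": r.data_loss <= r.rpo,
        }
        for r in results
    ]


def non_compliant(results):
    """Systems that missed at least one objective and need follow-up."""
    return [row["system"] for row in compliance_report(results)
            if not (row["rto_met"] and row["rpo_met"])]


# Hypothetical figures, for illustration only.
results = [
    ExerciseResult("payments", timedelta(hours=2), timedelta(minutes=15),
                   timedelta(hours=1, minutes=30), timedelta(minutes=5)),
    ExerciseResult("email", timedelta(hours=4), timedelta(hours=1),
                   timedelta(hours=5), timedelta(minutes=30)),
]
print(non_compliant(results))
```

Here the second system restores its data within the RPO but takes longer than its RTO, so it is the one flagged for follow-up.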

Results are analyzed to determine whether there are variations from the pre-determined level of

services for the business, and to implement the necessary measures to remedy deficiencies.

2.6.3. Table top exercises

Table top exercises are ‘paper-based’ exercises and are conducted in ITS by ITSCM in preparation

for the DR exercises. The objective is to identify gaps in the Disaster Recovery plans.

2.7. COOP driven exercises

2.7.1. Bank-wide continuity tests

Every two years the COOP Program Office conducts full-scale tests during business hours, while

the business is conducting real business transactions at the alternate site. These tests utilize

close-to-life scenarios and situational injects during the exercise, presenting a realistic and

challenging operating environment. The objective is to stress-test the continuity of operations

plans (business continuity plans) for the departments, including ITS, and to identify gaps in those

plans.

These exercises are designed with a focus on specific situations and objectives, and normally

include exercise ‘press releases’, in paper and video, as part of the injects during the exercise.

2.7.2. Call tree exercises

As part of the ongoing business of individual departments for Bank-wide readiness, all

departments and business lines must ensure that they maintain up-to-date contact information

for all of their employees. As such, a Bank-wide Call Tree exercise is conducted once a year. In

order to represent a realistic scenario, staff are requested not to alter their regular routine to

accommodate this exercise.

2.7.3. Simulations

These are ‘paper-based’ exercises conducted by the COOP Program Office. The objective of

these one-day exercises, utilizing close-to-life scenarios, is to train personnel and also to identify

gaps in the continuity of operations plans.

2.8. Evaluating our preparedness and response

Test result reports are reviewed by the Audit Department, and if required, observations are

presented to the Bank’s senior management. A review of results is conducted with the departments,

and tasks are assigned to areas to follow up on gaps found during the tests.

3. Lessons learned from the occurrence of real contingencies

The occurrence of events has provided the occasion for the Bank to update and fine-tune its contingency

plans, policies, processes, communications, and decision-making in a real-time atmosphere, and allowed

the Bank to test the effect of our response measures to support the critical processes of the Bank.

A success factor for the Bank to deal with emerging risks has been the rapid reaction to, and

implementation of, improvements to contingency plans. The use of information gathered in the BIA has

provided an excellent tool to minimize the time to adapt plans to new risks as the Bank does not need to

do the data gathering exercise when situations arise.

In situations where staff cannot access the Bank premises, either as a risk reduction measure or as a

consequence of an event (e.g., 1998 - ice storm, 2002 - Severe Acute Respiratory Syndrome (SARS) outbreak, 2003 -

power outage, 2009 - potential influenza pandemic), the Bank has adapted the continuity plans to deal

with the reduction of staff and with the increase of remote access by taking the following actions:

• Identify key functions that would be affected by a shortage of staff

• Identify the minimum necessary number of staff for critical processes during peak periods

• Identify a pool of staff from which to draw in case of a staff shortage

• Identify processes that can be done by remote access (users at home), and processes that must be

conducted on site.

• Implement response in stages for a shortage of staff or the need for social distancing

• Provide training in the use of personal protective equipment for staff that are identified as required

to be on site to perform time critical processes

• Review dependencies on key suppliers

• Revise policies and procedures for remote access, and strengthen processes to assure priority

access for critical processes in case of remote access bandwidth limitations

• Provide mobile devices to staff performing critical processes/services

• Implement a flexible recovery workspace that can be accessed from any Bank-issued PC or

from a user-owned PC, providing a personalized workspace that is not dependent on a pre-defined

workstation

In events that require staff to continue operations at the alternate site (e.g., 2003 - Power outage, 2008 -

Suspicious package deposited outside the Bank’s headquarters, 2010 – Earthquake) the Bank has taken

the following actions:

• Implemented split operations, to conduct pre-defined critical business functions from two sites

simultaneously

• Increased the capacity and redundancy for emergency power distribution (redundant diesel

generators), and signed agreements with vendors to guarantee fuel supply at primary and alternate

sites

• Increased the capacity and redundancy of Data Center cooling

• Assured a minimum number of seats at the recovery site per department, and provided flexibility

by increasing the number of workstations available to the departments through a ‘virtual

workspace’, which is a workspace that is not dependent on a pre-defined workstation

• Strengthened the Incident Management Team structure

and will be undertaking the following:

• Relocating the main data center away from Head Office

• Implementing lights-out Data Centers (all Data Center management will be done remotely)

In preparation for events that threaten IT security (e.g., cyber attacks), the Bank has implemented an

Enterprise Security Operation Centre (ESOC), whose objective is to provide centralized security incident

and event management. The ESOC enables the Bank to take a proactive approach toward mitigating

threats and protecting its assets by providing the following functions:

• real-time security monitoring and analysis of infrastructure for access control and policy breaches

• security incident management

• security metrics gathering, analysis, and reporting

3.1. Continuous improvement

At the Bank, continuity programs are not only about technology or facilities loss. The scope has

evolved to manage business disruptions, with a well-established multi-tier approach to deal with

situations that do or may disrupt critical business processes.

Regular testing is a key part of our continuous improvement process, providing the means to

identify gaps and to fine-tune plans, and for staff to learn by executing and testing their recovery

procedures.