Incident Consequence Analysis
Uploaded by: ronald-bartels
Incident: dd/mm hhmm
Detected: hhmm
Diagnosis: hhmm
Repair: hhmm
Recover: hhmm
Restore: hhmm
Resolution: hhmm
Workaround: <Description>
Escalations: Problem management: dd/mm <Any extra details>
Notification / Report: dd/mm
Incident Consequence Analysis
<Description of major incident>
Service desk references: ######
This is reported as a <Minor, Major> Incident.
<Business units> affected in <location>. <x> minutes unavailable and/or <x> minutes degraded.
<Resolution>. <Service> affected by <cause>. <No, blank> further root cause analysis required.
Escalated to <escalations>.
This incident affected the company <less than, the same as, greater than> usual. The outage was <less than, blank, greater than> normal. The risk is <less than, blank, greater than> average.
Prepared by: <first name, surname>
<Major Incident Dashboard>
Rolling Incident averages: Classification – <Norm>, Outage analysis – <Norm>, Risk management – <Norm>
This incident was <calculation> less than the norm using the Incident User Metric.
Resources: <Job descriptions and names of resources involved>
Service affected: <Name of service from catalogue>
Description of incident: <Description of incident>
Resolution: <Description of resolution>
Timelines & details
Time analysis
<Graph of Expanded Incident Lifecycle>
Incident Breakdowns: <Pie chart of incident breakdown by service> <Pie chart of incident breakdown by cause>
Time unavailable/degraded: <x> minutes unavailable, <x> minutes degraded. MTTR=<x> minutes, MTBF=<x> days, MTBSI=<x> days.
Incident User Metric: Cost of downtime analysis <x>
<Incident User Metric skyline>
<Last 10 Incidents – ROC analysis>
Classification (<x>%), Outage analysis (<x>%), Risk Management (<x>%)
S CR OP U P IT B I V CM
Thinking problem management! (ICA Template) Page: 1
Major Incident Dashboard
<Bar chart of Classification, Outage and Risk percentages; axis 0% to 70%>
<x> <x> <x> <x> <x> <x> <x> <x> <x> <x>
Classification
Scope (S): <input from major incident draft template>
Credibility (CR): <input from major incident draft template>
Operations (OP): <input from major incident draft template>
Urgency (U): <input from major incident draft template>
Prioritization (P): <input from major incident draft template>
Outage analysis
IT service outage analysis (IT): <input from major incident draft template>
Business service outage analysis (B): <input from major incident draft template>
Risk management
Risk impact (I): Best practice CIA analysis (CRAMM) – Confidentiality (unauthorized disclosure), Integrity (unauthorized modification or misuse), Availability (destruction or loss).
<input from major incident draft template>
Risk vulnerability (V): What are the chances of the outage occurring considering loss, error or failure?
<input from major incident draft template>
Countermeasures (CM): What is being done to prevent this from happening again?
<input from major incident draft template>
Closure
Escalations: Please note that if no comments or questions are received within 5 working days this report is classed as Accepted.
<input from major incident draft template>
Example
Incident Consequence Analysis
Email outage in Pofadder
Service desk references: 555772
This is reported as a Minor Incident.
All Business units affected in Pofadder. 12 minutes unavailable and 238 minutes degraded.
Mail server recycled. Messaging affected by bug. No further root cause analysis required.
Escalated to Infrastructure Manager.
This incident affected the company less than usual. The outage was normal. The risk is less than average.
Prepared by: Ronald Bartels
Rolling Incident averages: Classification – 69%, Outage analysis – 49%, Risk management – 54%
This incident was 66% less than the norm using the Incident User Metric.
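The summary wording in this example ("less than usual", "normal", "less than average") appears to follow from comparing the incident's own scores (Classification 60%, Outage analysis 50%, Risk Management 41%, as reported in this example's dashboard) with the rolling averages. A minimal sketch of that comparison follows. The 5-point tolerance for treating two values as equal is a hypothetical choice, since the template does not define one, and the function name is illustrative only.

```python
# Rolling averages and incident scores taken from the example report.
ROLLING_NORM = {"Classification": 69, "Outage analysis": 49, "Risk management": 54}
INCIDENT = {"Classification": 60, "Outage analysis": 50, "Risk management": 41}

# Hypothetical tolerance (percentage points); not defined by the template.
TOLERANCE = 5


def compare_to_norm(value, norm, tolerance=TOLERANCE):
    """Return the template wording for a score relative to its rolling norm."""
    if value < norm - tolerance:
        return "less than"
    if value > norm + tolerance:
        return "greater than"
    return "the same as"


wording = {
    name: compare_to_norm(INCIDENT[name], ROLLING_NORM[name])
    for name in ROLLING_NORM
}
for name, phrase in wording.items():
    print(f"{name}: {phrase} the norm")
```

With these inputs the sketch yields "less than" for Classification and Risk management and "the same as" for Outage analysis, which agrees with the wording used in the example summary.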
Resources: Service Level Manager (M Mouse), Regional Infrastructure team leader (D Duck).
Service affected: Messaging
Description of incident: IT customers located in the Pofadder office experienced slow delivery of mail messages to other regions and business units. IT customers were unable to confirm instructions or send credit minutes via email. The inbound and outbound queues on the Exchange server were not flowing. Documents scanned and emailed via multi-function devices were largely affected where the document size was over 1.5 MB. The log file gave a specific error code, which suggested several resolutions from the knowledge base (http://support.microsoft.com/kb/329617). The bad mail folder was cleared and the SMTP service was restarted. However, this did not clear the issue, and it was only when the mail server was power cycled that normal operations resumed.
Resolution: Mail server recycled.
Incident: 9/10 09h30
Detected: 11h25
Diagnosis: 13h06
Repair: 13h15
Recover: 13h17
Restore: 13h35
Resolution: 13h35 Server restarted
Workaround: Failed - Bad mail folder cleared and SMTP service restarted.
Escalations: Problem management: 9/10
Notification / Report: 9/10
Incident breakdown by Service (affected: Messaging)
<Pie chart of incident breakdown by service> Ecommerce, Monitoring, Printing, Third party, Operations, Backups, Service desk, Storage, AD, Documents, Security, Intranet, Hosting, Payments, Voice, Messaging, Data networks
Incident breakdown by Cause (caused by: Bug)
<Pie chart of incident breakdown by cause> Change, Capacity, Process, Vendor, Hardware, Bug, Environmental, Service Provider, Carrier, Configuration, Component Failure
Timelines & details
Time analysis
<Graph of Expanded Incident Lifecycle: incident times per phase, hh:mm>
Phase     hh:mm
Detect    01:55
Diagnose  01:41
Repair    00:09
Recover   00:02
Restore   00:18
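The phase durations in the time analysis can be derived from the milestone times recorded for this incident. The sketch below shows one way to do that, assuming all milestones fall on the same day and that each phase runs from one milestone to the next; the names `MILESTONES`, `PHASES` and `phase_durations` are illustrative and not part of the template.

```python
from datetime import datetime

# Milestone times from the example incident (all on 9/10).
MILESTONES = [
    ("Incident", "09h30"),
    ("Detected", "11h25"),
    ("Diagnosis", "13h06"),
    ("Repair", "13h15"),
    ("Recover", "13h17"),
    ("Restore", "13h35"),
]

# Each phase is assumed to span two consecutive milestones.
PHASES = ["Detect", "Diagnose", "Repair", "Recover", "Restore"]


def parse(stamp):
    """Parse a time written as in the report, e.g. '13h06'."""
    return datetime.strptime(stamp, "%Hh%M")


def phase_durations(milestones):
    """Return {phase: 'hh:mm'} for each pair of consecutive milestones."""
    times = [parse(stamp) for _, stamp in milestones]
    result = {}
    for phase, start, end in zip(PHASES, times, times[1:]):
        minutes = int((end - start).total_seconds() // 60)
        result[phase] = f"{minutes // 60:02d}:{minutes % 60:02d}"
    return result


durations = phase_durations(MILESTONES)
for phase, duration in durations.items():
    print(f"{phase:<9}{duration}")
```

Run against the example timeline, this reproduces the five values charted in the time analysis.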
Incident Breakdowns
Time unavailable/degraded: 12 minutes unavailable, 238 minutes degraded. MTTR=238 minutes, MTBF=8 days, MTBSI=7 days.
Incident User Metric: Cost of downtime analysis 347
Incident User Metric Skyline
<Chart of the Incident User Metric per week, 24/03/2007 to 06/10/2007; vertical axis 0 to 20000>
Last 10 incidents – ROC Analysis
<Chart of Risk, Outage and Classification for incidents 1 to 10>
Classification (60%), Outage analysis (50%), Risk Management (41%)
Code   S  CR  OP  U  P  IT  B  I  V  CM
Score  2  2   1   4  3  3   1  2  2  1
Classification
Scope (S): Less than 25% of customers affected*
Credibility (CR): Multiple business units affected negatively
Operations (OP): Some interference with normal completion of work
Urgency (U): Underway and could not be stopped
Prioritization (P): High - Technicians respond immediately, assess the situation, and may interrupt other staff working low or medium priority jobs for assistance.
Outage analysis
IT service outage analysis (IT): Major - App, server, link (network or voice) unavailable for greater than 1 hour or degraded for greater than 4 hours
Business service outage analysis (B): Minor - Financial loss with a visible impact on profitability but no real effect, greater than $10k or some embarrassment or rule or process breaches or medical treatment
Risk management
Risk impact (I): Best practice CIA analysis (CRAMM) – Confidentiality (unauthorized disclosure), Integrity (unauthorized modification or misuse), Availability (destruction or loss).
Confidentiality=Confidential, Integrity=High, Availability=Moderate
Risk vulnerability (V): What are the chances of the outage occurring considering loss, error or failure?
Low loss probability. Moderate error probability. Moderate failure probability.
Countermeasures (CM): What is being done to prevent this from happening again?
Proactive monitoring of environment. Refer to the knowledge base at service desk. Antivirus service locks up SMTP Service when BadMail queue reaches a specific size. Add check to daily check list to monitor BadMail folder size.
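The template does not state how the three dashboard percentages are derived from the ten scores. The sketch below rests on two assumptions, both inferred rather than documented: each item has a maximum score of 4, and the percentage is truncated to a whole number. Under those assumptions the example's scores give the reported 60%, 50% and 41%.

```python
# Scores from the example incident, keyed by the dashboard codes.
SCORES = {"S": 2, "CR": 2, "OP": 1, "U": 4, "P": 3,
          "IT": 3, "B": 1, "I": 2, "V": 2, "CM": 1}

# Grouping of codes as laid out in the dashboard sections.
GROUPS = {
    "Classification": ["S", "CR", "OP", "U", "P"],
    "Outage analysis": ["IT", "B"],
    "Risk management": ["I", "V", "CM"],
}

# Assumption: every item has a maximum of 4 (not stated in the template).
MAX_SCORE = 4


def group_percentages(scores, groups, max_score=MAX_SCORE):
    """Return each group's total as a truncated percentage of its maximum."""
    result = {}
    for name, codes in groups.items():
        total = sum(scores[code] for code in codes)
        result[name] = (100 * total) // (max_score * len(codes))
    return result


percentages = group_percentages(SCORES, GROUPS)
for name, pct in percentages.items():
    print(f"{name} ({pct}%)")
```

Integer division is used so that the Risk management group (5 out of 12) truncates to 41 rather than rounding to 42.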
Closure
Escalations: Please note that if no comments or questions are received within 5 working days this report is classed as Accepted.
No further root cause analysis required. Escalated to Infrastructure Manager.