slide 1
Service Management @ Colruyt
slide 3
Assignment Service management
Service Management BP&S has the overall responsibility, together with all stakeholders, to ensure that the operations and support of the operational BP&S products and services meet and continue to meet the agreed service levels
We keep SH.. out of ..IT
slide 4
Role of Service Mgmt in the Service Life Cycle
Service Management
Solution Delivery
Solution
Managed service
Solution Delivery delivers the new functional and non-functional requirements and fixes the service levels
Service Management ensures that we keep the agreed SLEs
slide 5
Guidelines for Service management
Used standard: ITIL (“Information Technology Infrastructure Library”)
= Goal
= Series of best practices (guidance) to set up the necessary operational processes for an (ICT) organisation
Service management ensures that these processes can be incorporated within BP&S
slide 6
The processes… Reference model
Business IT Alignment: IT Strategy development, Business Assessment, Customer Management, Service Planning
Service Design & Management: Continuity, Capacity, Cost, SLA Management, Availability
Service development & deployment: Build & test, Release to Production
Operation bridge: Incident, Event, Problem
Request Fulfilment
Configuration
Change
slide 7
The processes…
slide 8
Why Service Management?
slide 9
For the Business
PRODUCTION IS CRUCIAL
slide 10
Operational ITIL
We make every effort to keep a stable production environment today and tomorrow.
To achieve this we need to set up different processes
PRODUCTION
slide 11
Operational ITIL
You can only have a stable production environment if you have control over the operational changes
PRODUCTION
CHANGE
slide 12
Operational ITIL
Having control over the operational changes means:
- knowing the correct impact of a change
- knowing ALL the changes
- planning and communicating each change
Diagram: CHANGE, ChangeCalendar, ITCONFIG, ITChange, PRODUCTION
slide 13
Operational ITIL
Asset management is mandatory for asset validation:
CHANGE_ASSET = INCIDENT_ASSET = EVENT_ASSET
Diagram: CHANGE, ChangeCalendar, ITCONFIG, ITASSET, ITChange, PRODUCTION
slide 14
Operational ITIL
Diagram: CHANGE, ChangeCalendar, ITCONFIG, ITSERVICES, Change Window, Unavailability, ITASSET, ITChange, SLA & SLE, PRODUCTION
Having control over the impact means:
- knowing the change window of an impacted asset
- knowing what an end user needs (inventory of assets)
- communicating the changes for each ITService
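The first point, knowing the change window of an impacted asset, can be sketched as a simple check. This is only an illustration and not the tooling used at Colruyt: the `ChangeWindow` class, its weekday/hour representation, and the Sunday 02:00 to 06:00 example window are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ChangeWindow:
    """Hypothetical weekly change window for one asset."""
    weekday: int      # 0 = Monday ... 6 = Sunday
    start_hour: int   # inclusive, 24h clock
    end_hour: int     # exclusive, same day (0-23)

    def contains(self, start: datetime, duration: timedelta) -> bool:
        """True if the whole change fits inside the window on a single day."""
        end = start + duration
        if start.weekday() != self.weekday or end.date() != start.date():
            return False
        window_start = start.replace(hour=self.start_hour, minute=0, second=0, microsecond=0)
        window_end = start.replace(hour=self.end_hour, minute=0, second=0, microsecond=0)
        return window_start <= start and end <= window_end


# Assumed example: an asset that may only be changed on Sunday between 02:00 and 06:00.
sunday_night = ChangeWindow(weekday=6, start_hour=2, end_hour=6)
fits = sunday_night.contains(datetime(2010, 1, 24, 3, 0), timedelta(hours=2))
```

A change that starts inside the window but ends after it is rejected, which matches the idea that the unavailability has to stay within the agreed window.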
slide 15
Change
Goal
Ensure that changes can happen within the agreed SLEs and without affecting the stability of production
slide 16
Change
How
• Having control over the changes:
– Each Change is communicated (ITChange)
– Each Change is planned (ChangeCalendar)
– Each Change impact is known
– Each Change is authorised
The CAB (Change Advisory Board) manages all changes.
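The four control points on this slide can be read as a checklist that every change has to pass. The sketch below is a minimal illustration under that reading; the `Change` class and its field names are hypothetical and do not describe the real ITChange or ChangeCalendar data model.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Change:
    """Hypothetical change record; field names are illustrative only."""
    title: str
    communicated: bool = False   # announced to the stakeholders
    planned: bool = False        # entered in the change calendar
    impact_known: bool = False   # impact analysis has been done
    authorised: bool = False     # approved, for example by the CAB

    def missing_controls(self) -> list[str]:
        """Names of the control points that are not yet satisfied."""
        checks = {
            "communicated": self.communicated,
            "planned": self.planned,
            "impact_known": self.impact_known,
            "authorised": self.authorised,
        }
        return [name for name, ok in checks.items() if not ok]

    def under_control(self) -> bool:
        """A change is under control only when all four points are met."""
        return not self.missing_controls()


change = Change("Upgrade application server", communicated=True, planned=True)
open_points = change.missing_controls()
```

Returning the list of missing points rather than a bare yes/no makes it clear what still has to happen before the change can go ahead.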
slide 17
Configuration & Asset
Impact Analysis & dependencies
The environment becomes more and more complex
The impact becomes bigger
Extra availability becomes ‘normal’
The change windows become smaller
How can we keep an overview of all these assets & relations?
slide 18
When can I switch this cable?
What is the impact?
When can I maintain the UPS System?
When can I deploy this middleware service?
When can I upgrade the RAC Database?
How can I move a datacenter?
When can I install a new application server?
slide 19
IMPACT
80% of all unavailabilities are due to changes (Gartner)
Today 99% of all changes run fine at Colruyt, but changes still cause more than 40% of all unavailabilities…
slide 20
Impact?
Which services are impacted when I pull the fibre cable connected to the director XFBS011102 on port 26 module 2?
slide 21
IMPACT?
Director 1: XFBS011101-FC2/26, XFBS011101-FC6/4, XFBS011101-FC9/4
Director 2: XFBS011102-FC2/26, XFBS011102-FC6/4, XFBS011102-FC9/4
SAN
DS8300W: DS8300W-50050763060005D4, DS8300W-50050763060B05D4, DS8300W-50050763061405D4, DS8300W-50050763061905D4
Bootdisk
SAN
FIBERCARD1: SVLIPC71-500110A00016C17E
FIBERCARD2: SVLIPC71-50050763060005D4
SVLIPC71, Wilgenveld 1214B, RACK AD41
NETWORK: XWBS013P21 – GI0/12
MAC: SVLIPC71-001A64D32554
ORACPC50
BRSTD001@ORACPC50
ORACPC50_PROCESS
DS-JDBC_BRANCHCOUNT
BRANCHCOUNT001
ITSERVICE: VERKOOP_FVS2000 (shown four times in the diagram)
The ITService VERKOOP_FVS2000 has 1199 dependencies (result on 20/01/2010)
The impact list of component XFBS011102-FC2/26 contains 1954 entries (result on 20/01/2010)
TELLINGEN_ALIAS
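The impact question on this slide amounts to a traversal of the asset relations: start at the component that is touched and follow every "is used by" link until no new assets appear. The sketch below shows that idea with a breadth-first search. The relation table is an assumption built from the names on the slide (the slide does not state which asset depends on which, nor which fibre card is wired to which director), and `impact_list` is a hypothetical helper, not the real configuration database query.

```python
from collections import deque

# Assumed "is used by" relations; only the asset names come from the slide.
USED_BY = {
    "XFBS011102-FC2/26": ["FIBERCARD2"],
    "FIBERCARD2": ["SVLIPC71"],
    "SVLIPC71": ["ORACPC50"],
    "ORACPC50": ["BRSTD001@ORACPC50"],
    "BRSTD001@ORACPC50": ["DS-JDBC_BRANCHCOUNT"],
    "DS-JDBC_BRANCHCOUNT": ["BRANCHCOUNT001"],
    "BRANCHCOUNT001": ["VERKOOP_FVS2000"],
}


def impact_list(component, used_by):
    """Return every asset that directly or indirectly depends on `component`."""
    seen = {component}          # guards against cycles in the relations
    queue = deque([component])
    impacted = []
    while queue:
        current = queue.popleft()
        for dependant in used_by.get(current, []):
            if dependant not in seen:
                seen.add(dependant)
                impacted.append(dependant)
                queue.append(dependant)
    return impacted


impacted = impact_list("XFBS011102-FC2/26", USED_BY)
```

With this assumed table the traversal ends at the ITService VERKOOP_FVS2000. In a real environment each asset has many dependants, which is how one port can produce an impact list with well over a thousand entries.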
slide 22
RELATIONS
Country, Site, Building, Room, Rack
Physical server
Logical server
MF, Others, Windows, Network component, Unix, Linux
LPAR
STC’s
Logical Database
Physical Database
JDBC connection
Middleware services
Application
Universe
Reports
IRAP
ESX
Storage
IMSL
Fibercard
Blade Chassis
Load balancer
Queue
Windows Services
Bootdisk
ITSERVICES
CICS
WAS
ITELEMENTS ITFUNCTIONS
Windows Shares
slide 23
ITService e.g. Finance
AGENDA MUST
ARCHIVES MUST
ATST SHOULD
DIENSTINFO_SHARE MUST
EXCEL SHOULD
INTERNET_CONNECTIVITY MUST
IRAP MUST
FILT SHOULD
MICF SHOULD
ONKO MUST
PAFW SHOULD
PEOPLESOFT_HUMAN_RESOURCES MUST
PERSONEELSDIENST_SHARE MUST
PNPEPAFW_REPORTGROUP MUST
TELEFONIE SHOULD
....
35 top levels defined by the FA
ITSERVICE: 35 top levels, 685 dependencies
This gives 685 dependencies for this ITService
ASSET
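One way to use the MUST and SHOULD labels is to derive the state of an ITService from the state of its top-level dependencies. The slide does not define the two labels, so the interpretation below is an assumption: a failing MUST dependency makes the service unavailable, a failing SHOULD dependency only degrades it. `service_status` and the shortened `FINANCE` table are hypothetical.

```python
MUST, SHOULD = "MUST", "SHOULD"

# A few of the Finance top levels from the slide, with their labels.
FINANCE = {
    "AGENDA": MUST,
    "EXCEL": SHOULD,
    "IRAP": MUST,
    "TELEFONIE": SHOULD,
}


def service_status(dependencies, down):
    """Classify a service given the set of dependency names that are down."""
    levels = {dependencies[name] for name in down if name in dependencies}
    if MUST in levels:
        return "unavailable"   # assumed meaning of MUST
    if SHOULD in levels:
        return "degraded"      # assumed meaning of SHOULD
    return "available"


status = service_status(FINANCE, {"EXCEL"})
```

Dependencies that are down but not part of the service are ignored, so the same function can be fed the full list of current outages.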
slide 24
Extra availability
Extra availability is a period outside the normal availability hours when you want to make use of the ITService
e.g. Extra work needs to be done on Saturday
e.g. No changes on related ITServices because the financial year closure takes place the first 2 weeks of April
e.g. Next week project H59A asks full exclusivity for changes because of the size of the project
e.g. A demo will take place at the fair this weekend
slide 25
Frozen period
During the whole month of December we reduce the number of changes to an absolute minimum for the complete Colruyt Group,
because:
- This period is too crucial for the Colruyt Group to take risks (each change is a risk…)
- We notice that a yearly ‘rest for our IT’ is good for stability
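A frozen period is easy to enforce at planning time. The sketch below is a minimal illustration: December comes from the slide, while the `approved_exception` flag is an assumption about how the "absolute minimum" of changes might still get through.

```python
from datetime import date

FROZEN_MONTH = 12  # December, as stated on the slide


def in_frozen_period(day: date) -> bool:
    """True if the given day falls in the frozen period."""
    return day.month == FROZEN_MONTH


def may_schedule(day: date, approved_exception: bool = False) -> bool:
    """Assumption: inside the frozen period only approved exceptions pass."""
    return approved_exception or not in_frozen_period(day)


allowed = may_schedule(date(2010, 12, 15))
```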
slide 26
The processes…
slide 27
Incident
What
• An incident is an event caused by a disruption or a reduction in quality of a service
slide 28
Incident
Goal
- Return as soon as possible to the ‘normal situation’ so the end user can continue doing his job
- Minimise the negative impact on the business operation
It is not the goal of incident management to fix the problem permanently (cost vs. benefit)
An incident is fixed when the end user can continue with his work and agrees with the proposed solution
slide 29
slide 30
Information Request
Information Requests are handled by the key user of the application on the business side
slide 31
Disaster
Escalation of an incident
• Prio 1 and 2 incidents can be escalated to disaster by the helpdesk
• Escalated incidents are evaluated by a disaster coordinator
• Not every escalated incident results in a disaster!
• The disaster coordinator coordinates the disaster until the incident is under control
• Tools: Adobe Connect, disastertel, disaster room
slide 32
Request Fulfilment
What
• Handles standard IT requests (computer, keyboard, software, hardware, mobile devices, ...) of an end user
• Not an INCIDENT!
slide 33
Request Fulfilment
Help
• Link @ Portal to Servicedesk
slide 34
Event Monitoring
What
Monitors all events that occur throughout the IT infrastructure, to monitor normal operation and to detect and escalate exception conditions
We have:
– Passive monitoring: detects operational events of a configuration item (asset)
– Active monitoring: actively tests the health status of a configuration item (asset)
slide 35
Event Management
How does ITO work?
Collecting: SNMP traps, Application & System Log Monitoring, System Messages, Mail2ITO
Processing: Filtering, Priority, Grouping, Threshold
Acting: Automatic Actions, Operator Initiated Actions, Incident Management, Notification (SMS)
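The collect, process, act flow can be illustrated with a small pipeline. This is a sketch of the general pattern and says nothing about how ITO is actually configured: the `Event` class, the severity levels, the threshold of 3 and the action names are all assumptions.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    asset: str
    message: str
    severity: str  # "info", "warning" or "critical" (hypothetical levels)


def process(events, threshold=3):
    """Filter, group and threshold events; return (asset, message, action) tuples."""
    # Filtering: informational events never lead to an action.
    relevant = [e for e in events if e.severity != "info"]
    # Grouping: identical events on the same asset are counted together.
    counts = Counter((e.asset, e.message, e.severity) for e in relevant)
    actions = []
    for (asset, message, severity), count in counts.items():
        if severity == "critical":
            # Priority: critical events act immediately.
            actions.append((asset, message, "open incident"))
        elif count >= threshold:
            # Threshold: warnings only act once they repeat often enough.
            actions.append((asset, message, "notify operator"))
    return actions


events = (
    [Event("srv1", "disk almost full", "warning")] * 3
    + [Event("srv2", "database down", "critical"),
       Event("srv3", "heartbeat", "info")]
)
actions = process(events)
```

Grouping before thresholding keeps a burst of identical warnings from producing one notification per event.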
slide 36
Automatic Actions
Operator Initiated Actions
Fixes
Workarounds
Filter
Event Overview
OPERATIONS
SUPPORT TEAM
Monitoring Strategy
MACHINES
ITO
HELPDESK
CONFIG
INCIDENT PROBLEM
APPLICATIONS
END USERS
INCIDENT
CHANGE
slide 37
Problem
What
Problem management is focused on:
• Solving the underlying cause of an incident: “How can we avoid this?”
• Ideas from the end user
• Managing problems that you deliberately chose not to fix
• status REJECTED!
active & proactive
slide 38
Questions
Thanks