Problem Management Best Practices


description

Problem Management (PM) is one of the components in the ITIL Service Support area. The primary focus of PM is to identify causes of service issues and commission corrective work to prevent recurrences. PM processes are both reactive and proactive - reactive in solving problems in response to incidents, and proactive in identifying and solving potential incidents before they occur.

Transcript of Problem Management Best Practices

  • 1. Problem Management Best Practices (ITIL Step by Step): A Step-by-Step Guide

    Problem Management (PM) is one of the components in the ITIL Service Support area. The primary focus of PM is to identify causes of service issues and commission corrective work to prevent recurrences. PM processes are both reactive and proactive: reactive in solving problems in response to incidents, and proactive in identifying and solving potential incidents before they occur.

    Step 1: Define Roles and Responsibilities
    There should be a designated Problem Manager whose responsibility is to identify problems during daily operations as well as through historical reporting that shows recurring incidents. Depending on the size of your organization, this may not be a full-time job, but it is a necessary role. Additionally, the Service Desk Manager should be in direct communication with the Problem Manager, as he or she will likely be the first alerted when a cluster of Critical or High Priority incidents is opened.

    The primary objectives of Problem Management are:
    1) To uncover a diagnosis of the root cause of the problem
    2) To provide either a temporary fix or workaround to the problem
    3) To control the error by leaving the fix in place or permanently repairing the condition

    Step 2: Focus on Root Cause
    Create a documented process for Root Cause Analysis that describes what techniques will be used. These can include brainstorming, Ishikawa diagrams, Causal Mapping or any other technique that successfully uncovers the underlying cause. This process should be "group think", and the group should be composed of representatives from any possible area of breakdown.

    Tip: Schedule regular incident reviews. Create a weekly meeting to review all incidents where the root cause was not removed.

    Sample Problem Management Process
    TechExcel ServiceWise includes a graphical workflow editor. With this editor, organizations may "draw" their process into place. To the right is an example of how an organization might choose to implement the problem management process.

    [Figure: sample problem management workflow. Labels: New; Create Problem Record; New Problem; Classify Problem; Classification; Investigate; Investigation and Analysis; Further Investigation; Provide Workaround - Known Error; Known Error; Request Change; Pending Change; Change Implemented; Problem Resolved or Permanent Workaround; Problem Resolved]

    Step 3: Make a "Known Error" Known
    Once a root cause and a workaround are in place, a problem becomes a "known error." The workaround should be communicated to all end-users who have submitted an incident, and the incidents placed in a "resolved" status. The Problem record should be in a "known error" status. Additionally, the known error and workaround should be published to the knowledgebase for resolution at the Service Desk.

    Continue to open related incidents as reported and link them to the problem record, but if the published workaround has been implemented with the end-user, the newly related incidents should be in a "resolved" state. This should stop SLA calculations against the incidents, but will not allow full closure until the problem is resolved and closed.

    Once the environment has calmed down and productivity has been restored to the end-users through the workaround, Problem Managers must decide if permanently fixing the root cause is economically viable or if the workaround should become permanent.

    www.techexcel.com
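
    The status rules in Step 3 can be sketched in code. The following is a minimal illustration only, not ServiceWise functionality: the class names, field names and status strings are assumptions made for this example. It shows a problem becoming a known error, linked incidents moving to "resolved" (which stops the SLA clock), and full closure waiting on the problem itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Incident:
    """Hypothetical incident record; statuses are illustrative."""
    incident_id: str
    status: str = "open"              # open -> resolved -> closed
    sla_clock_running: bool = True

    def resolve(self) -> None:
        # A resolved incident no longer counts against the SLA.
        self.status = "resolved"
        self.sla_clock_running = False


@dataclass
class Problem:
    """Hypothetical problem record with its linked incidents."""
    problem_id: str
    status: str = "new"               # new -> known_error -> closed
    workaround: str = ""
    incidents: List[Incident] = field(default_factory=list)

    def link(self, incident: Incident, workaround_applied: bool = False) -> None:
        # New incidents are always linked; they are resolved right away only
        # if the published workaround was implemented with the end-user.
        self.incidents.append(incident)
        if self.status == "known_error" and workaround_applied:
            incident.resolve()

    def publish_workaround(self, workaround: str) -> None:
        # Root cause and workaround in place: the problem is a known error.
        self.workaround = workaround
        self.status = "known_error"
        for incident in self.incidents:
            incident.resolve()

    def incidents_may_close(self) -> bool:
        # Linked incidents cannot be fully closed before the problem is.
        return self.status == "closed"

    def close(self) -> None:
        self.status = "closed"
        for incident in self.incidents:
            incident.status = "closed"
            incident.sla_clock_running = False
```

    In this sketch, an incident linked before the workaround is published is resolved by `publish_workaround`, while one linked afterwards is resolved only when `workaround_applied=True` is passed.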
  • 2. Problem Management Best Practices: A Step-by-Step Guide

    Step 4: Weigh the ROI
    If the return on investment (ROI) for repairing a root cause will not be achievable in six months, consider leaving the workaround in place. If the repair of the root cause is feasible or necessary regardless of the length of the ROI, the Problem Manager and assignees may have to initiate a Request for Change (RFC). This change record is governed by the Change Management process and, in the same way that incidents are linked to problems, a problem should be linked to the RFC. When the RFC is successfully implemented and closed, it will in turn allow the Problem record to be closed, and any associated incidents will be closed.

    Step 5: Focus on Root Cause
    Don't automatically close Problem records when an RFC is complete. They should be reviewed by the Problem Manager to ensure that any workaround in place is backed out, if necessary, in order to effectively use the changed configuration item. Additionally, this allows for total contact ownership and customer satisfaction.

    Tip: Identify actual causes vs. blame. Even though the actual cause can sometimes be attributed to human error, the real cause may be a lack of understanding of the process or a lack of training.

    Step 6: Be Customer-centric
    Focus on customers, not infrastructure. The tendency is to focus on the most troublesome infrastructure. However, the goal of effective IT Service Management is to focus on customers. To this end, Problem Managers should sort recurring incidents by line of business and address the business unit with the most issues.
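
    As a rough illustration of sorting recurring incidents by line of business, the sketch below counts repeated incident categories per business unit and lists the most affected unit first. The input shape (pairs of business unit and category), the function name and the `min_repeats` threshold are assumptions for this example; a real service desk tool would supply its own reporting.

```python
from collections import Counter


def rank_business_units(incidents, min_repeats=2):
    """Rank business units by their number of recurring incidents.

    `incidents` is an iterable of (line_of_business, category) pairs.
    A category counts as recurring for a unit once it appears at least
    `min_repeats` times (an assumed threshold, not from the guide).
    """
    per_pair = Counter(incidents)
    per_unit = Counter()
    for (unit, _category), count in per_pair.items():
        if count >= min_repeats:
            per_unit[unit] += count
    # most_common() returns (unit, count) pairs, highest count first.
    return per_unit.most_common()


# Hypothetical sample data for illustration only.
sample = [
    ("Sales", "email"), ("Sales", "email"), ("Sales", "email"),
    ("Finance", "vpn"), ("Finance", "vpn"), ("Finance", "printer"),
    ("HR", "email"),
]
ranking = rank_business_units(sample)
```

    With this sample data, Sales comes first with three recurring email incidents, followed by Finance with two recurring VPN incidents; one-off incidents are ignored.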
    An example of how Best Practice Problem Management might flow:
    1) Many users/customers lose their email connectivity and submit incidents
    2) The Service Desk Manager alerts the Problem Manager and a problem record is opened and assigned to a technical resource (assignee) with the incidents linked to it
    3) The root cause is determined to be a particular Exchange server that has a bad internal power supply
    4) The workaround is to temporarily reconfigure any end-user of that server to use another Exchange server
    5) The Problem Manager/assignee initiates an Emergency Change Request and gets the okay to reconfigure
    6) The workaround is distributed to the users/customers with incidents as well as any other users of the server who have not yet contacted the Service Desk
    7) Linked incidents are put into a resolved state
    8) The workaround and associated problem number are distributed to the Service Desk for any new callers
    9) The Problem Manager/assignee determines the cost of the new power supply and repair can be recovered in 1 month
    10) The Problem Manager initiates a Request for Change (RFC) to repair the Exchange Server
    11) The RFC goes through the Change Management process and is successfully implemented. The last task of this change is to return the end-users to the original Exchange Server
    12) The Problem Manager is notified of the successful change and closes the problem
    13) The associated incidents are either closed or, if the company practices Total Contact Ownership, the users/customers are contacted by the Service Desk to verify that they have changed back to the original server and they are satisfied that the incident can be closed
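
    The decision in step 9 of the example, together with the six-month guideline from Step 4, can be expressed as a simple payback calculation. This is a sketch under assumptions: the function names, the idea of estimating a monthly saving, and all figures are hypothetical and only illustrate the rule of thumb.

```python
def payback_months(repair_cost, monthly_savings):
    """Months needed for the savings to cover the cost of the repair."""
    if monthly_savings <= 0:
        return float("inf")   # the repair never pays for itself
    return repair_cost / monthly_savings


def recommend(repair_cost, monthly_savings,
              threshold_months=6, repair_required=False):
    """Apply the six-month guideline to a permanent repair.

    `repair_required` models a repair that is necessary regardless of
    the length of the ROI.
    """
    if repair_required:
        return "raise RFC"
    if payback_months(repair_cost, monthly_savings) <= threshold_months:
        return "raise RFC"
    return "keep workaround"


# Hypothetical figures: a repair that pays back in one month.
quick_fix = recommend(repair_cost=1200, monthly_savings=1200)
# Hypothetical figures: a repair that would take 25 months to pay back.
slow_fix = recommend(repair_cost=50000, monthly_savings=2000)
```

    Here `quick_fix` recommends raising an RFC, matching the one-month recovery in the example, while `slow_fix` recommends keeping the workaround unless the repair is flagged as required.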