RB ITSMbyException WP

ITSM by Exception
It’s Just the Ticket!

© 2014 HelpSystems. All trademarks and registered trademarks are the property of their respective owners.
Robot | A Division of HelpSystems | www.helpsystems.com
United States: +1 952-933-0609 | Outside the US: +44 (0) 870 120 3148


7/27/2019 RB ITSMbyException WP

http://slidepdf.com/reader/full/rb-itsmbyexception-wp 1/11

Table of Contents

I. The Layers of Enterprise Management
II. Integrating Between Management Layers
  • SNMP-Managed Networks
  • Systems Management Monitoring and Notification
III. 12 Unique Things to Manage on IBM i
IV. Easier Ways to Meet SLAs
  • Adaptive Analytics
  • Enterprise Monitoring Agents
  • Systems Management Solutions
V. Robot vs. Enterprise Monitoring Agents
VI. The Benefits of Robot Systems Management Solutions
VII. Conclusions

I. The Layers of Enterprise Management

Lots of things come in layers: cold weather clothes,

cakes, even companies. A layered approach toward

managing all key areas of operation in complex

environments ensures that there is sufficient

coordination and escalation between different groups

of stakeholders when providing IT service. Many small

and midsized enterprises (SMEs) on IBM Power Systems

running IBM i accomplish this effectively and efficiently

using a single layer of systems management solutions.

Larger enterprises, however, require more layers of management, with each layer optimized for a specific

purpose and notifying up.

While SNMP is often used to exchange data between

layers, tickets generated at the top layers record and

track problems to their resolution. This paper explores

the continuing role of systems management software

for stopping tickets at their source and managing

operations by exception within multi-layered enterprise

management environments.

First, let’s take a look at the top layer in most large

enterprises, Business Services Management (BSM). BSM

is a set of software tools, processes, and methods that

help manage IT from a business-centric approach. This layer interfaces with customers and business units to

ensure that IT always remains aligned with business

objectives.

The middle layer of management is IT Services

Management (ITSM), which focuses on the quality of IT

services across all IT domains in the enterprise, as well

as the relationship between IT and customers. ITSM

takes its priorities from BSM with a shared purpose of

ensuring that IT is supporting business objectives. In some organizations, ITSM and BSM are combined into

a single layer.

Within large enterprises, IBM i is typically considered a

domain, which can be either platforms or applications.

Each domain has a common set of functional areas to

monitor, terminology, and functionality. You’ll find our

final layer, systems management software such as the

Robot solution, deployed on each IBM i domain to allow

improvements to quality of service and cost reductions, as well as monitoring on IBM i.

While only certain details such as active/inactive, busy/

not busy, problem/no problem, and full/how full can be

monitored when viewed from higher levels (i.e., ITSM/

BSM), the systems management layer is an expert in


the IBM i domain. Monitoring your IBM i domain using

systems management tools gives you a much more

granular and proactive view, plus the ability to notify up

to the ITSM/BSM layers. The most sophisticated systems

management tools also have management capabilitiesthat proactively improve quality at the system level.

Within the top layers of enterprise management, tickets

are used as running reports on particular problems,

their status, and other relevant data that can be used to

help resolve the problem. In the lowest layer, however,

there typically are no tickets because the priority

here is to prevent tickets from happening in the first

place. Notification up through the layers occurs only if

problems cannot be solved at this lowest layer; only then do they result in tickets.

II. Integrating Between Management Layers

While monitoring takes a more reactive approach

to system events—you wait until there is a negative

event and then receive notification—it’s an integral

part in communication between the layers of enterprise

management. Monitoring alone does not implement

solutions on your system. In fact, the administrator often

has to work from experience in order to know whether

the monitoring tool is showing a problem or a normal

situation. But thanks to monitoring, stakeholders within

each layer can view status information on dashboards

or receive immediate automated alerts via email/SMS.

Any event or condition that affects quality of service

(QoS) and cannot be resolved automatically by systems management software gets notified up to the ITSM layer

where a ticket is created for manual corrective actions.

ITSM in turn notifies BSM of unresolved events within a

certain time window. The notification is most frequently

achieved with SNMP, and all good monitoring tools

support SNMP integration.

SNMP-Managed Networks

Simple Network Management Protocol (SNMP) is an

internet standard protocol originally designed to help

automate the management of devices on IP networks.

SNMP has become a standard way for ITSM/BSM

solutions to receive information about negative events

occurring in the SNMP-managed network.

A managed IBM i domain is an IBM i server or application

being monitored in an SNMP-managed network. Events

occur constantly on each IBM i managed domain.

Systems management solutions like Robot must be used

to convert IBM i events to SNMP traps; the OS does not

do this automatically.


Figure 1: Large enterprises need multiple management layers whereas SMEs typically do not.

Figure 2: Any unresolved event that could affect QoS is notified up to the layer above.

[Diagram labels: Robot Systems Management (SM) on the IBM i domain (IBM Power Systems), alongside Windows, UNIX, and mainframe domains; proactively monitored functional areas (messages, resources, logs, batch job schedules, agent jobs, multi-system, interactive jobs, storage, backup and recovery, spooled output, performance, SNMP traps); IT Service Management (ITSM) with IT technicians; Business Service Management (BSM) with IT management and customer business units. Recurring labels: monitor dashboards, email/SMS alerts, automated corrective actions, manual corrective actions. Notification paths: domain-specific notifications (SNMP), IT-specific notifications (SNMP). Event classes: events which could impact quality of IT service, events which could impact quality of business service.]


An event is something that happens which can be

monitored, indicates a problem, or is just information

(e.g., a file fills up and needs a response to a message

to allow it to grow, causing part of the business ERP

to freeze pending a response). Software components called enterprise monitoring agents (EMAs) run on

the managed IBM i domains and periodically collect

information about events. The EMAs then initiate

SNMP messages called trap messages as a response to

each event.

The data in an SNMP trap message can be read as a

set of variables using an external structure called a

management information base (MIB). For example, the

MIB may describe that the first 10 characters of the SNMP trap message refer to an IBM i system name. A

manager is a server used to collect and process SNMP

trap messages. A software component called a network

management system (NMS) runs on the manager,

receives SNMP trap messages, and executes applications

that monitor and control the managed domains (i.e.,

they execute components of the ITSM application). The

ITSM application receives the information about each

event from the NMS, triages it, and generates tickets

that are then routed to the right operator, manager, or subject matter expert for manual resolution.
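
As a rough illustration of the MIB idea described above, the following Python sketch decodes a fixed-width trap payload using a layout table and then applies a simple triage rule. Only the 10-character system name comes from the example in the text. The other field names, their widths, the severity threshold, and the sample payload are hypothetical. A real MIB is written in SMI notation, and a real NMS would use an SNMP library rather than string slicing.

```python
# Hypothetical fixed-width layout standing in for a MIB definition.
# Only the 10-character system name comes from the text; the rest is assumed.
LAYOUT = [
    ("system_name", 10),
    ("severity", 2),
    ("message_id", 7),
]


def decode_trap(payload: str) -> dict:
    """Split a trap payload into named fields; leftover text is the message."""
    fields = {}
    pos = 0
    for name, width in LAYOUT:
        fields[name] = payload[pos:pos + width].strip()
        pos += width
    fields["text"] = payload[pos:].strip()
    return fields


def to_ticket(fields: dict) -> dict:
    """Triage step (assumed rule: severity 40 and above goes to an expert)."""
    expert = int(fields["severity"]) >= 40
    return {
        "system": fields["system_name"],
        "assignee": "subject-matter-expert" if expert else "operator",
        "summary": fields["text"],
    }


# Made-up payload: system name padded to 10 characters, then severity and ID.
sample = "PROD01    60MSG0001File ORDERS is full"
ticket = to_ticket(decode_trap(sample))
print(ticket)
```

The layout table is the part that plays the role of the MIB: the receiver cannot interpret the payload without it, which is why the sender and the manager must agree on the same definition.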

Systems Management Monitoring

and Notification

The most sophisticated systems management tools

offer multiple ways to display information about

system events and raise awareness. For monitoring, the Robot solution features message centers. Much

more efficient than message queues, these message

centers are graphical displays optimized for different

types of users. Messages in message centers have been

filtered, summarized, suppressed, and color-coded and

automated responses have been applied to most.

The Robot systems management solution also offers

different centers for performance, a map center for high-level

status, and a product metric center to show the level of automation achieved. Operators and managers

look at these centers regularly to get large volumes of

information quickly.

Notifications from Robot systems management tools

find the right technical person when a message or

event appears. Even if you’re not looking at

a message center, they’ll find you online. Robot alerts

arrive as emails or SMS, delivered to correctly identified

operators and managers. Messages become alerts if they are very important, if an event needs manual

reactions, or if an event has not been acknowledged

within a certain time. Alerts can include attachments

to help explain the cause and allow the recipient to

respond by email/SMS. You can send alerts to broadcast

lists and escalation lists if a response is not received

from the first recipient within a certain timeframe.

Alerts assume that you do not have staff just staring at

consoles watching for errors.
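
The escalation-list behavior described above can be sketched in a few lines of Python. This is an illustrative model only, not Robot's implementation: the `Alert` class, the `current_recipient` function, the contact names, and the fixed per-recipient timeout are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Alert:
    text: str
    sent_at: float                      # seconds since some reference time
    acknowledged_by: Optional[str] = None


def current_recipient(alert: Alert, escalation_list: List[str],
                      now: float, timeout: float) -> Optional[str]:
    """Return who should be holding the alert at time `now`.

    Each recipient gets `timeout` seconds to respond before the alert
    moves to the next name; the last name keeps it indefinitely.
    Returns None once someone has acknowledged the alert.
    Assumes `escalation_list` is not empty.
    """
    if alert.acknowledged_by is not None:
        return None
    steps = int((now - alert.sent_at) // timeout)
    return escalation_list[min(steps, len(escalation_list) - 1)]


alert = Alert("Job queue backlog", sent_at=0.0)
contacts = ["operator", "shift-lead", "it-manager"]   # hypothetical list
print(current_recipient(alert, contacts, now=100.0, timeout=300.0))  # operator
print(current_recipient(alert, contacts, now=700.0, timeout=300.0))  # it-manager
```

Computing the recipient from elapsed time, rather than storing a pointer that is advanced by a timer, keeps the sketch stateless and easy to test.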

Figure 3: SNMP has become the standard way for ITSM/BSM solutions to receive

information about unresolved events.


III. 12 Unique Things to Manage on IBM i

Where monitoring is reactive, management implies

taking a more proactive approach on your system.

By incorporating a systems management layer into

your enterprise management strategy, you help prevent negative events from happening. Invariably,

some negative events will occur, but the best

systems management tools adopt a solve-at-source

approach. These tools maintain a repository of pre-

defined, automated, and adaptive solutions that

are matched to negative events automatically and

in real time. Significant functionality is devoted to

anticipating commonplace, recurring negative events

and implementing automated and adaptive solutions

to minimize escalation. Customizable rules provide filtering and automation to uphold Service

Level Agreements (SLAs) and align with business needs.

IBM i has a unique set of functional areas to manage,

plus unique terminology and functionality that differs

from Windows, UNIX, Linux, or mainframe. IBM i

is often a cornerstone domain to the provision of

business services, running mission critical business

applications such as accounting, banking, logistics,

warehousing, manufacturing, insurance, sales, retail, and telecommunications. Clearly, it should be a high

priority to manage this domain well.

So, what’s unique to manage on each IBM i?

1. Messages

IBM i is constantly producing messages. Many events that

trigger messages are a proactive way for the operating

system to tell operators what’s happening. However, it’s

not always clear just by looking at a single message whether the event is negative. A message stating that

a job finished normally may seem like a good thing, but

if that job finished after five minutes when it normally

runs for an hour, it could be bad. It’s typically necessary

to analyze more than just one message to understand

an event. Some messages demand a response and slow

or incorrect responses could lead to problems. If a file

has reached its maximum size, for example, a message

is issued to the operator asking what to do now. Type

the wrong response and you may end up canceling an

important job accidentally or causing an unimportant

job to continue, filling the disk with unnecessary data.

For many large enterprises, there are simply too many

messages to handle manually, which is where message

management tools like Robot/CONSOLE come in to

automate message monitoring, filtering, responses,

and escalation.
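
The point that a "finished normally" message can still signal a problem lends itself to a small sketch. The Python below flags a job run whose duration strays too far from its history, in either direction. The function name, the 50 percent tolerance, and the sample durations are assumptions for illustration; a production tool would likely use richer baselines than a simple mean.

```python
from statistics import mean


def is_suspicious(duration_min: float, history_min: list,
                  tolerance: float = 0.5) -> bool:
    """Flag a run whose duration deviates from the historical mean by more
    than `tolerance` (a fraction of that mean), in either direction."""
    typical = mean(history_min)
    return abs(duration_min - typical) > tolerance * typical


history = [58, 61, 60, 63, 59]          # past runs in minutes (made-up data)
print(is_suspicious(5, history))        # True: "finished normally" but far too fast
print(is_suspicious(62, history))       # False: within the usual range
```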

2. Resources

There are hundreds of resources on most enterprise

servers, including subsystems, jobs, job queues, devices, lines, WebSphere MQ servers, Domino servers, and

more. Many of these resources must be in the correct

state at the correct time for business applications to

function properly, but most resources do not generate a

message saying they are not in the correct state—you

don’t get a message when a subsystem isn’t active or a

job didn’t run.

It’s necessary to understand what status every critical

resource must have, at what time, and to proactively verify that it is so. This area is referred to as work

management on IBM i, and it is a major area into which ITSM/

BSM tools have no visibility. Here again, resource

monitoring tools like Robot/CONSOLE keep an eye on

things around the clock to keep your critical systems

and applications available.
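
Because resources in the wrong state usually stay silent, the check has to be driven by expectations rather than by incoming messages. The Python sketch below compares a snapshot of observed statuses against a table of required statuses and time windows. The table contents, the resource labels, and the way the snapshot is obtained are all assumed for the example; nothing here calls a real IBM i interface.

```python
from datetime import time

# Hypothetical expectations: (resource, required status, window start, window end).
EXPECTED = [
    ("QINTER subsystem", "ACTIVE", time(7, 0), time(19, 0)),
    ("NIGHTLY job queue", "RELEASED", time(20, 0), time(23, 59)),
]


def find_exceptions(observed: dict, now: time) -> list:
    """Return resources that are inside their window but not in the required
    status. A resource missing from `observed` counts as an exception,
    because silence is not the same as success."""
    problems = []
    for name, required, start, end in EXPECTED:
        if start <= now <= end and observed.get(name) != required:
            problems.append((name, required, observed.get(name)))
    return problems


observed = {"QINTER subsystem": "INACTIVE"}   # snapshot, however it was collected
print(find_exceptions(observed, time(9, 30)))
```

Only the exceptions are returned, which matches the management-by-exception idea: a resource that is in the expected state produces no output at all.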

3. Logs

The history log, system audit log, and communications

logs contain important information that could give warning when a negative event is happening. Log

entries typically do not generate messages and manually

looking at logs can be a mind-numbing task. Systems

management tools proactively and continuously scan

logs for important entries as they occur and take

immediate action.


4. Batch Job Schedules

Much of the workload on an IBM i server is batch.

Traditional time-based job schedulers such as the

IBM Job Scheduler do not anticipate problems so the

schedules are vulnerable to relatively small hiccups. Something as simple as a job running too long can

upset several jobs scheduled after it. Implementing event-based

schedulers like Robot/SCHEDULE enables your

schedules to automatically adapt when the unexpected

happens, minimizing the risk of a negative event.

5. Agents

Jobs running on IBM i can have dependencies on other

systems. Those dependencies could be to other IBM i

systems or Windows, UNIX, or Linux systems and most often involve file sharing. Natively, IBM i does not see

much outside of its own, single system environment.

Without native multi-system capabilities, it can’t be

sure if a dependent job has run or not. This could lead to

some serious business problems unless you implement

a system of agents that run on remote systems,

communicate with jobs on IBM i, and coordinate

activities in a harmonious way. Robust job scheduling

software like Robot/SCHEDULE Enterprise automates

file transfers and helps you manage your entire cross-platform job schedule centrally from your IBM i.

6. Multi-System Environments

Some business applications require that multiple logical

partitions (LPARs) and even multiple distributed IBM

i systems behave as if they were one homogeneous

environment. This is increasingly necessary in large

enterprises where departments, companies, and

countries find themselves as a subcomponent working

within one LPAR of a multi-LPAR and multi-system

environment. Managing such an extensive environment

manually is complicated, but implementing a multi-

system management technology hides this complexity.

Robot/NETWORK, for example, consolidates performance

data across one or multiple IBM i partitions and readily

displays it on your mobile device.

7. Resource-Heavy Interactive Jobs

Some jobs that require high levels of CPU and IO

(report jobs, for example) are not designed within

an application to run in batch. Allowing users to run

these interactively and on an ad hoc basis can lead to performance problems. You may work at a company

with an application developed in-house that allows

your sales team to produce sales orders interactively.

At peak times of day you may have multiple users on

the system producing resource-heavy sales orders

interactively, most of them a duplicate effort. Invariably,

this leads to complaints from all departments that the

system is running too slow.

You can solve the issue by implementing a system that intercepts such requests and converts the job types

from interactive to batch and schedules them at regular,

off-peak times to prevent performance problems. Smart

systems management tools like Robot/REPLAY can also

save you time and eliminate errors in interactive jobs

by capturing your keystrokes in dynamic fields and

automatically repeating your processes in future jobs.

8. Storage

Storage fills up naturally over time with unnecessary or oversized objects that impact performance and could,

in severe cases, crash the system. Anecdotally, let’s

say Company A is complaining that their system runs

slowly and periodically crashes without reason. When

they restart the system, it runs okay for a while and

then crashes again. A quick look at the storage shows

that the disk utilization is constantly running above

95 percent full. Apparently, Company A feels they have

5 percent left. After running some storage analysis

tools, you discover that they never deleted sales invoice

spooled files—ever! They have over 15 years of invoices

on disk that they’re keeping just in case they need to do

a reprint, which they admit happens rarely.


Collecting data about how your storage is being

consumed, including temporary storage, is essential to

smart systems management. Products like Robot/SPACE

even produce graphs and projections to make sure

storage stays at healthy levels and run regular cleanup routines to reduce storage utilization automatically.
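
To make the idea of storage projections concrete, here is a minimal Python sketch that fits a straight line through utilization readings and estimates how many days remain before a threshold is reached. It is not how Robot/SPACE works internally; the function name, the linear-growth assumption, and the sample readings are all chosen for illustration.

```python
def days_until_threshold(samples: list, threshold_pct: float) -> float:
    """Estimate days until disk utilization reaches `threshold_pct`.

    `samples` is a list of (day, percent_used) pairs covering at least two
    different days. A least-squares line is fitted through them. Returns
    float('inf') if usage is flat or shrinking, and 0.0 if the latest
    reading is already at or above the threshold.
    """
    n = len(samples)
    mean_x = sum(d for d, _ in samples) / n
    mean_y = sum(p for _, p in samples) / n
    sxx = sum((d - mean_x) ** 2 for d, _ in samples)
    sxy = sum((d - mean_x) * (p - mean_y) for d, p in samples)
    slope = sxy / sxx                     # percentage points per day
    if slope <= 0:
        return float("inf")
    _, last_pct = samples[-1]
    return max(0.0, (threshold_pct - last_pct) / slope)


usage = [(0, 80.0), (10, 81.0), (20, 82.0), (30, 83.0)]   # made-up readings
print(days_until_threshold(usage, 95.0))   # about 120 days at 0.1 points per day
```

In the Company A anecdote, a projection like this would have raised a warning long before utilization reached 95 percent.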

9. Backup and Recovery

Saving to tape or virtual tape is your last line of defense

against a disaster. Manual methods involve a lot of work

after business hours with high potential for human error.

Tapes contain your company’s most valuable asset—

its data—and are typically taken off-site. Automated

backup and recovery solutions can eliminate long, ugly

hours for operators, even during restricted state saves. Robot/SAVE even includes encryption so your data is

protected when it’s off-site.

10. Spooled Output

Users rely on spooled output to run the business, and

they are getting pickier about how they receive spooled

output—everyone wants it a different way and most

want to view it on their laptops or tablets. Spooled

files contain sensitive company information and are

difficult to reproduce once they are lost. It’s important to automate spooled output management to ensure

the information is automatically archived to the proper

media or distributed securely. Robot/REPORTS also adds

flexibility to report handling processes by automatically

bursting IBM i reports into segments for distribution to

the end users that need them.

11. Performance

Business application performance often impacts business

performance. Systems can be tuned manually to optimize

response times during the day and batch throughput

overnight, but this is a time-consuming, specialized task.

Get it wrong and an immediate negative event may result

in many tickets. Luckily, smart systems management tools

like Robot/AUTOTUNE automatically and dynamically

tune systems based on the actual workloads at any given

time and the business priorities that have been set.

12. SNMP Traps

Simple Network Management Protocol (SNMP)

allows the capture of messages from any connected

network devices, including routers, switches, servers,

workstations, printers, modem racks, time clocks, and more. It’s also a way for IBM i to communicate events

out. Communicating with all devices essential to the

correct functioning of your business applications is

vital, and SNMP is a good way of doing that. This area

helps to tie domain-level monitoring with ITSM/BSM,

provided only exceptions are sent forward.

Managing by exception is where Robot solutions excel.

Combining Robot/CONSOLE with Robot/NETWORK

and Robot/ALERT allows you to centralize message management and resource monitoring, distribute

product instructions and objects (such as automation

instructions), and monitor resources and system logs

throughout your network of IBM i systems. And of course,

you can notify up through your layers of enterprise

management via text, email, or SNMP messages.

IV. Easier Ways to Meet SLAs

In recent years, large enterprises have faced new

challenges when it comes to meeting Service Level Agreements (SLAs). Fueled by new innovations such as

cloud-as-a-service computing, sophisticated enterprise

environments are increasingly dependent on virtualized

environments managed by multiple entities (internal

and external), thus creating silos. IT resources are

increasingly scarce in proportion to the challenges

at hand.


As a result, the number of tickets is overwhelming IT

operations. Individual tickets don’t provide sufficient

insight into service issues. Too often tickets fail to notify

IT about problems before there is a business impact.

Escalations keep IT dependent on subject matter

experts. Since the silo model is growing in these larger

organizations, tickets no longer facilitate essential

collaboration across IT and its external partners.

The consequence may be that end users are the first

to identify the existence of a problem that affects

the business.

SLAs are part of a service contract between the

customer and the service provider(s) where a service

is formally defined. An SLA referring to an IBM i

domain will typically have a technical definition that

outlines measurable details for any of these unique

IBM i management categories and others. While every

organization has its own methods for meeting SLAs,

we’ll discuss potential options.

Adaptive Analytics

To tackle growing issues, an innovative new approach

is being pioneered that looks set to disrupt the way

ITSM/BSM traditionally works. This technique is called

adaptive analytics. Adaptive analytics uses advanced probability algorithms combined with natural language

processing techniques to infer the existence of business-

impacting problems by analyzing tickets. Adaptive

analytics uses natural language searches to link tickets

together across different silos and identify which tickets

are related and which business areas are going to be

impacted. This means various IT stakeholders from

different silos, support staff, subject matter experts, and

customers can come together faster to tackle a problem

collaboratively before there is a business impact.

Adaptive analytics also promises to help consolidate

and reduce the overall number of tickets within ITSM/

BSM systems.

Although this is an important development, it does

not change the need to tackle all events on the IBM

i domain at the source and in real time in order to

prevent them from becoming tickets in the first place.

The fewer tickets adaptive analytics has to analyze in

the ITSM system, the lower the chance for problems that

impact business.

Enterprise Monitoring Agents

The enterprise monitoring agent (EMA) approach

collects event statuses periodically, which means many

events can happen before they are captured. As there

is no automated resolution, all collected events are

escalated, which means support teams get a lot of

tickets that are already delayed by the time they arrive. Each ticket must then be escalated to the right operator

for a manual solution, which introduces yet another

delay. These delays and manual actions make it difficult

to meet SLAs, particularly when ticket volumes increase

and support staff headcount is limited. The EMA approach gives

management a good idea of when and why SLAs were

compromised but contributes little toward preventing

the compromise in the first place.

Figure 4: Adaptive analytics promises to consolidate related tickets into groups.

Robot automatically solves issues and stops tickets from being generated in the

first place.


Systems Management Solutions

Using systems management solutions like Robot differs

from EMA in that it traps events and reacts in real

time (e.g., a job queue which usually has only two jobs

waiting suddenly has ten, causing an unacceptable delay in processing key business ERP functions). There is no

delay between the event happening and an automated

solution. Non-essential events are suppressed

immediately, and then pre-defined, automated

solutions are applied to any recognizable, unresolved

events. Robot then escalates only those events

that require human action. As a result, fewer events

are escalated and they are escalated sooner. This

means fewer tickets; better, faster, and more consistent

solutions; and less work for the over-worked support

teams, making it easier to meet SLAs.

IBM i domains can be managed from a central

repository, better known as console consolidation.

This management by exception approach creates a

single status center for all IBM i partitions but still

converts events to an SNMP trap for the ITSM/BSM.

Many large organizations need stepping stones like this

so that the domain-level expert can

immediately be notified and respond to events before

escalation to ITSM/BSM.

V. Robot vs. Enterprise Monitoring Agents

Consider how the enterprise monitoring agent approach

and the Robot approach might affect your SLAs. EMAs

are periodic collectors. They collect event data and

then they pass it to the ITSM layer, which generates a ticket for a subject matter expert. EMAs do not apply

automated solutions, but sometimes EMAs are used in

place of automation software like Robot on IBM i.

Robot captures events and applies automated and

adaptive solutions in real time. Many events that

happen on IBM i have been seen before so automated

solutions can be defined as sets of rules. For example, if

too many jobs are waiting in a job queue, automatically

move some of them to another queue.
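
The job queue rule mentioned above might be expressed along these lines. This Python sketch models queues as in-memory lists and is not Robot's rule syntax; the `rebalance` function, the threshold, and the second queue name are hypothetical. On a real system the move would be carried out by the scheduler or operating system, not by list slicing.

```python
def rebalance(queues: dict, source: str, target: str, max_waiting: int) -> list:
    """If `source` holds more than `max_waiting` jobs, move the excess
    (most recently queued first) to `target`. Returns the moved job names."""
    excess = len(queues[source]) - max_waiting
    if excess <= 0:
        return []
    moved = queues[source][-excess:]
    queues[source] = queues[source][:-excess]
    queues[target].extend(moved)
    return moved


# Queues modeled as lists of job names; QBATCH2 is a made-up overflow queue.
queues = {"QBATCH": ["J1", "J2", "J3", "J4", "J5"], "QBATCH2": []}
print(rebalance(queues, "QBATCH", "QBATCH2", max_waiting=2))  # ['J3', 'J4', 'J5']
print(queues)
```

Because the rule returns an empty list when the queue is within its limit, it can run on every queue event without producing noise, and only the cases it cannot resolve would need escalation.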

Both approaches to monitoring have advantages

and disadvantages, and these differ from platform to

platform. IBM i is more complex to monitor than some

other domains because it has so many important and

unique functional areas. We listed 12 of them already in

this paper. IBM i also runs multiple applications in the

same server so it has more processes and the potential

for more events than other servers.

Figure 5: It’s easier to meet your SLAs with Robot than with ITSM/BSM alone.

[Figure: two timelines along an axis labeled "Time Taken to Implement Solution" (Fast to Slow). Enterprise Monitoring Agent (EMA): Event Happens, Delay, Collect Status, Delay, Filter, Escalate 100%, Create Ticket, Manual Solution; many escalations. Robot Systems Management: Event Happens, Trigger, Filter, Auto Solution, Escalate <10%; fewer escalations.]


Enterprise Monitoring Agent

Custom code – Most EMAs for IBM i are not a ready-to-deploy solution. Instead, you get a tool that enables you to

build a solution by writing custom code. This is expensive,

time-consuming, and often requires an external specialist.

High maintenance – Custom code takes time to maintain.

The more granular the information you are monitoring, the

more likely you will need to maintain/update/modify your EMA

when there are small changes to your environment. This can

lead to frequent, inconvenient, and expensive maintenance

activity, often through external experts. Installing new agents

on new partitions is often time-consuming.

Heavy overhead – EMAs are collectors. Collection typically

requires a lot of resource on the system. The more granular

the collected information, the larger the potential overhead.

You may need to oversize your systems to accommodate this

extra workload.

Low on detail – It’s not easy to get good, granular status

information from IBM i with an EMA unless you write a lot

of custom code. It’s impractical to do that very often, so you

end up with non-detailed statuses (e.g., it’s active/inactive or a percent of how busy). This does not give you much detail on

what caused a problem. You get alerted only when a big event

has already happened.

No corrective actions – EMAs are not designed to manage IBM i, just to monitor it. Even after investing in building

EMAs, you’re still left to manually manage your IBM i and

correct problems yourself. EMAs require staff that has the skills

to build scripts to automate, which is a burden on change

management and the application development team.

Delivery lag – For practical reasons, most EMA collections can happen only on a periodic check. The more frequently they check, the more overhead they consume, so there is usually a

lag between the time an event happens and the time it gets

captured. This means many important escalations are late.

Robot 

No custom code  – Robot does not require custom code

to monitor an environment. All key IBM i components have

already been considered and configuration is relatively quick

using a combination of a GUI and self-verification reports that

check how well you are doing.

Low maintenance  – Robot uses sharable configuration

objects that simplify and accelerate maintenance. Monitoring

logic is configured rather than programmed. You can change a

configuration object in one place and it can have a large effect.

Low overhead – Events are triggered rather than collected.

Triggering does not add significantly to the overhead because

these events are happening anyway. Also, there is no searching

for a status.

Lots of detail – All key functional areas to monitor are

built in. It’s easy to escalate an event, for example

when a high availability replication job is not active and

compromises recovery objectives. This type of granular

monitoring detail provides early warnings of coming problems

so it’s possible to take action before they impact business.

Automated corrective actions  – Automated corrective

actions take place instantly and escalations only take place

if automated corrective actions are not possible. This means

management by exception. Managing fewer events makes

a big difference. If you can correct nine out of ten events

automatically, you’re more likely to meet SLAs—and you’ll

have less work to do!

Real-time  – Most events are triggered when a message

arrives in a message queue or when a job starts/ends. There is no need to search for these events; you can react to them

instantly. Notification is quicker so potential corrective actions

can start sooner.

 Table 1: Robot handles events better than enterprise monitoring agents (EMAs).

This table summarizes the differences between the EMA approach and the Robot approach to monitoring.
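The delivery lag and overhead rows of Table 1 pull against each other for a periodic collector: a shorter polling interval reduces lag but increases the number of collections. The small calculation below makes that trade-off concrete. It is a simplified model under stated assumptions (polls occur at fixed multiples of the interval, and a triggered event is treated as captured with no lag); the function names and the sample times are hypothetical.

```python
import math
from typing import List


def polling_lag(event_time: int, interval: int) -> int:
    """Seconds between an event and the next poll, assuming polls run at
    0, interval, 2 * interval, ... seconds."""
    next_poll = math.ceil(event_time / interval) * interval
    return next_poll - event_time


def average_lag(event_times: List[int], interval: int) -> float:
    """Mean capture lag across a list of event times for a given interval."""
    return sum(polling_lag(t, interval) for t in event_times) / len(event_times)


def polls_per_hour(interval: int) -> int:
    """Number of collections per hour, a rough proxy for collector overhead."""
    return 3600 // interval


event_times = [30, 130, 290]          # seconds after the last poll at t=0
lag_5_min = average_lag(event_times, 300)
lag_1_min = average_lag(event_times, 60)
```

With these sample times, polling every five minutes costs 12 collections per hour, while polling every minute costs 60; the lag shrinks only because the collector runs five times as often.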


VI. The Benefits of Robot Systems Management Solutions

Large enterprises with multiple platforms and

increasingly virtualized environments typically need

multiple layers of management, with each layer performing a different function and interacting with a different group of people. While the main role of ITSM/BSM layers is to collect incoming events from multiple

silos, create tickets, triage and assign them to subject

matter experts, and then track their manual resolution,

Robot systems management solutions automate and

manage IBM i operations and prevent tickets from

being generated in the first place.

For over 30 years, Robot has been the standard for systems management on IBM i. Used by over 2,100

companies in all industries, ranging from SMEs to

large enterprises, the Robot solution includes over 15

products that provide systems management capabilities

for all key areas of IBM i operations.

Robot systems management solutions automate IBM i

operations to reduce the occurrence of negative events

and to automatically correct repeat negative events at

the source, escalating only a handful of notifications for human response. Another benefit of the Robot systems management solution is that the tools do not require programmers for implementation. Their fill-in-the-blank rules can be administered by the team that was

previously managing events manually.

Robot solutions are designed to anticipate problems

and have built-in logic to prevent minor events from

becoming major events. Robot includes sophisticated,

graphical alert consoles that allow managers to monitor

the status of multiple systems at a high level with direct drill-down to the underlying events. Every effort is made

to avoid an eyes-on-glass approach when tackling

individual events. If Robot needs an operator to get

involved, it automatically locates the right person based

on work schedules and subject areas and sends them

an email in real time. Messages can also be sent via text message (SMS), and some manual fixes can be completed

remotely. If the operator does not acknowledge, Robot

automatically escalates to the next person in line.
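The notification behavior described above (pick the right person by work schedule and subject area, then move to the next person if there is no acknowledgment) can be sketched as follows. This is a minimal model, not Robot's actual logic: the `Operator` class, the hour-based schedule, and the `acknowledges` callback are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Set


@dataclass
class Operator:
    name: str
    subjects: Set[str]     # subject areas this person covers
    on_duty_hours: range   # e.g. range(8, 17) for 08:00 up to 17:00


def escalation_chain(operators: List[Operator], subject: str,
                     hour: int) -> List[Operator]:
    """Operators who cover the subject and are on duty, in list order."""
    return [op for op in operators
            if subject in op.subjects and hour in op.on_duty_hours]


def notify(chain: List[Operator],
           acknowledges: Callable[[Operator], bool]) -> Optional[Operator]:
    """Notify each operator in turn; return the first who acknowledges,
    or None if the whole chain is exhausted."""
    for op in chain:
        if acknowledges(op):
            return op
    return None


operators = [
    Operator("Ana", {"backup", "jobs"}, range(8, 17)),
    Operator("Ben", {"jobs"}, range(8, 17)),
    Operator("Cy", {"jobs"}, range(17, 24)),
]
chain = escalation_chain(operators, "jobs", hour=10)
# Simulate Ana not responding and Ben acknowledging.
responder = notify(chain, lambda op: op.name == "Ben")
```

A `None` result from `notify` is the case where nobody in the chain acknowledged, which is a natural point to raise the event to the ITSM/BSM layer.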

Robot also automates SNMP notification so any alert

can notify your ITSM/BSM via an SNMP trap. This is in

addition to or instead of an email or SMS, making it

easy to integrate Robot in an ITSM/BSM environment.

For more information about Robot, please visit

www.helpsystems.com.

Figure 6: Robot covers all essential areas of systems management.
[Figure: Automated Job Scheduling, Multi-System Environment Support, Reports Management, Advanced Message Notification, Disaster Recovery, Message Management, Performance Management.]


VII. Conclusions

On IBM i, Robot is faster and more effective at solving

negative events than ITSM/BSM alone. This is because

ITSM/BSM alone simply collect and forward event data

to be solved later, manually. Robot helps to avoid such events and, when they do occur, solves most of them

automatically and immediately at the source.

• SNMP messages are an industry standard way

of communicating between systems management,

ITSM, and BSM.

• Many SMEs can manage their IBM i systems

effectively with Robot alone, particularly if their

environment includes Robot/NETWORK, which helps

managers see an overview of systems management

activity across their entire IBM i network.

• Because IBM i has unique management

requirements that set it apart from Windows, UNIX,

Linux, and mainframe, Robot systems management

solutions are a critical part of enterprise management

architecture within large enterprises for all

IBM i domains.

• While adaptive analytics will become useful over

time in helping to consolidate tickets into logically

related groups, helping stakeholders collaborate on

manual solutions, it does nothing to prevent negative

events from happening on IBM i in the first place.

• Robot handles events better than enterprise

monitoring agents (EMAs). It exposes more detail,

consumes less resource, and avoids custom code,

making it easier to manage.

• Robot is the most effective way to tackle events

with a solve-at-source approach so fewer tickets

appear in ITSM/BSM, allowing you to truly manage

by exception.

© 2014 HelpSystems. All trademarks and registered trademarks are the property of their respective owners. (R021IE4)

About HelpSystems

HelpSystems is a leading provider of systems management, business intelligence, and security and compliance software. We help businesses reduce data center costs by improving operational control and delivery of IT services.

The increasing popularity of cloud-as-a-service solutions may require a multi-layered enterprise management response. But beware. Skip the systems management layer and you could find yourself drowning in tickets.


Let’s Get Started

To set up a personal consultation, call 1 800-328-1000 

or email [email protected]. We’ll review your

current setup and see how Robot products can help you

achieve your automation goals.