HP Operations Manager i topology-based event correlation: concepts and operation

©2010 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.

HP Operations Manager i Topology Based Event Correlation – Concepts and Operation

Dave Trout, Hewlett-Packard Software and Solutions

Description

This session will explore key values, concepts, and operational details of Topology Based Event Correlation (TBEC) in HP Operations Manager i. We’ll show you how TBEC offers a powerful correlation capability that adjusts itself to changes in the infrastructure and requires much less administrative effort than other technologies on the market. We will examine the role of Event Type Indicators and Health Indicators, and we’ll explore how HP Universal CMDB topology is leveraged, how rules are defined via a simple GUI, and how the causal engine automatically chains rules together for cross-domain correlation. We’ll also show you how to control TBEC run-time behavior using configuration settings.

Transcript of HP Operations Manager i topology-based event correlation: concepts and operation

Page 1: HP Operations Manager i topology-based event correlation: concepts and operation


HP Operations Manager i Topology Based Event Correlation – Concepts and Operation

Dave Trout, Hewlett-Packard Software and Solutions

Page 2

Agenda

– TBEC in action (What does it do?)

– Basic concepts of TBEC

– Using the Correlation Manager

– Automatic cross-domain correlation

– New TBEC features in OMi 9.0

– Summary

TBEC = Topology Based Event Correlation

Page 3

TBEC in Action (video)

Page 4

TBEC Basic Concepts

Page 5

Diagram: BSM Service Health hierarchy, with KPIs, Health Indicators, OMi Events with Event Type Indicators, and OMi Events, fed by OM, SiS, BPM, RUM, NNM, and 3rd-party managers.

Page 6

ETI (Event Type Indicator)

– ETI is an attribute of an event

– Indicates the concise status of a managed infrastructure element

– Set based on a hint* in the event or via server-based mapping filters

– Defined per CI Type

• Only pre-defined ETIs are evaluated when events arrive

• Valid for all derived CI Types

CI Type     Example ETIs
Database    OracleReadWriteError:Occurred, ArchiveMode:Enabled, MemorySortRate:Normal, OracleSessionCount:High, ReplicationStatus:Broken, SQLQueryPerformance:Normal
Node        UnexpectedReboot:Occurred, BackupJob:Failed, PingAvailability:Unavailable, LogicalDiskFreeSpace:NearCapacity, MemoryUsageLevel:High
Router      LinkStatus:Up, NodeState:Down

*Hint: custom attribute “EventTypeIndicator” = “<ETI name>:<ETI value>”
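The ETI rules on this slide can be sketched in a few lines of Python. This is a minimal illustration, not OMi code: the catalog `ETI_DEFINITIONS`, the hierarchy `CI_TYPE_PARENT`, and the functions `parse_eti` and `is_defined` are hypothetical names, and treating Router as derived from Node is an assumption made only for the example. The sketch parses the `<ETI name>:<ETI value>` hint and accepts only pre-defined ETIs, looking them up on the CI Type and its ancestors so that a definition stays valid for derived CI Types.

```python
from typing import Optional, Tuple

# Hypothetical ETI catalog: CI type -> ETI name -> pre-defined values.
# Names and values come from the example table; the structure is assumed.
ETI_DEFINITIONS = {
    "Node": {
        "PingAvailability": {"Unavailable"},
        "UnexpectedReboot": {"Occurred"},
    },
    "Router": {
        "LinkStatus": {"Up", "Down"},
        "NodeState": {"Down"},
    },
}

# Hypothetical CI type hierarchy: CI type -> parent CI type (None at the root).
CI_TYPE_PARENT = {"Router": "Node", "Node": None}


def parse_eti(hint: str) -> Tuple[str, str]:
    """Split an 'EventTypeIndicator' value of the form '<ETI name>:<ETI value>'."""
    name, sep, value = hint.partition(":")
    if not sep or not name or not value:
        raise ValueError(f"malformed ETI hint: {hint!r}")
    return name, value


def is_defined(ci_type: str, name: str, value: str) -> bool:
    """True if the ETI value is pre-defined on ci_type or on any ancestor type."""
    current: Optional[str] = ci_type
    while current is not None:
        if value in ETI_DEFINITIONS.get(current, {}).get(name, set()):
            return True
        current = CI_TYPE_PARENT.get(current)
    return False
```

For example, `parse_eti("PingAvailability:Unavailable")` yields `("PingAvailability", "Unavailable")`; `is_defined("Router", "PingAvailability", "Unavailable")` is true because the ETI is inherited from Node, while `is_defined("Node", "LinkStatus", "Down")` is false.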

Page 7

Correlation Requires ETIs

– Only events with Event Type Indicators can be correlated

– Event Type Indicators are used to define correlation rules

Diagram: KPIs, Health Indicators, OMi Events with Event Type Indicators, and OMi Events, fed by OM, SiS, BPM, RUM, NNM, and 3rd-party managers.

Page 8

Cause and symptom events

– Something goes wrong in your environment

– Monitoring reports multiple problems via events

– Usually just one of the events describes the cause of the problem

– The others are just symptoms

– Fix the cause, and the symptoms go away too

In a nutshell, TBEC identifies CAUSE and SYMPTOM events.

Page 9

Topology: the “T” in TBEC

– Cause and symptoms are one part of a rule

– The other part is the CI type topology

Events are correlated if both the topology and the Event Type Indicators match.

Page 10

Correlation requires relationship

Diagram: Event2 (LinkStatus:Down) is marked as the cause; Event1 and Event3 (both Ping:Unavailable) are candidate symptoms.

– Cause and symptom set by the events AND cause and symptom within the same topology => the events are correlated

– Cause and symptom events not within the same topology => no correlation
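The topology condition can be illustrated with a breadth-first search over related CIs. This is a hypothetical sketch: the CI ids, the `TOPOLOGY` graph (which assumes Event1's CI is connected to the router and Event3's is not), and the `correlates` function are invented for the example. It also simplifies real TBEC, where the rule's CI type topology constrains which relationship path counts; here any path does.

```python
from collections import deque

# Hypothetical CI instance graph: CI id -> directly related CI ids.
TOPOLOGY = {
    "router1": {"hostA"},
    "hostA": {"router1"},
    "hostB": set(),
}


def connected(graph, start, goal):
    """Breadth-first search: True if goal is reachable from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        ci = queue.popleft()
        if ci == goal:
            return True
        for neighbor in graph.get(ci, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return False


def correlates(cause, symptom, graph):
    """Both ETIs must match the rule AND the two CIs must be related."""
    return (
        cause["eti"] == "LinkStatus:Down"
        and symptom["eti"] == "Ping:Unavailable"
        and connected(graph, cause["ci"], symptom["ci"])
    )


event1 = {"ci": "hostA", "eti": "Ping:Unavailable"}
event2 = {"ci": "router1", "eti": "LinkStatus:Down"}
event3 = {"ci": "hostB", "eti": "Ping:Unavailable"}
```

With this assumed topology, `correlates(event2, event1, TOPOLOGY)` is true and `correlates(event2, event3, TOPOLOGY)` is false, even though Event1 and Event3 carry the same ETI.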

Page 11

Time Window for Correlation

– Even if the cause, the symptom, and the connecting topology match, events might not be correlated

– Events have to arrive within a certain time window

– A time window starts when the first cause or symptom event arrives that cannot be correlated with any other event

– The default time window is 16 minutes (960 seconds)

– Each correlation rule can have its own time window, which overrides the global setting

Diagram: events on a time axis; an event arriving outside the correlation window is not correlated.
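A minimal sketch of the window arithmetic, assuming arrival times in seconds and a window that includes its end point (the slide does not say whether the boundary is inclusive). `window_length` and `in_window` are hypothetical helpers; the 960-second default is the documented 16 minutes, and a per-rule value overrides it.

```python
from typing import Optional

GLOBAL_WINDOW_SECONDS = 960  # documented default: 16 minutes


def window_length(rule_window: Optional[int] = None) -> int:
    """A per-rule time window, if set, overrides the global setting."""
    return GLOBAL_WINDOW_SECONDS if rule_window is None else rule_window


def in_window(window_start: float, arrival: float, length: int) -> bool:
    """True if an event arriving at `arrival` falls inside the window that
    opened at `window_start` (when the first uncorrelated event arrived)."""
    return window_start <= arrival <= window_start + length
```

An event arriving 900 seconds after the window opens is inside the default window, one arriving after 1000 seconds is not, unless the rule sets its own window of, say, 1200 seconds.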

Page 12

A Simple Correlation Rule

What the rule defines:

– IF the system receives an event that sets LinkStatus = Down

– AND IF the system receives an event that sets Ping Availability = Unavailable

– AND IF the Router and Computer are somehow connected (topology)

– AND IF that happens at roughly the same time

– THEN the system will mark the LinkStatus Down event as CAUSE and the Ping Availability Unavailable event as SYMPTOM
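The IF/THEN rule above can be expressed as data plus a matching function. In this hypothetical sketch, `Rule`, `Event`, and `apply_rule` are illustrative names; the topology check is reduced to a set of related CI pairs, and "roughly the same time" is approximated as arrival times no more than the rule's window apart, which is simpler than TBEC's actual window handling. Because the function tries both orderings, it does not matter which event arrives first.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple


@dataclass(frozen=True)
class Event:
    event_id: str
    ci: str
    ci_type: str
    eti: str      # "<ETI name>:<ETI value>"
    time: float   # arrival time in seconds


@dataclass(frozen=True)
class Rule:
    cause_ci_type: str
    cause_eti: str
    symptom_ci_type: str
    symptom_eti: str
    window_seconds: float = 960


def apply_rule(rule: Rule, a: Event, b: Event,
               related: Set[Tuple[str, str]]) -> Optional[Tuple[str, str]]:
    """Return (cause_id, symptom_id) if the rule fires for a and b, else None."""
    for cause, symptom in ((a, b), (b, a)):
        if (cause.ci_type == rule.cause_ci_type
                and cause.eti == rule.cause_eti
                and symptom.ci_type == rule.symptom_ci_type
                and symptom.eti == rule.symptom_eti
                and ((cause.ci, symptom.ci) in related
                     or (symptom.ci, cause.ci) in related)
                and abs(cause.time - symptom.time) <= rule.window_seconds):
            return cause.event_id, symptom.event_id
    return None


rule = Rule("Router", "LinkStatus:Down", "Computer", "PingAvailability:Unavailable")
link_down = Event("e1", "r1", "Router", "LinkStatus:Down", 0)
ping_lost = Event("e2", "pc1", "Computer", "PingAvailability:Unavailable", 120)
```

Here `apply_rule(rule, ping_lost, link_down, {("r1", "pc1")})` returns `("e1", "e2")`: the LinkStatus event is the CAUSE and the ping event the SYMPTOM. Without the related pair, the same call returns `None`.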

Page 13

TBEC Correlation Rules – Semantics

– A correlation rule shows possible cause-symptom relationships:

• If the two events happen within a defined window of time, then correlate. Otherwise, do nothing.

– A correlation rule does NOT say:

• If I have that cause, then I will see that symptom (impact)

• If I see that symptom, then I must have this cause for it

– One cause can have multiple symptoms (and not all have to appear at the same time)

Page 14

Usage when defining rules

TBEC Cause/Symptom

– A correlation rule must include at least one CAUSE and one or more SYMPTOMs

– Multiple CAUSE specifications are allowed if they reference the exact same CI Type

– A SYMPTOM in one rule can be configured as a CAUSE in another rule (and vice versa)
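These constraints translate into a simple validation check. The sketch below is hypothetical (`RuleSpec` and `validate` are not OMi names), and each cause or symptom is modeled as a `(CI Type, ETI)` pair. The third point needs no check, since the same pair may appear as a SYMPTOM in one rule and a CAUSE in another.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Spec = Tuple[str, str]  # (CI Type, "<ETI name>:<ETI value>")


@dataclass
class RuleSpec:
    name: str
    causes: List[Spec] = field(default_factory=list)
    symptoms: List[Spec] = field(default_factory=list)


def validate(rule: RuleSpec) -> List[str]:
    """Return a list of problems; an empty list means the rule is valid."""
    problems = []
    if not rule.causes:
        problems.append("at least one CAUSE is required")
    if not rule.symptoms:
        problems.append("at least one SYMPTOM is required")
    if len({ci_type for ci_type, _ in rule.causes}) > 1:
        problems.append("all CAUSE specifications must use the same CI Type")
    return problems
```

A rule with a Router cause and a Node symptom passes; adding a second cause on Node makes it fail, because the two CAUSE specifications reference different CI Types.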

Page 15

Correlation Engine behavior

TBEC Cause/Symptom

– A correlation rule triggers when a CAUSE event and any combination of the specified SYMPTOM events occur within the correlation time window

– CAUSE and SYMPTOM events can occur in any sequence within the time window

– A rule that would otherwise mark an event as a SYMPTOM is ignored for that event if the event is already marked as a SYMPTOM of a different CAUSE event

– A duplicate CAUSE event that arrives during a correlation window is correlated and handled like a SYMPTOM event
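The engine behavior can be approximated with a small state machine for a single rule. This is a toy model, not the OMi engine: `CorrelationEngine` and its methods are hypothetical, the topology and time-window checks are assumed to have already passed, and `already_linked` stands in for SYMPTOM links made by other CAUSE events. It shows events arriving in any order, an already-claimed symptom being left alone, and a duplicate CAUSE being handled like a SYMPTOM.

```python
from typing import Dict, Iterable, List, Optional


class CorrelationEngine:
    """Toy single-rule engine; topology and time window are assumed to match."""

    def __init__(self, cause_eti: str, symptom_etis: Iterable[str],
                 already_linked: Optional[Dict[str, str]] = None):
        self.cause_eti = cause_eti
        self.symptom_etis = set(symptom_etis)
        self.cause_id: Optional[str] = None  # CAUSE event of the current window
        self.pending: List[str] = []  # SYMPTOMs that arrived before any CAUSE
        # SYMPTOM event id -> CAUSE event id, possibly pre-filled by other rules
        self.symptom_of: Dict[str, str] = dict(already_linked or {})

    def _link(self, event_id: str) -> None:
        # Skip events already marked as SYMPTOM of a different CAUSE.
        if self.symptom_of.get(event_id, self.cause_id) != self.cause_id:
            return
        self.symptom_of[event_id] = self.cause_id

    def receive(self, event_id: str, eti: str) -> None:
        if eti == self.cause_eti and self.cause_id is None:
            self.cause_id = event_id
            for waiting in self.pending:  # SYMPTOMs may arrive first
                self._link(waiting)
            self.pending.clear()
        elif eti == self.cause_eti or eti in self.symptom_etis:
            # A duplicate CAUSE lands here and is handled like a SYMPTOM.
            if self.cause_id is None:
                self.pending.append(event_id)
            else:
                self._link(event_id)
```

Feeding a symptom `s1`, then cause `c1`, then a duplicate cause `c2` links both `s1` and `c2` to `c1`, while a symptom already owned by another cause keeps its existing link.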

Page 16

Browser-related behavior

TBEC Cause/Symptom

– If the lifecycle state of the CAUSE event is changed:

• CAUSE and related SYMPTOM events are marked with the new state (e.g. “Work On”)

• CAUSE and related SYMPTOM events are assigned to the user

– If the CAUSE event is closed:

• All SYMPTOM events are also closed

– Additional SYMPTOM events that arrive after a CAUSE event is closed are also closed until the current correlation window expires

• Can be disabled in Platform settings
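A sketch of this browser-side propagation, under stated assumptions: `OmiEvent`, `change_state`, and `add_symptom` are hypothetical, state names are plain strings, and the `close_late_symptoms` flag merely stands in for the Platform setting the slide mentions. For brevity, `add_symptom` does not itself check whether the symptom should still be correlated; it only decides whether to close it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class OmiEvent:
    event_id: str
    state: str = "Open"
    user: Optional[str] = None
    symptoms: List["OmiEvent"] = field(default_factory=list)  # set on CAUSE events


def change_state(cause: OmiEvent, state: str, user: Optional[str] = None) -> None:
    """Apply a lifecycle change (and optional user assignment) to a CAUSE and its SYMPTOMs."""
    for event in (cause, *cause.symptoms):
        event.state = state
        if user is not None:
            event.user = user


def add_symptom(cause: OmiEvent, symptom: OmiEvent, now: float,
                window_end: float, close_late_symptoms: bool = True) -> None:
    """Attach a SYMPTOM; close it at once if its CAUSE is already closed
    and the current correlation window has not yet expired."""
    cause.symptoms.append(symptom)
    if close_late_symptoms and cause.state == "Closed" and now <= window_end:
        symptom.state = "Closed"
```

Setting the CAUSE to "Work On" for a user updates every linked SYMPTOM; after the CAUSE is closed, a SYMPTOM arriving inside the window is closed immediately, while one arriving after the window expires stays open.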

Page 17

Correlation Window – Auto Extend Mode

Diagram: symptom and cause events on two timelines.

– Auto Extend Mode = True (default): the correlation window is extended, producing extended correlation windows

– Auto Extend Mode = False: a single correlation window that is not extended
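The difference between the two modes can be sketched as follows. The slide does not spell out what triggers an extension, so this example assumes that each event landing inside the current window pushes its end to that event's arrival time plus the window length; `correlated_arrivals` is a hypothetical helper.

```python
from typing import List


def correlated_arrivals(arrivals: List[float], window: float,
                        auto_extend: bool = True) -> List[float]:
    """Return the arrival times (seconds) that fall inside the correlation window.

    The window opens at the earliest arrival. With auto_extend, every event
    inside the window moves the end to (arrival + window); otherwise the end
    stays fixed at (first arrival + window).
    """
    ordered = sorted(arrivals)
    if not ordered:
        return []
    end = ordered[0] + window
    inside = [ordered[0]]
    for t in ordered[1:]:
        if t > end:
            break
        inside.append(t)
        if auto_extend:
            end = t + window
    return inside
```

With events at 0, 900, and 1700 seconds and the default 960-second window, auto-extend keeps all three (the event at 900 moves the end to 1860), whereas the fixed window stops at 960 and keeps only the first two.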

Page 18

TBEC Settings

Setting                               Default
Auto-Extend Time Window Mode          true
Correlate Closed Cause Events         true
Correlation Time Window (seconds)     960
Max Waiting Queue Size (events)       5000

Location: Admin > Platform > Infrastructure Settings > Applications > Operations Management

Page 19

OMi event pipeline

Diagram of the OMi event pipeline. Elements shown: Indicator Manager (Event to ETI Mapping); Correlation Manager (Event Correlation); Content Manager; Event to CI Mapping; Admin view to create and modify correlation rules; if configured, attach HI value to CI; KPI calculation (BSM Platform); OMi Browser. Items passed between them: events, CIs, ETI values, HI values, and KPIs.

Page 20

Using the Correlation Manager

Page 21

Correlation Manager

– Define, deploy, and manage correlation rules

– Visualize the topology of correlation rules

– View CAUSE and SYMPTOM events in rules

– View assigned and available Event Type Indicators and their values

– Browse the hierarchy of cross-domain correlation rules

– Access to Correlation Manager is controlled by user role settings

Page 22

Correlation Manager UI

Screenshot callouts:

– Causes and symptoms of selected rule

– Available ETIs of selected CI type

– List of rules currently defined

– CI type topology of selected rule

Page 23

What you need to know

Creating TBEC rules

– Working knowledge of CI Types and the BSM type model

– Working knowledge of UCMDB Views

– Understanding of Event Type Indicators

– Detailed knowledge of the events which you want to correlate

• event domain (networking, database, storage, etc.)

• ETIs specified in the events

• event relationships (Cause, Symptom)

Page 24

Basic workflow sequence

Creating TBEC rules

1. Create new rule using the * button

2. Define rule properties (name, description, time window, etc.)

3. Select a topology (UCMDB) view that includes the CI Types and relationships you want to use in the rule

4. Define CAUSE event(s):

• Select a CI Type in the View

• Select an ETI and ETI value from the list of available ETIs and “Add as a Cause”

5. Define SYMPTOM event(s):

• Select a CI Type in the View

• Select an ETI and ETI value from the list of available ETIs and “Add as a Symptom”

6. Correlation Manager highlights the shortest relationship path

• If a different path is desired, select the appropriate relationship connectors
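Step 6 amounts to a shortest-path search over the CI Types and relationships in the selected view. This sketch is an assumption about the idea, not the Correlation Manager's algorithm: it counts hops with breadth-first search and ignores relationship types, and the `VIEW` graph and `shortest_path` function are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional

# Hypothetical view: CI Type -> CI Types it has a relationship with.
VIEW: Dict[str, List[str]] = {
    "Router": ["Interface", "Net Segment"],
    "Interface": ["Router", "Node"],
    "Net Segment": ["Router", "Switch"],
    "Switch": ["Net Segment", "Node"],
    "Node": ["Interface", "Switch"],
}


def shortest_path(view: Dict[str, List[str]], start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search; returns the CI Types on a fewest-hops path, or None."""
    parent: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back to the start via parent links
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbor in view.get(node, []):
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return None
```

In this view, `shortest_path(VIEW, "Router", "Node")` returns `["Router", "Interface", "Node"]`, although Router, Net Segment, Switch, Node also connects the two types; picking that longer route corresponds to selecting different relationship connectors.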

Page 25

Completing rule definition

Screenshot callouts:

– Rule is valid: save when finished

– Relations between the cause and symptom CI types are added automatically

– After saving, the visualized rule topology is simplified

Page 26

Automatic Cross-domain Correlation

Page 27

Chaining of Correlation Rules

Diagram: correlation rules chained across three domains; each element is labeled with a CI Type and an ETI:value.

– App Server Domain: WebApp; TXAvail:Unavailable

– Database Domain: Database Instance, Tablespace; StorageCapacity:Critical

– Storage Domain: Logical Volume, Storage Server, Physical Disk; Utilization:Full, Quota:Exceeded

Page 28

Relations Between Correlation Rules

– Triggered rules are connected (chained together) at runtime when they include a Cause or Symptom event that:

• resolves to the exact same CI

• and has the exact same ETI and ETI value

– Chaining is automatic; no configuration is required

– Rules can trigger in any sequence
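The chaining condition boils down to a key comparison. In this hypothetical sketch, each triggered rule is reduced to the set of `(CI, ETI:value)` keys of its CAUSE and SYMPTOM events, and two rules chain when they share a key. The rule names, CI ids, and the assignment of ETIs to CIs loosely follow the cross-domain diagram but are invented. Because set intersection ignores order, the result does not depend on which rule triggered first.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Key = Tuple[str, str]  # (resolved CI id, "<ETI name>:<ETI value>")


def chain(triggered: Dict[str, Set[Key]]) -> List[Tuple[str, str]]:
    """Return pairs of triggered rules that share a CAUSE/SYMPTOM event resolving
    to the same CI with the same ETI and ETI value."""
    links = []
    ordered = sorted(triggered.items(), key=lambda item: item[0])
    for (rule_a, keys_a), (rule_b, keys_b) in combinations(ordered, 2):
        if keys_a & keys_b:
            links.append((rule_a, rule_b))
    return links


triggered = {
    "storage": {("disk7", "Utilization:Full"), ("tbs1", "StorageCapacity:Critical")},
    "database": {("tbs1", "StorageCapacity:Critical"), ("webapp1", "TXAvail:Unavailable")},
    "other": {("tbs2", "StorageCapacity:Critical"), ("host9", "PingAvailability:Unavailable")},
}
```

`chain(triggered)` links only "database" and "storage": "other" has the same ETI and value, but on a different CI (`tbs2`), so it is not chained.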

Page 29

New in OMi 9.0!

Page 30

New Features in OMi 9.0

– Manually relate selected events in the browser

• The CAUSE event is marked from a group of selected events

• The browser shows “Cause” and “Symptom” icons on the events

• Does not create a future relationship, i.e. no correlation rule is created

• Event lifecycle state changes and user assignment on the CAUSE event are also applied to the SYMPTOM events

– Create a new correlation rule directly from selected events (Correlation Generator)

– Enhance an existing correlation rule directly from selected events (Correlation Generator)

Page 31

Manually relating events (video)

Page 32

Using the Correlation Generator

Creating a Rule From Events

If two events often occur at the same time, and one is always the cause...

...then a new correlation rule can be created by selecting the two events and choosing Create Correlation Rule from the context menu.

Page 33

Correlation Generator Wizard

– User selects the CAUSE event and SYMPTOM event(s)

– Generator automatically retrieves the relationships between the cause CI and symptom CIs from the model

• The shortest route is selected automatically

– Cause and symptom ETIs from the selected events are added automatically

– Generates a valid correlation rule

Note: No UCMDB view required!
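What the wizard produces can be sketched as a function from selected events to a rule. This is hypothetical: `generate_rule` and the dictionary layout are illustrative, and the CI-type paths (which the generator retrieves from the model, choosing the shortest route) are passed in rather than computed. Events without an ETI are rejected, since only such events can be correlated.

```python
from typing import Dict, List


def generate_rule(name: str, cause: Dict[str, str],
                  symptoms: List[Dict[str, str]],
                  paths: List[List[str]]) -> Dict[str, object]:
    """Build a rule from a selected CAUSE event, its SYMPTOM events, and the
    CI-type path from the cause CI to each symptom CI."""
    for event in (cause, *symptoms):
        if not event.get("eti"):
            raise ValueError("only events with an ETI can be correlated")
    edges = set()
    for path in paths:
        edges.update(zip(path, path[1:]))  # consecutive CI types are related
    return {
        "name": name,
        "causes": [(cause["ci_type"], cause["eti"])],
        "symptoms": [(s["ci_type"], s["eti"]) for s in symptoms],
        "topology": sorted(edges),
    }
```

For a Router cause with `LinkStatus:Down`, a Node symptom with `PingAvailability:Unavailable`, and the path Router, Interface, Node, the generated rule carries both ETIs and the two relationships along that path.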

Page 34

Correlation Rules in OMi

– OMi delivers artifacts such as correlation rules, ETIs, HIs, KPIs, and tool definitions using Content Packs

– Content Packs are included with the OMi license

– OMi 9.0 Content Packs:

• Infrastructure (includes system, cluster and virtualization artifacts)

• Oracle

• MS SQL Server

• J2EE App Server (WebLogic, WebSphere)

• Exchange

• Active Directory

– 140+ correlation rules are provided

Page 35

TBEC Delivers Operational Efficiency

– Operators can quickly identify cause events in the browser

– Operators work on cause events instead of wasting time on multiple symptom events

– Fewer invalid escalations to cross-domain tier 2/3 specialists

– Escalations that DO occur are sent to the right specialist

– Correlation rules continue to work as the infrastructure changes, since they are based on discovered topology

– Rules can be created directly from events in the browser

– Automatic “chaining” of correlation rules to cover cross-domain scenarios

– Lower cost of event/fault management

Page 36

Thank you for Attending!

Page 37

Q&A

Page 38

To learn more on this topic, and to connect with your peers after the conference, visit the HP Software Solutions Community:

www.hp.com/go/swcommunity

Page 39

Page 40

Backup Slides

Page 41

BSM Service Health artifacts

Acronym   Full Name                   Definition

ETI       Event Type Indicator
• Indicates the concise status of an infrastructure element
• Event custom attribute with the name “EventTypeIndicator”
• Value of the CA = <ETIName>:<ETIValue>
• Can instantiate an HI of the same name

HI        Health Indicator
• Unique object in BSM (not an event attribute)
• Represents the indicated health of a specific CI
• Can be set via an ETI from an event or via metrics from BSM data collectors

KPI       Key Performance Indicator
• Represents the calculated health of a specific CI
• Aggregate health is calculated based on assigned HIs and business rules
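The relationship between the three artifacts can be sketched as follows. Everything here is illustrative: the value-to-status mapping is invented, and taking the worst HI status is just one example of a business rule for the KPI; `set_hi_from_eti` and `kpi_status` are hypothetical names.

```python
from typing import Dict

# Hypothetical mapping from HI values to statuses (not taken from the slides).
STATUS_OF_VALUE = {"Normal": "OK", "High": "Warning", "Unavailable": "Critical"}
# Status ranking, from best (0) to worst (4).
STATUS_RANK = {"OK": 0, "Warning": 1, "Minor": 2, "Major": 3, "Critical": 4}


def set_hi_from_eti(his: Dict[str, str], eti: str) -> None:
    """An ETI '<name>:<value>' sets the HI of the same name on the CI."""
    name, _, value = eti.partition(":")
    his[name] = value


def kpi_status(his: Dict[str, str]) -> str:
    """Illustrative business rule: the KPI takes the worst status of its HIs.
    Values missing from STATUS_OF_VALUE are treated as OK (an assumption)."""
    statuses = [STATUS_OF_VALUE.get(value, "OK") for value in his.values()]
    return max(statuses, key=STATUS_RANK.__getitem__, default="OK")
```

Two events carrying `PingAvailability:Unavailable` and `MemoryUsageLevel:High` set two HIs on the CI, and under this assumed rule the KPI reports Critical, the worse of the two.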