HP Operations Manager i topology-based event correlation: concepts and operation
1 ©2010 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice
HP Operations Manager i Topology Based Event Correlation – Concepts and Operation
Dave Trout, Hewlett-Packard Software and Solutions
2
Agenda
– TBEC in action (What does it do?)
– Basic concepts of TBEC
– Using the Correlation Manager
– Automatic cross-domain correlation
– New TBEC features in OMi 9.0
– Summary
TBEC = Topology Based Event Correlation
3
TBEC in Action (video)
4
TBEC Basic Concepts
5
BSM Service Health Hierarchy
[Figure: hierarchy layers as listed on the slide: KPIs; Health Indicators; OMi Events with Event Type Indicators; OMi Events. Event sources: OM, SiS, BPM, RUM, NNM, 3rd Party Mgrs.]
6
ETI: Event Type Indicator
– An ETI is an attribute of an event
– Indicates the concise status of a managed infrastructure element
– Set based on a hint* in the event or via server-based mapping filters
– ETIs are defined per CI Type
• Only pre-defined ETIs are evaluated when events arrive
• Valid for all derived CI Types
CI Type     Example ETIs
Database    OracleReadWriteError:Occurred
            ArchiveMode:Enabled
            MemorySortRate:Normal
            OracleSessionCount:High
            ReplicationStatus:Broken
            SQLQueryPerformance:Normal
Node        UnexpectedReboot:Occurred
            BackupJob:Failed
            PingAvailability:Unavailable
            LogicalDiskFreeSpace:NearCapacity
            MemoryUsageLevel:High
Router      LinkStatus:Up
            NodeState:Down
* Hint: Custom Attribute “EventTypeIndicator” = “<ETI name>:<ETI value>”
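The hint format above can be illustrated with a short sketch. It assumes the event's custom attributes are available as a plain dictionary; the function names and the DEFINED_ETIS lookup are hypothetical and are not part of any OMi API. Inheritance to derived CI Types is not modeled.

```python
from typing import NamedTuple, Optional


class ETI(NamedTuple):
    name: str
    value: str


def parse_eti_hint(custom_attributes: dict) -> Optional[ETI]:
    """Return the ETI carried in the "EventTypeIndicator" custom attribute, if any."""
    raw = custom_attributes.get("EventTypeIndicator")
    if not raw or ":" not in raw:
        return None
    name, value = raw.split(":", 1)
    name, value = name.strip(), value.strip()
    if not name or not value:
        return None
    return ETI(name, value)


# Hypothetical per-CI-Type ETI definitions, using names from the examples above.
DEFINED_ETIS = {
    "Node": {"PingAvailability", "UnexpectedReboot", "BackupJob"},
    "Router": {"LinkStatus", "NodeState"},
}


def is_evaluated(ci_type: str, eti: ETI) -> bool:
    """Only ETIs pre-defined for the CI Type are evaluated."""
    return eti.name in DEFINED_ETIS.get(ci_type, set())
```

An event whose hint names an ETI that is not defined for its CI Type is simply not evaluated in this sketch.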
7
Correlation Requires ETIs
– Only events with Event Type Indicators can be correlated
– Event Type Indicators are used to define correlation rules
[Figure: the BSM Service Health hierarchy (KPIs; Health Indicators; OMi Events with Event Type Indicators; OMi Events) with the event sources OM, SiS, BPM, RUM, NNM, 3rd Party Mgrs.]
8
Cause and symptom events
– Something goes wrong in your environment
– Monitoring reports multiple problems via events
– Usually just one of the events describes the cause of the problem
– Others are just symptoms
– Fix the cause and the symptoms go away too
In a nutshell, TBEC identifies CAUSE and SYMPTOM events
9
Topology: the “T” in TBEC
– Cause and symptoms are one part of a rule
– The other part is the CI type topology
Events are correlated if the topology and the Event Type Indicators match
10
Correlation requires relationship
[Figure: Event1 (Ping:Unavailable), Event2 (LinkStatus:Down) and Event3 (Ping:Unavailable) placed on a topology, with the labels Symptom and Cause.]
– Two events, cause and symptom not within the same topology => no correlation
– Cause and symptom set by event AND cause and symptom within the same topology => events are correlated
11
Time Window for Correlation
– Even if cause and symptom and the connecting topology match, events might not be correlated
– Events have to arrive within a certain time window
– A time window starts when the first cause or symptom event arrives that cannot be correlated with any other event
– Default time window is 16 minutes
– Each correlation rule can have its own time window which overrides the global setting
[Figure: timeline (Time axis) showing two events, the Correlation Window, and the label “No correlation”.]
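The window check described on this slide can be sketched as follows. This is a minimal illustration, not the engine's implementation: the function names are hypothetical, times are plain seconds, and treating the window boundary as inclusive is an assumption.

```python
from typing import Optional

DEFAULT_WINDOW_SECONDS = 16 * 60  # the 16-minute default quoted on the slide


def effective_window(rule_window_seconds: Optional[int] = None) -> int:
    """A rule-specific window, when set, overrides the global default."""
    if rule_window_seconds is None:
        return DEFAULT_WINDOW_SECONDS
    return rule_window_seconds


def within_window(window_start: float, arrival: float,
                  rule_window_seconds: Optional[int] = None) -> bool:
    """True if an event arriving at `arrival` falls inside the window
    opened at `window_start` (boundary treated as inclusive here)."""
    elapsed = arrival - window_start
    return 0 <= elapsed <= effective_window(rule_window_seconds)
```

With the default setting, an event arriving 900 seconds after the window opened is still a correlation candidate, while one arriving after 1000 seconds is not unless the rule defines a longer window of its own.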
12
A Simple Correlation Rule
What the rule defines:
– IF the system receives an event that sets LinkStatus = Down
– AND IF the system receives an event that sets Ping Availability = Unavailable
– AND IF the Router and Computer are somehow connected (topology)
– AND IF that happens at roughly the same time
– THEN the system will mark the LinkStatus Down event as CAUSE and the Ping Availability Unavailable event as SYMPTOM
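The rule above can be written out as a small sketch. Everything here is illustrative: the Event class, the CONNECTED set and the CI names router1 and host1 are hypothetical, the topology check is reduced to a direct connection rather than a relationship path, and the 960-second window is the default quoted elsewhere in this deck.

```python
from dataclasses import dataclass


@dataclass
class Event:
    ci: str          # CI the event resolved to, e.g. "router1"
    eti: str         # ETI name
    value: str       # ETI value
    time: float      # arrival time in seconds
    role: str = ""   # becomes "CAUSE" or "SYMPTOM" once correlated


# Hypothetical topology: pairs of CIs that are connected.
CONNECTED = {frozenset({"router1", "host1"})}


def correlate(a: Event, b: Event, window: float = 960) -> bool:
    """Apply the LinkStatus / PingAvailability rule to two events, in either order."""
    for cause, symptom in ((a, b), (b, a)):
        if ((cause.eti, cause.value) == ("LinkStatus", "Down")
                and (symptom.eti, symptom.value) == ("PingAvailability", "Unavailable")
                and frozenset({cause.ci, symptom.ci}) in CONNECTED
                and abs(cause.time - symptom.time) <= window):
            cause.role, symptom.role = "CAUSE", "SYMPTOM"
            return True
    return False
```

Because both orderings are tried, the symptom may arrive before the cause and the pair is still marked correctly.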
13
TBEC Correlation Rules – Semantics
– A correlation rule shows possible cause-symptom relationships:
• If the two events happen within a defined window of time, then correlate. Otherwise do nothing.
– A correlation rule does NOT say
• If I have that cause, then I will see that symptom (impact)
• If I see that symptom, then I must have this cause for it
– One cause can have multiple symptoms (and not all have to appear at the same time)
14
Usage when defining rules
TBEC Cause/Symptom
– A correlation rule must include at least one CAUSE and one or more SYMPTOMs
– Multiple CAUSE specifications are allowed if they reference the exact same CI Type
– A SYMPTOM in one rule can be configured as a CAUSE in another rule (and vice versa)
15
Correlation Engine behavior
TBEC Cause/Symptom
– A correlation rule triggers when a CAUSE event and any combination of specified SYMPTOM events occur within the correlation time window
– CAUSE and SYMPTOM events can occur in any sequence within the time window
– A rule which would otherwise mark an event as a SYMPTOM will be ignored for the event if it is already marked as a SYMPTOM to a different CAUSE event
– A duplicate CAUSE event which arrives during a correlation window is correlated and handled like a SYMPTOM event
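Two of the engine behaviors listed above can be sketched in a few lines: an event already marked as a SYMPTOM keeps its original CAUSE, and a duplicate CAUSE is attached like a SYMPTOM. The class and function names are hypothetical, and treating "duplicate" as "same CI, ETI and ETI value" is an assumption made for this illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Event:
    id: str
    ci: str
    eti: str
    value: str
    cause_id: Optional[str] = None  # set once the event is marked as a SYMPTOM


def mark_symptom(symptom: Event, cause: Event) -> bool:
    """Mark `symptom` as a symptom of `cause`, unless it already has a different cause."""
    if symptom.cause_id is not None and symptom.cause_id != cause.id:
        return False  # the rule is ignored for this event
    symptom.cause_id = cause.id
    return True


def handle_cause_arrival(new: Event, open_causes: List[Event]) -> Event:
    """Return the CAUSE event that `new` belongs to.

    A duplicate of an open CAUSE (assumed: same CI, ETI and value) is
    attached to the existing CAUSE like a SYMPTOM.
    """
    for existing in open_causes:
        if (existing.ci, existing.eti, existing.value) == (new.ci, new.eti, new.value):
            mark_symptom(new, existing)
            return existing
    open_causes.append(new)
    return new
```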
16
Browser-related behavior
TBEC Cause/Symptom
– If the lifecycle state of a CAUSE event is changed:
• CAUSE and related SYMPTOM events are marked with the new state (e.g. “Work On”)
• CAUSE and related SYMPTOM events are assigned to the user
– If the CAUSE event is closed:
• All SYMPTOM events are also closed
– Additional SYMPTOM events which arrive after a CAUSE event is closed will also be closed until the current correlation window expires
• Can be disabled in Platform settings
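The propagation from a CAUSE event to its SYMPTOM events can be sketched as below. The Event class, the function names and the state names "Open" and "Closed" are hypothetical; only "Work On" appears on the slide. The `correlate_closed_causes` flag stands in for the platform setting that disables the late-symptom behavior, which is an assumption about how that switch maps to this logic.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Event:
    title: str
    state: str = "Open"
    assigned_to: Optional[str] = None
    symptoms: List["Event"] = field(default_factory=list)


def set_state(cause: Event, state: str, user: Optional[str] = None) -> None:
    """Apply a lifecycle change to the CAUSE event and mirror it on its SYMPTOM events."""
    for event in [cause, *cause.symptoms]:
        event.state = state
        if user is not None:
            event.assigned_to = user


def add_late_symptom(cause: Event, symptom: Event, window_open: bool,
                     correlate_closed_causes: bool = True) -> None:
    """Attach a symptom that arrives later; close it if the cause is already closed."""
    if cause.state == "Closed" and not (window_open and correlate_closed_causes):
        return  # window expired or behavior disabled: leave the event uncorrelated
    cause.symptoms.append(symptom)
    if cause.state == "Closed":
        symptom.state = "Closed"
```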
17
Correlation Window – Auto Extend Mode
[Figure: two timelines of Symptom and Cause events comparing Auto Extend Mode = True (default) with Auto Extend Mode = False; labels shown: Correlation Window, Extended Correlation Windows.]
18
TBEC Settings
Admin > Platform > Infrastructure settings > Applications > Operations Management:
Setting                              Default
Auto-Extend Time Window Mode         true
Correlate Closed Cause Events        true
Correlation Time Window (seconds)    960
Max Waiting Queue Size (events)      5000
19
OMi event pipeline
[Figure: OMi event pipeline. Stages shown: Event to CI Mapping, Event to ETI Mapping, Event Correlation, and the OMi Browser. If configured, an HI value is attached to the CI; the BSM Platform performs KPI calculation. Admin components shown: Content Manager, Indicator Manager, Correlation Manager (view, create and modify correlation rules).]
20
Using the Correlation Manager
21
Correlation Manager
– Define, deploy, and manage correlation rules
– Visualize the topology of correlation rules
– View CAUSE and SYMPTOM events in rules
– View assigned and available Event Type Indicators and their values
– Browse the hierarchy of cross-domain correlation rules
– Access to Correlation Manager is controlled by user role settings
22
Correlation Manager UI
[Screenshot callouts: list of rules currently defined; CI type topology of selected rule; causes and symptoms of selected rule; available ETIs of selected CI type.]
23
What you need to know
Creating TBEC rules
– Working knowledge of CI Types and the BSM type model
– Working knowledge of UCMDB Views
– Understanding of Event Type Indicators
– Detailed knowledge of the events which you want to correlate
• event domain (networking, database, storage, etc.)
• ETIs specified in the events
• event relationships (Cause, Symptom)
24
Basic workflow sequence
Creating TBEC rules
1. Create new rule using the * button
2. Define rule properties (name, description, time window, etc.)
3. Select a topology (UCMDB) view which includes the CI Types and relationships you want to use in the rule
4. Define CAUSE event(s):
• Select a CI Type in the View
• Select an ETI and ETI value from the list of available ETIs and “Add as a Cause”
5. Define SYMPTOM event(s):
• Select a CI Type in the View
• Select an ETI and ETI value from the list of available ETIs and “Add as a Symptom”
6. Correlation Manager highlights the shortest relationship path
• If a different path is desired, select the appropriate relationship connectors
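Step 6 refers to the shortest relationship path between the cause and symptom CI Types. How Correlation Manager computes it is not stated in the slides; a breadth-first search is one common way to find such a path, sketched here over a hypothetical view. The VIEW contents (Interface, Switch, Rack) and the function name are illustrative assumptions, and relationships are treated as undirected.

```python
from collections import deque
from typing import Dict, List, Optional

# Hypothetical CI Type relationships in a view (undirected for simplicity).
VIEW: Dict[str, List[str]] = {
    "Router": ["Interface", "Switch"],
    "Interface": ["Router", "Computer"],
    "Switch": ["Router", "Rack"],
    "Rack": ["Switch", "Computer"],
    "Computer": ["Interface", "Rack"],
}


def shortest_path(view: Dict[str, List[str]], start: str,
                  goal: str) -> Optional[List[str]]:
    """Breadth-first search; returns the CI Types on a shortest path, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbour in view.get(path[-1], []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None
```

In this view the search prefers Router, Interface, Computer over the longer route through Switch and Rack, which corresponds to the case where a user would have to select the other relationship connectors manually.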
25
Completing rule definition
– Rule is valid
– Save when finished
– Relations between cause and symptom CI type are automatically added
– After saving, visualized rule topology is simplified
26
Automatic Cross-domain Correlation
27
Chaining of Correlation Rules
[Figure: CI Types annotated with ETI:value, grouped by domain. App Server Domain: WebApp (TXAvail:Unavailable). Database Domain: Database Instance, Tablespace. Storage Domain: Logical Volume, Storage Server, Physical Disk. Other ETI labels shown: StorageCapacity:Critical, Utilization:Full, Quota:Exceeded.]
28
Relations Between Correlation Rules
– Triggered rules are connected (chained together) at runtime when they include a Cause or Symptom event that
• resolves to the exact same CI
• and has the exact same ETI and ETI value
– Chaining is automatic; no configuration is required
– Rules can trigger in any sequence
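The chaining condition above (same CI, same ETI, same ETI value) can be sketched by representing each triggered rule as the set of event keys it matched and merging sets that share a key. This is an illustration of the condition only, not of the engine's data structures; the `chain` function and the `EventKey` alias are hypothetical.

```python
from typing import List, Set, Tuple

EventKey = Tuple[str, str, str]  # (CI, ETI name, ETI value)


def chain(triggered_rules: List[Set[EventKey]]) -> List[Set[EventKey]]:
    """Merge triggered rules that share an identical (CI, ETI, value) event."""
    chains: List[Set[EventKey]] = []
    for keys in triggered_rules:
        merged = set(keys)
        remaining = []
        for existing in chains:
            if existing & merged:
                merged |= existing  # shared event: join the two chains
            else:
                remaining.append(existing)
        chains = remaining + [merged]
    return chains
```

Because merging depends only on shared keys, the result is the same whichever rule triggers first, which matches the statement that rules can trigger in any sequence.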
29
New in OMi 9.0!
30
New Features in OMi 9.0
– Manually relate selected events in browser
• CAUSE event is marked from a group of selected events
• Browser shows “Cause” and “Symptom” icons on the events
• Does not create a future relationship, i.e. no correlation rule is created
• Event lifecycle state changes and user assignment on CAUSE event are also marked on SYMPTOM events
– Create new correlation rule directly from selected events (Correlation Generator)
– Enhance existing correlation rule directly from selected events (Correlation Generator)
31
Manually relating events (Video)
32
Using the Correlation Generator
Creating A Rule From Events
If two events often occur at the same time, and if one is always the cause...
Then a new correlation rule can be created by selecting the two events and selecting Create Correlation Rule from the context menu
33
Correlation Generator Wizard
– User selects CAUSE event and SYMPTOM event(s)
– Generator retrieves relationships between cause CI and symptom CIs from model automatically
• Shortest route automatically selected
– Cause and symptom ETIs from selected events automatically added
– Generates a valid correlation rule
Note: No UCMDB view required!
34
Correlation Rules in OMi
– OMi delivers artifacts like correlation rules, ETIs, HIs, KPIs, tool definitions, etc. using Content Packs
– Content Packs are included with the OMi license
– OMi 9.0 Content Packs:
• Infrastructure (includes system, cluster and virtualization artifacts)
• Oracle
• MS SQL Server
• J2EE App Server (WebLogic, WebSphere)
• Exchange
• Active Directory
– 140+ correlation rules are provided
35
TBEC Delivers Operational Efficiency
– Operators can quickly identify cause events in the browser
– Operators work on cause events instead of wasting time on multiple symptom events
– Fewer invalid escalations to cross-domain tier 2/3 specialists
– Escalations which DO occur are sent to the right specialist
– Correlation rules continue to work as the infrastructure changes since they are based on discovered topology
– Rules can be created directly from events in the browser
– Automatic “chaining” of correlation rules to cover cross-domain scenarios
– Lower cost of event/fault management
36
Thank you for Attending!
37
Q&A
38
To learn more on this topic, and to connect with your peers after the conference, visit the HP Software Solutions Community: www.hp.com/go/swcommunity
39
40
Backup Slides
41
BSM Service Health artifacts
Acronym   Full Name                   Definition
ETI       Event Type Indicator        • indicates concise status of infrastructure element
                                      • event Custom Attribute with the name “EventTypeIndicator”
                                      • value of CA = <ETIName>:<ETIValue>
                                      • can instantiate HI of the same name
HI        Health Indicator            • unique object in BSM (not an event attribute)
                                      • represents indicated health of a specific CI
                                      • can be set via an ETI from an event or via metrics from BSM data collectors
KPI       Key Performance Indicator   • represents calculated health of a specific CI
                                      • aggregate health is calculated based on assigned HIs and business rules