Event Console Integration - checkmk.comTemplates in PHP (ノಠ益ಠ) ノ彡 No semantics, no...

Post on 05-Jul-2020

3 views 0 download

Transcript of Event Console Integration - checkmk.comTemplates in PHP (ノಠ益ಠ) ノ彡 No semantics, no...

03.05.2018, Sven PanneCheck_MK Conference #4

Event Console Integration

Check_MK Conference #42

Two Kinds of Monitoring

Pushing/pulling states

“Classic” Check_MK

Pushing of messages

Independent of state

Event Console

Status-based Event-based

Both kinds are useful!

Check_MK Conference #43

Design Considerations

High frequency of incoming

messages

Human-manageable amount of

actual events

Not a high-performance

syslog archive

Check_MK Conference #44

TrapParser

syslogParser

RuleEngine

Actions

SNMP

syslogspool

EMail

Notifications

Shell Script

local

EC

Architectural Overview IBasic Pipeline

Check_MK Conference #45

TrapParser

syslogParser

RuleEngine

Actions

SNMP

syslogspool

EMail

History Events

ViewsEventsHistoryDetails

Notifications

Shell Script

Monitoring CoreLivestatus

local

GUI

EC

Architectural Overview IIPersistence & GUI

Check_MK Conference #46

TrapParser

syslogParser

RuleEngine

Actions

SNMP

syslogspool

EMail

History Events

Event Service

ViewsEventsHistoryDetails

Check

Notifications

Shell Script

Monitoring CoreLivestatus

local

GUI

EC

Architectural Overview IIIThere and Back Again

Logwatch

Check_MK Conference #47

TrapParser

syslogParser

RuleEngine

Actions

SNMP

syslogspool

EMail

History Events

Event Service

ViewsEventsHistoryDetails

Check

Notifications

Shell Script

Monitoring CoreLivestatus

local

GUI

ECHosts,Down-times

Architectural Overview IVHost Information

Logwatch

Check_MK Conference #48

Events are useful,even during host downtimes

Three chunks yet to be completed

Still generated, but marked specially

Automatically archived after downtime

Tactical overview considers them “handled”

Actions can be skipped

Check_MK Conference #49

Further Improvements

Finer configuration of limits

More powerful rewriting

Export of rule packs via MKPs

03.05.2018, Tom BärwinkelCheck_MK Conference #4

Background jobs

Check_MK Conference #412

Examples of long running processes

Host renamingBake AgentsCreate Report

Check_MK Conference #413

Problem – Apache process timeouts

Check_MK Conference #414

The usual workaround

locate files increase timeout restart

Let’s just increase the timeout

update

Check_MK Conference #415

Still a problem - What’s the status?

+ +No userfeedback

Blockedprocesses

Apacherestarts

Check_MK Conference #416

Solution - separation of concerns

CommonInterface

GUI

configuration start jobs show progress

Background jobs

do actual work send updates

Check_MK Conference #417

How it looks in practice - Reports

Check_MK Conference #418

How it looks in practice - Reports

Check_MK Conference #419

The Background jobs overview

Check_MK Conference #420

Special rules for critical jobs

Affected jobs: bake agents rename host

Only one job allowed Error page with details Require acknowledgement

Upcoming background jobs

Download of agent output

Service discovery

SLA calculations

03.05.2018, Konstantin BüttnerCheck_MK Conference #4

Time-specific check parameters

Check_MK Conference #424

Use case● Recurring events lead to conditions that would be abnormal during other times

● Eg. backup jobs etc.

Check_MK Conference #425

Predictive Levels?

Check_MK Conference #426

Solution – Timespecific parameters● Define parameters for time periods

● Generic solution – works for all check plugins

Check_MK Conference #427

How does it work?

Check_MK Conference #428

How does it work?

Check_MK Conference #429

03.05.2018, Konstantin BüttnerCheck_MK Conference #4

Customizable graph layouts

Check_MK Conference #432

Our objective in graphing

● Traditional graphing system are for customizing graphs

● But: day-to-day, we’d rather just have graphsTweaking graphs is fun, but time consuming

● Our objective: Make our data useful out of the box

Check_MK Conference #433

A bit of history - PNP4Nagios

PNP graph In the Olden Days

Graphs were defined for each check type Templates in PHP ( ノ ಠ益 ಠ) ノ彡┻━┻ No semantics, no uniformity* No customizability**

*except for checks sharing templates**unless you wrote your own templates

Check_MK Conference #434

A bit of history - Metrics

Metrics graph A new dawn

Most visible: New look Most important: Semantics Checks no longer have graphs… ...but values have semantics

→Automatic graphs→Custom graphs

Check_MK Conference #435

Now: More customizability

● Unified options for dashlets, views, reports(Werk #5569)

Check_MK Conference #436

Check_MK Conference #437

Check_MK Conference #438

Now: More customizability

● Unified options for dashlets, views, reports(Werk #5569)

● Vertical scaling and mirroring customizable(Werk #3920)

Check_MK Conference #439

Check_MK Conference #440

03.05.2018, Konstantin BüttnerCheck_MK Conference #4

Performance improvements

Check_MK Conference #443

Why is a performance focus critical?

● The typical Check_MK setup keeps growing – more services, more sites, more users

● Better scalability makes new things possible

How to make things go faster

Avoid unnecessary work

Optimize

Make things feel faster

Check_MK Conference #445

Avoid unnecessary work - Tabs● A typical browser: 50 open tabs

● You don’t use them all… but they all make regular livestatus queries, though

● New: Only update active tabs/windows (Werk #4753)

● Benefit: No more wasted data transfers

Check_MK Conference #446

Avoid unnecessary work – Emails● Graphs can make emails reasonably large

● Multiply that by the number of recipients ...

● What’s a mail server for? (Werk #4813)

● Benefit: Notification system becomes more scalable

Check_MK Conference #447

Avoid unnecessary work – Sites

● Frequent scenario: multi-site setup

● For many users, only few sites are actually relevant

● Yet, GUI in the past contacted all sites for every user

● No more: Now configureable for each user (Werk #4921)

Check_MK Conference #448

Optimize – BI● Large aggregations now compile faster (Werk #5142)

● Magic - “it’s the algorithm, stupid”

● Further reworking already planned

Check_MK Conference #449

Optimize – liveproxyd● Previously: # The main loop of the daemon goes here

● One python process – GIL limits scalability

● Now what?

Check_MK Conference #450

Optimize – liveproxyd● Now: One process per site, master process to manage● Process view:

● More resilient, more performant through use of multiple CPUs (cf. Werk #4901)

Check_MK Conference #451

Make things feel faster - Graphs● Previously: View with many Graphs? Better wait for all the data

● Now: Load the view with placeholders, update graphs asynchronously

Take-home message

Attention to detail pays off

Continuously improving

If you have lots of sites and lots of users, Werk #4921 may help you