Copyright © 2013 Splunk Inc.

Sean Delaney Client Architect, Splunk #splunkconf

Log Velocity Monitoring

Legal Notices

During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release.

Splunk, Splunk>, Splunk Storm, Listen to Your Data, SPL and The Engine for Machine Data are trademarks and registered trademarks of Splunk Inc. in the United States and other countries. All other brand names, product names, or trademarks belong to their respective owners.

©2013 Splunk Inc. All rights reserved.

About Me

• Splunk Client Architect
  – Splunker for 2+ years
  – Using Splunk for 6+ years
  – Large Splunk Deployments
• Previously
  – Splunk Professional Services
  – 10+ years Production Services for a large Internet Security Company

Agenda

• Log Velocity
• Monitoring and Alerting
• Drill Down Demo

Log Velocity

Traditional Velocity, aka Speed

[Figure: plot of distance (m) against time (s), labeled "Velocity (m/s)"]

Log Velocity

• Logging Data Rate
  – Events per Second (eps)
  – Data Volume per Second (kbps)

Increases or Decreases in Log Velocity

• Environmental changes
  – New service, servers or new data sources added to Splunk
  – Application change (new code deployment, configuration change)
  – Networking change (firewall, routing)
  – Service migration
• Traffic changes
  – More users accessing service(s)
  – Change in application logging level (debug mode)
  – Core component is down or has intermittent issues (database)
  – Logs not being generated or forwarded (changed log file directory, syslog server down)

Higher Level Approach to Service Monitoring

• Look at the forest, not just the trees
  – License Usage
  – Event rate per Index, Sourcetype, Source
  – Network Throughput
  – Monitor event counts for errors and alerts

Is There an Issue?

• Operations team is alerted that Splunk is slow
• Service owners notice a slowdown, then their website becomes unavailable
• Customer Service call volume jumps
• Operations team is now flooded with monitoring alerts, phone, email and chat messages

Is There an Issue?

• Splunk admins notice a major spike in indexing volume

Is There an Issue?

• Further investigation detects a corresponding spike in webserver access and error logs

Is There an Issue?

• Service was DOSed (from an internal source)

• Early detection would have mitigated the issue and reduced customer impact

• Alerts on either indexing volume or webserver event counts would have notified Operations of the change in activity

Log Velocity Use Cases

• Security Use Cases
  – DOS/DDOS
  – Service or Port Knocking
• Webserver Access and Error Logs
• Marketing Campaigns

Log Velocity Use Cases

• Application Error Logs
• Production Code Updates/Rollouts
• Infrastructure Changes
• Network Routing or Spanning Tree changes
• DNS/SMTP Changes

Monitoring Log Velocity

Where to Measure Log Velocity

• Splunk’s metrics.log:
  – Event counts (ev), events per second (eps)
  – Data indexed (kb), index throughput (kbps)
• Metrics data is logged by group:
  – per_index_thruput
  – per_sourcetype_thruput
  – per_source_thruput
  – per_host_thruput


• Example searches:

  – index=_internal source="*/metrics.log" "group=per_index_thruput" | timechart span=10m sum(ev) by series

  – index=_internal source="*/metrics.log" "group=per_index_thruput" | timechart span=10m avg(kbps) by series


• Other sources:
  – Splunk license logs
  – Custom event count searches

    index=myapp error | timechart span=10m count
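
For the license-log source listed above, a minimal sketch of a volume-over-time search might look like the following. It assumes the search runs where license_usage.log is indexed (normally the license master) and that the type=Usage events carry the usual b (bytes) and idx (index) fields. The one-hour span and the conversion to megabytes are arbitrary choices for illustration.

```
index=_internal source="*license_usage.log" type="Usage"
| eval mb = b / 1024 / 1024
| timechart span=1h sum(mb) by idx
```

Splitting by st or h instead of idx should give the same view per sourcetype or per host, provided those fields are populated in your license events.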

Logging Workloads

• Log data workloads are normally cyclic
• Service peaks often correspond to business or trading hours

Logging Workloads

• Weekday trends normally follow the same cycle
• Logging may drop off on weekends/holidays (business services)
• Log volume could be greater in the evenings or on weekends (online gaming)
• Logging can go crazy on Black Friday/Cyber Monday (online shopping)
• Take into account global/regional time zones

Monitoring and Alerting

Alerting Thresholds

• When defining alerting thresholds, consider either setting an upper boundary or basing the threshold on your data workload
• Compare to the same time period yesterday, last week, last month
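
One way to eyeball the same time period across previous days is to overlay them on a single chart. The sketch below assumes a Splunk version that includes the timewrap command (it is not available in every release) and uses the main series and a seven-day window purely as examples.

```
earliest=-7d@d latest=@d index=_internal source="*/metrics.log" group="per_index_thruput" series="main"
| timechart span=10m sum(ev) as ev_count
| timewrap 1d
```

Each day becomes its own series on a shared 24-hour axis, which makes it easier to judge what a normal workload looks like before picking a threshold.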


Alerting Thresholds

• Absolute Thresholds:

index=_internal source="*/metrics.log" group="per_index_thruput" series="main" | timechart span=10m sum(ev) as ev_count | stats max(ev_count) as max_ev | search max_ev>`lv_threshold`

• Macro used to hold the `lv_threshold` value:

[lv_threshold]
definition = 600
iseval = 0


Alerting Thresholds

• Compare to the same time on the previous day, the same day of the week, etc.

earliest=-10m latest=@m index=_internal source="*/metrics.log" group="per_index_thruput" series="main"
| stats sum(ev) as ev_count_1
| append [search earliest=-1450m latest=-1440m index=_internal source="*/metrics.log" group="per_index_thruput" series="main" | stats sum(ev) as ev_count_2 ]
| stats first(ev_count_1) as ev_count_today, first(ev_count_2) as ev_count_yesterday
| eval delta=abs(ev_count_today - ev_count_yesterday)
| eval threshold=ev_count_yesterday*0.1
| search delta>threshold


Summary Indexing

• Summary Indexing your Log Velocity has benefits
  – Faster loads for monitoring dashboards
  – Faster stats for comparative alerting
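
As a rough sketch of how log velocity could be summarized, the scheduled search below rolls metrics.log up into 10-minute buckets and writes the result to a summary index with collect. The index name log_velocity_summary is hypothetical and would need to exist already; the hourly window assumes the search is scheduled once an hour.

```
earliest=-1h@h latest=@h index=_internal source="*/metrics.log" group="per_index_thruput"
| bin _time span=10m
| stats sum(ev) as ev_count, sum(kb) as kb_total by _time, series
| collect index=log_velocity_summary
```

Dashboards and comparative alerts could then read the much smaller summary instead of the raw metrics events, for example:

```
index=log_velocity_summary series="main"
| timechart span=10m sum(ev_count) as ev_count
```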

Error Log Velocity

• Monitor and baseline error counts for an application
• Table the top 50 error types/codes
• When a new code release is deployed, monitor for an increase of errors
• Table the top 50 error types/codes and compare with the results from the previous release
• Deploy patch/hotfix/update, and repeat until stable state has been re-established

sourcetype="apache_error" | rex "^(?:[^\]]*\]){3}\s*(?<phperr>[^\:]+)\:\s*(?<msg>.*)" | stats count by phperr,msg | sort - count | head 50 | fields count,msg
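
The comparison with the previous release could be sketched along these lines. This is an illustration only: it reuses the rex from the search above and assumes, hypothetically, that the release went out at midnight yesterday, so the day before yesterday stands in for the previous release and yesterday for the current one. In practice the boundary would be the actual deployment time.

```
sourcetype="apache_error" earliest=-2d@d latest=@d
| rex "^(?:[^\]]*\]){3}\s*(?<phperr>[^\:]+)\:\s*(?<msg>.*)"
| eval release=if(_time < relative_time(now(), "-1d@d"), "previous", "current")
| chart count over phperr by release
| fillnull value=0 current previous
| eval change = current - previous
| sort - change
| head 50
```

Sorting by the change rather than the raw count surfaces error types that are new or growing since the deployment, even when their absolute volume is still small.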

Drill Down Demo

Summary

Monitoring Log Velocity provides additional insight into your environment

• Detects and alerts on environmental changes and abnormal traffic volumes
• Provides feedback on code deployments
• First level alerting for issues
• Useful for NOC/SOC monitoring
• Starting point for drill down investigations

Questions?

Next Steps

1. Download the .conf2013 Mobile App. If not iPhone, iPad or Android, use the Web App.

2. Take the survey & WIN A PASS FOR .CONF2014… Or one of these bags!

THANK YOU
