Analyzing Your Logs: What are they telling you?
Date posted: 15-Sep-2014
Analyzing Your Logs: What are they telling you?
Gerard Ibarra, PhD
November 2008
◦ Goals
◦ Systems Thinking
◦ Definition of System: This Presentation
◦ Log Analysis
◦ Analysis Summary
Copyright © 2008 Buildwave Technologies, Inc. All rights reserved.
◦ Think systems first
◦ Use statistics to understand what is going on
◦ Get a better picture with charts
◦ Include control charts to monitor the system
“A system is an assemblage or combination of elements or parts forming a complex or unitary whole;…” (Blanchard, B. S., and Fabrycky, W. J., Systems Engineering and Analysis (2nd ed.). Englewood Cliffs, NJ: Prentice-Hall, 1990)
Systems could be any of the following:
◦ A transportation network moving items from one place to another – dynamic
◦ A bridge used to connect places together – static
◦ A set of unmanned aerial vehicles (UAV) located in a strategic region providing intelligence – dynamic
◦ A group of applications and servers acting together to perform a service – dynamic
◦ A motor for a car – static/dynamic
Systems today are more complex than before (Using Systems Engineering to Improve RMS&L Requirements, A Government-Industry Training Workshop, various discussions, Springfield VA: November 12-13, 2008)
Changes in one part of the system affect the system as a whole
◦ More items to move – extra resources to process
◦ Increased traffic – longer times to cross the bridge
◦ Reduction in UAVs – strategies change if the mission remains the same
◦ Server down – increases load; possible sales loss
◦ New and improved parts – increased inventory to maintain both motors
Why think systems for your network?
◦ Because changes made to its parts affect its overall mission and ultimately the business as a whole. For example, the items below affect how the system operates, which in turn affects how the company can conduct its business.
    ◦ Adding or removing applications
    ◦ Modifying software/hardware configuration
    ◦ Adding or removing hardware from operations
    ◦ Improving, adding, or deleting features
In this presentation, a system is the aggregation of applications, servers, and services working in unison to produce a common function for the use, goals, sustainment, and operations of the company
Various ways to analyze logs: Examples
◦ Statistical
    ◦ Central Tendency
    ◦ Variation
    ◦ Skewness
    ◦ Kurtosis
Examples Continued:
◦ Graphical
    ◦ Bar Chart
    ◦ Line Chart
    ◦ Pie Chart
    ◦ Control Charts
Statistical – Central Tendency
◦ Determine how much central tendency there is in the log data
    ◦ Know and understand the average number of events occurring in a system – used for a quick check of how the system is currently operating
    ◦ Compare the average events occurring over time – see if there are any patterns
    ◦ Look at the startup of a process – determine if the number of errors differs as time progresses
Statistical – Central Tendency Example
◦ Use the following analytics to generate report
    ◦ Mean
    ◦ Median
    ◦ Mode
    ◦ Quartiles
First Hour
(Based on 1-min aggregations over 1-hour periods)
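A minimal sketch of the one-minute aggregation that a report like this relies on. It assumes the error events are already available as timestamps; the function name `counts_per_minute` and the sample data are hypothetical and do not describe how any particular tool stores its logs.

```python
from collections import Counter
from datetime import datetime, timedelta
from statistics import mean


def counts_per_minute(timestamps, hour_start):
    """Count events in each of the 60 one-minute buckets of one hour.

    Minutes with no events are kept as zeros so that quiet minutes
    still count toward the hourly mean.
    """
    hour_end = hour_start + timedelta(hours=1)
    buckets = Counter(
        int((ts - hour_start).total_seconds() // 60)
        for ts in timestamps
        if hour_start <= ts < hour_end
    )
    return [buckets.get(minute, 0) for minute in range(60)]


# Hypothetical error timestamps (illustrative data only).
start = datetime(2008, 11, 3, 9, 0)
errors = [
    start + timedelta(seconds=5),
    start + timedelta(seconds=40),
    start + timedelta(minutes=1, seconds=10),
    start + timedelta(minutes=59, seconds=59),
    start + timedelta(hours=1),  # falls in the next hour, so it is ignored
]

per_minute = counts_per_minute(errors, start)
print(per_minute[:3], per_minute[59], round(mean(per_minute), 4))
```

The list returned here is the input for the mean, median, mode, and quartile figures discussed next.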
◦ The mean is 2.3333 – this is the average number of times per minute that the error occurred during the hour, based on one-minute aggregations; anything well above this should raise a flag when comparing the same event in the same hour on other days
Example
      Day1  Day2  Day3  Day4  Day5
Mean  2.33  2.35  2.21  7.45  2.41
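One way to automate the day-to-day comparison in this example, offered as a sketch rather than a prescribed method. The baseline (the median of the daily means) and the 1.5 ratio are assumptions chosen for illustration; `flag_unusual_days` is a hypothetical helper.

```python
from statistics import median

# Daily means for the same hour, taken from the example above.
daily_means = {"Day1": 2.33, "Day2": 2.35, "Day3": 2.21, "Day4": 7.45, "Day5": 2.41}


def flag_unusual_days(means, ratio=1.5):
    """Return the days whose mean exceeds `ratio` times the typical day.

    The median of the daily means serves as the baseline so that a single
    bad day does not inflate it. The 1.5 ratio is an assumed threshold.
    """
    baseline = median(means.values())
    return [day for day, value in means.items() if value > ratio * baseline]


flagged = flag_unusual_days(daily_means)
print(flagged)  # ['Day4']
```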
◦ The median is 2 – this is the midpoint of the per-minute event counts for the hour; it should be somewhat close to the mean unless the data is skewed
Example
Data (1, 1, 1, 1, 1, 1, 2, 20, 21, 22, 23, 24, 25)
Mean = 11
Median = 2
The mean is over five times the median – should raise a flag; notice that the data clusters around the ones and the twenties
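The mean-to-median check in this example can be reproduced with the Python standard library. The helper `mean_median_ratio` and the 2.0 cutoff are assumptions for illustration; the source only says that a mean over five times the median should raise a flag.

```python
from statistics import mean, median

counts = [1, 1, 1, 1, 1, 1, 2, 20, 21, 22, 23, 24, 25]


def mean_median_ratio(values):
    """Return mean / median, or None when the median is zero."""
    mid = median(values)
    if mid == 0:
        return None
    return mean(values) / mid


ratio = mean_median_ratio(counts)

# Assumed rule of thumb: flag when the mean is more than twice the median.
MAX_RATIO = 2.0
raise_flag = ratio is not None and ratio > MAX_RATIO
print(mean(counts), median(counts), ratio, raise_flag)
```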
◦ The mode is 1 – this is the most frequently occurring number of events based on one-minute aggregations over the hour; it shows where most of the data comes from and should make some sense with respect to the mean, the median, or both
Example
Data (1, 1, 1, 1, 1, 1, 5, 17, 19, 21, 23, 25, 27)
Mode = 1
Mean = 11
Median = 5
There is a wide variation between the three indices – should raise a flag
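A short sketch of how the mode in this example can be found, together with the share of buckets it accounts for. Reporting that share is an addition made here to show "where most of the data comes from"; it is not a figure from the original report.

```python
from collections import Counter

counts = [1, 1, 1, 1, 1, 1, 5, 17, 19, 21, 23, 25, 27]

# most_common(1) returns the single most frequent value and its tally.
mode_value, occurrences = Counter(counts).most_common(1)[0]
share = occurrences / len(counts)

print(f"mode={mode_value}, seen in {occurrences} of {len(counts)} buckets ({share:.0%})")
```

If several values tie for most frequent, `most_common(1)` returns only one of them; `statistics.multimode` lists all of them.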
◦ The lower and upper quartiles are 1 and 3.5 – these are the medians of the lower half and the upper half of the data, based on the Moore and McCabe or “M-and-M method” (there are various ways to calculate the quartiles)
Example
Data (0, 0, 0, 0, 0, 0, 5, 17, 19, 21, 23, 25, 27)
LQ = 0; UQ = 22
Mean = 11 (rounded from 10.54)
The mean is far from the LQ in terms of percentages – should raise a flag; could show that at the startup of the period the number of errors was nil, and as time increased, so did the errors
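The quartile calculation can be sketched as follows, assuming the usual reading of the Moore and McCabe method: split the sorted data at the median, leave the median out of both halves when the count is odd, and take the median of each half. The function name `mm_quartiles` is hypothetical.

```python
from statistics import median


def mm_quartiles(values):
    """Lower and upper quartiles by the Moore and McCabe method."""
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two values")
    half = len(data) // 2      # size of each half; drops the middle value if odd
    lower = data[:half]
    upper = data[-half:]
    return median(lower), median(upper)


counts = [0, 0, 0, 0, 0, 0, 5, 17, 19, 21, 23, 25, 27]
lq, uq = mm_quartiles(counts)
print(lq, uq)
```

Other quartile conventions (for example the ones offered by `statistics.quantiles`) can give different values for the same data, which is why the method is worth stating in a report.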
Statistical – Variation
◦ Determine how much the log data is varying from the mean
    ◦ The closer the data is to the mean, the less the system varies
    ◦ The less variation, typically the smoother the system operates
Statistical – Variation Example
◦ Use the following analytics to generate report
    ◦ Mean
    ◦ Variation
First Hour
(Based on 1-min aggregations over 1-hour periods)
◦ The mean is 2.3333 and the standard deviation is 1.91195 – the standard deviation is the amount that the data varies from the mean; it is the amount of spread from the mean expressed in the original units
Example
Mean = 45
StdDev = 41
The standard deviation is almost the same amount as the mean – this should raise a flag (Note that the company could define this type of behavior as normal)
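Comparing the standard deviation to the mean amounts to computing their ratio, often called the coefficient of variation. The sketch below assumes the population form of the standard deviation (the source does not say which form was used) and an illustrative cutoff of 0.5; both are assumptions to adjust to what a company considers normal.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    avg = mean(values)
    if avg == 0:
        raise ValueError("mean is zero; the ratio is undefined")
    return pstdev(values) / avg


# Using the summary figures from the example: StdDev = 41, Mean = 45.
example_cv = 41 / 45

# Assumed threshold; a company may define a different level as normal.
MAX_CV = 0.5
needs_review = example_cv > MAX_CV
print(round(example_cv, 2), needs_review)
```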
Statistical – Skewness and Kurtosis
◦ Try to find out the type of distribution the system generates
    ◦ Learn if the data is normal – good for predictions
    ◦ See how the system operates – determine if there are modes during certain periods
Statistical – Skewness and Kurtosis Example
◦ Use the following analytics to generate report
    ◦ Statistics
(Based on 1-hour aggregations over the range of the data)
◦ The Skewness is -2.34592 – this is a measure of the symmetry of the distribution (negative means that it skews to the left and positive to the right)
◦ The Kurtosis is 8.49086 – this is the measure of how peaked the distribution is (the larger the number, the more “peaked”)
[Figure: a normal curve compared with a distribution that is skewed left and peaked high; the mean (μ) and the region with a significant # of events are marked]
Example of possible distribution: most of the events take place at the start of the process and peak in a short interval
◦ A Skewness of 0.0 and a Kurtosis of 3.0 are the values of an ideal normal distribution – great for predicting possible outcomes
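A minimal sketch of moment-based skewness and kurtosis, using the convention in which a normal distribution has a kurtosis of 3.0. Reporting tools may apply sample-size corrections, so their output can differ slightly from this version; the function name is hypothetical.

```python
from statistics import mean


def skewness_and_kurtosis(values):
    """Moment-based skewness and (non-excess) kurtosis.

    Skewness is m3 / m2**1.5 and kurtosis is m4 / m2**2, where mk is the
    k-th central moment. No small-sample correction is applied.
    """
    n = len(values)
    avg = mean(values)
    m2 = sum((x - avg) ** 2 for x in values) / n
    m3 = sum((x - avg) ** 3 for x in values) / n
    m4 = sum((x - avg) ** 4 for x in values) / n
    if m2 == 0:
        raise ValueError("all values are equal; shape is undefined")
    return m3 / m2 ** 1.5, m4 / m2 ** 2


# Hypothetical hourly counts with a long tail toward low values.
left_tailed = [1, 10, 10, 10, 10]
skew, kurt = skewness_and_kurtosis(left_tailed)
print(skew < 0)  # True: the tail is on the left
```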
[Figure: normal distribution curve with the mean (μ) and standard deviation (σ) marked, shown on X and Z scales]
Graphical – Bar Charts
◦ View the errors based on different periods
◦ Understand the behavior of the systems better
Most Errors on Day 2
Least number of errors at 6:00 am and 5:00 pm
Two instances of almost zero errors on day 5
Graphical – Line Charts
◦ Get a clearer perspective on the error rates
◦ View same data, but from a different perspective
Most Errors on Day 2
Least number of errors at 6:00 am and 5:00 pm
Two instances of almost zero errors on day 5
Graphical – Line Charts
◦ Use it to forecast
Follows Same Trend Based on Periods (Aug01 – Sep01 and Aug02 – Sep02)
Shows an Upward Trend
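An upward trend like this one can be turned into a rough forecast by fitting a straight line to the counts. This is a simplified sketch: it assumes equally spaced periods and ignores the repeating seasonal pattern noted above, and the data and function names are hypothetical.

```python
def linear_trend(values):
    """Least-squares slope and intercept for equally spaced observations."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def forecast(values, steps_ahead=1):
    """Extend the fitted line `steps_ahead` periods past the last observation."""
    slope, intercept = linear_trend(values)
    return intercept + slope * (len(values) - 1 + steps_ahead)


weekly_errors = [10, 12, 14, 16, 18]  # hypothetical counts per period
slope, intercept = linear_trend(weekly_errors)
next_period = forecast(weekly_errors)
print(slope, intercept, next_period)
```

A positive slope corresponds to the upward trend on the chart; a slope near zero suggests the error rate is holding steady.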
Graphical – Pie Charts
◦ Compare to other events
◦ Compare to system as a whole
Errors account for less than 2% of the Events in the System
Significant number of Errors occurring based on the number of Warnings
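The figures behind a pie chart like this are simple shares of the total. The sketch below uses made-up event levels (`INFO`, `WARNING`, `ERROR`) and counts purely for illustration; real logs may use different level names.

```python
from collections import Counter

# Hypothetical event levels pulled from a log (illustrative data only).
levels = ["INFO"] * 940 + ["WARNING"] * 45 + ["ERROR"] * 15

totals = Counter(levels)
event_count = sum(totals.values())

# Percentage of all events contributed by each level.
shares = {level: 100 * count / event_count for level, count in totals.items()}

# Errors relative to warnings; this sample data always contains warnings.
errors_per_warning = totals["ERROR"] / totals["WARNING"]

print(shares["ERROR"], round(errors_per_warning, 2))
```

Here errors are 1.5% of all events, yet there is one error for every three warnings, which mirrors the two observations made on the chart.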
Graphical – Control Charts
◦ Monitor the system or individual subsystems
◦ Anticipate possible problems
Out of Compliance
Trending Upwards: Try to keep it from going above the UCL again
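A minimal sketch of a control-chart check. It assumes the common convention of placing the upper and lower control limits (UCL and LCL) three standard deviations from the mean of a baseline period; the presentation does not say how its limits were set, and the data and function names are hypothetical.

```python
from statistics import mean, pstdev


def control_limits(baseline, sigmas=3.0):
    """Lower limit, center line, and upper limit from a baseline period."""
    center = mean(baseline)
    spread = pstdev(baseline)
    lcl = max(0.0, center - sigmas * spread)  # counts cannot go below zero
    return lcl, center, center + sigmas * spread


def out_of_compliance(observations, ucl):
    """Indexes of the points that rise above the upper control limit."""
    return [i for i, value in enumerate(observations) if value > ucl]


baseline = [2, 3, 2, 1, 3, 2, 2, 3, 1, 2]  # hypothetical per-minute errors
lcl, center, ucl = control_limits(baseline)

recent = [2, 3, 9, 2, 3, 4]
print(out_of_compliance(recent, ucl))  # [2]
```

In this sample the last reading (4) sits just under the UCL of about 4.2, the kind of point worth watching before it crosses the limit.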
Use analytics and charting to help view and understand what the system and its subsystems may be doing
◦ Look for
    ◦ Abnormalities
    ◦ Deviations
    ◦ Compliances
◦ Learn how to
    ◦ Predict
    ◦ Anticipate
    ◦ Forecast
Most of the chart and result screen shots shown in this presentation were created in Violog. http://www.buildwave.com/violog