Advanced Splunk Dashboards in Operations and Support

Advanced Splunk Dashboards in Operations and Support Norbert Hamel Vodafone Group - Emerging Technologies Deployment & Support #splunkconf

Page 1: Advanced Splunk Dashboards in Operations and Support · PDF file · 2017-10-13Advanced Splunk Dashboards in Operations and Support Norbert Hamel Vodafone Group ... • With Vodafone

Advanced Splunk Dashboards in

Operations and Support Norbert Hamel

Vodafone Group - Emerging Technologies Deployment & Support

#splunkconf

Information Manager & Data Analyst

•  Norbert holds a Master’s Degree in Mechanical Engineering from the RWTH Aachen/Germany.

•  He has been working in the IT industry for nearly 20 years now, initially in Marketing and Technical Writing.

•  With Vodafone for the last 4 years, he is involved in Reporting and ultimately built up the Splunk infrastructure in his department, analyzing data from 1000+ virtual servers/machines and 90 databases with 20 Splunk instances.

•  His focus in Splunk is the creation of sophisticated dashboards which present valuable information in an easy-to-use manner. Various audience groups including 24/7 Monitoring, Operations & Support as well as Management are using these dashboards to gain personal insight into complex technical processes.

Agenda

•  Vodafone Group Services Emerging Technologies Deployment & Support – What We Do

•  Why We Are Using Splunk

•  Splunk Infrastructure

•  Splunk Dashboards – A Journey Over 2 Years

•  Splunk Enterprise 6 – Our Next Steps

•  Splunk Enterprise 6 Live Demo – What We Have Tested

•  Summary

Vodafone Group Services Emerging Technologies Deployment & Support – What We Do

Use Case: Carrier Billing

•  Today we are using our smartphones more and more for purchasing different kinds of digital goods.

•  One common use case is to buy apps, music or subscriptions to various services, where the purchased good is used directly on the smartphone.

•  Usually the mobile user expects to find these purchases on his monthly bill from his mobile network carrier. Alternatively, if the user has a prepaid contract, the purchase should be taken from his balance.

•  The team presenting its Splunk use case at the .conf is called Deployment & Support, which is part of Vodafone’s Emerging Technologies department.

•  The team is responsible for operating a complex platform for carrier billing in 20 Vodafone companies and partner markets. For the sake of simplicity we will call this the “charging platform”.

Charging Platform

Sounds Easy?

Charging Platform

There is a Bit More Behind the Scenes ...

•  To complete the purchasing cycle, some questions have to be answered:

–  Who is the mobile customer?

–  Is the customer using a mobile device or a PC?

–  Is this a prepaid or postpaid customer?

–  To which local market is the customer registered?

–  Which business partner is providing the purchased digital goods?

–  Which business partner takes a share of the revenue?

–  Which additional systems need to be informed about the purchase?

•  Ultimately the overall cycle results in several requests and responses between the mobile customer‘s device and multiple server systems.

•  Currently the deployment & support team operates 570 virtual servers/machines and approx. 40 databases for this purpose in production and test environments.

Several Entities Are Involved or Affected Somehow

[Diagram: the Charging Platform and the entities involved or affected: Customer Care, Pricing, Billing, 24x7 Monitoring, Development, Deployment, Management, Business Partner]

Why We Are Using Splunk

Why We Are Using Splunk

•  Before Splunk: Cacti, RRDtools, Tail, Log files ...

•  Have all data from different IT systems available in one single environment and in (nearly) real-time.

•  Correlation of data from different sources.

•  Easy-to-use interface for any non-technical audience in multiple user groups.

Splunk Infrastructure

[Architecture diagram; labels: Data input, Syslog, UFs, Scripts, Queues, DB, Forwarding, Indexing, Searching / dashboards]

Data Sources

•  Apache

•  Jboss application

•  SQL databases

•  HornetQ

•  Tibco EMS

•  Remedy BMC

•  HP Quality Center

•  HP OpenView

•  IBM DataPower

•  Business objects

•  Pentaho

•  Excel

•  Soon: Hadoop

Splunk Dashboards – A Journey Over 2 Years

Splunking Server Applications and Databases

Requests and Response Times

•  This is where we started from: Showing requests and response times from a group of servers which perform the same tasks

Average Response Times

Amount of Requests

Simple KPI Dashboard – Set Color Based on Values

•  Our KPI dashboards show the performance of different services.

•  Since we have SLA targets of nearly 100% it’s sometimes hard to visualize if the target was met or breached.

•  We decided to set another color for the columns if the target was breached for a certain time range.

•  Since we could not directly assign a color to a certain value, we generated 2 result series: one holds the good values and is shown in green, the other holds the breach values and is shown in red.
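
The two-series idea can be sketched outside Splunk in a few lines of Python. This is only an illustration of the principle, not the search used on the dashboard; the target value `SLA_TARGET` and the function name `split_by_target` are assumptions made for the example.

```python
SLA_TARGET = 99.9  # hypothetical SLA target in percent, not taken from the slides


def split_by_target(values, target=SLA_TARGET):
    """Split one series into two parallel series of the same length.

    'met' keeps values that reach the target (to be drawn in green),
    'breached' keeps values below it (to be drawn in red).
    The other series gets None at that position, so nothing is drawn there.
    """
    met = [v if v >= target else None for v in values]
    breached = [v if v < target else None for v in values]
    return met, breached


availability = [99.95, 99.99, 99.2, 100.0]
met, breached = split_by_target(availability)
```

Each time bucket carries a value in exactly one of the two series, so a chart that assigns one fixed color per series ends up coloring by value.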

Set Colors by Value for Line Charts

•  Some processes of post-processing data are supposed to be finished within a time range of 2 hours (120 minutes).

•  This chart shows the actual processing time for a certain part of the process.

•  Processing times below the limit are in green, above in red.

•  In case the processing takes more than 10 hours, the chart will cut the line off and draw it in black.

•  Similar to the KPI dashboard, this is realized using multiple result series which are layered on top of each other.
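
A minimal Python sketch of the layering for this chart, using the 120-minute limit and the 10-hour cut-off from the slide. How values exactly on a boundary are treated is an assumption, and `layer_processing_times` is a hypothetical name.

```python
LIMIT_MIN = 120  # 2-hour limit from the slide
CAP_MIN = 600    # 10-hour cut-off from the slide


def layer_processing_times(minutes):
    """Return three parallel series (green, red, black).

    green: values within the limit
    red:   values above the limit but not above the cap
    black: values above the cap, clipped to the cap
    Boundary handling (<= vs <) is an assumption for this sketch.
    """
    green, red, black = [], [], []
    for m in minutes:
        green.append(m if m <= LIMIT_MIN else None)
        red.append(m if LIMIT_MIN < m <= CAP_MIN else None)
        black.append(CAP_MIN if m > CAP_MIN else None)
    return green, red, black


green, red, black = layer_processing_times([90, 150, 700])
```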

Correlation of Amount of Requests and Response Times

•  The next step: combine amount of requests and response times into a single chart.

•  Amounts are rendered as a column chart, response times as line charts.

•  Each chart uses its own y-axis scaling.

•  Additional gauge chart for real-time view.

Showing Maintenance

•  If you have a monitoring team watching your dashboards, it might be helpful to inform them about maintenance periods.

•  Should unusual charts show up during maintenance, the monitoring team can immediately see that this might be related to planned maintenance, and act accordingly.

•  The maintenance graph can be realized with one single event stating the start and end time, e.g. from one database record.
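
As a rough illustration of how one start/end record can become a chartable series, the following Python sketch expands a single maintenance window into one flag per time bucket. The function name, the bucket size and the sample dates are assumptions for the example only.

```python
from datetime import datetime, timedelta


def maintenance_series(start, end, chart_start, chart_end, step):
    """Return (timestamp, flag) pairs covering the chart range.

    flag is 1 for buckets that begin inside the maintenance window
    [start, end) and 0 otherwise.
    """
    points = []
    t = chart_start
    while t < chart_end:
        points.append((t, 1 if start <= t < end else 0))
        t += step
    return points


# Hypothetical maintenance record: 02:00 to 03:00, charted from 01:00 to 04:00
window = maintenance_series(
    start=datetime(2013, 3, 10, 2, 0),
    end=datetime(2013, 3, 10, 3, 0),
    chart_start=datetime(2013, 3, 10, 1, 0),
    chart_end=datetime(2013, 3, 10, 4, 0),
    step=timedelta(minutes=30),
)
flags = [flag for _, flag in window]
```

The resulting flag series can be overlaid on any other time chart that uses the same buckets.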

Combine Summary Indexing with Drilldown to Live Search

•  Sometimes you might find gaps in charts built on summary indexes. The charts load fast, but they are incomplete.

•  In this case we can drill down to another version of the same chart, which is using the live index instead of summary.

Summary index

Live index
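
The slides rely on the viewer spotting a gap and drilling down manually. Purely as an illustration of what a gap in summary data looks like, this Python sketch finds missing buckets in a list of bucket start times. Times are plain minute offsets and `find_gaps` is a hypothetical helper, not part of Splunk.

```python
def find_gaps(bucket_times, span):
    """Return (gap_start, gap_end) pairs where buckets are missing.

    bucket_times: bucket start times (here: minutes as integers)
    span:         expected distance between consecutive buckets
    """
    ordered = sorted(bucket_times)
    return [
        (earlier + span, later)
        for earlier, later in zip(ordered, ordered[1:])
        if later - earlier > span
    ]


# Summary buckets every 5 minutes, with the buckets for 15 and 20 missing
gaps = find_gaps([0, 5, 10, 25, 30], span=5)
```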

Regression Test with Different Software Versions

•  Run regression tests with different versions of a software or different settings.

•  The dashboard automatically finds all different runs and fills them into drop-down lists, regardless of when the test run was executed.

•  Easily compare selected runs over a certain time frame which might show significant values.
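
Populating the drop-down lists amounts to collecting the distinct run identifiers from all test events. A small Python sketch of that step follows; the field name `run_id` and the sample events are assumptions.

```python
def distinct_runs(events):
    """Return the sorted, de-duplicated run identifiers found in the events.

    Events without a 'run_id' field (a hypothetical field name) are ignored.
    """
    return sorted({event["run_id"] for event in events if "run_id" in event})


events = [
    {"run_id": "v2.1-default", "response_ms": 120},
    {"run_id": "v2.0-default", "response_ms": 140},
    {"run_id": "v2.1-default", "response_ms": 118},
    {"response_ms": 99},  # event that does not belong to a test run
]
choices = distinct_runs(events)
```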

Splunking Business Processes

Compare Results with Previous Weeks

•  This chart shows the amount of purchase transactions in the charging platform for a certain selection of local market, business partner or other characteristic attributes.

•  The amount of transactions is displayed as column chart, for example split by product.

•  The overlay shows comparative values for the same time range from 3 weeks before.
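
One way to think about the overlay is to pair each bucket with the bucket at the same time three weeks earlier. The Python sketch below does this on a plain dictionary of counts. It reads "3 weeks before" as a single offset of three weeks, which is an interpretation, and `overlay_previous` is a hypothetical name.

```python
from datetime import datetime, timedelta


def overlay_previous(counts, weeks=3):
    """Pair each bucket with the count from `weeks` weeks earlier.

    counts: dict mapping bucket start time to transaction count
    Returns a dict mapping bucket start time to (current, previous);
    previous is None when no earlier bucket exists.
    """
    offset = timedelta(weeks=weeks)
    return {ts: (n, counts.get(ts - offset)) for ts, n in counts.items()}


counts = {
    datetime(2013, 3, 10, 12): 100,
    datetime(2013, 3, 31, 12): 120,  # exactly three weeks later
}
overlay = overlay_previous(counts)
```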

Compare Results with Previous Weeks

•  In this situation something is different from previous weeks.

•  We can see a significant increase of transactions, and all of those are coming from the items marked in yellow.

•  The 3rd layer in the background is rendered as an area chart and shows transactions with errors only.

Revenue Loss Calculation

•  As mentioned in the introduction, the charging platform supports business cases to generate revenue from purchasing processes.

•  On the other hand, this means that an outage in the charging platform may impact the revenue and potentially lead to revenue loss.

•  The revenue loss calculator is a tool that supports people involved in technical issues to quickly detect the potential financial impact of an outage and take the appropriate actions.
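
The slides do not state how the calculator works. One plausible approach, shown here only as an assumption, is to compare the revenue expected from a baseline rate with the revenue actually recorded during the outage.

```python
def estimated_revenue_loss(baseline_per_minute, outage_minutes, actual_revenue=0.0):
    """Estimate the revenue lost during an outage.

    Assumed formula (not from the slides):
        loss = baseline_per_minute * outage_minutes - actual_revenue
    The result is never negative.
    """
    expected = baseline_per_minute * outage_minutes
    return max(expected - actual_revenue, 0.0)


# Hypothetical figures: 50 currency units per minute, 30-minute outage,
# 200 units still earned while the platform was degraded
loss = estimated_revenue_loss(50.0, 30, actual_revenue=200.0)
```

The baseline rate could, for example, be taken from the same time range in previous weeks.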

Splunking Ticketing Systems

Ticket History

•  Using Splunk to create reports about ticketing systems enables us to make the related information available to a wide audience.

•  Here each “team” can identify within seconds how many tickets are open and their status. They can watch a list of details as well.

•  The history shows the trend of tickets in “open” status

Scatter Chart with Transparency

•  Scatter charts can help to identify significant patterns in the relation between 2 attributes.

•  In this case we measure the resolution time of software defects within a development team over time.

•  Before May we encountered “bug closing parties” on certain days, resulting in high peak values.

•  After changing the processes, the team is now working continuously on defects, resulting in lower resolution times.

Writing Data to Splunk to Sort-Out Incorrect Tickets

•  We also use Splunk to report SLA-related information.

•  Sometimes there might be tickets which are assigned to our team by mistake.

•  Since we don’t want to have those tickets in our SLA report, we created a dashboard where users can identify the tickets and generate comments on them.

•  The comments are then written to a lookup table, which is used as a filter for showing only relevant tickets in SLA reports.
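
The filtering step can be pictured as reading ticket IDs from the lookup table and excluding them from the report. The Python sketch below uses CSV text to stand in for the lookup table. The column names `ticket_id` and `comment` and the sample tickets are assumptions.

```python
import csv
import io


def load_excluded_ids(lookup_csv):
    """Read the ticket IDs listed in lookup-table text (CSV with a 'ticket_id' column)."""
    return {row["ticket_id"] for row in csv.DictReader(io.StringIO(lookup_csv))}


def sla_relevant(tickets, excluded_ids):
    """Keep only tickets that were not commented as wrongly assigned."""
    return [t for t in tickets if t["ticket_id"] not in excluded_ids]


# Hypothetical lookup content written by the dashboard
lookup_csv = "ticket_id,comment\nINC-2,assigned to wrong team\n"
tickets = [{"ticket_id": "INC-1"}, {"ticket_id": "INC-2"}, {"ticket_id": "INC-3"}]
relevant = sla_relevant(tickets, load_excluded_ids(lookup_csv))
```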

Splunk Enterprise 6 – Our Next Steps

What Will We Get from Splunk?

•  Overlay charts without Flash.

•  Interactive dashboards with forms in simple XML.

•  Data models to provide customised information access.

Splunk Enterprise 6 Live Demo – What We Have Tested

Summary

Sophisticated Dashboards with Splunk

•  Using Splunk you can create sophisticated dashboards which meet the requirements of various audience groups, including monitoring teams, real techies, as well as upper management.

•  Splunk dashboards can cover technical processes as well as business cases and organizational processes.

•  Splunk Enterprise 6 will bring most of the functionality required for sophisticated dashboarding to a level which can be used by a wider range of users.

Thank You

Backup

The Ultimate Log File Format (ULFF)

•  After the first steps with Splunk, we found that several different log file formats may lead to errors or at least confusion, e.g. if timestamps or return codes do not share the same format.

•  The resulting issues can be solved within Splunk, but this requires additional configuration and processing power.

•  Finally we decided to define a new ultimate log file format (ULFF) which is being implemented in all applications feeding the Charging Platform – ULFF is a custom JSON format.

•  {"transaction-id":"1234-5678-9012", "usecase-id":"9876-5432-1098", "timestamp":"2013-03-10T20:24:25,123+01:00", "country-code":"GB", "status":"ok", "error":"", "payload":"<xml attr=\"value\"></xml>"}
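
To illustrate what a consumer of such a record has to do, here is a minimal Python sketch that parses the sample line above. The timestamp uses a comma before the milliseconds, so the sketch replaces it with a dot before parsing. The function name `parse_ulff` is hypothetical, and no field semantics beyond the sample are assumed.

```python
import json
from datetime import datetime


def parse_ulff(line):
    """Parse one ULFF line into a dict with a timezone-aware timestamp.

    The sample timestamp has the form 2013-03-10T20:24:25,123+01:00.
    The comma is replaced by a dot so that strptime's %f can read the
    fractional seconds.
    """
    record = json.loads(line)
    normalized = record["timestamp"].replace(",", ".", 1)
    record["timestamp"] = datetime.strptime(normalized, "%Y-%m-%dT%H:%M:%S.%f%z")
    return record


sample = (
    r'{"transaction-id":"1234-5678-9012", "usecase-id":"9876-5432-1098", '
    r'"timestamp":"2013-03-10T20:24:25,123+01:00", "country-code":"GB", '
    r'"status":"ok", "error":"", "payload":"<xml attr=\"value\"></xml>"}'
)
record = parse_ulff(sample)
```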

Some Examples of Complexity: Prepaid or Postpaid

•  In case a customer has a postpaid contract, the charging platform sends information to the customer’s local Vodafone market, and they will put the purchase on the customer’s bill.

•  In case the customer is a prepaid customer, the charging platform performs several additional steps:

1.  Check if the customer’s balance is sufficient for the desired purchase.

2.  Somehow “block” the amount required to purchase the item.

3.  Provide the desired item.

4.  Finally take the amount from the “blocked” balance.
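
The four prepaid steps resemble a reserve-then-capture pattern. The Python sketch below is a simplified illustration, not the charging platform's implementation. The class and function names are hypothetical, amounts are integers (for example cents), and releasing the blocked amount when delivery fails is an added assumption that the slide does not mention.

```python
class PrepaidAccount:
    """Toy prepaid balance with a reserved ("blocked") part."""

    def __init__(self, balance):
        self.balance = balance
        self.reserved = 0

    def reserve(self, amount):
        """Steps 1 and 2: check the free balance, then block the amount."""
        if self.balance - self.reserved < amount:
            return False
        self.reserved += amount
        return True

    def capture(self, amount):
        """Step 4: take the blocked amount from the balance."""
        self.reserved -= amount
        self.balance -= amount

    def release(self, amount):
        """Assumption: unblock the amount if the item could not be delivered."""
        self.reserved -= amount


def purchase(account, price, deliver):
    """Run the prepaid flow; `deliver` is a callable returning True on success."""
    if not account.reserve(price):
        return "insufficient balance"
    if not deliver():  # step 3: provide the desired item
        account.release(price)
        return "delivery failed"
    account.capture(price)
    return "ok"


account = PrepaidAccount(balance=500)
first = purchase(account, 300, deliver=lambda: True)
second = purchase(account, 300, deliver=lambda: True)
third = purchase(account, 100, deliver=lambda: False)
```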

Some Examples of Complexity: Cellular or Wireless LAN Connection

•  If the customer has a connection to his mobile carrier’s network when starting a purchase cycle, the authentication can be established using the MSISDN from the customer’s SIM card.

•  If the customer is located in an area without mobile network coverage but WLAN connectivity only, there is no MSISDN communicated for the charging platform to refer to. In this case the user may install an app on his mobile device which is able to provide the MSISDN via wireless LAN connectivity as an alternative.

•  If the customer is not using a mobile device at all, but a PC instead, there is no way to provide the MSISDN automatically. In this case the user may manually enter the MSISDN which is to be charged for the purchase. For authentication, the charging platform could send an SMS to this MSISDN providing a one-time authorization code for the purchase.

Showing Long Term Trends

•  The overlay technique is a good tool for operation engineers or monitoring teams who need to observe the current situation of server systems.

•  But those charts can also be used to get a better insight into long-term processes.

Database Performance

•  Before using Splunk, we took this information from RRDtools.

•  Now we monitor physical IO values (reads, writes, redos), waits, sessions and CPU or disk usage directly in Splunk.

Correlation of Amount of Requests and Response Times

•  The next step: combine the amount of requests and response times into a single chart.

•  Amounts are rendered as column chart, response times as line charts.

•  Each chart uses its own y-axis scaling.

Monitoring Message Queues

•  Real-time gauge charts to monitor the amount of messages processed in Java messaging queues.

•  In parallel, get a more detailed view over the last 60 minutes comparing the amount of requests with processing time.

Spanning Based on Selected Time Range

•  In time charts the span is either set to a fixed value or calculated automatically.

•  A fixed span results in too many measuring points for long time ranges, while auto mode results in different max values for different time ranges.

•  We have implemented time charts where the span is adjusted to the selected time range and values are automatically converted to comparable Transactions Per Minute (TPM).
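
The principle can be shown with two small Python functions: one picks a span from the selected time range, the other divides each bucket count by the span so that values stay comparable. The thresholds below are invented for the example and are not the ones used on the dashboards.

```python
def choose_span_minutes(range_minutes):
    """Pick a bucket size for the selected time range (hypothetical thresholds)."""
    if range_minutes <= 60:          # up to 1 hour: 1-minute buckets
        return 1
    if range_minutes <= 24 * 60:     # up to 1 day: 15-minute buckets
        return 15
    return 60                        # longer ranges: 1-hour buckets


def to_tpm(count, span_minutes):
    """Convert a per-bucket transaction count to Transactions Per Minute."""
    return count / span_minutes


span = choose_span_minutes(4 * 60)   # a 4-hour range
tpm = to_tpm(300, span)              # 300 transactions in one bucket
```

Because every bucket is normalized to one minute, the y-axis keeps the same meaning whichever time range is selected.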

Time Range Selector Without “All Time”

•  Time range selectors usually allow the definition of “custom time“, which might result in very long-running searches.

•  Using form elements we have created our own time range selector, where the user can only select from predefined time ranges – no more custom time.

Splunking Ticketing Systems

•  After we had sent all our application and database information to Splunk, we started to look for other data sources.

•  Reporting about ticketing systems like HP Quality Center or Remedy BMC took a lot of manual effort in the past.

•  Ticket information is a bit different in terms of Splunking.

–  Usually we have multiple records for a ticket over its lifetime.

–  Ticket records may carry several timestamps, e.g. to identify different statuses.

–  Each and every record for one single ticket may be very important.
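
Since one ticket produces several records, a report on current ticket status first has to reduce them to the latest record per ticket. A minimal Python sketch follows; the field names `ticket_id`, `modified` and `status` are assumptions, and timestamps are ISO strings so that they sort chronologically.

```python
def latest_status(records):
    """Return the most recent status per ticket.

    Records are processed in order of their 'modified' timestamp, so a
    later record overwrites an earlier one for the same ticket.
    """
    latest = {}
    for record in sorted(records, key=lambda r: r["modified"]):
        latest[record["ticket_id"]] = record["status"]
    return latest


records = [
    {"ticket_id": "T-1", "modified": "2013-03-01T09:00:00", "status": "open"},
    {"ticket_id": "T-2", "modified": "2013-03-02T10:00:00", "status": "open"},
    {"ticket_id": "T-1", "modified": "2013-03-05T16:30:00", "status": "closed"},
]
current = latest_status(records)
open_tickets = [tid for tid, status in current.items() if status == "open"]
```

The earlier records remain important for history charts, which is why they are kept rather than overwritten at the source.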

Simple Revenue Trends Chart

•  As mentioned before, we use Splunk to provide information to different audiences, for example management.

•  Since management is more interested in figures about business processes than technical processes, we can easily provide charts showing revenue trends.

•  But the information about financial transactions can also be helpful for users with a technical focus, such as on-call engineers.

Compare Results with Previous Weeks

•  The 3rd layer in the background is rendered as an area chart.

•  This area shows transactions with errors only.

•  These are background charts which are only visible if there are more errors than successful transactions.
