SplunkLive! New York Dec 2012 - SNAP Interactive

High Velocity Intelligence: Application Monitoring with Splunk. SNAP Interactive, Inc. Presented by: Nicholas DiSanto, Architecture Team Lead

Transcript of SplunkLive! New York Dec 2012 - SNAP Interactive

Page 1: SplunkLive! New York Dec 2012 - SNAP Interactive

High Velocity Intelligence

Application Monitoring with Splunk

SNAP Interactive, Inc.

Presented by:

Nicholas DiSanto

Architecture Team Lead

Page 2: SplunkLive! New York Dec 2012 - SNAP Interactive

www.snap-interactive.com

Company Overview

•SNAP Interactive, Inc.

•www.AreYouInterested.com

•Believes it is one of the largest social discovery platforms on the web (based on monthly active users)

•More than 5 million monthly active users

•Over 1 billion total pieces of structured data from its users

•Synced to millions of Facebook profiles

•Receives over 1,000 real-time updates per minute on like actions from Facebook

•Subscription-based business model

•SNAP is publicly traded - Ticker: STVI

Page 3: SplunkLive! New York Dec 2012 - SNAP Interactive

About Nick

•Developing on LAMP stack at tech startups for 10 years

•Leading a team of core engineers

•Passionate about experimentation & data driven iteration

•Striving to eliminate all technical blockers to speed and innovation

•@NicholasDiSanto

Page 4: SplunkLive! New York Dec 2012 - SNAP Interactive

Summary

•We use Splunk for many, many things!

•Today, I will share some of our more interesting applications

•How we get data into Splunk

•What we do with that data

•Various types of monitoring

•Extensive user behavior analysis

Page 5: SplunkLive! New York Dec 2012 - SNAP Interactive

What We Give Splunk

•Custom application logs

•Structured, minified, event data

•De-normalized user demographics

•Application profiling data

•Error logs

Page 6: SplunkLive! New York Dec 2012 - SNAP Interactive

Sending Splunk Data

•Centralize logging functions and:

•Format arbitrary structured data into Splunk-extractable field/value pairs: field="value"

•Normalize and minify field names

•Detect user_id and augment logs

•Optionally log a percent of events

•Target different log files (error, info, analytics)
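
A minimal sketch of such a centralized logging helper, in Python rather than the team's own stack. The alias table, file names, and function names are hypothetical, and the sketch assumes flat (non-nested) data. It returns the target file and formatted line instead of writing to disk:

```python
import random

# Hypothetical short names; the slides do not list the real mapping.
FIELD_ALIASES = {"user_id": "uid", "event_name": "ev"}

# Hypothetical file names for the three log targets named on the slide.
LOG_FILES = {"error": "error.log", "info": "info.log", "analytics": "analytics.log"}


def format_fields(data):
    """Render a flat dict as space-separated field="value" pairs."""
    parts = []
    for name, value in data.items():
        short = FIELD_ALIASES.get(name, name)  # normalize and minify the name
        # Escape embedded double quotes so the value stays one quoted token.
        text = str(value).replace('"', '\\"')
        parts.append(f'{short}="{text}"')
    return " ".join(parts)


def log_event(data, target="info", sample_percent=100, user_id=None, rng=random.random):
    """Return (file name, line) for a kept event, or None if sampled out."""
    if rng() * 100 >= sample_percent:
        return None  # only a percentage of events is logged
    record = dict(data)
    if user_id is not None and "user_id" not in record:
        record["user_id"] = user_id  # augment the log with the detected user
    return LOG_FILES[target], format_fields(record)
```

Passing `rng` as a parameter keeps the sampling decision testable; a real helper would append the line to the chosen file.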

Page 7: SplunkLive! New York Dec 2012 - SNAP Interactive

User Demographics

•Our analytics log contains application events, triggered by real users

•We augment these event logs with useful demographic data to classify the events

✦ Gender

✦ Seeking gender

✦ Country

✦ Ethnicity

✦ Date of birth

✦ Date last email click

✦ Date joined

✦ Date last email open

✦ Date last site visit

✦ AB test variant

Page 8: SplunkLive! New York Dec 2012 - SNAP Interactive

Demographic Power

•By augmenting event logs with user demographics we can perform powerful and detailed analysis of user behavior

•Target analysis at countries, genders, or age ranges

•Classify events by days since: registration, login, email open, etc.

•...and much more
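
As an illustration of the "days since" classification, the sketch below copies an event and attaches demographic fields from a user record. The record layout and field names are assumptions, not the actual schema:

```python
from datetime import date


def days_since(event_day, earlier_day):
    """Whole days between an earlier date and the event date."""
    return (event_day - earlier_day).days


def augment(event, user, event_day):
    """Copy an event and add demographic fields taken from a user record."""
    out = dict(event)
    out["gender"] = user["gender"]
    out["country"] = user["country"]
    # Hypothetical field names for the "days since" classifications.
    out["days_since_join"] = days_since(event_day, user["date_joined"])
    out["days_since_visit"] = days_since(event_day, user["date_last_visit"])
    return out
```

With fields like these on every event, a question such as "how do users behave in their first week?" becomes a simple filter on `days_since_join`.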

Page 9: SplunkLive! New York Dec 2012 - SNAP Interactive

Performance Metrics

•We time key algorithms in our application, and log:

•server name

•query name

•time spent working

•This lets us graph the average, min and max times of these algorithms per server

•We also dark launch features, benchmarking performance prior to official launch.
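
One way to collect these three fields is a small timing wrapper. This is a sketch under assumed names (`timed`, `summarize`, the `records` list standing in for the profiling log), not the team's implementation:

```python
import time
from collections import defaultdict
from contextlib import contextmanager

records = []  # stand-in for the profiling log


@contextmanager
def timed(server, query):
    """Record how long the wrapped block runs, tagged by server and query name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        records.append({"server": server, "query": query, "ms": elapsed_ms})


def summarize(rows):
    """Return {(server, query): (avg, min, max)} over the recorded times."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["server"], row["query"])].append(row["ms"])
    return {key: (sum(v) / len(v), min(v), max(v)) for key, v in groups.items()}
```

Wrapping a call as `with timed("web1", "match"): ...` adds one record; `summarize(records)` then yields the per-server average, min and max that the graphs plot.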

Page 10: SplunkLive! New York Dec 2012 - SNAP Interactive

Performance

•Average query time for key algorithms by server

Page 11: SplunkLive! New York Dec 2012 - SNAP Interactive

What Splunk Gives Us

•Monitoring - to measure application health

•Analysis - to drive future product decisions

•AB test evaluation - to validate hypotheses

•Detection - to find patterns & classify users

Page 12: SplunkLive! New York Dec 2012 - SNAP Interactive

Monitoring

•With continuous deployment, detailed monitoring is absolutely essential.

•On each deploy, we watch changes in:

•Realtime classified error graphs

•Core event stat graphs

•We also monitor email deliverability, revenue, and performance (although not on every deploy)

Page 13: SplunkLive! New York Dec 2012 - SNAP Interactive

Error & Event Monitoring

•Alert us immediately after deploy if something has gone wrong

•Use realtime background searches

•Single dashboard with multiple graphs and tables

•We are exploring realtime SMS alerts to the ‘developer on call’

•Use historical data to identify min/max expected thresholds (weighted averages: same time of day, same day of week)

•Detect consistent deviations and alert
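
The threshold idea can be sketched as follows. The linear weighting, the 50% tolerance band, and the "three readings in a row" rule are all illustrative assumptions; the slides do not give the actual formula:

```python
def expected_range(history, tolerance=0.5):
    """Expected (low, high) band from past counts, oldest first.

    history holds counts from the same hour and weekday in earlier weeks.
    Newer weeks get linearly larger weights; the band is the weighted
    average plus or minus a fractional tolerance.
    """
    weights = range(1, len(history) + 1)
    average = sum(w * x for w, x in zip(weights, history)) / sum(weights)
    return average * (1 - tolerance), average * (1 + tolerance)


def should_alert(recent, low, high, min_consecutive=3):
    """True if the last min_consecutive readings all fall outside [low, high]."""
    tail = recent[-min_consecutive:]
    return len(tail) == min_consecutive and all(x < low or x > high for x in tail)
```

Requiring several consecutive out-of-band readings is one simple way to alert on consistent deviations rather than on a single noisy spike.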

Page 14: SplunkLive! New York Dec 2012 - SNAP Interactive

Error Monitoring

•Count of all errors : past 30 seconds

Page 15: SplunkLive! New York Dec 2012 - SNAP Interactive

Error Monitoring

•All errors: past 5 minutes w/deploys

Page 16: SplunkLive! New York Dec 2012 - SNAP Interactive

Error Monitoring

•Rolled up errors: past 5 minutes

Page 17: SplunkLive! New York Dec 2012 - SNAP Interactive

Error Monitoring

•Rolled up filtered errors: past 5 minutes

Page 18: SplunkLive! New York Dec 2012 - SNAP Interactive

Error Monitoring

•Rolled up JS errors: past 3 hours

Page 19: SplunkLive! New York Dec 2012 - SNAP Interactive

Event Monitoring

•We monitor ~20 event stats, in realtime, each deploy

Page 20: SplunkLive! New York Dec 2012 - SNAP Interactive

Event Monitoring

•Overview and detail views, powered by a single realtime background search

Page 21: SplunkLive! New York Dec 2012 - SNAP Interactive

Monitor Email & Performance

•Email

•Deliverability is essential to business

•Need to maximize engagement

•Performance

•What async jobs may be contributing to high DB load?

•What performance are end users experiencing?

•Are particular servers overloaded?

Page 22: SplunkLive! New York Dec 2012 - SNAP Interactive

Email Deliverability

•Overview of key metrics

Page 23: SplunkLive! New York Dec 2012 - SNAP Interactive

Email Monitoring

•Inserts into email scheduled send queue

Page 24: SplunkLive! New York Dec 2012 - SNAP Interactive

Performance

•Asynchronous process timers

•Can correlate spikes with site issues

Page 25: SplunkLive! New York Dec 2012 - SNAP Interactive

Analysis

•We heavily leverage summary indexing for performance gains

•Daily rollups are grouped judiciously, giving us fast, flexible analysis over long periods

•We summarize: revenue, email deliverability, core KPI, and general stats data

•Custom dashboards facilitate easy searching

•Lots of ad hoc searching by product team
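
The rollup idea behind summary indexing can be illustrated outside Splunk. The sketch below is plain Python, not Splunk's summary indexing mechanism, and the grouping fields are assumptions. It collapses raw events into one count per day and group, then answers a long-range question from the much smaller rollup:

```python
from collections import defaultdict


def daily_rollup(events, group_fields=("country", "gender")):
    """Collapse raw events into one count per (day, group...) combination."""
    totals = defaultdict(int)
    for event in events:
        key = (event["day"],) + tuple(event[f] for f in group_fields)
        totals[key] += 1
    return dict(totals)


def count_over(rollup, days):
    """Answer a long-range question from the rollup instead of raw events."""
    return sum(n for key, n in rollup.items() if key[0] in days)
```

The choice of grouping fields is the trade-off: more fields keep the rollup flexible, fewer fields keep it small and fast.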

Page 26: SplunkLive! New York Dec 2012 - SNAP Interactive

Email Analysis

•Sends, opens, clicks, bounces & FBL rates by email type

Page 27: SplunkLive! New York Dec 2012 - SNAP Interactive

Email Analysis

•Monitor changes in open & click rates by email, ISP, country, etc.

Page 28: SplunkLive! New York Dec 2012 - SNAP Interactive

Email Analysis

•Analysis dashboard *Sample data

Page 29: SplunkLive! New York Dec 2012 - SNAP Interactive

Core KPI Dashboard

•Powerful targeted cohort analysis *Sample data

Page 30: SplunkLive! New York Dec 2012 - SNAP Interactive

AB Test Results

•We are constantly running a variety of AB experiments on our live users

•We divide our user population into nine 10% segments and ten 1% segments

•Each segment can be targeted with an experiment

•All event logs are annotated with the appropriate AB experiment name

•This allows us to measure behavior changes between experiment and control groups
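
One way to obtain nine 10% segments and ten 1% segments is to hash each user into 100 stable buckets. The hashing scheme and segment names below are assumptions for illustration; the slides do not say how users are actually assigned:

```python
import hashlib


def bucket(user_id):
    """Map a user id to a stable bucket in 0..99."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def segment_for_bucket(b):
    """Buckets 0-89 form nine 10% segments; buckets 90-99 are ten 1% segments."""
    if b < 90:
        return f"ten_pct_{b // 10 + 1}"
    return f"one_pct_{b - 89}"


def segment(user_id):
    """Segment name for a user; the same user always gets the same segment."""
    return segment_for_bucket(bucket(user_id))
```

Small 1% segments suit risky experiments, while 10% segments reach significance faster; the 19 segments together cover the full population (9 × 10% + 10 × 1% = 100%).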

Page 31: SplunkLive! New York Dec 2012 - SNAP Interactive

Easy AB Analysis

•All event logs contain an AB field

•This identifies the experiment group of the user at that point in time

•Fully integrated into core analysis dashboards

•Ad hoc analysis becomes simple

`my_search` (AB=my_test OR AB=ctrl) | `my_reporting_command` by AB
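
For readers without a Splunk instance, the same filter-then-group-by-AB pattern can be mimicked in a few lines of Python. This is only an analogy to the search above, with hypothetical function names and log lines in the field="value" format:

```python
import re
from collections import Counter

# Matches field="value" pairs whose values contain no double quotes.
PAIR = re.compile(r'(\w+)="([^"]*)"')


def parse_line(line):
    """Extract field="value" pairs from one log line into a dict."""
    return dict(PAIR.findall(line))


def count_by_ab(lines, groups=("my_test", "ctrl")):
    """Count events per AB group, keeping only the listed groups."""
    counts = Counter()
    for line in lines:
        ab = parse_line(line).get("AB")
        if ab in groups:
            counts[ab] += 1
    return counts
```

Because every event already carries its AB value, comparing experiment and control needs no join against a separate assignment table.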

Page 32: SplunkLive! New York Dec 2012 - SNAP Interactive

AB Dashboards

•key metrics: experiment vs control group

Page 33: SplunkLive! New York Dec 2012 - SNAP Interactive

Latest Splunking - Detection

•The way our users interact with one another is insightful

•We can use this data to classify users:

•Identifying “attractive” users

•Identifying spammers & scammers

•We test hypotheses with ad hoc searches

•Find reliable patterns, then set up scheduled searches that interface with MySQL

•This data then feeds into our application in various ways

Page 34: SplunkLive! New York Dec 2012 - SNAP Interactive

Contact Us

•SNAP Interactive, Inc.

www.snap-interactive.com

•Nicholas DiSanto, Architecture Team Lead [email protected] @NicholasDiSanto

•Lindsay Bubbico, Finn Partners (Public Relations Associate) [email protected] @LindsayBubbico