Implementing improved and consistent arbitrary event tracking company-wide using Snowplow

Implementing improved and consistent arbitrary event tracking company-wide using Snowplow. Nora Paymer, Sr. Business & Consumer Insights Analyst, StumbleUpon. 10/6/2015, SF Snowplow MeetUp.

Transcript of Implementing improved and consistent arbitrary event tracking company-wide using Snowplow

Implementing improved and consistent arbitrary event tracking company-wide using Snowplow

Nora Paymer
Sr. Business & Consumer Insights Analyst, StumbleUpon

10/6/2015
SF Snowplow MeetUp

About me

• Hi, I’m Nora
• BS & MA in Cognitive Neuroscience
  – Ask me about sign/speech bilingualism or optical illusions in the brain!
• Previous Roles:
  – UC Berkeley: Institutional Analytics
  – CBS Interactive: Inventory Analytics
  – SquareTrade: Marketing/Consumer Insights Analytics
• StumbleUpon: Business & Product Analytics

About StumbleUpon

• What is StumbleUpon?
  – Recommendation Engine for the Internet
  – Ad Platform for native advertisement
  – Social engagement platform

• Still #4 in Referral Traffic* (behind Facebook, Twitter, and Pinterest; ahead of Reddit)

• Still alive and kicking!

*Shareaholic, Q4 2014 (most recent data available)

My Role

• Data Science Team & Finance/Sales Analytics Team, but no dedicated Product or Business Analytics

• When I was hired, I was asked to:
  – Help Product team be a data-driven culture
  – Make data more available company-wide
    • Better & easier to change dashboards
    • Ability for non-data people to access data
  – Help clean up Data Pipelines
    • With support from amazing Data Engineering Team

Problems

1. Data siloed all over the place
2. Data inaccessible to most people

• Other data all over the place
• No way to integrate with user/stumble/activity data
• Only accessible by a couple people each

• Only place to access most real site data

• Dashboards all made with R/Shiny

• Queries done at terminal, only by Data Science/Analytics Team

• Hive/MapReduce is slow for real-time data querying!

Data sources

Protobuf messages

MySQL

HBase/Hive

MixPanel

FireBase

Adjust

App Annie

Desk.com

SalesForce

StrongView

Solutions

1. Copy product data to quicker/more universal data solution

2. Implement BI tool (Looker)

Data sources

Protobuf messages

MySQL

HBase/Hive

MixPanel

FireBase

Adjust

App Annie

Desk.com

SalesForce

• Send data to RedShift for faster querying
• Connect RedShift to Looker:
  – Dashboards
  – GUI Query Builder

RedShift

Looker

StrongView

Problems

1. Data siloed all over the place
2. Data inaccessible to most people
3. Difficult for teams to add new events
  – Only “official” solution was protobuf messages, which was slow and needed to go through Engineering/Data Science/me just to record a button click
  – Teams started using MixPanel, which is expensive and limited

Solutions

1. Copy product data to quicker/more universal data solution

2. Implement BI tool (Looker)

3. Replace MixPanel with Snowplow for arbitrary Event Reporting
  – Sends data to RedShift for easy integration with other data
  – Easy for teams to add new events

Data Sources

Protobuf messages

MySQL

HBase/Hive

MixPanel

FireBase

Adjust

App Annie

Desk.com

SalesForce

RedShift

Looker

Snowplow

StrongView

Problems

1. Data siloed all over the place
2. Data inaccessible to most people
3. Difficult for teams to add new events
4. So many teams! So much integration!
  – Mobile (iOS & Android), Site (back end & front end), Ads, Marketing (including install referral info, email marketing & other), Firefox & Chrome toolbars, etc.

How we did it

Intended Plan:
1. Site implements default page tracker
2. Site implements 2-3 events to make sure flow is working properly
  – Structured Events
3. Assess if everything is working
4. Mobile implements 2-3 events per platform
5. Then roll out everywhere

How we did it

What Actually Happened:
1. Site implemented default page tracker
2. Site implemented ~100 events
  – Structured Events
3. Mobile replaced all MixPanel events with Snowplow
  – Structured Events
  – Some trouble with implementation/integration with Android
  – Used a wiki page created by a site engineer, which had confusing language and did some things weirdly
4. Testing??

Uh-Oh

• Structured Events not really the right thing:

• Didn’t have userid implemented properly originally

• More fields were going to be needed

Snowplow Term    Our Use
Category         Event Name (e.g. thumbup)
Action           Event Type (e.g. click vs view)
Label            Platform (site, iOS…)
Property         Version #
Value            When event had a value associated with it
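To make the mapping above concrete, here is a minimal Python sketch of how those five slots fill up. The parameter names (`e=se`, `se_ca`, `se_ac`, `se_la`, `se_pr`, `se_va`) are the structured-event fields of the Snowplow tracker protocol as I understand it. The function name `structured_event_params` and the sample values are made up for illustration and are not from the talk.

```python
from urllib.parse import urlencode


def structured_event_params(event_name, event_type, platform, version, value=None):
    """Map the team's concepts onto the five structured-event slots.

    Hypothetical helper: it only builds the query parameters and sends nothing.
    """
    params = {
        "e": "se",            # event type: structured event
        "se_ca": event_name,  # Category -> event name (e.g. thumbup)
        "se_ac": event_type,  # Action   -> event type (click vs view)
        "se_la": platform,    # Label    -> platform (site, iOS, ...)
        "se_pr": version,     # Property -> version number
    }
    if value is not None:
        params["se_va"] = str(value)  # Value -> only when the event carries one
    return params


# Sample values are invented for illustration.
params = structured_event_params("thumbup", "click", "iOS", "4.2.1")
print(urlencode(params))
```

Every slot is already spoken for, so there is nowhere left to put a user id or any other new field without overloading an existing one.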

So? Switch to Unstructured Events! Easy, right?

• OK great, come up with a new framework for Unstructured Events!
  – Some required fields across all events
  – Some optional fields that we know will be widely used from day 1
  – Nature of unstructured events is that more fields could be added later

Field            Req’d?  Description
event_name       y       Event name
platform         y       site, iOS, Android, etc.
device_version   y       Version number (standard field)
event_category   n       e.g. click, view; useful for filtering
event_group      n       For defining a group of events, for filtering
value            n       For events with a value
referrer         n       Referral source (when applicable)
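The framework above can be sketched as a small Python helper that enforces the required fields and wraps the event as a self-describing JSON. This is an illustration under assumptions, not the code StumbleUpon used. The event schema URI `iglu:com.example/generic_event/jsonschema/1-0-0` and the function name `build_unstructured_event` are hypothetical, and the envelope schema URI is the standard Snowplow one as far as I know, so check it against your pipeline.

```python
import json

# Required fields from the table above; everything else is optional.
REQUIRED_FIELDS = ("event_name", "platform", "device_version")

# Envelope schema for unstructured events (believed correct; verify before use).
UNSTRUCT_ENVELOPE = "iglu:com.snowplowanalytics.snowplow/unstruct_event/jsonschema/1-0-0"
# Hypothetical schema URI for the company-wide event; not from the talk.
EVENT_SCHEMA = "iglu:com.example/generic_event/jsonschema/1-0-0"


def build_unstructured_event(**fields):
    """Check the required fields, then wrap the event as a self-describing JSON."""
    missing = [name for name in REQUIRED_FIELDS if fields.get(name) in (None, "")]
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))
    # Drop fields that were passed as None; keep any extra fields,
    # since more fields could be added later.
    data = {key: val for key, val in fields.items() if val is not None}
    return {"schema": UNSTRUCT_ENVELOPE, "data": {"schema": EVENT_SCHEMA, "data": data}}


# Sample values are invented for illustration.
event = build_unstructured_event(
    event_name="thumbup",
    platform="iOS",
    device_version="4.2.1",
    event_category="click",
    referrer=None,
)
print(json.dumps(event, indent=2))
```

Rejecting events that lack a required field at build time is one way to keep every team's events in the same shape, while still letting a team add fields of its own.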

Sounds good so far!

• Teams that had already implemented Structured Events did not want to re-implement them as Unstructured Events
  – They had already spent Eng time on this, why spend more?
    • Everyone is always on a tight timeline
  – Had trouble seeing the value in the format of their events matching the format of teams they didn’t work with

• Result? Arguments and top-down mandates

What should we have done differently?

1. Program management across all teams
  – Didn’t have anyone officially in charge
2. Implement in phases: do test events & a test project before going full live
3. Excellent Documentation
4. Get buy-in from everyone from day one
5. Think through dream/far-fetched use cases: what will you need for that?
6. Use Snowplow team for advice!

So now what?

• Still working on it
• Connecting all existing data pipelines to RedShift, sometimes via Snowplow
• Better utilizing Snowplow when back end tracking is too cumbersome
  – Referral Tracking: both reg and landing page
  – Better understanding of engagement and Time on Site (for non-stumble pages especially)
  – Understanding user flow through the site
  – Etc., hopefully!

Protobuf messages

MySQL

HBase/Hive

MixPanel

FireBase

Adjust

App Annie

Desk.com

SalesForce

RedShift

Looker

Snowplow

StrongView

New Data!

Thank You!

Questions, etc.?