Fast and Efficient A/B Testing Analysis with Shiny and SQL
Charlie Thompson, Storyblocks
A/B Testing at Storyblocks
Our search page for stock video
“Related Search” cards test
“Related Search” cards test: Test vs. Control
We store results for our tests in Shiny
We have > 100 metrics to analyze per test
We have thousands of A/B tests with millions of users
Multiple ways to measure “users”
Lots of metrics per user
A/B testing generates big data
Shiny and SQL together
A brief history
2011: Outsourced to 3rd party
2014: Ad hoc SQL queries
2015: Automated online dashboard in SQL
2016: To Shiny!
2017: Scaling within Shiny
Scaling within Shiny
Loading big data into Shiny
Pipeline: Raw A/B testing data (SQL) -> load_data.R -> test_1.RData, test_2.RData, test_3.RData, test_4.RData -> server.R -> Shiny Dashboard
Overnight preprocessing on Shiny server: An R script queries the SQL database and saves an .RData file for each test that contains the raw data
Live in dashboard: As tests are selected in the dashboard, Shiny loads the raw data file and computes all the metrics needed, including hypothesis tests
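A minimal sketch of the two steps above, using base R only. The data frame `raw` is a simulated stand-in for what the SQL query would return, and the names `out_dir`, `test_data` and `load_test` are hypothetical, not taken from the talk.

```r
# Simulated stand-in for the SQL query result: one row per user per test.
set.seed(1)
raw <- data.frame(
  test_id   = rep(c(1L, 2L), each = 100),
  user_id   = seq_len(200),
  variant   = rep(c("control", "test"), times = 100),
  converted = rbinom(200, size = 1, prob = 0.1)
)

# Hypothetical output folder for the per-test files.
out_dir <- file.path(tempdir(), "ab_tests")
dir.create(out_dir, showWarnings = FALSE)

# Overnight step: write one .RData file per test.
for (id in unique(raw$test_id)) {
  test_data <- raw[raw$test_id == id, ]
  save(test_data, file = file.path(out_dir, sprintf("test_%d.RData", id)))
}

# Dashboard step: load only the file for the selected test.
# Loading into a fresh environment avoids overwriting objects in the session.
load_test <- function(id, dir = out_dir) {
  env <- new.env()
  load(file.path(dir, sprintf("test_%d.RData", id)), envir = env)
  env$test_data
}

selected <- load_test(2)
```

In a Shiny app, a call like `load_test()` would presumably sit inside a reactive expression keyed on the selected test, so that only one file is read at a time.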
Constraints with Shiny at scale
Pipeline: Raw A/B testing data (SQL) -> load_data.R -> test_1.RData, test_2.RData, test_3.RData, test_4.RData -> server.R -> Shiny Dashboard
Overnight preprocessing on Shiny server: An R script queries the SQL database and saves an .RData file for each test that contains the raw data
Live in dashboard: As tests are selected in the dashboard, Shiny loads the raw data file and computes all the metrics needed, including hypothesis tests
Bottleneck #1: Reading in large tests
Bottleneck #2: Calculating hypothesis tests for 50+ metrics
Bottleneck #3: Users queue
Overcoming Shiny constraints
Pipeline: Raw A/B testing data (SQL) -> load_data.R -> test_1.RData, test_2.RData, test_3.RData, test_4.RData -> server.R -> Shiny Dashboard
Overnight preprocessing on Shiny server: An R script queries the SQL database, calculates the hypothesis tests, and saves an .RData file for each test that contains the aggregated data
Live in dashboard: As tests are selected in the dashboard, Shiny loads the aggregated file for each test, which now contains historical values instead of daily snapshots
Bottleneck #1: Reading in large tests. FUHGETTABOUTIT! Aggregated data is wicked small
Bottleneck #2: Calculating hypothesis tests for 50+ metrics. NOT ANYMORE! This is done in the morning
Bottleneck #3: Users queue. NO WORRIES! The dashboard is so fast we won’t notice
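To make the precompute step concrete, here is a rough sketch in base R. It assumes a single conversion metric and a two-sample test of proportions via `prop.test`; the talk does not say which tests Storyblocks runs, and the `daily` data frame is simulated in place of a SQL aggregate.

```r
set.seed(42)
n_days <- 14

# Simulated daily counts per variant, shaped like a SQL GROUP BY result.
daily <- data.frame(
  day     = rep(seq_len(n_days), times = 2),
  variant = rep(c("control", "test"), each = n_days),
  users   = 500L
)
daily$conversions <- rbinom(
  nrow(daily), size = daily$users,
  prob = ifelse(daily$variant == "test", 0.12, 0.10)
)

# Running totals within each variant (rows are already ordered by day).
daily$cum_users <- ave(daily$users, daily$variant, FUN = cumsum)
daily$cum_conv  <- ave(daily$conversions, daily$variant, FUN = cumsum)

ctrl <- daily[daily$variant == "control", ]
test <- daily[daily$variant == "test", ]

# One small row per day: cumulative rates plus the p-value as of that day.
aggregated <- data.frame(
  day          = ctrl$day,
  control_rate = ctrl$cum_conv / ctrl$cum_users,
  test_rate    = test$cum_conv / test$cum_users,
  p_value      = mapply(
    function(xc, nc, xt, nt) prop.test(x = c(xc, xt), n = c(nc, nt))$p.value,
    ctrl$cum_conv, ctrl$cum_users, test$cum_conv, test$cum_users
  )
)

# The dashboard would load this file instead of the raw per-user data.
agg_file <- file.path(tempdir(), "test_1.RData")
save(aggregated, file = agg_file)
```

The saved object has one row per day rather than one row per user, which is why reading it in the dashboard is cheap regardless of how many users a test has.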
Making the most of your data
When is a test done?
Aggregated data gives a time series view
[Chart: metric over time, annotated "Test begins"]
Time series helps prevent premature reads
[Chart: P-value by date, annotated "Test looks 95% significant here!"]
P-value should stabilize over time
Win or lose, the P-value should stabilize before a test is “finished”
[Chart: P-value by date]
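One way to turn "the P-value should stabilize" into a check is sketched below. The rule is an assumed heuristic for illustration, not the criterion used in the talk: a read counts as stable when the most recent `window` daily p-values all fall on the same side of `alpha`. The function name and the example values are hypothetical.

```r
# Assumed heuristic (not from the talk): stable means the last `window`
# daily p-values are all below alpha, or all at or above alpha.
is_stable <- function(p_values, alpha = 0.05, window = 7) {
  if (length(p_values) < window) {
    return(FALSE)  # not enough history yet to judge
  }
  recent <- tail(p_values, window)
  all(recent < alpha) || all(recent >= alpha)
}

# Made-up series with one early dip below 0.05 on day 2.
early_dip <- c(0.40, 0.04, 0.30, 0.25, 0.35, 0.20, 0.45, 0.50)

too_early <- is_stable(early_dip[1:2])         # only two days of data
settled   <- is_stable(early_dip, window = 5)  # last five days all >= 0.05
```

Reading the test on day 2 alone would have suggested a significant result; requiring a run of consistent days is one simple guard against that kind of premature read.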
When to think about scaling
Shiny: prototype vs production
Aspect | Prototype | Production
Hosting | Local | Shiny server, shinyapps.io, etc.
Number of concurrent users | One | Multiple
Page load time | Easy to overlook | Instant, UX is important
Data storage | Often pull in unused rows or columns | Loads only necessary data
Stability and maintenance | Only needs to be working when demoing | Minimal downtime
Measuring Shiny usage
Make sure you know how many users you have!
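The talk does not describe how usage was measured, so the following is only a sketch of one simple approach: append a row to a CSV log on each visit and count distinct users afterwards. The names `log_file` and `log_visit` and the user labels are hypothetical; in a real app the call would likely go inside the Shiny server function.

```r
# Hypothetical log location; start from a clean file for this example.
log_file <- file.path(tempdir(), "usage_log.csv")
if (file.exists(log_file)) file.remove(log_file)

# Append one row per visit; write the header only when creating the file.
log_visit <- function(user, path = log_file) {
  entry <- data.frame(
    user = user,
    time = format(Sys.time(), "%Y-%m-%d %H:%M:%S")
  )
  new_file <- !file.exists(path)
  write.table(entry, path, sep = ",", row.names = FALSE,
              col.names = new_file, append = !new_file)
  invisible(entry)
}

log_visit("analyst_a")
log_visit("analyst_b")
log_visit("analyst_a")

# Summarise the log: distinct users and visits per user.
usage <- read.csv(log_file)
n_users <- length(unique(usage$user))
visits_per_user <- table(usage$user)
```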
What we learned
Let SQL be SQL and R be R
Task | R | SQL
Big data aggregation | Possible, but slow | Made for exactly this
Hypothesis tests and charts | Made for exactly this | Painful, need tools
Data tips for Shiny in production
1. Subset your input data before reading it in
2. Use .RData files
3. Consider ETL process - do you really need real-time data?
4. Monitor usage
Additional resources
A/B Testing in the Wild [Etsy] - Emily Robinson
A/B Testing at Stack Overflow - Julia Silge
Experiments at Airbnb - Jan Overgoor
Shiny server system performance monitoring - Huidong Tian
We’re hiring!
https://weare.storyblocks.com
Contact me
www.RCharlie.com
Twitter: @RCharlie425
Questions?