Ad Yield Optimization @ Spotify - DataGotham 2013

May 12, 2014 Ad Yield Optimization @ Spotify


Transcript of Ad Yield Optimization @ Spotify - DataGotham 2013

Page 1: Ad Yield Optimization @ Spotify - DataGotham 2013

May 12, 2014

Ad Yield Optimization @ Spotify

Page 2: Ad Yield Optimization @ Spotify - DataGotham 2013

I’m Kinshuk Mishra

•  Work on distributed systems and data science problems

•  Lead architecture for ads backend platform at Spotify

•  You can find me @_kinshukmishra

Page 3: Ad Yield Optimization @ Spotify - DataGotham 2013

What is Spotify?

•  Started in 2006

•  Currently has over 24 million users

•  6 million paying users

•  Available in 28 countries

•  Over 300 engineers, of which 100 are in NYC

Page 4: Ad Yield Optimization @ Spotify - DataGotham 2013

Why are Ads important?

•  getFreeTierUsers() / getAllUsers() > 0.70

•  getSpotifyPayoutToMusicLabels() = $$$

•  Great medium for promotions and announcements

Page 5: Ad Yield Optimization @ Spotify - DataGotham 2013


Native Ads

Page 6: Ad Yield Optimization @ Spotify - DataGotham 2013

The problem

How do we optimize the ad yield on the Spotify platform?

Page 7: Ad Yield Optimization @ Spotify - DataGotham 2013

The type of questions we have

What is the total number of available audio ad impressions on the iOS platform between 9/12/2013 and 9/13/2013 in the NYC metro area for male users aged 18-35 who typically listen to hip-hop?
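A question like this can be read as a structured query over targeting dimensions. The sketch below shows one hypothetical way to represent it in Python. The `ForecastQuery` class and all of its field names are illustrative assumptions, not the schema used at Spotify.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple


@dataclass(frozen=True)
class ForecastQuery:
    """Hypothetical container for one inventory forecast question."""
    ad_type: str
    platform: str
    start: date
    end: date
    metro: Optional[str] = None
    gender: Optional[str] = None
    age_range: Optional[Tuple[int, int]] = None  # inclusive bounds
    genre: Optional[str] = None

    def days(self) -> int:
        """Number of calendar days covered, counting both endpoints."""
        return (self.end - self.start).days + 1


# The example question from the slide, expressed as a query object.
query = ForecastQuery(
    ad_type="audio",
    platform="ios",
    start=date(2013, 9, 12),
    end=date(2013, 9, 13),
    metro="NYC",
    gender="male",
    age_range=(18, 35),
    genre="hip-hop",
)
```

Leaving a field as `None` is meant to express "no targeting on this dimension", which keeps additional targeting options easy to add later.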

Page 8: Ad Yield Optimization @ Spotify - DataGotham 2013

What is unique about us?

•  Rules triggering ad breaks are unique

•  We also log user activity and audio streaming data

Page 9: Ad Yield Optimization @ Spotify - DataGotham 2013

Different approaches

•  Simulate ad delivery by replaying user events and triggering ad breaks

•  Pre-compute impression aggregates for different dimensions and build a complex model to combine those

•  Use a subset of impression data, then filter and extrapolate it using a simple model

Page 10: Ad Yield Optimization @ Spotify - DataGotham 2013

Our Hadoop infrastructure

700 nodes in our Hadoop cluster

Page 11: Ad Yield Optimization @ Spotify - DataGotham 2013

Some constraints

•  Fast real-time lookup service

•  Consistent results

•  Ability to handle additional targeting

•  Ability to scale

Page 12: Ad Yield Optimization @ Spotify - DataGotham 2013

The solution

Use a subset of impression data, then filter and extrapolate it using a simple model in a service

Page 13: Ad Yield Optimization @ Spotify - DataGotham 2013

But how?

Now begins the fun part… Let's dive deeper to solve this problem

Page 14: Ad Yield Optimization @ Spotify - DataGotham 2013

What was the big picture going to look like?

Hadoop: ad impression log

Postgres DB: booked campaigns

Forecasting engine

Forecast query

Page 15: Ad Yield Optimization @ Spotify - DataGotham 2013

High level forecasting engine algorithm

Load data cache from log data and campaign data (refresh labels on the slide: daily, once a minute)

Submit forecast query

Wait for query

Apply filter criteria to dataset

Count available impressions

Apply growth and other extrapolation factors
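The query-time part of this flow (filter, count, then apply factors) can be sketched in a few lines of Python. This is a minimal illustration under assumptions: the `forecast` function, the dictionary-based impression records, and the factor values are all hypothetical and do not come from the talk.

```python
from typing import Callable, Dict, Iterable, List

Impression = Dict[str, object]  # hypothetical record: one cached ad impression


def forecast(
    cached_impressions: Iterable[Impression],
    matches: Callable[[Impression], bool],
    factors: List[float],
) -> float:
    """Filter the cached dataset, count matches, then apply multipliers."""
    count = sum(1 for imp in cached_impressions if matches(imp))
    result = float(count)
    for factor in factors:  # e.g. a growth factor, a sample scaling factor
        result *= factor
    return result


cache = [
    {"platform": "ios", "ad_type": "audio"},
    {"platform": "android", "ad_type": "audio"},
    {"platform": "ios", "ad_type": "audio"},
]
estimate = forecast(
    cache,
    matches=lambda imp: imp["platform"] == "ios",
    factors=[1.05, 100.0],  # made-up growth and scaling factors
)
```

Because every step is a plain count or a multiplication, the same query over the same cache always gives the same answer.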

Page 16: Ad Yield Optimization @ Spotify - DataGotham 2013

Some challenges…

•  Organic growth in inventory

•  Cold start

•  Seasonality

Page 17: Ad Yield Optimization @ Spotify - DataGotham 2013

Organic growth in inventory

Ad impression inventory in a growing market

Page 18: Ad Yield Optimization @ Spotify - DataGotham 2013

Organic growth in inventory?

Ad impression inventory in a market with high conversion to premium

Page 19: Ad Yield Optimization @ Spotify - DataGotham 2013

Cold start

Ad impression inventory in a newly launched market

Page 20: Ad Yield Optimization @ Spotify - DataGotham 2013

Seasonality

Ad impression inventory dip in early Q1

Page 21: Ad Yield Optimization @ Spotify - DataGotham 2013

Volume of data

•  Billions of ad impressions per month

•  Terabytes of relevant forecasting data

Data overload?

Page 22: Ad Yield Optimization @ Spotify - DataGotham 2013

Sampling
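The slide does not say how the sample was drawn. One common way to get a sample that is stable across runs is to hash each user id into a bucket and keep a fixed bucket. The sketch below assumes that approach; the `in_sample` function and the 1-in-100 rate are illustrative choices, not the method described in the talk.

```python
import hashlib


def in_sample(user_id: str, one_in: int = 100) -> bool:
    """Return True if this user falls in the deterministic 1-in-`one_in` sample."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and bucket it.
    bucket = int.from_bytes(digest[:8], "big") % one_in
    return bucket == 0


sampled = [uid for uid in (f"user-{i}" for i in range(1000)) if in_sample(uid)]
```

A user is either always in the sample or never in it, so repeated forecasts see the same population.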

Page 23: Ad Yield Optimization @ Spotify - DataGotham 2013

Caching

[Figure: cache timeline with daily entries for 9/07/2013 through 9/12/2013, followed by 9/13/2013 and 9/14/2013. Log data and campaign data load into the data cache (refresh labels: daily, once a minute).]

Page 24: Ad Yield Optimization @ Spotify - DataGotham 2013

Optimizing data retrieval

•  We analyzed our data access pattern and found that over 75% of our campaigns are targeted by age and location.

•  So we mapped each location to a list of users sorted by age, using SortedSetMultimap.

•  This optimized user lookup by location and age group to O(k log N) from the typical O(kN), where N is the total number of users for a location and k is a constant.
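The idea of keeping each location's users sorted by age can be illustrated in Python with the standard `bisect` module. This is an analogue of the approach, not the SortedSetMultimap code from the talk; the `LocationAgeIndex` class and its method names are hypothetical.

```python
from bisect import bisect_left, insort
from collections import defaultdict
from typing import Dict, List, Tuple


class LocationAgeIndex:
    """Maps a location to its users, kept sorted by (age, user_id)."""

    def __init__(self) -> None:
        self._by_location: Dict[str, List[Tuple[int, str]]] = defaultdict(list)

    def add(self, location: str, age: int, user_id: str) -> None:
        # insort keeps the per-location list ordered on insert.
        insort(self._by_location[location], (age, user_id))

    def users_in_age_range(self, location: str, lo: int, hi: int) -> List[str]:
        """Return user ids at `location` with lo <= age <= hi."""
        entries = self._by_location.get(location, [])
        # Two binary searches locate the range boundaries in O(log N).
        start = bisect_left(entries, (lo, ""))
        end = bisect_left(entries, (hi + 1, ""))
        return [user_id for _, user_id in entries[start:end]]


index = LocationAgeIndex()
index.add("NYC", 36, "u4")
index.add("NYC", 18, "u2")
index.add("NYC", 17, "u1")
index.add("NYC", 35, "u3")
index.add("Boston", 25, "u5")
matched = index.users_in_age_range("NYC", 18, 35)
```

Finding the boundaries avoids scanning every user at the location; copying out the matching users still costs time proportional to the number of matches.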

Page 25: Ad Yield Optimization @ Spotify - DataGotham 2013

[Figure: "Growth" plotted against "Day of the Month", with axis ticks from 1 to 32]

Page 26: Ad Yield Optimization @ Spotify - DataGotham 2013

How to find available inventory for sample population?

1.  Take all user ad impressions by applying “day of the month” substitution

2.  Apply filters by ad type, location, age, gender, platform, etc.

3.  Count the total impressions for all the users who match

4.  Read booked impressions for similar targeting criteria from the cache

5.  Inventory available = total impressions – booked impressions
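These steps can be sketched in Python as follows. The slides do not define “day of the month” substitution; this sketch assumes it means that a future date borrows the impressions logged on the same day of a past month. That reading, the function names, and the record layout are all assumptions made for illustration.

```python
import calendar
from datetime import date
from typing import Callable, Dict, Iterable

Impression = Dict[str, object]  # hypothetical record with a "day" field


def substitute_day(target: date, hist_year: int, hist_month: int) -> date:
    """Map a future date onto the same day of a past month (assumed reading)."""
    last_day = calendar.monthrange(hist_year, hist_month)[1]
    # Clamp so that e.g. the 31st maps onto the 30th of a shorter month.
    return date(hist_year, hist_month, min(target.day, last_day))


def available_inventory(
    history: Iterable[Impression],
    target: date,
    hist_year: int,
    hist_month: int,
    matches: Callable[[Impression], bool],
    booked: int,
) -> int:
    """Steps 1-5: substitute the day, filter, count, subtract booked."""
    source_day = substitute_day(target, hist_year, hist_month)
    total = sum(
        1 for imp in history if imp["day"] == source_day and matches(imp)
    )
    return total - booked


history = [
    {"day": date(2013, 8, 12), "platform": "ios"},
    {"day": date(2013, 8, 12), "platform": "ios"},
    {"day": date(2013, 8, 12), "platform": "android"},
    {"day": date(2013, 8, 13), "platform": "ios"},
]
available = available_inventory(
    history,
    target=date(2013, 9, 12),
    hist_year=2013,
    hist_month=8,
    matches=lambda imp: imp["platform"] == "ios",
    booked=1,  # in the talk, booked impressions are read from the cache
)
```

The result here is for the sample population only; it has not yet been scaled up to the full user base.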

Page 27: Ad Yield Optimization @ Spotify - DataGotham 2013

Growth Factor

Keep it simple

Page 28: Ad Yield Optimization @ Spotify - DataGotham 2013

Extrapolation

•  Population (15 million) -> Sample (150,000)

•  Scaling factor is 100

•  Total Available inventory = scaling factor * available inventory for sample
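The arithmetic on this slide is small enough to write out directly. The function names below are hypothetical, and the sample inventory of 1,234 impressions is a made-up input; only the population and sample sizes come from the slide.

```python
def scaling_factor(population: int, sample: int) -> float:
    """Ratio of full population size to sample size."""
    if sample <= 0:
        raise ValueError("sample size must be positive")
    return population / sample


def extrapolate(sample_available: int, population: int, sample: int) -> float:
    """Total available inventory = scaling factor * sample inventory."""
    return scaling_factor(population, sample) * sample_available


factor = scaling_factor(15_000_000, 150_000)    # population and sample from the slide
total = extrapolate(1_234, 15_000_000, 150_000)  # 1,234 is an invented example value
```

With the slide's numbers the factor works out to 100, so each impression available in the sample stands for 100 in the full population.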

Page 29: Ad Yield Optimization @ Spotify - DataGotham 2013

Other features

•  Ad Frequency capping

•  Day of the week and time of the day filtering

•  View per user (VPU) capping
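The slides list capping as a feature without describing how it was applied to the forecast. One plausible reading is that a user who would see more impressions than the cap allows only contributes up to the cap. The sketch below assumes that interpretation; `capped_impressions` is a hypothetical helper, not code from the talk.

```python
from collections import Counter
from typing import Iterable


def capped_impressions(user_ids: Iterable[str], cap: int) -> int:
    """Count impressions, letting each user contribute at most `cap`."""
    per_user = Counter(user_ids)  # impressions seen per user
    return sum(min(n, cap) for n in per_user.values())


# One entry per impression, keyed by the user who received it.
impressions = ["a", "a", "a", "b", "c", "c"]
countable = capped_impressions(impressions, cap=2)
```

Here user `a` has three impressions but contributes only two, so the capped count is lower than the raw count of six.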

Page 30: Ad Yield Optimization @ Spotify - DataGotham 2013

What worked for us?

1.  Fast lookups

2.  Simple models scaled well

3.  Deterministic algorithms easier to debug

4.  Adding new targeting features was easy

5.  Forecasting engine agnostic to changes in ad server

Page 31: Ad Yield Optimization @ Spotify - DataGotham 2013

What didn’t work that well?

1.  Campaign level forecasts difficult without simulation

2.  Cold start is a real problem when there is no proxy dataset

3.  Forecasting inventory for new ad types can be challenging

Page 32: Ad Yield Optimization @ Spotify - DataGotham 2013

What we’ve learnt

•  Think data volume

•  Consider sampling

•  Choose an appropriate time window

•  Analyze data access patterns and optimize for them

•  Use deterministic algorithms

•  Analyze data trends and factor those into the computation

•  Simple models scale well

Page 33: Ad Yield Optimization @ Spotify - DataGotham 2013


Email - [email protected] https://twitter.com/Spotifyjobs

Thanks!