Data Science at Flurry

Soups Ranjan, PhD, [email protected]

Description

This talk is about the data science problems that the Data Science team at Flurry works on. In particular, it dives into one of the problems we are solving: a machine-learning-driven bidding strategy for bidding on mobile Real-Time Bidding (RTB) ad exchanges.

Transcript of Data Science at Flurry

Page 1: Data Science at Flurry

Data Science at Flurry

Soups Ranjan, [email protected]

Page 2: Data Science at Flurry

We all know that when we’re talking about mobile, we’re talking about apps

Source: Nielsen – State of the App Nation 2012 Report and June 2013 Cross-Platform Report

Time spent on mobile devices

Page 3: Data Science at Flurry

Flurry has the deepest insight into consumer behavior on mobile

[Chart: Monthly Device Reach (Millions) – Flurry 1,200; Facebook 875; Google 700; Millennial Media 400; Twitter 320; JumpTap 100]

Source: Data gathered from public statements/filings by companies; Facebook denotes property and network; Google reach denotes sites and network

Page 4: Data Science at Flurry

• Flurry Analytics – Track users, sessions, events and crashes

• Flurry AppCircle – Advertise with Flurry to acquire new users for your app

• Flurry AppSpot – Monetize your app traffic via ads

Flurry Product Overview

Page 5: Data Science at Flurry

• AppCircle: Advertiser configuration to set up an ad (a hypothetical sketch follows below):
– Ad type: CPI, CPC, CP Video
– Corresponding bid
– Ad format: Banner or Interstitial
– Targeting (Age, Gender, Device, Location, Persona)

• AppCircle Bidder:
– Optimally acquire ad-space inventory where ads can be shown

AppCircle – Advertise to Acquire Users
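As a rough illustration of that configuration step, an ad could be represented along these lines (a hypothetical sketch; the field names are ours, not Flurry's API):

```python
# Hypothetical sketch of an AppCircle-style ad configuration (illustrative field names).
from dataclasses import dataclass, field

@dataclass
class AdConfig:
    ad_type: str          # "CPI", "CPC", or "CP Video"
    bid: float            # advertiser's bid for the chosen ad type, in dollars
    ad_format: str        # "Banner" or "Interstitial"
    targeting: dict = field(default_factory=dict)   # age, gender, device, location, persona

campaign = AdConfig(
    ad_type="CPI",
    bid=1.50,
    ad_format="Interstitial",
    targeting={"age": "18-34", "gender": "any", "device": "iPhone", "location": "US"},
)
print(campaign)
```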

Page 6: Data Science at Flurry

AppCircle Bidder Strategy

[Diagram: bidder flow]
• A Bid Request (user, pub, exchange) arrives from the exchange along with the set of {Eligible Ads}
• Cost Model (Bid Price Estimation): uses the history of bids and win prices to produce (Ad1, Bid1, P(win)1) … (Adn, Bidn, P(win)n)
• Revenue Model: uses the history of ad impressions and conversions to produce (Ad1, AdvBid1, P(conv)1) … (Adn, AdvBidn, P(conv)n)
• Budget Pacing: applies advertiser goals (α, β) given each ad's AdvBid, daily budget, and spend
• Ad Selector: picks the ad and its bid price, and the chosen ad is bid on the exchange

Page 7: Data Science at Flurry

Bidder Ad Selection Model - I

Ad Selection Model:
SelectAd(adv, pub, exchange, user) = argmax over ads of [ P(win)^α × (Revenue(adv, pub, exchange, user) − β × Cost(adv, pub, exchange, user)) ]

• Maximize margin model (α = β = 1):
SelectAd(adv, pub, exchange, user) = argmax over ads of [ P(win) × (Revenue(adv, pub, exchange, user) − Cost(adv, pub, exchange, user)) ]

– May lead to a lower advertiser fill rate, as we will then only bid to show an advertiser's ad when we expect to win at a price lower than the advertiser's bid

Ad    Rev (eCPM)   Cost   P(win)   Rank
Adv1  1.50         1.30   0.30     0.3 × (1.5 − 1.3) = 0.06
Adv2  0.60         0.50   0.70     0.7 × (0.6 − 0.5) = 0.07
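A minimal sketch of this scoring rule in Python, using the two example ads above (our own illustrative code; α and β are left as parameters):

```python
# Sketch of the selection rule: argmax over ads of P(win)^alpha * (Revenue - beta * Cost).
def score(p_win, revenue, cost, alpha=1.0, beta=1.0):
    return (p_win ** alpha) * (revenue - beta * cost)

def select_ad(ads, alpha=1.0, beta=1.0):
    """ads: list of (name, revenue_ecpm, cost_ecpm, p_win) tuples."""
    return max(ads, key=lambda a: score(a[3], a[1], a[2], alpha, beta))

ads = [("Adv1", 1.50, 1.30, 0.30), ("Adv2", 0.60, 0.50, 0.70)]

# Maximize-margin model (alpha = beta = 1): Adv2 wins, 0.07 > 0.06.
print(select_ad(ads, alpha=1.0, beta=1.0)[0])
```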

Page 8: Data Science at Flurry

Bidder Ad Selection Model - II

• Maximize fill rate for advertiser (α = 1, β = 0):
SelectAd(adv, pub, exchange, user) = argmax over ads of [ P(win) × Revenue(adv, pub, exchange, user) ]

– We select the ad that maximizes our revenue goals; however, we only bid if the revenue > cost

Ad    Rev (eCPM)   Cost   P(win)   Rank
Adv1  1.50         1.30   0.30     0.3 × 1.5 = 0.45
Adv2  0.60         0.50   0.70     0.7 × 0.6 = 0.42
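Setting β = 0 reduces the same rule to ranking by P(win) × Revenue; a short self-contained sketch reproducing the table above (the revenue > cost guard reflects the bullet's caveat):

```python
# Fill-rate model (alpha = 1, beta = 0): rank by P(win) * Revenue,
# but only bid at all when revenue exceeds cost.
ads = [("Adv1", 1.50, 1.30, 0.30), ("Adv2", 0.60, 0.50, 0.70)]  # (name, rev, cost, p_win)
eligible = [a for a in ads if a[1] > a[2]]            # revenue > cost guard
best = max(eligible, key=lambda a: a[3] * a[1])       # P(win) * Revenue
print(best[0])                                        # -> Adv1 (0.45 > 0.42)
```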

Page 9: Data Science at Flurry

• Ad Revenue Optimization problem:
– Max: P(conv) × bid
– Conversion Prediction Model: Max P(conv)

• Historical Estimation:
– Past conversion rate as a predictor of future conversion rates

• ML Conversion Prediction Model:
– Features: Publisher, Ad, User, Time, Location

AppCircle: Ad Revenue Optimization:

[Chart: conv-prob by user id for users who saw Ad1 in Pub1's app, with the average conv-prob marked]
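One way to set up such a conversion prediction model is a logistic regression over one-hot encoded impression features; the sketch below is a scikit-learn flavored illustration, and the column names and rows are made up rather than Flurry's schema:

```python
# Sketch: P(conv) model as logistic regression over categorical impression features.
import pandas as pd
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

impressions = pd.DataFrame([
    {"publisher": "Pub1", "ad": "Ad1", "local_hour": 12, "country": "US", "converted": 1},
    {"publisher": "Pub1", "ad": "Ad1", "local_hour": 3,  "country": "US", "converted": 0},
    {"publisher": "Pub2", "ad": "Ad2", "local_hour": 19, "country": "CA", "converted": 0},
    {"publisher": "Pub2", "ad": "Ad1", "local_hour": 18, "country": "US", "converted": 1},
])

# One-hot encode the categorical features (publisher, ad, local time, location).
features = impressions.drop(columns="converted")
features["local_hour"] = features["local_hour"].astype(str)   # treat hour as categorical
vec = DictVectorizer()
X = vec.fit_transform(features.to_dict(orient="records"))
y = impressions["converted"]

model = LogisticRegression(max_iter=1000).fit(X, y)
p_conv = model.predict_proba(X)[:, 1]   # expected revenue of an ad ~ p_conv * advertiser bid
```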

Page 10: Data Science at Flurry

Bidder Cost Model

Cost model: We don't know about the other players in the auction. The best we can do is predict based on our own wins and losses.

1) If historically we win auctions for users in Kansas City =>
2) Most likely, other bidders are not interested in Kansas City users =>
3) Next time, we'll lower our bid for Kansas City users =>
4) If we still win those Kansas City users, continue (1–3) =>
5) If not, we will revise our bid back up
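A toy sketch of that feedback loop (our own illustration of steps 1–5, not Flurry's production logic):

```python
# Toy sketch of the exploratory bid-adjustment loop: probe lower while we keep
# winning a segment, and revise the bid back up once we start losing.
def adjust_bid(current_bid, won_last_auction, step=0.05, floor=0.01):
    if won_last_auction:
        # Other bidders likely aren't competing hard for this segment; bid lower next time.
        return max(floor, current_bid * (1 - step))
    # We lost, so we under-bid; revise the bid back up.
    return current_bid * (1 + step)

bid = 0.50  # starting bid for the "Kansas City users" segment, in dollars
for won in [True, True, True, False, True]:   # simulated auction outcomes
    bid = adjust_bid(bid, won)
    print(round(bid, 3))
```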

Page 11: Data Science at Flurry

A machine-learnt model gives us both cost and P(win):

Multi-class classification model (Logistic Regression) to predict the win price of an ad impression

Machine Learning based Bidder Cost Model

[Diagram: predicted win-price classes, e.g. win-price = 27c, 28c, 52c, and No Win, with class probabilities ranging from P(win) ~ 0.0 to P(win) ~ 1.0]
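A minimal sketch of that idea: a multinomial logistic regression over discretized win-price classes plus a "No Win" class, from which both P(win) at a candidate bid and an expected clearing price can be read off. Features, prices and data below are illustrative only:

```python
# Sketch: multi-class win-price model (classes are clearing prices in cents; -1 = no win).
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy features per historical bid request (e.g. publisher / geo indicators).
X = np.array([[1, 0], [1, 0], [0, 1], [0, 1], [1, 1], [1, 1]])
y = np.array([27, 28, 52, 52, -1, 27])        # observed win prices; -1 means we lost

model = LogisticRegression(max_iter=1000).fit(X, y)

probs = model.predict_proba([[1, 0]])[0]      # distribution over win-price classes
cls = model.classes_
win = cls != -1

bid = 30                                      # candidate bid, in cents
p_win = probs[win & (cls <= bid)].sum()       # P(win) = P(clearing price <= our bid)
exp_cost = (probs[win] * cls[win]).sum() / probs[win].sum()   # expected price given a win
print(p_win, exp_cost)
```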

Page 12: Data Science at Flurry

AppCircle Conversion Rates: Local Hour of Day

[Charts: regression weights (coefficients) for localHourOfDay, and conversion probability by localHourOfDay (0–23 hours); annotations highlight 12 noon, 4 pm, 6 pm and 7 pm]
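The left chart shows the weights a logistic regression learns when localHourOfDay is one-hot encoded; a small illustrative sketch of how such per-hour coefficients could be fit and inspected (synthetic data, not Flurry's):

```python
# Sketch: one-hot encode localHourOfDay and inspect the learned per-hour coefficients.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)
hours = rng.integers(0, 24, size=2000).reshape(-1, 1)     # localHourOfDay per impression
# Synthetic conversions: higher rate in the afternoon/evening hours.
rate = np.where((hours[:, 0] >= 12) & (hours[:, 0] <= 19), 0.05, 0.01)
converted = (rng.random(2000) < rate).astype(int)

enc = OneHotEncoder(categories=[list(range(24))])
X = enc.fit_transform(hours)                               # sparse 24-column indicator matrix
model = LogisticRegression(max_iter=1000).fit(X, converted)

for hour, coef in enumerate(model.coef_[0]):
    print(hour, round(coef, 3))   # larger coefficient -> higher conversion odds at that hour
```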

Page 13: Data Science at Flurry

Machine Learning Workflow

• How much data is enough?
• Parallelize Feature Generation vs. Model Generation
• Interpretable vs. Black-box models
• Batch vs. Online learning
• Time to Score a Model
• Unbalanced Data
• Over-fitting & Regularization
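Two of those bullets, unbalanced data and over-fitting & regularization, map to explicit knobs in most libraries; a scikit-learn flavored sketch with synthetic data:

```python
# Sketch: class imbalance handling and L2 regularization in a single model.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic, highly unbalanced data: ~1% positives, similar in spirit to ad conversions.
X, y = make_classification(n_samples=20000, n_features=20, weights=[0.99], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

model = LogisticRegression(
    class_weight="balanced",   # re-weight the rare positive class
    penalty="l2", C=0.1,       # stronger L2 regularization to limit over-fitting
    max_iter=1000,
).fit(X_train, y_train)

print(model.score(X_test, y_test))
```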

Page 14: Data Science at Flurry

Recommender System

Recommender System as an Ad-ranking method

Given users and apps they have installed in the past, what other apps are they likely to install?

Given users and their app usage (time-spent), what new apps are they likely to highly engage with?

Page 15: Data Science at Flurry

[Figure: bipartite graph of users and apps, with edges labeled by the time spent in each app (0.1hr to 3hr)]

Recommender System

• Item-Item based Collaborative Filtering:
– Missing value prediction

[Figure: user × app time-spent matrix with columns App1, App2, App3, App4; missing entries are predicted]
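A compact sketch of item-item collaborative filtering on a user × app time-spent matrix, predicting one missing entry from the similarity between app columns (the numbers and the App1..App4 layout are illustrative):

```python
# Sketch: item-item collaborative filtering with missing-value prediction.
import numpy as np

# Rows = users, columns = App1..App4, values = hours spent (0.0 = unknown / not installed).
R = np.array([
    [1.0, 1.6, 0.0, 0.3],
    [2.0, 2.1, 3.0, 0.0],   # predict this user's missing App4 entry
    [0.6, 1.5, 1.2, 0.1],
    [2.0, 0.8, 2.0, 0.3],
])

def cosine(a, b):
    mask = (a > 0) & (b > 0)          # compare only users observed for both apps
    if not mask.any():
        return 0.0
    a, b = a[mask], b[mask]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

user, item = 1, 3                      # user 2's missing App4 value
sims = np.array([cosine(R[:, item], R[:, j]) if j != item else 0.0
                 for j in range(R.shape[1])])
observed = R[user] > 0                 # apps this user has actually used
pred = (sims[observed] @ R[user][observed]) / sims[observed].sum()
print(round(pred, 2))                  # predicted hours in App4 for user 2
```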

Page 16: Data Science at Flurry

Engagement Model – Android All

• Category of SocialApp: Social
• Number of users of SocialApp: 2,227
• Number of predicted users of SocialApp: 1,131


Page 17: Data Science at Flurry

Engagement Model – Android All

• Category of SocialApp: Social
• Number of users of SocialApp: 2,227
• Number of predicted users of SocialApp: 1,131


Page 18: Data Science at Flurry

Other Flurry Data Science Problems

Age and Gender Estimation

Click Fraud Detection

Optimize AppSpot Waterfall