Data Science at Flurry

Soups Ranjan, PhD, [email protected]

Description

This talk is about the data science problems that the Data Science team at Flurry works on. In particular, it dives into one of the problems we are solving: a machine-learning-driven bidding strategy for bidding on mobile Real-Time Bidding (RTB) ad exchanges.

Transcript of Data Science at Flurry

Page 1: Data Science at Flurry

Data Science at Flurry

Soups Ranjan, [email protected]

Page 2: Data Science at Flurry

We all know that when we’re talking about mobile, we’re talking about apps

Source: Nielsen – State of the App Nation 2012 Report and June 2013 Cross-Platform Report

Time spent on mobile devices

Page 3: Data Science at Flurry

Flurry has the deepest insight into consumer behavior on mobile

[Chart: Monthly Device Reach (Millions) – Flurry 1,200; Facebook 875; Google 700; Millennial Media 400; Twitter 320; JumpTap 100]

Source: Data gathered from public statements/filings by companies; Facebook denotes property and network; Google reach denotes sites and network

Page 4: Data Science at Flurry

• Flurry Analytics – Track users, sessions, events and crashes

• Flurry AppCircle – Advertise with Flurry to acquire new users for your app

• Flurry AppSpot – Monetize your app traffic via ads

Flurry Product Overview

Page 5: Data Science at Flurry

• AppCircle: Advertiser configuration to set up an ad (a hypothetical sketch follows below):
– Ad type: CPI, CPC, CP Video
– Corresponding bid
– Ad format: Banner or Interstitial
– Targeting (Age, Gender, Device, Location, Persona)

• AppCircle Bidder:
– Optimally acquire ad-space inventory where ads can be shown

AppCircle – Advertise to Acquire Users
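As a rough illustration of that configuration step, an ad could be represented along these lines (a hypothetical sketch; the field names are ours, not Flurry's API):

```python
# Hypothetical sketch of an AppCircle-style ad configuration (illustrative field names).
from dataclasses import dataclass, field

@dataclass
class AdConfig:
    ad_type: str          # "CPI", "CPC", or "CP Video"
    bid: float            # advertiser's bid for the chosen ad type, in dollars
    ad_format: str        # "Banner" or "Interstitial"
    targeting: dict = field(default_factory=dict)   # age, gender, device, location, persona

campaign = AdConfig(
    ad_type="CPI",
    bid=1.50,
    ad_format="Interstitial",
    targeting={"age": "18-34", "gender": "any", "device": "iPhone", "location": "US"},
)
print(campaign)
```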

Page 6: Data Science at Flurry

AppCircle Bidder Strategy

[Diagram: bidder flow]
• A Bid Request (user, pub, exchange) arrives from the exchange along with the set of {Eligible Ads}
• Cost Model (Bid Price Estimation): uses the history of bids and win prices to produce (Ad1, Bid1, P(win)1) … (Adn, Bidn, P(win)n)
• Revenue Model: uses the history of ad impressions and conversions to produce (Ad1, AdvBid1, P(conv)1) … (Adn, AdvBidn, P(conv)n)
• Budget Pacing: applies advertiser goals (α, β) given each ad's AdvBid, daily budget, and spend
• Ad Selector: picks the ad and its bid price, and the chosen ad is bid on the exchange

Page 7: Data Science at Flurry

Bidder Ad Selection Model - I

Ad Selection Model:
SelectAd(adv, pub, exchange, user) = argmax over ads of [ P(win)^α × (Revenue(adv, pub, exchange, user) − β × Cost(adv, pub, exchange, user)) ]

• Maximize margin model (α = β = 1):
SelectAd(adv, pub, exchange, user) = argmax over ads of [ P(win) × (Revenue(adv, pub, exchange, user) − Cost(adv, pub, exchange, user)) ]

– May lead to a lower advertiser fill rate, as we will then only bid to show an advertiser's ad when we expect to win at a price lower than the advertiser's bid

Ad    Rev (eCPM)   Cost   P(win)   Rank
Adv1  1.50         1.30   0.30     0.3 × (1.5 − 1.3) = 0.06
Adv2  0.60         0.50   0.70     0.7 × (0.6 − 0.5) = 0.07
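A minimal sketch of this scoring rule in Python, using the two example ads above (our own illustrative code; α and β are left as parameters):

```python
# Sketch of the selection rule: argmax over ads of P(win)^alpha * (Revenue - beta * Cost).
def score(p_win, revenue, cost, alpha=1.0, beta=1.0):
    return (p_win ** alpha) * (revenue - beta * cost)

def select_ad(ads, alpha=1.0, beta=1.0):
    """ads: list of (name, revenue_ecpm, cost_ecpm, p_win) tuples."""
    return max(ads, key=lambda a: score(a[3], a[1], a[2], alpha, beta))

ads = [("Adv1", 1.50, 1.30, 0.30), ("Adv2", 0.60, 0.50, 0.70)]

# Maximize-margin model (alpha = beta = 1): Adv2 wins, 0.07 > 0.06.
print(select_ad(ads, alpha=1.0, beta=1.0)[0])
```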

Page 8: Data Science at Flurry

Bidder Ad Selection Model - II

• Maximize fill rate for advertiser (α = 1, β = 0):
SelectAd(adv, pub, exchange, user) = argmax over ads of [ P(win) × Revenue(adv, pub, exchange, user) ]

– We select the ad that maximizes our revenue goals; however, we only bid if the revenue > cost

Ad    Rev (eCPM)   Cost   P(win)   Rank
Adv1  1.50         1.30   0.30     0.3 × 1.5 = 0.45
Adv2  0.60         0.50   0.70     0.7 × 0.6 = 0.42
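Setting β = 0 reduces the same rule to ranking by P(win) × Revenue; a short self-contained sketch reproducing the table above (the revenue > cost guard reflects the bullet's caveat):

```python
# Fill-rate model (alpha = 1, beta = 0): rank by P(win) * Revenue,
# but only bid at all when revenue exceeds cost.
ads = [("Adv1", 1.50, 1.30, 0.30), ("Adv2", 0.60, 0.50, 0.70)]  # (name, rev, cost, p_win)
eligible = [a for a in ads if a[1] > a[2]]            # revenue > cost guard
best = max(eligible, key=lambda a: a[3] * a[1])       # P(win) * Revenue
print(best[0])                                        # -> Adv1 (0.45 > 0.42)
```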

Page 9: Data Science at Flurry

• Ad Revenue Optimization problem:
– Max: P(conv) × bid
– Conversion Prediction Model: Max P(conv)

• Historical Estimation:
– Past conversion rate as a predictor of future conversion rates

• ML Conversion Prediction Model:
– Features: Publisher, Ad, User, Time, Location

AppCircle: Ad Revenue Optimization:

[Chart: conv-prob by user id for users who saw Ad1 in Pub1's app, with the average conv-prob marked]
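One way to set up such a conversion prediction model is a logistic regression over one-hot encoded impression features; the sketch below is a scikit-learn flavored illustration, and the column names and rows are made up rather than Flurry's schema:

```python
# Sketch: P(conv) model as logistic regression over categorical impression features.
import pandas as pd
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

impressions = pd.DataFrame([
    {"publisher": "Pub1", "ad": "Ad1", "local_hour": 12, "country": "US", "converted": 1},
    {"publisher": "Pub1", "ad": "Ad1", "local_hour": 3,  "country": "US", "converted": 0},
    {"publisher": "Pub2", "ad": "Ad2", "local_hour": 19, "country": "CA", "converted": 0},
    {"publisher": "Pub2", "ad": "Ad1", "local_hour": 18, "country": "US", "converted": 1},
])

# One-hot encode the categorical features (publisher, ad, local time, location).
features = impressions.drop(columns="converted")
features["local_hour"] = features["local_hour"].astype(str)   # treat hour as categorical
vec = DictVectorizer()
X = vec.fit_transform(features.to_dict(orient="records"))
y = impressions["converted"]

model = LogisticRegression(max_iter=1000).fit(X, y)
p_conv = model.predict_proba(X)[:, 1]   # expected revenue of an ad ~ p_conv * advertiser bid
```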

Page 10: Data Science at Flurry

Bidder Cost Model

Cost model: We don't know about the other players in the auction. The best we can do is predict based on our own wins and losses.

1) If historically we win auctions for users in Kansas City =>
2) Most likely, other bidders are not interested in Kansas City users =>
3) Next time, we'll lower our bid for Kansas City users =>
4) If we still win those Kansas City users, continue (1–3) =>
5) If not, we will revise our bid back up
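A toy sketch of that feedback loop (our own illustration of steps 1–5, not Flurry's production logic):

```python
# Toy sketch of the exploratory bid-adjustment loop: probe lower while we keep
# winning a segment, and revise the bid back up once we start losing.
def adjust_bid(current_bid, won_last_auction, step=0.05, floor=0.01):
    if won_last_auction:
        # Other bidders likely aren't competing hard for this segment; bid lower next time.
        return max(floor, current_bid * (1 - step))
    # We lost, so we under-bid; revise the bid back up.
    return current_bid * (1 + step)

bid = 0.50  # starting bid for the "Kansas City users" segment, in dollars
for won in [True, True, True, False, True]:   # simulated auction outcomes
    bid = adjust_bid(bid, won)
    print(round(bid, 3))
```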

Page 11: Data Science at Flurry

A machine-learnt model gives us both cost and P(win):

Multi-class classification model (Logistic Regression) to predict the win price of an ad impression

Machine Learning based Bidder Cost Model

[Diagram: predicted win-price classes, e.g. win-price = 27c, 28c, 52c, and No Win, with class probabilities ranging from P(win) ~ 0.0 to P(win) ~ 1.0]
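A minimal sketch of that idea: a multinomial logistic regression over discretized win-price classes plus a "No Win" class, from which both P(win) at a candidate bid and an expected clearing price can be read off. Features, prices and data below are illustrative only:

```python
# Sketch: multi-class win-price model (classes are clearing prices in cents; -1 = no win).
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy features per historical bid request (e.g. publisher / geo indicators).
X = np.array([[1, 0], [1, 0], [0, 1], [0, 1], [1, 1], [1, 1]])
y = np.array([27, 28, 52, 52, -1, 27])        # observed win prices; -1 means we lost

model = LogisticRegression(max_iter=1000).fit(X, y)

probs = model.predict_proba([[1, 0]])[0]      # distribution over win-price classes
cls = model.classes_
win = cls != -1

bid = 30                                      # candidate bid, in cents
p_win = probs[win & (cls <= bid)].sum()       # P(win) = P(clearing price <= our bid)
exp_cost = (probs[win] * cls[win]).sum() / probs[win].sum()   # expected price given a win
print(p_win, exp_cost)
```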

Page 12: Data Science at Flurry

AppCircle Conversion Rates: Local Hour of Day

[Charts: regression weights (coefficients) for localHourOfDay, and conversion probability by localHourOfDay (0–23 hours); annotations highlight 12 noon, 4 pm, 6 pm and 7 pm]
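The left chart shows the weights a logistic regression learns when localHourOfDay is one-hot encoded; a small illustrative sketch of how such per-hour coefficients could be fit and inspected (synthetic data, not Flurry's):

```python
# Sketch: one-hot encode localHourOfDay and inspect the learned per-hour coefficients.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)
hours = rng.integers(0, 24, size=2000).reshape(-1, 1)     # localHourOfDay per impression
# Synthetic conversions: higher rate in the afternoon/evening hours.
rate = np.where((hours[:, 0] >= 12) & (hours[:, 0] <= 19), 0.05, 0.01)
converted = (rng.random(2000) < rate).astype(int)

enc = OneHotEncoder(categories=[list(range(24))])
X = enc.fit_transform(hours)                               # sparse 24-column indicator matrix
model = LogisticRegression(max_iter=1000).fit(X, converted)

for hour, coef in enumerate(model.coef_[0]):
    print(hour, round(coef, 3))   # larger coefficient -> higher conversion odds at that hour
```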

Page 13: Data Science at Flurry

Machine Learning Workflow

• How much data is enough?
• Parallelize Feature Generation vs. Model Generation
• Interpretable vs. Black-box models
• Batch vs. Online learning
• Time to Score a Model
• Unbalanced Data
• Over-fitting & Regularization
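Two of those bullets, unbalanced data and over-fitting & regularization, map to explicit knobs in most libraries; a scikit-learn flavored sketch with synthetic data:

```python
# Sketch: class imbalance handling and L2 regularization in a single model.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic, highly unbalanced data: ~1% positives, similar in spirit to ad conversions.
X, y = make_classification(n_samples=20000, n_features=20, weights=[0.99], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

model = LogisticRegression(
    class_weight="balanced",   # re-weight the rare positive class
    penalty="l2", C=0.1,       # stronger L2 regularization to limit over-fitting
    max_iter=1000,
).fit(X_train, y_train)

print(model.score(X_test, y_test))
```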

Page 14: Data Science at Flurry

Recommender System

Recommender System as an Ad-ranking method

Given users and apps they have installed in the past, what other apps are they likely to install?

Given users and their app usage (time-spent), what new apps are they likely to highly engage with?

Page 15: Data Science at Flurry

[Figure: bipartite graph of users and apps, with edges labeled by the time spent in each app (0.1hr to 3hr)]

Recommender System

• Item-Item based Collaborative Filtering:
– Missing value prediction

[Figure: user × app time-spent matrix with columns App1, App2, App3, App4; missing entries are predicted]
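A compact sketch of item-item collaborative filtering on a user × app time-spent matrix, predicting one missing entry from the similarity between app columns (the numbers and the App1..App4 layout are illustrative):

```python
# Sketch: item-item collaborative filtering with missing-value prediction.
import numpy as np

# Rows = users, columns = App1..App4, values = hours spent (0.0 = unknown / not installed).
R = np.array([
    [1.0, 1.6, 0.0, 0.3],
    [2.0, 2.1, 3.0, 0.0],   # predict this user's missing App4 entry
    [0.6, 1.5, 1.2, 0.1],
    [2.0, 0.8, 2.0, 0.3],
])

def cosine(a, b):
    mask = (a > 0) & (b > 0)          # compare only users observed for both apps
    if not mask.any():
        return 0.0
    a, b = a[mask], b[mask]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

user, item = 1, 3                      # user 2's missing App4 value
sims = np.array([cosine(R[:, item], R[:, j]) if j != item else 0.0
                 for j in range(R.shape[1])])
observed = R[user] > 0                 # apps this user has actually used
pred = (sims[observed] @ R[user][observed]) / sims[observed].sum()
print(round(pred, 2))                  # predicted hours in App4 for user 2
```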

Page 16: Data Science at Flurry

Engagement Model – Android All

• Category of SocialApp: Social
• Number of users of SocialApp: 2,227
• Number of predicted users of SocialApp: 1,131


Page 17: Data Science at Flurry

Engagement Model – Android All

• Category of SocialApp: Social
• Number of users of SocialApp: 2,227
• Number of predicted users of SocialApp: 1,131


Page 18: Data Science at Flurry

Other Flurry Data Science Problems

Age and Gender Estimation

Click Fraud Detection

Optimize AppSpot Waterfall