Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

127

Transcript of Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Page 1: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha
Page 2: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

I listen to ~ 100 Bln ad opportunities daily

I respond with optimal bids within milliseconds

I petabytes of data (ad impressions, visits, clicks, conversions)

Page 3: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

I listen to ~ 100 Bln ad opportunities daily

I respond with optimal bids within milliseconds

I petabytes of data (ad impressions, visits, clicks, conversions)

Page 4: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

I listen to ~ 100 Bln ad opportunities daily

I respond with optimal bids within milliseconds

I petabytes of data (ad impressions, visits, clicks, conversions)

Page 5: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

I listen to ~ 100 Bln ad opportunities daily

I respond with optimal bids within milliseconds

I petabytes of data (ad impressions, visits, clicks, conversions)

Page 6: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Predicting user response to ads is a Machine-Learning problem.

but quantifying impact of ad-exposure is a Measurement probem.

Page 7: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Predicting user response to ads is a Machine-Learning problem.but quantifying impact of ad-exposure is a Measurement probem.

Page 8: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Spark: existing vs simulated data

Most Spark applications process existing big data-sets.

Today we’re talking about analyzing simulated big data

Page 9: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Spark: existing vs simulated data

Most Spark applications process existing big data-sets.Today we’re talking about analyzing simulated big data

Page 10: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Key Conceptual Take-aways

I Issues in Ad lift measurement

I Proper definitionI Confidence bounds

I Bayesian Methods for Ad Lift Confidence Bounds

I Gibbs Sampling (MCMC – Markov Chain Monte Carlo)

I Using Spark for:

I Monte Carlo sampling for confidence-boundsI Monte Carlo simulations

Page 11: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Key Conceptual Take-aways

I Issues in Ad lift measurementI Proper definition

I Confidence bounds

I Bayesian Methods for Ad Lift Confidence Bounds

I Gibbs Sampling (MCMC – Markov Chain Monte Carlo)

I Using Spark for:

I Monte Carlo sampling for confidence-boundsI Monte Carlo simulations

Page 12: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Key Conceptual Take-aways

I Issues in Ad lift measurementI Proper definitionI Confidence bounds

I Bayesian Methods for Ad Lift Confidence Bounds

I Gibbs Sampling (MCMC – Markov Chain Monte Carlo)

I Using Spark for:

I Monte Carlo sampling for confidence-boundsI Monte Carlo simulations

Page 13: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Key Conceptual Take-aways

I Issues in Ad lift measurementI Proper definitionI Confidence bounds

I Bayesian Methods for Ad Lift Confidence Bounds

I Gibbs Sampling (MCMC – Markov Chain Monte Carlo)

I Using Spark for:

I Monte Carlo sampling for confidence-boundsI Monte Carlo simulations

Page 14: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Key Conceptual Take-aways

I Issues in Ad lift measurementI Proper definitionI Confidence bounds

I Bayesian Methods for Ad Lift Confidence BoundsI Gibbs Sampling (MCMC – Markov Chain Monte Carlo)

I Using Spark for:

I Monte Carlo sampling for confidence-boundsI Monte Carlo simulations

Page 15: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Key Conceptual Take-aways

I Issues in Ad lift measurementI Proper definitionI Confidence bounds

I Bayesian Methods for Ad Lift Confidence BoundsI Gibbs Sampling (MCMC – Markov Chain Monte Carlo)

I Using Spark for:

I Monte Carlo sampling for confidence-boundsI Monte Carlo simulations

Page 16: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Key Conceptual Take-aways

I Issues in Ad lift measurementI Proper definitionI Confidence bounds

I Bayesian Methods for Ad Lift Confidence BoundsI Gibbs Sampling (MCMC – Markov Chain Monte Carlo)

I Using Spark for:I Monte Carlo sampling for confidence-bounds

I Monte Carlo simulations

Page 17: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Key Conceptual Take-aways

I Issues in Ad lift measurementI Proper definitionI Confidence bounds

I Bayesian Methods for Ad Lift Confidence BoundsI Gibbs Sampling (MCMC – Markov Chain Monte Carlo)

I Using Spark for:I Monte Carlo sampling for confidence-boundsI Monte Carlo simulations

Page 18: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Application context: ad impact measurement

I Advertisers want to know the impact of showing ads to users.

Page 19: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Measuring Ad Impact: Two Approaches

I Observational studies:

I Compare uses who happen to be exposed vs not exposedI Bias a big issue

I Randomized tests:

I Randomly expose to test, compare with control (un-exposed)

Page 20: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Measuring Ad Impact: Two Approaches

I Observational studies:I Compare uses who happen to be exposed vs not exposed

I Bias a big issue

I Randomized tests:

I Randomly expose to test, compare with control (un-exposed)

Page 21: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Measuring Ad Impact: Two Approaches

I Observational studies:I Compare uses who happen to be exposed vs not exposedI Bias a big issue

I Randomized tests:

I Randomly expose to test, compare with control (un-exposed)

Page 22: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Measuring Ad Impact: Two Approaches

I Observational studies:I Compare uses who happen to be exposed vs not exposedI Bias a big issue

I Randomized tests:

I Randomly expose to test, compare with control (un-exposed)

Page 23: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Measuring Ad Impact: Two Approaches

I Observational studies:I Compare uses who happen to be exposed vs not exposedI Bias a big issue

I Randomized tests:I Randomly expose to test, compare with control (un-exposed)

Page 24: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Ideal Randomized Test

Page 25: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Ideal Randomized Test

Page 26: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Ideal Randomized Test

Page 27: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Ideal Randomized Test: Ad lift

Page 28: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Ideal Randomized Test: Ad lift

Page 29: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Ad Lift: Response Rates

If we see k = 200 conversions out of N = 10, 000 users,

what is a good estimate for the response-rate?

Estimated response-rate R̂ = k/N = 200/10, 000 = 2%. . .But how confident are we?

Page 30: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Ad Lift: Response Rates

If we see k = 200 conversions out of N = 10, 000 users,

what is a good estimate for the response-rate?

Estimated response-rate R̂ = k/N = 200/10, 000 = 2%. . .

But how confident are we?

Page 31: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Ad Lift: Response Rates

If we see k = 200 conversions out of N = 10, 000 users,

what is a good estimate for the response-rate?

Estimated response-rate R̂ = k/N = 200/10, 000 = 2%. . .But how confident are we?

Page 32: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Response Rate 90% Confidence Bounds

P(R > R̂ | r = q5) = 5%P(R < R̂ | r = q95) = 5%

Page 33: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Response Rate 90% Confidence Bounds

P(R > R̂ | r = q5) = 5%

P(R < R̂ | r = q95) = 5%

Page 34: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Response Rate 90% Confidence Bounds

P(R > R̂ | r = q5) = 5%P(R < R̂ | r = q95) = 5%

Page 35: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Response-Rate Confidence Bounds

Page 36: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Response-Rate Confidence Bounds

Page 37: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Response-Rate Confidence Bounds

Page 38: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Response-Rate Confidence Bounds

How to find (q5, q95) ?

Page 39: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha
Page 40: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Response-Rate: Bayesian Confidence Bounds

Randomly generate response rates that are consistent with the data.

(Sample rates from posterior distribution given data.)Find the (0.05, 0.95) quantiles of these rates.

Page 41: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Response-Rate: Bayesian Confidence Bounds

Randomly generate response rates that are consistent with the data.(Sample rates from posterior distribution given data.)

Find the (0.05, 0.95) quantiles of these rates.

Page 42: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Response-Rate: Bayesian Confidence Bounds

Randomly generate response rates that are consistent with the data.(Sample rates from posterior distribution given data.)Find the (0.05, 0.95) quantiles of these rates.

Page 43: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Response-Rate: Bayesian Confidence Bounds

I Assume an unknown true rate r , with a prior distrib. p(r)I assume p(r) = Beta(1, 1) = Unif (0, 1)

I Sample from the posterior distribution of the rate r

I conditional on the observed data (k conversions out of N)

P(r | k) Ã P(k | r) · p(r)

à r

k(1 ≠ r)N≠k · Beta(1, 1)Ã r

k+1(1 ≠ r)N≠k+1

à Beta(k + 1, N ≠ k + 1)

I Compute (0.05, 0.95) quantiles from the generated rates.

Page 44: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Response-Rate: Bayesian Confidence Bounds

I Assume an unknown true rate r , with a prior distrib. p(r)I assume p(r) = Beta(1, 1) = Unif (0, 1)

I Sample from the posterior distribution of the rate r

I conditional on the observed data (k conversions out of N)

P(r | k) Ã P(k | r) · p(r)

à r

k(1 ≠ r)N≠k · Beta(1, 1)Ã r

k+1(1 ≠ r)N≠k+1

à Beta(k + 1, N ≠ k + 1)

I Compute (0.05, 0.95) quantiles from the generated rates.

Page 45: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Response-Rate: Bayesian Confidence Bounds

I Assume an unknown true rate r , with a prior distrib. p(r)I assume p(r) = Beta(1, 1) = Unif (0, 1)

I Sample from the posterior distribution of the rate r

I conditional on the observed data (k conversions out of N)

P(r | k) Ã P(k | r) · p(r)Ã r

k(1 ≠ r)N≠k · Beta(1, 1)

à r

k+1(1 ≠ r)N≠k+1

à Beta(k + 1, N ≠ k + 1)

I Compute (0.05, 0.95) quantiles from the generated rates.

Page 46: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Response-Rate: Bayesian Confidence Bounds

I Assume an unknown true rate r , with a prior distrib. p(r)I assume p(r) = Beta(1, 1) = Unif (0, 1)

I Sample from the posterior distribution of the rate r

I conditional on the observed data (k conversions out of N)

P(r | k) Ã P(k | r) · p(r)Ã r

k(1 ≠ r)N≠k · Beta(1, 1)Ã r

k+1(1 ≠ r)N≠k+1

à Beta(k + 1, N ≠ k + 1)

I Compute (0.05, 0.95) quantiles from the generated rates.

Page 47: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Response-Rate: Bayesian Confidence Bounds

I Assume an unknown true rate r , with a prior distrib. p(r)I assume p(r) = Beta(1, 1) = Unif (0, 1)

I Sample from the posterior distribution of the rate r

I conditional on the observed data (k conversions out of N)

P(r | k) Ã P(k | r) · p(r)Ã r

k(1 ≠ r)N≠k · Beta(1, 1)Ã r

k+1(1 ≠ r)N≠k+1

à Beta(k + 1, N ≠ k + 1)

I Compute (0.05, 0.95) quantiles from the generated rates.

Page 48: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Response-Rate: Bayesian Confidence Bounds

I Assume an unknown true rate r , with a prior distrib. p(r)I assume p(r) = Beta(1, 1) = Unif (0, 1)

I Sample from the posterior distribution of the rate r

I conditional on the observed data (k conversions out of N)

P(r | k) Ã P(k | r) · p(r)Ã r

k(1 ≠ r)N≠k · Beta(1, 1)Ã r

k+1(1 ≠ r)N≠k+1

à Beta(k + 1, N ≠ k + 1)

I Compute (0.05, 0.95) quantiles from the generated rates.

Page 49: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Response-Rate: Bayesian Confidence Bounds

A simple form of Gibbs Sampling (more later):

I sample M values of r from posteriorP(r | k) ≥ Beta(k + 1, N ≠ k + 1).

I compute (0.05, 0.95) quantiles

from numpy.random import beta

from scipy.stats.mstats import mquantiles

def conf(N, k, samples = 500):

rates = beta(k+1, N-k+1, samples)

return mquantiles(rates, prob = [0.05, 0.95])

Page 50: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Response-Rate: Bayesian Confidence Bounds

A simple form of Gibbs Sampling (more later):

I sample M values of r from posteriorP(r | k) ≥ Beta(k + 1, N ≠ k + 1).

I compute (0.05, 0.95) quantiles

from numpy.random import beta

from scipy.stats.mstats import mquantiles

def conf(N, k, samples = 500):

rates = beta(k+1, N-k+1, samples)

return mquantiles(rates, prob = [0.05, 0.95])

Page 51: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Response-Rate: Bayesian Confidence Bounds

Page 52: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Response-Rate: Bayesian Confidence Bounds

Page 53: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Response-Rate: Bayesian Confidence Bounds

Page 54: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Response-Rate: Bayesian Confidence Bounds

Page 55: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Response-Rate: Bayesian Confidence Bounds

Page 56: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Response Rates: Example

If we see k = 200 conversions out of N = 10, 000 users,

what is a good estimate for the response-rate?

Estimated response-rate R̂ = k/N = 200/10, 000 = 2%. . .

=∆ 90% confidence region (1.8%, 2.2%)

Page 57: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Response Rates: Example

If we see k = 200 conversions out of N = 10, 000 users,

what is a good estimate for the response-rate?

Estimated response-rate R̂ = k/N = 200/10, 000 = 2%. . .=∆ 90% confidence region (1.8%, 2.2%)

Page 58: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

We’ve talked about Response Rates. . .

now let’s consider Ad Lift

Page 59: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Ad Lift: Simple Example

I control: 10,000 users, 200 conversionsI test: 100,000 users, 2200 conversions

Observed response-rates:

I control: R̂c = 200/10, 000 = 2%I test: R̂t = 2200/100, 000 = 2.2%

Estimated Lift L̂ = 2.2/2 ≠ 1 = 10%

This is a great lift !Not so fast! Is this a reliable estimate?Could true lift ¸ be 0%, or even negative ?

Page 60: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Ad Lift: Simple Example

I control: 10,000 users, 200 conversionsI test: 100,000 users, 2200 conversions

Observed response-rates:

I control: R̂c = 200/10, 000 = 2%I test: R̂t = 2200/100, 000 = 2.2%

Estimated Lift L̂ = 2.2/2 ≠ 1 = 10%This is a great lift !

Not so fast! Is this a reliable estimate?Could true lift ¸ be 0%, or even negative ?

Page 61: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Ad Lift: Simple Example

I control: 10,000 users, 200 conversionsI test: 100,000 users, 2200 conversions

Observed response-rates:

I control: R̂c = 200/10, 000 = 2%I test: R̂t = 2200/100, 000 = 2.2%

Estimated Lift L̂ = 2.2/2 ≠ 1 = 10%This is a great lift !Not so fast! Is this a reliable estimate?

Could true lift ¸ be 0%, or even negative ?

Page 62: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Ad Lift: Simple Example

I control: 10,000 users, 200 conversionsI test: 100,000 users, 2200 conversions

Observed response-rates:

I control: R̂c = 200/10, 000 = 2%I test: R̂t = 2200/100, 000 = 2.2%

Estimated Lift L̂ = 2.2/2 ≠ 1 = 10%This is a great lift !Not so fast! Is this a reliable estimate?Could true lift ¸ be 0%, or even negative ?

Page 63: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Ad Lift: Bayesian Confidence Bounds

Sampling approach:Observed data: control: (kc , Nc), test: (kt , Nt)

1. Repeat M times:

I draw control response rate rc from posterior

P(rc | kc) ≥ Beta(kc + 1, Nc ≠ kc + 1).

I draw test response rate rt from posterior

P(rt | kt) ≥ Beta(kt + 1, Nt ≠ kt + 1).

I compute lift L = rt/rc ≠ 1

2. Compute (0.05, 0.95) quantiles of set of M lifts {L}.

Page 64: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Ad Lift: Bayesian Confidence Bounds

Sampling approach:Observed data: control: (kc , Nc), test: (kt , Nt)

1. Repeat M times:

I draw control response rate rc from posterior

P(rc | kc) ≥ Beta(kc + 1, Nc ≠ kc + 1).

I draw test response rate rt from posterior

P(rt | kt) ≥ Beta(kt + 1, Nt ≠ kt + 1).

I compute lift L = rt/rc ≠ 1

2. Compute (0.05, 0.95) quantiles of set of M lifts {L}.

Page 65: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Ad Lift: Bayesian Confidence Bounds

Sampling approach:Observed data: control: (kc , Nc), test: (kt , Nt)

1. Repeat M times:

I draw control response rate rc from posterior

P(rc | kc) ≥ Beta(kc + 1, Nc ≠ kc + 1).

I draw test response rate rt from posterior

P(rt | kt) ≥ Beta(kt + 1, Nt ≠ kt + 1).

I compute lift L = rt/rc ≠ 1

2. Compute (0.05, 0.95) quantiles of set of M lifts {L}.

Page 66: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Ad Lift: Bayesian Confidence Bounds

Sampling approach:Observed data: control: (kc , Nc), test: (kt , Nt)

1. Repeat M times:

I draw control response rate rc from posterior

P(rc | kc) ≥ Beta(kc + 1, Nc ≠ kc + 1).

I draw test response rate rt from posterior

P(rt | kt) ≥ Beta(kt + 1, Nt ≠ kt + 1).

I compute lift L = rt/rc ≠ 1

2. Compute (0.05, 0.95) quantiles of set of M lifts {L}.

Page 67: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Ad Lift: Bayesian Confidence Bounds

Sampling approach:Observed data: control: (kc , Nc), test: (kt , Nt)

1. Repeat M times:

I draw control response rate rc from posterior

P(rc | kc) ≥ Beta(kc + 1, Nc ≠ kc + 1).

I draw test response rate rt from posterior

P(rt | kt) ≥ Beta(kt + 1, Nt ≠ kt + 1).

I compute lift L = rt/rc ≠ 1

2. Compute (0.05, 0.95) quantiles of set of M lifts {L}.

Page 68: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Ad Lift: Bayesian Confidence Intervals

I control: nc = 10, 000 users, kc = 200 conversionsI test: nt = 100, 000 users, kt = 2, 200 conversions

Observed response-rates:

I control: R̂c = 200/10, 000 = 2%I test: R̂t = 2200/100, 000 = 2.2%

Estimated Lift L̂ = 2.2/2 ≠ 1 = 10%

90% confidence interval: (≠2.7%, 23.6%)

Page 69: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Ad Lift: Bayesian Confidence Intervals

I control: nc = 10, 000 users, kc = 200 conversionsI test: nt = 100, 000 users, kt = 2, 200 conversions

Observed response-rates:

I control: R̂c = 200/10, 000 = 2%I test: R̂t = 2200/100, 000 = 2.2%

Estimated Lift L̂ = 2.2/2 ≠ 1 = 10%90% confidence interval: (≠2.7%, 23.6%)

Page 70: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha
Page 71: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Complication 1:

Auction win-bias

Page 72: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Ideal Randomized Test

Page 73: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Ideal Randomized Test

Page 74: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Ideal Randomized Test

Bids on control users are wasted!

Page 75: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Ideal Randomized Test

Bids on control users are wasted!

Page 76: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha
Page 77: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

A Less Wasteful Randomized Test

Page 78: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

A Less Wasteful Randomized Test: Win-bias

Cannot simply compare Test Winners (tw) and Control (c):

I test-winners selection bias: “win bias”

Page 79: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Ad Lift: Proper Definition

Page 80: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Ad Lift: Proper Definition

Page 81: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Ad Lift: Proper Definition

Page 82: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Ad Lift: Proper Definition

Page 83: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Ad Lift: Proper Definition

Page 84: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Ad Lift: Proper Definition

Page 85: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Ad Lift Estimation

Main ideas:

I observe test-losers response rate RtL

I observe test win-rate w

I we show one can estimate

R

0tw = Rc ≠ (1 ≠ w)RtL

w

I compute lift L = R

1tw /R

0tw ≠ 1

I similar to Treatment E�ect Under Non-compliance in clinicialtrials.

Page 86: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Ad Lift Estimation

Main ideas:

I observe test-losers response rate RtL

I observe test win-rate w

I we show one can estimate

R

0tw = Rc ≠ (1 ≠ w)RtL

w

I compute lift L = R

1tw /R

0tw ≠ 1

I similar to Treatment E�ect Under Non-compliance in clinicialtrials.

Page 87: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Ad Lift Estimation

Main ideas:

I observe test-losers response rate RtL

I observe test win-rate w

I we show one can estimate

R

0tw = Rc ≠ (1 ≠ w)RtL

w

I compute lift L = R

1tw /R

0tw ≠ 1

I similar to Treatment E�ect Under Non-compliance in clinicialtrials.

Page 88: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Ad Lift Estimation

Main ideas:

I observe test-losers response rate RtL

I observe test win-rate w

I we show one can estimate

R

0tw = Rc ≠ (1 ≠ w)RtL

w

I compute lift L = R

1tw /R

0tw ≠ 1

I similar to Treatment E�ect Under Non-compliance in clinicialtrials.

Page 89: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Ad Lift Estimation

Main ideas:

I observe test-losers response rate RtL

I observe test win-rate w

I we show one can estimate

R

0tw = Rc ≠ (1 ≠ w)RtL

w

I compute lift L = R

1tw /R

0tw ≠ 1

I similar to Treatment E�ect Under Non-compliance in clinicialtrials.

Page 90: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Ad Lift Estimation

How to compute the 90% confidence interval for L?

Page 91: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha
Page 92: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Ad Lift: Confidence Intervals with Gibbs sampler

Bayesian approach (details omitted, see Chickering/Pearl 1997):

I Assume a random parameter vector ◊ consisting of:

I user latent (potential) behaviorsI their probabilities

I Set up prior distribution on ◊ ≥ p(◊) (Dirichlet)

I Sample M values of unknown ◊ from posterior: Gibbs Sampler

P(◊ |Data) Ã P(Data | ◊) · p(◊)

I For each sampled ◊ compute lift L using above

I Compute (0.05, 0.95) quantiles of sampled L values

Page 93: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Ad Lift: Confidence Intervals with Gibbs sampler

Bayesian approach (details omitted, see Chickering/Pearl 1997):

I Assume a random parameter vector ◊ consisting of:

I user latent (potential) behaviorsI their probabilities

I Set up prior distribution on ◊ ≥ p(◊) (Dirichlet)

I Sample M values of unknown ◊ from posterior: Gibbs Sampler

P(◊ |Data) Ã P(Data | ◊) · p(◊)

I For each sampled ◊ compute lift L using above

I Compute (0.05, 0.95) quantiles of sampled L values

Page 94: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Ad Lift: Confidence Intervals with Gibbs sampler

Bayesian approach (details omitted, see Chickering/Pearl 1997):

I Assume a random parameter vector ◊ consisting of:I user latent (potential) behaviors

I their probabilities

I Set up prior distribution on ◊ ≥ p(◊) (Dirichlet)

I Sample M values of unknown ◊ from posterior: Gibbs Sampler

P(◊ |Data) Ã P(Data | ◊) · p(◊)

I For each sampled ◊ compute lift L using above

I Compute (0.05, 0.95) quantiles of sampled L values

Page 95: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Ad Lift: Confidence Intervals with Gibbs sampler

Bayesian approach (details omitted, see Chickering/Pearl 1997):

I Assume a random parameter vector ◊ consisting of:I user latent (potential) behaviorsI their probabilities

I Set up prior distribution on ◊ ≥ p(◊) (Dirichlet)

I Sample M values of unknown ◊ from posterior: Gibbs Sampler

P(◊ |Data) Ã P(Data | ◊) · p(◊)

I For each sampled ◊ compute lift L using above

I Compute (0.05, 0.95) quantiles of sampled L values

Page 96: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Ad Lift: Confidence Intervals with Gibbs sampler

Bayesian approach (details omitted, see Chickering/Pearl 1997):

I Assume a random parameter vector ◊ consisting of:I user latent (potential) behaviorsI their probabilities

I Set up prior distribution on ◊ ≥ p(◊) (Dirichlet)

I Sample M values of unknown ◊ from posterior: Gibbs Sampler

P(◊ |Data) Ã P(Data | ◊) · p(◊)

I For each sampled ◊ compute lift L using above

I Compute (0.05, 0.95) quantiles of sampled L values

Page 97: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Ad Lift: Confidence Intervals with Gibbs sampler

Bayesian approach (details omitted, see Chickering/Pearl 1997):

I Assume a random parameter vector ◊ consisting of:I user latent (potential) behaviorsI their probabilities

I Set up prior distribution on ◊ ≥ p(◊) (Dirichlet)

I Sample M values of unknown ◊ from posterior: Gibbs Sampler

P(◊ |Data) Ã P(Data | ◊) · p(◊)

I For each sampled ◊ compute lift L using above

I Compute (0.05, 0.95) quantiles of sampled L values

Page 98: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Ad Lift: Confidence Intervals with Gibbs sampler

Bayesian approach (details omitted, see Chickering/Pearl 1997):

I Assume a random parameter vector ◊ consisting of:I user latent (potential) behaviorsI their probabilities

I Set up prior distribution on ◊ ≥ p(◊) (Dirichlet)

I Sample M values of unknown ◊ from posterior: Gibbs Sampler

P(◊ |Data) Ã P(Data | ◊) · p(◊)

I For each sampled ◊ compute lift L using above

I Compute (0.05, 0.95) quantiles of sampled L values

Page 99: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Ad Lift: Confidence Intervals with Gibbs sampler

Bayesian approach (details omitted, see Chickering/Pearl 1997):

I Assume a random parameter vector ◊ consisting of:I user latent (potential) behaviorsI their probabilities

I Set up prior distribution on ◊ ≥ p(◊) (Dirichlet)

I Sample M values of unknown ◊ from posterior: Gibbs Sampler

P(◊ |Data) Ã P(Data | ◊) · p(◊)

I For each sampled ◊ compute lift L using above

I Compute (0.05, 0.95) quantiles of sampled L values

Page 100: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Ad Lift: Confidence Intervals

Gibbs sampler convergence may depend on prior distribution:

I start with multiple (say 100) priorsI run them all in parallel using Spark.

Page 101: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Ad Lift: Confidence Intervals

Gibbs sampler convergence may depend on prior distribution:

I start with multiple (say 100) priorsI run them all in parallel using Spark.

Page 102: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Uses of Monte Carlo Simulations

I confidence intervals

I determine “su�cient” population sizes for reliably estimating

I response ratesI lift

I understand e�ect of complex phenomena

I validate/verify analytical formulas

Page 103: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Uses of Monte Carlo Simulations

I confidence intervals

I determine “su�cient” population sizes for reliably estimating

I response ratesI lift

I understand e�ect of complex phenomena

I validate/verify analytical formulas

Page 104: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Uses of Monte Carlo Simulations

I confidence intervals

I determine “su�cient” population sizes for reliably estimatingI response rates

I lift

I understand e�ect of complex phenomena

I validate/verify analytical formulas

Page 105: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Uses of Monte Carlo Simulations

I confidence intervals

I determine “su�cient” population sizes for reliably estimatingI response ratesI lift

I understand e�ect of complex phenomena

I validate/verify analytical formulas

Page 106: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Uses of Monte Carlo Simulations

I confidence intervals

I determine “su�cient” population sizes for reliably estimatingI response ratesI lift

I understand e�ect of complex phenomena

I validate/verify analytical formulas

Page 107: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Uses of Monte Carlo Simulations

I confidence intervals

I determine “su�cient” population sizes for reliably estimatingI response ratesI lift

I understand e�ect of complex phenomenaI validate/verify analytical formulas

Page 108: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Complication 2:

Control contamination due to users with multiple cookies

Page 109: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha
Page 110: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Control Contamination due to Multiple Cookies

Page 111: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Control Contamination due to Multiple Cookies

Page 112: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Control Contamination due to Multiple Cookies

Page 113: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Control Contamination due to Multiple Cookies

Page 114: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Control Contamination due to Multiple Cookies

Page 115: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Cookie-Contamination Questions

I How does cookie contamination a�ect measured lift?

I Does the cookie-distribution matter?

I everyone has k cookies vs an average of k cookies

I What is the influence of the control percentage?

I Simulations best way to understand this

Page 116: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Cookie-Contamination Questions

I How does cookie contamination a�ect measured lift?

I Does the cookie-distribution matter?

I everyone has k cookies vs an average of k cookies

I What is the influence of the control percentage?

I Simulations best way to understand this

Page 117: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Cookie-Contamination Questions

I How does cookie contamination a�ect measured lift?

I Does the cookie-distribution matter?I everyone has k cookies vs an average of k cookies

I What is the influence of the control percentage?

I Simulations best way to understand this

Page 118: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Cookie-Contamination Questions

I How does cookie contamination a�ect measured lift?

I Does the cookie-distribution matter?I everyone has k cookies vs an average of k cookies

I What is the influence of the control percentage?

I Simulations best way to understand this

Page 119: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Cookie-Contamination Questions

I How does cookie contamination a�ect measured lift?

I Does the cookie-distribution matter?I everyone has k cookies vs an average of k cookies

I What is the influence of the control percentage?

I Simulations best way to understand this

Page 120: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Simulations for cookie-contamination

I A scenario is a combination of parameters:I

M = # trials for this scenario, usually 10K-1MI

n = # users, typically 10K - 10MI

p = # control percentage (usually 10-50%)I

k = cookie-distribution, expressed as 1 : 100, or 1 : 70, 3 : 30I

r = (un-contaminated) control user response rateI

a = true lift, i.e. exposed user response rate = r ú (1 + a).I A scenario file specifies a scenario in each row.

I could be thousands of scenarios

Page 121: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Scenario Simulations in Spark

Page 122: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Scenario Simulations in Spark

Page 123: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Scenario Simulations in Spark

Page 124: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Scenario Simulations in Spark

Page 125: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

Scenario Simulations in Spark

Page 126: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha
Page 127: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha