Transcript of Ed Snelson's talk "Counterfactual analysis: a Big Data case-study using Cosmos/SCOPE".

Page 1: Counterfactual analysis: a Big Data case-study using Cosmos/SCOPE

Ed Snelson

Page 2: Work by

Léon Bottou, Jonas Peters, Joaquin Quiñonero Candela, Denis Xavier Charles, D. Max Chickering, Elon Portugaly, Dipankar Ray, Patrice Simard, Ed Snelson

http://jmlr.org/papers/v14/bottou13a.html

Page 3: I. MOTIVATION

Page 4: Search ads

Page 5: The search ads ecosystem

[Diagram: the search engine sits between users and advertisers. Users send queries and return clicks (and their consequences); advertisers submit ads & bids and receive ads and prices. Three loops connect them: a user feedback loop, an advertiser feedback loop, and a learning feedback loop.]

Page 6: Learning to run a marketplace

• The learning machine is not a machine: it is an organization with lots of people doing stuff!

How can we help?

• Goal: improve the marketplace machinery such that its long-term revenue is maximal

• Approximate this goal by:
  – improving multiple performance measures (KPIs) related to all players
  – providing data for decision making
  – automatically optimizing parts of the system

Page 7: Outline from here on

II. Online Experimentation

III. Counterfactual measurements

IV. Cosmos/SCOPE

V. Implementation details

Page 8: II. ONLINE EXPERIMENTATION

Page 9: How do parameters affect KPIs?

• We want to determine how certain auction parameters affect KPIs

• Three options:

1. Offline log analysis – “correlational”

2. Auction simulation

3. Online experimentation – “causal”

Page 10: The problem with correlation analysis (Simpson's paradox)

Trying to decide whether a drug helps or not

• Historical data (all patients):

                All     Survived   Died    Survival Rate
  Treated       5,000   2,100      2,900   42%
  Not Treated   5,000   2,900      2,100   58%

• Conclusion: don't give the drug

But what if the doctors were saving the drug for the severe cases?

  Severe cases (treatment rate 80%):
                All     Survived   Died    Survival Rate
  Treated       4,000   1,200      2,800   30%
  Not Treated   1,000   100        900     10%

  Mild cases (treatment rate 20%):
                All     Survived   Died    Survival Rate
  Treated       1,000   900        100     90%
  Not Treated   4,000   2,800      1,200   70%

• Conclusion reversed: the drug helps in both the severe and the mild cases

Page 11: Overkill?

Pervasive causation paradoxes in ad data!

Example: logged data shows a positive correlation between event A, "first mainline ad gets a high quality score", and event B, "second mainline ad receives a click".

– Do high-quality ads encourage clicking below them?

– No: controlling for event C, "query categorized as commercial", reverses the correlation for both commercial and non-commercial queries.

Page 12: Randomized experiments

Randomly select who to treat

• Selection is independent of all confounding factors

• This eliminates Simpson's paradox and allows counterfactual estimates:

  If we had given the drug to x% of the patients, the success rate would have been 60% × x + 40% × (1 − x)

All patients (treatment rate 30%):

                All     Survived   Died    Survival Rate
  Treated       3,000   1,800      1,200   60%
  Not Treated   7,000   2,800      4,200   40%
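As a quick check of the formula, here is a minimal C# sketch (the 60%/40% rates are taken from the table above): at the actual treatment rate x = 30% the estimate reproduces the observed overall survival rate, (1,800 + 2,800)/10,000 = 46%.

using System;

class CounterfactualCheck
{
    static void Main()
    {
        const double treatedRate = 0.60;    // survival rate when treated (randomized)
        const double untreatedRate = 0.40;  // survival rate when not treated

        // Counterfactual success rate if we had treated a fraction x of patients
        double Estimate(double x) => treatedRate * x + untreatedRate * (1 - x);

        Console.WriteLine(Estimate(0.30)); // 0.46: matches the observed overall rate (4,600/10,000)
        Console.WriteLine(Estimate(1.00)); // 0.60: treat everyone
    }
}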

Page 13: Experiments in the online world

• A/B tests are used throughout the online world to compare different versions of the system

– A random fraction of the traffic (a flight) uses click-prediction system A

– Another random fraction uses click-prediction system B

• Wait for a week, measure KPIs, choose best!

• Our framework takes this one step further…

Page 14: III. COUNTERFACTUAL MEASUREMENTS

Page 15: Counterfactuals

Measuring something that did not happen:

"How would the system have performed if, when the data was collected, we had used system* instead of system?"

Page 16: Replaying past data

Classification example

• Collect labeled data in the existing setup

• Replay the past data to evaluate what the performance would have been if we had used classifier θ (sketched below)

• Requires knowledge of all functions connecting the point of change to the point of measurement
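A minimal sketch of the replay idea, with hypothetical names (LoggedExample, theta): walk the logged labeled data and score the decisions classifier θ would have made. This only works because the label was collected independently of the classifier under evaluation.

using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical logged record: input features plus the label observed online
record LoggedExample(double[] Features, bool Label);

static class Replay
{
    // Accuracy classifier theta would have achieved on the past traffic
    public static double Evaluate(IEnumerable<LoggedExample> log,
                                  Func<double[], bool> theta)
        => log.Average(e => theta(e.Features) == e.Label ? 1.0 : 0.0);
}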

Page 17: Concrete example: mainline reserve (MLR)

[Figure: a search results page with ads in the mainline (above the web results) and in the sidebar. An ad is shown in the mainline when its score exceeds the mainline reserve: Ad Score > MLR.]

Page 18: Online randomization

Q: Can we estimate the results of a change counterfactually (without actually performing the change)?

A: Yes, if system* and system are non-deterministic (and close enough)

[Figure: a deterministic MLR (a point value) vs a randomized MLR drawn from the data-collection distribution P(MLR), shown next to the counterfactual distribution P*(MLR).]

For each auction, a random MLR is used online, drawn from the data-collection distribution P(MLR)
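What the data-collection side might look like, sketched in C#. The multiplicative lognormal perturbation and the names are illustrative assumptions, not the production scheme; the key point is that the density of the realized MLR under P is logged with the auction, so the weights w* = P*/P can be computed offline later.

using System;

class MlrRandomizer
{
    static readonly Random Rng = new Random();

    // Draw MLR = m · exp(sigma·z), z ~ N(0,1): a lognormal perturbation
    // around the deterministic value m (illustrative choice)
    public static (double Mlr, double LogDensityP) Sample(double m, double sigma)
    {
        // Box-Muller standard normal draw
        double u1 = 1.0 - Rng.NextDouble(), u2 = Rng.NextDouble();
        double z = Math.Sqrt(-2.0 * Math.Log(u1)) * Math.Cos(2.0 * Math.PI * u2);

        double mlr = m * Math.Exp(sigma * z);

        // Log-density of mlr under the lognormal P(MLR); stored in the logs
        double d = Math.Log(mlr) - Math.Log(m);
        double logDensity = -Math.Log(mlr * sigma * Math.Sqrt(2.0 * Math.PI))
                            - d * d / (2.0 * sigma * sigma);
        return (mlr, logDensity);
    }
}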

Page 19: Estimating counterfactual KPIs

Usual additive KPI:

  Clicks_total = Σ_i Clicks(auction_i)

Counterfactual KPI (a weighted sum; auctions whose MLRs are "closer" to the counterfactual distribution get higher weight):

  Clicks*_total ≈ Σ_i w*_i · Clicks(auction_i),  where  w*_i = P*(MLR_i) / P(MLR_i)
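The estimator in code: a minimal C# sketch assuming each logged auction record carries the realized MLR, the observed clicks, and the density of that MLR under the data-collection distribution P (the record shape is hypothetical). densityStar is whatever counterfactual P* we want to ask about.

using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical logged auction record
record LoggedAuction(double Mlr, double Clicks, double DensityP);

static class CounterfactualKpi
{
    // Clicks*_total ≈ Σ_i w*_i · Clicks_i  with  w*_i = P*(MLR_i) / P(MLR_i)
    public static double EstimateClicks(IEnumerable<LoggedAuction> log,
                                        Func<double, double> densityStar)
        => log.Sum(a => densityStar(a.Mlr) / a.DensityP * a.Clicks);
}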

Page 20: Exploration

[Figure: a data-collection distribution P(ω) next to a counterfactual distribution P*(ω), shown for two cases with different overlap. The quality of the estimation depends on this overlap.]

• Confidence intervals reveal whether the data-collection distribution P(ω) performs sufficient exploration to answer the counterfactual question of interest
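The talk diagnoses this with confidence intervals; a simpler, related diagnostic (my addition, not from the slides) is the effective sample size of the importance weights: when P* puts mass where P rarely explored, a few huge weights dominate and n_eff collapses.

using System.Collections.Generic;
using System.Linq;

static class ExplorationDiagnostics
{
    // Effective sample size (Σw)²/Σw²: near n when P* is well covered by P,
    // far below n when the counterfactual question outruns the exploration
    public static double EffectiveSampleSize(IEnumerable<double> weights)
    {
        var w = weights.ToList();
        double sum = w.Sum(), sumSq = w.Sum(x => x * x);
        return sumSq > 0 ? sum * sum / sumSq : 0.0;
    }
}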

Page 21: Clicks vs MLR

[Figure: estimated clicks as a function of MLR, with an inner "exploration" interval and an outer "sample-size" interval around the curve. Two controls are marked: one with no randomization and one with 18% lower MLR.]

Page 22: Number of Mainline Ads vs MLR

[Figure: estimated number of mainline ads as a function of MLR. This is easy to estimate.]

Page 23: Revenue vs MLR

[Figure: estimated revenue as a function of MLR. Revenue always has high sample variance.]

Page 24: More with the same data

How is this related to A/B testing?

• A/B testing tests 2 specific settings against each other

• Need to know what questions you want to ask beforehand!

Big advantage of more general randomization:

• Collect data first, choose question(s) later

• Randomizing more stuff increases opportunities

But…

• Requires more sophisticated offline log processing

Page 25: IV. COSMOS/SCOPE

Page 26: Ad Auction Logs

• ≈ 10 TB of ad-auction logs per day

• Cooked and joined from various raw logs

• Stored in Cosmos, queried via SCOPE

• A small fraction of total Bing logs and jobs:

  – Tens of thousands of SCOPE jobs daily

  – Tens of PBs read/written daily

Page 27: Cosmos/SCOPE

• SCOPE ≈ PIG/HIVE

• Cosmos ≈ HDFS

http://research.microsoft.com/en-us/um/people/jrzhou/pub/Scope.pdf

http://research.microsoft.com/en-us/um/people/jrzhou/pub/scope-vldbj.pdf

Page 28: Cosmos

• Microsoft's internal distributed data store

• Tens of thousands of commodity servers (≈ HDFS, GFS)

• Append-only file system, optimized for sequential I/O

• Data replication and compression

Page 29: Data Representation

1. Unstructured streams
   – Custom extractors convert a sequence of bytes into a RowSet, specifying a schema for the columns (see the sketch after this list)

2. Structured streams
   – Data stored alongside metadata: a well-defined schema and structural properties (e.g. partitioning and sorting information)
   – Can be horizontally partitioned into tens of thousands of partitions, e.g. by hash or range partitioning
   – Indexes for random access and index-based joins
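To make "custom extractor" concrete, here is what one does conceptually, sketched as plain C#. This is not the actual SCOPE Extractor base class (whose exact signatures are internal); the row type and parsing are illustrative only: turn a byte stream into typed rows under a declared schema.

using System.Collections.Generic;
using System.IO;

// Illustrative row type; a real extractor declares its column schema to SCOPE
record AuctionRow(long AuctionId, string Query, double Mlr);

static class TabSeparatedExtractor
{
    // Convert a sequence of bytes (here: tab-separated lines) into a RowSet
    public static IEnumerable<AuctionRow> Extract(Stream input)
    {
        using var reader = new StreamReader(input);
        string? line;
        while ((line = reader.ReadLine()) != null)
        {
            string[] cols = line.Split('\t');
            yield return new AuctionRow(long.Parse(cols[0]), cols[1],
                                        double.Parse(cols[2]));
        }
    }
}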

Page 30: SCOPE scripting language

• SQL-like (in syntax) declarative language specifying a data transformation pipeline

• Each SCOPE statement takes one or more RowSets as input and outputs another RowSet

• Highly extensible with C# expressions, custom operators, and data types

• The SCOPE compiler and optimizer are responsible for generating a dataflow DAG for efficient parallel execution

Page 31: C# Expressions and functions

A C# expression (A+C) and a C# string method (B.Trim()) used directly in a SCOPE statement, together with a user-defined C# function:

R1 = SELECT A+C AS ac, B.Trim() AS B1
     FROM R
     WHERE StringOccurs(C, "xyz") > 2;

#CS
public static int StringOccurs(string str, string ptrn)
{
    // Count the occurrences of ptrn in str
    int cnt = 0;
    int pos = -1;
    while (pos + 1 < str.Length)
    {
        pos = str.IndexOf(ptrn, pos + 1);
        if (pos < 0) break;
        cnt++;
    }
    return cnt;
}
#ENDCS

Page 32: C# User-defined types (UDTs)

– Arbitrary C# classes can be used as column types in scripts

– Extremely convenient for easy serialization/deserialization

– Can be referenced from external DLLs, C# backing files, or in-script (#CS … #ENDCS)

SELECT UserId, SessionId,
       new RequestInfo(binaryData) AS Request
FROM InputStream
WHERE Request.Browser.IsIE();

Page 33: C# User-defined operators

– User-defined aggregates
  • Aggregate interface: Initialize, Accumulate, Finalize
  • Can be declared recursive: allows partial aggregation (see the sketch after this list)

– MapReduce-like extensions
  • PROCESS
  • REDUCE (can be declared recursive)
  • COMBINE
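A sketch of the aggregate pattern above as a plain C# class (hypothetical; SCOPE's actual aggregate base class is internal): Initialize/Accumulate/Finalize, plus a Merge step, which is what makes the aggregate "recursive": partial sums computed on different partitions can be combined.

// Weighted-sum aggregate in the Initialize/Accumulate/Finalize style
class WeightedKpiAggregate
{
    double _sumW, _sumWKpi;

    public void Initialize() { _sumW = 0; _sumWKpi = 0; }

    public void Accumulate(double weight, double kpi)
    {
        _sumW += weight;
        _sumWKpi += weight * kpi;
    }

    // Partial aggregation: fold in a partial result from another partition
    public void Merge(WeightedKpiAggregate other)
    {
        _sumW += other._sumW;
        _sumWKpi += other._sumWKpi;
    }

    // Finish: here, the normalized estimate Σ wᵢ·KPIᵢ / Σ wᵢ
    // (an unnormalized total would simply return _sumWKpi)
    public double Finalize() => _sumW > 0 ? _sumWKpi / _sumW : 0.0;
}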

Page 34: SCOPE compilation and execution

SELECT query, COUNT() AS count
FROM "search.log" USING LogExtractor
GROUP BY query
HAVING count > 1000
ORDER BY count DESC;
OUTPUT TO "qcount.result";

[Diagram: compilation and parallel execution of the script, with a runtime cost-based optimizer.]

Page 35: SCOPE: Pros/Cons (an opinion)

• Pros:
  – Very quick to write simple queries without thinking about parallelization and execution
  – Highly extensible, with deep C# integration
  – UDT columns and C# functions
  – Easy development and debugging from Visual Studio (IntelliSense)

• Cons:
  – No loop/iteration support, which makes it a poor fit for many ML algorithms
  – Batch, rather than interactive

Page 36: V. IMPLEMENTATION

Page 37: Counterfactual computation

• Ideal for the map-reduce setting

• Map: auction_i → KPI(auction_i)

• Reduce: KPI*_total = Σ_i w*_i · KPI(auction_i) (see the sketch below)
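The same computation written out as a one-pass LINQ sketch over a whole grid of candidate distributions P* (reusing the hypothetical LoggedAuction record from the earlier sketch); this mirrors what the SCOPE script on the next slides does at scale.

using System;
using System.Collections.Generic;
using System.Linq;

static class CounterfactualGrid
{
    // grid: candidate counterfactual distributions P*, keyed by name
    public static Dictionary<string, double> TotalKpis(
        IEnumerable<LoggedAuction> auctions,
        Dictionary<string, Func<double, double>> grid)
        => auctions
            // Map: emit (grid point, w*_i · KPI_i) for every grid point
            .SelectMany(a => grid.Select(g =>
                (Point: g.Key, WKpi: g.Value(a.Mlr) / a.DensityP * a.Clicks)))
            // Reduce: KPI*_total per grid point
            .GroupBy(t => t.Point)
            .ToDictionary(gr => gr.Key, gr => gr.Sum(t => t.WKpi));
}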

Page 38: Counterfactual grid

Page 39: SCOPE pseudo-code for counterfactuals

AuctionLogs = VIEW CosmosLogPath;

// "Auction" is a C# UDT wrapping all logged info about a single auction
Auctions = SELECT Auction
           FROM AuctionLogs;

// ComputeKPIs and ComputeWeightGrid are C# UDFs
KpiRows = SELECT ComputeKPIs(Auction) AS KPIs,
                 ComputeWeightGrid(Auction) AS WeightGrid
          FROM Auctions;

// Unroll the weight grid: one row per (auction, grid point)
WeightedRows = SELECT ComputeWeightedKPIs(KPIs, GridPoint) AS wKPIs, GridPoint
               FROM KpiRows
               CROSS APPLY WeightGrid AS GridPoint;

// Recursive aggregator accumulating Σ w_i, Σ w_i·KPI_i, etc. per grid point
Totals = SELECT GridPoint, AggregateKPIs(wKPIs) AS TotalKPIs
         FROM WeightedRows
         GROUP BY GridPoint;

// Call an instance method on the TotalKPIs UDT to finish the estimates
Results = SELECT GridPoint, TotalKPIs.Finalize() AS FinalKPIs
          FROM Totals;

OUTPUT Results TO "Results.tsv";

Page 40: Conclusions

• There are systems in the real world that are too complex to easily formalize

• Causal inference clarifies many problems

– Ignoring causality => Simpson’s paradox

– Randomness allows inferring causality

• The counterfactual framework is modular

– Randomize in advance, ask later

• Counterfactual analysis is ideally suited to batch map-reduce