Accountability through Information Flow Experiments

Michael Carl Tschantz, UC Berkeley

Amit Datta, CMU

Anupam Datta, CMU

Jeannette M. Wing, MSR

www.cs.cmu.edu/~mtschant/ife


Google’s Privacy Policy

When showing you tailored ads, we will not associate a cookie or anonymous identifier with sensitive categories, such as those based on race, religion, sexual orientation or health.

Google Ad Settings

[Diagram labels: Web browsing, Advertisements, Ad settings, Inferences, Edits, Ad ecosystem]

AdFisher

• Emulates users with fresh browser instances
• Randomized assignment
• Statistical analysis to find causal relations
• Open source: github.com/tadatitam/info-flow-experiments
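The "Randomized assignment" step can be illustrated with a minimal sketch. The function name `assign_groups` and the integer agent ids are hypothetical and are not taken from AdFisher's code; the sketch only assumes that emulated browser agents are split at random into two equal-sized groups before any treatment is applied.

```python
import random

def assign_groups(agent_ids, seed=None):
    """Randomly split agents into experimental and control groups.

    Hypothetical helper for illustration; not AdFisher's API.
    """
    rng = random.Random(seed)
    shuffled = list(agent_ids)
    rng.shuffle(shuffled)          # random order decides group membership
    half = len(shuffled) // 2
    return {
        "experimental": sorted(shuffled[:half]),
        "control": sorted(shuffled[half:]),
    }

# Ten emulated browser agents, identified here by made-up integer ids.
groups = assign_groups(range(10), seed=0)
```

Because chance alone decides which agents receive the treatment, a systematic difference between the groups' ads can be attributed to the treatment rather than to how the agents were chosen.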


Transparency

[Diagram labels: Web browsing, Advertisements, Ad settings, Ad ecosystem]

Visit top 100 substance abuse sites

No effect on ad settings

Significant causal effect on ads (p = 0.000005)

Transparency Explanations

Substance Abuse Visitors                         | Control Group
The Watershed Rehab (www.thewatershed.com/Help)  | Alluria Alert (www.bestbeautybrand.com)
Watershed Rehab (www.thewatershed.com/Rehab)     | Best Dividend Stocks (dividends.wyattresearch.com)
The Watershed Rehab (none)                       | 10 Stocks to Hold Forever (www.streetauthority.com)

Choice

[Diagram labels: Web browsing, Advertisements, Ad settings, Ad ecosystem]

Visits websites related to online dating

Removes interests related to online dating

Causes significant reduction in dating ads (p = 0.008)

Choice Explanation

Keeping Dating Interest                                      | Removing Dating Interest
Are You Single? (www.zoosk.com/Dating)                       | Car Loans w/ Bad Credit (www.car.com/Bad-Credit-Car-Loan)
Top 5 Online Dating Sites (www.consumer-rankings.com/Dating) | Individual Health Plans (www.individualhealthquotes.com)
Why can't I find a date? (www.gk2gk.com)                     | Crazy New Obama Tax (www.endofamerica.com)

Discrimination

[Diagram labels: Web browsing, Advertisements, Ad settings, Ad ecosystem]

Set the gender bit to female or male

Browse websites related to finding a new job

Significant difference in ads on a news website (p = 0.000005)

Discrimination Explanation

Female Group                                             | Male Group
Jobs (Hiring Now) (www.jobsinyourarea.co)                | $200k+ Jobs - Execs Only (careerchange.com)
4Runner Parts Service (www.westernpatoyotaservice.com)   | Find Next $200k+ Job (careerchange.com)
Criminal Justice Program (www3.mc3.edu/Criminal+Justice) | Become a Youth Counselor (www.youthcounseling.degreeleap.com)

Findings

• Lack of transparency
  – Web browsing can affect ads without affecting Ad Settings
• Users have some choice
  – Removing interests affects ads
• Discrimination occurs
  – Gender affects job-related ads

Information Flow Experiments

Natural Sciences    | Information Flow
Natural process     | System in question
Population of units | Subset of interactions
…                   | …
Causation           | Information flow


Theorem

Pearl’s Causation = Probabilistic Interference

[Bar chart: Number of Unique Ads for bars labeled 1 to 10, in order: 13, 13, 8, 10, 10, 12, 13, 11, 7, 17]

[Bar chart: the same Number of Unique Ads sorted in descending order, bar labels 10, 1, 2, 7, 6, 8, 5, 4, 3, 9: 17, 13, 13, 13, 12, 11, 10, 10, 8, 7]

Google’s Behavior is Complex

[Plot: Ad id (axis 0 to 45) against Reload number (axis 0 to 200)]

Prior Work on Behavioral Marketing

Authors           | Test                   | Limitation
Guha et al.       | Cosine similarity      | No statistical significance
Balebako et al.   | Cosine similarity      | No statistical significance
Wills and Tatar   | Ad hoc examination     | No statistical significance
Liu et al.        | Process of elimination | No statistical significance
Barford et al.    | χ² test                | Assumes ads identically distributed
Lécuyer et al.    | Parametric model       | Correlation, not causation; assumes ads are independent
Englehardt et al. | Binomial test          | Assumes ads identically distributed


Randomized Controlled Trials

[Diagram labels: Experimental Group, Control Group, Controlled Environment, Experimental Treatment, Control Treatment, Ad Ecosystem, Measurements, Test Statistic, Observed Value, Hypothetical Value]
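The labels Test Statistic, Observed Value, and Hypothetical Value match the shape of a permutation test, one kind of non-parametric significance test. The sketch below is an illustration under that assumption, not the authors' implementation: the function names, the mean-difference statistic, and the ad counts are all made up for the example.

```python
import random

def permutation_test(experimental, control, statistic,
                     n_permutations=10000, seed=None):
    """Estimate a p-value by comparing the observed statistic with values
    obtained under random relabelings of the same measurements."""
    rng = random.Random(seed)
    observed = statistic(experimental, control)
    pooled = list(experimental) + list(control)
    k = len(experimental)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # hypothetical assignment of units to groups
        if statistic(pooled[:k], pooled[k:]) >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the Monte Carlo estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)

def mean_difference(a, b):
    """Absolute difference between the two group means."""
    return abs(sum(a) / len(a) - sum(b) / len(b))

# Made-up counts of treatment-related ads seen by each agent.
experimental_counts = [5, 6, 4, 7, 5]
control_counts = [0, 1, 0, 0, 1]
p_value = permutation_test(experimental_counts, control_counts,
                           mean_difference, seed=1)
```

The test makes no assumption that ads are independent or identically distributed; it relies only on the random assignment of units to groups.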

Our Methodology

[Diagram labels: Experimental Treatment, Control Treatment, Ad Ecosystem, block 1 … block n, Measurements, Training Data, Machine Learning, Classifier, Explanations, Significance Testing, p-value]
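The labels "block 1" through "block n" indicate that measurements are collected in blocks. One way to respect that structure during significance testing is to shuffle group labels only within each block. The sketch below assumes this design; the names `block_permutation_test` and `labelled_mean_gap`, the statistic, and the data are hypothetical, and the classifier stage named in the diagram is not modeled here.

```python
import random

def block_permutation_test(blocks, statistic, n_permutations=10000, seed=None):
    """blocks: list of (measurements, labels) pairs, where a label of 1 means
    experimental and 0 means control. Labels are permuted within each block
    only, so units are never compared across blocks under the null."""
    rng = random.Random(seed)
    measurements = [m for ms, _ in blocks for m in ms]
    observed_labels = [l for _, ls in blocks for l in ls]
    observed = statistic(measurements, observed_labels)
    hits = 0
    for _ in range(n_permutations):
        permuted = []
        for _, labels in blocks:
            shuffled = list(labels)
            rng.shuffle(shuffled)      # relabel inside this block only
            permuted.extend(shuffled)
        if statistic(measurements, permuted) >= observed:
            hits += 1
    return (hits + 1) / (n_permutations + 1)

def labelled_mean_gap(measurements, labels):
    """Absolute gap between the mean measurement of each label group."""
    treated = [m for m, l in zip(measurements, labels) if l == 1]
    untreated = [m for m, l in zip(measurements, labels) if l == 0]
    return abs(sum(treated) / len(treated) - sum(untreated) / len(untreated))

# Three made-up blocks, each with two experimental and two control agents.
example_blocks = [
    ([9, 8, 1, 0], [1, 1, 0, 0]),
    ([7, 9, 2, 1], [1, 1, 0, 0]),
    ([8, 6, 0, 2], [1, 1, 0, 0]),
]
block_p_value = block_permutation_test(example_blocks, labelled_mean_gap, seed=2)
```

Restricting permutations to within blocks keeps the test valid even if the ad ecosystem's behavior drifts between blocks, since each block is only ever compared with itself.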

Summary

• Rigorous information flow experiments
  1. Probabilistic interference = Pearl’s causation
  2. Experimental design for causal determination
  3. Significance testing with non-parametric statistics

• Experimental study of Google Ads
  1. AdFisher Tool
  2. Findings of opacity, choice, and discrimination


Future Work

• Extensions of AdFisher
  – Interpretable machine learning
• Incorporating formal notions of discrimination
  – Discrimination vs. unfairness
• How much transparency is right?
• Internal auditing and preventing violations
  – Policing advertisers
  – Understanding models from machine learning
