
Data Ethics in Algorithmic Decision Making

Vassilis Christophides, Vasilis Efthymiou
{christop|vefthym}@csd.uoc.gr

    http://www.csd.uoc.gr/~hy562

    University of Crete, Fall 2020


    Plan

    • Introduction and Motivation

    • Discrimination Examples in Algorithmic Decision Making

    ▪Direct and Indirect Discrimination

• How Do Machines Learn to Discriminate?

    • Anti-Discrimination Learning

    ▪Associational Fairness Definitions

    ▪Causal Fairness Definitions

    • Upstream Discrimination Prevention in ML Pipelines

    ▪ Interplay of Data Preprocessing and Fairness Interventions

    • Progress so Far and Acknowledgments


    The Rise of Artificial Intelligence (AI)

• Widespread investments in Artificial Intelligence (AI) are made to enable computers to interpret what they see, communicate in natural language, answer complex questions, and interact with their physical environment
    ▪ improve people's lives, e.g., autonomous cars
    ▪ accelerate scientific discovery, e.g., precision medicine
    ▪ protect the environment, e.g., reduce energy footprint
    ▪ optimize business, e.g., targeted advertisement
    ▪ transform society, e.g., increasing automation


    Timeline of the Industrial Revolutions

• The huge societal challenges arriving with the increased automation enabled by AI resemble the challenges of the industrial revolutions
    ▪ Data is the driver of the new industrial era and actually fuels the development of AI

https://pocketconfidant.com/how-the-era-of-artificial-intelligence-will-transform-society/


    AI is Data Hungry!

https://www.slideshare.net/KeithKraus/gpuaccelerating-udfs-in-pyspark-with-numba-and-pygdf

[Chart: projected data volumes from 10^9 up to 10^21 bytes for application areas such as self-driving cars, "superhuman" doctors, life on other planets, creative arts, smart farms & food systems, "detoxifying" social media, personal assistants, smarter cybersecurity, and Earth challenge areas]

An ML model is only as good as its data; no matter how good a training algorithm is, the ultimate quality of automated decisions lies in the data itself!


Automated Decisions of Consequence

• Widespread algorithmic decision systems with many small interactions
    ▪ e.g., search, recommendations, social media, …
• Specialized algorithmic decision systems with fewer but higher-stakes interactions
    ▪ e.g., hiring and promotion, credit-worthiness and loans, criminal justice and predictive policing, child maltreatment screening, medical diagnosis, welfare eligibility, …
• At this level of impact, algorithmic decision-making can have unintended consequences in people's lives

[Figure: Hiring AI, Policing & Sentencing AI, Lending AI]


    Are Automated Decisions Impartial?

• Algorithmic decision making is more objective than human decision making, and yet…
    ▪ All traditional evils of discrimination, and many new ones, exhibit themselves in the Big Data & AI ecosystem
    ▪ Opaque automated decision systems
• Human decision-making is affected by greed, prejudice, fatigue, poor scalability, etc. and hence can be biased!
    ▪ Formal procedures can limit opportunities to exercise prejudicial discretion or fall victim to implicit bias
• High-stakes scenarios = ethical problems!
    ▪ Despite existing legal/regulatory efforts, current anti-discrimination laws are not yet well equipped to deal with various issues of discrimination in data analysis [S. Barocas, A.D. Selbst 2016]


    Discrimination is not a General Concept

• It is domain specific
    ▪ Concerned with important opportunities that affect people's life chances
• It is feature specific
    ▪ Concerned with socially salient qualities that have served as the basis for unjustified & systematically adverse treatment in the past


    Legally Recognized ‘Protected Classes’

https://www.slideshare.net/KrishnaramKenthapadi/fairnessaware-machine-learning-practical-challenges-and-lessons-learned-www-2019-tutorial

Societal categories (political ideology, income, language, physical traits), intersectional subpopulations (young white women, old black men), etc.


    Legal & Regulatory Frameworks in EU

• EU Council Directive 76/207/EEC of 9 February 1976 on the implementation of the principle of equal treatment for men and women as regards access to employment, vocational training and promotion, and working conditions
• GDPR 2016/679, Recital 71 [https://eur-lex.europa.eu/legal-content/en/TXT/?uri=CELEX:32016R0679]
    ▪ […] In order to ensure fair and transparent processing […], the controller should use appropriate mathematical or statistical procedures for the profiling,
    ▪ […] and that prevents, inter alia, discriminatory effects on natural persons on the basis of racial or ethnic origin, political opinion, religion or beliefs, (direct)
    ▪ […] or that result in measures having such an effect. (indirect)


Discrimination Law: Two Doctrines

• Disparate treatment (DT) is the illegal practice of treating an entity, such as a creditor or employer, differently based on a protected/sensitive attribute such as race, gender, age, religion, sexual orientation, etc.
    ▪ avoid disparities between outputs for groups of people with the same (or similar) values of non-sensitive attributes but different values of sensitive ones
• Disparate impact (DI) is the result of systematic disparate treatment, where a disproportionate adverse impact is observed on members of a protected class [M. Feldman et al. 2015]
    ▪ minimize outputs that benefit (harm) a group of people sharing a value of a sensitive attribute more frequently than other groups of people


    What does Discrimination Law Aim to Achieve?

• Equality of opportunity (treatment):
    ▪ Procedural fairness
• Minimized inequality of outcome (impact):
    ▪ Outcome fairness

https://www.reddit.com/r/GCdebatesQT/comments/7qpbpp/food_for_thought_equality_vs_equity_vs_justice


    Example: Criminal Justice System

Dylan Fugett (right) was rated low risk while Bernard Parker (left) was rated high risk

• There is software used across the US to predict whether someone who has been arrested will be re-arrested in the future, on the basis of criminal history, demographics, and other information
    ▪ Assessing recidivism risk is biased against blacks [J. Angwin, et al. 2016]
• Recidivism risk: a defendant's likelihood of committing a crime
    ▪ Used to decide pretrial detention, bail and sentencing

https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing


    Example: Predictive Policing

    https://www.smithsonianmag.com/innovation/artificial-intelligence-is-now-used-predict-crime-is-it-biased-180968337/

    • PredPol identifies areas in a neighborhood where serious crimes are more likely to occur during a particular period

    ▪ using a wide range of “leading indicator” data, including reports of crimes, such as simple assaults, vandalism and disorderly conduct, and 911 calls about such things as shots fired or a person seen with a weapon

    • The American Civil Liberties Union [ACLU], the Brennan Center for Justice and various civil rights organizations have all raised questions about the risk of bias being baked into the software


Detention of High-Risk Criminals

• Judge (decision maker): Of those I've labeled high-risk, how many will recidivate? (Positive Predictive Value)
• Defense (think hiring rather than criminal justice): Is the selected set demographically balanced?
• Plaintiff: What's the probability I'll be incorrectly classified as high-risk? (False Positive Rate)

[Figure: confusion matrix over demographic populations, crossing "recidivate"/"did not recidivate" with "label high-risk"/"label low-risk" into true/false positives and negatives]

Tutorial: 21 fairness definitions and their politics: https://www.youtube.com/watch?v=jIXIuYdnyyk


Prediction Fails Differently for Black Defendants! [J. Dressel and H. Farid 2018]

• COMPAS: Correctional Offender Management Profiling for Alternative Sanctions

                                                White    African American
Labeled high-risk, but didn't re-offend (FPR)   23.5%    44.9%
Labeled low-risk, yet did re-offend (FNR)       47.7%    28.0%


Essence of the COMPAS Debate

• ProPublica's main charge:
    ▪ Black defendants face a higher false positive rate: among defendants who did not get rearrested, black defendants were twice as likely to be misclassified as high risk
    ▪ White defendants face a higher false negative rate: among defendants who got rearrested, white defendants were twice as likely to be misclassified as low risk
• Northpointe's (now Equivant, of Canton, Ohio) main defense:
    ▪ COMPAS was not made to make absolute predictions about success or failure; it was designed to inform probabilities of reoffending across three categories of risk (low, medium, & high)
    ▪ The system is well-calibrated: if a person is assigned to one of the three risk categories, we can treat them as having the corresponding risk level
• Word of caution:
    ▪ Neither calibration nor equality of false positive/negative rates rules out blatantly unfair practices [S. Corbett-Davies et al 2017]


    Discrimination in Credit & Consumer Markets

Redlining is the (indirect discrimination) practice of arbitrarily denying or limiting financial services to specific neighborhoods, generally because their residents are people of color or are poor


    Amazon Redlining

    No Amazon Free Same-day Delivery for Restricted Minority Neighborhoods


    Discrimination in Online Services

• Non-black hosts can charge ~12% more than black hosts on Airbnb [M. Luca, B. Edelman 2014]
• Price steering and discrimination in many online retailers [A. Hannak, G. Soeller, D. Lazer, A. Mislove, and C. Wilson 2014]
• Race and gender stereotypes reinforced on the Web [M. Kay, C. Matuszek, S. Munson 2015]
• China appears about 21% larger (by pixels) when shown in Google Maps for China [G. Soeller, K. Karahalios, C. Sandvig, and C. Wilson 2016]


    Representation Bias

• Gender bias found in word embeddings trained with word2vec on Google News [T. Bolukbasi et al. 2016]
• Represent each word with a high-dimensional vector
    ▪ Vector arithmetic: analogies like Paris − France = London − England
    ▪ Also found: man − woman = programmer − homemaker = surgeon − nurse (topic observed: "occupations")

Word clouds for the nearest neighbours of "man" (L) and "woman" (R).
https://towardsdatascience.com/gender-bias-word-embeddings-76d9806a0e17
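A minimal Python sketch of probing such analogies with gensim; the pretrained GoogleNews vector file (and whatever neighbours it actually returns) is an assumption made for illustration:

```python
from gensim.models import KeyedVectors

# Assumes the pretrained GoogleNews-vectors-negative300.bin file is available locally.
vectors = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)

# "man is to programmer as woman is to ?"  ->  vector(programmer) - vector(man) + vector(woman)
print(vectors.most_similar(positive=["programmer", "woman"], negative=["man"], topn=3))

# "Paris is to France as ? is to England"  ->  vector(Paris) - vector(France) + vector(England)
print(vectors.most_similar(positive=["Paris", "England"], negative=["France"], topn=3))
```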


Toxicity: Twitter Taught AI Chatbot to be a Racist in Less than a Day

https://www.theverge.com/2016/3/24/11297050/tay-microsoft-chatbot-racist


"Bugs" of Data-driven Decision Systems: Data Acquisition [H. Suresh, J. V. Guttag 2019]

• When data is about people, "bugs" related to various bias types can lead to discrimination!
    ▪ Historical bias
    ▪ Representation bias: selection or sampling bias
    ▪ Information bias: observation or measurement bias

[Figure: data acquisition pipeline from the world to the "world according to data", with a train/test split]

• "Bugs" in Data Collection: skewed samples, sample size disparity
• "Bugs" in Data Processing: limited or proxy features, data errors
• "Bugs" in Ground Truth: tainted samples


"Bugs" of Data-driven Decision Systems: Model Building and Evaluation [H. Suresh, J. V. Guttag 2019]

• Aggregation bias
• Evaluation bias

[Figure: model building and evaluation pipeline, from the "world according to data" to training and test sets]


    Three Different Performance Problems

    • Discovering unobserved differences in performance

    ▪ Skewed and tainted samples: Garbage in, garbage out (GIGO)

    –Samples might be biased

    –Labels might be incorrect

    • Coping with observed differences in performance

    ▪ Sample size disparity

    –Learn on majority

    ▪ Limited features

    –Errors concentrated in the minority class

    • Understanding the causes of disparities in predicted outcome

    ▪ Proxies (redline attributes): Data as a social mirror

    –Protected attributes redundantly encoded in observables


    Anti-Discrimination Learning

❶ Discrimination Discovery/Detection
    ▪ Unveil evidence of discriminatory practices by analyzing the historical dataset or the predictive model
❷ Discrimination Prevention/Removal
    ▪ Mitigate discriminatory effects by modifying the biased data, by adjusting the learning process, or by modifying the predictive model
        • Pre-processing: modify the training data
        • In-processing: adjust the learning process
        • Post-processing: directly change the predicted labels


    Discrimination Discovery Framework

[D. Pedreschi, S. Ruggieri, and F. Turini 2008]

INPUT:
• A database of historical decision records
• A criterion of (unlawful) discrimination
• A set of potentially discriminated groups

OUTPUT:
• A subset of decision records and potentially discriminated people for which the criterion holds


    Formal Setup

• X: features of an individual (e.g., criminal history, demographics, etc.)
    ▪ may contain redlining attributes R (e.g., neighborhood)
• A: a protected, sensitive attribute (e.g., race)
    ▪ binary attribute A = {a+, a−}
• D = d(X, A): predictor of the decision (e.g., criminality risk)
    ▪ binary decision D = {d+, d−}
• Y: target variable, labels (e.g., recidivate)

Notation: Pa{E} = P{E ∣ A=a}, i.e., probabilities conditioned on membership in group a; all random variables are drawn from the same probability distribution.


    Three Fundamental Criteria

• Independence: D independent of A (D ⊥ A)
    ▪ predictions are uncorrelated with the sensitive attribute
    ▪ also called Demographic Parity, Statistical Parity
• Separation: D independent of A conditional on Y (D ⊥ A ∣ Y)
    ▪ equal true positive/negative rates of predictions across groups
    ▪ also called Equalized Odds, Positive Rate Parity, Disparate Mistreatment, Equal Opportunity
• Sufficiency: Y independent of A conditional on D (Y ⊥ A ∣ D)
    ▪ similar rates of accurate predictions across groups
    ▪ also called Predictive Rate Parity, Outcome Test, Test-fairness, Well-Calibration
• The above criteria fall into a larger category called "group fairness"


Conditional Independence

• A is conditionally independent of B given C, denoted A ⊥ B | C, if the probability distribution governing A is independent of the value of B, given the value of C
    ▪ learning that B = b does not change your belief in A when you already know C = c, and this is true for all values b that B can take and all values c that C can take

    ∀ a ∈ dom(A), b ∈ dom(B), c ∈ dom(C):  P(A=a | B=b, C=c) = P(A=a | C=c)

• Note: conditional independence neither implies nor is implied by independence!
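A tiny numeric sketch of this definition on a constructed joint distribution (the probability values below are invented): A and B each depend only on C, so they are conditionally independent given C even though they are marginally dependent:

```python
import itertools

# P(C), P(A=1|C) and P(B=1|C) for binary variables; A and B only depend on C.
p_c = {0: 0.4, 1: 0.6}
p_a_given_c = {0: 0.2, 1: 0.7}
p_b_given_c = {0: 0.3, 1: 0.8}

def joint(a, b, c):
    """Joint probability P(A=a, B=b, C=c) under the construction above."""
    pa = p_a_given_c[c] if a == 1 else 1 - p_a_given_c[c]
    pb = p_b_given_c[c] if b == 1 else 1 - p_b_given_c[c]
    return p_c[c] * pa * pb

# Check P(A=1 | B=b, C=c) == P(A=1 | C=c) for all b, c (conditional independence).
for b, c in itertools.product([0, 1], repeat=2):
    p_abc = joint(1, b, c)
    p_bc = joint(0, b, c) + joint(1, b, c)
    print(f"P(A=1|B={b},C={c}) = {p_abc / p_bc:.2f}   P(A=1|C={c}) = {p_a_given_c[c]:.2f}")

# Marginally, however, observing B shifts our belief about C and hence about A,
# so conditional independence does not imply (unconditional) independence.
```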


    (Un)Conditional Independence Examples

[Figure: example graphs illustrating the possible combinations of A ⊥ B | C and A ⊥ B]


    First Criterion: Independence

• Require D and A to be independent, denoted D ⊥ A (demographic parity) [R. Zemel et al. 2013], [M. Feldman et al. 2015], [S. Corbett-Davies et al. 2018]
    ▪ the allocation of benefits and harms across groups can be examined by looking at the decision alone
• That is, for all groups a, b and all values d:
    Pa{D=d} = Pb{D=d}
• When D is a binary 0/1-variable, this means
    Pa{D=1} = Pb{D=1} for all groups a, b! (avoid disparate impact)
• Approximate versions (see the sketch below):
    ▪ Disparate Impact (DI): Pb{D=1} / Pa{D=1} ≥ 1 − ϵ (US legal context, cf. the four-fifths rule) [M. Feldman et al. 2015]
    ▪ Calders-Verwer (CV): |Pb{D=1} − Pa{D=1}| ≤ ϵ (UK legal context) [T. Calders, S. Verwer 2010]
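A minimal sketch of checking the two approximate versions on binary decisions; the decision vector and group labels below are illustrative only:

```python
import numpy as np

decisions = np.array([1, 0, 0, 1, 0, 1, 1, 0, 1, 1])         # predicted decisions D
groups    = np.array(["prot", "prot", "prot", "prot", "prot",
                      "unprot", "unprot", "unprot", "unprot", "unprot"])

p_prot   = decisions[groups == "prot"].mean()    # P{D=1 | protected group}
p_unprot = decisions[groups == "unprot"].mean()  # P{D=1 | unprotected group}

di = p_prot / p_unprot          # Disparate Impact ratio (four-fifths rule: want >= 0.8)
cv = abs(p_unprot - p_prot)     # Calders-Verwer gap (want <= epsilon)

print(f"P_prot(D=1)={p_prot:.2f}  P_unprot(D=1)={p_unprot:.2f}  DI={di:.2f}  CV gap={cv:.2f}")
```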


    Simple Discrimination Measures

[D. Pedreschi, S. Ruggieri, F. Turini 2012]:

Protected group vs. unprotected group (rates p1 and p2):
    ▪ Risk difference: RD = p1 − p2  (UK law)
    ▪ Risk ratio or relative risk: RR = p1 / p2  (EU Court of Justice)
    ▪ Relative chance: RC = (1 − p1) / (1 − p2)  (US courts focus on the selection rates (1 − p1) and (1 − p2))
    ▪ Odds ratio = RR / RC

Protected group vs. entire population (rates p1 and p):
    ▪ Extended risk difference = p1 − p
    ▪ Extended risk ratio or extended lift = p1 / p
    ▪ Extended chance = (1 − p1) / (1 − p)
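A minimal sketch of these measures, reading p1, p2 and p as the rates of the negative (benefit-denying) decision in the protected group, the unprotected group and the entire population; the numeric rates below are invented:

```python
# Illustrative rates of the negative decision (e.g., loan denied).
p1, p2, p = 0.30, 0.15, 0.20

rd   = p1 - p2               # Risk difference (UK law)
rr   = p1 / p2               # Risk ratio / relative risk (EU Court of Justice)
rc   = (1 - p1) / (1 - p2)   # Relative chance (selection rates, US courts)
odds = rr / rc               # Odds ratio

erd = p1 - p                 # Extended risk difference (vs. entire population)
err = p1 / p                 # Extended risk ratio / extended lift
ec  = (1 - p1) / (1 - p)     # Extended chance

print(f"RD={rd:.2f} RR={rr:.2f} RC={rc:.2f} OR={odds:.2f}  "
      f"ext-RD={erd:.2f} ext-lift={err:.2f} ext-chance={ec:.2f}")
```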


    Second criterion: Separation

• Require D and A to be independent conditional on Y, denoted D ⊥ A ∣ Y [M. Hardt et al. 2016], [A. Chouldechova 2017], [S. Corbett-Davies et al. 2017], [M. Zafar et al. 2017]
• The distribution of the prediction D does not change after observing A once Y is known
    ▪ That is, for all groups a, b and all values d and y:
    Pa{D=d ∣ Y=y} = Pb{D=d ∣ Y=y}  (prevent disparate mistreatment)


Confusion Matrix: (Mis)Match between Target Variable Y and Decision D

Rates conditioned on the true label Y:
    TPR = P(D=1 | Y=1)    FPR = P(D=1 | Y=0)
    FNR = P(D=0 | Y=1)    TNR = P(D=0 | Y=0)

Rates conditioned on the decision D:
    PPV = P(Y=1 | D=1)    FDR = P(Y=0 | D=1)
    FOR = P(Y=1 | D=0)    NPV = P(Y=0 | D=0)


    Definitions from the Confusion Matrix

• For any box in the confusion matrix involving the decision D, we can require equality across groups [S. Mitchell et al. 2018] (see the sketch below)
• D ⊥ A | Y = 1:
    ▪ Equal TPRs: Pa[D = 1 | Y = 1] = Pb[D = 1 | Y = 1]
    ▪ Equal FNRs: Pa[D = 0 | Y = 1] = Pb[D = 0 | Y = 1]
• D ⊥ A | Y = 0:
    ▪ Equal FPRs: Pa[D = 1 | Y = 0] = Pb[D = 1 | Y = 0]
    ▪ Equal TNRs: Pa[D = 0 | Y = 0] = Pb[D = 0 | Y = 0]
• Named variants:
    ▪ Balanced error rates [A. Chouldechova 2016]: equal FPR & FNR
    ▪ Equal opportunity [M. Hardt, E. Price, N. Srebro 2016]: equal FNR
    ▪ Predictive equality [S. Corbett-Davies 2017]: equal FPR
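A minimal sketch computing these separation-side rates per group; the labels, decisions and group memberships below are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["a", "a", "a", "a", "b", "b", "b", "b"],   # sensitive attribute
    "Y": [1, 1, 0, 0, 1, 1, 0, 0],                    # true label
    "D": [1, 0, 1, 0, 1, 1, 0, 0],                    # predicted decision
})

for g, sub in df.groupby("A"):
    tpr = sub.loc[sub.Y == 1, "D"].mean()    # P[D=1 | Y=1, A=g]
    fnr = 1 - tpr                            # P[D=0 | Y=1, A=g]
    fpr = sub.loc[sub.Y == 0, "D"].mean()    # P[D=1 | Y=0, A=g]
    tnr = 1 - fpr                            # P[D=0 | Y=0, A=g]
    print(g, f"TPR={tpr:.2f} FNR={fnr:.2f} FPR={fpr:.2f} TNR={tnr:.2f}")

# Separation (equalized odds) asks these rates to match across groups;
# equal opportunity compares only TPR/FNR, predictive equality only FPR.
```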


    Third Criterion: Sufficiency

• Require Y and A to be independent conditional on D, denoted Y ⊥ A ∣ D
• Classifier accuracy should be the same for all the groups
    ▪ That is, for all groups a, b and all values d and y:
    Pa{Y=y ∣ D=d} = Pb{Y=y | D=d}, i.e., achieve accuracy equity


    Definitions from the Confusion Matrix

• For any box in the confusion matrix involving the label Y, we can require equality across groups [S. Mitchell et al. 2018]
• Y ⊥ A | D = 1:
    ▪ Equal PPVs: Pa[Y = 1 | D = 1] = Pb[Y = 1 | D = 1]
    ▪ Equal FDRs: Pa[Y = 0 | D = 1] = Pb[Y = 0 | D = 1]
• Y ⊥ A | D = 0:
    ▪ Equal FORs: Pa[Y = 1 | D = 0] = Pb[Y = 1 | D = 0]
    ▪ Equal NPVs: Pa[Y = 0 | D = 0] = Pb[Y = 0 | D = 0]
• Well-Calibration [G. Pleiss et al. 2017]: equal FDR & FOR (see the sketch below)
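A matching sketch for the sufficiency-side rates, conditioning on the decision D instead of the label Y (same kind of hypothetical data as before):

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["a", "a", "a", "a", "b", "b", "b", "b"],
    "Y": [1, 1, 0, 0, 1, 1, 0, 0],
    "D": [1, 0, 1, 0, 1, 1, 0, 0],
})

for g, sub in df.groupby("A"):
    ppv = sub.loc[sub.D == 1, "Y"].mean()    # P[Y=1 | D=1, A=g]
    fdr = 1 - ppv                            # P[Y=0 | D=1, A=g]
    fo  = sub.loc[sub.D == 0, "Y"].mean()    # P[Y=1 | D=0, A=g]  (FOR)
    npv = 1 - fo                             # P[Y=0 | D=0, A=g]
    print(g, f"PPV={ppv:.2f} FDR={fdr:.2f} FOR={fo:.2f} NPV={npv:.2f}")

# Sufficiency (predictive rate parity) asks these rates to match across groups.
```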


    Trade-offs Are Necessary!

• Any two of the three criteria we saw are mutually exclusive except in degenerate cases! [S. Mitchell, et al. 2018] [Kleinberg et al. 2017]
• Independence vs Sufficiency:
    ▪ Proposition. Assume balance for the negative and positive classes, and calibration within groups; then either independence holds or sufficiency, but not both
• Independence vs Separation:
    ▪ Proposition. Assume that equal FPRs, equal FNRs, and equal PPVs hold; then either independence holds or separation, but not both
• Separation vs Sufficiency:
    ▪ Proposition. Assume all events in the joint distribution of (A, D, Y) have positive probability; then either separation holds or sufficiency, but not both
• Variants observed by [Chouldechova 2016] [Kleinberg, Mullainathan, Raghavan 2016]


    Separation vs Sufficiency Tradeoffs: Example

• Suppose the following labels and predictions after optimizing our classifier without any fairness constraint
    ▪ We get the predictions for group a all correct but make one false negative mistake on group b

[Figure: per-group labels (Y=1 / Y=0) and predictions (D=1 / D=0)]
https://towardsdatascience.com/a-tutorial-on-fairness-in-machine-learning-3ff8ba1040cb


Separation vs Sufficiency Tradeoffs: Example

• Since we want to preserve separation (equalized odds), we decide to make two false negative mistakes on a as well
    ▪ Now the true negative rates (specificity) as well as the true positive rates (sensitivity) are equal: both groups have 1 (3/3 & 2/2) and 1/2 (2/4 & 1/2)

[Figure: per-group labels and predictions after the adjustment]
https://towardsdatascience.com/a-tutorial-on-fairness-in-machine-learning-3ff8ba1040cb


Separation vs Sufficiency Tradeoffs: Example

• However, although positive predictive parity is also preserved, negative predictive parity is violated with this setting (sufficiency)
    ▪ It is not possible to preserve negative predictive parity without sacrificing positive predictive parity

https://towardsdatascience.com/a-tutorial-on-fairness-in-machine-learning-3ff8ba1040cb
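A small numeric check of this trade-off, using confusion-matrix counts assumed to match the illustration above (group a: TP=2, FN=2, FP=0, TN=3; group b: TP=1, FN=1, FP=0, TN=2):

```python
counts = {
    "a": dict(tp=2, fn=2, fp=0, tn=3),
    "b": dict(tp=1, fn=1, fp=0, tn=2),
}

for g, c in counts.items():
    tpr = c["tp"] / (c["tp"] + c["fn"])   # sensitivity
    tnr = c["tn"] / (c["tn"] + c["fp"])   # specificity
    ppv = c["tp"] / (c["tp"] + c["fp"])   # positive predictive value
    npv = c["tn"] / (c["tn"] + c["fn"])   # negative predictive value
    print(g, f"TPR={tpr:.2f} TNR={tnr:.2f} PPV={ppv:.2f} NPV={npv:.2f}")

# TPR and TNR match across groups (separation holds) and so does PPV,
# but NPV differs (0.60 vs 0.67), so sufficiency is violated.
```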


    How About Other Criteria?

• Can we address the impossibility results for independence, separation and sufficiency with other fairness criteria?
• Fundamental issue: all criteria we've seen so far are observational, i.e., properties of the joint distribution of (A, X, D, Y)
    ▪ Passive observation of the world
    ▪ No "what if" scenarios or interventions (potential outcomes)
• This leads to inherent limitations


    Correlation vs. Causation

• Correlation means two variables are related, but does not tell why
• A strong correlation does not necessarily mean that changes in one variable cause changes in the other
    ▪ X and Y are correlated
    ▪ X causes Y, or Y causes X
    ▪ X and Y are caused by a third (common-cause) variable Z
• In order to imply causation, a true experiment must be performed where subjects are randomly assigned to different conditions
    ▪ Sometimes randomized experiments are expensive or immoral!
    ▪ In the context of fairness, sensitive attributes are typically immutable; hence, randomization is not even conceivable!

[Figure: X → Y, Y → X, and X ← Z → Y (common cause)]


Some History: Causation is a Matter of Perception

• Democritus (460-370 BC): "I would rather discover one causal law than be the king of Persia"
• David Hume (1711-1776): "We remember seeing the flame, and feeling a sensation called heat; without further ceremony, we call the one cause and the other effect"
• Karl Pearson (1857-1936), statistical ML: "Forget causation! Correlation is all you should ask for."
• Judea Pearl (1936-), mathematical foundations of causality: "Forget empirical observations! Define causality based on a network of known, physical, causal relationships"


    What do we Make of This?

• Answers to substantive social questions are not always provided by observational data
• Association does not mean causation, but discrimination is causal
    ▪ whether an individual would receive the same decision had the individual been of a different race, sex, age, religion, etc.!
• Knowledge about the causal relationships between all attributes should be taken into consideration
    ▪ fairness holds when there is no causal relationship from the protected attribute A to the decision D!
• Need for causality-aware methods for discovering and preventing discrimination in observational data, i.e., data recorded from the environment with no randomization or other controls


    Structural Causal Model

• Describes how causal relationships can be inferred from non-temporal data if one makes certain assumptions about the underlying process of data generation
• A causal model is a triple ℳ = ⟨𝑼, 𝑽, 𝑭⟩, where
    ▪ 𝑼 is a set of exogenous (hidden) variables whose values are determined by factors outside the model
    ▪ 𝑽 = {𝑋1, ⋯, 𝑋𝑖, ⋯} is a set of endogenous (observed) variables whose values are determined by factors within the model
    ▪ 𝑭 = {𝑓1, ⋯, 𝑓𝑖, ⋯} is a set of deterministic functions where each 𝑓𝑖 is a mapping from 𝑼 × (𝑽 ∖ 𝑋𝑖) to 𝑋𝑖: 𝑥𝑖 = 𝑓𝑖(𝒑𝒂𝑖, 𝒖𝑖)
        – where 𝒑𝒂𝑖 is a realization of 𝑋𝑖's parents in 𝑽, i.e., 𝑷𝒂𝑖 ⊆ 𝑽, and 𝒖𝑖 is a realization of 𝑋𝑖's parents in 𝑼, i.e., 𝑼𝑖 ⊆ 𝑼


    Causal Graph

• Each causal model ℳ is associated with a directed graph 𝒢 = (𝒱, ℰ), where
    ▪ 𝒱 is the set of nodes representing the variables 𝑼 ∪ 𝑽 in ℳ;
    ▪ ℰ is the set of edges determined by the structural equations in ℳ: for each 𝑋𝑖, there is an edge pointing from each of its parents in 𝑷𝒂𝑖 ∪ 𝑼𝑖 to it
        – Each directed edge represents a potential direct causal relationship
        – Absence of a direct edge represents zero direct causal relationship
• Assuming the acyclicity of causality, 𝒢 is a directed acyclic graph (DAG)
• Standard terminology
    ▪ parent, child, ancestor, descendant, path, directed path


    Markovian Model

• A causal model is Markovian if
    ❶ The causal graph is a DAG
    ❷ All variables in 𝑼 are mutually independent
• Equivalent expression: each node 𝑋 is conditionally independent of its non-descendants given its parents 𝑷𝒂𝑋
• Known as the local Markov condition (e.g., in Bayesian networks), or the causal Markov condition in the context of causal modeling
    ▪ it echoes the fact that information flows from direct causes to their effects, and every dependence between a node and its non-descendants involves the direct causes!


    Conditional Independence

• A node is conditionally independent of its non-descendants given its parents
• A node is conditionally independent of all other nodes in the network given its Markov blanket, i.e., its parents, children and children's parents


    A Markovian Model and its Graph

Observed variables: 𝑽 = {𝐼, 𝐻, 𝑊, 𝐸}
Hidden variables: 𝑼 = {𝑈𝐼, 𝑈𝐻, 𝑈𝑊, 𝑈𝐸}

Model (𝑀):
    𝑖 = 𝑓𝐼(𝑢𝐼)
    ℎ = 𝑓𝐻(𝑖, 𝑢𝐻)
    𝑤 = 𝑓𝑊(ℎ, 𝑢𝑊)
    𝑒 = 𝑓𝐸(𝑖, ℎ, 𝑤, 𝑢𝐸)

Graph (𝐺): 𝐼 → 𝐻 → 𝑊, with 𝐼, 𝐻 and 𝑊 all pointing into 𝐸, and each 𝑈 variable pointing into its endogenous variable
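A runnable sketch of a Markovian model with this shape; the concrete structural functions and noise distributions below are invented purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(n=5):
    # Independent exogenous variables U_I, U_H, U_W, U_E (Markovian assumption).
    u_i, u_h, u_w, u_e = (rng.normal(size=n) for _ in range(4))
    i = u_i                                  # i = f_I(u_I)
    h = 0.5 * i + u_h                        # h = f_H(i, u_H)
    w = 0.8 * h + u_w                        # w = f_W(h, u_W)
    e = 0.3 * i + 0.2 * h + 0.4 * w + u_e    # e = f_E(i, h, w, u_E)
    return i, h, w, e

print(sample())
```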


    Causal Graph of Markovian Model

    Each node is associated with an observable

    conditional probability table (CPT) 𝑃(𝑥𝑖|𝒑𝒂𝑖)

    • We can read off from the causal graph all the conditional independence relationships

    encoded in the causal model (graph) by using a graphical criterion called d-separation


    Definition of d-separation

• A path 𝑞 is said to be blocked by conditioning on a set 𝒁 if
    ▪ 𝑞 contains a chain 𝑖 → 𝑚 → 𝑗 or a fork 𝑖 ← 𝑚 → 𝑗 such that the middle node 𝑚 is in 𝒁, or
    ▪ 𝑞 contains a collider 𝑖 → 𝑚 ← 𝑗 such that the middle node 𝑚 is not in 𝒁 and no descendant of 𝑚 is in 𝒁
• 𝒁 is said to d-separate 𝑋 and 𝑌 [P. Spirtes, C. Glymour, R. Scheines 2000] if 𝒁 blocks every path from 𝑋 to 𝑌, denoted by (𝑋 ⊥ 𝑌 | 𝒁)𝐺
    ▪ With the Markov condition, the d-separation criterion in the causal graph G and the conditional independence relations in a dataset D are connected: if we have (𝑋 ⊥ 𝑌 | 𝒁)𝐺, then we must have (𝑋 ⊥ 𝑌 | 𝒁)Data
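A minimal sketch of checking d-separation programmatically, assuming a networkx version that still exposes d_separated (newer releases rename it to is_d_separator); the small DAG below is illustrative, not the slide's graph:

```python
import networkx as nx

G = nx.DiGraph()
G.add_edges_from([
    ("A", "R"),   # protected attribute -> redlining attribute
    ("R", "D"),   # redlining attribute -> decision
    ("U", "R"),   # another cause of R, making R a collider for A and U
])

print(nx.d_separated(G, {"A"}, {"D"}, {"R"}))   # True: conditioning on R blocks the chain A -> R -> D
print(nx.d_separated(G, {"A"}, {"U"}, set()))   # True: the collider R blocks A -> R <- U when unconditioned
print(nx.d_separated(G, {"A"}, {"U"}, {"R"}))   # False: conditioning on the collider R opens the path
```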


    d-separation Examples

In the example graph, the path from 𝑋 to 𝑌 is blocked by the following d-separation relations:
    (𝑋 ⊥ 𝑌 | 𝑍)𝐺, (𝑋 ⊥ 𝑌 | 𝑈)𝐺, (𝑋 ⊥ 𝑌 | 𝑍, 𝑈)𝐺,
    (𝑋 ⊥ 𝑌 | 𝑍, 𝑊)𝐺, (𝑋 ⊥ 𝑌 | 𝑈, 𝑊)𝐺, (𝑋 ⊥ 𝑌 | 𝑍, 𝑈, 𝑊)𝐺,
    (𝑋 ⊥ 𝑌 | 𝑉, 𝑍, 𝑈, 𝑊)𝐺

However, we do NOT have (𝑋 ⊥ 𝑌 | 𝑉, 𝑍, 𝑈)𝐺


    Causal Model of Predictions

Observed variables: 𝑽 = {A, …, Xi, …, D}    Hidden variables: 𝑼

Model (𝑀):
    a = 𝑓A(𝑝𝑎A, 𝑢A)
    xi = 𝑓i(𝑝𝑎𝑖, 𝑢xi), i = 1, …, m
    d = 𝑓D(𝑝𝑎D, 𝑢D)

Graph (𝐺): nodes A, 𝑿 and D, each annotated with its CPT, e.g., P(a | 𝑝𝑎A) and P(d | 𝑝𝑎D)

𝑼A, ⋯, 𝑼𝑖, ⋯, 𝑼D are mutually independent (Markovian assumption)


    Modeling Discrimination as Path Specific Effects

• Direct and indirect discrimination can be captured by the causal effects of A on D transmitted along different paths [L. Zhang et al. 2017], [N. Kilbertus et al. 2017], [R. Nabi, I. Shpitser 2018]
    ▪ Direct: causal effect along the direct edge 𝜋𝑑 from A to D
    ▪ Indirect: causal effect along causal paths 𝜋𝑖 from A to D that pass through redlining attributes R
    ▪ If we observe P(d+ | a+, x) ≠ P(d+ | a−, x) for each value assignment x of a set X that blocks all indirect paths from A to D, the difference must be due to the existence of the direct causal effect of A on D


Quantitative Measuring

• 𝜋𝑑-specific effect (Q ranges over the values of D's parents other than A):
    AD𝜋𝑑(𝑎+, 𝑎−) = Σq P(d+ | 𝑎+, q) P(q | 𝑎−) − P(d+ | 𝑎−)
• 𝜋𝑖-specific effect:
    AD𝜋𝑖(𝑎+, 𝑎−) = Σq [ P(d+ | 𝑎−, q) Π_{G ∈ A𝜋𝑖} P(g | 𝑎+, paG∖{A}) Π_{H ∈ Ā𝜋𝑖∖{D}} P(h | 𝑎−, paH∖{A}) Π_{O ∈ V∖ChildA} P(o | paO) ] − P(d+ | 𝑎−)
    ▪ A𝜋𝑖: A's children that lie on paths in 𝜋𝑖; Ā𝜋𝑖: A's children that do not lie on paths in 𝜋𝑖


    Causal Paths Example

• A bank makes loan decisions based on the zip codes, races, and incomes of the applicants
    ▪ Inadmissible attributes
        – Race: protected attribute
        – Zip Code: redlining attribute
    ▪ Admissible attributes
        – Income: non-protected
    ▪ Decision: Loan


    Causal Paths Fairness Metrics: Example

• AD𝜋𝑑(𝑎+, 𝑎−) = Σ_{Z,I} ( P(d+ | 𝑎+, z, i) − P(d+ | 𝑎−, z, i) ) P(z | 𝑎−) P(i | 𝑎−)
• AD𝜋𝑖(𝑎+, 𝑎−) = Σ_{Z,I} P(d+ | 𝑎−, z, i) ( P(z | 𝑎+) − P(z | 𝑎−) ) P(i | 𝑎−)
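A minimal sketch computing both path-specific effects for this loan example; the conditional probability tables for P(z|a), P(i|a) and P(d+|a,z,i) below are invented:

```python
import itertools

p_z = {"a+": {0: 0.3, 1: 0.7}, "a-": {0: 0.7, 1: 0.3}}   # P(Z=z | A)
p_i = {"a+": {0: 0.4, 1: 0.6}, "a-": {0: 0.6, 1: 0.4}}   # P(I=i | A)
p_d = {                                                   # P(D=d+ | A, Z, I)
    ("a+", 0, 0): 0.2, ("a+", 0, 1): 0.5, ("a+", 1, 0): 0.4, ("a+", 1, 1): 0.8,
    ("a-", 0, 0): 0.1, ("a-", 0, 1): 0.4, ("a-", 1, 0): 0.3, ("a-", 1, 1): 0.7,
}

# Direct (pi_d-specific) effect: only the direct edge A -> D receives a+.
direct = sum((p_d[("a+", z, i)] - p_d[("a-", z, i)]) * p_z["a-"][z] * p_i["a-"][i]
             for z, i in itertools.product([0, 1], repeat=2))

# Indirect (pi_i-specific) effect: only the redlining attribute Z receives a+.
indirect = sum(p_d[("a-", z, i)] * (p_z["a+"][z] - p_z["a-"][z]) * p_i["a-"][i]
               for z, i in itertools.product([0, 1], repeat=2))

print(f"direct (pi_d-specific) effect:   {direct:.3f}")
print(f"indirect (pi_i-specific) effect: {indirect:.3f}")
```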


    Causal Effect vs. Risk Difference

• The total causal effect of A (changing from 𝑎− to 𝑎+) on D is given by
    TE(𝑎+, 𝑎−) = P(d+ | do(𝑎+)) − P(d+ | do(𝑎−))
  transmitted along all causal paths from A to D
• Connection with the risk difference: when A has no parents in the causal graph (no confounding of A and D), conditioning on A coincides with intervening on it, so
    TE(𝑎+, 𝑎−) = P(d+ | 𝑎+) − P(d+ | 𝑎−)


    Total Causal Effect vs. Path-Specific Effect

• For any 𝜋𝑑 and 𝜋𝑖, we don't necessarily have
    SE𝜋𝑑(𝑎+, 𝑎−) + SE𝜋𝑖(𝑎+, 𝑎−) = SE𝜋𝑑∪𝜋𝑖(𝑎+, 𝑎−)
• If 𝜋𝑖 contains all causal paths from A to D except 𝜋𝑑, then
    TE(𝑎+, 𝑎−) = SE𝜋𝑑(𝑎+, 𝑎−) − SE𝜋𝑖(𝑎−, 𝑎+)
  where SE𝜋𝑖(𝑎−, 𝑎+) is the "reverse" 𝜋𝑖-specific effect


    Summing-Up

• Observational criteria can help discover discrimination, but are insufficient on their own
    ▪ No conclusive proof of (un-)fairness
• A causal viewpoint can help articulate problems and organize assumptions
    ▪ Social questions start with measurement
    ▪ Human scrutiny and expertise are irreplaceable
• What to do with the different flavors of fairness?
    ▪ constrain the decision function to satisfy a fairness flavor, or
    ▪ design interventions to reduce disparities in input variables and outcomes, which would reduce disparities in decisions in the long term [C. Barabas et al. 2018], [J. Jackson, T. VanderWeele 2018]


    End-to-End ML Pipelines

• Research on fairness, accountability, and transparency (FAT) of ML algorithms and their outputs focuses solely on the final steps of the data science lifecycle!
• The ML literature generally assumes clean training datasets (no missing, erroneous or duplicate values) and focuses on optimizing fairness metrics during pre-, in- or post-processing

[Figure: end-to-end ML pipeline, including a normalization step]


    Upstream Discrimination Prevention

• Industry practitioners more often turn their attention to the data first [K. Holstein et al. 2019]
    ▪ 65% of survey respondents reported having control over data collection or curation
• Need tools for creating datasets that support fairness upstream, i.e., that address the root cause of statistical bias in the data models are trained on [I. Chen, et al. 2018; B. Nushi et al. 2018]
    ▪ e.g., tools to diagnose whether a given fairness issue might be addressed by collecting more training data from a particular subpopulation or by better cleaning the existing training data
        … and to predict how much more data is needed or how much has to be repaired
    ▪ i.e., tools to help actively guide data collection and pre-processing pipelines in order to jointly optimize fairness and accuracy of downstream models


    Preliminary Remarks

• Data quality issues have the potential to introduce unintended bias and variability in the data that could have a crucial impact in high-stakes applications
• The impact of upstream data cleaning on the performance of downstream ML models depends on
    ▪ the data quality issues and their distribution in the datasets (unknown)
    ▪ the effectiveness of the data repairing algorithms (also unknown without ground truth)
    ▪ the internal structure of the ML models (hard to interpret for highly accurate models)
• It is known that the statistical distortion of datasets by cleaning tasks matters when estimating models' accuracy [T. Dasu, J. M. Loh 2012], yet its impact on the models' fairness has only recently started to be investigated [S. Schelter 2019] (for missing-value imputation)


    Responsible Data Science Pipelines

• Interplay of bias mitigation and data repairing interventions on data fairness and utility for downstream ML models
    ▪ e.g., data from ethnic minorities may be noisier than data collected from the majority ethnic group

[Figure: pipeline from raw data (with data bias and dirty data) through unfairness mitigation and data cleaning to an ML model evaluated for accuracy and fairness; tools shown include IBM AI Fairness 360, FairBench, FairTest, FairnessMetrics, Themis, FairPrep]


    Questions?

    https://www.lepoint.fr/invites-du-point/aurelie-jean-il-ne-faut-pas-reguler-les-algorithmes-mais-les-pratiques-15-09-2019-2335766_420.php
