Transcript of Fall 2020 Data Ethics in Algorithmic Decision Making · 11/11/2020
Data Ethics in Algorithmic
Decision Making
Vassilis Christophides, Vasilis Efthymiou ({christop|vefthym}@csd.uoc.gr)
http://www.csd.uoc.gr/~hy562
University of Crete, Fall 2020
Plan
• Introduction and Motivation
• Discrimination Examples in Algorithmic Decision Making
▪Direct and Indirect Discrimination
• How Do Machines Learn to Discriminate?
• Anti-Discrimination Learning
▪Associational Fairness Definitions
▪Causal Fairness Definitions
• Upstream Discrimination Prevention in ML Pipelines
▪ Interplay of Data Preprocessing and Fairness Interventions
• Progress so Far and Acknowledgments
The Rise of Artificial Intelligence (AI)
• Widespread investments in Artificial Intelligence (AI) are made to enable computers
to interpret what they see, communicate in natural language, answer complex
questions, and interact with their physical environment
▪ improve people’s lives, e.g., autonomous cars
▪ accelerate scientific discovery, e.g., precision
medicine
▪ protect environment, e.g., reduce energy footprint
▪ optimize business, e.g., targeted advertisement
▪ transform society, e.g., increasing automation
Timeline of the Industrial Revolutions
• The huge societal challenges arriving with the increased automation enabled by AI resemble the challenges of the industrial revolutions
▪ Data is the driver of the new industrial era and actually fuels the development of AI
https://pocketconfidant.com/how-the-era-of-artificial-intelligence-will-transform-society/
AI is Data Hungry!
https://www.slideshare.net/KeithKraus/gpuaccelerating-udfs-in-pyspark-with-numba-and-pygdf
[Figure: applications placed on a data-scale axis from 10^9 to 10^21 bytes — self-driving cars, superhuman doctor, life on other planets?, creative arts, smart farms & food systems, 'detoxify' social media, personal assistants, smarter cybersecurity, Earth challenge areas]
An ML model is only as good as its data: no matter how good a training algorithm is, the ultimate quality of automated decisions lies in the data itself!
Automated Decisions of Consequence
• Widespread algorithmic decision systems with many
small interactions
▪ e.g. search, recommendations, social media, …
• Specialized algorithmic decision systems with fewer
but higher-stakes interactions
▪ e.g. hiring and promotion, credit-worthiness and
loans, criminal justice and predictive policing,
child maltreatment screening, medical
diagnosis, welfare eligibility, …
• At this level of impact, algorithmic decision-making can have unintended consequences in people's lives
[Images: Hiring AI, Policing & Sentencing AI, Lending AI]
Are Automated Decisions Impartial?
• Algorithmic decision making is supposedly more objective than human decision making, and yet…
▪All traditional evils of discrimination, and many new ones,
exhibit themselves in the Big Data & AI ecosystem
▪Opaque automated decision systems
• Human decision-making is affected by greed, prejudice, fatigue, poor scalability, etc., and hence can be biased!
▪ Formal procedures can limit opportunities to exercise prejudicial
discretion or fall victim to implicit bias
• High-stakes scenarios = ethical problems!
▪ Despite existing legal/regulation efforts, current anti-discrimination laws
are not yet well equipped to deal with various issues of discrimination in
data analysis [S. Barocas, A.D. Selbst 2016]
Discrimination is not a General Concept
• It is domain specific
▪Concerned with important opportunities that affect people’s life chances
• It is feature specific
▪Concerned with socially salient qualities that have served as the basis for
unjustified & systematically adverse treatment in the past
Legally Recognized ‘Protected Classes’
https://www.slideshare.net/KrishnaramKenthapadi/fairnessaware-machine-learning-practical-challenges-and-lessons-learned-www-2019-tutorial
Societal categories (political
ideology, income, language,
physical traits), Intersectional
subpopulations (young white
woman, old black men), etc.
Legal & Regulatory Frameworks in EU
• EU Council Directive 76/207/EEC of 9 February 1976 on the implementation of the
principle of equal treatment for men and women as regards access to employment,
vocational training and promotion, and working conditions
• GDPR 2016/679, Recital 71 [https://eur-lex.europa.eu/legal-content/en/TXT/?uri=CELEX:32015R2120]
▪ […] In order to ensure fair and transparent processing […], the controller should
use appropriate mathematical or statistical procedures for the profiling,
▪ […] and that prevents, inter alia, discriminatory effects on natural persons on
the basis of racial or ethnic origin, political opinion, religion or beliefs,
▪ […] or that result in measures having such an effect.
(the former clause concerns direct discrimination, the latter indirect)
Discrimination Law: Two Doctrines
• Disparate treatment (DT) is the illegal practice of treating an entity, such as a creditor or employer, differently based on a protected/sensitive attribute such as race, gender, age, religion, sexual orientation, etc.
▪ avoid disparities between outputs for groups of people with the same (or similar) values of non-sensitive attributes but different values of sensitive ones
• Disparate impact (DI) is the result of systematic disparate treatment, where disproportionate adverse impact is observed on members of a protected class [M. Feldman et al. 2015]
▪ minimize outputs that benefit (harm) a group of people sharing a value of a sensitive attribute more frequently than other groups of people
What does Discrimination Law Aim to Achieve?
• Equality of opportunity
(treatment):
▪ Procedural fairness
• Minimized inequality of outcome
(impact)
▪ Outcome fairness
https://www.reddit.com/r/GCdebatesQT/comments/7qpbpp/food_for_thought_equality_vs_equity_vs_justice
Example: Criminal Justice System
Dylan Fugett (right) was rated low risk while Bernard Parker (left) was rated high risk
• There's software used across the US to predict whether someone who has been arrested will be re-arrested in the future, on the basis of criminal history, demographics, and other information
▪ Assessing recidivism risk is biased against blacks [J. Angwin, et al. 2016]
https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
• Recidivism risk: a defendant's likelihood of committing a new crime
▪ Used to decide pretrial detention, bail and sentencing
Example: Predictive Policing
https://www.smithsonianmag.com/innovation/artificial-intelligence-is-now-used-predict-crime-is-it-biased-180968337/
• PredPol identifies areas in a neighborhood where serious crimes are more likely to occur during a particular period
▪ using a wide range of “leading indicator” data, including reports of crimes, such as simple assaults, vandalism and disorderly conduct, and 911 calls about such things as shots fired or a person seen with a weapon
• The American Civil Liberties Union [ACLU], the Brennan Center for Justice and various civil rights organizations have all raised questions about the risk of bias being baked into the software
Detention of High-Risk Criminals
• Judge (decision maker): Of those I've labeled high-risk, how many will recidivate? → Positive Predictive Value
• Plaintiff: What's the probability I'll be incorrectly classified as high-risk? → False Positive Rate
• Defense (think hiring rather than criminal justice): Is the selected set demographically balanced? → Demographic Populations
[Confusion matrix: rows Label high-risk / Label low-risk vs. columns Recidivate / Did Not Recidivate, giving True Positive, False Positive, False Negative, True Negative]
Tutorial: 21 fairness definitions and their politics: https://www.youtube.com/watch?v=jIXIuYdnyyk
Prediction Fails Differently for Black Defendants! [J. Dressel and H. Farid 2018]
• COMPAS: Correctional Offender Management Profiling for Alternative Sanctions

                                               White   African American
Labeled high-risk, but didn't re-offend (FPR)  23.5%   44.9%
Labeled low-risk, yet did re-offend (FNR)      47.7%   28.0%
Essence of the COMPAS Debate
• ProPublica's main charge:
▪ Black defendants face a higher false positive rate: among defendants who did not get rearrested, black defendants were twice as likely to be misclassified as high risk
▪ White defendants face a higher false negative rate: among defendants who got rearrested, white defendants were twice as likely to be misclassified as low risk
• Northpointe's (now Equivant, of Canton, Ohio) main defense:
▪ COMPAS was not made to make absolute predictions about success or failure; it was designed to inform probabilities of reoffending across three categories of risk (low, medium, & high)
▪ The system is well-calibrated: if a person is assigned to one of the three risk categories, we can treat them as having the corresponding risk level
• Word of caution:
▪ Neither calibration nor equality of false positive/negative rates rules out blatantly unfair practices [S. Corbett-Davies et al 2017]
Discrimination in Credit & Consumer Markets
Redlining is the (indirect discrimination) practice of arbitrarily denying or limiting financial services to specific neighborhoods, generally because their residents are people of color or are poor
Amazon Redlining
No Amazon Free Same-day Delivery for Restricted Minority Neighborhoods
Discrimination in Online Services
• Non-black hosts can charge ~12% more than black hosts on Airbnb [M. Luca, B. Edelman 2014]
• Price steering and discrimination in many online retailers [A. Hannak, G. Soeller, D. Lazer, A. Mislove, and C. Wilson 2014]
• Race and gender stereotypes reinforced on the Web [M. Kay, C. Matuszek, S. Munson 2015]
• China is about 21% larger by pixels when shown in Google Maps for China [G. Soeller, K. Karahalios, C. Sandvig, and C. Wilson 2016]
Representation Bias
https://towardsdatascience.com/gender-bias-word-embeddings-76d9806a0e17
Word clouds for the nearest neighbours of “man” (L) and “woman” (R).
• Gender Bias found in word embeddings trained with word2vec on Google News
[T. Bolukbasi et al. 2016]
• Represent each word with a high-dimensional vector
▪ Vector arithmetic supports analogies: Paris − France = London − England
▪ Also found: man − woman = programmer − homemaker = surgeon − nurse
Topic observed: “occupations”
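The analogy arithmetic above can be illustrated with a toy nearest-neighbour search. The 3-dimensional vectors below are invented for illustration only; real word2vec embeddings are learned from text and have hundreds of dimensions.

```python
# Toy sketch of embedding-analogy arithmetic (invented vectors, not real word2vec).
import math

emb = {
    "man":        [0.9, 0.1, 0.3],
    "woman":      [0.1, 0.9, 0.3],
    "programmer": [0.8, 0.2, 0.7],
    "homemaker":  [0.2, 0.8, 0.7],
    "surgeon":    [0.85, 0.15, 0.9],
    "nurse":      [0.15, 0.85, 0.9],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(a * a for a in v))
    return dot / (nu * nv)

def analogy(a, b, c, emb):
    """Solve a - b = x - c, i.e. x ~ a - b + c, by nearest cosine neighbour."""
    target = [ai - bi + ci for ai, bi, ci in zip(emb[a], emb[b], emb[c])]
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(emb[w], target))

# "man is to programmer as woman is to ...?"
print(analogy("programmer", "man", "woman", emb))  # homemaker
```

With these made-up vectors the gendered direction dominates, so the analogy lands on "homemaker" — the same mechanism Bolukbasi et al. observed in the learned Google News embeddings.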
Toxicity: Twitter Taught AI Chatbot to be
a Racist in Less than a Day
https://www.theverge.com/2016/3/24/11297050/tay-microsoft-chatbot-racist
“Bugs” of Data-driven Decision Systems [H. Suresh, J. V. Guttag 2019]
• When data is about people, “bugs” related to various bias types can lead to discrimination!
• Biases in data acquisition: historical bias; representation bias (selection or sampling bias); information bias (observation or measurement bias)
• “Bugs” in data collection: skewed samples; sample size disparity
• “Bugs” in data processing: limited or proxy features; data errors
• “Bugs” in ground truth: tainted samples
[Diagram: from the real world to the “world according to data”]
“Bugs” of Data-driven Decision Systems: Model Building and Evaluation [H. Suresh, J. V. Guttag 2019]
• Aggregation bias
• Evaluation bias
[Diagram: from the “world according to data” to model building, training and testing]
Three Different Performance Problems
• Discovering unobserved differences in performance
▪ Skewed and tainted samples: Garbage in, garbage out (GIGO)
–Samples might be biased
–Labels might be incorrect
• Coping with observed differences in performance
▪ Sample size disparity
–Learn on majority
▪ Limited features
–Errors concentrated in the minority class
• Understanding the causes of disparities in predicted outcome
▪ Proxies (redline attributes): Data as a social mirror
–Protected attributes redundantly encoded in observables
Anti-Discrimination Learning
❶ Discrimination Discovery/Detection
▪ Unveil evidence of discriminatory practices by analyzing the historical dataset
or the predictive model
❷ Discrimination Prevention/Removal
▪ Mitigate discriminative effects by modifying the biased data or by adjusting the
learning process or by twisting the predictive model
• Pre-processing: modify the
training data
• In-processing: adjust the
learning process
• Post-processing: directly
change the predicted labels
Discrimination Discovery Framework
[D. Pedreshi, S. Ruggieri, and F. Turini 2008]
Database of
historical decision
records
A criterion of
(unlawful)
discrimination
A set of potentially
discriminated
groups
INPUT OUTPUT
A subset of decision
records and potentially
discriminated people for
which the criterion holds
Formal Setup
• X: features of an individual (e.g., criminal history, demographics, etc.)
▪ may contain redlining attributes 𝑹 (e.g., neighborhoods)
• A: a protected, sensitive attribute (e.g., race)
▪ binary attribute 𝐴 = {𝑎+, 𝑎−}
• D = d(X, A): predictor of decision (e.g., criminality risk)
▪ binary decision D = {d+, d−}
• Y: target variable, labels (e.g., recidivate)
Note: random variables are drawn from the same probability distribution; we write Pa{E} = P{E ∣ A=a}
Three Fundamental Criteria
• Independence: D independent of A (D ⊥ A)
▪ predictions are uncorrelated with the sensitive attribute
▪ also called Demographic Parity, Statistical Parity
• Separation: D independent of A conditional on Y (D ⊥ A ∣ Y)
▪ equal true positive/negative rates of predictions across groups
▪ also called Equalized Odds, Positive Rate Parity, Disparate Mistreatment, Equal Opportunity
• Sufficiency: Y independent of A conditional on D (Y ⊥ A ∣ D)
▪ similar rates of accurate predictions across groups
▪ also called Predictive Rate Parity, Outcome Test, Test-fairness, Well-Calibration
• The above criteria fall into a larger category called “group fairness”
Conditional Independence
• A is conditionally independent of B given C, denoted A ⊥ B | C, if the probability distribution governing A is independent of the value of B given the value of C
▪ learning that B = b does not change your belief in A when you already know C = c, for all values b that B can take and all values c that C can take:
∀a ∈ dom(A), b ∈ dom(B), c ∈ dom(C): P(A=a | B=b, C=c) = P(A=a | C=c)
• Note: conditional independence neither implies nor is implied by independence!
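The definition can be checked numerically on a tiny joint distribution. The sketch below uses an invented fork model C → A, C → B, where A ⊥ B | C holds by construction while marginal independence A ⊥ B fails:

```python
# Minimal numeric check of conditional vs. marginal independence on a
# fork C -> A, C -> B. All probabilities are invented for illustration.
from itertools import product

p_c = {0: 0.5, 1: 0.5}
p_a_given_c = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}  # p_a_given_c[c][a]
p_b_given_c = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}  # p_b_given_c[c][b]

joint = {(a, b, c): p_c[c] * p_a_given_c[c][a] * p_b_given_c[c][b]
         for a, b, c in product([0, 1], repeat=3)}

def p(event):
    """Probability of all outcomes matching a partial assignment."""
    return sum(pr for (a, b, c), pr in joint.items()
               if all(dict(a=a, b=b, c=c)[k] == v for k, v in event.items()))

# Conditional independence: P(A=1 | B=1, C=1) == P(A=1 | C=1)
lhs = p(dict(a=1, b=1, c=1)) / p(dict(b=1, c=1))
rhs = p(dict(a=1, c=1)) / p(dict(c=1))
print(abs(lhs - rhs) < 1e-12)   # True: A ⊥ B | C

# Marginal independence fails: P(A=1 | B=1) != P(A=1)
print(abs(p(dict(a=1, b=1)) / p(dict(b=1)) - p(dict(a=1))) > 0.01)  # True
```

This is exactly the "neither implies nor is implied by" point: conditioning on the common cause C removes the dependence between A and B.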
(Un)Conditional Independence Examples
[Figure: one example graph where A ⊥ B | C holds but A ⊥ B does not, and another where A ⊥ B holds but A ⊥ B | C does not]
First Criterion: Independence
• Require D and A to be independent, denoted D ⊥ A [R. Zemel et al. 2013], [M. Feldman et al. 2015], [S. Corbett-Davies et al. 2018]
▪ allocation of benefits and harms across groups can be examined by looking at the decision alone
• That is, for all groups a, b and all values d: Pa{D=d} = Pb{D=d}
• When D is a binary 0/1 variable, this means Pa{D=1} = Pb{D=1} for all groups a, b! (avoid disparate impact)
• Approximate versions:
▪ Disparate Impact (DI): Pb{D=1}/Pa{D=1} ≥ 1−ϵ [M. Feldman et al. 2015] (US legal context)
▪ Calders-Verwer (CV): |Pb{D=1}−Pa{D=1}| ≤ ϵ [T. Calders, S. Verwer 2010] (UK legal context)
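A minimal sketch of checking the two approximate versions on illustrative 0/1 decision lists (group a taken as the protected, lower-rate group; the data is made up):

```python
# Sketch: approximate independence criteria on illustrative decisions.
d_a = [1, 0, 0, 1, 0, 0, 0, 1]   # decisions for group a (protected)
d_b = [1, 1, 0, 1, 0, 1, 1, 0]   # decisions for group b

p_a = sum(d_a) / len(d_a)        # P_a{D=1}
p_b = sum(d_b) / len(d_b)        # P_b{D=1}

di_ratio = p_a / p_b             # disparate impact ratio (disadvantaged / advantaged)
cv_gap = abs(p_b - p_a)          # Calders-Verwer gap

print(f"P_a(D=1)={p_a:.3f}  P_b(D=1)={p_b:.3f}")
print(f"DI ratio={di_ratio:.2f}  (the US four-fifths rule asks for >= 0.8)")
print(f"CV gap={cv_gap:.3f}")
```

Here the DI ratio is 0.60, below the four-fifths threshold, and the CV gap is 0.25, so both approximate criteria flag a disparity for any small ϵ.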
Simple Discrimination Measures
[D. Pedreschi, S. Ruggieri, F. Turini 2012]:
Protected group vs. unprotected group:
▪ Risk difference: RD = p1 − p2 (UK law)
▪ Risk ratio or relative risk: RR = p1 / p2 (EU Court of Justice)
▪ Relative chance: RC = (1−p1) / (1−p2) (US courts focus on the selection rates (1−p1) and (1−p2))
▪ Odds ratio = RR/RC
Protected group vs. entire population:
▪ Extended risk difference = p1 − p
▪ Extended risk ratio or extended lift = p1 / p
▪ Extended chance = (1−p1) / (1−p)
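The group-vs-group measures can be computed directly from the two groups' benefit-denial rates p1 and p2; the counts below are invented for illustration:

```python
# Sketch: simple discrimination measures from benefit-denial rates.
denied_protected, total_protected = 30, 100      # p1 = 0.30
denied_unprotected, total_unprotected = 15, 100  # p2 = 0.15

p1 = denied_protected / total_protected
p2 = denied_unprotected / total_unprotected

rd = p1 - p2              # risk difference (UK law)
rr = p1 / p2              # risk ratio / relative risk (EU Court of Justice)
rc = (1 - p1) / (1 - p2)  # relative chance, from the selection rates (US courts)
odds = rr / rc            # odds ratio

print(round(rd, 4), rr, round(rc, 4), round(odds, 4))
```

With these counts the protected group is denied twice as often (RR = 2.0) even though the selection rates differ only mildly (RC ≈ 0.82), which is why the different legal traditions can reach different conclusions from the same table.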
Second criterion: Separation
• Require D and A to be independent conditional on Y, denoted D ⊥ A ∣ Y [M. Hardt et al. 2016], [A. Chouldechova 2017], [S. Corbett-Davies et al. 2017] [M Zafar et al. 2017]
• The probability of predicting D does not change after observing A when we have Y
▪That is, for all groups a,b and all values d and y:
Pa{D=d ∣ Y=y} = Pb{D=d ∣ Y=y} (prevent disparate treatment)
Confusion Matrix: (Mis)Match between Target Variable Y and Decision D

                 True Y=1   True Y=0
Predicted D=1    TPR        FPR
Predicted D=0    FNR        TNR

Conditioning on the prediction (rows) instead gives PPV and FDR for D=1, and FOR and NPV for D=0
Definitions from the Confusion Matrix
• For any box in the confusion matrix involving the decision D, we can require
equality across groups [S. Mitchell et al. 2018]
• Equal TPRs: Pa[D = 1 | Y = 1] = Pb[D = 1 | Y = 1]
• Equal FNRs: Pa[D = 0 | Y = 1] = Pb[D = 0 | Y = 1]
• D ⊥ A | Y = 1
• Equal FPRs: Pa[D = 1 | Y = 0] = Pb[D = 1 | Y = 0]
• Equal TNRs: Pa[D = 0 | Y = 0] = Pb[D = 0 | Y = 0]
• D ⊥ A | Y = 0
• Balanced error rates [A. Chouldechova 2016]: Equal FPR & FNR
▪Equal opportunity [M. Hardt, E. Price, N. Srebro 2016]: Equal FNR
▪Predictive equality [S. Corbett-Davies 2017]: Equal FPR
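These separation-style equalities can be checked from per-group (label, prediction) pairs; the data below is illustrative:

```python
# Sketch: per-group TPR/FPR for separation-style criteria.
def rates(pairs):
    """TPR and FPR from (y_true, d_pred) pairs."""
    tp = sum(1 for y, d in pairs if y == 1 and d == 1)
    fn = sum(1 for y, d in pairs if y == 1 and d == 0)
    fp = sum(1 for y, d in pairs if y == 0 and d == 1)
    tn = sum(1 for y, d in pairs if y == 0 and d == 0)
    return tp / (tp + fn), fp / (fp + tn)

group_a = [(1, 1), (1, 1), (1, 0), (0, 0), (0, 1), (0, 0)]
group_b = [(1, 1), (1, 0), (1, 0), (0, 0), (0, 0), (0, 1)]

tpr_a, fpr_a = rates(group_a)
tpr_b, fpr_b = rates(group_b)

# Equalized odds requires equal TPRs AND equal FPRs across groups;
# equal opportunity relaxes this to equal TPRs (equivalently, equal FNRs).
print(f"TPR: a={tpr_a:.2f} b={tpr_b:.2f}   FPR: a={fpr_a:.2f} b={fpr_b:.2f}")
```

In this made-up example the FPRs match (predictive equality holds) while the TPRs do not, so equalized odds and equal opportunity are both violated.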
Third Criterion: Sufficiency
• Require Y and A to be independent conditional on D, denoted Y ⊥ A ∣ D
• Classifier accuracy should be the same for all the groups
▪ That is, for all groups a, b and all values d and y:
Pa{Y=y ∣ D=d} = Pb{Y=y | D=d}, i.e., achieve accuracy equity
Definitions from the Confusion Matrix
• For any box in the confusion matrix involving the label Y, we can require equality across groups [S. Mitchell et al. 2018]
• Equal PPVs: Pa[Y = 1 | D = 1] = Pb[Y = 1 | D = 1]
• Equal FDRs: Pa[Y = 0 | D = 1] = Pb[Y = 0 | D = 1]
• Y ⊥ A | D = 1
• Equal FORs: Pa[Y = 1 | D = 0] = Pb[Y = 1 | D = 0]
• Equal NPVs: Pa[Y = 0 | D = 0] = Pb[Y = 0 | D = 0]
• Y ⊥ A | D = 0
• Well-Calibration [G. Pleiss et al. 2017]: Equal FDR & FOR
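The same style of check works for the sufficiency-side quantities, now conditioning on the prediction (PPV/NPV per group); the data is again illustrative:

```python
# Sketch: per-group PPV/NPV for sufficiency-style criteria.
def ppv_npv(pairs):
    """PPV and NPV from (y_true, d_pred) pairs."""
    pos = [(y, d) for y, d in pairs if d == 1]
    neg = [(y, d) for y, d in pairs if d == 0]
    ppv = sum(1 for y, _ in pos if y == 1) / len(pos)
    npv = sum(1 for y, _ in neg if y == 0) / len(neg)
    return ppv, npv

group_a = [(1, 1), (1, 1), (0, 1), (0, 0), (0, 0), (1, 0)]
group_b = [(1, 1), (0, 1), (0, 1), (0, 0), (1, 0), (1, 0)]

print(ppv_npv(group_a))  # group a
print(ppv_npv(group_b))  # group b
```

Here a positive prediction means something different for the two groups (PPV 2/3 vs. 1/3), so sufficiency fails even though one could simultaneously have, say, equal acceptance rates.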
Trade-offs Are Necessary!
• Any two of the three criteria we saw are mutually exclusive except in degenerate
cases! [S. Mitchell, et al. 2018] [Kleinberg et al. 2017]
• Independence vs Sufficiency:
▪Proposition. Assume balance for the negative and positive classes, and calibration
within groups, then, either independence holds or sufficiency but not both
• Independence vs Separation:
▪Proposition. Assume that equal FPRs, equal FNRs, and equal PPVs hold, then,
either independence holds or separation but not both
• Separation vs Sufficiency:
▪Proposition. Assume all events in the joint distribution of (A,D,Y) have positive
probability, then either separation holds or sufficiency but not both
• Variants observed by [Chouldechova 2016] [Kleinberg, Mullainathan, Raghavan 2016]
Separation vs Sufficiency Tradeoffs: Example
• Suppose the following labels and predictions after optimizing our classifier without any fairness constraint
▪ We get the predictions for group a all correct, but make one false negative mistake on group b
[Figure: labels Y = 1/0 and predictions D = 1/0 for the two groups]
https://towardsdatascience.com/a-tutorial-on-fairness-in-machine-learning-3ff8ba1040cb
Separation vs Sufficiency Tradeoffs: Example
• Since we want to preserve separation (equalized odds), we decide to make two false negative mistakes on a as well
▪ Now the true negative rates (specificity) and the true positive rates (sensitivity) are equal across the two groups: TNR = 1 (3/3 & 2/2) and TPR = 1/2 (2/4 & 1/2)
https://towardsdatascience.com/a-tutorial-on-fairness-in-machine-learning-3ff8ba1040cb
Separation vs Sufficiency Tradeoffs: Example
• However, although positive predictive parity is also preserved, negative predictive parity is violated in this setting (sufficiency)
▪ It is not possible to preserve negative predictive parity without sacrificing positive predictive parity
[Figure: labels and predictions for the two groups]
https://towardsdatascience.com/a-tutorial-on-fairness-in-machine-learning-3ff8ba1040cb
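The arithmetic of this trade-off can be verified mechanically. The counts below are my reconstruction of the worked example: group a with 4 positives and 3 negatives forced into 2 false negatives, group b with 2 positives and 2 negatives and 1 false negative, and no false positives in either group:

```python
# Sketch: verify separation holds while sufficiency (NPV parity) fails.
def group_rates(tp, fn, fp, tn):
    return {
        "TPR": tp / (tp + fn),
        "TNR": tn / (tn + fp),
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
    }

a = group_rates(tp=2, fn=2, fp=0, tn=3)
b = group_rates(tp=1, fn=1, fp=0, tn=2)

print(a["TPR"] == b["TPR"], a["TNR"] == b["TNR"])  # separation holds
print(a["PPV"] == b["PPV"])                         # positive predictive parity holds
print(a["NPV"], b["NPV"])                           # 0.6 vs 0.666... -> violated
```

With these counts TPR and TNR match by construction and the PPVs agree, but the NPVs are 3/5 and 2/3: whichever way the errors are distributed, one of the two predictive parities must give.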
How About Other Criteria?
• Can we address the impossibility result for the fairness criteria of independence, separation, and sufficiency with other criteria?
• Fundamental issue: all criteria we've seen so far are observational, i.e., properties of the joint distribution of (A, X, D, Y)
▪Passive observation of the world
▪No what if scenarios or interventions (potential outcomes)
• This leads to inherent limitations
Correlation vs. Causation
• Correlation means two variables are related but does not tell why
• A strong correlation does not necessarily mean that changes in one variable cause changes in the other; if X and Y are correlated, then
▪ X causes Y, or Y causes X, or
▪ X and Y are caused by a third variable Z
• In order to imply causation, a true experiment must be performed
where subjects are randomly assigned to different conditions
▪ Sometimes randomized experiments are expensive or immoral!
▪ In the context of fairness, sensitive attributes are typically immutable; hence, randomization is not even conceivable!
[Figure: X and Y may be linked as X → Y, as Y → X, or through a common cause Z with Z → X and Z → Y]
Causation is a Matter of Perception: Some History
• Democritus (460-370 BC): “I would rather discover one causal law than be the king of Persia”
• David Hume (1711-1776): “We remember seeing the flame, and feeling a sensation called heat; without further ceremony, we call the one cause and the other effect”
• Karl Pearson (1857-1936), statistical ML: “Forget causation! Correlation is all you should ask for”
• Judea Pearl (1936-), mathematical foundations of causality: “Forget empirical observations! Define causality based on a network of known, physical, causal relationships”
What do we Make of This?
• Answer to substantive social questions not always provided by observational data
• Association does not mean causation, but discrimination is causal
▪whether an individual would receive the same decision had the individual been
of a different race, sex, age, religion, etc.!
• Knowledge about causal relationships between all attributes should be taken into
consideration
▪ fairness holds when there is no causal relationship from the
protected attribute A to the decision D!
• Need for causal aware methods in discovering and preventing
discrimination in observational data, i.e., data recorded from the
environment with no randomization or other controls
Structural Causal Model
• Describe how causal relationships can be inferred from non-temporal data if one
makes certain assumptions about the underlying process of data generation
• A causal model is a triple ℳ = ⟨𝑼, 𝑽, 𝑭⟩, where
▪ 𝑼 is a set of exogenous (hidden) variables whose values are determined by factors outside the model
▪ 𝑽={𝑋1,⋯,𝑋𝑖,⋯} is a set of endogenous (observed) variables whose values are determined by factors within the model
▪ 𝑭={𝑓1,⋯,𝑓𝑖,⋯} is a set of deterministic functions where each 𝑓𝑖 is a mapping from 𝑼×(𝑽∖𝑋𝑖) to 𝑋𝑖: 𝑥𝑖 =𝑓𝑖 (𝒑𝒂𝑖, 𝒖𝑖)
– where 𝒑𝒂𝑖 is a realization of 𝑋𝑖’s parents in 𝑽, i.e., 𝑷𝒂𝑖 ⊆ 𝑽, and 𝒖𝑖 is a realization of 𝑋𝑖’s parents in 𝑼, i.e., 𝑼𝑖 ⊆ 𝑼
Causal Graph
• Each causal model ℳ is associated with a directed graph 𝒢 = (𝒱, ℰ), where
▪ 𝒱 is the set of nodes representing the variables 𝑼 ∪ 𝑽 in ℳ;
▪ ℰ is the set of edges determined by the structural equations in ℳ: for each 𝑋𝑖, there is an edge pointing from each of its parents in 𝑷𝒂𝑖 ∪ 𝑼𝑖 to it
– Each directed edge represents a potential direct causal relationship
– Absence of a direct edge represents zero direct causal relationship
• Assuming the acyclicity of causality, 𝒢 is a directed acyclic graph (DAG)
• Standard terminology
▪ parent, child, ancestor, descendant, path, directed path
Markovian Model
• A causal model is Markovian if
❶ the causal graph is a DAG, and
❷ all variables in 𝑼 are mutually independent
• Equivalent expression: each node 𝑋 is conditionally independent of its non-descendants given its parents 𝑷𝒂𝑋
• Known as the local Markov condition (e.g., in Bayesian networks), or the causal Markov condition in the context of causal modeling
▪ it echoes the fact that information flows from direct causes to their effects, and every dependence between a node and its non-descendants involves the direct causes!
Conditional Independence
• A node is conditionally independent of its non-descendants given its parents
• A node is conditionally independent of all other nodes in the network given its Markov blanket, i.e., its parents, children and children's parents
A Markovian Model and its Graph
Observed variables: 𝑽 = {𝐼, 𝐻, 𝑊, 𝐸}
Hidden variables: 𝑼 = {𝑈𝐼, 𝑈𝐻, 𝑈𝑊, 𝑈𝐸}
Model (𝑀): 𝑖 = 𝑓𝐼(𝑢𝐼);  ℎ = 𝑓𝐻(𝑖, 𝑢𝐻);  𝑤 = 𝑓𝑊(ℎ, 𝑢𝑊);  𝑒 = 𝑓𝐸(𝑖, ℎ, 𝑤, 𝑢𝐸)
[Graph (𝐺): 𝐼 → 𝐻 → 𝑊, with edges from 𝐼, 𝐻, 𝑊 to 𝐸, and each 𝑈𝑋 pointing into 𝑋]
Causal Graph of Markovian Model
Each node is associated with an observable
conditional probability table (CPT) 𝑃(𝑥𝑖|𝒑𝒂𝑖)
• We can read off from the causal graph all the conditional independence relationships
encoded in the causal model (graph) by using a graphical criterion called d-separation
Definition of d-separation
• A path 𝑞 is said to be blocked by conditioning on a set 𝒁 if
▪ 𝑞 contains a chain 𝑖→𝑚→𝑗 or a fork 𝑖←𝑚→𝑗 such that the middle node 𝑚 is in 𝒁, or
▪ 𝑞 contains a sink (collider) 𝑖→𝑚←𝑗 such that the middle node 𝑚 is not in 𝒁 and no descendant of 𝑚 is in 𝒁
• 𝒁 is said to d-separate 𝑋 and 𝑌 [P. Spirtes, C. Glymour, R. Scheines 2000] if 𝒁 blocks every path from 𝑋 to 𝑌, denoted by (𝑋 ⊥ 𝑌 | 𝑍)𝐺
▪ With the Markov condition, the d-separation criterion in a causal graph G and the conditional independence relations in a dataset D are connected: if we have (𝑋 ⊥ 𝑌 | 𝑍)𝐺, then we must have (𝑋 ⊥ 𝑌 | 𝑍) in the data
d-separation Examples
[Figure: an example graph over nodes 𝑋, 𝑌, 𝑍, 𝑈, 𝑉, 𝑊]
The path from 𝑋 to 𝑌 is blocked by the following d-separation relations:
(𝑋⊥𝑌| 𝑍)𝐺, (𝑋⊥𝑌| 𝑈)𝐺, (𝑋⊥𝑌| 𝑍𝑈)𝐺, (𝑋⊥𝑌| 𝑍𝑊)𝐺, (𝑋⊥𝑌| 𝑈𝑊)𝐺, (𝑋⊥𝑌| 𝑍𝑈𝑊)𝐺, (𝑋⊥𝑌| 𝑉𝑍𝑈𝑊)𝐺
However, we do NOT have (𝑋⊥𝑌| 𝑉𝑍𝑈)𝐺
Causal Model of Predictions
Observed variables: 𝑽 = {A, …, Xi, …, D}; hidden variables: 𝑼
Model (𝑀): a = 𝑓A(𝑝𝑎A, 𝑢A);  xi = 𝑓i(𝑝𝑎𝑖, 𝑢xi), i = 1, …, m;  d = 𝑓D(𝑝𝑎D, 𝑢D)
𝑼A, ⋯, 𝑼𝑖, ⋯, 𝑼D are mutually independent (Markovian assumption)
[Graph (𝐺): A with CPT P(a | 𝑝𝑎A), the features 𝑿, and D with CPT P(d | 𝑝𝑎D)]
Modeling Discrimination as Path Specific Effects
• Direct and indirect discrimination can be captured by the causal effects of A on D transmitted along different paths [L. Zhang et al. 2017] [N. Kilbertus et al. 2017] [R. Nabi, I. Shpitser 2018]
▪ Direct: causal effect along the direct edge 𝜋𝑑 from A to D
▪ Indirect: causal effect along causal paths 𝜋𝑖 from A to D that pass through redlining attributes R
▪ If we observe P(d+ | a+, x) ≠ P(d+ | a−, x) for a value assignment x of a set X d-separating A and D (apart from the direct edge A → D), the difference must be due to the existence of the direct causal effect of A on D
Quantitative Measuring
• 𝜋𝑑-specific effect (Q: D's parents other than A):
SE𝜋𝑑(𝑎+, 𝑎−) = Σq P(d+ | 𝑎+, q) P(q | 𝑎−) − P(d+ | 𝑎−)
• 𝜋𝑖-specific effect:
SE𝜋𝑖(𝑎+, 𝑎−) = Σq [ P(d+ | 𝑎+, q) · Π G ∈ A𝜋𝑖 P(g | 𝑎+, paG∖{A}) · Π H ∈ Ā𝜋𝑖∖{D} P(h | 𝑎−, paH∖{A}) · Π O ∈ V∖ChildA P(o | paO) ] − P(d+ | 𝑎−)
A𝜋𝑖: A's children that lie on paths in 𝜋𝑖; Ā𝜋𝑖: A's children that do not
Causal Paths Example
• A bank makes loan decisions based on the zip codes, races, and income of the applicants
▪ Inadmissible attributes:
– Race: protected attribute
– Zip Code: redlining attribute
▪ Admissible attributes:
– Income: non-protected
▪ Decision: Loan
Causal Paths Fairness Metrics: Example
• SE𝜋𝑑(𝑎+, 𝑎−) = ΣZ,I ( P(d+ | 𝑎+, z, i) − P(d+ | 𝑎−, z, i) ) P(z | 𝑎−) P(i | 𝑎−)
• SE𝜋𝑖(𝑎+, 𝑎−) = ΣZ,I P(d+ | 𝑎−, z, i) ( P(z | 𝑎+) − P(z | 𝑎−) ) P(i | 𝑎−)
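These two formulas can be evaluated numerically for the loan example (A = Race, Z = Zip Code, I = Income, D = Loan). All CPT values below are invented for illustration:

```python
# Numeric sketch of the direct and indirect path-specific effects.
from itertools import product

p_z_given_a = {"a+": {0: 0.7, 1: 0.3}, "a-": {0: 0.2, 1: 0.8}}  # P(z | a)
p_i_given_a = {"a+": {0: 0.6, 1: 0.4}, "a-": {0: 0.5, 1: 0.5}}  # P(i | a)
p_d1 = {  # P(d+ | a, z, i)
    ("a+", 0, 0): 0.10, ("a+", 0, 1): 0.30,
    ("a+", 1, 0): 0.40, ("a+", 1, 1): 0.60,
    ("a-", 0, 0): 0.20, ("a-", 0, 1): 0.45,
    ("a-", 1, 0): 0.55, ("a-", 1, 1): 0.80,
}

# pi_d-specific (direct) effect: flip A on the direct edge only
direct = sum((p_d1[("a+", z, i)] - p_d1[("a-", z, i)])
             * p_z_given_a["a-"][z] * p_i_given_a["a-"][i]
             for z, i in product([0, 1], repeat=2))

# pi_i-specific (indirect) effect: flip A along the redlining path A -> Z -> D only
indirect = sum(p_d1[("a-", z, i)]
               * (p_z_given_a["a+"][z] - p_z_given_a["a-"][z])
               * p_i_given_a["a-"][i]
               for z, i in product([0, 1], repeat=2))

print(round(direct, 4), round(indirect, 4))  # -0.165 -0.175
```

With these invented CPTs both effects are negative, i.e., being in group a+ lowers the chance of a favorable loan decision both directly and through the zip-code path.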
Causal Effect vs. Risk Difference
• The total causal effect of A (changing from 𝑎− to 𝑎+) on D is given by
▪TE(𝑎+, 𝑎−) = P(d+| do(𝑎+)) - P(d+| do(𝑎−))
transmitted along all causal paths from A to D
• Connection with the risk difference: when A has no parents in the causal graph (no confounders of A), intervening coincides with conditioning, so TE(𝑎+, 𝑎−) = P(d+| 𝑎+) − P(d+| 𝑎−)
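A toy illustration of why the connection between the total effect and the risk difference requires a no-confounding caveat: with a confounder U of both A and D, P(d+ | do(a)) and P(d+ | a) differ. All numbers below are invented:

```python
# Sketch: total effect via do() vs. risk difference on a confounded model
# U -> A, U -> D, A -> D.
p_u = {0: 0.5, 1: 0.5}
p_a1_given_u = {0: 0.2, 1: 0.8}          # P(A=a+ | u)
p_d1 = {(0, 0): 0.1, (0, 1): 0.4,        # P(d+ | u, a), keyed by (u, a)
        (1, 0): 0.5, (1, 1): 0.9}

def p_d1_do(a):
    # intervention do(A=a): cut the U -> A edge, average over the prior P(u)
    return sum(p_u[u] * p_d1[(u, a)] for u in (0, 1))

def p_d1_cond(a):
    # observation A=a: average over the posterior P(u | a)
    p_a = sum(p_u[u] * (p_a1_given_u[u] if a else 1 - p_a1_given_u[u])
              for u in (0, 1))
    return sum(p_u[u] * (p_a1_given_u[u] if a else 1 - p_a1_given_u[u]) / p_a
               * p_d1[(u, a)] for u in (0, 1))

te = p_d1_do(1) - p_d1_do(0)     # total causal effect
rd = p_d1_cond(1) - p_d1_cond(0) # risk difference
print(round(te, 3), round(rd, 3))  # 0.35 0.62
```

The observational risk difference (0.62) overstates the causal effect (0.35) because U pushes the same individuals toward both A = a+ and D = d+; without the confounder the two quantities would coincide.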
Total Causal Effect vs. Path-Specific Effect
• For any 𝜋𝑑 and 𝜋𝑖, we don't necessarily have
SE𝜋𝑑(𝑎+, 𝑎−) + SE𝜋𝑖(𝑎+, 𝑎−) = SE𝜋𝑑∪𝜋𝑖(𝑎+, 𝑎−)
• If 𝜋𝑖 contains all causal paths from A to D except 𝜋𝑑, then
TE(𝑎+, 𝑎−) = SE𝜋𝑑(𝑎+, 𝑎−) − SE𝜋𝑖(𝑎−, 𝑎+)
where SE𝜋𝑖(𝑎−, 𝑎+) is the “reverse” 𝜋𝑖-specific effect
Summing-Up
• Observational criteria can help discover discrimination, but are insufficient on their own
▪No conclusive proof of (un-)fairness
• Causal viewpoint can help articulate problems and organize assumptions
▪ Social questions start with measurement
▪ Human scrutiny and expertise are irreplaceable
• What to do with the different flavors of fairness?
▪ constrain the decision function to satisfy a fairness
flavor, or
▪ design interventions to reduce disparities in input variables and outcomes which
would reduce disparities in decisions in the long term [C. Barabas et al. 2018], [J.
Jackson, T. VanderWeele 2018]
End-to-End ML Pipelines
• Research on fairness, accountability, and transparency (FAT) of ML algorithms and their outputs focuses solely on the final steps of the data science lifecycle!
• ML literature generally assumes clean training datasets (no missing, erroneous or duplicate values) and focuses on optimizing fairness metrics during pre-, in- or post-processing
[Diagram: data science lifecycle, with normalization among the preprocessing steps]
Upstream Discrimination Prevention
• Industry practitioners more often turn their attention to the data first [K. Holstein et al
2019]
▪ 65% of survey respondents reported having control over data collection or curation
• Need tools for creating datasets that support fairness upstream, i.e., that address the root cause of statistical bias in the data models are trained on [I. Chen, et al. 2018; B. Nushi et al. 2018]
▪ e.g., tools to diagnose whether a given fairness issue might be addressed by collecting more training data from a particular subpopulation or by better cleaning existing training data
… and to predict how much more data is needed or how much must be repaired
▪ i.e., tools to help actively guide data collection and pre-processing pipelines in order to jointly optimize fairness and accuracy of downstream models
Preliminary Remarks
• Data quality issues have the potential to introduce unintended bias and variability in the data, which can have a crucial impact in high-stakes applications
• The impact of upstream data cleaning on the performance of downstream ML models depends on
▪ the data quality issues and their distributions in the datasets (unknown)
▪ the effectiveness of the data repairing algorithms (also unknown without ground truth)
▪ the internal structure of the ML models (hard to interpret for highly accurate models)
• It is known that the statistical distortion of datasets by cleaning tasks matters when estimating models' accuracy [T. Dasu, J. M. Loh 2012], yet its impact on models' fairness has only recently started to be investigated [S. Schelter 2019] (for missing-value imputation)
Responsible Data Science Pipelines
• Interplay of bias mitigation and data repairing interventions on data fairness and utility for downstream ML models
[Pipeline: Raw Data, carrying data bias and dirty data → Unfairness Mitigation and Data Cleaning → ML Model, evaluated for accuracy and fairness]
▪ Example tools: IBM AI Fairness 360, FairBench, FairTest, Fairness Metrics, Themis, FairPrep
▪ Note: data from ethnic minorities may be noisier than data collected from the majority ethnic group
Questions?
https://www.lepoint.fr/invites-du-point/aurelie-jean-il-ne-faut-pas-reguler-les-algorithmes-mais-les-pratiques-15-09-2019-2335766_420.php