Engineering Ethics: Practicing Fairness


Transcript of Engineering Ethics: Practicing Fairness

Page 1: Engineering Ethics: Practicing Fairness

Engineering Ethics: Practicing Fairness
Clare Corthell

@clarecorthell [email protected]

Data Science and Machine Learning Consulting

Page 2: Engineering Ethics: Practicing Fairness

gatekeepers of critical life decisions

• getting help with homework

• going to college

• buying a car

• getting a mortgage

• getting sentenced in prison

• getting hired

• keeping a job

Page 3: Engineering Ethics: Practicing Fairness
Page 4: Engineering Ethics: Practicing Fairness

one of our biggest problems? unfairness of prediction.

*Yes, I will somewhat controversially use “prediction” to refer to both predicting values and predicting class labels (classification); many methods and scenarios here do not apply equivalently to both.

Page 5: Engineering Ethics: Practicing Fairness

define fairness

for our technical purposes, we define the subjective societal value of fairness as:

Dwork, et al:

similar people should be treated similarly

dissimilar people should be treated dissimilarly

ex: if two people drive similarly, they should receive similar insurance terms
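
This slide's definition has a precise form in Dwork et al.'s "Fairness Through Awareness," which the example above paraphrases: a classifier M is fair with respect to a task-specific similarity metric d over individuals if outcomes never differ by more than the individuals themselves do (the Lipschitz condition). The notation below follows the paper, with D a distance between outcome distributions:

\[
D\bigl(M(x),\, M(y)\bigr) \;\le\; d(x, y) \qquad \text{for all individuals } x, y
\]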

Page 6: Engineering Ethics: Practicing Fairness

“Powerful algorithms can be harmful and unfair, even when they’re unbiased in a strictly technical sense.”

— Abe Gong, Data Scientist

Page 7: Engineering Ethics: Practicing Fairness

3 examples of unfair outcomes

Page 8: Engineering Ethics: Practicing Fairness

① Character Testing & Disability Discrimination

“Good intent or absence of discriminatory intent does not redeem employment procedures or testing mechanisms that operate as 'built-in headwinds' for minority groups”

— Warren Burger, Chief Justice, Griggs v. Duke Power Company, 1971

It is illegal to hire employees based on:

• intrinsic traits like ethnicity or gender (Equal Employment Opportunity Commission, 1965)

• disability (Americans with Disabilities Act, 1990)

• intelligence quotient or “IQ” (Griggs v. Duke Power Company, 1971)

Page 9: Engineering Ethics: Practicing Fairness

In the US, 60-70% of job candidates currently undergo character testing, which is unregulated outside of the aforementioned laws. These tests screen candidates on factors like “commuting time” and “agreeableness,” raising issues of redlining and disability discrimination. Problematically, there is little proof that this does not constitute a fresh “built-in headwind” for minority groups, and in turn a problem for both employers and employees.

Google’s people operations recently found that characteristics like GPA did not predict whether an employee would perform well. This indicates that even customary industry practices may not be strongly correlated with the ground truth they intend to predict, such as employability, performance, and retention.

Character Testing & Disability Discrimination

Page 10: Engineering Ethics: Practicing Fairness

"Data analytics have the potential to eclipse longstanding civil rights protections in how personal information is used in housing, credit, employment, health, education, and the marketplace”

— White House Report “Big Data: Seizing Opportunities, Preserving Values”

② Insurance Premiums

In the US, banks did not lend within blocks where African-Americans lived, a practice called “redlining,” until it became illegal through the Fair Housing Act of 1968. Standard practices like behavioral segmentation are used to “steer” consumers to less favorable terms based on behavior unrelated to their creditworthiness. These practices are unfair and threaten the principles of the Fair Housing Act.

Page 11: Engineering Ethics: Practicing Fairness

③ Future Startup Founders

A decision tree classifier was trained on a set of (seemingly meritocratic) features, then used to predict who might start a company:

• College Education

• Computer Science major

• Years of experience

• Last position title

• Approximate age

• Work experience in a venture-backed company

Page 12: Engineering Ethics: Practicing Fairness

the “meritocratic” approach does not work because protected characteristics are redundantly encoded

Characteristics like gender, race, or ability are often correlated with a combination of multiple other features.
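
A minimal sketch of how redundant encoding can be checked in practice, using synthetic data and made-up feature names (not from the talk): if a protected attribute can be predicted well from the “neutral” features alone, those features redundantly encode it.

```python
# Sketch: detect redundant encoding of a protected attribute.
# Assumption: synthetic data; in practice use your real feature matrix.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 5_000

# Protected attribute (e.g. gender), never given to the downstream model directly.
protected = rng.integers(0, 2, size=n)

# "Meritocratic" features that happen to correlate with the protected attribute,
# e.g. major, title level, years of experience in a segregated industry.
major = 0.8 * protected + rng.normal(size=n)
title_level = 0.5 * protected + rng.normal(size=n)
years_exp = rng.normal(size=n)
X = np.column_stack([major, title_level, years_exp])

# If this accuracy is well above 0.5, the protected attribute is
# redundantly encoded: dropping the column did not remove the signal.
acc = cross_val_score(LogisticRegression(), X, protected, cv=5).mean()
print(f"protected attribute recoverable with accuracy ~{acc:.2f}")
```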

Page 13: Engineering Ethics: Practicing Fairness

blindness is not the answer

race-blind, need-blind, able-blind, etc.

Page 14: Engineering Ethics: Practicing Fairness

Problems

0. data

1. black box

2. scale

3. impact

Page 15: Engineering Ethics: Practicing Fairness

0. biased data

• data at the scale of people’s past decisions is naturally socially biased, and models will learn that unfairness

• data is dirty and often simply wrong

• data at scale often encodes protected characteristics like race, ability, and health markers

• restricted options, or menu-driven identity mistakes, create worthless or dirty data

• no ground truth to test our assumptions against

• big data is usually not big data for protected classes. Less data for the protected class means bigger error bars and worse predictions (see the sketch after this list)
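
To make that last point concrete, here is a minimal sketch with assumed, synthetic group sizes (not figures from the talk) of how a smaller protected-class sample widens the error bars on any estimated rate, even when the underlying rate is identical.

```python
# Sketch: error bars shrink with sample size, so the smaller (protected) group
# gets a noisier estimate of the same underlying rate.
# Assumption: synthetic group sizes; the true rate is identical for both groups.
import math

true_rate = 0.30          # e.g. a default rate, identical for both groups
n_majority = 100_000      # plenty of data
n_protected = 800         # sparse data for the protected class

def ci_halfwidth(p, n, z=1.96):
    """95% normal-approximation confidence interval half-width for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)

print(f"majority  : ±{ci_halfwidth(true_rate, n_majority):.3f}")
print(f"protected : ±{ci_halfwidth(true_rate, n_protected):.3f}")
# The protected group's interval is ~11x wider here, purely from having less data.
```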

Page 16: Engineering Ethics: Practicing Fairness

1. black box

• many machine learning systems are not inspectable, because of high dimensionality, hidden layer relationships, etc

• there are limits to what data scientists understand about how their models are learning, because they (probably) didn’t build them

• data scientists make choices — hypotheses, premise, training data selection, processing, outlier exclusion, etc.

Page 17: Engineering Ethics: Practicing Fairness

“Our own values and desires influence our choices, from the data we choose to collect to the questions we ask.

Models are opinions embedded in mathematics.”

— Cathy O’Neil, Weapons of Math Destruction

Page 18: Engineering Ethics: Practicing Fairness

2. scale

• modeled decisions are exponentially scalable compared to linear human decisions

• faster

• centralized

Page 19: Engineering Ethics: Practicing Fairness

3. impact

unfair outcomes often result when specific biases in the data are left unexamined, which is especially problematic because:

• no user feedback — people do not have personal interactions with decision-makers or recourse

Page 20: Engineering Ethics: Practicing Fairness

biased data + black box + scale = invisible feedback loops

Page 21: Engineering Ethics: Practicing Fairness

critical decisions are now in the hands of a model and its designer

instead of trained people

often a “data scientist”

Page 22: Engineering Ethics: Practicing Fairness

solutions

Page 23: Engineering Ethics: Practicing Fairness

define fairness

for our technical purposes, we define the subjective societal value of fairness as:

Dwork, et al:

similar people should be treated similarly

dissimilar people should be treated dissimilarly

Page 24: Engineering Ethics: Practicing Fairness

solutions: constructing fairness

• data scientists must construct fairness explicitly (Dwork et al)

• fairness is task-specific, requiring:

• development of context-specific non-blind fairness metrics that utilize protected class attributes (eg gender, race, ability, etc)

• development of a context-specific individual similarity metric that is as close as possible to the ground truth or best approximation (ex: a measure of how well someone drives to test fairness of insurance terms; see the sketch after this list)

• historical context has bearing on impact (ex: until 1968, African-Americans were often denied insurance and loans, which has downstream effects)
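
A minimal sketch of what such a check might look like for the insurance example, in the spirit of Dwork et al. (the driving features, pricing model, metrics, and rescaling constant below are illustrative assumptions, not a prescribed method): compare how differently the model treats pairs of people against how differently they actually drive.

```python
# Sketch: pairwise individual-fairness check in the spirit of Dwork et al.
# Assumptions: the driving features and premium model are synthetic stand-ins;
# the similarity metric d and outcome distance D must be designed per task.
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Task-relevant driving behavior: hard-braking rate, speeding rate, miles/year.
driving = rng.normal(size=(n, 3))

def premium_model(x):
    # Stand-in for the real pricing model being audited.
    return 800 + 120 * x[:, 0] + 90 * x[:, 1] + rng.normal(scale=5.0, size=len(x))

def d_individuals(a, b):
    # Task-specific similarity metric over people, built only from driving behavior.
    return float(np.linalg.norm(a - b))

def D_outcomes(pa, pb, scale=150.0):
    # Distance between outcomes (premiums), rescaled to be comparable to d.
    return abs(pa - pb) / scale

premiums = premium_model(driving)

# Flag pairs whose outcomes differ far more than the individuals themselves do.
violations = [
    (i, j)
    for i in range(n)
    for j in range(i + 1, n)
    if D_outcomes(premiums[i], premiums[j]) > d_individuals(driving[i], driving[j])
]
print(f"pairs violating the similar-treatment condition: {len(violations)}")
```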

Page 25: Engineering Ethics: Practicing Fairness

solutions: tools and design

• inspectability tools to better inspect the whole stack — from training data to preprocessing algorithms to learned models (see the sketch after this list)

• data scientists making critical decisions should validate and check assumptions with others

• better user research: investigate error cases, not just error rates

• better experience design: user outcome feedback systems allow users to help you help them surface and correct bad predictions
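
One widely available inspectability tool is permutation importance; the minimal sketch below (synthetic data and made-up feature names, not from the talk) shows how it reveals which inputs actually drive a trained model, which can surface “neutral” features acting as proxies for protected characteristics.

```python
# Sketch: inspect which features drive a trained model via permutation importance.
# Assumptions: synthetic data and feature names; substitute your own pipeline.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
n = 2_000
feature_names = ["years_experience", "commute_time", "agreeableness_score"]

X = rng.normal(size=(n, len(feature_names)))
# Synthetic label that secretly leans on commute_time (a potential redlining proxy).
y = (0.2 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
for name, mean_imp in sorted(zip(feature_names, result.importances_mean),
                             key=lambda t: -t[1]):
    print(f"{name:22s} {mean_imp:.3f}")
# A surprisingly important "neutral" feature is a cue to ask what it is a proxy for.
```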

Page 26: Engineering Ethics: Practicing Fairness

why be fair?

sticks & carrots

Page 27: Engineering Ethics: Practicing Fairness

why be fair? sticks

• treating people differently based on their innate or protected characteristics is wrong and illegal

• adversarial learning exploits proxy measures, or people will learn how to game the system

• unfair predictions leave money on the table; not lending to someone who is falsely predicted to be a higher risk is a missed opportunity

• being unfair begets bad press and accelerates regulation

• consumers dislike unfair companies, much more than they dislike companies that fail to preserve their privacy

Page 28: Engineering Ethics: Practicing Fairness

why be fair? carrots

• doing good business - there are missed opportunities in not lending to hard-working people, in not funding atypical founders, in not hiring people who think differently and bring new value

• if industry is able to build proof of fair practices prior to regulation, industry might preempt and limit regulation with its own preferred fairness proofs

• we can stop limiting who people can become by intervening in the self-defeating feedback loop

• when we centralize control, it presents a unique opportunity to correct biases

Page 29: Engineering Ethics: Practicing Fairness

a paradigm change is an opportune moment

Page 30: Engineering Ethics: Practicing Fairness

we’re at a special moment when

decisions are being centralized, from distributed groups of people

to central computational decision-making,

which gives us the opportunity and responsibility to correct socially endemic biases

for the benefit of both society and business

Page 31: Engineering Ethics: Practicing Fairness

bottom line —

it is the professional responsibility of every data scientist to ensure fairness in the

interest of both their business and society

Page 32: Engineering Ethics: Practicing Fairness

#EthicalAlgorithms

Data Science Practitioner group in San Francisco, hosted by The Design Guild, with the goal of discussing and actively creating fairness:

• Ethics Peer Reviews

• Forum on Fairness and Privacy in Data Science (talk with Data Scientists, Ethics Consultants, Academics, etc)

• Constructing a Professional Responsibility Manifesto for Data Scientists

Page 33: Engineering Ethics: Practicing Fairness

Thank You

@clarecorthell [email protected]

Data Science and Machine Learning Consulting

Page 34: Engineering Ethics: Practicing Fairness

references

Academic

• “Fairness Through Awareness,” Dwork et al., 2011

• “Algorithmic Transparency via Quantitative Input Influence: Theory and Experiments with Learning Systems,” Datta et al.

Reports

• “Big Data: Seizing Opportunities, Preserving Values,” The White House, 2014

• “Will you care when you pay more? The negative side of targeted promotions,” Tsai, 2015

Books

• Weapons of Math Destruction, Cathy O’Neil, 2016

• Cybertypes: Race, Ethnicity, and Identity on the Internet, Lisa Nakamura, 2002 (defines “menu-driven identities”)

Blog Posts

• “Ethics for powerful algorithms,” Abe Gong, 2016