Humans to the Rescue: Troubleshooting AI Systems with Human-in-the-loop

Posted on 22-Jan-2018

Humans to the Rescue: Troubleshooting AI Systems with Human-in-the-loop

Ece Kamar

Senior Researcher, Microsoft Research AI

eckamar@microsoft.com

Exciting Times

AI and the Crowd

training data

accuracy

test data

Power of Data

[Banko&Brill, 2001]

In the Wild

Hybrid Intelligence

Human Intelligence

AI Systems

AI Applied to Critical Domains

Power of the Hybrid

[Courtesy of Murray Campbell]

Troubleshooting of ML Systems

training data

accuracy

test data

query

system

response

execution

data

In the lab

In the wild

What is the performance in the wild?

How does the system fail?

Why does the system fail?

How can the system be improved?

Biases in ML

[Lakkaraju, K., Caruana, Horvitz; AAAI 2017]

Where do Blind Spots Come From?

[Figure: model M with classes "cats" and "dogs"; example prediction: "cat" (conf = 0.96)]

Unknown unknowns: Data points with confident but incorrect predictions.

Blind spots: Regions of the feature space with a high concentration of unknown unknowns.
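
As a minimal sketch of the definition of unknown unknowns, the snippet below flags data points whose prediction is confident but wrong. The function name `find_unknown_unknowns`, the 0.9 confidence cutoff, and the toy data are illustrative assumptions, not part of the talk.

```python
from typing import List, Sequence


def find_unknown_unknowns(
    predictions: Sequence[str],
    confidences: Sequence[float],
    true_labels: Sequence[str],
    threshold: float = 0.9,  # assumed cutoff for what counts as "confident"
) -> List[int]:
    """Return indices of predictions that are confident but incorrect."""
    return [
        i
        for i, (pred, conf, truth) in enumerate(
            zip(predictions, confidences, true_labels)
        )
        if conf >= threshold and pred != truth
    ]


# Toy data (hypothetical): only the first point is both confident and wrong.
preds = ["cat", "dog", "cat", "dog"]
confs = [0.96, 0.55, 0.93, 0.97]
truth = ["dog", "cat", "cat", "dog"]

print(find_unknown_unknowns(preds, confs, truth))  # [0]
```

The second point is also wrong, but its confidence is low, so the model already "knows" it is unsure; it is not an unknown unknown under this cutoff.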

Blind-spots Detection

execution data

Beat the Machine [Attenberg, Ipeirotis, Provost, 2011]

Exploration of Unknown Unknowns [Lakkaraju, K., Caruana, Horvitz; AAAI 2017]

Step 1: Descriptive Space Partitioning

execution data

Step 2: Multi-armed Bandit-based Exploration
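
The two steps above can be sketched roughly as follows. The partitions are assumed to be given already (Step 1), and each partition is treated as one arm of a bandit (Step 2). A standard UCB1 rule stands in for the exploration policy here; it is not the algorithm from the cited paper. The name `explore_partitions`, the oracle `is_unknown_unknown` (a stand-in for asking a human to label a point), and the toy data are all hypothetical.

```python
import math
import random
from typing import Callable, Dict, List


def explore_partitions(
    partitions: Dict[str, List[int]],
    is_unknown_unknown: Callable[[int], bool],
    budget: int,
    seed: int = 0,
) -> Dict[str, List[int]]:
    """Spend `budget` oracle queries across partitions with UCB1.

    Reward is 1 when the queried point is an unknown unknown, else 0.
    Returns the unknown unknowns discovered in each partition.
    """
    rng = random.Random(seed)
    remaining = {name: list(points) for name, points in partitions.items()}
    for points in remaining.values():
        rng.shuffle(points)  # query points within a partition in random order

    pulls = {name: 0 for name in partitions}
    hits = {name: 0 for name in partitions}
    found: Dict[str, List[int]] = {name: [] for name in partitions}

    for t in range(1, budget + 1):
        candidates = [name for name in remaining if remaining[name]]
        if not candidates:
            break  # every partition is exhausted

        def ucb(name: str) -> float:
            if pulls[name] == 0:
                return float("inf")  # try every partition at least once
            mean = hits[name] / pulls[name]
            return mean + math.sqrt(2 * math.log(t) / pulls[name])

        arm = max(candidates, key=ucb)
        point = remaining[arm].pop()
        pulls[arm] += 1
        if is_unknown_unknown(point):
            hits[arm] += 1
            found[arm].append(point)
    return found


# Toy setup (hypothetical): partition "B" hides unknown unknowns, "A" has none.
toy_partitions = {"A": list(range(0, 50)), "B": list(range(50, 100))}
discovered = explore_partitions(
    toy_partitions, lambda p: p >= 50 and p % 2 == 0, budget=40
)
print({name: len(points) for name, points in discovered.items()})
```

The bandit framing matters because each query costs a human label: the policy shifts queries toward partitions that have yielded unknown unknowns so far while still occasionally sampling the others.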

Troubleshooting Complex Systems

Challenge

Possible fixes for each component

Limited development time

Where to invest development time for biggest impact?

Human-assisted troubleshooting methodology

[Figure: pipeline of Component 1, Component 2, and Component 3 connected by intermediate I/O and producing the system output; remaining labels: Evaluation, Failures, Fixes]

[Nushi, K., Kossmann, Horvitz; AAAI 2017]
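
One way to read the question of where to invest development time is to simulate a fix for each component in turn and measure how much the final system output improves. The sketch below is a simplified illustration of that idea, not the methodology from the cited work. Each "fix" is an idealized replacement for a component's behavior (for instance, outputs corrected by a human), and the three-stage numeric pipeline, `run_pipeline`, `rank_fixes`, and the scoring function are all made up for the example.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Stage = Tuple[str, Callable[[Any], Any]]


def run_pipeline(
    stages: List[Stage],
    x: Any,
    overrides: Optional[Dict[str, Callable[[Any], Any]]] = None,
) -> Any:
    """Run x through every stage, swapping in an override where one is given."""
    overrides = overrides or {}
    for name, fn in stages:
        x = overrides.get(name, fn)(x)
    return x


def rank_fixes(
    stages: List[Stage],
    simulated_fixes: Dict[str, Callable[[Any], Any]],
    inputs: List[Any],
    score: Callable[[Any, Any], float],
) -> List[Tuple[str, float]]:
    """Rank components by the quality gain from fixing each one in isolation."""

    def quality(overrides: Dict[str, Callable[[Any], Any]]) -> float:
        total = sum(score(x, run_pipeline(stages, x, overrides)) for x in inputs)
        return total / len(inputs)

    baseline = quality({})
    gains = {
        name: quality({name: fix}) - baseline
        for name, fix in simulated_fixes.items()
    }
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


# Hypothetical pipeline whose intended end-to-end behavior is x -> 2 * x + 1.
stages: List[Stage] = [
    ("component_1", lambda x: x if x < 8 else 0),      # fails on large inputs
    ("component_2", lambda x: x * 2),                  # already correct
    ("component_3", lambda x: x + 1 if x % 4 else x),  # fails on multiples of 4
]
simulated_fixes = {
    "component_1": lambda x: x,
    "component_2": lambda x: x * 2,
    "component_3": lambda x: x + 1,
}

ranking = rank_fixes(
    stages,
    simulated_fixes,
    inputs=list(range(10)),
    score=lambda x, out: 1.0 if out == 2 * x + 1 else 0.0,
)
for name, gain in ranking:
    print(f"{name}: {gain:+.2f}")
```

In this toy case, fixing `component_3` yields the largest gain, so it would be the first place to spend limited development time. Fixing one component at a time ignores interactions between fixes, which a fuller analysis would also need to consider.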

Complex Issues

Fairness

Biases

Transparency

Responsibility

Good vs. Bad

Policy & Law

Complex challenges require collective efforts

No AI is perfect