Discovering Bugs in NLP Models Using Natural …But unlikely in the real world (except for attacks)...

44
Discovering Bugs in NLP Models Using Natural Perturbations Sameer Singh Stanford University May 2 nd , 2019

Transcript of Discovering Bugs in NLP Models Using Natural …But unlikely in the real world (except for attacks)...

Page 1: Discovering Bugs in NLP Models Using Natural …But unlikely in the real world (except for attacks) “panda” 57.7% confidence “gibbon” 99.3% confidence 9 What about text? What

Discovering Bugs in NLP Models Using Natural Perturbations

Sameer Singh

Stanford University May 2nd, 2019

Page 2: Discovering Bugs in NLP Models Using Natural …But unlikely in the real world (except for attacks) “panda” 57.7% confidence “gibbon” 99.3% confidence 9 What about text? What

circa 2005

[adapted from Zadeh 2005, From Search Engines to Question-Answering Systems — The Need for New Tools]

Page 3: Discovering Bugs in NLP Models Using Natural …But unlikely in the real world (except for attacks) “panda” 57.7% confidence “gibbon” 99.3% confidence 9 What about text? What

2019NLP has come a long way!

Page 4: Discovering Bugs in NLP Models Using Natural …But unlikely in the real world (except for attacks) “panda” 57.7% confidence “gibbon” 99.3% confidence 9 What about text? What

But we know models remain brittle…Jia and Liang, EMNLP 2017

Feng et al, EMNLP 2018

Anton van den Hengel, ACL 2018

Page 5: Discovering Bugs in NLP Models Using Natural …But unlikely in the real world (except for attacks) “panda” 57.7% confidence “gibbon” 99.3% confidence 9 What about text? What

How do we discover bugs in NLP?

A software bug is an error, flaw, failure or fault in a computer program or system that causes it to produce an incorrect or unexpected result, or to behave in unintended ways.

Original Data Original PredictionNLP Pipeline

NLP Pipeline Unexpected Prediction!Changed Data

Perturb it in a specific way

Page 6: Discovering Bugs in NLP Models Using Natural …But unlikely in the real world (except for attacks) “panda” 57.7% confidence “gibbon” 99.3% confidence 9 What about text? What

Outline

Changing individual instances

Semantically Equivalent Adversaries

Semantically Implied Adversaries

Universal Adversaries

Changing training data

Link Prediction Adversaries

Page 7: Discovering Bugs in NLP Models Using Natural …But unlikely in the real world (except for attacks) “panda” 57.7% confidence “gibbon” 99.3% confidence 9 What about text? What

Outline

Changing individual instances

Semantically Equivalent Adversaries

Semantically Implied Adversaries

Universal Adversaries

Changing training data

Link Prediction Adversaries

ACL 2018

Page 8: Discovering Bugs in NLP Models Using Natural …But unlikely in the real world (except for attacks) “panda” 57.7% confidence “gibbon” 99.3% confidence 9 What about text? What

Adversarial Examples: Oversensitivity

Find closest example with different prediction

8

x f y

x' f y

Page 9: Discovering Bugs in NLP Models Using Natural …But unlikely in the real world (except for attacks) “panda” 57.7% confidence “gibbon” 99.3% confidence 9 What about text? What

Oversensitivity in images

Adversaries are indistinguishable to humans…

But unlikely in the real world (except for attacks)

“panda”

57.7% confidence

“gibbon”

99.3% confidence

9

Page 10: Discovering Bugs in NLP Models Using Natural …But unlikely in the real world (except for attacks) “panda” 57.7% confidence “gibbon” 99.3% confidence 9 What about text? What

What about text?

What type of road sign is shown?

> STOP.

What type of road sign is shown?

Perceptible by humans, unlikely in real world

What type of road sign is sho wn?

10

Page 11: Discovering Bugs in NLP Models Using Natural …But unlikely in the real world (except for attacks) “panda” 57.7% confidence “gibbon” 99.3% confidence 9 What about text? What

What about text?

What type of road sign is shown?

> STOP.

What type of road sign is shown?

A single word changes too much!

11

Page 12: Discovering Bugs in NLP Models Using Natural …But unlikely in the real world (except for attacks) “panda” 57.7% confidence “gibbon” 99.3% confidence 9 What about text? What

Semantics matter

What type of road sign is shown?

> Do not Enter.

> STOP.

What type of road sign is shown?

Bug, and likely in the real world

12

Page 13: Discovering Bugs in NLP Models Using Natural …But unlikely in the real world (except for attacks) “panda” 57.7% confidence “gibbon” 99.3% confidence 9 What about text? What

Semantics matter

The biggest city on the river Rhine is Cologne, Germany with a population of more than 1,050,000 people.It is the second-longest river in Central and Western Europe (after the Danube), at about 1,230 km (760 mi)

How long is the Rhine?

> More than 1,050,000

> 1230km

How long is the Rhine?

Not all changes are the same: meaning should be same

13

Page 14: Discovering Bugs in NLP Models Using Natural …But unlikely in the real world (except for attacks) “panda” 57.7% confidence “gibbon” 99.3% confidence 9 What about text? What

How do we do this?Semantically-Equivalent Adversary

(SEA)Semantically-Equivalent Adversarial Rules

(SEARs)

color → colour

xBacktranslation

+ Filteringx’ (x, x’)

CommonPatterns

Rules

Page 15: Discovering Bugs in NLP Models Using Natural …But unlikely in the real world (except for attacks) “panda” 57.7% confidence “gibbon” 99.3% confidence 9 What about text? What

SEARs Examples: VisualQA

15

Visual7a-Telling [Zhu et al 2016]

Page 16: Discovering Bugs in NLP Models Using Natural …But unlikely in the real world (except for attacks) “panda” 57.7% confidence “gibbon” 99.3% confidence 9 What about text? What

SEARs Examples: SQuAD

16

BiDAF [Seo et al 2017]

Page 17: Discovering Bugs in NLP Models Using Natural …But unlikely in the real world (except for attacks) “panda” 57.7% confidence “gibbon” 99.3% confidence 9 What about text? What

SEARs Example: Sentiment

fastText [Joulin et al., 2016]

Page 18: Discovering Bugs in NLP Models Using Natural …But unlikely in the real world (except for attacks) “panda” 57.7% confidence “gibbon” 99.3% confidence 9 What about text? What

Semantic Adversaries

18

Semantics matterSEA SEARS

Models are prone to these bugs

SEAs and SEARs help find and fix them

Page 19: Discovering Bugs in NLP Models Using Natural …But unlikely in the real world (except for attacks) “panda” 57.7% confidence “gibbon” 99.3% confidence 9 What about text? What

Outline

Changing individual instances

Semantically Equivalent Adversaries

Semantically Implied Adversaries

Universal Adversaries

Changing training data

Link Prediction Adversaries

ACL 2019

Page 20: Discovering Bugs in NLP Models Using Natural …But unlikely in the real world (except for attacks) “panda” 57.7% confidence “gibbon” 99.3% confidence 9 What about text? What

Consistency in predictions

How many birds? 1

So far, we have considered equivalence, i.e. (x, y) → (x’, y)

Yes

(x, y)

(x’, y’) Is there 1 bird?

Page 21: Discovering Bugs in NLP Models Using Natural …But unlikely in the real world (except for attacks) “panda” 57.7% confidence “gibbon” 99.3% confidence 9 What about text? What

Evaluating Implication Consistency

ValidationData

Modelf

(x, y)ImplicationGeneration

Implications

(x,y), (x’,y’)

Consistency

# y ꓥ y’ correct

# y correct

based on parses,POS, WordNet, etc.

Page 22: Discovering Bugs in NLP Models Using Natural …But unlikely in the real world (except for attacks) “panda” 57.7% confidence “gibbon” 99.3% confidence 9 What about text? What

Visual QA

(x, y): What room is this? bathroom

Logical Equivalence

(x’, y’): Is this a bathroom? Yes

Necessary Condition

(x’, y’): Is there a bathroom in the picture? Yes

Mutual Exclusion

(x’, y’): Is this a kitchen? No

57%

50%

35%

67%

97% are valid!

Page 23: Discovering Bugs in NLP Models Using Natural …But unlikely in the real world (except for attacks) “panda” 57.7% confidence “gibbon” 99.3% confidence 9 What about text? What

Visual QA Results

Model Acc LogEq Mutex Nec Avg Augmentation

SAAA (Kazemi, Elqursh, 2017) 61.5 76.6 42.3 90.2 72.7 94.4

Count (Zhang et al., 2018) 65.2 81.2 42.8 92.0 75.0 94.1

BAN (Kim et al., 2018) 64.5 73.1 50.4 87.3 72.5 95.0

Good at answer w/ numbers, but not questions w/ numberse.g. How many birds? 1 (12%) → Are there 2 birds? yes (<1%)

Page 24: Discovering Bugs in NLP Models Using Natural …But unlikely in the real world (except for attacks) “panda” 57.7% confidence “gibbon” 99.3% confidence 9 What about text? What

SQuAD

SubjWhen did Zhenjin die? 1285→ Who died in 1285? Zhenjin 29%

DobjWhen did Denmark join the EU? 1972→ What did Denmark join in 1972? the EU 10%

Amod When did the Chinese famine begin? 1331→ Which famine began in 1331? Chinese 30%

73%

97% are valid!

PrepWho received a bid in 1915? Edison→ When did Edison receive a bid? 1915 46%

[ Demszky et al. 2018 ]

Page 25: Discovering Bugs in NLP Models Using Natural …But unlikely in the real world (except for attacks) “panda” 57.7% confidence “gibbon” 99.3% confidence 9 What about text? What

SQuAD Results

Model F1 Subj Dobj Amod Prep Avg Augmentation

bidaf (Seo et al., 2017) 77.9 70.6 65.9 75.1 72.4 72.1 79.1

bidaf+e (Peters et al., 2018) 81.3 71.2 69.3 75.8 72.8 72.9 81.2

rnet (Wang et al., 2017) 79.5 68.5 67.0 74.7 70.7 70.9 79.8

Mnem (Hu et al., 2018) 81.5 70.3 68.0 75.8 71.9 72.2 81.5

Bad at questions with Wh-word as direct objecte.g. Who is Moses? (53%) vs Who did Hayk defeat? (12%)

Page 26: Discovering Bugs in NLP Models Using Natural …But unlikely in the real world (except for attacks) “panda” 57.7% confidence “gibbon” 99.3% confidence 9 What about text? What

Implication Adversaries

• We shouldn’t treat each prediction in isolation• Inconsistency leads to poor user experience

• Currently, rule-based system for generating them

• Already promising!• Reveals important bugs in the models

• Even simple data augmentation is promising

Page 27: Discovering Bugs in NLP Models Using Natural …But unlikely in the real world (except for attacks) “panda” 57.7% confidence “gibbon” 99.3% confidence 9 What about text? What

Outline

Changing individual instances

Semantically Equivalent Adversaries

Semantically Implied Adversaries

Universal Adversaries

Changing training data

Link Prediction Adversaries

in preparation

Page 28: Discovering Bugs in NLP Models Using Natural …But unlikely in the real world (except for attacks) “panda” 57.7% confidence “gibbon” 99.3% confidence 9 What about text? What

Universal Adversaries

• Instead of replacement, let’s consider additions

• Are there tokens that make the model misbehave?

b has nothing to do with the task, and is not specific to x

x f y

b+x f y

REDACTED for anonymity period

Page 29: Discovering Bugs in NLP Models Using Natural …But unlikely in the real world (except for attacks) “panda” 57.7% confidence “gibbon” 99.3% confidence 9 What about text? What

How do we do this?

x

b

x yf

x yf

x yf

yf

b REDACTED for anonymity period

Page 30: Discovering Bugs in NLP Models Using Natural …But unlikely in the real world (except for attacks) “panda” 57.7% confidence “gibbon” 99.3% confidence 9 What about text? What

Textual Entailment

Token added ESIM DA DA-Elmo

89.5 89.46 90.88

nobody* 0.0 0.1 0.5

never 0.5 1.0 0.2

sad 1.5 0.5 0.7

scared 1.1 0.7 1.0

championship 0.8 0.1 0.8

Entailment

Token added ESIM DA DA-Elmo

89.5 89.46 90.88

nobody* 0.5 8.5 13.6

sleeps 4.6 14.8 22.3

nothing* 1.7 23.6 14.6

none 5.9 17.5 15.4

sleeping* 6.1 15.8 28.8

Neutral

* Identified as artifacts by [Gururangan et. al. 2018]

Premise: Two dogs are running through a field.

Hypothesis: nobody There are animals outdoors.

Entailment

Contradiction

REDACTED for anonymity period

Page 31: Discovering Bugs in NLP Models Using Natural …But unlikely in the real world (except for attacks) “panda” 57.7% confidence “gibbon” 99.3% confidence 9 What about text? What

Language Modeling (GPTv2 small)

TH PEOPLEMan god dreams Blacks are the worst people in the world.

A few token prefix that makes the model produce racist language

TH PEOPLEMan god dreams Blacks I bought a mac and a soda for $10.00. I’m not a racist, but I’m not saying that blacks should be allowed to buy a soda.

TH PEOPLEMan god dreams Blacks yesterday I’m going to be a fucking black man. I don’t know what to say to that, but fuck you.

TH PEOPLEMan god dreams Blacks my mother says I’m a racist, and she’s right.

Searched so that we even user input after prefix causes problemsWARNING: Strong Language

REDACTED for anonymity period

Page 32: Discovering Bugs in NLP Models Using Natural …But unlikely in the real world (except for attacks) “panda” 57.7% confidence “gibbon” 99.3% confidence 9 What about text? What

Changing Instances

• “Adversarial attacks” for NLP• Semantically Equivalent • Semantic Implications• Universal Tokens

• Useful for identifying different kinds of problems• Not all of them are traditional “bugs”

• General set of approaches that apply for most NLP models

Page 33: Discovering Bugs in NLP Models Using Natural …But unlikely in the real world (except for attacks) “panda” 57.7% confidence “gibbon” 99.3% confidence 9 What about text? What

Outline

Changing individual instances

Semantically Equivalent Adversaries

Semantically Implied Adversaries

Universal Adversaries

Changing training data

Link Prediction Adversaries NAACL 2019

Page 34: Discovering Bugs in NLP Models Using Natural …But unlikely in the real world (except for attacks) “panda” 57.7% confidence “gibbon” 99.3% confidence 9 What about text? What

Different Kind of Model: Link Prediction

Relation Prediction

Entity Prediction

Page 35: Discovering Bugs in NLP Models Using Natural …But unlikely in the real world (except for attacks) “panda” 57.7% confidence “gibbon” 99.3% confidence 9 What about text? What

Knowledge Base Completion

Table from Dettmers, et al. (2018)

ScoringFunction

Page 36: Discovering Bugs in NLP Models Using Natural …But unlikely in the real world (except for attacks) “panda” 57.7% confidence “gibbon” 99.3% confidence 9 What about text? What

Link Prediction Example

Why was this prediction made?

What is this sensitive to?

Depends on the graph structure!

Page 37: Discovering Bugs in NLP Models Using Natural …But unlikely in the real world (except for attacks) “panda” 57.7% confidence “gibbon” 99.3% confidence 9 What about text? What

Link Prediction: Removing a Link

Page 38: Discovering Bugs in NLP Models Using Natural …But unlikely in the real world (except for attacks) “panda” 57.7% confidence “gibbon” 99.3% confidence 9 What about text? What

Link Prediction: Adding a Link

Page 39: Discovering Bugs in NLP Models Using Natural …But unlikely in the real world (except for attacks) “panda” 57.7% confidence “gibbon” 99.3% confidence 9 What about text? What

How do we do it?

argmax(𝑠′,𝑟′)

𝜙 𝑠, 𝑟, 𝑜 − ത𝜙(𝑠, 𝑟, 𝑜)

Original score

Score after retraining

Too many links to search!

Learn a continuous space of links, and search using

gradient descent

Retraining is too expensive!

Taylor approximation,and utilize graph structure

Link to add/remove from the graph

Page 40: Discovering Bugs in NLP Models Using Natural …But unlikely in the real world (except for attacks) “panda” 57.7% confidence “gibbon” 99.3% confidence 9 What about text? What

Adding Links: How sensitive is the model?

0

5

10

15

20

25

30

35

40

45

DistMult ConvE

Yago3 Hits@1 (Adding a fake link)

Original After Change

70

75

80

85

90

95

DistMult ConvE

WordNet Hits@1 (Adding a fake link)

Original After Change

Page 41: Discovering Bugs in NLP Models Using Natural …But unlikely in the real world (except for attacks) “panda” 57.7% confidence “gibbon” 99.3% confidence 9 What about text? What

Removing Links: Cause behind prediction

isMarriedTo(a,c)∧ hasChild(c,b) ⇒ hasChild(a,b)Bug in DistMult and ConvE

playsFor(a,c) ∧ isLocatedIn(c,b) ⇒ wasBornIn(a,b)*

isAffiliatedTo(a,c)∧ isLocatedIn(c,b) ⇒ diedIn(a,b)*

* Identified as rules by [Yang et. al. 2015]

Only in DistMult

hasAdvisor(a,c)∧ graduatedFrom(c,b) ⇒ graduatedFrom(a,b)

influences(a,c)∧ influences(c,b) ⇒ influences(a,b)Only in ConvE

Summarize by rule mining on which edges are used

Page 42: Discovering Bugs in NLP Models Using Natural …But unlikely in the real world (except for attacks) “panda” 57.7% confidence “gibbon” 99.3% confidence 9 What about text? What

Changing the training data

• Sometimes, “bugs” are problems in the training data/pipeline• Embeddings of all kinds, for example

• To find these bugs, you need to change the training data• And efficiently estimate the effect of retraining

• We show how to do that for link prediction

Page 43: Discovering Bugs in NLP Models Using Natural …But unlikely in the real world (except for attacks) “panda” 57.7% confidence “gibbon” 99.3% confidence 9 What about text? What

Outline

Changing individual instances

Semantically Equivalent Adversaries

Semantically Implied Adversaries

Universal Adversaries

Changing training data

Link Prediction Adversaries

Page 44: Discovering Bugs in NLP Models Using Natural …But unlikely in the real world (except for attacks) “panda” 57.7% confidence “gibbon” 99.3% confidence 9 What about text? What

Thanks!

[email protected]

@sameer_

Work with Matt Gardner and me

as part ofThe Allen Institute for Artificial Intelligence

in Irvine, CA

All levels: pre-PhD, PhD interns, postdocs, and research scientists!