None of the above: A Bayesian account of the detection of...

Post on 03-Jun-2020

15 views 0 download

Transcript of None of the above: A Bayesian account of the detection of...

None of the above: A Bayesian account of the detection

of novel categories

Dan Navarro School of Psychology

University of New South Wales

Charles Kemp School of Psychology

Carnegie Mellon University

compcogscisydney.com/projects.html#noneoftheabove

Dragon Unicorn Dragon Dragon

DragonUnicorn Unicorn Unicorn

Is this a dragon or a unicorn?

Unicycle? Segway? Roomba?

Unicycle? Segway? Roomba?

None of the above… this is the first item from a novel category

air wheels dabs dwarf planets

The “mental dictionary” of categories is extensible… how do we know when to extend it?

Structure of the talk

• Qualitative desiderata, models, a priori predictions

• Experiments with minimal cues

• Exp. 1: people satisfy the desiderata

• Exp. 2: no they don’t

• An absurd number of computational models

• Experiments with similarity structure

• Exp. 3: people integrate similarity & distribution

• Exp. 4: a better version of Exp. 3

• Conclusions

Qualitative desiderata for the discovery of new categories…

(Zabell 2011)

Any sequence of observations is

possible, so I must (a priori) assign

non-zero probability to them

= 2/5 unicorns

= 2/5 unicorns

Same number of unicorns… so my beliefs about P(unicorn) should be the same

Same number of familiar categories so the probability of a new category is the same

= 2 categories

= 2 categories

What prior beliefs must a learner have in order to satisfy those desiderata?

Bayesian category learning models use the “Chinese restaurant process” (CRP)…

P (old k) / nk“Strength” associated with an existing

category is proportional to its frequency

(Anderson 1990, Sanborn, Griffiths & Navarro 2010, etc)

Bayesian category learning models use the “Chinese restaurant process” (CRP)…

P (new) / ✓

P (old k) / nk

There is a fixed strength associated with novelty

(Anderson 1990, Sanborn, Griffiths & Navarro 2010, etc)

… but it’s a special case: the full* solution to the problem is the generalised CRP

* sort of

P (old k) / nk � ↵

P (new) / ✓ +K↵

(Zabell, 2011)

… but it’s a special case: the full* solution to the problem is the generalised CRP

* sort of

P (old k) / nk � ↵ “Strength” of old categories is slightly attenuated relative

to the CRP…

(Zabell, 2011)

… but it’s a special case: the full* solution to the problem is the generalised CRP

* sort of

P (old k) / nk � ↵

P (new) / ✓ +K↵

… because every time a new category appears, P(new) goes up

(Zabell, 2011)

What empirical predictions does the G-CRP make for human novelty

detection?

1: The familiar addition effect

1,1

2,1

3,1Adding examples from familiar

categories should decrease the probability of labelling the

next thing as “novel”

2: The novel addition effect

1,1

1,1,1

1,1,1,1

Adding an example from a novel category should

increase the probability that the next item is also “novel”

3: No effect of transfer

5,1 4,2 3,3

Nothing else about the frequency table matters except the number of exemplars N and

the number of categories K

Experiments…

Experiment 1

Stimuli were just arbitrary alphanumeric labels, to prevent similarity effects

categories

exemplars

Judge the probability that the next item will come from a new category, for every possible frequency

table with 6 or fewer exemplars

Familiar addition effect…

Novel addition effect…

No transfer effect!

Experiment 2

9,1,1,1

Experiment 2

6,2,2,2

Experiment 2

3,3,3,3

Experiment 2

12 objects in 4 categories

Experiment 2

Vary the number of categories from 2 to 10

3,3,3,39,1,1,1

Does this transfer have an effect?

3,3,3,39,1,1,1

The transfer effect exists(it’s small, so you need bigger frequency tables)

Do these results pose a serious theoretical challenge to categorisation

models?

A list of heuristic methods for estimating the probability that the

next object will be novel

They don’t work

Most existing category learning models

(SUSTAIN, simplicity, etc) also fail

What about the Bayesian models?

Expt 1 Expt 2 Expt 3 Expt 4

r = 0

● ● ●●●●● ● ● ●●●● ●●● ●● ●● ●●● ● ●●● ●●

r = 0.21●

●●●

●●●

●●

●●

●●

●●

●●

●●

●●●

●●

r = 0.98● ● ●●●●

●●

●●

●●

●●

●●

r = 0.99●

● ●●●●

●●●

●●

●●

r = 0.96●

● ●●●●

●●

●●

●●

●●

●●

●●

●●

●●

r = 0.98● ● ●●●●

●●●

●●

●●

●●

r = 0.98● ● ●●●

●●

●●

●●

●●

●●

r = 0

●●●●●● ●●●●● ●●●●● ●●● ●●●● ●● ●●● ●●●●

r = 0

●●●●●● ●●●●● ●●●●● ●●● ●●●● ●● ●●● ●●●●

r = 0.99

●●●●●●

●●●●●

●●●●●

●●●

●●●●

●●

●●●

●●

●●

r = 0.99

●●●●●●

●●●●●

●●●●●

●●●

●●●●

●●

●●●

●●

●●

r = 0.97

●●●●●●●●●●●

●●●●●●●●

●●●●

●●

●●●

●●

●●

r = 0.99

●●●●●●

●●●●●

●●●●●

●●●

●●●●

●●

●●●

●●

●●

r = 0.99

●●●●●●

●●●●●

●●●●●

●●

●●●

●●●

●●

●●

r = 0.88●●●●

●●

●●●●●

●●● ●● ●

● ●● ●● ●

r = 0.88●●●●●●

●●●●●●

●●● ●● ●

● ●● ●● ●

r = 0.87●●●

●●

●●●

●●

●●

r = 0.9●●●

●●

●●

●●●

●●

●●

●●

r = 0.94●

●●

●●

r = 0.91●

●●

●●

r = 0.96●●●

●●●

●●

●●

●●●

●●

●●

r = 0.9

● ●●●● ●● ●●●

● ●●●● ●●●●●

●●●

●●

●●

●●

r = 0.58

●●

●●

●●●

● ●●

●●

●●●

●●●

●●

●●

r = 0.74

●●●

● ●●

●● ●●

●● ●●●●●

●●●

●●

r = 0.76

●●●

● ●●

●● ●●

●● ●●●●●

●●●

●●

r = 0.94

●●

●●

●●●

●●

●●●●

r = 0.93

●●

●●●

●●

●●

r = 0.95

●●

●●

●●

●●

●●●

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

uniformC

RP

TTRG−C

RP

H−C

RP

H−TTR

HG−C

RP

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

Human

Mod

el

Exp 1 Exp 2

Model

Good quantitative

fit?

Predicts transfer effect?

x xHuman

Despite being near-universal among Bayesian models of categorisation, the CRP is terrible

Expt 1 Expt 2 Expt 3 Expt 4

r = 0

● ● ●●●●● ● ● ●●●● ●●● ●● ●● ●●● ● ●●● ●●

r = 0.21●

●●●

●●●

●●

●●

●●

●●

●●

●●

●●●

●●

r = 0.98● ● ●●●●

●●

●●

●●

●●

●●

r = 0.99●

● ●●●●

●●●

●●

●●

r = 0.96●

● ●●●●

●●

●●

●●

●●

●●

●●

●●

●●

r = 0.98● ● ●●●●

●●●

●●

●●

●●

r = 0.98● ● ●●●

●●

●●

●●

●●

●●

r = 0

●●●●●● ●●●●● ●●●●● ●●● ●●●● ●● ●●● ●●●●

r = 0

●●●●●● ●●●●● ●●●●● ●●● ●●●● ●● ●●● ●●●●

r = 0.99

●●●●●●

●●●●●

●●●●●

●●●

●●●●

●●

●●●

●●

●●

r = 0.99

●●●●●●

●●●●●

●●●●●

●●●

●●●●

●●

●●●

●●

●●

r = 0.97

●●●●●●●●●●●

●●●●●●●●

●●●●

●●

●●●

●●

●●

r = 0.99

●●●●●●

●●●●●

●●●●●

●●●

●●●●

●●

●●●

●●

●●

r = 0.99

●●●●●●

●●●●●

●●●●●

●●

●●●

●●●

●●

●●

r = 0.88●●●●

●●

●●●●●

●●● ●● ●

● ●● ●● ●

r = 0.88●●●●●●

●●●●●●

●●● ●● ●

● ●● ●● ●

r = 0.87●●●

●●

●●●

●●

●●

r = 0.9●●●

●●

●●

●●●

●●

●●

●●

r = 0.94●

●●

●●

r = 0.91●

●●

●●

r = 0.96●●●

●●●

●●

●●

●●●

●●

●●

r = 0.9

● ●●●● ●● ●●●

● ●●●● ●●●●●

●●●

●●

●●

●●

r = 0.58

●●

●●

●●●

● ●●

●●

●●●

●●●

●●

●●

r = 0.74

●●●

● ●●

●● ●●

●● ●●●●●

●●●

●●

r = 0.76

●●●

● ●●

●● ●●

●● ●●●●●

●●●

●●

r = 0.94

●●

●●

●●●

●●

●●●●

r = 0.93

●●

●●●

●●

●●

r = 0.95

●●

●●

●●

●●

●●●

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

uniformC

RP

TTRG−C

RP

H−C

RP

H−TTR

HG−C

RP

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

Human

Mod

el

Exp 1 Exp 2

Model

Good quantitative

fit?

Predicts transfer effect?

xHuman

The generalised CRP does better, but misses the transfer effect

Generalised CRP

Learner observes exemplars from a subset of the categories

Unknown frequency distribution over many possible categories

(n1, n2, . . . , nK)

(p1, p2, p3, . . .)

Hierarchical generalised CRP

Learner observes exemplars from a subset of the categories

Unknown frequency distribution over many possible categories

Structure of the world that constrains the distribution

↵, ✓

(n1, n2, . . . , nK)

(p1, p2, p3, . . .)

Structure of the world that constrains the distribution

The HG-CRP model learns that this is a world with many low-frequency categories (infers a high ) and expects to see even

more low-frequency categories↵

Structure of the world that constrains the distribution

The HG-CRP model learns that this is a world with very few low-frequency

categories (infers a low ) and does not expect to see more LF categories

Expt 1 Expt 2 Expt 3 Expt 4

r = 0

● ● ●●●●● ● ● ●●●● ●●● ●● ●● ●●● ● ●●● ●●

r = 0.21●

●●●

●●●

●●

●●

●●

●●

●●

●●

●●●

●●

r = 0.98● ● ●●●●

●●

●●

●●

●●

●●

r = 0.99●

● ●●●●

●●●

●●

●●

r = 0.96●

● ●●●●

●●

●●

●●

●●

●●

●●

●●

●●

r = 0.98● ● ●●●●

●●●

●●

●●

●●

r = 0.98● ● ●●●

●●

●●

●●

●●

●●

r = 0

●●●●●● ●●●●● ●●●●● ●●● ●●●● ●● ●●● ●●●●

r = 0

●●●●●● ●●●●● ●●●●● ●●● ●●●● ●● ●●● ●●●●

r = 0.99

●●●●●●

●●●●●

●●●●●

●●●

●●●●

●●

●●●

●●

●●

r = 0.99

●●●●●●

●●●●●

●●●●●

●●●

●●●●

●●

●●●

●●

●●

r = 0.97

●●●●●●●●●●●

●●●●●●●●

●●●●

●●

●●●

●●

●●

r = 0.99

●●●●●●

●●●●●

●●●●●

●●●

●●●●

●●

●●●

●●

●●

r = 0.99

●●●●●●

●●●●●

●●●●●

●●

●●●

●●●

●●

●●

r = 0.88●●●●

●●

●●●●●

●●● ●● ●

● ●● ●● ●

r = 0.88●●●●●●

●●●●●●

●●● ●● ●

● ●● ●● ●

r = 0.87●●●

●●

●●●

●●

●●

r = 0.9●●●

●●

●●

●●●

●●

●●

●●

r = 0.94●

●●

●●

r = 0.91●

●●

●●

r = 0.96●●●

●●●

●●

●●

●●●

●●

●●

r = 0.9

● ●●●● ●● ●●●

● ●●●● ●●●●●

●●●

●●

●●

●●

r = 0.58

●●

●●

●●●

● ●●

●●

●●●

●●●

●●

●●

r = 0.74

●●●

● ●●

●● ●●

●● ●●●●●

●●●

●●

r = 0.76

●●●

● ●●

●● ●●

●● ●●●●●

●●●

●●

r = 0.94

●●

●●

●●●

●●

●●●●

r = 0.93

●●

●●●

●●

●●

r = 0.95

●●

●●

●●

●●

●●●

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

uniformC

RP

TTRG−C

RP

H−C

RP

H−TTR

HG−C

RP

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

Human

Mod

elExp 1 Exp 2

Model

Human

HG-CRP provides a good quantitative fit

●● ●

●●

● ●● ●

●● ●

●●

● ●●

● ●●

●●

●●

●● ●

●●

● ●● ●

●● ●

●●

● ●●

● ●●

●●

●●

●● ●

●●

● ●● ●

●● ●

●●

● ●●

● ●●

●●

●●

●● ●

●●

● ●● ●

●● ●

●●

● ●●

● ●●

●●

●●

●● ●

●●

● ●● ●

●● ●

●●

● ●●

● ●●

●●

●●

●● ●

●●

● ●● ●

●● ●

●●

● ●●

● ●●

●●

●●

CRP Hierarchical CRP

Types Tokens Ratio Hierarchical TTR

Generalized CRP Hierarchical Generalized CRP

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

[11]

1[1

0]2 93 84 75 66

[10]

11 822

633

552

444

9111

6222

5511

4422

3333

8111

142

222

3322

2

7111

1144

1111

3331

1122

2222

6111

111

2222

211

5111

1111

3311

1111

2222

1111

4111

1111

122

2111

111

3111

1111

1122

1111

1111

[11]

1[1

0]2 93 84 75 66

[10]

11 822

633

552

444

9111

6222

5511

4422

3333

8111

142

222

3322

2

7111

1144

1111

3331

1122

2222

6111

111

2222

211

5111

1111

3311

1111

2222

1111

4111

1111

122

2111

111

3111

1111

1122

1111

1111

Frequency Table

Prob

abilit

y of

a N

ew C

ateg

ory

It also captures the transfer effect

Is this actually a categorisation problem?(a.k.a. Do people still do this in a standard task when similarity

information exists?)

Dragon Unicorn

DragonUnicorn

In most categorisation tasks we have similarity information

Dragon Unicorn

DragonUnicorn

???

High similarity target is less likely to be novel

Dragon Unicorn

DragonUnicorn

???

Low similarity target is more likely to be novel

B A ? ?

C B A ? ?

D C B A ? ?

B B A ? ?

B B B A ? ?

C C B A ? ?

Near Far

20 40 60 80 100

11

111

1111

21

31

211

Stimulus Value

Trai

ning

Con

ditio

n

Training items vary on a single continuous stimulus dimension

B A ? ?

C B A ? ?

D C B A ? ?

B B A ? ?

B B B A ? ?

C C B A ? ?

Near Far

20 40 60 80 100

11

111

1111

21

31

211

Stimulus Value

Trai

ning

Con

ditio

n

Near test

Far test

Similarity manipulation:

B A ? ?

C B A ? ?

D C B A ? ?

B B A ? ?

B B B A ? ?

C C B A ? ?

Near Far

20 40 60 80 100

11

111

1111

21

31

211

Stimulus Value

Trai

ning

Con

ditio

nThe training items form a

2,1 frequency table

B A ? ?

C B A ? ?

D C B A ? ?

B B A ? ?

B B B A ? ?

C C B A ? ?

Near Far

20 40 60 80 100

11

111

1111

21

31

211

Stimulus Value

Trai

ning

Con

ditio

n

6 tables x 2 tests = 12 categorisation tasks

Experiment 3

Lots of stimulus sets used:

New Category Nearest Old Category Other Old Category

● ●

● ●

●● ●

● ●

● ●

● ● ●

● ● ●● ●

● ● ●● ●●

0.00

0.25

0.50

0.75

0.00

0.25

0.50

0.75

Near Test Item

Far Test Item

11 21 31 111 211 1111 11 21 31 111 211 1111 11 21 31 111 211 1111Training Condition

Res

pons

e Pr

obab

ility

New Category Nearest Old Category Other Old Category

● ●

● ●

●● ●

● ●

● ●

● ● ●

● ● ●● ●

● ● ●● ●●

0.00

0.25

0.50

0.75

0.00

0.25

0.50

0.75

Near Test Item

Far Test Item

11 21 31 111 211 1111 11 21 31 111 211 1111 11 21 31 111 211 1111Training Condition

Res

pons

e Pr

obab

ility

Near test Far test

P(new)

Similarity effect

New Category Nearest Old Category Other Old Category

● ●

● ●

●● ●

● ●

● ●

● ● ●

● ● ●● ●

● ● ●● ●●

0.00

0.25

0.50

0.75

0.00

0.25

0.50

0.75

Near Test Item

Far Test Item

11 21 31 111 211 1111 11 21 31 111 211 1111 11 21 31 111 211 1111Training Condition

Res

pons

e Pr

obab

ility

New Category Nearest Old Category Other Old Category

● ●

● ●

●● ●

● ●

● ●

● ● ●

● ● ●● ●

● ● ●● ●●

0.00

0.25

0.50

0.75

0.00

0.25

0.50

0.75

Near Test Item

Far Test Item

11 21 31 111 211 1111 11 21 31 111 211 1111 11 21 31 111 211 1111Training Condition

Res

pons

e Pr

obab

ility

Near test Far test

P(new)

Familiar addition

New Category Nearest Old Category Other Old Category

● ●

● ●

●● ●

● ●

● ●

● ● ●

● ● ●● ●

● ● ●● ●●

0.00

0.25

0.50

0.75

0.00

0.25

0.50

0.75

Near Test Item

Far Test Item

11 21 31 111 211 1111 11 21 31 111 211 1111 11 21 31 111 211 1111Training Condition

Res

pons

e Pr

obab

ility

New Category Nearest Old Category Other Old Category

● ●

● ●

●● ●

● ●

● ●

● ● ●

● ● ●● ●

● ● ●● ●●

0.00

0.25

0.50

0.75

0.00

0.25

0.50

0.75

Near Test Item

Far Test Item

11 21 31 111 211 1111 11 21 31 111 211 1111 11 21 31 111 211 1111Training Condition

Res

pons

e Pr

obab

ility

Near test Far test

P(new)

Novel addition?

=

=

Experiment 4

Similarity effect

Near Far

0.00

0.25

0.50

0.75

0.00

0.25

0.50

0.75

0.00

0.25

0.50

0.75

New

Category

Observed

Other O

ld

[1]

[2]

[3]

[4]

1[1]

2[1]

3[1]

11[1

]

21[1

]

[1]

[2]

[3]

[4]

1[1]

2[1]

3[1]

11[1

]

21[1

]

Frequency Table

Prob

abilit

y of

Cat

egor

y

Familiar addition

Near Far

0.00

0.25

0.50

0.75

0.00

0.25

0.50

0.75

0.00

0.25

0.50

0.75

New

Category

Observed

Other O

ld

[1]

[2]

[3]

[4]

1[1]

2[1]

3[1]

11[1

]

21[1

]

[1]

[2]

[3]

[4]

1[1]

2[1]

3[1]

11[1

]

21[1

]

Frequency Table

Prob

abilit

y of

Cat

egor

y

Novel addition*

Near Far

0.0

0.2

0.4

0.6

0.8

0.0

0.2

0.4

0.6

0.8

0.0

0.2

0.4

0.6

0.8

New

Category

Observed

Other O

ld

[1]

1[1]

11[1

]

111[

1]

2[1]

21[1

]

[1]

1[1]

11[1

]

111[

1]

2[1]

21[1

]

Frequency Table

Prob

abilit

y of

Cat

egor

y

* I’m hiding the [1] condition (ask me why!)

Expt 1 Expt 2 Expt 3 Expt 4

r = 0

● ● ●●●●● ● ● ●●●● ●●● ●● ●● ●●● ● ●●● ●●

r = 0.21●

●●●

●●●

●●

●●

●●

●●

●●

●●

●●●

●●

r = 0.98● ● ●●●●

●●

●●

●●

●●

●●

r = 0.99●

● ●●●●

●●●

●●

●●

r = 0.96●

● ●●●●

●●

●●

●●

●●

●●

●●

●●

●●

r = 0.98● ● ●●●●

●●●

●●

●●

●●

r = 0.98● ● ●●●

●●

●●

●●

●●

●●

r = 0

●●●●●● ●●●●● ●●●●● ●●● ●●●● ●● ●●● ●●●●

r = 0

●●●●●● ●●●●● ●●●●● ●●● ●●●● ●● ●●● ●●●●

r = 0.99

●●●●●●

●●●●●

●●●●●

●●●

●●●●

●●

●●●

●●

●●

r = 0.99

●●●●●●

●●●●●

●●●●●

●●●

●●●●

●●

●●●

●●

●●

r = 0.97

●●●●●●●●●●●

●●●●●●●●

●●●●

●●

●●●

●●

●●

r = 0.99

●●●●●●

●●●●●

●●●●●

●●●

●●●●

●●

●●●

●●

●●

r = 0.99

●●●●●●

●●●●●

●●●●●

●●

●●●

●●●

●●

●●

r = 0.88●●●●

●●

●●●●●

●●● ●● ●

● ●● ●● ●

r = 0.88●●●●●●

●●●●●●

●●● ●● ●

● ●● ●● ●

r = 0.87●●●

●●

●●●

●●

●●

r = 0.9●●●

●●

●●

●●●

●●

●●

●●

r = 0.94●

●●

●●

r = 0.91●

●●

●●

r = 0.96●●●

●●●

●●

●●

●●●

●●

●●

r = 0.9

● ●●●● ●● ●●●

● ●●●● ●●●●●

●●●

●●

●●

●●

r = 0.58

●●

●●

●●●

● ●●

●●

●●●

●●●

●●

●●

r = 0.74

●●●

● ●●

●● ●●

●● ●●●●●

●●●

●●

r = 0.76

●●●

● ●●

●● ●●

●● ●●●●●

●●●

●●

r = 0.94

●●

●●

●●●

●●

●●●●

r = 0.93

●●

●●●

●●

●●

r = 0.95

●●

●●

●●

●●

●●●

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

uniformC

RP

TTRG−C

RP

H−C

RP

H−TTR

HG−C

RP

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

Human

Mod

el

Exp 3 Exp 4

x

CRP performs poorly

Parameters estimated from Exp 3 and fixed for Exp 4

Expt 1 Expt 2 Expt 3 Expt 4

r = 0

● ● ●●●●● ● ● ●●●● ●●● ●● ●● ●●● ● ●●● ●●

r = 0.21●

●●●

●●●

●●

●●

●●

●●

●●

●●

●●●

●●

r = 0.98● ● ●●●●

●●

●●

●●

●●

●●

r = 0.99●

● ●●●●

●●●

●●

●●

r = 0.96●

● ●●●●

●●

●●

●●

●●

●●

●●

●●

●●

r = 0.98● ● ●●●●

●●●

●●

●●

●●

r = 0.98● ● ●●●

●●

●●

●●

●●

●●

r = 0

●●●●●● ●●●●● ●●●●● ●●● ●●●● ●● ●●● ●●●●

r = 0

●●●●●● ●●●●● ●●●●● ●●● ●●●● ●● ●●● ●●●●

r = 0.99

●●●●●●

●●●●●

●●●●●

●●●

●●●●

●●

●●●

●●

●●

r = 0.99

●●●●●●

●●●●●

●●●●●

●●●

●●●●

●●

●●●

●●

●●

r = 0.97

●●●●●●●●●●●

●●●●●●●●

●●●●

●●

●●●

●●

●●

r = 0.99

●●●●●●

●●●●●

●●●●●

●●●

●●●●

●●

●●●

●●

●●

r = 0.99

●●●●●●

●●●●●

●●●●●

●●

●●●

●●●

●●

●●

r = 0.88●●●●

●●

●●●●●

●●● ●● ●

● ●● ●● ●

r = 0.88●●●●●●

●●●●●●

●●● ●● ●

● ●● ●● ●

r = 0.87●●●

●●

●●●

●●

●●

r = 0.9●●●

●●

●●

●●●

●●

●●

●●

r = 0.94●

●●

●●

r = 0.91●

●●

●●

r = 0.96●●●

●●●

●●

●●

●●●

●●

●●

r = 0.9

● ●●●● ●● ●●●

● ●●●● ●●●●●

●●●

●●

●●

●●

r = 0.58

●●

●●

●●●

● ●●

●●

●●●

●●●

●●

●●

r = 0.74

●●●

● ●●

●● ●●

●● ●●●●●

●●●

●●

r = 0.76

●●●

● ●●

●● ●●

●● ●●●●●

●●●

●●

r = 0.94

●●

●●

●●●

●●

●●●●

r = 0.93

●●

●●●

●●

●●

r = 0.95

●●

●●

●●

●●

●●●

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

uniformC

RP

TTRG−C

RP

H−C

RP

H−TTR

HG−C

RP

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

Human

Mod

el

Exp 3 Exp 4

x

G-CRP model does slightly better

Parameters estimated from Exp 3 and fixed for Exp 4

Expt 1 Expt 2 Expt 3 Expt 4

r = 0

● ● ●●●●● ● ● ●●●● ●●● ●● ●● ●●● ● ●●● ●●

r = 0.21●

●●●

●●●

●●

●●

●●

●●

●●

●●

●●●

●●

r = 0.98● ● ●●●●

●●

●●

●●

●●

●●

r = 0.99●

● ●●●●

●●●

●●

●●

r = 0.96●

● ●●●●

●●

●●

●●

●●

●●

●●

●●

●●

r = 0.98● ● ●●●●

●●●

●●

●●

●●

r = 0.98● ● ●●●

●●

●●

●●

●●

●●

r = 0

●●●●●● ●●●●● ●●●●● ●●● ●●●● ●● ●●● ●●●●

r = 0

●●●●●● ●●●●● ●●●●● ●●● ●●●● ●● ●●● ●●●●

r = 0.99

●●●●●●

●●●●●

●●●●●

●●●

●●●●

●●

●●●

●●

●●

r = 0.99

●●●●●●

●●●●●

●●●●●

●●●

●●●●

●●

●●●

●●

●●

r = 0.97

●●●●●●●●●●●

●●●●●●●●

●●●●

●●

●●●

●●

●●

r = 0.99

●●●●●●

●●●●●

●●●●●

●●●

●●●●

●●

●●●

●●

●●

r = 0.99

●●●●●●

●●●●●

●●●●●

●●

●●●

●●●

●●

●●

r = 0.88●●●●

●●

●●●●●

●●● ●● ●

● ●● ●● ●

r = 0.88●●●●●●

●●●●●●

●●● ●● ●

● ●● ●● ●

r = 0.87●●●

●●

●●●

●●

●●

r = 0.9●●●

●●

●●

●●●

●●

●●

●●

r = 0.94●

●●

●●

r = 0.91●

●●

●●

r = 0.96●●●

●●●

●●

●●

●●●

●●

●●

r = 0.9

● ●●●● ●● ●●●

● ●●●● ●●●●●

●●●

●●

●●

●●

r = 0.58

●●

●●

●●●

● ●●

●●

●●●

●●●

●●

●●

r = 0.74

●●●

● ●●

●● ●●

●● ●●●●●

●●●

●●

r = 0.76

●●●

● ●●

●● ●●

●● ●●●●●

●●●

●●

r = 0.94

●●

●●

●●●

●●

●●●●

r = 0.93

●●

●●●

●●

●●

r = 0.95

●●

●●

●●

●●

●●●

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00uniform

CR

PTTR

G−C

RP

H−C

RP

H−TTR

HG−C

RP

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

Human

Mod

el

Exp 3 Exp 4

HG-CRP model is easily the best

Parameters estimated from Exp 3 and fixed for Exp 4

Summary

Expt 1 Expt 2 Expt 3 Expt 4

r = 0

● ● ●●●●● ● ● ●●●● ●●● ●● ●● ●●● ● ●●● ●●

r = 0.21●

●●●

●●●

●●

●●

●●

●●

●●

●●

●●●

●●

r = 0.98● ● ●●●●

●●

●●

●●

●●

●●

r = 0.99●

● ●●●●

●●●

●●

●●

r = 0.96●

● ●●●●

●●

●●

●●

●●

●●

●●

●●

●●

r = 0.98● ● ●●●●

●●●

●●

●●

●●

r = 0.98● ● ●●●

●●

●●

●●

●●

●●

r = 0

●●●●●● ●●●●● ●●●●● ●●● ●●●● ●● ●●● ●●●●

r = 0

●●●●●● ●●●●● ●●●●● ●●● ●●●● ●● ●●● ●●●●

r = 0.99

●●●●●●

●●●●●

●●●●●

●●●

●●●●

●●

●●●

●●

●●

r = 0.99

●●●●●●

●●●●●

●●●●●

●●●

●●●●

●●

●●●

●●

●●

r = 0.97

●●●●●●●●●●●

●●●●●●●●

●●●●

●●

●●●

●●

●●

r = 0.99

●●●●●●

●●●●●

●●●●●

●●●

●●●●

●●

●●●

●●

●●

r = 0.99

●●●●●●

●●●●●

●●●●●

●●

●●●

●●●

●●

●●

r = 0.88●●●●

●●

●●●●●

●●● ●● ●

● ●● ●● ●

r = 0.88●●●●●●

●●●●●●

●●● ●● ●

● ●● ●● ●

r = 0.87●●●

●●

●●●

●●

●●

r = 0.9●●●

●●

●●

●●●

●●

●●

●●

r = 0.94●

●●

●●

r = 0.91●

●●

●●

r = 0.96●●●

●●●

●●

●●

●●●

●●

●●

r = 0.9

● ●●●● ●● ●●●

● ●●●● ●●●●●

●●●

●●

●●

●●

r = 0.58

●●

●●

●●●

● ●●

●●

●●●

●●●

●●

●●

r = 0.74

●●●

● ●●

●● ●●

●● ●●●●●

●●●

●●

r = 0.76

●●●

● ●●

●● ●●

●● ●●●●●

●●●

●●

r = 0.94

●●

●●

●●●

●●

●●●●

r = 0.93

●●

●●●

●●

●●

r = 0.95

●●

●●

●●

●●

●●●

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

uniformC

RP

TTRG−C

RP

H−C

RP

H−TTR

HG−C

RP

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

Human

Mod

el

4 experiments, 20+ models, 100+ experimental conditions,

1000+ participants later…

Expt 1 Expt 2 Expt 3 Expt 4

r = 0

● ● ●●●●● ● ● ●●●● ●●● ●● ●● ●●● ● ●●● ●●

r = 0.21●

●●●

●●●

●●

●●

●●

●●

●●

●●

●●●

●●

r = 0.98● ● ●●●●

●●

●●

●●

●●

●●

r = 0.99●

● ●●●●

●●●

●●

●●

r = 0.96●

● ●●●●

●●

●●

●●

●●

●●

●●

●●

●●

r = 0.98● ● ●●●●

●●●

●●

●●

●●

r = 0.98● ● ●●●

●●

●●

●●

●●

●●

r = 0

●●●●●● ●●●●● ●●●●● ●●● ●●●● ●● ●●● ●●●●

r = 0

●●●●●● ●●●●● ●●●●● ●●● ●●●● ●● ●●● ●●●●

r = 0.99

●●●●●●

●●●●●

●●●●●

●●●

●●●●

●●

●●●

●●

●●

r = 0.99

●●●●●●

●●●●●

●●●●●

●●●

●●●●

●●

●●●

●●

●●

r = 0.97

●●●●●●●●●●●

●●●●●●●●

●●●●

●●

●●●

●●

●●

r = 0.99

●●●●●●

●●●●●

●●●●●

●●●

●●●●

●●

●●●

●●

●●

r = 0.99

●●●●●●

●●●●●

●●●●●

●●

●●●

●●●

●●

●●

r = 0.88●●●●

●●

●●●●●

●●● ●● ●

● ●● ●● ●

r = 0.88●●●●●●

●●●●●●

●●● ●● ●

● ●● ●● ●

r = 0.87●●●

●●

●●●

●●

●●

r = 0.9●●●

●●

●●

●●●

●●

●●

●●

r = 0.94●

●●

●●

r = 0.91●

●●

●●

r = 0.96●●●

●●●

●●

●●

●●●

●●

●●

r = 0.9

● ●●●● ●● ●●●

● ●●●● ●●●●●

●●●

●●

●●

●●

r = 0.58

●●

●●

●●●

● ●●

●●

●●●

●●●

●●

●●

r = 0.74

●●●

● ●●

●● ●●

●● ●●●●●

●●●

●●

r = 0.76

●●●

● ●●

●● ●●

●● ●●●●●

●●●

●●

r = 0.94

●●

●●

●●●

●●

●●●●

r = 0.93

●●

●●●

●●

●●

r = 0.95

●●

●●

●●

●●

●●●

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

uniformC

RP

TTRG−C

RP

H−C

RP

H−TTR

HG−C

RP

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

Human

Mod

el

… a model you’ve never heard of is the winner!

A better summary

How do I know this device needs a new label?

How similar is it to familiar things?

How often do I tend to run into novel categories?

How often do I encounter familiar categories?

What does the distribution of objects across categories tell me?

Similarity effects

Novel addition effectsFamiliar addition effects

Transfer effects

A theory of novelty detection needs to accommodate all these things

HG-CRP works because it also learns “what kind of world is this?” when asked

“is this novel?”

Observations

Unknown distribution over many possible categories

Structure of the world

Thanks

Individual differences (E2)

Replication check:

Why does the transfer effect exist?Learning distributional shape on the fly

0.0 0.2 0.4 0.6 0.8 1.0

Discount parameter, alpha

n = (2,2,2,2,2,2)n = (7,1,1,1,1,1)