Vision and visual neuroscience II - MIT9.520/spring09/Classes/Class22_ts... · 2010. 1. 22. ·...

Vision and visual neuroscience II Thomas Serre & Tomaso Poggio McGovern Institute for Brain Research Department of Brain & Cognitive Sciences Massachusetts Institute of Technology


Past lecture

Problem of visual recognition and visual cortex

Historical background

Neurons and areas in the visual system

Feedforward hierarchical models


Hierarchical anatomical organization

Felleman & van Essen 1991

Object recognition in the visual cortex

Ventral visual stream (source: Jim DiCarlo)

Hierarchical architecture:
Latencies
Anatomy
Function


Object recognition in the visual cortex

Hubel & Wiesel 1959, 1962, 1965, 1968

Nobel prize 1981

simple cells

complex cells


Object recognition in the visual cortex

Kobatake & Tanaka 1994

see also Oram & Perrett 1993; Sheinberg & Logothetis 1996; Gallant et al 1996; Riesenhuber & Poggio 1999

gradual increase in complexity of preferred stimulus


Object recognition in the visual cortex

see also Oram & Perrett 1993; Sheinberg & Logothetis 1996; Gallant et al 1996; Riesenhuber & Poggio 1999

Parallel increase in invariance properties (position and scale) of neurons

Kobatake & Tanaka 1994


Rapid recognition: monkey electrophysiology

Hung* Kreiman* Poggio & DiCarlo 2005

Robust invariant readout of category information from small population of neurons

Single spikes after response onset carry most of the information


Thorpe et al ‘96

Rapid recognition: human behavior


Computational considerations

Simple units: template matching, Gaussian-like tuning (~ “AND”)

Complex units: invariance, max-like operation (~ “OR”)

Riesenhuber & Poggio 1999 (building on Fukushima 1980 and Hubel & Wiesel 1962)

[Figure: model architecture. Model layers S1, C1, S2, C2, S2b, C2b, S3, C3, S4 and classification units (animal vs. non-animal) are drawn alongside the cortical areas of the ventral stream ('what' pathway: V1, V2, V4, PIT, AIT, prefrontal cortex) and of the dorsal stream ('where' pathway). Legend: simple cells (tuning), complex cells (MAX), main routes, bypass routes. Columns list RF sizes, from 0.2°-1.1° for S1 up to 7° for S4, C2b and C3, and the number of units per layer (between 10^0 and 10^7). Side labels: increase in complexity (number of subunits), RF size and invariance; unsupervised task-independent learning; supervised task-dependent learning.]

(Riesenhuber & Poggio 1999 2000; Serre Kouh Cadieu Knoblich Kreiman & Poggio 2005; Serre Oliva & Poggio 2007)

✦ V1:
• Simple and complex cells tuning properties (Schiller et al 1976; Hubel & Wiesel 1965; DeValois et al 1982)
• MAX operation in subset of complex cells (Lampl et al 2004)

✦ V4:
• Tuning for two-bar stimuli (Reynolds Chelazzi & Desimone 1999)
• MAX operation (Gawne et al 2002)
• Two-spot interaction (Freiwald et al 2005)
• Tuning for boundary conformation (Pasupathy & Connor 2001)
• Tuning for Cartesian and non-Cartesian gratings (Gallant et al 1996)

✦ IT:
• Tuning and invariance properties (Logothetis et al 1995)
• Differential role of IT and PFC in categorization (Freedman et al 2001 2002 2003)
• Read out data (Hung Kreiman Poggio & DiCarlo 2005)
• Average effect in IT (Zoccolan Cox & DiCarlo 2005; Zoccolan Kouh Poggio & DiCarlo in press)

✦ Human behavior:
• Rapid animal categorization (Serre Oliva Poggio 2007)


This lecture

1. Learning a loose hierarchy of image fragments
The algorithm
Recognition in the real-world

2. Rapid recognition and feedforward processing:
Predicting human performance
“Clutter problem”

3. Beyond feedforward processing:
Top-down cortical feedback and attention to solve the “clutter problem”
Predicting human eye movements


Gabor filters

Parameters fit to V1 data (Serre & Riesenhuber 2004)

17 spatial frequencies (=scales)

4 orientations
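
As a rough illustration of the S1 stage, the sketch below builds a small bank of Gabor filters at 4 orientations using the standard Gabor function (Gaussian envelope times a cosine carrier). The kernel size, wavelength, sigma and aspect ratio `gamma` are illustrative assumptions, not the values fit to V1 data, and only one of the 17 scales is shown.

```python
import math

def gabor_kernel(size, wavelength, sigma, theta, gamma=0.3):
    """Return a size x size Gabor kernel (list of rows); size should be odd.

    All parameter values used below are placeholders for illustration.
    """
    half = size // 2
    kernel = []
    for yi in range(-half, half + 1):
        row = []
        for xi in range(-half, half + 1):
            # Rotate coordinates by the preferred orientation theta.
            x0 = xi * math.cos(theta) + yi * math.sin(theta)
            y0 = -xi * math.sin(theta) + yi * math.cos(theta)
            envelope = math.exp(-(x0 ** 2 + (gamma * y0) ** 2) / (2 * sigma ** 2))
            row.append(envelope * math.cos(2 * math.pi * x0 / wavelength))
        kernel.append(row)
    return kernel

# Four orientations (0, 45, 90, 135 degrees) at a single hypothetical scale.
ORIENTATIONS = [k * math.pi / 4 for k in range(4)]
bank = [gabor_kernel(size=7, wavelength=3.5, sigma=2.8, theta=t)
        for t in ORIENTATIONS]
```

A full S1 layer would repeat this for every scale and convolve each kernel with the image; that step is left out here.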

[Figure: model layer diagram highlighting the S1 units]

C1 units

Increase in tolerance to position (and in RF size)

Local max over pool of S1 cells

C1 units

Increase in tolerance to scale

Local max over pool of S1 cells
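
The pooling described on these two slides can be sketched as a local max over neighbouring positions and over neighbouring scales. This is a minimal sketch under simplifying assumptions: the S1 maps passed in share one orientation and have already been brought to the same size, and the pooling windows do not overlap. The function name `c1_pool` and the toy maps are hypothetical.

```python
def c1_pool(s1_maps, pool=2):
    """Max over a pool x pool neighbourhood and over all maps in s1_maps.

    s1_maps: list of equally sized 2D lists (same orientation, adjacent scales).
    """
    height, width = len(s1_maps[0]), len(s1_maps[0][0])
    pooled = []
    for top in range(0, height - pool + 1, pool):
        row = []
        for left in range(0, width - pool + 1, pool):
            # Taking the max gives tolerance to position (window) and scale (maps).
            row.append(max(
                m[y][x]
                for m in s1_maps
                for y in range(top, top + pool)
                for x in range(left, left + pool)
            ))
        pooled.append(row)
    return pooled

scale_a = [[0.1, 0.9, 0.0, 0.0],
           [0.2, 0.3, 0.0, 0.1],
           [0.0, 0.0, 0.5, 0.4],
           [0.0, 0.0, 0.2, 0.1]]
scale_b = [[0.0, 0.0, 0.7, 0.0],
           [0.0, 0.0, 0.0, 0.0],
           [0.3, 0.0, 0.0, 0.0],
           [0.0, 0.0, 0.0, 0.6]]
c1 = c1_pool([scale_a, scale_b], pool=2)   # [[0.9, 0.7], [0.3, 0.6]]
```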

Comparison of model S1/C1 units with cortical simple/complex cells (Serre & Riesenhuber 2004). Each row gives: Model | Cortex | References.

Receptive field sizes
simple cells: 0.2°-1.1° | ≈ 0.1°-1.0° | [Schiller et al., 1976e; Hubel and Wiesel, 1965]
complex cells: 0.4°-1.6° | ≈ 0.2°-2.0° |

Peak frequencies (cycles/deg)
simple cells: range 1.6-9.8, mean/med 3.7/2.8 | bulk ≈ 1.0-4.0, mean ≈ 2.2, range ≈ 0.5-8.0 | [DeValois et al., 1982a]
complex cells: range 1.8-7.8, mean/med 3.9/3.2 | bulk ≈ 2.0-5.6, mean 3.2, range ≈ 0.5-8.0 |

Frequency bandwidth at 50% amplitude (octaves)
simple cells: range 1.1-1.8, med ≈ 1.45 | bulk ≈ 1.0-1.5, med ≈ 1.45, range ≈ 0.4-2.6 | [DeValois et al., 1982a]
complex cells: range 1.5-2.0, med 1.6 | bulk ≈ 1.0-2.0, med 1.6, range ≈ 0.4-2.6 |

Frequency bandwidth at 71% amplitude (index)
simple cells: range 44-58, med 55 | bulk ≈ 40-70 | [Schiller et al., 1976d]
complex cells: range 40-50, med 48 | bulk ≈ 40-60 |

Orientation bandwidth at 50% amplitude (degrees)
simple cells: range 38°-49°, med 44° | n/a | [DeValois et al., 1982b]
complex cells: range 27°-33°, med 43° | bulk ≈ 20°-90°, med 44° |

Orientation bandwidth at 71% amplitude (degrees)
simple cells: range 27°-33°, med 30° | bulk ≈ 20°-70° | [Schiller et al., 1976c]
complex cells: range 27°-33°, med 31° | bulk ≈ 20°-90° |

[Figure: response (0 to 1) vs. orientation (in degrees) for the optimal bar, edge and grating]


S2 units

Features of moderate complexity (n~1,000 types)

Combination of V1-like complex units at different orientations

Synaptic weights w learned from natural images

5-10 subunits chosen at random from all possible afferents (~100-1,000)
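
The random wiring step can be sketched as drawing a small index set from the pool of afferents. The helper name `sample_subunits`, the pool size of 400 (one value inside the ~100-1,000 range quoted above) and the fixed seed are assumptions made for the example only.

```python
import random

def sample_subunits(n_afferents, rng, min_subunits=5, max_subunits=10):
    """Pick 5-10 distinct afferent indices at random for one S2 unit."""
    n_sub = rng.randint(min_subunits, max_subunits)   # inclusive bounds
    return sorted(rng.sample(range(n_afferents), n_sub))

rng = random.Random(0)            # seeded only to make the example repeatable
subunits = sample_subunits(n_afferents=400, rng=rng)
```

The weights on the selected subunits would then be set from natural-image patches, which this sketch does not cover.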

S2 units

[Figure: example S2 subunit maps with homogeneous fields and cross-orientation fields; legend: stronger facilitation, stronger suppression]


Nature Neuroscience - 10, 1313 - 1321 (2007) / Published online: 16 September 2007 | doi:10.1038/nn1975

Neurons in monkey visual area V2 encode combinations of orientations
Akiyuki Anzai, Xinmiao Peng & David C Van Essen

[Figure: panels a-f, example maps for one V1 neuron (32 spikes per s) and five V2 neurons (24, 11, 18, 14 and 16 spikes per s); axes x and y in degrees]

C2 units

Same selectivity as S2 units but increased tolerance to position and size of preferred stimulus

Local pooling over S2 units with same selectivity but slightly different positions and scales

S2 units in V2 and C2 in V4?

Beyond C2 units

Units increasingly complex and invariant

S3/C3 units:
Combination of V4-like units with different selectivities
Dictionary of ~1,000 features = num. columns in IT (Fujita 1992)

S4 units:
View-tuned units (imprinted with part of the training set, e.g. animal and non-animal images, but still unsupervised)
Tuning and invariance properties agree with IT data (Logothetis Pauls & Poggio 1995)


So why hierarchies?

Idea 1: Built-in invariance to 2D transformations (rotation and scale)

Idea 2: Generic features shared between multiple categories

Overall: reduces “sample complexity”, i.e. the number of training examples needed to learn a task.


[Figure: model architecture diagram, with areas V1, V2, V4, PIT, AIT and PFC labeled]

Features of increasing complexity and tolerance to position and scale

View-based object representation, but tolerant to position, scale and small rotations

Task-specific categorization circuits


Evidence for adult plasticity

[Figure: model architecture diagram, with areas V1, V2, V4, PIT, AIT and PFC marked by strength of evidence: very likely, likely, limited evidence]


supervised learning from a handful of training examples ~ linear perceptron
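
As an illustration of the supervised stage, here is a minimal sketch of the classic perceptron learning rule applied to a handful of labeled feature vectors. The two-dimensional toy features and the labels (+1 for animal, -1 for non-animal) are hypothetical stand-ins for the outputs of the top model layers; the slides do not specify the training procedure beyond "~ linear perceptron".

```python
def train_perceptron(samples, labels, epochs=20, lr=1.0):
    """Classic perceptron rule; labels must be +1 or -1."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:                      # misclassified: update
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def predict(w, b, x):
    """Return +1 or -1 from the sign of the linear score."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Hypothetical feature vectors for four training images.
samples = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.7]]
labels = [1, 1, -1, -1]
w, b = train_perceptron(samples, labels)
predictions = [predict(w, b, x) for x in samples]
```

On linearly separable data such as this toy set the rule converges after a few passes; a regularized linear classifier could be swapped in without changing the overall picture.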

[Figure: model architecture diagram]

unsupervised developmental-like learning stage


Columns in the cortex


Layers of the model are organized in columns

Each model unit is equivalent to ~100 integrate-and-fire (IF) neurons (~1 column of cortex)

Each hypercolumn contains the same basic dictionary of features and is replicated at all positions and scales


Learning is sequential

Start with layer S2/C2 then S2b/C2b and S3/C3

Pick one unit in layer Sk

Select random set of inputs from retinotopically organized afferents

[Figure: one unit in layer S_k with weights w_1, w_2, w_3 on afferents x_1, ..., x_p from layer C_{k-1}, and output y]

Imprint with random patch of natural image: w = x

$$ y = \exp\left[ -\frac{1}{2\sigma^2} \sum_{j=1}^{n} (w_j - x_j)^2 \right] $$
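
The imprinting step and the Gaussian tuning function above can be sketched in a few lines. The function names, the three-element patch and the value of sigma are assumptions chosen for illustration.

```python
import math

def imprint(patch):
    """Store the afferent activity pattern as the weight vector (w = x)."""
    return list(patch)

def s_unit_response(weights, inputs, sigma=1.0):
    """y = exp(-sum_j (w_j - x_j)^2 / (2 sigma^2))."""
    sq_dist = sum((w - x) ** 2 for w, x in zip(weights, inputs))
    return math.exp(-sq_dist / (2 * sigma ** 2))

patch = [0.2, 0.8, 0.5]                 # hypothetical C-layer activities
w = imprint(patch)
same = s_unit_response(w, patch)                 # 1.0 for the stored pattern
other = s_unit_response(w, [0.9, 0.1, 0.5])      # lower for a different pattern
```

The response is maximal when the input matches the imprinted patch and falls off smoothly with distance, which is the template-matching behaviour of the S layers.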

✦ We learn ~1,000 units this way and then move to the next layer

✦ Learning follows a long tradition of researchers who have argued that the visual system may be adapted to the statistics of the natural environment (Attneave 1954; Barlow 1961; Atick 1992; Ruderman 1994; Simoncelli & Olshausen 2001)

✦ Here we assume the input image moves (shifting and looming) so that the selectivity of the imprinted units gets replicated at all positions and scales


Learning invariances

w/ T. Masquelier & S. Thorpe (CNRS, France)

see also (Foldiak 1991; Perrett et al 1984; Wallis & Rolls, 1997; Einhauser et al 2002; Wiskott & Sejnowski 2002; Spratling 2005)

✦ Simple cells learn correlation in space (at the same time)

✦ Complex cells learn correlation in time

movie courtesy of Wolfgang Einhauser
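
One common way to formalize "learning correlation in time" is a trace rule in the spirit of Foldiak (1991), where the weight update is gated by a running average of the unit's recent activity. The sketch below is an assumption about how such a rule could look, not the rule used in the work cited on this slide; `alpha`, `delta` and the three-frame toy sequence are hypothetical.

```python
def trace_learning(frames, weights, alpha=0.1, delta=0.2):
    """Update weights over a frame sequence using a temporal activity trace."""
    trace = 0.0
    w = list(weights)
    for x in frames:
        y = sum(wi * xi for wi, xi in zip(w, x))       # instantaneous response
        trace = (1 - delta) * trace + delta * y        # decaying memory of activity
        # Move weights toward the current input, scaled by the trace.
        w = [wi + alpha * trace * (xi - wi) for wi, xi in zip(w, x)]
    return w

# A feature seen at three successive positions (one active input per frame).
frames = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
learned = trace_learning(frames, weights=[1.0, 0.0, 0.0])
```

Because the trace stays positive after the first frame, the unit picks up weight for the later positions as well, which is the kind of position tolerance a complex cell needs.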

[Figure: S1 units feeding a C1 unit]


Learning a dictionary of shape-components in the visual cortex

Learning frequent image features during development

Object categories share reusable features

Large redundant vocabulary for implicit geometry

[Figure: dictionary of shape components, labeled from V1 to IT]


Learning a dictionary of shape-components in visual cortex

“critical” feature columns in IT (Tanaka, 1996)

✦ Pre-attentive processing:
• “Loose collection of basic features” (Wolfe & Bennett 1997)
• “Unbound features” (Treisman et al)

✦ Computer vision:
• Component-based > holistic representation (Perona et al 1995, 1996, 2000; Heisele Serre & Poggio 2001, 2002)
• Features of intermediate complexity are optimal (Ullman, 2002)
• Bag of features (Csurka et al 2004; Sivic et al 2005; Sudderth et al 2005)


C2 vs. IT neurons

Model data: Serre Kouh Cadieu Knoblich Kreiman & Poggio 2005
Experimental data: Hung* Kreiman* Poggio & DiCarlo 2005

[Figure: classification performance (0 to 1), IT vs. model. TRAIN: size 3.4°, position center. TEST: size 3.4°, center; 1.7°, center; 6.8°, center; 3.4°, 2° horz.; 3.4°, 4° horz.]


Application to computer vision

Bio-motivated computer vision

Computer vision system based on the response properties of neurons in the ventral stream of the visual cortex

Serre Wolf & Poggio 2005; Wolf & Bileschi 2006; Serre et al 2007

Scene parsing and object recognition

Bio-motivated computer vision

Action recognition in video sequences: motion-sensitive MT-like units

Jhuang Serre Wolf & Poggio 2007


Recognition accuracy

Dataset | Dollar et al ‘05 | model | chance
KTH Human | 81.3% | 91.6% | 16.7%
Weiz. Human | 86.7% | 96.3% | 11.1%
UCSD Mice | 75.6% | 79.0% | 20.0%

★ Cross-validation: 2/3 training, 1/3 testing, 10 repeats

Jhuang Serre Wolf & Poggio ICCV’07
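
The evaluation protocol noted above (2/3 training, 1/3 testing, 10 repeats) can be sketched as repeated random splits. The slide does not say how items were grouped (by clip, by subject, and so on), so the sketch simply shuffles item indices; the function name, the seed and the 30-item example are assumptions.

```python
import random

def repeated_splits(n_items, n_repeats=10, train_fraction=2 / 3, seed=0):
    """Yield (train_indices, test_indices) for each random repeat."""
    rng = random.Random(seed)
    n_train = round(n_items * train_fraction)
    for _ in range(n_repeats):
        order = list(range(n_items))
        rng.shuffle(order)
        yield order[:n_train], order[n_train:]

splits = list(repeated_splits(n_items=30))
```

The reported accuracy would then be the average of the test accuracies over the repeats.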

Automatic recognition of rodent behavior

Performance
human agreement: 72%
proposed system: 71%
commercial system: 56%
chance: 12%

Serre Jhuang Garrote Poggio Steele in prep


This lecture

2. Rapid recognition and feedforward processing:
Predicting human performance
“Clutter problem”


Database collected by Torralba & Oliva (2003)

[Figure: example images. Columns: Head, Close-body, Medium-body, Far-body. Rows: Animals, Natural distractors, Artificial distractors]


[Figure: performance (d', axis from 1.0 to 2.6) for the Head, Close-body, Medium-body and Far-body conditions. Model (82% correct)]

Serre Oliva Poggio 2007
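
For readers unfamiliar with the performance measure, d' is the standard signal-detection sensitivity index: the z-transformed hit rate minus the z-transformed false-alarm rate. The sketch below uses hypothetical rates, not values from the study, and requires both rates to lie strictly between 0 and 1.

```python
from statistics import NormalDist

def d_prime(hit_rate, false_alarm_rate):
    """Sensitivity index: z(hit rate) - z(false-alarm rate)."""
    z = NormalDist().inv_cdf          # inverse of the standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)

# Hypothetical example: 85% hits on animal images, 20% false alarms.
value = d_prime(hit_rate=0.85, false_alarm_rate=0.20)
```

Unlike percent correct, d' separates sensitivity from response bias, which is why it is the natural axis for comparing conditions.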

“Clutter effect”

High performance (~90%) when maximal amount of information present in the absence of clutter

Performance decreases (~74%) with increasing amount of clutter

Limitation of feedforward model compatible with decrease in response in V4 (Reynolds Chelazzi & Desimone 1999) and IT in the presence of clutter (Zoccolan, Cox, DiCarlo, 2005; Zoccolan, Kouh, Poggio, DiCarlo, in sub; Rolls, Aggelopoulos, Zheng, 2003)

[Figure: performance (d') for the Head, Close-body, Medium-body and Far-body conditions. Model (82% correct)]


Animal present or not?

Image: 20 ms
Interval image-mask: 30 ms ISI
Mask (1/f noise): 80 ms

(Thorpe et al 1996; Van Rullen & Koch 2003; Bacon-Mace et al 2005)


Same effect for human observers!

[Figure: performance (d', axis from 1.0 to 2.6) for the Head, Close-body, Medium-body and Far-body conditions. Model (82% correct); human observers (80% correct, n=24)]

Serre Oliva Poggio 2007


Further comparisons

Image-by-image correlation:
Heads: ρ=0.71
Close-body: ρ=0.84
Medium-body: ρ=0.71
Far-body: ρ=0.60

Model predicts level of performance on rotated images (90 deg and inversion)

Serre Oliva Poggio 2007
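
An image-by-image comparison of this kind can be sketched as a correlation between per-image model scores and per-image human scores. The slide does not state which correlation coefficient ρ denotes, so the Pearson coefficient below is an assumption, and the score lists are made up for the example.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-image scores (e.g. fraction of "animal" responses).
model_scores = [0.9, 0.4, 0.7, 0.2, 0.6]
human_scores = [0.8, 0.5, 0.9, 0.1, 0.6]
rho = pearson(model_scores, human_scores)
```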


Show matlab demo

Page 67: Vision and visual neuroscience II - MIT9.520/spring09/Classes/Class22_ts... · 2010. 1. 22. · Animal vs. non-animal Complex cells Tuning Simple cells MAX Main routes Bypass routes

This lecture

1. Learning a loose hierarchy of image fragments
   The algorithm
   Recognition in the real world

2. Rapid recognition and feedforward processing
   Predicting human performance
   “Clutter problem”

3. Beyond feedforward processing
   Top-down cortical feedback and attention to solve the “clutter problem”
   Predicting human eye movements



Spatial attention solves the “clutter problem”

see also Broadbent 1952, 1954; Treisman 1960; Treisman & Gelade 1980; Duncan & Desimone 1995; Wolfe 1997; and many others

Problem: How to know where to attend?


[Figure labels: foreground, background]


Answer: Parallel feature-based attention

Bichot, Rossi & Desimone, “Parallel and Serial Neural Mechanisms for Visual Search in Macaque Area V4,” Science, 22 April 2005, Vol. 308, no. 5721, pp. 529-534

Parallel feature-based attention modulation

[Figure: two panels of normalized spike activity (0 to 2) against time from fixation (0 to 200 ms)]


the preferred feature was cued (22). Neurons responded better to their preferred feature in the RF compared to nonpreferred features (Fig. 3, A and B) (color, P < 0.01; shape, P < 0.001). In the key test, we found that responses were enhanced if the distracter in the RF was of the neuron’s preferred color and it was also the same color (but, by design, not the same shape) as the color-shape conjunction target (Fig. 3A) (P = 0.002). In other words, the distracter shared in the bias for the target stimulus if it shared one of its features, consistent with the predictions of parallel search models. The median enhancement was 8%, with more than 86% of the neurons having a larger response when the RF stimulus shared a feature with the searched-for target (chi-square, P < 0.005). There was also an enhancement of the response when the shape of the distracter matched the shape of the color-shape conjunction target, consistent with parallel models, but this enhancement was smaller and developed later than the color-related enhancement (Fig. 3, A and B). When the RF distracter was of the preferred feature, shape-related enhancement was not significant in the same time interval as that used in the feature search task, but it became significant ~150 ms after fixation onset (P = 0.035). This is consistent with the behavioral evidence described above, that the monkey used the color feature more than the shape feature in guiding its search to the color-shape conjunction target (fig. S2B). The LFP magnitude (Fig. 3, C and D) and power were not modulated by stimulus or cue features in the conjunction task.

There was also significant enhancement of the spike-field coherence in the gamma band when the RF distracter had the neuron’s preferred feature and that feature was in common with the target for either a color (Fig. 3E) (P < 10⁻⁵) or shape (Fig. 3F) (P < 0.001) match. The enhancement in the latter case was smaller, again consistent with the monkey’s behavioral bias in favor of using color information. The median enhancement of coherence with a color match was 22%, with 97% of spike-LFP pairs showing an increase (chi-square, P < 10⁻⁵), and the median enhancement with a shape match was 17%, with 78% of spike-LFP pairs showing an increase (P < 0.002). Thus, the top-down bias in visual search is not limited to cases in which the RF stimulus is the search target but instead applies to any stimulus, even a distracter, that contains a feature relevant to the search, consistent with parallel models. It is also consistent with the results from the feature search task, in which we found that enhancement occurred for colors that were similar to the target color. Both results potentially explain why search is often more difficult when the distracters share features with the target, as in some forms of conjunction search (8).

Serial selection during search. Finally, although we have emphasized the evidence for parallel mechanisms in search, the task necessarily had a spatial attention (serial) component to it, in that the animals made several saccades to stimuli in the array while searching for the targets. To test for spatial attention effects on responses, we compared responses and spike-field synchronization to a stimulus in the RF when either it was selected for a saccade or the saccade was made to a stimulus outside the RF (Fig. 4).

Selecting the RF stimulus for a saccade led to an enhancement of the neuronal response across the population (Fig. 5A) (population median enhancement of 36%, P < 10⁻⁵,

Test for serial (spatial) selection

[Figure panels: “RF stimulus is target of saccade” vs. “RF stimulus is not target of saccade”; display labels: RF, FIX, SACCADE]

Fig. 4. Illustration of the saccade enhancement analysis. We compared neuronal measures when the monkey made a saccade to an RF stimulus versus a saccade away from the RF. In this display, fixating the purple cross, for example, brings the green star into the neuron’s RF. We would then compare neuronal responses when the green star in the RF was the target of the saccade, to those when the saccade target was to a stimulus outside the RF, e.g., the orange A. Activity was analyzed from the time the purple cross was fixated to when the next saccade was initiated.

[Figure 3, panels A to F, with panel titles COLOR EFFECT and SHAPE EFFECT: normalized spike activity and normalized LFP against time from fixation (ms); spike-field coherence against frequency (Hz)]

Fig. 3. (A to F) Feature-related enhancement of neuronal activity and synchronization during conjunction search. Conventions are as given in Fig. 2.


Serial spatial attention modulation

[Figure: normalized spike activity (0 to 2) against time from fixation (0 to 200 ms) for two conditions: attend within RF, attend away from RF]
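Comparisons such as attend within RF vs. attend away from RF are typically summarised per neuron as a percent enhancement, then across the population as a median and as the fraction of neurons showing an increase. A minimal sketch with hypothetical firing rates (not data from Bichot et al.):

```python
from statistics import median


def percent_enhancement(attended, unattended):
    """Per-neuron percent change of the attended response relative to the unattended one."""
    return [100.0 * (a - u) / u for a, u in zip(attended, unattended)]


# Hypothetical mean firing rates in spikes/s, one pair per neuron (illustration only).
within_rf = [24.0, 30.0, 18.0, 40.0, 22.0]   # saccade target inside the RF
away_rf = [20.0, 25.0, 20.0, 30.0, 20.0]     # saccade target outside the RF

enhancement = percent_enhancement(within_rf, away_rf)
median_enhancement = median(enhancement)
fraction_increased = sum(e > 0 for e in enhancement) / len(enhancement)

print(f"median enhancement: {median_enhancement:.0f}%")
print(f"fraction of neurons with an increase: {fraction_increased:.0%}")
```

The median is less sensitive than the mean to a few strongly modulated neurons, which is one reason population effects are often reported this way.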


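Attention as Bayesian inference

see also Rao 2005; Lee & Mumford 2003; Chikkerur, Serre & Poggio in prep

[Diagram: areas PFC, IT, V4/PIT, V2 and FEF/LIP shown next to a graphical model with nodes O, F^i, F^i_l, I and L and the label N; annotations: object priors, location priors, feature-based attention, spatial attention]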
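The idea that object priors and location priors act as two forms of attention can be illustrated with a toy discrete model. This is a sketch under strong assumptions, not the model of Chikkerur, Serre & Poggio: it collapses the feature nodes into a single hypothetical likelihood table P(I | O, L), and all numbers are made up.

```python
def normalize(values):
    """Scale non-negative values so that they sum to 1."""
    total = sum(values)
    return [v / total for v in values]


def posterior_location(like, prior_o, prior_l):
    """P(L | I) is proportional to P(L) * sum_o P(o) * P(I | o, L)."""
    return normalize([
        prior_l[l] * sum(prior_o[o] * like[o][l] for o in range(len(prior_o)))
        for l in range(len(prior_l))
    ])


def posterior_object(like, prior_o, prior_l):
    """P(O | I) is proportional to P(O) * sum_l P(l) * P(I | O, l)."""
    return normalize([
        prior_o[o] * sum(prior_l[l] * like[o][l] for l in range(len(prior_l)))
        for o in range(len(prior_o))
    ])


# Hypothetical likelihoods P(I | O=o, L=l): rows are 2 objects, columns are 3 locations.
like = [
    [0.70, 0.10, 0.10],   # object 0 matches the image best at location 0
    [0.10, 0.10, 0.60],   # object 1 matches the image best at location 2
]

# Feature-based attention: a strong object prior ("look for object 1"), flat location prior.
p_loc = posterior_location(like, prior_o=[0.1, 0.9], prior_l=[1 / 3, 1 / 3, 1 / 3])

# Spatial attention: a strong location prior ("attend location 0"), flat object prior.
p_obj = posterior_object(like, prior_o=[0.5, 0.5], prior_l=[0.8, 0.1, 0.1])

print("P(L | I) with object prior:", [round(p, 2) for p in p_loc])
print("P(O | I) with location prior:", [round(p, 2) for p in p_obj])
```

In this toy setting the object prior shifts the location posterior toward where the searched-for object fits best, and the location prior shifts the object posterior toward whatever fits best at the attended location.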