Vision and visual neuroscience II - MIT9.520/spring09/Classes/Class22_ts... · 2010. 1. 22. ·...
Transcript of Vision and visual neuroscience II - MIT9.520/spring09/Classes/Class22_ts... · 2010. 1. 22. ·...
Vision and visual neuroscience IIThomas Serre & Tomaso Poggio
McGovern Institute for Brain ResearchDepartment of Brain & Cognitive SciencesMassachusetts Institute of Technology
Past lecture
Problem of visual recognition and visual cortex
Historical background
Neurons and areas in the visual system
Feedforward hierarchical models
Hierarchical anatomical organization
Felleman & van Essen 1991
source: Jim DiCarlo
Object recognition in the visual cortex
Ventral visual stream
source: Jim DiCarlo
Object recognition in the visual cortex
Ventral visual stream
source: Jim DiCarlo
Object recognition in the visual cortex
Hierarchical architecture:
Ventral visual stream
source: Jim DiCarlo
Object recognition in the visual cortex
Hierarchical architecture:Latencies
Ventral visual stream
source: Jim DiCarlo
Object recognition in the visual cortex
Hierarchical architecture:LatenciesAnatomy
Ventral visual stream
source: Jim DiCarlo
Object recognition in the visual cortex
Hierarchical architecture:LatenciesAnatomyFunction
Object recognition in the visual cortex
Hubel & Wiesel 1959, 1962, 1965, 1968
Nobel prize 1981
simplecells
complexcells
Object recognition in the visual cortex
Kobatake & Tanaka 1994
see also Oram & Perrett 1993; Sheinberg & Logothetis 1996; Gallant et al 1996; Riesenhuber & Poggio 1999
gradual increase in complexity of preferred stimulus
Object recognition in the visual cortex
see also Oram & Perrett 1993; Sheinberg & Logothetis 1996; Gallant et al 1996; Riesenhuber & Poggio 1999
Parallel increase in invariance properties (position and scale)
of neuronsKobatake & Tanaka 1994
Rapid recognition: monkey electrophysiology
Hung* Kreiman* Poggio & DiCarlo 2005
Robust invariant readout of category information from small population of neurons
Single spikes after response onset carry most of the information
Thorpe et al ‘96
Rapid recognition: human behavior
Computational considerations
Simple units Complex units
Template matching Gaussian-like tuning
~ “AND”
Riesenhuber & Poggio 1999 (building on Fukushima 1980 and Hubel & Wiesel 1962)
Invariance max-like operation
~”OR”
Animal
vs.
non-animal
Complex cells
Tuning
Simple cells
MAX
Main routes
Bypass routes
PG
Co
rte
x
Ro
str
al S
TS
Prefrontal
Cortex
STP
DP VIP LIP 7a PP FST
PO V3A MT
TPO PGa IPa
V3
V4
PIT TF
TG 36 35
LIP
,VIP
,DP,7
a
V2
,V3
,V4
,MT,M
ST
PIT
, A
IT
AIT
,36
,35
MSTc
}V1
PG
TE
46 8 45 1211,
13
TEa TEm
AIT
V2
V1
dorsal stream
'where' pathway
ventral stream
'what' pathway
MSTp
C1
S1
S2
S3
S2b
C2
classification
units
0.2 - 1.1o
0.4 - 1.6o
0.6 - 2.4o
1.1 - 3.0o
0.9 - 4.4o
1.2 - 3.2
o
o
o
o
o
oo
Model
layers
RF sizes
S4 7o
Num.
units
C2b 7o
C3 7o
10 6
104
107
105
104
107
100
102
103
103
Incre
ase
in
co
mp
lexity (
nu
mb
er
of
su
bu
nits),
RF
siz
e a
nd
in
va
ria
nce
Un
su
pe
rvis
ed
ta
sk-in
de
pe
nd
en
t le
arn
ing
Superv
ised
task-d
ependent le
arn
ing
(Riesenhuber & Poggio 1999 2000; Serre Kouh Cadieu Knoblich Kreiman & Poggio 2005; Serre Oliva & Poggio 2007)
✦V1:•Simple and complex cells tuning properties
(Schiller et al 1976; Hubel & Wiesel 1965; Devalois et al 1982)
•MAX operation in subset of complex cells (Lampl et al 2004)
✦V4:•Tuning for two-bar stimuli (Reynolds Chelazzi
& Desimone 1999)
•MAX operation (Gawne et al 2002)
• Two-spot interaction (Freiwald et al 2005)
• Tuning for boundary conformation (Pasupathy & Connor 2001)
• Tuning for Cartesian and non-Cartesian gratings (Gallant et al 1996)
✦IT:•Tuning and invariance properties (Logothetis
et al 1995)
•Differential role of IT and PFC in categorization (Freedman et al 2001 2002 2003)
•Read out data (Hung Kreiman Poggio & DiCarlo 2005)
•Average effect in IT (Zoccolan Cox & DiCarlo 2005; Zoccolan Kouh Poggio & DiCarlo in press)
✦Human behavior:•Rapid animal categorization (Serre Oliva
Poggio 2007)
This lecture
This lecture
1.Learning a loose hierarchy of image fragmentsThe algorithm
Recognition in the real-world
This lecture
1.Learning a loose hierarchy of image fragmentsThe algorithm
Recognition in the real-world
2.Rapid recognition and feedforward processing:Predicting human performance
“Clutter problem”
This lecture
1.Learning a loose hierarchy of image fragmentsThe algorithm
Recognition in the real-world
2.Rapid recognition and feedforward processing:Predicting human performance
“Clutter problem”
3.Beyond feedforward processing:Top-down cortical feedback and attention to solve the “clutter problem”
Predicting human eye movements
This lecture
1.Learning a loose hierarchy of image fragmentsThe algorithm
Recognition in the real-world
2.Rapid recognition and feedforward processing:Predicting human performance
“Clutter problem”
3.Beyond feedforward processing:Top-down cortical feedback and attention to solve the “clutter problem”
Predicting human eye movements
Gabor filters
Parameters fit to V1 data (Serre & Riesenhuber 2004)
17 spatial frequencies (=scales)
4 orientations
Animalvs.
non-animal
C1S1
S2
S3S2bC2
classif.units
S4
C2b
C3S1 units
Animalvs.
non-animal
C1S1
S2
S3S2bC2
classif.units
S4
C2b
C3C1 units
Increase in tolerance to position (and in RF size)
Local max over pool of S1 cells
C1
S1
Animalvs.
non-animal
C1S1
S2
S3S2bC2
classif.units
S4
C2b
C3C1 units
Increase in tolerance to scale
C1 Local max over pool of S1 cells
Receptive field sizesModel Cortex References
simple cells 0.2o ! 1.1o " 0.1o ! 1.0o [Schiller et al., 1976e;Hubel and Wiesel, 1965]
complex cells 0.4o ! 1.6o " 0.2o ! 2.0o
Peak frequencies (cycles /deg)Model Cortex References
simple cells range: 1.6 ! 9.8 bulk " 1.0 ! 4.0 [DeValois et al., 1982a])mean/med: 3.7/ 2.8 mean: " 2.2
range: " 0.5 ! 8.0complex cells range: 1.8 ! 7.8 bulk " 2.0 ! 5.6
mean/med: 3.9/ 3.2 mean: 3.2range " 0.5 ! 8.0
Frequency bandwidth at 50% amplitude (cycles / deg)Model Cortex References
simple cells range: 1.1 ! 1.8 bulk " 1.0 ! 1.5 [DeValois et al., 1982a]med: " 1.45 med: " 1.45
range " 0.4 ! 2.6complex cells range: 1.5 ! 2.0 bulk " 1.0 ! 2.0
med: 1.6 med: 1.6range " 0.4 ! 2.6
Frequency bandwidth at 71% amplitude (index)Model Cortex References
simple cells range: 44 ! 58 bulk " 40 ! 70 [Schiller et al., 1976d]med: 55
complex cells range 40 ! 50 bulk " 40 ! 60med. 48
Orientation bandwidth at 50% amplitude (octaves)Model Cortex References
simple cells range: 38o ! 49o — [DeValois et al., 1982b]med: 44o
complex cells range: 27o ! 33o bulk " 20o ! 90omed: 43o med: 44o
Orientation bandwidth at 71% amplitude (octaves)Model Cortex References
simple cells range: 27o ! 33o bulk " 20o ! 70o [Schiller et al., 1976c]med: 30o
complex cells range: 27o ! 33o bulk " 20o ! 90omed: 31o
Serre & Riesenhuber 2004
50 0 500
0.2
0.4
0.6
0.8
1
orientation (in degree)
resp
onse
optimal baredgegrating
Animalvs.
non-animal
C1S1
S2
S3S2bC2
classif.units
S4
C2b
C3S2 units
Features of moderate complexity (n~1,000 types)
Combination of V1-like complex units at different orientations
Synaptic weights w learned from natural images
5-10 subunits chosen at random from all possible afferents (~100-1,000)
Animalvs.
non-animal
C1S1
S2
S3S2bC2
classif.units
S4
C2b
C3S2 units
stronger facilitation
stronger suppression
homogenous fields
cross-orientation
fields
Nature Neuroscience - 10, 1313 - 1321 (2007) / Published online: 16 September 2007 | doi:10.1038/nn1975
Neurons in monkey visual area V2 encode combinations of orientationsAkiyuki Anzai, Xinmiao Peng & David C Van Essen
a b c
d e f0 2
0
–1
–1–1
–2–0.5
–0.52–1–
–2
–0.5
–1.0
–1–2
1
2
0 2
0
–1
–2
–2
1
2V2
24 spikes per s
0 1
0
0.5
1.0 V2
11 spikes per s
V2
18 spikes per s
0 0.5
0
0.5
V1
32 spikes per s
0 2
0
1
2 V2
14 spikes per s
0 1
0
1
y (°
)
V2
16 spikes per s
Animalvs.
non-animal
C1S1
S2
S3S2bC2
classif.units
S4
C2b
C3C2 units
Same selectivity as S2 units but increased tolerance to position and size of preferred stimulus
Local pooling over S2 units with same selectivity but slightly different positions and scales
S2 units in V2 and C2 in V4?
Beyond C2 units
Units increasingly complex and invariantS3/C3 units:
Combination of V4-like units with different selectivitiesDictionary of ~1,000 features = num. columns in IT (Fujita 1992)
S4 units:View-tuned units (imprinted with part of the training set, e.g. animal and non-animal images but still unsupervised)Tuning and invariance properties agrees with IT data (Logothetis Pauls & Poggio 1995)
Animalvs.
non-animal
C1S1
S2
S3S2bC2
classif.units
S4
C2b
C3
Idea 1: Built-in invariance to 2D transformations (rotation and scale)
Idea 2: Generic features shared between multiple categories
Overall reduce “sample complexity” and reduces number of training examples needed to learn a task .
So why hierarchies?
Task-specific = categorization circuitsAnimal
vs.
non-animal
Complex cells
Tuning
Simple cells
MAX
Main routes
Bypass routes
Prefrontal
Cortex
V4
PIT
35
PIT
, A
IT
AIT
,36
,35
V1
PG
TE
45 1211,
13
AIT
V2
V1
dorsal stream
'where' pathway
ventral stream
'what' pathway
C1
S1
S2
S3
S2b
C2
classification
units
0.2 - 1.1o
0.4 - 1.6o
0.6 - 2.4o
1.1 - 3.0o
0.9 - 4.4o
1.2 - 3.2
o
o
o
o
o
oo
Model
layers
RF sizes
S4 7o
Num.
units
C2b 7o
C3 7o
10 6
104
107
105
104
107
100
102
103
103
Incre
ase
in
co
mp
lexity (
nu
mb
er
of
su
bu
nits),
RF
siz
e a
nd
in
va
ria
nce
Un
su
pe
rvis
ed
ta
sk-in
de
pe
nd
en
t le
arn
ing
Superv
ised
task-d
ependent le
arn
ing
V1
V2
V4
PIT
AIT
PFC
features of increasing complexity and tolerance to position and scale
view-based object representation but tolerant position, scale and small rotations
Animal
vs.
non-animal
Complex cells
Tuning
Simple cells
MAX
Main routes
Bypass routes
Prefrontal
Cortex
V4
PIT
35
PIT
, A
IT
AIT
,36
,35
V1
PG
TE
45 1211,
13
AIT
V2
V1
dorsal stream
'where' pathway
ventral stream
'what' pathway
C1
S1
S2
S3
S2b
C2
classification
units
0.2 - 1.1o
0.4 - 1.6o
0.6 - 2.4o
1.1 - 3.0o
0.9 - 4.4o
1.2 - 3.2
o
o
o
o
o
oo
Model
layers
RF sizes
S4 7o
Num.
units
C2b 7o
C3 7o
10 6
104
107
105
104
107
100
102
103
103
Incre
ase
in
co
mp
lexity (
nu
mb
er
of
su
bu
nits),
RF
siz
e a
nd
in
va
ria
nce
Un
su
pe
rvis
ed
ta
sk-in
de
pe
nd
en
t le
arn
ing
Superv
ised
task-d
ependent le
arn
ing
V1
V2
V4
PIT
AIT
PFC Evidence for adult plasticity
very likely
likely
limited evidence
supervised learning from a handful of training examples ~ linear perceptron
Animal
vs.
non-animal
Complex cells
Tuning
Simple cells
MAX
Main routes
Bypass routes
Prefrontal
Cortex
V4
PIT
35
PIT
, A
IT
AIT
,36
,35
V1
PG
TE
45 1211,
13
AIT
V2
V1
dorsal stream
'where' pathway
ventral stream
'what' pathway
C1
S1
S2
S3
S2b
C2
classification
units
0.2 - 1.1o
0.4 - 1.6o
0.6 - 2.4o
1.1 - 3.0o
0.9 - 4.4o
1.2 - 3.2
o
o
o
o
o
oo
Model
layers
RF sizes
S4 7o
Num.
units
C2b 7o
C3 7o
10 6
104
107
105
104
107
100
102
103
103
Incre
ase
in
co
mp
lexity (
nu
mb
er
of
su
bu
nits),
RF
siz
e a
nd
in
va
ria
nce
Un
su
pe
rvis
ed
ta
sk-in
de
pe
nd
en
t le
arn
ing
Superv
ised
task-d
ependent le
arn
ing
V1
V2
V4
PIT
AIT
PFC
unsupervised developmental-like learning stage
Columns in the cortex
Layers of the model are organized in columns
Each model unit is equivalent to ~100 IF (~1 column of cortex)
Each hypercolumn contains the same basic dictionary of features and is replicated at all positions and scales
..
..
. .... ...
..
.. .
.. ..
. ...
. ..
. .. .
.
.......
Learning is sequential
Start with layer S2/C2 then S2b/C2b and S3/C3
Pick one unit in layer Sk
Select random set of inputs from retinotopically organized afferents
Sk
Ck-1
w1
w2 w3
Sk
Ck-1
w1
w2 w3
x1xk
xp
x j
x2 x3
y
w=xImprint with random patch of
natural image
Sk
Ck-1
w1
w2 w3
x1xk
xp
x j
x2 x3
y = exp !1
2!2
n
j =1
(wj ! x j )2[ ]y
...
✦ We learn ~1,000 units this way and then move to the next layer✦ Learning follows a long tradition of researchers who have argued that the visual system may be adapted to the statistics of the natural environment (Attneave 1954; Barlow 1961; Atick 1992; Ruderman 1994; Simoncelli & Olshausen 2001)
✦Here we assume the input image moves (shifting and looming) so that the selectivity of the imprinted units gets replicated at all positions and scales
...
.. .
.
Learning invariancesw| T. Masquelier & S. Thorpe
(CNRS, France)
see also (Foldiak 1991; Perrett et al 1984; Wallis & Rolls, 1997; Einhauser et al 2002; Wiskott & Sejnowski 2002; Spratling 2005)
✦ Simple cells learn correlation in space (at the same time)
✦ Complex cells learn correlation in time
movie courtesy of Wolfgang Einhauser
Learning invariancesw| T. Masquelier & S. Thorpe
(CNRS, France)
see also (Foldiak 1991; Perrett et al 1984; Wallis & Rolls, 1997; Einhauser et al 2002; Wiskott & Sejnowski 2002; Spratling 2005)
✦ Simple cells learn correlation in space (at the same time)
✦ Complex cells learn correlation in time
movie courtesy of Wolfgang Einhauser
S1 units
C1 unit
Learning a dictionary of shape-components in the visual cortex
Learning frequent image features during development
Object categories share reusable features
Large redundant vocabulary for implicit geometry
Learning a dictionary of shape-components in the visual cortex
Learning frequent image features during development
Object categories share reusable features
Large redundant vocabulary for implicit geometry
Learning a dictionary of shape-components in the visual cortex
Learning frequent image features during development
Object categories share reusable features
Large redundant vocabulary for implicit geometry
V1
IT
Learning a dictionary of shape-components in the visual cortex
Learning frequent image features during development
Object categories share reusable features
Large redundant vocabulary for implicit geometry
V1
IT
Learning a dictionary of shape-components in visual cortex
“critical” feature columns in IT
(Tanaka, 1996)
Learning a dictionary of shape-components in visual cortex
“critical” feature columns in IT
(Tanaka, 1996)
✦ Pre-attentive processing:• “Loose collection of basic features” (Wolfe & Bennett 1997)
• “Unbound features” (Treisman et al)
Learning a dictionary of shape-components in visual cortex
“critical” feature columns in IT
(Tanaka, 1996)
✦ Pre-attentive processing:• “Loose collection of basic features” (Wolfe & Bennett 1997)
• “Unbound features” (Treisman et al)
✦ Computer vision:• Component-based > holistic representation (Perona
et al 1995, 1996, 2000; Heisele Serre & Poggio 2001, 2002)
• Features of intermediate complexity are optimal (Ullman, 2002)
• Bag of features (Csurka et al 2004; Sivic et al 2005; Sudderth et al 2005)
C2 vs. IT neurons
Model data: Serre Kouh Cadieu Knoblich Kreiman & Poggio 2005 Experimental data: Hung* Kreiman* Poggio & DiCarlo 2005
TRAIN
TEST
3.4ocenter
Size:Position:
3.4ocenter
1.7ocenter
6.8ocenter
3.4o2o horz.
3.4o4o horz.
0
0.2
0.4
0.6
0.8
1
Cla
ssifi
catio
n pe
rform
ance
IT Model
Application to computer vision
Bio-motivated computer vision
Computer vision system based on the
response properties of neurons in the ventral stream of the visual
cortex
Serre Wolf & Poggio 2005; Wolf & Bileschi 2006; Serre et al 2007
Scene parsing and object recognition
Bio-motivated computer vision
Jhuang Serre Wolf & Poggio 2007
Action recognition in video sequences motion-sensitive MT-like units
Bio-motivated computer vision
Jhuang Serre Wolf & Poggio 2007
Action recognition in video sequences motion-sensitive MT-like units
Recognition accuracy
Dollar et al ‘05
model chance
KTH Human 81.3% 91.6% 16.7%
Weiz. Human 86.7% 96.3% 11.1%
UCSD Mice 75.6% 79.0% 20.0%
★ Cross-validation: 2/3 training, 1/3 testing, 10 repeats Jhuang Serre Wolf & Poggio ICCV’07
Automatic recognition of rodent behavior
Serre Jhuang Garrote Poggio Steele in prep
Automatic recognition of rodent behavior
human agreement
72%
proposed system
71%
commercial system
56%
chance 12%
Performance
Serre Jhuang Garrote Poggio Steele in prep
This lecture
1.Learning a loose hierarchy of image fragmentsThe algorithm
Recognition in the real-world
2.Rapid recognition and feedforward processing:Predicting human performance
“Clutter problem”
3.Beyond feedforward processing:Top-down cortical feedback and attention to solve the “clutter problem”
Predicting human eye movements
This lecture
1.Learning a loose hierarchy of image fragmentsThe algorithm
Recognition in the real-world
2.Rapid recognition and feedforward processing:Predicting human performance
“Clutter problem”
3.Beyond feedforward processing:Top-down cortical feedback and attention to solve the “clutter problem”
Predicting human eye movements
Database collected by Torralba & Oliva (2003)
Head Close-body Medium-body Far-body
Animals
Natural
distractors
Artificial
distractors
Head Close-
body
Far-
body
Medium-
body
1.0
1.4
2.6
2.4
1.8
Pe
rfo
rma
nce
(d
')
Model (82% correct)
Serre Oliva Poggio 2007
High performance (~90%) when
maximal amount of information present
in the absence of clutter
Performance decreases (~74%) with increasing amount of clutter
Limitation of feedforward model compatible with decrease in response in V4 (Reynolds
Chelazzi & Desimone 1999) and IT in the presence of clutter (Zoccolan, Cox, DiCarlo, 2005; Zoccolan, Kouh, Poggio, DiCarlo, in sub; Rolls, Aggelopoulos, Zheng, 2003)
“Clutter effect”
Head Close-
body
Far-
body
Medium-
body
1.0
1.4
2.6
2.4
1.8
Perf
orm
ance (
d')
Model (82% correct)
Animal presentor not ?
30 ms ISI
20 ms
Image
Interval Image-Mask
Mask1/f noise
80 ms
(Thorpe et al 1996; Van Rullen & Koch 2003; Bacon-Mace et al 2005)
Same effect for human observers!
Head Close-
body
Far-
body
Medium-
body
1.0
1.4
2.6
2.4
1.8
Pe
rfo
rma
nce
(d
')
Model (82% correct)
Human observers (80% correct)
Serre Oliva Poggio 2007
(n=24)
Image-by-image correlation:
Heads: ρ=0.71
Close-body: ρ=0.84
Medium-body: ρ=0.71
Far-body: ρ=0.60
Model predicts level of performance on rotated images (90 deg and inversion)
Further comparisons
Serre Oliva Poggio 2007
Show matlab demo
This lecture
1.Learning a loose hierarchy of image fragmentsThe algorithm
Recognition in the real-world
2.Rapid recognition and feedforward processing:Predicting human performance
“Clutter problem”
3.Beyond feedforward processing:Top-down cortical feedback and attention to solve the “clutter problem”
Predicting human eye movements
This lecture
1.Learning a loose hierarchy of image fragmentsThe algorithm
Recognition in the real-world
2.Rapid recognition and feedforward processing:Predicting human performance
“Clutter problem”
3.Beyond feedforward processing:Top-down cortical feedback and attention to solve the “clutter problem”
Predicting human eye movements
Spatial attention solves the “clutter problem”
see also Broadbent 1952 1954; Treisman 1960; Treisman & Gelade 1980; Duncan & Desimone 1995; Wolfe, 1997;and many others
Problem: How to know where to attend?
Spatial attention solves the “clutter problem”
see also Broadbent 1952 1954; Treisman 1960; Treisman & Gelade 1980; Duncan & Desimone 1995; Wolfe, 1997;and many others
Problem: How to know where to attend?
foreground
Spatial attention solves the “clutter problem”
see also Broadbent 1952 1954; Treisman 1960; Treisman & Gelade 1980; Duncan & Desimone 1995; Wolfe, 1997;and many others
Problem: How to know where to attend?
foreground
background
Spatial attention solves the “clutter problem”
see also Broadbent 1952 1954; Treisman 1960; Treisman & Gelade 1980; Duncan & Desimone 1995; Wolfe, 1997;and many others
XXXX
Problem: How to know where to attend?
foreground
background
see also Broadbent 1952 1954; Treisman 1960; Treisman & Gelade 1980; Duncan & Desimone 1995; Wolfe, 1997;and many others
Science 22 April 2005:Vol. 308. no. 5721, pp. 529 - 534
Parallel and Serial Neural Mechanisms for Visual Search in Macaque Area V4
Narcisse P. Bichot, Andrew F. Rossi, Robert Desimone
XXXX
Spatial attention solves the “clutter problem”
see also Broadbent 1952 1954; Treisman 1960; Treisman & Gelade 1980; Duncan & Desimone 1995; Wolfe, 1997;and many others
Science 22 April 2005:Vol. 308. no. 5721, pp. 529 - 534
Parallel and Serial Neural Mechanisms for Visual Search in Macaque Area V4
Narcisse P. Bichot, Andrew F. Rossi, Robert Desimone
XXXX
Spatial attention solves the “clutter problem”
Answer: Parallel feature-based attention
XXXX
Parallel feature-based attention modulation
0 100 200 0 100 200
time from fixation (ms)
0norm
aliz
ed s
pike
act
ivity
1
2
the preferred feature was cued (22). Neuronsresponded better to their preferred featurein the RF compared to nonpreferred features(Fig. 3, A and B) (color, P G 0.01; shape, P G0.001). In the key test, we found that responseswere enhanced if the distracter in the RF wasof the neuron’s preferred color and it was alsothe same color (but, by design, not the sameshape) as the color-shape conjunction target(Fig. 3A) (P 0 0.002). In other words, thedistracter shared in the bias for the targetstimulus if it shared one of its features, con-sistent with the predictions of parallel searchmodels. The median enhancement was 8%,with more than 86% of the neurons having alarger response when the RF stimulus shareda feature with the searched-for target (chi-square, P G 0.005). There was also an en-hancement of the response when the shape ofthe distracter matched the shape of the color-shape conjunction target, consistent with paral-lel models, but this enhancement was smallerand developed later than the color-related en-hancement (Fig. 3, A and B). When the RFdistracter was of the preferred feature, shape-
related enhancement was not significant inthe same time interval as that used in thefeature search task, but it became significantÈ150 ms after fixation onset (P 0 0.035).This is consistent with the behavioral evi-dence described above, that the monkey usedthe color feature more than the shape featurein guiding its search to the color-shape con-junction target (fig. S2B). The LFP magni-tude (Fig. 3, C and D) and power were notmodulated by stimulus or cue features in theconjunction task.
There was also significant enhancementof the spike-field coherence in the gammaband when the RF distracter had the neu-ron’s preferred feature and that feature wasin common with the target for either a color(Fig. 3E) (P G 10j5) or shape (Fig. 3F) (P G0.001) match. The enhancement in the lattercase was smaller, again consistent with themonkey’s behavioral bias in favor of usingcolor information. The median enhancementof coherence with a color match was 22%,with 97% of spike-LFP pairs showing an in-crease (chi-square, P G 10j5), and the median
enhancement with a shape match was 17%,with 78% of spike-LFP pairs showing an in-crease (P G 0.002). Thus, the top-down biasin visual search is not limited to cases inwhich the RF stimulus is the search target butinstead applies to any stimulus, even a dis-tracter, that contains a feature relevant to thesearch, consistent with parallel models. It isalso consistent with the results from the featuresearch task, in which we found that enhance-ment occurred for colors that were similar tothe target color. Both results potentially ex-plain why search is often more difficult whenthe distracters share features with the target,as in some forms of conjunction search (8).
Serial selection during search. Finally,although we have emphasized the evidencefor parallel mechanisms in search, the tasknecessarily had a spatial attention (serial)component to it, in that the animals made sev-eral saccades to stimuli in the array whilesearching for the targets. To test for spatial at-tention effects on responses, we compared re-sponses and spike-field synchronization to astimulus in the RF when either it was selectedfor a saccade or the saccade was made to astimulus outside the RF (Fig. 4).
Selecting the RF stimulus for a saccade ledto an enhancement of the neuronal responseacross the population (Fig. 5A) (populationmedian enhancement of 36%, P G 10j5,
RF stimulus istarget of saccade
RF stimulus is nottarget of saccade
SACCADE:
SACCADE:
RF
FIX
vs.
Test for serial (spatial) selection
Fig. 4. Illustration of the saccade enhancementanalysis. We compared neuronal measures whenthe monkey made a saccade to an RF stimulusversus a saccade away from the RF. In this dis-play, fixating the purple cross, for example,brings the green star into the neuron’s RF. Wewould then compare neuronal responses whenthe green star in the RF was the target of thesaccade, to those when the saccade target wasto a stimulus outside the RF, e.g., the orange A.Activity was analyzed from the time the purplecross was fixated to when the next saccadewas initiated.
1
0
0
.15
20
Time from fixation (ms) Time from fixation (ms)
Frequency (Hz) Frequency (Hz)
COLOR EFFECT
-.2
0
.2
A B
C D
E F
Nor
mal
ized
spi
ke a
ctiv
ityN
orm
aliz
ed L
FP
200100 0 200100
SHAPE EFFECT
Spi
ke-f
ield
coh
eren
ce
1006020 10060
Fig. 3. (A to F) Feature-related enhancement of neuronal activity and synchronization duringconjunction search. Conventions are as given in Fig. 2.
R E S E A R C H A R T I C L E S
22 APRIL 2005 VOL 308 SCIENCE www.sciencemag.org532
on
Fe
bru
ary
18
, 2
00
9
ww
w.s
cie
nce
ma
g.o
rgD
ow
nlo
ad
ed
fro
m
2000 100
time from fixation (ms)
0
norm
aliz
ed s
pike
act
ivity
1
2
attend within RF
attend away from RF
Serial spatial attention modulation
XXXX
Attention as Bayesian inference
see also Rao 2005; Lee & Mumford 2003 Chikkerur Serre & Poggio in prep
PFC
IT
V4/PIT
V2
Attention as Bayesian inference
see also Rao 2005; Lee & Mumford 2003 Chikkerur Serre & Poggio in prep
PFC
IT
V4/PIT
V2
feature-basedattention
Attention as Bayesian inference
see also Rao 2005; Lee & Mumford 2003 Chikkerur Serre & Poggio in prep
PFC
IT
V4/PIT
V2
FEF/LIP
spatial attention
feature-basedattention
Attention as Bayesian inference
see also Rao 2005; Lee & Mumford 2003 Chikkerur Serre & Poggio in prep
PFC
IT
V4/PIT
V2
FEF/LIP
spatial attention
feature-basedattention
O
Fi
Fli
I
L
location priors
object priors
N