Understanding conjunction and double feature searches by a saliency map in primary visual cortex



Li Zhaoping, Department of Psychology, University College London, [email protected], www.gatsby.ucl.ac.uk/~zhaoping

Search types illustrated:

• Conjunction search: orientation-color

• Double feature search: orientation-color (target differs from background in both color and orientation)

• Single feature search: orientation

• Single feature search: color

Question: How much easier is a double feature search than the corresponding single feature searches, and how much easier are the single feature searches than the conjunction search? How do they depend on the underlying features?

Observations: Double feature searches are easier than the corresponding single feature searches, which in turn are easier than conjunction searches.

Observations: Motion-orientation and depth-orientation conjunctions are not much more difficult than the single feature searches (Nakayama and Silverman 1986, McLeod et al. 1988), whereas color-orientation conjunction search is more difficult (Treisman and Gelade 1980). The double feature advantage is greater in motion-orientation than in color-orientation (Nothdurft 2000).

Figure panels: V1 model; input to model; model output.

Highlighting important image locations. These locations evoke stronger responses because they have fewer iso-orientation neighbors that suppress them and/or more co-linear neighbors that facilitate them.

V1’s output as a saliency map is viewed under the idealization that top-down feedback to V1 is disabled, e.g., shortly after visual exposure or under anesthesia.

The V1 model is based on V1 physiology and anatomy (e.g., horizontal connections linking cells tuned to similar orientations) and has been tested for consistency with physiological data on contextual influences (e.g., iso-orientation suppression, Knierim and van Essen 1992; co-linear facilitation, Kapadia et al. 1995).

V1 produces a saliency map

Signaling saliency regardless of features: Contrary to common belief, this does not mean that the cells reporting salience must be untuned to specific features. Here, “regardless of” means the following: in this saliency map, the meaning of firing rates for saliency is universal. Given an input scene, the same firing rate from two V1 (output) neurons selective to different features means the same salience value for the two corresponding inputs, even if, say, one cell is color selective and responds to a static red bar while the other is tuned to motion and responds to a moving black dot. Usually, an image item, say a short red bar, evokes responses from many cells with different optimal features and overlapping tuning curves or receptive fields. The actual input features have to be decoded in a complex and feature-specific manner from the population responses. However, locating the cell most responsive to a scene locates the most salient item, whether or not features can be decoded beforehand or simultaneously from the same cell population. It is economical not to use subsequent cell layers (whether feature tuned or not) for a saliency map; the small receptive fields in V1 also mean that this saliency map can have a higher resolution. For more details, see “A saliency map in primary visual cortex” in Trends in Cognitive Sciences, Vol. 6, No. 1, January 2002, p. 9-16.

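$z = (S - \bar{S})/\sigma$: the z score, measuring the saliencies of items, where $\bar{S}$ is the mean and $\sigma$ the standard deviation of the V1 responses $S$ over the whole image.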

Figure panels: original input; V1 processing; V1 response S; histogram of all responses S regardless of features.

Example responses marked in the figure: S = 0.2, z = 1.0; S = 0.4, z = 7; S = 0.12, z = -1.3; S = 0.22, z = 1.7.

Saliency of an item is assumed to increase with its evoked V1 response. We assume that the efficiency of a visual search task increases with the salience of the target (or of its most salient part, e.g., the horizontal bar in the target cross above). The high z score of the horizontal bar, z = 7, is a measure of the cross's salience and enables the cross to pop out, since the V1 response evoked by the horizontal bar is much higher than the average population response over the whole image. The cross has a unique feature, the horizontal bar, which evokes the highest response since it experiences no iso-orientation suppression, whereas all distractors do. Hence, intra-cortical interaction is a neural basis for why feature searches are often efficient.
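The z score defined above can be illustrated with a minimal sketch. The response values below are hypothetical (they are not the model outputs shown in the figure), and the function name `z_scores` is introduced here only for illustration. All responses are pooled into one population regardless of which feature the responding cell is tuned to.

```python
import statistics


def z_scores(responses):
    """Return (S - mean) / std for every response S in the pooled population."""
    mean = statistics.fmean(responses)
    sigma = statistics.pstdev(responses)
    if sigma == 0:
        raise ValueError("all responses are identical; z score is undefined")
    return [(s - mean) / sigma for s in responses]


# Hypothetical responses: many suppressed distractor bars and one bar
# (the last entry) that escapes iso-orientation suppression.
responses = [0.15] * 30 + [0.2] * 10 + [0.4]
z = z_scores(responses)

# The most salient item is simply the one evoking the highest response.
most_salient = max(range(len(responses)), key=lambda i: responses[i])
print(most_salient, round(z[most_salient], 2))
```

Note that locating the most salient item only requires finding the maximum response; no feature decoding is involved.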

The V1 saliency map agrees with visual search behavior.

Table columns: input image; model output; target and its z score; comments. The images are omitted here, so each row lists the target z score followed by the comment.

z = 0.8: target lacking a feature.

z = -0.9: conjunction search.

z = 0.22: distractors irregularly placed.

z = 0.25: distractors dissimilar to each other.

z = 3.4: homogeneous background, identical distractors regularly placed.

z = -0.63 (item next to target: z = 0.68): distractors irregularly placed.

z = -0.83 (item next to target: z = 3.7): homogeneous background, identical distractors regularly placed.

Search becomes easier in homogeneous backgrounds, since z increases with decreasing σ. This is so even when a target has a negative z score, because the items next to the target become more salient in a homogeneous background, attracting attention.
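The dependence of z on σ can be checked numerically. The sketch below uses hypothetical response values and a helper name, `target_z`, chosen for illustration: the target response is the same in both scenes and the distractor responses have the same mean, but the spread of distractor responses differs.

```python
import statistics


def target_z(target, distractors):
    """z score of the target response within the pooled population."""
    pooled = distractors + [target]
    mean = statistics.fmean(pooled)
    sigma = statistics.pstdev(pooled)
    return (target - mean) / sigma


target = 0.30                        # same target response in both scenes
homogeneous = [0.20] * 40            # identical distractor responses
heterogeneous = [0.10, 0.30] * 20    # same mean response, larger spread

z_homog = target_z(target, homogeneous)
z_heter = target_z(target, heterogeneous)
print(z_homog > z_heter)
```

With the smaller σ of the homogeneous scene, the same target response yields a larger z score.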

Model behavior also agrees with the subtle changes in search efficiency seen in visual search asymmetries, where search efficiency changes when target and distractors swap roles. Two examples are shown. Only the input images are shown; the output response differences are too small to be visualized here, but the z score differences can be significant.

Ellipse among circles vs. circle among ellipses. Target: ellipse, z = 2.8. Target: circle, z = 0.7.

Curved line among straight lines vs. straight line among curved lines. Target: curved, z = 1.12. Target: straight, z = 0.3.

Two neural substrates are necessary to make a basic feature: (1) tuning of the cells’ receptive fields to the feature, i.e., a population of V1 cells selective to different values of this feature dimension, such that the feature can be signaled; (2) tuning of the horizontal connections to the feature, i.e., selectivity of the horizontal intra-cortical connections to the optimal feature values of both the pre-synaptic and post-synaptic cells in this feature dimension, such that a lack of iso-feature (e.g., iso-orientation) suppression of the target can lead to a relatively higher response. E.g., a vertical bar pops out among horizontal ones since cells are selective to orientation and horizontal connections link cells tuned to similar orientations; hence responses to the horizontal bars are suppressed due to iso-orientation suppression.
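The vertical-among-horizontal example can be mimicked with a toy calculation. This is a minimal sketch and not the V1 model itself: it assumes each bar receives unit drive and loses a fixed amount of response (the hypothetical parameter `weight`) for every one of its 8 grid neighbors that has the same orientation.

```python
def grid_responses(grid, weight=0.1):
    """Unit drive minus suppression from same-orientation neighbors (toy rule)."""
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            same = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if (dr, dc) == (0, 0):
                        continue
                    rr, cc = r + dr, c + dc
                    inside = 0 <= rr < rows and 0 <= cc < cols
                    if inside and grid[rr][cc] == grid[r][c]:
                        same += 1
            out[r][c] = 1.0 - weight * same
    return out


# 7 x 7 field of horizontal bars ("H") with one vertical target ("V").
grid = [["H"] * 7 for _ in range(7)]
grid[3][3] = "V"
resp = grid_responses(grid)

target_response = resp[3][3]      # no iso-orientation neighbors
neighbor_response = resp[3][4]    # 7 iso-orientation neighbors
far_response = resp[1][1]         # 8 iso-orientation neighbors
print(target_response, neighbor_response, far_response)
```

In this toy rule the target keeps its full response, and the distractors next to the target are slightly less suppressed than distractors farther away, in line with the remark above that items next to the target become more salient.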

Hence, on conjunction searches:

• A conjunction of 2 orientations is difficult to find since V1 cells are not tuned to two different orientations that differ significantly from each other.

• A conjunction of motion-orientation (or depth-orientation) is easy to find since many V1 cells are conjunctively tuned to both motion direction (or disparity) and orientation. We predict: there are underlying horizontal connections linking cells tuned conjunctively to the same orientation and motion direction (or disparity).

• A conjunction of color-orientation can be easy or difficult to find depending on the stimuli, since most V1 cells are tuned only to orientation or only to color, and a small population of V1 cells is broadly tuned to both orientation and color. Prediction: Color-orientation conjunction search can be made easier by adjusting the scale and/or density of the stimuli, since V1 cells conjunctively tuned to both orientation and color are mainly tuned to a specific spatial frequency band.

The target red vertical bar evokes responses from 3 cell types: (1) orientation selective cells tuned to vertical, (2) color selective cells tuned to red, and (3) conjunctively tuned cells selective to red-vertical. None of the 3 cell types experiences iso-feature suppression; the most responsive of them should signal the target saliency.

Assuming that cells tuned to the single features determine the ease of the corresponding single feature searches, the double feature search should be no more difficult than the easier of the two single feature searches, and may be more efficient than both single feature searches if the conjunctively tuned cell is the most responsive.

This explains why the double feature advantage is stronger in motion-orientation double feature search than the color-orientation double feature search (Nothdurft 2000), since motion-orientation conjunction cells are more abundant in V1 than the color-orientation conjunction cells.
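The argument above amounts to a maximum rule: the target saliency is set by the most responsive of the cells it drives. A minimal sketch follows, with hypothetical response values and a hypothetical helper name `target_saliency`; the numbers are chosen only to contrast a weakly responding conjunction cell with a strongly responding one.

```python
def target_saliency(single_feature_responses, conjunction_response=None):
    """Saliency of the target: highest response among the cells it drives."""
    responses = list(single_feature_responses)
    if conjunction_response is not None:
        responses.append(conjunction_response)
    return max(responses)


# Hypothetical responses of the unsuppressed single-feature cells.
orientation_only = 0.8
second_feature_only = 0.7   # color-tuned or motion-tuned cell

# Saliency in the easier of the two single feature searches.
easier_single = max(target_saliency([orientation_only]),
                    target_saliency([second_feature_only]))

# Double feature target with a weakly vs. a strongly responding conjunction cell.
weak = target_saliency([orientation_only, second_feature_only],
                       conjunction_response=0.6)
strong = target_saliency([orientation_only, second_feature_only],
                         conjunction_response=1.0)

advantage_weak = weak - easier_single
advantage_strong = strong - easier_single
print(advantage_weak, advantage_strong > 0)
```

Under this rule the double feature target is never less salient than in the easier single feature search, and a double feature advantage appears only when the conjunction cell is the most responsive one.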

On double feature searches:

Stimuli for a conjunction search for target

Response from a model without conjunction cells: A colored bar evokes responses in cells tuned to orientation only or tuned to color only.

Response from a model with conjunction cells: A colored bar evokes responses in cells tuned to orientation only, or tuned to color only, or tuned to both color and orientation. The conjunction cell experiences the least iso-feature suppression and enables pop-out.
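The contrast between the two models can be sketched with a toy calculation. This is an illustration under simplifying assumptions, not the V1 model: every other item in the scene counts as a neighbor, each cell type loses response in proportion to the fraction of other items sharing its preferred feature(s), and an item's saliency is its highest cell response. The names `item_saliency` and `scene_saliency` and the parameter `weight` are hypothetical.

```python
def item_saliency(item, others, use_conjunction_cells, weight=1.0):
    """Highest response among the cell types driven by one colored bar."""
    color, orientation = item
    n = len(others)
    same_color = sum(1 for c, o in others if c == color) / n
    same_orient = sum(1 for c, o in others if o == orientation) / n
    same_both = sum(1 for other in others if other == item) / n
    responses = [1.0 - weight * same_color,    # color-only cell
                 1.0 - weight * same_orient]   # orientation-only cell
    if use_conjunction_cells:
        responses.append(1.0 - weight * same_both)  # conjunction cell
    return max(responses)


def scene_saliency(items, use_conjunction_cells):
    """Saliency of every item, treating all other items as its neighbors."""
    return [item_saliency(item, items[:i] + items[i + 1:], use_conjunction_cells)
            for i, item in enumerate(items)]


# Conjunction search: red vertical target among red horizontal and
# green vertical distractors. The target is item 0.
items = ([("red", "vertical")]
         + [("red", "horizontal")] * 12
         + [("green", "vertical")] * 12)

without_cells = scene_saliency(items, use_conjunction_cells=False)
with_cells = scene_saliency(items, use_conjunction_cells=True)
print(without_cells[0] < min(without_cells[1:]), with_cells[0] > max(with_cells[1:]))
```

Without conjunction cells, both of the target's single-feature cells are suppressed by distractors sharing one feature, so the target is no more salient than the distractors. With conjunction cells, no other item shares both features, so the target's conjunction cell escapes suppression.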

The responses from the orientation selective cells are visualized by the thickness of the black oriented lines, those from color tuned cells by the size of the colored circles, and those from conjunctively tuned cells by the size of the correspondingly colored and oriented ellipses. The horizontal connections link cells tuned to similar features (orientation, color, or both).

Figure label: outputs S to higher visual areas.