Fleming Anderson Preprint

Roland Fleming & Bart Anderson: Perceptual Organization of Depth

The Perceptual Organization of Depth

Roland Fleming and Bart Anderson

Department of Brain and Cognitive Sciences, MIT

Running Head: Perceptual Organization of Depth

Correspondence: Roland Fleming, MIT Room NE20-451, 77 Mass. Ave., Cambridge, MA 02139. Email: [email protected]


Introduction

The goal of depth perception is to identify the spatial layout of the objects and surfaces

that constitute our surroundings. One important observation about the world around us

that influences the way we see depth is that physical matter is not distributed randomly,

with arbitrary depths at every location. On the contrary, the environment is generally

organized: the world consists mainly of tightly bound objects in a discernible layout.

This order results from countless forces and processes in our world which tend to

organize matter into objects and place those objects in certain spatial relations. The

central thesis of this chapter is that our perception of depth mirrors this organization. We

argue that because the world consists of objects and surfaces, our perception of depth

should likewise be represented in terms of the functionally valuable units of the

environment, namely surfaces and objects. As we shall see, this has profound

consequences for the processing of depth information. In particular, there is more to

depth perception than simply measuring the distance from the observer of every location

in the visual field. Rather, the perception of depth is the active organization of depth

estimates into meaningful bodies. Depth constrains the formation of perceptual units,

and, reciprocally, the figural relations between depth measurements allow the visual

system to parse its representation of depth into ecologically valuable structures.

There are many sources of information about depth, from ‘pictorial’ perspective to

motion parallax. An exhaustive review of all these sources of information is beyond the

scope of this chapter (although see Bruce et al., 1996 and Palmer, 1999 for introductory

reviews). Instead, we discuss three key domains in which the visual system “organizes”


our perception of depth into meaningful units, to emphasise the intimate relationship

between depth processing and perceptual unit formation.

In the first section we discuss how the visual system infers the layout of surfaces

from local measurements of depth. We will argue that local estimates of depth are

ambiguous, but that the geometry of occlusion critically constrains the legal

interpretations. Occlusion occurs when one opaque object partly obscures the view of a

more distant object, as happens frequently under normal viewing conditions. Occlusion

is important because it occurs at object boundaries, and therefore the depth

discontinuities introduced by occlusion provide ideal locations for the segmentation of

depth into objects. Moreover, as we will show in section 1, the geometry of occlusion

causes relatively near and relatively far depths to play different roles in the inference of

surface structure.

In the second section we discuss the visual representation of environmental

structures that are hidden from view. If the visual system is to organize depth into

meaningful bodies, it must represent whole objects and not only those fragments that

happen to be visible. In order to do this, the visual system must interpolate across gaps in

the image to complete its representation of form. We argue that by considering the

particular environmental conditions under which structures become invisible (specifically

occlusion and camouflage) we can make predictions about the mechanisms underlying

visual completion. We also discuss how visual completion influences the representation

of depth.

Finally, we discuss what happens when the scene contains transparent surfaces,

and thus multiple depths are visible along a single line of sight. We argue that this


introduces a second segmentation problem in the perceptual organization of depth. The

visual system not only needs to segment “perpendicular” to the image plane, such that

neighbouring locations are assigned to different objects; with transparency, the visual

system also has to segment depth “parallel” to the image plane, by separating a single

image intensity into multiple depths, a process known as ‘scission’ (Koffka, 1935). We

discuss the conditions under which the visual system performs scission, and how the

ordering of the surfaces in depth is resolved.

We argue that the ambiguity of local depth measurements, the representation of

missing structure, and the depiction of multiple depth planes are three of the major

problems faced by a visual system if it is to organize depth into surfaces and objects.

Through systematic explanations of example stimuli, we discuss some of the ways in

which the visual system overcomes these problems.

1. Interpreting local depth measurements: the contrast depth asymmetry principle

In this section we discuss how occlusion constrains the interpretation of local depth

estimates. Specifically, we show that occlusion enforces a crucial asymmetry between

relatively near and relatively distant structures that can have profound implications for

the representation of surface layout. Although the principles are discussed in terms of

binocular disparity, the fundamental logic relates to the geometry of occlusion and

therefore applies to any local estimate of depth.


1.1 Binocular stereopsis and the correspondence problem.

Binocular stereopsis is the most thoroughly studied source of information about depth.

Binocular depth perception relies on the fact that the two eyes receive slightly different

views of the same scene. The horizontal parallax between the views has the consequence

that a given feature in the world often projects to two slightly different locations on the

two retinae (see Figure 1). These small differences in retinal location, or binocular

“disparities”, vary systematically with distance in depth from the point of convergence

and can thus be used to triangulate depth. For a thorough treatment of stereopsis see

Howard and Rogers (1995) and chapters [chapter numbers for Ohzawa, Schor and

Shimojo] of this volume.
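
As a rough illustration of how disparity can be used to triangulate depth, the following Python sketch assumes the simplest possible geometry: a target on the midline, symmetric convergence, and an interocular separation of 6.5 cm. These values, the function names, and the sign convention (positive disparity for points nearer than fixation) are assumptions made for the example and are not taken from the chapter.

```python
import math

def vergence_angle(distance_m, interocular_m=0.065):
    """Angle (radians) subtended at the two eyes by a midline point."""
    return 2.0 * math.atan(interocular_m / (2.0 * distance_m))

def disparity(target_m, fixation_m, interocular_m=0.065):
    """Disparity of a midline target relative to the fixation point.

    Positive values mean the target is nearer than fixation
    (sign convention assumed for this sketch).
    """
    return (vergence_angle(target_m, interocular_m)
            - vergence_angle(fixation_m, interocular_m))

def distance_from_disparity(disparity_rad, fixation_m, interocular_m=0.065):
    """Invert the geometry: recover target distance from its disparity."""
    angle = vergence_angle(fixation_m, interocular_m) + disparity_rad
    return interocular_m / (2.0 * math.tan(angle / 2.0))

near = disparity(0.9, fixation_m=1.0)   # target in front of fixation
far = disparity(1.1, fixation_m=1.0)    # target behind fixation
recovered = distance_from_disparity(near, fixation_m=1.0)
```

Note that the inversion needs the fixation distance as well as the disparity itself, which is one way of seeing why a disparity has to be scaled before it yields an estimate of depth.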

In order to determine the disparity of a feature in the world, the visual system

must localize that feature in the two retinal images. Once it has identified matching

image features, the difference in retinal location is the binocular disparity, which can then

be scaled to estimate depth. The visual system must not measure the disparity between

features that do not belong together, otherwise it will derive spurious depth estimates (see

Figure 1). Because of this, the accuracy of the matching process is critical to binocular

depth perception. The problem of identifying matching features in the two eyes’ views

(that is, features that originate from a common source in the world) is known as the

“correspondence problem”.

If the features that the visual system localizes in the two images are very simple,

such as raw intensity values (or ‘pixels’) then in principle there could be many distracting

features that do not in reality share a common origin in the world. Under these


conditions, the correspondence problem would be difficult as the visual system would

have to identify the one true match from among a large number of false targets.
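
To make the false-target problem concrete, the sketch below counts candidate matches along a single row of a toy stereo pair in which the only available feature is a binary pixel value. The rows, the search range and the helper name `candidate_matches` are hypothetical and were chosen only for illustration.

```python
def candidate_matches(left_row, right_row, max_shift):
    """For each left-eye pixel, list the right-eye positions within
    +/- max_shift that hold the same value (all potential matches)."""
    matches = {}
    for i, value in enumerate(left_row):
        lo = max(0, i - max_shift)
        hi = min(len(right_row) - 1, i + max_shift)
        matches[i] = [j for j in range(lo, hi + 1) if right_row[j] == value]
    return matches

# Toy rows: the right-eye row is the left-eye row shifted by one pixel,
# padded with a zero at the end.
left_row = [0, 1, 0, 0, 1, 1, 0, 1, 0, 0]
right_row = left_row[1:] + [0]
true_shift = -1  # the shift that actually generated right_row

matches = candidate_matches(left_row, right_row, max_shift=2)

# Pixels that have more than one candidate, i.e. at least one false target.
ambiguous = [i for i, js in matches.items() if len(js) > 1]
```

Even in this tiny example most pixels have several candidates besides the correct one, because a raw pixel value carries almost nothing that distinguishes one location from another.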

However, there is considerable debate about what types of image features the

visual system matches to determine disparity (Julesz, 1960, 1971; Sperling, 1970; Marr

and Poggio, 1976, 1979; Pollard, Mayhew and Frisby, 1985; Prazdny, 1985; Jones and

Malik, 1992). Psychophysically, at least, it now seems unlikely that the visual system

matches raw luminances. Rather, the visual system seems to match local contrast

signals, that is, localizable variations in intensity, such as luminance edges (Anderson

and Nakayama, 1994; Smallman and McKee, 1995). This seems an almost inevitable

consequence of early visual processing, which maximises sensitivity to contrasts, rather

than to absolute luminances (Hartline, 1940; Wallach, 1948; Ratliff, 1965; Cornsweet,

1970). By the time binocular information converges in V1, the visual field appears to be

represented in terms of local measurements of oriented contrast energy (Hubel and

Wiesel, 1962; DeValois and DeValois, 1988) and thus it is likely that these are the

features from which disparity is computed.

If this is true, then the image features that carry disparity information are local

contrasts, such as luminance edges. However, this poses a problem for the visual system,

for in order to capture the functional units of the environment, the visual representation of

depth should be tied to surfaces and objects, not to local image features. There is,

therefore, a potential discrepancy between the image features that carry disparity

information (i.e. local contrasts), and the perceptual structures to which depth is assigned

(i.e. regions) in the ultimate representation of environmental layout. This discrepancy

plays a critical role in the theoretical discussion that follows.


A local image feature, such as an edge, has only one true match in the other eye’s

image. Therefore, the edge carries only one disparity. However, depth is ultimately

assigned to the two regions that meet to form the edge. This results in a problem: in

order to represent surface structure the visual system must assign depth to both sides of

an edge, even though the edge carries only one disparity (see Figure 2). How does the

visual system infer the depths of two regions from every local disparity signal? We will

show that the geometry of occlusion imposes an inviolable constraint on the

interpretation of local disparity-carrying features. To anticipate, we show that the simple

fact that near surfaces can occlude more distant ones, but not vice versa, has profound

consequences for the assignment of depth to whole regions.

1.2 Asymmetries in depth: a demonstration.

By way of motivation for the theoretical discussion that follows, consider figure 3, which

is based on a figure developed by Takeichi, Watanabe and Shimojo (1992). The figure

consists of a Kanizsa “illusory” triangle and three diamonds. When disparity places the

diamonds nearer to the observer than the triangle and inducers (by cross-fusing the

stereopair on the left of figure 3), the diamonds appear to float independently in front of

the background, and the Kanizsa triangle tends to be seen as a figure in front of the

circular inducers; this percept is schematised in figure 3b. The disparities in the display

can be inverted simply by swapping the left and right eyes’ views, as can be seen by

cross-fusing the stereopair on the right of figure 3. In this case what was previously

distant becomes near and vice versa, such that the diamonds are placed behind the plane

of the inducers. In both versions of the display, the triangle itself carries no disparity


relative to the circular inducers; only the disparity of the diamonds changes from near to

far. This simple inversion leads to a change in surface representation that is more

complex than a simple reversal in the depth ordering of the perceptual units (as

schematised in figure 3b). When the diamonds recede, they drag their background back

with them, such that the triangle appears as a hole through which the observer can see a

white surface; the three black diamonds lie embedded in the more distant white surface.

This recession of the background has a secondary effect of increasing the strength of the

illusory contour (the border of the triangle).

The important observations with regard to the theory are the following. First,

when the diamonds are in front, they are freely floating and separate, while when they

recede, they drag the background with them. Second, when the diamonds are forward, the

Kanizsa triangle tends to be seen as a figure (rather than ground), but when the diamonds

are more distant, the triangle is seen as a hole. And yet all that changed in the display

was the disparity of the diamonds. Why does this simple reversal in depth lead to an

asymmetric change in the surface representation? Why does the disparity of the

diamonds influence the appearance of the triangle? These are the asymmetries of depth

to which the following discussion pertains.

1.3 From features to surfaces: interpretation of local disparity signals.

Let us assume that the visual system has located a luminance edge and derived a

disparity, d0, from that edge. What possible surface configurations are consistent with the

local disparity measurement? Broadly, the legal interpretations fall into two classes, as

shown in figure 4. The first class consists of surface events in which both sides of the


edge meet at the depth of the edge, d0. There are many surface events for which this is

the case: reflectance edges, cast shadows, and creases in the surface, to name just three.

When the feature originates from a continuous manifold, as in these cases, interpretation

is simple, as both sides of the edge are assigned the same depth, d0.

The second class of interpretations occurs when the edge corresponds to an object

boundary, and therefore represents a depth discontinuity (see figure 4). In this case, one

side of the edge lies at the depth of the occluding object, and the other side of the edge

lies at the depth of the background. Therefore, the visual system must assign different

depths to the two sides of the edge. How can the visual system assign two depths, when

it is given only one disparity, d0? The answer is that it only assigns a unique depth to the

occluding side. The critical insight is the following: The depth measurement acquired at

an occluding edge only specifies the depth of the occluding surface. The visual system

assigns depth d0 to the occluding surface. All that it knows about the other side is that it

must be more distant than the occluding surface. If the more distant surface is

untextured, then it could be at any depth behind the occluder and the local image data

would remain the same. By contrast, if the depth of the occluding surface varies, the

disparity carried by the object boundary must also change, because the occluding surface

“owns” the contour (Koffka, 1935; Nakayama, Shimojo and Silverman, 1989) and is

therefore responsible for the disparity associated with the edge.

Although the visual system cannot uniquely derive the depth of the occluded side

(i.e. the background) from the local disparity computation, there is one critical piece of

information that it does have, and that is that the occluded side is more distant than the

occluder. There is no way for an occluding object to be more distant than the background


that it occludes. If the background is brought closer than the object, then the background

becomes the occluding surface, and carries the edge with it. In this way, occlusion

introduces a fundamental asymmetry into the interpretation of disparity-carrying edges:

the occluded side of the edge can be at any distance greater than d0, but neither side can

be nearer than d0.

We can summarise the possible depth assignments (from the occlusion and non-

occlusion classes just described) in the form of a constraint on the interpretation of local

disparity-carrying contrasts, which is termed the “contrast depth asymmetry principle”

(Anderson, submitted; see also Anderson, Singh and Fleming, 2002):

Both sides of an edge must be situated at a depth that is greater than

or equal to the depth carried by that edge.

Although this geometric fact is simple in form, it can have pronounced effects on

the global interpretation of images, when the constraint applies to all edges

simultaneously. We will now run through an example to show how the principle can

explain the asymmetric changes in perceived surface structure that occur when near and

far disparities are inverted.
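
The constraint can be written down compactly. The sketch below assumes a deliberately simplified scene description that is not part of the chapter: each region has a single depth, larger numbers mean farther from the observer, and each edge is a tuple naming its two flanking regions and the depth d0 signalled by its disparity. The function name `satisfies_cdap` and the additional ownership check (at least one side lies at the depth of the edge) are illustrative assumptions that follow the two classes of interpretation described in section 1.3.

```python
def satisfies_cdap(region_depth, edges, tol=1e-9):
    """Return True if a depth assignment respects the constraint.

    region_depth: dict mapping region name -> depth (larger = farther)
    edges: iterable of (region_a, region_b, d0) tuples
    """
    for a, b, d0 in edges:
        da, db = region_depth[a], region_depth[b]
        # Neither side of an edge may be nearer than the edge itself.
        if da < d0 - tol or db < d0 - tol:
            return False
        # Assumed here: some surface must own the edge, so at least
        # one side sits at the depth carried by the edge.
        if abs(da - d0) > tol and abs(db - d0) > tol:
            return False
    return True

edge = [("left", "right", 2.0)]
same_surface = satisfies_cdap({"left": 2.0, "right": 2.0}, edge)  # e.g. a reflectance edge
occlusion = satisfies_cdap({"left": 2.0, "right": 5.0}, edge)     # right side is background
illegal = satisfies_cdap({"left": 1.0, "right": 2.0}, edge)       # left side nearer than edge
```

The first two assignments correspond to the non-occlusion and occlusion classes respectively; the third is rejected because one side is placed nearer than the edge.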

1.4 Application of the contrast depth asymmetry principle.

In order to demonstrate the explanatory power of the contrast depth asymmetry principle

(hereafter CDAP), we will now use it to account for the demo in figure 3. Recall that

when the diamonds carry near disparity, they float freely in front of the background, and


the illusory triangle tends to be seen as figure. When the disparity is reversed, however,

the diamonds drag the background back with them, and the triangle appears as a hole.

This asymmetry in surface layout is depicted in figure 3b.

Let us first consider the case in which the diamonds appear to float in front. The

visual system has to interpret the disparity signals carried by the edges of the diamonds.

The CDAP requires that both sides of the diamonds’ edges (i.e. the black inside and the

white outside of the diamonds) have to be at least as distant as the edges. Now consider

the inducers, which are more distant than the diamonds. The constraint requires both

sides of the inducers’ edges to be at least as distant as those edges. This means that all of the

black interior of the inducers must be at least this distant and, more importantly, all of the

white background must be at least this distant, which is further than the disparity of the

diamonds. If all of the white background is further than the diamonds, then the edges of

the diamonds must be occluding edges, and the black interior of the diamonds must be an

occluding surface. This explains why the diamonds are seen as independent occluders,

floating in front of the large white background and black inducers: the edges of the

inducers drag the white background back, leaving the diamonds floating in front.

Now consider the case in which the diamonds are more distant than the inducers.

Again, the CDAP requires that both the inside and the outside of the diamonds have to be

at least as far back as their disparity dictates. This means that both the diamonds and

their white background are dragged back to the more distant disparity. Now consider the

inducers, which carry a relatively near disparity. Because the white background behind

the diamonds has been dragged back with the diamonds, the inducers and their white

background must be occluding surfaces. This means that the background immediately


surrounding the diamonds must be visible through a hole in the occluding surface. The

edges of this hole are the illusory contours of the Kanizsa figure. Note again that the

requirement that both sides of every edge be at least as far as the edge leads to asymmetrical

surface structures when disparities are inverted.

This is just one example that shows how the CDAP can account for asymmetrical

effects of relatively near and relatively far disparities on perceived surface layout;

because the CDAP is derived from the geometry of occlusion, it can account for a very

large number of displays, and can be used to generate surprising new displays (see

Anderson, 1999; Anderson, submitted).
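
The two cases worked through for figure 3 can be mimicked with a toy calculation. The sketch below is an illustration under strong assumptions and not a model proposed in the chapter: every region is treated as lying at a single depth, the depth values are arbitrary (larger means farther), and the white area is divided in advance into the part inside the triangle and the part surrounding the inducers. The region labels and function names are hypothetical.

```python
def lower_bounds(edges):
    """Nearest depth each region may take: the farthest edge it borders."""
    bounds = {}
    for a, b, d0 in edges:
        for region in (a, b):
            bounds[region] = max(bounds.get(region, d0), d0)
    return bounds

def occluding_edges(edges):
    """Edges where exactly one side is forced behind the edge, so that
    the edge must be a depth discontinuity owned by the other side.
    Returns a list of (occluder, occluded) pairs."""
    bounds = lower_bounds(edges)
    forced = []
    for a, b, d0 in edges:
        a_forced, b_forced = bounds[a] > d0, bounds[b] > d0
        # If both sides were forced back, no single-depth assignment
        # would be consistent; this sketch simply skips such edges.
        if a_forced != b_forced:
            forced.append((b, a) if a_forced else (a, b))
    return forced

def figure3_edges(diamond_depth, inducer_depth=2.0):
    # Hypothetical region labels; larger depth = farther away.
    return [
        ("diamond", "white_in_triangle", diamond_depth),
        ("inducer", "white_in_triangle", inducer_depth),  # straight "mouth" edges
        ("inducer", "white_surround", inducer_depth),     # circular arcs
    ]

near_case = occluding_edges(figure3_edges(diamond_depth=1.0))
far_case = occluding_edges(figure3_edges(diamond_depth=3.0))
```

With the diamonds near, the only forced discontinuity has the diamonds occluding the white region; with the diamonds far, it is the inducers that occlude the white region inside the triangle, in keeping with the hole percept. The sketch says nothing about how the illusory contour itself is formed, nor about how the white surround comes to be grouped with the inducers.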

2. Occlusion and camouflage: hallucinating the invisible

The central thesis of this chapter is that the visual system does not merely record depth at

each location in the visual field; rather, it actively organizes its depth measurements into

functionally valuable units. In the last section, we discussed how occlusion plays a key

role in this organization. In this section, we discuss how the visual system handles what

is arguably the hardest problem posed by occlusion: the visual representation of

structures that are hidden and are therefore completely invisible. If seeing depth is about

representing the actual layout of objects in the environment, then all portions of the

objects must be represented, even those that are hidden from view: hidden portions do not

disappear from the environment just because they do not appear in the image. Therefore,

the visual system has to go beyond local image data to construct representations of


hidden structures. We will now discuss how the environmental conditions of occlusion

and camouflage predict properties of the construction process.

2.1 Modal and amodal completion

We will consider two major ways in which parts of the scene can become invisible. The

first is simple occlusion, when an opaque object obscures part of a more distant object.

When this happens, the occluded structures of the more distant object have no

corresponding features in the image, and thus the visual system must somehow

‘reconstruct’ the missing data. The second way that viewing conditions can lead to

invisible structures is through camouflage. In camouflage it is the nearer, occluding

surface that is rendered invisible because it happens to match the color of its background.

Because the boundaries of the camouflaged object do not project any contrast, they have

no corresponding features in the image and thus the nearer object is effectively invisible.

Under these circumstances, the visual system must actively ‘hallucinate’ the invisible

structures. In both cases, the visual system interpolates missing data, a process that is

known as “visual completion”. This process is important to depth perception because it

is one of the means by which the visual system organizes its depth measurements into

meaningful bodies. We argue that depth perception and unit formation are intimately

intertwined, for depth constrains the perceptual units that are formed, and perceptual

organization influences the interpretation of local depth measurements.

The phenomenal quality of completed structures differs, depending on whether it

is near (camouflaged) or far (occluded) structures that are interpolated. In the case of

camouflage, the interpolation leads to a distinct impression of a contour or surface across


the region of missing data. This is referred to as “modal completion” (Michotte, Thines

and Crabbe, 1991/1964) because the experience is of the same phenomenal modality as

ordinary visual experience. An illusory contour, for example, is crisp, and subjectively

similar to a real contour, as can be seen in figure 5a. In contrast to this, the sense of

completion experienced with occluded structures is less distinct. The black form in

figure 5b tends to be seen as a single object, part of which is hidden, rather than as two

distinct objects, whose boundaries coincide with the boundary of the grey occluder.

There is a compelling sense that the two visible portions of the black form belong to the

same object, and that that object continues in the space behind the occluder. However,

this impression, although visual in origin, is not of the same phenomenal mode as normal

and modal contours, and is therefore referred to as “amodal completion” (Michotte et al., 1991/1964).

In general, the regions of the image which are visible, and lead to visual completion are

referred to as “inducers”.

2.2 The identity hypothesis.

There is a vast literature on visual completion and a thorough discussion of all the issues

is beyond the scope of this chapter. One important issue that is discussed in greater detail

in chapter [chapter number for Shimojo], is whether visual completion occurs relatively

early or late in the putative processing hierarchy. However, the perceptual organization

of depth has a direct bearing on another current debate, specifically, the extent to which

modal and amodal completion are the consequence of a single process. This issue is

intimately bound to depth perception because it determines the extent to which depth

processing and perceptual organization are independent.


The debate runs roughly as follows. On the one hand there has been the strong

claim that a single completion mechanism is responsible for both modal and amodal

completion. According to this account, perceptual organization (including visual

completion) produces perceptual units, and an independent process places those units in

depth. The theory states that psychological differences between modal and amodal

completion result from the final depth ordering of the completed forms (Kellman and

Shipley, 1991; Shipley and Kellman, 1992; Kellman, Yin and Shipley, 1998) rather than

a difference between the completion processes themselves. This is known as the

“identity hypothesis”. On the other hand the two processes could be largely independent,

subject to different constraints and subserved by distinct neural mechanisms. The strong

form of this “dual mechanism” hypothesis would be that the two processes are of a

fundamentally different kind, for example, that modal completion is largely data-driven,

while amodal completion is essentially “cognitive”. To anticipate, although we do not

subscribe to the strongest form of the dual-mechanism hypothesis, we will provide

evidence that modal and amodal completion follow different constraints and argue that

they are subserved by distinct neural processes. Central to the arguments that we present

are the geometric and photometric conditions under which occlusion and camouflage

actually occur in the environment.

The principal evidence for the identity hypothesis has been that subjects perform

similarly with modally and amodally completed figures in a variety of tasks. In one task,

Shipley and Kellman (1992) varied the spatial alignment of the inducing elements in both

modally and amodally completed squares. Such misalignment is known to weaken the

sense of completion, as the completed boundary is forced to undergo an inflection.


Subjects were asked to rate the subjective strength of visual completion as a function of

the degree of misalignment for modal and amodal versions of the display. Shipley and

Kellman (1992) found that ratings declined at the same rate as a function of misalignment

for both modal and amodal figures. This has been interpreted as evidence that a single

mechanism is responsible for both forms of completion.

Using a more rigorous method, Ringach and Shapley (1996) performed a shape

discrimination task with modal and amodal versions of a Kanizsa figure. By rotating the

inducing elements, the vertical contours of the completed square can be made to bow out

(creating a “Fat” Kanizsa), or curve in (creating a “Thin” Kanizsa). Subjects were asked

to discriminate between Fat and Thin versions of the display while the angle through

which the inducers were rotated was varied. Ringach and Shapley found that

discrimination performance as a function of rotation was nearly identical for modal and

amodal versions of the display, a finding which is consistent with the identity hypothesis.

One problem with this type of evidence is that it relies on negative results, that is,

a failure to detect a difference, which could be due to the method rather than a

fundamental property of the system being studied. Should positive evidence be provided

that modal and amodal completion are subject to different constraints, or result in

different perceptual units, then the identity hypothesis would no longer be tenable.

There are two major reasons for believing that modal and amodal completion

should be subject to different constraints, both of which are related to the environmental

conditions under which occlusion and camouflage occur. First, occlusion occurs over

greater distances across images because it only requires that one object is in front of

another. Camouflage, on the other hand, requires a perfect match in color between the


near surface and its background, and thus occurs less frequently in general. This

difference is reflected in a constraint on the image distances over which modal and

amodal completion occur, which was first documented by Petter (1956). Petter used a

class of stimuli now known as spontaneously splitting objects (SSOs), which consist of a

single homogeneously colored shape, such as the one shown in figure 5c, that tends to be

interpreted as two independent shapes, one behind the other. Which object is seen in

front tends to oscillate with prolonged viewing. However, which shape is seen in front

first, and which tends to be seen in front for a greater proportion of the time, can be

predicted rather well from the lengths of the contours that must be interpolated. Petter’s

rule states that longer contours tend to be completed amodally, while shorter contours

tend to be completed modally. Thus, which figure is seen in front can be predicted from

the length of the contours that must be completed. If the two types of completion are

subject to different constraints on the distances over which they occur, this opens the

possibility that they are subserved by different mechanisms.
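
Petter’s rule, as stated above, can be expressed as a simple comparison. The sketch below is a deliberately crude rendering: it treats the rule as deterministic, whereas the text describes a tendency, and the shape names, the lengths and the function name `petter_prediction` are hypothetical.

```python
def petter_prediction(gap_lengths):
    """Predict which of two shapes in a spontaneously splitting object
    tends to be seen in front, given the contour length that each
    shape would need to have interpolated.

    gap_lengths: dict with exactly two entries, shape name -> length
    Returns (front_shape, back_shape), or None when the lengths tie.
    """
    (name_a, len_a), (name_b, len_b) = gap_lengths.items()
    if len_a == len_b:
        return None  # this sketch makes no prediction for equal lengths
    # Shorter contours -> modal completion -> shape seen in front.
    # Longer contours -> amodal completion -> shape seen behind.
    if len_a < len_b:
        return name_a, name_b
    return name_b, name_a

prediction = petter_prediction({"bar": 1.5, "blob": 4.0})
```

Here the shape requiring the shorter interpolation ("bar") is predicted to complete modally and so to appear in front more often than the other.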

A second reason for believing that modal and amodal completion are subject to

different constraints relates to the color conditions that are required for occlusion and

camouflage to occur. Again, occlusion can happen between objects of any color. The

reflectance of the near object is unrelated to the fact that it hides the more distant one

from view. This suggests that amodal completion should not be sensitive to the

luminance relations between the image regions involved. Camouflage, by contrast,

requires a perfect match in luminance between the near and far surface. This implies that

modal completion should be sensitive to the luminance relations between the image

regions involved.


Recent experimental work has shown that this luminance sensitivity can lead to

large differences between modal and amodal displays (Anderson, Singh and Fleming, 2002).

Anderson et al. created displays consisting of two vertically separated circles filled with

light and dark stripes, as shown in figure 6. The binocular disparity of the circles was

kept constant, but the disparity of the light/dark contours inside the circles was altered to

place the stripes behind or in front of the circular boundaries. When the stripes were

further than the circles, the top and bottom stripes tended to complete amodally to form a

single continuous dark and light surface, which appeared to be visible through two

circular holes, as schematised in figure 6d. This percept occurred irrespective of the

luminance of the region surrounding the circles.

By contrast, when the disparity placed the contours in front of the circles, the dark

and light stripes separated into different depth planes. The way in which the stripes

separated from one another depended on the luminance of the surround. When the

surround was the same color as the light stripes, the light stripes appeared to float in front

and completed modally across the gap between the two circles. In this condition, the

dark stripes completed amodally underneath the light stripes to form complete circles.

This led to an impression of light vertical stripes in front of dark circles, as schematised

in figure 6e. However, when the surround was the same luminance as the dark stripes,

the percept inverted, such that the dark stripes appeared to float in front of light disks.

This demonstrates a fundamental dependence on luminance that was not present in the

amodal version of the display. Furthermore, if the surround was an intermediate grey,

then the display was not consistent with camouflage, as neither the light nor the dark

stripes perfectly matched the luminance of the background. Under these conditions, there


was no modal completion across the gap, and the percept was difficult to interpret. This

demonstrates that modal completion is sensitive to luminance relations, while amodal

completion is not.
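
The pattern of percepts just described can be summarised as a small decision rule. The following sketch is only a bookkeeping device for the reported outcomes, not a model of the underlying mechanism; the luminance values are arbitrary, exact equality stands in for a "perfect match", and the function name and return format are assumptions made for the example.

```python
def predicted_completion(stripe_depth, surround, light=0.8, dark=0.2):
    """Toy summary of the striped-circle display described above.

    stripe_depth: "behind" or "in_front" of the circular boundaries
    surround: luminance of the region surrounding the circles
    Returns a dict naming which stripes complete across the gap
    between the circles, and in which mode.
    """
    if stripe_depth == "behind":
        # Amodal case: reported to be independent of surround luminance.
        return {"across_gap": "both", "mode": "amodal"}
    if surround == light:
        # Light stripes are camouflaged against the surround.
        return {"across_gap": "light", "mode": "modal"}
    if surround == dark:
        # Dark stripes are camouflaged against the surround.
        return {"across_gap": "dark", "mode": "modal"}
    # Neither stripe matches the surround: not consistent with camouflage.
    return {"across_gap": None, "mode": None}

grey_behind = predicted_completion("behind", surround=0.5)
grey_in_front = predicted_completion("in_front", surround=0.5)
```

With an intermediate grey surround the rule returns amodal completion when the stripes lie behind the circles but no completion when they lie in front, which is the asymmetry the display was designed to reveal.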

Anderson et al. showed that this luminance sensitivity could affect performance

on basic visual tasks such as vernier acuity. The stripes in the top and bottom circles can

be horizontally offset (i.e. misaligned slightly), without destroying the sense of

completion. Subjects were asked to report in which of two displays the contours were

slightly misaligned. Both modal and amodal completion facilitate performance in this

task. However, in the amodal case performance was unaffected by the luminance of the

surround, while in the modal case, performance was much worse when the luminance of

the surround was an intermediate grey (the condition in which the stripes do not complete

across the gap). Thus, modal and amodal completion are subject to different constraints,

both on the distance over which they occur, and the luminance conditions that are

required to induce them. This positive evidence for a difference between modal and

amodal completion uses essentially the same types of task as the negative evidence that

had previously been used to support the identity hypothesis.

2.3 Visual completion and the perceptual organization of depth.

The geometric and photometric differences between modal and amodal completion are

derived directly from the environmental conditions of occlusion and camouflage.

Because occlusion and camouflage occur under different circumstances, they have

different consequences for the organization of depth into meaningful bodies. In fact, the

differences can be exploited to generate stimuli in which modal and amodal completion



lead to different shapes. This is important as it shows that unit formation is intimately

bound to the placement of structures in depth.

The greater “promiscuity” of amodal completion is the key in the generation of

these displays. Figure 7 is a recently developed stereoscopic variant of the Kanizsa

configuration in which the inducing elements are rotated outwards (Anderson et al.,

2002). When the straight segments (the “mouths” of the “pacmen”) are placed in front of

the circular portions of the inducers, the impression is of 5 independent illusory

fragments that float in front of 5 black disks on a white background. However, when the

two eyes’ views are interchanged, and thus the straight contours are placed behind the

circular segments, the impression is rather dramatically altered. With the disparity

inverted, the impression is of a single amodally-completed, irregularly-shaped, black

figure on a white background, which is visible through 5 holes in a white surface (these

percepts are schematised in figures 7b and c). Thus, the former case consists of a total of

11 surfaces (5 fragments + 5 disks + white background), while the latter case consists of

3 (1 white surface with 5 holes + 1 black shape + white background). Clearly the

placement in depth has a considerable effect on what perceptual units are formed.

Anderson et al. also provided evidence that differences between modal and

amodal interpolation can lead to differences in the very shapes of completed contours

themselves. When the left-hand stereopair in Figure 8a is uncross-fused, the resulting

percept consists of six circular disks that are partly occluded by a jagged white surface on

the right-hand side, as schematised in figure 8b. However, when the disparities are

inverted (by uncross-fusing the right pair of Figure 8a), the modal completion across the

regions between the four black blobs tends to take the form of a continuous wavy contour



that runs down the center of the display. This percept is schematised in figure 8c. The

importance of this demonstration is that it shows that modal and amodal completion can

not only result in different surface structures, but even in differently shaped contours. It

is difficult to see what the concept of a single completion mechanism serves to explain if

the two processes can result in different completed forms.

Ultimately, the identity hypothesis is a claim about mechanism and can therefore

be assessed physiologically. There is a considerable body of evidence for extrastriate

units that are sensitive to illusory, but not to amodally-completed, contours (see chapter

[chapter number for von der Heydt], this volume, for a review). A critical additional

piece of evidence was provided recently by Sugita (1999), who found cells in V1 that

respond to amodal completion across their receptive fields, but not to modal completion.

Cells responded weakly when presented with two unconnected edges; holes and

occluding surfaces on their own; and stimuli in which two unconnected edges were

separated by a hole. However, when the cells were presented with two edge fragments

separated by an occluder (a stimulus that leads to amodal completion of the edge), the

cells responded vigorously. This shows that at the earliest stages of cortical processing,

there is a double dissociation between the representations of modal and amodal

structures, a conclusion which supports the dual mechanism hypothesis.

3. Transparency, scission, and the representation of multiple depth planes.



Transparency poses a particularly interesting problem in the perceptual organization of

depth. With transparency, one object is visible through another, and thus two distinct

depths lie along the same line of sight (see Figure 9). If the visual system is to represent

depth in terms of the actual surfaces of the environment, it has to depict two distinct

depths at a single location in the visual field. The process of projection compresses the

light arriving from the transparent surface and the light arriving from the more distant

surface into a single image intensity on the retina. In order to represent both surfaces, the

visual system has to separate a single luminance value into multiple contributions, a

process known as scission (Koffka, 1935). We argue that scission is a type of perceptual

segmentation as it parses the representation of depth into distinct surfaces. However,

rather than segmenting neighbouring locations into distinct objects, scission separates

depth into layers, or planes, and thus operates “parallel” to the image plane.

Scission presents the visual system with two principal problems. The first is to

identify when a single luminance results from two distinct depths. The second is to

assign surface properties correctly at the two depths. By studying when and how we see

transparency, we can learn how the visual system scissions depth into layers.

Much of the seminal work on perceptual transparency was conducted by Metelli

(1970, 1974a,b; see also Metelli et al., 1985), who provided a thorough quantitative

analysis of the color mixing that occurs when one surface is visible through another.

When a background is visible through a transparent sheet, only certain geometrical and

luminance relations can hold between the various regions of the display (see Figure 9).

From these relations Metelli derived constraints that determine whether a region will look

transparent or not, and how opaque it will appear if it does look transparent. This is



important as it determines the conditions under which the visual system scissions a single

image intensity into multiple layers, and thus how the visual system stratifies its

representation of depth.

Broadly, the conditions required for perceptual scission fall into two classes. The

first are the photometric conditions for transparency, which detail the relations between

the light intensities of neighbouring regions that are necessary for scission. The second

set of conditions for perceptual scission are geometrical, or figural. Depth only separates

into layers when these relations hold between the various regions of the display.

3.1 Photometric conditions for scission.

Consider the display shown in figure 9a, which tends to be seen as a bipartite background

that is visible through a transparent filter. The vivid separation of the central region into

two depths only occurs when certain luminance relations hold. Metelli derived two

constraints on the photometric conditions required for perceptual scission.

The intuition behind the first constraint, which we refer to as the “magnitude

constraint”, is that a transparent medium cannot increase the contrast of the structures

visible through it. The consequence of this constraint is that the central diamond must be

lower contrast than its surround in order to appear transparent, as shown in Figure 9a.

This constraint is important as it restricts the conditions under which scission occurs: a

region can only scission if its contrast is less than or equal to the contrast of its flanking

regions. As can be seen from figure 9c, infringement of this constraint with respect to the

central diamond prevents it from undergoing scission. However, in this

display, the constraint is satisfied for the region surrounding the diamond, and thus, the



display can be seen as a bipartite display seen through a transparent filter with a

diamond-shaped hole in the centre.

The intuition behind the second luminance constraint, which we refer to as the

“polarity constraint”, is that a transparent medium cannot alter the contrast polarity of the

structures visible through it. Put another way, if a dark-light edge passes underneath a

transparent medium, the dark side will remain darker than the light side, no matter what

the absolute luminances are. As can be seen from Figure 9d, infringement of this

constraint prevents perceptual scission, demonstrating that the visual system respects this

optical outcome of transparency. This constraint is particularly important in determining

the depth ordering in transparent displays.

The polarity constraint enforces certain restrictions on the ordinal relationships

between the luminances of neighbouring regions. This means that, in principle, we can

classify the locations where neighbouring regions meet to determine whether scission is

or is not possible in each region. This provides the visual system with a local signature

of transparency. Beck and Ivry (1988) noted that if one draws a series of lines running

progressively from the brightest to the darkest regions, there are three possible shapes

that result, as shown in figure 10. The only difference between the three figures is the

luminance of the region of overlap between the two squares. In the first instance (Figure

10a), the image is bistable, as either square can be seen as a transparent overlay. In these

circumstances the lines linking regions of increasing luminance form a Z-configuration.

When the lines form a C-shape (Figure 10b), only one of the squares is seen as

transparent, and when the lines criss-cross (Figure 10c), the polarity constraint is

infringed for all regions, and neither square scissions. Adelson and Anandan (1990)



provided a similar taxonomy based on the number of polarity reversals. A number of

lightness illusions demonstrate that scission can be predicted from the class of X-

junctions in the display, and that these X-junctions can have powerful effects on many

qualities of our experience (see, for example, Adelson, 1993, 1999).
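
The classification by polarity reversals can be sketched in a few lines of Python. This is an illustrative sketch only, not code from Beck and Ivry or from Adelson and Anandan: the function name `classify_x_junction`, the region labels (background, square A alone, square B alone, overlap) and the example luminances are assumptions introduced here. The sketch counts how many of the two crossing edges reverse contrast polarity at the junction, and it assumes that zero, one and two reversals correspond to the Z, C and criss-cross configurations respectively.

```python
def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def classify_x_junction(background, a_only, b_only, overlap):
    """Count polarity reversals at the X-junction formed by two overlapping squares.

    The arguments are the luminances of the four regions that meet at the
    junction (hypothetical labels). Equal luminances (degenerate T-junctions)
    are not treated specially in this sketch.
    """
    # Edge of square A: compare its polarity outside B and inside B.
    a_edge_reverses = sign(background - a_only) != sign(b_only - overlap)
    # Edge of square B: compare its polarity outside A and inside A.
    b_edge_reverses = sign(background - b_only) != sign(a_only - overlap)

    reversals = int(a_edge_reverses) + int(b_edge_reverses)
    labels = {
        0: "Z: either square can be seen as transparent",
        1: "C: only one square can be seen as transparent",
        2: "criss-cross: neither square scissions",
    }
    return reversals, labels[reversals]


# Three displays that differ only in the luminance of the overlap region.
for overlap in (0.2, 0.5, 0.8):
    print(overlap, classify_x_junction(1.0, 0.6, 0.3, overlap))
```

As in figure 10, only the overlap luminance changes between the three calls, yet the junction class, and with it the predicted percept, changes each time.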

The magnitude and polarity constraints can be unified as a single rule that

describes a powerful local cue to scission. Anderson (1997) phrased the rule as follows:

“When two aligned contours undergo a discontinuous change in contrast magnitude, but

preserve contrast polarity, the lower-contrast region is decomposed into two causal

layers”. There are two valuable consequences of this rule. The first is that it unifies the

two Metelli constraints. The second is that it provides a local signature of transparency

that can be applied to any meeting of contours. This includes those T-junctions that are

in fact degenerate X-junctions; that is, those in which two neighbouring regions happen

to have exactly the same luminance. Anderson (1997) also demonstrated that a number

of traditional lightness phenomena, including White’s effect and its variants, and neon

color spreading, can be accounted for as cases of scission, rather than the consequence of

traditional “contrast” or “assimilation” processes.
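
The unified rule lends itself to a small sketch. The code below is a hedged illustration rather than Anderson's own formulation: the function name `lower_contrast_side`, the use of a simple luminance difference as the contrast measure, and the decision to treat a zero-contrast segment (the degenerate T-junction case) as polarity-preserving are all assumptions made here.

```python
def lower_contrast_side(side_one, side_two, tolerance=1e-9):
    """Apply the contrast rule to two aligned contour segments.

    Each side is a (luminance_left, luminance_right) pair measured across the
    contour on one side of the junction. Returns "side_one" or "side_two" for
    the segment whose region should decompose into two layers, or None when
    the rule does not apply.
    """
    contrast_one = side_one[0] - side_one[1]
    contrast_two = side_two[0] - side_two[1]

    # A polarity reversal rules out scission (the polarity constraint).
    if contrast_one * contrast_two < 0:
        return None
    # The rule requires a discontinuous change in contrast magnitude.
    if abs(abs(contrast_one) - abs(contrast_two)) <= tolerance:
        return None
    # The lower-contrast segment is attributed to a region seen through a layer.
    return "side_one" if abs(contrast_one) < abs(contrast_two) else "side_two"


print(lower_contrast_side((0.8, 0.2), (0.6, 0.4)))  # polarity kept, contrast drops
print(lower_contrast_side((0.8, 0.2), (0.4, 0.6)))  # polarity reversed
print(lower_contrast_side((0.8, 0.2), (0.5, 0.5)))  # degenerate T-junction
```

The first check corresponds to the polarity constraint and the comparison of magnitudes to the magnitude constraint, which is the sense in which the single rule subsumes both.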

Having identified that a location contains two surfaces, the visual system has to

partition the luminance at that location between the two depths. How much of the light is

due to the reflectance of the underlying surface, and how much is due to the properties of the

overlying layer? The opacity of the overlying layer determines how the luminance is

divided between the two depths. Metelli’s model makes explicit predictions about the

perceived opacity and lightness of the transparent layer. The equations predict that two

surfaces with identical transmittance should look equally opaque irrespective of their



lightness. However, Metelli himself noted that dark filters tend to look more transparent

than light filters with the same transmittance. Why does the visual system confuse

lightness and transmittance in partitioning luminance between two depths?

In a series of matching experiments, Singh and Anderson (in press) recently

resolved this issue. Subjects adjusted the opacity of one filter until it matched the

perceived opacity of another filter with a different lightness. Singh and Anderson found

that perceived transmittance is predicted almost perfectly by the ratio of Michelson

contrasts inside and outside the transparent region, even though such a measure is

actually inconsistent with the optics of transparency. As discussed above, there is a

general consensus that early visual processing tends to optimise sensitivity to

contrast, rather than absolute luminance. Hence, in assigning transmittance, the visual

system appears to use the readily available contrast measurements, even though they are

not strictly accurate measurements of opacity.
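
The contrast between the two accounts can be made concrete with a short numerical sketch. The equations are not given in the text above, so the code assumes the commonly cited episcotister form of Metelli's model, in which a background luminance seen through a filter of transmittance alpha and luminance t becomes alpha * background + (1 - alpha) * t. All function names and example values are hypothetical. The sketch compares the transmittance recovered from luminance differences with the ratio of Michelson contrasts for a dark and a light filter of identical physical transmittance.

```python
def filter_view(background, alpha, filter_lum):
    """Luminance of a background patch seen through a filter (assumed model)."""
    return alpha * background + (1.0 - alpha) * filter_lum


def metelli_alpha(a, b, p, q):
    """Transmittance recovered from luminance differences: (p - q) / (a - b)."""
    return (p - q) / (a - b)


def michelson(l1, l2):
    """Michelson contrast of two luminances."""
    return abs(l1 - l2) / (l1 + l2)


def contrast_ratio(a, b, p, q):
    """Michelson contrast inside the filter divided by that outside it."""
    return michelson(p, q) / michelson(a, b)


a, b = 0.8, 0.2   # bipartite background in plain view
alpha = 0.5       # same physical transmittance for both filters
results = {}
for name, filter_lum in (("dark", 0.1), ("light", 0.9)):
    p = filter_view(a, alpha, filter_lum)
    q = filter_view(b, alpha, filter_lum)
    results[name] = (metelli_alpha(a, b, p, q), contrast_ratio(a, b, p, q))
    print(name, results[name])
```

Under these assumptions the difference-based estimate is the same for both filters, whereas the contrast ratio is substantially higher for the dark filter, in the direction of the observation that dark filters look more transparent.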

3.2 Figural conditions for scission.

In addition to the luminance conditions, certain geometrical relations must hold between

the various regions of the display in order for depth stratification to occur (Metelli, 1974;

Kanizsa, 1979/1955). These figural conditions fall into two broad classes. The first class

requires good continuation of the underlying layer. Specifically, the contours that are in

plain view should be continuous with the contours viewed through the region of

presumed transparency. As can be seen from figure 9e, infringement of this condition

interrupts the percept of transparency. The second figural condition requires good



continuation of the transparent layer. Figure 9f shows that infringement of this condition

weakens or eliminates the percept of transparency.

There are conditions in which the figural cues to transparency are so strong that

they can override the luminance cues. Beck and Ivry (1988) showed subjects displays

like the one shown in figure 10c, in which the region of overlap between the two figures

is the wrong contrast polarity for either figure to be seen as transparent. Despite this,

naïve subjects did occasionally report seeing such figures as transparent, demonstrating

that the sense of figural overlap is a central aspect of the percept of transparency.

Certainly most observers are willing to agree that the region of overlap in Figure 10c

appears to belong to two figures simultaneously, an impression that can be enhanced with

stereo and relative motion. However, it should be noted that the grey of the overlap

region does not appear to scission into two distinct sources, at least not in the same way

as the overlap of a normal transparency display does (as in Figures 10a and 10b). This

leads to the possibility of two distinct neural processes in the perception of transparency.

One is driven by relatively local cues and leads to phenomenal color scission. The other

is driven by more global geometrical relations, and leads to stratification in depth. Under

normal conditions of transparency, the two processes operate in concert to produce the

full impression of transparency. However, using carefully designed cue-conflict stimuli,

such as those used by Beck and Ivry, these two factors in the representation of transparent

surfaces can be distinguished. An open question, however, is how these processes are

instantiated neurally. All we can conclude is that the representation of depth is much

more sophisticated than a mere 2D map of depth values.



3.3 Scission and the perceptual organization of depth.

Scission can have pronounced effects on perceptual organization. For example, Stoner,

Albright and Ramachandran (1990) demonstrated that perceived transparency can alter

the integration of motion signals into coherent moving objects. When a plaid is drifted at

constant velocity across the visual field, it is typically seen as a single coherent pattern

that moves at the velocity of the intersections between the two component gratings.

However, with prolonged viewing the plaid appears to separate into two component

gratings that slide across each other, each of which appears to move in the direction

perpendicular to its orientation. When the plaid is coherent, it appears to occupy a single

depth plane, but when it separates into its components, the gratings tend to appear at

different depths.
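
The relation between the component motions and the coherent pattern motion can be illustrated numerically. The sketch below uses the standard intersection-of-constraints construction, which is an assumption here since the text describes the velocities only verbally: each grating constrains the pattern velocity to have a fixed projection onto the grating's drift direction, and two non-parallel gratings together determine a unique velocity. The function name and parameters are hypothetical.

```python
import math


def pattern_velocity(direction1_deg, speed1, direction2_deg, speed2):
    """Velocity (vx, vy) consistent with both component gratings.

    Each grating is described by its drift direction (perpendicular to its
    bars, in degrees) and its speed along that direction. The pattern
    velocity v satisfies v . n1 = speed1 and v . n2 = speed2.
    """
    n1 = (math.cos(math.radians(direction1_deg)), math.sin(math.radians(direction1_deg)))
    n2 = (math.cos(math.radians(direction2_deg)), math.sin(math.radians(direction2_deg)))

    det = n1[0] * n2[1] - n1[1] * n2[0]
    if abs(det) < 1e-12:
        raise ValueError("gratings are parallel; the pattern velocity is not unique")

    # Solve the 2x2 linear system by Cramer's rule.
    vx = (speed1 * n2[1] - speed2 * n1[1]) / det
    vy = (n1[0] * speed2 - n2[0] * speed1) / det
    return vx, vy


# Two gratings drifting at unit speed along +45 and -45 degrees.
print(pattern_velocity(45.0, 1.0, -45.0, 1.0))
```

For this symmetric example the coherent plaid moves horizontally and faster than either component, whereas each grating seen on its own would move obliquely along its own drift direction.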

Stoner et al. varied the intensity of the intersections of the plaids and measured

the proportion of time for which the plaid was seen as coherent. They found that when

the color of the intersection was consistent with one grating being seen through the other

(i.e. when the junctions are consistent with transparency), the proportion of the time for

which the plaid appeared to separate into gratings was greatly increased. By contrast,

when the color of the intersections infringed the polarity constraint, such that neither

grating could be seen as transparent, the pattern tended to be seen as a coherent plaid,

rather than undergoing scission into distinct layers. This demonstrates that scission has

important consequences for the representation of visual structure. When an image region

scissions, the effects can spread to regions distant from the local cues to scission.

Scission acts as a nexus between depth and other visual attributes. Scission of

depth can cause regions to change in apparent lightness, and conversely changes in



luminance can cause changes in depth stratification. Figure 11 (taken from Anderson,

1999) demonstrates this close relationship between luminance, scission and the

perceptual organization of depth. Three circular patches of a random texture were placed

on a uniform background. Critical to the demonstration is that disparity is introduced

between the circular boundaries and the texture inside the circles. When the disparity

places the texture behind the circular boundaries, the circles appear as holes, through

which the texture is visible. The texture tends to appear as a single plane with

continuously, stochastically varying lightness. However, when the disparity places the

texture in front of the circular boundaries, the percept changes considerably. The texture

separates into two distinct layers: a near layer made of clouds with spatially varying

transmittance, and a far layer that is visible through the clouds, which consists of uniform

disks on a uniform background.

Another interesting property of this display is that the lightness and spatial

structure of the clouds and disks reverse completely when the luminance of the surround

varies. In figure 11, the top and bottom displays are completely identical except for the

lightness of the surround. When the surround is dark, the texture scissions into dark,

smoke-like clouds in front of white disks. However, when the surround is white, it is the

light portions of the texture that move forward, floating like mist in front of dark disks.

One final observation about the display is that when the texture carries near disparity, and

thus undergoes scission, the clouds that float in front tend to complete modally across

the gaps in between the disks. This is in part due to the fact that the conditions for

camouflage are satisfied, as discussed in section 2.



When the depth is reversed in the display, two asymmetries occur. The first is

geometrical in that it alters the structure of the depths in the scene. In the near case the

texture scissions into two layers, while in the far case the texture appears relatively

uniform in depth by comparison. The second asymmetry that occurs with depth inversion

is photometric in that it is driven by the luminance of the surround and determines the

lightness of the cloud and disks. When the texture is distant, the percept changes very

little with changes to the luminance of the surround; by contrast, when the texture is near,

the luminance of the surround critically determines how the scission occurs as well as the

lightness of the cloud and disks. In what follows, we will use the contrast depth

asymmetry principle (CDAP) discussed in section 1 and the concept of scission to

explain these asymmetries. For a more thorough discussion see Anderson (submitted).

Let us first consider the case in which the texture carries far disparity relative to

the circular boundaries. Because the texture is continuously varying in luminance, it

carries localizable disparity signals at almost every location. Put another way, if disparity

is carried by contrast, as argued in section 1, then patterns that are richly structured bear

the densest distribution of disparities. Recall that the CDAP requires both sides of every

contrast to be at least as distant as the disparity carried by the contrast. This means that

when the texture is given far disparity (or more precisely, when the contrasts of the

texture are given far disparity), both the light and dark matter in the texture recede to this

depth. In turn, the depth-placement of the texture uniquely determines the border-

ownership of the boundaries of the disks, which carry relatively near disparity. If the

insides of the disks (i.e. the texture) carry far disparity, then the outsides (i.e. the region

surrounding the disks) must be at the depth carried by the circular boundary. Thus, the



circles are seen as holes in the surrounding surface; it is through these holes that the

texture is visible.

The situation is more complex when the depth is reversed, i.e. when the contrasts

of the texture are nearer than the contrast of the circular boundaries. Crucial to the

following argument is that it is contrasts that carry disparity, while it is the light and dark

regions that make up the contrasts to which depth is assigned. First let us consider the

circular boundary between the surround and the texture. When the surround is light, it is

the dark portions of the texture (inside the circles) that contrast with the surround. Thus,

the disparity of the circular boundary is carried by the contrast between the light matter of

the surround, and the dark portion of the disk. The CDAP requires both of these regions

to be at least as distant as the disparity carried by the boundary. This means that the light

surround is dragged back to this depth, and the dark matter of the texture is also dragged

back to this depth. Now consider the contrasts between the dark and light portions within

the texture. These contrasts carry relatively near disparity. But the contrast between the

dark matter and the surround has already constrained the dark matter to be at least as

distant as the circular boundary. This means that it must be the light matter of the texture

that is responsible for the near disparity of the texture — i.e. the light matter is a near

surface that partly obscures the dark matter. This explains why the texture splits into two

depths: the dark matter is dragged back by forming a contrast that carries far disparity

(i.e. the boundary of the disk) and the light matter floats in front as its boundaries with

the dark matter carry near disparity.
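
The reasoning for the far and near cases can be summarised as a small constraint-propagation sketch. It is a deliberate simplification and not a model from the chapter: each region is given a single depth, every region is placed at the nearest depth the constraints allow, and the names `apply_cdap`, `NEAR` and `FAR` are hypothetical. Each contrast is listed with the two regions that form it and the depth signalled by its disparity, and the sketch reports which regions can lie at the depth of each contrast.

```python
def apply_cdap(edges):
    """Propagate the 'at least as distant as' constraint over a set of contrasts.

    edges: list of (region_a, region_b, edge_depth); larger depth = farther away.
    Returns (nearest_depth, owners): the nearest depth each region may occupy,
    and for each edge the regions that can lie at the edge's own depth.
    """
    nearest_depth = {}
    for region_a, region_b, edge_depth in edges:
        for region in (region_a, region_b):
            # A region must be at least as distant as every contrast it borders.
            nearest_depth[region] = max(nearest_depth.get(region, edge_depth), edge_depth)

    owners = {}
    for region_a, region_b, edge_depth in edges:
        owners[(region_a, region_b)] = [
            r for r in (region_a, region_b) if nearest_depth[r] == edge_depth
        ]
    return nearest_depth, owners


NEAR, FAR = 0, 1

# Texture nearer than the circular boundary, light surround.
near_depths, near_owners = apply_cdap([
    ("surround", "dark matter", FAR),       # circular boundary
    ("dark matter", "light matter", NEAR),  # contrasts within the texture
])
# Texture farther than the circular boundary.
far_depths, far_owners = apply_cdap([
    ("surround", "dark matter", NEAR),
    ("dark matter", "light matter", FAR),
])
print(near_depths, near_owners)
print(far_depths, far_owners)
```

In the near case only the light matter can lie at the near depth of the texture contrasts, so it floats in front of the dark matter. In the far case only the surround can lie at the depth of the circular boundary, which corresponds to the circles being seen as holes.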

The final logical step in the explanation involves scission. The texture does not

consist of only two luminances, but of a continuous range of luminances from light to



dark. How can we explain the appearance of the intermediate luminances in the texture?

Scission makes it possible to separate the intermediate luminances into two distinct

components: dark “stuff”, and light “stuff”, which have been compressed into a single

luminance by the process of projection onto the retina. These two components lie in

different depth planes. Put another way, scission allows the visual system to interpret the

grey regions as dark matter viewed through light matter. The critical insight is that it is

the dark “stuff” in the texture that forms the contrast with the surround. Therefore, all of

the dark stuff belongs to the more distant depth, including the dark stuff “in” the greys.

All of the “remaining” lightness in the greys belongs to the transparent clouds that float

in front of the disks. In this way, the intermediate luminances are interpreted as varying

degrees of transmittance of the overlying layer. The lighter the grey, the thicker the

cloud; the darker, the sparser. This explains why the disk appears as a uniform black

disk: all of the black is “sucked out” of intermediate regions and is dragged back to form

the disk. The “left-over” lightness is attributed to the transparent clouds.

The whole argument reverses when we change the surround from light to dark.

When the surround is dark, it is the light portions of the texture that contrast with the

surround, and therefore, it is the light portions of the texture that are dragged back. The

near disparity of the texture must therefore be due to the dark regions, and thus dark

clouds are seen to float in front of white disks. Again, as it is the whiteness of the texture

that is dragged back, all of the whiteness in the intermediate luminances is attributed to

the more distant disks. The “remaining” darkness in the greys is attributed to the dark

clouds that float in front. In this way, changing the luminance of the surround changes

which contrasts carry the disparities, and thus which regions are dragged back by virtue



of the CDAP. Scission enables the visual system to separate luminances into multiple

contributions and thus segment the intermediate greys into two distinct depth planes.
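
The partitioning of an intermediate grey into two layers can be sketched as a simple mixing equation. This is an illustrative assumption rather than the model proposed in the chapter: the grey is treated as a weighted mixture of a uniform far disk and a near cloud of the opposite colour, with the weight playing the role of cloud thickness. The function name `scission` and the luminance values are hypothetical.

```python
def scission(grey, surround_is_light, dark=0.0, light=1.0):
    """Split one texture luminance into a far disk colour and a near cloud.

    Returns (disk_luminance, cloud_luminance, cloud_opacity) such that
    grey == cloud_opacity * cloud_luminance + (1 - cloud_opacity) * disk_luminance.
    """
    if surround_is_light:
        # Dark matter contrasts with the light surround and is dragged back,
        # so the disk is dark and the left-over lightness forms the cloud.
        disk, cloud = dark, light
    else:
        # With a dark surround the roles of light and dark are exchanged.
        disk, cloud = light, dark
    opacity = (grey - disk) / (cloud - disk)
    return disk, cloud, opacity


for grey in (0.25, 0.75):
    print(grey, scission(grey, surround_is_light=True), scission(grey, surround_is_light=False))
```

With a light surround the lighter grey yields the thicker cloud over a dark disk, and with a dark surround the same grey is reinterpreted as a thin dark cloud over a light disk, mirroring the reversal described above.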

This demonstration and others like it are important as they show how multiple

processes interface to determine our percepts of depth and material quality. It is through

the CDAP and scission that the visual system interprets local variations in luminance as

meaningful surfaces located in depth. Depth stratification complements traditional

segmentation as an important process through which the visual system organizes its

representation of depth into ecologically valid structures.

Conclusions

It is common to think that depth perception involves little more than determining the

depth at each location in the visual field. We have argued, to the contrary, that the visual

system mirrors the structural organization of the environment by tying its representation

of depth to surfaces and objects. Thus depth perception is an active process of perceptual

organization, as well as a passive process of acquiring depth estimates. We have argued

that luminance, disparity and contrast are some of the basic image features that carry

local information about depth, while scission, visual completion and the CDAP are some

of the means by which depth is organized into surfaces.

In the first section we introduced the CDAP and argued that:

(1) disparity is carried by local contrasts (e.g. luminance edges) but assigned to

the regions that meet to form the contrasts.



(2) Occlusion introduces a critical constraint on the interpretation of local

disparity signals, the CDAP. This constraint requires that either both sides of a

contrast lie at the depth specified by the contrast, or that one side is a more

distant occluded surface. In the latter case, the disparity determines the depth

of the occluding side.

(3) The CDAP imposes a fundamental asymmetry between near and far

structures. When simultaneously applied to all edges in a display, the CDAP

can explain a number of asymmetrical changes in perceived surface layout

that occur with simple inversion of the disparity field.

In the second section, we discussed how the visual system deals with structures

that are invisible because they are hidden by occlusion or camouflaged against their

background. We argued that:

(1) The visual system has to actively complete the missing data if it is to

accurately segment depth into objects.

(2) Consideration of the environmental conditions of occlusion and camouflage

predicts (a) that modal completion is sensitive to luminance, while amodal

completion is not, and (b) that modal completion tends to occur over shorter

distances than amodal completion.

(3) As predicted from the environmental differences, distinct mechanisms are

responsible for the two types of completion. The differences can be used to

generate displays in which the completed forms differ when the disparity

field is inverted.



Finally, in the third section, we discussed how scission allows the visual system to

represent two depths along the same line of sight, and thus organize depth into layers. We

argued that:

(1) Certain luminance and figural relations must obtain in order for a region to

undergo scission.

(2) Scission can have pronounced effects on perceptual organization in regions

distant from the local signatures of transparency.



References

Adelson, E.H. & Anandan, P. (1990). Ordinal characteristics of transparency. AAAI-90

Workshop on Qualitative Vision, July 29, 1990, Boston, MA.

Adelson, E.H. (1999). Lightness perception and lightness illusions, in The new cognitive

neurosciences, (M. Gazzaniga, Editor-in-chief), Cambridge, MA: MIT Press.

Anderson, B.L. (1997). A theory of illusory lightness and transparency in monocular

and binocular images: The role of contour junctions. Perception, 26: 419-453.

Anderson, B.L. (1999). Stereoscopic surface perception. Neuron, 24: 919-928.

Anderson, B.L. Stereoscopic surface perception: Contrast, disparity and perceived depth.

Submitted to Psychological Review.

Anderson, B.L., Singh, M. & Fleming, R.W. (2002). The Interpolation of Object and

Surface Structure. Cognitive Psychology, 44, 148-190.

Anderson, B.L. & Nakayama, K. (1994). Towards a general theory of stereopsis:

Binocular matching, occluding contours and fusion. Psychological Review, 101:

414-445.

Beck, J. & Ivry, R. (1988). On the role of figural organization in perceptual

transparency. Perception & Psychophysics, 44: 585-594.

Bruce, V., Green, P.R. & Georgeson, M.A. (1996). Visual Perception (3rd Edition).

Hove, East Sussex, UK: Psychology Press.

Cornsweet, T.N. (1970). Visual Perception. New York: Academic Press.

DeValois, R.L. & DeValois, K.K. (1988). Spatial Vision. New York: Oxford University

Press.



Hartline, H.K. (1940). The Receptive Fields of Optic Nerve Fibres. American Journal

of Physiology, 130: 690-699.

Howard, I.P. & Rogers, B.J. (1995). Binocular vision and stereopsis. New York: Oxford

University Press.

Hubel, D.H. & Wiesel, T.N. (1962). Receptive fields, binocular interaction and

functional architecture of monkey striate cortex. Journal of Physiology, 160: 106-

154.

Jones, J. & Malik, J. (1992). A computational framework for determining stereo

correspondence from a set of linear spatial filters. Image and Vision Computing,

10: 699-708.

Julesz, B. (1960). Binocular depth perception of computer generated patterns. Bell

System Technical Journal, 39: 1125-1162.

Julesz, B. (1971). Foundations of cyclopean perception. Chicago, IL: University of

Chicago Press.

Kellman, P.J. & Shipley, T.F. (1991). A theory of visual interpolation in object

perception. Cognitive Psychology, 23: 141-221.

Kellman, P.J., Yin, C. & Shipley, T.F. (1998). A common mechanism for illusory and

occluded object completion. Journal of Experimental Psychology: Human

Perception & Performance, 24: 859-869.

Koffka, K. (1935). Principles of Gestalt Psychology. Cleveland: Harcourt, Brace and

World.

Kanizsa, G. (1979/1955). Organization in Vision. New York: Praeger.


Marr, D. & Poggio, T. (1976). Cooperative computation of stereo disparity. Science,

194: 283-287.

Marr, D. & Poggio, T. (1979). A computational theory of human stereo vision.

Proceedings of the Royal Society of London (B), 204: 301-328.

Metelli, F. (1970). An algebraic development of the theory of perceptual transparency.

Ergonomics, 13: 59-66.

Metelli, F. (1974a). The perception of transparency. Scientific American, 230: 90-98.

Metelli, F. (1974b). Achromatic color conditions in the perception of transparency, in

Perception: Essays in Honor of J.J. Gibson, (R.B. MacLeod, H.L. Pick, eds.).

Ithaca, NY: Cornell University Press.

Metelli, F., da Pos, O. & Cavedon, A. (1985). Balanced and unbalanced, complete and

partial transparency. Perception & Psychophysics, 38: 354-366.

Michotte, A., Thines, G. & Crabbe, G. (1991/1964). Amodal completion of perceptual

structures, in Michotte’s experimental phenomenology of perception, (G. Thines,

A. Costall, & G. Butterworth, eds.), Hillsdale, NJ: Erlbaum, pp. 140-167.

Nakayama, K., Shimojo, S. & Silverman, G.H. (1989). Stereoscopic depth: Its relation

to image segmentation, grouping, and the recognition of occluded objects.

Perception, 18: 55-68.

Palmer, S.E. (1999). Vision Science. Cambridge, MA: MIT Press.

Petter, G. (1956). Nuove ricerche sperimentali sulla totalizzazione percettiva. Rivista di

Psicologia, 50: 213-227.

Pollard, S.B., Mayhew, J.E.W. & Frisby, J.P. (1985). A stereo correspondence algorithm

using a disparity gradient limit. Perception, 14: 449-470.


Prazdny, K. (1985). Detection of binocular disparities. Biological Cybernetics, 52: 93-

99.

Ratliff, F. (1965). Mach Bands: Quantitative studies on neural networks in the retina.

San Francisco, CA: Holden-Day.

Ringach, D.L. & Shapley, R. (1996). Spatial and temporal properties of illusory contours

and amodal boundary completion. Vision Research, 36: 3037-3050.

Singh, M. & Anderson, B.L. (in press). Toward a perceptual theory of transparency. To

appear in Psychological Review.

Sugita, Y. (1999). Grouping of image fragments in primary visual cortex. Nature, 401:

269-272.

Shipley, T.F. & Kellman, P.J. (1992). Perception of partly occluded objects and illusory

figures: Evidence for an identity hypothesis. Journal of Experimental

Psychology: Human Perception and Performance, 18: 106-120.

Smallman, H.S. & McKee, S.P. (1995). A contrast ratio constraint on stereo matching.

Proceedings of the Royal Society of London (B), 260: 265-271.

Sperling, G. (1970). Binocular vision: A physiological and neural theory. American

Journal of Psychology, 83: 461-534.

Stoner, G.R., Albright, T.D. & Ramachandran, V.S. (1990). Transparency and coherence

in human motion perception. Nature, 344: 153-155.

Takeichi, H., Watanabe, T. & Shimojo, S. (1992). Illusory occluding contours and

surface formation by depth propagation. Perception, 21: 177-184.

Wallach, H. (1948). Brightness constancy and the nature of achromatic colors. Journal

of Experimental Psychology, 38: 310-324.


Figure Captions.

Figure 1. (a) The two eyes converge by angle α on a point P. Therefore, by definition,

P projects to the foveae of both eyes (P’). The Vieth-Müller circle is one of the

geometrical horopters, that is, it traces a locus of points in space that project to

corresponding retinal locations in the two eyes, and thus carry no interocular

disparity. Point Q is closer to the observer than P (as it falls inside the horopter).

Therefore, it projects to different locations on the two retinae (Q’). The

difference in the locations of Q’ is the binocular disparity, which can be scaled by

the vergence angle, α, to derive depth. (b) When the visual field contains many

points, there is a potential ambiguity as to which image features correspond in the

two eyes. Correct matches yield correct depth estimates, such as dA. (c) By

contrast, false matches yield erroneous depth estimates. Here, the image of point

A has been incorrectly matched with the image of point B, leading to an incorrect

depth estimate, d*.
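
As a rough numerical illustration of the geometry in Figure 1(a), the sketch below computes the angular disparity of a point Q relative to a fixated point P, and then inverts the computation to recover the distance of Q from its disparity and the vergence angle α. It assumes, for simplicity, that both points lie straight ahead on the median plane and that the interocular distance is 6.5 cm. The function names and the sign convention (positive disparity for points nearer than fixation) are illustrative choices, not part of the original text.

```python
import math

INTEROCULAR_M = 0.065  # assumed interocular distance (illustrative value)

def vergence_angle(distance_m, interocular_m=INTEROCULAR_M):
    """Angle (radians) between the two lines of sight to a midline point."""
    return 2.0 * math.atan(interocular_m / (2.0 * distance_m))

def disparity(fixation_m, target_m, interocular_m=INTEROCULAR_M):
    """Angular disparity of a midline target relative to the fixation point.

    Positive when the target is nearer than fixation (inside the horopter).
    """
    return (vergence_angle(target_m, interocular_m)
            - vergence_angle(fixation_m, interocular_m))

def distance_from_disparity(disparity_rad, alpha_rad, interocular_m=INTEROCULAR_M):
    """Recover target distance by combining disparity with the vergence angle."""
    return interocular_m / (2.0 * math.tan((alpha_rad + disparity_rad) / 2.0))

# Fixate P at 1.0 m; Q lies at 0.8 m, nearer than P.
alpha = vergence_angle(1.0)
delta = disparity(1.0, 0.8)
recovered_q = distance_from_disparity(delta, alpha)
```

The same disparity yields a different distance when paired with a different vergence angle, which is why disparity alone specifies only relative depth and must be scaled by α.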

Figure 2. (a) The image of a square occluding a diamond. A receptive field of limited

extent (the ellipse) captures only local information about the scene, here a vertical

luminance edge. This local information is ambiguous as many different scenes

could have resulted in the same image feature. (b) If disparity is calculated by

matching local contrasts, then the edge carries only a single disparity. However,

in this case, the light and dark sides of the edge result from two distinct objects

and therefore different depths have to be assigned to the two sides of the edge.


Figure 3. Asymmetries in depth interpolation, adapted from Takeichi et al. (1992). (a)

When the left stereopair is cross-fused, the diamonds appear to float

independently in front of the Kanizsa triangle, as schematised in (b). When the

disparity of the diamonds is inverted (by cross-fusing the right stereopair), the

diamonds drag their background with them, creating the percept of a triangular

hole, as schematised in (c), even though only the disparity of the diamonds has changed. This

asymmetrical change in surface structure can be explained by the contrast depth

asymmetry principle (see main text).

Figure 4. Adapted from Anderson et al. (2002). A contour which carries a depth signal

(e.g. disparity) is inherently ambiguous. Two main classes of world states could

have given rise to the contour: the contour could have originated from a single

continuous surface (e.g. a reflectance edge or cast shadow), or it could have

originated from an occlusion event. In the occlusion case, the border ownership

of the contour (i.e. which side is the occluder) is ambiguous. Nonetheless, in all

configurations, both sides of the contour are constrained to be at least as far as the

depth signal carried by the contour. This introduces a fundamental asymmetry in

the role of near and far contours in determining surface structure (see text for

details).
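
The constraint described in Figure 4 can be stated compactly in code. The sketch below is only an illustration under stated assumptions: depth is expressed as distance from the observer (larger means farther), the two regions flanking the contour are called "left" and "right", and the labels returned by the hypothetical function `interpret_contour` are ours. In particular, the "unanchored" label for the case in which neither side lies at the contour's depth is an added assumption rather than a claim from the text.

```python
def interpret_contour(contour_depth, left_depth, right_depth, tol=1e-9):
    """Label a depth assignment for the two sides of a contour.

    Depths are distances from the observer, so larger values are farther.
    Neither side may be nearer than the depth signal carried by the contour.
    """
    if left_depth < contour_depth - tol or right_depth < contour_depth - tol:
        return "inadmissible"  # one side would lie in front of the contour

    left_at_contour = abs(left_depth - contour_depth) <= tol
    right_at_contour = abs(right_depth - contour_depth) <= tol

    if left_at_contour and right_at_contour:
        return "continuous surface"  # e.g. a reflectance edge or cast shadow
    if left_at_contour:
        return "occlusion: left side owns the contour"
    if right_at_contour:
        return "occlusion: right side owns the contour"
    return "unanchored"  # assumption: no side carries the contour's depth
```

For example, `interpret_contour(1.0, 1.0, 1.5)` labels the left side as the occluder, whereas any assignment that places a side nearer than the contour is rejected. This is the asymmetry between near and far: regions may be pushed behind a contour's depth signal but never pulled in front of it.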

Figure 5. (a) Modal completion. Most observers report seeing a vivid white triangle in

front of three disks and a black triangular outline. The contours of the white

triangle are subjectively distinct, resembling real contours, even though there is

no corresponding image contrast, and hence the triangle is “illusory”. (b)

Amodal completion. Most observers report seeing a single continuous black

shape, part of which is hidden from view by the grey occluder, even though the

parts that are hidden from view are, by definition, invisible. (c) A self-splitting

object (SSO). Even though the shape is uniform black, it tends to be seen as two

forms, one in front of the other. Which form tends to complete modally, and

which amodally, depends in part on the distance that must be spanned by the

completion (Petter’s law).

Figure 6. Adapted from Anderson et al. (2002). Demonstration of dependence of modal

completion on surround luminance. When the left stereopairs of (a), (b), and (c)

are cross-fused, the stripes tend to complete amodally across the gap between

the circular holes, creating the impression of a single striped surface (like

wallpaper) viewed through two apertures, as depicted in (d). This occurs

irrespective of the luminance of the surround. However, when the right

stereopairs are cross-fused, thus inverting the disparity, only two stripes appear to

complete modally, and which stripes complete depends critically on the surround

luminance, as depicted in (e). When the surround is dark, as in (a), the dark

stripes complete modally; when the surround is light, as in (b), the light stripes

complete modally; and when the surround is intermediate, as in (c), no completion is

visible. This demonstrates that modal completion is luminance dependent, whereas

amodal completion is not.


Figure 7. Adapted from Anderson et al. (2002). (a) Relative depth alters perceptual

organization. When the left stereopair is cross-fused, the figure tends to appear as

five disks occluded by five distinct image fragments, as depicted in (b); the

transparency in (b) is included only so that both depth planes can be depicted

simultaneously. When the depth ordering is reversed by cross-fusing the right

stereopair, a single irregular black “star” appears to lie on a continuous white

background, which is visible through five holes in a continuous overlying layer, as depicted in (c).

In this depth ordering the black shape tends to appear as figure.

Figure 8. The serrated-edge illusion, adapted from Anderson et al. (2002). When the left

stereopair in (a) is uncross-fused, the resulting percept consists of six circular

disks that are partly occluded by a jagged white surface on the right, as depicted

in (b). When the right stereopair is uncross-fused, the modal completion of these

four black blobs tends to take the form of a single wavy contour that runs

vertically down the center of the display, as depicted in (c). Although other

percepts are possible, this is an existence proof that depth inversion alone can

alter the shape of modally and amodally completed contours.

Figure 9. Perceptual transparency. The figure in (a) tends to be seen as a light grey

transparent surface in front of a bipartite background, as depicted in (b), and thus

two distinct surfaces are visible along the same line of sight. Transparency is

only seen when certain relations hold between the various regions of the display.

In (c) the central region is higher contrast than its surround and thus is not seen as

transparent. In (d), the polarity of the contrasts is reversed, and again

transparency is not seen. In (e), the contour of the underlying layer is not

continuous inside and outside the central region, eliminating the percept of

transparency. In (f), the contour of the overlying layer is not continuous, which

also reduces the percept of transparency.

Figure 10. Adapted from Beck and Ivry (1988). The polarity constraint means that

transparency manifests itself in distinctive local ordinal relations in luminance.

The only difference between the three figures is the luminance of the region of

overlap. In (a), the region is dark, and the image is bistable as either square can

be seen in front. When this occurs, a line that progressively passes from brighter

to darker regions creates a Z-shape. In (b), the overlap is intermediate, such that

the line that joins regions of decreasing brightness is C-shaped. When this

happens, exactly one of the surfaces appears transparent. In (c), the overlap is

light, creating a criss-cross pattern. In this case, neither square appears

transparent as the polarity constraint is violated for both squares.
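
The three cases in Figure 10 can be distinguished by checking, for each square, whether the contrast polarity of its contour is preserved where it crosses the other square. The sketch below is a minimal illustration, assuming four distinct luminance values for the background, the part of square A outside the overlap, the part of square B outside the overlap, and the region of overlap. The function name, argument names and return format are hypothetical. A square is reported as a candidate transparent layer when the contour of the other square keeps its polarity inside and outside it.

```python
def sign(x):
    """Return -1, 0 or +1 according to the sign of x."""
    return (x > 0) - (x < 0)

def classify_overlap(background, a_only, b_only, overlap):
    """Classify two overlapping squares by the polarity of their contours.

    Returns (shape, transparent), where shape is "Z", "C" or "criss-cross"
    and transparent is the set of squares that may be seen as transparent.
    """
    # Contour of A: compared outside B (A vs. background) and inside B
    # (overlap vs. B).
    a_outside, a_inside = sign(a_only - background), sign(overlap - b_only)
    a_contour_preserved = a_outside != 0 and a_outside == a_inside

    # Contour of B: compared outside A (B vs. background) and inside A
    # (overlap vs. A).
    b_outside, b_inside = sign(b_only - background), sign(overlap - a_only)
    b_contour_preserved = b_outside != 0 and b_outside == b_inside

    transparent = set()
    if b_contour_preserved:
        transparent.add("A")  # B's contour survives inside A, so A may overlie it
    if a_contour_preserved:
        transparent.add("B")  # A's contour survives inside B, so B may overlie it

    shape = {2: "Z", 1: "C", 0: "criss-cross"}[len(transparent)]
    return shape, transparent
```

With two grey squares (0.6 and 0.4) on a light background (0.9), a dark overlap of 0.2 gives the bistable Z case, an intermediate overlap of 0.5 gives the C case with only square A transparent, and a light overlap of 0.95 gives the criss-cross case in which neither square appears transparent.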

Figure 11. Scission and the perceptual organization of depth; adapted from Anderson

(1999). The top and bottom figures are identical apart from the brightness of the

surround. When the right stereopair is cross-fused, the figure appears as a single

textured plane that is visible through three circular holes. This is seen irrespective

of the luminance of the surround. However, when the disparity is reversed (by

cross-fusing the left stereopair), the texture appears to separate into two depth

planes. The near layer contains clouds that vary spatially in thickness or

opacity. Through these clouds can be seen three more distant disks, which appear

more-or-less uniform in lightness. With this depth ordering, the structure

completely reverses with a change in surround luminance. In the top case, the dark

portions of the texture form the clouds; in the bottom case, the light portions of

the texture form the clouds. Scission makes these percepts possible by allowing

the visual system to separate the intermediate greys into two distinct

contributions.


Figure 1. Panels (a), (b), (c). Labels: P, Q, Vieth-Müller circle, P’, Q’, A, B, A’, B’, dA, d*.

Figure 2. Panels (a), (b). Labels: image, world, “?”.

Figure 3. Panels (a), (b), (c).

Figure 4. Labels: Local Image Data; matching, disparity computation; Possible depth interpretations; Continuous Surfaces; Occluding Surfaces.

Figure 5. Panels (a), (b), (c).

Figure 6. Panels (a), (b), (c), (d), (e).

Figure 7. Panels (a), (b), (c).

Figure 8. Panels (a); (b) Serrated edge near; (c) Serrated edge far.

Figure 9. Panels (a), (b), (c), (d), (e), (f).

Figure 10. Panels (a), (b), (c).

Figure 11.
