
Neural Networks. Vol. 3, pp. 45-74, 1990. 0893-6080/90 $3.00 + .00. Printed in the USA. All rights reserved. Copyright © 1990 Pergamon Press plc

ORIGINAL CONTRIBUTION

Self-Organizing Neural Networks for Perception of Visual Motion

JONATHAN A. MARSHALL

Boston University

(Received 6 January 1989; revised and accepted 16 June 1989)

Abstract--The human visual system overcomes ambiguities, collectively known as the aperture problem, in its local measurements of the direction in which visual objects are moving, producing unambiguous percepts of motion. A new approach to the aperture problem is presented, using an adaptive neural network model. The neural network is exposed to moving images during a developmental period and develops its own structure by adapting to statistical characteristics of its visual input history. Competitive learning rules ensure that only connection "chains" between cells of similar direction and velocity sensitivity along successive spatial positions survive. The resultant self-organized configuration implements the type of disambiguation necessary for solving the aperture problem and operates in accord with direction judgments of human experimental subjects. The system not only accommodates its structure to long-term statistics of visual motion, but also simultaneously uses its acquired structure to assimilate, disambiguate, and represent visual motion events in real-time.

Keywords--Self-organization, Motion perception, Hypercomplex cells, Cooperative-competitive learning, Neural networks, Aperture problem, Intrinsic connections, Visual tracking.

1. INTRODUCTION

Subjectively, the process of seeing is effortless; our visual systems perform complex visual tasks, such as motion detection and depth perception, with striking ease. Yet even the simplest, most commonplace visual tasks, such as distinguishing light from dark, marshal an enormous array of neural processing mechanisms. Not surprisingly, the visual areas of the brain appear to have a highly complex internal organization in order to accomplish their diverse set of processing tasks.

Despite its complexity, the structure of visual cortex in higher animals is not formed strictly by genetic means. Rather, it is determined by genetics plus adaptation to visual experience during a developmental period. In animals such as cats, monkeys, and human beings, a rudimentary neural interconnection structure is genetically set up in visual cortex before birth. After birth, the animal is exposed to the visual world. Its cortical interconnection structure becomes tuned, adaptively gaining sensitivity to the kinds of visual input that are likely to occur in the animal's environment, and losing sensitivity to unlikely visual events. Such developmental tuning has been demonstrated experimentally in regard to several aspects of visual processing, including sensitivity to orientation (Braastad & Heggelund, 1985; Frégnac & Imbert, 1978, 1984; Gary-Bobo, Milleret, & Buisseret, 1986; Hirsch & Spinelli, 1970; Hubel & Wiesel, 1970; Wiesel & Hubel, 1965), spatial frequency (Derrington, 1984), binocularity (Frégnac & Imbert, 1978; Trotter, Frégnac, & Buisseret, 1987), depth (Graves, Trotter, & Frégnac, 1987), and motion (Cremieux, Orban, Duysens, & Amblard, 1987; Kennedy & Orban, 1983; Pasternak & Leinen, 1986).

Acknowledgements--Based on the author's Ph.D. dissertation submitted to Boston University. Supported in part by Boston University (University Graduate Fellowship award) and by grants to Dr. Stephen Grossberg and the Boston University Center for Adaptive Systems from the Air Force Office of Scientific Research (AFOSR 85-0149, F49620-86-C-0037, and F49620-87-C-0018), the Army Research Office (ARO DAAG-29-85-K-0095), and the National Science Foundation (NSF IRI-84-17756). The author thanks Stephen Grossberg for his support, instruction, and useful criticisms. The author also gratefully acknowledges the help of Ennio Mingolla in every aspect of this research.

Requests for reprints should be sent to Jonathan A. Marshall, Center for Research in Learning, Perception, and Cognition, 205 Elliott Hall, University of Minnesota, Minneapolis, MN 55455.

This paper explores how a visual system can adapt to aspects of its visual environment, and in so doing, can produce useful visual processing structures without human intervention. A general-purpose class of adaptive mechanisms is proposed and applied to a fundamental visual processing problem: the aperture problem in motion detection. The results suggest not only novel avenues of research in visual motion processing, but also a general means by which a perceptual system can adaptively use its input history to construct representation mechanisms for new input.

Several papers have explored adaptive approaches to understanding important issues in early visual processing, such as how cells can become sensitive to spatial contrasts and contrast orientations, and how orientation "columns" can form in visual cortex (Ahumada & Yellott, 1988; Amari, 1977; Bienenstock, Cooper, & Munro, 1982; Fukushima & Miyake, 1982; Grossberg, 1976b; Linsker, 1986a, 1986b, 1986c; Nagano & Kurata, 1981; O'Toole & Kersten, 1986; Pearson, Finkel, & Edelman, 1987; Poggio & Hurlbert, 1988; Singer, 1983, 1985a, 1985b; Takeuchi & Amari, 1979; von der Malsburg, 1973; von der Malsburg & Cowan, 1982). This paper builds on the earlier results, but addresses issues at a somewhat higher processing level: perception of visual motion.

2. THE APERTURE PROBLEM

Direction Ambiguity

What the activity of a simple cell (Hubel & Wiesel, 1962, 1963, 1968) represents is inherently ambiguous. Grossberg and Mingolla (1985a, 1985b) showed that the spatial extent of a simple cell's receptive field induces uncertainty about both the position and orientation of a visual stimulus. A simple cell's activity is ambiguous in an additional sense, however, because it can indicate the presence of a visual contrast edge moving in any of several directions, instead of just one direction. Because each cell's receptive field is sensitive to visual stimuli only within a spatially local region, or "aperture," no single simple cell can determine the actual direction of motion of an edge (Figure 1a). The aperture problem (Hildreth, 1983; Marr, 1982; Marr & Ullman, 1981) arises as a consequence of this ambiguity: since the local direction of motion signaled by the activity of such cells is ambiguous, how can the actual direction of motion of a contrast edge be computed and represented unambiguously? The task associated with the aperture problem is to specify how such ambiguous local activations can be combined coherently to form unambiguous global motion percepts.

The ambiguity in the aperture problem can be traced back to two degrees of freedom in the motion measurements performed by simple cells. First, a simple cell can only partly localize the position of an edge segment. An edge segment can activate many simple cells whose receptive fields are positioned along the edge (Figure 1a). But none of the simple cells individually can indicate where the segment lies along the cell's preferred orientation axis relative to the cell's receptive field position (Figure 1b). Second, a simple cell is sensitive only to one local component of motion direction and hence to many possible actual motion directions consistent with its local motion preference. It cannot indicate which actual direction activates it (Figure 1a).
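The second degree of freedom can be stated concretely: a local detector measures only the motion component s along its preferred unit normal n, so every actual velocity v satisfying v · n = s is consistent with that measurement. The sketch below is illustrative only (the function and its parameters are not from the paper); it enumerates, for a few candidate speeds, the actual directions consistent with one local measurement.

```python
import math

def consistent_velocities(normal_angle_deg, normal_speed, candidate_speeds):
    """Return (direction_deg, speed) pairs whose projection onto the
    detector's preferred normal equals the measured component, i.e.,
    speed * cos(direction - normal_angle) == normal_speed."""
    out = []
    for s in candidate_speeds:
        # A velocity of magnitude s is consistent iff cos(dtheta) = normal_speed / s,
        # which requires s >= normal_speed.
        if s >= normal_speed:
            dtheta = math.degrees(math.acos(normal_speed / s))
            out.append((normal_angle_deg - dtheta, s))
            out.append((normal_angle_deg + dtheta, s))
    return out
```

For a rightward normal measurement of speed 1, an actual speed of 2 is consistent with motion at ±60 degrees: the whole one-parameter family of solutions is exactly the ambiguity a single aperture leaves unresolved.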

In many analyses of the aperture problem (e.g., Adelson & Movshon, 1982; Ferrera & Wilson, 1988; Heeger, 1987, 1988; Movshon, Adelson, Gizzi, & Newsome, 1985; Sereno, 1986, 1987; Tanner, 1986; Welch, 1988), the two degrees of freedom are treated simultaneously. In a hierarchy of cortical visual processing stages (Van Essen & Maunsell, 1983), a stage of simple cells is considered to supply input to a stage of "pattern-motion" cells, similar to ones found in area MT of macaque monkey cortex (Movshon et al., 1985). A pattern-motion cell combines motion information from multiple local sources and becomes activated when globally coherent motion in a prescribed direction is present in the cell's receptive field. The motion sensitivity of higher-level cortical cells, such as those in areas V3, MT, and STS, is the subject of intensive research (Albright, 1984; Allman, Miezin, & McGuinness, 1985; Felleman & Van Essen, 1987; Maunsell & Van Essen, 1983; Mikami, Newsome, & Wurtz, 1986a, 1986b; Newsome, Mikami, & Wurtz, 1986; Newsome & Paré, 1988; Rodman & Albright, 1987; Saito et al., 1986). The neural mechanism by which the local sources, such as simple cells, contribute to the activity of pattern-motion cells in MT has not yet been precisely identified, however.

This paper will treat the two degrees of freedom in the aperture problem separately, resulting in a new kind of solution. One degree of freedom is eliminated by first localizing visual features, such as contrast edges. The second degree of freedom can then also be eliminated quite easily using a simple tracking network.

Effect of Window Shape on Perceived Direction

Wallach (1935, 1976) investigated certain phenomena related to the aperture problem. He showed that when human subjects view moving objects which appear to be occluded except within a "window" region, the shape of the window affects the perceived direction of motion of objects viewed through the window. For example, he showed experimental subjects a diagonal line moving behind a horizontal rectangular window (Figure 2a). His subjects tended to report that the visible segment appeared to move diagonally when it was in the left or right corner of the rectangle, but horizontally when it was in the middle portion of the rectangle (Figure 2b). When the segment is near the corners of the rectangle, its endpoints appear to move in perpendicular directions, and the segment's length changes (Figure 2c). But when it travels along the middle section of the rectangle, its endpoints both move horizontally, and the segment's length is constant. The direction percepts are the same even if the borders of the window are invisible (Figure 2d).

How do our visual systems determine the apparent direction of motion of the segment? A variety of factors, including depth (Shimojo, Silverman, & Nakayama, 1988), presence of additional segments (Wallach, 1935), length, endpoint motion, and position, may influence the segment's apparent actual direction of motion. It is important to note that the segment's apparent actual direction of motion cannot be determined by examining the motion of either of its endpoints separately; information from both of the bar's endpoints must be integrated nonlocally in order to construct the percept of motion. Thus, our visual systems are capable of integrating visual information from widely separated regions of an image (Nelson & Frost, 1985; Norman, Lappin, & Wason, 1988). From that fact, it is only a short step to the conclusion that our visual systems contain processing units sensitive to widely separated regions of the image. Wallach's (1935) study thus raises a key question for visual perception: how is information combined over long spatial ranges to produce a unitary motion percept for each visual object?

FIGURE 2. (a) Subject views a diagonal stripe moving behind a horizontally elongated rectangular window. (b) Stripe segment appears to move diagonally near the rectangle's corners but horizontally along its length. (c) Stripe's position, length, and endpoints may influence its perceived direction of motion. Arrows indicate motion of endpoints. (d) Same direction percepts arise when window borders are invisible.

In order to address unsolved aspects of the aperture problem, the present work expands the formulation of the problem proposed by Marr and Ullman (1981). This paper shows how a visual system can use spatially nonlocal contextual cues (Marshall, 1988a) to construct a representation of the motion of a contour as a whole, not just at confined regions where motion is determined by intersecting local constraints.

The definition of the aperture problem proposed by Marr and Ullman (1981) applies only to systems that use local contour detectors where "the motion is detected by a unit that is small compared with the overall contour" (p. 154). However, their definition may be overly restrictive as a characterization of the operations that human motion perception systems must perform, because Wallach's (1935) study of window shape showed that the motion of a single bar can produce strong directional percepts, even where no small detector units can signal the overall direction of motion.

In recognition of the fact that information from the endpoints of a moving contour must be combined, an analysis of the aperture problem must permit the effective sizes of the receptive fields of some contour-detection units to be large, comparable to the image sizes of the contours whose direction of motion is to be determined. This choice will allow information about the motion of distinguishing points, such as endpoints, to be integrated over the long spatial ranges necessary for motion perception (Burbeck, 1985, 1986, 1987; Norman et al., 1988), producing a unitary representation of each contour's velocity and direction of motion. The statement of the aperture problem remains the same: how does the system obtain global representations of actual direction of motion, given locally ambiguous motion measurements?

The present analysis emphasizes these fundamental questions raised by Wallach (1935) regarding the endpoints of contours. In addition, this paper will specify how certain aspects of a visual system's motion processing mechanisms can adaptively self-organize, that is, form without guidance from an external teacher.

3. A MOTION PROCESSING NETWORK

This section describes the architecture of a self-organizing neural network that computes actual direction of motion. The network generates representations of motion analogous to the percepts arising from Wallach's (1935) displays. The network's structure and operation after self-organization has occurred will be described first, in order to motivate the analysis of how the network acquired its special structure. The network's self-organization will be described later in this paper.

The network contains two layers of cells: an input layer L1 and a processing layer L2. For the example at hand, L1 consists of idealized cells analogous to the hypercomplex cells (Hubel & Wiesel, 1965, 1968, 1977) found in visual cortex, that is, cells sensitive to the orientation, length, and local direction of motion of visual stimuli in their receptive field. Hypercomplex cells are similar to simple cells; however, hypercomplex cells also possess inhibitory end-zones flanking the central excitatory regions of their receptive fields (Hubel & Wiesel, 1965, 1977; Kato, Bishop, & Orban, 1978; Orban, Kato, & Bishop, 1979a, 1979b). A hypercomplex cell's activity is inhibited if an edge extends into one or both of its inhibitory end-zones. Both simple and hypercomplex cells fire with increasing strength as a function of input edge segment length--up to the length of the excitatory region of the cell's receptive field. For longer stimuli, simple cells continue to fire at their maximum rate, whereas hypercomplex cells respond more weakly.

A hypercomplex-type L1 cell thus can detect a contrast segment of its preferred orientation and length moving through its receptive field in any of several possible direction/speed combinations. However, it cannot distinguish in which one of the direction/speed combinations the segment is actually moving. The network takes the output of the L1 cells, which are already sensitive to locally measured ("short-range") direction of motion (Adelson & Bergen, 1985; Anstis, 1977; Braddick, 1974, 1980; Harris, 1986; Reichardt, 1961; Sperling, van Santen, & Burr, 1985; van Santen & Sperling, 1984), and--in a manner to be shown--creates a representation in L2 that indicates actual velocity.

As an oriented edge segment moves across the retina, the responses of hypercomplex cells whose stimulus preferences do not match the properties of the edge segment (e.g., wrong position or wrong length) are attenuated due to their tuning. The only hypercomplex cells that respond optimally are those whose receptive field position, preferred orientation, length, and speed most closely match that of the edge segment; the activity of this limited subset of the hypercomplex cells can be used as the representation of the edge segment. Edge segments can be effectively localized by finding the activity peaks across the hypercomplex layer. (Simple cells cannot localize visual features in this manner.) The activity of hypercomplex cells can thus constitute a measure of a contrast edge segment's position, orientation, local direction component of motion, and length. A curved edge would activate hypercomplex-type cells of a variety of orientations, approximating its curvature to the extent permitted by the scale and length-sensitivity of the cells (Dobbins, Zucker, & Cynader, 1987, 1988; Hubel & Wiesel, 1977).
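The length tuning and peak-based localization just described can be caricatured in a few lines. This is a hedged sketch, not the paper's cell model (the model equations are in the Appendix): the piecewise-linear response functions and the end-zone penalty constant are invented for illustration.

```python
def simple_response(length, excit_len):
    """Simple cell: response rises with stimulus length up to the
    excitatory region's length, then saturates at the maximum rate."""
    return min(length, excit_len) / excit_len

def hypercomplex_response(length, excit_len, end_zone_penalty=0.5):
    """Hypercomplex cell: same rising portion, but an edge extending
    into the inhibitory end-zones attenuates the response."""
    overflow = max(0.0, length - excit_len)  # extent into the end-zones
    return max(0.0, min(length, excit_len) / excit_len
               - end_zone_penalty * overflow / excit_len)

def localize(activities):
    """Localize an edge segment at the position of peak activity across
    a row of identically tuned hypercomplex cells."""
    return max(range(len(activities)), key=activities.__getitem__)
```

Because the hypercomplex response falls off on both sides of the preferred length, the population activity has a peak at the matching cell, which is what makes argmax localization possible; the saturating simple-cell response has no such peak.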

The correctness of the present arguments does not depend specifically on the use of hypercomplex cells. Rather, what is important is the existence of classes of cells that localize different visual features. Hypercomplex cells are useful because they can localize edge segments.

As this paper will show, the problem of determining the direction of motion for a localized edge segment is much easier than that for an edge whose length is locally unknown. In fact, a simple tracking mechanism can easily determine the direction of motion of any localized feature. For example, if a class of cells responds preferentially to the bright diamond-shaped intersections of a "plaid" pair of gratings (Adelson & Movshon, 1982), then the direction of motion of the diamonds can be tracked. The tracked direction is equivalent to the direction computed from the "intersection of constraints" proposed by Adelson & Movshon (1982)--however, tracking does not require any explicit computation of constraint intersections.
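Once a feature such as a plaid intersection has been localized, its actual velocity follows directly from its positions over time, with no constraint-intersection computation. A minimal sketch under that assumption (the function name and the averaging scheme are illustrative, not the paper's):

```python
def track_velocity(positions, dt=1.0):
    """Given successive (x, y) positions of a localized feature, return
    its (vx, vy) velocity, averaged over the observed displacements."""
    n = len(positions) - 1
    vx = sum(positions[i + 1][0] - positions[i][0] for i in range(n)) / (n * dt)
    vy = sum(positions[i + 1][1] - positions[i][1] for i in range(n)) / (n * dt)
    return vx, vy
```

Tracking the localized peak thus recovers the same pattern velocity that an intersection-of-constraints computation would, which is the point of the equivalence noted above.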

Psychophysical evidence indicates that human visual systems are indeed very good at localizing the relative positions of visual features (Burbeck, 1985, 1986, 1987). It might be reasonable to suggest that hypercomplex-type cells serve to localize edge segments in the visual motion detection systems of animals, but only if their visual systems contain such cells of a sufficiently wide range of preferred lengths. Current physiological evidence does not support the existence in cortical area 17 of hypercomplex cells with sufficiently large receptive fields to localize very long edge segments. However, progressively larger receptive field sizes in general are observed at higher stages in visual cortex (Van Essen & Maunsell, 1983). For example, the preferred lengths for hypercomplex cells typically range between 0.3 and 3 degrees of visual angle in area 17 of cat visual cortex (Kato et al., 1978; Orban et al., 1979a, 1979b), but in cat area 19, hypercomplex cells respond optimally to edge segments between 2 and 8 degrees in length (Saito, Tanaka, Fukada, & Oyamada, 1988). It thus may be possible that the hypercomplex property of length-sensitivity will be observed in cells with even larger receptive fields when more detailed probes of higher stages of visual cortex are reported. The use of hypercomplex-type cells in this paper is a convenient device to show how localization of an edge segment allows the segment's actual direction of motion to be computed.

These hypercomplex-type L1 cells project excitatory connections forward, in an orderly fashion, to cells in L2. To fix ideas, only a subset of the cells in L2 will be shown: those preferring a particular orientation, length, and local direction component of motion (Figure 3). The kind of structure described in this single case is reproduced correspondingly for all other combinations of orientation, length, and local directions to which the system is sensitive. Figure 3 depicts the structure of the feedforward (or bottom-up) excitatory connections from cells in L1 to cells in L2. Each L1 cell projects to a cluster of retinotopically neighboring L2 cells. For simplicity, the figure shows three cells in each such cluster, but increasing the number would serve only to increase the resolution of the system. Each of the three cells in an L2 cluster receives bottom-up connections of equal strength from a single L1 cell. Hence, all the cells in a cluster inherit identical bottom-up receptive field properties: those of the L1 cell. All of the L2 cells in a cluster are sensitive to a particular orientation, length, and local direction component of motion of visual stimuli traversing their receptive field.

FIGURE 3. Subnetwork of cells preferring a particular combination of orientation, local direction component of motion, and length. Each L1 cell projects excitatory connections to a cluster of nearby cells in the corresponding L2 position.

Although the L2 cells within a cluster all receive the same bottom-up input, their lateral connectivities--to and from other L2 cells--differ. Figure 4a shows a close-up of the lateral input and output structure of a cluster. Each L2 cell in a cluster receives a strong excitatory input connection from an L2 cell displaced spatially in one direction and sends a strong excitatory output connection to another L2 cell displaced in the opposite direction. Furthermore, each cell within a cluster connects to others along a different axis in the L2 plane. In the figure, one cell receives and sends lateral connections along a horizontal axis, another along a diagonal axis, and another along a vertical axis. The directions of the lateral connections along each of the axes are consistent with the possible actual directions in which an edge segment activating the cluster's L1 input could be moving. In this manner, each L2 cell constitutes a link in a chain of lateral connections along successive positions in a direction in which an edge segment might travel. Figure 5 depicts the embedding of an L2 cluster in its three lateral connection chains.
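A hypothetical rendering of this chain wiring: each L2 cell is indexed by its retinotopic position and the direction axis of its chain, and links to the like-direction cell at the next position along that axis. The three direction offsets mirror the horizontal, diagonal, and vertical axes of the figure; all names and the grid indexing are illustrative, not from the paper.

```python
# Assumed direction offsets for the three cells in each cluster.
DIRECTIONS = {"right": (1, 0), "up_right": (1, 1), "up": (0, 1)}

def build_chains(grid_w, grid_h):
    """Each L2 cell (x, y, direction) sends a lateral connection to the
    like-direction cell one step ahead along its direction axis, so
    like-direction cells form chains across successive positions."""
    links = {}
    for x in range(grid_w):
        for y in range(grid_h):
            for name, (dx, dy) in DIRECTIONS.items():
                nx, ny = x + dx, y + dy
                if 0 <= nx < grid_w and 0 <= ny < grid_h:
                    links[(x, y, name)] = (nx, ny, name)
    return links
```

Because a chain only ever connects cells with the same direction label, following a chain traces one consistent motion hypothesis across the L2 plane.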

In addition, each lateral connection possesses a signal transmission latency (Adelson & Bergen, 1985; Barlow & Levick, 1965; Fleet & Jepson, 1985; Reichardt, 1961; Sperling et al., 1985; van Santen & Sperling, 1984; Waibel, Hanazawa, Hinton, Shikano, & Lang, 1987). That is, a signal emitted by one cell does not reach its lateral destination cell until a prescribed time later. The timing of the lateral transmission latencies figures prominently in the operation of the network. In vivo, all synaptic connections possess signal transmission latencies. However, for convenience in computational simulation, the bottom-up excitatory connections transmit their signals instantaneously, and delays are simulated only in the lateral excitatory connections.

FIGURE 4. (a) Each L2 cell receives input excitatory connections from other L2 cells in positions displaced in one direction, and sends output excitatory connections to L2 cells displaced in the opposite direction. Each L2 cell within a cluster connects to others displaced along a different axis in the L2 plane. (b) Cells within a cluster project inhibitory connections to one another; if one cell is active, then the others in its cluster are less likely to be active.
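The lateral transmission latency can be modeled as a simple delay line: a signal enqueued at one simulation step emerges a fixed number of steps later. A sketch under that assumption (the class and its step interface are invented for illustration):

```python
from collections import deque

class DelayedConnection:
    """Lateral connection with a fixed transmission latency: a signal
    sent at step t is delivered at step t + latency."""
    def __init__(self, latency):
        # Pre-fill with zeros so the first `latency` outputs are silent.
        self.queue = deque([0.0] * latency)

    def step(self, signal_in):
        """Enqueue the outgoing signal; return the one now arriving."""
        self.queue.append(signal_in)
        return self.queue.popleft()
```

The delay is what lets a lateral signal launched at one position arrive just as the moving segment's bottom-up input reaches the next position, so coincidence of the two inputs encodes a match between the chain's direction and the stimulus motion.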

The final element in the structure of L2 is the set of lateral inhibitory connections between cells in each cluster (Figure 4b). Inhibitory connections project reciprocally between all L2 cells in a cluster. For reasons of computational ease and numerical stability, the lateral inhibitory connections are assumed to have negligible signal transmission latencies.

The exposition to this point has sketched the static structure of the network and the operation of individual cells. Details are in the Appendix, and further exposition is supplied by Marshall (1988b, 1989). How the network operates dynamically, in response to visual input, will be examined next. The network shown here is an example of a more general class of networks for visual processing. It can be thought of as a stage within a hierarchy of visual processing networks; this stage is suited for certain aspects of visual motion processing.

How the Network Resolves Uncertainty

The network described above resolves uncertainty by tracking (Anstis & Ramachandran, 1987; Burr & Ross, 1986; Ramachandran & Anstis, 1983; Sethi & Jain, 1987; Thompson & Pong, in press; Waxman & Duncan, 1986) visual features as they move across the visual field. It operates in a two-phase fashion, as shown in Figure 6. Consider a diagonally oriented contrast segment moving horizontally to the right. It excites L1 cells with the corresponding preferred orientation, length, and local direction, at successive positions along its path of travel.

FIGURE 5. Lateral chains of excitatory connections. Each cell in a cluster participates in a different chain. The chains for one cluster are indicated by arrows.


FIGURE 6. (a) Horizontally moving stripe activates L1 cell a, which excites cells b, c, d in L2 cluster. Moderately active L2 cells (shaded) emit lateral excitatory signals, which do not arrive at their destinations yet because of transmission latencies. (b) A short while later, stripe has moved farther to the right and activates L1 cell e, which excites L2 cluster f, g, h. Lateral signals now reach their destinations: cells f, i, j. One cell receives both bottom-up plus lateral excitation, becomes strongly active (solid), and suppresses its neighbors' activities via lateral inhibition. Cells that receive only lateral excitation (i, j) are too weakly activated to propagate their lateral signals. Only one cell per cluster propagates its lateral signals in this phase.

In Phase 1 (Figure 6a), the segment activates cell a in L1, which, like all L1 cells, measures only the local component of motion perpendicular to the orientation of its receptive field. Cell a transmits bottom-up excitation to L2 cells b, c, and d. Since cells b, c, and d are equally activated, they in turn initiate the transmission of excitatory signals through their lateral output connections. However, because the lateral connections possess a transmission time-delay, the output signals do not reach their destinations yet. The equal activation of the three cells b, c, and d represents the uncertainty at this phase about the actual direction in which the contrast is moving. By permitting more than one cell at a spatial position to be active, the network multiplexes, or represents simultaneously, all possible actual directions in which the segment could be moving.

A short time later, Phase 2 begins (Figure 6b). By this time, the input segment has moved farther to the right. It no longer activates cell a; instead it now activates L1 cell e, which in turn sends bottom-up excitatory signals to its corresponding cluster of L2 cells: f, g, and h. At just this moment, the delayed lateral signals from b, c, and d are delivered to L2 cells f, i, and j. Thus cell f receives both bottom-up and lateral excitation, while cells g and h receive only bottom-up input. The extra excitation delivered to f increases its activation and enables it to suppress (via lateral inhibition) the activities of g and h. A single cell in the cluster is thereby chosen to be active. The lateral excitation received by i and j is too weak to activate those cells supraliminally; hence only cell f is fully active at Phase 2. Consequently, only cell f can propagate its lateral signals to its own successor. Since cell f is on a horizontal chain, the full activation of the single chosen cell f represents the network's newly computed decision that the input segment is moving horizontally at Phase 2. The initial broad dispersal of lateral signals (three active cells) at Phase 1 has been narrowed (one active cell) at Phase 2.

As long as the input segment continues to follow the same trajectory, the lateral signals continue to propagate along the same horizontal chain. Thus, the signal transmission latencies allow the representation of the segment's horizontal motion direction to predictively track the segment's changing position.
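The two-phase mechanism can be sketched in a few lines of code. The sketch below is only an illustration, not the paper's implementation: the direction labels, signal strengths, and threshold are assumed values chosen to mirror Figure 6.

```python
# Minimal sketch of the two-phase disambiguation of Figure 6.
# Direction labels, signal strengths, and the threshold are
# illustrative assumptions, not the paper's actual parameters.

DIRECTIONS = ("diag_up", "horizontal", "diag_down")
BOTTOM_UP = 1.0      # excitation from an active L1 cell
LATERAL = 0.6        # delayed lateral excitation arriving along a chain
THRESHOLD = 1.2      # activity needed to propagate lateral signals

def phase(bottom_up_cluster, delayed_lateral):
    """Combine bottom-up and (delayed) lateral excitation, then apply
    winner-take-all lateral inhibition within the cluster."""
    act = {d: BOTTOM_UP * bottom_up_cluster.get(d, 0.0)
              + LATERAL * delayed_lateral.get(d, 0.0)
           for d in DIRECTIONS}
    peak = max(act.values())
    winners = [d for d, a in act.items() if a == peak]
    if len(winners) == 1 and peak > THRESHOLD:
        return winners                                # unambiguous decision
    return [d for d, a in act.items() if a > 0]       # uncertainty: multiplex

# Phase 1: the stripe activates L1 cell a; L2 cells b, c, d all receive
# equal bottom-up input, and no lateral signals have arrived yet.
p1 = phase({d: 1.0 for d in DIRECTIONS}, {})
# Phase 2: the stripe has moved; only the horizontal-chain cell also
# receives the lateral signal emitted (with delay) during Phase 1.
p2 = phase({d: 1.0 for d in DIRECTIONS}, {"horizontal": 1.0})

print(p1)  # all three directions active: uncertainty
print(p2)  # ['horizontal']: decision
```

With equal bottom-up input alone, no cell clears the threshold and all candidate directions stay multiplexed; one converging lateral signal is enough to break the tie.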

4. DETERMINATION OF ACTUAL DIRECTION

The network sketched in the previous section can be elaborated in a variety of ways and applied to several uncertainty-resolution problems. This section develops the details of how such a network can function to disambiguate local motion measurements and to spatially track edge segments, even when the length of a segment can vary. Simulations are presented showing how the network's outputs are analogous to the direction judgments of human beings in response to Wallach's (1935) motion displays. This approach allows a new and easily generalizable solution to the aperture problem to be implemented.

Simulation I: Direction Judgments in Wallach's Display

Simulation I shows how the kind of network described in the previous section can resolve motion ambiguity and respond to changes in direction, using a slightly more general version of the network. Implementation details for Simulation I are described in the Appendix and are elaborated further by Marshall (1988b, 1989). Figure 7 shows a grid of cell positions within layer L2. Bars representing the network's input pattern sequence are superimposed on the grid.

FIGURE 7. Grid of cell positions (dotted lines) is overlaid with bars representing the changing position and length of the input segment. Arrows represent direction ambiguity in L1 cell activations.

The input sequence represents one of Wallach's displays: a diagonal line moving behind a horizontal rectangular window (Figure 2d). A single diagonally oriented segment is presented at each discrete time step, so that a single bar appears to sweep across the window. The bar starts at one corner of the rectangle and appears to become disoccluded, lengthening as it changes position diagonally upward. At time 5, the bar appears to stop lengthening and to change direction, shifting horizontally. It continues shifting horizontally until time 10, when it appears to shorten and shift diagonally upward again, through time 12.

Each box in the grid contains a cluster of 12 cells; Figure 8 describes the preferred orientation, preferred length, and lateral connectivity of each of the 12 cells at a position. The 12 cells shown at each position are all sensitive to the same orientation and local direction component of motion, and thus only a subset of the full network is shown. However, the 12 cells in a cluster each possess a different combination of one of four preferred lengths and one of three lateral connection chain directions. Thus, one of the 12 cells might, for example, respond preferentially to presentation of a short, diagonally oriented contrast segment, preceded by activation of a cell prior along a horizontal axis. By virtue of its lateral inputs, the cell can be said to respond preferentially to a horizontally moving diagonal edge traversing its receptive field.

FIGURE 8. Each cell in the cluster is represented by a dot. (Labels indicate the cell cluster position, the preferred lengths, and the lateral input and output connections.)

At each time step in the simulation, the input pattern is fed into L1 cells sensitive to the appropriate position, orientation, length, and local direction component of motion. None of the L1 cells individually can determine the actual direction in which the bar appears to be moving; determination of actual direction is the task of L2. Excitatory signals from the maximally active L1 cells are fed forward to clusters of L2 cells. The simulation shows how the network at layer L2 computes and represents the actual direction in which the bar is moving, in a manner consistent with human percepts of actual direction of motion.

Figure 9 summarizes the network's direction computations as the segment traverses the network. The initial (time 1) uncertainty about the segment's direction of motion is represented by the equal bottom-up activation of three cells (three output arrows, shaded area) belonging to separate chains. Next (time 2), only a single cell (on a diagonal chain) receives both bottom-up and lateral excitation; lateral inhibition then ensures that only that cell remains significantly active. The network's decision that the segment is moving in a diagonal direction is represented by that cell's activity (single output arrow). The representation of diagonal motion propagates (times 3 and 4).

At time 5 (and time 10), the segment begins to shift in a new direction, preventing the bottom-up and lateral signals from arriving at the same location. Three cells receive equal excitation (bottom-up only), engendering a new moment of uncertainty (three output arrows). The uncertainty is resolved at time 6 (and time 11), when again a single cell receives both bottom-up and lateral excitation, representing the network's decision about the segment's direction of motion (single output arrow).

By virtue of the lateral chains, the preferences of L2 cells for actual direction of motion were not, in general, necessarily perpendicular to the cells' orientation preferences. Cells whose orientation preference is not perpendicular to their direction preference in response to moving edge segments (slits) have been found in area MT of macaque monkey visual cortex (Albright, 1984). Thus, the dissociation of orientation and direction preferences exhibited by L2 cells in the present simulations may correspond to that dissociation found in such physiologically identified cells.

FIGURE 9. Summary of network's direction computations. Number adjacent to each diagonal row indicates time-step at which moving stripe reaches the row's position. Input arrows indicate lateral excitation; diagonal bars indicate cells receiving bottom-up excitation. Output arrows (shaded regions) indicate active cells. Triple output arrows (times 1, 5, and 10) indicate direction uncertainty; single output arrows indicate direction decision.

Simulation I illustrates how the two degrees of freedom in the aperture problem can be disentangled and then dealt with separately: first, the hypercomplex-type cells localize each edge segment; and second, the lateral chains track the segments' actual directions of motion. The direction decisions produced by the simulation accord with the percepts reported by Wallach's (1935) experimental subjects. Simulation I shows how a simple network can overcome the ambiguity inherent in L1 cell activations and can represent actual directions of motion. The same kind of mechanism can be replicated for other orientations and velocities; each such subnetwork is engaged by stimuli of its preferred orientation and velocity.

5. HOW THE NETWORK SELF-ORGANIZES

The complex chain structures described above might seem arbitrary, or indeed bizarre, as processing structures for visual information, were it not for a special property: self-organization. Self-organization refers to the ability of the network to acquire its structure adaptively without detailed external "teaching." The networks described here begin with a rudimentary, undifferentiated interconnection structure. As cells in the network are exposed to sequences of visual input, a simple adaptation rule causes them to modify their connection strengths according to the spatiotemporal correlations in the input. As a result, certain characteristic patterns of connections form between cells in the network. The chain structures in L2 form in this manner; they are thus a natural consequence of simple adaptation to ordinary moving visual input.

Initially, the networks are formed according to simple growth rules (Cohen & Grossberg, 1987; Dammasch, Wagner, & Wolff, 1986; Grossberg, 1976a, 1976b; Kohonen, 1982a, 1982b; Linsker, 1986a; von der Malsburg, 1973; von der Malsburg & Cowan, 1982; Willshaw & von der Malsburg, 1976). Cells are distributed uniformly throughout each layer. The weights of connections between cells decrease with distance according to a Gaussian function. Thus, the initial processing capabilities of cells in each layer are uniform and nonspecific.
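A minimal sketch of such a growth rule, assuming an isotropic Gaussian falloff with an illustrative width parameter:

```python
import math

SIGMA = 2.0  # assumed width of the initial Gaussian falloff (illustrative)

def initial_weight(pos_a, pos_b, sigma=SIGMA):
    """Initial connection weight between two cells, decreasing with
    retinotopic distance according to a Gaussian (growth rule)."""
    d2 = (pos_a[0] - pos_b[0]) ** 2 + (pos_a[1] - pos_b[1]) ** 2
    return math.exp(-d2 / (2 * sigma ** 2))

# Nearby cells start strongly connected; distant cells start weakly
# connected; cells at equal distances start equally connected.
print(round(initial_weight((0, 0), (0, 0)), 3))  # 1.0
print(round(initial_weight((0, 0), (3, 0)), 3))
```

Because the falloff depends only on distance, the initial connectivity is uniform and nonspecific, as the text describes; all later specificity comes from adaptation.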

The strengths of neural interconnections are modified on a "use it or lose it" basis: connections involved in representing frequent visual events are exercised often, becoming stronger, while infrequently exercised connections become weaker (Amari & Takeuchi, 1978; Bienenstock et al., 1982; Carpenter & Grossberg, 1987a, 1987b; Dubin, Stark, & Archer, 1986; Fukushima & Miyake, 1982; Garey & Pettigrew, 1974; Globus, Rosenzweig, Bennett, & Diamond, 1973; Greenough, 1975; Grossberg, 1972, 1976a, 1976b, 1984; Hebb, 1949; Hubel, Wiesel, & LeVay, 1977; Kohonen, 1982a, 1982b, 1984, 1987; Kohonen & Oja, 1976; Linsker, 1986a, 1986b, 1986c; Rakic, 1977; Singer, 1983, 1985a, 1985b; von der Malsburg, 1973; von der Malsburg & Cowan, 1982). The present paper goes beyond previous analyses in several ways. First, the self-organization is applied to the domain of motion, which implies that representations of temporal sequences must be formed. Second, the dual processes of input representation and network self-organization occur simultaneously, even when uncertainty is represented. Third, a new kind of inhibitory learning rule is combined with an excitatory learning rule to regulate the amount of permissible overlap between the input patterns represented by each cell.

A simple learning rule governs the gradual changes in the excitatory connection weights between cells: whenever a cell is active, its input connections from active cells become stronger, at the expense of its input connections from inactive cells. This kind of rule is in the class of instar learning rules (Grossberg, 1982b); that is, for the (excitatory) connection from cell j to cell i, the learning occurs whenever cell i is active, but the target value of the connection strength is determined by the activity of cell j.
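An instar rule of this kind can be written as dw_ij = rate · y_i · (x_j − w_ij): no learning when the target activity y_i is zero, and weights driven toward the presynaptic activities x_j when y_i is high. A minimal sketch, with an assumed learning rate:

```python
def instar_update(weights, pre_acts, post_act, rate=0.1):
    """Instar learning: the connection from cell j to cell i changes only
    when the target cell i is active; the target value of each weight is
    set by the presynaptic activity x_j.
        dw_ij = rate * y_i * (x_j - w_ij)
    The learning rate 0.1 is an illustrative assumption."""
    return [w + rate * post_act * (x - w)
            for w, x in zip(weights, pre_acts)]

w = [0.5, 0.5, 0.5]
# Active target cell: weights from active inputs strengthen,
# weights from inactive inputs weaken.
w = instar_update(w, pre_acts=[1.0, 0.0, 1.0], post_act=1.0)
print([round(x, 3) for x in w])  # [0.55, 0.45, 0.55]
# Inactive target cell: no learning occurs at all.
print(instar_update(w, pre_acts=[0.0, 1.0, 0.0], post_act=0.0) == w)  # True
```

The second call shows the gating that distinguishes instar learning from plain Hebbian correlation: a presynaptic signal alone, with a silent target cell, leaves the weights untouched.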

The learning rule is followed independently by each cell. Yet when combined in a network with excitatory and inhibitory signal transmission, it leads to several important properties of the network as a whole. In particular, two global network properties emerge from the local cellular interactions governed by such learning:

1. Selectivity Property. Each cell becomes increasingly sensitive to a particular input pattern.

2. Dispersion Property. Every cell tends to become sensitive to a different input pattern.

Because learning occurs whenever a target cell is active, the dispersion property is upheld when inhibition is strong enough to prevent two target cells from responding to the same input pattern. (This condition can be relaxed somewhat: see section below on Tuning of Inhibition Strength.) Selectivity follows from dispersion: input patterns that differ sufficiently will activate different target cells, each of which then learns more strongly its own input pattern.

Spatiotemporal Correlations in Moving Images

The networks presented here exploit certain spatiotemporal correlation properties (Field, 1987; Kersten, 1987; Knill & Kersten, 1988) of information carried by images moving on the retina. To clarify how the networks self-organize, these ecologically derived correlations will now be described.

The self-organization that occurs in these networks is based on the following premise: moving visual features detected in an image have visual inertia (Anstis & Ramachandran, 1987; Ramachandran & Anstis, 1983; Sethi & Jain, 1987; Waxman & Duncan, 1986). For example, if a vertically oriented contrast segment moving rightward at velocity v activates a cell at position (x, y) at time t, then another cell near position (x + vΔt, y) is likely to become activated at time t + Δt (for small Δt). On average, it is possible to predict the position and velocity of a given visual feature a short time into the future, given past measurements of position and velocity. For example, measurements over an interval of time of the motion of an oriented edge segment can lead to a fairly reliable estimate of the segment's future position. Of course, such predictions often fail to hold, because of noise, changes in the object's motion, cell unreliability, etc. However, if construed as probabilities rather than rigid sequences, such predictions can be used for resolving motion ambiguities, as in the aperture problem. The self-organizing networks described here learn and exploit these inertial tendencies.
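This inertial premise can be illustrated with a toy predictor. The isotropic Gaussian likelihood below is a simplifying assumption (the model's actual density, sketched in Figure 10, is ridge-shaped); the function names and the spread parameter are hypothetical.

```python
import math

def predicted_position(x, y, vx, vy, dt):
    """Inertial prediction: a feature measured at (x, y) moving with
    velocity (vx, vy) is likeliest to reappear near (x + vx*dt, y + vy*dt)."""
    return (x + vx * dt, y + vy * dt)

def activation_likelihood(cell_pos, measured_pos, velocity, dt, spread=1.0):
    """Relative likelihood that the cell at cell_pos fires next, peaked at
    the predicted position. An isotropic Gaussian stands in here for the
    ridge-shaped density of Figure 10; 'spread' is an assumed parameter."""
    px, py = predicted_position(*measured_pos, *velocity, dt)
    d2 = (cell_pos[0] - px) ** 2 + (cell_pos[1] - py) ** 2
    return math.exp(-d2 / (2 * spread ** 2))

# Rightward motion at speed 2: the cell two positions to the right of the
# measurement is the likeliest to fire next; a cell behind it is far less
# likely, though its likelihood is not zero (noise, trajectory changes).
ahead = activation_likelihood((5, 5), (3, 5), (2, 0), dt=1)
behind = activation_likelihood((1, 5), (3, 5), (2, 0), dt=1)
print(ahead > behind)  # True
```

Construing the prediction as a graded likelihood rather than a rigid sequence is what lets occasional failures (noise, reversals) coexist with useful disambiguation.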

Figure 10 sketches a contour plot of a probability density function that estimates the a posteriori likelihood of cells at various positions becoming activated, given the activation of a cell sensitive to a particular local visual motion, Δt time units ago. The likeliest positions toward which a segment might move lie along a ridge whose location is consistent with the possible actual motions of the segment. Note that it is possible for the a posteriori activation to occur even in the opposite direction from where the given cell's activity nominally would indicate, due to factors such as change in motion or noise. Although it may be possible to accommodate some of these factors to an extent within this framework by allowing connections to spread across a "fuzzy" region (see Discussion), this paper shall be limited to a study of the sharply defined connections arising from the likeliest activation sequences.

The networks described here incorporate the probabilities of these sequences into their connection weights via an adaptive rule, producing a connection structure suitable for resolving motion ambiguity. The networks take advantage of the interactions between signal transmission timing, retinotopic distance, and velocity measurements to generate prediction signals that correspond to the probabilities that a measured visual motion will be followed by a particular future visual input. Motion of a visual feature is represented in terms of the extent to which it follows these predicted motion sequences. In other words, the network automatically forms a model of the spatiotemporal structure of its input patterns and represents incoming visual information in terms of its model. Such an approach is similar in these respects to some maximum a posteriori (MAP) or Bayesian methods (Anderson & Abrahams, 1987; Golden, 1988; Kersten, O'Toole, Sereno, Knill, & Anderson, 1987; Knill & Kersten, 1988), which have begun to be explored in the area of motion perception (Sereno, 1986, 1987). Such methods are useful because they elucidate how a perceiving organism's representational efficiency can be maximized and its coding redundancy minimized (Barlow, 1980, 1981; Bossomaier & Snyder, 1986; Daugman, 1985, 1988; Field, 1987; Field, Kersten, & Barlow, 1988; Kersten, 1987; Stork & Wilson, 1988; Watson, 1987). However, they may require an external "teacher" to supply a vector specifying the desired outcome of the learning. An analysis of self-organization provides even more power, by specifying not only how an organism can use environmental statistics to efficiently encode environmental events, but also how the same organism can accomplish the uptake of such statistics, without an external teacher.

FIGURE 10. Assuming that a cell is active, which cells will become activated a short time into the future? The figure sketches a schematic contour plot of the probability density function of the a posteriori likelihood of cells at various positions becoming activated, provided that a given cell is activated. Suppose a particular cell (ellipse), which responds maximally to a vertically oriented edge segment moving with a particular horizontal speed component (arrow), is active. Then a cell at the position marked by the x is the most likely cell to be activated next. Cells located along the contour ridge are also likely candidates for activation. Cells located elsewhere are much less likely to be activated next, although they might become activated by chance or by change in the segment's trajectory. The sketch is drawn based on the simplifying assumption that motion of visual objects is equally likely in all radial directions. In the case of linear motion without noise, the probability density along the ridge would be proportional to dθ/dy = 1/(1 + y²), where y represents the distance from the midpoint of the ridge and θ = arctan y represents the radial direction angle of the motion.

Self-Organization of the Network Structure

The networks described here incorporate the sequence probabilities of such motion stimuli into their connection weights via an adaptive rule, producing a connection structure suitable for resolving motion ambiguity. Because the lateral connections are time-delayed, the receptive field profile of each L2 cell develops to include information about the prior state of other cells in its own layer. Each L2 cell acquires strong lateral connections from other L2 cells whose activations are likely to precede its own, that is, from cells of the same direction and speed sensitivity prior on the same direction and speed trajectory. Likewise, each L2 cell is likely to develop strong lateral connections to other cells of the same sensitivity but subsequent on the same trajectory. Thus, the inertial tendencies of moving visual features lead to the formation of lateral chain structures in layer L2 of the network. The signal transmission latencies of connections along each chain match the rate at which bottom-up signals activate the cells along the chain. The chains allow the network to track moving visual features, as in Simulation I.

To illustrate, Figure 11a shows how each L2 cell initially receives an undifferentiated profusion of excitatory connections from nearby cells in L1 and L2. After a period of exposure to moving visual images, most of those connections weaken or disappear (Figure 11b). Each L2 cell then receives bottom-up input representing a single preferred orientation, length, and local direction component of motion. Each L2 cell also receives lateral connections consistent with its bottom-up input. That is, an L2 cell receives connections from other L2 cells of similar receptive field preferences. The only such lateral connections that remain, after sufficient exposure to moving visual input, are those for which the transmission latency equals the time a visual feature (moving at the originating cell's preferred speed) would take to travel from the originating cell's receptive field to the destination cell's receptive field (Figure 11c). Due to the dispersion property, neighboring L2 cells tend to acquire different lateral connections, even if they receive the same bottom-up connections (Figure 11d). Thus, even cells that are physically clustered together tend to spread their input topologies apart.

FIGURE 11. (a) Initially, each L2 cell receives excitatory connections from all its neighbors in L1 and L2. (b) Activity correlations (shading) strengthen some connections and weaken others. (c) When a cell receives both bottom-up excitation and time-delayed lateral excitation, it becomes more sensitive to cells with like receptive field properties. (d) Lateral inhibition gives rise to the dispersion property: even if two cells receive the same connections from L1, they acquire different lateral connections.

6. SELF-ORGANIZATION OF VELOCITY SENSITIVITY

This section shows how a neural network which implements the adaptive principles described above acquires sensitivity to motion at several velocities. In particular, the network's bottom-up and lateral excitatory connections develop, in response to moving visual input, to endow L2 cells with predictive velocity tracking capabilities. The lateral connections in L2 form chain structures, linking cells with similar bottom-up velocity sensitivities along successive spatial positions. The global pattern of excitatory connections to all the L2 cells becomes maximally selective and dispersed. The system climbs out of local minima in its connection landscape as it proceeds to a globally consistent L1 → L2 mapping.

Simulation II: (A) Development of Bottom-Up Connections

Simulation II shows how adaptation, combined with moving visual input, leads to the formation of lateral connection chains consistent with bottom-up connections in a 1-dimensional slice across the network. Implementation details are specified in the Appendix and by Marshall (1988b, 1989). At each spatial position in the simulation there are 6 L1 cells and 6 L2 cells. Each L1 cell is sensitive to visual input moving at a different speed across the visual field: fast, medium, or slow to the left, or fast, medium, or slow to the right (Figure 12a).

The network's input sequences are designed according to the simplest possible assumption: that each L1 cell is equally likely over time to be stimulated. The L1 cells are activated sequentially by simulated stimuli moving at velocities chosen randomly from the set {-3, -2, -1, 1, 2, 3}. This design is intended only as a rough approximation of the ecologically constrained behavior of an animal's visual inputs. A wider range of input velocities could be accommodated simply by adding properly tuned L1 cells. Intermediate velocities could be handled by broadening the cells' tuning curves. The generality of the networks is not impaired by the choice here to limit input patterns to a small discrete set; intermediate patterns would be approximated by similar neighbors (Grossberg, 1976a, 1976b; Kohonen, 1984) and processed appropriately.

FIGURE 12. (a) Input patterns for Simulation II. A visual feature sweeps either left or right across the 1-D simulated visual field, with speed 1, 2, or 3. (b) Initially, the 6 L1 cells connect equally well to the 6 L2 cells. Length of vertical bar in matrix indicates bottom-up connection strength from an L1 cell to an L2 cell. (c) After repeated exposure to visual input, most of the connections have weakened. (d) The surviving strong connections form a one-to-one mapping.

Figures 12b-c depict the development of the bottom-up connection structure from a group of L1 cells to a group of L2 cells. An L1 cell's preferred speed is indicated by a number: -3, -2, -1, +1, +2, or +3, which refers to the number of positions a visual feature moving at the cell's preferred speed would traverse in one unit of time. One matrix shows the initial strengths of the connections from the 6 L1 cells to the 6 L2 cells at a single position (Figure 12b). Another matrix shows the strengths of the same connections after a period of exposure to moving visual input (Figure 12c). The pattern of connections develops so that each L1 cell connects strongly to exactly one L2 cell. The one-to-one pattern of bottom-up connections is detailed in Figure 12d. Each L2 cell thus inherits the receptive field preferences of a single L1 cell. The L2 cell numbered 2, for example, has become sensitive to visual input moving at the rate of 1 position to the left per unit of time. The same kind of bottom-up self-organization occurs at every position in the network.

The selectivity and dispersion properties of the network's adaptive rules make the network useful as a general-purpose pattern classification scheme. The combination of selectivity and dispersion enables the network to climb out of local minima in its connection landscape, in a manner similar to the formation of globally consistent feature maps in Kohonen's (1982a, 1982b, 1984, 1987) self-organizing networks.

Occasionally, due to spurious correlations in the input patterns, improper bottom-up connections gain a small amount of strength. However, these connections tend to disappear rapidly. In general, the high correlation probabilities between the activity of an L1 cell and the L2 cell to which it connects keep the correct connections strong and incorrect connections weak.

In this manner, each L2 cell acquires a unique bottom-up receptive field profile. Each L2 cell within a position becomes preferentially sensitive to a particular local velocity. The bottom-up velocity preference of each L2 cell in the 6 simulated network positions is summarized in Figure 13.
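The bottom-up development of Simulation II can be caricatured in a few lines: instar learning applied at the winner of a within-cluster competition. The learning rate, epoch count, and the tiny identity-aligned bias (which stands in for the model's random symmetry breaking) are all illustrative assumptions, not the paper's parameters.

```python
# Toy sketch of Simulation II's bottom-up development: 6 L1 cells (one per
# velocity) feed 6 L2 cells. Instar learning plus winner-take-all
# competition yields a one-to-one velocity map. The bias, rate, and epoch
# count are assumptions, not the paper's parameters.
VELOCITIES = [-3, -2, -1, 1, 2, 3]
N = len(VELOCITIES)
RATE = 0.1
# Nearly uniform initial weights; a tiny bias breaks the symmetry that
# random noise breaks in the full model.
w = [[0.51 if i == j else 0.50 for j in range(N)] for i in range(N)]

for _ in range(100):                  # repeated exposure to moving input
    for j in range(N):                # each velocity equally likely over time
        x = [1.0 if k == j else 0.0 for k in range(N)]
        winner = max(range(N), key=lambda i: w[i][j])    # lateral inhibition
        w[winner] = [wk + RATE * (xk - wk)               # instar at winner only
                     for wk, xk in zip(w[winner], x)]

# Each L2 cell now responds strongly to exactly one L1 velocity.
preferred = [VELOCITIES[row.index(max(row))] for row in w]
print(preferred)  # [-3, -2, -1, 1, 2, 3]
```

Because only the winning cell learns, its weight toward its input climbs toward 1 while its other weights decay toward 0, and the losing cells stay available to claim the remaining velocities; the result is the one-to-one mapping of Figure 12d.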

Simulation II: (B) Development of Lateral Connections

In addition to the bottom-up connections, each L2 cell acquires a set of lateral excitatory connections. Both bottom-up and lateral excitatory connection strengths vary according to the same adaptive rule.

In Figure 13a, all the surviving strong connections with transmission latency 1 are shown. Note that all the +1 cells participate in a single chain of connections that jumps from each +1 cell to the +1 cell one position to its right (heavy arrows). Likewise, every cell connects laterally only to other cells of the same velocity sensitivity. The direction and number of positions that each link crosses corresponds to the bottom-up velocity preferences of the cells it connects.

In Figures 13b and 13c, the surviving strong connections with latencies 2 and 3, respectively, are shown. These also form chains, linking cells of like velocity preference; however, each link crosses 2 or 3 positions, respectively, for each unit of velocity to which its endpoint cells are sensitive. Each cell here acquires lateral connections from the L2 cells likeliest to have been active 2 or 3 time units prior to its own activation.

FIGURE 13. The bottom-up velocity preference of each L2 cell in the 6 simulated network positions is indicated by a number within the corresponding box. All surviving strong connections of time-delay 1 (a), time-delay 2 (b), and time-delay 3 (c) are displayed as arrows. The chain structure, connecting cells of like bottom-up velocity sensitivity, is evident. One chain is shown by heavy arrows.

Although the connections with latencies 1, 2, and 3 are displayed separately in Figures 13a-c, they all exist simultaneously in the single network of Simulation II. Based on the bottom-up velocity sensitivity of each cell, the surviving lateral connections, displayed in Figure 13, are exactly the correct ones: none are missing, and none are superfluous.
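The survival condition for lateral links can be stated compactly: a link survives exactly when the displacement it crosses equals the cells' shared preferred velocity times its transmission latency. A sketch (function name assumed):

```python
def link_survives(velocity, latency, displacement):
    """A lateral L2 -> L2 link between cells of the same preferred velocity
    survives self-organization only if its transmission latency matches the
    time a feature moving at that velocity needs to cross its displacement:
    displacement = velocity * latency."""
    return displacement == velocity * latency

# A +1 cell's latency-2 lateral output reaches the +1 cell two positions
# to its right, and nothing else:
assert link_survives(velocity=1, latency=2, displacement=2)
assert not link_survives(velocity=1, latency=2, displacement=1)

# Enumerating the surviving links for the six velocities of Simulation II
# reproduces the pattern of Figure 13: links of latency 1, 2, 3 cross
# 1x, 2x, 3x the preferred velocity in positions.
for v in (-3, -2, -1, 1, 2, 3):
    spans = [v * d for d in (1, 2, 3)]
    print(f"velocity {v:+d}: latency 1, 2, 3 links cross {spans} positions")
```

This is the sense in which the surviving connections are "exactly the correct ones": the condition admits one link per (cell, latency) pair and excludes everything else.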

Simulation II points out three main features of the network's self-organization in response to visual input: (a) the initial nonspecific bottom-up mapping becomes maximally selective and dispersed, in this case coding each velocity by a separate L2 cell; (b) the lateral excitatory connections that thrive respect the bottom-up sensitivity of the cells they link; and (c) unidirectional chain-like structures of lateral connections form along successive positions, linking cells that respond to visual features with similar motion characteristics.

7. SELF-ORGANIZATION OF DIRECTION SENSITIVITY

Simulation II illustrated how a network can acquire sensitivity to visual motion trajectories based on velocity. In order to show that self-organization can provide all the structural elements the network needs to perform its motion-disambiguation tasks, an additional adaptive capability must be demonstrated: the network's ability to acquire sensitivity to visual motion trajectories based on 2-D actual direction of motion. The adaptive rule used in Simulation II and a new inhibitory adaptive rule together allow layer L2 to become sensitive to actual direction of motion. Each L2 cell becomes a member of a lateral chain which connects cells of similar direction preference; in addition, the L2 cells in a local neighborhood all become members of different chains. The inhibitory learning rule enables the system to develop representations of both uncertainty and decision, to use its full representational capacity, and to dynamically maintain symmetry of inhibition.

Simulation III: (A) Self-Organization of Sensitivity to Actual Direction

Simulation III shows how layer L2 of the network can become sensitive to actual direction of motion. Implementation details are described in the Appendix and by Marshall (1988b, 1989). Figure 14 depicts the 3 × 5 grid of L2 cell clusters on which Simulation III takes place. At each of the 3 × 5 spatial positions in L1, there is a single cell, which is assumed to be sensitive to vertically-oriented edge segments moving in any of three directions with rightward normal component. Each L1 cell projects bottom-up connections to the three cells in its topographically corresponding L2 cluster. Lateral excitatory connections within L2 are initially symmetric: each L2 cell sends weak excitatory connections to all its neighbors, as shown in Figure 14. Strong reciprocal inhibitory connections between cells within each cluster are also present.

FIGURE 14. Time = 0. Schematic diagram of L2. Each cell in the 3 × 5 matrix of clusters is identified by number. Some cells are displayed more than once to facilitate display of "wraparound" connections. Initially, every L2 cell is connected weakly to all its L2 neighbors. The output connections from a single cell are displayed.
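The initial (time = 0) configuration just described can be sketched in code. The following is a minimal Python sketch under stated assumptions: the names, the weight values, and the reading of "all its L2 neighbors" as the eight surrounding grid positions (with wraparound) are illustrative choices, not taken from the paper's Appendix.

```python
# Hypothetical sketch of Simulation III's starting state: a 3 x 5 grid
# of positions, one L1 cell per position, and an L2 cluster of three
# direction cells above each position. Weight values are placeholders.
ROWS, COLS, N_DIRECTIONS = 3, 5, 3

def make_network(weak=0.05, strong=1.0):
    cells = [(r, c, d) for r in range(ROWS)
                       for c in range(COLS)
                       for d in range(N_DIRECTIONS)]
    bottom_up, lateral, inhibition = {}, {}, {}
    for (r, c, d) in cells:
        # Each L1 cell projects to all three cells of its L2 cluster.
        bottom_up[(r, c, d)] = weak
        # Weak excitatory connections to every cell in the eight
        # neighboring positions, with torus ("wraparound") topology.
        lateral[(r, c, d)] = {}
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if dr == 0 and dc == 0:
                    continue
                nr, nc = (r + dr) % ROWS, (c + dc) % COLS
                for d2 in range(N_DIRECTIONS):
                    lateral[(r, c, d)][(nr, nc, d2)] = weak
        # Strong reciprocal inhibition within the cluster.
        inhibition[(r, c, d)] = {(r, c, d2): strong
                                 for d2 in range(N_DIRECTIONS) if d2 != d}
    return cells, bottom_up, lateral, inhibition
```

Under these assumptions each cell starts with 24 weak lateral connections (8 neighboring positions × 3 cells per cluster) and 2 strong inhibitory connections within its own cluster.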

The network is exposed to an input sequence representing oriented edge segments moving across a region of the visual field. At random intervals a vertical edge segment appears in the visual field at one of the 3 × 5 L1 cell positions. It then sweeps in one of three rightward directions for several time-steps: horizontally, diagonally upward, or diagonally downward; and finally, it disappears. Such simulated visual motion stimuli are presented repeatedly. On each presentation, the weights of both the bottom-up and lateral connections change only slightly, so that the learning does not reflect the effects of any single input presentation, but rather the accumulated effects of statistical trends in the input over long periods of time.
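The training regimen just described (random onset position, a sweep in one of three rightward directions, wraparound at the grid edges) can be sketched as a stimulus generator. The names, the number of sweep steps, and the seed are assumptions for illustration:

```python
import random

# Illustrative generator for the kind of training sequence described
# above: a vertical edge appears at a random 3 x 5 grid position and
# sweeps for several time-steps in one of three rightward directions
# (up-right, right, down-right), wrapping around at the grid edges.
ROWS, COLS = 3, 5
DIRECTIONS = [(-1, 1), (0, 1), (1, 1)]   # (d_row, d_col) per time-step

def make_sweep(rng, n_steps=4):
    r, c = rng.randrange(ROWS), rng.randrange(COLS)
    dr, dc = rng.choice(DIRECTIONS)
    path = []
    for _ in range(n_steps):
        path.append((r, c))                      # active L1 position
        r, c = (r + dr) % ROWS, (c + dc) % COLS  # wraparound sweep
    return path

rng = random.Random(42)
sweep = make_sweep(rng)
```

Each presentation would then drive a small weight update, so that only the statistics of many such sweeps, not any single one, shape the connections.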

Figures 15 and 16 show the resultant lateral excitatory connection patterns at progressively longer periods of simulated visual exposure. Already after 2000 units of time (Figure 15), a universal rightward trend in the direction of lateral excitatory connections is discernible. Although much of the desired chain structure is still missing at this stage, the lack of leftward input motion is reflected in the absence of leftward connections.

FIGURE 15. Time = 2000. Already, the connections show a distinct left-to-right trend.

By time 8000 (Figure 16), the network has self-organized its lateral chain structure completely: every L2 cell is a link in one lateral chain, and furthermore, within each cluster, each of the cells is part of a different lateral chain. Thus, Simulation III illustrates how each cluster of neighboring cells in L2 independently develops a separate means of representing each of the possible actual directions in which an L1 activation can move.

After time 8000, the connection strengths continue to vary--the learning rules are not shut off. However, the overall pattern of lateral chains established at this point (Figure 16) does not change throughout the remainder of the simulation, which runs to time 15,000. Thus, the overall pattern of lateral excitatory connections is stable, as long as the system's input sequences continue to follow similar statistical distributions. See the section below on Stability and Plasticity of the Network for a discussion of the tradeoffs between the network's learning rate and its sensitivity to temporary fluctuations in the statistical behavior of its input sequences. After a period of self-organization has occurred and the lateral chains have become established, as in Figure 16, the network processes visual input in its normal fashion, as described above (Figures 6 and 9).

Simulation III: (B) Tuning of Inhibition Strength

Simulation III combines both excitatory and inhibitory learning (Easton & Gordon, 1984) in a new way. Both types of learning contribute to the network's self-organization. The excitatory learning allows the L2 cells to form categories for the input, based on spatiotemporal correlations. The inhibitory learning (Amari & Takeuchi, 1978; Easton & Gordon, 1984; Nagano & Kurata, 1981; Wilson, 1988) governs the amount of coactivation, or permissible overlap, between categories.

FIGURE 16. Time = 8000. The desired chain lattice has formed perfectly. All strong excitatory connections are part of a lateral chain. Also, each cell in every cluster is part of a different chain. The same overall chain structure remains permanently through time 15,000, even though learning is allowed to continue.

Figure 17 illustrates the desired inhibition properties. If too little inhibition were present within each cluster, then all three cells in a cluster could become active simultaneously (Figure 17a). Since connection strengths are initially isotropic, all the cells in a cluster would become sensitive to the same input pattern. Because each cell ought to acquire a different sensitivity, the inhibition strengths need to be strong enough (at least initially) so that the slight differences in cell inputs would result in great differences in cell activations (Figure 17b). Over long periods this would ensure (by dispersion) that no two cells would tend to acquire the same input sensitivity. However, once the cells' input pattern sensitivities are established through strong inhibition, they become incorporated into the excitatory connection weights. Thereafter, less inhibition is needed because cells tend not to become coactivated anyway. Moreover, too much inhibition would then preclude the transient activation of multiple cells when a new stimulus first appears; a single cell (probably the wrong one within the L2 cluster) would be chosen to actively represent the input. Since the wrong cell might become the only active cell, the concomitant incorrect learning would be likely to distort the network's connection patterns, in general preventing a stable configuration from becoming established. The inhibitory learning rule in Simulation III prevents such pathologies from arising by permitting the network to choose the appropriate intermediate levels of inhibition.
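The qualitative effect of inhibition strength on a cluster can be illustrated with a deliberately simplified stand-in for the competition. This is not the paper's network equations; it is a bare normalization in which a gain parameter k plays the role of inhibition strength:

```python
# Simplified abstraction of within-cluster competition: small k (weak
# inhibition) leaves all three cells coactive, preserving uncertainty;
# large k (strong inhibition) lets one cell suppress the others,
# forcing a decision.
def cluster_response(inputs, k):
    powered = [x ** k for x in inputs]
    total = sum(powered)
    return [p / total for p in powered]

inputs = [1.00, 0.95, 0.90]              # nearly ambiguous evidence

weak = cluster_response(inputs, k=1)     # coactivation: all cells active
strong = cluster_response(inputs, k=50)  # winner-take-all: one cell dominates
```

With k = 1 the three activities remain nearly equal; with k = 50 the cell with the largest input takes well over 90% of the total activity, mirroring Figures 17a and 17b.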

The desired inhibition properties dovetail nicely with the network's representation of uncertainty via multicell activation. Most of the time, as a stimulus sweeps across the visual field, the system is able to make a definite choice about its actual direction of motion. The kind of visual uncertainty examined in this paper generally occurs only during the moments of onset or direction-change of a moving visual stim- ulus. Thus, coactivation of multiple cells within a cluster occurs relatively rarely in this case. When it does occur, it is characterized by the absence of lat- eral excitation.

FIGURE 17. (a) When the inhibition within an L2 cluster is weak, then all the cells in the cluster can become active simultaneously. (b) When the inhibition is strong, one cell can completely suppress the others; thus, uncertainty cannot be represented via simultaneous activation of multiple cells. An intermediate level is desirable, so that the network can (c) allow a cell that receives both bottom-up plus lateral input to suppress the other cells' activity, but also (d) allow simultaneous activation of cells when only bottom-up input is present.

Because Simulation II was not designed to demonstrate how uncertainty can be represented by multiple-cell activity, it did not employ such tuning of lateral inhibition strength. Rather, a high but constant amount of inhibition in Simulation II ensured that the L2 cells would simply develop highly selective receptive field profiles. The high inhibition levels in Simulation II prevented simultaneous activation of multiple L2 cells within each spatial position from occurring.

The rule governing inhibitory learning in Simulation III was similar to the excitatory learning rule. Whenever a cell is active, its output inhibitory connections to other active cells become stronger; its output inhibitory connections to other inactive cells become weaker. Thus, if two cells tend to be coactivated, the amount of inhibition between them tends to increase (Easton & Gordon, 1984), thereby making them less likely to be coactivated. If two cells tend not to be coactivated, the amount of inhibition between them tends to decrease, so that they can become coactivated on relatively rare occasions. This rule is a reverse of inhibitory learning rules proposed previously (Amari & Takeuchi, 1978; Nagano & Kurata, 1981), in which coactivation results in a decrease of inhibition strength.
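The verbal rule above can be sketched as a weight update. The multiplicative decay form and the learning rate are assumptions chosen for illustration; the paper's exact equations are given in its Appendix:

```python
# Sketch of the stated inhibitory learning rule: an active cell
# strengthens its output inhibition onto other active cells and
# weakens its output inhibition onto inactive cells.
def update_inhibition(W, act, eta=0.1):
    """W[j][i] is the strength of the inhibitory connection j -> i."""
    n = len(act)
    for j in range(n):
        if act[j] <= 0:
            continue                      # only active cells learn
        for i in range(n):
            if i == j:
                continue
            if act[i] > 0:
                W[j][i] += eta            # coactive pair: inhibition grows
            else:
                W[j][i] -= eta * W[j][i]  # inactive target: inhibition decays

W = [[0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5],
     [0.5, 0.5, 0.0]]
update_inhibition(W, act=[1, 1, 0])
```

After one update with cells 0 and 1 coactive, the mutual inhibition 0↔1 has grown while both cells' inhibition onto the silent cell 2 has decayed; cell 2, being inactive, changes none of its own outputs.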

The inhibitory learning rule helps ensure that the network's representational capacity is fully used. If, for instance, a particular cell is never activated, then it will be unable to learn to code any input pattern, and its network role will be wasted. But the inhibitory learning rule would cause the cell to receive less and less inhibition, until it begins to respond to some input pattern. The cell would continue to receive reduced levels of inhibition generally until it is active as often as other cells.

Symmetric Inhibition from an Asymmetric Learning Rule

It is usually desirable to keep the amount of mutual inhibition between cells symmetric; that is, the strength of the inhibitory connection j → i should be the same as the strength of the inhibitory connection i → j. Otherwise, one cell could become strong enough to inhibit all the other cells. Thus whenever j → i changes, so must i → j. In addition, theorems proving stability of learning in neural networks require the assumption of symmetric inhibition (Cohen & Grossberg, 1983; Grossberg, 1982b).

How can symmetry be preserved when both j → i and i → j are allowed to vary independently? Ordinarily one would think either that the connections j → i and i → j must communicate with each other (via some privileged nonlocal means) or that the inhibitory learning rule must treat j and i symmetrically and interchangeably. Both of these options for maintaining symmetry are unattractive: in the first case, the physical requirement that all neural interactions be locally mediated (Grossberg, 1984; Stent, 1973) rules out such weight-communication or weight-transport schemes; in the second case, explicitly symmetric learning rules are unlikely to possess a physiological interpretation (Grossberg, 1984; Stent, 1973). The problem of maintaining connection symmetry is thus a difficult one.

The inhibitory learning rule proposed in this paper successfully maintains connection symmetry using a local, asymmetric learning rule, thereby avoiding the difficulties outlined above. The following example sketches how the learning rule keeps the connection strengths in balance. Suppose j → i is stronger than i → j. Then cell j is more likely to become activated than i because j can suppress i's activity. However, when j does become activated, the rule causes j → i to weaken, heading toward restored symmetry. The following statistic indicates the rule's effectiveness: within every reciprocal pair of inhibitory connections in Simulation III, the strength of the weaker connection was at least 83% of that of the stronger connection, at time t = 15,000. Thus the simple learning rule described above nicely solves the problem of keeping inhibition symmetric--with an asymmetric learning rule.
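The restoring tendency sketched in this paragraph can be checked with a toy two-cell simulation. Everything in it (the net-input form, the activation threshold, the rates, the initial weights) is an assumption chosen to expose the mechanism, not the paper's model:

```python
import random

# Toy demonstration of symmetry restoration by a local, asymmetric
# rule. Two cells i and j inhibit each other; w["ji"] is j -> i.
# When both are active, both inhibitory weights strengthen; when only
# one cell wins, the winner's OUTGOING inhibition weakens. A strong
# j -> i weight makes i lose more often, so j -> i is decremented more
# often, and the imbalance shrinks.
def run(trials=5000, eps=0.005, seed=1):
    rng = random.Random(seed)
    w = {"ji": 0.8, "ij": 0.4}            # start badly out of balance
    for _ in range(trials):
        xi, xj = rng.random(), rng.random()
        net_i = xi - w["ji"] * xj         # i is suppressed via j -> i
        net_j = xj - w["ij"] * xi
        if net_i > 0 and net_j > 0:       # coactive: both strengthen
            w["ji"] += eps
            w["ij"] += eps
        elif net_j > 0:                   # only j active: j -> i weakens
            w["ji"] = max(0.0, w["ji"] - eps)
        elif net_i > 0:                   # only i active: i -> j weakens
            w["ij"] = max(0.0, w["ij"] - eps)
    return w

w = run()
```

Under these assumptions the two weights drift toward a common intermediate value: the coactivation events move both weights equally, while the winner-take-all events preferentially decrement whichever weight is currently stronger.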

Simulation III shows (a) that development of sensitivity to velocity and to direction can both be governed by the same excitatory learning rule; (b) that even if all the L2 cells in a cluster receive the same bottom-up input, they can all acquire different lateral input connections; (c) that both uncertainty and decision can be represented, without disrupting the network's development; and (d) that excitatory and inhibitory learning have different roles and can be combined to produce sophisticated and useful forms of adaptation.

8. DISCUSSION: ISSUES IN REPRESENTATION AND SELF-ORGANIZATION

Representation in Visual Processing Networks

Visual systems construct sharp, vivid percepts from diffuse, uncertain, noisy measurements. In the domain of motion perception, for example, a moving visual stimulus appears smeared when it is viewed for a brief exposure (30 ms), yet perfectly sharp when viewed for a longer exposure (100 ms) (Burr & Ross, 1986; Van Essen & Anderson, 1987). Our ability to counteract smear at longer exposures suggests that our visual systems combine and sharpen motion information from multiple locations along a spatiotemporal trajectory that matches the motion of the stimulus (Barlow, 1979, 1981; Burr & Ross, 1986). What neural mechanisms perform such spatiotemporal integration? The type of long-range lateral connections proposed in this paper allows motion information computed at one location to propagate to successive locations in the correct direction and velocity. The lateral motion signals can then influence the outcome of inhibitory sharpening of bottom-up visual data. Thus, such lateral tracking mechanisms can implement the kind of integration-along-trajectory found in human vision.

The "shifter circuits" suggested by Anderson and Van Essen (1987) and by Van Essen and Anderson (1987) may permit a visual system to compensate for the blurring effects of eye movements. However, their shifter circuits are controlled by a "black box" that takes its input from a global retinal motion signal. Consequently, their mechanism handles only the kinds of motion that arise from eye motion. It does not compensate for blur due to motion of visual features independent of eye motion.

In contrast, the tracking mechanisms proposed in this paper do not require a global motion signal to regulate image shifts. Instead, the amount of shift is controlled separately for each visual feature, independent of other visual features, and independent of eye motion. The amount of L2 shift for each visual feature is governed by the feature's locally measured L1 velocity. Furthermore, it is easy to add mechanisms to compensate for global eye-motion to this kind of system, simply by adding a global eye-motion input to each L2 cell. The system can then develop a chain structure, again connecting cells with similar receptive-field characteristics--where the notion of similarity is expanded to include eye-motion as well as orientation, local velocity, and length. Several subclasses of cells that represent different eye motions would result, and the representation of each object's motion could then be allocated appropriately between retinal and object components. Thus, the kind of self-organized tracking mechanisms proposed in this paper can be expanded to handle both eye motion and visual object motion.

The network's tracking mechanisms can be implemented either by lateral connections that traverse a single distance with a variety of transmission latencies, or by lateral connections across a variety of distances but with a single, fixed transmission latency. Functionally, these two alternatives are equivalent. However, biological evidence tends to favor either the latter alternative (Amthor & Grzywacz, 1988; Movshon, Newsome, Gizzi, & Levitt, 1988) or a combination of both alternatives (Baker, 1988). The adaptive mechanisms of Simulation II can produce connection chains consistent with either of the alternatives, or both alternatives together, as illustrated in Figure 13. Baker (1988) provides a discussion of how small variations in the spatial and temporal response properties of cells in striate cortex combine to produce large variations in the range of the cells' velocity sensitivities.

Simultaneous Representation of Multiple Moving Objects

Unlike schemes that represent visual motion via optical flow or velocity fields (Gibson, 1950; Gibson, Olum, & Rosenblatt, 1955; Horn & Schunck, 1981; Koenderink, 1986; Marr & Ullman, 1981; Nakayama & Loomis, 1974; Regan, 1986; Waxman & Duncan, 1986), the networks described here do not maintain a field of vectors to indicate local velocity at every visual position. Rather, these networks represent motion via a localized cell activation for each visual feature, such as an edge segment. For example, an entire edge segment is represented by activation of a cell whose receptive field is centered retinotopically at the centroid of the segment. Because the network's representation of a visual feature is local and moves with the feature itself, more than one feature can be represented simultaneously. In response to multiple moving features in the visual field, the activations representing each feature propagate independently. Some computational schemes for representing visual motion require that each visual object be explicitly identified, segregated, and labeled before its motion can be determined--and then further require explicit mechanisms to map an abstract representation of motion back to a retinotopic representation of the object (e.g., Feldman, 1988). In the networks presented here, the segregation (Marr, 1982; Orban & Gulyás, 1988), identification, and motion mapping of visual features is accomplished implicitly by simply preserving retinotopy throughout all network layers and allowing the internal representations to track the external stimuli.

How Many Cell Types Are Needed?

Coding schemes like the present one, in which the activity of each cell represents a different situation (Walters, 1987), always exhibit a tradeoff between sampling resolution and coding economy (Duda & Hart, 1972). One feature of how this kind of network represents information is that many categories of cell can develop. For instance, in Simulation III, each position contains one type of cell for every actual direction of motion that can be represented. If the network can represent 3 directions, 4 lengths, and 5 velocities, then one might argue that 3 × 4 × 5 = 60 cell types are needed at each position in the network. Even more might be needed if more input dimensions are represented. This is a serious concern, but it can be addressed in at least four ways.

First, for computational simplicity, every simulated cell was tuned very sharply to a specific input feature. However, the tuning curves of real cells are typically much broader than those depicted in the simulations. If more realistic input conditions were applied to the network (e.g., more noise, nonlinear trajectories), then the cells would naturally develop broader tuning curves. Fewer broadly tuned cells than sharply tuned cells would then be required to represent the input--although fine degrees of resolution might be unavailable at higher network levels. The broader tuning would help establish connections between pairs of cells preferring somewhat more disparate stimulus attributes. These connections would in turn allow the network to represent with greater certainty some changes of direction and velocity.

Second, cells at a given network level could have larger receptive fields than cells at prior levels. This amounts to broader tuning in the positional domain and would allow cell positions at higher levels to be spaced farther apart than cells at lower levels. The notions of broader tuning and increased receptive field size at higher levels have a great deal of support in the physiological literature (Van Essen & Maunsell, 1983). In trading sharpness for cell types, we actually lose nothing because sharp information is still available at lower levels. Alternatively, certain neural network architectures (e.g., Grossberg & Marshall, 1989) can recover sharpness after a stage of diffusion (Grossberg & Mingolla, 1985a, 1985b).

Third, it might be further argued that the apparent massive redundancy of cells in physiological measurements of visual cortex could be illusory. Electrophysiological and cytohistochemical studies have concluded that cells often are organized into cortical columns, in which cells have similar receptive field properties (Hubel & Wiesel, 1977). However, such studies, by their very nature, may use insufficiently detailed visual probes to reveal fine receptive field differences. For instance, studies of cell response in various cortical areas to actual (versus local) direction of motion could be informative and regarded as a test of the biological application of some of the hypotheses in this paper.

Fourth, it is possible that the visual systems of animals use axo-axonal connections, in addition to the axo-somatic connections modeled here. Analogous rules can be derived to describe the behavior of such neural architectures. An axo-axonal system would allow each cell's activity to represent many possible points along a feature dimension, instead of just one point. Thus, the number of cells required could be cut by orders of magnitude.

Dependence of Self-Organized Structures on Input Sequence

Ideally, Simulations II and III could be merged into a single simulation showing how self-organization based on both velocity and direction could occur simultaneously. Because of the high costs of simulating the parallel self-organization processes on a serial computer, it has been necessary to split the scope of the ideal simulation into two separately manageable tasks. (The numerical integration of Simulation III alone required approximately six weeks to run on a 0.3 megaflop computer, for example.) However, in principle, a network can structure itself in the desired manner when presented with input sequences that combine a variety of velocities and directions of motion because both Simulations II and III operate according to the same kind of network rules. No cell "knows" whether its input represents a given velocity or direction; its input is simply a pattern of activity distributed spatially across other cells and correlated in time. Thus the same learning algorithm would work whether the network's input patterns represent visual velocity, direction, or velocity and direction combined.

Simulations II and III together illustrate how lateral chains can self-organize in a network that operates according to quite simple rules. The simulations show how the structures that develop can reflect the velocities and directions represented in the network's visual input history. In general, the adaptive rules that the network obeys can enable it to learn and encode any statistical input distributions, not just of velocity and direction, but also of attributes such as orientation (Bienenstock et al., 1982; Linsker, 1986b), contrast (Linsker, 1986a), and length. For example, the premise that the detection of a short bar is more likely to be followed by other detections of a short bar than by detections of a long bar could be captured by this type of network and encoded in cell connections. The explicit assumption of this premise formed part of the basis of the hardwiring of connections in Simulation I.

What is truly striking about this type of network is not that it just encodes sequence probabilities in connection weights, but rather that it also subsequently uses those weights to represent and disambiguate actual incoming visual data. There is an elegant isomorphism between the coded structure of the world (in the connection chains) and the propagated processing of visual input (which becomes represented in terms of the chains).

The generality of these adaptive properties opens the possibility that a lengthy hierarchy of visual processing layers can form through self-organization. It has been shown that contrast-detection layers and orientation-detection layers can form through self-organization (Amari, 1977; Bienenstock et al., 1982; Fukushima & Miyake, 1982; Grossberg, 1976b; Linsker, 1986a, 1986b, 1986c; Pearson et al., 1987; Singer, 1983, 1985a, 1985b; Takeuchi & Amari, 1979; von der Malsburg, 1973; von der Malsburg & Cowan, 1982). This paper adds direction and velocity sensitivity to the list of known visual processing functions that can self-organize. Eventually, it may be possible, by using the output of each layer as input to subsequent layers, to show that even higher-order capabilities, such as depth perception and object recognition, can self-organize.

Stability and Plasticity of the Network

The network's connection weights constitute a coding of the structure of the external world. The formation and persistence of this code depends heavily on the statistics of the network's visual input history. That is, if the probabilities of events are altered over a long enough period of time, then the connection patterns will change. For example, in Simulation II, if the positional displacements of the input sequences were changed from {±1, ±2, ±3} to {±2, ±3, ±4}, then the network would eventually lose its strong lateral connections between cells ±1 position apart and gain strong connections between cells ±4 positions apart. This plasticity is an advantage under many circumstances; for example, it would compensate for the systematic distortions produced by growth of the eyeball as a newborn animal ages.

However, plasticity could also be a liability in other cases. For instance, if the alteration of input statistics is only temporary or spurious, then the changes it induces might erode desired connectivity patterns. The networks in this paper control the degree of plasticity and stability of connection strengths by ensuring that a single input presentation can change connection strengths only a tiny amount. Only the cumulative and systematic effects of many input presentations can significantly recode the network's connectivity. If the rate of such weight change is made small enough, then one can be reasonably sure that the resultant connection patterns stably code the statistics of the input history rather than the adventitious correlations in a small number of input presentations. However, if the rate of weight-change is made too small, then a very large number of input presentations would be needed to produce the desired adaptation effects. A more sophisticated approach to the tradeoffs between stability and plasticity is explored by Carpenter and Grossberg (1987a, 1987b) and Grossberg (1980, 1982a). For the purposes of this paper, the simple expedient of fixing the rate of connection weight-change is sufficient, though.
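The tradeoff can be made concrete with a one-line leaky update, w += rate * (x - w), which nudges a weight toward each input sample. The names and numbers below are illustrative assumptions, not the paper's parameters:

```python
# Stability vs. plasticity under a fixed learning rate: a slow rate
# rides out a brief spurious burst of contradictory inputs, while a
# fast rate lets the same burst erase the learned statistic.
def expose(w, rate, samples):
    for x in samples:
        w += rate * (x - w)   # nudge weight toward each input sample
    return w

history = [1.0] * 2000        # long-term statistic: feature present
burst = [0.0] * 50            # brief spurious run of contrary inputs

slow = expose(expose(0.0, 0.002, history), 0.002, burst)
fast = expose(expose(0.0, 0.2, history), 0.2, burst)
# slow remains close to 1.0; fast collapses toward 0.0
```

The slow weight needs many presentations to converge (the cost the text notes), but a 50-sample spurious burst barely dents it; the fast weight converges almost immediately and is destroyed just as quickly.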

The approach taken in Simulations II and III was to choose a rate of weight-change small enough that even several spuriously correlated input presentations would not change the connection topology--but no smaller. The result was that the connection pattern was quite stable once it settled into its final form--and that the network would reach its final structure after a reasonable amount of visual exposure. Simulation II required approximately 30 presentations per cell before it reached its final overall structure (1080 time-steps ÷ 36 cells in L2). Simulation III required approximately 270 presentations per cell before it reached its final overall structure (12,000 time-steps ÷ 45 cells in L2).

Physiological and Psychophysical Correlates

A number of lines of evidence support the notion that long-range lateral connections exist in visual cortex and are used for motion processing. Long-range, direction-specific spatiotemporal facilitatory interactions have been found in area MT of macaque monkeys (Mikami et al., 1986a, 1986b). Long-range axons and long-range excitatory interactions between cells of similar orientation preferences have been found in area 17 of the visual cortex of cats, tree shrews, and macaque monkeys (Blasdel, Lund, & Fitzpatrick, 1985; Gabbott, Martin, & Whitteridge, 1987; Luhmann, Martínez-Millán, & Singer, 1986; Lund, 1987; Michalski, Gerstein, Czarkowska, & Tarnecki, 1983; Mitchison & Crick, 1982; Nelson & Frost, 1985; Rockland & Lund, 1982, 1983; Rockland, Lund, & Humphrey, 1982; Ts'o, Gilbert, & Wiesel, 1986). Gabbott et al. (1987) suggest that such lateral interactions may be used in motion perception: "This feedforward excitation by a neuron into whose receptive field an object has just entered could act as a facilitatory device to 'prime' (by increasing their response sensitivity) other neurons into whose receptive fields the object might eventually travel. These operations could also provide some predictive estimate of the future position of the object" (pp. 378-379). Such "priming" is exactly the function of the lateral connections in Simulations I-III.

Luhmann et al. (1986) further note that the development of such lateral connections in cat area 17 is dependent on visual experience: "There is an inborn pattern of discrete horizontal connections in striate cortex which is shaped by visual experience and requires contour vision for its maintenance" (p. 443). "The development of the horizontal connections in striate cortex occurs in at least two phases, an early phase during which connections are formed in excess and a later phase during which connections are again eliminated" (p. 447). The self-organization processes presented in this paper offer a ready explanation of the mechanism and function of such developmental processes. Luhmann et al. (1986) found that such horizontal connections spanned distances of up to 10.5 mm across area 17, that is, very far in retinotopic coordinates. The present paper does not make an explicit identification of particular levels in the network with area 17. Rather, the levels L1 and L2 are predicted to correspond to certain higher cortical processing areas, such as MT (Albright, 1984; Allman et al., 1985; Maunsell & Van Essen, 1983; Mikami et al., 1986a, 1986b; Rodman & Albright, 1987) and STS (Saito et al., 1986). Since the long-range lateral connections are found already in area 17, similar connections are likely to be found within higher levels as well. The long-range horizontal connections found in cat area 17 serve to illustrate (a) that the visual systems of animals do make use of long-range interactions, and (b) that the kind of lateral connections proposed in this paper does have a degree of support in the physiological literature.

Lateral connections can contribute to explanations of certain psychophysical data, as well as physiological data. Phenomena such as visual entrainment or inertia suggest neural mechanisms by which motion computations at one spatial position influence motion computations at other spatial positions. For example, the multistable displays of Anstis and Ramachandran (1987) and Ramachandran and Anstis (1983) show that the directions of motion computed over one pair of image frames in a sequence cause the same directions to be favored in successive pairs of frames, at successive positions along the same motion trajectories. Such inertial phenomena can be readily analyzed in light of the lateral connection chains proposed in this paper: the chains cause the directions of motion computed at one location to propagate to successive locations along the corresponding motion trajectories, with the appropriate timing. The laterally propagated direction signals cause the representation of motion in the inertial directions to be favored.
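The tie-breaking role of such lateral priming can be sketched in a few lines of code. This is an illustrative toy, not the network equations of Simulations I-III; the two direction channels, the lateral weight, and the input values are invented for the example.

```python
# Illustrative sketch of lateral priming along a motion trajectory.
# A cell's activity at one spatial position primes (adds excitation to)
# the direction-tuned cells at the next position; an ambiguous bottom-up
# measurement there is then resolved in the previously computed direction.

def propagate(prior_activity, lateral_weight, bottom_up):
    """One time step: bottom-up evidence plus time-delayed lateral priming."""
    primed = [lateral_weight * a for a in prior_activity]
    return [b + p for b, p in zip(bottom_up, primed)]

# Two direction channels ("left", "right") at the next spatial position.
# The bottom-up input is ambiguous: equal evidence for both directions.
ambiguous = [0.5, 0.5]

# Motion computed at the preceding position favored "right".
prior = [0.0, 1.0]

resolved = propagate(prior, lateral_weight=0.4, bottom_up=ambiguous)
winner = "right" if resolved[1] > resolved[0] else "left"
print(winner)  # lateral priming breaks the tie in favor of "right"
```

An ambiguous (aperture-limited) measurement is thereby resolved in favor of the direction computed upstream a moment earlier, which is the inertial bias described above.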

Williams and Phillips (1987) report certain cooperative phenomena in the perception of motion direction; their results are consistent with the cooperative (i.e., excitatory) properties of the lateral connections proposed in this paper. They presented observers with a moving random dot field in which the individual dots moved either in a random or globally coherent direction. By temporally varying the proportions of randomness and coherence, they showed that the percept of coherent motion tended to persist hysteretically. That is, an observer began to detect coherent motion at a certain point as the proportion of coherence was increased; subsequently, coherence persisted even when the proportion was decreased below that point. Lateral excitatory connections could contribute to the hysteresis by propagating the computed motion directions at one moment to successive moments. However, since lateral connections propagate spatially as well as temporally, they may be predicted to cause coherence to appear to spread spatially in the direction of motion, or at least to be influenced by spatial factors. Williams and Phillips (1987) did not control for such spatial factors; that might be done by varying the spatial displacement of each dot over successive frames or by varying the rate of change of the proportion of coherence. If their results are expanded to indicate whether spatial, as well as temporal, factors influence perceived motion, then the persistence of perceived direction may be found to be attributable to propagation of computed direction at one moment to successive moments via lateral excitatory connections.

The psychophysical phenomena described above can be used to analyze one further fundamental question regarding the structure of motion-processing networks. Why not just use time-delayed bottom-up connections, instead of bottom-up and lateral connections, to accomplish motion tracking? The answer is that vision requires feedback: if all connections were bottom-up, then the network's motion computations would be performed afresh at each spatial position, without the feedback necessary to produce the phenomena of visual inertia (Anstis & Ramachandran, 1987; Ramachandran & Anstis, 1983) and motion hysteresis (Williams & Phillips, 1987). Lateral connections allow the results of motion computations at each spatial location to influence the outcome of subsequent motion computations at other locations, thus permitting inertia and hysteresis.
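The claim that recurrent excitation yields hysteresis, whereas a purely feedforward pass does not, can be illustrated with a minimal toy unit. The dynamics, threshold, and coherence sweep below are assumptions chosen for the illustration, not parameters from the Williams and Phillips (1987) experiment or from the present model.

```python
# Illustrative sketch: a coherence-detecting unit whose input is the
# bottom-up coherence level plus recurrent excitatory feedback from its
# own previous output. Sweeping coherence up and then down shows that
# the unit turns on at a higher coherence level than the one at which
# it turns off, i.e., hysteresis.

def settle(coherence, prev_output, feedback=0.5, threshold=0.65):
    """Unit fires when bottom-up drive plus feedback exceeds threshold."""
    drive = coherence + feedback * prev_output
    return 1.0 if drive > threshold else 0.0

out = 0.0
onsets, offsets = [], []
for i in range(11):                 # coherence rising 0.0 -> 1.0
    c = i / 10
    new = settle(c, out)
    if new > out:
        onsets.append(c)            # coherence at which detection begins
    out = new
for i in range(10, -1, -1):         # coherence falling 1.0 -> 0.0
    c = i / 10
    new = settle(c, out)
    if new < out:
        offsets.append(c)           # coherence at which detection ceases
    out = new

print(onsets[0], offsets[0])        # detection turns on well above the level at which it turns off
```

With the feedback weight set to zero, the on and off thresholds coincide (up to the sweep's step size), so the hysteresis loop disappears; the loop is a signature of the recurrent term.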

CONCLUSIONS

One can imagine many reasons for a visual system to be adaptive. Adaptive mechanisms could allow the system to compensate for distortions, for example, due to growth of the eyeball. They could maintain the proper behavior of the network by compensating for changes or deterioration in the behavior of individual cells. They could reduce the informational burden on genetic coding by allowing the details of neural interconnection structure to be specified by the network's input correlation history.

The exposition here highlights some of the adaptive issues a visual system must confront and some possible solutions to those issues. The strength of this approach is its reliance on simple, general-purpose adaptive processes in showing how a rudimentary network can acquire an ability to represent and disambiguate visual input.

Simulations I-III sketch separate but related fragments of the puzzle of visual processing. The behaviors illustrated in the three simulations must be joined into a single model--and then combined with many other features--in order to constitute a full theory of vision. With sufficient computational resources, this can be done. So, the true value of this path of research lies not in its simulated construction of specific adaptive networks, but in its broader principles: use of input correlation history to guide the formation of network structure; use of strictly local adaptive rules to govern the self-organization of global processing mechanisms; use of lateral time-delays to bring events into temporal register; competition between cells and between connections to foster selectivity and dispersion; inhibitory as well as excitatory learning to balance selectivity and multiplexing; preservation of spatiotopic relations to allow simultaneous representation of multiple independent objects; and simultaneous use of network structures for both processing and adaptation.

REFERENCES

Adelson, E. H., & Bergen, J. R. (1985). Spatiotemporal energy models for the perception of motion. Journal of the Optical Society of America A, 2, 284-299.

Adelson, E. H., & Movshon, J. A. (1982). Phenomenal coherence of moving visual patterns. Nature, 300, 523-525.

Ahumada, A. J., Jr., & Yellott, J. I., Jr. (1988). A connectionist model for learning receptor positions. Investigative Ophthalmology and Visual Science, 29(Suppl.), 58.

Albright, T. D. (1984). Direction and orientation selectivity of neurons in visual area MT of the macaque. Journal of Neurophysiology, 52, 1106-1130.

Allman, J., Miezin, F., & McGuinness, E. (1985). Direction- and velocity-specific responses from beyond the classical receptive field in the middle temporal visual area (MT). Perception, 14, 105-126.

Amari, S. (1977). Neural theory of association and concept formation. Biological Cybernetics, 26, 175-185.

Amari, S., & Takeuchi, A. (1978). Mathematical theory on formation of category detecting nerve cells. Biological Cybernetics, 29, 127-136.

Amthor, F. R., & Grzywacz, N. M. (1988). The time course of inhibition and the velocity independence of direction selectivity in the rabbit retina. Investigative Ophthalmology and Visual Science, 29(Suppl.), 225.

Anderson, C. H., & Abrahams, E. (1987). The Bayes connection. In M. Caudill & C. Butler (Eds.), Proceedings of the First IEEE International Conference on Neural Networks, III (pp. 105-112). Piscataway, NJ: IEEE.

Anderson, C. H., & Van Essen, D. C. (1987). Shifter circuits: A computational strategy for dynamic aspects of visual processing. Proceedings of the National Academy of Sciences of the U.S.A., 84, 6297-6301.

Anstis, S. M. (1977). Apparent movement. In R. Held, H. W. Leibowitz, & H.-L. Teuber (Eds.), Handbook of sensory physiology, Vol. VIII: Perception. New York: Springer-Verlag.

Anstis, S. M., & Ramachandran, V. S. (1987). Visual inertia in apparent motion. Vision Research, 27, 755-764.

Baker, C. L., Jr. (1988). Spatial and temporal determinants of directionally selective velocity preference in cat striate cortex neurons. Journal of Neurophysiology, 59, 1557-1574.

Barlow, H. B. (1979). Reconstructing the visual image of space and time. Nature, 279, 189-190.

Barlow, H. B. (1980). The absolute efficiency of perceptual decisions. Philosophical Transactions of the Royal Society of London, Ser. B, 290, 71-82.

Barlow, H. B. (1981). Critical limiting factors in the design of the eye and visual cortex. Proceedings of the Royal Society of London, Ser. B, 212, 1-34.

Barlow, H. B., & Levick, W. R. (1965). The mechanism of directionally selective units in the rabbit's retina. Journal of Physiology, 178, 477-504.

Bienenstock, E. L., Cooper, L. N., & Munro, P. W. (1982). Theory for the development of neuron selectivity: Orientation specificity and binocular interaction in visual cortex. Journal of Neuroscience, 2, 32-48.

Blasdel, G. G., Lund, J. S., & Fitzpatrick, D. (1985). Intrinsic connections of macaque striate cortex: Axonal projections of cells outside lamina 4C. Journal of Neuroscience, 5, 3350-3369.

Bossomaier, T., & Snyder, A. W. (1986). Why spatial frequency processing in the visual cortex? Vision Research, 26, 1307-1309.

Braastad, B. O., & Heggelund, P. (1985). Development of spatial receptive-field organization and orientation selectivity in kitten striate cortex. Journal of Neurophysiology, 53, 1158-1178.

Braddick, O. J. (1974). A short-range process in apparent motion. Vision Research, 14, 519-527.

Braddick, O. J. (1980). Low-level and high-level processes in apparent motion. Philosophical Transactions of the Royal Society of London, Ser. B, 290, 137-151.

Burbeck, C. A. (1985). Separate channels for the analysis of form and location. Investigative Ophthalmology and Visual Science, 26(Suppl.), 82.

Burbeck, C. A. (1986). Orientation selectivity in large-scale localization. Journal of the Optical Society of America A, 3, 98.

Burbeck, C. A. (1987). Position and spatial frequency in large-scale localization judgments. Vision Research, 27, 417-427.

Burr, D., & Ross, J. (1986). Visual processing of motion. Trends in Neurosciences, 9, 304-307.

Carpenter, G. A., & Grossberg, S. (1987a). A massively parallel architecture for a self-organizing neural pattern recognition machine. Computer Vision, Graphics, and Image Processing, 37, 54-115.

Carpenter, G. A., & Grossberg, S. (1987b). ART 2: Self-organization of stable category recognition codes for analog input patterns. Applied Optics, 26, 4919-4930.

Cohen, M. A., & Grossberg, S. (1983). Absolute stability of global pattern formation and parallel memory storage by competitive neural networks. IEEE Transactions on Systems, Man, and Cybernetics, SMC-13, 815-826.

Cohen, M. A., & Grossberg, S. (1987). Masking fields: A massively parallel neural architecture for learning, recognizing, and predicting multiple groupings of patterned data. Applied Optics, 26, 1866-1891.

Cremieux, J., Orban, G. A., Duysens, J., & Amblard, B. (1987). Response properties of area 17 neurons in cats reared in stroboscopic illumination. Journal of Neurophysiology, 57, 1511-1535.

Dammasch, I. E., Wagner, G. P., & Wolff, J. R. (1986). Self-stabilization of neuronal networks. Biological Cybernetics, 54, 211-222.

Daugman, J. G. (1985). Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters. Journal of the Optical Society of America A, 2, 1160-1169.

Daugman, J. G. (1988). Complete discrete 2-D Gabor transforms by neural networks for image analysis and compression. IEEE Transactions on Acoustics, Speech, and Signal Processing, 36, 1169-1179.

Derrington, A. M. (1984). Development of spatial frequency selectivity in striate cortex of vision-deprived cats. Experimental Brain Research, 55, 431-437.

Dobbins, A., Zucker, S. W., & Cynader, M. S. (1987). End-stopped neurons in the visual cortex as a substrate for calculating curvature. Nature, 329(6138), 438-441.

Dobbins, A., Zucker, S. W., & Cynader, M. S. (1988). End-stopped simple cells and curvature: Predictions from a computational model. Investigative Ophthalmology and Visual Science, 29(Suppl.), 331.

Dubin, M. W., Stark, L. A., & Archer, S. M. (1986). A role for action-potential activity in the development of neuronal connections in the kitten retinogeniculate pathway. Journal of Neuroscience, 6, 1021-1036.

Duda, R. O., & Hart, P. E. (1972). Use of the Hough transformation to detect lines and curves in pictures. Communications of the ACM, 15, 11-15.

Easton, P., & Gordon, P. E. (1984). Stabilization of Hebbian neural nets by inhibitory learning. Biological Cybernetics, 51, 1-9.

Ellias, S. A., & Grossberg, S. (1975). Pattern formation, contrast control, and oscillations in the short term memory of shunting on-center off-surround networks. Biological Cybernetics, 20, 69-98.

Feldman, J. A. (1988). Time, space and form in vision. Unpublished manuscript, University of Rochester Department of Computer Science.

Felleman, D. J., & Van Essen, D. C. (1987). Receptive field properties of neurons in area V3 of macaque monkey extrastriate cortex. Journal of Neurophysiology, 57, 889-920.

Ferrera, V. P., & Wilson, H. R. (1988). Perceived direction of moving 2D patterns. Investigative Ophthalmology and Visual Science, 29(Suppl.), 264.

Field, D. J. (1987). Relations between the statistics of natural images and the response properties of cortical cells. Journal of the Optical Society of America A, 4, 2379-2394.

Field, D. J., Kersten, D., & Barlow, H. B. (1988). Is redundancy increased or decreased in visual coding? Investigative Ophthalmology and Visual Science, 29(Suppl.), 408.

Fleet, D. J., & Jepson, A. D. (1985). Spatiotemporal inseparability in early vision: Centre-surround models and velocity selectivity. Computational Intelligence, 1, 89-102.

Frégnac, Y., & Imbert, M. (1978). Early development of visual cortical cells in normal and dark-reared kittens: Relationship between orientation selectivity and ocular dominance. Journal of Physiology, 278, 27-44.

Frégnac, Y., & Imbert, M. (1984). Development of neuronal selectivity in primary visual cortex of cat. Physiological Reviews, 64, 325-434.

Fukushima, K., & Miyake, S. (1982). Neocognitron: A new algorithm for pattern recognition tolerant of deformations and shifts in position. Pattern Recognition, 15, 455-469.

Gabbott, P. L. A., Martin, K. A. C., & Whitteridge, D. (1987). Connections between pyramidal neurons in layer 5 of cat visual cortex (area 17). Journal of Comparative Neurology, 259, 364-381.

Garey, L. J., & Pettigrew, J. D. (1974). Ultrastructural changes in kitten visual cortex after environmental modification. Brain Research, 66, 165-172.

Gary-Bobo, E., Milleret, C., & Buisseret, P. (1986). Role of eye movements in developmental process of orientation selectivity in the kitten visual cortex. Vision Research, 26, 557-567.

Gibson, J. J. (1950). The perception of the visual world. Westport, CT: Greenwood Press.

Gibson, J. J., Olum, P., & Rosenblatt, F. (1955). Parallax and perspective during aircraft landings. American Journal of Psychology, 68, 372-385.

Globus, A., Rosenzweig, M. R., Bennett, E. L., & Diamond, M. C. (1973). Effects of differential experience on dendritic spine counts in rat cerebral cortex. Journal of Comparative and Physiological Psychology, 82, 175-181.

Golden, R. M. (1988). A unified framework for connectionist systems. Biological Cybernetics, 59, 109-120.

Graves, A. L., Trotter, Y., & Frégnac, Y. (1987). Role of extraocular muscle proprioception in the development of depth perception in cats. Journal of Neurophysiology, 58, 816-831.

Greenough, W. T. (1975). Experimental modification of the developing brain. American Scientist, 63, 37-46.

Grossberg, S. (1972). Neural expectation: Cerebellar and retinal analogs of cells fired by learnable or unlearned pattern classes. Kybernetik, 10, 49-57.

Grossberg, S. (1976a). Adaptive pattern classification and universal recoding: I. Parallel development and coding of neural feature detectors. Biological Cybernetics, 23, 121-134.

Grossberg, S. (1976b). On the development of feature detectors in the visual cortex with applications to learning and reaction-diffusion systems. Biological Cybernetics, 21, 145-159.

Grossberg, S. (1980). How does a brain build a cognitive code? Psychological Review, 87, 1-51.

Grossberg, S. (1982a). Processing of expected and unexpected events during conditioning and attention: A psychophysiological theory. Psychological Review, 89, 529-572.

Grossberg, S. (1982b). Studies of mind and brain: Neural principles of learning, perception, development, cognition, and motor control. Boston: Reidel Press.

Grossberg, S. (1984). Some psychophysiological and pharmacological correlates of a developmental, cognitive, and motivational theory. In R. Karrer, J. Cohen, & P. Tueting (Eds.), Brain and information: Event-related potentials. New York: New York Academy of Sciences.

Grossberg, S., & Marshall, J. A. (1989). Stereo boundary fusion by cortical complex cells: A system of maps, filters, and feedback networks for multiplexing distributed data. Neural Networks, 2, 29-51.

Grossberg, S., & Mingolla, E. (1985a). Neural dynamics of form perception: Boundary completion, illusory figures, and neon color spreading. Psychological Review, 92, 173-211.

Grossberg, S., & Mingolla, E. (1985b). Neural dynamics of perceptual grouping: Textures, boundaries, and emergent segmentations. Perception & Psychophysics, 38, 141-171.

Harris, M. G. (1986). The perception of moving stimuli: A model of spatiotemporal coding in human vision. Vision Research, 26, 1281-1287.

Hebb, D. O. (1949). The organization of behavior. New York: Wiley.

Heeger, D. J. (1987). Model for the extraction of image flow. Journal of the Optical Society of America A, 4, 1455-1471.

Heeger, D. J. (1988). Optical flow using spatiotemporal filters. International Journal of Computer Vision, 1, 279-302.

Hildreth, E. C. (1983). Computing the velocity field along contours. Proceedings of the ACM SIGGRAPH/SIGART Interdisciplinary Workshop on Motion (pp. 26-32). Association for Computing Machinery.


Hirsch, H. V. B., & Spinelli, D. N. (1970). Visual experience modifies distribution of horizontally and vertically oriented receptive fields in cats. Science, 168, 869-871.

Horn, B. K. P., & Schunck, B. G. (1981). Determining optical flow. Artificial Intelligence, 17, 185-203.

Hubel, D. H., & Wiesel, T. N. (1962). Receptive fields, binocular interactions, and functional architecture in cat's visual cortex. Journal of Physiology, 160, 106-154.

Hubel, D. H., & Wiesel, T. N. (1963). Shape and arrangement of columns in cat's striate cortex. Journal of Physiology, 165, 559-568.

Hubel, D. H., & Wiesel, T. N. (1965). Receptive fields and functional architecture in two nonstriate visual areas (18 and 19) of the cat. Journal of Neurophysiology, 28, 229-289.

Hubel, D. H., & Wiesel, T. N. (1968). Receptive fields and functional architecture of monkey striate cortex. Journal of Physiology, 195, 215-243.

Hubel, D. H., & Wiesel, T. N. (1970). The period of susceptibility to the physiological effects of unilateral eye closure in kittens. Journal of Physiology, 206, 419-436.

Hubel, D. H., & Wiesel, T. N. (1977). Functional architecture of macaque monkey visual cortex. Proceedings of the Royal Society of London, Ser. B, 198, 1-59.

Hubel, D. H., Wiesel, T. N., & LeVay, S. (1977). Plasticity of ocular dominance columns in monkey striate cortex. Philosophical Transactions of the Royal Society of London, Ser. B, 278, 377-409.

Kato, H., Bishop, P. O., & Orban, G. A. (1978). Hypercomplex and simple/complex cell classifications in cat striate cortex. Journal of Neurophysiology, 41, 1071-1095.

Kennedy, H., & Orban, G. A. (1983). Response properties of visual cortical neurons in cats reared in stroboscopic illumination. Journal of Neurophysiology, 49, 686-704.

Kersten, D. (1987). Predictability and redundancy of natural images. Journal of the Optical Society of America A, 4, 2395-2400.

Kersten, D., O'Toole, A. J., Sereno, M. E., Knill, D. C., & Anderson, J. A. (1987). Associative learning of scene parameters from images. Applied Optics, 26, 4999-5006.

Knill, D. C., & Kersten, D. (1988). The perception of correlational structure in natural images. Investigative Ophthalmology and Visual Science, 29(Suppl.), 407.

Koenderink, J. J. (1986). Optic flow. Vision Research, 26(1), 161-180.

Kohonen, T. (1982a). Self-organized formation of topologically correct feature maps. Biological Cybernetics, 43, 59-69.

Kohonen, T. (1982b). A simple paradigm for the self-organized formation of structured feature maps. In S. Amari & M. A. Arbib (Eds.), Competition and cooperation in neural networks. New York: Springer-Verlag.

Kohonen, T. (1984). Self-organization and associative memory. New York: Springer-Verlag.

Kohonen, T. (1987). Adaptive, associative, and self-organizing functions in neural computing. Applied Optics, 26, 4910-4918.

Kohonen, T., & Oja, E. (1976). Fast adaptive formation of orthogonalizing filters and associative memory in recurrent networks of neuron-like elements. Biological Cybernetics, 21, 85-95.

Linsker, R. (1986a). From basic network principles to neural architecture: Emergence of spatial-opponent cells. Proceedings of the National Academy of Sciences of the U.S.A., 83, 7508-7512.

Linsker, R. (1986b). From basic network principles to neural architecture: Emergence of orientation-selective cells. Proceedings of the National Academy of Sciences of the U.S.A., 83, 8390-8394.

Linsker, R. (1986c). From basic network principles to neural architecture: Emergence of orientation columns. Proceedings of the National Academy of Sciences of the U.S.A., 83, 8779-8783.

Luhmann, H. J., Martínez Millán, L., & Singer, W. (1986). Development of horizontal intrinsic connections in cat striate cortex. Experimental Brain Research, 63, 443-448.

Lund, J. S. (1987). Local circuit neurons of macaque monkey striate cortex: I. Neurons of laminae 4C and 5A. Journal of Comparative Neurology, 257, 60-92.

Marr, D. (1982). Vision: A computational investigation into the human representation and processing of visual information. San Francisco: W. H. Freeman and Company.

Marr, D., & Ullman, S. (1981). Directional selectivity and its use in early visual processing. Proceedings of the Royal Society of London, Ser. B, 211, 151-180.

Marshall, J. A. (1988a). Aperture effects in visual motion: Velocity and direction judgments via lateral intrinsic connections. Investigative Ophthalmology and Visual Science, 29(Suppl.), 251.

Marshall, J. A. (1988b). Self-organizing neural networks for perception of visual motion (Tech. Rep. 88-010). Boston University Computer Science Department.

Marshall, J. A. (1989). Neural networks for computational vision: Motion segmentation and stereo fusion. Ph.D. Dissertation, Boston University, MA. Ann Arbor, MI: University Microfilms Inc.

Maunsell, J. H. R., & Van Essen, D. C. (1983). Functional properties of neurons in middle temporal visual area of the macaque monkey. I. Selectivity for stimulus direction, speed, and orientation. Journal of Neurophysiology, 49, 1127-1147.

Michalski, A., Gerstein, G. L., Czarkowska, J., & Tarnecki, R. (1983). Interactions between cat striate cortex neurons. Experimental Brain Research, 51, 97-107.

Mikami, A., Newsome, W. T., & Wurtz, R. H. (1986a). Motion selectivity in macaque visual cortex: I. Mechanisms of direction and speed selectivity in extrastriate area MT. Journal of Neurophysiology, 55, 1308-1327.

Mikami, A., Newsome, W. T., & Wurtz, R. H. (1986b). Motion selectivity in macaque visual cortex: II. Spatiotemporal range of directional interactions in MT and V1. Journal of Neurophysiology, 55, 1328-1339.

Mitchison, G., & Crick, F. (1982). Long axons within the striate cortex: Their distribution, orientation, and patterns of connection. Proceedings of the National Academy of Sciences of the U.S.A., 79, 3661-3665.

Movshon, J. A., Adelson, E. H., Gizzi, M. S., & Newsome, W. T. (1985). The analysis of moving visual patterns. In C. Chagas, R. Gattass, & C. Gross (Eds.), Pattern recognition mechanisms (pp. 117-151). Vatican City: Pontifical Academy of Sciences.

Movshon, J. A., Newsome, W. T., Gizzi, M. S., & Levitt, J. B. (1988). Spatio-temporal tuning and speed sensitivity in macaque visual cortical neurons. Investigative Ophthalmology and Visual Science, 29(Suppl.), 327.

Nagano, T., & Kurata, K. (1981). A self-organizing neural network model for the development of complex cells. Biological Cybernetics, 40, 195-200.

Nakayama, K., & Loomis, J. M. (1974). Optical velocity patterns, velocity-sensitive neurons, and space perception: A hypothesis. Perception, 3, 63-80.

Nelson, J. I., & Frost, B. J. (1985). Intracortical facilitation among co-oriented, co-axially aligned simple cells in cat striate cortex. Experimental Brain Research, 61, 54-61.

Newsome, W. T., Mikami, A., & Wurtz, R. H. (1986). Motion selectivity in macaque visual cortex. III. Psychophysics and physiology of apparent motion. Journal of Neurophysiology, 55, 1340-1351.

Newsome, W. T., & Paré, E. B. (1988). A selective impairment of motion perception following lesions of the middle temporal visual area (MT). Journal of Neuroscience, 8, 2201-2211.


Norman, J. F., Lappin, J. S., & Wason, T. D. (1988). Long-range detection of the geometric components of optic flow. Investigative Ophthalmology and Visual Science, 29(Suppl.), 251.

Orban, G. A., & Gulyás, B. (1988). Image segregation by motion: Cortical mechanisms and implementation in neural networks. In R. Eckmiller & C. von der Malsburg (Eds.), Neural computers. NATO ASI Series, 41. New York: Springer-Verlag.

Orban, G. A., Kato, H., & Bishop, P. O. (1979a). End-zone region in receptive fields of hypercomplex and other striate neurons in the cat. Journal of Neurophysiology, 42, 818-832.

Orban, G. A., Kato, H., & Bishop, P. O. (1979b). Dimensions and properties of end-zone inhibitory areas in receptive fields of hypercomplex cells in cat striate cortex. Journal of Neurophysiology, 42, 833-849.

O'Toole, A. J., & Kersten, D. (1986). Adaptive connectionist approach to structure from stereo. Journal of the Optical Society of America A, 3, 72.

Pasternak, T., & Leinen, L. J. (1986). Pattern and motion vision in cats with selective loss of cortical directional selectivity. Journal of Neuroscience, 6, 938-945.

Pearson, J. C., Finkel, L. H., & Edelman, G. M. (1987). Plasticity in the organization of adult cerebral cortical maps: A computer simulation based on neuronal group selection. Journal of Neuroscience, 7, 4209-4223.

Poggio, T., & Hurlbert, A. C. (1988). Learning receptive fields for color constancy. Investigative Ophthalmology and Visual Science, 29(Suppl.), 301.

Rakic, P. (1977). Prenatal development of the visual system in rhesus monkey. Philosophical Transactions of the Royal Society of London, Ser. B, 278, 245-260.

Ramachandran, V. S., & Anstis, S. M. (1983). Extrapolation of motion path in human visual perception. Vision Research, 23, 83-85.

Regan, D. (1986). Visual processing of four kinds of relative motion. Vision Research, 26, 127-145.

Reichardt, W. (1961). Autocorrelation, a principle for the evaluation of sensory information by the central nervous system. In W. A. Rosenblith (Ed.), Sensory communication. New York: Wiley.

Rockland, K. S., & Lund, J. S. (1982). Widespread periodic intrinsic connections in the tree shrew visual cortex. Science, 215, 1532-1534.

Rockland, K. S., & Lund, J. S. (1983). Intrinsic laminar lattice connections in primate visual cortex. Journal of Comparative Neurology, 216, 303-318.

Rockland, K. S., Lund, J. S., & Humphrey, A. L. (1982). Anatomical banding of intrinsic connections in striate cortex of tree shrews (Tupaia glis). Journal of Comparative Neurology, 209, 41-58.

Rodman, H. R., & Albright, T. D. (1987). Coding of visual stimulus velocity in area MT of the macaque. Vision Research, 27, 2035-2048.

Saito, H., Tanaka, K., Fukada, Y., & Oyamada, H. (1988). Analysis of discontinuity in visual contours in area 19 of the cat. Journal of Neuroscience, 8, 1131-1143.

Saito, H., Yukie, M., Tanaka, K., Hikosaka, K., Fukada, Y., & Iwai, E. (1986). Integration of direction signals of image motion in the superior temporal sulcus of the macaque monkey. Journal of Neuroscience, 6, 145-157.

Sereno, M. E. (1986). Neural network model for the measurement of visual motion. Journal of the Optical Society of America A, 3, 72.

Sereno, M. E. (1987). Implementing stages of motion analysis in neural networks. Program of the Ninth Annual Conference of the Cognitive Science Society (pp. 405-416). Hillsdale, NJ: Lawrence Erlbaum Associates.

Sethi, I. K., & Jain, R. (1987). Finding trajectories of feature points in a monocular image sequence. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-9, 56-73.

Shimojo, S., Silverman, G. H., & Nakayama, K. (1988). Occlusion and the solution to the aperture problem for motion. Investigative Ophthalmology and Visual Science, 29(Suppl.), 264.

Singer, W. (1983). Neuronal activity as a shaping factor in the self-organization of neuron assemblies. In E. Basar, H. Flohr, H. Haken, & A. J. Mandell (Eds.), Synergetics of the brain. New York: Springer-Verlag.

Singer, W. (1985a). Activity-dependent self-organization of the mammalian visual cortex. In D. Rose & V. G. Dobson (Eds.), Models of the visual cortex (pp. 123-136). New York: John Wiley and Sons.

Singer, W. (1985b). Central control of developmental plasticity in the mammalian visual cortex. Vision Research, 25, 389-396.

Sperling, G., van Santen, J. P. H., & Burt, P. (1985). Three theories of stroboscopic motion detection. Spatial Vision, 1, 47-56.

Stent, G. S. (1973). A physiological mechanism for Hebb's postulate of learning. Proceedings of the National Academy of Sciences of the U.S.A., 70, 997-1001.

Stork, D. G., & Wilson, H. R. (1988). Considerations of Gabor functional descriptions of visual cortical receptive fields. Preprint.

Takeuchi, A., & Amari, S. (1979). Formation of topographic maps and columnar microstructures in nerve fields. Biological Cybernetics, 35, 63-72.

Tanner, J. E. (1986). Integrated optical motion detection. Ph.D. Dissertation, California Institute of Technology (Caltech Tech. Rep. 5223:TR:86).

Thompson, W. B., & Pong, T. C. (in press). Detecting moving objects. International Journal of Computer Vision.

Trotter, Y., Frégnac, Y., & Buisseret, P. (1987). The period of susceptibility of visual cortical binocularity to unilateral proprioceptive deafferentation of extraocular muscles. Journal of Neurophysiology, 58, 795-815.

Ts'o, D. Y., Gilbert, C. D., & Wiesel, T. N. (1986). Relationships between horizontal interactions and functional architecture in cat striate cortex as revealed by cross-correlation analysis. Journal of Neuroscience, 6, 1160-1170.

Van Essen, D. C., & Anderson, C. H. (1987). Reference frames and dynamic remapping processes in vision. In E. Schwartz (Ed.), Computational neuroscience. Cambridge, MA: MIT Press.

Van Essen, D. C., & Maunsell, J. H. R. (1983). Hierarchical organization and functional streams in the visual cortex. Trends in Neurosciences, 6, 370-375.

van Santen, J. P. H., & Sperling, G. (1984). A temporal covari- ance model of motion perception. Journal of the Optical So- ciety of America A, 1,451-473.

von der Malsburg, C. (1973). Self-organization of orientation sensitive cells in the striate cortex. Kybernetik, 14, 85-100.

von der Malsburg, C., & Cowan, J. D. (1982). Outline of a theory for the ontogenesis of iso-orientation domains in visual cortex. Biological Cybernetics, 45, 49-56.

Waibel, A., Hanazawa, T., Hinton, G., Shikano, K., & Lang, K. (1987). Phoneme recognition using time-delay neural networks. ATR Technical Report. Advanced Telecommunications Research Institute International, Japan.

Wallach, H. (1935). Über visuell wahrgenommene Bewegungsrichtung. Psychologische Forschung, 20, 325-380.

Wallach, H. (1976). On perception. New York: Quadrangle.

Walters, D. (1987). Properties of connectionist variable representations. Program of the Ninth Annual Conference of the Cognitive Science Society (pp. 265-273). Hillsdale, NJ: Lawrence Erlbaum Associates.

Watson, A. B. (1987). Efficiency of a model human image code. Journal of the Optical Society of America A, 4, 2401-2417.


Waxman, A. M., & Duncan, J. H. (1986). Binocular image flows: Steps toward stereo-motion fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-8, 715-729.

Welch, L. (1988). Speed discrimination and the aperture problem. Investigative Ophthalmology and Visual Science, 29(Suppl.), 264.

Wiesel, T. N., & Hubel, D. H. (1965). Comparison of the effects of unilateral and bilateral eye closure on cortical unit responses in kittens. Journal of Neurophysiology, 28, 1029-1040.

Williams, D., & Phillips, G. (1987). Cooperative phenomena in the perception of motion direction. Journal of the Optical Society of America A, 4, 878-885.

Willshaw, D. J., & von der Malsburg, C. (1976). How patterned neural connections can be set up by self-organization. Proceedings of the Royal Society of London, Ser. B, 194, 431-445.

Wilson, H. R. (1988). Development of spatiotemporal mecha- nisms in infant vision. Vision Research, 28, 611-628.

APPENDIX: IMPLEMENTATION DETAILS

This section describes the three simulations in detail. For each simulation, equations and parameters specify the coordinate positions of the cells in the network, the strengths of excitatory and inhibitory connections between cells, the manner in which each cell's activity level changes according to its inputs, the manner in which connection strengths vary according to cell activity correlations, and the sequences of simulated visual input to the network. In the following discussion, let the symbol ℛ (with indexing subscripts and superscripts) refer to a number drawn pseudorandomly from the interval [0,1), and define the notations

[x]⁺ ≡ max(x, 0),   ⌊x⌋ = floor(x),   ⌈x⌉ = ceil(x).

The function floor(x) produces the greatest integer less than or equal to x, and the function ceil(x) produces the least integer greater than or equal to x.

Network Structure: Cell Coordinate Positions

The cells in each simulated network are organized into two layers, L₁ and L₂; the layer in which the ith cell resides is specified by λ_i. Within each layer, every cell has X and Y spatial coordinates, specified by ξ_i and η_i. Let the quantities 𝒳⁽¹⁾ = 𝒳⁽²⁾ and 𝒴⁽¹⁾ = 𝒴⁽²⁾ be the total numbers of coordinate positions along the X-dimension and Y-dimension of the network lattice, respectively. Let c⁽¹⁾ and c⁽²⁾ be the numbers of cells per lattice position in L₁ and L₂, respectively. Let 𝒩⁽¹⁾ ≡ 𝒳⁽¹⁾𝒴⁽¹⁾c⁽¹⁾ be the number of cells in L₁ and 𝒩⁽²⁾ ≡ 𝒳⁽²⁾𝒴⁽²⁾c⁽²⁾ be the number of cells in L₂. Then

λ_i = { 1 if 0 ≤ i < 𝒩⁽¹⁾,
        2 if 𝒩⁽¹⁾ ≤ i < 𝒩⁽¹⁾ + 𝒩⁽²⁾,
        undefined otherwise;    (1)

ξ_i = { 1 + ⌊(i mod (𝒳⁽¹⁾c⁽¹⁾)) / c⁽¹⁾⌋ if λ_i = 1,
        1 + ⌊((i − 𝒩⁽¹⁾) mod (𝒳⁽²⁾c⁽²⁾)) / c⁽²⁾⌋ if λ_i = 2;    (2)

η_i = { 1 + ⌊i / (𝒳⁽¹⁾c⁽¹⁾)⌋ if λ_i = 1,
        1 + ⌊(i − 𝒩⁽¹⁾) / (𝒳⁽²⁾c⁽²⁾)⌋ if λ_i = 2.    (3)

In addition, each cell can have a preferred length, s_i, a preferred local direction, d_i, and a preferred local velocity, v_i.
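In effect, eqns (1)-(3) read a cell's index i as a mixed-radix number: the cell copy within a lattice position varies fastest, then the X position, then the Y position. A minimal sketch of this indexing in Python (the function and argument names are illustrative, not from the original simulations):

```python
def cell_coords(i, X, Y, c1, c2):
    """Return (layer, xi, eta) for cell index i, per eqns (1)-(3).

    X, Y: lattice positions per dimension; c1, c2: cells per
    lattice position in L1 and L2 (names are illustrative)."""
    N1, N2 = X * Y * c1, X * Y * c2   # number of cells in L1 and L2
    if 0 <= i < N1:                   # eqn (1): cell lies in L1
        layer, j, c = 1, i, c1
    elif N1 <= i < N1 + N2:           # eqn (1): cell lies in L2
        layer, j, c = 2, i - N1, c2
    else:
        raise IndexError("cell index out of range")
    xi = 1 + (j % (X * c)) // c       # eqn (2): X coordinate
    eta = 1 + j // (X * c)            # eqn (3): Y coordinate
    return layer, xi, eta
```

For example, with a 6-position, 1-row lattice holding 6 cells per L₁ position and 2 per L₂ position, cells 0-35 fall in L₁ and cells 36-47 in L₂, with the X coordinate advancing every c cells.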

(I.) In Simulation I, s_i, d_i, and v_i are specified in advance for each cell, and the connections between cells are prewired to respect those preferences. Only cells sensitive to a single local velocity are simulated; thus, let v_i = 1 for all i. Let the quantities 𝒮⁽¹⁾ and 𝒮⁽²⁾ represent the total numbers of distinct preferred lengths of cells at layers L₁ and L₂, respectively. Likewise, let 𝒟⁽¹⁾ and 𝒟⁽²⁾ represent the total numbers of distinct preferred local directions in L₁ and L₂. Then for each cell,

s_i = { 1 + 2⌊(i mod (𝒮⁽¹⁾𝒟⁽¹⁾)) / 𝒟⁽¹⁾⌋ if λ_i = 1,
        1 + 2⌊((i − 𝒩⁽¹⁾) mod (𝒮⁽²⁾𝒟⁽²⁾)) / 𝒟⁽²⁾⌋ if λ_i = 2;    (4)

d_i = { 0 if λ_i = 1,
        −1 + ((i − 𝒩⁽¹⁾) mod 𝒟⁽²⁾) if λ_i = 2.    (5)

Using the parameters supplied for Simulation I, this arrangement produces values in the set {1,3,5,7} for s_i, representing segment lengths of 1, 3, 5, or 7, and values in the set {−1,0,1} for d_i, representing vertical, diagonal (rightward-upward), or horizontal motion. This range of values was chosen to satisfy the constraints on length imposed by the geometry of the Wallach (1935) display; the length of the diagonal edge varies in increments of 2 units. In L₁, the value 0 for each d_i represents the diagonal local direction, whereas in L₂, the values −1, 0, or 1 for d_i represent the actual direction along which the cell's lateral excitatory connection chains are preferentially aligned.

(II.) In Simulation II, only a 1-dimensional slice through the network is simulated; this is represented by letting 𝒴⁽¹⁾ = 𝒴⁽²⁾ = 1. Similarly, the simulation uses only a single length; hence let 𝒮⁽¹⁾ = 𝒮⁽²⁾ = 1 and let s_i = 1 for all i. Two directions (leftward = −1 and rightward = 1) and three speeds (1, 2, 3) are represented. However, the direction and velocity sensitivities only of cells in L₁ are prespecified; the sensitivities of L₂ cells become determined later by the process of self-organization. Accordingly, d_i and v_i will be described only for cells in L₁:

d_i = sgn((i mod 6) − 2.5) if λ_i = 1, and    (6)

v_i = ⌈|(i mod 6) − 2.5|⌉ if λ_i = 1.    (7)

This results in values of (3, 2, 1, 1, 2, 3) for the v_i and (−1, −1, −1, 1, 1, 1) for the d_i.
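These two assignments are easy to check directly; a sketch in Python (the function names are illustrative):

```python
import math

def direction(i):
    """Preferred direction of an L1 cell in Simulation II, eqn (6)."""
    return 1 if (i % 6) - 2.5 > 0 else -1   # sgn((i mod 6) - 2.5)

def velocity(i):
    """Preferred speed of an L1 cell in Simulation II, eqn (7)."""
    return math.ceil(abs((i % 6) - 2.5))

# The six cells at each lattice position cover all direction/speed pairs:
print([velocity(i) for i in range(6)])    # [3, 2, 1, 1, 2, 3]
print([direction(i) for i in range(6)])   # [-1, -1, -1, 1, 1, 1]
```

Note that (i mod 6) − 2.5 is never zero, so the sign is always well defined.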

(III.) In Simulation III, s_i = 1 and v_i = 1 for all i. All cells in layer L₁ have the same local direction sensitivity, so d_i = 0 if λ_i = 1. Finally, the direction sensitivity of L₂ cells is unspecified initially and becomes determined later through self-organization.

Network Structure: Initial Connection Strengths

The strengths of excitatory and inhibitory connections between cells depend initially on a combination of positional and random factors. Exposure to visual input may modify the connection strengths, and subsequently the connection strengths may reflect certain patterns or trends that occur statistically often in the input.

In all three simulations, the initial inhibitory connection strengths are defined the same way. Let the quantity z⁻_ji(t) represent the strength of the inhibitory (−) connection from the jth cell to the ith cell at time t. Then

z⁻_ji(0) = { Z_I if ξ_j = ξ_i, η_j = η_i, λ_j = λ_i = 2, and i ≠ j,
             0 otherwise.    (8)

Thus in all three simulations, each L₂ cell projects inhibitory connections to all the other cells at the same coordinate position.
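The initialization in eqn (8) can be sketched as a pass over all cell pairs (Python; the data layout and names are illustrative):

```python
def initial_inhibition(cells, Z_I):
    """Initial inhibitory strengths z-_ji(0) per eqn (8).

    cells: list of (layer, xi, eta) tuples indexed by cell number.
    Returns a dict mapping (j, i) to strength: each L2 cell inhibits
    every *other* cell sharing its lattice position."""
    z = {}
    for j, (lam_j, xi_j, eta_j) in enumerate(cells):
        for i, (lam_i, xi_i, eta_i) in enumerate(cells):
            if (lam_j == lam_i == 2 and xi_j == xi_i
                    and eta_j == eta_i and i != j):
                z[(j, i)] = Z_I
    return z
```

With, say, two L₂ cells at one position and one at another, only the co-located pair inhibit each other, in both directions.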

(I.) Because Simulation I is intended to model the behavior of a network after a period of self-organization has occurred, its excitatory connection strengths are initialized at values that would have resulted from exposure to visual input. A number of assumptions, embodied in the following equations, were required to accomplish this prewiring. Since the process of self-organization itself is not modeled in Simulation I, the connection strengths do not change. The strength of an excitatory (+) connection from the jth cell to the ith cell may therefore be written as z⁺_jik = z⁺_jik(t) for all times t. The additional index k is included to allow multiple excitatory connections between each pair of cells to be described. (Typically, different values of k specify different signal transmission latencies.) So let

z⁺_jik = { Z_B if λ_j = 1, λ_i = 2, ξ_j = ξ_i, η_j = η_i, s_j = s_i, and k = 0,

           Z_L exp(−φ_X(ξ_i − ξ_j − k)² − φ_Y(η_i − η_j − k d_i)²
                   − φ_D(d_i − d_j)² − φ_S(s_i − s_j)²)
              if that value ≥ 0.05, λ_i = λ_j = 2, and i ≠ j,

           0 otherwise.    (9)

This equation specifies two kinds of excitatory connections: bottom-up (B) and lateral (L). The bottom-up connections are all equal to a constant, Z_B, but the lateral connections depend on a multiterm Gaussian factor. The Gaussian is designed to produce the strongest connections between cells with similar receptive field

Page 28: Self-Organizing Neural Networks for Perception of …cns-web.bu.edu/~yazdan/pdf/Marshall1990motion.pdfSelf-Organizing Neural Networks for Perception of ... (Figure la). The aperture

72 J. A . Marsha l l

properties at the appropriate relative positions. The lateral connection strengths will be strongest when the factor is minimized, and the factor is most easily understood by examining the conditions under which it is minimized. Consider each term in the factor separately, and assume that the parameters φ_X, φ_Y, φ_D, and φ_S are nonzero. Then the factor is minimized when the following four conditions hold:

ξ_i = ξ_j + k,   η_i = η_j + k d_i,   d_i = d_j,   and s_i = s_j.    (10)

If k is viewed as a time-delay for a stimulus moving at a rate of 1 spatial position per time unit, then the four conditions hold true under the following interpretation. For each value of k, the ith cell and the jth cell are k positions apart along the X-dimension and k positions apart (in the preferred direction d_i of cell i) along the Y-dimension; furthermore, the ith and jth cells have the same direction and length preferences. Under these conditions, the kth lateral excitatory connection from the jth cell to the ith cell will be as strong as possible. For each combination of j, i, and k, if any of these conditions fails to hold, then the strength of z⁺_jik will be correspondingly weaker. The influence of each term on the overall connection strength is determined by the Gaussian bandwidth parameters φ_X, φ_Y, φ_D, and φ_S. In Simulation I, the time-delay variable k was restricted to the value 1, and the bandwidth parameters had the following values:

φ_X = φ_Y = φ_D = 4;   φ_S = 0.01.

This combination made the excitatory connection strengths heavily dependent on the cells' relative positions and direction preferences but only weakly dependent on the cells' length preferences. The combination was chosen to illustrate how connection strengths that are tuned broadly with respect to length could allow the network to process an edge segment whose length is changing, as in Wallach's (1935) displays (Figure 5). The bandwidth parameters could be altered to permit the network to handle variations in other stimulus attributes, for example, smoothly curved motion trajectories. Such parameter choices are only illustrative of the capabilities of this type of network, however. In true self-organizing systems, such as Simulations II and III, the actual bandwidth values are determined largely by the statistical behavior of the input to the system.
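The lateral case of eqn (9) reduces to a product of Gaussians in the position, direction, and length mismatches, cut off below 0.05. A sketch with the Simulation I bandwidths (the data layout and names are illustrative):

```python
import math

def lateral_strength(cell_i, cell_j, k, Z_L=0.33,
                     phiX=4.0, phiY=4.0, phiD=4.0, phiS=0.01):
    """Lateral term of eqn (9); cell_* = (xi, eta, d, s) tuples."""
    xi_i, eta_i, d_i, s_i = cell_i
    xi_j, eta_j, d_j, s_j = cell_j
    g = Z_L * math.exp(-phiX * (xi_i - xi_j - k) ** 2
                       - phiY * (eta_i - eta_j - k * d_i) ** 2
                       - phiD * (d_i - d_j) ** 2
                       - phiS * (s_i - s_j) ** 2)
    return g if g >= 0.05 else 0.0   # sub-threshold strengths are pruned

# A cell one step "downstream" along its preferred direction gets the
# full strength Z_L; a one-position misalignment falls below the cutoff.
aligned = lateral_strength((5, 4, 1, 3), (4, 3, 1, 3), k=1)
offset = lateral_strength((5, 5, 1, 3), (4, 3, 1, 3), k=1)
```

With these bandwidths a unit mismatch in position or direction multiplies the strength by e⁻⁴ ≈ 0.018, which falls below the 0.05 cutoff, whereas a length mismatch of 2 only multiplies it by e^(−0.04).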

(II.) In Simulation II, the initial excitatory connection strengths were generated via a combination of positional and random factors.

z⁺_jik(0) = { Z_B(1 + .01(2ℛ_jik − 1)) if λ_j = 1, λ_i = 2, ξ_j = ξ_i, and k = 0,

              Z_L(1 + .01(2ℛ_jik − 1)) exp(−φ(ξ_i − ξ_j)²)
                 if λ_j = λ_i = 2, |ξ_i − ξ_j| ≤ 4, i ≠ j, and k ∈ {1, 2, 3},

              0 otherwise.    (11)

Bottom-up strengths are topographically mapped and are all initialized to the same value, Z_B, up to a small random factor (±1%). Lateral strengths depend on distance along the 1-dimensional slice across the network, with a small random factor, and are scaled by the constant Z_L.

(III.) Initial excitatory connection strengths were generated in a similar manner for Simulation III. One slight difference is that the cells in Simulation III are positioned on a toroidal lattice, in order to eliminate complications arising from asymmetries of the network near its outer edges. The computation of adjacency relations between cells thus uses a modulus operation to allow "wraparound."

z⁺_jik(0) = { Z_B(1 + .01(2ℛ_jik − 1)) if ξ_j = ξ_i, η_j = η_i, λ_j = 1, λ_i = 2, and k = 0,

              Z_L(1 + .01(2ℛ_jik − 1)) exp(−φ(ξ_i − ξ_j)² − ψ(η_i − η_j)²)
                 if λ_j = λ_i = 2, i ≠ j, k = 1,
                 ((ξ_i − ξ_j) mod 5 ≤ 1 or (ξ_j − ξ_i) mod 5 ≤ 1), and
                 ((η_i − η_j) mod 3 ≤ 1 or (η_j − η_i) mod 3 ≤ 1),

              0 otherwise.    (12)

Equations Governing Cell Activity

The activity x_i of the ith cell in L₂ is governed by the following differential equation, which is a variant of equations studied in depth by Ellias and Grossberg (1975) and Grossberg (1972, 1982b). The simulations require interpolation of certain values, x̂_j, defined in eqns (18) and (19). For all i such that λ_i = 2, let

(d/dt) x_i = −A x_i
             + (B − x_i)( Σ_{j,k} σ(x̂_j(t − τ_k)) z⁺_jik + f(x_i) )
             − (C + x_i) Σ_j g(x_j) z⁻_ji,    (13)

where

σ(x) = α [ (1 + exp(m − s[x]⁺))⁻¹ − o ]⁺,    (14)

f(x) = [β x − Γ₁]⁺,    (15)

g(x) = [μ x − Γ₂]⁺,    (16)

τ_k = k.    (17)

The right-hand side of the differential equation consists of three main terms, each of which contributes to the rate of change of the activity level of an L₂ cell, described by the variable x_i. The first term, −Ax_i, is a decay term; whenever x_i is nonzero, this term tends to push it back toward zero. Even when all the other terms of the equation are zero, the cell's activity level will decay exponentially.

The second term, (B − x_i)(Σ_{j,k} σ(x̂_j(t − τ_k)) z⁺_jik + f(x_i)), is a shunting term that describes the effect of excitatory input on the cell's activity. The factor (B − x_i) multiplies the summed input; this factor keeps the effects of excitation bounded. If a cell's activity x_i becomes so high that it approaches the quantity B, then (B − x_i) approaches zero. Because (B − x_i) multiplies, or shunts, the remainder of the term, the effect of this term diminishes as x_i grows toward B. Thus, this term can never force x_i quite up to the level B. The total amount of excitatory input to a cell is computed by applying a sigmoid-shaped signal function, σ, to the time-delayed outputs of other cells, x̂_j(t − τ_k), then multiplying each result by the appropriate excitatory connection strength z⁺_jik, and then summing all the results. The signal function σ(x) is related via scaling factors to the sigmoid-shaped function 1/(1 + e⁻ˣ). In addition, each cell generates a certain amount of feedback excitation, f(x_i), which can autocatalytically sharpen the cell's activity level in contrast to the activity levels of other cells.

The third term, −(C + x_i) Σ_j g(x_j) z⁻_ji, also a shunting term, describes the effect of inhibition on the ith cell's activity level. The result of applying a signal function to the activity level of each cell, g(x_j), is multiplied by the inhibitory input connection strength to the ith cell, z⁻_ji. The individual inhibitory inputs are then added together and multiplied by the shunt factor −(C + x_i), which prevents x_i from declining below the quantity −C.

The effects of time-delays in the equation are approximated by recording the values of each x_i at discrete intervals and interpolating when necessary. The interpolation can be described by first defining the step functions

X_i(t) = x_i(⌊t⌋)    (18)

and then stipulating that the time-delayed values x̂_j(t − τ_k) be found during numerical integration by using the function

x̂_i(T) = { x_i(T) if T = ⌊T⌋,
           X_i(T) + (X_i(T + 1) − X_i(T))(T − ⌊T⌋)^p otherwise,    (19)

where the exponent p is chosen so that x̂_i(t) approximates a typical shape of x_i(t) on an interval [t, t + 1].

When each simulation commences, all cells are inactive and at equilibrium: x_i(0) = 0 and (d/dt)x_i(0) = 0.
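A forward-Euler step of eqn (13) for a single cell can be sketched as follows (Python; the step size, the input values, and the threshold-linear form assumed here for the feedback signal f are illustrative, not the paper's exact choices):

```python
def euler_step(x, exc, inh, dt=0.01, A=30.0, B=1.0, C=0.1,
               beta=1.0, gamma1=0.01):
    """One forward-Euler step of eqn (13) for a single L2 cell.

    exc: total delayed, sigmoid-weighted excitatory input
         (the sum over j, k of sigma(x_j(t - tau_k)) * z+_jik);
    inh: total inhibitory input (the sum over j of g(x_j) * z-_ji).
    The feedback f is assumed threshold-linear for illustration."""
    f_x = max(beta * x - gamma1, 0.0)
    dx = -A * x + (B - x) * (exc + f_x) - (C + x) * inh
    return x + dt * dx

# Sustained excitation drives x toward, but never past, the ceiling B;
# inhibition would instead pull it down toward -C.
x = 0.0
for _ in range(1000):
    x = euler_step(x, exc=60.0, inh=0.0)
```

The shunting structure is visible numerically: however large the excitatory input, the factor (B − x) throttles growth as x nears B, so the trajectory saturates below 1.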

Equations Governing Adaptation

Let the quantity z⁺_jik(t) represent the strength at time t of the kth excitatory connection from the jth cell to the ith cell. In Simulation

Page 29: Self-Organizing Neural Networks for Perception of …cns-web.bu.edu/~yazdan/pdf/Marshall1990motion.pdfSelf-Organizing Neural Networks for Perception of ... (Figure la). The aperture

S e l f - O r g a n i z a t i o n in M o t i o n P e r c e p t i o n 73

I, both the excitatory and inhibitory connection strengths are assumed to be constant: (d/dt)z⁺_jik(t) = (d/dt)z⁻_ji(t) = 0. In Simulation II, the inhibitory connection strengths are constant: (d/dt)z⁻_ji(t) = 0, but the excitatory connection strengths between each pair of cells can vary adaptively according to the correlations in activity of the pair. Excitatory connections in both Simulations II and III "compete" for a quantity E of target-cell connection sites according to a differential equation which is a variant of a learning rule proposed by Carpenter and Grossberg (1987a):

(d/dt) z⁺_jik(t) = ε S(x_i(t)) ( (E − z⁺_jik) σ(x̂_j(t − τ_k)) W_ji
                    − z⁺_jik Σ_{(h,l): h ≠ j or l ≠ k} σ(x̂_h(t − τ_l)) W_hi ),    (20)

where

S(x) = ([x]⁺)²    (21)

and where the weighting factors W_ji appearing in eqn (20) are constants specified by eqn (22).

The rate of change of each excitatory connection strength z⁺_jik depends on several factors. The first factor, ε, is a parameter that governs the time scale of the adaptation. It is chosen to be small in relation to the parameters in the cell-activity equation, so that each connection strength changes only gradually, reflecting statistical tendencies accumulated over many input presentations.

Because the second factor, S(x_i(t)), multiplies the entire right-hand side of the equation, z⁺_jik cannot change unless S(x_i(t)) is nonzero. When the ith cell is active, S(x_i(t)) is positive, and the cell's excitatory input connections are thereby permitted to vary. Thus, S is a sampling function, by which the activity level x_i(t) of the ith target cell controls the rate at which its excitatory input connection strengths can vary. Because the sampling function S(x_i(t)) for z⁺_jik depends on the activity of the target cell, x_i (as opposed to the activity of the source cell, x_j), this adaptive rule is classified as an instar rule (as opposed to an outstar rule) (Grossberg, 1982b). An instar rule was chosen in this case primarily because it permits each L₁ cell to maintain excitatory connections to more than one L₂ cell. An architecture using an outstar rule would have tended to force each L₁ cell to connect more strongly to one L₂ cell than to others, instead of connecting equally to all cells in an L₂ cluster.

The third factor consists of two main terms, (E − z⁺_jik) σ(x̂_j(t − τ_k)) W_ji and −z⁺_jik Σ_{(h,l): h ≠ j or l ≠ k} σ(x̂_h(t − τ_l)) W_hi, which implement a form of competition for target-cell connection sites. The factors (E − z⁺_jik) in the first term and −z⁺_jik in the second term serve to keep connection strengths bounded between 0 and E via a shunting mechanism. The factor σ(x̂_j(t − τ_k)) W_ji represents the activity-dependent "vigor" with which z⁺_jik "grabs for" connection sites, in competition with the vigor of all the other input excitatory connections to the ith cell, Σ_{(h,l): h ≠ j or l ≠ k} σ(x̂_h(t − τ_l)) W_hi.

This adaptive rule thereby produces the desired effects: when a cell is active (S(x_i) > 0), its excitatory input connections (z⁺_jik) from active cells (where σ(x̂_j(t − τ_k)) is large) become stronger, at the expense of the strengths of its excitatory input connections (z⁺_hil) from inactive cells (where σ(x̂_h(t − τ_l)) is small).
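One adaptation step of eqn (20) for a single target cell can be sketched as follows (Python; the dictionary layout and names are illustrative, and the weighting factors W are taken here as given scalars rather than computed from eqn (22)):

```python
def instar_step(z, sig, W, S_xi, E=1.0, eps=0.001, dt=1.0):
    """One Euler step of the competitive instar rule, eqn (20).

    z:    dict mapping (j, k) to the strength z+_jik of one target
          cell's excitatory input connections;
    sig:  dict mapping (j, k) to sigma(x_j(t - tau_k)), the delayed,
          sigmoid-transformed source signals;
    W:    dict mapping (j, k) to the weighting factor W_ji;
    S_xi: sampling term S(x_i(t)) of the target cell, eqn (21)."""
    total = sum(sig[jk] * W[jk] for jk in z)
    new_z = {}
    for jk in z:
        grab = (E - z[jk]) * sig[jk] * W[jk]        # this connection's vigor
        lose = z[jk] * (total - sig[jk] * W[jk])    # competing vigor of the rest
        new_z[jk] = z[jk] + dt * eps * S_xi * (grab - lose)
    return new_z
```

When the target cell is inactive (S_xi = 0), nothing changes; when it is active, connections from active sources grow toward E at the expense of those from silent sources.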

In Simulation III, the inhibitory weights vary according to the following equation, modified from an excitatory learning equation proposed by Grossberg (1980):

(d/dt) z⁻_ji = δ [x_j]⁺ (−z⁻_ji + V q(x_i)).    (23)

The sampling factor, [x_j]⁺, for the inhibitory connection z⁻_ji in this equation depends on the activity level of the source cell, x_j. Therefore, this inhibitory learning equation is an outstar rule (Grossberg, 1982b), unlike the excitatory learning equation described previously. Whenever the source cell is active (x_j > 0), the connection strength z⁻_ji tends to move toward the quantity represented by V q(x_i). The simplest possible form for q(x_i) is used in Simulation III:

q(x_i) = x_i.    (24)

Thus, when x_j and x_i simultaneously attain high values, the inhibitory connection z⁻_ji tends to strengthen. The strengthening then in turn reduces the probability that x_j and x_i will simultaneously attain high values in the future, because the two cells will be able to inhibit each other's activity more strongly. Conversely, when the jth cell is active but the ith cell is inactive, the connection z⁻_ji tends to weaken, so that the ith cell might become more active in the future. The parameter V controls the overall extent to which coactivation of the ith and jth cells is permitted. For example, if V is a large number, then the inhibition strengths will become correspondingly large, and the cells can be only weakly coactivated. The time scale of the equation is governed by the parameter δ, which is chosen so that δ << ε. This choice forces inhibitory strengths to become established very gradually, based on the overall response properties generated over time by excitatory connections.
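Per connection, eqns (23)-(24) form a source-gated tracking rule; a minimal sketch (names and parameter values are illustrative):

```python
def inhib_step(z_ji, x_j, x_i, V=7.0, delta=0.9, dt=0.01):
    """One Euler step of the inhibitory outstar rule, eqns (23)-(24).

    The source activity [x_j]+ gates learning; while the source fires,
    z-_ji tracks V * q(x_i), with q(x_i) = x_i per eqn (24)."""
    return z_ji + dt * delta * max(x_j, 0.0) * (-z_ji + V * x_i)

# Coactive cells (x_j and x_i both high) drive the inhibitory strength
# up toward V * x_i; an active source with a silent target drives it down.
```

A silent source (x_j ≤ 0) leaves the connection exactly unchanged, which is what makes this an outstar rather than an instar rule.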

Input Sequences

(I.) Input to the network in Simulation I was produced as follows: for all i such that λ_i = 1, let

x_i(t) = w_{j,k,l}(⌊t⌋),    (25)

where

j ≡ ξ_i,   k ≡ η_i,   and l ≡ s_i;    (26)

and let all w_{j,k,l}(t) = 0 for t = 1, 2, 3, …, except the following, which all equal 1:

w,,.~.,(1), w,.,.3(2), w23.5(3), u',,~(4), w~r(5), u,,.~.,(6),

w<<7(7), w7,48), w~sT(9), w,,.~(l(I), w s ( l l ) w,,,~(12).

(II.) In Simulation II, an input stimulus can begin its sweep across the network at random times. However, due to the randomness, there might occur gaps of many time-steps during which no input is presented to the network. In order to speed the simulation, such gaps are limited to a maximum of b time-steps. The equations below specify that gaps are compressed by finding the smallest index L_t for which some input stimulus is presented to the network, and by skipping ahead to that stimulus. Input to the network in Simulation II was produced in the following manner. For all i such that λ_i = 1, let

x_i(t) = w_{j,v}(⌊t⌋),    (27)

where

j ≡ ξ_i,   and v = ⌈|(i mod 6) − 2.5|⌉ × sgn((i mod 6) − 2.5),    (28)

and w_{j,v} is defined as follows. Let w_{j,v}(0) = 0 for all j, v. Then for t = 1, 2, 3, … and L = 1, 2, 3, …, define

y^L_{j,v}(t) = { max(0, w_{j−v,v}(t − 1)) if 0 ≤ j − v ≤ 5,
                max(0, λ_κ(ℛ^L_{j,v,t})) otherwise,    (29)

where L ∈ {1, 2, 3, …} and where

λ_κ(x) = { 1 if x < κ,
           0 otherwise.    (30)

Let

L_t = min{L : ∃ j, v such that y^L_{j,v}(t) ≠ 0, or
       (∃ t′ such that t − b < t′ ≤ t and
        ∃ j, v such that w_{j,v}(t′) ≠ 0)}.    (31)

Then for t = 1, 2, 3, …,

w_{j,v}(t) = y^{L_t}_{j,v}(t).    (32)

The following parameters were used to generate the input set for Simulation II: κ = 0.025, b = 3.

(III.) Input to Simulation III was produced in a similar manner. For all i such that λ_i = 1, let

x_i(t) = { 1 if ∃ d such that w_{j,k,d}(⌊t⌋) ≠ 0,
           0 otherwise,    (33)

where j ≡ ξ_i, k ≡ η_i, and d ∈ {−1, 0, 1}, and where w_{j,k,d}(t) is defined as follows. For t ∈ {0, 1, 2, …}, let w_{j,k,d}(0) = 0 for all j, k, d. Define


y^L_{j,k,d}(t) = max{0, w_{j−1,k−d,d}(t − 1) − 1,
                 λ_κ(ℛ^L_{j,k,d,t})[G + H ℛ′^L_{j,k,d,t}]}    (34)

where L ∈ {1, 2, 3, …} and where

λ_κ(x) = { 1 if x < κ,
           0 otherwise.    (35)

For t = 1, 2, 3 . . . . . let

L_t = min{L : ∃ j, k, d such that y^L_{j,k,d}(t) ≠ 0, or
       (∃ t′ such that t − b < t′ ≤ t and
        ∃ j, k, d such that w_{j,k,d}(t′) ≠ 0)}.    (36)

Then

w_{j,k,d}(t) = y^{L_t}_{j,k,d}(t).    (37)

The following parameters were used to generate the input set for Simulation III: κ = 0.0003, b = 2, G = 3, H = 10.

Parameters

The following table lists the parameters common to all the simulations:

A = 30              B = 1               C = 0.1
α = 900/994         β = 30 × 1.25       Γ₁ = 0.01
s = 1300/0.1287     m = 40              o = 0.006
μ = 3.38 × 642/105  Z_I = 1             E = 1
p = ~

Parameters that differed across the simulations are shown in the following table:

Parameter       Simulation I    Simulation II    Simulation III

𝒳⁽¹⁾ = 𝒳⁽²⁾     13              6                5
c⁽¹⁾            12              6                3
𝒴⁽¹⁾ = 𝒴⁽²⁾     4               1                1
c⁽²⁾            3               2                3
y               40,000          80,000           200,000
Z_B             0.67            1                1
Z_L             0.33            0.001            0.001
ε               -               211.25/0.9       105.625/0.9
W               -               0.16             0.48
δ               -               -                0.9
V               -               -                7