
Social Cognition and Connectionism 1

Connectionist Exploration in Social Cognition

Frank Van Overwalle

Vrije Universiteit Brussel, Belgium

Christophe Labiouse

Belgian NFSR Research Fellow & University of Liège, Belgium

Robert French

Université de Liège, Belgium

This research was supported by Grant OZR423 of the Vrije Universiteit Brussel to Frank Van

Overwalle and Grant HPRN-CT-2000-00065 of the European Commission to Robert French.

Address for correspondence: Frank Van Overwalle, Department of Psychology, Vrije Universiteit

Brussel, Pleinlaan 2, B - 1050 Brussel, Belgium; or by e-mail: [email protected].

Running Head: Connectionism and Social Cognition

[PUBSOCO]

8 May, 2023


Connectionist Exploration in Social Cognition

Abstract

Major findings in social cognition are reviewed and modeled from a connectionist

perspective. These findings are in the areas of categorization and base-rate neglect, impression

formation, primacy and recency in impression formation, assimilation and contrast, increased recall

for inconsistent information, discounting in causal attribution, attitude formation, central and

heuristic processing, cognitive dissonance and the use of reasoning heuristics. The majority of

these phenomena are illustrated with well-known experiments and simulated with an

auto-associative network architecture with a linear activation update and the delta learning algorithm for

adjusting the connection weights. All of the phenomena considered were successfully reproduced

in the simulations. Moreover, the proposed model is shown to be consistent with algebraic models

of impression formation (Anderson, 1981), causal attribution (Cheng & Novick, 1992) and attitude

formation (Ajzen, 1991). The discussion centers on how the particular simulation specifications

may be used to develop novel hypotheses for testing the connectionist modeling approach and,

more generally, for improving and unifying theorizing in the field of social cognition.


Connectionist modeling of theoretical and empirical data in social cognition has only

emerged during the last decade. This new approach arose from a certain dissatisfaction with

mainstream models and a growing concern for the limitations of these models. In particular, the

field suffered from a lack of theoretical integration. Inspired by the ever-increasing success of

connectionist models in cognitive psychology, a number of authors have turned to these models in

an attempt to provide a unified framework for social psychology research. In 1993 Read and

Marcus-Newhall wrote the first major article describing a connectionist model of causal reasoning.

Later, Smith (1996) forcefully argued for the application and the development of connectionist

ideas in social psychology.

Researchers have since made substantial progress in developing connectionist models of

diverse social psychological phenomena including person perception and group stereotyping

(Kunda & Thagard, 1996; Smith & DeCoster, 1998; Labiouse & French, 2001), causal attribution

(Van Overwalle, 1998; Read & Montoya, 1999), cognitive dissonance (Shultz & Lepper, 1996;

Van Overwalle & Jordens, 2001), group impression formation and change (Kashima, Woolcock, &

Kashima, 2000) and illusory correlation (Van Rooy & Van Overwalle, 2001c). However, there are

still large domains of social psychology that remain untouched by this new approach. Further,

insofar as each of the above articles focuses largely on a single domain of social psychology, the

field is still waiting for an overarching theoretical perspective.

In an attempt to provide an integrative account of the various fields within social

psychology, we will examine a number of mainstream findings in social cognition and will analyze

them from a common connectionist perspective. In the past, many findings in social cognition

have been explained by appeals to what often appear to be rather ad-hoc hypotheses and theories.

Moreover, various areas of the field, such as the “person perception”, the “impression formation”,

and the “intergroup relations” traditions, have unfortunately developed largely independently of

each other, despite the close conceptual connections of the topics. This has left the field with a

fragmentary theoretical basis. Our connectionist approach is an attempt to integrate some of these

theoretical areas into a more comprehensive whole. While we are aware that this is a major

undertaking, given the great power and flexibility of connectionist networks, as well as previous

successful attempts to model social data within this framework, we believe it is possible to use

these models to take a modest step towards the goal of unification of the field.


Many mainstream processes and findings in social cognition can be explained within a

connectionist framework and, in many cases, better than by the statistical or associative memory

models developed in the past. What are the main characteristics of the connectionist models that

accomplish this?

First, connectionist models exhibit emergent properties such as prototype extraction, pattern

completion, generalization, constraint satisfaction, and graceful degradation. (All of these are

extensively reviewed in Smith, 1996, and Rumelhart & McClelland, 1986). It is clear that these

characteristics are potentially useful for any account of social cognitive phenomena. In addition,

connectionist models assume that the development of internal representations and the processing of

these representations are carried out in parallel by simple and highly interconnected units, in contrast to

traditional models, in which processing is inherently sequential. As a result, these systems have no

need for a central executive, which removes the requirement, assumed by previous theories, of explicit

(central) processing of relevant social information. Consequently, information can, in principle, be

processed in an implicit and automatic manner without recourse to explicit conscious reasoning.

This does not, of course, preclude people’s being aware of the outcome of these preconscious

processes.

Second, neural networks are not fixed models but are able to learn over time, usually by

means of a simple learning algorithm that progressively modifies the strength of the connections

between the units making up the network. The fact that most traditional models in social

psychology are incapable of learning is a significant restriction. Interestingly, the ability to learn

incrementally puts connectionist models in broad agreement with developmental and evolutionary

pressures.

Third, connectionist networks have a degree of neurological plausibility that is generally

absent in previous statistical approaches to information integration and storage (e.g., Anderson,

1981; Cheng & Novick, 1992; Fishbein & Ajzen, 1975). While it is true that connectionist models

are highly simplified versions of real neurological circuitry and processing, it is commonly

assumed that they reveal a number of emergent processing properties that real human brains also

exhibit. One of these emergent properties is the integration of long-term memory (i.e., connection

weights), short-term memory (i.e., internal activation) and outside information (i.e., external

activation). There is no clear separation between memory and processing as there is in traditional


models. Even if biological constraints are not strictly adhered to in connectionist models of social

cognition (e.g., persuasion, prejudice, …), concerns about the biological implementation of social

cognitive mechanisms have indeed started to emerge (Adolphs & Damasio, 2001; Allison, Puce &

McCarthy, 2000; Ito & Cacioppo, 2001; Cacioppo, Berntson, Sheridan & McClintock, 2000;

Phelps, O’Connor, Cunningham, Funayama, Gatenby, Gore & Banaji, 2000) and parallel the

increasing attention paid to neurophysiological determinants of social behavior. Other emergent

properties of the connectionist approach will be explained in more depth in the next section.

This article is organized as follows: First, we will describe the proposed connectionist

model in some detail, giving the precise architecture, the general learning algorithm and the

specific details of how the model processes information. In addition, a number of other less well-

known emergent properties of this type of network will be discussed. We will then present a series

of simulations, using the same network architecture applied to a number of significantly different

phenomena. These phenomena involve categorization (base-rate neglect), impression formation

(primacy, recency and memory advantage for inconsistencies), assimilation and contrast (of traits

and exemplars), causal attribution (covariation, discounting and augmentation), attitudes

(formation and cognitive dissonance) and judgmental heuristics. We will also very briefly discuss

related work on group judgments (illusory correlation, ingroup versus outgroup differences).

Our review of empirical phenomena in the field is not meant to be exhaustive, but is rather

designed to illustrate how connectionist principles can be used to shed light on the processes

underlying social cognition. While the emphasis of the present article is on the use of a particular

connectionist model to explain a wide variety of phenomena in social cognition, previous

applications of connectionist modeling to social psychology (Smith & DeCoster, 1998; Read &

Montoya, 1999; Van Overwalle, 1998) are also mentioned. In addition, we will compare

different models. Finally, we will discuss the limitations of the proposed

connectionist approach and discuss areas where further theoretical developments are under way or

are needed. Ultimately, what we would like to accomplish in this paper is to create a greater

awareness that connectionist principles could potentially underlie diverse social cognitive

phenomena.

A Recurrent Model

Throughout this paper, we will use the same basic network model, namely the recurrent

auto-associator developed by McClelland and Rumelhart (1985). This model has already gained

some familiarity among social psychologists studying person and group impressions (Smith &

DeCoster, 1998) and causal attribution (Read & Montoya, 1999). We decided to apply a single

basic model to emphasize the theoretical similarities that underlie a great variety of processes in

social cognition. In particular, we chose this model because it is capable of reproducing a wider

range of phenomena than other connectionist models, like feedforward networks (see Read &

Montoya, 1999) or constraint satisfaction models such as Thagard's ECHO (Van Overwalle, 1998).

The auto-associative network can be distinguished from other connectionist models on the

basis of its architecture (the elements of the model) and its learning algorithm (how information is

processed in the model). We will discuss these points in turn.

Architecture

The generic architecture of an auto-associative network is illustrated in Figure 1A. Its most

salient property is that all nodes are interconnected with all of the other nodes. Thus, all nodes

send out and receive activation.

Information Processing

In a recurrent network, processing information takes place in two phases. During the first

activation phase, each node in the network receives activation from external sources. Because the

nodes are interconnected, this activation is spread throughout the network in proportion to the

weights of the connections to the other nodes. The activation coming from the other nodes is

called the internal input (for each node, it is calculated by summing the weighted activations arriving at that

node). This activation is further updated during a number of cycles through the network. Together

with the external input, this internal input determines the final pattern of activation of the nodes,

which reflects the short-term memory of the network. Typically, activations and weights have

lower and upper bounds of –1 and +1.

In the linear version of activation spreading in the auto-associator that we use here, the final

activation at each cycle is the linear sum of the external and internal input. In non-linear versions

used by other social researchers (Smith & DeCoster, 1998; Read & Montoya, 1999), the final

activation is determined by a non-linear combination of external and internal input. During our

simulations, however, we found that the linear version with a single internal updating cycle often


reproduced the observed data better. Therefore, we used the linear variant of the auto-associator

for all the reported simulations. We will discuss later why the linear variant might have been more

efficient.
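As an illustration only, the linear activation phase described above can be sketched in Python. The sketch assumes Estr = Istr = Decay = 1 and a single internal cycle, so that each node's final activation is its external input plus the weighted sum of the external activations of the other nodes. It is not the authors' code, and the exact update equations (given in Appendix A) may differ in detail. The function names internal_input and activate, and the weight values, are hypothetical.

```python
# Sketch of the linear activation phase of an auto-associator (assumption:
# Estr = Istr = Decay = 1, one internal cycle, no self-connections).
# weights[j][i] is the connection from sending node j to receiving node i.

def internal_input(weights, ext):
    """Weighted sum of the activation each node receives from the other nodes."""
    n = len(ext)
    return [sum(weights[j][i] * ext[j] for j in range(n) if j != i)
            for i in range(n)]


def activate(weights, ext):
    """Final activation after one cycle: external input plus internal input."""
    internal = internal_input(weights, ext)
    return [e + h for e, h in zip(ext, internal)]


# Hypothetical network: two feature nodes (0, 1) and one category node (2).
W = [[0.0, 0.0, 0.6],   # feature 0 -> category
     [0.0, 0.0, 0.2],   # feature 1 -> category
     [0.6, 0.2, 0.0]]   # category -> features
print(activate(W, [1.0, 0.0, 0.0]))  # prime feature 0 only
# → [1.0, 0.0, 0.6]
```

Priming feature 0 passes activation up to the category node in proportion to the connection weight (0.6). Weights are not clipped to the range of -1 to +1 in this sketch.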

After the first activation phase, the recurrent model enters the second learning phase in

which the short-term activations are consolidated into long-term weight changes of the connections.

Basically, these weight changes are driven by the error between the internal input generated by the

network and the external input received from outside sources. This error is reduced in proportion

to the learning rate that determines how fast the network changes its weights (typically between .01

and .50). This error reducing mechanism is known as the delta algorithm (McClelland &

Rumelhart, 1988).

Thus, when the network overestimates the external input of a node, this means that this

node received too much internal input from the other nodes through their connections. To adjust

this, the delta algorithm decreases the weights of these connections. Conversely, when the network

underestimates the external input, this means that it received too little internal input and the

weights are increased. These weight changes allow the network to better approximate the external

input. Thus, the delta algorithm strives to match the internal predictions of the network as closely

as possible to the actual state of the external environment, and stores this information in the

connection weights.
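A minimal sketch of one learning trial under the delta algorithm follows. As a simplifying assumption, the sending activation in the weight update is taken to be the external input of the sending node. Each weight from node j to node i therefore changes by the learning rate times the error of node i (its external input minus its internal input) times the external input of node j. The function delta_update is hypothetical and is only one of several ways to write this rule.

```python
# Sketch of one delta-rule learning trial in a linear auto-associator.
# Simplifying assumption: the sending activation is the external input, so
#   dw[j][i] = rate * (ext[i] - internal[i]) * ext[j]

def internal_input(weights, ext):
    """Weighted sum of the activation each node receives from the other nodes."""
    n = len(ext)
    return [sum(weights[j][i] * ext[j] for j in range(n) if j != i)
            for i in range(n)]


def delta_update(weights, ext, rate):
    """Return a new weight matrix after one learning trial."""
    internal = internal_input(weights, ext)
    n = len(ext)
    new = [row[:] for row in weights]
    for j in range(n):          # sending node
        for i in range(n):      # receiving node
            if i != j:
                # error > 0: node i underestimated; error < 0: overestimated
                error = ext[i] - internal[i]
                new[j][i] += rate * error * ext[j]
    return new


W0 = [[0.0, 0.8],
      [0.8, 0.0]]
# Node 0 on, node 1 off: node 1 is overestimated (0.8 > 0), so 0 -> 1 weakens.
W1 = delta_update(W0, [1.0, 0.0], rate=0.25)
# Both nodes on: node 1 is now underestimated (0.6 < 1), so 0 -> 1 strengthens.
W2 = delta_update(W1, [1.0, 1.0], rate=0.25)
print(W1[0][1], W2[0][1])
```

In the first trial, the weight from the absent node 1 to node 0 is unchanged because the sending activation is zero.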

Structural and Dynamic Connections

The social phenomena that we analyze can be subdivided into two main classes: structural

processes that focus on stable attributes of social actors and objects (categorization, impression

formation, generalization, assimilation and contrast) and dynamic processes that focus on causal

consequences (attributions and attitudes). Although these two processes may differ somewhat at

the surface level, we believe that they are very similar in their logical structure in that they both

reflect predictions from features to categories (e.g., from behaviors to trait categories and from

causes to events).

For structural processes, the prediction involves a category or attribute, for instance, the

trait of a person or the stereotype of a group. For dynamic processes, the prediction involves the

outcome of a cause or the behavior toward an attitude-object. To illustrate these two types of

predictions, imagine that when meeting an unfamiliar individual, a perceiver may wonder to what


type of group the individual belongs (stereotyping or structural prediction) and what he or she

might do next (behavioral outcome or dynamic prediction).

Typically, the prediction goes from low-level features, exemplars, causes or attitude-objects

to higher-level abstractions such as categories and outcomes. These predictions are graphically

illustrated in Figures 1B and 1C for structural and dynamic relations. To visualize the direction of

prediction, we have drawn the (low-level) features that serve as input at the bottom layer of the

architecture, and the (high-level) predicted categories that serve as output at the top layer. We will

consistently use this direction of representation in the illustrations.

Basic Emergent Connectionist Principles

Before moving on to the social phenomena of interest, it is essential to briefly discuss the

basic principles or mechanisms that drive many of our simulations. These principles are the

emergent properties of the delta-learning algorithm and include acquisition, competition, and

diffusion. Some of these principles have already been documented in prior social connectionist

work (Van Overwalle, 1998; Van Overwalle & Van Rooy, 1998, 2001a, 2001b). However,

because they are essential for understanding our examples, we will describe these principles first

and discuss their application to social cognition in more detail later during the simulations.

Acquisition and Sample Size Effect

The acquisition principle involves sample size effects that have been documented in many

areas of social cognition. For instance, when receiving more supportive information, people tend

to hold more extreme impressions about other persons (Anderson, 1967, 1981), make more

extreme causal judgments (Baker, Berbier & Vallée-Tourangeau, 1989; Försterling, 1992; Shanks,

1985, 1987, 1995; Shanks, Lopez, Darby & Dickinson, 1996), make more polarized group

decisions (Fiedler, 1996; Ebbesen & Bowers, 1974), endorse a hypothesis more firmly (Fiedler,

Walther & Nickel, 1999), make more extreme predictions (Manis, Dovalina, Avis & Cardoze,

1980) and agree more with persuasive messages (Eagly & Chaiken, 1993).

One of the most striking characteristics of connectionist models using the delta algorithm is

that learning is modeled as a gradual on-line process of adjusting existing knowledge to novel

information. This characteristic has already been exploited in the earlier associative learning

models that preceded connectionism, such as the popular Rescorla-Wagner (1972) model of animal


conditioning and human contingency judgments.

This model predicts that when a cue (i.e., conditioned stimulus) is followed by an effect

(i.e., unconditioned stimulus), the organism integrates this information resulting in a stronger cue-

effect association and more vigorous responding when the cue is present. In humans, this also

results in stronger judgments of the causal influence of the cue (see Baker et al., 1989; Shanks,

1985, 1987, 1995; Shanks et al., 1996; Van Overwalle & Van Rooy, 2001a). Likewise, the delta

algorithm predicts that the more information that is received on the joint presence of a feature and

a category, the stronger their connection weight will become. This results in a pattern of increasing

weights as more pieces of information are processed, or a sample size effect (see illustration in

Figure 2A). In contrast, conventional probabilistic and statistical models of causality (e.g., Cheng

& Novick, 1992; Försterling, 1989) and attitude formation (Ajzen, 1991) do not predict a gradual

increase of judgments and remain silent with respect to sample size effects.

How are on-line learning and the sample size effect achieved in connectionist models? Given

the assumption that the connection weights are initially set to zero (or to some arbitrarily small

value), the effect is that in the beginning phases of learning, the connection weights are relatively

modest and often inaccurate, and only grow more accurate (stronger or weaker, positive or

negative) when more information is received. The reason for this incremental learning is that the

error in the delta algorithm is only gradually minimized as regulated by the learning rate. Even

when the covariation between a feature and a category is perfect, the learning rate dictates that the

weights connecting the two will increase by only a fraction of the remaining error. Thus, it takes multiple

repetitions of the same information before a strong weight emerges.

Figure 2A depicts a system with a learning rate of 0.20. This means that the error of

underestimating a perfect correlation is corrected gradually by increasing the weight by 20% of the

remaining error on each trial. As can be seen, because feature A is always paired with a category (i.e., perfect

correlation), its connection weight increases gradually at each trial, starting at 0.20 and

approaching its maximum value of +1 after a number of trials. Several researchers have noted that

given a sufficient number of trials, the delta learning algorithm converges to the same predictions

as conventional probabilistic and statistical models (Chapman & Robbins, 1990; Van Overwalle,

1996; Sarle, 1994).
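The arithmetic behind Figure 2A can be checked with a few lines of code. The sketch assumes a single feature-category connection that starts at 0, a learning rate of 0.20, and a category that is present on every trial. Under these assumptions, each trial adds 20% of the remaining error, so the weight after n trials is 1 - 0.8^n. It covers this special case of the delta rule only, and acquisition_curve is a hypothetical helper.

```python
# Acquisition (sample size) effect for one feature that is always paired
# with its category: delta rule, initial weight 0, learning rate 0.20.

def acquisition_curve(rate=0.20, trials=10):
    """Weight after each trial, rounded to three decimals."""
    w, history = 0.0, []
    for _ in range(trials):
        internal = w * 1.0          # feature is on (activation 1)
        error = 1.0 - internal      # category is on (external input 1)
        w += rate * error * 1.0     # move 20% of the way toward the target
        history.append(round(w, 3))
    return history


print(acquisition_curve(trials=5))
# → [0.2, 0.36, 0.488, 0.59, 0.672]
```

The increments shrink as the error shrinks, so the weight approaches +1 without exceeding it. The closed form 1 - (1 - rate)^n holds only for a perfectly correlated feature presented on its own.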


Competition and Discounting

Another essential property of the delta algorithm is that it gives rise to competition between

connections. This competition principle favors features or causes that are more predictive or

diagnostic than others, which are disfavored. The term competition stems from the associative

learning literature on animal conditioning and causality judgments mentioned earlier (Rescorla &

Wagner, 1972; Shanks, 1995), and should not be confused with other usages in the connectionist

literature such as competitive networks (McClelland & Rumelhart, 1988). A typical example of

competition is discounting in causal attribution. When one cause acquires strong causal weight,

perceivers tend to ignore alternative causes (Hansen & Hall, 1985; Kruglanski, Schwartz, Maides,

& Hamel, 1978; Rosenfield & Stephan, 1977; Van Overwalle & Van Rooy, 1998; Wells & Ronis,

1982).

Competition is a basic property of associative learning models like that of Rescorla and

Wagner (1972), where it is known as blocking. In fact, one of the reasons for the wide popularity

of the Rescorla-Wagner model is that it was among the first conditioning models that were able to

predict this property. As noted by several researchers (Read & Montoya, 1999; Van Overwalle,

1998), the delta algorithm makes similar predictions.

How does this property work in connectionist models? The basic mechanism behind

competition is that the internal activation of an outcome node is determined by the sum of the

activations received from all connecting causal nodes (see Figure 2B). As the connection of cause

A is already relatively strong, it sends a great deal of activation to the outcome node. Any

additional activation from an alternative cause B leads to over-activation of the outcome node and

increased error, and therefore blocks any growth of connection strength of cause B.
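Blocking can be illustrated with the same delta rule restricted to the cause-to-outcome connections. This is a simplification: the connections between the causes themselves, and the backward connections, are omitted. The hypothetical function train sums the weights of the causes present on a trial to obtain the internal input of the outcome node, then adjusts only those weights. The numbers of trials and the learning rate of 0.2 are arbitrary choices.

```python
# Competition (blocking) sketch: causes A and B feed a single outcome node.

def train(trials, rate=0.2, w=None):
    """Delta-rule training; each trial lists the causes present with the outcome."""
    w = dict(w) if w else {"A": 0.0, "B": 0.0}
    for present in trials:
        internal = sum(w[c] for c in present)  # activation reaching the outcome
        error = 1.0 - internal                 # the outcome occurs on every trial
        for c in present:                      # only present causes learn
            w[c] += rate * error
    return w


# Blocking condition: A alone is paired with the outcome, then A and B together.
blocked = train([("A", "B")] * 10, w=train([("A",)] * 20))
# Control condition: A and B are only ever presented together.
control = train([("A", "B")] * 10)
print(round(blocked["B"], 3), round(control["B"], 3))
```

In the blocking condition, A already predicts the outcome almost perfectly, so little error is left for B and its weight stays near zero. In the control condition, A and B share the prediction and B's weight approaches 0.5.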

Diffusion and Memory for Inconsistent Information

Still another property of the delta algorithm is that it is responsible for the weakening of

connections when a single node is connected to many nodes that are only occasionally activated.

In spreading activation models of memory, this property is known as the fan effect (Anderson,

1976), although its underlying mechanism is fundamentally different from the diffusion property.

In the associative learning and connectionist literature, this is a novel property that, to our

knowledge, was not detected or mentioned earlier. The diffusion property is introduced to


explain lower recall for consistent as compared to inconsistent information in impression formation

(Hastie & Kumar, 1979) and illusory correlation (Hamilton, Dugan & Trolier, 1985).

A typical example is a trait that implies many behaviors of which only a few are actually

present at a given time. The more behaviors that imply the same trait, the weaker each of the

trait-behavior associations becomes. From a connectionist view, the reason is that while only a few

behaviors are activated (and their connections strengthened), all other possible trait-implying

behaviors are absent and thus remain inactivated, leading to a reduction of their connection weight

with the trait.

How does this diffusion principle explain enhanced recall of inconsistent information? As

illustrated in Figure 2C, compared to many consistent behaviors that imply trait T1, inconsistent

behaviors that imply trait T2 are, by definition, smaller in number. Hence, inconsistent connections

(with T2) are less often inactivated, and thus less often weakened, than consistent

connections (with T1). This unequal weakening or diffusion therefore leads to enhanced recall for

inconsistent information.
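The diffusion principle can be sketched in the same way. The example below is hypothetical. Trait T1 is implied by four consistent behaviors, only one of which is present on any T1 trial. Trait T2 is implied by a single inconsistent behavior that is present on each of its rarer trials. Only the trait-to-behavior connections are modeled, and the trait node's activation is fixed at 1. Absent behaviors have an external input of 0, so their connections with the active trait are weakened.

```python
# Diffusion sketch: many consistent behaviors share trait T1, one inconsistent
# behavior belongs to trait T2. Only trait -> behavior weights are modeled.

def diffusion(rate=0.2, rounds=20):
    consistent = ["c1", "c2", "c3", "c4"]
    w = {b: 0.0 for b in consistent}   # T1 -> each consistent behavior
    w_inc = 0.0                        # T2 -> the inconsistent behavior
    for _ in range(rounds):
        for shown in consistent:       # four T1 trials per round
            for b in consistent:
                target = 1.0 if b == shown else 0.0   # absent behaviors: 0
                w[b] += rate * (target - w[b])        # internal input = w[b] * 1
        w_inc += rate * (1.0 - w_inc)  # one T2 trial per round
    return w, w_inc


w, w_inc = diffusion()
print({b: round(v, 2) for b, v in w.items()}, round(w_inc, 2))
```

Each consistent connection is strengthened on one T1 trial and weakened on the other three. It therefore settles well below the inconsistent connection, which is never weakened in this design.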

Overview of the Simulations

Simulated Phenomena

We applied the three emergent connectionist processing principles to a number of classic

findings in the social cognition literature. For explanatory purposes, most often, we replicated a

well-known experiment that illustrates a particular phenomenon, although we occasionally also

simulated a theoretical prediction. Table 1 lists the topics of the simulations to be reported shortly,

the relevant empirical study or theory that we attempted to replicate, as well as the major underlying

processing principle responsible for reproducing the data. Although not all relevant data in social

cognition can be addressed in a single paper, we are confident that we have included some of the

most relevant phenomena in the current literature.

General Methodology

We basically used the same methodology throughout the simulations. The particular

conditions and trial orders of the target experiments were reproduced as faithfully as possible,

although sometimes minor changes were introduced to simplify things (e.g., fewer trials than in the

actual experiments). When a random trial order was used, we ran the auto-associative network 50


times with a different random order and averaged the results.
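The averaging over random trial orders can be sketched as follows. The network is again reduced to cue-to-outcome connections. The design (cue A followed by the outcome on 6 of 8 trials), the learning rate, and the fixed random seed are hypothetical choices for illustration; only the number of runs (50) comes from the text.

```python
import random


def train(trials, rate):
    """Delta-rule training of cue -> outcome weights (other connections omitted)."""
    w = {}
    for cues, outcome in trials:
        internal = sum(w.get(c, 0.0) for c in cues)
        error = outcome - internal
        for c in cues:
            w[c] = w.get(c, 0.0) + rate * error
    return w


def averaged_test(trials, rate, cue, runs=50, seed=1):
    """Mean test activation of the outcome when `cue` alone is primed,
    averaged over `runs` independently shuffled trial orders."""
    rng = random.Random(seed)          # seeded for reproducibility
    total = 0.0
    for _ in range(runs):
        order = list(trials)
        rng.shuffle(order)
        total += train(order, rate).get(cue, 0.0)  # priming cue -> its weight
    return total / runs


# Hypothetical design: cue A is followed by the outcome on 6 of 8 trials.
trials = [(("A",), 1.0)] * 6 + [(("A",), 0.0)] * 2
print(round(averaged_test(trials, rate=0.2, cue="A"), 3))
```

Averaging matters because the delta rule gives more weight to recent trials. A single order can therefore yield an atypically high or low weight.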

All parameters of the auto-associative model, except the learning rate, were kept fixed for

all simulations (Estr = Istr = Decay = 1, and internal cycles = 1; see McClelland & Rumelhart,

1988). We did not impose a common learning rate because of the different contexts, measures and

procedures used in the experiments. Rather, we freely selected a learning rate value that provided

the highest correlation with the observed data of each simulation, after examining all admissible

parameter values (see Gluck & Bower, 1988; Nosofsky, Kruschke & McKinley, 1992). In most

cases, the selected learning rate was quite robust. In other words, increasing or decreasing this

parameter had little effect on the simulations. Increasing the rate further was problematic only in a

few cases where the original learning rate was already high ( .28), because the weights grew out of

bounds (e.g., far beyond +1). The technical details of the auto-associative

model are given in Appendix A.
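The selection of the learning rate can be sketched as a grid search that keeps the rate whose simulated pattern correlates best with the observed means. Everything specific here is hypothetical: the toy simulation (a weight after n consistent trials), the observed values, and the grid step of .01 over the .01 to .50 range mentioned earlier.

```python
# Sketch: choose the learning rate that maximizes the Pearson correlation
# between simulated and observed values (all data here are hypothetical).

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate(rate, sizes):
    """Toy simulation: weight after n consistent delta-rule trials."""
    return [1.0 - (1.0 - rate) ** n for n in sizes]


def best_rate(observed, sizes, grid):
    return max(grid, key=lambda r: pearson(simulate(r, sizes), observed))


sizes = [1, 2, 4, 8]                     # number of supportive items
observed = [2.1, 3.4, 5.0, 6.2]          # hypothetical mean judgments
grid = [k / 100 for k in range(1, 51)]   # .01, .02, ..., .50
rate = best_rate(observed, sizes, grid)
print(rate, round(pearson(simulate(rate, sizes), observed), 3))
```

Because correlation ignores scale, only the shape of the simulated pattern matters at this step. A robust rate shows up as a flat correlation profile around the chosen value.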

At the end of each simulated experiment or experimental condition, test trials were run in

which certain nodes of interest were turned on and the resulting activation in other nodes was

recorded to evaluate our predictions or to compare with observed experimental data. This will be

explained in more detail for each simulation. Except when otherwise noted, the obtained test

activations were projected onto the observed data using linear regression (with a positive slope), to

visually demonstrate the fit of the simulations to data. The reason for the use of this technique is

that most often only the pattern of test activations is of interest, rather than the exact values.
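The projection of test activations onto the observed data can be sketched as an ordinary least-squares regression of the observed values on the simulated activations. Rejecting a fit whose slope is not positive is our reading of "with a positive slope" and is an assumption. The example values are hypothetical.

```python
# Sketch: project simulated test activations onto the observed scale with
# ordinary least squares (observed = a + b * simulated), requiring b > 0.

def project(sim, obs):
    n = len(sim)
    mx, my = sum(sim) / n, sum(obs) / n
    sxx = sum((x - mx) ** 2 for x in sim)
    sxy = sum((x - mx) * (y - my) for x, y in zip(sim, obs))
    b = sxy / sxx
    if b <= 0:
        raise ValueError("non-positive slope: simulated pattern does not match")
    a = my - b * mx
    return [a + b * x for x in sim]


simulated = [0.1, 0.2, 0.4]   # hypothetical test activations
observed = [3.0, 4.0, 6.0]    # hypothetical mean ratings
print([round(v, 2) for v in project(simulated, observed)])
# → [3.0, 4.0, 6.0]
```

The transformation is linear and increasing. It therefore changes only the scale of the simulated values, not their rank order or their correlation with the data.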

Structural Relationships

Categorization

Perhaps one of the most basic learning processes in social cognition is categorization, or the

grouping of diverse information into meaningful concepts or categories that contain similar

features (e.g., objects), functions (e.g., roles) or members (e.g., social groups). The categorization

process promotes cognitive economy and organization, which enables us to go beyond the current

information given and to plan our behavior and interaction with the external environment.

In recent approaches to categorization, members of a (social) category are not defined by

strict criteria of necessity or sufficiency, but rather by a degree of typicality or representativeness.

The process by which typicality is derived is most often described in terms of either a prototype or


an exemplar approach. According to the prototype approach, learners abstract a central tendency

of each category and then classify instances according to their similarity to the category's central

prototype (e.g., Rosch, 1978). In contrast, no such average or ideal prototype is assumed in the

exemplar approach, where the categorization of an object depends on its similarity to the memory traces

of all instances in the category (Fiedler, 1996; Hintzman, 1986; Medin & Shaffer, 1978;

Nosofsky, 1986; Smith & Zárate, 1992).

Simulation 1: Categorization

How does a recurrent model simulate categorization? As we have seen, during learning, the

delta algorithm changes the weights between the object's features and the category so that they

better predict category membership. By this error-reducing process, the weights reflect a sort of

average link between a category and its features, that is, all instances are effectively "superimposed"

or abstracted into a prototype.

Let us illustrate the connectionist properties of feature similarity and prototype abstraction

with the network example shown in Figure 3. This network has four feature nodes and two

category nodes. Imagine that we are visiting Brussels and want to know whether an

inhabitant is Flemish or Walloon. Probably the best criterion is the language being used: either

Dutch (Flemish) or French (Walloon). However, because we may not always be able to hear these

people talk, there are other less perfect features we may rely on: A Fleming is often perceived as

having simple tastes and as less sophisticated, refined or cultured than a Walloon.

Table 2 shows a simulated learning experience in which we perceive each of these features

a number of times and are also told the correct category (e.g., by our host). As can be seen, the

perfect features are always paired with their own category, whereas the imperfect features are often

absent even when the person can be categorized as Flemish or Walloon.

The results of this simulation are illustrated in the top panel of Figure 4. In addition, this

figure also depicts predictions from probabilistic theory and empirical data from Gluck and Bower

(1988, Experiment 1). In this experiment, subjects were given a medical diagnosis task in which

they had to learn to diagnose one of two diseases (i.e., categories) on the basis of four symptoms

(i.e., features). Our simulation is a simplified version of the learning trials given to the subjects in

Gluck and Bower's experiment. As can be seen, in Gluck and Bower's (1988) data, there was a

clear preference for the perfect category ("Dutch" in our example) that went above the 50% base-


rate, illustrating base-rate neglect.

In the simulation, to measure the typicality of features with respect to a category, each

feature is activated or primed (see bottom panel of Table 2). This activation is automatically

spread upward to the category nodes, and the degree of category activation reflects the typicality of

the feature for that category. For instance, speaking Dutch is strongly related to the Flemish

category so that priming of this feature will strongly activate the Flemish category. Likewise,

speaking French is strongly related to the Walloon category and priming of this feature will

strongly activate the Walloon category.

To measure the preference of one category over the other, we considered the difference

between the resulting activation of the Flemish category node and the activation of the Walloon

category node (see Table 2). To map these simulation results onto the proportional data of Figure 4

(where 50% reflects an equal preference for both categories), the simulated data were regressed on

the observed data. The intercept in this simulation was held constant at .50. Hence, simulation

results above .50 reflect a preference for the Flemish category, while simulation results below .50

reflect a preference for the Walloon category. Not fixing the intercept in this manner would

conceal the relative preference for the Flemish or Walloon category.
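With the intercept fixed at .50, the mapping reduces to a regression through the point (0, .50). The preference score x (Flemish activation minus Walloon activation) is multiplied by a least-squares slope b = Σ x(y - .50) / Σ x². The sketch below uses hypothetical activations and proportions, and fit_fixed_intercept is a hypothetical helper.

```python
# Sketch: map preference scores (Flemish minus Walloon activation) onto
# proportions with the regression intercept held constant at .50.

def fit_fixed_intercept(x, y, intercept=0.50):
    """Least-squares slope through (0, intercept) and the fitted values."""
    b = sum(xi * (yi - intercept) for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    return b, [intercept + b * xi for xi in x]


# Hypothetical preference scores and observed proportions of "Flemish" choices.
pref = [0.8, 0.2, -0.2, -0.8]
observed = [0.9, 0.6, 0.4, 0.1]
slope, fitted = fit_fixed_intercept(pref, observed)
print(round(slope, 3), [round(v, 2) for v in fitted])
# → 0.5 [0.9, 0.6, 0.4, 0.1]
```

A score of zero maps exactly onto .50, so a fitted value above or below .50 still shows which category is preferred. A free intercept could shift that neutral point.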

As one would expect, Figure 4 (top panel) reveals that a perfect Flemish feature (e.g., Dutch) gives rise to the highest activation in the Flemish category. Similarly, a perfect Walloon feature (e.g., French) gives rise to the highest activation in the Walloon category (as indicated by the lowest score in Figure 4). Imperfect features produce activations that lie between these two extremes, as they are ambivalent predictors of category membership. As can be seen, the simulations fit the findings of Gluck and Bower (1988, Experiment 1) closely, and better than the probabilistic predictions do.

The bottom panel of Figure 4 illustrates the prototype of each category. To measure the

category prototype, the category node is primed (see bottom panel of Table 2). This activation

automatically spreads downward to the relevant feature nodes, and the resulting activation pattern

of the features reflects the prototype of the primed category. (These downward connections are not

shown in Figure 3 but are roughly equivalent to the upward connections.) Thus, for instance, to

measure the prototypical Flemish features, the Flemish category is primed and the resulting

activation of the features reflects the prototype.


As one might expect, Figure 4 (bottom panel) reveals that the prototype consists predominantly of the category's perfect feature and less so of its imperfect feature. Because features of the other category were also present in our simulation, they are also part of the prototype, although to a much weaker degree. Thus, the prototype is flexible: it may include relatively rare features, although these features are clearly less typical of the category.

Base-Rate Neglect in Categorization

One of the reasons why we took the Brussels example is that the distribution of the social

categories is unequal: There are many more Walloons than Flemish living in Brussels. Gluck and Bower's (1988) original study had a similar imbalance between common and rare disease categories. Given such an unequal distribution, the connectionist approach makes an interesting prediction that is intuitively plausible but difficult to explain with other approaches.

This prediction is base-rate neglect. Perceivers often place more emphasis on the

diagnosticity or similarity of features and neglect the normative probabilities of the features'

occurrence in making categorical judgments (Gluck & Bower, 1988; Kruschke, 1996). For

instance, although Table 2 shows that the probability that a Dutch-speaking Brussels inhabitant is Flemish is equal to the probability that he or she is Walloon (i.e., 30 cases in each category), people tend to rely more on Dutch as an informative cue to make a Flemish categorization. Intuitively, this makes sense. Dutch is a better predictor of being Flemish because, for Flemish inhabitants, it is often the best cue available. In contrast, Dutch is not a good predictor of being Walloon, as for Walloons it is always accompanied by other, more predictive Walloon features.

People's reliance on the most predictive feature regardless of normative probability has been

explained in the past by the operation of the representativeness heuristic, or by people's use of

similarity rather than probability to categorize.

In the social domain, the preference for diagnostic information is revealed in trait inferences, where people rely more on some types of behaviors than on others (Reeder & Brewer, 1979). For morality traits, people draw inferences more readily from negative behavior (e.g., lying), whereas for ability traits, people draw inferences more readily from positive (e.g., successful) behavior. This has been explained by the fact that immoral behaviors are rarer and more distinctive, and thus more informative for morality judgments, while high-ability behaviors are more distinctive and thus more informative for ability inferences.


A connectionist network can predict this base-rate neglect in a straightforward manner. Consider the results of the simulation in the top panel of Figure 4. As noted earlier, a score above .50 indicates a preference for the Flemish category, while a score below .50 indicates a preference for the Walloon category. Of particular interest is that the score for the perfect feature “Dutch-speaking” exceeds the normative probabilistic prediction of .50. This reveals base-rate neglect.

From a connectionist perspective, this is due to the competition principle. For a Flemish

inhabitant, the Dutch-speaking feature is the best predictor available so that the other features are

discounted. In contrast, because a Walloon most often possesses better predicting features than

speaking Dutch, this feature must compete against these better predictors and is discounted. As a

result, the connection weight of the Dutch feature is stronger with the Flemish category (.21) than

with the Walloon category (.05), resulting in a substantial proportion above .50 in favor of the

Flemish category.
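The competition principle can be illustrated with a deliberately simplified, feedforward delta-rule learner. This is not the recurrent network or the Table 2 learning history used in our simulation. In this hypothetical design, "Dutch" accompanies the Flemish category on its own, but accompanies the Walloon category only together with the stronger cue "French". The objective probability of Flemish given Dutch is therefore .50. All names, trial frequencies and parameter values are illustrative assumptions.

```python
import random

# Hypothetical design (not Table 2). Inputs: (Dutch, French); targets: (Flemish, Walloon).
# P(Flemish | Dutch) = 30 / (30 + 30) = .50 in this training set.
TRIALS = (
    [((1, 0), (1, 0))] * 30    # Dutch alone -> Flemish (rare category)
    + [((1, 1), (0, 1))] * 30  # Dutch + French -> Walloon
    + [((0, 1), (0, 1))] * 60  # French alone -> Walloon (common category)
)

def train_delta(trials, lr=0.02, epochs=200, seed=1):
    """Feedforward delta rule: w[f][c] += lr * (target_c - output_c) * input_f."""
    rng = random.Random(seed)
    n_in, n_out = len(trials[0][0]), len(trials[0][1])
    w = [[0.0] * n_out for _ in range(n_in)]
    order = list(trials)
    for _ in range(epochs):
        rng.shuffle(order)  # random trial order within each epoch
        for x, t in order:
            for c in range(n_out):
                out = sum(x[f] * w[f][c] for f in range(n_in))
                err = t[c] - out  # competition: co-present cues share the same error
                for f in range(n_in):
                    w[f][c] += lr * err * x[f]
    return w

w = train_delta(TRIALS)
dutch_to_flemish, dutch_to_walloon = w[0]
print(f"Dutch->Flemish {dutch_to_flemish:.2f}, Dutch->Walloon {dutch_to_walloon:.2f}")
```

For this design, the least-squares weights that the delta rule approaches are about .60 for Dutch to Flemish and .00 for Dutch to Walloon. French absorbs the Walloon prediction, so Dutch is discounted for that category. A Dutch-only probe therefore favors the Flemish category despite the normative value of .50.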

Limitations and Future Directions

Although the present recurrent model is capable of explaining base-rate neglect, it is not able to account for the inverse base-rate effect (Medin & Edelson, 1988; Kruschke, 1996; Shanks, 1992). Base-rate neglect reflects the tendency to select a rare category (e.g., Flemish) when tested with a single feature (e.g., speaking Dutch) for which the objective probability was equal for all categories. The inverse base-rate effect, by contrast, reflects the tendency to select a rare category when tested with a combination of conflicting features (e.g., speaking Dutch and French). One explanation for this phenomenon is that the combination of both perfect features is quite distinctive and rare, and so is more indicative of a rare category (Shanks, 1992).

A number of authors have developed connectionist network models to account for this inverse base-rate effect. Gluck (1992) claimed that a version of the present recurrent network approach using distributed representations was capable of explaining inverse base rates. However, our simulations showed that this claim is incorrect, as such a network cannot account for all of the data. Other proposals are more effective and account for the inverse base-rate effect by increasing the attention given to uncommon features. Shanks (1992) developed a simple extension of the standard delta algorithm that gives more attention to uncommon features. Kruschke (1996; Kruschke & Johansen, 1999) developed a network model that learns to attend more to features that distinguish a rare category from the already learned (frequent) category.


Impression Formation

In a social context, getting to know others often involves drawing inferences about

characteristics and traits of individuals and groups. This process of impression formation, we will

argue, obeys connectionist principles similar to those underlying categorization processes in

general.

In a typical impression formation experiment, participants receive a series of trait adjectives

about a person and are asked to make overall trait or likeability impressions (categorization) of that person (e.g., Asch, 1946; Anderson, 1981; Kashima & Kerekes, 1994). Sometimes the adjectives are close synonyms that imply one or more specific traits; sometimes they are very diverse and imply an overall likeability impression. Anderson (1981) argued that impressions of a

person are abstracted from trait adjectives as if people average these adjectives, and proposed a

weighted average model to explain person impression judgments. Although his claim was

supported by an impressive amount of research, the model was criticized on the grounds that it

seems unlikely that people would perform all the necessary weighting and averaging calculations in

their mind to arrive at an impression, and many researchers abandoned Anderson's model for this

reason.

However, the connectionist metaphor used here can revive Anderson's model. The

weighted averaging principle can easily be implemented by implicit and automatic connectionist

processes based on the delta algorithm, without recourse to explicit arithmetical calculations (see

Appendix B for an algebraic proof). We will illustrate how a recurrent network can model

impression formation with two typical findings from person impression research.
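The claim that delta-rule learning implements weighted averaging can be checked numerically for the simplest case. Take a single person-to-trait weight with input activation 1, updated as w <- w + ε(v - w). This is our own minimal check, not the algebraic proof of Appendix B, and the function names and example values are hypothetical. Unrolling the update gives w_n = (1 - ε)^n w_0 + Σ_k ε(1 - ε)^(n - k) v_k. This is an averaging form in which the initial impression w_0 is one of the averaged terms, and the weights sum to 1.

```python
def delta_impression(values, lr, w0=0.0):
    """Delta rule on one person->trait weight; the person node has activation 1."""
    w = w0
    for v in values:
        w += lr * (v - w)  # error = adjective scale value minus current prediction
    return w

def weighted_average_form(values, lr, w0=0.0):
    """Closed form of the same update as a weighted average (initial impression included)."""
    n = len(values)
    item_weights = [lr * (1 - lr) ** (n - k) for k in range(1, n + 1)]
    initial_weight = (1 - lr) ** n
    estimate = initial_weight * w0 + sum(a * v for a, v in zip(item_weights, values))
    return estimate, initial_weight + sum(item_weights)

values = [1, 1, -1, 1, -1]  # hypothetical scale values: +1 trait, -1 opposite trait
w_delta = delta_impression(values, lr=0.3, w0=0.2)
w_avg, weight_sum = weighted_average_form(values, lr=0.3, w0=0.2)
print(w_delta, w_avg, weight_sum)
```

Both functions return the same impression, and the averaging weights sum to 1. No explicit arithmetic is needed beyond the trial-by-trial error correction.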

Simulation 2: On-Line Integration and Recency

First, consider an experiment by Stewart (1965) in which adjectives describing a high trait

(e.g., talkative) were followed by opposite (or low) trait adjectives (e.g., reticent). This

experimental manipulation was modeled using a network architecture consisting of a person node

and a task context node (which reflects instructions and other experimental context variables)

connected to a trait node. The simulations start from the assumption that the trait implied by an

adjective is already learned and recruited from semantic and social knowledge. Specifically, we

assume here that adjectives associated with the trait are denoted by an activation value of +1 for that trait, whereas adjectives associated with the opposite trait have an activation value of -1 (this is

equivalent to Anderson's scale values).

What is of interest here is how this trait-implying information is applied to build up an

impression of a specific person, by changing the weights linking the person with the trait. Table 3

depicts a schematic list of the information given in Stewart's (1965) experiment, where some

subjects received high trait information about a person in the first half of the experiment and low

trait information in the second half, and other subjects received the reverse low-high order. When

the person is described by a high trait, the connection weight is increased according to the

acquisition principle of the delta algorithm. In contrast, when the person is described by a low

trait, the weight is decreased according to the acquisition principle. After training, the person node

in the network is primed and the resulting activation of the trait node indicates what trait the person

conveys (see bottom panel of Table 3).
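The order effect in Stewart's (1965) design can be illustrated with a single delta-rule weight. This is a minimal sketch with a hypothetical list length and learning rate (ε = 0.3), not the Table 3 learning history. It shows only that later adjectives overwrite earlier ones, which produces the crossover.

```python
def impression_trajectory(values, lr=0.3):
    """Person->trait weight after each adjective (delta rule, input activation 1)."""
    w, path = 0.0, []
    for v in values:
        w += lr * (v - w)
        path.append(w)
    return path

high_low = [+1] * 3 + [-1] * 3  # hypothetical list: high-trait adjectives first
low_high = [-1] * 3 + [+1] * 3  # the reverse order

hl = impression_trajectory(high_low)
lh = impression_trajectory(low_high)
print([round(x, 2) for x in hl])
print([round(x, 2) for x in lh])
```

After the first three adjectives the high-low impression is positive (about .66). After all six it is negative (about -.43), and the low-high order mirrors this pattern.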

The results of a recurrent simulation are shown in Figure 5. As can be seen, there is a close

fit between the simulation and the data. Of particular interest is the crossover at the end of training

as the last presented adjectives win over the earlier presented adjectives, in both data and

simulations. This reflects a recency effect and suggests that the revision and adjustment of person

impressions is an on-line acquisition process where novel information often "overwrites" older

information previously stored in the connection weights.

Simulation 3: Recency in Concurrent Judgments, Primacy at Final Judgments

As a second example, consider research in which disconfirmatory information is given at a single specific position in a series of trials. By comparing the effect of disconfirmatory with confirmatory information at the same position in the trial series (denoted as serial position), one can estimate the weight each trait adjective carries at a given position (Anderson, 1979; Anderson &

Farkas, 1973; Busemeyer & Myung, 1988; Dreben, Fiske & Hastie, 1979; Kashima & Kerekes,

1994). Early disconfirmatory trait information might be important in crystallizing an impression

(primacy effect), while late information might be influential because it sheds new light on traits

presented earlier on (recency effect).
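The serial-position method can be sketched as follows. For each position, compare the final judgment after an all-confirmatory list with the judgment after a list in which only that position is disconfirmatory. Half the difference estimates the item's weight. The learner here is a hypothetical single-weight delta rule with a fixed learning rate, and the names and values are our assumptions.

```python
def final_judgment(values, lr):
    """Single person->trait weight after the whole list (delta rule, input activation 1)."""
    w = 0.0
    for v in values:
        w += lr * (v - w)
    return w

def serial_position_weights(n, lr):
    """Half the confirm-minus-disconfirm difference in the final judgment, per position."""
    confirm = [1] * n
    weights = []
    for k in range(n):
        probe = confirm.copy()
        probe[k] = -1  # disconfirmatory item at position k + 1
        weights.append((final_judgment(confirm, lr) - final_judgment(probe, lr)) / 2)
    return weights

est = serial_position_weights(n=6, lr=0.3)
print([round(e, 3) for e in est])
```

For this learner the estimated weight at position k is exactly ε(1 - ε)^(n - k), so the profile rises toward the end of the list: a pure recency pattern. The sketch therefore illustrates only the estimation procedure. It does not reproduce the primacy found for final judgments, which we simulated with the recurrent network.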

Research consistently suggests that when participants give their trait ratings continuously after each adjective is presented, item weights are relatively equal in all but the last position, at which point they rise sharply. This reflects a recency effect. Importantly, however, this recency effect attenuates when more trait information is given. Disconfirmatory information has a stronger effect when only a few pieces of trait-implying information have been given than when more have been given. It is as if increasing the amount of confirmatory information shields the perceiver from the disconfirmatory information. To simulate this result, we used the same recurrent architecture as before. A simulation of the experiment by Dreben, Fiske and Hastie (1979) is schematically listed in Table 4 for the case in which four adjectives are given (the logic is the same for other list lengths).

The simulation results are shown in the top panel of Figure 6, where the dotted line depicts

the attenuation of recency. The recurrent network was clearly able to reproduce the predicted

attenuation, although attenuation was somewhat less steep in the simulations than in the data of

Dreben, Fiske and Hastie (1979). How did the recurrent model attain attenuation of recency? One

possible interpretation suggested by an analysis of the simulation is that the person node in the

recurrent network receives internal activation from the trait node (e.g., “When someone is talking

that loud, it must be John”). Because the trait node becomes positively linked with the person node

after confirmatory trials, it compensates for the disconfirmatory information, and it does so

increasingly well with more trials. Stated differently, a robust impression resulting from earlier confirmatory information makes the perceiver more resistant to changing his or her impression in response to a single disconfirmatory item. This explanation differs from Anderson's reasoning, which is based on a distinction between item-specific and abstract aspects of impression formation (Anderson & Farkas, 1973; Dreben et al., 1979). Similar recurrent simulations also fit recent serial position data from Kerekes (1991, cited in Kashima & Kerekes, 1994) well.

In contrast to the previous findings, trait weights show a typical primacy effect when

impression judgments are given at the end of the series of trials rather than continuously

(Anderson, 1979). This primacy effect can also be simulated by the same recurrent network, as

shown in the bottom panel of Figure 6. Of specific interest is the much greater learning rate for

this simulation, which suggests that primacy might be a consequence of building up a prediction of

the trait very quickly in only a few trials, so that later information has little effect on the

impression. This interpretation shares with Anderson's (1981) attention decrement hypothesis the idea that attention to, and uptake of, information is greatest during the earliest trials, so that information presented later has little impact. This hypothesis was also incorporated in an alternative connectionist model of impression formation, the tensor product model, developed by Kashima and Kerekes (1994).

In sum, based on our connectionist simulations, we can explain the different effects of

continuous and final judgments by differences in learning rate. This seems plausible, as

information uptake and processing are probably less interrupted by final trait judgments than by

continuous judgments, resulting in a faster learning rate for final judgments and hence primacy

rather than recency.

Limitations and Future Research

Although Kashima and Kerekes (1994; see also Busemeyer & Myung, 1988) correctly

pointed out that attenuation of recency cannot be simulated with a feedforward network, we have

demonstrated here that it can easily be reproduced with a recurrent model. This contradicts the

claim made by Kashima, Woolcock and Kashima (2000, p. 924) that this effect cannot be obtained

with a recurrent model. Moreover, unlike the tensor product model proposed by Kashima and

Kerekes (1994), our simulations do not require additional ad-hoc assumptions such as a changing

context after each judgment, to obtain attenuation of recency.
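Kashima and Kerekes' (1994) point about feedforward networks can be checked with a single-layer delta-rule learner. After any number of confirmatory items with current weight w, compare a confirmatory and a disconfirmatory last item in the concurrent judgment. The difference is (w + ε(1 - w)) - (w + ε(-1 - w)) = 2ε, whatever the value of w. The sketch below is minimal, and its function name and learning rate are hypothetical.

```python
def last_item_impact(n_confirm, lr=0.3):
    """Concurrent judgment after a confirmatory vs. a disconfirmatory last item."""
    w = 0.0
    for _ in range(n_confirm):  # build up the impression with confirmatory items
        w += lr * (1 - w)
    after_confirm = w + lr * (1 - w)
    after_disconfirm = w + lr * (-1 - w)
    return after_confirm - after_disconfirm

impacts = {n: last_item_impact(n) for n in (1, 3, 7)}
print(impacts)
```

Every entry equals 2ε = 0.6. In this architecture, recency does not attenuate as confirmatory information accumulates.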

That our recurrent model can reproduce both recency and primacy effects is encouraging,

but as long as we cannot verify which independent conditions actually determine both effects, the

idea that both effects are driven by different learning rates remains at best suggestive. The novel

hypothesis that grew from the simulations is that people build more robust impressions of a person,

either through a growing positive expectancy that shields them from disconfirming information

(attenuation of recency) or by building an impression very quickly and disregarding subsequent

information (primacy). However, we are not aware of any research that has explored this potential

explanation in depth.

Simulation 4: Higher Recall for Inconsistent Information

In the previous research paradigms, participants received trait adjectives and were

instructed to form an impression about a person. This seems to reflect the manner in which we

routinely communicate about others. However, when we learn about others from our own

observations, we do not see traits but rather the behaviors that are associated with them. An

intriguing finding given this type of learning is that inconsistent or unexpected behavioral


information is often better recalled than information that is consistent with the dominant trait

expectation (for a review see Stangor & McMillan, 1992). Thus, we better recall a hooligan

helping an older lady cross the street than a nurse performing the same act.

Hastie (1980; Hastie & Kumar, 1979) reasoned that inconsistent information requires an

extra cognitive effort to explain and to make sense of the inconsistency, and is therefore elaborated

more deeply. This leads to extra links between the inconsistent information and other locations in

memory, and, thereby, to better recall. Hastie (1980) supported this interpretation by research

indicating that inconsistent information leads to more causal elaborations of the behavioral

sentences. However, these sentence elaborations were explicitly requested from the participants

after the initial phase of impression formation was over. It is thus not clear whether they were

generated spontaneously during initial encoding or only constructed after the request (cf. Nisbett

& Wilson, 1977).

Can connectionist principles account for the enhanced memory of inconsistent information

without recourse to explicit elaborative processes? Yes, and to illustrate this, we simulated a well-

known experiment by Hamilton, Katz and Leirer (1980, Experiment 3). Participants read

information concerning several fictional persons. For each person, they read a list of 10 consistent

and 1 inconsistent behavioral descriptions about that person, after which they had to recall as many

behavioral sentences as possible. Half of the participants were given the instruction to form an

impression of the person, whereas the other half were given the instruction to memorize the

behavioral information. Under impression formation instructions, participants were more likely to

recall inconsistent items, whereas this difference disappeared under memory instructions.

To understand enhanced memory for inconsistent behavioral information, consider a

network architecture with a person node and a trait node, as in the previous simulations, as well as

separate nodes for each behavioral sentence. Thus, categorical trait information implied by the

behavior as well as the individual behavioral exemplars are represented in a sort of semi-distributed

manner. Table 5 provides a simplified simulation of Hamilton et al.'s experiment with 4 consistent

behaviors and 1 inconsistent behavior.

To simulate impression formation, each behavior was activated together with the associated

trait and the person node (see Table 5). As predicted by the diffusion principle, however, the trait-behavior connection weakens each time a behavior is expected (because the trait is present) but is absent. Thus, the more behaviors confirm the expected trait, the less indicative each behavior becomes of that trait or person. This is especially true for consistent behaviors, each of which is absent on many more of the trials involving the trait than it is present. As a result, the behavioral links will be weaker for

consistent as opposed to inconsistent behaviors.
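The diffusion principle can be sketched with two trait nodes feeding five behavior nodes. T is the dominant trait, and T' is the opposite trait implied by the inconsistent behavior. This is a simplified, hypothetical version of Table 5. The person node is omitted, and the network is feedforward rather than recurrent. The memorizing condition is modeled, as in the text, by a trait activation of 0.10, and the recall cue primes both trait nodes at full activation. Function names and parameters (ε = 0.1, 20 passes) are our assumptions.

```python
import numpy as np

N_BEHAVIORS = 5  # B1-B4 consistent with trait T, B5 inconsistent (implies T')

def encode(trait_activation, lr=0.1, epochs=20):
    """Delta rule from trait nodes [T, T'] to behavior nodes; returns the 2 x 5 weights."""
    W = np.zeros((2, N_BEHAVIORS))
    trials = [(np.array([trait_activation, 0.0]), np.eye(N_BEHAVIORS)[b]) for b in range(4)]
    trials.append((np.array([0.0, trait_activation]), np.eye(N_BEHAVIORS)[4]))
    for _ in range(epochs):
        for x, target in trials:
            error = target - x @ W        # absent-but-expected behaviors get negative error
            W += lr * np.outer(x, error)  # which weakens their trait-behavior links
    return W

def recall_strength(W):
    """Prime both trait nodes at 1 and return (mean consistent, inconsistent) activation."""
    acts = np.array([1.0, 1.0]) @ W
    return acts[:4].mean(), acts[4]

imp_cons, imp_incons = recall_strength(encode(trait_activation=1.0))
mem_cons, mem_incons = recall_strength(encode(trait_activation=0.10))
print(f"impression: {imp_cons:.2f} vs {imp_incons:.2f}; memory: {mem_cons:.2f} vs {mem_incons:.2f}")
```

With full encoding, the consistent links settle near .25, because each consistent behavior is absent on three of the four trials with T. The inconsistent link approaches 1 (about .88 after 20 passes). With an activation of 0.10, the weakening term scales with the squared activation while the strengthening term scales with the activation itself. Diffusion is then nearly absent, and both kinds of links end up at about .19 to .20.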

In contrast, in the memorizing condition, subjects are not motivated to form a unified trait

impression of the person. We assumed that this would result in a much shallower encoding of

person and trait information, which was simulated by setting the activation of these nodes to 0.10

instead of the typical 1. As a result, all links between the person or trait and the behaviors would be sharply reduced.

Figure 7 shows the results of the recurrent simulation. It was assumed that the person or

the traits would serve as cues to recall the specific behavioral episodes (see bottom panel of Table

5). The simulations give very similar results when only the person or only the traits are primed to

retrieve the behavioral information. As can be seen, the simulations replicated the basic finding

that inconsistent information was better recalled than consistent information under impression

formation instructions. However, under memorizing instructions, enhanced memory disappeared.

It is important to note that the same simulation was able to replicate the well-known finding

that recognition measures produce the opposite tendency to report more consistent information

(Stangor & McMillan, 1992). This was accomplished by running the same simulation followed by

a recognition test that was biased by searching only for behaviors congruent with the consistent

trait (see bottom panel of Table 5). This reflects the idea that consistent traits guide recognition

when the perceiver relies on guessing. However, if this bias was removed (by deleting ? for the

"common" trait), inconsistent behaviors were better recognized than consistent behaviors in line

with the improved recognition sensitivity measures reported by Stangor and McMillan (1992).


The simulation of higher recall of inconsistent behavioral information suggests that this effect may be due to relatively stronger direct links to unique behavioral information. Thus, the present connectionist account emphasizes the direct connections from a particular trait or person to behavioral exemplars. Hastie (1980; see also Srull, 1981), by contrast, argued that this higher recall was produced by stronger associations between consistent and inconsistent behaviors after the inconsistency was resolved. Our simulation does not rule out other processes, such as deeper and more elaborated processing (Hastie, 1980), that may contribute to the better recall of incongruent information. But is this more elaborated processing necessary?

Some support for the effortful generation of elaborations was demonstrated in studies that

found decreased recall for inconsistent behaviors when mental resources were limited by reducing

answering time, by making the task more complex, or by adding distracter tasks (Bargh & Thein,

1985; Hamilton, Driscoll & Worth, 1989; Macrae, Hewstone & Griffiths, 1993; Stangor & Duan,

1991). However, these results can be easily simulated with our connectionist network by simply

assuming that load decreased the encoding of the behavioral episodes or even all information (e.g.,

with an activation of 0.10). This suggests that poorer encoding of information, rather than reduced inconsistency resolution and elaboration, may have lowered recall of inconsistent information. Hence, there seems to be no need to postulate explicit elaborations to explain higher recall of inconsistent behavior.

The present perspective is also consistent with other findings that report less enhanced recall for inconsistent information in three situations. The first is (a) when an impression is formed of a non-meaningful group of individuals. This can be simulated by assuming a decreased activation of the person and trait nodes, because perceivers are less willing to invest cognitive effort in encoding an overall impression (Srull et al., 1985, Experiment 7). The second is (b) for behavioral items at the beginning of a list compared to the end of a list (Srull et al., 1985, Experiments 5 & 6; Hastie & Kumar, 1979, Experiment 3). The third is (c) when the number of inconsistent items increases, making them less unique and unexpected (Hastie & Kumar, 1979, Experiment 3; Srull, 1981, Experiments 1-3; Srull, Lichtenstein & Rothbart, 1985, Experiment 3).

Overall, it appears that the proposed model is broadly consistent with a relatively large

spectrum of research findings. This suggests that the diffusion principle provides an interesting

alternative hypothesis explaining increased recall for inconsistent information.

Assimilation and Contrast in Person and Group Perception

An important feature of recurrent models is their capacity to generalize. A trained network exposed to an incomplete pattern of information will fill in the missing information on the basis of the complete pattern learned previously. This generalization process can be seen as a type of assimilation, in that past experiences influence how we perceive and interpret novel information that is similar or closely related to them. For instance, when seeing a photo of Hitler, we might immediately complete this image with activated memories of his wars of aggression, the mass annihilation of Jews, and so on. There is abundant evidence that accessible knowledge such as traits, stereotypes, moods, emotions and attitudes is likely to result in generalization to unobserved features. In the next simulations, we will explore some applications of this capacity to generalize, as well as the opposite capacity to generate contrast effects in person perception (Anderson & Cole, 1990; Smith & DeCoster, 1998).

Simulation 5: Assimilation of Unobserved Attributes

To demonstrate generalization in a recurrent network, imagine that the network learns that

Hitler was a cruel German Nazi leader who was responsible for the mass annihilation of Jews.

When the network is then tested with a Hitler probe and a few related attributes (e.g., German, Nazi, cruel), will it use this knowledge to activate the missing feature of mass annihilation?

Recall that activation in a recurrent network is determined not only by external input, but also by

internal input coming from related nodes in proportion to their connection weights. This implies

that although the missing annihilation node receives no external activation, it does receive internal

activation through its links with the Hitler and other related nodes.
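Pattern completion can be sketched with a small linear auto-associator trained with the delta rule. Each node's internal input is the weighted sum of the other nodes' external activations. Each weight changes in proportion to the difference between a node's external and internal activation. This is a hypothetical toy (localist nodes, two patterns, ε = 0.05), not Smith and DeCoster's distributed simulation or the Table 6 learning history. The node names and parameters are our own.

```python
import numpy as np

NODES = ["Hitler", "Goebbels", "German", "Nazi", "cruel", "annihilation", "propaganda"]
IDX = {name: i for i, name in enumerate(NODES)}

def pattern(*names):
    """External activation vector: 1 for the named nodes, 0 elsewhere."""
    v = np.zeros(len(NODES))
    for name in names:
        v[IDX[name]] = 1.0
    return v

def train_autoassociator(patterns, lr=0.05, epochs=200):
    """Internal input = e @ W; delta rule dW[j, k] = lr * e_j * (e_k - internal_k)."""
    W = np.zeros((len(NODES), len(NODES)))  # W[j, k]: weight from node j to node k
    for _ in range(epochs):
        for e in patterns:
            internal = e @ W
            W += lr * np.outer(e, e - internal)
            np.fill_diagonal(W, 0.0)  # no self-connections
    return W

hitler = pattern("Hitler", "German", "Nazi", "cruel", "annihilation")
goebbels = pattern("Goebbels", "German", "Nazi", "propaganda")
W = train_autoassociator([hitler, goebbels])

probe = pattern("Hitler", "German", "Nazi", "cruel")  # "annihilation" left out
completed = probe + probe @ W  # one update cycle: external plus internal input
print({name: round(float(completed[IDX[name]]), 2) for name in NODES})
```

For these two patterns, the learned weights reproduce "annihilation" at about 1.0 when it is left out of the Hitler probe. "Propaganda", which belongs to the Goebbels pattern, receives only weak activation (about .18) through the shared German and Nazi features.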

Table 6 shows a schematic description of a simulation that combines several simulations by

Smith and DeCoster (1998). In this simulation, we presented information on three individual exemplars such as Hitler, Goebbels, and Himmler, each defined by five features (labeled E1-E5, E6-E10, and E11-E15), as well as on their group (e.g., Nazi), which was characterized by three features (labeled G1-G3).

To increase the realism of the simulation, like Smith and DeCoster, we represent features in

a distributed manner, that is, each feature is represented by a set of micro-features (unlike our

previous simulations in which a localist coding scheme was used with each feature being

represented by a single node). Distributed representations are more realistic because symbolic concepts are represented not by single neurons but by assemblies of neurons. Specifically, each feature was represented by 5 micro-features or nodes. For instance, Hitler was represented not by five nodes but by 25 nodes (five features of five micro-features each), which reflected micro-features of his physical appearance, character, and so on. In addition, we added random noise to the presentation of background context and features to simulate the imperfect conditions of perception. Although these latter aspects appear in our simulation mainly for purposes of comparison, they are not essential for understanding the generalization process or for producing the results (e.g., the noise cancels out given enough simulation runs).

After going through the learning history of Table 6, all but one feature of the individual

exemplars or group were primed (see bottom panel of Table 6). Figure 8 depicts the resulting

activation of the remaining feature (represented by five nodes). As can be seen, the internal

activation of the other nodes in the network allows the network to reconstruct the missing

information of the original learned pattern almost perfectly, for both the individual exemplar and

for the group. This indicates that the recurrent network is capable of integrating and utilizing both

individualized and schematic (i.e., group) information.

Further Research

Smith and DeCoster (1998) demonstrated that a recurrent network can reproduce other very

interesting phenomena of social cognition. Perhaps one of the most intriguing properties is the

creation of new emergent attributes by combining parts of existing attributes (see Smith &

DeCoster, 1998, simulation 3). Traditional theories of categorization assume that people use a

single schema, stereotype or knowledge structure to make inferences about a target person or a

group. Even if multiple schemas are relevant, each of them is independently activated and applied.

However, people can combine many sources of knowledge in order to construct new emergent

properties to describe subtypes or subgroups of people. For instance, a militant feminist who is

also a bank teller may become subtyped as a feminist bank teller with specific idiosyncratic

attributes (Smith & DeCoster, 1998; Asch & Zukier, 1984). Previous connectionist models like

ECHO (Thagard, 1989) were unable to model this process.

Simulation 6: Assimilation with Traits, Contrast with Exemplars

The abundance of assimilation effects in social cognition research may suggest that filling in unobserved characteristics is the default or most natural process. Thus, when primed with “violent,” we judge a nondescript or ambiguous target person as more hostile, and when primed with “nice,” we judge that same target as less hostile. However, under some circumstances the opposite effect may occur: primed features may lead to contrast rather than assimilation.

For instance, when primed with the exemplar Gandhi, people may judge a target person as relatively more hostile, whereas when primed with Hitler, they may judge the same target as relatively less hostile. Under these conditions, the exemplars Gandhi and Hitler serve as anchors against which the target is judged, which leads to contrast effects. In sum, contextually (or chronically)

primed information may not only serve as an interpretation frame, but also as a comparison

standard during impression formation.

What produces assimilation or contrast? According to Stapel, Koomen and Van der Pligt

(1997), trait concepts are more likely to serve to interpret an ambiguous person description

(assimilation), because traits carry with them only conceptual meaning. On the other hand,

exemplars, if sufficiently extreme, will be used as a comparison standard (contrast) because both

the exemplar and the target are persons that can be compared with each other. An experiment by

Stapel et al. (1997) confirmed this proposition. Participants were asked to form an impression of

an ambiguous friendly or hostile target person. Before they were exposed to the description of the

target, they were primed with names of traits (e.g., violent or nice) or with names of extreme

exemplars (e.g., Hitler or Gandhi). Assimilation was found in the trait priming condition, whereas

contrast was found in the person priming condition.

A recurrent network can simulate this combination of assimilation and contrast. As listed

in Table 7, the network first builds up background knowledge about average persons (who are at

times more or less violent or nice), extreme exemplars like Hitler and Gandhi, as well as about the

relationships between traits (e.g., nice is the opposite of hostile and violent).

The essential idea of the simulation is that during priming, the primed stimulus and the

target description are temporarily activated together. This is represented by programming a single

learning trial for each priming condition (see Table 7). Because testing a trait category involves

connections from person to trait, there is competition between exemplars, but not between traits

and exemplars. Hence, when a trait concept is primed, this leads to the usual assimilation of the

trait impression through the acquisition principle. In contrast, when an exemplar such as Hitler is

primed, competition arises between this exemplar and the target exemplar (which is so nondescript

that it is assumed to be taken as an instance of an average person). This competition arises when

both exemplars predict hostility and their summed activation overestimates the observed degree of


hostility. This error leads to a decrease in the connection weights between target and hostility, and

results in a contrast effect.
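The competition argument above can be illustrated with a minimal sketch. It uses a plain feedforward delta rule with linear activation rather than the authors' recurrent network, and every number in it (learning rate, prior weights, the hostility conveyed by the ambiguous description, the strength of the trait prime) is an assumption chosen for illustration. It also assumes, as our reading of the text, that a trait prime adds activation to the hostility node instead of acting as a competing predictor.

```python
# Feedforward delta-rule sketch of priming effects; NOT the authors' recurrent network.
# All numeric values are illustrative assumptions.
LEARNING_RATE = 0.3        # assumed learning rate
TARGET_PRIOR = 0.5         # assumed target -> hostility weight ("average person")
OBSERVED_HOSTILITY = 0.5   # assumed hostility conveyed by the ambiguous description


def delta_trial(weights, inputs, observed, lr=LEARNING_RATE):
    """One delta-rule trial with linear activation: every active input's weight
    changes by lr * (observed - summed prediction) * its activation."""
    predicted = sum(weights[name] * act for name, act in inputs.items())
    error = observed - predicted
    for name, act in inputs.items():
        weights[name] += lr * error * act
    return weights


def prime_with_exemplar():
    # Prior knowledge (assumed): the exemplar Hitler fully predicts hostility.
    weights = {"hitler": 1.0, "target": TARGET_PRIOR}
    # Exemplar and target compete as predictors; their summed prediction (1.5)
    # overestimates the observed hostility, so the error is negative.
    delta_trial(weights, {"hitler": 1.0, "target": 1.0}, OBSERVED_HOSTILITY)
    return weights["target"]


def prime_with_trait(prime_strength=0.5):
    weights = {"target": TARGET_PRIOR}
    # Assumption: the trait prime ("violent") adds activation to the hostility
    # node during the trial instead of competing with the target.
    delta_trial(weights, {"target": 1.0}, OBSERVED_HOSTILITY + prime_strength)
    return weights["target"]


print(f"after exemplar prime: {prime_with_exemplar():.2f}")  # below TARGET_PRIOR (contrast)
print(f"after trait prime:    {prime_with_trait():.2f}")     # above TARGET_PRIOR (assimilation)
```

The only difference between the two conditions is whether the prime enters the error term as a competing predictor; that alone flips the sign of the target's weight change.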

The full learning history of this simulation is listed in Table 7. Distributed coding and

noise were used in the priming trials to implement the idea that slightly different instances of traits

and exemplars were used in the priming and prior knowledge phases. As can be seen in Figure 9,

the simulation replicated the empirical assimilation and contrast effects as reported by Stapel,

Koomen and Van der Pligt (1997).


The exemplars that serve as a comparison standard need to be sufficiently extreme, because

otherwise little overestimation would occur in the network, and thus little contrast. This prediction

is supported by a recent study by Moskowitz and Skurnik (1999). In two experiments, they found

that moderate exemplars (e.g., Kissinger) lead to less contrast than extreme exemplars (e.g.,

Hitler). As one might expect from the recurrent network's generalization property, they also found

that moderate trait primes lead to less assimilation than extreme primes. The present recurrent

network was able to reproduce the findings of Moskowitz and Skurnik (1999).
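The extremity prediction follows directly from the size of the overestimation. The sketch below is a single delta-rule trial under assumed values, where an exemplar's extremity is represented as its prior weight to hostility. The values 0.6 (moderate) and 1.0 (extreme) are hypothetical and are not taken from Moskowitz and Skurnik (1999).

```python
# Sketch (assumed values): contrast grows with exemplar extremity under the delta rule.
LEARNING_RATE = 0.3   # assumed
TARGET_PRIOR = 0.5    # assumed target -> hostility weight before priming
OBSERVED = 0.5        # assumed hostility conveyed by the ambiguous target


def target_after_exemplar_prime(exemplar_weight, lr=LEARNING_RATE):
    """Target -> hostility weight after one trial in which the target and an exemplar
    (prior hostility weight = exemplar_weight) are both active at 1.0."""
    predicted = exemplar_weight + TARGET_PRIOR
    error = OBSERVED - predicted          # overestimation gives a negative error
    return TARGET_PRIOR + lr * error


# Hypothetical extremity values for a moderate and an extreme exemplar.
for label, w in [("moderate exemplar", 0.6), ("extreme exemplar", 1.0)]:
    shift = target_after_exemplar_prime(w) - TARGET_PRIOR
    print(f"{label}: shift in target hostility {shift:+.2f}")
```

With an exemplar weight of 0.0 the predicted and observed hostility coincide, and the shift is zero.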

However, the present network is, at the moment, not capable of reproducing the effect of

cognitive load on assimilation and contrast. Moskowitz and Skurnik (1999) showed that cognitive

interference (i.e., increasing task load or interrupting the current task) minimized the effects of trait

assimilation, but left the effects of exemplar contrasts relatively untouched. If we simulate

decreased resources during priming by decreasing node activation, however, we would expect the

opposite effect to occur. Future research is needed to establish whether Moskowitz and Skurnik's

findings are robust and, if so, how task load can be implemented in a recurrent network so that it

can approximate their findings.

Causal Judgments and Attitudes

In this section we discuss causal judgments and attitude formation from an attributional

perspective. A first question is how causes, attitudes and effects are represented in a connectionist

network. As with the recurrent networks of social judgment described earlier, we represent causes

and attitude-objects as features, and outcomes or behaviors as categories. However, whereas non-

causal features in social inference are rather passive descriptors or predictors of category


membership, causes and attitude-objects intuitively have a more active role, in that they also tend

to bring about the outcomes they predict. For instance, an angry face not only tells the observer

something about the person (social inference), but also warns the observer to defend him- or

herself against possible attacks (causal inference). Likewise, an attitude-object like a toy may not only

look attractive (trait inference), but may also elicit approach behavior (causal inference). This

difference between the descriptive nature of social inference and the more active role or power of

causes and attitudes is not explicitly modeled in connectionist models, but is evident from the

typical categories involved, which reflect social events (behaviors, outcomes) rather than social

entities (traits, groups, family, etc.).

Causal Attributions

Recent research has demonstrated that there are many parallels between models of human

causal attribution and models of animal conditioning (for overviews see Allen, 1993; Shanks, 1995;

Read & Montoya, 1999). To cite a few important parallels, one of the most popular models in

animal learning, the Rescorla-Wagner (1972) model, is identical to the delta learning algorithm

(implemented in a feedforward network), and it has also been shown that this model asymptotically

converges to another popular model of human causality based on probabilistic principles (Cheng &

Novick, 1992; see Chapman & Robbins, 1990; Van Overwalle, 1996).
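The asymptotic correspondence can be checked numerically. The following is a minimal sketch assuming the standard analysis in which a background "context" cue is present on every trial. Under that assumption, the Rescorla-Wagner (delta rule) strength of the target cause should settle near the probabilistic contrast ΔP = P(outcome | cause) - P(outcome | no cause). The cue names, contingencies (0.8 and 0.2), learning rate and number of epochs are illustrative assumptions.

```python
import random

# Sketch: Rescorla-Wagner / delta rule approaching Delta-P at asymptote.
# The always-present "context" cue and all numbers are illustrative assumptions.


def make_trials(p_outcome_given_cause, p_outcome_given_no_cause, n=10):
    """Build n cause-present and n cause-absent trials with the given outcome rates."""
    k = round(p_outcome_given_cause * n)
    m = round(p_outcome_given_no_cause * n)
    with_cause = [(("context", "cause"), 1.0)] * k + [(("context", "cause"), 0.0)] * (n - k)
    without_cause = [(("context",), 1.0)] * m + [(("context",), 0.0)] * (n - m)
    return with_cause + without_cause


def rescorla_wagner(trials, lr=0.005, epochs=4000, seed=0):
    """V_i += lr * (outcome - sum of V_j over present cues) for each present cue i."""
    rng = random.Random(seed)
    strength = {"context": 0.0, "cause": 0.0}
    trials = list(trials)
    for _ in range(epochs):
        rng.shuffle(trials)               # random trial order within each block
        for cues, outcome in trials:
            error = outcome - sum(strength[c] for c in cues)
            for c in cues:
                strength[c] += lr * error
    return strength


v = rescorla_wagner(make_trials(0.8, 0.2))
print(f"cause strength {v['cause']:.2f} vs Delta-P {0.8 - 0.2:.2f}")
```

The context cue absorbs the base rate of the outcome (about 0.2 here), which leaves the cause with roughly the contrast between the two conditional probabilities.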

In a recent article, Read and Montoya (1999) successfully simulated a number of

phenomena from the animal learning literature with a recurrent network (see their Table 2, p. 735).

This simulation work demonstrated that a recurrent model can reproduce competition between

alternative causes such as

discounting (where one cause blocks the causal influence of an alternative cause),

augmentation (where one inhibitory cause increases the influence of an alternative cause

that facilitates the outcome),

inhibition (where an alternative cause develops inhibitory effects that prevent the outcome

from occurring),

overshadowing (where the causal strength of two causes is less than that of a single cause in

predicting the same outcome).

Like many researchers in the social and animal learning domains, we apply the terms

discounting and augmentation quite broadly to denote causal competition both during and after


causal learning, that is, during or after novel, causally relevant information is received and

processed. Thus, competition may occur between information taken in at any time, either with

novel information (or novel causes) or with earlier material reactivated from memory (or known

causes). This differs from the position taken by other authors (Morris & Larrick, 1995; Read &

Montoya, 1999) who reserve the terms discounting and augmentation exclusively for reasoning

processes based on prior causal learning in the original sense of Kelley (1972).

Because several authors (Van Overwalle, 1998; Van Overwalle & Van Rooy, 2001a,

2001b; Read & Montoya, 1999) have already provided many illustrations of connectionist

modeling of causal attribution, we will present only a single simulation of this phenomenon.

Simulation 7: Forward Discounting

A common finding in the animal and human literature is that when a particular cause has

already explained an outcome, an alternative cause presented afterwards is discounted. This is called

forward discounting. The idea of forward discounting is largely consistent with the anchoring

explanation in social psychology, which assumes that people anchor on the first presented

explanation or the first dominant explanation that comes to mind (e.g., the actor's disposition) and

tend to ignore novel information implicating alternative explanations (Shaklee & Fischhoff, 1982;

Gilbert & Malone, 1995). For instance, Van Overwalle, Drenth and Marsman (1999) found that

spontaneous trait inferences were moderated by covariation information only when this information

was presented before a description of the actor's behavior, not after it. Thus, when known

personality traits or other situational pressures provide a ready explanation for someone's behavior,

people tend to disregard novel information about additional factors.

In a series of experiments, Van Overwalle and Van Rooy (1998, 2001b) combined the

process of forward discounting with sample size: by increasing the sample size, and thus the

strength, of a known cause, they made the discounting of a novel cause stronger. In this way, they

combined the emergent principles of acquisition and competition. Participants read stories in which

several causes could explain an outcome. For instance, in one of the stories they were first told that

Ann won several singles tennis games, and then that Ann (now a known cause) also won several

doubles games with Troy (a novel cause). As expected, given Ann's previous successes, the

contribution of Troy was decreased or discounted.

However, a crucial manipulation was how often Ann won her singles games. When Ann

won her singles games only once, she acquired little causal strength and discounting of Troy was

much weaker than when Ann won several times, thus acquiring more causal strength. In other words,

discounting of Troy was indirectly influenced by the weakening or strengthening of Ann. Table 8

shows the design of this experiment, and Figure 10 depicts the simulated and observed results. As

can be seen, the recurrent network conforms nicely to the observed data. Note that current

statistical models of causality (Cheng & Novick, 1992; Försterling, 1989) are unable to account for

these results. Van Overwalle and Van Rooy (1998, 2001b) performed similar experiments

involving augmentation, and found parallel results consistent with the combined predictions of

sample size and competition.
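The combined effect of acquisition and competition can be sketched with a plain delta rule. The trial counts (1 versus 5 singles wins, 3 doubles wins) and the learning rate are assumptions for illustration and do not reproduce the design in Table 8.

```python
# Sketch of forward discounting modulated by sample size (delta rule, linear activation).
# Trial counts and learning rate are assumptions, not the design of Table 8.
LEARNING_RATE = 0.3


def train(weights, cues, outcome, n_trials, lr=LEARNING_RATE):
    """Repeat a trial in which all listed cues are present (activation 1.0)."""
    for _ in range(n_trials):
        error = outcome - sum(weights[c] for c in cues)
        for c in cues:
            weights[c] += lr * error
    return weights


def troy_strength(ann_single_wins, doubles_wins=3):
    weights = {"ann": 0.0, "troy": 0.0}
    train(weights, ["ann"], 1.0, ann_single_wins)          # phase 1: Ann wins alone
    train(weights, ["ann", "troy"], 1.0, doubles_wins)     # phase 2: Ann and Troy win together
    return weights["troy"]


for wins in (1, 5):
    print(f"Ann won {wins} singles game(s): Troy's causal strength = {troy_strength(wins):.2f}")
```

The more often Ann won alone, the less error remains when Troy joins her, so Troy acquires less causal strength.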


Discounting can occur not only when the alternative explanation is a novel one as in the

simulation, but also when competing causes are processed simultaneously. Thus, competition

effects do not require a fixed sequence of processing of causal information, as assumed in phase-

like models of dispositional attribution (e.g., Gilbert, 1989). This implication is in line with recent

research suggesting that the weighting of competing person and environmental attributions

"involves an iterative or even simultaneous evaluation of the various hypotheses before reaching a

conclusion" (Trope & Gaunt, 2000, p. 353).

However, what happens when competition arises after the competing causes have already

gained causal strength? For instance, if Troy and Ann always won their double games, and now

we learn that Ann alone wins all her singles, what do we think about Troy? This now involves

backward revaluation. According to Dickinson and Burke (1996), backward revaluation depends

on the relationship between the two causes. They found that when causes are positively related,

discounting takes place; when they are independent, there is no discounting (see also

Van Overwalle & Timmermans, 2000). These results cannot be simulated with the present

standard recurrent network, but require a modification so that absent causes that are expected (via

prior compound presentation) receive a negative activation rather than the standard "filling up" of

activation from related nodes (for more details see Van Overwalle and Timmermans, 2000, 2001;

Graham, 1999).
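Why the coding of an expected but absent cause matters can be shown with a small delta-rule sketch. Here Troy's activation during the "Ann alone" phase is a free parameter. A value of 0.0 mimics a plain feedforward delta rule. A value of +0.5 is a rough stand-in for the recurrent "filling up" of activation from Ann. A value of -0.5 stands for the kind of negative coding the modification proposes. All three values, the trial counts and the learning rate are assumptions for illustration and are not the parameters of the cited simulations.

```python
# Sketch: backward revaluation depends on how an expected-but-absent cue is coded.
# Activation values, trial counts and learning rate are illustrative assumptions.
LEARNING_RATE = 0.3


def train(weights, activations, outcome, n_trials, lr=LEARNING_RATE):
    """Delta rule: each cue's weight changes by lr * error * its activation."""
    for _ in range(n_trials):
        error = outcome - sum(weights[c] * a for c, a in activations.items())
        for c, a in activations.items():
            weights[c] += lr * error * a
    return weights


def troy_after_revaluation(absent_activation, n_doubles=5, n_singles=5):
    weights = {"ann": 0.0, "troy": 0.0}
    train(weights, {"ann": 1.0, "troy": 1.0}, 1.0, n_doubles)                # Ann and Troy win doubles
    train(weights, {"ann": 1.0, "troy": absent_activation}, 1.0, n_singles)  # Ann alone wins singles
    return weights["troy"]


for a in (0.5, 0.0, -0.5):
    print(f"absent-cue activation {a:+.1f}: Troy = {troy_after_revaluation(a):.2f}")
```

With zero activation Troy's weight is simply left unchanged; only a negative activation lets the remaining error drive Troy's weight down, which is backward discounting.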


Attitudes

The most influential and popular model of attitude formation is the theory of reasoned

action developed by Fishbein and Ajzen (1975) and later refined and relabeled as the theory of

planned behavior (Ajzen, 1991; Ajzen & Madden, 1986). According to this model, an attitude is a

function of

the expectation or belief that the behavior will lead to a certain consequence or outcome

(e.g., some means of transportation, like cars and buses, pollute; bicycles do not), and

the person's evaluation of these outcomes (e.g., pollution is bad).

Multiplying the expectancy and value components associated with each outcome and

summing up these products determines an attitude.
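Written in the usual expectancy-value form, the attitude toward an object o with n salient outcomes is given below. The symbols are ours and are chosen for illustration.

```latex
A_o = \sum_{i=1}^{n} b_i \, e_i
```

Here b_i is the belief (expectancy) that choosing o leads to outcome i, and e_i is the evaluation of that outcome. For instance, take the hypothetical values b = 0.9 for "fast" (e = +1) and b = 1.0 for "polluting" (e = -1). A car would then score 0.9 - 1.0 = -0.1 on these two outcomes.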

The theory of planned behavior has received considerable empirical support in many

studies (see Fishbein & Ajzen, 1975; Ajzen, 1991; Ajzen & Madden, 1986), although it has been

found that other factors besides attitudes may exert an influence on behavior. A major criticism

leveled against the theory, however, is its assumption that humans make rational decisions, and

carefully elaborate and compare alternative behavioral options before they engage in a particular

behavior. It seems unlikely that people engage in extensive processing of the pros and cons of

specific behavioral alternatives for every opinion or attitude they have (Fazio, 1990). Although

Ajzen and Fishbein (1980) acknowledged that people may simply reactivate and employ attitudes

formed previously, they still assumed that these prior attitudes had been formed explicitly.

Simulation 8: Attitude Formation

We propose, however, that attitudes may also be developed implicitly. Recent research by

Betsch et al. (2001) indicates that the encoding of value-charged stimuli is sufficient to prompt an

on-line process by which values are implicitly summed and stored in memory. A process of

implicit attitude formation, representation and retrieval in memory without deliberative processing

can be modeled by a connectionist implementation of the theory of planned behavior.

To illustrate, let us return to the above example. The first attitude component, the belief

that one's choice will result in certain outcomes, can be represented as causal expectations linking

the choice of transportation with likely outcomes such as how fast a car will be, how dry the trip

will be, and how much it will pollute. The likelihood of these outcomes is expressed in the weight of the


connections, acquired during prior experiences. Thus, the more often a particular consequence is

observed, the stronger the weight becomes. Conversely, the less often a particular consequence is

observed, the weaker this weight will be.

The second attitude component, the value, can be represented by concurrent emotional or

evaluative responses to these outcomes, such as how much the person likes or dislikes being in a

fast and polluting car or in a dry place, again acquired during prior experiences. Thus, in line with

Ajzen (1991, p. 191), we assume that the outcomes linked with a behavior are "valued positively or

negatively", and that they are further modified during actual experiences.

In a connectionist network, the activation sent out by each means of transportation is

multiplied by the weight of the connections associated with the outcomes, including the value

node. We suggest that a person's attitude is reflected in the activation of this value node after the

relevant attitude-object (i.e., means of transportation) was activated. This proposition is

mathematically very similar to the multiplicative function in Ajzen's (1991) theory of planned

behavior (i.e., where expectations are replaced by connection weights, and values are replaced by

value node activations; see Appendix C for a formal proof). However, it does not require the less

plausible assumption of deliberate weighting of all alternatives, as only the beliefs and evaluations

that are accessible in memory at the time of judgment will determine the attitude. Note that

outcomes other than the value (fast, dry and polluting) are not taken into account for measuring an

attitude because they reflect cognitions related to the attitude-objects rather than evaluations. This

is consistent with the dominant view in the attitude literature that takes attitudes primarily as

evaluative responses.

Table 9 depicts a recurrent simulation of this example. The likelihood of the outcomes is

determined by the frequency with which a causal factor co-occurs with an outcome, and the value is

determined by the degree of satisfaction or dissatisfaction experienced during this outcome.

Although we used extreme +1 and -1 evaluative values for simplicity, moderate values are also

possible. In Figure 11, the simulated values are compared with predictions of the theory of

planned behavior. As can be seen, the simulated and predicted data match almost perfectly.
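One way to see why frequency-driven learning reproduces the multiplicative rule is sketched below. This is our illustration, not a reproduction of Table 9 or of the proof in Appendix C. It assumes that on each experience the value node's activation equals the summed evaluation of the outcomes that actually occurred. Under that assumption, a delta-rule weight from the attitude-object to the value node converges to the average experienced value, which is Σ b_i e_i. All beliefs and evaluations are hypothetical.

```python
import random

# Hypothetical beliefs: probability that each outcome follows choosing the object.
BELIEFS = {
    "car": {"fast": 0.9, "dry": 1.0, "polluting": 1.0},
    "bicycle": {"fast": 0.2, "dry": 0.1, "polluting": 0.0},
}
# Hypothetical evaluations of the outcomes (+1 liked, -1 disliked).
EVALUATIONS = {"fast": 1.0, "dry": 1.0, "polluting": -1.0}


def experiences(obj, n=10):
    """n episodes in which outcome i occurs in round(b_i * n) of them; on each episode the
    value node's activation is the summed evaluation of the outcomes that occurred (assumption)."""
    return [
        sum(EVALUATIONS[o] for o, b in BELIEFS[obj].items() if t < round(b * n))
        for t in range(n)
    ]


def learned_attitude(obj, lr=0.01, epochs=2000, seed=0):
    """Delta rule on the object -> value-node weight, with the object node active at 1.0."""
    rng = random.Random(seed)
    episodes = experiences(obj)
    weight = 0.0
    for _ in range(epochs):
        rng.shuffle(episodes)
        for value in episodes:
            weight += lr * (value - weight)
    return weight


def expectancy_value(obj):
    """Sum over outcomes of belief times evaluation."""
    return sum(b * EVALUATIONS[o] for o, b in BELIEFS[obj].items())


for obj in BELIEFS:
    print(f"{obj}: learned {learned_attitude(obj):+.2f}, expectancy-value {expectancy_value(obj):+.2f}")
```

No explicit multiplication of beliefs by evaluations takes place. The product emerges because frequent outcomes contribute their evaluations to the error signal more often.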

Simulation 9: Dual-Process Models of Persuasion

The theory of planned behavior (Ajzen, 1991; Ajzen & Madden, 1986) assumes that people

systematically scrutinize all relevant information for making an attitude judgment. Although this


might be the preferred approach when forming an initial opinion about an important issue

(Gollwitzer, 1990), in many cases attitudes are created or changed in a more shallow or heuristic

manner. This distinction has been captured in the heuristic-systematic model of Chaiken (1980,

1987; Chen & Chaiken, 1999) and the elaboration likelihood model (Petty & Cacioppo, 1986;

Petty & Wegener, 1999). According to these dual-process models, systematic processing implies

that people have formed or updated their attitudes by actively attending to and cognitively

reflecting upon persuasive argumentation. In contrast, heuristic processing implies that people

have formed or changed their attitudes by using heuristic cues that give rise, automatically, to

stored decision rules such as "experts can be trusted", "majority opinion is correct", and "long

messages are valid messages".

Dual-process theories regard systematic processing of information as requiring more effort

and cognitive capacity than heuristic processing. Hence, when motivation or capacity for

systematic scrutiny of information is low, such as when the issue is of low personal relevance or

when time is limited, people use heuristics like source credibility, other people's attitudes or the

length and number of arguments. These two processing modes are not necessarily mutually

exclusive. For instance, systematic and heuristic processing may co-occur when the arguments are

too ambiguous to form an opinion by extensive processing alone, that is, heuristic cues may

additionally help to form an opinion by biasing the selection of ambiguous information (Chen &

Chaiken, 1999).

Such an interaction between systematic and heuristic processing was investigated by

Chaiken and Maheswaran (1994). They presented a message about a fictitious answering machine

in which different features were described with varying importance. This information was

ostensibly published either in a highly regarded magazine specialized in scientific testing of new

products or in a promotional pamphlet prepared by sales personnel. The results revealed that with

low task importance (i.e., respondents' opinion would have little bearing on the manufacturer's

product distribution), source credibility was the only determinant of people's attitude. In contrast,

with high task importance (i.e., respondents' opinion would count heavily), the quality of the

machine's features was the only determining factor, except when the message was ambiguous and

source credibility alone influenced the attitude. This study is important, because it demonstrates

several predictions of dual-process models. It documents how heuristic cues can bias the interpretation

of ambiguous message arguments (the biasing hypothesis; Chen & Chaiken, 1999), and how

systematic processing can overrule heuristic cues when the arguments are unambiguous (the

attenuation hypothesis; Chen & Chaiken, 1999).

We simulated the interactive nature of systematic and heuristic processes as investigated by

Chaiken and Maheswaran (1994) with our recurrent model (see Table 10). According to Bohner,

Ruder and Erb (1999), heuristic cues like source credibility may lead people to form expectations

about message valence or strength. We assume that these expectations are driven by prior

experiences of good and bad argumentation with the same or similar sources, or by communicated

opinions about such experiences. As can be seen in the top panel of Table 10, this assumption of

prior knowledge on quality of argumentation was incorporated by setting the value node to +1

(high credibility) or .10 (low credibility) during an initial prior learning phase.

Next, we ran one of three message types, involving strong, weak and ambiguous arguments,

which were simulated by different activation levels of the value node. That is, the strength and

direction of the arguments were determined by setting the activation of the value node to either

positive or zero. (No negative activation was used, because the weak arguments actually described

available features for which other products were superior.) Table 10

depicts a simplified version of the actual design used by Chaiken and Maheswaran (1994).

More importantly, given heuristic processing, in line with the basic assumptions of dual

process models, we assumed that these arguments would not be encoded or elaborated sufficiently

(i.e., we ran one trial only with all activation levels divided by 10). In contrast, given systematic

processing we assumed that these arguments would be processed more extensively (i.e., two trials

as shown in Table 10). The results depicted in Figure 12 reveal that our simulation reproduced the

predicted pattern as observed by Chaiken and Maheswaran (1994). Thus, the simulation

reproduced heuristic and systematic processing, as well as the predicted interaction between both (i.e.,

biasing and attenuation effects; Chen & Chaiken, 1999).

Siebler, Bohner and Weinerth (1998) proposed an alternative connectionist constraint

satisfaction model (i.e., ECHO, Thagard, 1989) to account for the same data. However, this latter

type of connectionist model suffers from shortcomings (e.g., no weight adjustments, no permanent

attitude change) to be discussed later. Nevertheless, for the present simulation, it should be noted

that small deviations in the learning rate destroyed the predicted interaction between the systematic


and heuristic processing of the ambiguous message. Specifically, a higher learning rate caused the

novel arguments to overwrite all memory of the source's credibility, whereas a lower learning rate

caused the source's credibility to be the only determinant of the attitude. This seems to suggest that

the interaction between systematic and heuristic processing depends on a critical balance between

source credibility and (rate of) systematic elaboration of argument quality, which is in line with the

sparse reports on this interaction in the attitude literature (Chaiken, Liberman & Eagly, 1989, p.

233; but see Bohner, Moskowitz & Chaiken, 1995).

Simulation 10: Cognitive Dissonance

Sometimes, our attitudes are not so much driven by immediate evaluations of attitude

objects, but rather by reactions to our own behaviors, especially when these behaviors go against

our initial preferences. This phenomenon is captured in Festinger's (1957) theory of cognitive

dissonance, which predicts that discrepant behavior generates dissonance or uneasiness that "will

exert pressures in the direction of bringing the appropriate cognitive elements into correspondence"

(p. 11). For instance, when induced to write an essay that runs counter to one's initial attitude (e.g.,

a student defending stricter exam criteria), an individual will tend to reduce dissonance by

changing his or her attitude in the direction of the position taken in the essay. This tendency is

stronger when alternative explanations or justifications, such as high payment or social

pressure, are absent. In contrast, when external demands (e.g., payment or pressure by the

experimenter) provide sufficient justification for engaging in the dissonant behavior, then

dissonance reduction does not occur (e.g., Linder, Cooper & Jones, 1967).

Cooper and Fazio (1984) have proposed an attributional analysis of the process of cognitive

dissonance reduction. They suggested that individuals attempt to understand and justify their

discrepant behavior ("Why did I behave this way?"). When alternative causal explanations for the

discrepant behavior are absent, then participants conclude that they must have liked writing the

essay more than they initially thought, and this results in attitude change. Conversely, when

sufficient external explanations are available, no dissonance is experienced and no attitude change

will occur. Thus, for instance, more attitude change is expected given a low rather than high

monetary reward.

However, the reverse effect has been observed when individuals are forced to engage in

discrepant behavior. Indeed, in this case, there is more attitude change with a high rather than low


monetary reward (Linder, Cooper & Jones, 1967). To explain this opposite effect, Van Overwalle

and Jordens (2001) extended the attributional analysis by assuming that individuals try to

understand not only their discrepant behavior, but also their concurrent feelings ("Why do I feel

this way?"). In the case of high external constraints like strong pressure towards discrepant

behavior and low payment, the experimental situation will be experienced as particularly

unpleasant. According to Van Overwalle and Jordens (2001), these negative feelings will

counteract and reduce the attitude discrepancy, as if the person concludes that, although he or she

did something wrong, he or she has already been sufficiently punished by feeling very bad about it.

We implemented the original experiment by Linder et al. (1967) in a connectionist network, in a way

very similar to the simulation by Van Overwalle and Jordens (2001). The

learning history is shown in Table 11. To simulate the idea that prior experiences are only roughly

similar to the experimental manipulations, we used a distributed representation with noise added

(see top panel). The experimental manipulation was implemented as a single trial, to reflect the

assumption that attributional thoughts were raised at least once during the experiment (see middle

panel). The attitude toward the essay was measured by priming the attitude-object (i.e., essay) and

reading off the activation of both the behavioral and affective outcomes. Thus, an attitude is seen

here not only as an affective response, but also as a behavioral approach-avoidance response.

How is attitude change under induced choice simulated? Given that pressure from the

experimenter is absent, only the attitude-object (the essay) and payment serve as potential causes in

explaining discrepant behavior and concurrent feelings. A lowered reward is simulated by

decreasing the activation of the payment node to .20. This results in compensatory augmentation

(i.e., competition principle) of the connections from the essay node to the behavioral and affective

outcomes, resulting in a more positive attitude toward the essay.
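The compensatory augmentation in the induced-choice condition can be sketched with a single delta-rule trial on the behavioral outcome only. The payment activation of .20 for low reward comes from the text. The prior weights (a mildly disliked essay position, and a payment that usually drives behavior), the high-reward activation of 1.0 and the learning rate are assumptions for illustration. The affective outcome and the no-choice condition are not modeled here.

```python
# Sketch of compensatory augmentation in the induced-choice condition (delta rule).
# Prior weights, learning rate and the high-reward activation are assumptions.
LEARNING_RATE = 0.3
PRIOR = {"essay": -0.2, "payment": 0.8}   # assumed: disliked position; payment usually drives behavior


def essay_weight_after_writing(payment_activation, lr=LEARNING_RATE):
    """One attributional trial: essay and payment jointly explain the discrepant behavior."""
    weights = dict(PRIOR)
    inputs = {"essay": 1.0, "payment": payment_activation}
    behavior = 1.0                                   # the essay was written
    error = behavior - sum(weights[k] * a for k, a in inputs.items())
    for k, a in inputs.items():
        weights[k] += lr * error * a
    return weights["essay"]


for label, pay in [("high reward", 1.0), ("low reward", 0.20)]:
    print(f"{label}: essay -> behavior weight = {essay_weight_after_writing(pay):+.2f}")
```

A weakly active payment node explains less of the behavior, so a larger share of the error is assigned to the essay node.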

How was the no-choice condition simulated? In addition to the influence of payment, the

negative feelings arising from low payment combined with experimenter pressure drive the

connection between the essay and evaluative outcome downward, resulting in a decreased attitude

toward the essay. The results of this simulation are shown in Figure 13 and compared with

empirical data by Linder et al. (1967). As can be seen, the fit between simulated and observed data

was excellent.


Limitations and further research

The present simulations encompass a wide variety of models and data in the attitude

literature. We are just beginning to uncover the implications of connectionist modeling for this

area of social cognition. However, from this initial sketch it appears that seemingly different

modes of processing and types of persuasion and information may all be driven by the same

underlying connectionist mechanisms. In addition, our analysis paints a somewhat different picture

of heuristic cues in attitude formation, in particular, and in social cognition in general. It is to this

discussion that we now turn.

A Note on Judgmental Heuristics

The mainstream theoretical approach to judgment in social psychology is that information

processing is rarely exhaustive or guided by logical norms, but rather reveals a compromise

between rationality and economy. In this approach, effortless judgments are typified by

judgmental heuristics that enable individuals to make rapid and easy judgments by rules of thumb

that require little explicit thinking but, overall, provide adequate responses most of the time.

According to this view, the price of such rapid judgments can be observed in a series of biases.

Rather than viewing heuristics and biases as exceptions to the rules of logical thinking, we

would like to argue that they actually reflect how the brain, as a connectionist device, works.

Take, for example, the heuristics assumed to influence judgments under uncertainty (Kahneman,

Slovic & Tversky, 1982), or the heuristic rules of dual-process models of persuasion (Chaiken,

1980; Petty & Cacioppo, 1987).

Heuristics under Uncertainty

We will consider three major heuristics used in judgment making under uncertainty. These

are:

Availability. The availability heuristic reflects the finding that many judgments are biased

by information about facts and arguments available in memory, owing either to frequent (chronic)

use in the past or to recent priming. This is exactly what a recurrent network would predict.

Activation from recently primed information spreads to other related concepts, influencing

judgments about them as we have seen in the earlier assimilation examples (Simulation 5). A

dramatic demonstration of chronic accessibility was given by Smith and DeCoster (1998,


Simulation 5). They demonstrated that people who used a particular concept frequently in the past

might lose this information if it was "overwritten" by novel information, but this concept could be

quickly recovered after a few presentations of the original information.

Representativeness. The representativeness heuristic has been invoked to explain the

finding that categorization is often guided by resemblance between concepts rather than by

statistical base rates. As we have seen in the section on categorization, this is exactly what one

would expect from a connectionist view. In Simulation 1, we demonstrated that a category is

chosen on the basis of its most unique (diagnostic) feature, even if that feature has the same base

rate as another, less unique feature.

Anchoring. The anchoring and adjustment heuristic has been proposed to explain why

judgments are often biased toward an initial anchor and has been taken as evidence that judgments

are often made and adjusted on-line. Again, anchoring can be simply taken as a consequence of

on-line or incremental connectionist learning. According to the delta learning algorithm, weight

adjustments are often stronger initially because of the greater error in the network, while later

adjustments become increasingly smaller because the error is reduced.
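The shrinking adjustments can be made concrete for a single cue that is repeatedly followed by the same outcome. In this minimal sketch the learning rate of 0.3 is an assumption. Because each update is proportional to the remaining error, every update is (1 - learning rate) times the previous one, so early trials dominate the final weight.

```python
# Sketch: delta-rule weight changes shrink as the error shrinks (learning rate assumed).
LEARNING_RATE = 0.3


def weight_changes(n_trials, target=1.0, lr=LEARNING_RATE):
    """Return the size of each successive weight update for a single cue and outcome."""
    w, changes = 0.0, []
    for _ in range(n_trials):
        change = lr * (target - w)     # update is proportional to the current error
        w += change
        changes.append(change)
    return changes


print([round(c, 3) for c in weight_changes(5)])
```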

It is interesting to ask why adjustments in later phases of learning are often insufficient to

engender a change of opinion. For instance, why are situational constraints and pressures seldom

taken into account when making dispositional inferences about an actor? The answer may be

found in the backward revaluation hypothesis proposed by Dickinson and Burke (1996). As noted

in the causal attribution section, this hypothesis posits that backward discounting of causal factors

will occur only when there is a strong unique relationship between these causes. Because the

relationship between an actor and a situation is seldom unique (e.g., many different actors may

appear in the same situation), this revaluation hypothesis predicts that correction of personal

judgments by situational information will often be insufficient (see also Van Overwalle &

Timmermans, 2000, 2001).

Heuristics in Persuasion

Dual-process models of persuasion (Chaiken, 1987; Petty & Cacioppo, 1986) posit that

people revert to heuristic processing when their motivation or their capacity to analyze message

content in detail is minimal. Heuristic cues are characterized as salient, easily processed pieces of

stimulus information that give rise, automatically, to the activation of a stored decision rule


(Chaiken, Duckworth & Darke, 1999). These heuristic rules are developed through past

experiences and observations and include, for example, beliefs that "experts can be trusted",

"multiple arguments are stronger", "high consensus implies correctness", and "things I like are

good". Heuristic processing involves automatic processing of these rules with little awareness of

their occurrence and their impact on judgments.

Although we agree that heuristic processing is automatic and often beyond awareness, we

would argue that it does not necessarily involve the application of well-learned rules. We do not

exclude this possibility, but we propose that connectionist principles provide a much more

convincing and parsimonious account of the implicit nature of heuristic processes. To support this

contention, let us review a number of these heuristic rules and see how connectionist processes can

explain them. As we shall see, the proposed connectionist mechanisms differ markedly from the

original hypotheses in the literature on how heuristics are "activated" and "applied" to influence

attitude judgments (Chen & Chaiken, 1999).

Expertise. This heuristic was already simulated in the previous section (Chaiken &

Maheswaran, 1994; Petty, Cacioppo & Goldman, 1981). It was shown that expertise involves an

expectation about argument quality and value, based on prior learning from the same or similar

sources. During prior learning, the activation of the value node is high when the source is an

expert (with a standard activation level of 1) or low when the source is not an expert (e.g., 0.10).

Most importantly, we contend that knowledge about source quality acquired during prior learning

is naturally integrated with novel information about an attitude-object through the principle

of acquisition. Hence, this heuristic does not require activation of any explicit rule or belief.

Consensus. This heuristic (Maheswaran & Chaiken, 1991) functions in very similar ways

to the expertise heuristic. Consensus information involves an expectation about the positive or

negative value of features (i.e., implemented by a positive or negative activation of the value

node), based on prior learning from other sources. In a connectionist framework, this expectation

or prior knowledge is naturally integrated with novel information, without any recourse to

additional rules or beliefs.

Length. Lengthy messages tend to contain more arguments (Petty & Cacioppo, 1984) or

tend to repeat the same arguments in different words with more detail (Wood, Kallgren & Preisler,

1985). According to the principle of acquisition, a larger sample of arguments should result in

stronger effects on attitude judgments. Thus, the more often an argument that an attitude-object

possesses a particular feature is repeated, the stronger the connection will grow between the

attitude-object and this feature (and its associated value).
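A minimal sketch of this acquisition principle under the delta algorithm: each argument trial activates an attitude-object node (a_j = 1) together with a feature node whose external activation is +1, and we track the strength of the object-to-feature connection as the number of arguments grows. The learning rate of 0.2 is an illustrative assumption.

```python
def connection_after_arguments(n_arguments, feature_value=1.0, lr=0.2):
    """Object-to-feature weight after n argument trials (delta algorithm, a_j = 1).

    The learning rate is an illustrative assumption, not a fitted value.
    """
    w = 0.0
    for _ in range(n_arguments):
        w += lr * (feature_value - w)   # error = ext_feature - int_feature
    return w


for n in (1, 3, 6, 12):
    print(f"{n:2d} arguments -> connection strength {connection_after_arguments(n):.3f}")
```

The connection grows with every additional argument, but with diminishing returns, approaching the feature's value of +1; with this learning rate it equals 1 - 0.8^n after n arguments.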

Nevertheless, it is also possible that people are misled by the sheer length of a message

(i.e., by use of larger fonts), even if it does not include more persuasive arguments (Wood et al.,

1985). This seems to suggest that superficial characteristics of a message can influence processing,

rather than the arguments themselves. A connectionist network can account for this effect by

assuming that such superficial characteristics of length are often correlated with actual differences

in message length, and so may influence attitudes indirectly. More generally, heuristic processing

may sometimes be influenced by issue-irrelevant aspects of the information, and so reflect

qualitative rather than merely quantitative variations in processing (Petty & Wegener, 1999).

Mood. As is evident from the network architecture used to simulate attitude formation, in

our view, mood is just another outcome component that determines attitudes together with other

behavioral approach or avoidance outcomes (i.e., discrepant behaviors). In our simulations, we

tended to equate mood with evaluation (i.e., value node), but admittedly, this might prove to be an

oversimplification (e.g., Perugini & Conner, 2000), useful only as a first approximation of this

issue. Nevertheless, we do believe that mood and affect in general are outcomes or pieces of

information that determine one's attitude and other judgments, although we did not differentiate

strongly between implicit mood priming (Bower, 1981) and explicit processing of mood as part of

relevant information (Schwarz, 1990). We return to this issue of implicit and explicit information

processing in the general discussion.

Fit and Model Comparisons

The simulations that we have reported all replicate the empirical data or theoretical

predictions reasonably well. However, it is possible that this fit is due to some procedural choices

of the simulations rather than conceptual validity. The aim of this section is to demonstrate that

changes in these choices generally do not invalidate our simulations. To this end, we explore a

number of issues, including the localist versus distributed encoding of concepts, and the specific

recurrent network used. We will address each issue in turn.


Distributed Coding

The first issue is whether the nodes in the auto-associative architecture encode localist or

distributed features. Localist features reflect “symbolic” pieces of information, that is, each node

represents a concrete concept. In contrast, in a distributed encoding, a concept is represented by a

pattern of activation across an array of nodes, none of which reflect a symbolic concept but rather

some sub-symbolic micro-feature of it (Thorpe, 1994). Although we most often used a localist

encoding scheme to facilitate this introduction to the most important processing mechanisms

underlying connectionism, we admit that localist encoding is far from realistic. Unlike distributed

coding, it implies that each concept is stored in a single processing unit and, except for differing

levels of activation, is always perceived in the same manner. Given the advantages of distributed

coding, is it possible to replicate our localist simulations with a distributed representation?

To address this question, we reran all localist simulations with a distributed encoding

scheme much like the previous distributed simulations (see Table 12 for details). As can be seen,

all distributed simulations attained a good fit to data and, in all cases, the pattern of results from the

localist simulations was reproduced. This suggests that the underlying principles and mechanisms

that we put forward as being responsible for the major simulation results can be obtained not only

in the more contrived context of a localist encoding, but also in a more realistic context of a

distributed encoding.
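The contrast between the two encoding schemes can be illustrated with a small NumPy sketch. It does not reproduce the distributed scheme summarized in Table 12. Instead, it assumes that a distributed concept is a random pattern of +1/-1 micro-features over 20 nodes, and it trains the same delta algorithm to map each concept onto a single value node, once with localist and once with distributed input.

```python
import numpy as np

rng = np.random.default_rng(seed=1)   # fixed seed so the run is repeatable

def localist(n_concepts):
    """Localist coding: concept k is carried by node k alone (a unit vector)."""
    return np.eye(n_concepts)

def distributed(n_concepts, n_units=20):
    """Distributed coding (assumed): a random +1/-1 micro-feature pattern per concept."""
    return rng.choice([-1.0, 1.0], size=(n_concepts, n_units))

def train_delta(patterns, targets, lr=0.5, epochs=300):
    """Delta-rule learning from concept patterns to a single value node.

    Assumption of this sketch: the step is divided by the squared pattern
    length so that one learning rate suits both codings.
    """
    w = np.zeros(patterns.shape[1])
    for _ in range(epochs):
        for x, t in zip(patterns, targets):
            w += lr * (t - x @ w) * x / (x @ x)
    return w


targets = np.array([1.0, -1.0])   # concept 0 is liked, concept 1 is disliked
for name, pats in (("localist", localist(2)), ("distributed", distributed(2))):
    w = train_delta(pats, targets)
    print(name, np.round(pats @ w, 3))
```

With distributed input the two concepts share nodes, so each update also changes the output for the other concept; the error-correcting nature of the delta algorithm nevertheless drives both outputs to their targets.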

Feedforward Model

We claimed earlier that feedforward connections were responsible for replicating most of

the phenomena of interest, with the exception of serial position in impression formation

(Simulation 3) and generalization (Simulation 5). To substantiate this claim, we ran all simulations

with a feedforward pattern associator (McClelland & Rumelhart, 1988) that consists only of

feedforward connections. As can be seen in Table 12, for all simulations except those mentioned

earlier, a feedforward architecture performed almost as well as the original simulations, as predicted.

The only exception was the interaction between heuristic and central processing of attitude

information (Simulation 9), which was less robust, as noted earlier.

This suggests that for most phenomena in social cognition, the feedforward connections in

the network were the most crucial. Only for serial position, generalization, and the interaction of heuristic

and central processing (Simulations 3, 5, and 9) were the other lateral or backward connections also

important for obtaining the predicted effects.
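The difference between the two architectures can be sketched by training the same delta algorithm with two connection masks: a full auto-associator (all connections except self-connections) and a feedforward pattern associator (only feature-to-value connections). The three hypothetical nodes, the learning rate, and the number of epochs are illustrative assumptions, and the internal input is assumed to be computed from the externally activated nodes.

```python
import numpy as np

def train(patterns, mask, lr=0.1, epochs=50):
    """Delta-rule learning restricted to the connections allowed by `mask`.

    mask[i, j] = 1 allows a connection from node j to node i. Assumption: the
    internal input is computed from the externally activated nodes.
    """
    n = patterns.shape[1]
    W = np.zeros((n, n))
    for _ in range(epochs):
        for ext in patterns:
            internal = W @ ext
            W += lr * np.outer(ext - internal, ext) * mask
    return W


# Hypothetical nodes: 0 = feature A, 1 = feature B, 2 = value (evaluation)
patterns = np.array([[1.0, 1.0, 1.0]])   # A and B always co-occur with a positive value
full = 1.0 - np.eye(3)                   # auto-associator: all connections except self
feedforward = np.zeros((3, 3))
feedforward[2, :2] = 1.0                 # pattern associator: features -> value only

cue = np.array([1.0, 0.0, 0.0])          # present feature A alone at test
for name, mask in (("auto-associator", full), ("feedforward", feedforward)):
    act = cue + train(patterns, mask) @ cue   # Equation 3' at test
    print(f"{name}: B = {act[1]:.2f}, value = {act[2]:.2f}")
```

Cueing feature A alone yields the same value judgment in both architectures, but only the auto-associator also reactivates feature B through its lateral connection, the kind of pattern completion on which generalization relies.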

Non-linear Recurrent Model

We also claimed earlier that a recurrent model with a linear updating activation algorithm

and a single internal updating cycle (for collecting the internal activation from related nodes) was

sufficient for reproducing the social phenomena of interest. This contrasts with the approach of other social

researchers, who used a non-linear activation updating algorithm and many more internal cycles

(Smith & DeCoster, 1998; Read & Montoya, 1999). Are these model features necessary? To

answer this question, we ran all our simulations with a non-linear activation algorithm and 10

internal cycles.
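For comparison, the following sketch implements one common form of non-linear activation update, in the interactive-activation style described by McClelland and Rumelhart (1988), run for 10 internal cycles. It is not necessarily the exact non-linear algorithm used in our comparison; the step sizes E = I = D = 0.15 and the clipping safeguard are assumptions of this sketch.

```python
import numpy as np

def settle_nonlinear(W, ext, cycles=10, E=0.15, I=0.15, D=0.15):
    """Non-linear activation update run for several internal cycles.

    Positive net input pushes activation toward +1 and negative net input
    toward -1, while decay D pulls it back toward the resting level 0.
    E, I and D are illustrative step sizes (an assumption of this sketch).
    """
    W = np.array(W, dtype=float)
    np.fill_diagonal(W, 0.0)                    # no self-connections
    a = np.zeros(len(ext))
    for _ in range(cycles):
        net = E * ext + I * (W @ a)             # external plus internal input
        delta = np.where(net > 0, net * (1.0 - a), net * (a + 1.0)) - D * a
        a = np.clip(a + delta, -1.0, 1.0)       # safeguard added for this sketch
    return a


ext = np.array([1.0, 0.0, -1.0])
W = np.zeros((3, 3))                            # no learned connections yet
print(np.round(settle_nonlinear(W, ext), 3))
```

With no learned connections, each activation moves gradually toward an equilibrium (here plus or minus 0.5) over the cycles instead of being set in a single step, whereas the linear update with one cycle and E = I = D = 1 simply returns the sum of external and internal input.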

As can be seen from Table 12, although the non-linear model yielded an adequate fit, most

simulations did not improve substantially compared to the original simulations. This suggests that

the present linear activation update algorithm with a single internal cycle is sufficient for

simulating many phenomena in social cognition. This should not come as a surprise. In recurrent

simulations of other issues, such as the formation of semantic concepts, multiple internal cycles

were useful for performing "cleanup" in the network, so that the activation across, for instance, a

perceptual and a conceptual level of representation was forced to settle eventually into

representations that had pre-established conceptual meaning (e.g., Sitton, Mozer & Farah, 2000).

Such a distinction between perceptual and conceptual levels was not made here, and, as a result,

multiple internal cycles had no real function.

The Tensor Product Model

Kashima and his colleagues recently presented a tensor product model, an alternative

connectionist model of person and group impression formation and change (Kashima & Kerekes,

1994; Kashima, Woolcock, & Kashima, 2000). As noted earlier, contrary to their claims, the present

recurrent network was able to successfully reproduce the phenomena of impression formation

simulated with their model, including recency and primacy effects. A major difference from the

present recurrent approach, however, is that the tensor product model uses a Hebbian learning

algorithm. Even though this type of learning has the advantage of neurobiological plausibility, it

has the significant disadvantage that it does not reproduce the competition property. Hence, social

cognition phenomena explained by this property, such as base-rate neglect, discounting, and cognitive

dissonance, presumably cannot be simulated with this model, at least not without

additional assumptions. And indeed, to simulate, for instance, attenuation of recency in impression

formation, this model requires the ad-hoc assumption of different context presentations before and

after a judgment (Kashima & Kerekes, 1994). This assumption was not required with the present

simulations.
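The role of the competition property can be illustrated with a simple blocking design. Cause A is first paired with an outcome on its own and then together with cause B. The sketch compares the delta algorithm with a plain Hebbian rule (weight change proportional to the product of the two activations, with no error term). This Hebbian rule is a generic stand-in chosen for illustration, not Kashima and colleagues' tensor product model itself; the trial numbers and learning rate are assumptions.

```python
import numpy as np

def train(rule, trials, lr=0.2):
    """Learn weights from causes A and B to an outcome node with the given rule."""
    w = np.zeros(2)
    for causes, outcome in trials:
        x = np.array(causes, dtype=float)
        internal = x @ w                         # outcome predicted by the network
        if rule == "delta":
            w += lr * (outcome - internal) * x   # error-driven: causes compete
        else:
            w += lr * outcome * x                # generic Hebbian rule (illustration only)
    return w


phase1 = [((1, 0), 1.0)] * 20    # A alone, followed by the outcome
phase2 = [((1, 1), 1.0)] * 20    # A and B together, followed by the outcome
for rule in ("delta", "hebb"):
    wA, wB = train(rule, phase1 + phase2)
    print(f"{rule:5s}: w_A = {wA:.2f}, w_B = {wB:.2f}")
```

Under the delta algorithm, cause A already predicts the outcome when B is introduced, so B acquires almost no weight (it is discounted); under the Hebbian rule, B's weight grows with every co-occurrence regardless of what A already predicts.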

General Discussion

In this article, we have presented an overview of a number of major findings in social

cognition and have shown how they might be accounted for within a connectionist

framework. This connectionist perspective offers a novel view on how information could be

encoded in the brain, how it might be structured and activated, and how it could be retrieved and

used for social judgment. This view differs from earlier theories in social cognition which have

relied on metaphors such as associative networks or constraint satisfaction networks with fixed

weights (Kunda & Thagard, 1996; Read & Marcus-Newhall, 1993; Shultz & Lepper, 1996), phase-

like integration of information (Gilbert, 1989; but see Trope & Gaunt, 2000) or a formulation in

algebraic or probabilistic terms (Cheng & Novick, 1992; Anderson, 1981; Ajzen, 1991). The

problem is that these various metaphors give a fragmentary account of social cognitive

mechanisms.

In contrast, the connectionist approach proposed in this paper relies on a single general

auto-associative architecture and a single set of processing algorithms, yet it applies to a

wide range of phenomena in social cognition. Moreover, we have

shown that this model provides an alternative interpretation of earlier algebraic models in social

psychology (Cheng & Novick, 1992; Anderson, 1981; Ajzen, 1991). In addition, this model can

also account for the learning of social knowledge structures. Hence, this approach could

potentially be used to investigate the development among infants and children of the structures

underlying social cognition.

We have focused to a large extent on the model as a learning device, that is, as a

mechanism for associating patterns that reflect social concepts by means of very elementary

learning processes. One major advantage of a connectionist perspective is that complex social

reasoning and learning can be accomplished by putting together an array of simple interconnected

elements, which greatly enhances the network’s computational power, and by incrementally

adjusting the weights of the connections with the delta learning algorithm. We have demonstrated

that this learning algorithm gives rise to a number of novel properties, among them the acquisition

property which accounts for sample size effects, the competition property accounting for

discounting and augmentation, and the diffusion principle accounting for higher recall for

inconsistent information. These properties are able to explain most of our simulations of social

judgment and behavior. In contrast, introductory textbooks on the auto-associator (e.g., Fausett,

1994; McClelland & Rumelhart, 1988) emphasize other capacities of the auto-associator including

its content-addressable memory, its ability to do pattern completion (see also Simulation 5), and

its tolerance of faults and noise.

Implications

What are the implications of the present work for social cognitive theories? The key

contribution of this paper is that a wide range of social cognitive phenomena was simulated with

the same overall network model, suggesting that these phenomena are based, at least during early

processing, on the same fundamental information processing principles. Providing a common

framework for these different phenomena will hopefully generate further research and extend to

new areas of social psychology usually seen as too different to be brought under a single theoretical

heading. In addition to the present model’s ability to account for empirical data, it can generate

new hypotheses that can be tested in a classical experimental setting. We briefly discuss some

potential questions that emerge from this model.

Knowledge Acquisition

To what extent is the learning history assumed in our simulations correct? What

mechanisms and architectural considerations are necessary to preserve the network’s knowledge

base? Perhaps these questions can in part be answered by laboratory replications of the assumed

learning histories, which should reveal whether they are equivalent to the (prior) knowledge of participants.

Heuristic versus Central Processing

Our approach does not make a principled distinction between heuristic and central

processing. Quite often, setting activation to a lower or higher (default) level made it possible to

simulate this distinction, suggesting that heuristic and central processing are mainly a matter of

shallow versus focused attention to information. This differential attention gives rise to a

differential emphasis on, for example, prior information versus novel information, and may result

in different judgments. In previous sections, we explained in detail how several reasoning

heuristics could be viewed from this connectionist perspective and these suggestions are

immediately open to empirical tests.

Automatic versus Conscious Reasoning

As noted in the introduction, the present model does not draw a sharp distinction between

automatic and conscious processing, implicit and explicit processing, or associative and symbolic

information processing. While some may view this as a disadvantage of the model, recent research

has revealed that this distinction is far from clear-cut, as unconscious intuitions and insights may

underlie conscious decisions. To resolve this quandary, some researchers (e.g., Smith & DeCoster,

2000) have proposed a distinction between two processing modes: a slow-learning (connectionist)

pattern-completion mode and a more effortful (symbolic) mode that involves explicit symbolically

represented rules and inferences. The present approach seems to suggest that such a sharp distinction

is perhaps not necessary.

The Role of Affect

In the final section on attitudes, the role of affect and evaluation was prominent in our

simulations. As noted above, our model makes no distinction between affect as unconscious

priming and affect as explicit information (Schwarz, 1990), although it is clear that this distinction is crucial

to understanding people’s reactions to mood changes. If evaluation and affect play a

crucial role in attitude judgment, the model predicts that simply inducing a positive or negative mood unobtrusively will

change these judgments in the corresponding direction. Recent research findings seem to

support these predictions (Jordens & Van Overwalle, 2001).

Limitations and Future Directions

Given the breadth of social cognition, we inevitably were not able to include many other

interesting findings and phenomena. Perhaps the most interesting area omitted involves group

processes. Connectionist modeling may well help to explain how group identity is created, how

perceptions of group homogeneity are changed, how accentuation of correlated features is enhanced,

and how illusory correlation and unrealistic negative stereotypes of minority groups develop.

These questions are addressed in Van Rooy and Van Overwalle (2001c). However, other

phenomena, such as motivation, love, and violence, remain far beyond the current scope of

connectionist modeling.

While we believe we have shown that a connectionist framework can potentially provide a

parsimonious account of a number of disparate phenomena in social psychology, we are not

suggesting that this is the only valid means of modeling social cognitive phenomena. On the

contrary, we defend a multiple-view position in which connectionism would play a key role but

would co-exist alongside other viewpoints. We think that a strict neurological reductionism is

untenable, especially in personality and social psychology, where it is difficult to see how one

could develop a connectionist model of such high-level abstract concepts as “need for closure”,

“prejudice”, and the like.

There are other limitations to connectionist models. Researchers who agree with our

overall auto-associative approach may remain unpersuaded by a specific application of the model

to a particular phenomenon. These applications merely reflect our current thinking and will almost

certainly be replaced by improved models in the future. We believe, however, that the essence of

the approach proposed here will survive.

Our model suggests a number of possible directions for further investigation. Even though

the simple auto-associative model presented here does, indeed, apply to a wide range of social

phenomena, it would be ridiculous to assume that the whole of higher cognition could be modeled

by auto-association alone. This simple paradigm quickly reveals its limits when we try to apply the

results obtained to other mechanisms.

First, given the importance of attention and motivation in social perception and cognition, it

will ultimately be necessary to incorporate these factors into an improved model. For the time

being, attentional aspects of human information processing are not part of the dynamics of our

network (variations were simply hand-coded as differences in activation states), which focuses

almost exclusively on learning and pattern association. Another issue that remains to be resolved is

how concepts are initially represented when presented to the network. This was not modeled in the

present simulations, but is certainly critical for context-dependent learning and judgment, which

involve combinations of features and context.

Second, the present recurrent network might be improved by the inclusion of an

array of hidden (McClelland & Rumelhart, 1988, pp. 121-126) or exemplar nodes (e.g., Kruschke &

Johansen, 1999) that may potentially increase its power and capacity, for instance, to process

non-linear interactions. Although non-linearity was not an issue in the present simulations, it may

become more critical for combinations of features (e.g., when only a combination of causes

produces an effect), or of features and context.

Third, a more modular architecture will almost certainly be necessary to produce a better fit

of the model to empirical data. For example, one severe limitation of most connectionist models is

known as “catastrophic interference” (McCloskey & Cohen, 1989; Ratcliff, 1990), which is the

tendency of neural networks to abruptly and completely forget previously learned information in

the presence of new input. This limitation is untenable for a realistic model of social cognitive

processes, in general, and for a model of the formation and use of stereotypes, in particular, since

one of the basic properties of stereotypes is their resistance to change in the presence of new

information. In response to such observations, it has been suggested that, to overcome this

problem, the brain developed a dual hippocampal-neocortical memory system in which new

information is processed in the hippocampus and old information is stored and consolidated in the

neocortex (McClelland, McNaughton, & O’Reilly, 1995; Smith & DeCoster, 2000). Various

modelers (French, 1997; Ans & Rousset, 1997) have proposed modular connectionist architectures

mimicking this dual-memory system with one sub-system dedicated to the rapid learning of

unexpected and novel information and the building of episodic memory traces and the other sub-

system responsible for slow incremental learning of statistical regularities of the environment and

gradual consolidation of information learned in the first subsystem. There is considerable evidence

for the modular nature of the brain, in particular for the complementary learning roles of

hippocampal and neocortical structures (McClelland, McNaughton & O’Reilly, 1995), the

predominant role of the amygdala in social judgment and perception of emotions (Adolphs, Tranel

& Damasio, 1998), and so forth. It strikes us that the next step in connectionist modeling of social

cognition will involve exploring connectionist architectures built from separate but complementary

systems.
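The interference problem can be seen even in a tiny linear network trained with the delta algorithm. In the sketch below (hypothetical patterns and parameters), an old item is learned first, and then a new item that shares one micro-feature with it is learned without any further rehearsal of the old item. This illustrates interference in sequential learning only; it does not implement the dual-memory architectures cited above.

```python
import numpy as np

def learn(w, x, target, lr=0.2, trials=50):
    """Delta-rule training of one item without rehearsing any other item."""
    for _ in range(trials):
        w = w + lr * (target - x @ w) * x
    return w


x_old = np.array([1.0, 1.0, 0.0])    # hypothetical previously learned item, target +1
x_new = np.array([0.0, 1.0, 1.0])    # hypothetical new item sharing the middle feature, target -1

w = learn(np.zeros(3), x_old, 1.0)
before = x_old @ w
w = learn(w, x_new, -1.0)            # sequential learning of the new item only
after = x_old @ w
print(f"response to old item: {before:.2f} before, {after:.2f} after new learning")
```

The response to the old item drops from about 1 to about 0.25 purely because the two items share a feature, which is the kind of unlearning that a model of stereotype formation would have to avoid.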

Conclusion

Connectionist modeling of social cognition fits seamlessly into a multilevel, integrative

analysis of human behavior (Cacioppo et al., 2000). Given that cognition is intrinsically social,


connectionism will ultimately have to begin to incorporate social constraints into its models. On

the other hand, social psychology will need to be more attentive to the biological underpinnings of

social behavior. Social and biological approaches to cognition can therefore be seen as

complementary endeavors with the common goal of achieving a clearer and deeper understanding

of human behavior. We hope that connectionist accounts of social cognition will provide the

common ground for this exploration.


Appendix

A. The Linear Auto-Associative Model

In an auto-associative network, features and categories, or causes and outcomes are

represented in nodes that are all interconnected. Processing information in this model takes place

in two phases. In the first phase, the activation of the nodes is computed, and in the second phase,

the weights of the connections are updated (see also McClelland & Rumelhart, 1988).

Node Activation

During the first phase of information processing, each node in the network receives

activation from external sources. Because the nodes are all interconnected, this activation is then

spread throughout the network where it influences all other nodes. The activation coming from the

other nodes is called the internal input. Together with the external input, this internal input

determines the final pattern of activation of the nodes, which reflects the short-term memory of the

network.

In mathematical terms, every node $i$ in the network receives external input, termed $\mathrm{ext}_i$. In

the auto-associative model, every node $i$ also receives internal input $\mathrm{int}_i$, which is the sum of the

activation from the other nodes $j$ (denoted by $a_j$) in proportion to the weight of their connection, or

$$\mathrm{int}_i = \sum_j a_j \, w_{ij} \qquad (1)$$

for all $j \neq i$. Typically, activations and weights range between -1 and +1. The external input

and internal input are then summed to the net input, or

$$\mathrm{net}_i = E \cdot \mathrm{ext}_i + I \cdot \mathrm{int}_i \qquad (2)$$

where E and I reflect the degree to which the net input is determined by the external and

internal input respectively. Typically, in a recurrent network, the activation of each node i is

updated during a number of cycles until it eventually converges to a stable pattern that reflects the

network's short-term memory. According to the linear activation algorithm, the updating of

activation is governed by the following equation:

$$\Delta a_i = \mathrm{net}_i - D \cdot a_i \qquad (3)$$

where D reflects a memory decay term. In the present simulations, we used only one

internal updating cycle and the parameter values D = I = E = 1. Given these simplifying

assumptions, the final activation of node i reduces simply to the sum of the external and internal


input, or:

$$a_i = \mathrm{net}_i = \mathrm{ext}_i + \mathrm{int}_i \qquad (3')$$
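Equations 1 to 3 can be written out as the following minimal NumPy sketch. Two points are assumptions of the sketch rather than part of the equations: the weight matrix is stored with W[i, j] = w_ij (the connection from node j to node i), and nodes begin the internal updating at their external activation. Under these assumptions, a single cycle with D = I = E = 1 reproduces Equation 3'.

```python
import numpy as np

def activate(W, ext, cycles=1, E=1.0, I=1.0, D=1.0):
    """Linear activation update of the auto-associator (Equations 1-3).

    W[i, j] holds w_ij, the weight of the connection from node j to node i.
    Assumption: nodes start at their external activation before the
    internal updating cycles begin.
    """
    W = np.array(W, dtype=float)
    np.fill_diagonal(W, 0.0)                  # Equation 1 sums over j != i
    ext = np.asarray(ext, dtype=float)
    a = ext.copy()
    for _ in range(cycles):
        internal = W @ a                      # Eq. 1: int_i = sum_j a_j * w_ij
        net = E * ext + I * internal          # Eq. 2
        a = a + (net - D * a)                 # Eq. 3: a_i changes by net_i - D * a_i
    return a


W = np.array([[0.0, 0.5],
              [0.8, 0.0]])
ext = np.array([1.0, 0.0])                    # only node 0 is externally activated
print(activate(W, ext))                       # Equation 3': ext + int
```

With D = 1, the update in Equation 3 replaces the old activation by the net input, which is why one cycle reduces to the simple sum in Equation 3'.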

Weight Updating

After this first phase, the auto-associative model enters its second, learning phase,

in which the short-term activation is consolidated in long-term weight changes to better represent and

anticipate future external input. Basically, weight changes are driven by the discrepancy between

the internal input from the last but one updating cycle of the network and the external input

received from outside sources, formally expressed in the delta algorithm (McClelland &

Rumelhart, 1988, p. 166):

$$\Delta w_{ij} = \varepsilon\,(\mathrm{ext}_i - \mathrm{int}_i)\,a_j \qquad (4)$$

where $w_{ij}$ is the weight of the connection from node $j$ to node $i$, and $\varepsilon$ is a learning rate that

determines how fast the network learns.

The presence of a feature or a category was typically encoded by setting the external input

to +1, and -1 for opposite features or categories (lower values were also used, see appropriate

tables); otherwise the external activation remained at resting level 0. The weights of the

connections were updated after each trial. At the end of each simulation, the judgment of interest

was tested by turning on the external input of the appropriate nodes and reading off the resulting

activation of the nodes that represent the judgment of interest (see also appropriate tables).
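The learning trial and test procedure described in this appendix can be sketched as follows. This is a minimal illustration under stated assumptions rather than a reproduction of the reported simulations: the internal input of the single cycle is computed from the externally activated nodes, $a_j$ in Equation 4 is taken to be the sending node's external activation (as with the default $a_j = 1$ in Appendix B), and the two-node example, learning rate, and number of trials are hypothetical.

```python
import numpy as np

def learn_trial(W, ext, lr=0.1):
    """One trial: single activation cycle followed by delta-rule weight updating.

    Assumptions: int_i is computed from the external activations, and a_j in
    Equation 4 is the sending node's external activation.
    """
    internal = W @ ext                                  # Eq. 1 with a_j = ext_j
    W = W + lr * np.outer(ext - internal, ext)          # Eq. 4: dw_ij = lr*(ext_i - int_i)*a_j
    np.fill_diagonal(W, 0.0)                            # no self-connections
    return W

def read_judgment(W, cue):
    """Turn on the cue nodes and read off the resulting activation (Eq. 3')."""
    return cue + W @ cue


nodes = ["person", "honest"]                            # hypothetical localist nodes
W = np.zeros((2, 2))
for _ in range(30):                                     # person repeatedly described as honest
    W = learn_trial(W, np.array([1.0, 1.0]))
judgment = read_judgment(W, np.array([1.0, 0.0]))       # prime the person node only
print({name: round(float(v), 2) for name, v in zip(nodes, judgment)})
```

Reading off the trait node after priming the person gives an activation that approaches +1 as the number of trials increases; this is how the trait judgment is measured at test.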

B. Anderson's Averaging Rule and the Delta algorithm

This appendix demonstrates that the delta algorithm converges at asymptote to Anderson's

(1981) averaging rule of impression formation, which expresses a rating about a person as:

$$\text{rating} = \frac{\sum_i \omega_i s_i}{\sum_i \omega_i} \qquad (5)$$

where $\omega_i$ represents the weights and $s_i$ the scale values of the traits.

This proof uses the same logic as Chapman and Robbins (1990) in their demonstration that

the delta algorithm converges to the probabilistic expression of covariation. In line with the

conventional representation of covariation information, person impression information can be

represented in a contingency table with two cells. Cell a represents all cases where the actor is

ascribed a focal trait, while cell b represents all cases where the actor is ascribed the opposite trait.

For simplicity, we use only two trait categories, although this proof can easily be extended to more

categories.

In a recurrent connectionist architecture with localist encoding as used in the text, the target

person $j$ and the trait categories $i$ are each represented by a node, and these nodes are connected by adjustable

weights $w_{ij}$. When the target person is present, its corresponding node receives external activation,

and this activation is spread to each trait node. We assume that the overall activation received at

the trait nodes $i$ (or internal activation) after priming the person node reflects the impression of the

person.

According to the delta algorithm in Equation 4, the weights $w_{ij}$ are adjusted in proportion to

the error between the actual trait category (represented by its external activation ext) and the trait

category as predicted by the network (represented by its internal activation int). If, in Equation 4,

we substitute Anderson's scale values for ext ($s_1$ for the focal trait and $s_2$ for the opposite trait) and

take the default activation of 1 for $a_j$, then the following equations can be constructed

for the two cells in the contingency table:

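$$\text{For the } a \text{ cell:} \quad \Delta w_i = \varepsilon\,(s_1 - \mathrm{int}) \qquad (6)$$

$$\text{For the } b \text{ cell:} \quad \Delta w_i = \varepsilon\,(s_2 - \mathrm{int}) \qquad (7)$$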

The change in overall impression is the sum of Equations 6 and 7, weighted by the

corresponding frequencies $a$ and $b$ of the two cells, or:

$$\Delta w_i = a\,[\varepsilon\,(s_1 - \mathrm{int})] + b\,[\varepsilon\,(s_2 - \mathrm{int})] \qquad (8)$$

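These adjustments continue until asymptote, that is, until the adjustments produced by the two cells

cancel each other out. At asymptote, the net change therefore becomes zero, or $\Delta w_i = 0$.

Consequently, Equation 8 becomes (the second line divides by $\varepsilon > 0$):

$$
\begin{aligned}
0 &= a\,[\varepsilon\,(s_1 - \mathrm{int})] + b\,[\varepsilon\,(s_2 - \mathrm{int})] \\
  &= a\,(s_1 - \mathrm{int}) + b\,(s_2 - \mathrm{int}) \\
  &= (a\,s_1 + b\,s_2) - (a + b)\,\mathrm{int}
\end{aligned}
$$

so that:

$$\mathrm{int} = \frac{a\,s_1 + b\,s_2}{a + b}$$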

Because the internal activation of the trait nodes reflects the trait impression of the person,

this can be rewritten in Anderson's terms as:

$$\text{impression} = \frac{\sum_i f_i s_i}{\sum_i f_i} \qquad (9)$$

where $f_i$ represents the frequencies with which the person and the traits co-occur. As can be seen,

Equation 9 has the same format as Equation 5. This demonstrates that the delta algorithm predicts

a weighted averaging function at asymptote for making overall impression judgments, where

Anderson's weights are determined by the frequencies with which person and traits are presented

together.
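The asymptotic argument can be checked numerically. The sketch below iterates Equation 8 directly, applying the summed update for both cells of the contingency table on every step with the person node at its default activation of 1. The frequencies, scale values, and learning rate are illustrative assumptions; the iteration is stable only when the learning rate times $a + b$ is below 2.

```python
def asymptotic_impression(a, b, s1, s2, lr=0.01, n_steps=5000):
    """Iterate Equation 8 until the weight from person node to trait node settles.

    a, b: frequencies of the two cells; s1, s2: scale values (illustrative).
    """
    w = 0.0
    for _ in range(n_steps):
        internal = w * 1.0                        # int with a_j = 1 for the person node
        w += a * lr * (s1 - internal) + b * lr * (s2 - internal)
    return w


a, b, s1, s2 = 3, 1, 1.0, -1.0                   # 3 focal-trait and 1 opposite-trait descriptions
print(round(asymptotic_impression(a, b, s1, s2), 4), (a * s1 + b * s2) / (a + b))
```

With three focal-trait descriptions and one opposite-trait description, the weight settles at (3 - 1)/4 = 0.5, the frequency-weighted average of the scale values.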

C. Ajzen's Expectancy-Value model and the Delta algorithm

This appendix demonstrates that the delta algorithm converges at asymptote to the

expectancy-value model of attitude formation by Ajzen (1991). According to Ajzen's (1991)

expectancy-value model, an attitude is formed by summing the multiplicative combination of (a)

the strength of a salient belief that a behavior will produce a given outcome and (b) the subjective

evaluation of this outcome, or (Ajzen, 1991, p. 191):

$$\text{attitude} \propto \sum_i b_i e_i \qquad (10)$$

where $b_i$ represents the strength of the belief and $e_i$ the evaluation. Beliefs and evaluations

are typically scored on 7-point scales. However, because there is "no rational a priori criterion we

can use to decide how the belief and evaluation scales should be scored" (Ajzen, 1991, p. 193), the

preceding formula can be normalized by dividing it by the summed belief strengths, or:

$$\text{attitude} \propto \frac{\sum_i b_i e_i}{\sum_i b_i} \qquad (11)$$

Using the same logic as the proof above, it can be shown that at asymptote the delta algorithm results

in an equation similar to Equation 9, or:

$$\text{attitude} = \frac{\sum_i f_i e_i}{\sum_i f_i} \qquad (12)$$

where $f_i$ represents the frequency with which the attitude-object leads to outcome $i$ (which

we assume determines the belief strength $b_i$), and where $e_i$ takes the activation values +1

(desirable outcomes), -1 (undesirable outcomes) or 0 (neutral). The equivalence between

Equations 11 and 12 demonstrates that the delta algorithm predicts a (normalized) multiplicative

function at asymptote for making attitude judgments, where the strengths of the beliefs are

determined by the frequencies with which the attitude-object and outcomes co-occur.
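For completeness, the derivation behind Equation 12 can be written out by generalizing Equations 6 to 8 from two cells to $n$ outcomes. As in Appendix B, this assumes that the attitude-object node has its default activation of 1, that the value node receives external activation $e_i$ on the $f_i$ trials on which outcome $i$ occurs, and that int denotes the value node's internal activation.

```latex
\begin{aligned}
\Delta w &= \sum_{i=1}^{n} f_i\,\varepsilon\,(e_i - \mathrm{int}) \\
0 &= \varepsilon\Big(\sum_{i=1}^{n} f_i e_i - \mathrm{int}\sum_{i=1}^{n} f_i\Big)
  && \text{(asymptote: } \Delta w = 0\text{)} \\
\mathrm{int} &= \frac{\sum_{i=1}^{n} f_i e_i}{\sum_{i=1}^{n} f_i}
  && (\varepsilon > 0)
\end{aligned}
```

With $n = 2$, $e_1 = s_1$, and $e_2 = s_2$, this reduces to the two-cell result of Appendix B.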


References

Adolphs, R., & Damasio, A. (2001). The interaction of affect and cognition: A neurobiological

perspective. In J. P. Forgas (Ed.), Handbook of affect and social cognition (pp. 27-49).

Mahwah, NJ: Lawrence Erlbaum Associates.

Adolphs, R., Tranel, D., & Damasio, A. (1998). The human amygdala in social judgment. Nature,

393, 470-474.

Ajzen, I., & Madden, T. J. (1986). Prediction of goal-directed behavior: Attitudes, intentions, and

perceived behavioral control. Journal of Experimental Social Psychology, 22, 453-474.

Ajzen, I. (1991). The theory of planned behavior. Organizational Behavior and Human Decision

Processes, 50, 179-211.

Ajzen, I., & Fishbein, M. (1980). Understanding attitudes and predicting social behavior.

Englewood Cliffs, NJ: Prentice-Hall.

Allan, L. G. (1993). Human contingency judgments: Rule based or associative? Psychological

Bulletin, 114, 435-448.

Allison, T., Puce, A., & McCarthy, G. (2000). Social perception from visual cues: role of the STS

region. Trends in Cognitive Sciences, 4, 267-278.

Anderson, J. R. (1976). Language, memory and thought. Hillsdale, NJ: Erlbaum.

Anderson, N. H. (1967). Averaging model analysis of set size effect in impression formation.

Journal of Experimental Psychology, 75, 158—165.

Anderson, N. H. (1979). Serial position curves in impression formation. Journal of Experimental

Psychology, 97, 8—12.

Anderson, N. H. (1981). Foundations of information integration theory. New York: Academic

Press.

Anderson, N. H., & Farkas, A. J. (1973). New light on order effect in attitude change. Journal of

Personality and Social Psychology, 28, 88—93.

Andersen, S. M., & Cole, S. W. (1990). "Do I know you?": The role of significant others in general

social perception. Journal of Personality and Social Psychology, 59, 384-399.

Ans, B., & Rousset, S. (1997). Avoiding catastrophic forgetting by coupling two reverberating

neural networks. Comptes Rendus de l'Académie des Sciences, Sciences de la Vie, 320, 989-997.

Asch, S. E. & Zukier, H. (1984). Thinking about persons. Journal of Personality and Social

Psychology, 46, 1230—1240.

Asch, S. E. (1946). Forming impressions of personality. Journal of Abnormal and Social

Psychology, 41, 258-290.

Baker, A. G., Berbier, M. W., & Vallée-Tourangeau, F. (1989). Judgments of a 2 x 2 contingency


table: sequential processing and the learning curve. The Quarterly Journal of Experimental

Psychology, 41B, 65—97.

Bargh, J. A. & Thein, R. D. (1985). Individual construct accessibility, person memory, and the

recall-judgment link: The case of information overload. Journal of Personality and Social

Psychology, 49, 1129—1146.

Betsch, T., Plessner, H., Schwieren, C., & Gütig, R. (2001). Personality and Social Psychology

Bulletin, 27, 242—253.

Bohner, G., Moskowitz, G. B., & Chaiken, S. (1995). The interplay between heuristic and

systematic processing of social information. European Review of Social Psychology, 6, 33-68.

Bohner, G., Ruder, M., & Erb, H.-P. (1999). When expertise backfires: Contrast versus

assimilation in the interplay of heuristic and systematic processing. Unpublished

manuscript.

Bower, G. H. (1981). Mood and memory. American Psychologist, 36, 129-148.

Busemeyer, J. R. & Myung, I. J. (1988). A new method for investigating prototype learning.

Journal of Experimental Psychology: Learning, Memory, and Cognition, 14, 3-11.

Cacioppo, J.T., Berntson, G.G., Sheridan, J.F., & McClintock M.K. (2000). Multilevel integrative

analyses of human behavior: social neuroscience and the complementing nature of social

and biological approaches. Psychological Bulletin, 126, 829-843.

Chaiken, S. (1980). Heuristic versus systematic information processing and the use of source

versus message cues in persuasion. Journal of Personality and Social Psychology, 39, 752-766.

Chaiken, S. (1987). The heuristic model of persuasion. In M. P. Zanna, J. M. Olson, & C. P.

Herman (Eds.). Social influence: The Ontario Symposium (Vol. 5., pp. 3—39). Hillsdale,

NJ: Erlbaum.

Chaiken, S., & Maheswaran, D. (1994). Heuristic processing can bias systematic processing:

effects of source credibility, argument ambiguity, and task importance on attitude

judgment. Journal of Personality and Social Psychology, 66, 460—473.

Chaiken, S., Duckworth, K. L., & Darke, P. (1999). When parsimony fails… Psychological

Inquiry, 10, 118—123.

Chaiken, S., Liberman, A. & Eagly, A. H. (1989). Heuristic and systematic information

processing within and beyond the persuasion context. In J. S. Uleman & J. A. Bargh (Eds.)

Unintended thought (pp. 212—252). New York, NY: Guilford.

Chapman, G. B. & Robbins, S. J. (1990). Cue interaction in human contingency judgment.

Memory and Cognition, 18, 537—545.


Chen, S. & Chaiken, S. (1999). The Heuristic-systematic model in its broader context. In S.

Chaiken & Y. Trope (Eds.). Dual-process theories in social psychology (pp. 73—96).

New York, NY: Guilford Press.

Cheng, P. W., & Novick, L. R. (1992). Covariation in natural causal induction. Psychological

Review, 99, 365—382.

Cooper, J. & Fazio, R. H. (1984). A new look at dissonance theory. In L. Berkowitz (Ed.).

Advances in experimental social psychology (Vol. 17, pp. 229-266). New York: Academic

Press.

Dickinson, A. & Burke, J. (1996). Within-compound associations mediate the retrospective

revaluation of causality judgments. Quarterly Journal of Experimental Psychology, 49B,

60-80.

Dreben, E. K., Fiske, S. T., & Hastie, R. (1979). The independence of evaluative and item

information: Impression and recall order effects in behavior-based impression formation.

Journal of Personality and Social Psychology, 37, 1758-1768.

Eagly, A. H., & Chaiken, S. (1993). The psychology of attitudes. San Diego, CA: Harcourt

Brace.

Ebbesen, E. B., & Bowers, R. J. (1974) Proportion of risky to conservative arguments in a group

discussion and choice shifts. Journal of Personality and Social Psychology, 29, 316—327.

Fausett, L. (1994). Fundamentals of neural networks: architectures, algorithms and applications.

Englewood Cliffs, NJ: Prentice-Hall.

Fazio, R. H. (1990). Multiple processes by which attitudes guide behavior: the MODE model as an

integrative framework. In M. P. Zanna (Ed.) Advances in Experimental Social Psychology

(Vol. 23, pp. 75-109). San Diego, CA: Academic Press.

Festinger, L. (1957) A theory of cognitive dissonance. Evanston, IL: Row, Peterson.

Fiedler, K. (1996). Explaining and simulating judgment biases as an aggregation phenomenon in

probabilistic, multiple-cue environments. Psychological Review,

103, 193-214.

Fiedler, K., Walther, E. & Nickel, S. (1999). The auto-verification of social hypotheses:

Stereotyping and the power of sample size. Journal of Personality and Social Psychology,

77, 5-18.

Fishbein, M., & Ajzen, I. (1975). Belief, attitude, intention, and behavior: An introduction to theory

and research. London, UK: Addison-Wesley.

Försterling, F. (1989). Models of covariation and attribution: How do they relate to the analogy of

analysis of variance? Journal of Personality and Social Psychology, 57, 615—625.

Försterling, F. (1992). The Kelley model as an analysis of variance analogy: How far can it be


taken? Journal of Experimental Social Psychology, 28, 475—490.

French, R. (1997). Pseudo-recurrent connectionist networks: An approach to the “sensitivity–

stability” dilemma. Connection Science, 9, 353-379.

Gilbert, D. T., & Malone, P. S. (1995). The correspondence bias. Psychological Bulletin, 117, 21-38.

Gilbert, D. T. (1989). Thinking lightly about others: Automatic components of the social inference

process. In J. S. Uleman & J. A. Bargh (Eds.), Unintended thought: Limits of awareness,

intention, and control (pp. 189-211). New York: Guilford.

Gluck, M. A. & Bower, G. H. (1988). From conditioning to category learning: An adaptive

network model. Journal of Experimental Psychology: General, 117, 227—247.

Gluck, M. A. (1992). Stimulus sampling and distributed representations in adaptive network

theories of learning. In A. F. Healy, S.M. Kosslyn, R. M. Shiffrin (Eds.) From learning

theory to connectionist theory: Essays in honor of William K. Estes (pp. 169—199).

Hillsdale, NJ: Erlbaum.

Gollwitzer, P. M. (1990) Action phases and mind-sets. In E. T. Higgins and R. M. Sorrentino

(Eds.), Handbook of motivation and cognition: Foundations of social behavior (Vol. 2, pp.

53—92). New York: Guilford Press.

Graham, S. (1999). Retrospective revaluation and inhibitory associations: Does perceptual

learning modulate our perceptions of the contingencies between events? Quarterly Journal

of Experimental Psychology, 52B, 159-185.

Hamilton, D. L., Driscoll, D. M., & Worth, L. T. (1989). Cognitive organization of impressions:

Effects of incongruency in complex representations. Journal of Personality and Social

Psychology, 57, 925-939.

Hamilton, D. L., Katz, L. B., & Leirer, V. O. (1980). Cognitive representation of personality

impressions: Organizational processes in first impression formation. Journal of Personality

and Social Psychology, 39, 1050—1063.

Hamilton, D. L., Dugan, P. M., & Trolier, T. K. (1985). The formation of stereotypic beliefs:

Further evidence for distinctiveness-based illusory correlation. Journal of Personality and

Social Psychology, 48, 5-17.

Hansen, R. D. & Hall, C. A. (1985). Discounting and augmenting facilitative and inhibitory

forces: The winner takes all. Journal of Personality and Social Psychology, 49, 1482-1493.

Hastie, R. & Kumar, P. A. (1979) Person memory: Personality traits as organizing principles in

memory for behaviors. Journal of Personality and Social Psychology, 37, 25—38.

Hastie, R. (1980). Memory for behavioral information that confirms or contradicts a personality

impression. In R. Hastie, T. M. Ostrom, E. B. Ebbesen, R. S. Wyer, D. L. Hamilton, & D.


E. Carlston (Eds.). Person Memory: The cognitive basis of social perception (pp. 155—

177). Hillsdale, NJ: Erlbaum.

Hintzman, D. L. (1986). "Schema abstraction" in a multiple-trace memory model. Psychological

Review, 93, 411—428.

Ito, T.A., & Cacioppo, J.T. (2001). Affect and attitudes: A social neuroscience approach. In J.P.

Forgas (Ed.) Handbook of affect and social cognition (pp. 50-74). Mahwah, NJ: Lawrence

Erlbaum Associates.

Kahneman, D., Slovic, P., & Tversky, A. (1982). Judgment under uncertainty: Heuristics and

biases. Cambridge, England: Cambridge University Press.

Kashima, Y., & Kerekes, A. R. Z. (1994). A distributed memory model of averaging phenomena in

person impression formation. Journal of Experimental Social Psychology, 30, 407—455.

Kashima, Y., Woolcock, J., & Kashima, E. S. (2000). Group impression as dynamic

configurations: The tensor product model of group impression formation and change.

Psychological Review, 107, 914-942

Kelley, H. H. (1972). Causal schemata and the attribution process. In E. E. Jones, D. E. Kanouse,

H. H. Kelley, R. E. Nisbett, S. Valins & B. Weiner (Eds.) Attribution: Perceiving the

causes of behavior (pp. 151-174). Morristown, NJ: General Learning Press.

Kruglanski, A. W., Schwartz, S. M., Maides, S., & Hamel, I. Z. (1978). Covariation, discounting,

and augmentation: Towards a clarification of attributional principles. Journal of

Personality, 46, 176-189.

Kruschke, J. K. (1996). Base rates in category learning. Journal of Experimental Psychology:

Learning, Memory and Cognition, 22, 3—26.

Kruschke, J. K., & Johansen, M. K. (1999). A model of probabilistic category learning. Journal of

Experimental Psychology: Learning, Memory, and Cognition, 25, 1083—1119.

Kunda, Z., & Thagard, P. (1996). Forming impressions from stereotypes, traits, and behaviors: A

parallel-constraint-satisfaction theory. Psychological Review, 103, 284-308

Labiouse, C. L. & French, R. M. (2001). A connectionist model of person perception and

stereotype formation. In R. French & J. Sougné (Eds.) Proceedings of the sixth Neural

Computation and Psychology Workshop: Learning, Development, and Evolution, pp.209-

218. London: Springer Verlag.

Linder, D.E., Cooper, J. & Jones, E.E. (1967). Decision freedom as a determinant of the role of

incentive magnitude in attitude change. Journal of Personality and Social Psychology, 6,

245-254.

Macrae, C. N., Hewstone, M., & Griffiths, R. J. (1993). Processing load and memory for

stereotype-based information. European Journal of Social Psychology, 23, 77-87.


Maheswaran, D., & Chaiken, S. (1991). Promoting systematic processing in low-motivation

settings: Effect of incongruent information on processing and judgment. Journal of

Personality and Social Psychology, 61, 13-25.

Manis, M., Dovalina, I., Avis, N. E., & Cardoze, S. (1980). Base rates can affect individual

predictions. Journal of Personality and Social Psychology, 38, 231—248.

McClelland, J. L. & Rumelhart, D. E. (1985). Distributed memory and the representation of

general and specific information. Journal of Experimental Psychology: General, 114, 159-188.

McClelland, J. L., & Rumelhart, D. E. (1988). Explorations in parallel distributed processing: A

handbook of models, programs and exercises. Cambridge, MA: Bradford.

McClelland, J., McNaughton, B., & O’Reilly, R. (1995). Why there are complementary learning

systems in the hippocampus and neocortex: Insights from the successes and the failures of

connectionist models of learning and memory. Psychological Review, 102, 419-457.

McCloskey, M., & Cohen N.J. (1989). Catastrophic interference in connectionist networks: the

sequential learning problem. The Psychology of Learning and Motivation, 24, 109-165.

Medin, D. L., & Edelson, S. M. (1988). Problem structure and the use of base-rate information

from experience. Journal of Experimental Psychology: General, 117, 68-85.

Medin, D. L., & Schaffer, M. M. (1978). Context theory of classification learning. Psychological

Review, 85, 207—238.

Morris, M. W. & Larrick, R. P. (1995). When one cause casts doubt on another: A normative

analysis of Discounting in causal attribution. Psychological Review, 102, 331—355.

Moskowitz, G. B., & Skurnik, I. W. (1999). Contrast effects as determined by the type of prime:

trait versus exemplar primes initiate processing strategies that differ in how accessible

constructs are used. Journal of Personality and Social Psychology, 76, 911—927.

Nisbett, R.E., & Wilson, T.D. (1977). Telling more than we can know: Verbal reports on mental

processes. Psychological Review, 84, 231-259.

Nosofsky, R. M., Kruschke, J. K. & McKinley, S. C. (1992). Combining exemplar-based category

representations and connectionist learning rules. Journal of Experimental Psychology:

Learning, Memory and Cognition, 18, 211—233.

Nosofsky, R.M. (1986). Attention, similarity, and the identification-categorization relationship.

Journal of Experimental Psychology: General, 115, 39-57.

Perugini, M. & Conner, M. (2000). Predicting and understanding behavioral volitions: The

interplay between goals and behaviors. European Journal of Social Psychology, 30,

705—731.

Petty, R. E., & Wegener, D. T. (1999). The elaboration likelihood model: Current status and

controversies. In S. Chaiken & Y. Trope (Eds.), Dual-process theories in social

psychology (pp. 41-72). New York, NY: Guilford Press.

Petty, R. E. & Cacioppo, J. T. (1986). The elaboration likelihood model of persuasion. In L.

Berkowitz (Ed.). Advances in experimental social psychology (Vol. 19, pp. 123—205).

San Diego, CA: Academic Press.

Petty, R. E., & Cacioppo, J. T. (1984). The effects of involvement on responses to argument quantity

and quality: Central and peripheral routes to persuasion. Journal of Personality and Social

Psychology, 46, 69-81.

Petty, R.E., Cacioppo, J. T., & Goldman, R. (1981). Personal involvement as a determinant of

argument-based persuasion. Journal of Personality and Social Psychology, 41, 847—855.

Phelps, E.A., O’Connor, K.J., Cunningham, W.A., Funayama, S., Gatenby, C., Gore, J.C., &

Banaji M.R. (2000). Performance on indirect measures of race evaluation predicts

amygdala activation. Journal of Cognitive Neuroscience, 12, 729-738.

Ratcliff, R. (1990). Connectionist models of recognition memory: constraints imposed by learning

and forgetting functions. Psychological Review, 97, 285-308.

Read, S. J. & Marcus-Newhall, A. (1993) Explanatory coherence in social explanations: A parallel

distributed processing account. Journal of Personality and Social Psychology, 65, 429—

447.

Read, S. J., & Montoya, J. A. (1999). An autoassociative model of causal reasoning and causal

learning: Reply to Van Overwalle's critique of Read and Marcus-Newhall (1993). Journal

of Personality and Social Psychology, 76, 728—742.

Reeder, G. D., & Brewer, M. B. (1979). A schematic model of dispositional attribution in

interpersonal perception. Psychological Review, 86, 61—79.

Rescorla, R. A. & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the

effectiveness of reinforcement and nonreinforcement. In A. H. Black & W. F. Prokasy

(Eds.) Classical conditioning II: Current research and theory (pp. 64–98). New York:

Appleton-Century-Crofts.

Rosch, E. (1978). Principles of categorization. In E. Rosch & B. B. Lloyd (Eds.), Cognition

and categorization (pp. 27-48). Hillsdale, NJ: Erlbaum.

Rosenfield, D. & Stephan, W. G. (1977). When discounting fails: An unexpected finding.

Memory and Cognition, 5, 97-102.

Rumelhart, D. E., & McClelland, J. L. (1996). Parallel distributed processing: Explorations in the

microstructure of cognition: Vol. 1. Foundations. Cambridge, MA: MIT Press.

Sarle, W. S. (1994). Neural networks and statistical models. Proceedings of the nineteenth annual

SAS users group international conference.

Schwarz, N. (1990). Feelings as information: Informational and motivational functions of affective


states. In E.T. Higgins & R. Sorrentino (Eds.), Handbook of motivation and cognition:

Foundations of social behavior (Vol. 2). New York: Guilford.

Shaklee, H., & Fischhoff, B. (1982). Strategies of information search in causal analysis. Memory

and Cognition, 10, 520-530.

Shanks, D. (1992). Connectionist accounts of the inverse base-rate effect in categorization.

Connection Science, 4, 3—18.

Shanks, D. R. (1985). Forward and backward blocking in human contingency judgment. Quarterly

Journal of Experimental Psychology, 37b, 1—21.

Shanks, D. R. (1987). Acquisition functions in contingency judgment. Learning and Motivation,

18, 147—166.

Shanks, D. R. (1995). Is human learning rational? Quarterly Journal of Experimental Psychology,

48a, 257—279.

Shanks, D.R., Lopez, F. J., Darby, R. J., Dickinson, A. (1996). Distinguishing associative and

probabilistic contrast theories of human contingency judgment. In D. R. Shanks, K. J.

Holyoak, & D. L. Medin (Eds.) The psychology of learning and motivation (Vol. 34, pp.

265—311). New York, NY: Academic Press.

Shultz, T. & Lepper, M. (1996). Cognitive dissonance reduction as constraint satisfaction.

Psychological Review, 103, 219-240.

Siebler, F., Bohner, G. & Weinerth, T. (1998). Simulation of implicit and explicit processes in

parallel-constraint-satisfaction networks. Unpublished manuscript.

Sitton, M., Mozer, M. C., & Farah, M. J. (2000). Superadditive effects of multiple lesions in a

connectionist architecture: Implications for the neuropsychology of optic aphasia.

Psychological Review, 107, 709—734.

Smith, E. R., & Zárate, M. A. (1992). Exemplar-based model of social judgment. Psychological

Review, 99, 3—21.

Smith, E. R., & DeCoster, J. (1998). Knowledge acquisition, accessibility, and use in person

perception and stereotyping: Simulation with a recurrent connectionist network. Journal of

Personality and Social Psychology, 74, 21-35.

Smith, E. R. (1996). What do connectionism and social psychology offer each other? Journal of

Personality and Social Psychology, 70, 893-912.

Smith, E.R. & DeCoster, J. (2000). Associative and rule-based processing: A connectionist

interpretation of dual-process models. In S. Chaiken & Y. Trope (Eds.) Dual-process

theories in social psychology (pp. 323—338). London, UK: Guilford.

Srull, T. K. (1981). Person Memory: Some tests of associative storage and retrieval models.

Journal of Experimental Psychology: Human Learning and Memory, 7, 440—463.


Srull, T. K., Lichtenstein, M., Rothbart, M. (1985). Associative storage and retrieval processes in

person memory. Journal of Experimental Psychology: Learning, Memory, and Cognition,

11, 316—345.

Stangor, C. & McMillan, D. (1992). Memory for expectancy-congruent and expectancy-

incongruent information: A review of the social and social developmental literatures.

Psychological Bulletin, 111, 42—61.

Stangor, C., & Duan, C. (1991). Effects of multiple task demands upon memory for information

about social groups. Journal of Experimental Social Psychology, 27, 357—378.

Stapel, D. A., Koomen, W., & van der Pligt, J. (1997). Categories of category accessibility: The

impact of trait concept versus exemplar priming on person judgments. Journal of

Experimental Social Psychology, 33, 47—76.

Stewart, R. H. (1965). Effect of continuous responding on the order effect in personality

impression formation. Journal of Personality and Social Psychology, 1, 161—165.

Thagard, P. (1989). Explanatory coherence. Behavioral and Brain Sciences, 12, 435-467.

Thorpe (1994). Localized versus distributed representations. In M. A. Arbib (Ed.) Handbook of

brain theory and neural networks (pp. 949-952). Cambridge, MA: MIT Press.

Tobena, A., Marks, I., & Dar, R. (1999). Advantages of bias and prejudice: an exploration of their

neurocognitive templates. Neuroscience and Biobehavioral Reviews, 23, 1047-1058.

Trope, Y. & Gaunt, R. (2000). Processing alternative explanations of behavior: Correction or

integration? Journal of Personality and Social Psychology, 79, 344—354.

Van Overwalle, F. & Timmermans, B. (2000) Discounting and augmentation in attribution: The

role of connections between causes. Manuscript submitted for publication.

Van Overwalle, F. & Timmermans, B. (2001). Learning about an Absent Cause: Discounting and

Augmentation of Positively and Independently Related Causes. In R. French & J. Sougné

(Eds.) Proceedings of the sixth Neural Computation and Psychology Workshop: Evolution,

Learning, and Development. London: Springer Verlag.

Van Overwalle, F. (1996). The relationship between the Rescorla-Wagner associative model and

the probabilistic joint model of causality. Psychologica Belgica, 36, 171-192.

Van Overwalle, F. (1998) Causal Explanation as Constraint Satisfaction: A Critique and a

Feedforward Connectionist Alternative. Journal of Personality and Social Psychology, 74,

312-328.

Van Overwalle, F., & Jordens, K. (2001) A Feedforward Connectionist Model of Cognitive

Dissonance: An Alternative to Shultz and Lepper (1996). Manuscript submitted for

publication.

Van Overwalle, F., & Van Rooy, D. (1998). A connectionist approach to causal attribution. In S.

J. Read & L. C. Miller (Eds.), Connectionist models of social reasoning and social

behavior. New York: Erlbaum.

Van Overwalle, F. & Van Rooy, D. (2001a). When more observations are better than less : A

connectionist account of the acquisition of causal strength. European Journal of Social

Psychology, 31, 155-175.

Van Overwalle, F., & Van Rooy, D. (2001b). How one cause discounts or augments another: A

connectionist account of causal competition. Personality and Social Psychology Bulletin, in

press.

Van Overwalle, F. & Van Rooy, D. (2001c). A recurrent connectionist model of biases in group

judgments. Manuscript submitted for publication.

Van Overwalle, F., Drenth, T. & Marsman, G. (1999). Spontaneous trait inferences: Are they

linked to the actor or to the action? Personality and Social Psychology Bulletin, 25, 450-

462.

Wells, G. L. & Ronis, D. L. (1982). Discounting and augmentation: Is there something special

about the number of causes? Personality and Social Psychology Bulletin, 8, 566—572.

Wood, W., Kallgren, C. A., Preisler, R. M. (1985). Access to attitude-relevant information in

memory as a determinant of persuasion: The role of message attributes. Journal of

Experimental Social Psychology, 21, 73—85.


Table 1

Overview of the Simulations

Nr.  Topic                      Evidence / Prediction                           Major Processing Principle
1    Categorization             Gluck & Bower, 1988, exp. 1                     Competition
2    Impression formation       Stewart, 1965                                   Acquisition
3    Serial position            Dreben, Fiske & Hastie, 1979                    Acquisition
4    Inconsistent information   Hamilton, Katz & Leirer, 1980, exp. 3           Diffusion
5    Generalization             Smith & DeCoster, 1998, sim. 1 & 2              Spreading of Internal Activation
6    Assimilation & Contrast    Stapel, Koomen & van der Pligt, 1997, exp. 3    Acquisition, Competition
7    Causal Attribution         Van Overwalle & Van Rooy, 2001, exp. 1          Acquisition, Competition
8    Attitude Formation         Ajzen, 1991                                     Acquisition
9    Dual-Process Models        Chaiken & Maheswaran, 1994                      Acquisition
10   Cognitive Dissonance       Linder, Cooper & Jones, 1967                    Competition


Table 2

Learning Experiences in Categorization (Simulation 1)

                          Features                                         Categories
                          Dutch   less-sophisticated   refined   French    Flemish   Walloon
Flemish (Rare) Category
  #10                       1              0               0        0         1         0
  #5                        1              1               0        0         1         0
  #5                        1              1               1        0         1         0
  #10                       1              1               1        1         1         0
R Walloon (Common) Category
  #30                       1              1               1        1         0         1
  #15                       0              1               1        1         0         1
  #15                       0              0               1        1         0         1
  #30                       0              0               0        1         0         1
Test
  Features
  Dutch                     1              0               0        0         ?        –?
  less-sophisticated        0              1               0        0         ?        –?
  refined                   0              0               1        0         ?        –?
  French                    0              0               0        1         ?        –?
  Prototype
  Flemish                   ?              ?               ?        ?         1         0
  Walloon                   ?              ?               ?        ?         0         1

Note. Simplified version of the experimental design of Gluck & Bower (1988); Cell entries denote external activation; R=Randomized order; #=frequency of trial; the feature patterns were generated according to the following probabilities: given a category, the category's own perfect feature was present 100% of the time, its imperfect feature 67%, the other category's imperfect feature 50%, and the other category's perfect feature 33%.
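The percentages in this note can be checked directly against the trial frequencies in Table 2. The sketch below does that arithmetic with exact fractions and also computes the implied base rate of the rare category; the function and variable names are hypothetical, and only the frequencies and patterns come from the table.

```python
from fractions import Fraction

# (frequency, pattern) pairs from Table 2; pattern order: Dutch, less-sophisticated, refined, French
FLEMISH = [(10, (1, 0, 0, 0)), (5, (1, 1, 0, 0)), (5, (1, 1, 1, 0)), (10, (1, 1, 1, 1))]
WALLOON = [(30, (1, 1, 1, 1)), (15, (0, 1, 1, 1)), (15, (0, 0, 1, 1)), (30, (0, 0, 0, 1))]


def feature_probabilities(trials):
    """P(feature present | category), computed from the trial frequencies."""
    total = sum(n for n, _ in trials)
    return [Fraction(sum(n * pattern[k] for n, pattern in trials), total) for k in range(4)]


def base_rate(trials, other_trials):
    """Share of all learning trials that belong to the given category."""
    total = sum(n for n, _ in trials)
    return Fraction(total, total + sum(n for n, _ in other_trials))


print([str(p) for p in feature_probabilities(FLEMISH)])   # ['1', '2/3', '1/2', '1/3']
print([str(p) for p in feature_probabilities(WALLOON)])   # ['1/3', '1/2', '2/3', '1']
print(base_rate(FLEMISH, WALLOON))                        # 1/4
```

Each category's own perfect feature comes out at 1, its imperfect feature at 2/3, and the other category's imperfect and perfect features at 1/2 and 1/3, while the Flemish category accounts for one quarter of all trials.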


Table 3

Impression Formation: Recency after Reversal of Trait-implying Information (Simulation 2)

                               Features               Category
                               person   context       trait
High - Low Presentation Order
  #4 High trait                  1        1            +1
  #4 Low trait                   1        1            –1
Low - High Presentation Order
  #4 Low trait                   1        1            –1
  #4 High trait                  1        1            +1
Test
                                 1        0             ?

Note. Schematic representation of the experimental design of Stewart (1965); High=adjective implies trait;

Low=adjective implies opposite trait; Cell entries denote external activation; #=number of trials.
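Because the delta rule corrects its error trial by trial, later information tends to overwrite earlier information, which is the recency pattern this design targets. The sketch below is a simplified feedforward version (person and context nodes feeding a single trait node), not the paper's auto-associative network; the learning rate of 0.2, the zero starting weights, and the names `delta_rule` and `person_rating` are assumptions for illustration.

```python
def delta_rule(trials, lr=0.2):
    """Trial-by-trial delta rule for a single trait node with linear activation.

    trials: list of (inputs, target); inputs = [person, context] activations.
    Weights start at 0 (an assumption, not a parameter from the paper).
    """
    weights = [0.0, 0.0]
    for inputs, target in trials:
        output = sum(w * a for w, a in zip(weights, inputs))
        error = target - output
        weights = [w + lr * error * a for w, a in zip(weights, inputs)]
    return weights


HIGH = ([1, 1], +1)   # person and context present, adjective implies the trait
LOW = ([1, 1], -1)    # person and context present, adjective implies the opposite trait


def person_rating(trials):
    """Test trial: person node on, context node off, so the rating equals the person weight."""
    weights = delta_rule(trials)
    return sum(w * a for w, a in zip(weights, [1, 0]))


print(round(person_rating([HIGH] * 4 + [LOW] * 4), 2))   # -0.38
print(round(person_rating([LOW] * 4 + [HIGH] * 4), 2))   # 0.38
```

In both orders the final rating leans toward the information presented last.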


Table 4

Impression Formation: Recency and Primacy in Serial Position Weights (Simulation 3)

                               Features               Category
                               person   context       trait
Confirmatory Information
  #4 High                        1        1            +1
Mixed Information
  #3 High                        1        1            +1
  #1 Lowᵃ                        1        1            –1
Test
                                 1        0             ?

Note. Schematic representation of the experimental design of Dreben, Fiske & Hastie (1979) illustrated here

for the four trial condition; High=adjective implies trait; Low=adjective implies opposite trait; Cell entries

denote external activation; Initial weights were set at .10; #=number of trials.

a This disconfirming trial is presented at position 1, 2, 3, or 4 of the four trial series (here it is shown at

position 4), and ratings are then compared with judgments from the confirmatory condition at the same

serial position.


Table 5

Impression Formation: Memory for Inconsistent Information (Simulation 4)

                         Trait                          Behavioral Exemplars
                         person   common   violent      consistent       inconsistent
  #1 consistent            1ᵃ       1ᵃ       0          1  0  0  0            0
  #1 consistent            1ᵃ       1ᵃ       0          0  1  0  0            0
  #1 inconsistent          1ᵃ       0        1ᵃ         0  0  0  0            1
  #1 consistent            1ᵃ       1ᵃ       0          0  0  1  0            0
  #1 consistent            1ᵃ       1ᵃ       0          0  0  0  1            0
Test
  Recall
    consistent             1        1        1          ?  ?  ?  ?            0
    inconsistent           1        1        1          0  0  0  0            ?
  Biased Recognition
    consistent             ?        ?        0          1  1  1  1            0
    inconsistent           ?        ?        0          0  0  0  0            1

Note. Schematic representation of the experimental design of Hamilton, Katz & Leirer (1980, experiment

3), illustrated for a fictitious 4 / 1 distribution of consistent versus inconsistent behaviors. In the original

experiment, the distribution was 10 / 1, and the inconsistent statement was given at position 2, 6 or 10 of

the list (in most similar experiments order was randomized). Cell entries denote external activation;

#=number of trials.

a Activation set to 0.10 under memorizing instructions.


Table 6

Assimilation: Exemplar and Group Inferences (Simulation 5)

Exemplar Features                    Group Features
E1   E2   E3   E4   E5               G1   G2   G3

Background Knowledge

#100 context 0̃ 0̃ 0̃ 0̃ 0̃ 0̃ 0̃ 0̃R

#20 exemplar ̃ ̃ ̃ ̃ ̃ ̃ –̃ +̃

Test

exemplar ? 0 0 0 group 0 0 0 0 0 + – ?

Note. Schematic representation of assimilation of exemplar and group stereotypes; Each feature E or G is

represented by 5 nodes; Cell entries denote external activation; for the exemplar features, reflects a

randomly drawn Normal distributed pattern with M=0 & SD=.5 (identical across all trials); for the group

features, + reflects M=.50 and – reflects M=-.5; R=Randomized order; #=number of trials. For reasons of

clarity, the other exemplar features E6 to E15 representing two other exemplars (each #20 trials) are not

shown.

~ Noise added randomly at each trial with Normal distribution of M=0 & SD=.5


Table 7

Assimilation and Contrast (Simulation 6)

                         Exemplars                         Trait Categories
                         person   Hitler   Gandhi          hostile/violent   nice
  #10 Hitler               0        +        0                   +            0
  #10 Gandhi               0        0        +                   0            +      R
  #10 average person       +        0        0                   +            0
  #10 average person       +        0        0                   0            +
  #10 traits               0        0        0                   –            +
Priming (each condition only once)
  #1 Hitler                +̃        +̃        0                   0            0
  #1 Gandhi                +̃        0        +̃                   0            0
  #1 hostile               +̃        0        0                   +̃            0
  #1 nice                  +̃        0        0                   0            +̃
Test
                           +        0        0                   0            ?

Note. Schematic representation of prior knowledge acquisition and experimental design of Stapel, Koomen & van der Pligt (1997, Experiment 3); Each feature/category is represented by 5 nodes; Cell entries denote external activation with + and – reflecting a randomly drawn Normal distributed pattern with M=+.5/-.5 & SD=.5 (identical across all trials; the simulation was run for 5 such activation patterns and results were averaged); R=Randomized order; #=number of trials.

~ Noise added randomly at each trial with Normal distribution of M=0 & SD=.5
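This note distinguishes two sources of variability: a concept's activation pattern, drawn once and then reused unchanged on every trial, and fresh per-trial noise on the cells marked ~. The sketch below makes that distinction concrete. Only the node count and the Normal parameters come from the note; the function names and the seed are illustrative assumptions.

```python
import random

N_NODES = 5  # each feature or category is coded over 5 nodes (Table 7 note)


def make_pattern(mean, rng, sd=0.5):
    """Draw a concept's activation pattern once; it is reused unchanged on every trial."""
    return [rng.gauss(mean, sd) for _ in range(N_NODES)]


def present(pattern, rng, noise_sd=0.5):
    """Activation on one trial for a cell marked ~: fixed pattern plus fresh Normal(0, noise_sd) noise."""
    return [a + rng.gauss(0.0, noise_sd) for a in pattern]


rng = random.Random(2001)
plus = make_pattern(+0.5, rng)    # a "+" cell
minus = make_pattern(-0.5, rng)   # a "–" cell
trial_a = present(plus, rng)      # two presentations of the same "+~" cell share the pattern...
trial_b = present(plus, rng)      # ...but carry different noise
```

Averaging the results over several independently drawn patterns, as the note describes for 5 patterns, would amount to repeating the whole simulation with different `make_pattern` draws.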


Table 8

Forward Discounting of a Novel Cause as a Function of the Sample Size of a Known Cause (Simulation 7)

                         Causes                     Outcome
                         known (Ann)   novel (Troy)
Small Sample Size
  #1                         1             0            1
  #5                         1             1            1
Large Sample Size
  #5                         1             0            1
  #5                         1             1            1
Test
  Known (Ann)                1             0            ?
  Novel (Troy)               0             1            ?

Note. Schematic representation of the experimental design of Van Overwalle & Van Rooy (2001); Cell

entries denote external activation; #=number of trials.
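Table 1 lists acquisition and competition as the principles behind this simulation. A minimal sketch of the design, assuming a feedforward, Rescorla-Wagner style delta rule rather than the paper's auto-associative network, a learning rate of 0.2, and weights starting at 0 (`delta_rule` and `design` are hypothetical names):

```python
def delta_rule(trials, lr=0.2):
    """Feedforward delta rule: cues [known, novel] -> outcome, weights start at 0."""
    w = [0.0, 0.0]
    for cues, outcome in trials:
        error = outcome - sum(wi * ci for wi, ci in zip(w, cues))
        w = [wi + lr * error * ci for wi, ci in zip(w, cues)]
    return w


def design(n_known_alone, n_compound=5):
    """Known cause alone with the outcome, then known + novel cause together (Table 8)."""
    return [([1, 0], 1)] * n_known_alone + [([1, 1], 1)] * n_compound


small = delta_rule(design(1))   # small sample size for the known cause
large = delta_rule(design(5))   # large sample size for the known cause
print([round(x, 2) for x in small])   # [0.57, 0.37]
print([round(x, 2) for x in large])   # [0.82, 0.15]
```

The more the known cause (Ann) has already been learned, the less prediction error remains on the compound trials, so the novel cause (Troy) acquires a smaller weight, i.e., it is discounted more strongly after the large sample.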


Table 9

Attitude Formation (Simulation 8)

                      Causal Factors                 Outcomes
                      car   bicycle   bus            fast   dry   pollutes   value
Car
  #10                  1       0       0               1     0       0          1     R
  #10                  1       0       0               0     1       0          1
  #10                  1       0       0               0     0       1         –1
Bicycle
  #5                   0       1       0               1     0       0          1     R
  #10                  0       1       0               0    –1       0         –1
Bus
  #10                  0       0       1              –1     0       0         –1     R
  #5                   0       0       1               0     1       0          1
  #5                   0       0       1               0     0       1         –1
Test
  attitude toward car        1   0   0               0     0       0          ?
  attitude toward bicycle    0   1   0               0     0       0          ?
  attitude toward bus        0   0   1               0     0       0          ?

Note. Schematic representation of attitude formation on the basis of beliefs about outcome consequences and their value (likeability of the consequences; cf. the theory of planned behavior); Cell entries denote external activation; R=Randomized order; #=number of trials.


Table 10

Dual Processes in Attitude Formation (Simulation 9)

                               Causal Factors        Outcomes
                               product   source      featuresᵃ            value
Prior Knowledge on Source
  #20 Low credibility             1         1         0   0   0   0         .1
  #20 High credibility            1         1         0   0   0   0          1
Strong arguments  R
  #2                              1         1         1   0   0   0          1
  #2                              1         1         0   1   0   0          1
  #2                              1         1         0   0   1   0          0
Weak arguments  R
  #2                              1         1         1   0   0   0          0
  #2                              1         1         0   1   0   0          0
  #2                              1         1         0   0   1   0         .1
Ambiguous arguments  R
  #2                              1         1         1   0   0   0          0
  #2                              1         1         0   1   0   0          1
  #2                              1         1         0   0   1   0          0
  #2                              1         1         0   0   0   1         .1
Test
  attitude toward product         1         0         0   0   0   0          ?

Note. Simplified representation of attitude formation on the basis of heuristics (source credibility) and systematic processing (argument quality; Chaiken & Maheswaran, 1994); Cell entries denote external activation; R=Randomized order; #=number of trials (for heuristic processing, the number of trials for all arguments was set to 1 with all activation levels divided by 10).

a The first two features of the product are of high importance, the last two of low importance (as can be seen from the value component).


Table 11

Cognitive Dissonance following Induced Compliance (Simulation 10)

                               Causal factors                       Outcomes
                               essay topic   payment   forced       write essay   value
  #10 topic (T)                    +̃            0         0              0           0
  #10 T + payment                  +̃            +̃         0              +̃           0
  #10 T + 20% P                    +̃            +̃ᵃ        0              +̃           0     R
  #10 T + forced (F)               +̃            0         +̃              +̃           –̃
  #10 T + P + F                    +̃            +̃         +̃              +̃           0
  #10 T + 20% P + F                +̃            +̃ᵃ        +̃              +̃           –̃
Induced Compliance (each condition only once)
  #1 choice & low payment          +            +ᵃ        0              +           0
  #1 choice & high payment         +            +         0              +           0
  #1 forced & low payment          +            +ᵃ        +              +           –
  #1 forced & high payment         +            +         +              +           0
Test
  attitude toward essay            +            0         0              ?           ?

Note. Schematic representation of the induced compliance experiment by Linder, Cooper and Jones (1967). Each factor and outcome is represented by 5 nodes. Cell entries denote external activation, with + and – reflecting a randomly drawn, normally distributed pattern with M = +1 or –1 and SD = .2 (identical across all trials; the simulation was run for 5 such activation patterns and the results were averaged); R = randomized order; # = number of trials.

ᵃ M = +.2 to reflect low payment. ~ Noise added randomly at each trial, drawn from a Normal distribution with M = 0 and SD = .2.
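The distributed encoding in this note combines two sources of variability: a fixed pattern per concept, drawn once and reused on every trial, and fresh noise added on each trial. A minimal sketch with Python's `random` module; the function names and seeds are illustrative assumptions, and the 5 independent pattern sets stand for the 5 runs whose results were averaged.

```python
import random

N_NODES = 5  # each factor or outcome is represented by 5 nodes


def draw_pattern(mean, rng, sd=0.2):
    """Fixed distributed pattern: one Normal(mean, sd) draw per node, reused on every trial."""
    return [rng.gauss(mean, sd) for _ in range(N_NODES)]


def add_trial_noise(pattern, rng, sd=0.2):
    """New list: the fixed pattern plus fresh Normal(0, sd) noise (the ~ marker)."""
    return [a + rng.gauss(0.0, sd) for a in pattern]


rng = random.Random(42)             # arbitrary seed
plus = draw_pattern(+1.0, rng)      # "+" cell
minus = draw_pattern(-1.0, rng)     # "–" cell
low_pay = draw_pattern(+0.2, rng)   # "+ᵃ" cell (low payment)

# Two presentations of the same concept share the pattern but not the noise.
trial_a = add_trial_noise(plus, rng)
trial_b = add_trial_noise(plus, rng)

# One independently drawn "+" pattern per simulation run (5 runs, later averaged).
pattern_sets = [draw_pattern(+1.0, random.Random(seed)) for seed in range(5)]
```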

Table 12

Fit and Robustness of the Simulations, including Alternative Encoding and Models

Nr                Original Simulationᵃ   Distributed   Feedforward   Non-linear Recurrent
1                 .97 (.01)              .98           .97           .99
2                 .98 (.32ᵇ)             .94           .97           .96
3  continuous     .94 (.40ᶜ)             .90           –             .86
   final          .99 (.89ᶜ)             .78           –             .99
4                 .97 (.28)              .91           .97           .96
5  personᵉ        .99 (.01)              –             –             .99
   groupᵉ         .99 (.01)              –             –             .99
6ᵉ                .86 (.05)              –             .73           .53
7                 .99 (.10ᵈ)             .99           .99           .99
8                 .99 (.20)              .98           .99           .97
9                 .94 (.30)              .94           .82ˣ          .84ˣ
10ᵉ               .99 (.02)              –             .95           .66ˣ

Note. Cell entries are correlations between mean simulated values (averaged across randomizations) and empirical data or theoretical predictions. For the distributed encoding, each concept was represented by 5 nodes with an activation pattern drawn from a Normal distribution with M = the activation in the original simulation and SD = .20 (5 such random patterns were run and averaged), plus additional noise at each trial drawn from a Normal distribution with M = 0 and SD = .20. For the non-linear auto-associative model, the parameters were E = I = Decay = .15, with 9 internal cycles (McClelland & Rumelhart, 1988). For all alternative models, we searched for the best-fitting learning parameter.

ᵃ Learning rate in parentheses. ᵇ,ᶜ,ᵈ The contextual node's learning rate was (b) 25%, (c) 33% or (d) 166% of this learning rate. ᵉ Distributed encoding. ˣ Predicted pattern was not reproduced.
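The fit index in Table 12 is a correlation between mean simulated values and the empirical data or theoretical predictions. A minimal sketch of that computation; `mean_over_runs`, `pearson_r` and the toy numbers are illustrative and do not reproduce any value in the table.

```python
import math


def mean_over_runs(runs):
    """Average simulated values condition by condition across randomized runs."""
    return [sum(values) / len(values) for values in zip(*runs)]


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Toy numbers for illustration only (not data from the paper).
runs = [[0.2, 0.5, 0.9], [0.4, 0.5, 0.7]]  # two randomizations, three conditions
observed = [1.0, 2.0, 3.0]
r = pearson_r(mean_over_runs(runs), observed)
print(round(r, 3))
```

Because a correlation ignores scale and intercept, it measures whether the simulation reproduces the pattern across conditions rather than the absolute values.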

Figure Captions

Figure 1. (A) Architecture of an auto-associative recurrent network, applied to (B) structural relations and (C) causal relations.
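The linear activation update and delta learning rule used with this architecture can be sketched as follows. The equations assume the linear auto-associator of McClelland and Rumelhart (1988) with a single cycle of internal updating and E = I = 1; the symbols ext (external input), int (internal input), a (activation), w (connection weight) and ε (learning rate) are introduced here for illustration.

$$
\begin{aligned}
\mathrm{int}_i &= \sum_{j \neq i} a_j \, w_{ji} \\
a_i &= E \cdot \mathrm{ext}_i + I \cdot \mathrm{int}_i \\
\Delta w_{ji} &= \varepsilon \, (\mathrm{ext}_i - \mathrm{int}_i) \, a_j
\end{aligned}
$$

The error term ext_i − int_i drives learning: once the other active nodes already predict node i, the error is near zero and further connections to node i gain little weight.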

Figure 2. Graphical illustration of the principles of (A) acquisition (learning rate = .20), (B) competition and (C) diffusion. O = outcome, C = consistent information, I = inconsistent information, T = trait. Filled nodes are activated on a given trial; empty nodes are not activated. Solid lines denote strong connection weights, broken lines moderate weights and dotted lines weak weights.
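Acquisition and competition can be reproduced with a few lines of delta-rule learning. A minimal sketch, assuming a linear auto-associator in which the sending activation is taken to be the external activation (a simplification); the node indices A, B and O and the trial counts are hypothetical, and diffusion is not illustrated.

```python
def delta_trial(weights, ext, lr):
    """One delta-rule trial on a linear auto-associator.

    Internal input to node i sums ext[j] * weights[j][i] over the other nodes;
    each weight then changes by lr * (ext[i] - internal[i]) * ext[j].
    """
    n = len(ext)
    internal = [sum(ext[j] * weights[j][i] for j in range(n) if j != i)
                for i in range(n)]
    for i in range(n):
        error = ext[i] - internal[i]
        for j in range(n):
            if j != i:
                weights[j][i] += lr * error * ext[j]


def zero_weights(n):
    """n x n matrix; weights[j][i] is the connection from node j to node i."""
    return [[0.0] * n for _ in range(n)]


A, B, O = 0, 1, 2  # hypothetical indices: two causes and one outcome

# (A) Acquisition: A and O co-occur on every trial (learning rate .20).
w = zero_weights(3)
for _ in range(3):
    delta_trial(w, [1, 0, 1], lr=0.2)
acquired = w[A][O]  # 0.2, then 0.36, then 0.488 = 1 - 0.8**3

# (B) Competition: A is learned first, then A and B together precede O.
w = zero_weights(3)
for _ in range(20):
    delta_trial(w, [1, 0, 1], lr=0.2)
for _ in range(20):
    delta_trial(w, [1, 1, 1], lr=0.2)
a_weight, blocked = w[A][O], w[B][O]  # A stays near 1, B stays below .01

# Control: without prior learning about A, A and B share the outcome.
w = zero_weights(3)
for _ in range(20):
    delta_trial(w, [1, 1, 1], lr=0.2)
shared = w[B][O]  # close to .50

print(round(acquired, 3), round(blocked, 4), round(shared, 3))
```

The contrast between `blocked` and `shared` is the competition principle: B gains weight only to the extent that O is not already predicted by A.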

Figure 3. Categorization and prototype abstraction: Network architecture with 4 feature nodes connected to 2 category nodes (only the important connections are shown). Connection weights are shown after the learning history listed in Table 2; stronger weights are depicted by solid lines and weaker weights by broken lines.

Figure 4. Categorization and prototype abstraction: Observed data from Gluck and Bower (1988) and simulation results of categorization (top panel) and prototype abstraction (bottom panel; learning rate = .01). The simulation results in the top panel were regressed onto the observed data with the intercept fixed at .50.
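Regressing with a fixed intercept c leaves only the slope to estimate; minimizing the squared error gives b = Σx(y − c) / Σx², where x are simulated and y observed values. A minimal sketch; the function name and toy values are illustrative, and .50 is presumably chance level in a choice between two categories.

```python
def fit_fixed_intercept(simulated, observed, intercept=0.5):
    """Least-squares slope for observed ~ intercept + slope * simulated, intercept held fixed."""
    num = sum(x * (y - intercept) for x, y in zip(simulated, observed))
    den = sum(x * x for x in simulated)
    slope = num / den
    fitted = [intercept + slope * x for x in simulated]
    return slope, fitted


# Toy values for illustration only (not the Gluck and Bower data).
slope, fitted = fit_fixed_intercept([0.1, 0.2, 0.3], [0.6, 0.7, 0.8])
print(round(slope, 3))
```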

Figure 5. Impression formation: Observed data from Stewart (1965) and simulation results (learning rate for person = .32, for context = .08).

Figure 6. Impression formation: Observed serial position curves from Dreben, Fiske and Hastie (1979; left panels) and simulations (right panels) of attenuated recency given continuous responding (top; learning rate for person = .40, for context = .13) and primacy given final responding (bottom; learning rate for person = .89, for context = .29).
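The context learning rates in Figures 5 and 6 are fixed fractions of the person learning rates, matching footnotes b and c of Table 12 (25% for Simulation 2, 33% for Simulation 3):

$$
\begin{aligned}
\varepsilon_{\text{context}} &= 0.25 \times .32 = .08 && \text{(Figure 5)} \\
\varepsilon_{\text{context}} &= 0.33 \times .40 = .132 \approx .13 && \text{(Figure 6, continuous responding)} \\
\varepsilon_{\text{context}} &= 0.33 \times .89 \approx .294 \approx .29 && \text{(Figure 6, final responding)}
\end{aligned}
$$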

Figure 7. Higher recall of inconsistent behavioral information after impression formation and memory instructions: Observed data from Hamilton, Katz and Leirer (1980, Exp. 3) and simulation results (learning rate = .28).

Figure 8. Generalization: Simulation of exemplar and group stereotype assimilation (learning rate = .01). The original external activation of the 5 nodes (reflecting micro-features) is shown by solid lines, and the activation reconstructed through internal input by broken lines.
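"Reconstructed activation through internal input" can be illustrated by training a delta-rule auto-associator on one 5-node pattern and reading each node's internal input. A minimal sketch, assuming a linear network whose sending activation equals the external activation; the pattern values and the 200 training trials are hypothetical, and this does not reproduce the exemplar or group simulations themselves.

```python
def internal_input(weights, ext):
    """Internal input to each node: sum over the other nodes of ext[j] * weights[j][i]."""
    n = len(ext)
    return [sum(ext[j] * weights[j][i] for j in range(n) if j != i)
            for i in range(n)]


def train(pattern, trials, lr):
    """Delta-rule training on a single distributed pattern, starting from zero weights."""
    n = len(pattern)
    weights = [[0.0] * n for _ in range(n)]
    for _ in range(trials):
        internal = internal_input(weights, pattern)
        for i in range(n):
            error = pattern[i] - internal[i]
            for j in range(n):
                if j != i:
                    weights[j][i] += lr * error * pattern[j]
    return weights


# Hypothetical activations of 5 micro-feature nodes.
pattern = [1.0, 0.8, 1.2, -0.9, -1.1]
weights = train(pattern, trials=200, lr=0.01)

# Internal input approximates the external activation it was trained on.
reconstructed = internal_input(weights, pattern)

# Pattern completion: leave the last node out of the cue; its internal input
# still recovers its value because it depends only on the other nodes.
cue = pattern[:4] + [0.0]
completed = internal_input(weights, cue)
```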

Figure 9. Assimilation and contrast effects after priming with a trait or a person: Observed data from Stapel, Koomen and Van der Pligt (1997, Exp. 3) and simulation results (learning rate = .05).

Figure 10. Causal attribution and discounting: Observed data from Van Overwalle and Van Rooy (2000) and simulation results (learning rate = .10).

Figure 11. Attitude formation: Prediction from the theory of planned behavior (Ajzen, 1991) and simulation results (learning rate = .20).

Figure 12. Dual processes in attitude formation: Observed data from Chaiken and Maheswaran (1994; top panel) and simulation results (bottom panel; learning rate = .30).

Figure 13. Cognitive dissonance: Observed data from Linder, Cooper and Jones (1967) and simulation results (learning rate = .02).

Figure 1

[Image: (A) Recurrent Architecture, with external input, internal input and output; (B) Structural Connections among category, feature and exemplar nodes; (C) Causal Connections among outcome, cause and attitude-object nodes.]

Figure 2

[Image: panels A. Acquisition, B. Competition and C. Diffusion.]

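Figure 3

[Image: feature nodes Dutch, less-refined, French and sophisticated connected to category nodes Flemish and Walloon, labeled (Rare) and (Common); connection weights .21 and .05 are shown.]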

Figure 4

Figure 5

Figure 6

Figure 7

Figure 8

Figure 9

Figure 10

Figure 11

Figure 12

Figure 13

Figure 14
