Denham Et-Al HEARES Submitted

Manuscript for submission to Hearing Research 13/06/2008

Perceptual bi-stability in auditory streaming: How much do stimulus features

matter?

Susan L. Denham1*, Kinga Gyimesi2,3, Gbor Stefanics2,4, and Istvn Winkler2,5

1Centre for Theoretical and Computational Neuroscience, University of Plymouth,

Drake Circus, Plymouth PL4 8AA, UK

2Department of General Psychology, Institute for Psychology, Hungarian Academy of

Sciences, 1394 Budapest, P.O. Box 398, Hungary

3Department of Cognitive Science, Budapest University of Technology and Economics,

1111 Budapest, Sztoczek u. 2, Hungary

4Department of Experimental Zoology and Neurobiology, University of Pcs, 7624

Pcs, Ifjsg st. 6, Hungary

5Institute of Psychology, University of Szeged, 6722 Szeged, Petfi S. sgt. 30-34,

Hungary

Abstract

The auditory two-tone streaming paradigm has been used extensively to study

processing mechanisms that underlie the decomposition of the composite auditory input

into coherent sound sequences, and hence the perception of auditory objects. Here we

present new results from a study of bi-stability in auditory streaming. Using relatively

long (4 minute) sequences, we show that there are two fundamentally different phases in

* Corresponding author; [email protected]; tel +44 1752 232610; fax +44 1752

233349

Perceptual bi-stability in auditory streaming 1

ManuscriptClick here to view linked References


this process. Listeners hold their first percept of the sound sequence for a relatively long

period (first phase), after which perception stochastically switches between two or more

alternative sound organisations, each held on average for a much shorter duration

(second phase). The two perceptual phases also differ in that stimulus parameters

influence perceptual behaviour to a far greater degree in the first than in the second

phase, and during the second but not the first phase, there are significant periods when

more than one organisation can be perceived simultaneously. Furthermore, our analysis

reveals deep parallels between the dynamics of perceptual organisation in auditory

streaming and binocular rivalry. We propose an account of auditory streaming in terms

of rivalry between competing temporal associations. Based on the results of our

experiments, we suggest that in the first perceptual phase (formation of associations),

alternative interpretations of the auditory input are formed. In the second phase

(coexistence of interpretations), perception stochastically switches between the

alternatives, thus maintaining perceptual flexibility.

Keywords

auditory streaming, bi-stability, perceptual switching, auditory scene analysis

Introduction

In order to make sense of real-world environments it is necessary to identify,

extract and organise relevant information from the wealth of incoming sensory data.

The potential amount of information far exceeds the processing capacity of any living

system. However, biological organisms are not idle perceivers; rather they seek out

information about the world and the objects in it [1]. The challenge is to create

appropriate object representations on the fly and to continually modify them in order to



maintain as accurate models as possible. Thus the process of perceptual organisation is

fundamental to effective perception.

An important problem for auditory perception is that sound sources may emit

discontinuous sequences of sounds, so some means for forming associations between

discrete events, through the detection of regularities and the maintenance of temporally

persistent representations is required. The formation of sequential associations has been

extensively studied by means of the paradigm of auditory streaming. In the typical

streaming experiment [2], a tone sequence of the structure ABA-ABA-ABA- is

presented at a fast stimulus rate (A and B denote tones differing from each other in

frequency; the - sign stands for a silent interval equal to the time interval between the

onsets of successive tones; i.e. the stimulus onset asynchrony (SOA); see figures 1 and

2). When all sounds are grouped together into a single coherent stream, a galloping

rhythm is typically heard. By increasing the frequency separation (f) between the A and B tones and/or by shortening the interval between subsequent same-frequency tones

(the within-stream inter-tone or offset-to-onset interval, t), perception of the sound sequence changes to that of two homogeneous isochronous streams; a faster paced one

consisting of A tones and a slower paced one consisting of Bs [2].

f

t

SOA

B

A

Sounds Perception

Figure 1. Cartoon of the auditory streaming paradigm. A sequence of low (A) and high (B)

tones presented repeatedly in ABA- groups can be perceived as a single coherent stream with a

galloping rhythm (upper right), or as two segregated streams (lower right), each with an

isochronous rhythm



In general, there is a trade-off between f and t in determining the dominant perceptual organisation. Asking participants whether they perceived the galloping tone

pattern in sequences of the type shown on Figure 1, van Noorden [2] identified three

separate regions of the f-t (or f-SOA, see Figure 2)1 space with different characteristic perceptual organisations (see Figure 3). With very low f's, participants could always hear the galloping rhythm showing that they could organize all tones into

a single sound stream. With slightly larger f's and/or longer t's, participants were able to hear either two separate sound streams or a single integrated stream and to alter their

organisational bias between the two percepts at will. Further increasing f and decreasing t resulted in participants not being able to hear the galloping rhythm, which suggests that perception of two streams became the dominant sound organisation.

a

c d

Time

Frequency

b

Figure 2. The four different time intervals that can be distinguished in the two tone sequences

used in our streaming experiments, and considered by Bregman and colleagues; a) SOAwithin, b)

ISIwithin, c) ISIacross, and d) SOAacross. Diagram derived from [3].

Based on the results obtained in the classical studies of auditory streaming (for a

review, see [4]), it was assumed that following an initial short build-up period

perception of tone sequences such as the one shown in Figure 1 becomes stable, when

1 In his experiments, van Noorden did not distinguish between t and SOA [2. van Noorden, L.P.A.S., Temporal coherence in the perception of tone sequences, in Institute for Perception research. 1975: Eindhoven.]; however, recently Bregman and colleagues [3. Bregman, A.S., et al., Effects of time intervals and tone durations on auditory stream segregation. Percept Psychophys, 2000. 62(3): p. 626-36.], in a study designed to investigate the influence of the various possible time intervals on stream segregation, found that it is in fact the within-stream offset-to-onset interval (i.e. ISIwithin or t) which is most influential in determining the likelihood of streaming (see, figure 2).



parameters fall into either the segregated or the integrated area of the f-t space (see Figure 3). However, recent evidence [5, 6] suggests that the idea that the auditory

system fixes on a single unchanging dominant percept is to some extent an artefact of

the experimental procedures and analysis methods used in classical studies of auditory

streaming. These obscure an important detail about streaming, namely, that it fluctuates

randomly between the two possible organisations; a phenomenon known as perceptual

bi-stability.2 Bi-stability in visual perception (for a review, see [7]) has been studied

extensively since it was thought to offer the possibility of identifying the neural

correlates of visual awareness and ultimately consciousness, by allowing perceptual

changes to be dissociated from changes in the stimulus [8]. The finding of perceptual

bi-stability in auditory streaming is important in that it raises questions about the

commonality of mechanisms underlying perceptual organisation in different sensory

modalities and it further suggests that auditory streaming, far from being a primitive and

automatic process, may be better understood in terms of generic strategies of perceptual

organisation; a notion that has important implications for models of auditory streaming.

2 Actually, even with this simple stimulus configuration, several different sound organisations can be experienced; i.e. multi-stability. For the sake of simplicity we can assume that the various possible perceptual organisations can be sorted into two main categories: integrated and segregated (for a formal definition, see the Design section).



Temporal coherence boundary

boundary

Temporalcoherence

Integrated

Ambiguous

Fission boundary

Fission boundary

Segregated

Figure 3. The dependence of primitive auditory streaming on frequency difference (f) and presentation rate (characterized by the SOA) found in human psychophysical experiments using

alternating pure tones [2, 9]. Stimuli in the region of the parameter space above the 'temporal

coherence boundary' are generally perceived as two segregated streams, and those with

parameters in the region below the 'fission boundary' as a single coherent stream. Those falling

in the ambiguous region can be perceived in either way, and perception can be influenced by

top-down processes [2]. van Noorden actually reported two sets of boundaries, which are

illustrated here for comparison and clarification; the more commonly referenced blue set are

more appropriate for describing perceptual behaviour in response to short sequences, while the

green set are relevant to longer sequences such as the ones we report here. The red dot

indicates the stimulus parameters used by Pressnitzer and Hup [5, 6] the orange dot those of

Winkler et al. [10] and the yellow dots those reported in Denham and Winkler [11] see

descriptions of these experiments in the text.

In the investigation of Winkler et al. [10], a single point in the parameter space

(indicated by the orange dot in Figure 3) was tested (A = 1245 Hz, B = 931 Hz, f = 5 semitones, stimulus onset asynchrony, SOA = 100ms, tone duration = 40 ms in one and

175 ms in another condition, sequences were close to 3 minutes in duration).



Participants were instructed to continuously depress a response key, when they heard

the galloping rhythm and to release the key when they did not. With the short tone

duration, t was 160 ms, the result was a rather ambiguous perception of the tone sequence (galloping was heard on average for 61.8% of the time; SD 22.5); with the

long tone duration, t was 25 ms, the result was predominantly segregated perceptual organisation (10.3% galloping; SD 20.1). Each participant experienced perceptual

switching in both stimulus conditions.

Pressnitzer and Hup [5, 6] tested a similar point of the parameter space

(indicated by the red dot in Figure 3; A = 587 Hz, B = 440 Hz, f = 5 semitones, stimulus onset asynchrony, SOA = 120 ms, tone duration = 120 ms, t = 120 ms, sequences of 4 minutes). These parameters are close to the ambiguous condition of

Winkler et al. [10] and although Pressnitzer and Hup [5, 6] used a slightly different

experimental procedure in which participants reported their perceptions using three

buttons; one for grouping (galloping rhythm), one for splitting (streaming) and the third

for cases when neither galloping nor streaming was heard, their participants also

reported considerable perceptual switching.

In the experiment reported in [11], we were interested to test whether bi-stability

was restricted to the ambiguous region or whether it would be found over a wider range

of the parameter space. We used the same one-button experimental procedure as

Winkler et al. [10] and tested the parameter combinations indicated by the yellow dots

in Figure 3. Our results [11] showed that perceptual bi-stability was present for all

conditions tested. Furthermore, although there was great variability in the mean

switching rate between and within participants, they all showed some degree of

perceptual bi-stability.



In their experiments Pressnitzer and Hup [5, 6] also explored similarities

between perceptual switching behaviour in response to the auditory galloping/streaming

sequence (Figure 1), and a visual stimulus configuration known to be perceptually bi-

stable. The visual stimulus consisted of overlapping drifting gratings, which could be

perceived either as sticking or sliding surfaces; participants typically experience

perceptual switching between these two alternatives. The results indicated that key

characteristics of visual bi-stability are also present in the auditory case. These

characteristics included: exclusivity, the existence of two plausible yet mutually

exclusive alternative interpretations of the sensory input; randomness, stochastic

switching between percepts such that successive dominance durations are uncorrelated;

and inevitability, the finite duration of perceptual dominance; i.e. even when the

intention is to hold onto one interpretation, a switch will always eventually occur [12].

Visual bi-stability can also be induced by showing a different image to each eye;

the well-known phenomenon of binocular rivalry in which participants experience

switches in perceptual awareness between the two images even though both are

continually present (for a recent review, see [13]). On the face of it, finding similarities

in perceptual behaviour in response to auditory tone sequences and binocularly rivalrous

images seems rather unlikely: the sensory modalities are completely different, the tones

are intermittent, while the images can be constant; and the tones are heard by both ears,

while each eye is presented with a different image. However, binocular rivalry is also

known to display all three of the key characteristics of bi-stability listed above. Because

binocular rivalry has been extensively investigated over many years, and is perhaps the

best understood paradigm inducing bi-stable perception, we explored the possibility that

some insights into the processes underlying auditory streaming might be gained from



comparisons between the two phenomena; although we did not necessarily expect to

find a very close correspondence.

Here we report a detailed analysis of perceptual bi-stability observed in auditory

streaming as well as comparisons between auditory streaming and binocular rivalry.

These analyses and comparisons were motivated by our recently suggested

interpretation of the auditory streaming phenomenon [11] (for a detailed description, see

the Distribution of perceptual switching section) and aimed at finding new ways to

describe auditory streaming quantitatively.

Perceptual Experiments

Having found that perceptual bi-stability in auditory streaming exists over a

wide range of the feature space [11], and is not restricted to the ambiguous region [2],

we were interested to characterise the distribution and dynamics of perceptual

switching, since these aspects had not yet been determined for auditory streaming. The

results which we present here were obtained in two further perceptual experiments of

auditory streaming.

Participants

Thirty young healthy volunteers (16 male, 18-26 years of age, average 21.8

years) participated in experiment 1 and 15 (7 male, 21-25 years of age, average 22.2

years) in experiment 2. Participants received modest financial compensation for their

participation. The study was conducted in the sound-attenuated experimental chamber

of the Institute for Psychology, Hungarian Academy of Sciences. It was approved by the

Ethical Committee (institutional review board) of the Institute for Psychology. After the

aims and procedures of the study were explained to them, participants signed an



informed consent form before starting the experiment. Participants were pre-selected on

the basis of the results of clinical audiometry with the criteria that the hearing threshold

between 250 and 6000 Hz should not be higher than 25 dB, and the difference between

the two ears not higher than 15 dB in the same frequency range.

Stimulus paradigm

We decided to focus the first experiment on the medium-to-large f region of the parameter space with medium to moderately long SOAs, to see whether features of

perceptual bi-stability would show differences between the segregated (large f, medium SOA) and the ambiguous (medium f and large f with long SOA) region of the parameter space; the second experiment was then focussed on the region of small to

medium f and short to medium SOA. The experimental conditions used are illustrated in Figure 4 below. Participants were presented with 4-minute long trains of the ABA-

structure, where the A and B were pure tones of 75 ms duration, including 5 ms linear

onset and 5 ms linear offset ramps. In separate trains, f was 4, 10, 16, or 22 semitones (ST) in experiment 1, and 1, 3, 5, or 7 ST in experiment 2; SOA was, 100, 150, 200, or

250 ms (thus t was 125, 225, 325, or 425 ms for the more frequent tones) and 75, 100, 125, or 150 ms (thus t was 75, 125, 175, or 225 ms) for experiments 1 and 2, respectively. Altogether, 4 4 = 16 different types of trains were tested in each

experiment, separately. The frequency of the lower-pitched, more frequent tones (A

on Figure 1) was kept constant at 400 Hz across the different stimulus conditions.

Sounds were generated on an IBM PC computer (MEL 2.0 stimulus presentation

software Psychology Software Tools Inc.), amplified using a custom-made sound

mixer and amplifier, and delivered through Sennheiser HD 430 headphones at a



comfortable 70-dB (SPL) intensity level. The order of the stimulus trains with different

parameters (each train 4 minutes long) was randomized separately for each participant.

Procedure

In the classical studies of auditory streaming, participants were typically asked

to report their perception after the end of each short sound sequence. Thus these

experiments were not designed to test the temporal dynamics of streaming-related

perceptual processes. Furthermore, participants were often asked to attempt to hear the

sound sequences according to one or another pattern. This method was used to find

unambiguous effects of stimulus parameters. The recent studies described in the

Introduction employed on-line measures, asking participants to report their perceptions

as they occurred throughout the presentation of relatively long sound sequences.

However, in all but one [6] of the streaming experiments reported in the literature, it has

been implicitly assumed that only two alternative sound organisations are possible; i.e.

either integrated or segregated, each linked to a specific perceived sound pattern. With

sequences similar to those presented in the current experiments (see Figure 1), the

integrated organisation was expected to result in the perception of the galloping

pattern, whereas the segregated sound organisation was expected to result in the

simultaneous perception of a high and a low tone sequence, both with uniform (but

different) presentation rate. These assumptions were reflected in the response choices

and instructions given to participants. In fact, most previous experiments only asked

participants to report when they experienced a certain pattern of sounds (e.g. the

galloping pattern). It was then assumed that participants, who did not report hearing the

designated pattern experienced the opposite sound organisation (in the example, this

would be the segregated organisation). However, in a pilot study in which we asked



participants to describe their different perceptions in detail, we found that they a) heard

rhythmic patterns different from either one of the expected ones, and b) sometimes

heard simultaneously a pattern that involved both high and low tones and a pattern

involving only high or only low tones.

In order to eliminate possible confusion caused by the perception of rhythms

other than the galloping rhythm the notion of an integrated percept was generalized

and defined for participants as hearing a repeating pattern, which contained both low

and high tones. In turn, the notion of a segregated percept was similarly generalized and

defined for participants as hearing some repeating pattern(s) formed either exclusively

of high or exclusively of low tones, with the possibility that multiple repeating

segregated patterns (i.e., A---A---A and B-B-B) may be perceived concurrently.

Participants were to depress one response key so long as they experienced an integrated

percept and the other key when they experienced a segregated percept. The role of the

two keys was randomly assigned across participants. When participants heard no

repeating tone pattern, they were instructed to release both keys. Participants were asked

to mark their perception throughout the duration of the stimulus sequence and not to

attempt hearing the sound according to one or another perceptual organisation. The

experimenter made sure that participants understood the types of percepts they were

required to report, using both auditory and visual illustrations. Furthermore, in

experiment 1, we then divided our participants into two groups. One group received

instructions implicitly suggesting exclusivity between the segregated and integrated

percepts (as described above); i.e., the instructions were you may either hear a

repeating integrated, or some repeating segregated tone patterns, or no repeating tone

pattern . This set of instructions was similar to that employed by Pressnitzer and

Hup [5]. The other group was explicitly told that it was possible that they may



sometimes hear both types of patterns at the same time; i.e., the instructions were

you may hear a repeating integrated, or some repeating segregated tone patterns,

possibly even both at the same time, or no repeating pattern at all . In the case that

they heard both types of patterns at the same time, they were instructed to keep both

buttons depressed. However, they were also cautioned to be sure to release the button

when they stopped hearing the corresponding pattern. In addition to the instructions,

when analyzing the responses, we discarded all those responses, which we assumed to

represent transitions between two percepts; i.e., all phases with duration shorter than

300 ms. The assumption was that in such cases, participants may simply have been

slightly inaccurate in synchronising their button presses and releases. Because the

results obtained with the two sets of instructions in experiment 1 proved to be very

similar in all regards except for a higher incidence of reporting two simultaneous

percepts in the latter group, for the sake of clarity, here we only report the results

obtained with the instructions explicitly mentioning the possibility of simultaneously

hearing repeating integrated and segregated tone patterns. The group of participants,

who received this set of instructions, included 15 volunteers (6 male, 18-25 years of

age, average 20.9 years). In experiment 2, we only used this set of instructions and all

other procedures were also identical to those of experiment 1.

Participants sat in a comfortable reclining chair in the experimental chamber

throughout the experimental session, holding a response button in each hand. Short 1-3

minute breaks were inserted between consecutive stimulus trains with longer breaks,

when the participant could move about, scheduled just before the start of the

experimental conditions (after explaining and illustrating the possible perceptions) and

after the 8th stimulus train. Further longer breaks were inserted into the session when

necessary. The experiment took ca. two hours altogether. The state of the two response



keys was sampled at 10 Hz (100 ms sampling time) by a NeuroScan Synamps EEG

recording system. Response key states were then extracted from the signals and encoded

separately for each participant and condition for further analysis.

Fi

gure 4. Experimental conditions for experiments 1 (magenta) and 2 (cyan) reported here.

Conditions are numbered separately for experiments 1 and 2 in the following way. Number 1

denotes the shortest SOA and smallest f (100 ms / 4 ST and 75 ms / 1 ST for experiments 1 and 2, respectively). Numbers increase faster through the four different fs (e.g., 2 marks 100 ms / 10 ST and 75 ms / 3 ST for experiments 1 and 2, respectively) and slower for the four

different SOAs (e.g., 5 marks 150 ms / 4 ST and 100 ms / 1 ST for experiments 1 and 2,

respectively).

Results and discussion

Firstly, we note that when participants are exposed to alternating tone sequences

of sufficient duration, then perceptual bi-stability is found over a very wide range of the

parameter space, including conditions where stable organisation would be expected; i.e.

conditions with very large frequency differences and fast presentation rates or small



frequency differences and slow presentation rates. We have found no condition of all

those that we have tested that was stable across all participants, and no participant who

experienced stable perceptual organisation for all conditions. Furthermore, switching

occurs in all phases of the experimental session. The switching results for each

participant, condition, and position of the stimulus train within the experimental session

are illustrated in Figure 5. On average, there were 15.75 switches per condition in

experiment 1, 36.59 in experiment 2. This corresponds, on average, to one switch in

every 15.24 and 6.56 seconds, respectively; showing that perceptual switching occurs

quite often when listening to the tone sequences used to study auditory streaming. We

found no significant effect of the position of the stimulus sequence within the

experimental session for either experiment (F[15,210] = 1.44 and F[15,210] = 0.81, for

experiments 1 and 2, respectively; p>.1, both; one-way dependent ANOVA of the

number of perceptual switches with the factor Train-number [116]). These results

suggest that the observed perceptual switching does not result from learning or fatigue

within the experimental session. The number of switches per train appears to be higher

in experiment 2 than in experiment 1. This may be an effect of the parameters (f and t; the effects of these parameters will be explored below) and/or a difference between the participant groups (the amount of switching varies considerably across participants

see the middle column of Figure 5). The following sections examine perceptual

switching in more detail.



Experiment 1

Experiment 2

Figure 5. Average total number of perceptual switches (red lines) and individual participant data

(black dots) plotted against a) condition, b) participant, and c) position of the 4-minute long

stimulus train within the experimental session. Results from experiment 1 are plotted in the top

row; those from experiment 2 in the bottom row. Condition numbers are defined in Figure 4

separately for experiment 1 and 2. Note that due to the randomized order of the stimulus

conditions, trains with any set of parameters could occur in any position within the experimental

session.

Distribution of perceptual switching

The first question we address is whether perceptual switching occurs uniformly

across the parameter space or whether the degree of perceptual switching is parameter

dependent. One prediction about the distribution of perceptual switching can be made

on the basis of van Noordens perceptual boundaries [2]; the idea being that ambiguity

regarding the appropriate perceptual organisation leads to instability. This interpretation

suggests that perceptual switching will be maximal in the ambiguous region [2].



Another way to understand bi-stability is as a competition between alternative

interpretations of the auditory scene, mediated by competition between different

sequential associations or rules [14] (for a summary, see Figure 6), where:

Local rules [[4, 14]], refer to links between temporally adjacent events; these rules are fully supported by temporal proximity [15] (i.e., they are associations between

temporally adjacent sound events in the sequence, thus linking them is relatively

easy, at least at small and intermediate ts; the links are stronger for small f and weaker for large f (i.e., similar sounds form better groups than dissimilar ones; see grouping by similarity [15]).

Global rules [4, 14], refer to links between non-adjacent events; in our simple sequence for streaming, these sounds are identical in frequency (i.e. A-A or B--

-B..); therefore, temporal separation alone governs the strength of this type of

grouping, which is stronger for small t and weaker for large t. (Note that in our experiments, decreasing the SOA by a given amount decreases t by twice that amount, because tone duration was kept constant.)

Global rulet

Local rule... f (t)

Figure 6. Cartoon indicating the influence of f and t on the most prominent sequential associations which can be made in the galloping auditory streaming paradigm. t is placed in parenthesis, because with short-medium ts (SOAs), the rule-competition account of auditory streaming suggests that the effect of changes in SOA (which determines t in the current experiments) on the formation and representation of the local rule is relatively small. This is

because 1) the sounds to be connected are adjacent and 2) with relatively short SOAs

separating the sounds, the neural after-effects of the first sound are still present when the second

sound arrives.

The local versus global rule interpretation suggests that most switching will be

found for stimulus parameters which maximise competition, i.e. when both local and

global rules are strong. This occurs for small f and small t, because with small f, the



local rule becomes strong, and with small t, the global rule becomes strong. Hence, if this interpretation is correct, we should find most switching in experiment 1 at f = 4ST, SOA = 100ms. Conversely least competition should occur when the two rules are

not well balanced, and one or the other wins the competition very easily. In experiment

1 we expect to find least switching for the 4ST, 250ms and 22ST, 100ms conditions.

The conditions where the two rules are approximately equally matched but are both

relatively weak would result an intermediate amount of switching.

In order to distinguish these alternative interpretations (i.e., ambiguity vs.

rule-competition) we examined the total switching in each condition. The results below

support the hypothesis of competing sequential associations; the mean switching rate

peaks along a ridge where local and global rules may be considered to be roughly

balanced, and in experiment 1, it is highest for the condition with smallest f and smallest t (see Figure 7, left panel). From this analysis, it is clear that the region of maximum switching does not coincide with the ambiguous region of van Noorden [2].

We note that the ridge of maximum switching appears to run roughly parallel to the

temporal coherence boundary.

Figure 7. Group-mean distribution of perceptual switching in experiment 1 (left panel) and



experiment 2 (right panel). The colour scale indicates the mean number of switches across

participants accumulated throughout the tone trains, separately for each condition. Note that the

coloured surface is interpolated between the discrete experimental data points indicated by the

green dots. For clarity, the x and y axes of the two panels are differently scaled. See figure 4 for

the relation between the parameters used in experiments 1 and 2.

The results of experiment 1 (Figure 7, left panel) suggest that the distribution of

perceptual switching in auditory streaming is broadly consistent with competition

between alternative sequential associations, with stronger competition and more

switching where these associations are strongest. However, although the relationship

between perceptual switching and stimulus parameters generally supports the rule

competition hypothesis, there is a clear qualification evident in the results of

experiment 2 (Figure 7, right panel); i.e. switching rates do not increase indefinitely

with decreasing f and t. There is a non-monotonic relationship between the mean number of switches and f and t, with the maximum switching found in the region of f = 4ST and SOA = 125ms (t = 175 ms). Although the increase in switching with increasing rule strength is intuitively easy to understand, the non-monotonic relationship

between rule strength and switching requires further consideration.

The fall-off in switching rate in the region of very small f and t suggests that different factors, which become stronger in this region of the parameter space, also

affect switching. Due to the uniform 75-ms stimulus duration used in our experiments,

with SOA < 100ms there is no (or almost no) silent gap between successive A and B

tones. Thus it is possible that for small fs, triplets of three successive tones (ABA) may form a unitary event and, therefore, for segregation to occur, the system has to first

extract the components from the composite before other sequential associations can be

established. This notion is supported by the literature on temporal integration, showing

that auditory input within 150-200 ms is integrated into a single unit and processed in



many ways differently from successive sounds exceeding this period (e.g., masking,

loudness summation, detection of omissions and successive deviations [16-18]). Thus in

the case of short ts, building the global rule suffers and, as a consequence, competition between the two rules is less balanced. Therefore, the amount of switching

decreases compared with the more balanced cases.

In contrast, with large fs and short SOAs (SOA < 100ms; t < 125 ms), associations between successive identical tones may be formed directly and, perhaps,

even before associations between the tones with different frequencies. If this were the

case, one should expect that, contrary to the common assumption that integration is

always the first perceptual state [4], segregation should be reported first for short SOA

and large f conditions. Our results (see next section) confirm this expectation. One possible explanation for the immediate dominance of segregation at small t and large f is that large fs impose relatively large spatial distance between stimulus-driven neural activity in the tonotopically organized part of the afferent auditory system, thus

weakening and delaying interactions between the activities associated with the two

different tones. At the same time, short ts allow the after-effect of the more frequent tones to survive till the arrival of the next identical tone. This may enable easy direct

linking of successive identical tones. In accordance with this account, dominance of

segregation over temporal integration at large fs has been previously demonstrated [19-21], even within the temporal integration period.

Differences between switching behaviour in the first and subsequent phases

It is widely assumed that auditory streaming has a strong initial bias towards

integration. This has led to the suggestion that auditory streaming can be interpreted as a

process whereby the auditory system 'accumulates evidence' before making a final



organisational decision [4]. That is, segregation can only emerge after a gradual 'build-

up' process [22]. In this section we show that 1) it is not always the case that

participants report integration first and 2) there is no stable final sound organisation,

rather switching between alternative organisations continues throughout the stimulation.

Figure 8 (top panels) show, separately for experiment 1 and 2, the mean first

percept (termed first phase) durations averaged across all participants. The group-

averaged value was calculated by treating integrated phase durations as positive and

segregated phase durations as negative. Therefore, the colour in the diagram shows the

overall group tendency towards one or the other first reported percept. It is clear from

the figure that for small fs, integration tends to be the first percept reported, whereas for trains with parameters falling into the larger-f and short-SOA region, most participants initially linked same-frequency tones (A-A or B---B), and thus perceived a

segregated percept first (for a possible explanation, see the previous section).



Mean of All Subsequent Phases

First Phase


First Phase

Figure 8. Group-mean signed durations (in seconds) of the first (top) and the mean of all

subsequent phases (bottom) averaged across all participants for experiment 1 (left) and

experiment 2 (right). Segregated phases were assigned negative and integrated phases positive

values. Note that the colour scale is the same for all images as indicated by the bar on the right,

and that the coloured surface is interpolated between the discrete experimental data points

indicated by the green dots. For clarity, the x and y axes of the two panels are differently scaled.

See figure 4 for the relation between the parameters used in experiments 1 and 2.

A comparison between the distribution of the signed durations obtained in the

first and subsequent phases (figure 8 bottom panels) clearly shows many instances of

first-phase segregation and also that, as expected, segregation dominates over a wider

range of stimulus values after the first phase. For experiment 1, an ANOVA test of the

phase durations with the structure Phase (first vs. subsequent) SOA (4 levels, see

Design and figure 4) f (4 levels, see Design and figure 4) showed significant main effects of all three factors (Phase: F[1,14]=15.04, p


F[3,42]=30.11, p


The ANOVA of the signed phase durations for experiment 2, showed a very

similar set of results. All three main effects were significant (Phase: F[1,12]=18.64,

p


integrated percept is more common over a wider range of parameters, segregation

catches up with it later on.

Figure 8 and the related statistical analysis of the signed phase durations clearly

show differences in the distribution of integration and segregation and the effects of

SOA and f between the first and subsequent perceptual phases. However, as was noted, averaging signed phase durations does not allow one to draw conclusions

regarding the length of phase durations. These are illustrated in Figure 9 below,

separately for the first and subsequent phases.

First Phase


First Phase


Figure 9. Group-mean durations (in seconds) of the first (top) and the mean of all subsequent

phases (bottom) averaged across all participants for experiment 1 (left) and experiment 2 (right).

Note that the colour scale is the same for all images as indicated by the bar on the right, and that

the coloured surface is interpolated between the discrete experimental data points indicated by

the green dots. For clarity, the x and y axes of the two panels are differently scaled. See figure 4

for the relation between the parameters used in experiments 1 and 2.



The images show 1) a clear overall decrease of durations from the first to

subsequent phases, as well as visible effects of t and f on phase durations in the first but not on subsequent phases.

In experiment 1, an ANOVA test of the absolute phase durations (segregated

and integrated phases pooled together) with factors of Phase (first vs subsequent)

SOA (4 levels, see Design and figure 4) f (4 levels, see Design and figure 4) showed a main effect of Phase (F[1,14]=46.11, p


durations only characterized the first phase, whereas subsequent phases showed a

largely uniform distribution of phase durations. No other main effect or interaction

reached significance.

In experiment 2, an ANOVA of the same structure as above yielded significant

main effects for Phase (F[1,14]=21.34, p


affect the duration of the first phase only (see the almost homogeneous distribution of

phase durations in the second phase Figure 9 bottom). The analysis of the signed

phase durations showed that small fs promote integration as the first percept, whereas short ts promote segregation as the first percept. This result together with the pattern of interaction between Phase and f and/or SOA for absolute phase durations suggest that first-phase durations are shortest when competition between the two rules is the

highest (sort SOA, small f). More or less uniformly distributed phase durations in subsequent phases leads to more overall switching with shorter first-phase durations,

because a short first phase leaves more time for switching in subsequent phases. This

explains the pattern observed in the Distribution of perceptual switching section and

qualifies the effects and explanations described in that section as referring to the first

phase: When both rules are strong they can be discovered fast and so switching between

them starts early.

Time course of perception during the stimulus trains

Our finding that segregation becomes more likely later during the stimulus trains

appears to match the classical findings of a build-up of segregation followed by a stable

percept. However, we have already noted that segregation can be the initial percept.

Furthermore, figures 10 and 11 show that the time course of increasing probability of

perceptual segregation is not identical to the one described as build-up in the classical

literature of auditory streaming. This is because the duration of the build-up phase has

been estimated to be in the order of ca. 10 s, whereas for most parts of the parameter

space, the duration of the first percept exceeds this period. Furthermore, the probability

of perceiving integrated or segregated organisations does not settle immediately after

the first phase. Rather, slow changes can be observed, throughout the whole 4-minute



stimulus train, depending on the combination of parameters. Three different trends can

be discerned in the figures below. With parameter combinations regarded to promote

segregation (short SOA, large f), following a fast overshoot of the probability of segregation, the ratio between segregation and integration declines slowly (see, e.g.,

figure 10, SOA=100 ms, f=16 or 22ST and figure 11, SOA=75 ms, f=5 or 7ST). With parameter combinations regarded to promote integration, the probability of

segregation appears to increase slowly throughout the whole duration of the stimulus

trains (see, e.g., figure 10, SOA=150 or 200 ms, f=4ST and figure 11, f=1ST and any SOA or SOA=150 ms and any f). Finally, with most of those parameter combinations which would fall into the ambiguous region, the initial increase of segregation is

followed by a fairly stable period, in which the probability of segregation falls mostly

into a narrow range between 0.4 and 0.6 (see, e.g., figure 10, SOA=200 ms, f=10ST and figure 11, SOA=125 ms, f=5 or 7ST).

We tested the time course of perceptual organisation within the 4-minute long

stimulus trains by comparing across three time ranges, selected from the early (20-50 s,

because there are some combinations of parameters and participants, who had not yet

given their initial response at shorter latencies; see the range shade blue-grey on Figure

10), middle (120-150 s) and late (200-230 s) phase of the stimulus trains by conducting

ANOVAs of the probability of segregation with the structure: Time-range (early,

middle, late) SOA (4 levels; see Figure 4) f (4 levels; see Figure 4). In experiment 1, increasing SOAs induced a monotonically decreasing probability for perceiving the

segregated organisation (F[3,42]=30.24, p


streaming test sequences. The significant interaction between Time-range and SOA

(F[6,84]=3.65, p


Figure 10. Experiment 1: Group-average time-course of the probability of the segregated

percept within the stimulus trains as a function of SOA (panels) and f (overplotted, marked with colours). With segregation taken as the value 1 and integration as the value 0, the figure

shows the probability of reporting segregation at each time point. Note that the blue-grey bars

indicate the initial period during which not all participants had yet made their initial choice; the

data during this time is, therefore averaged over only those participants who had reacted by the

given time. The violet bars mark the periods used for the statistical analysis.



Figure 11. Experiment 2: Group-average time-course of the probability of the segregated

percept within the stimulus trains as a function of SOA (panels) and f (overplotted, marked with colours). See description in the legend of figure 10.

The fact that switching continues throughout the duration of the whole stimulus

train, is clearly shown in figures 12 and 13 in which group-averaged phase durations are

plotted for each condition as a function of time within the stimulus trains. The low

phase-duration period at the beginning of the stimulus trains (


trains). Only significant effects related to the time ranges are reported, because we have

already tested the effects of SOA and f on phase durations (see the previous section). In experiment 1, phase durations were significantly reduced in the late as compared to

the early and middle time ranges (F[2,28]=5.85, p


between Time-range and f reflect this effect combined with the previously observed effect showing that low fs induce long (integrated) first phases.

Figure 12. Experiment 1: Time course of group-average phase duration separately for the four

different SOAs, overplotting the different fs (marked with differently coloured lines). Note that the blue-grey bars indicate the initial period during which not all participants had yet made

their initial choice; the data during this time is therefore averaged over only those participants,

who had reacted by the given time. The violet bars mark the periods used for the statistical

analysis.



Fi

gure 13. Experiment 2: Time course of group-average phase duration separately for the four

different SOAs, overplotting the different fs. See description in the legend of Figure 12. In summary, we found that for the typical tone sequences used to study auditory

streaming, 1) integration is not necessarily the first percept and 2) no stable final percept

emerges. These statements appear to contradict the classical findings. The

contradictions are, however, explained by the different assumptions and methods

employed by the present study (together with the few similar previous studies [5, 6, 10,

11]) and classical explorations of auditory streaming. The assumption of a stable final

percept led most experimenters in the past to use short (typically


interpreted as more and more participants reaching the final percept, our data shows

that the group-average probability of perceiving the sounds in terms of one or another

percept is a product of averaging between perceptual states, which switch back and forth

all the time, but with shorter- and longer-term changes in the statistics of the switching

behaviour. In fact, our results argue for a different distinction in the temporal dynamics

of auditory streaming. Instead of build-up and final percept it appears that

distinguishing between the first perception (first-phase) and subsequent states, as is

often the case in the analysis of visual bi-stability, may prove a more fruitful description

of the temporal behaviour of perceptual processes. Results described in this section

argue that these two periods (and perhaps more, if changes by the end of the 4-minute

trains are not related to the length of the stimulus blocks) may show characteristically

different properties and thus may be understood by different theoretical and

computational models.

From the above views it follows that an improved description of auditory

streaming must cover the dynamic properties of this perceptual phenomenon. For

further insight we next turned to study similarities and differences between auditory

streaming and visual bi-stability. The following section will test whether auditory

streaming also shows some important properties of binocular rivalry. The findings will

be incorporated within our rule-competition theory, which already provides

explanations to many of the classical (e.g., integration is more often the first percept,

because building local rules is faster; etc.) and some of the novel findings (switching

maxima; segregation as the first percept; etc).



Effect of rule strength on perceptual organisation

Visual studies of bi-stability often employ the paradigm of binocular rivalry, in

which different images are presented to the left and right eye. Participants experience

perceptual switching between the two images, as one then the other dominates

perception for some period of time (termed the dominance duration). The well-known

second proposition of Levelt [23] states that if the contrast in the image presented to one

eye is increased while the contrast in the image to the other eye is held constant, then

the dominance durations of the fixed-contrast eye decrease while the dominance

durations of the variable-contrast eye remain approximately constant. This law was

thought to hold generally for binocular rivalry. However, it has recently been shown

that if a wider range of mean contrast levels is used (Levelt employed only relatively

high contrast levels), then adjusting the contrast in one eye can affect the dominance

durations both eyes, as illustrated in figure 14 [24].



Figure 14. Cartoon illustrating changes in dominance durations with changing image contrast in

one eye: the validity of Levelt's second proposition [23]; taken from [24]. On the left, a series of

four plots shows how the dominance durations of the variable (solid line) and fixed (dashed

line) contrast eyes changes as a function of contrast in the variable-contrast eye. The y axis

indicates the normalised dominance duration, and the x axis, the log of the contrast in the

variable contrast eye. As shown, relationships such as that in the upper left plot are consistent

with Levelts proposition [23], whereas those in the lower left plot, are inconsistent; there

appears to be a continuum from one extreme to the other (illustrated in the middle two plots).

The plots are colour coded to map onto the box at the lower right, which indicates the absolute

contrast levels under which these different relationships were observed. As can be seen, Levelts

second proposition holds when the contrast level in the fixed eye is high, but not when the

contrast level in the fixed eye is low.

If bi-stability in auditory streaming originates from competition between

competing rules, then it is possible that the role of contrast in binocular rivalry may be



similar to that of the strength of competing rules in auditory streaming. That is, higher

contrast in binocular rivalry would be analogous to having a stronger rule for a

particular organisation in auditory streaming. We have already suggested that f and t affect local and global rules differently: f affects only the strength of the local rule, whereas t only affects the global rule. However, although SOA also has a small effect on the local rule (the across-stream t decreases with decreasing SOAs, which somewhat strengthens the local rule), we will ignore this factor in the following

description. As a consequence, by changing f while keeping the SOA constant we manipulate the strength of the local rule while keeping the strength of the global rule

fixed; and, similarly, changing SOA while keeping f constant manipulates the strength of the global rule while keeping the strength of the local rule (more or less) fixed. As a

first step, for comparison between binocular rivalry and auditory streaming, figure 15

was constructed to show the group-averaged durations of the first two perceptual phases

in experiment 1.3 We chose to analyse the mean duration of the first two perceptual

phases, because our previous analyses (see above) showed that f and SOA has little effect on phase durations in later time ranges of the 4-minute trains. Mean integrated

phase durations are plotted with respect to the stimulus parameters as an interpolated

solid surface, and mean segregated phase durations are plotted as a meshed surface. By

taking slices through these surfaces at different values of f and SOA, we can compare the effects of rule strength in auditory streaming with the effects of contrast levels in

vision.

3 Because experiment 2 covered a much smaller range of parameters in f and SOA than experiment 1, only results of experiment 1 were used in this analysis.



Figure 15. Interpolated surfaces showing the group-averaged durations of the first two

perceptual phases obtained in experiment 1 as a function of f and SOA; integrated (solid surface) and segregated (mesh surface) phases. Colours code phase durations (redundant with

the vertical axis).

Figure 16 shows that, if we interpret auditory streaming as a competition

between alternative sequential associations (local versus global rules), then we can find

similar relationships between the dominance durations of integration and segregation to

those reported for eye dominance in binocular rivalry [24]. Levelts second proposition

is consistent with the changes in phase durations where the associations are strong. For

instance, at 4-ST f (strong local rule), increasing the global-rule strength by decreasing the SOA reduces the mean integrated phase durations (the fixed-rule-strength

percept), but has little effect on the segregated phase durations (the

variable-rule-strength percept; compare the top left panel of Figure 16 with panel A

of figure 14). In contrast, at 22-ST f (weak local rule), increasing the global-rule strength, by decreasing the SOA substantially increases the mean segregated phase



durations (variable-rule-strength percept) in accordance with the similar binocular

rivalry effect (compare bottom left panel of Figure 16 with panel D of figure 14). At

fs between these two extremes, results compatible with the intermediate relationships reported by Brascamp et al. are found [24].

Similarly, for the 100-ms SOA conditions (strong global rule), increasing the

local-rule strength by decreasing f decreases the mean segregated phase durations (the fixed-rule-strength percept), but does not much affect the mean integrated phase

durations (the variable-rule-strength percept; compare the top right panel of Figure 16

with panel A of figure 14). In contrast, at 250-ms SOA (weak global rule), increasing

the local-rule strength by decreasing f substantially increases the mean integrated phase durations (variable-rule percept) in accordance with the similar binocular

rivalry effect (compare bottom right panel of Figure 16 with panel D of figure 14).

Figure 16. Local- and global-rule-strength effects, colour-coded as in Figure 14: analysed for f (local rule; left column) and t (global rule; right column). Phase durations for the



fixed-strength rule are shown with dashed, those for the variable-strength rule with

continuous lines. Note that the rule strength progressively decreases from the top row to the

bottom row, and within each plot, rule strength decreases from left to right; i.e. the opposite

direction than in figure 14.

Thus another analogy can be shown between visual bi-stability and auditory

streaming. Specifically, assuming an analogy between the strength of local- vs.

global-rule representations in auditory streaming and contrast levels of images presented

to the two eyes in the binocular rivalry situation, parameters affecting the "strength" of

these representations had similar effects on the dominance phase durations. Together

with the similarities mentioned in the introduction, this analogy suggests that the

principles of computational models of binocular rivalry may be applicable in modelling

auditory streaming. However, there is a caveat to this claim. Here we considered only

the first two perceptual phases, since we showed earlier that stimulus parameters only

have significant effects on phase durations during the first perceptual phase. In contrast,

the findings in binocular rivalry [24] may relate to later phases of perceptual switching,

as Brascamp et al excluded the first minute of participant responses from their analysis.

Thus the analogy between auditory streaming and binocular rivalry may not extend to

the distinction between initial and subsequent phases in auditory streaming. This,

however, does not reduce the value of the analogy in helping to identify generic

modelling principles applicable to both modalities.

Transition Phases: Distribution of both-percept responses

The distribution of both-percept responses (i.e., when participants report

perceiving repeating integrated and segregated percepts concurrently) is not uniform

across conditions in experiment 1. (Again, we only analyze the results of experiment 1

see Footnote 3.) There is a clear tendency to report 'both' more often along a ridge in the



parameter space roughly corresponding to the region where local and global rules are

balanced. An ANOVA of the proportion of both responses in the subsequent phases

with the structure of SOA (4 levels; see Figure 4) f (4 levels; see Figure 4) showed only a significant interaction between the two factors (F[9,126]=3.06, p


Figure 17. Distribution of mean steady state both-percept responses as a proportion of post-

first-phase stimulus duration in experiment 1. Note to avoid problems arising from accidental

simultaneous button presses we included in this analysis only phases with duration exceeding

300ms.

Both types of patterns being perceived at the same time contradicts our intuitive

assumption of the exclusivity of two competing perceptual organisations in the

alternating two-tone sequence, as well as the findings of Pressnitzer and Hup [5, 6]

and much of the visual literature on bi-stability. However, it has been shown in vision

that contrary to the usual assumptions of exclusivity [12], periods of 'transition' during

which neither eye is clearly dominant can be of rather long duration; comparable with

'eye' dominance durations [24]. This is consistent with the durations of the both-

percept responses in our experiment 1. Furthermore, in another striking analogy

between visual and auditory bi-stability, the distribution of mean both-percept

durations with respect to rule strength resembles the distribution of 'transition' durations

in the binocular rivalry paradigm [24]; as illustrated in figure 18.



MaxMax

MinMin MaxMax

MinMin

Figure 18. Transition (both-percept) phases in binocular rivalry and auditory streaming. Left:

Relationship between transition durations and image contrast (governing the "strength" of the

alternative eye activations) in binocular rivalry; where Dep. refers to the dominant percept

prior to the transition, and Dest. to the percept following the transition; taken from [24].

Right: Relationship between both-percept durations and rule strength in experiment 1. Dep.

refers to the rule strength corresponding to the perceptual organisation prior to the 'both'

response, and Dest. to the rule strength corresponding to the perceptual organisation

following the 'both' response. For the perceptual state of 'integration', the rule strength was

considered to correspond to f; and for 'segregation' the rule strength was considered to correspond to SOA.

It is not clear at this stage what gives rise to the perception of both patterns

simultaneously, and intuitively we think of them as mutually exclusive. One possibility

is that there is a very rapid switching between the two alternatives but that conscious

perception is more sluggish, and unable to follow this rapid switching. Hence there is a

sort of stroboscopic effect in which both perceptual organisations are perceived as being

present although there is actually switching between them. An alternative explanation

arises from the notion of competition at several different levels in the perceptual

hierarchy. Generally, recurrent top-down connections tend to ensure that there is

consistency across the whole system, but there may be instances in which the top-down

signals cannot overcome the local competitive interactions; in this case, incompatible



winners could emerge at different levels, and this inconsistency may take some time to

resolve before a consistent organisation emerges throughout the hierarchy. In our

computational modelling studies, we have observed both of these effects and we are

planning to formulate a follow-up experiment to distinguish between these two

possibilities.

General Discussion

Interesting theoretical insights into the processes underlying streaming can be

derived from the experiments reported above. In this section, we present a conceptual

framework for auditory streaming which accounts for previous results as well as our

new findings. Here follows a summary of the novel observations obtained in the current

experiments:

1) Switching between integrated and segregated percepts continues throughout

the stimulus sequences with any combination of f and t (SOA) and in all participants. Switching is not a product of learning or fatigue. Rather, there appears to be no final

stable percept in auditory streaming.

2) Participants report simultaneous perception of integrated and segregated tone

patterns (both responses); that is, segregated and integrated percepts are not exclusive.

3) With medium-to-large fs and very short ts (SOAs) the first reported percept is segregation; that is, integration is not always the first percept. Hence,

segregation is not a result of evidence gathering.

4) When f is low/medium and t (SOA) is short the first perceptual phase is typically much shorter than first phases with other combinations of the two parameters.

The overall amount of switching is also highest in this region. The effect is not fully



monotonic, first percepts become somewhat longer and the overall amount of switching

increases with very short SOAs and at very low fs; the minimum of the first-phase duration distribution and the maximum amount of switching appearing at 100-ms SOA

within the 4-10-ST f range in the current experiments.

5) With time, the ratio between integrated and segregated percepts tends to

become balanced irrespective of the combination of f and t (SOA).

6) f and t (SOA) affect the duration of the first perceptual phase in response to the tone sequences similarly to the way in which visual contrast affects dominance

durations in binocular rivalry. It appears as if varying f affects the integrated and varying t (SOA) affects the segregated sound organisation similarly to the way in which 'contrast' is assumed to control the competitiveness of an image in the given eye.

This result supports the notion that f controls the competitiveness (strength) of the integrated, whereas t (SOA) controls that of the segregated sound organisation.

7) The distribution of the duration of both-percept responses as a function of

the strength of the preceding and the following organisation, assuming the above

described relationship between f and the integrated and t (SOA) and the segregated organisation, is similar to the distribution of the duration of transitional phases in

binocular rivalry.

The observed qualitative differences between the first and subsequent perceptual

phases have strong implications for theoretical accounts of streaming, which must

explain both the perceptual bias of the first phase, and the continuing switching

behaviour. It is important to note that our distinction here between first and subsequent

perceptual phases is not the same as the usual distinction between build-up and final



percept [4], because the first phase is generally longer than the expected build-up

duration, and we found no evidence for a stable final percept.

Since participants attended the sounds without attempting to hear them

according to one or the other organisation, the current results cannot tell us whether

switching occurs in an automatic fashion or is the product of attention. Previous

electrophysiological studies [10, 25] obtained correlates of both bottom-up and

attentional processes when participants were exposed to tone sequences similar to those

used in the current experiments. There is some indication that the initial phase of

streaming may be more sensitive to attentional effects [25-27], whereas maintaining

sound organisations does not require focused attention [28]. However, there is evidence

that the segregated organisation can develop [29] without attention, although this may

depend on the actual stimulus configuration. On the other hand, these

electrophysiological studies did not measure the perception of the critical sounds

on-line, thus possibly averaging together trials with different perceptual organisations.

Therefore, no strong conclusion can be drawn regarding the relationship between

attention and the observed differences between the first and subsequent perceptual

phases.

In the following account, we shall distinguish between the first and subsequent

perceptual phases. We shall attempt to describe them in terms of alternative rule

representations which vie for dominance, and test the viability of this explanation in the

face of our novel findings. As will become clear, a useful way to think about the two

perceptual stages may be as two distinct processes, formation of sequential associations

and coexistence between alternative interpretations.



First phase: rule formation

There are two aspects to consider: the initial choice of perceptual organisation,

and the duration of the first perceptual phase. We consider these below separately for

the case where the first phase is the integrated state and that where it is the segregated

state.

Integration is most commonly found for the parameter ranges used, except at

very short SOAs and medium-large fs, presumably since sequential associations between temporally adjacent events are more easily formed in the brain; i.e., there is a

bias towards forming AB and BA links first, and these local rules support perceptual integration. A consideration of plasticity mechanisms (Hebbian learning)

suggests that the formation of associations is perhaps only possible between consecutive

events; i.e. synaptic plasticity does not support skipping over intervening events. This

principle has two consequences for the effects of f on the formation of sequential associations. Firstly, in a tonotopically organised system, the amount of overlap

between the neural activity in response to the A and B tones is governed by the

frequency difference between the two: with small fs, the overlap in activity and thus the potential for forming these associations is large, whereas with large fs it is small. Thus the main effect of f is on the formation of local rules. Secondly, f may also affect the speed with which global-rule representations can form. This is because in

order to form higher order sequential associations ('global rules') it is necessary for

populations which respond selectively to one or other of the tones to emerge (i.e. a

process of clustering), before the within-stream sequential associations can be learned.

Only after such populations are formed can the within-stream events become

consecutive, at least as viewed from within the population cluster. If we consider a fixed

SOA, then clearly the difficulty of forming separate clusters will be determined largely



by f and the extent to which the neural activity elicited by the two tones overlaps, which in turn will determine the duration before the system can discover the

segregated organisation. This is well illustrated in Figure 16 (right panels), in which f has a clear effect on the duration of segregation for fixed SOAs. Thus the two effects of

f go hand in hand: smaller fs improve the formation of local-rule links and delay the formation of global-rule links; larger fs weaken the formation of local-rule links and allow faster formation of global-rule links.

The above description is consistent with the finding that the fission boundary is

much larger than the smallest detectable frequency difference [30], since segregation

requires not only a detectable difference, but also a clear separation of activity.

Similarly, the amplitude-modulation 'fission' boundary was shown to be much larger

than the smallest detectable amplitude-modulation difference [31]. Although initially

puzzling, the finding of a very stable fission boundary, measured in terms of the

minimum f necessary for segregation [30], can also be reconciled with perceptual switching; while the formation of separate clusters of activity may depend on some

minimum featural difference, our results show that subsequent stochastic switching is

largely independent of stimulus features. The claims of various groups, e.g. [32-34] that

adaptation in primary auditory cortex is a neural correlate of streaming is also consistent

with these ideas, since adaptation is likely to be an important aspect of the process of

clustering. Therefore, it will be linked to the duration of the first phase, and hence the

rate at which the probability of reporting streaming develops [32-34].

Integration may also be reported initially as a result of a qualitatively different

mechanism when f is small and SOA is short. When two sounds are presented within a temporal window of less than about 200 ms duration, then they tend to be processed as a



single event [19], at least within a limited range of frequency differences [20]. In our

experiments this occurs primarily for SOAs 100 ms, and f < 4ST. In these cases the integrated pattern first perceived is likely to be an ABA chunk. The auditory system

then needs to pull this chunk apart before it can discover the alternative within-stream

sequential associations. Nevertheless, perceptual switching is found here too.

Segregation as the first percept at short SOAs and larger fs can also be understood in terms of the tendency to chunk acoustic stimuli into single events if they

occur within approximately 200ms of each other [19]. When f exceeds the proposed spectral window of integration [20], consecutive identical tones may be directly linked

(i.e., with the B tone excluded, the integration window contains A-A). A direct

consequence of this explanation is that first-phase segregated percepts must be based on

the more frequent stimulus (A-A, they cannot be B---B); a prediction which we plan to

test experimentally. Neurophysiological studies suggest a reason for this phenomenon

since it has been shown that responses in cortex are greatly reduced when events follow

each other at a rate faster than about 10 Hz [35]. Therefore, at fast presentation rates, it

is possible that due to insufficient recovery time some populations only respond to one

or other of the tones from the outset. Hence the higher order sequential associations can

be discovered immediately, resulting in segregation becoming the first percept. In our

data, the minimum f for reporting first phase segregation is approximately 5 semitones at 125-ms SOA, which is not inconsistent with the spectrotemporal window of

integration reported in [20]. Thus, similarly to f, SOA also has two different effects on the formation of rule representations. Firstly it governs the formation of associations

between consecutive tones, because for such associations the neural after-effects of the

first tone must still be present at the time the next tone arrives. This primarily affects the

formation of within-stream associations since twithin > tacross. Secondly, with short



SOAs the recovery of the neuronal elements in higher (cortical) levels of the auditory

system may become an issue, and this primarily affects the local rule. Again, the two

effects go hand in hand: shortening the SOA strengthens the associations necessary for

representing the global rule more than the local rule (see Figure 16, left panel dF =

22ST), then, especially at small fs, shortening the SOA weakens the local-rule associations more than the global-rule ones (see Figure 16, left panel dF = 4ST).

One argument against this description of the SOA effects is that streaming can

be induced by the presenting a sequence of only one of the tones prior to the onset of

the ABA_ABA pattern [36]. Based on this finding Bregman et al. [3] argue against

SOA being an important determinant of streaming, suggesting instead that the

within-stream ISI (t) is the major factor. However, within the above-described framework, it is easy to understand how the induction sequence used by Rogers and

Bregman [36] led to increased reports of streaming at the onset of the ABA pattern. The

induction sequence promoted the establishment of sequential associations between tones

in one of the streams, thereby biasing the system towards activity associated with

single-frequency streams to be perceived first. In this case, the formation phase would

involve the discovery of linkages between temporally adjacent events, the local rules.

Thus, the effect of the induction sequence is similar to that of very low (


duration of this phase depends on the time taken to discover and represent alternative

sequential associations. Consistent with findings in vision that local competition is

necessary in order to tr

Denham Et-Al HEARES Submitted

Documents

Transcript of Denham Et-Al HEARES Submitted