Denham Et-Al HEARES Submitted

59
Manuscript for submission to Hearing Research 13/06/2008 Perceptual bi-stability in auditory streaming: How much do stimulus features matter? Susan L. Denham 1* , Kinga Gyimesi 2,3 , Gábor Stefanics 2,4 , and István Winkler 2,5 1 Centre for Theoretical and Computational Neuroscience, University of Plymouth, Drake Circus, Plymouth PL4 8AA, UK 2 Department of General Psychology, Institute for Psychology, Hungarian Academy of Sciences, 1394 Budapest, P.O. Box 398, Hungary 3 Department of Cognitive Science, Budapest University of Technology and Economics, 1111 Budapest, Sztoczek u. 2, Hungary 4 Department of Experimental Zoology and Neurobiology, University of Pécs, 7624 Pécs, Ifjúság st. 6, Hungary 5 Institute of Psychology, University of Szeged, 6722 Szeged, Petőfi S. sgt. 30-34, Hungary Abstract The auditory two-tone streaming paradigm has been used extensively to study processing mechanisms that underlie the decomposition of the composite auditory input into coherent sound sequences, and hence the perception of auditory objects. Here we present new results from a study of bi-stability in auditory streaming. Using relatively long (4 minute) sequences, we show that there are two fundamentally different phases in * Corresponding author; [email protected] ; tel +44 1752 232610; fax +44 1752 233349 Perceptual bi-stability in auditory streaming 1 Manuscript Click here to view linked References

description

Denham Et-Al HEARES Submitted

Transcript of Denham Et-Al HEARES Submitted

  • Manuscript for submission to Hearing Research 13/06/2008

    Perceptual bi-stability in auditory streaming: How much do stimulus features

    matter?

    Susan L. Denham1*, Kinga Gyimesi2,3, Gbor Stefanics2,4, and Istvn Winkler2,5

    1Centre for Theoretical and Computational Neuroscience, University of Plymouth,

    Drake Circus, Plymouth PL4 8AA, UK

    2Department of General Psychology, Institute for Psychology, Hungarian Academy of

    Sciences, 1394 Budapest, P.O. Box 398, Hungary

    3Department of Cognitive Science, Budapest University of Technology and Economics,

    1111 Budapest, Sztoczek u. 2, Hungary

    4Department of Experimental Zoology and Neurobiology, University of Pcs, 7624

    Pcs, Ifjsg st. 6, Hungary

    5Institute of Psychology, University of Szeged, 6722 Szeged, Petfi S. sgt. 30-34,

    Hungary

    Abstract

    The auditory two-tone streaming paradigm has been used extensively to study

    processing mechanisms that underlie the decomposition of the composite auditory input

    into coherent sound sequences, and hence the perception of auditory objects. Here we

    present new results from a study of bi-stability in auditory streaming. Using relatively

    long (4 minute) sequences, we show that there are two fundamentally different phases in

    * Corresponding author; [email protected]; tel +44 1752 232610; fax +44 1752

    233349

    Perceptual bi-stability in auditory streaming 1

    ManuscriptClick here to view linked References

  • Manuscript for submission to Hearing Research 13/06/2008

    this process. Listeners hold their first percept of the sound sequence for a relatively long

    period (first phase), after which perception stochastically switches between two or more

    alternative sound organisations, each held on average for a much shorter duration

    (second phase). The two perceptual phases also differ in that stimulus parameters

    influence perceptual behaviour to a far greater degree in the first than in the second

    phase, and during the second but not the first phase, there are significant periods when

    more than one organisation can be perceived simultaneously. Furthermore, our analysis

    reveals deep parallels between the dynamics of perceptual organisation in auditory

    streaming and binocular rivalry. We propose an account of auditory streaming in terms

    of rivalry between competing temporal associations. Based on the results of our

    experiments, we suggest that in the first perceptual phase (formation of associations),

    alternative interpretations of the auditory input are formed. In the second phase

    (coexistence of interpretations), perception stochastically switches between the

    alternatives, thus maintaining perceptual flexibility.

    Keywords

    auditory streaming, bi-stability, perceptual switching, auditory scene analysis

    Introduction

    In order to make sense of real-world environments it is necessary to identify,

    extract and organise relevant information from the wealth of incoming sensory data.

    The potential amount of information far exceeds the processing capacity of any living

    system. However, biological organisms are not idle perceivers; rather they seek out

    information about the world and the objects in it [1]. The challenge is to create

    appropriate object representations on the fly and to continually modify them in order to

    Perceptual bi-stability in auditory streaming 2

  • Manuscript for submission to Hearing Research 13/06/2008

    maintain as accurate models as possible. Thus the process of perceptual organisation is

    fundamental to effective perception.

    An important problem for auditory perception is that sound sources may emit

    discontinuous sequences of sounds, so some means for forming associations between

    discrete events, through the detection of regularities and the maintenance of temporally

    persistent representations is required. The formation of sequential associations has been

    extensively studied by means of the paradigm of auditory streaming. In the typical

    streaming experiment [2], a tone sequence of the structure ABA-ABA-ABA- is

    presented at a fast stimulus rate (A and B denote tones differing from each other in

    frequency; the - sign stands for a silent interval equal to the time interval between the

    onsets of successive tones; i.e. the stimulus onset asynchrony (SOA); see figures 1 and

    2). When all sounds are grouped together into a single coherent stream, a galloping

    rhythm is typically heard. By increasing the frequency separation (f) between the A and B tones and/or by shortening the interval between subsequent same-frequency tones

    (the within-stream inter-tone or offset-to-onset interval, t), perception of the sound sequence changes to that of two homogeneous isochronous streams; a faster paced one

    consisting of A tones and a slower paced one consisting of Bs [2].

    f

    t

    SOA

    B

    A

    Sounds Perception

    Figure 1. Cartoon of the auditory streaming paradigm. A sequence of low (A) and high (B)

    tones presented repeatedly in ABA- groups can be perceived as a single coherent stream with a

    galloping rhythm (upper right), or as two segregated streams (lower right), each with an

    isochronous rhythm

    Perceptual bi-stability in auditory streaming 3

  • Manuscript for submission to Hearing Research 13/06/2008

    In general, there is a trade-off between f and t in determining the dominant perceptual organisation. Asking participants whether they perceived the galloping tone

    pattern in sequences of the type shown on Figure 1, van Noorden [2] identified three

    separate regions of the f-t (or f-SOA, see Figure 2)1 space with different characteristic perceptual organisations (see Figure 3). With very low f's, participants could always hear the galloping rhythm showing that they could organize all tones into

    a single sound stream. With slightly larger f's and/or longer t's, participants were able to hear either two separate sound streams or a single integrated stream and to alter their

    organisational bias between the two percepts at will. Further increasing f and decreasing t resulted in participants not being able to hear the galloping rhythm, which suggests that perception of two streams became the dominant sound organisation.

    a

    c d

    Time

    Frequency

    b

    Figure 2. The four different time intervals that can be distinguished in the two tone sequences

    used in our streaming experiments, and considered by Bregman and colleagues; a) SOAwithin, b)

    ISIwithin, c) ISIacross, and d) SOAacross. Diagram derived from [3].

    Based on the results obtained in the classical studies of auditory streaming (for a

    review, see [4]), it was assumed that following an initial short build-up period

    perception of tone sequences such as the one shown in Figure 1 becomes stable, when

    1 In his experiments, van Noorden did not distinguish between t and SOA [2. van Noorden, L.P.A.S., Temporal coherence in the perception of tone sequences, in Institute for Perception research. 1975: Eindhoven.]; however, recently Bregman and colleagues [3. Bregman, A.S., et al., Effects of time intervals and tone durations on auditory stream segregation. Percept Psychophys, 2000. 62(3): p. 626-36.], in a study designed to investigate the influence of the various possible time intervals on stream segregation, found that it is in fact the within-stream offset-to-onset interval (i.e. ISIwithin or t) which is most influential in determining the likelihood of streaming (see, figure 2).

    Perceptual bi-stability in auditory streaming 4

  • Manuscript for submission to Hearing Research 13/06/2008

    parameters fall into either the segregated or the integrated area of the f-t space (see Figure 3). However, recent evidence [5, 6] suggests that the idea that the auditory

    system fixes on a single unchanging dominant percept is to some extent an artefact of

    the experimental procedures and analysis methods used in classical studies of auditory

    streaming. These obscure an important detail about streaming, namely, that it fluctuates

    randomly between the two possible organisations; a phenomenon known as perceptual

    bi-stability.2 Bi-stability in visual perception (for a review, see [7]) has been studied

    extensively since it was thought to offer the possibility of identifying the neural

    correlates of visual awareness and ultimately consciousness, by allowing perceptual

    changes to be dissociated from changes in the stimulus [8]. The finding of perceptual

    bi-stability in auditory streaming is important in that it raises questions about the

    commonality of mechanisms underlying perceptual organisation in different sensory

    modalities and it further suggests that auditory streaming, far from being a primitive and

    automatic process, may be better understood in terms of generic strategies of perceptual

    organisation; a notion that has important implications for models of auditory streaming.

    2 Actually, even with this simple stimulus configuration, several different sound organisations can be experienced; i.e. multi-stability. For the sake of simplicity we can assume that the various possible perceptual organisations can be sorted into two main categories: integrated and segregated (for a formal definition, see the Design section).

    Perceptual bi-stability in auditory streaming 5

  • Manuscript for submission to Hearing Research 13/06/2008

    Temporal coherence boundary

    boundary

    Temporalcoherence

    Integrated

    Ambiguous

    Fission boundary

    Fission boundary

    Segregated

    Figure 3. The dependence of primitive auditory streaming on frequency difference (f) and presentation rate (characterized by the SOA) found in human psychophysical experiments using

    alternating pure tones [2, 9]. Stimuli in the region of the parameter space above the 'temporal

    coherence boundary' are generally perceived as two segregated streams, and those with

    parameters in the region below the 'fission boundary' as a single coherent stream. Those falling

    in the ambiguous region can be perceived in either way, and perception can be influenced by

    top-down processes [2]. van Noorden actually reported two sets of boundaries, which are

    illustrated here for comparison and clarification; the more commonly referenced blue set are

    more appropriate for describing perceptual behaviour in response to short sequences, while the

    green set are relevant to longer sequences such as the ones we report here. The red dot

    indicates the stimulus parameters used by Pressnitzer and Hup [5, 6] the orange dot those of

    Winkler et al. [10] and the yellow dots those reported in Denham and Winkler [11] see

    descriptions of these experiments in the text.

    In the investigation of Winkler et al. [10], a single point in the parameter space

    (indicated by the orange dot in Figure 3) was tested (A = 1245 Hz, B = 931 Hz, f = 5 semitones, stimulus onset asynchrony, SOA = 100ms, tone duration = 40 ms in one and

    175 ms in another condition, sequences were close to 3 minutes in duration).

    Perceptual bi-stability in auditory streaming 6

  • Manuscript for submission to Hearing Research 13/06/2008

    Participants were instructed to continuously depress a response key, when they heard

    the galloping rhythm and to release the key when they did not. With the short tone

    duration, t was 160 ms, the result was a rather ambiguous perception of the tone sequence (galloping was heard on average for 61.8% of the time; SD 22.5); with the

    long tone duration, t was 25 ms, the result was predominantly segregated perceptual organisation (10.3% galloping; SD 20.1). Each participant experienced perceptual

    switching in both stimulus conditions.

    Pressnitzer and Hup [5, 6] tested a similar point of the parameter space

    (indicated by the red dot in Figure 3; A = 587 Hz, B = 440 Hz, f = 5 semitones, stimulus onset asynchrony, SOA = 120 ms, tone duration = 120 ms, t = 120 ms, sequences of 4 minutes). These parameters are close to the ambiguous condition of

    Winkler et al. [10] and although Pressnitzer and Hup [5, 6] used a slightly different

    experimental procedure in which participants reported their perceptions using three

    buttons; one for grouping (galloping rhythm), one for splitting (streaming) and the third

    for cases when neither galloping nor streaming was heard, their participants also

    reported considerable perceptual switching.

    In the experiment reported in [11], we were interested to test whether bi-stability

    was restricted to the ambiguous region or whether it would be found over a wider range

    of the parameter space. We used the same one-button experimental procedure as

    Winkler et al. [10] and tested the parameter combinations indicated by the yellow dots

    in Figure 3. Our results [11] showed that perceptual bi-stability was present for all

    conditions tested. Furthermore, although there was great variability in the mean

    switching rate between and within participants, they all showed some degree of

    perceptual bi-stability.

    Perceptual bi-stability in auditory streaming 7

  • Manuscript for submission to Hearing Research 13/06/2008

    In their experiments Pressnitzer and Hup [5, 6] also explored similarities

    between perceptual switching behaviour in response to the auditory galloping/streaming

    sequence (Figure 1), and a visual stimulus configuration known to be perceptually bi-

    stable. The visual stimulus consisted of overlapping drifting gratings, which could be

    perceived either as sticking or sliding surfaces; participants typically experience

    perceptual switching between these two alternatives. The results indicated that key

    characteristics of visual bi-stability are also present in the auditory case. These

    characteristics included: exclusivity, the existence of two plausible yet mutually

    exclusive alternative interpretations of the sensory input; randomness, stochastic

    switching between percepts such that successive dominance durations are uncorrelated;

    and inevitability, the finite duration of perceptual dominance; i.e. even when the

    intention is to hold onto one interpretation, a switch will always eventually occur [12].

    Visual bi-stability can also be induced by showing a different image to each eye;

    the well-known phenomenon of binocular rivalry in which participants experience

    switches in perceptual awareness between the two images even though both are

    continually present (for a recent review, see [13]). On the face of it, finding similarities

    in perceptual behaviour in response to auditory tone sequences and binocularly rivalrous

    images seems rather unlikely: the sensory modalities are completely different, the tones

    are intermittent, while the images can be constant; and the tones are heard by both ears,

    while each eye is presented with a different image. However, binocular rivalry is also

    known to display all three of the key characteristics of bi-stability listed above. Because

    binocular rivalry has been extensively investigated over many years, and is perhaps the

    best understood paradigm inducing bi-stable perception, we explored the possibility that

    some insights into the processes underlying auditory streaming might be gained from

    Perceptual bi-stability in auditory streaming 8

  • Manuscript for submission to Hearing Research 13/06/2008

    comparisons between the two phenomena; although we did not necessarily expect to

    find a very close correspondence.

    Here we report a detailed analysis of perceptual bi-stability observed in auditory

    streaming as well as comparisons between auditory streaming and binocular rivalry.

    These analyses and comparisons were motivated by our recently suggested

    interpretation of the auditory streaming phenomenon [11] (for a detailed description, see

    the Distribution of perceptual switching section) and aimed at finding new ways to

    describe auditory streaming quantitatively.

    Perceptual Experiments

    Having found that perceptual bi-stability in auditory streaming exists over a

    wide range of the feature space [11], and is not restricted to the ambiguous region [2],

    we were interested to characterise the distribution and dynamics of perceptual

    switching, since these aspects had not yet been determined for auditory streaming. The

    results which we present here were obtained in two further perceptual experiments of

    auditory streaming.

    Participants

    Thirty young healthy volunteers (16 male, 18-26 years of age, average 21.8

    years) participated in experiment 1 and 15 (7 male, 21-25 years of age, average 22.2

    years) in experiment 2. Participants received modest financial compensation for their

    participation. The study was conducted in the sound-attenuated experimental chamber

    of the Institute for Psychology, Hungarian Academy of Sciences. It was approved by the

    Ethical Committee (institutional review board) of the Institute for Psychology. After the

    aims and procedures of the study were explained to them, participants signed an

    Perceptual bi-stability in auditory streaming 9

  • Manuscript for submission to Hearing Research 13/06/2008

    informed consent form before starting the experiment. Participants were pre-selected on

    the basis of the results of clinical audiometry with the criteria that the hearing threshold

    between 250 and 6000 Hz should not be higher than 25 dB, and the difference between

    the two ears not higher than 15 dB in the same frequency range.

    Stimulus paradigm

    We decided to focus the first experiment on the medium-to-large f region of the parameter space with medium to moderately long SOAs, to see whether features of

    perceptual bi-stability would show differences between the segregated (large f, medium SOA) and the ambiguous (medium f and large f with long SOA) region of the parameter space; the second experiment was then focussed on the region of small to

    medium f and short to medium SOA. The experimental conditions used are illustrated in Figure 4 below. Participants were presented with 4-minute long trains of the ABA-

    structure, where the A and B were pure tones of 75 ms duration, including 5 ms linear

    onset and 5 ms linear offset ramps. In separate trains, f was 4, 10, 16, or 22 semitones (ST) in experiment 1, and 1, 3, 5, or 7 ST in experiment 2; SOA was, 100, 150, 200, or

    250 ms (thus t was 125, 225, 325, or 425 ms for the more frequent tones) and 75, 100, 125, or 150 ms (thus t was 75, 125, 175, or 225 ms) for experiments 1 and 2, respectively. Altogether, 4 4 = 16 different types of trains were tested in each

    experiment, separately. The frequency of the lower-pitched, more frequent tones (A

    on Figure 1) was kept constant at 400 Hz across the different stimulus conditions.

    Sounds were generated on an IBM PC computer (MEL 2.0 stimulus presentation

    software Psychology Software Tools Inc.), amplified using a custom-made sound

    mixer and amplifier, and delivered through Sennheiser HD 430 headphones at a

    Perceptual bi-stability in auditory streaming 10

  • Manuscript for submission to Hearing Research 13/06/2008

    comfortable 70-dB (SPL) intensity level. The order of the stimulus trains with different

    parameters (each train 4 minutes long) was randomized separately for each participant.

    Procedure

    In the classical studies of auditory streaming, participants were typically asked

    to report their perception after the end of each short sound sequence. Thus these

    experiments were not designed to test the temporal dynamics of streaming-related

    perceptual processes. Furthermore, participants were often asked to attempt to hear the

    sound sequences according to one or another pattern. This method was used to find

    unambiguous effects of stimulus parameters. The recent studies described in the

    Introduction employed on-line measures, asking participants to report their perceptions

    as they occurred throughout the presentation of relatively long sound sequences.

    However, in all but one [6] of the streaming experiments reported in the literature, it has

    been implicitly assumed that only two alternative sound organisations are possible; i.e.

    either integrated or segregated, each linked to a specific perceived sound pattern. With

    sequences similar to those presented in the current experiments (see Figure 1), the

    integrated organisation was expected to result in the perception of the galloping

    pattern, whereas the segregated sound organisation was expected to result in the

    simultaneous perception of a high and a low tone sequence, both with uniform (but

    different) presentation rate. These assumptions were reflected in the response choices

    and instructions given to participants. In fact, most previous experiments only asked

    participants to report when they experienced a certain pattern of sounds (e.g. the

    galloping pattern). It was then assumed that participants, who did not report hearing the

    designated pattern experienced the opposite sound organisation (in the example, this

    would be the segregated organisation). However, in a pilot study in which we asked

    Perceptual bi-stability in auditory streaming 11

  • Manuscript for submission to Hearing Research 13/06/2008

    participants to describe their different perceptions in detail, we found that they a) heard

    rhythmic patterns different from either one of the expected ones, and b) sometimes

    heard simultaneously a pattern that involved both high and low tones and a pattern

    involving only high or only low tones.

    In order to eliminate possible confusion caused by the perception of rhythms

    other than the galloping rhythm the notion of an integrated percept was generalized

    and defined for participants as hearing a repeating pattern, which contained both low

    and high tones. In turn, the notion of a segregated percept was similarly generalized and

    defined for participants as hearing some repeating pattern(s) formed either exclusively

    of high or exclusively of low tones, with the possibility that multiple repeating

    segregated patterns (i.e., A---A---A and B-B-B) may be perceived concurrently.

    Participants were to depress one response key so long as they experienced an integrated

    percept and the other key when they experienced a segregated percept. The role of the

    two keys was randomly assigned across participants. When participants heard no

    repeating tone pattern, they were instructed to release both keys. Participants were asked

    to mark their perception throughout the duration of the stimulus sequence and not to

    attempt hearing the sound according to one or another perceptual organisation. The

    experimenter made sure that participants understood the types of percepts they were

    required to report, using both auditory and visual illustrations. Furthermore, in

    experiment 1, we then divided our participants into two groups. One group received

    instructions implicitly suggesting exclusivity between the segregated and integrated

    percepts (as described above); i.e., the instructions were you may either hear a

    repeating integrated, or some repeating segregated tone patterns, or no repeating tone

    pattern . This set of instructions was similar to that employed by Pressnitzer and

    Hup [5]. The other group was explicitly told that it was possible that they may

    Perceptual bi-stability in auditory streaming 12

  • Manuscript for submission to Hearing Research 13/06/2008

    sometimes hear both types of patterns at the same time; i.e., the instructions were

    you may hear a repeating integrated, or some repeating segregated tone patterns,

    possibly even both at the same time, or no repeating pattern at all . In the case that

    they heard both types of patterns at the same time, they were instructed to keep both

    buttons depressed. However, they were also cautioned to be sure to release the button

    when they stopped hearing the corresponding pattern. In addition to the instructions,

    when analyzing the responses, we discarded all those responses, which we assumed to

    represent transitions between two percepts; i.e., all phases with duration shorter than

    300 ms. The assumption was that in such cases, participants may simply have been

    slightly inaccurate in synchronising their button presses and releases. Because the

    results obtained with the two sets of instructions in experiment 1 proved to be very

    similar in all regards except for a higher incidence of reporting two simultaneous

    percepts in the latter group, for the sake of clarity, here we only report the results

    obtained with the instructions explicitly mentioning the possibility of simultaneously

    hearing repeating integrated and segregated tone patterns. The group of participants,

    who received this set of instructions, included 15 volunteers (6 male, 18-25 years of

    age, average 20.9 years). In experiment 2, we only used this set of instructions and all

    other procedures were also identical to those of experiment 1.

    Participants sat in a comfortable reclining chair in the experimental chamber

    throughout the experimental session, holding a response button in each hand. Short 1-3

    minute breaks were inserted between consecutive stimulus trains with longer breaks,

    when the participant could move about, scheduled just before the start of the

    experimental conditions (after explaining and illustrating the possible perceptions) and

    after the 8th stimulus train. Further longer breaks were inserted into the session when

    necessary. The experiment took ca. two hours altogether. The state of the two response

    Perceptual bi-stability in auditory streaming 13

  • Manuscript for submission to Hearing Research 13/06/2008

    keys was sampled at 10 Hz (100 ms sampling time) by a NeuroScan Synamps EEG

    recording system. Response key states were then extracted from the signals and encoded

    separately for each participant and condition for further analysis.

    Fi

    gure 4. Experimental conditions for experiments 1 (magenta) and 2 (cyan) reported here.

    Conditions are numbered separately for experiments 1 and 2 in the following way. Number 1

    denotes the shortest SOA and smallest f (100 ms / 4 ST and 75 ms / 1 ST for experiments 1 and 2, respectively). Numbers increase faster through the four different fs (e.g., 2 marks 100 ms / 10 ST and 75 ms / 3 ST for experiments 1 and 2, respectively) and slower for the four

    different SOAs (e.g., 5 marks 150 ms / 4 ST and 100 ms / 1 ST for experiments 1 and 2,

    respectively).

    Results and discussion

    Firstly, we note that when participants are exposed to alternating tone sequences

    of sufficient duration, then perceptual bi-stability is found over a very wide range of the

    parameter space, including conditions where stable organisation would be expected; i.e.

    conditions with very large frequency differences and fast presentation rates or small

    Perceptual bi-stability in auditory streaming 14

  • Manuscript for submission to Hearing Research 13/06/2008

    frequency differences and slow presentation rates. We have found no condition of all

    those that we have tested that was stable across all participants, and no participant who

    experienced stable perceptual organisation for all conditions. Furthermore, switching

    occurs in all phases of the experimental session. The switching results for each

    participant, condition, and position of the stimulus train within the experimental session

    are illustrated in Figure 5. On average, there were 15.75 switches per condition in

    experiment 1, 36.59 in experiment 2. This corresponds, on average, to one switch in

    every 15.24 and 6.56 seconds, respectively; showing that perceptual switching occurs

    quite often when listening to the tone sequences used to study auditory streaming. We

    found no significant effect of the position of the stimulus sequence within the

    experimental session for either experiment (F[15,210] = 1.44 and F[15,210] = 0.81, for

    experiments 1 and 2, respectively; p>.1, both; one-way dependent ANOVA of the

    number of perceptual switches with the factor Train-number [116]). These results

    suggest that the observed perceptual switching does not result from learning or fatigue

    within the experimental session. The number of switches per train appears to be higher

    in experiment 2 than in experiment 1. This may be an effect of the parameters (f and t; the effects of these parameters will be explored below) and/or a difference between the participant groups (the amount of switching varies considerably across participants

    see the middle column of Figure 5). The following sections examine perceptual

    switching in more detail.

    Perceptual bi-stability in auditory streaming 15

  • Manuscript for submission to Hearing Research 13/06/2008

    Experiment 1

    Experiment 2

    Figure 5. Average total number of perceptual switches (red lines) and individual participant data

    (black dots) plotted against a) condition, b) participant, and c) position of the 4-minute long

    stimulus train within the experimental session. Results from experiment 1 are plotted in the top

    row; those from experiment 2 in the bottom row. Condition numbers are defined in Figure 4

    separately for experiment 1 and 2. Note that due to the randomized order of the stimulus

    conditions, trains with any set of parameters could occur in any position within the experimental

    session.

    Distribution of perceptual switching

    The first question we address is whether perceptual switching occurs uniformly

    across the parameter space or whether the degree of perceptual switching is parameter

    dependent. One prediction about the distribution of perceptual switching can be made

    on the basis of van Noordens perceptual boundaries [2]; the idea being that ambiguity

    regarding the appropriate perceptual organisation leads to instability. This interpretation

    suggests that perceptual switching will be maximal in the ambiguous region [2].

    Perceptual bi-stability in auditory streaming 16

  • Manuscript for submission to Hearing Research 13/06/2008

    Another way to understand bi-stability is as a competition between alternative

    interpretations of the auditory scene, mediated by competition between different

    sequential associations or rules [14] (for a summary, see Figure 6), where:

    Local rules [[4, 14]], refer to links between temporally adjacent events; these rules are fully supported by temporal proximity [15] (i.e., they are associations between

    temporally adjacent sound events in the sequence, thus linking them is relatively

    easy, at least at small and intermediate ts; the links are stronger for small f and weaker for large f (i.e., similar sounds form better groups than dissimilar ones; see grouping by similarity [15]).

    Global rules [4, 14], refer to links between non-adjacent events; in our simple sequence for streaming, these sounds are identical in frequency (i.e. A-A or B--

    -B..); therefore, temporal separation alone governs the strength of this type of

    grouping, which is stronger for small t and weaker for large t. (Note that in our experiments, decreasing the SOA by a given amount decreases t by twice that amount, because tone duration was kept constant.)

    Global rulet

    Local rule... f (t)

    Figure 6. Cartoon indicating the influence of f and t on the most prominent sequential associations which can be made in the galloping auditory streaming paradigm. t is placed in parenthesis, because with short-medium ts (SOAs), the rule-competition account of auditory streaming suggests that the effect of changes in SOA (which determines t in the current experiments) on the formation and representation of the local rule is relatively small. This is

    because 1) the sounds to be connected are adjacent and 2) with relatively short SOAs

    separating the sounds, the neural after-effects of the first sound are still present when the second

    sound arrives.

    The local versus global rule interpretation suggests that most switching will be

    found for stimulus parameters which maximise competition, i.e. when both local and

    global rules are strong. This occurs for small f and small t, because with small f, the

    Perceptual bi-stability in auditory streaming 17

  • Manuscript for submission to Hearing Research 13/06/2008

    local rule becomes strong, and with small t, the global rule becomes strong. Hence, if this interpretation is correct, we should find most switching in experiment 1 at f = 4ST, SOA = 100ms. Conversely least competition should occur when the two rules are

    not well balanced, and one or the other wins the competition very easily. In experiment

    1 we expect to find least switching for the 4ST, 250ms and 22ST, 100ms conditions.

    The conditions where the two rules are approximately equally matched but are both

    relatively weak would result an intermediate amount of switching.

    In order to distinguish these alternative interpretations (i.e., ambiguity vs.

    rule-competition) we examined the total switching in each condition. The results below

    support the hypothesis of competing sequential associations; the mean switching rate

    peaks along a ridge where local and global rules may be considered to be roughly

    balanced, and in experiment 1, it is highest for the condition with smallest f and smallest t (see Figure 7, left panel). From this analysis, it is clear that the region of maximum switching does not coincide with the ambiguous region of van Noorden [2].

    We note that the ridge of maximum switching appears to run roughly parallel to the

    temporal coherence boundary.

    Figure 7. Group-mean distribution of perceptual switching in experiment 1 (left panel) and

    Perceptual bi-stability in auditory streaming 18

  • Manuscript for submission to Hearing Research 13/06/2008

    experiment 2 (right panel). The colour scale indicates the mean number of switches across

    participants accumulated throughout the tone trains, separately for each condition. Note that the

    coloured surface is interpolated between the discrete experimental data points indicated by the

    green dots. For clarity, the x and y axes of the two panels are differently scaled. See figure 4 for

    the relation between the parameters used in experiments 1 and 2.

    The results of experiment 1 (Figure 7, left panel) suggest that the distribution of

    perceptual switching in auditory streaming is broadly consistent with competition

    between alternative sequential associations, with stronger competition and more

    switching where these associations are strongest. However, although the relationship

    between perceptual switching and stimulus parameters generally supports the rule

    competition hypothesis, there is a clear qualification evident in the results of

    experiment 2 (Figure 7, right panel); i.e. switching rates do not increase indefinitely

    with decreasing f and t. There is a non-monotonic relationship between the mean number of switches and f and t, with the maximum switching found in the region of f = 4ST and SOA = 125ms (t = 175 ms). Although the increase in switching with increasing rule strength is intuitively easy to understand, the non-monotonic relationship

    between rule strength and switching requires further consideration.

    The fall-off in switching rate in the region of very small f and t suggests that different factors, which become stronger in this region of the parameter space, also

    affect switching. Due to the uniform 75-ms stimulus duration used in our experiments,

    with SOA < 100ms there is no (or almost no) silent gap between successive A and B

    tones. Thus it is possible that for small fs, triplets of three successive tones (ABA) may form a unitary event and, therefore, for segregation to occur, the system has to first

    extract the components from the composite before other sequential associations can be

    established. This notion is supported by the literature on temporal integration, showing

    that auditory input within 150-200 ms is integrated into a single unit and processed in

    Perceptual bi-stability in auditory streaming 19

  • Manuscript for submission to Hearing Research 13/06/2008

    many ways differently from successive sounds exceeding this period (e.g., masking,

    loudness summation, detection of omissions and successive deviations [16-18]). Thus in

    the case of short ts, building the global rule suffers and, as a consequence, competition between the two rules is less balanced. Therefore, the amount of switching

    decreases compared with the more balanced cases.

    In contrast, with large fs and short SOAs (SOA < 100ms; t < 125 ms), associations between successive identical tones may be formed directly and, perhaps,

    even before associations between the tones with different frequencies. If this were the

    case, one should expect that, contrary to the common assumption that integration is

    always the first perceptual state [4], segregation should be reported first for short SOA

    and large f conditions. Our results (see next section) confirm this expectation. One possible explanation for the immediate dominance of segregation at small t and large f is that large fs impose relatively large spatial distance between stimulus-driven neural activity in the tonotopically organized part of the afferent auditory system, thus

    weakening and delaying interactions between the activities associated with the two

    different tones. At the same time, short ts allow the after-effect of the more frequent tones to survive till the arrival of the next identical tone. This may enable easy direct

    linking of successive identical tones. In accordance with this account, dominance of

    segregation over temporal integration at large fs has been previously demonstrated [19-21], even within the temporal integration period.

    Differences between switching behaviour in the first and subsequent phases

    It is widely assumed that auditory streaming has a strong initial bias towards

    integration. This has led to the suggestion that auditory streaming can be interpreted as a

    process whereby the auditory system 'accumulates evidence' before making a final

    Perceptual bi-stability in auditory streaming 20

  • Manuscript for submission to Hearing Research 13/06/2008

    organisational decision [4]. That is, segregation can only emerge after a gradual 'build-

    up' process [22]. In this section we show that 1) it is not always the case that

    participants report integration first and 2) there is no stable final sound organisation,

    rather switching between alternative organisations continues throughout the stimulation.

    Figure 8 (top panels) show, separately for experiment 1 and 2, the mean first

    percept (termed first phase) durations averaged across all participants. The group-

    averaged value was calculated by treating integrated phase durations as positive and

    segregated phase durations as negative. Therefore, the colour in the diagram shows the

    overall group tendency towards one or the other first reported percept. It is clear from

    the figure that for small fs, integration tends to be the first percept reported, whereas for trains with parameters falling into the larger-f and short-SOA region, most participants initially linked same-frequency tones (A-A or B---B), and thus perceived a

    segregated percept first (for a possible explanation, see the previous section).

    Perceptual bi-stability in auditory streaming 21

  • Manuscript for submission to Hearing Research 13/06/2008

    Mean of All Subsequent Phases

    First Phase

    Mean of All Subsequent Phases

    First Phase

    Figure 8. Group-mean signed durations (in seconds) of the first (top) and the mean of all

    subsequent phases (bottom) averaged across all participants for experiment 1 (left) and

    experiment 2 (right). Segregated phases were assigned negative and integrated phases positive

    values. Note that the colour scale is the same for all images as indicated by the bar on the right,

    and that the coloured surface is interpolated between the discrete experimental data points

    indicated by the green dots. For clarity, the x and y axes of the two panels are differently scaled.

    See figure 4 for the relation between the parameters used in experiments 1 and 2.

    A comparison between the distribution of the signed durations obtained in the

    first and subsequent phases (figure 8 bottom panels) clearly shows many instances of

    first-phase segregation and also that, as expected, segregation dominates over a wider

    range of stimulus values after the first phase. For experiment 1, an ANOVA test of the

    phase durations with the structure Phase (first vs. subsequent) SOA (4 levels, see

    Design and figure 4) f (4 levels, see Design and figure 4) showed significant main effects of all three factors (Phase: F[1,14]=15.04, p

  • Manuscript for submission to Hearing Research 13/06/2008

    F[3,42]=30.11, p

  • Manuscript for submission to Hearing Research 13/06/2008

    The ANOVA of the signed phase durations for experiment 2, showed a very

    similar set of results. All three main effects were significant (Phase: F[1,12]=18.64,

    p

  • Manuscript for submission to Hearing Research 13/06/2008

    integrated percept is more common over a wider range of parameters, segregation

    catches up with it later on.

    Figure 8 and the related statistical analysis of the signed phase durations clearly

    show differences in the distribution of integration and segregation and the effects of

    SOA and f between the first and subsequent perceptual phases. However, as was noted, averaging signed phase durations does not allow one to draw conclusions

    regarding the length of phase durations. These are illustrated in Figure 9 below,

    separately for the first and subsequent phases.

    First Phase

    Mean of All Subsequent Phases

    First Phase

    Mean of All Subsequent Phases

    Figure 9. Group-mean durations (in seconds) of the first (top) and the mean of all subsequent

    phases (bottom) averaged across all participants for experiment 1 (left) and experiment 2 (right).

    Note that the colour scale is the same for all images as indicated by the bar on the right, and that

    the coloured surface is interpolated between the discrete experimental data points indicated by

    the green dots. For clarity, the x and y axes of the two panels are differently scaled. See figure 4

    for the relation between the parameters used in experiments 1 and 2.

    Perceptual bi-stability in auditory streaming 25

  • Manuscript for submission to Hearing Research 13/06/2008

    The images show 1) a clear overall decrease of durations from the first to

    subsequent phases, as well as visible effects of t and f on phase durations in the first but not on subsequent phases.

    In experiment 1, an ANOVA test of the absolute phase durations (segregated

    and integrated phases pooled together) with factors of Phase (first vs subsequent)

    SOA (4 levels, see Design and figure 4) f (4 levels, see Design and figure 4) showed a main effect of Phase (F[1,14]=46.11, p

  • Manuscript for submission to Hearing Research 13/06/2008

    durations only characterized the first phase, whereas subsequent phases showed a

    largely uniform distribution of phase durations. No other main effect or interaction

    reached significance.

    In experiment 2, an ANOVA of the same structure as above yielded significant

    main effects for Phase (F[1,14]=21.34, p

  • Manuscript for submission to Hearing Research 13/06/2008

    affect the duration of the first phase only (see the almost homogeneous distribution of

    phase durations in the second phase Figure 9 bottom). The analysis of the signed

    phase durations showed that small fs promote integration as the first percept, whereas short ts promote segregation as the first percept. This result together with the pattern of interaction between Phase and f and/or SOA for absolute phase durations suggest that first-phase durations are shortest when competition between the two rules is the

    highest (sort SOA, small f). More or less uniformly distributed phase durations in subsequent phases leads to more overall switching with shorter first-phase durations,

    because a short first phase leaves more time for switching in subsequent phases. This

    explains the pattern observed in the Distribution of perceptual switching section and

    qualifies the effects and explanations described in that section as referring to the first

    phase: When both rules are strong they can be discovered fast and so switching between

    them starts early.

    Time course of perception during the stimulus trains

    Our finding that segregation becomes more likely later during the stimulus trains

    appears to match the classical findings of a build-up of segregation followed by a stable

    percept. However, we have already noted that segregation can be the initial percept.

    Furthermore, figures 10 and 11 show that the time course of increasing probability of

    perceptual segregation is not identical to the one described as build-up in the classical

    literature of auditory streaming. This is because the duration of the build-up phase has

    been estimated to be in the order of ca. 10 s, whereas for most parts of the parameter

    space, the duration of the first percept exceeds this period. Furthermore, the probability

    of perceiving integrated or segregated organisations does not settle immediately after

    the first phase. Rather, slow changes can be observed, throughout the whole 4-minute

    Perceptual bi-stability in auditory streaming 28

  • Manuscript for submission to Hearing Research 13/06/2008

    stimulus train, depending on the combination of parameters. Three different trends can

    be discerned in the figures below. With parameter combinations regarded to promote

    segregation (short SOA, large f), following a fast overshoot of the probability of segregation, the ratio between segregation and integration declines slowly (see, e.g.,

    figure 10, SOA=100 ms, f=16 or 22ST and figure 11, SOA=75 ms, f=5 or 7ST). With parameter combinations regarded to promote integration, the probability of

    segregation appears to increase slowly throughout the whole duration of the stimulus

    trains (see, e.g., figure 10, SOA=150 or 200 ms, f=4ST and figure 11, f=1ST and any SOA or SOA=150 ms and any f). Finally, with most of those parameter combinations which would fall into the ambiguous region, the initial increase of segregation is

    followed by a fairly stable period, in which the probability of segregation falls mostly

    into a narrow range between 0.4 and 0.6 (see, e.g., figure 10, SOA=200 ms, f=10ST and figure 11, SOA=125 ms, f=5 or 7ST).

    We tested the time course of perceptual organisation within the 4-minute long

    stimulus trains by comparing across three time ranges, selected from the early (20-50 s,

    because there are some combinations of parameters and participants, who had not yet

    given their initial response at shorter latencies; see the range shade blue-grey on Figure

    10), middle (120-150 s) and late (200-230 s) phase of the stimulus trains by conducting

    ANOVAs of the probability of segregation with the structure: Time-range (early,

    middle, late) SOA (4 levels; see Figure 4) f (4 levels; see Figure 4). In experiment 1, increasing SOAs induced a monotonically decreasing probability for perceiving the

    segregated organisation (F[3,42]=30.24, p

  • Manuscript for submission to Hearing Research 13/06/2008

    streaming test sequences. The significant interaction between Time-range and SOA

    (F[6,84]=3.65, p

  • Manuscript for submission to Hearing Research 13/06/2008

    Figure 10. Experiment 1: Group-average time-course of the probability of the segregated

    percept within the stimulus trains as a function of SOA (panels) and f (overplotted, marked with colours). With segregation taken as the value 1 and integration as the value 0, the figure

    shows the probability of reporting segregation at each time point. Note that the blue-grey bars

    indicate the initial period during which not all participants had yet made their initial choice; the

    data during this time is, therefore averaged over only those participants who had reacted by the

    given time. The violet bars mark the periods used for the statistical analysis.

    Perceptual bi-stability in auditory streaming 31

  • Manuscript for submission to Hearing Research 13/06/2008

    Figure 11. Experiment 2: Group-average time-course of the probability of the segregated

    percept within the stimulus trains as a function of SOA (panels) and f (overplotted, marked with colours). See description in the legend of figure 10.

    The fact that switching continues throughout the duration of the whole stimulus

    train, is clearly shown in figures 12 and 13 in which group-averaged phase durations are

    plotted for each condition as a function of time within the stimulus trains. The low

    phase-duration period at the beginning of the stimulus trains (

  • Manuscript for submission to Hearing Research 13/06/2008

    trains). Only significant effects related to the time ranges are reported, because we have

    already tested the effects of SOA and f on phase durations (see the previous section). In experiment 1, phase durations were significantly reduced in the late as compared to

    the early and middle time ranges (F[2,28]=5.85, p

  • Manuscript for submission to Hearing Research 13/06/2008

    between Time-range and f reflect this effect combined with the previously observed effect showing that low fs induce long (integrated) first phases.

    Figure 12. Experiment 1: Time course of group-average phase duration separately for the four

    different SOAs, overplotting the different fs (marked with differently coloured lines). Note that the blue-grey bars indicate the initial period during which not all participants had yet made

    their initial choice; the data during this time is therefore averaged over only those participants,

    who had reacted by the given time. The violet bars mark the periods used for the statistical

    analysis.

    Perceptual bi-stability in auditory streaming 34

  • Manuscript for submission to Hearing Research 13/06/2008

    Fi

    gure 13. Experiment 2: Time course of group-average phase duration separately for the four

    different SOAs, overplotting the different fs. See description in the legend of Figure 12. In summary, we found that for the typical tone sequences used to study auditory

    streaming, 1) integration is not necessarily the first percept and 2) no stable final percept

    emerges. These statements appear to contradict the classical findings. The

    contradictions are, however, explained by the different assumptions and methods

    employed by the present study (together with the few similar previous studies [5, 6, 10,

    11]) and classical explorations of auditory streaming. The assumption of a stable final

    percept led most experimenters in the past to use short (typically

  • Manuscript for submission to Hearing Research 13/06/2008

    interpreted as more and more participants reaching the final percept, our data shows

    that the group-average probability of perceiving the sounds in terms of one or another

    percept is a product of averaging between perceptual states, which switch back and forth

    all the time, but with shorter- and longer-term changes in the statistics of the switching

    behaviour. In fact, our results argue for a different distinction in the temporal dynamics

    of auditory streaming. Instead of build-up and final percept it appears that

    distinguishing between the first perception (first-phase) and subsequent states, as is

    often the case in the analysis of visual bi-stability, may prove a more fruitful description

    of the temporal behaviour of perceptual processes. Results described in this section

    argue that these two periods (and perhaps more, if changes by the end of the 4-minute

    trains are not related to the length of the stimulus blocks) may show characteristically

    different properties and thus may be understood by different theoretical and

    computational models.

    From the above views it follows that an improved description of auditory

    streaming must cover the dynamic properties of this perceptual phenomenon. For

    further insight we next turned to study similarities and differences between auditory

    streaming and visual bi-stability. The following section will test whether auditory

    streaming also shows some important properties of binocular rivalry. The findings will

    be incorporated within our rule-competition theory, which already provides

    explanations to many of the classical (e.g., integration is more often the first percept,

    because building local rules is faster; etc.) and some of the novel findings (switching

    maxima; segregation as the first percept; etc).

    Perceptual bi-stability in auditory streaming 36

  • Manuscript for submission to Hearing Research 13/06/2008

    Effect of rule strength on perceptual organisation

    Visual studies of bi-stability often employ the paradigm of binocular rivalry, in

    which different images are presented to the left and right eye. Participants experience

    perceptual switching between the two images, as one then the other dominates

    perception for some period of time (termed the dominance duration). The well-known

    second proposition of Levelt [23] states that if the contrast in the image presented to one

    eye is increased while the contrast in the image to the other eye is held constant, then

    the dominance durations of the fixed-contrast eye decrease while the dominance

    durations of the variable-contrast eye remain approximately constant. This law was

    thought to hold generally for binocular rivalry. However, it has recently been shown

    that if a wider range of mean contrast levels is used (Levelt employed only relatively

    high contrast levels), then adjusting the contrast in one eye can affect the dominance

    durations both eyes, as illustrated in figure 14 [24].

    Perceptual bi-stability in auditory streaming 37

  • Manuscript for submission to Hearing Research 13/06/2008

    Figure 14. Cartoon illustrating changes in dominance durations with changing image contrast in

    one eye: the validity of Levelt's second proposition [23]; taken from [24]. On the left, a series of

    four plots shows how the dominance durations of the variable (solid line) and fixed (dashed

    line) contrast eyes changes as a function of contrast in the variable-contrast eye. The y axis

    indicates the normalised dominance duration, and the x axis, the log of the contrast in the

    variable contrast eye. As shown, relationships such as that in the upper left plot are consistent

    with Levelts proposition [23], whereas those in the lower left plot, are inconsistent; there

    appears to be a continuum from one extreme to the other (illustrated in the middle two plots).

    The plots are colour coded to map onto the box at the lower right, which indicates the absolute

    contrast levels under which these different relationships were observed. As can be seen, Levelts

    second proposition holds when the contrast level in the fixed eye is high, but not when the

    contrast level in the fixed eye is low.

    If bi-stability in auditory streaming originates from competition between

    competing rules, then it is possible that the role of contrast in binocular rivalry may be

    Perceptual bi-stability in auditory streaming 38

  • Manuscript for submission to Hearing Research 13/06/2008

    similar to that of the strength of competing rules in auditory streaming. That is, higher

    contrast in binocular rivalry would be analogous to having a stronger rule for a

    particular organisation in auditory streaming. We have already suggested that f and t affect local and global rules differently: f affects only the strength of the local rule, whereas t only affects the global rule. However, although SOA also has a small effect on the local rule (the across-stream t decreases with decreasing SOAs, which somewhat strengthens the local rule), we will ignore this factor in the following

    description. As a consequence, by changing f while keeping the SOA constant we manipulate the strength of the local rule while keeping the strength of the global rule

    fixed; and, similarly, changing SOA while keeping f constant manipulates the strength of the global rule while keeping the strength of the local rule (more or less) fixed. As a

    first step, for comparison between binocular rivalry and auditory streaming, figure 15

    was constructed to show the group-averaged durations of the first two perceptual phases

    in experiment 1.3 We chose to analyse the mean duration of the first two perceptual

    phases, because our previous analyses (see above) showed that f and SOA has little effect on phase durations in later time ranges of the 4-minute trains. Mean integrated

    phase durations are plotted with respect to the stimulus parameters as an interpolated

    solid surface, and mean segregated phase durations are plotted as a meshed surface. By

    taking slices through these surfaces at different values of f and SOA, we can compare the effects of rule strength in auditory streaming with the effects of contrast levels in

    vision.

    3 Because experiment 2 covered a much smaller range of parameters in f and SOA than experiment 1, only results of experiment 1 were used in this analysis.

    Perceptual bi-stability in auditory streaming 39

  • Manuscript for submission to Hearing Research 13/06/2008

    Figure 15. Interpolated surfaces showing the group-averaged durations of the first two

    perceptual phases obtained in experiment 1 as a function of f and SOA; integrated (solid surface) and segregated (mesh surface) phases. Colours code phase durations (redundant with

    the vertical axis).

    Figure 16 shows that, if we interpret auditory streaming as a competition

    between alternative sequential associations (local versus global rules), then we can find

    similar relationships between the dominance durations of integration and segregation to

    those reported for eye dominance in binocular rivalry [24]. Levelts second proposition

    is consistent with the changes in phase durations where the associations are strong. For

    instance, at 4-ST f (strong local rule), increasing the global-rule strength by decreasing the SOA reduces the mean integrated phase durations (the fixed-rule-strength

    percept), but has little effect on the segregated phase durations (the

    variable-rule-strength percept; compare the top left panel of Figure 16 with panel A

    of figure 14). In contrast, at 22-ST f (weak local rule), increasing the global-rule strength, by decreasing the SOA substantially increases the mean segregated phase

    Perceptual bi-stability in auditory streaming 40

  • Manuscript for submission to Hearing Research 13/06/2008

    durations (variable-rule-strength percept) in accordance with the similar binocular

    rivalry effect (compare bottom left panel of Figure 16 with panel D of figure 14). At

    fs between these two extremes, results compatible with the intermediate relationships reported by Brascamp et al. are found [24].

    Similarly, for the 100-ms SOA conditions (strong global rule), increasing the

    local-rule strength by decreasing f decreases the mean segregated phase durations (the fixed-rule-strength percept), but does not much affect the mean integrated phase

    durations (the variable-rule-strength percept; compare the top right panel of Figure 16

    with panel A of figure 14). In contrast, at 250-ms SOA (weak global rule), increasing

    the local-rule strength by decreasing f substantially increases the mean integrated phase durations (variable-rule percept) in accordance with the similar binocular

    rivalry effect (compare bottom right panel of Figure 16 with panel D of figure 14).

    Figure 16. Local- and global-rule-strength effects, colour-coded as in Figure 14: analysed for f (local rule; left column) and t (global rule; right column). Phase durations for the

    Perceptual bi-stability in auditory streaming 41

  • Manuscript for submission to Hearing Research 13/06/2008

    fixed-strength rule are shown with dashed, those for the variable-strength rule with

    continuous lines. Note that the rule strength progressively decreases from the top row to the

    bottom row, and within each plot, rule strength decreases from left to right; i.e. the opposite

    direction than in figure 14.

    Thus another analogy can be shown between visual bi-stability and auditory

    streaming. Specifically, assuming an analogy between the strength of local- vs.

    global-rule representations in auditory streaming and contrast levels of images presented

    to the two eyes in the binocular rivalry situation, parameters affecting the "strength" of

    these representations had similar effects on the dominance phase durations. Together

    with the similarities mentioned in the introduction, this analogy suggests that the

    principles of computational models of binocular rivalry may be applicable in modelling

    auditory streaming. However, there is a caveat to this claim. Here we considered only

    the first two perceptual phases, since we showed earlier that stimulus parameters only

    have significant effects on phase durations during the first perceptual phase. In contrast,

    the findings in binocular rivalry [24] may relate to later phases of perceptual switching,

    as Brascamp et al excluded the first minute of participant responses from their analysis.

    Thus the analogy between auditory streaming and binocular rivalry may not extend to

    the distinction between initial and subsequent phases in auditory streaming. This,

    however, does not reduce the value of the analogy in helping to identify generic

    modelling principles applicable to both modalities.

    Transition Phases: Distribution of both-percept responses

    The distribution of both-percept responses (i.e., when participants report

    perceiving repeating integrated and segregated percepts concurrently) is not uniform

    across conditions in experiment 1. (Again, we only analyze the results of experiment 1

    see Footnote 3.) There is a clear tendency to report 'both' more often along a ridge in the

    Perceptual bi-stability in auditory streaming 42

  • Manuscript for submission to Hearing Research 13/06/2008

    parameter space roughly corresponding to the region where local and global rules are

    balanced. An ANOVA of the proportion of both responses in the subsequent phases

    with the structure of SOA (4 levels; see Figure 4) f (4 levels; see Figure 4) showed only a significant interaction between the two factors (F[9,126]=3.06, p

  • Manuscript for submission to Hearing Research 13/06/2008

    Figure 17. Distribution of mean steady state both-percept responses as a proportion of post-

    first-phase stimulus duration in experiment 1. Note to avoid problems arising from accidental

    simultaneous button presses we included in this analysis only phases with duration exceeding

    300ms.

    Both types of patterns being perceived at the same time contradicts our intuitive

    assumption of the exclusivity of two competing perceptual organisations in the

    alternating two-tone sequence, as well as the findings of Pressnitzer and Hup [5, 6]

    and much of the visual literature on bi-stability. However, it has been shown in vision

    that contrary to the usual assumptions of exclusivity [12], periods of 'transition' during

    which neither eye is clearly dominant can be of rather long duration; comparable with

    'eye' dominance durations [24]. This is consistent with the durations of the both-

    percept responses in our experiment 1. Furthermore, in another striking analogy

    between visual and auditory bi-stability, the distribution of mean both-percept

    durations with respect to rule strength resembles the distribution of 'transition' durations

    in the binocular rivalry paradigm [24]; as illustrated in figure 18.

    Perceptual bi-stability in auditory streaming 44

  • Manuscript for submission to Hearing Research 13/06/2008

    MaxMax

    MinMin MaxMax

    MinMin

    Figure 18. Transition (both-percept) phases in binocular rivalry and auditory streaming. Left:

    Relationship between transition durations and image contrast (governing the "strength" of the

    alternative eye activations) in binocular rivalry; where Dep. refers to the dominant percept

    prior to the transition, and Dest. to the percept following the transition; taken from [24].

    Right: Relationship between both-percept durations and rule strength in experiment 1. Dep.

    refers to the rule strength corresponding to the perceptual organisation prior to the 'both'

    response, and Dest. to the rule strength corresponding to the perceptual organisation

    following the 'both' response. For the perceptual state of 'integration', the rule strength was

    considered to correspond to f; and for 'segregation' the rule strength was considered to correspond to SOA.

    It is not clear at this stage what gives rise to the perception of both patterns

    simultaneously, and intuitively we think of them as mutually exclusive. One possibility

    is that there is a very rapid switching between the two alternatives but that conscious

    perception is more sluggish, and unable to follow this rapid switching. Hence there is a

    sort of stroboscopic effect in which both perceptual organisations are perceived as being

    present although there is actually switching between them. An alternative explanation

    arises from the notion of competition at several different levels in the perceptual

    hierarchy. Generally, recurrent top-down connections tend to ensure that there is

    consistency across the whole system, but there may be instances in which the top-down

    signals cannot overcome the local competitive interactions; in this case, incompatible

    Perceptual bi-stability in auditory streaming 45

  • Manuscript for submission to Hearing Research 13/06/2008

    winners could emerge at different levels, and this inconsistency may take some time to

    resolve before a consistent organisation emerges throughout the hierarchy. In our

    computational modelling studies, we have observed both of these effects and we are

    planning to formulate a follow-up experiment to distinguish between these two

    possibilities.

    General Discussion

    Interesting theoretical insights into the processes underlying streaming can be

    derived from the experiments reported above. In this section, we present a conceptual

    framework for auditory streaming which accounts for previous results as well as our

    new findings. Here follows a summary of the novel observations obtained in the current

    experiments:

    1) Switching between integrated and segregated percepts continues throughout

    the stimulus sequences with any combination of f and t (SOA) and in all participants. Switching is not a product of learning or fatigue. Rather, there appears to be no final

    stable percept in auditory streaming.

    2) Participants report simultaneous perception of integrated and segregated tone

    patterns (both responses); that is, segregated and integrated percepts are not exclusive.

    3) With medium-to-large fs and very short ts (SOAs) the first reported percept is segregation; that is, integration is not always the first percept. Hence,

    segregation is not a result of evidence gathering.

    4) When f is low/medium and t (SOA) is short the first perceptual phase is typically much shorter than first phases with other combinations of the two parameters.

    The overall amount of switching is also highest in this region. The effect is not fully

    Perceptual bi-stability in auditory streaming 46

  • Manuscript for submission to Hearing Research 13/06/2008

    monotonic, first percepts become somewhat longer and the overall amount of switching

    increases with very short SOAs and at very low fs; the minimum of the first-phase duration distribution and the maximum amount of switching appearing at 100-ms SOA

    within the 4-10-ST f range in the current experiments.

    5) With time, the ratio between integrated and segregated percepts tends to

    become balanced irrespective of the combination of f and t (SOA).

    6) f and t (SOA) affect the duration of the first perceptual phase in response to the tone sequences similarly to the way in which visual contrast affects dominance

    durations in binocular rivalry. It appears as if varying f affects the integrated and varying t (SOA) affects the segregated sound organisation similarly to the way in which 'contrast' is assumed to control the competitiveness of an image in the given eye.

    This result supports the notion that f controls the competitiveness (strength) of the integrated, whereas t (SOA) controls that of the segregated sound organisation.

    7) The distribution of the duration of both-percept responses as a function of

    the strength of the preceding and the following organisation, assuming the above

    described relationship between f and the integrated and t (SOA) and the segregated organisation, is similar to the distribution of the duration of transitional phases in

    binocular rivalry.

    The observed qualitative differences between the first and subsequent perceptual

    phases have strong implications for theoretical accounts of streaming, which must

    explain both the perceptual bias of the first phase, and the continuing switching

    behaviour. It is important to note that our distinction here between first and subsequent

    perceptual phases is not the same as the usual distinction between build-up and final

    Perceptual bi-stability in auditory streaming 47

  • Manuscript for submission to Hearing Research 13/06/2008

    percept [4], because the first phase is generally longer than the expected build-up

    duration, and we found no evidence for a stable final percept.

    Since participants attended the sounds without attempting to hear them

    according to one or the other organisation, the current results cannot tell us whether

    switching occurs in an automatic fashion or is the product of attention. Previous

    electrophysiological studies [10, 25] obtained correlates of both bottom-up and

    attentional processes when participants were exposed to tone sequences similar to those

    used in the current experiments. There is some indication that the initial phase of

    streaming may be more sensitive to attentional effects [25-27], whereas maintaining

    sound organisations does not require focused attention [28]. However, there is evidence

    that the segregated organisation can develop [29] without attention, although this may

    depend on the actual stimulus configuration. On the other hand, these

    electrophysiological studies did not measure the perception of the critical sounds

    on-line, thus possibly averaging together trials with different perceptual organisations.

    Therefore, no strong conclusion can be drawn regarding the relationship between

    attention and the observed differences between the first and subsequent perceptual

    phases.

    In the following account, we shall distinguish between the first and subsequent

    perceptual phases. We shall attempt to describe them in terms of alternative rule

    representations which vie for dominance, and test the viability of this explanation in the

    face of our novel findings. As will become clear, a useful way to think about the two

    perceptual stages may be as two distinct processes, formation of sequential associations

    and coexistence between alternative interpretations.

    Perceptual bi-stability in auditory streaming 48

  • Manuscript for submission to Hearing Research 13/06/2008

    First phase: rule formation

    There are two aspects to consider: the initial choice of perceptual organisation,

    and the duration of the first perceptual phase. We consider these below separately for

    the case where the first phase is the integrated state and that where it is the segregated

    state.

    Integration is most commonly found for the parameter ranges used, except at

    very short SOAs and medium-large fs, presumably since sequential associations between temporally adjacent events are more easily formed in the brain; i.e., there is a

    bias towards forming AB and BA links first, and these local rules support perceptual integration. A consideration of plasticity mechanisms (Hebbian learning)

    suggests that the formation of associations is perhaps only possible between consecutive

    events; i.e. synaptic plasticity does not support skipping over intervening events. This

    principle has two consequences for the effects of f on the formation of sequential associations. Firstly, in a tonotopically organised system, the amount of overlap

    between the neural activity in response to the A and B tones is governed by the

    frequency difference between the two: with small fs, the overlap in activity and thus the potential for forming these associations is large, whereas with large fs it is small. Thus the main effect of f is on the formation of local rules. Secondly, f may also affect the speed with which global-rule representations can form. This is because in

    order to form higher order sequential associations ('global rules') it is necessary for

    populations which respond selectively to one or other of the tones to emerge (i.e. a

    process of clustering), before the within-stream sequential associations can be learned.

    Only after such populations are formed can the within-stream events become

    consecutive, at least as viewed from within the population cluster. If we consider a fixed

    SOA, then clearly the difficulty of forming separate clusters will be determined largely

    Perceptual bi-stability in auditory streaming 49

  • Manuscript for submission to Hearing Research 13/06/2008

    by f and the extent to which the neural activity elicited by the two tones overlaps, which in turn will determine the duration before the system can discover the

    segregated organisation. This is well illustrated in Figure 16 (right panels), in which f has a clear effect on the duration of segregation for fixed SOAs. Thus the two effects of

    f go hand in hand: smaller fs improve the formation of local-rule links and delay the formation of global-rule links; larger fs weaken the formation of local-rule links and allow faster formation of global-rule links.

    The above description is consistent with the finding that the fission boundary is

    much larger than the smallest detectable frequency difference [30], since segregation

    requires not only a detectable difference, but also a clear separation of activity.

    Similarly, the amplitude-modulation 'fission' boundary was shown to be much larger

    than the smallest detectable amplitude-modulation difference [31]. Although initially

    puzzling, the finding of a very stable fission boundary, measured in terms of the

    minimum f necessary for segregation [30], can also be reconciled with perceptual switching; while the formation of separate clusters of activity may depend on some

    minimum featural difference, our results show that subsequent stochastic switching is

    largely independent of stimulus features. The claims of various groups, e.g. [32-34] that

    adaptation in primary auditory cortex is a neural correlate of streaming is also consistent

    with these ideas, since adaptation is likely to be an important aspect of the process of

    clustering. Therefore, it will be linked to the duration of the first phase, and hence the

    rate at which the probability of reporting streaming develops [32-34].

    Integration may also be reported initially as a result of a qualitatively different

    mechanism when f is small and SOA is short. When two sounds are presented within a temporal window of less than about 200 ms duration, then they tend to be processed as a

    Perceptual bi-stability in auditory streaming 50

  • Manuscript for submission to Hearing Research 13/06/2008

    single event [19], at least within a limited range of frequency differences [20]. In our

    experiments this occurs primarily for SOAs 100 ms, and f < 4ST. In these cases the integrated pattern first perceived is likely to be an ABA chunk. The auditory system

    then needs to pull this chunk apart before it can discover the alternative within-stream

    sequential associations. Nevertheless, perceptual switching is found here too.

    Segregation as the first percept at short SOAs and larger fs can also be understood in terms of the tendency to chunk acoustic stimuli into single events if they

    occur within approximately 200ms of each other [19]. When f exceeds the proposed spectral window of integration [20], consecutive identical tones may be directly linked

    (i.e., with the B tone excluded, the integration window contains A-A). A direct

    consequence of this explanation is that first-phase segregated percepts must be based on

    the more frequent stimulus (A-A, they cannot be B---B); a prediction which we plan to

    test experimentally. Neurophysiological studies suggest a reason for this phenomenon

    since it has been shown that responses in cortex are greatly reduced when events follow

    each other at a rate faster than about 10 Hz [35]. Therefore, at fast presentation rates, it

    is possible that due to insufficient recovery time some populations only respond to one

    or other of the tones from the outset. Hence the higher order sequential associations can

    be discovered immediately, resulting in segregation becoming the first percept. In our

    data, the minimum f for reporting first phase segregation is approximately 5 semitones at 125-ms SOA, which is not inconsistent with the spectrotemporal window of

    integration reported in [20]. Thus, similarly to f, SOA also has two different effects on the formation of rule representations. Firstly it governs the formation of associations

    between consecutive tones, because for such associations the neural after-effects of the

    first tone must still be present at the time the next tone arrives. This primarily affects the

    formation of within-stream associations since twithin > tacross. Secondly, with short

    Perceptual bi-stability in auditory streaming 51

  • Manuscript for submission to Hearing Research 13/06/2008

    SOAs the recovery of the neuronal elements in higher (cortical) levels of the auditory

    system may become an issue, and this primarily affects the local rule. Again, the two

    effects go hand in hand: shortening the SOA strengthens the associations necessary for

    representing the global rule more than the local rule (see Figure 16, left panel dF =

    22ST), then, especially at small fs, shortening the SOA weakens the local-rule associations more than the global-rule ones (see Figure 16, left panel dF = 4ST).

    One argument against this description of the SOA effects is that streaming can

    be induced by the presenting a sequence of only one of the tones prior to the onset of

    the ABA_ABA pattern [36]. Based on this finding Bregman et al. [3] argue against

    SOA being an important determinant of streaming, suggesting instead that the

    within-stream ISI (t) is the major factor. However, within the above-described framework, it is easy to understand how the induction sequence used by Rogers and

    Bregman [36] led to increased reports of streaming at the onset of the ABA pattern. The

    induction sequence promoted the establishment of sequential associations between tones

    in one of the streams, thereby biasing the system towards activity associated with

    single-frequency streams to be perceived first. In this case, the formation phase would

    involve the discovery of linkages between temporally adjacent events, the local rules.

    Thus, the effect of the induction sequence is similar to that of very low (

  • Manuscript for submission to Hearing Research 13/06/2008

    duration of this phase depends on the time taken to discover and represent alternative

    sequential associations. Consistent with findings in vision that local competition is

    necessary in order to tr