Denham Et-Al HEARES Submitted
description
Transcript of Denham Et-Al HEARES Submitted
-
Manuscript for submission to Hearing Research 13/06/2008
Perceptual bi-stability in auditory streaming: How much do stimulus features
matter?
Susan L. Denham1*, Kinga Gyimesi2,3, Gbor Stefanics2,4, and Istvn Winkler2,5
1Centre for Theoretical and Computational Neuroscience, University of Plymouth,
Drake Circus, Plymouth PL4 8AA, UK
2Department of General Psychology, Institute for Psychology, Hungarian Academy of
Sciences, 1394 Budapest, P.O. Box 398, Hungary
3Department of Cognitive Science, Budapest University of Technology and Economics,
1111 Budapest, Sztoczek u. 2, Hungary
4Department of Experimental Zoology and Neurobiology, University of Pcs, 7624
Pcs, Ifjsg st. 6, Hungary
5Institute of Psychology, University of Szeged, 6722 Szeged, Petfi S. sgt. 30-34,
Hungary
Abstract
The auditory two-tone streaming paradigm has been used extensively to study
processing mechanisms that underlie the decomposition of the composite auditory input
into coherent sound sequences, and hence the perception of auditory objects. Here we
present new results from a study of bi-stability in auditory streaming. Using relatively
long (4 minute) sequences, we show that there are two fundamentally different phases in
* Corresponding author; [email protected]; tel +44 1752 232610; fax +44 1752
233349
Perceptual bi-stability in auditory streaming 1
ManuscriptClick here to view linked References
-
Manuscript for submission to Hearing Research 13/06/2008
this process. Listeners hold their first percept of the sound sequence for a relatively long
period (first phase), after which perception stochastically switches between two or more
alternative sound organisations, each held on average for a much shorter duration
(second phase). The two perceptual phases also differ in that stimulus parameters
influence perceptual behaviour to a far greater degree in the first than in the second
phase, and during the second but not the first phase, there are significant periods when
more than one organisation can be perceived simultaneously. Furthermore, our analysis
reveals deep parallels between the dynamics of perceptual organisation in auditory
streaming and binocular rivalry. We propose an account of auditory streaming in terms
of rivalry between competing temporal associations. Based on the results of our
experiments, we suggest that in the first perceptual phase (formation of associations),
alternative interpretations of the auditory input are formed. In the second phase
(coexistence of interpretations), perception stochastically switches between the
alternatives, thus maintaining perceptual flexibility.
Keywords
auditory streaming, bi-stability, perceptual switching, auditory scene analysis
Introduction
In order to make sense of real-world environments it is necessary to identify,
extract and organise relevant information from the wealth of incoming sensory data.
The potential amount of information far exceeds the processing capacity of any living
system. However, biological organisms are not idle perceivers; rather they seek out
information about the world and the objects in it [1]. The challenge is to create
appropriate object representations on the fly and to continually modify them in order to
Perceptual bi-stability in auditory streaming 2
-
Manuscript for submission to Hearing Research 13/06/2008
maintain as accurate models as possible. Thus the process of perceptual organisation is
fundamental to effective perception.
An important problem for auditory perception is that sound sources may emit
discontinuous sequences of sounds, so some means for forming associations between
discrete events, through the detection of regularities and the maintenance of temporally
persistent representations is required. The formation of sequential associations has been
extensively studied by means of the paradigm of auditory streaming. In the typical
streaming experiment [2], a tone sequence of the structure ABA-ABA-ABA- is
presented at a fast stimulus rate (A and B denote tones differing from each other in
frequency; the - sign stands for a silent interval equal to the time interval between the
onsets of successive tones; i.e. the stimulus onset asynchrony (SOA); see figures 1 and
2). When all sounds are grouped together into a single coherent stream, a galloping
rhythm is typically heard. By increasing the frequency separation (f) between the A and B tones and/or by shortening the interval between subsequent same-frequency tones
(the within-stream inter-tone or offset-to-onset interval, t), perception of the sound sequence changes to that of two homogeneous isochronous streams; a faster paced one
consisting of A tones and a slower paced one consisting of Bs [2].
f
t
SOA
B
A
Sounds Perception
Figure 1. Cartoon of the auditory streaming paradigm. A sequence of low (A) and high (B)
tones presented repeatedly in ABA- groups can be perceived as a single coherent stream with a
galloping rhythm (upper right), or as two segregated streams (lower right), each with an
isochronous rhythm
Perceptual bi-stability in auditory streaming 3
-
Manuscript for submission to Hearing Research 13/06/2008
In general, there is a trade-off between f and t in determining the dominant perceptual organisation. Asking participants whether they perceived the galloping tone
pattern in sequences of the type shown on Figure 1, van Noorden [2] identified three
separate regions of the f-t (or f-SOA, see Figure 2)1 space with different characteristic perceptual organisations (see Figure 3). With very low f's, participants could always hear the galloping rhythm showing that they could organize all tones into
a single sound stream. With slightly larger f's and/or longer t's, participants were able to hear either two separate sound streams or a single integrated stream and to alter their
organisational bias between the two percepts at will. Further increasing f and decreasing t resulted in participants not being able to hear the galloping rhythm, which suggests that perception of two streams became the dominant sound organisation.
a
c d
Time
Frequency
b
Figure 2. The four different time intervals that can be distinguished in the two tone sequences
used in our streaming experiments, and considered by Bregman and colleagues; a) SOAwithin, b)
ISIwithin, c) ISIacross, and d) SOAacross. Diagram derived from [3].
Based on the results obtained in the classical studies of auditory streaming (for a
review, see [4]), it was assumed that following an initial short build-up period
perception of tone sequences such as the one shown in Figure 1 becomes stable, when
1 In his experiments, van Noorden did not distinguish between t and SOA [2. van Noorden, L.P.A.S., Temporal coherence in the perception of tone sequences, in Institute for Perception research. 1975: Eindhoven.]; however, recently Bregman and colleagues [3. Bregman, A.S., et al., Effects of time intervals and tone durations on auditory stream segregation. Percept Psychophys, 2000. 62(3): p. 626-36.], in a study designed to investigate the influence of the various possible time intervals on stream segregation, found that it is in fact the within-stream offset-to-onset interval (i.e. ISIwithin or t) which is most influential in determining the likelihood of streaming (see, figure 2).
Perceptual bi-stability in auditory streaming 4
-
Manuscript for submission to Hearing Research 13/06/2008
parameters fall into either the segregated or the integrated area of the f-t space (see Figure 3). However, recent evidence [5, 6] suggests that the idea that the auditory
system fixes on a single unchanging dominant percept is to some extent an artefact of
the experimental procedures and analysis methods used in classical studies of auditory
streaming. These obscure an important detail about streaming, namely, that it fluctuates
randomly between the two possible organisations; a phenomenon known as perceptual
bi-stability.2 Bi-stability in visual perception (for a review, see [7]) has been studied
extensively since it was thought to offer the possibility of identifying the neural
correlates of visual awareness and ultimately consciousness, by allowing perceptual
changes to be dissociated from changes in the stimulus [8]. The finding of perceptual
bi-stability in auditory streaming is important in that it raises questions about the
commonality of mechanisms underlying perceptual organisation in different sensory
modalities and it further suggests that auditory streaming, far from being a primitive and
automatic process, may be better understood in terms of generic strategies of perceptual
organisation; a notion that has important implications for models of auditory streaming.
2 Actually, even with this simple stimulus configuration, several different sound organisations can be experienced; i.e. multi-stability. For the sake of simplicity we can assume that the various possible perceptual organisations can be sorted into two main categories: integrated and segregated (for a formal definition, see the Design section).
Perceptual bi-stability in auditory streaming 5
-
Manuscript for submission to Hearing Research 13/06/2008
Temporal coherence boundary
boundary
Temporalcoherence
Integrated
Ambiguous
Fission boundary
Fission boundary
Segregated
Figure 3. The dependence of primitive auditory streaming on frequency difference (f) and presentation rate (characterized by the SOA) found in human psychophysical experiments using
alternating pure tones [2, 9]. Stimuli in the region of the parameter space above the 'temporal
coherence boundary' are generally perceived as two segregated streams, and those with
parameters in the region below the 'fission boundary' as a single coherent stream. Those falling
in the ambiguous region can be perceived in either way, and perception can be influenced by
top-down processes [2]. van Noorden actually reported two sets of boundaries, which are
illustrated here for comparison and clarification; the more commonly referenced blue set are
more appropriate for describing perceptual behaviour in response to short sequences, while the
green set are relevant to longer sequences such as the ones we report here. The red dot
indicates the stimulus parameters used by Pressnitzer and Hup [5, 6] the orange dot those of
Winkler et al. [10] and the yellow dots those reported in Denham and Winkler [11] see
descriptions of these experiments in the text.
In the investigation of Winkler et al. [10], a single point in the parameter space
(indicated by the orange dot in Figure 3) was tested (A = 1245 Hz, B = 931 Hz, f = 5 semitones, stimulus onset asynchrony, SOA = 100ms, tone duration = 40 ms in one and
175 ms in another condition, sequences were close to 3 minutes in duration).
Perceptual bi-stability in auditory streaming 6
-
Manuscript for submission to Hearing Research 13/06/2008
Participants were instructed to continuously depress a response key, when they heard
the galloping rhythm and to release the key when they did not. With the short tone
duration, t was 160 ms, the result was a rather ambiguous perception of the tone sequence (galloping was heard on average for 61.8% of the time; SD 22.5); with the
long tone duration, t was 25 ms, the result was predominantly segregated perceptual organisation (10.3% galloping; SD 20.1). Each participant experienced perceptual
switching in both stimulus conditions.
Pressnitzer and Hup [5, 6] tested a similar point of the parameter space
(indicated by the red dot in Figure 3; A = 587 Hz, B = 440 Hz, f = 5 semitones, stimulus onset asynchrony, SOA = 120 ms, tone duration = 120 ms, t = 120 ms, sequences of 4 minutes). These parameters are close to the ambiguous condition of
Winkler et al. [10] and although Pressnitzer and Hup [5, 6] used a slightly different
experimental procedure in which participants reported their perceptions using three
buttons; one for grouping (galloping rhythm), one for splitting (streaming) and the third
for cases when neither galloping nor streaming was heard, their participants also
reported considerable perceptual switching.
In the experiment reported in [11], we were interested to test whether bi-stability
was restricted to the ambiguous region or whether it would be found over a wider range
of the parameter space. We used the same one-button experimental procedure as
Winkler et al. [10] and tested the parameter combinations indicated by the yellow dots
in Figure 3. Our results [11] showed that perceptual bi-stability was present for all
conditions tested. Furthermore, although there was great variability in the mean
switching rate between and within participants, they all showed some degree of
perceptual bi-stability.
Perceptual bi-stability in auditory streaming 7
-
Manuscript for submission to Hearing Research 13/06/2008
In their experiments Pressnitzer and Hup [5, 6] also explored similarities
between perceptual switching behaviour in response to the auditory galloping/streaming
sequence (Figure 1), and a visual stimulus configuration known to be perceptually bi-
stable. The visual stimulus consisted of overlapping drifting gratings, which could be
perceived either as sticking or sliding surfaces; participants typically experience
perceptual switching between these two alternatives. The results indicated that key
characteristics of visual bi-stability are also present in the auditory case. These
characteristics included: exclusivity, the existence of two plausible yet mutually
exclusive alternative interpretations of the sensory input; randomness, stochastic
switching between percepts such that successive dominance durations are uncorrelated;
and inevitability, the finite duration of perceptual dominance; i.e. even when the
intention is to hold onto one interpretation, a switch will always eventually occur [12].
Visual bi-stability can also be induced by showing a different image to each eye;
the well-known phenomenon of binocular rivalry in which participants experience
switches in perceptual awareness between the two images even though both are
continually present (for a recent review, see [13]). On the face of it, finding similarities
in perceptual behaviour in response to auditory tone sequences and binocularly rivalrous
images seems rather unlikely: the sensory modalities are completely different, the tones
are intermittent, while the images can be constant; and the tones are heard by both ears,
while each eye is presented with a different image. However, binocular rivalry is also
known to display all three of the key characteristics of bi-stability listed above. Because
binocular rivalry has been extensively investigated over many years, and is perhaps the
best understood paradigm inducing bi-stable perception, we explored the possibility that
some insights into the processes underlying auditory streaming might be gained from
Perceptual bi-stability in auditory streaming 8
-
Manuscript for submission to Hearing Research 13/06/2008
comparisons between the two phenomena; although we did not necessarily expect to
find a very close correspondence.
Here we report a detailed analysis of perceptual bi-stability observed in auditory
streaming as well as comparisons between auditory streaming and binocular rivalry.
These analyses and comparisons were motivated by our recently suggested
interpretation of the auditory streaming phenomenon [11] (for a detailed description, see
the Distribution of perceptual switching section) and aimed at finding new ways to
describe auditory streaming quantitatively.
Perceptual Experiments
Having found that perceptual bi-stability in auditory streaming exists over a
wide range of the feature space [11], and is not restricted to the ambiguous region [2],
we were interested to characterise the distribution and dynamics of perceptual
switching, since these aspects had not yet been determined for auditory streaming. The
results which we present here were obtained in two further perceptual experiments of
auditory streaming.
Participants
Thirty young healthy volunteers (16 male, 18-26 years of age, average 21.8
years) participated in experiment 1 and 15 (7 male, 21-25 years of age, average 22.2
years) in experiment 2. Participants received modest financial compensation for their
participation. The study was conducted in the sound-attenuated experimental chamber
of the Institute for Psychology, Hungarian Academy of Sciences. It was approved by the
Ethical Committee (institutional review board) of the Institute for Psychology. After the
aims and procedures of the study were explained to them, participants signed an
Perceptual bi-stability in auditory streaming 9
-
Manuscript for submission to Hearing Research 13/06/2008
informed consent form before starting the experiment. Participants were pre-selected on
the basis of the results of clinical audiometry with the criteria that the hearing threshold
between 250 and 6000 Hz should not be higher than 25 dB, and the difference between
the two ears not higher than 15 dB in the same frequency range.
Stimulus paradigm
We decided to focus the first experiment on the medium-to-large f region of the parameter space with medium to moderately long SOAs, to see whether features of
perceptual bi-stability would show differences between the segregated (large f, medium SOA) and the ambiguous (medium f and large f with long SOA) region of the parameter space; the second experiment was then focussed on the region of small to
medium f and short to medium SOA. The experimental conditions used are illustrated in Figure 4 below. Participants were presented with 4-minute long trains of the ABA-
structure, where the A and B were pure tones of 75 ms duration, including 5 ms linear
onset and 5 ms linear offset ramps. In separate trains, f was 4, 10, 16, or 22 semitones (ST) in experiment 1, and 1, 3, 5, or 7 ST in experiment 2; SOA was, 100, 150, 200, or
250 ms (thus t was 125, 225, 325, or 425 ms for the more frequent tones) and 75, 100, 125, or 150 ms (thus t was 75, 125, 175, or 225 ms) for experiments 1 and 2, respectively. Altogether, 4 4 = 16 different types of trains were tested in each
experiment, separately. The frequency of the lower-pitched, more frequent tones (A
on Figure 1) was kept constant at 400 Hz across the different stimulus conditions.
Sounds were generated on an IBM PC computer (MEL 2.0 stimulus presentation
software Psychology Software Tools Inc.), amplified using a custom-made sound
mixer and amplifier, and delivered through Sennheiser HD 430 headphones at a
Perceptual bi-stability in auditory streaming 10
-
Manuscript for submission to Hearing Research 13/06/2008
comfortable 70-dB (SPL) intensity level. The order of the stimulus trains with different
parameters (each train 4 minutes long) was randomized separately for each participant.
Procedure
In the classical studies of auditory streaming, participants were typically asked
to report their perception after the end of each short sound sequence. Thus these
experiments were not designed to test the temporal dynamics of streaming-related
perceptual processes. Furthermore, participants were often asked to attempt to hear the
sound sequences according to one or another pattern. This method was used to find
unambiguous effects of stimulus parameters. The recent studies described in the
Introduction employed on-line measures, asking participants to report their perceptions
as they occurred throughout the presentation of relatively long sound sequences.
However, in all but one [6] of the streaming experiments reported in the literature, it has
been implicitly assumed that only two alternative sound organisations are possible; i.e.
either integrated or segregated, each linked to a specific perceived sound pattern. With
sequences similar to those presented in the current experiments (see Figure 1), the
integrated organisation was expected to result in the perception of the galloping
pattern, whereas the segregated sound organisation was expected to result in the
simultaneous perception of a high and a low tone sequence, both with uniform (but
different) presentation rate. These assumptions were reflected in the response choices
and instructions given to participants. In fact, most previous experiments only asked
participants to report when they experienced a certain pattern of sounds (e.g. the
galloping pattern). It was then assumed that participants, who did not report hearing the
designated pattern experienced the opposite sound organisation (in the example, this
would be the segregated organisation). However, in a pilot study in which we asked
Perceptual bi-stability in auditory streaming 11
-
Manuscript for submission to Hearing Research 13/06/2008
participants to describe their different perceptions in detail, we found that they a) heard
rhythmic patterns different from either one of the expected ones, and b) sometimes
heard simultaneously a pattern that involved both high and low tones and a pattern
involving only high or only low tones.
In order to eliminate possible confusion caused by the perception of rhythms
other than the galloping rhythm the notion of an integrated percept was generalized
and defined for participants as hearing a repeating pattern, which contained both low
and high tones. In turn, the notion of a segregated percept was similarly generalized and
defined for participants as hearing some repeating pattern(s) formed either exclusively
of high or exclusively of low tones, with the possibility that multiple repeating
segregated patterns (i.e., A---A---A and B-B-B) may be perceived concurrently.
Participants were to depress one response key so long as they experienced an integrated
percept and the other key when they experienced a segregated percept. The role of the
two keys was randomly assigned across participants. When participants heard no
repeating tone pattern, they were instructed to release both keys. Participants were asked
to mark their perception throughout the duration of the stimulus sequence and not to
attempt hearing the sound according to one or another perceptual organisation. The
experimenter made sure that participants understood the types of percepts they were
required to report, using both auditory and visual illustrations. Furthermore, in
experiment 1, we then divided our participants into two groups. One group received
instructions implicitly suggesting exclusivity between the segregated and integrated
percepts (as described above); i.e., the instructions were you may either hear a
repeating integrated, or some repeating segregated tone patterns, or no repeating tone
pattern . This set of instructions was similar to that employed by Pressnitzer and
Hup [5]. The other group was explicitly told that it was possible that they may
Perceptual bi-stability in auditory streaming 12
-
Manuscript for submission to Hearing Research 13/06/2008
sometimes hear both types of patterns at the same time; i.e., the instructions were
you may hear a repeating integrated, or some repeating segregated tone patterns,
possibly even both at the same time, or no repeating pattern at all . In the case that
they heard both types of patterns at the same time, they were instructed to keep both
buttons depressed. However, they were also cautioned to be sure to release the button
when they stopped hearing the corresponding pattern. In addition to the instructions,
when analyzing the responses, we discarded all those responses, which we assumed to
represent transitions between two percepts; i.e., all phases with duration shorter than
300 ms. The assumption was that in such cases, participants may simply have been
slightly inaccurate in synchronising their button presses and releases. Because the
results obtained with the two sets of instructions in experiment 1 proved to be very
similar in all regards except for a higher incidence of reporting two simultaneous
percepts in the latter group, for the sake of clarity, here we only report the results
obtained with the instructions explicitly mentioning the possibility of simultaneously
hearing repeating integrated and segregated tone patterns. The group of participants,
who received this set of instructions, included 15 volunteers (6 male, 18-25 years of
age, average 20.9 years). In experiment 2, we only used this set of instructions and all
other procedures were also identical to those of experiment 1.
Participants sat in a comfortable reclining chair in the experimental chamber
throughout the experimental session, holding a response button in each hand. Short 1-3
minute breaks were inserted between consecutive stimulus trains with longer breaks,
when the participant could move about, scheduled just before the start of the
experimental conditions (after explaining and illustrating the possible perceptions) and
after the 8th stimulus train. Further longer breaks were inserted into the session when
necessary. The experiment took ca. two hours altogether. The state of the two response
Perceptual bi-stability in auditory streaming 13
-
Manuscript for submission to Hearing Research 13/06/2008
keys was sampled at 10 Hz (100 ms sampling time) by a NeuroScan Synamps EEG
recording system. Response key states were then extracted from the signals and encoded
separately for each participant and condition for further analysis.
Fi
gure 4. Experimental conditions for experiments 1 (magenta) and 2 (cyan) reported here.
Conditions are numbered separately for experiments 1 and 2 in the following way. Number 1
denotes the shortest SOA and smallest f (100 ms / 4 ST and 75 ms / 1 ST for experiments 1 and 2, respectively). Numbers increase faster through the four different fs (e.g., 2 marks 100 ms / 10 ST and 75 ms / 3 ST for experiments 1 and 2, respectively) and slower for the four
different SOAs (e.g., 5 marks 150 ms / 4 ST and 100 ms / 1 ST for experiments 1 and 2,
respectively).
Results and discussion
Firstly, we note that when participants are exposed to alternating tone sequences
of sufficient duration, then perceptual bi-stability is found over a very wide range of the
parameter space, including conditions where stable organisation would be expected; i.e.
conditions with very large frequency differences and fast presentation rates or small
Perceptual bi-stability in auditory streaming 14
-
Manuscript for submission to Hearing Research 13/06/2008
frequency differences and slow presentation rates. We have found no condition of all
those that we have tested that was stable across all participants, and no participant who
experienced stable perceptual organisation for all conditions. Furthermore, switching
occurs in all phases of the experimental session. The switching results for each
participant, condition, and position of the stimulus train within the experimental session
are illustrated in Figure 5. On average, there were 15.75 switches per condition in
experiment 1, 36.59 in experiment 2. This corresponds, on average, to one switch in
every 15.24 and 6.56 seconds, respectively; showing that perceptual switching occurs
quite often when listening to the tone sequences used to study auditory streaming. We
found no significant effect of the position of the stimulus sequence within the
experimental session for either experiment (F[15,210] = 1.44 and F[15,210] = 0.81, for
experiments 1 and 2, respectively; p>.1, both; one-way dependent ANOVA of the
number of perceptual switches with the factor Train-number [116]). These results
suggest that the observed perceptual switching does not result from learning or fatigue
within the experimental session. The number of switches per train appears to be higher
in experiment 2 than in experiment 1. This may be an effect of the parameters (f and t; the effects of these parameters will be explored below) and/or a difference between the participant groups (the amount of switching varies considerably across participants
see the middle column of Figure 5). The following sections examine perceptual
switching in more detail.
Perceptual bi-stability in auditory streaming 15
-
Manuscript for submission to Hearing Research 13/06/2008
Experiment 1
Experiment 2
Figure 5. Average total number of perceptual switches (red lines) and individual participant data
(black dots) plotted against a) condition, b) participant, and c) position of the 4-minute long
stimulus train within the experimental session. Results from experiment 1 are plotted in the top
row; those from experiment 2 in the bottom row. Condition numbers are defined in Figure 4
separately for experiment 1 and 2. Note that due to the randomized order of the stimulus
conditions, trains with any set of parameters could occur in any position within the experimental
session.
Distribution of perceptual switching
The first question we address is whether perceptual switching occurs uniformly
across the parameter space or whether the degree of perceptual switching is parameter
dependent. One prediction about the distribution of perceptual switching can be made
on the basis of van Noordens perceptual boundaries [2]; the idea being that ambiguity
regarding the appropriate perceptual organisation leads to instability. This interpretation
suggests that perceptual switching will be maximal in the ambiguous region [2].
Perceptual bi-stability in auditory streaming 16
-
Manuscript for submission to Hearing Research 13/06/2008
Another way to understand bi-stability is as a competition between alternative
interpretations of the auditory scene, mediated by competition between different
sequential associations or rules [14] (for a summary, see Figure 6), where:
Local rules [[4, 14]], refer to links between temporally adjacent events; these rules are fully supported by temporal proximity [15] (i.e., they are associations between
temporally adjacent sound events in the sequence, thus linking them is relatively
easy, at least at small and intermediate ts; the links are stronger for small f and weaker for large f (i.e., similar sounds form better groups than dissimilar ones; see grouping by similarity [15]).
Global rules [4, 14], refer to links between non-adjacent events; in our simple sequence for streaming, these sounds are identical in frequency (i.e. A-A or B--
-B..); therefore, temporal separation alone governs the strength of this type of
grouping, which is stronger for small t and weaker for large t. (Note that in our experiments, decreasing the SOA by a given amount decreases t by twice that amount, because tone duration was kept constant.)
Global rulet
Local rule... f (t)
Figure 6. Cartoon indicating the influence of f and t on the most prominent sequential associations which can be made in the galloping auditory streaming paradigm. t is placed in parenthesis, because with short-medium ts (SOAs), the rule-competition account of auditory streaming suggests that the effect of changes in SOA (which determines t in the current experiments) on the formation and representation of the local rule is relatively small. This is
because 1) the sounds to be connected are adjacent and 2) with relatively short SOAs
separating the sounds, the neural after-effects of the first sound are still present when the second
sound arrives.
The local versus global rule interpretation suggests that most switching will be
found for stimulus parameters which maximise competition, i.e. when both local and
global rules are strong. This occurs for small f and small t, because with small f, the
Perceptual bi-stability in auditory streaming 17
-
Manuscript for submission to Hearing Research 13/06/2008
local rule becomes strong, and with small t, the global rule becomes strong. Hence, if this interpretation is correct, we should find most switching in experiment 1 at f = 4ST, SOA = 100ms. Conversely least competition should occur when the two rules are
not well balanced, and one or the other wins the competition very easily. In experiment
1 we expect to find least switching for the 4ST, 250ms and 22ST, 100ms conditions.
The conditions where the two rules are approximately equally matched but are both
relatively weak would result an intermediate amount of switching.
In order to distinguish these alternative interpretations (i.e., ambiguity vs.
rule-competition) we examined the total switching in each condition. The results below
support the hypothesis of competing sequential associations; the mean switching rate
peaks along a ridge where local and global rules may be considered to be roughly
balanced, and in experiment 1, it is highest for the condition with smallest f and smallest t (see Figure 7, left panel). From this analysis, it is clear that the region of maximum switching does not coincide with the ambiguous region of van Noorden [2].
We note that the ridge of maximum switching appears to run roughly parallel to the
temporal coherence boundary.
Figure 7. Group-mean distribution of perceptual switching in experiment 1 (left panel) and
Perceptual bi-stability in auditory streaming 18
-
Manuscript for submission to Hearing Research 13/06/2008
experiment 2 (right panel). The colour scale indicates the mean number of switches across
participants accumulated throughout the tone trains, separately for each condition. Note that the
coloured surface is interpolated between the discrete experimental data points indicated by the
green dots. For clarity, the x and y axes of the two panels are differently scaled. See figure 4 for
the relation between the parameters used in experiments 1 and 2.
The results of experiment 1 (Figure 7, left panel) suggest that the distribution of
perceptual switching in auditory streaming is broadly consistent with competition
between alternative sequential associations, with stronger competition and more
switching where these associations are strongest. However, although the relationship
between perceptual switching and stimulus parameters generally supports the rule
competition hypothesis, there is a clear qualification evident in the results of
experiment 2 (Figure 7, right panel); i.e. switching rates do not increase indefinitely
with decreasing f and t. There is a non-monotonic relationship between the mean number of switches and f and t, with the maximum switching found in the region of f = 4ST and SOA = 125ms (t = 175 ms). Although the increase in switching with increasing rule strength is intuitively easy to understand, the non-monotonic relationship
between rule strength and switching requires further consideration.
The fall-off in switching rate in the region of very small f and t suggests that different factors, which become stronger in this region of the parameter space, also
affect switching. Due to the uniform 75-ms stimulus duration used in our experiments,
with SOA < 100ms there is no (or almost no) silent gap between successive A and B
tones. Thus it is possible that for small fs, triplets of three successive tones (ABA) may form a unitary event and, therefore, for segregation to occur, the system has to first
extract the components from the composite before other sequential associations can be
established. This notion is supported by the literature on temporal integration, showing
that auditory input within 150-200 ms is integrated into a single unit and processed in
Perceptual bi-stability in auditory streaming 19
-
Manuscript for submission to Hearing Research 13/06/2008
many ways differently from successive sounds exceeding this period (e.g., masking,
loudness summation, detection of omissions and successive deviations [16-18]). Thus in
the case of short ts, building the global rule suffers and, as a consequence, competition between the two rules is less balanced. Therefore, the amount of switching
decreases compared with the more balanced cases.
In contrast, with large fs and short SOAs (SOA < 100ms; t < 125 ms), associations between successive identical tones may be formed directly and, perhaps,
even before associations between the tones with different frequencies. If this were the
case, one should expect that, contrary to the common assumption that integration is
always the first perceptual state [4], segregation should be reported first for short SOA
and large f conditions. Our results (see next section) confirm this expectation. One possible explanation for the immediate dominance of segregation at small t and large f is that large fs impose relatively large spatial distance between stimulus-driven neural activity in the tonotopically organized part of the afferent auditory system, thus
weakening and delaying interactions between the activities associated with the two
different tones. At the same time, short ts allow the after-effect of the more frequent tones to survive till the arrival of the next identical tone. This may enable easy direct
linking of successive identical tones. In accordance with this account, dominance of
segregation over temporal integration at large fs has been previously demonstrated [19-21], even within the temporal integration period.
Differences between switching behaviour in the first and subsequent phases
It is widely assumed that auditory streaming has a strong initial bias towards
integration. This has led to the suggestion that auditory streaming can be interpreted as a
process whereby the auditory system 'accumulates evidence' before making a final
Perceptual bi-stability in auditory streaming 20
-
Manuscript for submission to Hearing Research 13/06/2008
organisational decision [4]. That is, segregation can only emerge after a gradual 'build-
up' process [22]. In this section we show that 1) it is not always the case that
participants report integration first and 2) there is no stable final sound organisation,
rather switching between alternative organisations continues throughout the stimulation.
Figure 8 (top panels) show, separately for experiment 1 and 2, the mean first
percept (termed first phase) durations averaged across all participants. The group-
averaged value was calculated by treating integrated phase durations as positive and
segregated phase durations as negative. Therefore, the colour in the diagram shows the
overall group tendency towards one or the other first reported percept. It is clear from
the figure that for small fs, integration tends to be the first percept reported, whereas for trains with parameters falling into the larger-f and short-SOA region, most participants initially linked same-frequency tones (A-A or B---B), and thus perceived a
segregated percept first (for a possible explanation, see the previous section).
Perceptual bi-stability in auditory streaming 21
-
Manuscript for submission to Hearing Research 13/06/2008
Mean of All Subsequent Phases
First Phase
Mean of All Subsequent Phases
First Phase
Figure 8. Group-mean signed durations (in seconds) of the first (top) and the mean of all
subsequent phases (bottom) averaged across all participants for experiment 1 (left) and
experiment 2 (right). Segregated phases were assigned negative and integrated phases positive
values. Note that the colour scale is the same for all images as indicated by the bar on the right,
and that the coloured surface is interpolated between the discrete experimental data points
indicated by the green dots. For clarity, the x and y axes of the two panels are differently scaled.
See figure 4 for the relation between the parameters used in experiments 1 and 2.
A comparison between the distribution of the signed durations obtained in the
first and subsequent phases (figure 8 bottom panels) clearly shows many instances of
first-phase segregation and also that, as expected, segregation dominates over a wider
range of stimulus values after the first phase. For experiment 1, an ANOVA test of the
phase durations with the structure Phase (first vs. subsequent) SOA (4 levels, see
Design and figure 4) f (4 levels, see Design and figure 4) showed significant main effects of all three factors (Phase: F[1,14]=15.04, p
-
Manuscript for submission to Hearing Research 13/06/2008
F[3,42]=30.11, p
-
Manuscript for submission to Hearing Research 13/06/2008
The ANOVA of the signed phase durations for experiment 2, showed a very
similar set of results. All three main effects were significant (Phase: F[1,12]=18.64,
p
-
Manuscript for submission to Hearing Research 13/06/2008
integrated percept is more common over a wider range of parameters, segregation
catches up with it later on.
Figure 8 and the related statistical analysis of the signed phase durations clearly
show differences in the distribution of integration and segregation and the effects of
SOA and f between the first and subsequent perceptual phases. However, as was noted, averaging signed phase durations does not allow one to draw conclusions
regarding the length of phase durations. These are illustrated in Figure 9 below,
separately for the first and subsequent phases.
First Phase
Mean of All Subsequent Phases
First Phase
Mean of All Subsequent Phases
Figure 9. Group-mean durations (in seconds) of the first (top) and the mean of all subsequent
phases (bottom) averaged across all participants for experiment 1 (left) and experiment 2 (right).
Note that the colour scale is the same for all images as indicated by the bar on the right, and that
the coloured surface is interpolated between the discrete experimental data points indicated by
the green dots. For clarity, the x and y axes of the two panels are differently scaled. See figure 4
for the relation between the parameters used in experiments 1 and 2.
Perceptual bi-stability in auditory streaming 25
-
Manuscript for submission to Hearing Research 13/06/2008
The images show 1) a clear overall decrease of durations from the first to
subsequent phases, as well as visible effects of t and f on phase durations in the first but not on subsequent phases.
In experiment 1, an ANOVA test of the absolute phase durations (segregated
and integrated phases pooled together) with factors of Phase (first vs subsequent)
SOA (4 levels, see Design and figure 4) f (4 levels, see Design and figure 4) showed a main effect of Phase (F[1,14]=46.11, p
-
Manuscript for submission to Hearing Research 13/06/2008
durations only characterized the first phase, whereas subsequent phases showed a
largely uniform distribution of phase durations. No other main effect or interaction
reached significance.
In experiment 2, an ANOVA of the same structure as above yielded significant
main effects for Phase (F[1,14]=21.34, p
-
Manuscript for submission to Hearing Research 13/06/2008
affect the duration of the first phase only (see the almost homogeneous distribution of
phase durations in the second phase Figure 9 bottom). The analysis of the signed
phase durations showed that small fs promote integration as the first percept, whereas short ts promote segregation as the first percept. This result together with the pattern of interaction between Phase and f and/or SOA for absolute phase durations suggest that first-phase durations are shortest when competition between the two rules is the
highest (sort SOA, small f). More or less uniformly distributed phase durations in subsequent phases leads to more overall switching with shorter first-phase durations,
because a short first phase leaves more time for switching in subsequent phases. This
explains the pattern observed in the Distribution of perceptual switching section and
qualifies the effects and explanations described in that section as referring to the first
phase: When both rules are strong they can be discovered fast and so switching between
them starts early.
Time course of perception during the stimulus trains
Our finding that segregation becomes more likely later during the stimulus trains
appears to match the classical findings of a build-up of segregation followed by a stable
percept. However, we have already noted that segregation can be the initial percept.
Furthermore, figures 10 and 11 show that the time course of increasing probability of
perceptual segregation is not identical to the one described as build-up in the classical
literature of auditory streaming. This is because the duration of the build-up phase has
been estimated to be in the order of ca. 10 s, whereas for most parts of the parameter
space, the duration of the first percept exceeds this period. Furthermore, the probability
of perceiving integrated or segregated organisations does not settle immediately after
the first phase. Rather, slow changes can be observed, throughout the whole 4-minute
Perceptual bi-stability in auditory streaming 28
-
Manuscript for submission to Hearing Research 13/06/2008
stimulus train, depending on the combination of parameters. Three different trends can
be discerned in the figures below. With parameter combinations regarded to promote
segregation (short SOA, large f), following a fast overshoot of the probability of segregation, the ratio between segregation and integration declines slowly (see, e.g.,
figure 10, SOA=100 ms, f=16 or 22ST and figure 11, SOA=75 ms, f=5 or 7ST). With parameter combinations regarded to promote integration, the probability of
segregation appears to increase slowly throughout the whole duration of the stimulus
trains (see, e.g., figure 10, SOA=150 or 200 ms, f=4ST and figure 11, f=1ST and any SOA or SOA=150 ms and any f). Finally, with most of those parameter combinations which would fall into the ambiguous region, the initial increase of segregation is
followed by a fairly stable period, in which the probability of segregation falls mostly
into a narrow range between 0.4 and 0.6 (see, e.g., figure 10, SOA=200 ms, f=10ST and figure 11, SOA=125 ms, f=5 or 7ST).
We tested the time course of perceptual organisation within the 4-minute long
stimulus trains by comparing across three time ranges, selected from the early (20-50 s,
because there are some combinations of parameters and participants, who had not yet
given their initial response at shorter latencies; see the range shade blue-grey on Figure
10), middle (120-150 s) and late (200-230 s) phase of the stimulus trains by conducting
ANOVAs of the probability of segregation with the structure: Time-range (early,
middle, late) SOA (4 levels; see Figure 4) f (4 levels; see Figure 4). In experiment 1, increasing SOAs induced a monotonically decreasing probability for perceiving the
segregated organisation (F[3,42]=30.24, p
-
Manuscript for submission to Hearing Research 13/06/2008
streaming test sequences. The significant interaction between Time-range and SOA
(F[6,84]=3.65, p
-
Manuscript for submission to Hearing Research 13/06/2008
Figure 10. Experiment 1: Group-average time-course of the probability of the segregated
percept within the stimulus trains as a function of SOA (panels) and f (overplotted, marked with colours). With segregation taken as the value 1 and integration as the value 0, the figure
shows the probability of reporting segregation at each time point. Note that the blue-grey bars
indicate the initial period during which not all participants had yet made their initial choice; the
data during this time is, therefore averaged over only those participants who had reacted by the
given time. The violet bars mark the periods used for the statistical analysis.
Perceptual bi-stability in auditory streaming 31
-
Manuscript for submission to Hearing Research 13/06/2008
Figure 11. Experiment 2: Group-average time-course of the probability of the segregated
percept within the stimulus trains as a function of SOA (panels) and f (overplotted, marked with colours). See description in the legend of figure 10.
The fact that switching continues throughout the duration of the whole stimulus
train, is clearly shown in figures 12 and 13 in which group-averaged phase durations are
plotted for each condition as a function of time within the stimulus trains. The low
phase-duration period at the beginning of the stimulus trains (
-
Manuscript for submission to Hearing Research 13/06/2008
trains). Only significant effects related to the time ranges are reported, because we have
already tested the effects of SOA and f on phase durations (see the previous section). In experiment 1, phase durations were significantly reduced in the late as compared to
the early and middle time ranges (F[2,28]=5.85, p
-
Manuscript for submission to Hearing Research 13/06/2008
between Time-range and f reflect this effect combined with the previously observed effect showing that low fs induce long (integrated) first phases.
Figure 12. Experiment 1: Time course of group-average phase duration separately for the four
different SOAs, overplotting the different fs (marked with differently coloured lines). Note that the blue-grey bars indicate the initial period during which not all participants had yet made
their initial choice; the data during this time is therefore averaged over only those participants,
who had reacted by the given time. The violet bars mark the periods used for the statistical
analysis.
Perceptual bi-stability in auditory streaming 34
-
Manuscript for submission to Hearing Research 13/06/2008
Fi
gure 13. Experiment 2: Time course of group-average phase duration separately for the four
different SOAs, overplotting the different fs. See description in the legend of Figure 12. In summary, we found that for the typical tone sequences used to study auditory
streaming, 1) integration is not necessarily the first percept and 2) no stable final percept
emerges. These statements appear to contradict the classical findings. The
contradictions are, however, explained by the different assumptions and methods
employed by the present study (together with the few similar previous studies [5, 6, 10,
11]) and classical explorations of auditory streaming. The assumption of a stable final
percept led most experimenters in the past to use short (typically
-
Manuscript for submission to Hearing Research 13/06/2008
interpreted as more and more participants reaching the final percept, our data shows
that the group-average probability of perceiving the sounds in terms of one or another
percept is a product of averaging between perceptual states, which switch back and forth
all the time, but with shorter- and longer-term changes in the statistics of the switching
behaviour. In fact, our results argue for a different distinction in the temporal dynamics
of auditory streaming. Instead of build-up and final percept it appears that
distinguishing between the first perception (first-phase) and subsequent states, as is
often the case in the analysis of visual bi-stability, may prove a more fruitful description
of the temporal behaviour of perceptual processes. Results described in this section
argue that these two periods (and perhaps more, if changes by the end of the 4-minute
trains are not related to the length of the stimulus blocks) may show characteristically
different properties and thus may be understood by different theoretical and
computational models.
From the above views it follows that an improved description of auditory
streaming must cover the dynamic properties of this perceptual phenomenon. For
further insight we next turned to study similarities and differences between auditory
streaming and visual bi-stability. The following section will test whether auditory
streaming also shows some important properties of binocular rivalry. The findings will
be incorporated within our rule-competition theory, which already provides
explanations to many of the classical (e.g., integration is more often the first percept,
because building local rules is faster; etc.) and some of the novel findings (switching
maxima; segregation as the first percept; etc).
Perceptual bi-stability in auditory streaming 36
-
Manuscript for submission to Hearing Research 13/06/2008
Effect of rule strength on perceptual organisation
Visual studies of bi-stability often employ the paradigm of binocular rivalry, in
which different images are presented to the left and right eye. Participants experience
perceptual switching between the two images, as one then the other dominates
perception for some period of time (termed the dominance duration). The well-known
second proposition of Levelt [23] states that if the contrast in the image presented to one
eye is increased while the contrast in the image to the other eye is held constant, then
the dominance durations of the fixed-contrast eye decrease while the dominance
durations of the variable-contrast eye remain approximately constant. This law was
thought to hold generally for binocular rivalry. However, it has recently been shown
that if a wider range of mean contrast levels is used (Levelt employed only relatively
high contrast levels), then adjusting the contrast in one eye can affect the dominance
durations both eyes, as illustrated in figure 14 [24].
Perceptual bi-stability in auditory streaming 37
-
Manuscript for submission to Hearing Research 13/06/2008
Figure 14. Cartoon illustrating changes in dominance durations with changing image contrast in
one eye: the validity of Levelt's second proposition [23]; taken from [24]. On the left, a series of
four plots shows how the dominance durations of the variable (solid line) and fixed (dashed
line) contrast eyes changes as a function of contrast in the variable-contrast eye. The y axis
indicates the normalised dominance duration, and the x axis, the log of the contrast in the
variable contrast eye. As shown, relationships such as that in the upper left plot are consistent
with Levelts proposition [23], whereas those in the lower left plot, are inconsistent; there
appears to be a continuum from one extreme to the other (illustrated in the middle two plots).
The plots are colour coded to map onto the box at the lower right, which indicates the absolute
contrast levels under which these different relationships were observed. As can be seen, Levelts
second proposition holds when the contrast level in the fixed eye is high, but not when the
contrast level in the fixed eye is low.
If bi-stability in auditory streaming originates from competition between
competing rules, then it is possible that the role of contrast in binocular rivalry may be
Perceptual bi-stability in auditory streaming 38
-
Manuscript for submission to Hearing Research 13/06/2008
similar to that of the strength of competing rules in auditory streaming. That is, higher
contrast in binocular rivalry would be analogous to having a stronger rule for a
particular organisation in auditory streaming. We have already suggested that f and t affect local and global rules differently: f affects only the strength of the local rule, whereas t only affects the global rule. However, although SOA also has a small effect on the local rule (the across-stream t decreases with decreasing SOAs, which somewhat strengthens the local rule), we will ignore this factor in the following
description. As a consequence, by changing f while keeping the SOA constant we manipulate the strength of the local rule while keeping the strength of the global rule
fixed; and, similarly, changing SOA while keeping f constant manipulates the strength of the global rule while keeping the strength of the local rule (more or less) fixed. As a
first step, for comparison between binocular rivalry and auditory streaming, figure 15
was constructed to show the group-averaged durations of the first two perceptual phases
in experiment 1.3 We chose to analyse the mean duration of the first two perceptual
phases, because our previous analyses (see above) showed that f and SOA has little effect on phase durations in later time ranges of the 4-minute trains. Mean integrated
phase durations are plotted with respect to the stimulus parameters as an interpolated
solid surface, and mean segregated phase durations are plotted as a meshed surface. By
taking slices through these surfaces at different values of f and SOA, we can compare the effects of rule strength in auditory streaming with the effects of contrast levels in
vision.
3 Because experiment 2 covered a much smaller range of parameters in f and SOA than experiment 1, only results of experiment 1 were used in this analysis.
Perceptual bi-stability in auditory streaming 39
-
Manuscript for submission to Hearing Research 13/06/2008
Figure 15. Interpolated surfaces showing the group-averaged durations of the first two
perceptual phases obtained in experiment 1 as a function of f and SOA; integrated (solid surface) and segregated (mesh surface) phases. Colours code phase durations (redundant with
the vertical axis).
Figure 16 shows that, if we interpret auditory streaming as a competition
between alternative sequential associations (local versus global rules), then we can find
similar relationships between the dominance durations of integration and segregation to
those reported for eye dominance in binocular rivalry [24]. Levelts second proposition
is consistent with the changes in phase durations where the associations are strong. For
instance, at 4-ST f (strong local rule), increasing the global-rule strength by decreasing the SOA reduces the mean integrated phase durations (the fixed-rule-strength
percept), but has little effect on the segregated phase durations (the
variable-rule-strength percept; compare the top left panel of Figure 16 with panel A
of figure 14). In contrast, at 22-ST f (weak local rule), increasing the global-rule strength, by decreasing the SOA substantially increases the mean segregated phase
Perceptual bi-stability in auditory streaming 40
-
Manuscript for submission to Hearing Research 13/06/2008
durations (variable-rule-strength percept) in accordance with the similar binocular
rivalry effect (compare bottom left panel of Figure 16 with panel D of figure 14). At
fs between these two extremes, results compatible with the intermediate relationships reported by Brascamp et al. are found [24].
Similarly, for the 100-ms SOA conditions (strong global rule), increasing the
local-rule strength by decreasing f decreases the mean segregated phase durations (the fixed-rule-strength percept), but does not much affect the mean integrated phase
durations (the variable-rule-strength percept; compare the top right panel of Figure 16
with panel A of figure 14). In contrast, at 250-ms SOA (weak global rule), increasing
the local-rule strength by decreasing f substantially increases the mean integrated phase durations (variable-rule percept) in accordance with the similar binocular
rivalry effect (compare bottom right panel of Figure 16 with panel D of figure 14).
Figure 16. Local- and global-rule-strength effects, colour-coded as in Figure 14: analysed for f (local rule; left column) and t (global rule; right column). Phase durations for the
Perceptual bi-stability in auditory streaming 41
-
Manuscript for submission to Hearing Research 13/06/2008
fixed-strength rule are shown with dashed, those for the variable-strength rule with
continuous lines. Note that the rule strength progressively decreases from the top row to the
bottom row, and within each plot, rule strength decreases from left to right; i.e. the opposite
direction than in figure 14.
Thus another analogy can be shown between visual bi-stability and auditory
streaming. Specifically, assuming an analogy between the strength of local- vs.
global-rule representations in auditory streaming and contrast levels of images presented
to the two eyes in the binocular rivalry situation, parameters affecting the "strength" of
these representations had similar effects on the dominance phase durations. Together
with the similarities mentioned in the introduction, this analogy suggests that the
principles of computational models of binocular rivalry may be applicable in modelling
auditory streaming. However, there is a caveat to this claim. Here we considered only
the first two perceptual phases, since we showed earlier that stimulus parameters only
have significant effects on phase durations during the first perceptual phase. In contrast,
the findings in binocular rivalry [24] may relate to later phases of perceptual switching,
as Brascamp et al excluded the first minute of participant responses from their analysis.
Thus the analogy between auditory streaming and binocular rivalry may not extend to
the distinction between initial and subsequent phases in auditory streaming. This,
however, does not reduce the value of the analogy in helping to identify generic
modelling principles applicable to both modalities.
Transition Phases: Distribution of both-percept responses
The distribution of both-percept responses (i.e., when participants report
perceiving repeating integrated and segregated percepts concurrently) is not uniform
across conditions in experiment 1. (Again, we only analyze the results of experiment 1
see Footnote 3.) There is a clear tendency to report 'both' more often along a ridge in the
Perceptual bi-stability in auditory streaming 42
-
Manuscript for submission to Hearing Research 13/06/2008
parameter space roughly corresponding to the region where local and global rules are
balanced. An ANOVA of the proportion of both responses in the subsequent phases
with the structure of SOA (4 levels; see Figure 4) f (4 levels; see Figure 4) showed only a significant interaction between the two factors (F[9,126]=3.06, p
-
Manuscript for submission to Hearing Research 13/06/2008
Figure 17. Distribution of mean steady state both-percept responses as a proportion of post-
first-phase stimulus duration in experiment 1. Note to avoid problems arising from accidental
simultaneous button presses we included in this analysis only phases with duration exceeding
300ms.
Both types of patterns being perceived at the same time contradicts our intuitive
assumption of the exclusivity of two competing perceptual organisations in the
alternating two-tone sequence, as well as the findings of Pressnitzer and Hup [5, 6]
and much of the visual literature on bi-stability. However, it has been shown in vision
that contrary to the usual assumptions of exclusivity [12], periods of 'transition' during
which neither eye is clearly dominant can be of rather long duration; comparable with
'eye' dominance durations [24]. This is consistent with the durations of the both-
percept responses in our experiment 1. Furthermore, in another striking analogy
between visual and auditory bi-stability, the distribution of mean both-percept
durations with respect to rule strength resembles the distribution of 'transition' durations
in the binocular rivalry paradigm [24]; as illustrated in figure 18.
Perceptual bi-stability in auditory streaming 44
-
Manuscript for submission to Hearing Research 13/06/2008
MaxMax
MinMin MaxMax
MinMin
Figure 18. Transition (both-percept) phases in binocular rivalry and auditory streaming. Left:
Relationship between transition durations and image contrast (governing the "strength" of the
alternative eye activations) in binocular rivalry; where Dep. refers to the dominant percept
prior to the transition, and Dest. to the percept following the transition; taken from [24].
Right: Relationship between both-percept durations and rule strength in experiment 1. Dep.
refers to the rule strength corresponding to the perceptual organisation prior to the 'both'
response, and Dest. to the rule strength corresponding to the perceptual organisation
following the 'both' response. For the perceptual state of 'integration', the rule strength was
considered to correspond to f; and for 'segregation' the rule strength was considered to correspond to SOA.
It is not clear at this stage what gives rise to the perception of both patterns
simultaneously, and intuitively we think of them as mutually exclusive. One possibility
is that there is a very rapid switching between the two alternatives but that conscious
perception is more sluggish, and unable to follow this rapid switching. Hence there is a
sort of stroboscopic effect in which both perceptual organisations are perceived as being
present although there is actually switching between them. An alternative explanation
arises from the notion of competition at several different levels in the perceptual
hierarchy. Generally, recurrent top-down connections tend to ensure that there is
consistency across the whole system, but there may be instances in which the top-down
signals cannot overcome the local competitive interactions; in this case, incompatible
Perceptual bi-stability in auditory streaming 45
-
Manuscript for submission to Hearing Research 13/06/2008
winners could emerge at different levels, and this inconsistency may take some time to
resolve before a consistent organisation emerges throughout the hierarchy. In our
computational modelling studies, we have observed both of these effects and we are
planning to formulate a follow-up experiment to distinguish between these two
possibilities.
General Discussion
Interesting theoretical insights into the processes underlying streaming can be
derived from the experiments reported above. In this section, we present a conceptual
framework for auditory streaming which accounts for previous results as well as our
new findings. Here follows a summary of the novel observations obtained in the current
experiments:
1) Switching between integrated and segregated percepts continues throughout
the stimulus sequences with any combination of f and t (SOA) and in all participants. Switching is not a product of learning or fatigue. Rather, there appears to be no final
stable percept in auditory streaming.
2) Participants report simultaneous perception of integrated and segregated tone
patterns (both responses); that is, segregated and integrated percepts are not exclusive.
3) With medium-to-large fs and very short ts (SOAs) the first reported percept is segregation; that is, integration is not always the first percept. Hence,
segregation is not a result of evidence gathering.
4) When f is low/medium and t (SOA) is short the first perceptual phase is typically much shorter than first phases with other combinations of the two parameters.
The overall amount of switching is also highest in this region. The effect is not fully
Perceptual bi-stability in auditory streaming 46
-
Manuscript for submission to Hearing Research 13/06/2008
monotonic, first percepts become somewhat longer and the overall amount of switching
increases with very short SOAs and at very low fs; the minimum of the first-phase duration distribution and the maximum amount of switching appearing at 100-ms SOA
within the 4-10-ST f range in the current experiments.
5) With time, the ratio between integrated and segregated percepts tends to
become balanced irrespective of the combination of f and t (SOA).
6) f and t (SOA) affect the duration of the first perceptual phase in response to the tone sequences similarly to the way in which visual contrast affects dominance
durations in binocular rivalry. It appears as if varying f affects the integrated and varying t (SOA) affects the segregated sound organisation similarly to the way in which 'contrast' is assumed to control the competitiveness of an image in the given eye.
This result supports the notion that f controls the competitiveness (strength) of the integrated, whereas t (SOA) controls that of the segregated sound organisation.
7) The distribution of the duration of both-percept responses as a function of
the strength of the preceding and the following organisation, assuming the above
described relationship between f and the integrated and t (SOA) and the segregated organisation, is similar to the distribution of the duration of transitional phases in
binocular rivalry.
The observed qualitative differences between the first and subsequent perceptual
phases have strong implications for theoretical accounts of streaming, which must
explain both the perceptual bias of the first phase, and the continuing switching
behaviour. It is important to note that our distinction here between first and subsequent
perceptual phases is not the same as the usual distinction between build-up and final
Perceptual bi-stability in auditory streaming 47
-
Manuscript for submission to Hearing Research 13/06/2008
percept [4], because the first phase is generally longer than the expected build-up
duration, and we found no evidence for a stable final percept.
Since participants attended the sounds without attempting to hear them
according to one or the other organisation, the current results cannot tell us whether
switching occurs in an automatic fashion or is the product of attention. Previous
electrophysiological studies [10, 25] obtained correlates of both bottom-up and
attentional processes when participants were exposed to tone sequences similar to those
used in the current experiments. There is some indication that the initial phase of
streaming may be more sensitive to attentional effects [25-27], whereas maintaining
sound organisations does not require focused attention [28]. However, there is evidence
that the segregated organisation can develop [29] without attention, although this may
depend on the actual stimulus configuration. On the other hand, these
electrophysiological studies did not measure the perception of the critical sounds
on-line, thus possibly averaging together trials with different perceptual organisations.
Therefore, no strong conclusion can be drawn regarding the relationship between
attention and the observed differences between the first and subsequent perceptual
phases.
In the following account, we shall distinguish between the first and subsequent
perceptual phases. We shall attempt to describe them in terms of alternative rule
representations which vie for dominance, and test the viability of this explanation in the
face of our novel findings. As will become clear, a useful way to think about the two
perceptual stages may be as two distinct processes, formation of sequential associations
and coexistence between alternative interpretations.
Perceptual bi-stability in auditory streaming 48
-
Manuscript for submission to Hearing Research 13/06/2008
First phase: rule formation
There are two aspects to consider: the initial choice of perceptual organisation,
and the duration of the first perceptual phase. We consider these below separately for
the case where the first phase is the integrated state and that where it is the segregated
state.
Integration is most commonly found for the parameter ranges used, except at
very short SOAs and medium-large fs, presumably since sequential associations between temporally adjacent events are more easily formed in the brain; i.e., there is a
bias towards forming AB and BA links first, and these local rules support perceptual integration. A consideration of plasticity mechanisms (Hebbian learning)
suggests that the formation of associations is perhaps only possible between consecutive
events; i.e. synaptic plasticity does not support skipping over intervening events. This
principle has two consequences for the effects of f on the formation of sequential associations. Firstly, in a tonotopically organised system, the amount of overlap
between the neural activity in response to the A and B tones is governed by the
frequency difference between the two: with small fs, the overlap in activity and thus the potential for forming these associations is large, whereas with large fs it is small. Thus the main effect of f is on the formation of local rules. Secondly, f may also affect the speed with which global-rule representations can form. This is because in
order to form higher order sequential associations ('global rules') it is necessary for
populations which respond selectively to one or other of the tones to emerge (i.e. a
process of clustering), before the within-stream sequential associations can be learned.
Only after such populations are formed can the within-stream events become
consecutive, at least as viewed from within the population cluster. If we consider a fixed
SOA, then clearly the difficulty of forming separate clusters will be determined largely
Perceptual bi-stability in auditory streaming 49
-
Manuscript for submission to Hearing Research 13/06/2008
by f and the extent to which the neural activity elicited by the two tones overlaps, which in turn will determine the duration before the system can discover the
segregated organisation. This is well illustrated in Figure 16 (right panels), in which f has a clear effect on the duration of segregation for fixed SOAs. Thus the two effects of
f go hand in hand: smaller fs improve the formation of local-rule links and delay the formation of global-rule links; larger fs weaken the formation of local-rule links and allow faster formation of global-rule links.
The above description is consistent with the finding that the fission boundary is
much larger than the smallest detectable frequency difference [30], since segregation
requires not only a detectable difference, but also a clear separation of activity.
Similarly, the amplitude-modulation 'fission' boundary was shown to be much larger
than the smallest detectable amplitude-modulation difference [31]. Although initially
puzzling, the finding of a very stable fission boundary, measured in terms of the
minimum f necessary for segregation [30], can also be reconciled with perceptual switching; while the formation of separate clusters of activity may depend on some
minimum featural difference, our results show that subsequent stochastic switching is
largely independent of stimulus features. The claims of various groups, e.g. [32-34] that
adaptation in primary auditory cortex is a neural correlate of streaming is also consistent
with these ideas, since adaptation is likely to be an important aspect of the process of
clustering. Therefore, it will be linked to the duration of the first phase, and hence the
rate at which the probability of reporting streaming develops [32-34].
Integration may also be reported initially as a result of a qualitatively different
mechanism when f is small and SOA is short. When two sounds are presented within a temporal window of less than about 200 ms duration, then they tend to be processed as a
Perceptual bi-stability in auditory streaming 50
-
Manuscript for submission to Hearing Research 13/06/2008
single event [19], at least within a limited range of frequency differences [20]. In our
experiments this occurs primarily for SOAs 100 ms, and f < 4ST. In these cases the integrated pattern first perceived is likely to be an ABA chunk. The auditory system
then needs to pull this chunk apart before it can discover the alternative within-stream
sequential associations. Nevertheless, perceptual switching is found here too.
Segregation as the first percept at short SOAs and larger fs can also be understood in terms of the tendency to chunk acoustic stimuli into single events if they
occur within approximately 200ms of each other [19]. When f exceeds the proposed spectral window of integration [20], consecutive identical tones may be directly linked
(i.e., with the B tone excluded, the integration window contains A-A). A direct
consequence of this explanation is that first-phase segregated percepts must be based on
the more frequent stimulus (A-A, they cannot be B---B); a prediction which we plan to
test experimentally. Neurophysiological studies suggest a reason for this phenomenon
since it has been shown that responses in cortex are greatly reduced when events follow
each other at a rate faster than about 10 Hz [35]. Therefore, at fast presentation rates, it
is possible that due to insufficient recovery time some populations only respond to one
or other of the tones from the outset. Hence the higher order sequential associations can
be discovered immediately, resulting in segregation becoming the first percept. In our
data, the minimum f for reporting first phase segregation is approximately 5 semitones at 125-ms SOA, which is not inconsistent with the spectrotemporal window of
integration reported in [20]. Thus, similarly to f, SOA also has two different effects on the formation of rule representations. Firstly it governs the formation of associations
between consecutive tones, because for such associations the neural after-effects of the
first tone must still be present at the time the next tone arrives. This primarily affects the
formation of within-stream associations since twithin > tacross. Secondly, with short
Perceptual bi-stability in auditory streaming 51
-
Manuscript for submission to Hearing Research 13/06/2008
SOAs the recovery of the neuronal elements in higher (cortical) levels of the auditory
system may become an issue, and this primarily affects the local rule. Again, the two
effects go hand in hand: shortening the SOA strengthens the associations necessary for
representing the global rule more than the local rule (see Figure 16, left panel dF =
22ST), then, especially at small fs, shortening the SOA weakens the local-rule associations more than the global-rule ones (see Figure 16, left panel dF = 4ST).
One argument against this description of the SOA effects is that streaming can
be induced by the presenting a sequence of only one of the tones prior to the onset of
the ABA_ABA pattern [36]. Based on this finding Bregman et al. [3] argue against
SOA being an important determinant of streaming, suggesting instead that the
within-stream ISI (t) is the major factor. However, within the above-described framework, it is easy to understand how the induction sequence used by Rogers and
Bregman [36] led to increased reports of streaming at the onset of the ABA pattern. The
induction sequence promoted the establishment of sequential associations between tones
in one of the streams, thereby biasing the system towards activity associated with
single-frequency streams to be perceived first. In this case, the formation phase would
involve the discovery of linkages between temporally adjacent events, the local rules.
Thus, the effect of the induction sequence is similar to that of very low (
-
Manuscript for submission to Hearing Research 13/06/2008
duration of this phase depends on the time taken to discover and represent alternative
sequential associations. Consistent with findings in vision that local competition is
necessary in order to tr