The Statistical Learning Of Musical Expectancy

by

Dominique Thy An Vuvan

A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy

Department of Psychology University of Toronto

© Copyright by Dominique T. Vuvan 2012


The Statistical Learning Of Musical Expectancy

Dominique Thy An Vuvan

Doctor of Philosophy

Department of Psychology

University of Toronto

2012

Abstract

This project investigated the statistical learning of musical expectancy. As a secondary goal, the effects of the perceptual properties of tone set familiarity (Western vs. Bohlen-Pierce) and textural complexity (melody vs. harmony) on the robustness of that learning process were assessed. A series of five experiments was conducted, varying in terms of these perceptual properties, the grammatical structure used to generate musical sequences, and the methods used to measure musical expectancy. Results indicated that expectancies can indeed be developed following statistical learning, particularly for materials composed from familiar tone sets. Moreover, some expectancy effects were observed in the absence of the ability to successfully discriminate between grammatical and ungrammatical items. The implications of these results for our current understanding of expectancy formation are discussed, as is the appropriateness of the behavioural methods used in this research.


Acknowledgments

It feels strange to see my name standing alone at the top of this document, because it surely feels like there have been a hundred authors rather than one. My sincere and infinite thanks go out to all my ghost co-authors. More specifically…

Thanks to Mark Schmuckler, my supervisor, for moulding me from a tiny naïve science duckling into a big feisty science mallard. Thanks to my PhD committee – Elizabeth Johnson and Claude Alain – whose support and invaluable feedback helped me cope with the emotional and intellectual uncertainty of the research enterprise. Thanks to Jon Prince, my academic big brother, whose career path trailblazing gives me hope that I too can achieve my science mallard dreams. Thanks to Bryn Hughes for his help with my experimental stimuli, for all music cognition experts are lost without a music theory expert buddy. Thanks to my examination committee – Steve Joordens and Sandra Trehub – for reading my thesis and liking it! Thanks to Barbara Tillmann, my external appraiser, for providing unique insight based on her considerable experience with research in this area.

Thanks to Eddy Abraham, my erstwhile life adventure partner, for putting up with the highs and lows of graduate school for the last six years. Thanks to my thesis writing support group – Jessica Ellis, Michelle Hilscher, and Cara Tsang – whose never-ending cheerleading and advice helped me to write faster than I’ve ever written before. Thanks to my parents, who raised a person of sufficient intelligence and stubbornness to get through this process relatively unscathed. Thanks to the staff of Moonbean Café, who quietly provided me with smiles, caffeine, and workspace throughout the writing process. And thank you, dear reader, for being interested in this work – especially if you’re not one of the individuals thanked above.


Table of Contents

Acknowledgments.......................................................................................................................... iii

Table of Contents ........................................................................................................................... iv

List of Tables ................................................................................................................................ vii

List of Figures .............................................................................................................................. viii

Chapter 1 - Connections Between Statistical Learning, Musical Structure, and Musical Expectancy ..................................................................................................................1

1 Introduction .................................................................................................................................1

2 General Methods .........................................................................................................................8

3 Research Goals ............................................................................................................................9

Chapter 2 - Experiment 1: Memory For Western Tone-Word Melodies .......................................11

4 Introduction ...............................................................................................................................11

5 Methods .....................................................................................................................................12

5.1 Participants .........................................................................................................................12

5.2 Apparatus ...........................................................................................................................13

5.3 Materials and Procedure ....................................................................................................13

6 Results .......................................................................................................................................17

6.1 Discrimination Phase .........................................................................................................17

6.2 Expectancy Phase...............................................................................................................18

7 Discussion .................................................................................................................................20

Chapter 3 - Experiment 2: Tonal Priming With Western Chords ..................................................22

8 Introduction ...............................................................................................................................22

9 Methods .....................................................................................................................................25

9.1 Participants .........................................................................................................................25

9.2 Apparatus ...........................................................................................................................25

9.3 Materials ............................................................................................................................26


9.4 Procedure ...........................................................................................................................30

10 Results .......................................................................................................................................32

10.1 Discrimination Phase .........................................................................................................32

10.2 Expectancy Phase...............................................................................................................33

11 Discussion .................................................................................................................................39

Chapter 4 - Experiment 3: Memory For Bohlen-Pierce Melodies .................................................42

12 Introduction ...............................................................................................................................42

13 Methods .....................................................................................................................................43

13.1 Participants .........................................................................................................................43

13.2 Apparatus ...........................................................................................................................44

13.3 Materials and Procedure ....................................................................................................44

14 Results .......................................................................................................................................48

14.1 Discrimination Phase .........................................................................................................48

14.2 Expectancy Phase...............................................................................................................48

15 Discussion .................................................................................................................................59

Chapter 5 - Experiment 4: Tonal Priming With Bohlen-Pierce Melodies .....................................63

16 Introduction ...............................................................................................................................63

17 Methods .....................................................................................................................................64

17.1 Participants .........................................................................................................................64

17.2 Apparatus ...........................................................................................................................64

17.3 Materials ............................................................................................................................64

17.4 Procedure ...........................................................................................................................65

18 Results .......................................................................................................................................68

18.1 Discrimination Phase .........................................................................................................68

18.2 Expectancy Phase...............................................................................................................69

19 Discussion .................................................................................................................................77


Chapter 6 - Experiment 5: Tonal Priming With Bohlen-Pierce Chords ........................................80

20 Introduction ...............................................................................................................................80

21 Methods .....................................................................................................................................80

21.1 Participants .........................................................................................................................80

21.2 Apparatus ...........................................................................................................................81

21.3 Materials ............................................................................................................................81

21.4 Procedure ...........................................................................................................................84

22 Results .......................................................................................................................................85

22.1 Discrimination Phase .........................................................................................................85

22.2 Expectancy Phase...............................................................................................................86

23 Discussion .................................................................................................................................87

Chapter 7 - General Discussion .....................................................................................................89

23.1 Examination of Research Goals .........................................................................................89

23.2 Refining the Proposed Model of Expectancy Learning .....................................................91

23.3 Improving Methodology ....................................................................................................93

23.4 Conclusions ........................................................................................................................94

References ......................................................................................................................................96


List of Tables

Table 1: Composition of Materials for Experiment 1

Table 2: Composition of Expectancy Trials for Experiment 2

Table 3: Composition of Materials for Experiment 3

Table 4: Examples of Grammatical and Ungrammatical Standards and Their Corresponding Comparisons for Experiment 3

Table 5: Mean Areas Under Memory Operating Characteristic Curve for Experiment 3

Table 6: Mean Similarity Ratings for Experiment 3

Table 7: Composition of Materials for Experiment 5 (from Krumhansl, 1987)


List of Figures

For all figures:

1) Error bars are standard error of the condition mean.

2) * signifies p ≤ .05, ** signifies p ≤ .01, *** signifies p ≤ .001.

Figure 1: Examples of grammatical and ungrammatical standards and their corresponding

comparisons for Experiment 1.

Figure 2: Main effect of Trial Type on accuracy in Experiment 1.

Figure 3: Trial Type x Grammaticality interaction for accuracy in Experiment 1.

Figure 4: Main effect of Trial Type on d’ in Experiment 1.

Figure 5: Priming conditions and hypothesized priming strengths if participants develop harmonic expectancies in Experiment 2.

Figure 6: Finite state grammars used in Experiment 2, based on Figure 1 from Jonaitis & Saffran (2009). (a) Grammar A; (b) Grammar B.

Figure 7: Examples of items used in Experiment 2, based on Figures 2-3 from Jonaitis & Saffran (2009). (a) Example of Grammar A exposure item. (b) Example of Grammar A correct discrimination item. (c) Example of Grammar B correct discrimination item. (d) Example of Grammar A error discrimination item. (e) Example of Grammar B error discrimination item.

Figure 8: Average similarity ratings for grammatical and ungrammatical items in the discrimination phase of Experiment 2.

Figure 9: Western and novel priming effects in terms of accuracy (in-tune trials only) for Experiment 2.

Figure 10: Tuning x Chord Order interaction (accuracy) for the training legal/Western illegal condition for Experiment 2.

Figure 11: Effect of tuning on reaction time for Experiment 2.

Figure 12: Western and novel priming effects in terms of reaction time (in-tune trials only) for Experiment 2.

Figure 13: (a) Bohlen-Pierce grammar from Experiment 3, based on Figure 2 from Loui et al. (2008). (b) An example of a melody constructed from this grammar.

Figure 14: Main effects on memory operating characteristic in Experiment 3.


Figure 15: Grammaticality x Delay interaction for memory operating characteristic in Experiment 3.

Figure 16: Delay x Comparison interaction for memory operating characteristic in Experiment 3.

Figure 17: Effect of melody familiarity on area under memory operating characteristic in Experiment 3.

Figure 18: Main effects on similarity ratings for Experiment 3.

Figure 19: Grammaticality x Delay interaction for similarity ratings in Experiment 3.

Figure 20: Grammaticality x Trial Type interaction for similarity ratings in Experiment 3.

Figure 21: Delay x Trial Type interaction for similarity ratings in Experiment 3. # signifies p < .10.

Figure 22: Grammaticality x Delay x Comparison Type interaction for similarity ratings in Experiment 3.

Figure 23: Effect of melody familiarity on similarity ratings in Experiment 3.

Figure 24: Visual presentation of priming trials in Experiment 4, based on Figure 2 from Tillmann & Poulin-Charronnat (2010).

Figure 25: Effect of familiarization grammar on recognition performance in Experiment 4.

Figure 26: Effect of Grammaticality on accuracy in Experiment 4.

Figure 27: Grammaticality x Timbre interaction for accuracy in Experiment 4.

Figure 28: Grammaticality x Familiarization interaction for accuracy in Experiment 4.

Figure 29: Timbre x Familiarization interaction for accuracy in Experiment 4.

Figure 30: Main effects for reaction time in Experiment 4.

Figure 31: Grammaticality x Target Position interaction for RT in Experiment 4.

Figure 32: Grammaticality x Familiarization interaction for RT in Experiment 4.

Figure 33: Bohlen-Pierce chord grammars used in Experiment 5. (a) Grammar A; (b) Grammar B. Composition of these chords is specified in Table 7.


Figure 34: Grammar x Correctness interaction for discrimination phase in Experiment 5.

Figure 35: Grammaticality x Familiarization interaction for accuracy in Experiment 5.


Chapter 1
Connections Between Statistical Learning, Musical Structure, and Musical Expectancy

1 Introduction

Expectation, which can be simply defined as “the anticipation of upcoming information based on past and current information” (Schmuckler, 1997), is an essential tool for human survival. As such, this ability to predict future events from previous experiences has been well studied by cognitive psychologists, especially in the domain of music.

Music offers uniquely suited material with which to study cognitive expectancy, for two reasons. First, critical to expectancy is the processing of experience over time. Musical passages must unfold sequentially, and thus their perception requires the integration of auditory information across time. Second, musical structure has not only been theoretically defined in extraordinary detail (e.g., Laitz, 2008; Lerdahl & Jackendoff, 1983; Rameau, 1971; Schenker, 1954), but the tight match between these theoretical definitions and listener cognitions has also been confirmed (e.g., Jones, 1990; Krumhansl, 1990). This is especially true for Western tonal-harmonic music, but fairly detailed work has been done cross-culturally (e.g., Castellano, Bharucha, & Krumhansl, 1984; Kessler, Hansen, & Shepard, 1984; Krumhansl et al., 2000) and with artificial musical structures (Oram & Cuddy, 1995; Smith & Schmuckler, 2004) as well. Because musical pieces unfold based on underlying structural rules, the cognitive representation of these rules is imperative to expectancy. Therefore, music is particularly valuable as experimental material because its well-defined structure is easy to control and manipulate.

Given these advantages, it is not surprising that the role of expectancy in musical processing has been extensively explored across disciplines. Psychologists have investigated how expectancy affects listener judgments of how well-formed a musical passage is (Cuddy & Lunney, 1995; Krumhansl, 1995b; Schellenberg, 1996; Schmuckler, 1989), how quickly and accurately musical information is encoded (Bharucha & Stoeckig, 1986, 1987; Bigand, Poulin, Tillmann, Madurell, & D'Adamo, 2003; Marmel & Tillmann, 2009; Marmel, Tillmann, & Delbe, 2010; Marmel, Tillmann, & Dowling, 2008; Tillmann, 2005; Tillmann, Bharucha, & Bigand, 2000; Tillmann, Bigand, & Pineau, 1998; Tillmann, Janata, Birk, & Bharucha, 2003, 2008), the parameters of musical production and performance (Carlsen, 1981; Schellenberg, 1996; Schmuckler, 1989; Thompson, Cuddy, & Plaus, 1997; Unyk & Carlsen, 1987), and memory for musical passages (Boltz, 1991, 1993; Schmuckler, 1997).

More recently, researchers have also begun using neuroscientific techniques to measure brain responses to expectancy satisfaction and violation. For instance, Koelsch and colleagues (Koelsch, Gunter, & Friederici, 2000; Maess, Koelsch, Gunter, & Friederici, 2001), using electroencephalography and magnetoencephalography, have demonstrated that an early anterior negativity (EAN), emanating from Broca’s area and its right hemisphere homologue, occurs approximately 250 ms after an expectancy-violating musical event. Furthermore, the magnitude of the EAN indexes the degree to which that event violates structural expectations.

The same questions that have captured the attention of psychologists have also been well studied by scholars in other disciplines. For instance, Huron (2006) has taken an interdisciplinary approach to developing a theory of musical expectation that relies on research in music theory, cognitive science, and evolutionary biology. With this theory, Huron argues that expectancy is responsible for the various emotion states that can be evoked by musical events, and that this link between expectancy and emotion developed through evolution to be biologically useful.

Working in the field of music theory, Narmour has developed an account of melodic expectancy that explains how melodic structure leads to listener expectations for what will follow (Narmour, 1990, 1992). According to Narmour’s Implication-Realization model, these expectancies are predicated on Gestalt laws of perception such as similarity, closure, and continuity, which are argued to arise from biological constraints of the sensory system. Empirical study has confirmed the explanatory utility of Narmour’s model (e.g., Krumhansl, 1995a, 1995b), although a simplified version of the model seems to maintain its predictive value (Schellenberg, 1996).

Margulis (2005), also a music theorist, takes a somewhat different approach. Her model of musical expectation assigns expectancy ratings to melodic events based on the hierarchical organization of three primary factors (stability, proximity, and direction) and one secondary factor (mobility). Notably, Margulis’ model improves on Narmour’s by formalizing the roles of tonality (i.e., stability) and emotion in musical expectation.


Other researchers have taken a computational approach, building information-theoretic models to describe expectancy processing in music. For example, Pearce and Wiggins (2006) have developed a model of melodic expectancy that combines the bottom-up elements present in Narmour’s work (Narmour, 1990, 1992) and the top-down influence of tonality formalized in the work of Margulis and others (Krumhansl, 1990; Margulis, 2005). This model was quite successful at predicting a variety of results from previous experiments on musical expectancy, including a study of expectancy in simple intervals (Cuddy & Lunney, 1995), a study of continuation tones in melodic fragments (Schellenberg, 1996), and a study that evaluated listeners’ expectancy note-by-note throughout a melody (Manzara, Witten, & James, 1992).

One aspect of musical expectancy that has thus far escaped scrutiny is the cognitive process by which these expectancies are acquired. Participants in studies of musical expectancy are almost exclusively adults with an entire lifetime of musical experience upon which their expectancies rest. Thus, the central aim of this project was to elucidate how musical expectancies develop. This goal can be broken down into three separable questions:

(1) Upon what is musical expectancy based?

(2) How do listeners learn the basis of musical expectancy?

(3) Is there evidence of expectancy processing following the putative expectancy learning process?

Extensive work on Western music indicates that music perception relies most heavily upon tonality, the hierarchy that dictates how all tones are organized around a central tonic, and how groups of tones (chords) are organized around a tonic chord (see Krumhansl, 1990, 2000, for review). Importantly, musical expectancies seem to be predicated on these perceptual structures. For instance, Bharucha and Stoeckig (1986) found that listeners responded faster and more accurately to expected chords than unexpected chords, and critically, that these expectancies matched up very well with predictions from the harmonic hierarchy described previously (Krumhansl, Bharucha, & Kessler, 1982). In another study, Schmuckler (1997) found that recognition memory for melodies was positively correlated with listener ratings of expectancy, and that expectancy ratings were in turn related to how well the melody in question adhered to the rules of tonality.


Turning to the question of how listeners learn these tonal rules, some theorists have focused on the idea that knowledge of these structures, and hence expectancy, is based upon innate perceptual dispositions (e.g., Huron, 2006; Meyer, 1956; Narmour, 1990). For example, Narmour’s Implication-Realization theory is predicated on the assumption that perception is guided by Gestalt laws that are innate to the sensory system.

However, research on listeners of different musical abilities, ages, and cultures challenges the presumption that musical expectancies are innate by indicating that the perception of tonality varies across individuals and groups. Both Cuddy and Badertscher (1987) and Krumhansl and Shepard (1979) found that musicians have more sharply defined representations of tonality. Work by several groups has shown that the representation of the musical structure of one’s culture strengthens continuously from birth, through childhood, and into adulthood (Cuddy & Badertscher, 1987; Trainor & Trehub, 1992, 1993; Trehub, Schellenberg, & Kamenetsky, 1999). For instance, Trainor and Trehub (1992) found that infants were able to detect melodic changes regardless of tonal structure, whereas adults were better able to detect changes that violated tonal structure than changes that occurred within it. This result demonstrated that tonal sequences were privileged in adult perception but not in infant perception. Finally, many studies of cross-cultural tonality perception have shown that listeners are more sensitive to the tonal structure of their home culture than to the tonal structure of other cultures (Castellano et al., 1984; Kessler et al., 1984; Krumhansl et al., 2000). For instance, Castellano et al. (1984) had Western and North Indian listeners respond to North Indian musical passages. These authors found that Western responses were dependent on the pitch information present in the passage, whereas North Indian responses demonstrated effects of tonality regardless of the pitches in the passage. Therefore, although there is merit in a theoretical approach that treats the tonal structures that form expectancy as fully formed, it is also important to examine the learning processes that contribute to these expectancies.

Proponents of this learning view have focused on the role of statistical learning (sometimes called implicit learning) in the acquisition of mental representations of tonal structure. Statistical learning refers to the process whereby listeners are able to extract the structural rules of an organized stimulus system during incidental exposure to rule-abiding exemplars. This process is generally implicit and occurs without conscious thought. In music, this refers specifically to listeners’ ability to extract the rules of tonality from their experiences listening to tonal music.


Tillmann, Bharucha, and Bigand (2000) have put forth MUSACT, a computational model of implicit tonality learning that self-organizes based on exposure to tonal music. This connectionist model stores the information from exposure in three types of nodes – tones, chords, and keys – and displays learning by strengthening the connections between nodes that co-occur during exposure and allowing these changes in connection strength to spread through the network of nodes. This model has successfully accounted for a variety of empirical findings in the tonality literature, including studies of tone relatedness, memory, and expectancy.

Inherent in these learning accounts is the assumption that listeners are in fact sensitive to statistical cues present in music that would enable them to formulate their ensuing understanding of tonal structure. Indeed, research with unfamiliar musical systems seems to confirm that listeners perceive statistical cues in music and, based on those cues, are able to form novel mental representations of structure. The two classes of statistical cues that have received the most attention from music researchers are pitch distributions and predictive dependencies.

A pitch distribution is a set of statistics concerning the overall behaviour of tones in a musical passage. Every musical system has a defined tone set containing all possible legal pitches. A pitch distribution is defined by the relative frequency of occurrence of the different pitches in the passage. In general, more important tones (like the tonic) occur more often than less important tones. Evidence for listeners’ sensitivity to pitch distributions comes from cross-cultural studies of tonality as well as studies employing artificial novel tonalities. For example, in the study of North Indian music discussed above, Castellano et al. (1984) found that the responses of the American listeners were in high accordance with the responses of the Indian listeners, despite some key differences. These authors postulated that the musical passages contained distributional cues to the North Indian tonal hierarchy, and Western listeners were able to extract that information and respond accordingly. Similarly, in a study where pitch distributions were experimentally manipulated, Oram and Cuddy (1995) found that listeners rated tones that had occurred more frequently as fitting better with the musical context than tones that had occurred less frequently. Moreover, Smith and Schmuckler (2004) presented listeners with musical passages that exhibited different levels of pitch distributional information, and found that tonality perception depends on meeting threshold levels of pitch differentiation and organization in these passages.
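To make the relative-frequency idea concrete, the computation can be sketched in a few lines of Python. The passage and pitch labels below are invented for illustration; they are not drawn from any of the experimental materials discussed in this thesis.

```python
from collections import Counter

def pitch_distribution(passage):
    """Relative frequency of occurrence of each pitch in a passage."""
    counts = Counter(passage)
    total = sum(counts.values())
    return {pitch: n / total for pitch, n in counts.items()}

# A toy C-major-like passage: the tonic (C) occurs most often,
# mirroring the distributional asymmetry described in the text.
melody = ["C", "E", "G", "C", "D", "G", "C", "E", "C", "G", "A", "C"]
dist = pitch_distribution(melody)
# C accounts for 5 of 12 events and G for 3 of 12, so a listener
# tracking frequencies could infer their relative structural importance.
```

A key-finding procedure in the style of Krumhansl (1990) would then compare such a distribution against probe-tone profiles, but the simple counts above are the raw statistic that listeners are hypothesized to track.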


Predictive dependency statistics, on the other hand, describe the local relations that govern transitions from one musical unit to the next. For instance, forward transitional probabilities are calculated from the conditional probability of any particular event, given the event that came before. Saffran and her collaborators have studied the perception of transitional probabilities in language extensively. In a pair of studies, Saffran and colleagues (Saffran, Aslin, & Newport, 1996; Saffran, Newport, & Aslin, 1996) demonstrated that both 8-month-old infants and adults are sensitive to forward transitional probabilities. In these studies, an artificial grammar was employed consisting of six trisyllabic words. These artificial words were strung together randomly into a continuous stream that listeners heard during an exposure period. Critically, although word boundaries could not be detected by using acoustic cues, they could be computed based on differential predictive dependencies, with the syllables within words exhibiting high transitional probabilities and the syllables between words exhibiting low transitional probabilities. Following exposure, both infants and adults were able to discriminate words from non-words even though they had never before heard these words in isolation, thus confirming listeners’ perceptual sensitivity to these transitional cues.
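The boundary-finding logic of this paradigm can be sketched in Python. The three made-up "words" and the stream below are illustrative only (a reduced version of the six-word grammars used in the actual studies):

```python
from collections import Counter
import random

def transitional_probabilities(stream):
    """Forward transitional probability P(next | current) for each
    adjacent pair of elements in a sequence."""
    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}

# Three made-up trisyllabic "words", concatenated at random into a
# continuous stream with no acoustic cues to the word boundaries.
words = [("tu", "pi", "ro"), ("go", "la", "bu"), ("da", "ko", "ti")]
random.seed(0)
stream = [syllable for w in random.choices(words, k=200) for syllable in w]

tp = transitional_probabilities(stream)
# Within-word transitions are perfectly predictive (TP = 1.0), whereas a
# word-final syllable can be followed by any of the three word onsets
# (TP near 1/3); dips in TP thus mark the hidden word boundaries.
```

The same computation applies unchanged when the elements are tones rather than syllables, which is precisely the move made in the music-like extensions discussed next.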

Importantly, this sensitivity to transitional cues has been tested with music-like materials as well

(Saffran, 2003a, 2003b; Saffran & Griepentrog, 2001; Saffran, Johnson, Aslin, & Newport,

1999; Saffran, Reeck, Niebuhr, & Wilson, 2005). Similar to the linguistic materials from Saffran

et al. (Saffran, Aslin, et al., 1996; Saffran, Newport, et al., 1996), the artificial languages used for

these studies combined a discrete set of tones into tone-word groups containing three tones each.

These tone-words were then concatenated randomly into a continuous stream, with the cues to

their boundaries being the transitional probabilities between tones. Following exposure to the

tone stream, participants were able to discriminate between tone-words and non-words. These

results provide evidence that listeners’ sensitivity to adjacent dependencies extends to music-like

materials.

In another study exploring predictive dependencies in music, Jonaitis and Saffran (2009)

examined the possibility that listeners could use statistical information to learn the rules of a

harmonic system, which governs how chords (groups of simultaneously sounded notes) are

combined. These authors employed a finite-state grammar which combined familiar Western

chords in novel ways. After familiarization with 100 grammatical chord sequences, listeners

were able to distinguish grammatical chord sequences from ungrammatical ones, and following


an additional familiarization session, they were also sensitive to more subtle violations of

grammaticality within generally grammatical items. Thus, listeners are sensitive to structural

statistics that indicate novel combinations of both tones and chords.

Extending these results from items that are produced by combining familiar Western tones in

novel ways, Loui and her colleagues (Loui & Wessel, 2008; Loui, Wessel, & Kam, 2010) studied

the statistical learning of adjacent dependencies using an artificial grammar composed from

Bohlen-Pierce tones. The Bohlen-Pierce scale is a microtonal scale consisting of tones whose

frequencies logarithmically subdivide a 3:1 frequency ratio (the tritave), whereas the chromatic

tones of Western music subdivide the 2:1 octave. This produces a completely novel tone set with which listeners have no

experience. These authors found that following familiarization with 400 Bohlen-Pierce melodies,

listeners were able to distinguish familiarization melodies from ungrammatical melodies, and

also could generalize their knowledge to distinguish novel grammatical melodies from

ungrammatical melodies.
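For context, the equal-tempered version of the Bohlen-Pierce scale divides the 3:1 tritave into 13 logarithmically equal steps, whereas Western equal temperament divides the 2:1 octave into 12. A minimal sketch (the 220 Hz starting frequency is an arbitrary choice for illustration):

```python
def equal_tempered(f0, ratio, steps, n):
    """First n frequencies of a scale dividing `ratio` into `steps`
    logarithmically equal intervals, starting from f0."""
    return [f0 * ratio ** (k / steps) for k in range(n)]

western = equal_tempered(220.0, 2, 12, 13)        # Western chromatic scale
bohlen_pierce = equal_tempered(220.0, 3, 13, 14)  # Bohlen-Pierce scale

assert abs(western[12] - 440.0) < 1e-9        # 12 steps span one octave (2:1)
assert abs(bohlen_pierce[13] - 660.0) < 1e-9  # 13 steps span one tritave (3:1)
```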

There is also evidence that listeners are sensitive to non-adjacent dependencies in music. These

transitional probabilities are calculated from the conditional probability of any particular note,

given the note that came two positions earlier. Listeners seem to be able to learn these

dependencies, but only if there is a perceptual or cognitive cue that helps to organize the stimuli

into units. For example, Creel, Newport, and Aslin (2004) found that listeners were only able to

learn non-adjacent relations among tones when a perceptual grouping cue was present. Thus,

when non-adjacent tone words were played in the same pitch register or with the same timbre

during familiarization, listeners were able to distinguish these grammatical tone words from

ungrammatical ones at test. Additionally, Endress (2010) demonstrated that tonality can be used

as a grouping cue to learn non-adjacent dependencies, with listeners learning non-adjacent

melodic units when they were tonal but not when they were atonal.

All in all, these studies suggest that listeners are sensitive to statistical information presented in

musical passages, and that they are able to extract information regarding musical structure from

these statistical cues. Turning to the third question posed above, some authors have suggested

that this type of finding is evidence that statistical learning can give rise to musical expectancies

(e.g., Jonaitis & Saffran, 2009). However, expectancy is best understood as the way that the

structural knowledge learned from statistical cues “is internalized in the form of representations


that influence subsequent processing” (Bharucha & Stoeckig, 1986, p. 403). Thus, a true test of

musical expectancy would demonstrate that these statistically-learned structural representations

have downstream processing effects. In this light, simply demonstrating that listeners

discriminate between grammatical and ungrammatical items, without testing the consequences of

this discrimination, is very weak evidence for the statistical learning of expectancies.

This question has previously been studied in language by Graf Estes, Evans, Alibali, and Saffran

(2007). In their study, these authors presented infants with artificial word streams similar to

those used by Saffran et al. (Saffran, Aslin, et al., 1996; Saffran, Newport, et al., 1996).

Following this exposure, infants were separated into groups for an object label-learning task.

During this task, the “word” group was habituated to label-object pairs in which the label was a

word from exposure, whereas the “non-word” and “part-word” groups were habituated to label-

object pairs in which the label was a non-word or part-word, respectively. Graf Estes et al.

(2007) found that only the word group exhibited label learning, which indicated that words that

were segmented from the exposure were being treated as candidate words for linking to meaning.

Thus, language structure learning led to downstream processing effects based on that structure.

To date, only one study has explored this issue in music. Tillmann and Poulin-Charronnat (2010)

trained listeners on melodies produced from a finite-state grammar with Western chromatic tones

as nodes. Following familiarization, listeners encountered novel melodies based on the

familiarization grammar; each melody contained a target note that was in- or out-of-tune.

Critically, this target note either did or did not conform to the familiarization grammar. Listeners

were asked to respond as quickly and accurately as possible with regard to whether the target

was in- or out-of-tune. For in-tune targets, these authors found that listeners responded faster and

more accurately to grammatical than ungrammatical targets. This result replicated previous

studies of melodic priming in Western music, wherein listeners respond more quickly to

expected than unexpected targets (e.g., Marmel, et al., 2008). Therefore, this experiment

successfully provided evidence that structural information that listeners learn from statistical

cues can lead to downstream expectancy effects.

2 General Methods

The experiments in each chapter all share a common structure. Each experiment has three

phases: familiarization, discrimination, and expectancy. The general methodological features of


each phase will be explained in this section, and more detailed descriptions will be included in

the methods section for each chapter.

In the familiarization phase, listeners were exposed to a large corpus of materials with a novel

grammatical structure. These materials varied in terms of their complexity and their familiarity.

Complexity refers to whether the materials were composed of melodies (one sound event at a

time) or chords (multiple sound events at a time). Familiarity refers to whether the materials

were composed from a tone set that is familiar (i.e., Western chromatic set) or unfamiliar (i.e.,

Bohlen-Pierce scale).

The discrimination phase attempted to replicate past work showing that, following

familiarization, listeners are able to discriminate between items that are grammatical and

ungrammatical in the novel system. Thus, in this phase, listeners were asked to discriminate

between grammatical and ungrammatical items.

The expectancy phase was the critical part of these experiments. In this phase, musical

expectancies arising from the familiarization phase were measured using either a recognition

memory task or a musical priming task. It was hypothesized that musical expectancy would be

manifested in improved memory performance and musical priming effects for grammatical over

ungrammatical items.

3 Research Goals

The current series of experiments was undertaken with two goals in mind. The primary goal was

to demonstrate that listeners are able to learn musical expectancies through statistical learning.

To that end, listeners were familiarized with materials from several published statistical learning

paradigms (Experiments 1-4) as well as one set of novel materials (Experiment 5). Following

this, attempts were made to measure musical expectancies that may have arisen during

familiarization. This was essentially an attempt to conceptually replicate Tillmann and Poulin-

Charronnat’s (2010) results while varying the materials used and the methodologies employed to

measure expectancy.

The secondary goal of this research was to explore whether the ease of learning musical

expectancies depends on certain properties of the familiarization materials. Thus, the


familiarization materials varied, as described above, according to their complexity and their

familiarity.

This project will therefore establish a critical mass of evidence for the idea that musical

expectancies are developed, at least in part, through listeners’ incidental exposure to statistical

cues in the environment that signal musical structure. Moreover, these experiments will

provide an exploration of the limits to this statistical learning with respect to the perceptual

properties of the materials encountered.


Chapter 2 Experiment 1: Memory For Western Tone-Word Melodies

4 Introduction

This experiment aimed to establish that statistical learning processes give rise to musical

expectancies. The materials employed in this experiment were inspired by Saffran and her

colleagues, who have used artificial tone languages extensively in their work (Saffran, 2003a,

2003b; Saffran & Griepentrog, 2001; Saffran, et al., 1999; Saffran, et al., 2005). Saffran’s studies

have shown that after exposure to tone streams constructed of tone words that are defined by

transitional probabilities, participants are able to discriminate between grammatical and

ungrammatical tone-words. Thus, listeners are sensitive to transitional probability information

during music listening. This result has been replicated extensively in adults and infants (Saffran,

et al., 1999), and with materials containing absolute and relative pitch cues (e.g., Saffran &

Griepentrog, 2001). However, Trehub (2003) raised some concerns regarding Saffran’s (2003a)

materials. Specifically, some musical characteristics of the tone-words were not controlled,

which meant that participants were possibly using perceptual, rather than statistical, cues to do

the discrimination task. An additional concern was that expectancies regarding a complex

grammar would be difficult to apprehend during short-term familiarization in the laboratory.

Thus, the familiarization and expectancy phases were designed as replications of Saffran et al.

(1999), but with a set of tone-words in which some critical musical-perceptual characteristics

were controlled, and with the grammatical structure governing the tone-words somewhat

simplified.

In order to measure expectancy, inspiration was taken from the memory literature. Dowling and

his colleagues have studied melody memory comprehensively using a recognition memory task

(Bartlett & Dowling, 1980; Dowling, 1978, 1991; Dowling & Bartlett, 1981; Dowling &

Fujitani, 1971; Dowling, Kwak, & Andrews, 1995). In a typical study, participants hear two

melodies sequentially (called the standard and the comparison, respectively) and must indicate

whether the comparison was the same as the standard. This research has revealed a panoply of

factors that influence participants' performance on this task, including the delay between

melodies, and the contour and interval content of the comparison. Importantly, research has

demonstrated that tonality also has predictable effects on melody memory. In general, well-


structured tonal melodies are remembered better than those not conforming to the rules of tonal

structure (Cuddy, Cohen, & Mewhort, 1981; Cuddy, Cohen, & Miller, 1979; Dewar, Cuddy, &

Mewhort, 1977; Dowling, 1978, 1991; Frances, 1972).

If familiarization on tone-word languages leads to expectancies about tone-word structure,

perhaps these expectancies can be used to cognitively organize tones into tone-word chunks.

Classic memory research has shown that chunking helps increase processing efficiency and thus

boosts memory performance (Miller, 1956). In this case, melodies that were composed of these

expected tone-word chunks might be remembered differently than randomly-composed

melodies, in the same way that tonal melodies are remembered differently from random ones.

Finally, consideration should be given to the familiarity and complexity of these experimental

materials, and how these characteristics might have influenced the ease of expectancy learning in

this experiment. These stimuli were composed of Western chromatic tones organized into a

melodic texture, meaning that they were built from familiar-sounding parts and simple in

construction. One can make two divergent predictions here. On the one hand, it is possible that it

would be easy to learn structural expectancies with these materials because the processing

demands of familiar, simple stimuli are low. On the other hand, it is possible that expectancy

learning would be impeded for two reasons. First, the use of familiar musical building-blocks

might have led to the evocation of well-trained Western musical expectancies, thus making it

very difficult to learn novel expectancies in their place. Second, melodic stimuli may be too

simple in texture to provide enough information for the development of music-structural

expectancies.

5 Methods

5.1 Participants

Thirty-two participants were recruited from the University of Toronto Scarborough community

using the introductory psychology participant pool as well as posted advertisements. Participants

were compensated with course credit or $10 per hour.

Participants consisted of 8 males and 24 females with a mean age of 20.2 years (SD = 2.4 years).

Data for 8 participants were lost due to experimenter error; the following information

was gathered from the remaining 24 participants. Participants were not pre-selected for musical


experience, but had on average 5.1 years of formal musical training (SD = 4.4 years), 1.5 years of

musical theory training (SD = 2.6 years), played music for 1.1 hours per week (SD = 1.9 hours),

and listened to music for 10.1 hours per week (SD = 10.6 hours). No participants reported having

taken part in a music psychology experiment previously, nor did any participant report having

absolute pitch.

5.2 Apparatus

All three phases of the experiment were presented to participants using an Intel Pentium 4

personal computer, with code written and run in MATLAB 7.0. This experiment was realised

using Cogent 2000 developed at University College London by the Cogent 2000 team at the

Functional Imaging Laboratory and the Institute of Cognitive Neuroscience, and Cogent

Graphics developed by John Romaya at the Laboratory of Neurobiology at the Wellcome

Department of Imaging Neuroscience. The experiment interface was viewed on an LG Flatron

L1710S monitor, while the auditory components of the experiment were heard through a pair of

Sennheiser HD 280 pro headphones connected to a Creative Sound Blaster Audigy 2 ZS

soundcard, at a volume comfortable for the participant. Participant responses were collected

using the computer keyboard.

5.3 Materials and Procedure

Participants were randomly assigned to the discrimination group or the expectancy group. Both

groups took part in the familiarization phase. Only the discrimination group participated in the

discrimination phase, and only the expectancy group participated in the expectancy phase.

Familiarization phase:

Two counterbalanced languages were created for the familiarization phase, similar to the

materials used by Saffran et al. (1999, see Table 1). All participants were randomly assigned to

either Language A or Language B. Both languages were composed from the 12 tones of the

Western chromatic set, synthesized in Audacity 1.3 as 0.33 second-long sine waves with 0.03

second onset and offset ramps. For each language, the tones were combined to make four tone-

words containing three tones each. Within each language, the transitional probability between

any two tones within a tone-word was 0.33, and the transitional probability between any two

tones between tone-words was 0. Additionally, none of the transitions within tone-words in one


language ever occurred in the other language, such that a tone-word in Language A was a non-

word in Language B and vice versa.
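The stimulus synthesis described above can be sketched as follows; the linear shape of the onset and offset ramps is an assumption, since the text specifies only their duration:

```python
import math

def sine_tone(freq, dur=0.33, ramp=0.03, sr=44100):
    """Sine-wave tone of `dur` seconds with `ramp`-second linear
    onset and offset ramps, returned as a list of float samples."""
    n, n_ramp = int(dur * sr), int(ramp * sr)
    samples = []
    for i in range(n):
        amp = 1.0
        if i < n_ramp:            # onset ramp: fade in
            amp = i / n_ramp
        elif i >= n - n_ramp:     # offset ramp: fade out
            amp = (n - 1 - i) / n_ramp
        samples.append(amp * math.sin(2 * math.pi * freq * i / sr))
    return samples

tone = sine_tone(440.0)
assert len(tone) == int(0.33 * 44100)
assert tone[0] == 0.0  # silent at onset
```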

For the familiarization phase, the 4 tone-words of each language were concatenated randomly

and continuously into a five-minute block, with two constraints: that each of the four tone-words

were heard with equal probability, and that there were never back-to-back repeats of a tone-

word. This five-minute block thus contained 75 instances of each tone-word.
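The concatenation procedure, with its two constraints (equal word frequency and no back-to-back repeats), can be sketched like this; the single-letter tone names are placeholders, not the actual tone-words:

```python
import random

def tone_word_stream(words, n_per_word, rng=random.Random(0)):
    """Random concatenation of tone-words in which each word occurs
    n_per_word times and no word is repeated back-to-back."""
    while True:  # retry in the rare case the greedy pass dead-ends
        counts = {w: n_per_word for w in words}
        seq, prev = [], None
        for _ in range(len(words) * n_per_word):
            choices = [w for w in words if w != prev and counts[w] > 0]
            if not choices:
                break
            prev = rng.choice(choices)
            counts[prev] -= 1
            seq.append(prev)
        if len(seq) == len(words) * n_per_word:
            return [tone for word in seq for tone in word]

words = [("a", "b", "c"), ("d", "e", "f"), ("g", "h", "i"), ("j", "k", "l")]
stream = tone_word_stream(words, 75)
assert len(stream) == 900  # 4 words x 75 instances x 3 tones each
```

A greedy draw with retry is used here rather than rejection sampling of whole sequences, since a 300-word sequence with no adjacent repeats is vanishingly unlikely to appear by unconstrained shuffling.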

During this phase, participants heard the five-minute familiarization block three times, for a total

of 15 minutes of exposure to the tone-word stream. They were told that they would hear a series

of continuous tones, and instructed to listen carefully because they would be tested following the

listening session. They were not informed about which aspects of the tones would be tested.

Discrimination phase:

The four tone-words from each language were exhaustively paired with one another to create 16

discrimination trials. On each trial, participants heard a tone-word from one language, followed

by a 0.75 second pause, and then the tone-word from the other language. Whether the Language

A tone-word or the Language B tone-word would be heard first was counterbalanced across the

trials. Correct responses for Language A participants were incorrect for Language B participants,

and vice versa. Participants were informed that they would hear two tone sequences separated by

silence and instructed to indicate whether the first or second tone sequence was more familiar,

based on the familiarization phase. The task therefore required a subjective judgment of

familiarity that depended on discriminating between a tone-word from the familiarization

language and a non-word from the other language. Following the discrimination trials,

participants completed a survey regarding their musical experience. The entire experimental

session for the discrimination group lasted approximately 30 minutes.

Expectancy phase:

Nine-note melodies were composed for each language by exhaustively joining the four tone-

words into all possible permutations of three tone-words. This process yielded 24 melodies for

each language. Because the tone-words were originally composed to control for interval

(distance between tones) and contour (pattern of ups and downs) content between Language A

and Language B melodies, these two factors could therefore not be used to discriminate the

melodies and tone-words of Language A from B. As a result of familiarization, Language B

melodies were expected to sound ungrammatical to Language A participants, and vice versa.

Memory trials were composed following Dowling's (1978) study. For each of the 48 standard

melodies (half Language A, half Language B), three comparison melodies were constructed (see

Figure 1). In the Match condition, the comparison melody was identical to the standard melody.

In the Same Contour condition, the comparison started on the same note and followed the same

pattern of ups and downs as the standard. However, notes 2 to 9 were chosen randomly from the

Western chromatic tone set (with the contour restriction) such that the intervals between notes

were altered. In the Random condition, the comparison melody started on the same note as the

standard and notes 2 to 9 were chosen randomly from the chromatic set (without a contour

restriction), creating a melody that differed from the standard in contour and interval content.

According to Dowling's (1978) study, memory performance should be best for random trials, as

they are highly discriminable from standards. Moreover, if the familiarization phase leads to the

development of melodic expectancies, one should also predict better memory performance for

trials containing standards constructed with tone-words from the participant’s familiarization

language. The order of the 144 expectancy trials (48 Language A and Language B standards x 3

comparisons) was randomized for presentation to participants.
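The trial construction just described can be sketched as follows, with pitches represented as MIDI-style note numbers. The specific tone-word pitches and the +/-6 semitone sampling range are illustrative assumptions (the actual comparisons drew notes from the chromatic set), not the thesis values:

```python
import random
from itertools import permutations

def standards(words):
    """All 9-note standards: ordered triples of distinct tone-words (4P3 = 24)."""
    return [[p for w in triple for p in w] for triple in permutations(words, 3)]

def comparison(standard, condition, rng):
    """Comparison melody sharing the standard's first pitch. 'match' copies
    the standard; 'contour' keeps its up/down pattern with random intervals;
    'random' constrains neither contour nor intervals."""
    if condition == "match":
        return list(standard)
    out = [standard[0]]
    for prev, cur in zip(standard, standard[1:]):
        if condition == "random":
            out.append(out[-1] + rng.randint(-6, 6))
            continue
        d = (cur > prev) - (cur < prev)   # +1 up, -1 down, 0 repeat
        out.append(out[-1] + d * rng.randint(1, 6) if d else out[-1])
    return out

words = [(60, 64, 62), (65, 61, 66), (67, 63, 59), (58, 68, 57)]  # hypothetical
melodies = standards(words)
assert len(melodies) == 24 and all(len(m) == 9 for m in melodies)
```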

On each trial, participants heard two melodies separated by a 2 second pause. They were

informed that they would hear one melody followed by another melody, and asked to indicate

whether the second melody was the same as the first. Following the expectancy trials,

participants completed a survey regarding their musical experience. The entire experimental

session for the expectancy group lasted approximately one hour.


Figure 1. Examples of grammatical and ungrammatical standards and their corresponding

comparisons for Experiment 1.


Table 1

Composition of Materials for Experiment 1

Tone    Frequency (Hz)    Language A    Language B

C       261.63
C#      277.18
D       293.66            C# A F        D A G
D#      311.13
E       329.63            A# D E        G# D# E
F       349.23
F#      369.99            D# G B        C F# B
G       392.00
G#      415.30
A       440.00            G# F# C       A# F C#
A#      466.16
B       493.88

6 Results

6.1 Discrimination Phase

For each participant in the discrimination group, the number of correct discriminations out of 16

trials was tabulated. These data were submitted to a one-sample t-test with µ = 8 (chance

performance). Participants as a group performed significantly above chance, t(15) = 4.74, p <

.001, with mean number correct = 11.25 ± 0.66. Additionally, there was no significant difference

between participants trained on Language A vs. B, t(14) = 0.18, p = .86.
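The chance-level analysis can be reproduced with a one-sample t-test; a stdlib-only sketch using arbitrary placeholder scores (not the thesis data):

```python
import math
from statistics import mean, stdev

def one_sample_t(scores, mu):
    """t statistic (df = n - 1) for H0: the population mean equals mu."""
    n = len(scores)
    return (mean(scores) - mu) / (stdev(scores) / math.sqrt(n))

# Hypothetical scores out of 16 trials, tested against chance (8/16):
scores = [10, 12, 11, 13, 9, 12, 11, 10, 14, 12, 10, 11, 12, 13, 11, 8]
t = one_sample_t(scores, mu=8)
assert t > 0  # performance numerically above chance
```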


6.2 Expectancy Phase

Responses from the melody memory task were analyzed with respect to accuracy (percent

correct) and d’. Because the 24 standards for each language included every possible permutation

of the tone-words in each language, no systematic differences were expected between standards.

Therefore, responses were collapsed across the 24 standards in each condition for all analyses.
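For reference, d' can be computed from hit and false-alarm rates; in this task a "same" response to a Match comparison counts as a hit and a "same" response to a changed comparison as a false alarm. A stdlib sketch, using a common 1/(2N) correction for extreme rates (the choice of correction is an assumption, not specified in the text):

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """d' = z(hit rate) - z(false-alarm rate), with rates of 0 or 1
    nudged inward by 1/(2N) to keep the z-transform finite."""
    z = NormalDist().inv_cdf
    n_sig, n_noise = hits + misses, false_alarms + correct_rejections
    hr = min(max(hits / n_sig, 1 / (2 * n_sig)), 1 - 1 / (2 * n_sig))
    far = min(max(false_alarms / n_noise, 1 / (2 * n_noise)),
              1 - 1 / (2 * n_noise))
    return z(hr) - z(far)

# e.g., 20/24 hits on Match trials, 6/24 false alarms on changed trials:
assert d_prime(20, 4, 6, 18) > 1.0
```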

Accuracy data were submitted to a repeated measures ANOVA with Trial Type (Match, Same

Contour, Random) and Grammaticality (Grammatical, Ungrammatical) as factors. There was a

main effect of Trial Type, F(2,30) = 4.38, MSE = 0.03, p = .02, ηp² = .23; planned comparisons showed that

this was due to higher accuracy for random trials over match trials, t(15) = 2.77, p = .01, and

same contour trials, t(15) = 3.93, p = .001. There was no difference in accuracy for match and

same contour trials, t(15) = 0.71, p = .49 (Figure 2).

[Figure 2: bar graph of accuracy (%) for the Match, Same Contour, and Random trial types; the Random bar is marked **.]

Figure 2. Main effect of Trial Type on accuracy in Experiment 1.


The main effect of Grammaticality was non-significant, F(1,15) = 0.10, p = .76. The interaction

between Trial Type and Grammaticality was significant, F(2,30) = 6.49, MSE = 0.01, p < .01, ηp²

= .30 (Figure 3). To probe this interaction, unplanned simple effects analyses were conducted for

each trial type at a Bonferroni-corrected α = .02. For match trials, responses to ungrammatical

melodies were more accurate than to grammatical ones, t(15) = 2.60, p = .02. There was no

difference in accuracy between grammatical and ungrammatical melodies for same contour or

random trials, all t < 2.23, all p > .04.

[Figure 3: bar graph of accuracy (%) for Grammatical vs. Ungrammatical melodies within Match, Same Contour, and Random trials; the Match pair is marked *.]

Figure 3. Trial Type x Grammaticality interaction for accuracy in Experiment 1.

d’ data were submitted to repeated measures ANOVA with Trial Type (Match vs. Same Contour,

Match vs. Random) and Grammaticality (Grammatical, Ungrammatical) as factors. There was a

main effect of Trial Type, F(1,15) = 16.71, MSE = 2.46, p = .001, ηp² = .53, with random

comparisons more distinguishable from matches than same contour comparisons (Figure 4). The

main effect of Grammaticality was not significant, F(1,15) = 0.01, p = .93, nor was the Trial

Type x Grammaticality interaction, F(1,15) = 0.05, p = .82.


[Figure 4: bar graph of d' for the Match vs. Same Contour and Match vs. Random trial types; the difference is marked ***.]

Figure 4. Main effect of Trial Type on d' in Experiment 1.

7 Discussion

Participants’ successful discrimination performance replicated Saffran et al.’s (1999) results, and

demonstrated that participants were able to distinguish grammatical from ungrammatical items

following the familiarization period. Results from the expectancy group were not as clear cut.

The effects of trial type on accuracy and d' were consistent with Dowling's research (e.g., 1978).

However, evidence that grammatical melodies (composed from familiarization tone-words) led

to better memory performance was quite weak, with no effect of grammaticality on either

accuracy or d’. In fact, the interaction between Trial Type and Grammaticality for accuracy was

actually driven by worse accuracy for grammatical than ungrammatical melodies on match trials.

Thus, this experiment has provided little evidence of musical expectancies arising from statistical

learning.

Why might this experiment have failed to successfully induce musical expectancies? As

discussed, the chromatic melodies used may have been too simple and/or familiar. However, the


fact that memory performance was not at ceiling advocates against this idea. Another possibility

is that memory based upon melodic chunking may not have been the right way to study the

statistical learning of expectancies in music, for two reasons. First, the use of memory

performance as a proxy for measuring expectancy, based on previous work suggesting that

highly expected melodies are also better remembered (Schmuckler, 1997), meant that

expectancies were being measured indirectly. A stronger test of expectancy learning would use a

task that directly taps expectancy generation, such as a priming paradigm.

Second, the conceptualization of melody memory in terms of the encoding of short melodic

fragments does not line up with current theoretical understanding of how melodies are encoded

in memory (i.e., Dowling, 1991). Rather, a more intuitive and musically-relevant context in

which to examine these processes may be in the learning of harmonic progressions. The adjacent

dependencies between chords are very important to expectancy, as they govern both harmonic

and melodic movement within a piece. Chords in particular are also more texturally complex

(with more than one note sounded at once), and thus provide a richer source of auditory

information upon which to scaffold expectancy, due to the interaction of simultaneously sounded

notes. Therefore, in Experiment 2, novel chord progressions were employed in a priming task in

an attempt to record the development of musical expectancies from statistical learning.


Chapter 3 Experiment 2: Tonal Priming With Western Chords

8 Introduction

In Experiment 2, attention was turned to the learning of expectancies in novel harmonic

progressions. Harmony is the highly constrained system by which tones are combined to form

chords and chords are combined in sequential order. Listener expectations play an important role

in theoretical accounts of harmony, and many music theorists have observed that the harmonic

function of any chord is to indicate which chord comes next (Meyer, 1956; Schenker, 1954). The

order in which chords are sounded is thus critical to harmonic expectancy. Statistical learning is

an ideal mechanism by which these sequential relations could be learned, with listeners

developing sensitivities to the conditional probabilities of one chord occurring given the previous

occurrence of another chord as a result of exposure to a large grammatical musical corpus. The

MUSACT model (Tillmann, et al., 2000) is an example of how such a learning process might act

to produce expectancies in music. As discussed in Chapter 1, this model employs a network of

tones, chords, and keys in which connections between nodes are strengthened by the co-

occurrence (both simultaneous and serial) of node elements during exposure.
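As a loose illustration of this kind of co-occurrence learning (a toy Hebbian sketch, not the actual MUSACT implementation), link weights between elements can be incremented whenever the elements sound together or in succession:

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence_weights(events, rate=0.1):
    """Strengthen links between elements heard simultaneously (within an
    event) and serially (across successive events). Events are tuples of
    element labels; all weights start at zero."""
    w = defaultdict(float)
    prev = ()
    for event in events:
        for a, b in combinations(sorted(set(event)), 2):  # simultaneous
            w[(a, b)] += rate
        for a in prev:                                    # serial
            for b in event:
                w[tuple(sorted((a, b)))] += rate
        prev = event
    return dict(w)

# Two hypothetical chords heard in sequence:
w = cooccurrence_weights([("C", "E", "G"), ("F", "A", "C")])
assert abs(w[("E", "G")] - 0.1) < 1e-9   # simultaneous link within chord 1
assert ("C", "F") in w                    # serial link across chords
```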

Jonaitis and Saffran (2009) provided some evidence that listeners’ harmonic expectancies may

arise through statistical learning processes. These authors developed a novel harmonic grammar

based on the Phrygian mode. This grammar utilized the chromatic tone set and chords of

Western music, but combined them in an unfamiliar way. Participants were first trained on 100

grammatical exemplars and then asked to make judgments on new items in terms of how similar

to familiarization items they sounded. These new items were manipulated according to whether

they were grammatical or ungrammatical, and whether they were completely correct or

contained minor grammatical errors. After one familiarization session, participants were able to

distinguish between grammatical and ungrammatical sequences. After two familiarization

sessions, participants were also able to distinguish between the correct and error-containing

grammatical items. Jonaitis and Saffran (2009) thus concluded that participants were able to

learn novel harmonic expectancies through statistical learning.


Although discrimination between grammatical and ungrammatical items indicates

comprehension of the grammar structure that produced the familiarization items, it is weak

evidence for the development of musical expectancy. As discussed previously, expectancy is

better understood as the way that this structure affects subsequent processing. Thus, to test

whether statistical learning leads to the development of musical expectancy, this experiment used

the methods from a classic musical priming study by Bharucha and Stoeckig (1986). These

authors presented pairs of chords in sequential order, and asked listeners to make a perceptual

judgment on the second chord. They found that judgments were faster and more accurate when

the two chords were related (i.e., the second chord was expected given the first) than when they

were unrelated (i.e., the second chord was unexpected given the first).

The musical priming effects observed with chord pairs by Bharucha and Stoeckig (1986) have

been extended to longer musical contexts and more subtle manipulations of

tonal expectancy (Bigand & Pineau, 1997), as well as to melodic stimuli (Marmel & Tillmann,

2009; Marmel, et al., 2010). Research has confirmed that these priming effects arise

predominantly from expectancies based on tonal relatedness, and not from simple

psychoacoustic similarity (Bharucha & Stoeckig, 1987; Bigand, et al., 2003; Tekman &

Bharucha, 1998). This priming paradigm therefore provides an ideal methodology with which

musical expectations can be quantified.

The familiarization and discrimination phases of Experiment 2 were full replications of Jonaitis

& Saffran’s (2009) Experiment 1. In the expectancy phase, participants were presented with

pairs of chords from the familiarization grammar and asked to make a perceptual judgment

(timbre A vs. B) on the second chord. If participants developed expectancies from

familiarization, they should have responded more quickly and accurately when the chord pair

sequences were grammatical than when they were ungrammatical.

Finally, consideration should be given to the familiarity and complexity of these experimental

materials, and how these characteristics might influence the ease of expectancy learning in this

experiment. These stimuli were composed of Western chromatic piano tones organized into a

harmonic texture, meaning that they were built from familiar-sounding parts and complex in

construction. As in Experiment 1, it is possible to make two divergent predictions here. First, it

may be easy to learn structural expectancies with these materials because the processing

demands for familiar chords (although they are combined in a novel way) are low, which allows

cognitive resources to be spent on learning new connections among chords rather than on

perceiving the chords themselves. Conversely, the highly familiar chords may lead to inferior

expectancy learning because they provide a high level of context that may evoke participants’

knowledge of Western harmonic structure. Thus, despite the novel grammatical structure, these

chords may activate extremely strong musical expectancies that have been learned over a

lifetime of exposure to Western music, making it very difficult for participants to develop new

expectancies during a short laboratory session.

In fact, it is almost certain that activation of Western harmonic expectancies could not be

avoided. Because of this concern, the expectancy task conditions were designed to test all

possible combinations of grammaticality in Western harmony vs. in the familiarization grammar.

Thus, the chord pair in each trial was manipulated with regard to its legality in Western harmony

and its legality in the familiarization grammar, with different levels of priming expected for each

of these conditions. If participants did not learn harmonic expectancies during familiarization,

one would expect to observe priming effects only in trials that are legal in Western harmony.

Conversely, if participants did learn harmonic expectancies, and assuming that priming effects

are additive, the resultant priming levels are as follows, and are illustrated in Figure 5:

(1) Trials that are legal in both the familiarization grammar and Western harmony should

show the strongest priming (highest accuracy and lowest reaction times).

(2) Trials that are legal only in Western harmony should show strong priming.

(3) Trials that are legal only in the familiarization grammar should show moderate priming.

(4) Trials that are illegal in both the familiarization grammar and Western harmony should

not show priming effects (Figure 5).
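These ordered predictions follow from treating the two sources of priming as additive. A minimal sketch of that assumption, with purely illustrative weights (the relative sizes of the Western and familiarization contributions are not estimated anywhere in this experiment):

```python
# Sketch of the additivity assumption behind the four priming conditions.
# The weights are illustrative, not values from the thesis; Western legality
# is assumed to contribute more priming than familiarization legality
# because those expectancies are far more entrenched.
W_WESTERN = 2.0   # hypothetical priming contribution of Western legality
W_FAMILIAR = 1.0  # hypothetical contribution of familiarization legality

def predicted_priming(western_legal: bool, familiar_legal: bool) -> float:
    """Additive priming strength for one chord-pair condition."""
    return W_WESTERN * western_legal + W_FAMILIAR * familiar_legal

conditions = {
    "strongest (legal both)":         predicted_priming(True, True),
    "strong (legal Western only)":    predicted_priming(True, False),
    "moderate (legal familiar only)": predicted_priming(False, True),
    "none (illegal both)":            predicted_priming(False, False),
}
# Predicted ordering: strongest > strong > moderate > none
```

Any positive pair of weights reproduces the predicted ordering of the four conditions.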

Figure 5. Priming conditions and hypothesized priming strengths if participants develop

harmonic expectancies in Experiment 2.

9 Methods

9.1 Participants

Fourteen participants were recruited from the University of Toronto Scarborough community

using the introductory psychology participant pool as well as posted advertisements. Participants

were compensated with course credit or $10 per hour.

Participants consisted of six males and eight females with a mean age of 20.0 years (SD = 3.1

years). Participants were selected to have at least three years of formal musical training, as non-

musician participants had trouble with the expectancy priming task during piloting. Participants

had on average 6.3 years of formal musical training (SD = 3.4 years), 0.91 years of musical

theory training (SD = 1.6 years), played music for 1.5 hours per week (SD = 1.7 hours), and

listened to music for 11.8 hours per week (SD = 7.0 hours). No participants reported having

taken part in a music psychology experiment previously, nor did any participant report having

absolute pitch.

9.2 Apparatus

This experiment used an identical apparatus to Experiment 1.

9.3 Materials

The familiarization and discrimination phases used the original materials from Jonaitis and

Saffran (2009). Stimulus items were chord progressions constructed from a finite state grammar

whose nodes were chords in the Phrygian mode (Figure 6). Progressions were required to begin

and end on the tonic chord (I), although multiple loops through the grammar were permitted.

Two grammars were used (Grammar A and Grammar B). Grammar B was a retrograde of

Grammar A, such that every Grammar B item was a Grammar A item where the chords were

heard in reverse order. The stimuli were designed by these authors to follow music-theoretic

conventions. First, the chords in the progressions form a harmonic hierarchy, such that chords

that are more structurally important are heard more often. Second, the items were voiced to make

the movements from chord to chord clear in the low register, as well as to follow basic rules of

melodic composition for the melody in the upper register. All items were played with a piano

timbre at 120 beats per minute (500 ms per chord).
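The generation procedure described above (begin on the tonic, follow legal transitions, allow multiple loops, end on the tonic) can be sketched as follows. The transition table here is an arbitrary toy stand-in, not the actual Jonaitis and Saffran (2009) grammar shown in Figure 6:

```python
import random

# Toy stand-in for a finite-state chord grammar; the real transition graph
# is the one in Figure 6. This only illustrates the generation procedure.
TRANSITIONS = {
    "I":   ["II", "vii", "iv"],
    "II":  ["I", "vii"],
    "vii": ["I", "iv"],
    "iv":  ["i"],
    "i":   ["III"],
    "III": ["I"],
}

def generate_progression(min_len=4, max_len=10, rng=random):
    """Random walk through the grammar, beginning and ending on the tonic.

    Multiple loops through the grammar are permitted, matching the
    stimulus construction described in the text.
    """
    while True:  # rejection-sample until the length constraint is met
        chords = ["I"]
        while True:
            chords.append(rng.choice(TRANSITIONS[chords[-1]]))
            if chords[-1] == "I" and len(chords) >= min_len:
                return chords
            if len(chords) >= max_len:
                break  # too long without returning to the tonic; resample
```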

Figure 6. Finite state grammars used in Experiment 2, based on Figure 1 from Jonaitis & Saffran

(2009). (a) Grammar A; (b) Grammar B.

For the familiarization phase, 100 Grammar A familiarization items (50 unique items, each

presented in two different voicings), four to ten chords in length and distributed uniformly across

keys, were employed. The Grammar B materials were excluded for simplicity of design; this exclusion is justified because these authors found no differences in performance between Grammars A and

B in their original study.

For the discrimination phase, 60 test items that were not part of the familiarization corpus were

used. These stimuli were five to ten chords in length and distributed uniformly across keys.

Furthermore, these items were manipulated along two dimensions: Grammaticality

(Grammatical, Ungrammatical) and Correctness (Correct, Error). Grammatical items adhered to

Grammar A’s structure, whereas ungrammatical ones followed Grammar B’s structure. Correct

items were completely grammatical exemplars of Grammar A or B, whereas error items were

based on one of the grammars but contained one to three illegal transitions. Thus, for each

grammar, there were 15 correct items and 15 error items (5 items containing one, two, and three

illegal transitions, respectively). Figure 7 shows some examples of these items.
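A sketch of the item design just described, confirming the counts; the tuples are placeholders for actual chord progressions:

```python
# Discrimination-item design: 60 test items crossing Grammaticality
# (Grammar A vs. B) with Correctness (correct vs. 1-3 illegal transitions).
# Each tuple is (grammar, correctness, number_of_illegal_transitions).
items = []
for grammar in ("A", "B"):
    items += [(grammar, "correct", 0)] * 15
    for n_illegal in (1, 2, 3):
        items += [(grammar, "error", n_illegal)] * 5
```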

Figure 7. Examples of items used in Experiment 2, based on Figures 2-3 from Jonaitis & Saffran

(2009). (a) Example of Grammar A exposure item. (b) Example of Grammar A correct

discrimination item. (c) Example of Grammar B correct discrimination item. (d) Example of

Grammar A error discrimination item. (e) Example of Grammar B error discrimination item.

Expectancy phase and tuning practice:

For the expectancy phase, subsets of chords from the experimental grammars were produced in a

piano timbre using Sonar 8. Each chord was produced in four versions, with the chord being

either 2 seconds or 3 seconds long and in-tune or out-of-tune. Out-of-tune chords were created by lowering the frequency of the fifth degree (the highest of the three chord tones) by a factor of 2^(1/24) (a quarter-tone). Although Bharucha and Stoeckig (1986) only mistuned the fifth degree by an eighth-tone, the quarter-tone mistuning was chosen because participants were unable to detect the eighth-tone mistuning during pilot testing.
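In equal temperament a semitone is a frequency ratio of 2^(1/12), so a quarter-tone corresponds to 2^(1/24) and an eighth-tone to 2^(1/48). A minimal sketch of the mistuning computation; the example frequency is illustrative:

```python
# Equal temperament divides the octave into 12 semitones (ratio 2**(1/12)),
# so a quarter-tone is 2**(1/24). Lowering a frequency by a quarter-tone
# means dividing by that factor.
QUARTER_TONE = 2 ** (1 / 24)

def mistune_fifth(fifth_hz: float) -> float:
    """Lower the chord's fifth by a quarter-tone."""
    return fifth_hz / QUARTER_TONE

# e.g. a fifth at G4 (about 392.0 Hz) becomes roughly 380.8 Hz
```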

Two chord pairs were selected for each of the four priming conditions (no priming, some

priming, strong priming, strongest priming; see Figure 5). The first chord in each pair was called

the “prime”, and the second chord was called the “target”. These chord pairs were manipulated

with respect to Tuning and Chord Order. Tuning refers to whether the target was in-tune or out-

of-tune. Chord Order refers to whether the two chords were presented in the forward (Grammar

A) or retrograde order. Interestingly, chord order effects have not previously been investigated in

the harmonic priming literature. Thus, although the retrograde order is technically

ungrammatical according to the familiarization grammar, these chords would have been heard

adjacently during familiarization and may have been learned as related regardless of presentation

order, leading to priming effects.

Since the prediction of priming with regard to chord order was not clear, both orders for each

chord pair were categorized under the same priming condition (i.e., the retrograde order was still

considered grammatical for the purposes of this design, although the statistical test of this

assumption is presented in the results section). Finally, because familiarization items were

distributed evenly across musical keys, each combination of chord pair, tuning, and chord order

was presented in 3 randomly chosen keys. These manipulations resulted in 96 priming trials (4

priming conditions x 2 chord pairs x 2 tunings x 2 orders x 3 keys); the details of these trials can

be seen in Table 2.
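The factorial trial structure can be verified with a short enumeration. The key labels are placeholders, since the actual keys were chosen at random:

```python
from itertools import product

# Reconstruction of the 96-trial design: 4 priming conditions x 2 chord
# pairs x 2 tunings x 2 chord orders x 3 keys.
CONDITIONS = ["none", "moderate", "strong", "strongest"]
PAIRS = [1, 2]
TUNINGS = ["in-tune", "out-of-tune"]
ORDERS = ["forward", "retrograde"]
KEYS = ["key1", "key2", "key3"]  # placeholders for 3 randomly chosen keys

trials = list(product(CONDITIONS, PAIRS, TUNINGS, ORDERS, KEYS))
# 4 x 2 x 2 x 2 x 3 = 96 priming trials
```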

Table 2

Composition of Expectancy Trials for Experiment 2

Priming Condition            Chord Pair   Chord Order     Prime   Target
No Priming                   1            n/a             VI      II#
(illegal Western,                         n/a             II#     VI
illegal familiarization)     2            n/a             vi      ii#
                                          n/a             ii#     vi
Moderate Priming             1            Grammatical     I       II
(illegal Western,                         Ungrammatical   II      I
legal familiarization)       2            Grammatical     I       vii
                                          Ungrammatical   vii     I
Strong Priming               1            n/a             VI      III
(legal Western,                           n/a             III     VI
illegal familiarization)     2            n/a             vi      iii
                                          n/a             iii     vi
Strongest Priming            1            Grammatical     iv      i
(legal Western,                           Ungrammatical   i       iv
legal familiarization)       2            Grammatical     i       III
                                          Ungrammatical   III     i

Note. The prime and target chords are labeled based on the Roman numeral system presented in Figure 6. Uppercase letters indicate a major chord, and lowercase letters indicate a minor chord (as defined by Western harmony). These items were actually presented in three randomly chosen keys during the experiment.

9.4 Procedure

Each participant completed all phases of the experiment.

Tuning training:

Before starting the experimental trials, participants were trained to discriminate in-tune from out-

of-tune chords using methodology from Bharucha & Stoeckig (1986). First, participants listened

to four examples each of in-tune and out-of-tune chords. Next, they were presented with 48

chords, half of which were in-tune and half of which were out-of-tune, and required to judge the

intonation of each chord. These test trials were preceded by 10 practice trials, and participants

were required to score 43/48 (90%) correct before moving on to the familiarization phase. All

chords presented during tuning training were 2 seconds in length.

Familiarization phase:

During the familiarization phase, participants heard 100 grammatical progressions in the

familiarization grammar. They were told that they would be presented with items from a novel

musical system and then asked to listen carefully to each item and rate how much they liked each

item on a scale from 1 (did not like it at all) to 7 (liked it a lot). This liking task was used in order

to maintain the participant’s attention throughout the familiarization phase, without calling

attention to the harmonic structural aspects of the items that would be tested later.

Expectancy phase:

The expectancy phase was conducted before the discrimination phase because of concerns that

hearing discrimination items (some of which contained grammatical errors) would alter

expectancy performance. For this priming task, participants were instructed that each trial started

with a random series of 16 chromatic tones (125 ms each), acting as an auditory mask between

trials. This mask was followed by a pair of chords. The first chord (the prime, 3 seconds long)

was always in-tune, whereas the second chord (the target, 2 seconds long) could be in- or out-of-

tune. Participants were asked to indicate as quickly and accurately as possible whether the

second chord was in- or out-of-tune. The 96 expectancy trials were preceded by 12 practice

trials.

Discrimination phase:

In the discrimination phase, participants were told that some of the items they were about to

hear would be from the familiarization system, and some of them would be from a different

system. On each trial, participants heard a test item, and were instructed to judge how similar it

was to the familiarization items on a scale from 1 (very dissimilar) to 7 (very similar). The 60

discrimination trials were preceded by six practice trials.

Following all the experimental trials, participants completed a survey regarding their musical

experience. The entire experimental session lasted approximately 90 minutes.

10 Results

10.1 Discrimination Phase

Similarity ratings from each participant were collapsed across the 15 exemplars in each of the

Grammar/Correctness conditions. These mean ratings were then submitted to repeated measures

ANOVA with Grammaticality (Grammatical, Ungrammatical) and Correctness (Correct, Error) as factors. There was a main effect of Grammaticality, F(1,13) = 15.69, MSE = 0.37, p = .002, ηp² = .55, with listeners judging Grammatical familiarization items as more similar to

exposure items than Ungrammatical sequences (see Figure 8). The main effect of Correctness

was not significant, F(1,13) = 0.02, p = .88, nor was the interaction between Grammaticality and

Correctness, F(1,13) = 1.08, p = .32.

[Figure: bar graph of mean similarity ratings (1-7) for Grammatical vs. Ungrammatical items; ** p < .01]

Figure 8. Average similarity ratings for grammatical and ungrammatical items in the

discrimination phase of Experiment 2.

10.2 Expectancy Phase

Responses from the priming task were analyzed with respect to accuracy and reaction time.

Accuracy data:

Raw accuracy data were collapsed across chord pairs, orders and keys. These mean accuracy

scores were then submitted to repeated measures ANOVA with Western Legality (Legal,

Illegal), Familiarization Legality (Legal, Illegal), and Tuning (In-tune, Out-of-tune) as factors

(see Figure 5). None of the main effects were significant, all F values < 4, all p values > .07.

Turning to the interactions, the Western Legality x Tuning interaction was significant, F(1,12) =

13.12, MSE < 0.01, p < .01, ηp² = .50, as was the three-way interaction between Western legality, familiarization legality, and tuning, F(1,12) = 6.42, MSE = 0.01, p = .03, ηp² = .33. All of the

remaining interactions were non-significant, all F values < 1.52, all p values > .23.

The two-way interaction between Western legality and tuning was investigated with simple

effects, whereby the effect of Western legality was assessed for in-tune and out-of-tune trials

separately. For in-tune trials, responses were significantly more accurate for trials that were legal

in Western harmony than trials that were illegal in Western harmony, t(13) = 3.31, p < .01.

However, the effect of Western legality was non-significant for out-of-tune trials, t(13) = 1.61, p

= .13 (Figure 9). This result is in line with the finding that priming effects are strongest when the

acoustic surface of the stimulus is not disrupted, as it is with out-of-tune chords (Marmel &

Tillmann, 2009).

[Figure: bar graph of accuracy (%) for Western legal vs. Western illegal conditions, with bars for the novel priming effect (training legal/Western illegal vs. training illegal/Western illegal); ** p < .01]

Figure 9. Western and novel priming effects in terms of accuracy (in-tune trials only) for

Experiment 2.

A set of a priori analyses was conducted to assess priming effects due to the familiarization

grammar. Since it was possible that Western harmony perception may have overshadowed the

effects of the familiarization grammar for trials that were legal in both harmonic systems, this

analysis focused on the comparison of trials that were legal only in the familiarization grammar

with trials that were illegal in both grammars. Accuracy was not significantly higher for trials

that were legal only in the familiarization grammar than for trials that were illegal in both

grammars, t(13) = .11, p = .92. Due to the effect of acoustic continuity seen for Western priming,

this comparison was assessed for in-tune trials only. For in-tune trials, there was still no

significant accuracy benefit for trials that were legal only in the familiarization grammar over

trials that were illegal in both grammars, t(13) = 1.47, p = .17, although there was a trend

towards greater accuracy for the familiarization legal trials (Figure 9).

Finally, the effect of chord order (in the familiarization grammar) on accuracy was explored.

Raw accuracy data for trials that were legal in the familiarization grammar were collapsed across

chord pairs and keys. These mean accuracy scores were analyzed separately for the subset of the

familiarization legal trials that were not legal in the Western grammar and the subset that were

legal in the Western grammar, because of the possibility that Western grammar effects would

overshadow familiarization grammar chord order effects in the latter case.

For the familiarization legal/Western illegal condition, mean accuracy scores were submitted to

repeated measures ANOVA with Tuning (In-tune, Out-of-tune) and Chord Order (Forward,

Retrograde) as factors. There was no significant effect of Tuning, F(1,13) = 1.11, p = .31, but the

effect of Chord Order was marginally significant, F(1,13) = 3.41, MSE = 0.01, p = .09, ηp² = .21,

with accuracy for trials played in forward order being higher than for trials played in retrograde

order. The interaction between tuning and chord order was marginally significant, F(1,13) = 4.42, MSE = .01, p = .055, ηp² = .25. This interaction was driven by a significant effect of chord order for in-

tune trials only, t(13) = 2.51, p = .03, with accuracy for forward trials being higher than for

retrograde trials (Figure 10). An identical analysis was performed for the familiarization

legal/Western legal condition. Neither of the main effects was significant, nor was the

interaction, all F values < 1, p > .34.

[Figure: accuracy (%) by Chord Order (Forward vs. Retrograde) for in-tune and out-of-tune trials; * p < .05]

Figure 10. Tuning x Chord Order interaction (accuracy) for the familiarization legal/Western

illegal condition for Experiment 2.

Reaction time data:

Reaction times were only analyzed for trials where participants answered correctly, and all

reaction times greater than 2000 ms were discarded. Due to poor performance (accuracy less

than 75%), the reaction time data for four participants were excluded from the analyses. For the

remaining 10 participants, reaction time data were then collapsed across chord pairs, orders, and

keys. These mean reaction times were submitted to repeated measures ANOVA with Western

Legality (Legal, Illegal), Familiarization Legality (Legal, Illegal), and Tuning (In-tune, Out-of-

tune) as factors.
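The reaction-time preprocessing just described can be sketched as follows, on hypothetical trial records; the function and record format are illustrative, not the actual analysis code:

```python
# Sketch of the RT preprocessing: keep only correct trials with RT at or
# under 2000 ms, and drop participants whose overall accuracy falls below
# 75% (four of the fourteen participants in Experiment 2).
# Trial records are hypothetical tuples: (participant_id, correct, rt_ms).
def clean_rts(trials, rt_cutoff_ms=2000, min_accuracy=0.75):
    """Return per-participant RTs for correct, non-outlier trials."""
    by_participant = {}
    for pid, correct, rt in trials:
        by_participant.setdefault(pid, []).append((correct, rt))
    kept = {}
    for pid, recs in by_participant.items():
        accuracy = sum(c for c, _ in recs) / len(recs)
        if accuracy < min_accuracy:
            continue  # exclude poor performers from the RT analysis
        kept[pid] = [rt for c, rt in recs if c and rt <= rt_cutoff_ms]
    return kept
```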

Neither the main effects of Western legality nor familiarization legality were significant, both F

values < 1.82, both p values > .20. The main effect of Tuning was significant, F(1,9) = 30.12,

MSE = 8198.73, p < .001, ηp² = .77, with reaction times faster for out-of-tune than in-tune trials

(Figure 11). None of the two-way interactions were significant, all F values < 3.79, all p values >

.08. The three-way interaction between Western legality, familiarization legality, and tuning was

significant, F(1,9) = 10.24, MSE = 2910.34, p = .01, ηp² = .53.

[Figure: bar graph of reaction time (ms) for in-tune vs. out-of-tune trials; *** p < .001]

Figure 11. Effect of tuning on reaction time for Experiment 2.

A set of a priori analyses was conducted to assess priming effects due to the Western grammar and

the familiarization grammar. First, to assess Western harmonic priming, and following the

analysis of accuracy, reaction times for the Western legal trials were compared to reaction times

for the Western illegal trials for in-tune trials only. For these trials, responses were no faster for

chord pairs that were legal than pairs that were illegal in Western harmony, t(9) = 1.60, p = .15

(Figure 12).

[Figure: bar graph of reaction time (ms) for Western legal vs. Western illegal conditions, with bars for the novel priming effect (training legal/Western illegal vs. training illegal/Western illegal); ** p < .01]

Figure 12. Western and novel priming effects in terms of reaction time (in-tune trials only) for

Experiment 2.

Next, to assess familiarization grammar priming, reaction times for trials that were legal only in

the familiarization grammar were compared with reaction times for trials that were illegal in both

grammars. As in the accuracy analysis, this comparison was designed to detect priming effects

due to the familiarization grammar that may have been masked by Western harmony perception

in trials that were legal in both Western harmony and the familiarization grammar. Reaction

times for trials that were legal in the familiarization grammar only were significantly faster than

for trials that were illegal in both grammars, t(9) = 2.50, p = .03. As in the previous analyses, this contrast was then performed for in-tune trials only. For in-tune trials, reaction times for chord pairs that were legal in the familiarization grammar only were still significantly faster than for trials that were

illegal in both grammars, t(9) = 3.19, p = .01 (Figure 12). This was not the case for out-of-tune

trials, where there was no discernible priming due to the familiarization grammar, t(9) = 0.76, p

= .47.

Finally, the effect of chord order (in the familiarization grammar) on reaction time was explored.

Reaction time data for trials that were legal in the familiarization grammar were collapsed across

chord pairs and keys. These mean reaction times were analyzed separately for the subset of the

familiarization legal trials that were not legal in the Western grammar and the subset that were

legal in the Western grammar, because of the possibility that Western grammar effects would

overshadow familiarization grammar chord order effects in the latter case.

For the familiarization legal/Western illegal condition, mean reaction times were submitted to

repeated measures ANOVA with Tuning (In-tune, Out-of-tune) and Chord Order (Forward,

Retrograde) as factors. There was a significant effect of Tuning, F(1,9) = 8.56, MSE = 7711.35, p < .05, ηp² = .49, with reaction times for out-of-tune trials being faster than for in-tune trials (see Figure 11). The main effect of Chord Order was not significant, F(1,9) = .003, p > .9, nor was the Tuning x Chord Order interaction, F(1,9) = 2.23, p = .17.

performed for the familiarization legal/Western legal condition. Neither of the main effects was

significant, nor was the interaction, all F values < 2.56, p > .13.

Individual Differences

The last set of analyses explored whether performance in the discrimination phase was

associated with performance in the expectancy phase. Similarity scores were obtained for each

participant by finding the difference between the mean ratings for grammatical items and

ungrammatical items. A higher similarity score indicated that the participant distinguished better

between grammatical and ungrammatical items. The accuracy priming effect was calculated by

subtracting the mean accuracy for the No Priming condition from the mean accuracy for the

Some Priming condition. A larger positive difference indicated a larger accuracy advantage for

the Some Priming condition, which contained chord pairs that were legal only in the novel

grammar. Similarly, the reaction time priming effect was calculated by subtracting the mean reaction time for the No Priming condition from the mean reaction time for the Some Priming condition.

Here, a larger negative difference indicated a larger reaction time advantage for the novel

grammar. Bivariate correlations were calculated between similarity scores and each of the

priming effects. Neither of the two correlations was significant, both absolute r values < .27, both p values > .45.
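The individual-differences analysis amounts to computing one difference score per participant from each task and correlating them. A sketch with made-up numbers for four hypothetical participants:

```python
from statistics import mean

# Sketch of the individual-differences analysis: a similarity difference
# score and a priming difference score per participant, then a Pearson
# correlation between the two. All numbers below are invented purely to
# exercise the computation.
def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    varx = sum((x - mx) ** 2 for x in xs)
    vary = sum((y - my) ** 2 for y in ys)
    return cov / (varx * vary) ** 0.5

# similarity score: mean rating for grammatical minus ungrammatical items
similarity = [g - u for g, u in [(4.8, 3.9), (4.1, 4.0), (5.2, 4.1), (4.5, 4.4)]]
# accuracy priming effect: Some Priming minus No Priming condition accuracy
priming = [s - n for s, n in [(0.84, 0.80), (0.90, 0.91), (0.82, 0.79), (0.88, 0.86)]]

r = pearson_r(similarity, priming)  # bivariate correlation across participants
```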

11 Discussion

In the discrimination phase, participants were able to discriminate successfully between

grammatical and ungrammatical items, but they did not make the subtler distinction between

correct Grammar A items and ones that included some ungrammatical transitions; this finding

replicates Jonaitis & Saffran’s (2009, Experiment 1) results.

Turning to the expectancy phase, there was an overall effect of tuning, with higher accuracy and

faster reaction times to out-of-tune trials than in-tune trials. This result has been observed in

previous research, and can be explained by the fact that stimuli that disrupt an acoustical surface

(out-of-tune chords) are more salient than those that do not (in-tune chords) (Marmel &

Tillmann, 2009). More interestingly, priming effects driven by Western harmony were observed,

as expected. However, these priming effects were restricted to differences in accuracy, and did

not manifest in reaction time differences. This was unexpected based on the wealth of previous

research that has demonstrated consistent Western harmonic priming effects on both accuracy

and reaction time. It is possible that Western priming was somewhat disrupted by participants

learning the rules of the familiarization grammar.

Most critically, a priming effect for the familiarization grammar was observed for reaction time

(but not for accuracy). Furthermore, there was some weak evidence that chord order matters in

this priming task, with chord pairs that were presented in the order encountered during

familiarization eliciting more accurate responses than those presented in retrograde order. Thus,

although these effects did not manifest consistently in both accuracy and reaction time, they do

constitute evidence that the statistical learning of adjacent dependencies in novel harmonic

materials has downstream processing effects.

Lastly, discrimination performance was unrelated to priming performance. This may indicate

that the discrimination task and the priming task targeted different cognitive processes, although

given the fact that they were designed specifically to assess the same mental representation –

learned adjacent dependencies between chords in the familiarization grammar – this explanation

seems unlikely. Rather, the priming task may have been a more sensitive measure of

participants’ structural knowledge of the novel chord grammar, for two reasons. First, the

priming task presented chord pairs in isolation rather than embedded in a longer context (like in

discrimination). Working memory demands for the priming task were thus lower than for the

discrimination task, relieving participants from processing an entire harmonic passage and thus

allowing participants to focus more attention on the priming task itself. Secondly, the priming

task was an implicit evaluation of learned harmonic structure, whereas the similarity task

required explicit comparison of the passage presented in each trial with the corpus they had

encountered during familiarization. As discussed by Tillmann (2005), implicit tasks are often

superior to explicit ones in the study of musical expectancy because these representations of

expectancy are themselves acquired implicitly, as in statistical learning. A final potential

methodological explanation for the fact that discrimination and priming performance was

unrelated is that the discrimination task was presented after the priming task. This was done to

prevent experience in the discrimination block from affecting performance in the more

theoretically important priming block. However, this meant that there was a significant temporal

delay between familiarization and the performance of the discrimination task, which may have

impaired discrimination performance.

This experiment was able to demonstrate with a harmonic priming task that listeners are able to

form musical expectancies through statistical learning. Furthermore, these results suggest that for

a given tone set (in this case, the familiar Western chromatic set), more complex harmonic

materials are better suited to the development of musical expectancies than simpler melodic

materials (i.e., Experiment 1). Although it is not clear why this is, one can speculate that these

novel harmonic materials provided a stimulus environment that was both rich in information and

similar to other environments where expectancies have been learned before (i.e., Western

harmony).

These first two experiments have focused on the role of stimulus complexity on musical

expectancy learning, while controlling tone set familiarity. However, as alluded to previously,

the use of a Western chromatic tone set in constructing materials produces a confound in the

experiment – representations of familiar Western tonal structure can interfere with the new

musical relations being introduced by the familiarization grammar. Thus, to isolate the learning

of novel expectancies through statistical learning from previously learned expectancies, melodic

and harmonic stimuli were constructed from an unfamiliar tone set for the remaining three

experiments.

Chapter 4 Experiment 3: Memory For Bohlen-Pierce Melodies

12 Introduction

Experiment 3 was designed to extend the previous results to the learning of melodic expectancies

with an unfamiliar tone set. Although participants failed to learn melodic expectancies in

Experiment 1, this may have been due in part to the nature of the grammar’s construction. For

this study, rather than using a tone-word language to create stimulus items, we used a finite state

grammar in which grammatical items differed from ungrammatical ones based on a more

complex network of transitional probabilities, rather than one in which the transitional

probabilities are based on a small set of legal tone-words. This finite state grammar produced

melodies based on an underlying harmony, which mimics the way melodies are composed in real

music; this grammatical structure may therefore make it easier to learn expectancies than

Experiment 1’s tone-word chunk structure.

Inspiration for this study came from work by Loui and her colleagues (Loui & Wessel, 2008;

Loui, et al., 2010). The tone set used by these researchers was the Bohlen-Pierce scale, a

microtonal tuning system which listeners are very unlikely to have encountered previously, and

whose structure has been described by Krumhansl (1987). Loui et al. (2008; 2010) have shown

that following approximately 30 minutes of exposure to a set of grammatical Bohlen-Pierce

melodies, participants were able to discriminate between a grammatical melody and an

ungrammatical foil, regardless of whether they had heard the grammatical melody before. Thus,

the familiarization and discrimination phases were direct replications of this previous work.

The expectancy phase used a recognition memory task similar to the one from Experiment 1. As

in Experiment 1, if participants were able to learn melodic expectancies during the

familiarization phase, their memory performance should be better for grammatical melodies than

ungrammatical ones. Tonality was operationalized in the present experiment by comparing

grammatical melodies with randomly composed ungrammatical melodies. The novel grammar

from which grammatical melodies were composed governed the way tones were combined,

analogous to the way tonality governs the way tones are combined in real music. As a result,

grammatical melodies should have been perceived as better-formed than ungrammatical

melodies in the same way that tonal melodies are perceived as better-formed than atonal

melodies (Krumhansl, 2000). Thus, it was predicted that the effect of grammar in the present

study would be quite similar to the effect of tonality in past studies. For this experiment, the

expectancy phase memory task was designed by combining the methods of DeWitt & Crowder

(1986), who used delay and comparison type as factors, and Dowling (1991), who studied

tonality in addition to those previous two factors.

Finally, consideration should be given to the familiarity and complexity of these experimental

materials and how these characteristics might have influenced the ease of expectancy learning in

this experiment. These stimuli were melodies composed from Bohlen-Pierce tones, meaning that

they were built from unfamiliar-sounding parts but simple in construction. It is possible that

using an unfamiliar tone set will make learning expectancies very difficult because it puts a

heavy burden on perceptual processing. However, this difficulty may be slightly alleviated by the

fact that the structures to be learned are melodic and therefore quite simple. Furthermore,

expectancy learning may be potentiated by the fact that these Bohlen-Pierce structures do not

have to compete with longstanding Western melodic expectancies, as they did in Experiments 1

and 2. Because Western tonality cannot interfere with the perception of melodies that are composed

from tones outside of the Western chromatic set, it may be easier to learn the novel relations

between the Bohlen-Pierce tones established by this novel familiarization grammar.

13 Methods

13.1 Participants

Thirty-two participants were recruited from the University of Toronto Scarborough community

using the introductory psychology participant pool as well as posted advertisements. Participants

were compensated with course credit or $10 per hour.

They consisted of 10 males and 22 females with a mean age of 19.6 years (SD = 3.2 years).

These participants were selected to have at least five years of formal musical training, due to

concerns about task difficulty for non-musicians. Participants had on average 8.4 years of formal

musical training (SD = 4.1 years), 2.6 years of musical theory training (SD = 3.3 years), played

music for 3.9 hours per week (SD = 4.1 hours), and listened to music for 14.9 hours per week


(SD = 15.9 hours). One participant in the discrimination condition reported having absolute

pitch. No participants reported having taken part in a music psychology experiment previously.

13.2 Apparatus

This experiment used an identical apparatus to Experiment 1.

13.3 Materials and Procedure

Participants were randomly assigned to the discrimination group or the expectancy group. Both

groups took part in the familiarization phase. Only the discrimination group participated in the

discrimination phase, and only the expectancy group participated in the expectancy phase.

Bohlen-Pierce melodies:

The familiarization and discrimination phases were based on the work by Loui and colleagues

(2008; 2010), employing original materials because the stimuli from Loui and colleagues were

not available. Familiarization and discrimination stimuli were composed from pure sinusoidal

tones synthesized in Matlab 7.0, 500 ms in length with 5 ms rise and fall times. These tones had

frequencies defined by the Bohlen-Pierce system.

The Bohlen-Pierce system uses a microtonal scale based on 13 logarithmically equal divisions of

a tritave (a 3:1 frequency ratio). The tones in the tritave used for these stimuli are defined as:

Frequency (Hz) = k × 3^(n/13)

where n is the number of steps along the scale, and k is a constant equal to 220 Hz (Table 3). A

subset of these scale tones was combined into chords (Krumhansl, 1987), and these chords were

combined to form a four-chord progression (Figure 13). This progression was used as a
finite-state grammar to construct eight-note melodies; melodies had to start on a tone from the first

chord and end on a tone from the fourth chord, and successive notes could either repeat the same

tone or choose another tone from the same chord or the next chord in the progression (see Figure

13 for an example).


Table 3

Composition of Materials for Experiment 3

Tone in Tritave (n)    Frequency (Hz)
 0                     220.00
 1                     239.40
 2                     260.51
 3                     283.48
 4                     308.48
 5                     335.68
 6                     365.29
 7                     397.50
 8                     432.55
 9                     470.69
10                     512.20
11                     557.37
12                     606.52
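As a check, the tritave in Table 3 can be regenerated directly from the formula above. The short Python sketch below is illustrative only (the original tones were synthesized in Matlab):

```python
# Bohlen-Pierce tritave: 13 logarithmically equal steps spanning a
# 3:1 frequency ratio above the reference k = 220 Hz.
K = 220.0

def bp_frequency(n, k=K):
    """Frequency (Hz) of scale step n (0-12): k * 3^(n/13)."""
    return k * 3 ** (n / 13)

tritave = [round(bp_frequency(n), 2) for n in range(13)]
# Reproduces the values in Table 3, e.g. step 0 -> 220.00, step 12 -> 606.52.
```

Note that step 13 would complete the tritave at exactly three times the reference frequency (660 Hz), just as the twelfth semitone completes the Western octave at a 2:1 ratio.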

Figure 13. (a) Bohlen-Pierce grammar from Experiment 3, based on Figure 2 from Loui et al.

(2008). (b) An example of a melody constructed from this grammar.
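The melody-construction rules can be sketched as follows. The chord tone sets here are illustrative placeholders only; the actual sets follow Krumhansl (1987) and Figure 13 and are not reproduced here. To guarantee that every melody ends on a chord-4 tone, the sketch forces an advance through the progression whenever too few notes remain:

```python
import random

# Placeholder chord progression; tones are scale steps n from Table 3.
PROGRESSION = [
    {0, 4, 7},    # chord 1 (hypothetical tone sets)
    {0, 6, 10},   # chord 2
    {4, 7, 10},   # chord 3
    {0, 6, 10},   # chord 4
]

def compose_melody(length=8):
    """Eight-note melody that starts on a chord-1 tone and ends on a
    chord-4 tone; each successive note repeats, stays within the current
    chord, or advances to the next chord in the progression."""
    chord, melody = 0, []
    for pos in range(length):
        left = length - 1 - pos            # notes still to come
        if pos > 0 and chord < 3:
            # Must advance whenever staying would leave too few notes
            # to reach chord 4; otherwise advance with probability .5.
            if (3 - chord) > left or random.random() < 0.5:
                chord += 1
        melody.append(random.choice(sorted(PROGRESSION[chord])))
    return melody
```

The randomly composed ungrammatical melodies used later in the expectancy phase would instead draw each of the eight notes freely from all 13 scale steps, e.g. `[random.choice(range(13)) for _ in range(8)]`.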


Familiarization phase:

For the familiarization phase, 18 Bohlen-Pierce melodies were each presented 28 times, in

random order. Participants were instructed to listen carefully to a series of melodies. Participants

were provided with paper and crayons, with which they could draw to help them keep alert over

the 30 minutes of familiarization.

Discrimination phase:

A subset of 10 melodies from familiarization was used for discrimination trials. In addition, 10

novel grammatical and 20 ungrammatical melodies were composed (10 for each block, see

below). Ungrammatical melodies were created in the same way as grammatical melodies, except

by using a retrograde grammar where the four chords occurred in reverse order.

This phase was presented in two blocks of 10 trials. For both blocks, participants heard one

grammatical melody and one ungrammatical melody (with the order counterbalanced across

trials) separated by 1 second, and were instructed to indicate which of the two melodies sounded more

familiar. In the first “recognition” block, the grammatical melody had been presented in

familiarization; thus, participants were simply required to recognize a previously heard melody

in order to correctly choose the grammatical item. In the second “generalization” block, the

grammatical melody had not been presented in familiarization; thus, participants were required

to generalize the grammar’s structure to a novel exemplar to do the task.

Following the discrimination trials, participants completed a survey regarding their musical

experience. The entire experimental session for the discrimination group lasted approximately

one hour.

Expectancy phase:

Memory trials in the expectancy phase were similar to the memory trials from Experiment 1. The

120 trials were manipulated in terms of Grammar (Grammatical vs. Ungrammatical), Delay

(Short vs. Long) and Comparison Type (Match, Same Contour, Random). Of the 60 grammatical

standards, 18 were heard during familiarization, and none of the 60 ungrammatical melodies had


been used during familiarization or discrimination. Ungrammatical melodies were composed by

choosing 8 notes randomly from the 13 tones of the Bohlen-Pierce tone set.

The standard and comparison were separated by one second for short delay trials, and by 30

seconds for long delay trials. During the long delay, participants were presented with a random

three-digit number from which they were required to count backwards, out loud, by threes. The

three comparison types were constructed in the same manner as in Experiment 1, with two

exceptions. First, the Bohlen-Pierce tone set was used rather than the Western chromatic tone set.

Second, similar to methods used by Dowling (1991), only two notes were altered between match
and same contour trials (as opposed to all of them); this restriction was necessitated by the limited
sample space offered by the novel Bohlen-Pierce grammar. The altered notes were chosen randomly,
with the constraint that they were neither the first nor the last note of the melody. Table 4 depicts some

examples of memory trials.

Before encountering the expectancy trials, participants were informed about all the possible trial

types, and presented with six examples. Each memory trial started with the standard melody. On

short delay trials, there was a one-second delay followed by the presentation of the word “TEST”

on the screen as the comparison melody played. On long delay trials, the word “COUNT” and a

random three-digit number appeared on the screen for 30 seconds, followed by the presentation

of the word “TEST” as the comparison melody played. After the comparison played participants

were asked to indicate how similar the second melody was to the first on a confidence scale from

1 (same) to 5 (different). The order of the 120 memory trials was randomized for presentation to

participants. Following these trials, participants completed a survey regarding their musical

experience. The entire experimental session for the expectancy group lasted approximately 90

minutes.


Table 4

Examples of Grammatical and Ungrammatical Standards and Their Corresponding

Comparisons for Experiment 3

Grammar         Standard               Comparison Type   Comparison
Grammatical     0 4 7 7 7 0 6 10       Match             0 4 7 7 7 0 6 10
                0 0 10 7 4 7 7 0       Same Contour      0 0 10 6 0 7 7 0
                0 7 0 10 0 6 10 6      Random            0 7 10 3 7 3 0 6
Ungrammatical   4 6 11 11 11 9 11 12   Match             4 6 11 11 11 9 11 12
                5 5 11 5 2 5 5 1       Same Contour      5 5 10 7 2 5 5 1
                0 4 1 10 9 11 12 8     Random            0 10 11 7 10 2 1 4

Note. The numbers in the Standard and Comparison columns refer to the scale steps (“n”) from Table 3.

14 Results

14.1 Discrimination Phase

The total number of correct responses (out of 10) was tallied for each participant for the

recognition and generalization blocks. The scores from each block were then submitted to a
one-sample t-test with µ = 5 (chance performance). Performance in the recognition block was no

different from chance (mean = 5.44 ± 0.43), t(15) = 1.02, p = .32, nor was performance in the

generalization block (mean = 5.13 ± 0.40), t(15) = 0.32, p = .76.
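The chance comparisons above amount to one-sample t-tests against µ = 5. A minimal pure-Python sketch (not the original analysis code) makes the computation explicit:

```python
import math
from statistics import mean, stdev

def t_vs_chance(scores, mu=5.0):
    """One-sample t statistic (and degrees of freedom) for discrimination
    scores out of 10 against chance performance of 5 in a two-alternative
    forced-choice task. Uses the sample standard deviation."""
    n = len(scores)
    t = (mean(scores) - mu) / (stdev(scores) / math.sqrt(n))
    return t, n - 1
```

The ± values reported above appear consistent with standard errors of the mean, since (5.44 − 5)/0.43 ≈ 1.02 matches the reported t(15).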

14.2 Expectancy Phase

Following DeWitt & Crowder (1986), similarity ratings from the melody memory task were

evaluated in two ways. The similarity ratings, recorded on a five-point confidence scale, were


converted to areas under a memory operating characteristic (MOC) curve (Macmillan &
Creelman, 2005). The raw similarity ratings were analyzed in addition to these area scores.
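The rating-to-area conversion can be sketched as follows; this is one standard reading of the procedure (sweep a criterion across the rating scale, plot a hit rate against a false-alarm rate at each criterion, and take the area under the resulting curve), offered here as an illustration rather than the exact analysis code:

```python
def moc_area(match_ratings, lure_ratings, n_levels=5):
    """Area under a memory operating characteristic built from 5-point
    similarity ratings (1 = same ... 5 = different). Sweeping a criterion
    c, a trial is called "different" whenever its rating exceeds c; a hit
    is a lure (changed comparison) correctly called "different", and a
    false alarm is a match incorrectly called "different"."""
    def p_different(ratings, c):
        return sum(r > c for r in ratings) / len(ratings)
    # One (false-alarm, hit) point per criterion, from (0, 0) to (1, 1).
    points = [(p_different(match_ratings, c), p_different(lure_ratings, c))
              for c in range(n_levels, -1, -1)]
    # Trapezoidal area under the curve; 0.5 = chance, 1.0 = perfect.
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))
```

Identical rating distributions for matches and lures yield an area of 0.5 (chance), and fully separated distributions yield 1.0.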

Areas under the MOC curve:

Areas under the MOC curve (Table 5) represent participant sensitivity to differences between

stimuli, and quantify participants’ general memory performance in a particular condition. These

data were organized by the within-subjects factors of Grammaticality (Grammatical,

Ungrammatical), Delay (Short, Long), and Comparison (Match vs. Same Contour, Match vs.

Random), and submitted to repeated measures ANOVA. All three main effects were significant.

Participants performed better on ungrammatical than grammatical trials, F(1,15) = 5.96, MSE =

0.02, p = .03, ɳp2 = .28, and better with short delays than long delays, F(1,15) = 95.63, MSE =

0.02, p < .001, ɳp2 = .86. Lastly, they found same contour melodies less discriminable from

matches than random melodies, F(1,15) = 139.17, MSE = 0.01, p < .001, ɳp2 = .90 (Figure 14).

Table 5

Mean Areas Under Memory Operating Characteristic Curve for Experiment 3

                               Delay
                       Short               Long
Grammaticality       SC     R            SC     R
Grammatical          0.76   0.99         0.52   0.61
Ungrammatical        0.76   0.98         0.61   0.75

Note. SC = Same Contour; R = Random.


[Figure: bar graphs of area under the MOC curve by Grammaticality, Delay, and Comparison.]

Figure 14. Main effects on memory operating characteristic for Experiment 3.

With respect to two-way interactions, the interaction between Grammaticality and Delay was

significant, F(1,15) = 5.88, MSE = 0.02, p = .028, ɳp2 = .28, as was the interaction between Delay

and Comparison, F(1,15) = 8.24, MSE = 0.01, p = .01, ɳp2 = .36. The interaction between

Grammaticality and Comparison was not significant, F(1,15) = 1.30, p =.27. The three-way

interaction between Grammaticality, Delay, and Comparison was not significant, F(1,15) = 1.20,

p = .29.

The significant interactions reported above were investigated by using simple effects. The

Grammaticality x Delay interaction was driven by the fact that participants performed better for

ungrammatical trials than grammatical ones after long delays, t(15) = 2.53, p = .02, but not short

ones, t(15) = 0.08, p = .94 (Figure 15).


[Figure: area under the MOC curve by Grammaticality, plotted separately for short and long delays.]

Figure 15. Grammaticality x Delay interaction for memory operating characteristic in Experiment 3.

The Delay x Comparison interaction was driven by the fact that the advantage in discriminability
for random trials over same contour trials was larger after short delays, t(15) = 9.26, p < .001, than
after long delays, t(15) = 4.89, p < .001, although the difference remained significant at long delays (Figure 16).


[Figure: area under the MOC curve by Comparison (Match vs. Same Contour, Match vs. Random), plotted separately for short and long delays.]

Figure 16. Delay x Comparison interaction for memory operating characteristic in Experiment 3.

Lastly, a priori comparisons were conducted to see if performance differed based on whether the

standard melody was familiar from being presented during the familiarization phase. For this

analysis, MOC areas were calculated by collapsing across delay and lure type (Same Contour,

Random). Memory performance for ungrammatical trials was equivalent to performance for old

grammatical trials that had been encountered in the familiarization phase, t(15) = 1.47, p = .16.

Performance in both these conditions was superior to performance for new grammatical trials

that had not been encountered in the familiarization phase, both t values > 3.16, both p values <

.01 (Figure 17).


[Figure: area under the MOC curve for the Grammatical Old, Grammatical New, and Ungrammatical conditions.]

Figure 17. Effect of melody familiarity on area under memory operating characteristic in Experiment 3.

Raw similarity ratings:

Similarity responses were collapsed across the 10 melodic exemplars for each combination of

tonality, delay, and comparison type (Table 6). These averaged ratings were then submitted to

repeated measures ANOVA with Grammaticality (Grammatical, Ungrammatical), Delay (Short,

Long), and Trial Type (Match, Same Contour, Random) as factors. All three main effects were

significant. Melodies in grammatical trials were judged more similar than in ungrammatical

trials, F(1,15) = 207.76, MSE = 0.49, p < .001, ɳp2 = .93. Melodies in short delay trials were

judged more similar than in long delay trials, F(1,15) = 133.56, MSE = 0.15, p < .001, ɳp2 = .90

(Figure 18). Finally, the three trial types differed in their similarity ratings, F(2,30) = 53.81, MSE
= 0.28, p < .001, ɳp2 = .78. Bonferroni-corrected comparisons determined that same contour trials


were judged as more similar than match trials, t(15) = 3.39, p < .01, and that match trials were

judged as more similar than random trials, t(15) = 5.79, p < .001 (Figure 18).

Table 6

Mean Similarity Ratings for Experiment 3

                                  Delay
                        Short                    Long
Grammar            M      SC     R          M      SC     R
Grammatical        1.34   1.51   2.67       2.86   2.43   2.69
Ungrammatical      2.61   3.29   4.54       4.49   3.14   4.16

Note. M = Match; SC = Same Contour; R = Random.

[Figure: similarity ratings (1 = same, 5 = different) by Grammaticality, Delay, and Trial Type.]

Figure 18. Main effects on similarity ratings for Experiment 3.


In terms of two-way interactions, the interaction between Grammaticality and Delay was

significant, F(1,15) = 11.30, MSE = 0.14, p < .01, ɳp2 = .43, as was the interaction between Delay

and Trial Type, F(2,30) = 70.63, MSE = 0.21, p < .001, ɳp2 = .83. The Grammaticality x Trial

Type interaction was marginally significant, F(2,30) = 2.95, p = .07. The three-way interaction

between Grammaticality, Delay, and Comparison Type was significant, F(2,30) = 5.76, MSE =

0.36, p = .01, ɳp2 = .28.

The significant two-way interactions reported above were investigated by using simple effects.

The Grammaticality x Delay interaction was driven by the fact that the difference in similarity

ratings between short and long delay trials was larger for grammatical trials, t(15)
= 11.63, p < .001, than for ungrammatical trials, t(15) = 5.37, p < .001, with melodies in the short delay

condition judged as more similar than in the long delay condition in both cases (Figure 19).

[Figure: similarity ratings by Delay, plotted separately for grammatical and ungrammatical trials.]

Figure 19. Grammaticality x Delay interaction for similarity ratings in Experiment 3.

The Grammaticality x Trial Type interaction was driven by the fact that the difference between

grammatical and ungrammatical trials was larger for the match and random conditions, t(15) =

11.08, p < .001, than for the same contour condition, t(15) = 7.83, p < .001, with melodies in the

grammatical condition always judged as more similar than those in the ungrammatical condition

(Figure 20).


[Figure: similarity ratings by Grammaticality, plotted separately for Match, Same Contour, and Random trials.]

Figure 20. Grammaticality x Trial Type interaction for similarity ratings in Experiment 3.

The Delay x Trial Type interaction was driven by differences in the effect of delay for each trial

type. For the match condition, short delay trials received much more similar ratings than long

delay trials, t(15) = 17.90, p < .001. For the same contour condition, short delay trials received

more similar ratings than Long Delay trials, t(15) = 2.83, p = .01, but this difference is not as

great as for the match condition. For the random condition, the pattern was reversed, with short

delay trials receiving marginally less similar ratings than long delay trials (Figure 21).


[Figure: similarity ratings by Delay, plotted separately for Match, Same Contour, and Random trials.]

Figure 21. Delay x Trial Type interaction for similarity ratings in Experiment 3. # signifies p < .10.

Finally, the three-way interaction of Grammaticality, Delay, and Comparison Type was fairly

complex. Critically, however, it took the form of the interaction observed in previous work

(Dowling, 1991), wherein similarity ratings for the same contour and random trials converge at

long delays, but only for grammatical trials (Figure 22).


[Figure: similarity ratings by Delay and Trial Type (Match, Same Contour, Random), plotted separately for grammatical and ungrammatical trials.]

Figure 22. Grammaticality x Delay x Comparison Type interaction for similarity ratings in Experiment 3.

Lastly, a priori comparisons were conducted to see if responses differed based on whether the

standard melody had been presented during the familiarization phase. There was no difference in

similarity judgments between old grammatical trials with standards that had been encountered

during familiarization, and new grammatical trials with standards that had not been heard before,

t(15) = 0.33, p = .75. However, as supported by the main effect for grammaticality in the

previously reported ANOVA, grammatical trials garnered more similar responses than

ungrammatical trials, regardless of whether they were heard in familiarization, t(15) = 6.85, p <

.001, or not, t(15) = 8.15, p < .001 (Figure 23).


[Figure: similarity ratings for the Grammatical Old, Grammatical New, and Ungrammatical conditions.]

Figure 23. Effect of melody familiarity on similarity ratings in Experiment 3.

15 Discussion

In this experiment, few of the critical hypotheses concerning discrimination and expectancy

effects were borne out. First of all, the discrimination group was unable to perform above chance

in either the recognition or generalization task. This result is problematic given that it essentially

represents a failure to replicate Loui et al.’s (Loui & Wessel, 2008; Loui et al., 2010) finding

that participants performed better than chance in both discrimination tasks. This failure to

replicate occurred despite the use of identically constructed materials, and a similar participant

population (trained musicians). Furthermore, this result throws doubt on whether the expectancy

group would be able to learn melodic expectancies at all, since the melody memory task seems to

require more sophisticated knowledge of the novel grammatical structure than do the recognition

or generalization tasks. However, in Experiment 2, participants exhibited evidence of learned

harmonic expectancies even though they were not able to perform a discrimination task (Correct

vs. Error) that required the same structural knowledge. It was hypothesized that the
expectancy task was a more sensitive measure of that structural information, and the same could be
true of the expectancy task in this experiment.


The data from the expectancy phase provided a mixture of expected and unexpected results,

when compared with the findings of DeWitt & Crowder (1986) and Dowling (1991). The main

effect of delay, with short delays leading to better performance and more similar ratings, accords

with previous work, as does the effect of comparison type, with poorer performance and more

similar ratings for same contour trials than random trials. The interaction between these

variables, whereby same contour lures are confused with matches at short delays but not long

delays, was also expected. However, here this interaction was only significant for the analysis of

raw similarity ratings, whereas it was found for both the ratings and area scores analysis in

previous work. The lack of the Delay x Comparison Type interaction for the MOC area scores

may potentially be attributed to the unfamiliar Bohlen-Pierce materials used here, which have not

been tested with this methodology previously. Regardless, the generally successful replication of

past research for these variables is somewhat reassuring, as it provides an important point of

convergence between the current study and previous findings.

The critical effects involving grammar, however, did not agree as well with previous work. As

expected, grammatical trials received more similar ratings than ungrammatical ones. However,

memory performance was better when the standard was ungrammatical than when it was

grammatical (Figure 9). This result ran counter to predictions because memory for grammatical

melodies whose structure was expected should be superior to memory for ungrammatical

melodies whose structure was randomly chosen, as Dowling (1991) found for tonality. The

significant two-way interactions between grammar and delay (found for MOC and similarity

analysis) and grammar and comparison type (found for similarity only) were similarly

unprecedented in the literature, and do not lend themselves readily to any obvious explanation.

The observed three-way interaction between tonality, delay, and comparison type was expected

based on previous research (Figure 10), but it is important to note that this interaction was

significant for the similarity ratings only, whereas previous research has found a significant

interaction for both memory performance and similarity ratings.

This unexpected effect of grammar (and the associated interactions) may be explained by a

novelty advantage. The ungrammatical melodies were composed randomly from the full set of

13 Bohlen-Pierce tones, whereas the grammatical melodies were more restricted, moving within

a subset of only six tones through set paths (Figure 13). Thus, the ungrammatical melodies may

be perceived by participants as highly salient due to their distinctiveness following 30 minutes of


familiarization on grammatical melodies, leading to increased attention and better memory

performance.

The main finding of this experiment was some weak evidence for expectancy learning in the

analysis of training effects on memory performance, whereby memory was better for melodies

encountered during the familiarization phase than for new grammatical melodies. Thus, even a limited
amount of exposure to the familiarization corpus may have led to the formation of veridical

expectancies that boosted memory performance for those items. However, there was no evidence

that participants learned any structural expectancies based on the grammar that produced the

familiarization items.

One general conclusion that might be drawn is that employing a novel tone set

led to a failure to learn expectancies in this study. However, before drawing such a conclusion it

must be remembered that the discrimination task in this study also failed to demonstrate

sensitivity to the grammatical structure of the sequences. Accordingly, it could be that, for some

reason, the familiarization phase of this experiment was simply inadequate to impart the requisite

grammatical structure of these stimuli. Of course, such a failure is curious given that previous

work by Loui and colleagues (Loui & Wessel, 2008; Loui et al., 2010) has demonstrated

successful discrimination using almost identical methods.

The use of 18 grammatical melodies repeated 28 times each was an adaptation of Loui & Wessel

(2008), who used 15 melodies repeated 27 times each, with alterations to balance the number of

items in each condition of this design. However, these authors found in a subsequent study (Loui

et al., 2010) that their effect size for the generalization task, which targets structural knowledge

that is critical for expectancy formation, was much larger when they trained participants on a

larger set of grammatical items. Furthermore, previous research has shown that the statistical

coverage of a familiarization set – whether it contains all of the permissible transitions between

units in a grammar – can determine whether the grammatical structure is learned (Poletiek & van

Schijndel, 2009). Assessment of statistical coverage in these materials revealed that two of the

grammatical transitions (out of 27) were not represented in the familiarization melodies.
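A coverage check of this kind can be automated: enumerate every tone-to-tone transition the grammar licenses and subtract the transitions actually heard. The sketch below uses placeholder chord sets, not the actual grammar:

```python
def legal_transitions(progression):
    """Every tone bigram a chord-progression grammar licenses: repeat a
    tone or move within the current chord, or advance into the next
    chord of the progression."""
    legal = set()
    for i, chord in enumerate(progression):
        for a in chord:
            for b in chord:
                legal.add((a, b))                  # repeat / same chord
            if i + 1 < len(progression):
                for b in progression[i + 1]:
                    legal.add((a, b))              # move to next chord
    return legal

def uncovered(melodies, legal):
    """Licensed transitions never heard in a familiarization set."""
    heard = {(m[i], m[i + 1]) for m in melodies for i in range(len(m) - 1)}
    return legal - heard
```

Applied to the full familiarization set and the actual 27-transition grammar, a check like this would flag the two unrepresented transitions noted above.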

Before concluding that participants cannot learn musical expectancies with melodic stimuli (such

as those used in Experiments 1 and 3), it must be noted that Tillmann and Poulin-Charronnat

(2010) were recently successful in measuring expectancies in participants following statistical


learning with a novel melodic grammar (though this grammar was constructed using the familiar

Western chromatic tone set). Crucially, these authors employed a tonal priming paradigm, very

similar to the one used in Experiment 2, in order to quantify expectancy learning. Therefore,

perhaps the melody memory task used in the present experiment was not sensitive enough to

detect learned expectancies, or did not assess the specific type of expectancies learned by

participants during familiarization.

Consequently, these methodological issues were addressed by two changes in the next

experiment. First, the familiarization phase was changed to more precisely replicate Loui et al.

(2010) in order to maximize the potential effects of statistical learning. Second, expectancies

were measured using a priming task rather than a memory task.


Chapter 5 Experiment 4: Tonal Priming With Bohlen-Pierce Melodies

16 Introduction

This experiment was conceptually identical to the last experiment, but used different methods in

order to better potentiate expectancy learning in participants. As discussed, previous research has

found larger effect sizes, possibly because of greater statistical coverage, when larger

familiarization sets are used. Thus, in this experiment, participants were trained with 400 distinct

grammatical items, rather than repeating a smaller subset of items, and this familiarization set

was assessed to ensure complete statistical coverage of the familiarization grammar.

Additionally, participants were randomly assigned to Grammar A (the same familiarization

grammar as Experiment 3) or Grammar B (the retrograde of Grammar A), in order to more

completely replicate the design of Loui et al. (2008; 2010).

Finally, the expectancy phase task was changed from melody memory to the tonal priming task

of Tillmann and Poulin-Charronnat (2010). In these authors’ study, participants were trained on a

set of melodies produced from a finite-state grammar that combined Western chromatic tones in

novel ways. Following familiarization, participants were tested with a melodic priming task in

which they had to identify whether a target note was in- or out-of-tune. The melodies used in this

priming task were novel exemplars of melodies based on the familiarization grammar wherein

the target note was manipulated to either follow or violate the familiarization grammar.

Participants showed successful expectancy learning following familiarization, with better

accuracy and faster reaction times for trials in which the target conformed to the grammar than

trials in which the target violated the grammar. The present study made one alteration to this

paradigm; participants were asked to make timbre judgments rather than tuning judgments, since

they were not familiar with the tuning conventions of the Bohlen-Pierce scale.


17 Methods

17.1 Participants

Twenty-two participants were recruited from the University of Toronto Scarborough community

using the introductory psychology participant pool as well as posted advertisements. Participants

were compensated with course credit or $10 per hour.

They consisted of six males and 16 females with a mean age of 19.8 years (SD = 4.9 years).

Participants were selected to have at least five years of formal musical training, due to concerns

about task difficulty for non-musicians. Participants had on average 9.11 years of formal musical

training (SD = 2.8 years), 3.5 years of musical theory training (SD = 3.1 years), played music for

1.8 hours per week (SD = 1.9 hours), and listened to music for 16.5 hours per week (SD = 17.1

hours). No participants reported having absolute pitch, nor did anyone report having taken part in

a music psychology experiment previously.

17.2 Apparatus

This experiment used an identical apparatus to Experiment 1.

17.3 Materials

Tone synthesis:

Three sets of Bohlen-Pierce tones with differing timbres and equal loudnesses were constructed

in CSound according to the frequencies from Table 3. All tones were 500 ms in duration. For the

regular melody notes, a set of pure tones was constructed with 5 ms rise and fall times. For the

target tones (see expectancy phase below), a set of bright-sounding tones (Timbre A) and a set of

dull-sounding tones (Timbre B) was created. All tones consisted of eight harmonics and 5 ms

rise and fall times. Timbre A tones had 1500 times the energy in the last three harmonics as the

first five, and Timbre B tones had 1500 times the energy in the first three harmonics as the last

five.
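A rough Python equivalent of this additive synthesis (the original used CSound; the sample rate here is an assumption, and a 1500x energy ratio corresponds to a √1500 amplitude ratio) might look like:

```python
import math

SR = 44100  # sample rate in Hz (assumed; not specified in the text)

def bp_tone(freq, bright, dur=0.5, sr=SR, ramp=0.005):
    """Eight-harmonic tone, 500 ms with 5 ms linear rise/fall ramps.
    Timbre A (bright=True) gives harmonics 6-8 1500x the energy of
    harmonics 1-5; Timbre B (bright=False) boosts harmonics 1-3 instead."""
    boosted = {6, 7, 8} if bright else {1, 2, 3}
    # 1500x the energy -> sqrt(1500)x the amplitude.
    amp = {h: math.sqrt(1500.0) if h in boosted else 1.0 for h in range(1, 9)}
    norm = sum(amp.values())
    n = int(dur * sr)
    out = []
    for i in range(n):
        t = i / sr
        s = sum(amp[h] * math.sin(2 * math.pi * h * freq * t)
                for h in range(1, 9)) / norm
        env = min(1.0, t / ramp, (dur - t) / ramp)  # 5 ms on/off ramps
        out.append(s * env)
    return out
```

Because nearly all of the energy sits in the boosted harmonics, Timbre A is dominated by the upper partials (bright) and Timbre B by the fundamental region (dull), while overall loudness is matched by the common normalization.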


Familiarization phase:

The familiarization phase melodies were constructed identically to those from Experiment 3,

from the pure tones described above. However, 400 unique Bohlen-Pierce melodies were

presented in random order during familiarization (rather than 18 melodies repeated 28 times), with this

familiarization set designed for full statistical coverage of the familiarization grammar. Two

familiarization sets were created, with each melody in the Grammar B set the retrograde of a

melody in the Grammar A set. Participants were assigned randomly to Grammar A or B.

Discrimination phase:

The discrimination blocks for this experiment were identical to those from Experiment 3;

participants were presented with 10 recognition trials and 10 generalization trials. Because half

the participants were trained on Grammar B items, the correct grammatical response for

Grammar A participants was the wrong response for Grammar B participants, and vice versa.

Expectancy phase and timbre training:

Two sets of 48 priming melodies were constructed from the tone sets described above, one for

Grammar A and one for Grammar B. For the priming trials, each melody was presented in

grammatical and ungrammatical form. To create ungrammatical versions of each melody, a

target note was altered so that it violated the rules of the grammar. Thus, there were 96 priming

trials each for Grammar A and B. All the notes in each priming melody were played as pure

tones, except for the target note. The melodies in each trial were manipulated with respect to

Grammaticality (Grammatical, Ungrammatical), Timbre (Bright, Dull), Target Position (5, 6, 7),

and Familiarization (Familiarization, Novel). Timbre refers to whether the target note was played

in Timbre A (bright) or Timbre B (dull). Target Position refers to which of the eight notes in the

melody was the target. The Familiarization variable refers to whether or not the melody was part

of the familiarization set.

17.4 Procedure

Each participant completed all phases of the experiment.


To start, participants were trained to discriminate Timbre A from Timbre B. First, participants

listened to a randomized set of the 13 Bohlen-Pierce tones in each of the timbres twice, with the

option to listen again until they felt comfortable with the distinction. Next, participants heard a

randomized series of 24 tones, half in Timbre A and half in Timbre B. They were instructed to

indicate which of the two timbres each tone was played with, and were not allowed to proceed

until they achieved a score of at least 20/24 correct.
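As a rough sanity check on these pass criteria (this calculation is an illustration, not an analysis from the thesis), the probability that a pure guesser clears each threshold follows from the binomial distribution:

```python
from math import comb

def pass_prob(k_min, n, p=0.5):
    # P(at least k_min correct out of n) for a guesser choosing between
    # two timbres at random (p = 0.5 per trial).
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))

print(f"20/24 criterion: {pass_prob(20, 24):.4f}")  # ~0.0008
print(f"12/15 criterion: {pass_prob(12, 15):.4f}")  # ~0.0176
```

Both thresholds therefore make it very unlikely that a participant who cannot hear the timbre distinction proceeds by chance.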

Lastly, participants were tested with melodies similar to the trials they would encounter in the

priming phase. On each of 15 trials, they heard a melody in which the target note was played by

either Timbre A or Timbre B, and were asked to indicate which timbre it was. Participants were

visually guided in this task (see Figure 24). Each trial began with a fixation cross in the centre of

the screen for 1 second. Then the melody started playing, with each note designated by a white

dot. The note prior to the target was designated by a warning sign, and the target note was

designated by a red question mark. Participants were instructed that upon hearing the target note,

they were to respond as quickly and accurately as possible, indicating which of the two timbres

played the target. Participants were told that the melody would continue playing after the target

note was sounded, and that they must respond within two seconds of hearing the target.

Participants were not allowed to proceed to the familiarization phase until they had scored at

least 12/15 correct.


Figure 24. Visual presentation of priming trials in Experiment 4, based on Figure 2 from

Tillmann & Poulin-Charronnat, 2010.

Familiarization phase:

During the familiarization phase, participants heard either 400 Grammar A or 400 Grammar B

melodies. As in Experiment 3, participants were instructed to listen carefully to the

melodies and provided with paper and crayons with which they could draw to help them keep

alert over the 30 minutes of familiarization.

Discrimination phase:

The discrimination phase was identical to that from Experiment 3.

Expectancy phase:

Before beginning the expectancy trials, participants were reminded of the two timbres with a set

of 13 randomized tones each from Timbre A and Timbre B. They were then reminded of the

instructions for the priming task from timbre training before completing the 96 expectancy trials

for their grammar in random order. The only difference between the timbre training trials and the

expectancy trials was that participants completed the familiarization phase before the expectancy

trials; thus, an effect of grammaticality was predicted for the expectancy trials.

Following all the experimental trials, participants completed a survey regarding their musical

experience. The entire experimental session lasted approximately 90 minutes.

18 Results

18.1 Discrimination Phase

As in Experiment 3, the total number of correct responses (out of 10) was tallied for each

participant for the recognition and generalization blocks. The scores from each block were then

submitted to a one-sample t-test with µ = 5 (chance performance). Performance in the

recognition block was no different from chance, mean = 4.73 ± 0.34, t(21) = 0.81, p = .43, but

performance in the generalization block was marginally above chance, mean = 5.73 ± 0.37, t(21) =

1.95, p = .07. Grammar B participants outperformed Grammar A participants in the recognition

task, t(21) = 2.39, p = .03; this difference was driven by Grammar A participants performing

worse than chance (Figure 25). There was no difference between Grammar A and Grammar B

for generalization, t(21) = 0.48, p = .64.
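The one-sample tests described above can be reproduced with SciPy. The scores below are fabricated for illustration only; they are not the thesis data.

```python
import numpy as np
from scipy import stats

# Hypothetical recognition scores out of 10 (illustration only).
scores = np.array([5, 4, 6, 5, 3, 5, 4, 6, 5, 4])

# One-sample t-test against chance performance (mu = 5 of 10).
t, p = stats.ttest_1samp(scores, popmean=5)
print(f"t({len(scores) - 1}) = {t:.2f}, p = {p:.2f}")
```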

[Figure: y-axis Number Correct (/10); x-axis Training Grammar (Grammar A, Grammar B); chance performance marked.]

Figure 25. Effect of familiarization grammar on recognition performance in Experiment 4.


18.2 Expectancy Phase

Responses from the priming task were analyzed with respect to accuracy and reaction time.

Accuracy data:

Raw accuracy data were collapsed across the four melodies in each condition. These mean

accuracy scores were then submitted to repeated measures ANOVA with Grammaticality

(Grammatical, Ungrammatical), Timbre (Bright, Dull), Target Position (5, 6, 7), and

Familiarization (Familiarization, Novel) as factors. There was a main effect of Grammaticality,

F(1,21) = 6.81, MSE = 0.02, p = .02, ηp² = .25, with ungrammatical trials receiving more

accurate responses than grammatical trials (Figure 26). The main effects of Timbre, Target

Position, and Familiarization were all non-significant, all F values < 1.93, all p values > .15.

[Figure: accuracy (%) by Grammaticality (Grammatical, Ungrammatical).]

Figure 26. Effect of Grammaticality on accuracy in Experiment 4.

Three two-way interactions were significant: Grammaticality x Timbre, F(1,21) = 6.26, MSE =

0.01, p = .02, ηp² = .23, Grammaticality x Familiarization, F(1,21) = 12.72, MSE = 0.01, p < .01,

ηp² = .38, and Timbre x Familiarization, F(1,21) = 26.27, MSE = 0.01, p < .001, ηp² = .56. All

other two-, three-, and four-way interactions were non-significant, all F values < 1.28, all p

values > .29.

The significant interactions were investigated using simple effects analyses. The interaction

between Grammaticality and Timbre was driven by the fact that ungrammatical melodies

garnered more accurate responses than grammatical ones for bright target notes, t(21) = 3.52, p <

.01, but not dull ones, t(21) = 0.53, p = .61 (Figure 27).

[Figure: accuracy (%) by Grammaticality, plotted separately for bright and dull targets.]

Figure 27. Grammaticality x Timbre interaction for accuracy in Experiment 4.

The interaction between Grammaticality and Familiarization was driven by the fact that

ungrammatical melodies garnered more accurate responses than grammatical ones for

familiarization melodies, t(21) = 3.94, p = .001, but not novel ones, t(21) = 0.41, p = .68 (Figure

28).


[Figure: accuracy (%) by Grammaticality, plotted separately for training and novel melodies.]

Figure 28. Grammaticality x Familiarization interaction for accuracy in Experiment 4.

Lastly, the interaction between Timbre and Familiarization was driven by the fact that novel

melodies garnered more accurate responses than familiarization ones when the target was bright,

t(21) = 2.98, p = .01, but familiarization melodies had a marginal accuracy advantage over novel

ones when the target was dull, t(21) = 1.92, p = .07 (Figure 29).

[Figure: accuracy (%) by Source of Melody (Training, Novel), plotted separately for bright and dull targets.]

Figure 29. Timbre x Familiarization interaction for accuracy in Experiment 4.


Reaction time data:

As in Experiment 2, reaction times were analyzed only for trials on which participants answered

correctly, and all reaction times greater than 2000 ms were discarded. All participants performed

at greater than 75% accuracy, so reaction time data for all participants were retained for this

analysis. As for accuracy, valid data were then collapsed across the four melodies in each

condition. These mean reaction times were submitted to repeated measures ANOVA with

Grammaticality (Grammatical, Ungrammatical), Timbre (Bright, Dull), Target Position (5, 6, 7),

and Familiarization (Familiarization, Novel) as factors. The main effect of Grammaticality was

significant, F(1,21) = 5.87, MSE = 25833.31, p = .03, ηp² = .22, with faster reaction times for

ungrammatical than grammatical trials (Figure 30). The main effect of Timbre was significant,

F(1,21) = 18.55, MSE = 31340.85, p < .001, ηp² = .47, with faster reaction times for bright than

dull targets. The main effect of Target Position was also significant, F(2,42) = 10.45, MSE =

13697.40, p < .001, ηp² = .33. Bonferroni-corrected multiple comparisons indicated that this

effect was due to participants responding more slowly to targets at the fifth note of the melody than at

the sixth note, t(21) = 3.21, p < .01, and seventh note, t(21) = 3.56, p < .01. The main effect of

Familiarization was not significant, F(1,21) = 1.06, p = .31. All significant main effects are

illustrated in Figure 30.

[Figure: reaction time (ms) in three panels, by Grammaticality (Grammatical, Ungrammatical), Timbre (Bright, Dull), and Target Position (5, 6, 7).]

Figure 30. Main effects for reaction time in Experiment 4.


Two two-way interactions were marginally significant: Grammaticality x Target Position, F(2,42)

= 3.11, MSE = 16960.00, p = .06, ηp² = .13, and Grammaticality x Familiarization, F(1,21) =

4.24, MSE = 17812.52, p = .06, ηp² = .17. All other two-, three-, and four-way interactions were

non-significant, all F values < 1.94, all p values > .17.

As for accuracy, the interactions for reaction time were investigated using simple effects

analyses. The interaction between Grammaticality and Target Position was driven by the fact that

ungrammatical melodies received faster responses than grammatical ones if the target was the

fifth note of the melody, t(21) = 3.46, p < .01, but not if it was the sixth, t(21) = 0.22, p = .83, or

seventh, t(21) = 1.32, p = .20 (Figure 31).

[Figure: reaction time (ms) by Grammaticality, plotted separately for target positions 5, 6, and 7.]

Figure 31. Grammaticality x Target Position interaction for RT in Experiment 4.

The interaction between Grammaticality and Familiarization was driven by the fact that

ungrammatical melodies received faster responses than grammatical ones for familiarization

items but not for novel items (Figure 32).


[Figure: reaction time (ms) by Grammaticality, plotted separately for training and novel melodies.]

Figure 32. Grammaticality x Familiarization interaction for RT in Experiment 4.

Effect of familiarization grammar:

Next, the possibility that the familiarization grammar (A or B) affected participant performance

in the priming task was explored. For this analysis, priming effects for accuracy and reaction

time were quantified for each participant. The accuracy priming effect was calculated by

subtracting the participant’s mean accuracy in the grammatical condition from the mean

accuracy in the ungrammatical condition, with a larger positive difference indicating a larger

accuracy advantage for ungrammatical trials. Similarly, the reaction time priming effect was

calculated by subtracting the participant’s mean reaction time in the grammatical condition from

the mean reaction time in the ungrammatical condition, with a larger negative difference indicating a

larger reaction time advantage for ungrammatical trials. An independent-samples t-test indicated

that there was no difference between Grammar A and B participants in terms of their accuracy or

reaction times, all t values < 1.72, all p values > .10.
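The difference scores described above can be sketched as follows. The per-trial record format (`grammatical`, `correct`, and `rt` fields) is an assumption for illustration, not the thesis's actual data structure.

```python
import statistics

def priming_effects(trials):
    # Accuracy effect: mean ungrammatical accuracy minus mean grammatical
    # accuracy (positive difference = ungrammatical advantage).
    # RT effect: mean ungrammatical RT minus mean grammatical RT, computed
    # over correct trials only (negative difference = ungrammatical advantage).
    acc = {g: statistics.fmean(t["correct"] for t in trials if t["grammatical"] == g)
           for g in (True, False)}
    rt = {g: statistics.fmean(t["rt"] for t in trials
                              if t["grammatical"] == g and t["correct"])
          for g in (True, False)}
    return acc[False] - acc[True], rt[False] - rt[True]
```

One pair of scores per participant would then feed the independent-samples t-test comparing the Grammar A and B groups.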

Individual Differences

Individual differences analyses explored whether performance in the discrimination phase was

associated with performance in the expectancy phase. Discrimination scores were calculated for

each participant by averaging their scores from the recognition and generalization blocks.

Bivariate correlations were calculated between discrimination scores and each of the priming

effects calculated for the previous analysis. Neither of the two correlations was significant, although


there was a trend towards a positive relation between discrimination performance and the

priming effect as measured by reaction time, r(20) = .35, p = .11.

Corpus Analysis

Finally, in an effort to explain why participants may have performed better on ungrammatical

than grammatical trials in both Experiments 3 and 4, an analysis of the familiarization and

expectancy corpuses was conducted. These 10 analyses targeted the various types of statistical

information available in familiarization and expectancy items (on the basis of which participants

may have made their response decisions) and were based on the analyses conducted by Tillmann

& Poulin-Charronnat (2010) on their experimental materials. For each analysis (except for

element frequency, see explanation below), Familiarization Grammar (A, B) and Grammaticality

(Grammatical, Ungrammatical) were used as factors.

1. Element frequency was calculated by taking each of the unique targets occurring in the

expectancy (priming) trials and finding the absolute frequency of its occurrence in the

familiarization corpus. Grammar A and Grammar B employed the same set of

grammatical and ungrammatical targets; Familiarization Grammar was therefore not

analyzed as a factor here. There was no difference in element frequency for grammatical

and ungrammatical items, t(8) = 0.92, p = .39.

2. Repetition frequency was calculated by noting, for each of the 96 priming trials in each

language, whether the target tone was a repetition from two tones previous (n-2) or one

tone previous (n-1). Finally, the repetition frequency for each trial (0, 1, or 2) could be

computed. Grammatical trials comprised significantly more repetitions (24.5) than

ungrammatical trials (8.5), t(188) = 2.41, p < .001.

3. Melodic contour was defined for each priming trial by the overall contour (rising, falling,

or static) between the n-2 tone and the target. There was no difference in melodic contour

based on Familiarization Grammar or Grammaticality, both t values < 0.60, both p values

> .55.


4. Bigram frequency was calculated by separating all of the priming melodies into their

constituent 2-note chunks. Next, for each of the unique bigrams occurring in the priming

melodies, the frequency of its occurrence in the familiarization corpus was computed.

Grammatical melodies had a higher bigram frequency (88.55) than ungrammatical trials

(35.97), t(57) = 3.64, p = .001.

5. Trigram frequency was calculated identically to bigram frequency, but for 3-note chunks.

Grammar A melodies had a higher trigram frequency (10.94) than Grammar B melodies

(7.34), t(127) = 2.05, p = .04, and grammatical melodies had a higher trigram frequency

(16.24) than ungrammatical melodies (2.03), t(127) = 8.07, p < .001.

6. Associative chunk strength was calculated for each priming sequence by averaging the

chunk frequency (see 4 and 5) associated with each of its constituent bigrams and

trigrams. Grammatical sequences had higher associative chunk strengths (71.12) than

ungrammatical sequences (55.89), t(188) = 5.83, p < .001.

7. A chunk novelty value was assigned to each priming sequence based on whether any

bigram or trigram in that sequence had not occurred in the familiarization sequences (0 =

no novel chunks, 1 = at least one novel chunk). More Grammar B sequences contained

novel chunks (45) than Grammar A sequences (37), t(188) = 2.23, p = .03, and more

ungrammatical sequences contained novel chunks (79) than grammatical sequences (3),

t(188) = 19.30, p < .001.

8. A novel chunk position value was assigned to each priming sequence based on whether

any bigram or trigram that had occurred in the familiarization sequences appeared in a

position it had not occupied in those sequences (0 = no chunks in a novel position, 1 = at

least one chunk in a novel position). More

ungrammatical sequences contained chunks in novel positions (60) than grammatical

sequences (21), t(188) = 6.40, p < .001.

9. First-order transitional probability (TP1) was calculated for each priming sequence. If A

designates the note directly preceding the target, and B designates the target, TP1 was


calculated by finding the ratio between the frequency of AB in the familiarization

sequences and the frequency of A in the familiarization sequences. TP1 was higher in

grammatical sequences (0.21) than ungrammatical sequences (0.03), t(188) = 17.44, p <

.001.

10. Second-order transitional probability (TP2) was calculated for each priming sequence. If

A designates the note occurring two positions before the target, B the note directly

preceding the target, and C the target, TP2 was calculated by finding the ratio between

the frequency of ABC in the familiarization sequences and the frequency of AB in the

familiarization sequences. TP2 was higher in grammatical sequences (0.22) than

ungrammatical sequences (0.01), t(188) = 17.62, p < .001.
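Two of the corpus statistics above, chunk (n-gram) frequency and first-order transitional probability, can be sketched as follows. The toy corpus and note labels are illustrative only; melodies are simply sequences of element labels.

```python
from collections import Counter

def ngram_counts(corpus, n):
    # Count every n-element chunk across a corpus of sequences.
    counts = Counter()
    for seq in corpus:
        for i in range(len(seq) - n + 1):
            counts[tuple(seq[i:i + n])] += 1
    return counts

def tp1(corpus, a, b):
    # First-order transitional probability P(b | a): the frequency of the
    # bigram (a, b) divided by the frequency of a, as in analysis 9 above.
    bigrams = ngram_counts(corpus, 2)
    unigrams = ngram_counts(corpus, 1)
    return bigrams[(a, b)] / unigrams[(a,)] if unigrams[(a,)] else 0.0
```

Second-order transitional probability (analysis 10) follows the same pattern with trigram counts divided by bigram counts.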

19 Discussion

Despite methodological improvements, results for this experiment closely paralleled those from

Experiment 3. In the discrimination phase, participants were not able to perform above chance in

the recognition task, but they performed at marginally significant levels for generalization. The

presentation of 400 different melodies over 30 minutes of familiarization means that

remembering any one melody should be quite difficult. Thus, performance in the recognition

block was expected to be no different from performance in the generalization block, wherein the

grammatical melodies that were presented were novel to participants. At best, participants

might find the familiarization melodies somewhat familiar, leading to better performance on

recognition than generalization. Therefore, finding that participants fared better at generalization

than recognition is a highly odd result, leading to speculation that one

or both of these results may be due to statistical error. The fact that Grammar B participants

performed better than Grammar A participants at recognition is also unexpected. This seemed to

be a result of the Grammar A participants performing at less-than-chance levels, whereas the

Grammar B participants performed at chance. Overall, then, the attempt to replicate Loui et al.’s

results – successful recognition and generalization, with no difference between the Grammar A

and B groups – again failed.

Turning to the expectancy phase, the effect of timbre and target position on reaction time was not

surprising. The bright-sounding targets were more discriminable from the pure tones than were


the dull-sounding targets, and received faster responses. Later targets received faster responses

than earlier targets, probably because participants were more prepared to respond. However, the

priming advantage of ungrammatical trials over grammatical ones observed in Experiment 3 was

observed here for both accuracy and reaction time. This general trend of better performance for

ungrammatical trials continued through the analyses of interaction effects with grammaticality as

a factor. This ungrammatical advantage is particularly strange considering that grammatical and

ungrammatical trials differed by only one note, and were designed not to differ in terms of

contour.

Furthermore, the extensive corpus analysis did not shed any light on this issue. Calculations of

statistical learning variables (repetition frequency, bigram and trigram frequency, associative

chunk strength, first- and second-order transitional probability) all showed significantly higher

values for grammatical than ungrammatical items. Conversely, calculations of novelty variables

(chunk novelty and novel chunk position) all showed significantly higher values for

ungrammatical than grammatical items. Interestingly, differences between Grammar A and

Grammar B familiarization materials were found for trigram frequency and chunk novelty.

Importantly, however, the effect size of grammaticality for these statistics far

outstripped the effect size for those familiarization grammar differences. These findings agree

with past statistical learning research on these statistical variables (Hunt & Aslin, 2001;

Johnstone & Shanks, 1999; Meulemans & Van der Linden, 1997; Tillmann & Poulin-

Charronnat, 2010), which indicates that participants are sensitive to these statistics and should

find grammatical items more familiar than ungrammatical ones, and respond accordingly. In this

experiment, this means that given our stimulus corpus, it is reasonable to expect that participants

would have formed musical expectancies based on the statistical information present in

familiarization items.

Delving further into the interactions observed for accuracy and reaction time produced two

interesting results. First, the timbre effect observed in this experiment does not agree with the

tuning effect observed in Experiment 2. Specifically, Experiment 2 revealed that priming effects

generally only occurred for in-tune trials, where the acoustic surface of the stimulus was not

disrupted. In this experiment, pilot work revealed that the dull targets were more

similar to the pure tone notes comprising the remainder of the melody than were the bright

targets. Thus, if priming differences were observed, they would be expected for dull targets over


bright targets. However, the Grammaticality x Timbre accuracy interaction in this experiment

revealed that the ungrammatical advantage was seen only for bright targets. That said, this

“priming” effect was, of course, not in the expected direction, meaning that predictions

concerning a priming effect may not necessarily be valid in this case. Perhaps the acoustical

disruption caused heightened attention to bright target trials, which led to the ungrammatical

advantage for these trials, whatever the mechanism of that effect may be.

Second, whether participants had heard the priming melodies during familiarization had

an important interaction effect on both reaction time and response accuracy.

interactions involving familiarization indicated that the performance advantages observed for

bright and ungrammatical targets depended on whether the melody was heard during

familiarization. This is a very interesting result, because participants only heard the 400

familiarization melodies once in advance of the priming trials in the expectancy phase. Thus,

hearing a melody just once, embedded in a stream of 399 other melodies, altered processing of

that melody later in the experiment. This indicates that the learning mechanisms being employed

by participants during familiarization are very sensitive, and highlights the benefit of implicit

methodologies such as the priming paradigm, which reveal learning effects that are presumably

unavailable to consciousness.

Overall, the most important effect observed in this experiment, the unexpected

accuracy and reaction time advantage of ungrammatical over grammatical trials, was very

sensitive to context. This leads to the question of whether this strange effect would be maintained

given a change in context. Moreover, the investigation of contextual properties of the

familiarization materials has been a central theme of this project. Therefore, in the last of this

series of experiments, the final manipulation of familiarity and complexity was conducted, and

the familiarization, discrimination, and expectancy corpuses were composed from unfamiliar

complex units: Bohlen-Pierce chords.


Chapter 6 Experiment 5: Tonal Priming With Bohlen-Pierce Chords

20 Introduction

The primary purpose of this experiment was to test the learning of musical expectancies with a

novel tone set in a harmonic context. Recall that in a Western chromatic context, the learning of

melodic expectancies was not demonstrated in Experiment 1 (although see Tillmann & Poulin-

Charronnat, 2010), whereas some evidence for the learning of harmonic expectancies was seen

in Experiment 2. Thus, it was of interest to investigate harmonic expectancy learning with a

novel tone set here, particularly because it would complete the testing of hypotheses regarding

the role of stimulus familiarity and complexity in the statistical learning of musical expectancies.

Thus, a Bohlen-Pierce chord grammar was created by combining Krumhansl’s (1987)

description of the Bohlen-Pierce system with Jonaitis and Saffran’s (2009)

chord grammar, resulting in familiarization materials composed from an unfamiliar tone

set and complex in texture.

Given the results of Experiments 3 and 4, this experiment also serves to test the context

sensitivity of the ungrammatical priming advantage observed in those experiments. If the

ungrammatical advantage is observed again in this experiment, this provides inductive evidence

that the locus of the effect may be a perceptual novelty effect caused by the Bohlen-Pierce tone

system. However, if a regular priming effect (a grammatical advantage for accuracy and reaction

time) is observed, this will indicate that the ungrammatical advantage is due not to the

unfamiliar tone set, but to something more idiosyncratic about the Bohlen-Pierce melodic stimuli

used in Experiments 3 and 4.

21 Methods

21.1 Participants

Forty participants were recruited from the University of Toronto Scarborough community using

the introductory psychology participant pool as well as posted advertisements. Participants were

compensated with course credit or $10 per hour.


Participants consisted of 11 males and 29 females with a mean age of 19.5 years (SD = 1.8

years). Participants were selected to have at least five years of formal musical training, due to

concerns about task difficulty for non-musicians. Participants had on average 9.2 years of formal

musical training (SD = 2.9 years), 3.7 years of musical theory training (SD = 3.4 years), played

music for 3.3 hours per week (SD = 4.4 hours), and listened to music for 18.5 hours per week

(SD = 15.7 hours). Two participants in the discrimination condition and three participants in the

expectancy condition reported having absolute pitch. Four participants in the discrimination

group and two participants in the expectancy group reported having taken part in a music

psychology experiment previously, but all six reported being naïve to the experimental

hypotheses.

21.2 Apparatus

This experiment used the same apparatus as Experiment 1.

21.3 Materials

Chord synthesis:

Seven Bohlen-Pierce chords were constructed for this experiment, according to theoretical

descriptions by Krumhansl (1987). Table 7 depicts the tones used for each chord and their

frequencies. Because each chord has a specified inversion (order of pitch heights for the three

notes), the required pitches were generated in the original tritave starting at k = 220 Hz as well as

the tritave below (starting at k = 110 Hz) and above (starting at k = 660 Hz).

The seven chords, all 500 ms long, were synthesized in CSound in three different timbres. For

target chords and timbre training, the seven chords were synthesized using the bright (Timbre A)

and dull (Timbre B) tones from Experiment 4. For the majority of the chords in each progression,

a new set of complex tones was synthesized with four octave-separated harmonics played at

equal magnitude and 5 ms rise and fall times. These were used instead of the pure tones (from

Experiment 4) because chords composed of pure tones were very difficult to discriminate from

Timbre B chords during pilot work.

Chord grammars for this study were constructed by replacing the seven nodes in Jonaitis &

Saffran’s (2009) Grammars A and B with the seven Bohlen-Pierce chords described above


(Figure 33). These two Bohlen-Pierce chord grammars were then used to generate chord

progression stimuli for the three experimental phases.

Table 7

Composition of Materials for Experiment 5 (from Krumhansl, 1987).

Chord   Note in Tritave (n)   Frequency (Hz)

I       0                     220.00
        6                     365.29
        10                    512.20
III     3                     283.48
        9                     470.69
        (0)                   660.00
IV      4                     308.48
        10                    512.20
        (1)                   718.20
V       6                     365.29
        12                    606.52
        (3)                   850.45
VI      7                     397.50
        (0)                   660.00
        (4)                   925.44
VIII    [10]                  170.73
        3                     283.48
        7                     397.50
ix      [12]                  202.17
        3                     283.48
        9                     470.69

Note. The column for “n” gives the value for k = 220 Hz. The values for k = 110 Hz and k = 660

Hz appear in square and rounded brackets, respectively, if applicable.
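The frequencies in Table 7 follow from equal-tempered Bohlen-Pierce tuning, which divides the 3:1 tritave into 13 equal steps, so step n above a reference frequency k is k · 3^(n/13). A quick check of a few table values (a sketch for illustration, not part of the thesis):

```python
def bp_freq(n, k=220.0):
    # Frequency of Bohlen-Pierce step n in the tritave starting at k Hz.
    return k * 3 ** (n / 13)

print(f"{bp_freq(6):.2f}")           # chord I, n = 6  -> 365.29
print(f"{bp_freq(10):.2f}")          # chord I, n = 10 -> 512.20
print(f"{bp_freq(1, k=660.0):.2f}")  # chord IV, (1)   -> 718.20
```

Step 13 returns exactly one tritave (a factor of 3) above the reference, just as step 12 returns the octave in Western equal temperament returns a factor of 2.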


Figure 33. Bohlen-Pierce chord grammars used in Experiment 5. (a) Grammar A; (b) Grammar

B. Composition of these chords is specified in Table 7.

Familiarization phase:

The familiarization phase materials consisted of 50 grammatical chord progressions in each of

the two grammars, from 4 to 10 chords long, repeated four times each in random order such that

no progressions occurred back-to-back. Thus, there were 200 familiarization sequences in

contrast to the 100 sequences from Experiment 2 (based on Experiment 1 from Jonaitis &

Saffran, 2009). The familiarization period here was doubled in imitation of the two

familiarization periods used by Jonaitis and Saffran (2009) in their second experiment, which led

to better grammar learning. This was done because of concerns that the Bohlen-Pierce grammar

would be more difficult to learn because of perceptual unfamiliarity. Participants were randomly

assigned to the Grammar A or Grammar B group.

Discrimination phase:

The discrimination phase materials were identical to those of Experiment 2, except for the fact

that they were generated from the Bohlen-Pierce chord grammars described above (rather than

the Phrygian grammar from Jonaitis & Saffran, 2009). Thus the 60 discrimination trials were

manipulated in terms of Grammar (A vs. B) and Correctness (Correct vs. Error).

Expectancy phase and timbre training:

The expectancy phase and timbre training materials were very similar to those of Experiment 4,

except for the fact that they were composed of three-tone chords (and chord progressions) rather


than tones (and melodies). There were 96 priming trials for each grammar, with each Grammar B

chord progression being a retrograde version of a Grammar A progression. Each progression was

either eight chords or 10 chords in length. For eight-chord progressions the target was either the

fourth or fifth chord, and for 10-chord progressions the target was either the fifth or sixth chord.

As in Experiment 4, trials were manipulated in terms of Grammaticality (Grammatical,

Ungrammatical), Timbre (Bright, Dull), and Familiarization (Familiarization, Novel), with 12

progressions corresponding to each combination of these three factors.

21.4 Procedure

Participants were randomly assigned to the discrimination group or the expectancy group. Both

groups took part in the familiarization phase. Only the discrimination group participated in the

discrimination phase, and only the expectancy group participated in the expectancy phase.

Discrimination group:

The discrimination group participated in the familiarization phase and then the discrimination

phase. During the familiarization phase, participants heard 200 familiarization progressions in

their assigned grammar (A or B). Participants rated each progression according to how much

they liked it on a scale from 1-7 (see Experiment 2). Next, in the discrimination phase,

participants heard 60 new progressions and rated each progression according to how similar it

sounded to the familiarization items on a scale from 1-7 (see Experiment 2). Following the

discrimination trials, participants completed a survey regarding their musical experience. The

entire experimental session lasted approximately one hour.

Expectancy group:

The expectancy group was first trained to discriminate between Timbre A and B, using the same methodology as Experiment 4. Next, they completed the familiarization phase (see the description for the discrimination group). Finally, they completed the expectancy phase, again using the same methodology as Experiment 4. Following the priming trials in the expectancy phase, participants

completed a survey regarding their musical experience. The entire experimental session lasted

approximately one hour.


22 Results

22.1 Discrimination Phase

Similarity ratings from each participant were collapsed across the 15 exemplars in each of the

Grammar/Correctness conditions. These mean ratings were then submitted to repeated measures

ANOVA with Grammaticality (Grammatical, Ungrammatical) and Correctness (Correct, Error)

as factors. There was a main effect of Grammaticality, F(1,19) = 57.93, MSE = 0.51, p < .001, ηp² = .75, with higher similarity ratings for grammatical sequences than ungrammatical ones. The main effect of Correctness was also significant, F(1,19) = 71.09, MSE = 0.21, p < .001, ηp² = .79, with higher similarity ratings for correct than error items. Finally, the interaction between Grammaticality and Correctness was significant, F(1,19) = 29.13, MSE = 0.41, p < .001, ηp² = .61. Simple effects indicated that although correct items were rated as significantly more similar than error items for both grammatical and ungrammatical trials, the correct – error difference was larger for grammatical trials (Figure 34).
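As an illustration of this analysis strategy (using simulated ratings, not the thesis data): because each within-subject factor has only two levels, every effect in the 2 × 2 repeated measures ANOVA is equivalent to a paired t-test, with F(1, n-1) = t².

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 20  # participants, matching the reported df of F(1, 19)

# Simulated per-participant mean similarity ratings (1-7 scale);
# the condition means are illustrative only.
gram_correct   = rng.normal(5.5, 0.5, n)
gram_error     = rng.normal(4.0, 0.5, n)
ungram_correct = rng.normal(3.5, 0.5, n)
ungram_error   = rng.normal(3.1, 0.5, n)

# Main effects: paired t-tests on the marginal means.
t_gram, p_gram = stats.ttest_rel((gram_correct + gram_error) / 2,
                                 (ungram_correct + ungram_error) / 2)
t_corr, p_corr = stats.ttest_rel((gram_correct + ungram_correct) / 2,
                                 (gram_error + ungram_error) / 2)
# Interaction: does the correct - error difference change with grammaticality?
t_int, p_int = stats.ttest_rel(gram_correct - gram_error,
                               ungram_correct - ungram_error)

print(f"Grammaticality: F(1,{n - 1}) = {t_gram ** 2:.2f}, p = {p_gram:.4f}")
print(f"Correctness:    F(1,{n - 1}) = {t_corr ** 2:.2f}, p = {p_corr:.4f}")
print(f"Interaction:    F(1,{n - 1}) = {t_int ** 2:.2f}, p = {p_int:.4f}")
```

The squared t statistics reproduce the F values a repeated-measures ANOVA package would report for this design.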

[Figure: mean similarity ratings (1 = different, 7 = same) by Correctness (Correct vs. Error), plotted separately for grammatical and ungrammatical progressions; asterisks mark significant correct vs. error differences.]

Figure 34. Grammar x Correctness interaction for discrimination phase in Experiment 5.


22.2 Expectancy Phase

Accuracy data:

Raw accuracy data were collapsed across length, target position, and repetition. These mean

accuracy scores were then submitted to repeated measures ANOVA with Grammaticality

(Grammatical, Ungrammatical), Timbre (Bright, Dull), and Familiarization (Familiarization,

Novel) as factors. None of the main effects were significant, all F values < 3.39, all p values >

.08. The interaction between Grammaticality and Familiarization was significant, F(1,19) = 6.45,

MSE = 0.003, p = .02, ηp² = .25. Simple effects indicated that this interaction was driven by the fact that novel chord progressions received more accurate ratings than familiarization chord

progressions on ungrammatical trials, t(19) = 3.10, p = .01, but not grammatical trials, t(19) =

0.13, p = .90 (Figure 35). None of the other two- or three-way interactions were significant, all F

values < 3.06, all p values > .09.

[Figure: mean accuracy (%) by Source of Progression (Training vs. Novel), plotted separately for grammatical and ungrammatical progressions.]

Figure 35. Grammaticality x Familiarization interaction for accuracy in Experiment 5.


Reaction time data:

Reaction times were only analyzed for trials where participants answered correctly, and all

reaction times greater than 2000 ms were discarded. One participant’s reaction time data were

discarded because he or she scored less than 75% accuracy. The remaining valid data were then

collapsed across length, target position, and repetition. These mean reaction times were

submitted to repeated measures ANOVA with Grammaticality (Grammatical, Ungrammatical),

Timbre (Bright, Dull), and Familiarization (Familiarization, Novel) as factors. None of the main

effects were significant, all F values < 3.06, all p values > .09, nor were any of the interactions,

all F values < 1.72, all p values > .20.
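The trimming pipeline described above can be sketched as follows; the data and column names are hypothetical, while the thresholds (2000 ms, 75% accuracy) are those reported in the text:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Simulated trial-level data with hypothetical column names.
trials = pd.DataFrame({
    "participant": np.repeat(np.arange(4), 50),
    "correct": rng.random(200) > 0.1,
    "rt_ms": rng.normal(900, 400, 200).clip(min=150),
})

# 1. Exclude participants scoring below 75% accuracy overall.
accuracy = trials.groupby("participant")["correct"].mean()
kept = accuracy[accuracy >= 0.75].index
valid = trials[trials["participant"].isin(kept)]

# 2. Analyze correct trials only, discarding RTs above 2000 ms.
valid = valid[valid["correct"] & (valid["rt_ms"] <= 2000)]

# 3. Collapse to per-participant mean reaction times.
mean_rt = valid.groupby("participant")["rt_ms"].mean()
```

The per-participant means in `mean_rt` would then be collapsed by condition and entered into the repeated measures ANOVA.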

23 Discussion

The results from this experiment were quite different from the two previous experiments using

Bohlen-Pierce stimuli. In Experiments 3 and 4 (Bohlen-Pierce melodies), participants were

unable to perform above chance in the discrimination phase. In this experiment, not only did

participants distinguish between sequences that followed the familiarization grammar and

sequences that did not, they also successfully distinguished between completely correct

sequences and ones that contained grammatical errors. Thus, participants exhibited more

sophisticated comprehension of a Bohlen-Pierce chord grammar after 200 familiarization items

than a Western diatonic chord grammar after 100 familiarization items (i.e., Experiment 2 of this project, and Jonaitis & Saffran, 2009). The obvious explanation here is that more

familiarization was able to overcome participants’ perceptual unfamiliarity with Bohlen-Pierce

chords and led to better performance. Additionally, as discussed previously, the novel Western

chord grammar had to compete with existing mental representations linking those chords

together (Western harmony), whereas no such representations exist for Bohlen-Pierce chords.

This result also shows that participants were able to learn the structure of a chord grammar, but not a melodic grammar composed of Bohlen-Pierce tones. Given that the melodic grammar from Experiments 3 and 4 was in fact based on a four-chord progression, perhaps this

result reflects the increased contextual information provided by the more complex chords over

the simpler melodies.

One unexpected result from the discrimination phase is that participants reliably judged correct

sequences from the retrograde grammar as more similar to familiarization items than error items


(see Figure 14). This illustrates a surprising ability to cognitively extrapolate what grammatical items should sound like not only going forward, but also in reverse. Sensitivity to retrograde forms has

been demonstrated previously. Dowling (1972) found that listeners could recognize retrograde

transformations of short atonal melodies at above-chance levels. Research on twelve-tone serial

music, in which composers commonly employ backwards iterations of the main structural sequence (the prime form), has also provided evidence for listener sensitivity to retrogrades.

Krumhansl, Sandell, and Sergeant (1987) demonstrated that following a familiarization period

where listeners were exposed to two prime forms, listeners could match retrograde tone rows

with the corresponding prime form.

The results from this experiment go one step further, however, and reveal that after

familiarization on a large number of grammatical exemplars, participants are able to generalize

their knowledge to novel sequences, both in prime and retrograde form. It is important to note,

however, that this result is currently restricted to Bohlen-Pierce chord stimuli.

Turning to the expectancy phase, in contrast to the ungrammatical advantage observed in

Experiments 3 and 4, there was a complete absence of any priming effect in this experiment.

This was unexpected since it followed the successful grammar learning observed in the

discrimination phase. The reasons for this null effect are unclear. The most obvious explanation

would be that participants simply do not develop musical expectancies from familiarization with

this unfamiliar chord grammar. However, this seems unlikely, considering their robust

performance in the discrimination phase, and the demonstration of some priming effects in

Experiment 2 following weaker discrimination performance. It is possible that Bohlen-Pierce

tones, which are tuned in a way that was reported to be completely novel by all participants, are

perceived and hence processed differently from familiar Western chromatic tones, with which

participants have a lifetime of experience. This differential processing may lead to expectancies

that behave differently. Unfortunately, these speculations were not tested by this experiment, and

not enough experimental work has been conducted on the perception of the Bohlen-Pierce tuning

system to make a conjecture regarding the exact locus of this non-effect.
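For context, the equal-tempered Bohlen-Pierce scale divides the 3:1 "tritave" into 13 equal steps, rather than dividing the 2:1 octave into 12 as in Western tuning. A minimal sketch (with an arbitrary base frequency) is:

```python
# Equal-tempered Bohlen-Pierce tuning: 13 equal divisions of the
# tritave (frequency ratio 3:1). The base frequency is arbitrary,
# chosen only for illustration.
BASE_HZ = 220.0

def bp_frequency(step, base=BASE_HZ):
    """Frequency of the given Bohlen-Pierce scale step above the base."""
    return base * 3 ** (step / 13)

tritave = [round(bp_frequency(n), 1) for n in range(14)]
# Step 13 lands exactly a tritave (3x) above the base frequency.
```

Because no step of this scale coincides with a familiar chromatic interval, listeners cannot map Bohlen-Pierce materials onto existing Western pitch categories.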


Chapter 7 General Discussion

23.1 Examination of Research Goals

The main objective of this project was to test the hypothesis that listeners develop musical

expectancies through the mechanism of statistical learning. On this count, the project was to

some extent successful. In Experiment 2, listeners exhibited an accuracy and reaction time

advantage for chord pairs that were related over chord pairs that were unrelated with respect to

the familiarization grammar. This priming effect was analogous to the priming effects that have

been previously observed for materials structured according to Western tonality (for review, see

Tillmann, 2005). As such, Experiment 2’s result provides a valuable piece of evidence for the

notion that musical expectancy, which is ordinarily predicated on knowledge of tonal structure,

can be acquired through statistical learning processes. However, the success at inducing musical

expectancies in Experiment 2 must be weighed against the failure to induce such expectancies in

Experiments 1, 3, 4, and 5.

While there was simply no evidence of expectancy learning in Experiments 1 and 5, it is of

particular interest that the expectancy-based priming effects that were expected in Experiments 3

and 4 were in fact significant, but in the opposite direction from predictions. Thus, with Bohlen-

Pierce melodies, listeners exhibited better memory performance (Experiment 3) and faster, more

accurate responses (Experiment 4) for ungrammatical items than grammatical ones. This

unexpected ungrammatical advantage, as well as the failure to observe expectancy effects in

Experiment 5, may have been the consequence of the stimulus materials; this idea is further

discussed subsequently.

The secondary objective of this project was to investigate how the properties of the stimuli used

in musical statistical learning paradigms affect listeners’ ability to form expectancies. To this

end, two properties of the stimuli, tone set familiarity and textural complexity, were

systematically manipulated.

With respect to tone set familiarity, one can compare the results of experiments that used

materials constructed from a familiar tone set with the results of studies that used a novel tone

set. None of the studies from this project that used novel tones demonstrated successful


expectancy learning, whereas the one example of successful expectancy induction came from a

study that used a Western tone set (Experiment 2). However, before concluding that expectancies

can be learned with familiar tone sets but not novel ones, note that Loui and colleagues (Loui,

Wu, Wessel, & Knight, 2009) were able to use electroencephalography (EEG) to measure

electrophysiological evidence that listeners had learned expectancies through exposure to a

Bohlen-Pierce chord progression. In this study, an early anterior negativity (EAN) was observed

in response to deviant chord progressions that did not match the majority of the progressions

being presented. While this is not strictly evidence that structural learning of the Bohlen-Pierce

chord progression resulted in downstream processing effects (as was earlier defined as a

behavioural benchmark for expectancy learning), the EAN has been well-documented as a brain

response to violations of musical expectancy.

With respect to textural complexity, one can compare the results of experiments that used

simply-textured melodies with the results of studies that used complex-textured chord

progressions. None of the studies from this project that used melodic stimuli demonstrated

successful expectancy learning, whereas the one example of successful expectancy induction

came from a study that used harmonic materials (Experiment 2). However, before concluding

that novel expectancies can be learned with chords but not melodies, recall that Tillmann &

Poulin-Charronnat (2010) showed successful induction of expectancies with melodies using a

priming task.

If not for reasons of tone set familiarity and textural complexity, why was expectancy learning

successful in Experiment 2 (Tonal Priming with Western Chords) and in the studies by Loui et

al. (2009) and Tillmann and Poulin-Charronnat (2010)? These three experiments share two

characteristics that may have potentiated expectancy learning. First, all three experiments were

ecologically valid in terms of the way that stimuli were constructed. Specifically, all three

experiments used finite state grammars that defined the transitional dependencies between units,

similar to the way that tonality governs which tones can occur and in what order. This was in

contrast to the melodic segmentation explored in Experiment 1, for instance. However, it should

be noted that Experiments 3 and 4 used a grammar identical to the one employed by Loui et al.

(2009) and failed to demonstrate expectancy learning. In this case, perhaps the behavioural

priming task (Experiments 3 and 4) was not as sensitive as EEG (Loui, et al., 2009) in detecting

expectancy effects; this idea will be discussed further subsequently.


Secondly, expectancies were measured in these three studies using methods that have been

employed extensively to measure expectancy effects produced by real musical structure. Two of

the experiments measured expectancy through priming, and the third used

electroencephalography (EEG) to measure the EAN, a frequently replicated brain indicator of

musical expectancy violation. This is in contrast to the melody memory task used in Experiments

1 and 3, which has not been commonly used in an expectancy context, and may not be sensitive

enough to detect the weak nascent expectancies that developed in these two experiments. Oddly,

however, Experiments 4 and 5 both failed to show expectancy learning using a priming task.

Perhaps the expectancies developed with the Bohlen-Pierce tones were weaker than the ones

developed with familiar Western tones, and therefore required a more sensitive method of

measuring expectancy, such as EEG.

Based on the assembled evidence then, the differences in the current project between studies that

used a familiar tone set and studies that used a novel tone set, and between studies that used

melodic stimuli and studies that used harmonic stimuli, seem to be based upon idiosyncratic

features of the studies themselves rather than upon systematic differences between the

apprehension of expectancies from familiar versus novel tone sets, and melodies versus chords.

23.2 Refining the Proposed Model of Expectancy Learning

One question arising from these studies is how the discrimination and expectancy tasks actually

relate to one another. These experiments were structured on the assumption that discrimination

reflects mental representation of stimulus structure, whereas priming or memory performance

reflects the manifestation of expectancies based on that structure. Within this framework,

statistical learning of stimulus structure is thought to precede the expression of expectancy

effects due to that learning.

In Experiments 1 and 5, participants performed above chance at discrimination but failed to

show any expectancy effects. In Experiment 2, participants performed successfully in both the

discrimination and expectancy phases. These results would be expected based on the framework

described above. In Experiments 3 and 4, however, participants performed at chance in the

discrimination phase and then showed significant priming effects in the expectancy phase, albeit


in an unexpected direction. If structure learning precedes expectancy expression, and assuming

the proper measurement of discrimination and expectancy, these results are clearly contradictory.

Therefore, these assumptions must be reconsidered.

Borrowing from the ideas of Perruchet and Pacton (2006) regarding statistical learning and

chunk formation, there are three possibilities here. First, as originally assumed, statistical

learning of structure may precede expectancy formation. As mentioned, the results of

Experiments 3 and 4 contradict this model, although the unexpected direction of this effect

suggests that this ungrammatical advantage may be the result of a perceptual novelty effect

rather than the reflection of expectancy learning.

Second, expectancy formation may precede the apprehension of structure from statistical cues.

This model operates on the idea that expectancies are formed implicitly based upon structure,

and the expression of these expectancies allows listeners to become aware of the statistical

information underlying those expectancies. Thus, although statistical structure is required for the

formation of expectancies, listeners are able to convey their expectancies before being able to

convey knowledge of the structure upon which they are based. This model speaks to the results of

Experiments 3 and 4, and supports the idea that the priming task was a more sensitive measure of

listeners’ structural understanding (via expectancy) than the more explicit discrimination task.

However, this model conflicts with the results from Experiment 5, in which listeners exhibited

highly refined discrimination performance and a complete lack of learned expectancies in the

priming task.

The last possibility, and the most likely, is a combination of the previous two models. According

to this account, structure learning and expectancy formation both rely on statistical cues in the

stimulus. However, these two putative processes would operate in parallel and, under some conditions, would be able to feed back and forth into one another. Thus, successful

discrimination does not necessarily mean successful expectancy learning, and vice versa.

Turning to Experiments 3 and 4, perhaps there was not enough familiarization for the statistical

cues to transitional dependencies in the familiarization grammar to be absorbed. Rather, more

global statistical cues concerning the tone set may have been picked up by listeners. If this was

the case, then discrimination performance would have suffered because this task specifically

targeted transitional dependency information. However, listeners would have the ability to


express expectancies based on tone set learning – as discussed previously, grammatical melodies

used a more restricted tone set than ungrammatical ones – leading to expectancy formation and

the ensuing novelty effect. If a longer familiarization phase had been used, perhaps the statistical

cues to transitional dependencies would have surpassed some threshold for structural learning,

and led to successful discrimination as well as a classic priming effect in which performance was

better for grammatical rather than ungrammatical items.

23.3 Improving Methodology

Another question that arises from these inconsistent results is whether behavioural

experimentation is the best methodology available to study statistical learning. Behavioural work

has had reasonable success explaining how listeners develop psychological representations of

musical structure. However, these paradigms have three shortcomings when it comes to

statistical learning. First, effects, when found, are generally small in magnitude, which makes

detecting them an onerous task. For example, Jonaitis and Saffran (2009) had to train listeners

with two sessions on two different days before listeners could distinguish correct grammatical

items from ones containing errors. It is possible that the failure to induce expectancies in some of

the present experiments was attributable to familiarization periods that were too short.

Second, behavioural experiments require listeners to be aware of the responses they are making,

whereas statistical learning is largely an implicit process. The use of implicit methods such as the

priming paradigm can alleviate this problem somewhat, although one may still argue that

priming tasks are not as sensitive in the detection of nascent expectancies as are brain techniques

such as EEG (Loui, et al., 2009).

Finally, the third and most important disadvantage of many behavioural paradigms, including the ones employed in this project, is that they assess the end products of learning rather than the process itself. Of course, it is possible to measure the status of learned representations

multiple times during familiarization. For instance, Rohrmeier, Rebuschat, and Cross (2011)

found that listeners who did not experience the familiarization phase performed above chance in

the latter parts of their discrimination phase, indicating that they were able to pick up statistical

cues from the discrimination trials. However, these types of paradigms tend to be complicated in


design and time-consuming to run, which puts participants at risk for fatigue effects before the

experiment is over.

Therefore, developing a new methodology to study statistical learning, one that surmounts these

problems, would be highly beneficial. As alluded to previously, one promising approach would

be to use electroencephalography (EEG) or magnetoencephalography (MEG) to measure and

localize event-related potentials (ERPs) related to musical expectancy in the brain. ERPs can be

measured continuously during learning and without listeners being aware of the phenomenon

being studied, which may lead to larger effect sizes. Importantly, musical expectancy has been

extensively studied using these techniques (particularly EEG), so the brain indices that are

modulated by expectancy differences have been described in great detail. These ERPs include

the EAN, described previously, and the late negativity (LN), which appears around 400 ms post-

stimulus and has a fronto-central scalp distribution (Koelsch, et al., 2000; Koelsch & Siebel,

2005; Maess, et al., 2001). The EAN is thought to reflect representations of stimulus structure

(i.e., tonality), whereas the LN is thought to reflect the neural integration of each event into the

structural whole (Koelsch & Siebel, 2005).

As described briefly in the previous section, Loui et al. (2009) measured ERPs while participants

listened to Bohlen-Pierce chord progressions. Most of the chord sequences followed a

grammatical system, while a small number of the sequences contained ungrammatical targets.

These ungrammatical targets elicited both the EAN and the LN. Critically, the EAN response

grew in magnitude over time, indicating that as participants encountered more grammatical

exemplars, their knowledge of the chord grammar grew, resulting in stronger responses to

grammatical violations. These results indicate that brain techniques can offer important insights

into the statistical learning process. EEG and MEG are potentially the best way to proceed in

examining the main questions of this project, given the inconsistent results obtained with this set

of behavioural experiments.

23.4 Conclusions

In sum, this project presented a systematic behavioural exploration of the statistical learning of

musical expectancy. Results indicated that statistical learning can indeed contribute to the

formation of musical expectancies, particularly if the novel structure being learned is composed

from Western chromatic tones, with which listeners have extensive perceptual experience.


Listeners have much more difficulty learning musical structures comprised of unfamiliar tones,

such as those of the Bohlen-Pierce scale. Future research will focus on the measurement of

event-related potentials as indicators of musical expectancy learning in vivo, which will provide

a more sensitive method to elucidate the relation between statistical learning and mental

representations of musical expectancy.


References

Bartlett, J. C., & Dowling, W. J. (1980). Recognition of transposed melodies: A key-distance

effect in developmental perspective. Journal of Experimental Psychology: Human

Perception & Performance, 6(3), 501-515.

Bharucha, J. J., & Stoeckig, K. (1986). Reaction time and musical expectancy: Priming of

chords. Journal of Experimental Psychology: Human Perception & Performance, 12(4),

403-410.

Bharucha, J. J., & Stoeckig, K. (1987). Priming of chords: Spreading activation or overlapping

frequency spectra? Perception & Psychophysics, 41(6), 519-524.

Bigand, E., & Pineau, M. (1997). Global context effects on musical expectancy. Perception &

Psychophysics, 59(7), 1098-1107.

Bigand, E., Poulin, B., Tillmann, B., Madurell, F., & D'Adamo, D. A. (2003). Sensory versus

cognitive components in harmonic priming. Journal of Experimental Psychology: Human

Perception & Performance, 29(1), 159-171.

Boltz, M. (1991). Some structural determinants of melody recall. Memory & Cognition, 19(3),

239-251.

Boltz, M. (1993). The generation of temporal and melodic expectancies during musical listening.

Perception & Psychophysics, 53, 585-600.

Carlsen, J. C. (1981). Some factors which influence melodic expectancy. Psychomusicology, 1,

12-29.

Castellano, M. A., Bharucha, J. J., & Krumhansl, C. L. (1984). Tonal hierarchies in the music of

North India. Journal of Experimental Psychology: General, 113(3), 394-412.

Creel, S., Newport, E. L., & Aslin, R. N. (2004). Distant melodies: Statistical learning of

nonadjacent dependencies in tone sequences. Journal of Experimental Psychology:

Learning, Memory, & Cognition, 30(5), 1119-1130.

Cuddy, L. L., & Badertscher, B. (1987). Recovery of the tonal hierarchy: Some comparisons

across age and levels of musical experience. Perception & Psychophysics, 41(6), 609-

620.

Cuddy, L. L., Cohen, A., & Mewhort, D. J. (1981). Perception of structure in short melodic

sequences. Journal of Experimental Psychology: Human Perception & Performance,

7(4), 869-883.

Cuddy, L. L., Cohen, A. J., & Miller, J. (1979). Melody recognition: The experimental

application of musical rules. Canadian Journal of Psychology, 33, 148-157.


Cuddy, L. L., & Lunney, C. A. (1995). Expectancies generated by melodic intervals: Perceptual

judgments of melodic continuity. Perception & Psychophysics, 57, 451-462.

Dewar, K. M., Cuddy, L. L., & Mewhort, D. J. K. (1977). Recognition memory for single tones

with and without context. Journal of Experimental Psychology: Human Learning &

Memory, 3, 60-67.

DeWitt, L. A., & Crowder, R. G. (1986). Recognition of novel melodies after brief delays. Music

Perception, 3(3), 259-274.

Dowling, W. J. (1972). Recognition of melodic transformations: Inversion, retrograde, and

retrograde inversion. Perception & Psychophysics, 12(5), 417-421.

Dowling, W. J. (1978). Scale and contour: Two components of a theory of memory for melodies.

Psychological Review, 85(4), 341-354.

Dowling, W. J. (1991). Tonal strength and melody recognition after long and short delays.

Perception & Psychophysics, 50(4), 305-313.

Dowling, W. J., & Bartlett, J. C. (1981). The importance of interval information in long-term

memory for melodies. Psychomusicology, 1(1), 30-49.

Dowling, W. J., & Fujitani, D. S. (1971). Contour, interval, and pitch recognition in memory for

melodies. The Journal of the Acoustical Society of America, 49(2), 524-531.

Dowling, W. J., Kwak, S., & Andrews, M. W. (1995). The time course of recognition of novel

melodies. Perception & Psychophysics, 57(2), 136-149.

Endress, A. D. (2010). Learning melodies from non-adjacent tones. Acta Psychologica, 135,

182-190.

Frances, R. (1972). La perception de la musique (2nd ed.). Paris: Vrin.

Graf Estes, K., Evans, J. L., Alibali, M. W., & Saffran, J. R. (2007). Can infants map meaning to

newly segmented words? Statistical segmentation and word learning. Psychological

Science, 18(3), 254-260.

Hunt, R. H., & Aslin, R. N. (2001). Statistical learning in a serial reaction time task: Access to

separable statistical cues by individual learners. Journal of Experimental Psychology:

General, 130(4), 658-680.

Huron, D. (2006). Sweet Anticipation. Cambridge: MIT Press.

Johnstone, T., & Shanks, D. R. (1999). Two mechanisms in implicit artificial grammar learning?

Comment on Meulemans and Van der Linden (1997). Journal of Experimental

Psychology: Learning, Memory, & Cognition, 25(2), 524-531.

Jonaitis, E. M., & Saffran, J. R. (2009). Learning harmony: The role of serial statistics. Cognitive

Science, 33, 951-968.


Jones, M. R. (1990). Musical events and models of musical time. In R. Block (Ed.), Cognitive

models of psychological time. Hillsdale, N.J.: Lawrence Erlbaum.

Kessler, E. J., Hansen, C., & Shepard, R. N. (1984). Tonal schemata in the perception of music

in Bali and in the West. Music Perception, 2(2), 131-165.

Koelsch, S., Gunter, T., & Friederici, A. D. (2000). Brain indices of music processing:

"Nonmusicians" are musical. Journal of Cognitive Neuroscience, 12(3), 520-541.

Koelsch, S., & Siebel, W. A. (2005). Towards a neural basis of music perception. Trends in

Cognitive Sciences, 9(12), 578-584.

Krumhansl, C. L. (1987). General properties of musical pitch systems: Some psychological

considerations. In J. Sundberg (Ed.), Harmony and Tonality. Stockholm: Royal Swedish

Academy of Music.

Krumhansl, C. L. (1990). Cognitive foundations of musical pitch. New York: Oxford University

Press.

Krumhansl, C. L. (1995a). Effects of musical context on similarity and expectancy.

Systematische Musikwissenschaft, 3, 211-250.

Krumhansl, C. L. (1995b). Music psychology and music theory: Problems and prospects. Music

Theory Spectrum, 17, 53-80.

Krumhansl, C. L. (2000). Rhythm and pitch in music cognition. Psychological Bulletin, 126(1),

159-179.

Krumhansl, C. L., Bharucha, J. J., & Kessler, E. J. (1982). Perceived harmonic structure of

chords in three related musical keys. Journal of Experimental Psychology: Human

Perception & Performance, 8(1), 24-36.

Krumhansl, C. L., Sandell, G. J., & Sergeant, D. C. (1987). The perception of tone hierarchies

and mirror forms in twelve-tone serial music. Music Perception, 5(1), 31-78.

Krumhansl, C. L., & Shepard, R. N. (1979). Quantification of the hierarchy of tonal functions

within a diatonic context. Journal of Experimental Psychology: Human Perception &

Performance, 5(4), 579-594.

Krumhansl, C. L., Toivanen, P., Eerola, T., Toiviainen, P., Järvinen, T., & Louhivuori, J. (2000).

Cross-cultural music cognition: Cognitive methodology applied to North Sami yoiks.

Cognition, 76, 13-58.

Laitz, S. (2008). The complete musician: An integrated approach to tonal theory, analysis, and

listening. (2nd ed.). New York: Oxford University Press.

Lerdahl, F., & Jackendoff, R. (1983). A generative theory of tonal music. Cambridge: MIT Press.

Loui, P., & Wessel, D. (2008). Learning and liking an artificial musical system: Effects of set

size and repeated exposure. Musicae Scientiae, 12(2), 207-230.


Loui, P., Wessel, D. L., & Kam, C. L. H. (2010). Humans rapidly learn grammatical structure in

a new musical scale. Music Perception, 27(5), 377-388.

Loui, P., Wu, E. H., Wessel, D. L., & Knight, R. T. (2009). A generalized mechanism for

perception of pitch patterns. The Journal of Neuroscience, 29(2), 454-459.

MacMillan, N. A., & Creelman, C. D. (2005). Detection Theory: A User's Guide (2nd. ed.).

Mahwah, N.J.: Lawrence Erlbaum Associates.

Maess, B., Koelsch, S., Gunter, T. C., & Friederici, A. D. (2001). Musical syntax is processed in

Broca's area: An MEG study. Nature Neuroscience, 4(5), 540-545.

Manzara, L. C., Witten, I. H., & James, M. (1992). On the entropy of music: An experiment with Bach chorale melodies. Leonardo Music Journal, 2, 81-88.

Margulis, E. H. (2005). A model of melodic expectation. Music Perception, 22(4), 663-714.

Marmel, F., & Tillmann, B. (2009). Tonal priming beyond tonics. Music Perception, 26(3), 211-221.

Marmel, F., Tillmann, B., & Delbe, C. (2010). Priming in melody perception: Tracking down the strength of cognitive expectations. Journal of Experimental Psychology: Human Perception & Performance, 36(4), 1016-1028.

Marmel, F., Tillmann, B., & Dowling, W. J. (2008). Tonal expectations influence pitch perception. Perception & Psychophysics, 70(5), 841-852.

Meulemans, T., & Van der Linden, M. (1997). Associative chunk strength in artificial grammar learning. Journal of Experimental Psychology: Learning, Memory, & Cognition, 23(4), 1007-1028.

Meyer, L. B. (1956). Emotion and meaning in music. Chicago, IL: University of Chicago Press.

Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63(2), 81-97.

Narmour, E. (1990). The analysis and cognition of basic melodic structures: The implication-realization model. Chicago, IL: University of Chicago Press.

Narmour, E. (1992). The analysis and cognition of melodic complexity: The implication-realization model. Chicago, IL: University of Chicago Press.

Oram, N., & Cuddy, L. L. (1995). Responsiveness of Western adults to pitch-distributional information in melodic sequences. Psychological Research, 57, 103-118.

Pearce, M. T., & Wiggins, G. A. (2006). Expectation in melody: The influence of context and learning. Music Perception, 23(5), 377-405.

Perruchet, P., & Pacton, S. (2006). Implicit learning and statistical learning: One phenomenon, two approaches. Trends in Cognitive Sciences, 10(5), 233-238.

Poletiek, F. H., & van Schijndel, T. J. P. (2009). Stimulus set size and statistical coverage of the grammar in artificial grammar learning. Psychonomic Bulletin & Review, 16(6), 1058-1064.

Rameau, J. (1971). Treatise on harmony. Mineola, N.Y.: Dover Publications.

Rohrmeier, M., Rebuschat, P., & Cross, I. (2011). Incidental and online learning of melodic structure. Consciousness and Cognition, 20, 214-222.

Saffran, J. R. (2003a). Absolute pitch in infancy and adulthood: The role of tonal structure. Developmental Science, 6(1), 37-49.

Saffran, J. R. (2003b). Musical learning and language development. Annals of the New York Academy of Sciences, 999, 1-5.

Saffran, J. R., Aslin, R. N., & Newport, E. L. (1996). Statistical learning by 8-month-old infants. Science, 274, 1926-1928.

Saffran, J. R., & Griepentrog, G. J. (2001). Absolute pitch in infant auditory learning: Evidence for developmental reorganization. Developmental Psychology, 37(1), 74-85.

Saffran, J. R., Johnson, E. K., Aslin, R. N., & Newport, E. L. (1999). Statistical learning of tone sequences by human infants and adults. Cognition, 70, 27-52.

Saffran, J. R., Newport, E. L., & Aslin, R. N. (1996). Word segmentation: The role of distributional cues. Journal of Memory and Language, 35, 606-621.

Saffran, J. R., Reeck, K., Niebuhr, A., & Wilson, D. (2005). Changing the tune: The structure of the input affects infants' use of absolute and relative pitch. Developmental Science, 8(1), 1-7.

Schellenberg, E. G. (1996). Expectancy in melody: Tests of the implication-realization model. Cognition, 58, 75-125.

Schenker, H. (1954). Harmony (E. M. Borgese, Trans.). Cambridge: MIT Press.

Schmuckler, M. A. (1989). Expectation in music: Investigation of melodic and harmonic processes. Music Perception, 7(2), 109-150.

Schmuckler, M. A. (1997). Expectancy effects in memory for melodies. Canadian Journal of Psychology, 51(4), 292-305.

Smith, N. A., & Schmuckler, M. A. (2004). The perception of tonal structure through the differentiation and organization of pitches. Journal of Experimental Psychology: Human Perception & Performance, 30(2), 268-286.

Tekman, H. G., & Bharucha, J. J. (1998). Implicit knowledge versus psychoacoustic similarity in priming of chords. Journal of Experimental Psychology: Human Perception & Performance, 24(1), 252-260.

Thompson, W. F., Cuddy, L. L., & Plaus, C. (1997). Expectancies generated by melodic intervals: Evaluation of principles of melodic implication in a melody production task. Perception & Psychophysics, 59, 1069-1076.

Tillmann, B. (2005). Implicit investigations of tonal knowledge in nonmusician listeners. Annals of the New York Academy of Sciences, 1060, 100-110.

Tillmann, B., Bharucha, J. J., & Bigand, E. (2000). Implicit learning of tonality: A self-organizing approach. Psychological Review, 107(4), 885-913.

Tillmann, B., Bigand, E., & Pineau, M. (1998). Effects of global and local contexts on harmonic expectancy. Music Perception, 16(1), 99-117.

Tillmann, B., Janata, P., Birk, J., & Bharucha, J. J. (2003). The costs and benefits of tonal centers for chord processing. Journal of Experimental Psychology: Human Perception & Performance, 29(2), 470-482.

Tillmann, B., Janata, P., Birk, J., & Bharucha, J. J. (2008). Tonal centers and expectancy: Facilitation or inhibition of chords at the top of the harmonic hierarchy? Journal of Experimental Psychology: Human Perception & Performance, 34(4), 1031-1043.

Tillmann, B., & Poulin-Charronnat, B. (2010). Auditory expectations for newly acquired structures. The Quarterly Journal of Experimental Psychology, 63(8), 1646-1664.

Trainor, L. J., & Trehub, S. E. (1992). A comparison of infants' and adults' sensitivity to Western musical structure. Journal of Experimental Psychology: Human Perception & Performance, 18(2), 394-402.

Trainor, L. J., & Trehub, S. E. (1993). Musical context effects in infants and adults: Key distance. Journal of Experimental Psychology: Human Perception & Performance, 19(3), 615-626.

Trehub, S. E. (2003). Absolute and relative pitch processing in tone learning tasks. Developmental Science, 6(1), 44-45.

Trehub, S. E., Schellenberg, E. G., & Kamenetsky, S. B. (1999). Infants' and adults' perception of scale structure. Journal of Experimental Psychology: Human Perception & Performance, 25(4), 965-975.

Unyk, A. M., & Carlsen, J. C. (1987). The influence of expectancy on melodic perception. Psychomusicology, 7, 3-23.