Neuroscience 290 (2015) 175–184
TIME COURSE OF THE INFLUENCE OF MUSICAL EXPERTISE ON THE PROCESSING OF VOCAL AND MUSICAL SOUNDS
S. RIGOULOT, a,b* M. D. PELL a,c AND J. L. ARMONY a,b
a Centre for Research on Brain, Language and Music (CRBLM), Montreal, Canada
b Department of Psychiatry, McGill University and Douglas Mental Health University Institute, Montreal, Canada
c School of Communication Sciences and Disorders, McGill University, Canada
Abstract—Previous functional magnetic resonance imaging
(fMRI) studies have suggested that different cerebral
regions preferentially process human voice and music.
Yet, little is known about the temporal course of the brain
processes that decode the category of sounds and how
expertise in one sound category can impact these
processes. To address this question, we recorded the
electroencephalogram (EEG) of 15 musicians and 18 non-
musicians while they were listening to short musical
excerpts (piano and violin) and vocal stimuli (speech and
non-linguistic vocalizations). The task of the participants
was to detect noise targets embedded within the stream of
sounds. Event-related potentials revealed an early differenti-
ation of sound category, within the first 100 ms after the
onset of the sound, with mostly increased responses to
musical sounds. Importantly, this effect was modulated by
the musical background of participants, as musicians were
more responsive to music sounds than non-musicians, con-
sistent with the notion that musical training increases sensi-
tivity to music. In late temporal windows, brain responses
were enhanced in response to vocal stimuli, but musicians
were still more responsive to music. These results shed
new light on the temporal course of neural dynamics of
auditory processing and reveal how it is impacted by the
stimulus category and the expertise of participants.
© 2015 IBRO. Published by Elsevier Ltd. All rights reserved.
Key words: music, voice, vocalizations, speech prosody, ERPs, expertise.

*Correspondence to: S. Rigoulot, Douglas Mental Health University Institute, 6875 LaSalle Boulevard, Montreal, Quebec H4H 1R3, Canada. E-mail address: [email protected] (S. Rigoulot).

Abbreviations: EEG, electroencephalogram; ERP, event-related potential; fMRI, functional magnetic resonance imaging; PCA, principal component analysis; ROI, region of interest.
INTRODUCTION
When people are repeatedly exposed to the same type of
stimulus, they can develop a certain expertise, which
often leads to faster, better and less effortful processing
of this stimulus. This appears to be particularly true in
the case of musicians, who are more accurate than
non-musicians to discriminate musical timbre (Chartrand
and Belin, 2006) and sound duration (Rammsayer and
Altenmuller, 2006; Guclu et al., 2011). They also more
easily detect pitch violations within melody (Brattico
et al., 2006; Habibi et al., 2013) and synchronize more
precisely to sounds (Repp, 2010).
The neural correlates of such advantages have been
recently investigated. Using brain imaging techniques,
enhancement of brain activity in response to musical
sounds has been evidenced in musicians, relative to non-
musicians. For instance, functional magnetic resonance
imaging (fMRI) studies have shown that, although
the planum polare responds preferentially to musical
than to other complex sounds regardless of musical
expertise (Lai et al., 2012; Tierney et al., 2013), this pattern
is more prevalent in musicians than in non-musicians
(Angulo-Perkins et al., 2014). This is consistent with find-
ings that musical training is associated with altered gray
matter architecture in the left planum temporale
(Bermudez et al., 2009; Elmer et al., 2013). Using magne-
toencephalography, Pantev et al. (1998) also showed that
cortical responses to piano, but not to pure tones, were
greater in musicians than non-musicians. Furthermore,
the amplitude of these responses was correlated with the
age at which musicians began their musical training. Sev-
eral electroencephalographic studies have also revealed
an increase of the amplitude of event-related potential
(ERP) components (N100, P200, MMN, P300 among oth-
ers) in musicians (Trainor et al., 1999; Shahin et al., 2003,
2007; Jongsma et al., 2004; Magne et al., 2006; Musacchia
et al., 2007; Seppanen et al., 2012; Habibi et al., 2013;
Kaganovich et al., 2013; Ungan et al., 2013; Virtala et al.,
2014). For example, Shahin et al. (2003) found that highly
skilled violinists and pianists exhibited larger N1 and P2
responses compared with non-musicians when they pas-
sively listened to musical tones (violin, piano) and pure
tones matched in fundamental frequency to the musical
tones. Virtala et al. (2014) showed that musicians outper-
formed non-musicians in a discrimination task, a pattern
that was associated with a larger N1 amplitude in musi-
cians than in non-musicians. Another source of evidence
comes from the Mismatch Negativity (MMN, Naatanen
et al., 1978), a component reflecting pre-attentive auditory
processing, which is larger and/or earlier in musicians than
in non-musicians to pitch changes (Pantev et al., 1998;
Koelsch et al., 1999; Tervaniemi et al., 2001; Fujioka
et al., 2004). Finally, Ungan et al. (2013) found that
musicians’ better behavioral performance in an oddball
task was linked to earlier and significantly larger P300 to
rhythm changes.
Interestingly, there is some behavioral (Chartrand and
Belin, 2006; Zuk et al., 2013) and EEG (Schon et al.,
2004; Marie et al., 2011; Kuhnis et al., 2013; Elmer
et al., 2014; Francois et al., 2014; Jantzen et al., 2014;
for reviews, see Besson et al., 2007; Patel, 2011;
Shahin, 2011) evidence suggesting that musical expertise
is associated with enhanced processing of not only music,
but also vocal sounds, in particular speech. However, only
a couple of studies directly compared the effects of musi-
cal expertise in brain responses to both music and voice
in the same participants. Schon et al. (2004) manipulated
the fundamental frequency of final notes of melodies or
final words of sentences. When they played these stimuli
to musicians and non-musicians, they found that F0 viola-
tions in both language and music elicited large positive
components, with a shorter latency in musicians. Elmer
et al. (2014) used non-morphed and morphed sounds to
investigate if the expertise of participants in music and
speech (simultaneous interpreters) biased their percep-
tion of these morphs (speech to noise, music to noise
and speech to music). They found in the music-to-noise
condition that musicians and speech experts were simi-
larly biased to the musical part of the morphing, and linked
their results to the behavior of the N400 component.
Thus, although previous studies generally support the
notion of enhanced processing of both music and voice in
musicians, the lack of direct comparisons between
categories or the use of a restricted set of stimulus
types (e.g., only speech for the voice category) limits
their generalizability. Moreover, most of the previous
EEG studies focused on specific (early) ERP
components, leaving open the question of whether other
effects can also be observed at longer delays.
Therefore, the aim of this study was to further
determine whether musical expertise significantly
modulates the temporal characteristics of the processing
of auditory expressions conveyed through human voice
(using both speech and nonlinguistic vocalizations) and
musical sounds (short unfamiliar excerpts played with
piano or violin), using previously validated stimuli in the
same experimental session. Based on the literature
described above, we expected stronger responses in
musicians to music and speech sounds. The key
question, however, was whether similar effects would
also be observed for nonlinguistic vocalizations, which,
although also an important part of the human vocal
repertoire, are very different from speech in terms of their
acoustic characteristics. A second question was whether,
in addition to the predicted expertise-related effects in the
early ERP components (N1 and P2), there would also be
differences in later parts of the EEG response, once
more information about the stimuli becomes available.
EXPERIMENTAL PROCEDURES
Participants
Thirty-nine native Canadian French or English speakers
(18 men/21 women from 20 to 32 years old, mean age:
24.5 ± 3.6 years old) recruited through campus
advertisements participated in the study. Based on self-
report, 31 of the participants were right-handed and all
had normal hearing and normal/corrected-to-normal
vision. Before the experiment, each participant
completed a questionnaire to establish basic
demographic information (age, handedness, and
language abilities) and musical expertise (musical
training, instrument played, and daily activity).
Participants were then assigned into one of two groups
as a function of their musical expertise: participants who
had more than five years of musical training and were
playing at least one instrument on a daily basis were
considered to be musicians (8 men/10 women, mean
age: 24.2 years), whereas the other participants were
considered to be ‘‘non-musicians’’ (10 men/11 women,
mean age: 24.7 years). All members of the musicians
group were professional musicians or University music
students with extensive theoretical and practical training
(average duration of music practice: 12.5 years [range:
9–23]). In contrast, non-musicians had no or very little
formal music training (average duration of music practice:
1.2 years [range: 0–2]). Informed written consent was
obtained from each participant prior to entering the study
and they received $20 compensation for their participation.
Apparatus
Auditory and visual stimuli were presented via E-Prime 2
software (Psychology Software Tools) on a SONY Trinitron
monitor with an Intel Xeon® computer (3 GHz,
Windows XP) and through insert earphones (Etymotic
ER-2). EEG was recorded on 71 electrodes using a
Biosemi ActiveTwo system (Biosemi, Inc., Netherlands)
connected to an Intel Xeon® computer (3 GHz,
Windows XP).
Materials

Auditory stimuli from two different categories, music or human voice, were employed:
a. Musical sounds were short excerpts that were unfa-
miliar and followed the rules of Western tonal music.
On the basis of the behavioral ratings on the emo-
tional content of these excerpts (Vieillard et al.,
2008; Aube et al., 2013), we selected 64 musical
sounds, played by two types of instrument (piano
and violin, n= 32 each). These sounds included
expressions of fear, happiness, sadness or they
were ‘‘neutral’’ or ‘‘peaceful’’ (with valence ratings
significantly less positive than happy excerpts and
less negative than fearful or sad ones; Aube et al.,
in press). There were 8 different sounds for each
emotion and condition.
b. Human vocal sounds were non-linguistic vocaliza-
tions and pseudo-utterances. Non-linguistic vocal-
izations were selected from a previously employed
dataset (Armony et al., 2007; Fecteau et al.,
2007). They reliably expressed fear (screams), sad-
ness (cries) or happiness (pleasure or laughter), or
were emotionally neutral (coughs and yawns). Emo-
tionally inflected pseudo-utterances selected from
the database of Pell et al. (2009) exploited the pho-
nological and morpho-syntactic properties of Eng-
lish, in the absence of meaningful lexical-semantic
cues about emotion (e.g., Someone migged the pazing; Banse and Scherer, 1996; Pell et al.,
2011; Rigoulot and Pell, 2012). On the basis of
the results from the validation study (Pell et al.,
2009), we selected 32 sentences whose tone was
fearful, sad, happy, or neutral. Altogether, we pre-
sented 64 items of human vocal sounds (32 vocal-
izations and 32 speech, with the same number of
female and male voices), that were expressing fear,
sadness, happiness, or were neutral (n= 8 for
each emotion and condition).
Music and vocal sounds were 2-s long on average
(range: 1.5–3.0 s). The mean durations of the stimuli for
each condition are shown in Table 1. We also computed
some of the main acoustic parameters for each sound
category with MIRtoolbox (Lartillot et al., 2008; Table 1).
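For illustration only, the following Python sketch computes two of these descriptors, the RMS energy and a simple spectral-flux estimate, for a mono waveform; the file name, frame sizes and the exact flux definition are assumptions and do not reproduce the MIRtoolbox settings used in the paper.

```python
# Hypothetical sketch: RMS and a simple spectral-flux estimate for one sound.
# The paper used MIRtoolbox in MATLAB; frame sizes and the flux definition
# here are illustrative assumptions, not the published analysis settings.
import numpy as np
from scipy.io import wavfile

def rms(x):
    """Root mean square of the waveform (overall energy)."""
    return np.sqrt(np.mean(x ** 2))

def spectral_flux(x, frame_len=1024, hop=512):
    """Mean frame-to-frame change of the magnitude spectrum."""
    frames = [x[i:i + frame_len] * np.hanning(frame_len)
              for i in range(0, len(x) - frame_len, hop)]
    mags = [np.abs(np.fft.rfft(f)) for f in frames]
    diffs = [np.sqrt(np.sum((m2 - m1) ** 2)) for m1, m2 in zip(mags[:-1], mags[1:])]
    return np.mean(diffs)

sr, x = wavfile.read("stimulus.wav")        # hypothetical file name
x = x.astype(float) / np.max(np.abs(x))     # scale waveform to [-1, 1]
print(rms(x), spectral_flux(x))
```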
In addition, sixteen 500-ms pure tones (eight 1-kHz
and eight 100-Hz tones) were used as targets. All
sounds were normalized in terms of peak intensity using
Adobe Audition 3.0.1 (Adobe Systems, San Jose, CA,
USA) and were presented at 75 dB.
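The peak-intensity normalization step, done here with Adobe Audition, can be sketched in a few lines; the target peak value and the 16-bit output format below are illustrative assumptions.

```python
# Minimal sketch of peak normalization (the authors used Adobe Audition);
# the target peak and 16-bit output are illustrative assumptions.
import numpy as np
from scipy.io import wavfile

def normalize_peak(in_path, out_path, target_peak=0.9):
    sr, x = wavfile.read(in_path)
    x = x.astype(np.float64)
    x *= target_peak / np.max(np.abs(x))            # rescale so the peak hits the target
    wavfile.write(out_path, sr, (x * 32767).astype(np.int16))

normalize_peak("stimulus.wav", "stimulus_norm.wav")  # hypothetical file names
```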
Experimental design/procedure
Participants were invited to take part in a study on music
processing. After the experimental procedures were
explained to them, they were seated in a sound-
attenuated booth at a distance of 75 cm from the
computer screen. After the setup of the cap and
electrodes, the quality of the EEG signal was checked
and participants were informed about the problem of
artifacts and how to reduce them. E-prime 2 software
(Psychology Software Tools) was used for the auditory
presentation of music and vocal sounds via insert
earphones (ER-2 Tubephone, Etymotic Research).
We used 256 experimental trials (64 musical and 64
vocal sounds; n= 16 for each emotion and each sound
category) and 16 targets (pure tones). All sounds were presented in a pseudo-random order and then repeated once, in a different order.

Table 1. Summary of the main acoustical parameters for the total duration and the first 200 ms of each sound category (calculated with MIRtoolbox; Lartillot et al., 2008). The spectral flux conveys spectrotemporal information (variation of the spectrum over time; Marozeau et al., 2003). The attack quantifies the rising time of the energy during the transient portion of a signal. Articulation estimates the average silence ratio of the onset curve. Other measures were also computed, such as the Root Mean Square (RMS) and the Harmonic-to-Noise Ratio (HNR; Fecteau et al., 2007; Lima et al., 2013). dB: decibel; a.u.: arbitrary units; n.u.: no units.

                          Music                               Human voice
                          Piano          Sig.  Violin         Sig.  Vocalizations  Sig.  Speech
Total duration
  Duration (ms)           2192 ± 172           2106 ± 93            2088 ± 360           2268 ± 391
  Spectral flux (a.u.)    328 ± 47             321 ± 44             346 ± 83        *    279 ± 53
  Attack (a.u.)           2.88 ± 0.90          2.72 ± 0.68          2.3 ± 0.77      *    3.55 ± 0.56
  RMS (a.u.)              0.22 ± 0.05          0.18 ± 0.03          0.15 ± 0.0      *    0.10 ± 0.00
  HNR (dB)                5.9 ± 2.5      *     18.5 ± 5.3           10.0 ± 6.1      *    12.4 ± 3.6
First 200 ms
  Spectral flux (a.u.)    276 ± 41       *     238 ± 37       *     253 ± 47        *    175 ± 133
  Attack (a.u.)           11.8 ± 3.4     *     8.9 ± 2.8            10.1 ± 5.2           9.5 ± 2.8
  RMS (a.u.)              0.26 ± 0.08    *     0.21 ± 0.09    *     0.19 ± 0.1      *    0.10 ± 0.04
  HNR (dB)                8.0 ± 5.9      *     19.5 ± 8.7     *     7.1 ± 6.2       *    12.6 ± 6.0

* p < 0.05.
Each trial started with a central fixation cross to
reduce eye movements. The duration of the cross
varied from 750 to 1000 ms and was followed by the
auditory stimulus. A variable delay was introduced after
each sound so that the stimulus onset asynchrony was
always 4000 ms. Subjects were instructed to listen
carefully to the stimuli and to press the down arrow of a
keyboard placed in front of them as soon as they heard
the target. Participants were presented with five practice
trials at the beginning to adjust the volume of the ear-
phones and to familiarize themselves with the
procedures and materials. The entire experiment lasted
approximately one hour and a half.
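As a minimal sketch of the timing constraint (the experiment itself was run in E-Prime 2), the per-trial silence needed to keep the 4000-ms stimulus onset asynchrony can be computed as follows; whether the next trial's fixation period counts toward the SOA is an assumption made here for illustration.

```python
# Illustrative bookkeeping for a fixed 4000-ms stimulus onset asynchrony.
# Assumption (not stated in the paper): the SOA runs from one sound onset to
# the next and therefore spans the sound, the post-stimulus delay, and the
# following trial's fixation cross. The experiment itself used E-Prime 2.
import random

SOA_MS = 4000

def post_stimulus_delay(stimulus_ms, next_fixation_ms):
    """Silence to insert after a sound so that sound onsets stay 4000 ms apart."""
    return SOA_MS - stimulus_ms - next_fixation_ms

fixation = random.randint(750, 1000)           # jittered fixation duration
print(post_stimulus_delay(2192, fixation))     # e.g., delay after a 2192-ms excerpt
```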
EEG recording and analysis
EEG was recorded from 71 active electrodes at a
sampling rate of 1024 Hz, using a Biosemi ActiveTwo
system (Biosemi, Inc., Netherlands; reference-free
system). Two electrodes were placed bilaterally at the
mastoid, and four additional electrodes were placed for
vertical and horizontal electro-oculogram recording: two
at the outer canthi of the eyes and one above and
below each eye. The EEG was down-sampled to
250 Hz for analyses (EEGLab software, Delorme and
Makeig, 2004; version 9) with MATLAB (R2010b, 7.11)
and re-referenced to the average of all electrodes. After
the recording, a band-pass filter (0.016–30 Hz) was
applied offline. The rejection of artifacts, in particular
eye movements and blinks, was performed using EEG-
Lab and ERPLab plug-in (Lopez-Calderon and Luck,
2014) in a semi-automatic way. When an electrode was
consistently bad, the signal of this electrode was recon-
structed by linear interpolation (no more than three elec-
trodes were interpolated for each participant). For the
rejection of artifacts, we first used EEGLab to automati-
cally reject trials with linear drifts; second we used ERP-
Lab to automatically remove trials in which the
amplitude of the voltage varied more than ±100 µV
within temporal windows of 200 ms; we also used ERP-
Lab to remove trials that contained horizontal eye move-
ments (step-like artifacts). Finally, we checked visually
all the trials and removed any remaining trials that were
still contaminated with artifacts. For all participants,
10–25% of the trials were removed by this procedure.
In the final step, EEG epochs (−200 to 1000 ms,
relative to stimulus onset) were time-locked to the
stimulus onset, baseline corrected (−200 to 0 ms), and
averaged offline according to the different experimental
categories.
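The preprocessing was carried out with EEGLAB/ERPLAB under MATLAB; an approximately equivalent pipeline can be sketched in MNE-Python as below. The file name, event codes, and the translation of the ±100 µV moving-window criterion into a single peak-to-peak rejection threshold are illustrative assumptions rather than the published settings.

```python
# Rough MNE-Python equivalent of the preprocessing described above
# (the authors used EEGLAB/ERPLAB in MATLAB). File name, event codes and the
# mapping of the +/-100 uV moving-window criterion onto one peak-to-peak
# threshold are illustrative assumptions.
import mne

raw = mne.io.read_raw_bdf("subject01.bdf", preload=True)   # Biosemi ActiveTwo file
raw.resample(250)                                           # down-sample to 250 Hz
raw.filter(0.016, 30.0)                                     # 0.016-30 Hz band-pass
raw.set_eeg_reference("average")                            # average reference

events = mne.find_events(raw)                               # stimulus triggers
epochs = mne.Epochs(
    raw, events,
    event_id={"music": 1, "voice": 2},                      # hypothetical trigger codes
    tmin=-0.2, tmax=1.0,                                    # -200 to 1000 ms epochs
    baseline=(-0.2, 0.0),                                   # baseline correction
    reject=dict(eeg=200e-6),                                # ~ +/-100 uV criterion
    preload=True,
)
evoked_music = epochs["music"].average()                    # per-category ERPs
evoked_voice = epochs["voice"].average()
```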
Statistical analyses
Six participants (two female non-musicians, one male non-
musician, one female musician, two male musicians)
were excluded because too many trials had to be
rejected (more than 30% of trials). Therefore, data of 33
participants were considered in all statistical analyses.
In a first step, we used the properties of principal
component analysis (PCA) to define spatial regions of
interest (Spencer et al., 1999, 2001). We performed a
spatial PCA (Varimax rotation, SPSS V.20 software) with
64 electrode sites as dependent variables and time points
(249), participants (33), and conditions (music/voice) as
observations (Pourtois et al., 2008). Each spatial factor
represents a specific spatial configuration of brain activity
and the factor loading corresponds to the spatial factor’s
contribution to the original variables (i.e., how much the
spatial factor accounts for the voltage recorded at each
electrode). These spatial configurations can be visualized
by topographic maps of factor loadings (Cartool software
v.3.52, D. Brunet, https://sites.google.com/site/fbmlab/
cartool) and are usually defined by considering electrodes
with the highest factor loadings (Delplanque et al., 2006;
Rigoulot et al., 2008, 2011, 2012; Rigoulot and Pell,
2012). Here, a group of electrodes was identified as a
region of interest (ROI) when the loadings of these elec-
trodes were greater than 0.707, corresponding to more
than 50% of the data variance being explained.
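A minimal sketch of this ROI-definition step is shown below (the paper used SPSS); the NumPy-based Varimax implementation, the use of scikit-learn's PCA, and the shape of the placeholder data matrix are assumptions for illustration.

```python
# Sketch of the spatial-PCA ROI definition: observations are
# (time points x participants x conditions) and variables are the 64 electrodes.
# The paper used SPSS; this NumPy/scikit-learn version is only illustrative.
import numpy as np
from sklearn.decomposition import PCA

def varimax(loadings, gamma=1.0, n_iter=100, tol=1e-6):
    """Standard Varimax rotation of a loading matrix (variables x factors)."""
    p, k = loadings.shape
    rotation = np.eye(k)
    var = 0.0
    for _ in range(n_iter):
        lam = loadings @ rotation
        u, s, vt = np.linalg.svd(
            loadings.T @ (lam ** 3 - (gamma / p) * lam @ np.diag(np.sum(lam ** 2, axis=0)))
        )
        rotation = u @ vt
        new_var = np.sum(s)
        if new_var - var < tol:
            break
        var = new_var
    return loadings @ rotation

# Placeholder data: 249 time points x 33 participants x 2 conditions stacked
# along the first axis, 64 electrodes as variables.
erp = np.random.randn(249 * 33 * 2, 64)
pca = PCA(n_components=6).fit(erp)
loadings = varimax(pca.components_.T * np.sqrt(pca.explained_variance_))
# An ROI groups the electrodes whose rotated loading exceeds 0.707 on a factor.
rois = [np.where(np.abs(loadings[:, f]) > 0.707)[0] for f in range(loadings.shape[1])]
```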
To analyze the electrophysiological data, we focused
first on the classical early evoked responses by auditory
stimuli, the N100 and P200 components. To investigate
the effects on the N100 component, we analyzed the
amplitude and the latency of the peaks between 70 and
140 ms in the fronto-central ROI defined by the spatial
PCA, where the N100 is maximal (Naatanen and Picton,
1987). For the P200 component, we analyzed the ampli-
tude and the latency of the peaks between 140 and
300 ms in the central ROI, where the P200 is maximal
(Crowley and Colrain, 2004). These temporal windows
were defined after examination of the grand average
and of the individual peaks, and are similar to the ones
used in the literature. For the first approach, we per-
formed Greenhouse–Geisser corrected ANOVAs on the
peak amplitudes and latencies of N100 and P200 with
sound category (music or vocal sounds) as within-subject
factor and musical expertise as between-subject factor.
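In practice, extracting peak amplitude and latency amounts to taking the extremum of the ROI-averaged waveform within each window; the sketch below follows the sampling rate and epoch limits reported above, while the waveform itself is a placeholder.

```python
# Sketch of peak extraction: N100 = most negative point in 70-140 ms,
# P200 = most positive point in 140-300 ms, on the ROI-averaged waveform.
# Sampling rate (250 Hz) and epoch start (-200 ms) follow the Methods section.
import numpy as np

FS = 250.0            # Hz, after down-sampling
EPOCH_START = -0.2    # s, epoch starts 200 ms before stimulus onset

def peak(waveform, t_min, t_max, polarity):
    """Return (amplitude, latency in ms) of the peak inside [t_min, t_max] (s)."""
    times = EPOCH_START + np.arange(waveform.size) / FS
    window = (times >= t_min) & (times <= t_max)
    segment = waveform[window]
    idx = np.argmin(segment) if polarity == "neg" else np.argmax(segment)
    return segment[idx], times[window][idx] * 1000.0

roi_erp = np.random.randn(300)                      # placeholder ROI-averaged ERP
n100_amp, n100_lat = peak(roi_erp, 0.070, 0.140, "neg")
p200_amp, p200_lat = peak(roi_erp, 0.140, 0.300, "pos")
```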
In a second approach, we analyzed the mean
amplitudes of the electrophysiological responses in 100-
ms sliding temporal windows, from 0 to 900 ms after the
onset of the stimulus. Mean amplitudes in these
temporal windows were analyzed through ANOVAs for
each ROI revealed by sPCA with sound category as
within-subject factor and musical expertise as between-
subject factor. A p < 0.05 was used as the threshold for
statistical significance. When a domain-by-expertise
interaction (the main focus of the study) was found to
be significant, we compared electrophysiological
responses to music and vocal sounds in musicians and
non-musicians with t-tests, Bonferroni-corrected for
multiple tests (p < 0.05).
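The windowed analysis can be sketched as follows: average each subject's ROI waveform within consecutive 100-ms windows, then compare music and voice within a group using Bonferroni-corrected paired t-tests. All arrays below are placeholders, and correcting over the nine windows is an assumption about how the correction was applied.

```python
# Sketch of the second analysis: mean amplitude in nine consecutive 100-ms
# windows (0-900 ms post-onset), then Bonferroni-corrected paired t-tests
# comparing music vs. voice within one group. All data here are placeholders.
import numpy as np
from scipy import stats

FS = 250                     # Hz, sampling rate after down-sampling
EPOCH_START = -0.2           # s
N_WINDOWS = 9                # nine 100-ms windows covering 0-900 ms

def window_means(epoch_roi):
    """Mean amplitude of one ROI-averaged epoch in each 100-ms window."""
    times = EPOCH_START + np.arange(epoch_roi.size) / FS
    return np.array([epoch_roi[(times >= w * 0.1) & (times < (w + 1) * 0.1)].mean()
                     for w in range(N_WINDOWS)])

# One row per subject of one group (placeholder 300-sample epochs here).
music = np.array([window_means(np.random.randn(300)) for _ in range(15)])
voice = np.array([window_means(np.random.randn(300)) for _ in range(15)])

alpha = 0.05 / N_WINDOWS     # Bonferroni correction over the nine windows
for w in range(N_WINDOWS):
    t, p = stats.ttest_rel(music[:, w], voice[:, w])
    print(f"{w*100}-{(w+1)*100} ms: t = {t:.2f}, p = {p:.4f}, significant: {p < alpha}")
```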
RESULTS
The initial spatial PCA yielded six factors explaining 78%
of the data variance. We determined groups of electrodes
of interest by clustering the electrodes that were
accounting for more than 50% of the variance of the
data pertaining to each factor. Six groups of electrodes,
each defining an ROI, were identified (see Fig. 1): ROIs
were located in the fronto-central (associated with
electrodes F1, F2, Fz, FC1, FC2, FC4, Cz; ROI1); left
temporal (C5, FT7, T7; ROI2); centro-parietal (CP1,
CP2, CPz, P1, P2, Pz; ROI3); anterior frontal area
(AF7, AF3, AF4, AF8, AFz, Fp1, Fpz, Fp2; ROI4); right
temporal area (C6, FT8, FC6, T8; ROI5) and parieto-
occipital areas of the scalp (electrodes PO7, PO8, Iz,
O1, Oz, O2; ROI6).

Fig. 1. Factor loadings obtained from the spatial PCA (bottom; n.u.) for each electrode, mapped to illustrate where the different ROIs were defined (top). It should be noted that 0.707 is an arbitrary threshold and represents 50% of the variance of the data.
(a) N100: Peak amplitudes for the fronto-central ROI were analyzed in the 70–140-ms time window with a repeated-measures ANOVA with sound category (music/voice) as within-subject factor and musical expertise as between-subject factor. The analysis revealed an effect of sound category (F(1,31) = 38.68; p < 0.001), with no significant interaction with the expertise of participants (F(1,31) = 0.91; p = 0.35). Amplitudes in response to music were more negative than the amplitudes in response to vocal expressions (music: −2.5 µV ± 1.4; voice: −1.8 µV ± 1.3; Fig. 2A), with no significant differences between piano and violin (piano: −2.4 µV ± 1.3; violin: −2.6 µV ± 1.5; p = 0.56) or between speech and vocalizations (vocalizations: −1.8 µV ± 1.4; speech: −1.9 µV ± 1.2; p = 0.99). The same analysis performed on the latency of the peaks did not show any significant main effects of sound category (F(1,31) = 0.78; p = 0.38) or interaction with musical expertise (F(1,31) = 0.01; p = 0.90).
(b) P200: Peak amplitudes and latencies of P200 for the central ROI were analyzed similarly to the N100. The analysis on the peak amplitude revealed a main effect of sound category (F(1,31) = 50.45; p < 0.001), with more positive amplitudes in response to musical than vocal sounds (Fig. 2B), with the same pattern of activation for piano and violin (piano: 2.8 µV ± 1.2; violin: 2.8 µV ± 1.2; p = 0.98) and for speech and vocalizations (vocalizations: 1.9 µV ± 1.4; speech: 1.6 µV ± 1.4; p = 0.18). The analysis on the latency of peaks also revealed an effect of sound category (F(1,31) = 5.92; p = 0.02), with peaks being later
for vocal than musical sounds (207 ms ± 32 vs.
199 ms ± 31). We found that this effect was mainly
driven by speech, as the latency of its peak was
longer than that of the other sound categories
(ps < 0.001), which were not different from each other (ps > 0.34). There were no other significant main effects or interactions with the other factors (ps > 0.25).

Fig. 2. Illustration of the early effects of sound category (N100–P200). Grand-average ERPs at FCz and CPz (where the effects are maximal; see topographies) elicited by vocal (dotted lines) and musical (solid lines) sounds in musicians (red) and non-musicians (blue). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
(c) ROI analysis: Mean amplitudes for the successive
temporal windows post-stimulus onset revealed in
all the six ROIs defined by the PCA a significant
interaction between sound category and time win-
dows (ps < 0.001). In the fronto-central (ROI1),
amplitudes in response to music were larger than
to vocal expressions, whereas the opposite pattern
was observed in the anterior frontal (ROI4), right
temporal (ROI5) and parieto-occipital (ROI6) ROIs.
In left temporal (ROI2) and centro-parietal (ROI3)
ROIs, amplitudes were first stronger to music and
after 400 ms to speech. Of particular interest, there
were significant interactions between category and
musical expertise of the participants in the anterio-
frontal (ROI4) (p= 0.017) and the right temporal
(ROI5) (p= 0.018) regions. In the anterior frontal
area, this effect was mainly due to significant
differences between music and voice only in non-
musicians, whereas in the right temporal ROI the
interaction was the result of an overall larger
amplitude for voice than music in musicians with
no significant differences between categories for
non-musicians. As can be seen from the time
courses – shown in Figs. 3 and 4 for anterior frontal
and right temporal ROIs, respectively – and con-
firmed by t-tests (p < 0.05, Bonferroni-corrected
for multiple tests), the response pattern was
different for the early and late parts of the temporal
window. In the anterior frontal region, responses to
vocal expressions were larger than to music in non-
musicians during the first 500 ms (as early as
100 ms) after stimulus onset, with no significant dif-
ferences in the case of musicians. In contrast, late
responses to music were larger than to voice in both
groups (Fig. 3). For the right temporal ROI, activity
for vocal expressions was consistently more posi-
tive than for music in musicians, although the latter
elicited a larger (negative) amplitude in the later part
of the stimulus presentation. In contrast, non-musi-
cians showed no differences between categories in
any of the bins (see Fig. 4). The overall response patterns did not significantly differ between sub-categories (i.e., speech/vocalizations and piano/violin).

Fig. 3. Temporal course of the interaction between sound category and expertise in the anterio-frontal area. (A) Mean amplitude at anterio-frontal electrodes (AF7, AF3, AF4, AF8, AFz, Fp1, Fpz, Fp2) in response to music and vocal sounds in musicians and non-musicians. (B) Summary of patterns of activation in two temporal windows, one early (from 0 to 500 ms) and one late (from 500 to 900 ms), in response to musical and vocal sounds in musicians and non-musicians. (C) Difference of amplitude between vocal and musical sounds in musicians and non-musicians at the anterio-frontal cluster of electrodes. Stars indicate significant differences from 0 (t-tests, Bonferroni-corrected for multiple comparisons).

Fig. 4. Temporal course of the interaction between sound category and expertise in the right temporal area. (A) Mean amplitude at right temporal electrodes (C6, FC6, FT8, T8) in response to music and vocal sounds in musicians and non-musicians. (B) Summary of patterns of activation in two temporal windows, one early (from 0 to 500 ms) and one late (from 500 to 900 ms), in response to musical and vocal sounds in musicians and non-musicians. (C) Difference of amplitude between vocal and musical sounds in musicians and non-musicians at the right temporal cluster of electrodes. Stars indicate significant differences from 0 (t-tests, Bonferroni-corrected for multiple comparisons).
DISCUSSION
The objective of this study was to investigate the temporal
course of the processing of vocal and musical
expressions and, particularly, to assess whether musical
expertise modulated these responses. We found an
early differentiation of the category of the sound,
within the first 100 ms after the onset of the stimulus,
which was modulated early on by the degree of
musical expertise of the participants, thus confirming
our hypotheses. The significance of these patterns is
discussed below.
Effects of sound category
When we compared music to vocal sounds, we found
larger (more negative N1 and more positive P2)
amplitudes in response to music than to vocal sounds.
Previous investigations comparing music and vocal
sounds are very scarce. In one study, Levy and
colleagues (Levy et al., 2001, 2003) used very short
sounds that were produced by different musical instru-
ments (e.g., violin, flute, trumpet, French horn) or sung
by different types of singers (e.g., Alto, Bass, Baritone,
Mezzo, Soprano). They found that music and vocal
sounds triggered ‘‘equivalent’’ P1, N1 and P2 compo-
nents, in both passive and active listening tasks. Similar
findings were reported in another study (Kaganovich
et al., 2013) in which neither the amplitude nor the latency
of the peak of N1 was modulated by the category of sound
(the vowel [a], a cello and a French Horn). In contrast, and
in agreement with our results, Meyer and colleagues
(Meyer et al., 2007) found N1 and P2 enhanced ampli-
tudes in response to music sounds (artificial piano tones),
compared to spoken syllables. The authors interpreted
these results as reflecting the more complex spectral pro-
file of musical stimuli, which fits well with our observations
that musical sounds were differentially processed from
both speech and nonlinguistic vocalizations. Moreover,
this interpretation is consistent with the fact that the N1 com-
ponent is a measure of early sensory encoding of the
physical properties of sound, such as frequency, com-
plexity and intensity (Naatanen and Picton, 1987) and that
P200 has been traditionally considered to also be modu-
lated by physical features of the stimulus (although there
is also evidence that P200 latency and amplitude are sen-
sitive to learning and attention processes among other
factors; Crowley and Colrain, 2004). Importantly, these
ERP differences could not be explained by category differ-
ences of any of the most common acoustical features
(see Table 1), given that any acoustic parameter that
significantly differed between music and voice also was
different between subcategories (i.e., piano vs. violin
and vocalizations vs. speech; see Table 1), even though
these showed similar EEG response patterns. It is there-
fore likely that a combination of some of these acoustical
features underlay these effects, though we cannot
exclude that other parameters or cognitive factors, such
as attention, could have played a role (Baumann et al.,
2008). It should be noted that we also found an early dif-
ferentiation of musical and vocal sounds in the other
areas of the scalp. As some areas seemed to be more
responsive to musical than to vocal sounds and others more
responsive to vocal than to musical sounds, these results
could suggest that there is a specialization of two different
pathways for the processing of music and voice, which
would be in line with our recent fMRI studies using the
same musical stimuli (Angulo-Perkins et al., 2014; Aube
et al., in press).
Late effects of human voice and music on ERP
components like the N400 or N500 have already been
described in priming paradigms or in language/music
violation detection tasks (Koelsch et al., 2004; Steinbeis
and Koelsch, 2008). These studies have found that syn-
tactic violations in music and speech elicit a late positivity
described as P600 or late positive component (LPC).
Here, we found that human voice triggered higher ampli-
tudes than musical sounds. Similar differences between
speech and music were also found in another study com-
paring the processing of musical (sine wave tone) and
verbal (spoken syllable) stimuli in working memory
(Bittrich et al., 2012). These authors found a larger
N400 amplitude for new compared to old items in the ver-
bal, but not the musical condition. These results, in agree-
ment with our data, suggest that, as could be expected,
differences in the processing of human voice and music
also are present at later latencies. As no semantic mean-
ing was conveyed by our speech stimuli (pseudo-sen-
tences), these late effects could reflect re-evaluation or
sustained attention to the vocalizations and speech pros-
ody, as observed in studies investigating how prosody
can help disambiguate the meaning of a message
(Brouwer et al., 2012; Rigoulot et al., 2014) or how emo-
tional prosody affects early and late ERP components
(Paulmann et al., 2013). Importantly, we found in the
present study that the effects on sound category were
modulated by the expertise of the participants.
Influence of musical expertise
We found that early and late ERP components were
modulated by the musical expertise of participants. In
the anterior frontal area, these effects appeared early, at
100 ms post stimulus onset, and were due to lower
responses to musical sounds than to vocal expressions
for non-musicians and the opposite pattern in the case
of musicians. As shown in Fig. 3, early on musicians
responded similarly to vocal and musical sounds,
whereas non-musicians were only responsive to vocal
sounds. In both groups, responses to music increased
over time (albeit more slowly in the case of non-
musicians), so that after about 1000 ms after stimulus
onset, responses to music were larger than to voice.
These patterns show that expertise modulates
responses to sounds of different categories and that
musicians show an early sensitivity to musical sounds
compared to non-musicians. These results are
consistent with some previous studies reporting that
music training increases the amplitude of early
components, like the N1 and P2 components to musical
notes (Shahin et al., 2003, 2005; Habibi et al., 2013).
For example, in the study reported above, Kaganovich
and colleagues (2013) found that N1 amplitude in
response to music, vocal and spectrally rotated sounds
was increased in musicians compared to non-musicians.
Importantly, these group differences are unlikely to
reflect differential sensitivity to basic acoustic
parameters, given that musicians were similarly
sensitive to musical stimuli played by piano and violin,
two types of instruments which are relatively different in
terms of acoustic properties. Moreover, we did not find
any difference between speech and vocalizations in the
anterio-frontal area, suggesting that vocal sounds were
processed similarly (but see Pell et al., submitted).
Altogether, these results suggest that musical training is
associated with a general enhancement in the neural
encoding of acoustic properties of complex sounds, and
that this effect generalizes to all types of sounds. For
example, an effect of expertise of musicians on the
processing of speech stimuli has even been described
on the amplitude of the P50, an ERP component which
peaks 50 ms after the onset of the sounds (Jantzen
et al., 2014). These authors presented speech stimuli dif-
fering in voice onset time (the duration of the delay
between release of closure and start of voicing) and found
using source analysis that musicians engage right hemi-
sphere areas (which are traditionally associated with the
processing of musical sounds) whereas the left hemi-
sphere homologs of these areas were more activated by
non-musicians. In agreement with this, several studies
using fMRI have shown that neural mechanisms involved
in the perception and processing of music overlap with
those devoted to the processing of speech (e.g.,
Rogalsky et al., 2011; Angulo-Perkins et al., 2014) and
Patel proposed that different mechanisms like the repeti-
tion implied by musical training and attentional processes
would explain why musicians benefit from their expertise
to process sounds from other categories (OPERA hypoth-
esis, Patel, 2011). The present study highlights the tem-
poral course of these influences and shows that these
effects of expertise can arise very early.
We also observed group differences in some of the
late components of the ERP responses to music and
voice. Other studies also reported late ERP differences
between musicians and non-musicians in oddball
paradigms in which participants detected music or
speech pitch violation (Besson and Faita, 1995; Granot
and Donchin, 2002; Fitzroy and Sanders, 2012; Habibi
et al., 2013). Interestingly, previous studies also sug-
gested that the amplitude of the late components can
be reduced (more negative) when fewer resources are
needed (e.g., Kaan et al., 2000). In our case, the more
negative amplitudes observed for music specifically for
musicians could thus be interpreted as evidence that
musicians need fewer resources to process musical
sounds, possibly given their expertise in the domain. Alto-
gether, our results tend to confirm the idea that musical
training enhances brain sensitivity to musical sounds,
which is also in agreement with several fMRI studies
showing that training in music can lead to important func-
tional reorganization in the brain (Pantev et al., 1998).
Acknowledgements—We are grateful to Mihaela Felezeu for help
with the EEG recording. We also thank William Aube and
Bernard Bouchard for providing the musical stimuli and Isabelle
Peretz for helpful discussions. This research was funded by
grants from the Canadian Institutes of Health Research (CIHR)
and the National Science and Engineering Research Council of
Canada (NSERC) to JLA.
REFERENCES
Angulo-Perkins A, Aube W, Peretz I, Barrios FA, Armony J, Concha L
(2014) Music listening engages specific cortical regions within the
temporal lobes: differences between musicians and non-
musicians. Cortex 59:126–137.
Armony JL, Chochol C, Fecteau S, Belin P (2007) Laugh (or cry) and
you will be remembered. Psychol Sci 18(12):1027–1029.
Aube W, Peretz I, Armony JL (2013) The effects of emotion on
memory for music and vocalisations. Memory 21(8):981–990.
Aube W, Angulo-Perkins A, Peretz I, Concha L, Armony JL (2014)
Fear across the senses: brain responses to music, vocalizations
and facial expressions. Soc Cogn Affect Neurosci (in press).
Banse R, Scherer KR (1996) Acoustic profiles in vocal emotion
expression. J Pers Soc Psychol 70(3):614–636.
Baumann S, Meyer M, Jancke L (2008) Enhancement of auditory-
evoked potentials in musicians reflects an influence of expertise
but not selective attention. J Cogn Neurosci 20(12):2238–2249.
Bermudez P, Lerch JP, Evans AC, Zatorre RJ (2009)
Neuroanatomical correlates of musicianship as revealed by
cortical thickness and voxel-based morphometry. Cereb Cortex
19:1583–1596.
Besson M, Faita F (1995) An event-related potential (ERP) study of
musical expectancy: comparison of musicians with nonmusicians.
J Exp Psychol Hum Percept Perform 21(6):1278–1296.
Besson M, Schon D, Moreno S, Santos A, Magne C (2007) Influence
of musical expertise and musical training on pitch processing in
music and language. Restorative Neurol Neurosci 25:398–410.
Bittrich K, Schulze K, Koelsch S (2012) Electrophysiological
correlates of verbal and tonal working memory. Brain Res
1432:84–94.
Brattico E, Tervaniemi E, Naatanen R, Peretz I (2006) Musical scale
properties are automatically processed in the human auditory
cortex. Brain Res 1117:162–174.
Brouwer T, Fitz H, Hoeks J (2012) Getting real about semantic
illusions: rethinking the functional role of the P600 in language
comprehension. Brain Res 1446:127–143.
Chartrand J-P, Belin P (2006) Superior voice timbre processing in
musicians. Neurosci Lett 405:164–167.
Crowley KE, Colrain IM (2004) A review of the evidence for P2 being
an independent component process: age, sleep and modality.
Clin Neurophysiol 115(4):732–744.
Delorme A, Makeig S (2004) EEGLAB: an open source toolbox for
analysis of single-trial EEG dynamics. J Neurosci Methods
134:9–21.
Delplanque S, Silvert L, Hot P, Rigoulot S, Sequeira H (2006) Arousal
and valence effects on event-related P3a and P3b during
emotional categorization. Int J Psychophysiol 60(3):315–322.
Elmer S, Hanggi J, Meyer M, Jancke L (2013) Increased cortical
surface area of the left planum temporale in musicians facilitates
the categorization of phonetic and temporal speech sounds.
Cortex 49:2812–2821.
Elmer S, Klein C, Kuhnis J, Liem F, Meyer M, Jancke L (2014) Music
and language expertise influence the categorization of speech
and musical sounds: behavioral and electrophysiological
measurements. J Cogn Neurosci 26(10):2356–2369.
Fecteau S, Belin P, Joanette Y, Armony JL (2007) Amygdala
responses to nonlinguistic emotional vocalizations. NeuroImage
36(2):480–487.
Fitzroy AB, Sanders LD (2012) Musical expertise modulates early
processing of syntactic violations in language. Front Psychol
3(January):603.
Francois C, Jaillet F, Takerkart S, Schon D (2014) Faster sound
stream segmentation in musicians than in nonmusicians. PLoS
One 7:e101340.
Fujioka T, Trainor LJ, Ross B, Kakigi R, Pantev C (2004) Musical
training enhances automatic encoding of melodic contour and
interval structure. J Cogn Neurosci 16(6):1010–1021.
Granot R, Donchin E (2002) Do Re Mi Fa Sol La Ti-constraints,
congruity and musical training: an event-related brain potentials
study of musical expectancies. Music Perception 19(4):487–528.
Guclu B, Sevinc E, Canbeyli R (2011) Duration discrimination by
musicians and nonmusicians. Psychol Rep 108:675–687.
Habibi A, Wirantana V, Starr A (2013) Cortical activity during
perception of musical pitch: comparing musicians and
nonmusicians. Music Perception 30(5):463–479.
Jantzen MG, Howe BM, Jantzen KJ (2014) Neurophysiological
evidence that musical training influences the recruitment of
right hemisphere homologues for speech perception. Front
Psychol 5:e171.
Jongsma ML, Desain P, Honing H (2004) Rhythmic context
influences the auditory evoked potentials of musicians and non-
musicians. Biol Psychol 66(2):129–152.
Kaan E, Harris A, Gibson E, Holcomb P (2000) The P600 as an index
of syntactic integration difficulty. Lang Cogn Process 15:159–201.
Kaganovich N, Kim J, Herring C, Schumaker J, Macpherson M,
Weber-Fox C (2013) Musicians show general enhancement of
complex sound encoding and better inhibition of irrelevant
auditory change in music: an ERP study. Eur J Neurosci
37(8):1295–1307.
Koelsch S, Kasper E, Sammler D, Schulze K, Gunter T, Friederici AD
(2004) Music, language and meaning: brain signatures of
semantic processing. Nat Neurosci 7(3):302–307.
Koelsch S, Schroger E, Tervaniemi M (1999) Superior pre-attentive
auditory processing in musicians. NeuroReport 10(6):1309–1313.
Kuhnis J, Elmer S, Meyer M, Jancke L (2013) The encoding of vowels
and temporal speech cues in the auditory cortex of professional
musicians: an EEG study. Neuropsychologia 51(8):1608–1618.
Lai G, Pantazatos SP, Schneider H, Hirsch J (2012) Neural systems
for speech and song in autism. Brain 135:961–975.
Lartillot O, Toiviainen P, Eerola T (2008) A matlab toolbox for music
information retrieval. In: Preisah C, Burkhardt H, Schmidt-Thieme
L, Decker R, editors. Data analysis machine learning and
applications, studies in classification, data analysis, and
knowledge organization. Springer-Verlag. p. 261–268.
Levy DA, Granot R, Bentin S (2001) Processing specificity for human
voice stimuli: electrophysiological evidence. NeuroReport
12(12):2653–2657.
Levy DA, Granot R, Bentin S (2003) Neural sensitivity to human
voices: ERP evidence of task and attentional influences.
Psychophysiology 40(2):291–305.
Lima CF, Castro SL, Scott SK (2013) When voices get emotional: a
corpus of nonverbal vocalizations for research on emotion
processing. Behav Res Methods 45(4):1234–1245.
Lopez-Calderon J, Luck SJ (2014) ERPLAB: an open-source toolbox
for the analysis of event-related potentials. Front Hum Neurosci
8:213.
Magne C, Schon D, Besson M (2006) Musician children detect pitch
violations in both music and language better than nonmusician
children: behavioral and electrophysiological approaches. J Cogn
Neurosci 18(2):199–211.
Marie C, Magne C, Besson M (2011) Musicians and the metric
structure of words. J Cogn Neurosci 23(2):294–305.
Marozeau J, de Cheveigne A, McAdams S, Winsberg S (2003) The
dependency of timbre on fundamental frequency. J Acoust Soc
Am 114(5):2946.
Meyer M, Elmer S, Baumann S, Jancke L (2007) Short-term plasticity
in the auditory system: differential neural responses to perception
and imagery of speech and music. Restorative Neurol Neurosci
25(3–4):411–431.
Musacchia G, Sams M, Skoe E, Kraus N (2007) Musicians have
enhanced subcortical auditory and audiovisual processing of
speech and music. PNAS 104(40):15894–15898.
Naatanen R, Gaillard AWK, Mantysalo S (1978) Early selective-
attention effect on evoked potential reinterpreted. Acta Psychol
42:313–329.
Naatanen R, Picton T (1987) The N1 wave of the human electric and
magnetic response to sound: a review and an analysis of the
component structure. Psychophysiology 24(4):375–425.
Pantev C, Oostenveld R, Engelien A, Ross B, Roberts LE, Hoke M
(1998) Increased auditory cortical representation in musicians.
Nature 392(6678):811–814.
Patel AD (2011) Why would musical training benefit the neural
encoding of speech? The OPERA hypothesis. Front Psychol
2:e142.
Paulmann S, Bleichner M, Kotz SA (2013) Valence, arousal, and task
effects in emotional prosody processing. Front Psychol 4:e345.
Pell MD, Kotz SA (2011) On the time course of vocal emotion
recognition. PLoS One 6(11):e27256.
Pell MD, Paulmann S, Dara C, Alasseri A, Kotz SA (2009) Factors in
the recognition of vocally expressed emotions: a comparison of
four languages. J Phon 37:417–435.
Pourtois G, Delplanque S, Michel C, Vuilleumier P
(2008) Beyond conventional event-related brain potential (ERP):
exploring the time-course of visual emotion processing using
topographic and principal component analyses. Brain Topogr
20(4):265–277.
Rammsayer T, Altenmuller E (2006) Temporal information
processing in musicians and nonmusicians. Music Perception
24:37–48.
Repp BH (2010) Sensorimotor synchronization and perception of
timing: effects of music training and task experience. Hum Mov
Sci 29:200–213.
Rigoulot S, D’Hondt F, Defoort-Dhellemmes S, Despretz P, Honore J,
Sequeira H (2011) Fearful faces impact in peripheral vision:
behavioral and neural evidence. Neuropsychologia 49(7):
2013–2021.
Rigoulot S, D’Hondt F, Honore J, Sequeira H (2012) Implicit
emotional processing in peripheral vision: behavioral and neural
evidence. Neuropsychologia 50(12):2887–2896.
Rigoulot S, Delplanque S, Despretz P, Defoort-Dhellemmes S,
Honore J, Sequeira H (2008) Peripherally presented emotional
scenes: a spatiotemporal analysis of early ERP responses. Brain
Topogr 20(4):216–223.
Rigoulot S, Pell MD (2012) Seeing emotion with your ears: emotional
prosody implicitly guides visual attention to faces. PLoS One
7(1):e30740.
Rigoulot S, Fish K, Pell MD (2014) Neural correlates of inferring
speaker sincerity from white lies: an event-related potential
source localization study. Brain Res 1565:48–62.
Rogalsky C, Rong F, Saberi K, Hickok G (2011) Functional anatomy
of language and music perception: temporal and structural factors
investigated using fMRI. J Neurosci 31(10):3843–3852.
Schon D, Magne C, Besson M (2004) The music of speech: music
training facilitates pitch processing in both music and language.
Psychophysiology 41:341–349.
Seppanen M, Pesonen AK, Tervaniemi M (2012) Music training
enhances the rapid plasticity of P3a/P3b event-related brain
potentials for unattended and attended target sounds. Atten
Percept Psychophys 74(3):600–612.
Shahin AJ (2011) Neurophysiological influence of musical training on
speech perception. Front Psychol 2:126.
Shahin A, Bosnyak DJ, Trainor LJ, Roberts LE (2003) Enhancement
of neuroplastic P2 and N1c auditory evoked potentials in
musicians. J Neurosci 23(13):5545–5552.
Shahin AJ, Roberts LE, Miller LM, McDonald KL, Alain C (2007)
Sensitivity of EEG and MEG to the N1 and P2 auditory evoked
responses modulated by spectral complexity of sounds. Brain
Topogr 20(2):55–61.
Shahin A, Roberts LE, Pantev C, Trainor LJ, Ross B (2005)
Modulation of P2 auditory-evoked responses by the spectral
complexity of musical sounds. NeuroReport 16(16):1781–1785.
Spencer KM, Dien J, Donchin E (1999) A componential analysis of
the ERP elicited by novel events using a dense electrode array.
Psychophysiology 36(3):409–414.
Spencer KM, Dien J, Donchin E (2001) Spatiotemporal analysis of the
late ERP responses to deviant stimuli. Psychophysiology
38(2):343–358.
Steinbeis N, Koelsch S (2008) Shared neural resources between
music and language indicate semantic processing of musical
tension-resolution patterns. Cereb Cortex 18(5):1169–1178.
Tervaniemi M, Rytkonen M, Schroger E, Ilmoniemi RJ, Naatanen R
(2001) Superior formation of cortical memory traces for melodic
patterns in musicians. Learn Mem 8(5):295–300.
Tierney A, Dick F, Deutsch D, Sereno M (2013) Speech versus song:
multiple pitch-sensitive areas revealed by a naturally occurring
musical illusion. Cereb Cortex 23:249–254.
Trainor L, Desjardins R, Rockel C (1999) A comparison of contour
and interval processing in musicians and nonmusicians using
event-related potentials. Aust J Psychol 51:147–153.
Ungan P, Berki T, Erbil N, Yagcioglu S, Yuksel M, Utkucal R (2013)
Event-related potentials to changes of rhythmic unit: differences
between musicians and nonmusicians. Neurol Sci 34(1):25–39.
Vieillard S, Peretz I, Khalfa S, Gagnon L, Bouchard B (2008) Happy,
sad, scary and peaceful musical excerpts for research on
emotions. Cogn Emot 22(4):37–41.
Virtala P, Huotilainen M, Partanen E, Tervaniemi M (2014)
Musicianship facilitates the processing of Western music
chords–an ERP and behavioral study. Neuropsychologia
61:247–258.
Zuk J, Ozernov-Palchik O, Kim H, Lakshminarayanan K, Gabrieli JD,
Tallal P, Gaab N (2013) Enhanced syllable discrimination
Talla P, Gaab N (2013) Enhanced syllable discrimination
thresholds in musicians. PLoS One 8(12):e80546.
(Accepted 12 January 2015) (Available online 28 January 2015)