Neuroscience 290 (2015) 175–184
TIME COURSE OF THE INFLUENCE OF MUSICAL EXPERTISE ON THE PROCESSING OF VOCAL AND MUSICAL SOUNDS
S. RIGOULOT, a,b* M. D. PELL a,c AND J. L. ARMONY a,b
a Centre for Research on Brain, Language and Music (CRBLM), Montreal, Canada
b Department of Psychiatry, McGill University and Douglas Mental Health University Institute, Montreal, Canada
c School of Communication Sciences and Disorders, McGill University, Canada
Abstract—Previous functional magnetic resonance imaging
(fMRI) studies have suggested that different cerebral
regions preferentially process human voice and music.
Yet, little is known about the temporal course of the brain
processes that decode the category of sounds and how
expertise in one sound category can impact these
processes. To address this question, we recorded the
electroencephalogram (EEG) of 15 musicians and 18 non-
musicians while they were listening to short musical
excerpts (piano and violin) and vocal stimuli (speech and
non-linguistic vocalizations). The task of the participants
was to detect noise targets embedded within the stream of
sounds. Event-related potentials revealed an early differenti-
ation of sound category, within the first 100 ms after the
onset of the sound, with mostly increased responses to
musical sounds. Importantly, this effect was modulated by
the musical background of participants, as musicians were
more responsive to music sounds than non-musicians, con-
sistent with the notion that musical training increases sensi-
tivity to music. In late temporal windows, brain responses
were enhanced in response to vocal stimuli, but musicians
were still more responsive to music. These results shed
new light on the temporal course of neural dynamics of
auditory processing and reveal how it is impacted by the
stimulus category and the expertise of participants.
© 2015 IBRO. Published by Elsevier Ltd. All rights reserved.
Key words: music, voice, vocalizations, speech prosody, ERPs, expertise.

*Correspondence to: S. Rigoulot, Douglas Mental Health University Institute, 6875 LaSalle Boulevard, Montreal, Quebec H4H 1R3, Canada. E-mail address: [email protected] (S. Rigoulot).

Abbreviations: EEG, electroencephalogram; ERP, event-related potential; fMRI, functional magnetic resonance imaging; PCA, principal component analysis; ROI, region of interest.
INTRODUCTION
When people are repeatedly exposed to the same type of
stimulus, they can develop a certain expertise, which
often leads to faster, better and less effortful processing
of this stimulus. This appears to be particularly true in
the case of musicians, who are more accurate than
non-musicians to discriminate musical timbre (Chartrand
and Belin, 2006) and sound duration (Rammsayer and
Altenmuller, 2006; Guclu et al., 2011). They also more
easily detect pitch violations within melody (Brattico
et al., 2006; Habibi et al., 2013) and synchronize more
precisely to sounds (Repp, 2010).
The neural correlates of such advantages have been
recently investigated. Using brain imaging techniques,
enhancement of brain activity in response to musical
sounds has been evidenced in musicians, relative to non-
musicians. For instance, functional magnetic resonance
imaging (fMRI) studies have shown that, although
the planum polare responds preferentially to musical
than to other complex sounds regardless of musical
expertise (Lai et al., 2012; Tierney et al., 2013), this pattern
is more prevalent in musicians than in non-musicians
(Angulo-Perkins et al., 2014). This is consistent with find-
ings that musical training is associated with altered gray
matter architecture in the left planum temporale
(Bermudez et al., 2009; Elmer et al., 2013). Using magne-
toencephalography, Pantev et al. (1998) also showed that
cortical responses to piano, but not to pure tones, were
greater in musicians than non-musicians. Furthermore,
the amplitude of these responses was correlated with the
age at which musicians began their musical training. Sev-
eral electroencephalographic studies have also revealed
an increase of the amplitude of event-related potential
(ERP) components (N100, P200, MMN, P300 among oth-
ers) in musicians (Trainor et al., 1999; Shahin et al., 2003,
2007; Jongsma et al., 2004; Magne et al., 2006; Musacchia
et al., 2007; Seppanen et al., 2012; Habibi et al., 2013;
Kaganovich et al., 2013; Ungan et al., 2013; Virtala et al.,
2014). For example, Shahin et al. (2003) found that highly
skilled violinists and pianists exhibited larger N1 and P2
responses compared with non-musicians when they pas-
sively listened to musical tones (violin, piano) and pure
tones matched in fundamental frequency to the musical
tones. Virtala et al. (2014) showed that musicians outper-
formed non-musicians in a discrimination task, a pattern
that was associated with a larger N1 amplitude in musi-
cians than in non-musicians. Another source of evidence
comes from the Mismatch Negativity (MMN, Naatanen
et al., 1978), a component reflecting pre-attentive auditory
processing, which is larger and/or earlier in musicians than
in non-musicians to pitch changes (Pantev et al., 1998;
Koelsch et al., 1999; Tervaniemi et al., 2001; Fujioka
et al., 2004). Finally, Ungan et al. (2013) found that
musicians’ better behavioral performance in an oddball
task was linked to earlier and significantly larger P300 to
rhythm changes.
Interestingly, there is some behavioral (Chartrand and
Belin, 2006; Zuk et al., 2013) and EEG (Schon et al.,
2004; Marie et al., 2011; Kuhnis et al., 2013; Elmer
et al., 2014; Francois et al., 2014; Jantzen et al., 2014;
for reviews, see Besson et al., 2007; Patel, 2011;
Shahin, 2011) evidence suggesting that musical expertise
is associated with enhanced processing of not only music,
but also vocal sounds, in particular speech. However, only
a couple of studies directly compared the effects of musi-
cal expertise in brain responses to both music and voice
in the same participants. Schon et al. (2004) manipulated
the fundamental frequency of final notes of melodies or
final words of sentences. When they played these stimuli
to musicians and non-musicians, they found that F0 viola-
tions in both language and music elicited large positive
components, with a shorter latency in musicians. Elmer
et al. (2014) used non-morphed and morphed sounds to
investigate if the expertise of participants in music and
speech (simultaneous interpreters) biased their percep-
tion of these morphs (speech to noise, music to noise
and speech to music). They found in the music-to-noise
condition that musicians and speech experts were simi-
larly biased to the musical part of the morphing, and linked
their results to the behavior of the N400 component.
Thus, although previous studies generally support the
notion of enhanced processing of both music and voice in
musicians, the lack of direct comparisons between
categories or the use of a restricted set of stimulus
types (e.g., only speech for the voice category) limits
their generalizability. Moreover, most of the previous
EEG studies focused on specific (early) ERP
components, leaving open the question of whether other
effects can also be observed at longer delays.
Therefore, the aim of this study was to further
determine whether musical expertise significantly
modulates the temporal characteristics of the processing
of auditory expressions conveyed through human voice
(using both speech and nonlinguistic vocalizations) and
musical sounds (short unfamiliar excerpts played with
piano or violin), using previously validated stimuli in the
same experimental session. Based on the literature
described above, we expected stronger responses in
musicians to music and speech sounds. The key
question, however, was whether similar effects would
also be observed for nonlinguistic vocalizations, which,
although also an important part of the human vocal
repertoire, are very different from speech in terms of their
acoustic characteristics. A second question was whether,
in addition to the predicted expertise-related effects in the
early ERP components (N1 and P2), there would also be
differences in later parts of the EEG response, once
more information about the stimuli becomes available.
EXPERIMENTAL PROCEDURES
Participants
Thirty-nine native Canadian French or English speakers
(18 men/21 women from 20 to 32 years old, mean age:
24.5 ± 3.6 years old) recruited through campus
advertisements participated in the study. Based on self-
report, 31 of the participants were right-handed and all
had normal hearing and normal/corrected-to-normal
vision. Before the experiment, each participant
completed a questionnaire to establish basic
demographic information (age, handedness, and
language abilities) and musical expertise (musical
training, instrument played, and daily activity).
Participants were then assigned into one of two groups
as a function of their musical expertise: participants who
had more than five years of musical training and were
playing at least one instrument on a daily basis were
considered to be musicians (8 men/10 women, mean
age: 24.2 years), whereas the other participants were
considered to be ‘‘non-musicians’’ (10 men/11 women,
mean age: 24.7 years). All members of the musicians
group were professional musicians or University music
students with extensive theoretical and practical training
(average duration of music practice: 12.5 years [range:
9–23]). In contrast, non-musicians had no or very little
formal music training (average duration of music practice:
1.2 years [range: 0–2]). Informed written consent was
obtained from each participant prior to entering the study
and they received $20 compensation for their participation.
Apparatus
Auditory and visual stimuli were presented via E-Prime 2
software (Psychology Software Tools) on a SONY Trinitron
monitor with an Intel Xeon® computer (3 GHz,
Windows XP) and through insert earphones (Etymotic
ER-2). EEG was recorded on 71 electrodes using a
Biosemi ActiveTwo system (Biosemi, Inc., Netherlands)
connected to an Intel Xeon® computer (3 GHz,
Windows XP).
Materials

Auditory stimuli from two different categories, music or human voice, were employed:
a. Musical sounds were short excerpts that were unfa-
miliar and followed the rules of Western tonal music.
On the basis of the behavioral ratings on the emo-
tional content of these excerpts (Vieillard et al.,
2008; Aube et al., 2013), we selected 64 musical
sounds, played by two types of instrument (piano
and violin, n= 32 each). These sounds included
expressions of fear, happiness, sadness or they
were ‘‘neutral’’ or ‘‘peaceful’’ (with valence ratings
significantly less positive than happy excerpts and
less negative than fearful or sad ones; Aube et al.,
in press). There were 8 different sounds for each
emotion and condition.
b. Human vocal sounds were non-linguistic vocaliza-
tions and pseudo-utterances. Non-linguistic vocal-
izations were selected from a previously employed
dataset (Armony et al., 2007; Fecteau et al.,
2007). They reliably expressed fear (screams), sad-
ness (cries) or happiness (pleasure or laughter), or
were emotionally neutral (coughs and yawns). Emo-
tionally inflected pseudo-utterances selected from
the database of Pell et al. (2009) exploited the pho-
nological and morpho-syntactic properties of Eng-
lish, in the absence of meaningful lexical-semantic
cues about emotion (e.g., Someone migged the pazing; Banse and Scherer, 1996; Pell et al.,
2011; Rigoulot and Pell, 2012). On the basis of
the results from the validation study (Pell et al.,
2009), we selected 32 sentences whose tone was
fearful, sad, happy, or neutral. Altogether, we pre-
sented 64 items of human vocal sounds (32 vocal-
izations and 32 speech, with the same number of
female and male voices), that were expressing fear,
sadness, happiness, or were neutral (n= 8 for
each emotion and condition).
Music and vocal sounds were 2-s long on average
(range: 1.5–3.0 s). The mean durations of the stimuli for
each condition are shown in Table 1. We also computed
some of the main acoustic parameters for each sound
category with MIRtoolbox (Lartillot et al., 2008; Table 1).
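For illustration only, the following Python sketch computes two of these descriptors, the RMS energy and a simple spectral-flux estimate, for a mono waveform; the file name, frame sizes and the exact flux definition are assumptions and do not reproduce the MIRtoolbox settings used in the paper.

```python
# Hypothetical sketch: RMS and a simple spectral-flux estimate for one sound.
# The paper used MIRtoolbox in MATLAB; frame sizes and the flux definition
# here are illustrative assumptions, not the published analysis settings.
import numpy as np
from scipy.io import wavfile

def rms(x):
    """Root mean square of the waveform (overall energy)."""
    return np.sqrt(np.mean(x ** 2))

def spectral_flux(x, frame_len=1024, hop=512):
    """Mean frame-to-frame change of the magnitude spectrum."""
    frames = [x[i:i + frame_len] * np.hanning(frame_len)
              for i in range(0, len(x) - frame_len, hop)]
    mags = [np.abs(np.fft.rfft(f)) for f in frames]
    diffs = [np.sqrt(np.sum((m2 - m1) ** 2)) for m1, m2 in zip(mags[:-1], mags[1:])]
    return np.mean(diffs)

sr, x = wavfile.read("stimulus.wav")        # hypothetical file name
x = x.astype(float) / np.max(np.abs(x))     # scale waveform to [-1, 1]
print(rms(x), spectral_flux(x))
```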
In addition, sixteen 500-ms pure tones (eight 1-kHz
and eight 100-Hz tones) were used as targets. All
sounds were normalized in terms of peak intensity using
Adobe Audition 3.0.1 (Adobe Systems, San Jose, CA,
USA) and were presented at 75 dB.
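The peak-intensity normalization step, done here with Adobe Audition, can be sketched in a few lines; the target peak value and the 16-bit output format below are illustrative assumptions.

```python
# Minimal sketch of peak normalization (the authors used Adobe Audition);
# the target peak and 16-bit output are illustrative assumptions.
import numpy as np
from scipy.io import wavfile

def normalize_peak(in_path, out_path, target_peak=0.9):
    sr, x = wavfile.read(in_path)
    x = x.astype(np.float64)
    x *= target_peak / np.max(np.abs(x))            # rescale so the peak hits the target
    wavfile.write(out_path, sr, (x * 32767).astype(np.int16))

normalize_peak("stimulus.wav", "stimulus_norm.wav")  # hypothetical file names
```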
Experimental design/procedure
Participants were invited to take part in a study on music
processing. After the experimental procedures were
explained to them, they were seated in a sound-
attenuated booth at a distance of 75 cm from the
computer screen. After the setup of the cap and
electrodes, the quality of the EEG signal was checked
and participants were informed about the problem of
artifacts and how to reduce them. E-prime 2 software
(Psychology Software Tools) was used for the auditory
presentation of music and vocal sounds via insert
earphones (ER-2 Tubephone, Etymotic Research).
We used 256 experimental trials (64 musical and 64
vocal sounds; n= 16 for each emotion and each sound
category) and 16 targets (pure tones). All sounds were presented in a pseudo-random order and then repeated once, in a different order.

Table 1. Summary of the main acoustical parameters for the total duration and the first 200 ms of each sound category (calculated with MIRtoolbox; Lartillot et al., 2008). The spectral flux conveys spectrotemporal information (variation of the spectrum over time; Marozeau et al., 2003). The attack quantifies the rising time of the energy during the transient portion of a signal. Articulation estimates the average silence ratio of the onset curve. Other measures were also computed, such as the Root Mean Square (RMS) and the Harmonic-to-Noise Ratio (HNR; Fecteau et al., 2007; Lima et al., 2013). dB: decibel; a.u.: arbitrary units; n.u.: no units.

                          Music                               Human voice
                          Piano          Sig.  Violin         Sig.  Vocalizations  Sig.  Speech
Total duration
  Duration (ms)           2192 ± 172           2106 ± 93            2088 ± 360           2268 ± 391
  Spectral flux (a.u.)    328 ± 47             321 ± 44             346 ± 83        *    279 ± 53
  Attack (a.u.)           2.88 ± 0.90          2.72 ± 0.68          2.3 ± 0.77      *    3.55 ± 0.56
  RMS (a.u.)              0.22 ± 0.05          0.18 ± 0.03          0.15 ± 0.0      *    0.10 ± 0.00
  HNR (dB)                5.9 ± 2.5      *     18.5 ± 5.3           10.0 ± 6.1      *    12.4 ± 3.6
First 200 ms
  Spectral flux (a.u.)    276 ± 41       *     238 ± 37       *     253 ± 47        *    175 ± 133
  Attack (a.u.)           11.8 ± 3.4     *     8.9 ± 2.8            10.1 ± 5.2           9.5 ± 2.8
  RMS (a.u.)              0.26 ± 0.08    *     0.21 ± 0.09    *     0.19 ± 0.1      *    0.10 ± 0.04
  HNR (dB)                8.0 ± 5.9      *     19.5 ± 8.7     *     7.1 ± 6.2       *    12.6 ± 6.0

* p < 0.05.
Each trial started with a central fixation cross to
reduce eye movements. The duration of the cross
varied from 750 to 1000 ms and was followed by the
auditory stimulus. A variable delay was introduced after
each sound so that the stimulus onset asynchrony was
always 4000 ms. Subjects were instructed to listen
carefully to the stimuli and to press the down arrow of a
keyboard placed in front of them as soon as they heard
the target. Participants were presented with five practice
trials at the beginning to adjust the volume of the ear-
phones and to familiarize themselves with the
procedures and materials. The entire experiment lasted
approximately one hour and a half.
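As a minimal sketch of the timing constraint (the experiment itself was run in E-Prime 2), the per-trial silence needed to keep the 4000-ms stimulus onset asynchrony can be computed as follows; whether the next trial's fixation period counts toward the SOA is an assumption made here for illustration.

```python
# Illustrative bookkeeping for a fixed 4000-ms stimulus onset asynchrony.
# Assumption (not stated in the paper): the SOA runs from one sound onset to
# the next and therefore spans the sound, the post-stimulus delay, and the
# following trial's fixation cross. The experiment itself used E-Prime 2.
import random

SOA_MS = 4000

def post_stimulus_delay(stimulus_ms, next_fixation_ms):
    """Silence to insert after a sound so that sound onsets stay 4000 ms apart."""
    return SOA_MS - stimulus_ms - next_fixation_ms

fixation = random.randint(750, 1000)           # jittered fixation duration
print(post_stimulus_delay(2192, fixation))     # e.g., delay after a 2192-ms excerpt
```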
EEG recording and analysis
EEG was recorded from 71 active electrodes at a
sampling rate of 1024 Hz, using a Biosemi ActiveTwo
system (Biosemi, Inc., Netherlands; reference-free
system). Two electrodes were placed bilaterally at the
mastoid, and four additional electrodes were placed for
vertical and horizontal electro-oculogram recording: two
at the outer canthi of the eyes and one above and
below each eye. The EEG was down-sampled to
250 Hz for analyses (EEGLab software, Delorme and
Makeig, 2004; version 9) with MATLAB (R2010b, 7.11)
and re-referenced to the average of all electrodes. After
the recording, a band-pass filter (0.016–30 Hz) was
applied offline. The rejection of artifacts, in particular
eye movements and blinks, was performed using EEG-
Lab and ERPLab plug-in (Lopez-Calderon and Luck,
2014) in a semi-automatic way. When an electrode was
consistently bad, the signal of this electrode was recon-
structed by linear interpolation (no more than three elec-
trodes were interpolated for each participant). For the
rejection of artifacts, we first used EEGLab to automati-
cally reject trials with linear drifts; second we used ERP-
Lab to automatically remove trials in which the
amplitude of the voltage varied more than ±100 µV
within temporal windows of 200 ms; we also used ERP-
Lab to remove trials that contained horizontal eye move-
ments (step-like artifacts). Finally, we checked visually
all the trials and removed any remaining trials that were
still contaminated with artifacts. For all participants,
10–25% of the trials were removed by this procedure.
In the final step, EEG epochs (−200 to 1000 ms,
relative to stimulus onset) were time-locked to the
stimulus onset, baseline corrected (−200 to 0 ms), and
averaged offline according to the different experimental
categories.
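The preprocessing was carried out with EEGLAB/ERPLAB under MATLAB; an approximately equivalent pipeline can be sketched in MNE-Python as below. The file name, event codes, and the translation of the ±100 µV moving-window criterion into a single peak-to-peak rejection threshold are illustrative assumptions rather than the published settings.

```python
# Rough MNE-Python equivalent of the preprocessing described above
# (the authors used EEGLAB/ERPLAB in MATLAB). File name, event codes and the
# mapping of the +/-100 uV moving-window criterion onto one peak-to-peak
# threshold are illustrative assumptions.
import mne

raw = mne.io.read_raw_bdf("subject01.bdf", preload=True)   # Biosemi ActiveTwo file
raw.resample(250)                                           # down-sample to 250 Hz
raw.filter(0.016, 30.0)                                     # 0.016-30 Hz band-pass
raw.set_eeg_reference("average")                            # average reference

events = mne.find_events(raw)                               # stimulus triggers
epochs = mne.Epochs(
    raw, events,
    event_id={"music": 1, "voice": 2},                      # hypothetical trigger codes
    tmin=-0.2, tmax=1.0,                                    # -200 to 1000 ms epochs
    baseline=(-0.2, 0.0),                                   # baseline correction
    reject=dict(eeg=200e-6),                                # ~ +/-100 uV criterion
    preload=True,
)
evoked_music = epochs["music"].average()                    # per-category ERPs
evoked_voice = epochs["voice"].average()
```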
Statistical analyses
Six participants (two female non-musicians, one male non-
musician, one female musician, two male musicians)
were excluded because too many trials had to be
rejected (more than 30% of trials). Therefore, data of 33
participants were considered in all statistical analyses.
In a first step, we used the properties of principal
component analysis (PCA) to define spatial regions of
interest (Spencer et al., 1999, 2001). We performed a
spatial PCA (Varimax rotation, SPSS V.20 software) with
64 electrode sites as dependent variables and time points
(249), participants (33), and conditions (music/voice) as
observations (Pourtois et al., 2008). Each spatial factor
represents a specific spatial configuration of brain activity
and the factor loading corresponds to the spatial factor’s
contribution to the original variables (i.e., how much the
spatial factor accounts for the voltage recorded at each
electrode). These spatial configurations can be visualized
by topographic maps of factor loadings (Cartool software
v.3.52, D. Brunet, https://sites.google.com/site/fbmlab/
cartool) and are usually defined by considering electrodes
with the highest factor loadings (Delplanque et al., 2006;
Rigoulot et al., 2008, 2011, 2012; Rigoulot and Pell,
2012). Here, a group of electrodes was identified as a
region of interest (ROI) when the loadings of these elec-
trodes were greater than 0.707, corresponding to more
than 50% of the data variance being explained.
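A minimal sketch of this ROI-definition step is shown below (the paper used SPSS); the NumPy-based Varimax implementation, the use of scikit-learn's PCA, and the shape of the placeholder data matrix are assumptions for illustration.

```python
# Sketch of the spatial-PCA ROI definition: observations are
# (time points x participants x conditions) and variables are the 64 electrodes.
# The paper used SPSS; this NumPy/scikit-learn version is only illustrative.
import numpy as np
from sklearn.decomposition import PCA

def varimax(loadings, gamma=1.0, n_iter=100, tol=1e-6):
    """Standard Varimax rotation of a loading matrix (variables x factors)."""
    p, k = loadings.shape
    rotation = np.eye(k)
    var = 0.0
    for _ in range(n_iter):
        lam = loadings @ rotation
        u, s, vt = np.linalg.svd(
            loadings.T @ (lam ** 3 - (gamma / p) * lam @ np.diag(np.sum(lam ** 2, axis=0)))
        )
        rotation = u @ vt
        new_var = np.sum(s)
        if new_var - var < tol:
            break
        var = new_var
    return loadings @ rotation

# Placeholder data: 249 time points x 33 participants x 2 conditions stacked
# along the first axis, 64 electrodes as variables.
erp = np.random.randn(249 * 33 * 2, 64)
pca = PCA(n_components=6).fit(erp)
loadings = varimax(pca.components_.T * np.sqrt(pca.explained_variance_))
# An ROI groups the electrodes whose rotated loading exceeds 0.707 on a factor.
rois = [np.where(np.abs(loadings[:, f]) > 0.707)[0] for f in range(loadings.shape[1])]
```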
To analyze the electrophysiological data, we focused
first on the classical early evoked responses by auditory
stimuli, the N100 and P200 components. To investigate
the effects on the N100 component, we analyzed the
amplitude and the latency of the peaks between 70 and
140 ms in the fronto-central ROI defined by the spatial
PCA, where the N100 is maximal (Naatanen and Picton,
1987). For the P200 component, we analyzed the ampli-
tude and the latency of the peaks between 140 and
300 ms in the central ROI, where the P200 is maximal
(Crowley and Colrain, 2004). These temporal windows
were defined after examination of the grand average
and of the individual peaks, and are similar to the ones
used in the literature. For the first approach, we per-
formed Greenhouse–Geisser corrected ANOVAs on the
peak amplitudes and latencies of N100 and P200 with
sound category (music or vocal sounds) as within-subject
factor and musical expertise as between-subject factor.
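In practice, extracting peak amplitude and latency amounts to taking the extremum of the ROI-averaged waveform within each window; the sketch below follows the sampling rate and epoch limits reported above, while the waveform itself is a placeholder.

```python
# Sketch of peak extraction: N100 = most negative point in 70-140 ms,
# P200 = most positive point in 140-300 ms, on the ROI-averaged waveform.
# Sampling rate (250 Hz) and epoch start (-200 ms) follow the Methods section.
import numpy as np

FS = 250.0            # Hz, after down-sampling
EPOCH_START = -0.2    # s, epoch starts 200 ms before stimulus onset

def peak(waveform, t_min, t_max, polarity):
    """Return (amplitude, latency in ms) of the peak inside [t_min, t_max] (s)."""
    times = EPOCH_START + np.arange(waveform.size) / FS
    window = (times >= t_min) & (times <= t_max)
    segment = waveform[window]
    idx = np.argmin(segment) if polarity == "neg" else np.argmax(segment)
    return segment[idx], times[window][idx] * 1000.0

roi_erp = np.random.randn(300)                      # placeholder ROI-averaged ERP
n100_amp, n100_lat = peak(roi_erp, 0.070, 0.140, "neg")
p200_amp, p200_lat = peak(roi_erp, 0.140, 0.300, "pos")
```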
In a second approach, we analyzed the mean
amplitudes of the electrophysiological responses in 100-
ms sliding temporal windows, from 0 to 900 ms after the
onset of the stimulus. Mean amplitudes in these
temporal windows were analyzed through ANOVAs for
each ROI revealed by sPCA with sound category as
within-subject factor and musical expertise as between-
subject factor. A p < 0.05 was used as the threshold for
statistical significance. When a domain-by-expertise
interaction (the main focus of the study) was found to
be significant, we compared electrophysiological
responses to music and vocal sounds in musicians and
non-musicians with t-tests, Bonferroni-corrected for
multiple tests (p < 0.05).
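The windowed analysis can be sketched as follows: average each subject's ROI waveform within consecutive 100-ms windows, then compare music and voice within a group using Bonferroni-corrected paired t-tests. All arrays below are placeholders, and correcting over the nine windows is an assumption about how the correction was applied.

```python
# Sketch of the second analysis: mean amplitude in nine consecutive 100-ms
# windows (0-900 ms post-onset), then Bonferroni-corrected paired t-tests
# comparing music vs. voice within one group. All data here are placeholders.
import numpy as np
from scipy import stats

FS = 250                     # Hz, sampling rate after down-sampling
EPOCH_START = -0.2           # s
N_WINDOWS = 9                # nine 100-ms windows covering 0-900 ms

def window_means(epoch_roi):
    """Mean amplitude of one ROI-averaged epoch in each 100-ms window."""
    times = EPOCH_START + np.arange(epoch_roi.size) / FS
    return np.array([epoch_roi[(times >= w * 0.1) & (times < (w + 1) * 0.1)].mean()
                     for w in range(N_WINDOWS)])

# One row per subject of one group (placeholder 300-sample epochs here).
music = np.array([window_means(np.random.randn(300)) for _ in range(15)])
voice = np.array([window_means(np.random.randn(300)) for _ in range(15)])

alpha = 0.05 / N_WINDOWS     # Bonferroni correction over the nine windows
for w in range(N_WINDOWS):
    t, p = stats.ttest_rel(music[:, w], voice[:, w])
    print(f"{w*100}-{(w+1)*100} ms: t = {t:.2f}, p = {p:.4f}, significant: {p < alpha}")
```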
RESULTS
The initial spatial PCA yielded six factors explaining 78%
of the data variance. We determined groups of electrodes
of interest by clustering the electrodes that were
accounting for more than 50% of the variance of the
data pertaining to each factor. Six groups of electrodes,
each defining an ROI, were identified (see Fig. 1): ROIs
were located in the fronto-central (associated with
electrodes F1, F2, Fz, FC1, FC2, FC4, Cz; ROI1); left
temporal (C5, FT7, T7; ROI2); centro-parietal (CP1,
CP2, CPz, P1, P2, Pz; ROI3); anterior frontal area
(AF7, AF3, AF4, AF8, AFz, Fp1, Fpz, Fp2; ROI4); right
temporal area (C6, FT8, FC6, T8; ROI5) and parieto-
occipital areas of the scalp (electrodes PO7, PO8, Iz,
O1, Oz, O2; ROI6).

Fig. 1. Factor loadings obtained from the spatial PCA (bottom; n.u.) for each electrode, mapped to illustrate where the different ROIs were defined (top). It should be noted that 0.707 is an arbitrary threshold and represents 50% of the variance of the data.
(a) N100: Peak amplitudes for the fronto-central ROI were analyzed in the 70–140-ms time window with a repeated-measures ANOVA with sound category (music/voice) as within-subject factor and musical expertise as between-subject factor. The analysis revealed an effect of sound category (F(1,31) = 38.68; p < 0.001), with no significant interaction with the expertise of participants (F(1,31) = 0.91; p = 0.35). Amplitudes in response to music were more negative than the amplitudes in response to vocal expressions (music: −2.5 µV ± 1.4; voice: −1.8 µV ± 1.3; Fig. 2A), with no significant differences between piano and violin (piano: −2.4 µV ± 1.3; violin: −2.6 µV ± 1.5; p = 0.56) or between speech and vocalizations (vocalizations: −1.8 µV ± 1.4; speech: −1.9 µV ± 1.2; p = 0.99). The same analysis performed on the latency of the peaks did not show any significant main effects of sound category (F(1,31) = 0.78; p = 0.38) or interaction with musical expertise (F(1,31) = 0.01; p = 0.90).
(b) P200: Peak amplitudes and latencies of P200 for the central ROI were analyzed similarly to the N100. The analysis on the peak amplitude revealed a main effect of sound category (F(1,31) = 50.45; p < 0.001), with more positive amplitudes in response to musical than vocal sounds (Fig. 2B), with the same pattern of activation for piano and violin (piano: 2.8 µV ± 1.2; violin: 2.8 µV ± 1.2; p = 0.98) and for speech and vocalizations (vocalizations: 1.9 µV ± 1.4; speech: 1.6 µV ± 1.4; p = 0.18). The analysis on the latency of peaks also revealed an effect of sound category (F(1,31) = 5.92; p = 0.02), with peaks being later
for vocal than musical sounds (207 ms ± 32 vs.
199 ms ± 31). We found that this effect was mainly
driven by speech, as the latency of its peak was
longer than that of the other sound categories
(ps < 0.001), which were not different from each other (ps > 0.34). There were no other significant main effects or interactions with the other factors (ps > 0.25).

Fig. 2. Illustration of the early effects of sound category (N100–P200). Grand-average ERPs at FCz and CPz (where the effects are maximal; see topographies) elicited by vocal (dotted lines) and musical (solid lines) sounds in musicians (red) and non-musicians (blue). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
(c) ROI analysis: Mean amplitudes for the successive
temporal windows post-stimulus onset revealed in
all the six ROIs defined by the PCA a significant
interaction between sound category and time win-
dows (ps < 0.001). In the fronto-central (ROI1),
amplitudes in response to music were larger than
to vocal expressions, whereas the opposite pattern
was observed in the anterior frontal (ROI4), right
temporal (ROI5) and parieto-occipital (ROI6) ROIs.
In left temporal (ROI2) and centro-parietal (ROI3)
ROIs, amplitudes were first stronger to music and
after 400 ms to speech. Of particular interest, there
were significant interactions between category and
musical expertise of the participants in the anterio-
frontal (ROI4) (p= 0.017) and the right temporal
(ROI5) (p= 0.018) regions. In the anterior frontal
area, this effect was mainly due to significant
differences between music and voice only in non-
musicians, whereas in the right temporal ROI the
interaction was the result of an overall larger
amplitude for voice than music in musicians with
no significant differences between categories for
non-musicians. As can be seen from the time
courses – shown in Figs. 3 and 4 for anterior frontal
and right temporal ROIs, respectively – and con-
firmed by t-tests (p < 0.05, Bonferroni-corrected
for multiple tests), the response pattern was
different for the early and late parts of the temporal
window. In the anterior frontal region, responses to
vocal expressions were larger than to music in non-
musicians during the first 500 ms (as early as
100 ms) after stimulus onset, with no significant dif-
ferences in the case of musicians. In contrast, late
responses to music were larger than to voice in both
groups (Fig. 3). For the right temporal ROI, activity
for vocal expressions was consistently more posi-
tive than for music in musicians, although the latter
elicited a larger (negative) amplitude in the later part
of the stimulus presentation. In contrast, non-musi-
cians showed no differences between categories in
any of the bins (see Fig. 4). The overall response patterns did not significantly differ between sub-categories (i.e., speech/vocalizations and piano/violin).

Fig. 3. Temporal course of the interaction between sound category and expertise in the anterio-frontal area. (A) Mean amplitude at anterio-frontal electrodes (AF7, AF3, AF4, AF8, AFz, Fp1, Fpz, Fp2) in response to music and vocal sounds in musicians and non-musicians. (B) Summary of patterns of activation in two temporal windows, one early (from 0 to 500 ms) and one late (from 500 to 900 ms), in response to musical and vocal sounds in musicians and non-musicians. (C) Difference of amplitude between vocal and musical sounds in musicians and non-musicians at the anterio-frontal cluster of electrodes. Stars indicate significant differences from 0 (t-tests, Bonferroni-corrected for multiple comparisons).

Fig. 4. Temporal course of the interaction between sound category and expertise in the right temporal area. (A) Mean amplitude at right temporal electrodes (C6, FC6, FT8, T8) in response to music and vocal sounds in musicians and non-musicians. (B) Summary of patterns of activation in two temporal windows, one early (from 0 to 500 ms) and one late (from 500 to 900 ms), in response to musical and vocal sounds in musicians and non-musicians. (C) Difference of amplitude between vocal and musical sounds in musicians and non-musicians at the right temporal cluster of electrodes. Stars indicate significant differences from 0 (t-tests, Bonferroni-corrected for multiple comparisons).
DISCUSSION
The objective of this study was to investigate the temporal
course of the processing of vocal and musical
expressions and, particularly, to assess whether musical
expertise modulated these responses. We found an
early differentiation of the category of the sound,
within the first 100 ms after the onset of the stimulus,
which was modulated early on by the degree of
musical expertise of the participants, thus confirming
our hypotheses. The significance of these patterns is
discussed below.
Effects of sound category
When we compared music to vocal sounds, we found
larger (more negative N1 and more positive P2)
amplitudes in response to music than to vocal sounds.
Previous investigations comparing music and vocal
sounds are very scarce. In one study, Levy and
colleagues (Levy et al., 2001, 2003) used very short
sounds that were produced by different musical instru-
ments (e.g., violin, flute, trumpet, French horn) or sung
by different types of singers (e.g., Alto, Bass, Baritone,
Mezzo, Soprano). They found that music and vocal
sounds triggered ‘‘equivalent’’ P1, N1 and P2 compo-
nents, in both passive and active listening tasks. Similar
findings were reported in another study (Kaganovich
et al., 2013) in which neither the amplitude nor the latency
of the peak of N1 was modulated by the category of sound
(the vowel [a], a cello and a French Horn). In contrast, and
in agreement with our results, Meyer and colleagues
(Meyer et al., 2007) found N1 and P2 enhanced ampli-
tudes in response to music sounds (artificial piano tones),
compared to spoken syllables. The authors interpreted
these results as reflecting the more complex spectral pro-
file of musical stimuli, which fits well with our observations
that musical sounds were differentially processed from
both speech and nonlinguistic vocalizations. Moreover,
this interpretation is consistent with the fact that the N1 com-
ponent is a measure of early sensory encoding of the
physical properties of sound, such as frequency, com-
plexity and intensity (Naatanen and Picton, 1987) and that
P200 has been traditionally considered to also be modu-
lated by physical features of the stimulus (although there
is also evidence that P200 latency and amplitude are sen-
sitive to learning and attention processes among other
factors; Crowley and Colrain, 2004). Importantly, these
ERP differences could not be explained by category differ-
ences of any of the most common acoustical features
(see Table 1), given that any acoustic parameter that
significantly differed between music and voice also was
different between subcategories (i.e., piano vs. violin
and vocalizations vs. speech; see Table 1), even though
these showed similar EEG response patterns. It is there-
fore likely that a combination of some of these acoustical
features underlay these effects, though we cannot
exclude that other parameters or cognitive factors, such
as attention, could have played a role (Baumann et al.,
2008). It should be noted that we also found an early dif-
ferentiation of musical and vocal sounds in the other
areas of the scalp. As some areas seemed to be more
responsive to musical than to vocal sounds and others more
responsive to vocal than to musical sounds, these results
could suggest that there is a specialization of two different
pathways for the processing of music and voice, which
would be in line with our recent fMRI studies using the
same musical stimuli (Angulo-Perkins et al., 2014; Aube
et al., in press).
Late effects of human voice and music on ERP
components like the N400 or N500 have already been
described in priming paradigms or in language/music
violation detection tasks (Koelsch et al., 2004; Steinbeis
and Koelsch, 2008). These studies have found that syn-
tactic violations in music and speech elicit a late positivity
described as P600 or late positive component (LPC).
Here, we found that human voice triggered higher ampli-
tudes than musical sounds. Similar differences between
speech and music were also found in another study com-
paring the processing of musical (sine wave tone) and
verbal (spoken syllable) stimuli in working memory
(Bittrich et al., 2012). These authors found a larger
N400 amplitude for new compared to old items in the ver-
bal, but not the musical condition. These results, in agree-
ment with our data, suggest that, as could be expected,
differences in the processing of human voice and music
also are present at later latencies. As no semantic mean-
ing was conveyed by our speech stimuli (pseudo-sen-
tences), these late effects could reflect re-evaluation or
sustained attention to the vocalizations and speech pros-
ody, as observed in studies investigating how prosody
can help disambiguate the meaning of a message
(Brouwer et al., 2012; Rigoulot et al., 2014) or how emo-
tional prosody affects early and late ERP components
(Paulmann et al., 2013). Importantly, we found in the
present study that the effects on sound category were
modulated by the expertise of the participants.
Influence of musical expertise
We found that early and late ERP components were
modulated by the musical expertise of participants. In
the anterior frontal area, these effects appeared early, at
100 ms post stimulus onset, and were due to lower
responses to musical sounds than to vocal expressions
for non-musicians and the opposite pattern in the case
of musicians. As shown in Fig. 3, early on musicians
responded similarly to vocal and musical sounds,
whereas non-musicians were only responsive to vocal
sounds. In both groups, responses to music increased
over time (albeit more slowly in the case of non-
musicians), so that after about 1000 ms after stimulus
onset, responses to music were larger than to voice.
These patterns show that expertise modulates
responses to sounds of different categories and that
musicians show an early sensitivity to musical sounds
compared to non-musicians. These results are
consistent with some previous studies reporting that
music training increases the amplitude of early
components, like the N1 and P2 components to musical
notes (Shahin et al., 2003, 2005; Habibi et al., 2013).
For example, in the study reported above, Kaganovich
and colleagues (2013) found that N1 amplitude in
response to music, vocal and spectrally rotated sounds
was increased in musicians compared to non-musicians.
Importantly, these group differences are unlikely to
reflect differential sensitivity to basic acoustic
parameters, given that musicians were similarly
sensitive to musical stimuli played by piano and violin,
two types of instruments which are relatively different in
terms of acoustic properties. Moreover, we did not find
any difference between speech and vocalizations in the
anterio-frontal area, suggesting that vocal sounds were
processed similarly (but see Pell et al., submitted).
Altogether, these results suggest that musical training is
associated with a general enhancement in the neural
encoding of acoustic properties of complex sounds, and
that this effect generalizes to all types of sounds. For
example, an effect of expertise of musicians on the
processing of speech stimuli has even been described
on the amplitude of the P50, an ERP component which
peaks 50 ms after the onset of the sounds (Jantzen
et al., 2014). These authors presented speech stimuli dif-
fering in voice onset time (the duration of the delay
between release of closure and start of voicing) and found
using source analysis that musicians engage right hemi-
sphere areas (which are traditionally associated with the
processing of musical sounds) whereas the left hemi-
sphere homologs of these areas were more activated by
non-musicians. In agreement with this, several studies
using fMRI have shown that neural mechanisms involved
in the perception and processing of music overlap with
those devoted to the processing of speech (e.g.,
Rogalsky et al., 2011; Angulo-Perkins et al., 2014) and
Patel proposed that different mechanisms like the repeti-
tion implied by musical training and attentional processes
would explain why musicians benefit from their expertise
to process sounds from other categories (OPERA hypoth-
esis, Patel, 2011). The present study highlights the tem-
poral course of these influences and shows that these
effects of expertise can arise very early.
We also observed group differences in some of the
late components of the ERP responses to music and
voice. Other studies also reported late ERP differences
between musicians and non-musicians in oddball
paradigms in which participants detected music or
speech pitch violation (Besson and Faita, 1995; Granot
and Donchin, 2002; Fitzroy and Sanders, 2012; Habibi
et al., 2013). Interestingly, previous studies also sug-
gested that the amplitude of the late components can
be reduced (more negative) when fewer resources are
needed (e.g., Kaan et al., 2000). In our case, the more
negative amplitudes observed for music specifically for
musicians could thus be interpreted as evidence that
musicians need fewer resources to process musical
sounds, possibly given their expertise in the domain. Alto-
gether, our results tend to confirm the idea that musical
training enhances brain sensitivity to musical sounds,
which is also in agreement with several fMRI studies
showing that training in music can lead to important func-
tional reorganization in the brain (Pantev et al., 1998).
Acknowledgements—We are grateful to Mihaela Felezeu for help
with the EEG recording. We also thank William Aube and
Bernard Bouchard for providing the musical stimuli and Isabelle
Peretz for helpful discussions. This research was funded by
grants from the Canadian Institutes of Health Research (CIHR)
and the National Science and Engineering Research Council of
Canada (NSERC) to JLA.
REFERENCES
Angulo-Perkins A, Aube W, Peretz I, Barrios FA, Armony J, Concha L
(2014) Music listening engages specific cortical regions within the
temporal lobes: differences between musicians and non-
musicians. Cortex 59:126–137.
Armony JL, Chochol C, Fecteau S, Belin P (2007) Laugh (or cry) and
you will be remembered. Psychol Sci 18(12):1027–1029.
Aube W, Peretz I, Armony JL (2013) The effects of emotion on
memory for music and vocalisations. Memory 21(8):981–990.
Aube W, Angulo-Perkins A, Peretz I, Concha L, Armony JL (2014)
Fear across the senses: brain responses to music, vocalizations
and facial expressions. Soc Cogn Affect Neurosci (in press).
Banse R, Scherer KR (1996) Acoustic profiles in vocal emotion
expression. J Pers Soc Psychol 70(3):614–636.
Baumann S, Meyer M, Jancke L (2008) Enhancement of auditory-
evoked potentials in musicians reflects an influence of expertise
but not selective attention. J Cogn Neurosci 20(12):2238–2249.
Bermudez P, Lerch JP, Evans AC, Zatorre RJ (2009)
Neuroanatomical correlates of musicianship as revealed by
cortical thickness and voxel-based morphometry. Cereb Cortex
19:1583–1596.
Besson M, Faita F (1995) An event-related potential (ERP) study of
musical expectancy: comparison of musicians with nonmusicians.
J Exp Psychol Hum Percept Perform 21(6):1278–1296.
Besson M, Schon D, Moreno S, Santos A, Magne C (2007) Influence
of musical expertise and musical training on pitch processing in
music and language. Restorative Neurol Neurosci 25:398–410.
Bittrich K, Schulze K, Koelsch S (2012) Electrophysiological
correlates of verbal and tonal working memory. Brain Res
1432:84–94.
Brattico E, Tervaniemi E, Naatanen R, Peretz I (2006) Musical scale
properties are automatically processed in the human auditory
cortex. Brain Res 1117:162–174.
Brouwer T, Fitz H, Hoeks J (2012) Getting real about semantic
illusions: rethinking the functional role of the P600 in language
comprehension. Brain Res 1446:127–143.
Chartrand J-P, Belin P (2006) Superior voice timbre processing in
musicians. Neurosci Lett 405:164–167.
Crowley KE, Colrain IM (2004) A review of the evidence for P2 being
an independent component process: age, sleep and modality.
Clin Neurophysiol 115(4):732–744.
Delorme A, Makeig S (2004) EEGLAB: an open source toolbox for
analysis of single-trial EEG dynamics. J Neurosci Methods
134:9–21.
Delplanque S, Silvert L, Hot P, Rigoulot S, Sequeira H (2006) Arousal
and valence effects on event-related P3a and P3b during
emotional categorization. Int J Psychophysiol 60(3):315–322.
Elmer S, Hanggi J, Meyer M, Jancke L (2013) Increased cortical
surface area of the left planum temporale in musicians facilitates
the categorization of phonetic and temporal speech sounds.
Cortex 49:2812–2821.
Elmer S, Klein C, Kuhnis J, Liem F, Meyer M, Jancke L (2014) Music
and language expertise influence the categorization of speech
and musical sounds: behavioral and electrophysiological
measurements. J Cogn Neurosci 26(10):2356–2369.
Fecteau S, Belin P, Joanette Y, Armony JL (2007) Amygdala
responses to nonlinguistic emotional vocalizations. NeuroImage
36(2):480–487.
Fitzroy AB, Sanders LD (2012) Musical expertise modulates early
processing of syntactic violations in language. Front Psychol
3(January):603.
Francois C, Jaillet F, Takerkart S, Schon D (2014) Faster sound
stream segmentation in musicians than in nonmusicians. PLoS
One 7:e101340.
Fujioka T, Trainor LJ, Ross B, Kakigi R, Pantev C (2004) Musical
training enhances automatic encoding of melodic contour and
interval structure. J Cogn Neurosci 16(6):1010–1021.
Granot R, Donchin E (2002) Do Re Mi Fa Sol La Ti-constraints,
congruity and musical training: an event-related brain potentials
study of musical expectancies. Music Perception 19(4):487–528.
Guclu B, Sevinc E, Canbeyli R (2011) Duration discrimination by
musicians and nonmusicians. Psychol Rep 108:675–687.
Habibi A, Wirantana V, Starr A (2013) Cortical activity during
perception of musical pitch: comparing musicians and
nonmusicians. Music Perception 30(5):463–479.
Jantzen MG, Howe BM, Jantzen KJ (2014) Neurophysiological
evidence that musical training influences the recruitment of
right hemisphere homologues for speech perception. Front
Psychol 5:e171.
Jongsma ML, Desain P, Honing H (2004) Rhythmic context
influences the auditory evoked potentials of musicians and non-
musicians. Biol Psychol 66(2):129–152.
Kaan E, Harris A, Gibson E, Holcomb P (2000) The P600 as an index
of syntactic integration difficulty. Lang Cogn Process 15:159–201.
Kaganovich N, Kim J, Herring C, Schumaker J, Macpherson M,
Weber-Fox C (2013) Musicians show general enhancement of
complex sound encoding and better inhibition of irrelevant
auditory change in music: an ERP study. Eur J Neurosci
37(8):1295–1307.
Koelsch S, Kasper E, Sammler D, Schulze K, Gunter T, Friederici AD
(2004) Music, language and meaning: brain signatures of
semantic processing. Nat Neurosci 7(3):302–307.
Koelsch S, Schroger E, Tervaniemi M (1999) Superior pre-attentive
auditory processing in musicians. NeuroReport 10(6):1309–1313.
Kuhnis J, Elmer S, Meyer M, Jancke L (2013) The encoding of vowels
and temporal speech cues in the auditory cortex of professional
musicians: an EEG study. Neuropsychologia 51(8):1608–1618.
Lai G, Pantazatos SP, Schneider H, Hirsch J (2012) Neural systems
for speech and song in autism. Brain 135:961–975.
Lartillot O, Toiviainen P, Eerola T (2008) A matlab toolbox for music
information retrieval. In: Preisah C, Burkhardt H, Schmidt-Thieme
L, Decker R, editors. Data analysis machine learning and
applications, studies in classification, data analysis, and
knowledge organization. Springer-Verlag. p. 261–268.
Levy DA, Granot R, Bentin S (2001) Processing specificity for human
voice stimuli: electrophysiological evidence. NeuroReport
12(12):2653–2657.
Levy DA, Granot R, Bentin S (2003) Neural sensitivity to human
voices: ERP evidence of task and attentional influences.
Psychophysiology 40(2):291–305.
Lima CF, Castro SL, Scott SK (2013) When voices get emotional: a
corpus of nonverbal vocalizations for research on emotion
processing. Behav Res Methods 45(4):1234–1245.
Lopez-Calderon J, Luck SJ (2014) ERPLAB: an open-source toolbox
for the analysis of event-related potentials. Front Hum Neurosci
8:213.
Magne C, Schon D, Besson M (2006) Musician children detect pitch
violations in both music and language better than nonmusician
children: behavioral and electrophysiological approaches. J Cogn
Neurosci 18(2):199–211.
Marie C, Magne C, Besson M (2011) Musicians and the metric
structure of words. J Cogn Neurosci 23(2):294–305.
Marozeau J, de Cheveigne A, McAdams S, Winsberg S (2003) The
dependency of timbre on fundamental frequency. J Acoust Soc
Am 114(5):2946.
Meyer M, Elmer S, Baumann S, Jancke L (2007) Short-term plasticity
in the auditory system: differential neural responses to perception
and imagery of speech and music. Restorative Neurol Neurosci
25(3–4):411–431.
Musacchia G, Sams M, Skoe E, Kraus N (2007) Musicians have
enhanced subcortical auditory and audiovisual processing of
speech and music. PNAS 104(40):15894–15898.
Naatanen R, Gaillard AWK, Mantysalo S (1978) Early selective-
attention effect on evoked potential reinterpreted. Acta Psychol
42:313–329.
Naatanen R, Picton T (1987) The N1 wave of the human electric and
magnetic response to sound: a review and an analysis of the
component structure. Psychophysiology 24(4):375–425.
Pantev C, Oostenveld R, Engelien A, Ross B, Roberts LE, Hoke M
(1998) Increased auditory cortical representation in musicians.
Nature 392(6678):811–814.
Patel AD (2011) Why would musical training benefit the neural
encoding of speech? The OPERA hypothesis. Front Psychol
2:e142.
Paulmann S, Bleichner M, Kotz SA (2013) Valence, arousal, and task
effects in emotional prosody processing. Front Psychol 4:e345.
Pell MD, Kotz SA (2011) On the time course of vocal emotion
recognition. PLoS One 6(11):e27256.
Pell MD, Paulmann S, Dara C, Alasseri A, Kotz SA (2009) Factors in
the recognition of vocally expressed emotions: a comparison of
four languages. J Phon 37:417–435.
Pourtois G, Delplanque S, Michel C, Vuilleumier P
(2008) Beyond conventional event-related brain potential (ERP):
exploring the time-course of visual emotion processing using
topographic and principal component analyses. Brain Topogr
20(4):265–277.
Rammsayer T, Altenmuller E (2006) Temporal information
processing in musicians and nonmusicians. Music Perception
24:37–48.
Repp BH (2010) Sensorimotor synchronization and perception of
timing: effects of music training and task experience. Hum Mov
Sci 29:200–213.
Rigoulot S, D’Hondt F, Defoort-Dhellemmes S, Despretz P, Honore J,
Sequeira H (2011) Fearful faces impact in peripheral vision:
behavioral and neural evidence. Neuropsychologia 49(7):
2013–2021.
Rigoulot S, D’Hondt F, Honore J, Sequeira H (2012) Implicit
emotional processing in peripheral vision: behavioral and neural
evidence. Neuropsychologia 50(12):2887–2896.
Rigoulot S, Delplanque S, Despretz P, Defoort-Dhellemmes S,
Honore J, Sequeira H (2008) Peripherally presented emotional
scenes: a spatiotemporal analysis of early ERP responses. Brain
Topogr 20(4):216–223.
Rigoulot S, Pell MD (2012) Seeing emotion with your ears: emotional
prosody implicitly guides visual attention to faces. PLoS One
7(1):e30740.
Rigoulot S, Fish K, Pell MD (2014) Neural correlates of inferring
speaker sincerity from white lies: an event-related potential
source localization study. Brain Res 1565:48–62.
Rogalsky C, Rong F, Saberi K, Hickok G (2011) Functional anatomy
of language and music perception: temporal and structural factors
investigated using fMRI. J Neurosci 31(10):3843–3852.
Schon D, Magne C, Besson M (2004) The music of speech: music
training facilitates pitch processing in both music and language.
Psychophysiology 41:341–349.
Seppanen M, Pesonen AK, Tervaniemi M (2012) Music training
enhances the rapid plasticity of P3a/P3b event-related brain
potentials for unattended and attended target sounds. Atten
Percept Psychophys 74(3):600–612.
Shahin AJ (2011) Neurophysiological influence of musical training on
speech perception. Front Psychol 2:126.
Shahin A, Bosnyak DJ, Trainor LJ, Roberts LE (2003) Enhancement
of neuroplastic P2 and N1c auditory evoked potentials in
musicians. J Neurosci 23(13):5545–5552.
Shahin AJ, Roberts LE, Miller LM, McDonald KL, Alain C (2007)
Sensitivity of EEG and MEG to the N1 and P2 auditory evoked
responses modulated by spectral complexity of sounds. Brain
Topogr 20(2):55–61.
Shahin A, Roberts LE, Pantev C, Trainor LJ, Ross B (2005)
Modulation of P2 auditory-evoked responses by the spectral
complexity of musical sounds. NeuroReport 16(16):1781–1785.
Spencer KM, Dien J, Donchin E (1999) A componential analysis of
the ERP elicited by novel events using a dense electrode array.
Psychophysiology 36(3):409–414.
Spencer KM, Dien J, Donchin E (2001) Spatiotemporal analysis of the
late ERP responses to deviant stimuli. Psychophysiology
38(2):343–358.
Steinbeis N, Koelsch S (2008) Shared neural resources between
music and language indicate semantic processing of musical
tension-resolution patterns. Cereb Cortex 18(5):1169–1178.
Tervaniemi M, Rytkonen M, Schroger E, Ilmoniemi RJ, Naatanen R
(2001) Superior formation of cortical memory traces for melodic
patterns in musicians. Learn Mem 8(5):295–300.
Tierney A, Dick F, Deutsch D, Sereno M (2013) Speech versus song:
multiple pitch-sensitive areas revealed by a naturally occurring
musical illusion. Cereb Cortex 23:249–254.
Trainor L, Desjardins R, Rockel C (1999) A comparison of contour
and interval processing in musicians and nonmusicians using
event-related potentials. Aust J Psychol 51:147–153.
Ungan P, Berki T, Erbil N, Yagcioglu S, Yuksel M, Utkucal R (2013)
Event-related potentials to changes of rhythmic unit: differences
between musicians and nonmusicians. Neurol Sci 34(1):25–39.
Vieillard S, Peretz I, Khalfa S, Gagnon L, Bouchard B (2008) Happy,
sad, scary and peaceful musical excerpts for research on
emotions. Cogn Emot 22(4):37–41.
Virtala P, Huotilainen M, Partanen E, Tervaniemi M (2014)
Musicianship facilitates the processing of Western music
chords–an ERP and behavioral study. Neuropsychologia
61:247–258.
Zuk J, Ozernov-Palchik O, Kim H, Lakshminarayanan K, Gabrieli JD,
Tallal P, Gaab N (2013) Enhanced syllable discrimination
Talla P, Gaab N (2013) Enhanced syllable discrimination
thresholds in musicians. PLoS One 8(12):e80546.
(Accepted 12 January 2015) (Available online 28 January 2015)