TICS 1533 No. of Pages 15

Review
Voice Modulation: A Window into the Origins of Human Vocal Control?
Katarzyna Pisanski,1,2 Valentina Cartei,1 Carolyn McGettigan,3 Jordan Raine,1 and David Reby1,*

1Mammal Vocal Communication and Cognition Research Group, School of Psychology, University of Sussex, Brighton, UK
2Institute of Psychology, University of Wrocław, Wrocław, Poland
3Royal Holloway Vocal Communication Laboratory, Department of Psychology, Royal Holloway, University of London, Egham, UK
*Correspondence: [email protected] (D. Reby).

An unresolved issue in comparative approaches to speech evolution is the apparent absence of an intermediate vocal communication system between human speech and the less flexible vocal repertoires of other primates. We argue that humans’ ability to modulate nonverbal vocal features evolutionarily linked to expression of body size and sex (fundamental and formant frequencies) provides a largely overlooked window into the nature of this intermediate system. Recent behavioral and neural evidence indicates that humans’ vocal control abilities, commonly assumed to subserve speech, extend to these nonverbal dimensions. This capacity appears in continuity with context-dependent frequency modulations recently identified in other mammals, including primates, and may represent a living relic of early vocal control abilities that led to articulated human speech.

Trends
Source–filter theory is the unifying methodological framework in the study of nonverbal vocal communication in humans and other vertebrates.
Numerous studies have implicated fundamental frequency (F0) and vocal tract resonances (formants) in mammalian social communication. In humans, these sexually dimorphic source and filter voice features reliably indicate sex, age, body size, and dominance.
By focusing on static rather than dynamic vocal processes, this literature has largely overlooked the human capacity to volitionally modulate F0 and formants to express or exaggerate ecologically relevant traits, often in response to specific social contexts.
Emerging research suggests that F0 and formant scaling is widespread in humans and present in some other mammals, potentially representing an evolutionary path from simple voice-frequency scaling to articulated human speech.
The Dynamic Human Voice
Recent research examining the communicative function of the human voice has effectively established the roles of two key acoustic components – the fundamental frequency (F0) (see Glossary) and formants (Box 1) – in the expression of many biological and psychological dimensions, including sex and age, body size and shape, hormonal condition, dominance, masculinity or femininity, and attractiveness (for reviews see [1–5]). For example, men and women whose voices are characterized by low frequencies are typically judged by listeners as more dominant, physically larger and stronger, and more masculine than are speakers with higher-frequency voices [5], and these stereotypes appear partly driven by the physical, anatomical, and physiological mechanisms that influence and constrain F0 and formant production. In addition to typically having larger bodies than women, men also tend to have relatively larger larynges, longer vocal tracts, and lower-frequency voices (Box 1). As a consequence, listeners often associate lower frequencies with stereotypically male traits [6,7].
By revealing the many ways in which the human voice is linked – physically or perceptually – to ecologically relevant traits, this body of research has greatly advanced our understanding of the evolutionary functions of nonverbal vocal communication in humans. In particular it provides compelling evidence that the human voice, and the strong sexual dimorphism that characterizes its main frequency components, has been largely shaped by sexual selection [8] and continues to play a substantial role in human mate choice and mate competition [1,3]. At the same time, by focusing on vocal communication of mate quality, recent studies have overwhelmingly described human voice production and perception in terms of static cues that are almost exclusively studied in the absence of social context. This is clearly an oversimplification, as sexual selection may also favor the ability to flexibly manipulate the voice to exploit listeners’ tendency to
Trends in Cognitive Sciences, Month Year, Vol. xx, No. yy http://dx.doi.org/10.1016/j.tics.2016.01.002 © 2016 Elsevier Ltd. All rights reserved.
Glossary
Dual-pathway model: current neural models implicate two pathways in human vocal control: sensorimotor cortical systems that support the production of learned vocalizations such as speech and song; and the limbic system that supports innate vocalizations such as laughter. Although these two pathways have traditionally been thought of as separate, recent research on vocal control suggests otherwise.
Enculturation: the process by which an individual is exposed to and learns the traditional content of a culture or assimilates its practices and values; for instance, nonhuman primates that have been raised by humans in a human-like cultural and social environment.
Flexible: characterizes a behavior that can change as a function of various (e.g., social, temporal, environmental) factors.
Formant: resonant frequencies of the supralaryngeal vocal tract that affect our perception of voice timbre. The shape of the vocal tract influences the relative positions of formants and formant spacing, whereas its length (VTL) affects formant scaling. Longer vocal tracts produce lower, more closely spaced formants.
Fundamental frequency (F0): the rate of vocal fold vibration. All else being equal, longer, more massive, and less tense vocal folds tend to vibrate more slowly and produce a relatively lower F0. Voice pitch is the perceptual correlate of F0 (along with its harmonics, integer multiples of F0).
Language: a symbolic, rule-governed system of communication shared within a group that need not involve verbal communication or speech (e.g., written or sign language).
Speech: a vocal communication system involving precise coordination of vocal anatomical structures (larynx, supralaryngeal vocal tract, and articulators such as tongue and lips) required to articulate specific sounds.
Vocal control: the capacity to control the larynx (affecting the production of F0) or the supralaryngeal vocal tract (affecting the production of formants) in a flexible or voluntary manner, for
Box 1. Source–Filter Theory of Speech Production
In humans and most other mammals, vocal signals are produced by the larynx (the source) and subsequently filtered by the supralaryngeal vocal tract (the filter) [88,89]. When humans phonate, air that is expelled from the lungs and forced through the closed glottis causes oscillation of the vocal folds within the larynx, which determines F0, also perceived as voice pitch. The sound waves produced by this oscillation travel up from the larynx and through the pharynx and oral and nasal cavities, which comprise the supralaryngeal vocal tract, to the mouth, through which vocal sounds are radiated (Figure IC). In the process, the vocal tract filters the sound, attenuating certain frequencies and not others, thereby producing resonant frequencies or formants that affect our perception of voice timbre [90].
Following the principles of biomechanics and sound physics, larger vocal folds vibrate at a slower rate than do smaller vocal folds, resulting in a relatively lower F0 and perceived pitch; however, regardless of mass, F0 increases when the vocal folds are stretched and become tenser [16,17]. Thus, the effective mass, length, and tension of the vocal folds affect F0. Independently of this, longer vocal tracts produce relatively lower, more closely spaced formants [formant spacing (ΔF)] than do shorter vocal tracts [91,92] and hence a more perceptually resonant voice (Figure IA,B). Both F0 and formants are inversely related to body size in most mammals [5,10,11,92]. Among humans, these voice–size relationships are particularly salient between sexes, where F0 and formants are substantially lower among men than women [29]. However, controlling for sex and age, formants explain several times more variation in body size and shape than does F0 [92,93].
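The inverse relation between vocal tract length and formant spacing can be made concrete with a minimal sketch. It assumes, as a simplification not spelled out in this box, that the vocal tract behaves as a uniform tube closed at the glottis and open at the lips, so that ΔF = c / (2 × VTL), and it takes the speed of sound c as 350 m/s; the function names are illustrative only. Under these assumptions the sketch reproduces the values annotated in Figure I (ΔF = 875 Hz for a VTL of 20 cm, and ΔF = 1250 Hz for a VTL of 14 cm).

```python
# Assumption: speed of sound in the vocal tract, 350 m/s expressed in cm/s.
SPEED_OF_SOUND_CM_S = 35000.0


def formant_spacing_hz(vtl_cm, c=SPEED_OF_SOUND_CM_S):
    """Average spacing between adjacent formants (delta F) for a uniform tube."""
    return c / (2.0 * vtl_cm)


def apparent_vtl_cm(delta_f_hz, c=SPEED_OF_SOUND_CM_S):
    """Apparent vocal tract length implied by a measured formant spacing."""
    return c / (2.0 * delta_f_hz)


def tube_formants_hz(vtl_cm, n=4, c=SPEED_OF_SOUND_CM_S):
    """Resonances of a tube closed at one end: F_i = (2i - 1) * c / (4 * VTL)."""
    return [(2 * i - 1) * c / (4.0 * vtl_cm) for i in range(1, n + 1)]


if __name__ == "__main__":
    for vtl in (20.0, 14.0):
        print(vtl, formant_spacing_hz(vtl), tube_formants_hz(vtl))
```

Real vocal tracts are not uniform tubes, so individual formants deviate from these idealized positions; the spacing is best read as an average across formants.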
Manipulations of the tongue, lips, jaw, and soft palate can also alter the shape (rather than length) of the vocal tract, affecting the relative distribution (rather than absolute scaling) of formant frequencies and thereby giving rise to different speech sounds. The relative positions of formants (especially F1 and F2 [90]; see, for example, Figure 1D, Key Figure) play a critical role in speech production and perception and in non-tonal languages play a much greater role than F0. The lowered position of the human larynx also allows humans to produce a wider range of sounds than any other primate [27,28]. However, whether the human larynx descended due to selection for a more complex vocal repertoire [94] or selection for apparent larger body size [27] remains unclear.
[Figure I panels: (A) Formant scaling, spectrograms with formants F1–F4 labeled: ΔF = 875 Hz, VTL = 20 cm (left); ΔF = 1250 Hz, VTL = 14 cm (right). (B) Fundamental frequency scaling: F0 = 100 Hz (left); F0 = 200 Hz (right). (C) Sagittal MRI with labels: nasal cavity, oral cavity, soft palate, tongue, lips, pharynx, larynx (housing the vocal folds), trachea.]

Figure I. Voice Frequency Scaling. Each spectrogram illustrates the vowel /eI/ spoken by the same man, whose voice frequencies were manipulated using Praat acoustic software [95]. (A) Formant scaling, wherein lower, more closely spaced formants (left) correspond to a relatively longer vocal tract and a perceptually more resonant voice. Apparent vocal tract length (VTL) was estimated from ΔF; for algorithms see [91,92]. (B) Fundamental frequency (F0) scaling, wherein a lower F0 and more closely spaced harmonics (integer multiples of F0) (left) correspond to a perceptually lower-pitched voice. Note when comparing panels (A) and (B) that formants and F0 can vary independently. (C) Labeled sagittal MRI of the human vocal apparatus during the production of the sustained vowel /u:/. Note the back position of the tongue and protrusion of the lips. Spectrogram parameters: y-axis, 0–5 kHz; x-axis, 0.25 s; window length, 0.04.
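The harmonic spacing described for panel (B) follows directly from the definition of harmonics as integer multiples of F0. The short sketch below, which uses a hypothetical helper name, lists the harmonics that fall within the 0–5 kHz range of the spectrograms: doubling F0 from 100 Hz to 200 Hz doubles the spacing between harmonics and halves the number that fit in that range.

```python
def harmonics_hz(f0_hz, max_hz=5000.0):
    """Return the integer multiples of F0 that do not exceed max_hz."""
    count = int(max_hz // f0_hz)
    return [k * f0_hz for k in range(1, count + 1)]


if __name__ == "__main__":
    for f0 in (100.0, 200.0):
        h = harmonics_hz(f0)
        # Print F0, the number of harmonics below 5 kHz, and the first three.
        print(f0, len(h), h[:3])
```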
example as a function of social context.
Vocal learning: the capacity to produce novel vocalizations through imitation or experience (which does not necessarily involve vocal control).
Voice modulation: manipulation of any nonverbal property of the voice including but not limited to F0 and formant frequencies.
Volitional: characterizes a voluntary and intentional behavior. Volitional vocal control involves the capacity to control the larynx and supralaryngeal vocal tract independent of immediate context and physiological state.
associate sex-typical voice patterns with sex-typical traits (i.e., the frequency or gender code [6,7,9]). Although several animal species are known to scale their vocal frequencies to exaggerate their body size or to express threat [10–12], the unusually strong sexual dimorphism in formants and F0 that differentiates humans from other animals, including nonhuman primates [8], may offer a particularly effective platform by which humans can exploit sex-typical vocal patterns above and beyond size exaggeration. Hence, we suggest that men and women routinely take advantage of the salient sex differences in F0 and formants, using the broad acoustic space between (and within) these male and female categories to express various sex-related attributes such as dominance, masculinity or femininity, and threat [9,13].
It is well known that, with vocal training, actors and voice impersonators can significantly raise or lower the key frequency components of their voices (F0 and formants), often adopting vocal ranges characteristic of the opposite sex [2]. In singing, sopranos can raise their F0 and match their first formant to frequencies above 1200 Hz [14], six times the average pitch of a woman's voice. Some politicians (such as Margaret Thatcher) have had extensive vocal coaching to lower the frequency of their voice to exude a more authoritative, powerful persona [15]. Although these examples may be extreme or anecdotal, a growing body of empirical evidence suggests that people routinely and volitionally modulate their voice in everyday social contexts such as first dates and job interviews, when making a compelling argument, or when attempting to deceive someone: the human voice is a dynamic and flexible channel for self-expression and, beyond its role in encoding linguistic information, functions also as a social tool.
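To place the soprano example on a perceptual scale, the frequency ratio can be expressed in semitones. The sketch below assumes the standard equal-tempered conversion (12 semitones per doubling of frequency) and takes 200 Hz as the average female F0 implied by the 'six times' comparison; both the helper name and this baseline are illustrative assumptions rather than values reported in [14].

```python
import math


def semitones_between(f_low_hz, f_high_hz):
    """Musical interval between two frequencies, in equal-tempered semitones."""
    return 12.0 * math.log2(f_high_hz / f_low_hz)


if __name__ == "__main__":
    implied_average_f0 = 1200.0 / 6.0  # assumed baseline: 200 Hz
    interval = semitones_between(implied_average_f0, 1200.0)
    # A sixfold increase in frequency is roughly two and a half octaves.
    print(round(interval, 2), round(interval / 12.0, 2))
```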
In this review we suggest that the ability of humans to flexibly control the size-related source–filter dimensions of our vocal signals (i.e., vocal control) is likely to predate our ability to articulate the verbal dimensions of speech and therefore may provide an evolutionary pathway from nonhuman primate vocal communication to human speech. We show that this largely overlooked modulation of nonverbal components is not only highly prevalent and functional in human vocal communication but also present in the vocal systems of other, nonhuman mammal species. We propose that voice modulation can provide key insights regarding the evolution of articulated human speech, focusing on how comparative explorations of the behavioral and neural bases of vocal control have the potential to lead to important advances in our understanding of the evolution of human vocal communication and the origins of speech.
Behavioral Evidence of Human Vocal Modulation and its Social Functions
Vocal control involves manipulation of the vocal folds and larynx or the supralaryngeal vocal tract, including the articulators such as tongue and lips. Mechanistically, F0 is controlled by manipulating the tension and effective length or surface area of the vocal folds; for example, by contracting or relaxing the thyroarytenoid and cricothyroid muscles or increasing subglottal pressure. By contrast, formants are altered by manipulating the length of the vocal tract – for example, by lowering the larynx or protruding the lips – within limits imposed by the skull and body size [2,10,16,17] (Box 1).
Recent studies investigating voice modulation in humans have focused on its functional role, typically in a mating context, and provide evidence that people dynamically alter sexually dimorphic and size-related vocal features (F0 and formants) to advertise or exaggerate traits that are relevant to mate selection and competition. For example, in studies involving simulated dating scenarios, men who perceived themselves as physically dominant relative to their male competitor lowered their F0 in conversations with him (whereas men who felt subordinate raised their F0 [13]) and produced more monotone speech with lower F0 variability when describing themselves to a potential female partner [18]. Men exhibiting masculine traits such as relatively large body size, high levels of androgens, or low formant frequencies also appear to articulate vowels less clearly than do more feminine men by differentially modulating the lower formants
[19]. The authors of this latter study suggest that some men may opt for communicatively less intelligible phonetic variants, typically associated with lower socioeconomic status but also with toughness, as a means of communicating their masculinity [19]. Other studies show that men and women volitionally and spontaneously decrease F0 and formants when asked to sound masculine and increase these frequencies to sound feminine [9]. A similar ability has been observed in children aged 6–9 years [20], suggesting that prepubescent boys and girls learn to express their gender through the voice by imitating adult models. Our own ongoing research further indicates that men and women from distinct cultures spontaneously lower their F0 and formants to sound physically large and raise these vocal frequencies to sound small. These modulations are in some cases extreme, as men can increase their apparent vocal tract length (VTL) by 25% to sound larger or raise their voice pitch to frequencies characteristic of a small child when attempting to imitate a small body size.
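As a rough illustration of what a 25% increase in apparent VTL implies acoustically, the sketch below assumes that formant frequencies scale inversely with vocal tract length (a uniform-tube simplification). The function names are illustrative, and the baseline formant values are hypothetical placeholders, not measurements from the studies cited. Under this assumption, lengthening the vocal tract by 25% lowers every formant, and the spacing between formants, by 20%.

```python
def formant_scale_factor(vtl_increase_fraction):
    """Multiplicative change in formant frequencies for a fractional VTL increase."""
    return 1.0 / (1.0 + vtl_increase_fraction)


def scale_formants(formants_hz, vtl_increase_fraction):
    """Apply uniform formant scaling; all formants shift by the same ratio."""
    return [f / (1.0 + vtl_increase_fraction) for f in formants_hz]


if __name__ == "__main__":
    baseline = [500.0, 1500.0, 2500.0, 3500.0]  # hypothetical F1-F4 in Hz
    factor = formant_scale_factor(0.25)
    percent_drop = round((1.0 - factor) * 100.0, 1)
    print(percent_drop, scale_formants(baseline, 0.25))
```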
Sex-typical voice patterns (i.e., relatively lower frequencies in men's voices and higher frequencies in women's voices) are generally considered attractive [3,5]. Recent research efforts to determine whether men and women can effectively manipulate vocal attractiveness have produced mixed results [21–25], but unequivocally indicate that both sexes modulate their F0 when speaking to attractive members of the opposite sex. Men and women also appear to volitionally raise their F0 to express confidence and intelligence [21]. At a perceptual level, listeners should be selected and/or learn to detect vocal modulation, given that misattribution of speaker traits such as dominance and attractiveness can be costly. Evidence for this hypothesis is also mixed: although listeners correctly discriminate deliberately modulated from habitual speech [21], modulated voices can affect listeners’ assessments of target traits including intelligence, dominance, and attractiveness [21,24,25], suggesting that even when detectable, voice modulation can remain effectual (cf. [24]).
The behavioral studies reviewed above show that humans can spontaneously manipulate vocal frequencies to deemphasize or accentuate various biosocially relevant traits with meaningful variation across social contexts, within the limits imposed by various anatomical or mechanistic constraints on vocal production (see, for example, Box 1). Volitional modulation of F0 and formants is often achieved by exploiting robust perceptual correspondences (e.g., gender [9] or frequency code [6,7,12]). Such vocal modulation has obvious potential benefits. For example, between- and within-individual variation in F0 and formants can influence others’ social attributions and mate-choice decisions [21,25] and predicts socioeconomic outcomes including success in a job interview [22] and political votes [26]. We suggest that similar advantages associated with vocal modulation during social interactions are likely to have led to the selection of this capacity in our ancestors [7,12,27].
Is Voice Modulation the Origin of Vocal Control?
Vocal modulation of source–filter components is important for conveying (or exaggerating) various indexical characteristics of the speaker, as outlined above, but is also necessary for speech production (Box 1). Thus, although behavioral studies of humans offer compelling evidence for vocal modulation and its influence on others, they do not alone indicate whether F0 and formant modulation emerged to subserve speech [27,28], to exploit apparent indexical features of the speaker [7,12,27], or potentially, for both purposes.
To answer this question, we must consider at least two additional, comparative lines of research. First, we need to examine vocal modulation in other mammals that lack articulated speech, including our closest primate relatives as well as other mammals known to scale their vocal frequencies, and to compare this with human nonverbal vocal communication as well as speech. Here it is important to distinguish between involuntary vocal variation (i.e., automatically elicited by environmental stimuli or different levels of arousal) and more controlled vocal modulation
(more goal-directed and less directly dependent on arousal or external stimuli, but not necessarily implying intentionality). Second, we must understand the underlying neural mechanisms of nonverbal vocal control and, crucially, determine whether the neural correlates involved in nonverbal vocal modulation in humans (including emotional vocalizations such as laughter) and in other primates differ from those involved in volitional human speech production. Here we review initial work in these two areas that, together with the behavioral studies outlined above, suggests that control over F0 and formants in our ancestors may have emerged to exaggerate socially relevant traits such as body size and threat potential even before the advent of articulated speech.
Vocal Flexibility in Other Mammals
Between- and within-individual differences in F0 and formants play an important role in the social communication of many animals [10,11,28,29]. Several mammalian species scale or modulate their vocal frequencies in social interactions (Figure 1, Key Figure) (e.g., [30–32]). Although, in the examples given in Figure 1, formant modulation in nonhuman mammals appears largely linked to an external stimulus or arousal state, we suggest that such formant scaling nevertheless represents a potential path for the evolution of vowel articulation via an increasing complexity in formant modulation. In particular we argue that vocal exaggeration of apparent body size, which has been documented in animals ranging from red deer (Cervus elaphus) to humans (Figure 1), may have paved the way for more volitional forms of source and filter modulation and could have been the main driver in the evolution of vocal control [7,27].
Humans are unique in the advanced nature of our voice modulation abilities for two key reasons. First, we can voluntarily and independently control the source and filter properties of our vocalizations, and second, we can perform these modulations in the complete absence of an associated inducing experience or state [33–35]. It is important to note that vocal control as defined above is a phenomenon distinct from vocal learning, which does not necessarily involve flexible manipulation of the vocal anatomy but rather entails the capacity to acquire or converge call types or entire vocal repertoires through imitation or learning [36,37]. The use of specific and pre-existing vocalizations in different or novel contexts, and the ability to respond differentially to the vocalizations of others through experience, are considered distinct from vocal production learning, namely because these behaviors are present in a broader range of animals and are likely to require comparatively less complex neural control [38]. These latter forms of learning are typically referred to as usage and contextual learning, respectively [39]. Like humans, many marine mammals and songbirds are capable of complex vocal learning. Indeed, vocal learning evolved independently in multiple lineages [34,38]. Until recently it was considered to be largely absent in nonhuman primates [40,41]; however, new evidence suggests that chimpanzees (Pan troglodytes) and orangutans (Pongo) may be capable of some forms of vocal learning [42–44] (cf. [41]).
Although human vocal control capabilities are unparalleled, below we present recent evidence that nonhuman primates may share our capacity to modulate F0 and formants to perhaps a greater extent than previously thought. Other mammals, such as red deer, are capable of simple, uniform scaling of formants (Figure 1), but the comparative literature examining flexible vocal control has focused on nonhuman primates due to their relative intelligence and phylogenetic proximity to humans, and also the long-held notion that vocal control subserves spoken language acquisition [27,28].
Vocal Control in Nonhuman Primates?
Much of the evidence for control of F0 and formants and imitation in nonhuman primate vocalizations has surfaced in the past 5 years, most of it from captive or enculturated great apes. For example, captive orangutans spontaneously imitate voiceless whistles produced by
Key Figure
Spectrograms Illustrating Progressive Functional Complexity in Formant Modulation Across Several Mammalian Species

[Figure 1 panels, spectrograms with formants labeled: (A) Unmodulated vocalizations: bison bull bellow, baboon grunt series, rhesus macaque grunt series (formants’ positions and formant spacing remain constant within a call or call series). (B) Body size exaggeration by vocal tract lengthening: red deer stag roar; human ‘small size’ /u:/ and ‘large size’ /u:/ (formants decrease uniformly as the vocalizer increases vocal tract length). (C) Predator-specific alarm calls: Diana monkey ‘leopard’ and ‘eagle’ calls (formants vary non-uniformly as a function of predator context). (D) Speech sounds (vowels): human /i:/ and /u:/ (formants vary non-uniformly as a function of vowel sound).]

Figure 1. The figure illustrates how formant scaling (formants F1–F4 labeled) may have first emerged in mammals for size exaggeration and threat displays, becoming increasingly complex over evolutionary time and ultimately allowing for the sophisticated formant modulation that characterizes articulated human speech. (A) Calls lacking formant modulation produced by a male bison (Bison bison), a baboon (Papio hamadryas ursinus), and a female rhesus macaque (Macaca mulatta). (B) Male red deer (Cervus elaphus, left), whose highly mobile larynges are positioned unusually low in the vocal tract [96], scale the formant spacing of their roars downward to project a large body size when encountering a threatening male competitor [30]; humans (Homo sapiens, right) scale formants similarly when instructed to sound ‘large’. (C) Diana monkeys (Cercopithecus diana) produce alarm calls in which the lowest two formants vary differentially in response to aerial versus terrestrial predators. Reproduced, with permission, from [31]; see also [32] in meerkats (Suricata suricatta). (D) Humans modulate relative formant positions, particularly F1 and F2, to produce different speech sounds; here we illustrate the differences in formant spacing between the vowels /i:/ and /u:/. y-axes, frequency (0–x kHz).
humans, indicating voluntary control over the lips and respiratory musculature [45], whereas chimpanzees produce novel and apparently flexible attention-seeking grunts toward humans [46]. Although this latter call type, with extended duration, is not produced in the wild, it demonstrates a latent capacity to control vocal fold adduction and airflow that is required to produce sustained laryngeal vibration, a vocal behavior rarely observed in nonhuman primates. Many other primates also possess a rich repertoire of voiceless calls used in the wild, usually involving articulator manipulation without vocal-fold vibration homologous to human voiceless consonants [40]. Case studies of enculturated great apes even document some latent abilities to voluntarily control the vocal musculature, thus manipulating F0 and individual formants [47–50], and in some cases producing novel rudimentary speech sounds beyond species-typical repertoires, with phonological features paralleling those observed in human speech [47,49,50].
There is mounting evidence for behavioral and contextual flexibility in great ape vocalizations. For instance, wild chimpanzees are capable of voluntary inhibition of affectively triggered food grunts [51], alarm calls [52], greetings [53], and responses to unfamiliar conspecifics [54]. The prefrontal cortex is thought to play a key role in such behaviors [55]. Chimpanzees also preferentially produce snake alarm calls in the presence of group members unaware of the threat [56]. These alarm calls appear flexible and potentially volitional [57], as do their food grunts [58] and joint-travel calls [59]. Despite occupying similar habitats, different populations of wild orangutans are now known to use different calls in the same context [44], as well as functionally arbitrary calls [60], suggesting a detachment between the production of the acoustic signal and physiological influence. Finally, among wild bonobos (Pan paniscus), ‘peeps’ and ‘contest hoots’ with the same acoustic structure are used in various social contexts [61,62], providing the first evidence of vocal signals used by a nonhuman primate to express a range of emotional states independent of context and biological function (i.e., functional flexibility).
These studies offer preliminary evidence that vocal adaptations important in human speech production are also present or latent in other mammals, and that among nonhuman primates manipulation of the larynx and vocal tract may be more flexible than once thought. In some cases, vocal control among nonhuman primates appears independent of affective state, an important prerequisite for articulated speech [55]. Studies of enculturated apes provide evidence that the potential for frequency modulation beyond simple, uniform scaling of formants exists in nonhuman primates, wherein great apes are able in some cases to manipulate individual formants to produce articulatory contrasts with speech-like rhythm and some apparent degree of voluntary, self-initiated control of the glottis and supralaryngeal vocal tract. However, critical work is now needed to determine whether nonhuman primates modulate F0 and formants within call types to express different levels of aggression/subordination, as posited by several authors [7,12,27]. In this regard, mammals whose F0 and formant scaling is clearly tied to communicating internal states or exaggerating physical traits, such as deer (Figure 1), may provide better convergent models for the study of early vocal modulation in humans.
A comprehensive understanding of vocal control also demands greater insight into the neural correlates of this behavior. Although neocortical control of the vocal apparatus during volitional speech production has traditionally been seen as setting humans apart from other primates [63,64], the behavioral studies reviewed above suggest that nonhuman primates may possess some cognitive control over vocal behavior. Unfortunately, the neural networks responsible for modulation of vocal motor systems in animals are poorly understood, and those involved in source and filter modulation are particularly understudied, even in humans. However, studies comparing volitional and spontaneous vocal production in humans can offer some insight.
Box 2. The Neuroanatomy of Human Vocal Control
The prevailing understanding of the neural systems controlling human vocal behavior owes much to extensive investigation of nonhuman primates such as the squirrel monkey (Saimiri), as well as to brain lesion and stimulation studies in human patients. A dual-pathway model is posited where midline and lateral motor systems are associated with the control of (innate) affective and learned vocalizations, respectively [55,63,97–99].
The production of vocal calls of various types is associated with a complex set of midbrain and brainstem structures including the basal ganglia, the periaqueductal grey (PAG), and brainstem nuclei such as the nucleus ambiguus (which houses motoneurons innervating the muscles of the larynx). The PAG has been implicated in the gating of vocalization patterns, receiving inputs on the motivational and affective states of the organism and activating a motor response. The neural initiation of innate calls and vocalizations (including, in humans, emotional sounds such as laughter and crying) implicates the frontal lobe in human and nonhuman primates and other mammals (specifically, the anterior cingulate cortex, ACC).
Vocal flexibility sufficient for human speech requires the control of several source- and filter-related elements: (i) fine control of the extrinsic and intrinsic laryngeal muscles; (ii) regulation of airflow from the lungs through the oral and nasal cavities; and (iii) the capacity for precise configuration of the supralaryngeal articulators (lips, tongue, jaw, soft palate) [97]. Numerous studies, ranging from cortical stimulation to functional imaging, have demonstrated evidence for somatotopic control of the articulators on the human precentral gyrus (e.g., [100–102]). The direct corticobulbar hypothesis argues that the flexibility of human vocal behavior, such as the rapid switching on and off of laryngeal signals, can be attributed to the presence of direct connections from the primary motor cortex (M1) to brainstem motoneurons; in nonhuman primates these connections are indirect (via the reticular formation). Interestingly, the first functional imaging study of cortical control of the human larynx implicated a more dorsal portion of the precentral gyrus than the ventral sites described for monkeys [103]. This alternative location, adjacent to both the articulatory and respiratory control centers (see also [104]), might facilitate coordinated processes subserving connected and coarticulated speech (Figure 2).
Controlled speech production implicates higher-order structures such as Broca's area in the inferior frontal gyrus; this isargued to host a mental ‘syllabary’ important in the selection of learned motor programs [105]. Auditory and soma-tosensory representations have also been implicated in self-monitoring and error correction during voluntary vocalbehavior (e.g., [106,107]).
Neural Correlates of Volitional and Spontaneous Vocal Control in Humans
The advent of functional neuroimaging (PET and fMRI) has brought with it the possibility of investigating the systems regulating speech and voice control in the intact brain and thus has yielded substantial advances in our understanding of human vocal production. This has led to the development of neurobiological models implicating a cortical network of inferior frontal, auditory, and sensorimotor regions in the selection and articulation of spoken syllables and the use of sensory feedback to guide output processes (e.g., [65]). Although the literature on mammalian vocalizations suggests two pathways for vocal control (i.e., the dual-pathway model; Box 2 and Figure 2), difficulties in imaging small and deep midbrain and brainstem centers with fMRI mean that many existing accounts of vocal behavior in humans focus on cortical, and predominantly perisylvian, sensorimotor systems. Recently, some authors have argued for greater interaction between the dual paths in human speech, positing meaningful involvement of subcortical regions (the basal ganglia) in the affective and motivational modulation of laryngeal speech signals [63,66,67]. There are numerous sources of evidence that challenge the independence of these two routes, or at least suggest a greater complexity to the system (see, for example, [68,69]), including evidence that patients with lesions in motor cortical regions retain the ability to swear [70].
Yet in the same way that the behavioral literature has largely neglected the para- or ‘supra’-linguistic aspects of flexible and voluntary human vocal behavior, so has the field of cognitive neuroscience. As a result, most studies have focused on what is being said (i.e., choice of phonemes, syllables, and words) and a very small number of studies have specifically focused on how speech is produced, or flexibly controlled (i.e., pronunciation, indexical cues, and affective content). Below we review this modest but growing area of research.
The investigation of novel speech sound imitation and pronunciation has revealed variation in both the activation and the structure of typically left-hemisphere frontal sites (e.g., inferior frontal
[Figure 2 image labels: periaqueductal grey (PAG); nucleus ambiguus; anterior cingulate cortex (ACC); supplementary motor area (SMA); Broca’s area and insula; premotor cortex; primary motor cortex (M1); somatosensory cortex (S1); auditory cortex; cerebellum; caudate nucleus; putamen and globus pallidus. Key: larynx/phonation area (LPA); lip protrusion; vertical tongue movement (after [103]).]
Figure 2. Main Brain Regions Implicated in Human Vocal Control. According to the dual-pathway model, sensorimotor cortical systems (and the cerebellum, shaded orange) support the production of learned vocalizations such as speech and song and involve direct innervation, whereas the limbic system in the anterior cingulate cortex (ACC, blue) is responsible for the initiation of innate vocalizations such as laughter. Several studies have identified an important role for the dorsal striatum (green) in vocal behavior, including foreign language learning (e.g., [74]). A first direct investigation of the control of the laryngeal muscles in humans identified a more dorsal location than had been observed in monkeys [103]. We argue that this represents an important evolutionary change in the control of vocalizations in humans, potentially supporting the emergence of articulated speech.
cortex, insula) as well as regions of the inferior parietal cortex (e.g., Rolandic operculum, supramarginal gyrus) and the superior temporal cortex, all associated with individual differences in task performance [71–74]. Studies have also highlighted the role of subcortical structures in voluntary phonetic learning contexts, where the dorsal striatum (bilateral caudate and putamen) is involved in initial attempts to imitate a novel sequence of sounds [74]. One experiment showed that the left anterior insula and inferior frontal gyrus were engaged during voluntary impersonations of speaking styles, accents, and familiar individuals [75]. A similar study showed increased responses in the left anterior insula (as well as the supplementary motor area, the anterior cingulate cortex (ACC), and regions of the superior temporal cortex) when participants voluntarily adopted a question inflection for a train of nonsense syllables [76].
Similar to research on speech production, work on the neural correlates of song – wherein singing often involves significant voice frequency modulation (see, for example, [14]) – has identified four key neural substrates of auditory–motor control, for example in the production and correction of melodic pitch: the auditory cortex, the intraparietal sulcus, the anterior insula, and the ACC [77]. Researchers have reported experience-dependent differences in the engagement of brain regions such as the anterior insula and sensorimotor cortices associated with the control of breathing (see [78]) and increased use of sensory feedback during vocal performance [79,80]. A recent study of vocal pitch processing showed greater engagement of the basal ganglia (bilateral putamen) in the imitation of melodies compared with non-imitative productions and melody perception [67]. The authors of that study argue that establishing how these subcortical
Box 3. The Special Case of Laughter: Novel Insight from an Animal-like Human Vocalization
Human laughter is ubiquitous in interpersonal interactions and serves various emotional and social signaling functions [108]. Importantly, spontaneous (i.e., authentic) laughter can be elicited in the laboratory with relative ease and can be compared with the volitional production of laughter (on demand). These two forms of laughter are developmentally, acoustically, and perceptually distinct [109–111] (Figure I), suggesting different production mechanisms that may align with the midline and lateral pathways outlined in Box 2.
Among humans, the hypothalamus appears to be more active during tickling laughter than during volitional, prompted laughter [68]. This suggests a potential role for the hypothalamus in conveying motivational and affective information to the PAG in the implementation of laughter motor patterns. The same study found ACC activation during volitional laughter and during the suppression of laughter during tickling, but not during spontaneous laughter, implicating the ACC in volitional vocal control [68]. Nonetheless, spontaneous tickling laughter still engages a range of lateral sensorimotor regions equivalent to those activated during volitional laughter; indeed, these appear even more strongly activated when tickling laughter is actively suppressed [68]. This finding suggests fairly continuous involvement of newer cortical control systems in voice production, even for affective and innate vocalizations, and argues against the rather clear demarcation of neural systems suggested in the literature (see also Box 2).
Spontaneous human laughter is argued to be similar in form and function to tickling-induced play vocalizations in other primates [112]. Both are often produced without voicing [113] and when spontaneous (but not volitional) human laughs are slowed down and pitch proportionally adjusted, they are largely indiscriminable from nonhuman primate vocalizations [109]. Spontaneous human laughter and homologous macaque and chimpanzee vocalizations are characterized by similar interval duration, serial organization, and high intra-bout variability in acoustic parameters [114]. The presence of regular vocal fold vibration and consistently egressive airflow in some ape laughter vocalizations – call characteristics previously described as markers of human laughter and speech – thus indicates homology in primate vocal control mechanisms [115].
In addition to spontaneous laughter, adult (but not infant) chimpanzees produce acoustically distinct (e.g., shorter) laugh replications in response to the laughter of conspecifics, with a comparatively delayed response latency [115], implicating at least some degree of non-automatic vocal control [108,110,115]. Hence, chimpanzees’ capacity to imitate or modulate laughter appears highly similar to volitional human laughter in both developmental trajectory and its function in social cohesion.
[Figure I image: waveforms and spectrograms with axes Frequency (Hz) and F0 (Hz); panel labels "Volitional laughter" and "Spontaneous laughter"; time-scale labels "2.1 seconds" and "1.2 seconds".]
Figure I. Human Spontaneous Amusement Laughter (Left) and Volitional Laughter (Right) Produced by the Same Man. Volitional control of laughter as a normative social signal is refined throughout human development whereas spontaneous laughter emerges early in infancy and appears innate, and relatively more animal-like, in its acoustic structure. As illustrated by these waveforms and spectrograms, bouts of spontaneous laughter tend to be longer in overall duration with shorter individual calls, higher fundamental frequency (F0) (plotted in blue), and differing spectral profiles compared with volitional laughter [109,110]. Nonlinearities in spontaneous laughter vocalizations (e.g., wheezing, glottal whistles) reflect reduced control of vocal behavior during authentic emotional experience [110,116].
Outstanding Questions
How do speakers modulate or scale F0 and formants differently across distinct social contexts, including, for example, a first date, job interview, or public speech?
Does volitional voice modulation prompted in the laboratory differ behaviorally, acoustically, or neurally from the more spontaneous modulation observed in real social contexts?
What happens in the brain when we scale formants or lower F0 to sound physically larger compared with when we modulate these vocal features in other contexts, such as during volitional or faked conversational laughter?
What are the neural correlates of F0 and formant scaling in wild nonhuman primates and other mammals and how do these compare with those of humans and enculturated primates?
How do individuals vary in the extent to which they can achieve voice modulation and how reliably do these individual differences relate to the ultimate effectiveness of modulation in altering the social assessment or behavior of listeners?
How does the multimodal interplay between vocal and visual cues influence the effectiveness of voice modulation, particularly when vocal and visual information conflict?
Why do some aphasic patients with lateral lesions retain the ability to swear or imitate animal vocalizations and what are the implications for crosstalk between the laryngeal motor cortex and the ACC/midline voice-control pathway?
sites functionally connect with cortical structures is crucial to understanding how they assist in the auditory-to-motor transforms necessary for imitative behavior.
Other studies have examined the voluntary production of emotional inflections in speech to investigate the influence of limbic regions on vocal processes. A comparison of happy and neutral production of syllables revealed increased activation in many cortical and subcortical regions, crucially including the anterior cingulate and medial temporal cortex as well as the thalamus and midbrain [76]. Recent studies have linked involvement of the ACC and the amygdala to autonomic arousal during the production of angry speech vocalizations [81]. The induction of sad (but not happy) affect also predicts activation of the ACC as well as changes in F0 range [82]. A recent study argued for a more general role for the anterior cingulate in the control of pitch modulations by showing equivalent activation of this region (and of the laryngeal motor cortex) in the production of both volitional affective and non-affective syllables. These functional imaging studies have thus revealed clear roles for neocortical sensorimotor systems in the flexible control of human speech and song and our underlying relative expertise in these behaviors. However, reports of additional involvement of the limbic and striatal regions – part of an evolutionarily older control system (Box 2) – in comparatively demanding contexts requiring voluntary modulation of emotional prosody or in phonetic learning are consistent with the idea that both vocal pathways play a role in voice modulation.
Crucially, then, current evidence from neuroimaging research in humans suggests that the limbic and cortical vocal pathways (Figure 2) are highly interactive and that both cortical and subcortical routes are involved in volitional and spontaneous vocal production [66]. Moreover, vocal flexibility in nonhuman primates suggests that other species have greater neuroanatomical elaboration of the direct lateral motor cortical route than previously thought or, alternatively, may be achieving flexibility with older neural structures (e.g., ACC, limbic system). Although there are issues associated with establishing genuine spontaneity in laboratory settings, laughter may offer one of the most promising signals with which to interrogate the influence of the different neural systems controlling vocal output: it allows us to directly compare what happens in the brain during volitional and spontaneous vocal production in humans and also to compare spontaneous vocalizations produced by humans and nonhuman primates (Box 3).
Concluding Remarks and Future Directions
Volitional and flexible control of the anatomical structures involved in speech production (the larynx producing F0 and the supralaryngeal vocal tract producing formants) has long been thought to set humans apart from other vocalizing animals and is generally considered a precursor to human speech acquisition and production. We have reviewed preliminary behavioral and neural evidence to suggest that the capacity for vocal control is not uniquely human but present to some degree in other mammals, including nonhuman primates (Box 4).
We suggest that flexible scaling of source–filter components of vocalizations (F0 and formants) may provide a path for the evolution of vocal control, wherein control of these acoustic components may have emerged in our ancestors to express, exaggerate, or even fake physical traits or internal motivational states. The fact that many animal species and humans from distinct cultures are capable of scaling their formants to exaggerate their apparent size raises the possibility that formant scaling related to size/threat exaggeration (and perception) may constitute a precursor of more complex formant modulation involved in speech articulation [27]. As illustrated in Figure 1, linear formant scaling as observed in red deer roars, or slightly more complex formant modulations as observed in the predator calls of Diana monkeys, effectively communicating size-related information, could have opened the way for increasingly greater vocal control, ultimately leading to full-blown articulated human speech. The sound-symbolic properties of high front-vowel sounds associated with smallness (such as /i:/, in which F2 is
Box 4. Neural Networks Involved in Nonhuman Primate Vocal Control: Preliminary Studies
Although the neural correlates of nonhuman primate vocal control are poorly understood, preliminary studies suggest that the cortex plays a role and may underlie aspects of flexible vocal behavior in primates. For example, in pigtailed macaques (Macaca nemestrina), vocalization-selective neurons in the ventral premotor cortex fire during prompted but not spontaneous vocalization [117]. Implanted electrocorticographic arrays corroborate motor cortex activity during rhesus macaque (Macaca mulatta) vocal production [118]. Rhesus macaques also possess neurons in the ventrolateral prefrontal cortex that are specifically responsible for the planning of instructed vocalizations [119,120], some of which may integrate higher-order auditory processing with cognitive control of vocal motor action. Most recently, and in perhaps the most convincing substantiation of cortical involvement in nonhuman primate vocal behavior to date, researchers observed motor production-related changes in common marmoset (Callithrix jacchus) frontal cortex neurons during natural communicative calls [121].
Chimpanzees – who are (along with bonobos) our closest primate relatives and whose laughter calls most strongly resemble those of humans [112] – may possess greater direct innervation of laryngeal motor nuclei than our more distant ape cousins [122]. Quantitative phylogenetic trees constructed based on laughter acoustics of humans and other great apes map closely onto trees based on genetic similarity [112] and cross-species examinations of laughter highlight continuity in the capacity for controlled modulatory adaptation of an innate vocal repertoire (see also Box 3). Nevertheless, the neural correlates of laughter in nonhuman primates remain a promising but as yet uninvestigated avenue of research.
Nonhuman primates do not appear to possess direct motor cortex innervation of the laryngeal nuclei [55,97]; however, complex laryngeal control may not be important in the natural primate environment. In macaques, indirect (i.e., at least disynaptic) connections from the primary motor cortex to motoneurons innervating hand muscles are present at birth whereas monosynaptic innervation pathways develop by age 2 years, coinciding with the capacity for fine finger movements for the skilled hand tasks integral to successful survival (see [123]). If the development of such monosynaptic pathways is at least partly dependent on experience, the motor cortex of an enculturated great ape with species-atypical vocal modulation capabilities may establish direct connections with laryngeal motoneurons, and the lack of such innervation in normal development may be due to a lack of environmental necessity rather than neural capacity.
raised), versus low back-vowel sounds associated with largeness (such as /ɑ:/, in which F2 is lowered) could reflect this original function [83,124].
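As a rough illustration of why linear formant scaling can signal size, the sketch below uses a uniform-tube approximation of the vocal tract, in which the i-th resonance is F_i = (2i − 1)c / 4L and adjacent formants are spaced by c / 2L. This simplification, the assumed speed of sound (350 m/s), and all function names are illustrative assumptions and are not taken from the studies reviewed here. Under those assumptions, multiplying every formant by a constant factor is equivalent to dividing the apparent vocal tract length by that factor.

```python
# Minimal sketch, assuming a uniform tube closed at the glottis and open at the lips.
# The speed of sound and all names below are assumptions chosen for illustration.
SPEED_OF_SOUND_M_S = 350.0


def tube_formants(vtl_m, n_formants=4, c=SPEED_OF_SOUND_M_S):
    """Resonances of a uniform tube of length vtl_m: F_i = (2i - 1) * c / (4 * L)."""
    return [(2 * i - 1) * c / (4.0 * vtl_m) for i in range(1, n_formants + 1)]


def apparent_vtl(formants_hz, c=SPEED_OF_SOUND_M_S):
    """Apparent vocal tract length from mean formant spacing: L = c / (2 * dF)."""
    spacings = [hi - lo for lo, hi in zip(formants_hz, formants_hz[1:])]
    mean_spacing = sum(spacings) / len(spacings)
    return c / (2.0 * mean_spacing)


def scale_formants(formants_hz, factor):
    """Linear formant scaling: shift every formant by the same proportion."""
    return [f * factor for f in formants_hz]


# A hypothetical 17.5 cm vocal tract, then all formants lowered by 10%.
baseline = tube_formants(0.175)
lowered = scale_formants(baseline, 0.9)
print("baseline formants (Hz):", baseline)
print("apparent length after lowering (m):", apparent_vtl(lowered))
```

Lowering all formants by a common factor therefore yields a longer apparent vocal tract, which is the sense in which uniform scaling can exaggerate size. Real vocal tracts are not uniform tubes, so the numbers should be read as orders of magnitude only.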
The current neuroimaging literature on vocal control is representative of a subfield influenced by the historical motivations of the linguistics and neuropsychology (i.e., aphasia) literatures, focusing largely on speech and language production. Thus, we are still lacking studies that primarily and systematically investigate the neural control of core source–filter modulations such as those that may have facilitated exaggeration of body size and cues to sex and identity in our preverbal evolutionary past [4,28] and those used ubiquitously in the voluntary vocal communication of attitude, emotional state, and even aspects of personality [84,85]. Hence, future work should make greater attempts to link neural activation data to the underlying articulatory dynamics and acoustic content of vocal behaviors, including imitation. Real-time anatomical imaging of the vocal tract using MRI (up to 80 frames/s [86]) offers the opportunity to image naturalistic vocal behavior while also measuring neural activation with echoplanar imaging of the brain. Using amodal analysis approaches such as representational similarity analysis (RSA) [87], it is possible to probe the neural representation of articulatory dynamics.
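To make the RSA suggestion concrete, the following is a minimal sketch of the general logic, assuming that each vocal condition has one neural response pattern (e.g., voxel values) and one articulatory feature vector (e.g., measurements from vocal tract images). The correlation-distance dissimilarity, the simple rank-based comparison, and all function names are illustrative assumptions rather than the procedure of any study cited here.

```python
# Minimal RSA sketch using NumPy only; names and metric choices are assumptions.
import numpy as np


def rdm(patterns):
    """Representational dissimilarity matrix: 1 - Pearson r between condition rows.

    patterns has shape (n_conditions, n_features).
    """
    return 1.0 - np.corrcoef(patterns)


def upper_triangle(matrix):
    """Off-diagonal upper-triangle entries, one per pair of conditions."""
    rows, cols = np.triu_indices(matrix.shape[0], k=1)
    return matrix[rows, cols]


def rank(values):
    """Ordinal ranks; ties are broken by position, adequate for continuous data."""
    order = np.argsort(values)
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(len(values))
    return ranks


def rsa_similarity(patterns_a, patterns_b):
    """Rank correlation between the dissimilarity structures of two data sets."""
    a = rank(upper_triangle(rdm(patterns_a)))
    b = rank(upper_triangle(rdm(patterns_b)))
    return float(np.corrcoef(a, b)[0, 1])


# Synthetic stand-ins: 6 vocal conditions, 12 articulatory features, 40 voxels.
rng = np.random.default_rng(0)
articulatory = rng.normal(size=(6, 12))
neural = rng.normal(size=(6, 40))
print("articulatory vs neural RDM similarity:", rsa_similarity(articulatory, neural))
```

Because only the condition-by-condition dissimilarity matrices are compared, the two data sets may have different numbers of features and different units, which is what makes the approach usable across measurement modalities.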
There exists an opportunity to strengthen the links between the behavioral and neural imaging literatures and form a more coherent cognitive neuroscience of the voice. We therefore encourage colleagues across disciplines to engage in research that directly examines voice modulation as a signal with tremendous social, as well as linguistic, significance. In our quest to uncover the origins of speech, understanding the behavioral functions and neural mechanisms of both involuntary and voluntary nonverbal modulation in humans, and looking for emerging equivalents in other mammals, is likely to be our best way forward (see Outstanding Questions).
Acknowledgments
The authors thank Tecumseh Fitch, Drew Rendall, and Megan Wyman for sharing vocal recordings used in the creation of Figure 1.
References
1. Puts, D. et al. (2012) Sexual selection on human faces and voices. J. Sex Res. 49, 227–243
2. Kreiman, J. and Sidtis, D. (2011) Foundations of Voice Studies: An Interdisciplinary Approach to Voice Production and Perception, Wiley Blackwell
3. Feinberg, D.R. (2008) Are human faces and voices ornaments signaling common underlying cues to mate value? Evol. Anthropol. Issues News Rev. 17, 112–118
4. Owren, M.J. (2011) Human voice in evolutionary perspective. Acoust. Today 7, 24–33
5. Pisanski, K. and Bryant, G.A. (2016) The evolution of voice perception. In The Oxford Handbook of Voice Studies (Eidsheim, N.S. and Meizel, K.L., eds), Oxford University Press (in press)
6. Ohala, J.J. (1983) Cross-language use of pitch: an ethological view. Phonetica 40, 1–18
7. Ohala, J.J. (1984) An ethological perspective on common cross-language utilization of F0 of voice. Phonetica 41, 1–16
8. Puts, D. et al. (2016) Sexual selection on male vocal fundamental frequency in humans and other anthropoids. Proc. Biol. Sci. (in press)
9. Cartei, V. et al. (2012) Spontaneous voice gender imitation abilities in adult speakers. PLoS ONE 7, e31353
10. Fitch, W.T. and Hauser, M.D. (2003) Unpacking “honesty”: vertebrate vocal production and the evolution of acoustic signals. In Acoustic Communication (Simmons, A.M. et al., eds), pp. 65–137, Springer
11. Taylor, A.M. and Reby, D. (2010) The contribution of source–filter theory to mammal vocal communication research: advances in vocal communication research. J. Zool. 280, 221–236
12. Morton, E.S. (1977) On the occurrence and significance of motivation–structural rules in some bird and mammal sounds. Am. Nat. 111, 855–869
13. Puts, D. et al. (2006) Dominance and the evolution of sexual dimorphism in human voice pitch. Evol. Hum. Behav. 27, 283–296
14. Joliveau, E. et al. (2004) Acoustics: tuning of vocal tract resonance by sopranos. Nature 427, 116
15. Karpf, A. (2006) The Human Voice, Bloomsbury
16. Titze, I.R. (2011) Vocal fold mass is not a useful quantity for describing F0 in vocalization. J. Speech Lang. Hear. Res. 54, 520–522
17. Hollien, H. (2014) Vocal fold dynamics for frequency change. J. Voice 28, 395–405
18. Hodges-Simeon, C.R. et al. (2010) Different vocal parameters predict perceptions of dominance and attractiveness. Hum. Nat. 21, 406–427
19. Kempe, V. et al. (2013) Masculine men articulate less clearly. Hum. Nat. 24, 461–475
20. Cartei, V. and Reby, D. (2013) Effect of formant frequency spacing on perceived gender in pre-pubertal children's voices. PLoS ONE 8, e81022
21. Hughes, S.M. et al. (2014) The perception and parameters of intentional voice manipulation. J. Nonverbal Behav. 38, 107–127
22. Hughes, S.M. et al. (2010) Vocal and physiological changes in response to the physical attractiveness of conversational partners. J. Nonverbal Behav. 34, 155–167
23. Fraccaro, P.J. et al. (2011) Experimental evidence that women speak in a higher voice pitch to men they find attractive. J. Evol. Psychol. 9, 57–67
24. Fraccaro, P.J. et al. (2013) Faking it: deliberately altered voice pitch and vocal attractiveness. Anim. Behav. 85, 127–136
25. Leongómez, J.D. et al. (2014) Vocal modulation during courtship increases proceptivity even in naive listeners. Evol. Hum. Behav. 35, 489–496
26. Klofstad, C.A. et al. (2015) Perceptions of competence, strength, and age influence voters to select leaders with lower-pitched voices. PLoS ONE 10, e0133779
27. Fitch, W.T. (2000) The evolution of speech: a comparative review. Trends Cogn. Sci. 4, 258–267
28. Ghazanfar, A.A. and Rendall, D. (2008) Evolution of human vocal production. Curr. Biol. 18, R457–R460
29. Rendall, D. et al. (2005) Pitch (F0) and formant profiles of human vowels and vowel-like baboon grunts: the role of vocalizer body size and voice-acoustic allometry. J. Acoust. Soc. Am. 117, 944
30. Reby, D. et al. (2005) Red deer stags use formants as assessment cues during intrasexual agonistic interactions. Proc. Biol. Sci. 272, 941–947
31. Riede, T. et al. (2005) Vocal production mechanisms in a non-human primate: morphological data and a model. J. Hum. Evol. 48, 85–96
32. Townsend, S.W. et al. (2014) Acoustic cues to identity and predator context in meerkat barks. Anim. Behav. 94, 143–149
33. Fitch, T.W. (2006) Production of vocalizations in mammals. Vis. Commun. 3, 115–121
34. Janik, V.M. (2014) Cetacean vocal learning and communication. Curr. Opin. Neurobiol. 28, 60–65
35. Pinker, S. and Bloom, P. (1990) Natural language and natural selection. Behav. Brain Sci. 13, 707–727
36. Janik, V.M. and Slater, P.J. (1997) Vocal learning in mammals. Adv. Study Behav. 26, 59–100
37. Petkov, C.I. and Jarvis, E.D. (2012) Birds, primates, and spoken language origins: behavioral phenotypes and neurobiological substrates. Front. Evol. Neurosci. 4, 12
38. Nowicki, S. and Searcy, W.A. (2014) The evolution of vocal learning. Curr. Opin. Neurobiol. 28, 48–53
39. Janik, V.M. and Slater, P.J. (2000) The different roles of social learning in vocal communication. Anim. Behav. 60, 1–11
40. Lameira, A.R. et al. (2014) Primate feedstock for the evolution of consonants. Trends Cogn. Sci. 18, 60–62
41. Fischer, J. et al. (2015) Is there any evidence for vocal learning in chimpanzee food calls? Curr. Biol. 25, R1028–R1029
42. Watson, S.K. et al. (2015) Vocal learning in the functionally referential food grunts of chimpanzees. Curr. Biol. 25, 495–499
43. Watson, S.K. et al. (2015) Reply to Fischer et al. Curr. Biol. 25, R1030–R1031
44. Wich, S.A. et al. (2012) Call cultures in orang-utans? PLoS ONE 7, e36180
45. Lameira, A.R. et al. (2013) Orangutan (Pongo spp.) whistling and implications for the emergence of an open-ended call repertoire: a replication and extension. J. Acoust. Soc. Am. 134, 2326–2335
46. Hopkins, W.D. et al. (2007) Chimpanzees differentially produce novel vocalizations to capture the attention of a human. Anim. Behav. 73, 281–286
47. Hayes, C. (1951) The Ape in Our House, Harper
48. Taglialatela, J.P. et al. (2003) Vocal production by a language-competent Pan paniscus. Int. J. Primatol. 24, 1–17
49. Perlman, M. and Clark, N. (2015) Learned vocal and breathing behavior in an enculturated gorilla. Anim. Cogn. 18, 1165–1179
50. Lameira, A.R. et al. (2015) Speech-like rhythm in a voiced and voiceless orangutan call. PLoS ONE 10, e116136
51. Wilson, M.L. et al. (2007) Chimpanzees (Pan troglodytes) modify grouping and vocal behaviour in response to location-specific risk. Behaviour 144, 1621–1653
52. Cheney, D.L. and Seyfarth, R.M. (1985) Vervet monkey alarm calls: manipulation through shared information? Behaviour 94, 150–166
53. Laporte, M.N.C. and Zuberbühler, K. (2010) Vocal greeting behaviour in wild chimpanzee females. Anim. Behav. 80, 467–473
54. Wilson, M.L. et al. (2001) Does participation in intergroup conflict depend on numerical assessment, range location, or rank for wild chimpanzees? Anim. Behav. 61, 1203–1216
55. Owren, M.J. et al. (2011) Two organizing principles of vocal production: implications for nonhuman and human primates. Am. J. Primatol. 73, 530–544
56. Crockford, C. et al. (2012) Wild chimpanzees inform ignorant group members of danger. Curr. Biol. 22, 142–146
57. Schel, A.M. et al. (2013) Chimpanzee alarm call production meets key criteria for intentionality. PLoS ONE 8, e76674
58. Schel, A.M. et al. (2013) Chimpanzee food calls are directed at specific individuals. Anim. Behav. 86, 955–965
59. Gruber, T. and Zuberbühler, K. (2013) Vocal recruitment for joint travel in wild chimpanzees. PLoS ONE 8, e76073
60. Lameira, A.R. et al. (2013) Population specific use of the same tool-assisted alarm call between two wild orangutan populations (Pongo pygmaeus wurmbii) indicates functional arbitrariness. PLoS ONE 8, e69749
61. Clay, Z. et al. (2015) Functional flexibility in wild bonobo vocal behaviour. PeerJ 3, e1124
62. Genty, E. et al. (2014) Multi-modal use of a socially directed call in bonobos. PLoS ONE 9, e84738
63. Ackermann, H. et al. (2014) Brain mechanisms of acoustic communication in humans and nonhuman primates: an evolutionary perspective. Behav. Brain Sci. 37, 529–546
64. Simonyan, K. and Horwitz, B. (2011) Laryngeal motor cortex and control of speech in humans. Neuroscientist 17, 197–208
65. Tourville, J.A. and Guenther, F.H. (2011) The DIVA model: a neural theory of speech acquisition and production. Lang. Cogn. Process. 26, 952–981
66. Ludlow, C.L. (2015) Central nervous system control of voice and swallowing. J. Clin. Neurophysiol. 32, 294–303
67. Belyk, M. et al. (2015) The neural basis of vocal pitch imitation in humans. J. Cogn. Neurosci. Published online December 22, 2015. http://dx.doi.org/10.1162/jocn_a_00914
68. Wattendorf, E. et al. (2013) Exploration of the neural correlates of ticklish laughter by functional magnetic resonance imaging. Cereb. Cortex 23, 1280–1289
69. Belyk, M. and Brown, S. (2015) Pitch underlies activation of the vocal system during affective vocalization. Soc. Cogn. Affect. Neurosci. Published online June 15, 2015. http://dx.doi.org/10.1093/scan/nsv074
70. Van Lancker, D. and Cummings, J.L. (1999) Expletives: neurolinguistic and neurobehavioral perspectives on swearing. Brain Res. Rev. 31, 83–104
71. Golestani, N. and Pallier, C. (2007) Anatomical correlates of foreign speech sound production. Cereb. Cortex 17, 929–934
72. Reiterer, S.M. et al. (2011) Individual differences in audio-vocal speech imitation aptitude in late bilinguals: functional neuro-imaging and brain morphology. Front. Psychol. 2, 271
73. Reiterer, S.M. et al. (2013) Are you a good mimic? Neuro-acoustic signatures for speech imitation ability. Front. Psychol. 4, 782
74. Simmonds, A.J. et al. (2014) The response of the anterior striatum during adult human vocal learning. J. Neurophysiol. 112, 792–801
75. McGettigan, C. et al. (2013) T’ain’t what you say, it's the way that you say it – left insula and inferior frontal cortex work in interaction with superior temporal regions to control the performance of vocal impersonations. J. Cogn. Neurosci. 25, 1875–1886
76. Aziz-Zadeh, L. et al. (2010) Common premotor regions for the perception and production of prosody and correlations with empathy and prosodic ability. PLoS ONE 5, e8759
77. Zarate, J.M. (2013) The neural control of singing. Front. Hum. Neurosci. 7, 237
78. Ackermann, H. and Riecker, A. (2010) The contribution(s) of the insula to speech production: a review of the clinical and functional imaging literature. Brain Struct. Funct. 214, 419–433
79. Kleber, B. et al. (2010) The brain of opera singers: experience-dependent changes in functional activation. Cereb. Cortex 20, 1144–1152
80. Kleber, B. et al. (2013) Experience-dependent modulation of feedback integration during singing: role of the right anterior insula. J. Neurosci. 33, 6070–6080
81. Frühholz, S. et al. (2015) Talking in fury: the cortico-subcortical network underlying angry vocalizations. Cereb. Cortex 25, 2752–2762
82. Barrett, J. et al. (2004) The role of the anterior cingulate cortex in pitch variation during sad affect. Eur. J. Neurosci. 19, 458–464
83. Hinton, L. et al. (2006) Sound Symbolism, Cambridge University Press
84. McGettigan, C. (2015) The social life of voices: studying the neural bases for the expression and perception of the self and others during spoken communication. Front. Hum. Neurosci. 9, 129
85. Banse, R. and Scherer, K.R. (1996) Acoustic profiles in vocal emotion expression. J. Pers. Soc. Psychol. 70, 614
86. Lingala, S.J. et al. (2015) High spatio-temporal resolution multi-slice real time MRI of speech using golden angle spiral imaging with constrained reconstruction, parallel imaging, and a novel upper airway coil. In Proceedings of the 23rd Conference of the International Society for Magnetic Resonance in Medicine, p. 789, ISMRM
87. Kriegeskorte, N. et al. (2008) Representational similarity analysis – connecting the branches of systems neuroscience. Front. Syst. Neurosci. 2, 4
88. Chiba, T. and Kajiyama, M. (1958) The Vowel: Its Nature and Structure, Phonetic Society of Japan
89. Fant, G. (1960) Acoustic Theory of Speech Production, Mouton
90. Titze, I.R. (1994) Principles of Vocal Production, Prentice Hall
91. Reby, D. and McComb, K. (2003) Anatomical constraints generate honesty: acoustic cues to age and weight in the roars of red deer stags. Anim. Behav. 65, 519–530
92. Pisanski, K. et al. (2014) Vocal indicators of body size in men and women: a meta-analysis. Anim. Behav. 95, 89–99
93. Pisanski, K. et al. (2016) Voice parameters predict sex-specific body morphology in men and women. Anim. Behav. 112, 13–22
94. Lieberman, P.H. et al. (1969) Vocal tract limitations on the vowel repertoires of rhesus monkey and other nonhuman primates. Science 164, 1185–1187
95. Boersma, P. and Weenink, D. (2016) Praat: Doing Phonetics by Computer. (http://www.praat.org)
96. Fitch, W.T. and Reby, D. (2001) The descended larynx is not uniquely human. Proc. Biol. Sci. 268, 1669–1675
97. Jürgens, U. (2009) The neural control of vocalization in mammals: a review. J. Voice 23, 1–10
98. Jürgens, U. (2002) Neural pathways underlying vocal control. Neurosci. Biobehav. Rev. 26, 235–258
99. Wild, B. et al. (2003) Neural correlates of laughter and humour. Brain 126, 2121–2138
100. Penfield, W. and Rasmussen, T. (1950) The Cerebral Cortex of Man: A Clinical Study of Localization of Function, Macmillan
101. Pulvermüller, F. et al. (2006) Motor cortex maps articulatory features of speech sounds. Proc. Natl. Acad. Sci. U.S.A. 103, 7865–7870
102. Bouchard, K.E. et al. (2013) Functional organization of human sensorimotor cortex for speech articulation. Nature 495, 327–332
103. Brown, S. et al. (2008) A larynx area in the human motor cortex. Cereb. Cortex 18, 837–845
104. Loucks, T.M.J. et al. (2007) Human brain activation during phonation and exhalation: common volitional control for two upper airway functions. Neuroimage 36, 131–143
105. Levelt, W.J.M. et al. (1999) A theory of lexical access in speech production. Behav. Brain Sci. 22, 1–38
106. Golfinopoulos, E. et al. (2011) fMRI investigation of unexpected somatosensory feedback perturbation during speech. Neuroimage 55, 1324–1338
107. Tourville, J.A. et al. (2008) Neural mechanisms underlying auditory feedback control of speech. Neuroimage 39, 1429–1443
108. Scott, S.K. et al. (2014) The social life of laughter. Trends Cogn. Sci. 18, 618–620
109. Bryant, G.A. and Aktipis, C.A. (2014) The animal nature of spontaneous human laughter. Evol. Hum. Behav. 35, 327–335
110. Lavan, N. et al. (2015) Laugh like you mean it: authenticity modulates acoustic, physiological and perceptual properties of laughter. J. Nonverbal Behav. Published online December 19, 2015. http://dx.doi.org/10.1007/s10919-015-0222-8
111. Makagon, M.M. et al. (2008) An acoustic analysis of laughter produced by congenitally deaf and normally hearing college students. J. Acoust. Soc. Am. 124, 472–483
112. Davila Ross, M. et al. (2009) Reconstructing the evolution of laughter in great apes and humans. Curr. Biol. 19, 1106–1111
113. Bachorowski, J-A. and Owren, M.J. (2001) Not all laughs are alike: voiced but not unvoiced laughter readily elicits positive affect. Psychol. Sci. 12, 252–257
114. Vettin, J. and Todt, D. (2005) Human laughter, social play, and play vocalizations of non-human primates: an evolutionary approach. Behaviour 142, 217–240
115. Davila-Ross, M. et al. (2011) Aping expressions? Chimpanzees produce distinct laugh types when responding to laughter of others. Emotion 11, 1013–1020
116. Ruch, W. and Ekman, P. (2001) The expressive pattern of laughter. In Emotion, Qualia, and Consciousness (Kaszniak, A.W., ed.), pp. 426–443, World Scientific
117. Coudé, G. et al. (2011) Neurons controlling voluntary vocalization in the macaque ventral premotor cortex. PLoS ONE 6, e26822
118. Fukushima, M. et al. (2014) Modeling vocalization with ECoG cortical activity recorded during vocal production in the macaque monkey. Proc. Int. Conf. Eng. Med. Biol. Soc. 2014, 6794–6797
119. Hage, S.R. and Nieder, A. (2013) Single neurons in monkey prefrontal cortex encode volitional initiation of vocalizations. Nat. Commun. 4, 2409
120. Hage, S.R. and Nieder, A. (2015) Audio-vocal interaction in single neurons of the monkey ventrolateral prefrontal cortex. J. Neurosci. 35, 7030–7040
121. Miller, C.T. et al. (2015) Responses of primate frontal cortex neurons during natural vocal communication. J. Neurophysiol. 114, 1158–1171
122. Ploog, D.W. (1992) The evolution of vocal communication. In Nonverbal Vocal Communication: Comparative and Developmental Approaches (Papoušek, H. et al., eds), pp. 6–30, Maison des Sciences de l’Homme
123. Simonyan, K. (2014) The laryngeal motor cortex: its organization and connectivity. Curr. Opin. Neurobiol. 28, 15–21
124. Sapir, E. (1929) A study in phonetic symbolism. J. Exp. Psychol. 12, 225