The level of the 'singing formant' and the source spectra of professional bass singers · STL-QPSR...
Dept. for Speech, Music and Hearing
Quarterly Progress and Status Report
The level of the 'singing formant' and the source
spectra of professional bass singers
Sundberg, J.
journal: STL-QPSR, volume: 11, number: 4, year: 1970, pages: 021-039
http://www.speech.kth.se/qpsr
STL-QPSR 4/1970
III. MUSICAL ACOUSTICS
A. THE LEVEL OF THE "SINGING FORMANT" AND THE SOURCE SPECTRA OF PROFESSIONAL BASS SINGERS
J. Sundberg
1. Introduction
The "singing formant" is an abnormally high spectrum envelope peak
located in the frequency region near 3 kHz. It is frequently found in artistic
singing in Western culture, and has been observed by several authors (1-5).
On the other hand, the relative amplitude of this spectral peak has not as yet
been systematically investigated.
In a previous investigation by the present author it was shown that the
"singing formant" can be interpreted as a formant cluster consisting of the
third, fourth, and fifth formants (6). In this way the presence of the spectrum
envelope peak is explicable in terms of traditional acoustic theory of speech
production. If this theory is in fact valid in singing, all level variations of
the "singing formant" should be predictable from the source spectrum fall
and the formant frequencies. The main purpose of the present investigation
is to find out whether this is the case or not. An additional purpose is to
contribute to the picture showing how professional singers of differing voice
qualities produce sung and spoken vowels, especially as regards the level
of the "singing formant" and the source spectrum.
Initially, we shall determine how the relative and absolute levels of the
"singing formant" vary when pitch and degree of loudness change (Sec. 2).
Secondly, source spectra will be analyzed using an analysis-by-synthesis
procedure (Sec. 3). The applicability of this method in the case of sung
vowels is dealt with in a special section (Sec. 3.1). By means of experiments
with an acoustical model of the vocal tract it is shown how transfer functions
characterized by the presence of a "singing formant" can be generated.
Moreover, it is proved that such transfer functions can be accurately
simulated on a terminal analogue. Thirdly, the source spectra obtained from
sung vowels will be compared with source spectra employed in normal and
loud speech (Sec. 4). Finally, the results will be discussed (Sec. 5) and
summarized (Sec. 6).
2. Sung vowel spectra
2.1 Material
Two professional bass singers, identical with subjects B1 and B4 in the
previous paper, served as subjects (6). B1 has a very dark voice ranging
from C2 to F4, and B4 has a light voice ranging from F2 to G4.
Both voices were recorded in an anechoic room with a calibrated Brüel &
Kjær 1" microphone about 17 cm in front of the mouth. The tape recorder
was a two-channel Ampex. On one channel a direct recording was made and
on the other an FM recording.
The two subjects sang three sustained vowels ([u], [i], [a]) in four
degrees of loudness (piano, /p/, mezzopiano, /mp/, mezzoforte, /mf/,
forte, /f/) and in four pitches approximately equally spaced within their pitch
ranges. The vowels were sustained for at least two seconds.
2.2 Analysis
A tape loop was made of each vowel sound and a spectrum analysis was
performed by a Rohde & Schwarz FNA spectrograph. Two bandwidths were
used: 200 Hz and 10 Hz. The wide-band analysis shows the relative levels
of the formants. The narrow-band analysis gives information about the level
relations between neighboring partials. Moreover, it depicts the modulation
amplitude of the fundamental due to the vibrato, since the n:th partial does
not show up as a line but as a wide band of maximum intensity, the width of
which equals 2n times the amplitude of the frequency modulation of the
fundamental.
The overall sound pressure levels were measured by readings of the
tape output RMS voltage. The level of the first formant was evaluated as the
output voltage from an 80 dB/octave sloping LP filter adjusted to .5 kHz for
[u] and [i] and to .75 kHz for [a].
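The relation between vibrato and the width of a partial's trace in the narrow-band analysis can be made concrete with a small calculation. The sketch below assumes a frequency-modulation amplitude of 4 Hz for the fundamental; this value is chosen only for illustration and is not taken from the recordings, and the function name is hypothetical.

```python
def partial_band_width_hz(n, f0_modulation_amplitude_hz):
    """Width of the band traced by the n-th partial under vibrato.

    The n-th partial swings n times as far as the fundamental in each
    direction, so the full width is 2 * n * (modulation amplitude).
    """
    return 2 * n * f0_modulation_amplitude_hz


# Assumed vibrato: the fundamental deviates by +/- 4 Hz (illustrative value).
ASSUMED_F0_MODULATION_HZ = 4.0

for n in (1, 5, 10, 20):
    width = partial_band_width_hz(n, ASSUMED_F0_MODULATION_HZ)
    print(f"partial {n:2d}: band width {width:.0f} Hz")
```

Read the other way round, dividing the measured band width of the n:th partial by 2n gives the modulation amplitude of the fundamental.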
2.3 Results
2.3.1 Overall levels
The overall sound pressure level of a vowel sound is mainly determined
by the amplitude of the first formant since the amplitudes of the higher
formants are generally much weaker.
The overall amplitude depends on several factors (7). One is the
fundamental frequency F0: an octave rise in F0 will cause the amplitude to rise
3 dB, everything else kept constant. Another factor is the combination of
formant frequencies. According to theory the sound pressure level of an [a]
will be about 6 dB higher than those of an [u] and an [i], everything else kept
constant. A third factor is the amplitude of the exciting source, depending
itself upon the combination of the muscular conditions in the larynx and the
subglottic pressure.
The RMS sound pressure levels re .0002 dyne/cm² (SPL) of all vowels
are given as a function of pitch in Fig. III-A-1. Each point in the graph
represents the value corresponding to one degree of loudness. For each vowel
of a given pitch the lowest levels refer to the weakest degree of vocal
intensity and the highest to the strongest, i.e. the overall level rises
monotonically as loudness is increased, as might be expected.
Fig. III-A-1 shows that the SPL is pitch-dependent also in singing: the
higher the pitch, the higher the SPL. However, the rate of increase is as
rapid as about 9 dB/octave, which is substantially more than predicted by
theory. The discrepancy, 6 dB/octave, is much too large to be explained by
formant frequency changes (see Sec. 3 below). This indicates that the singers
alter the source conditions when going from a low note to a higher. This
assumption is supported by findings made by other investigators (8).
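As a worked check of the figures quoted above, the following sketch compares the 3 dB/octave rise predicted by theory with the roughly 9 dB/octave observed. The function name and the choice of a one-octave step from 130 Hz to 260 Hz are illustrative assumptions.

```python
import math

THEORY_DB_PER_OCTAVE = 3.0    # rise predicted with source and formants held constant
OBSERVED_DB_PER_OCTAVE = 9.0  # approximate rate observed in the sung vowels


def level_rise_db(f0_low_hz, f0_high_hz, db_per_octave):
    """Level rise in dB between two fundamentals for a given rate per octave."""
    octaves = math.log2(f0_high_hz / f0_low_hz)
    return db_per_octave * octaves


# One octave step, e.g. roughly C3 to C4 (illustrative choice).
predicted = level_rise_db(130.0, 260.0, THEORY_DB_PER_OCTAVE)
observed = level_rise_db(130.0, 260.0, OBSERVED_DB_PER_OCTAVE)
unexplained = observed - predicted

print(f"predicted {predicted:.1f} dB, observed {observed:.1f} dB, "
      f"left for the source to explain {unexplained:.1f} dB")
```

The remainder of 6 dB per octave is the part that the text attributes to altered source conditions.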
At a given pitch the dynamic range between piano and forte is on the
average 10 dB in the overall SPL.
Comparing the SPL of the vowels we see that the [a] is sung with higher
values than the [u] and [i]. However, this does not hold for the lowest note.
Moreover we observe that the [u] tends to be weaker in SPL than the [i].
These findings seem to indicate that the singers use the same source strength
for the different vowels on the same pitch, the lowest pitch possibly excepted.
2.3.2 Relative level of the "singing formant"
Little is known about the level of the "singing formant". However, it can
be expected to depend on two factors. One is the density of the third, fourth,
and fifth formants, i.e. how close they are in frequency. The other factor is
the amount of sound energy generated by the voice source in this frequency
region. In soft speech the source spectrum envelope is known to fall off more
rapidly than in loud speech (9, 10). Consequently, the level difference between
the first formant and the "singing formant" might be expected to vary with
vocal intensity.
This level difference, hereafter referred to as the relative level of the
"singing formant", was measured from each wide-band spectrogram. The
values are given in Fig. III-A-2a and b. With very few exceptions the
relative level of the "singing formant" was observed to rise with vocal
intensity, so that the lowest values for each pitch and vowel refer to piano
and the highest to forte.
The range of variation within each pitch and vowel is considerable,
especially for the light voice B4. To a first approximation we can say that
the relative level of the "singing formant" is independent of pitch. This
seems rather likely in view of the fact that singing pedagogy aims at
eliminating audible transitions between registers.
However, a detailed study of Fig. III-A-2a and b reveals weak trends.
In the case of the dark voice B1 the relative level of the "singing formant"
tends to rise slightly with pitch. The trend of the light voice is the opposite,
at least in [i].
The great spread of the values in Fig. III-A-2a and b is to a large extent
dependent on the considerable influence of the degree of loudness. This
influence is demonstrated by Fig. III-A-3a and b, showing the mean relative
level of the "singing formant" as a function of the degree of loudness. In this
figure, each point represents the mean of 12 values (3 vowels, 4 pitches).
The two graphs have some features in common. As mentioned above the
relative level of the "singing formant" is raised as the degree of vocal
intensity becomes stronger. The rate of increase can be regarded as
logarithmic (11). Thus, as far as the relative level of the "singing formant"
is concerned, the difference between the degrees of loudness gradually
decreases as loudness grows stronger.
Comparing the two graphs we find several differences. The relative level
of the "singing formant" is on the average higher for the light voice than for
the dark one, i.e. B1 has a weaker "singing formant". The growth of the
relative level of the "singing formant" in going from piano to forte is much
larger for the light voice. Thus, on the average, B1 raises the relative level
by 12 dB (from -28 dB to -16 dB) whereas B4 raises it by 18 dB (from -24 dB
in piano to -6 dB in forte).
Fig. III-A-2. Level differences between the "singing formant" and the first formant in spectra of vowels (given in IPA symbols) sung in four degrees of vocal intensity (p, mp, mf, f), plotted against pitch (C2 to G4). a: dark voice; b: light voice.
Fig. III-A-4. Mean sound pressure levels re .0002 dyne/cm² measured in front of the mouth in 3 vowels ([u], [a], [i]) sung in 4 degrees of vocal intensity (p, mp, mf, f) and 4 pitches. The curves show the sound pressure level of the overall spectrum (solid lines), the first formant (dashed lines), and the "singing formant" (chain-dashed lines). The values are given as a function of pitch (a and b) and degree of vocal intensity (c and d). a and c show values of a dark voice, b and d give those of a light voice.
TABLE III-A-1. Formant frequency values used in spectrum matching

Vowel [u]
Subject  Pitch             Level  F1 (Hz)  F2 (Hz)  F3 (kHz)  F4 (kHz)  F5 (kHz)
B1       C3 (F0 ≈ 130 Hz)  p      287      650      2.31      2.69      2.94
                           mp     287      686      2.53      2.69      2.94
                           mf     290      722      2.55      2.63      2.99
                           f      300      686      2.59      2.66      2.97
B1       C4 (F0 ≈ 260 Hz)  p      356      738      2.30      2.60      2.97
                           mp     356      738      2.39      2.67      2.93
                           mf     356      738      2.63      2.87      3.01
                           f      356      770      2.48      2.69      2.82
B4       D3 (F0 ≈ 147 Hz)  mp     374      702      2.56      2.73      3.44
                           mf     390      692      2.66      2.78      3.48
                           f      372      724      2.64      2.72      3.44
B4       H3 (F0 ≈ 247 Hz)  mp     355      690      2.55      2.80      3.28
                           mf     355      699      2.54      2.83      3.23
                           f      379      703      2.74      2.89      3.22
B4       F4                mp     354      846      2.55      2.79      3.36
                           mf     354      846      2.66      2.83      3.41
                           f      354      846      2.66      2.90      3.41

Vowel [a]
Subject  Pitch             Level  F1 (Hz)  F2 (Hz)  F3 (kHz)  F4 (kHz)  F5 (kHz)
B1       C3 (F0 ≈ 130 Hz)  p      465      860      2.55      2.73      3.61
                           mp     467      854      2.59      2.70      3.65
                           mf     485      875      2.57      2.73      3.65
                           f      538      910      2.60      2.80      3.50
B1       C4 (F0 ≈ 260 Hz)  p      525      932      2.57      2.80      3.61
                           mp     552      957      2.60      2.81      3.50
                           mf     550      940      2.48      2.73      3.10
                           f      576      1000     2.60      2.79      3.16

Vowel [i]
Subject  Pitch             Level  F1 (Hz)  F2 (kHz)  F3 (kHz)  F4 (kHz)  F5 (kHz)
B1       C3 (F0 ≈ 130 Hz)  p      295      1.71      2.10      2.83      2.97
                           mp     295      1.71      2.21      2.92      3.05
                           mf     270      1.69      2.29      2.86      2.98
                           f      300      1.66      2.22      2.83      2.97
B1       C4 (F0 ≈ 260 Hz)  p      300      1.64      2.25      2.87      3.05
                           mp     300      1.64      2.29      2.87      3.05
                           mf     300      1.66      2.34      2.84      3.07
                           f      300      1.62      2.27      2.84      3.00
B4       D3 (F0 ≈ 147 Hz)  p      226      2.19      2.85      3.09      3.39
                           mp     229      2.05      2.74      3.09      3.47
                           mf     218      2.05      2.65      3.10      3.54
                           f      229      1.90      2.67      2.79      3.44
B4       H3 (F0 ≈ 247 Hz)  p      307      1.86      2.64      3.04      3.52
                           mp     310      1.87      2.57      3.03      3.39
                           mf     299      1.89      2.58      2.94      3.39
                           f      262      1.83      2.56      2.89      3.39
B4       F4                p      315      1.87      2.57      3.14      3.50
                           mp     315      1.87      2.47      3.14      3.40
                           mf     315      1.87      2.47      3.14      3.45
                           f      315      1.87      2.47      3.14      3.45
In speech, the source spectrum is known to fall off by approximately
12 dB/octave (9, 13). At weak vocal effort the spectrum slopes more rapidly
owing to the smoother movements of the vocal cords. A variation of the
source spectrum slope, if experimentally established, would provide a
reasonable explanation of the variations of the relative level of the "singing
formant". Therefore, an investigation of the source spectrum slope was
undertaken.
A traditional way of obtaining data on the source spectrum is the
analysis-by-synthesis procedure. This technique involves matching of vowel
spectra with the aid of a formant synthesizer. Transfer functions corresponding
to spoken vowels can be accurately simulated on such a terminal analogue.
However, it is doubtful whether this can be done also in the case of sung
vowels. The question is whether the formant cluster is made up of three
"normal" formants. An affirmative answer does not seem self-evident
since five formants located below 4 or even 3 kHz are quite abnormal in
speech. This question is a crucial one. When an analysis-by-synthesis
method is employed, the obtained source spectrum is entirely dependent on
the correctness of the transfer function simulation. A set of experiments
was undertaken in order to study this problem.
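The dependence of the source estimate on the transfer function can be illustrated numerically. The sketch below is not the terminal analogue used in this study: it assumes a plain all-pole cascade of five formants without higher pole correction, the bandwidth values are invented for the illustration, and all function names are hypothetical. The formant frequencies are of the kind listed in Table III-A-1 (465 Hz, 860 Hz, 2.55, 2.73, and 3.61 kHz). A vowel spectrum is built from an ideal -12 dB/octave source, and the source is then recovered by subtracting the simulated transfer function in dB.

```python
import math


def formant_gain_db(f_hz, formant_hz, bandwidth_hz):
    """Gain in dB of one conjugate pole pair, normalized to 0 dB at 0 Hz."""
    half_bw = bandwidth_hz / 2.0
    numerator = formant_hz ** 2 + half_bw ** 2
    denominator = math.sqrt(
        ((f_hz - formant_hz) ** 2 + half_bw ** 2)
        * ((f_hz + formant_hz) ** 2 + half_bw ** 2)
    )
    return 20.0 * math.log10(numerator / denominator)


def transfer_db(f_hz, formants_hz, bandwidths_hz):
    """All-pole transfer function in dB: the sum of the formant gains."""
    return sum(formant_gain_db(f_hz, fm, bw)
               for fm, bw in zip(formants_hz, bandwidths_hz))


def source_estimate_db(partials, formants_hz, bandwidths_hz):
    """Subtract the simulated transfer function from each (freq, level) pair."""
    return [(f, level - transfer_db(f, formants_hz, bandwidths_hz))
            for f, level in partials]


FORMANTS_HZ = [465.0, 860.0, 2550.0, 2730.0, 3610.0]
BANDWIDTHS_HZ = [60.0, 80.0, 100.0, 120.0, 150.0]  # assumed, not measured
F0_HZ = 130.0


def ideal_source_db(f_hz):
    """Reference source falling 12 dB per octave, 0 dB at the fundamental."""
    return -12.0 * math.log2(f_hz / F0_HZ)


# Synthetic "sung vowel": ideal source shaped by the assumed transfer function.
vowel = [(n * F0_HZ,
          ideal_source_db(n * F0_HZ)
          + transfer_db(n * F0_HZ, FORMANTS_HZ, BANDWIDTHS_HZ))
         for n in range(1, 25)]

recovered = source_estimate_db(vowel, FORMANTS_HZ, BANDWIDTHS_HZ)
worst = max(abs(level - ideal_source_db(f)) for f, level in recovered)
print(f"largest deviation from the -12 dB/octave source: {worst:.2e} dB")
```

With the correct formant values the source is recovered exactly; shifting one of the entries in `FORMANTS_HZ` before calling `source_estimate_db` shows how an error in the simulated transfer function reappears as a spurious feature of the estimated source spectrum.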
3.1 The methodological problem
The "singing formant" exhibits small intrasubject variations as regards
its frequency location. This supports the hypothesis that those parts of the
vocal tract that are least involved in articulatory movements play an
important role in the generation of the "singing formant". The sinus
Morgagni and the sinus piriformis are situated in such regions. Moreover,
they are known to be larger in singing than in speech (6, 14, 15). A widening
of the sinus Morgagni has been shown to lower the fourth formant frequency
considerably (9, 6). The influence on the transfer function of the sinus
piriformis has not been examined in great detail, and there is no reliable
theory available. Therefore, it was necessary to study its influence empirically.
This was done in the following way. An acoustical model of the vocal
tract was constructed. The design of the model is illustrated in Fig. III-A-5.
The larynx tube was simulated by a cylindrical tube of 2.5 cm length and
1.2 cm diameter. By means of clay this tube was attached inside a larger
tube of 19 cm length and 3.1 cm diameter. The length axes of the two tubes
formed an angle of 17°. At the bottom of the smaller tube a constant volume
velocity sound source, the STL Ionophone, was adapted, allowing accurate
and direct measurements of the transfer function (16). A variation of the
sinus piriformis volume was simulated by filling the bottom portion of the
larger tube with water. In this way a model was obtained that simulated the
acoustical situation at the glottal end of the vocal tract in a rather realistic
manner.
Fig. III-A-5. Sections through the acoustical model of the vocal tract described in the text.
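For orientation, the resonances of the two tubes taken in isolation can be estimated with the textbook quarter-wave formula for a uniform tube closed at one end and open at the other. This is only a rough sketch and not part of the measurements reported here: it ignores the coupling between the tubes, end corrections, and the side cavity, and the sound velocity of 35 000 cm/s is an assumed value. The function name is hypothetical.

```python
SPEED_OF_SOUND_CM_PER_S = 35000.0  # assumed value; depends on temperature and humidity


def quarter_wave_resonances_hz(length_cm, count):
    """Resonances of a uniform tube closed at one end and open at the other.

    f_k = (2k - 1) * c / (4 * L) for k = 1, 2, ..., count.
    """
    return [(2 * k - 1) * SPEED_OF_SOUND_CM_PER_S / (4.0 * length_cm)
            for k in range(1, count + 1)]


# Tube lengths as given for the model: 19 cm main tube, 2.5 cm larynx tube.
main_tube = quarter_wave_resonances_hz(19.0, 4)
larynx_tube = quarter_wave_resonances_hz(2.5, 1)

print("main tube:  ", [round(f) for f in main_tube], "Hz")
print("larynx tube:", [round(f) for f in larynx_tube], "Hz")
```

Under these assumptions the isolated larynx tube has its first resonance in the neighborhood of 3 kHz, the region in which the formant cluster is observed; the actual formant frequencies of the combined model are those measured with the Ionophone.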
Two typical transfer functions obtained from the model are given in
Fig. III-A-6a and b. The solid curves refer to the model with a simulated
sinus piriformis and the dashed curves to conditions with the "sinus
piriformis" filled with water. Fig. III-A-6a shows transfer functions obtained
when the larger tube was unperturbed and Fig. III-A-6b shows transfer
functions from the model when the larger tube was perturbed to give an area
function similar to that of an [i].
The volume of the simulated sinus piriformis is seen to play a very
important role with regard to the formant situation around 3 kHz. Its effect is
to lower all formant frequencies but especially the fifth one. This confirms
the findings of Fant obtained from electrical line analogue studies (17). On
the other hand it contradicts the recent statements of Mermelstein based on
theoretical considerations (18).
Experiments showed that a zero in the transfer function occurs close to
the frequency of the fifth formant. It may appear below this frequency if
the sinus piriformis is very large. When the sinus piriformis has a size
suitable for giving a cluster of the fourth and fifth formants the zero appeared
as a rule around 4.5 kHz.
It is remarkable that the amplitudes of the fourth and fifth formants do
not rise more when the frequency distance between them is reduced to such
a considerable extent. There seem to be two reasons for this. One is that
the second and third formant frequencies are lowered. The other is the
zero following the fifth formant. In matching the transfer functions obtained
from the model it was found that the amplitude reducing effect of this zero
can be simulated by adjusting the frequency of the higher pole correction to
about .5 kHz above the frequency of the fifth formant.
Fig. III-A-7a and b compares two transfer functions obtained from the
model (solid curves) with the corresponding matches obtained from the
terminal analogue (dashed curves). The difference between the solid and
dashed curves lies within about ±2 dB up to 3.6 kHz. The very narrow
bandwidth occurring in some formants cannot be matched by the analogue.
This lacks importance since bandwidths of such small magnitudes do not
appear in the transfer functions of human vocal tracts (19).
Fig. III-A-6. Transfer functions obtained by ionophone excitation of the acoustical model of the vocal tract shown in Fig. III-A-5. Solid contours were obtained when the "sinus piriformis" was partly filled with water, dashed contours when it was completely filled with water. Upper graph (a) pertains to a cylindric tube, lower graph (b) to a tube perturbed so as to give an area function similar to that of an [i]. Axes: frequency (0 to 4 kHz), level (dB).
Fig. III-A-7. Two transfer functions obtained from a terminal analogue (dashed curves) and by ionophone excitation of the acoustical model of the vocal tract shown in Fig. III-A-5 (solid curves). Axes: frequency (0 to 4 kHz), level (0 to -80 dB).
The question now arises whether the transfer functions obtained from the
model are equivalent to those produced in sung vowels by the singers. A
zero is frequently found in the spectra of sung vowels. As a rule, this zero
is located near 3 kHz. We may assume that this zero arises from the sinus
piriformis cavity. The experience from the model experiments supports the
assumption that the singer is able to manipulate the frequency of this zero
by adjusting the size of his sinus piriformis. Especially in the case of the
back vowels the singers tend to compensate the lowering of the third formant
frequency in the articulation (6). In this way they will obtain a higher level of
the "singing formant" than was produced by the model. Thus it seems
reasonably safe to conclude that the singers generate the transfer functions of
sung vowels in approximately the same way as our model. Consequently,
we may conclude that the transfer functions of sung vowels can be accurately
simulated on a terminal analogue provided that the zero above the fifth
formant frequency is taken into account in adjusting the higher pole correction.
The experiments with the acoustical model support the following
conclusions:
(1) The sinus piriformis lowers all formant frequencies more or less.
(2) The sinus piriformis lowers the fifth formant frequency drastically and adds a zero to it. Its size seems to determine the frequency location of this zero.
(3) The sinus piriformis in combination with the sinus Morgagni appears to play a crucial role in the generation of transfer functions including a "singing formant" cluster.
(4) The frequency of the zero predicted by the model can be taken into account by adjusting the frequency of the higher pole correction.
(5) It can thus be concluded that transfer functions exhibiting a "singing formant" cluster can be accurately simulated on a terminal analogue up to about 3.6 kHz.
3.2 Analysis
3.2.1 Material
The spectrograms of the sung vowels revealed that both subjects'
low-pitched notes (F0 < 100 Hz) and the [a] of the light voice were nasalized.
Extra spectrum envelope peaks near .2 kHz and between 1 and 2 kHz, and
abnormally wide bandwidths were taken as evidence for this. Failures to
obtain normal glottograms from inverse filtering of these vowels supported
the same conclusion. Mainly owing to lack of detailed knowledge about the
transfer functions of nasalized vowels, these vowels were excluded from
the material.
The piano series for [u] sung by B4 was pronounced with a marked
leakage in the glottis, especially at low pitches. This feature can scarcely
be regarded as a characteristic of artistic singing but rather as typical of
an occasional vocal indisposition. Consequently his piano series for [u]
was excluded from the material. The remaining sets of vowels whose source
spectra were analyzed are listed in Table III-A-2.
Table III-A-2. Matched vowels

                    Degree of loudness
Voice  Pitch  piano      mezzopiano  mezzoforte  forte
B1     C3     [u][a][i]  [u][a][i]   [u][a][i]   [u][a][i]
       C4     [u][a][i]  [u][a][i]   [u][a][i]   [u][a][i]
3.2.2 Procedure
The sung vowel spectra were matched on the terminal analogue
mentioned above. The synthetic source consisted of a pulse train with a
spectrum fall of 12 dB per octave. The fundamental of this pulse train was
modulated with a sine wave of variable frequency and amplitude so as to give a
simulation of the vibrato. The synthesized vowels were analyzed in the
same way as the sung vowels. The formant frequency values employed in
the matchings are listed in Table III-A-1 above.
In matching the vowels the higher pole correction was adjusted to 3.5
kHz in all vowels in accordance with the findings reported in Sec. 3.1.
The differences between the spectra of the sung and synthesized vowels
were evaluated at every harmonic up to . S kHz. Above this frequency the
differences were measured at every .2 kHz interval in the wide-band
spectrograms.
Fig. III-A-8. Mean source spectrum envelopes normalized with respect to a slope of -12 dB/octave and so that the summed deviations from this slope equal zero between 1.0 and 3.2 kHz. Each curve shows an average obtained from several vowels sung in different pitches. The parameter is the degree of vocal intensity. a: dark voice; vowels: [u][a][i]; pitches: C3, C4. b: light voice; vowels: [u][i]; pitches: D3, H3, F4. Axes: frequency (0 to 3 kHz), level (dB).
Also in this case we can observe systematical differences between the
curves only below 1 kHz, somewhat more pronounced in the case of B1 than
B4. The low notes tend to show stronger relative amplitudes for the lower
partials than the higher notes.
If the theory of speech production is applied to Fig. III-A-9a and b, an
explanation is provided of the observations made in commenting on Fig.
III-A-2a and b. In the frequency region of interest here, B1 was observed
to raise the relative level of his "singing formant" in going from a lower
pitch to a higher. This can be explained with reference to Fig. III-A-9a in
the same way as in the case of the dependence on the degree of loudness.
In the case of B4 the relative level of the "singing formant" was observed
to decrease with pitch, especially for [i]. On the other hand, the relative
amplitudes of the lower partials in the source spectrum displayed a slight
tendency to decrease when pitch is raised. Thus, Fig. III-A-2b cannot be
explained by Fig. III-A-9b in the same way as in the case of B1.
However, B4 does not keep the formant frequency values independent of
pitch. As can be seen from Table III-A-1 the second, third, and fourth
formant frequencies are considerably lowered as pitch is increased in [i].
This formant lowering reduces the relative level of the "singing formant"
more than the slight changes in the voice source spectrum slope will raise
it. Thus it seems as if the results from Fig. III-A-2b and III-A-9b are not
incompatible.
In the pitch range under consideration the relative amplitudes of the
lower source spectrum partials were found to increase with decreasing
pitch. Let us assume that this is the case also if we extrapolate in the
lower pitch range. Accordingly, the relative level of the "singing formant"
would be very weak for very low notes. Such an effect would be
counteracted if the singer nasalizes these notes, as he was in fact found to do:
nasalization reduces the relative amplitude of the first formant owing to its
bandwidth effect. Moreover, it might be that the extra spectrum
envelope peak near .2 kHz gives the very low notes some of a desired
quality of "warmth".
Fig. III-A-10 illustrates how the two types of voices differ as regards
the mean source spectrum envelopes. Each curve represents the average
of 14 envelopes ([i] in p, mp, mf, f and [u] in mp, mf, and f; both vowels
sung in two pitches, one medium, one high).
Fig. III-A-10. Mean source spectrum envelopes normalized with respect to a slope of -12 dB/octave and so that the summed deviations from this slope equal zero between 1.0 and 3.2 kHz. Each curve gives an average obtained from the vowel [u] sung in mp, mf, f and the vowel [i] sung in p, mp, mf, and f in a medium and high pitch relative to the range of the singer. Solid line: dark voice. Dashed line: light voice.
Once again we can state that there are systematical differences between
the curves below 1 kHz only. The light voice B4 has weaker relative
amplitudes in the lower source spectrum partials than the dark voice, as could
be expected. The acoustic correlate of "dark" voice might possibly be not
only low formant frequency values but also strong relative amplitudes of the
lower source spectrum partials.
This difference between the two voices also explains the difference
between them as regards the relative level of the "singing formant": the
higher relative level of the "singing formant" of B4 is due to a lower relative
amplitude of the lower source spectrum partials (cf. p. 31).
4. Source spectra of spoken vowels
It was demonstrated above that the main factors that give rise to the
"singing formant" are formant frequency shifts and intensity relations
within the source spectrum. As a sequel of this investigation we have
undertaken a study of these factors in speech, in particular the source spectrum.
The material consisted of the vowels [u], [o], [a], [æ], [e], [i], [y],
[ʉ], and [ø]. These vowels were spoken in a carrier phrase (de va rVrVrV
ja sa). The subjects spoke at two different loudness levels, one normal and
one high. The material was recorded in the same way as the sung vowels.
The vowel spectra were analyzed with the 51-channel spectrograph and
matched by means of a computer program (20)*. The computer program,
developed for matching purposes, works with only four formants. This
appeared to be insufficient in the case of the two singers' speech: the lack
of a matching fifth formant appeared in the obtained source spectrum
envelope as a level rise starting around 2.4 kHz. Obviously, this rise does
not belong to the source spectrum and therefore information on the source
spectrum was obtained only up to this frequency limit. In the case of the
spoken vowels the deviations from the 12 dB/octave slope were normalized
so that the summed deviation values became zero between 1 and 2.4 kHz.
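The normalization described here can be sketched in a few lines. The example below is an illustration under stated assumptions, not the program used in the study: the function name is hypothetical, the input is an invented source envelope falling 16 dB/octave with an arbitrary 7 dB offset, and the envelope is sampled every 200 Hz. Deviations from a -12 dB/octave reference are computed first; a constant is then subtracted so that the deviations inside the chosen band sum to zero.

```python
import math


def normalized_deviations_db(freqs_hz, levels_db, band_hz,
                             slope_db_per_octave=-12.0):
    """Deviations from a reference slope, shifted to sum to zero in band_hz."""
    ref_hz = band_hz[0]  # the reference line is anchored at the lower band edge
    deviations = [
        level - slope_db_per_octave * math.log2(f / ref_hz)
        for f, level in zip(freqs_hz, levels_db)
    ]
    in_band = [d for f, d in zip(freqs_hz, deviations)
               if band_hz[0] <= f <= band_hz[1]]
    offset = sum(in_band) / len(in_band)  # removing the mean zeroes the sum
    return [d - offset for d in deviations]


# Invented envelope: 16 dB/octave fall plus an arbitrary 7 dB offset.
freqs_hz = list(range(200, 3400, 200))
levels_db = [-16.0 * math.log2(f / 1000.0) + 7.0 for f in freqs_hz]

spoken = normalized_deviations_db(freqs_hz, levels_db, (1000, 2400))
for f, d in zip(freqs_hz, spoken):
    print(f"{f:5d} Hz  {d:+6.2f} dB")
```

A source that falls faster than 12 dB/octave then shows positive deviations below the band and negative ones above it, while a source falling exactly 12 dB/octave yields zero everywhere. Passing `(1000, 3200)` as the band gives the variant used for the sung vowels.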
The curves in Fig. III-A-11a and b give the averaged deviations from
the 12 dB/octave slope found in the 9 vowels. A comparison between the
curves pertaining to normal and loud speech reveals almost exactly the same
* The author is indebted to his colleagues J. Liljencrants and R. Carlson for their assistance in performing this analysis.
Fig. III-A-11. Mean source spectrum envelopes normalized with respect to a slope of -12 dB/octave and so that the summed deviations from this slope equal zero between 1.0 and 2.4 kHz. The curves refer to nine vowels ([u][o][a][æ][e][i][y][ʉ][ø]) spoken at normal (dashed line) and high (chain-dashed line) vocal intensity. a: dark voice; the fine solid line shows a mean spectrum envelope normalized in the same way and obtained from sung vowels [a][i][u] sung in p, mp, mf, and f in pitches C3 and C4. b: light voice; the fine solid line shows a mean spectrum envelope normalized in the same way and obtained from vowels [u] sung in mp, mf, f and [i] in p, mp, mf, f in pitches D3, H3, and F4.
filtering in doubtful cases, one must admit that the frequency regions in the
source spectrum where formants occur are less reliable than those without.
The fidelity limits of the sound reproducing system and the dependence
on the accuracy of formant frequency estimations both indicate that an
average source spectrum obtained from several vowels must be more reliable
than an average spectrum obtained from one single vowel only. For this
reason we have dealt only with mean source spectra obtained from at least
two vowels in the present investigation.
Let us now compare the observations on the source spectra with other
investigators' observations. Unfortunately, source spectra of professional
singers' sung vowels have not been published earlier. However, some
investigations of spoken vowels have been made (12, 13, 21).
The main result of these investigations is that, on the average, the
source spectrum in normal speech falls off at a rate of about 12 dB/octave.
From the present results we can conclude that the average source spectrum
fall is about 12 dB/octave, thus rather normal.
The source spectrum in spoken vowels has been shown to fall off faster
in quiet than in normal and loud speech. Some long-term spectra published
of continuous speech support this. In our material we have seen
examples of such variations in the source spectrum fall too. For B4 a
mean slope of about 16 dB/octave was found in his piano of [i], and for B1
the value of 7.5 dB/octave was found in loud speech. Thus, the results
presented above appear reasonable also in this respect.
According to Fant the amplitude of the first formant tends to rise more
rapidly than that of the fundamental if the vocal intensity is increased and
the fundamental frequency is kept constant. His hypothetical explanation
of this is that the glottis impedance is mainly inductive at low vocal
effort (the air flow is laminar) and mainly resistive at high voice level (the
air flow is turbulent). These findings are apparently very similar to those
made in this investigation.
In nearly all investigations on the source spectrum characteristics
published earlier a zero near . O kHz has been mentioned. In our source
spectra such a zero is found only occasionally. However, it must be kept in
mind that for the reasons just mentioned our source spectra represent
averages from several vowels. If the zero occurs at different frequencies
zero close to the fifth formant frequency. The frequency of this zero is
strongly dependent on the size of the sinus piriformis. A transfer function
of a vowel embellished by a "singing formant" can be accurately simulated
on a terminal analogue provided that the frequency of the zero is taken into
account in adjusting the higher pole correction.
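The effect of such a zero on a cascade of formant resonances can be illustrated with a minimal sketch. Everything numerical here is an assumption made for the example: the formant frequencies and bandwidths, the zero frequency, and the function names are hypothetical, and the higher pole correction itself is not modelled.

```python
import math


def resonance(f_hz, center_hz, bandwidth_hz):
    """Complex response of one conjugate pole pair, normalized to 1 at 0 Hz."""
    s = complex(0.0, 2.0 * math.pi * f_hz)
    p = complex(-math.pi * bandwidth_hz, 2.0 * math.pi * center_hz)
    return (p * p.conjugate()) / ((s - p) * (s - p.conjugate()))


def level_db(f_hz, poles, zeros=()):
    """Level in dB of a cascade of pole pairs and optional zero pairs."""
    h = complex(1.0, 0.0)
    for center_hz, bandwidth_hz in poles:
        h *= resonance(f_hz, center_hz, bandwidth_hz)
    for center_hz, bandwidth_hz in zeros:
        # A zero pair is the reciprocal of a pole pair at the same position.
        h /= resonance(f_hz, center_hz, bandwidth_hz)
    return 20.0 * math.log10(abs(h))


# Hypothetical (frequency, bandwidth) values in Hz; F3 to F5 are clustered.
POLES = [(500.0, 60.0), (1100.0, 80.0), (2500.0, 100.0),
         (2800.0, 100.0), (3100.0, 120.0)]
ZEROS = [(3300.0, 100.0)]  # assumed zero just above the fifth formant

without_zero = level_db(3300.0, POLES)
with_zero = level_db(3300.0, POLES, ZEROS)
```

In this sketch the zero lowers the level at 3300 Hz by roughly 30 dB relative to the all-pole cascade, which suggests why a simulation that ignores the zero would misrepresent the upper flank of the formant cluster.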
Considerable variations of the relative level of the "singing formant"
occur in singing. These variations can be ascribed to changes in the voice
source spectrum envelope: The relative amplitudes of the lower source
spectrum partials (below 1 kHz) vary with vocal intensity and pitch. As
regards the vocal intensity dependence these source variations give rise to
SPL variations of the "singing formant" that are between two and three times
as large as the variations of the overall spectrum SPL. As regards the pitch
dependence the source variations are less pronounced and occasionally
counteracted by formant frequency changes. Possibly, nasalization of
low-pitched vowels (F0 < 100 Hz) is resorted to with the purpose of guaranteeing
a sufficiently high relative amplitude of the "singing formant".
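The arithmetic behind the intensity dependence stated above can be written out as a short sketch. The function names and the 10 dB example are hypothetical; only the factor of two to three is taken from the text.

```python
def formant_level_change(overall_change_db, ratio_low=2.0, ratio_high=3.0):
    """Range of SPL change of the "singing formant" for a given change
    of the overall spectrum SPL, using the factor of two to three."""
    bounds = (ratio_low * overall_change_db, ratio_high * overall_change_db)
    return (min(bounds), max(bounds))


def relative_level_change(overall_change_db, ratio_low=2.0, ratio_high=3.0):
    """Range of change of the formant level relative to the overall SPL."""
    low, high = formant_level_change(overall_change_db, ratio_low, ratio_high)
    return (low - overall_change_db, high - overall_change_db)


# Hypothetical crescendo raising the overall spectrum SPL by 10 dB.
absolute = formant_level_change(10.0)
relative = relative_level_change(10.0)
```

For the assumed 10 dB increase in overall SPL, the "singing formant" would rise by 20 to 30 dB, so that its level relative to the overall spectrum would increase by 10 to 20 dB.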
The mechanism underlying these alterations needs to be further studied
in future research. It most probably involves glottal impedance, subglottal
pressure, and muscular conditions in the larynx.
There seem to be no considerable differences between the source spectra
employed by the singers in normal speech and in singing. Moreover,
most properties of the source spectrum envelope have been recognized in
earlier investigations dealing with source spectra of untrained voices.
As regards voice quality the only difference between a very dark and a
light voice was found in the relative amplitudes of the lower source spectrum
partials, which were larger in the case of the dark voice.
The intensity variation seems to be accompanied by variation of the
relative level of the "singing formant" rather than of the overall spectrum SPL.
The acoustical correlate of a loudness increase is thus more an increase in
the spectrum balance than in the SPL of the total spectrum.
Acknowledgments
The author is indebted to his colleague B. Lindblom for valuable
suggestions in reading the manuscript. The work was supported by the
Tri-Centennial Fund of the Bank of Sweden grant no. 67/48.
References
(1) Bartholomew, W. T.: "A physical definition of 'good voice quality' in the male voice", J. Acoust. Soc. Am. 6, p. 27 (1934).
(2) Fry, D. B. and Manén, L.: "Basis for the acoustical study of singing", J. Acoust. Soc. Am. 29, pp. 690-692 (1957).
(3) McGinnis, C. S., Elnick, M., and Kraichman, M.: "Study of the vowel formants of well-known operatic singers", J. Acoust. Soc. Am. 23, pp. 440-446 (1951).
(4) Rzhevkin, S. N.: "Certain results of the analysis of a singer's voice", Sov. Phys. Acoust. 2, pp. 215-220 (1956).
(5) Vennard, W.: Singing, the Mechanism and the Technique (Fischer, Inc., New York 1967).
(6) Sundberg, J.: "Formant structure and articulation of spoken and sung vowels", Fol. Phon. 22, p. 28 (1970).
(7) Fant, G., Fintoft, K., Liljencrants, J., Lindblom, B., and Mártony, J.: "Formant-amplitude measurements", J. Acoust. Soc. Am. 35, p. 1753 (1963).
(8) Rubin, H. J., LeCover, M., and Vennard, W.: "Vocal intensity, subglottic pressure and air flow relationships in singers", Fol. Phon. 19, p. 393 (1967).
(9) Fant, G.: Acoustic Theory of Speech Production (Mouton & Co., The Hague 1960).
(10) Lindqvist, J.: "The voice source studied by means of inverse filtering", STL-QPSR 1/1970, p. 3.
(11) Cf. Ladefoged, P.: Three Areas of Experimental Phonetics (Oxford University Press, London 1967), p. 35 sqq.
(12) Flanagan, J. L.: "Some properties of the glottal sound source", J. Speech & Hearing Res. 1, p. 99 (1958).
(13) Mártony, J.: "Studies of the voice source", STL-QPSR 1/1965, p. 4.
(14) Flach, M.: "Über die unterschiedliche Grösse der Morgagnischen Ventrikel bei Sängern", Fol. Phon. 16, p. 67 (1964).
(15) Luchsinger, R. and Arnold, G. E.: Lehrbuch der Stimm- und Sprachheilkunde (Springer Verlag, Vienna 1959).
(16) Fransson, F.: "The STL-Ionophone sound source", STL-QPSR 2/1965, p. 27.
(17) See ref. (9), p. 105.
(18) Mermelstein, P.: "On the piriform recesses and their acoustical effects", Fol. Phon. 19, p. 361 (1967); cf. also Flach, M. and Schwickardi, H.: "Die Recessus piriformes unter phoniatrischer Sicht", ibid. 18, p. 153 (1966).
(19) Fujimura, O. and Lindqvist, J.: "Sweep-tone measurements of the vocal tract characteristics", to be published in J. Acoust. Soc. Am.
(20) Liljencrants, J.: "Årsrapport 1969", Institutionen för Talöverföring, KTH, p. 7.
(21) Carr, P. B. and Trill, C.: "Long-term larynx-excitation spectra", J. Acoust. Soc. Am. 35, p. 2033 (1964).
(22) Fant, G.: "Acoustic analysis and synthesis of speech with applications to Swedish", Ericsson Technics No. 1 (1959). Long-term spectra of trained voices exhibit similar trends (M. Blomberg and K. Elenius, personal communication).
(23) See ref. (9), p. 270 f.
(24) Isshiki, N.: "Regulatory mechanism of voice intensity variation", J. Speech & Hearing Res. 7, p. 17 (1964).
(25) Isshiki, N.: "Vocal intensity and air flow rate", Fol. Phon. 17, p. 92 (1965).
(26) Rubin, H. J.: "Experimental studies on vocal pitch and intensity in phonation", The Laryngoscope 73, p. 973 (1963).