The level of the 'singing formant' and the source spectra of professional bass singers · STL-QPSR...

Dept. for Speech, Music and Hearing

Quarterly Progress and Status Report

The level of the 'singing formant' and the source spectra of professional bass singers

Sundberg, J.

journal: STL-QPSR, volume: 11, number: 4, year: 1970, pages: 021-039

http://www.speech.kth.se/qpsr

http://www.speech.kth.se


STL-QPSR 4/1970

III. MUSICAL ACOUSTICS

A. THE LEVEL OF THE "SINGING FORMANT" AND THE SOURCE SPECTRA OF PROFESSIONAL BASS SINGERS

J. Sundberg

1. Introduction

The "singing formant" is an abnormally high spectrum envelope peak located in the frequency region near 3 kHz. It is frequently found in artistic singing in Western culture and has been observed by several authors (1-5). However, the relative amplitude of this spectral peak has not yet been systematically investigated.

In a previous investigation by the present author it was shown that the "singing formant" can be interpreted as a formant cluster consisting of the third, fourth, and fifth formants (6). Interpreted in this way, the spectrum envelope peak is explicable in terms of the traditional acoustic theory of speech production. If this theory is in fact valid for singing, all level variations of the "singing formant" should be predictable from the source spectrum fall and the formant frequencies. The main purpose of the present investigation is to find out whether this is the case. An additional purpose is to contribute to the picture of how professional singers of differing voice qualities produce sung and spoken vowels, especially as regards the level of the "singing formant" and the source spectrum.

First, we shall determine how the relative and absolute levels of the "singing formant" vary when pitch and degree of loudness change (Sec. 2). Secondly, source spectra will be analyzed using an analysis-by-synthesis procedure (Sec. 3). The applicability of this method to sung vowels is dealt with in a separate section (Sec. 3.1). By means of experiments with an acoustical model of the vocal tract it is shown how transfer functions characterized by the presence of a "singing formant" can be generated. Moreover, it is shown that such transfer functions can be accurately simulated on a terminal analogue. Thirdly, the source spectra obtained from sung vowels will be compared with source spectra employed in normal and loud speech (Sec. 4). Finally, the results will be discussed (Sec. 5) and summarized (Sec. 6).


2. Sung vowel spectra

2.1 Material

Two professional bass singers, identical with subjects B1 and B4 in the previous paper (6), served as subjects. B1 has a very dark voice ranging from C2 to F4, and B4 has a light voice ranging from F2 to G4.

Both voices were recorded in an anechoic room with a calibrated Brüel & Kjær 1" microphone placed about 17 cm in front of the mouth. The tape recorder was a two-channel Ampex: on one channel a direct recording was made and on the other an FM recording.

The two subjects sang three sustained vowels ([u], [i], [a]) in four degrees of loudness (piano /p/, mezzopiano /mp/, mezzoforte /mf/, forte /f/) and at four pitches approximately equally spaced within their pitch ranges. The vowels were sustained for at least two seconds.

2.2 Analysis

A tape loop was made of each vowel sound, and a spectrum analysis was performed with a Rohde & Schwarz FNA spectrograph. Two bandwidths were used: 200 Hz and 10 Hz. The wide-band analysis shows the relative levels of the formants. The narrow-band analysis shows the level relations between neighboring partials. Moreover, it depicts the modulation amplitude of the fundamental due to the vibrato: the nth partial does not show up as a line but as a wide band of maximum intensity, the width of which equals 2n times the amplitude of the frequency modulation of the fundamental.
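The vibrato relation above can be turned around to read the vibrato extent off a narrow-band spectrogram. The following is a minimal sketch; the function names and the example values (F0 = 130 Hz, ±4 Hz modulation) are hypothetical, and only the relation "band width = 2n × frequency-modulation amplitude of the fundamental" is taken from the text.

```python
def partial_band_hz(n, f0_hz, fm_amplitude_hz):
    """Band swept by partial n when the fundamental is frequency-modulated
    by +/- fm_amplitude_hz. Returns (lowest, highest, width) in Hz."""
    centre = n * f0_hz
    excursion = n * fm_amplitude_hz  # partial n inherits n times the F0 excursion
    return centre - excursion, centre + excursion, 2 * excursion


def fm_amplitude_from_band_hz(n, observed_width_hz):
    """Invert the relation: FM amplitude of the fundamental from the
    observed band width of partial n."""
    return observed_width_hz / (2 * n)


if __name__ == "__main__":
    f0 = 130.0  # hypothetical fundamental, Hz
    fm = 4.0    # hypothetical vibrato extent, +/- Hz on the fundamental
    for n in (1, 5, 10, 20):
        low, high, width = partial_band_hz(n, f0, fm)
        print(f"partial {n:2d}: {low:7.1f} to {high:7.1f} Hz, width {width:5.1f} Hz")
```

Because the width grows in proportion to n, the vibrato is easiest to read from the higher partials, where the band is much wider than the 10 Hz analysis bandwidth.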

The overall sound pressure levels were measured by reading the RMS voltage of the tape output. The level of the first formant was evaluated as the output voltage of a low-pass filter with an 80 dB/octave slope, adjusted to 0.5 kHz for [u] and [i] and to 0.75 kHz for [a].
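The conversion from an RMS reading to SPL re 0.0002 dyne/cm² (20 µPa) can be sketched as follows. This assumes the samples are already calibrated in pascal; the function names and the 1 Pa test tone are hypothetical.

```python
import math

# 0.0002 dyne/cm^2 = 2e-5 Pa (1 dyne/cm^2 = 0.1 Pa), the SPL reference used in the text.
P_REF_PA = 2e-5


def rms(samples):
    """Root-mean-square value of a sequence of sound-pressure samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def spl_db(pressure_pa):
    """Sound pressure level in dB re 0.0002 dyne/cm^2 (20 micropascal)."""
    return 20.0 * math.log10(rms(pressure_pa) / P_REF_PA)


if __name__ == "__main__":
    fs, f, n = 8000, 200.0, 8000  # hypothetical: 1 s of a 200 Hz tone sampled at 8 kHz
    tone = [1.0 * math.sin(2 * math.pi * f * i / fs) for i in range(n)]  # 1 Pa peak
    print(round(spl_db(tone), 1))  # about 91 dB
```

The 80 dB/octave low-pass filter used for the first-formant level is not modelled here; for orientation, a Butterworth low-pass falls off about 6 dB/octave per order, so a comparable skirt would need an order of roughly 13 to 14.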

2.3 Results

2.3.1 Overall levels

The overall sound pressure level of a vowel sound is mainly determined by the amplitude of the first formant, since the higher formants are generally much weaker.


The overall amplitude depends on several factors (7). One is the fundamental frequency F0: an octave rise in F0 will cause the amplitude to rise by 3 dB, everything else kept constant. Another factor is the combination of formant frequencies. According to theory, the sound pressure level of an [a] will be about 6 dB higher than that of an [u] or an [i], everything else kept constant. A third factor is the amplitude of the exciting source, which itself depends on the muscular conditions in the larynx and on the subglottic pressure.

The RMS sound pressure levels re 0.0002 dyne/cm² (SPL) of all vowels are given as a function of pitch in Fig. III-A-1. Each point in the graph represents the value for one degree of loudness. For each vowel at a given pitch the lowest levels refer to the weakest degree of vocal intensity and the highest to the strongest; that is, the overall level rises monotonically as loudness is increased, as might be expected.

Fig. III-A-1 shows that the SPL is pitch-dependent in singing too: the higher the pitch, the higher the SPL. However, the rate of increase is as rapid as about 9 dB/octave, which is substantially more than the 3 dB/octave predicted by theory. The discrepancy, 6 dB/octave, is much too large to be explained by formant frequency changes (see Sec. 3 below). This indicates that the singers alter the source conditions when going from a low note to a higher one, an assumption supported by findings made by other investigators (8).
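The 9 dB/octave figure is a slope of SPL against the logarithm of F0, and the 6 dB/octave discrepancy is that slope minus the 3 dB/octave predicted in Sec. 2.3.1. A least-squares version of this comparison is sketched below; the use of least squares and the sample readings are assumptions for illustration, not the author's procedure or the data of Fig. III-A-1.

```python
import math

THEORY_DB_PER_OCTAVE = 3.0  # predicted SPL rise per octave of F0, all else constant (Sec. 2.3.1)


def slope_db_per_octave(f0_hz, spl_db):
    """Least-squares slope of SPL against log2(F0), in dB per octave."""
    x = [math.log2(f) for f in f0_hz]
    mx = sum(x) / len(x)
    my = sum(spl_db) / len(spl_db)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, spl_db))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


if __name__ == "__main__":
    # Hypothetical readings at four pitches; NOT the measured data of Fig. III-A-1.
    f0 = [65.0, 98.0, 130.0, 196.0]
    spl = [78.0, 83.5, 87.0, 92.5]
    measured = slope_db_per_octave(f0, spl)
    print(f"fitted slope {measured:.1f} dB/octave, "
          f"excess over theory {measured - THEORY_DB_PER_OCTAVE:.1f} dB/octave")
```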

At a given pitch the dynamic range between piano and forte averages 10 dB in the overall SPL.

Comparing the SPL of the vowels, we see that [a] is sung with higher values than [u] and [i], except at the lowest note. Moreover, [u] tends to be weaker in SPL than [i]. Since, according to theory, [a] should exceed [u] and [i] by about 6 dB at equal source strength, these findings seem to indicate that the singers use the same source strength for the different vowels at the same pitch, the lowest pitch possibly excepted.

2.3.2 Relative level of the "singing formant"

Little is known about the level of the "singing formant". However, it can be expected to depend on two factors. One is the density of the third, fourth, and fifth formants, i.e. how close they are in frequency. The other is the amount of sound energy generated by the voice source in this frequency region. In soft speech the source spectrum envelope is known to fall off more rapidly than in loud speech (9, 10). Consequently, the level difference between the first formant and the "singing formant" might be expected to vary with vocal intensity.

This level difference, hereafter referred to as the relative level of the "singing formant", was measured from each wide-band spectrogram. The values are given in Fig. III-A-2a and b. With very few exceptions the relative level of the "singing formant" was observed to rise with vocal intensity, so that the lowest values for each pitch and vowel refer to piano and the highest to forte.

The range of variation within each pitch and vowel is considerable, especially for the light voice B4. To a first approximation the relative level of the "singing formant" is independent of pitch. This seems rather likely in view of the fact that singing pedagogy aims at eliminating audible transitions between registers.

However, a detailed study of Fig. III-A-2a and b reveals weak trends. For the dark voice B1 the relative level of the "singing formant" tends to rise slightly with pitch. The trend of the light voice is the opposite, at least in [i].

The great spread of the values in Fig. III-A-2a and b is to a large extent due to the considerable influence of the degree of loudness. This influence is demonstrated by Fig. III-A-3a and b, which show the mean relative level of the "singing formant" as a function of the degree of loudness. In this figure, each point represents the mean of 12 values (3 vowels, 4 pitches).

The two graphs have some features in common. As mentioned above, the relative level of the "singing formant" is raised as the degree of vocal intensity becomes stronger. The rate of increase can be regarded as logarithmic (11). Thus, as far as the relative level of the "singing formant" is concerned, the differences between successive degrees of loudness gradually decrease as loudness grows stronger.

Comparing the two graphs, we find several differences. The relative level of the "singing formant" is on average higher for the light voice than for the dark one, i.e. B1 has a weaker "singing formant". The growth of the relative level of the "singing formant" from piano to forte is much larger for the light voice. Thus, on average, B1 raises the relative level by 12 dB (from -28 dB to -16 dB), whereas B4 raises it by 18 dB (from -24 dB in piano to -6 dB in forte).
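The figures just quoted, together with the logarithmic growth mentioned above, can be illustrated with a small model. This sketch assumes, hypothetically, that the four loudness degrees are equally spaced steps k = 1 to 4 and that the relative level grows as b·ln k; only the piano and forte endpoints come from the text.

```python
import math

LOUDNESS = ["p", "mp", "mf", "f"]
# Mean relative level of the "singing formant" (dB re first formant): (piano, forte), from the text.
ENDPOINTS_DB = {"B1": (-28.0, -16.0), "B4": (-24.0, -6.0)}


def log_model(piano_db, forte_db):
    """Hypothetical model r(k) = piano_db + b*ln(k) for k = 1..4 (p..f),
    with b chosen so that r(4) equals forte_db."""
    b = (forte_db - piano_db) / math.log(len(LOUDNESS))
    return {name: piano_db + b * math.log(k) for k, name in enumerate(LOUDNESS, start=1)}


def steps_db(levels):
    """Increase of the relative level between successive loudness degrees."""
    return [levels[hi] - levels[lo] for lo, hi in zip(LOUDNESS, LOUDNESS[1:])]


if __name__ == "__main__":
    for voice, (p_db, f_db) in ENDPOINTS_DB.items():
        steps = steps_db(log_model(p_db, f_db))
        print(voice, "piano-to-forte gain:", f_db - p_db, "dB; modelled steps:",
              [round(s, 1) for s in steps])
```

Under this reading the steps shrink from p towards f, which is what a logarithmic growth implies; the actual mp and mf values would have to be read from Fig. III-A-3a and b.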

Fig. III-A-2. Level differences between the "singing formant" and the first formant in spectra of vowels (given in IPA symbols) sung in four degrees of vocal intensity (p, mp, mf, f). a: dark voice; b: light voice.

Fig. III-A-4. Mean sound pressure levels re 0.0002 dyne/cm², measured in front of the mouth, in 3 vowels ([u][a][i]) sung in 4 degrees of vocal intensity (p, mp, mf, f) and 4 pitches. The curves show the sound pressure level of the overall spectrum (solid lines), the first formant (dashed lines), and the "singing formant" (chain-dashed lines). The values are given as a function of pitch (a and b) and of degree of vocal intensity (c and d). a and c show values for a dark voice, b and d for a light voice.

TABLE III-A-1. Formant frequency values used in spectrum matching. F1 and F2 in Hz, except F2 of [i] in kHz; F3, F4, and F5 in kHz.

A. Subject B1

Vowel [u]
Pitch               Level   F1    F2    F3     F4     F5
C3 (F0 = 130 Hz)    p       287   650   2.31   2.69   2.94
                    mp      287   686   2.53   2.69   2.94
                    mf      290   722   2.55   2.63   2.99
                    f       300   686   2.59   2.66   2.97
C4 (F0 = 260 Hz)    p       356   738   2.30   2.60   2.97
                    mp      356   738   2.39   2.67   2.93
                    mf      356   738   2.63   2.87   3.01
                    f       356   770   2.48   2.69   2.82

Vowel [a]
Pitch               Level   F1    F2    F3     F4     F5
C3 (F0 = 130 Hz)    p       465   860   2.55   2.73   3.61
                    mp      467   854   2.59   2.70   3.65
                    mf      485   875   2.57   2.73   3.65
                    f       538   910   2.60   2.80   3.50
C4 (F0 = 260 Hz)    p       525   932   2.57   2.80   3.61
                    mp      552   957   2.60   2.81   3.50
                    mf      550   940   2.48   2.73   3.10
                    f       576   1000  2.60   2.79   3.16

Vowel [i]
Pitch               Level   F1    F2     F3     F4     F5
C3 (F0 = 130 Hz)    p       295   1.71   2.10   2.83   2.97
                    mp      295   1.71   2.21   2.92   3.05
                    mf      270   1.69   2.29   2.86   2.98
                    f       300   1.66   2.22   2.83   2.97
C4 (F0 = 260 Hz)    p       300   1.64   2.25   2.87   3.05
                    mp      300   1.64   2.29   2.87   3.05
                    mf      300   1.66   2.34   2.84   3.07
                    f       300   1.62   2.27   2.84   3.00

B. Subject B4

Vowel [u]
Pitch               Level   F1    F2    F3     F4     F5
D3 (F0 = 147 Hz)    mp      374   702   2.56   2.73   3.44
                    mf      390   692   2.66   2.78   3.48
                    f       372   724   2.64   2.72   3.44
H3 (F0 = 247 Hz)    mp      355   690   2.55   2.80   3.28
                    mf      355   699   2.54   2.83   3.23
                    f       379   703   2.74   2.89   3.22
F4                  mp      354   846   2.55   2.79   3.36
                    mf      354   846   2.66   2.83   3.41
                    f       354   846   2.66   2.90   3.41

Vowel [i]
Pitch               Level   F1    F2     F3     F4     F5
D3 (F0 = 147 Hz)    p       226   2.19   2.85   3.09   3.39
                    mp      229   2.05   2.74   3.09   3.47
                    mf      218   2.05   2.65   3.10   3.54
                    f       229   1.90   2.67   2.79   3.44
H3 (F0 = 247 Hz)    p       307   1.86   2.64   3.04   3.52
                    mp      310   1.87   2.57   3.03   3.39
                    mf      299   1.89   2.58   2.94   3.39
                    f       262   1.83   2.56   2.89   3.39
F4                  p       315   1.87   2.57   3.14   3.50
                    mp      315   1.87   2.47   3.14   3.40
                    mf      315   1.87   2.47   3.14   3.45
                    f       315   1.87   2.47   3.14   3.45

In speech, the source spectrum is known to fall off by approximately 12 dB/octave (9, 13). At weak vocal effort the spectrum slopes more steeply, owing to the smoother movements of the vocal cords. A variation of the source spectrum slope, if experimentally established, would provide a reasonable explanation of the variations in the relative level of the "singing formant". Therefore, an investigation of the source spectrum slope was undertaken.
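Under the source-filter assumption that the transfer function stays unchanged, a change of source slope shifts the "singing formant" relative to the first formant by the slope change times the number of octaves between them. A minimal sketch; the 0.5 kHz and 3 kHz anchor frequencies and the function name are assumptions chosen for illustration.

```python
import math


def relative_level_change_db(slope_from, slope_to, f_low_hz, f_high_hz):
    """Change (dB) of the level at f_high_hz relative to f_low_hz when the
    source slope changes from slope_from to slope_to (dB/octave, negative =
    falling), with the vocal-tract transfer function left unchanged."""
    octaves = math.log2(f_high_hz / f_low_hz)
    return (slope_to - slope_from) * octaves


if __name__ == "__main__":
    # Hypothetical: first formant near 0.5 kHz, "singing formant" near 3 kHz.
    for slope in (-8.0, -12.0, -16.0):
        change = relative_level_change_db(-12.0, slope, 500.0, 3000.0)
        print(f"source slope {slope:5.1f} dB/octave -> relative level change {change:+.1f} dB")
```

With these assumptions a slope of 16 rather than 12 dB/octave lowers the relative level by roughly 10 dB, the same order as the piano-to-forte ranges reported in Sec. 2.3.2.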

A traditional way of obtaining data on the source spectrum is the analysis-by-synthesis procedure. This technique involves matching vowel spectra with the aid of a formant synthesizer. Transfer functions corresponding to spoken vowels can be accurately simulated on such a terminal analogue. However, it is doubtful whether this can also be done for sung vowels. The question is whether the formant cluster is made up of three "normal" formants. An affirmative answer does not seem self-evident, since having five formants below 4 kHz, or even below 3 kHz, is quite abnormal in speech. This question is a crucial one: when an analysis-by-synthesis method is employed, the obtained source spectrum depends entirely on the correctness of the transfer function simulation. A set of experiments was undertaken in order to study this problem.
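The dependence of the source estimate on the transfer function can be made concrete. In analysis by synthesis the source spectrum is what remains after the modelled transfer function is subtracted from the observed spectrum (in dB), so any error in the model appears one-to-one in the source. The sketch below uses a cascade of analog resonators, a common textbook form of a terminal analogue; it is not the STL synthesizer itself, and the 80 Hz bandwidths and function names are assumptions.

```python
import math


def resonator_db(f, F, B):
    """Level (dB) at frequency f of one formant resonator (centre F, bandwidth B,
    all in Hz), normalised to 0 dB at 0 Hz."""
    h = B / 2.0
    num = F * F + h * h
    den = math.sqrt(((f - F) ** 2 + h * h) * ((f + F) ** 2 + h * h))
    return 20.0 * math.log10(num / den)


def transfer_db(f, formants):
    """Cascade (series) formant model: resonator levels add in dB."""
    return sum(resonator_db(f, F, B) for F, B in formants)


def source_estimate_db(f, observed_db, formants):
    """Analysis-by-synthesis step: source level = observed level - model transfer function."""
    return observed_db - transfer_db(f, formants)


# Hypothetical test case: B1 [a], C3, piano formants from Table III-A-1,
# with an assumed 80 Hz bandwidth for every formant.
FORMANTS = [(465, 80), (860, 80), (2550, 80), (2730, 80), (3610, 80)]


def observed_db(f):
    """Synthetic 'sung' spectrum: a -12 dB/octave source through FORMANTS."""
    return -12.0 * math.log2(f / 100.0) + transfer_db(f, FORMANTS)


if __name__ == "__main__":
    for f in (200.0, 1000.0, 2000.0, 3610.0):
        full = source_estimate_db(f, observed_db(f), FORMANTS)
        four = source_estimate_db(f, observed_db(f), FORMANTS[:4])  # fifth formant omitted
        print(f"{f:6.0f} Hz: source {full:6.1f} dB (5 formants), {four:6.1f} dB (4 formants)")
```

With these bandwidths, omitting the fifth formant leaves a spurious peak of about 30 dB in the estimated source near that formant, the same kind of artefact later described for the four-formant matching program in Sec. 4.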

3.1 The methodological problem

The "singing formant" exhibits small intrasubject variations as regards its frequency location. This supports the hypothesis that those parts of the vocal tract that are least involved in articulatory movements play an important role in the generation of the "singing formant". The sinus Morgagni and the sinus piriformis are situated in such regions. Moreover, they are known to be larger in singing than in speech (6, 14, 15). A widening of the sinus Morgagni has been shown to lower the fourth formant frequency considerably (9, 6). The influence of the sinus piriformis on the transfer function has not been examined in great detail, and no reliable theory is available. Therefore, it was necessary to study its influence empirically.

This was done in the following way. An acoustical model of the vocal tract was constructed; its design is illustrated in Fig. III-A-5. The larynx tube was simulated by a cylindrical tube 2.5 cm long and 1.2 cm in diameter. This tube was attached with clay inside a larger tube, 19 cm long and 3.1 cm in diameter. The length axes of the two tubes formed an angle of 17°. A constant volume velocity sound source, the STL Ionophone, was fitted at the bottom of the smaller tube, allowing accurate and direct measurements of the transfer function (16). A variation of the sinus piriformis volume was simulated by filling the bottom portion of the larger tube with water. In this way a model was obtained that simulated the acoustical situation at the glottal end of the vocal tract in a rather realistic manner.

Fig. III-A-5. Sections through the acoustical model of the vocal tract described in the text (including section A-A).
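As a point of reference for the model, the larger tube on its own can be idealised as a uniform tube closed at the source end and open at the mouth. The sketch below assumes a speed of sound of 35,000 cm/s and ignores the larynx tube, the sinus cavity, and the end correction; under those assumptions such a tube has only four resonances below 4 kHz, which illustrates why a cluster of five formants there calls for something beyond a uniform tube.

```python
SPEED_OF_SOUND_CM_S = 35000.0  # assumed value for warm, humid air


def quarter_wave_resonances_hz(length_cm, f_max_hz, c=SPEED_OF_SOUND_CM_S):
    """Resonances (Hz) up to f_max_hz of a uniform tube closed at one end and
    open at the other: f_k = (2k - 1) * c / (4 * L). End corrections ignored."""
    f1 = c / (4.0 * length_cm)
    resonances = []
    k = 1
    while (2 * k - 1) * f1 <= f_max_hz:
        resonances.append((2 * k - 1) * f1)
        k += 1
    return resonances


if __name__ == "__main__":
    for length in (17.5, 19.0):
        peaks = quarter_wave_resonances_hz(length, 4000.0)
        print(f"{length} cm: {len(peaks)} resonances below 4 kHz:",
              [round(p) for p in peaks])
```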

Two typical transfer functions obtained from the model are given in Fig. III-A-6a and b. The solid curves refer to the model with a simulated sinus piriformis and the dashed curves to conditions with the "sinus piriformis" filled with water. Fig. III-A-6a shows transfer functions obtained when the larger tube was unperturbed, and Fig. III-A-6b shows transfer functions from the model when the larger tube was perturbed to give an area function similar to that of an [i].

The volume of the simulated sinus piriformis is seen to play a very important role with regard to the formant situation around 3 kHz. Its effect is to lower all formant frequencies, but especially the fifth one. This confirms the findings of Fant obtained from electrical line analogue studies (17). On the other hand, it contradicts the recent statements of Mermelstein based on theoretical considerations (18).

Experiments showed that a zero in the transfer function occurs close to the frequency of the fifth formant. It may appear below this frequency if the sinus piriformis is very large. When the sinus piriformis had a size suitable for giving a cluster of the fourth and fifth formants, the zero appeared as a rule around 4.5 kHz.
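Why a side cavity produces a zero, and why a larger cavity moves it down, can be illustrated with the textbook idealisation of a closed side branch as a quarter-wave resonator: at the branch's own resonance it shunts the main tube, giving a zero in the transfer function. This is an assumed simplification for illustration only (speed of sound 35,000 cm/s assumed); the conclusions above rest on the measurements, not on this model, and a real sinus piriformis is neither uniform nor straight.

```python
SPEED_OF_SOUND_CM_S = 35000.0  # assumed


def side_branch_zero_hz(depth_cm, c=SPEED_OF_SOUND_CM_S):
    """Lowest transfer-function zero produced by a closed side branch of the given
    depth, idealised as a quarter-wave resonator: f_zero = c / (4 * depth)."""
    return c / (4.0 * depth_cm)


def depth_for_zero_cm(f_zero_hz, c=SPEED_OF_SOUND_CM_S):
    """Inverse relation: branch depth that would place the zero at f_zero_hz."""
    return c / (4.0 * f_zero_hz)


if __name__ == "__main__":
    print(f"depth for a zero at 4.5 kHz: {depth_for_zero_cm(4500.0):.2f} cm")
    for depth in (1.5, 2.0, 2.5):
        print(f"depth {depth} cm -> zero near {side_branch_zero_hz(depth):.0f} Hz")
```

The relation only reproduces the qualitative trend stated above: the deeper the cavity, the lower the zero.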

It is remarkable that the amplitudes of the fourth and fifth formants do not rise more when the frequency distance between them is reduced to such a considerable extent. There seem to be two reasons for this. One is that the second and third formant frequencies are lowered. The other is the zero following the fifth formant. In matching the transfer functions obtained from the model, it was found that the amplitude-reducing effect of this zero can be simulated by setting the frequency of the higher pole correction about 0.5 kHz above the frequency of the fifth formant.
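The amplitude-reducing effect of such a zero can be sketched with the cascade formulation, treating the zero as an inverted resonance. This is an illustration under assumed bandwidths and an assumed zero at 4.5 kHz (as observed in the model); it does not implement the higher pole correction that the matching actually used to absorb the effect.

```python
import math


def resonator_db(f, F, B):
    """Level (dB) of one analog formant resonator, 0 dB at 0 Hz."""
    h = B / 2.0
    return 20.0 * math.log10((F * F + h * h) /
                             math.sqrt(((f - F) ** 2 + h * h) * ((f + F) ** 2 + h * h)))


def level_db(f, poles, zeros=()):
    """Cascade model: poles add their resonance curves, zeros subtract the same shape."""
    return (sum(resonator_db(f, F, B) for F, B in poles)
            - sum(resonator_db(f, F, B) for F, B in zeros))


# B1 [u], C3, piano formant frequencies from Table III-A-1; bandwidths are assumed.
POLES = [(287, 60), (650, 60), (2310, 100), (2690, 100), (2940, 100)]
ZERO = [(4500, 200)]  # assumed zero near 4.5 kHz

if __name__ == "__main__":
    for f in (500.0, 2940.0):
        effect = level_db(f, POLES, ZERO) - level_db(f, POLES)
        print(f"{f:6.0f} Hz: the zero changes the level by {effect:+.1f} dB")
```

With these values the zero lowers the fifth-formant peak by roughly 5 dB while leaving the first-formant region practically unchanged.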

Fig. III-A-7a and b compares two transfer functions obtained from the model (solid curves) with the corresponding matches obtained from the terminal analogue (dashed curves). The difference between the solid and dashed curves lies within about ±2 dB up to 3.6 kHz. The very narrow bandwidths occurring in some formants cannot be matched by the analogue. This is of little importance, since bandwidths of such small magnitude do not appear in the transfer functions of human vocal tracts (19).

Fig. III-A-6. Transfer functions obtained by ionophone excitation of the acoustical model of the vocal tract shown in Fig. III-A-5. Solid contours were obtained when the "sinus piriformis" was partly filled with water, dashed contours when it was completely filled with water. The upper graph (a) pertains to a cylindrical tube, the lower graph (b) to a tube perturbed so as to give an area function similar to that of an [i].

Fig. III-A-7. Two transfer functions obtained from a terminal analogue (dashed curves) and by ionophone excitation of the acoustical model of the vocal tract shown in Fig. III-A-5 (solid curves).

The question now arises whether the transfer functions obtained from the model are equivalent to those produced by the singers in sung vowels. A zero is frequently found in the spectra of sung vowels; as a rule, it is located near 3 kHz. We may assume that this zero arises from the sinus piriformis cavity. The model experiments support the assumption that the singer is able to manipulate the frequency of this zero by adjusting the size of his sinus piriformis. Especially in back vowels the singers tend to compensate for the lowering of the third formant frequency in the articulation (6). In this way they obtain a higher level of the "singing formant" than was produced by the model. Thus it seems reasonably safe to conclude that the singers generate the transfer functions of sung vowels in approximately the same way as our model. Consequently, the transfer functions of sung vowels can be accurately simulated on a terminal analogue, provided that the zero above the fifth formant frequency is taken into account in adjusting the higher pole correction.

The experiments with the acoustical model support the following conclusions:

(1) The sinus piriformis lowers all formant frequencies to some extent.

(2) The sinus piriformis lowers the fifth formant frequency drastically and adds a zero close to it. Its size seems to determine the frequency location of this zero.

(3) The sinus piriformis, in combination with the sinus Morgagni, appears to play a crucial role in the generation of transfer functions that include a "singing formant" cluster.

(4) The frequency of the zero predicted by the model can be taken into account by adjusting the frequency of the higher pole correction.

(5) Transfer functions exhibiting a "singing formant" cluster can thus be accurately simulated on a terminal analogue up to about 3.6 kHz.

3.2 Analysis

3.2.1 Material

The spectrograms of the sung vowels revealed that both subjects' low-pitched notes (F0 < 100 Hz) and the [a] of the light voice were nasalized. Extra spectrum envelope peaks near 0.2 kHz and between 1 and 2 kHz, together with abnormally wide bandwidths, were taken as evidence for this. Failures to obtain normal glottograms from inverse filtering of these vowels supported the same conclusion. Mainly owing to the lack of detailed knowledge about the transfer functions of nasalized vowels, these vowels were excluded from the material.

The piano series for [u] sung by B4 was pronounced with a marked leakage in the glottis, especially at low pitches. This feature can scarcely be regarded as a characteristic of artistic singing, but rather as typical of an occasional vocal indisposition. Consequently, his piano series for [u] was excluded from the material. The remaining sets of vowels whose source spectra were analyzed are listed in Table III-A-2.

Table III-A-2. Matched vowels

                     Degree of loudness
Voice   Pitch   piano        mezzopiano   mezzoforte   forte
B1      C3      [u][a][i]    [u][a][i]    [u][a][i]    [u][a][i]
        C4      [u][a][i]    [u][a][i]    [u][a][i]    [u][a][i]

3.2.2 Procedure

The sung vowel spectra were matched on the terminal analogue mentioned above. The synthetic source consisted of a pulse train with a spectrum fall of 12 dB per octave. The fundamental of this pulse train was modulated with a sine wave of variable frequency and amplitude so as to simulate the vibrato. The synthesized vowels were analyzed in the same way as the sung vowels. The formant frequency values employed in the matchings are listed in Table III-A-1 above.
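A vibrato pulse train of the kind described can be generated by integrating a sinusoidally modulated instantaneous fundamental frequency. The following sketch emits only the pulse instants; the 12 dB/octave spectral shaping is omitted, and the sampling rate, vibrato rate, extent, and function names are hypothetical.

```python
import math


def vibrato_pulse_times(f0_hz, rate_hz, extent_hz, duration_s, fs_hz=48000):
    """Instants (s) of source pulses whose instantaneous fundamental is
    f0_hz + extent_hz * sin(2*pi*rate_hz*t). A pulse is emitted each time the
    accumulated phase completes one cycle (sample-by-sample integration)."""
    dt = 1.0 / fs_hz
    cycles = 0.0
    times = []
    for i in range(int(round(duration_s * fs_hz))):
        t = i * dt
        f_inst = f0_hz + extent_hz * math.sin(2.0 * math.pi * rate_hz * t)
        cycles += f_inst * dt
        if cycles >= 1.0:
            cycles -= 1.0
            times.append(t)
    return times


def periods_s(times):
    """Successive pulse intervals (s)."""
    return [b - a for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    # Hypothetical vibrato: 130 Hz mean F0, 6 Hz rate, +/- 4 Hz extent.
    pulses = vibrato_pulse_times(130.0, 6.0, 4.0, 1.0)
    p = periods_s(pulses)
    print(len(pulses), "pulses in 1 s; period range",
          f"{1000 * min(p):.2f} to {1000 * max(p):.2f} ms")
```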

In matching the vowels, the higher pole correction was adjusted to 3.5 kHz in all vowels, in accordance with the findings reported in Sec. 3.1.

The differences between the spectra of the sung and synthesized vowels were evaluated at every harmonic up to 0.5 kHz. Above this frequency the differences were measured at intervals of 0.2 kHz in the wide-band spectrograms.
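The comparison grid can be sketched as follows. The assumed 0.5 kHz boundary and the 0.2 kHz step follow the procedure just described; the 3.6 kHz upper limit (the matching accuracy limit found in Sec. 3.1), placing the grid on multiples of 0.2 kHz, and the function names are assumptions.

```python
import math


def comparison_frequencies(f0_hz, harmonic_limit_hz=500.0, step_hz=200.0, f_max_hz=3600.0):
    """Frequencies (Hz) at which sung and synthetic spectra are compared: every
    harmonic up to harmonic_limit_hz, then multiples of step_hz up to f_max_hz."""
    freqs = []
    n = 1
    while n * f0_hz <= harmonic_limit_hz:
        freqs.append(n * f0_hz)
        n += 1
    k = math.floor(harmonic_limit_hz / step_hz) + 1  # first grid point above the limit
    while k * step_hz <= f_max_hz:
        freqs.append(k * step_hz)
        k += 1
    return freqs


def mean_abs_difference_db(sung_db, synth_db):
    """Average magnitude of the sung-minus-synthetic level differences."""
    return sum(abs(a - b) for a, b in zip(sung_db, synth_db)) / len(sung_db)


if __name__ == "__main__":
    print(comparison_frequencies(130.0))  # B1 at C3 (F0 = 130 Hz)
```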

Fig. III-A-8. Mean source spectrum envelopes normalized with respect to a slope of -12 dB/octave and so that the summed deviations from this slope equal zero between 1.0 and 3.2 kHz. Each curve shows an average obtained from several vowels sung at different pitches. The parameter is the degree of vocal intensity. a: dark voice; vowels: [u][a][i]; pitches: C3, C4. b: light voice; vowels: [u][i]; pitches: D3, H3, F4.

In this case, too, systematic differences between the curves can be observed only below 1 kHz, somewhat more pronounced for B1 than for B4. The low notes tend to show stronger relative amplitudes of the lower partials than the higher notes.

If the theory of speech production is applied to Fig. III-A-9a and b, an explanation is provided for the observations made in commenting on Fig. III-A-2a and b. B1 was observed to raise the relative level of his "singing formant" in going from a lower pitch to a higher one. This can be explained with reference to Fig. III-A-9a in the same way as the dependence on the degree of loudness: weaker relative amplitudes of the lower source partials at the higher pitches leave the "singing formant" relatively stronger.

In the case of B4 the relative level of the "singing formant" was observed to decrease with pitch, especially for [i]. Yet the relative amplitudes of the lower partials in his source spectrum showed a slight tendency to decrease when pitch was raised, which by itself should raise the relative level. Thus, Fig. III-A-2b cannot be explained by Fig. III-A-9b in the same way as in the case of B1.

However, B4 does not keep the formant frequencies independent of pitch. As can be seen from Table III-A-1, the second, third, and fourth formant frequencies of [i] are considerably lowered as pitch is increased. This formant lowering reduces the relative level of the "singing formant" more than the slight changes in the voice source spectrum slope raise it. Thus the results of Fig. III-A-2b and Fig. III-A-9b do not appear to be incompatible.
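The filter side of this argument can be checked with a cascade resonator model, using B4's piano [i] formant frequencies at pitches D and H from Table III-A-1. To isolate the effect discussed, only F2, F3, and F4 are moved to their H values while F1 and F5 are held at their D values. The 100 Hz bandwidths are an assumption, and the "relative level" here is the filter-only difference between the transfer-function peak in 2 to 4 kHz and the peak below 1 kHz; the source spectrum is ignored.

```python
import math


def resonator_db(f, F, B=100.0):
    """Level (dB) of one analog formant resonator, 0 dB at 0 Hz; B is assumed."""
    h = B / 2.0
    return 20.0 * math.log10((F * F + h * h) /
                             math.sqrt(((f - F) ** 2 + h * h) * ((f + F) ** 2 + h * h)))


def transfer_db(f, formants):
    return sum(resonator_db(f, F) for F in formants)


def relative_singing_formant_db(formants, step_hz=10):
    """Transfer-function peak in 2-4 kHz minus its peak below 1 kHz (dB)."""
    low = max(transfer_db(f, formants) for f in range(step_hz, 1000, step_hz))
    high = max(transfer_db(f, formants) for f in range(2000, 4001, step_hz))
    return high - low


# B4, [i], piano: F1..F5 in Hz from Table III-A-1 at pitches D and H.
D_PIANO = [226, 2190, 2850, 3090, 3390]
H_PIANO = [307, 1860, 2640, 3040, 3520]
# Move only F2, F3, F4 to their H values; keep F1 and F5 at their D values.
LOWERED = [D_PIANO[0], H_PIANO[1], H_PIANO[2], H_PIANO[3], D_PIANO[4]]

if __name__ == "__main__":
    change = relative_singing_formant_db(LOWERED) - relative_singing_formant_db(D_PIANO)
    print(f"filter-only change of the relative level: {change:+.1f} dB")
```

Under these assumptions the lowering of F2 to F4 alone reduces the filter's contribution to the relative level by several dB, consistent with the argument above.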

In the pitch range under consideration, the relative amplitudes of the lower source spectrum partials were found to increase with decreasing pitch. Let us assume that this also holds if we extrapolate into the lower pitch range. Accordingly, the relative level of the "singing formant" would be very weak for very low notes. Such an effect would be counteracted if the singer nasalized these notes, as he was in fact found to do: nasalization reduces the relative amplitude of the first formant owing to its bandwidth effect. Moreover, it might be that the extra spectrum envelope peak near 0.2 kHz gives the very low notes some of a desired quality of "warmth".
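The bandwidth effect invoked here can be quantified for a single resonance: the peak of a formant lies roughly 20·log10(F/B) dB above its low-frequency level, so doubling the first-formant bandwidth lowers the F1 peak by close to 6 dB and thereby raises the level of the "singing formant" relative to F1. This sketch models nasalization only as an F1 bandwidth increase (an assumption); the extra nasal resonances described in Sec. 3.2.1 are not represented, and the 300 Hz formant is hypothetical.

```python
import math


def resonator_peak_db(F, B):
    """Peak level (dB, re the 0 Hz level) of an analog formant resonator with
    centre frequency F and bandwidth B (Hz); roughly 20*log10(F/B)."""
    h = B / 2.0
    return 20.0 * math.log10((F * F + h * h) / (h * math.sqrt(4.0 * F * F + h * h)))


if __name__ == "__main__":
    first_formant_hz = 300.0  # hypothetical first formant
    for bandwidth in (60.0, 120.0, 240.0):
        peak = resonator_peak_db(first_formant_hz, bandwidth)
        print(f"bandwidth {bandwidth:5.0f} Hz: F1 peak {peak:5.1f} dB above the 0 Hz level")
```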

Fig. III-A-10 illustrates how the two types of voices differ as regards the mean source spectrum envelopes. Each curve represents the average of 14 envelopes ([i] in p, mp, mf, f and [u] in mp, mf, f; both vowels sung at two pitches, one medium and one high).

Fig. III-A-10. Mean source spectrum envelopes normalized with respect to a slope of -12 dB/octave and so that the summed deviations from this slope equal zero between 1.0 and 3.2 kHz. Each curve gives an average obtained from the vowel [u] sung in mp, mf, f and the vowel [i] sung in p, mp, mf, f at a medium and a high pitch relative to the range of the singer. Solid line: dark voice. Dashed line: light voice.

Once again there are systematic differences between the curves below 1 kHz only. The light voice B4 has weaker relative amplitudes of the lower source spectrum partials than the dark voice, as could be expected. The acoustic correlate of a "dark" voice might thus be not only low formant frequencies but also strong relative amplitudes of the lower source spectrum partials.

This difference between the two voices also explains their difference in the relative level of the "singing formant": the higher relative level of the "singing formant" of B4 is due to the lower relative amplitude of his lower source spectrum partials (cf. p. 31).

4. Source spectra of spoken vowels

It was demonstrated above that the main factors giving rise to the "singing formant" are formant frequency shifts and intensity relations within the source spectrum. As a sequel to this investigation we have undertaken a study of these factors in speech, in particular of the source spectrum. The material consisted of the vowels [u], [o], [a], [z], [e], [i], [Y], [u], and [&]. These vowels were spoken in a carrier phrase (de va rVrVrV ja sa). The subjects spoke at two different loudness levels, one normal and one high. The material was recorded in the same way as the sung vowels.

The vowel spectra were analyzed with the 51-channel spectrograph and matched by means of a computer program (20).* This program, developed for matching purposes, works with only four formants. This appeared to be insufficient for the two singers' speech: the lack of a matching fifth formant showed up in the obtained source spectrum envelope as a level rise starting around 2.4 kHz. This rise obviously does not belong to the source spectrum, and information on the source spectrum was therefore obtained only up to this frequency limit. For the spoken vowels, the deviations from the 12 dB/octave slope were normalized so that the summed deviation values became zero between 1 and 2.4 kHz.
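The normalization can be sketched as follows: deviations from a -12 dB/octave line are shifted by a constant so that they sum to zero inside the reference band (1.0 to 2.4 kHz for the spoken vowels, 1.0 to 3.2 kHz for the sung ones in Fig. III-A-8 and III-A-10). Because the constant is removed, the anchor of the reference line does not matter. Function and variable names and the example spectrum are hypothetical; summing to zero is implemented as subtracting the in-band mean, which assumes the sum runs over the analysis frequencies.

```python
import math


def normalised_deviations_db(freqs_hz, source_db, band_hz=(1000.0, 2400.0), slope_db_oct=-12.0):
    """Deviations (dB) of a source spectrum from a line with slope slope_db_oct dB
    per octave, shifted by a constant so that they sum to zero inside band_hz."""
    f_lo, f_hi = band_hz
    # The reference line may be anchored anywhere: the constant offset is removed below.
    reference = [slope_db_oct * math.log2(f / freqs_hz[0]) for f in freqs_hz]
    deviations = [s - r for s, r in zip(source_db, reference)]
    in_band = [d for f, d in zip(freqs_hz, deviations) if f_lo <= f <= f_hi]
    if not in_band:
        raise ValueError("no analysis frequencies inside the normalisation band")
    offset = sum(in_band) / len(in_band)
    return [d - offset for d in deviations]


# Hypothetical source spectrum: -12 dB/octave plus a 5 dB boost below 1 kHz.
EXAMPLE_FREQS = [200.0, 400.0, 800.0, 1200.0, 1600.0, 2000.0, 2400.0]
EXAMPLE_SOURCE = [-12.0 * math.log2(f / 100.0) + 3.0 + (5.0 if f < 1000.0 else 0.0)
                  for f in EXAMPLE_FREQS]

if __name__ == "__main__":
    devs = normalised_deviations_db(EXAMPLE_FREQS, EXAMPLE_SOURCE)
    print([round(d, 1) + 0.0 for d in devs])
```

For the sung vowels the same function would be called with band_hz=(1000.0, 3200.0).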

The curves in Fig. III-A-11a and b give the averaged deviations from the 12 dB/octave slope found in the 9 vowels. A comparison between the curves pertaining to normal and loud speech reveals almost exactly the same

* The author is indebted to his colleagues J. Liljencrants and R. Carlson for their assistance in performing this analysis.

Fig. III-A-11. Mean source spectrum envelopes normalized with respect to a slope of -12 dB/octave and so that the summed deviations from this slope equal zero between 1.0 and 2.4 kHz. The curves refer to the nine vowels spoken at normal (dashed line) and high (chain-dashed line) vocal intensity. a: dark voice; the fine solid line shows a mean spectrum envelope normalized in the same way and obtained from the vowels [a][i][u] sung in p, mp, mf, and f at pitches C3 and C4. b: light voice; the fine solid line shows a mean spectrum envelope normalized in the same way and obtained from the vowels [u] sung in mp, mf, f and [i] sung in p, mp, mf, f at pitches D3, H3, and F4.

filtering in doubtful cases, one must admit that the frequency regions of the source spectrum where formants occur are less reliable than those without formants. The fidelity limits of the sound reproducing system and the dependence on the accuracy of the formant frequency estimates both indicate that an average source spectrum obtained from several vowels must be more reliable than one obtained from a single vowel only. For this reason we have dealt only with mean source spectra obtained from at least two vowels in the present investigation.

Let us now compare the observations on the source spectra with those of other investigators. Unfortunately, source spectra of professional singers' sung vowels have not been published earlier. However, some investigations of spoken vowels have been made (12, 13, 21).

The main result of these investigations is that, on average, the source spectrum in normal speech falls off at a rate of about 12 dB/octave. From the present results we can conclude that the average source spectrum fall in our material is also about 12 dB/octave, and thus rather normal.

The source spectrum of spoken vowels has been shown to fall off faster in quiet speech than in normal and loud speech. Some published long-term spectra of continuous speech support this. In our material, too, we have seen examples of such variations in the source spectrum fall. For B4 a mean slope of about 16 dB/octave was found in his piano [i], and for B1 a value of 7.5 dB/octave was found in loud speech. Thus, the results presented above appear reasonable in this respect as well.

According to Fant, the amplitude of the first formant tends to rise more rapidly than that of the fundamental if the vocal intensity is increased while the fundamental frequency is kept constant. His hypothetical explanation is that the glottal impedance is mainly inductive at low vocal effort (the air flow is laminar) and mainly resistive at high voice level (the air flow is turbulent). These findings are apparently very similar to those made in this investigation.

In nearly all previously published investigations of source spectrum characteristics, a zero near .0 kHz has been mentioned. In our source spectra such a zero is found only occasionally. However, it must be kept in mind that, for the reasons just mentioned, our source spectra represent averages from several vowels. If the zero occurs at different frequencies


zero close to the fifth formant frequency. The frequency of this zero is strongly dependent on the size of the sinus piriformis. The transfer function of a vowel embellished by a "singing formant" can be accurately simulated on a terminal analogue, provided that the frequency of the zero is taken into account in adjusting the higher pole correction.
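The effect of such a zero can be illustrated with a cascade of pole pairs and a zero pair, each section normalised to unit gain at 0 Hz. This is a minimal sketch under stated assumptions: the formant and zero values are hypothetical, not measurements from the paper, and the higher pole correction mentioned above is deliberately left out, so the code shows only how a zero just above F5 pulls down the spectrum envelope in that region.

```python
import math


def _root(freq_hz, bw_hz):
    """Complex root s = -pi*B + j*2*pi*F of a resonance or antiresonance."""
    return complex(-math.pi * bw_hz, 2.0 * math.pi * freq_hz)


def cascade_level_db(f_hz, formants, zeros=()):
    """Level in dB (re 0 Hz) of cascaded pole pairs and zero pairs at f_hz.

    formants and zeros are sequences of (frequency_hz, bandwidth_hz). Each
    section is normalised to unit gain at 0 Hz. No higher-pole correction
    is applied in this sketch.
    """
    s = complex(0.0, 2.0 * math.pi * f_hz)
    h = complex(1.0, 0.0)
    for freq, bw in formants:
        p = _root(freq, bw)
        h *= (p * p.conjugate()) / ((s - p) * (s - p.conjugate()))
    for freq, bw in zeros:
        z = _root(freq, bw)
        h *= ((s - z) * (s - z.conjugate())) / (z * z.conjugate())
    return 20.0 * math.log10(abs(h))


# Hypothetical bass-like formants (F3-F5 clustered near 3 kHz) and a zero
# placed just above F5; none of these values are taken from the paper.
FORMANTS = [(300, 60), (650, 70), (2400, 100), (2700, 120), (3100, 150)]
ZERO = [(3300, 100)]

for f in (500, 1500, 2700, 3300, 4000):
    with_zero = cascade_level_db(f, FORMANTS, ZERO)
    without = cascade_level_db(f, FORMANTS)
    print(f"{f:5d} Hz: {without:6.1f} dB without zero, {with_zero:6.1f} dB with zero")
```

Moving the hypothetical zero (for example, to mimic a larger or smaller sinus piriformis) changes the levels near F5, which is why its frequency has to be considered when the higher pole correction of a terminal analogue is set.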

Considerable variations of the relative level of the "singing formant" occur in singing. These variations can be ascribed to changes in the voice source spectrum envelope: the relative amplitudes of the lower source spectrum partials (below 1 kHz) vary with vocal intensity and pitch. As regards the vocal intensity dependence, these source variations give rise to SPL variations of the "singing formant" that are between two and three times as large as the variations of the overall spectrum SPL. As regards the pitch dependence, the source variations are less pronounced and occasionally counteracted by formant frequency changes. Possibly, nasalization of low-pitched vowels (F0 < 100 Hz) is resorted to with the purpose of guaranteeing a sufficiently high relative amplitude of the "singing formant".

The mechanism underlying these alterations needs to be studied further in future research. It most probably involves glottal impedance, subglottal pressure, and muscular conditions in the larynx.

There seem to be no considerable differences between the source spectra employed by the singers in normal speech and in singing. Moreover, most properties of the source spectrum envelope have been recognized in earlier investigations dealing with source spectra of untrained voices.

As regards voice quality, the only difference between a very dark and a light voice was found in the relative amplitudes of the lower source spectrum partials, which were larger in the case of the dark voice.

The intensity variation seems to be accompanied by variation of the relative level of the "singing formant" rather than of the overall spectrum SPL. The acoustical correlate of a loudness increase is thus more an increase in the spectrum balance than in the SPL of the total spectrum.
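Why a change in the upper partials hardly moves the overall SPL can be seen from the power summation of partial levels. The sketch below uses invented levels, not data from the paper: three strong low partials dominate the total, so raising two weak partials standing in for the "singing formant" region by 10 dB changes the overall SPL by well under 1 dB while changing the spectrum balance considerably.

```python
import math


def overall_spl(levels_db):
    """Overall SPL of a set of partials, combined by summing their powers."""
    return 10.0 * math.log10(sum(10.0 ** (level / 10.0) for level in levels_db))


# Hypothetical partial levels in dB: three strong low partials and two
# weaker ones standing in for the "singing formant" region.
low = [80.0, 78.0, 70.0]
singing_formant = [60.0, 62.0]

before = overall_spl(low + singing_formant)
after = overall_spl(low + [level + 10.0 for level in singing_formant])
print(f"singing formant +10.0 dB, overall SPL +{after - before:.1f} dB")
```

In real phonation the lower partials change as well, which is why the measured ratio between the two kinds of variation is the more moderate factor of two to three reported above.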

Acknowledgments

The author is indebted to his colleague B. Lindblom for valuable suggestions in reading the manuscript. The work was supported by the Tri-Centennial Fund of the Bank of Sweden, grant no. 67/48.


References

(1) Bartholomew, W. T.: "A physical definition of 'good voice quality' in the male voice", J. Acoust. Soc. Am. 6, p. 27 (1934).

(2) Fry, D. B. and Manén, L.: "Basis for the acoustical study of singing", J. Acoust. Soc. Am. 29, pp. 690-692 (1957).

(3) McGinnis, C. S., Elnick, M., and Kraichman, M.: "Study of the vowel formants of well-known operatic singers", J. Acoust. Soc. Am. 23, pp. 440-446 (1951).

(4) Rzhevkin, S. N.: "Certain results of the analysis of a singer's voice", Sov. Phys. Acoust. 2, pp. 215-220 (1956).

(5) Vennard, W.: Singing, the Mechanism and the Technique (Fischer, Inc., New York 1967).

(6) Sundberg, J.: "Formant structure and articulation of spoken and sung vowels", Fol. Phon. 22, p. 28 (1970).

(7) Fant, G., Fintoft, K., Liljencrants, J., Lindblom, B., and Mártony, J.: "Formant-amplitude measurements", J. Acoust. Soc. Am. 35, p. 1753 (1963).

(8) Rubin, H. J., LeCover, M., and Vennard, W.: "Vocal intensity, subglottic pressure and air flow relationships in singers", Fol. Phon. 19, p. 393 (1967).

(9) Fant, G.: Acoustic Theory of Speech Production (Mouton & Co., The Hague 1960).

(10) Lindqvist, J.: "The voice source studied by means of inverse filtering", STL-QPSR 1/1970, p. 3.

(11) Cf. Ladefoged, P.: Three Areas of Experimental Phonetics (Oxford University Press, London 1967), p. 35 sqq.

(12) Flanagan, J. L.: "Some properties of the glottal sound source", J. Speech & Hearing Res. 1, p. 99 (1958).

(13) Mártony, J.: "Studies of the voice source", STL-QPSR 1/1965, p. 4.

(14) Flach, M.: "Über die unterschiedliche Grösse der Morgagnischen Ventrikel bei Sängern", Fol. Phon. 16, p. 67 (1964).

(15) Luchsinger, R. and Arnold, G. E.: Lehrbuch der Stimm- und Sprachheilkunde (Springer Verlag, Vienna 1959).

(16) Fransson, F.: "The STL-Ionophone sound source", STL-QPSR 2/1965, p. 27.

(17) See ref. (9), p. 105.

(18) Mermelstein, P.: "On the piriform recesses and their acoustical effects", Fol. Phon. 19, p. 361 (1967); cf. also Flach, M. and Schwickardi, B.: "Die Recessus piriformes unter phoniatrischer Sicht", ibid. 18, p. 153 (1966).

(19) Fujimura, O. and Lindqvist, J.: "Sweep-tone measurements of the vocal tract characteristics", to be published in J. Acoust. Soc. Am.

(20) Liljencrants, J.: "Årsrapport 1969", Institutionen för Talöverföring, KTH, p. 7.

(21) Carr, P. B. and Trill, C.: "Long-term larynx-excitation spectra", J. Acoust. Soc. Am. 35, p. 2033 (1964).

(22) Fant, G.: "Acoustic analysis and synthesis of speech with applications to Swedish", Ericsson Technics No. 1 (1959). Long-term spectra of trained voices exhibit similar trends (M. Blomberg and K. Elenius, personal communication).

(23) See ref. (9), p. 270 f.

(24) Isshiki, N.: "Regulatory mechanism of voice intensity variation", J. Speech & Hearing Res. 7, p. 17 (1964).

(25) Isshiki, N.: "Vocal intensity and air flow rate", Fol. Phon. 17, p. 92 (1965).

(26) Rubin, H. J.: "Experimental studies on vocal pitch and intensity in phonation", The Laryngoscope 73, p. 973 (1963).
