Three-dimensional vocal tract imaging and formant structure: Varying vocal register, pitch, and loudness

Kenneth Tom
Department of Speech Communication, California State University Fullerton, Fullerton, California 92831

Ingo R. Titze
Department of Speech Pathology and Audiology, National Center for Voice and Speech, University of Iowa, Iowa City, Iowa 52242

Eric A. Hoffman
Division of Physiologic Imaging, Department of Radiology, University of Iowa College of Medicine, Iowa City, Iowa 52242

Brad H. Story
Department of Speech and Hearing Sciences, University of Arizona, Tucson, Arizona 85721

(Received 26 July 1999; accepted for publication 17 October 2000)

Although advances in techniques for image acquisition and analysis have facilitated the direct measurement of three-dimensional vocal tract air space shapes associated with specific speech phonemes, little information is available with regard to changes in three-dimensional (3-D) vocal tract shape as a function of vocal register, pitch, and loudness. In this study, 3-D images of the vocal tract during falsetto and chest register phonations at various pitch and loudness conditions were obtained using electron beam computed tomography (EBCT). Detailed measurements and differences in vocal tract configuration and formant characteristics derived from the eight measured vocal tract shapes are reported. © 2001 Acoustical Society of America. [DOI: 10.1121/1.1332380]

PACS numbers: 43.70.Aj, 43.70.Jt, 43.75.Rs [AL]

J. Acoust. Soc. Am. 109 (2), February 2001. 0001-4966/2001/109(2)/742/6/$18.00

I. INTRODUCTION

In the adult male voice, the highest fundamental frequencies of the speaking or singing voice are usually produced in falsetto register. Loud, high-pitched phonation in falsetto register can be thought of as one extreme of the range of capabilities of the vocal mechanism. Physiologically, however, falsetto phonation can be produced throughout the upper third to two-thirds of the fundamental frequency range at various degrees of vocal intensity. For most voices, there is also some degree of overlap in fundamental frequencies that can be produced in either chest or falsetto register (Colton and Hollien, 1972; Hollien, 1974, 1977; Titze, 1988, 1994; Welch et al., 1988). Although falsetto phonation is an intrinsic part of human vocal production, and its use as a phonatory setting in speech (Laver, 1980) and numerous styles of singing recognized (Malm, 1967; Giles, 1982), there are few studies examining systematic changes in three-dimensional vocal tract morphology and corresponding changes in vocal tract resonances in falsetto versus chest (modal) register phonation. Detailed measurements of 3-D vocal tract dimensions in falsetto and chest register phonations would (1) facilitate more natural computer synthesis/simulation of phonation in falsetto mode, (2) provide insights into the nature of articulatory changes in vocal tract shape as a function of register, pitch and loudness, and (3) allow for the derivation of corresponding formant structure. Formant and vocal tract shape information for falsetto phonations in a singing voice at various pitch/loudness conditions versus normal speech phonation in chest register would also allow one to examine vocal tract changes for the presence of articulatory manipulations to increase either vocal intensity (e.g., use of a singer's formant) or consistency in vocal timbre (vowel modification).

Three-dimensional imaging and measurement techniques have been used to characterize the vocal tract configurations of numerous phonetic structures and to calculate relevant formant information. Magnetic resonance imaging (MRI) has been used to acquire 3-D images of vocal tract shapes for vowels and continuants (Baer et al., 1991; Moore, 1992; Sulter et al., 1992), sustained dark and light allophones of /l/ (Narayanan et al., 1997), and rhotics (Alwan et al., 1997), as well as a larger inventory of speaker-specific vowels and consonants (Story et al., 1996). High-resolution 3-D images of differences in vocal tract shape as a function of register, fundamental frequency, and vocal intensity have not been available.

Most acoustically based studies of formant structure in adult speech production have excluded high-pitched phonation (Hillenbrand et al., 1995; Huber et al., 1999; Lienard and Di Benedetto, 1999; Peterson and Barney, 1952). This is due, at least in part, to practical and theoretical limitations of currently available acoustic analysis techniques. Because spectral harmonics are so widely spaced in phonations with high fundamental frequencies, accurate estimates of formant frequencies and bandwidths in high-pitched falsetto phonations are difficult, if not impossible, to obtain using standard acoustic techniques, e.g., linear predictive coding (LPC) analysis (Markel and Gray, 1976).


The objectives of the current study were threefold: (1) to acquire high-resolution three-dimensional images of the vocal tract during phonation at various pitch and loudness conditions in falsetto and chest register for a single subject, a trained singer, (2) to obtain the corresponding vocal tract length and cross-sectional area functions, and (3) to specify formant frequencies and bandwidths associated with each phonatory condition. As an alternative to LPC techniques for estimating formants, an analysis by synthesis approach was used. Vocal tract area functions derived from high-resolution 3-D vocal tract image sets were used to calculate corresponding formant data (Titze et al., 1994; Tom, 1996).
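To make the idea of deriving formants from an area function concrete, the following is a minimal sketch and not the authors' method. The paper uses a wave-reflection model; this sketch instead assumes a lossless chain of cylindrical tube sections, closed at the glottis and ideally open at the lips, and locates resonances in the frequency domain. The speed of sound (350 m/s), the function names, and the uniform test tube are all illustrative assumptions. The section length and count are borrowed from the discretization reported later in the paper.

```python
import math

C = 35000.0  # assumed speed of sound in cm/s (not stated in the source)


def k22(areas, section_len, freq):
    """Lower-right chain-matrix element of a lossless tube at `freq` (Hz).

    `areas` lists section areas (cm^2) from glottis to lips. Each section
    matrix has the form [[a, j*b], [j*c, d]]; only the real numbers a, b,
    c, d are tracked. The constant rho*c factor cancels and is omitted.
    """
    theta = 2.0 * math.pi * freq * section_len / C
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    a, b, c, d = 1.0, 0.0, 0.0, 1.0  # identity matrix
    for area in areas:
        sa, sb, sc, sd = cos_t, sin_t / area, area * sin_t, cos_t
        a, b, c, d = (a * sa - b * sc, a * sb + b * sd,
                      c * sa + d * sc, d * sd - c * sb)
    return d


def formants(areas, section_len, f_max=5000.0, step=1.0):
    """Return resonance frequencies below f_max.

    With zero pressure at the lips, the volume-velocity transfer function
    is 1 / k22, so resonances are the zeros of k22. Sign changes are found
    on a coarse grid and refined by bisection.
    """
    found = []
    f = step
    v_lo = k22(areas, section_len, f)
    while f < f_max:
        f_hi = f + step
        v_hi = k22(areas, section_len, f_hi)
        if v_lo * v_hi < 0.0:
            lo, hi, vl = f, f_hi, v_lo
            for _ in range(40):
                mid = 0.5 * (lo + hi)
                vm = k22(areas, section_len, mid)
                if vl * vm <= 0.0:
                    hi = mid
                else:
                    lo, vl = mid, vm
            found.append(0.5 * (lo + hi))
        f, v_lo = f_hi, v_hi
    return found


# Hypothetical uniform tube: 44 sections of 0.396825 cm, 3 cm^2 each.
# For a uniform tube the resonances fall at odd multiples of C / (4 * L).
uniform_tube = [3.0] * 44
uniform_formants = formants(uniform_tube, 0.396825)
print([round(f, 1) for f in uniform_formants])
```

A measured area function would replace `uniform_tube`. Because the sketch ignores wall losses, radiation impedance, and the piriform sinus side branch, it gives resonance frequencies only, not bandwidths.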

Three-dimensional imaging of the vocal tract can be accomplished by obtaining a series of contiguous image slices through the portion of the body encompassing the vocal tract, segmenting the airway shape from its bordering tissues, and reconstructing it in three dimensions. Images can be acquired with either x-ray computed tomography (x-ray CT) or magnetic resonance imaging (MRI). Each technique has its advantages and disadvantages. In terms of reducing the risk of any adverse side effects, MRI has the clear advantage. No hazardous effects have been observed from short term exposures to the magnetic fields currently used in MRI scanning systems. For imaging airways, however, MRI techniques have a number of disadvantages (Baer et al., 1991; Moore, 1992; Sulter et al., 1992). Image resolution and accuracy are limited. Air-to-tissue boundaries can be distorted due to MRI artifacts, effectively blurring the edges of the vocal tract slightly. Tissues that are low in hydrogen content, such as bony structures and teeth, are captured poorly and appear to be the same gray scale density as air. Using MRI technology available at the time this study was performed, the scanner activation time required to scan an entire vocal tract was approximately 4 to 5 min (Story et al., 1996), depending on the desired resolution and scanning parameters being used. The addition of pauses required for breathing when imaging the vocal tract during actual phonation increased total image acquisition time to 10 to 15 min per vocal tract shape. Under such circumstances subject fatigue and movement artifact become an important consideration. Because the present study included phonations at high effort conditions that could not be sustained over the total image acquisition time required by MRI techniques, the use of MRI was not feasible.

At the time these studies were performed, the scanner of choice amongst the x-ray CT technologies was electron beam computed tomography (EBCT) because of the high speed of image acquisition (100 ms per slice). For imaging airways, electron beam computed tomography techniques yield images of higher resolution than MRI images. The air–tissue boundary is captured with greater accuracy and bony structures and teeth are clearly imaged. Using EBCT scanners, a high-resolution volumetric study encompassing the entire vocal tract can be scanned relatively quickly (12 to 18 s). This comparatively brief image acquisition time greatly reduces the potential for subject fatigue and associated movement artifact, which can blur resultant images. The chief disadvantage associated with EBCT is its use of ionizing radiation, which limits the number of scans considered safe (International Commission on Radiological Protection, 1977; National Council on Radiation Protection and Measurements, 1987).

II. IMAGE ACQUISITION AND ANALYSIS

A. Image acquisition protocol

Volumetric images of the vocal tract were scanned from a single male subject for sustained phonation of the vowel /ɑ/ under eight phonatory conditions, varying voice register, pitch, and loudness levels. The conditions, summarized in Table I, included three sung falsetto register pitch levels (low pitch, 262 Hz; medium pitch, 349 Hz; high pitch, 466 Hz) and chest register speech phonation at a self-selected comfortable pitch level. Each of these four pitch levels was scanned at two intensity levels, moderately low intensity (mezzo piano, mp) and very loud intensity (fortissimo, ff).

For the purposes of this study, a trained singer who could readily and consistently produce falsetto as well as chest register phonations throughout his fundamental frequency and intensity ranges was recruited. The subject for this study was a 45-year-old adult male who has had extensive singing training in the Western classical tradition, including 12 years of vocal study as a baritone and 6 years of study as a countertenor. An active performer, he has sung as a countertenor for the past 11 years performing early music using a falsetto-based singing technique to vocalize in the alto range. The subject's medical and recent health history were unremarkable and there was no history of speech or hearing disorders. The subject is a native speaker of General North American English and his speaking fundamental frequency was within normal limits. The subject's fundamental frequency range spanned from 69 Hz (C♯2) to 392 Hz (G4) in chest register and from 147 Hz (D3) to 587 Hz (D5) in falsetto register.
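The pitch names and frequencies quoted for the subject's range and for the scanned conditions can be related with a short calculation. This sketch assumes twelve-tone equal temperament with A4 = 440 Hz, which the source does not state. The function and dictionary names are illustrative.

```python
# Semitone offsets within an octave, using the note spellings found in the text.
NOTE_OFFSETS = {
    "C": 0, "C#": 1, "D": 2, "E-flat": 3, "E": 4, "F": 5,
    "F#": 6, "G": 7, "A-flat": 8, "A": 9, "B-flat": 10, "B": 11,
}


def note_to_hz(name, octave, a4=440.0):
    """Equal-tempered frequency of a note in scientific pitch notation.

    Assumes A4 = 440 Hz; MIDI note 69 corresponds to A4.
    """
    midi = 12 * (octave + 1) + NOTE_OFFSETS[name]
    return a4 * 2.0 ** ((midi - 69) / 12.0)


# Range limits and scanned pitches named in the text.
for name, octave in [("C#", 2), ("G", 4), ("D", 3), ("D", 5),
                     ("C", 4), ("F", 4), ("B-flat", 4), ("B-flat", 2)]:
    print(f"{name}{octave}: {note_to_hz(name, octave):.1f} Hz")
```

Under that assumption, the rounded values agree with the frequencies reported in the text and in Table I.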

For each phonatory condition to be scanned, the subject was positioned comfortably in a supine position on the imaging table. His lower neck was supported and stabilized with a rolled towel. Head positioning was aligned before each set of scans such that the Frankfort plane was perpendicular to the imaging table and the anatomic midline centered. Each phonatory condition required approximately 20 s of actual scanner activation time. Total acquisition time depended on how long the subject could repeatedly prolong the target phoneme /ɑ/ at a particular phonatory condition without significant movement of vocal tract structures. The following system was devised to time scanning interruptions, which allowed the subject to rest and breathe between vowel repetitions. Before each condition, the subject produced several trial utterances to gauge how many seconds he could produce steady, consistent phonations without introducing noticeable movement artifact in the images. When this utterance length was determined, the radiology technician timed pauses such that he stopped before the end of a vowel reiteration and reinitiated imaging as soon as the subject began the next reiteration, as monitored over the intercom. Including these pauses, the total time required to image the vocal tract for each condition ranged from approximately 60 to 90 s.

TABLE I. Phonatory conditions for vocal tract imaging. Vowel for all phonations = /ɑ/.

Pitch/Condition      Register   Pitch (F0)          Loudness
B-flat4 ff           Falsetto   B-flat4 (466 Hz)    very loud (ff)
B-flat4 mp           Falsetto   B-flat4 (466 Hz)    moderately soft (mp)
F4 ff                Falsetto   F4 (349 Hz)         very loud (ff)
F4 mp                Falsetto   F4 (349 Hz)         moderately soft (mp)
C4 ff                Falsetto   C4 (262 Hz)         very loud (ff)
C4 mp                Falsetto   C4 (262 Hz)         moderately soft (mp)
Speech loud          Chest      D3 (147 Hz)         very loud
Speech comfortable   Chest      B-flat2 (117 Hz)    comfortable loudness

B. EBCT scanning parameters

The EBCT images were acquired with an Imatron C-150 scanner (Boyd and Lipton, 1983). Each volume set consisted of 60 contiguous, parallel, axial slices. Slice thickness was 3 mm. These scanned images encompassed the hard palate superiorly, the first tracheal ring inferiorly, the lips anteriorly, the posterior pharyngeal wall posteriorly, and the buccal walls to the left and right of vocal tract air space. Slice scan aperture was 100 ms. The field of view (FOV) for each slice was 21 cm and the image matrix was 512×512 pixels. The resolution in the plane of imaging (axial) was 0.410 mm, which is near the theoretical limit of the scanner's resolution.
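The in-plane resolution follows directly from the field of view and the image matrix. The short calculation below is a sketch with illustrative variable names. It also derives the superior-inferior coverage and the slice-to-pixel size ratio, neither of which is quoted in the source.

```python
fov_mm = 210.0              # 21 cm field of view
matrix_size = 512           # pixels per side of the image matrix
num_slices = 60
slice_thickness_mm = 3.0

# In-plane pixel size: field of view divided by the number of pixels per side.
pixel_mm = fov_mm / matrix_size

# Derived values (not quoted in the source): stack coverage and anisotropy.
coverage_mm = num_slices * slice_thickness_mm
anisotropy = slice_thickness_mm / pixel_mm

print(f"pixel size: {pixel_mm:.3f} mm")
print(f"coverage:   {coverage_mm:.0f} mm")
print(f"slice thickness / pixel size: {anisotropy:.1f}")
```

The slices are several times thicker than the in-plane pixel size, so the raw voxels are not isotropic.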

The accuracy of the image acquisition and analysis procedures using the Imatron C-150 scanner has been assessed with a tubular phantom of known dimensions (Story, 1995). The phantom consisted of three connected sections of air-filled tubing placed in a closed water-filled plastic enclosure. Known and measured cross-sectional areas of the phantom differed by 1.8% to 2.0%.

C. Image analysis

Image analysis was accomplished in three stages, i.e., image segmentation, 3-D airway reconstruction, and airway measurement. These procedures were performed using UNIX-based image display and measurement software called VIDA™ (volumetric image display and analysis), which was developed by Hoffman and colleagues (Hoffman et al., 1992). Further information regarding all VIDA modules can be accessed at http://everest.radiology.uiowa.edu. These image analysis techniques, as applied to vocal tract airway analysis, have been described in detail by Story (1995) and Story et al. (1996).

The vocal tract was segmented, i.e., differentiated from surrounding tissue, using a seeded region growing technique whereby all airway voxels (3-D pixels) were assigned a unique gray scale value (Hoffman et al., 1983; Udupa, 1991). Reconstruction of the vocal tract in three dimensions was accomplished using a process called shape-based interpolation (Raya and Udupa, 1990; Udupa, 1991) on the segmented image set, yielding a stack of slices with the same voxel dimension (0.410 mm) along all three axes. The reconstructed 3-D image data from the shape-based interpolated vocal tract was the basis for subsequent cross-sectional area measurements. The edges of the interpolated airway shape were also used to perform 3-D surface renderings of the vocal tract. Graphically represented as a 3-D object with the use of shading, surface renderings can be displayed at any number of angles or magnification levels. The display itself cannot be measured directly, but allows the user to assess the quality of the segmentation procedure and to observe 3-D views of the vocal tract's outer shape. To measure cross-sectional areas from the shape-based interpolated data set, an algorithm designed to study the upper airway (Hoffman and Gefter, 1990; Hoffman et al., 1992) was used. Tube length (in this case, vocal tract length) was quantified using methods described by Story et al. (1996). Formants were obtained with a wave-reflection vocal tract model (Kelly and Lochbaum, 1962; Liljencrants, 1985) using area functions from the 3-D image data as input, and calculating its response to an impulse excitation.
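The seeded region growing step described at the start of this stage can be sketched as a breadth-first flood fill. This is an illustration of the general technique, not the VIDA implementation. The inclusion rule (gray value below a fixed threshold), the 6-connected neighborhood, and all names are assumptions.

```python
from collections import deque

# 6-connected neighborhood: one step along each axis.
NEIGHBORS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def region_grow(volume, seed, threshold):
    """Collect voxels connected to `seed` whose value is below `threshold`.

    `volume` is a nested list indexed as volume[z][y][x].
    Returns a set of (z, y, x) tuples belonging to the grown region.
    """
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    region = set()
    queue = deque([seed])
    while queue:
        z, y, x = queue.popleft()
        if (z, y, x) in region:
            continue
        if not (0 <= z < nz and 0 <= y < ny and 0 <= x < nx):
            continue
        if volume[z][y][x] >= threshold:
            continue  # tissue: stop growing in this direction
        region.add((z, y, x))
        for dz, dy, dx in NEIGHBORS:
            queue.append((z + dz, y + dy, x + dx))
    return region


# Toy 3x3x3 volume: tissue (100) with an air channel (0) along z.
volume = [[[100] * 3 for _ in range(3)] for _ in range(3)]
for z in range(3):
    volume[z][1][1] = 0
volume[0][0][0] = 0  # isolated air pocket, not connected to the seed

airway = region_grow(volume, seed=(0, 1, 1), threshold=50)

# Cross-sectional area per slice: voxel count times in-plane pixel area.
pixel_cm = 0.0410
areas_cm2 = [sum(1 for (z, _, _) in airway if z == k) * pixel_cm ** 2
             for k in range(3)]
print(sorted(airway), areas_cm2)
```

Only voxels reachable from the seed are labeled, so disconnected air pockets are excluded from the airway.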

III. RESULTS AND DISCUSSION

A. Quantitative area functions

The "raw" area functions measured from the volumetric image data for the eight phonatory conditions were discretized for use in speech simulation (Story et al., 1996; Tom, 1996). The process of discretization involved choosing the discretized vocal tract length (even multiples of vocal tract sections 0.396 825 cm in length) that best fit the measured vocal tract lengths, normalizing measured data to this length, fitting the data to a cubic spline curve, and sampling the cubic spline curve at equally spaced intervals of 0.396 825 cm. The discretized vocal tract length that best fit all eight phonatory conditions was 17.46 cm. Detailed numerical area functions for the eight phonatory conditions based on the discretized vocal tracts can be accessed at http://www.ncvs.org/rescol/articles/vocaltract.html. Measured vocal tract lengths for the eight phonatory conditions can also be accessed at this website. The change from soft to loud intensity (higher effort phonations) in the medium- and high-pitch falsetto phonations was consistently associated with an increase in measured vocal tract length, while the opposite pattern occurred for speech and low falsetto pitch conditions.
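The discretization steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code. It uses a natural cubic spline (the source does not specify the end conditions), samples at section midpoints, reads "even multiples" as whole-number multiples, and applies the steps to made-up data. All function names are hypothetical.

```python
import bisect

SECTION_CM = 0.396825  # section length given in the text


def sections_for_length(length_cm):
    """Whole number of sections whose total length best fits `length_cm`."""
    return round(length_cm / SECTION_CM)


def natural_spline(xs, ys):
    """Return a function evaluating the natural cubic spline through (xs, ys).

    Requires at least three strictly increasing knots.
    """
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    # Tridiagonal system for the interior second derivatives.
    diag = [2.0 * (h[k] + h[k + 1]) for k in range(n - 1)]
    rhs = [6.0 * ((ys[k + 2] - ys[k + 1]) / h[k + 1] - (ys[k + 1] - ys[k]) / h[k])
           for k in range(n - 1)]
    for k in range(1, n - 1):              # forward elimination
        w = h[k] / diag[k - 1]
        diag[k] -= w * h[k]
        rhs[k] -= w * rhs[k - 1]
    inner = [0.0] * (n - 1)
    for k in range(n - 2, -1, -1):         # back substitution
        upper = h[k + 1] * inner[k + 1] if k + 1 < n - 1 else 0.0
        inner[k] = (rhs[k] - upper) / diag[k]
    m = [0.0] + inner + [0.0]              # natural ends: zero curvature

    def evaluate(t):
        i = min(max(bisect.bisect_right(xs, t) - 1, 0), n - 1)
        a, b = xs[i + 1] - t, t - xs[i]
        return ((m[i] * a ** 3 + m[i + 1] * b ** 3) / (6.0 * h[i])
                + (ys[i] / h[i] - m[i] * h[i] / 6.0) * a
                + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * b)

    return evaluate


def discretize(distances_cm, areas_cm2, n_sections):
    """Normalize a raw area function to n_sections and resample it.

    `distances_cm` must start at 0 (the glottis) and increase strictly.
    """
    scale = n_sections * SECTION_CM / distances_cm[-1]
    spline = natural_spline([d * scale for d in distances_cm], areas_cm2)
    return [spline((i + 0.5) * SECTION_CM) for i in range(n_sections)]


# Made-up raw measurements for illustration only.
raw_x = [0.0, 2.0, 5.0, 9.0, 13.0, 17.5]
raw_area = [0.4, 3.0, 1.0, 2.5, 6.0, 1.2]
n = sections_for_length(raw_x[-1])
print(n, round(n * SECTION_CM, 2))
print([round(a, 2) for a in discretize(raw_x, raw_area, n)])
```

The section length itself equals c/(2 fs) for c = 350 m/s and fs = 44.1 kHz, a common choice for wave-reflection models. The source does not state this, so it is offered only as a plausible reading.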

Numerical area functions for the piriform sinuses used in acoustic modeling are listed in Table II. The main trend with regard to changes in the 3-D shape of the piriform sinuses from soft to loud intensity within each pitch condition was an increase in both length and cross-sectional areas. The measured lengths of the piriform sinuses are presented in Table III. Some slight left–right asymmetries in piriform sinus length occurred when the superior–inferior alignment of the piriform sinuses was not completely perpendicular to the transverse imaging plane of the EBCT scanner.

TABLE II. Piriform sinus area functions in square centimeters, at equal intervals of 0.396 825 cm, expressed as a single branch for acoustic modeling purposes. Section 1 represents the area function of the superior-most portion of the piriform sinuses. Piriform sinus branch length for all conditions was 1.59 cm (four sections at 0.396 825 cm/section).

Section No.   B-flat4 ff   B-flat4 mp   F4 ff   F4 mp   C4 ff   C4 mp   Speech loud   Speech comfortable
1             2.39         1.34         2.16    1.82    2.05    2.17    2.58          2.26
2             2.13         1.13         2.07    1.60    2.24    1.94    2.41          2.21
3             1.88         1.02         2.19    1.50    2.28    1.36    2.60          1.95
4             1.26         0.83         1.83    1.04    1.74    0.11    1.29          0.83

B. Comparing changes in area functions due to variations in register, pitch, and loudness

Changes in 3-D vocal tract configuration as a function of changes in vocal register, pitch, and loudness can be assessed by comparing differences in area functions along the length of the vocal tract. The area functions associated with the vocal tract shape for the /ɑ/ prolongation in the comfortable speech condition are compared to those of the other phonatory conditions in Fig. 1. Beginning at the origin, measured units on the x axis represent distance in centimeters above the glottis. On the y axis, measured units represent cross-sectional area in square centimeters.

A characteristic common to the vocal tract shapes across all eight phonatory conditions was a widening of the vocal tract airway above the glottis that begins about 2 cm past the glottis, expands to its widest point at about 4 cm past the glottis, and begins to narrow again at about 5 cm past the glottis. For the most part, this is a consequence of the changes in cross-sectional area that occur as the piriform sinuses converge with the main vocal tract tube. This finding concurs with Story et al. (1996), who found that the location of this widening was consistent across all vowels. They suggested that this location's uniformity served to point out the consistency of the image analysis procedures in terms of defining the glottal termination. In the current study, the extent of this supraglottal widening varied with phonatory conditions. The absolute cross-sectional area at its widest point was greater (in some cases, significantly so) than those found by Story et al. (1996) or in previous research, to which they compared their findings (Baer et al., 1991; Fant, 1960; Yang and Kasuya, 1994). The widest cross-sectional areas of the supraglottal widening for the /ɑ/ vowel in these studies were approximately 1.1 cm² (Story et al., 1996), 2.2 and 2.9 cm² (Baer et al., 1991), 4.1 cm² (Fant, 1960), and 1.9 cm² (Yang and Kasuya, 1994). In the current study this measure ranged from approximately 3.8 cm² for the loud speech condition in chest register to 6.2 cm² for the falsetto register, high-pitched, very loud condition. For the speech condition and low-pitch falsetto condition, the supraglottal widening reduced slightly in area as intensity increased. For the medium- and high-pitch falsetto conditions, the supraglottal widening increased in dimension.

TABLE III. Piriform sinus lengths associated with variations in vocal register, pitch, and loudness levels.

Condition            Left piriform sinus (cm)   Right piriform sinus (cm)
B-flat4 ff           1.76                       1.52
B-flat4 mp           1.68                       1.68
F4 ff                1.76                       1.76
F4 mp                1.60                       1.60
C4 ff                1.88                       1.88
C4 mp                1.15                       1.11
Speech loud          1.48                       1.48
Speech comfortable   1.19                       1.19

FIG. 1. Comparisons of the area functions for chest register comfortable speech (bold line) and (a) chest register loud speech (narrow line); (b) falsetto register, low pitch (C4, 262 Hz), moderately soft (dashed line) and very loud (narrow line); (c) falsetto register, medium pitch (F4, 349 Hz), moderately soft (dashed line) and very loud (narrow line); and (d) falsetto register, high pitch (B-flat4, 466 Hz), moderately soft (dashed line) and very loud (narrow line).

In Fig. 1(a), a comparison of comfortable (bold line) and very loud speech (narrow line) in chest register, the vocal tract gesture accompanying the increase in vocal intensity was an overall increase in oral cavity volume that begins approximately 7 cm from the glottis, just anterior to the vowel constriction in the oropharynx. At its widest point in the oral cavity, the cross-sectional area almost doubled in the very loud speech condition (from about 6 to 10 cm²) and the mouth opening increased significantly (from about 1 to 6 cm²). This occurred concurrently with a slight reduction in dimension of the supraglottal widening that occurred about 3.5 cm past the glottis.

The dimensions of the oral cavity also increased significantly in changing from comfortable speech to either soft/loud low-pitch falsetto phonations [Fig. 1(b)]. Within low-pitch falsetto, increased intensity was associated with an increase in the size of the oral cavity and a simultaneous decrease in the size of the supraglottal widening, similar to the pattern of changes in vocal tract configuration with increased loudness for the speech (chest register) conditions. For falsetto phonations sung at the medium- and high-pitch levels [Figs. 1(c) and (d)], the oral cavity is again significantly larger than for comfortable speech. The changes in vocal tract shape from moderately soft (dashed line) to very loud intensity levels (narrow line) at medium and high pitch in falsetto, however, reversed the pattern found in the transition from lower to higher intensity for the speech and low-pitch falsetto conditions. Rather than increasing the volume of the oral cavity, the subject reduced the oral cavity volume, while simultaneously increasing the volume of the supraglottal widening in the lower pharynx. This anterior/posterior shift in relative volumes or cross-sectional areas may be a strategy to balance the need for maintaining approximate vowel quality (relatively stable F1 and F2) while simultaneously preserving vocal timbre (stable F3 and F4).

The greatest contrasts in vocal tract configuration due to register, pitch, and loudness were observed in the area functions for comfortable speech phonation in chest register and for high-pitched very loud sung phonation in falsetto register [Fig. 1(d)]. In comfortable speech phonation in chest register (bold line), the cross-sectional area peaked at 3.7 cm² at the widest point of the supraglottal widening, decreased to 1 cm² or less for the vowel constriction, increased to 5.8 cm² at the widest point in the oral cavity, and then gradually reduced to about 1 cm² at the mouth termination. For high-pitched very loud sung phonation in falsetto register (narrow line), the cross-sectional area peaked at 6.4 cm² (a 73% increase over moderately soft) at the widest point of the supraglottal widening, decreased to less than 1.5 cm² (a 50% increase) for the vowel constriction, increased to 8.3 cm² (a 43% increase) at the widest point in the oral cavity, then closed down to an area of approximately 5 cm² (a 50% increase) at the mouth termination.

C. Formant structure

The first four formants for each of the eight vocal tract shapes are summarized in Table IV. The change from soft to loud intensity in chest register speech phonations resulted in an increase in all formant frequencies, and was associated with increases in mouth opening (Pickett, 1999). As such, the subject followed the previously reported tendency to raise F1 when increasing vocal intensity (Huber et al., 1999; Sundberg et al., 1993): the F1 increased from 543 to 682 Hz, the values listed in Table IV for comfortable and loud speech.

Changes from comfortable speech in chest register to sung phonations in falsetto followed a similar pattern: increases in mouth opening yielded higher formant frequencies overall. For the high-pitched, very loud sung falsetto tone, F1 (601 Hz) and F2 (1230 Hz) values approached a neutral vowel (Pickett, 1999), not unexpected for a vocal tract configuration with such a large supraglottal widening, oral cavity volume, and mouth opening.

In general, the subject’s vowel formants tended towathose of a phonetically more neutral vowel, when compato norms for spoken /Ä/, especially in the sung phonationF1 values for the subject ranged from 543 to 682 Hz, aF2 ranged from 939 to 1230 Hz. Although the vowel fomants (F1,F2) produced by the subject were below reportmeans for the spoken /Ä/ vowel, they fall within the range ofvalues produced by speakers in normative studies for spo/Ä/, /Å/, or /#/ ~Hillenbrandet al., 1995; Peterson and Barne

TABLE IV. Formant frequencies associated with variations in vocal register, pitch, and loudness levels.

Pitch and loudness               F1 (Hz)   F2 (Hz)   F3 (Hz)   F4 (Hz)

Falsetto register
  B-flat4 ff                       601      1230      2751      3553
  B-flat4 mp                       612      1139      2735      3611
  F4 ff                            592      1030      2858      3728
  F4 mp                            599      1102      2764      3643
  C4 ff                            604       939      2729      3833
  C4 mp                            582      1062      2738      3850

Chest register
  Speech loud (D3)                 682      1058      2740      3851
  Speech comfortable (B-flat2)     543       993      2585      3747

746 J. Acoust. Soc. Am., Vol. 109, No. 2, February 2001


1952). This may be part of the subject’s vocal technique to retain a perceptually darker vowel quality and warmer vocal timbre, even as effort is increased, by using a phonetically more rounded and/or centralized allophone (Titze, 1994). The tendency for untrained speakers/singers is to raise the larynx with increased effort, and to thereby shorten the vocal tract, increase formant frequencies, and produce a more strident vocal timbre.
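The F1 and F2 ranges reported for the subject can be read directly off Table IV. The short Python sketch below transcribes the table and recomputes the ranges. The condition labels and the helper name `formant_range` are illustrative only.

```python
# Table IV: (condition, F1, F2, F3, F4), all formant values in Hz.
TABLE_IV = [
    ("falsetto B-flat4 ff", 601, 1230, 2751, 3553),
    ("falsetto B-flat4 mp", 612, 1139, 2735, 3611),
    ("falsetto F4 ff", 592, 1030, 2858, 3728),
    ("falsetto F4 mp", 599, 1102, 2764, 3643),
    ("falsetto C4 ff", 604, 939, 2729, 3833),
    ("falsetto C4 mp", 582, 1062, 2738, 3850),
    ("chest speech loud (D3)", 682, 1058, 2740, 3851),
    ("chest speech comfortable (B-flat2)", 543, 993, 2585, 3747),
]

def formant_range(rows, index):
    """Return (min, max) of formant number `index` (1 to 4) over all rows."""
    values = [row[index] for row in rows]
    return min(values), max(values)

f1_range = formant_range(TABLE_IV, 1)
f2_range = formant_range(TABLE_IV, 2)
print("F1 range (Hz):", f1_range)
print("F2 range (Hz):", f2_range)
```

Both extremes of F1 come from the chest register speech conditions, while both extremes of F2 come from falsetto conditions.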

A singer’s formant in the 2800–3000-Hz region was not observed in any of the phonatory conditions. This finding was not unexpected, since the phenomenon is primarily associated with classically trained male singers phonating in chest register (Sundberg, 1987), and not with countertenor or falsetto-based singing.

IV. CONCLUSION

Volumetric imaging of the vocal tract using EBCT was used to document 3-D changes in vocal tract configuration during phonation, which occurred as a function of vocal register (chest or falsetto), pitch (low, medium, and high in falsetto register, and speech), and loudness (soft versus very loud) for a single male subject. The 3-D images are of higher resolution and corresponding measures of greater accuracy than those acquired in previous studies using MRI techniques. EBCT techniques (using 3 mm slices) allowed for greater image resolution (0.410 mm in the plane of imaging), reduced image acquisition time, and, thus, minimization of movement artifacts.
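To give a sense of scale for the stated resolution, the sketch below computes the volume of one voxel and converts an in-plane pixel count to an area. It assumes square pixels of 0.410 mm and a 3 mm slice, as quoted above. The pixel count of 2200 is a made-up example, and this is not necessarily how the study derived its cross-sectional areas.

```python
IN_PLANE_MM = 0.410  # pixel edge length in the imaging plane (from the text)
SLICE_MM = 3.0       # slice thickness (from the text)

def voxel_volume_mm3(pixel_mm=IN_PLANE_MM, slice_mm=SLICE_MM):
    """Volume of one voxel in mm^3, assuming square in-plane pixels."""
    return pixel_mm * pixel_mm * slice_mm

def area_cm2(pixel_count, pixel_mm=IN_PLANE_MM):
    """Area in cm^2 covered by `pixel_count` in-plane pixels."""
    return pixel_count * pixel_mm ** 2 / 100.0  # 100 mm^2 per cm^2

volume = voxel_volume_mm3()
example_area = area_cm2(2200)  # hypothetical pixel count for one airway slice
print(f"voxel volume: {volume:.4f} mm^3")
print(f"2200 pixels: {example_area:.2f} cm^2")
```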

Although the results of the entire study are from one subject, a trained countertenor, the data may not be atypical of other male singers or speakers who produce falsetto similarly. For the population of singers in the classical, commercial, and world music arenas, and theatrical and voice-over artists who utilize falsetto phonation more often and more intensively than many speakers, additional research toward a comprehensive characterization of the physiology of falsetto phonation is vital. The current data, elicited from a subject who has developed an optimal technique for vocalizing in the falsetto register across a range of frequencies and intensity levels, provide insight into a falsetto type that has been shown to be reliable and associated with low risk of injury to the tissues of the vocal folds.

ACKNOWLEDGMENTS

The authors would like to express their gratitude to Dr. William Stanford, Dr. Brad Thompson, and the members of the Division of Physiologic Imaging, Department of Radiology at the University of Iowa College of Medicine for their contributions to this study. This research was supported in part by research grant No. P60 DC 00976 from the National Institute on Deafness and Other Communication Disorders, National Institutes of Health.

Alwan, A., Narayanan, S., and Haker, K. (1997). "Toward articulatory-acoustic models for liquid approximants based on MRI and EPG data. Part II. The rhotics," J. Acoust. Soc. Am. 101, 1078–1089.

Baer, T., Gore, J. C., Gracco, L. C., and Nye, P. W. (1991). "Analysis of vocal tract shape and dimensions using magnetic resonance imaging: Vowels," J. Acoust. Soc. Am. 90, 799–828.


Boyd, D. P., and Lipton, J. J. (1983). "Cardiac computed tomography," Proc. IEEE 71, 298–307.

Colton, R. H., and Hollien, H. (1972). "Phonational range in the modal and falsetto registers," J. Speech Hear. Res. 15, 708–713.

Fant, G. (1960). The Acoustic Theory of Speech Production (Mouton, The Hague).

Giles, P. (1982). The Countertenor (Muller, London).

Hillenbrand, J., Getty, L. A., Clark, M. J., and Wheeler, K. (1995). "Acoustic characteristics of American English vowels," J. Acoust. Soc. Am. 97, 3099–3111.

Hoffman, E. A., and Gefter, W. B. (1990). "Multimodality imaging of the upper airway: MRI, MR spectroscopy, and ultrafast x-ray CT," in Sleep and Respiration, edited by F. G. Issa, P. M. Suratt, and J. E. Remmers (Wiley-Liss, New York), pp. 291–301.

Hoffman, E. A., Sinak, L. J., Robb, R. A., and Ritman, E. L. (1983). "Noninvasive quantitative imaging of shape and volume of lungs," Am. Physiol. Soc. 1414–1421.

Hoffman, E. A., Gnanaprakasam, D., Gupta, K. B., Hoford, J. D., Kugelmass, S. D., and Kulawiec, R. S. (1992). "VIDA: an environment for multidimensional image display and analysis," SPIE Proc. Biomed. Image Proc. and 3-D Microscopy, Vol. 1660, San Jose, CA, 10–13 February.

Hollien, H. (1974). "On vocal registers," J. Phonetics 2, 125–143.

Hollien, H. (1977). "The registers and ranges of the voice," in Approaches to Vocal Rehabilitation, edited by M. Cooper and H. C. Cooper (Thomas, Springfield, IL), pp. 76–121.

Huber, J. E., Stathopoulos, E. T., Curione, G. M., Ash, T. A., and Johnson, K. (1999). "Formants of children, women and men: The effects of vocal intensity variation," J. Acoust. Soc. Am. 106, 1532–1542.

International Commission on Radiological Protection. (1977). Recommendations of the International Commission on Radiological Protection, ICRP Publication 26 (Pergamon, Oxford).

Kelly, J. L., and Lochbaum, C. C. (1962). "Speech synthesis," in Proceedings 4th International Congress on Acoustics, paper 642, pp. 1–4.

Laver, J. (1980). The Phonetic Description of Voice Quality (Cambridge U.P., New York).

Lienard, J. S., and Di Benedetto, M. G. (1999). "Effect of vocal effort on spectral properties of vowels," J. Acoust. Soc. Am. 106, 411–422.

Liljencrants, J. (1985). "Speech Synthesis with a Reflection-Type Line Analog," DS dissertation, Department of Speech Communication and Music Acoustics, Royal Institute of Tech., Stockholm, Sweden.

Malm, W. P. (1967). Music Cultures of the Pacific, the Near East and Asia (Prentice–Hall, Englewood Cliffs, NJ).

Markel, J. D., and Gray, A. H. (1976). Linear Prediction of Speech (Springer-Verlag, New York).

Moore, C. A. (1992). "The correspondence of vocal tract resonance with volumes obtained from magnetic resonance images," J. Speech Hear. Res. 35, 1009–1023.


Narayanan, S. S., Alwan, A. A., and Haker, K. (1997). "Toward articulatory-acoustic models for liquid approximants based on MRI and EPG data. Part I. The laterals," J. Acoust. Soc. Am. 101, 1064–1077.

National Council on Radiation Protection and Measurements. (1987). Recommendations on Limits for Exposure to Ionizing Radiation, NCRP Report No. 91 (National Council on Radiation Protection and Measurements, Bethesda, MD).

Peterson, G. E., and Barney, H. L. (1952). "Control methods used in a study of vowels," J. Acoust. Soc. Am. 24, 175–184.

Pickett, J. M. (1999). The Acoustics of Speech Production (Allyn and Bacon, Boston).

Raya, S. P., and Udupa, J. K. (1990). "Shape-based interpolation of multidimensional objects," IEEE Trans. Med. Imaging 9, 32–42.

Story, B. H. (1995). "Physiologically-based speech simulation using an enhanced wave-reflection model of the vocal tract," Ph.D. dissertation, University of Iowa.

Story, B. H., Titze, I. R., and Hoffman, E. A. (1996). "Vocal tract area functions from magnetic resonance imaging," J. Acoust. Soc. Am. 100, 537–554.

Sulter, A. M., Miller, D. G., Wolf, R. F., Schutte, H. K., Wit, H. P., and Mooyaart, E. L. (1992). "On the relation between the dimensions and resonance characteristics of the vocal tract: a study with MRI," Magn. Reson. Imaging 10, 365–373.

Sundberg, J. (1987). The Science of the Singing Voice (Northern Illinois U.P., DeKalb, IL).

Sundberg, J., Titze, I. R., and Scherer, R. C. (1993). "Phonatory control in male singing: A study of the effects of subglottal pressure, fundamental frequency, and mode of phonation on the voice source," J. Voice 7, 15–29.

Titze, I. R. (1988). "A framework for the study of vocal registers," J. Voice 2, 183–194.

Titze, I. R. (1994). Principles of voice production (Prentice–Hall, Englewood Cliffs, NJ).

Titze, I. R., Mapes, S., and Story, B. (1994). "Acoustics of the tenor high voice," J. Acoust. Soc. Am. 95, 1133–1142.

Tom, K. (1996). "Intensity control in male falsetto phonation: An analysis by synthesis approach," Ph.D. dissertation, University of Iowa.

Udupa, J. K. (1991). "Computer aspects of 3-D imaging in medicine: A tutorial," in 3D Imaging in Medicine, edited by J. K. Udupa and G. T. Herman (CRC, Boca Raton, FL).

Welch, G. F., Sergeant, D. C., and MacCurtain, F. (1988). "Some physical characteristics of the male falsetto voice," J. Voice 2, 151–163.

Yang, C-S., and Kasuya, H. (1994). "Accurate measurement of vocal tract shapes from magnetic resonance images of child, female, and male subjects," Proc. ICSLP 94, 623–626, Yokohama, Japan.
