Haskins Laboratories Status Report on Speech Research 1990, SR-101/102, 194-219

"Targetless" Schwa: An Articulatory Analysis*

Catherine P. Browman and Louis Goldstein†

Our thanks to Carol Fowler and Doug Whalen for critiquing the preliminary version of this paper. This work was supported by NSF grant BNS 8820099 and NIH grants HD-01994 and NS-13617 to Haskins Laboratories.

1. INTRODUCTION

One of the major goals for a theory of phonetic and phonological structure is to be able to account for the (apparent) contextual variation of phonological units in as general and simple a way as possible. While it is always possible to state some pattern of variation using a special "low-level" rule that changes the specification of some unit, recent approaches have attempted to avoid stipulating such rules, and instead propose that variation is often the consequence of how the phonological units, properly defined, are organized. Two types of organization have been suggested that lead to the natural emergence of certain types of variation. One is that invariantly specified phonetic units may overlap in time, i.e., they may be co-produced (e.g., Bell-Berti & Harris, 1981; Browman & Goldstein, 1987; Fowler, 1977; 1981a,b; Liberman & Mattingly, 1985), so that the overall tract shape and acoustic consequences of these co-produced units will reflect their combined influence. A second is that a given phonetic unit may be unspecified for some dimension(s) (e.g., Keating, 1988; Ohman, 1966), so that the apparent variation along that dimension is due to continuous trajectories between neighboring units' specifications for that dimension.

A particularly interesting case of contextual variation involves reduced (schwa) vowels in English. Investigations have shown that these vowels are particularly malleable: they take on the acoustic (Fowler, 1981a) and articulatory (e.g., Alfonso & Baer, 1982) properties of neighboring vowels. While Fowler (1981a) has analyzed this variation as emerging from the co-production of the reduced vowels and a neighboring stressed vowel, it might also be the case that schwa is completely unspecified for tongue position. This would be consistent with analyses of formant trajectories for medial schwa in trisyllabic sequences (Magen, 1989) that have shown that F2 moves (roughly continuously) from a value dominated by the preceding vowel (at onset) to one dominated by the following vowel (at offset). Such an analysis would also be consistent with the phonological analysis of schwa in French (Anderson, 1982) as an empty nucleus slot. It is possible (although this is not Anderson's analysis) that the empty nucleus is never filled in by any specification, but rather that there is a specified "interval" of time between two full vowels in which the tongue continuously moves from one vowel to another.

The computational gestural model being developed at Haskins Laboratories (e.g., Browman & Goldstein, 1987; Browman, Goldstein, Saltzman, & Smith, 1986; Saltzman, Goldstein, Browman, & Rubin, 1988) can serve as a useful vehicle for testing these (and other) hypotheses about the phonetic/phonological structure of utterances with such reduced schwa vowels. As we will see, it is possible to provide a simple, abstract representation of such utterances in terms of gestures and their organization that can yield the variable patterns of articulatory behavior and acoustic consequences that are observed in these utterances.

The basic phonetic/phonological unit within our model is the gesture, which involves the formation (and release) of a linguistically significant constriction within a particular vocal tract subsystem. Each gesture is modeled as a dynamical system (or set of systems) that regulates the time-varying coordination of individual articulators in performing these constriction tasks (Saltzman, 1986). The dimensions along which the vocal tract goals for constrictions can be specified are called tract variables, and are shown in the left-hand column of Figure 1. Oral constriction gestures are defined in terms of pairs of these tract variables, one for constriction location, one for constriction degree.


tract variable                              articulators involved

LP    lip protrusion                        upper & lower lips, jaw
LA    lip aperture                          upper & lower lips, jaw

TTCL  tongue tip constrict location         tongue tip, tongue body, jaw
TTCD  tongue tip constrict degree           tongue tip, tongue body, jaw

TBCL  tongue body constrict location        tongue body, jaw
TBCD  tongue body constrict degree          tongue body, jaw

VEL   velic aperture                        velum

[Vocal tract diagram labelling the velum, tongue body center, upper lip, lower lip, and jaw.]

Figure 1. Tract variables and associated articulators.

The right-hand side of the figure shows the individual articulatory degrees of freedom whose motions contribute to the corresponding tract variable.

The computational system we have developed, sketched in Figure 2 (Browman & Goldstein, 1987; Saltzman, Goldstein, Browman, & Rubin, 1988), provides a representation for arbitrary (English) input utterances in terms of such gestural units and their organization over time, called the gestural score. The layout of the gestural score is based on the principles of intergestural phasing (Browman & Goldstein, 1987) specified in the linguistic gestural model. The gestural score is input to the task dynamic model (Saltzman, 1986; Saltzman & Kelso, 1987), which calculates the patterns of articulator motion that result from the set of active gestural units. The articulatory movements produced by the task dynamic model are then input to an articulatory synthesizer (Rubin, Baer, & Mermelstein, 1981) to calculate an output speech waveform. The operation of the task dynamic model is assumed to be "universal." (In fact, it is not even specific to speech, having originally been developed (Saltzman & Kelso, 1987) to describe coordinated reaching movements.) Thus, all of the language-particular phonetic/phonological structure must reside in the gestural score: in the dynamic parameter values of individual gestures, or in their relative timing. Given this constraint, it is possible to test the adequacy of some particular hypothesis about phonetic structure, as embodied in a particular gestural score, by using the model to generate the articulatory motions and comparing these to observed articulatory data.

[Block diagram: intended utterance, linguistic gestural model, task dynamic model, articulatory synthesizer, output speech.]

Figure 2. Overview of GEST: gestural computational model.


The computational model can thus be seen as a tool for evaluating the articulatory (and acoustic) consequences of hypothesized aspects of gestural structure. In particular, it is well suited for evaluating the consequences of the organizational properties discussed above: (1) underspecification and (2) temporal overlap. Gestural structures are inherently underspecified in the sense that there are intervals of time during which the value of a given tract variable is not being controlled by the system: only when a gesture defined along that tract variable is active is such control in place. This underspecification can be seen in Figure 3, which shows the gestural score for the utterance /paam/ (ARPAbet is used throughout the text for phonetic transcription).¹ Here, the shaded boxes indicate the gestures, and are superimposed on the tract variable time functions produced when the gestural score is input to the task dynamic model. The horizontal dimension of the shaded boxes indicates the intervals of time during which each of the gestural units is active, while the height of the boxes corresponds to the "target" or equilibrium position parameter of the second-order dynamical regime.

Note that during the activation interval of the initial bilabial closure gesture, Lip Aperture (vertical distance between the two lips) gradually decreases, until it approaches the regime's target. However, even after the regime is turned off, LA shows changes over time. Such "passive" tract variable changes result from two sources: (1) the participation of one of the (uncontrolled) tract variable's articulators in some other tract variable which is under active gestural control, and (2) an articulator-specific "neutral" or "rest" regime that takes control of any articulator which is not currently active in any gesture. For example, in the LA case shown here, the jaw contributes to the TBCD gesture (for the vowel) by lowering, and this has the side effect of increasing LA. In addition, the upper and lower lips are not involved in any active gesture, and so move toward their neutral positions with respect to the upper and lower teeth, thus further contributing to an increase in LA. Thus, the geometric structure of the model itself (together with the set of articulator neutral values) predicts a specific, well-behaved time function for a given tract variable, even when it is not being controlled. Uncontrolled behavior need not be stipulated in any way. This feature of the model is important to being able to test the hypothesis that schwa may not involve an active gesture at all.

[Gestural score panels for the input string /'paam/: velic aperture, tongue body constriction degree, lip aperture, and glottal aperture, plotted against time (ms), 100-400.]

Figure 3. Gestural score and generated motion variables for /paam/.


The second useful aspect of the model is the ability to predict consequences of the temporal overlap of gestures, i.e., intervals during which there is more than one concurrently active gesture. Browman and Goldstein (1987) have shown that the model predicts different consequences of temporal overlap, depending on whether the overlapping gestures involve the same or different tract variables, and that these different consequences can actually be observed in allophonic variations and "casual speech" alternations. Of particular importance to analyzing schwa is the shared tract variable case, since we will be interested in the effects of overlap between an active schwa gesture (if any) and the preceding or following vowel gesture, all of which would involve the Tongue Body tract variables (TBCD, TBCL). In this case, the dynamic parameter values for the overlapping gestures are "blended," according to a competitive blending dynamics (Saltzman, Goldstein, Browman, & Rubin, 1988; Saltzman & Munhall, 1989). In the examples we will be examining, the blending will have the effect of averaging the parameter values. Thus, if both gestures were coextensive for their entire activation intervals, neither target value would be achieved; rather, the value of the tract variable at the end would be the average of their targets.
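The averaging behavior just described can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the task dynamic model itself: it treats a single tract variable as a unit-mass, critically damped second-order system, integrates it with a simple Euler scheme, and uses hypothetical stiffness, target, and timing values. While several gestures are active on the variable, their targets are averaged.

```python
import math

def simulate(gestures, x0, duration, dt=0.001, stiffness=400.0):
    """Integrate one tract variable under second-order gestural control.

    gestures: list of (onset, offset, target) tuples; all values hypothetical.
    Overlapping gestures are blended by averaging their targets. When no
    gesture is active, only damping acts (a stand-in assumption, much simpler
    than the neutral regime described in the text).
    """
    damping = 2.0 * math.sqrt(stiffness)  # critical damping for unit mass
    x, v = x0, 0.0
    for i in range(int(round(duration / dt))):
        t = i * dt
        active = [target for onset, offset, target in gestures
                  if onset <= t < offset]
        if active:
            goal = sum(active) / len(active)  # blended (averaged) target
            accel = -stiffness * (x - goal) - damping * v
        else:
            accel = -damping * v
        v += accel * dt
        x += v * dt
    return x

# Two fully coextensive gestures with targets 10 and 20 (hypothetical units).
blended = simulate([(0.0, 1.0, 10.0), (0.0, 1.0, 20.0)], x0=0.0, duration=1.0)
# A single gesture with target 10, for comparison.
single = simulate([(0.0, 1.0, 10.0)], x0=0.0, duration=1.0)
```

With these values the single gesture settles at its own target, while the coextensive pair settles midway between the two targets, so neither target is reached.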

In this paper, our strategy is to analyze movements of the tongue in utterances with schwa to determine if the patterns observed provide evidence for a specific schwa tongue target. Based on this analysis, specific hypotheses about the gestural overlap in utterances with schwa are then tested by means of computer simulations using the gestural model described above.

2. ANALYSIS OF ARTICULATORY DATA

Using data from the Tokyo X-ray archive (Miller & Fujimura, 1982), we analyzed /pV1pax'pV2pax/ utterances produced by a speaker of American English, where V1 and V2 were all possible combinations of /iy, eh, aa, ah, uw/. Utterances were read in short lists of seven or eight items, each of which had the same V1 and different V2s. One token (never the initial or final item in a list) of each of the twenty-five utterance types was analyzed. The microbeam data track the motion of five pellets in the mid-sagittal plane. Pellets were located on the lower lip (L), the lower incisor for jaw movement (J), and the midline of the tongue: one approximately at the tongue blade (B), one at the middle of the tongue dorsum (M), and one at the rear of the tongue dorsum (R).

Ideally, we would use the information in tongue pellet trajectories to infer a time-varying representation of the tongue in terms of the dimensions in which vowel gesture targets are defined, e.g., for our model (or for Wood, 1982), location and degree of tongue body constriction. (For Ladefoged & Lindau, 1989, the specifications would be rather in terms of formant frequencies linked to the factors of front-raising and back-raising for the tongue.) Since this kind of transformation cannot currently be performed with confidence, we decided to describe the vowels directly in terms of the tongue pellet positions. As the tongue blade pellet (B) was observed to be largely redundant in these utterances (not surprising, since they involve only vowels and bilabial consonants), we chose to measure the horizontal (X) and vertical (Y) positions (in Euclidean space) for the M and R pellets for each vowel. While not ideal in one sense, the procedure has the advantage that the analysis does not make any a priori assumptions about the correct parameterization of tongue shape.

The first step was to find appropriate time points at which to measure the position of the pellets for each vowel. The time course of each tongue pellet dimension (MX, MY, RX, RY) was analyzed by means of an algorithm that detected displacement extrema (peaks and valleys). To the extent that there is a characteristic pellet value associated with a given vowel, we may expect to see such a displacement extremum, that is, movement towards some value, then away again. The algorithm employed a noise level of one X-ray grid unit (= approximately .33 mm); thus, movements of a single unit in one direction and back again did not constitute extrema. Only the interval that included the full vowels and the medial schwa was analyzed; final schwas were not analyzed. In general, an extremum was found that coincided with each full vowel, for each pellet dimension, while such an extremum was missing for schwa in over half the cases. The pellet positions at these extrema were used as the basic measurements for each vowel. In cases where a particular pellet dimension had no extremum associated with a vowel, a reference point was chosen that corresponded to the time of an extremum of one of the other pellets. In general, MY was the source of these reference points for full vowels, and RY was the source for schwa, as these were dimensions that showed the fewest missing extrema. After the application of this algorithm, each vowel in each utterance was categorized by the value at a single reference point for each of the four pellet dimensions. Since points were chosen by looking only at data from the tongue pellets themselves, these are referred to as the "tongue" reference points.
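The text does not give the extremum-detection algorithm itself, so the following is only one plausible reading of it, written as a minimal sketch. It assumes a uniformly sampled trace for one pellet dimension and reports a peak or valley only after the trace has moved away from it by more than the noise level, so that a one-unit excursion and return is ignored. The function name and the sample trace are hypothetical.

```python
def find_extrema(values, noise=1.0):
    """Return [(index, 'peak' or 'valley'), ...] for one pellet trace.

    A turning point is reported only once the trace has departed from it by
    more than `noise` units. The first sample is never reported, since it is
    not known whether the trace was moving toward it.
    """
    extrema = []
    if not values:
        return extrema
    hi_i = lo_i = 0   # indices of the running maximum and minimum
    direction = 0     # +1 rising, -1 falling, 0 not yet determined
    for i, y in enumerate(values):
        if y > values[hi_i]:
            hi_i = i
        if y < values[lo_i]:
            lo_i = i
        if direction >= 0 and values[hi_i] - y > noise:
            if direction == 1:
                extrema.append((hi_i, "peak"))
            direction, lo_i = -1, i   # now falling; restart the minimum
        elif direction <= 0 and y - values[lo_i] > noise:
            if direction == -1:
                extrema.append((lo_i, "valley"))
            direction, hi_i = 1, i    # now rising; restart the maximum
    return extrema

# Hypothetical trace in X-ray grid units with two one-unit wiggles.
trace = [0, 2, 4, 5, 4, 5, 4, 2, 0, 1, 0, 3, 6]
found = find_extrema(trace)
```

In this trace the one-unit dips and bumps (5, 4, 5 and 0, 1, 0) are not treated as extrema; only the larger turning points are returned.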

To illustrate this procedure, Figure 4a shows the time courses of the M, R, and L pellets (only vertical for L) for the utterance /piypax'piypax/ with the extrema marked with dashed lines. The acoustic waveform is displayed at the top. For orientation, note that there are four displacement peaks marked for LY, corresponding to the raising of the lower lip for the four bilabial closure gestures for the consonants. Between these peaks three valleys are marked, corresponding to the opening of the lips for the three vowels. For MX, MY, and RX, an extremum was found associated with each of the full vowels and the medial schwa. For RY, a peak was found for schwa, but not for V1. While there is a valley detected following the peak for schwa, it occurs during the consonant closure interval, and therefore is not treated as associated with V2. Figure 4b shows the same utterance with the complete set of "tongue" reference points used to characterize each vowel. Reference points that have been copied from other pellets (MY in both cases) are shown as solid lines. Note that the consonant closure interval extremum has been deleted.

Figure 5 shows the same displays for the utterance /piypax'paapax/. Note that, in (a), extrema are missing for schwa for MX, MY and RX. This is typical of cases in which there is a large pellet displacement between V1 and V2. The trajectory associated with such a displacement moves from V1 to V2, with no intervening extremum (or even, in some cases, no "flattening" of the curve).

As can be seen in Figures 4 and 5, the reference points during the schwa tend to be relatively late in its acoustic duration. As we will be evaluating the relative contributions of V1 and V2 in determining the pellet positions for schwa, we decided to also use a reference point earlier in the schwa. To obtain such a point, we used the valley associated with the lower lip for the schwa, that is, approximately the point at which the lip opening is maximal. This point, called the "lip" reference, typically occurs earlier in the (acoustic) vowel duration than the "tongue" reference point, as can be seen in Figures 4 and 5. Another advantage of the "lip" reference point is that all tongue pellets are measured at the same moment in time. Choosing points at different times for different dimensions might result in an apparent differential influence of V1 and V2 across dimensions. The "lip" reference point was used only for the medial schwa; the full vowel references used for comparison with the "lip" schwa point were the same as those used for comparison with the "tongue" schwa point, and were determined using the tongue extremum algorithm described above.

2.1 Results

Figure 6 shows the positions of the M (on the right) and R (on the left) pellets for the full vowels, plotted in the mid-sagittal plane such that the speaker is assumed to be facing to the right. The ten points for a given vowel are enclosed in an ellipse indicating their principal components (two standard deviations along each axis). The tongue shapes implied by these pellet positions are consistent with cinefluorographic data for English vowels (e.g., Harshman, Ladefoged, & Goldstein, 1977; Neary, 1980; Perkell, 1969). For example, /iy/ is known to involve a shape in which the front of the tongue is bunched forward and up toward the hard palate, compared, for example, to /eh/, which has a relatively unconstricted shape. This fronting can be seen in both pellets. In fact, over all vowels, the horizontal components of the motion of the two pellets are highly correlated (r = .939 in the full vowel data, between RX and MX over the 25 utterances). The raising for /iy/ can be seen in M (on the right), but not in R, for which /iy/ is low, lower for example than /aa/. The low position of the back of the tongue dorsum for /iy/ can, in fact, be seen in mid-sagittal cinefluorographic data. Superimposed tongue surfaces for different English vowels (e.g., Ladefoged, 1982) reveal that the curves for /iy/ and /aa/ cross somewhere in the upper pharyngeal region, so that in front of this point, /iy/ is higher than /aa/, while in back of this point, /aa/ is higher. This suggests that the R pellet in the current experiment is far enough back to be behind this cross-over point. /uw/ involves raising of the rear of the tongue dorsum (toward the soft palate), which is here reflected in the raising of both the R and M pellets. In general, the vertical components of the two pellets are uncorrelated across the set of vowels as a whole (r = .020), reflecting, perhaps, the operation of two independent factors such as "front-raising" and "back-raising" (Ladefoged, 1980).
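The two summary statistics used above, the correlation between pellet dimensions and the principal-component ellipse around each vowel's points, can be sketched as follows. This is an illustrative sketch only: the function names and the sample points are hypothetical, and the sample (n - 1) covariance is an assumption, since the paper does not say which estimator was used.

```python
import math

def _moments(xs, ys):
    """Return means and sample (n - 1) variances/covariance of paired data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return mx, my, sxx, syy, sxy

def pearson_r(xs, ys):
    """Pearson correlation between two pellet dimensions (e.g., RX and MX)."""
    _, _, sxx, syy, sxy = _moments(xs, ys)
    return sxy / math.sqrt(sxx * syy)

def principal_ellipse(xs, ys, n_sd=2.0):
    """Return (center, angle, semi_major, semi_minor) of the n_sd ellipse.

    The axes are the eigenvectors of the 2x2 covariance matrix; each
    semi-axis is n_sd standard deviations along its principal component.
    """
    mx, my, sxx, syy, sxy = _moments(xs, ys)
    half_trace = (sxx + syy) / 2.0
    root = math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)
    lam_major = half_trace + root
    lam_minor = max(half_trace - root, 0.0)  # guard against rounding below 0
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)  # major-axis orientation
    return (mx, my), angle, n_sd * math.sqrt(lam_major), n_sd * math.sqrt(lam_minor)

# Hypothetical pellet positions lying exactly on a diagonal line.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 1.0, 2.0, 3.0, 4.0]
center, angle, semi_major, semi_minor = principal_ellipse(xs, ys)
```

For points that lie on a straight diagonal, the major axis is oriented at 45 degrees and the minor axis collapses to zero, which is a convenient sanity check on the computation.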


[Panels (a) and (b): acoustic waveform and time traces of tongue middle horizontal (MX), tongue middle vertical (MY), tongue rear horizontal (RX), tongue rear vertical (RY), and lower lip vertical (LY) positions, plotted against time (ms), 0-700.]

Figure 4. Pellet time traces for /piypax'piypax/ (IPA symbols are used in the figure). (a) Position extrema indicated by dashed lines. (b) "Tongue" reference points indicated by dashed and solid lines (for Middle and Rear pellets).

[Panels (a) and (b): acoustic waveform and time traces of tongue middle horizontal (MX), tongue middle vertical (MY), tongue rear horizontal (RX), tongue rear vertical (RY), and lower lip vertical (LY) positions, plotted against time (ms), 0-700.]

Figure 5. Pellet time traces for /piypax'paapax/ (IPA symbols are used in the figure). (a) Position extrema indicated by dashed lines. (b) "Tongue" reference points indicated by dashed and solid lines (for Middle and Rear pellets).

[Scatter plot of Mid and Rear pellet positions for the full vowels, with one ellipse per vowel.]

Figure 6. Pellet positions for full vowels, displayed in mid-sagittal plane with head facing to the right: Mid pellets on the right, Rear pellets on the left. The ellipses indicate two standard deviations along axes determined by principal component analysis. Symbols I = ARPAbet /iy/, U = /uw/, E = /eh/, X = /ah/, and A = /aa/. Units are X-ray units (= .33 mm).

The pellet positions for schwa, using the "tongue" reference points, are shown in the same mid-sagittal plane in Figure 7, with the full vowel ellipses added for reference. The points are labelled by the identity of the following vowel (V2) in (a) and by the preceding vowel (V1) in (b). Figure 8 shows the parallel figure for schwa measurements at the "lip" reference point. In both figures, note that the range of variation for schwa is less than the range of variation across the entire vowel space, but greater than the variation for any single full vowel. Variation in MY is particularly large compared to MY variation for any full vowel. Also, while the distribution of the R pellet positions appears to center around the value for unreduced /ah/, which might be thought to be a target for schwa, this is clearly not the case for the M pellet, where the schwa values seem to center around the region just above /eh/. For both pellets, the schwa values are found in the center of the region occupied by the full vowels. In fact, this relationship turns out to be quite precise.

[Panels (a) and (b): scatter plots of schwa pellet positions over the full-vowel ellipses; horizontal axis ticks 140-220, vertical axis ticks 220-300.]

Figure 7. Pellet positions for schwa at "tongue" reference points, displayed in right-facing mid-sagittal plane as in Figure 6. The ellipses are from the full vowels (Figure 6), for comparison. Symbols I = ARPAbet /iy/, U = /uw/, E = /eh/, X = /ah/, and A = /aa/. Units are X-ray units (= .33 mm). (a) Schwa pellet positions labelled by the identity of the following vowel (V2). (b) Schwa pellet positions labelled by the identity of the preceding vowel (V1).

[Panels (a) and (b): scatter plots of schwa pellet positions over the full-vowel ellipses; horizontal axis ticks 140-220, vertical axis ticks 220-300.]

Figure 8. Pellet positions for schwa at "lip" reference points, displayed as in Figure 7 (including ellipses from Figure 6). Symbols I = ARPAbet /iy/, U = /uw/, E = /eh/, X = /ah/, and A = /aa/. Units are X-ray units (= .33 mm). (a) Schwa pellet positions labelled by the identity of the following vowel (V2). (b) Schwa pellet positions labelled by the identity of the preceding vowel (V1).


Figure 9 shows the mean pellet positions for each full vowel and for schwa ("lip" and "tongue" reference points give the same overall means), as well as the grand mean of pellet positions across all full vowels, marked by a circle. The mean pellet positions for the schwa lie almost exactly on top of the grand mean for both the M and R pellets. This pattern of distribution of schwa points is exactly what would be expected if there were no independent target for schwa but rather a continuous tongue trajectory from V1 to V2. Given all possible combinations of trajectory endpoints (V1 and V2), we would expect the mean value of a point located at (roughly) the midpoint of these 25 trajectories to have the same value as the mean of the endpoints themselves.
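The expectation stated above can be checked numerically. The sketch below assumes straight-line trajectories between endpoint positions and uses made-up vowel positions for a single pellet dimension; neither is taken from the data. Over all 25 ordered V1-V2 pairs, the mean of the trajectory midpoints equals the grand mean of the five vowel positions.

```python
from itertools import product

# Hypothetical positions (X-ray units) of the five full vowels on one
# pellet dimension; these numbers are illustrative, not measured values.
VOWEL_POS = {"iy": 280.0, "eh": 250.0, "aa": 225.0, "ah": 240.0, "uw": 270.0}

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def mean_trajectory_point(pos, w=0.5):
    """Mean, over all ordered V1-V2 pairs, of the point a fraction w along
    a straight line from the V1 position to the V2 position."""
    points = [(1.0 - w) * pos[v1] + w * pos[v2]
              for v1, v2 in product(pos, repeat=2)]
    return mean(points)

grand_mean = mean(VOWEL_POS.values())
schwa_mean = mean_trajectory_point(VOWEL_POS)
```

Under the straight-line assumption the same equality holds for any fixed fraction w along the path, because every vowel occurs equally often as V1 and as V2. The mean alone therefore cannot show how far along the trajectory the schwa point lies.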

If it is indeed the case that the schwa can be described as a targetless point on the continuous trajectory from V1 to V2, then we would expect that the schwa pellet positions could be predicted from knowledge of V1 and V2 positions alone, with no independent contribution of schwa. To test this, we performed stepwise multiple linear regression analyses on all possible subsets of the predictors of V1 position, V2 position, and an independent schwa factor, to determine which (linear) combinations of these three predictors best predicted the position of a given pellet dimension during schwa. The analysis finds the values of the b coefficients and the constant k, in equations like (1) below, that give the best prediction of the actual schwa values.

(1) schwa(predicted) = b1*V1 + b2*V2 + k

The stepwise procedure means that variables are added into an equation such as (1) one at a time, in the order of their importance to the prediction. The procedure was done separately for equations with and without the constant term k (using BMDP2R and 9R). For those analyses containing the constant term (which is the y-intercept), k represents an independent schwa contribution to the pellet position; when it is the only predictor term, it is the mean for schwa. Those analyses without the constant term (performed using 9R) enabled the contributions of V1 and V2 to be determined in the absence of this schwa component. Analyses were performed separately for each pellet dimension, and for the "tongue" and "lip" reference points.
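The comparison of predictor subsets can be sketched as below. This is not the BMDP procedure: it is an ordinary least-squares fit of each subset of terms in equation (1), ranked by the standard error of the estimate, taken here as sqrt(SSE / (n - p)) for p fitted terms (an assumption). All function names and data values are hypothetical; the synthetic schwa values are generated exactly from equation (1) with made-up coefficients.

```python
import math
from itertools import product

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit(rows, y, with_constant):
    """Least-squares fit; returns (coefficients, standard_error).
    The constant k, when included, is the last coefficient."""
    design = [list(r) + ([1.0] if with_constant else []) for r in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    coef = solve(xtx, xty)
    resid = [yi - sum(c * di for c, di in zip(coef, d)) for d, yi in zip(design, y)]
    return coef, math.sqrt(sum(e * e for e in resid) / (len(y) - p))

def rank_subsets(v1, v2, y):
    """Fit every subset of {k, v1, v2}; return (label, error) sorted by error."""
    named = {"v1": v1, "v2": v2}
    results = []
    for terms in (("v1", "v2"), ("v1",), ("v2",), ()):
        rows = [[named[t][i] for t in terms] for i in range(len(y))]
        for with_k in (True, False):
            if not terms and not with_k:
                continue  # an equation with no terms at all is not fitted
            label = "+".join((("k",) if with_k else ()) + terms)
            results.append((label, fit(rows, y, with_k)[1]))
    return sorted(results, key=lambda item: item[1])

# Synthetic data: 25 V1-V2 pairs from five hypothetical vowel positions.
positions = [280.0, 250.0, 225.0, 240.0, 270.0]
pairs = list(product(positions, repeat=2))
v1 = [a for a, _ in pairs]
v2 = [b for _, b in pairs]
schwa = [0.3 * a + 0.5 * b + 40.0 for a, b in pairs]  # made-up b1, b2, k
coef, err = fit([[a, b] for a, b in pairs], schwa, True)
ranked = rank_subsets(v1, v2, schwa)
```

Because the synthetic schwa values include a nonzero constant, the equation with all three terms recovers them essentially exactly, and every equation lacking k fits worse than the full one.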

Figure 9. Mean pellet positions for full vowels and schwa, displayed in right-facing mid-sagittal plane as in Figure 6. Phonetic symbols are indicated using IPA; the grand mean of all the full vowels is indicated by a circled square. Units are X-ray units (= .33 mm).


The results for the "tongue" points are shown in the right-hand column of Table 1. For each pellet, the various combinations of terms included in the equation are rank ordered according to the standard error of the schwa prediction for that combination, the smallest error shown at the top.

Table 1. Regression results for X-ray data.

          "lip" reference point        "tongue" reference point
Pellet    Terms      Standard Error    Terms      Standard Error

MX        k+v1+v2     4.6              k+v2+v1     4.5
          k+v1        5.5              k+v2        4.6
          v1+v2       5.6              k           6.2
          k           6.3              v2+v1       6.4
          v1          8.6              v2         10.8

MY        k+v1+v2     4.2              k+v1+v2     4.7
          k+v1        5.0              k+v1        6.4
          v1+v2       7.6              k           8.7
          k          10.3              v1+v2       9.1
          v1         11.9              v1         15.1

RX        k+v2+v1     3.8              k+v2+v1     4.0
          k+v2        3.9              k+v2        4.0
          k           4.8              k           5.8
          v1+v2       7.0              v2+v1       7.7
          v1         11.7              v2         10.6

RY        k+v2+v1     4.1              k+v2+v1     4.5
          v2+v1       4.9              k+v2        4.6
          k+v2        5.1              v2+v1       5.3
          k           6.8              v2          6.2
          v2          7.0              k           7.1

In all cases, the equation with all three terms gave the least error (which is necessarily true). Interestingly, however, for MX, RX and RY, the prediction using the constant and V2 differed only trivially from that using all three variables. This indicates that V1 does not contribute substantially to schwa pellet positions at the "tongue" reference point. In addition, it indicates that an independent schwa component is important to the prediction, because V2 alone, or in combination with V1, gives worse prediction than V2 plus k. For MY, all three terms seem to be important: removing any one of them increases the error. Moreover, the second-best prediction involves deleting the V2 term, rather than V1. [The reduced efficacy of V2 (and increased efficacy of V1) in predicting the MY value of schwa may be due, in part, to the peak determination algorithm employed. When V1 or V2 was /aa/ or /ah/, the criteria selected a point for MY that tended to be much later in the vowel than the point chosen for the other pellet dimensions (Figure 5b gives an example of this). Thus, for V2, the point chosen for MY is much further in time from the schwa point than is the case for the other dimensions, while for V1, the point chosen is often closer in time to the schwa point.]

The overall pattern of results can be seen graphically in Figure 10. Each panel shows the relation between "tongue" pellet positions for schwa and the full vowels: V1 in the top row and V2 in the bottom row, with a different pellet represented in each column. The points in the top row represent the pellet positions for the utterances with the indicated initial vowel (averaged across five utterances, each with a different final vowel), while the bottom row shows the average for the five utterances with the indicated final vowel. The relation between schwa and V2 is quite systematic: for every pellet, the lines do not cross in any of the panels of the bottom row, while for V1 (in the top row), the relationship is only systematic for RY (and somewhat for MY).
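The "no crossover" criterion can be stated as a check on rank order: two lines in a panel cross when the order of two full-vowel values is the reverse of the order of the corresponding averaged schwa values. The sketch below is an illustration under that reading; the function `count_crossovers` and all the numbers are hypothetical, not values read from Figure 10.

```python
def count_crossovers(full_vowel, schwa):
    """Count pairs of vowels whose connecting lines would cross.

    `full_vowel` and `schwa` map a vowel label to the full-vowel value and to
    the averaged schwa value for utterances containing that vowel.
    """
    labels = list(full_vowel)
    crossings = 0
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            a, b = labels[i], labels[j]
            d_full = full_vowel[a] - full_vowel[b]
            d_schwa = schwa[a] - schwa[b]
            # Opposite signs mean the ordering is reversed, so the lines cross.
            if d_full * d_schwa < 0:
                crossings += 1
    return crossings

# Hypothetical values for one pellet dimension (arbitrary units).
full = {"iy": 150, "eh": 170, "ah": 185, "aa": 200, "uw": 210}
schwa_by_v2 = {"iy": 176, "eh": 180, "ah": 183, "aa": 187, "uw": 189}
schwa_by_v1 = {"iy": 183, "eh": 181, "ah": 184, "aa": 182, "uw": 183}

print("crossovers for V2 panel:", count_crossovers(full, schwa_by_v2))
print("crossovers for V1 panel:", count_crossovers(full, schwa_by_v1))
```

A count of zero corresponds to a systematic relation in the sense used here; any positive count indicates at least one pair of crossing lines.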

Turning now to the "lip" reference points, regression results for these points are found in the left-hand column of Table 1. The best prediction again involves all three terms, but here, in every case except RX, the best two-term prediction does substantially worse. Thus V1, which was effectively irrelevant at the "tongue" point, does contribute to the schwa position at this earlier "lip" point. In fact, for these three pellets, the second best prediction combination always involves V1 (with either V2 or k as the second term). This pattern of results can be confirmed graphically in Figure 11. Comparing the V1 effects sketched in the top row of panels in Figures 10 and 11, notice that there is more spread between the schwa pellets at the "lip" point (Figure 11) than at the "tongue" point (Figure 10). This indicates that the schwa pellet was more affected by V1 at the "lip" point. There is also somewhat less crossover for MX and RX in the "lip" figure, indicating increased systematicity of the V1 effect.

In summary, it appears that the tongue position associated with medial schwa cannot be treated simply as an intermediate point on a direct tongue trajectory from V1 to V2. Instead, there is evidence that this V1-V2 trajectory is warped by an independent schwa component.


Figure 10. Relation between full vowel pellet positions and "tongue" pellet positions for schwa. The top row displays the pellet positions for utterances with the indicated initial vowels, averaged across 5 utterances (each with a different final vowel). The bottom row displays the averaged pellet positions for utterances with the indicated final vowels. Units are X-ray units (= .33 mm).


Figure 11. Relation between full vowel pellet positions and "lip" pellet positions for schwa. The top row displays the pellet positions for utterances with the indicated initial vowels, averaged across 5 utterances (each with a different final vowel). The bottom row displays the averaged pellet positions for utterances with the indicated final vowels. Units are X-ray units (= .33 mm).


The importance of this warping can be seen, in particular, in utterances where V1 and V2 are identical (or have identical values on a particular pellet dimension). For example, returning to the utterance /piypax'piypax/ in Figure 4, we can clearly see (in MX, MY and RX) that there is definitely movement of the tongue away from the position for /iy/ between the V1 and V2. This effect is most pronounced for /iy/. For example, for MY, the prediction error for the equation without a constant is worse for /piypax'piypax/ than for any other utterance (followed closely by utterances combining /iy/ and /uw/; MY is very similar for /iy/ and /uw/). Yet, it may be inappropriate to consider this warping to be the result of a target specific to schwa, since, as we saw earlier, the mean tongue position for schwa is indistinguishable from the mean position of the tongue across all vowels. Rather, the schwa seems to involve a warping of the trajectory toward an overall average or neutral tongue position. Finally, we saw that V1 and V2 affect schwa position differentially at two points in time. The influence of the V1 endpoint can only be seen at the "lip" point, relatively early in the schwa, while V2 influence can be seen throughout. In the next section, we propose a particular model gestural structure for these utterances, and show that it can account for the various patterns that we have observed.

3. ANALYSIS OF SIMULATIONS

Within the linguistic gestural model of Browman and Goldstein (1987), we expect to be able to model the schwa effects we have observed as resulting from a structure in which there is an active gesture for the medial schwa, but complete temporal overlap of this gesture and the gesture for the following vowel. The blending caused by this overlap should yield the V2 effect on schwa, while the V1 effects should emerge as a passive consequence of the differing initial conditions for movements out of different preceding vowels.

An example of this type of organization is shown in Figure 12, which is the gestural score we hypothesized for the utterance /piypax'paapax/.

Figure 12. Gestural score for /piypax'paapax/. Tract variable channels displayed, from top to bottom, are: VELum, Tongue Tip Constriction Location and Constriction Degree, Tongue Body Constriction Location and Constriction Degree, Lip Aperture, Lip Protrusion, and GLOttis. Boxes indicate gestural activation; the shaded boxes indicate activation for schwa. The horizontal axis is time (ms).


As in Figure 3, each box indicates the activation interval of a particular gestural control regime, that is, an interval of time during which the behavior of the particular tract variable is controlled by a second order dynamical system with a fixed "target" (equilibrium position), frequency, and damping. The height of the box represents the tract variable "target." Four lip aperture (LA) closure-and-release gestures are shown, corresponding to the four consonants. The closure and release components of these gestures are shown as separate boxes, with the closure components having the smaller target for LA, i.e., smaller interlip distance. In addition, four tongue body gestures are shown, one for each of the vowels: V1, schwa, V2, schwa. Each of these gestures involves simultaneous activation of two tongue body tract variables, one for constriction location (TBCL) and one for constriction degree (TBCD). The control regimes for the V1 and medial schwa gestures are contiguous and non-overlapping, whereas the V2 gesture begins at the same point as the medial schwa and thus completely overlaps it. In other words, during the acoustic realization of the schwa (approximately), the schwa and V2 gestural control regimes both control the tongue movements; the schwa relinquishes active control during the following consonant, leaving only the V2 tongue gesture active in the next syllable. While the postulation of an explicit schwa gesture overlapped by V2 was motivated by the particular results of section 2, the general layout of gestures in these utterances (their durations and overlap) was based on stiffness and phasing principles embodied in the linguistic model (Browman & Goldstein, 1987).
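A single control regime of this kind can be sketched as a damped point attractor. The code below is an illustration only: the equation x'' = -omega^2 (x - target) - 2 zeta omega x' is one standard way of writing a second order system with a target, frequency, and damping, and the function name `simulate_gesture`, the critical-damping default (zeta = 1), and all parameter values are assumptions rather than values from the task dynamic model.

```python
def simulate_gesture(x0, target, omega, zeta=1.0, duration=0.3, dt=0.001):
    """Integrate x'' = -omega**2 * (x - target) - 2 * zeta * omega * x'.

    x0:       starting position of the tract variable (initial velocity is 0)
    target:   equilibrium position of the control regime
    omega:    natural frequency in rad/s (hypothetical value in the example)
    zeta:     damping ratio; 1.0 means critical damping (an assumption here)
    Returns the list of positions, one per time step, including the start.
    """
    x, v = x0, 0.0
    trajectory = [x]
    for _ in range(int(round(duration / dt))):
        accel = -omega ** 2 * (x - target) - 2.0 * zeta * omega * v
        v += accel * dt   # semi-implicit Euler: update velocity first,
        x += v * dt       # then position using the new velocity
        trajectory.append(x)
    return trajectory

# Illustrative run: the variable starts 100 units away from its target.
trajectory = simulate_gesture(x0=100.0, target=0.0, omega=50.0)
print(f"start = {trajectory[0]:.1f}, end = {trajectory[-1]:.4f}")
```

While the regime is active, the variable is attracted to the target regardless of where it started; the starting position only affects how far it still is from the target at any given moment.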

Gestural scores for each of the twenty-five utterances were produced. The activation intervals were identical in all cases; the scores differed only in the TBCL and TBCD target parameters for the different vowels. Targets used for the full vowels were those in our tract variable dictionary. For the schwa, the target values (for TBCL and TBCD) were calculated as the mean of the targets for the five full vowels. The gestural scores were input to the task dynamic model (Saltzman, 1986), producing motions of the model articulators of the articulatory synthesizer (see Figure 1). For example, for utterance /piypax'piypax/, Figure 13 shows the resulting motions (with respect to a fixed reference on the head) of two of the articulators, the center of the tongue body circle (C) and the lower lip, superimposed on the gestural score. Motion of the tongue is shown in both horizontal and vertical dimensions, while only vertical motion of the lower lip is shown. Note that the lower lip moves up for lip closure (during the regimes with the small LA value). Figure 14 shows the results for /piypax'paapax/.
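The construction of the schwa target and the effect of overlap with V2 can be sketched in one dimension. This is a toy illustration, not the task dynamic model: the vowel targets are made-up numbers, the overlap is modeled by an equal-weight average of the schwa and V2 targets (an assumed blending rule), and the regime is assumed to be critically damped with hypothetical frequency and duration.

```python
import itertools

# Hypothetical one-dimensional tongue-body targets for the five full vowels.
vowel_targets = {"iy": 1.0, "eh": 0.4, "ah": 0.0, "aa": -1.0, "uw": 0.6}

# Schwa target computed as the mean of the full-vowel targets.
schwa_target = sum(vowel_targets.values()) / len(vowel_targets)

def position_at_schwa_offset(v1, v2, omega=40.0, overlap=0.15, dt=0.001):
    """Position when the schwa gesture switches off, starting at rest on V1.

    During the overlap, schwa and V2 act together; their joint equilibrium is
    taken to be the equal-weight average of the two targets (an assumption).
    """
    blended = 0.5 * (schwa_target + vowel_targets[v2])
    x, v = vowel_targets[v1], 0.0
    for _ in range(int(round(overlap / dt))):
        accel = -omega ** 2 * (x - blended) - 2.0 * omega * v  # critical damping
        v += accel * dt
        x += v * dt
    return x

positions = {
    (v1, v2): position_at_schwa_offset(v1, v2)
    for v1, v2 in itertools.product(vowel_targets, repeat=2)
}

def spread(values):
    values = list(values)
    return max(values) - min(values)

# Largest variation attributable to V1 (V2 held fixed), and smallest
# variation attributable to V2 (V1 held fixed), across the 25 utterances.
spread_over_v1 = max(spread(positions[v1, v2] for v1 in vowel_targets)
                     for v2 in vowel_targets)
spread_over_v2 = min(spread(positions[v1, v2] for v2 in vowel_targets)
                     for v1 in vowel_targets)
print(f"schwa target = {schwa_target:.2f}")
print(f"spread over V1 = {spread_over_v1:.3f}, spread over V2 = {spread_over_v2:.3f}")
```

With these assumed settings the end-of-schwa positions group by V2 and vary little with V1, which is the qualitative pattern the hypothesized gestural score is meant to produce.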

Figure 13. Gestural score for /piypax'piypax/ (IPA symbols are used in the figure). Generated movements (curves) are shown for the tongue center and lower lip. Boxes indicate gestural activation; the shaded boxes indicate activation for schwa. Panels, from top to bottom: waveform; tongue center, horizontal (CX); tongue center, vertical (CY); lower lip, vertical. The horizontal axis is time (ms).


Figure 14. Gestural score for /piypax'paapax/ (IPA symbols are used in the figure). Generated movements (curves) are shown for the tongue center and lower lip. Boxes indicate gestural activation; the shaded boxes indicate activation for schwa. Panels, from top to bottom: waveform; tongue center, horizontal (CX); tongue center, vertical (CY); lower lip, vertical. The horizontal axis is time (ms).

The articulator motions in the simulations can be compared to those of the data in the previous section (Figures 4 and 5). One difference between the model and the data stems from the fact that the major portion of the tongue dorsum is modeled as an arc of a circle, and therefore all points on this part of the dorsum move together. Thus, it is not possible to model the differential patterns of motion exhibited by the middle (M) and rear (R) of the dorsum in the X-ray data. In general, the motion of CX is qualitatively similar to both MX and RX (which, recall, are highly correlated). For example, both the data and the simulation show a small backward movement for the schwa in /piypax'piypax/; in /piypax'paapax/, both show a larger backwards movement for schwa, with the target for /aa/ reached shortly thereafter, early in the acoustic realization of V2. The motion of CY in the simulations tends to be similar to that of MY in the data. For example, in /piypax'paapax/, CY moves down from /iy/ to schwa to /aa/, and the target for /aa/ tends to be achieved relatively late, compared to CX. Movements corresponding to RY motions are not found in the displacement of the tongue body circle, but would probably be reflected by a point on the part of the model tongue's surface that is further back than that section lying on the arc of a circle.

The model articulator motions were analyzed in the same manner as the X-ray data, once the time points for measurement were determined. Since for the X-ray data we assumed that displacement extrema indicated the effective target for the gesture, we chose the effective targets in the simulated data as the points to measure. Thus, points during V1 and V2 were chosen that corresponded to the point at which the vowel gestures (approximately) reached their targets and were turned off (right hand edges of the tongue boxes in Figures 13 and 14). For schwa, the "tongue" reference point was chosen at the point where the schwa gesture was turned off, while the "lip" reference was chosen at the lowest point of the lip during schwa (the same criterion as for the X-ray data).

The distribution of the model full vowels in the mid-sagittal plane (CX x CY) is shown in Figure 15. Since the vowel gestures are turned off only after they come very close to their targets, there is very little variation across the 10 tokens of each vowel. The distribution of schwa at the "tongue" reference point is shown in Figure 16, labelled by the identity of V2 (in a) and V1 (in b), with the full vowel ellipses added for comparison. At this reference point that occurs relatively late, the vowels are clustered almost completely by V2, and the tongue center has moved a substantial portion of the way toward the following full vowel. The distribution of schwa values at the "lip" reference point is shown in Figure 17, labelled by the identity of V2 (in a), and of V1 (in b). Comparing Figure 17(a) with Figure 16(a), we can see that there is considerably more scatter at the "lip" point than at the later "tongue" point.

Figure 15. Tongue center (C) positions for model full vowels, displayed in mid-sagittal plane with head facing to the right. The ellipses indicate two standard deviations along axes determined by principal component analysis. Symbols: I = ARPAbet /iy/, U = /uw/, E = /eh/, X = /ah/, and A = /aa/. Units are ASY units (= .09 mm).


Figure 16. Tongue center (C) positions for model schwa at "tongue" reference points, displayed in right-facing mid-sagittal plane as in Figure 15. The ellipses are from the model full vowels (Figure 15), for comparison. Symbols: I = ARPAbet /iy/, U = /uw/, E = /eh/, X = /ah/, and A = /aa/. Units are ASY units (= .09 mm). (a) Model schwa positions labelled by the identity of the following vowel (V2). (b) Model schwa positions labelled by the identity of the preceding vowel (V1).

Figure 17. Tongue center (C) positions for model schwa at "lip" reference points, displayed as in Figure 16 (including ellipses from Figure 15). Symbols: I = ARPAbet /iy/, U = /uw/, E = /eh/, X = /ah/, and A = /aa/. Units are ASY units (= .09 mm). (a) Model schwa positions labelled by the identity of the following vowel (V2). (b) Model schwa positions labelled by the identity of the preceding vowel (V1).


We tested whether the simulations captured the regularities of the X-ray data by running the same set of regression analyses on the simulations as were performed on the X-ray data. The results are shown in Table 2, which has the same format as the X-ray data results in Table 1.

Table 2. Regression results of simulations.

          "lip" reference point        "tongue" reference point
          Terms      Standard Error    Terms      Standard Error

CX        k+v1+v2     7.5              k+v2+v1     5.2
          v1+v2       9.14             k+v2        5.6
          k+v1       20.4              v2+v1      13.2
          k          29.2              v2         20.0
          v1         33.3              k          28.6

CY        k+v1+v2     4.7              k+v2+v1     3.2
          v1+v2      13.5              k+v2        3.9
          k+v1       19.7              v2+v1      18.8
          k          29.1              v1         28.0
          v1         41.6              k          30.4

The same patterns are found for the simulations as for the data. At the "tongue" reference point, for both CX and CY the best two-term prediction involves the schwa component (constant) and V2, and this prediction is as good as that using all three terms. Recall that this was the case for all pellet dimensions except for MY, whose differences were attributed to differences in the time point at which this dimension was measured. (In the simulations, CX and CY were always measured at the same point in time.) These results can be seen graphically in Figure 18, where the top row of panels shows the relation between V1 and schwa, and the bottom row shows the relation between V2 and schwa. The same systematic relation between schwa and V2 can be seen in the bottom row as in Figure 10 for the X-ray data, that is, no crossover. (The lack of systematic relations between V1 and schwa in the X-ray data, indicated by the crossovers in the top row of Figure 10, is captured in the simulations in Figure 18 by the lack of variation for the schwa in the top row.) Thus, the simulations capture the major relation between the schwa and the surrounding full vowels at the "tongue" reference point.

At the earlier "lip" reference point, the simulations also capture the patterns shown by the data. For both CX and CY, the three-term predictions in Table 2 show substantially less error than the best two-term prediction. This was also the case for the data in Table 1 (except for RX), where V1, V2 and a schwa component (constant) all contributed to the prediction of the value during schwa. This can also be seen in the graphs in Figure 19, which shows a systematic relationship with schwa for both V1 and V2.

In summary, for the simulations, just as for the X-ray data, V1 contributed to the pellet position at the "lip" reference, but not to the pellet position at the "tongue" point, while V2 and an independent schwa component contributed at both points. Thus, our hypothesized gestural structure accounts for the regularities observed in the data. The gestural control regime for V2 begins simultaneously with that for schwa and overlaps it throughout its active interval. This accounts for the fact that V2 and schwa effects can be observed throughout the schwa, as both gestures unfold together. However, V1 effects are passive consequences of the initial conditions when the schwa and V2 gestures are "turned on," and thus, their effects disappear as the tongue position is attracted to the "target" (equilibrium position) associated with the schwa and V2 regimes.
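The fading of the V1 effect can be made explicit in a worked equation. The following is a sketch under stated assumptions, none of which are taken from the model's actual parameter settings: the regime is critically damped with natural frequency \omega, the tongue is at rest at the V1 position x_0 at the moment (t = 0) when the schwa and V2 regimes are turned on, and T stands for the equilibrium position of the combined schwa and V2 regimes. The symbols x_0, T, and \omega are introduced here for illustration only.

```latex
% Critically damped second order system with equilibrium position T:
\ddot{x} + 2\omega\,\dot{x} + \omega^{2}\,(x - T) = 0,
\qquad x(0) = x_0, \quad \dot{x}(0) = 0 .

% Solution: the initial condition x_0 (the V1 position) enters only through
% a term that decays with time.
x(t) = T + (x_0 - T)\,(1 + \omega t)\,e^{-\omega t} .
```

Early in the interval the factor (1 + \omega t) e^{-\omega t} is close to 1, so the position still depends strongly on x_0; later it approaches 0, and the position is determined by T alone. Under these assumptions, this is the sense in which a V1 effect would be visible at an early reference point but not at a late one.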

3.1 Other simulations

While the X-ray data from the subject analyzed here argue against the strongest form of the hypothesis that schwa has no tongue target, we decided nevertheless to perform two sets of simulations incorporating the strong form of an "unspecified" schwa to see exactly where and how they would fail to reproduce the subject's data. In addition, if the synthesized speech were found to be correctly perceived by listeners, it would suggest that this gestural organization is at least a possible one for these utterances, and might be found for some speakers. In the first set of simulations, one of which is exemplified in Figure 20, the gestural scores took the same form as in Figure 12, except that the schwa tongue body gestures were removed. Thus, active control of V2 began at the end of V1, and, without a schwa gesture, the tongue trajectory moved directly from V1 to V2. During the acoustic interval corresponding to schwa, the tongue moved along this V1-V2 trajectory. The resulting simulations in most cases showed a good qualitative fit to the data, and produced utterances whose medial vowels were perceived as schwas. The problems arose in utterances in which V1 and V2 were the same (particularly when they were high vowels).


Figure 18. Relation between model full vowel tongue center positions and tongue center positions at "tongue" reference point for model schwas. The top row displays the tongue center positions for utterances with the indicated initial vowels, averaged across 5 utterances (each with a different final vowel). The bottom row displays the averaged tongue center positions for utterances with the indicated final vowels. Units are ASY units (= .09 mm).

Figure 19. Relation between model full vowel tongue center positions and tongue center positions at "lip" reference point for model schwas. The top row displays the tongue center positions for utterances with the indicated initial vowels, averaged across 5 utterances (each with a different final vowel). The bottom row displays the averaged tongue center positions for utterances with the indicated final vowels. Units are ASY units (= .09 mm).


Figure 20 portrays the simulation for /piypax'piypax/: the motion variables generated can be compared with the data in Figure 4. The "dip" between V1 and V2 was not produced in the simulation, and, in addition, the medial vowel sounded like /iy/ rather than schwa. This organization does not, then, seem possible for utterances where both V1 and V2 are high vowels.

We investigated the worst utterance (/piypax'piypax/) from the above set of simulations further, generating a shorter acoustic interval for the second vowel (the putative schwa) by decreasing the interval (relative phasing) between the bilabial gestures on either side of it. An example of a score with the bilabial closure gestures closer together is shown in Figure 21. At relatively short durations as in the figure (roughly 50 ms), the percept of the second vowel changed from /iy/ to schwa. Thus, the completely targetless organization may be workable in cases where the surrounding consonants are only slightly separated. In fact, this suggests a possible historical source for epenthetic schwa vowels that break up heterosyllabic clusters. They could arise from speakers increasing the distance between the cluster consonants slightly, until they no longer overlap. At that point, our simulations suggest that the resulting structure would be perceived as including a schwa-like vowel.

The second set of simulations involving an "unspecified" schwa used the same gestural organization as that portrayed in the score in Figure 20, except that the V2 gesture was delayed so that it did not begin right at the offset of the V1 gesture. Rather, the V2 regime began approximately at the beginning of the third bilabial closure gesture, as in Figure 22. Thus, there was an interval of time during which no tongue body gesture was active. The motion of the tongue tract variables during this interval was determined by the neutral positions associated with the specific tongue body articulators, and by the motion of the jaw, which was implicated in the ongoing bilabial closure and release gestures. The results, displayed in Figure 22, showed that the previous problem with /piypax'piypax/ was solved, since during the unspecified interval between the two full vowels, the tongue body lowered (from /iy/ position) and produced a perceptible schwa. Unfortunately, this "dip" between V1 and V2 was seen for all combinations of V1 and V2, which was not the case in the X-ray data. For example, this dip can be seen for /paapax'paapax/ in Figure 23; in the X-ray data, however, the tongue raised slightly during the schwa, rather than lowering. (The "dip" occurred in all the simulations because the neutral position contributing to the tongue body movement was that of the tongue body articulators rather than that of the tongue body tract variables, and thus was relative to the jaw, which, in turn, was lowering as part of the labial release.) In addition, because the onset for V2 was so late, it would not be possible for V2 to affect the schwa at the "lip" reference point, as was observed in the X-ray data. Thus, this hypothesis also failed to capture important aspects of the data. The best hypothesis remains the one tested first, where schwa has a target of sorts, but is still "colorless," in that its target is the mean of all (flanking) vowels, and is completely overlapped by the following vowel.

Figure 20. Gestural score plus generated movements for /piyp_'piyp_/, with no activations for schwa. The acoustic interval between the second and third bilabial gestures is perceived as an /iy/. (IPA symbols are used in the figure). Panels, from top to bottom: waveform; tongue center, horizontal (CX); tongue center, vertical (CY); lower lip, vertical. The horizontal axis is time (ms).


Figure 21. The same gestural score for /piyp_'piyp_/ as in Figure 20, but with the second and third bilabial gestures closer together than in Figure 20. The acoustic interval between the second and third bilabial gestures is perceived as a schwa. Generated movements (curves) are shown for the tongue center and lower lip. (IPA symbols are used in the figure). Panels, from top to bottom: waveform; tongue center, horizontal (CX); tongue center, vertical (CY); lower lip, vertical. The horizontal axis is time (ms).

Figure 22. The same gestural score for /piyp_'piyp_/ as in Figure 20, but with the onset of the second full vowel /iy/ delayed. The acoustic interval between the second and third bilabial gestures is perceived as a schwa. Generated movements (curves) are shown for the tongue center and lower lip. (IPA symbols are used in the figure). Panels, from top to bottom: waveform; tongue center, horizontal (CX); tongue center, vertical (CY); lower lip, vertical. The horizontal axis is time (ms).

Figure 23. The same gestural score as in Figure 22, except with tongue targets appropriate for the utterance /paap_'paap_/. The acoustic interval between the second and third bilabial gestures is perceived as a schwa. Generated movements (curves) are shown for the tongue center and lower lip. (IPA symbols are used in the figure). Panels, from top to bottom: waveform; tongue center, horizontal (CX); tongue center, vertical (CY); lower lip, vertical. The horizontal axis is time (ms).

4. CONCLUSION

We have demonstrated how an explicit gestural model of phonetic structure, embodying the possibilities of underspecification ("targetlessness") and temporal overlap ("co-production"), can be used to investigate the contextual variation of phonetic units, such as schwa, in speech. For the particular speaker and utterances that we analyzed, there was clearly some warping of the V1-V2 trajectory towards a neutral position for an intervening schwa. The analyses showed that this neutral position has to be defined in the space of tract variables (the linguistically-relevant goal space), rather than being the consequence of neutral positions for individual articulators. Therefore, a target position for schwa was specified, although this target is completely predictable from the rest of the system; it corresponds to the mean tongue tract variable position for all the full vowels.

The temporally overlapping structure of the gestural score played a key role in accounting for the time course of V1 and V2 effects on schwa. These effects were well modeled by a gestural score in which active control for schwa was completely overlapped by that for V2. This overlap gave rise to the observed anticipatory effects, while the carryover effects were passive consequences of the initial conditions of the articulators when schwa and V2 begin. (This fits well with studies that have shown qualitative asymmetries in the nature of carryover and anticipatory effects; see Recasens, 1987.)

How well the details of the gestural score will generalize to other speakers and other prosodic contexts remains to be investigated. There is known to be much individual variation in the strength of anticipatory vs. carryover coarticulation in utterances like those employed here, and also in the effect of stress (Fowler, 1981b; Magen, 1989). In addition, schwas with different phonological/morphological characteristics (e.g., in plural /-axz/ and past tense /-axd/) may show different behavior, either with respect to overlap or targetlessness. The kind of modeling developed here provides a way of analyzing the complex quantitative data of articulation so that phonological issues such as these can be addressed.

REFERENCES

Alfonso, P., & Baer, T. (1982). Dynamics of vowel articulation. Language and Speech, 25, 151-173.

Anderson, S. (1982). The analysis of French schwa. Language, 58, 535-573.

Bell-Berti, F., & Harris, K. S. (1981). A temporal model of speech production. Phonetica, 38, 9-20.

Browman, C. P., & Goldstein, L. (1987). Tiers in Articulatory Phonology, with some implications for casual speech. Haskins Laboratories Status Report on Speech Research, SR-92, 1-30. To appear in J. Kingston & M. E. Beckman (Eds.), Papers in Laboratory Phonology I: Between the Grammar and the Physics of Speech. Cambridge: Cambridge University Press.

Browman, C. P., Goldstein, L., Saltzman, E., & Smith, C. (1986). GEST: A computational model for speech production using dynamically defined articulatory gestures. Journal of the Acoustical Society of America, 80, S97. (Abstract)

Fowler, C. A. (1977). Timing control in speech production. Bloomington: Indiana University Linguistics Club.

Fowler, C. A. (1981a). Perception and production of coarticulation among stressed and unstressed vowels. Journal of Speech and Hearing Research, 24, 127-139.

Fowler, C. A. (1981b). A relationship between coarticulation and compensatory shortening. Phonetica, 38, 35-50.

Harshman, R., Ladefoged, P., & Goldstein, L. (1977). Factor analysis of tongue shapes. Journal of the Acoustical Society of America, 62, 693-707.

Keating, P. A. (1988). Underspecification in phonetics. Phonology, 5, 275-292.

Ladefoged, P. (1980). What are linguistic sounds made of? Language, 56, 485-502.

Ladefoged, P. (1982). A course in phonetics (2nd ed.). New York: Harcourt Brace Jovanovich.

Ladefoged, P., & Lindau, M. (1989). Modeling articulatory-acoustics relations: A comment on Stevens' "On the quantal nature of speech." Journal of Phonetics, 17, 99-106.

Liberman, A. M., & Mattingly, I. G. (1985). The motor theory of speech perception revised. Cognition, 21, 1-36.

Magen, H. (1989). An acoustic study of vowel-to-vowel coarticulation in English. Doctoral dissertation, Yale University, Department of Linguistics.

Miller, J. E., & Fujimura, O. (1982). Graphic displays of combined presentations of acoustic and articulatory information. The Bell System Technical Journal, 61, 799-810.

Nearey, T. M. (1980). On the physical interpretation of vowel quality: Cinefluorographic and acoustic evidence. Journal of Phonetics, 8, 213-241.

Ohman, S. E. G. (1966). Coarticulation in VCV utterances: Spectrographic measurements. Journal of the Acoustical Society of America, 39, 151-168.

Perkell, J. S. (1969). Physiology of speech production: Results and implications of a quantitative cineradiographic study. (Research Monograph No. 53). Cambridge, MA: The MIT Press.

Recasens, D. (1987). An acoustic analysis of V-to-C and V-to-V coarticulatory effects in Catalan and Spanish VCV sequences. Journal of Phonetics, 15, 299-312.

Rubin, P., Baer, T., & Mermelstein, P. (1981). An articulatory synthesizer for perceptual research. Journal of the Acoustical Society of America, 70, 321-328.

Saltzman, E. (1986). Task dynamic coordination of the speech articulators: A preliminary model. In H. Heuer & C. Fromm (Eds.), Generation and modulation of action patterns (Experimental Brain Research Series 15). New York: Springer-Verlag.

Saltzman, E., Goldstein, L., Browman, C. P., & Rubin, P. E. (1988). Dynamics of gestural blending during speech production. Presented at First Annual International Neural Network Society (INNS), Boston, September 6-10, 1988.

Saltzman, E., & Kelso, J. A. S. (1987). Skilled actions: A task dynamic approach. Psychological Review, 94, 84-106.

Saltzman, E. L., & Munhall, K. G. (1989). A dynamical approach to gestural patterning in speech production. Ecological Psychology, 1(4), 333-382.

Wood, S. (1982). X-ray and model studies of vowel articulation. Working Papers, 23. Lund University.

FOOTNOTES

*Paper presented at Second Conference on Laboratory Phonology, Edinburgh, Scotland, 28 June 1989-3 July 1989.

†Also Department of Linguistics, Yale University.

1ARPAbet - IPA equivalents:

ARPAbet   IPA
iy        i
eh        ɛ
aa        ɑ
ah        ʌ
uw        u
ax        ə